Amazon Web Services (AWS) has announced an expansion of Amazon Transcribe, its speech recognition product, which now supports 100 languages and gains a range of new artificial intelligence capabilities. The news was shared at the AWS re:Invent event: Amazon Transcribe can now recognize a greater number of spoken languages and supports phone call transcription. AWS customers use Transcribe to add speech-to-text capabilities to their applications on the AWS Cloud.
According to a company blog post, Transcribe has been trained on “millions of hours of unlabeled audio data across over 100 languages” and uses self-supervised learning algorithms to learn the patterns of human speech in different languages and accents. AWS says it balanced the training data so that no language is overrepresented, which helps preserve accuracy for less widely used languages. Previously, Amazon Transcribe supported 79 languages. According to AWS, the update improves Amazon Transcribe's accuracy by between 20 and 50 percent for most languages. The service also offers automatic punctuation, custom vocabulary, automatic language identification, and vocabulary filtering. It recognizes speech in both audio and video formats, including in noisy environments.
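For developers, features such as automatic language identification and custom vocabulary are exposed as parameters of a transcription job. The following is a minimal sketch, not taken from the AWS announcement, of how such a request might be assembled in Python for the boto3 SDK's start_transcription_job call. The helper names, the job name, the S3 URI, and the vocabulary name are hypothetical; running the job itself assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List, Optional


def build_transcription_request(
    job_name: str,
    media_uri: str,
    language_code: Optional[str] = None,
    candidate_languages: Optional[List[str]] = None,
    vocabulary_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Transcribe's StartTranscriptionJob.

    Hypothetical helper for illustration: if no language_code is given,
    the request asks Transcribe to identify the language automatically.
    """
    request: Dict[str, Any] = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
    }
    if language_code is not None:
        # The spoken language is known up front.
        request["LanguageCode"] = language_code
        if vocabulary_name is not None:
            # Custom vocabulary for domain-specific terms.
            request["Settings"] = {"VocabularyName": vocabulary_name}
    else:
        if vocabulary_name is not None:
            # This sketch only attaches a vocabulary to a known language.
            raise ValueError("vocabulary_name requires language_code")
        # Let the service detect the language, optionally from a shortlist.
        request["IdentifyLanguage"] = True
        if candidate_languages:
            request["LanguageOptions"] = list(candidate_languages)
    return request


def start_job(request: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the job. Requires boto3 and configured AWS credentials."""
    import boto3  # third-party AWS SDK, imported lazily

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)


if __name__ == "__main__":
    # Placeholder values; nothing is sent to AWS here.
    demo = build_transcription_request(
        "demo-job",
        "s3://example-bucket/call.wav",
        candidate_languages=["en-US", "es-US"],
    )
    print(demo)
```

Keeping the request construction separate from the network call makes the parameters easy to inspect or log before anything is sent to the service.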
The Verge reached out to AWS for information on previous accuracy rates and the foundation models that underpin Amazon Transcribe. AWS has highlighted that the improved language recognition also carries over to its Call Analytics platform, which is frequently used by contact center customers. Amazon Transcribe Call Analytics, which is also powered by generative AI models, summarizes interactions between agents and customers. AWS claims that this reduces the need for post-call reporting and lets managers find the information they need quickly without reading the entire transcript.