OpenAI has launched a new Whisper API alongside an API that enables third-party developers to integrate ChatGPT into their apps and services at significantly lower rates than its existing language models. The Whisper API is a hosted version of the open-source Whisper speech-to-text model, which the company released in September 2022. Whisper is an automatic speech recognition system, priced at just $0.006 per minute, that OpenAI says offers robust transcription in multiple languages. It accepts a range of file formats, including M4A, MP3, MP4, MPEG, MPGA, WAV, and WEBM.
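At $0.006 per minute, estimating what a transcription job would cost is simple arithmetic. The Python sketch below is illustrative only. It does not call OpenAI's API, the helper names (`estimate_cost`, `is_supported`) are hypothetical, and it assumes the per-minute price is prorated by the second, which may not match how OpenAI actually bills. The format check uses only the list of formats given in the announcement.

```python
from pathlib import Path

# Price per minute of audio quoted in the announcement (USD).
PRICE_PER_MINUTE_USD = 0.006

# File formats the announcement lists as accepted by the Whisper API.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is one of the listed formats (hypothetical helper)."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the cost in USD of transcribing a clip of the given length.

    Assumption: the per-minute price is prorated by the second; real
    billing granularity may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds / 60 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    for name, seconds in [("interview.mp3", 3600), ("memo.flac", 90)]:
        if is_supported(name):
            print(f"{name}: ~${estimate_cost(seconds):.3f}")
        else:
            print(f"{name}: format not in the listed set")
```

Under these assumptions, an hour of audio works out to roughly 36 cents.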
Google, Amazon, and Meta all offer competing speech recognition technology, but OpenAI says Whisper stands out because it was trained on 680,000 hours of multilingual and "multitask" data collected from the web. According to the company, this improves its handling of unique accents, background noise, and technical jargon.
OpenAI's president and chairman, Greg Brockman, explained that the Whisper API is an optimized version of the same large model that is available as open source, and that it is much faster and more convenient to use. According to a 2020 Statista survey, the main barriers to enterprise adoption of voice transcription technology are accuracy, accent- or dialect-related recognition issues, and cost.
"Our picture is that we really want to be this universal intelligence," Brockman said. "We really want to, very flexibly, be able to take in whatever kind of data you have and whatever kind of task you want to accomplish and be a force multiplier on that attention."
Whisper has its limitations, particularly in "next-word" prediction. Because the system was trained on a large amount of noisy data, OpenAI cautions that Whisper might include words in its transcriptions that were never spoken, possibly because it is trying both to predict the next word in the audio and to transcribe the recording itself. Whisper's performance also varies by language, with a higher error rate for speakers of languages that are less well represented in the training data.
OpenAI expects Whisper's transcription capabilities to be used to enhance existing software, services, tools, and products. The AI-powered language learning app Speak is already using the Whisper API to power a new in-app virtual speaking companion. Entering the speech-to-text market could also prove lucrative for OpenAI: one estimate puts the market's value at $5.4 billion by 2026, up from $2.2 billion in 2021.