Explained: OpenAI’s Voice Engine for creating voice clones

Photo Credit: Pixabay

ChatGPT maker OpenAI last week introduced an artificial intelligence-based voice cloning system called Voice Engine. The Microsoft-backed company, reportedly valued at $80 billion, has shown that Voice Engine can clone a voice from an audio sample only 15 seconds long. We look at how the system works, why it matters, and why OpenAI is proceeding cautiously with its public rollout.

What is Voice Engine?

Voice Engine was developed in 2022. It has been part of ChatGPT's voice capabilities since the September 2023 release, which lets the chatbot take voice input and read its responses aloud. In the 'small-scale preview' released by OpenAI on March 29, the system showed that it could take a text input and a 15-second audio clip and produce speech that closely resembles the original speaker and is 'emotive and realistic'. In a blog post introducing Voice Engine, OpenAI noted that the system could be used for a variety of applications, such as translation, reading assistance (especially in educational settings), and therapeutic applications for people with speech and learning difficulties. Companies and institutions including HeyGen, Dimagi and Age of Learning have tested the tool for a range of use cases.

Are there similar tools available?

Google DeepMind's WaveNet deserves a special mention in any discussion of AI-based natural voice generation. Launched in 2017, WaveNet became one of the earliest models trained on human speech samples that could generate natural-sounding output. It is used in several Google products, including Google Assistant, Maps navigation, Voice Search and Cloud Text-to-Speech. In 2023, Microsoft, which is also a major investor in OpenAI, launched VALL-E. Microsoft claimed at the time that the tool could recreate a voice from a three-second sample. Like Voice Engine, VALL-E is not publicly available; Microsoft cited safety concerns as the reason. Apart from these major players, smaller tools such as Respeecher, Voice.ai and Speechify offer similar services.

What are the concerns with AI-based voice cloning?

OpenAI has not yet released Voice Engine to the public, citing the associated risks, especially as the world heads into high-stakes elections in 50 countries. OpenAI has said that it is working with US and international partners across government, media and civil society, among others, to incorporate their feedback. In the recent past, voice deepfakes have been used to dupe individuals and businesses. In February this year, a group of scammers used deepfake audio and video to dupe a Hong Kong-based firm of $26 million. In India, there have been reported incidents of funds being siphoned off from unsuspecting users through AI-generated audio.

What is the future of Voice Engine?

OpenAI said in its blog that it hopes Voice Engine will start a 'dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities'. The company said that Voice Engine generations are watermarked so that the origin of any generated audio can be traced. "Partners must also clearly disclose to their audience that the voices they're hearing are AI-generated," the blog added. The company also said it encourages steps such as phasing out voice-based authentication, especially for access to bank accounts and other sensitive information; developing techniques to trace the origin of audiovisual content; and educating the public about the potential and capabilities of AI.
