
AI can clone your voice with just a 3-second sample; Should you worry?


Leslie D'Monte

10 Jan, 2023

Content creators and voice actors have their work cut out for them in today's digital age, with intelligent software mimicking their writing, art, voices and even emotions. OpenAI's DALL-E can generate realistic art and images from plain text prompts, and ChatGPT can write poems, articles, books and even code. Now here's one more artificial intelligence (AI)-powered tool, one that can speak and emote like us, in most cases without listeners being able to spot the difference.

Microsoft published a paper (https://arxiv.org/abs/2301.02111) early this month about its new text-to-speech AI model, VALL-E, which can simulate a person's voice from just a 3-second recording. Initial results show that VALL-E can also preserve the speaker's emotional tone. The paper describes VALL-E as "a new language model approach for text-to-speech synthesis (TTS) that uses audio codec codes as intermediate representations".

According to the paper's authors, VALL-E was pre-trained on 60,000 hours of English speech data, which the paper claims is "hundreds of times larger than existing systems".

But what's new about this technology, you may ask? And with good reason. Text-to-speech (TTS) systems have been around for a while. Free TTS tools include Natural Reader, WordTalk, ReadLoud, Listen (which uses Google's TTS application programming interface, or API, to convert short snippets of text into natural-sounding synthetic speech), Free TTS (again from Google), Watson Text to Speech (a tool from IBM that supports a variety of voices in different languages and dialects), and Neosapience (which allows users to write out the emotion they want virtual actors to use when speaking).

That said, TTS tools typically require high-quality studio-recorded annotated audio from different speakers with different styles and emotions for commercial applications. The models also typically need at least 30 minutes of such data.

Read the full story on Mint.
