Microsoft’s new VALL-E 2 text-to-speech synthesis achieves human level performance
You’re thinking about Morgan Freeman reading you bedtime stories, right?
Published on June 11, 2024
Microsoft has come up with VALL-E 2, a new model that takes human-like speech synthesis to another level. This is not just an improvement; it’s a big step forward in making computer-generated voices sound more natural and higher in quality. It marks significant progress over previous versions like VALL-E, which already came close to human speech patterns but still lacked some crucial elements, such as intonation control and the ability to avoid monotonous repetition.
The researchers fixed the token repetition issue and more
The new model overcomes these limitations with two additions, Repetition Aware Sampling and Grouped Code Modeling, both aimed at making machine-learning speech generation more stable and efficient. But what does this mean? Well, let’s dive into the details and find out.
One known sampling problem is token repetition. The model can sometimes produce repetitive sequences, which may cause instability or even infinite loops during decoding. Repetition Aware Sampling takes the decoding history into account to deliver more stable and reliable results. Have you ever heard synthesized speech that doesn’t sound quite right? This feature is here to fix that.
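To make the idea concrete, here is a minimal Python sketch of sampling that looks at the decoding history. It is not Microsoft's implementation: the function names, the window size, the `max_ratio` threshold, and the choice to fall back from nucleus sampling to plain random sampling are all assumptions made for illustration.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample from the smallest set of top tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, rng,
                            top_p=0.8, window=10, max_ratio=0.5):
    """Pick the next token, checking recent history for repetition.

    window and max_ratio are hypothetical settings, not values from the article.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        # Share of the recent window already occupied by the candidate token.
        ratio = recent.count(token) / len(recent)
        if ratio > max_ratio:
            # Too repetitive: resample from the full distribution instead,
            # which gives less likely tokens a chance to break the loop.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token
```

With a distribution such as `[0.9, 0.05, 0.05]` and `top_p=0.8`, nucleus sampling alone can only ever return token 0, which is exactly the kind of loop described above. Once the recent history is dominated by that token, the fallback lets the other tokens through.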
Next is Grouped Code Modeling, a method that focuses on efficiency. By grouping together codec codes, it can greatly shorten the sequence length. This approach speeds up inference and deals with issues related to modeling of long sequences. Think about a situation where you have to quickly synthesize a long speech; this feature makes it possible without losing quality.
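The sequence-shortening effect can be shown with a small Python sketch. It only illustrates the grouping step on a plain list of integer codes; the function names, the `pad_id` value, and the padding strategy are assumptions, and the real model operates on codec codes inside a neural network rather than on Python lists.

```python
def group_codes(codes, group_size, pad_id=0):
    """Pack consecutive codec codes into fixed-size groups.

    pad_id is a hypothetical filler used when the length is not a
    multiple of group_size.
    """
    remainder = len(codes) % group_size
    if remainder:
        codes = codes + [pad_id] * (group_size - remainder)
    return [tuple(codes[i:i + group_size])
            for i in range(0, len(codes), group_size)]


def ungroup_codes(groups, original_length):
    """Flatten groups back into a code sequence and drop any padding."""
    flat = [code for group in groups for code in group]
    return flat[:original_length]


codes = [1, 2, 3, 4, 5, 6, 7]
groups = group_codes(codes, group_size=2)
# groups == [(1, 2), (3, 4), (5, 6), (7, 0)]
```

Seven codes become four groups, so a model that predicts one group per step needs roughly half as many steps with a group size of 2.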
VALL-E 2 will talk just like a human
These are not merely technical terms; they empower VALL-E 2 to produce speech that is extremely natural, even for intricate sentences. The model’s elegance lies in its simplicity: it only needs a simple set of speech-transcription pairs for training. This makes the process of collecting and handling data much easier.
According to the technical paper on VALL-E 2, the new model showed better results in speech robustness, naturalness, and speaker similarity on the LibriSpeech and VCTK datasets. It is the first model to reach human parity on these benchmarks. The new version can produce high-quality speech and copes well with complicated and repetitive sentences.
VALL-E 2 holds great promise for aiding people who have difficulty speaking, but its possible uses are not limited to this area. Think of being able to give a voice to someone who struggles to talk because of conditions like aphasia or amyotrophic lateral sclerosis. Yet we should not overlook the dangers of misuse, like voice spoofing or impersonation. Any practical use of this technology should include rules for obtaining the speaker’s approval and a way to recognize whether speech is real or computer-generated.
Could you, for instance, have all the e-books on your PC narrated by Morgan Freeman? You probably could. Publishing them online? That would be a totally different story, and you shouldn’t be able to do that, for obvious reasons.
What do you think about VALL-E 2 and speech synthesis? Let’s talk about that in the comments below. We’ve learned about this from AIM.
Claudiu Andone
Windows Troubleshooting Expert
Oldtimer in the tech and science press, Claudiu is focused on whatever comes new from Microsoft.
His abrupt interest in computers started when he saw the first Home Computer as a kid. However, his passion for Windows and everything related became obvious when he became a sys admin in a computer science high school.
With 14 years of experience in writing about everything there is to know about science and technology, Claudiu also likes rock music, chilling in the garden, and Star Wars. May the force be with you, always!