Meta AI recently unveiled a “breakthrough” text-to-speech (TTS) generator that it claims produces results up to 20 times faster than state-of-the-art artificial intelligence models of comparable performance.
The new system, called Voicebox, eschews the traditional TTS architecture in favor of a model more akin to OpenAI’s ChatGPT or Google’s Bard.
One of the main differences between Voicebox and similar TTS models, such as ElevenLabs' Prime Voice AI, is that Meta’s offering can generalize through in-context learning.
Like ChatGPT or other transformer models, Voicebox uses large training datasets. Previous efforts to use massive amounts of audio data resulted in severely degraded audio output. For this reason, most TTS systems use small, highly curated and labeled datasets.
Meta overcomes this limitation with a new training scheme that does away with labels and curation in favor of an architecture capable of “filling in” audio information.
As Meta AI put it in a June 16 blog post, Voicebox is “the first model that can generalize to speech generation tasks for which it was not specifically trained with state-of-the-art performance.”
This allows Voicebox to convert text to speech, remove unwanted noise by synthesizing replacement speech, and even apply a speaker’s voice to output in other languages.
According to accompanying research published by Meta, its pre-trained Voicebox system can accomplish all of this with just the desired output text and a three-second audio clip.
The arrival of robust speech generation comes at a particularly sensitive time: social media companies continue to struggle with moderation and, in the United States, the upcoming presidential election threatens to once again test the limits of online disinformation detection.
For example, former US President Donald Trump is currently facing allegations that he mishandled confidential government materials after leaving office. Among the alleged evidence cited in the case against him are audio recordings in which he allegedly admitted to possible wrongdoing.
While there is currently no indication that the former president intends to deny the content described in the audio files, his case illustrates that data integrity is at the heart of America’s legal system and, by extension, its democracy.
Voicebox is not the first tool of its kind, but it seems to be among the most robust. As such, Meta has developed a tool to determine if speech was generated by it, and the company claims it can “trivially detect” the difference between real and fake audio. According to the blog post:
“As with other powerful new AI innovations, we recognize that this technology carries the potential for misuse and unintended harm. In our paper, we detail how we built a highly efficient classifier that can distinguish between authentic speech and Voicebox-generated audio to mitigate these potential future risks.”
In the world of cryptocurrencies, artificial intelligence has become an integral part of the daily operations of most businesses, much like the internet or electricity. The biggest exchanges rely on AI chatbots for customer interactions and sentiment analysis, and trading bots have become commonplace.
The advent of robust text-to-speech systems such as Voicebox, combined with automated trading, could help bridge the gap for potential cryptocurrency traders who rely on TTS systems that may currently struggle with crypto jargon or multi-language support.