Veritone has launched a new service called MARVEL.ai, a synthetic voice product the company characterizes as a “Voice-as-a-Service” (VaaS) solution for media companies, brands, and celebrities. The platform creates realistic (Veritone says “hyper-realistic”) recorded voice performances based on voice samples. The product can mimic any person’s voice, given a sufficient sample, then make that synthetic voice say anything, read any script, or speak an entire audio program (such as a podcast) in another language using the original speaker’s voice.
The announcement startled us and raised questions. Veritone President Ryan Steelberg agreed to a knowledge-gathering chat, which ended with a compelling use case for podcasting.
Genesis of a Synthetic Speech Initiative
We asked Steelberg when the idea originated for this product, which might seem to differ from Veritone’s reputation as an archiving and indexing company for broadcasters. His answer revealed that synthetic speech is closely related to, and grew out of, that core line of business.
“We started looking at this development two years ago, but over many years we have seen the sophistication of conversational AI. We have a tremendous ability to analyze content, and we looked at what the future could be.”
He noted that this new product grew out of Veritone’s legacy expertise in collecting and indexing audio.
“We have a dominant position in cognitive analysis in audio and video. Yes, indexing that content does afford us an opportunity to bring intelligence and new services to studio and broadcasting partners. It has created a tremendous amount of training data.”
“Training data” is the vital ingredient in synthesis, Steelberg explained: it is a body of a person’s recorded voice that is used to train, or build, a persuasive synthetic version of that voice. Steelberg told us that “hours and hours” of sampling are needed to correctly train a synthetic version.
“On our platform we have so much training data, a tremendous ability to analyze the content, we thought we could take advantage of further advancements, originally looking into the conversational approach. Ultimately the question was whether we get enough sophistication to build synthetic content so we can programmatically create new content based on consumer demand.”
“This is the first inning of Veritone’s B2B approach of empowering our studio and broadcast partners with a full array of synthetic tools and functionality.” Ryan Steelberg, President, Veritone.
Guarding Against “Deep Fakes”
We were curious about possible legal and security risks inherent in a tool that can mimic a person’s voice with no limit on content length or type. It brought to mind “deep fakes,” in which a celebrity can be fraudulently presented saying things that person never actually said. Ryan Steelberg acknowledged the question and made an interesting comparison:
“We all know about the birth of this, because of deep fakes and misinformation. I harken it to the days of Napster when the iTunes Music Store came out. People were misappropriating content, then iTunes came on the market, showing how to charge for the content and have consistency of service and a business model. We say that MARVEL is bringing that opportunity to people who have brand equity in their name and likeness to the market. In effect, the iTunes of this opportunity.”
Security and verification are built into the MARVEL.ai product, Steelberg emphasized:
“We do not allow anybody to create a voice using our technology unless we have dual authentication and authorization from the individual in verbal and written forms. Nobody will use our technology to create voices for fraudulent use unless it’s the person or a directly authorized individual that has the right to create that voice.” He noted that Veritone also embeds a hashtag in the voice itself. “It’s like a watermark which verifies that it was created by Veritone.”
The Use Case for Podcast Translations in Original (Synthetic) Voice
We were curious about real-world use cases, and Ryan Steelberg talked about language translation. With enough training, a synthetic voice can perform text-to-speech in any language, which opens the door to moving content quickly from one marketplace to another around the world. In discussing this, he brought the conversation straight to podcasting.
“Immediate translation into many different dialects and languages — podcasting, for example. With Crime Junkie for example, which our agency side Veritone One works with, it is no longer a random person’s voice if it gets translated into Spanish or French. It maintains brand equity.”