May 8, 2024 (Updated on Apr 1, 2024)

OpenAI's Voice Engine Lets You Copy Someone Else's Voice

OpenAI has developed a voice cloning tool called Voice Engine, which allows users to create a synthetic voice based on a 15-second sample of someone's voice. This tool is currently in preview and is not yet available to the public. The company is taking this time to ensure that the technology is deployed responsibly, considering the potential risks and putting safeguards in place. Voice Engine is an expansion of OpenAI's existing text-to-speech API and has been in development for about two years.

Photo by Mariia Shalabaieva on Unsplash

The generative AI model powering Voice Engine has been used in other OpenAI products, such as the voice and "read aloud" capabilities in ChatGPT and the preset voices in the text-to-speech API. The model was trained on a mix of licensed and publicly available data, although OpenAI has not disclosed specific details about the training data. This training data is crucial for the model's development, but it also raises questions about intellectual property rights and fair use.

Voice Engine uses a combination of a diffusion process and a transformer to generate speech from a small audio sample and text input. The model does not retain the audio data after generating the speech, which helps protect user privacy. OpenAI claims that its approach delivers higher-quality speech compared to other voice-cloning products on the market.

A key feature of Voice Engine is that it generates realistic speech without being fine-tuned on user data. Its diffusion-and-transformer design lets the model analyze the speech sample and the text input together to produce a matching voice. The underlying technique is not new, so OpenAI's claim rests on output quality rather than novelty.

Voice Engine is listed at $15 per one million characters, which works out to about 162,500 words or roughly 18 hours of audio. This pricing is competitive with other vendors, but Voice Engine offers fewer customization options than some other products: it does not currently provide controls to adjust tone, pitch, or cadence, although any expressiveness in the 15-second voice sample will carry over to subsequent generations. The commoditization of voice work that tools like Voice Engine could bring may affect voice actors, whose pay ranges from $12 to $79 per hour.
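The listed figures can be turned into a quick cost estimate. The following is a minimal sketch in Python that uses only the numbers quoted above ($15 per million characters, about 162,500 words, roughly 18 hours of audio). The constant and function names are hypothetical and chosen for illustration; they are not part of any OpenAI API, and the per-hour and per-word values are derived from the article's rounded estimates rather than from published rates.

```python
# Figures quoted in the article; treated here as rough estimates.
PRICE_PER_MILLION_CHARS = 15.00        # US dollars
WORDS_PER_MILLION_CHARS = 162_500      # approximate
AUDIO_HOURS_PER_MILLION_CHARS = 18     # approximate


def cost_for_characters(n_chars: int) -> float:
    """Estimated cost in dollars for synthesizing n_chars characters."""
    return n_chars / 1_000_000 * PRICE_PER_MILLION_CHARS


def cost_per_audio_hour() -> float:
    """Estimated cost in dollars for one hour of generated audio."""
    return PRICE_PER_MILLION_CHARS / AUDIO_HOURS_PER_MILLION_CHARS


def implied_words_per_minute() -> float:
    """Speaking rate implied by the quoted word and hour counts."""
    return WORDS_PER_MILLION_CHARS / (AUDIO_HOURS_PER_MILLION_CHARS * 60)


if __name__ == "__main__":
    print(f"50,000 characters: ${cost_for_characters(50_000):.2f}")
    print(f"One hour of audio: ${cost_per_audio_hour():.2f}")
    print(f"Implied pace: {implied_words_per_minute():.0f} words per minute")
```

Under these assumptions, an hour of generated audio costs roughly $0.83, and the quoted figures imply a speaking pace of about 150 words per minute. The $12 to $79 hourly range for voice actors covers working time rather than finished audio, so the two numbers are not directly comparable, but the gap illustrates why the pricing matters for entry-level voice work.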

The development of voice cloning technology raises ethical concerns, particularly regarding its impact on voice actors. By reducing the cost of voice work, OpenAI's tool could eliminate entry-level jobs in favor of AI-generated speech. While some companies are exploring ways to protect the rights of voice actors, such as creating marketplaces for synthetic voices, OpenAI has not yet implemented such measures.

OpenAI is, however, taking steps to prevent misuse of Voice Engine. The tool is currently available only to a small group of developers, and clones created with Voice Engine are watermarked so that their origin can be traced. OpenAI is also working with its red teaming network to identify and mitigate potential risks associated with the technology. While Voice Engine represents a significant advancement in AI technology, its deployment requires careful consideration of the ethical and societal implications.
