Overview
What is AI voice cloning technology and how does it work?
AI voice cloning technology creates synthetic copies of individual voices using speech samples. These systems analyze distinctive characteristics like tone, pitch, and cadence to generate new speech that mimics the original speaker. A range of tools, many of them open source, can now take just a short audio sample and produce a convincing replica of someone’s voice that can say anything, even phrases the original speaker never uttered.
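To illustrate how low the barrier has become, here is a minimal sketch of voice cloning with the open-source Coqui TTS library and its XTTS v2 model. The model name, file paths, and spoken text are assumptions for illustration, and the exact API can vary between library versions.

```python
# Minimal voice-cloning sketch using Coqui TTS (XTTS v2).
# Assumptions: the TTS package is installed (`pip install TTS`), the model name
# below is available for download, and speaker_sample.wav is a short recording
# of the voice to be cloned. Details may differ between library versions.
from TTS.api import TTS

# Download and load a multilingual voice-cloning model.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Synthesize a sentence the sampled speaker never actually said, in their voice.
tts.tts_to_file(
    text="This is a sentence the original speaker never actually said.",
    speaker_wav="speaker_sample.wav",  # short reference clip of the target voice
    language="en",
    file_path="cloned_output.wav",
)
```

The reference clip only needs to be a few seconds long, which is exactly why the consent and verification safeguards discussed later in this piece matter.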
What legitimate applications exist for AI voice cloning?
AI voice cloning has numerous valuable applications:
- Automating narration for audiobooks, articles, and blogs
- Creating character voices for video games
- Streamlining audio editing without re-recording
- Facilitating language translation and movie dubbing
- Customizing marketing content for different regions
- Providing voice alternatives for people who have lost their ability to speak
- Generating voices for educational or corporate training content
- Powering customer service systems

What are the main risks associated with AI voice cloning?
There are three primary categories of misuse:
- Impersonating everyday people: Scammers use voice clones in schemes like “Grandparent scams,” convincing victims that a loved one is in distress and needs money urgently.
- Impersonating public figures: Creating deepfakes of celebrities or politicians endorsing products or spreading misinformation.
- Bypassing security systems: Using voice clones to circumvent voice-based authentication used by some financial institutions (a sketch of how such systems compare voices appears below).
Additional risks include causing reputational harm (e.g., creating fake audio of someone making offensive statements) and enabling large-scale disinformation campaigns.
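The authentication risk noted above exists because voice-based systems typically reduce a voice to a numeric speaker embedding and accept a caller whose embedding is close enough to the enrolled one. The sketch below shows that comparison using the open-source resemblyzer library; the file names and the 0.75 threshold are illustrative assumptions, not a description of any institution’s actual system.

```python
# Sketch of speaker verification, the technique behind voice-based authentication.
# Assumptions: the open-source resemblyzer package is installed
# (`pip install resemblyzer`) and the two .wav files exist; the 0.75 decision
# threshold is arbitrary and for illustration only.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Embed the enrolled (genuine) voice and the incoming caller's voice.
enrolled = encoder.embed_utterance(preprocess_wav("enrolled_customer.wav"))
incoming = encoder.embed_utterance(preprocess_wav("incoming_call.wav"))

# Cosine similarity between embeddings; resemblyzer embeddings are L2-normalized,
# so a dot product suffices.
similarity = float(np.dot(enrolled, incoming))

# A convincing clone can push this score above the acceptance threshold.
print("similarity:", similarity, "-> accepted" if similarity > 0.75 else "-> rejected")
```

A sufficiently convincing clone can produce an embedding close to the genuine one, which is what makes this kind of bypass possible.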
How prevalent are voice cloning scams?
While precise statistics specifically on AI voice cloning scams are limited, imposter scams overall are extremely common. In 2023, nearly 854,000 imposter scams were reported to the FTC, with losses totaling $2.7 billion. Consumer Reports collected testimonials from hundreds of consumers who received calls from scammers mimicking familiar voices. Documented cases show substantial financial losses—one consumer reportedly lost $690,000 after watching a deepfake endorsement, while others have lost tens of thousands to voice-based scams.
Consumer Reports Study
Design
Consumer Reports (CR) undertook an investigation into the practices of six companies providing AI-powered voice cloning services. The study’s central aim was to evaluate the potential for these services to be misused for fraudulent activities, impersonation, and breaches of data privacy. Instead of a broad market survey, CR focused on a representative sample, selecting companies with varying approaches to user safety and data handling. The research team simulated the user experience by attempting to generate voice clones using pre-existing audio recordings of a CR staff member. This practical approach was combined with a review of each company’s publicly stated privacy policies. Furthermore, CR directly engaged with the companies, posing specific questions regarding their data usage, security protocols, and methods for preventing the creation of unauthorized voice copies.
A key aspect of the study involved assessing the barriers each company placed in front of users attempting to create a voice clone. This included examining the type of information required from users (e.g., email, payment details), the financial cost of accessing the cloning service, and, crucially, the presence of any technological measures designed to verify the consent of the individual whose voice was being replicated. The study also analyzed how companies addressed the use of customer voice data, specifically whether it was used to refine their AI models, shared with external entities, or subject to user deletion requests. While acknowledging the inherent limitations of a small sample size and reliance on self-reported information, the study aimed to provide a practical assessment of the risks associated with readily available voice cloning tools. A further limitation is that not all of the companies responded to CR’s questions.

Findings and Recommendations
Key Findings:
- Four out of six tested companies (ElevenLabs, Speechify, PlayHT, and Lovo) had no meaningful technical barriers to prevent cloning someone’s voice without their consent, relying solely on user self-attestation.
- Two companies (Descript and Resemble AI) implemented mechanisms to confirm consent, though these were not foolproof.
- Most companies required minimal customer information (name and email), making it easy for users to remain anonymous and potentially misuse the services.
- Privacy policies varied significantly, with some companies reserving the right to use customer voice data for training or sharing with third parties.
- Some companies’ stated practices were more protective of user data than their privacy policies suggested.
- Imposter scams are common, and AI voice cloning tools have the potential to supercharge impersonation scams.
Recommendations for Voice Cloning Companies:
- Implement robust consent verification mechanisms, such as requiring users to record unique consent statements (see the sketch after this list).
- Collect customer credit card information and implement “know your customer” practices.
- Watermark AI-generated audio and provide tools for detection.
- Implement “semantic guardrails” to flag and prevent the creation of harmful content.
- Consider supervised voice cloning models rather than do-it-yourself products.
- Adhere to data minimization principles, provide non-retention options, and do not reuse customer voice models or data.
- Use reasonable cybersecurity practices, including vulnerability disclosure programs.
- Use only consent-based training data.
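As one rough illustration of the consent-verification recommendation above, a provider could issue each user a one-time consent phrase and check that the uploaded recording actually contains it before enabling cloning. The sketch below assumes the open-source openai-whisper package for transcription; the phrase format, similarity threshold, and function names are hypothetical and simplified, not how any of the tested companies work.

```python
# Sketch of a consent check: transcribe the uploaded clip and confirm it
# contains the one-time consent phrase issued to the user.
# Assumptions: openai-whisper is installed (`pip install openai-whisper`);
# the word list and the 0.8 similarity threshold are arbitrary illustrations.
import secrets
from difflib import SequenceMatcher

import whisper

WORDS = ["amber", "falcon", "harbor", "meadow", "pixel", "quartz", "willow"]


def issue_consent_phrase() -> str:
    """Generate a unique phrase the voice owner must read aloud."""
    code = " ".join(secrets.choice(WORDS) for _ in range(3))
    return f"I consent to the cloning of my voice. My confirmation words are {code}."


def verify_consent(audio_path: str, expected_phrase: str, threshold: float = 0.8) -> bool:
    """Return True if the recording's transcript closely matches the expected phrase."""
    model = whisper.load_model("base")
    transcript = model.transcribe(audio_path)["text"].strip().lower()
    similarity = SequenceMatcher(None, expected_phrase.lower(), transcript).ratio()
    return similarity >= threshold


if __name__ == "__main__":
    phrase = issue_consent_phrase()
    print("Ask the speaker to record:", phrase)
    # Once the recording is uploaded:
    # print(verify_consent("consent_recording.wav", phrase))
```

In practice a provider would also want to confirm that the consent recording and the cloning sample come from the same speaker, which this sketch does not attempt.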
Recommendations for Regulators (FTC and States):
- The FTC should use its Section 5 unfairness authority to take action against companies with inadequate safeguards.
- The FTC should be granted additional resources and expanded legal powers to address AI-powered scams.
- State attorneys general should bring cases under state consumer protection laws.
- New legislation could prohibit companies from offering voice cloning products without basic best practices.
The Imminent Arms Race
The future of voice cloning is one of ubiquity. Over the next 10-18 months, expect real-time deepfake detection to become significantly more effective, a direct response to the increasing sophistication of audio manipulation, especially in voice synthesis. This arms race is critical; robust, real-time detection is essential to maintaining trust in digital interactions. Concurrently, falling costs and advancements in processing power will democratize voice cloning further. Models will run natively on mobile devices, and seamless integration with services like real-time translation will put unprecedented capabilities into the hands of everyday users.
This widespread accessibility, however, presents a profound challenge. While commercial applications will undoubtedly flourish, so too must the regulatory frameworks and real-time detection systems designed to mitigate the inherent risks. The central question is no longer if this technology will become pervasive, but rather how effectively industries and policymakers can establish accountability and ethical oversight. Striking a balance between fostering innovation and safeguarding individuals and institutions will be paramount.