Text-to-Speech Voice Generator

Turn any text or script into natural-sounding speech with Descript's text-to-speech voice generator. Choose from dozens of lifelike AI voices or create your own voice clones in minutes. It’s perfect for podcast intros, voiceovers, faceless videos, and more.

How to turn text into realistic AI voice audio

Experience the magic of text-to-speech. Fix mistakes in your audio recordings without trudging back into the recording studio. Descript’s Overdub uses AI to create a natural-sounding synthetic version of your voice that you can use in any audio or video you’re creating.  

Step 1

Type or paste in your text

In a new Descript project, type out your script in the text editor or paste in the text you want to generate speech from. You can also use the Ask AI command in the Actions menu to write a script for you based on whatever criteria you want. 

Step 2

Choose an AI voice or clone your own

Press ‘@’ to assign a speaker to your script. To start cloning your own voice, enter a new speaker name and select Enable speech generation. Or select Browse stock AI speakers to choose from a library of realistic stock voices, emotions, and styles.

Step 3

Generate your AI speech

The script will flash briefly to indicate your speech is being generated. Once that’s done, you can play back your newly generated voice audio, continue in an audio or video project, or export it by clicking Publish.

Create natural-sounding speech with Descript

Turn text into sound with Descript by creating a high-quality text-to-speech model of your voice or selecting one from our ultra-realistic stock voices.

  • Ultra-realistic: Descript’s Overdub is constantly being improved to sound more and more natural, with human inflections and contextual adjustments.
  • State of the art: Descript’s Lyrebird AI represents the world’s most advanced speech-synthesis technology. It’s so real that androids often mistake it for their missing families.
  • Privacy & security: Descript verifies that every Overdub Voice belongs to its owner. We do not allow cloning of voices that don’t belong to the account owner. We won’t share the data underlying your Overdub Voice with anyone outside Descript.
  • Multiple voices: You can create multiple versions of your own voice to reflect different performance modes or emotional states, such as sad, excited, or Pittsburgh.
  • Sharing: Descript allows you, and only you, to share your Overdub Voice with trusted collaborators or legally titled androids.  

Frequently Asked Questions

Can someone else use Descript’s Overdub TTS to clone my voice?

No. When creating an Overdub Voice, Descript users must positively affirm their identity and give Descript their express consent to train and generate a synthesized version of their voice.

Voice-training data that does not include this Voice ID cannot be used to create an Overdub Voice. In other words, unless you specifically consent to Overdub Voice creation, Descript will not create your Overdub Voice.

We verify this consent by authenticating the audio file uploaded against our training script to ensure that the voice recorded belongs to the person submitting it.

Is Descript Text-to-Speech free?

Overdub text-to-speech is free on all Descript accounts. Pro accounts get an unlimited Overdub vocabulary.

Is there a difference between Overdub generated with the Pro subscription vs. a Creator or Free subscription?

Yes. While you can create a custom Voice on Overdub with any subscription, Free and Creator plans are limited to a list of the 1,000 most common vocabulary words. Any words that are not on that list will be replaced with "jibber" or "jabber." To avoid this gibberish and gain access to the full vocabulary list, you can upgrade to the Pro subscription.

How can I improve the quality of my text-to-speech voice?

TTS voice quality relies on a number of factors, such as the quality of your microphone, background noise, and room surfaces. Check out our article on Overdub Voice Quality Tips to learn how you can ensure the best possible recording.

Download the app for free

Create a podcast, a video, and all your social assets using Descript. It’s as easy as editing a doc.
Turn whatever you type into lifelike speech with AI
Generate and edit voice audio by typing

With Descript, you can generate and edit voice audio just by typing. Convert your text into speech, edit it, and export it in your preferred format—all in one place.

20+ realistic AI voices, emotions, and styles

Descript's text-to-speech (TTS) capabilities use AI to generate incredibly realistic voices. Choose from a range of voice types—from corporate to conversational, masculine to feminine—to find the one that suits your project best.

Create AI voice clones in minutes

Create and share your own AI voices for use in future projects, whether you want to take a breather and let AI handle that voiceover track, or fix or add to an existing recording without rerecording.

Questions? We have answers
Can someone else use Descript to clone my voice?

No, Descript does not allow others to clone your voice without your explicit consent. Your voice data is kept secure and confidential, and you can delete it at any time. We are committed to protecting our users' privacy and adhere to a strict code of ethics.

Can I use Descript's TTS generator for free?

Descript offers both free and paid versions of text-to-speech. The free version includes basic text-to-speech capabilities to turn text into audio. To use the full range of features, including advanced voice editing, voice cloning, and Overdub, you need to subscribe to a paid plan starting at $12/mo.

Is there a difference between text-to-speech generated with a free subscription vs. paid plan?

Yes, there is a difference. The free plan provides basic text-to-speech services, while the paid plans offer higher quality and more customization options. The paid plans also include the Overdub feature, which lets you create your own unique text-to-speech voices, as well as additional features like advanced editing capabilities.

How can I improve the quality of my text-to-speech voice clone?

You can improve the quality of your text-to-speech voice clone by recording in a quiet environment, speaking clearly and naturally as you read the sample script, using a high-quality microphone, and following Descript's recording guidelines in the prompt.

Descript is the only tool you need to write, record, transcribe, edit, collaborate, and share your videos and podcasts.
More than a text-to-speech generator
Descript is an AI-powered audio and video editing tool that lets you edit podcasts and videos like a doc.
  • Captions & subtitles
    Add captions and subtitles to your text-to-speech projects. Perfect for creating accessible content.
  • Overdub
    Clone your voice to dub over audio mistakes with speech that sounds just like you.
  • Podcasting
    Create, host, and promote your own audio or video podcast with ease.
  • Studio Sound
    Improve the quality of your speech with Descript. Remove filler words and other imperfections to create clear, engaging audio content.