How does Descript's speech-to-text tool work?

Descript relies on top-tier AI and machine learning to convert your audio files into a highly accurate transcript in just a few moments.

Can I use Descript to make captions?

Yes, Descript can generate captions for your videos. You just pick the video you want to add text to, convert the audio to text, then use Descript’s Fancy Captions feature to place the text on your video in a few clicks.

Is Descript just a transcription tool?

Not at all. Descript is a complete audio and video editor. With features like automated filler word removal, voice cloning, and Studio Sound voice enhancement, Descript uses AI to streamline your entire production process.

Can I transcribe audio in other languages?

Absolutely. Descript supports transcription in 23+ languages, including English (US), Latvian, Romanian, Catalan, Finnish, Lithuanian, Slovak, Croatian, French (FR), Malay, Slovenian, Czech, German, Norwegian, Spanish (US), Danish, Hungarian, Polish, Swedish, Dutch, Italian, Portuguese (BR), and Turkish. The AI can handle many accents and speech styles thanks to frequent training of its speech recognition models.

What audio file formats does Descript transcribe?

Descript can transcribe WAV, MP3, AAC, AIFF, M4A, and FLAC audio files.
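
If you want to check a file before uploading it, a small script can compare its extension against the formats listed above. This is only an illustrative sketch: the helper name `is_supported_audio` is hypothetical, it is not part of Descript, and it looks only at the file extension, not the actual audio encoding.

```python
from pathlib import Path

# Extensions taken from the list of audio formats above.
# The helper is a hypothetical pre-upload check, not a Descript API.
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".aac", ".aiff", ".m4a", ".flac"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_AUDIO_EXTENSIONS


if __name__ == "__main__":
    for name in ["interview.MP3", "notes.txt", "takes/intro.flac"]:
        print(name, is_supported_audio(name))
```

The comparison is case-insensitive, so `interview.MP3` and `interview.mp3` are treated the same way.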

Audio to Text | Convert Recordings to Editable Text

How to transcribe audio files to text

01
Upload your audio file to transcribe
Drag and drop an audio or video file into a new Descript project. A transcript is generated automatically and synced to your audio, capturing dialogue and even nonverbal sounds. If your audio has more than one speaker, Descript will identify and label each person.
02
Edit your transcript
Your transcript is synced with the editing timeline by default, so deleting or rearranging text edits the audio to match. You can also remove filler words in one click. To fix a transcription error, such as a misspelled name, highlight the text and press 'C' to correct the script without changing the audio.
03
Export in your desired format
When your transcript looks good, head over to Publish > Export and pick an option. You can export as plain text, rich text, Markdown, HTML, a Word doc, or even an SRT or VTT subtitle file. You can also share it as a web link or embed your transcript alongside the audio with Descript’s media player.
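
For readers curious what an SRT subtitle file contains, the sketch below builds one from timed transcript segments. It illustrates the general SRT layout (a counter, a `start --> end` time line, then the caption text) and is not Descript's export code. The `(start_seconds, end_seconds, text)` segment shape and both function names are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple shape is a hypothetical input format chosen for illustration.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_line = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_line}\n{text}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

VTT files follow a similar idea but start with a `WEBVTT` header and use a period instead of a comma before the milliseconds.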

Drag and drop to convert audio to text in seconds

Convert audio to text—and text into audio
Descript does more than just convert audio to text. It can also create audio from your text to help you explore new ideas. Keep your script and adjust your voice, or make a clone of your voice to enhance your original recording without doing extra takes.
Fix errors and remove filler words in a snap
Whether you create YouTube videos, run a podcast, or just need to transcribe audio to text, Descript’s AI-powered approach is around 95% accurate from the start. After that, you can remove filler words instantly, highlight potential transcription errors, and quickly make corrections throughout your script.
Customize your output with AI
Export your transcribed audio in any format you prefer, with or without speaker labels, time codes, and markers. Plus, AI Actions let you convert your transcript into blog posts, social content, or even a script with the prompts you choose.

Don’t just take our word for it

With a 4.6-out-of-5-star rating and a bunch of distinctions on G2, Descript’s users have declared it an industry standard in the video and podcasting world.

2026

Best Software

Video Editing
AI Video Generators
Screen and Video Capture
Text to Speech

“With Descript I'll be able to at least double my content output since editing is taking one-quarter the time it used to.”
Donna B.
“With Descript we can create videos for our YouTube channel and our LinkedIn page much faster and with high quality.”
Balázs N.
“Descript has made cleaning up and creating my educational videos into professional presentations [possible] without needing extensive technical computer skills.”
Barbara C.
“Descript makes recording and editing audio and video a breeze. Its advanced features have streamlined my workflows, saving me a lot of time usually spent editing.”
Roderick F.
“The collaborative tools streamline teamwork, allowing my team and me to work efficiently together on projects. Overall, Descript enhances productivity and simplifies the editing process.”
Aldrich M.
“Transcription-based editing makes the process much faster…All in all, a must have editor for most audiences, especially in SaaS marketing.”
Nidhin M.

PRICING

Surely there’s one for you

Free

per person / month

Start your journey with text-based editing

1 media hour / month
100 AI credits / month
Export 720p, watermark-free
Limited use of Underlord, our agentic video co-editor and AI tools
Limited trial of AI Speech

Hobbyist

$24

$16

per person / month

1 person included

Elevate your projects, watermark-free

10 media hours / month
400 AI credits / month
Export 1080p, watermark-free
Access to Underlord, our AI video co-editor
AI tools including Studio Sound, Remove Filler Words, Create Clips, and more
AI Speech with custom voice clones and video regenerate

Creator

$35

$24

per person / month

Scale to a team of 3 (billed separately)

Unlock advanced AI-powered creativity

30 media hours / month
+5 bonus hours
800 AI credits / month
+500 bonus credits
Export 4K, watermark-free
Full access to Underlord, our AI video co-editor and 20+ more AI tools
Generate video with the latest AI models
Unlimited access to royalty-free stock media library
Access to top ups for more media hours and AI credits

More than an audio to text converter

Text-to-speech

Remote recording

Podcasting

Use AI to flag the best snippets in your audio or transcript.
