Convert audio to text

Descript’s audio-to-text capabilities transcribe audio with up to 95% accuracy to create transcripts, captions, subtitles, and text files. The best part? You can edit your audio by editing the text—just like a doc—to remove filler words and make cuts with just a few keystrokes.

The Easiest Speech-to-Text Has Ever Been

Descript’s speech-to-text transcription tool uses advanced speech recognition technology to turn audio files into transcripts that can be edited in real-time, just like a Google Doc, to change the underlying audio. All you have to do is drag and drop your audio or video file, and Descript will immediately begin transcribing.

Download the app →

How to transcribe audio files to text

Experience the magic of Studio Sound on your audio clip. You just need an audio recording that’s no longer than 5 minutes and no larger than 25 MB.

Step 1

Upload your audio file to transcribe

Drag and drop an audio or video file into a new Descript project to upload it. A transcript will automatically generate and sync to your audio, including dialogue and even "wordless media" like sounds and pauses. If there are multiple speakers in your audio, Descript will automatically identify and label them for you.

Step 2

Edit your transcript

By default, your new transcript will be synced to your editing timeline. You can delete or rearrange the text to edit your audio, letting you do stuff like remove filler words in one click. If you want to fix any transcription errors, like a misspelled name, highlight the text and enter Correct mode by pressing 'C' to fix your transcript without affecting the audio.
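To make the idea of editing audio by editing text more concrete, here is a minimal sketch in Python. It is not Descript's implementation: the `Word` class, the `FILLERS` set, and both function names are hypothetical, and it simply assumes each transcribed word carries a start and end time in seconds. Deleting words from the transcript then reduces to working out which stretches of audio to keep.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the audio (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds


# Illustrative filler list; a real tool would likely use a richer one.
FILLERS = {"um", "uh"}


def remove_fillers(words):
    """Return the transcript without filler words."""
    return [w for w in words if w.text.lower().strip(".,!?") not in FILLERS]


def keep_ranges(words, tolerance=1e-9):
    """Merge the remaining words into contiguous (start, end) audio ranges."""
    ranges = []
    for w in words:
        if ranges and w.start - ranges[-1][1] <= tolerance:
            # This word begins where the previous range ends, so extend it.
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            # A gap means something was cut here; start a new range.
            ranges.append((w.start, w.end))
    return ranges


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("welcome", 0.6, 1.1),
    Word("back", 1.1, 1.4),
]
print(keep_ranges(remove_fillers(words)))
```

In this sketch, removing "um" leaves two audio ranges to stitch together: one before the cut and one after it.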

Step 3

Export in your desired format

Once your transcript is polished, head over to Publish > Export and choose an export option. You can export your transcript as plain text, rich text, markdown, HTML, Word doc, or even an SRT or VTT subtitle file. You can also publish it as a web link to share or embed your transcript alongside the audio with Descript's media player.
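If you are curious what an SRT subtitle file contains, the following Python sketch builds one from timed caption segments. It only illustrates the file format and is not Descript's exporter: the function names and the `(start, end, text)` segment shape are assumptions made for this example. Each SRT cue is a running number, a line with start and end times written as `HH:MM:SS,mmm`, the caption text, and a blank line.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]))
```

A VTT file is similar, but it starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.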

A text converter that is as easy as drag and drop

Descript makes it easy to transcribe audio files into text. Simply create a project, select the audio file you want to transcribe, and wait a few seconds for your accurate transcription. Descript also makes it easy to correct any inaccuracies, so you can quickly take your transcript from highly accurate to perfect.

Whether you're a YouTuber, vlogger, or podcaster, or you simply want to transcribe an audio file, Descript’s advanced speech recognition technology ensures precise and accurate transcriptions every time, and our simple, intuitive user interface makes it easy to get started.

Sign up for free today and see how easy it is to create searchable transcripts of your audio files.

Descript Audio Transcription is Better Than Ever

With our most recent updates, Descript’s transcription is better than ever.

Automatic transcription will save you a step when you’re importing media; rather than confirming that you want to transcribe, Descript just starts transcribing.

Other fixes & improvements:

  • Our Correction Wizard streamlines transcript correction even more by automatically identifying transcription errors.
  • You can now order our White Glove transcription service or initiate Speaker Detection from the file details section of the Track Inspector (in the rail to the right of your transcript).
  • You can select Speaker Detection from the speaker dropdown menu in the script.  
  • You can click and drag to make Learning Center videos bigger.

How does Descript’s speech-to-text tool work?

Descript uses state-of-the-art artificial intelligence and machine learning to take your audio files and give you a highly accurate transcription of that audio in minutes.

Can I use Descript to make captions?

Yes, you can use Descript to create captions for videos. Simply select the video file you want to add text to, transcribe the audio, and then use Descript’s Fancy Captions feature to add the text to your video in a few clicks.

Is Descript just a transcription tool?

Far from it. With tools like automated Filler Word Removal, Overdub voice synthesis, Studio Sound voice enhancement, and text-to-speech editing, Descript uses AI and other advanced technology to streamline your entire production workflow, so you spend more time creating content and less on the technical drudgery.

Can Descript transcribe in different languages?

Yes! In addition to English, Descript supports transcription for 22 languages: Spanish, German, French, Italian, Portuguese, Romanian, Malay, Turkish, Polish, Dutch, Hungarian, Czech, Swedish, Croatian, Finnish, Danish, Norwegian, Slovak, Catalan, Lithuanian, Slovenian, and Latvian.

What audio file formats does Descript transcribe?

Descript can read WAV audio formats from nearly every popular source. Whether you have an audio recording on a mobile device like an Android, an iOS device like an iPad or iPhone, or even something you recorded directly into Windows or Mac, Descript’s transcription software can take that audio and turn it into editable text for your project.

Download the app for free

Create a podcast, a video, and all your social assets using Descript. It’s as easy as editing a doc.

Drag and drop to convert audio to text in seconds
Turn audio into text—and text into audio

Descript does more than just transcribe audio. It can also generate audio based on your text to expand your creative options. Keep your words and change your voice, or clone your voice to add to your original audio without rerecording.

Fix errors and remove filler words in a snap

Whether you're a YouTuber, podcaster, or just want to transcribe an audio file, Descript's 95% accurate AI transcription gets you most of the way. From there, you can remove filler words in one click, automatically flag likely transcription errors, and make bulk corrections across your entire transcript.
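This page does not say how the 95% figure is measured. One common way to score a transcript is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into a reference, divided by the length of the reference. The Python sketch below computes it under that assumption; `word_error_rate` is a hypothetical helper, not part of Descript. On this measure, "95% accurate" would correspond to a WER of roughly 0.05.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four gives a WER of 0.25.
print(word_error_rate("please call me later", "please call we later"))
```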

Customize your output with AI

Export your transcribed audio in your choice of format, including or excluding speaker labels, time codes, and markers. Plus, AI Actions make it easy to turn your transcript into blog posts, social media posts, or even a script based on your prompts.
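As a rough picture of what including or excluding speaker labels and time codes means for a plain-text export, here is a small Python sketch. The `render_transcript` function, its parameters, and the `(speaker, start_seconds, text)` paragraph shape are all assumptions made for illustration, not Descript's export code.

```python
def render_transcript(paragraphs, speaker_labels=True, timecodes=True) -> str:
    """Render (speaker, start_seconds, text) tuples as plain text."""
    lines = []
    for speaker, start, text in paragraphs:
        prefix = []
        if timecodes:
            minutes, seconds = divmod(int(start), 60)
            prefix.append(f"[{minutes:02d}:{seconds:02d}]")
        if speaker_labels:
            prefix.append(f"{speaker}:")
        lines.append(" ".join(prefix + [text]))
    return "\n".join(lines)


paragraphs = [
    ("Ana", 0.0, "Welcome back."),
    ("Ben", 65.2, "Thanks for having me."),
]
print(render_transcript(paragraphs))
print(render_transcript(paragraphs, speaker_labels=False, timecodes=False))
```

The first call prints each paragraph with its time code and speaker; the second prints only the spoken text.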

Questions? We have answers
How does Descript's speech-to-text tool work?

Descript uses industry-leading artificial intelligence and machine learning to take your audio files and give you a highly accurate transcription of that audio in seconds.

Is Descript just a transcription tool?

Far from it. Descript is an all-in-one audio and video editor. With features like automated filler word removal, voice cloning, and Studio Sound voice enhancement, Descript uses AI to streamline your entire production workflow.

Can I transcribe audio in other languages?

Yes! Descript supports transcription in 23+ languages, including English (US), Latvian, Romanian, Catalan, Finnish, Lithuanian, Slovak, Croatian, French (FR), Malay, Slovenian, Czech, German, Norwegian, Spanish (US), Danish, Hungarian, Polish, Swedish, Dutch, Italian, Portuguese (BR), and Turkish. The AI can understand a variety of accents and speaking styles thanks to continual training of its speech recognition models.

What audio file formats does Descript transcribe?

Descript can transcribe WAV, MP3, AAC, AIFF, M4A, and FLAC audio files.

Descript is the only tool you need to write, record, transcribe, edit, collaborate, and share your videos and podcasts.
More than an audio-to-text converter
Descript is an AI-powered audio and video editing tool that lets you edit podcasts and videos like a doc.
  • Text-to-speech
    Turn text into audio using a growing library of AI voices. Or create your own voice clone.
  • Remote recording
    Capture and transcribe up to 10 guests with a built-in remote recording studio.
  • Podcasting
    Record, transcribe, edit, and publish podcast audio in an intuitive text-based editor.
  • Find good clips
    Use AI to flag the best snippets in your audio or transcript.