What are the main ways to convert audio or video to text?

You can transcribe video to text in several different ways. The method you choose will determine how long it takes.
Manual video transcription
- It’s exactly what it says it is. Basically, you transcribe audio recordings by listening and writing out what’s said, with timestamps.
- Although it’s the least efficient, manual transcription remains quite common. It’s free, and it doesn’t require learning any new tools. Some creators find that transcribing manually helps them get a better understanding of what’s in their video.
DIY video transcription
- We’ll use DIY transcription to refer to other forms of transcription that don’t rely on third-party transcription tools. It could mean manual transcription divided amongst a team (e.g., three coworkers each transcribe 20 minutes of an hour-long video), or it could mean using some kind of proprietary, in-house transcription technology.
- The point is, there are a lot of ways to approach transcription that are better than sitting there grinding it out on your own. You can generally count on about four hours of transcribing for every hour of video, so anything that can whittle that down is worth a try.
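To put rough numbers on that rule of thumb, here is a minimal sketch in Python. The function name `estimate_hours` and its parameters are made up for illustration, and the 4:1 ratio is only the ballpark figure mentioned above, so treat the results as estimates rather than guarantees.

```python
# Rule of thumb from the text: about four hours of work per hour of video.
HOURS_PER_VIDEO_HOUR = 4.0


def estimate_hours(video_minutes, transcribers=1, ratio=HOURS_PER_VIDEO_HOUR):
    """Return (total_hours, hours_per_person) for manual transcription.

    Assumes the work splits evenly across transcribers, which is an
    idealization: real teams also spend time stitching sections together.
    """
    if transcribers < 1:
        raise ValueError("need at least one transcriber")
    total = video_minutes / 60 * ratio
    return total, total / transcribers


if __name__ == "__main__":
    # The hour-long video split between three coworkers from the example above.
    total, each = estimate_hours(60, transcribers=3)
    print(f"Total: {total:.1f} h, per person: {each:.1f} h")
```

Under these assumptions, an hour-long video is about four hours of work in total, or a little over an hour and a quarter for each of three coworkers.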
Automated transcription
- Over the last several years, there has been a considerable rise in consumer apps that automatically transcribe audio and video. These programs can drastically cut the time you spend transcribing. However, you still have to go through the transcript and correct any errors the AI made.
- Here I go again: We’re confident when we tell you Descript is the best transcription app you’ll find. There’s also 3Play and a bunch of others. Check them out and compare them to Descript. You’ll find Descript is the fastest and most accurate option on the market.
Human transcription
- Lastly, you can transcribe audio recordings through a human transcription service, meaning you pay others to do the work for you.
- Services like these are valuable when trying to translate audio from one language to another. Translators usually pick up on nuanced linguistic meanings that often escape AI. They’re also really the best option when you need near-absolute accuracy; at Descript, we offer a human-powered option that delivers 99% accuracy. Like all human-powered transcriptions, it’s more expensive.
There are other ways to auto-transcribe audio, too: smartphone apps, YouTube’s automatic captioning for YouTube videos, and the Mac Dictation program. However, these tools offer far fewer features than a program like Descript, and their accuracy is hit-and-miss.
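If you want to compare tools on accuracy yourself, one common yardstick is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn a tool's output into a corrected reference transcript, divided by the number of words in the reference. The sketch below is one simple way to compute it, not a description of how Descript or any other vendor measures accuracy. The function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the
    number of words in the reference. Comparison ignores letter case;
    punctuation is NOT stripped in this simplified version."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    corrected = "we record a new episode every week"
    automatic = "we record a new episode every weak"
    wer = word_error_rate(corrected, automatic)
    print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
```

Running the same recording through two tools and comparing each output against your corrected transcript gives you a like-for-like number instead of a gut feeling.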
How to transcribe a video or audio file
Descript already has an excellent, in-depth tutorial on audio transcription. In it, you'll find step-by-step visual instructions on every aspect of the program. So, if you're brand new to Descript and need a follow-along, I recommend that you check out that link first.
However, if you already know the basics of Descript and are looking for a simple rundown on the app's transcription process, here’s a short set of instructions.
1. Upload your audio or video file
- Let's say you're a vlogger with two co-hosts, and you're releasing weekly episodes that you want to transcribe to text for editing purposes.
- First, add your video file to Descript by dragging and dropping it into the blank composition space.
- Descript will immediately start transcribing. If you import multiple tracks at once, you'll also have the option to change your recording to a sequence, for multitrack editing.
2. Set your speakers
- As Descript transcribes the file, it will also prompt you to add Speaker Labels for the different voices in your video.
- If you have multiple speakers, click Enter Speaker Name, then hover over Detect Speakers. From there, you can select the number of speakers you want to identify in the file by running it through the Speaker Detective.
After you transcribe your file, you'll need to go through the text and correct the transcript. While Descript’s automatic transcription is excellent, it makes mistakes, just like any computer program, human being, or home-plate umpire. So we’ve created keyboard shortcuts that make it super easy to correct words and adjust speaker labels.
3. Correct your work
- If you see an error in the transcription, highlight it and press E or click the Correct button.
- Make changes in the text box that appears, then press Enter or click Correct to apply.
- Alternatively, you can keep the Correct option switched on by pressing Option-Command-E or by clicking on the Edit menu, then Toggle Correct Text.
And that's it. That’s all you need to know to use Descript's transcription tools.
Tools for Transcription

Beyond this short-form tutorial, here are a few essential tools you should use for any transcription:
Noise-canceling headphones
- Being able to concentrate on your recording while you’re editing is crucial. If you’re working in an area with a lot of distractions, like an office or a Mardi Gras parade, use noise-canceling headphones.
Voice-recognition software
- Some programs, like Descript, have voice recognition tools built into the app itself. However, it’s not a universal feature. If you’re regularly editing tracks with multiple speakers, you’ll want to spring for voice-recognition technology.
Why Do Creators Use Automatic Transcription?
Transcription software can be powerful, and its versatility as a tool has only increased over the years.
Areas where you will find transcription most useful include:
- Podcasting: Transcription software can make editing your podcast easier. Putting those transcripts on your podcast website, where search engines can crawl them, can also bring in a broader audience than you initially reached.
- Journalism: Reporters looking for accurate quotes will always gravitate toward transcripts; if your video content is newsworthy, posting the transcripts will make it more likely you’ll be quoted by other journalists.
- Product Placement: If you’re looking to gain the attention of advertisers, know that agencies often search for mentions of their clients’ brands. Transcripts posted to your website make searching for those mentions easier.
- Accessibility: Above all, automatic transcription helps make your content more accessible for all users, especially those who use assistive technology to navigate the internet.
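To make the searchability point concrete, here is a small sketch of how anyone with a text transcript could count brand mentions in it. It assumes the transcript is already available as plain text. The function name, the sample transcript, and the brand names are all invented for illustration.

```python
import re
from collections import Counter


def count_mentions(transcript, brands):
    """Count case-insensitive, whole-word mentions of each brand name."""
    counts = Counter()
    for brand in brands:
        # \b keeps "Acme" from matching inside a longer word like "Acmeville".
        pattern = r"\b" + re.escape(brand) + r"\b"
        counts[brand] = len(re.findall(pattern, transcript, flags=re.IGNORECASE))
    return counts


if __name__ == "__main__":
    transcript = (
        "This week we tried the Acme mixer. "
        "Acme also sent over a Globex mic for next week's episode."
    )
    for brand, n in count_mentions(transcript, ["Acme", "Globex", "Initech"]).items():
        print(f"{brand}: {n}")
```

None of this works on raw audio or video, which is why posting the transcript matters in the first place.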
If you want to learn more about the transcription process, read up on Descript’s transcription glossary.