Transcribing recorded audio is one of the single most valuable — and least sexy — parts of the creative process. Podcast producers transcribe interviews. So do TV reporters and other journalists. YouTubers post transcripts for their accessibility and SEO value. Lawyers transcribe depositions so they can get quotes for their novels about lawyers.
Covid-19 and shifting social norms have made transcription even more important.
“With many businesses using an online or hybrid approach throughout the pandemic, transcription and its benefits have become more prominent,” says Elisa Lews of 3Play Media, an excellent captioning and transcription service. “Additionally, we’ve seen an increasing focus on inclusion and accessibility. As we navigate new norms, transcription is likely to continue to grow as a necessary tool for online businesses.”
We also know transcription is valuable because for years everybody did it even though it was a huge pain, or very expensive, or both. Transcribing manually from tape, or even from digital media, was a long, tedious, error-prone process.
These days, technology has made it far easier and less expensive to transcribe video. Transcription apps abound, as do online services that will provide reasonably accurate transcription for reasonable prices.
Try to contain your surprise when I argue that Descript is by far the best transcription app out there. It provides highly accurate video transcription (audio, too) in minutes, not hours or days like other apps or human-powered services. You can easily correct errors in the completed transcript, and then edit the underlying video by editing the transcript just like you would a doc.
You can cancel whatever you had planned today and get started in Descript now, or read on for a primer on transcription: how it works, what tools are available, and some thoughts on automation.
What are the main ways to convert audio or video to text?
You can transcribe video to text in several different ways. The method you choose will determine how long it takes.
Manual video transcription
It’s exactly what it says it is. Basically, you transcribe audio recordings by listening and writing out what’s said, with timestamps.
Although it’s the least efficient, manual transcription remains quite common. It’s free, and it doesn’t require learning any new tools. Some creators find that transcribing manually helps them get a better understanding of what’s in their video.
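To make the "with timestamps" part concrete, here is a minimal Python sketch of how a hand-typed transcript could be stored and printed. The `format_timestamp` and `format_line` helpers, the HH:MM:SS layout, and the sample lines are all illustrative assumptions, not a format that Descript or any other tool requires.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a position in seconds to an HH:MM:SS string."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_line(seconds: float, speaker: str, text: str) -> str:
    """Build one transcript line: timestamp, speaker, then what was said."""
    return f"[{format_timestamp(seconds)}] {speaker}: {text}"


# Hypothetical notes taken while listening: (seconds, speaker, words).
notes = [
    (0, "Host", "Welcome back to the show."),
    (75.4, "Guest", "Thanks for having me."),
    (3725, "Host", "That's all the time we have."),
]

for seconds, speaker, text in notes:
    print(format_line(seconds, speaker, text))
```

Keeping the position in seconds and formatting it only when printing makes it easier to sort or shift lines later if you trim the recording.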
DIY video transcription
We’ll use DIY transcription to refer to approaches that don’t rely on third-party apps or services. It could mean manual transcription divided among a team (e.g., three coworkers each transcribe 20 minutes of an hour-long video), or it could mean using transcription technology your organization built in-house.
The point is, there are a lot of ways to approach transcription that are better than sitting there grinding it out on your own. You can generally count on about four hours of transcribing for every hour of video — anything that can whittle that down is worth a try.
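As a rough illustration of that arithmetic, the Python sketch below estimates how long manual transcription takes and what happens when the work is split evenly. The function names are hypothetical, the 4:1 ratio is just the rule of thumb above, and the even split assumes no time is lost coordinating or stitching the pieces together.

```python
MANUAL_RATIO = 4.0  # hours of typing per hour of video (rule of thumb, not a measurement)


def manual_hours(video_minutes: float, ratio: float = MANUAL_RATIO) -> float:
    """Total person-hours needed to transcribe a video by hand."""
    return video_minutes / 60 * ratio


def hours_per_person(video_minutes: float, people: int,
                     ratio: float = MANUAL_RATIO) -> float:
    """Hours each person spends if the video is split evenly among them."""
    if people < 1:
        raise ValueError("people must be at least 1")
    return manual_hours(video_minutes, ratio) / people


solo = manual_hours(60)
team = hours_per_person(60, people=3)
print(f"Solo: {solo:.1f} hours")
print(f"Each of three coworkers: {team:.2f} hours")
```

For an hour-long video split three ways, that works out to roughly an hour and twenty minutes each instead of four hours alone.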
Automated transcription
Over the last several years, there has been a considerable rise in consumer apps that automatically transcribe audio and video. These programs can drastically increase your productivity. However, you still have to go through the transcript and check it for errors the AI introduced.
Here I go again: I’m confident when I tell you Descript is the best transcription app you’ll find. There’s also 3Play and a bunch of others. Check them out and compare them to Descript. You’ll find Descript is the fastest and most accurate option on the market.
Human transcription
Lastly, you can transcribe audio recordings through a human transcription service, meaning you pay others to do the work for you.
Services like these are valuable when trying to translate audio from one language to another. Translators usually pick up on nuanced linguistic meanings that often escape AI. They’re also really the best option when you need near-absolute accuracy; at Descript, we offer a human-powered option that delivers 99% accuracy. Like all human-powered transcriptions, it’s more expensive.
There are other ways to auto-transcribe audio, too: smartphone apps, YouTube’s automatic captioning for YouTube videos, and the Mac Dictation program. However, these tools lack the features of a program like Descript, and their accuracy is hit-and-miss.
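If you want to compare accuracy yourself, one common yardstick in speech recognition is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn a tool's output into a trusted reference transcript, divided by the number of words in the reference. The Python sketch below is a minimal version of that calculation. It is an illustration only: the `word_error_rate` name and the sample sentences are made up, and nothing here is meant to describe how Descript or any other service arrives at its published accuracy figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Lowercase and split on whitespace; punctuation is NOT stripped here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] = edits needed to turn the reference words seen so far
    # (before the current one) into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical example: a hand-checked reference vs. an automatic transcript.
reference = "you can edit the video by editing the transcript text"
automatic = "you can edit the video by editing the transcript test"

wer = word_error_rate(reference, automatic)
print(f"Word error rate: {wer:.2f}")
print(f"Approximate accuracy: {1 - wer:.0%}")
```

Transcribing the same short clip with each tool and scoring the results against a transcript you have corrected by hand gives a like-for-like comparison, as long as you treat punctuation and capitalization the same way every time.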
How to transcribe a video or audio file
Descript already has an excellent, in-depth tutorial on audio transcription. In it, you'll find step-by-step visual instructions on every aspect of the program. So, if you're brand new to Descript and need a follow-along, I recommend that you check out that tutorial first.
However, if you already know the basics of Descript and are looking for a simple rundown on the app's transcription process, here’s a short set of instructions.
1. Upload your video or audio file
Let's say you're a vlogger with two co-hosts, and you're releasing weekly episodes that you want to transcribe to text for editing purposes.
First, add your video file to Descript by dragging and dropping it into the blank composition space.
Descript will immediately start transcribing. If you import multiple tracks at once, you'll also have the option to change your recording to a sequence, for multitrack editing.
2. Set your speakers
As Descript transcribes the file, it will also prompt you to add Speaker Labels for the different voices in your video.
If you have multiple speakers, click Enter Speaker Name, then hover over Detect Speakers. From there, you can select the number of speakers you want to identify in the file by running it through the Speaker Detective.
3. Correct your work
After Descript transcribes your file, you'll need to go through the transcript and correct it. While Descript’s automatic transcription is excellent, it makes mistakes, just like any computer program, human being, or home-plate umpire. So we’ve created keyboard shortcuts that make it super easy to correct words and adjust speaker labels.
If you see an error in the transcription, highlight it and press E or click the Correct button.
Make changes in the text box that appears, then press Enter or click Correct to apply.
Alternatively, you can switch the Correct option on so it stays active by pressing Option-Command-E or by clicking on the Edit menu, then Toggle Correct Text.
And that's it. That’s all you need to know to use Descript's transcription tools.
Tools for Transcription
Beyond this short-form tutorial, here are a few essential tools you should use for any transcription:
Noise-canceling headphones
Being able to concentrate on your recording while you’re editing is crucial. If you’re working in an area with a lot of distractions, like an office or a Mardi Gras parade, use noise-canceling headphones.
Voice recognition software
Some programs, like Descript, have voice recognition tools built into the app itself. However, it’s not a universal feature. If you’re regularly editing tracks with multiple speakers, you’ll want to spring for voice-recognition technology.
Why Do Creators Use Automatic Transcription?
Transcription software can be powerful, and its versatility as a tool has only increased over the years.
The areas where you’ll find transcription most useful include:
Podcasting: Transcription software can make editing your podcast easier. It can also bring in a broader audience if you put the transcripts on your podcast website, where search engines can crawl them.
Journalism: Reporters looking for accurate quotes will always gravitate toward transcripts; if your video content is newsworthy, posting the transcripts will make it more likely you’ll be quoted by other journalists.
Product Placement: If you’re looking to gain the attention of advertisers, know that agencies often search for mentions of their clients’ brands. Transcripts posted to your website make searching for those mentions easier.
Accessibility: Above all, automatic transcription helps make your content more accessible for all users, especially those who use assistive technology to navigate the internet.