Scene blocking and staging: How fiction podcasts keep your ear on the action


As someone who sound designs and engineers all sorts of podcasts, I often find myself describing fiction podcasts as “watching TV with your eyes closed.” But on TV, you can hear and see what’s going on. So you might wonder: how does the listener follow the story if they can’t keep track of the action visually? 

For many shows, the answer is simple: a narrator, who essentially reads out stage directions. But that’s not every audio drama’s style, and even when it is, a narrator can only explain so much. So what happens when a fiction podcast wants to depict a fight scene, or any sequence with a good deal of character movement, and make it comprehensible for a listening audience? The key, in my experience, lies in two tools of sound design: panning and attention to locational detail.


Panning

In day-to-day music listening, it can be easy to take panning for granted. Unless you’re listening to an old recording created when stereo mixing was in its infancy, you might not even notice how the instruments in a song are placed on the soundstage. But in fiction podcasts, panning can be the difference between clear, understandable blocking and a muddled mess.

If you’ve never heard the term before, “blocking” originated in theatrical directing and later carried over to film and television. It describes how the actors in a scene are placed in their environment and how they move during the course of the scene. The script usually includes a little bit of blocking, but in plays, films, and TV shows, it’s mostly determined by the director. In audio drama, this task is typically split between the director and the sound designer, since so much of the information the listener is given comes from what the scene sounds like. Something as simple as a person walking over to a park bench and sitting down can sound entirely different when sound designed by two different people: everyone has their own style, rhythm, and details they like to focus on.

In visual media, we follow the blocking of a scene with just our eyes; we watch where each character is at the top of the scene and follow their paths as they move about the area. In an audio-only medium like the fiction podcast, following the characters becomes trickier. Footsteps, perceived distance, and panning become the primary ways we understand where each character is in the scene and how they’re moving.

When I start sound designing a scene, I almost always get out an old-fashioned piece of paper and consult the script to draw a quick map of the area the scene takes place in. I then mark the listener’s POV spot (usually a character that we follow for the length of the scene) and add in all the other characters and important sound effects that happen in the scene. Then, because most panning works in a stereo field, with sounds placed some number of degrees to the left or right of the listener, I mark the starting position in degrees of each character and sound effect and trace their blocking through the scene, noting each shift left or right as they move. This gives me a map of the action that I can enter as markers in my DAW and then translate into sound design.

Two sketches the author drew when sound designing Where the Stars Fell
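
To make the idea of a degree-based blocking map concrete, here is a minimal Python sketch. It is only an illustration, not how any particular DAW works: the BLOCKING list, the angle_at and pan_gains helpers, the -90 to 90 degree range, and the choice of a constant-power pan law are all assumptions made for the example.

```python
import math
from bisect import bisect_right

# Hypothetical blocking map for one character: (time in seconds, angle in degrees).
# Assumption: negative angles are left of the listener, positive are right,
# and -90..90 spans the whole stereo field.
BLOCKING = [(0.0, 30.0), (4.0, 30.0), (7.0, -20.0)]


def angle_at(keyframes, t):
    """Linearly interpolate the character's angle at time t."""
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)  # index of the first keyframe after t
    (t0, a0), (t1, a1) = keyframes[i - 1], keyframes[i]
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)


def pan_gains(angle_deg):
    """Constant-power pan: map -90..90 degrees to (left, right) gains."""
    angle_deg = max(-90.0, min(90.0, angle_deg))
    theta = (angle_deg + 90.0) / 180.0 * (math.pi / 2)  # 0 .. pi/2
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    # Print the character's position and channel gains once per second.
    for second in range(8):
        angle = angle_at(BLOCKING, float(second))
        left, right = pan_gains(angle)
        print(f"t={second}s angle={angle:+.1f} L={left:.2f} R={right:.2f}")
```

In this sketch the character holds at 30 degrees right for four seconds, then crosses to 20 degrees left over the next three. The constant-power law keeps the overall loudness steady as the sound moves across the field, which is one common choice among several pan laws.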

Clear panning makes your blocking easier to follow because it means the listener can rely on both distance and direction to figure out where everyone is and how they’re moving around each other. It’s a lot easier to follow the action when we know that character A is to the right of and about a foot away from character B, then moves to their left to take their bag, rather than just being informed that character A is arbitrarily close to character B. It allows me to give much more information to the audience.
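
As a rough illustration of combining distance and direction, the sketch below turns a position relative to the listener into left and right gains. Everything here is an assumption for the sake of the example: the place function, the coordinate convention (x to the right, y in front, in feet), the one-foot reference distance, and the simple inverse-distance falloff. Real mixes usually add reverb and filtering to sell distance, which this sketch leaves out.

```python
import math


def place(x, y, ref_distance=1.0):
    """Return (left, right) gains for a sound at (x, y) relative to the listener.

    Assumed convention: x is feet to the right (+) or left (-) of the
    listener, y is feet in front of the listener.
    """
    distance = math.hypot(x, y)
    # 0 degrees is straight ahead, +90 is hard right, -90 is hard left.
    angle = math.degrees(math.atan2(x, y))
    # A plain stereo field cannot place sounds behind the listener,
    # so anything past the sides is clamped to hard left or hard right.
    angle = max(-90.0, min(90.0, angle))
    # Inverse-distance falloff, never louder than at the reference distance.
    attenuation = ref_distance / max(distance, ref_distance)
    # Constant-power pan law across the clamped angle.
    theta = (angle + 90.0) / 180.0 * (math.pi / 2)
    return attenuation * math.cos(theta), attenuation * math.sin(theta)


if __name__ == "__main__":
    # Character A starts a foot to the right, then crosses to the left.
    for x in (1.0, 0.5, 0.0, -0.5, -1.0):
        left, right = place(x, 1.0)
        print(f"x={x:+.1f} ft  L={left:.2f} R={right:.2f}")
```

The point of the example is that each position produces a distinct pair of gains, so a move from the right side to the left side is audible as a change in both balance and level.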

This is especially important in combat sequences, when the audience needs to know where our POV character’s opponents are coming from as well as how close they are. If we know our character is wounded in their left leg, an attack that we can perceive as coming from their left creates extra tension. We can even refuse to choose a POV character and instead block the fight like a stage play, moving all elements left and right across the soundstage and creating a sense of distance for the listener (for an example of this I’m quite proud of, be on the lookout for episode 24 of Where the Stars Fell, which comes out July 21, 2023).

When working in an audio-only medium, it’s important to take every opportunity to convey information to your listener. Panning is one of the most important and versatile ways you can do that.

Attention to locational detail

Locational detail refers to the auditory cues that tell us about the place a scene is set and how the characters are interacting with it. This can be anything from the sound of waves on a beach to crickets chirping at night to the creak of old wooden floors and the crunch of broken glass under someone’s footsteps. Without visual establishing shots, it’s important to pay attention to every aspect of a location that would create a sound and highlight those specific details to give the listener information about it.

Take the example of footsteps: I love the footstep generator plugin Walker 2 because it has tons of options for customizing footsteps to fit not just the location, but the character. Everyone’s footsteps sound a little different, influenced by things like weight, gait, height, clothing preferences, and what items they carry. A tall, heavy person wearing heels and a silk dress is going to sound a lot different than a tiny toddler running around in sneakers. You can train your audience to recognize a character just by their distinct footsteps, and that helps clue them in to who’s moving in a scene with multiple characters.

You can even use the presence of gravity itself as a locational cue. In the minisode “Meanwhile”, Wolf 359 lets us know immediately that we’re no longer in the microgravity of the space station in which most of the show is set, just by starting the episode with the sound of footsteps walking across a linoleum floor. It’s a brilliant use of locational detail that takes advantage of what we’re used to hearing in this world, and uses a change in that as exposition.

Other elements such as reverb, ambiance, and distinct sounds that can act as auditory landmarks all make up what a place sounds like and how those sounds are affected by the events of the scene. Clearly and distinctly designing your spaces not only makes your world feel more lived-in and real, but helps the audience to picture things more clearly in their heads.
