Understanding VSE Audio in the Sequencer

General notes on how audio and pitch works in Blender’s VSE
blender
c++
Author

TheKaceFiles

Published

June 7, 2025

This week, I have been taking the time to understand how Audaspace is integrated into Blender’s VSE. The most important files are blenkernel/intern/sound.cc and its header file blenkernel/BKE_sound.h. I will taking a little look at specifically animating sound properties (eg. volume and pitch) which @iss or Richard Antalik describes in the following thread. Furthermore, @neYyon or Jörg Müller gave tips on how to integrate the Rubber Band Library into Audaspace here.

Playing Audio

When you press Spacebar to play an audio clip in the sequencer, the BKE_sound_play_scene function is called, which makes several calls to the Audaspace library below.

Code

  // Called by `wmOperatorStatus ED_screen_animation_play(bContext *C, int sync, int mode)` in `screen_cops.cc`
  void BKE_sound_play_scene(Scene *scene)
  {
    std::lock_guard lock(g_state.sound_device_mutex);
    sound_device_use_begin();
    sound_verify_evaluated_id(&scene->id);

    AUD_Status status;
    const double cur_time = get_cur_time(scene);

    AUD_Device_lock(g_state.sound_device);

    if (scene->sound_scrub_handle &&
        AUD_Handle_getStatus(scene->sound_scrub_handle) != AUD_STATUS_INVALID)
    {
      /* If the audio scrub handle is playing back, stop to make sure it is not active.
      * Otherwise, it will trigger a callback that will stop audio playback. */
      AUD_Handle_stop(scene->sound_scrub_handle);
      scene->sound_scrub_handle = nullptr;
      /* The scrub_handle started playback with playback_handle, stop it so we can
      * properly restart it. */
      AUD_Handle_pause(scene->playback_handle);
    }

    status = scene->playback_handle ? AUD_Handle_getStatus(scene->playback_handle) :
                                      AUD_STATUS_INVALID;

    if (status == AUD_STATUS_INVALID) {
      sound_start_play_scene(scene);

      if (!scene->playback_handle) {
        AUD_Device_unlock(g_state.sound_device);
        return;
      }
    }

    if (status != AUD_STATUS_PLAYING) {
      /* Seeking the synchronizer will also seek the playback handle.
      * Even if we don't have A/V sync on, keep the synchronizer and handle seek time in sync. */
      AUD_seekSynchronizer(cur_time);
      AUD_Handle_setPosition(scene->playback_handle, cur_time);
      AUD_Handle_resume(scene->playback_handle);
    }

    if (scene->audio.flag & AUDIO_SYNC) {
      AUD_playSynchronizer();
    }

    AUD_Device_unlock(g_state.sound_device);
  }

Animating Audio

There are currently 5 properties of audio that can be animated in Audiospace shown below:

Code
AnimateableProperty* SequenceEntry::getAnimProperty(AnimateablePropertyType type)
{
    switch(type)
    {
    case AP_VOLUME:
        return &m_volume;
    case AP_PITCH:
        return &m_pitch;
    case AP_PANNING:
        return &m_panning;
    case AP_LOCATION:
        return &m_location;
    case AP_ORIENTATION:
        return &m_orientation;
    default:
        return nullptr;
    }
}

We’ll be looking at in particular how the volume and pitch is animated in Blender’s VSE.

Volume

(CAUTION: VOLUME WARNING)

Figure 1

The RNA for the volume property is defined rna_sequencer.cc in the function rna_def_audio_options

Code
static void rna_def_audio_options(StructRNA *srna)
{
  PropertyRNA *prop;

  prop = RNA_def_property(srna, "volume", PROP_FLOAT, PROP_NONE);
  RNA_def_property_float_sdna(prop, nullptr, "volume");
  RNA_def_property_range(prop, 0.0f, 100.0f);
  RNA_def_property_float_default(prop, 1.0f);
  RNA_def_property_ui_text(prop, "Volume", "Playback volume of the sound");
  RNA_def_property_translation_context(prop, BLT_I18NCONTEXT_ID_SOUND);
  RNA_def_property_update(prop, NC_SCENE | ND_SEQUENCER, "rna_Strip_audio_update");
}

And the UI for the volume (or the sound properties) is defined in space_sequencer.py

Code
...
def draw(self, context):
        layout = self.layout

        st = context.space_data
        overlay_settings = st.timeline_overlay
        strip = context.active_strip
        sound = strip.sound

        layout.active = not strip.mute

        if sound is not None:
            layout.use_property_split = True
            col = layout.column()

            split = col.split(factor=0.4)
            split.alignment = 'RIGHT'
            split.label(text="Volume", text_ctxt=i18n_contexts.id_sound)
            split.prop(strip, "volume", text="")

            layout.use_property_split = False
        ...

When we scrub the volume property, the function BKE_sound_set_scene_sound_volume_at_frame is called. For example below, sliding the volume to 1.4…

Volume property in Blender's video sequencer

leads to the breakpoint in BKE_sound_set_scene_sound_volume_at_frame.

Breakpoint at BKE_sound_set_scene_sound_volume_at_frame example

The frame variable corresponds to the location of the playhead at 2 seconds (60 frames) and 29 frames, so (60 + 29 = 89 frames). The handle variable is a pointer to Audaspace’s AUD_SequenceEntry class, which stores the variables such as volume and pitch for a sound sequence as shown below in Audaspace’s SequenceEntry.h.

Code
    /// The animated volume.
    AnimateableProperty m_volume;

    /// The animated panning.
    AnimateableProperty m_panning;

    /// The animated pitch.
    AnimateableProperty m_pitch;

    /// The animated location.
    AnimateableProperty m_location;

    /// The animated orientation.
    AnimateableProperty m_orientation;

The BKE_sound_set_scene_sound_volume_at_frame function just only makes a call to AUD_SequenceEntry_setAnimationData as shown below:

Code
void BKE_sound_set_scene_sound_volume_at_frame(void *handle,
                                               const int frame,
                                               float volume,
                                               const char animated)
{
  AUD_SequenceEntry_setAnimationData(handle, AUD_AP_VOLUME, frame, &volume, animated);
}

The function AUD_SequenceEntry_setAnimationData in AUD_Sequence.cpp looks like the following:

Code
AUD_API void AUD_SequenceEntry_setAnimationData(AUD_SequenceEntry* entry, AUD_AnimateablePropertyType type, int frame, float* data, char animated)
{
    AnimateableProperty* prop = (*entry)->getAnimProperty(static_cast<AnimateablePropertyType>(type));
    if(animated)
    {
        if(frame >= 0)
            prop->write(data, frame, 1);
    }
    else
    {
        prop->write(data);
    }
}

And finally, below is the call stack for BKE_sound_set_scene_sound_volume_at_frame

Call Stack
Blender!BKE_sound_set_scene_sound_volume_at_frame(void*, int, float, char) (blender/source/blender/blenkernel/intern/sound.cc:1029)
Blender!blender::seq::strip_update_sound_properties(Scene const*, Strip const*) (blender/source/blender/sequencer/intern/sequencer.cc:997)
Blender!blender::seq::strip_sound_update_cb(Strip*, void*) (blender/source/blender/sequencer/intern/sequencer.cc:1083)
Blender!blender::seq::strip_for_each_recursive(ListBase*, bool (*)(Strip*, void*), void*) (blender/source/blender/sequencer/intern/iterator.cc:29)
Blender!blender::seq::for_each_callback(ListBase*, bool (*)(Strip*, void*), void*) (blender/source/blender/sequencer/intern/iterator.cc:44)
Blender!blender::seq::eval_strips(Depsgraph*, Scene*, ListBase*) (blender/source/blender/sequencer/intern/sequencer.cc:1092)
Blender!blender::deg::DepsgraphNodeBuilder::build_scene_sequencer(Scene*)::$_0::operator()(Depsgraph*) const (blender/source/blender/depsgraph/intern/builder/deg_builder_nodes.cc:2312)

The important thing to note above is that BKE_sound_set_scene_sound_volume_at_frame is called by the function strip_update_sound_properties in sequencer.cc. In a future blog, I’ll likely discuss about the animated parameter in AUD_SequenceEntry_setAnimationData and have a demo program to test out, as currently, I’m not exactly sure what it does and haven’t investigated thoroughly yet!

Pitch

This GSOC project will primarily focus on the sound property of pitch for implementing pitch correction. In Blender, pitch is primarily affected by retiming keys, which allows you to change the playback speed of video/audio clips. However, this has the consequence of increasing the pitch when the playback speed is increased or decreasing the pitch when the playback speed is decreased. Below is an example of using retiming keys to increase the playback audio speed.

Example of using retiming keys to increase the audio playback speed by 2.27x

First, the code relevant to the drawing the retiming keys onto the audio strip can be found in sequencer_retiming_draw.cc but I haven’t had the time look into, but I believe it is not currently not too relevant for this project.

Now, one of the relevant function related to the functionality of the retiming keys (and therefore pitch!) is retiming_sound_animation_data_set in strip_retiming.cc which is declared as the following:

Code
void retiming_sound_animation_data_set(const Scene *scene, const Strip *strip)
{
  /* Content cut off by `anim_startofs` is as if it does not exist for sequencer. But Audaspace
   * seeking relies on having animation buffer initialized for whole sequence. */
  if (strip->anim_startofs > 0) {
    const int strip_start = time_start_frame_get(strip);
    BKE_sound_set_scene_sound_pitch_constant_range(
        strip->scene_sound, strip_start - strip->anim_startofs, strip_start, 1.0f);
  }

  const float scene_fps = float(scene->r.frs_sec) / float(scene->r.frs_sec_base);
  const int sound_offset = time_get_rounded_sound_offset(strip, scene_fps);

  RetimingRangeData retiming_data = strip_retiming_range_data_get(scene, strip);
  for (int i = 0; i < retiming_data.ranges.size(); i++) {
    RetimingRange range = retiming_data.ranges[i];
    if (range.type == TRANSITION) {

      const int range_length = range.end - range.start;
      for (int i = 0; i <= range_length; i++) {
        const int frame = range.start + i;
        BKE_sound_set_scene_sound_pitch_at_frame(
            strip->scene_sound, frame + sound_offset, range.speed_table[i], true);
      }
    }
    else {
      BKE_sound_set_scene_sound_pitch_constant_range(
          strip->scene_sound, range.start + sound_offset, range.end + sound_offset, range.speed);
    }
  }
}

The code above loops over retiming key ranges (which is stored in the RetimingRange class and contains variables like the start and end frame, the playbeed speed, as well as it type which is defined as an enumerator below in strip_retiming.cc)

enum eRangeType {
  LINEAR = 0,
  TRANSITION = 1,
};

Additionally, retiming_sound_animation_data_set makes a call to two different pitch functions BKE_sound_set_scene_sound_pitch_at_frame and BKE_sound_set_scene_sound_pitch_constant_range depending on whether the range is LINEAR or TRANSITION and is defined as below in sound.cc:

Code
void BKE_sound_set_scene_sound_pitch_at_frame(void *handle,
                                              const int frame,
                                              float pitch,
                                              const char animated)
{
  AUD_SequenceEntry_setAnimationData(handle, AUD_AP_PITCH, frame, &pitch, animated);
}

void BKE_sound_set_scene_sound_pitch_constant_range(void *handle,
                                                    int frame_start,
                                                    int frame_end,
                                                    float pitch)
{
  frame_start = max_ii(0, frame_start);
  frame_end = max_ii(0, frame_end);
  AUD_SequenceEntry_setConstantRangeAnimationData(
      handle, AUD_AP_PITCH, frame_start, frame_end, &pitch);
}

Below is an example of the two range types:

Different range types: TRANSITION and LINEAR

where the TRANSITION range is represented by the retiming keys with 77% - 116% in between while the retiming keys with 77% or 116% represents a LINEAR range. In the example image above, the TRANSITION range interpolates between the 77% and 116% playback speed from the 00:23-00:36 range. Codewise, RetimingRange stores the interpolated values from 77% - 116% inside a vector speed_table which is set for which is set for each frame within the TRANSITION range.

Meanwhile, the LINEAR ranges (00:00-00:23 and 00:36-01:10) maintains a constant playback speed (and thus the same constant pitch for that particular playback speed).

Here’s an example audio clip which contains both the RetimingRanges type!

Next Steps

  • In the upcoming weeks, I will probably play around with the AUD_SequenceEntry class in an isolated environment to get a better sense of how it works and also mess around with the different AnimateablePropertys. - I currently have the Rubberband Library built as a static library in a separate, local fork, and make it public for code review this or next week. I will also take the time to understand Audaspace’s Effect and EffectReader class by looking at the many, many examples and see if I can do something with the Rubberband Library there
  • I also need to finish up writing the post about compiling the Rubberband Library, how to use it, and benchmarking it to make sure it is suitable and usable for Audaspace/Blender

Questions

These were things/questions I wasn’t sure about Blender’s codebase or where I can access the variable from the UI in the 1st week! These were answered by Aras, which I summarized his answers below.

Question #1

For this line in sequencer.cc

BKE_sound_set_scene_sound_volume_at_frame(strip->scene_sound, frame, strip->volume, (strip->flag & SEQ_AUDIO_VOLUME_ANIMATED) != 0);

where can the strip flag be toggled for having animated volume in Blender

Answer #1

The volume property is driven by an animation f-curve or animation driver expression.

Example of volume being driven by f-curve

Question #2

For AUD_SequenceEntry_setAnimationData function what exactly does the animated parameter do? I didn’t have time to look into further this week.

Answer #2

When the animated parameter is set to true, then it applies whatever property to that given frame. Otherwise, it sets it for that entire SequenceEntry

Question #3

For the operator types below, what editor has these operators in Blender?
Code
void ED_operatortypes_sound()
{
  WM_operatortype_append(SOUND_OT_open);
  WM_operatortype_append(SOUND_OT_open_mono);
  WM_operatortype_append(SOUND_OT_mixdown);
  WM_operatortype_append(SOUND_OT_pack);
  WM_operatortype_append(SOUND_OT_unpack);
  WM_operatortype_append(SOUND_OT_update_animation_flags);
  WM_operatortype_append(SOUND_OT_bake_animation);
}

Answer #3

It’s from “Render -> Render Audio…” menu item in Blender, but the operations are not too relevant to the project

Random Thoughts

  • I learned a lot of things during this first week from things like compiling/making a blog, to learning a little bit about CMAKE files and how to read them, and to using the debugger in VSCode more efficiently and for other projects
  • I hope these notes were somewhat useful or insightful to anyone looking to understand a bit about Blender’s VSE! I know these notes will be useful for me down the line as I start to implement the pitch correction toggle, and I will have more questions down the line! This post took way longer to write than I initially thought, but I really enjoyed taking the time to understand and explain a portion of Blender’s VSE.
  • It so far seems very likely that I will not have to manually implement the pitch correction algorithm myself!
  • In the future, I will probably update this post to have more information, as I’ve just mostly touched upon areas that were relevant to the project