Technical Case Study — Studio David Preli — 2026

Audio Cue Mapper

Browser-based audio analysis tool for motion design sync workflows

Runtime: Vanilla JS, no dependencies
Hosting: GitHub Pages
Export: CSV, JSON, AE .jsx
Year: 2026
01 Overview

What it does

Audio Cue Mapper is a browser-based tool that analyses an audio file and produces a structured set of visual cues for use in After Effects. Drop in any standard audio or video file. The tool performs all signal processing locally in the browser, renders an interactive waveform and tiered cue map, and exports an After Effects ExtendScript file that populates the composition timeline with automatically generated marker layers.

The tool was built to support The Dream Team — a pixel-art animation synced to a music track by Fievel Is Glauque — where existing beat-detection tools were unsuitable for the loose, live-recorded character of the source material. No data is uploaded anywhere. No server is involved at any stage.

02 System architecture

A single self-contained file

The entire application is one HTML file. All analysis, rendering, and export logic runs client-side. There are no external code dependencies, no build step, no npm, no framework, and no backend; the only external resources are the two web fonts loaded from Google Fonts.

Deployment stack

| Layer | Detail |
| --- | --- |
| Hosting | GitHub Pages: static file, zero infrastructure |
| Embedding | Cargo 3 iframe with postMessage resize bridge |
| Runtime | Vanilla JS, Web Audio API, Canvas 2D API |
| Fonts | IBM Plex Mono + DM Serif Display via Google Fonts |
| Dependencies | None |

iframe communication

Because Cargo 3 has no server-side scripting and limited <head> access, the tool is hosted externally and embedded. After analysis completes, the tool calls window.parent.postMessage() with the new document height. The parent page listens and resizes the iframe accordingly, preventing content clipping.
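
The resize bridge described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the message shape (`type`, `height`), the function names, and the origin check are all assumptions, since the source only states that the new document height is posted to the parent.

```javascript
// Hypothetical message type; the source only says the document height is posted.
const RESIZE_MESSAGE_TYPE = "audio-cue-mapper:resize";

// Child (tool) side: build the payload to post once analysis has finished.
function buildResizeMessage(height) {
  return { type: RESIZE_MESSAGE_TYPE, height: Math.ceil(height) };
}

// Parent side: validate an incoming message event and apply the height.
// Returns true only if the iframe was actually resized.
function applyResizeMessage(event, iframe, allowedOrigin) {
  if (event.origin !== allowedOrigin) return false; // ignore other senders
  const data = event.data;
  if (!data || data.type !== RESIZE_MESSAGE_TYPE) return false;
  if (!Number.isFinite(data.height) || data.height <= 0) return false;
  iframe.style.height = data.height + "px";
  return true;
}

// Browser wiring, commented out so the sketch also loads outside a browser.
// The origin below is a placeholder, not the tool's real URL.
// window.parent.postMessage(
//   buildResizeMessage(document.documentElement.scrollHeight), "*");
// window.addEventListener("message", (e) =>
//   applyResizeMessage(e, document.querySelector("iframe"), "https://example.github.io"));
```

Checking `event.origin` on the parent side is a general precaution for any `message` listener; whether the embedding page does this is not stated in the source.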

03 Audio analysis pipeline

Seven processing stages

When a file is dropped or selected, the browser reads it as an ArrayBuffer and passes it through seven sequential stages. All computation runs on the main thread, broken into yielding chunks via setTimeout(r, 0) so the UI remains responsive during analysis.
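
A minimal sketch of the yielding pattern, assuming the work can be expressed as a per-frame callback. The helper names, the `chunkSize` default, and the `onProgress` callback are illustrative and not taken from the tool.

```javascript
// Resolve on the next macrotask so the browser can paint and handle input.
const yieldToUi = () => new Promise((r) => setTimeout(r, 0));

// Run processFrame(i) for i in [0, frameCount), yielding to the event loop
// after every `chunkSize` frames. chunkSize and onProgress are assumptions.
async function processInChunks(frameCount, processFrame, chunkSize = 256, onProgress = () => {}) {
  const results = new Array(frameCount);
  for (let i = 0; i < frameCount; i++) {
    results[i] = processFrame(i);
    if ((i + 1) % chunkSize === 0) {
      onProgress((i + 1) / frameCount);
      await yieldToUi();
    }
  }
  onProgress(1);
  return results;
}
```

A smaller chunk keeps the UI smoother at the cost of more timer overhead; the right value depends on how expensive each frame is.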

Web Audio API, FFT, spectral flux, RMS envelope, autocorrelation

2.1 — Decoding

The raw ArrayBuffer is decoded using the Web Audio API's decodeAudioData(), which relies on the browser's built-in codecs: MP3, WAV, FLAC, OGG, AAC, and M4A decode in most current browsers, though exact support varies by browser. The decoded AudioBuffer is immediately mixed to mono by averaging the left and right channel sample arrays. Mono mixing halves the processing load and is standard practice for onset detection, which gains little from stereo information.
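
The mono mixdown can be sketched as below. `mixToMono` is a hypothetical helper name; the commented browser wiring uses the standard `AudioContext.decodeAudioData()` and `AudioBuffer.getChannelData()` calls and assumes `arrayBuffer` was read from the dropped file.

```javascript
// Average any number of equal-length channels into a single Float32Array.
function mixToMono(channels) {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i];
  }
  for (let i = 0; i < length; i++) mono[i] /= channels.length;
  return mono;
}

// Browser usage (not executed here):
// const ctx = new AudioContext();
// const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
// const channels = [];
// for (let c = 0; c < audioBuffer.numberOfChannels; c++) {
//   channels.push(audioBuffer.getChannelData(c));
// }
// const mono = mixToMono(channels);
```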

2.2 — Waveform downsampling

The mono sample array (roughly 4 million samples for a 90-second track at 44.1 kHz) is downsampled to 1,200 min/max pairs for display. Each pair captures the peak positive and peak negative sample within a window, producing the filled waveform shape characteristic of audio editors. This representation is visual only and plays no part in the analysis.
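
A minimal sketch of min/max downsampling, assuming equal-width windows across the track. The function name and the returned `{ mins, maxs }` shape are illustrative.

```javascript
// Reduce `samples` to `buckets` min/max pairs for display.
function downsampleMinMax(samples, buckets = 1200) {
  const mins = new Float32Array(buckets);
  const maxs = new Float32Array(buckets);
  for (let b = 0; b < buckets; b++) {
    const start = Math.floor((b * samples.length) / buckets);
    const end = Math.max(start + 1, Math.floor(((b + 1) * samples.length) / buckets));
    let lo = Infinity;
    let hi = -Infinity;
    for (let i = start; i < end && i < samples.length; i++) {
      if (samples[i] < lo) lo = samples[i];
      if (samples[i] > hi) hi = samples[i];
    }
    // A bucket with no samples (input shorter than `buckets`) falls back to silence.
    mins[b] = lo === Infinity ? 0 : lo;
    maxs[b] = hi === -Infinity ? 0 : hi;
  }
  return { mins, maxs };
}
```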

2.3 — Spectral flux onset detection

Onset detection is the core of the analysis. The algorithm steps through the audio in overlapping frames:

| Parameter | Value |
| --- | --- |
| Frame size | 2048 samples |
| Hop size | 512 samples (~11.6 ms at 44.1 kHz) |
| Window function | Hanning, reduces spectral leakage at frame edges |
| Transform | Cooley–Tukey radix-2 FFT, implemented from scratch in JS |

For each frame the algorithm computes the positive difference between the current magnitude spectrum and the previous frame's spectrum — this is spectral flux. A spike in flux indicates new frequency energy entering the signal: a drum hit, a chord onset, a melodic phrase beginning. The flux values form a continuous time series across the track duration.
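
The frame loop can be sketched as follows, using the frame and hop sizes from the table as defaults. This is an illustrative reconstruction rather than the tool's source: the function names are hypothetical, and details such as whether magnitudes are normalised or log-compressed before differencing are not stated in the text.

```javascript
// Hann window coefficients for a frame of `size` samples.
function hannWindow(size) {
  const w = new Float64Array(size);
  for (let n = 0; n < size; n++) w[n] = 0.5 * (1 - Math.cos((2 * Math.PI * n) / (size - 1)));
  return w;
}

// In-place iterative radix-2 Cooley-Tukey FFT; re.length must be a power of two.
function fft(re, im) {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Butterfly passes over doubling block lengths.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let i = 0; i < n; i += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = i + k;
        const b = i + k + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Sum of positive magnitude differences between consecutive frames.
// The first frame is compared against silence.
function spectralFlux(samples, frameSize = 2048, hopSize = 512) {
  const win = hannWindow(frameSize);
  const bins = frameSize / 2;
  let prev = new Float64Array(bins);
  const flux = [];
  for (let start = 0; start + frameSize <= samples.length; start += hopSize) {
    const re = new Float64Array(frameSize);
    const im = new Float64Array(frameSize);
    for (let n = 0; n < frameSize; n++) re[n] = samples[start + n] * win[n];
    fft(re, im);
    const mag = new Float64Array(bins);
    let sum = 0;
    for (let k = 0; k < bins; k++) {
      mag[k] = Math.hypot(re[k], im[k]);
      const diff = mag[k] - prev[k];
      if (diff > 0) sum += diff; // only rising energy counts
    }
    flux.push(sum);
    prev = mag;
  }
  return flux;
}
```

Frame `i` of the result starts at sample `i * hopSize`, so a frame index converts to seconds as `i * hopSize / sampleRate`.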

2.4 — Peak picking and tiering

The flux array is lightly smoothed and local maxima above the 85th percentile are identified, with a minimum distance constraint between picks to prevent double-triggering. Peaks are tiered by strength relative to the global maximum:

| Tier | Threshold | Use |
| --- | --- | --- |
| 1 (Major) | ≥72% of max | Strongest hits. Key poses, cuts, major transitions. |
| 2 (Mid) | 48–72% | Mid-level onsets. Secondary accents, phrase markers. |
| 3 (Minor) | <48% | Subtle onsets. Texture, micro-timing, detail passes. |
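
Peak picking and tiering along these lines might look like the sketch below. The 85th-percentile gate and the 72% / 48% tier thresholds come from the text; the smoothing radius, the minimum distance in frames, and the rule of keeping the stronger of two close peaks are assumptions.

```javascript
// Centred moving average with the given radius (radius 0 returns a copy).
function movingAverage(values, radius) {
  const out = new Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const lo = Math.max(0, i - radius);
    const hi = Math.min(values.length - 1, i + radius);
    let sum = 0;
    for (let j = lo; j <= hi; j++) sum += values[j];
    out[i] = sum / (hi - lo + 1);
  }
  return out;
}

// Nearest-rank style percentile, p in [0, 1].
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
}

// smoothRadius and minDistance (in frames) are illustrative defaults.
function pickAndTierPeaks(flux, { smoothRadius = 1, minDistance = 4 } = {}) {
  const s = movingAverage(flux, smoothRadius);
  const threshold = percentile(s, 0.85);
  const peaks = [];
  for (let i = 1; i < s.length - 1; i++) {
    const isLocalMax = s[i] > s[i - 1] && s[i] >= s[i + 1];
    if (!isLocalMax || s[i] <= threshold) continue;
    const last = peaks[peaks.length - 1];
    if (last && i - last.frame < minDistance) {
      // Too close to the previous pick: keep whichever is stronger.
      if (s[i] > last.value) {
        last.frame = i;
        last.value = s[i];
      }
      continue;
    }
    peaks.push({ frame: i, value: s[i] });
  }
  let max = 0;
  for (const v of s) if (v > max) max = v; // global maximum of the smoothed flux
  return peaks.map((p) => {
    const strength = max > 0 ? p.value / max : 0;
    const tier = strength >= 0.72 ? 1 : strength >= 0.48 ? 2 : 3;
    return { frame: p.frame, value: p.value, strength, tier };
  });
}
```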

2.5 — RMS envelope

Root mean square energy is computed per frame, giving a rough measure of loudness over time. The RMS array is used for waveform background shading and for structural segmentation.
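
Per-frame RMS can be sketched as below. The text only says "per frame", so reusing the 2048-sample frame and 512-sample hop of the flux stage is an assumption.

```javascript
// Root mean square of each frame; frame and hop sizes are assumed to match
// the spectral flux stage.
function rmsEnvelope(samples, frameSize = 2048, hopSize = 512) {
  const rms = [];
  for (let start = 0; start + frameSize <= samples.length; start += hopSize) {
    let sumSquares = 0;
    for (let n = 0; n < frameSize; n++) sumSquares += samples[start + n] ** 2;
    rms.push(Math.sqrt(sumSquares / frameSize));
  }
  return rms;
}
```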

2.6 — Structural segmentation

The RMS envelope is heavily smoothed over a two-second window and local minima are identified. These represent moments where the music breathes between phrases. The five deepest minima become section boundaries, dividing the track into up to six structural sections. This is a heuristic — it works well for music with clear phrase structure and less well for continuous or through-composed material.
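
A sketch of the segmentation heuristic, assuming a centred moving average of one second either side (two seconds in total) and a simple neighbour comparison for minima. `findSectionBoundaries` and its parameters are hypothetical names.

```javascript
// Returns up to `maxBoundaries` boundary times in seconds, in time order.
// `framesPerSecond` is sampleRate / hopSize, about 86 for 44.1 kHz and hop 512.
function findSectionBoundaries(rms, framesPerSecond, maxBoundaries = 5) {
  const radius = Math.max(1, Math.round(framesPerSecond)); // 1 s each side
  const smooth = rms.map((_, i) => {
    const lo = Math.max(0, i - radius);
    const hi = Math.min(rms.length - 1, i + radius);
    let sum = 0;
    for (let j = lo; j <= hi; j++) sum += rms[j];
    return sum / (hi - lo + 1);
  });
  const minima = [];
  for (let i = 1; i < smooth.length - 1; i++) {
    if (smooth[i] < smooth[i - 1] && smooth[i] <= smooth[i + 1]) {
      minima.push({ frame: i, level: smooth[i] });
    }
  }
  return minima
    .sort((a, b) => a.level - b.level) // deepest first
    .slice(0, maxBoundaries)
    .sort((a, b) => a.frame - b.frame) // back to time order
    .map((m) => m.frame / framesPerSecond);
}
```

Five boundaries split the track into at most six sections; tracks with fewer than five minima simply yield fewer sections.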

2.7 — Tempo estimation

Autocorrelation is performed on the spectral flux signal across a range of lags corresponding to 50–240 BPM. The lag with the highest correlation is taken as the beat period, and the resulting tempo is folded into a musically sensible range (60–200 BPM). This estimate drives the beat grid export and is approximate: for tracks with irregular pulse the grid should be treated as a starting reference.
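
A minimal sketch of the tempo estimate. Two details are assumptions: the score is a raw autocorrelation sum of the mean-removed flux (which mildly favours shorter lags) rather than a normalised coefficient, and folding is done by doubling or halving the tempo.

```javascript
// Estimate BPM from an onset-strength signal sampled at `framesPerSecond`.
function estimateTempo(flux, framesPerSecond, minBpm = 50, maxBpm = 240) {
  const mean = flux.reduce((a, b) => a + b, 0) / flux.length;
  const x = flux.map((v) => v - mean);
  const minLag = Math.max(1, Math.floor((60 * framesPerSecond) / maxBpm));
  const maxLag = Math.min(x.length - 1, Math.ceil((60 * framesPerSecond) / minBpm));
  let bestLag = minLag;
  let bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < x.length; i++) score += x[i] * x[i + lag];
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  let bpm = (60 * framesPerSecond) / bestLag;
  while (bpm < 60) bpm *= 2; // fold slow estimates up
  while (bpm > 200) bpm /= 2; // fold fast estimates down
  return bpm;
}
```

Lag resolution limits precision: at about 86 frames per second, adjacent lags near 120 BPM differ by roughly 3 BPM, which is one reason the estimate is approximate.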

04 Visualisation

Two canvas panels

Results are rendered to two HTML <canvas> elements using the Canvas 2D API. Both use device pixel ratio scaling to render crisply on high-DPI displays and resize responsively on window resize events.
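
Device pixel ratio scaling usually follows the pattern sketched below: the backing store is sized in device pixels, the element in CSS pixels, and the context is scaled so drawing code can keep using CSS pixels. The helper name and the fixed height in the commented wiring are assumptions.

```javascript
// Size a canvas for a given device pixel ratio and return its 2D context.
function sizeCanvasForDpr(canvas, cssWidth, cssHeight, dpr = 1) {
  canvas.width = Math.round(cssWidth * dpr); // backing store, device pixels
  canvas.height = Math.round(cssHeight * dpr);
  canvas.style.width = cssWidth + "px"; // layout size, CSS pixels
  canvas.style.height = cssHeight + "px";
  const ctx = canvas.getContext("2d");
  ctx.setTransform(dpr, 0, 0, dpr, 0, 0); // draw in CSS pixels from here on
  return ctx;
}

// Browser wiring (not executed here); 160 is a placeholder height.
// const canvas = document.querySelector("canvas");
// const redraw = () => {
//   const ctx = sizeCanvasForDpr(
//     canvas, canvas.parentElement.clientWidth, 160, window.devicePixelRatio || 1);
//   /* draw waveform or cue map with ctx */
// };
// window.addEventListener("resize", redraw);
// redraw();
```

Setting `canvas.width` or `canvas.height` clears the canvas and resets context state, so a full redraw after each resize is required.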

Waveform

The 1,200-point min/max array is drawn as a filled polygon — the top edge traces positive peaks, the bottom edge traces negative peaks in reverse, closed and filled with a gradient. Section boundaries overlay as dashed vertical lines.
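
The polygon construction can be sketched as follows, assuming samples are normalised to [-1, 1] and the waveform is centred vertically. The function names are illustrative; the gradient fill is left to the caller via `ctx.fillStyle`.

```javascript
// Build the closed outline: top edge left to right, bottom edge right to left.
function waveformPolygon(mins, maxs, width, height) {
  const n = maxs.length;
  const mid = height / 2;
  const xAt = (i) => (n > 1 ? (i / (n - 1)) * width : 0);
  const points = [];
  for (let i = 0; i < n; i++) points.push([xAt(i), mid - maxs[i] * mid]);
  for (let i = n - 1; i >= 0; i--) points.push([xAt(i), mid - mins[i] * mid]);
  return points;
}

// Fill the outline on a Canvas 2D context using its current fillStyle.
function fillWaveform(ctx, points) {
  if (points.length === 0) return;
  ctx.beginPath();
  ctx.moveTo(points[0][0], points[0][1]);
  for (let i = 1; i < points.length; i++) ctx.lineTo(points[i][0], points[i][1]);
  ctx.closePath();
  ctx.fill();
}
```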

Cue map

Three tiers of peaks are drawn as vertical bar elements centred on the canvas midpoint, scaled by strength.

Section boundaries appear as dashed green vertical lines with section index labels. A 5-second grid provides temporal orientation. Hovering shows a tooltip with the nearest peak's timecode, tier, and strength percentage.

05 After Effects export

Five null layers, one JSX script

The primary deliverable is a .jsx ExtendScript file. Run in After Effects via File › Scripts › Run Script File, it creates five null layers in the active composition, each carrying a specific category of markers.

| Layer | AE label | Contents |
| --- | --- | --- |
| ♪ Beat grid | Sea Foam | Every beat at estimated BPM. Downbeats labelled M1, M2… Off-beats as 1.2, 1.3, 1.4. |
| § Sections | Yellow | One marker per structural boundary with section index and timecode. |
| ▲ Major hits | Tangerine | Tier 1 peaks. Timecode and strength percentage. |
| ● Mid hits | Aqua | Tier 2 peaks. Timecode and strength percentage. |
| · Minor hits | Lavender | Tier 3 onsets. Timecode and strength percentage. |
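
The beat grid labelling in the table can be sketched as below. The label scheme (M1, 1.2, 1.3, 1.4) suggests four beats per bar; that, and starting the grid at time zero with no phase offset, are assumptions.

```javascript
// Generate beat markers from 0 to durationSeconds at the given tempo.
// Downbeats are labelled "M<bar>", other beats "<bar>.<beat>".
function beatGrid(bpm, durationSeconds, beatsPerBar = 4) {
  const period = 60 / bpm;
  const markers = [];
  for (let i = 0; i * period <= durationSeconds; i++) {
    const bar = Math.floor(i / beatsPerBar) + 1;
    const beat = (i % beatsPerBar) + 1;
    markers.push({ time: i * period, label: beat === 1 ? "M" + bar : bar + "." + beat });
  }
  return markers;
}
```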

Layers are added in reverse order so the beat grid sits at the top of the layer stack. All markers are placed using MarkerValue objects and setValueAtTime() on each null layer's Marker property, the standard ExtendScript API for layer markers. After Effects snaps keyframes and layer in/out points to markers when Shift is held while dragging, making the marker layers directly usable as a snap grid.
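
Generating the script text for one marker layer might look like the sketch below. The emitted ExtendScript uses standard After Effects scripting calls (`comp.layers.addNull()`, `layer.property("Marker")`, `setValueAtTime()`, `MarkerValue`); the generator function, its parameters, the undo group name, and the marker object shape are assumptions. Label colours are passed as a numeric index because the name-to-index mapping depends on the user's After Effects label preferences.

```javascript
// Build ExtendScript source that adds one null layer with markers
// to the active composition. All names here are illustrative.
function buildMarkerLayerJsx(name, labelIndex, markers) {
  const lines = [
    "var comp = app.project.activeItem;",
    "if (comp && comp instanceof CompItem) {",
    '  app.beginUndoGroup("Audio Cue Mapper");',
    "  var layer = comp.layers.addNull();",
    "  layer.name = " + JSON.stringify(name) + ";",
    "  layer.label = " + Number(labelIndex) + ";",
    '  var markerProp = layer.property("Marker");',
  ];
  for (const m of markers) {
    lines.push(
      "  markerProp.setValueAtTime(" + m.time.toFixed(4) +
      ", new MarkerValue(" + JSON.stringify(m.comment) + "));"
    );
  }
  lines.push("  app.endUndoGroup();", "}");
  return lines.join("\n");
}
```

Calling this once per category, with the beat grid last, would reproduce the stacking order described above, since each new null is added at the top of the layer stack.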

Suggested workflow

  1. Solo the Sections layer to block the large-scale structure
  2. Solo Major Hits to place key poses and primary cuts
  3. Use Beat Grid to time secondary animation and transitions
  4. Reference Mid and Minor layers for texture and micro-timing
06 Design

Visual language

The tool shares a design language with the Pokédex viewer and Dream Team case study on the same portfolio site, creating coherence across all embedded tools.

Colour palette

| Token | Hex | Role |
| --- | --- | --- |
| --bg | #141414 | Page background, matches Pokédex viewer canvas |
| --accent | #76c17d | Primary / Tier 1, Pokédex green |
| --accent2 | #b9b0ff | Tier 2, periwinkle |
| --accent3 | #e8b86d | Tier 3 / beat grid, warm amber |
| --text | #e1e1e1 | Primary text, about 14:1 contrast on --bg (WCAG AAA) |
| --muted | #888888 | Secondary text, about 5.2:1 contrast on --bg (WCAG AA) |
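
Contrast figures for palette tokens can be checked with the WCAG 2.x relative luminance formula. The sketch below is a generic implementation of that formula, not code from the tool; the function names are illustrative and only six-digit `#rrggbb` values are handled.

```javascript
// WCAG 2.x relative luminance of a "#rrggbb" colour.
function relativeLuminance(hex) {
  const [r, g, b] = [1, 3, 5]
    .map((i) => parseInt(hex.slice(i, i + 2), 16) / 255)
    .map((c) => (c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colours, always >= 1.
function contrastRatio(hexA, hexB) {
  const [hi, lo] = [relativeLuminance(hexA), relativeLuminance(hexB)].sort((a, b) => b - a);
  return (hi + 0.05) / (lo + 0.05);
}

// Example: contrastRatio("#e1e1e1", "#141414") checks --text against --bg.
```

WCAG AA requires at least 4.5:1 for normal-size text and AAA requires at least 7:1.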

Typography

IBM Plex Mono throughout, matching the portfolio site's typographic system. DM Serif Display italic for the track title and upload prompt — a tonal counterpoint to the monospace grid.

Loading animation

During analysis a quarter note hops along a five-line staff via CSS keyframe animation. Ghost notes trail behind on the staff lines. The animation is entirely CSS — no JavaScript, no canvas — and uses the same CSS variable palette as the rest of the interface.

07 Limitations

Known constraints