Technical Case Study — Studio David Preli — 2026

Audio Cue Mapper

Browser-based audio analysis tool for motion design sync workflows

Runtime Vanilla JS — no dependencies

Hosting GitHub Pages

Export CSV — JSON — AE .jsx

Year 2026

01 Overview

What it does

Audio Cue Mapper is a browser-based tool that analyses an audio file and produces a structured set of visual cues for use in After Effects. Drop in any standard audio or video file. The tool performs all signal processing locally in the browser, renders an interactive waveform and tiered cue map, and exports an After Effects ExtendScript file that populates the composition timeline with automatically generated marker layers.

The tool was built to support The Dream Team, a pixel-art animation synced to a music track by Fievel Is Glauque, where existing beat-detection tools were unsuitable for the loose, live-recorded character of the source material. No audio data is uploaded anywhere, and no server takes part in the analysis.


02 System architecture

A single self-contained file

The entire application is one HTML file. All analysis, rendering, and export logic runs client-side. There are no external script dependencies (the only remote resources are the Google Fonts typefaces), no build step, no npm, no framework, and no backend.

Deployment stack

Layer	Detail
Hosting	GitHub Pages — static file, zero infrastructure
Embedding	Cargo 3 iframe with `postMessage` resize bridge
Runtime	Vanilla JS, Web Audio API, Canvas 2D API
Fonts	IBM Plex Mono + DM Serif Display via Google Fonts
Dependencies	None

iframe communication

Because Cargo 3 has no server-side scripting and limited <head> access, the tool is hosted externally and embedded. After analysis completes, the tool calls window.parent.postMessage() with the new document height. The parent page listens and resizes the iframe accordingly, preventing content clipping.
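A minimal sketch of such a resize bridge follows. The message shape (a `type` tag of "acm-resize" plus a `height`), the function names, and the origin check are all assumptions for illustration; the source only states that the document height is posted to the parent.

```javascript
// Hypothetical message shape: only the height is confirmed by the text above.
function buildResizeMessage(height) {
  return { type: "acm-resize", height: Math.ceil(height) };
}

// Inside the embedded tool: report the current document height to the parent.
function notifyParent(win, doc) {
  const height = doc.documentElement.scrollHeight;
  // "*" keeps the sketch short; a production page would name the parent origin.
  win.parent.postMessage(buildResizeMessage(height), "*");
}

// On the parent page: resize the iframe when a well-formed message arrives.
function handleResizeMessage(event, iframe, expectedOrigin) {
  if (event.origin !== expectedOrigin) return false; // ignore other senders
  const data = event.data;
  if (!data || data.type !== "acm-resize" || !(data.height > 0)) return false;
  iframe.style.height = data.height + "px";
  return true;
}
```

On the parent page this would be wired up with something like `window.addEventListener("message", (e) => handleResizeMessage(e, iframeElement, toolOrigin))`, where `iframeElement` and `toolOrigin` are placeholders for the embedding page's own values.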

03 Audio analysis pipeline

Seven processing stages

When a file is dropped or selected, the browser reads it as an ArrayBuffer and passes it through seven sequential stages. All computation runs on the main thread, broken into yielding chunks via setTimeout(r, 0) so the UI remains responsive during analysis.
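The yielding pattern can be sketched as below. The helper names, the chunk size, and the progress callback are illustrative assumptions, not taken from the tool; only the `setTimeout(r, 0)` idiom comes from the description above.

```javascript
// Resolve on the next macrotask so the browser can paint and handle input.
function yieldToUi() {
  return new Promise((r) => setTimeout(r, 0));
}

// Process `total` items in slices, yielding to the event loop between slices.
// `chunkSize` and `onProgress` are hypothetical parameters for this sketch.
async function processInChunks(total, chunkSize, processItem, onProgress) {
  for (let start = 0; start < total; start += chunkSize) {
    const end = Math.min(start + chunkSize, total);
    for (let i = start; i < end; i++) processItem(i);
    if (onProgress) onProgress(end / total);
    await yieldToUi();
  }
}
```

A smaller chunk size keeps the interface smoother at the cost of more scheduling overhead, so the value is a tuning choice rather than a fixed constant.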

Web Audio API · FFT · Spectral flux · RMS envelope · Autocorrelation

2.1 — Decoding

The raw ArrayBuffer is decoded using the Web Audio API's decodeAudioData(), which in most current browsers handles MP3, WAV, FLAC, OGG, AAC, and M4A natively. The decoded AudioBuffer is immediately mixed to mono by averaging the left and right channel sample arrays. For a stereo file this halves the processing load, and it is standard practice for onset detection, which gains little from stereo information.
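The mono mix can be sketched as follows. The function name is hypothetical, and the sketch averages however many channels are supplied rather than assuming exactly two.

```javascript
// Average all channels into a single Float32Array.
// `channels` is an array of equal-length Float32Arrays, for example gathered
// with audioBuffer.getChannelData(c) for each c < audioBuffer.numberOfChannels.
function mixToMono(channels) {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const ch of channels) {
    for (let i = 0; i < length; i++) mono[i] += ch[i];
  }
  const scale = 1 / channels.length;
  for (let i = 0; i < length; i++) mono[i] *= scale;
  return mono;
}
```

In the browser the input would typically come from `await audioContext.decodeAudioData(arrayBuffer)`, with `audioContext` being an `AudioContext` instance.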

2.2 — Waveform downsampling

The mono sample array (roughly 4 million samples for a 90-second track at 44.1 kHz) is downsampled to 1,200 min/max pairs for display. Each pair captures the peak positive and peak negative sample within a window, producing the filled waveform shape characteristic of audio editors. This representation is visual only and plays no part in the analysis.
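A sketch of the min/max reduction, assuming equal-width windows across the whole track. The function name and the handling of empty windows are assumptions.

```javascript
// Reduce a long sample array to `buckets` (min, max) pairs for drawing.
function downsampleMinMax(samples, buckets = 1200) {
  const mins = new Float32Array(buckets);
  const maxs = new Float32Array(buckets);
  const step = samples.length / buckets;
  for (let b = 0; b < buckets; b++) {
    const start = Math.floor(b * step);
    const end = Math.max(start + 1, Math.floor((b + 1) * step));
    let lo = Infinity;
    let hi = -Infinity;
    for (let i = start; i < end && i < samples.length; i++) {
      if (samples[i] < lo) lo = samples[i];
      if (samples[i] > hi) hi = samples[i];
    }
    // Windows with no samples (very short input) fall back to silence.
    mins[b] = lo === Infinity ? 0 : lo;
    maxs[b] = hi === -Infinity ? 0 : hi;
  }
  return { mins, maxs };
}
```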

2.3 — Spectral flux onset detection

Onset detection is the core of the analysis. The algorithm steps through the audio in overlapping frames:

Parameter	Value
Frame size	2048 samples
Hop size	512 samples (~11.6ms at 44.1kHz)
Window function	Hanning — reduces spectral leakage at frame edges
Transform	Cooley–Tukey radix-2 FFT, implemented from scratch in JS

For each frame the algorithm sums, across all frequency bins, the positive differences between the current magnitude spectrum and the previous frame's spectrum. This sum is the spectral flux. A spike in flux indicates new frequency energy entering the signal: a drum hit, a chord onset, a melodic phrase beginning. The flux values form a continuous time series across the track duration.
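The frame loop can be sketched as below, using the parameters from the table. This is an illustrative implementation under stated assumptions, not the tool's own code: the symmetric Hann formula, the use of the first half of the spectrum, and all function names are choices made for the sketch.

```javascript
// In-place iterative radix-2 Cooley-Tukey FFT; length must be a power of two.
function fft(re, im) {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      [re[i], re[j]] = [re[j], re[i]];
      [im[i], im[j]] = [im[j], im[i]];
    }
  }
  // Butterfly passes, doubling the sub-transform length each time.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let i = 0; i < n; i += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = i + k;
        const b = a + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr;
        im[b] = im[a] - ti;
        re[a] += tr;
        im[a] += ti;
      }
    }
  }
}

// Spectral flux: per frame, the sum of positive magnitude increases per bin.
function spectralFlux(samples, frameSize = 2048, hop = 512) {
  const hann = new Float64Array(frameSize);
  for (let i = 0; i < frameSize; i++) {
    hann[i] = 0.5 * (1 - Math.cos((2 * Math.PI * i) / (frameSize - 1)));
  }
  const bins = frameSize / 2;
  const re = new Float64Array(frameSize);
  const im = new Float64Array(frameSize);
  let prev = new Float64Array(bins);
  const flux = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    for (let i = 0; i < frameSize; i++) {
      re[i] = samples[start + i] * hann[i];
      im[i] = 0;
    }
    fft(re, im);
    const mag = new Float64Array(bins);
    let sum = 0;
    for (let k = 0; k < bins; k++) {
      mag[k] = Math.hypot(re[k], im[k]);
      const diff = mag[k] - prev[k];
      if (diff > 0) sum += diff; // keep increases only
    }
    flux.push(sum);
    prev = mag;
  }
  return flux;
}
```

Frame `i` of the returned array corresponds to time `i * hop / sampleRate`, which is how flux indices map back to timecodes.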

2.4 — Peak picking and tiering

The flux array is lightly smoothed and local maxima above the 85th percentile are identified, with a minimum distance constraint between picks to prevent double-triggering. Peaks are tiered by strength relative to the global maximum:

Tier	Threshold	Use
1 — Major	≥72% of max	Strongest hits. Key poses, cuts, major transitions.
2 — Mid	48–72%	Mid-level onsets. Secondary accents, phrase markers.
3 — Minor	<48%	Subtle onsets. Texture, micro-timing, detail passes.
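The picking and tiering steps might look like the sketch below. The 85th-percentile threshold and the tier cut-offs come from the text and table above; the three-point smoothing kernel, the default gap of 8 frames, and the rule of keeping the stronger of two close peaks are assumptions, since the source does not give those details.

```javascript
// Value at fraction `p` of the sorted data (nearest-rank style).
function percentile(values, p) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
}

// Tier thresholds from the table: >= 72% major, 48-72% mid, below 48% minor.
function tierFor(strength) {
  if (strength >= 0.72) return 1;
  if (strength >= 0.48) return 2;
  return 3;
}

function pickPeaks(flux, minGapFrames = 8) {
  const n = flux.length;
  // Light smoothing with a [0.25, 0.5, 0.25] kernel (assumed kernel).
  const smooth = new Array(n);
  for (let i = 0; i < n; i++) {
    const prev = flux[Math.max(0, i - 1)];
    const next = flux[Math.min(n - 1, i + 1)];
    smooth[i] = 0.25 * prev + 0.5 * flux[i] + 0.25 * next;
  }
  const threshold = percentile(smooth, 0.85);
  const globalMax = smooth.reduce((m, v) => Math.max(m, v), 0);
  const picked = [];
  for (let i = 1; i < n - 1; i++) {
    const isLocalMax = smooth[i] > smooth[i - 1] && smooth[i] >= smooth[i + 1];
    if (!isLocalMax || smooth[i] <= threshold) continue;
    const last = picked[picked.length - 1];
    if (last && i - last.frame < minGapFrames) {
      // Too close to the previous pick: keep whichever is stronger.
      if (smooth[i] > last.value) picked[picked.length - 1] = { frame: i, value: smooth[i] };
      continue;
    }
    picked.push({ frame: i, value: smooth[i] });
  }
  return picked.map((p) => {
    const strength = globalMax > 0 ? p.value / globalMax : 0;
    return { frame: p.frame, strength, tier: tierFor(strength) };
  });
}
```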

2.5 — RMS envelope

Root mean square energy is computed per frame, giving a rough proxy for perceived loudness over time. The RMS array is used for waveform background shading and for structural segmentation.
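A per-frame RMS computation can be sketched as follows. Reusing the 2048-sample frame and 512-sample hop from the onset stage is an assumption; the source says only that RMS is computed per frame.

```javascript
// RMS energy per frame: sqrt(mean(sample^2)) over each window.
function rmsEnvelope(samples, frameSize = 2048, hop = 512) {
  const out = [];
  for (let start = 0; start + frameSize <= samples.length; start += hop) {
    let sumSquares = 0;
    for (let i = 0; i < frameSize; i++) {
      const s = samples[start + i];
      sumSquares += s * s;
    }
    out.push(Math.sqrt(sumSquares / frameSize));
  }
  return out;
}
```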

2.6 — Structural segmentation

The RMS envelope is heavily smoothed over a two-second window and local minima are identified. These represent moments where the music breathes between phrases. The five deepest minima become section boundaries, dividing the track into up to six structural sections. This is a heuristic — it works well for music with clear phrase structure and less well for continuous or through-composed material.
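The segmentation heuristic can be sketched as below. Reading "deepest" as "lowest smoothed RMS value", using a plain moving average for the two-second smoothing, and the function names are all assumptions.

```javascript
// Centred moving average; edges use a shortened window.
function movingAverage(values, width) {
  const half = Math.floor(width / 2);
  const out = new Array(values.length);
  for (let i = 0; i < values.length; i++) {
    const lo = Math.max(0, i - half);
    const hi = Math.min(values.length - 1, i + half);
    let sum = 0;
    for (let j = lo; j <= hi; j++) sum += values[j];
    out[i] = sum / (hi - lo + 1);
  }
  return out;
}

// Returns boundary times in seconds, in chronological order.
// `framesPerSecond` is sampleRate / hop for the RMS frames.
function findSectionBoundaries(rms, framesPerSecond, maxBoundaries = 5) {
  const smooth = movingAverage(rms, Math.round(2 * framesPerSecond));
  const minima = [];
  for (let i = 1; i < smooth.length - 1; i++) {
    if (smooth[i] < smooth[i - 1] && smooth[i] <= smooth[i + 1]) {
      minima.push({ frame: i, depth: smooth[i] });
    }
  }
  return minima
    .sort((a, b) => a.depth - b.depth) // lowest energy first
    .slice(0, maxBoundaries)
    .map((m) => m.frame / framesPerSecond)
    .sort((a, b) => a - b);
}
```

With five boundaries the track splits into at most six sections; fewer minima simply yield fewer sections.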

2.7 — Tempo estimation

Autocorrelation is performed on the spectral flux signal across a range of lags corresponding to 50–240 BPM. The lag with the highest correlation coefficient is taken as the beat period, and the resulting tempo is folded into a musically sensible range (60–200 BPM). This estimate drives the beat grid export and is approximate. For tracks with irregular pulse the grid should be treated as a starting reference.
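A sketch of the tempo estimate follows. Normalising by the zero-lag energy, removing the mean first, and folding by doubling or halving are assumptions about details the source leaves open.

```javascript
// Estimate BPM from a flux series sampled at `framesPerSecond`.
function estimateTempo(flux, framesPerSecond) {
  const mean = flux.reduce((s, v) => s + v, 0) / flux.length;
  const x = flux.map((v) => v - mean);
  const energy = x.reduce((s, v) => s + v * v, 0) || 1;
  // Lag range covering 240 BPM (short lag) down to 50 BPM (long lag).
  const minLag = Math.max(1, Math.round((60 / 240) * framesPerSecond));
  const maxLag = Math.min(x.length - 1, Math.round((60 / 50) * framesPerSecond));
  let bestLag = minLag;
  let bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i + lag < x.length; i++) sum += x[i] * x[i + lag];
    const score = sum / energy; // autocorrelation coefficient at this lag
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  let bpm = (60 * framesPerSecond) / bestLag;
  // Fold into 60-200 BPM by octave (assumed folding rule).
  while (bpm < 60) bpm *= 2;
  while (bpm > 200) bpm /= 2;
  return bpm;
}
```

At a 512-sample hop and 44.1 kHz the frame rate is about 86 frames per second, so one lag step near 120 BPM changes the estimate by roughly 3 BPM. That coarse resolution is one reason the estimate is approximate.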

04 Visualisation

Two canvas panels

Results are rendered to two HTML <canvas> elements using the Canvas 2D API. Both use device pixel ratio scaling to render crisply on high-DPI displays and resize responsively on window resize events.
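Device pixel ratio scaling usually follows the pattern sketched here. The function name and the choice to pass the ratio in as a parameter are assumptions made so the sketch does not depend on a live page.

```javascript
// Size the canvas backing store in device pixels while keeping CSS size fixed,
// then scale the context so drawing code can keep using CSS pixel units.
function fitCanvasToDisplay(canvas, cssWidth, cssHeight, dpr) {
  const ratio = dpr || 1;
  canvas.width = Math.round(cssWidth * ratio);
  canvas.height = Math.round(cssHeight * ratio);
  canvas.style.width = cssWidth + "px";
  canvas.style.height = cssHeight + "px";
  const ctx = canvas.getContext("2d");
  ctx.setTransform(ratio, 0, 0, ratio, 0, 0);
  return ctx;
}
```

In the browser this would be called with `window.devicePixelRatio` on load and again from a `resize` listener, followed by a redraw, because setting `canvas.width` clears the canvas.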

Waveform

The 1,200-point min/max array is drawn as a filled polygon — the top edge traces positive peaks, the bottom edge traces negative peaks in reverse, closed and filled with a gradient. Section boundaries overlay as dashed vertical lines.
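The polygon construction can be sketched as two small functions. Mapping sample values in the range -1 to 1 onto the full canvas height, and the function names, are assumptions.

```javascript
// Build the outline: positive peaks left to right, then negative peaks
// right to left, so the path closes into a single filled shape.
function waveformPolygon(mins, maxs, width, height) {
  const mid = height / 2;
  const n = maxs.length;
  const xAt = (i) => (n > 1 ? (i / (n - 1)) * width : 0);
  const points = [];
  for (let i = 0; i < n; i++) points.push([xAt(i), mid - maxs[i] * mid]);
  for (let i = n - 1; i >= 0; i--) points.push([xAt(i), mid - mins[i] * mid]);
  return points;
}

// Draw the outline on a Canvas 2D context; `fillStyle` may be a colour
// string or a gradient made with ctx.createLinearGradient().
function fillWaveform(ctx, points, fillStyle) {
  ctx.beginPath();
  points.forEach(([x, y], i) => (i === 0 ? ctx.moveTo(x, y) : ctx.lineTo(x, y)));
  ctx.closePath();
  ctx.fillStyle = fillStyle;
  ctx.fill();
}
```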

Cue map

Three tiers of peaks are drawn as vertical bar elements centred on the canvas midpoint, scaled by strength:

Tier 1 (Major) — tall green bars with glow and capped endpoints
Tier 2 (Mid) — medium periwinkle bars
Tier 3 (Minor) — short amber bars anchored to the canvas bottom

Section boundaries appear as dashed green vertical lines with section index labels. A 5-second grid provides temporal orientation. Hovering shows a tooltip with the nearest peak's timecode, tier, and strength percentage.

05 After Effects export

Five null layers, one JSX script

The primary deliverable is a .jsx ExtendScript file. Run in After Effects via File › Scripts › Run Script File, it creates five null layers in the active composition, each carrying a specific category of markers.

Layer	AE label	Contents
♪ Beat grid	Sea Foam	Every beat at estimated BPM. Downbeats labelled M1, M2… Off-beats as 1.2, 1.3, 1.4.
§ Sections	Yellow	One marker per structural boundary with section index and timecode.
▲ Major hits	Tangerine	Tier 1 peaks. Timecode and strength percentage.
● Mid hits	Aqua	Tier 2 peaks. Timecode and strength percentage.
· Minor hits	Lavender	Tier 3 onsets. Timecode and strength percentage.

Layers are added in reverse order so the beat grid sits at the top of the layer stack. All markers are placed using MarkerValue objects and setValueAtTime() on each null layer's Marker property, the standard ExtendScript route for layer markers. After Effects snaps keyframes and layer in/out points to these markers when Shift is held while dragging, making the marker layers directly usable as a snap grid.
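The export step amounts to generating ExtendScript text in the browser. The sketch below is one way to do that and is not the tool's actual generator: the function names, the four-beats-per-bar assumption behind the beat labels, the undo group name, and the numeric label indices passed in by the caller are all hypothetical. The emitted calls (`comp.layers.addNull()`, `property('Marker').setValueAtTime()`, `new MarkerValue()`) are standard After Effects scripting API.

```javascript
// Beat labels: downbeats "M1", "M2", ...; other beats "1.2", "1.3", "1.4".
// Assumes 4 beats per bar, which the labelling scheme suggests.
function beatLabel(beatIndex, beatsPerBar = 4) {
  const bar = Math.floor(beatIndex / beatsPerBar) + 1;
  const beat = (beatIndex % beatsPerBar) + 1;
  return beat === 1 ? "M" + bar : bar + "." + beat;
}

// One marker per beat from time 0 to the end of the track.
function beatGridMarkers(bpm, duration) {
  const period = 60 / bpm;
  const markers = [];
  for (let i = 0; i * period <= duration; i++) {
    markers.push({ time: i * period, comment: beatLabel(i) });
  }
  return markers;
}

// layers: [{ name, label, markers: [{ time, comment }] }], top layer first.
function buildJsx(layers) {
  const lines = [
    "(function () {",
    "  var comp = app.project.activeItem;",
    "  if (!(comp instanceof CompItem)) { alert('Select a composition first.'); return; }",
    "  app.beginUndoGroup('Audio Cue Mapper markers');",
  ];
  // Each new layer lands on top of the stack, so emit in reverse order.
  layers.slice().reverse().forEach((layer, n) => {
    const v = "layer" + n;
    lines.push("  var " + v + " = comp.layers.addNull();");
    lines.push("  " + v + ".name = " + JSON.stringify(layer.name) + ";");
    lines.push("  " + v + ".label = " + layer.label + ";");
    layer.markers.forEach((m) => {
      lines.push(
        "  " + v + ".property('Marker').setValueAtTime(" + m.time.toFixed(4) +
        ", new MarkerValue(" + JSON.stringify(m.comment) + "));"
      );
    });
  });
  lines.push("  app.endUndoGroup();", "})();");
  return lines.join("\n");
}
```

The resulting string can be offered as a download, for example through a `Blob` and an object URL. Label indices map to colour names through each user's After Effects preferences, so the numbers are only meaningful with default label settings.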

Suggested workflow

Solo the Sections layer to block the large-scale structure
Solo Major Hits to place key poses and primary cuts
Use Beat Grid to time secondary animation and transitions
Reference Mid and Minor layers for texture and micro-timing

06 Design

Visual language

The tool shares a design language with the Pokédex viewer and Dream Team case study on the same portfolio site, creating coherence across all embedded tools.

Colour palette

Token	Hex	Role
`--bg`	#141414	Page background — matches Pokédex viewer canvas
`--accent`	#76c17d	Primary / Tier 1 — Pokédex green
`--accent2`	#b9b0ff	Tier 2 — periwinkle
`--accent3`	#e8b86d	Tier 3 / beat grid — warm amber
`--text`	#e1e1e1	Primary text, about 14:1 contrast on --bg (WCAG AAA)
`--muted`	#888888	Secondary text — 4.6:1 contrast (WCAG AA)
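Contrast figures like these can be checked with the WCAG 2.x relative-luminance formula, sketched below. The function names are hypothetical, and the source does not say which surface colour the `--muted` figure was measured against, so the sketch leaves the choice of pair to the caller.

```javascript
// sRGB channel (0-255) to linear light, per the WCAG 2.x definition.
function channelToLinear(v) {
  const c = v / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a six-digit hex colour such as "#141414".
function relativeLuminance(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 255;
  const g = (n >> 8) & 255;
  const b = n & 255;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colours, from 1 (identical) to 21 (black on white).
function contrastRatio(hexA, hexB) {
  const a = relativeLuminance(hexA);
  const b = relativeLuminance(hexB);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}
```

WCAG AA requires at least 4.5:1 for body text and AAA requires at least 7:1, so `contrastRatio("#e1e1e1", "#141414")` can be compared directly against those thresholds.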

Typography

IBM Plex Mono throughout, matching the portfolio site's typographic system. DM Serif Display italic for the track title and upload prompt — a tonal counterpoint to the monospace grid.

Loading animation

During analysis a quarter note hops along a five-line staff via CSS keyframe animation. Ghost notes trail behind on the staff lines. The animation is entirely CSS — no JavaScript, no canvas — and uses the same CSS variable palette as the rest of the interface.

07 Limitations

Known constraints

Tempo estimation is approximate. Music with irregular pulse, significant rubato, or complex polyrhythm will produce a beat grid that diverges from the actual pulse. Treat the grid as a starting reference.
Segmentation is heuristic. The RMS-valley approach works well for music with clear phrase structure. Continuous or through-composed material may produce fewer or less meaningful section boundaries.
The FFT is unoptimised. The Cooley–Tukey implementation is correct but not performance-tuned — analysis of files longer than ten minutes may be slow on lower-powered devices.
Codec support varies. decodeAudioData() does not support all container/codec variants. Some MP4 files with unusual audio encoding may fail to decode. WAV and FLAC are the most reliable formats.
Markers land on null layers. Markers are placed on null layers, not on the audio layer itself. Snapping works identically, but the visual association between waveform and markers requires manual layer management.