Field notes - Signal processing - April 2026

PCA on Audio

One algorithm, two audio pipelines. Compression is straightforward. Denoising has a catch you have to handle first.

By The Anh Nguyen - PCA - STFT - Spectral subtraction

PCA finds the directions in your data with the most variance, ranks them, and lets you keep the top few or drop the bottom few. On audio, that one move does two different jobs.

1. What PCA does

A basis sorted by variance

PCA fits an orthogonal basis to a matrix. The first axis points where the data varies most; each later axis is orthogonal to all earlier ones and explains the most remaining variance. Two common uses: keep the top-K axes for a compact approximation (compression / dimensionality reduction), or drop the bottom-K axes if you believe noise lives there (denoising).

PC1 points along the spread of the points; PC2 is orthogonal and captures the leftover. Keep top-K to compress; drop bottom-K to denoise.
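A minimal sketch of that idea on a synthetic 2D point cloud. The data, the helper name `pca_fit`, and the choice of NumPy's SVD (rather than scikit-learn's `PCA` class) are assumptions made for illustration, not the pipeline's actual code.

```python
import numpy as np

def pca_fit(X):
    """Return (mean, components, explained_variance_ratio); rows of X are samples."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes,
    # already sorted by decreasing singular value (i.e. by variance).
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)
    return mean, Vt, var / var.sum()

rng = np.random.default_rng(0)
# Illustrative data only: a cloud stretched along the diagonal.
t = rng.normal(size=500)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(500, 2))

mean, components, ratio = pca_fit(X)

# Keep top-K (here K=1): project onto PC1, then map back to 2D.
K = 1
Z = (X - mean) @ components[:K].T   # compact representation, shape (500, 1)
X_hat = Z @ components[:K] + mean   # rank-1 approximation, shape (500, 2)

print("PC1 direction:", components[0])
print("variance explained per axis:", ratio)
```

Dropping the bottom axes instead of keeping the top ones is the same projection with a different choice of K.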

2. Why audio?

Two ways to turn a waveform into a matrix

A waveform is 1D; PCA needs a 2D matrix. There are two ways to turn audio into one. The compression pipeline uses time-domain blocks; the denoising pipeline uses the STFT.

Same waveform, two matrices. Time-blocks expose redundancy across short slices of waveform. The Short-Time Fourier Transform (STFT) exposes structure across frequency.
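Both matrices can be sketched in a few lines. The sample rate, the test tone, and the block / window size of 512 are assumed values for illustration; `scipy.signal.stft` is used here with its default Hann window and 50% overlap.

```python
import numpy as np
from scipy.signal import stft

sr = 16000                           # assumed sample rate
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)      # one second of a 440 Hz tone (test signal)

# 1) Time-blocks: cut the waveform into non-overlapping slices, one slice per row.
block = 512
n_blocks = len(y) // block
blocks = y[: n_blocks * block].reshape(n_blocks, block)

# 2) STFT: one windowed FFT per frame; take magnitudes, one frame per row.
f, frame_times, Z = stft(y, fs=sr, nperseg=512)
mag = np.abs(Z).T                    # shape (n_frames, n_freq_bins)

print("time-block matrix:", blocks.shape)
print("STFT magnitude matrix:", mag.shape)
```

In the first matrix a column is a position inside a block; in the second a column is a frequency bin. PCA then looks for redundancy across rows in either case.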

3. Compression

Top-K basis on time-domain blocks

Music repeats itself. The same chord shape comes back, the same drum hit, the same vowel sound. Many short slices of waveform end up looking like other slices, and a small set of basis shapes covers most of them.

K	Variance explained	SNR (dB)	Compression ratio
4	~30%	~3	~100×
16	~60%	~8	~40×
64 (default)	~97%	~15	~9×
256	~99.9%	~30	~3×
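A rough sketch of how a top-K pipeline on time-domain blocks could look. The function names, the block size of 512, and the synthetic three-tone signal are assumptions, and the ratio below only counts stored floats (mean, basis, coefficients) with no quantization or entropy coding, so it will not necessarily match how the table's figures were measured.

```python
import numpy as np

def pca_compress(y, block=512, K=64):
    """Cut y into blocks (rows), fit PCA, keep the top-K axes."""
    n = len(y) // block
    X = y[: n * block].reshape(n, block)
    mean = X.mean(axis=0)
    # Rows of Vt are principal axes, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:K]                      # (K, block)
    coeffs = (X - mean) @ basis.T       # (n, K)
    return mean, basis, coeffs

def pca_decompress(mean, basis, coeffs):
    """Rebuild the blocks from K coefficients each and flatten to a waveform."""
    return (coeffs @ basis + mean).ravel()

def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref**2) / np.sum((ref - est) ** 2))

def float_ratio(n_samples, mean, basis, coeffs):
    """Original sample count divided by the number of floats stored."""
    return n_samples / (mean.size + basis.size + coeffs.size)

# Illustrative signal: three steady tones plus a little noise.
sr = 16000
t = np.arange(4 * sr) / sr
rng = np.random.default_rng(0)
y = (np.sin(2 * np.pi * 440 * t)
     + 0.5 * np.sin(2 * np.pi * 660 * t)
     + 0.25 * np.sin(2 * np.pi * 880 * t)
     + 0.01 * rng.normal(size=t.size))

results = {}
for K in (2, 8):
    mean, basis, coeffs = pca_compress(y, block=512, K=K)
    y_hat = pca_decompress(mean, basis, coeffs)
    results[K] = (snr_db(y[: y_hat.size], y_hat),
                  float_ratio(y_hat.size, mean, basis, coeffs))
    print(f"K={K}: SNR {results[K][0]:.1f} dB, ratio {results[K][1]:.1f}x")
```

The basis has to be stored alongside the coefficients, which is why the ratio falls off quickly as K grows on short recordings.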

Advantages vs MP3 / Opus

No training, no learned codec.
Deterministic; one knob (K) for size.
Tends to drop higher frequencies first as a free side effect, since in music they usually carry the least variance.

Limitations vs MP3 / Opus

Worse perceptual quality at any matched ratio.
Not streamable. PCA fits the whole signal at once.
Hard cutoffs cause artifacts at low K.

4. Denoising

The simple plan, and why it isn't enough

Music has shape. A sung note, a plucked string, a drum hit. Each one is a recognisable pattern that repeats in the recording. PCA finds those patterns and sorts them from strongest to weakest. Pure random crackle has no pattern, so it falls to the weak end. The plan: keep the strong end (the music), drop the weak end (the noise).

That works for crackles. It doesn't work for steady noise.

A fan hum, microphone hiss, a refrigerator drone. These never let up. They're constant. To PCA, "constant" looks like "the strongest pattern in the room", so the noise gets sorted right next to the music at the top. Drop the weak end and the noise stays where it is.

Variance-sorted principal components. Stationary noise contributes energy to every frame, so raw PCA ranks it near the top, alongside the music. Spectral subtraction removes the noise floor first; what is left of the noise then falls to the low-variance tail, where the variance threshold can drop it cleanly.

Spectral subtraction fixes this. Find the quietest moments in the recording, like the gaps between notes or the silences between words, and use them to measure what the constant background sounds like. Subtract that background from the rest of the recording. The steady noise is mostly gone before PCA even sees the data, the sorting flips back to normal, and dropping the weak end actually drops noise.
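The two steps can be sketched as below. This is one plausible reading of the pipeline, not its actual code: the function name `denoise`, the parameters `quiet_frac` and `var_keep`, the energy-based choice of quiet frames, and the gated-tone test signal are all assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def denoise(y, sr, nperseg=512, quiet_frac=0.1, var_keep=0.95):
    _, _, Z = stft(y, fs=sr, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)           # each (n_bins, n_frames)

    # 1) Spectral subtraction: average the quietest frames to estimate the
    #    steady background, then subtract it from every frame (floor at 0).
    energy = (mag**2).sum(axis=0)
    n_quiet = max(1, int(quiet_frac * mag.shape[1]))
    quiet = np.argsort(energy)[:n_quiet]
    noise_floor = mag[:, quiet].mean(axis=1, keepdims=True)
    mag = np.maximum(mag - noise_floor, 0.0)

    # 2) PCA over frames: keep the fewest components whose cumulative
    #    variance reaches var_keep, drop the rest.
    X = mag.T                                     # rows = frames
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    K = min(int(np.searchsorted(cum, var_keep)) + 1, len(s))
    X_hat = (U[:, :K] * s[:K]) @ Vt[:K] + mean
    mag_hat = np.maximum(X_hat.T, 0.0)

    # 3) Resynthesize using the original (still noisy) phase.
    _, y_hat = istft(mag_hat * np.exp(1j * phase), fs=sr, nperseg=nperseg)
    return y_hat[: len(y)]

def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref**2) / np.sum((ref - est) ** 2))

# Illustrative test: gated 440 Hz tone (with silent gaps) plus steady hiss.
sr = 16000
t = np.arange(2 * sr) / sr
gate = (t % 0.5) < 0.25
clean = 0.5 * np.sin(2 * np.pi * 440 * t) * gate
rng = np.random.default_rng(0)
noisy = clean + 0.05 * rng.normal(size=t.size)

y_hat = denoise(noisy, sr)
snr_in, snr_out = snr_db(clean, noisy), snr_db(clean, y_hat)
print(f"SNR before: {snr_in:.1f} dB, after: {snr_out:.1f} dB")
```

The silent gaps in the test signal matter: without them the quietest frames would contain music, and the background estimate would eat into the signal.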

Advantages vs Wiener / RNNoise (classical and ML denoisers)

No training, no model dependencies.
Interpretable; one variance-threshold knob.
Deterministic and sample-rate agnostic.

Limitations vs Wiener / RNNoise (classical and ML denoisers)

Needs a quiet reference window in the recording.
Works on magnitude only; the noisy phase is reused at reconstruction, so some noise survives there.
Hard cutoffs leave "musical noise" artifacts.
Outperformed by deep-learning denoisers on perceptual quality.

Stack: NumPy, SciPy, scikit-learn, librosa, soundfile, Streamlit.