Field notes - Signal processing - April 2026
PCA on Audio
One algorithm, two audio pipelines. Compression is straightforward. Denoising has a catch you have to handle first.
PCA finds the directions in your data with the most variance, ranks them, and lets you keep the top few or drop the bottom few. On audio, that one move does two different jobs.
1. What PCA does
A basis sorted by variance
PCA fits an orthogonal basis to a data matrix. The first axis points where the data varies most; each subsequent axis is orthogonal to all earlier ones and explains the most of the variance that remains. Two common uses: keep the top-K axes for a compact approximation (compression / dimensionality reduction), or drop the bottom-K axes if you believe noise lives there (denoising).
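A minimal sketch of that idea, using NumPy's SVD on mean-centered data. The helper name `pca_basis` and the toy 3D data are assumptions for illustration; scikit-learn's `PCA` would do the same job.

```python
import numpy as np

def pca_basis(X):
    """Return (mean, axes, variances) for the rows of X, strongest axis first."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are orthonormal principal axes. NumPy returns singular
    # values in descending order, so the axes come out sorted by variance.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = s**2 / (X.shape[0] - 1)
    return mean, Vt, variances

rng = np.random.default_rng(0)
# Toy data (assumption): 500 points in 3D, stretched differently per axis.
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

mean, axes, variances = pca_basis(X)
explained = variances / variances.sum()   # share of variance per axis
print(np.round(explained, 3))
```

Keeping the top K rows of `axes` gives the compact approximation; discarding the last rows is the denoising move.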
2. Why audio?
Two ways to turn a waveform into a matrix
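A waveform is 1D; PCA needs a 2D matrix. There are two ways to build one: cut the waveform into short time-domain blocks (one block per row), or take a magnitude spectrogram (one frame of frequency magnitudes per row). The compression pipeline uses the first; the denoising pipeline uses the second.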
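Here is a rough sketch of both constructions in plain NumPy. The block size, frame size, hop and Hann window are assumptions, not values from these notes, and the function names are made up for the example. In practice `librosa.stft` or `scipy.signal.stft` would replace the hand-rolled framing.

```python
import numpy as np

def time_blocks(y, block_size=1024):
    """Cut a waveform into non-overlapping rows of block_size samples."""
    n_blocks = len(y) // block_size            # trailing partial block is dropped
    return y[: n_blocks * block_size].reshape(n_blocks, block_size)

def magnitude_frames(y, frame_size=1024, hop=256):
    """Overlapping windowed frames -> FFT. Returns (magnitude, phase)."""
    n_frames = 1 + (len(y) - frame_size) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_size).
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    spec = np.fft.rfft(y[idx] * np.hanning(frame_size), axis=1)
    return np.abs(spec), np.angle(spec)

# Toy input (assumption): one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)

blocks = time_blocks(y)            # rows are raw samples
mag, phase = magnitude_frames(y)   # rows are frequency magnitudes
print(blocks.shape, mag.shape)
```

Either matrix can be handed to PCA as-is: each row is one observation, each column one feature.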
3. Compression
Top-K basis on time-domain blocks
Music repeats itself. The same chord shape comes back, the same drum hit, the same vowel sound. Many short slices of waveform end up looking like other slices, and a small set of basis shapes covers most of them.
| K | Variance kept | SNR (dB) | Compression ratio |
|---|---|---|---|
| 4 | ~30% | ~3 | ~100× |
| 16 | ~60% | ~8 | ~40× |
| 64 (default) | ~97% | ~15 | ~9× |
| 256 | ~99.9% | ~30 | ~3× |
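To make the K trade-off concrete, here is a small sketch of top-K compression on time-domain blocks. It runs on a synthetic three-tone signal, so its numbers will not match the table above. The size accounting (raw float counts for the mean, basis and coefficients, with no quantization or entropy coding) is an assumption; these notes do not say how the table's ratios were computed.

```python
import numpy as np

def pca_compress(blocks, k):
    """Fit PCA on the rows of `blocks` and keep only the top-k axes."""
    mean = blocks.mean(axis=0)
    centered = blocks - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:k]                      # (k, block_size) basis shapes
    coeffs = centered @ basis.T         # (n_blocks, k) weights per block
    return mean, basis, coeffs

def pca_decompress(mean, basis, coeffs):
    """Rebuild approximate blocks from the stored pieces."""
    return coeffs @ basis + mean

def snr_db(original, approx):
    """Signal-to-noise ratio of the reconstruction, in dB."""
    return 10 * np.log10(np.sum(original**2) / np.sum((original - approx) ** 2))

# Toy "music" (assumption): three steady tones plus faint noise, 4 s at 8 kHz.
sr, block_size = 8000, 256
t = np.arange(4 * sr) / sr
rng = np.random.default_rng(0)
y = sum(np.sin(2 * np.pi * f * t) for f in (220.0, 330.0, 440.0))
y = y + 0.01 * rng.normal(size=t.size)
blocks = y.reshape(-1, block_size)

results = {}
for k in (2, 8):
    mean, basis, coeffs = pca_compress(blocks, k)
    recon = pca_decompress(mean, basis, coeffs)
    stored = mean.size + basis.size + coeffs.size   # floats we would keep
    results[k] = (snr_db(blocks, recon), blocks.size / stored)
    print(k, results[k])
```

Raising K buys SNR and costs ratio, which is the same pattern the table shows.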
Advantages vs MP3 / Opus
- No training, no learned codec.
- Deterministic; one knob (K) for size.
- Tends to drop higher frequencies first as a free side effect, since they usually carry the least variance.
Limitations vs MP3 / Opus
- Worse perceptual quality at any matched ratio.
- Not streamable. PCA fits the whole signal at once.
- Hard cutoffs cause artifacts at low K.
4. Denoising
The simple plan, and why it isn't enough
Music has shape. A sung note, a plucked string, a drum hit. Each one is a recognisable pattern that repeats in the recording. PCA finds those patterns and sorts them from strongest to weakest. Pure random crackle has no pattern, so it falls to the weak end. The plan: keep the strong end (the music), drop the weak end (the noise).
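The simple plan can be sketched like this: fit PCA, keep components until a cumulative-variance threshold is reached, and rebuild from those alone. The 0.95 threshold, the name `pca_denoise` and the low-rank toy data are all assumptions made for the example.

```python
import numpy as np

def pca_denoise(X, keep_variance=0.95):
    """Keep the strongest components up to `keep_variance`, drop the rest."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # Smallest k whose cumulative variance reaches the threshold.
    k = min(int(np.searchsorted(cumulative, keep_variance)) + 1, len(s))
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k] + mean
    return X_hat, k

rng = np.random.default_rng(0)
# Toy data (assumption): three repeating "patterns" mixed across 200 rows,
# plus white noise standing in for random crackle.
clean = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 64))
noisy = clean + 0.2 * rng.normal(size=clean.shape)

denoised, k = pca_denoise(noisy)
err_before = np.sum((noisy - clean) ** 2)
err_after = np.sum((denoised - clean) ** 2)
print(k, err_before, err_after)
```

On this toy the error against the clean matrix drops, because the noise is spread thinly over all 64 directions while the patterns occupy only a few.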
That works for crackles. It doesn't work for steady noise.
A fan hum, microphone hiss, a refrigerator drone. These never let up. They're constant. To PCA, "constant" looks like "the strongest pattern in the room", so the noise gets sorted right next to the music at the top. Drop the weak end and the noise stays where it is.
Spectral subtraction fixes this. Find the quietest moments in the recording, like the gaps between notes or the silences between words, and use them to measure what the constant background sounds like. Subtract that background from the rest of the recording. The steady noise is mostly gone before PCA even sees the data, the sorting flips back to normal, and dropping the weak end actually drops noise.
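A minimal sketch of that subtraction step on a magnitude spectrogram (frames as rows). Picking the quietest frames as the lowest-energy 10% is an assumption, since these notes do not say how the quiet moments are found. `quiet_fraction` is a hypothetical parameter and the input is a hand-built toy, not real audio.

```python
import numpy as np

def spectral_subtract(mag, quiet_fraction=0.1):
    """Estimate steady background from the quietest frames and subtract it.

    mag: (n_frames, n_bins) magnitude spectrogram.
    """
    energy = (mag**2).sum(axis=1)
    n_quiet = max(1, int(quiet_fraction * len(mag)))
    quiet = np.argsort(energy)[:n_quiet]         # lowest-energy frames
    noise_profile = mag[quiet].mean(axis=0)      # average background per bin
    # Magnitudes cannot be negative, so clip at zero after subtracting.
    cleaned = np.maximum(mag - noise_profile, 0.0)
    return cleaned, noise_profile

# Toy spectrogram (assumption): a hum in bin 5 and faint hiss in every frame,
# with a "note" in bin 30 during frames 20..59 only.
n_frames, n_bins = 100, 64
mag = np.full((n_frames, n_bins), 0.1)   # hiss
mag[:, 5] += 2.0                         # steady hum
mag[20:60, 30] += 3.0                    # the music

cleaned, noise_profile = spectral_subtract(mag)
print(cleaned[:, 5].max(), cleaned[20:60, 30].mean())
```

The cleaned magnitudes are what PCA would then see; to get audio back they would be recombined with the original phase, which is where the magnitude-only limitation comes from.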
Advantages vs Wiener / RNNoise (classical and ML denoisers)
- No training, no model dependencies.
- Interpretable; one variance-threshold knob.
- Deterministic and sample-rate agnostic.
Limitations vs Wiener / RNNoise (classical and ML denoisers)
- Needs a quiet reference window in the recording.
- Magnitude-only: the noisy phase is reused as-is, so some noise survives in it.
- Hard cutoffs leave "musical noise" artifacts.
- Outperformed by deep-learning denoisers on perceptual quality.
Stack: NumPy, SciPy, scikit-learn, librosa, soundfile, Streamlit.