Builder & Technical Lead - AI / Data Engineering - 15+ years
I optimize data systems that handle billions of records, cut infrastructure costs, and actually last. Hands-on from product discovery through architecture and implementation to deployment.
Case studies
Convolutional neural network for finger-drawn sketches. Four experiments to close the training-inference gap, most notably switching from pre-rendered 28×28 bitmaps to native stroke-vector rendering at 128×128. Built an X-Ray visualization to see what each layer learns. Live input via finger, mouse, or webcam through a Streamlit app.
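A minimal sketch of the stroke-vector idea, not the project's actual renderer: strokes are assumed to be lists of (x, y) points normalized to [0, 1], and `render_strokes` is a hypothetical helper name. Because the strokes are vectors, the same input can be rasterized at 28×28 or 128×128 without upscaling a bitmap.

```python
def render_strokes(strokes, size=128):
    """Rasterize strokes onto a size x size grid of 0/1 pixels.

    strokes: list of strokes, each a list of (x, y) points in [0, 1]
    (the normalized coordinate format is an assumption of this sketch).
    """
    grid = [[0] * size for _ in range(size)]
    scale = size - 1
    for stroke in strokes:
        if len(stroke) == 1:
            # A tap with no movement still marks one pixel.
            x, y = stroke[0]
            grid[round(y * scale)][round(x * scale)] = 1
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Take enough samples that consecutive ones are under one pixel apart.
            steps = int(max(abs(x1 - x0), abs(y1 - y0)) * scale) + 1
            for i in range(steps + 1):
                t = i / steps
                col = round((x0 + t * (x1 - x0)) * scale)
                row = round((y0 + t * (y1 - y0)) * scale)
                grid[row][col] = 1
    return grid


# The same vector input rendered at two resolutions.
diagonal = [[(0.0, 0.0), (1.0, 1.0)]]
small = render_strokes(diagonal, size=28)
large = render_strokes(diagonal, size=128)
```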
Five algorithms scoring the same molecule for drug-induced autoimmunity risk. The spread between their probabilities is the headline; the chemistry interpretation runs once and stays the same regardless of which model the user picks.
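The spread can be sketched in a few lines. The model names and probabilities below are placeholders for illustration, not the screener's actual algorithms or outputs, and `probability_spread` is a hypothetical helper.

```python
def probability_spread(scores):
    """Summarize disagreement between models scoring one molecule.

    scores: dict mapping model name -> predicted probability of risk.
    """
    low = min(scores, key=scores.get)
    high = max(scores, key=scores.get)
    return {
        "lowest": low,
        "highest": high,
        "spread": scores[high] - scores[low],
        "mean": sum(scores.values()) / len(scores),
    }


# Placeholder model names and values, for illustration only.
example_scores = {
    "model_a": 0.25,
    "model_b": 0.5,
    "model_c": 0.75,
    "model_d": 0.5,
    "model_e": 0.5,
}
summary = probability_spread(example_scores)
```

A wide spread signals that the models disagree about the molecule, which is arguably more informative than any single probability.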
Two PCA pipelines on audio: lossy compression on raw waveform blocks, and denoising on STFT spectrograms. Without spectral subtraction enabled, stationary noise lands at the top of the variance ranking instead of the bottom, so it survives in the components the denoiser keeps, and the denoiser does almost nothing.
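A rough sketch of the first pipeline only (block-wise PCA compression), assuming NumPy and a mono signal. The block size, component count, and function names are illustrative choices, not the project's settings.

```python
import numpy as np


def pca_compress(signal, block_size=64, n_components=8):
    """Project fixed-size waveform blocks onto their top principal components."""
    n_blocks = len(signal) // block_size
    blocks = np.asarray(signal[: n_blocks * block_size], dtype=float)
    blocks = blocks.reshape(n_blocks, block_size)
    mean = blocks.mean(axis=0)
    centered = blocks - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    codes = centered @ basis.T  # the compressed representation
    return codes, basis, mean


def pca_decompress(codes, basis, mean):
    """Rebuild an approximate waveform from the codes."""
    return (codes @ basis + mean).ravel()


# A single-frequency tone spans only two dimensions per block
# (a sine and a cosine), so two components reconstruct it.
tone = np.sin(0.3 * np.arange(4096))
codes, basis, mean = pca_compress(tone, block_size=64, n_components=2)
restored = pca_decompress(codes, basis, mean)
```

Real audio needs more components, and the loss comes from the components that are dropped.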
Spark jobs that processed billions of records were dominated by shuffle cost on EMR. Pushed filters and projections before joins so data shrank before any wide transformation. Used broadcast joins for small dimension tables to skip shuffle entirely. Pre-partitioned hot datasets so repeated joins reused the layout instead of reshuffling.
Net result: faster jobs, smaller clusters, lower bills.
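The first two ideas can be modeled in plain Python. This is an analogy, not Spark code: it filters and projects rows before the join, then joins against a small in-memory lookup, which is roughly what a broadcast join does on each executor. Table and column names are made up for the example.

```python
def filter_project_then_join(facts, dim_rows, min_amount):
    """Inner-join fact rows to a small dimension table without a shuffle.

    facts: iterable of dicts (the large side, streamed row by row).
    dim_rows: small list of dicts (the side that would be broadcast).
    """
    # The small table becomes a hash lookup held in memory.
    dim_lookup = {row["country_id"]: row["country"] for row in dim_rows}
    for fact in facts:
        # Filter first, so rejected rows never reach the join.
        if fact["amount"] < min_amount:
            continue
        country = dim_lookup.get(fact["country_id"])
        if country is None:
            continue  # inner join: drop rows with no match
        # Project: keep only the columns needed downstream.
        yield {
            "order_id": fact["order_id"],
            "amount": fact["amount"],
            "country": country,
        }


# Hypothetical sample data.
orders = [
    {"order_id": 1, "country_id": 10, "amount": 5, "note": "a"},
    {"order_id": 2, "country_id": 20, "amount": 50, "note": "b"},
    {"order_id": 3, "country_id": 99, "amount": 70, "note": "c"},
]
countries = [
    {"country_id": 10, "country": "MY"},
    {"country_id": 20, "country": "SG"},
]
joined = list(filter_project_then_join(orders, countries, min_amount=10))
```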
Expertise
About
15+ years engineering at scale. Pipelines, EMR clusters, Airflow DAGs, and the unglamorous work of cutting cloud costs from the inside. I profile actual bottlenecks before changing anything, and prefer durable fixes over clever ones.
Built teams too: grew my team at iPrice from 3 to 7 and mentored two engineers who were later promoted.
Building AI applications now: LLM-powered automation in production, and learning ML by building (see SketchNet, DIA Risk Screener). Outside work, I travel and read.