The Anh Nguyen

Field notes - Drug safety modelling - April 2026

Five Models, One Molecule

A drug-safety screener for a problem with very little data. Five different models, one molecule, and the gap between their answers.

A few prescription drugs cause a rare but serious side effect: the patient's immune system starts attacking their own body. Drug companies want to flag risky candidates early. They have only 477 labelled examples to learn from.

[Figure: the 477 known examples occupy a small patch of all drug-like chemical space, which is vastly larger than any dataset can sample. A familiar query gets a grounded prediction; unfamiliar queries leave the model guessing.]
A model trained on 477 molecules has to make predictions for any drug-shaped chemical. Near familiar examples it is grounded; far from them it is guessing, and you cannot tell the difference from the probability alone.

1. One number hides too much

A confident probability isn't enough on its own

The previous version of this screener returned a single probability. It looked confident whether the molecule was familiar or out of the model's depth. You couldn't tell which.

2. Five models, watch them disagree

When five algorithms argue, the disagreement is the answer

Five different kinds of models score the same input. A tight cluster of probabilities means the answer is grounded. A wide spread means the molecule is hard to score, and any single number from the group would hide that.

[Figure: predicted risk (0 to 1) from each model.
When models agree (familiar molecule, spread = 0.04): ridge 0.40, knn 0.38, rf 0.42, xgboost 0.39, adaboost 0.41.
When models disagree (procainamide, spread = 0.53): ridge 0.45, knn 0.60, rf 0.32, xgboost 0.07, adaboost 0.50.]
Each row is one model's predicted risk on a 0 to 1 scale. On a familiar molecule the five models cluster tightly. On procainamide they don't: XGBoost gives 0.07 (looks safe) while kNN gives 0.60 (probably risky). Same molecule, opposite verdicts. The 0.53 spread is the headline; the average of about 0.39 would hide it.
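The summary numbers behind the figure are simple arithmetic over the five scores. Below is a minimal sketch, assuming each model's output is already a probability in [0, 1] and taking spread as the maximum minus the minimum. The function name `summarise_scores` and the 0.2 disagreement threshold are hypothetical illustrations, not the app's actual code or settings.

```python
from statistics import mean


def summarise_scores(scores, spread_threshold=0.2):
    """Summarise one molecule's per-model risk scores.

    scores: dict mapping model name -> predicted probability in [0, 1].
    spread_threshold: hypothetical cut-off above which the models are
    treated as disagreeing (not a value taken from the app).
    """
    values = list(scores.values())
    spread = max(values) - min(values)  # range across the models
    return {
        "mean": mean(values),
        "spread": spread,
        "lowest_risk_model": min(scores, key=scores.get),
        "highest_risk_model": max(scores, key=scores.get),
        "models_disagree": spread > spread_threshold,
    }


# Per-model scores for procainamide, taken from the figure.
procainamide = {"ridge": 0.45, "knn": 0.60, "rf": 0.32, "xgboost": 0.07, "adaboost": 0.50}
summary = summarise_scores(procainamide)
print(f"mean={summary['mean']:.2f} spread={summary['spread']:.2f}")
# prints: mean=0.39 spread=0.53
```

Max minus min matches the 0.53 and 0.04 spreads in the figure and is easy to read off the chart. A standard deviation across the scores would be a smoother alternative, but it is not what the figure reports.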

The five come from four model families: a linear model (ridge), a nearest-neighbour model (kNN), a random forest, and two boosting variants (XGBoost and AdaBoost). They are different enough to disagree when a molecule is unfamiliar, and similar enough to agree when it is not.

3. Chemistry that doesn't need a model

Three checks that run regardless of which model is selected

Three quick checks accompany every prediction. None of them depend on a trained model; each answers a question a chemist would ask before trusting any number from the lineup.

Check 1: Has the model seen anything like this before?
Check 2: Are there chemical patterns known to be dangerous?
Check 3: What did the closest known molecules actually do?
Three quick checks that don't rely on any trained model. They run once per molecule and accompany every prediction.
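Checks 1 and 3 can both be built on fingerprint similarity to the training set. Below is a minimal sketch. It assumes each molecule has already been converted into a set of fingerprint "on" bits (in practice something like an RDKit Morgan fingerprint), and it assumes similarity is measured with the Tanimoto coefficient. The toy bit sets, the 0.4 familiarity cut-off, and all function names are hypothetical. Check 2 would instead be a substructure search against known alert patterns, for example with RDKit's `Chem.MolFromSmarts` and `Mol.HasSubstructMatch`, and is left out here.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_neighbours(query, training, k=3):
    """Return the k most similar training molecules as (similarity, name, label)."""
    scored = [(tanimoto(query, fp), name, label) for name, (fp, label) in training.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


def familiarity_check(query, training, cutoff=0.4):
    """Check 1: best similarity to any training molecule, and whether it clears a hypothetical cut-off."""
    best = nearest_neighbours(query, training, k=1)[0][0]
    return best, best >= cutoff


def neighbour_labels(query, training, k=3):
    """Check 3: the closest known molecules and their recorded labels."""
    return [(name, round(sim, 2), label) for sim, name, label in nearest_neighbours(query, training, k)]


# Hypothetical toy training set: name -> (fingerprint on-bits, label), 1 = risky, 0 = not.
training = {
    "mol_a": ({1, 2, 3, 4, 5}, 1),
    "mol_b": ({1, 2, 3, 9}, 0),
    "mol_c": ({20, 21, 22}, 0),
}
query = {1, 2, 3, 4}
print(familiarity_check(query, training))     # prints: (0.8, True)
print(neighbour_labels(query, training, k=2))  # prints: [('mol_a', 0.8, 1), ('mol_b', 0.6, 0)]
```

Using only the single best similarity for Check 1 is the simplest version of this test. Averaging over the top few neighbours is a smoother variant that is less sensitive to one lucky match.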

4. A learning project, not a product

Built to study, not to ship

This is a study build, not a clinical tool. The goal was to compare five model families on the same small dataset and see what their disagreement looks like. The probabilities are not calibrated for real-world use, and nothing here should feed a medical decision.

Stack: built with scikit-learn, xgboost, RDKit, and Streamlit. Dataset: InterDIA (UCI repository ID 1104), 597 drug compounds labelled for drug-induced autoimmunity risk and split into 477 training and 120 test compounds.