Field notes - Drug safety modelling - April 2026
Five Models, One Molecule
A drug-safety screener for a problem with very little data. Five different models, one molecule, and the gap between their answers.
A few prescription drugs cause a rare but serious side effect, drug-induced autoimmunity: the patient's immune system starts attacking their own body. Drug companies want to flag risky candidates early. The screener has only 477 labelled training compounds to learn from.
1. One number hides too much
A confident probability isn't enough on its own
The previous version of this screener returned a single probability. It looked confident whether the molecule was familiar or out of the model's depth. You couldn't tell which.
2. Five models, watch them disagree
When five algorithms argue, the disagreement is the answer
Five different kinds of models score the same input. A tight cluster of probabilities means the answer is grounded. A wide spread means the molecule is hard to score, and any single number from the group would hide that.
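The tight-versus-wide reading can be sketched in a few lines of plain Python. This is an illustration, not the screener's own code: the function name summarize_disagreement, the 0.15 cut-off on the range, and both sets of probabilities are made up for the example.

```python
from statistics import mean, pstdev


def summarize_disagreement(probs, tight_threshold=0.15):
    """Summarise how far apart a set of model probabilities are.

    tight_threshold is a hypothetical cut-off on the range (max - min);
    it is not a value taken from the screener.
    """
    if not probs:
        raise ValueError("need at least one probability")
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    spread = max(probs) - min(probs)
    return {
        "mean": mean(probs),
        "std": pstdev(probs),  # population std: the five models are the whole group
        "range": spread,
        "verdict": "tight" if spread <= tight_threshold else "wide",
    }


# Illustrative numbers only, not outputs of the real models.
familiar = summarize_disagreement([0.81, 0.78, 0.84, 0.80, 0.79])
unfamiliar = summarize_disagreement([0.15, 0.62, 0.40, 0.88, 0.33])

print(familiar["verdict"], round(familiar["range"], 2))
print(unfamiliar["verdict"], round(unfamiliar["range"], 2))
```

Reporting the range next to the mean keeps the disagreement visible: the second molecule's mean sits near the middle of the scale, yet its individual scores run from low to high.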
The five are a linear model, a nearest-neighbour model, a random forest, and two boosting variants (XGBoost and AdaBoost). The two boosters share a family but build their ensembles differently, so the lineup is not five unrelated methods. The models are different enough to disagree when a molecule is unfamiliar, and similar enough to agree when it is not.
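A lineup like this might be wired up roughly as follows. The sketch rests on several assumptions: the data is synthetic (477 rows only to mirror the training-set size quoted in these notes), logistic regression stands in for the unspecified linear model, and scikit-learn's GradientBoostingClassifier stands in for XGBoost so the example needs only scikit-learn. The hyperparameters and the helper name score_one are illustrative, not taken from the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for molecular descriptors; not the InterDIA data.
X, y = make_classification(
    n_samples=477, n_features=20, n_informative=8, random_state=0
)

# Five models; scaling is added where the method is distance- or scale-sensitive.
lineup = {
    "linear": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "nearest_neighbour": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),  # XGBoost stand-in
    "adaboost": AdaBoostClassifier(random_state=0),
}

for model in lineup.values():
    model.fit(X, y)


def score_one(x):
    """Return each model's positive-class probability for one feature vector."""
    row = np.asarray(x).reshape(1, -1)
    return {name: float(m.predict_proba(row)[0, 1]) for name, m in lineup.items()}


scores = score_one(X[0])
spread = max(scores.values()) - min(scores.values())
for name, p in scores.items():
    print(f"{name:>18}: {p:.2f}")
print(f"{'spread':>18}: {spread:.2f}")
```

Because xgboost.XGBClassifier follows the same fit / predict_proba interface, swapping it in for the stand-in should only require changing one entry in the dictionary.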
3. Chemistry that doesn't need a model
Three checks that run regardless of which model is selected
Three quick checks accompany every prediction. None of them depend on a trained model; each answers a question a chemist would ask before trusting any number from the lineup.
4. A learning project, not a product
Built to study, not to ship
This is a study build, not a clinical tool. The goal was to compare five model families on the same small dataset and see what their disagreement looks like. The probabilities are not calibrated for real-world use, and nothing here should feed a medical decision.
Stack: built with scikit-learn, xgboost, RDKit, and Streamlit. Dataset: InterDIA (UCI repository ID 1104), drug compounds labelled for drug-induced autoimmunity risk, split into 477 training and 120 test compounds (597 in total).