Benchmarking Explainability of Molecular Machine Learning with WISP

Speaker: Jonny Proppe, Technische Universität Braunschweig

Interpretable machine learning is crucial for building trust in predictive models, especially in chemistry and drug discovery. We present WISP (Workflow for Interpretability Scoring using Pairs), a framework implemented in Python and designed to systematically benchmark the explainability of machine-learning methods in the context of molecular property prediction. It combines scoring metrics with matched molecular pair (MMP) analysis, providing both quantitative and qualitative insights into how well model explanations align with known structure–property relationships.
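To illustrate the underlying idea (not the actual WISP API, which is not shown here), the following minimal Python sketch assumes precomputed matched molecular pairs, atom-level attributions from some trained model, and measured property values, and scores how often the attributions on the varied substructure point in the same direction as the observed property change. All class and function names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class MatchedPair:
    """A matched molecular pair: two molecules differing in one substructure."""
    mol_a: str                  # SMILES of molecule A
    mol_b: str                  # SMILES of molecule B
    changed_atoms_a: list[int]  # atom indices of the varied substructure in A
    changed_atoms_b: list[int]  # atom indices of the varied substructure in B
    property_a: float           # measured property of A (e.g. logP)
    property_b: float           # measured property of B

def directional_agreement(pair: MatchedPair,
                          attributions_a: dict[int, float],
                          attributions_b: dict[int, float]) -> bool:
    """Check whether the attributions on the varied substructure point in the
    same direction as the observed property change across the pair."""
    attributed_delta = (sum(attributions_b.get(i, 0.0) for i in pair.changed_atoms_b)
                        - sum(attributions_a.get(i, 0.0) for i in pair.changed_atoms_a))
    observed_delta = pair.property_b - pair.property_a
    return attributed_delta * observed_delta > 0

def agreement_score(pairs: list[MatchedPair],
                    attributions: dict[str, dict[int, float]]) -> float:
    """Fraction of matched pairs whose explanations agree with the measured change."""
    hits = sum(directional_agreement(p, attributions[p.mol_a], attributions[p.mol_b])
               for p in pairs)
    return hits / len(pairs)

# Toy usage: methane -> methanol, where illustrative attributions assign the
# logP drop to the added hydroxyl oxygen.
pair = MatchedPair("C", "CO", [0], [1], 1.09, -0.74)
attrs = {"C": {0: 0.9}, "CO": {0: 0.6, 1: -1.2}}
print(agreement_score([pair], attrs))  # 1.0 -> explanation matches the observed trend

Aggregating such pair-level checks over a dataset gives one possible quantitative explainability score; the qualitative side comes from inspecting individual pairs where model and chemistry disagree.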
Alongside WISP, we introduce a descriptor- and model-agnostic atom attributor that generates robust atom-level explanations. Applied to diverse datasets—including Crippen logP, experimental logP, aqueous solubility, LCAP reaction yields, Factor Xa binding affinities, and AMES mutagenicity—WISP reveals how explanation quality depends on model performance and dataset complexity. For reaction yield prediction, WISP exposes functional group effects that would otherwise remain hidden, supporting rational molecular design and synthesis planning.
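The specific attributor presented in the talk is not reproduced here; as a hedged sketch, the snippet below shows one common way to obtain model-agnostic atom-level attributions, namely occlusion: remove one atom at a time and record how the prediction changes. The predict callable stands for any trained model wrapped as SMILES -> property; all names are assumptions for illustration.

from rdkit import Chem

def occlusion_atom_attributions(smiles: str, predict) -> dict[int, float]:
    """Attribute a prediction to atoms by removing one atom at a time and
    recording how much the predicted property changes."""
    mol = Chem.MolFromSmiles(smiles)
    baseline = predict(smiles)
    attributions = {}
    for idx in range(mol.GetNumAtoms()):
        editable = Chem.RWMol(mol)
        editable.RemoveAtom(idx)
        try:
            editable.UpdatePropertyCache(strict=False)
            perturbed = Chem.MolToSmiles(editable)
            attributions[idx] = baseline - predict(perturbed)
        except Exception:
            attributions[idx] = 0.0  # skip atoms whose removal breaks the molecule
    return attributions

# Toy usage with a stand-in "model" (heavy-atom count as the predicted property).
toy_model = lambda smi: float(Chem.MolFromSmiles(smi).GetNumAtoms())
print(occlusion_atom_attributions("CCO", toy_model))  # each atom contributes ~1.0

Because such attributions depend only on model predictions, not on a particular descriptor or architecture, they can be compared across models within the same benchmarking workflow.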
Together, WISP and our atom attributor form a flexible toolkit for benchmarking and improving explainability, enabling more reliable predictions and better-informed decision-making.