AI/ML for Drug Discovery and Precision Oncology

Translating variant interpretation into therapeutic opportunity

Background

A persistent challenge in precision oncology is that most cancer patients carry rare variants of unknown significance (VUS). These mutations are often overlooked in drug development because they occur too infrequently to provide the statistical power that frequency-based association requires. At the same time, common oncogenic variants (e.g., in PIK3CA and TP53) have well-established drug associations that drive current therapeutic strategies.

AI/ML provides a framework to bridge this gap:

  • Leverage clusters of common variants to define functional and drug-response signatures.

  • Repurpose those signatures to annotate and prioritize rare variants that share structural or pathway-level similarity.

  • Enable scalable variant-to-drug mapping, so that even low-frequency mutations can be connected to therapeutic opportunities.
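
The signature-transfer idea in the bullets above can be illustrated with a nearest-signature lookup. This is a minimal sketch, not the pipeline described on this page: it assumes each common-variant cluster is summarized as a vector of pathway activity scores, and that a rare variant inherits the signature it most resembles by cosine similarity. The pathway axes, cluster names, values, and the 0.8 similarity cutoff are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero profile carries no signal
    return dot / (norm_u * norm_v)


def match_signature(profile, signatures, min_similarity=0.8):
    """Return (name, similarity) of the closest signature, or (None, best) below the cutoff."""
    best_name, best_sim = None, -1.0
    for name, signature in signatures.items():
        sim = cosine(profile, signature)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < min_similarity:
        return None, best_sim  # too dissimilar: leave the variant unannotated
    return best_name, best_sim


# Hypothetical cluster signatures: scores on three pathway axes (e.g. PI3K/AKT, ESR1, EZH2)
signatures = {
    "PIK3CA_hotspot_cluster": [2.0, 0.5, -0.3],
    "TP53_cluster_A": [-0.2, 1.8, 1.1],
}
rare_variant_profile = [1.8, 0.6, -0.1]  # hypothetical profile of one rare variant
name, sim = match_signature(rare_variant_profile, signatures)
print(name, round(sim, 3))
```

The similarity cutoff is the key design choice: without it, every rare variant would be forced onto some signature, however poor the match.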

Phase 1: AI-Driven Variant Annotation (complete)

  • Problem: Rare variants lack annotation, limiting clinical actionability.

  • Approach: Developed AI/ML pipelines to cluster variants in 3D protein space and link them to phenotypic readouts such as ESR1/EZH2 pathway activity.

    • Methods: Density-based clustering, Random Forest

    • Data: cancer cell line data from DepMap/CCLE; variant annotations from ClinVar

  • Outcome:

    • Identified common variant clusters (e.g., PIK3CA hotspots) enriched for sensitivity to mTORC and AKT inhibitors.

    • Repurposed these associations to annotate rare variants mapping to the same clusters.

    • Highlighted TP53 clusters with opposing effects on ESR1 signaling, suggesting divergent therapeutic responses.
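
The two Phase 1 steps (density-based clustering of variants in 3D protein space, then annotating rare variants that fall into the same clusters) can be sketched together. This is a minimal illustration under stated assumptions, not the pipeline used in this work: it uses DBSCAN as one example of density-based clustering (the algorithm is not named above), represents each mutated residue by hypothetical x, y, z coordinates in angstroms (e.g. from the Cα atoms of a structure), and lets a rare variant inherit the annotation of its nearest clustered residue within `eps`. All coordinates, parameter values, and annotation strings are hypothetical, and in practice a library implementation such as scikit-learn's `DBSCAN` would replace the hand-rolled version.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, with -1 marking noise."""
    n = len(points)
    # Precompute eps-neighbourhoods (each point counts as its own neighbour).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # non-core point reached: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep growing the cluster
    return labels


def annotate_rare_variant(rare_coord, common_coords, common_labels,
                          cluster_annotations, eps):
    """Return the annotation of the closest clustered common variant within eps, else None."""
    best_label, best_dist = None, float("inf")
    for coord, label in zip(common_coords, common_labels):
        if label == -1:
            continue  # noise points belong to no cluster
        d = math.dist(rare_coord, coord)
        if d <= eps and d < best_dist:
            best_label, best_dist = label, d
    if best_label is None:
        return None  # no cluster nearby: the variant stays unannotated
    return cluster_annotations.get(best_label)


# Hypothetical residue coordinates (angstroms) of common variants
common_coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
                 (20, 20, 20), (21, 20, 20), (20, 21, 20), (50, 0, 0)]
labels = dbscan(common_coords, eps=2.0, min_samples=3)
print(labels)  # → [0, 0, 0, 0, 1, 1, 1, -1]

# Hypothetical per-cluster annotations derived from drug-response data
cluster_annotations = {
    0: "hotspot-like cluster: candidate mTORC/AKT inhibitor sensitivity",
    1: "second cluster: candidate altered ESR1 signaling",
}
print(annotate_rare_variant((0.5, 0.5, 1.0), common_coords, labels,
                            cluster_annotations, eps=2.0))
```

The two compact groups come out as clusters 0 and 1 and the isolated residue as noise. Strict DBSCAN would attach a new point only through a core point; the nearest-clustered-residue rule here is a simplification.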

Phase 2: AI/ML Pipelines for Drug Response Prediction (complete)

  • Idea: Common variant clusters can act as templates for drug-response phenotypes, extending therapeutic predictions to rare variants.

  • Approach: Built supervised ML pipelines trained on pharmacogenomics datasets to generalize drug-response predictions from common → rare clusters.

    • Methods: Density-based clustering, Random Forest, XGBoost, Graph Neural Networks

    • Data: pharmacogenomics datasets such as the Cancer Therapeutics Response Portal (CTRP) and Genomics of Drug Sensitivity in Cancer (GDSC)

  • Outcome:

    • Annotated >12,000 variants across breast cancer datasets.

    • Connected rare variants to existing targeted therapies via shared functional clusters.

    • Implication: this work could expand the range of precision oncology therapies available to patients by 25%.
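
Generalizing drug-response predictions from common to rare clusters suggests evaluating with whole clusters held out, so a model cannot look accurate merely by recalling near-identical training variants. The sketch below shows leave-one-cluster-out evaluation under stated assumptions: as a stand-in for Random Forest, XGBoost, or a GNN it uses a plain k-nearest-neighbour regressor, and the features, cluster IDs, and response values (e.g. a drug AUC) are hypothetical. With scikit-learn, `LeaveOneGroupOut` combined with `RandomForestRegressor` would be the more usual route.

```python
import math
from statistics import mean


def knn_predict(train, query, k):
    """Predict a drug response as the mean response of the k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s["features"], query))[:k]
    return mean(s["response"] for s in nearest)


def leave_one_cluster_out(samples, k=2):
    """Hold out each variant cluster in turn; return the mean absolute error per cluster."""
    errors = {}
    for held_out in sorted({s["cluster"] for s in samples}):
        train = [s for s in samples if s["cluster"] != held_out]
        test = [s for s in samples if s["cluster"] == held_out]
        errors[held_out] = mean(
            abs(knn_predict(train, s["features"], k) - s["response"]) for s in test
        )
    return errors


# Hypothetical samples: two features per variant and a drug-response value (e.g. AUC)
samples = [
    {"cluster": "A", "features": [0.0, 0.0], "response": 0.20},
    {"cluster": "A", "features": [0.1, 0.0], "response": 0.25},
    {"cluster": "B", "features": [0.2, 0.1], "response": 0.30},
    {"cluster": "B", "features": [0.1, 0.2], "response": 0.28},
    {"cluster": "C", "features": [1.0, 1.0], "response": 0.80},
    {"cluster": "C", "features": [1.1, 0.9], "response": 0.85},
]

errors = leave_one_cluster_out(samples, k=2)
print(errors)
```

In this toy data, cluster C sits far from the other clusters, so its held-out error is much larger than that of A or B; exposing that kind of gap is the point of splitting by cluster rather than by individual variant.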
