Simplifying the Search for Drug Targets

A new machine learning model promises fast prediction of drug-target interactions.

Written by Aparna Nathan, PhD
ConPlex predicts what proteins a drug is likely to bind, which can help identify new targets for existing drugs.

Thousands of proteins in our body may contribute to disease, but figuring out which drugs can target them remains one of the most challenging problems in drug discovery. Testing pairs of proteins and drugs in the laboratory is time consuming and expensive, and computational simulations demand enormous computing power. “That doesn’t scale to levels where you can scan an entire genome or massive [drug] compound libraries,” said Rohit Singh, a computational biologist at the Massachusetts Institute of Technology.

These challenges motivated Singh and Samuel Sledzieski, a fellow computational biologist at the Massachusetts Institute of Technology, to develop a simpler computational method to predict whether drugs and proteins bind. Their approach, called ConPlex, was recently published in the Proceedings of the National Academy of Sciences.1 Unlike more complicated methods that use 3D protein structure models, ConPlex only requires the sequences of the proteins and simple descriptions of the candidate drugs.

The researchers first fed protein sequences into a protein language model inspired by increasingly common text-generating algorithms such as autofill or ChatGPT.2 “[Text algorithms] are basically just predicting what the next thing should be based on what has come before,” Sledzieski said. “These properties of these algorithms apply really nicely to proteins because they are also a linear chain.” While text algorithms use large amounts of text data to predict the rest of a sentence or answer questions, protein language models use information about millions of protein sequences to identify key features that can predict a protein’s properties.3
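To make the "predict the next thing" idea concrete, here is a deliberately tiny sketch in Python. It is not a protein language model and is not taken from the study: it only counts which amino acid letter most often follows another in a few made-up sequences. The function names and the toy sequences are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each residue letter follows each other residue letter."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the residue seen most often after `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Made-up sequences for illustration only; these are not real proteins.
toy_sequences = ["MKTAYIAK", "MKTLLVAK", "MKTAYLAK"]
model = train_bigram(toy_sequences)

print(predict_next(model, "M"))
print(predict_next(model, "T"))
```

Real protein language models learn far richer patterns from millions of sequences, but the underlying task of guessing what comes next from what came before is the same in spirit.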

Then, the researchers built ConPlex, a machine learning algorithm that can be used by other scientists to predict whether a drug will bind to a protein based on key features extracted by the protein language model and a set of known protein-drug interactions. ConPlex also incorporates information about drugs that are known not to bind to proteins, despite looking similar to drugs that do bind, so that the model can identify subtle features that might promote binding. The researchers found that ConPlex was fast and accurate, even when predicting the binding of new drugs or proteins that the model hadn’t encountered before.
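The role of the look-alike non-binders can be sketched with a triplet margin loss, one common way to express contrastive learning. This is an illustrative assumption, not ConPlex's actual code: the two-dimensional vectors, the function names, and the margin value are all hypothetical. The loss is zero when a protein sits closer to its true binder than to a decoy by at least the margin, and it grows when the decoy is too close.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(protein, binder, decoy, margin=0.5):
    """Penalize cases where the decoy is not farther from the protein
    than the true binder by at least `margin` (cosine distance)."""
    d_pos = 1.0 - cosine_similarity(protein, binder)
    d_neg = 1.0 - cosine_similarity(protein, decoy)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings in a shared drug-protein space.
protein = [1.0, 0.0]
binder = [1.0, 0.0]  # points the same way as the protein
decoy = [0.0, 1.0]   # structurally similar drug that does not bind

well_separated = triplet_loss(protein, binder, decoy)
confused = triplet_loss(protein, decoy, binder)  # roles swapped on purpose

print(well_separated, confused)
```

During training, a model would adjust the embeddings to push this kind of loss toward zero, which is how decoys can teach it the subtle differences between drugs that bind and drugs that merely look similar.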

In future iterations, Singh and Sledzieski hope to incorporate additional elements into the model, such as how multiple drugs might interact and the effect of mutations on drug-target binding. Ozlem Garibay, a computer scientist at the University of Central Florida who was not involved in the study, agreed that more details about the proteins could further improve performance. “Simplicity can be a strength,” she said. “But it may be limiting here because [proteins] are three-dimensional structures.”

The researchers have made ConPlex freely available online for scientists to use to find new drugs that target a protein or to identify existing drugs that can be repurposed to target proteins in other diseases. According to Sledzieski, while ConPlex will not offer the final word on whether a drug will work, it can prioritize promising candidates for further study.

ConPlex may even have a role to play in clinical trials because it can predict potential off-target binding that could lead to unwanted side effects. “The failure rate for drugs [in clinical trials] is very high,” Singh said. “The earlier you can model off-target effects into your computational pipeline, the earlier you can say ‘This drug looks interesting, but it is just not a good idea.’”

  1. Singh R, Sledzieski S, et al. Contrastive learning in protein language space predicts interactions between drugs and protein targets. PNAS. 120(24), e2220778120 (2023).
  2. Brandes N, et al. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 38(8), 2102-2110 (2022).
  3. Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst. 12(6), 654-669.e3 (2021).

Meet the Author

  • Aparna Nathan, PhD

    Aparna is a freelance science writer with a PhD in bioinformatics and genomics from Harvard University. She uses her multidisciplinary training to find both the cutting-edge science and the human stories in everything from genetic testing to space expeditions. She was a 2021 AAAS Mass Media Fellow at the Philadelphia Inquirer. Her writing has also appeared in Popular Science, PBS NOVA, and The Open Notebook.