Simplifying the Search for Drug Targets

A new machine learning model promises fast prediction of drug-target interactions.

Written by Aparna Nathan, PhD
ConPlex predicts what proteins a drug is likely to bind, which can help identify new targets for existing drugs.

Thousands of proteins in our body may contribute to disease, but one of the most challenging problems is figuring out which drugs can target them. Testing pairs of proteins and drugs in a laboratory setting is time-consuming and expensive, and detailed computational simulations demand enormous computing resources. “That doesn’t scale to levels where you can scan an entire genome or massive [drug] compound libraries,” said Rohit Singh, a computational biologist at the Massachusetts Institute of Technology.

These challenges motivated Singh and Samuel Sledzieski, a fellow computational biologist at MIT, to develop a simpler computational method to predict whether drugs and proteins bind. Their approach, called ConPlex, was recently published in the Proceedings of the National Academy of Sciences.1 Unlike more complicated methods that rely on 3D models of protein structure, ConPlex requires only the sequences of the proteins and simple descriptions of the candidate drugs.

The researchers first fed protein sequences into a protein language model, a type of algorithm inspired by increasingly common text-generating tools such as autocomplete or ChatGPT.2 “[Text algorithms] are basically just predicting what the next thing should be based on what has come before,” Sledzieski said. “These properties of these algorithms apply really nicely to proteins because they are also a linear chain.” While text algorithms learn from large amounts of text to predict the rest of a sentence or answer questions, protein language models learn from millions of protein sequences to identify key features that can predict a protein’s properties.3
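To make Sledzieski’s point about “predicting what the next thing should be” concrete, here is a deliberately tiny illustration: a bigram counter that predicts the next amino acid from the one before it. This is not ProteinBERT or any model used in the study; real protein language models are deep neural networks trained on millions of sequences, and it is their learned internal representations, rather than the predictions themselves, that downstream tools use as features. The training strings and function names below are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training set" of short amino-acid strings (one letter per residue).
TRAINING_SEQUENCES = ["MKTAYIAK", "MKTAYLAK", "MKVAYIAK"]

def train_bigram(sequences):
    """Count how often each residue follows each preceding residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the residue most often seen after `prev`, or None if `prev` was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

model = train_bigram(TRAINING_SEQUENCES)
print(predict_next(model, "M"))  # prints K (M is followed by K in all three strings)
```

A real model conditions on the whole preceding (or surrounding) sequence rather than a single residue, which is what lets it capture properties far beyond simple letter statistics.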

Next, the researchers built ConPlex, a machine learning model that other scientists can use to predict whether a drug will bind to a protein. It draws on the key features extracted by the protein language model and on a set of known protein-drug interactions. ConPlex also learns from compounds that are known not to bind a given protein despite looking similar to drugs that do, so that the model can pick out the subtle features that distinguish true binders. The researchers found that ConPlex was fast and accurate, even when predicting binding for drugs or proteins that the model had not encountered before.
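The paper’s title describes the approach as contrastive learning in protein language space (reference 1). The sketch below illustrates that general idea under our own assumptions and is not the authors’ code: protein features and drug features are each mapped by a learned matrix into a common space, a binding score is read from the cosine similarity of the two mapped vectors, and a triplet margin loss rewards placing a known binder closer to its protein than a look-alike non-binder. In practice the protein features would come from a protein language model and the drug features from a numeric description such as a molecular fingerprint; the matrix sizes, the margin, and all function names here are hypothetical, and training the weights is not shown.

```python
import math

def project(weights, features):
    """Map a feature vector into the shared space: a linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def cosine(a, b):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def binding_score(w_protein, w_drug, protein_features, drug_features):
    """Similarity of a protein and a drug after both are projected into the shared space.
    The ReLU makes projected vectors non-negative, so the score lies in [0, 1]."""
    return cosine(project(w_protein, protein_features), project(w_drug, drug_features))

def triplet_margin_loss(w_protein, w_drug, protein, binder, non_binder, margin=0.5):
    """Zero once the binder sits at least `margin` closer (in cosine distance) to the
    protein than the look-alike non-binder does; positive otherwise."""
    d_pos = 1.0 - binding_score(w_protein, w_drug, protein, binder)
    d_neg = 1.0 - binding_score(w_protein, w_drug, protein, non_binder)
    return max(0.0, d_pos - d_neg + margin)

# Usage with hypothetical 2-D features and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
protein, binder, decoy = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(triplet_margin_loss(identity, identity, protein, binder, decoy))  # 0.0: correct ordering
print(triplet_margin_loss(identity, identity, protein, decoy, binder))  # 1.5: ordering is wrong
```

Training would adjust the two projection matrices to drive this loss down across many triplets, which is one way a model can learn to separate true binders from compounds that merely resemble them.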

In future iterations, Singh and Sledzieski hope to incorporate additional elements into the model, such as how multiple drugs might interact and the effect of mutations on drug-target binding. Ozlem Garibay, a computer scientist at the University of Central Florida who was not involved in the study, agreed that more details about the proteins could further improve performance. “Simplicity can be a strength,” she said. “But it may be limiting here because [proteins] are three-dimensional structures.”

The researchers have made ConPlex freely available online, so scientists can use it to find new drugs that target a protein or to identify existing drugs that could be repurposed against proteins involved in other diseases. According to Sledzieski, while ConPlex will not offer the final word on whether a drug will work, it can prioritize promising candidates for further study.

ConPlex may even have a role to play in clinical trials because it can predict potential off-target binding that could lead to unwanted side effects. “The failure rate for drugs [in clinical trials] is very high,” Singh said. “The earlier you can model off-target effects into your computational pipeline, the earlier you can say ‘This drug looks interesting, but it is just not a good idea.’”
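Once a model produces a predicted binding score for a drug against a panel of proteins, screening for possible off-target binding can be as simple as flagging every protein other than the intended target whose score clears a cutoff. The sketch below assumes scores between 0 and 1; the protein names, score values, threshold, and function name are all hypothetical, and a flagged protein is only a prediction to check experimentally, not a confirmed interaction.

```python
def flag_off_targets(scores, intended_target, threshold=0.8):
    """Return (protein, score) pairs, highest score first, for every protein other
    than the intended target whose predicted binding score meets the threshold."""
    hits = [
        (name, score)
        for name, score in scores.items()
        if name != intended_target and score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical predicted scores for one drug against four proteins.
predicted = {"TARGET_X": 0.95, "PROTEIN_A": 0.85, "PROTEIN_B": 0.40, "PROTEIN_C": 0.90}
print(flag_off_targets(predicted, "TARGET_X"))
# [('PROTEIN_C', 0.9), ('PROTEIN_A', 0.85)]
```

The same ranking logic, applied to many drugs against a single protein, is one way to shortlist repurposing candidates for follow-up.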

  1. Singh R, Sledzieski S, et al. Contrastive learning in protein language space predicts interactions between drugs and protein targets. PNAS. 120(24), e2220778120 (2023).
  2. Brandes N, et al. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 38(8), 2102-2110 (2022).
  3. Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst. 12(6), 654-669.e3 (2021).

Meet the Author

  • Aparna Nathan, PhD

    Aparna is a freelance science writer with a PhD in bioinformatics and genomics from Harvard University. She uses her multidisciplinary training to find both the cutting-edge science and the human stories in everything from genetic testing to space expeditions. She was a 2021 AAAS Mass Media Fellow at the Philadelphia Inquirer. Her writing has also appeared in Popular Science, PBS NOVA, and The Open Notebook.
