Simplifying the Search for Drug Targets

A new machine learning model promises fast prediction of drug-target interactions.

Written by Aparna Nathan, PhD
ConPlex predicts what proteins a drug is likely to bind, which can help identify new targets for existing drugs.

Thousands of proteins in our body may contribute to disease, but one of the most challenging problems is figuring out which drugs can target them. Testing pairs of proteins and drugs in a laboratory setting is time-consuming and expensive, and computational simulations demand massive computing power. “That doesn’t scale to levels where you can scan an entire genome or massive [drug] compound libraries,” said Rohit Singh, a computational biologist at the Massachusetts Institute of Technology.

These challenges motivated Singh and Samuel Sledzieski, a fellow computational biologist at the Massachusetts Institute of Technology, to develop a simpler computational method to predict whether drugs and proteins bind. Their approach, called ConPlex, was recently published in the Proceedings of the National Academy of Sciences.1 Unlike more complicated methods that use 3D protein structure models, ConPlex only requires the sequences of the proteins and simple descriptions of the candidate drugs.

The researchers first fed protein sequences into a protein language model inspired by increasingly common text-generating algorithms such as autofill or ChatGPT.2 “[Text algorithms] are basically just predicting what the next thing should be based on what has come before,” Sledzieski said. “These properties of these algorithms apply really nicely to proteins because they are also a linear chain.” While text algorithms use large amounts of text data to predict the rest of a sentence or answer questions, protein language models use information about millions of protein sequences to identify key features that can predict a protein’s properties.3
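To make the "predict the next thing" idea concrete, here is a toy sketch in Python. It is not a protein language model and not anything the researchers used: it simply counts which amino acid letter tends to follow which in a few made-up sequences, then guesses the most common follower. The sequences and the function names `train_bigram` and `predict_next` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count which residue follows each residue across the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Made-up sequences for illustration only (not real proteins).
toy_sequences = ["MKTAYIAK", "MKTLLVAK", "MKVAYLAK"]
model = train_bigram(toy_sequences)
print(predict_next(model, "M"))  # "M" is always followed by "K" in the toy data
```

Real protein language models learn far richer patterns from millions of sequences, but the underlying task of predicting what comes next in a linear chain is the same in spirit.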

Then, the researchers built ConPlex, a machine learning algorithm that can be used by other scientists to predict whether a drug will bind to a protein based on key features extracted by the protein language model and a set of known protein-drug interactions. ConPlex also incorporates information about drugs that are known not to bind to proteins, despite looking similar to drugs that do bind, so that the model can identify subtle features that might promote binding. The researchers found that ConPlex was fast and accurate, even when predicting the binding of new drugs or proteins that the model hadn’t encountered before.
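The idea of learning from look-alike drugs that do not bind can be sketched with a simple contrastive (triplet) loss. This is a minimal illustration, not the ConPlex code: the two-dimensional vectors, the margin value, and the function names `cosine` and `triplet_loss` are assumptions chosen for readability. The loss is zero when a known binder sits closer to the protein than the non-binding decoy by at least the margin, and positive otherwise.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_loss(protein, binder, decoy, margin=0.5):
    """Penalize the model unless the binder is more similar to the protein
    than the decoy is, by at least `margin`."""
    return max(0.0, margin - cosine(protein, binder) + cosine(protein, decoy))


# Hypothetical embeddings: in a real model these would be learned, high-dimensional vectors.
protein = [1.0, 0.0]
binder = [0.9, 0.1]   # a drug known to bind: points nearly the same way as the protein
decoy = [0.0, 1.0]    # a look-alike drug known not to bind: points elsewhere

print(triplet_loss(protein, binder, decoy))  # well separated, so no penalty
print(triplet_loss(protein, decoy, binder))  # roles swapped, so the penalty is positive
```

During training, a penalty like this would push true binders toward their target protein and decoys away from it in the shared embedding space, which is one way a model could pick up the subtle features that separate similar-looking molecules.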

In future iterations, Singh and Sledzieski hope to incorporate additional elements into the model, such as how multiple drugs might interact and the effect of mutations on drug-target binding. Ozlem Garibay, a computer scientist at the University of Central Florida who was not involved in the study, agreed that more details about the proteins could further improve performance. “Simplicity can be a strength,” she said. “But it may be limiting here because [proteins] are three-dimensional structures.”

The researchers have made ConPlex freely available online for scientists to use to find new drugs that target a protein or to identify existing drugs that can be repurposed to target proteins in other diseases. According to Sledzieski, while ConPlex will not offer the final word on whether a drug will work, it can prioritize promising candidates for further study.

ConPlex may even have a role to play in clinical trials because it can predict potential off-target binding that could lead to unwanted side effects. “The failure rate for drugs [in clinical trials] is very high,” Singh said. “The earlier you can model off-target effects into your computational pipeline, the earlier you can say ‘This drug looks interesting, but it is just not a good idea.’”

  1. Singh R, Sledzieski S, et al. Contrastive learning in protein language space predicts interactions between drugs and protein targets. PNAS. 120(24), e2220778120 (2023).
  2. Brandes N, et al. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 38(8), 2102-2110 (2022).
  3. Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst. 12(6), 654-669.e3 (2021).

Meet the Author

  • Aparna Nathan, PhD

    Aparna is a freelance science writer with a PhD in bioinformatics and genomics from Harvard University. She uses her multidisciplinary training to find both the cutting-edge science and the human stories in everything from genetic testing to space expeditions. She was a 2021 AAAS Mass Media Fellow at the Philadelphia Inquirer. Her writing has also appeared in Popular Science, PBS NOVA, and The Open Notebook.
