Simplifying the Search for Drug Targets

A new machine learning model promises fast prediction of drug-target interactions.

Aparna Nathan, PhD
ConPlex predicts what proteins a drug is likely to bind, which can help identify new targets for existing drugs.


https://www.istockphoto.com/photo/medicines-health-disease-medical-concepts-background-gm1156386917-315136145




Thousands of proteins in the human body may contribute to disease, but figuring out which drugs can target them remains one of the most challenging problems. Testing protein-drug pairs in the laboratory is time-consuming and expensive, and computational simulations demand enormous computing power. “That doesn’t scale to levels where you can scan an entire genome or massive [drug] compound libraries,” said Rohit Singh, a computational biologist at the Massachusetts Institute of Technology.

These challenges motivated Singh and Samuel Sledzieski, a fellow computational biologist at the Massachusetts Institute of Technology, to develop a simpler computational method to predict whether drugs and proteins bind. Their approach, called ConPlex, was recently published in the Proceedings of the National Academy of Sciences.1 Unlike more complicated methods that use 3D protein structure models, ConPlex only requires the sequences of the proteins and simple descriptions of the candidate drugs.

The researchers first fed protein sequences into a protein language model inspired by increasingly common text-generating algorithms such as autofill or ChatGPT.2 “[Text algorithms] are basically just predicting what the next thing should be based on what has come before,” Sledzieski said. “These properties of these algorithms apply really nicely to proteins because they are also a linear chain.” While text algorithms use large amounts of text data to predict the rest of a sentence or answer questions, protein language models use information about millions of protein sequences to identify key features that can predict a protein’s properties.3
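The idea of "predicting what the next thing should be based on what has come before" can be illustrated with a deliberately tiny sketch. This is not the protein language model used in the study, which learns far richer features from millions of sequences; it is a simple frequency count over a few made-up amino acid strings, and the names `train_bigram_model` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sequences):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prefix):
    """Guess the next residue from the last residue of the prefix.

    Returns None when the prefix is empty or its last residue
    was never seen with a successor during training.
    """
    if not prefix or prefix[-1] not in counts:
        return None
    return counts[prefix[-1]].most_common(1)[0][0]


# Made-up toy sequences (one-letter amino acid codes), for illustration only.
toy_sequences = ["MKTAYIAKQR", "MKTLLVAGAK", "MKTAYLAKQL"]
model = train_bigram_model(toy_sequences)

# In the toy data, K is followed by T three times and by Q twice.
print(predict_next(model, "MK"))
```

A real protein language model replaces these raw counts with a neural network that considers the whole sequence, but the linear, left-to-right framing is the same one Sledzieski describes.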

Then, the researchers built ConPlex, a machine learning algorithm that can be used by other scientists to predict whether a drug will bind to a protein based on key features extracted by the protein language model and a set of known protein-drug interactions. ConPlex also incorporates information about drugs that are known not to bind to proteins, despite looking similar to drugs that do bind, so that the model can identify subtle features that might promote binding. The researchers found that ConPlex was fast and accurate, even when predicting the binding of new drugs or proteins that the model hadn’t encountered before.
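One way to picture how known non-binders can sharpen a model is a contrastive objective, in which a protein's representation is pulled toward a true binder and pushed away from a similar-looking decoy. The sketch below assumes that proteins and drugs are each represented as plain numeric vectors, compared by cosine similarity, and trained with a triplet margin loss. The vectors, the margin values, and the function names are illustrative assumptions, not the published ConPlex implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin_loss(protein, binder, decoy, margin=0.5):
    """Penalty that reaches zero once the binder is closer to the
    protein than the decoy is, by at least `margin` (cosine distance)."""
    dist_binder = 1.0 - cosine_similarity(protein, binder)
    dist_decoy = 1.0 - cosine_similarity(protein, decoy)
    return max(0.0, dist_binder - dist_decoy + margin)


def rank_candidates(protein, drugs):
    """Order drug names from most to least similar to the protein."""
    return sorted(
        drugs,
        key=lambda name: cosine_similarity(protein, drugs[name]),
        reverse=True,
    )


# Hypothetical 2D embeddings, chosen by hand for illustration.
protein = [1.0, 0.0]
drugs = {"binder": [0.9, 0.1], "decoy": [0.6, 0.8]}

# With a large margin the decoy is still "too close", so the loss is positive.
print(triplet_margin_loss(protein, drugs["binder"], drugs["decoy"], margin=0.5) > 0)
# Ranking by similarity puts the true binder first.
print(rank_candidates(protein, drugs))
```

During training, a loss like this would be minimized over many such triplets so that decoys drift away from the proteins they do not bind; at prediction time, only the similarity score is needed, which is one reason such an approach can be fast.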

In future iterations, Singh and Sledzieski hope to incorporate additional elements into the model, such as how multiple drugs might interact and the effect of mutations on drug-target binding. Ozlem Garibay, a computer scientist at the University of Central Florida who was not involved in the study, agreed that more details about the proteins could further improve performance. “Simplicity can be a strength,” she said. “But it may be limiting here because [proteins] are three-dimensional structures.”

The researchers have made ConPlex freely available online for scientists to use to find new drugs that target a protein or to identify existing drugs that can be repurposed to target proteins in other diseases. According to Sledzieski, while ConPlex will not offer the final word on whether a drug will work, it can prioritize promising candidates for further study.

ConPlex may even have a role to play in clinical trials because it can predict potential off-target binding that could lead to unwanted side effects. “The failure rate for drugs [in clinical trials] is very high,” Singh said. “The earlier you can model off-target effects into your computational pipeline, the earlier you can say ‘This drug looks interesting, but it is just not a good idea.’”

  1. Singh R, Sledzieski S, et al. Contrastive learning in protein language space predicts interactions between drugs and protein targets. PNAS. 120(24), e2220778120 (2023).
  2. Brandes N, et al. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics. 38(8), 2102-2110 (2022).
  3. Bepler T, Berger B. Learning the protein language: Evolution, structure, and function. Cell Syst. 12(6), 654-669.e3 (2021).


Meet the Author

  • Aparna Nathan, PhD


    Aparna is a freelance science writer with a PhD in bioinformatics and genomics from Harvard University. Her writing has also appeared in The Philadelphia Inquirer, Popular Science, PBS NOVA, and more.