Cheminformatics: Redefining the Crucible

Apr 15, 2002
Chris Smith
A key facet of any drug discovery effort is the library-screening step in which researchers test millions of chemical compounds for a desired property. Scientists must try to determine what the resulting leads have in common, chemically speaking, and then they must scour chemical libraries for other candidate compounds containing those features. This, fundamentally, is the challenge of cheminformatics. Scott Hutton, president and CEO of San Diego-based ChemNavigator Inc., defines cheminformatics as the application of techniques and technology associated with managing chemical structure databases, especially those of small, drug-like structures.

Though similar in etymology to bioinformatics, which Hutton describes as "the new kid on the block," cheminformatics presents developers with challenges that bioinformaticians need not face. Cheminformatics software manipulates data that describe a compound's three-dimensional structure, while bioinformatics problems deal with long strings of nucleotides or amino acids. Chemical compounds, unlike those strings, are unconstrained in terms of their constituent parts (that is, they can contain any element), and these parts can be joined in a practically unlimited number of ways and assume myriad structures.

Cheminformatics applications generally perform one or more of the following five key functions: data mining, docking, defining quantitative structure-activity relationships (QSAR), pharmacophore mapping, and structure/substructure searching. Although these functions can be used separately, they achieve their greatest impact when used in an integrated environment where researchers can capitalize on the seamless synergy of data handling, filtering, and analysis.

A History Lesson

The term cheminformatics was first coined in 1998,1 but the field has existed for much longer. Cheminformatics can be traced back to Justus Liebig's Annalen der Pharmacie (1832), and later to Chemical Abstracts in 1907. These documents were, for all practical purposes, textual and two-dimensional descriptions of compounds, reaction mechanisms, and methods of synthesis and identification.

Modern cheminformatics developed as computational technologies advanced. In the 1950s and 1960s, early computational chemists applied mathematical graph theory to two-dimensional structure searches of chemical structure databases. Electronic warehousing of textual information, and the creation of indexing and searching tools for these information repositories, complemented these developments.

Cheminformatics advanced steadily over the next two decades, especially in the areas of structure searching, molecular manipulation, and visualization. The advent of combinatorial chemistry2 and high-throughput screening3 of the libraries, and the vast and complex array of experimental data that this research generated, put chemical information management and data analysis at the forefront of biotechnology and pharmaceutical research in the mid-1990s.

According to Hutton, two developments have accelerated cheminformatics' recent, rapid rise to prominence. The first is the Human Genome Project, which has led to the identification of new candidate drug targets using bioinformatics tools. The other is the plummeting cost of computational power. Cheminformatics used to be limited by processor speed, he says, but that is no longer the case.

Cheminformatics now straddles the worlds of computational chemistry and information science,3 and its applications and tools allow researchers to manage the staggering volume of information available today. These tools help users query and filter chemical data from literature, patent, compound, and experimental databases. Collectively, the tools help researchers understand basic biochemical processes at the molecular level, identify compounds with potential pharmaceutical or research value, and experiment with those reagents in the real and virtual worlds.

Scientists who use cheminformatics tools range from academicians studying basic atom arrangements in molecules to pharmaceutical researchers scanning vast compound libraries in the hunt for novel drugs. According to Hutton, this latter group represents the largest slice, by far, of the cheminformatics market.

Basic biological researchers have used cheminformatics to develop criteria that define potential pharmaceutical agents or classes of agents, which are then used in the drug discovery process. Key areas in basic research where cheminformatics plays a pivotal role are the development of pharmacophore maps—descriptors that define the physical, chemical, and structural make-up of a compound with pharmaceutical potential—and the examination of macromolecule-small molecule interactions, such as receptor-ligand and enzyme-inhibitor/activator associations.

Data Mining

Data mining is the nontrivial extraction of implicit, previously unknown and potentially useful information from data, or the search for relationships and global patterns that exist in databases.4 It is a key component of most computation-based fields, including proteomics, genomics, bioinformatics, physiomics, and structural bioinformatics/genomics.

The result of a data mining query will in many ways define a project's starting point, and may provide clues that influence the direction in which the research will proceed. Broadly, scientists can take one of two approaches to data mining: explicit and "shotgun" searching. The former employs an explicitly defined search of a specific dataset, for example, gene or protein expression microarrays, ligand-protein/DNA interaction data, or two-dimensional electrophoresis maps. This process helps filter out noise in the data, while highlighting events that may have biological or chemical significance.

Explicit searching is an integral part of every step in the drug discovery process, but pharmaceutical projects often begin with the shotgun approach, in which scientists query one or several databases using a broad set of search criteria. The results of this type of search place the researcher "in the ballpark" with regard to a starting point in the research plan. The entire discovery process is in essence a series of searching and filtering steps. The objective is to find relationships at each step, then re-examine and fine-tune those relationships, with the end result being the description of a new biological phenomenon, such as the interaction of a novel drug with a particular protein, inducing a desirable biochemical effect.
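The two-stage pattern of a broad "shotgun" query followed by an explicit, narrowly defined one can be sketched in a few lines of Python. This is only an illustration: the compound records, field names, and cutoff values below are hypothetical, and a real project would query far larger databases with richer criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float          # molecular weight, g/mol
    logp: float                # octanol-water partition coefficient
    ic50_um: Optional[float]   # measured potency in micromolar; None if untested


# Hypothetical records standing in for a compound database.
LIBRARY = [
    Compound("cmpd-A", 320.4, 2.1, 0.4),
    Compound("cmpd-B", 710.9, 4.0, 0.2),
    Compound("cmpd-C", 455.0, 6.3, 0.8),
    Compound("cmpd-D", 289.3, 1.5, None),
    Compound("cmpd-E", 410.2, 3.7, 12.0),
    Compound("cmpd-F", 498.6, 4.9, 0.9),
]


def shotgun_search(library, max_weight=600.0):
    """Broad first pass: keep anything at or below an assumed size cutoff."""
    return [c for c in library if c.mol_weight <= max_weight]


def explicit_search(candidates, max_logp=5.0, max_ic50_um=1.0):
    """Narrow second pass: explicit cutoffs on lipophilicity and measured potency."""
    return [
        c for c in candidates
        if c.logp <= max_logp
        and c.ic50_um is not None
        and c.ic50_um <= max_ic50_um
    ]


broad = shotgun_search(LIBRARY)      # puts the search "in the ballpark"
hits = explicit_search(broad)        # filters the ballpark down to leads
hit_names = [c.name for c in hits]
print(hit_names)
```

Each later stage of a project would add further filters of the same shape, which is why the discovery process reads as a chain of search-and-filter steps.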

Docking

"Docking" refers to the physical fit between a macromolecular target and a pharmaceutical compound, whether it is a substrate, inhibitor, or activator. Docking is a function of the conformation of each partner in the interaction, and is constrained by the energetics of that interaction. Docking algorithms seek to find the conformational orientations of target and drug that minimize the total energetics of the complex; this configuration represents the optimal target-drug interaction.
Courtesy of WaveFunction Inc.

The process is in many ways similar to three-dimensional structure searching, but in the context of drug discovery it describes the detailed visual examination of known macromolecule-small molecule interactions and virtual manipulation to explore the fine attributes of the interaction or manually "fine tune" a potential drug lead structure. Chemists can then use the manipulated structure or attributes in subsequent cheminformatic analyses.
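The core idea of a docking search, trying many orientations and keeping the one with the lowest interaction energy, can be shown with a deliberately small sketch. It assumes a rigid target and a rigid ligand in two dimensions, a simple Lennard-Jones pair potential, and an exhaustive grid of rotations and shifts; the coordinates are hypothetical. Production docking programs use three dimensions, flexible molecules, and far more elaborate scoring functions and search strategies.

```python
import math
from itertools import product


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy: repulsive at short range, weakly attractive beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def place(ligand, angle, dx, dy):
    """Rotate the rigid ligand about the origin by `angle` radians, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in ligand]


def interaction_energy(target, posed_ligand):
    """Sum pair energies over every target-ligand atom pair."""
    return sum(
        lennard_jones(max(math.dist(t, a), 1e-6))  # clamp to avoid division by zero
        for t in target
        for a in posed_ligand
    )


def dock(target, ligand, angles, shifts):
    """Exhaustive grid search; returns (energy, angle, dx, dy) of the lowest-energy pose."""
    best = None
    for angle, dx, dy in product(angles, shifts, shifts):
        energy = interaction_energy(target, place(ligand, angle, dx, dy))
        if best is None or energy < best[0]:
            best = (energy, angle, dx, dy)
    return best


# Hypothetical atom coordinates in arbitrary units.
target = [(0.0, 0.0), (1.5, 0.0), (0.75, 1.3)]
ligand = [(0.0, 0.0), (1.0, 0.0)]
angles = [i * math.pi / 6 for i in range(12)]   # 30-degree steps
shifts = [i * 0.25 - 3.0 for i in range(25)]    # -3.0 to 3.0 in steps of 0.25

best_energy, best_angle, best_dx, best_dy = dock(target, ligand, angles, shifts)
print(f"lowest energy {best_energy:.3f} at angle={best_angle:.2f}, shift=({best_dx}, {best_dy})")
```

A grid search is the simplest possible strategy and scales poorly. It is used here only because it makes the "minimize the energy over poses" idea easy to see.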

QSAR

Quantitative structure-activity relationship, or QSAR,5 describes a process for predicting the biological activity of a compound or class of compounds based upon a description of its structure, including physico-chemical attributes. QSAR provides a systematic method to evaluate untested and hypothetical (that is, virtual) compounds for a specific biochemical function. QSAR is founded upon a numerical description of a given set of structures (geometries, energies, electronic and spectroscopic attributes, and volume) and their associated, known biological or chemical activity. It uses these known structure-activity relationships as training sets to build rules that predict the putative biological activity of new chemical entities. QSAR analyses are therefore an important part of the drug lead process, as they eliminate structurally qualified but biologically incompetent chemical entities from the pool of drug candidates.
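In its simplest form, a QSAR model is a regression fitted to a training set and then applied to untested compounds. The sketch below assumes a single descriptor (logP) and a straight-line relationship to activity (expressed as pIC50). The training values, the virtual compounds, and the activity cutoff are all hypothetical, and real QSAR models typically combine many descriptors with more sophisticated statistics.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: descriptor (logP) and known activity (pIC50).
train_logp = [1.0, 2.0, 3.0, 4.0]
train_pic50 = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(train_logp, train_pic50)


def predict_pic50(logp):
    """Predicted activity for an untested compound from the fitted rule."""
    return slope * logp + intercept


# Hypothetical virtual compounds, described only by their logP.
virtual = [("virt-1", 1.2), ("virt-2", 3.6)]
cutoff = 6.5  # assumed minimum predicted pIC50 worth pursuing

kept = [name for name, logp in virtual if predict_pic50(logp) >= cutoff]
print(kept)
```

The final filtering line is the step that removes compounds predicted to be inactive before anyone synthesizes or tests them.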

Pharmacophore Mapping

A key application of cheminformatic tools is the development of physico-chemical-structural profiles—descriptions of potential classes of pharmaceutically promising agents, called pharmacophores. A pharmacophore, says ChemNavigator's Hutton, is a subset of a small molecule that contributes to the biological activity of that molecule. A good example is the beta-lactam ring of penicillin and its derivatives, which serves as one of the critical chemical features of that class of antibiotics.

Pharmacophore profiles can serve as starting blocks for combinatorial syntheses of new pharmaceutical agents. Ji-Wang Chern and colleagues at the National Taiwan University recently constructed pharmacophore descriptors for a nonsteroidal inhibitor of rat 5α-reductase using San Diego-based Accelrys' program Catalyst and a training set of 16 characterized human type II 5α-reductase inhibitors. They then used the pharmacophore profile to screen the NCI DIS 3D database by three-dimensional database searching,6,7 and identified eight isoflavones as potential nonsteroidal enzyme inhibitors.8
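Screening a three-dimensional database against a pharmacophore can be pictured as a geometric matching problem. The following sketch is not how Catalyst works; it assumes a pharmacophore expressed as a few typed features with required pairwise distances, a fixed tolerance, and a single stored conformation per molecule. All feature names, coordinates, and molecule names are hypothetical.

```python
import math
from itertools import permutations

# Hypothetical three-point pharmacophore: feature types and required
# pairwise distances (in angstroms) between them, keyed by feature index.
FEATURES = ["donor", "acceptor", "hydrophobe"]
DISTANCES = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}


def find_match(features, distances, molecule, tolerance=0.5):
    """Return indices of molecule features matching the pharmacophore, or None.

    `molecule` is a list of (feature_type, (x, y, z)) tuples for one conformation.
    """
    for candidate in permutations(range(len(molecule)), len(features)):
        # Feature types must agree position by position.
        if any(molecule[m][0] != f for m, f in zip(candidate, features)):
            continue
        # Every required distance must be met within the tolerance.
        if all(
            abs(math.dist(molecule[candidate[i]][1], molecule[candidate[j]][1]) - d)
            <= tolerance
            for (i, j), d in distances.items()
        ):
            return candidate
    return None


# Hypothetical database entries.
DATABASE = {
    "mol-hit": [
        ("hydrophobe", (0.1, 4.0, 0.0)),
        ("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (3.1, 0.0, 0.0)),
        ("acceptor", (9.0, 9.0, 9.0)),
    ],
    "mol-miss": [
        ("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (6.0, 0.0, 0.0)),
        ("hydrophobe", (0.0, 4.0, 0.0)),
    ],
}

screened = [
    name for name, mol in DATABASE.items()
    if find_match(FEATURES, DISTANCES, mol) is not None
]
print(screened)
```

Trying every assignment of features is workable only for small feature counts. It is used here for clarity rather than speed.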

In other research efforts, various cheminformatics applications are used in concert to address a problem. For example, Orazio Nicolotti and colleagues examined 11 high-affinity agonists of the neuronal nicotinic acetylcholine receptor (nAChR), using a variety of cheminformatic applications, including Catalyst and SYBYL/DISCOtech from Tripos Inc. of St. Louis.9 The confluence of results allowed the research team to identify and locate several spatially independent, key structural features. Additional predictive two- and three-dimensional QSAR models for the pharmacophore could lead to "new nAChR agonists with pharmaceutical potential," according to the authors.

Structure/Substructure Searching

If searching is like finding a needle in the haystack, then structure and substructure searching is like looking for a specific needle in a haystack of needles. Grabbing the wrong needle—that is, drug lead—can be quite painful, both figuratively and literally, given the cost of pharmaceutical R&D today.

Structure and substructure searching refer to three-dimensional comparisons of a molecule, or a part of that molecule, against a database of structures. These searches are typically conducted in pharmacophore mapping to find potential targets—drugs or ligands—that match specific structural criteria. In fact, says Hutton, pharmacophore mapping is a type of substructure searching.

The quality and specificity of any database search depend on both the quality of the information in the database and the tools used to query it. This is particularly true of structural database searching, where the depth of detail and annotation for a particular structure in the database defines what can be learned from the search. The transition from two- to three-dimensional representation was a major leap in structure searching, and has facilitated detailed ligand-target fitting of potential pharmacological agents.10
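The older, two-dimensional form of substructure searching treats each molecule as a graph of atoms and bonds and asks whether a query graph can be embedded in it. The sketch below is a simplified illustration under stated assumptions: it ignores bond orders, hydrogens, and stereochemistry, uses plain backtracking, and works on two hypothetical molecules. The query is a four-membered ring containing nitrogen with an oxygen on the adjacent carbon, loosely modeled on the beta-lactam ring mentioned earlier.

```python
def make_graph(atoms, bonds):
    """Build (atom symbols, adjacency sets) from an atom list and bonded index pairs."""
    adjacency = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return atoms, adjacency


def has_substructure(query, target):
    """True if every query atom and bond can be mapped onto the target graph."""
    q_atoms, q_adj = query
    t_atoms, t_adj = target

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True
        q = len(mapping)  # next unmapped query atom, taken in index order
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Every already-mapped neighbor of q must be bonded to t in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend({})


# Query: ring N0-C1-C3-C4-N0 with O2 attached to C1.
query = make_graph(["N", "C", "O", "C", "C"],
                   [(0, 1), (1, 2), (1, 3), (3, 4), (4, 0)])

# Hypothetical database structures.
database = {
    "ring-compound": make_graph(
        ["C", "N", "C", "O", "C", "C", "S"],
        [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 1), (5, 6)],
    ),
    "open-chain-amide": make_graph(
        ["N", "C", "O", "C", "C"],
        [(0, 1), (1, 2), (1, 3), (3, 4)],
    ),
}

matches = [name for name, mol in database.items() if has_substructure(query, mol)]
print(matches)
```

Three-dimensional searching adds geometric constraints on top of this connectivity test, which is where the extra detail stored for each structure becomes decisive.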

Looking to the Future

The ultimate goal of any cheminformatics system is to provide an integrated environment that conducts searches across multiple data types and dimensions, to find commonalities and relationships between the data, and to provide mechanisms to experiment with these findings in the virtual realm. As new technologies and techniques provide mechanisms to increase the scope and extent of experimentation and data output, new tools are being developed to analyze this data.

In the future, says Hutton, pharmaceutical companies will engage in the practice of "virtual drug discovery," in which researchers will design and test drugs in silico before committing R&D money to test those compounds at the bench. He notes, for instance, that toxicity, which often only becomes evident once human testing begins, derails many drugs in the development pipeline. Scientists at ChemNavigator, and at other cheminformatics companies, are actively working on new algorithms to predict what metabolites the body will break a compound into, part of what is called ADME (absorption, distribution, metabolism, and excretion), and to predict the toxicity of those breakdown products. Such applications could potentially save drug developers billions of dollars and years of research.

In 2000, the National Institutes of Health began funding a new drug exploration program, "Molecular Target Drug Discovery for Cancer: Exploratory Grants." The program seeks "to identify novel molecular target[s], or to validate the target as [the] basis for cancer drug discovery."11 Cheminformatics will be the crux upon which most of these exploratory efforts will develop. The stage is therefore set for major drug discoveries over the course of the next decade, owing in no small part to the advances in cheminformatics techniques and applications, and to researchers' capitalizing upon this technology. Since cheminformatics plays an integral part throughout the drug discovery process, perhaps its greatest contribution will be in the unification of knowledge leading to drug discovery.

Chris Smith is a freelance writer in San Diego.

1. F.K. Brown, "Chemoinformatics, what it is and how does it impact drug discovery," Annual Reports in Medicinal Chemistry, 33:375-84, 1998.

2. Y.C. Martin, "Challenges and prospects for computational aids to molecular diversity," Perspectives in Drug Discovery and Design, 7/8:159-72, 1997.

3. C. Smith, "The new medicine man," The Scientist, 14[4]:22, Feb. 22, 2000.

4. B. Klevecz, "The whole EST catalog," The Scientist, 12[2]:22, Jan. 18, 1999.

5. C. Hansch and A. Leo, Exploring QSAR: Fundamentals and Applications in Chemistry and Biology: Washington, D.C., American Chemical Society, 1995.

6. J. Boguslavsky, "Creating knowledge from HTS data," Drug Discovery and Development, June 6, 2001.

7. National Cancer Institute, NCI DIS 3D Database.

8. G.S. Chen et al., "Novel lead generation through hypothetical pharmacophore three-dimensional database searching: Discovery of isoflavonoids as nonsteroidal inhibitors of rat 5α-reductase," Journal of Medicinal Chemistry, 44:3759-63, Nov. 8, 2001.

9. O. Nicolotti, et al., "Neuronal nicotinic receptor agonists: A multi-approach development of the pharmacophore," Journal of Computer-Aided Molecular Design, 15:859-72, September 2001.

10. Y.C. Martin, "3D database searching in drug design," Journal of Medicinal Chemistry, 35:2145-54, 1992.

11. National Institutes of Health, "Molecular drug discovery for cancer: Exploratory grants," PAR-00-060.

Suppliers of Cheminformatics Tools