Computers That Can Smell

Teams of modelers compete to develop algorithms for estimating how people will perceive a particular odor from its molecular characteristics.

May 1, 2017
Kerry Grens


In June of 2014, Pablo Meyer went to Rockefeller University in New York City to give a talk about open data. He leads the Translational Systems Biology and Nanobiotechnology group at IBM Research and also guides the DREAM challenges (DREAM stands for Dialogue for Reverse Engineering Assessments and Methods). These projects crowdsource the development of algorithms from open data to make predictions for all manner of medical and biological problems, such as prostate cancer survival or how quickly ALS patients’ symptoms will progress. Andreas Keller, a neuroscientist at Rockefeller, was in the audience that day, and afterward he emailed Meyer with an offer and a request. “He said, ‘We have this data set, and we don’t model,’” recalls Meyer. “‘Do you think you could organize a competition?’”

The data set Keller had been building was far from ordinary. It was the largest collection of odor perceptions of its kind—dozens of volunteers, each having made 10 visits to the lab, described 476 different smells using 19 descriptive words (including sweet, urinous, sweaty, and warm), along with the pleasantness and intensity of the scent. Before Keller’s database, the go-to catalog at researchers’ disposal was a list of 10 odor compounds, described by 150 participants using 146 words, which had been developed by pioneering olfaction scientist Andrew Dravnieks more than three decades earlier.

Meyer was intrigued, so he asked Keller for the data. Before launching a DREAM challenge, Meyer has to ensure that the raw data provided to competitors do indeed reflect some biological phenomenon. In this case, he needed to be sure that algorithms could determine what a molecule might smell like when only its chemical characteristics were fed in. There were more than 4,800 molecular features for each compound, including structural properties, functional groups, chemical compositions, and the like. “We developed a simple linear model just to see if there’s a signal there,” Meyer says. “We were very, very surprised we got a result. We thought there was a bug.”
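For readers curious what such a sanity check might look like, here is a minimal sketch in Python. It is not the model Meyer's group built. The data are synthetic, and every name in it (make_synthetic, fit_ridge, held_out_correlation) is hypothetical. It assumes a ridge-regularized linear fit, one common choice when features are numerous, and asks whether predictions on held-out molecules correlate with their ratings.

```python
import numpy as np


def make_synthetic(n_molecules=400, n_features=50, n_informative=5,
                   noise=0.5, seed=0):
    """Build a toy data set (an assumption, NOT the real odor data).

    Only the first `n_informative` features influence the rating.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_molecules, n_features))
    w_true = np.zeros(n_features)
    w_true[:n_informative] = 1.0
    y = X @ w_true + noise * rng.normal(size=n_molecules)
    return X, y


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression weights for centered X and y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def held_out_correlation(X, y, n_train, alpha=1.0):
    """Fit on the first n_train rows, then correlate predictions with
    the true ratings on the remaining rows."""
    X_train, X_test = X[:n_train], X[n_train:]
    y_train, y_test = y[:n_train], y[n_train:]
    # Center using training statistics only, so nothing leaks from the test rows.
    mu_X, mu_y = X_train.mean(axis=0), y_train.mean()
    w = fit_ridge(X_train - mu_X, y_train - mu_y, alpha)
    pred = (X_test - mu_X) @ w + mu_y
    return float(np.corrcoef(pred, y_test)[0, 1])


if __name__ == "__main__":
    X, y = make_synthetic()
    r_real = held_out_correlation(X, y, n_train=300)
    # Control: shuffling the ratings breaks any link to the features.
    rng = np.random.default_rng(1)
    r_shuffled = held_out_correlation(X, rng.permutation(y), n_train=300)
    print(f"held-out correlation, real ratings:     {r_real:.2f}")
    print(f"held-out correlation, shuffled ratings: {r_shuffled:.2f}")
```

The shuffled-ratings control is what addresses the worry about a bug: if the correlation stays high after the ratings are scrambled, something in the pipeline is leaking information.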

In January 2015, the call went out to modelers to join a competition for designing the best model from data on 69 odors to predict their scent profiles. Eighteen teams submitted algorithms. They performed fairly well at estimating the presence of certain qualities in an odor—garlicky, fishy, sweet, or burnt, for example—and especially well at predicting how intense or pleasant a smell would be. “It’s a very impressive effort to collect this much data, and it allowed them to model responses and descriptors better than has been done before,” says Kobi Snitz, a modeling specialist in Noam Sobel’s olfaction research group at the Weizmann Institute of Science in Rehovot, Israel, who did not participate in the competition.

See "May the Best Model Win" to read more about DREAM challenges.

One of the results that surprised Meyer most was the second-place performance of a linear model. That algorithm took different parts of each molecule and generated predictions of how each bit would smell: one part might evoke a bakery, for instance, and another, grass. Meyer speculates that this may reflect something fundamental about olfaction and the way odors interact with receptors. Rather than an entire molecule matching a single distinct receptor, perhaps each molecule interacts with numerous receptors, each responding to particular molecular subunits.

Although his data set contained thousands of molecular features, Keller says very few were required to describe each molecule’s smell. “If you know the features of the molecule that make something smell like garlic, you can look at those few and have a pretty good prediction,” he says. “A nice step would be to see how that relates to the binding of odor molecules to odor receptors. If you only have a few features that are important, it becomes a more tractable problem.”

Keller says there’s no consensus in the olfaction field about how the sense works. “The basic science issue is we really have no idea what’s in the odor that makes us [perceive a certain smell],” agrees Johan Lundström, who leads olfactory research groups at the Karolinska Institute in Stockholm and the Monell Chemical Senses Center in Philadelphia. Keller’s database could offer some insight as researchers continue to probe it (the team has made it publicly available), but there’s a limitation: it only includes pure odors, rather than mixtures. “Most odors are not monomolecular,” says Lundström. “Ninety-nine-point-nine percent are complicated mixtures that consist of anywhere from two to 500 different chemicals.”

Several years ago, Snitz and colleagues developed an algorithm to predict the similarity of certain odor mixtures (PLOS Comp Biol, 9:e1003184, 2013). “It turned out that the model works better when mixtures are represented as a single entity rather than as a collection of distinct components,” Snitz says.

Keller is already working on a data set of odor mixtures, using an approach similar to Snitz’s study, but asking study subjects to rate similarities between different smells, rather than to use semantic descriptors. Until this collection is ready, researchers can play around with the data set used in the DREAM challenge. And for eager modelers, Meyer and other DREAM leaders create new challenges every six months. “It’s a very nice idea to have this kind of competition,” says Lundström. “Scientists are naturally competitive. This way you can use that competition to do something great for the community.”
