Most of the practical AI success stories in recent years have involved what computer scientists call supervised machine learning: the use of labeled datasets to train algorithms to automate what had been a human activity. For example, take a dataset of symptoms and test results of thousands of patients, along with their eventual diagnosis by doctors, and train an algorithm to learn the patterns in the dataset—that is, which symptoms and clinical markers predict which diseases. Similarly, take a dataset of labeled images and train an algorithm to recognize people’s faces. These successes show that machine learning can, with the right training data, approximate tacit human knowledge. But is it possible for AI to extract knowledge unknown even to experts? Can we automate something like scientific discovery?
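To make the idea of supervised learning concrete, here is a minimal sketch of the pattern described above: a toy 1-nearest-neighbor "diagnosis" classifier that learns from labeled examples. The symptom features, values, and labels are all hypothetical, invented purely for illustration; real diagnostic systems use far richer data and models.

```python
# A minimal sketch of supervised learning: a 1-nearest-neighbor
# classifier trained on labeled (symptoms -> diagnosis) examples.
# All data below is toy/hypothetical, not real clinical data.

def predict(train, query):
    """Return the label of the training example closest to `query`.

    Each training example is a (feature_vector, label) pair.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], query))[1]

# Hypothetical features: (fever, cough, fatigue), each on a 0-1 scale.
training_data = [
    ((1.0, 0.9, 0.3), "flu"),
    ((0.9, 0.8, 0.4), "flu"),
    ((0.1, 0.2, 0.9), "anemia"),
    ((0.0, 0.1, 0.8), "anemia"),
]

# A new patient whose symptoms resemble the labeled flu cases.
print(predict(training_data, (0.95, 0.85, 0.35)))
```

The algorithm never needs an explicit rule for what "flu" looks like; it simply generalizes from the labeled cases, which is exactly how supervised learning approximates tacit human judgment.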
One potential approach, which I discuss in my recently published book A Human’s Guide to Machine Intelligence, is illustrated by the work of information scientist Don Swanson, who in the 1980s set out to find hidden connections in the medical literature.
Swanson found many studies that confirmed the observations that 1) fish oil improved blood circulation and 2) Raynaud’s disease was associated with poor blood circulation. But none of the existing research suggested that fish oil could be an effective treatment for Raynaud’s. In 1986, Swanson wrote a research paper proposing the hypothesis. In 1989, a clinical study conducted in the rheumatology clinic at Albany Medical College confirmed Swanson’s hypothesis.
Swanson’s main insight was that new knowledge could be uncovered by connecting disparate fields of knowledge: If A (fish oil) was related to B (blood flow) and B was related to C (Raynaud’s symptoms), then there might be a potential relationship between A and C.
Swanson and University of Illinois at Chicago psychiatry professor Neil Smalheiser developed a computer program called Arrowsmith that plucked out such hypotheses from medical research databases, with a focus on theories generated out of links between disparate specialties. Swanson later hypothesized a relationship between magnesium deficiency and migraine headaches that was also supported by subsequent clinical research.
Over the years, Arrowsmith has had limited impact, but Swanson’s early foray suggests that finding relationships in data from disparate fields can help tap into undiscovered knowledge hidden in data. Although Swanson’s original discovery process was largely manual, such a process can indeed be automated to help uncover knowledge that scientists might not have discovered yet.
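Swanson’s A-B-C logic can be sketched in a few lines of code. The snippet below is not Arrowsmith itself but a toy illustration of the idea: given which terms co-occur in the literature, it proposes A-C pairs that share an intermediate B term but never appear together directly. The co-occurrence data is hypothetical and drastically simplified.

```python
# A toy sketch of Swanson's A-B-C linking (not the real Arrowsmith system).
# Hypothetical data: each term maps to the set of terms it co-occurs with
# somewhere in the published literature.
cooccurs = {
    "fish oil": {"blood viscosity", "platelet aggregation"},
    "raynauds disease": {"blood viscosity", "vasoconstriction"},
    "aspirin": {"headache", "inflammation"},
}

def abc_hypotheses(a, cooccurs):
    """Yield (C, shared B terms) where A and C share an intermediate
    B term but never co-occur directly -- candidate hidden links."""
    links_a = cooccurs.get(a, set())
    for c, links_c in cooccurs.items():
        if c == a or a in links_c or c in links_a:
            continue  # skip A itself and any already-known direct link
        shared = links_a & links_c
        if shared:
            yield c, shared

for c, via in abc_hypotheses("fish oil", cooccurs):
    print(f"fish oil -?-> {c}  (via {sorted(via)})")
```

Here the program surfaces Raynaud’s disease as a candidate because both it and fish oil are linked to blood viscosity, while aspirin, which shares no intermediate term, is ignored. The real challenge, as Swanson found, is ranking such candidates so the plausible hypotheses are not buried among spurious ones.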
An alternative approach is illustrated by Google DeepMind’s Go-playing software AlphaGo Zero. While the original version of the software was trained heavily on past games played by human Go players, AlphaGo Zero didn’t study human moves at all; its entire training dataset was self-generated. The software, armed with only the basic rules of Go, played millions of games against itself. It then analyzed those games to figure out which moves helped and which ones hurt.
While supervised learning relies on cleanly labeled training data, AlphaGo Zero learned from data generated by the algorithm itself through exploration, an approach known as reinforcement learning. Such algorithms explore different actions and learn which ones lead to better performance. Instead of being restricted to analyzing data already obtained, the approach can explore the space of potential actions and prioritize what to test next. This ability to entertain multiple hypotheses and explore them (that is, conduct experiments and acquire data to validate them), all while recognizing the cost of exploration, can be a big boost to scientific discovery. Drug discovery, for example, involves generating millions of candidate molecules and running a series of experiments to identify whether any of them are effective.
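The explore-versus-exploit trade-off described above can be sketched with one of the simplest reinforcement-learning setups, an epsilon-greedy bandit. This is not AlphaGo Zero’s actual algorithm, and the "candidate molecules" and their success rates below are entirely hypothetical; the sketch only shows how an algorithm can decide which experiment to run next while balancing the cost of exploration.

```python
import random

# A minimal exploration sketch (epsilon-greedy bandit), illustrating the
# explore/exploit idea behind reinforcement learning -- not AlphaGo Zero's
# actual method. Hypothetical setting: choose which candidate molecule to
# test next, given a limited budget of experiments.

def run_screen(true_success_rates, trials=5000, epsilon=0.1, seed=0):
    """Simulate `trials` experiments and return the index of the candidate
    with the best observed success rate."""
    rng = random.Random(seed)
    n = len(true_success_rates)
    counts = [0] * n      # experiments run per candidate
    successes = [0] * n   # observed successes per candidate

    def observed_rate(j):
        return successes[j] / counts[j] if counts[j] else 0.0

    for _ in range(trials):
        if rng.random() < epsilon:
            i = rng.randrange(n)                      # explore: random pick
        else:
            i = max(range(n), key=observed_rate)      # exploit: best so far
        counts[i] += 1
        if rng.random() < true_success_rates[i]:      # simulated experiment
            successes[i] += 1

    return max(range(n), key=observed_rate)

# Hidden (hypothetical) success rates; the algorithm never sees these
# directly -- it only observes the outcomes of the experiments it chooses.
best = run_screen([0.02, 0.05, 0.30, 0.01])
print("best candidate:", best)
```

After a few thousand simulated experiments, the algorithm concentrates its trials on the third candidate, the one with the highest underlying success rate, having spent only a fraction of its budget on the weaker options. That is the sense in which exploration-based methods can prioritize what to test next rather than passively analyzing a fixed dataset.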
While AI is best known for automating routine tasks across a variety of industries, it holds great promise for accelerating scientific discovery as well.
Kartik Hosanagar is the John C. Hower Professor at the Wharton School of the University of Pennsylvania, where he studies technology and the digital economy.