On March 14, 2014, HealthMap, an online database created by researchers at Boston Children’s Hospital in 2006 to collect accounts of disease cases from various online sources, notified scientists of an article written in French about cases of a “strange fever” in Macenta, Guinea. Nine days later, the World Health Organization officially announced an Ebola outbreak in the area.
Although the outbreak was first identified by the HealthMap software when a news article was published online, other types of sources have proven invaluable for researchers to continue tracking the virus—most recently in the Democratic Republic of Congo. One key source is Twitter, Emily Cohn, who works on HealthMap at Boston Children’s Hospital, tells The Scientist in an email. Tweets containing the term “Ebola” or evidence of specific symptoms such as fever, joint or muscle aches, and coughing or vomiting blood are flagged by a machine learning algorithm and added to information drawn from online news outlets, official reports, and other sources to create a map of cases, a timeline, and a projection of predicted future cases in certain areas. “Social media collects information on location through geo-location,” Cohn says. “It is the most real-time of the data sources we work with.”
HealthMap researchers have also used social media data to create a global map of Zika and a US map of the flu, and they are not alone in their excitement about using data mined from social media and other online sources such as Google searches to track, and perhaps one day predict, disease outbreaks. Research in this area has demonstrated that user-generated data from the Internet could have predicted past outbreaks, and the hope is that social media posts and Internet searches could one day help track the spread of diseases in real time, yielding clues about a pathogen’s migration faster than traditional surveillance systems.
In addition, social posts often contain information about people’s attitudes and behaviors in response to illness, such as whether a person plans to evacuate or vaccinate in hopes of avoiding disease, notes Lu Tang, a communication researcher at Texas A&M University. “We can detect an illness outbreak on social media,” she says, “but also we can figure out what people think about it.”
Social media maps disease
Twitter exploded onto the social media landscape in 2007. In 2010, the site had 30 million users who logged in at least once a month. By the end of 2019, that number had grown to 145 million. It was clear early on that the site could be a powerful tool for people from all walks of life, including public health officials and scientists.
In a 2011 PLOS ONE article, researchers retroactively mined Twitter data from 2009, the year of the H1N1 swine flu pandemic, and found that the use of predetermined key words such as “flu,” “Tamiflu,” and “vaccine” could provide accurate real-time estimates of the number of flu cases in a geographic region up to two weeks before the US Centers for Disease Control and Prevention could track confirmed cases using hospital records.
Other studies have offered similarly promising results for the use of social media in tracking the geographic march of a disease and predicting how many more cases are likely to emerge in a given area. In 2012, HealthMap researchers reviewed the media articles, tweets, and government reports that the platform would have collected during the first 100 days of the 2010 cholera outbreak in Haiti and showed that the data could have revealed trends in disease spread two weeks earlier than official case data did. And a 2014 study showed that combining data from Twitter with data from the CDC’s Influenza-like Illness surveillance network (ILINet), which tracks visits to health-care practitioners for flu-like symptoms, resulted in more accurate real-time influenza forecasts than relying on ILINet alone.
Social media data help overcome the challenge that variation in the coverage of surveillance efforts poses to disease mapping, says Cohn—an issue highlighted by the inconsistent methods for reporting Ebola cases in West Africa. Social media “captures data from otherwise underrepresented populations” such as those with limited or no access to health care, she writes in an email to The Scientist.
The same could be said for data on Internet searches—something Google realized right around the time that Twitter came on the scene. The Internet giant designed a platform to mine search data from around the world for terms related to disease outbreaks, particularly influenza. The program’s accuracy “almost exactly matched the CDC’s own surveillance data over time—and it delivers them several days faster than the CDC can,” Nature reported in 2013. Google went on to deploy the system in 29 countries to track cases of flu and, later, dengue fever, Nature noted, but the program repeatedly overestimated flu prevalence. The company acknowledged that media coverage may have led to inflated estimates, but said that the algorithm had been improved.
Google-funded researchers also announced that the program had failed to detect the nonseasonal 2009 H1N1 outbreak in their data. “Google Flu Trends was successful until it wasn’t anymore,” says Kristian Andersen, an infectious disease geneticist at the Scripps Research Institute in La Jolla, California. The program failed, outside scientists reported in Science, in part because the algorithm identified flu-related search terms based on what people searched for during the times when flu cases peaked according to the CDC’s official flu reports, instead of selecting keywords directly related to infection or symptoms. This caught many terms that were not actually related to flu, such as “high school basketball season,” which, as a winter sport, often lines up with flu season, explains Andersen. In 2015, the company shuttered the Google Flu Trends program.
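The pitfall Andersen describes can be illustrated with a small sketch. It is not Google’s actual algorithm; it simply ranks search terms by how closely their volume tracks a flu curve, using invented monthly numbers. The term list, the values, and the helper names `pearson` and `rank_terms` are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented monthly values (January through December), for illustration only.
flu_cases = [90, 80, 50, 20, 10, 5, 5, 5, 10, 30, 60, 85]
search_volume = {
    "flu symptoms": [88, 75, 55, 25, 12, 6, 4, 6, 12, 28, 55, 80],
    "high school basketball": [95, 90, 60, 10, 5, 2, 2, 3, 5, 20, 70, 92],
    "sunscreen": [5, 8, 20, 45, 70, 95, 98, 90, 50, 20, 8, 5],
}


def rank_terms(volume_by_term, cases):
    """Rank terms by correlation with case counts, highest first."""
    scored = [(term, pearson(vol, cases)) for term, vol in volume_by_term.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_terms(search_volume, flu_cases)
for term, r in ranking:
    print(f"{term}: r = {r:.2f}")
```

With these made-up numbers, the basketball term correlates with flu cases almost as strongly as a genuine symptom search does, because both peak in winter. A selection rule based on correlation alone cannot tell the two apart.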
Making sure that the keywords are representative of what they want to track is also a challenge for researchers extracting data from social media sites. Algorithms tasked with making sense of huge amounts of search data, tweets, or Facebook posts need to work in many languages, and the targeted terms may need to evolve over time, notes Soo-Yong Shin, a computer scientist who studies health data at Sungkyunkwan University in South Korea. Before the Middle East respiratory syndrome (MERS) outbreak in the Republic of Korea in 2015, only experts were familiar with the acronym, but “everybody knows ‘MERS’ right now,” Shin says. As more people tweeted about MERS based on what they’d seen in the news, the accuracy of reports based on one of his own sets of search terms for the disease dropped from around 98 percent to 60 percent after one year. Text-mining algorithms can also be susceptible to false positives. For example, the term “Bieber fever,” a slang term referring to an obsession with pop star Justin Bieber, is often flagged as possibly being linked to fever-causing disease.
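A minimal sketch of how a keyword filter might guard against this kind of false positive follows. It is a simple pattern match, not the machine learning approach HealthMap uses, and the term lists and the function name `flag_post` are hypothetical.

```python
import re

# Hypothetical term lists; a real system would maintain and update these over time.
SYMPTOM_TERMS = ["ebola", "fever", "vomiting blood", "muscle aches"]
KNOWN_FALSE_POSITIVES = ["bieber fever", "cabin fever", "saturday night fever"]


def _compile(phrases):
    # Word boundaries keep "fever" from matching inside a word like "feverfew".
    pattern = "|".join(re.escape(p) for p in phrases)
    return re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)


SYMPTOM_RE = _compile(SYMPTOM_TERMS)
FALSE_POSITIVE_RE = _compile(KNOWN_FALSE_POSITIVES)


def flag_post(text):
    """Return True if the text mentions a symptom term outside known slang phrases."""
    # Blank out the slang phrases first, then look for any remaining symptom term.
    cleaned = FALSE_POSITIVE_RE.sub(" ", text)
    return bool(SYMPTOM_RE.search(cleaned))


print(flag_post("I have such bad Bieber fever right now"))
print(flag_post("High fever and muscle aches since Monday"))
```

A fixed exclusion list only catches slang that someone has already noticed, so new phrases will still slip through until the list is updated.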
Thus, while computers can do much of the work, tracking disease in this way will require human experts who can verify and interpret the information as it comes in. Moreover, while social media may allow researchers to reach groups underrepresented in the health-care system, it may miss other swaths of humanity, says Texas A&M’s Tang. People on Twitter tend to be younger and more highly educated than average for their area. “Twitter users only represent a part of the population,” she says. “By looking at Twitter only, you will inevitably miss out [on] some population [such as] older people, less educated people, or people living in rural areas with less than perfect Internet connections.”
Rather than rely exclusively on search or social media data, many researchers believe the way forward is to combine information from varied sources, including standard media and official reports, just as HealthMap does. “The integration of these big data sources, including social media information, is a very promising field that I’m sure will generate some very fruitful outcomes in the years ahead,” says Gerardo Chowell, a mathematical epidemiologist at Georgia State University.
How HealthMap Tracks Disease
HealthMap, an online database created by researchers at Boston Children’s Hospital in 2006 to collect accounts of potential disease cases from sources available online, mines text from various online outlets for terms that suggest disease outbreaks. The system pinpoints the location of the case on a world map and reveals clusters as they begin to emerge, such as when reports of a “strange fever” began to pop up at the start of the Ebola outbreak. Epidemiologists review and confirm data, then use them to predict how quickly the disease will spread.
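The steps described above (match terms, attach a location, watch for clusters) can be sketched roughly as follows. This is an illustrative toy, not HealthMap’s software: the records, place names, alert terms, thresholds, and the function name `emerging_clusters` are all invented.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical alert terms and invented reports, for illustration only.
ALERT_TERMS = ("strange fever", "cholera", "ebola")

reports = [
    {"place": "Town A", "day": date(2024, 1, 5), "text": "Clinic reports strange fever"},
    {"place": "Town A", "day": date(2024, 1, 7), "text": "Local football results"},
    {"place": "Town A", "day": date(2024, 1, 8), "text": "Two more strange fever cases"},
    {"place": "Town A", "day": date(2024, 1, 9), "text": "Strange fever spreads to market"},
    {"place": "Town B", "day": date(2024, 1, 9), "text": "Suspected cholera case"},
    {"place": "Town C", "day": date(2023, 12, 20), "text": "Cholera outbreak declared over"},
]


def emerging_clusters(reports, as_of, window_days=7, min_reports=3):
    """Count term-matching reports per place in a recent window and keep the busy places."""
    cutoff = as_of - timedelta(days=window_days)
    counts = Counter(
        r["place"]
        for r in reports
        if cutoff < r["day"] <= as_of
        and any(term in r["text"].lower() for term in ALERT_TERMS)
    )
    return {place: n for place, n in counts.items() if n >= min_reports}


print(emerging_clusters(reports, as_of=date(2024, 1, 10)))
```

In this sketch only Town A crosses the threshold. The review and confirmation by epidemiologists that the text describes would come after a step like this.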
A human element
While the use of social media to help track outbreaks is still maturing as an epidemiological tool, there is something that Twitter, Facebook, and other user-input information from the Internet add above and beyond helping to simply identify cases: hints about the mindsets of those affected by a given disease. Social media’s unique ability to capture information not just concerning where diseases are popping up, but how people are responding to them, may prove invaluable to public health.
Tang studies how social media users discuss vaccines and the illnesses they prevent. Currently, she’s analyzing data from the recent measles outbreak in parts of the US, hoping to figure out what beliefs surrounding the disease keep people from getting vaccinated or motivate them to do so. She hopes social media activity will help her assess perceptions of the disease’s severity, how likely social media users think they are to contract it, and whether they believe the vaccine is dangerous. “If we understand why people do not get the vaccine, then we can create personalized messages targeting these different types of populations,” Tang says.
Other researchers are undertaking similar work. Chowell, for example, has used social media to study whether people choose to follow evacuation orders under threat of hurricanes, which can bring heavy rains and promote the spread of mosquito-borne disease. He and his team use that information to predict where an outbreak of an infection might go next and determine what types of efforts, such as closing schools or airports, could limit a pathogen’s spread. Meanwhile, University of California, Los Angeles, behavioral psychologist Sean Young uses social media channels to characterize conversations surrounding HIV and PrEP, the preventive drug approved in 2012, to determine who decides against the treatment and why.
Such information, data that go beyond maps and statistics of disease, could lead to critical insights into how to best respond from a public health perspective, says Paul Russo, who researches social media networks at Yeshiva University in New York City. “If you begin to think about the implications of knowing where people are, who they’re with, and what they’re doing, it’s groundbreaking.”
On December 29, @hannastasia tweeted: “Hey @Postmates_Help @Postmates Can you explain why I’m not getting a refund on the meal that my dr assured me gave me food poisoning, losing me a day of work & pay? I’ve asked twice and they say they can’t help me. I won’t be using your service again and will tell others not to.” The next day, @foodsafetySTL, an account run by the St. Louis Health Department, replied with a link to report the case to the city of St. Louis Department of Health.
The health department was part of a project aimed at using Twitter to track outbreaks of foodborne illness, a malady that is notoriously underreported. While the CDC estimates that 48 million people experience food poisoning each year, only 128,000 are hospitalized. Many never see a doctor at all, nor do they report the illness to public health officials. But some people do report symptoms on social media. Tweets about symptoms or comments on the restaurant-review site Yelp about gastrointestinal discomfort after eating a specific food or dining at a particular establishment could point to unreported cases. But rather than simply mining the text and making assumptions, health officials wanted to use Twitter as a tool to engage the community and encourage people to play an active role in reporting disease.
Several years ago, public health officials across the country began teaming up with researchers at HealthMap, an online database that collects accounts of disease cases from various online sources, to develop strategies for identifying tweets about suspected cases of food poisoning in certain geographic areas in order to track outbreaks. Once a tweet has been identified, a local public health representative reaches out to the poster in hopes of gathering more information about how the illness was contracted. A pilot study in St. Louis showed that the strategy did increase the number of food poisoning reports in the city from October 2015 to May 2016. Between 2016 and 2018, there were more than 50 reports of food poisoning that came from people whom the department had engaged with on Twitter. Many other local health departments have partnered with HealthMap to improve foodborne illness reporting.
Boston Children’s Hospital’s Emily Cohn, who works on HealthMap, says she and her colleagues have taken a similar crowdsourcing approach with another website, called Flu Near You. Rather than collecting media and tweets about flu, the site asks users throughout the country to register and then anonymously report how they’re feeling each week. Researchers plot the locations of those who report flu-like symptoms on a map of the United States. A 2018 study suggests that if a sufficient number of people contribute information, the data can complement CDC reports.
Asking for public participation through social media has also been employed for tracking very different kinds of disease—those that affect plants. One group at Iowa State University uses Twitter to gather information from farmers about symptoms of infections or infestations in their fields. The researchers created two Twitter accounts, @soydisease and @corndisease, that farmers can tag in tweets with photos of their crop, with location information, if they suspect it is diseased or suffering a pest infestation. The team enters that information into a database to track outbreaks throughout the country to keep farmers informed and warn those growing in at-risk areas. In 2018, for example, the researchers reported that they were able to track an outbreak of the corn pathogen Puccinia polysora, which causes southern rust, as it traveled north from Texas and Louisiana to Kansas, Nebraska, Indiana, and other states. The group is now working with scientists across the country to create online maps monitoring other pest and disease outbreaks in more crop species.
Twitter is a simple, free, and easily accessible platform, making it particularly suitable for tracking disease, the researchers argue in their 2018 paper. The study showed that “representatives from across a wide variety of agricultural sectors can contribute to a plant disease monitoring system using a common social media engine.”
Emma Yasinski is a Florida-based freelance reporter. Follow her on Twitter @EmmaYas24.