Martha Nelson, a senior biologist in the Laboratory of Parasitic Diseases at the US National Institute of Allergy and Infectious Diseases, tracks variants of SARS-CoV-2 in as close to real time as possible. But linking viral genetic sequence data with details about the person the sample came from is often impossible, even though those details are crucial to understanding how the virus is spreading.
“It’s been painfully slow because there is no national system for data sharing and piles of red tape to get patient metadata needed to make a genetic sequence meaningful,” Nelson writes to The Scientist in an email. “The CDC has been leading efforts to get US sequencing higher in terms of volume, but the lack of coordination and sharing of data makes the information fragmented and hard to interpret.”
What would be ideal, she says, is marrying sequence data with contact tracing information, but for now, she and her colleagues are forced to piece together what geographic and clinical information they can to trace SARS-CoV-2 variants around the country. The Scientist spoke with Nelson about her research and the barriers it faces.
The Scientist: What is your role in tracking SARS-CoV-2 variants in the US, and what are your primary goals in this work?
Martha Nelson: My goals are to understand how these variants are transmitting in the US. We can see that there’s a certain proportion of these viruses that comprise a proportion of the US population of SARS-CoV-2 viruses. And for some of these variants, it’s still very small. But that proportion only tells you so much. It doesn’t tell you how these viruses are getting here, how well they’re spreading, how quickly they’re spreading in different places, how much they’re spreading across different regions of the US. And that’s what we can infer from the genetic data, from the genetic sequences, by seeing how variants that we see in different states, in different cities, how genetically related they are, whether they’re part of the same transmission lineage, or whether they appear to be separate introductions.
Even though there’s much less travel globally and within the US right now because of the pandemic, there’s still a lot of people flying across the world, and we can see how these variants have been introduced multiple times, probably through international travel.
Then our question is, once they get here, how much are they spreading? Do they seem to be outcompeting other viruses? . . . Do they threaten our vaccination strategy? My job is really to crunch the genetic data and understand what that means for the course of the epidemic trajectory in the US.
TS: You told me in an email that progress had been ‘painfully slow.’ Can you elaborate on the challenges to date?
MN: In terms of sheer volume of SARS-CoV-2 sequences, the US is number two behind the UK, and there’s a large volume of genetic data coming out of our labs. . . . But at the same time there’s been some areas that are really particular struggles for the US, and some of that just relates to the way that our country is decentralized, and the way that data is collected ad hoc, by academic groups, by public health labs. And there’s no central coordination or funding stream.
GISAID and GenBank are these databases where you can very easily access the genetic sequence. . . . What’s really important is to put that genetic sequence in context, and to do that you need to know a bit about the person that it was collected from. It would be really helpful, for example, if you see variants that have all been collected from a certain state—are those all from a single household in that state? Is that one tiny transmission chain? Or are those collected from multiple different counties separated by hundreds of miles that could be something that’s really widespread across the entire state?
There’s protections of patient privacy, but the real barrier is to sharing it at a national level, so that you can get a national picture, or even a regional picture, and connect data that’s been collected by different groups and get a more holistic picture of what’s happening with these variants in the country. And that’s what’s really hard to do.
TS: Where is your data coming from?
MN: Some of it is by people who I happen to know who have sequenced viruses from one particular population. A lot of the data is from GISAID, and it is publicly available data. I try and use as much data that’s available as I can to get a larger picture of what’s happening across the US. But [when it comes to data sharing], it’s still really hard to get the additional data.
We have this amazing tool of this genetic data, but I think that in some ways the power of genetic data also can raise alarms and make people wary that you can use this genetic data to trace an individual patient, and sometimes they’ve been reported in the news. And you don’t want to have this patient zero who was associated with this entire outbreak of this variant that proves to undermine the entire vaccine. People can imagine these narratives that would backfire and potentially punish them for releasing data.
You understand why people are reluctant to share the data. No one wants to share information that ends up getting someone in trouble or violates someone’s privacy. But the thing is, everyone’s just interpreting how to protect privacy on their own. There’s not really a strong standard right now, and it’s something where more clarity would be really helpful.
TS: That leads nicely into my next question, which is: How do you think we can improve the system to better track variants going forward? How can we overcome these issues?
MN: Some of the ideas that have been floated are creating new databases that would be quasi protected. So there’s different levels of protection for genetic data. GenBank is completely public—anyone can use that data, there are no restrictions. Then you have GISAID, where you can access the data and you can look at the data, but you can’t directly share the data with anyone else. You have to acknowledge the source of the data in the supplementary table. You’re encouraged to work with people who submitted the data.
But then there’s potentially a need for another database, where people would be more comfortable submitting more potentially private patient metadata that would be restricted to a certain number of people who can interpret the data in a rigorous way and that have bioinformatics tools for visualizing the data so that people who submit it can rapidly interpret it. And we have this design in mind of this perfect ecosystem where people could submit their sequences and their metadata, and visualization tools would help them interpret it in real time, and that would encourage submission of data, and that would also allow people to get a more holistic picture that includes data from lots of groups around the country. That’s something that I think is one of the most realistic ways to move forward.
TS: I know you have yet to publish on this research, but is there anything preliminary you can tell me at this point about the variants circulating in this country? And related, how do you expect this situation to evolve?
MN: We’re really in a race against time with rolling out vaccine and these variants starting to surge. I mean, we know from what we’ve seen in Europe and in other countries, Brazil, South Africa . . . and even some of our homegrown variants in California, in New York, that these variants have the capacity to rapidly take off in populations. And we know that they’re simmering in low levels. . . . We’re sort of [in] the calm before the storm, as bursts of transmission are likely to occur. And how many people we have vaccinated by the time these bursts begin is really the race-against-time question.
But it does emphasize that we’re in this really important inflection period right now, where there’s a lot of motivation to lift restrictions, and open businesses and open restaurants. And doing that right as these variants are transmitting under our nose and creating conditions for new epidemics is, it’s kind of like this is the homestretch, and if we can just keep a lid on it until we have enough people vaccinated, we could really prevent this variant problem.
I think the upcoming months are going to be really interesting to see how the variants play out with this mix of a proportion being vaccinated, variants . . . that we’re already seeing starting to take off in certain places, and lifting of restrictions that facilitates that [spread].
I can tell you from our initial data that the low proportion of variants that we see overall, certainly for some of these variants in certain places, it belies a story where if you look at the genetics, you can see that there are transmission clusters. . . . There’s so much heterogeneity in the transmission of SARS-CoV-2. So you have this system where it kind of percolates and percolates, and then you have some superspreading and that just facilitates a whole game change in how the epidemic plays out.
Editor’s note: This interview was edited for brevity.