Update (May 18): A whistleblower complaint filed last week with Stanford University reveals that the Santa Clara study was partially funded by JetBlue Airways founder David Neeleman, who has spoken out against the use of lockdowns to slow the spread of COVID-19, BuzzFeed News reports. The information, which was not publicly disclosed, raises “concern that the authors were affected by a severe conflict of interest,” according to the complaint, which was filed by someone involved with the research. The complaint also suggests that the study’s authors disregarded warnings raised by Stanford professors about the accuracy of the antibody test used. In interviews with BuzzFeed, Neeleman and study coauthor Eran Bendavid denied that Neeleman or other funders had influenced the study. 

Update (May 1): Bhattacharya and colleagues respond to criticisms of the Santa Clara study in a revised preprint posted yesterday. Using updated statistical analyses, the...

In mid-April, an eye-catching statistic appeared in the news: the number of people who’d been infected with SARS-CoV-2 in the California county of Santa Clara was 50 to 85 times higher than thought. While just 956 cases of COVID-19 had been officially recorded by April 1, the true number of infections was between 48,000 and 81,000, outlets reported.

The news drew from a preprint posted to medRxiv on April 17 describing what’s known as a seroprevalence survey. A team led by researchers at Stanford University had tested 3,330 people for antibodies against SARS-CoV-2, and 50 had shown up positive. Using statistical analyses to extrapolate their findings, the team concluded that the county’s infection rate was 2.5–4.2 percent, or 48,000–81,000 people.

These elevated numbers were reassuring, some outlets noted, because they suggest that most SARS-CoV-2 infections are milder than feared—a point seized on by conservative political commentators and some of the study’s own coauthors as support for the view that restrictive lockdown measures are an overreaction.

But epidemiologists, statisticians, and many other researchers were quick to express concerns, on Twitter and in lengthy blog posts, about several aspects of the study, from the choice of testing kit to the recruitment of participants to the statistical treatment of the data.

“I did not anticipate the firestorm,” Jay Bhattacharya, a professor of medicine at Stanford University and the study’s senior author, tells The Scientist. The team stands by the findings, he adds, but plans to rework parts of the paper following the criticisms. “It’s a preprint.”

That the study received so much attention is partly a testament to just how central antibody testing has become to the discourse about COVID-19 in recent weeks. This kind of test aims to detect people who have been exposed to SARS-CoV-2, in contrast to PCR-based diagnostics that pick up active cases of the disease. Public health experts and politicians see it as a critical tool in understanding the virus’s true spread and the epidemiological effects of lockdowns and other attempts at mitigation.

Yet not all seroprevalence studies are created equal—a point that needs to be clear when discussing the implications of their findings, says Eva Harris, a professor of infectious diseases at the University of California, Berkeley, School of Public Health. Harris is planning a long-term study of thousands of people across the East Bay area to monitor how seroprevalence and the number of asymptomatic infections in the community respond to changes in COVID-19 mitigation strategies.

“I think that it’s really important that many places do seroprevalence studies—I’m super supportive of that,” Harris says. “I also think it’s incredibly important that people understand the limitations” of individual studies, she continues. “The study design and the test used and the interpretation have to be transparent to the [scientific] community, and there has to be some way to communicate that to the public.”

Serology ramps up

Governments in countries including the UK, Spain, and Italy are planning large-scale seroprevalence surveys, and the National Institutes of Health (NIH) recently announced that it plans to recruit up to 10,000 people in the US for its own study. Many US states and institutions are running local versions, too.

Of the few studies reporting data so far, most appear to have found broadly the same thing as the Santa Clara study—that the true number of SARS-CoV-2 infections is higher than official case counts capture, but low as a proportion of the population in most areas.

Last week, New York Governor Andrew Cuomo said that researchers had found antibodies in 21 percent of roughly 1,300 people surveyed outside grocery stores and other shops in New York City—one of the worst-hit regions in the world. Virologists in Germany who surveyed 500 people in the town of Heinsberg told reporters a couple weeks ago that they’d found antibodies in nearly 15 percent. (Both announcements were light on methodological details.)

Another study by Bhattacharya and others estimated a seroprevalence of around 4.1 percent in Los Angeles County, California, based on a survey of 863 people. The study report, which was leaked and temporarily hosted on conservative website RedState.com, is not publicly available, Bhattacharya says, though there is a press release available from the county. The findings have been submitted to a peer-reviewed journal, he adds.

How to choose an antibody test

A major concern raised about the Santa Clara study was that the type of antibody test used was too inaccurate to support the paper’s conclusions—a concern that epidemiologist Aubree Gordon of the University of Michigan says she shares. For seroprevalence studies, “the top thing you’re going to think about is test performance,” says Gordon, who’s worked on surveys of Zika, chikungunya, and dengue and is currently developing a lab-based SARS-CoV-2 antibody test.

Both the Santa Clara and the LA County studies used a test kit manufactured by Chinese company Hangzhou Biotest Biotech, which is not on China’s approved manufacturers list and has since been banned from exporting its kits, NBC reports. The US Food and Drug Administration (FDA) allows this kind of kit to be marketed in the US, but has not formally approved it or vouched for its efficacy.

The test is based on what’s known as a lateral flow immunoassay and is designed to detect antibodies in blood taken from a finger prick. Unlike lab-based tests on larger blood samples, which allow for repeat testing and provide quantitative results of antibody abundance, these so-called “point-of-care” kits return a one-off “positive” or “negative” based on some threshold antibody level set by the manufacturer.  

For some researchers in public health, this is a non-starter. “I don’t think any of the current point-of-care tests are appropriate for use in seroprevalence surveys,” says Michael Busch, the director of Vitalant Research Institute, a nonprofit transfusion medicine organization. His team is coordinating a long-term, NIH-funded seroprevalence study using lab-based tests of donor blood across the US—initially in six metropolitan areas, but later in additional parts of the country.

Good antibody surveys require samples that can be retested, he adds. While lateral flow immunoassays offer rapid results, they’re “very non-specific [and] are not amenable to repeat testing and confirmation. . . . If you don’t have a good test, there’s no point in running a serologic survey.”  

Neeraj Sood, the vice dean for research at the University of Southern California’s Price School of Public Policy and a collaborator on both the Santa Clara and LA County studies, argues that you “don’t need a perfect test,” provided you understand the test’s performance—in particular, its sensitivity and specificity.

A very sensitive test returns no or few false negatives for people who have the antibodies. A very specific test returns no or few false positives for people who don’t. When trying to detect something relatively rare such as SARS-CoV-2, specificity is usually the primary consideration because it’s important to avoid the detection of other things in the blood—such as antibodies for any of the relatively harmless coronaviruses already common in humans.

The Heinsberg researchers used a test they claimed had a false positive rate of less than 1 in 100, although other groups have challenged that assessment after running their own assays. The Santa Clara study, meanwhile, reported a rate of 2 in 401.

These false positive rates are rather too high when you take into account statistical uncertainty around those numbers, says Andrew Gelman, a Columbia University statistician who detailed several criticisms and “avoidable screwups” in the Santa Clara preprint on his blog. “It’s a well-known problem in all the introductory probability textbooks,” he adds. “If you’re estimating a rare disease, it will only work if your test has a very low false positive rate.”
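To make the point about statistical uncertainty concrete, here is a minimal sketch that puts an approximate 95 percent interval around the reported 2-in-401 false positive rate and compares it with the survey's raw share of positive results (50 of 3,330). The Wilson score interval is our choice for illustration; it is not necessarily the method used by Gelman or by the study's authors, and the function name is hypothetical.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4 * trials * trials)
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)

# False positives reported for the Santa Clara test kit: 2 in 401.
fp_low, fp_high = wilson_interval(2, 401)

# Raw share of positive results in the survey: 50 of 3,330.
raw_positive_rate = 50 / 3330

print(f"plausible false positive rate: {fp_low:.4f} to {fp_high:.4f}")
print(f"raw positive rate in survey:   {raw_positive_rate:.4f}")
```

Under this approximation, the upper end of the interval sits above the survey's raw positive rate, which is why a seemingly small false positive rate can still leave the headline estimate in doubt.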

For a virus that infects 1 percent of the population, say, a test with a known false positive rate of 1 in 100 is expected to return roughly as many false positives as true positives. Based on the quoted specificity of the test kit used in Santa Clara, some epidemiologists and statisticians argue that the preprint’s results are consistent with all or most of the 50 reported positives being false.
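The arithmetic behind the 1 percent example can be sketched in a few lines. The function names and the sample size of 10,000 are illustrative assumptions, and the example assumes a test that catches every true infection (sensitivity of 1.0), which flatters the test.

```python
def expected_positives(n_tested, prevalence, sensitivity, specificity):
    """Return (expected true positives, expected false positives)."""
    infected = n_tested * prevalence
    uninfected = n_tested - infected
    true_pos = infected * sensitivity
    false_pos = uninfected * (1.0 - specificity)
    return true_pos, false_pos

def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive results that come from truly infected people."""
    tp, fp = expected_positives(1.0, prevalence, sensitivity, specificity)
    return tp / (tp + fp)

# Hypothetical survey: 10,000 people, 1% truly infected,
# perfect sensitivity, false positive rate of 1 in 100.
tp, fp = expected_positives(10_000, 0.01, 1.0, 0.99)
ppv = positive_predictive_value(0.01, 1.0, 0.99)

print(f"expected true positives:  {tp:.0f}")
print(f"expected false positives: {fp:.0f}")
print(f"chance a positive result is real: {ppv:.2f}")
```

In this scenario about 100 true positives sit alongside about 99 false ones, so a positive result is only slightly better than a coin flip.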

Bhattacharya says that critics focus too much on test specificity, rather than on the combined effect of specificity and sensitivity on the expected number of positives and negatives. He adds that new data on the test’s specificity will be published in the revised preprint, and that the team is confident about the test’s performance.  
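One standard textbook way to combine sensitivity and specificity is the Rogan-Gladen correction, which converts a raw positive rate into an estimate of true prevalence. The sketch below is for illustration only and is not a reconstruction of the preprint's analysis. The raw rate (50 of 3,330) and specificity (399 of 401) come from figures quoted in this article; the sensitivity of 0.80 is a placeholder assumption.

```python
def rogan_gladen(raw_positive_rate, sensitivity, specificity):
    """Estimate true prevalence from a raw positive rate.

    Solves raw = prev * sens + (1 - prev) * (1 - spec) for prev,
    clamping the result to the range [0, 1].
    """
    informativeness = sensitivity + specificity - 1.0
    if informativeness <= 0:
        raise ValueError("test is no better than chance")
    estimate = (raw_positive_rate + specificity - 1.0) / informativeness
    return min(1.0, max(0.0, estimate))

raw_rate = 50 / 3330          # raw positives reported in the survey
specificity = 399 / 401       # from the reported 2-in-401 false positives
assumed_sensitivity = 0.80    # hypothetical value, for illustration only

adjusted = rogan_gladen(raw_rate, assumed_sensitivity, specificity)

# If specificity were a little lower, the same raw rate could be
# explained entirely by false positives.
pessimistic = rogan_gladen(raw_rate, assumed_sensitivity, 0.98)

print(f"adjusted prevalence: {adjusted:.4f}")
print(f"with specificity 0.98: {pessimistic:.4f}")
```

The second call shows how sensitive the estimate is to specificity: a modestly worse test drives the corrected prevalence to zero.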

Most large seroprevalence surveys have avoided the kind of test used in the Santa Clara study. Like Busch, Harris says her team will rely on well-validated, lab-based tests for its East Bay study. The researchers plan to look for SARS-CoV-2 antibodies in blood collected from finger pricks and for viral RNA in saliva collected using oral and nasal swabs. They will also collect venous blood draws from a subset of 500 participants (250 positive for SARS-CoV-2 antibodies and 250 negative) and use them to evaluate the specificity and sensitivity of the tests used on finger-prick blood.

How to recruit participants

For their Santa Clara study, the Stanford researchers posted advertisements on Facebook saying they were “looking for participants to get tested for antibodies to COVID-19.” The reasons for this strategy were practical, says Sood: social media offers a fast, cheap way of getting people involved.

As readers pointed out, this kind of recruitment strategy can introduce selection bias, as people who think there’s a chance they’ve been infected may be most likely to participate. If at least some people have grounds to be worried, this effect can lead to a higher proportion of infected people in the sample than in the general population—though Sood notes biases can run the other way, too, if “worried well” people who overestimate their risk also decide to take part.

Within days of the preprint being posted, scientists and members of the public were using Twitter to share evidence that selection bias may have been an issue, as people encouraged friends who thought they’d been exposed or had COVID-19–like symptoms to participate in the study, and in some cases seemed confused about whether participation would result in a diagnosis. (It didn’t.)

Compounding these concerns, BuzzFeed News revealed last Friday (April 24) that participants were also recruited via an email sent by Bhattacharya’s wife, radiation oncologist Catherine Su, the day before the study started. The email, which Bhattacharya told BuzzFeed he had nothing to do with, falsely claimed the test was “FDA approved,” and would tell people “if you are immune” and “FREE from the danger of a) getting sick or b) spreading the virus.”  

Bhattacharya tells The Scientist that the researchers learned of Su’s email only after it was sent. The team subsequently tried to correct for bias by upping recruitment from parts of the county not targeted by the email. These and other methodological details omitted from the preprint will be addressed in the revision, Bhattacharya adds.

Other seroprevalence studies are taking precautions to avoid or minimize recruitment biases as much as possible. Vitalant’s study, for example, is using donor blood from people who have previously consented for their blood to be used in scientific research in general, rather than for a specific test related to COVID-19. Since March, and with the goal of continuing through the summer, the group has been working to collect 1,000 samples per month from each of the six study sites, along with demographic information about the donors.

This population is biased toward people healthy enough to give blood, Busch acknowledges, but he notes that, based on previous studies Vitalant has conducted for outbreaks of dengue, Zika, and other viruses, the approach has “proven to be quite informative in the ways we can generalize from blood donors to the general population.”

Another option, Gordon says, is to select people randomly, from a roster of residents in a county, say, or members of a university community. Setting quotas for age, race, and other demographic characteristics allows researchers to recruit a group that reflects as much as possible the wider community in that area—a particularly pertinent issue for research relating to COVID-19, which appears to hit some parts of society more than others. US hospital statistics assembled by the US Centers for Disease Control and Prevention, for example, indicate that the disease is disproportionately affecting Black Americans, as well as causing more serious infections in older people and men compared with the rest of the population.

Achieving representation is a difficult task for studies that recruit via social media. In the Santa Clara study, only 5 percent of participants were over the age of 65, despite seniors making up around 13 percent of county residents. Non-Hispanic whites made up nearly two-thirds of the study group, but account for less than one-third of the Santa Clara community. The authors note in their preprint that they were able to statistically adjust for some but not all of these demographic discrepancies.
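A common way to adjust for this kind of mismatch is post-stratification: compute the positive rate within each demographic group, then weight each group by its share of the real population rather than its share of the sample. The sketch below is a generic illustration, not the authors' procedure. The population shares (13 percent seniors) and sample composition (5 percent seniors) echo the figures above, but the sample size and positive counts are invented.

```python
def poststratified_rate(strata):
    """Weight each stratum's positive rate by its population share.

    strata: list of (population_share, n_sampled, n_positive) tuples.
    """
    total_share = sum(share for share, _, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(share * (pos / n) for share, n, pos in strata)

# Hypothetical sample of 1,000 people; positive counts are made up.
strata = [
    (0.87, 950, 19),  # under 65: 87% of county, 95% of sample
    (0.13, 50, 2),    # 65 and over: 13% of county, 5% of sample
]

raw_rate = sum(pos for _, _, pos in strata) / sum(n for _, n, _ in strata)
weighted_rate = poststratified_rate(strata)

print(f"raw sample rate:      {raw_rate:.4f}")
print(f"post-stratified rate: {weighted_rate:.4f}")
```

Note that with only 50 people in the smaller group, its rate is estimated very noisily, so reweighting can amplify that noise as well as correct the imbalance.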

Even roster-based approaches are still susceptible to selection bias, Gordon notes, as people invited to take part have to provide consent—a decision that may be affected by a person’s perceived level of exposure risk. In this situation, it can be difficult to know if the sample is biased or not, she adds, but researchers can address the problem by collecting additional data about who agrees to participate and why, so that attitudes toward testing can be controlled for later in the analyses.

Harris notes that the East Bay study, which will invite people to participate via flyers in English and Spanish sent to every home in the region, will collect this kind of information, as well as details about the likelihood of exposure in the time leading up to participants’ provision of samples.

How to communicate a study’s results responsibly

For many researchers, the problems with the antibody surveys reported so far lie not just in how the work was carried out but in the conclusions the authors have drawn and subsequently publicized.

One of the Santa Clara preprint’s coauthors, biotech investor Andrew Bogan, used a Wall Street Journal op-ed to argue that the study’s findings meant COVID-19 mortality is close to that of seasonal flu, despite criticisms of the study’s methods and the fact that epidemiologists have raised concerns about underreporting of deaths as well as infections.

The op-ed, which didn’t initially disclose Bogan’s involvement in the preprint, also questioned the logic of lockdowns around the country in light of the seroprevalence data. Bhattacharya, along with Stanford study coauthors Eran Bendavid and John Ioannidis, made similar points in interviews and opinion articles before and after the preprint was posted.

See “Opinion: Public Health Trumps Privacy in a Pandemic”

A two-page summary of the Heinsberg study (translated using Google Translate), meanwhile, concludes that around 15 percent of the population now has “immunity” to SARS-CoV-2 and “can no longer be infected.” Study coauthor Hendrik Streeck of the Institute of Virology at the University of Bonn had said in interviews in late March that he thought SARS-CoV-2 was “not that dangerous.”

Although some of these conclusions may turn out to be correct, researchers tell The Scientist that they’re not supported by current scientific evidence—least of all by the seroprevalence studies they cite. Indeed, antibody testing is unlikely to be the solution to the lockdown measures that some people think it is, Busch adds.

“While seropositivity . . . is a good proxy for infective exposure, it does not necessarily indicate seroprotection,” Joseph Wu, a disease modeler at Hong Kong University who is involved in a long-term seroprevalence study using lab-based blood tests to monitor different age groups’ exposure to SARS-CoV-2, writes in an email to The Scientist. “This means we don’t know whether the antibodies that result in seropositivity provide any protection against re-infection.” Studies that take a snapshot of seroprevalence at a particular time and place are instead useful as ways of “estimating the proportion of true infections that have been under-reported,” he adds, which in turn can be used to more accurately estimate infection rates in the future.

See “What Do Antibody Tests for SARS-CoV-2 Tell Us About Immunity?”

Harris agrees that antibodies and immunity are “two different things.” She notes that antibody levels vary substantially between people and wane over time, and that researchers don’t yet understand the effect of repeated SARS-CoV-2 exposure on a person’s risk of getting sick—an important consideration when modeling the effects of some other viruses. Long-term studies are needed to understand these kinds of patterns and their influence on the future spread of SARS-CoV-2.

While Harris says she hopes that the number of severe or fatal infections really is a tiny proportion of the overall tally, she notes that the data reported so far still suggest the vast majority of people haven’t been exposed.

“Even in the best-case scenario,” she adds, “where there’s a lot of people infected and only a small percentage that gets really sick, look how sick they’ve become and look what it’s done to the healthcare system.”
