Update (June 8): Surgisphere’s online COVID-19 Response Center and the four web tools hosted on it have been taken down from the company’s website.
A nonprofit organization in Africa that promoted a tool to help clinicians determine how to allocate limited medical resources among COVID-19 patients is walking back its recommendations in the wake of a scandal involving the company that collaborated on the project.
Surgisphere Corporation, an Illinois-based company founded in 2008 by vascular surgeon Sapan Desai, has come under fire in recent days for failing to obtain independent validation for datasets used in two high-profile studies in The Lancet and the New England Journal of Medicine. Both papers are now retracted.
The Lancet study, which reported safety concerns about the use of the antimalarial drug hydroxychloroquine in coronavirus patients, led the World Health Organization to suspend part of a clinical trial. Testing resumed last week after scientists expressed doubts about the veracity of the data and reports by The Scientist and other outlets exposed serious concerns about the company.
Originally a producer of medical textbooks, Surgisphere has seen its profile as a data analytics company soar since the start of the COVID-19 pandemic. Working with the African Federation for Emergency Medicine (AFEM), an international nonprofit organization dedicated to supporting medical care across the continent, Surgisphere developed a COVID-19 Severity Scoring Tool to help clinicians decide how to allocate limited resources such as oxygen and mechanical ventilators to patients who need them most.
In the last couple of months, AFEM has promoted the tool for use in 26 countries across Africa (although The Scientist could not determine how many clinicians are currently using it), and several institutions had been set to launch validation studies of the tool in clinical settings. Those activities have all been halted following the retractions and a stream of questions about Surgisphere and Desai himself.
The people who will suffer . . . are African patients. That is the unfortunate reality now, regardless of the outcome of the (totally necessary) investigations into Surgisphere.—Lee Wallis, African Federation for Emergency Medicine
In a statement posted June 5, AFEM announced that it recommends clinicians stop using the tool. “We recognise that we have promoted the use of this tool, and are embarrassed that these findings surrounding Surgisphere have led to our needing to rescind this resource,” the statement reads.
“Over the last decade, AFEM has worked with healthcare providers, researchers, and policymakers throughout the continent to expand emergency care,” the statement continues. “We have built a reputation of developing these systems through informed, evidence-based recommendations, and we deeply regret that this was not one of them.”
A collaboration to aid Africa’s pandemic response
In phone interviews and email correspondence with The Scientist, AFEM’s founding president Lee Wallis says he reached out to Surgisphere about developing a clinical aid for local doctors a few months ago, after discovering the company online.
“Dr Desai was explicit that he didn’t need the company logo associated with it, and had no expectation even of us acknowledging his involvement,” Wallis tells The Scientist in an email. “[Desai] never asked for any endorsements or any other form of benefit, and has repeatedly expressed that he simply wanted to do this to help the response to the pandemic.”
According to Surgisphere, the company then developed the Severity Scoring Tool using advanced machine learning algorithms and the firm’s database of thousands of COVID-19 patients.
The validity of that database has been called into question in recent weeks by hundreds of scientists who say the numbers of patients from various continents don’t seem to add up. In the Lancet paper, for instance, Surgisphere claimed to have amassed data on more than 63,000 COVID-19 patients hospitalized in North America by April 14. But some of the largest health networks in New York, New Jersey, and Illinois—among the states worst hit by the pandemic—tell The Scientist they did not contribute to the company’s database. Multiple institutions once listed on Surgisphere’s website as collaborators have confirmed to The Scientist that they have no records of working with the company.
By April, AFEM had developed a paper version of the Severity Scoring Tool—which guides clinicians through a series of questions about their patients to predict the severity of each case—specifically for use in low-resource areas, and publicized the web version via their website. The ministries of health in Sudan and Tanzania had incorporated the tool into official clinical guidelines, Wallis says. The organization also secured approval from the relevant ethics committees to carry out validation studies with the tool in Sudan and South Africa. Data collection was due to start in the coming week.
In conversations with The Scientist a few days ago, Wallis, who is also on the board of directors for the International Federation for Emergency Medicine, noted the need for caution in rejecting the paper version of the tool, which he and his collaborators had poured resources into and were hoping to see through validation studies. He added that informal tests with the tool seemed to show it working well.
However, now that the coauthors of Surgisphere’s two published studies have retracted them after failing to receive evidence of the validity of the company’s database, the organization is left with no choice but to withdraw its recommendations and the planned studies, Wallis tells The Scientist.
Surgisphere’s COVID-19 Severity Scoring Tool: “It was not credible”
The Severity Scoring Tool is one of a handful of so-called “decision support tools” freely available on Surgisphere’s website as part of its COVID-19 Response Center.
To use the web application, a clinician has to enter details about a patient’s symptoms and underlying health, and then click to receive a prediction. The paper version walks clinicians through a decision tree of questions about whether or not a patient has certain underlying health conditions, and whether their vital signs are above or below a certain threshold. The patient criteria to be included in the tool were suggested by AFEM, but the structure of the tree, and the thresholds themselves, were set by Surgisphere.
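A decision tree of the kind described above, with fixed questions and fixed vital-sign thresholds, can be sketched in a few lines of code. The criteria and cutoffs below are invented for illustration; Surgisphere never published its actual rules, so this is only a hypothetical example of the general structure:

```python
# Hypothetical sketch of a hard-coded severity decision tree.
# The questions and thresholds below are invented for illustration;
# they are not Surgisphere's actual criteria.

def severity_score(age: int, spo2: float, resp_rate: int,
                   has_comorbidity: bool) -> str:
    """Walk a fixed decision tree and return a severity label."""
    if spo2 < 90 or resp_rate > 30:       # vital sign past a threshold
        return "critical"
    if has_comorbidity and age >= 65:     # underlying condition plus age
        return "severe"
    if spo2 < 94:
        return "severe"
    return "moderate/mild"

print(severity_score(age=70, spo2=95.0, resp_rate=18, has_comorbidity=True))
```

Every question, branch, and numeric cutoff is written out explicitly in the source, which is what makes such a tool easy to publish as a paper checklist.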
Surgisphere has declined to release details about any of its COVID-19 support tools in response to multiple requests from The Scientist, and Desai declined to comment on this story through the public relations firm Bliss Integrated.
According to Wallis, Desai said the Severity Scoring Tool had been developed using advanced machine learning methods on data from 13,500 hospitalized COVID-19 patients—a claim repeated in Wallis’s editorial on the tool in the African Journal of Emergency Medicine in early April. He says Desai also told him the tool had been validated on around 45,000 hospitalized COVID-19 patients. A March 26 press release from Surgisphere states that the tool was developed using “prospectively collected real time data on more than 20,000 COVID-19 patients.”
Desai further claimed in his communication with Wallis that the machine learning algorithms behind the online version would be automatically refined as Surgisphere’s database continued to grow.
Given that the Severity Scoring Tool was initially derived using data from Surgisphere, these data being called into question also calls into question our tool.—African Federation for Emergency Medicine
Around the same time, Desai sent information about the tool to researchers preparing a review article in The BMJ. Maarten van Smeden, a coauthor of the review and a medical statistician at University Medical Center Utrecht in the Netherlands, and colleagues have been keeping track of predictive tools in use during the COVID-19 pandemic. Van Smeden tells The Scientist that his team reached out to Surgisphere Corporation in March and again in April to request information about how its COVID-19 response tools worked.
In documents seen by The Scientist, Desai told the team that the Severity Scoring Tool had been developed using data from 14,390 hospitalized COVID-19 patients, and validated with data from an additional 42,340. He added that the full Surgisphere registry by April 12 contained data from 70,361 patients from more than 800 healthcare institutions. (According to the now-retracted Lancet paper, the registry contained 96,032 patients from 671 institutions by April 14.)
He further told the BMJ authors that the Severity Scoring Tool had already been used by “250,000 people from 114 countries.”
The numbers didn’t add up at that stage in the pandemic, van Smeden tells The Scientist. Considering the lack of detail in the documents’ description of development and validation, he found the whole thing “incredible,” he says. “As in, it was not credible.” The authors chose not to describe Surgisphere’s tool in their review, the latest version of which was published on June 3.
How the online tool works
The code behind Surgisphere’s Severity Scoring Tool is accessible via the HTML version of the company’s website, and contains the rules the application implements to predict a severity score of “moderate/mild,” “severe,” or “critical” on the basis of inputs such as age and heart rate.
Multiple researchers who work with machine learning tell The Scientist that, because the rules underlying the tool are hard-coded—that is, all of the steps and parameters are written out—it would be very difficult to have the application automatically update itself, as Wallis had been led to believe by Desai.
“Every time you get new data, you’d have to go in and change those numbers, or the structure of the code,” says James Watson, a senior scientist at the Mahidol Oxford Tropical Medicine Research Unit in Thailand who organized two open letters expressing concerns about Surgisphere’s studies. “That would just be clearly the wrong way of doing it. No one would do it like that.”
A version log tracking “major updates and changes” to the Severity Scoring Tool doesn’t show any entries after March 31, although it’s not clear whether smaller changes were made after that.
Andrew Forbes, a biostatistician at Monash University in Melbourne, notes that the application also seems to collect data without any attempt at validating the accuracy of what’s being entered. After filling out the Severity Scoring Tool online and clicking to receive the score, a pop-up box asks the user to confirm whether or not the prediction was accurate, and states that, provided the user consents, “The data you entered will become part of our more than 13,000 patients and help improve the quality of this severity scoring system.”
The availability of this option raises serious questions about the quality of data collected by the application, Forbes writes in an email to The Scientist. “I am surprised that anyone accessing this publicly available tool can have their ‘data’ contributing to their database based on their inputs from the web page, regardless of whether the data entered are real or not.”
He adds that he would caution against clinical use of predictive tools that aren’t accompanied by clear descriptions of how they were developed and validated, and by information about the data sources used to create them.
Other applications on the website have also come under scrutiny after scientists began sharing their impressions of them on Twitter. The Diagnosis tool, for example, estimates that a person with a fever and a five-day “interval to respiratory symptoms” has a less-than-2-percent probability of COVID-19 infection. Changing five days to six days increases that probability to more than 98 percent. Ticking a box that says “sore throat” reduces the probability back to less than 2 percent.
AFEM’s response to criticism of Surgisphere
In early June, following negative press coverage of Surgisphere Corporation, clinicians in African hospitals started reaching out to Wallis to say they were uncomfortable using the tool, he tells The Scientist. As questions about Surgisphere’s dataset continued to build, AFEM responded publicly with a statement about the Severity Scoring Tool on its website on June 4.
“We have been made aware of controversies related to Surgisphere Corporation, including recent articles published in The Lancet and The New England Journal of Medicine using their data,” the statement reads. “Alongside our colleagues in the scientific community, we hope that these questions are resolved swiftly and satisfactorily by those involved in the studies.”
The next day, after both The Lancet and NEJM issued retractions, AFEM issued a second statement recommending that hospitals stop using the aid. “Given that the Severity Scoring Tool was initially derived using data from Surgisphere, these data being called into question also calls into question our tool,” the statement reads. “The AFEM team has undertaken urgent discussions to make the safest, most ethical decision regarding the use of the tool.” The paper version and the link to the online version have both been removed from AFEM’s website.
Now, Wallis says, the organization is focused on trying to help teams transition away from using the tool. The group recommends in its statement that clinicians follow the World Health Organization’s guidelines for now, and says it intends to proceed with “original research to develop a robust, evidence-based tool to help clinicians in understanding likely resource requirements of patients presenting to emergency units with potential COVID-19 disease.”
The organization plans to start work right away, Wallis says, but it’s going to take months to develop and test a new tool. He expresses concern that the association with Surgisphere could damage trust in AFEM, and threaten future efforts to develop clinical tools for use in African countries. “The people who will suffer due to that are African patients,” he writes in an email to The Scientist. “That is the unfortunate reality now, regardless of the outcome of the (totally necessary) investigations into Surgisphere. There is no good result for us or our patients in that context.”