Toward Protecting Participants’ Privacy

Genomic data shared via the Beacon Project are vulnerable to privacy breaches, scientists show.

By | October 29, 2015

Anonymous patients whose DNA data is shared via a network of web servers, or beacons, set up by the Global Alliance for Genomics and Health are at risk of being reidentified, according to a report published today (October 29) in The American Journal of Human Genetics. In it, researchers from Stanford University and their colleagues present recommendations for how security could be improved, but some scientists argue that any promise of DNA data privacy is probably a fallacy.

“The paper shows that . . . if you have access to someone’s DNA, you can now go and check in different beacons to see whether [that person] participated,” said computer scientist and computational biologist Yaniv Erlich of Columbia University in New York City, who was not involved in the work.

The Beacon Project was established by the Global Alliance for Genomics and Health as a way for research institutes and hospitals to easily share genomic data while maintaining patient privacy. Essentially, the system allows a user to ask whether a specific nucleotide exists at a particular chromosome location in any genome held in a given beacon, but keeps all other sequence data concealed. This means that a clinician could check whether a mutation found in one of her own patients had been discovered in other patients, for example, without actually having access to those other patients’ genomes.
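The yes-or-no query described above can be illustrated with a minimal sketch. It is a toy model, not the Global Alliance's actual interface: the class name `ToyBeacon`, the method `has_allele`, and the dictionary-based genome representation are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a genome maps (chromosome, position) to a nucleotide.
Genome = Dict[Tuple[str, int], str]


class ToyBeacon:
    """Toy beacon: answers only whether any stored genome has a given nucleotide."""

    def __init__(self, genomes: List[Genome]) -> None:
        # The genomes themselves are kept private and never returned to callers.
        self._genomes = genomes

    def has_allele(self, chromosome: str, position: int, nucleotide: str) -> bool:
        # True if at least one stored genome carries this nucleotide at this site.
        key = (chromosome, position)
        return any(genome.get(key) == nucleotide for genome in self._genomes)


beacon = ToyBeacon([
    {("1", 1000): "A", ("2", 500): "G"},
    {("1", 1000): "C", ("2", 500): "G"},
])

seen_in_beacon = beacon.has_allele("1", 1000, "C")      # one stored genome has C here
not_seen_in_beacon = beacon.has_allele("2", 500, "T")   # no stored genome has T here
```

The caller learns a single bit per query and nothing about which genome, or how many genomes, produced the match.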

But there is currently no cap on the number of queries a user can make, explained Carlos Bustamante of Stanford, who led the research. “[We] started thinking about how this particular setup could be exploited by a nefarious user. So thinking: If I can ping this database and I have an unlimited number of queries, could I find out whether a particular individual is in that database?”

The answer was yes. The team showed that with a simulated beacon containing the genomic data of 1,000 individuals, 5,000 nucleotide queries were enough to determine whether a known individual was in the database. In an artificial beacon containing only 65 genomes, the number of queries needed dropped to 250. “It is easier to hide in a big crowd than a small crowd,” explained Bustamante.
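The intuition behind such an attack can be shown with a simplified simulation. This is not the statistical test used in the paper, which the article does not describe. It assumes error-free genomes, a made-up population in which every site carries a rare variant with the same frequency, and hypothetical names (`random_genome`, `yes_fraction`). The idea is that if the target is in the beacon, every query built from the target's own variants must come back "yes," while an outsider's rare variants will sometimes be missing.

```python
import random
from typing import Set


def random_genome(rng: random.Random, n_sites: int, alt_freq: float) -> Set[int]:
    """Return the set of sites at which a simulated person carries a rare variant."""
    return {site for site in range(n_sites) if rng.random() < alt_freq}


def yes_fraction(beacon_sites: Set[int], target: Set[int], max_queries: int) -> float:
    """Query the beacon for the target's variants; return the share answered 'yes'."""
    queries = sorted(target)[:max_queries]
    if not queries:
        return 0.0
    hits = sum(1 for site in queries if site in beacon_sites)
    return hits / len(queries)


# All parameters below are illustrative assumptions, not values from the study.
rng = random.Random(0)
N_SITES, ALT_FREQ, BEACON_SIZE = 5000, 0.01, 100

members = [random_genome(rng, N_SITES, ALT_FREQ) for _ in range(BEACON_SIZE)]
outsider = random_genome(rng, N_SITES, ALT_FREQ)

# The beacon can answer 'yes' for any variant carried by at least one member.
beacon_sites = set().union(*members)

member_score = yes_fraction(beacon_sites, members[0], max_queries=250)
outsider_score = yes_fraction(beacon_sites, outsider, max_queries=250)
print(member_score, outsider_score)
```

In this toy setting the member's score is always 1.0, and the outsider's score falls below it because some of the outsider's rare variants are absent from the beacon. A larger beacon covers more variants, which narrows that gap and is consistent with Bustamante's remark about hiding in a big crowd. Real data contain sequencing errors, so a practical test would have to tolerate some mismatches.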

To find a person’s genome within a given beacon, a user would need to already be in possession of that person’s genome sequence. “When someone has access to your DNA . . . you’ve already lost quite a lot of your privacy,” said Erlich. But because many beacons contain the genomes of people with particular diseases, it may be possible for the user to learn information about the person’s medical history that they had wished to keep private.

“From your genome alone it is hard to predict whether you have diabetes or heart disease or a psychiatric disorder,” said Bustamante. But if a user found a person’s genome in, say, a database of people with psychiatric disorders, they might guess that the person had such a disorder.

“You may share stuff on Facebook,” said Bustamante. “But you don’t necessarily want every aspect of everything you’ve ever done to be known.”

As a result of the study, the authors have made security recommendations to the Global Alliance that the group is “now in the process of implementing,” said Bustamante. For example, he and his colleagues suggested that beacons should contain as many genomes as possible to make detection of a single individual more difficult, that users should be registered rather than anonymous, and that unusual activity—such as thousands of queries from one user—should be investigated.
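Two of these recommendations, registered users and scrutiny of unusual query volumes, could be combined along the following lines. This is only a sketch of one possible approach. The class `QueryMonitor`, its `flag_threshold` parameter, and the choice to count queries per user ID are assumptions, not details from the study or from the Global Alliance.

```python
from collections import Counter


class QueryMonitor:
    """Counts queries per registered user and flags unusually heavy users."""

    def __init__(self, flag_threshold: int) -> None:
        # Hypothetical policy: more than this many queries triggers a review.
        self.flag_threshold = flag_threshold
        self._counts: Counter = Counter()

    def record(self, user_id: str) -> bool:
        """Record one query; return True if the user should be flagged for review."""
        if not user_id:
            # Anonymous access is rejected, in line with the registration idea.
            raise ValueError("queries require a registered user ID")
        self._counts[user_id] += 1
        return self._counts[user_id] > self.flag_threshold


monitor = QueryMonitor(flag_threshold=3)
flags = [monitor.record("alice") for _ in range(5)]
print(flags)  # [False, False, False, True, True]
```

A production system would also need time windows and a way to tell heavy but legitimate clinical use from probing, neither of which this sketch attempts.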

While Erlich agreed that identifying and patching such holes is worthwhile, he thinks the whole conversation regarding privacy needs to change. “What we should emphasize is protection against harm [resulting from reidentification], not protection against reidentification [itself],” he said, because preventing the latter “is almost impossible.”

“The take-home message from a paper like this . . . is that it is important to notify research participants about the limitations of [data] privacy,” he added.

George Church of Harvard Medical School agreed. “I’m the last person on the planet to say something is impossible, but I think, so far, [the idea of privacy] has been delusional.”

“If [patients] are properly educated up front, they realize there is a risk,” he added. “And if they are uncomfortable with that outcome, they should probably let somebody else donate [DNA].” Losing a few patients who want guaranteed privacy is certainly better than “coercing them into participating by making promises that we can’t keep,” Church said.

S.S. Shringarpure and C.D. Bustamante, “Privacy leaks from genomic data-sharing beacons,” The American Journal of Human Genetics, 97: 1-15, 2015.

Comments

mightythor

October 30, 2015

Yaniv Erlich's point is very profound and is relevant to information in general:

  "What we should emphasize is protection against harm resulting from re-identification, not protection against re-identification itself, because preventing the latter is almost impossible."    

The same thing is true about identity theft in general.  Quarantining is not effective.  That ship has sailed.  We have become far too informationally promiscuous.  We need ways to hold victims harmless against the effects of inevitable security breaches.  
