“Anonymous” Genomes Identified

The names and addresses of people participating in the Personal Genome Project can be easily tracked down despite such data being left off their online profiles.

By | May 3, 2013

WIKIMEDIA, GEORGE GASTIN

Data privacy researchers have been able to identify the names of hundreds of participants in the Personal Genome Project (PGP) using demographic data from their profiles, according to a paper out this week on the arXiv preprint server. The authors also suggest ways in which contributors can increase their privacy.

Launched in 2006, the PGP aims to collect genetic data as well as health and lifestyle information from 100,000 people to help researchers tease apart the interactions between genotype, environment, and phenotype. The project does not guarantee privacy, reported MIT Technology Review, and participants can choose to disclose as much personal data as they want, including ZIP code, birth date, and gender, on their online PGP profile. But these profiles are “de-identified,” meaning their names and addresses are not made public.

Now, researchers from Harvard University have demonstrated that this veneer of anonymity is easily breached. The team compared demographic data from 579 PGP profiles containing ZIP codes, full dates of birth, and genders with information from voter lists and other public records, and also looked for patient names in the files participants had uploaded to the PGP website. Together, these two approaches identified 241 participants. Checking the results with administrators at the PGP, the team found that 84 percent of these matches were correct, demonstrating that PGP profiles are vulnerable to re-identification.
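The demographic matching described above can be illustrated with a minimal sketch. It assumes a simple exact-match join on ZIP code, birth date, and gender; the function name, field names, and records are all hypothetical, and this is not the researchers' actual code or data.

```python
from collections import defaultdict


def link_records(profiles, voter_list):
    """Match each profile to a voter sharing the same (ZIP, birth date, gender).

    Only unambiguous matches are returned: a profile is linked to a name
    only when exactly one voter record has that combination.
    """
    # Index the public records by the three demographic fields.
    index = defaultdict(list)
    for voter in voter_list:
        key = (voter["zip"], voter["dob"], voter["gender"])
        index[key].append(voter["name"])

    matches = {}
    for profile in profiles:
        key = (profile["zip"], profile["dob"], profile["gender"])
        candidates = index.get(key, [])
        if len(candidates) == 1:
            matches[profile["id"]] = candidates[0]
    return matches


# Made-up example data (hypothetical people and profile IDs).
profiles = [
    {"id": "P1", "zip": "02138", "dob": "1970-04-12", "gender": "F"},
    {"id": "P2", "zip": "02139", "dob": "1982-09-30", "gender": "M"},
    {"id": "P3", "zip": "02140", "dob": "1990-01-01", "gender": "F"},
]
voter_list = [
    {"name": "Alice Example", "zip": "02138", "dob": "1970-04-12", "gender": "F"},
    {"name": "Bob Sample", "zip": "02139", "dob": "1982-09-30", "gender": "M"},
    {"name": "Bill Sample", "zip": "02139", "dob": "1982-09-30", "gender": "M"},
]

matches = link_records(profiles, voter_list)
print(matches)
```

In this toy data only P1 is linked to a name: P2 shares its combination with two voters, and P3 has no counterpart in the voter list. The more precise the three fields, the more often a combination points to a single person.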

This could be harmful, the authors of the study argued, because many participants reveal sensitive personal details, such as predispositions to genetic diseases, that might affect life insurance premiums and claims. The 2008 Genetic Information Nondiscrimination Act covers medical insurance, but not life insurance.

The researchers added that privacy protection could easily be firmed up with little impact on research value if PGP participants included less precise birth date and ZIP code information. They have also developed an editing tool to help people make such changes to their PGP profiles, which cannot otherwise be modified.
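To see why less precise fields help, consider the sketch below. It assumes one possible form of coarsening, keeping only the birth year and the first three digits of the ZIP code; the article does not specify what the researchers' editing tool does, so the `generalize` and `count_unique` helpers and the records are purely illustrative.

```python
from collections import Counter


def generalize(record):
    """Return a copy with the ZIP cut to 3 digits and the birth date cut to the year."""
    return {**record, "zip": record["zip"][:3], "dob": record["dob"][:4]}


def count_unique(records):
    """Count records whose (ZIP, birth date, gender) combination appears only once."""
    counts = Counter((r["zip"], r["dob"], r["gender"]) for r in records)
    return sum(1 for r in records if counts[(r["zip"], r["dob"], r["gender"])] == 1)


# Made-up records (hypothetical values).
records = [
    {"zip": "02138", "dob": "1970-04-12", "gender": "F"},
    {"zip": "02139", "dob": "1970-11-02", "gender": "F"},
    {"zip": "02140", "dob": "1982-09-30", "gender": "M"},
    {"zip": "02141", "dob": "1982-01-15", "gender": "M"},
    {"zip": "90210", "dob": "1982-01-15", "gender": "M"},
]

before = count_unique(records)
after = count_unique([generalize(r) for r in records])
print(before, after)
```

With full dates and five-digit ZIP codes, all five toy records are unique; after coarsening, only one remains unique, while birth year and region are still available for research.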

Clarification (May 3): The text has been amended to more accurately reflect that a portion of the 241 participants “re-identified” were found using names included in the files they had uploaded to the PGP website. As Jane Yakowitz Bambauer, associate professor of law at the University of Arizona, pointed out on the Info/Law blog, 115 of the 241 were “re-identified” in this way, and 80 of those 115 could not have been found using their demographic data alone. Thus, using demographic data alone, the researchers could only have re-identified 161 of the 579 participants.


JasonBobe


Posts: 1

May 3, 2013

More background information about the non-anonymous PGP study can be found here, along with a discussion as to whether the proposed data scrubbing tools might actually contribute to a sense of privacy and mislead participants engaged in public research: 



Brian Hanley

Posts: 36

May 3, 2013

Welcome to the omni-surveillance future!  This is just the beginning. Deal with it. It's not going away. 
