Blue-toned illustration of the DNA double helix, with additional DNA strands in the background

Nearly Complete Human Genome Sequenced

In a preprint, researchers fill in some of the holes left in the first draft of the human genetic code, published at the turn of the century.

Jef Akst

Jef Akst is managing editor of The Scientist, where she started as an intern in 2009. She received a master’s degree from Indiana University in April 2009, having studied the mating behavior of seahorses.


Jun 8, 2021


Update (April 1, 2022): The preprint from Karen Miga and her colleagues was published yesterday in Science, along with several other papers on research pertaining to the Telomere-to-Telomere (T2T) Consortium’s efforts to create a complete human reference genome.

The Human Genome Project was a tour de force that resulted in the first draft human genome sequence in 2000, but it wasn’t actually complete. The work left sequence gaps that genomicist Karen Miga of the University of California, Santa Cruz, calls the “final unknown” in remarks to STAT. In total, about 8 percent of the more than 3-billion-base-pair human genome—mostly repeats that are computationally challenging to assemble—has remained unsequenced in the two decades since that first draft.

Filling in those gaps has “never been done before,” Miga tells STAT, “and the reason it hasn’t been done before is because it’s hard.” But with an international group of collaborators, Miga last month (May 27) posted a preprint that starts to do just that, adding nearly 200 million DNA bases to the known human genome sequence and discovering some 115 potentially protein-coding genes in the process.

“It’s exciting to have some resolution to the problem areas,” Kim Pruitt, a bioinformatician at the US National Center for Biotechnology Information in Bethesda, Maryland, who was not involved in the research, tells Nature.

Miga and her colleagues used long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore to interrogate the DNA extracted from a cell line derived from a uterine growth called a hydatidiform mole. This structure forms through the fertilization of an egg with no nucleus, meaning that the mole carries only DNA from the sperm, and none from the person whose uterus it was growing in—a genetic anomaly that made it easier to decipher more of the genome because it didn’t involve sorting out the genetic contributions of two parents.

Researchers years ago had generated cell lines from this hydatidiform mole, and therefore it’s possible that mutations arose in the genome before it was sequenced for this latest project, such that the new genetic information “may be largely the detritus that accumulates as a cell line is propagated over many years in culture,” Elaine Mardis, the co–executive director of the Institute for Genomic Medicine at Nationwide Children’s Hospital who did not participate in the work, tells STAT.

Because the cells were frozen for years and not serially passaged that whole time, Miga tells STAT, she thinks the new sequences are biologically relevant. However, she notes to Nature that there are a few regions that need further confirmation. Because the sperm that fertilized the egg to form the mole carried an X chromosome, the team has not dug into the genomic holes that exist in the human Y chromosome sequence—something the researchers are working on now.