In January 2025, the government of India announced the completion of the Genome India Project, which entailed sequencing the genomes of 10,000 Indians across 83 communities. The project, launched in 2020 with a goal of understanding Indian diversity, health, and disease, culminated with a comprehensive catalog of genetic variations, laying the foundation to build an Indian reference genome. What drove the researchers to undertake such a huge project?

Shaped by its complex history and vast geography, India is home to more than 1.4 billion people belonging to more than 4,600 communities.1 Ethnically and geographically diverse populations can differ in their genetic makeup, which can influence an individual’s susceptibility to developing diseases like cancers and neurodegenerative disorders.2,3 Despite this, for nearly 20 years, scientists have relied on a reference human genome that lacks sequencing data from diverse populations.4
“The databases [describing] the global genomics landscapes are extremely Eurocentric, with very few numbers of genetic variations captured from Indians,” said Bratati Kahali, a geneticist and computational biologist at the Indian Institute of Science and a principal investigator in the Genome India Project. The Indian population remains largely unstudied, further contributing to the underrepresentation of their genetic variations on a global scale, she explained. For instance, a majority of the sequence of the original reference human genome came from just 11 people, many of whom were enrolled through a newspaper advertisement in New York.5
As scientists sequenced the human genome to build a more reliable reference, Sridhar Sivasubbu, a retired genomics researcher from the Institute of Genomics and Integrative Biology (IGIB), and his colleagues noticed the underrepresentation of the Indian population in these efforts. “We represent a significant percent of the world population, one fifth of the world,” said Sivasubbu. “It is time that we filled that gap, and that filling of the gap started in 2009.”

Sridhar Sivasubbu is a genomics researcher who led human genome sequencing in India for the first time, laying the foundation for the Genome India Project.
Back in the early 2000s, researchers tried to map genetic variations in the Indian population, said Mohammed Faruq, a clinical genomics researcher at the IGIB and one of the principal investigators in the Genome India Project. With the sequencing technology available to them then, researchers developed a database of genomic variations in 900 candidate genes in the Indian population.6 In 2009, following advances in next-generation sequencing, Sivasubbu set up the first facility in India capable of sequencing an entire human genome and went on to sequence 1,000 genomes.
This work formed the foundation of the biggest genome project carried out by India to date, which brought together researchers from 20 institutes across the country to sequence 10,000 genomes.
As part of the project, researchers sampled ethnically, linguistically, and geographically diverse populations residing in large urban societies and small tribal villages. “We have selected [a] few areas and [a] few populations which may serve as a guide for understanding the remaining diversity which is to be studied in future,” said Faruq.
This was no small feat and required exemplary team effort, according to Kahali. Teams responsible for collecting samples had to navigate through extremely remote locations, often waiting for the appropriate season to do so. Sample collection also required several permissions at local, regional, and state levels.

Anthropologist consultant Ganga Nath Jha of the Genome India Project conversed with village residents in the state of Uttarakhand to include local participants.
The researchers transported the collected samples to the nearest sequencing labs, where they isolated DNA and carried out whole genome sequencing. Equipped with the sequencing data, the researchers carried out joint genotyping by analyzing all samples simultaneously to identify variants. This required immense computing power, said Kahali, who added, “It was something that we were doing in the country for the first time.”
The sequencing data released earlier this year provided novel insights about the genetic diversity of the Indian population with distinct ancestries. The researchers observed that linguistic and geographical diversity also manifested as genetic diversity. Genomic comparisons revealed hundreds of millions of variants in the population, of which 27 million are relatively rare and linked to diseases such as hypercholesterolemia, hypertrophic cardiomyopathy, and some cancers. Of these rare variants, seven million are novel, meaning they are not found in similar databases globally.

Team members of the Genome India Project carried out fieldwork to enroll local people from the mountains in the state of Himachal Pradesh.
Identifying such variants can aid in researchers’ understanding of harmless mutations carried by healthy individuals. Knowledge of benign and deleterious variants can eventually offer clues about the genetic basis of common diseases.
The data also fill an important gap in the field of pharmacogenomics, or how genes dictate an individual’s responses to drugs. The researchers observed that many Indian populations carry several gene variants implicated in reducing the efficiency and efficacy of antiviral drugs. “Understanding our own people’s health at a genomic level is going to revolutionize [healthcare],” said Faruq. Insights from this project could guide researchers towards precision medicine, he added.
Genomic data from the project are being stored in a digital repository in one of the participating institutes, which, according to a report, researchers can access after obtaining clearance.

Sequencing 10,000 genomes is a significant milestone, demonstrating India’s capacity to execute a project of this scale, said Harsh Sheth, a genomics researcher at the Foundation for Research in Genetics and Endocrinology Institute of Human Genetics, who was not involved in the project. “The 10,000 will most likely not capture all possible pervasive diversity there is, but it gives you a beautiful flavor of what they’re really like if you want to dig deep,” he said. “The next stage is to really start honing the skills of doing mass scale genomics at a much, much cheaper cost.”
Although the project sequenced the genomes of only about two percent of India’s communities, Faruq said that there are plans for future expansion. “This is not the complete spectrum; this is just the beginning.”
1. The Indian Genome Variation Consortium. The Indian Genome Variation database (IGVdb): A project overview. Hum Genet. 2005;118(1):1-11.
2. Haiman CA, Stram DO. Exploring genetic susceptibility to cancer in diverse populations. Curr Opin Genet Dev. 2010;20(3):330-335.
3. Saleem Q, et al. Expanding colonies and expanding repeats. Lancet. 2002;359(9309):895-896.
4. Sherman RM, et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat Genet. 2019;51(1):30-35.
5. Khamsi R. A more-inclusive genome project aims to capture all of human diversity. Nature. 2022;603(7901):378-381.
6. Narang A, et al. IGVBrowser—A genomic variation resource from diverse Indian populations. Database. 2010;2010:baq022.