With more and more researchers conducting experiments that tease apart the functions of the thousands of genes that make up the genomes of mice, rats, and humans, the number of gene-expression datasets deposited in publicly accessible databases will soon reach one million, according to an analysis by Nature. Counting the datasets in the two major public repositories, the National Center for Biotechnology Information's Gene Expression Omnibus and the gene-expression database at the European Bioinformatics Institute, Nature estimates that the milestone will be reached within the next month.
"Some time in the next few weeks, the number of deposited data sets will top one million," Monya Baker wrote in Nature last week.
Gene-expression data can help scientists test preliminary hypotheses about which genes may contribute to particular diseases, pointing toward potential drug targets. For example, a researcher could comb public data from several studies of people or animal models with Alzheimer's disease to determine which genes are unusually highly expressed. After identifying likely suspects, she could then perform wet-lab experiments to clarify the roles of those genes in her own study subjects. If one gene emerges as a key driver of the disease, drugs can be developed to target that gene's activity, in the hope of changing the course of the disease. Starting with a robust database saves time and money by steering researchers toward the genes most likely to affect a particular disease. And more data means a better chance of identifying a key player in disease progression.