Image: Courtesy of Rosetta Biosoftware
There is a saying: "Be careful what you wish for, you just may get it." Biologists long pined for faster, more efficient ways to gather data; now they generate genomic information faster than they can assimilate it. The result: information overload. The solution: data mining.
Though data mining is an ambiguous term, most definitions include the idea of dealing with very large data sets and enabling exploratory data analysis, says Simon Lin, manager, Bioinformatics Core Facility, Duke University. That approach is handy when you're not sure what you're looking for. Traditional analysis, in contrast, tests a hypothesis.
"With data mining," Lin says, "you're always getting something unexpected." He cites a clinical collaboration in which a second, heretofore unknown, disease subtype was found, helping to explain why the standard treatment failed for...