Gene Expression Data Mining

Image: Courtesy of Rosetta Biosoftware. The Rosetta Resolver system's Image Viewer application showing an Affymetrix GeneChip probe array.


There is a saying: "Be careful what you wish for, you just may get it." Biologists long pined for faster, more efficient ways to gather data; now they generate genomic information faster than they can assimilate it. The result: information overload. The solution: data mining.

Though data mining is an ambiguous term, most definitions include the idea of dealing with very large data sets and enabling exploratory data analysis, says Simon Lin, manager, Bioinformatics Core Facility, Duke University. That approach is handy when you're not sure what you're looking for. Traditional analysis, in contrast, tests a hypothesis.

"With data mining," Lin says, "you're always getting something unexpected." He cites a clinical collaboration in which a second, heretofore unknown, disease subtype was found, helping to explain why the standard treatment failed for some patients. "We're using data mining to generate more hypotheses, not to confirm therapies," he emphasizes. In another collaboration, ...
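A toy illustration of the exploratory idea Lin describes: group patient expression profiles without telling the algorithm which group any patient belongs to, then check whether a structure emerges on its own. This is a minimal sketch on synthetic data, assuming two hypothetical patient groups that differ only in three genes. The function names, parameters, and the choice of k-means clustering are illustrative assumptions, not the method used in the Duke collaboration.

```python
import random


def make_synthetic_profiles(n_per_group=10, n_genes=10, shift=3.0, noise=0.5, seed=0):
    """Hypothetical toy data: two patient groups whose profiles differ only in genes 0-2."""
    rng = random.Random(seed)
    profiles, hidden = [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, noise) for _ in range(n_genes)]
            if group == 1:
                # The "unknown subtype": three genes are elevated.
                for g in range(3):
                    row[g] += shift
            profiles.append(row)
            hidden.append(group)
    return profiles, hidden


def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Gene-by-gene average of a list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(profiles, k=2, max_iter=100):
    """Plain k-means; it never sees the hidden group labels."""
    # Farthest-point initialization keeps the result deterministic.
    centroids = [profiles[0]]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        # Assign each profile to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


profiles, hidden = make_synthetic_profiles()
found = kmeans(profiles, k=2)
# Compare the discovered groups with the hidden subtypes (cluster numbering is arbitrary).
agree = sum(f == h for f, h in zip(found, hidden))
print(max(agree, len(hidden) - agree), "of", len(hidden), "samples grouped consistently with the hidden subtype")
```

The point of the sketch is the order of operations: the grouping comes first and the interpretation afterward, which is why such analyses tend to generate hypotheses rather than confirm them.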


Meet the Author

  • Gail Dutton

Published In

May 2025, Issue 1
