Benching Bases
How to do heavy computational lifting in genomes and transcriptomes
You've unpacked your next-generation sequencing system and popped in some DNA or RNA. Five days later, you've sequenced 50 million tiny strings of nucleotides. Then what?
Based on their sequences, you have to align all the fragments, called "reads," with the help of a reference genome—a fully assembled sequence from the same species. In the absence of a reference, you're left with assembling the genome based solely on the portions of the reads that overlap with each other. For both alignment and assembly, "computation becomes a big issue," says Steven Salzberg, director of the University of Maryland's Center for Bioinformatics and Computational Biology. "That's a huge amount of data, and in fact even streaming the data off the machine onto other computers causes network bandwidth problems."
That's because most newer technologies generate shorter reads—roughly 25 to 50 nucleotides in length.
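In rough terms, the two computational jobs look like this toy sketch (the reads, reference sequence, and minimum-overlap cutoff below are invented for illustration, not taken from any real platform):

```python
# Toy illustration of alignment vs. de novo assembly (not production code).
reads = ["ACGTAC", "GTACGG", "ACGGTT"]      # hypothetical short reads
reference = "TTACGTACGGTTA"                 # hypothetical reference genome

# 1) Alignment: place each read on a known reference (exact matching only here).
for r in reads:
    print(r, "maps at position", reference.find(r))   # -1 would mean unmapped

# 2) De novo assembly: with no reference, chain reads together by their overlaps.
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (>= min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

contig = reads[0]
for nxt in reads[1:]:
    k = overlap(contig, nxt)
    contig += nxt[k:]                       # extend by the non-overlapping tail
print("assembled contig:", contig)          # ACGTACGGTT
```

Real aligners and assemblers use far cleverer data structures, but the division of labor is the same: alignment leans on a reference, assembly leans on overlaps, and both have to be repeated tens of millions of times.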
The good news is that a new wave of alignment and assembly software solutions has caught up to next-generation sequencing. The Scientist talked to some of the developers. Here's what they said:
USER: René Warren, bioinformatician, BC Cancer Research Centre, Vancouver, British Columbia, Canada
Project: Developing an approach for sequencing all the types of T-cell receptor genes present in blood
Problem: Warren's group needed to capture the portion of the T-cell receptor gene responsible for generating millions of receptor variations in a healthy individual. Because that hot spot is so diverse between individual T-cell receptors—it has more than 10¹⁵ theoretically possible sequences—there is no single reference genome. "Basically you're doing a de novo assembly," Warren says. He needed a way to assemble the diverse genetic region, 12 to 16 nucleotides, from scratch, using short-read data.
Solution: Last summer, the group developed and tested iSSAKE, software that helps them assemble the genomic hot spot. It works by finding certain reads that are part of a known gene segment—the V-gene—that neighbors the hot spot. The strategy is like assembling a jigsaw puzzle of a landscape by picking out the pieces that capture the transition from one part of the image to another, like the border between the grass and the sky. "We segregated all the reads that aligned to the V-genes but had unmatched bases at the end," Warren says. "Presumably these reads would actually capture part of the [neighboring spot of interest], maybe all of it."
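A minimal sketch of that screening step (the V-gene fragment, reads, and anchor length below are invented examples, not the group's actual data): keep reads whose 5' end matches the known V-gene but whose 3' tail does not, and use those tails as seeds into the unknown region.

```python
# Reads whose start matches a known V-gene segment but whose tail does not
# presumably reach into the unknown hypervariable region.
v_gene_end = "TGTGCCAGCAGC"          # hypothetical 3' end of a V-gene segment

def unmatched_tail(read, anchor, min_anchor=8):
    """If the read starts with a suffix of the anchor, return its unmatched tail."""
    for k in range(len(read) - 1, min_anchor - 1, -1):
        if anchor.endswith(read[:k]):
            return read[k:]          # bases spilling over into the region of interest
    return None

reads = ["CCAGCAGCTTAGG", "GCAGCAGCGGAAT", "TTTTTTTTTTTTT"]
seeds = [t for t in (unmatched_tail(r, v_gene_end) for r in reads) if t]
print(seeds)                         # ['TTAGG'] -- seeds for the targeted assembly
```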
The group tested the algorithm on a data simulation of T-cell receptors based on GenBank data and found that for read lengths of 36 nucleotides, the method is more than 90% sensitive and more than 99.9% accurate even for relatively rare T-cell receptor types (Bioinformatics, 25:458–64, 2009).
Considerations: The group is now working with wet-lab data and has found 96% agreement between its computational reconstruction with iSSAKE and a small sample sequenced by traditional Sanger methods. "The challenge that remains is that there are still some errors in short-read data—this might affect the quality of the outcome," Warren says.
Download: ftp://ftp.bcgsc.ca/supplementary/iSSAKE (free)
USER: Nicholas Bergman, assistant professor of biology, Georgia Institute of Technology, Atlanta
Project: Mapping the transcriptome of anthrax-causing bacteria (Bacillus anthracis) and measuring gene expression levels
Problem: Bergman's group uses Applied Biosystems' SOLiD gene sequencer, because it produces more data than do other new platforms. But when they started using the system, the software tools available for analyzing the data were too slow and had a hard time dealing with sequencing errors (mismatches between the short reads and the reference genome). "It would take [an inaccuracy] and say this read is unmappable," Bergman says. The group needed a new algorithm that would tolerate these errors and move through vast amounts of data.
Solution: The group began working on the problem in February of last year, when Bergman's graduate student, Brian Ondov, proposed using an algorithm that performs fast string searches and that is commonly used for detecting plagiarism. Based on that algorithm, they built a search tool that maps SOLiD sequence reads very quickly and allows users to set the error tolerance, thus selecting their own preferences for speed and accuracy.
That tool became SOCS (short oligonucleotide color space), which Bergman has since been using to measure patterns of gene expression (Bioinformatics, 24:2776–77, 2008). The program aligns fragments of bacterial mRNA to a reference genome to see which genes are expressed. "We're basically using this to replace microarrays," Bergman says.
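The counting step behind that microarray replacement is simple once the reads are mapped; a toy version (gene coordinates and mapped positions are invented):

```python
# Estimate expression by counting mapped reads per gene (illustrative numbers only).
genes = {"geneA": (0, 1500), "geneB": (1500, 4000)}    # (start, end) coordinates
mapped_positions = [120, 130, 950, 2100, 2101, 3999]   # start positions of mapped reads

counts = {name: 0 for name in genes}
for pos in mapped_positions:
    for name, (start, end) in genes.items():
        if start <= pos < end:
            counts[name] += 1

print(counts)   # {'geneA': 3, 'geneB': 3} -- a crude microarray-style readout
```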
Considerations: The software is appropriate for functional genomics research using SOLiD, with techniques such as ChIP-Seq and RNA-Seq. "It's completely flexible," Bergman says. "The higher you set your tolerance [for errors], the more you map, but the longer it takes."
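A generic sketch of what that knob controls (a simplified seed-and-check mapper, not the SOCS algorithm itself; the reference, read, and tolerance values are invented):

```python
# Simplified error-tolerant mapping: anchor on an exact seed, then accept
# placements with at most a user-set number of mismatches in the full read.
def map_read(read, reference, max_mismatches=2, seed_len=8):
    seed = read[:seed_len]                      # assumes the first bases are error-free
    hits, start = [], reference.find(seed)
    while start != -1:
        window = reference[start:start + len(read)]
        if len(window) == len(read):
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:    # the speed/accuracy knob
                hits.append((start, mismatches))
        start = reference.find(seed, start + 1)
    return hits

reference = "ACGTTAGCCGATACGTTAGCAGAT"
print(map_read("ACGTTAGCAGAT", reference))      # [(0, 1), (12, 0)]
```

Raising `max_mismatches` keeps more reads, including error-containing ones, at the cost of checking more candidate placements.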
Download: http://socs.biology.gatech.edu/Usage.html (free)
USER: Benjamin Jackson, graduate student in the lab of Srinivas Aluru, Iowa State University, Ames
Project: Assembling large genomes from scratch
Problem: Assembling large genomes—which naturally contain a complex mix of deletions, duplications, and sequence rearrangements—can be quite daunting, especially with short-read data sets. Thus, assembling these genomes de novo takes a boatload of computer power.
Solution: Jackson began to tackle this problem in 2007 with a two-pronged approach.
First, he obtained paired short reads—portions of the genome that are separated by a known, approximate distance. Knowing the distance helped limit the complexity of processing, essentially reducing the number of puzzle pieces for assembly (see the sketch below).
Second, the group used a supercomputer, equipped with 1,024 processors, to handle the large data set. It took 4 months to write and debug software for the computer, but the reward was immediate: They can now assemble the Drosophila melanogaster genome in about an hour. Such a feat would have been impossible with a typical computer, Jackson says (BMC Bioinformatics, 10[Suppl 1]:S14, 2009).
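A toy version of that first trick (all positions and the library distance below are invented): a candidate placement for one read is only kept if its mate lands roughly the expected distance away.

```python
# Use the known pair separation to prune ambiguous placements.
EXPECTED_GAP = 2000     # approximate distance between paired reads in the library
TOLERANCE = 200

def consistent(left_pos, right_pos):
    """Keep a pair of placements only if their spacing fits the library."""
    return abs((right_pos - left_pos) - EXPECTED_GAP) <= TOLERANCE

left_candidates = [1000, 56000]      # possible positions for read 1
right_candidates = [3050, 90000]     # possible positions for its mate

pairs = [(l, r) for l in left_candidates for r in right_candidates if consistent(l, r)]
print(pairs)                         # [(1000, 3050)] -- the ambiguity is resolved
```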
Considerations: Like all de novo short-read assembly methods, the software requires roughly 100× coverage, meaning that, on average, you'll need to sequence the same location in the genome 100 times (before assembly). "The main pro is that it does scale to large genomes," Jackson says. "It's the first [bioinformatics tool] that scales to gigabase-sized genomes like mammals and plants."
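Coverage is simply the number of reads times read length divided by genome size, which shows how quickly the data pile up (the numbers here are illustrative, not from Jackson's runs):

```python
# Back-of-the-envelope coverage estimate: (number of reads * read length) / genome size.
genome_size = 120_000_000       # a Drosophila-scale genome, in bases (approximate)
read_length = 36
num_reads = 350_000_000

coverage = num_reads * read_length / genome_size
print(f"average coverage: {coverage:.0f}x")     # ~105x with these numbers
```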
Download: Not available yet, though Jackson says the group will make the software available to the research community when it's fully developed.
USER: Gunnar Rätsch, group leader, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
Project: Comparing short reads of spliced RNA with reference DNA sequences to predict where splicing occurs (RNA-seq)
Problem: Several methods designed to align mRNA sequences with genomic DNA don't work well with newer, short-read sequencing techniques or with heavy splicing (i.e., RNA that is much shorter than the DNA or that comes in several isoforms). Rätsch's group wanted an algorithm better suited to the new sequencing platforms.
Solution: In 2007, Rätsch's group began developing a program called QPALMA that incorporates splice-site predictions as well as the quality of each read when aligning it to the genome. (Each base comes with a quality score assigned by Illumina and other platforms.) "When we align the read, we check whether the read is identical to the genomic DNA," Rätsch says. "If there's a mismatch, we take the score into account."
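A toy sketch of the idea (this is not QPALMA's actual scoring model; the read, genome window, and penalty values are invented): a mismatch at a base the sequencer was unsure about costs less than one at a high-confidence base.

```python
# Quality-aware mismatch scoring: low-quality bases are penalized less.
def alignment_score(read, genome_window, qualities, match=1.0, mismatch=-2.0):
    score = 0.0
    for base, ref_base, q in zip(read, genome_window, qualities):
        if base == ref_base:
            score += match
        else:
            score += mismatch * min(q, 40) / 40.0   # scale penalty by base confidence
    return score

read      = "ACGTACGT"
window    = "ACGTACTT"                        # one mismatch, at position 6
qualities = [38, 39, 40, 37, 36, 35, 5, 38]   # ...where the quality score is low
print(alignment_score(read, window, qualities))   # 6.75: the mismatch barely hurts
```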
The software also makes predictions about splice sites, a step that takes roughly 12 hours for 2.5 million reads using a single-processor computer, according to Rätsch's estimates (Bioinformatics, 24:i174–80, 2008). "This step can be easily distributed over a cluster of 20 processors and would then take less than an hour," Rätsch says.
Considerations: Right now, the software works well for relatively short, small genomes, says Rätsch. "The difficulty is when the introns are extremely long, that's when QPALMA can get inefficient." The group is working on a new version of the software that will help address this, to be completed, Rätsch hopes, within two months.
Download: http://www.fml.mpg.de/raetsch/projects/qpalma (free)
USER: Cole Trapnell, graduate student in the labs of Steven Salzberg and Lior Pachter, University of Maryland, College Park
Project: Aligning short RNA fragments to cover entire genomes (RNA-seq for large genomes)
Problem: Doing RNA-seq is difficult because genes can be spliced in multiple ways, and it's tough to predict where splicing occurs. To add to the challenge, some genes aren't always highly expressed. "When you have light coverage, it's much harder to align the reads across the splice junction," Trapnell says. "If your goal is to discover all the genes right down to the nucleotide level, and the genes are not highly expressed, you'll only see parts of them sometimes."
Solution: Last summer, Trapnell helped graduate student Ben Langmead create Bowtie, a sequence alignment algorithm that compresses genome data and allows users to work with large data sets using a commonly available desktop computer.
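Bowtie's compression rests on the Burrows-Wheeler transform; here is the textbook transform on a tiny string (just the transform itself, not Bowtie's index; the "$" end marker is the usual convention):

```python
# Burrows-Wheeler transform: sort all rotations of the string, take the last column.
def bwt(text):
    text += "$"                                   # unique end-of-string marker
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("ACAACG"))   # GC$AAAC -- like characters cluster together, which is
                       # what makes the transform both compressible and searchable
```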
To deal with the unique challenges of RNA-seq, Trapnell added a layer of software to Bowtie. The new software, called TopHat, works by first mapping as many RNA fragments as possible to the reference genome, creating a skeleton for alignment. The skeleton corresponds roughly to the expressed genes in a genome, and TopHat uses the unmappable reads to discover how those genes are spliced (Bioinformatics Epub, March 16, 2009).
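A schematic of that two-pass idea (the helper below is a stand-in for a real aligner, and the junction test is deliberately crude; none of this is TopHat's internal code):

```python
# Pass 1: map reads that fit the genome contiguously (the expressed "skeleton").
# Pass 2: split the leftover reads and look for the two halves landing apart,
#         which marks a putative splice junction.
def align_contiguously(read, genome):
    """Stand-in for a Bowtie-style aligner; returns a position or None."""
    pos = genome.find(read)
    return pos if pos != -1 else None

def run_two_pass(reads, genome):
    islands, leftovers = [], []
    for read in reads:
        pos = align_contiguously(read, genome)
        (islands if pos is not None else leftovers).append((read, pos))
    junctions = []
    for read, _ in leftovers:
        left, right = read[:len(read) // 2], read[len(read) // 2:]
        lpos, rpos = align_contiguously(left, genome), align_contiguously(right, genome)
        if lpos is not None and rpos is not None and rpos > lpos + len(left):
            junctions.append((lpos + len(left), rpos))   # (donor end, acceptor start)
    return islands, junctions

genome = "AAATTTGGGCCCAAACCCGGGTTTAAA"
reads = ["TTTGGG", "GGGCCCGGGTTT"]       # the second read spans a fake "intron"
print(run_two_pass(reads, genome))       # ([('TTTGGG', 3)], [(12, 18)])
```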
Considerations: The program "works best when you have a decent reference genome assembly," Trapnell says. "If you have no assembly at all you can't use the program at all." Also, he adds, as with any new software, it makes sense to use multiple tools and check your results.
Although the group did not directly compare TopHat to QPALMA (see above), there are key differences, Trapnell says. "TopHat is designed to be very fast and is thus usable on large datasets and large genomes, where QPALMA is designed to be very sensitive, but not as fast."
Download: http://tophat.cbcb.umd.edu (free)