Article reviewed by Ninning Liu, PhD from the Wyss Institute at Harvard University.

What Is DNA Barcoding?

DNA barcoding is a molecular method that helps researchers identify organisms and biomolecules by sequencing short DNA segments, 10–800 base pairs long, called barcodes.1 Scientists use this method widely to identify species and to conduct high-throughput experiments that detect targeted small molecules within a pooled biomolecule population.2,3

Figure: Flow chart of DNA barcoding applications. Taxonomy and species identification fall under biodiversity assessment; spatial transcriptomics and cell lineage tracking fall under high-throughput DNA barcoding assays. Scientists use DNA barcoding for biodiversity analysis and several high-throughput assays. Credit: The Scientist.

DNA Barcoding for Biodiversity Assessment and Species Identification 

Similar to retail stores that use standardized barcodes to differentiate the thousands of items they sell, DNA barcodes are gene sequences that differentiate animals, plants, and microbial species. To do this, scientists extract DNA from tiny tissue samples taken from organisms of interest and sequence specific sections within or between genes for DNA barcoding. Subsequently, they match the unknown barcode sequence against known sequences in a DNA barcode library to identify the organism.1
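The matching step can be sketched in a few lines of Python. This is a minimal illustration, not how BOLD or any real reference system works: the sequences are made up, the function names and the 0.9 identity threshold are assumptions chosen for the toy data, and real pipelines align sequences before comparing them.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of positions that match between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query: str, library: dict[str, str], threshold: float = 0.9):
    """Return (best_name, best_score); best_name is None below the threshold."""
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = percent_identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # no confident match in the library
    return best_name, best_score


# Toy, made-up 20-base "barcodes" (hypothetical; real CO1 barcodes are far longer).
TOY_LIBRARY = {
    "species_A": "ATGGCATTCCTACGAGGTTA",
    "species_B": "ATGGCTTTCCTTCGTGGATA",
    "species_C": "ATGACATTTCTACGCGGTTA",
}

if __name__ == "__main__":
    # The query differs from species_A at a single position.
    print(identify("ATGGCATTCCTACGAGGTTG", TOY_LIBRARY))
```

A query that falls below the threshold for every reference is reported as unidentified rather than forced onto its nearest neighbor, which mirrors why a well-populated library matters.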

Researchers use DNA barcoding to conduct biodiversity assessments. With this technique, scientists determine changes in species composition and monitor the presence of invasive, endangered, or threatened species. DNA barcoding also helps overcome the shortcomings of traditional species identification systems, which depend heavily on taxonomists’ expertise to differentiate between closely related species based on morphology. Compared with the traditional system, this technique accurately identifies hundreds of species in a short time.

Paul Hebert, integrative biologist at the University of Guelph, is globally known as the father of DNA barcoding. He proposed this revolutionary technique for species identification in 2003. “Telling organisms apart through morphology is a complex task. With the right person, it is wonderfully executed but it requires a lot of specialized training,” said Hebert. “The idea behind DNA barcoding is, you use a tiny slice of DNA to move identification into the digital realm and make it possible for anyone to identify any organism they encounter, as long as they have a little technology to back them up.”

Mitochondrial DNA barcodes

Hebert initially proposed short mitochondrial DNA (mtDNA) sequences as barcodes, specifically within the gene coding for cytochrome c oxidase subunit 1 (CO1). Researchers prefer mtDNA over the nuclear genome for developing accurate species-level DNA barcodes because it is maternally inherited.4 Because mtDNA does not undergo the recombination that shuffles pieces of parental DNA in the nuclear genome and complicates lineage tracing, scientists can retrace evolutionary relationships unambiguously.

Rapid nucleotide sequence changes in mtDNA lead to accumulated differences that distinguish closely related species. Unlike nuclear DNA, the circular mtDNA of most eukaryotes is packed into nucleoids, which are more prone to sequence changes. These nucleoids lack histone proteins, which play a crucial role in maintaining the structural integrity of nuclear chromosomes. Additionally, mtDNA has limited repair mechanisms and is susceptible to nucleotide substitution in the presence of reactive oxygen species (ROS) generated during cellular respiration.

Types of DNA barcodes 

Animal DNA barcodes 

To date, CO1 is the most widely adopted DNA barcode for animals.5 The main advantage of this sequence lies in its length: It is short enough for quick and inexpensive sequencing and long enough to capture variations among species. In addition to CO1, researchers also use mitochondrial cytochrome b (Cytb), 16S ribosomal RNA (rRNA), and 12S rRNA for species identification.

Plant DNA barcodes 

Unlike animal mtDNA, plant mtDNA undergoes extensive internal rearrangements, so its mutations mostly alter chromosome length rather than substitute nucleotides. Therefore, scientists favor sequences such as the chloroplast intergenic spacer trnH-psbA, nuclear-encoded ribosomal internal transcribed spacer (ITS) regions, and plastid genes (e.g., matK and rbcL) for barcoding plants.6

DNA barcodes for microorganisms

Microbiologists can identify microorganisms (e.g., bacteria, fungi, and viruses) using DNA barcodes.4 For fungal identification, researchers sequence the ITS region of nuclear rRNA genes. For bacterial and archaeal identification, microbiologists use 16S rRNA, chaperonin-60, and RNA polymerase β subunit (rpoB) genes.

Construction of the DNA barcode reference system

Scientists working on DNA barcode development initially used the Sanger method to sequence DNA. However, despite its efficacy, this method proved to be expensive for community-level studies due to the necessity of a separate sequencing reaction for every sample. Next-generation sequencing (NGS) methods help scientists overcome this limitation.7 

NGS allows researchers to develop comprehensive and well-curated reference systems, which are crucial for species assessment accuracy. For example, the DNA barcode database plays a dual role in biodiversity assessment: It is a library for data deposition and a tool for species monitoring and identification.8 

An online digital DNA barcode reference system serves as a standard that helps identify unknown species. The Barcode of Life Data System (BOLD) is a bioinformatics platform and the largest DNA barcode database that aids in the collection, analysis, management, and publication of DNA barcodes. It is an open-access platform that hosts more than nine million barcodes derived from twelve million specimens.9

DNA Barcoding in High-throughput Assays

In addition to species identification, DNA barcodes aid in high-throughput screening for single-cell sequencing, proteomic analysis, drug formulation screening, and cancer research. Ninning Liu, a bioengineer at the Wyss Institute at Harvard University, elucidated the use of DNA barcodes in high-throughput assays to determine the expression of genes in a single cell. “The goal of DNA barcoding in my mind, I'm thinking of high-throughput labeling and data retrieval of small molecules. Things like genes, expressed genes, or proteins,” said Liu. 

A high-throughput single-cell assay requires multiplexing strategies, which increase the number of simultaneously measured parameters in a single experiment.10 “For a single cell, you might have thousands of unique proteins and expressed genes present within it at any given time. With DNA barcoding, you can easily design thousands of unique DNA sequences to tag every one of those molecules with unique DNA barcode fingerprints,” explained Liu.
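One way to picture designing many distinguishable barcodes is a greedy search that keeps a random candidate only if it differs from every barcode already chosen at several positions. The sketch below is an illustration under stated assumptions, not a method described in the article: the function names, the barcode length of 8, and the minimum distance of 3 are arbitrary, and it considers substitution errors only, whereas published designs also handle insertions and deletions.3

```python
import random
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(n, length=8, min_distance=3, seed=0, max_attempts=100_000):
    """Greedily collect n random barcodes that are pairwise >= min_distance apart."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    barcodes = []
    for _ in range(max_attempts):
        if len(barcodes) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Keep the candidate only if it is far enough from every kept barcode.
        if all(hamming(candidate, kept) >= min_distance for kept in barcodes):
            barcodes.append(candidate)
    if len(barcodes) < n:
        raise RuntimeError("could not find enough barcodes; relax the constraints")
    return barcodes


if __name__ == "__main__":
    codes = design_barcodes(20)
    closest = min(hamming(a, b) for a, b in combinations(codes, 2))
    print(len(codes), "barcodes; closest pair differs at", closest, "positions")
```

With a minimum distance of 3, a single substitution error in a read still leaves it closer to its true barcode than to any other barcode in the set.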

Low-throughput techniques such as Sanger sequencing cannot feasibly sequence every DNA barcode-labeled molecule because they can process only a limited amount of DNA at a time. In contrast, newer sequencing methods can read the sequences of multiple DNA molecules in parallel. “With next-generation sequencing, you can do billions of parallelized readouts at once. You could conceivably retrieve all of your barcode sequences in a single NGS run,” said Liu.

DNA barcoding for identifying targeted RNA transcripts

RNA sequencing (RNA-seq) employing the NGS technique is commonly used to study messenger RNA (mRNA) in a cell. This method provides a snapshot of gene expression using molecular barcodes, which are short nucleotide tags that are used to detect target mRNA from complex cDNA libraries.10 Researchers designed molecular barcodes to identify RNA transcripts. These barcodes consist of adjacently located reporter and capture probes. The reporter portion of the barcode constitutes a fluorescent RNA-DNA hybrid molecule, while the capture region comprises a gene-specific sequence. When a molecular barcode hybridizes with the RNA target molecule, it forms a tripartite structure that scientists can identify through its specific fluorescent signal. 

The fluorescently labeled RNA serves as the identifier molecule that comprises unique color sequences for all genes of interest.11 The intensity of the fluorescent signal helps researchers quantify target RNA transcripts at tissue and subcellular levels. Through combinatorial color coding and iterative hybridization and imaging cycles, researchers can identify thousands of RNA species with this strategy.
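The reach of combinatorial color coding follows from simple counting: with c colors read over n imaging rounds there are c^n possible color sequences, so 4 colors over 6 rounds give 4^6 = 4,096 codes. The Python sketch below assigns one color sequence per gene and decodes an observed sequence. It is a simplified illustration with hypothetical gene names and function names; practical schemes typically use only a subset of the possible codes so that imaging errors can be detected.

```python
from itertools import product


def build_codebook(genes, colors, rounds):
    """Assign each gene a unique sequence of colors, one color per imaging round."""
    capacity = len(colors) ** rounds
    if len(genes) > capacity:
        raise ValueError(f"{len(genes)} genes exceed the {capacity} available codes")
    # product() enumerates every possible color sequence of the given length.
    return dict(zip(genes, product(colors, repeat=rounds)))


def decode(observed, codebook):
    """Return the gene whose code matches the observed color sequence, or None."""
    lookup = {code: gene for gene, code in codebook.items()}
    return lookup.get(tuple(observed))


if __name__ == "__main__":
    book = build_codebook(["geneA", "geneB", "geneC"], ["red", "green"], rounds=2)
    print(book)
    print(decode(["green", "red"], book))
```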

In single-cell RNA sequencing (scRNA-seq), scientists use DNA barcodes as versatile labels for sample multiplexing and transcriptome-wide expression profiling of individual cells.12 This strategy assesses molecular abundance and single-cell heterogeneity.
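Conceptually, a barcode lets pooled reads be sorted back to the cell or sample they came from. The Python sketch below groups toy reads by barcode and tallies the genes seen for each one. The read format, barcode strings, and function name are assumptions made for illustration; real scRNA-seq pipelines also correct barcode errors and remove amplification duplicates.

```python
from collections import Counter, defaultdict


def count_by_barcode(reads, known_barcodes):
    """Tally gene counts per barcode, ignoring reads with unrecognized barcodes."""
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        if barcode in known_barcodes:
            counts[barcode][gene] += 1
    return {barcode: dict(genes) for barcode, genes in counts.items()}


if __name__ == "__main__":
    # Toy reads as (barcode, gene) pairs; all values are made up.
    reads = [
        ("AACG", "geneA"),
        ("AACG", "geneA"),
        ("AACG", "geneB"),
        ("TTGC", "geneA"),
        ("GGGG", "geneC"),  # barcode not in the known set, so it is dropped
    ]
    print(count_by_barcode(reads, known_barcodes={"AACG", "TTGC"}))
```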

DNA barcoding for lineage tracking

Scientists combine genetic engineering and nucleic acid sequencing technologies for high-throughput cellular lineage tracking. These strategies employ genetically engineered DNA barcodes to detect cell lineages within a wide-ranging population spanning multiple generations.13 For cell lineage tree reconstruction, researchers target mutational hotspots such as retrotransposons and microsatellites for genotyping. Barcode lineage tracking has helped trace cellular differentiation during organ development and characterize T cell recruitment.

DNA Barcoding Challenges and Future Outlooks 

One of the major challenges that scientists face while using DNA barcoding techniques for high-throughput assays is labeling bias, which could lead to some sequencing features being over- or under-represented. “Depending on how you design your library, it might be biased towards certain sets of small molecules,” said Liu. 

In the context of DNA barcoding for species identification, high sequencing costs once posed a major challenge. However, technological advancements have helped to address this issue. These advances support Hebert’s ongoing biodiversity research program, BIOSCAN. “When we started BIOSCAN, we were using Sequel platforms by PacBio, which dropped the sequencing cost to 20 cents per specimen, so that was very good. Over the last year, another technology developed in the UK by Oxford Nanopore Technologies has absolutely shifted the playing field again. In one of our most recent studies, we managed to run recovered barcodes from a million specimens in a single three-day run, at a cost of a tenth of a cent per specimen,” said Hebert. The latest cost-effective sequencers also help spread the use of DNA barcoding techniques to underserved and underrepresented regions.

In the future, researchers could use DNA barcoding sequences to identify organisms, track their distribution, and establish organism interaction links. This will help researchers construct a library of life and provide an understanding of how different species interface with each other and their environments.  

According to Hebert, another future aim of DNA barcoding is to catalog all species on Earth and track their global distribution in real time, as a climate crisis call to action. “The grand goal is a weather system equivalent for the planet. Being able to watch and track biodiversity in near real-time at a planetary scale so humanity understands what the impacts of its actions are. You need knowledge to drive behavioral change,” said Hebert.  

References

  1. Kress WJ, Erickson DL. DNA barcodes: methods and protocols. Methods Mol Biol. 2012;858:3-8.
  2. Antil S, et al. DNA barcoding, an effective tool for species identification: a review. Mol Biol Rep. 2023;50(1):761-775. 
  3. Hawkins JA, et al. Indel-correcting DNA barcodes for high-throughput sequencing. PNAS. 2018;115(27):E6217-E6226. 
  4. Guo M, et al. Life barcoded by DNA barcodes. Conserv Genet Resour. 2022;14(4):351-365. 
  5. Yang F, et al. DNA barcoding for the identification and authentication of animal species in traditional medicine. Evid Based Complement Alternat Med. 2018;2018:5160254. 
  6. Letsiou S, et al. DNA barcoding as a plant identification method. Applied Sciences. 2024;14(4):1415. 
  7. Shokralla S, et al. Next-generation DNA barcoding: using next-generation sequencing to enhance and accelerate DNA barcode capture from single specimens. Mol Ecol Resour. 2014;14(5):892-901. 
  8. Gostel MR, Kress WJ. The expanding role of DNA barcodes: Indispensable tools for ecology, evolution, and conservation. Diversity. 2022;14(3):213. 
  9. Ratnasingham S, Hebert PD. BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7(3):355-364. 
  10. Binan L, et al. Exploiting molecular barcodes in high-throughput cellular assays. SLAS Technol. 2019;24(3):298-307. 
  11. Chen KH, et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348(6233):aaa6090.
  12. Cheng J, et al. Multiplexing methods for simultaneous large-scale transcriptomic profiling of samples at single-cell resolution. Adv Sci (Weinh). 2021;8(17):e2101229.
  13. Kim IS. DNA barcoding technology for lineage recording and tracing to resolve cell fate determination. Cells. 2024;13(1):27.