Hoi-Yan Wu, Pang-Chui Shaw.
Abstract
Molecular herbal authentication has gained worldwide popularity in the past decade. DNA-based methods, including DNA barcoding and species-specific amplification, have been adopted for herbal identification by various pharmacopoeias. The development of next-generation sequencing (NGS) has drastically increased sequencing throughput and has sped up sequence collection and the assembly of organelle genomes, making more and more reference sequences and genomes available. NGS allows simultaneous sequencing of many DNA molecules, opening up the opportunity to identify multiple species in one sample in a single run. Two major experimental approaches have been applied in recent publications on the identification of herbal products by NGS: PCR-dependent DNA metabarcoding and PCR-free genome skimming/shotgun metagenomics. This review provides a brief introduction to the use of DNA metabarcoding and genome skimming/shotgun metagenomics in the authentication of herbal products and discusses some important considerations in experimental design for botanical identification by NGS, with a specific focus on quality control, reference sequence databases and different taxon assignment programs. The potential of quantification or abundance estimation by NGS is discussed, and new scientific findings that could potentially interfere with accurate taxon assignment and/or quantification are presented.
Keywords: DNA metabarcoding; Genome skimming; Genome2-ID; Herbal products; Kraken; Molecular authentication; Next-generation sequencing; Quality control
Year: 2022 PMID: 35317843 PMCID: PMC8939074 DOI: 10.1186/s13020-022-00590-y
Source DB: PubMed Journal: Chin Med ISSN: 1749-8546 Impact factor: 5.455
Fig. 1 A flow diagram of species identification of multi-ingredient products by NGS
Fig. 2 Sequences obtained and analysis workflow for DNA barcoding, DNA metabarcoding and genome skimming
Features and applicability of different species identification approaches
| Feature | DNA barcoding | DNA metabarcoding | Genome skimming/shotgun metagenomics |
|---|---|---|---|
| Source of template | PCR product | PCR product | Total DNA |
| No. of sequences obtained | One | Thousands to millions | Thousands to millions |
| Read length | ~1000 bp | Short (~100–300 bp) or long (>10,000 bp), depending on sequencing platform | Short (~100–300 bp) or long (>10,000 bp), depending on sequencing platform |
| Detection of multiple species | No | Yes | Yes |
| Affected by PCR bias | Yes | Yes | No |
| Potential for quantification | No | No (Read counting is possible but cannot truly reflect relative abundance) | Yes (Semi-quantification may be possible if all reads can be correctly assigned taxonomically) |
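The semi-quantification mentioned in the last row amounts to turning per-read taxon assignments into proportions. The following is a minimal sketch of that arithmetic, assuming assignments are already available as a list of taxon names with `None` for unassigned reads; the function name and the example taxa are hypothetical. It reports the unassigned fraction separately, since the table's caveat is that the estimate is only meaningful if reads can be correctly assigned, and for PCR-based metabarcoding the proportions would still not reflect true relative abundance.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return (proportions among assigned reads, fraction of unassigned reads).

    `assignments` holds one taxon name per read, or None if the read
    could not be assigned (hypothetical input format).
    """
    total = len(assignments)
    counts = Counter(assignments)
    unassigned = counts.pop(None, 0)  # remove the "no assignment" bucket
    assigned = total - unassigned
    if assigned == 0:
        proportions = {}
    else:
        proportions = {taxon: n / assigned for taxon, n in counts.items()}
    unassigned_fraction = unassigned / total if total else 0.0
    return proportions, unassigned_fraction


# Toy example: 10 reads, 2 of which could not be assigned.
toy_assignments = ["Panax ginseng"] * 6 + ["Glycyrrhiza uralensis"] * 2 + [None] * 2
toy_proportions, toy_unassigned = relative_abundance(toy_assignments)
print(toy_proportions, toy_unassigned)
```

A large unassigned fraction is a signal that the proportions should be read with caution.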
Fig. 3 Overview of database building and taxon assignment of Kraken
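The k-mer/LCA idea behind Kraken, as summarized in the comparison of taxonomic assignment programs, can be illustrated with a toy Python sketch. It is not Kraken's implementation: the taxonomy, the reference sequences and `K = 5` are hypothetical, the database is a plain dictionary rather than an indexed and sorted list, and real Kraken works with much longer k-mers. Each k-mer is stored with the lowest common ancestor (LCA) of the references containing it; a read is then assigned to the hit taxon whose root-to-taxon path carries the most k-mer hits, with ties resolved by taking the LCA of the tied taxa.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy taxonomy: child -> parent (the root has no parent).
PARENT = {
    "root": None,
    "Panax": "root",
    "Panax ginseng": "Panax",
    "Panax quinquefolius": "Panax",
    "Glycyrrhiza": "root",
    "Glycyrrhiza uralensis": "Glycyrrhiza",
}

# Hypothetical reference sequences (far shorter than real references).
REFERENCES = {
    "Panax ginseng": "ACGTACGGTTCA",
    "Panax quinquefolius": "ACGTACGGATCC",
    "Glycyrrhiza uralensis": "TTGACCAGTGCA",
}
K = 5


def path_to_root(taxon):
    """List the taxon and all its ancestors, taxon first, root last."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(path_to_root(a))
    return next(t for t in path_to_root(b) if t in ancestors)


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_database(references, k):
    """Map each k-mer to the LCA of all references that contain it."""
    db = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db


def classify(read, db, k):
    """Assign a read to the end of the highest-scoring root-to-taxon path."""
    hits = Counter(db[km] for km in kmers(read, k) if km in db)
    if not hits:
        return None  # no k-mer of the read is in the database
    # Path score = hits on the taxon plus hits on all of its ancestors.
    scores = {t: sum(hits[a] for a in path_to_root(t)) for t in hits}
    best = max(scores.values())
    tied = [t for t, s in scores.items() if s == best]
    return reduce(lca, tied)


db = build_database(REFERENCES, K)
print(classify("GTACGGTTC", db, K))  # read with species-specific k-mers
print(classify("CGTACGG", db, K))    # read covering only shared k-mers
print(classify("AAAAAAA", db, K))    # read with no database hits
```

In this toy setting the first read contains k-mers unique to one species and is assigned to it, the second read contains only k-mers shared by both *Panax* references and therefore stops at the genus, and the third read is left unclassified.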
Fig. 4 Overview of database building and species assignment of Genome2-ID
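Genome2-ID is listed in the program comparison as not publicly available, so the following Python sketch only mirrors the description given there and should not be read as its actual implementation. It assumes a hash table that annotates each k-mer with the reference species containing it, and then computes, per species, the three quantities the description names: number of matched k-mers, proportion of the species' k-mers matched, and average depth of the matched k-mers. The reference sequences, `K = 5` and the 0.9 coverage cut-off are hypothetical, and the statistical confidence analysis is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical reference sequences (far shorter than real genomes).
REFERENCES = {
    "Panax ginseng": "ACGTACGGTTCA",
    "Panax quinquefolius": "ACGTACGGATCC",
    "Glycyrrhiza uralensis": "TTGACCAGTGCA",
}
K = 5


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Hash table: k-mer -> set of reference species it was observed in."""
    index = defaultdict(set)
    for species, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(species)
    return index


def species_stats(reads, index, k):
    """Per-species matched k-mer count, k-mer coverage and mean depth."""
    sample_counts = Counter(km for read in reads for km in kmers(read, k))
    # Number of distinct k-mers each reference species contributes.
    ref_sizes = Counter(sp for species in index.values() for sp in species)
    matched = Counter()
    depth_sum = Counter()
    for km, count in sample_counts.items():
        for sp in index.get(km, ()):
            matched[sp] += 1        # one more distinct reference k-mer seen
            depth_sum[sp] += count  # how often it occurred in the sample
    return {
        sp: {
            "matched_kmers": matched[sp],
            "kmer_coverage": matched[sp] / ref_sizes[sp],
            "mean_depth": depth_sum[sp] / matched[sp] if matched[sp] else 0.0,
        }
        for sp in ref_sizes
    }


index = build_index(REFERENCES, K)
sample_reads = ["ACGTACGGTTCA", "GTACGGTTC"]  # toy sample
stats = species_stats(sample_reads, index, K)

# Hypothetical presence call: a simple coverage cut-off stands in for
# the statistical analysis mentioned in the description.
present = [sp for sp, s in stats.items() if s["kmer_coverage"] >= 0.9]
print(stats)
print(present)
```

Note that k-mers shared between related species count toward each of them here, which is why a closely related reference can reach partial coverage without being present; a real presence call would need to account for this.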
Comparison of three taxonomic assignment programs previously used in herbal identification
| Feature | BLAST | Kraken | Genome2-ID |
|---|---|---|---|
| Method | Alignment-based | k-mer based | k-mer based |
| Database | Sequences downloaded from GenBank or BOLD | Indexed and sorted list of k-mer/LCA pairs | A hash table of k-mers, each annotated with the reference species in which it was observed |
| Classification | (1) BLAST-based search; (2) sequence assigned to species or to the LCA by MEGAN or the CITESspeciesDetect pipeline | (1) All k-mers of a sequence are mapped to different LCAs according to the database; (2) each hit taxon in the classification tree is scored; (3) the sequence is assigned to the "leaf" (the lowest taxon rank scored) of the highest-weighted "tree branch"/path | (1) All k-mers of a sample are mapped to different reference species according to the database; (2) presence of each mapped reference species in the sample is determined from the number of its k-mers matched in the sample, the coverage/proportion of its k-mers matched and its average coverage depth, with statistical analysis to show confidence in the presence of the species |
| Results output | Multiple species assignments for a given read by BLAST, further analyzed to report LCA of the read/contig | LCA of a given read/contig | Species determined to be present in the sample |
| Advantage | Customizable database; gold standard for taxonomic assignment | Customizable database; less sensitive to structural rearrangements (e.g. inversions); Detection | Customizable database; less sensitive to structural rearrangements (e.g. inversions); semiquantitative estimation possible (for genome skimming without PCR) |
| Disadvantage | Computationally demanding and slow; sensitive to structural rearrangements (e.g. inversions) | High memory requirement (improvable with a smaller database or more recent versions such as Kraken 2) | Not publicly available |
| Related programs | BLASTN, MegaBLAST | Kraken 2, KrakenUniq, Bracken | N/A |