Liming Cai, Hongrui Zhang, Charles C Davis.
Abstract
Premise: The application of high-throughput sequencing, especially to herbarium specimens, is rapidly accelerating biodiversity research. Low-coverage sequencing of total genomic DNA (genome skimming) is particularly promising and can simultaneously recover the plastid, mitochondrial, and nuclear ribosomal regions across hundreds of species. Here, we introduce PhyloHerb, a bioinformatic pipeline to efficiently assemble phylogenomic data sets derived from genome skimming.
Keywords: herbariomics; high‐throughput sequencing; mitochondria; plastome; ribosomal genes
Year: 2022 PMID: 35774988 PMCID: PMC9215275 DOI: 10.1002/aps3.11475
Source DB: PubMed Journal: Appl Plant Sci ISSN: 2168-0450 Impact factor: 2.511
Comparison of existing plastome annotation tools. The execution time for PhyloHerb is estimated on the Lenovo SD650 NeXtScale server of the FASRC Cannon compute cluster at Harvard University. The execution time for all other software is cited from Qu et al. (2019).
| Tools | User interface | Time | Output format | Accept multi‐FASTA/fragmented assembly | References |
|---|---|---|---|---|---|
| Plann | Console | ~30 s | tbl | No | Huang and Cronk ( |
| Verdant/annoBTD | Web/Console | 10–30 min | GFF3 | Currently not supported but can be incorporated (personal communication with author). | McKain et al. ( |
| GeSeq | Web | 6 s–13 min | GenBank | Yes. One assembly per run for fragmented genomes. | Tillich et al. ( |
| PGA | Console | ~20 s | GenBank | Yes, but not recommended. Batch processing. | Qu et al. ( |
| PhyloHerb | Console | 2–30 s | FASTA | Yes. Batch processing. | This paper |
Figure 1. PhyloHerb workflow. The five main function modules of PhyloHerb (qc, getseq, ortho, conc, and order) provide a versatile and efficient tool to curate and analyze genome skimming data.
Figure 2. Defining and extracting genetic blocks with PhyloHerb. (A) A 5‐kbp‐long continuous genetic block on the plastid genome of Arabidopsis thaliana divided into two loci (LOC1 and LOC2). (B) The ‘getseq’ function of PhyloHerb can be used to extract sequences of predefined genetic blocks. The ‘genetic_block’ mode includes the genes on both ends, whereas the ‘intergenic’ mode excludes them.
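The difference between the two extraction modes in Figure 2B can be illustrated with a minimal Python sketch. This is not PhyloHerb's own code: the function name `extract_block`, the coordinate convention (0-based, half-open intervals), and the toy sequence are all assumptions made for illustration only.

```python
def extract_block(genome, start_gene, end_gene, mode="genetic_block"):
    """Return the sequence between two flanking genes.

    genome      -- the assembled sequence as a string
    start_gene  -- (start, end) of the upstream gene, 0-based half-open (assumed convention)
    end_gene    -- (start, end) of the downstream gene, same convention
    mode        -- 'genetic_block' keeps both flanking genes,
                   'intergenic' keeps only what lies between them
    """
    if mode == "genetic_block":
        # From the first base of the upstream gene to the last base of the downstream gene.
        lo, hi = start_gene[0], end_gene[1]
    elif mode == "intergenic":
        # From the end of the upstream gene to the start of the downstream gene.
        lo, hi = start_gene[1], end_gene[0]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return genome[lo:hi]


# Toy example: two 4-bp "genes" separated by a 6-bp spacer (hypothetical data).
toy_genome = "AAAA" + "CCCCCC" + "GGGG"
gene_a = (0, 4)
gene_b = (10, 14)

block = extract_block(toy_genome, gene_a, gene_b, mode="genetic_block")
spacer = extract_block(toy_genome, gene_a, gene_b, mode="intergenic")
```

In this toy case the 'genetic_block' result spans both genes and the spacer, while the 'intergenic' result contains the spacer alone.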
Figure 3. Phylogeny of 10 Clusiaceae species inferred from the complete (A) and subsampled plastid data sets (B–D). Raw reads were randomly subsampled to 100 Mbp (B), 50 Mbp (C), and 20 Mbp (D) to simulate decreasing base coverage in genome skimming. For all four analyses, a partitioned concatenated DNA alignment of 87 plastid genes was used to infer the species tree in IQ‐TREE using the GTRGAMMA model. Nodal support was estimated from 1000 ultrafast bootstrap replicates (UFBoot). Unlabeled nodes indicate 100 UFBoot support. Note the unstable placement of Chrysochlamys skutchii in subsampled data sets.
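The random subsampling described in the Figure 3 caption can be sketched in Python as follows. The source does not state which tool or procedure the authors used, so this is only one plausible approach: the function `subsample_reads`, the fixed seed, and the treatment of reads as unpaired strings are assumptions, and real paired-end data would need mates kept together.

```python
import random


def subsample_reads(reads, target_bp, seed=0):
    """Randomly pick reads until their combined length reaches target_bp.

    reads     -- list of read sequences (strings); pairing is ignored here (simplification)
    target_bp -- desired total number of bases, e.g. 100_000_000 for 100 Mbp
    seed      -- seed for reproducible shuffling (hypothetical default)
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)  # random order without replacement

    picked, total = [], 0
    for i in order:
        if total >= target_bp:
            break
        picked.append(reads[i])
        total += len(reads[i])
    return picked


# Toy example: 100 reads of 100 bp each, subsampled to about 1000 bp.
toy_reads = ["ACGT" * 25 for _ in range(100)]
subset = subsample_reads(toy_reads, target_bp=1000)
subset_bp = sum(len(r) for r in subset)
```

Because whole reads are added until the target is met, the subsample can overshoot the target by at most one read length.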