Paul G Nevill, Xiao Zhong, Julian Tonti-Filippini, Margaret Byrne, Michael Hislop, Kevin Thiele, Stephen van Leeuwen, Laura M Boykin, Ian Small.
Abstract
BACKGROUND: Herbaria are valuable sources of extensive curated plant material that are now accessible to genetic studies because of advances in high-throughput, next-generation sequencing methods. As an applied assessment of large-scale recovery of plastid and ribosomal genome sequences from herbarium material for plant identification and phylogenomics, we sequenced 672 samples covering 21 families, 142 genera and 530 named and proposed named species. We explored the impact of parameters such as sample age, DNA concentration and quality, read depth and fragment length on plastid assembly error. We also tested the efficacy of DNA sequence information for identifying plant samples using 45 specimens recently collected in the Pilbara.
Keywords: Chloroplast; Genome skimming; Herbarium specimens; Next-generation sequencing; Pilbara; Plant DNA barcoding; Plastid genome
Year: 2020 PMID: 31911810 PMCID: PMC6942304 DOI: 10.1186/s13007-019-0534-5
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Fig. 1 Estimation of assembly completeness by comparison with GenBank records. Assemblies were paired with the closest match amongst all complete plastid genomes in GenBank. The scatter plot shows the relationship between the length of the assembly and its paired GenBank record. The straight line indicates the expected (x = y) values. The colours indicate ‘good’ (blue) and ‘poor’ (orange) assemblies based on the discrepancy observed between the paired lengths (calculated as described in the Methods). In all, from 672 samples, 606 assemblies passed this criterion, 54 assemblies failed, and for 12 samples no assembly was obtained.
Fig. 2 Proportion of species in families for which complete or near complete plastid genome, rDNA, matK and rbcL were retrieved in the sequencing dataset. Families shown are those with more than five species in the study.
Fig. 3 The distribution of coverage across all the samples.
Fig. 4 Relationships between various DNA, sequencing and assembly parameters on assembly completeness. The distributions of ten different parameters that might influence assembly success were investigated in samples that were deemed to be ‘good’ (blue) or ‘poor’ (orange) (as described in Methods and depicted in Fig. 1). Individual points represent individual samples; box plots indicate the median (centre line), interquartile range (box) and 1.5× interquartile range (‘whiskers’). The p-values shown indicate the results of t-tests for differences in the means of the two distributions in each case.
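The Fig. 4 caption describes two pieces of arithmetic: box-plot summaries (median, interquartile range, 1.5× IQR whiskers) and a t-test comparing the ‘good’ and ‘poor’ groups. The following is a minimal sketch of both, not the authors' analysis code. The caption does not say which t-test variant was used; Welch's unequal-variance form is assumed here. The function names `box_summary` and `welch_t` and the example values are hypothetical, and only the t statistic and degrees of freedom are computed (a p-value would additionally need a t-distribution CDF).

```python
from math import sqrt
from statistics import mean, quantiles, variance


def box_summary(values):
    """Median, quartiles and 1.5x IQR whisker ends for one group of samples."""
    # The 'inclusive' quartile method is an assumption; plotting tools differ.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme data points that lie within the fences.
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(v for v in values if v >= low_fence),
        "whisker_high": max(v for v in values if v <= high_fence),
    }


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (assumed variant)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Made-up read-depth values for illustration only; not data from the study.
    good = [30.0, 42.0, 55.0, 61.0, 48.0]
    poor = [12.0, 20.0, 9.0, 25.0]
    print(box_summary(good))
    print(box_summary(poor))
    print(welch_t(good, poor))
```

Each of the ten parameters in the figure would be passed through the same two functions, once per group.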