Dáithí C Murray, Megan L Coghlan, Michael Bunce.
Abstract
Amplicon sequencing has been the method of choice in many high-throughput DNA sequencing (HTS) applications. To date there has been a heavy focus on the means by which to analyse the burgeoning amount of data afforded by HTS. In contrast, there has been a distinct lack of attention paid to considerations surrounding the importance of sample preparation and the fidelity of library generation. No amount of high-end bioinformatics can compensate for poorly prepared samples, and it is therefore imperative that careful attention is given to sample preparation and library generation within workflows, especially those involving multiple PCR steps. This paper redresses this imbalance by focusing on aspects pertaining to the benchtop within typical amplicon workflows: sample screening, the target region, and library generation. Empirical data is provided to illustrate the scope of the problem. Lastly, the impact of various data analysis parameters is also investigated in the context of how the data was initially generated. It is hoped this paper may serve to highlight the importance of pre-analysis workflows in achieving meaningful, future-proof data that can be analysed appropriately. As amplicon sequencing gains traction in a variety of diagnostic applications, from forensics to environmental DNA (eDNA), it is paramount that workflows and analytics are both fit for purpose.
Year: 2015 PMID: 25902146 PMCID: PMC4406758 DOI: 10.1371/journal.pone.0124671
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Details for the experiments conducted.
| Experiment | Purpose | Methods | Results |
|---|---|---|---|
| 1 | Illustrate the importance of quantifying samples using a dilution series to select an appropriate working dilution that is free of inhibition and contains a sufficient quantity of input template DNA | | Section 3.1 |
| 2 | Explore the potential benefits to the downstream processing of high-throughput sequencing data arising from the inclusion of amplicon-specific single-source samples embedded into sequencing runs | | Section 3.2 |
| 3 | Demonstrate the importance of control reactions in bacterial metagenomics and other fields using samples with a high propensity for environmental contamination | | Section 3.3 |
| 4 | Assess the efficiency drop-off associated with the use of fusion tagged primers of different ‘architecture’ when compared to standard non-fusion tagged template specific primers | | Section 3.4 |
| 5 | Highlight the difficulties in choosing appropriate quality and abundance filtering parameters when analysing complex, heterogeneous samples, the composition of which is unknown | | Section 3.5 |
The purpose of each numbered experiment is shown in addition to the title used for each one in the methods and results section. The appropriate methods sections, results sections and figures to consult for each experiment are also given.
Fig 1. Definitions used in assessing the importance of analysis parameters.
Shown are the definitions for quality and abundance filtering methods used in assessing their impact on both the number of operational taxonomic units (OTUs) and distance-based operational taxonomic units (DTUs) [24] obtained for a given sample. maxee: maximum expected error.
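To make the maxee quality filter concrete, the following is a minimal sketch that assumes the usual definition of expected error (the sum of per-base error probabilities derived from Phred scores, as used by tools such as USEARCH). The function names, the Phred+33 offset, and the default threshold of 1.0 are illustrative assumptions and are not taken from the paper.

```python
def phred_from_fastq_line(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_maxee(phred_scores, maxee=1.0):
    """Keep a read only if its expected error count does not exceed maxee.

    The default of 1.0 is an illustrative choice, not the paper's setting.
    """
    return expected_errors(phred_scores) <= maxee


if __name__ == "__main__":
    scores = phred_from_fastq_line("IIII5555++")
    print(scores, expected_errors(scores), passes_maxee(scores))
```

Because the expected error is a sum over the whole read, a few low-quality bases can fail a read that a mean-quality filter would keep.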
Fig 2. Quantitative PCR and sequencing results of the sample screening assay.
Quantitative PCR curves indicate the presence of DNA and the degree of inhibition (LEFT), with agarose gel electrophoresis confirming the presence of DNA post-amplification through strong bands (INSET ON GRAPH). Samples were subsequently sequenced and the percentage abundance of two fish genera is indicated (RIGHT). Based on taxa-specific quantitative PCR results, Sardinops (specifically S. sagax, the Australian pilchard) should be in the highest abundance and Engraulis (specifically E. australis, the Australian anchovy) in the lowest.
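One common way to read a qPCR dilution series for inhibition can be sketched as follows. At 100% amplification efficiency, a 10-fold dilution should delay the quantification cycle (Cq) by about log2(10) ≈ 3.32 cycles; a much smaller or negative shift suggests that the less dilute sample is inhibited. The function names, the tolerance of one cycle, and the example Cq values are assumptions made for illustration, not the criteria or data used in the paper.

```python
import math


def cq_shifts(cq_by_dilution):
    """Compare observed and expected Cq shifts between successive dilutions.

    cq_by_dilution: list of (dilution_factor, cq), ordered from least to most
    dilute, e.g. [(1, 30.0), (10, 28.5), (100, 31.8)].
    Returns tuples of (from_dilution, to_dilution, observed_shift, expected_shift).
    """
    shifts = []
    for (d1, cq1), (d2, cq2) in zip(cq_by_dilution, cq_by_dilution[1:]):
        expected = math.log2(d2 / d1)  # assumes 100% amplification efficiency
        shifts.append((d1, d2, cq2 - cq1, expected))
    return shifts


def first_uninhibited_dilution(cq_by_dilution, tolerance=1.0):
    """Return the least dilute sample whose Cq shift to the next dilution is
    close to the expected value, or None if no step looks uninhibited.

    The tolerance (in cycles) is a hypothetical threshold.
    """
    for d1, _d2, observed, expected in cq_shifts(cq_by_dilution):
        if observed >= expected - tolerance:
            return d1
    return None


if __name__ == "__main__":
    series = [(1, 30.0), (10, 28.5), (100, 31.8)]  # made-up Cq values
    print(first_uninhibited_dilution(series))
```

This sketch checks only for inhibition. A fuller screen would also reject dilutions whose Cq is so late that too little template remains, which is the second criterion named for the sample screening experiment.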
Fig 3. Average sequencing error rates across a single amplicon region.
Average sequencing error rates are shown for multiple bird species across the whole of a short 12S rRNA gene region (A). Additionally, the error profile across the gene region is shown for Calyptorhynchus lathami for both the Ion Torrent PGM (B) and MiSeq (C) with key. The error patterns observed were similar across all species sequenced. Error rates are shown across 5 bp segments; red circles mark single bases with an error rate above 1%.
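An error profile of this kind can be derived from a single-source sample whose true sequence is known. The sketch below assumes reads have already been aligned and trimmed to the reference so that they have the same length with no indels, which is a simplification. The function names are hypothetical; the 5 bp window and 1% threshold mirror the caption.

```python
def per_base_error_rates(reference, reads):
    """Fraction of reads that mismatch the reference at each position.

    Assumes every read is already aligned to the reference and has the same
    length (substitutions only), which real data will not always satisfy.
    """
    if not reads:
        raise ValueError("at least one read is required")
    mismatches = [0] * len(reference)
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("reads must match the reference length")
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if ref_base != read_base:
                mismatches[i] += 1
    return [count / len(reads) for count in mismatches]


def windowed_means(rates, window=5):
    """Mean error rate over consecutive, non-overlapping windows."""
    means = []
    for start in range(0, len(rates), window):
        chunk = rates[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def flagged_positions(rates, threshold=0.01):
    """Zero-based positions whose error rate exceeds the threshold."""
    return [i for i, rate in enumerate(rates) if rate > threshold]


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [ref] * 98 + ["ACTTACGTAC"] * 2  # toy data with errors at index 2
    rates = per_base_error_rates(ref, reads)
    print(windowed_means(rates), flagged_positions(rates))
```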
Fig 4. Impact of analysis parameters on the numbers of taxonomic units obtained for a bulk-bone sample.
A number of analysis parameters were used to analyse a complex mixture containing numerous taxa. Different quality and abundance filtering methods were used in addition to two taxonomy-independent measures of analysis, full definitions and explanations of which are in Fig 1. The figure shows the spread in the numbers of taxonomic units obtained across the combinations of parameters chosen. The radius of each semicircle represents the number of taxonomic units obtained for a given combination of parameters, and that number is also indicated above each semicircle. All semicircles are drawn to the same scale. AFM: abundance filtering method; QFM: quality filtering method; TIM: taxonomy-independent method.
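The sensitivity of unit counts to abundance filtering can be illustrated with a minimal sketch. It counts identical sequences and keeps only those that meet a minimum read count and a minimum fraction of all reads. The function name, both thresholds, and the toy data are hypothetical; the actual filtering definitions are those given in Fig 1 and are not reproduced here.

```python
from collections import Counter


def filter_by_abundance(sequences, min_count=2, min_fraction=0.0):
    """Keep unique sequences that are abundant enough.

    min_count: minimum number of identical reads (2 removes singletons).
    min_fraction: minimum share of all reads, e.g. 0.01 for 1%.
    Both defaults are illustrative, not recommended settings.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_count and n / total >= min_fraction
    }


if __name__ == "__main__":
    reads = ["A"] * 90 + ["B"] * 9 + ["C"]  # toy stand-ins for sequences
    for fraction in (0.0, 0.05, 0.1):
        kept = filter_by_abundance(reads, min_count=2, min_fraction=fraction)
        print(fraction, len(kept))
```

Even in this toy case the number of retained units changes with the threshold, which is the effect the figure shows at scale for a sample of unknown composition.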