Joe Parker, Andrew J Helmstetter, Dion Devey, Tim Wilkinson, Alexander S T Papadopulos.
Abstract
Advances in DNA sequencing and informatics have revolutionised biology over the past four decades, but technological limitations have left many applications unexplored. Recently, portable, real-time, nanopore sequencing (RTnS) has become available. This offers opportunities to rapidly collect and analyse genomic data anywhere. However, generation of datasets from large, complex genomes has been constrained to laboratories. The portability and long DNA sequences of RTnS offer great potential for field-based species identification, but the feasibility and accuracy of these technologies for this purpose have not been assessed. Here, we show that a field-based RTnS analysis of closely related plant species (Arabidopsis spp.) has many advantages over laboratory-based high-throughput sequencing (HTS) methods for species-level identification and phylogenomics. Samples were collected and sequenced in a single day by RTnS using a portable, "al fresco" laboratory. Our analyses demonstrate that correctly identifying unknown reads from matches to a reference database with RTnS reads enables rapid and confident species identification. Individually annotated RTnS reads can be used to infer the evolutionary relationships of A. thaliana. Furthermore, hybrid genome assembly with RTnS and HTS reads substantially improved upon a genome assembled from HTS reads alone. Field-based RTnS makes real-time, rapid specimen identification and genome-wide analyses possible.
Year: 2017 PMID: 28827531 PMCID: PMC5566789 DOI: 10.1038/s41598-017-08461-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Logistics and scope of field-based sequencing. (a) Location of sample collection and extraction, sequencing and analyses in the Snowdonia National Park, Wales. Maps were created using ESRI ArcGIS Desktop v10.5. Source of elevation data: U.S. Geological Survey, Shuttle Radar Topography Mission 1 Arc-Second Global[46]. (b) Arabidopsis thaliana. (c) A. lyrata ssp. petraea. (d) The portable field laboratory used for the research. Ambient temperatures varied between 7 and 16 °C, with peak humidity >80%. A portable generator was used to supply electrical power.
Figure 2. Sample identification and phylogenomics using field-sequenced RTnS data. (a–d) Orthogonal species identification using BLASTN difference statistics: HTS data (red) and RTnS (black) matched to reference databases via BLASTN. (a,c) Receiver operating characteristic (ROC; estimated false-positive rate vs. estimated true-positive rate) and (b,d) estimated true- (solid lines) and false-positive (dashed lines) rates. (a,b) ∆LT statistic; (c,d) ∆LI statistic. (e) Accumulation curves for ab initio gene models predicted directly from individual A. thaliana reads over time. Count of unique TAIR10 genes (solid line) and total number of gene models (dashed line). Shaded boxes represent periods where the MinION devices were halted while the laboratory was dismantled and moved. (f) Phylogenetic tree inferred under the multispecies coalescent from RTnS reads.
Figure 3. Simulated accumulation curves for rapid species identification by DNA sequencing. 34k pairwise BLASTN hits of A. thaliana RTnS reads were subsampled without replacement to simulate an incremental accumulation of data (10⁴ reads; 10³ replicates). For each read, the total identities bias (∆LI) is the number of identities with the A. thaliana reference minus the number of identities with the A. lyrata reference. (a) The proportion of A. thaliana reads correctly identified on a per-read basis, classified as A. thaliana where ∆LI > threshold cutoff (0, 1, 10 or 100). (b) Mean ∆LI in the simulated dataset rapidly stabilises on the population mean (+754 bp, i.e. an average matching read alignment to A. thaliana is 754 bp longer than to A. lyrata). (c) Cumulative aggregate ∆LI; negative or zero ∆LI can rapidly be excluded. Typical data throughput rates exceed 10⁴ reads per hour of sequencing.
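The subsampling procedure described in the Figure 3 caption can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names (`delta_li`, `simulate_accumulation`) and the toy ∆LI values are hypothetical, and the sketch assumes the per-read identity counts against each reference have already been extracted from BLASTN output. It draws reads without replacement and tracks the three quantities plotted in panels (a) to (c): the proportion of reads with ∆LI above a cutoff, the running mean ∆LI, and the cumulative ∆LI.

```python
import random


def delta_li(identities_thaliana, identities_lyrata):
    """Total identities bias for one read: identities with the A. thaliana
    reference minus identities with the A. lyrata reference."""
    return identities_thaliana - identities_lyrata


def simulate_accumulation(deltas, n_reads, cutoff=0, seed=None):
    """Subsample n_reads per-read delta-LI values without replacement and
    return one (reads_so_far, proportion_above_cutoff, running_mean,
    cumulative_sum) tuple per accumulated read."""
    rng = random.Random(seed)
    sample = rng.sample(deltas, n_reads)  # sampling without replacement
    cumulative = 0
    above_cutoff = 0
    curve = []
    for i, d in enumerate(sample, start=1):
        cumulative += d
        if d > cutoff:  # read classified as A. thaliana
            above_cutoff += 1
        curve.append((i, above_cutoff / i, cumulative / i, cumulative))
    return curve


if __name__ == "__main__":
    # Invented toy values for illustration only; real values would come
    # from BLASTN hits against the two reference databases.
    toy_deltas = [10, -5, 20, 0, 15, 300, -2, 45]
    for cutoff in (0, 1, 10, 100):  # cutoffs named in the caption
        final = simulate_accumulation(toy_deltas, len(toy_deltas), cutoff, seed=1)[-1]
        print(cutoff, final)
```

Repeating `simulate_accumulation` with different seeds would give the replicate curves; the caption reports 10³ such replicates of 10⁴ reads each.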