Chenhao Li, Kern Rei Chng, Esther Jia Hui Boey, Amanda Hui Qi Ng, Andreas Wilm, Niranjan Nagarajan.
Abstract
BACKGROUND: Nanopore sequencing provides a rapid, cheap and portable real-time sequencing platform with the potential to revolutionize genomics. However, several applications are limited by relatively high single-read error rates (>10%), including RNA-seq, haplotype sequencing and 16S sequencing.
Keywords: Barcode sequencing; Consensus algorithms; Nanopore sequencing; Rolling circle amplification
Year: 2016 PMID: 27485345 PMCID: PMC4970289 DOI: 10.1186/s13742-016-0140-7
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Statistics for sequenced datasets
| Dataset | Number of species | Sequencing platform | Before correction: number of 2D pass reads (total reads) | Before correction: number of bases | Before correction: N50 (average read length) | After correction: number of reads | After correction: number of bases | After correction: N50 (average read length) | Estimated chimera rate: template switching | Estimated chimera rate: intermolecular ligation |
|---|---|---|---|---|---|---|---|---|---|---|
| Simple synthetic community (PacBio) | 3 | PacBio | - (126456) | 669259255 | 7308 (5292) | 16327 | 11855990 | 734 (726) | 0.03 | 0.0006 |
| Simple synthetic community (Nanopore) | 3 | MinION | 14583 (45268) | 82074997 | 6807 (5628) | 2177 | 1592303 | 731 (730) | 0.006 | 0.0009 |
| Ladder synthetic community (replicate 1) | 10 | MinION | 7444 (27937) | 31866337 | 4570 (4280) | 1076 | 794358 | 739 (738) | - | - |
| Ladder synthetic community (replicate 2) | 10 | MinION | 2904 (7989) | 22446690 | 10536 (7729) | 1183 | 867494 | 732 (733) | - | - |
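The N50 and average read length columns above can be reproduced from a list of read lengths. The following is a minimal sketch that assumes the standard definition of N50 (the length of the read at which the running total of lengths, sorted from longest to shortest, first reaches half of all bases); the function name `n50` and the toy lengths are illustrative and not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths (standard definition assumed)."""
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first read that pushes the running total to half of all bases defines N50.
        if running >= half_total:
            return length


# Toy read lengths, purely illustrative (not data from the table).
toy_lengths = [2, 3, 4, 5, 6]
print("N50:", n50(toy_lengths))
print("average read length:", sum(toy_lengths) / len(toy_lengths))
```

Because N50 weights long reads more heavily than the mean does, the two figures can differ substantially before correction and converge once reads are collapsed to a single amplicon length.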
Fig. 1 Overview of the INC-Seq workflow. a Template molecules are circularized in optimized conditions, and the remaining linear molecules are removed. The circular products are amplified with RCA and sequenced on the MinION platform. b A subsequence from the raw nanopore reads is used as an anchor to scan the entire read for the location of repeating units. The repeating units flanked by adjacent anchor starting points are aligned, and a consensus sequence is constructed. c In INC-Seq library preparation, chimeras are expected through intermolecular ligation and template switching. Chimeras from template switching are likely to be detected by the anchor mapping protocol. Chimeras from intermolecular ligation were observed to be rare under the experimental conditions used in INC-Seq.
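The anchor-and-consensus idea in panel b can be illustrated with a toy example. This is a simplified sketch and not the INC-Seq implementation: it assumes the anchor matches exactly and that all repeating units have the same length (substitution errors only), whereas real nanopore reads need error-tolerant anchor alignment and a multiple alignment of the units. All function names and the example sequence are hypothetical.

```python
from collections import Counter


def find_anchor_starts(read, anchor_len=8):
    """Take the first anchor_len bases as the anchor and list every exact occurrence."""
    anchor = read[:anchor_len]
    starts = []
    pos = read.find(anchor)
    while pos != -1:
        starts.append(pos)
        pos = read.find(anchor, pos + 1)
    return starts


def split_units(read, starts):
    """Cut the read into units flanked by adjacent anchor start positions."""
    return [read[a:b] for a, b in zip(starts, starts[1:])]


def majority_consensus(units):
    """Column-wise majority vote; assumes equal-length units (no indels)."""
    if len({len(u) for u in units}) != 1:
        raise ValueError("this toy consensus needs equal-length units")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*units))


# Hypothetical template and a simulated concatemer with one substitution in copy 2.
template = "ACGTTGCAAGGCTA"
noisy_copy = "ACGTTGCAAGGGTA"
read = template + noisy_copy + template + template[:8]

starts = find_anchor_starts(read)
units = split_units(read, starts)
print(starts)
print(majority_consensus(units) == template)
```

The trailing partial copy contributes only an anchor position, so it closes the third unit without being used as a unit itself.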
Fig. 2 INC-Seq evaluation with simulated reads. a 100 representative reference sequences were selected from a customized database. Simulated reads were generated for each reference sequence, as well as artificial RCA products (that is, INC-Seq libraries). Synthetic INC-Seq reads significantly improved the proportion of correctly mapped reads over raw ONT 2D reads at both species and reference level (p-value <10^-9; one-sided paired Wilcoxon test). b Species-level classification F1 score (that is, the harmonic mean of precision and recall) using simulated raw, CANU-corrected and INC-Seq reads from 100 reference sequences and with different identity thresholds for classification (each curve represents the average across 10 replicates).
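The F1 score in panel b is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the counts are made-up illustrative numbers, and how reads were assigned to true or false positives in the study is not specified here.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, returning 0.0 when undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 9 correct calls, 1 wrong call, 3 missed species.
print(f1_score(true_positives=9, false_positives=1, false_negatives=3))
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier cannot reach a high F1 by trading recall for precision or the reverse.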
INC-Seq-estimated abundances for a simple synthetic community
| Species | Reference (GenBank ID) | Defined relative abundance | SMRT relative abundance | ONT 2D relative abundance |
|---|---|---|---|---|
| | KT932114.1 | 0.600 | 0.552 | 0.639 |
| | KF933778.1 | 0.300 | 0.393 | 0.309 |
| | KP944178.1 | 0.100 | 0.055 | 0.052 |
Fig. 3 INC-Seq produces long and accurate reads. a The ratio between the length of INC-Seq corrected reads and the reference sequence is tightly distributed around 1 (only reads with length from 600 to 800 bp are shown here). b, c INC-Seq boosts overall read accuracy and significantly reduces mismatch, insertion and deletion error rates (one-sided Wilcoxon test p-value <10^-15 in all cases). d Accuracy of INC-Seq sequences increases with the number of segments used for consensus construction.
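One way to build intuition for panel d is a toy binomial model of majority voting. The sketch below assumes that each copy of a base is wrong independently with the same probability and that only substitutions occur; neither assumption holds exactly for nanopore data, so the numbers are illustrative rather than a prediction of the accuracies reported in the study. The function name and the 10% error rate are chosen for illustration.

```python
from math import comb


def majority_error(per_base_error, copies):
    """Probability that more than half of `copies` independent observations are wrong.

    For an odd number of copies this is an upper bound on the error of a
    majority-vote consensus, since wrong observations need not agree.
    """
    p = per_base_error
    threshold = copies // 2 + 1
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(threshold, copies + 1)
    )


# Illustrative per-base error of 10%; odd copy numbers avoid tied votes.
for copies in (1, 3, 5, 7, 9):
    print(copies, majority_error(0.10, copies))
```

Under these assumptions the bound shrinks quickly as copies are added, which is consistent in direction with accuracy rising with the number of segments.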
Species level profiling for a ‘ladder’ synthetic community
| Species | Defined relative abundance | INC-Seq relative abundance |
|---|---|---|
| | 0.362 | 0.631 |
| | 0.320 | 0.148 |
| | 0.160 | 0.096 |
| | 0.080 | 0.037 |
| | 0.040 | 0.042 |
| | 0.020 | 0.009 |
| | 0.010 | 0.022 |
| | 0.005 | 0.007 |
| | 0.002 | 0.004 |
| | 0.001 | 0.002 |
Fig. 4 Species detection using INC-Seq. a Ten species were selected to construct an artificial community. Some species have highly similar 16S sequences (for example, S. aureus and S. epidermidis share 99% identity). b Two separate INC-Seq runs produce consistent results that are well correlated with defined abundances (Pearson ρ = 0.83, p-value = 0.003; Spearman ρ = 0.98, p-value <10^-15).
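The correlation coefficients in panel b can be recomputed from the defined and INC-Seq abundances in the ladder community table. The sketch below uses only the standard library and takes Spearman's coefficient as the Pearson correlation of average ranks; it assumes the table and the caption describe the same data, and it does not compute p-values.

```python
from math import sqrt

# Values copied from the ladder community table (defined vs. INC-Seq).
defined = [0.362, 0.320, 0.160, 0.080, 0.040, 0.020, 0.010, 0.005, 0.002, 0.001]
inc_seq = [0.631, 0.148, 0.096, 0.037, 0.042, 0.009, 0.022, 0.007, 0.004, 0.002]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print("Pearson:", round(pearson(defined, inc_seq), 2))
print("Spearman:", round(spearman(defined, inc_seq), 2))
```

The gap between the two coefficients reflects that the rank order of species is almost fully preserved while the most abundant species is overestimated, which pulls the linear correlation down.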