Michael Giolai, Pirita Paajanen, Walter Verweij, Kamil Witek, Jonathan D G Jones, Matthew D Clark.
Abstract
BACKGROUND: The Oxford Nanopore Technologies MinION™ sequencer is a small, portable, low-cost device that is accessible to labs of all sizes and attractive for in-the-field sequencing experiments. Selective breeding of crops has reduced their genetic diversity, and wild relatives are a key source of new genetic resistance to pathogens, usually conferred by genes encoding NLR immune receptors. Recent studies have demonstrated how crop NLR repertoires can be targeted for sequencing on Illumina or PacBio platforms (RenSeq) and how the specific gene conferring pathogen resistance can be identified.
Keywords: Gene enrichment; MinION; NLR; NLR gene fusions; Oxford Nanopore technologies; PacBio; R-gene; RenSeq; Resistance gene; Resistance protein; Targeted capture
Year: 2017 PMID: 28747151 PMCID: PMC5530509 DOI: 10.1186/s12864-017-3936-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Overview of the RenSeq protocol: Genomic DNA is sheared to the desired insert size. PCR adapters are ligated to the sheared genomic DNA fragments, which are then amplified. Custom-made biotinylated baits are hybridised to the sequences of interest. Molecules with hybridised baits are separated by streptavidin magnetic bead capture. The captured DNA is then amplified, and the amplified products are processed into a MinION library or a PacBio library.
Comparison of ONT MinION R7.3 and PacBio RSII sequencing performance: number of reads, read quality and read length for MinION fail and pass reads and for PacBio RSII subreads (SR) and reads of insert (RoI)
| | MinION fail Template | MinION fail Complement | MinION fail 2D | MinION pass Template | MinION pass Complement | MinION pass 2D | PacBio Subreads | PacBio Reads of Insert |
|---|---|---|---|---|---|---|---|---|
| Number of reads [n] | 268,044 | 112,405 | 83,692 | 193,850 | 193,850 | 193,850 | 383,981 | 101,331 |
| Number of bases [Mbp] | 630 | 273 | 209 | 507 | 484 | 503 | 1360 | 353 |
| Modal accuracy | 74.88% | 60.19% | 84.15% | 74.88% | 74.88% | 92.06% | 90.00% | 99.99% |
| Mean accuracy | 70.24% | 66.42% | 82.62% | 77.84% | 76.79% | 91.36% | 89.83% | 99.57% |
| N50 read length [bp] | 2916 | 2829 | 2786 | 2278 | 2169 | 2262 | 3540 | 3559 |
| Mean read length [bp] | 11,665 | 10,121 | 3250 | 2838 | 2716 | 2813 | 3818 | 3675 |
| Modal read length [bp] | 1138 | 2454 | 2306 | 2570 | 2720 | 2586 | 3482 | 3485 |
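The modal and mean accuracies above summarise the same per-read accuracy distribution in two ways, and they diverge when the distribution is skewed (e.g. 74.88% modal vs 70.24% mean for MinION fail template reads). A minimal Python sketch, assuming per-read accuracies are already available as percentages (for example from read-to-reference alignments) and that the mode is taken over values rounded to two decimals; the rounding rule and the name `accuracy_summary` are illustrative assumptions, not the authors' method:

```python
from collections import Counter
from statistics import mean


def accuracy_summary(accuracies, decimals=2):
    """Return (mean, modal) accuracy for a list of per-read accuracies in percent.

    The mode of a continuous quantity depends on binning; here (an assumption)
    each value is rounded to `decimals` places and the most frequent rounded
    value is reported. Ties resolve to the value encountered first.
    """
    if not accuracies:
        raise ValueError("need at least one read")
    mean_accuracy = mean(accuracies)
    counts = Counter(round(a, decimals) for a in accuracies)
    modal_accuracy, _ = counts.most_common(1)[0]
    return mean_accuracy, modal_accuracy


# Toy example: a low-accuracy tail pulls the mean below the mode.
reads = [92.06, 92.06, 92.06, 91.5, 60.0]
mean_acc, modal_acc = accuracy_summary(reads)
print(f"mean {mean_acc:.2f}%, modal {modal_acc:.2f}%")
```

For this toy input the mean is 85.54% while the mode stays at 92.06%, mirroring how a long low-quality tail separates the two rows of the table.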
Fig. 2 Performance comparison between ONT MinION and PacBio RSII: a Read length profile of MinION 2D pass reads obtained on four R7.3 flow cells with a 3 kb PCR product, and of PacBio RoI obtained by Witek et al. [11]. b Accuracy scores of MinION pass reads and PacBio SR and RoI (most PacBio RoI have an accuracy of about 99% and appear as a peak at the 100% mark).
Fig. 3 MinION 2D read assembly pipeline: Basecalling is performed using Metrichor. FASTA or FASTQ data are extracted from the fast5 files. PCR adapters are removed using cutadapt, and chimeric reads are filtered out of the dataset using BLASR. The adapter-curated reads are corrected and trimmed in the Canu pipeline and then assembled with Canu. After assembly, the contigs are polished with nanopolish.
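The preprocessing that precedes correction (adapter removal, then discarding short reads) can be illustrated with a deliberately simplified Python sketch. It assumes FASTQ input, a placeholder adapter sequence (`ADAPTER` is hypothetical, not the RenSeq adapter), exact full-length adapter matches at read ends, and the 150 bp minimum length used for the trimmed datasets. cutadapt itself performs error-tolerant and partial-overlap matching, and BLASR chimera filtering is not modelled here.

```python
from typing import Iterable, Iterator, Tuple

# Placeholder adapter for illustration only (an assumption); substitute the
# actual PCR adapter sequence used in the library preparation.
ADAPTER = "AGATCGGAAGAGC"
MIN_LENGTH = 150  # reads shorter than this after trimming are discarded

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adapters(seq: str, adapter: str = ADAPTER) -> str:
    """Strip exact, full-length adapter copies from both read ends.

    A read may come from either strand of the amplicon, so both the adapter
    and its reverse complement are checked at each end.
    """
    for a in (adapter, revcomp(adapter)):
        if seq.startswith(a):
            seq = seq[len(a):]
        if seq.endswith(a):
            seq = seq[: -len(a)]
    return seq


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records; qualities are ignored."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header.strip()[1:].split()[0], seq


def preprocess(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Trim adapters, then keep only reads of at least MIN_LENGTH bp."""
    for read_id, seq in read_fastq(lines):
        trimmed = trim_adapters(seq)
        if len(trimmed) >= MIN_LENGTH:
            yield read_id, trimmed
```

Usage: `list(preprocess(open("reads.fastq")))` returns the surviving reads as (id, sequence) pairs, ready to be written out as FASTA for the next step (the file name is hypothetical).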
Read statistics after adapter trimming, chimera filtering and correction: Adapter trimming, removal of reads shorter than 150 bp and chimera filtering reduced the number of bases in each dataset by approximately 5%. Because of their lower quality, MinION reads were corrected with Canu before assembly. The Canu pipeline further reduced the MinION data to 304 Mbp before assembly, an amount similar to that of the PacBio RoI.
| | MinION 2D pass (trimmed, filtered) | MinION 2D pass (trimmed, filtered, corrected) | PacBio Reads of Insert (trimmed, filtered) |
|---|---|---|---|
| Number of reads [n] | 193,724 | 114,027 | 100,958 |
| Number of bases [Mbp] | 475 | 304 | 337 |
| N50 read length [bp] | 2681 | 2739 | 3430 |
| Mean read length [bp] | 2681 | 2784 | 3536 |
Fig. 4 Coverage (blue) and mapping quality (red) histograms of MinION and PacBio reads mapped to the 649 annotated NLR genes: a MinION 2D pass reads, b adapter-trimmed, chimera-filtered and corrected MinION 2D pass reads, c PacBio SR, d adapter-trimmed, chimera-filtered PacBio RoI. Approximately 40 contigs are covered at <50× by the MinION 2D and PacBio RoI datasets; for most contigs the coverage is ≥50×. A cutoff of 1500× was applied to the coverage histograms. Although all datasets contain some reads with low mapping quality, indicating ambiguous mapping due to the high similarity of NLR genes, most reads map with a Phred-scaled mapping quality of 60. As expected, PacBio SR and MinION 2D pass reads show more low-quality mapping events than the adapter-trimmed, chimera-filtered PacBio RoI and the adapter-trimmed, chimera-filtered and corrected MinION 2D pass reads.
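Mapping quality in Fig. 4 is Phred-scaled: a MAPQ of Q corresponds to an estimated probability of 10^(−Q/10) that the reported position is wrong, so the dominant MAPQ of 60 means roughly a one-in-a-million chance of misplacement. The Python sketch below shows that conversion and one way of building a coverage histogram with a 1500× cutoff; the 50× bin width and pooling values above the cutoff into the top bin are assumptions about how the cutoff was applied.

```python
from collections import Counter


def mapq_to_error_probability(mapq: float) -> float:
    """Invert the Phred scale used for MAPQ: MAPQ = -10 * log10(P(wrong position))."""
    return 10 ** (-mapq / 10)


def capped_coverage_histogram(coverages, bin_width=50, cap=1500):
    """Bin per-gene coverages; values above `cap` are pooled into the cap bin.

    bin_width and the pooling behaviour are assumptions for illustration.
    """
    hist = Counter()
    for cov in coverages:
        capped = min(cov, cap)
        hist[int(capped // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))


def count_below(coverages, threshold=50):
    """Number of genes/contigs with coverage under `threshold` (e.g. <50x)."""
    return sum(1 for cov in coverages if cov < threshold)


print(mapq_to_error_probability(60))
print(capped_coverage_histogram([10, 49, 75, 2000]))
```

With these inputs, the 2000× entry lands in the 1500 bin rather than stretching the axis, which is the practical purpose of a display cutoff.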
Mapping statistics of reads against the 649 annotated NLR genes, before and after filtering: unfiltered MinION 2D pass reads and PacBio SR, and adapter-filtered MinION 2D pass reads (Canu corrected and trimmed) and PacBio RoI, were mapped to the 649 annotated NLR genes described by Witek et al.
| | MinION 2D pass | MinION 2D pass (trimmed, filtered, corrected) | PacBio Subreads | PacBio Reads of Insert (trimmed, filtered) |
|---|---|---|---|---|
| Mapped reads | 99.13% | 97.27% | 97.38% | 95.54% |
| Mean Coverage | 82.32 | 55.94 | 210.80 | 55.48 |
| Mean mapping quality | 45.53 | 50.50 | 47.63 | 52.11 |
| General error rate | 13.49% | 3.73% | 13.45% | 3.93% |
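The formulas behind the "general error rate" and "mean coverage" rows are not stated in this section. One plausible definition, broadly similar to the error rate reported by samtools stats (mismatches divided by CIGAR-mapped bases), divides the total edit distance from the SAM `NM` tags by the number of aligned read bases, and divides aligned reference bases by total reference length for coverage. The Python sketch below assumes those definitions and takes `(CIGAR, NM)` pairs for mapped reads as input; it is illustrative, not the pipeline actually used.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_counts(cigar: str) -> dict:
    """Total length per CIGAR operation, e.g. '10S90M' -> {'S': 10, 'M': 90}."""
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def alignment_summary(alignments, reference_length):
    """alignments: iterable of (cigar, nm) pairs for mapped reads.

    error_rate: total NM edit distance / read bases in M, I, = or X operations
    (soft clips excluded). mean_coverage: reference bases covered by M, D, =
    or X operations / total reference length. Both definitions are assumptions.
    """
    edits = aligned_query = aligned_ref = 0
    for cigar, nm in alignments:
        c = cigar_counts(cigar)
        edits += nm
        aligned_query += sum(c.get(op, 0) for op in "MI=X")
        aligned_ref += sum(c.get(op, 0) for op in "MD=X")
    if aligned_query == 0:
        raise ValueError("no aligned bases")
    return {
        "error_rate": edits / aligned_query,
        "mean_coverage": aligned_ref / reference_length,
    }


print(alignment_summary([("10S90M2I8M", 5), ("100M", 1)], reference_length=100))
```

Because correction removes most random indels, this kind of ratio falls sharply after Canu correction, consistent with the drop from 13.49% to 3.73% in the table.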
Comparison of read statistics before and after adapter curation: NLR-Parser statistics for unfiltered MinION 2D pass reads and PacBio SR, and for adapter-filtered MinION 2D pass reads (Canu corrected and trimmed) and PacBio RoI
| | MinION 2D pass | MinION 2D pass (trimmed, filtered, corrected) | PacBio Subreads | PacBio Reads of Insert (trimmed, filtered) |
|---|---|---|---|---|
| Number of reads containing baits | 121,170 | 93,482 | 219,934 | 74,442 |
| % of reads containing baits | 62.50% | 81.98% | 57.28% | 73.73% |
| NLR-Parser hits | 20,525 | 56,211 | 11,003 | 45,853 |
| % NLR-Parser hits of total reads | 10.59% | 49.30% | 2.86% | 45.41% |
| NLR-Parser hits scored as partial | 19,984 | 50,410 | 10,791 | 39,512 |
| % NLR-Parser hits scored as partial | 97.36% | 89.68% | 98.07% | 86.17% |
| NLR-Parser hits scored as complete | 541 | 5801 | 212 | 6341 |
| % NLR-Parser hits scored as complete | 2.64% | 10.32% | 1.93% | 13.83% |
Assembly statistics and NLR-Parser evaluation of the assemblies: Canu assemblies of MinION 2D pass data (with and without nanopolish), and Canu, HGAP and Geneious assemblies of PacBio RoI
| | Canu MinION | Canu MinION (nanopolish) | Canu PacBio | HGAP | Geneious |
|---|---|---|---|---|---|
| Number of contigs | 1085 | 1085 | 1483 | 1460 | 837 |
| Minimal contig length [bp] | 1568 | 1695 | 1008 | 517 | 3882 |
| Contig N80 [bp] | 4873 | 4958 | 4464 | 3949 | 7775 |
| Contig N50 [bp] | 8089 | 8230 | 6817 | 7149 | 10,935 |
| Contig N20 [bp] | 13,963 | 14,185 | 12,785 | 11,796 | 18,321 |
| Mean contig size [bp] | 12,167 | 12,366 | 10,099 | 9353 | 13,929 |
| Maximal contig length [bp] | 132,431 | 134,631 | 59,085 | 85,187 | 55,450 |
| Total assembled length [bp] | 7,606,604 | 7,749,213 | 9,835,757 | 8,307,997 | 9,008,910 |
| NLR-Parser hits | 584 | 608 | 557 | 667 | 586 |
| NLR-Parser hits scored as partial | 308 | 332 | 324 | 372 | 257 |
| % NLR-Parser hits scored as partial | 52.74% | 54.60% | 58.35% | 55.69% | 43.78% |
| NLR-Parser hits scored as complete | 275 | 276 | 231 | 295 | 329 |
| % NLR-Parser hits scored as complete | 47.26% | 45.39% | 41.65% | 44.31% | 56.21% |
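N20, N50 and N80 are points on the same cumulative curve: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches 20%, 50% or 80% of all assembled bases (so N80 ≤ N50 ≤ N20, as in every column of the table). A minimal Python sketch of this standard definition; the function names are illustrative:

```python
def nx(lengths, x):
    """Return Nx: the largest length L such that contigs >= L hold at least x% of all bases."""
    if not lengths:
        raise ValueError("no contigs")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    target = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # only reachable through floating-point rounding


def assembly_stats(lengths):
    """Summary values analogous to the length rows of the assembly table."""
    return {
        "contigs": len(lengths),
        "min": min(lengths),
        "N80": nx(lengths, 80),
        "N50": nx(lengths, 50),
        "N20": nx(lengths, 20),
        "max": max(lengths),
        "total": sum(lengths),
    }


print(assembly_stats([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the toy lengths 2 to 10 (54 bp in total), the running sums 10, 19, 27, ... give N20 = 9, N50 = 8 and N80 = 5.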
Fig. 5 NUCmer comparison of the assemblies against the Geneious assembly: All assemblies were aligned to the Geneious reference assembly using NUCmer and visualized using mummerplot. a Canu MinION 2D pass assembly, b nanopolished Canu MinION 2D pass assembly, c Canu PacBio assembly, d HGAP assembly. Nanopolishing markedly increased the identity of the Canu MinION assembly (compare a with b). Red dots indicate forward matches; blue dots indicate reverse matches. Contig names were removed from the x and y axes because, with so many contigs, they could not be displayed legibly.
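NUCmer seeds alignments with exact matches and extends them, and mummerplot then draws forward matches in red and reverse-complement matches in blue. The idea behind such a dot plot can be sketched in Python with shared k-mers. This is a simplified substitute, not NUCmer's algorithm: it ignores match uniqueness, clustering and extension, and the default k = 11 is an arbitrary choice.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq, k):
    """Map each k-mer of `seq` to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def dot_plot_points(reference, query, k=11):
    """Return (forward, reverse) lists of (reference_pos, query_pos) shared k-mers.

    forward: the query k-mer occurs as-is in the reference (red dots in Fig. 5);
    reverse: its reverse complement occurs in the reference (blue dots).
    """
    ref_index = kmer_index(reference, k)
    forward, reverse = [], []
    for j in range(len(query) - k + 1):
        kmer = query[j:j + k]
        forward.extend((i, j) for i in ref_index.get(kmer, ()))
        reverse.extend((i, j) for i in ref_index.get(revcomp(kmer), ()))
    return forward, reverse
```

Plotting the two point lists in different colours gives a diagonal for collinear contigs and an anti-diagonal for contigs assembled in the opposite orientation; scattered off-diagonal points correspond to repeats, which are common among related NLR genes.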
Comparison of all assemblies with the annotated NLR genes using BLAST: The 649 NLR genes described by Witek et al. were mapped to each assembly. In all cases, all 649 NLR genes map to the assemblies
| | Canu MinION | Canu MinION (nanopolish) | Canu PacBio | HGAP | Geneious |
|---|---|---|---|---|---|
| Average percent identity | 98.61% ± 0.61% | 99.42% ± 0.50% | 99.62% ± 1.21% | 99.85% ± 0.70% | 100.00% ± 0.00% |
| Average alignment length | 6943 bp ± 1718 bp | 6989 bp ± 1700 bp | 6501 bp ± 2000 bp | 6729 bp ± 1742 bp | 7811 bp ± 2166 bp |
| BLAST hits | 591 | 594 | 577 | 606 | 649 |
| % database covered | 91.06% | 91.52% | 88.91% | 93.37% | 100% |
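The rows of this table can be derived from tabular BLAST output; "% database covered" is consistent with hits divided by the 649 query genes (e.g. 591 / 649 = 91.06%). A Python sketch, assuming BLAST+ `-outfmt 6` (twelve tab-separated columns, with percent identity in column 3, alignment length in column 4 and bit score in column 12) and keeping only the highest-scoring hit per NLR gene; the best-hit rule is an assumption about how hits were counted.

```python
from statistics import mean, stdev

N_REFERENCE_GENES = 649  # annotated NLR genes used as BLAST queries


def _sd(values):
    """Sample standard deviation, or 0.0 for a single value."""
    return stdev(values) if len(values) > 1 else 0.0


def summarize_blast_tab(lines, n_queries=N_REFERENCE_GENES):
    """Summarize BLAST+ -outfmt 6 rows, keeping the best hit per query by bit score."""
    best = {}  # qseqid -> (pident, alignment length, bitscore)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        pident, length, bitscore = float(fields[2]), int(fields[3]), float(fields[11])
        if qseqid not in best or bitscore > best[qseqid][2]:
            best[qseqid] = (pident, length, bitscore)
    if not best:
        raise ValueError("no BLAST hits")

    idents = [hit[0] for hit in best.values()]
    lengths = [hit[1] for hit in best.values()]
    return {
        "blast_hits": len(best),
        "percent_database_covered": 100 * len(best) / n_queries,
        "identity_mean_sd": (mean(idents), _sd(idents)),
        "alignment_length_mean_sd": (mean(lengths), _sd(lengths)),
    }
```

Usage: pass the lines of a BLAST+ tabular report (for example `open("genes_vs_assembly.tsv")`, a hypothetical file name) and read off the mean ± SD values reported in the table.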
AUGUSTUS protein prediction results: BLASTP comparison of the R-proteins predicted from each assembly with the 641 R-proteins predicted from the 649-gene NLR reference (aa, amino acids)
| | Canu MinION (nanopolish) | Canu MinION (nanopolish, pilon) | Canu PacBio | HGAP | Geneious |
|---|---|---|---|---|---|
| Predicted R-proteins | 649 | 675 | 611 | 805 | 702 |
| NLR-Parser complete | 174 | 283 | 251 | 310 | 380 |
| NLR-Parser partial | 475 | 392 | 361 | 495 | 322 |
| R-proteins with BLASTP hit | 368 | 496 | 445 | 533 | 587 |
| % Database coverage | 57.41% | 77.37% | 69.42% | 83.15% | 91.57% |
| % identity to reference | 90.67% ± 9.06% | 95.53% ± 8.42% | 98.19% ± 5.59% | 99.22% ± 2.98% | 99.87% ± 1.12% |
| BLASTP alignment length of R-proteins | 826 aa ± 278 aa | 921 aa ± 301 aa | 914 aa ± 296 aa | 954 aa ± 299 aa | 971 aa ± 330 aa |
| BLASTP % alignment length of R-proteins | 84.11% ± 18.18% | 93.65% ± 13.74% | 93.81% ± 13.62% | 95.50% ± 10.82% | 98.30% ± 6.12% |