Benjamin Istace1, Anne Friedrich2, Léo d'Agata1, Sébastien Faye1, Emilie Payen1, Odette Beluche1, Claudia Caradec2, Sabrina Davidas1, Corinne Cruaud1, Gianni Liti3, Arnaud Lemainque1, Stefan Engelen1, Patrick Wincker1,4,5, Joseph Schacherer2, Jean-Marc Aury1.
Abstract
BACKGROUND: Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive: it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long-read sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembly algorithms, it can produce near-complete genome assemblies and support comprehensive population genomic analyses.
Keywords: Genome finishing; MinION device; Nanopore sequencing; Oxford Nanopore; Structural variations; Transposable elements; de novo assembly
Year: 2017 PMID: 28369459 PMCID: PMC5466710 DOI: 10.1093/gigascience/giw018
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Identity distribution of Nanopore reads. Percent identity of the aligned MinION 1D (red bars) and 2D (green bars) reads. The MinION reads were aligned using LAST software. (a) R7.3 chemistry. (b) R9 chemistry.
Metrics of the SMARTdenovo S288C assemblies before and after polishing with Nanopolish using R7 reads. The Nanopore 2D reads were aligned to the most contiguous SMARTdenovo assembly, and the alignment was given as input to Nanopolish to correct assembly errors. Metrics were obtained by aligning the pre-polishing and post-polishing versions of the assembly to the reference genome using Quast.
| Metric | SMARTdenovo pre-polishing | SMARTdenovo post-polishing |
|---|---|---|
| # contigs | 26 | 26 |
| Cumulative size | 12 018 244 | 12 204 373 |
| N50 | 771 149 | 782 423 |
| N90 | 238 808 | 242 444 |
| L50 | 7 | 7 |
| L90 | 16 | 16 |
| # mismatches | 6970 | 1930 |
| # insertions | 7735 | 7707 |
| # deletions | 128 050 | 17 445 |
| # deletions in homopolymers | 79 152 | 6869 |
| # genes | 6251 + 24 partial | 6273 + 15 partial |
| # genes without indels | 429 | 2590 |
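To make the effect of polishing easier to read from the table above, the short sketch below expresses each Quast error count as a relative reduction. The dictionaries simply transcribe the table; the helper name `percent_reduction` is hypothetical and is not part of Quast or Nanopolish.

```python
"""Relative change in Quast error counts before vs after Nanopolish (values from the table)."""

# Quast error counts for the SMARTdenovo S288C assembly, copied from the table above.
PRE_POLISHING = {
    "mismatches": 6970,
    "insertions": 7735,
    "deletions": 128_050,
    "deletions in homopolymers": 79_152,
}
POST_POLISHING = {
    "mismatches": 1930,
    "insertions": 7707,
    "deletions": 17_445,
    "deletions in homopolymers": 6869,
}


def percent_reduction(before: int, after: int) -> float:
    """Drop in an error count, expressed as a percentage of the pre-polishing value."""
    if before <= 0:
        raise ValueError("pre-polishing count must be positive")
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    for metric, before in PRE_POLISHING.items():
        change = percent_reduction(before, POST_POLISHING[metric])
        print(f"{metric}: {change:.1f}% fewer after polishing")
```

Run as a script, it reports about 72.3% fewer mismatches, 86.4% fewer deletions and 91.3% fewer deletions in homopolymers, while insertions fall by only 0.4%.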
Metrics of the S288C assemblies after polishing. Assemblies were corrected with Pilon using 300x coverage of 2 × 250 bp Illumina paired-end reads, and each corrected assembly was then aligned to the S288C reference genome using Quast.
| Metric | SPAdes | Canu | Miniasm | SMARTdenovo | ABruijn |
|---|---|---|---|---|---|
| Reads dataset used | Illumina PE 2 × 250 bp | 2D pass | Canu-corrected | Longest 2D | 2D |
| Coverage | 300x | 67x | 108x | 30x | 120x |
| # reads > 10 kb | 0 | 16 860 | 21 005 | 28 668 | 28 668 |
| # contigs | 376 | 37 | 28 | 26 | 23 |
| Cumulative size | 12 047 788 | 12 230 747 | 12 113 521 | 12 213 590 | 12 182 847 |
| Genome fraction (%) | 96.464 | 98.519 | 98.421 | 99.352 | 98.635 |
| N50 | 149 184 | 610 494 | 736 456 | 783 336 | 816 355 |
| N90 | 19 522 | 191 846 | 265 917 | 242 658 | 257 117 |
| L50 | 27 | 8 | 7 | 7 | 6 |
| L90 | 100 | 20 | 16 | 16 | 16 |
| # mismatches | 1126 | 1898 | 4455 | 4205 | 2138 |
| # mismatches per 100 kb | 9.47 | 15.85 | 37.23 | 34.27 | 17.88 |
| # insertions | 81 | 1657 | 3164 | 2384 | 1325 |
| # deletions | 439 | 1869 | 5208 | 5551 | 1838 |
| # deletions in homopolymers | 38 | 868 | 4248 | 4023 | 740 |
| # indels per 100 kb | 1.97 | 22.49 | 57.27 | 46.76 | 21.76 |
| # genes | 6087 + 177 partial | 6241 + 32 partial | 6215 + 37 partial | 6266 + 33 partial | 6243 + 45 partial |
| # genes without indels | 6023 | 5921 | 5475 | 5881 | 6002 |
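Raw error counts are not directly comparable between assemblies that align over different lengths, which is why the table also reports rates per 100 kb. As we understand the Quast manual, these rates are normalized per 100,000 aligned bases. The aligned length is not listed in the table, so the sketch below only shows the normalization itself and then ranks the assemblers using the published rates as given. The names `per_100kb` and `rank` are hypothetical helpers.

```python
"""Per-100-kb error rates and a ranking of the Pilon-polished S288C assemblies."""


def per_100kb(count: int, aligned_bases: int) -> float:
    """Scale an error count to errors per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return count * 100_000 / aligned_bases


# (mismatches per 100 kb, indels per 100 kb) copied from the table above.
RATES = {
    "SPAdes": (9.47, 1.97),
    "Canu": (15.85, 22.49),
    "Miniasm": (37.23, 57.27),
    "SMARTdenovo": (34.27, 46.76),
    "ABruijn": (17.88, 21.76),
}


def rank(rates: dict[str, tuple[float, float]], column: int) -> list[str]:
    """Assembler names ordered from lowest to highest rate in the given column."""
    return sorted(rates, key=lambda name: rates[name][column])


if __name__ == "__main__":
    print("by mismatches:", rank(RATES, 0))
    print("by indels:", rank(RATES, 1))
```

Among the nanopore-only assemblies, Canu has the lowest mismatch rate and ABruijn the lowest indel rate. All of them remain above the Illumina-only SPAdes assembly on both measures.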
Figure 2: Feature composition of the S288C assemblies, assembly and quality metrics, and assembler running statistics. The feature content of the best S288C assemblies for each assembler is shown in the left part of the figure. The feature composition was obtained by aligning each assembly to the S288C reference genome. Assembly and quality metrics for each assembly, obtained by using Quast, are shown in the middle part of the figure. The running time and the memory usage of each assembler are shown in the right part of the figure.
Assembly metrics of the SMARTdenovo assemblies of all yeast strain genomes.
| Strain | # contigs | Cumulative size (bp) | N50 (bp) | N90 (bp) | L50 | L90 | Max size (bp) |
|---|---|---|---|---|---|---|---|
| ABH | 22 | 11 960 929 | 803 880 | 267 734 | 6 | 16 | 1 483 918 |
| ADM | 41 | 11 883 044 | 474 542 | 171 488 | 10 | 26 | 1 009 064 |
| ADQ | 26 | 11 828 347 | 896 166 | 223 992 | 6 | 18 | 1 223 692 |
| ADS | 33 | 11 706 636 | 524 733 | 247 699 | 9 | 21 | 1 050 223 |
| AEG | 23 | 12 026 175 | 681 360 | 273 814 | 7 | 16 | 1 244 014 |
| AKR | 25 | 11 911 766 | 729 090 | 243 900 | 7 | 17 | 1 056 085 |
| ANE | 47 | 11 900 397 | 312 705 | 144 286 | 11 | 31 | 933 716 |
| ASN | 40 | 11 904 493 | 394 798 | 143 405 | 11 | 28 | 846 371 |
| AVB | 31 | 11 991 127 | 609 633 | 199 011 | 7 | 20 | 1 225 549 |
| BAH | 28 | 11 829 394 | 571 862 | 227 561 | 8 | 20 | 1 066 359 |
| BAL | 27 | 11 907 375 | 678 155 | 269 114 | 7 | 19 | 1 075 839 |
| BAM | 105 | 11 996 380 | 162 412 | 53 623 | 24 | 72 | 450 388 |
| BCN | 19 | 11 775 292 | 785 507 | 458 793 | 6 | 14 | 1 410 650 |
| BDF | 45 | 12 068 568 | 460 458 | 116 953 | 10 | 29 | 863 099 |
| BHH | 26 | 11 973 506 | 577 727 | 221 661 | 7 | 18 | 1 530 377 |
| CBM | 68 | 11 553 446 | 258 798 | 86 167 | 16 | 44 | 521 412 |
| CEI | 18 | 11 987 201 | 800 227 | 451 575 | 6 | 14 | 1 480 681 |
| CFA | 24 | 11 834 226 | 726 317 | 225 716 | 7 | 17 | 1 032 352 |
| CFF | 81 | 12 162 869 | 236 957 | 83 285 | 18 | 54 | 550 022 |
| CIC | 96 | 12 016 445 | 201 870 | 63 799 | 22 | 63 | 377 026 |
| CNT | 22 | 12 171 929 | 800 046 | 440 742 | 6 | 14 | 1 402 970 |
| CRV (S288C) | 26 | 12 213 584 | 783 337 | 242 658 | 7 | 16 | 1 532 642 |
| Median | 27.5 | 11 936 347 | 593 680 | 224 854 | 7 | 19.5 | 1 061 222 |
| Reference | 17 | 12 157 105 | 924 431 | 439 888 | 6 | 13 | 1 531 933 |
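The N50/N90 and L50/L90 columns follow the usual contiguity definitions: contigs are sorted from longest to shortest and their lengths summed until the running total reaches 50% (or 90%) of the assembly size. Nx is the length of the contig that crosses that threshold and Lx is the number of contigs needed to reach it. The sketch below implements that definition on hypothetical toy lengths, because the individual contig lengths are not given here. It also recomputes the Median row for the contig counts. Taking the median over all 22 assemblies, including CRV (S288C) but not the Reference row, reproduces the reported 27.5, which suggests that is how the row was computed. The function name `n_and_l` is a hypothetical helper.

```python
"""N/L contiguity statistics, plus the table's Median row for contig counts."""
from statistics import median


def n_and_l(lengths: list[int], fraction: float) -> tuple[int, int]:
    """Return (Nx, Lx) with x = 100 * fraction; fraction=0.5 gives (N50, L50).

    Contigs are sorted longest first and their lengths summed; Nx is the length
    of the contig at which the running sum first reaches `fraction` of the total
    assembly size, and Lx is the number of contigs summed up to that point.
    """
    if not lengths or not 0 < fraction <= 1:
        raise ValueError("need at least one contig and 0 < fraction <= 1")
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the running sum ends at the total")


# Number of contigs per assembly, copied from the "# contigs" column above.
CONTIGS = {
    "ABH": 22, "ADM": 41, "ADQ": 26, "ADS": 33, "AEG": 23, "AKR": 25,
    "ANE": 47, "ASN": 40, "AVB": 31, "BAH": 28, "BAL": 27, "BAM": 105,
    "BCN": 19, "BDF": 45, "BHH": 26, "CBM": 68, "CEI": 18, "CFA": 24,
    "CFF": 81, "CIC": 96, "CNT": 22, "CRV (S288C)": 26,
}

if __name__ == "__main__":
    toy = [100, 80, 60, 40, 20]  # hypothetical contig lengths, total 300
    print(n_and_l(toy, 0.5))  # (80, 2)
    print(n_and_l(toy, 0.9))  # (40, 4)
    print(median(CONTIGS.values()))  # 27.5
```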
Figure 3: Cartography of the Ty transposon family. First and second tracks show, respectively, the percentage identity of the SMARTdenovo S288C assembly before and after polishing with Illumina paired-end reads using Pilon. The third track shows the 80th percentile number of contigs obtained for each strain and for all chromosomes. The remaining tracks show the density of Ty transposons or positions of the Ty1, Ty2, Ty3, Ty4, and Ty5 transposons across all the yeast strains. The red dot on the karyotype track shows the position of the rDNA cluster.
Copy number of Ty transposon families across all yeast strain assemblies.
| Strain | Ty1 | Ty2 | Ty3 | Ty4 | Ty5 |
|---|---|---|---|---|---|
| ABH | 4 | 7 | 6 | 3 | 2 |
| ADM | 5 | 8 | 1 | 1 | 0 |
| ADQ | 4 | 7 | 1 | 2 | 0 |
| ADS | 1 | 9 | 0 | 0 | 1 |
| AEG | 15 | 7 | 2 | 1 | 2 |
| AKR | 4 | 4 | 4 | 1 | 1 |
| ANE | 1 | 5 | 3 | 2 | 0 |
| ASN | 13 | 6 | 0 | 0 | 0 |
| AVB | 0 | 29 | 0 | 0 | 2 |
| BAH | 0 | 6 | 1 | 3 | 0 |
| BAL | 8 | 0 | 12 | 0 | 0 |
| BAM | 4 | 13 | 6 | 2 | 1 |
| BCN | 6 | 0 | 0 | 0 | 0 |
| BDF | 13 | 3 | 3 | 3 | 1 |
| BHH | 20 | 12 | 5 | 4 | 0 |
| CBM | 3 | 1 | 0 | 1 | 0 |
| CEI | 2 | 20 | 1 | 0 | 0 |
| CFA | 8 | 1 | 1 | 0 | 1 |
| CFF | 6 | 6 | 2 | 0 | 1 |
| CIC | 6 | 3 | 1 | 1 | 0 |
| CNT | 17 | 6 | 1 | 1 | 1 |
| CRV (S288C) | 36 | 13 | 2 | 3 | 1 |
| Reference | 31 | 13 | 2 | 3 | 1 |
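The last two rows act as a built-in benchmark. The CRV (S288C) assembly can be compared family by family with the S288C reference annotation, and per-strain totals summarize the transposon load of each isolate. The sketch below transcribes the table and performs both calculations. The helper names `totals` and `per_family_difference` are hypothetical.

```python
"""Total Ty copies per assembly and the S288C assembly-vs-reference comparison."""

FAMILIES = ("Ty1", "Ty2", "Ty3", "Ty4", "Ty5")

# Copy numbers (Ty1..Ty5) copied from the table above.
TY_COPIES = {
    "ABH": (4, 7, 6, 3, 2), "ADM": (5, 8, 1, 1, 0), "ADQ": (4, 7, 1, 2, 0),
    "ADS": (1, 9, 0, 0, 1), "AEG": (15, 7, 2, 1, 2), "AKR": (4, 4, 4, 1, 1),
    "ANE": (1, 5, 3, 2, 0), "ASN": (13, 6, 0, 0, 0), "AVB": (0, 29, 0, 0, 2),
    "BAH": (0, 6, 1, 3, 0), "BAL": (8, 0, 12, 0, 0), "BAM": (4, 13, 6, 2, 1),
    "BCN": (6, 0, 0, 0, 0), "BDF": (13, 3, 3, 3, 1), "BHH": (20, 12, 5, 4, 0),
    "CBM": (3, 1, 0, 1, 0), "CEI": (2, 20, 1, 0, 0), "CFA": (8, 1, 1, 0, 1),
    "CFF": (6, 6, 2, 0, 1), "CIC": (6, 3, 1, 1, 0), "CNT": (17, 6, 1, 1, 1),
    "CRV (S288C)": (36, 13, 2, 3, 1), "Reference": (31, 13, 2, 3, 1),
}


def totals(table: dict[str, tuple[int, ...]]) -> dict[str, int]:
    """Sum of Ty1..Ty5 copies for each row."""
    return {name: sum(counts) for name, counts in table.items()}


def per_family_difference(assembly: tuple[int, ...], reference: tuple[int, ...]) -> dict[str, int]:
    """Assembly count minus reference count, per Ty family."""
    return {fam: a - r for fam, a, r in zip(FAMILIES, assembly, reference)}


if __name__ == "__main__":
    print(per_family_difference(TY_COPIES["CRV (S288C)"], TY_COPIES["Reference"]))
    isolates = {k: v for k, v in totals(TY_COPIES).items()
                if k not in ("CRV (S288C)", "Reference")}
    print("most:", max(isolates, key=isolates.get), "fewest:", min(isolates, key=isolates.get))
```

According to this table, the S288C assembly carries five more Ty1 copies than the reference and the same counts for Ty2 to Ty5. Among the natural isolates, BHH has the most annotated Ty copies (41) and CBM the fewest (5).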
Copy number of ENA1-2 and CUP1 tandem-repeated genes across the 21 natural isolate assemblies.
| Strain | ENA1-2 | CUP1 |
|---|---|---|
| ABH | 1 | 10 |
| ADM | 2 | 1 |
| ADQ | 1 | 1 |
| ADS | 2 | 3 |
| AEG | 2 | 10 |
| AKR | 1 | 1 |
| ANE | 1 | 1 |
| ASN | 1 | 3 |
| AVB | 4 | 2 |
| BAH | 1 | 1 |
| BAL | 1 | 1 |
| BAM | 1 | 2 |
| BCN | 1 | 1 |
| BDF | 4 | 4 |
| BHH | 5 | 3 |
| CBM | 1 | 1 |
| CEI | 1 | 1 |
| CFA | 1 | 1 |
| CFF | 2 | 4 |
| CIC | 2 | 4 |
| CNT | 2 | 1 |
Chromosomal rearrangements detected across all 21 strains.
| Strain | Chromosome 1 | Chromosome 2 | Type |
|---|---|---|---|
| ABH | 5 | 14 | Translocation |
| ABH | 5 | 14 | Translocation |
| ABH | 5 | 14 | Translocation |
| ABH | 14 | 14 | Inversion |
| ADM | 2 | 4 | Translocation |
| ADM | 5 | 7 | Translocation |
| AKR | 15 | 4 | Translocation |
| ANE | 16 | 5 | Translocation |
| ANE | 9 | 14 | Translocation |
| ASN | 5 | 2 | Translocation |
| AVB | 12 | 7 | Translocation |
| AVB | 7 | 12 | Translocation |
| BAH | 4 | 7 | Translocation |
| BAH | 10 | 9 | Translocation |
| BAL | 8 | 9 | Translocation |
| BAM | 4 | 7 | Translocation |
| BAM | 12 | 13 | Translocation |
| BCN | 6 | 13 | Translocation |
| BCN | 6 | 15 | Translocation |
| BDF | 4 | 14 | Translocation |
| BDF | 4 | 4 | Inversion |
| BDF | 5 | 12 | Translocation |
| BDF | 10 | 5 | Translocation |
| BHH | 12 | 12 | Inversion |
| BHH | 12 | 12 | Inversion |
| CBM | 16 | 3 | Translocation |
| CBM | 4 | 7 | Translocation |
| CBM | 12 | 15 | Translocation |
| CEI | 11 | 12 | Translocation |
| CFF | 14 | 12 | Translocation |
| CIC | 11 | 8 | Translocation |
| CIC | 4 | 7 | Translocation |
| CNT | 6 | 14 | Translocation |
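Because the same chromosome pair can appear in either column order (for example AVB lists both 12/7 and 7/12), it helps to treat each translocation as an unordered chromosome pair before comparing strains. The sketch below assumes that the Chromosome 1/Chromosome 2 order carries no meaning, which the table does not state. It also cannot tell whether repeated rows for one strain are distinct events, because breakpoint coordinates are not given. The helper names are hypothetical.

```python
"""Group the rearrangement table by unordered chromosome pair."""
from collections import Counter

# (strain, chromosome 1, chromosome 2, type) rows copied from the table above.
ROWS = [
    ("ABH", 5, 14, "Translocation"), ("ABH", 5, 14, "Translocation"),
    ("ABH", 5, 14, "Translocation"), ("ABH", 14, 14, "Inversion"),
    ("ADM", 2, 4, "Translocation"), ("ADM", 5, 7, "Translocation"),
    ("AKR", 15, 4, "Translocation"), ("ANE", 16, 5, "Translocation"),
    ("ANE", 9, 14, "Translocation"), ("ASN", 5, 2, "Translocation"),
    ("AVB", 12, 7, "Translocation"), ("AVB", 7, 12, "Translocation"),
    ("BAH", 4, 7, "Translocation"), ("BAH", 10, 9, "Translocation"),
    ("BAL", 8, 9, "Translocation"), ("BAM", 4, 7, "Translocation"),
    ("BAM", 12, 13, "Translocation"), ("BCN", 6, 13, "Translocation"),
    ("BCN", 6, 15, "Translocation"), ("BDF", 4, 14, "Translocation"),
    ("BDF", 4, 4, "Inversion"), ("BDF", 5, 12, "Translocation"),
    ("BDF", 10, 5, "Translocation"), ("BHH", 12, 12, "Inversion"),
    ("BHH", 12, 12, "Inversion"), ("CBM", 16, 3, "Translocation"),
    ("CBM", 4, 7, "Translocation"), ("CBM", 12, 15, "Translocation"),
    ("CEI", 11, 12, "Translocation"), ("CFF", 14, 12, "Translocation"),
    ("CIC", 11, 8, "Translocation"), ("CIC", 4, 7, "Translocation"),
    ("CNT", 6, 14, "Translocation"),
]


def unordered(c1: int, c2: int) -> tuple[int, int]:
    """Treat (a, b) and (b, a) as the same chromosome pair."""
    return (min(c1, c2), max(c1, c2))


def events_by_type(rows: list[tuple[str, int, int, str]]) -> Counter:
    """Number of table rows per rearrangement type."""
    return Counter(kind for _, _, _, kind in rows)


def strains_per_translocation_pair(rows: list[tuple[str, int, int, str]]) -> Counter:
    """Number of distinct strains in which each unordered pair is involved in a translocation."""
    seen = {(strain, unordered(c1, c2))
            for strain, c1, c2, kind in rows if kind == "Translocation"}
    return Counter(pair for _, pair in seen)


if __name__ == "__main__":
    print(events_by_type(ROWS))
    print(strains_per_translocation_pair(ROWS).most_common(3))
```

In this table, the chromosome 4/7 pair is the only one involved in translocations in more than one strain (BAH, BAM, CBM and CIC). The 33 rows split into 29 translocations and 4 inversions.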