Claudia Colabella, Laura Corte, Luca Roscini, Matteo Bassetti, Carlo Tascini, Joseph C Mellor, Wieland Meyer, Vincent Robert, Duong Vu, Gianluigi Cardinali.
Abstract
Species identification of yeasts and other fungi is currently carried out with Sanger sequences of selected molecular markers, mainly from the ribosomal DNA operon, which is characterized by hundreds of tandem repeats of the 18S, ITS1, 5.8S, ITS2 and LSU loci. The ITS region has recently been proposed as the primary barcode marker, making it the most widely used region in taxonomy, phylogeny and diagnostics. The introduction of NGS provides highly effective, relatively low-cost tools to amplify two or more markers simultaneously with great sequencing depth. However, intra-genomic variability between the repeats requires specific analytical procedures and pipelines. In this study, 286 strains belonging to 11 pathogenic yeast species were analysed by NGS of the region spanning from ITS1 to the D1/D2 domain of the LSU-encoding ribosomal DNA. The results showed that relatively high heterogeneity can hamper the use of these sequences for the identification of single strains and, even more so, of complex microbial mixtures. These observations indicate that metagenomics studies could be affected by species inflation at levels higher than currently expected.
Keywords: ITS; LSU; Next generation sequencing; Sanger; identification
Year: 2018 PMID: 30018874 PMCID: PMC6048569 DOI: 10.5598/imafungus.2018.09.01.07
Source DB: PubMed Journal: IMA Fungus ISSN: 2210-6340 Impact factor: 3.515
Output parameters of de novo assembly and mapping.
| n° reads assembled | 23,389 | 22,755 | 23,721 | 23,674 | |
| n° reads not assembled | 421 | 1,055 | 89 | 136 | |
| Assembly duration | 7.19 seconds | 21.58 seconds | 38 minutes-10 seconds | 12 minutes-29 seconds | |
| CPU time | 8.54 seconds | 12.49 seconds | 2h-11 minutes | 41 minutes-15 seconds | |
| Contigs | 1 | 1 | 119 | 219 | |
| n° reads assembled | 52,270 | 51,599 | 52,187 | 53,022 | |
| n° reads not assembled | 1,227 | 1,898 | 1,310 | 475 | |
| Assembly duration | 11.47 seconds | 33.06 seconds | 47 minutes-34 seconds | 10 minutes-46 seconds | |
| CPU time | 13.37 seconds | 17.31 seconds | 2h-49 minutes | 34 minutes-1 second | |
| Contigs | 1 | 1 | 242 | 357 | |
| n° reads assembled | 18,883 | 18,257 | 19,454 | 19,334 | |
| n° reads not assembled | 879 | 1,505 | 308 | 428 | |
| Assembly duration | 5.10 seconds | 21.01 seconds | 26 minutes-10 seconds | 10 minutes-8 seconds | |
| CPU time | 6.68 seconds | 10.52 seconds | 1h-33 minutes | 34 minutes-2 seconds | |
| Contigs | 1 | 1 | 121 | 209 | |
| n° reads assembled | 47,611 | 46,352 | 48,897 | 48,479 | |
| n° reads not assembled | 1,726 | 2,985 | 440 | 858 | |
| Assembly duration | 10.48 seconds | 34.93 seconds | 1h-40 minutes | 21 minutes-11 seconds | |
| CPU time | 11.48 seconds | 21.11 seconds | 5h-20 minutes | 1h-11 minutes | |
| Contigs | 1 | 1 | 243 | 470 | |
| n° reads assembled | 19,533 | 19,472 | 19,960 | 19,923 | |
| n° reads not assembled | 556 | 617 | 129 | 166 | |
| Assembly duration | 7.69 seconds | 27.36 seconds | 34 minutes-27 seconds | 11 minutes-5 seconds | |
| CPU time | 10.98 seconds | 22.71 seconds | 1h-58 minutes | 37 minutes-37 seconds | |
| Contigs | 1 | 1 | 119 | 236 | |
| n° reads assembled | 36,136 | 36,019 | 36,720 | 36,591 | |
| n° reads not assembled | 819 | 936 | 235 | 364 | |
| Assembly duration | 11.45 seconds | 38.25 seconds | 59 minutes-5 seconds | 15 minutes-47 seconds | |
| CPU time | 7.91 seconds | 14.50 seconds | 3h-20 minutes | 51 minutes-58 seconds | |
| Contigs | 1 | 1 | 219 | 477 | |
| n° reads assembled | 36,717 | 36,584 | 37,683 | 37,432 | |
| n° reads not assembled | 966 | 1,099 | 1,264 | 251 | |
| Assembly duration | 12.19 seconds | 27.66 seconds | 44 minutes-3 seconds | 15 minutes-11 seconds | |
| CPU time | 17.55 seconds | 29.27 seconds | 2h-33 minutes | 51 minutes-48 seconds | |
| Contigs | 1 | 1 | 190 | 335 | |
| n° reads assembled | 21,549 | 21,502 | 22,569 | 22,336 | |
| n° reads not assembled | 2,268 | 2,315 | 1,248 | 1,481 | |
| Assembly duration | 7.24 seconds | 19.88 seconds | 41 minutes-34 seconds | 32 minutes-57 seconds | |
| CPU time | 5.73 seconds | 11.48 seconds | 2h-24 minutes | 1h-1 minute | |
| Contigs | 1 | 1 | 281 | 632 | |
| n° reads assembled | 41,841 | 41,361 | 43,253 | 44,140 | |
| n° reads not assembled | 2,903 | 3,383 | 1,491 | 604 | |
| Assembly duration | 12.23 seconds | 31.09 seconds | 43 minutes-4 seconds | 15 minutes-9 seconds | |
| CPU time | 9.70 seconds | 14.13 seconds | 2h-28 minutes | 45 minutes-59 seconds | |
| n° reads assembled | 18,007 | 17,593 | 19,671 | 19,550 | |
| n° reads not assembled | 2,118 | 2,532 | 454 | 575 | |
| Assembly duration | 6.07 seconds | 22.98 seconds | 33 minutes-32 seconds | 11 minutes-47 seconds | |
| CPU time | 5.45 seconds | 8.12 seconds | 1h-56 minutes | 39 minutes-35 seconds | |
| Contigs | 1 | 1 | 369 | 527 | |
| n° reads assembled | 36,085 | 35,598 | 40,094 | 39,914 | |
| n° reads not assembled | 4,574 | 5,061 | 565 | 745 | |
| Assembly duration | 12.25 seconds | 27.05 seconds | 51 minutes-46 seconds | 19 minutes-55 seconds | |
| CPU time | 15.88 seconds | 15.43 seconds | 2h-58 minutes | 1h-5 minutes | |
| Contigs | 1 | 1 | 507 | 711 | |
| n° reads assembled | 17,593 | 17,335 | 19,195 | 19,152 | |
| n° reads not assembled | 1,795 | 2,053 | 193 | 236 | |
| Assembly duration | 6.33 seconds | 19.18 seconds | 22 minutes-39 seconds | 9 minutes-18 seconds | |
| CPU time | 7.02 seconds | 9.02 seconds | 1h-20 minutes | 31 minutes-10 seconds | |
| Contigs | 1 | 1 | 129 | 248 | |
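
As a reading aid for the table above, the share of reads assembled can be derived from the two read-count rows of each block. The following is a minimal sketch using only the first block of values; the function name `assembled_fraction` is hypothetical, and because the column headers did not survive in this copy, the four settings are referred to by position only.

```python
from typing import List, Tuple


def assembled_fraction(assembled: int, not_assembled: int) -> float:
    """Return the share of reads that were assembled, as a value in [0, 1]."""
    total = assembled + not_assembled
    if total <= 0:
        raise ValueError("total read count must be positive")
    return assembled / total


# First block of the table, in printed column order:
# (reads assembled, reads not assembled).
# Column identities are not given in this copy, so they are indexed only.
first_block: List[Tuple[int, int]] = [
    (23389, 421),
    (22755, 1055),
    (23721, 89),
    (23674, 136),
]

# Each column should describe the same input file, so the totals should agree.
totals = [a + n for a, n in first_block]
fractions = [assembled_fraction(a, n) for a, n in first_block]

for idx, (total, frac) in enumerate(zip(totals, fractions), start=1):
    print(f"setting {idx}: {total} reads in total, {frac:.2%} assembled")
```

In this block the four totals coincide, which is consistent with all four settings having been run on the same FASTQ file.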
Type strains employed for contig identification.
| CBS 562 | |
| CBS 7987 | |
| CBS 1795 | |
| CBS 138 | |
| CBS 573 | |
| CBS 10907 | |
| CBS 10906 | |
| CBS 604 | |
| CBS 1010 | |
| CBS 613 | |
| CBS 159 | |
| CBS 94 | |
| CBS 621 | |
| CBS 6936 | |
| CBS 2030 | |
| CBS 1171 |
*Outgroup type strain.
Fig. 1. Evaluation of the computational time required by the two approaches (mapping against a reference and de novo assembly). A. CPU time needed by the two types of procedure with the four settings. B. Correlation between the four algorithms.
Fig. 2. Analysis of contig quality. A. Similarity between a single contig of each of the two mapping algorithms and the members of the reference library. B. Variation of the similarity with the members of the library using High Sensitivity (HS) or Low Sensitivity (LS) algorithms. C. Homology of the contigs with the first and the second most similar species using BTL (dark-light blue) and BBm (dark-light green). D. Homology of the contigs analysed with High Sensitivity (dark-light blue) or Low Sensitivity (dark-light green) algorithms.
Fig. 3. Time performance of the mapping algorithms against three large libraries. A. Variation of the time performance using references and files of different size. B. Regression analysis between the time performance, the size of the library and the size of the FASTQ files using the BTL (circle) and BBm (triangle) algorithms.
Performance of the algorithms in mapping FASTQ files of different size against large reference libraries.
| Library | Algorithm | Strain | | | | | | | | |
| 15,565 CBS ITS | BTL | A | 21,238 | 63 | 33.34 | 18,274 | 2,964 | 684 | 74.00% | 100% |
| 15,565 CBS ITS | BTL | B | 58,263 | 99 | 123 | 54,822 | 3,441 | 661 | 91.00% | 100% |
| 15,565 CBS ITS | BBm | A | 21,238 | 80 | 39.82 | 18,323 | 2,915 | 626 | 68.20% | 100% |
| 15,565 CBS ITS | BBm | B | 58,263 | 114 | 49.57 | 55,675 | 2,588 | 708 | 62.30% | 100% |
| 34,683 CBS ITS-LSU | BTL | A | 21,238 | 186 | 104 | 19,163 | 2,075 | 2,051 | 73.40% | 100% |
| 34,683 CBS ITS-LSU | BTL | B | 58,263 | 273 | 393 | 57,140 | 1,123 | 2,457 | 79.50% | 100% |
| 34,683 CBS ITS-LSU | BBm | A | 21,238 | 204 | 109 | 19,000 | 2,238 | 1,939 | 58.80% | 100% |
| 34,683 CBS ITS-LSU | BBm | B | 58,263 | 311 | 161 | 56,930 | 1,333 | 2,445 | 63.60% | 100% |
| 2,727 ISHAM ITS | BTL | A | 21,238 | 16.7 | 13.59 | 13,904 | 7,334 | 319 | 74.10% | 100% |
| 2,727 ISHAM ITS | BTL | B | 58,263 | 30.3 | 33.5 | 36,416 | 21,847 | 297 | 82.00% | 100% |
| 2,727 ISHAM ITS | BBm | A | 21,238 | 37.7 | 16.13 | 13,665 | 7,573 | 280 | 74.50% | 100% |
| 2,727 ISHAM ITS | BBm | B | 58,263 | 45.2 | 26.03 | 35,653 | 22,610 | 305 | 85.70% | 100% |
Strain A: CMC 1793; strain B: CMC 1818.
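
The column headers of the table above were lost in this copy, but one regularity can be checked directly: in every row the fourth and fifth numeric values sum to the first (21,238 for strain A, 58,263 for strain B). The sketch below assumes, without confirmation from the source, that these are the total, mapped and unmapped read counts; the library labels are likewise a reading of the stacked cells in the first column.

```python
from typing import Dict, List, Tuple

# (library, algorithm, strain, total_reads, mapped_reads, unmapped_reads)
# "mapped" and "unmapped" are ASSUMED meanings for the 4th and 5th numeric
# values of each row; the original headers are not available.
Row = Tuple[str, str, str, int, int, int]

rows: List[Row] = [
    ("CBS ITS", "BTL", "A", 21238, 18274, 2964),
    ("CBS ITS", "BTL", "B", 58263, 54822, 3441),
    ("CBS ITS", "BBm", "A", 21238, 18323, 2915),
    ("CBS ITS", "BBm", "B", 58263, 55675, 2588),
    ("CBS ITS-LSU", "BTL", "A", 21238, 19163, 2075),
    ("CBS ITS-LSU", "BTL", "B", 58263, 57140, 1123),
    ("CBS ITS-LSU", "BBm", "A", 21238, 19000, 2238),
    ("CBS ITS-LSU", "BBm", "B", 58263, 56930, 1333),
    ("ISHAM ITS", "BTL", "A", 21238, 13904, 7334),
    ("ISHAM ITS", "BTL", "B", 58263, 36416, 21847),
    ("ISHAM ITS", "BBm", "A", 21238, 13665, 7573),
    ("ISHAM ITS", "BBm", "B", 58263, 35653, 22610),
]

# Check the regularity: the two partial counts add up to the total.
consistent = all(mapped + unmapped == total
                 for _, _, _, total, mapped, unmapped in rows)

# Pool the counts per library and compute the overall mapped share.
pooled: Dict[str, List[int]] = {}
for library, _, _, total, mapped, _ in rows:
    counts = pooled.setdefault(library, [0, 0])
    counts[0] += mapped
    counts[1] += total

library_share = {lib: mapped / total for lib, (mapped, total) in pooled.items()}

print("rows consistent:", consistent)
for lib, share in library_share.items():
    print(f"{lib}: {share:.1%} of reads in the assumed 'mapped' column")
```

Under that assumption, the smaller ISHAM ITS library leaves a noticeably larger share of reads unmapped than either CBS library.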
Example of an M1 and M2-M3 mapping against an ad hoc library of 16 concatenated ITS-LSU rDNA sequences of type strains of pathogenic yeasts (strain CMC 1912).
| # Nucleotides | # Sequences | % Of Ref Seq | % Pairwise Identity | Mean Cover. | ||||||||
| 212,814 | 1,426 | 99.90% | 99.00% | 203.48 | 6.14% | 6.14% | 9.26% | 6.14% | 6.08% | 6.80% | ||
| 12,362 | 80 | 53.20% | 99.30% | 10.01 | 0.34% | 0.36% | 0.46% | 0.19% | 0.35% | 0.34% | ||
| 1,863 | 6 | 24.40% | 98.80% | 0.56 | 0.03% | 0.05% | 0.03% | 0.01% | 0.05% | 0.03% | ||
| 3,004,133 | 20,223 | 100.00% | 99.00% | 1813.15 | 87.06% | 86.74% | 82.51% | 86.74% | 85.87% | |||
| 2,689 | 12 | 42.20% | 96.60% | 1.3 | 0.05% | 0.08% | 0.06% | 0.03% | 0.07% | 0.06% | ||
| 3,886 | 20 | 43.10% | 99.00% | 1.93 | 0.09% | 0.11% | 0.09% | 0.05% | 0.11% | 0.09% | ||
| 8,590 | 52 | 92.30% | 99.10% | 6.34 | 0.22% | 0.25% | 0.29% | 0.23% | 0.25% | 0.25% | ||
| 70,672 | 468 | 99.10% | 96.90% | 61.27 | 2.01% | 2.04% | 2.79% | 2.02% | 1.98% | 2.18% | ||
| 0 | 0 | 0 | 0 | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 1,034 | 2 | 4.50% | 92.90% | 0.05 | 0.01% | 0.03% | 0.00% | 0.00% | 0.03% | 0.01% | ||
| 105,986 | 703 | 35.00% | 99.00% | 69.76 | 3.03% | 3.06% | 3.17% | 1.07% | 3.03% | 2.69% | ||
| 23,711 | 154 | 96.40% | 99.20% | 20.51 | 0.66% | 0.68% | 0.93% | 0.66% | 0.68% | 0.73% | ||
| 1,705 | 5 | 14.70% | 99.70% | 0.53 | 0.02% | 0.05% | 0.02% | 0.01% | 0.05% | 0.03% | ||
| 970 | 2 | 12.70% | 100.00% | 0.13 | 0.01% | 0.03% | 0.01% | 0.00% | 0.03% | 0.01% | ||
| 8,425 | 52 | 97.70% | 98.90% | 6.21 | 0.22% | 0.24% | 0.28% | 0.24% | 0.24% | 0.25% | ||
| 4,642 | 25 | 19.00% | 99.20% | 2.32 | 0.11% | 0.13% | 0.11% | 0.03% | 0.13% | 0.10% | ||
| # Nucleotides | # Sequences | % Of Ref Seq | % Pairwise Identity | Mean Cover. | ||||||||
| 91,888 | 606 | 96.40% | 98.20% | 89 | 2.86% | 2.90% | 4.62% | 2.33% | 2.88% | 3.13% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.01% | 0.04% | 0.01% | 0.00% | 0.03% | 0.02% | ||
| 3,231,287 | 21,757 | 100.00% | 87.70% | 2413.9 | 96.13% | 95.85% | 94.02% | 95.85% | 94.32% | |||
| 1,221 | 2 | 13.60% | 77.10% | 0.1 | 0.02% | 0.04% | 0.01% | 0.01% | 0.04% | 0.02% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.02% | 0.05% | 0.01% | 0.00% | 0.05% | 0.02% | ||
| 2,355 | 9 | 44.50% | 97.80% | 1.1 | 0.06% | 0.09% | 0.08% | 0.06% | 0.09% | 0.08% | ||
| 18,410 | 116 | 87.00% | 99.00% | 15.9 | 0.48% | 0.51% | 0.70% | 0.39% | 0.51% | 0.52% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.09% | 0.12% | 0.08% | 0.01% | 0.12% | 0.09% | ||
| 6,670 | 38 | 55.10% | 99.40% | 5.1 | 0.22% | 0.25% | 0.34% | 0.18% | 0.25% | 0.25% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
| 4,147 | 21 | 47.70% | 98.80% | 2.7 | 0.10% | 0.13% | 0.14% | 0.09% | 0.13% | 0.12% | ||
| 0 | 0 | 0.00% | 0.00% | 0 | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | ||
Legend. (a) Mapping of a FASTQ file against a selected library of 16 type strains of pathogenic yeasts; (b) Mapping of the FASTQ file against the type strains of the presumptive species and the resulting mapping of the residual unused reads.
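
The first unlabelled percentage column in the first sub-table can be reproduced from the "# Sequences" column: each value equals that reference's share of all mapped sequences. A minimal sketch follows, with the counts copied in row order; the row labels (species) are missing in this copy, so references are identified by position only, and the function name `sequence_shares` is hypothetical.

```python
from typing import List


def sequence_shares(counts: List[int]) -> List[float]:
    """Return each reference's share of all mapped sequences, in percent."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped sequences")
    return [100.0 * c / total for c in counts]


# "# Sequences" column of the first sub-table, in row order (16 references).
m1_sequences = [1426, 80, 6, 20223, 12, 20, 52, 468,
                0, 2, 703, 154, 5, 2, 52, 25]

# First percentage column of the same rows, as printed in the table.
printed_percent = [6.14, 0.34, 0.03, 87.06, 0.05, 0.09, 0.22, 2.01,
                   0.00, 0.01, 3.03, 0.66, 0.02, 0.01, 0.22, 0.11]

total_sequences = sum(m1_sequences)
shares = sequence_shares(m1_sequences)

# Largest gap between the recomputed and the printed values.
max_gap = max(abs(s - p) for s, p in zip(shares, printed_percent))

print(f"total mapped sequences: {total_sequences}")
print(f"largest deviation from printed values: {max_gap:.4f} percentage points")
```

The recomputed shares agree with the printed values to within rounding, which suggests that this column reports the relative abundance of reads per reference. The meaning of the remaining percentage columns cannot be recovered from this copy.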
Fig. 4. Distribution of the similarity to ex-type strains with different analytical combinations. A. BTL Local - similarity to the correct species. B. BBm - similarity to the correct species. C. BTL Local - similarity to the incorrect species. D. BBm - similarity to the incorrect species. Light Red = mapping M1; Light Blue = mappings M2 and M3; Yellow = mapping M1; Green = mappings M2 and M3.
Fig. 5. Estimation of the internal variability among the rDNA copies.