Nicolas Berthet1,2,3, Stéphane Descorps-Declère4, Andriniaina Andy Nkili-Meyong5, Emmanuel Nakouné6, Antoine Gessain7,8, Jean-Claude Manuguerra9, Mirdad Kazanji6.
Abstract
BACKGROUND: New sequencing technologies have opened the way to the discovery and characterization of pathogenic viruses in clinical samples. However, these methods can require amplification of viral RNA prior to sequencing. Among the available methods, the procedure based on Phi29 polymerase produces a large amount of amplified DNA. Its major disadvantage is that it generates a large number of chimeric sequences, which can affect the assembly step. The pre-processing method proposed in this study strongly limits the negative impact of chimeric reads so that full-length viral genomes can be obtained.
Keywords: Amplification with phi29 polymerase; Assembling genome; Next generation sequencing; RNA viral genome; SPAdes
Year: 2016 PMID: 27605096 PMCID: PMC5015205 DOI: 10.1186/s40659-016-0099-y
Source DB: PubMed Journal: Biol Res ISSN: 0716-9760 Impact factor: 5.612
Fig. 1 Main steps of retrotranscription, RNA amplification and sequencing (a) and the viral read filtering method (b). The method is divided into three parts. The first part obtains all reads in FASTA format after several filtration steps. The second part selects only the viral portion of each read using a similarity-based approach. The last part performs assembly with different algorithms on the targeted sequences. HTS: high-throughput sequencing; cDNA: complementary DNA; ssDNA: single-stranded DNA
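The second step in the caption (keeping only the viral portion of each read) can be illustrated with a minimal Python sketch. It assumes that a similarity search has already produced, for each read, the 1-based coordinates of its match to a viral reference, as in BLAST-like tabular output. The names `Hit`, `best_viral_interval`, `targeted_sequences` and the `min_len` threshold are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Hit(NamedTuple):
    """One similarity-search hit of a read against a viral reference (assumed format)."""
    read_id: str
    q_start: int  # 1-based, inclusive start of the match on the read
    q_end: int    # 1-based, inclusive end of the match on the read


def best_viral_interval(hits: Iterable[Hit]) -> Dict[str, Tuple[int, int]]:
    """Keep, for each read, the longest matched interval (1-based, inclusive)."""
    best: Dict[str, Tuple[int, int]] = {}
    for hit in hits:
        start, end = sorted((hit.q_start, hit.q_end))
        current = best.get(hit.read_id)
        if current is None or (end - start) > (current[1] - current[0]):
            best[hit.read_id] = (start, end)
    return best


def targeted_sequences(reads: Dict[str, str],
                       hits: Iterable[Hit],
                       min_len: int = 30) -> Dict[str, str]:
    """Trim each read to its matched region; drop reads with no hit.

    min_len is a hypothetical cutoff for discarding very short fragments.
    """
    intervals = best_viral_interval(hits)
    trimmed: Dict[str, str] = {}
    for read_id, seq in reads.items():
        if read_id not in intervals:
            continue  # no similarity to the viral reference
        start, end = intervals[read_id]
        fragment = seq[start - 1:end]  # convert 1-based inclusive to a Python slice
        if len(fragment) >= min_len:
            trimmed[read_id] = fragment
    return trimmed
```

In this sketch a chimeric read whose first half matches the virus and whose second half does not would be reduced to its first half, which is the intent described in the caption. A real pipeline would also need to handle reads with several disjoint viral matches.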
Overview of sequencing data
| | Middelburg ArTB 5290 | Mengovirus AnrB 3741 | Mengovirus ArB 19017 |
|---|---|---|---|
| Total number of reads | 11,875,121 | 11,925,315 | 12,708,896 |
| Mean read length (bp) | 101 | 101 | 101 |
| | 4,951,166 (41.69 %) | 5,207,943 (43.67 %) | 4,218,708 (33.35 %) |
| Trimmed reads (after host removal) | 6,549,596 (58.31 %) | 6,372,036 (53.43 %) | 8,097,092 (63.71 %) |
| Mean trimmed read length (bp) | 93.75 | 90.49 | 87.13 |
| Mean Phred score | 36 | 36 | 36 |
| Total number of viral reads | 357,760 (5.46 %) | 136,903 (2.1 %) | 495,356 (6.1 %) |
Assembly of Mengovirus and MIDV genomes with different assemblers, using targeted and untargeted sequences obtained after selection with the similarity-based approach
| Assembler | Metric | ArTB 5290 Targeted | ArTB 5290 Untargeted | ArB 19017 Targeted | ArB 19017 Untargeted | AnrB 3741 Targeted | AnrB 3741 Untargeted |
|---|---|---|---|---|---|---|---|
| ABYSS | Contigs (≥50 bp) | 2463 | 11,822 | 24 | 20,618 | 15 | 5905 |
| | Contigs (≥1000 bp) | 3 | 0 | 2 | 0 | 1 | 0 |
| | Largest contig | 1807 | 372 | 3219 | 195 | 7025 | 266 |
| | Mean length | 88 | 84 | 576 | 82 | 392 | 82 |
| | N50 | 1185 | 124 | 2574 | 124 | 7025 | 124 |
| | L50 | 3 | 1501 | 1 | 2218 | 1 | 599 |
| Ray | Contigs (≥50 bp) | 2 | 35 | 3 | 12 | 4 | 28 |
| | Contigs (≥1000 bp) | 2 | 0 | 2 | 2 | 2 | 0 |
| | Largest contig | 6906 | 837 | 4088 | 1873 | 4360 | 552 |
| | Mean length | 4844 | 284 | 2013 | 210 | 2571 | 523 |
| | N50 | 6906 | 575 | 4088 | 1102 | 3745 | 552 |
| | L50 | 1 | 2 | 1 | 2 | 1 | 1 |
| SPAdes v3.0 | Contigs (≥50 bp) | 2 | 371 | 1 | 16 | 1 | 322 |
| | Contigs (≥1000 bp) | 1 | 2 | 1 | 2 | 1 | 0 |
| | Largest contig | 11,468 | 1065 | 7548 | 3492 | 7562 | 951 |
| | Mean length | 5789 | 145 | 7562 | 127 | 7548 | 110 |
| | N50 | 11,468 | 892 | 7548 | 3492 | 7562 | 951 |
| | L50 | 1 | 3 | 1 | 1 | 1 | 1 |
| SPAdes v3.5/SPAdes v3.6 | Contigs (≥50 bp) | 5 | 10,435 | 7 | 569 | 2 | 322 |
| | Contigs (≥1000 bp) | 1 | 0 | 1 | 0 | 1 | 0 |
| | Largest contig | 11,314 | 331 | 7548 | 396 | 7548 | 951 |
| | Mean length | 2359 | 100 | 3896 | 127 | 1174 | 110 |
| | N50 | 11,314 | 129 | 7548 | 141 | 7548 | 951 |
| | L50 | 1 | 2492 | 1 | 113 | 1 | 1 |
Reads from which only the viral regions were retained were named «Targeted Sequences» (TS), whereas the untreated reads were named «Untargeted Sequences» (US).
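The N50 and L50 rows in the assembly table follow the usual definitions. Sort the contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is covered: N50 is the length of the last contig added and L50 is the number of contigs needed. The short Python sketch below shows that calculation on made-up contig lengths. The function name `n50_l50` is illustrative, and the assemblers or evaluation tools used in the study may apply additional length filters that are not modelled here.

```python
from typing import Iterable, Tuple


def n50_l50(contig_lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the shortest contig in the set covering half the assembly
            return length, count
    raise AssertionError("unreachable: the full set always covers half the total")


# Illustrative lengths only, not data from the study.
print(n50_l50([10, 8, 6, 4, 2]))  # total 30, half 15: 10 + 8 = 18 covers it
```

A fragmented assembly made of many short contigs gives a small N50 and a large L50, which is the pattern visible in the untargeted columns of the table.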
Percentage of reads that did not map to the contigs generated with different assemblers
| Assembler | Sequence type | Middelburg ArTB 5290 | Mengovirus AnrB 3741 | Mengovirus ArB 19017 | p value^a |
|---|---|---|---|---|---|
| Abyss | Targeted sequences | 6.99 % | 2.64 % | 2.25 % | 0.04 |
| | Untargeted sequences | 40.59 % | 36.93 % | 69.78 % | |
| Ray | Targeted sequences | 6.76 % | 0.92 % | 2.26 % | 0.14 |
| | Untargeted sequences | 53.84 % | 51.5 % | 71.41 % | |
| SPAdes 3.0 | Targeted sequences | 0.57 % | 0.82 % | 0.65 % | 0.001 |
| | Untargeted sequences | 47.39 % | 55.87 % | 60.02 % | |
| SPAdes 3.5/3.6 | Targeted sequences | 3.05 % | 0.6 % | 0.48 % | 0.02 |
| | Untargeted sequences | 34.12 % | 55.54 % | 60.02 % | |
| p value^a | | 0.13 | 0.003 | 0.01 | 5.10e−6 |
^a Determined according to the Fisher test
Graph features from targeted, untargeted and chimeric-part-removed reads
| | Number of vertices | Number of edges |
|---|---|---|
| MIDV ARB5290 | | |
| Targeted sequences | 220,838 | 438,926 |
| Untargeted sequences | 719,848 | 1,442,274 |
| Chimeric reads removal | N.A | N.A |
| Mengo 19017 | | |
| Targeted sequences | 790,086 | 1,572,606 |
| Untargeted sequences | 1,411,912 | 2,803,374 |
| Chimeric reads removal | 573,848 | 1,143,146 |
| Mengo 3741 | | |
| Targeted sequences | 135,440 | 264,970 |
| Untargeted sequences | 534,694 | 1,052,622 |
| Chimeric reads removal | 58,452 | 115,682 |
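This record does not say how the graph was built. As a hedged illustration only, the sketch below assumes a de Bruijn-style graph of the kind used by several short-read assemblers, in which every distinct k-mer is an edge between its (k-1)-mer prefix and suffix. The function `de_bruijn_counts` and the toy reads are hypothetical, and reverse complements are ignored for simplicity.

```python
from typing import Iterable, Tuple


def de_bruijn_counts(reads: Iterable[str], k: int) -> Tuple[int, int]:
    """Count vertices ((k-1)-mers) and edges (distinct k-mers) of a simple de Bruijn graph."""
    if k < 2:
        raise ValueError("k must be at least 2")
    vertices = set()
    edges = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            vertices.add(kmer[:-1])  # prefix (k-1)-mer
            vertices.add(kmer[1:])   # suffix (k-1)-mer
            edges.add(kmer)
    return len(vertices), len(edges)


# Toy example: the second read shares a prefix with the first and then diverges,
# loosely mimicking a chimeric junction. It adds vertices and edges to the graph.
clean = ["ACGTAC"]
with_divergent_read = ["ACGTAC", "ACGGGA"]
print(de_bruijn_counts(clean, k=3))
print(de_bruijn_counts(with_divergent_read, k=3))
```

Under this assumption, sequence that does not belong to the target genome introduces extra k-mers and therefore extra vertices and edges, which is in line with the larger graphs reported for the untargeted sequences in the table.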
Fig. 2 Phylogenetic tree of mengo and encephalomyocarditis viruses. Phylogenetic analysis was based on nucleic acid sequences of the whole genomes of mengo and encephalomyocarditis viruses from the NCBI database