Marco Cacciabue1,2, Anabella Currá1,2, Elisa Carrillo1, Guido König1, María Inés Gismondi1,2.
Abstract
Deep sequencing of viral genomes is a powerful tool to study RNA virus complexity. However, the analysis of next-generation sequencing data might be challenging for researchers who have never approached the study of viral quasispecies by this methodology. In this work we present a suitable and affordable guide to explore the sub-consensus variability and to reconstruct viral quasispecies from Illumina sequencing data. The guide includes a complete analysis pipeline along with user-friendly descriptions of software and file formats. In addition, we assessed the feasibility of the workflow proposed by analyzing a set of foot-and-mouth disease viruses (FMDV) with different degrees of variability. This guide introduces the analysis of quasispecies of FMDV and other viruses through this kind of approach.
Keywords: Illumina sequencing platform; analysis workflow; haplotype reconstruction; open-source software; sub-consensus SNV; viral quasispecies
Year: 2020 PMID: 31697321 PMCID: PMC7110011 DOI: 10.1093/bib/bbz086
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Workflow overview. The main steps suggested for NGS analysis and the corresponding file types are indicated in the boxes. Software names are displayed in red. Green dashed lines show particular algorithms for quasispecies reconstruction and SNV detection.
Figure 2. Coverage distribution of sequenced samples. Coverage was calculated from filtered, trimmed reads. Each sample is shown in a different color. The FMDV genome is represented at the top of the figure.
Statistics of datasets
| Sample | | | | | |
|---|---|---|---|---|---|
| A01L | 849 598 | 445 118 | 9256 (6045) | 330 (87) | 39.99 |
| A01NL | 874 490 | 541 604 | 10 469 (7381) | 298 (96) | 39.91 |
| A01Lc | 646 060 | 408 224 | 6746 (5008) | 250 (95) | 39.87 |
| CapLc | 481 696 | 314 522 | 6199 (4191) | 303 (95) | 39.90 |
(a) Unfiltered reads
(b) Filtered, trimmed and concordant reads
Statistics for polymorphic sites
| | | |
|---|---|---|
| | 302 | 135 |
| | 199 | 92 |
| | 77 | 10 |
| | 45 | 3 |
Figure 3. Distribution of SNVs across the FMDV genome. (A) The frequency of the variants detected in each sample is indicated with different colors and symbols. (B) Circos histogram displaying the similarity between A01L, A01NL and A01Lc genomes. Variants with frequency above 1% that were called by LoFreq are shown. The FMDV genome is represented with a line in the periphery of the circle and divided into three regions with different colors (light yellow: 5′UTR, light green: polyprotein-coding sequence, grey: 3′UTR). The light blue bars represent the frequency of each variant detected (in log scale). Variants shared by two samples (A01L and A01NL or A01L and A01Lc) are linked by colored lines (light green: non-coding mutation, light blue: synonymous mutation, light red: non-synonymous mutation).
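The 1% frequency cutoff mentioned in the caption can be illustrated with a minimal sketch. It assumes the variant calls are stored as VCF text with the allele frequency in an `AF` key of the INFO column; the function names, the chromosome label `FMDV` and the three example records are hypothetical and are not taken from the study.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=1000;AF=0.02' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item:
            info[item] = True  # flag entry without a value
    return info


def variants_above(vcf_lines, min_af=0.01):
    """Yield (position, ref, alt, frequency) for records with AF above min_af."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        af = float(info["AF"])
        if af > min_af:
            yield int(fields[1]), fields[3], fields[4], af


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.0",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "FMDV\t120\t.\tA\tG\t500\tPASS\tDP=9000;AF=0.0450",
    "FMDV\t3415\t.\tC\tT\t80\tPASS\tDP=8000;AF=0.0040",
    "FMDV\t7002\t.\tG\tA\t900\tPASS\tDP=7000;AF=0.3100",
]
kept = list(variants_above(example))
for pos, ref, alt, af in kept:
    print(f"{pos}\t{ref}>{alt}\t{af:.2%}")
```

In this sketch the record at position 3415 (0.4%) is dropped, and the two remaining variants are kept for plotting.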
Statistics for haplotype reconstruction with tuned parameters
| Dataset | Software | Parameter values | Recall | Precision | RMSD |
|---|---|---|---|---|---|
| 50/50 | QuRe | 1E-25 0.00035 100 | 1 | 1.00 | 9.76 |
| | ViQuaS | 20 0.7 | 1 | 0.67 | 23.72 |
| | CliqueSNV | t 10 tf 0.01 | 1 | 1.00 | 7.20 |
| 70/30 | QuRe | 1E-25 0.00035 100 | 1 | 1.00 | 0.55 |
| | ViQuaS | 35 0.7 | 1 | 0.67 | 14.49 |
| | CliqueSNV | t 10 tf 0.01 | 1 | 1.00 | 6.59 |
| 90/10 | QuRe | 1E-25 0.00035 100 | 0.5 | 1.00 | 10.00 |
| | ViQuaS | 5 0.7 | 1 | 0.67 | 2.78 |
| | CliqueSNV | t 10 tf 0.01 | 1 | 1.00 | 2.00 |
(a) Recall was calculated as the number of true positive haplotypes divided by the expected number of haplotypes.
(b) A reconstructed haplotype was considered a true positive only if it differed from the closest expected variant by at most one mutation.
(c) Precision was calculated as the number of true positive haplotypes divided by the total number of haplotypes reconstructed.
(d) RMSD is the root mean square deviation of the frequency estimations of the two expected haplotypes.
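The three metrics defined in the footnotes can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: haplotypes are assumed to be aligned sequences of equal length, recall counts each expected haplotype at most once, and an expected haplotype that is not recovered is given an estimated frequency of 0. The sequences and frequencies are hypothetical, chosen to mimic a 90/10 mixture in which only the major haplotype is reconstructed.

```python
import math


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def evaluate(expected, reconstructed, max_mismatches=1):
    """Return (recall, precision, rmsd).

    expected, reconstructed: dicts mapping sequence -> frequency in percent.
    A reconstructed haplotype is a true positive when it has at most
    max_mismatches differences from its closest expected haplotype.
    """
    estimated = {seq: 0.0 for seq in expected}  # assumption: 0 if not recovered
    recovered = set()
    true_positives = 0
    for seq, freq in reconstructed.items():
        closest = min(expected, key=lambda ref: hamming(seq, ref))
        if hamming(seq, closest) <= max_mismatches:
            true_positives += 1
            recovered.add(closest)
            estimated[closest] += freq
    recall = len(recovered) / len(expected)
    precision = true_positives / len(reconstructed) if reconstructed else 0.0
    squared = [(estimated[s] - expected[s]) ** 2 for s in expected]
    rmsd = math.sqrt(sum(squared) / len(squared))
    return recall, precision, rmsd


# Hypothetical 90/10 mixture; only the major haplotype is reconstructed.
expected = {"ACGTACGT": 90.0, "ACGTTCGA": 10.0}
reconstructed = {"ACGTACGT": 100.0}
recall, precision, rmsd = evaluate(expected, reconstructed)
print(recall, precision, round(rmsd, 2))
```

With these inputs the sketch gives a recall of 0.5, a precision of 1.0 and an RMSD of 10.00, the same pattern as the QuRe row for the 90/10 dataset, although the actual reconstructed frequencies behind that row are not given here.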
Figure 4. Haplotype networks for A01L and A01NL viruses. Haplotype networks were created with the median-joining method (with epsilon = 0 to minimize the distances) using SplitsTree4 software. (A) FMDV genome representation. The blue bar indicates the genomic region used for haplotype reconstruction. (B) A01L haplotype network. (C) A01NL haplotype network. Font color indicates the program used to obtain the corresponding haplotype: green for QuRe, red for ViQuaS and blue for CliqueSNV. Black indicates sequences obtained by more than one program; Sanger consensus sequences were included as reference. Only haplotypes representing 75% of total frequency are presented. For each sequence, the name indicates the software used for the analysis, the haplotype number and the estimated frequency.