Emilio Mastriani, Kathrina Mae Bienes, Gary Wong, Nicolas Berthet.
Abstract
The taxonomic classification of viral sequences is frequently used for the rapid identification of pathogens, which is critical when a viral outbreak occurs. Both Oxford Nanopore Technologies (ONT) MinION and Illumina (NGS) technology provide efficient methods to detect viral pathogens. Despite the availability of many strategies and software tools, combining them can be a tedious and time-consuming task. We therefore developed PIMGAVir and Vir-MinION, two metagenomics pipelines that automatically provide the user with a complete baseline analysis. The PIMGAVir and Vir-MinION pipelines work on 2nd and 3rd generation data, respectively, and provide the user with a taxonomic classification of the reads through three strategies: assembly-based, read-based, and clustering-based. The pipelines supply the scientist with comprehensive results in graphical and textual format for future analyses. Finally, the pipelines give the user a stand-alone platform with a variety of dedicated viral databases, which is a requirement for working in field conditions without an internet connection.
Keywords: 2nd and 3rd sequencing generation; metagenomic pipeline; multiple strategies analysis; taxonomic classification
Year: 2022 PMID: 35746732 PMCID: PMC9230805 DOI: 10.3390/v14061260
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.818
List of used packages from PIMGAVir and Vir-MinION.
| Package Name | Version | Pipeline | Task |
|---|---|---|---|
| TrimGalore | 0.6.5-1 | PIMGAVir | preprocessing |
| SortMeRNA | 4.3.4 | PIMGAVir | filtering |
| diamond | 2.0.11.149 | PIMGAVir | filtering |
| KronaTools | 2.8.1 | PIMGAVir | filtering |
| Taxonkit | 0.10.1 | PIMGAVir | filtering |
| seqtk | 1.3 | PIMGAVir | filtering |
| megahit | v1.2.9 | PIMGAVir/Vir-MinION | assembly |
| flye | v2.9 | Vir-MinION | assembly |
| quast | v5.0.2 | PIMGAVir | assembly |
| spades | 3.13.1 | PIMGAVir | assembly |
| bowtie2 | 2.4.4 | PIMGAVir | assembly |
| samtools | 1.10-3 | PIMGAVir | assembly |
| pilon | 1.23 | PIMGAVir | assembly |
| Prokka | 1.14.6 | PIMGAVir | assembly |
| kraken2 | 2.1.2 | PIMGAVir/Vir-MinION | taxonomy |
| kaiju | 1.8.2 | PIMGAVir/Vir-MinION | taxonomy |
| blastn | 2.9.0+ | PIMGAVir/Vir-MinION | taxonomy |
| seqkit | 2.0.0 | PIMGAVir | clustering |
| vsearch | v2.18.0 | PIMGAVir | clustering |
| guppy_basecaller | 5.0.13 | Vir-MinION | basecalling |
| NanoFilt | 2.3.0 | Vir-MinION | filtering |
| guppy_barcoder | 5.0.13 | Vir-MinION | demultiplexing |
| NGSpeciesID | 0.1.2.1 | Vir-MinION | clustering |
| medaka | 0.11.5 | Vir-MinION | clustering |
Databases used by PIMGAVir and Vir-MinION.
| Database Name | Version | Build Up Date |
|---|---|---|
| diamond/refseq_protein_nonredund | Refseq protein non redundant genomes Database format version = 3 | 6–14 May 2022 |
| krakendb/SILVA_138.1_SSURef_NR99_tax_silva | Silva DB v. 138.1 | 17 May 2021 |
| krakenviral/database.kraken | Kraken Viral DB v2.0.8 | 17 May 2021 |
| NCBI-RefSeq/viralseq_2021-12-14_14-45-53 | Refseq viruses’ representative genomes. BLASTDB Version: 5 | 14 December 2021 |
| SILVA/ssr138, slr138 | Ribosomal DB for SSR138 and SLR138 | 27 August 2020 |
Figure 1. PIMGAVir pipeline. (A) PIMGAVir flowchart showing the pre-processing task, the filtering option, the execution of the three strategies and the taxonomic classification. (B) Example of the data structure organization for the ass_based strategy. (C) Simplification of the PIMGAVir system architecture.
Figure 2. PIMGAVir scripts. (A) Table showing some of the results and log files produced by every strategy and the invoked script. (B) Table reporting the databases and the packages used by every script. (C) Table containing the list of scripts used in PIMGAVir and its parameters.
Figure 3. Vir-MinION pipeline. (A) Vir-MinION flowchart showing the execution of the three strategies. (B) Vir-MinION table showing the scripts, packages and DBs used. (C) Vir-MinION data structure at the end of the ALL strategy execution.
Statistical values. The table reports the statistical values of the simulated data produced using CAMISIM.
| Generation | Reference Genome | Fragment Mean Size | Total Number of Reads | Number of Reads per Genome | Coverage with the Reference (%) | Average Depth per Genome |
|---|---|---|---|---|---|---|
| 2G | | 270 | 666,618 | 800 | 0.12 | 34.31 |
| | | | | 666 | 0.01 | 1.35 |
| | | | | 733 | 0.11 | 15.12 |
| | | | | 655,218 | 98.29 | 55.44 |
| 3G | | 500 | 13,499 | 5382 | 0.20 | 3348.52 |
| | | | | 423 | 0.03 | 198.974 |
| | | | | 5485 | 0.18 | 2571.08 |
| | | | | 2205 | 97.30 | 5.07815 |
PIMGAVir data flow. The table reports the number of reads entering, leaving, and removed at each task during execution.
| Task | Starting Number of Reads | Ending Number of Reads | Removed (Number) | Removed (Percentage) |
|---|---|---|---|---|
| Trimming | 666,618 | 642,614 | 24,254 | 3.64 |
| Ribosomal removing | 642,614 | 640,614 | 1750 | 0.26 |
| Filtering unwanted | 640,614 | 170,960 | 469,654 | 70.45 |
| MEGAHIT | 170,960 | 884 | 170,086 | 99.48 |
| SPADES | 170,960 | 1354 | 169,616 | 99.20 |
| clustering | 170,960 | 44,783 | 126,187 | 73.80 |
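The Removed (Percentage) values for the first three tasks are consistent with dividing the removed read count by the initial 666,618 reads rather than by each task's own input. That denominator is an inference from the figures, not something stated in the source. A minimal Python sketch of the arithmetic under that assumption (the names `INITIAL_READS`, `REMOVED`, and `removed_percentage` are hypothetical):

```python
# Read counts copied from the data-flow table above.
INITIAL_READS = 666_618

# (task, removed reads) for the pre-processing and filtering tasks only.
REMOVED = [
    ("Trimming", 24_254),
    ("Ribosomal removing", 1_750),
    ("Filtering unwanted", 469_654),
]


def removed_percentage(removed: int, total: int) -> float:
    """Return the share of `total` reads that were removed, in percent."""
    return round(100 * removed / total, 2)


# Assumption: every percentage is relative to the initial read count.
percentages = {task: removed_percentage(n, INITIAL_READS) for task, n in REMOVED}

for task, pct in percentages.items():
    print(f"{task}: {pct:.2f}%")
```

The sketch covers only the pre-processing and filtering rows; the assembly and clustering rows start from the 170,960 filtered reads and are not reproduced here.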
PIMGAVir statistics. The table shows the statistical values associated with the taxonomic classification of PIMGAVir.
HAV: Hepatitis A Virus (taxid: 12092). Ippy: Ippy Virus, Segment S + L (taxid: 55096).

| Approach | DB Name | Total of Reads/Contigs | Average Size | HAV: Total of Reads/Contigs | HAV: % of Reads/Contigs | HAV: Coverage | HAV: Accuracy | Ippy: Total of Reads/Contigs | Ippy: % of Reads/Contigs | Ippy S (Ippy-01): % of Coverage | Ippy S (Ippy-01): Accuracy % of Align | Ippy L (Ippy-02): % of Coverage | Ippy L (Ippy-02): Accuracy % of Align |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| read-based | kraken | 170,960 | 139.9 | 648 | 0.38 | 91 | 100 | 790 | 0.46 | 98 | 91.65 | 69 | 7.97 |
| | kaiju | 170,960 | 139.9 | 644 | 0.37 | 91 | 38.69 | 780 | 0.45 | 98 | 44.42 | 72 | 3.92 |
| | BLASTN | 170,690 | 139.9 | 648 | 0.37 | 100 | 100 | 793 | 0.46 | 98 | 91.42 | 72 | 8.07 |
| clustering-based | kraken | 44,783 | 145.7 | 247 | 0.55 | 100 | 100 | 237 | 0.55 | 97 | 76.25 | 70 | 22.50 |
| | kaiju | 44,783 | 145.7 | 244 | 0.54 | 90 | 100 | 233 | 0.52 | 97 | 75.97 | 72 | 22.75 |
| | BLASTN | 44,783 | 145.7 | 247 | 0.55 | 90 | 100 | 241 | 0.54 | 97 | 76.25 | 72 | 22.50 |
| assembly-based-megahit | kraken | 884 | 728.6 | 4 | 0.45 | 90 | 100 | 1 | 0.11 | 96 | 100 | 0 | 0 |
| | kaiju | 884 | 728.6 | 4 | 0.45 | 90 | 100 | 1 | 0.11 | 96 | 100 | 0 | 0 |
| | BLASTN | 884 | 728.6 | 4 | 0.45 | 90 | 100 | 1 | 0.11 | 96 | 100 | 0 | 0 |
| assembly-based-spades | kraken | 1354 | 617 | 1 | 0.07 | 91 | 100 | 7 | 0.52 | 98 | 14.29 | 23 | 85.71 |
| | kaiju | 1354 | 617 | 1 | 0.07 | 91 | 100 | 7 | 0.52 | 98 | 14.29 | 23 | 85.71 |
| | BLASTN | 1354 | 617 | 1 | 0.07 | 91 | 100 | 7 | 0.52 | 98 | 14.29 | 23 | 85.71 |