István Prazsák1, Zsolt Csabai1, Gábor Torma1, Henrietta Papp2,3, Fanni Földes2,3, Gábor Kemenesi2,3, Ferenc Jakab2,3, Gábor Gulyás1, Ádám Fülöp1, Klára Megyeri4, Béla Dénes5, Zsolt Boldogkői1, Dóra Tombácz1,6.
Abstract
Long-read sequencing (LRS) approaches have shed new light on the complexity of viral (Kakuk et al., 2021 [1]; Boldogkői et al., 2019 [2]; Depledge et al., 2019 [3]), bacterial (Yan et al., 2018 [4]) and eukaryotic (Tilgner et al., 2014 [5]) transcriptomes. Emerging RNA viruses are zoonotic (Woolhouse et al., 2016 [6]) and create public health problems, e.g. the influenza pandemic caused by the H1N1 virus (Fraser et al., 2009 [7]), as well as the current SARS-CoV-2 pandemic (Kim et al., 2020 [8]). In this study, we carried out nanopore sequencing to generate transcriptomic data valuable for structural and kinetic profiling of six important human pathogenic RNA viruses: the H1N1 subtype of Influenza A virus (IVA), the Zika virus (ZIKV), the West Nile virus (WNV), the Crimean-Congo hemorrhagic fever virus (CCHFV), the Coxsackievirus [group B serotype 5 (CVB5)] and the Vesicular stomatitis Indiana virus (VSIV), as well as for profiling the response of host cells to viral infection. The raw sequencing data were filtered during basecalling, and only high-quality reads (Qscore ≥ 7) were mapped to the appropriate viral and host genomes. The length distribution of the sequencing reads was assessed, and statistics of the data were plotted with the ReadStat.4 Python script. The datasets can be used to profile the transcriptomic landscape of RNA viruses, to provide information for novel gene annotations, to serve as a resource for studying virus-host interactions, and to analyze RNA base modifications. They can also be used to compare different sequencing techniques, library preparation approaches and bioinformatics pipelines, and to analyze the RNA profiles of viruses with small RNA genomes.
Keywords: Coxsackievirus; Crimean-Congo hemorrhagic fever virus; Influenza A virus; Third-generation sequencing; Transcriptome profiling; Vesicular stomatitis Indiana virus; West Nile virus; Zika virus
Year: 2022 PMID: 35789906 PMCID: PMC9249600 DOI: 10.1016/j.dib.2022.108386
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Classification of viruses examined in this study.
Details of the viruses, the experimental setup, the raw data statistics and ENA accession numbers. A. Basic information on the examined viruses. B. Experimental conditions and summary statistics of the obtained dataset. C. European Nucleotide Archive (ENA) project accession numbers. D. GenBank IDs of viral and host reference genomes used for this study. Abbreviations: kb: kilobase; ss: single-stranded; MDCK: Madin-Darby canine kidney; Vero: kidney epithelial cells extracted from an African green monkey (Chlorocebus sabaeus); sp: species; MOI: multiplicity of infection.
| A | Viruses | IVA | ZIKV | WNV | CCHFV | CVB5 | VSIV | |||
|---|---|---|---|---|---|---|---|---|---|---|
| | Genome length | 13.5 kb | 10.8 kb | 11.0 kb | 19.2 kb | 7.3 kb | 11 kb | | | |
| | Genome type | (-)ssRNA | (+)ssRNA | (+)ssRNA | (-)ssRNA | (+)ssRNA | (-)ssRNA | | | |
| | Number of segments | 8 | - | - | 3 | - | - | | | |
| | Number of genes | 11 | 1 | 1 | 3 | 1 | 5 | | | |
| | PolyA-tailed mRNA | ✓ | - | - | - | ✓ | ✓ | | | |
| | 5’-Cap | ✓ | ✓ | ✓ | ✓ | - | ✓ | | | |
| | Vectors | - | Aedes mosquitoes | Culex mosquitoes | Ixodid (hard) ticks (Hyalomma sp) | - | Black flies (Simulium sp) | | | |
| | Reservoirs | Wild birds | Monkeys, Human | Wild birds | Hard ticks | Human | Horse, cattle, pig | | | |
| B | Viral strain | H1N1 | MR766 | Own isolate, Serbia, 2014 | Kosova Hoti | B5 | ||||
| | Host cell(s) | MDCK | Vero | Vero | Vero | Vero | Vero | | T98G | |
| | Examined time points p.i. | 1 h, 4 h, 7 h | 24 h, 72 h | 24 h, 72 h | 24 h, 72 h | 1 h, 6 h, 15 h, 24 h | 1 h, 6 h, 15 h, 24 h | | | |
| | | | | | | | V1 | V2 | G1 | G2 |
| | MOI | 10 | low | low | low | low | 5 | 5 | 5 | 5 |
| | Transcript read counts | 10916 | 592 | 21790 | 528 | 1508 | 193820 | 273882 | 289807 | 255610 |
| | Average read lengths | 812 | 634 | 491 | 683 | 630 | 805 | 954 | 1021 | 1122 |
| | Maximum mapped read length | 2.28 kb | 2.59 kb | 2.60 kb | 1.86 kb | 6.99 kb | 4.45 kb | 6.38 kb | 5.83 kb | 6.24 kb |
| C | Project ID (ENA) | PRJEB46600 | PRJEB46591 | PRJEB46598 | PRJEB46127 | |||||
| D | Virus reference genome ID | GCF_001343785.1 | NC_012532.1 | NC_001563.2 | NC_005300.2 | AF114383 | NC_001560.1 | |||
| | Host reference genome ID | GCA_000002285.4 | GCF_000409795.2 | GCF_000409795.2 | GCF_000409795.2 | GCF_000409795.2 | GCA_000409795.2 | | GCA_000001405.28 | |
Fig. 2. Schematic overview of the study and the bioinformatic pipeline.
Fig. 3. Violin plots of read length distribution of the sequencing data.
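The read-length statistics summarized in the table and in Fig. 3 can be illustrated with a minimal sketch. This is not the ReadStat.4 script referred to in the abstract; the function name `read_length_summary` and the toy input are hypothetical, and N50 is included only as a commonly reported long-read metric, not one given in the source.

```python
from statistics import mean, median


def read_length_summary(lengths):
    """Return count, mean, median, max and N50 for read lengths (in bases)."""
    if not lengths:
        raise ValueError("no read lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: length of the read at which the running total, taken from the
    # longest read downwards, first reaches half of all sequenced bases.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "mean": mean(ordered),
        "median": median(ordered),
        "max": ordered[0],
        "n50": n50,
    }


# Hypothetical toy input, not taken from the datasets.
example = read_length_summary([500, 800, 1200, 300, 2200])
```

With real data, the list of lengths would come from the aligned reads in the BAM files; a violin plot like Fig. 3 would then be drawn from the same per-sample lists.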
| Subject | Biological sciences |
| Specific subject area | Omics: Transcriptomics |
| Type of data | Raw |
| How the data were acquired | Sequencing – Oxford Nanopore MinION R9.4 SpotOn and Flongle Flow Cells |
| Data format | Filtered data: after basecalling, reads were filtered by quality score (Qscore ≥ 7); the reads that passed the filter and mapped to the reference genomes are stored in BAM files |
| Description of data collection | Various cell cultures were infected with six human pathogen RNA viruses. Total RNA was isolated from the infected cells at different time points after viral infection. Libraries were generated and then sequencing reactions were carried out on a MinION (Oxford Nanopore Technologies) device. Guppy 3.6 was used for basecalling and minimap2 for aligning the raw reads to the viral and host genomes. |
| Data source location | Coxsackievirus (CVB5): Institution: Public Health and Food Chain Safety Service of Government Office for Csongrád County, Laboratory Department; City/Town/Region: Szeged; Country: Hungary. City/Town/Region: Vojvodina; Country: Serbia, 2013. Institution: National Center for Epidemiology; City/Town/Region: Budapest; Country: Hungary. |
| | Vesicular stomatitis Indiana virus: Institution: Department of Medical Microbiology and Immunobiology, University of Szeged; City/Town/Region: Szeged; Country: Hungary. City/Town/Region: Balkan Peninsula. |
| Data accessibility | The BAM files containing the reads aligned to the reference genomes are available at ENA and can be used without restrictions. |
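The Qscore ≥ 7 filter mentioned above can be sketched as follows. The source does not spell out how the per-read score is computed, so this sketch assumes the convention commonly used for nanopore reads: per-base Phred scores are converted to error probabilities, averaged, and converted back. The function names and the Phred+33 offset are assumptions for illustration, not the authors' code.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean Phred score of one read, averaged over error probabilities.

    Assumes a FASTQ quality string with Phred+33 encoding.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each per-base Phred score to its error probability.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    # Average the probabilities, then convert back to the Phred scale.
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(quality_string, threshold=7.0):
    """True if the read's mean Q-score meets the threshold (7 in the source)."""
    return mean_qscore(quality_string) >= threshold
```

Averaging probabilities rather than raw scores penalizes low-quality bases more heavily: a two-base read with scores Q20 and Q3 has a mean of about Q5.9 under this convention, not Q11.5, and would therefore be rejected at the threshold of 7.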