Raquel de M Barbosa, Marcelo A C Fernandes.
Abstract
As of May 25, 2020, the novel coronavirus disease (COVID-19) had spread to more than 185 countries/regions, with more than 348,000 deaths and more than 5,550,000 confirmed cases. In bioinformatics, one crucial task is the analysis of virus nucleotide sequences using data stream techniques and algorithms. To make this approach feasible, the nucleotide sequence strings must first be transformed into a numerical stream representation. This dataset therefore provides four kinds of data stream representation (DSR) of SARS-CoV-2 virus nucleotide sequences. It contains the DSR of 1557 instances of SARS-CoV-2 virus, 11540 instances of other viruses from the Virus-Host DB dataset, and three instances of Riboviria viruses from NCBI (Betacoronavirus RaTG13, bat-SL-CoVZC45, and bat-SL-CoVZXC21).
Keywords: COVID-19; Data stream; SARS-CoV-2
Year: 2020 PMID: 32596428 PMCID: PMC7306612 DOI: 10.1016/j.dib.2020.105829
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Example of the DM-DSR values for the SARS-CoV-2 sequence stored in the dataset (MT126808 - Brazil).
Fig. 2. Example of the DM-CGR-DSR values for the SARS-CoV-2 sequence stored in the dataset (MT126808 - Brazil).
Fig. 3. Example of the kMersM-DSR values for the SARS-CoV-2 sequence stored in the dataset (MT126808 - Brazil).
Fig. 4. Example of the kMersM-CGR-DSR values for the SARS-CoV-2 sequence stored in the dataset (MT126808 - Brazil).
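The abstract's step of turning a nucleotide string into a numerical stream can be illustrated with a minimal Python sketch. It is not the procedure used to build the dataset (which was generated in MATLAB). The numeric codes in `DIRECT_CODES` are hypothetical, the corner assignment in `CGR_CORNERS` is one common chaos game representation (CGR) convention, and the function names are invented for illustration. The exact definitions behind DM-DSR, DM-CGR-DSR, and the k-mer variants are not given in this section.

```python
from typing import Iterator, Tuple

# Hypothetical nucleotide-to-number codes (assumption: the dataset's
# actual direct-mapping values are not specified in this section).
DIRECT_CODES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

# One common CGR convention: each nucleotide owns a corner of the unit square.
CGR_CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def direct_mapping_stream(sequence: str) -> Iterator[float]:
    """Yield one number per nucleotide, skipping symbols without a code."""
    for base in sequence.upper():
        if base in DIRECT_CODES:
            yield DIRECT_CODES[base]


def cgr_stream(sequence: str) -> Iterator[Tuple[float, float]]:
    """Yield CGR points: each point is the midpoint between the previous
    point and the corner assigned to the current nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in sequence.upper():
        if base not in CGR_CORNERS:
            continue  # ambiguous symbols such as N are skipped in this sketch
        cx, cy = CGR_CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        yield (x, y)
```

Both functions are generators, so a long genome can be consumed one value at a time, which matches the data stream setting described in the abstract.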
| Specifications Table | |
|---|---|
| Subject | Biochemistry, Genetics and Molecular Biology (General) |
| Specific subject area | Bioinformatics |
| Type of data | Table; Number |
| How data were acquired | NCBI - Genbank - SARS-CoV2; Virus-Host-DB; Matlab Software; Excel Software |
| Data format | Raw and analyzed data are provided as MATLAB (.mat) and Microsoft Excel (.xlsx) files. |
| Parameters for data collection | The entire dataset was generated using MATLAB 2019b on a Windows operating system with an Intel Core i5-6500T 2.5 GHz quad-core processor and 16 GB of RAM. |
| Description of data collection | The raw data were downloaded from NCBI - Genbank and Virus-Host-DB. The data stream values were generated using MATLAB. |
| Data source location | Laboratory of Machine Learning and Intelligent Instrumentation, IMD/nPITI, Federal University of Rio Grande do Norte. |
| Data accessibility |