Lifei Ma, Huiyang Li, Jinping Lan, Xiuqing Hao, Huiying Liu, Xiaoman Wang, Yong Huang.
Abstract
Novel coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), which can be transmitted from person to person. As of September 21, 2021, over 228 million cases of COVID-19 had been diagnosed in more than 200 countries and regions worldwide. The death toll exceeds 4.69 million, corresponding to a mortality rate of about 2.05%, and the numbers are still growing. It is therefore important to gain a deeper understanding of the genome and protein characteristics, clinical diagnostics, and pathogenic mechanisms of the novel coronavirus, and to develop antiviral drugs and vaccines against it. Traditional biological techniques alone are of limited use for COVID-19-related studies. Bioinformatics, the application of computational methods and analytical tools to biological research, has clear advantages in predicting the structure, product, function, and evolution of unknown genes and proteins, and in screening drugs and vaccines from large amounts of sequence information. Here, we summarize several of the most important bioinformatics methods and applications relating to COVID-19, based on currently available reports, with a focus on future research for overcoming the pandemic. Next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies make it possible not only to detect the virus but also to obtain high-quality SARS-CoV-2 genomes quickly. The growing body of data on SARS-CoV-2 genome sequences, variants, and haplotypes helps us to understand genome and protein structure, variant calling, mutation, and other biological characteristics. Sequence alignment and phylogenetic analysis suggest that the bat may be the natural host of the novel coronavirus.
Single-cell RNA sequencing provides an abundant resource for discovering the mechanisms of the immune response induced by COVID-19. As an entry receptor, angiotensin-converting enzyme 2 (ACE2) can be used as a potential drug target to treat COVID-19. Bioinformatics methods such as molecular dynamics simulation, molecular docking, and artificial intelligence (AI), applied to drug databases for SARS-CoV-2, can accelerate drug development. Meanwhile, computational approaches such as reverse vaccinology, immunoinformatics, and structural vaccinology help to identify suitable vaccines to prevent COVID-19 infection.
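The mortality rate quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function name `case_fatality_ratio` is an assumption of this example, and the inputs are the rounded lower bounds given in the abstract ("over 228 million" cases, "more than 4.69 million" deaths), so the result is approximate. With these inputs it comes to roughly 2.06%, in line with the quoted figure of about 2.05%.

```python
def case_fatality_ratio(deaths: float, cases: float) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Rounded figures from the abstract (as of September 21, 2021).
# Both are stated as lower bounds, so the ratio is only approximate.
cfr = case_fatality_ratio(deaths=4.69e6, cases=228e6)
print(f"Approximate mortality rate: {cfr:.2f}%")
```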
Keywords: Application; Bioinformatics technology; COVID-19; SARS-CoV-2
Year: 2021 PMID: 34773807 PMCID: PMC8560182 DOI: 10.1016/j.compbiolchem.2021.107599
Source DB: PubMed Journal: Comput Biol Chem ISSN: 1476-9271 Impact factor: 2.877
Fig. 1. The descriptive workflow of the most important methods and applications of bioinformatics technologies for COVID-19.
Databases of genome and protein structure.
| Type | Data center |
|---|---|
| Genome sequence | Global Initiative on Sharing Avian Influenza Data (GISAID) |
| Genome sequence | National Center for Biotechnology Information (NCBI) |
| Genome sequence | National Microbiology Data Center (NMDC) |
| Genome sequence | China National GeneBank (CNGB) |
| Genome sequence | China National Center for Bioinformation (CNCB) |
| Protein sequence/function | Universal Protein (UniProt) |
| Protein sequence/structure | Research Collaboratory for Structural Bioinformatics Protein Data Bank Database (RCSB PDB) |
Fig. 2. Structure of SARS-CoV-2. (A) Schematic representation of the structure of SARS-CoV-2. It has four structural proteins: the S (spike), E (envelope), M (membrane), and N (nucleocapsid) proteins. The N protein holds the single-stranded, positive-sense RNA genome, and the S, E, and M proteins together create the viral envelope. (B) The SARS-CoV-2 genome comprises, in order, a 5′ untranslated region (5′ UTR) including the 5′ leader sequence, open reading frame (ORF) 1a/b, the envelope, membrane, and nucleoprotein genes, accessory proteins such as ORF 3, 6, 7a, 7b, 8, and 9b, and a 3′ untranslated region (3′ UTR). (C) SARS-CoV-2 genome regions targeted for amplification by the nanopore targeted sequencing (NTS) method. NTS detects 12 fragments, including ORF1ab and virulence factor-encoding regions (M. Wang et al., 2020).
Fig. 3. Phylogenetic analysis of full-length genomes of SARS-CoV-2 and representative viruses of the genus Betacoronavirus. 2019-nCoV = SARS-CoV-2 = severe acute respiratory syndrome coronavirus type 2. MERS-CoV = Middle East respiratory syndrome coronavirus. SARS-CoV = severe acute respiratory syndrome coronavirus (Roujian Lu et al., 2020).
Comprehensive analysis of the host of SARS-CoV-2.
| Host | Description |
|---|---|
| Bat | The bat may be the primary host. SARS-CoV-2 shares 96.3% sequence identity with BatCoV RaTG13, which originated from Yunnan, China, in 2013. It shares 88% identity with bat-SL-CoVZC45 and bat-SL-CoVZXC21, which originated from Zhoushan, China, in 2018, whereas it shares 79.5% identity with SARS-CoV and 50% identity with MERS-CoV. |
| Turtle | Turtles may be an intermediate host. The interaction between the key amino acids of the S protein RBD and ACE2 indicates that turtles (Chrysemys picta bellii, Chelonia mydas, and Pelodiscus sinensis) may act as potential intermediate hosts. |
| Pangolin | Pangolins may be an intermediate host. A coronavirus closely related to SARS-CoV-2 has been isolated from pangolins. |
| Mink | Mink may be an intermediate host. A high rate of variation within SARS-CoV-2 mink isolates implies that mink populations were infected before human populations. |
| Snake | Snakes may be an intermediate host. The codon usage bias of SARS-CoV-2 is most similar to that of snakes. |
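The identity percentages in the table above come from comparing aligned sequences. As a rough illustration of the idea, the sketch below computes percent identity over two already-aligned sequences. It is a minimal example under stated assumptions: the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are all hypothetical and are not taken from the cited studies. Published tools differ in how they treat gaps, so their values may not match this definition.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of matching positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (one of several
    conventions in use; an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
query = "ACGT-ACGTA"
subject = "ACGTTACGAA"
print(f"{percent_identity(query, subject):.1f}% identity")
```

Real genome comparisons first require an alignment step (for example with a pairwise or multiple sequence aligner); this function only covers the counting that follows.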
Similarities and differences in the S protein of β-coronaviruses.
| Type | Different sequence structures | Receptors |
|---|---|---|
| SARS-CoV-2 | There are 12 short insertion sequences in the N-terminal domain of the S protein, including PRRA (encoded by CCT CGG CGG GCA) | ACE2 |
| SARS-CoV | The S1 domain contains 14 unique amino acids | ACE2 |
| BatCoV RaTG13 | The S protein contains short insertion sequences in the N-terminal domain | ACE2 |
| MERS-CoV | The RBD in the S1 domain differs from that of SARS-CoV; it contains a core structure and a subsidiary subregion that functions as the RBM | DPP4 (CD26) |
| HCoV-HKU | Three identical S1 domains form an interwoven cap at the top of the S2 stem | Unknown |
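The SARS-CoV-2 row pairs the amino acids PRRA with the nucleotides CCT CGG CGG GCA. The short sketch below shows how that correspondence can be checked by translating the codons with the standard genetic code. The names `CODON_TABLE` and `translate` are assumptions of this example; the table is built from the usual compact TCAG-ordered representation of the standard code rather than from any library.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering:
# the first base varies slowest, the third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA (or RNA) string into one-letter amino acids."""
    seq = "".join(dna.split()).upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# The 12-nucleotide insertion listed in the table.
print(translate("CCT CGG CGG GCA"))  # PRRA
```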
Comparison of detection methods of SARS-CoV-2.
| Methods | Advantages | Disadvantages |
|---|---|---|
| qRT-PCR | Gold standard; highly specific, rapid, economical, and the most robust and widely applied technology | Low sensitivity, limited speed, requires sophisticated equipment, and is prone to false negatives |
| NGS | Highly accurate, high-throughput, and yields high-quality genome sequences | Expensive equipment, requires operator expertise, and has a turnaround time of > 24 h |
| NTS | Targeted amplification with long reads; rapid, highly accurate, more sensitive than standard qRT-PCR, and yields high-quality genome sequences | Expensive equipment; more expensive than qRT-PCR |
| RT-LAMP | High sensitivity, simple operation, fast reaction (within 50 min), less dependent on complex equipment, and can detect viruses, various pathogens, and even viral variants | The limit of detection of the RT-LAMP assay was calculated to be 118.6 copies per reaction |
| LAMP-Seq | Low-cost, rapid (within 40 min), and highly sensitive protocol suitable for population-scale testing | Lacks clinical certification |
| Antibody | Traditional, mature technology | Lower sensitivity; antibody production takes a long time |
| CT | Easy to operate and helpful in diagnosis | Cannot diagnose asymptomatic infection |
Drug databases for SARS-CoV-2.
| Databases | Websites/Resources |
|---|---|
| D3Targets-2019-nCoV | |
| CoViLigands | |
| CORDITE | |
| DockCov2 | |
| DrugBank | |
| TCMD | Software |