Juliana M Serpeloni1, Quirino Alves Lima Neto2, Léia Carolina Lucio3, Anelisa Ramão4, Jaqueline Carvalho de Oliveira5, Daniela Fiori Gradia5, Danielle Malheiros5, Adriano Ferrasa6, Rafael Marchi7, David L A Figueiredo8, Wilson A Silva9, Enilze M S F Ribeiro5, Ilce M S Cólus1, Luciane R Cavalli10.
Abstract
In this review, we highlight the interaction of the SARS-CoV-2 virus and host genomes, reporting current studies on the sequence analysis of SARS-CoV-2 isolates and host genomes from diverse world populations. The main genetic variants present in the virus and host genomes are discussed, with particular focus on the ACE2 and TMPRSS2 genes and their impact on patients' susceptibility to infection and the severity of the disease. Finally, the interaction of the virus and host non-coding RNAs is described in relation to their regulatory roles in target genes and/or signaling pathways critically associated with SARS-CoV-2 infection. Altogether, these studies contribute significantly to the knowledge of SARS-CoV-2 mechanisms of infection and COVID-19 pathogenesis. The described genetic variants and molecular factors involved in host/virus genome interactions have helped define patient risk groups beyond those based on age and comorbidities, and they are promising candidates for targeting in treatment strategies for COVID-19 and other viral infectious diseases.
Keywords: COVID-19; Genetic polymorphisms; Genome interaction; SARS-CoV-2; SNPs; ncRNA; miRNA; lncRNA
Year: 2021 PMID: 34425415 PMCID: PMC8378551 DOI: 10.1016/j.imbio.2021.152130
Source DB: PubMed Journal: Immunobiology ISSN: 0171-2985 Impact factor: 3.152
Gene variants in the SARS-CoV-2 virus genome.
| Genome coverage | Methodological approach | Main results | Reference |
|---|---|---|---|
| Whole genome | Genome sequencing and alignment analysis of 7710 GISAID | Average pairwise difference of 9.6 SNPs between any two genomes; mutation rate of the global diversity of SARS-CoV-2 of ~6 × 10⁻⁴ nucleotides/genome/year; 290 amino acid alterations in the genomes: 232 synonymous and 58 non-synonymous mutations | |
| Whole genome | Analysis of SARS-CoV-2 sequences using CoV_GLUE ( | Divergence of the two main mutations (S-D614G and nsp12-P323L) from the NCBI reference (NC_045512) retrieved in all continents, with only three cases in Asia; mutations at ORF8-L84S and ORF3a-Q57H (the third and fourth most frequent mutations, respectively); co-evolution of the L84S amino acid substitution with three other mutations: nsp4-F308Y, ORF3a-G196V, and N-S197L | |
| Whole genome | Genome sequencing and alignment analysis of 94 GenBank genomes | 156 variants and 116 unique variants across the genome (46 missense, 52 synonymous, 2 insertion, 1 deletion, and 14 non-coding alleles); C > T and/or T > C as the most common variants in the ORF1ab (NSP1-NSP16), ORF8, and N genes | |
| Whole genome | Genome sequencing and alignment analysis of ~660 NCBI genomes | Mutations in the S protein (D614G, V483A, L5F, Q675H, H655Y, and S939F); substitutions at R203K and G204R in the N protein; substitutions at L84S, V62L, and S24L in ORF8; non-synonymous mutations in ORF3a (Q57H and G251V); non-synonymous mutations in ORF1ab (T265I, P4715L, P5828L, and Y5865C) | |
| Whole genome | RNA sequencing analysis of NCBI RNA-seq data | A-to-G (59.1%) RNA modifications (caused by RNA deamination); non-A-to-G variations, G-to-A (22.4%) and others (18.5%) (caused by replication errors); A-to-G alterations in the N (>40%), ORF1ab (~35%), S, M, E, ORF3a, ORF8, ORF7a, and ORF6 genes | |
| Whole genome | Genome sequencing and alignment analysis of 12,909 genomes/estimation of common ancestor (TMRCA) | Indication that COVID-19 might have originated earlier than and outside of the Wuhan Seafood Market; the genetic polymorphism patterns, including the enrichment of specific haplotypes and the temporal allele frequency trajectories generated from infection clusters, are similar to those caused by evolutionary forces such as natural selection | |
| Whole genome | Genome sequencing and alignment analysis of 106 NCBI genomes | Higher number of mutations in the S protein, Nsp1, RdRp, and ORF8 regions; 47 key point mutations/SNPs located along the entire genome sequence in isolates from 12 different countries; NSP1 and ORF8 as the two hot spots with mutations and deletions | |
| Whole genome | Genome sequencing and alignment analysis of 167 sequences from 15 distinct geographical locations | 290 sites with variations (S, M, and N genes; orf1ab, orf3a, the envelope protein-coding gene, orf6, orf8, orf7b, and orf10); 244/290 variants were nucleotide substitutions (158 transitions and 86 transversions); high similarity (>99.9%) amongst all locations | |
| Whole genome | Genome sequencing and alignment analysis of 566 genomes from India compared to NCBI | 933 substitutions, 2449 deletions, and 2 insertions, in total 3384 unique point mutations, distributed in 100 clusters of mutations (mostly deletions); 1609 substitution, deletion, and insertion point mutations, 64 SNPs in coding regions and 7 in 5′-UTR and 3′-UTR; largest number of SNPs in coding regions of ORF1ab and Spike protein | |
| Whole genome | Genome sequencing and alignment analysis of 86 GISAID genomes from 12 countries | 3 deletions (2 in the ORF1ab polyprotein and one in the 3′ end of the genome) in the genomes from Japan, USA, and Australia; 42 missense mutations (non-structural and structural proteins): 29 in the ORF1ab polyprotein, 8 in the S glycoprotein, 1 in the matrix protein, and 4 in the nucleocapsid protein | |
| Whole genome | Genome sequencing and alignment analysis of 30,366 genomes/software developed by the researchers' group (ODOTool) | 11 variations with an incidence of over 10% in the 30,366 isolates; 8 of these variations (C1059T, G11083T, C14408T, A23403G, G25563T, G28881A, G28882A, and G28883C) caused amino acid substitutions | |
| Whole genome, D614G mutation (gene spike protein) | Statistical analysis of the D614G mutation of 2795 GISAID genomes from 55 countries | Amino acid change from an aspartate to a glycine residue at position 614 (D614G); high frequency of the D614G mutation (87%) among Italian isolates; D614G clade reported in 954 of 1449 (66%) European isolates and 1237 of 2795 (44%) worldwide isolates | |
| Whole genome | Mutation analysis of 34 human and animal isolates | 60% of nucleotide variations between human SARS-CoV-2 and bat RaTG13 can be attributed to C > U and U > C substitutions; an accumulation of C > U mutations was observed in SARS-CoV-2 variants in the human population, suggesting a significant role in the evolution of the SARS-CoV-2 coronavirus | |
| Whole genome, Spike protein | Genome sequencing and alignment analysis of 1,325 genomes and 1604 CDS | 1197 SNPs, classified in 782 clusters; 1604 CDS at the S protein; two major phylogenetic clades, A and B, with many subclades in the S protein of SARS-CoV-2 circulating worldwide; 23402A > G SNP in 48.2% (the most common) | |
| Spike gene | Development of a bioinformatics pipeline for Spike amino acid variants (GISAID data) | A spike protein amino acid change at D614G; association of the D614G variant with high levels of infectivity and viral loads | |
| ORF8 | Evolutionary analysis of ORF8: genetic diversity and genomic rearrangements | ORF8 is poorly conserved among coronaviruses, with a small number of highly frequent lineages; nonsense mutations and three main deletions in the ORF8 gene either remove or significantly change the ORF8 protein, which suggests that SARS-CoV-2 can persist without a functional ORF8 protein | |
| Orf1a, Orf1b, ORF3a, ORF6, ORF7a, ORF8, ORF10, S, E, M, N, Sum | Metatranscriptome sequencing analysis of eight bronchoalveolar lavage fluid samples from 25 community-acquired pneumonia patients and 20 healthy controls | No specific polymorphism was described; the median number of intra-host variants (iSNVs) was 1–4 in SARS-CoV-2 infected patients; SARS-CoV-2 evolves | |
| RdRp, S, and Nsp-2 | Sanger sequencing of the NSP-2, NSP-12, and S genes for phylogenetic analysis of 7 cases from Iran | NSP-2 sequences: highest similarity between Iranian and Wuhan (China) isolates; RdRp and S gene sequences: highest similarity between Iranian isolates and those from China and the USA; no identified differences between Iranian isolates | |
| S, RdRP, RNA primase, nucleoprotein | Genotyping of 558 isolates worldwide | Mutations in genes encoding the S proteins and RNA polymerase, RNA primase, and nucleoprotein; classification of the SNPs into four major groups: single mutation in nsp6 (11083G > T) (115%), single mutation in ORF3a (26144G > T) (49%), single mutation in RNA polymerase (nsp8) (8782C > T, 28144 T > C) (140%), and double mutations in S protein and RNA polymerase: (241C > T, 3037C > T, 14408C > T, 23403A > G) (178%; 182%; 182%; 183%); predominance of co-mutations (241C > T, 3037C > T, 23403A > G) in isolates from Europe; estimated transmission of SARS-CoV-2 of 14 generations since its first infection of humans in Dec 2019 | |
GISAID: Global Initiative on Sharing All Influenza Data.
NCBI: National Center for Biotechnology Information.
TMRCA: Time to the most recent common ancestor.
ODOTool: Strategy Based Local Alignment Tool.
CDS: Coding Sequence.
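Two of the summary statistics reported in the table, the split of substitutions into transitions and transversions and the average pairwise difference between genomes, can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the pipeline of any cited study: the function names `classify_substitution` and `mean_pairwise_differences` and the toy sequences are hypothetical, and the input is assumed to be pre-aligned, equal-length sequences without gaps or ambiguous bases.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Classify a single-nucleotide change as a transition or a transversion.

    RNA notation (U) is accepted and treated as T.
    """
    ref = ref.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mean_pairwise_differences(aligned):
    """Average number of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy alignment, not real SARS-CoV-2 data.
    toy = ["ACGTAC", "ACGTAT", "GCGTAT"]
    print(classify_substitution("C", "T"))
    print(classify_substitution("G", "T"))
    print(mean_pairwise_differences(toy))
```

Applied to real data, the same bookkeeping underlies figures such as the 158 transitions and 86 transversions that sum to the 244 substitutions reported for the 167-sequence study.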
Gene variants in the host genome in association with genetic susceptibility and clinical characteristics in SARS-CoV-2 infected patients.
| Gene | Methodological approach | Main results | Reference |
|---|---|---|---|
Identification of several specific and common Association of the hemizygous viral-entry booster variants of | |||
| Review article on | |||
| In silico analysis of the impact of | Decrease and increase of S19P may protect and K26R may predispose to severe SARS-CoV-2 disease | ||
| Analysis of the | Association of Association of | ||
| Analysis of the 1700 variants in | No direct evidence of SARS-CoV-2 spike protein binding-resistant Association of higher allelic frequency in the eQTL variants with higher | ||
| Whole-exome sequencing (WES) data mining for | Missense changes (Asn720Asp, Lys26Arg, and Gly211Arg) predicted to interfere with ACE2 structure and stabilization Interference of rare variants (Leu351Val and Pro389His) with SARS-CoV-2 spike protein binding Higher allelic variability of | ||
| Construction of intermolecular interactions of molecular models of native and variants of | Variations in the intermolecular interactions of the | ||
| Molecular dynamic simulation on the influences of | Significant differences of minor K26R and I468V variants may affect binding between S protein and ACE2 receptor Marginal differences in gene expression for some populations in HapMap3 as compared to the Chinese population | ||
| Epidemiological investigation of the association between | Positive correlation of D allele of | ||
| Genotype analysis from high-coverage sequenced data of 1KGP (phase 3) and the Korean Personal Genome Project | Negative correlation of No correlation of | ||
| Collection of the literature data on the geographical variation of the | Correlation of Association of the I/D polymorphism in intron 16 of | ||
| Meta-analysis on the prevalence of | Association of the increase of the I/D allele frequency ratio with the patients’ recovery rate No significant differences in the death rate | ||
Presence of | |||
| Association of | Association of | ||
| Analysis of whole-exome sequencing and SARS-CoV-2 infection in a familial multiple sclerosis cohort | Low level of High level of Association of the | ||
| Comparison of the rare-variants burden and polymorphisms frequency from exome and SNP-array data of a large Italian cohort from Europe and East Asia | No association between Differences of exonic variant (Val160Met) between East Asians and Italians Higher frequency of rare alleles of 2 haplotypes, predicted to induce higher levels of | ||
| Analysis of | Association of Suggestive association of the | ||
| Analysis of coding-region variants in | Association of the eQTL variant rs35074065 with high expression of | ||
| Analysis of the coding (missense) and regulatory variants of the | Four regulatory variants in the Significant role of the | ||
| Genotyping analysis of | |||
Increase of deaths caused by COVID‐19 in | |||
| ABO blood group | Analysis of 8,582,968 SNPs and meta-analysis of the two case-control panels | 3p21.31 gene cluster ( Association of 3p21.31 at | |
| ABO blood group | Analysis of ABO blood type in 2173 SARS-CoV-2 infected patients from China | Significantly higher risk of SARS-CoV-2 infection in blood group A individuals; significantly lower risk of SARS-CoV-2 infection in blood group O individuals | |
| Analysis of genes regulated by these variants through | |||
| Analysis of a new mutation, CCR5Delta32, in 416 SARS-CoV-2-positive infection survivors (164 asymptomatic and 252 symptomatic) | Higher number of CCR5Delta32 carriers among SARS-CoV-2-positive/COVID-19-asymptomatic subjects than among SARS-CoV-2-positive/COVID-19-symptomatic patients; the CCR5Delta32 I/D polymorphism may have the potential to predict the severity of SARS-CoV-2 infection | ||
| Analysis of the | T allele–carrying individuals may be more resistant to SARS-CoV-2 Africans or African Americans with low allelic frequency of rs1990760 (T allele) are more vulnerable-risk groups than Caucasians and Indians | ||
| Analysis of the SNPs rs12252 and rs34481144 in the gene | Neither CAT plays a crucial intermediary role in binding effectiveness of | ||
ACE2: Angiotensin I converting enzyme 2.
CTSB: Cathepsin B.
CTSL: Cathepsin L.
TMPRSS2: Transmembrane serine protease 2.
ACE1: Angiotensin I converting enzyme.
CD26: Dipeptidyl peptidase IV (DPPIV/CD26).
HLA: Human leukocyte antigen.
CAT: catalase.
EHF: ETS homologous factor.
CCR5: CC chemokine receptor 5.
IFIH1: Interferon-induced helicase 1.
IFITM3: Interferon-induced transmembrane protein 3.
ChinaMAP: China Metabolic Analytics Project.
1KGP: 1000 Genomes Project.
HapMap3: International haplotype map project 3.
eQTL: Expression quantitative trait loci.
MHC: Major histocompatibility complex.
Non-coding RNA-like sequences (miRNAs and lncRNAs) in SARS-CoV-2 and host genomes identified by in silico and experimental analysis.
| ncRNA | Methodological approach | Main results | Reference |
|---|---|---|---|
| miR-1307-3p, miR-1468-5p, miR-3611, miR-3691-3p, miR-3934-3p, miR-5197, miR-8066a | Sequence analysis of miRNA sites in MERS, SARS, SARS-CoV-2, and cold virus (OC43 and 229E) from NCBI | Seven similar miRNAs (miR-1307-3p, miR-1468-5p, miR-3611, miR-3691-3p, miR-3934-3p, miR-5197, and miR-8066a) in the SARS-CoV-2 genome from different geographic regions in association with virus pathogenicity and host response | |
| miR-15b-5p, miR-15a-5p, miR-30b-5p, miR-409-3p, miR-505-3p, miR-548c-5p, miR-548d-3p | Sequence analysis of 4 SARS isolates and 29 COVID-19 isolates from NCBI and GISAID databases | 558 miRNAs identified; 315/558 miRNAs uniquely targeting COVID-19 patients genome; seven miRNAs (miR-15b-5p, miR-15a-5p, miR-30b-5p, miR-409-3p, miR-505-3p, miR-548c-5p, and miR-548d-3p) with high target score in the COVID-19 patients genomes in association with age-related conditions/co-morbidities | |
| miR-16-2-3p, miR-139-5p, miR-155-3p, miR-1275, let7a-3p | Sequence analysis of SARS-CoV, SARS-CoV-2 and MERS genomes | 128 miRNAs associated with SARS-CoV-2; 28/128 miRNAs common to SARS-CoV and 23/128 to MERS; five miRNAs (miR-16-2-3p, miR-139-5p, miR-155-3p, miR-1275, and let7a-3p) differentially expressed in SARS-CoV-2 infected lung cancer cells (Calu-3) | |
| 30 viral mature miRNA-like sequences | Sequence analysis of miRNA-like sequences in the SARS-CoV-2 genome from NCBI database and potential host-virus interactions | 30 viral mature miRNA-like sequences predicted to target 1367 host genes miRNAs affected transcription, defense systems, metabolism, and critical signaling cellular pathways, such as the EGFR | |
| miR-10b-5p, miR-16-5p, miR-26b-5p, miR-27a-3p, miR-124-3p, miR-200b-3p, miR-302c-5p, miR-587, miR-1305 | 1954 miRNAs predicted to regulate Nine miRNAs (miR-10b-5p, miR-16-5p, miR-26b-5p, miR-27a-3p, miR-124-3p, miR-200b-3p, miR-302c-5p, miR-587, miR-1305) among the top ones regulating the 5/9 miRNAs (miR-26b-5p, miR-27a-3p, miR-302c-5p, miR-587, and miR-1305) associated with hypertension | ||
| miR-18a, miR-125b, miR-143, miR-181a | Review article- Prediction analysis of miRNAs targeting | Four miRNAs (miR-18a, miR-125b, miR-143, and miR-181a) predicted to target | Vidiasta et al., 2020 |
| miR-127-3p, miR-153-3p, miR-204-5p, miR-211-5p, miR-448, miR-548c-3p, miR-593-3p, miR-1324, miR-4433b-3p, miR-4666b, miR-4685-3p, miR-4696, miR-4716-5p, miR-5011-3p, miR-5089, miR-6076, miR-6729-5p, miR-6797-3p, miR-6818-3p | Sequence analysis of | Alterations in the expression of miRNAs regulating the Prediction analysis of miRNAs targeting this gene showed the presence of six SNPs influencing the miRNA target site and/or seed region of miR-127-3p, miR-153-3p, miR-204-5p, miR-211-5p, miR-448, miR-548c-3p, miR-593-3p, miR-1324, miR-4433b-3p, miR-4666b, miR-4685-3p, miR-4696, miR-4716-5p, miR-5011-3p, miR-5089, miR-6076, miR-6729-5p, miR-6797-3p, and miR-6818-3p | |
| miR-26a-5p, miR-29b-3p, miR-34a-5p | Clinical lung biopsies of SARS-CoV-2 patients with acute lung injury (ALI) compared to biopsies of non-affected patients (qPCR) | Three miRNAs (miR-26a-5p, miR-29b-3p, and miR-34a-5p) downregulated in comparison to the controls; miR-26a-5p associated with endothelial dysfunction, induced increased expression of IL-6; miR-29b-3p associated with endothelial dysfunction, induced expression of IL-4; miR-34a-5p showed no association with inflammatory markers | |
| ANRIL, NEAT1, MALAT1, Gm4419, lincRNA-Cox2, XIST, EPS | Rat models, cell lines, clinical cases, C57BL/6 mice and BV2 mouse microglia | ANRIL, NEAT1, MALAT1, Gm4419, lincRNA-Cox2 interfere in inflammasome formation by regulating NLRP3 XIST and EPS negatively regulated the activation of NLRP3 inflammasome | |
| MALAT1, NEAT, MIR3142HG | Clinical cases analysis (lung tissue/bronchial cells) | 3 lncRNAs (MALAT1, NEAT, and MIR3142HG) with high expression in bronchial cells; MALAT1 induced IL-6 host immune response; NEAT associated with inflammasome formation; MIR3142HG: unknown function | |
| MALAT1, TSLNC8, NEAT, CAIF, HOTAIR | Human cell lines, lung injury rats and/or rat pulmonary microvascular endothelial cells | Dysregulation of the IL-6 signaling pathway | |
GISAID: Global Initiative on Sharing All Influenza Data.
NCBI: National Center for Biotechnology Information.
EGFR: Epidermal Growth Factor Receptor.
ACE2: Angiotensin I converting enzyme 2.
KEGG: Kyoto Encyclopedia of Genes and Genomes.
TMPRSS2: Transmembrane serine protease 2.
GEPIA: Gene Expression Profiling Interactive Analysis.
ExPASY2: Expert protein analysis system 2.
GTEx: Genotype-Tissue Expression.
CCLE: Cancer cell line encyclopedia.
IL-6: Interleukin-6.
ICAM-1: Intercellular Adhesion Molecule 1.
IL-4: Interleukin 4.
IL-8: Interleukin 8.
NLRP3: NLR Family Pyrin Domain Containing 3.
Fig. 1. Circle representation of predicted targets for human miRNAs in genes involved in the SARS-CoV-2 life cycle. Outer circle indicates the SARS-CoV-2 genome location (nt) and annotation. In the inner circle, black bars represent loci for human miRNAs. Connecting lines characterize human miRNAs that have multiple targets in the viral genome.
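
As a rough illustration of how candidate miRNA target loci such as those in Fig. 1 can be located, the sketch below scans an RNA sequence for perfect matches to a miRNA seed region. It is a simplified assumption-laden example, not the prediction method used in the studies summarized above: it takes the seed to be nucleotides 2 to 8 of the mature miRNA, requires exact Watson-Crick complementarity, ignores site context and binding energy, and uses hypothetical function names (`seed_site`, `find_seed_matches`) and made-up toy sequences.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the target-side sequence (5'->3') that pairs with the miRNA seed.

    The seed is assumed here to be nucleotides 2-8 of the mature miRNA.
    """
    seed = mirna.upper().replace("T", "U")[1:8]
    # Reverse complement of the seed gives the site as read on the target.
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(mirna, target):
    """Return 0-based start positions of perfect seed-site matches in target."""
    site = seed_site(mirna)
    rna = target.upper().replace("T", "U")
    width = len(site)
    return [i for i in range(len(rna) - width + 1) if rna[i:i + width] == site]


if __name__ == "__main__":
    # Hypothetical toy sequences, not a real miRNA or viral genome.
    toy_mirna = "UACGGAUCCAAGUUCGAUGCAA"
    toy_target = "AAAGAUCCGUAAACCCGAUCCGUGG"
    print(seed_site(toy_mirna))
    print(find_seed_matches(toy_mirna, toy_target))
```

A miRNA returning more than one position would correspond to the connecting lines in the figure, which mark human miRNAs with multiple targets in the viral genome.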