Rong Liu1,2, Pei Wu1,2, Pauline Ogrodzki1, Sally Mahmoud1, Ke Liang3, Pengjuan Liu3, Stephen S Francis4,5, Hanif Khalak1, Denghui Liu6, Junhua Li2,7, Tao Ma3, Fang Chen3, Weibin Liu2, Xinyu Huang3, Wenjun He6, Zhaorong Yuan6, Nan Qiao6, Xin Meng6, Budoor Alqarni1, Javier Quilez1, Vinay Kusuma1, Long Lin2, Xin Jin2, Chongguang Yang8, Xavier Anton1, Ashish Koshy1, Huanming Yang2, Xun Xu2, Jian Wang2, Peng Xiao1, Nawal Al Kaabi9, Mohammed Saifuddin Fasihuddin9, Francis Amirtharaj Selvaraj9, Stefan Weber9, Farida Ismail Al Hosani10, Siyang Liu11,12, Walid Abbas Zaher13.
Abstract
To unravel the source of SARS-CoV-2 introduction and the pattern of its spread and evolution in the United Arab Emirates, we conducted meta-transcriptome sequencing of 1067 nasopharyngeal swab samples collected between May 9th and Jun 29th, 2020, during the first peak of the local COVID-19 epidemic. We identified the global clade distribution and eleven novel genetic variants that were almost absent in the rest of the world and that defined five subclades specific to the UAE viral population. Cross-settlement human-to-human transmission was related to local business activity. Perhaps surprisingly, at least 5% of the population were co-infected by SARS-CoV-2 of multiple clades within the same host. We also discovered an enrichment of cytosine-to-uracil mutations among the viral population collected from the nasopharynx, which differs from the adenosine-to-inosine change previously reported in bronchoalveolar lavage fluid samples, and a previously unidentified upregulation of APOBEC4 expression in the nasopharynx among infected patients, indicating that the innate immune host response mediated by the ADAR and APOBEC gene families could be tissue-specific. The genomic epidemiological and molecular biological knowledge reported here provides new insights into SARS-CoV-2 evolution and transmission and points to future directions for investigating host-pathogen interactions.
Year: 2021 PMID: 34234167 PMCID: PMC8263779 DOI: 10.1038/s41598-021-92851-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. COVID-19 outbreak in the United Arab Emirates and the samples subjected to sequencing in this study. (A) The number of confirmed infected cases in the UAE (N = 461,444) up to Mar 31st, 2021 is shown as the blue line, and the number of subjects sequenced by meta-transcriptomic sequencing (N = 1067) is shown as red bars. Important dates reflecting governmental responses are marked in black text. (B) Assembly quality of the 1067 viral genomes as a function of the RT-PCR Ct value and SARS-CoV-2 reads per million sequencing reads. Color represents assembly quality stratified by the number of gaps. (C) Allele frequency spectrum of the 1245 genetic variants identified from the 896 assemblies with less than 2% gaps.
Figure 2. Phylogenetic analysis of the sequenced UAE viral population during May and June. (A) Maximum likelihood tree of the 637 unique viral genomes with less than 2% gaps and the 52 closest relatives from GISAID. Each line indicates a sample colored by the five dominant viral clades worldwide (19A: MidnightBlue, 19B: RoyalBlue, 20A: Goldenrod, 20B: Purple, 20C: SaddleBrown), annotated with the clade-definitive genetic variation. The closest relatives from GISAID are marked by a dot colored by the geographical district reported for the viral sample. The subclade-definitive genetic variations are marked in black. (B) Comparison of the alternative allele frequency of the 1245 viral genetic variants between the 896 high-quality UAE viral genomes and the 23,164 viral genomes from around the globe downloaded from the China National Center for Bioinformation. The nomenclature of the clades is detailed in “Supplementary Notes”.
Allele frequency and functional annotation of the eleven UAE-specific genetic variants.
| Position | UAE AF #1 | Type | Region | Nucleotide change | Amino acid change | CNCB AF #2 | P-value #3 |
|---|---|---|---|---|---|---|---|
| 5924 | 0.194 | Missense | nsp3 | c.5659G>A | p.Val1887Ile | 0 | INF |
| 7171 | 0.091 | Synonymous | nsp3 | c.6906T>C | p.Pro2302Pro | 0 | INF |
| 7851 | 0.069 | Missense | nsp3 | c.7586C>T | p.Ala2529Val | 2.590E-04 | 3.943E-82 |
| 11230 | 0.203 | Missense | nsp6 | c.10965G>T | p.Met3655Ile | 6.476E-04 | 3.108E-248 |
| 21775 | 0.206 | Synonymous | S | c.213T>G | p.Ser71Ser | 0 | INF |
| 23311 | 0.089 | Missense | S | c.1749G>T | p.Glu583Asp | 8.202E-04 | 1.250E-96 |
| 24170 | 0.065 | Missense | S | c.2608A>G | p.Ile870Val | 4.317E-05 | 5.989E-84 |
| 27002 | 0.093 | Synonymous | M | c.480C>T | p.Asp160Asp | 8.634E-05 | 1.058E-118 |
| 28167 | 0.182 | Missense | ORF8 | c.274G>A | p.Glu92Lys | 4.317E-04 | 9.830E-226 |
| 28878 | 0.212 | Missense | N | c.605G>A | p.Ser202Asn | 4.317E-04 | 2.31E-186 |
| 29742 | 0.235 | Downstream | S | c.*4358G>A | | 1.027E-02 | 1.658E-191 |
#1 Allele frequency computed from 896 genomes in the UAE.
#2 Allele frequency computed from 23,164 genomes from around the globe.
#3 Fisher exact test P-value comparing the allele counts between the 896 high-quality UAE viral genomes and the 23,164 viral genomes from around the globe downloaded from the China National Center for Bioinformation. Comparisons for all 1245 variants are detailed in Table S2.
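The allele-count comparison described in footnote #3 can be illustrated with a minimal, standard-library sketch of a two-sided Fisher exact test evaluated in log space, so that very small P-values do not underflow. This is not the authors' pipeline: the function names are hypothetical, and the counts in the usage lines are assumptions obtained by rounding the tabulated allele frequencies against 896 and 23,164 genomes, which ignores any per-site missing data.

```python
import math


def log_comb(n, r):
    """Natural log of the binomial coefficient C(n, r)."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def log_hypergeom_pmf(k, row1, col1, total):
    """log P(top-left cell = k) for a 2x2 table with fixed margins."""
    return (log_comb(col1, k) + log_comb(total - col1, row1 - k)
            - log_comb(total, row1))


def fisher_exact_log10_p(alt_a, ref_a, alt_b, ref_b):
    """Two-sided Fisher exact test; returns log10 of the P-value.

    The P-value sums the probabilities of all tables that are no more
    likely than the observed one, accumulated with a log-sum-exp.
    """
    row1 = alt_a + ref_a            # genomes in population A
    col1 = alt_a + alt_b            # genomes carrying the alternative allele
    total = alt_a + ref_a + alt_b + ref_b
    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    log_obs = log_hypergeom_pmf(alt_a, row1, col1, total)
    logs = []
    for k in range(lowest, highest + 1):
        lp = log_hypergeom_pmf(k, row1, col1, total)
        if lp <= log_obs + 1e-7:    # small tolerance for floating-point ties
            logs.append(lp)
    peak = max(logs)
    log_p = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return min(0.0, log_p) / math.log(10)


# Assumed counts for position 7851: round(0.069 * 896) = 62 UAE carriers,
# round(2.590e-4 * 23164) = 6 global carriers.
uae_alt, uae_total = 62, 896
cncb_alt, cncb_total = 6, 23164
log10_p = fisher_exact_log10_p(uae_alt, uae_total - uae_alt,
                               cncb_alt, cncb_total - cncb_alt)
print(f"log10(P) at position 7851 (assumed counts): {log10_p:.1f}")
```

Working with log10(P) rather than P itself keeps extreme comparisons finite on the log scale; whether the `INF` entries in the table arise from such underflow is not stated in the source.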
Figure 3. Functional analysis of the unique variants and subclades in the UAE samples. RT-qPCR Ct value distribution for samples in each of the five dominant clades and five subclades. Shown are the p-value from the Kruskal–Wallis test and the p-value from a t-test comparing the Ct values of patients carrying a virus of a given clade or subclade with those of the remaining patients, who did not carry a virus of that clade or subclade.
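For readers unfamiliar with the rank-based test named in the Figure 3 caption, the following is a minimal sketch of the Kruskal–Wallis H statistic in plain Python. It is an illustration only, not the authors' analysis code: the function names are hypothetical, the Ct values in the usage lines are made up, no tie correction is applied, and the P-value (obtained from a chi-square distribution with one fewer degree of freedom than the number of groups) is not computed here.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of groups (no tie correction)."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Made-up Ct values for three hypothetical clades.
ct_by_clade = [[18.2, 21.5, 24.0], [25.1, 26.3, 27.8], [30.4, 31.0, 33.6]]
print(f"H = {kruskal_wallis_h(ct_by_clade):.2f}")
```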
Figure 4. Human-to-human transmission across settlements. (A) Geographical distribution of 120 viral samples with settlement-level information in Abu Dhabi city. (B) Transmission network of the 120 samples colored by settlement. (C) L1-norm genetic distance for longitudinal samples, samples from the same settlement, and samples from different settlements. Among the 130 samples that report settlement-level geographical location in Table S5, ten samples are not displayed because only one sample was collected from each of their settlements. The UAE map was obtained from the world-geo.js file (https://gist.githubusercontent.com/munaf-zz/4630218/raw/32a389a88f990e01c2c7661c551c84af9eda1a26/world-geo.js) and plotted using jQuery JavaScript Library v1.11.1 and ECharts v3.7.2.
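The L1-norm genetic distance in panel (C) can be sketched as follows. The record does not say how each sample is encoded, so this example assumes that a sample is a vector of alternative-allele fractions at a shared, ordered set of variant sites; the function names, sample IDs, and values are hypothetical.

```python
from itertools import combinations


def l1_distance(freqs_a, freqs_b):
    """Sum of absolute differences between two allele-fraction vectors."""
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both samples must cover the same variant sites")
    return sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


def pairwise_l1(samples):
    """Map each unordered pair of sample IDs to its L1 distance."""
    return {
        (id_a, id_b): l1_distance(samples[id_a], samples[id_b])
        for id_a, id_b in combinations(sorted(samples), 2)
    }


# Hypothetical samples: allele fractions at three variant sites.
samples = {
    "S1": [0.0, 1.0, 0.5],
    "S2": [0.0, 0.0, 1.0],
    "S3": [0.0, 1.0, 0.5],
}
for pair, distance in pairwise_l1(samples).items():
    print(pair, distance)
```

Under this encoding, two samples with identical allele fractions have a distance of zero, and each site where one sample is fully reference and the other fully alternative contributes one unit.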
Figure 5. Co-infection with multiple SARS-CoV-2 variants. Evidence for human-to-human transmission of multiple SARS-CoV-2 variants was established using the clade- and subclade-definitive viral genetic variants. Columns display the de-identified IDs of samples that carried more than one SARS-CoV-2 viral variant in the nasopharyngeal swab sampling (N = 48). The color bar shows the viral clade assigned to the individual according to the consensus viral sequence, reflecting the dominant clade in the sample. Rows indicate the eleven clade-definitive and eleven subclade-definitive variants. Heatmap color, ranging from red to blue, indicates the allelic proportion of the derived allele of the iSNV. The IDs of two longitudinal samples are marked in red.
Figure 6. Human innate immune response to SARS-CoV-2 mediated by the ADAR and APOBEC gene families. (A) Allelic fraction (Column 1), the number of mutations (Column 2) and the number of recurrent mutations (Column 3) for twelve mutation types for six studies arranged by row. UAE: 896 nasal swab samples collected in our study; GISAID: 23,164 viral sequences collected; Spain: 36 nasal swab samples collected in Spain; Virginia: 35 nasal swab samples collected in Virginia and 112 nasal swab samples collected at Ruijin Hospital in Shanghai, China. (B) Host ADAR and APOBEC gene expression (logarithm of transcripts per million) in the nasal swab samples for all samples and for each of the five clades.