Shuhui Song, Cuiping Li, Lu Kang, Dongmei Tian, Nazish Badar, Wentai Ma, Shilei Zhao, Xuan Jiang, Chun Wang, Yongqiao Sun, Wenjie Li, Meng Lei, Shuangli Li, Qiuhui Qi, Aamer Ikram, Muhammad Salman, Massab Umair, Huma Shireen, Fatima Batool, Bing Zhang, Hua Chen, Yun-Gui Yang, Amir Ali Abbasi, Mingkun Li, Yongbiao Xue, Yiming Bao.
Abstract
COVID-19 has swept the globe, and Pakistan is no exception. To investigate the initial introductions and transmission of SARS-CoV-2 in Pakistan, we performed the largest genomic epidemiology study of COVID-19 in Pakistan and generated 150 complete SARS-CoV-2 genome sequences from samples collected from March 16 to June 1, 2020. We identified a total of 347 mutated positions, 31 of which were over-represented in Pakistan. We also found over 1000 intra-host single-nucleotide variants (iSNVs). Several of them occurred concurrently, indicating possible interactions among them or coevolution. Some of the high-frequency iSNVs in Pakistan were not observed in the global population, suggesting strong purifying selection. The genomic epidemiology revealed five distinctive spreading clusters. The largest cluster consisted of 74 viruses derived from different geographic locations in Pakistan and formed a deep hierarchical structure, indicating an extensive and persistent nationwide transmission of the virus that was probably attributable to a signature mutation (G8371T in ORF1ab) of this cluster. Furthermore, 28 putative international introductions were identified, several of which are consistent with the epidemiological investigations. In all, this study has inferred the possible pathways of introduction and transmission of SARS-CoV-2 in Pakistan, which could aid ongoing and future viral surveillance and COVID-19 control.
Keywords: Haplotype network; Molecular evolution; Pakistan; SARS-CoV-2; Virus
Year: 2021 PMID: 34695600 PMCID: PMC8546014 DOI: 10.1016/j.gpb.2021.08.007
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 6.409
Figure 1. Epidemic in Pakistan and demographic details of the confirmed COVID-19 cases sampled for sequencing in this study
A. Number of confirmed COVID-19 cases for Pakistan districts as of June 2, 2020. B. Regional distribution of 150 confirmed COVID-19 cases in Pakistan sampled for sequencing in this study. C. Number of samples on each sampling date from the districts indicated for the confirmed cases examined in this study. D. Sample distribution according to gender and contact history of cases in this population. Gender and contact information is missing for two different cases. E. Sample distribution according to age of the confirmed cases examined in this study. SD, Sindh; PB, Punjab; KPK, Khyber Pakhtunkhwa; BA, Balochistan; GB, Gilgit Baltistan; IS, Islamabad; AJK, Azad Jammu Kashmir.
Figure 2. Heatmap of MAF for variants with PMF > 0.05 in each sample
Accession ID of COVID-19 sampled cases is represented by a number prefixed with E (E is short for experiment). The gender, age, sampling date, and district information for each sampled case is shown with different color schemes. The Pangolin lineage and cluster information for each sample are also integrated. Variants that have significantly different (P < 0.05, Fisher’s exact test) PMF in Pakistani sequences compared to publicly-released sequences (as of October 9, 2020) are marked with asterisks. MAF, mutant allele frequency; PMF, population mutation frequency.
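The asterisk rule in this caption compares how often a variant appears in Pakistani sequences versus publicly released sequences using Fisher's exact test. The following is a minimal sketch of a two-sided test on a 2x2 count table, written with the Python standard library only. The function name and the example counts are illustrative assumptions, not values or code from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows: group 1 / group 2 (e.g. Pakistani / public sequences).
    Columns: carrying the variant / not carrying it.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x variant carriers in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: 70 of 150 local sequences vs 20 of 1000 public sequences
p_value = fisher_exact_two_sided(70, 80, 20, 980)
significant = p_value < 0.05  # threshold quoted in the caption
```

For large public datasets an established statistics library would be a more practical choice than exact integer binomials.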
Figure 3. Profile of iSNVs
A. Distribution of the iSNV count per sample. One sample with an iSNV count of 109 is not shown in the bar chart. B. MAF distribution and mutation types of all iSNVs. Bars in orange and purple represent mutations that are observed and not observed in the polymorphism data (as of September 11, 2020), respectively. C. The number and genomic distribution of all iSNVs. In the top plot, the number of iSNVs at each position is plotted as a bar graph against the left Y axis, with the MAF of each iSNV color-coded. The dashed lines represent an iSNV count of 10. The proportion of iSNVs for positions with iSNV count ≥ 10 is plotted against the right Y axis (major allele frequency ≥ 0.7, sequencing depth ≥ 100). Open circles and open triangles indicate wild-type and mutant nucleotides, respectively. In the middle plot, the grey histogram shows the substitution rate estimated from the polymorphism data. Positions with iSNV count ≥ 10 are indicated in red. In the bottom plot, the diagram shows the genomic structure of SARS-CoV-2. Coding regions are color-coded with the respective gene names indicated below, and non-coding regions at both ends are left blank. iSNV, intra-host single-nucleotide variant.
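To make the quantities in this caption concrete, the sketch below computes sequencing depth, major allele frequency, and mutant allele frequency (MAF) at one genome position from per-base read counts. It is an illustration only: the function name, the input layout, and the use of a depth cutoff of 100 as a hard filter are assumptions, and the study's actual iSNV-calling pipeline is not described in this text.

```python
def allele_summary(base_counts, ref_base, min_depth=100):
    """Summarize one genome position from per-base read counts.

    base_counts: dict mapping base -> number of reads supporting it
    ref_base:    reference (wild-type) base at this position
    min_depth:   assumed minimum depth, borrowed from the caption's threshold
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return None  # too few reads to trust a frequency estimate
    major_base = max(base_counts, key=base_counts.get)
    # Most abundant non-reference base, if any reads support one
    alt = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    mutant_base = max(alt, key=alt.get) if alt else None
    return {
        "depth": depth,
        "major_base": major_base,
        "major_af": base_counts[major_base] / depth,
        "mutant_base": mutant_base,
        "maf": alt[mutant_base] / depth if mutant_base else 0.0,
    }


# Hypothetical position: 850 reads support the reference G, 150 support T
example = allele_summary({"A": 0, "C": 0, "G": 850, "T": 150}, ref_base="G")
```

In this hypothetical case the position has depth 1000, a major allele frequency of 0.85, and a MAF of 0.15 for the T allele.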
Figure 4. Spread and transmission of SARS-CoV-2 sequences in Pakistan
A. Haplotype network of all SARS-CoV-2 sequences in Pakistan (Pakistan; red node) and closely related publicly released sequences from other countries (Others; blue node) as of October 9, 2020. Each node represents a distinctive haplotype, and the length of the edge between any two nodes is proportional to sequence distance. Pakistani sequence clusters are labeled C1–C5. The node of the reference sequence is marked by a solid purple triangle, and nodes of putative introductions are labeled with open yellow circles. The number of samples for public sequences is marked in each node when available. B. The haplotype network of C1. The color of the nodes, from blue to red, represents the sampling date for SARS-CoV-2 sequences in Pakistan as shown for (A), from March 4, 2020 to June 2, 2020. The sample accession ID is marked in each node. Nodes marked H0, H1, and H2 represent the parent, the first, and the second generations of Pakistani sequences, respectively. The H3 node indicates the super-spreader sequences. The number of samples from different countries for H0 and H1, as well as sample details for H3, are listed in the table on the right.
Major clusters and signature variants of SARS-CoV-2 genome sequences in Pakistan
| Lineage | B.1 | B.1; B.1.36 | B.1.1.1 | B.1.1.162 | B; B.6 |
| --- | --- | --- | --- | --- | --- |
| No. of sequences (this study/publicly-released sequences as of October 9, 2020) | 74 (70/4) | 24 (24/0) | 22 (18/4) | 10 (10/0) | 9 (9/0) |
| Common variants | C2416T, C3037T, G8371T, C14408T, A23403G, G25563T | C3037T, C14408T, C18877T, A23403G, G25563T, C26735T | C3037T, C4002T, G10097A, C13536T, C14408T, A23403G, C23731T, G28881A, G28882A, G28883C | C313T, C3037T, C14408T, A23403G, G28881A, G28882A, G28883C | C6312A, G11083T, C13730T, C23929T, C28311T |
| Putative or likely importing countries | Brazil, Denmark, France, Hungary, India, Israel, Japan, Norway, Portugal, Russia, Singapore, Republic of Korea, Switzerland, United Arab Emirates, UK, USA | USA, Australia, Canada, China, Gambia, India, Malaysia, New Zealand, Oman, Senegal, Sierra Leone, Singapore | | | |
| No. of putative introductions | 4 | 3 | 4 | 1 | 1 |
| Signature variant | G8371T | C26735T | C4002T; C13536T | C313T | G11083T; C23929T |
| Gene containing the signature variant | | | | | |
| Amino acid change for the signature variant | Q2702H | − | T1246I; − | − | L3606F; − |
Note: Putative and likely importing countries for each cluster are presented in bold and regular fonts, respectively; countries visited by two patients are presented in italic. Signature variant refers to the mutation present in each cluster but absent from its parental nodes. “−” indicates synonymous mutation that does not cause amino acid change.
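The definition of a signature variant in this note (present in the cluster, absent from its parental nodes) amounts to a set difference. The sketch below applies it to the common variants listed in the table for the B.1 cluster. The parental variant set is a hypothetical stand-in chosen for illustration, since the parental nodes' variants are not given in this text, and the function name is likewise an assumption.

```python
def signature_variants(cluster_variants, parent_variants):
    """Return variants found in the cluster but not in its parental node(s),
    ordered by genome position (variant format: ref base, position, alt base)."""
    unique = set(cluster_variants) - set(parent_variants)
    return sorted(unique, key=lambda v: int(v[1:-1]))


# Common variants of the B.1 cluster, as listed in the table
b1_cluster = ["C2416T", "C3037T", "G8371T", "C14408T", "A23403G", "G25563T"]

# Hypothetical parental node: assumed to carry every variant except G8371T
assumed_parent = ["C2416T", "C3037T", "C14408T", "A23403G", "G25563T"]

b1_signature = signature_variants(b1_cluster, assumed_parent)
```

Under that assumed parent, the only variant left is G8371T, which matches the signature variant the table reports for this cluster.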
Figure 5. Two representative introduction-related clusters and schematic diagram of inferred international importing routes
A. Haplotype network of C2. B. Haplotype network of C3. C. Schematic diagram of inferred global introductions. Thicker lines represent putative importing countries and thin lines represent other likely importing countries.