
Mapping the genomic landscape & diversity of COVID-19 based on >3950 clinical isolates of SARS-CoV-2: Likely origin & transmission dynamics of isolates sequenced in India.

Hina Singh1, Jasdeep Singh1, Mohd Khubaib1, Salma Jamal1, Javaid Ahmed Sheikh2, Sunil Kohli3, Seyed Ehtesham Hasnain4, Syed Asad Rahman5.   

Year:  2020        PMID: 32474554      PMCID: PMC7530457          DOI: 10.4103/ijmr.IJMR_1253_20

Source DB:  PubMed          Journal:  Indian J Med Res        ISSN: 0971-5916            Impact factor:   2.375


Sir, The COVID-19 pandemic has stalled the world and thrown global health systems into unprecedented chaos. More than 200 countries have been affected, with 2.54 million cases and >0.17 million deaths in a short period of time (as of April 23, 2020) and a mere 0.7 million recoveries [1]. The movement of the COVID-19 hotspot from China to Europe, and now to the USA, has been partly due to staggered restrictions on global travel and partly due to potent transmission through asymptomatic carriers [2]. India, with 21,393 cases and 681 deaths (as of April 23, 2020) [1], had the lowest figures for any country of comparable population (0.5 deaths per million population). International travellers or their close contacts formed the majority of the initially reported cases. The delayed onset of COVID-19 in India gave it an edge, allowing it to impose severe restrictions to contain local spread [3-5]. In our in-depth analyses of 1500+ genomes, variability among clinical isolates was shown along the timeline, leading to distinct clustering of SARS-CoV-2 across the globe (unpublished observation). It was predicted, based on the aggregation propensity of the spike protein in the Wuhan and other isolates of SARS-CoV-2, that this virus would exhibit very high transmissibility and confer survival fitness [6,7]. Genetic diversity of the virus increases with disease progression and can be used to model the evolution and propagation of the disease [6]. Recently, phylogenetic network analysis of 160 SARS-CoV-2 genome samples showed a parallel evolution of the virus and its evolutionary selection in human hosts [8]. Similar whole-genome analyses of the Indian isolates and their comparison with global isolates can provide a better understanding of the dominant clades within the population and unveil targets for developing specific interventions.
In the present study, machine learning-based t-SNE analysis was used to segregate global clinical isolates into clusters while accommodating outliers [9,10]. Whole-genome analysis of 3968 global isolates obtained from GISAID (Global Initiative on Sharing All Influenza Data) [11], including 25 SARS-CoV-2 genomes sequenced in India [next-generation sequencing (NGS) data submitted by the ICMR-National Institute of Virology, Pune, India] (Fig. 1, Supplementary Fig. 1 and Supplementary Table I), was an attempt to dissect global genome diversity and to critically evaluate the placement of Indian isolates in order to understand the COVID-19 pandemic in India.
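
The hierarchical clustering step that is applied to the t-SNE output can be illustrated with a minimal sketch. It assumes the 2-D embedding coordinates are already available. The coordinates below are made up, and average linkage is an assumption, since the letter does not state which linkage criterion was used.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    # Start with every point in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected

    # Convert cluster membership into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 2-D embedding coordinates, one row per isolate.
embedding = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0), (10.1, 0.3)]
labels = average_linkage(embedding, n_clusters=3)
```

With thousands of isolates, a library implementation would be preferable to this quadratic toy version; the sketch is only meant to show how cluster IDs such as those in Supplementary Table I can be derived from embedding coordinates.
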
Fig. 1

Whole-genome-based t-SNE clustering of 3968 clinical isolates. (A) Comparative genome-based clustering of Indian isolates (red) with Chinese isolates (blue). (B) Comparative genome-based clustering of Indian isolates (red) and Chinese isolates (blue) with the rest of the world (green). (C) Diversity in clinical isolates, showing three distinct clusters obtained by hierarchical clustering of the t-SNE clusters.

Supplementary Table I

Details of Indian samples along with their origin and placement in hierarchical clusters. Genomic sequences were retrieved from GISAID.

Accession ID | Contact history | Gender | Age | Virus name | Location | Collection date | Hierarchical cluster ID
EPI_ISL_413523 | Travel history to China | Male | 23 | hCoV-19/India/1-31/2020 | Asia/India/Kerala | 2020-1-31 | 2
EPI_ISL_420543 | Italian tourist | Female | 73 | hCoV-19/India/763/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420544 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020763/2020 | Asia/India | 2020 | 1
EPI_ISL_420545 | Italian tourist | Female | 77 | hCoV-19/India/770/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420546 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020770/2020 | Asia/India | 2020 | 1
EPI_ISL_420547 | Italian tourist | Female | 70 | hCoV-19/India/772/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420548 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020772/2020 | Asia/India | 2020 | 2
EPI_ISL_420549 | Italian tourist | Female | 65 | hCoV-19/India/773/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420550 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020773/2020 | Asia/India | 2020 | 2
EPI_ISL_420551 | Indian contact of Italian tourist | Male | 59 | hCoV-19/India/777/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420552 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020777/2020 | Asia/India | 2020 | 2
EPI_ISL_420553 | Italian tourist | Male | 66 | hCoV-19/India/781/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420554 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020781/2020 | Asia/India | 2020 | 2
EPI_ISL_420555 | Indian contact of Indian patient having travel history to Italy | Female | 37 | hCoV-19/India/c32/2020 | Asia/India | 2020-3-3 | 2
EPI_ISL_420556 | Vero CCL81 isolate P1 | NA | NA | hCoV-19/India/2020c32/2020 | Asia/India | 2020 | 1
EPI_ISL_421662 | Indian citizen sampled at Iran | Male | 68 | hCoV-19/India/1073/2020 | Asia/India | 2020-3-10 | 2
EPI_ISL_421663 | Indian citizen sampled at Iran | Male | 45 | hCoV-19/India/1093/2020 | Asia/India | 2020-3-10 | 2
EPI_ISL_421664 | Indian citizen sampled at Iran | Male | 72 | hCoV-19/India/1100/2020 | Asia/India | 2020-3-10 | 2
EPI_ISL_421665 | Indian citizen sampled at Iran | Male | 43 | hCoV-19/India/1104/2020 | Asia/India | 2020-3-10 | 1
EPI_ISL_421666 | Indian citizen sampled at Iran | Female | 54 | hCoV-19/India/1111/2020 | Asia/India | 2020-3-10 | 2
EPI_ISL_421667 | Indian citizen sampled at Iran | Male | 66 | hCoV-19/India/1115/2020 | Asia/India | 2020-3-10 | 2
EPI_ISL_421669 | Indian citizen sampled at Iran | Female | 70 | hCoV-19/India/1616/2020 | Asia/India | 2020-3-12 | 2
EPI_ISL_421670 | Indian citizen sampled at Iran | Female | 50 | hCoV-19/India/1617/2020 | Asia/India | 2020-3-12 | 2
EPI_ISL_421671 | Indian citizen sampled at Iran | Female | 55 | hCoV-19/India/1621/2020 | Asia/India | 2020-3-12 | 2
EPI_ISL_421672 | Indian citizen sampled at Iran | Male | 59 | hCoV-19/India/1644/2020 | Asia/India | 2020-3-12 | 2
The initial cases reported from India had a travel history to China, which explains their position in a Chinese cluster [5] (Fig. 2). The travel ban from China to India in early February 2020 prevented a large-scale spill-over directly from China to the Indian subcontinent. However, isolates transmitted from other South-East Asian countries might fall in the same cluster. The overlap of Indian samples mainly with European samples (Supplementary Fig. 1, Panel III) reiterates that the delayed travel restrictions from European hotspot regions affected not just India but many other countries as well.
Fig. 2

Position of various Indian isolates relative to those from other nations. (A-E) Clustering of SARS-CoV-2 genome sequences from India (red) with those from other nations around the globe. Indian samples clustered with samples from different nations (China, Kuwait, Canada, USA and Spain) in whole-genome-based clustering. Figures were generated using FigTree v1.4.4.

Hierarchical clustering further yielded notable outcomes on the inter-continental transmission of COVID-19. The segregation of SARS-CoV-2 genomes into three clades indicates the emergence of evolutionary diversity (Fig. 1C). The heterogeneity of these clusters, grouped along with their Chinese counterparts, validates a global spill-over event originating from Wuhan [5,12]. Hierarchical cluster 2 in Supplementary Fig. 1, Panel II (coloured by continent) indicates the introduction of SARS-CoV-2 into India from European, other Asian and North American nations. Detailed comparative analysis of the Indian isolates with respect to other countries showed a close relationship with samples from China, USA, Canada, Spain and Kuwait, suggestive of exposure to COVID-19 through travel from these nations (Fig. 2). However, the limited number of genome sequences from India makes it difficult to differentiate between global transmission and transmission within the country.

The conservation of an amino acid in a protein sequence denotes its functional importance [13,14], as it undergoes fewer replacements or is more likely to be substituted by amino acids with similar biochemical properties. Amino acid conservation is inversely proportional to the evolutionary rate. This makes it a valuable gauge of evolutionary divergence and of analogous genomic regions. Sequence similarity between the open reading frames (ORFs) of the Indian isolates and the initial sample collected in Wuhan revealed conservation in five ORFs, corresponding to the envelope protein, membrane glycoprotein, ORF6, ORF7b and ORF10 proteins (Fig. 3A).
In contrast, a number of mutations were observed in ORF1a, ORF1b, spike protein (surface glycoprotein), ORF3a, ORF7a, ORF8 and nucleocapsid phosphoprotein (Supplementary Fig. 2 and Supplementary Table II). Mean similarity calculated for these ORFs revealed that ORF1a in the Indian isolates was less conserved (more mutated) than in global isolates (Fig. 3A and Supplementary Table III). In all other ORFs, relatively higher conservation with respect to the Wuhan strain was observed among Indian isolates. Compared with global isolates, Indian isolates have higher entropy for changes in ORF1a and ORF1b (Supplementary Fig. 3). Further, qualitative analysis of mutations in the non-conserved ORFs showed that each type of amino acid had undergone mutation in the Indian isolates (Fig. 3B). These mutations could be a major contributing factor to the separation of Indian isolates into three distinct clusters. A higher sampling rate driven by NGS of Indian isolates would help in better understanding the actual variability in SARS-CoV-2 and assist both in identifying better diagnostic markers and in developing specific interventions in terms of vaccine candidates and drug targets.
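
The mean similarity comparison can be illustrated with a small sketch. It assumes that similarity is measured as percent identity over pre-aligned sequences of equal length and that "GMean" in Supplementary Table III stands for the geometric mean; neither is stated explicitly in the letter. The sequences are toy fragments, not SARS-CoV-2 data.

```python
from statistics import geometric_mean, mean


def percent_identity(reference, query):
    """Percentage of aligned positions at which query matches reference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(r == q for r, q in zip(reference, query))
    return 100.0 * matches / len(reference)


def summarise_similarity(reference, isolates):
    """Arithmetic and geometric mean identity of isolates against a reference."""
    scores = [percent_identity(reference, seq) for seq in isolates]
    return mean(scores), geometric_mean(scores)


# Toy 10-residue protein fragments (made up for illustration).
reference = "MKTAYIAKQR"
isolates = ["MKTAYIAKQR", "MKTAYIAKQL", "MKTCYIAKQR"]
arith, geo = summarise_similarity(reference, isolates)
```

Running the same summary once on the Indian isolates and once on the global isolates, ORF by ORF, would give a table with the same layout as Supplementary Table III.
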
Fig. 3

Sequence similarity and mutation analysis of open reading frames. (A) Comparison of mean sequence similarity for open reading frames between Indian and global isolates with Wuhan strain. (B) Qualitative analysis on type of mutations occurring in non-conserved open reading frames (ORFs) of Indian isolates compared to Wuhan strain.

Supplementary Table II

Specific high-frequency (≥10%) mutations in individual ORFs in Indian isolates compared with the reference strain (Wuhan_IPBCAMS-WH-01_2019_EPI_ISL_402123). Sequences with less than 25% gaps were selected for all the studies. Genomic sequences were retrieved from GISAID.

ORF | Position | Amino acid in reference | Mutated amino acid | Mutation
ORF1a | 207 | Arginine (R) | Cysteine (C) | R->C
ORF1a | 378 | Valine (V) | Isoleucine (I) | V->I
ORF1a | 1515 | Serine (S) | Phenylalanine (F) | S->F
ORF1a | 2796 | Methionine (M) | Isoleucine (I) | M->I
ORF1a | 3606 | Leucine (L) | Phenylalanine (F) | L->F
ORF1b | 314 | Proline (P) | Leucine (L) | P->L
S | 614 | Aspartic acid (D) | Glycine (G) | D->G
ORF7a | 74 | Valine (V) | Phenylalanine (F) | V->F
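
A table of this kind can be derived from aligned protein sequences with a simple frequency filter. The sketch below is an illustration under assumptions: the sequences are toy fragments, the function name is hypothetical, gaps ("-") and unknown residues ("X") are ignored, and the frequency is taken over all isolates supplied.

```python
from collections import Counter


def high_frequency_mutations(reference, isolates, threshold=0.10):
    """Return (position, ref_aa, alt_aa, frequency) for substitutions whose
    frequency among the isolates is at least `threshold` (1-based positions)."""
    hits = []
    for pos, ref_aa in enumerate(reference, start=1):
        # Count residues observed in this column, skipping gaps and unknowns.
        column = Counter(
            seq[pos - 1] for seq in isolates if seq[pos - 1] not in "-X"
        )
        for alt_aa, count in column.items():
            freq = count / len(isolates)
            if alt_aa != ref_aa and freq >= threshold:
                hits.append((pos, ref_aa, alt_aa, freq))
    return hits


# Toy aligned fragments: position 3 carries D->G in 2 of 4 isolates.
reference = "AADKL"
isolates = ["AADKL", "AAGKL", "AAGKL", "AADKL"]
mutations = high_frequency_mutations(reference, isolates)
```

Each returned tuple corresponds to one row of the table: position, reference residue, mutated residue and the fraction of isolates carrying the change.
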
Supplementary Table III

Comparison of mean sequence similarity for ORFs between Indian and global isolates with Wuhan strain

ORF | Name | Wuhan vs Indian: mean similarity | Wuhan vs Indian: GMean similarity | Wuhan vs Global: mean similarity | Wuhan vs Global: GMean similarity
ORF5 | ORF_1a | 99.90 | 99.90 | 99.93 | 99.93
ORF1 | ORF_1b | 99.97 | 99.97 | 99.97 | 99.97
ORF2 | Surface Glycoprotein | 99.94 | 99.94 | 99.94 | 99.94
ORF3 | ORF_3a | 99.97 | 99.97 | 99.94 | 99.94
ORF4 | Envelope protein | 100.00 | 100.00 | 100.00 | 100.00
ORF6 | Membrane Glycoprotein | 100.00 | 100.00 | 100.00 | 100.00
ORF7 | ORF_6 | 100.00 | 100.00 | 100.00 | 100.00
ORF8 | ORF_7a | 99.87 | 99.87 | 99.78 | 99.78
ORF9 | ORF_7b | 100.00 | 100.00 | 100.00 | 100.00
ORF10 | ORF_8 | 99.90 | 99.90 | 99.81 | 99.81
ORF11 | Nucleocapsid phosphoprotein | 99.97 | 99.97 | 99.95 | 99.95
ORF12 | ORF_10 | 100.00 | 100.00 | 100.00 | 100.00
Evolutionary divergence, corroborated by epidemiological data, is a valuable tool for implementing appropriate measures against this pandemic. The population density of India and the presence of functionally distinct isolates in the Indian population raise concerns and warrant an urgent need for a higher sampling rate for better assessment of the evolution of SARS-CoV-2 in India. The situation is further confounded by the fact that many of the Indian isolates submitted to databanks include those from Indians living in Iran and Italian tourists visiting India, as well as samples cultured in vitro.

In conclusion, a whole-genome diversity analysis of 3968 global clinical isolates of SARS-CoV-2, including 25 isolates sequenced in India, was carried out. The variations in different open reading frames (ORFs) of SARS-CoV-2, which drive the formation of distinct Indian clusters and functional heterogeneity, were highlighted. Five ORFs corresponding to envelope protein, membrane glycoprotein, ORF6, ORF7b and ORF10 were found to be highly conserved, while a number of mutations were observed in ORF1a, ORF1b, spike protein, ORF3a, ORF7a, ORF8 and nucleocapsid phosphoprotein.
Generating diverse genomic datasets will provide insight into the propagation dynamics of COVID-19, leading to a better understanding of pathogenesis and evolution of SARS-CoV-2, which will eventually lead to better intervention methods.

SUPPLEMENTARY INFORMATION

Supplementary Fig. 1

Position of Indian genome sequences in sub-clusters a, b and c with respect to other global genome sequences. Panel I shows the presence of Indian isolates in three distinct sub-clusters with respect to Chinese isolates and the remaining global isolates. Panel II highlights the placement of Indian clusters in two of the three hierarchical clusters (a, b and c) obtained from t-SNE whole-genome clustering of 3968 sequences. Panel III displays the prevalence of samples from various continents. Continent codes: Africa (green), Australia (orange), North America (blue), Asia (light purple), Europe (yellow) and South America (pink).

Supplementary Fig. 2

Comparison of all Indian SARS-CoV-2 genomes with the Wuhan strain (first collected sample) shows variation in ORF1a and ORF1b proteins, surface glycoprotein, ORF3a protein, ORF7a protein, ORF8 and nucleocapsid protein.

Supplementary Fig. 3

Differential entropy plots for Indian vs global isolates. Mutational entropy of each amino acid position in all ORFs was calculated for Indian and global isolates with respect to the reference strain (Wuhan_IPBCAMS-WH-01_2019_EPI_ISL_402123). Indian samples were not included among the global isolates, in order to highlight differential entropy. Genomic sequences were retrieved from GISAID.
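
The per-position entropy comparison described for the differential entropy plots can be sketched as follows. The letter does not give the exact formula, so this assumes Shannon entropy (in bits) of the residue frequencies in each alignment column, with the difference taken as first set minus second set. The sequences are toy fragments, not GISAID data.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


def positional_entropy(sequences):
    """Entropy for every column of equally long, aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy aligned fragments standing in for the two isolate sets.
indian = ["AADKL", "AAGKL", "AAGKL", "AADKL"]
global_ = ["AADKL", "AADKL", "AADKL", "AAGKL"]

# Positive values mark positions that vary more in the first set.
difference = [
    i - g for i, g in zip(positional_entropy(indian), positional_entropy(global_))
]
```

A fully conserved column has zero entropy, so only positions that vary in at least one of the two sets contribute to the differential plot.
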
  10 in total

1.  Protein-protein interfaces: analysis of amino acid conservation in homodimers.

Authors:  W S Valdar; J M Thornton
Journal:  Proteins       Date:  2001-01-01

2.  Severe acute respiratory illness surveillance for coronavirus disease 2019, India, 2020.

Authors:  Nivedita Gupta; Ira Praharaj; Tarun Bhatnagar; Jeromie Wesley Vivian Thangaraj; Sidhartha Giri; Himanshu Chauhan; Sanket Kulkarni; Manoj Murhekar; Sujeet Singh; Raman R Gangakhedkar; Balram Bhargava
Journal:  Indian J Med Res       Date:  2020 Feb & Mar       Impact factor: 2.375

3.  Prudent public health intervention strategies to control the coronavirus disease 2019 transmission in India: A mathematical model-based approach.

Authors:  Sandip Mandal; Tarun Bhatnagar; Nimalan Arinaminpathy; Anup Agarwal; Amartya Chowdhury; Manoj Murhekar; Raman R Gangakhedkar; Swarup Sarkar
Journal:  Indian J Med Res       Date:  2020 Feb & Mar       Impact factor: 2.375

4.  Data, disease and diplomacy: GISAID's innovative contribution to global health.

Authors:  Stefan Elbe; Gemma Buckland-Merrett
Journal:  Glob Chall       Date:  2017-01-10

5.  An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity.

Authors:  T T Wu; E A Kabat
Journal:  J Exp Med       Date:  1970-08-01       Impact factor: 14.307

6.  Full-genome sequences of the first two SARS-CoV-2 viruses from India.

Authors:  Pragya D Yadav; Varsha A Potdar; Manohar Lal Choudhary; Dimpal A Nyayanit; Megha Agrawal; Santosh M Jadhav; Triparna D Majumdar; Anita Shete-Aich; Atanu Basu; Priya Abraham; Sarah S Cherian
Journal:  Indian J Med Res       Date:  2020 Feb & Mar       Impact factor: 2.375

7.  Emerging genetic diversity among clinical isolates of SARS-CoV-2: Lessons for today.

Authors:  Javaid Ahmad Sheikh; Jasdeep Singh; Hina Singh; Salma Jamal; Mohd Khubaib; Sunil Kohli; Ulrich Dobrindt; Syed Asad Rahman; Nasreen Zafar Ehtesham; Seyed Ehtesham Hasnain
Journal:  Infect Genet Evol       Date:  2020-04-24       Impact factor: 3.342

8.  Asymptomatic and Presymptomatic SARS-CoV-2 Infections in Residents of a Long-Term Care Skilled Nursing Facility - King County, Washington, March 2020.

Authors:  Anne Kimball; Kelly M Hatfield; Melissa Arons; Allison James; Joanne Taylor; Kevin Spicer; Ana C Bardossy; Lisa P Oakley; Sukarma Tanwar; Zeshan Chisty; Jeneita M Bell; Mark Methner; Josh Harney; Jesica R Jacobs; Christina M Carlson; Heather P McLaughlin; Nimalie Stone; Shauna Clark; Claire Brostrom-Smith; Libby C Page; Meagan Kay; James Lewis; Denny Russell; Brian Hiatt; Jessica Gant; Jeffrey S Duchin; Thomas A Clark; Margaret A Honein; Sujan C Reddy; John A Jernigan
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2020-04-03       Impact factor: 17.586

9.  Phylogenetic network analysis of SARS-CoV-2 genomes.

Authors:  Peter Forster; Lucy Forster; Colin Renfrew; Michael Forster
Journal:  Proc Natl Acad Sci U S A       Date:  2020-04-08       Impact factor: 11.205

10.  The proximal origin of SARS-CoV-2.

Authors:  Kristian G Andersen; Andrew Rambaut; W Ian Lipkin; Edward C Holmes; Robert F Garry
Journal:  Nat Med       Date:  2020-04       Impact factor: 87.241
