
Unsupervised cluster analysis of SARS-CoV-2 genomes reflects its geographic progression and identifies distinct genetic subgroups of SARS-CoV-2 virus.

Georg Hahn, Sanghun Lee, Scott T Weiss, Christoph Lange.

Abstract

Over 10,000 viral genome sequences of the SARS-CoV-2 virus have been made readily available during the ongoing coronavirus pandemic since the initial genome sequence of the virus was released on the open access Virological website (http://virological.org/) early on January 11. We utilize the published data on the single-stranded RNAs of 11,132 SARS-CoV-2 patients in the GISAID database, which contains fully or partially sequenced SARS-CoV-2 samples from laboratories around the world. Among the many important research questions currently being investigated, one pertains to the genetic characterization and classification of the virus. We analyze the nucleotide sequences and geographic information of a subset of 7,640 SARS-CoV-2 patients in the GISAID database whose records have no missing entries. Instead of modeling the mutation rate, applying phylogenetic tree approaches, and so forth, we utilize a model-free clustering approach that compares the viruses at a genome-wide level. We apply principal component analysis to a similarity matrix that compares all pairs of these SARS-CoV-2 nucleotide sequences at all loci simultaneously, using the Jaccard index. Our analysis of the SARS-CoV-2 genome data illustrates the geographic and chronological progression of the virus, starting from the first cases that were observed in China to the current wave of cases in Europe and North America. This is in line with a phylogenetic analysis, which we use to contrast our results. We also observe that, based on their sequence data, the SARS-CoV-2 viruses cluster in distinct genetic subgroups. It is the subject of ongoing research to examine whether the genetic subgroup could be related to disease outcome and what implications it may have for vaccine development.
© 2021 Wiley Periodicals LLC.
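
The clustering step described in the abstract (a Jaccard similarity matrix over all pairs of sequences, followed by principal component analysis) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that each genome has already been encoded as a binary row vector marking the loci at which it differs from a reference sequence, and it assumes that the similarity matrix is double-centered before its leading eigenvectors are extracted. The function names `jaccard_matrix` and `top_components` are hypothetical.

```python
import numpy as np

def jaccard_matrix(X):
    """Pairwise Jaccard similarity between the rows of a binary matrix X.

    Assumption: X[i, j] = 1 if genome i differs from the reference at locus j.
    """
    X = np.asarray(X, dtype=np.int64)
    inter = X @ X.T                          # size of the intersection per pair
    counts = X.sum(axis=1)                   # number of flagged loci per genome
    union = counts[:, None] + counts[None, :] - inter
    # Pairs with an empty union (no flagged loci at all) are given similarity 0.
    return np.divide(inter, union,
                     out=np.zeros(inter.shape, dtype=float),
                     where=union > 0)

def top_components(S, k=2):
    """Leading k eigenvectors and eigenvalues of the double-centered matrix S.

    Assumption: centering as in kernel PCA; the abstract does not specify it.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    vals, vecs = np.linalg.eigh(H @ S @ H)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order], vals[order]

# Toy data: two groups of two genomes with disjoint sets of flagged loci.
X = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
])
S = jaccard_matrix(X)
pcs, vals = top_components(S, k=1)
print(np.round(S, 3))
print(np.sign(pcs[:, 0]))  # the two groups receive opposite signs
```

In this toy example the first component separates the two groups by sign. The overall sign of an eigenvector is arbitrary, so only the contrast between the groups is meaningful.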

Keywords:  SARS-CoV-2; clustering; covid; jaccard

Year:  2021        PMID: 33415739      PMCID: PMC8005425          DOI: 10.1002/gepi.22373

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135


  3 in total

1.  Genome-wide analysis of 10664 SARS-CoV-2 genomes to identify virus strains in 73 countries based on single nucleotide polymorphism.

Authors:  Nimisha Ghosh; Indrajit Saha; Nikhil Sharma; Suman Nandi; Dariusz Plewczynski
Journal:  Virus Res       Date:  2021-03-26       Impact factor: 3.303

2.  COVID-19: Integrating genomic and epidemiological data to inform public health interventions and policy in Tasmania, Australia.

Authors:  Nicola Stephens; Michelle McPherson; Louise Cooley; Rob Vanhaeften; Mathilda Wilmot; Courtney Lane; Michelle Harlock; Kerryn Lodo; Natasha Castree; Torsten Seemann; Michelle Sait; Susan Ballard; Kristy Horan; Mark Veitch; Fay Johnston; Norelle Sherry; Ben Howden
Journal:  Western Pac Surveill Response J       Date:  2021-12-22

3.  Genome-wide association analysis of COVID-19 mortality risk in SARS-CoV-2 genomes identifies mutation in the SARS-CoV-2 spike protein that colocalizes with P.1 of the Brazilian strain.

Authors:  Georg Hahn; Chloe M Wu; Sanghun Lee; Sharon M Lutz; Surender Khurana; Lindsey R Baden; Sebastien Haneuse; Dandi Qiao; Julian Hecker; Dawn L DeMeo; Rudolph E Tanzi; Manish C Choudhary; Behzad Etemad; Abbas Mohammadi; Elmira Esmaeilzadeh; Michael H Cho; Jonathan Z Li; Adrienne G Randolph; Nan M Laird; Scott T Weiss; Edwin K Silverman; Katharina Ribbeck; Christoph Lange
Journal:  Genet Epidemiol       Date:  2021-06-22       Impact factor: 2.344

