From Alpha to Zeta: Identifying Variants and Subtypes of SARS-CoV-2 Via Clustering.

Andrew Melnyk, Fatemeh Mohebbi, Sergey Knyazev, Bikram Sahoo, Roya Hosseini, Pavel Skums, Alex Zelikovsky, Murray Patterson.

Abstract

The availability of millions of SARS-CoV-2 (Severe Acute Respiratory Syndrome-Coronavirus-2) sequences in public databases such as GISAID (Global Initiative on Sharing All Influenza Data) and EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute, in the United Kingdom) allows the evolution, genomic diversity, and dynamics of a virus to be studied in unprecedented detail. Here, we identify novel variants and subtypes of SARS-CoV-2 by clustering sequences, adapting methods originally designed for haplotyping intrahost viral populations. We assess our results using clustering entropy, which is used here for the first time in this context. Our clustering approach reaches lower entropies than other methods, and we are able to reduce them even further through gap filling and Monte Carlo-based entropy minimization. Moreover, our method clearly identifies the well-known Alpha variant in the U.K. and GISAID data sets, and is also able to detect the much less represented (<1% of the sequences) Beta (South Africa), Epsilon (California), and Gamma and Zeta (Brazil) variants in the GISAID data set. Finally, we show that each variant identified has high selective fitness, based on the growth rate of its cluster over time. This demonstrates that our clustering approach is a viable alternative for detecting even rare subtypes in very large data sets.
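
The abstract does not spell out how clustering entropy is computed, so the following is only a minimal sketch under one common assumption: the entropy of a cluster is the sum of per-position Shannon entropies over its aligned sequences, and the entropy of a clustering is the size-weighted average over clusters. The function names (`column_entropy`, `cluster_entropy`, `clustering_entropy`) and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def cluster_entropy(sequences):
    """Sum of per-position entropies for equal-length aligned sequences.

    Assumed definition for illustration; the paper may define it differently.
    """
    return sum(column_entropy(col) for col in zip(*sequences))


def clustering_entropy(clusters):
    """Size-weighted average of the cluster entropies (assumed weighting)."""
    total = sum(len(cluster) for cluster in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Toy aligned sequences (hypothetical data, not from GISAID or EMBL-EBI).
split = [["ACGT", "ACGT"], ["ACGA", "ACGT"]]
merged = [["ACGT", "ACGT", "ACGA", "ACGT"]]

split_entropy = clustering_entropy(split)
merged_entropy = clustering_entropy(merged)
```

Under this assumed definition, a clustering that groups similar sequences together scores lower than one that mixes them: in the toy data, `split_entropy` is smaller than `merged_entropy`. That is the sense in which lower entropy indicates a tighter clustering.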

Keywords:  clustering; entropy; fitness; genomic surveillance; viral subtypes; viral variants

Year:  2021        PMID: 34698508      PMCID: PMC8819513          DOI: 10.1089/cmb.2021.0302

Source DB:  PubMed          Journal:  J Comput Biol        ISSN: 1066-5277            Impact factor:   1.479


