Semi-Supervised Pipeline for Autonomous Annotation of SARS-CoV-2 Genomes.

Kristen L Beck, Edward Seabolt, Akshay Agarwal, Gowri Nayar, Simone Bianco, Harsha Krishnareddy, Timothy A Ngo, Mark Kunitomi, Vandana Mukherjee, James H Kaufman.

Abstract

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods, which produce missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes. It differs from existing methods in that it does not rely on a single reference genome and it overcomes atypical genomic traits that challenge traditional bioinformatic methods. Using our method, we analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world and identified the comprehensive set of known proteins, including Replicase polyprotein 1ab (with its transcriptional slippage site), with 98.5% set membership accuracy and 99.1% accuracy in length prediction compared to proteome references. Compared to other published tools, such as Prokka (base) and VAPiD, our method yielded a 6.4- and 1.8-fold increase in protein annotations, respectively. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described the key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy and demonstrated high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.
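
As an illustration of what checking a predicted protein sequence for a named substitution such as D614G or N501Y involves, the following is a minimal sketch and not the authors' pipeline. The function name `has_substitution` is hypothetical, the sequences in the usage note are toy strings rather than real spike sequences, and the check assumes the predicted and reference proteins share residue coordinates (no insertions or deletions upstream of the site).

```python
import re


def has_substitution(reference: str, predicted: str, mutation: str) -> bool:
    """Return True if `predicted` carries `mutation` relative to `reference`.

    `mutation` uses the usual label form <ref residue><1-based position><new residue>,
    for example "D614G". Assumption: both sequences use the same coordinates.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation}")
    ref_residue, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert 1-based label position to 0-based string index
    if index < 0 or index >= len(reference) or index >= len(predicted):
        return False  # site lies outside one of the sequences
    if reference[index] != ref_residue:
        raise ValueError(f"reference does not have {ref_residue} at position {position}")
    return predicted[index] == new_residue
```

With a toy reference `"MKDLN"`, `has_substitution("MKDLN", "MKGLN", "D3G")` reports the substitution as present, while an unchanged sequence does not. Real genomes with indels would need an alignment step before a positional check like this one.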

Keywords:  COVID-19; SARS-CoV-2; bioinformatics; computational biology; gene prediction; genome annotation; genomics; protein domain; protein prediction

Year: 2021; PMID: 34960694; PMCID: PMC8706859; DOI: 10.3390/v13122426

Source DB: PubMed; Journal: Viruses; ISSN: 1999-4915; Impact factor: 5.048


