
RASCL: Rapid Assessment Of SARS-CoV-2 Clades Through Molecular Sequence Analysis.

Alexander G Lucaci, Jordan D Zehr, Stephen D Shank, Dave Bouvier, Han Mei, Anton Nekrutenko, Darren P Martin, Sergei L Kosakovsky Pond.   

Abstract

An important component of efforts to manage the ongoing COVID-19 pandemic is the Rapid Assessment of how natural selection contributes to the emergence and proliferation of potentially dangerous SARS-CoV-2 lineages and CLades (RASCL). The RASCL pipeline enables continuous comparative phylogenetics-based selection analyses of rapidly growing clade-focused genome surveillance datasets, such as those produced following the initial detection of potentially dangerous variants. From such datasets RASCL automatically generates down-sampled codon alignments of individual genes/ORFs containing contextualizing background reference sequences, analyzes these with a battery of selection tests, and outputs results both as machine-readable JSON files and as interactive notebook-based visualizations. AVAILABILITY: RASCL is available from a dedicated repository at https://github.com/veg/RASCL and as a Galaxy workflow at https://usegalaxy.eu/u/hyphy/w/rascl. Existing clade/variant analysis results are available at https://observablehq.com/@aglucaci/rascl. CONTACT: Dr. Sergei L Kosakovsky Pond (spond@temple.edu). SUPPLEMENTARY INFORMATION: N/A.

Year:  2022        PMID: 35075458      PMCID: PMC8786235          DOI: 10.1101/2022.01.15.476448

Source DB:  PubMed          Journal:  bioRxiv


Rapid characterization and assessment of the clade-specific molecular features of individual persistent or rapidly expanding SARS-CoV-2 lineages has become an important component of efforts to monitor and manage the COVID-19 pandemic. Analyses of natural selection have been broadly incorporated into such assessments as a primary tool for inferring the selective processes under which novel SARS-CoV-2 variants evolve (Tegally, Faria, D. Martin, MacLean et al., 2021). Ongoing monitoring of emergent variants of interest (VOI) or concern (VOC) can detect potentially adaptive mutations before they rise to high frequency, and help establish the relationships between individual mutations and key viral characteristics including pathogenicity, transmissibility, and drug resistance (Hamed, Young, Luchsinger, Abdool, Cyrus Maher). Molecular patterns of ongoing selection that are evident within sequences sampled from particular VOI or VOC clades may also reveal the sub-lineages within these clades that carry potentially fitness-enhancing mutations and which are therefore most likely to drive future viral transmission (Rambaut). Here, we present RASCL (Rapid Assessment of SARS-CoV-2 CLades), an analytic pipeline designed to investigate the nature and extent of selective forces acting on viral genes in SARS-CoV-2 sequences through comparative phylogenetic analyses (Figure 1A). RASCL is implemented as an easy-to-use, standalone pipeline and as a web application, integrated in the Galaxy framework and available for use on powerful public computing infrastructure (Afgan).
Figure 1.

(A) A flowchart diagram of the main analytic engine of RASCL. (B) Examples of the ObservableHQ visualization notebook elements for the main Omicron clade (BA.1).

The RASCL pipeline takes as input (i) a “query” dataset comprising a single FASTA file containing unaligned SARS-CoV-2 full or partial genomes belonging to a clade of interest (e.g., all sequences from the PANGO lineage B.1.617.2) and (ii) a generic “background” dataset that might comprise, for example, a set of sequences that are representative of global SARS-CoV-2 genomic diversity assembled from ViPR (Pickett). It is not necessary to remove sequences in the query dataset from the background dataset; the pipeline does this automatically. The choice of “query” and “background” datasets is analysis-specific. For example, if another clade of interest is provided as background, it is possible to directly identify sites that are evolving differently between the two clades. Other sensible choices of query sequences might be sequences from a specific country/region, or sequences sampled during a particular time period. Following the disassembly of whole genome datasets into individual coding sequences (based on the NCBI SARS-CoV-2 reference annotation), the gene datasets (each containing a set of query and background sequences) are processed in parallel. Using complete linkage distance clustering implemented in the TN93 package (https://github.com/veg/tn93), RASCL subsamples from available sequences while attempting to maintain genomic diversity; the clustering threshold distance is chosen automatically to include no more than a user-specified number of genomes (e.g., 300). A combined (query and background) alignment is then created, retaining from the background dataset only those sequences that are divergent enough to be useful for subsequent selection analyses. A maximum likelihood phylogenetic tree is inferred from the combined dataset (RAxML-NG, Kozlov, or IQ-TREE, Nguyen), and the query and background branches of this tree are labeled.
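The subsampling step described above can be illustrated with a minimal sketch. This is not RASCL's code: the function names are hypothetical, query records are assumed to be matched to background records by identifier, and a simple proportion of mismatched sites stands in for the TN93 distance that the pipeline actually computes. The sketch only shows the idea of choosing the smallest complete-linkage threshold that leaves no more than a requested number of clusters, then keeping one representative per cluster.

```python
from itertools import combinations


def drop_query_from_background(query, background):
    """Remove background records whose identifier also occurs in the query set
    (matching by identifier is an assumption made for this sketch)."""
    return {name: seq for name, seq in background.items() if name not in query}


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, gaps skipped.
    Stand-in for the TN93 distance used by the real pipeline."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0


def complete_linkage(names, dist, threshold):
    """Agglomerative clustering: merge the closest pair of clusters while the
    largest distance between their members does not exceed the threshold."""
    clusters = [[n] for n in names]

    def linkage(i, j):
        return max(dist[frozenset((a, b))]
                   for a in clusters[i] for b in clusters[j])

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(*ij))
        if linkage(i, j) > threshold:
            break
        clusters[i] += clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def subsample(seqs, max_sequences):
    """Return (threshold, representatives) for the smallest threshold that
    yields at most max_sequences clusters."""
    if max_sequences < 1:
        raise ValueError("max_sequences must be at least 1")
    names = list(seqs)
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(names, 2)}
    # Only observed distances (plus zero) can change the clustering.
    for threshold in sorted({0.0, *dist.values()}):
        clusters = complete_linkage(names, dist, threshold)
        if len(clusters) <= max_sequences:
            return threshold, [members[0] for members in clusters]


toy = {"a": "AAAAAAAA", "b": "AAAAAAAT", "c": "TTTTTTTT", "d": "TTTTTTTA"}
print(subsample(toy, 2))  # (0.125, ['a', 'c'])
```

Scanning thresholds in increasing order keeps as much diversity as the size limit allows. The quadratic distance table and the repeated re-clustering are only practical for toy inputs; they are used here for readability.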
Selection analyses are then performed with state-of-the-art molecular evolution models implemented in HyPhy (Pond):
SLAC: performs substitution mapping (Pond and Frost, 2005).
BGM: identifies groups of sites that are apparently co-evolving (Poon).
FEL: locates codon sites with evidence of pervasive positive diversifying or negative selection (Pond and Frost, 2005).
MEME: locates codon sites with evidence of episodic positive diversifying selection (Murrell).
BUSTEDS: tests for gene-wide episodic selection (Wisotsky).
RELAX: compares gene-wide selection pressure between the query clade and background sequences (Wertheim).
CFEL: compares site-by-site selection pressure between query and background sequences (Pond).
FADE: identifies amino-acid sites with evidence of directional selection (Pond et al., 2008).
To mitigate the potentially confounding influences of within-host evolution and sequencing errors, these analyses are performed only on internal branches of the phylogenetic tree (Lorenzo-Redondo). Results are combined into two machine-readable JSON files (“Summary” and “Annotation”) that are used for web processing. A feature-rich interactive notebook in ObservableHQ (Perkel 2021, https://observablehq.com/@aglucaci/rascl) is used to visualize and summarize RASCL results (Figure 1B).
RASCL is currently available in two distributions: (i) through a web interface via the Galaxy Project as a workflow (https://usegalaxy.eu/u/hyphy/w/rascl); and (ii) as a standalone pipeline via a dedicated GitHub repository (https://github.com/veg/RASCL). For the web application implementation, the alignment, tree and analysis results are stored and made web-accessible via the Galaxy platform.
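The restriction of the selection analyses to internal branches, and the labeling of query and background branches, can be pictured with a small sketch. Everything in it is an assumption made for illustration: the tree is a rooted nested tuple rather than a Newick file, the function name is hypothetical, and the labeling rule (a branch is "query" only when every tip below it is a query sequence, "background" when none is, and "mixed" otherwise) is not necessarily the rule RASCL applies.

```python
def classify_branches(tree, query_tips):
    """List (kind, label, tips) for every branch of a rooted tree.

    A tip is a string; an internal node is a tuple of child nodes.
    Each non-root node stands for the branch leading to it.
    """
    branches = []

    def visit(node, is_root):
        if isinstance(node, str):
            kind, tips = "terminal", [node]
        else:
            kind, tips = "internal", []
            for child in node:
                tips.extend(visit(child, False))
        if not is_root:
            in_query = [tip in query_tips for tip in tips]
            if all(in_query):
                label = "query"
            elif not any(in_query):
                label = "background"
            else:
                label = "mixed"  # hypothetical category for this sketch
            branches.append((kind, label, tuple(tips)))
        return tips

    visit(tree, True)
    return branches


# Toy tree: three query tips (q1-q3) and two background tips (b1, b2).
tree = ((("q1", "q2"), "q3"), ("b1", "b2"))
internal = [b for b in classify_branches(tree, {"q1", "q2", "q3"})
            if b[0] == "internal"]
for kind, label, tips in internal:
    print(label, tips)
```

In this toy tree the five terminal branches are set aside and three internal branches remain, two labeled query and one labeled background. A substitution that maps to an internal branch is shared by at least two sampled genomes, which is why such branches are less exposed to within-host variation and sequencing errors than terminal ones.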
Results are visualized with an interactive notebook hosted on ObservableHQ (Figure 1B; Perkel 2021) that includes an alignment viewer, a visualization of individual codons/amino acid states at user-selected sites mapped onto the tips of a phylogenetic tree, and detailed tabulated information on analysis results for individual genes and codon sites. RASCL has been used to characterize the role of natural selection in the emergence of the Beta (Tegally), Gamma (Faria), and Omicron (Moyo et al., 2021) VOC lineages, and to identify patterns of convergent evolution in N501Y SARS-CoV-2 lineages (Martin et al., 2021). Whenever future genomic surveillance efforts reveal new potentially problematic SARS-CoV-2 lineages, we anticipate that RASCL will be productively used to analyze these too. Finally, RASCL has been designed so that, with minimal modification, it can also be adapted to analyze any other viral pathogen for which sufficient sequencing data are available.

1.  Not so different after all: a comparison of methods for detecting amino acid sites under selection.

Authors:  Sergei L Kosakovsky Pond; Simon D W Frost
Journal:  Mol Biol Evol       Date:  2005-02-09       Impact factor: 16.240

2.  Reactive, reproducible, collaborative: computational notebooks evolve.

Authors:  Jeffrey M Perkel
Journal:  Nature       Date:  2021-05       Impact factor: 49.962

3.  HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies.

Authors:  Sergei L Kosakovsky Pond; Art F Y Poon; Ryan Velazquez; Steven Weaver; N Lance Hepler; Ben Murrell; Stephen D Shank; Brittany Rife Magalis; Dave Bouvier; Anton Nekrutenko; Sadie Wisotsky; Stephanie J Spielman; Simon D W Frost; Spencer V Muse
Journal:  Mol Biol Evol       Date:  2020-01-01       Impact factor: 16.240

4.  Detection of a SARS-CoV-2 variant of concern in South Africa.

Authors:  Houriiyah Tegally; Eduan Wilkinson; Marta Giovanetti; Arash Iranzadeh; Vagner Fonseca; Jennifer Giandhari; Deelan Doolabh; Sureshnee Pillay; Emmanuel James San; Nokukhanya Msomi; Koleka Mlisana; Anne von Gottberg; Sibongile Walaza; Mushal Allam; Arshad Ismail; Thabo Mohale; Allison J Glass; Susan Engelbrecht; Gert Van Zyl; Wolfgang Preiser; Francesco Petruccione; Alex Sigal; Diana Hardie; Gert Marais; Nei-Yuan Hsiao; Stephen Korsman; Mary-Ann Davies; Lynn Tyers; Innocent Mudau; Denis York; Caroline Maslo; Dominique Goedhals; Shareef Abrahams; Oluwakemi Laguda-Akingba; Arghavan Alisoltani-Dehkordi; Adam Godzik; Constantinos Kurt Wibmer; Bryan Trevor Sewell; José Lourenço; Luiz Carlos Junior Alcantara; Sergei L Kosakovsky Pond; Steven Weaver; Darren Martin; Richard J Lessells; Jinal N Bhiman; Carolyn Williamson; Tulio de Oliveira
Journal:  Nature       Date:  2021-03-09       Impact factor: 49.962

5.  A maximum likelihood method for detecting directional evolution in protein sequences and its application to influenza A virus.

Authors:  Sergei L Kosakovsky Pond; Art F Y Poon; Andrew J Leigh Brown; Simon D W Frost
Journal:  Mol Biol Evol       Date:  2008-05-29       Impact factor: 16.240

6.  ViPR: an open bioinformatics database and analysis resource for virology research.

Authors:  Brett E Pickett; Eva L Sadat; Yun Zhang; Jyothi M Noronha; R Burke Squires; Victoria Hunt; Mengya Liu; Sanjeev Kumar; Sam Zaremba; Zhiping Gu; Liwei Zhou; Christopher N Larson; Jonathan Dietrich; Edward B Klem; Richard H Scheuermann
Journal:  Nucleic Acids Res       Date:  2011-10-17       Impact factor: 16.971

7.  The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update.

Authors:  Enis Afgan; Dannon Baker; Bérénice Batut; Marius van den Beek; Dave Bouvier; Martin Cech; John Chilton; Dave Clements; Nate Coraor; Björn A Grüning; Aysam Guerler; Jennifer Hillman-Jackson; Saskia Hiltemann; Vahid Jalili; Helena Rasche; Nicola Soranzo; Jeremy Goecks; James Taylor; Anton Nekrutenko; Daniel Blankenberg
Journal:  Nucleic Acids Res       Date:  2018-07-02       Impact factor: 16.971

8.  New SARS-CoV-2 Variants - Clinical, Public Health, and Vaccine Implications.

Authors:  Salim S Abdool Karim; Tulio de Oliveira
Journal:  N Engl J Med       Date:  2021-03-24       Impact factor: 91.245

9.  Global dynamics of SARS-CoV-2 clades and their relation to COVID-19 epidemiology.

Authors:  Samira M Hamed; Walid F Elkhatib; Ahmed S Khairalla; Ayman M Noreddin
Journal:  Sci Rep       Date:  2021-04-19       Impact factor: 4.379

10.  Association of SARS-CoV-2 clades with clinical, inflammatory and virologic outcomes: An observational study.

Authors:  Barnaby E Young; Wycliffe E Wei; Siew-Wai Fong; Tze-Minn Mak; Danielle E Anderson; Yi-Hao Chan; Rachael Pung; Cheryl Sy Heng; Li Wei Ang; Adrian Kang Eng Zheng; Bernett Lee; Shirin Kalimuddin; Surinder Pada; Paul A Tambyah; Purnima Parthasarathy; Seow Yen Tan; Louisa Sun; Gavin Jd Smith; Raymond Tzer Pin Lin; Yee-Sin Leo; Laurent Renia; Lin-Fa Wang; Lisa Fp Ng; Sebastian Maurer-Stroh; David Chien Lye; Vernon J Lee
Journal:  EBioMedicine       Date:  2021-04-08       Impact factor: 8.143
