Xabier Bello, Jacobo Pardo-Seco, Alberto Gómez-Carballa, Hansi Weissensteiner, Federico Martinón-Torres, Antonio Salas.
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen responsible for the coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV-2 genomes have been sequenced on a massive scale worldwide and are now available in several public genome repositories. There is much interest in developing bioinformatic tools capable of analyzing and interpreting SARS-CoV-2 variation. We have designed CovidPhy (http://covidphy.eu), a web interface that can process SARS-CoV-2 genome sequences supplied in plain FASTA text format or through identity codes from the Global Initiative on Sharing Avian Influenza Data (GISAID) or GenBank. CovidPhy aggregates information available in the large GISAID database (>1.49 M genomes). Sequences are first aligned against the reference sequence; the interface then provides different sources of information, including automatic classification of genomes into a pre-computed phylogeny, phylogeographic information, haplogroup/lineage frequencies, and sequence variation, and also indicates whether the genome contains known variants of concern (VOC). Additionally, CovidPhy allows searching for user-supplied variants and haplotypes, and includes a list of genomes that are good candidates for being responsible for large outbreaks worldwide, most likely mediated by important superspreading events, together with their possible geographic epicenters and their relative impact as recorded in the GISAID database.
Keywords: COVID-19; Phylogeny; RNA; SARS-CoV-2; Superspreading events; Variants of concern
Year: 2021; PMID: 34419470; PMCID: PMC8376833; DOI: 10.1016/j.envres.2021.111909
Source DB: PubMed; Journal: Environ Res; ISSN: 0013-9351; Impact factor: 6.498
Fig. 1. Pipeline of CovidPhy. CovidPhy offers three interfaces: a web interface, a CLI, and a GUI. All three can be fed a FASTA file (top left), which is aligned against the Reference (402,125) using libdistfast.so and scanned for differences that allow classification in a precomputed phylogeny (top, red square marked “core”). The output varies by program: the CLI and the GUI output only the haplogroup and the variants found (bottom black square), while the web interface offers additional information: haplogroup frequencies in regions (e.g. countries), candidates for important outbreaks as inferred from database searches, and VOCs. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
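The core step described in the legend (compare an aligned genome with the reference, list the differences, and place the genome in a precomputed phylogeny) can be illustrated with a minimal Python sketch. It is not CovidPhy's implementation and does not use libdistfast.so. It assumes the query is already aligned to the reference at equal length, and the function names, the toy sequences, and the haplogroup definitions in TOY_TREE are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Set


def call_variants(reference: str, aligned: str) -> List[str]:
    """List substitutions between two equal-length, pre-aligned sequences."""
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, aligned), start=1):
        # Report only unambiguous substitutions; gaps and N are skipped.
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            variants.append(f"{ref_base}{pos}{alt_base}")
    return variants


def classify(variants: List[str], tree: Dict[str, Set[str]]) -> str:
    """Return the most specific haplogroup whose defining variants are all present."""
    found = set(variants)
    best, best_size = "root", 0
    for name, defining in tree.items():
        # A haplogroup matches when every defining variant was observed;
        # the match with the most defining variants is taken as most specific.
        if defining <= found and len(defining) > best_size:
            best, best_size = name, len(defining)
    return best


# Hypothetical toy data (assumption): not real SARS-CoV-2 positions or haplogroups.
REFERENCE = "ACGTACGTAC"
TOY_TREE = {"A": {"C2T"}, "A1": {"C2T", "A5G"}, "B": {"G7A"}}

query = "ATGTGCGTAC"
found_variants = call_variants(REFERENCE, query)
print(found_variants, classify(found_variants, TOY_TREE))
```

In this toy run the query differs from the reference at positions 2 and 5, so it is assigned to the hypothetical haplogroup A1 rather than its parent A. A real classifier would also need to handle insertions, deletions, ambiguous bases, and recurrent mutations, which this sketch ignores.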