Chayan Kumar Saha, Rodrigo Sanches Pires, Harald Brolin, Maxence Delannoy, Gemma Catherine Atkinson.
Abstract
SUMMARY: Analysis of conservation of gene neighbourhoods over different evolutionary levels is important for understanding operon and gene cluster evolution, and predicting functional associations. Our tool FlaGs (standing for Flanking Genes) takes a list of NCBI protein accessions as input, clusters neighbourhood-encoded proteins into homologous groups using sensitive sequence searching, and outputs a graphical visualization of the gene neighbourhood and its conservation, along with a phylogenetic tree annotated with flanking gene conservation. FlaGs has demonstrated utility for molecular evolutionary analysis, having uncovered a new toxin-antitoxin system in prokaryotes and bacteriophages. The web tool version of FlaGs (webFlaGs) can optionally include a BLASTP search against a reduced RefSeq database to generate an input accession list and analyse neighbourhood conservation within the same run.
Year: 2021 PMID: 32956448 PMCID: PMC8189683 DOI: 10.1093/bioinformatics/btaa788
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The FlaGs workflow and example results. (A) The user inputs a list of protein accession numbers (optionally with GCF assembly IDs) and can specify the number of adjacent flanking genes to consider, and the sensitivity of the Jackhmmer search through changing the E value cut-off and number of iterations. The web version of FlaGs (webFlaGs) can optionally use a single protein sequence or NCBI accession and begin by executing a BLASTP search against the RefSeq database (excluding eukaryotes) or a representative genome database to generate the input list of accessions. The output always includes a to-scale figure of flanking genes, a description of the flanking gene identities as a legend, and optionally, a phylogenetic tree annotated with colour- and number-coded pennant flags. (B) Example results using toxins of the toxSAS toxin–antitoxin system (Jimmy ) as the query. Empty genes with grey borders are not conserved in the dataset, and grey genes with blue borders are pseudogenes. In this example, FlaGs reveals four different homologous groups of antitoxins as flanking genes, two of which (green and yellow) are antitoxins for the same cognate toxin. Group number 5 is an integrase. As FlaGs does not require complete genomes, regions can lack flanking genes on one side if the query gene is close to the end of a contig, as is the case with Arthrobacter castelli in this example.
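The flanking-gene window described in the caption, including the case where a query sits near the end of a contig, can be illustrated with a minimal Python sketch. This is not the FlaGs implementation: the `Gene` record, the `flanking_genes` function, the default of four neighbours per side, and the toy accessions are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical record for one protein-coding gene on a contig."""
    accession: str
    start: int
    end: int
    strand: str


def flanking_genes(contig: List[Gene], query: str, n_flank: int = 4) -> List[Gene]:
    """Return the query gene plus up to n_flank neighbours on each side.

    `contig` is assumed to be sorted by genomic position. If the query is
    close to a contig end, fewer neighbours are returned on that side.
    The default of 4 is an illustrative choice, not a documented setting.
    """
    positions = [i for i, gene in enumerate(contig) if gene.accession == query]
    if not positions:
        raise ValueError(f"{query} not found on this contig")
    idx = positions[0]
    lo = max(0, idx - n_flank)                 # clamp at the contig start
    hi = min(len(contig), idx + n_flank + 1)   # clamp at the contig end
    return contig[lo:hi]


# Toy contig with made-up accessions and coordinates.
contig = [
    Gene("P1", 100, 400, "+"),
    Gene("P2", 450, 900, "-"),
    Gene("P3", 950, 1500, "+"),
    Gene("P4", 1550, 2000, "+"),
    Gene("P5", 2100, 2600, "-"),
]

# P2 has only one upstream neighbour, so the window is truncated on that side.
window = [gene.accession for gene in flanking_genes(contig, "P2", n_flank=2)]
```

Clamping the slice bounds instead of raising an error mirrors the behaviour noted in the caption, where a region may simply lack flanking genes on one side of the query.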