
Casboundary: automated definition of integral Cas cassettes.

Victor A Padilha, Omer S Alkhnbashi, Van Dinh Tran, Shiraz A Shah, André C P L F Carvalho, Rolf Backofen.

Abstract

MOTIVATION: CRISPR-Cas systems are found in most archaeal and many bacterial genomes, where they provide adaptive immunity against mobile genetic elements. A CRISPR-Cas system is encoded by a set of consecutive cas genes, here termed a cassette. Identifying cassette boundaries is key to finding cassettes in CRISPR research, and it is often done using hidden Markov models and manual annotation. In this article, we propose the first method able to define cassette boundaries automatically. In addition, we present a Cas-type predictive model that the method uses to assign each gene located within a cassette's boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules.
RESULTS: We evaluate the predictive performance of the proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. In addition, we achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional case studies, in which we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes.
AVAILABILITY AND IMPLEMENTATION: https://github.com/BackofenLab/Casboundary.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.
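
The two evaluation figures reported under RESULTS, a similarity between predicted and expected cassettes and a per-class F-score, can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so the Jaccard index below is an assumption chosen only for illustration. The function names, gene identifiers and labels are likewise hypothetical and are not taken from the Casboundary code.

```python
def jaccard_similarity(predicted, expected):
    """Jaccard index between two collections of gene identifiers.

    Assumption: the abstract does not name its similarity measure;
    the Jaccard index is used here purely as an illustration.
    """
    predicted, expected = set(predicted), set(expected)
    if not predicted and not expected:
        return 1.0  # two empty cassettes are treated as identical
    return len(predicted & expected) / len(predicted | expected)


def f_score(true_labels, predicted_labels, positive):
    """F1 score (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cassettes: the prediction is shifted by one gene.
expected_cassette = ["gene_3", "gene_4", "gene_5", "gene_6"]
predicted_cassette = ["gene_4", "gene_5", "gene_6", "gene_7"]
similarity = jaccard_similarity(predicted_cassette, expected_cassette)

# Hypothetical Cas labels for four genes.
true_types = ["cas1", "cas2", "cas3", "cas1"]
predicted_types = ["cas1", "cas2", "cas1", "cas1"]
cas1_f_score = f_score(true_types, predicted_types, positive="cas1")
```

In this toy case the two cassettes share three of five distinct genes, and the `cas1` class has two true positives, one false positive and no false negatives. Averaging such per-cassette similarities over a test set would give a single summary value of the kind reported in the abstract.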


Year:  2021        PMID: 33226067      PMCID: PMC8208735          DOI: 10.1093/bioinformatics/btaa984

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


