
CRISPRidentify: identification of CRISPR arrays using machine learning approach.

Alexander Mitrofanov, Omer S Alkhnbashi, Sergey A Shmakov, Kira S Makarova, Eugene V Koonin, Rolf Backofen.

Abstract

CRISPR-Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR-Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRidentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.
© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
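
The three steps named in the abstract (detection, feature extraction, classification with a certainty score) can be illustrated with a toy sketch. This is not CRISPRidentify's implementation. The exact k-mer matching, the three features, the hand-set weights in `TOY_WEIGHTS` and `TOY_BIAS`, and all function names are assumptions made for illustration only. The published tool instead learns its classifier from manually curated positive and negative examples.

```python
import math
from statistics import mean, pstdev

# Hypothetical, hand-set weights. A real classifier would be trained
# on curated positive and negative arrays rather than fixed by hand.
TOY_WEIGHTS = {"n_repeats": 0.8, "spacer_len_sd": -0.5}
TOY_BIAS = -2.0


def find_candidates(seq, repeat_len=8, min_repeats=3):
    """Step 1 (detection): collect k-mers that recur without overlapping."""
    positions = {}
    for i in range(len(seq) - repeat_len + 1):
        positions.setdefault(seq[i:i + repeat_len], []).append(i)
    candidates = []
    for kmer, starts in positions.items():
        kept = []
        for s in starts:
            # Keep an occurrence only if it does not overlap the previous one.
            if not kept or s - kept[-1] >= repeat_len:
                kept.append(s)
        if len(kept) >= min_repeats:
            candidates.append((kmer, kept))
    return candidates


def extract_features(kmer, starts):
    """Step 2 (feature extraction): simple statistics of one candidate."""
    spacers = [b - a - len(kmer) for a, b in zip(starts, starts[1:])]
    return {
        "n_repeats": len(starts),
        "mean_spacer_len": mean(spacers),
        "spacer_len_sd": pstdev(spacers),
    }


def certainty(features, weights=TOY_WEIGHTS, bias=TOY_BIAS):
    """Step 3 (classification): logistic score in (0, 1)."""
    z = bias + sum(w * features[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def identify(seq):
    """Run all three steps and report candidates, highest score first."""
    results = []
    for kmer, starts in find_candidates(seq):
        feats = extract_features(kmer, starts)
        results.append({"repeat": kmer, "starts": starts,
                        "features": feats, "certainty": certainty(feats)})
    return sorted(results, key=lambda r: r["certainty"], reverse=True)


# Synthetic example: four copies of one repeat separated by distinct spacers.
REPEAT = "GTCGCACT"
SPACERS = ["AACCGGTTAC", "TGATCCAGTA", "CCATGGAATT"]
TOY_SEQ = REPEAT + "".join(sp + REPEAT for sp in SPACERS)
```

Calling `identify(TOY_SEQ)` returns a single candidate whose repeat is `REPEAT`, with evenly sized spacers and therefore a spacer length standard deviation of zero. The score only ranks candidates under the toy weights. It says nothing about how the published tool would score the same region.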

Year: 2021; PMID: 33290505; PMCID: PMC7913763; DOI: 10.1093/nar/gkaa1158

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971


