CRISPI: a CRISPR interactive database.

Christine Rousseau, Mathieu Gonnet, Marc Le Romancer, Jacques Nicolas.

Abstract

SUMMARY: The CRISPR genomic structures (Clustered Regularly Interspaced Short Palindromic Repeats) form a family of repeats that is largely present in archaea and frequent in bacteria. On the basis of a formal model of CRISPR using very few parameters, a systematic study of all their occurrences in all available genomes of Archaea and Bacteria has been carried out. This has resulted in a relational database, CRISPI, which also includes a complete repertory of CRISPR-associated (CAS) genes. A user-friendly web interface with many graphical tools and functions allows users to extract results, find CRISPR in personal sequences or calculate sequence similarity with spacers.

AVAILABILITY: CRISPI is freely accessible at http://crispi.genouest.org

CONTACT: croussea@irisa.fr; jnicolas@irisa.fr

Year:  2009        PMID: 19846435      PMCID: PMC2788928          DOI: 10.1093/bioinformatics/btp586

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

A notable regular structure made up of a skeleton of repeats alternating with a set of highly variable short sequences has been recognized on numerous occasions in prokaryotic genomes under different names in the literature (TREP, SPIDR, SRSR, etc.), and since 2002 has come to be known as CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats; Barrangou et al., 2007; Sorek et al., 2008). The structure generally contains 4–10 direct repeats ranging in size from 25 to 45 nt, separated by spacers of similar length containing specific genomic material that is not present elsewhere in the genome and that has probably been imported from plasmids or viruses. CRISPR are present in all but six archaeal species and half of bacteria. Since they are expected to play an important role in prokaryotic adaptive immunity and may serve as specific markers, it is highly desirable to have dedicated identification tools and regularly updated databases available. Several computational methods have been developed to predict CRISPR using a more or less explicit model that introduces many parameters filtering the permitted number of elements, the sizes of and distances between elements of the structure, the mismatches between units, etc. (Bland et al., 2007; Edgar, 2007; Grissa et al., 2007b). One of the most complete sources of data on CRISPR was designed in 2007 by Grissa et al. (2007a) and most recently released in June 2009. We have tried to improve on this, with a simpler CRISPR model and several new functions.

2 IMPLEMENTATION

2.1 Identification of CRISPR

The usual specification of CRISPR, based on limited empirical data rather than on biological functional constraints, remains too informal to be helpful in systematic studies: CRISPR are repeated structures composed of exact repeat sequences 24–48 bases long separated by unique spacers of similar length (Kunin et al., 2007). In fact, most CRISPR include altered repeats, and spacers are occasionally repeated inside the same structure and sometimes even in different CRISPR on the same chromosome. Some authors give more details on the structure: repeats were thought to exhibit a kind of dyadic symmetry, but as more data become available this characterization is being questioned. A leader sequence before the train of repeats is often mentioned, but it is only defined as an A/T-rich region and does not appear to be present in all CRISPR. Since the existence of a skeleton seems to be the only tangible indicator for CRISPR and since we try to minimize a priori assumptions, we have chosen to base the search only on the existence of a periodically spaced series of units (at least four units) that is not a tandem repeat. Maximal repeats have been widely used for the detection of relevant repeats and applied to the search for units (Grissa et al., 2007b). However, short words such as those that appear in CRISPR can occur at a frequency comparable with that of random words of similar size. We have therefore introduced locality restrictions on the notion of maximal repeats, reflecting the kind of repeats that are found in CRISPR: first, each cluster of occurrences has a limited size; second, only maximal repeats with at least one occurrence that is not covered by a larger repeat are retained (Nicolas et al., 2008). We have produced putative units by clustering such overlapping local maximal repeats.
We do not fix any value for the size of units or spacers, and we do not require units to be identical inside a given CRISPR (the minimal required percentage of identity with the consensus is, however, fixed at 60% in order to avoid spurious structures). Bacterial and archaeal genomes have been downloaded from the NCBI FTP Server (ftp.ncbi.nih.gov/genomes/Bacteria/). The detection method we have just outlined has been implemented in C and Java 1.5.0_12. The presence of CRISPR has been checked in all available genomes and results have been stored in a MySQL 4.1.12 database. All web pages are implemented using PHP 4.3.9.
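The three acceptance criteria stated above (at least four units, not a tandem repeat, at least 60% identity of each unit with the consensus) can be illustrated with a minimal Python sketch. This is not the authors' C/Java implementation: it omits the detection of local maximal repeats and any check of periodicity, and it assumes, for simplicity, that candidate units have already been extracted and all have the same length. The function names and the majority-vote consensus are hypothetical choices made for the illustration.

```python
from collections import Counter

MIN_UNITS = 4        # from the text: at least four units
MIN_IDENTITY = 0.60  # from the text: minimal identity with the consensus


def consensus(units):
    """Column-wise majority consensus of equal-length unit sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*units))


def identity(seq, ref):
    """Fraction of positions at which seq and ref agree (same length assumed)."""
    return sum(a == b for a, b in zip(seq, ref)) / len(ref)


def is_tandem(starts, unit_len):
    """True if every unit directly abuts the next one (no spacer in between)."""
    return all(b - a == unit_len for a, b in zip(starts, starts[1:]))


def passes_filters(units, starts):
    """Apply the three stated criteria to one candidate structure.

    units  -- unit sequences, in genomic order (assumed equal length)
    starts -- start coordinate of each unit, in the same order
    """
    if len(units) < MIN_UNITS:
        return False
    if len({len(u) for u in units}) != 1:
        raise ValueError("this sketch assumes equal-length units")
    if is_tandem(starts, len(units[0])):
        return False
    cons = consensus(units)
    return all(identity(u, cons) >= MIN_IDENTITY for u in units)
```

A candidate with four slightly altered units separated by spacers is accepted by `passes_filters`, whereas the same units placed back to back, or a set containing a unit below the 60% threshold, are rejected.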

2.2 Access to the CRISPI database

The main page of CRISPI offers three search forms that give access to the database content or allow users to analyse their own sequences.

2.2.1 Database searches

CRISPI allows users to view all CRISPR found in archaeal and bacterial genomes. Microbial genomes can be selected by accession number, by entering the genome name (or a part of it), or by choosing a genome from the alphabetical genome list or from the taxonomy browser. Once a genome has been selected, results are summarized in tables. Each CRISPR is highlighted and CRISPR-associated genes (CAS genes) found in its vicinity are displayed. These are identified by dedicated Hidden Markov Model (HMM) profiles that have been constructed from available genes. Any new putative CAS genes that are found are highlighted in red. Annotations contain various elements such as positions, the sequence of the consensus unit, links to related NCBI information and links to graphical circular views of the genome (generated with CGView; Stothard and Wishart, 2005). Clicking on the consensus takes the user to the CRISPR's details and gives information such as unit and spacer coordinates, unit and spacer sequences, a Pygram image (Durand et al., 2006) and a consensus WebLogo image (Fig. 1). Spacers, units, CRISPR, flanking sequences and CAS genes may be downloaded in Fasta format.

Fig. 1. Typical graphical views in CRISPI.

2.2.2 Run BLAST on user-provided sequence against CRISPI

Virologists or microbiologists can run BLAST on user-provided sequences to find out whether a virus or plasmid sequence matches one or more spacers in the database. The query sequence must be in Fasta format (DNA or protein). It can either be pasted into the input field or uploaded from a file on the local machine (files with multiple sequences are allowed). Users have access to BLAST parameters for precise comparisons. BLAST can also be run against units instead of spacers for studies on the origin of such structures. The BLAST results pages are cross-linked with the CRISPI database, so that it is easy to return to the database by clicking on hyperlinks.
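The idea of matching a viral or plasmid sequence against stored spacers can be illustrated with a deliberately simplified Python sketch. It is a stand-in for BLAST, not a reimplementation of it: it reports only exact occurrences of each spacer on either strand of a DNA query, whereas BLAST scores approximate local alignments. The function names and the spacer dictionary layout are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spacer_hits(query, spacers):
    """Return (spacer_id, strand, position) for every exact occurrence.

    query   -- DNA sequence of the virus or plasmid
    spacers -- dict mapping a spacer identifier to its DNA sequence
    Positions are 0-based offsets on the forward strand of the query.
    """
    hits = []
    query = query.upper()
    for spacer_id, spacer in spacers.items():
        forward = spacer.upper()
        for strand, probe in (("+", forward), ("-", reverse_complement(forward))):
            pos = query.find(probe)
            while pos != -1:
                hits.append((spacer_id, strand, pos))
                pos = query.find(probe, pos + 1)
    return hits
```

Exact matching is the strictest possible criterion; allowing mismatches, as the adjustable BLAST parameters do, would require an alignment-based search instead.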

2.2.3 Identify CRISPR repeats in user-provided sequence

Users may wish to check their own microbial sequences for annotation purposes. The query sequence must be in Fasta format (only DNA sequences are allowed). The query sequence can either be pasted into the input field or uploaded from a file on the local machine (files with multiple sequences are not allowed). Results are summarized in a table, and users have the option of requesting an email notification once the results are available. These user-submitted genomes remain available on confidential web pages that can be accessed for 10 days before deletion.
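The input constraints described above (Fasta format, DNA only, a single sequence per submission) can be checked with a short Python sketch such as the following. It is an illustration, not CRISPI's own validation code; in particular, the accepted alphabet (A, C, G, T and N) is an assumption, since the text does not state which symbols the server accepts.

```python
DNA_ALPHABET = set("ACGTN")  # assumption: the text does not specify the alphabet


def parse_single_dna_fasta(text):
    """Return (header, sequence) for a one-record DNA Fasta string.

    Raises ValueError if the input is not Fasta, holds more than one
    record, is empty, or contains non-DNA symbols.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input must start with a '>' header line")
    if sum(ln.startswith(">") for ln in lines) != 1:
        raise ValueError("exactly one sequence is expected")
    header = lines[0][1:]
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("the record contains no sequence")
    invalid = set(sequence) - DNA_ALPHABET
    if invalid:
        raise ValueError("non-DNA symbols found: " + "".join(sorted(invalid)))
    return header, sequence
```

Rejecting a malformed submission before the search starts gives the user immediate feedback instead of a failed job some time later.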

3 CONCLUSIONS

CRISPI is a dedicated environment for CRISPR in prokaryotic genomes that offers, for the first time, an up-to-date view of existing CRISPR (71 archaea totalling 291 CRISPR, and 987 bacteria totalling 2103 CRISPR), including a complete repertory of CRISPR-associated genes (CAS genes). The current version contains 1173 archaeal CAS genes and 4396 bacterial CAS genes. Unlike Grissa et al. (2007a), we have not attempted to retain very small structures (1 or 2 spacers), as it is not clear whether they have any relevance or activity. In contrast, we have included a richer environment for practical work on CRISPR: access to extended queries via BLAST parameters, multiple graphical views, etc.

REFERENCES

1.  Circular genome visualization and exploration using CGView.

Authors:  Paul Stothard; David S Wishart
Journal:  Bioinformatics       Date:  2004-10-12       Impact factor: 6.937

2.  CRISPR provides acquired resistance against viruses in prokaryotes.

Authors:  Rodolphe Barrangou; Christophe Fremaux; Hélène Deveau; Melissa Richards; Patrick Boyaval; Sylvain Moineau; Dennis A Romero; Philippe Horvath
Journal:  Science       Date:  2007-03-23       Impact factor: 47.728

3.  CRISPR--a widespread system that provides acquired resistance against phages in bacteria and archaea.

Authors:  Rotem Sorek; Victor Kunin; Philip Hugenholtz
Journal:  Nat Rev Microbiol       Date:  2008-03       Impact factor: 60.633

4.  Evolutionary conservation of sequence and secondary structures in CRISPR repeats.

Authors:  Victor Kunin; Rotem Sorek; Philip Hugenholtz
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

5.  PILER-CR: fast and accurate identification of CRISPR repeats.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2007-01-20       Impact factor: 3.169

6.  Browsing repeats in genomes: Pygram and an application to non-coding region analysis.

Authors:  Patrick Durand; Frédéric Mahé; Anne-Sophie Valin; Jacques Nicolas
Journal:  BMC Bioinformatics       Date:  2006-10-26       Impact factor: 3.169

7.  The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats.

Authors:  Ibtissem Grissa; Gilles Vergnaud; Christine Pourcel
Journal:  BMC Bioinformatics       Date:  2007-05-23       Impact factor: 3.169

8.  CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats.

Authors:  Charles Bland; Teresa L Ramsey; Fareedah Sabree; Micheal Lowe; Kyndall Brown; Nikos C Kyrpides; Philip Hugenholtz
Journal:  BMC Bioinformatics       Date:  2007-06-18       Impact factor: 3.169

9.  CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats.

Authors:  Ibtissem Grissa; Gilles Vergnaud; Christine Pourcel
Journal:  Nucleic Acids Res       Date:  2007-05-30       Impact factor: 16.971
