
ChloroMitoSSRDB: open source repository of perfect and imperfect repeats in organelle genomes for evolutionary genomics.

Gaurav Sablok, Suresh B Mudunuri, Sujan Patnana, Martina Popova, Mario A Fares, Nicola La Porta.

Abstract

Microsatellites or simple sequence repeats (SSRs) are repetitive stretches of nucleotides (A, T, G, C) that occur either as single base pair stretches or as combinations of two- to six-nucleotide units, and that are non-randomly distributed within coding and non-coding regions of the genome. ChloroMitoSSRDB is a complete curated web-oriented relational database of perfect and imperfect repeats in organelle genomes. The present version of the database contains perfect and imperfect SSRs of 2161 organelle genomes (1982 mitochondrial and 179 chloroplast genomes). We detected a total of 5838 chloroplast perfect SSRs, 37 297 chloroplast imperfect SSRs, 5898 mitochondrial perfect SSRs and 50 355 mitochondrial imperfect SSRs across these genomes. The repeats have been further hyperlinked to the annotated gene regions (coding or non-coding) and to the corresponding gene record in the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/) to identify and understand the positional relationship of the repetitive tracts. ChloroMitoSSRDB is connected to a user-friendly web interface that provides information on the location of the repeats (coding and non-coding), size of repeat, motif and length polymorphism, etc. ChloroMitoSSRDB will serve as a repository for developing functional markers for molecular phylogenetics and for estimating molecular variation across species. Database URL: ChloroMitoSSRDB can be accessed as an open source repository at www.mcr.org.in/chloromitossrdb.

Year:  2013        PMID: 23284085      PMCID: PMC3628443          DOI: 10.1093/dnares/dss038

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

Microsatellites, or simple sequence repeats (SSRs), are tandem repeats of a motif of one to six base pairs that have evolved and expanded through replication slippage, the mechanism thought to underlie their high polymorphism rates.[1] Recently, a genome-wide alignment of two Oryza varieties (indica and japonica) demonstrated that the distribution of microsatellites is also influenced by the motif sequence and by the sequence characteristics of the regions adjoining the microsatellites, in addition to the replication slippage and point mutation model.[2] These repetitive stretches may occur in coding and in non-coding regions of the genome. SSRs have been proposed as a class of co-dominant markers for evaluating germplasm and for establishing phylogenetic and evolutionary relationships. Clusters of microsatellite motifs with moderate GC content have been observed to be abundant on chromosome 2 of the model plant Arabidopsis thaliana, which suggests that repetitive stretches may accumulate preferentially in certain regions.[3] Microsatellites have been associated with various functional roles, such as the regulation of promoters, transcription and translation, and these sequence repeats have been credited with evolutionary importance.[4-6] The positioning of microsatellites in the genome seems to play an important role in their regulatory activity; hence, studying their distribution and understanding the possible reasons for their expansion across genomes is currently the focus of intense research.
Organelle genomes, namely plant chloroplast and animal mitochondrial genomes, have been referred to as natural counterparts.[7,8] Features such as conserved gene order, lack of heteroplasmy (occurrence of more than one type of organelle genome), low recombination rates and relatively small size have made these organelle genomes widely used tools for phylogenetic studies. However, lack of heteroplasmy is not universal across mitochondrial genomes; the occurrence of heteroplasmy and the factors affecting its stoichiometry in the mitochondrial genomes of plants and animals have been reviewed previously.[9] The uniparental inheritance of organelle markers provides a means to elucidate the gene flow and genetic structure of a population, and organelle markers have been widely used in population studies (for a review see Provan et al.).[8] In silico development of organelle SSRs has established them as potential markers owing to their transferability among species and ease of development, and as key contributors to genome length variation. They have been widely demonstrated as potential markers for establishing molecular evolutionary histories, assessing demographic diversity and resolving phylogeny in a wide variety of species, from Pinus (forest species) to Oryza sativa (monocots).[10-12] There have been recent reports on the identification of perfect repeats in organelle genomes of various organisms.[11-16] However, previous studies have focused on a relatively small number of genomes, and only perfect repeats have been identified. These reports have also lacked a proper characterization system that would allow researchers to search for the association of these repeats with coding or non-coding regions.
In the past few years, systematic curated web repositories have been developed for organelle genomes, including FUGOID, which displays the curated distribution of introns in organelle genomes with functional and structural data.[17] A database of universally published primer sequences of chloroplast genomes has been developed, providing a platform for studying molecular variation and evolution in chloroplasts.[18] Organelle genomes have been exploited further for the mining of genes, exons, introns, gene products, taxonomy, RNA editing sites, SNPs and haplotype information, all of which are displayed as curated information in GOBASE.[19] A comprehensive repository of unique proteins expressed in the chloroplast proteome, identified using liquid chromatography-mass spectrometry/mass spectrometry, has been developed (AT_CHLORO), serving as a knowledge base to explore the envelope proteins.[20] However, a complete curated web-oriented integrated repository of repeat patterns is still lacking. This motivated us to undertake a genome-wide study and to develop a web-enabled interface to analyse the perfect and imperfect repeats in organelle genomes. We propose ChloroMitoSSRDB, which offers a wide visualization of perfect and imperfect repeats across the chloroplast and mitochondrial genomes with the corresponding genomic coordinates. The aim of ChloroMitoSSRDB is to provide a platform to assess the utility of SSRs as markers for phylogenetic classification across species. To our knowledge, this is the first updated integrated repository of genomic repeats in chloroplast and mitochondrial genomes accessible via a web interface.

Material and methods

Genome data retrieval and pattern search

All the studied chloroplast (179) and mitochondrial (1982) genomes were retrieved from the National Center for Biotechnology Information (NCBI) RefSeq database (www.ncbi.nlm.nih.gov/). The required files (gbk, fna, faa, gff and ptt) were downloaded for each chloroplast and mitochondrial genome and stored as flat files sorted by genome. The perfect and imperfect repeats were identified with the software tool Imperfect Microsatellite Extractor (IMEx),[21] which uses a sliding window algorithm to identify regions containing a repetitive stretch of a particular nucleotide motif, repeated either perfectly or with some level of imperfection. The algorithm allows the user to specify the minimal length of the consecutive nucleotide stretch and reports the SSR motif, the motif repeat count, the coordinates of the SSR tract in the genome and its location relative to coding and non-coding regions. The association of the repeats with coding and non-coding regions was determined from the sequence annotation available in the GenBank database (NCBI, www.ncbi.nlm.nih.gov). We applied the following minimum repeat counts to define an SSR as a true repeat: 12 for mono-, 6 for di-, 4 for tri- and 3 for tetra- to hexa-nucleotide motifs. For imperfect repeats, the imperfection percentage parameter (p%) was set to 10%, which is the level of imperfection allowed in each repeat tract.
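The length criteria above can be illustrated with a minimal Python sketch that finds perfect repeats only. It is not the IMEx sliding window algorithm: it swaps in a regular-expression scan, and the names `MIN_REPEATS`, `is_primitive` and `find_perfect_ssrs` are hypothetical. Coordinates are assumed to be 1-based and inclusive, and motifs that are themselves repeats of a shorter unit (for example "AA") are skipped on the assumption that they should be reported once, under the shorter motif.

```python
import re

# Minimum repeat counts per motif length, taken from the criteria in the text:
# mono 12, di 6, tri 4, tetra to hexa 3.
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(sequence):
    """Return perfect SSRs as dicts with 1-based inclusive coordinates.

    Illustrative regex scan (an assumption), not the IMEx algorithm.
    """
    sequence = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(sequence, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                hits.append({
                    "motif": motif,
                    "iterations": len(m.group(0)) // size,
                    "start": m.start() + 1,
                    "end": m.end(),
                    "tract_length": len(m.group(0)),
                })
                pos = m.end()
            else:
                # Skip e.g. "AA" tracts and retry one base further on.
                pos = m.start() + 1
    return sorted(hits, key=lambda h: h["start"])
```

For example, `find_perfect_ssrs("GG" + "A" * 12 + "CC" + "AT" * 6)` reports one mononucleotide tract and one dinucleotide tract, while a run of eleven A residues falls below the mono- threshold and is not reported.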

Results and discussions

Structure of ChloroMitoSSRDB database

ChloroMitoSSRDB is hosted on a 32-bit Linux server running Apache (http://www.apache.org/), MySQL (http://www.mysql.com/) and PHP (http://www.php.net/), a stack commonly called LAMP. A flow chart explaining the organization and workflow of ChloroMitoSSRDB is presented in Fig. 1. ChloroMitoSSRDB is based on a simple relational database management system, MySQL, which is sufficient for organizing, storing and retrieving the data with a single query. The relational MySQL tables used in the construction of the ChloroMitoSSRDB database are explained in Tables 1 and 2. Table 1 shows the metadata stored for each genome, whereas Table 2 gives the structure of the MySQL relational tables that store the repeat information for the coding and non-coding regions. Each query result is organized into hierarchical levels that display information on each genome (e.g. accession, sequence length and nucleotide composition) (Table 1).
Figure 1.

Schematic illustration showing the flow of the organization of the data in ChloroMitoSSRDB.

Table 1.

Structure of the table ‘chloromitometa’ that stores the meta-information of all the mitochondrial and chloroplast genomes

| Information | Field | Data type | Key | Example |
|---|---|---|---|---|
| Accession number | acc_no | int(11) | | 5881414, 110189662 |
| Sequence ID | seq_id | varchar(11) | PRI | NC_000834, AC_000022 |
| Sequence name | seq_name | varchar(500) | | Rattus norvegicus strain Wistar mitochondrion, Porphyra purpurea chloroplast |
| Sequence type | seq_type | varchar(50) | | Complete genome, complete sequence |
| Sequence length | seq_length | int(11) | | 16 613 bp, 7686 bp |
| Nucleotide composition of A | a_per | Float | | 33.06% |
| Nucleotide composition of T | t_per | Float | | 41.87% |
| Nucleotide composition of G | g_per | Float | | 13.58% |
| Nucleotide composition of C | c_per | Float | | 11.49% |
| Organelle type | organelle | Char(1) | | M (for Mitochondrion), C (Chloroplast) |
| Taxon ID | taxon | Int | | 263 995 |
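The composition fields in Table 1 (a_per, t_per, g_per and c_per) are simple percentages, which can be sketched as follows. The function name `nucleotide_composition` is hypothetical, and the choice to exclude ambiguous bases such as N from the denominator is an assumption, since the text does not say how they are handled.

```python
def nucleotide_composition(sequence):
    """Return A/T/G/C percentages keyed by the Table 1 field names.

    Assumption: only A, T, G and C count towards the total, so
    ambiguous bases (e.g. N) are ignored.
    """
    sequence = sequence.upper()
    counts = {base: sequence.count(base) for base in "ATGC"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C bases")
    return {
        base.lower() + "_per": round(100.0 * count / total, 2)
        for base, count in counts.items()
    }
```

The same calculation applies to the per-tract composition fields of Table 2; for instance, a tract such as "AAAGGC" yields the 50.00%, 0.00%, 33.33% and 16.67% pattern shown there.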
Table 2.

Structure of the tables ‘chloromitoperfectmicrosatellite’ and ‘chloromitoimperfectmicrosatellite’ that store the repeat information of all perfect and imperfect microsatellites of mitochondrial and chloroplast genomes

| Information | Field | Data type | Key | Example |
|---|---|---|---|---|
| Sequence ID | index_no | varchar(11) | PRI | NC_000834, AC_000022 |
| Starting co-ordinate of SSR | start | int(11) | PRI | 172, 12843 |
| Ending co-ordinate of SSR | end | int(11) | PRI | 182, 12885 |
| Motif (repeating unit) | motif | varchar(10) | | AT, G, CAAC |
| Number of repetitions | iterations | int(5) | | 3, 7 |
| Length of repeat tract | tract_length | int(11) | | 12 bp, 18 bp |
| Nucleotide composition of A | a_per | Float | | 50.00% |
| Nucleotide composition of T | t_per | Float | | 0.00% |
| Nucleotide composition of G | g_per | Float | | 33.33% |
| Nucleotide composition of C | c_per | Float | | 16.67% |
| Repeat position information | coding_info | varchar(50) | | Coding (if repeat is in the coding region) or Null (if outside) |
| Protein ID (if repeat in coding region) | protein_id | int(11) | | 110189664 (if repeat is in the coding region) or 0 (if non-coding) |
| Imperfection percentage of the tract (a) | imperfection | Float | | 9%, 0% |
| Alignment line 1 (a) | alignment_line1 | Text | | TTAA-TAATTAA |
| Alignment line 2 (a) | alignment_line2 | Text | | **** ******* |
| Alignment line 3 (a) | alignment_line3 | Text | | TTAATTAATTAA |

(a) The last four columns (imperfection, alignment_line1, alignment_line2 and alignment_line3) are present only in the table that stores imperfect microsatellites (chloromitoimperfectmicrosatellite).

The information on genome composition (A, T, G and C counts, etc.) has been computed from the flat files obtained from the NCBI RefSeq database (Table 1). The complete repeat information of the database is stored in two different tables (see Table 2), one for the perfect and one for the imperfect repeats of all chloroplast and mitochondrial genomes. The repeat information includes the details of individual repeats, such as the sequence ID, start and end coordinates of the repeat, the repeating motif, number of iterations, total tract length, nucleotide composition of the repeat and protein information for coding repeats. In addition, the table holding the imperfect repeats also stores the imperfection percentage and alignment information, which can be used to study the evolution of these repeats.
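To make the table layout concrete, the sketch below recreates the two structures using the field names from Tables 1 and 2 and retrieves coding repeats with a single join. It uses Python's built-in sqlite3 module as a stand-in for the MySQL server described above, so the column types are SQLite approximations. The rows, the `DEMO_*` identifiers and the helper names `build_demo_db` and `coding_repeats` are invented for illustration and are not records from the database.

```python
import sqlite3

# Field names follow Tables 1 and 2; types are SQLite approximations
# of the MySQL types (an assumption of this sketch).
SCHEMA = """
CREATE TABLE chloromitometa (
    acc_no INTEGER,
    seq_id TEXT PRIMARY KEY,
    seq_name TEXT,
    seq_type TEXT,
    seq_length INTEGER,
    a_per REAL, t_per REAL, g_per REAL, c_per REAL,
    organelle TEXT,          -- 'M' or 'C'
    taxon INTEGER
);
CREATE TABLE chloromitoperfectmicrosatellite (
    index_no TEXT,
    "start" INTEGER,
    "end" INTEGER,
    motif TEXT,
    iterations INTEGER,
    tract_length INTEGER,
    a_per REAL, t_per REAL, g_per REAL, c_per REAL,
    coding_info TEXT,        -- 'Coding' or NULL
    protein_id INTEGER,      -- 0 when non-coding
    PRIMARY KEY (index_no, "start", "end")
);
"""


def build_demo_db():
    """Create an in-memory database with invented demonstration rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO chloromitometa VALUES (?,?,?,?,?,?,?,?,?,?,?)",
        [
            (1, "DEMO_M1", "demo mitochondrion", "complete genome",
             1000, 25.0, 25.0, 25.0, 25.0, "M", 0),
            (2, "DEMO_C1", "demo chloroplast", "complete genome",
             2000, 30.0, 30.0, 20.0, 20.0, "C", 0),
        ],
    )
    con.executemany(
        "INSERT INTO chloromitoperfectmicrosatellite "
        "VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
        [
            ("DEMO_M1", 172, 183, "AT", 6, 12, 50.0, 50.0, 0.0, 0.0, "Coding", 999),
            ("DEMO_M1", 500, 511, "A", 12, 12, 100.0, 0.0, 0.0, 0.0, None, 0),
            ("DEMO_C1", 40, 51, "CAAC", 3, 12, 50.0, 0.0, 0.0, 50.0, "Coding", 998),
        ],
    )
    return con


def coding_repeats(con, organelle):
    """Fetch coding perfect repeats for one organelle type in a single query."""
    sql = """
        SELECT m.seq_name, r.motif, r.iterations, r."start", r."end", r.protein_id
        FROM chloromitoperfectmicrosatellite AS r
        JOIN chloromitometa AS m ON m.seq_id = r.index_no
        WHERE m.organelle = ? AND r.coding_info = 'Coding'
        ORDER BY r."start"
    """
    return con.execute(sql, (organelle,)).fetchall()
```

Joining `index_no` to `seq_id` is an inference from the shared example values in the two tables rather than a documented foreign key. The `start` and `end` columns are quoted because `END` is a reserved word in SQLite.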

Web visualization of ChloroMitoSSRDB

The front end of the database is implemented as web-accessible PHP scripts. The web interface allows various search patterns for the repeats in organelle genomes. The complete browsing layout of ChloroMitoSSRDB is shown in Fig. 2. The curated information is organized into several search patterns, and navigation pages have been provided. The output of IMEx has been processed further according to gene IDs and organism name, and the SSRs were sorted into coding or non-coding regions. The positions of the coding regions were determined using the annotated ptt files of each chloroplast and mitochondrial genome, as downloaded from the NCBI RefSeq database.
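Sorting repeats into coding and non-coding regions amounts to an interval check against the annotated coding coordinates. The sketch below assumes that each coding feature is available as a location string of the form "start..end" paired with a protein ID, and that a repeat counts as coding only when it lies entirely inside one feature; the text does not state the rule actually used, and the function names are hypothetical. The return values mirror the conventions of Table 2 (Null and 0 for non-coding repeats).

```python
def parse_location(location):
    """Split a 'start..end' location string into integer coordinates."""
    start, end = location.split("..")
    return int(start), int(end)


def annotate_repeat(rep_start, rep_end, cds_features):
    """Return (coding_info, protein_id) for one repeat tract.

    cds_features is a list of (location, protein_id) pairs.
    Assumption: the repeat must be fully contained in a coding feature.
    """
    for location, protein_id in cds_features:
        cds_start, cds_end = parse_location(location)
        if cds_start <= rep_start and rep_end <= cds_end:
            return "Coding", protein_id
    # Table 2 conventions: Null outside coding regions, protein ID 0.
    return None, 0
```

A linear scan is adequate for organelle genomes, which carry comparatively few annotated features; a sorted list with binary search would be the obvious refinement for larger annotation sets.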
Figure 2.

How to browse: schematic browsing of ChloroMitoSSRDB. This figure appears in colour in the online version of DNA Research.

The ChloroMitoSSRDB interface provides information on several repeat statistics, including the distribution of the repeat types, the length of the motifs and their positions (coding or non-coding). Querying through the web interface proceeds through a query page, a result page and a report page, and is organized into three search patterns that cover all interface functionality: (i) the first search pattern is by organelle, classified into chloroplast and mitochondrial genomes; (ii) the second is by the type of repeat pattern (perfect or imperfect); and (iii) the last allows the user to select the repeat size. With the appropriate selection, the user is directed to the organelle-specific page (chloroplast or mitochondrial) listing the organisms for which SSRs have been identified, each of which is linked to an organism-specific repeat page with further information on the distribution of the repetitive tracts. To ease access to the database and to enhance functionality, we also provide chloroplast and mitochondrial repeat-specific pages ordered alphabetically by organism name. An advanced search option filters the repeats by user-specified criteria, allowing the user to search for a repeat region of a specific length. The search results and the repeat information can be exported in Excel format, so that the user can save and analyse the repeats, design primers and use the information for downstream processing of the observed repeats.
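The three search patterns and the export step can be mimicked offline on a list of repeat records. This is only a sketch under stated assumptions: the record keys `organelle` and `repeat_type` and both function names are hypothetical, the length filter is treated as a minimum tract length, and CSV is used as a spreadsheet-readable stand-in for the Excel export that the interface provides.

```python
import csv
import io


def filter_repeats(repeats, organelle=None, repeat_type=None,
                   motif_size=None, min_tract_length=None):
    """Keep the records that satisfy every criterion that is not None."""
    selected = []
    for rep in repeats:
        if organelle is not None and rep["organelle"] != organelle:
            continue
        if repeat_type is not None and rep["repeat_type"] != repeat_type:
            continue
        if motif_size is not None and len(rep["motif"]) != motif_size:
            continue
        if min_tract_length is not None and rep["tract_length"] < min_tract_length:
            continue
        selected.append(rep)
    return selected


def export_csv(repeats, fields):
    """Serialize the chosen fields of each record as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(repeats)
    return buffer.getvalue()
```

Calling `export_csv(filter_repeats(records, organelle="C", repeat_type="perfect"), ["motif", "start", "end", "tract_length"])` would produce a small table that can be opened in a spreadsheet program or passed to a primer design workflow.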
The query page for every organism leads to an organism-specific ChloroMitoSSRDB repeat summary page, which illustrates the distribution of the perfect and imperfect repeats and the genome composition using bar and pie charts. The genome composition and repeat occurrence graphs are generated dynamically from the repeat information using Libchart, a PHP chart drawing library (http://naku.dohcrew.com/libchart/). The repeat pattern summaries displayed on the organism-specific page are clickable links, which lead to further information on the start and end of the SSR-containing tract, the motif and the occurrence of the respective repeat pattern across the genomes. Mutations in SSR stretches located in the coding region may affect the subsequent transcription and translation of the gene harbouring the repetitive stretches.[22] Mutation rates at chloroplast SSR (cpSSR) loci, estimated at between 3.2 × 10⁻⁵ and 7.9 × 10⁻⁵, have been described as low when compared with substitution rates.[23] Recently, it was observed that plant mitochondrial substitution rates are relatively low compared with those of invertebrate and mammalian mitochondrial genomes.[24,25] To evaluate the distribution of the SSRs in the coding regions, the coding repeats on the organism page have been linked to the corresponding protein IDs (NCBI, www.ncbi.nlm.nih.gov/), which in ongoing work can shed light on whether these repeated regions evolve through mutational bias or through selective forces.

Conclusion

We have constructed ChloroMitoSSRDB, a database that displays curated information on the widespread occurrence of genomic repeats in the chloroplast and mitochondrial genomes available so far, and we will update it with new chloroplast and mitochondrial genomes as and when they are released. The repeats in the coding regions of genes may prove to be candidate markers for studying the functional role of gene-associated repeats, for species delimitation and evolutionary analyses, and for evaluating germplasm and devising conservation strategies for endangered species. In future releases, we will add primer pair information for the repeat-rich regions and will provide a systematic visualization of imperfect alignments through hyperlinked pages for imperfect repeats. We believe that ChloroMitoSSRDB will serve as a standard database for exploring and understanding genomic repeats in organelle genomes, and that its data make a good starting point for further exploratory investigations of SSR polymorphism and large-scale comparative genomics, providing a platform to understand the repetitive nature of organelle genomes.

Funding

This work was supported by BIOMASFOR (Z0912003I, Italy) and EC FP7 (BIOSUPPORT, Bulgaria). M.A.F. was supported by a grant from the Spanish Ministerio de Ciencia e Innovación (BFU2009-12022).
  23 in total

1.  A low mutation rate for chloroplast microsatellites.

Authors:  J Provan; N Soranzo; N J Wilson; D B Goldstein; W Powell
Journal:  Genetics       Date:  1999-10       Impact factor: 4.562

2.  Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance.

Authors:  Kian Guan Lim; Chee Keong Kwoh; Li Yang Hsu; Adrianto Wirawan
Journal:  Brief Bioinform       Date:  2012-05-29       Impact factor: 11.622

Review 3.  Heteroplasmy as a common state of mitochondrial genetic information in plants and animals.

Authors:  Beata Kmiec; Magdalena Woloszynska; Hanna Janska
Journal:  Curr Genet       Date:  2006-06-09       Impact factor: 3.886

4.  In silico analysis of SSRs in mitochondrial genomes of plants.

Authors:  Himani Kuntal; Vinay Sharma
Journal:  OMICS       Date:  2011-10-19

5.  Long, polymorphic microsatellites in simple organisms.

Authors:  D Field; C Wills
Journal:  Proc Biol Sci       Date:  1996-02-22       Impact factor: 5.349

6.  Microsatellite instability regulates transcription factor binding and gene expression.

Authors:  Patricia Martin; Katherine Makepeace; Stuart A Hill; Derek W Hood; E Richard Moxon
Journal:  Proc Natl Acad Sci U S A       Date:  2005-02-22       Impact factor: 11.205

Review 7.  Slipped-strand mispairing: a major mechanism for DNA sequence evolution.

Authors:  G Levinson; G A Gutman
Journal:  Mol Biol Evol       Date:  1987-05       Impact factor: 16.240

8.  Polymorphic simple sequence repeat regions in chloroplast genomes: applications to the population genetics of pines.

Authors:  W Powell; M Morgante; R McDevitt; G G Vendramin; J A Rafalski
Journal:  Proc Natl Acad Sci U S A       Date:  1995-08-15       Impact factor: 11.205

9.  Microsatellite analysis in organelle genomes of Chlorophyta.

Authors:  Himani Kuntal; Vinay Sharma; Henry Daniell
Journal:  Bioinformation       Date:  2012-03-31

10.  Phylogenetic analysis of mitochondrial substitution rate variation in the angiosperm tribe Sileneae.

Authors:  Daniel B Sloan; Bengt Oxelman; Anja Rautenberg; Douglas R Taylor
Journal:  BMC Evol Biol       Date:  2009-10-31       Impact factor: 3.260

