Literature DB >> 24952649

CicArMiSatDB: the chickpea microsatellite database.

Dadakhalandar Doddamani, Mohan A V S K Katta, Aamir W Khan, Gaurav Agarwal, Trushar M Shah, Rajeev K Varshney¹.

Abstract

BACKGROUND: Chickpea (Cicer arietinum) is a widely grown legume crop in tropical, sub-tropical and temperate regions. Molecular breeding approaches seem to be essential for enhancing crop productivity in chickpea. Until recently, limited numbers of molecular markers were available in the case of chickpea for use in molecular breeding. However, the recent advances in genomics facilitated the development of large scale markers especially SSRs (simple sequence repeats), the markers of choice in any breeding program. Availability of genome sequence very recently opens new avenues for accelerating molecular breeding approaches for chickpea improvement. DESCRIPTION: In order to assist genetic studies and breeding applications, we have developed a user friendly relational database named the Chickpea Microsatellite Database (CicArMiSatDB http://cicarmisatdb.icrisat.org). This database provides detailed information on SSRs along with their features in the genome. SSRs have been classified and made accessible through an easy-to-use web interface.
CONCLUSIONS: This database is expected to help chickpea community in particular and legume community in general, to select SSRs of particular type or from a specific region in the genome to advance both basic genomics research as well as applied aspects of crop improvement.

Entities: Chemical Disease Species

Mesh：

Year: 2014 PMID： 24952649 PMCID： PMC4230034 DOI： 10.1186/1471-2105-15-212

Source DB: PubMed Journal: BMC Bioinformatics ISSN： 1471-2105 Impact factor: 3.169

Background

Chickpea belongs to the family Fabaceae of class dicots. Great importance has been attributed to chickpea in agriculture in view of its consumption as human food and livestock fodder. As per the FAO 2012 statistics [1], chickpea is grown in more than 50 countries and the production was approximately 11.3 million tons. India is the largest producer and it contributed to 67-70% in the world’s total production during 2009–2012. The two known types of chickpea, kabuli and desi are distinguished based on characteristics such as seed size, color and shape. Desi type is recognized by round dark seed coat, whereas, the kabuli type could be identified by bigger beige-colored round seed coat [2]. Chickpea is low in fat and provides dietary fibre, protein, dietary phosphorus and helps in the lowering of blood cholesterol [3]. As a member of family Fabaceae, it has the ability to increase the soil fertility by fixing the atmospheric nitrogen. In the context of crop improvement, the availability of the genomic sequence information opens the possibility of improving the crop production by developing the molecular markers for supporting breeding programs. Molecular markers are specific sequence of DNA that identifies regions associated with trait of interest in the genome. A range of molecular markers namely restriction fragment length polymorphism (RFLP), random amplified polymorphism DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSRs) also known as microsatellites and more recently, single nucleotide polymorphism (SNP) markers have become available in many crop species. SSRs, however, have been widely used in crop genetics and breeding applications [4]. For instance, SSRs have been used in determining hybrid purity, identifying genotypes, discovering genes linked to known markers and also enable an in-depth analysis of quantitative traits, allowing interesting alleles to be found in wild or cultivated germplasm [5]. SSRs are sequence blocks containing 1 to 6 nucleotide units repeated in tandem and tend to be highly polymorphic due to rapid mutation events. SSRs present advantages over other anonymous molecular markers like RAPD and AFLP as they occur randomly in a genome, allow identification of multiple alleles at single locus, and are co-dominant. These markers have been developed in number of crop species [6-8] for a broad range of applications such as genome mapping, genetic diversity studies and fingerprinting [4,9-11]. Recent advances in crop genomics enabled chickpea breeding community at a global scale to make significant improvements in the crop productivity by developing SSR markers from the various available resources like BAC-end sequences [12], transcriptome [13], SSR markers from SSR-enriched genomic library [10] and BAC libraries [14]. Recently, genome analysis of chickpea identified a total of 81,845 SSRs [15]. Primer pairs could be designed for 48,298 SSRs enabling them to be used as genetic markers. Given the huge number of SSRs, geneticists and breeders may be interested in selecting SSR markers from a specific genomic region. Therefore it is highly desirable to have SSR database for chickpea that enable chickpea community to select the SSR markers of choice. Such kind of SSR databases have been developed in some crops such as pigeonpea [16], sorghum, soybean, maize, rice [17] and cotton [18]. In view of above, this study reports a user friendly, comprehensive web based resource (CicArMiSatDB) detailing the information on SSRs present in the chickpea genome to facilitate use of SSRs as genetic markers in chickpea genetics and breeding applications. It is to be noted that the CicArMiSatDB not only contains the SSR markers for which primer pairs have already been reported but also highlight the ones (1,300 in total) which were validated in earlier studies.

Construction and content

The list of chickpea SSRs [19] and genomic features [20] were collected and stored in relational database tables of PostgresSQL (v9.2.4). Importantly, genomic locations of validated SSRs, from earlier studies [2,10,12,13,21-27] were collected, and highlighted amongst the existing SSRs (Additional file 1: Table S1).

Database architecture

The information on SSRs was stored in five database tables (Figure 1). Each SSR was represented with a unique identifier called SSR_ID. The description of database tables is as follows.

Figure 1

Database schema of CicArMiSatDB.

Database schema of CicArMiSatDB. 1. SSR_info table contains SSRs that have been classified into simple and the compound SSRs based on the complexity of the motif. This table describes each SSR with the type of SSR, its length and the motif (2 to 6 nucleotides). 2. SSR_primer table provides the primer sequences which can be used for the amplification and information like amplicon size and melting temperature. 3. SSR_genome table provides information on the genomic coordinates of the SSR, and the classification (information on the location of SSR in the Pseudo molecules, contigs and scaffolds sequence). 4. The SSRs may be located either in coding or non-coding regions. The SSR_gene table contains classification of these SSRs into genic and non-genic categories based on their location inferred from the annotation file (gff). This table also includes the genomic coordinates, orientation of the genes and provides the nearest gene information along with the distance for the non-genic SSRs. 5. Gene annotation table contains the functional annotation of the genes such as gene name, symbol, protein function, organism, pathway information and Gene Ontology (GO) annotations.

Implemented tools

1. BLAST: To retrieve a marker and associated information, various search interfaces were included. Genome wide search for SSR markers was implemented by integrating BLAST [28,29] software into the database. The users may wish to search the database with a nucleotide sequence (e.g., gene of interest) to find the nearest genic and non-genic SSRs, both upstream and downstream to the sequence of interest, which could be used as candidate marker for further applications. To this end, BLAST search has been integrated into the database which enables the user to input multiple fasta format sequences to search for homologous sequences in chickpea genome. The genome coordinates of best hit from the search are resolved and screened within a window of 0.1 million bases (on both directions) to identify the nearest genic and non-genic SSRs in the chickpea genome. 2. GBrowse: Generic genome browser (GBrowse) [30,31] was added to the database to visualize various genomic features like genes, CDS, SSRs etc. GBrowse enables visualization of the genomic features as well as comparison of SSRs in the database with the user provided SSRs in GFF [32] file format. The database is designed by integrating software components such as PostgresSQL (v9.2.4): to store the data in tables; Apache web server (v2.22): to access the data using web interface with the help of PHP (v5.4) and jQuery (v2.0) library was used to ease the implementation of a user friendly interface to the database.

Utility and discussion

Detailed analysis of chickpea genome through perl based MISA script [33] reported 48,298 SSRs [15]. The minimum numbers of repeat units observed in these SSRs were six for di-SSRs, five for tri-SSRs, four for tetra-SSRs, three for penta-SSRs and three for hexa-SSRs, with the longer loci generally having more alleles due to the greater potential for slippage [34]. Identified SSRs have been further classified in the database into simple and compound SSRs based on the complexity of the motif. Simple SSRs were found to be abundant in the genome constituting to 89.6% (43,273) of the total SSRs. In contrast, compound SSRs amount to only 10.4% (5,025) of the SSRs (Figure 2B). The most abundant simple SSR is di-SSRs (26,477) followed by the tri-SSRs (13,729), tetra-SSRs (2,368), penta-SSRs (421) and finally hexa-SSRs (278) (Figure 2A). The longest simple SSR was found to be hexa SSR with 49 repeating CAATTT motifs. The highest number of repeats was observed to be 132 in AT motif, (AT)132. Of the simple SSRs, the most frequently occurring motifs were AT (10,935, 41%) in di-SSRs, and AAT (1,820, 13.25%) in tri-SSRs.The SSRs classified based on genomic features (genic or non-genic) show that they occur predominantly in the non-genic regions (46,088, 95.42%) (Figure 2 C). On the other hand, the SSRs in genic regions were low (2,210, 4.57%) in number.

Figure 2

Distribution and classification of SSRs in CicArMiSatDB. A) Distribution of SSRs classified according to the motif repeats show the abundance of di and tri-nucleotide motifs. B) SSRs classified into simple and compound SSRs, simple SSRs constitute the majority of SSRs in chickpea genome. C) SSRs classified by their occurrence in genic and non-genic features show predominance of SSRs in non-genic regions.

Database as a tool to mine for known SSRs

The database search include simple and advance search with various options to explore the SSR information. Simple search will mine the database with any one of the listed options (see below) whereas advance search option could be used to mine SSRs by selecting two or more simple search criteria. The user can mine database using four options in the simple search as follows: 1. The type of the motif e.g. simple motif (classified into di, tri, tetra, penta and hexa repeats) and compound motif. 2. Based on the genomic locations of the SSRs, e.g. the ones found in regions like Contigs, Scaffolds and Pseudomolecules. 3. With a motif sequence of interest. 4. On the basis of genic and non-genic SSRs.Advanced search is implemented by combining 2 or more options of simple search. For example, one can search the simple SSR with the motif “TA” which is reported to be present in the pseudo-molecule number 5 (Ca5). The query result is tabulated with total number of SSRs found in the database along with genomic location as well as primers which could be used for amplification (Figure 3). Validated SSRs reported previously in the literature (1300 in number) have been highlighted with yellow color. Annotation information e.g. gene co-ordinates, orientation of the gene, gene symbols, function, UniProt ID, pathway information, gene ontology ID and gene ontology was also provided. However, in case of search for non-genic SSRs, similar information is displayed along with the details of nearest gene.

Figure 3

Simple SSR search result displaying chromosome co-ordinates, motif type, primers and other information.

Simple SSR search result displaying chromosome co-ordinates, motif type, primers and other information. The search result could further be optionally customized. For example, one could restrict or filter the number of SSRs displayed within the range 25–100 results per page in a table. The table can be sorted depending on the unique SSR-ID and the chromosome in which SSR is present. BLAST search was integrated into the database to find the nearest genic and non-genic SSR available for the query sequence identified in the chickpea genome thereby enabling to discover linked SSRs. User can click on the marker information displayed on the BLAST result page to visualize the marker details in the configured genome browser (GBrowse). Additional details such as the sequence of the SSR could be obtained by clicking on the expanding icon (“+” symbol).GBrowse enables the user to graphically visualize different details (gene, CDS) present in the genome by extracting information from the GFF file. User can customize the tracks displayed by selecting the genomic features of their interest from the “select tracks” tab and type a search term or landmark into the text field at the top of the page. This fetches the region of the genome that spans the landmark, and displays it in an image panel called the “detailed view”. The detailed view consists of 3 horizontal tracks, each of which contains a particular type of sequence feature like gene, CDS and predicted SSR (Figure 4).

Figure 4

A GBrowse view displaying genomic features.

A GBrowse view displaying genomic features. Further, one can upload set of custom markers in GFF format to GBrowse using “Add custom tracks” option of “custom tracks” tab. The users provided custom markers could be overlaid as track in GBrowse and visualize along with the database markers in order to confirm the novelty of SSRs. We hope to include more features such as upstream/downstream elements, search for multiple SSRs based on BLAST search, and export of search results in excel sheet format as further updates to the database. We wish to add track containing information of the existing QTLs in the GBrowse also additional feature could be added to specify the physical location of the primer pairs on chickpea genome with the SSR repeat motif flanked by the primer pair.

Conclusions

We have developed a comprehensive SSR database (CicArMiSatDB) for chickpea. The database includes powerful web-tools (BLAST and GBrowse) accessible with a user-friendly web interface to mine and filter the SSR markers. Advanced tools embedded in this database would help to query and visualize chickpea genome features. It classifies SSRs into genic and non-genic markers. Genic SSRs could be targeted for precise association with the trait of interest. The database is made openly accessible to the research community. It is developed to benefit the chickpea research in particular and legume research in general for both basic and applied studies.

Availability and requirements

CicArMiSatDB has an open access and provides an integrated web interface to search and filter the simple sequence repeats in chickpea genome. This database is freely available online at http://cicarmisatdb.icrisat.org and works well with the CSS3 enabled browsers like Mozilla Firefox and the Google Chrome and Internet Explorer (9.0 or above).

Abbreviations

BLAST: Basic local alignment search tool; QTL: Quantitative trait loci; CSS: Cascading style sheets; GFF: Generic feature format; PHP: Hypertext preprocessor.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Concept of the database was conceived by RKV. Design, development of database was done by DD, KM and RKV. Implementation and configuration of the GBrowse in the database was done by AWK and DD. GA and TS provided the validated SSR information from the literature. Suggestions and implementation guidance was provided by KM, TS, and RKV. DD together with RKV, AWK, KM, GA and TS prepared the MS and RKV finalized the manuscript. All authors read and approved the final manuscript.

Additional file 1: Table S1

List of published SSR markers included in the CicArMiSatDB database. Click here for file

26 in total

1. The generic genome browser: a building block for a model organism system database.

Authors: Lincoln D Stein; Christopher Mungall; ShengQiang Shu; Michael Caudy; Marco Mangone; Allen Day; Elizabeth Nickerson; Jason E Stajich; Todd W Harris; Adrian Arva; Suzanna Lewis
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

2. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers.

Authors: Rajeev K Varshney; Wenbin Chen; Yupeng Li; Arvind K Bharti; Rachit K Saxena; Jessica A Schlueter; Mark T A Donoghue; Sarwar Azam; Guangyi Fan; Adam M Whaley; Andrew D Farmer; Jaime Sheridan; Aiko Iwata; Reetu Tuteja; R Varma Penmetsa; Wei Wu; Hari D Upadhyaya; Shiaw-Pyng Yang; Trushar Shah; K B Saxena; Todd Michael; W Richard McCombie; Bicheng Yang; Gengyun Zhang; Huanming Yang; Jun Wang; Charles Spillane; Douglas R Cook; Gregory D May; Xun Xu; Scott A Jackson
Journal: Nat Biotechnol Date: 2011-11-06 Impact factor: 54.908

3. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement.

Authors: Rajeev K Varshney; Chi Song; Rachit K Saxena; Sarwar Azam; Sheng Yu; Andrew G Sharpe; Steven Cannon; Jongmin Baek; Benjamin D Rosen; Bunyamin Tar'an; Teresa Millan; Xudong Zhang; Larissa D Ramsay; Aiko Iwata; Ying Wang; William Nelson; Andrew D Farmer; Pooran M Gaur; Carol Soderlund; R Varma Penmetsa; Chunyan Xu; Arvind K Bharti; Weiming He; Peter Winter; Shancen Zhao; James K Hane; Noelia Carrasquilla-Garcia; Janet A Condie; Hari D Upadhyaya; Ming-Cheng Luo; Mahendar Thudi; C L L Gowda; Narendra P Singh; Judith Lichtenzveig; Krishna K Gali; Josefa Rubio; N Nadarajan; Jaroslav Dolezel; Kailash C Bansal; Xun Xu; David Edwards; Gengyun Zhang; Guenter Kahl; Juan Gil; Karam B Singh; Swapan K Datta; Scott A Jackson; Jun Wang; Douglas R Cook
Journal: Nat Biotechnol Date: 2013-01-27 Impact factor: 54.908

4. Analysis of BAC-end sequences (BESs) and development of BES-SSR markers for genetic mapping and hybrid purity assessment in pigeonpea (Cajanus spp.).

Authors: Abhishek Bohra; Anuja Dubey; Rachit K Saxena; R Varma Penmetsa; K N Poornima; Naresh Kumar; Andrew D Farmer; Gudipati Srivani; Hari D Upadhyaya; Ragini Gothalwal; S Ramesh; Dhiraj Singh; Kulbhushan Saxena; P B Kavi Kishor; Nagendra K Singh; Christopher D Town; Gregory D May; Douglas R Cook; Rajeev K Varshney
Journal: BMC Plant Biol Date: 2011-03-29 Impact factor: 4.215

5. Advancing the STMS genomic resources for defining new locations on the intraspecific genetic linkage map of chickpea (Cicer arietinum L.).

Authors: Rashmi Gaur; Niroj K Sethy; Shalu Choudhary; Bhumika Shokeen; Varsha Gupta; Sabhyata Bhatia
Journal: BMC Genomics Date: 2011-02-17 Impact factor: 3.969

6. Novel SSR markers from BAC-end sequences, DArT arrays and a comprehensive genetic map with 1,291 marker loci for chickpea (Cicer arietinum L.).

Authors: Mahendar Thudi; Abhishek Bohra; Spurthi N Nayak; Nicy Varghese; Trushar M Shah; R Varma Penmetsa; Nepolean Thirunavukkarasu; Srivani Gudipati; Pooran M Gaur; Pawan L Kulwal; Hari D Upadhyaya; Polavarapu B Kavikishor; Peter Winter; Günter Kahl; Christopher D Town; Andrzej Kilian; Douglas R Cook; Rajeev K Varshney
Journal: PLoS One Date: 2011-11-15 Impact factor: 3.240

7. PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome.

Authors: Vasu Arora; M A Iquebal; Anil Rai; Dinesh Kumar
Journal: Database (Oxford) Date: 2013-02-08 Impact factor: 3.451

8. Large-scale development of cost-effective single-nucleotide polymorphism marker assays for genetic mapping in pigeonpea and comparative mapping in legumes.

Authors: Rachit K Saxena; R Varma Penmetsa; Hari D Upadhyaya; Ashish Kumar; Noelia Carrasquilla-Garcia; Jessica A Schlueter; Andrew Farmer; Adam M Whaley; Birinchi K Sarma; Gregory D May; Douglas R Cook; Rajeev K Varshney
Journal: DNA Res Date: 2012-10-26 Impact factor: 4.458

9. Comparative analysis of kabuli chickpea transcriptome with desi and wild chickpea provides a rich resource for development of functional markers.

Authors: Gaurav Agarwal; Shalu Jhanwar; Pushp Priya; Vikash K Singh; Maneesha S Saxena; Swarup K Parida; Rohini Garg; Akhilesh K Tyagi; Mukesh Jain
Journal: PLoS One Date: 2012-12-27 Impact factor: 3.240

10. Integrated consensus map of cultivated peanut and wild relatives reveals structures of the A and B genomes of Arachis and divergence of the legume genomes.

Authors: Kenta Shirasawa; David J Bertioli; Rajeev K Varshney; Marcio C Moretzsohn; Soraya C M Leal-Bertioli; Mahendar Thudi; Manish K Pandey; Jean-Francois Rami; Daniel Foncéka; Makanahally V C Gowda; Hongde Qin; Baozhu Guo; Yanbin Hong; Xuanqiang Liang; Hideki Hirakawa; Satoshi Tabata; Sachiko Isobe
Journal: DNA Res Date: 2013-01-12 Impact factor: 4.458

13 in total

1. MusatransSSRDB (a transcriptome derived SSR database) - An advanced tool for banana improvement.

Authors: Suthanthiram Backiyarani; Arumugam Chandrasekar; Subbaraya Uma; Marimuthu Somasundaram Saraswathi
Journal: J Biosci Date: 2019-03 Impact factor: 1.826

Review 2. Current advances in chickpea genomics: applications and future perspectives.

Authors: Uday Chand Jha
Journal: Plant Cell Rep Date: 2018-06-02 Impact factor: 4.570

3. Development of Highly Informative Genome-Wide Single Sequence Repeat Markers for Breeding Applications in Sesame and Construction of a Web Resource: SisatBase.

Authors: Komivi Dossa; Jingyin Yu; Boshou Liao; Ndiaga Cisse; Xiurong Zhang
Journal: Front Plant Sci Date: 2017-08-22 Impact factor: 5.753

4. Boechera microsatellite website: an online portal for species identification and determination of hybrid parentage.

Authors: Fay-Wei Li; Catherine A Rushworth; James B Beck; Michael D Windham
Journal: Database (Oxford) Date: 2017-01-01 Impact factor: 3.451

5. Genome-Wide Novel Genic Microsatellite Marker Resource Development and Validation for Genetic Diversity and Population Structure Analysis of Banana.

Authors: Manosh Kumar Biswas; Mita Bagchi; Dhiman Biswas; Jennifer Ann Harikrishna; Yuxuan Liu; Chunyu Li; Ou Sheng; Christoph Mayer; Ganjun Yi; Guiming Deng
Journal: Genes (Basel) Date: 2020-12-09 Impact factor: 4.096

6. Co-localization of genomic regions associated with seed morphology and composition in a desi chickpea (Cicer arietinum L.) population varying in seed protein concentration.

Authors: Runfeng Wang; Manu P Gangola; Craig Irvine; Pooran M Gaur; Monica Båga; Ravindra N Chibbar
Journal: Theor Appl Genet Date: 2019-01-19 Impact factor: 5.699

7. CicArVarDB: SNP and InDel database for advancing genetics research and breeding applications in chickpea.

Authors: Dadakhalandar Doddamani; Aamir W Khan; Mohan A V S K Katta; Gaurav Agarwal; Mahendar Thudi; Pradeep Ruperao; David Edwards; Rajeev K Varshney
Journal: Database (Oxford) Date: 2015-08-19 Impact factor: 3.451

8. CTDB: An Integrated Chickpea Transcriptome Database for Functional and Applied Genomics.

Authors: Mohit Verma; Vinay Kumar; Ravi K Patel; Rohini Garg; Mukesh Jain
Journal: PLoS One Date: 2015-08-31 Impact factor: 3.240

9. Development of genome-wide informative simple sequence repeat markers for large-scale genotyping applications in chickpea and development of web resource.

Authors: Swarup K Parida; Mohit Verma; Santosh K Yadav; Supriya Ambawat; Shouvik Das; Rohini Garg; Mukesh Jain
Journal: Front Plant Sci Date: 2015-08-21 Impact factor: 5.753

10. PlantFuncSSR: Integrating First and Next Generation Transcriptomics for Mining of SSR-Functional Domains Markers.

Authors: Gaurav Sablok; Antonio J Pérez-Pulido; Thac Do; Tan Y Seong; Carlos S Casimiro-Soriguer; Nicola La Porta; Peter J Ralph; Andrea Squartini; Antonio Muñoz-Merida; Jennifer A Harikrishna
Journal: Front Plant Sci Date: 2016-06-27 Impact factor: 5.753