Literature DB >> 26289427

CicArVarDB: SNP and InDel database for advancing genetics research and breeding applications in chickpea.

Dadakhalandar Doddamani¹, Aamir W Khan¹, Mohan A V S K Katta¹, Gaurav Agarwal¹, Mahendar Thudi¹, Pradeep Ruperao², David Edwards³, Rajeev K Varshney⁴.

Abstract

Molecular markers are valuable tools for breeders to help accelerate crop improvement. High throughput sequencing technologies facilitate the discovery of large-scale variations such as single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs). Sequencing of chickpea genome along with re-sequencing of several chickpea lines has enabled the discovery of 4.4 million variations including SNPs and InDels. Here we report a repository of 1.9 million variations (SNPs and InDels) anchored on eight pseudomolecules in a custom database, referred as CicArVarDB that can be accessed at http://cicarvardb.icrisat.org/. It includes an easy interface for users to select variations around specific regions associated with quantitative trait loci, with embedded webBLAST search and JBrowse visualisation. We hope that this database will be immensely useful for the chickpea research community for both advancing genetics research as well as breeding applications for crop improvement. Database URL: http://cicarvardb.icrisat.org.

Entities: Chemical Disease Species

Mesh：

Year: 2015 PMID： 26289427 PMCID： PMC4541373 DOI： 10.1093/database/bav078

Source DB: PubMed Journal: Database (Oxford) ISSN： 1758-0463 Impact factor: 3.451

Introduction

Chickpea (Cicer arietinum) ranks second in production among grain legumes after common bean. It is a challenging task for breeders to increase crop production to meet the growing demand for this crop. Genetic and genomic resources such as molecular markers, genetic and physical maps and trait associated markers are valuable sources for breeders for crop improvement. Significant efforts have been made by chickpea breeders and over 350 cultivars have been released so far, India being the largest grower of chickpea (1). Recently sequenced draft genome of kabuli chickpea (CDC frontier), along with whole genome re-sequencing (WGRS) data of 29 genotypes and 61 sequenced using restriction site-associated DNA (RAD) have provided genomic resources to support functional genomics and molecular breeding activities for chickpea improvement (2). Single nucleotide polymorphisms (SNPs) are common genetic variants used to identify candidate genes and genotype–phenotype association studies (3). Most significant uses are estimation of breeding value of genotypes and marker-assisted selection studies (4). Concerted efforts of diverse research groups (3, 5–12) have led to the development of many SNP markers. Recent advances in DNA sequencing technologies such as next generation sequencing (NGS) facilitate the discovery of large numbers of SNPs at relatively low cost (13–15). Publicly available alignment algorithms and variant detection tools have facilitated mapping of short read sequences and detection of variations with respect to reference genome sequence (16). A SNP database should help users access a wide variety of biological data by querying with simple commands in both an easy and comprehensive manner. Considering the significance of markers for crop improvement, several attempts have been made to develop marker repositories in various crops, e.g. rice (17, 18), maize (19), wheat (20), soybean (21, 22), barley (18, 23), brassica (18) etc. autoSNPdb (http://autosnpdb.appliedbioinformatics.com.au/) (18, 23) is generalised SNP database storing information for species namely rice, wheat, barley and brassica. Legume information system (LIS) and Chickpea Genomic Web Resource (CGWR) mainly provides accessibility to the chickpea genome and its features through genome browsers and quantitative trait loci maps to the users. The current study brings together all publically available information on SNPs/InDels to the chickpea research community. A SSR marker repository entitled CicArMiSatDB (24) for chickpea was developed and now similar efforts have been initiated to develop a SNP database in chickpea to make the SNP data publically available mainly for assisting breeders in varietal improvement. In the past, several thousand SNPs were identified and reported using RNA-sequencing technologies (5, 8). With the availability of reference genome along with re-sequence data from 90 chickpea lines using WGRS or RAD sequencing, several million variants including SNPs have been identified (2). With an objective to enhance utilization of the identified SNPs/InDels for chickpea genetics and breeding applications, this study reports a database containing large-scale variants along with characterization in the genome.

Material and methods

SNP and InDel variations were identified from WGRS of 29 varieties and RAD sequencing of 61 varieties (2). SNPs located on the pseudo-molecules were used to develop CicArVarDB. The SNPs located on the pseudomolecules were arranged as a binary matrix based on their presence/absence across the 90 genotypes. The final data set was stored in a relational database constructed using PostgreSQL (v9.2.4). Genes and gene annotations were downloaded from http://www.icrisat.org/gt-bt/ICGGC/genomedata.zip. Functional annotation for the predicted chickpea gene set was done using BLASTP (25) comparison with the Swiss-Prot and TrEMBL databases with an e-value threshold of 1e−05. Corresponding UniProt IDs were obtained. The annotated genes were classified based on gene-ontology, domain specificity, protein family and the pathways they featured in from UniProt Knowledge Base (UniProtKB) (26) using a custom made perl script (Uniprot_to_functional_annotation.pl; https://github.com/CEG-ICRISAT/cicarvardb-scripts). Processed SNP information along with genes and their annotation were split into five tables. The schematic overview of the CicArVarDB is described in Figure 1.

Figure 1.

Schematic outline of CicArVarDB.

Schematic outline of CicArVarDB. The web interface was developed using the web-builder platform WordPress. Currently the database, CicArVarDB is hosted on Amazon cloud server with two cores and 8 GB RAM. It is configured with BLAST and JBrowse (27) to enable sequence based searching for SNP containing loci and visualisation of the alignment of reads mapped to the reference genome sequence.

Results and discussion

From an initial dataset of 4.4 million genomic variants reported by Varshney et al. (2), around 2.5 million were removed as they could not be mapped to any of the chickpea pseudomolecules. A final set of 1 965 803 variations, included SNPs (88.48%) and small length (1–10 bp) insertions (5.27%) and deletions (6.23%). InDels of length 1 and 2 bp contributed more than 70% of total small InDels. The minimum variations amongst the WGRS lines was found to be 278 530 in CDC Luna, while the maximum variations was 686 712 in ICCV 88202. In the RAD sequenced genotypes, the minimum variations count was 2580 in ICC 15973 and the maximum, 19 160, was reported in ICCV 94954 (Figure 2a,b). Detailed distribution of variations across pseudo-molecules and other features can be referred to in Supplementary Table S1.

Figure 2.

Genome wide distribution of variations across 90 chickpea genotypes. a) distribution of variation among 29 lines re-sequenced using WGRS approach. b) distribution of variations among 61 lines re-sequenced using RAD-Seq approach. Variations were classified on the basis of their genomic locations using SnpEff (28). The most abundant variations (89.21%) were found in intergenic regions with a density of 6.38 variations per kilo base pair while the variation density in exonic and intronic regions were found to be 2.64 (3.68%) and 3.16 (7.08%), respectively (Figure 3; Table 1). Variations within coding regions were further functionally classified into missense (57.94%), nonsense (2.22%) and silent (39.83%) mutations.

Figure 3.

Distribution of variations on the basis of the genomic locations.

Table 1.

Variation statistics across the chickpea genome

Region	Length	SNPs	Deletions	Insertion	Total	Density (Per Kb)	Change rate
CDS	27 941 821	71 862	1240	692	73 794	2.64	378.65
Intron	45 204 800	119 662	12 677	10 617	142 956	3.16	316.21
Intergenic	274 100 756	1 547 940	108 729	92 384	1 749 053	6.38	156.71
Total	347 247 377	1 739 464	122 646	103 693	1 965 803	5.66	176.64

Distribution of variations on the basis of the genomic locations. Variation statistics across the chickpea genome The overall transitions/transversions ratio (Ts:Tv) was 1.44 which is slightly lower than that observed in maize nuclear SNPs (1.48) and grass chloroplast (1.3) (14, 16, 23, 29–31). Using SnpEff, the impact of variations on the protein coding sequences was classified into low, moderate and high effect variations. The low effect variations (24 303) included synonymous SNPs and the variations observed in splice site region (Sequence variant where a change has occurred either within 1–3 bases of the exon or 3–8 bases of the intron). Moderate effect variations (36 087) included the ones which change coding region by insertion, deletion or alteration of the codon. Variations that lead to loss of exon, changes in splice site acceptor (two bases before exon start, excluding the first exon) or donor (two bases after coding exon end, excluding for the last exon) and frame shift mutations, etc were grouped as high effect variations (2359). Functional annotation of genes containing the variations was carried out by searching for homologous sequences using BLASTP. Genes were grouped into categories including ‘Biogenesis’, ’Biosynthesis’, ’Degradation’, ‘Metabolism’, ‘Modification’ and ’Photosynthesis’. The detailed gene annotation can be obtained by clicking on Gene-ID featuring the corresponding SNP/InDel. Annotation includes UniProt, GO and InterProScan (32) IDs and pathway classification beside other details. User can fetch more details about the genes by clicking on respective UniProt, GO and InterProScan IDs. The maximum number of variations (6128), across 668 genes was observed in the biosynthesis group which included subcategories such as biosynthesis of protein, amino acid, plant hormone etc. The metabolism group contained a total of 672 genes which exhibited 5901 variations followed by Modification group, mainly involved in modification of proteins, containing 741 genes with 4831 variations (Figure 4).

Figure 4.

Distribution of variations based on pathway annotation.

An overview of CicArVarDB

CicArVarDB is a genome variation mining tool for chickpea and it has two different approaches to screen the variations. A basic search provides options to narrow down the reported variations. The user has the following options to perform a variations search. Keyword search: Variations can be searched based on the gene, SNP-ID or the keywords such as ‘disease resistance’, ‘kinase’, ‘transporter’ etc. Variation search using keywords is supported by boolean operators such as ‘AND’, ‘OR’, ’NOT’ for a combination of words, with space as delimiter, supporting search for the exact string match. The keyword search also supports search for a substring of a given single keyword (no spaces). The search output includes the location of variations if present in the gene of interest. Location: Users can search for the presence of variations within specified genome co-ordinates. SNP classification: Variations are classified according to their location in the genome such as intergenic, intronic and CDS. Variations within the CDS regions were further classified into synonymous and non-synonymous. Users can search for variations within these categories. Type of variants: Variations were categorized as insertions, deletions and substitutions to facilitate the search of variants of choice. Between any two genotypes: Search can be executed for variations between any two genotypes of interest. Sequencing type: Currently, the database hosts variations obtained from WGRS and RAD sequencing data. Type of genotypes: Sequenced desi and kabuli chickpea genotypes can be filtered for variants occurring in either of these types. Functional class: SNPs classified as missense, nonsense and silent can be retrieved. Effect intensity: On the basis of the severity of their effects, variations were differentiated into low, moderate and high effect variations. These variations may occur in the intergenic, intronic or exonic regions in the genome. Pathway: Search can be carried out for the variations affecting a gene featured in a specific biological pathway. An advanced search method provides the user with more comprehensive selection criteria to choose at least any two of the listed options: Genomic location SNP classification SNP type Selecting multiple genotypes An advanced search can be used to attain very specific and precise number of variations. The results of both basic and advanced searches includes a table of genomic co-ordinates, bases present at the SNP position in reference genome sequence and corresponding variant base call. A click on ‘+’ symbol present at the beginning of each record expands the row with two columns displaying genotype names which match to the reference and variant base, respectively. Variation can be visualized in JBrowse by clicking on the SNP ID provided in the results table. The basic search method numbers 6 and 7 list out the genotypes depending on the sequencing type (WGRS/RAD) and the type (desi/kabuli) of genotype, respectively. Selecting a particular genotype from this table produces a table detailing the variations present in it. The user may wish to know the presence of reported variations in a sequence of interest. To perform this, webBLAST has been implemented as an additional tool to help the user find a homologous region in the reference sequence with a submitted query sequence and explore the presence of variations within a 10 kb flanking region. BLASTN search is integrated to find such homologous region in the chickpea genome for the user given query sequence and the co-ordinates for resulting HSPs are used to identify the number of SNPs present in the region. The result page displays the SNP information along with its presence in the gene and associated gene annotation. JBrowse, a Java based genome browser is embedded in the database. JBrowse uses a client side scripting which makes the browser faster and allows easy scaling of large genome regions unlike other genome browsers like GBrowse (which is implemented by CGI protocols). These features of JBrowse enables the users to upload BAM files as tracks to visualize alignment files faster as compared to the CGI-based genome browsers and provides pictorial representation for the presence of variations. By default JBrowse displays seven genoytpes' alignment files as tracks along with gene annotation track. It contains an option of selecting one or more genotypes. An option is provided for selecting multiple genotypes at once based on the grouping like desi/kabuli/wild or WGRS/RAD sequencing.

Conclusions

Advances in high throughput sequencing have led to the detection of large number of genome wide SNP markers that are valuable tools in plant genomics. These molecular markers are being utilized extensively in breeding programs in many crops. Chickpea genome sequence provides a comprehensive resource for the mining of SNP markers. CicArVarDB is a comprehensive resource which lists SNPs/InDels reported in 90 chickpea genotypes. Moreover, the variations reported have been classified based on their effects and distribution on genome. The variations containing sequences have also been functionally annotated with UniProtKB and InterProScan. This variation database will support the application of existing genomic resources like several thousand simple sequence repeats (SSRs), several million SNPs, high-density diversity array technology (DArT markers) and Illumina GoldenGate assays, high-density genetic maps, transcriptome assemblies and the draft genome sequence of chickpea (33) to chickpea improvement programmes. CicArVarDB has been designed with an intent to help breeders/researchers mine for SNP markers and is supported by a user friendly web interface. It also implements an easy to use visualization tool JBrowse, which provides a pictorial representation of the variations to the user.

Future directions

The database will be updated periodically based on the availability of the SNP/InDel datasets for genotypes which will be sequenced in future. SNPs/InDels which will be validated in different studies will also be incorporated in the database. We intend to include more features in the database including conversion of SNPs into putative CAPS markers and designing primers for SNPs. Options will be provided to export the SNPs in file formats accepted by various high-throughput SNP genotyping platforms such as the GoldenGate Genotyping Technology (GGGT; Illumina, San Diego, CA, USA), SNPStream (Beckman Coulter, USA), GeneChip (Affymetrix, USA) and KASPar (KBio science, UK).

Availability and requirements

CicArVarDB is open access and available at http://cicarvardb.icrisat.org. The web interface and the genome browser work with latest browser versions of Mozilla Firefox (10 and later), Google Chrome (17 and later), Apple Safari (5 and later, 6 is required for BAM, BigWig, VCF + Tabix) and Microsoft Internet Explorer (9 and later, 10 is required for BAM, BigWig, VCF + Tabix).

Supplementary Data

Supplementary data are available at Database Online.

31 in total

Review 1. Applications of single nucleotide polymorphisms in crop genetics.

Authors: Antoni Rafalski
Journal: Curr Opin Plant Biol Date: 2002-04 Impact factor: 7.834

2. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement.

Authors: Rajeev K Varshney; Chi Song; Rachit K Saxena; Sarwar Azam; Sheng Yu; Andrew G Sharpe; Steven Cannon; Jongmin Baek; Benjamin D Rosen; Bunyamin Tar'an; Teresa Millan; Xudong Zhang; Larissa D Ramsay; Aiko Iwata; Ying Wang; William Nelson; Andrew D Farmer; Pooran M Gaur; Carol Soderlund; R Varma Penmetsa; Chunyan Xu; Arvind K Bharti; Weiming He; Peter Winter; Shancen Zhao; James K Hane; Noelia Carrasquilla-Garcia; Janet A Condie; Hari D Upadhyaya; Ming-Cheng Luo; Mahendar Thudi; C L L Gowda; Narendra P Singh; Judith Lichtenzveig; Krishna K Gali; Josefa Rubio; N Nadarajan; Jaroslav Dolezel; Kailash C Bansal; Xun Xu; David Edwards; Gengyun Zhang; Guenter Kahl; Juan Gil; Karam B Singh; Swapan K Datta; Scott A Jackson; Jun Wang; Douglas R Cook
Journal: Nat Biotechnol Date: 2013-01-27 Impact factor: 54.908

Review 3. Advances in genetics and molecular breeding of three legume crops of semi-arid tropics using next-generation sequencing and high-throughput genotyping technologies.

Authors: Rajeev K Varshney; Himabindu Kudapa; Manish Roorkiwal; Mahendar Thudi; Manish K Pandey; Rachit K Saxena; Siva K Chamarthi; S Murali Mohan; Nalini Mallikarjuna; Hari Upadhyaya; Pooran M Gaur; L Krishnamurthy; K B Saxena; Shyam N Nigam; Suresh Pande
Journal: J Biosci Date: 2012-11 Impact factor: 1.826

4. Large-scale development of cost-effective SNP marker assays for diversity assessment and genetic mapping in chickpea and comparative mapping in legumes.

Authors: Pavana J Hiremath; Ashish Kumar; Ramachandra Varma Penmetsa; Andrew Farmer; Jessica A Schlueter; Siva K Chamarthi; Adam M Whaley; Noelia Carrasquilla-Garcia; Pooran M Gaur; Hari D Upadhyaya; Polavarapu B Kavi Kishor; Trushar M Shah; Douglas R Cook; Rajeev K Varshney
Journal: Plant Biotechnol J Date: 2012-06-16 Impact factor: 9.803

5. Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa.

Authors: Pavana J Hiremath; Andrew Farmer; Steven B Cannon; Jimmy Woodward; Himabindu Kudapa; Reetu Tuteja; Ashish Kumar; Amindala Bhanuprakash; Benjamin Mulaosmanovic; Neha Gujaria; Laxmanan Krishnamurthy; Pooran M Gaur; Polavarapu B Kavikishor; Trushar Shah; Ramamurthy Srinivasan; Marc Lohse; Yongli Xiao; Christopher D Town; Douglas R Cook; Gregory D May; Rajeev K Varshney
Journal: Plant Biotechnol J Date: 2011-05-25 Impact factor: 9.803

6. UniProt Knowledgebase: a hub of integrated protein data.

Authors: Michele Magrane
Journal: Database (Oxford) Date: 2011-03-29 Impact factor: 3.451

7. Soybean Knowledge Base (SoyKB): a web resource for soybean translational genomics.

Authors: Trupti Joshi; Kapil Patil; Michael R Fitzpatrick; Levi D Franklin; Qiuming Yao; Jeffrey R Cook; Zheng Wang; Marc Libault; Laurent Brechenmacher; Babu Valliyodan; Xiaolei Wu; Jianlin Cheng; Gary Stacey; Henry T Nguyen; Dong Xu
Journal: BMC Genomics Date: 2012-01-17 Impact factor: 3.969

8. InterProScan 5: genome-scale protein function classification.

Authors: Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal: Bioinformatics Date: 2014-01-21 Impact factor: 6.937

9. AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants.

Authors: Chris Duran; Nikki Appleby; Terry Clark; David Wood; Michael Imelfort; Jacqueline Batley; David Edwards
Journal: Nucleic Acids Res Date: 2008-10-14 Impact factor: 16.971

10. Next generation characterisation of cereal genomes for marker discovery.

Authors: Paul Visendi; Jacqueline Batley; David Edwards
Journal: Biology (Basel) Date: 2013-11-25

8 in total

Review 1. Current advances in chickpea genomics: applications and future perspectives.

Authors: Uday Chand Jha
Journal: Plant Cell Rep Date: 2018-06-02 Impact factor: 4.570

Review 2. Assessing and Exploiting Functional Diversity in Germplasm Pools to Enhance Abiotic Stress Adaptation and Yield in Cereals and Food Legumes.

Authors: Sangam L Dwivedi; Armin Scheben; David Edwards; Charles Spillane; Rodomiro Ortiz
Journal: Front Plant Sci Date: 2017-08-29 Impact factor: 5.753

3. Discovery of Putative Herbicide Resistance Genes and Its Regulatory Network in Chickpea Using Transcriptome Sequencing.

Authors: Mir A Iquebal; Khela R Soren; Priyanka Gangwar; P S Shanmugavadivel; K Aravind; Deepak Singla; Sarika Jaiswal; Rahul S Jasrotia; Sushil K Chaturvedi; Narendra P Singh; Rajeev K Varshney; Anil Rai; Dinesh Kumar
Journal: Front Plant Sci Date: 2017-06-07 Impact factor: 5.753

4. KASPspoon: an in vitro and in silico PCR analysis tool for high-throughput SNP genotyping.

Authors: Alsamman M Alsamman; Shafik D Ibrahim; Aladdin Hamwieh
Journal: Bioinformatics Date: 2019-09-01 Impact factor: 6.937

5. CicerSpTEdb: A web-based database for high-resolution genome-wide identification of transposable elements in Cicer species.

Authors: Morad M Mokhtar; Alsamman M Alsamman; Haytham M Abd-Elhalim; Achraf El Allali
Journal: PLoS One Date: 2021-11-11 Impact factor: 3.240

Review 6. Genetic Augmentation of Legume Crops Using Genomic Resources and Genotyping Platforms for Nutritional Food Security.

Authors: Romesh K Salgotra; Charles Neal Stewart
Journal: Plants (Basel) Date: 2022-07-18

7. Exploring Chickpea Germplasm Diversity for Broadening the Genetic Base Utilizing Genomic Resourses.

Authors: Rajesh Kumar Singh; Charul Singh; B S Chandana; Rohit K Mahto; Ranjana Patial; Astha Gupta; Vijay Gahlaut; Aladdin Hamwieh; H D Upadhyaya; Rajendra Kumar
Journal: Front Genet Date: 2022-08-04 Impact factor: 4.772

8. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

Authors: Mahendar Thudi; Aamir W Khan; Vinay Kumar; Pooran M Gaur; Krishnamohan Katta; Vanika Garg; Manish Roorkiwal; Srinivasan Samineni; Rajeev K Varshney
Journal: BMC Plant Biol Date: 2016-01-27 Impact factor: 4.215

8 in total