Mining and characterization of EST derived microsatellites in Curcuma longa L.

Raj Kumar Joshi, Ananya Kuanar, Sujata Mohanty, Enketeswara Subudhi, Sanghamitra Nayak.

Abstract

Turmeric (Curcuma longa L.) (Family: Zingiberaceae) is a perennial rhizomatous herbaceous plant that has been used as a spice since time immemorial. Turmeric plants are also widely known for their medicinal applications. EST-derived SSRs (simple sequence repeats) are a free by-product of the currently expanding EST (expressed sequence tag) databases. SSRs have been widely applied as molecular markers in genetic studies. The development of high-throughput methods for detecting SSRs has given a new dimension to their use as molecular markers. The software tool SciRoKo was used to mine class I SSRs in a Curcuma EST database comprising 12953 sequences. A total of 568 non-redundant SSR loci were detected, with an average of one SSR per 14.73 kb of EST. Trinucleotide repeats were the most abundant among the 1-6-nucleotide repeat types, accounting for 41.19% of the total, followed by mononucleotide (20.07%) and hexanucleotide repeats (15.14%). Among all the repeat motifs, (A/T)n accounted for the highest proportion, followed by (AGG)n. The detected SSRs can be used to design primers that serve as markers for constructing saturated genetic maps and conducting comparative genomic studies in different Curcuma species.

Keywords:  Curcuma longa; Expressed sequence tags; SciRoKo; short sequence repeats

Year:  2010        PMID: 21364792      PMCID: PMC3040487          DOI: 10.6026/97320630005128

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

The genus Curcuma of the family Zingiberaceae comprises 80 species distributed across Asia, South East Asia and Africa [1]. Turmeric, also known as the “golden spice”, is one of the most important herbs in tropical and subtropical countries. Turmeric rhizome is valued the world over and has been used since ancient times as a spice, food preservative, coloring agent, and in traditional systems of medicine [2]. Its medicinal uses are diverse, ranging from cosmetic face cream to the prevention of Alzheimer's disease. Turmeric has also been described as the queen of natural Cox-2 inhibitors [3]. India is the world's largest producer and exporter of turmeric, followed by China, Indonesia, Bangladesh and Thailand [4]. The International Trade Centre, Geneva, has estimated an annual growth rate of 10% in the world demand for turmeric. Conventional crop improvement methods are not suitable for turmeric because it is completely sterile and propagates exclusively by vegetative means. Characterization of Curcuma longa using molecular markers is very limited, apart from a few sporadic reports on isozyme studies and genetic stability studies using RAPD [5, 6]. Moreover, it is well known that the genotypic diversity of exclusively asexually reproducing plants like turmeric will be lost in the long course of evolution. Hence, the development of reliable and reproducible molecular markers in turmeric is essential for assessing genetic diversity for germplasm conservation and crop improvement. Microsatellites, or simple sequence repeats (SSRs), are stretches of DNA consisting of tandemly repeated short units of 1-6 base pairs in length. Compared with other molecular markers, SSRs are advantageous because of their simplicity, high information content, and codominant nature, and because they can be rapidly screened and analyzed by polymerase chain reaction (PCR) and gel electrophoresis.
In addition, SSR loci are present not only in the non-coding regions of genes but are also widely distributed in the coding regions. Microsatellites are categorized into two groups: class I hypervariable markers, with repeats ≥20 bp in length, and class II potentially variable markers, with repeats <20 bp in length. The standard method for developing genomic SSRs is highly time-consuming and labor-intensive [7]. Recent advances in Curcuma genomic technologies have generated a large number of expressed sequence tags (ESTs) that have been made available in public databases, thereby offering an opportunity to develop EST-derived SSR markers by data mining. ESTs are short, single-pass sequence reads from mRNA (cDNA) [8], representing a snapshot of the genes expressed in a given tissue and/or at a given developmental stage. As of July 2010, GenBank had released 12593 EST sequences from Curcuma longa. The use of EST- or cDNA-based SSRs has been reported for several species including grape [9], sugarcane [10], durum wheat [11] and rye [12]. In view of the above, the objective of the research described in this paper was to assess the potential of existing public databases for the discovery of simple sequence repeats. We mined the updated EST tissue libraries of Curcuma longa to find SSR polymorphisms, using the SSR detection software SciRoKo. Other SSR detection tools are available, such as MISA [7], SSRFinder [13], SSRIT [14], TRF [15], TROLL [16], Sputnik (http://espressosoftware.com/pages/sputnik.jsp), Modified Sputnik I [17] and Modified Sputnik II [18], but SciRoKo (SSR Classification and Investigation by Robert Kofler) [19] is the only one that combines a user-friendly interface with statistical analysis of genomic microsatellites and presents the results as HTML files.

Methodology

The EST database of NCBI contains 12953 Curcuma longa expressed sequence tags. We mined 12593 EST sequences from two tissue libraries: rhizomes, with 6870 sequences (DY395309-DY388440), and leaves, with 5723 sequences (DY388439-DY382717). The EST sequences were screened against the UniVec database from NCBI (ftp://ftp.ncbi.nih.gov/pub/UniVec/) to detect vector and adapter sequences using the program Cross_Match [Li et al 2006] with the following parameters: minmatch ≥13 and minscore ≥20. PolyA/T tails and X characters were then removed using the EST_trimmer.pl script (http://pgrc.ipk-gatersleben.de/misa/download/est_trimmer.pl) until no stretch of (A/T)5 or (X)1 was present in a window of 100 bp at the 5′ or 3′ end, respectively. The CAP3 program was used to assemble the EST sequences into contigs to create a non-redundant dataset. The SSR detection tool SciRoKo version 1.0 [19] was used to detect EST-SSR loci. SciRoKo requires input in FASTA format.
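To illustrate what mining perfect microsatellites from a sequence involves, here is a minimal Python sketch. It is not SciRoKo's algorithm or scoring scheme. It is a plain regular-expression search for uninterrupted repeats of 1-6 nucleotide units, and the 20 bp minimum length is an assumed class I cut-off. The function names are hypothetical.

```python
import re


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_perfect_ssrs(seq, min_length=20, max_unit=6):
    """Return sorted (start, unit, repeat_count) tuples for perfect repeats.

    min_length is an assumed threshold in bp, not a SciRoKo parameter.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # Smallest repeat count whose total span reaches min_length.
        min_repeats = -(-min_length // k)
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip e.g. "AA" or "AAGAAG", which are found with a shorter unit.
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GATTACA" + "AAG" * 8 + "CCGTAC" + "A" * 22 + "GCGC"
    for start, unit, count in find_perfect_ssrs(demo):
        print(start, unit, count)
```

The sketch reports only perfect repeats. Dedicated tools such as those listed in the Background also handle imperfect and compound repeats, which this example does not attempt.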

Discussion

Large-scale sequencing of expressed sequence tags and complete genomes offers information of use to plant breeding programs. With the completion of the first crop genome sequencing projects [20], the potential for new technology to impact plant breeding has never been greater. A total of 12953 redundant EST sequences, representing about 8.4 Mb of the Curcuma longa genome, were retrieved from the NCBI database. During pre-processing, 38366 bp of empty vector, low-quality sequence and poly-A/T tails were removed. After redundancy analysis, 7139 unique sequences with a combined length of 5.11 Mb were obtained and used for mining hypervariable class I microsatellites. Using the SciRoKo SSR mining program to search for SSRs with 1-6-nucleotide repeat motifs, 568 hypervariable SSR loci were observed (Table 3; see supplementary material). The frequency of SSR loci in turmeric ESTs was one SSR per 14.73 kb of EST sequence (Table 1; see supplementary material), which is higher than the previously reported one SSR per 17.96 kb [21]. Cardle et al. [22] estimated the average distances between SSRs in sets of non-redundant ESTs in poplar (1/14.0 kb), cotton (1/20.0 kb), Arabidopsis (1/13.8 kb), maize (1/8.1 kb), rice (1/3.4 kb), tomato (1/11.1 kb) and soybean (1/7.4 kb). This suggests that, as the transcript data available for plants increase, SSR estimation in ESTs will become more precise and reliable. The mined EST-SSRs were classified according to their structure into the simple type, with a single motif, and the compound type, with more than two motifs. Among the 568 EST-SSRs, most (98.92%) consisted of simple repeats with no interruptions in the motif, whereas only six loci (1.05%) were of the compound type (Table 2; see supplementary material).
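The "one SSR per N kb" figures quoted above are ratios of sequence length to locus count. The following Python sketch shows that arithmetic under stated assumptions. The function names are illustrative, and the text does not say exactly which sequence total served as the denominator, so the second helper only back-calculates the length implied by the reported values (568 loci at one per 14.73 kb, about 8.37 Mb).

```python
def kb_per_ssr(total_length_bp, n_loci):
    """Average sequence length per SSR locus, in kb."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return total_length_bp / n_loci / 1000.0


def implied_length_kb(n_loci, kb_per_locus):
    """Total sequence length (kb) implied by a locus count and a spacing."""
    return n_loci * kb_per_locus


if __name__ == "__main__":
    # Values quoted in the text: 568 loci, one SSR per 14.73 kb.
    print(round(implied_length_kb(568, 14.73), 2), "kb")
```

A smaller kb-per-SSR value means a denser distribution, which is why 14.73 kb is described as a higher frequency than 17.96 kb.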
Among the 1-6-nucleotide repeat types, the most abundant was the trinucleotide repeat, which accounted for 41.19% of the total, followed by the mononucleotide (20.07%) and hexanucleotide types (15.14%). The dinucleotide, tetranucleotide and pentanucleotide types accounted for only 9.68%, 6.16% and 6.69%, respectively. Many studies have suggested that the trinucleotide repeat is the main EST-SSR repeat type in most plants, followed by the dinucleotide and tetranucleotide repeat types [22, 23]. However, the most abundant motif within the trinucleotide repeat type differs among plants [13]. Kantety et al. [24] showed that the (CCG)n repeat motif accounted for 32% and 49% of all repeat motifs in wheat and sorghum, respectively. Gupta et al. [25] found that the (AAG)n repeat was the most abundant trinucleotide motif. In a similar study, Lu et al. (2010) found (AAG)n to be the most abundant repeat motif in Gossypium barbadense. Siju et al. [21] also found (AAG)n to be the most abundant motif in turmeric, accounting for 8.2%. The dominance of trimeric SSRs observed in the present study could be attributed to the suppression of non-trimeric SSRs in coding regions, where their expansion or contraction would cause frameshift mutations [26]. Moreover, we found that mononucleotide (20.07%) and hexanucleotide (15.14%) repeats were more frequent than the di-, tetra- and pentanucleotide repeats. This suggests that the functions of EST-SSRs derived from Curcuma longa may be different from those in other members of the Zingiberaceae family. Across all repeat motifs, the most frequent in the ESTs was A/T (16.54%), followed by AAG/CTT (11.44%), AGG/CCT (10.91%) and AG/CT (5.28%) (Figure 1). Each of the remaining repeat motifs contributed less than 5% of the total SSR motifs.
Within the 1-6-nucleotide repeat types, the most frequent repeat motifs were A/T, AG/CT, AAG/CTT, AAAC/GTTT, AAAAC/GTTTT and AGGCGG/CCGCCT, which accounted for 78.9%, 54.54%, 29.01%, 31.42%, 15.68% and 8.98% of their respective types. This frequency analysis of repeat motifs can be used as a potential source for designing repeat probes for effective targeting and isolation of microsatellite repeats from turmeric. Moreover, these probes can be used to design informative primers for genetic diversity analysis and related studies.
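Motif labels such as AAG/CTT pair a repeat unit with its reverse complement, so that repeats read from either strand or in any frame are counted together. A minimal Python sketch of one common way to do this grouping follows. It picks the alphabetically smallest form among all rotations of a unit and of its reverse complement. The function names are hypothetical, and the source does not state how SciRoKo or the authors grouped motifs.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of a DNA motif (A<->T, C<->G, reversed)."""
    return motif.upper().translate(_COMPLEMENT)[::-1]


def canonical_motif(motif):
    """Smallest string among all rotations of the motif and its reverse complement."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    rotations = [s[i:] + s[:i] for s in (motif, rc) for i in range(len(s))]
    return min(rotations)


def motif_label(motif):
    """Label in the 'unit/reverse complement' style, e.g. 'AAG/CTT'."""
    canon = canonical_motif(motif)
    return canon + "/" + reverse_complement(canon)


if __name__ == "__main__":
    for unit in ["CTT", "GGA", "T", "GTTT"]:
        print(unit, "->", motif_label(unit))
```

With this grouping, units such as CTT, TTC and GAA all fall under AAG/CTT, so motif frequencies can be tallied per group rather than per literal string.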
Figure 1: Distribution of EST-SSRs based on the motifs.

Conclusion

In total, we identified 568 non-redundant hypervariable microsatellites from the EST data of Curcuma longa using the SSR identification tool SciRoKo. Developing SSR markers from EST databases saves both cost and time once a sufficient number of EST sequences is available. These non-redundant SSR resources can be applied in studies of genetic variation and linkage mapping, and also provide the foundation for an in-depth analysis of the distribution of genes on chromosomes and for comparative genomic studies on different Curcuma species.
References

1.  Michele Morgante, Michael Hanafey, Wayne Powell. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet, 2002-01-22.

2.  Ramesh V Kantety, Mauricio La Rota, David E Matthews, Mark E Sorrells. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. Plant Mol Biol, 2002 Mar-Apr.

3.  L Cardle, L Ramsay, D Milbourne, M Macaulay, D Marshall, R Waugh. Computational and experimental characterization of physically clustered simple sequence repeats in plants. Genetics, 2000-10.

4.  Robert Kofler, Christian Schlötterer, Tamas Lelley. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics, 2007-04-26.

5.  G Benson. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res, 1999-01-15.

6.  D Metzgar, J Bytof, C Wills. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res, 2000-01.

7.  T Thiel, W Michalek, R K Varshney, A Graner. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet, 2002-09-14.

8.  I Eujayl, M E Sorrells, M Baum, P Wolters, W Powell. Isolation of EST-derived microsatellite markers for genotyping the A and B genomes of wheat. Theor Appl Genet, 2002-02.

9.  S Siju, K Dhanya, S Syamkumar, B Sasikumar, T E Sheeja, A I Bhat, V A Parthasarathy. Development, characterization and cross species amplification of polymorphic microsatellite markers from expressed sequence tags of turmeric (Curcuma longa L.). Mol Biotechnol, 2010-02.

10.  R Kota, R K Varshney, T Thiel, K J Dehmer, A Graner. Generation and comparison of EST-derived SSRs and SNPs in barley (Hordeum vulgare L.). Hereditas, 2001.