Mining of SSR markers from Expressed Sequence Tags of bamboo species.

Oviya Iyappan Ramalakshmi, Shanmughavel Piramanayagam.   

Abstract

With the ever-increasing number of Expressed Sequence Tags (ESTs) from various sequencing projects, ESTs have become a valuable, first-hand source for in-silico mining of simple sequence repeat (SSR) markers. We examined a total of 3419 EST sequences from three bamboo species, namely Phyllostachys edulis, Bambusa oldhamii and Dendrocalamus sinicus, for the presence of di- to hexa-nucleotide microsatellites. The frequency of SSR-containing ESTs varied from 5.36% in B. oldhamii to 13.05% in P. edulis. No SSRs were found in D. sinicus. Tri-nucleotide repeats (49.34%) were the most frequent in P. edulis, whereas no marked difference in frequency among repeat types was found in B. oldhamii. Flanking primer pairs were also designed in silico for the sequences containing SSRs, and their position on the genome was hypothesized using similarity searching. SSRs located in an open reading frame (ORF) were given functional annotation using Gene Ontology. Polymorphic SSRs were also detected using a new pipeline, PolySSR. The polymorphism level was very low (2.43%), and the position of the polymorphic SSRs was determined. The development of SSRs and the study of polymorphism will help in the further study of intra- and inter- gene flow, genetic structure, variability, linkage mapping and evolutionary relationships in bamboo.

Year:  2010        PMID: 21364824      PMCID: PMC3055702          DOI: 10.6026/97320630005240

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

A large number of plant genomes are under consideration for whole-genome sequencing, but for most of them the task is difficult and time-consuming because of their large genome size. As a preliminary step, large-scale EST sequencing projects have therefore been started as an alternative. Expressed Sequence Tags (ESTs) are cDNA clones sequenced randomly in a single-pass run. Depending on the direction of cloning, an EST can be a 5' EST or a 3' EST. Since ESTs are derived from cDNA, they provide direct evidence for the study of the transcriptome and genome. Microsatellites, also called Simple Sequence Repeats (SSRs) or Short Tandem Repeats (STRs), are tandemly repeated motifs of 1-6 bp present in both coding and noncoding regions of prokaryotic and eukaryotic genomes. SSRs are extensively used as molecular markers because of their multiallelic nature, co-dominant inheritance and relative abundance. Since ESTs also represent the coding part of the genome, they serve as an important source for mining putative SSR markers and provide first-hand insight into an organism's genetic diversity. In this study, we present the mining of EST-SSR markers and the detection of microsatellite polymorphism in three bamboo species, namely Phyllostachys edulis, Bambusa oldhamii and Dendrocalamus sinicus. Two approaches were used. First, SSR markers were predicted using MISA, and the SSR-containing sequences were functionally annotated. Second, polymorphism was determined using the novel pipeline PolySSR.

Methodology

ESTs for three bamboo species, namely Phyllostachys edulis, Bambusa oldhamii and Dendrocalamus sinicus, were downloaded from NCBI's dbEST. A total of 3087 EST sequences of P. edulis, 318 EST sequences of B. oldhamii and 14 EST sequences of D. sinicus were downloaded from dbEST as of May 29, 2009 (dbEST release 052909). dbEST contains redundant EST sequences. To remove this redundancy, the CAP3 assembler [1] was used for clustering. SSRs were detected using the MIcro SAtellite identification tool (MISA) [2]. The search criterion was a minimum SSR length of 14 bp. Primer pairs for the SSRs were designed using Primer3 (v 0.4.0) [3]. Although the web interface allows user-defined constraints, the default parameters were used here. EST-SSR sequences were subjected to similarity searching against the nonredundant (nr) database, restricted to Oryza sativa (ORGN: Oryza sativa). Because bamboo is a monocotyledon, rice was used as the model organism owing to its closer phylogenetic proximity. The search was performed with the BLASTX variant of NCBI's Basic Local Alignment Search Tool (BLAST) [4]. A sequence was considered homologous if the e-value was ≤ 1e-5 and the score was ≥ 100. Based on the position of the SSR in the homology search, each SSR was assigned to the 5' UTR, 3' UTR or ORF. Only those EST sequences in which the SSR was predicted to lie in the ORF were analyzed further. Function was assigned according to Gene Ontology (GO) annotation: the GO annotation of Oryza sativa was used to map functions by similarity searching in GRAMENE. Polymorphism exhibited by EST-SSRs was mapped using the new pipeline PolySSR [5].
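The SSR search step can be illustrated with a minimal Python sketch. This is not MISA and does not reproduce its configuration; it is a simple regular-expression scan for perfect di- to hexa-nucleotide repeats. The function name `find_ssrs` and the way the 14 bp criterion is converted into a minimum repeat count per motif size (rounding up) are assumptions made for illustration.

```python
import re

MIN_SSR_BP = 14  # minimum SSR length from the text; the per-unit mapping below is an assumption


def find_ssrs(seq, min_bp=MIN_SSR_BP):
    """Return sorted (start, motif, repeat_count, length_bp) tuples for
    perfect di- to hexa-nucleotide repeats of at least min_bp bases."""
    seq = seq.upper()
    hits = []
    for unit in range(2, 7):  # motif sizes 2..6; mono-nucleotide repeats are skipped
        min_repeats = -(-min_bp // unit)  # ceiling division
        # group 2 is the motif, followed by at least (min_repeats - 1) further copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # skip motifs that are themselves repeats of a shorter unit (e.g. AGAG, AAA)
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            length = len(match.group(1))
            hits.append((match.start(), motif, length // unit, length))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTGC" + "AG" * 8 + "CCGTA" + "AAG" * 5 + "TT"
    for start, motif, repeats, length in find_ssrs(example):
        print(start, motif, repeats, length)
```

With this mapping, a di-nucleotide SSR needs at least 7 repeats (14 bp) and a tri-nucleotide SSR at least 5 repeats (15 bp); compound and imperfect repeats are not handled.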

Results and Discussion

For analysis, the EST data were reduced by at least 30% to a nonredundant set of sequences using CAP3. Mono-nucleotide repeats were not considered, since they are not important as molecular markers. The search criteria were kept low to maximize SSR discovery [6]. In D. sinicus, no SSRs were detected by MISA, and the species was excluded from further study.

Among cereal crops, tri-SSRs are the most frequent, followed by di- and tetra-SSRs [7]. In our study as well, tri-SSRs were the most abundant (49.34%) among all SSRs in P. edulis, in agreement with other members of the Poaceae family such as barley, maize, sorghum, rice and wheat [8, 9, 10]. In general, AT motifs are common in plants [11, 12]; in bamboo, however, the AT motif was the least frequent. In P. edulis, AG/CT was the most abundant among di-SSRs, consistent with results from Poaceae members [8, 9, 10]. The AG/CT motif can represent the codons GAG, AGA, UCU and CUC in the mRNA population, which code for E, R, S and L, respectively. Since L is found in increased amounts in proteins, the abundance of AG/CT in the genome can be partly substantiated. CG repeats are the least frequent in cereal species [13], and in the present study the CG/GC motif was completely absent. In plants, the common motif is AAG. CCG is a specific feature of monocot genomes [14], but this trend was not observed in the bamboo species. Among tri-SSRs, P. edulis showed an abundance of AGG/CCT, AGC/CGT and AGT/ATC repeats. These motifs are fairly well represented in rice [15], maize [16], pearl millet [13] and barley [8]. In our study, the AAT/ATT motif was absent. This may be because TAA-based variants code for stop codons, which directly affect protein synthesis [16]. Tetra- and hexa-nucleotide repeats were found at the lowest frequency (Table 1, see supplementary material). In B. oldhamii, the different SSR motifs were almost evenly distributed. The lack of a considerable frequency difference among SSR motifs in B. oldhamii and the absence of SSRs in D. sinicus can be attributed to the small number of EST sequences available and analyzed.

Flanking primer pairs were designed for the SSR-containing sequences. Primers were designed in silico for 90.57% of the P. edulis EST-SSR sequences and for all SSR-containing sequences of B. oldhamii. Sharma et al. (2009) [17] designed primers for 81.25% of SSR-containing ESTs from P. edulis and B. oldhamii and were able to amplify 76.1% of those EST primers in the laboratory. Most SSRs for which primers were designed were > 20 bp in length and are thus expected to have a smaller chance of mutation or slipped-strand mispairing than shorter sequences, which may lead to a greater chance of sequence conservation.

The SSR-containing sequences for which primers were designed were then analyzed to determine the relative position of each SSR on the genome using sequence similarity search. Most of the SSRs were predicted to lie in the 5' UTR, followed by the 3' UTR; very few were predicted to be in the ORF. The predominance of SSRs in the 5' UTR in bamboo species is in accordance with another study [18]. Almost all AG motifs were found in the 5' UTR, whereas the CT motif was found mostly in the 5' UTR and very rarely in the 3' UTR. The ORF contained mostly trinucleotide repeats [19]. The high frequency of trinucleotide repeats can be explained by the fact that they are less affected by base mutations and hence more conserved. However, disease-causing effects of changes in SSR sequences have been reported in humans [20]. The SSRs predicted to be in the ORF were mapped to related Gene Ontology (GO) IDs. They mapped to a variety of functions, which are summarized with the corresponding GO annotations in Table 2.

The PolySSR pipeline yielded 287 contigs and 2127 singlets for P. edulis. The contigs were analyzed for polymorphism. In all, only 7 contigs were found to contain polymorphic SSRs (7 polymorphic SSRs in total). These 7 contigs included 21 P. edulis EST sequences, of which 4 (189099774, 189100527, 189100246, 189100551) were also included in our final MISA analysis. The GA motif showed high polymorphism; the GA/CT motif was also the most polymorphic in rice [10]. Polymorphism was previously reported in bamboo species by Sharma et al. [17]. Primers were designed for the 7 contig sequences, and their positions on the genome were determined. Variation in the number of SSR repeat units in the 5' UTR is said to affect gene transcription and/or translation; in the 3' UTR it may be responsible for gene silencing or transcription slippage; and in the ORF it may inactivate or activate genes or truncate the protein [7, 21]. These SSRs can be used as molecular markers if the SSR-containing genes are identified and their polymorphism is validated. A summary of the PolySSR results is given in Table 3 (see supplementary material). In B. oldhamii and D. sinicus, no polymorphic SSRs were found. Thus, the divergent biological roles of SSRs are important in the study of plant genomics and function.
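The idea behind in-silico polymorphism detection can be sketched as follows: an SSR in a contig is flagged as polymorphic when the member ESTs of that contig carry different repeat counts for the same motif. This is a simplified illustration under that assumption, not the PolySSR implementation; the record layout, the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def polymorphic_ssrs(contig_ssrs):
    """contig_ssrs: iterable of (contig_id, est_id, motif, repeat_count).
    Returns {(contig_id, motif): sorted distinct repeat counts} for SSRs
    whose repeat count differs among the ESTs of one contig."""
    counts_by_locus = defaultdict(set)
    for contig_id, _est_id, motif, repeats in contig_ssrs:
        counts_by_locus[(contig_id, motif)].add(repeats)
    return {locus: sorted(counts)
            for locus, counts in counts_by_locus.items()
            if len(counts) > 1}


def percent_polymorphic(n_polymorphic_contigs, n_contigs):
    """Share of contigs carrying a polymorphic SSR, in percent."""
    return 100.0 * n_polymorphic_contigs / n_contigs


if __name__ == "__main__":
    # Hypothetical toy records, not data from the study
    records = [
        ("contig1", "estA", "GA", 8),
        ("contig1", "estB", "GA", 10),
        ("contig2", "estC", "AAG", 5),
        ("contig2", "estD", "AAG", 5),
    ]
    print(polymorphic_ssrs(records))
    print(percent_polymorphic(7, 287))
```

Applied to the counts reported above, 7 polymorphic contigs out of 287 correspond to roughly 2.4%. A real pipeline would additionally need to align the flanking regions to confirm that the compared repeats occupy the same locus.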

1.  Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat.

Authors:  Ramesh V Kantety; Mauricio La Rota; David E Matthews; Mark E Sorrells
Journal:  Plant Mol Biol       Date:  2002 Mar-Apr       Impact factor: 4.076

2.  A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription.

Authors:  Shigeo Fujimori; Takanori Washio; Kenichi Higo; Yasuhiro Ohtomo; Kazuo Murakami; Kenichi Matsubara; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Shoshi Kikuchi; Masaru Tomita
Journal:  FEBS Lett       Date:  2003-11-06       Impact factor: 4.124

Review 3.  The molecular ecologist's guide to expressed sequence tags.

Authors:  Amy Bouck; Todd Vision
Journal:  Mol Ecol       Date:  2007-03       Impact factor: 6.185

Review 4.  EST-SSRs as a resource for population genetic analyses.

Authors:  J R Ellis; J M Burke
Journal:  Heredity (Edinb)       Date:  2007-05-23       Impact factor: 3.821

5.  Maize simple repetitive DNA sequences: abundance and allele variation.

Authors:  E C Chin; M L Senior; H Shu; J S Smith
Journal:  Genome       Date:  1996-10       Impact factor: 2.166

6.  The abundance of various polymorphic microsatellite motifs differs between plants and vertebrates.

Authors:  U Lagercrantz; H Ellegren; L Andersson
Journal:  Nucleic Acids Res       Date:  1993-03-11       Impact factor: 16.971

7.  Microsatellites in different eukaryotic genomes: survey and analysis.

Authors:  G Tóth; Z Gáspári; J Jurka
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

8.  Study of simple sequence repeat (SSR) markers from wheat expressed sequence tags (ESTs).

Authors:  N Nicot; V Chiquet; B Gandon; L Amilhat; F Legeai; P Leroy; M Bernard; P Sourdille
Journal:  Theor Appl Genet       Date:  2004-05-14       Impact factor: 5.699

9.  Development and mapping of simple sequence repeat markers for pearl millet from data mining of expressed sequence tags.

Authors:  S Senthilvel; B Jayashree; V Mahalakshmi; P Sathish Kumar; S Nakka; T Nepolean; Ct Hash
Journal:  BMC Plant Biol       Date:  2008-11-27       Impact factor: 4.215

10.  Large-scale identification of polymorphic microsatellites using an in silico approach.

Authors:  Jifeng Tang; Samantha J Baldwin; Jeanne Me Jacobs; C Gerard van der Linden; Roeland E Voorrips; Jack Am Leunissen; Herman van Eck; Ben Vosman
Journal:  BMC Bioinformatics       Date:  2008-09-15       Impact factor: 3.169

