
Mining of simple sequence repeats in the Genome of Gentianaceae.

R Sathishkumar, P T V Lakshmi, A Annamalai, V Arunachalam.

Abstract

Simple sequence repeats (SSRs), or short tandem repeats, are short repeat motifs that show a high level of length polymorphism due to insertion or deletion mutations of one or more repeat types. Here, we present the detection and abundance of microsatellites or SSRs in nucleotide sequences of the Gentianaceae family. A total of 545 SSRs were mined from 4698 nucleotide sequences downloaded from the National Center for Biotechnology Information (NCBI). Among the SSR sequences, the frequencies of the repeat types were 429 mononucleotide, 99 dinucleotide, 15 trinucleotide, and 2 hexanucleotide repeats. Mononucleotide repeats were the most abundant repeat type (about 78%), followed by dinucleotide repeats (18.16%). An attempt was made to design primer pairs for all 545 identified SSRs, but primers could be designed for only 169 sequences.


Keywords:  Gentianaceae; nucleotide; simple sequence repeats

Year:  2011        PMID: 21731391      PMCID: PMC3119266          DOI: 10.4103/0974-8490.79111

Source DB:  PubMed          Journal:  Pharmacognosy Res        ISSN: 0974-8490


INTRODUCTION

Gentianaceae, or the gentian family, is a family of flowering plants comprising 87 genera and over 1650 species.[1] The plants are usually rhizomatous annuals or perennials, mostly upright, though a few species lie on the ground and have upright branch tips. Leaves are opposite or whorled with entire edges and bases connately attached to the stem, mostly without a petiole. Flowers have four to five sepals, petals, and stamens, but only one pistil. Sepals and petals are fused at the base, with four to five free lobes above. Stamens alternate with the corolla lobes. The ovary is superior, and the fruit is a dehiscent, septicidal capsule that splits into two halves. Stipules are absent. Plants usually accumulate bitter iridoid substances, and bicollateral bundles are present. The Gentianaceae contains many species with interesting phytochemical properties. They have been widely used in traditional medicine and also as constituents in bitters and similar concoctions. The family consists of trees, shrubs, and herbs showing a wide range of colors and floral patterns. Simple sequence repeats (SSRs),[2] also called microsatellites[3] or short tandem repeats,[4] are short (1–6 bp) repeat motifs that show a high level of length polymorphism due to insertion or deletion mutations of one or more repeat types.[5] Studies suggest that both protein-coding and noncoding regions of DNA sequences contain SSRs.[6] SSRs present in coding sequences are less polymorphic than those in genomic sequences. Moreover, different taxa vary in the abundance of different types of SSRs, and SSRs are more abundant in noncoding regions than in coding regions.[7] SSRs are developed either conventionally[8] or from sequence databases.[9] PCR-based techniques such as AFLP and microsatellites or SSRs have also played important roles in plant DNA profiling.
Primers are essential components of PCR-based systems as well as modern microarray systems which utilize appropriate probes for PCR amplification.[10] In genetics, a sequence motif is a nucleotide or amino acid sequence pattern that is widespread and is believed to have biological significance. When a sequence motif appears in the exon of a gene, it may encode the “structural motif” of a protein, that is, a stereotypical element of the overall structure of the protein. “Noncoding” sequences are not translated into proteins. Outside of gene exons, there exist regulatory sequence motifs and motifs within the “junk,” such as satellite DNA.[11] Robinson et al.[12] developed a computer program to identify and design PCR primers for amplification of SSR loci based on available DNA sequence information. SSR primers have been designed using publicly available expressed sequence tags (ESTs) in barley,[13] almond (Prunus communis Fritsch.) and peach (P. persica (L.) Batsch.),[14] and T. aestivum and O. sativa.[15] These SSRs are useful as molecular markers because their development is inexpensive, they represent transcribed genes, and their putative function can often be deduced by a homology search.[16] SSRs have been the backbone of molecular map construction for a number of years. The increasing number of genomic and expressed sequences in public databases provides a valuable source for bioinformatic data mining. These sequence data have a number of exciting applications, including comparative genome analysis to trace evolution among related species and to study genome structure and gene function. Comparative genome analysis requires the same sets of genes (i.e., cross-reference genes) to be mapped to chromosomes in the species compared. Thus, comparative maps with sets of EST-derived markers (i.e., cross-species markers) are essential for comparative genome analysis.
Several studies have utilized publicly available ESTs to mine SSR or microsatellite markers for plants,[17-20] catfish,[21] insects,[22] animals,[23] and humans.[24] The EST-derived SSR markers (EST-SSRs) have proved very useful for the construction of genetic and comparative maps.[25] The software used here is MISA, a microsatellite identification tool that has the advantage of detecting mono- to decamer repeats as well as compound repeats. Its disadvantage is that it cannot detect repeats with units longer than ten nucleotides. Riju and Arunachalam[26] mined the SSRs in oil palm ESTs with five different software packages and reported that the MISA program gave the maximum coverage of SSRs in both oil palm ESTs and contigs.

PCR primer design in general

An understanding of primer properties is very important for primer design. The major aspects of primer properties include specificity, melting temperature (Tm), and intraprimer or interprimer homology. Primer specificity is mostly determined by the 3’-end sequence. It was reported that single internal mismatches had no significant effect on PCR product yield, while 3’-terminal mismatches, especially the A:A, A:G, G:A, and C:C mismatches, markedly reduced overall PCR product yield.[27] Khabar et al.[28] assessed the annealing specificity of primers in PCR reactions under different annealing temperatures (35°C, 40°C, and 45°C) and found perfect matches between at least eight bases at the 3’-end of the 5’-primers and the target region, whereas mispriming occurred only toward the 5’-end. Therefore, it is critical to include 8–10 unique bases at the 3’-end of the primer. Ideally, the primer has a Tm in the range of 50–65°C, a random nucleotide composition with a 40–60% GC content, and a length of 18–30 bases. The intraprimer or interprimer homology is kept as low as possible to avoid the formation of hairpin structures or primer dimers (>3 bp complementarity between primers), which would otherwise interfere with annealing of the primer to the DNA template.[29] ESTs, which represent the expressed part of the genome, also serve as a source of SSRs.[9] Detection of SSRs facilitates the development of SSR markers that are useful in the study of genetic variation, gene tagging, and linkage mapping,[30] and that are also useful across a number of related species.[13] Microsatellites can be amplified for identification by the polymerase chain reaction (PCR), using the unique sequences of the flanking regions as primers. Once the potentially useful microsatellites are determined (removing nonuseful ones such as those with random inserts within the repeat region), the flanking sequences can be used to design oligonucleotide primers which will amplify the specific microsatellite repeat in a PCR.
Microsatellite loci are widely distributed throughout the genome and can be isolated from semidegraded DNA of older specimens, as all that is needed is a suitable substrate for amplification through PCR. Hence, the present study aimed to determine the distribution and abundance of SSRs for the development of markers and to annotate SSR-containing sequences in the Gentianaceae family. The Nucleotide database, which contains sequences of well-characterized genes as well as hundreds of thousands of novel EST sequences, was queried to perform the analysis.
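
The general primer properties described above (length of 18–30 bases, GC content of 40–60%, Tm of 50–65°C, and low interprimer complementarity) can be expressed as simple checks. The following is a minimal sketch, not the procedure used in this study: the function names are hypothetical, and the Tm estimate uses the Wallace rule of thumb, 2(A+T) + 4(G+C), which is an assumption introduced here and is only a rough guide for short oligonucleotides.

```python
# Illustrative primer sanity checks; all names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough Tm in degrees C by the Wallace rule (an assumption, see lead-in)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(primer_a, primer_b):
    """Longest stretch of primer_a that can base-pair with primer_b.

    Computed as the longest common substring of primer_a and the
    reverse complement of primer_b.
    """
    a = primer_a.upper()
    b = reverse_complement(primer_b)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_basic_checks(primer):
    """Apply the length, GC, and Tm ranges quoted in the text."""
    return (18 <= len(primer) <= 30
            and 40.0 <= gc_percent(primer) <= 60.0
            and 50 <= wallace_tm(primer) <= 65)
```

A primer pair could then be flagged for possible dimer formation when `longest_complementary_run` exceeds 3, following the ">3 bp" criterion mentioned above. Dedicated primer design software uses more refined thermodynamic models than this sketch.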

MATERIALS AND METHODS

Retrieval of nucleotide sequences and detection of SSRs

A total of 647 nucleotide sequences of Gentianaceae were downloaded from the NCBI (http://www.ncbi.nlm.nih.gov/Nucleotide/?term=Gentianaceae) and searched for SSRs using a Perl script. The minimum length of an SSR was fixed at 14 bp according to the criteria used by Gupta et al.[31] The SSRs were defined as 14-bp mononucleotide or dinucleotide repeats; 15-bp trinucleotide repeats; 16-bp tetranucleotide repeats; 20-bp pentanucleotide repeats; and 18-bp hexanucleotide repeats. The poly A and poly T repeats were removed using an in-house Perl script, as these are not considered SSRs owing to their presence at the 3’-end of mRNA/cDNA sequences.
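
The length criteria above can be illustrated with a short regular-expression search. This is a minimal Python sketch under stated assumptions, not the Perl script used in the study: the names `MIN_LENGTH_BP`, `is_primitive`, and `find_ssrs` are hypothetical, only perfect (uninterrupted) repeats are reported, and the unit-less values in the criteria (16, 20, 18) are read as base pairs.

```python
import re

# Minimum tract length in bp for each repeat-unit size (1-6 bp),
# taken from the criteria listed in the text.
MIN_LENGTH_BP = {1: 14, 2: 14, 3: 15, 4: 16, 5: 20, 6: 18}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'CACA')."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return False
    return True


def find_ssrs(sequence, skip_poly_at=True):
    """Return sorted (start, end, motif, repeats) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive. Poly A and poly T tracts
    are skipped by default, mirroring the exclusion described above.
    """
    seq = sequence.upper()
    hits = []
    for unit_len, min_bp in MIN_LENGTH_BP.items():
        min_repeats = -(-min_bp // unit_len)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AA' is reported by the mononucleotide pass
            if skip_poly_at and motif in ("A", "T"):
                continue
            repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)
```

Skipping non-primitive motifs keeps a single tract from being counted several times under different unit sizes. Compound or interrupted repeats would need additional logic that is not shown here.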

Primer designing for SSRs

A pair of primers flanking each SSR was designed using FastPCR software available at www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi, which takes input according to user-defined conditions and picks primers according to the specified parameters. The default parameters of FastPCR, viz. an optimum primer size of 20 (range 18–28), an optimum annealing temperature of 60.0°C (range 57.0–63.0°C), and a GC content of 44–60%, were selected for primer design.
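
To show how the size and GC limits constrain primer picking, the sketch below enumerates candidate primer windows in the sequence flanking an SSR. It is an illustration only and does not reproduce the primer design software: the function names are hypothetical, the annealing temperature criterion is deliberately left out, and the default limits are the size (18–28) and GC (44–60%) ranges quoted above.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def candidate_primers(flank, min_len=18, max_len=28, gc_range=(44.0, 60.0)):
    """Yield every window of the flank that meets the size and GC limits."""
    for length in range(min_len, max_len + 1):
        # The range is empty when the flank is shorter than the window.
        for start in range(0, len(flank) - length + 1):
            window = flank[start:start + length]
            if gc_range[0] <= gc_percent(window) <= gc_range[1]:
                yield window


def can_flank(sequence, ssr_start, ssr_end):
    """True if both flanks of the SSR offer at least one candidate window.

    ssr_start and ssr_end are 0-based, end-exclusive coordinates.
    """
    left, right = sequence[:ssr_start], sequence[ssr_end:]
    return (any(True for _ in candidate_primers(left))
            and any(True for _ in candidate_primers(right)))
```

One consequence is visible directly: an SSR lying closer than 18 bases to either end of its sequence cannot yield a flanking primer pair under these limits, whatever the base composition.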

Detection of SSR positions with respect to open reading frames

Open reading frames (ORFs) were predicted for all the SSR-containing sequences using ORF Finder, available at NCBI (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), with the standard genetic code. The sequence fragment of maximum length uninterrupted by a stop codon was taken as the primary encoding segment (ORF) of each query sequence. For every predicted ORF, the relative position of each SSR was determined, that is, whether the SSR was present within the ORF, in the 5’ untranslated region (UTR), or in the 3’ UTR.
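
The positional classification described above can be sketched as follows. This is a simplified illustration, not a reimplementation of ORF Finder: it scans only the three forward reading frames, takes the longest stop-free stretch as the ORF as stated in the text, does not require a start codon, and treats any SSR that overlaps the ORF as lying within it. The function names and these simplifications are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_stop_free_segment(sequence):
    """Return (start, end) of the longest stop-free stretch in frames 0-2.

    Coordinates are 0-based and end-exclusive. Ties go to the earliest frame.
    """
    seq = sequence.upper()
    best = (0, 0)
    for frame in range(3):
        seg_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - seg_start > best[1] - best[0]:
                    best = (seg_start, pos)
                seg_start = pos + 3
            pos += 3
        # Close the last segment at the end of the final complete codon.
        if pos - seg_start > best[1] - best[0]:
            best = (seg_start, pos)
    return best


def classify_ssr(ssr_start, ssr_end, orf_start, orf_end):
    """Place an SSR relative to the ORF: 5' UTR, within ORF, or 3' UTR."""
    if ssr_end <= orf_start:
        return "5' UTR"
    if ssr_start >= orf_end:
        return "3' UTR"
    return "within ORF"
```

Combined with SSR coordinates from a repeat search, `classify_ssr` gives the three categories reported in the Results.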

RESULTS

Screening of Gentianaceae sequences for SSRs

In the present study, 4698 nucleotide sequences of Gentianaceae available at NCBI (http://www.ncbi.nlm.nih.gov) were searched for SSRs with a minimum length of 18 bp. A total of 545 SSRs were detected in the 2889 kb of data mined, excluding poly A and poly T. Depending upon the length of the repeat unit (1–6 bp), the lengths of the identified SSRs varied from 14 to 48 bp.

Frequencies of classified repeat types of Gentianaceae

Of the 4698 sequences screened, only a subset of 461 sequences contained the 545 SSRs, suggesting that merely 9.83% of sequences contained SSRs. Among the SSRs with mono-, di-, tri-, tetra-, and hexanucleotide repeat units, the most frequent repeat type within the nucleotide sequences of the Gentiana family was mononucleotide (84.58%), followed by dinucleotide (18.16%), trinucleotide (2.75%), and hexanucleotide (0.65%) repeats [Figure 1]. No tetranucleotide or pentanucleotide repeats were detected during the analysis.
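
As an arithmetic cross-check, the repeat-type shares and the overall SSR density can be recomputed from the counts reported in the abstract (429 mono-, 99 di-, 15 tri-, and 2 hexanucleotide repeats) and from the 2889 kb of sequence searched. The sketch assumes those counts are the ones underlying the percentages; the variable names are illustrative.

```python
# Counts per repeat type as given in the abstract.
counts = {"mono": 429, "di": 99, "tri": 15, "hexa": 2}

total = sum(counts.values())

# Share of each repeat type among all SSRs, in percent.
shares = {name: round(100.0 * n / total, 2) for name, n in counts.items()}

# Share of screened sequences that contain at least one SSR, in percent.
ssr_sequence_share = round(100.0 * 461 / 4698, 2)

# Average spacing: kilobases of sequence searched per SSR found.
kb_per_ssr = round(2889 / total, 2)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"sequences with SSRs: {ssr_sequence_share}%")
print(f"kb per SSR: {kb_per_ssr}")
```

With these inputs the four counts sum to 545, and the mononucleotide share comes out at roughly 78.7%, in line with the "about 78%" given in the abstract.
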
Figure 1

Frequency distribution of different repeat types identified in nucleotide sequences of Gentianaceae

The observed frequency of the different repeat types comprising the SSRs is presented in [Figure 2a–d] and summarized in Table 1. The SSRs comprised four different types of mononucleotide repeats (A, T, C, and G); nine different types of dinucleotide repeats, (CA)n, (TG)n, (AC)n, (GA)n, (CT)n, (TA)n, (AT)n, (GC)n, (TC)n, (AG)n, (GT)n; seven different types of trinucleotide repeats, (GAG)n, (ATG)n, (CTT)n, (TTA)n, (CAA)n, (AAC)n, (ACA)n; and two types of hexanucleotide repeats, (CCACAC)n and (GGTCAA)n.
Figure 2

Frequency distribution of (a) mono-, (b) di-, (c) tri-, and (d) hexanucleotide repeat motifs in the genome of Gentianaceae

Table 1

Summary of in silico mining of Nucleotide sequences of Gentianaceae


Designing of primers for SSRs

Of the 545 SSRs detected, primers could be designed for only 169 (31%); the remaining 376 (69%) did not yield any acceptable primers. The 169 SSRs for which primers were designed include 133 mono-, 29 di-, and 7 trinucleotide repeats, and no hexanucleotide repeats. The accession numbers of the nucleotide sequences of Gentiana, the repeat motifs of the SSRs for which primers were designed, the primer sequences, GC%, product size, and annealing temperature are given in Table 2.
Table 2

Synthesis of primer


Prediction of ORF in SSR-containing sequences

An attempt was made to predict the ORFs in the SSR-containing sequences using ORF Finder. Of the 545 SSRs identified, the positions of 359 SSRs with respect to an ORF were determined, while no ORF was predicted for the remaining 186 SSR-containing sequences. Of these 359 SSRs, 161 (44.84%) were present in the 5’ untranslated region, 129 (35.93%) occurred within the ORF, and the remaining 69 (19.22%) occurred in the 3’ untranslated region.

DISCUSSION

In the present study, a large number of nucleotide sequences (4698) of Gentiana retrieved from NCBI were mined for SSRs. The SSRs in the mined sequences were characterized, and a subset of these SSRs was used for designing markers. A total of 545 SSRs were detected, which is in accordance with earlier findings[32] that the abundance of different repeats varies broadly depending upon the species. Microsatellites or SSRs are stretches of DNA containing tandem repeats of di-, tri-, tetra-, and longer nucleotide units ubiquitously distributed throughout the eukaryotic genome. They are abundant in plant genomes and are thought to be major sources of genetic variation in quantitative traits. The abundance of the different repeat motifs (1–6 bp) in the SSRs detected in the Gentiana family during the present study was variable, so the SSRs with different repeat motifs were not evenly distributed. The SSRs with dinucleotide repeats (18.16%) were abundant. This is in agreement with the results of earlier studies on Arabidopsis, in which dinucleotide repeats were also found to be abundant,[33] perhaps because the genomic sequences of this species may include SSRs in noncoding regions too. The smaller repeat motifs were predominant among the SSRs identified, and occurrence decreased as the length of the repeat unit increased. We excluded poly A and poly T repeats, so their number is under-represented. The abundance of trinucleotide SSRs may be attributed to the absence of frameshift mutations due to variation in trinucleotide repeats.[34] Molecular genetic markers can be used to examine a group of individuals or populations to estimate various diversity measures and genetic distances, intergenetic structure and clustering patterns, to test for Hardy-Weinberg equilibrium and multilocus equilibrium, and to test polymorphic loci for evidence of selective neutrality.
This can be useful to plant breeders, germplasm managers, or others who are interested in the population genetic properties of the materials they are working with. The three most common types of markers used today are RFLP, RAPD, and microsatellites. A wide variety of methods for the construction of libraries enriched for microsatellite sequences have been reported, the most popular being those based on vectorette PCR using anchored primers. However, this method is highly time-consuming and expensive, and the alternative is to use bioinformatics, that is, computational tools to screen public databases and find SSRs. EST-derived molecular markers, especially SSRs and SNPs, are highly useful in developing linkage maps and marker-assisted breeding programs. These markers are also transferable to related genera. Molecular marker techniques are advantageous as they directly reflect variations in the DNA sequences and are therefore independent of the environment. Among the many molecular marker techniques currently available, microsatellites or SSRs[35] provide an improved technology for assessing genetic diversity and genetic relationships in plants, as they are highly polymorphic, codominant, very informative, and PCR based. EST-SSRs offer the following advantages over other genomic DNA-based markers: (1) they should detect variation in the expressed portion of the genome, so that gene tagging should give “perfect” marker–trait associations; (2) they can be developed at no cost from the EST databases; and (3) once developed, these markers, unlike genomic SSRs, may be used across a number of related species. With the growth of sequence databases, several authors have reported an abundance of SSRs in different genomes. The distribution of SSRs in the rice genome has also been studied on the basis of the two whole-genome draft sequences released by Syngenta and by the Beijing Genome Institute (BGI), respectively.
In the draft sequence released by Syngenta, for instance, 48,351 SSRs (including di-, tri-, and tetranucleotide repeats) were available, giving a density of one SSR per 8 kb in the whole genome; di-, tri-, and tetranucleotide repeats accounted for 24%, 59%, and 17% of the total SSRs, respectively. SSRs are very polymorphic due to the high mutation rate affecting the number of repeat units. Such length polymorphisms can be easily detected on high-resolution gels (e.g., sequencing gels) by running PCR-amplified fragments obtained using a unique pair of primers flanking the repeat.[36] Chung and Staub[37] developed a set of consensus chloroplast primer pairs for ccSSRs from N. tabacum chloroplast sequences. All primer pairs produced amplicons after PCR employing chloroplast DNA from members of the Cucurbitaceae (six species) and Solanaceae (four species). Sixteen, 22, and 19 of the initial 23 primer pairs successfully amplified by PCR using template DNA from species of the Apiaceae (two species), Brassicaceae (one species), and Fabaceae (two species), respectively. Twenty of the 23 primer pairs were also functional in three monocot species of the Liliaceae (onion and garlic) and the Poaceae (oat). ccSSR primers were strategically “recombined,” giving recombined consensus chloroplast primers (RCCP), for PCR analysis of cucumber DNA; in a similar way, the primers designed for SSR-containing sequences of the Gentiana family could be used to produce amplicons from different members of the family. Kijas et al.[38] tested two primer sets in 10 different Citrus species and two related genera and found conservation of the sequences. Cross-species amplification has also been reported between cultivated rice and related wild species[39] and between Vitis species.[40] Provan et al.[41] showed successful amplification of two tomato SSR primer pairs tested on potato cultivars.
Weising et al.[42] reported conservation of SSR flanking sites in different species of kiwifruit (Actinidia chinensis). Usually, only a low percentage of markers also amplify fragments from species belonging to other genera of the same family. Within the Poaceae family, primers worked even across different genera,[43] but only 50% of the microsatellite loci identified in wheat were also polymorphic in rye and barley cultivars. Whitton et al.[44] tested 13 SSR loci in 25 representatives of the Asteraceae and demonstrated that the regions flanking the repeats are not highly conserved, either in nucleotide sequence or in relative position. Indeed, in general, transferability of polymorphic markers in plants is likely to be successful mainly within genera (success rate close to 60% in eudicots and close to 40% in the reviewed monocots) rather than between genera (transfer rates are approximately 10% for eudicots) within the same family.[45] This transferability of polymorphic markers in plants generally enhances the utilization of the primers in a random way. Comparative genome analysis facilitates high-throughput comparative mapping with the assistance of cross-species markers, and further facilitates gene cloning by identifying cross-reference genes. Seventeen SSR primer sets developed for Quercus petraea were tested on eight different members of the Fagaceae family.[46] In total, 66% yielded interpretable amplification products, and most of these were truly homologous to the originally cloned SSR fragment from Q. petraea. In the present study, primers could be designed successfully for a large number (169, 31%) of SSRs [Table 2]. However, it was not possible to design primers for the remaining SSRs (376, 69%) because the sequences flanking both ends of these SSRs were too short for primer design.
The large number of primer pairs for the SSRs that have been designed during the present study may be utilized for a variety of purposes, for example, gene tagging, genetic mapping, population studies, etc. Due to a high level of potential for length polymorphisms, SSRs have become a valuable source of genetic markers and have been broadly applied to various areas of genetic research including studies of genome variation, establishment of genetic maps, integration of physical and genetic maps, determination of evolutionary relationships, and comparative genome analyses.

CONCLUSIONS

Nucleotide sequences of the Gentiana family were systematically searched for SSRs using the “ssr_finder.pl” Perl program for the development of SSR markers. This approach saves both cost and time, given a sufficient number of available Gentiana family sequences. The use of SSRs in genetic diversity studies is a novel tool that reveals variation in genomes.
REFERENCES (38 in total; first 10 listed)

1.  A simple sequence repeat-based linkage map of barley.

Authors:  L Ramsay; M Macaulay; S degli Ivanissevich; K MacLean; L Cardle; J Fuller; K J Edwards; S Tuvesson; M Morgante; A Massari; E Maestri; N Marmiroli; T Sjakste; M Ganal; W Powell; R Waugh
Journal:  Genetics       Date:  2000-12       Impact factor: 4.562

2.  Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat.

Authors:  Ramesh V Kantety; Mauricio La Rota; David E Matthews; Mark E Sorrells
Journal:  Plant Mol Biol       Date:  2002 Mar-Apr       Impact factor: 4.076

3.  Microsatellite repeats in grapevine reveal DNA polymorphisms when analysed as sequence-tagged sites (STSs).

Authors:  M R Thomas; N S Scott
Journal:  Theor Appl Genet       Date:  1993-09       Impact factor: 5.699

4.  Noncoding RNAs: persistent viral agents as modular tools for cellular needs.

Authors:  Günther Witzany
Journal:  Ann N Y Acad Sci       Date:  2009-10       Impact factor: 5.691

5.  Survey in the sugarcane expressed sequence tag database (SUCEST) for simple sequence repeats.

Authors:  Luciana Rossini Pinto; Karine Miranda Oliveira; Eugênio César Ulian; Antonio Augusto Franco Garcia; Anete Pereira de Souza
Journal:  Genome       Date:  2004-10       Impact factor: 2.166

6.  Characterization and mapping of four novel human expressed polymorphic trinucleotide microsatellites.

Authors:  L A Haddad; F C Parra; S D Pena
Journal:  Gene       Date:  1998-11-26       Impact factor: 3.688

7.  Abundance, variability and chromosomal location of microsatellites in wheat.

Authors:  M S Röder; J Plaschke; S U König; A Börner; M E Sorrells; S D Tanksley; M W Ganal
Journal:  Mol Gen Genet       Date:  1995-02-06

8.  The development and evaluation of consensus chloroplast primer pairs that possess highly variable sequence regions in a diverse array of plant taxa.

Authors:  Sang-Min Chung; Jack E Staub
Journal:  Theor Appl Genet       Date:  2003-06-25       Impact factor: 5.699

9.  Mining and characterizing microsatellites from citrus ESTs.

Authors:  Chunxian Chen; Ping Zhou; Young A Choi; Shu Huang; Fred G Gmitter
Journal:  Theor Appl Genet       Date:  2006-02-11       Impact factor: 5.699

10.  Recent developments in primer design for DNA polymorphism and mRNA profiling in higher plants.

Authors:  Xiaohan Yang; Brian E Scheffler; Leslie A Weston
Journal:  Plant Methods       Date:  2006-03-01       Impact factor: 4.993

