
Distribution and characterization of simple sequence repeats in Gossypium raimondii genome.

Changsong Zou, Cairui Lu, Youping Zhang, Guoli Song.

Abstract

Simple sequence repeats (SSRs) can be derived from the complete genome sequence. These markers are important for gene mapping as well as marker-assisted selection (MAS). To develop SSRs for cotton gene mapping, we selected the complete genome sequence of Gossypium raimondii, which consisted of 4447 non-redundant scaffolds. Out of 775.2 Mb of sequence examined, a total of 136,345 microsatellites were identified, at a density of one SSR per 5.69 kb in the G. raimondii genome, leading to the development of 112,177 primer pairs. The distribution of SSRs in the genome was non-random. Among the different motifs ranging from 1 to 6 bp, penta-nucleotide repeats were the most abundant (30.5%), followed by tetra-nucleotide repeats (18.2%) and di-nucleotide repeats (16.9%). Among all 457 identified motif types, the most frequently occurring repeat motifs were poly-AT/TA, which accounted for 79.8% of the total di-nt SSRs, followed by AAAT/TTTA, which accounted for 51.5% of the total tetra-nucleotide SSRs. Further, 18,834 microsatellites were detected in the protein-coding genes, and the frequency of SSR-containing genes was 46.0% among the 40,976 genes of G. raimondii. The genome-based SSRs developed in the present study lay the groundwork for developing large numbers of SSR markers for genetic mapping, gene discovery, genetic diversity analysis, and MAS breeding in cotton.


Keywords: Gossypium raimondii; Simple Sequence Repeats (SSRs); distribution; molecular marker

Year: 2012 PMID: 23139588 PMCID: PMC3488841 DOI: 10.6026/97320630008801

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Simple sequence repeats (SSRs) are tandemly repeated DNA motifs (1-6 bp long) that are present in both protein-coding and non-coding regions of DNA sequences and show a high level of length polymorphism due to mutations affecting one or more repeats. SSRs are easy to use and analyze by virtue of their multiallelic nature, reproducibility, high abundance and extensive genome coverage [1, 2]. The traditional methods of developing SSR markers are usually time-consuming and labor-intensive. Generally, these processes involve genomic library construction, hybridization with repeated nucleotide units and sequencing of the clones. Developing SSR markers computationally from the genome sequence avoids these steps and therefore provides a better platform than the conventional approach. Several bioinformatic tools for the identification of microsatellites in genomic sequences have been developed. The most commonly used tools for SSR searches are SSRIT [3], TROLL [4], MISA [5], SSRFinder [6], Modified Sputnik I and II [7, 8], and SciRoKo [9]. SciRoKo is a user-friendly software tool for the identification of microsatellites in genomic sequences [9]. Cotton (Gossypium spp.) is a major world agricultural crop, with an annual planting area of about 33 million hectares [10]. In recent years, molecular marker technology has been widely applied in cotton to genetic mapping [11-13], genetic diversity analysis [14], MAS [15, 16], and gene tagging [17]. Because the cotton genome is relatively large, with a 1C content of 2,250 Mb, and intraspecific molecular polymorphism in this species is low, there is strong demand for more highly polymorphic genetic markers for marker-assisted breeding programs. To date, approximately 17,000 pairs of SSR primers have been developed from four cotton species (G. arboreum L., G. barbadense L., G. hirsutum L., and G. raimondii Ulbrich) [18]. However, few of them represent the large cotton genome adequately. In this study, the frequency and distribution of SSRs in the G. raimondii genome were characterized.

Methodology

Data source:

The genome sequence and annotation information of G. raimondii were downloaded from the CGP (http://cgp.genomics.org.cn/page/species/mapview.jsp).

SSR scanning and analysis:

The genome was scanned for SSR loci with the program SciRoKo 3.4 (SSR Classification and Investigation by Robert Kofler) [9]. The parameters were set to detect mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide (nt) motifs with a minimum of 15, 7, 5, 3, 3 and 3 repeats, respectively (under the mismatched and fixed penalty search mode). Initially, each SSR was considered to be unique and was subsequently classified according to the theoretically possible combinations. Motif association statistics require motif standardization. During standardization, the reverse complements of microsatellite motifs were considered, and similar microsatellite motifs were grouped together. For example, a poly-A repeat is equivalent to a poly-T repeat on the complementary strand, and AAG is equivalent to AGA and GAA in different reading frames and to CTT, TCT and TTC on the complementary strand. Thus, there are two possible combinations for mono-nt repeats, four for di-nt repeats, ten for tri-nt repeats, 33 for tetra-nt repeats, 102 for penta-nt repeats, and 350 for hexa-nt repeats. In this study, we defined two genomic location categories: genic regions (5'-UTR, exon, intron, and 3'-UTR) and intergenic regions. To locate SSRs in the different genomic regions, the positions of the SSRs were compared with the genome annotation using Perl scripts. To describe the abundance of SSRs in different genomic regions, we calculated the "relative abundance" (RA) by dividing the number of SSRs by the length of the analyzed sequence in megabases (Mb).
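The standardization step described above can be illustrated with a short Python sketch. It is not the SciRoKo implementation: the function names are hypothetical, and it assumes that a motif class is defined by all cyclic shifts of a motif together with all cyclic shifts of its reverse complement, excluding motifs that are themselves repeats of a shorter unit. Under that assumption it reproduces the class counts quoted in the text.

```python
from itertools import product

# Translation table for complementing a DNA motif.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. ATAT is not)."""
    n = len(motif)
    for period in range(1, n):
        if n % period == 0 and motif[:period] * (n // period) == motif:
            return False
    return True


def canonical(motif):
    """Pick one representative for a motif class: the alphabetically smallest
    string among all cyclic shifts of the motif and of its reverse complement."""
    candidates = []
    for strand in (motif, reverse_complement(motif)):
        for shift in range(len(strand)):
            candidates.append(strand[shift:] + strand[:shift])
    return min(candidates)


def count_motif_classes(k):
    """Count standardized motif classes for repeat units of length k."""
    classes = set()
    for letters in product("ACGT", repeat=k):
        motif = "".join(letters)
        if is_primitive(motif):
            classes.add(canonical(motif))
    return len(classes)


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, count_motif_classes(k))
```

With this grouping, GAA, AGA, CTT, TCT and TTC all map to the same representative as AAG, which matches the example given in the text.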

Primer designing:

Primer pairs were designed from the obtained SSR sequences using Primer3. Perl scripts were used to run the Primer3 core program for batch primer design. The major parameters for primer design were as follows: a primer length of 17-27 bp, with 20 bp being optimal; PCR product sizes of 100-250 bp; an optimum annealing temperature of 57°C; and a GC content of 30%-65%, with 50% being optimal. Both forward and reverse primers were then searched for each SSR.
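As an illustration of how such a batch run might be driven, the sketch below builds one Primer3 input record in the Boulder-IO format read by primer3_core. It is written in Python rather than Perl and is not the authors' script. The function name, the example sequence ID and the choice of a 0-based target position are assumptions, and the stated 57°C optimum is mapped onto the PRIMER_OPT_TM tag, which is also an assumption about how the setting was applied.

```python
# Primer3 global settings taken from the parameters listed in the text.
# Mapping the 57 degree optimum onto PRIMER_OPT_TM is an assumption.
PRIMER3_SETTINGS = {
    "PRIMER_FIRST_BASE_INDEX": 0,           # assumed: 0-based target positions
    "PRIMER_MIN_SIZE": 17,
    "PRIMER_OPT_SIZE": 20,
    "PRIMER_MAX_SIZE": 27,
    "PRIMER_PRODUCT_SIZE_RANGE": "100-250",
    "PRIMER_OPT_TM": 57.0,
    "PRIMER_MIN_GC": 30.0,
    "PRIMER_OPT_GC_PERCENT": 50.0,
    "PRIMER_MAX_GC": 65.0,
}


def primer3_record(seq_id, template, ssr_start, ssr_length):
    """Build one Boulder-IO record asking Primer3 for primers flanking an SSR.

    seq_id     -- hypothetical identifier for the SSR locus
    template   -- SSR plus flanking sequence
    ssr_start  -- position of the SSR within the template
    ssr_length -- length of the SSR in bp
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={ssr_start},{ssr_length}",
    ]
    lines += [f"{tag}={value}" for tag, value in PRIMER3_SETTINGS.items()]
    lines.append("=")  # a lone "=" terminates a Boulder-IO record
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy template: 90 bp flank, an (AT)10 repeat, 90 bp flank.
    flank = "ACGTTGCAAGCTAGGCTTAA" * 4 + "GATCCGATAC"
    print(primer3_record("example_ssr_1", flank + "AT" * 10 + flank, 90, 20))
```

Records for many loci can be concatenated into one file and passed to primer3_core on standard input, which is one way to batch the design step.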

Discussion

A total of 136,345 microsatellites were identified in the 775.1 Mb (4447 scaffolds) of G. raimondii genomic DNA sequence using the SciRoKo program. With the help of the Primer3 core program and Perl scripts, 112,177 primer pairs were obtained (data not shown). Among the SSRs analyzed, 113,766 (83.4%) were perfect repeats and 22,579 (16.5%) were mismatched repeats. SSRs were abundant in the G. raimondii genome, with about one SSR every 5.69 kb (Table 1, see supplementary material). The most abundant microsatellites were the penta-nt repeats, of which 41,567 (30.5% of the SSRs) were identified, followed by the tetra-nt repeats (24,876, 18.2%) and di-nt repeats (23,109, 16.9%). The numbers of mono-nt, tri-nt and hexa-nt repeats were 11,611 (8.5%), 20,199 (14.8%), and 14,983 (11.0%), respectively (Figure 1). The SSR loci were classified by repeat type and frequency of repeats per motif (Table 2, see supplementary material). We found 457 types of repeat motifs in these SSRs. Among the standardized SSR groups, the most abundant repeat motif type was poly-A/T in mono-, poly-AT/TA in di-, poly-AAT/TTA in tri-, poly-AAAT/TTTA in tetra-, poly-AAAAT/TTTTA in penta-, and poly-AAAAAT/TTTTTA in hexa-nucleotides. For each SSR type, the frequency of SSRs decreased as the number of repeats increased.
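The per-type percentages follow directly from the counts reported above. The short Python check below recomputes them; the counts come from the text, while the dictionary and function names are only illustrative.

```python
# SSR counts per repeat-unit size, as reported in the text.
SSR_COUNTS = {
    "mono": 11_611,
    "di": 23_109,
    "tri": 20_199,
    "tetra": 24_876,
    "penta": 41_567,
    "hexa": 14_983,
}


def percentage_shares(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("total SSRs:", sum(SSR_COUNTS.values()))
    for name, share in percentage_shares(SSR_COUNTS).items():
        print(f"{name:>5}: {share}%")
```

The six counts sum to the 136,345 SSRs reported for the genome.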

Figure 1

Frequency distribution of different repeat types identified in the G. raimondii genome

These SSRs provide insight into the frequency distribution of different types of nucleotide repeats in G. raimondii. More SSRs were found in the intergenic regions (64.1%) than in the genic regions (35.9%) (Figure 1 & Table 3, see supplementary material). The different SSR repeat units were distributed non-randomly across the different genomic locations. The microsatellite analysis showed that the distribution of SSRs in exonic, intronic and intergenic regions of the genome was non-random and strongly biased, probably reflecting the functional significance of SSRs. In general, the relative abundances of SSRs in the 3'-UTR, 5'-UTR, and intron regions were considerably higher than that in the intergenic region (Table 1, see supplementary material); tri-nt repeats were the most abundant SSR type in the genic region, whereas penta-nt repeats were the most abundant SSR type in the intergenic region (Figure 2). The relative abundance of tri-nt SSRs in the coding sequence (CDS) regions was 51.3 per Mb, which was considerably higher than that of the other SSR types. The enhanced frequency of tri-nt SSRs in the coding regions might indicate the effects of selection against possible frameshift mutations.

Figure 2

Genome-wide distribution and relative abundance of SSR types by their unit size. Each bar represents the relative abundance of the SSR types in different genome locations. Relative abundance = number of SSR type/region size in mega bases (Mb)
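The relative abundance defined in this caption, and its relation to the "one SSR every 5.69 kb" density, can be expressed as a small Python sketch. The function names are hypothetical. The genome-wide inputs (136,345 SSRs in 775.2 Mb) are taken from the abstract; per-region sizes are in the supplementary tables and are not reproduced here.

```python
def relative_abundance(ssr_count, region_bp):
    """Number of SSRs per megabase (Mb) of the region."""
    return ssr_count / (region_bp / 1_000_000)


def mean_spacing_kb(ssr_count, region_bp):
    """Average sequence length per SSR, in kilobases (the inverse view of RA)."""
    return (region_bp / 1_000) / ssr_count


if __name__ == "__main__":
    genome_bp = 775_200_000   # 775.2 Mb examined, from the abstract
    total_ssrs = 136_345      # SSRs identified genome-wide
    print(f"RA: {relative_abundance(total_ssrs, genome_bp):.1f} SSRs per Mb")
    print(f"spacing: {mean_spacing_kb(total_ssrs, genome_bp):.2f} kb per SSR")
```

The same two functions apply to any region class (CDS, intron, UTR, intergenic) once its total length in base pairs is known.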

To analyze the differential distribution of SSRs more clearly, we characterized the distribution of the SSR types in each repeat unit across the different genomic locations (Table 4 & Table 2, see supplementary material). The results showed that the distribution of the different SSR types in the genome was non-random. For instance, of the two possible types of mono-nt SSRs, poly-A/T was the predominant form with 10,141 loci, about 89.6% of the total mono-nt loci. Of the ten possible types of tri-nt SSRs, poly-AAT/TTA accounted for 54.8% of the total tri-nt loci, and among the tetra-nt SSRs, poly-AAAT/TTTA accounted for 51.5% of the total. Genome-wide, the most frequently occurring repeat motifs were poly-AT/TA, which accounted for 79.8% of the total di-nt SSRs (Table 2, see supplementary material). In the genic region, the most frequently occurring tri-nt repeat motifs were poly-AAG/TTC, which accounted for 27.1% of the total tri-nt SSRs in the CDS region (Table 2, see supplementary material). Ignoring the mono-nucleotide repeats, the di- and tri-nucleotide repeat motifs with the highest frequencies in the genic region were poly-AT/TA and poly-AAG/TTC, respectively, which is consistent with previous reports [19]. A number of studies have reported the development of EST-SSRs in cotton species using computational tools [15, 19]. Microsatellite markers are very important for genetic mapping, genetic diversity analysis, and marker-assisted selection (MAS) [13-16]. In the present study, 18,834 microsatellites were detected in the protein-coding genes, and the frequency of SSR-containing genes was 46.0% among the 40,976 genes of G. raimondii. Although G. raimondii seeds bear no valuable fibers, their epidermal seed trichomes grow thickly. As one of the donors of allotetraploid cotton, the D-subgenome has contributed important quantitative trait loci (QTLs) and/or genes to fiber development [20]. With the help of gene function annotation, the putative functions of the SSR-containing genes could help identify important functional domain markers (FDMs) related to gene ontology categories such as stress response and fiber development, and support the development of FDMs for genetic diversity analysis and MAS breeding in cotton species.

Conclusion

Cotton, commonly known as a fiber crop, is a plant of great commercial value. Many studies have reported the application of molecular markers in this plant for genetic mapping, gene discovery, genetic diversity analysis, and MAS. The cotton research community has made efforts to develop many portable markers to overcome the problem of low DNA polymorphism rates in cultivated cotton breeding programs (http://www.cottonmarker.org/). SSRs are among the most powerful genetic markers for genetic linkage analysis, diversity studies and marker-assisted selection. High-resolution mapping in cotton has not yet been achieved because of limited DNA polymorphism within a cotton species. The development and application of molecular markers are of immense importance for exploring the genetic make-up of cotton, inter-species variability, and evolutionary relationships. The genome-based SSRs developed in the present study provide a resource for these investigations and lay the groundwork for developing large numbers of SSR markers in cotton. The growing collection of portable markers in cotton provides a cost-effective tool for genome mapping and gene discovery to understand and improve the cotton species.

