Literature DB >> 23804557

Development and characterization of simple sequence repeat markers providing genome-wide coverage and high resolution in maize.

Jie Xu1, Ling Liu, Yunbi Xu, Churun Chen, Tingzhao Rong, Farhan Ali, Shufeng Zhou, Fengkai Wu, Yaxi Liu, Jing Wang, Moju Cao, Yanli Lu.   

Abstract

Simple sequence repeats (SSRs) have been widely used in maize genetics and breeding, because they are co-dominant, easy to score, and highly abundant. In this study, we used whole-genome sequences from 16 maize inbreds and 1 wild relative to determine SSR abundance and to develop a set of high-density polymorphic SSR markers. A total of 264 658 SSRs were identified across the 17 genomes, with an average of 135 693 SSRs per genome. Marker density was one SSR every of 15.48 kb. (C/G)n, (AT)n, (CAG/CTG)n, and (AAAT/ATTT)n were the most frequent motifs for mono, di-, tri-, and tetra-nucleotide SSRs, respectively. SSRs were most abundant in intergenic region and least frequent in untranslated regions, as revealed by comparing SSR distributions of three representative resequenced genomes. Comparing SSR sequences and e-polymerase chain reaction analysis among the 17 tested genomes created a new database, including 111 887 SSRs, that could be develop as polymorphic markers in silico. Among these markers, 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78% of them had mono, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. Polymorphic information content for 35 573 polymorphic SSRs out of 111 887 loci varied from 0.05 to 0.83, with an average of 0.31 in the 17 tested genomes. Experimental validation of polymorphic SSR markers showed that over 70% of the primer pairs could generate the target bands with length polymorphism, and these markers would be very powerful when they are used for genetic populations derived from various types of maize germplasms that were sampled for this study.

Entities:  

Keywords:  maize; polymorphic SSR markers; simple sequence repeat; teosinte; whole-genome sequences

Mesh:

Substances:

Year:  2013        PMID: 23804557      PMCID: PMC3789560          DOI: 10.1093/dnares/dst026

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Introduction

Maize (Zea mays L.) is one of the most important food, feed, and industrial crops globally and a model system for the study of genetics, evolution, and domestication. The maize genome is large and complex. The estimated total size of genome draft is 2.3 Gb, with over 80% of repeated sequences of various types.[1] The genetic variability in the maize genome can be utilized to enhance biotic and abiotic stress tolerance and to improve agronomic traits such as quality, maturity, and yield potential. Types of variation at the whole-genomic level include microsatellites or simple sequence repeats (SSRs), single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), and various types of structure variation. SSRs are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide sequence motifs flanked by unique sequences.[2,3] The unique sequences bordering the SSR motifs provide templates for specific primers to amplify SSR alleles via polymerase chain reaction (PCR), and allelic differences are usually the result of variable numbers of repeat units within a microsatellite structure.[4] A larger number of repeated units is generally related to greater genotypic variation, and the shorter motifs such as those with mono-, and di-nucleotides usually possess more repeats than longer motifs such as those with tetra-, penta-, and hexa-nucleotides. However, shorter motifs can produce more slipped-strand mispairing (stuttering) during PCR, which usually lead to genotyping errors.[5,6] Based on the repetitive architecture, purity, and complexity of their motifs, SSRs can be classified as perfect (single motif in an uninterrupted array), imperfect, or compound (two or more motifs in interrupted or uninterrupted arrays). As we known, SSR loci with longer or perfect motifs can exhibit a higher level of allelic variability.[5] SSRs have been the genetic markers of choice, because they are easy to score, and have multiallelic nature, co-dominant inheritance, and clear advantages over restriction fragment length polymorphism and amplified fragment length polymorphism markers in terms of technical simplicity, throughput level and automation.[7] Compared with SNP markers that are generally biallelic,[8] SSR markers are more informative because it can detect multiple alleles per locus, so they are still commonly used nowadays. Thanks to the availability of whole-genome or transcriptome sequences in public databases and in the recent advent of bioinformatics tools, development of genetic markers including SSRs has become much easier and more cost-effective. Genetic markers can be obtained by screening genomic, cDNA sequences, or libraries of clones. To facilitate access to and utilization of SSR markers in Brachypodium, 27 329 SSR markers were successfully designed through genome-wide analysis, but only 398 SSR markers have been developed from its bacterial artificial chromosomes end and expressed sequence tag databases.[9] The availability of the completed soybean whole-genome sequence also provided an ideal resource for the genome-wide development of locus-specific SSR markers, and 33 065 high-polymorphic SSRs were developed with the availability of their genome positions and primer sequences.[10] Barchi et al.[11] combined the recently developed a restriction site-associated DNA approach with Illumina DNA sequencing to rapidly discover a large number of SNP and SSR markers for eggplant. Huang et al.[12] identified over 3.6 million SNPs by sequencing 517 rice landraces, which were used in genome-wide association studies for 14 agronomic traits. These results show that genetic markers such as SSRs and SNPs are abundant in different crop genomes and can be easily scored, making it more accessible to the breeders and geneticists. SSRs are abundant and well distributed throughout the maize genome, which can be employed as a preferred marker system. SSR markers have been utilized extensively in maize to characterize the genetic structure and diversity, to construct phylogenetic trees and to define potential heterotic groups, and to identify unique sources of allelic diversity.[13-15] Furthermore, SSR markers have been widely used for genetic map construction, quantitative trait locus (QTL) mapping, map-based cloning, and marker-assisted selection (MAS) because of their ubiquity and high level of polymorphism. Hence, enriching the current maize linkage maps with more SSR markers is of great value for the global maize molecular breeding. In recent years, many SSR markers have been developed and are publicly available (http://www.maizegdb.org/ssr.php) based on their target sequences among different maize germplasm accessions. However, a relatively low level of polymorphism was observed between cultivated maize and their relatives, and within populations derived from cultivated × teosinte and temperate × tropical maize crosses. The availability of the reference genome sequence and increasingly cost-effective sequencing facilities makes it possible to do whole-genome sequencing for more maize germplasm accessions. We used whole-genome sequence information from 3 typical tropical maize inbreds, 13 typical temperate maize inbreds from different heterotic groups, and 1 teosinte line, to analyse their genetic variation and to develop polymorphic molecular markers that can be used for high-resolution MAS, genomic selection, and QTL mapping. Using germplasm of diverse resources including teosinte and different types of maize lines, we can reveal and utilize unique alleles and loci better. Thus, the objectives of this study were to determine the abundance and characterization of SSRs in the maize genome and to use stringent screening to develop highly polymorphic SSR markers.

Materials and methods

Plant materials

Sequence data were generated for 17 genotypes including 16 improved maize inbred lines and 1 wild relative, Z. mays ssp. mexicana (hereafter Z. mexicana), which were listed in Table 1. Among the 16 improved maize inbreds, CML411 and P1 from International Maize and Wheat Improvement Center (CIMMYT) and 81565 from China were chosen to represent tropical/subtropical germplasm, ES40 was derived from traditional Chinese landrace, and the remaining 12 temperate maize inbreds were chosen to represent different heterotic groups in Chinese temperate maize. Temperate maize lines 178, Huangzao4, Ye478, Zheng22, and B73 representing PB, SPT, PA, LRC, and BSSS heterotic groups, respectively, were widely used for commercial hybrid production. For marker validation, additional six teosinte species obtained from United States Department of Agriculture (USDA) and four maize inbreds were also included (Table 1).
Table 1.

Maize genotypes used in the study

GenotypesPedigreeAdaptation
178aSelected from an introduced hybridTemperate
18RedaAmerican hybrid P78599Temperate
18WhiteaAmerican hybrid P78599Temperate
48-2aSynthesized populationTemperate
A318Improved S37Tropical
B73aBSSSTemperate
Dan598a(Dan340 × Danhuang11) × (Danhuang02 × Dan599)Temperate
ES40aLandrace Linshuidadudu selected from SichuanTemperate
Han21aAmerican hybrid P78599Temperate
Huangzao4aImproved from Landrace, TangSiPingTouTemperate
Lu9801Ye502 × H21Temperate
Mo17aC103 × 187-2Temperate
RP125aDerived from hybrid Chuandan9Temperate
Ye478aU8112 × Shen5003Temperate
Zheng22a(Duqing × E28) × Lu Jiu KuanTemperate
81565a(Huobai × Jin03)S2 × Heibai94Tropical/subtropical
CML411aP28C7-S4-#-BBBBBBBBBBBTropical/subtropical
CML206[EV7992#/EVPO44SRBC3]#BF37SR-2-3SR-2-4-3-BBTropical/subtropical
CML85P34C5F21-2-#1-2-2-#Tropical/subtropical
P1aUnknownTropical/subtropical
Z. mexicanaaZea mays ssp. mexicanaTropical
Z. parviglumisZea mays ssp. parviglumisTropical
Z. huehuetenangensisZea mays ssp. huehuetenangensisTropical
Z. nicaraguensisZea nicaraguensisTropical
Z. luxuriansZea luxuriansTropical
Z. perennisZea perennisTropical
Z. diploperennisZea diploperennisTropical

All the materials were used for experimental validation.

aMaterials were only used for SSR identification and markers development. The chromosome number is 2n = 40 for Z. perennis and 2n = 20 for other species.

Maize genotypes used in the study All the materials were used for experimental validation. aMaterials were only used for SSR identification and markers development. The chromosome number is 2n = 40 for Z. perennis and 2n = 20 for other species.

Maize genome sequences

For maize lines 81565, 18Red, 18White, 48-2, Dan598, ES40, RP125, and Z. mexicana, sequences were generated, and paired-end libraries were constructed according to the Illumina manufacturer's instructions. An average resequencing depth was 13× and genome coverage was 85% for maize inbreds. One teosinte species Mexican was sequenced with an average of resequencing depth of 9× and genome coverage of 74%. The genome sequences for the remaining maize lines were downloaded from the NCBI Sequence Read Archive (SRA) database (SRA049859 and SRA051245) and NCBI GenBank (JQ886798–JQ887980). All sequence reads were aligned against the maize B73 reference genome (www.maizesequence.org Release 4a.53) using Short Oligonucleotide Alignment Program 2 (http://soap.genomics.org.cn/). Sequencing and reads mapping were carried out at Beijing Genomics Institute (Shenzhen, China).[16-18]

SSR identification and primer design

SSR motifs were identified in 17 genomes using MISA (MIcroSAtellite identification tool) program downloaded from the Leibniz Institute of Plant Genetics and Crop Plant Research website (http://pgrc.ipk-gatersleben.de/misa/). Only perfect SSRs including mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with numbers of uninterrupted repeat units more than 10, 7, 6, 5, 4, and 4, respectively, were targeted. The 5′- and 3′-untranslated regions (UTR), protein coding sequence (CDS), intron, and intergenic regions were determined based on their original annotations of the maize B73 reference genome (www.maizesequence.org Release 4a.53). Promoter sequences were determined at 2 kb upstream of the transcription initiation site. Any SSR locus to be used to develop genetic markers should include a perfect repeat motif and two unique flanking sequences with 300 bp on each sides of the repeat. In our study, SSR candidate sequences were used for BLASTN search against the genome sequences (e-value cut-off of 1e−10), and filtered with >90% of identity and minimum alignment length with >85% of the flanking sequences. Those with unique hit, together with their specific flanking sequences, were identified as candidate SSR loci. Then, we wrote a Perl script to combine SSRs within 5 kb of different genomes with the same motif and to identify polymorphic SSR loci among 17 genotypes depending on the presence of motifs. The forward and reverse primers were designed based on unique flanking sequences using Primer 3 (http://primer3.sourceforge.net/). Input parameters for the primer design were as follows: minimum, maximum, and optimal sizes were 18, 27, and 20 nt; minimum and maximum GC were 20 and 80%; and minimum, maximum, and optimal Tm were 57, 63, and 60°C, respectively. The deviation of amplicon size of each SSR primer ranged from 30 to 500 bp based on the expected SSR sequence length. In addition, electronic polymerase chain reaction (e-PCR) programme (http://www.ncbi.nlm.nih.gov/projects/e-pcr/) was applied to check the uniqueness and specificity of designed primers in the genomes. The parameters were set as following: the word size was 9, the discontiguous word was 1, the maximal allowed deviation of hit product size was 100, the maximum mismatches allowed, and the maximum indels allowed were 1, respectively. On the other hand, the published SSR markers reposited in MaizeGDB (http://www.maizegdb.org/) were downloaded and amplified in silicon through e-PCR programme for further comparison.

Experimental validation of polymorphic SSR markers

To assess the value of identified SSR markers, 151 primer pairs from 10 chromosomes including all types of SSR were chosen for experimental validation. The samples used in this experiment included 20 improved maize lines and 7 teosinte lines (Table 1). Genomic DNA was extracted from seedlings using the CTAB method. Primers were made by Shanghai DNA Biotechnologies Co., Ltd. PCR was performed in 25 μl reactions containing 2.5 μl buffer, 2.5 μl MgCl2 (25 mM), 4.0 μl dNTP (2.5 mM), 0.2 μl Taq polymerase (5 U/μl), 1 μl template DNA (100 ng/μl),13.8 μl ddH2O, and 0.1 μg primers. The PCR conditions were as follows: 1 cycle at 94°C for 5 min; 35 cycles at 94°C for 30 s, 60°C for 30 s, 72°C for 1 min, and 1 cycle at 72°C for 10 min. PCR products mixed with loading buffer were heated at 95°C for 5 min and quickly chilled on ice. The entire mixture was electrophoresed on 6% denaturing polyacrylamide gel, and the genotype was scored after silver staining. The number of alleles was recorded and the polymorphism information content (PIC) was calculated as described by Smith et al.[19]

Results

The abundance of SSRs in the maize genome

A large number of perfect SSRs with mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs were identified, but the numbers varied among different genomes (Table 2). The average number of SSRs was 135 693 in 17 genotypes, ranging from 133 346 loci observed in mexicana to 136 723 loci in tropical/subtropical maize inbred 81565. Some reads from Z. mexicana could not be mapped onto the reference genome, which resulted in relatively lower genome coverage and thus, less SSRs identified compared with other maize inbreds. A total of 264 658 unique SSR loci were detected in 17 genomes, of which mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs were 153 231, 65 236, 25 910, 6572, 8839, and 4870, respectively. The mono-nucleotide motif is the most abundant, accounted for 57.90%. There were 38 971 common SSRs (15% of the total) observed to be the same across 17 genotypes. The SSR density was calculated based on the maize reference genome size of 2.1 Gb, and there was a little difference among 17 genotypes for each nucleotide motif, with an average interval of 15.48 kb between SSR loci for every genome. However, the average intervals for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs were remarkably different, which were 26.93, 60.88, 150.95, 717.70, 505.05, and 942.55 kb, respectively (Table 2). SSRs were considerably abundant and distributed throughout the maize genome, with a small average marker interval (7.93 kb) for all detected loci.
Table 2.

Numbers and density of SSR loci identified in 17 maize genomes

GenotypesSSR numbers
SSR interval (kb)
MNRDNRTNRTTRPNRHNRTotalMNRDNRTNRTTRPNRHNRTotal
17878 36734 60413 9292 96441722226136 26226.8060.69150.76708.50503.36943.4015.41
8156579 04234 45713 893292941782224136 72326.5760.95151.16716.97502.63944.2415.36
18White77 14434 45113 931292841662217134 83727.2260.96150.74717.21504.08947.2315.57
18Red77 04934 49513 919293341882221134 80527.2660.88150.87715.99501.43945.5215.58
48-276 92034 44813 884292441232219134 51827.3060.96151.25718.19509.34946.3715.61
B7377 88834 75514 028294841812239136 03926.9660.42149.70712.35502.27937.9215.44
CML41178 59134 54613 900293541892225136 38626.7260.79151.08715.50501.31943.8215.40
Dan59878 55834 55913 954290441592227136 36126.7360.77150.49723.14504.93942.9715.40
ES4078 97834 37213 894292441612221136 55026.5961.10151.14718.19504.69945.5215.38
Han2178 53934 58413 951289841572233136 36226.7460.72150.53724.64505.17940.4415.40
Huangzao476 77334 44513 878290741652224134 39227.3560.97151.32722.39504.20944.2415.63
Mo1778 36034 44213 910290241082220135 94226.8060.97150.97723.64511.20945.9515.45
P178 97534 44713 849296141462235136 61326.5960.96151.64709.22506.51939.6015.37
RP12576 88034 52313 934292941982244134 70827.3260.83150.71716.97500.24935.8315.59
Z. mexicana75 99734 33913 806288640852233133 34627.6361.15152.11727.65514.08940.4415.75
Ye47878 57134 50813 973294441592234136 38926.7360.86150.29713.32504.93940.0215.40
Zheng2278 92134 45513 870292041562230136 55226.6160.95151.41719.18505.29941.7015.38
Average77 97434 49613 912292641582228135 69326.9360.88150.95717.70505.05942.5515.48
Total153 23165 23625 910657288394870264 65813.7032.1981.05319.54237.58431.217.93
Common22 4539963460355392947038 97193.53210.78456.223797.472260.504468.0953.89

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs.

Numbers and density of SSR loci identified in 17 maize genomes MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs. We also examined different SSR repeat types in the genome for all tested genotypes. The frequencies of different nucleotide repeat types in each motif were different, but they showed similar frequency patterns in different genomes. Here, we compared the frequencies of different SSR repeat types by taking the reference line B73 as an example (Table 3). Of mononucleotide motifs, C/G repeats accounted for ∼54.4%, which was slightly higher than A/T repeats. Of the di-nucleotide motifs, (AT)n were most frequent (24.93%), followed by (AG/CT)n (24.36%), (TA)n (21.03%), and (GA/TC) (20.64%), while the (CG)n motif was least frequent (0.80%). Of the tri-nucleotide motifs, (CAG/CTG)n was the most abundant (15.86%), while other nucleotide repeat types had lower frequencies (0.4–6%). Of the tetra-nucleotide SSRs, (AAAT/ATTT)n, was most frequent (10.35%), and the frequencies for the rest nucleotide repeat types were all lower than 7%. There were many types of penta- and hexa-nucleotide SSRs, each with low frequencies, ranging from 0.04 to 4%. The numbers of mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs in different repeat unit classes are also listed in Table 3. The average repeat lengths were different among various motifs ranging from 11.02 for (A/T)n to 58.84 for (AGT/ACT)n.
Table 3.

Number of SSRs in different repeat classes in the maize genome B73

MotifsRepeats number
TotalAverage repeat numberAverage repeat length (bp)
<55–78–1011–1516–2021–2526–3031–40>40
G/C0013 08524 2973 83286624345142 36912.3112.31
A/T0022 41011 687111419257322735 51911.0211.02
AT0187532681379708478378466111866313.2826.56
CT/AG03615350474026512772954784659.4218.84
TA0172524171087625465364492134730914.0328.05
GA/TC03220293851923710371572871739.1918.39
CA/TG06665427215515113078.2416.48
GT/AC05825887312544012688.3216.64
GC0223663300002957.4214.83
CG0219550100002757.2914.59
CAG/CTG0198024230000022256.519.51
GAT/ATC052626446511008437.4922.48
GCA/TGC0711990000008106.4219.25
GAC/GTC058718428100008007.0221.06
ATT/AAT03521246542281810464310.2530.74
TTA/TAA0359966632272414262010.3130.94
CGT/ACG042913134000005947.1821.55
TGA/TCA035016758931005887.8623.59
CGA/TCG04741004000005786.6920.06
CGC/GCG0505494100005596.519.51
GCC/GGC0490425000005376.4919.46
TAT/ATA02461034347351313550511.3934.16
CGG/CCG0416483000004676.519.51
TAC/GTA023250261661513378.5425.63
TTG/CAA02308016400003307.3221.96
TTC/GAA0288269001233297.4222.26
ATG/CAT02604511401003217.0121.04
GCT/AGC0291161100003096.3619.09
TGG/CCA0280242000003066.419.2
TAG/CTA0186492611108332969.6328.9
CTC/GAG0232466000002846.8220.45
CTT/AAG024531111011281721
TCC/GGA0193438100002456.9120.74
CCT/AGG0187325210002276.9320.79
CAC/GTG0198180100002176.4619.38
ACC/GGT0191161000002086.3919.17
ACA/TGT01255810001001947.3422.02
AGA/TCT0169212010001936.5819.74
GTT/AAC085294011001207.4522.35
AAAT/ATTT0288152000003055.5822.33
AGGC/GCCT0153114000001685.923.6
TATT/AATA0195102000002075.7222.88
TCGT/ACGA012600000001265.0720.29
TTAT/ATAA0111160000001275.8823.53
TTTA/TAAA015772000001665.6722.67
CGAGC/GCTCG1532400000001774.1624.95
TTTTA/TAAAA822800000001104.2925.75
ATTTT/AAAAT693510000001054.4222.1

SSR motifs with repeats number >100 in total were listed here.

Number of SSRs in different repeat classes in the maize genome B73 SSR motifs with repeats number >100 in total were listed here.

Screening of SSR loci and development of maize SSR markers

A total of 2034 SSR markers have been recently developed and reported on MaizeGDB website (www.maizegdb.org). Among the public markers, 1556 SSRs have genomic positions. Through e-PCR programme conducted in B73 genome, 827 SSR markers have specific amplicon, 60 SSR markers have more than one binding sites, and the remaining markers have no proper binding sites on the 10 chromosomes. Here, we developed a new database containing more SSR markers with unique flanking sequences. From the SSRs that could be detected (264 658) across 17 maize genomes, 189 087 (71.45%) of them were identified with unique flanking sequences with an average of 82 741.9 SSR loci for each genome (Table 4). The average numbers of SSR loci with different motifs for each genome were notably different, accounting for 55.19, 74.60, 48.09, 81.94, 80.69, and 68.94% of the total SSRs for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively (Table 4). It implies that over 80% of tetra- and penta- nucleotide motifs in the maize genome can be used to design SSR markers. A total of 25 437 SSRs with unique flanking sequences were found to be shared across 17 tested genomes, of which 9240 (36.33%) were polymorphic.
Table 4.

Summary of SSR loci with unique flanking sequences identified in tested maize genomes

MotifsAverage SSRs
Total SSRs
Common SSRs
Common SSRs with polymorphism
No.%aNo.%bNo.%cNo.%
MNR43 030.455.19103 48667.5412 02954.31458838.14
DNR25 733.274.652 87681.05863079.87377643.75
TNR6689.948.0915 94661.54275559.5356020.33
TTR2397.281.94565886.0965283.2714221.78
PNR3355.480.69748184.6495083.1112913.58
HNR1535.868.94364074.7442274.174510.66
Total82 741.960.98189 08771.4525 43763.47924036.33

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs.

aPercentage of the average number of SSRs with unique flanking sequences against all for every tested maize genome.

bPercentage of total SSR number with unique flanking sequences against all identified in 17 maize lines.

cPercentage of the common loci against all that are the same in 17 maize lines.

Summary of SSR loci with unique flanking sequences identified in tested maize genomes MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs. aPercentage of the average number of SSRs with unique flanking sequences against all for every tested maize genome. bPercentage of total SSR number with unique flanking sequences against all identified in 17 maize lines. cPercentage of the common loci against all that are the same in 17 maize lines. Of 189 087 candidate SSRs, 188 571 loci have specific physical position and would be developed as genetic markers in the study. Primer pairs were then designed for the 188 571 SSR loci, with 13 344 (chromosome 10) to 29 779 (chromosome 1) SSRs on each chromosome, and 173 587 of them were polymorphic with length differences and present-absent variation in 17 genomes. E-PCR programme was further conducted to validate and refine the specificity of new designed SSR markers, and 111 887 primer pairs of them could bind as expected and the others were amplified with multiple binding sites or false match. Through comparing SSR sequences among 17 tested genomes, a new database was developed to include 111 887 SSR markers with specific physical positions, with proportion of 59% of the candidate SSR loci with specific flanking sequences (Table 5 and Supplementary Table S1). Among these markers, SSRs with mono-, di-, tri-, tetra-, penta-, and hexa- nucleotide motifs accounted for 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78%, respectively. A total of 35 573 SSR loci, accounting for 31.8% of the refined SSR markers, showed length polymorphism in the 17 tested genotypes. The PIC for these polymorphic SSRs varied from 0.05 to 0.83, with an average of 0.31 (Supplementary Table S1). SSR markers with mono- and di-nucleotide motifs showed higher levels of polymorphism (33.87 and 37.31%, respectively) than other SSR markers with tetra-, penta-, and hexa-nucleotide motifs (7.44–17.19%). Comparing with the SSR markers in MaizeGDB database, there were 18 606 SSR markers, accounting for 16.6% of the newly developed SSR markers, shared the same loci with the public SSR markers with various motifs. However, only 527 (0.47%) newly developed SSR markers had completely compatible position with public SSR primers. Additionally, the average SSR lengths and number of loci across 10 chromosomes were calculated for three SSR datasets, all SSRs, SSRs with unique flanking sequences, and polymorphic SSRs (Fig. 1). In each of the three SSR datasets, the numbers of loci gradually declined with the increase of SSR lengths, the same as shown in previous studies.[20]
Table 5.

Numbers of candidate SSR markers, and polymorphic SSR markers detected in 17 maize lines and previously developed SSR in MaizeGDB database

ChrCandidate SSR markers
Poly. (%)aSSRs in MaizeGDB database
MNR
DNR
TNR
TTR
PNR
HNR
Mono-Poly-Mono-Poly-Mono-Poly-Mono-Poly-Mono-Poly-Mono-Poly-
16957342229581737112924243984643733041230.94293
2487524882118121269715729476418462031731.71226
3507924142289121777717733069514492091730.01224
4483123312246126477416633173422451922330.73141
5462324981851114068116427667461551981832.76146
635011816149984457911824844345251581231.11111
735481870147392950515027141285441501232.83112
835531845139190150411818532315501801332.56118
930591726126284745110020339297401371633.85102
102884157212148034361272035224826111833.6883
Total42 91021 98218 30110 8946533151927805773948453184214831.791556

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs.

Mono: monomorphism; poly: polymorphism; Chr: chromosome.

aPercent of polymorphic SSR markers over all of candidate SSR markers in silicon analysis.

Figure 1.

Distributions of 263,423 SSR loci (a) and 111,887 new developed SSR markers (b) with unique physical positions across 10 chromosomes in the B73 reference genome (www.maizesequence.org Release 4a.53). Different colors represent levels of density of SSRs.

Numbers of candidate SSR markers, and polymorphic SSR markers detected in 17 maize lines and previously developed SSR in MaizeGDB database MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs. Mono: monomorphism; poly: polymorphism; Chr: chromosome. aPercent of polymorphic SSR markers over all of candidate SSR markers in silicon analysis. Distributions of 263,423 SSR loci (a) and 111,887 new developed SSR markers (b) with unique physical positions across 10 chromosomes in the B73 reference genome (www.maizesequence.org Release 4a.53). Different colors represent levels of density of SSRs.

Distribution of SSRs in different genomic regions

A total of 264 658 SSRs were detected in 17 genomes, and 263 423 loci of them have specific physical position. The distributions of 263 423 SSR loci and 111 887 newly developed SSR markers refined by e-PCR programme across 17 tested genomes were shown in Fig. 2 a and b , respectively. SSRs were unevenly distributed on chromosome regions, and there were much more loci located in near telomeric regions than near centromeres, which was accordance with the distribution patterns of genes in maize.[21] Moreover, we compared SSR distributions across five genomic regions using tested genomes of P1, B73, and Z. mexicana to represent tropical, temperate, and wild maize germplasm, respectively (Table 6). SSR loci were most abundant in intergenic region and least frequent in UTR region. Polymorphism rate and GC content of SSRs in coding regions were higher than other genic regions.
Figure 2.

Correlation between SSR numbers and SSR lengths.

Table 6.

The distribution of SSRs in different genomic regions

 All SSR loci
SSR loci with unique flanking sequences
SSR loci with polymorphism
CountInterval (kb)Length (bp)GC%CountInterval (kb)Length (bp)GC%CountInterval (kb)Length (bp)GC%
B73
 5′-UTR38046.9116.1032.2036017.3016.1932.60133519.6916.2326.55
 3′-UTR38296.8716.1333.6836287.2516.2233.94134119.6215.9026.70
 CDS255317.2719.8074.82239123.1019.8775.28309178.7321.2856.84
 Intron11 85710.1316.0329.9411 03810.8816.1229.97470425.5416.7029.60
 Promotors10 3706.2617.3527.5792487.0217.5426.71334419.4217.2620.82
 Intergenic107 65821.6316.1750.4965 09828.5617.1246.2221 65085.8817.7042.24
 Total/average136 03915.1516.3347.0190 44122.7917.1142.6230 90066.7017.4937.61
Z. Mexicana
 5′-UTR252010.2515.5526.23226311.4215.6624.10100725.6614.3517.18
 3′-UTR33297.7617.3438.3430928.3517.4238.10110023.4815.8123.81
 CDS363614.2917.7045.01325215.9817.9343.9695454.4716.1423.08
 Intron810214.2316.2432.41716816.0816.4830.43303537.9815.6324.71
 Promotors10 9035.8617.7934.3510 0356.3717.9534.04362717.6216.3221.71
 Intergenic107 78516.5016.1849.9557 81330.7617.5142.0722 41379.3316.6438.84
 Total/average133 34615.4616.3847.1380 54525.5917.5039.8230 78766.9416.4534.45
P1
 5′-UTR26849.6015.2724.75241910.6515.3722.5092627.8215.4420.02
 3’-UTR34787.4117.0437.0732307.9817.1136.90104424.6816.9629.53
 CDS372413.9117.4444.49335115.4617.6343.4891156.8617.5128.36
 Intron862213.3215.9631.18765914.9916.1429.13307337.3616.2828.44
 Promotors11 2185.6817.5633.8210 3166.1817.6733.50348618.2817.5624.74
 Intergenic110 04316.1716.0649.5159 16730.0717.2641.8122 47579.1617.3941.03
 Total/average136 61315.0916.2446.5882 83124.8817.2439.3830 62167.3117.2837.09
The distribution of SSRs in different genomic regions Correlation between SSR numbers and SSR lengths. The average intervals between SSRs were the longest in intergenic regions, second in CDS regions, and smallest in promotors (Table 6). Distributions for the SSRs with unique flanking sequences and for the polymorphic SSRs across tested genomes also varied among the six genomic regions, but the trend was consistent with that for all the candidate SSR loci. This result is also in full agreement with a previous report in rice.[22] In addition, SSR distribution was rather similar among the three representative genotypes. Furthermore, the repeat types of SSRs in CDS region of B73 were investigated. SSRs with tri-nucleotide repeats were the most (1832) among the six repeat types, with proportion of 71.8% in CDS region. The tri- and hexa-nucleotide SSRs that would not bring the frame shift accounted for 84.1% (2148) of the SSRs in CDS region. Therefore, only 15.9% of the SSRs in CDS region have potential threats to the gene structure.

SSR markers validated for quality and polymorphism

A total of 151 SSR markers were randomly chosen for experimental validation using 20 maize inbreds and 7 teosinte lines (Fig. 3 and Table 1). Of them, 121 primer pairs (80.1%) generated specific products and distinct bands, while 30 primer pairs failed to produce stable or clear bands due to the lack of sequence specificity in the genomic DNA samples. The majority of the 121 primer pairs (112 primer pairs) revealed high levels of allelic diversity in tested 27 lines, with PIC values of 0.074–0.796 (an average of 0.478). The 112 polymorphic SSR loci contained 329 alleles in total and an average of 2.94 alleles with a range of 2–5 (Supplementary Table S2).
Figure 3.

Experimental validation of six randomly selected SSR markers in 27 genotypes. Lanes 1–27 were PCR products of Zea perennis, Z. diploperennis, Z. mays ssp. parviglumis, Z. mays ssp. huehuetenangensis, Z. nicaraguensis, Z. luxurians, Z. mays ssp. mexicana, RP125, 18Red, 18White, CML206, 81565, A318, P1, Han21, CML85, CML411, Ye478, Mo17, Zheng22, 178, 48-2, B73, Lu9801, ES40, Huangzao4, and Dan598, respectively.

Experimental validation of six randomly selected SSR markers in 27 genotypes. Lanes 1–27 were PCR products of Zea perennis, Z. diploperennis, Z. mays ssp. parviglumis, Z. mays ssp. huehuetenangensis, Z. nicaraguensis, Z. luxurians, Z. mays ssp. mexicana, RP125, 18Red, 18White, CML206, 81565, A318, P1, Han21, CML85, CML411, Ye478, Mo17, Zheng22, 178, 48-2, B73, Lu9801, ES40, Huangzao4, and Dan598, respectively. In addition, we made a detailed comparison of the allele number and PIC value in silicon analysis and in maker validation in 17 tested genotypes. Forty-one of the 121 primers possessed the practical alleles in accordance with the expected alleles, 38 primer pairs had more allele number, and 42 primer pairs had less allele number in silicon analysis than in maker validation (Supplementary Table S2). Additionally, comparing polymorphism in silicon analysis using 17 tested genomes and in maker validation using 27 genotypes, we found that 51 primer pairs showed more alleles and higher PIC values in validation experiment (Supplementary Table S2). Interestingly, 26 of 151 SSR markers with no polymorphism in silicon analysis showed more than one alleles in validation experiment. We also found that the length of PCR products in silicon analysis almost consist with those in marker validation (Supplementary Tables S1 and S2). The results indicate that newly developed SSR markers are informative and useful, and 70% of the SSR markers in our database are valid and polymorphic.

Discussion

SSRs are co-dominant, abundant, high polymorphic, and dispersed throughout plant genomes. Based on the survey across genomes, on average one SSR was found every 1.14 kb in Arabidopsis,[23] 3.6 kb in rice,[22] 4 kb in Brassica oleracea,[24] 4.5 kb in soybean,[10] 220 kb in sorghum,[25] and 578 kb in wheat.[26] In this study, average SSR density was one SSR every 15.48 kb. These may reflect real genetic differences existing among plant genomes at DNA level, and also the differences involved in sequencing methods and procedures. We used maize inbred B73 as the reference genome, some reads from maize wild relative, Z. mexicana, could not be mapped onto the reference, resulting in a relatively lower genome coverage and thus, less SSRs identified compared with other maize inbreds. Therefore, the number of SSR loci identified from Z. Mexicana may be underestimated. In general, a small difference in SSR distribution was found for different populations or ecotypes in the same species. For instance, a very similar SSR distribution was found between indica and japonica rice, and the SSR density (interval between two SSRs) varied from one SSR every 2.0–8.1 kb, which was higher in 5′-UTR (one SSR every 2.1 and 2.0 kb, respectively) but low in CDS regions (one SSR every 8.1 and 7.7 kb, respectively).[22] Our study revealed a similar SSR distribution pattern across the tested temperate, tropical, and wild maize lines. However, SSRs are not evenly distributed in different genomic regions with much lower SSR density in CDS region than in UTR and intronic regions. Intriguingly, we found that majority of SSRs resided in CDS region were tri-nucleotide repeats, which was consistent with other report and implied the specific selection against frame shift mutations in coding regions.[27] Comparing with rice genome, the average SSR length was approximately identical (16–17 bp), but the average GC content in maize SSR sequences was much higher (47%) than rice (27%). In the maize genome, the proportions of mono-, di-, and tri-nucleotide SSR motifs were ∼60, 20, and 10%, respectively. Tetra-, penta-, and hexa-nucleotide SSR motifs were less abundant, together accounting for 10%, which was accordance with the report in rice.[22] SSR densities for different motifs were also unbalanced and the average interval varied from 26 to 950 kb. Moreover, we found that C/G, AT, and CAG/CTG repeats were the most common for mono-, di-, and tri-nucleotide SSRs, respectively, in maize, while A/T, AG, and AGG/CCT repeats are the most common in rice.[22] Meanwhile, AT repeats were also the most common dinucleotide motifs in sorghum.[25] Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of this data can be affected by several factors, and mapping errors and gaps still exist to a certain extent.[28] However, the availability of the maize genome sequence still affords us a simple and economical way to survey and identify markers, thus enabling us to develop more convenient molecular markers for breeding applications. Several sets of maize germplasm including temperate, tropical, and their wild relatives were resequenced using next-generation sequencing technology.[18,29-31] There are two major advantages in using currently available data for the analysis of SSR distribution and marker development. The maize germplasms from different ecological regions and heterotic groups (PB, SPT, PA, LRC, and BSSS) are highly diverse and host rare and unique alleles, providing an opportunity of using these types of genetic variation in hybrid maize breeding. On the other hand, whole-genome sequence data provide an ideal resource and the most complete picture of genetic variation for developing high-density genetic markers. SSR markers of highly polymorphic among diverse germplasms provide some advantages in genetics and breeding applications. In spite of considerable efforts in developing molecular markers in maize, the number of SSRs publicly available is still limited. From >260 000 SSRs identified from 17 tested genomes, we detected 111 887 SSR loci with unique flanking sequence and single binding site through genome sequence blast and e-PCR analysis. These SSR loci can be developed as polymorphic markers in silico and public on the MaizeGDB database, which are ∼60 times more than those deposited in the MaizeGDB database so far. A total of 1556 SSR markers from the MaizeGDB database have specific location, and 16.6% of the newly developed SSR markers shared the same loci with public SSR markers. For some of the public SSR markers, the amplicon size was too large and it contained several newly developed SSR primers with different motifs. Therefore, only 0.47% (527) of newly developed SSR markers had completely compatible position with public SSR markers. Another reason for a few common SSRs shared with the two datasets maybe the traditional method for SSR marker development was based on screening of small-insert or microsatellite-enriched genomic libraries by hybridization in different materials,[32] which was different from our analyses based on B73 reference genome and other resequenced genomes. Furthermore, the second-generation sequencing was different from the Sanger sequencing, which also lead to the differences. The experimental validation also proved to detect more alleles than the expected in silicon analysis due to diverse materials used in the study, but some SSR loci with little length differences were also hard to distinguish. The average marker density for the newly developed dataset reached one SSR per 14.7 kb in the B73 reference genome, indicating that maize is a highly polymorphic species.[33] The availability of abundant SSR markers allows dramatic improvement in the efficiency of marker-assisted selection and fine mapping of QTL regions. Previous studies have mainly focused on di-, tri-, and tetra-nucleotide SSRs, whereas mono-, penta-, and hexa-nucleotide SSRs have not drawn enough attention for marker development. We found that mono-nucleotide SSRs had much higher polymorphism rates than others, and penta- and hexa-nucleotide SSRs had relatively longer repeat units. Intron and UTR SSRs were more polymorphic than CDS SSRs due to low selective pressure in non-coding regions, which were consistent with previous reports.[22,34-36] Experimental validation using 20 maize inbreds and 7 teosinte species showed that over 70% of the primer pairs could generate the target bands with length polymorphism, promising a great potential for the application of these SSR markers. In practice, it would be very powerful when they are used for genetic populations derived from various types of maize germplasm that were sampled for this study.

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was supported by the National High Technology Research and Development Program of China (2012AA101104) to Y.L. and Y.X., Sichuan Youth Science and Technology Foundation of China (2012JQ0003), and the National Natural Science Foundation of China (31101162) to Y.L.
  31 in total

1.  Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes.

Authors:  Michele Morgante; Michael Hanafey; Wayne Powell
Journal:  Nat Genet       Date:  2002-01-22       Impact factor: 38.330

2.  Maize HapMap2 identifies extant variation from a genome in flux.

Authors:  Jer-Ming Chia; Chi Song; Peter J Bradbury; Denise Costich; Natalia de Leon; John Doebley; Robert J Elshire; Brandon Gaut; Laura Geller; Jeffrey C Glaubitz; Michael Gore; Kate E Guill; Jim Holland; Matthew B Hufford; Jinsheng Lai; Meng Li; Xin Liu; Yanli Lu; Richard McCombie; Rebecca Nelson; Jesse Poland; Boddupalli M Prasanna; Tanja Pyhäjärvi; Tingzhao Rong; Rajandeep S Sekhon; Qi Sun; Maud I Tenaillon; Feng Tian; Jun Wang; Xun Xu; Zhiwu Zhang; Shawn M Kaeppler; Jeffrey Ross-Ibarra; Michael D McMullen; Edward S Buckler; Gengyun Zhang; Yunbi Xu; Doreen Ware
Journal:  Nat Genet       Date:  2012-06-03       Impact factor: 38.330

3.  Genetic structure and diversity among maize inbred lines as inferred from DNA microsatellites.

Authors:  Kejun Liu; Major Goodman; Spencer Muse; J Stephen Smith; Ed Buckler; John Doebley
Journal:  Genetics       Date:  2003-12       Impact factor: 4.562

Review 4.  Microsatellites: simple sequences with complex evolution.

Authors:  Hans Ellegren
Journal:  Nat Rev Genet       Date:  2004-06       Impact factor: 53.242

5.  Genome-wide genetic changes during modern breeding of maize.

Authors:  Yinping Jiao; Hainan Zhao; Longhui Ren; Weibin Song; Biao Zeng; Jinjie Guo; Baobao Wang; Zhipeng Liu; Jing Chen; Wei Li; Mei Zhang; Shaojun Xie; Jinsheng Lai
Journal:  Nat Genet       Date:  2012-06-03       Impact factor: 38.330

Review 6.  Genic microsatellite markers in plants: features and applications.

Authors:  Rajeev K Varshney; Andreas Graner; Mark E Sorrells
Journal:  Trends Biotechnol       Date:  2005-01       Impact factor: 19.536

Review 7.  Molecular and functional diversity of maize.

Authors:  Edward S Buckler; Brandon S Gaut; Michael D McMullen
Journal:  Curr Opin Plant Biol       Date:  2006-02-03       Impact factor: 7.834

Review 8.  Microsatellite marker development, mapping and applications in rice genetics and breeding.

Authors:  S R McCouch; X Chen; O Panaud; S Temnykh; Y Xu; Y G Cho; N Huang; T Ishii; M Blair
Journal:  Plant Mol Biol       Date:  1997-09       Impact factor: 4.076

9.  Development and mapping of SSR markers for maize.

Authors:  Natalya Sharopova; Michael D McMullen; Linda Schultz; Steve Schroeder; Hector Sanchez-Villeda; Jack Gardiner; Dean Bergstrom; Katherine Houchins; Susan Melia-Hancock; Theresa Musket; Ngozi Duru; Mary Polacco; Keith Edwards; Thomas Ruff; James C Register; Cory Brouwer; Richard Thompson; Riccardo Velasco; Emily Chin; Michael Lee; Wendy Woodman-Clikeman; Mary Jane Long; Emmanuel Liscum; Karen Cone; Georgia Davis; Edward H Coe
Journal:  Plant Mol Biol       Date:  2002 Mar-Apr       Impact factor: 4.076

10.  Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes.

Authors:  Mark J Lawson; Liqing Zhang
Journal:  Genome Biol       Date:  2006-02-21       Impact factor: 13.583

View more
  25 in total

1.  Validating a Major Quantitative Trait Locus and Predicting Candidate Genes Associated With Kernel Width Through QTL Mapping and RNA-Sequencing Technology Using Near-Isogenic Lines in Maize.

Authors:  Yanming Zhao; Xiaojie Ma; Miaomiao Zhou; Junyan Wang; Guiying Wang; Chengfu Su
Journal:  Front Plant Sci       Date:  2022-06-30       Impact factor: 6.627

2.  Genome-wide simple sequence repeat markers in potato: abundance, distribution, composition, and polymorphism.

Authors:  Yinqiao Jian; Wenyuan Yan; Jianfei Xu; Shaoguang Duan; Guangcun Li; Liping Jin
Journal:  DNA Res       Date:  2021-10-11       Impact factor: 4.477

3.  Two fingerprinting sets for Humulus lupulus based on KASP and microsatellite markers.

Authors:  Mandie Driskill; Katie Pardee; Kim E Hummer; Jason D Zurn; Keenan Amundsen; Annette Wiles; Claudia Wiedow; Josef Patzak; John A Henning; Nahla V Bassil
Journal:  PLoS One       Date:  2022-04-14       Impact factor: 3.752

4.  Transcriptome sequencing of Himalayan Raspberry (Rubus ellipticus) and development of simple sequence repeat markers.

Authors:  Samriti Sharma; Rajinder Kaur; Amol Kumar U Solanke; Himanshu Dubey; Siddharth Tiwari; Krishan Kumar
Journal:  3 Biotech       Date:  2019-03-30       Impact factor: 2.406

5.  Genome-wide simple sequence repeats (SSR) markers discovered from whole-genome sequence comparisons of multiple spinach accessions.

Authors:  Gehendra Bhattarai; Ainong Shi; Devi R Kandel; Nora Solís-Gracia; Jorge Alberto da Silva; Carlos A Avila
Journal:  Sci Rep       Date:  2021-05-11       Impact factor: 4.996

6.  Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

Authors:  Huaiyong Luo; Xiaojie Wang; Gangming Zhan; Guorong Wei; Xinli Zhou; Jing Zhao; Lili Huang; Zhensheng Kang
Journal:  PLoS One       Date:  2015-06-12       Impact factor: 3.240

7.  Positional cloning in maize (Zea mays subsp. mays, Poaceae).

Authors:  Andrea Gallavotti; Clinton J Whipple
Journal:  Appl Plant Sci       Date:  2015-01-12       Impact factor: 1.936

8.  Genome-Wide Identification of SSR and SNP Markers Based on Whole-Genome Re-Sequencing of a Thailand Wild Sacred Lotus (Nelumbo nucifera).

Authors:  Jihong Hu; Songtao Gui; Zhixuan Zhu; Xiaolei Wang; Weidong Ke; Yi Ding
Journal:  PLoS One       Date:  2015-11-25       Impact factor: 3.240

9.  Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L.).

Authors:  Bin Han; Changbiao Wang; Zhaohui Tang; Yongkang Ren; Yali Li; Dayong Zhang; Yanhui Dong; Xinghua Zhao
Journal:  PLoS One       Date:  2015-11-04       Impact factor: 3.240

10.  Efficient molecular marker design using the MaizeGDB Mo17 SNPs and Indels track.

Authors:  A Mark Settles; Alyssa M Bagadion; Fang Bai; Junya Zhang; Brady Barron; Kristen Leach; Janaki S Mudunkothge; Cassandra Hoffner; Saadia Bihmidine; Erin Finefield; Jaime Hibbard; Emily Dieter; I Alex Malidelis; Jeffery L Gustin; Vita Karoblyte; Chi-Wah Tseung; David M Braun
Journal:  G3 (Bethesda)       Date:  2014-04-17       Impact factor: 3.154

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.