Jifeng Tang, Samantha J Baldwin, Jeanne ME Jacobs, C Gerard van der Linden, Roeland E Voorrips, Jack AM Leunissen, Herman van Eck, Ben Vosman.
Abstract
BACKGROUND: Simple Sequence Repeat (SSR) or microsatellite markers are valuable for genetic research. Experimental methods to develop SSR markers are laborious, time-consuming and expensive. In silico approaches have become a practicable and relatively inexpensive alternative during the last decade, although testing putative SSR markers is still time-consuming and expensive. In many species only a relatively small percentage of SSR markers turn out to be polymorphic. This is particularly true for markers derived from expressed sequence tags (ESTs). EST databases contain highly redundant sequences, and this redundancy may carry information on length polymorphisms in the SSRs the sequences contain, and on whether the sequences were derived from heterozygotes or from different genotypes. Although a number of programs have been developed to identify SSRs in EST sequences, up to now no software can detect putatively polymorphic SSRs.
Year: 2008 PMID: 18793407 PMCID: PMC2562394 DOI: 10.1186/1471-2105-9-374
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of ESTs, clusters, SSRs and polymorphic SSRs of chicken, rice, Arabidopsis, Brassica, potato and tomato
| | chicken | rice | Arabidopsis | Brassica | potato | tomato |
| ESTs | 599,330 | 1,211,078 | 734,275 | 163,750 | 219,765 | 249,794 |
| Non-redundant sequences1 | 283,434 | 493,818 | 224,994 | 58,260 | 72,381 | 54,182 |
| Clusters1 | 44,654 | 35,154 | 33,052 | 20,468 | 25,228 | 21,229 |
| Singletons1 | 238,780 | 458,664 | 191,942 | 37,792 | 47,153 | 32,953 |
| ESTs with SSRs (%)2 | 74,297 (12%) | 336,569 (28%) | 127,757 (17%) | 14,968 (9%) | 29,481 (13%) | 28,728 (11%) |
| SSRs Non-redundant sequences (%)2 | 40,020 (14%) | 133,861 (27%) | 38,096 (17%) | 13,251 (23%) | 10,537 (15%) | 7,163 (13%) |
| Singletons with SSRs (%)2 | 31,119 (13%) | 118,649 (26%) | 29,843 (16%) | 7,328 (19%) | 5,717 (12%) | 3,261 (10%) |
| SSR in clusters (%)2 | 8,901 (20%) | 15,212 (43%) | 8,253 (25%) | 5,923 (29%) | 4,820 (19%) | 3,902 (18%) |
| Polymorphic SSRs3 (%) | 1,724 (19%) | 2,646 (17%) | 1,248 (15%) | 997 (17%) | 1,080 (22%) | 265 (7%) |
| Polymorphic SSRs with primers (%) | 1,667 (97%) | 2,555 (97%) | 1,163 (93%) | 937 (94%) | 1,053 (97%) | 263 (99%) |
| % polymorphism in long SSRs4 | 15% | 11% | 12% | 13% | 17% | 6% |
| % polymorphism in short SSRs5 | 36% | 23% | 30% | 40% | 46% | 14% |
1 Clusters and singletons were produced using CAP3 with 95% similarity over 100-nucleotide overlaps; 'non-redundant sequences' includes clusters and singletons.
2 SSRs detected by Sputnik [13] using default settings.
3 Polymorphic SSRs detected by PolySSR (settings used are described in Materials and Methods); % = percentage of SSRs in clusters that are polymorphic SSRs.
4 Long polymorphic SSRs are SSRs with at least 10 repeats for dinucleotide SSRs, 6 repeats for trinucleotide SSRs, and 5 repeats for tetra-, penta- or hexanucleotide SSRs; % = percentage of long SSRs that are polymorphic.
5 Short polymorphic SSRs are SSRs with a maximum of 5 repeats for dinucleotide SSRs, and 4 repeats for tri-, tetra-, penta- and hexanucleotide SSRs; the minimum number of repeats was in this case set to 3 for all SSR types; % = percentage of short SSRs that are polymorphic.
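The repeat-count thresholds in footnotes 4 and 5 can be written down as a small lookup. The following is a minimal sketch, not code from PolySSR; the function name `classify_ssr` and the decision to return `None` for SSRs that fall between the two classes are assumptions made for illustration.

```python
from typing import Optional

# Minimum repeat counts for "long" SSRs, keyed by motif length (footnote 4).
LONG_MIN_REPEATS = {2: 10, 3: 6, 4: 5, 5: 5, 6: 5}
# Maximum repeat counts for "short" SSRs, keyed by motif length (footnote 5).
SHORT_MAX_REPEATS = {2: 5, 3: 4, 4: 4, 5: 4, 6: 4}
# Footnote 5: the minimum number of repeats was set to 3 for all SSR types.
SHORT_MIN_REPEATS = 3


def classify_ssr(motif_length: int, repeats: int) -> Optional[str]:
    """Return "long", "short", or None (hypothetical helper).

    None means the SSR is outside both classes: an unsupported motif
    length, too few repeats, or a repeat count between the two ranges.
    """
    if motif_length not in LONG_MIN_REPEATS:
        return None
    if repeats >= LONG_MIN_REPEATS[motif_length]:
        return "long"
    if SHORT_MIN_REPEATS <= repeats <= SHORT_MAX_REPEATS[motif_length]:
        return "short"
    return None
```

Under these thresholds a dinucleotide SSR with 7 repeats is neither long nor short, which is why the two percentage rows in the table need not cover every SSR.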
Results of experimental validation of predicted long and short, polymorphic and monomorphic EST-SSRs of potato
| Number of SSRs | long poly2a | short poly2b | long mono3a | short mono3b | total |
| for which primers were designed | 25 | 25 | 15 | 15 | 80 |
| with no or not clear products | 2 | 4 | 6 | 1 | 13 |
| with products more than 500 bp | 2 | 2 | 1 | 2 | 7 |
| that produced scorable markers1 | 21 (84%) | 19 (76%) | 8 (53%) | 12 (80%) | 60 (75%) |
| N of polymorphic SSRs | 21 (100%) | 18 (95%) | 7 (88%) | 10 (83%) | 56 (93%) |
1 Markers are considered scorable when they produce amplicons of less than 500 base pairs.
2 EST-SSRs classified as polymorphic by PolySSR; 3 EST-SSRs classified as monomorphic by PolySSR; a long SSRs: at least 10 repeats in the repeat motif for dinucleotide SSRs, 6 for trinucleotide SSRs, 5 for tetra-, penta- and hexanucleotide SSRs; b short SSRs: at most 5 repeats for dinucleotide SSRs, 4 for tri-, tetra-, penta- and hexanucleotide SSRs (see Materials and Methods).
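The percentages in the validation table can be recomputed from its counts. The sketch below copies the counts from the table; the helper names are hypothetical. It suggests that the scorable percentage is taken relative to the number of designed primer pairs, and the polymorphic percentage relative to the number of scorable markers.

```python
# Counts copied from the validation table above.
designed = {"long poly": 25, "short poly": 25, "long mono": 15, "short mono": 15}
scorable = {"long poly": 21, "short poly": 19, "long mono": 8, "short mono": 12}
polymorphic = {"long poly": 21, "short poly": 18, "long mono": 7, "short mono": 10}


def percent(part: int, whole: int) -> int:
    """Percentage rounded half up to the nearest integer."""
    return int(100 * part / whole + 0.5)


def summarize() -> dict:
    """Map each SSR class to (% scorable of designed, % polymorphic of scorable)."""
    rows = {
        cls: (percent(scorable[cls], designed[cls]),
              percent(polymorphic[cls], scorable[cls]))
        for cls in designed
    }
    rows["total"] = (
        percent(sum(scorable.values()), sum(designed.values())),
        percent(sum(polymorphic.values()), sum(scorable.values())),
    )
    return rows


for name, (pct_scorable, pct_polymorphic) in summarize().items():
    print(f"{name}: {pct_scorable}% scorable, {pct_polymorphic}% polymorphic")
```

With these denominators the recomputed values agree with every percentage printed in the table.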
Motif length and position of polymorphic SSRs in EST sequences of chicken, rice, Arabidopsis, Brassica, potato and tomato
| species | | Di-a | Tri-a | Penta-a | Othersb | total | Densityc |
| chicken | all | 846 (49%) | 378 (22%) | 386 (22%) | 114 (7%) | 1724 (100%) | 71.51 |
| | coding | 95 | 70 | 38 | 9 | 212 | 55.13 |
| | 5'UTR1 | 19 | 33 | 24 | 10 | 86 | 137.66 |
| | 3'UTR2 | 155 | 54 | 77 | 24 | 310 | 74.81 |
| | TSS3 | 1 | 3 | 3 | 1 | 8 | |
| rice | all | 1070 (40%) | 1113 (42%) | 320 (12%) | 143 (5%) | 2646 (100%) | 93.47 |
| | coding | 154 | 372 | 86 | 26 | 638 | 57.33 |
| | 5'UTR1 | 294 | 285 | 69 | 32 | 680 | 227.16 |
| | 3'UTR2 | 342 | 121 | 70 | 36 | 569 | 92.66 |
| | TSS3 | 5 | 0 | 1 | 1 | 7 | |
| Arabidopsis | all | 668 (53%) | 410 (33%) | 133 (11%) | 37 (3%) | 1248 (100%) | 102.63 |
| | coding | 196 | 222 | 34 | 13 | 465 | 72.84 |
| | 5'UTR1 | 193 | 73 | 34 | 7 | 307 | 215.96 |
| | 3'UTR2 | 136 | 42 | 40 | 10 | 228 | 112.02 |
| | TSS3 | 6 | 2 | 1 | 1 | 10 | |
| Brassica | all | 468 (47%) | 350 (35%) | 139 (14%) | 40 (4%) | 997 (100%) | 119.45 |
| | coding | 55 | 203 | 43 | 13 | 314 | 63.09 |
| | 5'UTR1 | 205 | 55 | 27 | 13 | 300 | 378.62 |
| | 3'UTR2 | 126 | 44 | 47 | 9 | 226 | 169.90 |
| | TSS3 | 1 | 2 | 3 | 2 | 8 | |
| potato | all | 379 (35%) | 358 (33%) | 248 (23%) | 95 (9%) | 1080 (100%) | 89.78 |
| | coding | 61 | 155 | 87 | 44 | 347 | 62.66 |
| | 5'UTR1 | 64 | 38 | 29 | 8 | 139 | 160.21 |
| | 3'UTR2 | 129 | 40 | 59 | 24 | 252 | 134.75 |
| | TSS3 | 0 | 1 | 3 | 1 | 5 | |
| tomato | all | 124 (47%) | 79 (30%) | 54 (20%) | 8 (3%) | 265 (100%) | 98.52 |
| | coding | 42 | 35 | 19 | 2 | 98 | 67.06 |
| | 5'UTR1 | 50 | 18 | 19 | 1 | 88 | 265.74 |
| | 3'UTR2 | 12 | 7 | 5 | 0 | 24 | 70.79 |
| | TSS3 | 1 | 1 | 1 | 1 | 4 | |
a Di- denotes dinucleotide SSRs, and likewise for the other prefixes; b others are tetranucleotide and hexanucleotide repeats or higher.
c Density is the number of SSRs per 100 kb of EST sequence (see Materials and Methods).
1 The 5' UTR is the portion of an mRNA from the 5' end to the position of the first codon used in translation.
2 The 3' UTR is the portion of an mRNA from the last codon used in translation to the 3' end of the mRNA.
3 TSS denotes the positions of the first codon (translation start) and the last codon (translation stop) used in translation; SSRs in this category contain the first or the last codon.
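The position categories and the density measure defined in these footnotes can be illustrated as follows. This is a sketch under stated assumptions, not the procedure from Materials and Methods: coordinates are taken to be 0-based and half-open, the coding region is assumed to be known for each sequence, any overlap with the first or last codon is counted as "TSS", and both function names are hypothetical.

```python
def classify_region(ssr_start: int, ssr_end: int,
                    cds_start: int, cds_end: int) -> str:
    """Assign an SSR to "5'UTR", "3'UTR", "coding" or "TSS".

    All intervals are 0-based and half-open; [cds_start, cds_end) runs from
    the first base of the first codon to just past the last codon.
    """
    def overlaps(a_start, a_end, b_start, b_end):
        return a_start < b_end and b_start < a_end

    # Assumption: an SSR touching the first or last codon counts as TSS.
    first_codon = (cds_start, cds_start + 3)
    last_codon = (cds_end - 3, cds_end)
    if overlaps(ssr_start, ssr_end, *first_codon) or \
            overlaps(ssr_start, ssr_end, *last_codon):
        return "TSS"
    if ssr_end <= cds_start:
        return "5'UTR"
    if ssr_start >= cds_end:
        return "3'UTR"
    return "coding"


def ssr_density(n_ssrs: int, total_bases: int) -> float:
    """Number of SSRs per 100 kb of sequence."""
    return n_ssrs * 100_000 / total_bases
```

For example, 50 SSRs found in 100,000 bases of sequence give a density of 50 per 100 kb.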
Figure 1. An example of unreliable polymorphic SSRs. Because the repeat chain in ESTs 3 and 4 does not extend to the end, it is not clear whether these two ESTs represent a different (shorter) allele of the SSR. For that reason, a minimum length for the flanking sequence must be specified to detect polymorphic SSRs reliably.
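The flanking-sequence requirement described in this caption amounts to a simple check on each EST. The sketch below is illustrative only; the function name, the 0-based half-open coordinates and the default of 10 bases are assumptions, not PolySSR settings.

```python
def has_reliable_flanks(seq_length: int, ssr_start: int, ssr_end: int,
                        min_flank: int = 10) -> bool:
    """Check that an SSR has enough sequence on both sides.

    The SSR occupies [ssr_start, ssr_end) in a read of length seq_length.
    A read that ends inside or right next to the repeat cannot tell us
    the true repeat length, so it is rejected.
    """
    left_flank = ssr_start
    right_flank = seq_length - ssr_end
    return left_flank >= min_flank and right_flank >= min_flank


# Hypothetical reads: (name, read length, SSR start, SSR end).
reads = [("EST1", 100, 40, 60), ("EST2", 100, 40, 95), ("EST3", 100, 5, 25)]
usable = [name for name, length, start, end in reads
          if has_reliable_flanks(length, start, end)]
print(usable)
```

Only reads that pass this check would contribute a repeat length when alleles are compared.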
Figure 2. Flowchart of the PolySSR pipeline.
Figure 3. Flowchart of the PolySSR core program. Step 2 uses two parameters: the degree of matching in a repeat motif and the degree of matching in a repeat chain. Step 3 uses four parameters: the two from step 2, plus the length of the flanking sequences of repeats and the minimum number of repeats for each repeat-motif length. Step 4 uses three parameters: the two from step 2 and the minimum number of sequences per allele. * Actions in steps 2, 3 and 4 all use the algorithm described in Figure 4 and in the Materials and Methods section.
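The role of the "minimum number of sequences per allele" parameter can be illustrated with a small sketch. This is not the PolySSR implementation: it assumes the repeat count of the SSR has already been determined for every EST in a cluster, and the function names and the default of 2 sequences per allele are hypothetical.

```python
from collections import Counter
from typing import List


def supported_alleles(repeat_counts: List[int],
                      min_seqs_per_allele: int = 2) -> List[int]:
    """Return the repeat counts (alleles) seen in enough ESTs.

    repeat_counts holds one repeat count per EST in a cluster.
    """
    tally = Counter(repeat_counts)
    return sorted(allele for allele, n_seqs in tally.items()
                  if n_seqs >= min_seqs_per_allele)


def is_putatively_polymorphic(repeat_counts: List[int],
                              min_seqs_per_allele: int = 2) -> bool:
    """An SSR is flagged when at least two alleles are supported."""
    return len(supported_alleles(repeat_counts, min_seqs_per_allele)) >= 2
```

Requiring more than one sequence per allele trades sensitivity for robustness: a length difference seen in a single EST may be a sequencing error, so it is ignored under this setting.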
Figure 4. The flowchart used to identify perfect and imperfect repeat chains. The parameter used in step 2 is the degree of matching in a repeat motif; the parameter used in step 3 is the degree of matching in a repeat.
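For orientation, perfect repeat chains alone can be found with a regular expression. The sketch below is a swapped-in, simplified technique and not the algorithm of Figure 4: it ignores imperfect repeats and the degree-of-matching parameters entirely, and the function name, the motif lengths of 2 to 6 and the default minimum of 3 repeats are assumptions.

```python
import re
from typing import List, Tuple


def find_perfect_ssrs(seq: str, min_repeats: int = 3) -> List[Tuple[int, int, str, int]]:
    """Find perfect SSRs with motifs of 2 to 6 bases.

    Returns (start, end, motif, repeats) tuples with 0-based, half-open
    coordinates. The lazy quantifier makes the regex try the shortest
    motif first at each position.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{" + str(min_repeats - 1) + r",}")
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAA
        repeats = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), match.end(), motif, repeats))
    return hits


example = "TTGG" + "AC" * 5 + "GGTC" + "AAG" * 4 + "CT"
print(find_perfect_ssrs(example))
```

Handling imperfect chains would require tolerating mismatches within a motif and within the chain, which is what the two matching parameters in the caption control.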