Robert Kofler, Christian Schlötterer, Evita Luschützky, Tamas Lelley.
Abstract
BACKGROUND: Compound microsatellites are a special variation of microsatellites in which two or more individual microsatellites are found directly adjacent to each other. Until now, such composite microsatellites have not been investigated in a comprehensive manner.
Year: 2008 PMID: 19091106 PMCID: PMC2644718 DOI: 10.1186/1471-2164-9-612
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Influence of .
Frequency of compound microsatellites in the whole genome and in the coding sequence (cds).
| | whole genome | | | | | | coding sequence | | | | | |
| species | m.¹ | c.² | cSSR³ | %⁴ | m.d.⁵ | c.d.⁶ | m.¹ | c.² | cSSR³ | %⁴ | m.d.⁵ | c.d.⁶ |
| H. sap. | 1 169 530 | 59 792 | 129 848 | 11.1 | 413.0 | 21.1 | 4 965 | 104 | 233 | 4.7 | 77.4 | 1.6 |
| M. mul. | 1 178 381 | 61 407 | 134 455 | 11.4 | 445.3 | 23.2 | 3 638 | 64 | 139 | 3.8 | 71.3 | 1.3 |
| M. mus. | 1 574 180 | 173 535 | 398 361 | 25.3 | 617.9 | 68.1 | 3 995 | 95 | 202 | 5.1 | 72.5 | 1.7 |
| R. nor. | 1 307 474 | 133 120 | 291 304 | 22.3 | 527.8 | 53.7 | 1 883 | 92 | 226 | 12.0 | 92.6 | 4.5 |
| O. anat. | 133 984 | 1 913 | 3 969 | 3.0 | 327.2 | 4.7 | 1 535 | 16 | 34 | 2.2 | 42.8 | 0.5 |
| G. gal. | 233 896 | 8 532 | 17 989 | 7.7 | 237.5 | 8.7 | 1 889 | 36 | 77 | 4.1 | 58.3 | 1.1 |
| D. rerio | 1 048 258 | 94 159 | 225 069 | 21.5 | 688.1 | 61.8 | 3 215 | 86 | 180 | 5.6 | 72.0 | 1.9 |
| D. mel. | 44 600 | 714 | 1 457 | 3.3 | 376.9 | 6.0 | 4 168 | 105 | 213 | 5.1 | 145.6 | 3.7 |
¹ total number of microsatellites in DNA sequence space
² total number of compound microsatellites in DNA sequence space
³ number of individual microsatellites being part of a compound microsatellite
⁴ percentage of individual microsatellites being part of a compound microsatellite (cSSR-%)
⁵ microsatellite density [m./Mbp]
⁶ compound microsatellite density [c./Mbp]
H. sap.: Homo sapiens; M. mul.: Macaca mulatta; M. mus.: Mus musculus; R. nor.: Rattus norvegicus; O. anat.: Ornithorhynchus anatinus; G. gal.: Gallus gallus; D. rerio: Danio rerio; D. mel.:
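The derived columns of the frequency table follow directly from the counts. The following is a minimal sketch that reproduces the whole-genome values of the first data row. The function names are illustrative and do not come from the paper, and the sequence length of 2 832 (taken from the first whole-genome column of the sequence features table) is assumed to be given in Mbp.

```python
def cssr_percent(n_cssr: int, n_ssr: int) -> float:
    """Percentage of individual microsatellites that are part of a compound (cSSR-%)."""
    return 100.0 * n_cssr / n_ssr


def density_per_mbp(count: int, length_mbp: float) -> float:
    """Number of features per Mbp of sequence."""
    return count / length_mbp


# Whole-genome counts from the first data row of the table.
n_ssr, n_compound, n_cssr = 1_169_530, 59_792, 129_848
length_mbp = 2_832  # assumption: sequence length in Mbp, 'N' characters excluded

pct = cssr_percent(n_cssr, n_ssr)                        # compare with column %
ssr_density = density_per_mbp(n_ssr, length_mbp)         # compare with column m.d.
compound_density = density_per_mbp(n_compound, length_mbp)  # compare with column c.d.
```

Rounded to one decimal place, the three values agree with the tabulated 11.1, 413.0 and 21.1.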
Figure 2. Compound microsatellite density in the chromosomes of . Regions which have not yet been sequenced are designated yellow. The scale of the compound microsatellite density is on the left-hand side and the scale of the SSR density on the right-hand side. The SSR and the compound microsatellite densities were calculated with a sliding window approach using a window size of 5 Mbp and a step size of 1 Mbp.
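A sliding window density of this kind can be sketched as follows. This is an illustration, not the authors' implementation: the function name, the use of start coordinates as feature positions, the half-open windows, and the decision to drop a trailing partial window are all assumptions, and no correction for unsequenced regions is made.

```python
from bisect import bisect_left


def sliding_window_density(positions, chrom_length,
                           window=5_000_000, step=1_000_000):
    """Return a list of (window_start, features per Mbp) tuples.

    positions: start coordinates of the features on one chromosome.
    Windows are half-open intervals [start, start + window).
    """
    positions = sorted(positions)
    window_mbp = window / 1_000_000
    densities = []
    start = 0
    while start + window <= chrom_length:
        # Binary search for the first position inside and the first beyond the window.
        first = bisect_left(positions, start)
        last = bisect_left(positions, start + window)
        densities.append((start, (last - first) / window_mbp))
        start += step
    return densities


# Toy example: four features on a 7 Mbp chromosome.
example = sliding_window_density(
    [0, 1_000_000, 4_999_999, 5_000_000], chrom_length=7_000_000
)
```

Because the step size is smaller than the window size, consecutive windows overlap and the resulting density curve is smoothed.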
Compound microsatellite complexity in the whole genome and in the cds.
| | whole genome | | | | | | | | cds | | | |
| c.c.¹ | 2 | 3 | 4 | 5 | 6 | 7 | 8 | ≥ 9 | 2 | 3 | 4 | ≥ 5 |
| H. sap. | 51 997 | 6 096 | 1 198 | 335 | 106 | 41 | 7 | 12 | 81 | 21 | 2 | 0 |
| M. mul. | 52 796 | 6 565 | 1 389 | 433 | 155 | 49 | 10 | 10 | 53 | 11 | 0 | 0 |
| M. mus. | 137 237 | 26 551 | 6 561 | 2 080 | 652 | 241 | 99 | 114 | 84 | 10 | 1 | 0 |
| R. nor. | 113 077 | 16 505 | 2 632 | 607 | 170 | 78 | 19 | 32 | 72 | 11 | 5 | 4 |
| O. anat. | 1 791 | 105 | 13 | 4 | 0 | 0 | 0 | 0 | 14 | 2 | 0 | 0 |
| G. gal. | 7 782 | 610 | 115 | 17 | 6 | 2 | 0 | 0 | 32 | 3 | 1 | 0 |
| D. rerio | 71 280 | 15 703 | 4 163 | 1 641 | 592 | 336 | 143 | 301 | 78 | 8 | 0 | 0 |
| D. mel. | 685 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 102 | 3 | 0 | 0 |
¹ compound microsatellite complexity
Complexity refers to the number of individual microsatellites constituting the compound microsatellite. All values are counts.
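To make the notion of complexity concrete, the sketch below groups microsatellites on a single sequence into compound microsatellites and tallies their complexity. It is a simplified illustration rather than the procedure used in the paper: the function name and the `max_gap` parameter are hypothetical, and with the default of 0 only directly adjacent (or overlapping) microsatellites are joined.

```python
from collections import Counter


def compound_complexities(ssrs, max_gap=0):
    """Count compound microsatellites by complexity.

    ssrs: list of (start, end) half-open intervals on one sequence, sorted by start.
    max_gap: hypothetical parameter, largest distance between two microsatellites
             that are still joined into one compound (0 = directly adjacent).
    Returns a Counter mapping complexity -> number of compound microsatellites.
    """
    complexities = []
    run = 1  # number of microsatellites in the current chain
    for (_, prev_end), (start, _) in zip(ssrs, ssrs[1:]):
        if start - prev_end <= max_gap:
            run += 1
        else:
            if run >= 2:  # a single microsatellite is not a compound
                complexities.append(run)
            run = 1
    if run >= 2:
        complexities.append(run)
    return Counter(complexities)


# Toy example: one compound of complexity 3, one of complexity 2, one lone microsatellite.
toy = [(0, 10), (10, 20), (20, 30), (50, 60), (60, 70), (100, 110)]
histogram = compound_complexities(toy)
n_compounds = sum(histogram.values())
n_cssr = sum(k * n for k, n in histogram.items())
```

Summing a histogram of this kind gives the total number of compound microsatellites; for the first whole-genome row of the complexity table the eight counts add up to 59 792, the value of c. in the frequency table.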
Overrepresentation of SSR-couples in the whole genome and in the cds.
| | whole genome | | | | cds | | | |
| species | obs.¹ | exp.² | or.³ | P⁴ | obs.¹ | exp.² | or.³ | P⁴ |
| H. sap. | 69 670 | 4 488 | 15 | 0⁵ | 129 | 4 | 36 | 0⁵ |
| M. mul. | 72 780 | 4 800 | 15 | 0⁵ | 74 | 2 | 30 | 3E-82 |
| M. mus. | 223 973 | 9 526 | 23 | 0⁵ | 107 | 3 | 40 | 0⁵ |
| R. nor. | 157 300 | 6 639 | 23 | 0⁵ | 134 | 2 | 81 | 0⁵ |
| O. anat. | 2 052 | 399 | 5 | 0⁵ | 18 | 1 | 28 | 6E-22 |
| G. gal. | 9 435 | 512 | 18 | 0⁵ | 41 | 1 | 40 | 9E-52 |
| D. rerio | 130 012 | 7 026 | 18 | 0⁵ | 93 | 2 | 42 | 0⁵ |
| D. mel. | 743 | 164 | 4 | 0⁵ | 108 | 4 | 24 | 0⁵ |
¹ observed number of SSR-couples
² expected number of SSR-couples with respect to a random distribution of microsatellites within DNA sequence space
³ overrepresentation (obs./exp.)
⁴ significance of the overrepresentation based on a Poisson distribution
⁵ p < 1E-99
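The overrepresentation and its significance can be illustrated with a short sketch. The source states only that the significance is "based on a Poisson Distribution"; the example assumes that this means the upper-tail probability P(X ≥ obs.) for a Poisson variable whose mean is the expected count. The function names are illustrative.

```python
import math


def overrepresentation(observed: int, expected: float) -> float:
    """Ratio of observed to expected SSR-couples (obs./exp.)."""
    return observed / expected


def _poisson_pmf(k: int, mean: float) -> float:
    # Evaluated in log space; underflows to 0.0 for extremely small probabilities.
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected); assumed significance measure."""
    if observed <= 0:
        return 1.0
    if observed > expected:
        # Terms shrink monotonically above the mean, so sum upward until negligible.
        k, term, total = observed, _poisson_pmf(observed, expected), 0.0
        while term > 0.0 and term > total * 1e-17:
            total += term
            k += 1
            term *= expected / k
        return total
    # At or below the mean: use the complement of the lower tail P(X <= observed - 1).
    k, term, lower = observed - 1, _poisson_pmf(observed - 1, expected), 0.0
    while k >= 0 and term > 0.0 and term > lower * 1e-17:
        lower += term
        term *= k / expected
        k -= 1
    return 1.0 - lower


# First whole-genome row of the table: 69 670 observed, 4 488 expected.
ratio = overrepresentation(69_670, 4_488)
p_value = poisson_upper_tail(69_670, 4_488)
```

For this row the ratio is about 15.5, and the tabulated value of 15 is reproduced by rounding down (an observation from the rounded table entries, not a documented rule). The probability underflows to 0.0 in double precision, in line with the footnote p < 1E-99.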
Characteristics and probable genesis of the most abundant SSR-couples in the whole genome
| motif | obs.¹ | or.² | %plus³ | gen.⁴ | motif | obs.¹ | or.² | %plus³ | gen.⁴ |
| H. sap. | | | | | M. mul. | | | | |
| AT-AC | 5 975 | 134 | (100) | s | AAAG-AAGG | 5 659 | 870 | 100 | s |
| AC-AG | 5 456 | 173 | 28 | s | AC-AG | 5 628 | 169 | 31 | s |
| AAAG-AAGG | 5 149 | 844 | 100 | s | AT-AC | 5 205 | 173 | (100) | s |
| A-AAAG | 4 401 | 37 | 100 | s | A-AAAG | 4 481 | 32 | 100 | s |
| AAGG-AGGG | 4 325 | 2 265 | 100 | s | AAGG-AGGG | 4 456 | 2 311 | 100 | s |
| A-AT | 4 234 | 25 | (100) | s | A-AT | 3 505 | 26 | (100) | s |
| A-AAAAG | 3 263 | 50 | 100 | s | A-AAAAG | 3 296 | 42 | 100 | s |
| AT-AG | 2 025 | 133 | (100) | s | AG-AAAG | 2 582 | 222 | 100 | s |
| AG-AAAG | 1 750 | 161 | 100 | s | AT-AG | 1 618 | 146 | (100) | s |
| AAAT-AAAAT | 1 106 | 58 | 99 | s | A-AG | 1 547 | 11 | 95 | s |
| M. mus. | | | | | R. nor. | | | | |
| AC-AG | 38 006 | 94 | 48 | s | AC-AG | 42 254 | 103 | 50 | s |
| AAAG-AAGG | 15 941 | 943 | 100 | s | AT-AC | 7 963 | 48 | (100) | s |
| AT-AC | 11 459 | 69 | (100) | s | AAAG-AAGG | 6 248 | 1 000 | 100 | s |
| AAG-AGG | 9 439 | 1 983 | 100 | s | AAG-AGG | 4 662 | 1 962 | 100 | s |
| AAGG-AGGG | 8 829 | 913 | 100 | s | AC-ACAG | 4 107 | 50 | 95 | s |
| AG-AAAG | 8 350 | 129 | 100 | s | AG-AGGG | 3 993 | 184 | 100 | s |
| AG-AGGG | 7 645 | 206 | 100 | s | AG-ACAG | 3 372 | 110 | 99 | s |
| AAAC-AAAAC | 3 877 | 59 | 100 | s | AC-CG | 3 013 | 308 | (100) | s |
| AG-AAGG | 3 763 | 83 | 100 | ? | AT-AG | 2 654 | 43 | (100) | s |
| A-AAAT | 3 623 | 37 | 98 | s | AC-ACGC | 2 554 | 168 | 99 | s |
| O. anat. | | | | | G. gal. | | | | |
| AC-AG | 476 | 267 | 4 | s | A-AAAG | 530 | 48 | 99 | s |
| AT-AC | 175 | 111 | (100) | s | AAAC-AAAAC | 412 | 74 | 100 | s |
| AAT-ATC | 113 | 11 | 14 | s | AAAG-AAGG | 341 | 1 209 | 100 | s |
| AT-AG | 79 | 87 | (100) | s | AT-AC | 309 | 173 | (100) | s |
| AAT-AATG | 76 | 1 | 37 | c | A-AC | 293 | 21 | 98 | s |
| AAT-AAT | 71 | 1 | (0) | s | AAC-AAAC | 266 | 72 | 99 | s |
| AATG-ACTG | 65 | 38 | 98 | s | A-AAAC | 260 | 6 | 95 | s |
| AATG-ATCC | 37 | 79 | 0 | s | AAGG-AGGG | 254 | 5 492 | 100 | s |
| AATC-AATG | 31 | 3 | 26 | c/s | A-AAAAG | 228 | 45 | 99 | s |
| AG-AAAG | 31 | 301 | 100 | s | A-AAG | 223 | 95 | 100 | s |
| D. rerio | | | | | D. mel. | | | | |
| AT-AC | 21 990 | 63 | (100) | s | AAC-AGC | 45 | 53 | 100 | s |
| A-AT | 11 172 | 48 | (100) | s | A-AAT | 23 | 20 | 57 | s |
| ATAG-ACAG | 10 370 | 1 516 | 100 | s | AT-AC | 18 | 5 | (100) | s |
| ATAG-ATCC | 6 503 | 497 | 0 | s | AT-ATAC | 17 | 25 | (100) | s |
| AAT-AAT | 5 910 | 38 | (0) | r/s | ATC-AGC | 15 | 29 | 93 | s |
| AT-ATAC | 4 587 | 230 | (100) | s | ACC-AGC | 12 | 42 | 100 | s |
| AC-AG | 3 830 | 49 | 26 | s | AAT-AAAT | 12 | 68 | 100 | s |
| AAT-ACT | 3 685 | 316 | 84 | s | AGC-AGG | 8 | 29 | 88 | s |
| AAT-AAC | 3 624 | 204 | 91 | s | AGC-AACAGC | 7 | 69 | 100 | s |
| AT-AAAT | 2 973 | 17 | (100) | s | AT-AAT | 7 | 11 | (100) | s |
¹ observed number of SSR-couples having the given motif
² overrepresentation
³ percent of the SSR-couples found in the plus-conformation (see Text). Values in brackets indicate that only the specified conformation is feasible (e.g. SSR-couples containing self-complementary microsatellites)
⁴ suggested genesis of the SSR-couple: c: chance; r: recombination; s: slippage; ?: unknown
Characteristics and probable genesis of the most abundant SSR-couples in the cds
| motif | obs.¹ | or.² | %plus³ | gen.⁴ | motif | obs.¹ | or.² | %plus³ | gen.⁴ |
| H. sap. | | | | | M. mul. | | | | |
| AGC-CCG | 20 | 74 | 20 | s | AAC-AGC | 12 | 2 244 | 100 | s |
| AAC-AGC | 18 | 1 913 | 100 | s | AGC-CCG | 8 | 61 | 25 | s |
| AAG-AGG | 10 | 133 | 100 | s | AAG-AGG | 7 | 160 | 100 | s |
| AGG-CCG | 9 | 38 | 22 | s | AAAG-AAGG | 5 | > 10⁴ | 100 | s |
| AAG-ATC | 6 | 428 | 0 | s | ACC-CCG | 4 | 134 | 100 | - |
| ACC-CCG | 5 | 73 | 80 | s | AGC-AGCTCC | 3 | 367 | 100 | - |
| AGCCTG-AGGCCC | 4 | > 10⁴ | 0 | - | AGG-AAGAGG | 3 | 508 | 100 | - |
| AGC-AGCCTG | 4 | 2 381 | 0 | - | A-AAG | 3 | 122 | 100 | - |
| AGC-AGG | 4 | 12 | 100 | - | AGC-AGG | 3 | 15 | 100 | - |
| ACG-AGG | 3 | 419 | 100 | - | AGG-CCG | 2 | 19 | 0 | - |
| M. mus. | | | | | R. nor. | | | | |
| AAG-AGG | 13 | 210 | 100 | s | AACC-ATCC | 16 | > 10⁴ | 100 | s |
| AAC-AGC | 10 | 751 | 100 | s | AT-AC | 12 | 2 473 | (100) | s |
| AC-AG | 7 | 5 655 | 43 | s | AAG-AGG | 12 | 353 | 100 | s |
| CCG-AGCCGG | 6 | 2 937 | 100 | s/? | AAAG-AAGG | 9 | > 10⁴ | 100 | s |
| AGC-AGGCCC | 6 | 732 | 100 | ? | AG-AAAG | 9 | 3 520 | 100 | s |
| ACC-CCG | 5 | 121 | 100 | s | AC-AG | 7 | 481 | 86 | s |
| AAAG-AAGG | 5 | > 10⁴ | 100 | s | CCG-AGCCGG | 5 | 4 828 | 100 | s/? |
| AGC-CCG | 4 | 25 | 0 | - | AGG-CCG | 4 | 86 | 0 | - |
| AGG-CCG | 3 | 23 | 67 | - | AG-AAGG | 4 | 2 347 | 100 | - |
| AAG-AAAAG | 2 | 1 159 | 100 | - | AG-ACAG | 4 | 9 387 | 100 | - |
| O. anat. | | | | | G. gal. | | | | |
| AAC-AGC | 2 | 4 265 | 100 | - | AAAG-AAGG | 5 | > 10⁴ | 100 | s |
| AGC-AATG | 2 | 262 | 100 | - | ACG-AGC | 4 | 1 260 | 100 | - |
| ACG-AGG | 2 | 319 | 100 | - | A-AAAG | 4 | 2 605 | 100 | - |
| ACT-AGG | 2 | 3 828 | 0 | - | ACC-AGG | 3 | 121 | 0 | - |
| AC-AG | 1 | 1 866 | 0 | - | AAG-AGG | 3 | 107 | 100 | - |
| AATG-AAGG | 1 | 3 445 | 100 | - | AAGG-AGGG | 2 | > 10⁴ | 100 | - |
| AGC-ACACC | 1 | 2 843 | 100 | - | CCG-CCGCG | 2 | 2 085 | 100 | - |
| AG-AAAG | 1 | 7 464 | 100 | - | AGC-CCG | 1 | 21 | 0 | - |
| AAC-ACACC | 1 | > 10⁴ | 100 | - | ACCGC-AGCGG | 1 | > 10⁴ | 0 | - |
| ATC-ACG | 1 | 1 464 | 0 | - | AGC-AGG | 1 | 12 | 100 | - |
| D. rerio | | | | | D. mel. | | | | |
| AAC-AGC | 12 | 788 | 100 | s | AAC-AGC | 36 | 62 | 100 | s |
| AAT-AAAT | 9 | 4 273 | 100 | s | AGC-CCG | 8 | 40 | 75 | s |
| AACC-ATCC | 6 | > 10⁴ | 100 | s | ACC-AGC | 7 | 19 | 100 | s |
| AC-AC | 6 | 41 | (0) | r/? | AGC-AGG | 5 | 13 | 80 | s |
| ATCC-ACGG | 6 | > 10⁴ | 0 | s | AAT-AAC | 4 | 315 | 100 | - |
| ATC-ACG | 4 | 5 622 | 0 | - | AAC-ATC | 4 | 140 | 100 | - |
| AAG-ATC | 4 | 113 | 0 | - | ATC-AGC | 4 | 24 | 100 | - |
| ATC-AGG | 4 | 58 | 0 | - | ACG-AGG | 3 | 240 | 100 | - |
| AAT-ACT | 3 | 9 081 | 100 | - | AGC-AACAGC | 3 | 31 | 100 | - |
| ACC-AGC | 3 | 126 | 0 | - | AAC-ACC | 3 | 47 | 100 | - |
¹ observed number of SSR-couples having the given motif
² overrepresentation
³ percent of the SSR-couples found in the plus-conformation (see Text). Values in brackets indicate that only the specified conformation is feasible (e.g. SSR-couples containing self-complementary microsatellites)
⁴ suggested genesis of the SSR-couple: c: chance; r: recombination; s: slippage; ?: unknown
Overview of the recognition patterns of the different mechanisms potentially generating SSR-couples
| proposed origin | overrepresentation | conformation | motif length | motif similarity |
| chance (c) | none (low) | balanced | none required | none required |
| recombination (r) | medium | unbalanced – minus | equal | reverse complement |
| slippage (s) | high | unbalanced | equal (stepwise equal) | high |
Features of the DNA sequences used in this work.
| | | H. sap. | M. mul. | M. mus. | R. nor. | O. anat. | G. gal. | D. rerio | D. mel. |
| whole genome | #¹ | 23 | 21 | 20 | 21 | 19 | 31 | 25 | 6 |
| | nt² | 2 832 | 2 646 | 2 547 | 2 477 | 409 | 984 | 1 523 | 118 |
| cds | #¹ | 41 997 | 35 463 | 36 240 | 14 026 | 26 818 | 22 013 | 31 623 | 17 242 |
| | nt² | 64 | 51 | 55 | 20 | 36 | 32 | 44 | 28 |
| 5'-UTR | #¹ | 31 051 | 16 925 | 27 882 | 11 269 | 1 737 | 10 353 | 8 717 | 14 466 |
| | nt² | 9 | 4 | 7 | 3 | 0.2 | 1 | 1 | 4 |
| 3'-UTR | #¹ | 28 839 | 17 284 | 27 124 | 11 441 | 2 436 | 12 444 | 8 615 | 11 351 |
| | nt² | 31 | 12 | 28 | 7 | 1 | 6 | 5 | 5 |
¹ number of individual fasta sequences
² length of the sequence in Mbp, not considering the character 'N'
All data were obtained with the tool 'Seq-CC' (see section Bioinformatics)
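The two quantities reported in the table, the number of fasta sequences and the sequence length without the character 'N', can be obtained along the following lines. This is a minimal sketch and not the 'Seq-CC' tool itself; the function name and the case-insensitive treatment of 'N' are assumptions.

```python
def fasta_stats(lines):
    """Return (number of fasta records, number of characters other than 'N').

    lines: an iterable of text lines in fasta format, e.g. an open file handle.
    Lowercase 'n' is treated like 'N' (an assumption about masked sequence).
    """
    n_records = 0
    n_nucleotides = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_records += 1  # each header line starts a new sequence
        else:
            n_nucleotides += len(line) - line.upper().count("N")
    return n_records, n_nucleotides


# Toy input with two records; 8 of the 16 sequence characters are not 'N'.
toy_fasta = [">chr1", "ACGTNNAC", "nnGT", ">chr2", "NNNN"]
records, nucleotides = fasta_stats(toy_fasta)
```

Dividing the nucleotide count by 1 000 000 gives a length in Mbp, the unit used for the densities in the frequency table.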