Trevor J Pemberton, Conner I Sandefur, Mattias Jakobsson, Noah A Rosenberg.
Abstract
BACKGROUND: Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability, we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database.
Year: 2009 PMID: 20015383 PMCID: PMC2806349 DOI: 10.1186/1471-2164-10-612
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Summary of the identification and sequence analysis of the microsatellite DNA sequences. Red bars indicate the allele size range in the HGDP-CEPH data set, for which h and H are the smallest and largest allele sizes, respectively. Blue bars indicate the allele size range in the Marshfield primer data set, for which m and M are the smallest and largest allele sizes, respectively. The BLASTN fragment size in the human RefSeq database is denoted by x. A, B, and C refer to the repeat units of the different STR regions in a microsatellite sequence, with a, b, and c being the number of times they are repeated, respectively. N indicates a nucleotide not within an STR region, with n being the number of nucleotides separating two STR regions. For microsatellites with three STR regions, n1 and n2 respectively represent the numbers of nucleotides separating the first and second, and the second and third, STR regions. Key: ∧, and; ∨, or; ROS, range overlap score.
Figure 2. Mosaic plots describing microsatellites with (A) one, (B) two, or (C) three separate STR regions. In the mosaic plots [117,118], tiles represent categories of microsatellites, and the area of each tile is proportional to the number of microsatellites in the corresponding category. For loci with two or more STR regions, loci are first grouped by the relationships between the repeat units in their STR regions - identical ("A=B"), similar ("A≈B"; 1 bp difference between their sequences), or different by more than 1 bp ("A≠B") - and by the sizes of those repeat units (i.e., di-, tri-, tetra-, or penta-nucleotide, with "mixed" referring to loci whose STR regions are composed of repeat units of different sizes). Each group of loci is partitioned into distinct categories based on the distance (in nucleotides) separating their STR regions, as described below the plot, and each category is represented in a different color. Black bars represent groups that contain no loci. Filled circles represent those categories within a group that contain no loci. For microsatellites with two STR regions, n represents the number of nucleotides separating the first and second STR regions. For microsatellites with three STR regions, n1 and n2 respectively represent the numbers of nucleotides separating the first and second, and the second and third, STR regions. Key: ∧, and; ∨, or; ||A||, length of A.
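The grouping rules in the Figure 2 caption (identical, similar, or different repeat units, and the repeat unit size class) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the caption does not say how a "1 bp difference" is measured, so the sketch assumes a Levenshtein edit distance of exactly 1 (one substitution, insertion, or deletion), and the function names `edit_distance`, `relationship`, and `size_class` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two short sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def relationship(unit_a: str, unit_b: str) -> str:
    """Classify two repeat units as in the caption: A=B, A≈B, or A≠B.

    ASSUMPTION: "1 bp difference" is taken to mean edit distance 1.
    """
    d = edit_distance(unit_a.upper(), unit_b.upper())
    if d == 0:
        return "identical"   # "A=B"
    if d == 1:
        return "similar"     # "A≈B"
    return "different"       # "A≠B"


def size_class(units) -> str:
    """Return di/tri/tetra/penta, or 'mixed' when unit lengths differ."""
    names = {2: "di", 3: "tri", 4: "tetra", 5: "penta"}
    sizes = {len(u) for u in units}
    if len(sizes) > 1:
        return "mixed"
    return names.get(sizes.pop(), "other")


print(relationship("GATA", "GACA"), size_class(["GATA", "GACA"]))
```

The sketch compares the units exactly as written; it does not treat cyclic rotations of a unit as equivalent, which a real classification might need to do.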
The number of microsatellite loci with regular and irregular allele structure
| | Regular | Irregular |
|---|---|---|
| Number of loci | 390 | 237 |
| One STR region | 302 | 186 |
| Di | 25 | 5 |
| Tri | 94 | 39 |
| Tetra | 183 | 142 |
| Two separate STR regions | 76 | 46 |
| Di | 8 | 2 |
| Tri | 10 | 5 |
| Tetra | 58 | 39 |
| Three separate STR regions | 12 | 5 |
| Di | 3 | 0 |
| Tri | 1 | 1 |
| Tetra | 8 | 4 |
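As a quick arithmetic check, the subtotals and totals in the table above can be recomputed from the per-size counts. The sketch below simply re-enters the table values; the dictionary layout and names are illustrative.

```python
# (regular, irregular) counts per repeat unit size, copied from the table above.
COUNTS = {
    "one STR region": {"di": (25, 5), "tri": (94, 39), "tetra": (183, 142)},
    "two STR regions": {"di": (8, 2), "tri": (10, 5), "tetra": (58, 39)},
    "three STR regions": {"di": (3, 0), "tri": (1, 1), "tetra": (8, 4)},
}


def subtotal(group):
    """Sum regular and irregular counts over the repeat unit sizes of one group."""
    regular = sum(reg for reg, _ in group.values())
    irregular = sum(irr for _, irr in group.values())
    return regular, irregular


subtotals = {name: subtotal(group) for name, group in COUNTS.items()}
total_regular = sum(reg for reg, _ in subtotals.values())
total_irregular = sum(irr for _, irr in subtotals.values())

print(subtotals)
print(total_regular, total_irregular, total_regular + total_irregular)  # 390 237 627
```

The recomputed values agree with the table rows (302/186, 76/46, 12/5, and 390/237) and with the 627 loci stated in the abstract.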
The effect of repeat unit size on microsatellite heterozygosity
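| | Di (1 STR region) | Tri (1 STR region) | Tetra (1 STR region) | Di (2 STR regions) | Tri (2 STR regions) | Tetra (2 STR regions) |
|---|---|---|---|---|---|---|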
| Number of loci | 30 | 133 | 325 | 10 | 15 | 97 |
| Mean | 0.779 | 0.749 | 0.739 | 0.789 | 0.721 | 0.772 |
| Di | 0.246 | |||||
| Tri | 0.064 | |||||
P values are shown for two-sided Wilcoxon rank-sum tests for differences in heterozygosity (H) between microsatellites with different repeat unit sizes. Loci are grouped by whether they had one or two separate STR regions. No comparisons were made for loci with three separate STR regions because of small sample size for di-nucleotide and tri-nucleotide loci (3 and 2, respectively). Tests with P < 0.05 are highlighted in bold.
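The heterozygosity (H) compared in these tests is not defined in this section. For orientation, here is a minimal sketch assuming the common expected-heterozygosity estimator, H = (n / (n − 1)) · (1 − Σ pᵢ²), computed from the allele copies observed at one locus. The function name and the `unbiased` flag are illustrative, and the authors may have used a different estimator.

```python
from collections import Counter


def expected_heterozygosity(alleles, unbiased=True):
    """Expected heterozygosity at one locus from a list of observed allele copies.

    ASSUMPTION: this is the textbook estimator, not necessarily the one used
    in the study. `alleles` holds one entry per sampled chromosome, for
    example allele sizes in base pairs.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    freqs = [count / n for count in Counter(alleles).values()]
    h = 1.0 - sum(p * p for p in freqs)  # 1 minus expected homozygosity
    return h * n / (n - 1) if unbiased else h


# Two alleles at equal frequency among four sampled chromosomes.
print(expected_heterozygosity([120, 120, 124, 124], unbiased=False))
print(expected_heterozygosity([120, 120, 124, 124]))
```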
The effect of repeat unit sequence on microsatellite heterozygosity for tetra-nucleotide loci
| | AAAT & TTTA | AATG | CATA & GTAT | TTCA | GATG | ATCT & TAGA | AAGG & TTCC |
|---|---|---|---|---|---|---|---|
| Number of loci | 30 | 4 | 8 | 3 | 5 | 253 | 18 |
| Mean | 0.683 | 0.691 | 0.714 | 0.719 | 0.722 | 0.744 | 0.789 |
| AAAT [AATA-ATAA-TAAA] & TTTA [TTAT-TATT-ATTT] | 0.979 | 0.407 | 0.416 | 0.421 | |||
| AATG [ATGA-TGAA-GAAT] | 0.808 | 0.857 | 0.556 | 0.229 | 0.066 | ||
| CATA [ATAC-TACA-ACAT] & GTAT [TATG-ATGT-TGTA] | 0.921 | 1 | 0.239 | ||||
| TTCA [TCAT-CATT-ATTC] | 0.786 | 0.541 | 0.080 | ||||
| GATG [ATGG-TGGA-GGAT] | 0.304 | ||||||
| ATCT [TCTA-CTAT-TATC] & TAGA [AGAT-GATA-ATAG] | |||||||
| AAGG [AGGA-GGAA-GAAG] & TTCC [TCCT-CCTT-CTTC] | |||||||
P values are shown for two-sided Wilcoxon rank-sum tests for differences in heterozygosity (H) between the different repeat unit sequences of microsatellites with one tetra-nucleotide STR region. Four loci were excluded from these comparisons because their repeat unit sequences only appeared once (TTTG and GAAA) or twice (TCCA) in the data set. Tests with P < 0.05 are highlighted in bold.
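The bracketed sequences in the row labels above are the cyclic rotations of each repeat unit (for example, AAAT has rotations AATA, ATAA, and TAAA). A small sketch of how such rotations can be generated and used to test whether two units describe the same repeat on one strand; the function names are hypothetical, and the sketch deliberately does not reproduce the table's pairing of units with "&", whose exact rule is not stated in this section.

```python
def rotations(unit):
    """Cyclic rotations of a repeat unit, excluding the unit itself, in shift order."""
    unit = unit.upper()
    return [unit[k:] + unit[:k] for k in range(1, len(unit))]


def canonical_rotation(unit):
    """Lexicographically smallest rotation, used as a representative of the class."""
    unit = unit.upper()
    return min([unit] + rotations(unit))


def same_motif(a, b):
    """True when b is a cyclic rotation of a (same strand only)."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)


print(rotations("AAAT"))           # ['AATA', 'ATAA', 'TAAA']
print(same_motif("TCTA", "ATCT"))  # True
```

Choosing the smallest rotation as the representative is just one convenient convention; any fixed rule that picks the same member of each rotation class would work.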
The effect of the number of separate STR regions on microsatellite heterozygosity
| | Di, 1 STR | Di, 2 STRs | Tri, 1 STR | Tri, 2 STRs | Tetra, 1 STR | Tetra, 2 STRs | Tetra, 3 STRs |
|---|---|---|---|---|---|---|---|
| Number of loci | 30 | 10 | 133 | 15 | 325 | 97 | 12 |
| Mean | 0.779 | 0.789 | 0.749 | 0.721 | 0.739 | 0.772 | 0.814 |
| 1 STR region | 0.770 | 0.319 | |||||
| 2 STR regions | |||||||
P values are shown for two-sided Wilcoxon rank-sum tests for differences in heterozygosity (H) between microsatellites with one, two, or three separate STR regions. Loci are grouped by their repeat unit size. For di-nucleotide and tri-nucleotide loci, because of small sample size (3 and 2, respectively), the column for three separate STR regions was omitted. Tests with P < 0.05 are highlighted in bold.
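For readers unfamiliar with the test named in these captions, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the large-sample normal approximation. It is an illustration only: it applies no tie or continuity correction, the function names are hypothetical, and it is not a statement of how the authors computed their P values. In practice a statistics library routine would usually be preferred.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank-sum statistic of sample x versus y.

    ASSUMPTION: normal approximation without tie or continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expectation under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail area
    return z, p


# Hypothetical heterozygosity values for two groups of loci.
z, p = rank_sum_test([0.70, 0.72, 0.75, 0.78], [0.74, 0.79, 0.81, 0.83])
print(round(z, 3), round(p, 3))
```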
Spearman's rank correlations of heterozygosity with microsatellite sequence properties and measures of variation across individuals
| | Di (1 STR region) | Tri (1 STR region) | Tetra (1 STR region) | Di (2 STR regions) | Tri (2 STR regions) | Tetra (2 STR regions) | Tetra (3 STR regions) |
|---|---|---|---|---|---|---|---|
| Number of loci | 30 | 133 | 325 | 10 | 15 | 97 | 12 |
| G/C content of flanking sequence | 0.247 | 0.037 | -0.034 | 0.237 | 0.291 | 0.142 | 0.035 |
| Number of nucleotides separating STR regions | - | - | - | 0.140 | 0.032 | 0.252 | |
| Number of distinct alleles | 0.628 | 0.497 | |||||
| Variance in number of repeats | 0.325 | ||||||
| Range of number of repeats | 0.151 | 0.304 | |||||
| Mean PCR fragment size | 0.040 | 0.007 | -0.016 | -0.152 | 0.126 | 0.308 | |
| Mean number of repeats | 0.455 | 0.018 | 0.503 | ||||
| Maximum number of repeats | 0.235 | ||||||
| Minimum number of repeats | 0.335 | -0.036 | -0.044 | 0.049 | -0.079 | -0.036 | 0.385 |
Spearman's rank correlation coefficients (ρ) are shown for comparisons of microsatellite heterozygosity with continuous microsatellite sequence properties and with measures of variation across individuals in the HGDP-CEPH data set. Microsatellites were classified by the number of separate STR regions embedded in their sequence and by their repeat unit size. Hyphens indicate comparisons that were not evaluated. For three STR regions, no comparisons were performed for di-nucleotide and tri-nucleotide loci because of small sample size (3 and 2, respectively). Correlations with P < 0.05 are highlighted in bold.
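Spearman's ρ is the Pearson correlation of the ranks of the two variables. A short standard-library sketch of that definition is given below, with average ranks for ties; the function names and the sample values are hypothetical and are not taken from the data set.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical per-locus values: mean number of repeats versus heterozygosity.
mean_repeats = [9.5, 11.0, 12.2, 14.8, 16.1]
heterozygosity = [0.66, 0.64, 0.75, 0.72, 0.81]
print(round(spearman_rho(mean_repeats, heterozygosity), 3))  # 0.8
```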
Spearman's rank correlations of heterozygosity with skewness in the number of repeats across individuals
| Number of loci | 12 | 7 | 12 | 55 | 4 | ||
| -0.105 | -0.286 | -0.252 | -0.124 | - | |||
| Number of loci | 18 | 124 | 3 | 3 | 42 | 8 | |
| -0.007 | -0.043 | - | - | -0.225 | -0.143 | ||
Spearman's rank correlation coefficients (ρ) are shown for comparisons of microsatellite heterozygosity with skewness in the distribution of the number of repeats across individuals in the HGDP-CEPH data set. Microsatellites were classified by the number of separate STR regions embedded in their sequence, by their repeat unit size, and based on whether skewness (γ1) was greater or less than zero. Hyphens indicate comparisons that were not evaluated. For three STR regions, no comparisons were performed for di-nucleotide and tri-nucleotide loci because of small sample size (3 and 2, respectively). Correlations with P < 0.05 are highlighted in bold.
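The split described above, by whether skewness (γ1) is greater or less than zero, can be illustrated as follows. The sketch assumes the population moment coefficient of skewness, γ1 = m3 / m2^(3/2), with m2 and m3 the second and third central moments; the section does not say whether a sample-size adjustment was applied, and the function names and example loci are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (no sample-size correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def split_by_skew_sign(repeats_by_locus):
    """Partition locus names by the sign of the skewness of their repeat counts."""
    positive, negative = [], []
    for name, repeats in repeats_by_locus.items():
        g1 = skewness(repeats)
        if g1 > 0:
            positive.append(name)
        elif g1 < 0:
            negative.append(name)
    return positive, negative


# Hypothetical repeat counts across individuals at two loci.
loci = {
    "locus_a": [10, 10, 10, 14],  # long right tail, positive skew
    "locus_b": [10, 14, 14, 14],  # long left tail, negative skew
}
print(split_by_skew_sign(loci))
```

Loci with skewness exactly zero fall into neither list in this sketch.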