Jianbing Yan, Trushar Shah, Marilyn L Warburton, Edward S Buckler, Michael D McMullen, Jonathan Crouch.
Abstract
A newly developed maize Illumina GoldenGate Assay with 1536 SNPs from 582 loci was used to genotype a highly diverse global maize collection of 632 inbred lines from temperate, tropical, and subtropical public breeding programs. A total of 1229 informative SNPs and 1749 haplotypes within 327 loci were used to estimate genetic diversity, population structure, and familial relatedness. Population structure analysis identified tropical and temperate subgroups, and complex familial relationships were identified within the global collection. Linkage disequilibrium (LD) was measured overall and within chromosomes, allelic frequency groups, subgroups related by geographic origin, and subgroups of different sample sizes. The LD decay distance differed among chromosomes and ranged from 1 to 10 kb. The LD decay distance increased with increasing minor allele frequency (MAF) and with smaller sample sizes, encouraging caution when using too few lines in a study. The LD decay distance was much greater in temperate than in tropical and subtropical lines, because tropical and subtropical lines are more diverse and contain more rare alleles than temperate lines. A core set of inbreds was defined based on haplotypes, and 60 lines capture 90% of the haplotype diversity of the entire panel. The defined core sets and the entire collection can be used widely for different research targets.
Year: 2009 PMID: 20041112 PMCID: PMC2795174 DOI: 10.1371/journal.pone.0008451
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of SNPs used in this study.
| Chr. | SNP number | Unique loci (2+ SNPs) | Unique loci (1 SNP) | MAF ≥0.05 | MAF ≥0.1 | MAF ≥0.2 |
| 1 | 211 | 61 | 19 | 178 | 148 | 98 |
| 2 | 166 | 37 | 33 | 124 | 93 | 60 |
| 3 | 131 | 33 | 30 | 92 | 77 | 52 |
| 4 | 132 | 36 | 26 | 101 | 75 | 45 |
| 5 | 128 | 36 | 30 | 102 | 79 | 43 |
| 6 | 65 | 18 | 14 | 50 | 38 | 25 |
| 7 | 112 | 22 | 17 | 90 | 75 | 47 |
| 8 | 102 | 26 | 13 | 82 | 75 | 48 |
| 9 | 83 | 21 | 13 | 63 | 56 | 38 |
| 10 | 73 | 16 | 13 | 61 | 45 | 34 |
| Unknown | 26 | 21 | 3 | | | |
| Total | 1229 | 327 | 211 | 943 | 761 | 490 |
Properties of SNPs and SNP haplotypes.
| Marker | Loci | Represent unique locus | Alleles |
| SNP | 211 | 211 | 422 |
| SNP | 1018 | 327 | 2036 |
| SNP haplotype | 327 | 327 | 1749 |
*SNPs from the same locus within a 10 kb region were combined and identified as a unique locus.
Number of haplotypes (alleles) observed in each locus from which one or more SNPs were amplified.
| SNPs | Locus | Allele number | |||||||||||||||||||
| 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 19 | 20 | 23 | 33 | 41 | ||
| 1 | 211 | 211 | |||||||||||||||||||
| 2 | 154 | 5 | 90 | 59 | |||||||||||||||||
| 3 | 81 | 4 | 32 | 13 | 17 | 10 | 5 | ||||||||||||||
| 4 | 52 | 1 | 3 | 3 | 14 | 10 | 9 | 3 | 1 | 4 | 1 | 1 | 1 | 1 | |||||||
| 5 | 14 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | |||||||||||
| 6 | 13 | 2 | 2 | 1 | 2 | 1 | 1 | 2 | 2 | ||||||||||||
| 7 | 5 | 1 | 1 | 1 | 1 | 1 | |||||||||||||||
| 8 | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||
| 9 | 1 | 1 | |||||||||||||||||||
| 10 | 1 | 1 | |||||||||||||||||||
| 12 | 1 | 1 | |||||||||||||||||||
| 13 | 1 | 1 | |||||||||||||||||||
| Total | 538 | 217 | 97 | 94 | 31 | 32 | 22 | 13 | 5 | 5 | 4 | 3 | 1 | 2 | 3 | 3 | 1 | 2 | 1 | 1 | 1 |
Figure 1. Allele frequency for total SNPs.
SNPs scored as biallelic markers are in the top panel, and those scored as SNP haplotypes are in the bottom panel.
Figure 2. Estimated ln(probability of the data).
Ln(probability of the data) was calculated for K ranging from 1 to 10 (top panel) and delta K values for SNPs and SNP haplotypes for K from 2 to 9 (bottom panel).
Figure 3. Estimated population structure of the diverse inbred maize lines in the study.
Each of the 632 individuals is represented by a thin vertical line, which is partitioned into k colored segments that represent the individual's estimated membership in the k clusters.
Figure 4. Distribution of pairwise relative kinship values.
Values equal to or greater than 0.40 were grouped as 0.40.
Diversity comparison of tropical (A) and temperate (B) subgroups.
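| Subset size | Alleles, SNP (A) | Alleles, SNP (B) | Alleles, haplotype (A) | Alleles, haplotype (B) | Within-group diversity (%), SNP (A) | Within-group diversity (%), SNP (B) | Within-group diversity (%), haplotype (A) | Within-group diversity (%), haplotype (B) | Total diversity (%), SNP (A) | Total diversity (%), SNP (B) | Total diversity (%), haplotype (A) | Total diversity (%), haplotype (B) |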
| 10 | 2155 | 2162 | 1015 | 999 | 89.2 | 92.8 | 61.7 | 74.6 | 87.7 | 88.0 | 56.9 | 56.0 |
| 20 | 2309 | 2268 | 1236 | 1162 | 95.5 | 97.4 | 75.1 | 86.7 | 93.9 | 92.3 | 69.2 | 65.1 |
| 40 | 2390 | 2322 | 1414 | 1281 | 98.9 | 99.7 | 85.9 | 95.6 | 97.2 | 94.5 | 79.2 | 71.8 |
| 60 | 2414 | 2329 | 1503 | 1323 | 99.9 | 100.0 | 91.3 | 98.7 | 98.2 | 94.8 | 84.2 | 74.1 |
| 80 | 2417 | 2329 | 1559 | 1340 | 100.0 | 100.0 | 94.7 | 100.0 | 98.3 | 94.8 | 87.3 | 75.1 |
| 100 | 2417 | 2329 | 1590 | 1340 | 100.0 | 100.0 | 96.6 | 100.0 | 98.3 | 94.8 | 89.1 | 75.1 |
| 120 | 2417 | 2329 | 1613 | 1340 | 100.0 | 100.0 | 98.0 | 100.0 | 98.3 | 94.8 | 90.4 | 75.1 |
| 140 | 2417 | 2329 | 1632 | 1340 | 100.0 | 100.0 | 99.1 | 100.0 | 98.3 | 94.8 | 91.4 | 75.1 |
| 156 | 2417 | 2329 | 1643 | 1340 | 100.0 | 100.0 | 99.8 | 100.0 | 98.3 | 94.8 | 92.0 | 75.1 |
| 365 | 2417 | | 1646 | | 100.0 | | 100.0 | | 98.3 | | 92.2 | |
*Within-group diversity: the alleles captured by a given subset as a percentage of all alleles in the subgroup.
Total diversity: the alleles captured by a given subset as a percentage of all alleles in the whole panel.
Alleles captured by subsets of the data for SNP haplotypes.
| Subset size | Haplotypes obtained | Captured (%) | Inbreds |
| 20 | 1333 | 74.7 | H84, IDS91, K4, Ki44, L109, L128, L172, L292, L419, Mo45, NC236, NC264, NC302, NC306, NC342, NC362, Tzi16, Va26, Va99, Wf9 |
| 27 | 1332 | 74.6 | B73, B97, CML103, CML228, CML247, CML277, CML322, CML333, CML52, CML69, HP301, Il14H, Ki11, Ki3, Ky21, M162W, M37W, Mo17, Mo18W, MS71, NC350, NC358, Oh43, OH7B, P39, Tx303, Tzi8 |
| 40 | 1517 | 85.0 | B73Htrhm, CI66, CM174, CML258, CO106, H84, IDS69, IDS91, K4, K55, Ki44, KY226, L108, L12, L128, L165, L18, L202, L268, L284, L292, L337, L363, L374, L398, L414, L67, L91, MoG, NC260, NC264, NC300, NC302, NC342, NC362, SC213R, Tzi16, Va17, Va35, Va99 |
| 60 | 1608 | 90.1 | B115, B75, CML258, CML312, CML38, CO106, F2834T, H84, I137TN, I205, IDS91, K4, K55, Ki44, KY226, L101, L108, L109, L111, L114, L12, L128, L131, L18, L192, L198, L209, L217, L245, L256, L264, L284, L291, L292, L30, L349, L36, L374, L39, L419, L64, L67, L7, L91, Mo46, MoG, NC236, NC260, NC264, NC296, NC300, NC302, NC338, NC342, NC362, Oh40B, Tzi16, VA102, Va35, Va99 |
| 80 | 1660 | 93.0 | 38-11, A272, A6, A661, B115, B75, CML14, CML258, CML261, CML312, CO125, F2834T, H84, I137TN, I205, IDS91, K148, K4, K55, Ki44, KY228, L108, L109, L111, L114, L119, L12, L128, L154, L170, L18, L19, L192, L198, L200, L208, L245, L248, L258, L284, L290, L292, L296, L30, L333, L349, L368, L373, L417, L419, L436, L437, L453, L454, L54, L59, L64, L67, L79, L91, L99, Mo17, NC236, NC250, NC260, NC264, NC296, NC300, NC302, NC338, NC342, NC362, Oh40B, Sg18, T232, VA102, Va17, Va35, Va99, W182B |
| 100 | 1698 | 95.1 | A272, A441-5, A680, B115, B73Htrhm, C49A, CI66, CM37, CML14, CML258, CML261, CML312, CML341, I137TN, I205, IDS91, K4, K55, Ki11, Ki44, KY228, L108, L109, L114, L119, L12, L128, L131, L154, L160, L170, L173, L18, L181, L185, L189, L200, L201, L209, L230, L232, L245, L246, L248, L250, L262, L27, L284, L291, L293, L296, L297, L309, L317, L328, L333, L334, L343, L349, L36, L363, L368, L373, L39, L406, L414, L419, L437, L438, L445, L454, L5, L51, L54, L578, L59, L64, L67, L71, L83, L91, L99, Mo47, MoG, NC236, NC264, NC294, NC296, NC300, NC302, NC338, NC342, NC362, Oh40B, SA24, Tzi16, VA102, Va26, Va35, Va99 |
*The subset of 27 inbreds represents the 27 parents of the NAM population.
The detailed pedigree information of the inbreds, including group classification by Structure, can be found in Table S1.
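As an illustration of how a haplotype-based core set can be assembled, the sketch below uses a simple greedy heuristic: at each step it adds the line that contributes the most haplotypes not yet captured. This is an assumption made for illustration and is not necessarily the procedure used to build the subsets in the table. The line names and haplotype labels in `toy_panel` are hypothetical.

```python
def greedy_core_set(line_haplotypes: dict[str, set[str]], size: int) -> list[str]:
    """Pick up to `size` lines, each time adding the line with the most new haplotypes."""
    chosen: list[str] = []
    covered: set[str] = set()
    remaining = dict(line_haplotypes)
    while remaining and len(chosen) < size:
        # Iterate in sorted name order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def captured_percent(line_haplotypes: dict[str, set[str]], chosen: list[str]) -> float:
    """Percentage of all haplotypes in the panel that the chosen lines carry."""
    all_haplotypes = set().union(*line_haplotypes.values())
    captured = set().union(*(line_haplotypes[name] for name in chosen))
    return 100.0 * len(captured) / len(all_haplotypes)


# Hypothetical toy panel: three lines, three loci, two haplotypes per locus.
toy_panel = {
    "lineA": {"locus1:h1", "locus2:h1", "locus3:h1"},
    "lineB": {"locus1:h1", "locus2:h2"},
    "lineC": {"locus1:h2", "locus2:h2", "locus3:h2"},
}

core = greedy_core_set(toy_panel, size=2)
print(core, captured_percent(toy_panel, core))
```

A greedy pass is fast and easy to reason about, but it does not guarantee the subset with the highest possible coverage for a given size.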
Comparison of the 21 “CML” lines with the same name stored at CIMMYT and North Carolina State University, respectively.
| Lines | Different SNPs | Total SNPs | difference (%) |
| CML103 | 65 | 1182 | 5.5 |
| CML108 | 20 | 1177 | 1.7 |
| CML218 | 9 | 1160 | 0.8 |
| CML220 | 100 | 999 | 10.0 |
| CML228 | 56 | 1148 | 4.9 |
| CML238 | 29 | 1134 | 2.6 |
| CML254 | 3 | 1182 | 0.3 |
| CML261 | 8 | 1199 | 0.7 |
| CML281 | 16 | 1197 | 1.3 |
| CML287 | 3 | 1204 | 0.2 |
| CML321 | 6 | 1212 | 0.5 |
| CML322 | 231 | 1182 | 19.5 |
| CML323 | 2 | 1175 | 0.2 |
| CML328 | 171 | 867 | 19.7 |
| CML331 | 10 | 1216 | 0.8 |
| CML333 | 5 | 1197 | 0.4 |
| CML341 | 34 | 1193 | 2.8 |
| CML38 | 39 | 1085 | 3.6 |
| CML69 | 27 | 1140 | 2.4 |
| CML91 | 150 | 1109 | 13.5 |
| CML92 | 18 | 1206 | 1.5 |
| Average | 47.7 | 1150.7 | 4.1 |
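The difference column can be reproduced from the two count columns. The short sketch below recomputes it for three rows copied from the table; the function name is illustrative only.

```python
def percent_difference(different_snps: int, total_snps: int) -> float:
    """Percentage of compared SNPs whose calls differ between the two seed sources."""
    return 100.0 * different_snps / total_snps


# (different SNPs, total SNPs) copied from three rows of the table above.
rows = {"CML103": (65, 1182), "CML220": (100, 999), "CML328": (171, 867)}

percentages = {
    name: round(percent_difference(different, total), 1)
    for name, (different, total) in rows.items()
}
print(percentages)
```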
Mean LD among all SNPs with a minor allele frequency greater than 0.05, over different map distances and across 10 chromosomes.
| Distance | N | Mean (r²) | SD (r²) | 25th percentile | 50th percentile | 75th percentile | Minimum number of SNPs required to cover genome |
| 0–0.1kb | 114 | 0.237 | 0.283 | 0.019 | 0.121 | 0.342 | >24,000,000 |
| 0.1–0.2kb | 167 | 0.244 | 0.278 | 0.044 | 0.128 | 0.33 | 24,000,000-12,000,000 |
| 0.2–0.3kb | 121 | 0.211 | 0.27 | 0.029 | 0.099 | 0.31 | 12,000,000-8,000,000 |
| 0.3–0.4kb | 98 | 0.234 | 0.28 | 0.022 | 0.099 | 0.376 | 8,000,000-6,000,000 |
| 0.4–0.6kb | 80 | 0.131 | 0.212 | 0.013 | 0.055 | 0.159 | 6,000,000-4,000,000 |
| 0.6–1kb | 122 | 0.207 | 0.278 | 0.013 | 0.096 | 0.283 | 4,000,000-2,400,000 |
| 1–1.5kb | 100 | 0.287 | 0.332 | 0.032 | 0.126 | 0.408 | 2,400,000-1,600,000 |
| 1.5–2kb | 53 | 0.213 | 0.304 | 0.017 | 0.032 | 0.25 | 1,600,000-1,200,000 |
| 2–5kb | 131 | 0.158 | 0.255 | 0.013 | 0.044 | 0.168 | 1,200,000-480,000 |
| 5–10kb | 34 | 0.077 | 0.126 | 0.011 | 0.033 | 0.091 | 480,000-240,000 |
| 10–100kb | 68 | 0.053 | 0.139 | 0.003 | 0.012 | 0.032 | 240,000-24,000 |
| 0.1–1Mb | 479 | 0.022 | 0.099 | 0.001 | 0.004 | 0.012 | 24,000-2,400 |
| 1–5Mb | 2285 | 0.008 | 0.014 | 0.001 | 0.003 | 0.009 | 2,400-1,600 |
| 5–10Mb | 2911 | 0.006 | 0.011 | 0.001 | 0.003 | 0.007 | 1,600-240 |
| 10–100Mb | 23701 | 0.005 | 0.011 | 0 | 0.002 | 0.006 | 240-24 |
| >100Mb | 19618 | 0.004 | 0.007 | 0 | 0.002 | 0.005 | NA |
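The last column of the table above appears to be the genome size divided by the distance at the bin boundary. The sketch below reproduces that arithmetic under the assumption of a genome of about 2.4 Gb (2,400,000 kb), a value inferred from the table itself (24,000,000 SNPs at 0.1 kb spacing) rather than stated in the text.

```python
GENOME_SIZE_KB = 2_400_000  # assumption: ~2.4 Gb genome, inferred from the table


def min_snps_to_cover(ld_distance_kb: float, genome_size_kb: float = GENOME_SIZE_KB) -> int:
    """Number of evenly spaced SNPs needed so that neighbours are ld_distance_kb apart."""
    return round(genome_size_kb / ld_distance_kb)


# Bin boundaries in kb, taken from the Distance column.
for distance_kb in (0.1, 1, 10, 100, 1000):
    print(f"{distance_kb} kb -> {min_snps_to_cover(distance_kb):,} SNPs")
```

Under this reading, an LD decay distance of 5 to 10 kb corresponds to roughly 240,000 to 480,000 evenly spaced markers.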
Figure 5. Linkage disequilibrium across the 10 maize chromosomes measured with 914 SNPs.
Only SNPs with a minor allele frequency greater than 0.05 are shown.
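For readers unfamiliar with pairwise LD, the sketch below shows one common way to compute it between two biallelic SNPs, together with the minor allele frequency filter. It assumes that LD is measured as the squared allele-frequency correlation (r²), that each inbred line is fully homozygous so it can be coded as a single 0/1 allele per SNP, and that there are no missing calls. These are simplifying assumptions for illustration, not a description of the software used in the study.

```python
def minor_allele_frequency(calls: list[int]) -> float:
    """Frequency of the rarer allele for a SNP coded 0/1 across lines."""
    p = sum(calls) / len(calls)
    return min(p, 1.0 - p)


def r_squared(snp_a: list[int], snp_b: list[int]) -> float:
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) for two SNPs coded 0/1 across the same lines."""
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must be scored on the same lines")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of lines carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denominator


# Toy genotypes for four lines (hypothetical data).
snp1 = [0, 0, 1, 1]
snp2 = [0, 0, 1, 1]  # identical pattern: complete LD
snp3 = [0, 1, 0, 1]  # independent of snp1

if minor_allele_frequency(snp1) > 0.05:
    print(r_squared(snp1, snp2), r_squared(snp1, snp3))
```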
Average LD decay distance of the 10 chromosomes for r² greater than 0.1.
| Chr. | LD decay (kb) |
| 1 | 1.5–2 |
| 2 | 5–10 |
| 3 | 5–10 |
| 4 | 5–10 |
| 5 | 5–10 |
| 6 | 2–5 |
| 7 | 5–10 |
| 8 | 5–10 |
| 9 | 5–10 |
| 10 | 2–5 |
| Average | 5–10 |
Figure 6. Mean LD estimates at different physical distances for three minor allele frequency thresholds.
Mean LD estimates are pooled over all chromosomes, and three different minimum cutoff levels for minor allele frequency are shown.
Correlations between pairwise estimates of LD obtained from different sample sizes compared to the entire sample of 632 lines.
| Entire sample | Selected sample | Correlation of estimates between selected and entire sample: LD | Correlation of estimates between selected and entire sample: MAF |
| 632 | 25 | 0.55 (0.51–0.57) | 0.79 (0.77–0.81) |
| 632 | 50 | 0.75 (0.73–0.76) | 0.89 (0.88–0.91) |
| 632 | 100 | 0.79 (0.78–0.80) | 0.95 (0.94–0.95) |
| 632 | 200 | 0.89 (0.89–0.89) | 0.98 (0.98–0.98) |
| 632 | 400 | 0.91 (0.90–0.91) | 0.99 (0.99–0.99) |
Each subsample was taken 10 times to obtain a mean and range.
Figure 7. Mean LD estimates for different physical distances for six different sample sizes.
Mean LD estimates are pooled over all chromosomes, and six different sample sizes using SNPs with minor allele frequency greater than 0.05 are shown.
Figure 8. Mean LD estimates at different physical distances for tropical and temperate subgroups.
Mean LD estimates are pooled over all chromosomes, and two subgroups (A = tropical, B = temperate) at sample size = 80 are shown, using SNPs with a minor allele frequency greater than 0.05.