Lori L Hinze, Amanda M Hulse-Kemp, Iain W Wilson, Qian-Hao Zhu, Danny J Llewellyn, Jen M Taylor, Andrew Spriggs, David D Fang, Mauricio Ulloa, John J Burke, Marc Giband, Jean-Marc Lacape, Allen Van Deynze, Joshua A Udall, Jodi A Scheffler, Steve Hague, Jonathan F Wendel, Alan E Pepper, James Frelichowski, Cindy T Lawley, Don C Jones, Richard G Percy, David M Stelly.
Abstract
BACKGROUND: Cotton germplasm resources contain beneficial alleles that can be exploited to develop germplasm adapted to emerging environmental and climate conditions. Accessions and lines have traditionally been characterized based on phenotypes, but phenotypic profiles are limited by the cost, time, and space required to make visual observations and measurements. With advances in molecular genetic methods, genotypic profiles are increasingly able to identify differences among accessions because of the larger number of genetic markers that can be measured. Combining both methods would greatly enhance our ability to characterize germplasm resources. Recent efforts have identified enough SNP markers to establish high-throughput genotyping systems, such as the CottonSNP63K array, which enables a researcher to efficiently analyze large numbers of SNP markers and obtain highly repeatable results. In the current investigation, we used the SNP array to analyze genetic diversity, primarily among cotton cultivars, to compare the results with SSR-based phylogenetic analyses, and to identify loci associated with seed nutritional traits.
Keywords: Breeding; Cotton; Diversity analysis; Genome-wide association analysis; Germplasm collection; Molecular markers; Seed protein content
Year: 2017 PMID: 28158969 PMCID: PMC5291959 DOI: 10.1186/s12870-017-0981-y
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Measures of SNP-based genetic diversity within G. hirsutum germplasm groups
| Group | N | Average MAF (including monomorphic) | Average MAF (excluding monomorphic) | Total polymorphic SNPs (proportion) | Genetic diversity (HE) |
|---|---|---|---|---|---|
| Gossypium | 390 | 0.184 | 0.213 | 33507 (0.86) | 0.249 |
| G. hirsutum | 363 | 0.169 | 0.252 | 25829 (0.67) | 0.225 |
| Improved | 292 | 0.145 | 0.242 | 23145 (0.60) | 0.195 |
| Wild | 71 | 0.177 | 0.260 | 26299 (0.68) | 0.232 |
| US | 185 | 0.141 | 0.241 | 22626 (0.58) | 0.190 |
| Other countries | 107 | 0.147 | 0.248 | 22961 (0.59) | 0.197 |
| Eastern | 48 | 0.132 | 0.234 | 21810 (0.56) | 0.177 |
| Mid-south | 48 | 0.111 | 0.222 | 19357 (0.50) | 0.151 |
| Plains | 43 | 0.135 | 0.245 | 21350 (0.55) | 0.183 |
| Western | 12 | 0.118 | 0.267 | 17093 (0.44) | 0.157 |
| N/A | 34 | 0.142 | 0.231 | 23851 (0.62) | 0.193 |
Number of polymorphic SNPs (MAF ≥ 0.01) was calculated out of 38,822 SNPs. N sample size, MAF minor allele frequency
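As an illustration of the quantities in this table, the following minimal sketch computes them for a toy genotype matrix. It assumes genotypes are coded as 0, 1, or 2 copies of one allele with no missing calls, and it takes expected heterozygosity at a biallelic locus to be 2p(1 − p) averaged over loci; the record does not state the exact formula or software used, so both are assumptions. The function name `maf_summary` and the toy data are hypothetical.

```python
def maf_summary(genotypes, maf_threshold=0.01):
    """Summarise a samples x SNPs list of 0/1/2 allele counts (assumed coding)."""
    n_samples = len(genotypes)
    n_snps = len(genotypes[0])
    mafs = []
    for j in range(n_snps):
        # Frequency of the counted allele, folded so it is the minor allele.
        freq = sum(row[j] for row in genotypes) / (2 * n_samples)
        mafs.append(min(freq, 1 - freq))
    # A SNP counts as polymorphic when MAF >= threshold (0.01 in the table footnote).
    polymorphic = [m for m in mafs if m >= maf_threshold]
    return {
        "maf_including_monomorphic": sum(mafs) / n_snps,
        "maf_excluding_monomorphic": (
            sum(polymorphic) / len(polymorphic) if polymorphic else None
        ),
        "n_polymorphic": len(polymorphic),
        "proportion_polymorphic": len(polymorphic) / n_snps,
        # Assumed definition: mean of 2p(1 - p) over all loci.
        "expected_heterozygosity": sum(2 * m * (1 - m) for m in mafs) / n_snps,
    }

# Toy data: 4 samples x 4 SNPs; the first and third SNPs are monomorphic.
toy = [
    [0, 0, 2, 1],
    [0, 1, 2, 1],
    [0, 2, 2, 0],
    [0, 1, 2, 2],
]
summary = maf_summary(toy)
```

Under these assumptions, the average MAF including monomorphic SNPs equals the average MAF excluding them multiplied by the polymorphic proportion, which is roughly the pattern seen in the table (for example, 0.213 × 0.86 ≈ 0.184 in the first row).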
Average expected number of SNPs between a pair of genotypes based on Gossypium germplasm group
| Group | Gossypium | G. hirsutum | Improved | Wild | US | Other countries | Eastern | Mid-south | Plains | Western | N/A |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gossypium | 7167.8 | | | | | | | | | | |
| G. hirsutum | 6866.0 | 6564.2 | | | | | | | | | |
| Improved | 6408.7 | 6107.0 | 5649.7 | | | | | | | | |
| Wild | 7029.7 | 6727.9 | 6270.6 | 6891.5 | | | | | | | |
| US | 6334.5 | 6032.7 | 5575.5 | 6196.4 | 5501.2 | | | | | | |
| Other countries | 6441.7 | 6139.9 | 5682.7 | 6303.6 | 5608.5 | 5715.7 | | | | | |
| Eastern | 6154.2 | 5852.5 | 5395.2 | 6016.1 | 5321.0 | 5428.2 | 5140.7 | | | | |
| Mid-south | 5744.1 | 5442.3 | 4985.1 | 5606.0 | 4910.8 | 5018.1 | 4730.6 | 4320.4 | | | |
| Plains | 6212.6 | 5910.9 | 5453.6 | 6074.5 | 5379.4 | 5486.6 | 5199.1 | 4789.0 | 5257.5 | | |
| Western | 5875.9 | 5574.1 | 5116.9 | 5737.8 | 5042.6 | 5149.8 | 4862.4 | 4452.2 | 4920.8 | 4584.0 | |
| N/A | 6350.6 | 6048.8 | 5591.5 | 6212.5 | 5517.3 | 5624.5 | 5337.0 | 4926.9 | 5395.5 | 5058.7 | 5533.4 |
Calculations were based on the average MAF (including monomorphic SNPs) for each group obtained from Table 1
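The exact calculation is not given in this record, but the tabulated numbers appear consistent with a simple pattern: each between-group value equals the mean of the two corresponding within-group (diagonal) values, and the diagonal values scale roughly with each group's average MAF. The sketch below checks the first part of that pattern against a few values copied from the table; the rule itself is an inference from the numbers, not a stated method, and the helper name `between_group` is hypothetical.

```python
# Within-group (diagonal) values copied from the table above.
within = {
    "Improved": 5649.7,
    "Wild": 6891.5,
    "US": 5501.2,
    "Other countries": 5715.7,
    "Eastern": 5140.7,
    "Mid-south": 4320.4,
    "Plains": 5257.5,
    "Western": 4584.0,
}

# Between-group (off-diagonal) values copied from the same table.
reported_between = {
    ("Improved", "Wild"): 6270.6,
    ("US", "Other countries"): 5608.5,
    ("Eastern", "Mid-south"): 4730.6,
    ("Plains", "Western"): 4920.8,
}

def between_group(a, b):
    """Inferred rule (an assumption): mean of the two within-group values."""
    return (within[a] + within[b]) / 2.0

# Absolute difference between the inferred and the reported value for each pair.
deviations = {
    pair: abs(between_group(*pair) - value)
    for pair, value in reported_between.items()
}
```

For these four pairs the deviations stay within the rounding of the reported values (one decimal place).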
Fig. 1 SNPs unique and common to different sets of G. hirsutum germplasm. a 292 improved and 71 wild samples, b improved samples from the United States (185) and from other countries (107), and c improved types from breeding regions within the United States (eastern, 48 samples; mid-south, 48; plains, 43; western, 12; n/a (unclassified breeding region), 34)
Average proportion of alleles shared identical by state (IBS) as an estimate of genetic similarity
| a) | Improved | Wild |
|---|---|---|
| Improved | 0.709 | |
| Wild | 0.573 | 0.652 |

| b) | US | Other countries |
|---|---|---|
| US | 0.683 | |
| Other countries | 0.671 | 0.671 |

| c) | Eastern | Mid-south | Plains | Western | N/A |
|---|---|---|---|---|---|
| Eastern | 0.694 | | | | |
| Mid-south | 0.680 | 0.739 | | | |
| Plains | 0.659 | 0.665 | 0.682 | | |
| Western | 0.666 | 0.657 | 0.661 | 0.711 | |
| N/A | 0.666 | 0.689 | 0.660 | 0.666 | 0.673 |
IBS values are calculated for groups based on a) 363 G. hirsutum samples, b) 292 improved G. hirsutum samples with global distribution, and c) 185 improved G. hirsutum samples from breeding regions within the United States
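To make the IBS measure concrete, here is a minimal sketch of one common definition: at each SNP, two samples share 2, 1, or 0 alleles identical by state, and the similarity is the shared fraction averaged over loci. It assumes 0/1/2 genotype coding with no missing calls; the record does not say which software or exact definition produced the table, so this is illustrative only. The function names and toy genotypes are hypothetical.

```python
def ibs_similarity(g1, g2):
    """Proportion of alleles shared IBS between two 0/1/2 genotype vectors."""
    # Alleles shared at a locus: 2 - |difference in allele count| (0, 1 or 2).
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))

def mean_ibs_between(group_a, group_b):
    """Average IBS over every pair with one sample from each group."""
    values = [ibs_similarity(a, b) for a in group_a for b in group_b]
    return sum(values) / len(values)

# Toy example: two samples genotyped at four SNPs.
sample_x = [0, 0, 2, 1]
sample_y = [2, 0, 2, 0]
pair_ibs = ibs_similarity(sample_x, sample_y)
```

Within-group averages, as on the table diagonals, would use all pairs of distinct samples from the same group rather than `mean_ibs_between`.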
Fig. 2 Identical by state (IBS) distributions for all pairwise sets of G. hirsutum. a improved versus wild samples, and improved samples from b US versus other countries, c between and within US regions overall, and d within US regions individually
Fig. 3 Two-dimensional multidimensional scaling (MDS) plot of all Gossypium samples, showing separation of improved and wild (i.e. non-cultivated) forms of G. hirsutum from other Gossypium species. Identical by state genetic similarities of 390 Gossypium samples were used in generating the MDS plot. The three labelled samples are other Gossypium species that plotted similarly to wild samples of G. hirsutum
Fig. 4 Two-dimensional multidimensional scaling (MDS) plots of G. hirsutum groups. a improved type G. hirsutum samples from the United States and from other countries, and b improved type G. hirsutum samples from breeding regions in the United States. Identical by state genetic similarities of 185 G. hirsutum samples from the United States (eastern, 48; mid-south, 48; plains, 43; western, 12; n/a, 34) and 107 samples from other countries were used in generating the MDS plots
Fig. 5 Estimated population structure for 363 G. hirsutum samples. fastSTRUCTURE bar plots for K = 2 show a clear split between improved and wild (i.e., non-cultivated landrace) samples
Phenotypic descriptor values for eight pairs of accessions with high identical by state (IBS > 0.98) similarity
| Pair | PI number | Designation | Pedigree Notes | IBS | Stem hair | Leaf hair | Leaf shape | Leaf nectaries | Bract nectaries | Boll nectaries | Boll shape |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 566941 | MD51ne | DP90*3/MD65-11ne; | 1.0000 | moderate | moderate | | | | | |
| | 607166 | Siokra 104–90 | | | moderate | moderate | | | | | |
| 2 | 529067 | DES 716 | | 0.9999 | none | few | normal | | present | | round |
| | 528649 | Rowden | Sel. Bohemian | | none | few | normal | | present | | round |
| 3 | 528649 | Rowden | Sel. Bohemian | 0.9880 | none | few | normal | one, main vein | present | | round |
| | 528634 | Kekchi | | | none | few | normal | one, main vein | present | | round |
| 4 | 528634 | Kekchi | | 0.9860 | none | few | normal | | present | present | round |
| | 529067 | DES 716 | | | none | few | normal | | present | present | round |
| 5 | 529215 | Auburn 56 | Cook 37–6/2*CKR 1//CKR 1 W | 0.9850 | | moderate | normal | | present | present | round |
| | 528655 | Delfos 9169 | | | | moderate | normal | | present | present | round |
| 6 | 529565 | Deltapine 66 | DP16/DP554; | 0.9810 | hairy | moderate | normal | three | present | reduced | round |
| | 528820 | Paymaster 54 | Sel. Kekchi | | hairy | moderate | normal | three | present | reduced | round |
| 7 | 528970 | Deltapine 14 | DP 11/DP 1 | 0.9800 | | | normal | one, main vein | present | | round |
| | 528649 | Rowden | Sel. Bohemian | | | | normal | one, main vein | present | | round |
| 8 | 529067 | DES 716 | | 0.9800 | | | normal | one, main vein | present | present | round |
| | 528970 | Deltapine 14 | DP 11/DP 1 | | | | normal | one, main vein | present | present | round |
Seven of 26 descriptors showed differences between accessions when grown in the field at College Station, TX in 2015. Differences are highlighted in bold type
Fig. 6 Comparisons of SNP and SSR principal coordinate analyses based on Jaccard’s coefficient. 192 G. hirsutum samples (123 improved and 69 wild types) from the US National Cotton Germplasm Collection were compared based on (a) 38,682 SNP loci and (b) 105 SSR loci. The 123 improved G. hirsutum samples (77 from the United States and 46 from other countries) were further independently analyzed using (c) SNP and (d) SSR loci
Fig. 7 Relationship between SNP (x-axis) and SSR (y-axis) marker sets as calculated using Jaccard’s genetic similarity. (a) 192 G. hirsutum improved and wild samples (Mantel r = 0.798) and (b) 123 improved G. hirsutum samples grouped by global breeding region (Mantel r = 0.509). Each dot represents a pairwise comparison between samples
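The comparison in this figure can be sketched as follows: compute a Jaccard similarity matrix for each marker set, then correlate the two matrices over their unique sample pairs, which is the Mantel r statistic. The sketch assumes each sample is a binary presence/absence profile of alleles and leaves out the permutation test normally used for significance. All names and toy profiles are hypothetical, and the record does not describe the actual pipeline.

```python
from itertools import combinations
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity of two binary (0/1) allele-presence profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def pairwise_similarities(profiles):
    """Similarities for every unordered sample pair, in a fixed order."""
    return [jaccard(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)]

def mantel_r(sims_x, sims_y):
    """Pearson correlation between two lists of matched pairwise similarities."""
    n = len(sims_x)
    mean_x, mean_y = sum(sims_x) / n, sum(sims_y) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sims_x, sims_y))
    var_x = sum((x - mean_x) ** 2 for x in sims_x)
    var_y = sum((y - mean_y) ** 2 for y in sims_y)
    return cov / sqrt(var_x * var_y)

# Toy profiles for three samples scored with two hypothetical marker sets.
snp_toy = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
ssr_toy = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
r = mantel_r(pairwise_similarities(snp_toy), pairwise_similarities(ssr_toy))
```

Each element of the two similarity lists corresponds to one dot in the scatter plot.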
Fig. 8 Histogram distributions and Q-Q plots of corrected p-values for seed trait association analysis. Observed p-values, corrected for population structure using relatedness, are plotted against the expected p-values. GEMMA software adjusted for population structure over 26,099 SNPs for (a) seed oil content (%), (b) seed protein content (%), and (c) seed index (grams per 100 seeds)
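A Q-Q plot of this kind pairs each observed p-value with the value expected under the null hypothesis. The minimal sketch below assumes the i-th smallest of n p-values is expected near i / (n + 1), one common convention, and returns both on the −log10 scale; the plotting positions actually used for the figure are not stated. The function name and the example p-values are hypothetical.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 p-values, smallest p first."""
    observed = sorted(p_values)
    n = len(observed)
    # Assumed plotting positions: i / (n + 1) for the i-th smallest p-value.
    expected = [(i + 1) / (n + 1) for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]

# Hypothetical p-values, all strictly greater than zero.
points = qq_points([0.5, 0.04, 0.9, 3.70e-06])
```

Points lying above the diagonal, as the first pair does here, indicate p-values smaller than expected by chance.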
Four SNP loci significantly related to protein content as determined by genome-wide association analysis
| SNP | | Significance test | p-value | Genome sequence | Chromosome | Position |
|---|---|---|---|---|---|---|
| Marker | i28873Gh | Wald test | 3.70E-06 | | * | 0 |
| Group | TAMU | Likelihood Ratio Test | 1.11E-05 | | Ca13 | 41,692,946 |
| Group_name | GH_TBb049I03r236 | Score Test | 4.81E-05 | | NBI_A02 | 7,820,104 |
| Sequence | TCGATATGAACGGAAAATGCTTGCTCGTCGGTTGGAAGGGGACGCCGATGYTTTCAATTTCGGTTTGGAA | | | | | |
| Marker | i34975Gh | Wald test | 3.70E-06 | | Chr05 | 7,028,098 |
| Group | TAMU | Likelihood Ratio Test | 1.11E-05 | | Ca13 | 41,671,786 |
| Group_name | GH_TBb119J20f639 | Score Test | 4.81E-05 | | NBI_A02 | 7,800,710 |
| Sequence | ATGGATGACAGAAATAGGACTATGATCAATCCCATCCACCGCTACTCGGTMCCTGTGTATCCAGGTACC | | | | | |
| Marker | i20295Gh | Wald test | 1.52E-05 | | Chr13 | 30,556,525 |
| Group | CSIRO | Likelihood Ratio Test | 9.78E-04 | | * | 0 |
| Group_name | Scaffold_13_30556525 | Score Test | 6.41E-03 | | * | 0 |
| Sequence | AACGTACTAAATTCGTAGTTAGATAGTAGCCAAGGACTCACTTAAACCAACTAAAACATCAACCTATTCTAAGTTCTCATGTAACAAAAATTTAACATAAYAAACTTAGAATGCTTATAACTCGGTCTATGCTTAACCTTTTCACCTAAAACGAATTTTGTTCACCTATTTAGTCTTCTACGACTAATCATCAACCCTTAA | | | | | |
| Marker | i22490Gh | Wald test | 1.60E-05 | | * | 0 |
| Group | USDA | Likelihood Ratio Test | 1.03E-03 | | * | 0 |
| Group_name | CFB2569 | Score Test | 6.41E-03 | | NBI_D01 | 35,251,544 |
| Sequence | TGCCGCATACTTGTGGACCACATARTCGTGTACAATTGGAAAATTAGGGATTTAGAGGAATTTTGGTGCC | | | | | |
Information is provided on p-values associated with three significance tests, as well as known chromosomal locations in three available Gossypium genome sequences
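The flanking sequences in this table mark the SNP position with a single IUPAC ambiguity code (for example Y, M, or R). As a reading aid, the sketch below locates that code and reports the two alleles it stands for, using the standard IUPAC two-base codes. The helper name `find_snp` is hypothetical and not part of any tool mentioned in the record.

```python
# Standard IUPAC two-base ambiguity codes.
IUPAC_TWO_BASE = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}

def find_snp(flanking_sequence):
    """Return (0-based position, alleles) for each ambiguity code found."""
    return [
        (index, IUPAC_TWO_BASE[base])
        for index, base in enumerate(flanking_sequence.upper())
        if base in IUPAC_TWO_BASE
    ]

# Flanking sequences copied from the table for two of the markers.
i28873Gh = "TCGATATGAACGGAAAATGCTTGCTCGTCGGTTGGAAGGGGACGCCGATGYTTTCAATTTCGGTTTGGAA"
i22490Gh = "TGCCGCATACTTGTGGACCACATARTCGTGTACAATTGGAAAATTAGGGATTTAGAGGAATTTTGGTGCC"

snp_28873 = find_snp(i28873Gh)
snp_22490 = find_snp(i22490Gh)
```

Each of these two sequences contains exactly one ambiguity code, so the function returns a single entry per marker: a C/T polymorphism for i28873Gh and an A/G polymorphism for i22490Gh.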