Amanda M Hulse-Kemp1, Jana Lemm2, Joerg Plieske2, Hamid Ashrafi3, Ramesh Buyyarapu4, David D Fang5, James Frelichowski6, Marc Giband7, Steve Hague8, Lori L Hinze6, Kelli J Kochan9, Penny K Riggs10, Jodi A Scheffler11, Joshua A Udall12, Mauricio Ulloa13, Shirley S Wang6, Qian-Hao Zhu14, Sumit K Bag15, Archana Bhardwaj15, John J Burke13, Robert L Byers12, Michel Claverie16, Michael A Gore17, David B Harker12, Md S Islam5, Johnie N Jenkins18, Don C Jones19, Jean-Marc Lacape16, Danny J Llewellyn14, Richard G Percy6, Alan E Pepper20, Jesse A Poland21, Krishan Mohan Rai15, Samir V Sawant15, Sunil Kumar Singh15, Andrew Spriggs14, Jen M Taylor14, Fei Wang8, Scott M Yourstone12, Xiuting Zheng8, Cindy T Lawley22, Martin W Ganal2, Allen Van Deynze3, Iain W Wilson14, David M Stelly23.
Abstract
High-throughput genotyping arrays provide a standardized resource for plant breeding communities and are useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection, and studying patterns of genomic diversity among cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intraspecific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative interspecific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttall ex Seemann, G. mustelinum Miers ex Watt, G. armourianum Kearny, and G. longicalyx J.B. Hutchinson and Lee. The array was validated with 1,156 samples to generate cluster positions to facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intraspecific and one interspecific, and 3,533 SNP markers were co-occurring in both maps. The intraspecific genetic map is the first saturated map that associates into 26 linkage groups corresponding to the number of cotton chromosomes for a cross between two G. hirsutum lines. The linkage maps were shown to have high levels of collinearity to the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array, cluster file and associated marker sequences constitute a major new resource for the global cotton research community.
Keywords: breeding; interspecific SNPs; intraspecific SNPs; linkage analysis; recombination
Year: 2015 PMID: 25908569 PMCID: PMC4478548 DOI: 10.1534/g3.115.018416
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Samples included for array validation and cluster file development
| Sample Type | No. of Samples |
|---|---|
| Inbred, | 516 |
| Inbred, | 59 |
| Intraspecific F1 | 53 |
| Intraspecific F2 | 157 |
| Intraspecific backcross | 31 |
| Intraspecific RIL | 34 |
| Inbred, | 18 |
| Interspecific F2 | 69 (49) |
| Interspecific RIL | 14 |
| Interspecific aneuploid | 21 |
| Wild tetraploid species | 4 |
| Synthetic tetraploid | 3 |
| Diploid species | 8 |
| Interspecific backcross | 146 |
| Interspecific F1 | 20 |
| Haploid | 3 |
| Total | 1156 |
A total of 49 interspecific F2 samples were not included in cluster file development but were genotyped using the resulting cluster file for inclusion in linkage mapping.
These samples were used in the cluster file development, but the cluster file is not suitable for scoring such samples because it is only optimized for tetraploid samples.
Datasets utilized in intraspecific content design on the CottonSNP63K array
| Data Set Name | Authors/Reference | Lines |
|---|---|---|
| Brigham Young University | | Acala Maxxa, TX2094 |
| CSIR-NBRI | | JKC703, JKC725, JKC737, JKC770, MCU-5, LRA5166 |
| USDA Set 1 | | TM-1, NM24016 |
| UC-Davis/TAMU GH RNA-seq | | TM-1, FM832, Sealand 542, PD-1, Acala Maxxa |
| USDA Set 2 | | Acala Ultima, Pyramid, Coker 315, STV825, FM966, M-240 RNR, HS26, DP-90, SG747, PSC355, STV474 |
| CSIRO | | MCU-5, Delta Opal, Sicot 70, Siokra 1-4, DP-16, Tamcot SP37, Namcala, Riverina Poplar, LuMein 14, Sicala 3-2, Sicala 40, Sicala V-2, Sicot 81, Sicot 71, Sicot 189, Sicot F-1, Deltapine 90, Coker 315 |
| TAMU/UC-Davis Intra Genomic Set 1 | | M-240 RNR, TM-1, HS26, SG747, STV474, FM832, Sealand542, PD-1, Coker 312, Tamcot Sphinx, TX231, Acala Maxxa |
| UC-Davis/TAMU Intra Genomic Set 2 | Ashrafi | M-240 RNR, TM-1, HS26, SG747, STV474, FM832, Sealand542, PD-1, Coker 312, Tamcot Sphinx, TX231, Acala Maxxa |
| DOW AgroSciences | DAS (unpublished) | Unreleased |
A total of 50K putative single nucleotide markers were used to produce the 45,104 intraspecific assays on the array after production. DAS, DOW AgroSciences.
Datasets utilized in interspecific content design on the CottonSNP63K array
| Data Set Name | Authors/Reference | Lines |
|---|---|---|
| UC-Davis Inter | | |
| CIRAD | | |
| TAMU/UC-Davis Inter RNA-seq | | |
| TAMU/UC-Davis Inter Genomic | | |
A total of 20K putative single nucleotide markers were used to produce the 17,954 inter-specific assays on the array after production.
Figure 1. SNP markers shared across five species included on the CottonSNP63K array from TAMU/UC-Davis Inter RNA-seq discovery set (Hulse-Kemp ).
Figure 2. Types of call frequency of SNP markers. NormTheta, the relative amount of each of the two fluorophore signals, is plotted on the X-axis, whereas NormR, the signal intensity, is plotted on the Y-axis. (A) Failed marker with call frequency = 0. (B) Call frequency 0.500–0.990 with major sample deviations. (C) Call frequency 0.990–0.999 with few uncalled samples. (D) Call frequency = 1 with all samples called. (E) Distribution of call frequencies for all SNP markers on the array.
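The call-frequency classes in this caption can be illustrated with a minimal Python sketch. It assumes that call frequency is the fraction of samples that received a genotype call for a marker, and that a no-call is represented by `None`. The function names, the handling of the shared 0.990 boundary, the treatment of values between 0.999 and 1, and the "unclassified" label for frequencies between 0 and 0.500 are illustrative assumptions, not part of the published analysis.

```python
def call_frequency(calls):
    """Fraction of samples with a genotype call; None marks a no-call (assumed encoding)."""
    if not calls:
        raise ValueError("at least one sample is required")
    return sum(c is not None for c in calls) / len(calls)


def call_frequency_class(cf):
    """Map a call frequency to the panel letters of the caption (boundaries are assumptions)."""
    if cf == 0:
        return "A"  # failed marker
    if cf == 1:
        return "D"  # all samples called
    if cf >= 0.990:
        return "C"  # few uncalled samples (assumed to include values above 0.999)
    if cf >= 0.500:
        return "B"  # major sample deviations
    return "unclassified"  # range between 0 and 0.500 is not described in the caption


calls = ["AA", "AB", None, "BB"]
print(call_frequency(calls), call_frequency_class(call_frequency(calls)))
```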
Figure 3. Classification of scorable SNP markers according to Illumina GenTrain score. NormTheta, the relative amount of each of the two fluorophore signals, is plotted on the X-axis, whereas NormR, the signal intensity, is plotted on the Y-axis. (A) Monomorphic marker. (B) Intergenomic or homeo-SNP marker. (C–F) Classification of polymorphic markers based on Illumina GenTrain score. (C) Genome-specific marker representing a single polymorphic locus with GenTrain score >0.60. (D) Marker with GenTrain score 0.30–0.59 on half the plot representing two genomes, one monomorphic and one polymorphic locus. (E) Marker with GenTrain score 0.21–0.29 representing multiple monomorphic loci and one polymorphic locus. (F) Marker with GenTrain score less than 0.20 representing many monomorphic loci and one polymorphic locus. (G) Distribution of cluster types in polymorphic markers based on GenTrain score.
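The GenTrain score ranges listed for panels C to F can be written as a small lookup, sketched below in Python. The function name is hypothetical, and scores are assumed to be compared at two-decimal precision. The caption leaves the exact values 0.60 and 0.20 unassigned, so this sketch reports them as "boundary" rather than guessing a class.

```python
def gentrain_class(score):
    """Return the caption panel letter (C-F) for a polymorphic marker's GenTrain score."""
    hundredths = round(score * 100)  # compare at two-decimal precision (assumption)
    if hundredths > 60:
        return "C"  # genome-specific, single polymorphic locus
    if 30 <= hundredths <= 59:
        return "D"  # one monomorphic and one polymorphic locus
    if 21 <= hundredths <= 29:
        return "E"  # multiple monomorphic loci and one polymorphic locus
    if hundredths < 20:
        return "F"  # many monomorphic loci and one polymorphic locus
    return "boundary"  # 0.60 or 0.20: not assigned by the caption


for s in (0.85, 0.45, 0.25, 0.10, 0.60):
    print(s, gentrain_class(s))
```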
Figure 4. Distribution of minor allele frequencies of all polymorphic SNPs on the CottonSNP63K array. Minor allele frequencies were determined using only inbred line samples; mapping samples and other noninbred line samples used for cluster file development were excluded from this analysis.
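For readers unfamiliar with the quantity plotted here, the following Python sketch shows one standard way to compute a minor allele frequency from biallelic genotype calls. It assumes calls are encoded as "AA", "AB", or "BB", that anything else is a no-call, and that only inbred line samples are passed in. The function name and encoding are illustrative, and the exact procedure used for the figure is not given in this record.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency from biallelic calls; non-AA/AB/BB entries are ignored."""
    counts = Counter(g for g in genotypes if g in ("AA", "AB", "BB"))
    n_called = sum(counts.values())
    if n_called == 0:
        return None  # marker has no usable calls
    # Each called sample contributes two alleles.
    freq_a = (2 * counts["AA"] + counts["AB"]) / (2 * n_called)
    return min(freq_a, 1 - freq_a)


inbred_calls = ["AA", "AA", "AA", "BB", None]
print(minor_allele_frequency(inbred_calls))
```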
Distribution of classified SNPs across the discovery sets and success rates of these SNPs on the CottonSNP63K array
| Data Set | SNPs on Array | Failed (No.) | Failed (%) | Monomorphic (No.) | Intergenomic (No.) | Polymorphic (No.) | Success Rate (%) |
|---|---|---|---|---|---|---|---|
| Brigham Young University | 185 | 23 | 12.43 | 16 | 5 | 141 | 76.22 |
| CSIR-NBRI | 343 | 41 | 11.95 | 84 | 116 | 102 | 29.74 |
| USDA-Set1 | 104 | 10 | 9.62 | 81 | 4 | 9 | 8.65 |
| UC-Davis/TAMU GH RNA-Seq | 938 | 153 | 16.31 | 81 | 66 | 638 | 68.02 |
| USDA Set 2 | 2223 | 474 | 21.32 | 193 | 372 | 1184 | 53.26 |
| CSIRO | 17,230 | 2048 | 11.89 | 4325 | 772 | 10,085 | 58.53 |
| TAMU/UC-Davis Intra Genomic Set 1 | 23,418 | 3509 | 14.98 | 4639 | 3565 | 11,705 | 49.98 |
| UC-Davis/TAMU Intra Genomic Set 2 | 445 | 48 | 10.79 | 5 | 41 | 351 | 78.88 |
| DOW AgroSciences | 218 | 13 | 5.96 | 1 | 0 | 204 | 93.58 |
| UC-Davis Inter | 143 | 10 | 6.99 | 0 | 1 | 132 | 92.31 |
| CIRAD | 145 | 20 | 13.79 | 14 | 27 | 84 | 57.93 |
| TAMU/UC-Davis Inter RNA-Seq | 13,055 | 913 | 6.99 | 374 | 307 | 11,461 | 87.79 |
| TAMU/UC-Davis Inter Genomic | 4611 | 595 | 12.90 | 501 | 789 | 2726 | 59.12 |
| Total | 63,058 | 7857 | 12.46 | 10,314 | 6065 | 38,822 | 61.57 |
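The percentage columns in this table can be reproduced from the counts. The Python sketch below uses two rows copied from the table and assumes, based on the table's own numbers, that the failed percentage is failed assays divided by SNPs on the array and that the success rate is polymorphic markers divided by SNPs on the array. The helper name is hypothetical.

```python
def rates(on_array, failed, polymorphic):
    """Return (failed %, success rate %) rounded to two decimals."""
    failed_pct = round(100 * failed / on_array, 2)
    success_pct = round(100 * polymorphic / on_array, 2)
    return failed_pct, success_pct


# Counts copied from the table: (SNPs on array, failed, polymorphic)
rows = {
    "Brigham Young University": (185, 23, 141),
    "Total": (63058, 7857, 38822),
}

for name, (on_array, failed, polymorphic) in rows.items():
    print(name, rates(on_array, failed, polymorphic))
```

Note that monomorphic and intergenomic assays are counted as successful assays but do not enter the success rate, which reflects polymorphic markers only.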
Percent similarities for technical and biological replicates of lines
| Line | Technical Replicates (% Similarity) | Individual Plants, Same Seed Source (% Similarity) | Individual Plants, Diff. Seed Source (% Similarity) | Line Pool (% Similarity) | Residual Heterozygosity in Pools (%) |
|---|---|---|---|---|---|
| TM-1 | 100.00 | — | 89.77 | — | — |
| 3-79 | 99.90 (±0.0006) | — | — | — | — |
| F1(TM-1x3-79) | 99.87 (±0.0006) | — | — | — | — |
| Coker 315 | — | 100.00 | 89.32 | 99.67 (±1.83E-5) | 0.33 |
| Delta Opal | — | 97.86 | — | 96.16 (±0.0015) | 3.51 |
| Deltapine 16 | — | 96.23 | 95.54 | 94.62 | 4.96 |
| Deltapine 90 | — | 99.74 | 95.68 | 99.47 (±0.0023) | 0.57 |
| LuMien 14 | — | 73.39 | — | 79.48 (±0.1688) | 19.40 |
| MCU-5 | — | 99.52 (±0.0020) | 66.61 | 99.42 (±0.0019) | 0.64 |
| Namcala | — | 97.17 | — | 95.92 | 2.72 |
| Riverina Poplar | — | — | — | 80.52 | 10.82 |
| Sicala 40/Fibermax 966 | — | — | 97.59 (±0.0202) | 99.25 | 0.57 |
| Sicala V-2/Fibermax 989 | — | — | — | 99.60 | 0.38 |
| Sicot 189 | — | — | — | 99.84 | 0.15 |
| Sicot 71 | — | — | — | 98.92 | 1.06 |
| Sicot 81 | — | — | — | 99.29 | 0.67 |
| Sicot F-1 | — | — | — | 92.53 | 6.63 |
| F1(TM-1x | — | 99.97 | — | — | — |
| F1(DP5690x | — | 99.87 | — | — | — |
| F1(STV474xHS26) | — | 94.58 | — | — | — |
| Acala Maxxa | — | — | 88.16 | — | — |
| Stoneville 474 | — | — | 99.36 | — | — |
| Guazuncho II | — | — | 89.19 | — | — |
| Coker 312 | — | — | 84.20 | — | — |
| Tamcot SP37 | — | — | 89.28 | — | — |
Standard deviations (SDs) are listed for comparisons with three samples; when no SD is listed, comparisons are between two samples.
Complete identity, i.e., no SD between the three samples.
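A percent similarity of the kind reported in this table can be sketched as the share of identical genotype calls between two samples. The Python example below is an illustration under stated assumptions: no-calls are encoded as `None` and skipped, samples list markers in the same order, and for three or more samples the mean and sample standard deviation of all pairwise similarities are reported. The function names and these conventions are hypothetical and may differ from the procedure behind the table.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_similarity(a, b):
    """Percent of markers with identical calls, skipping markers with a no-call (None)."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same markers")
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no markers called in both samples")
    return 100 * sum(x == y for x, y in pairs) / len(pairs)


def replicate_summary(samples):
    """Mean pairwise similarity, plus SD when more than one pair exists."""
    sims = [percent_similarity(a, b) for a, b in combinations(samples, 2)]
    sd = stdev(sims) if len(sims) > 1 else None
    return mean(sims), sd


rep1 = ["AA", "AB", "BB", "AA"]
rep2 = ["AA", "AB", "BB", "AB"]
print(percent_similarity(rep1, rep2))
```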
Figure 5. Intraspecific linkage map of 26 allotetraploid cotton chromosomes. Map determined using 93 F2 individuals from a Phytogen 72 by Stoneville 474 cross. Only one marker is listed on the right per Kosambi centiMorgan (cM) on the left, even if there were more markers co-segregating. Chromosomes are listed based on AD chromosome number.
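Map distances in this figure are given in Kosambi centiMorgans. As background, the standard Kosambi mapping function converts a recombination fraction r into a distance d = 25 ln((1 + 2r) / (1 - 2r)) cM. The Python sketch below implements that textbook formula and its inverse; the function names are illustrative and this is not the mapping software used for the figure.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


print(round(kosambi_cm(0.10), 2))
```

For small r the distance is close to 100r cM, and it grows faster than that as r approaches 0.5 because the function accounts for undetected double crossovers under crossover interference.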
Figure 6. Inconsistencies between initial de novo interspecific map and the intraspecific map. (A) Initial plots of interspecific map order and correlation with intraspecific map show area of incorrect placement in center of the linkage group. (B) Corrected interspecific linkage group and final plots.
Figure 7. Interspecific linkage map of 26 allotetraploid cotton chromosomes. Map determined using 118 F2 individuals from a G. barbadense line 3-79 by G. hirsutum genetic standard line Texas Marker-1 cross. One marker listed on the right per Kosambi centiMorgan (cM) on the left. Chromosomes are listed based on AD chromosome number.
Figure 8. Frequency distribution of the number of crossovers. Numbers of crossovers detected for each F2 individual per chromosome (0 to >8) are displayed chromatically for each linkage group, which are organized by genetic size (longest at top, shortest at bottom). (A) Distribution of crossovers in the intraspecific mapping population. (B) Distribution of crossovers in the interspecific mapping population.
Figure 9. Dot plot of the syntenic positions of SNP markers in the allotetraploid linkage maps vs. the JGI G. raimondii reference genome. The 26 allotetraploid chromosomes are shown on the y-axis and the 13 chromosomes of G. raimondii are shown on the x-axis. Red arrows indicate translocation events relative to G. raimondii. (A) Intraspecific linkage map displaying positions of 4521 mapped SNPs in G. hirsutum with alignments to G. raimondii. (B) Interspecific linkage map (G. barbadense line 3-79 by G. hirsutum genetic standard line Texas Marker-1) displaying positions of 12,027 mapped SNPs with alignments to G. raimondii.