Amanda M Hulse-Kemp, Hamid Ashrafi, Kevin Stoffel, Xiuting Zheng, Christopher A Saski, Brian E Scheffler, David D Fang, Z Jeffrey Chen, Allen Van Deynze, David M Stelly.
Abstract
A bacterial artificial chromosome (BAC) library and BAC-end sequences (BES) for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining that uses the BAC-end sequences as a reference for aligning resequencing data from 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs were developed for G. hirsutum, and 223,138 and 470,631 interspecific SNPs were developed for G. barbadense and G. longicalyx, respectively. From a set of 88 interspecific SNPs (11 selected at random and 77 putatively associated with the homeologous chromosome pair 12 and 26), we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validate the approach as a reliable way to produce large numbers of both intraspecific and interspecific SNPs aligned to BAC ends, which will enable future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will also allow future Gossypium resequencing data to be genotyped automatically at the identified SNPs along the BAC-end sequence reference, for anchoring sequence assemblies and for comparative studies.
Keywords: BAC-derived SNPs; SNP genotyping; cotton genomics; intraspecific; resequencing
Year: 2015 PMID: 25858960 PMCID: PMC4478540 DOI: 10.1534/g3.115.017749
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
KASP assay screening results for G. barbadense–derived markers
| Type | Marker Name | KASP Result | Identified in Iteration 1 | Identified in Iteration 2 | Homeo-SNP Removal (Iteration 1) | Homeo-SNP Removal (Iteration 2) | Final Result (Iteration 1) | Final Result (Iteration 2) |
|---|---|---|---|---|---|---|---|---|
| Not Transcriptome-Associated | GH_TBb001A07f_381 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Not Transcriptome-Associated | GH_TBb001A07f_486 | Bad | No | Yes | — | Remove | — | — |
| Not Transcriptome-Associated | GH_TBb001A23r_531 | Bad | No | Yes | — | Retain | — | Bad |
| Not Transcriptome-Associated | GH_TBb001A23r_614 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Not Transcriptome-Associated | GH_TBb001D22r_204 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Not Transcriptome-Associated | GH_TBb001D22r_445 | Bad | Yes | No | Retain | — | Bad | — |
| Not Transcriptome-Associated | GH_TBb001D22r_511 | Bad | Yes | Yes | Retain | Retain | Bad | Bad |
| Not Transcriptome-Associated | GH_TBb001B05f_180 | Good | Yes | Yes | Remove | Remove | — | — |
| Not Transcriptome-Associated | GH_TBb001B05f_564 | Bad | Yes | Yes | Remove | Remove | — | — |
| Not Transcriptome-Associated | GH_TBb001C03f_116 | Good | No | Yes | — | Retain | — | Good |
| Not Transcriptome-Associated | GH_TBb001C03f_401 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Not Transcriptome-Associated | GH_TBb001F01r_117 | Bad | Yes | Yes | Remove | Remove | — | — |
| Not Transcriptome-Associated | GH_TBb001F01r_310 | Bad | Yes | No | Remove | — | — | — |
| Not Transcriptome-Associated | GH_TBb001A17r_218 | Bad | Yes | Yes | Remove | Remove | — | — |
| Not Transcriptome-Associated | GH_TBb001F06f_303 | Bad | No | Yes | — | Remove | — | — |
| Transcriptome-Associated | GH_TBb004J20r_76 | Good | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb004J20r_348 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Transcriptome-Associated | GH_TBb053N14f_270 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Transcriptome-Associated | Gh_TBh036B20r_583 | Good | No | Yes | — | Retain | — | Good |
| Transcriptome-Associated | GH_TBr162H20f_547 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb046O02r_64 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb046O02r_138 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb046O02r_418 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb069A06f_248 | Bad | Yes | No | Retain | — | Bad | — |
| Transcriptome-Associated | GH_TBb069A06f_304 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBh034D07r_276 | Good | Yes | Yes | Retain | Retain | Good | Good |
| Transcriptome-Associated | GH_TBh030P12r_332 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBb119O19f_465 | Maybe (DOM) | Yes | Yes | Retain | Retain | N/A | N/A |
| Transcriptome-Associated | GH_TBh055K17f_57 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBh055K17f_506 | Bad | Yes | Yes | Remove | Remove | — | — |
| Transcriptome-Associated | GH_TBh023O21f_162 | Good | Yes | No | Remove | Retain | — | Good |
| Transcriptome-Associated | GH_TBh023O21f_537 | Good | Yes | Yes | Retain | Retain | Good | Good |
KASP assay screening results for G. barbadense–derived markers, used to optimize mapping parameters in CLC Genomics Workbench. Iteration 1 used a length fraction of 0.70 and a similarity fraction of 0.99; Iteration 2 used a length fraction of 0.99 and a similarity fraction of 0.98.
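The two iterations differ only in CLC's read-mapping thresholds. As we read CLC's description of these parameters, the length fraction is the minimum share of a read that must align to the reference, and the similarity fraction is the minimum identity required within that aligned part. The sketch below encodes that reading for a single, already-aligned read; the `Alignment` record and `passes_filter` helper are hypothetical, and CLC's own handling of gaps and unaligned read ends may differ.

```python
from dataclasses import dataclass

# Hypothetical record for one read already aligned to a BAC-end sequence.
@dataclass
class Alignment:
    read_length: int     # total read length (bp)
    aligned_length: int  # read bases covered by the alignment (bp)
    matches: int         # identical bases within the aligned part

def passes_filter(aln: Alignment, length_fraction: float,
                  similarity_fraction: float) -> bool:
    """Keep a read only if enough of it aligns, at high enough identity.

    This is an assumed reading of CLC's parameters, not CLC's implementation.
    """
    if aln.read_length <= 0 or aln.aligned_length <= 0:
        return False
    long_enough = aln.aligned_length >= length_fraction * aln.read_length
    similar_enough = aln.matches >= similarity_fraction * aln.aligned_length
    return long_enough and similar_enough

# (length fraction, similarity fraction) from the table note
ITERATION_1 = (0.70, 0.99)
ITERATION_2 = (0.99, 0.98)

partial = Alignment(read_length=100, aligned_length=80, matches=80)  # 80 bp aligned, no mismatches
full = Alignment(read_length=100, aligned_length=100, matches=98)    # fully aligned, 2 mismatches

for name, params in [("Iteration 1", ITERATION_1), ("Iteration 2", ITERATION_2)]:
    print(name, passes_filter(partial, *params), passes_filter(full, *params))
```

In this toy case, the Iteration 2 settings reject the partially aligned read but accept a fully aligned read with two mismatches, which the Iteration 1 settings reject.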
Figure 1. Distance between polymorphisms developed in (A) G. hirsutum, (B) G. barbadense, and (C) G. longicalyx relative to the G. hirsutum-derived BAC-end sequences.
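Figure 1 summarizes the spacing of SNPs along the BAC-end reference. Assuming each SNP is keyed by the BAC-end sequence it falls on and its position on that sequence, the distances can be computed per sequence as sketched below. Only SNPs on the same BAC end are compared, since separate BAC ends are not contiguous. The identifiers in the example (`BES_1`, `BES_2`) are hypothetical.

```python
from collections import defaultdict

def inter_snp_distances(snps):
    """Distances (bp) between consecutive SNPs on the same BAC-end sequence.

    snps: iterable of (bes_id, position) pairs.
    """
    by_bes = defaultdict(list)
    for bes_id, position in snps:
        by_bes[bes_id].append(position)
    distances = []
    for positions in by_bes.values():
        positions.sort()
        # pair each SNP with the next one downstream on the same sequence
        distances.extend(b - a for a, b in zip(positions, positions[1:]))
    return distances

# Hypothetical SNP coordinates on two BAC ends
snps = [("BES_1", 486), ("BES_1", 381), ("BES_2", 204), ("BES_2", 511), ("BES_2", 445)]
print(inter_snp_distances(snps))
```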
Figure 2. Overlap of SNPs identified using G. hirsutum, G. barbadense, and G. longicalyx samples.
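Overlap counts like those in Figure 2 amount to assigning each SNP to the combination of species in which it was found. The sketch below assumes SNPs are matched by BAC-end sequence and position alone, ignoring which alternate allele was observed; that may not be how the authors defined a shared SNP. The keys in the example are hypothetical.

```python
def venn_counts(snp_sets):
    """Count SNPs in each region of an overlap (Venn) diagram.

    snp_sets: dict mapping a species name to a set of SNP keys,
    assumed here to be (bes_id, position) tuples.
    Returns a dict mapping a tuple of species names to a SNP count.
    """
    counts = {}
    for snp in set().union(*snp_sets.values()):
        # the region a SNP belongs to is the exact set of species containing it
        members = tuple(name for name, snps in snp_sets.items() if snp in snps)
        counts[members] = counts.get(members, 0) + 1
    return counts

# Hypothetical SNP keys
snp_sets = {
    "G. hirsutum": {("BES_1", 10), ("BES_1", 20)},
    "G. barbadense": {("BES_1", 20), ("BES_2", 5)},
    "G. longicalyx": {("BES_1", 20), ("BES_2", 5), ("BES_3", 7)},
}
for members, n in sorted(venn_counts(snp_sets).items()):
    print(" & ".join(members), n)
```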
Distribution of SNP types identified in G. hirsutum, G. barbadense, and G. longicalyx
| SNP type | G. hirsutum (No.) | G. hirsutum (%) | G. barbadense (No.) | G. barbadense (%) | G. longicalyx (No.) | G. longicalyx (%) | Overall (No.) | Overall (%) |
|---|---|---|---|---|---|---|---|---|
| Total | 132,262 | 100.0% | 187,355 | 100.0% | 450,577 | 100.0% | 770,194 | 100.0% |
| | 11,146 | 8.4% | 16,558 | 8.8% | 42,802 | 9.5% | 70,506 | 9.2% |
| | 45,444 | 34.4% | 64,420 | 34.4% | 139,410 | 30.9% | 249,274 | 32.4% |
| | 11,992 | 9.1% | 16,021 | 8.6% | 60,070 | 13.3% | 88,083 | 11.4% |
| | 6862 | 5.2% | 9188 | 4.9% | 25,420 | 5.6% | 41,470 | 5.4% |
| | 45,408 | 34.3% | 64,574 | 34.5% | 139,620 | 31.0% | 249,602 | 32.4% |
| | 11,410 | 8.6% | 16,594 | 8.9% | 43,255 | 9.6% | 71,259 | 9.3% |
| Transitions | 90,852 | 68.7% | 128,994 | 68.9% | 279,030 | 61.9% | 498,876 | 64.8% |
| Transversions | 41,410 | 31.3% | 58,361 | 31.1% | 171,547 | 38.1% | 271,318 | 35.2% |
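In every column, the last two rows sum to the total, and each is the sum of a fixed subset of the six SNP-type rows: two rows for the first subtotal and four for the second. That split matches the two transition classes (A/G, C/T) and the four transversion classes (A/C, A/T, C/G, G/T). The sketch below classifies a biallelic SNP and computes the transition/transversion (Ts/Tv) ratio; applied to the G. hirsutum subtotals (90,852 and 41,410) it gives about 2.19. The function names are our own.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def snp_class(ref: str, alt: str) -> str:
    """Classify a biallelic SNP as a transition or a transversion."""
    alleles = {ref.upper(), alt.upper()}
    if len(alleles) != 2 or not alleles <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a biallelic nucleotide SNP: {ref}/{alt}")
    # transitions stay within purines (A<->G) or within pyrimidines (C<->T)
    if alleles <= PURINES or alleles <= PYRIMIDINES:
        return "transition"
    return "transversion"

def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    return transitions / transversions

# G. hirsutum subtotals from the table above
ratio = ts_tv_ratio(90_852, 41_410)
print(f"Ts/Tv = {ratio:.2f}")
```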
Description of missing data and heterozygous loci in the final VCF file for G. hirsutum, G. barbadense, and G. longicalyx samples
| Species | Sample | Missing (No.) | Missing (%) | Heterozygous (No.) | Heterozygous (%) |
|---|---|---|---|---|---|
| G. hirsutum | TM-1 | 2144 | 0.28% | 16,880 | 2.20% |
| G. hirsutum | Sealand 542 | 4180 | 0.54% | 22,460 | 2.93% |
| G. hirsutum | PD-1 | 4435 | 0.58% | 27,039 | 3.53% |
| G. hirsutum | Paymaster HS-26 | 4450 | 0.58% | 38,201 | 4.99% |
| G. hirsutum | M-240 RNR | 4146 | 0.54% | 30,450 | 3.97% |
| G. hirsutum | Fibermax 832 | 4994 | 0.65% | 31,862 | 4.16% |
| G. hirsutum | Coker 312 | 4604 | 0.60% | 21,831 | 2.85% |
| G. hirsutum | SureGrow 747 | 4187 | 0.54% | 21,589 | 2.82% |
| G. hirsutum | Stoneville 474 | 4058 | 0.53% | 21,554 | 2.81% |
| G. hirsutum | Tamcot Sphinx | 5934 | 0.77% | 22,342 | 2.92% |
| G. hirsutum | Acala Maxxa | 5159 | 0.67% | 20,207 | 2.64% |
| G. hirsutum | TX0231 | 12,057 | 1.57% | 24,974 | 3.29% |
| G. barbadense | 3-79 | 43,530 | 5.65% | 20,443 | 2.81% |
| G. longicalyx | F1-1 | 221,786 | 28.80% | 30,204 | 5.51% |
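The percentages in this table are reproduced if missing data is expressed over all 770,194 SNP positions (the Overall count in the SNP-type table) and heterozygous calls over the non-missing genotypes of each sample. Dividing heterozygous calls by all 770,194 positions instead gives 2.19% for TM-1 rather than the listed 2.20%. These denominators are inferred from the numbers, not stated in the source; the sketch below makes the inference explicit.

```python
TOTAL_SNPS = 770_194  # Overall SNP count from the SNP-type table

def missing_pct(missing: int, total: int = TOTAL_SNPS) -> float:
    """Missing genotypes as a percentage of all SNP positions."""
    return 100 * missing / total

def het_pct(heterozygous: int, missing: int, total: int = TOTAL_SNPS) -> float:
    """Heterozygous calls as a percentage of non-missing (called) genotypes."""
    called = total - missing
    return 100 * heterozygous / called

rows = [("TM-1", 2144, 16_880), ("3-79", 43_530, 20_443), ("F1-1", 221_786, 30_204)]
for sample, missing, het in rows:
    print(f"{sample}: missing {missing_pct(missing):.2f}%, "
          f"heterozygous {het_pct(het, missing):.2f}%")
```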
Pairwise counts of differing homozygous genotypes between the 14 resequenced samples, computed with the BCFtools gtcheck command
| Sample | TM-1 | Sealand 542 | PD-1 | Paymaster HS-26 | M-240 RNR | Fibermax 832 | Coker 312 | SureGrow 747 | Stoneville 474 | Tamcot Sphinx | Acala Maxxa | TX0231 | 3-79 | F1-1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TM-1 | — | 15,755 | 20,141 | 16,177 | 17,983 | 18,405 | 19,418 | 18,837 | 16,103 | 29,052 | 25,710 | 88,174 | 232,669 | 444,358 |
| Sealand 542 | 15,755 | — | 11,965 | 9062 | 10,216 | 11,711 | 12,283 | 13,938 | 11,275 | 21,305 | 17,917 | 94,229 | 231,726 | 442,608 |
| PD-1 | 20,141 | 11,965 | — | 9113 | 11,672 | 13,319 | 13,163 | 14,435 | 13,187 | 20,015 | 18,488 | 92,151 | 229,962 | 442,294 |
| Paymaster HS-26 | 16,177 | 9062 | 9113 | — | 6937 | 5562 | 7974 | 9948 | 8825 | 13,096 | 8821 | 85,791 | 224,848 | 440,842 |
| M-240 RNR | 17,983 | 10,216 | 11,672 | 6937 | — | 8865 | 13,599 | 12,779 | 10,929 | 17,726 | 16,283 | 89,815 | 228,074 | 442,033 |
| Fibermax 832 | 18,405 | 11,711 | 13,319 | 5562 | 8865 | — | 13,462 | 13,507 | 11,668 | 15,290 | 14,310 | 89,910 | 226,029 | 441,551 |
| Coker 312 | 19,418 | 12,283 | 13,163 | 7974 | 13,599 | 13,462 | — | 12,834 | 11,891 | 21,663 | 18,365 | 94,397 | 231,818 | 442,483 |
| SureGrow 747 | 18,837 | 13,938 | 14,435 | 9948 | 12,779 | 13,507 | 12,834 | — | 7046 | 21,919 | 17,913 | 94,752 | 232,196 | 443,104 |
| Stoneville 474 | 16,103 | 11,275 | 13,187 | 8825 | 10,929 | 11,668 | 11,891 | 7046 | — | 22,235 | 17,739 | 94,936 | 232,169 | 442,949 |
| Tamcot Sphinx | 29,052 | 21,305 | 20,015 | 13,096 | 17,726 | 15,290 | 21,663 | 21,919 | 22,235 | — | 20,078 | 98,814 | 228,004 | 442,364 |
| Acala Maxxa | 25,710 | 17,917 | 18,488 | 8821 | 16,283 | 14,310 | 18,365 | 17,913 | 17,739 | 20,078 | — | 94,015 | 231,328 | 445,417 |
| TX0231 | 88,174 | 94,229 | 92,151 | 85,791 | 89,815 | 89,910 | 94,397 | 94,752 | 94,936 | 98,814 | 94,015 | — | 230,410 | 442,605 |
| 3-79 | 232,669 | 231,726 | 229,962 | 224,848 | 228,074 | 226,029 | 231,818 | 232,196 | 232,169 | 228,004 | 231,328 | 230,410 | — | 429,140 |
| F1-1 | 444,358 | 442,608 | 442,294 | 440,842 | 442,033 | 441,551 | 442,483 | 443,104 | 442,949 | 442,364 | 445,417 | 442,605 | 429,140 | — |
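The matrix reports, for each pair of samples, the number of SNPs at which both samples are homozygous but for different alleles. The plain-Python sketch below computes that quantity from alternate-allele dosages; it illustrates the count and is not BCFtools' implementation, whose options and output format may differ between BCFtools versions. Sample names and genotypes are hypothetical.

```python
from itertools import combinations

def hom_discordance(g1, g2):
    """Count SNPs where both samples are homozygous but for different alleles.

    Genotypes are alternate-allele dosages: 0 (hom ref), 1 (het), 2 (hom alt);
    None marks a missing call. Heterozygous and missing calls are skipped.
    """
    return sum(1 for a, b in zip(g1, g2)
               if a in (0, 2) and b in (0, 2) and a != b)

def pairwise_matrix(genotypes):
    """Symmetric dict-of-dicts of discordance counts; the diagonal is left out."""
    matrix = {name: {} for name in genotypes}
    for x, y in combinations(genotypes, 2):
        matrix[x][y] = matrix[y][x] = hom_discordance(genotypes[x], genotypes[y])
    return matrix

# Hypothetical dosages for three samples at five SNPs
genotypes = {
    "S1": [0, 0, 2, 1, None],
    "S2": [0, 2, 2, 2, 0],
    "S3": [2, 2, 0, 1, 2],
}
m = pairwise_matrix(genotypes)
print(m["S1"]["S2"], m["S1"]["S3"], m["S2"]["S3"])
```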
Annotation of BAC-derived SNPs
| Species | Genic (No.) | Genic (%) | Nongenic (No.) | Nongenic (%) | Overall BES-derived SNPs (No.) |
|---|---|---|---|---|---|
| G. hirsutum | 10,329 | 7.81% | 121,933 | 92.19% | 132,262 |
| G. barbadense | 13,274 | 7.08% | 174,081 | 92.92% | 187,355 |
| G. longicalyx | 97,366 | 21.61% | 353,211 | 78.39% | 450,577 |
| Total | 120,969 | 15.71% | 649,225 | 84.29% | 770,194 |
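This section does not describe how SNPs were classified as genic or nongenic. One straightforward approach, assuming gene coordinates are available on each BAC-end sequence, is an interval-overlap test: merge overlapping gene intervals, then binary-search each SNP position. The helpers and identifiers below are hypothetical.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals [start, end]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def is_genic(position, merged):
    """True if position falls inside one of the merged, sorted intervals."""
    # index of the last interval whose start is <= position
    i = bisect_right(merged, [position, float("inf")]) - 1
    return i >= 0 and merged[i][0] <= position <= merged[i][1]

def annotate(snps, genes):
    """Return (genic, nongenic) counts.

    snps: iterable of (bes_id, position); genes: dict bes_id -> [(start, end), ...].
    """
    merged = {bes: merge_intervals(iv) for bes, iv in genes.items()}
    genic = nongenic = 0
    for bes, pos in snps:
        if is_genic(pos, merged.get(bes, [])):
            genic += 1
        else:
            nongenic += 1
    return genic, nongenic

# Hypothetical gene models and SNPs
genes = {"BES_1": [(100, 200), (150, 300)], "BES_2": [(10, 20)]}
snps = [("BES_1", 99), ("BES_1", 100), ("BES_1", 250),
        ("BES_1", 301), ("BES_2", 15), ("BES_3", 5)]
print(annotate(snps, genes))
```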
Figure 3. Linkage groups constructed in JoinMap from BAC-end–derived SNPs genotyped in 118 interspecific F2 individuals (G. barbadense line 3-79 × G. hirsutum line TM-1).
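Map distances in cM, such as the 236.2 cM spanned by the two linkage groups, are derived from estimated recombination fractions through a mapping function. The source does not state which mapping function was selected in JoinMap, which offers both Haldane's and Kosambi's; the sketch below shows both formulas side by side. Estimating the recombination fraction itself from F2 genotype data is a separate step and is not shown.

```python
import math

def kosambi_cm(r: float) -> float:
    """Map distance in cM from recombination fraction r (Kosambi function)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25 * math.log((1 + 2 * r) / (1 - 2 * r))

def haldane_cm(r: float) -> float:
    """Map distance in cM from recombination fraction r (Haldane function)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)

for r in (0.01, 0.1, 0.3):
    print(f"r = {r}: Kosambi {kosambi_cm(r):.1f} cM, Haldane {haldane_cm(r):.1f} cM")
```

For small r both functions give roughly 100·r cM; they diverge as r approaches 0.5 because Haldane assumes no crossover interference while Kosambi allows for some.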
Figure 4. Dendrogram from hierarchical clustering, performed with the SNPRelate package in R, of BAC-end sequence-associated SNPs for 12 G. hirsutum samples (TM-1, Sealand 542, PD-1, Paymaster HS-26, M-240 RNR, Fibermax 832, Coker 312, SureGrow 747, Stoneville 474, Tamcot Sphinx, Acala Maxxa, TX0231), G. barbadense (3-79), and G. longicalyx.
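SNPRelate provides functions such as `snpgdsIBS` and `snpgdsHCluster` for clustering samples on identity-by-state (IBS) distance, but which distance, linkage, and settings the authors used is not stated here. The pure-Python sketch below illustrates the general technique under that assumption: an IBS distance computed from genotype dosages, followed by average-linkage (UPGMA) agglomerative clustering. Sample names and genotypes are hypothetical.

```python
from itertools import combinations

def ibs_distance(g1, g2):
    """1 minus mean identity-by-state over SNPs called in both samples.

    Genotypes are alternate-allele dosages (0, 1, 2); None marks missing.
    At one SNP the samples share 2 - |a - b| of their 2 alleles.
    """
    shared = [2 - abs(a - b) for a, b in zip(g1, g2)
              if a is not None and b is not None]
    if not shared:
        raise ValueError("no SNPs called in both samples")
    return 1 - sum(shared) / (2 * len(shared))

def average_linkage(genotypes):
    """Agglomerative clustering (UPGMA); returns the tree as nested tuples."""
    names = list(genotypes)
    d = {}
    for a, b in combinations(names, 2):
        d[a, b] = d[b, a] = ibs_distance(genotypes[a], genotypes[b])
    clusters = [((n,), n) for n in names]  # (member names, subtree)
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][0], clusters[j][0]
            # average linkage: mean distance over all cross-cluster sample pairs
            dist = sum(d[a, b] for a in mi for b in mj) / (len(mi) * len(mj))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        (mi, ti), (mj, tj) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((mi + mj, (ti, tj)))
    return clusters[0][1]

# Hypothetical dosages for four samples at four SNPs
genotypes = {"A": [0, 0, 2, 2], "B": [0, 0, 2, 1], "C": [2, 2, 0, 0], "D": [2, 2, 0, 1]}
print(average_linkage(genotypes))
```

Ties between equally close pairs are broken by the order in which samples are listed, so a real analysis with many near-identical lines may want an explicit tie-breaking rule.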