Christopher A Saski, Brian E Scheffler, Amanda M Hulse-Kemp, Bo Liu, Qingxin Song, Atsumi Ando, David M Stelly, Jodi A Scheffler, Jane Grimwood, Don C Jones, Daniel G Peterson, Jeremy Schmutz, Z Jeffery Chen.
Abstract
Like many agricultural crops, cultivated cotton is an allotetraploid with a large genome (~2.5 gigabase pairs). The two subgenomes, A and D, are highly similar but unequally sized and repeat-rich, which poses significant challenges for accurate genome reconstruction using standard approaches. Here we report the development of BAC libraries, subgenome-specific physical maps, and a new-generation sequencing approach that will lead to a reference-grade genome assembly for Upland cotton. Three BAC libraries were constructed, fingerprinted, and integrated with BAC-end sequences (BES) to produce a de novo whole-genome physical map. The BAC map was partitioned by subgenome through alignment to the diploid progenitor D-genome reference sequence with densely spaced BES anchor points and computational filtering. The physical maps were validated with FISH and genetic mapping of SNP markers derived from BES. Two pairs of homeologous chromosomes, A11/D11 and A12/D12, were used to assess multiplex sequencing approaches for completeness and scalability. The results represent the first subgenome-anchored physical maps of Upland cotton and a new-generation approach to whole-genome sequencing, which will lead to a reference-grade assembly of allopolyploid cotton and serve as a general strategy for sequencing other polyploid species.
Year: 2017 PMID: 29127298 PMCID: PMC5681701 DOI: 10.1038/s41598-017-14885-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
De novo physical map assembly statistics.
| | De novo | A-subgenome | D-subgenome |
|---|---|---|---|
| Total fingerprints (validated) | 92,391 | 58,485 | 33,906 |
| BACs in contigs | 82,816 | 46,014 | 33,022 |
| Average no. BACs per contig | 10 | 9 | 17 |
| No. contigs | 7,906 | 5,298 | 1,998 |
| No. contigs (anchored) | | 2,370 | 1,000 |
| No. contigs (unplaced) | | 2,928 | 998 |
| No. Singletons | 9,575 | 12,471 | 884 |
| Total contig length (Gbp) | 3.1 | 1.9 | 1.1 |
| Longest contig (Mbp) | 3.8 | 3.8 | 4.3 |
| Shortest contig (kb) | 45 | 46 | 54 |
| Mean contig length (kb) | 396 | 369 | 558 |
| N50 contig (kb) | 308 | 287 | 419 |
| Minimal tile path (no. clones) | | 12,389 | 8,609 |
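The N50 row in the table above follows the usual definition: the length of the contig at which the cumulative sum of contig lengths, sorted from longest to shortest, first reaches half of the total. A minimal Python sketch of that definition is given here. The function name `n50` and the example lengths are hypothetical and are not taken from the physical map data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths in kb, for illustration only.
example_lengths_kb = [800, 600, 400, 300, 200, 100]
print(n50(example_lengths_kb))  # prints 600
```

In this toy example the total is 2,400 kb, so the halfway point of 1,200 kb is crossed by the second-longest contig.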
G. hirsutum TM-1 PI pseudomolecule lengths.
| Pseudomolecule | Length (bp) |
|---|---|
| PI A1 | 90,616,590 |
| PI A2 | 60,403,230 |
| PI A3 | 86,809,905 |
| PI A4 | 77,945,040 |
| PI A5 | 48,044,880 |
| PI A6 | 85,263,480 |
| PI A7 | 72,898,790 |
| PI A8 | 96,529,545 |
| PI A9 | 62,615,205 |
| PI A10 | 81,184,050 |
| PI A11 | 79,980,840 |
| PI A12 | 76,012,335 |
| PI A13 | 81,823,500 |
| PI D1 | 73,209,284 |
| PI D2 | 64,675,878 |
| PI D3 | 46,884,792 |
| PI D4 | 44,842,490 |
| PI D5 | 71,793,353 |
| PI D6 | 63,336,944 |
| PI D7 | 53,430,680 |
| PI D8 | 62,651,828 |
| PI D9 | 44,173,004 |
| PI D10 | 62,620,504 |
| PI D11 | 63,570,550 |
| PI D12 | 56,052,445 |
| PI D13 | 60,522,067 |
Figure 1. (A) Alignment of the G. hirsutum A-subgenome physical map pseudomolecules (PI) to the Gossypium raimondii JGI reference genome assembly (Gr). Two known translocations (A2/A3 and A4/A5) in the cotton tetraploid A-subgenome are highlighted by the outer grey links on the ideograms. (B) Alignment of the G. hirsutum D-subgenome physical map pseudomolecules (PI) to the Gossypium raimondii JGI reference genome assembly (Gr). Colored ribbons connect contiguous blocks of at least 6 BAC-end sequences with at least 95% sequence identity between the BAC-map contigs and the reference genome assembly.
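The ribbon criterion in the Figure 1 caption (blocks of at least 6 BAC-end sequences with at least 95% identity) can be read as a run filter over ordered BES hits. The Python sketch below shows one such reading, not the authors' pipeline. It assumes that "contiguous" means consecutive BES along a pseudomolecule that hit the same reference chromosome. The function name `syntenic_blocks`, the tuple layout, and the labels `Gr07` and `Gr08` are all hypothetical.

```python
MIN_IDENTITY = 95.0  # percent identity threshold, from the caption
MIN_BLOCK = 6        # minimum number of BES per block, from the caption


def syntenic_blocks(hits, min_identity=MIN_IDENTITY, min_block=MIN_BLOCK):
    """Group ordered BES hits into blocks that pass both thresholds.

    hits: list of (bes_order, ref_chrom, identity) tuples, sorted by
    bes_order along one pseudomolecule (assumed input layout).
    Returns a list of (ref_chrom, [bes_order, ...]) blocks.
    """
    blocks = []
    run_chrom, run = None, []
    for order, chrom, identity in hits:
        if identity >= min_identity and chrom == run_chrom:
            run.append(order)          # extend the current run
            continue
        if len(run) >= min_block:      # close the previous run if long enough
            blocks.append((run_chrom, run))
        if identity >= min_identity:
            run_chrom, run = chrom, [order]   # start a new run
        else:
            run_chrom, run = None, []         # low-identity hit breaks the run
    if len(run) >= min_block:
        blocks.append((run_chrom, run))
    return blocks


# Hypothetical hits: 7 good hits to Gr07, one poor hit, then 3 good hits to Gr08.
hits = [(i, "Gr07", 99.0) for i in range(7)]
hits += [(7, "Gr07", 90.0)]
hits += [(i, "Gr08", 98.0) for i in range(8, 11)]
print(syntenic_blocks(hits))  # prints [('Gr07', [0, 1, 2, 3, 4, 5, 6])]
```

Only the first run survives: the low-identity hit ends it, and the run of 3 hits to `Gr08` falls below the 6-BES minimum.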
Figure 2. (A) Alignment of the G. hirsutum inbred TM-1 A-subgenome physical map pseudomolecules (PI) to the International Cotton Sequencing Consortium draft sequence (NI). (B) Alignment of the G. hirsutum D-subgenome PI pseudomolecules to the corresponding NI draft assemblies. (C) Alignment of the G. hirsutum A-subgenome physical map pseudomolecules (PI) to the Institute of Cotton Research (BI) draft sequence. (D) Alignment of the G. hirsutum D-subgenome physical map pseudomolecules (PI) to the corresponding BI draft sequences. Colored ribbons connect BAC physical map contigs to the respective reference genome assemblies with alignment criteria of 98% identity and a cluster of at least 6 contiguous BAC-end sequences.
Figure 3. Relative positions of BACs from chromosomes 12 (A12) and 26 (D12) in cytological versus linkage maps. Cytological positions were determined by concomitant FISH of multiple BACs in chromosome-specific multi-BAC probe cocktails to spreads of meiotic pachytene bivalents from Gossypium hirsutum var. TM-1. Images of A12 and D12 bivalents are from two cells. White bars represent 10 µm. Linkage map positions for SNPs associated with end sequences of BACs in the respective BAC contigs are demarcated by vertical colored bars. (A) G. hirsutum bivalent and linkage group for chromosome A12. (B) G. hirsutum bivalent and linkage group for chromosome D12.
Large and small clone pool sequencing of homeologous chromosome pairs A11/D11 and A12/D12.
| Phys. map chromosome | No. MTP BACs | No. scaffolds | Estimated length (FPC, bp) | Actual length (bp) | Gr. scaffold | Gr. length (Mbp) |
|---|---|---|---|---|---|---|
| PI A11 | 441 | 209 | 63,681,390 | 51,733,654 | 7 | 63.6 |
| PI D11 | 407 | 116 | 63,641,115 | 42,228,869 | 7 | 63.6 |
| PI A12 | 465 | 3,244 | 76,012,335 | 40,542,802 | 8 | 57.1 |
| PI D12 | 347 | 3,223 | 56,052,360 | 36,445,133 | 8 | 57.1 |
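One way to read the table above is as the fraction of each FPC-estimated pseudomolecule length that the pooled BAC sequencing actually recovered. The short Python sketch below computes that ratio from the two length columns. The helper name `fraction_sequenced` is hypothetical, and treating this ratio as a completeness measure is an assumption, since FPC estimates can themselves over- or under-state the true length.

```python
# (estimated FPC length in bp, actual sequenced length in bp), from the table above.
POOL_LENGTHS = {
    "PI A11": (63_681_390, 51_733_654),
    "PI D11": (63_641_115, 42_228_869),
    "PI A12": (76_012_335, 40_542_802),
    "PI D12": (56_052_360, 36_445_133),
}


def fraction_sequenced(estimated_bp, actual_bp):
    """Return actual sequenced length as a fraction of the FPC estimate."""
    if estimated_bp <= 0:
        raise ValueError("estimated length must be positive")
    return actual_bp / estimated_bp


for name, (estimated, actual) in POOL_LENGTHS.items():
    print(f"{name}: {fraction_sequenced(estimated, actual):.1%} of FPC estimate")
```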
Figure 4. (A) Dot plot of the pilot BAC pseudochromosome (PI A11) as aligned to the corresponding NI A11 draft pseudomolecule. (B) Dot plot of the pilot BAC pseudochromosome (PI A11) as aligned to the corresponding BI A01 draft pseudomolecule. (C) Dot plot of the pilot BAC pseudochromosome (PI D11) as aligned to the corresponding NI D11 draft pseudomolecule. (D) Dot plot of the pilot BAC pseudochromosome (PI D11) as aligned to the corresponding BI D07 draft pseudomolecule.
QTL markers aligned to BAC-based assemblies of A12/D12 and A11/D11.
PI A12 (chr12) and PI D12 (chr26):
| Trait | Mapping chr | Flanking marker | Type | chr12 start | chr12 stop | chr12 interval size | chr12 no. genes | chr26 start | chr26 stop | chr26 interval size | chr26 no. genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | A12 | TMB2789 | SSR | 8,393,934 | 8,393,919 | 15,458,589 | 594 | 1,808,409 | 1,808,393 | 1,668,601 | 178 |
| | | BNL2894 | SSR | 23,852,387 | 23,852,523 | | | 139,792 | 139,776 | | |
| | D12 | NAU2170 | SSR | 24,690,757 | 24,690,614 | 347,984 | 34 | 13,319,068 | 13,318,937 | 9,185,618 | 244 |
| | | NAU1231 | SSR | 25,038,865 | 25,038,741 | | | 22,504,660 | 22,504,686 | | |
| | A12 | NAU3862 | SSR | 1,075,530 | 1,075,495 | 8,781,226 | 437 | 23,479,480 | 23,479,499 | 10,419,078 | 302 |
| | | MUCS0303 | SSR | 9,856,984 | 9,856,756 | | | 13,060,174 | 13,060,402 | | |
| | A12 | NAU1278 | SSR | 18,319,569 | 18,319,490 | 298,370 | 12 | 25,986,076 | 25,986,060 | 2,039,828 | 133 |
| | | NAU2096 | SSR | 18,618,136 | 18,617,939 | | | 23,946,057 | 23,946,248 | | |
| | D12 | NAU3163 | SSR | 3,913,858 | 3,913,690 | 13,928,054 | 468 | N/A | N/A | N/A | N/A |
| | | DPL0838 | SSR | N/A | N/A | | | 1,068,678 | 1,068,662 | 2,754,066 | 261 |
| | | BNL1227 | SSR | 17,841,990 | 17,841,912 | | | 3,822,762 | 3,822,744 | | |

PI A11 and PI D11:
| Trait | Mapping chr | Flanking marker | Type | PI A11 start | PI A11 stop | PI A11 interval size | PI A11 no. genes | PI D11 start | PI D11 stop | PI D11 interval size | PI D11 no. genes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | A11 | MUCS0399 | SSR | 43,018,392 | 43,018,546 | 607,856 | 11 | 24,650,439 | 24,650,455 | 1,088,232 | 21 |
| | | NAU3703 | SSR | 43,626,232 | 43,626,248 | | | 25,738,686 | 25,738,671 | | |
| | A11 | CIR0316 | SSR | 768,585 | 768,601 | 9,430,332 | 669 | 26,500,308 | 26,500,290 | 7,320,774 | 155 |
| | | pGH648 | SSR | 10,198,935 | 10,198,917 | | | 33,821,100 | 33,821,082 | | |
| | A11 | MUSB0827 | SSR | 26,879,919 | 26,879,751 | 23,161,628 | 335 | 7,511,327 | 7,511,497 | 8,701,630 | 528 |
| | | BNL3592 | SSR | 50,041,685 | 50,041,547 | | | 16,212,976 | 16,212,957 | | |
| | D11 | par0535 | SSR | 2,127,872 | 2,127,739 | 25,098,129 | 1,107 | 23,511,066 | 23,511,310 | 1,486,163 | 48 |
| | | BNL2805 | SSR | 27,225,858 | 27,226,001 | | | 24,997,986 | 24,997,229 | | |
| | D11 | TMB1871 | SSR | 29,682,832 | 29,682,853 | 10,467,767 | 219 | 4,966,397 | 4,966,377 | 3,350,416 | 290 |
| | | STV0067 | SSR | 19,215,050 | 19,215,065 | | | 8,316,798 | 8,316,813 | | |