Justin T Page, Mark D Huynh, Zach S Liechty, Kara Grupp, David Stelly, Amanda M Hulse, Hamid Ashrafi, Allen Van Deynze, Jonathan F Wendel, Joshua A Udall.
Abstract
Understanding the composition, evolution, and function of the Gossypium hirsutum (cotton) genome is complicated by the joint presence of two genomes in its nucleus (the AT and DT genomes). These two genomes were derived from progenitor A-genome and D-genome diploids involved in an ancestral allopolyploidization. To better understand the allopolyploid genome, we re-sequenced the genomes of extant diploid relatives that contain the A1 (Gossypium herbaceum), A2 (Gossypium arboreum), or D5 (Gossypium raimondii) genomes. We conducted a comparative analysis using deep re-sequencing of multiple accessions of each diploid species and identified 24 million SNPs between the A-diploid and D-diploid genomes. These analyses facilitated the construction of a robust index of conserved SNPs between the A-genomes and D-genomes at all detected polymorphic loci. This index is widely applicable to read mapping of other diploid and allopolyploid Gossypium accessions. Further analysis also revealed the locations of putative duplications and deletions in the A-genome relative to the D-genome reference sequence. The approximately 25,400 deleted regions removed more than 50% of each of 978 genes, including many involved in starch synthesis. In the polyploid genome, we also detected 1,472 conversion events between homoeologous chromosomes, including events that overlapped 113 genes. Continued characterization of the Gossypium genomes will further enhance our ability to manipulate fiber and agronomic production of cotton.
Keywords: allopolyploid; comparative genomics; cotton fiber; molecular evolution
Year: 2013 PMID: 23979935 PMCID: PMC3789805 DOI: 10.1534/g3.113.007229
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
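The abstract rests on two linked ideas: an index of homoeo-SNPs (sites where the A and D genomes carry different, conserved nucleotides) and conversion events in the polyploid, where one homoeolog carries the other genome's nucleotide at such a site. The sketch below illustrates only that concept. It assumes the index is a mapping from a reference position to its diagnostic (A, D) allele pair and that single-base calls for one subgenome are already available; `HomoeoSNP`, `classify_site`, and `candidate_conversions` are hypothetical names, and this is not the authors' pipeline, which would also need read-level evidence and filtering.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HomoeoSNP:
    """Hypothetical index entry: the diagnostic A- and D-genome alleles at one site."""
    a_allele: str
    d_allele: str


def classify_site(snp: HomoeoSNP, observed: str) -> Optional[str]:
    """Return 'A' or 'D' if the observed base matches a diagnostic allele, else None."""
    if observed == snp.a_allele:
        return "A"
    if observed == snp.d_allele:
        return "D"
    return None  # a third allele is not informative about genome of origin


def candidate_conversions(index: Dict[int, HomoeoSNP],
                          calls: Dict[int, str],
                          subgenome: str) -> Dict[int, str]:
    """Flag sites where a subgenome ('A' for AT, 'D' for DT) carries the other genome's allele.

    Returns {position: 'to_A' | 'to_D'}; positions without a call are skipped.
    """
    flagged = {}
    for pos, snp in index.items():
        base = calls.get(pos)
        if base is None:
            continue
        origin = classify_site(snp, base)
        if origin is not None and origin != subgenome:
            flagged[pos] = "to_" + origin
    return flagged


# Hypothetical index and AT-genome calls, for illustration only.
index = {100: HomoeoSNP("A", "G"), 250: HomoeoSNP("C", "T"), 400: HomoeoSNP("G", "C")}
at_calls = {100: "A", 250: "T", 400: "G"}
print(candidate_conversions(index, at_calls, "A"))  # {250: 'to_D'}
```

The "to_A" and "to_D" labels mirror the two conversion tracks described for Figure 1.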
Transitions and transversions in the homoeo-SNP index
| A allele / D allele | A | G | C | T |
|---|---|---|---|---|
| A | — | 2,495,527 | 626,075 | 1,003,583 |
| G | 2,547,739 | — | 353,034 | 647,148 |
| C | 644,840 | 352,239 | — | 2,544,261 |
| T | 1,003,739 | 628,050 | 2,492,619 | — |
Rows = A allele; columns = D allele. The overall transition/transversion ratio was ∼1.92, and the GC fractions were 45.4% (A genome) and 45.1% (D genome).
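The ∼1.92 ratio can be recomputed from the table itself. Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes; every other off-diagonal cell is a transversion. A short sketch using only the counts above (the dictionary layout is an arbitrary choice):

```python
# Counts from the homoeo-SNP table: keys are (A allele, D allele).
COUNTS = {
    ("A", "G"): 2_495_527, ("A", "C"): 626_075, ("A", "T"): 1_003_583,
    ("G", "A"): 2_547_739, ("G", "C"): 353_034, ("G", "T"): 647_148,
    ("C", "A"): 644_840, ("C", "G"): 352_239, ("C", "T"): 2_544_261,
    ("T", "A"): 1_003_739, ("T", "G"): 628_050, ("T", "C"): 2_492_619,
}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, d: str) -> bool:
    """True when both alleles are purines or both are pyrimidines."""
    pair = {a, d}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv(counts):
    """Return (transitions, transversions, Ts/Tv ratio)."""
    ts = sum(n for (a, d), n in counts.items() if is_transition(a, d))
    tv = sum(n for (a, d), n in counts.items() if not is_transition(a, d))
    return ts, tv, ts / tv


ts, tv, ratio = ts_tv(COUNTS)
print(ts, tv, round(ratio, 2))  # 10080146 5258708 1.92
```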
Figure 1. Plot of genes, homoeo-SNPs, duplications, deletions, and conversion events in the A-genomes relative to the D5 reference sequence, produced by Circos (Krzywinski). Reading the concentric circles from the outside inward, the outermost (first) green circle indicates the locations of annotated genes. The next circle (red) is a histogram of the number of homoeo-SNPs in 1-Mbp windows across the genome. The next two circles, shaded from red (high frequency) to yellow (low frequency), are heat maps showing the locations of duplications in the A1 and A2 genomes relative to the D5 genome (A2 interior). The next two circles, shaded from blue (high frequency) to yellow (low frequency), are heat maps showing the locations of deletions in the A1 and A2 genomes relative to the D5 genome (A2 interior). The final two circles show conversion events in the tetraploid G. hirsutum cv. Maxxa: the first shows conversion of loci to the A nucleotide on a red-to-yellow scale, and the innermost (last) circle shows conversion of loci to the D nucleotide on a blue-to-yellow scale.
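The homoeo-SNP histogram track counts SNPs in fixed 1-Mbp windows. A minimal binning sketch follows, assuming 1-based SNP positions given as (chromosome, position) pairs; the chromosome names are hypothetical, and the four-column chrom/start/end/count rows are only in the style of a Circos data track, not the authors' exact input files.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW = 1_000_000  # 1-Mbp windows, as in the Figure 1 histogram track


def window_counts(snps: Iterable[Tuple[str, int]], window: int = WINDOW) -> Counter:
    """Count SNPs per (chromosome, window index); positions are 1-based."""
    counts: Counter = Counter()
    for chrom, pos in snps:
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


def to_rows(counts: Counter, window: int = WINDOW) -> List[Tuple[str, int, int, int]]:
    """Return (chrom, start, end, count) rows with 1-based, inclusive window bounds."""
    rows = []
    for (chrom, idx), n in sorted(counts.items()):
        start = idx * window + 1
        rows.append((chrom, start, start + window - 1, n))
    return rows


# Hypothetical homoeo-SNP positions on two chromosomes.
snps = [("Chr01", 1), ("Chr01", 999_999), ("Chr01", 1_000_000),
        ("Chr01", 1_000_001), ("Chr02", 5)]
for row in to_rows(window_counts(snps)):
    print(*row)
```

Windows with no SNPs are simply absent from the output; a real track would probably fill them with zeros using known chromosome lengths.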
Summary of the nine diploid WGS libraries re-sequenced in this study and of additional libraries (A1_97, F1_1, and Maxxa) obtained from the SRA
| Accession | PI | Raw Read Pairs | Trimmed Reads | Raw Mapping % | Mapped % Using SNP Index 2.0 |
|---|---|---|---|---|---|
| A1_155 | 630024 | 385,657,228 | 761,269,884 | 65.3 | 78.0 |
| A1_73 | 485587 | 202,723,343 | 238,035,929 | 53.1 | 85.6 |
| A1_97 | 529670 | 328,713,056 | 652,350,335 | 65.0 | 77.8 |
| A2_1011 | 629339 | 412,420,252 | 816,274,495 | 58.3 | 73.8 |
| A2_255 | 615756 | 300,406,057 | 595,289,591 | 61.1 | 75.5 |
| A2_34 | 183160 | 367,844,399 | 729,370,248 | 62.3 | 76.5 |
| A2_44 | 185788 | 78,180,657 | 153,728,823 | 63.3 | 76.9 |
| A2_4 | 529707 | 343,470,023 | 686,940,046 | 48.9 | 82.4 |
| D5_2 | 530899 | 152,913,856 | 304,706,886 | 95.6 | 95.3 |
| D5_31 | 530928 | 217,334,954 | 428,323,703 | 95.8 | 95.5 |
| D5_4 | 530901 | 310,387,080 | 616,432,521 | 95.1 | 94.8 |
| D5_53 | 530950 | 188,469,224 | 375,193,268 | 96.2 | 96.0 |
| F1_1 | 530986 | 534,258,839 | 1,055,751,863 | 71.1 | 79.1 |
| Maxxa Acala | 540885 | 463,761,132 | 919,898,042 | 72.5 | 79.8 |
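The last two columns can be compared directly. The sketch below uses only values from the table to compute the percentage-point change in mapping rate when reads are mapped with SNP Index 2.0. It shows the largest gains for the A-genome libraries A2_4 and A1_73, smaller gains for F1_1 and Maxxa, and a slight decrease (0.2 to 0.3 points) for the D5 libraries, which already matched the D5 reference. The names `MAPPING` and `mapping_gain` are hypothetical.

```python
# accession: (raw mapping %, mapping % using SNP Index 2.0), copied from the table above
MAPPING = {
    "A1_155": (65.3, 78.0), "A1_73": (53.1, 85.6), "A1_97": (65.0, 77.8),
    "A2_1011": (58.3, 73.8), "A2_255": (61.1, 75.5), "A2_34": (62.3, 76.5),
    "A2_44": (63.3, 76.9), "A2_4": (48.9, 82.4),
    "D5_2": (95.6, 95.3), "D5_31": (95.8, 95.5), "D5_4": (95.1, 94.8),
    "D5_53": (96.2, 96.0),
    "F1_1": (71.1, 79.1), "Maxxa Acala": (72.5, 79.8),
}


def mapping_gain(raw: float, indexed: float) -> float:
    """Percentage-point change, rounded to one decimal to match the table's precision."""
    return round(indexed - raw, 1)


gains = {acc: mapping_gain(raw, idx) for acc, (raw, idx) in MAPPING.items()}
for acc, g in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{acc:12s} {g:+.1f}")
```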
Amount of molecular evolution between the A and D genomes of cotton: nonsynonymous (dN) and synonymous (dS) substitution rates and their ratio (dN/dS)
| Genome | Statistic | dN | dS | dN/dS |
|---|---|---|---|---|
| A (n = 28,462) | Mean | 0.0094 | 0.0276 | 0.3726 |
| | Median | 0.0068 | 0.0256 | 0.2768 |
| | SD | 0.0106 | 0.0225 | 0.4236 |
| AT (n = 26,156) | Mean | 0.0092 | 0.0266 | 0.3772 |
| | Median | 0.0066 | 0.0237 | 0.2843 |
| | SD | 0.0104 | 0.0228 | 0.4156 |
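The mean dN/dS for the A genome (0.3726) is not the mean dN divided by the mean dS (0.0094 / 0.0276 ≈ 0.34). Given the median and SD rows, the dN/dS column presumably summarizes a per-gene ratio across genes, and an average of ratios generally differs from a ratio of averages. The sketch below illustrates the distinction with the table's A-genome means and three made-up genes; the per-gene values are hypothetical, and skipping genes with dS = 0 is an assumption, not necessarily how the authors handled them.

```python
from statistics import mean
from typing import Sequence


def ratio_of_means(dn: Sequence[float], ds: Sequence[float]) -> float:
    """Mean dN divided by mean dS."""
    return mean(dn) / mean(ds)


def mean_of_ratios(dn: Sequence[float], ds: Sequence[float]) -> float:
    """Average of per-gene dN/dS; genes with dS == 0 are skipped (an assumption)."""
    return mean(n / s for n, s in zip(dn, ds) if s > 0)


# Table means for the A genome: the ratio of means is about 0.34, not 0.3726.
print(round(0.0094 / 0.0276, 2))  # 0.34

# Hypothetical per-gene values (not study data) showing how the two summaries diverge.
dn = [0.002, 0.010, 0.010]
ds = [0.040, 0.020, 0.010]
print(round(ratio_of_means(dn, ds), 3))  # 0.314
print(round(mean_of_ratios(dn, ds), 3))  # 0.517
```

Genes with a small dS contribute large ratios, which pulls the mean of ratios above the ratio of means; this is also consistent with the mean dN/dS in the table sitting well above its median.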
Figure 2. Premature stop codons were found in each Gossypium genome. (A) Premature stop codons (relative to the annotations of the D-reference genome) were found in the A, AT, and DT genomes. (B) Common genes with premature stop codons within the first 90% of the gene.
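Panel B restricts attention to stop codons in the first 90% of a gene. One way to apply such a cutoff is sketched below, assuming each coding sequence is supplied in frame from its annotated start codon and using the standard genetic code; `premature_stop` is a hypothetical helper, not the authors' code. The annotated terminal codon is excluded so that the normal stop of a short gene is never reported as premature.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stop(cds: str, fraction: float = 0.9) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that starts
    within the first `fraction` of the CDS, or None if there is none.

    The final codon (the annotated stop) is never examined.
    """
    cds = cds.upper()
    cutoff = len(cds) * fraction
    for i in range(0, len(cds) - 3, 3):  # stops before the last codon
        if i >= cutoff:
            break
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


normal = "ATG" + "AAA" * 10 + "TAA"                  # stop only at the annotated end
mutant = "ATG" + "AAA" * 3 + "TGA" + "AAA" * 6 + "TAA"  # early TGA at codon 4
late = "ATG" + "AAA" * 19 + "TAG" + "TAA"            # extra stop beyond the 90% mark
print(premature_stop(normal), premature_stop(mutant), premature_stop(late))
```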
Number of heterozygous loci in each accession, along with the percentage of total observable loci that were heterozygous
| Accession | Whole genome (n) | Whole genome (%) | Genic loci only (n) | Genic loci only (%) | Nongenic loci only (n) | Nongenic loci only (%) |
|---|---|---|---|---|---|---|
| F1_1 | 9,968,998 | 17.2 | 332,247 | 6.1 | 9,636,751 | 18.4 |
| A1_73 | 2,963,374 | 7.1 | 126,260 | 2.6 | 2,837,114 | 7.7 |
| A1_97 | 6,504,768 | 12.4 | 265,607 | 5.0 | 6,239,161 | 13.2 |
| A1_155 | 7,549,531 | 13.9 | 322,095 | 6.0 | 7,227,436 | 14.8 |
| A2_4 | 7,061,224 | 13.2 | 283,151 | 5.3 | 6,778,073 | 14.1 |
| A2_34 | 6,826,660 | 12.9 | 270,384 | 5.1 | 6,556,276 | 13.7 |
| A2_255 | 5,898,387 | 11.6 | 236,113 | 4.5 | 5,662,274 | 12.4 |
| A2_1011 | 6,878,801 | 13.1 | 252,230 | 4.9 | 6,626,571 | 14.1 |
| D5_2 | 193,418 | 0.3 | 20,536 | 0.4 | 172,882 | 0.3 |
| D5_4 | 257,399 | 0.4 | 25,370 | 0.5 | 232,029 | 0.4 |
| D5_31 | 178,290 | 0.3 | 20,723 | 0.4 | 157,567 | 0.3 |
| D5_53 | 181,224 | 0.3 | 20,665 | 0.4 | 160,559 | 0.3 |
| Maxxa.A | 4,465,088 | 9.2 | 198,477 | 3.9 | 4,266,611 | 9.8 |
| Maxxa.D | 686,674 | 1.3 | 47,861 | 1.0 | 638,813 | 1.4 |
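The table is internally consistent in a way that is easy to check: for every accession, genic plus nongenic heterozygous loci equal the whole-genome count. The sketch below verifies that and, as an approximation, back-calculates the number of observable loci implied by n and the rounded percentage. The helper names are hypothetical, and the back-calculated totals are an inference from the table, not values reported by the authors.

```python
# accession: (whole-genome n, whole-genome %, genic n, nongenic n), from the table above
HET = {
    "F1_1": (9_968_998, 17.2, 332_247, 9_636_751),
    "A1_73": (2_963_374, 7.1, 126_260, 2_837_114),
    "A1_97": (6_504_768, 12.4, 265_607, 6_239_161),
    "A1_155": (7_549_531, 13.9, 322_095, 7_227_436),
    "A2_4": (7_061_224, 13.2, 283_151, 6_778_073),
    "A2_34": (6_826_660, 12.9, 270_384, 6_556_276),
    "A2_255": (5_898_387, 11.6, 236_113, 5_662_274),
    "A2_1011": (6_878_801, 13.1, 252_230, 6_626_571),
    "D5_2": (193_418, 0.3, 20_536, 172_882),
    "D5_4": (257_399, 0.4, 25_370, 232_029),
    "D5_31": (178_290, 0.3, 20_723, 157_567),
    "D5_53": (181_224, 0.3, 20_665, 160_559),
    "Maxxa.A": (4_465_088, 9.2, 198_477, 4_266_611),
    "Maxxa.D": (686_674, 1.3, 47_861, 638_813),
}


def partition_ok(row) -> bool:
    """Genic and nongenic heterozygous loci should sum to the whole-genome count."""
    whole, _, genic, nongenic = row
    return genic + nongenic == whole


def implied_observable_loci(row) -> float:
    """Approximate total observable loci as n / (percentage / 100).

    The percentage is rounded to 0.1, so this is rough, especially for
    accessions with very low percentages such as the D5 libraries (0.3%).
    """
    whole, pct, _, _ = row
    return whole / (pct / 100)


assert all(partition_ok(row) for row in HET.values())
print(f"F1_1: ~{implied_observable_loci(HET['F1_1']) / 1e6:.1f} million observable loci")
```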
Figure 3. Neighbor-joining tree built with PHYLIP from SNPs between genomes. Distances on the indicated scale are the percentage of represented polymorphic sites that differed between two individuals. Image rendered by Archaeopteryx (Han and Zmasek 2009).
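The tree's scale is a percentage of represented polymorphic sites that differ between two individuals. A minimal sketch of that pairwise distance follows, assuming each accession is a string of single-character calls over the same ordered set of polymorphic sites, with "N" marking a site not represented in that accession. The accession names and calls are hypothetical, and the neighbor-joining step itself (done in PHYLIP in the original work) is not reproduced.

```python
from itertools import combinations
from typing import Dict, Tuple

MISSING = "N"


def percent_difference(a: str, b: str) -> float:
    """Percentage of sites represented in both sequences at which the calls differ."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same sites")
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not shared:
        raise ValueError("no sites represented in both sequences")
    diffs = sum(1 for x, y in shared if x != y)
    return 100.0 * diffs / len(shared)


def distance_matrix(calls: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise percent differences for every unordered pair of accessions."""
    return {(p, q): percent_difference(calls[p], calls[q])
            for p, q in combinations(sorted(calls), 2)}


# Hypothetical calls at five polymorphic sites.
calls = {"A1_x": "ACGTA", "A2_x": "ACGTT", "D5_x": "GCNTT"}
for pair, dist in distance_matrix(calls).items():
    print(pair, dist)
```

Comparing only sites represented in both accessions keeps missing data from being counted as either a match or a difference.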
SNPs and deletions attributable to specific parts of the phylogeny
| Genome | SNPs | Deletions |
|---|---|---|
| All A | 5,544,440 | 25,408 |
| A1 | 1,024,299 | 3,809 |
| A2 | 1,152,825 | 2,941 |
| AT | 1,472,900 | 5,247 |
| All D | 14,601,331 | 0 |
| DT | 3,563,979 | 4,518 |
Parts of the phylogeny correspond to those shown in Figure 3. Because of possible conversion events, it is not possible to determine how many SNPs were shared by all A or all D diploids.