Justin L Conover, Jonathan F Wendel.
Abstract
Whole-genome duplication (polyploidization) is among the most dramatic mutational processes in nature, so understanding how natural selection differs in polyploids relative to diploids is an important goal. Population genetics theory predicts that recessive deleterious mutations accumulate faster in allopolyploids than diploids due to the masking effect of redundant gene copies, but this prediction is hitherto unconfirmed. Here, we use the cotton genus (Gossypium), which contains seven allopolyploids derived from a single polyploidization event 1-2 million years ago, to investigate deleterious mutation accumulation. We use two methods of identifying deleterious mutations at the nucleotide and amino acid level, along with whole-genome resequencing of 43 individuals spanning six allopolyploid species and their two diploid progenitors, to demonstrate that deleterious mutations accumulate faster in allopolyploids than in their diploid progenitors. We find that, unlike what would be expected under models of demographic changes alone, strongly deleterious mutations show the biggest difference between ploidy levels, and this effect diminishes for moderately and mildly deleterious mutations. We further show that the proportion of nonsynonymous mutations that are deleterious differs between the two coresident subgenomes in the allopolyploids, suggesting that homoeologous masking acts unequally between subgenomes. Our results provide a genome-wide perspective on classic notions of the significance of gene duplication that likely are broadly applicable to allopolyploids, with implications for our understanding of the evolutionary fate of deleterious mutations. Finally, we note that some measures of selection (e.g., dN/dS, πN/πS) may be biased when species of different ploidy levels are compared.
Keywords: deleterious mutations; molecular evolution; polyploidy; purifying selection
Year: 2022 PMID: 35099532 PMCID: PMC8841602 DOI: 10.1093/molbev/msac024
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Phylogeny and biogeography of Gossypium allopolyploids and progenitor diploids. Diploid Gossypium species are classified into eight diploid genome groups. The A (represented by G. herbaceum) and D (represented by G. raimondii) genome groups diverged approximately 5 Mya, with ranges in different hemispheres. Allopolyploids formed approximately 1–1.6 Mya following transoceanic dispersal of an A genome ancestor (modeled by G. herbaceum [A1]) to the Americas and hybridization with a native D genome species (modeled by G. raimondii [D5]). Subsequent diversification of the new allopolyploid (AD genome) lineage led to the evolution of seven currently recognized species with a broad geographic range in the Americas and the Pacific islands. Flower and fruit morphology for each species are shown, and the island location and geographic range are indicated. Branch lengths on the phylogeny are not to scale but notable divergence times are labeled.
Fig. 2. Derived mutations and deleterious loads at three phylogenetic depths. Number of derived synonymous, nonsynonymous, and deleterious mutations in the CDS regions of 8,884 pairs of homoeologs (17,768 genes in total) in eight cotton species at three phylogenetic depths (indicated by bold branches in phylogeny in middle). For all panels, the ancestral state of each mutation was determined using three Australian cotton species as an outgroup (see Materials and Methods). The left portion of every panel indicates the At subgenome, and the right portion indicates the Dt subgenome. The deepest phylogenetic depth (top row: panels A–D) includes all derived mutations that originated since the divergence of the A and D diploid progenitors. For example, in panel (A), the blue solid circles represent the number of derived synonymous mutations that have occurred in the At subgenome (left half of panel) of AD1 (Gossypium hirsutum) since its divergence from the D5 diploid (G. raimondii), and in the Dt subgenome (right half of panel) since its divergence from A1 (G. arboreum). In this row, the diploids G. arboreum (A1, open red circle) and G. raimondii (D5, open black circle) are represented twice to show that reads mapped to either subgenome resulted in similar estimates of the number of derived mutations, indicating no genome reference bias in SNP calling. The middle row (panels E–H) shows mutations that are variable within each subgenome and its associated progenitor diploid species. For example, in panel (F), the yellow solid circles indicate the number of derived nonsynonymous mutations in the At subgenome of AD5 (G. darwinii) that have occurred since its divergence from the A1 diploid (left half of panel), and in the Dt subgenome of AD5 (G. darwinii) that have occurred since its divergence from the D5 diploid (right half of panel). The bottom row (panels I–L) shows mutations that originated postpolyploidy and are variable within the polyploids; for example, in panel (L), the purple solid circles indicate the number of derived deleterious mutations identified by BAD_Mutations that have occurred in the At subgenome of AD3 (G. tomentosum) since its divergence from the At subgenome of AD4 (G. mustelinum; left half of panel), and in the Dt subgenome of AD3 (G. tomentosum) since its divergence from the Dt subgenome of AD4 (G. mustelinum; right half of panel). Panels (A), (B), (E), (F), (I), (J): the y axis for both synonymous and nonsynonymous represents the sum of the derived allele frequencies, interpreted as the average number of derived mutations in that category in each species. Panels (C), (G), and (K): the y axis indicates the GERP Load of each species, calculated as the sum of (derived allele frequency × GERP Score) for all variant positions with GERP>0. Panels (D), (H), and (L): the y axis shows the total number of deleterious mutations in each species, calculated by BAD_Mutations with Bonferroni-corrected significance (see Materials and Methods), and indicates the average number of deleterious mutations in each species at a given phylogenetic depth. For panels (E–H), comparisons between subgenomes cannot be made because the D5 diploid is more distantly related to the D subgenome than the A1 diploid is related to the A subgenome. Therefore, we would expect a larger number of derived mutations in D than A simply due to evolutionary history rather than to polyploidization per se.
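The two y-axis quantities defined in this caption (the sum of derived allele frequencies, and the GERP Load as the sum of derived allele frequency × GERP Score over positions with GERP>0) can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the caption words it, not the authors' pipeline; the function names, the `(frequency, score)` input layout, and the example values are all assumptions made for this sketch.

```python
from typing import Iterable, Tuple


def sum_derived_allele_frequencies(freqs: Iterable[float]) -> float:
    """Sum of derived allele frequencies across variant sites.

    The caption interprets this as the average number of derived
    mutations carried per sampled genome in a species.
    """
    return sum(freqs)


def gerp_load(sites: Iterable[Tuple[float, float]]) -> float:
    """GERP Load: sum of (derived allele frequency * GERP score).

    Only positions with GERP > 0 contribute, as stated in the caption.
    `sites` is a hypothetical layout: (derived_allele_frequency, gerp_score).
    """
    return sum(daf * gerp for daf, gerp in sites if gerp > 0)


# Made-up illustrative sites, not data from the study.
example_sites = [(0.5, 2.0), (0.25, 4.0), (1.0, -1.0)]
print(sum_derived_allele_frequencies(daf for daf, _ in example_sites))
print(gerp_load(example_sites))
```

The third example site has a negative GERP score, so it counts toward the sum of derived allele frequencies but is excluded from the load.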
Fig. 3. Proportions of all nonsynonymous mutations that are deleterious. Rows (A), (B), and (C) summarize mutations segregating within the entire clade, within each subgenome and its respective progenitor diploid, and within each subgenome, as indicated by the bolded branches along the phylogeny at left (similar to fig. 2). Values indicate the proportion of nonsynonymous mutations that are deleterious within 8,884 homoeologous pairs (17,768 total genes) that are syntenically conserved between the two subgenomes of AD1 (Gossypium hirsutum; see Materials and Methods for filtering criteria). For example, the values in row (A) are calculated by dividing the values in figure 2 by the values in figure 2 for each species. Similar to figure 2, comparisons between subgenomes in row (B) reflect differing phylogenetic distances, not asymmetries between the subgenomes and/or their diploid progenitors.
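The proportion plotted in this figure is a simple ratio of two per-species counts: deleterious mutations over nonsynonymous mutations. A minimal sketch follows, assuming the counts are already available as plain numbers; the function name and the example counts are hypothetical and are not values from the study.

```python
def proportion_deleterious(n_deleterious: float, n_nonsynonymous: float) -> float:
    """Fraction of nonsynonymous mutations that are classified as deleterious.

    Both arguments are per-species (and per-subgenome) totals at a given
    phylogenetic depth. Raises ValueError if no nonsynonymous mutations
    were observed, since the ratio is undefined in that case.
    """
    if n_nonsynonymous <= 0:
        raise ValueError("number of nonsynonymous mutations must be positive")
    return n_deleterious / n_nonsynonymous


# Made-up illustrative counts for the two subgenomes of one species.
at_subgenome = proportion_deleterious(30, 120)
dt_subgenome = proportion_deleterious(45, 150)
print(at_subgenome, dt_subgenome)
```

Computing the ratio separately for each subgenome is what allows the At versus Dt comparison described in the abstract.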
Nucleotide Diversity (π) in 8,884 Homoeologs in Eight Gossypium Species, by Subgenome.
| Species Code | At Subgenome | Dt Subgenome |
|---|---|---|
| A1 | 7.41E-04 | |
| D5 | 2.36E-04 | |
| AD1 | 6.69E-04 | 7.06E-04 |
| AD3 | 1.75E-04 | 1.67E-04 |
| AD4 | 2.64E-04 | 3.15E-04 |
| AD5 | 1.71E-04 | 1.60E-04 |
| AD6 | 7.75E-04 | 7.67E-04 |
| AD7 | 4.94E-05 | 5.59E-05 |
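For readers unfamiliar with π, the sketch below shows one standard way to compute per-site nucleotide diversity from biallelic allele counts: the expected number of pairwise differences at each site, summed and divided by the number of sites considered. This record does not state how the authors computed the values above, so this estimator, the function name, and the input layout are assumptions for illustration only.

```python
from typing import Iterable, Tuple


def nucleotide_diversity(sites: Iterable[Tuple[int, int]], n_sites_total: int) -> float:
    """Per-site nucleotide diversity for biallelic variants.

    `sites` holds (derived_allele_count, sampled_chromosomes) per variant.
    Each variant contributes 2*k*(n-k) / (n*(n-1)), the probability that
    two chromosomes drawn without replacement differ at that site.
    `n_sites_total` is the number of callable sites, variant or not.
    """
    if n_sites_total <= 0:
        raise ValueError("n_sites_total must be positive")
    total = 0.0
    for k, n in sites:
        if n < 2:
            continue  # a single chromosome carries no pairwise information
        total += 2 * k * (n - k) / (n * (n - 1))
    return total / n_sites_total


# Made-up example: two variant sites among 1,000 callable sites,
# each genotyped on 4 chromosomes.
print(nucleotide_diversity([(1, 4), (2, 4)], 1000))
```

Invariant sites contribute zero to the sum but still count in the denominator, which is why values on the order of 1E-04 arise when few of the sites are variable.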
Fig. 4. Relative increase of deleterious mutations among GERP categories in polyploids compared with diploids. For mutations that originated since the divergence of each subgenome from its diploid progenitor, we plotted the relative increase in deleterious alleles across three GERP load categories: mildly deleterious (0