Yue Hao, Makenzie E Mabry, Patrick P Edger, Michael Freeling, Chunfang Zheng, Lingling Jin, Robert VanBuren, Marivi Colle, Hong An, R Shawn Abrahams, Jacob D Washburn, Xinshuai Qi, Kerrie Barry, Christopher Daum, Shengqiang Shu, Jeremy Schmutz, David Sankoff, Michael S Barker, Eric Lyons, J Chris Pires, Gavin C Conant.
Abstract
The members of the tribe Brassiceae share a whole-genome triplication (WGT), and one proposed model for its formation is a two-step pair of hybridizations producing hexaploid descendants. However, evidence for this model is incomplete, and the evolutionary and functional constraints that drove evolution after the hexaploidy are even less understood. Here, we report a new genome sequence of Crambe hispanica, a species sister to most sequenced Brassiceae. Using this new genome and three others that share the hexaploidy, we traced the history of gene loss after the WGT using the Polyploidy Orthology Inference Tool (POInT). We confirm the two-step formation model and infer that there was a significant temporal gap between the two allopolyploidizations, with about a third of the gene losses from the first two subgenomes occurring before the arrival of the third. We also assign each of the 90,000 individual genes in our study to a parental subgenome, inferring, with measured uncertainty, from which progenitor genome of the allohexaploidy each gene derives. We further show that each subgenome has a statistically distinguishable rate of homoeolog loss. There is little indication of functional distinction between the three subgenomes: the individual subgenomes show no patterns of functional enrichment, no excess of shared protein-protein or metabolic interactions between their members, and no biases in their likelihood of having experienced a recent selective sweep. We propose a "mix and match" model of allopolyploidy, in which subgenome origin drives homoeolog loss propensities but genes from different subgenomes function together without difficulty.
Year: 2021 PMID: 33863805 PMCID: PMC8092008 DOI: 10.1101/gr.270033.120
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Subgenome assignment and inference of gene loss after the shared WGT in four species. After the WGT, each ancestral locus could potentially expand to three gene copies, but because gene losses are biased, the numbers of surviving genes from the three subgenomes are unequal. Our analyses (Results) indicate the presence of a less fractionated (LF) subgenome and two more fractionated ones (MF1 and MF2). These inferences are based on the gene losses observed across four genomes and along the phylogeny depicted. Shown here is a window of 16 post-WGT loci (of the total 14,050 such loci) in four species that share the WGT: Brassica rapa, Brassica oleracea, Crambe hispanica, and Sinapis alba. Each pillar corresponds to an ancestral locus, and the boxes represent extant genes. Pairs of genes are connected by lines if they are syntenic genomic neighbors. The numbers above each pillar are the posterior probabilities assigned to this combination of orthology relationships relative to the other (3!)^4 − 1 = 1295 possible orthology states. The numbers above each branch of the tree give the number of genes in each subgenome surviving to that point, with the number of gene losses in parentheses. The gene loss inferences made by POInT are probabilistic: because some gene losses cannot be definitively assigned to a single branch, the resulting loss estimates are not integers. The numbers below the branches in the first subtree are POInT's branch length estimates (αt).
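The count of alternative orthology states in the caption follows from simple combinatorics: at each pillar, the three gene copies in each of the four genomes can be matched to the LF, MF1, and MF2 subgenomes in 3! = 6 ways, giving (3!)^4 = 1296 joint states, or 1295 besides the one being scored. The minimal Python sketch below only enumerates those states to confirm the arithmetic. It is an illustration, not POInT's code, and the names `orthology_states`, `SUBGENOMES`, and `GENOMES` are our own.

```python
from itertools import permutations, product

# Hypothetical labels for illustration: three subgenomes, four genomes sharing the WGT.
SUBGENOMES = ("LF", "MF1", "MF2")
GENOMES = ("B. rapa", "B. oleracea", "C. hispanica", "S. alba")


def orthology_states(n_genomes=len(GENOMES)):
    """Yield every joint way of matching each genome's three copies at one pillar
    to the three subgenomes: one permutation of SUBGENOMES per genome."""
    per_genome = list(permutations(SUBGENOMES))  # 3! = 6 orderings per genome
    return product(per_genome, repeat=n_genomes)


states = list(orthology_states())
total = len(states)          # (3!)^4 joint orthology states
alternatives = total - 1     # every state other than the one being scored
print(total, alternatives)   # 1296 1295
```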
Figure 2. POInT's models for inferring WGT. Five different models of post-WGT evolution and their ln-likelihoods are shown. In each model, the colored circles represent different states. The brown circle represents the triplicated state (T); the pink circles are the three duplicated states (D), one for each pair of surviving subgenome copies; the blue, green, and yellow circles are the three single-copy states (S) for the LF, MF1, and MF2 subgenomes, respectively. The transition rates between states are shown above the arrows: (α) transition rate from the triplicated state to the duplicated states; (ασ) transition rates from the duplicated states to the single-copy states; (f) fractionation parameters; (β and τ) root model parameters. Red arrows connect pairs of models compared using likelihood ratio tests (Methods). In the WGT Null model, transition rates are the same across the three subgenomes, modeling the scenario of no biased fractionation. In the WGT 1Dom model, with the biased fractionation parameter f (0 ≤ f ≤ 1), the MF1 and MF2 subgenomes are more fractionated than the LF subgenome. In the WGT 1DomG3 model, two fractionation parameters are introduced, distinguishing the three subgenomes: MF2 is more fractionated than MF1, and MF1 is more fractionated than LF. The Root-spec. WGT 1DomG3 model is similar to the previous model, but with two sets of parameters, one for the root branch and the other for the remaining branches. The WGT 1DomG3 + Root model is a two-step hexaploidy model created by starting each pillar in the intermediate duplicated state containing the MF1 and MF2 copies. This state represents the merging of the MF1 and MF2 subgenomes as the first step of hexaploid formation. The T state and the two duplicated states that include the LF copy represent the second allopolyploidy, with either no prior homoeolog loss (T) or a loss from one of the two MF subgenomes before that event (the LF+MF2 or the LF+MF1 duplicated state).
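The caption describes a continuous-time Markov model over copy-number states. As a rough illustration only, the Python sketch below assembles a rate matrix Q for a three-subgenome model in the spirit of WGT 1DomG3 and computes branch transition probabilities P(t) = exp(Qt). How the two fractionation parameters enter (loss weights of 1 for MF2, f1 for MF1, and f1·f2 for LF) is our assumption, chosen only to enforce the stated ordering LF ≤ MF1 ≤ MF2; it is not POInT's published parametrization. The state labels, the names `f1`, `f2`, `build_rate_matrix`, and `expm`, and all parameter values are hypothetical.

```python
from itertools import combinations

SUBGENOMES = ("LF", "MF1", "MF2")


def build_rate_matrix(alpha, sigma, f1, f2):
    """Rate matrix Q over T, three D states, and three S states.

    ASSUMPTION (not POInT's published form): losing a copy from subgenome g
    is scaled by a weight, 1 for MF2, f1 for MF1 and f1*f2 for LF, so that
    with 0 <= f1, f2 <= 1 loss propensities satisfy LF <= MF1 <= MF2.
    """
    weight = {"MF2": 1.0, "MF1": f1, "LF": f1 * f2}
    pairs = list(combinations(SUBGENOMES, 2))
    states = ["T"] + ["D:" + "+".join(p) for p in pairs] + ["S:" + s for s in SUBGENOMES]
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    Q = [[0.0] * n for _ in range(n)]
    # Triplicated -> duplicated: lose the copy from subgenome g, keep the other two.
    for g in SUBGENOMES:
        kept = "+".join(s for s in SUBGENOMES if s != g)
        Q[index["T"]][index["D:" + kept]] = alpha * weight[g]
    # Duplicated -> single copy: lose one of the two remaining copies.
    for a, b in pairs:
        d = index["D:" + a + "+" + b]
        Q[d][index["S:" + b]] = alpha * sigma * weight[a]  # lose a, keep b
        Q[d][index["S:" + a]] = alpha * sigma * weight[b]  # lose b, keep a
    # Diagonal entries make each row sum to zero; S states are absorbing.
    for i in range(n):
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return states, Q


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(M, squarings=10, terms=20):
    """Matrix exponential via a truncated Taylor series with scaling and squaring."""
    n = len(M)
    A = [[x / 2 ** squarings for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


# Illustrative parameter values and branch length only, not estimates from the paper.
states, Q = build_rate_matrix(alpha=1.0, sigma=0.5, f1=0.8, f2=0.6)
branch_length = 0.5
P = expm([[q * branch_length for q in row] for row in Q])
print({s: round(p, 3) for s, p in zip(states, P[0])})  # row 0: starting from T
```

The sketch omits the root model parameters (β and τ) as well as the root-specific and two-step variants; its only purpose is to show how a biased-fractionation ordering can be expressed as unequal loss rates out of the T and D states.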
Figure 3. Protein–protein interaction networks after the WGT. (A) The Arabidopsis PPI network at the root branch (bottom), and the same PPI network colored by the Brassica rapa gene retention status (top). The dark purple nodes represent retained triplets (Supplemental Code). (B) The PPI network partitioned by subgenome assignment at the root branch: (LF) red, 4249 nodes and 8454 edges; (MF1) green, 3379 nodes and 6442 edges; (MF2) blue, 3073 nodes and 4961 edges. (C) A subset of the PPI network in which only nodes encoded by single-copy genes and connected to other single-copy nodes are shown. Red nodes are from the LF subgenome, green nodes are from the MF1 subgenome, and blue nodes are from the MF2 subgenome.
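Panels B and C describe two simple operations on the PPI network. The Python sketch below shows one way to compute them on a toy graph, assuming (our assumption, not stated in the caption) that a node belongs to a subgenome's partition whenever that subgenome's copy is retained, that a partition's edges are those of the induced subgraph, and that panel C keeps only edges joining two nodes that each retain exactly one copy. Function names and the example data are hypothetical.

```python
SUBGENOMES = ("LF", "MF1", "MF2")


def induced_partitions(edges, retained):
    """(node count, edge count) of the subgraph induced by each subgenome.

    edges: undirected PPI edges as (node_a, node_b) pairs.
    retained: dict node -> set of subgenomes whose copy survives at that node.
    """
    stats = {}
    for s in SUBGENOMES:
        members = {n for n, subs in retained.items() if s in subs}
        n_edges = sum(1 for a, b in edges if a in members and b in members)
        stats[s] = (len(members), n_edges)
    return stats


def single_copy_subnetwork(edges, retained):
    """Edges joining two single-copy nodes, plus the subgenome (color) of each endpoint."""
    single = {n: next(iter(subs)) for n, subs in retained.items() if len(subs) == 1}
    kept = [(a, b) for a, b in edges if a in single and b in single]
    colors = {n: single[n] for edge in kept for n in edge}
    return kept, colors


# Toy example (hypothetical data).
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g4")]
retained = {"g1": {"LF", "MF1"}, "g2": {"LF"}, "g3": {"MF2"}, "g4": {"LF", "MF1", "MF2"}}
print(induced_partitions(edges, retained))      # {'LF': (3, 2), 'MF1': (2, 1), 'MF2': (2, 1)}
print(single_copy_subnetwork(edges, retained))  # ([('g2', 'g3')], {'g2': 'LF', 'g3': 'MF2'})
```

Under this assumption a node with two or three retained copies appears in more than one partition, so the per-subgenome node counts need not sum to the number of nodes in the full network.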
Figure 4. Subgenome-specific edge counts for 100 rewired Brassica rapa coexpression networks compared to those from the actual network. (A) Distribution of the number of edges connecting pairs of B. rapa genes from the LF subgenome in 100 rewired networks. (B) Distribution of the number of edges connecting pairs of genes from the MF1 subgenome. (C) Distribution of the number of edges connecting pairs of genes from the MF2 subgenome. (D) Distribution of the number of edges connecting LF genes to MF1 genes. (E) Distribution of the number of edges connecting LF genes to MF2 genes. (F) Distribution of the number of edges connecting MF1 and MF2 genes. In each panel, the dark gray dashed line shows the number of edges with that set of subgenome assignments for the true network. See Supplemental Code.
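The caption does not say how the 100 networks were rewired. A common choice, assumed here, is degree-preserving double-edge swapping, which keeps every gene's number of coexpression partners while randomizing which genes are connected. The Python sketch below (hypothetical function names, toy data) counts edges by subgenome pair for an observed network and for rewired copies; the six possible pairs (LF-LF, MF1-MF1, MF2-MF2, LF-MF1, LF-MF2, MF1-MF2) correspond to panels A through F.

```python
import random
from collections import Counter

SUBGENOMES = ("LF", "MF1", "MF2")


def subgenome_pair_counts(edges, subgenome_of):
    """Count edges by the unordered pair of subgenomes of their two endpoints."""
    return Counter(tuple(sorted((subgenome_of[a], subgenome_of[b]))) for a, b in edges)


def rewire(edges, n_swaps, rng):
    """Degree-preserving rewiring by double-edge swaps: (a-b, c-d) -> (a-d, c-b).

    Swaps that would create a self-loop or a duplicate edge are rejected, so
    every gene keeps its number of coexpression partners.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize which ends are exchanged
        if len({a, b, c, d}) < 4:
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def empirical_null(edges, subgenome_of, n_networks=100, seed=0):
    """Observed subgenome-pair edge counts plus the counts for n rewired networks."""
    rng = random.Random(seed)
    observed = subgenome_pair_counts(edges, subgenome_of)
    null = [subgenome_pair_counts(rewire(edges, 10 * len(edges), rng), subgenome_of)
            for _ in range(n_networks)]
    return observed, null


# Toy network (hypothetical): 12 genes on a ring plus six chords.
subgenome_of = {f"g{i}": SUBGENOMES[i % 3] for i in range(12)}
edges = [(f"g{i}", f"g{(i + 1) % 12}") for i in range(12)]
edges += [(f"g{i}", f"g{(i + 5) % 12}") for i in range(0, 12, 2)]

observed, null = empirical_null(edges, subgenome_of)
pair = ("LF", "MF2")
frac_as_high = sum(c[pair] >= observed[pair] for c in null) / len(null)
print(observed[pair], frac_as_high)
```

An observed count lying far in either tail of the rewired distribution for some pair would suggest an excess or deficit of edges between those subgenomes relative to degree-matched random networks.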