Kathy Scienski, Justin C Fay, Gavin C Conant.
Abstract
We find evidence for interlocus gene conversion in five duplicated histone genes from six yeast species. The sequences of these duplicated genes, surviving from the ancient genome duplication, show phylogenetic patterns inconsistent with the well-resolved orthology relationships inferred from a likelihood model of gene loss after the genome duplication. Instead, these paralogous genes are more closely related to each other than any is to its nearest ortholog. In addition to simulations supporting gene conversion, we also present evidence for elevated rates of radical amino acid substitutions along the branches implicated in the conversion events. As these patterns are similar to those seen in ribosomal proteins that have undergone gene conversion, we speculate that in cases where duplicated genes code for proteins that are a part of tightly interacting complexes, selection may favor the fixation of gene conversion events in order to maintain high protein identities between duplicated copies.
Keywords: gene conversion; genome duplication; histones
Year: 2015 PMID: 26560339 PMCID: PMC4700949 DOI: 10.1093/gbe/evv216
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Table 1. Patterns of Histone Protein Sequence Identity and Gene Phylogenies Provide Evidence for Gene Conversion among Duplicated Histones
| Gene type^a | Gene IDs | Dist(D1, D2)^b | Min[Dist(D1 or D2, O)]^c | lnL_SPP^d | lnL_GC^d | lnL_PhyML^d | Relationship^e |
|---|---|---|---|---|---|---|---|
| | TPHA0L01110 | 0.008 | 0.038 | −1,586 | −1,534 | −1,524 | |
| | TPHA0C02050 | | | | | | |
| | Kpol_1031.53 | | | | | | |
| | KAFR0C00780 | 0.0 | 0.015 | −1,357 | −1,324 | −1,316 | |
| | KAFR0F02490 | | | | | | |
| | KNAG0K01430 | | | | | | |
| | KAFR0C00770 | 0.030 | 0.091 | −1,294 | −1,259 | −1,258 | |
| | KAFR0F02480 | | | | | | |
| | KNAG0K01420 | | | | | | |
| | KAFR0C00700 | 0.0 | 0.010 | −1,251 | −1,174 | −1,146 | |
| | KAFR0A01280 | | | | | | |
| | KNAG0J01060 | | | | | | |
| | CAGL0C04136g | 0.0 | 0.010 | | | | |
| | CAGL0H09834g | | | | | | |
| | YBR009C ( | | | | | | |
| | NDAI0B03480 | 0.0 | 0.010 | −1,011 | −963 | −962 | |
| | NDAI0G00750 | | | | | | |
| | NCAS0B06180 | | | | | | |
| | NCAS0B06180 | 0.0 | 0.010 | | | | |
| | NCAS0G03710 | | | | | | |
| | NDAI0B03480 | | | | | | |
^a Saccharomyces cerevisiae histone gene name. Note that S. cerevisiae has no surviving histone duplicates from the WGD, making these names unambiguous.
^b Proportion of amino acid differences between the two paralogs (D1 and D2) created by the WGD.
^c Minimum of the proportion of amino acid differences between one of the two paralogs (D1 or D2) and the nearest homolog in its nearest species relative (O).
^d ln-likelihood of the full sequence alignment fit to the assumed species tree (lnL_SPP), the gene conversion tree (lnL_GC), or the phylogeny estimated by PhyML (lnL_PhyML). See Methods for details.
^e Relationship between two paralogs hypothesized to have undergone gene conversion (D1 and D2) and an assumed ortholog of D1, O (see table 2 for precise orthology inferences).
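The two distance columns defined in footnotes b and c can be illustrated with a minimal Python sketch. The function names, the toy peptide sequences, and the choice to skip gapped alignment columns are all assumptions made for illustration; the source does not state how gaps were handled or which software computed these distances.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned amino acid sites that differ.

    Assumption: columns with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


def conversion_signal(d1: str, d2: str, o: str):
    """Return (Dist(D1, D2), Min[Dist(D1 or D2, O)], paralogs_closer).

    paralogs_closer is True when the two paralogs are more similar to each
    other than either is to the ortholog O, the pattern the table highlights.
    """
    dist_paralogs = p_distance(d1, d2)
    min_dist_ortholog = min(p_distance(d1, o), p_distance(d2, o))
    return dist_paralogs, min_dist_ortholog, dist_paralogs < min_dist_ortholog


# Hypothetical 10-residue alignment, invented for this example only.
d1 = "MSGRGKGGKG"
d2 = "MSGRGKGGKG"
o = "MSGKGKGGKA"
print(conversion_signal(d1, d2, o))
```

A paralog distance smaller than the minimum distance to the ortholog is only suggestive on its own; the likelihood comparisons in the table are what test the pattern formally.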
Table 2. Triplet-based Relative Rate Tests Coupled to Orthology Predictions Show Evidence for Gene Conversion at Synonymous Sites of Duplicated Histones
| Gene type^a | Species-specific genes | Probabilities of orthology relationship^b | |||||
|---|---|---|---|---|---|---|---|
| TPHA0L01110 | >0.99 | 0.004 | 0.15 | 0.062 | |||
| TPHA0C02050 | ≈0 | 0.039 | |||||
| Kpol_1031.53 | 0.015 | 0.186 | |||||
| D1 | KAFR0C00780 | >0.99 | ≈0 | ≈0 | |||
| D2 | KAFR0F02490 | ≈0 | 0.189 | ||||
| O | KNAG0K01430 | 0.012 | 0.551 | ||||
| KAFR0C00770 | >0.99 | 0.008 | 0.08 | 0.156 | |||
| KAFR0F02480 | 0.015 | 0.098 | |||||
| KNAG0K01420 | 0.027 | 0.411 | |||||
| KAFR0C00700 | 0.97 | ≈0 | 0.07 | 0.011 | |||
| KAFR0A01280 | ≈0 | 0.124 | |||||
| KNAG0J01060 | 0.010 | 0.459 | |||||
| D1 | CAGL0C04136g | =0.97 | ≈0 | =0.24 | 0.028 | ||
| D2 | CAGL0H09834g | ≈0 | 0.001 | ||||
| O | YBR009C ( | 0.005 | 0.426 | ||||
| NDAI0B03480 | >0.99 | ≈0 | 0.20 | 0.054 | |||
| NDAI0G00750 | ≈0 | 0.076 | |||||
| NCAS0B06180 | 0.004 | 0.315 | |||||
| NCAS0B06180 | >0.99 | ≈0 | 0.21 | 0.071 | |||
| NCAS0G03710 | ≈0 | 0.088 | |||||
| NCAS0B06180 | 0.004 | 0.297 |
^a Saccharomyces cerevisiae histone gene name (see table 1).
^b Estimated probability, from POInT, of the full set of orthology relationships used for this and later analyses. That is, the proportion of the total probability over all possible orthology relationships that is assigned to the one described.
^c Using our triplet-based likelihood approach (Conant and Wagner 2003), we estimated for each of the three branches (corresponding to the three genes) the number of nonsynonymous (Ka) and synonymous (Ks) substitutions per site.
^d P value for the hypothesis test of equal values of Ka (or Ks) for D1 and O. This condition corresponds to the hypothesis of no gene conversion: D1 and its ortholog O are equally distant from paralog D2. The test is a likelihood-ratio test comparing an unconstrained model, in which all values of Ka (or Ks) are free, to a constrained model, in which the Ka (or Ks) values of D1 and O are forced to be equal. The P value was computed by comparing twice the difference in ln-likelihood to a chi-square distribution with one degree of freedom. Values shown in bold are significant at P ≤ 0.05.
^e Relationship between two paralogs hypothesized to have undergone gene conversion (D1 and D2) and the ortholog of D1, O.
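The P value calculation described in footnote d (twice the ln-likelihood difference compared to a chi-square distribution with one degree of freedom) can be sketched in a few lines of Python. The function name and the ln-likelihood values below are hypothetical and are not taken from the paper. For one degree of freedom the chi-square survival function has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_p_value(lnl_free: float, lnl_constrained: float) -> float:
    """P value for a likelihood-ratio test with one degree of freedom.

    lnl_free:        ln-likelihood with all Ka (or Ks) values free.
    lnl_constrained: ln-likelihood with the D1 and O values forced equal.
    """
    stat = 2.0 * (lnl_free - lnl_constrained)
    # Numerical noise can make the statistic slightly negative; clamp to 0.
    stat = max(stat, 0.0)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical ln-likelihoods, invented for illustration.
p = lrt_p_value(lnl_free=-1000.0, lnl_constrained=-1003.0)
print(f"P = {p:.4f}", "(significant)" if p <= 0.05 else "(not significant)")
```

A small P value here rejects equality of the D1 and O branch estimates, which under the footnote's reasoning is evidence against the no-conversion hypothesis.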
Figure. For all five examples of post-WGD gene conversion (GC), a tree joining the putatively gene-converted ohnologs explains the sequence data better than does the post-WGD species phylogeny. For each of the five loci, we simulated 1,000 sequence alignments under the presumed species tree (SPP) of figure 2 (omitting any branches where gene loss had occurred). We then analyzed those alignments under both the SPP tree and all possible GC trees and calculated the difference in ln-likelihood between the best GC tree and the SPP tree. Thus, values greater than zero imply that the GC tree explains the data better than does the SPP tree. The proportion of simulations with a given value of the difference in ln-likelihood for the two trees is shown on the y-axis. For reference, arrows show the improvement in ln-likelihood seen under the GC tree for the real data.
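One way to summarize the comparison in this caption is the fraction of simulated alignments whose ln-likelihood improvement under the best GC tree is at least as large as the improvement seen in the real data. The Python sketch below is an assumption about how such a summary could be computed, not the authors' code. The simulated values are invented, and the observed value assumes that the first row of table 1 lists lnL_SPP = −1,586 and lnL_GC = −1,534.

```python
def delta_lnl(lnl_gc: float, lnl_spp: float) -> float:
    """Improvement in ln-likelihood of the GC tree over the species tree."""
    return lnl_gc - lnl_spp


def empirical_p(simulated_deltas, observed_delta: float) -> float:
    """Fraction of simulations with a difference >= the observed one."""
    if not simulated_deltas:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for d in simulated_deltas if d >= observed_delta)
    return exceed / len(simulated_deltas)


# Observed improvement, assuming the column reading stated above.
observed = delta_lnl(lnl_gc=-1534.0, lnl_spp=-1586.0)

# Hypothetical differences from alignments simulated under the species tree;
# a real analysis would have 1,000 of these per locus.
simulated = [0.0, 0.4, 1.2, 0.0, 2.5, 0.1, 3.8, 0.0]

print(observed, empirical_p(simulated, observed))
```

With a finite number of simulations, some analysts prefer the estimator (exceed + 1) / (n + 1) so that the reported value is never exactly zero; the plain proportion is used here only to mirror the caption's wording.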
Figure. (A) Orthology prediction for 12 post-WGD yeasts from POInT for the genomic region around histone 4 (HHF1). WGD produced two duplicated regions, shown as the top and bottom panels. For this set of genes (gray column) there are two orthology assignments of reasonably high probability: one that makes the genes from Vanderwaltozyma polyspora, Tetrapisispora phaffii, and Tetrapisispora blattae paralogous to the nine genes in the upper panel (P = 0.90) and one that makes them orthologous (P = 0.07). Importantly, neither of these relationships contradicts the inference that gene H09834 from Candida glabrata and gene A0128 from Kazachstania africana are paralogous to the upper group of nine genes (hence P > 0.97 for that assignment). As a result, we expect the gene tree of these 11 sequences to have these 2 genes cluster outside of the other 9, as depicted in the species tree of B. Instead, the two genes in pink from C. glabrata and K. africana are each other’s closest relatives in the tree, a result explicable only under the hypothesis of gene conversion. (B) Fit of the HHF1 sequence alignment to the species tree from A under the MG/GY 94 model. (C) Fit of the HHF1 sequence alignment to a hypothesized gene conversion tree under the MG/GY 94 model. (D) Maximum-likelihood estimate of the gene tree from PhyML (see Methods) for HHF1 fit to the MG/GY 94 model.
Figure. An excess of radical amino acid substitutions is observed among the histones of the post-WGD yeasts, a trend that is most marked among the clades that have undergone gene conversion. On the x-axis is the ratio of the rate of radical (Rr) to conservative (Rc) substitutions along all branches of the phylogeny not showing evidence of gene conversion (as estimated from our ML code, see Methods). The gray area indicates the realm of purifying selection (Rr/Rc ≤ 1.0). On the y-axis is the same statistic for the three branches showing gene conversion (i.e., the two gene-converted tips and their shared ancestral branch). The line y = x indicates equal values of Rr/Rc for the two sets of branches. Points in gray with a value of 5.0 have Rc = 0 (and hence an actual ratio that is undefined).