Ya Yang, Michael J Moore, Samuel F Brockington, Douglas E Soltis, Gane Ka-Shu Wong, Eric J Carpenter, Yong Zhang, Li Chen, Zhixiang Yan, Yinlong Xie, Rowan F Sage, Sarah Covshoff, Julian M Hibberd, Matthew N Nelson, Stephen A Smith.
Abstract
Many phylogenomic studies based on transcriptomes have been limited to "single-copy" genes due to methodological challenges in homology and orthology inferences. Only a relatively small number of studies have explored analyses beyond reconstructing species relationships. We sampled 69 transcriptomes in the hyperdiverse plant clade Caryophyllales and 27 outgroups from annotated genomes across eudicots. Using a combined similarity- and phylogenetic tree-based approach, we recovered 10,960 homolog groups, where each was represented by at least eight ingroup taxa. By decomposing these homolog trees, and taking gene duplications into account, we obtained 17,273 ortholog groups, where each was represented by at least ten ingroup taxa. We reconstructed the species phylogeny using a 1,122-gene data set with a gene occupancy of 92.1%. From the homolog trees, we found that both synonymous and nonsynonymous substitution rates in herbaceous lineages are up to three times as fast as in their woody relatives. This is the first time such a pattern has been shown across thousands of nuclear genes with dense taxon sampling. We also pinpointed regions of the Caryophyllales tree that were characterized by relatively high frequencies of gene duplication, including three previously unrecognized whole-genome duplications. By further combining information from homolog tree topology and synonymous distance between paralog pairs, phylogenetic locations for 13 putative genome duplication events were identified. Genes that experienced the greatest gene family expansion were concentrated among those involved in signal transduction and oxidoreduction, including a cytochrome P450 gene that encodes a key enzyme in the betalain synthesis pathway. Our study demonstrates a new approach for functional phylogenomic analysis in nonmodel species that is based on homolog groups in addition to inferred ortholog groups.
Keywords: Caryophyllales; RNA-seq; paleopolyploidy; substitution rate heterogeneity
Year: 2015 PMID: 25837578 PMCID: PMC4833068 DOI: 10.1093/molbev/msv081
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Schematic outline of analyses in this study. (A) Homology and orthology inferences and (B) mapping gene duplications detected from homolog trees to the species tree.
Fig. 2. Ortholog groups ranked from high to low by number of taxa represented. Only ortholog groups represented by at least ten Caryophyllales taxa are shown. An ortholog group with full taxon occupancy included 95 taxa.
Fig. 3. Species tree from RAxML analysis of the 1,122-gene supermatrix. Numbers on branches indicate proportion of genes showing duplication; duplications were not investigated on unmarked branches.
Fig. 4. Distribution of substitution rate contrasts. Contrast values were calculated from individual homologous gene trees for five woody–herbaceous sister pairs (A–E in fig. 3). Rate contrasts were considered positive (blank) when the herbaceous (H) side possessed a faster rate than the woody (W) side, and negative (gray) when the reverse was true.
Substitution Rate Contrasts between Woody (W) and Herbaceous (H) Sister Pairs: Without Filtering by Number of Tips.
| Contrast | Synonymous: No. of W > H | Synonymous: No. of W < H | Synonymous: P Value | Synonymous: Median Contrast | Nonsynonymous: No. of W > H | Nonsynonymous: No. of W < H | Nonsynonymous: P Value | Nonsynonymous: Median Contrast |
|---|---|---|---|---|---|---|---|---|
| Contrast A | 49 | 2,951 | <2.2e-16 | 2.227 | 68 | 2,932 | <2.2e-16 | 2.018 |
| Contrast B | 31 | 2,969 | <2.2e-16 | 1.624 | 150 | 2,850 | <2.2e-16 | 1.382 |
| Contrast C | 1,065 | 1,917 | <2.2e-16 | 0.564 | 935 | 2,008 | <2.2e-16 | 0.605 |
| Contrast D | 165 | 2,835 | <2.2e-16 | 1.176 | 336 | 2,664 | <2.2e-16 | 0.904 |
| Contrast E | 188 | 534 | <2.2e-16 | 0.366 | 231 | 491 | <2.2e-16 | 0.331 |
Note.—A random subset of 3,000 extracted Caryophyllales clades was used for each sister pair, except for contrast E, in which we analyzed all clades. Contrast values either lower than −10 or higher than 10 were excluded. P values were calculated from sign tests, assuming that the average substitution rates were equal between the woody and the herbaceous side at each sister pair.
Substitution Rate Contrasts between Woody (W) and Herbaceous (H) Sister Pairs: Values Calculated from Cases Where the Woody Lineage and the Herbaceous Lineage Had the Same Number of Tips.
| Contrast | Synonymous: No. of W > H | Synonymous: No. of W < H | Synonymous: P Value | Synonymous: Median Contrast | Nonsynonymous: No. of W > H | Nonsynonymous: No. of W < H | Nonsynonymous: P Value | Nonsynonymous: Median Contrast |
|---|---|---|---|---|---|---|---|---|
| Contrast A | 14 | 160 | <2.2e-16 | 1.950 | 22 | 152 | <2.2e-16 | 1.484 |
| Contrast B | 23 | 874 | <2.2e-16 | 1.434 | 93 | 804 | <2.2e-16 | 1.199 |
| Contrast C | 1,065 | 1,917 | <2.2e-16 | 0.564 | 935 | 2,008 | <2.2e-16 | 0.605 |
| Contrast D | 66 | 180 | 2.1e-13 | 0.715 | 84 | 162 | 7.5e-7 | 0.461 |
| Contrast E | 188 | 534 | <2.2e-16 | 0.366 | 231 | 491 | <2.2e-16 | 0.331 |
Note.—A random subset of 3,000 extracted Caryophyllales clades was used for each sister pair, except for contrast E, in which we analyzed all clades. Contrast values either lower than −10 or higher than 10 were excluded. P values were calculated from sign tests, assuming that the average substitution rates were equal between the woody and the herbaceous side at each sister pair.
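The sign test named in the note can be illustrated with a minimal sketch. It assumes a two-sided exact binomial test with a null probability of 0.5; the record does not state which implementation or software the authors used, so the function name `sign_test_p` and the two-sided choice are assumptions made here for illustration only. The counts are taken from the Contrast D row of the table above.

```python
from math import comb


def sign_test_p(n_w_faster: int, n_h_faster: int) -> float:
    """Two-sided exact sign test p-value under H0: P(W > H) = P(W < H) = 0.5.

    Assumption: ties are already excluded, so every clade falls on one side.
    """
    n = n_w_faster + n_h_faster
    k = min(n_w_faster, n_h_faster)
    # Number of outcomes in one tail that are at least as lopsided as observed.
    tail = sum(comb(n, i) for i in range(k + 1))
    # Double the tail for a two-sided test and cap at 1.
    # Python integers keep the arithmetic exact until the final division.
    return min(1.0, 2 * tail / 2**n)


# Counts from the Contrast D row (same number of tips on both sides).
p_synonymous = sign_test_p(66, 180)
p_nonsynonymous = sign_test_p(84, 162)
print(f"synonymous: {p_synonymous:.2g}; nonsynonymous: {p_nonsynonymous:.2g}")
```

For rows where nearly all clades fall on one side (for example 49 versus 2,951), the exact p-value is far below double-precision resolution, which is consistent with those rows being reported only as an upper bound (<2.2e-16).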
Caryophyllales Clades with the Highest Number of Tips for Any Single Taxon.
| Clade ID | No. of Tips | No. of Taxa | Taxa with the Most Copies: No. of Copies | Annotation |
|---|---|---|---|---|
| cc2163-1.mm.cary | 105 | 30 | Nyctaginaceae_Anulocaulis_leiosolenus_H:18 | Probable nucleoredoxin 1-like ( |
| cc123-2.mm.1.cary | 354 | 65 | Amaranthaceae_Alternanthera_brasiliana_H:17; Caryophyllaceae_Spergularia_media_H:13 | Oxidoreductase family protein ( |
| cc42-2.mm.2.cary | 74 | 23 | Nyctaginaceae_Allionia_incarnata2_H:16; Nyctaginaceae_Allionia_incarnata_H:16 | Retrotransposon protein, putative, Ty1-copia subclass ( |
| cc504-1.mm.1.cary | 184 | 41 | Aizoaceae_Sesuvium_ventricosum_H:15; Nyctaginaceae_Allionia_incarnata2_H:13 | Mariner transposase ( |
| cc3-6.mm.1.cary | 218 | 54 | Portulacaceae_Portulaca_molokiniensis_H:15 | Contains similarity to reverse transcriptases ( |
| cc327-1.mm.3.cary | 56 | 18 | Amaranthaceae_Atriplex_rosea_H:15 | Protein FAR1-related sequence 5-like ( |
| cc1593-1.mm.1.cary | 56 | 18 | Nyctaginaceae_Boerhavia_burbidgeana_H:14 | Hypothetical protein VITISV_026753 ( |
| cc3-8.mm.cary | 56 | 18 | Nyctaginaceae_Boerhavia_burbidgeana_H:14 | Integrase core domain containing protein ( |
| cc28-1.mm.1.cary | 353 | 68 | Nyctaginaceae_Boerhavia_coccinea_H:13 | Cytochrome P450 71A25-like ( |
| cc6-7.mm.1.cary | 99 | 30 | Nyctaginaceae_Boerhavia_coccinea_H:13 | Putative reverse transcriptase ( |
| cc27-3.mm.1.cary | 279 | 62 | Amaranthaceae_Aerva_lanata_H:13 | G-type lectin S-receptor-like serine/threonine-protein kinase At1g11330-like ( |
| cc1593-1.mm.1.cary | 97 | 18 | Nyctaginaceae_Allionia_incarnata2_H:13 | Uncharacterized protein LOC100259102 ( |
| cc5-17.mm.1.cary | 352 | 59 | Nyctaginaceae_Guapira_obtusata_W:13; Amaranthaceae_Atriplex_hortensis_H:13 | NBS-LRR type resistance protein ( |
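As a rough illustration of how the "No. of Tips", "No. of Taxa", and most-copies columns relate to one another, the sketch below tallies tips per taxon for a single clade. The `taxon@sequence_id` label format, the function name `summarize_clade`, and the sequence IDs are all hypothetical; the record does not describe how tips were labeled or counted.

```python
from collections import Counter


def summarize_clade(tip_labels, sep="@"):
    """Return (n_tips, n_taxa, top_taxa, top_count) for one clade.

    Assumption: each tip label has the form 'taxon<sep>sequence_id'.
    """
    if not tip_labels:
        return 0, 0, [], 0
    # Everything before the first separator is treated as the taxon name.
    counts = Counter(label.split(sep, 1)[0] for label in tip_labels)
    top_count = max(counts.values())
    # Several taxa can tie for the most copies, so return all of them.
    top_taxa = sorted(t for t, c in counts.items() if c == top_count)
    return len(tip_labels), len(counts), top_taxa, top_count


# Invented tip labels for a small toy clade (not data from the study).
tips = [
    "Nyctaginaceae_Boerhavia_coccinea_H@seq1",
    "Nyctaginaceae_Boerhavia_coccinea_H@seq2",
    "Nyctaginaceae_Boerhavia_coccinea_H@seq3",
    "Amaranthaceae_Aerva_lanata_H@seq1",
    "Amaranthaceae_Aerva_lanata_H@seq2",
    "Nyctaginaceae_Guapira_obtusata_W@seq1",
]
n_tips, n_taxa, top_taxa, top_count = summarize_clade(tips)
print(n_tips, n_taxa, top_taxa, top_count)
```

Returning every tied taxon mirrors the table rows that list two taxa with the same copy number.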
Caryophyllales Clades with the Highest Total Number of Tips.
| Clade ID | No. of Tips | No. of Taxa | Annotation |
|---|---|---|---|
| cc25-1.mm.1.cary | 361 | 68 | Cytochrome P450, family 72, subfamily A, polypeptide 15 isoform 2 ( |
| cc123-2.mm.1.cary | 354 | 65 | Oxidoreductase family protein ( |
| cc28-1.mm.1.cary | 353 | 68 | Cytochrome P450 71A25-like ( |
| cc5-17.mm.1.cary | 352 | 59 | NBS-LRR type resistance protein ( |
| cc333-1.mm.1.cary | 302 | 63 | Cytochrome P450 76C1-like ( |
| cc144-1.mm.1.cary | 287 | 68 | Peptide transporter PTR2-like ( |
| cc171-1.mm.1.cary | 285 | 61 | Zeatin O-glucosyltransferase-like ( |
| cc27-3.mm.1.cary | 279 | 62 | G-type lectin S-receptor-like serine/threonine-protein kinase At1g11330-like ( |
| cc521-1.mm.cary | 277 | 67 | GDSL esterase/lipase 1-like ( |
| cc50-2.mm.1.cary | 273 | 63 | Wall-associated receptor kinase-like 9-like ( |