Ernest K Lee, Angelica Cibrian-Jaramillo, Sergios-Orestis Kolokotronis, Manpreet S Katari, Alexandros Stamatakis, Michael Ott, Joanna C Chiu, Damon P Little, Dennis Wm Stevenson, W Richard McCombie, Robert A Martienssen, Gloria Coruzzi, Rob Desalle.
Abstract
This study develops and implements a novel functional phylogenomic approach to explore the genomic origins of seed plant diversification. We first used 22,833 sets of orthologs from the nuclear genomes of 101 genera across land plants to reconstruct their phylogenetic relationships. Among the most salient results is the resolution of several enigmatic relationships in seed plant phylogeny, such as the placement of Gnetales as sister to the remaining gymnosperms. Using this approach, we also identified overrepresented gene ontology (GO) functional categories among genes that provide positive branch support for major nodes, prompting new hypotheses about genes associated with the diversification of angiosperms. For example, the analysis suggests that RNA interference (RNAi) played a significant role in the divergence of monocots from other angiosperms, a hypothesis with experimental support in Arabidopsis and rice. The analysis also implies that the second-largest subunit of RNA polymerases IV and V (NRPD2) played a prominent role in the divergence of gymnosperms. This hypothesis is supported by the lack of 24-nt siRNAs in conifers, the maternal control of small RNAs in the seeds of flowering plants, and the emergence of double fertilization in angiosperms. Our approach takes advantage of genomic data to define orthologs, reconstruct relationships, and narrow down candidate genes involved in plant evolution within a phylogenomic view of species diversification.
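Which genes "provide positive branch support" for a node can be illustrated with partitioned Bremer support (PBS). In a minimal sketch, assuming PBS here means partitioned Bremer support, the PBS of a gene partition at a node is its parsimony length on the shortest tree lacking that node minus its length on the optimal tree. A positive value means the gene favors the node, and the values across partitions sum to the node's total Bremer support. The step counts, gene names, and helper functions below are hypothetical illustrations, not data or code from this study.

```python
from typing import Dict, List


def partitioned_bremer_support(
    steps_optimal: Dict[str, int],
    steps_constrained: Dict[str, int],
) -> Dict[str, int]:
    """Hypothetical helper: PBS of each gene partition at one node.

    steps_optimal: parsimony steps of each partition on the optimal tree.
    steps_constrained: steps of each partition on the shortest tree
    that does NOT contain the node of interest.
    """
    return {
        gene: steps_constrained[gene] - steps_optimal[gene]
        for gene in steps_optimal
    }


def positive_support_genes(pbs: Dict[str, int]) -> List[str]:
    """Hypothetical helper: genes whose partition favors the node (PBS > 0)."""
    return sorted(gene for gene, value in pbs.items() if value > 0)


# Hypothetical step counts for three gene partitions at a single node.
optimal = {"geneA": 120, "geneB": 85, "geneC": 60}
constrained = {"geneA": 126, "geneB": 83, "geneC": 60}

pbs = partitioned_bremer_support(optimal, constrained)
print(pbs)                          # {'geneA': 6, 'geneB': -2, 'geneC': 0}
print(sum(pbs.values()))            # 4 (total Bremer support for the node)
print(positive_support_genes(pbs))  # ['geneA']
```

Only partitions with strictly positive PBS are kept. Genes with zero or negative PBS are neutral or conflict with the node, so they would not enter a downstream functional enrichment analysis.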
Year: 2011 PMID: 22194700 PMCID: PMC3240601 DOI: 10.1371/journal.pgen.1002411
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Maximum Likelihood Phylogram of the Genus-Only Alignment of 101 Taxa with at Least 30% Representation per Partition (ML-30) Using the GTR Substitution Matrix and the CAT Model of Among-Site Rate Heterogeneity.
A taxon with a genus-only label represents multiple species of that genus. ML and MP bootstrap support percentages are color-coded. ML node values indicate the percentage of rapid bootstrap pseudoreplicates containing the nodes of the best ML tree. MP values correspond to the bootstrap proportions on the 50% majority-rule consensus tree. The red bars represent the relative number of genes per species represented in this matrix. The most-represented species in the full matrix (by number of genes) is Glycine max (10,071 genes), and the least-represented species is Puccinellia tenuiflora (173 genes; see Table S1). The median number of gene partitions in which a taxon is represented is 2,071. This ML-30 tree comprises 101 taxa (genera) and is derived from 2,970 gene partitions and 1,660,883 characters. Outgroups include the ferns Adiantum and Ceratopteris, the mosses Physcomitrella and Tortula, and the liverwort Marchantia. The estimated GTR substitution matrix is provided in Table S5.
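The ML support values defined above are the percentage of bootstrap pseudoreplicates containing each node of the best tree. That definition can be sketched by representing every internal branch as a bipartition of the taxa. In this minimal sketch, trees are hypothetical and given directly as lists of splits. Parsing Newick output from an ML program is assumed to happen elsewhere and is not attempted here. The function names `canonical` and `bootstrap_support` are illustrative only.

```python
from typing import FrozenSet, Iterable, List, Set


def canonical(clade: Iterable[str], taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Orient an unrooted split by keeping the side that excludes a reference taxon."""
    side = frozenset(clade)
    reference = min(taxa)  # alphabetically first taxon; used only for orientation
    return taxa - side if reference in side else side


def bootstrap_support(
    best_tree: List[Set[str]],
    replicates: List[List[Set[str]]],
    taxa: FrozenSet[str],
) -> List[float]:
    """Percentage of replicate trees containing each split of the best tree."""
    replicate_splits = [{canonical(c, taxa) for c in tree} for tree in replicates]
    support = []
    for clade in best_tree:
        split = canonical(clade, taxa)
        hits = sum(split in splits for splits in replicate_splits)
        support.append(100.0 * hits / len(replicates))
    return support


# Hypothetical 5-taxon example: each tree is listed by its internal splits.
taxa = frozenset("ABCDE")
best = [{"A", "B"}, {"D", "E"}]
replicates = [
    [{"A", "B"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A", "C"}, {"D", "E"}],
    [{"A", "B"}, {"C", "E"}],
]
for clade, pct in zip(best, bootstrap_support(best, replicates, taxa)):
    print("".join(sorted(clade)), pct)
# AB 75.0
# DE 50.0
```

Canonical orientation matters because an unrooted split such as AB|CDE can be written from either side. The MP values in the caption are instead read from a 50% majority-rule consensus tree, which this sketch does not build.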
Figure 2. Select Overrepresented GO/MIPS Categories of Genes with Positive PBS at Major Nodes.
Genes in these GO/MIPS categories are statistically overrepresented among genes with positive support for the corresponding clades, implying that these genes may be of particular functional importance in the evolution of those clades. Only gene categories mentioned in the main text are shown. For the full list of overrepresented categories at each node, see Table S3.
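The caption does not say which test was used to call a category overrepresented. As a hedged illustration only, one common choice is the one-sided hypergeometric test. Suppose there are N background genes, K of them in a GO/MIPS category, and n genes with positive PBS at a node, of which k are in the category. The p-value is then the chance of drawing at least k category genes at random:

P(X >= k) = sum over i from k to min(n, K) of C(K, i) * C(N - K, n - i) / C(N, n)

The function name and counts below are hypothetical. The sketch uses exact integer binomials from the standard library.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in category, n drawn).

    N: background genes; K: background genes annotated to the category;
    n: genes with positive support at the node; k: of those, in the category.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20 background genes, 5 in the category,
# 4 positive-support genes at the node, 3 of which are in the category.
p = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 0.032
```

When many categories are tested at each node, the p-values would normally also need a multiple-testing correction. This sketch does not apply one.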
Figure 3. Distribution Map of Overrepresented GO Terms per Node.
Each GO/MIPS category is shown along the upper axis. Color gradients show differences in the proportions of these genes: red marks the category with the highest counts, light blue the lowest counts, and black no match to any category. Overrepresentation is estimated on a per-node basis. The reference tree is based on the MP-30 phylogenetic tree and can be used to locate the relative position of the node represented by each heat map row. The node numbering here corresponds to the node labels in Figure S7. The heat map was constructed based on the Arabidopsis genome (source: http://noble.gs.washington.edu/prism, accessed February 2009).
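One heat map row per node can be sketched as the share of that node's positive-support genes falling in each category, with a final column for genes that match no category. The caption does not give the exact normalization, so several choices here are assumptions: dividing by the node's gene count, and counting a multiply annotated gene once in each of its categories. The gene IDs, category labels, and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def node_category_proportions(
    node_genes: Dict[str, List[str]],
    annotation: Dict[str, Set[str]],
    categories: List[str],
) -> Dict[str, List[float]]:
    """Hypothetical helper: one heat-map row per node.

    Each value is the fraction of the node's genes annotated to a category;
    genes with no matching category go to a final "no match" column.
    A gene annotated to several categories is counted once in each.
    """
    rows = {}
    for node, genes in node_genes.items():
        counts: Counter = Counter()
        for gene in genes:
            matched = annotation.get(gene, set()) & set(categories)
            if matched:
                counts.update(matched)
            else:
                counts["no match"] += 1
        total = len(genes) or 1  # avoid dividing by zero for an empty node
        rows[node] = [counts[c] / total for c in categories + ["no match"]]
    return rows


# Hypothetical annotations and per-node positive-support gene lists.
categories = ["RNAi", "transcription"]
annotation = {"g1": {"RNAi"}, "g2": {"RNAi", "transcription"}, "g3": {"transcription"}}
node_genes = {"node1": ["g1", "g2", "g4"], "node2": ["g3"]}

for node, row in node_category_proportions(node_genes, annotation, categories).items():
    print(node, [round(x, 2) for x in row])
# node1 [0.67, 0.33, 0.33]
# node2 [0.0, 1.0, 0.0]
```

The color scale (red for the highest values, light blue for the lowest, black for no match) would be applied to these rows by a plotting library. The sketch does not attempt that step.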