Literature DB >> 25018760

A novel phylogeny and morphological reconstruction of the PIN genes and first phylogeny of the ACC-oxidases (ACOs).

Ronald M Clouse1, Nicola Carraro2.   

Abstract

The PIN and ACO gene families present interesting questions about the evolution of plant physiology, including testing hypotheses about the ecological drivers of their diversification and whether unrelated genes have been recruited for similar functions. The PIN-formed proteins contribute to the polar transport of auxin, a hormone which regulates plant growth and development. PIN loci are categorized into groups according to their protein length and structure, as well as subcellular localization. An interesting question with PIN genes is the nature of the ancestral form and location. ACOs are members of a superfamily of oxygenases and oxidases that catalyze the last step of ethylene synthesis, which regulates many aspects of the plant life cycle. We used publicly available PIN and ACO sequences to conduct phylogenetic analyses. Third codon positions of these genes in monocots have a high GC content, which could be historical but is more likely due to a mutational bias. Thus, we developed methods to extract phylogenetic information from nucleotide sequences while avoiding this convergent feature. One method consisted in using only A-T transformations, and another used only the first and second codon positions for serine, which can only take A or T and G or C, respectively. We also conducted tree-searches for both gene families using unaligned amino acid sequences and dynamic homology. PIN genes appear to have diversified earlier than ACOs, with monocot and dicot copies more mixed in the phylogeny. However, gymnosperm PINs appear to be derived and not closely related to those from primitive plants. We find strong support for a long PIN gene ancestor with short forms subsequently evolving one or more times. ACO genes appear to have diversified mostly since the dicot-monocot split, as most genes cluster into a small number of monocot and dicot clades when the tree is rooted by genes from mosses. Gymnosperm ACOs were recovered as closely related and derived.

Entities:  

Keywords:  1-aminocyclopropane-1-carboxylate oxidase; GC mutational bias; PIN-formed; ancestral reconstruction; auxin; ethylene synthesis

Year:  2014        PMID: 25018760      PMCID: PMC4071234          DOI: 10.3389/fpls.2014.00296

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


Introduction

The dramatic increase in the amount of publicly available genomic information has facilitated analyses of gene-family origins and evolution. Plant gene phylogenies have proliferated, but they are commonly made using distance matrices of amino acid sequences. This method is expected to amplify misleading information resulting from convergence (Farris, 1983; Simmons, 2000; Simmons et al., 2002), obscuring any signal of the true history. Methodological approaches to studying gene family histories need focused attention, since gene phylogenies are inherently difficult to verify. Molecular phylogenies of species can be compared to their morphology and biogeography to ask whether resulting trees are plausible. Perfect congruence is not expected, but phylogenies at extreme variance cast doubt on the quality of the methods and underlying data. Because gene copies can undergo any combination of diversification, loss, and neofunctionalization in different taxonomic lineages, nearly any sort of phylogenetic hypothesis can be the reasonable outcome of an analysis. Gene copies in the same taxon or with the same morphology would be expected to have a higher probability of being closely related, but this is not necessarily so, and it is dependent on the timing of gene family diversification events and the phylogenetic placement of progenitor copies. Here we employ phylogenetic methods designed to minimize the effects of convergence and amplify historical signal in the date. We apply these methods to the PIN-formed (PIN) auxin transporters and the ethylene-forming, ACC-oxidase enzymes (ACO). We chose to analyze these gene families because both are important plant development genes for which there are hypotheses about the timing and drivers of their diversification, but they differ in the amount of phylogenetic study thus far received. The PIN genes have been the subject of different phylogenetic analyses (Paponov et al., 2005; Křeček et al., 2009; De Smet et al., 2011; Carraro et al., 2012; Viaene et al., 2013; Bennett et al., 2014), but for ACOs no such analysis has been attempted. We thus are able to compare the results of our methods to prior analyses in PIN genes, and we lay the groundwork for a new discussion on the history of ACOs. PIN-formed proteins polarly transport the plant hormone auxin, which regulates several aspects of plant growth and development (Robert and Friml, 2009; Zazímalová et al., 2010). Since the discovery and characterization of the first PIN mutant in Arabidopsis (Okada et al., 1991), several other PINs have been identified and characterized in different plant species. In the Arabidopsis genome, there are eight PIN loci, which are categorized into groups according to their protein length and structure, as well as subcellular localization (Paponov et al., 2005; Křeček et al., 2009). The first cladistic analysis of PIN genes (Carraro et al., 2012), which was rooted by moss and lycophyte copies, suggested that PIN genes diversified mostly since the rise of land plants, around the time of the monocot-dictot split. A subsequent analysis (Viaene et al., 2013) focused on the evolution of PIN gene morphology; the preferred topology—rooted by protist, animal, bacterial genes—suggested that the moss sequence “PpPIN1D” is sister to all other PIN genes, and the morphology evolved from short forms to long. ACOs help in the synthesis of ethylene, which is a gaseous hormone under normal environmental conditions, and which regulates many aspects of the plant life cycle (Bleecker and Kende, 2000; Lin et al., 2009). In higher plants, ethylene is synthesized via two committed enzyme-catalyzed steps from S-adenosyl-L-methionine. The first step is catalyzed by 1-aminocyclopropane-1-carboxylic acid (ACC) synthase (ACS), and the second (and last) step is carried out by ACC oxidase (ACO). ACOs are members of a superfamily of oxygenases and oxidases, most of which utilize Fe (II) as a cofactor and 2-oxoglutarate (2OG) as a cosubstrate (Sato and Theologis, 1989; Bidonde et al., 1998; Wang et al., 2002). The subcellular localization of ACO proteins is prevalently cytosolic rather than membrane-bound (Chung et al., 2002; Hudgins et al., 2006; Lin et al., 2009). ACOs have high similarity throughout the protein coding sequences and expression analyses reveal that the ACO genes display a high degree of differential expression in tissues at various stages of development. A variety of plant species produce ethylene, including unicellular and multicellular algae, although angiosperms use a different biosynthetic pathway from primitive land plants and algae (Wang et al., 2002; De Paepe and Van der Straeten, 2005; Plettner et al., 2005; Yordanova et al., 2010; Wanke, 2011; Yasumura et al., 2012).

Materials and methods

Assembling of the ACO and PIN data sets

For all phylogenetic analyses, unless otherwise specified, coding sequences (CDS) were used. All taxa with publicly available sequences were included, although a random subset of all available angiosperm sequences were taken, so as to generate a manageable data set size. For the PIN data set (Table 1), reported unique sequence identifiers were used to retrieve the corresponding sequences from the Phytozome v.9.1 (www.phytozome.org) (Goodstein et al., 2012), ConGenIE (congenie.org), and Genebank (www.ncbi.nlm.nih.gov/genbank/) (Benson et al., 2005) databases. The only major plant group that was not included in the PIN data set was ferns, as previously reported PIN genes from them are not publicly available (Viaene et al., 2013). ACO sequences (Table 2) were identified from previously published studies and via queries with the BLASTn algorithm at the National Centre for Biotechnology Information (NCBI) nucleotide collection and from the Phytozome v.9.1 (Goodstein et al., 2012). Only proteins that were annotated as aminocyclopropane-carboxylate oxidases were retained. Transmembrane profiles for PIN amino acid sequences were predicted querying the TMHMM Server v.2.0 (www.cbs.dtu.dk/services/TMHMM/) (Moller et al., 2001). PIN proteins were classified (1–5) according to their length, number of transmembrane domains, and length of the central hydrophilic loop (See Table 1). Generally, caution should be exerted when classifying PIN proteins according to their number of TMDs, as those are predicted protein domains that will need final confirmation by reconstruction of the tertiary structure by crystallography. In two cases where gene sequences showed no notable differences from long forms but were predicted to have only the N-terminal trans-membrane domain (OsPIN3a and Aco018694), we coded them as long.
Table 1

.

SpeciesGene codeTerminal nameProtein (aa)TMDsTypeScore
Aquilegia caeruleaAcoGoldSmith_v1.001931m.gAco0019316549Long1
AcoGoldSmith_v1.007499m.gAco0074993569Short2
AcoGoldSmith_v1.016169m.gAco0161696209Long1
AcoGoldSmith_v1.018139m.gAco0181396419Long1
AcoGoldSmith_v1.018694m.gAco0186946125Long1
Arabidopsis thalianaAT1G73590AtPIN16229Long1
AT5G57090AtPIN26479Long1
AT1G70940AtPIN36409Long1
AT2G01420AtPIN461610Long1
AT5G16530AtPIN53519Short2
AT1G77110AtPIN65709Reduced3
AT1G23080AtPIN76199Long1
AT5G15100AtPIN83678Short2
Citrus sinensisOrange1.1g006199m.gCsi_g00619965710Long1
Orange1.1g007420m.gCsi_g0074206048Long1
Orange1.1g007826m.gCsi_g0078265888Long1
Orange1.1g018360m.gCsi_g0183603578Short2
Orange1.1g019021m.gCsi_g0190213479Short2
Orange1.1g035534m.gCsi_g0355342915N-terminal TMD only5
Orange1.1g036474m.gCsi_g0364746469Long1
Orange1.1g041301m.gCsi_g0413012917Short2
Orange1.1g048649m.gCsi_g0486492565N-terminal TMD only5
Eucalyptus grandisEucgr.A02229EgrA02229_15998Long1
Eucgr.B00948EgrB00948_15879Long1
Eucgr.B01403EgrB01403_13659Short2
Eucgr.B01405EgrB01405_13649Short2
Eucgr.B01406EgrB01406_12857Short2
Eucgr.B02902EgrB02902_16579Long1
Eucgr.C00078EgrC00078_16269Long1
Eucgr.F04265EgrF04265_15308Reduced3
Eucgr.G02187EgrG02187_16529Long1
Eucgr.G02548EgrG02548_13389Short2
Eucgr.G02549EgrG02549_13609Short2
Eucgr.H01382EgrH01382_12626Short2
Eucgr.H01390EgrH01390_15197Long1
Eucgr.01919EgrI01919_13568Short2
Eucgr.K02271EgrK02271_15989Long1
Manihot esculentaCassava4.1_003367m.gMes0033676469Long1
Cassava4.1_003794m.gMes0037946148Long1
Cassava4.1_003807m.gMes0038076148Long1
Cassava4.1_006998m.gMes0069984689Long1
Cassava4.1_010607m.gMes0106073579Short2
Cassava4.1_010688m.gMes0106883549Short2
Cassava4.1_026579m.gMes0265795987Long1
Cassava4.1_029063m.gMes0290633618Short2
Cassava4.1_029078m.gMes0290786269Long1
Cassava4.1_030090m.gMes0300903805N-terminal TMD only5
Cassava4.1_033391m.gMes0333913558Short2
Medicago truncatulaMedtr2g043210Mtr2g0432103154N-terminal TMD only5
Medtr4g154810Mtr4g1548105248Long1
Medtr6g083450Mtr6g08345065910Long1
Medtr7g008720Mtr7g0087203578Short2
Medtr7g089430Mtr7g0894303639Short2
Medtr7g106430Mtr7g1064305918Long1
Medtr8g130020Mtr8g1300206259Long1
Medtr8g130040Mtr8g13004056810Long1
Medtr4g084870Mtr4g08487065910Long1
MtrAAT48627MtrAAT486275279Long1
MtrAY115838MtrAY11583862110Long1
Oryza sativaLOC_Os06g12610OsPIN1a59510Long1
LOC_Os02g50960OsPIN1b5549Reduced3
LOC_Os11g04190OsPIN1c59210Long1
LOC_Os12g04000OsPIN1d3904C-terminal TMD only4
LOC_Os06g44970OsPIN26309Long1
LOC_Os01g45550OsPIN3a6705Long1
LOC_Os05g50140OsPIN3b59110Long1
LOC_Os01g69070OsPIN5a3637Short2
LOC_Os08g41720OsPIN5b3987Short2
LOC_Os09g32770OsPIN5c3577Short2
LOC_Os01g51780OsPIN83115Short2
LOC_Os01g58860OsPIN942610Reuced3
Physcomitrella patensPp1s10_17V6.1PpPIN1A7139Long1
Pp1s18_186V6.1PpPIN1B7139Long1
Pp1s32_43V6.1PpPIN1C6989Long1
Picea abiesFJ031883.2PaPIN169910Long1
MA_61553g0010PaPIN24263C-terminal TMD only4
MA_69724g0010PaPIN36258Long1
Populus tomentosaAAP59843.1PtoPIN16197Long1
Populus tremula × tremuloidesAF190881.1PttPIN16149Long1
AF515435.1PttPIN26409Long1
AF515434.1PttPIN35889Long1
Populus trichocarpaPOPTR_0015s04570PtrPIN16149Long1
POPTR_0016s03450PtrPIN25888Long1
POPTR_0010s12320PtrPIN36459Long1
POPTR_0005s20990PtrPIN45349Long1
POPTR_0002s07310PtrPIN55328Long1
POPTR_0008s12830PtrPIN66499Long1
POPTR_0012s04470PtrPIN76099Long1
POPTR_0006s03540PtrPIN85879Long1
POPTR_0018s13610PtrPIN96339Long1
POPTR_0001s21230PtrPIN1054710Long1
POPTR_0013s08510PtrPIN113469Short2
POPTR_0019s07990PtrPIN1234610Short2
POPTR_0004s12310PtrPIN133558Short2
POPTR_0017s11440PtrPIN143588Short2
POPTR_0014s14390aPtrPIN153708Short2
POPTR_0014s14390aPtrPIN163046Short2
Prunus persicumppa002528m.gPpe00252866210Long1
ppa002944m.gPpe0029446199Long1
ppa003159m.gPpe0031595978Long1
ppa007621m.gPpe0076213619Short2
ppa021573m.gPpe0215733577Short2
ppa022797m.gPpe0227975509Long1
ppa024134m.gPpe0241346499Long1
ppa025174m.gPpe0251746028Long1
Ricinus communisRco27985.t000045Rc279855448Long1
Rco29662.t000026Rc296626358Long1
Rco29816.t000014Rc2981664610Long1
Rco29822.t000149Rc298223137Short2
Rco30128.t000486Rc301283579Short2
Rco30180.t000054Rc301806138Long1
Selaginella moellendorrfii234325SmPIN1-16259Long1
XM_002990455.1SmPIN1-26179Long1
102666SmPIN2-16029Long1
XM_002977411.1SmPIN2-27169Long1
99301SmPIN3-16699Long1
XM_002976656.1SmPIN3-26729Long1
119024SmPIN4-16879Long1
231064SmPIN5-16369Long1
268490SmPIN5-26259Long1
Sorghum bicolorSb02g029210SbPIN13719Short2
Sb03g029320SbPIN26538Long1
Sb03g032850SbPIN33627Short2
Sb03g037350SbPIN444410Reduced3
Sb03g043960SbPIN53367Short2
Sb04g028170SbPIN660510Long1
Sb05g002150SbPIN75839Long1
Sb07g026370SbPIN84029Short2
Sb10g004430SbPIN960010Long1
Sb10g008290SbPIN106069Long1
Sb10g026300SbPIN116269Long1
Vitis viniferaGSVIVG01025748001VvPIN1a6048Long1
GSVIVG01025749001VvPIN1b5918Long1
GSVIVG01029266001VvPIN263010Long1
GSVIVG01019110001VvPIN5b3619Short2
GSVIVG01019126001VvPIN5a3569Short2
GSVIVG01010025001VvPIN65329Reduced3
GSVIVG01033005001VvPIN83578Short2
GSVIVG01031663001VvPIN94638Reuced3
Zea maysGRMZM2G098643ZmPIN1a6019Long1
GRMZM2G074267ZmPIN1b5958Long1
GRMZM2G149184ZmPIN1c5978Long1
GRMZM2G171702_T01ZmPIN1d5808Long1
JQ421085.1ZmPIN26269Long1
GRMZM2G025742ZmPIN5a3829Short2
GRMZM2G148648ZmPIN5b3857Short2
GRMZM2G040911ZmPIN5c3657Short2
GRMZM5G839411ZmPIN83597Short2
GRMZM5G859099ZmPIN943310Short2
GRMZM2G126260ZmPIN10a6108Long1
GRMZM2G160496ZmPIN10b5818Long1

These genes are distinct in GenBank but they retrieve the same entry in the phytozome database (www.phytozome.org).

Table 2

.

SpeciesGene codeTerminal name
Arabidopsis thalianaAT1G62380AtACO2
AT1G12010AtACO3
AT1G05010AtACO4
AT1G77330AtACO5
AT1G03400AtACO6
AT2G25450AtACO7
AT2G30830AtACO8
AT2G30840AtACO9
AT3G47190AtACO10
AT3G61400AtACO11
AT5G43440AtACO12
AT5G43450AtACO13
Carica papayaevm.model.supercontig_132.27CpACO1
evm.model.supercontig_64.148CpACO2
Eucalyptus grandisEucgr.K00740EgK00740
Eucgr.K00750EgK00750
Eucgr.K00746EgK00746
Eucgr.K00749EgK00749
Eucgr.K00747EgK00747
Eucgr.C00906EgC00906
Eucgr.C03886EgC03886
Eucgr.F03839EgF03839
Glycine maxGlyma07g39420Gm07g39420
Glyma17g01330Gm17g01330
Glyma09g01110Gm09g01110
Glyma15g11930Gm15g11930
Glyma14g05390Gm14g05390
Glyma02g43560Gm02g43560
Glyma06g12340Gm06g12340
Glyma04g42460Gm04g42460
Glyma05g36310Gm05g36310
Glyma07g15480Gm07g15480
Glyma08g03310Gm08g03310
Gossipium raimondiiGorai.010G184900GrACO1
Gorai.009G182300GrACO2
Gorai.004G062100GrACO3
Gorai.007G170100GrACO4
Gorai.001G096300GrACO5
Gorai.001G096400GrACO6
Gorai.001G011100GrACO7
Gorai.013G107500GrACO8
Malus domesticaMDP0000195885MdACO1
MDP0000200737MdACO2
MDP0000725984MdACO3
MDP0000251295MdACO4
MDP0000453114MdACO5
MDP0000025650MdACO6
MDP0000200896MdACO7
Oryza sativaLOC_Os02g53180Os02g53180
LOC_Os09g27750Os09g27750
LOC_Os05g05680Os05g05680
LOC_Os05g05670Os05g05670
LOC_Os01g39860Os01g39860
LOC_Os11g08380Os11g08380
LOC_Os06g37590Os06g37590
LOC_Os09g27820Os09g27820
Physcomitrella patensPp1s191_95V6PpACO1
Pp1s50_26V6PpACO2
Pp1s50_26V6PpACO3
Pp1s180_67V6PpACO4
Pp1s327_42V6PpACO5
Picea abiesMA_2297g0010PaACO1
MA_9554510g0010PaACO2
MA_10431299g0010PaACO3
MA_10437223g0010PaACO4
MA_54476g0010PaACO5
Picea glaucaDQ480741PgACO1
Picea sitchensisDQ480740PsiACO1
ABR17770PsiACO2
Pinus pinasterCBL95267PpiACO1
Pinus taedaGQ258776PtdaACO1
GQ258775PtdaACO2
GQ258774PtdaACO3
Pisum sativumAB128037PsACO1
Populus trichocarpaPotri002G224100PtACO1
Potri004G003000PtACO2
Potri011G020900PtACO3
Potri014G159000PtACO4
Potri.002G078600PtACO5
Potri.005G182700PtACO6
Potri.006G151600PtACO7
Pseudotsuga menziesiiABF20554PsmACO1
Selaginella moellendorffii116993SmACO1
407386SmACO2
169250SmACO3
228878SmACO4
117056SmACO5
402706SmACO6
Solanum lycopersicumSolyc06g060070SlACO1
Solyc12g005940SlACO2
Solyc07g049550SlACO3
Solyc07g026650SlACO4
Solyc07g049530SlACO5
Solyc02g081190SlACO6
Solyc02g036350SlACO7
Sorghum bicolorSb02g026280Sb02g026280
Sb05g005710Sb05g005710
Sb05g005720Sb05g005720
Sb09g003790Sb09g003790
Sb09g003800Sb09g003800
Sb10g022640Sb10g022640
Sb04g034520Sb04g034520
Zea maysGRMZM2G052422Zm2G052422
GRMZM2G072529Zm2G072529
GRMZM2G126732_T02Zm2G126732
GRMZM2G164883Zm2G164883
GRMZM2G166616Zm2G166616
GRMZM2G166639_T01Zm2G166639_T01
GRMZM2G166639_T02Zm2G166639_T02
GRMZM2G332423Zm2G332423
. These genes are distinct in GenBank but they retrieve the same entry in the phytozome database (www.phytozome.org). .

Alignment and phylogenetic analysis

The moss PIN gene “PpPIN1D” (Gene Code “Pp1s79_126V6.1”) was excluded from analysis due to suspicion of it being a pseudogene; although not the most distant gene in the data set, it has only about half the number of nucleotides relative to other moss PIN genes (with the gaps appearing in the middle), and in preliminary phylogenies it was recovered on a very long branch (usually twice as long as its sister). This can also be seen in our previous phylogeny of PIN genes (Figure 2A in Carraro et al., 2012). Likewise, we did not include purported PIN homologs from non-plants (Viaene et al., 2013), as we had no evidence for their homology. We aligned the amino acids with ClustalW2 (Larkin et al., 2007) in SeaView (Gouy et al., 2010) (-gapopen parameter set to 15) and then back-translated the amino acids to nucleotides (Figures 1A,B). The resulting nucleotide alignments for the PIN and ACO data sets are available as supplemental material. SeaView retains original nucleotide data through the process of amino acid alignment and thus allows an accurate back-translation after alignment. SeaView also has the alignment program MAFFT (Katoh et al., 2009) implemented, but, with the -gapopen parameter defined, it failed to run with our data set. Clustal misaligned one of the short genes, which was fixed by hand, although tree-searching on the uncorrected version of the alignment resulted in nearly identical topologies. We noticed that monocots tended to have a very high GC content, especially in the third codon position (Figure 2). Thus, using the program Mega v. 4.0.2 (Tamura et al., 2007), we removed the third codon positions of these nucleotide alignments, and in the program BioEdit (Hall, 2007) we replaced G and C with N (Figures 1C,D). Also in BioEdit, we translated nucleotide alignments and replaced all amino acids but Serine with N (Figure 1G); this required several steps, as BioEdit uses a default codon translation for amino acids in back-translation. We also used the program Mesquite (Maddison and Maddison, 2009) to convert fasta files into files that can be read by the phylogenetic program TNT (Goloboff et al., 2008). (Mesquite changes N to “?” automatically when generating TNT files and so these were returned to N using the search-and-replace function in BioEdit.) We used TNT because it clearly reads N as “any nucleotide,” preserving the gap information. We analyzed the same alignments under likelihood bootstrap in the program RAxML (Stamatakis et al., 2008) on the Cipres (Miller et al., 2010) computing cluster. RAxML does not recognize gaps and treats them and N as simply missing data.
Figure 1

Different alignments used in this study. Alignments began by translating coding sequences and aligning them with ClustalW2 (“gapopen” option set to 15), and then back-translating them into nucleotides (A,B). From these only the 1st and 2nd codon positions were taken (C), and then all G and C nucleotides converted to “N” (D), which is read as “any nucleotide” in the phylogenetic program TNT. This alignment then had gappy regions removed by the program Gblocks (E) or had all of its GC bases and gaps converted to “?” (F), which is read as “any nucleotide or gap” in TNT. The alignment in B also had all codon positions except those coding for Serine converted to “N”, then the 3rd codon positons removed (G). This also had all Ns and gaps converted to “?” (H).

Figure 2

Frequencies of nucleotides (T, C, A, and G) at each codon position (1–3) for 151 . Bars represent 95% confidence intervals.

Different alignments used in this study. Alignments began by translating coding sequences and aligning them with ClustalW2 (“gapopen” option set to 15), and then back-translating them into nucleotides (A,B). From these only the 1st and 2nd codon positions were taken (C), and then all G and C nucleotides converted to “N” (D), which is read as “any nucleotide” in the phylogenetic program TNT. This alignment then had gappy regions removed by the program Gblocks (E) or had all of its GC bases and gaps converted to “?” (F), which is read as “any nucleotide or gap” in TNT. The alignment in B also had all codon positions except those coding for Serine converted to “N”, then the 3rd codon positons removed (G). This also had all Ns and gaps converted to “?” (H). Frequencies of nucleotides (T, C, A, and G) at each codon position (1–3) for 151 . Bars represent 95% confidence intervals. For searches in TNT, we used the “new technology” function (which combines several strategies for exploring tree space) with 100 initial builds, and we followed with calculations of bootstrap support using 1000 pseudoreplicates. We took the strict consensus of the shortest trees, which is reported here. For searches of amino acids in POY, we started with 100 Wagner tree builds and conducted SPR and TBR swapping, selecting the shortest trees (with zero-length branches collapsed) and reporting the strict consensus. Bootstrap support, which is a measure of the redundancy of signal for optimal clades, was not expected to be high, given the nature of these analyses. Single-gene phylogenies with large portions of their content removed to avoid convergence are unlikely to contain enough information to support every clade in resampling analyses, but we do report resampling support for the largest alignments (all 1st and 2nd codon positions, with GC replaced by N). To mimic the way RAxML treats gaps and missing data, we also replaced all Ns and gaps with “?” (Figures 1F,H) and reran them in TNT, which reads “?” as “any nucleotide or gap.” We also removed gappy regions in the program Gblocks (Castresana, 2000) (Figure 1E), replacing the remaining gaps (which were allowed in half of the positions) with N. We also searched the unaligned amino acid sequences in the phylogenetic program POY v. 4 and 5 (Varón et al., 2009), which optimizes the multiple sequence alignment and tree searching simultaneously. When completed, PIN trees were uploaded to Mesquite with a character matrix of their protein lengths (Figure 3; coded as 1–5), and parsimony ancestral reconstructions traced over the trees. Characters were treated as unordered.
Figure 3

The different morphologies postulated for PIN proteins. PIN proteins are classified into 5 groups according to their length and structure. PINs with two complete TMDs and a long (A), reduced (B), or short (C) central hydrophylic loop. PINs with reduced protein length and presence of a TMD at the C-terminal (D), or N-Terminal end (E) only.

The different morphologies postulated for PIN proteins. PIN proteins are classified into 5 groups according to their length and structure. PINs with two complete TMDs and a long (A), reduced (B), or short (C) central hydrophylic loop. PINs with reduced protein length and presence of a TMD at the C-terminal (D), or N-Terminal end (E) only.

Results

PIN phylogenies recovered from a variety of sequence alignments and under parsimony or likelihood reconstructed the evolution of these genes from long to short (Figures 4, 5). The phylogeny recovered with full gap information under parsimony showed a clearer evolution from long, through intermediate, and to short forms, and the short versions were recovered as two clades under likelihood and parsimony when gaps were replaced by “?” or gappy regions removed by Gblocks (Figures 4B, 5A,B). Although several smaller clades remained stable throughout the analysis, and the moss and most of the lycophyte sequences tended to remain as sister to the remaining PIN genes, the relationships among the major lineages were generally unresolved. Genes having the trans-membrane domain only on the C-terminal end appeared to have evolved from long-form PINs, possibly twice, and those having this domain only on the N-terminal appear to have evolved from short-form genes, perhaps more than once. Monocot PINs were recovered mostly in small stable clades mixed among the dicot genes. Lycophyte genes were monophyletic under likelihood and under parsimony when the alignment was trimmed by Gblocks (Figure 5B); otherwise they tended to form a paraphyletic grade at the base of the tree, near the Pyscomitrella patens (moss) genes, or several small clades, only some of which were near the moss genes. Gymnosperm genes were found monophyletic only under likelihood (Figure 4B), and at the base of the tree, after the lycophytes, under parsimony (Figure 4A).
Figure 4

Phylogeny of the PIN genes, using only the 1st and 2nd codon positions and with al Gs and Cs converted to “N.” The strict consensus of 85 equally parsimonious trees (A), and the most likely tree under the GTR model (B). Branches are colored according to gene morphology, with parsimony-based historical reconstructions in both trees; equally parsimonious ancestral reconstructions are shown by multi-colored branches. Terminals are colored according to plant taxon, with icons used as guides. Clades recovered under the likelihood optimality criterion (B) which were also recovered under parsimony (A) are noted with the letter “p.”

Figure 5

Strict consensuses of equally parsimonious trees using various alignments of the PIN gene data set with only 1st and 2nd codon positions. With Gs and Cs removed, the influence of gaps was minimized so as to mimic how gaps and Ns are treated in likelihood programs by replacing them with “?” (A). Alternatively, the alignment had gappy regions removed using the program Gblocks, and then all Gs, Cs, and remaining gaps replaced by “?” (B). A Serine-only alignment was also used, with only 1st and 2nd positions and all gaps and non-Serine postions replaced with “?” (C).

Phylogeny of the PIN genes, using only the 1st and 2nd codon positions and with al Gs and Cs converted to “N.” The strict consensus of 85 equally parsimonious trees (A), and the most likely tree under the GTR model (B). Branches are colored according to gene morphology, with parsimony-based historical reconstructions in both trees; equally parsimonious ancestral reconstructions are shown by multi-colored branches. Terminals are colored according to plant taxon, with icons used as guides. Clades recovered under the likelihood optimality criterion (B) which were also recovered under parsimony (A) are noted with the letter “p.” Strict consensuses of equally parsimonious trees using various alignments of the PIN gene data set with only 1st and 2nd codon positions. With Gs and Cs removed, the influence of gaps was minimized so as to mimic how gaps and Ns are treated in likelihood programs by replacing them with “?” (A). Alternatively, the alignment had gappy regions removed using the program Gblocks, and then all Gs, Cs, and remaining gaps replaced by “?” (B). A Serine-only alignment was also used, with only 1st and 2nd positions and all gaps and non-Serine postions replaced with “?” (C). ACO genes clustered more distinctly by taxonomic group, under both parsimony and likelihood, except for the moss and lycophyte genes, which were mixed at the base of the tree (Figure 6). Gymnosperm ACOs were recovered as closely related but not near the basal plants. Under parsimony gymnosperms ACOs were recovered as monophyletic, with the exception of the Norway spruce gene PaABO4, and under likelihood they were recovered in two clades that formed a paraphyletic grade between two angiosperm diversifications. In both analysis, ACOs of monocotyledonous species (Oryza sativa, Sorghum bicolor, and Zea mays) cluster into three groups of closely related copies; under likelihood these groups are each monophyletic. Genes resulting from duplications are recovered as closest in both analysis (for example Sb05g005710 and Sb05g005720). ACOs from dicotyledonous species form several clades or paraphyletic grades, each containing copies from a mixture of species, except one which contains exclusively Arabidopsis sequences. As in monocots, dicot species that underwent relatively recent whole genome duplications (Malus domestica, Populus thricocarpa) present tightly related gene copies with high sequence similarity.
Figure 6

Phylogeny of the The strict consensus of 42 equally parsimonious trees is shown in (A), and the most likely tree under the GTR model is shown in (B). Terminals are colored according to plant taxon. Clades recovered under the likelihood optimality criterion (B) which were also recovered under parsimony (A) are noted with the letter “p.” Scale bar equals per-site mutation probability.

Phylogeny of the The strict consensus of 42 equally parsimonious trees is shown in (A), and the most likely tree under the GTR model is shown in (B). Terminals are colored according to plant taxon. Clades recovered under the likelihood optimality criterion (B) which were also recovered under parsimony (A) are noted with the letter “p.” Scale bar equals per-site mutation probability. Trees recovered under dynamic homology using amino acids were less organized by taxonomic group than those recovered using the nucleotide alignments above (Figure 7). With PIN genes, copies from the moss P. patens remained sister to all other PINs, but the lycophyte copies were recovered throughout the rest of the tree. This would make any ancestral reconstruction of gene length ambiguous, as the lycophyte copies are classified as long. With the ACOs, the mixing of gene copies by taxonomic group occurred mostly at the base of the tree, although it was more complete.
Figure 7

The most parsimonious reconstructions of the . Terminals are colored according to plant taxon.

The most parsimonious reconstructions of the . Terminals are colored according to plant taxon. Tree lengths, numbers, and likelihoods are provided in Table 3. The full alignment of 1st and 2nd codon positions, with all GC content replaced by N resulted in a large number of equally parsimonious trees for PINs and ACOs, mostly due to equally parsimonious resolutions among the main lineages in PINs, and among members of small, derived clades in ACOs. These unstable regions of each gene family's phylogeny also received little bootstrap support (Figures S1, S2).
Table 3

The number of most parsimonious trees found and their lengths, or the log likelihoods, for each of the alignments searched.

Gene familyAlignment and figure*No. treesLengthsLikelihood
PIND (Figure 4A)8511,745
D (Figure 4B)−12279.865284
F (Figure 5A)12960
E (Figure 5B)2843
H (Figure 5C)1494
A (Figure 7A)122,706
ACOD (Figure 6A)424215
D (Figure 6B)−6064.187817
A (Figure 7B)18688

Alignment codes (D, F, H, etc.) follow those illustrated in Figure 1, and the figure showing the resulting tree is noted.

Alignments, illustrated in Figure 1:

(A) Amino acids.

(D) G and C replaced by N.

(E) Gappy regions removed by g-blocks, remaining gaps replaced by “?.”

(F) G, C, and gaps replaced by “?.”

(H) Non-Serine and gaps replaced by “?.”

The number of most parsimonious trees found and their lengths, or the log likelihoods, for each of the alignments searched. Alignment codes (D, F, H, etc.) follow those illustrated in Figure 1, and the figure showing the resulting tree is noted. Alignments, illustrated in Figure 1: (A) Amino acids. (D) G and C replaced by N. (E) Gappy regions removed by g-blocks, remaining gaps replaced by “?.” (F) G, C, and gaps replaced by “?.” (H) Non-Serine and gaps replaced by “?.”

Discussion

We find strong support for shorter PINs evolving during the diversification of angiosperms, and the basal plants (here represented by the moss Physcomitrella patens and lycophyte Selaginella moellendorfii) retaining only long PINs. Whether originating more than once or not, short PINs have evolved more recently and their number increased in monocots and dicots following genome duplications in species such as Oryza Sativa, Zea mays, Populus trichocarpa. This finding contradicts that of Viaene et al. (2013). The difference results from how our phylogenies are rooted, and rooting rests homology assessments, alignment methods, the way phylogenetic programs handle missing data, and assumptions about whether primitive plants are more likely to retain primitive gene copies. There are several tests for homology (Patterson, 1988), and the only one presently available for analyzing gene families is similarity; thus for highly dissimilar sequences some justification should be offered for their inclusion. To illustrate how divergent the outgroups of Viaene et al. (2013) are, their supposed PIN homolog from ants has a shorter uncorrected p-distance relative to the ingroups when reversed than in its original direction. Homology for the animal sequences cannot be justified based on auxin transport, since animals do not have auxin, so it seems likely such sequences are simply not homologous. For such divergent sequences, even if homologous, they will generate large gappy sections in the alignment, only some of which can be removed by hand, and they will likely root randomly. Another source of alignment difficulties is the partial nature of the algal PIN sequences which Viaene et al. (2013) took from ESTs. If the phylogenetic programs they employed treated gaps and missing data the same, algal sequences among the outgroups would be inclined to pull short PIN genes to the base of the tree. Indeed, it is not surprising that non-homologous and partial sequences in the outgroups attracted the strange and truncated PpPIN1D sequence to the outside of the ingroups—more through shared exclusion than similarity—thus strengthening the appearance of a short-to-long evolution of PIN genes. In the absence of a suitable outgroup for plant PIN genes, we have simply made the assumption that plants recovered as sister to all the others (what one might call “primitive” or “basal” plants) carry gene copies that are also likely to be sister to all the others. This may not be correct, as evidenced by the placement of some gymnosperm copies as more derived than angiosperm copies. However, perhaps the most stable and supported relationship we recovered with PINs and ACOs was the clear distinction between moss (and usually lycophyte) copies and all the others. Thus, these gene families either diversified very early, and mosses and lycophytes retained only the most derived copies, or, more parsimoniously, the gene families simply diversified after the rise of spermatophytes. In any case, we note that the reconstruction of a long-to-short evolution of PIN genes is not merely the result of putting moss copies as the outgroup, for even in generally unresolved trees, like the one recovered from the alignment trimmed of gaps by Gblocks (Figure 5B), the main clades of short-form PINs are closely related to each other and are derived from angiosperm long forms. Only rerooting specifically by a short-form copy would change this, and in reconstructions with two origins of short-form copies, one clade would remain derived. A recent phylogenetic analysis of PIN genes done independently of the analysis here (Bennett et al., 2014) uses a similar methodological approach and obtains results that are broadly congruent with our previous analysis (Carraro et al., 2012) and the analysis here. Bennett et al. (2014) use nucleotide sequences and root by bryophytes, and they recover the odd PpPIND in a derived position on a long branch. They also find multiple, later origins for short (or more specifically, what they term “non-canonical”) forms of PIN genes. It thus appears that whether improvements are made to PIN phylogenies by adding more sequences (Bennett et al., 2014) or excluding sources of convergence in the data, as we do here, PIN genes increasingly seem to have undergone shortening events multiple times. In our phylogenies it appears that most modern ACOs arose subsequent to the monocot-dicot split [140–150 Ma ago, during the late Jurassic-early Cretaceous (Chaw et al., 2004)], but PIN genes diversified much earlier, as evidenced by their more thorough historical mixture of monocot and dicot copies. A broad diversification of ACOs during the Mesozoic is later than was previously hypothesized by John (1997), who did not consider them present in primitive land plants and believed their appearance was necessitated by droughts at the end of certain Permian periods (the Devonian at 360 Ma ago and the Carboniferous at 300 Ma ago). It appears that multiple ACO copies were present during the Permian and before the split between gymnosperms and angiosperms, and even before the split between mosses and lycophytes, but the later proliferation of copies in angiosperms requires a new ecological driver besides droughts in the Permian. Gymnosperm PINs and ACOs appear to have derived from angiosperm gene copies. This is not corrected simply by using a gymnosperm root, since that renders moss and lycophyte copies derived. Rather it indicates that for both PINs and ACOs the ancestral copy in gymnosperms was a more derived copy than some of the copies inherited and retained in the ancestral angiosperm. A very limited number of gymnosperm sequences are available for both gene families, so the possibility remains that the history of gymnosperm sequences will become clearer as more of them are included in future analyses. Amino acid sequences should present problems for historical reconstruction (Simmons, 2000), despite their popularity in plant gene family phylogenies. Genetic code degeneracy and selection pressure on protein function are sources of convergence, and although amino acids may correct for back-mutations in the third codon position, which can be another source of convergence, ignoring the third codon position removes information on recent divergences. Using amino acids with phenetic tree-building methods like Neighbor-Joining (algorithms that cluster sequences by overall similarity) (Saitou and Nei, 1987) will likely amplify convergence in amino acids, and using them with probabilistic optimality criteria requires a model of evolution both for the alignment step and tree-searching. We find here that trees made with amino acids and the most agnostic cladistic method for optimizing alignment and tree-searching (dynamic homology) produced trees with little in their favor relative to the trees made from GC-free nucleotide alignments, and we would not recommend using amino acids for future historical studies of gene families. Although a few monocot PIN and ACO seqeunces do not have a high GC content, it seems likely that this quality is the result of an ongoing substitution bias and thus a source of misleading, convergent signal. Monocot copies are not monophyletic, and almost all of them present very high GC content, which argues against this being the result of a historical event no longer maintained in monocots. For example, in PINs almost half of the monocot copies have GC content at the third codon position above 50%, and all but four are over 40%. We notice in Meister and Barow's survey (2007) that monocots in general have statistically significantly higher GC content than dicots (using a t-test of the GC percentages, p < 0.001), and that they attain a maximum content around 50%, about 10% higher than the maximum GC content of dicots. However on average monocot genomes have only about 1% more GC content than dicots. We notice that the most GC-rich species in Meister and Barow (2007) are the grasses, which constitute all the monocots in our data set. Given how strong this bias appears in grass ACOs and PINs, it was perhaps trivial for us to retrieve monocot clades in our previous PIN phylogeny (Carraro et al., 2012), which was based on all three codon positions and did not have GC content removed. However, we still recover most monocots together (either still in clades or paraphyletic grades) and in the same combinations here as before. For example, the monocot group of nine short-form genes that includes OsPIN5b and SbPIN5 was recovered both previously and here (Figure 4B), but previously these were monopheletic, and here they form a grade out of which diversifies a clade of short-form and N-terminal-TMD-only genes. Even using the alignment that had only the first and second positions of serine, which should be immune from a GC mutational bias, we recovered clusters of monocots, a close relationship among primitive land plants, and a clade of mostly short PIN genes (Figure 5C). Here we present new phylogenies for PIN and ACO genes, after working to improve the methods used to reconstruct the histories of gene families. First, we avoid the use of amino acids and distance (phenetic) algorithms, which have the potential to convey and amplify homoplasy. Next, in a further attempt to avoid homoplasy among genes with similar lengths and GC content, we avoid the use of indels and treat Gs and Cs or their transformations as missing data. We root trees by gene copies found in bryophytes, and we exclude sequences which are not clearly homologous. The results suggest an evolution from long to short PINs, perhaps multiple times, and a diversification of ACOs mostly after the dicot-monocot split. More sequences from a wider taxonomic range for these gene families are welcome for the continued development of their phylogenetic hypotheses and a deeper understanding of their histories.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
  32 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

Review 2.  Ethylene biosynthesis and signaling networks.

Authors:  Kevin L-C Wang; Hai Li; Joseph R Ecker
Journal:  Plant Cell       Date:  2002       Impact factor: 11.277

Review 3.  The PIN auxin efflux facilitators: evolutionary and functional perspectives.

Authors:  Ivan A Paponov; William D Teale; Martina Trebar; Ikram Blilou; Klaus Palme
Journal:  Trends Plant Sci       Date:  2005-04       Impact factor: 18.313

Review 4.  Recent advances in ethylene research.

Authors:  Zhefeng Lin; Silin Zhong; Don Grierson
Journal:  J Exp Bot       Date:  2009-06-30       Impact factor: 6.992

Review 5.  The ABA-mediated switch between submersed and emersed life-styles in aquatic macrophytes.

Authors:  Dierk Wanke
Journal:  J Plant Res       Date:  2011-06-15       Impact factor: 2.629

6.  The neighbor-joining method: a new method for reconstructing phylogenetic trees.

Authors:  N Saitou; M Nei
Journal:  Mol Biol Evol       Date:  1987-07       Impact factor: 16.240

7.  Involvement of ethylene and nitric oxide in cell death in mastoparan-treated unicellular alga Chlamydomonas reinhardtii.

Authors:  Zhenya P Yordanova; Elena T Iakimova; Simona M Cristescu; Frans J M Harren; Veneta M Kapchina-Toteva; Ernst J Woltering
Journal:  Cell Biol Int       Date:  2010-02-22       Impact factor: 3.612

8.  Requirement of the Auxin Polar Transport System in Early Stages of Arabidopsis Floral Bud Formation.

Authors:  K. Okada; J. Ueda; M. K. Komaki; C. J. Bell; Y. Shimura
Journal:  Plant Cell       Date:  1991-07       Impact factor: 11.277

9.  Unraveling the evolution of auxin signaling.

Authors:  Ive De Smet; Ute Voss; Steffen Lau; Michael Wilson; Ning Shao; Ruth E Timme; Ranjan Swarup; Ian Kerr; Charlie Hodgman; Ralph Bock; Malcolm Bennett; Gerd Jürgens; Tom Beeckman
Journal:  Plant Physiol       Date:  2010-11-16       Impact factor: 8.340

Review 10.  The PIN-FORMED (PIN) protein family of auxin transporters.

Authors:  Pavel Krecek; Petr Skupa; Jirí Libus; Satoshi Naramoto; Ricardo Tejos; Jirí Friml; Eva Zazímalová
Journal:  Genome Biol       Date:  2009-12-29       Impact factor: 13.583

View more
  5 in total

Review 1.  Producing the Ethylene Signal: Regulation and Diversification of Ethylene Biosynthetic Enzymes.

Authors:  Matthew A Booker; Alison DeLong
Journal:  Plant Physiol       Date:  2015-07-01       Impact factor: 8.340

2.  Different Preclimacteric Events in Apple Cultivars with Modified Ripening Physiology.

Authors:  Vikram Singh; Asya Weksler; Haya Friedman
Journal:  Front Plant Sci       Date:  2017-09-05       Impact factor: 5.753

Review 3.  1-Aminocyclopropane-1-Carboxylic Acid Oxidase (ACO): The Enzyme That Makes the Plant Hormone Ethylene.

Authors:  Maarten Houben; Bram Van de Poel
Journal:  Front Plant Sci       Date:  2019-05-29       Impact factor: 5.753

Review 4.  The role of auxin transporters in monocots development.

Authors:  Sara Balzan; Gurmukh S Johal; Nicola Carraro
Journal:  Front Plant Sci       Date:  2014-08-15       Impact factor: 5.753

Review 5.  The Control of Auxin Transport in Parasitic and Symbiotic Root-Microbe Interactions.

Authors:  Jason Liang Pin Ng; Francine Perrine-Walker; Anton P Wasson; Ulrike Mathesius
Journal:  Plants (Basel)       Date:  2015-08-24
  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.