Identification, nomenclature, and evolutionary relationships of mitogen-activated protein kinase (MAPK) genes in soybean.

Achal Neupane, Madhav P Nepal, Sarbottam Piya, Senthil Subramanian, Jai S Rohila, R Neil Reese, Benjamin V Benson.

Abstract

Mitogen-activated protein kinase (MAPK) genes in eukaryotes regulate various developmental and physiological processes including those associated with biotic and abiotic stresses. Although MAPKs in some plant species including Arabidopsis have been identified, they are yet to be identified in soybean. Major objectives of this study were to identify GmMAPKs, assess their evolutionary relationships, and analyze their functional divergence. We identified a total of 38 MAPKs, eleven MAPKKs, and 150 MAPKKKs in soybean. Within the GmMAPK family, we also identified a new clade of six genes: four genes with TEY and two genes with TQY motifs requiring further investigation into possible legume-specific functions. The results indicated the expansion of the GmMAPK families attributable to the ancestral polyploidy events followed by chromosomal rearrangements. The GmMAPK and GmMAPKKK families were substantially larger than those in other plant species. The duplicated GmMAPK members presented complex evolutionary relationships and functional divergence when compared to their counterparts in Arabidopsis. We also highlighted existing nomenclatural issues, stressing the need for nomenclatural consistency. GmMAPK identification is vital to soybean crop improvement, and novel insights into the evolutionary relationships will enhance our understanding about plant genome evolution.

Keywords:  MAPK family; gene evolution; homology; nomenclature; signal transduction; soybean genomics

Year:  2013        PMID: 24137047      PMCID: PMC3785387          DOI: 10.4137/EBO.S12526

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Background

Plants are perpetually exposed to both biotic and abiotic stresses, and they have evolved sophisticated cellular and physiological mechanisms to cope with severe forms of stress (e.g., drought, heat, cold, wounding, and disease).1–3 Perception of these stresses and the initiation of appropriate molecular responses occur through an intricate network of proteins involved in signal transduction.4,5 Mitogen-activated protein kinase (MAPK) genes play crucial roles in stress signaling pathways and regulate a myriad of cellular and physiological processes.6–9 MAPK was originally discovered in animal cells as a microtubule-associated protein kinase in 1986 by Sturgill and Ray.10 It was later found to be a group of proteins phosphorylated at tyrosine residues in response to mitogens; hence the name “mitogen-activated protein kinase”.11,12 Studies of genome sequences from diverse species at various taxonomic levels have shown that these genes always occur as gene families.13 The MAPK gene members belong to three functionally linked families called MAPK (=MPK), MAPKK (=MEK or MKK), and MAPKKK (=MEKK), forming a cascade of signaling networks associated with various cellular functions, including plant response to biotic and abiotic stresses.2,8,9 The number of MAPK genes within each family varies widely across species. For example, the numbers of MAPKs and MAPKKs are five and one in Chlamydomonas, six and six in Saccharomyces, 21 and 11 in Populus,13 16 and 12 in Brachypodium,14 20 and 10 in Arabidopsis,2,8 and 15 and eight in Oryza,13 respectively. The diversity of MAPKKKs is less well understood, and the estimated numbers in Chlamydomonas, Arabidopsis, Oryza, and Sorghum are 8–10,3 80,2 75,15 and 40–60,3 respectively. 
The MAPK genes in complex organisms have diversified into various clades over time.8,13 In Arabidopsis, MAPK genes are classified into four subgroups (A, B, C, and D) based on their evolutionary relationships and the presence of TEY and TDY phosphorylation motifs.2,8 Similarly, MAPKK genes in Arabidopsis form four distinct clades,8 and the three subgroups of MAPKKK genes (MEKK-like, ZIK-like, and Raf-like members) generally occur as monophyletic groups.2 Although the biological functions of the MAPKs are not all fully understood, MAPKs of the same subgroup are likely to be involved in similar physiological responses or to be functionally redundant, and their gene structures are highly conserved across species.8 Initiation of a MAPK signaling module involves catalytic activation of MAPKKK by upstream cellular receptors, G-proteins, and sometimes by phosphorylated MAPKKKK and other protein kinases.2,16 Once phosphorylated, the MAPKKK activates MAPKK through phosphorylation of two serine/threonine residues in the activation segment. The MAPKK in turn activates MAPK by dual phosphorylation of the threonine and tyrosine residues of the TXY motif17–19 in the activation loop (T-loop), located between kinase subdomains VII and VIII. 
This phosphorylation results in the activation of certain transcription factors along with the transduction of MAPK signaling to numerous substrates and downstream protein kinases.20 With the onset of the signal transduction module, the MAPK proteins are promptly translocated from the cytoplasm to the nucleus,21 while the substrates of MAPKs are abundantly present in the cytoplasm (for example, phospholipase A2,22 and transmembrane proteins such as the endothelial growth factor receptor).23 These genes have a conserved domain for docking to their cognate activators, suppressors, and protein substrates, which increases the efficiency of protein–protein interactions.24 Inactivation of MAPKs is carried out by dephosphorylation of the activation motif by protein phosphatases, including tyrosine phosphatases, serine/threonine-specific phosphatases, and dual-specificity MAPK phosphatases,25 a process equally important in establishing physiological equilibrium in living cells. The functional significance of MAPK dephosphorylation is illustrated by the role of the MAPK phosphatase AP2C3 in regulating ectopic cell development and epidermal cell conversion leading to stomatal development in plants.26 Another MAPK phosphatase, PP2C5, has been found to enhance ABA sensitivity and therefore ABA-induced seed dormancy and stomatal closure.27 MAPK regulatory pathways are involved in several developmental and defense mechanisms, underlining the functional importance of the three gene families MAPK, MAPKK, and MAPKKK (Table 1). 
In Arabidopsis, some MAPK cascade members, such as MAPKKKɛ1 (=AtMAPKKK6) and MAPKKKɛ2 (=AtMAPKKK7), are vital for the normal development and functioning of pollen,28 and YODA (=AtMAPKKK4) is vital for cell fate during embryonic development.29,30 Similarly, MAPK3 and MAPK6 in Arabidopsis mediate signals in response to microbe-associated molecular patterns (MAMPs), such as flagellin (flg22) and chitin elicitors.31,32 The same MAPKs in Arabidopsis are known to control cell death along with pathogen responses by producing reactive oxygen species and the indole-derived phytoalexin camalexin.33,34 Orthologs of these two genes in rice are known to play similar roles in fungal resistance, but their upstream MAPKK pathways and complete roles are not yet fully understood.35
Table 1

A synopsis of some of the known MAPK functions in plant species.

| Gene names | Functions | References |
|---|---|---|
| MAPKKK | | |
| AtMEKK1 | Bacterial and fungal pathogens | 31 |
| ANP1 (=AtMAPKKK1) | Oxidative stress | 92 |
| MAPKKKɛ1 (=AtMAPKKK6) | Pollen viability | 28 |
| MAPKKKɛ2 (=AtMAPKKK7) | Pollen viability | 28 |
| YODA, YDA (=MAPKKK4) | Extra-embryonic cell fate | 29,30 |
| CTR1 (Raf1) | Oxidative stress, ethylene signaling | 93 |
| ANP3 (=AtMAPKKK2) | Cytokinesis, phragmoplast assembly | 92 |
| EDR1 (Raf2) | Defensive responses | 94 |
| MAPKK | | |
| AtMKK1 | Oxidative stress | 95 |
| GmMKK1 | Downy mildew, soybean mosaic virus | 38 |
| OsMEK1 | Cold stress | 96 |
| ZmMEK1 | Root apex proliferation | 97 |
| NtMek1 | Cytokinesis, cell death, bacterial elicitor signaling | 37,98–100 |
| AtMAPKK2 | Cold and salt stress | 101 |
| NtMEK2 | Ethylene signaling, fruit ripening | 102 |
| AtMKK3 | Participates in JA signaling | 103,104 |
| GmMKK2 | Downy mildew, soybean mosaic virus | 38 |
| AtMAPKK4 | Response to microbial pathogens, stomatal development | 31,105 |
| OsMAPKK4 | Fungal pathogens | 35 |
| AtMAPKK5 | Bacterial and fungal pathogens, stomatal development | 31,105 |
| AtMKK6 | Cytokinesis | 37,106 |
| AtMKK7 | Polar auxin transport | 107 |
| AtMAPKK9 | Leaf senescence, ethylene biosynthesis | 108 |
| MAPK | | |
| AtMAPK1 | Activated by JA, ABA, and H2O2 | 109 |
| AtMAPK2 | Activated by JA, ABA, and H2O2 | 109 |
| PsMAPK2 | Seed germination | 110 |
| AtMAPK3 | Bacterial and fungal pathogens | 31,111 |
| MsMAPK3 | Cell cycle regulation (after metaphase), cytokinesis | 112 |
| OsMAPK3 | Fungal pathogen | 35 |
| OsMAPK3 | Stomatal development | 105 |
| AtMAPK4 | Cold and salt stress | 101 |
| AtMAPK4 | Immune responses | 113 |
| GmMAPK4 | Downy mildew, soybean mosaic virus | 38 |
| AtMAPK6 | Bacterial and fungal pathogens, cold and salt stress, leaf senescence, stomatal development | 31,101,105,108,111 |
| OsMAPK6 | Fungal pathogen | 35 |
| GhMAPK7 | Colletotrichum nicotianae (fungus), PVY virus | 114 |
| AtMPK12 | Auxin signaling | 115 |
| AtMPK18 | Stabilization of microtubule | 116 |

Similarly, MAPKKs such as NtMEK1 in tobacco and AtMKK6 in Arabidopsis regulate cytokinesis, indicating their crucial role in early-life activities.36,37 In soybean, these genes are reported to be involved in developmental and various stress responses, yet they remain largely uncharacterized. Homologs of MAPK4 in soybean are reported to negatively regulate defense responses to diseases such as downy mildew and soybean mosaic virus, and to positively regulate plant development and growth.38 Some of the MAPKs in soybean, such as a MAPK1 homolog and another MAPK homolog of 49 kDa (wound-activated protein kinase), are activated in response to salt stress39 and to elevated phosphatidic acid in wounded soybean plants,40 respectively. With the recent completion of the soybean genome sequencing project,41 it has now become possible to identify and characterize GmMAPKs for the advancement of soybean research. In addition, comparative genomics of legume species along with their nonlegume relatives would allow us to identify any legume-specific MAPK genes that might regulate legume-specific processes such as symbiotic nodule development and isoflavonoid biosynthesis. Arabidopsis MAPKs and MAPKKs were the first plant MAPKs to be systematically named.8 Only a few studies, however, have applied the Arabidopsis model in their MAPK nomenclature,13 and nomenclatural inconsistencies are very common even in some recent literature,14,15,42 making communication about these genes very difficult. Although the MAPK genes have been identified in a few plant species including Arabidopsis, only a few of them have been characterized, and our overall knowledge about these genes in many other plant species, including soybean, is very limited. The primary objectives of this study were to identify the members of all three subfamilies of GmMAPKs and assess their functional and evolutionary relationships. 
Identification of GmMAPK genes is the vital first step towards understanding their roles in stress response, growth, development, and defense mechanisms in soybean. Understanding the evolutionary relationships and functions of these genes is crucial to soybean crop improvement, with potential implications for other plant species.

Results

We identified 38 GmMAPKs, 11 GmMAPKKs, and 150 GmMAPKKKs. The results for each MAPK family are described below. We also propose a nomenclature scheme consistent with the founder MAPKs in Arabidopsis to enable efficient comparative genomics (Table 2). The phylogenetic analyses using maximum parsimony (MP) and maximum likelihood (ML) resulted in trees with similar topologies, but with slight variation in branch support, for the MAPK, MAPKK, and MAPKKK datasets. The model test showed that the Jones–Taylor–Thornton (JTT) model with discrete gamma distribution and invariant sites (G+I) was the best-fit evolutionary model for each dataset.
Table 2

Nomenclature of Arabidopsis MAPK orthologs across five plant species (Arabidopsis,8 poplar,13 rice,42 grapes,56 and soybean). The names in the bold letters within parentheses are the suggested names.

| Arabidopsis | Poplar | Rice | Grapes | Soybean | Phytozome ID |
|---|---|---|---|---|---|
| MAPK | | | | | |
| AtMAPK1 | PtMPK1 | OsMPK3 (OsMPK1) | VvMPK2 (VvMPK1) | GmMAPK1 | Glyma04g03210 |
| AtMAPK2 | PtMPK2 | OsMPK4 (OsMPK2) | | GmMAPK2 | Glyma06g03270 |
| AtMAPK3 | PtMPK3-1 | OsMPK5 (OsMPK3) | | GmMAPK3-1 | Glyma11g15700 |
| | PtMPK3-2 | | | GmMAPK3-2 | Glyma12g07770 |
| AtMAPK4 | PtMPK4 | OsMPK6 (OsMPK4) | VvMPK5 (VvMPK4) | GmMAPK4-1 | Glyma07g07270 |
| | | | | GmMAPK4-2 | Glyma16g03670 |
| AtMAPK5 | PtMPK5-1 | | | GmMAPK5-1 | Glyma01g43100 |
| | PtMPK5-2 | | | GmMAPK5-2 | Glyma11g02420 |
| | | | | GmMAPK5-3 | Glyma08g02060 |
| | | | | GmMAPK5-4 | Glyma05g37480 |
| AtMAPK6 | PtMPK6-1 | OsMPK1 (OsMPK6) | VvMPK6 (VvMPK6) | GmMAPK6-1 | Glyma07g32750 |
| | PtMPK6-2 | | | GmMAPK6-2 | Glyma02g15690 |
| AtMAPK7 | PtMPK7 | OsMPK4 (OsMPK7) | | GmMAPK7 | Glyma05g28980 |
| AtMAPK8 | | | | | |
| AtMAPK9 | PtMPK9-1 | OsMPK14 (OsMPK9) | VvMPK7 (VvMPK9) | GmMAPK9-1 | Glyma05g33980 |
| | PtMPK9-2 | | | GmMAPK9-2 | Glyma08g05700 |
| | | | | GmMAPK9-3 | Glyma07g11470 |
| | | | | GmMAPK9-4 | Glyma09g30790 |
| AtMAPK10 | | | | | |
| AtMAPK11 | PtMPK11 | OsMPK2 (OsMPK11) | | GmMAPK11-1 | Glyma18g47140 |
| | | | | GmMAPK11-2 | Glyma09g39190 |
| AtMAPK12 | | | | | |
| AtMAPK13 | | | VvMPK9 (VvMPK13) | GmMAPK13-1 | Glyma12g07850 |
| | | | | GmMAPK13-2 | Glyma11g15590 |
| AtMAPK14 | PtMPK14 | OsMPK3 (OsMPK14) | VvMPK1 (VvMPK14) | GmMAPK14 | Glyma08g12150 |
| AtMAPK15 | | | | | |
| AtMAPK16 | PtMPK16-1 | | VvMPK10 (VvMPK16) | GmMAPK16-1 | Glyma13g28120 |
| | PtMPK16-2 | | | GmMAPK16-2 | Glyma15g10940 |
| | | | | GmMAPK16-3 | Glyma17g02220 |
| | | | | GmMAPK16-4 | Glyma07g38510 |
| AtMAPK17 | PtMPK17 | | | | |
| AtMAPK18 | PtMPK18 | | | GmMAPK18 | Glyma15g38490 |
| AtMAPK19 | PtMPK19 | OsMPK17 (OsMPK19) | VvMPK11 (VvMPK19) | GmMAPK19 | Glyma13g33860 |
| AtMAPK20 | PtMPK20-1 | OsMPK10 (OsMPK20-1) | VvMPK12 (VvMPK20) | GmMAPK20-1 | Glyma18g12720 |
| | PtMPK20-2 | OsMPK9 (OsMPK20-2) | | GmMAPK20-2 | Glyma14g03190 |
| | | OsMPK11 (OsMPK20-3) | | GmMAPK20-3 | Glyma02g45630 |
| | | OsMPK8 (OsMPK20-4) | | GmMAPK20-4 | Glyma08g42240 |
| | | OsMPK7 (OsMPK20-5) | | | |
| | | | | GmMAPK22-1 | Glyma03g21610 |
| | | | | GmMAPK22-2 | Glyma16g10820 |
| | | | | GmMAPK23-1 | Glyma01g35190 |
| | | | | GmMAPK23-2 | Glyma09g34610 |
| | | | | GmMAPK23-3 | Glyma16g08080 |
| | | | | GmMAPK23-4 | Glyma16g17580 |
| MAPKK | | | | | |
| AtMAPKK1 | | | | GmMAPKK1 | Glyma15g18860 |
| AtMAPKK2 | PtMKK2-1 | | | GmMAPKK2-1 | Glyma13g16650 |
| | PtMKK2-2 | | | GmMAPKK2-2 | Glyma17g06020 |
| AtMAPKK3 | PtMKK3 | | | GmMAPKK3-1 | Glyma05g08720 |
| | | | | GmMAPKK3-2 | Glyma19g00220 |
| AtMAPKK4 | PtMKK4 | | | GmMAPKK4 | Glyma07g00520 |
| AtMAPKK5 | PtMKK5 | | | GmMAPKK5 | Glyma08g23900 |
| AtMAPKK6 | PtMKK6 | | | GmMAPKK6-1 | Glyma10g15850 |
| | | | | GmMAPKK6-2 | Glyma02g32980 |
| AtMAPKK7 | PtMKK7 | | | | |
| AtMAPKK8 | PtMKK11-1 | | | GmMAPKK8 | Glyma09g30310 |
| | PtMKK11-2 | | | | |
| AtMAPKK9 | PtMKK9 | | | | |
| AtMAPKK10 | PtMKK10 | | | GmMAPKK10 | Glyma01g01980 |
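
The naming pattern used for soybean genes in Table 2 (species prefix, family, Arabidopsis-based ortholog number, and an optional paralog index after a hyphen) lends itself to mechanical parsing. The following is a minimal Python sketch under that assumption; the regular expression and the function names `parse_gene_name` and `group_paralogs` are illustrative and are not part of the original study.

```python
import re
from collections import defaultdict

# Assumed name layout, e.g. "GmMAPK20-3":
#   two-letter species prefix, family (MAPK/MAPKK/MAPKKK),
#   ortholog number, optional "-<paralog index>".
NAME_RE = re.compile(r"^([A-Z][a-z])(MAPKKK|MAPKK|MAPK)(\d+)(?:-(\d+))?$")


def parse_gene_name(name):
    """Split a gene name into species, family, ortholog number, and paralog index."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized gene name: {name}")
    species, family, ortholog, paralog = match.groups()
    return {
        "species": species,
        "family": family,
        "ortholog": int(ortholog),
        "paralog": int(paralog) if paralog else None,
    }


def group_paralogs(names):
    """Group gene names that share species, family, and ortholog number."""
    groups = defaultdict(list)
    for name in names:
        parts = parse_gene_name(name)
        key = f"{parts['species']}{parts['family']}{parts['ortholog']}"
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(parse_gene_name("GmMAPK20-3"))
    print(group_paralogs(["GmMAPK5-1", "GmMAPK5-2", "GmMAPK1", "GmMAPKK10"]))
```

Names that use other conventions (for example the Raf-like and ZIK-like members, or the poplar MPK/MKK names) would need additional patterns.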

MAPKs

The GmMAPK amino acid sequence length ranged from 326 to 615 residues. The 38 identified GmMAPKs were nested in five distinct clades, as shown in Figure 1 and Additional file 1. Phylogenetic placement of the MAPK genes in the ML tree (Fig. 1) is similar to that in the MP tree, with slight variation in bootstrap support (Additional file 1). Among these five clades, four (A, B, C, and D) are well supported and correspond to those of their homologs in Arabidopsis,8 rice, and poplar.13 The members with the TEY phosphorylation motif were nested in clades A, B (except GmMAPK5-2, which has the TVY motif), and C, while genes with the TDY motif were nested in clade D. The numbers of GmMAPKs in clades A, B, C, D, and E were four, ten, four, 14, and six, respectively. The fifth clade, E, contained genes that were not orthologs of any of the Arabidopsis MAPKs; four of these had the TEY motif, and two (GmMAPK22-1 and GmMAPK22-2, denoted by * in Fig. 1 and Additional file 1) had a TQY motif. The occurrence of MAPKs with the TQY motif has not been reported in other plant species before. Four paralogs of each of five MAPK genes (GmMAPK5, GmMAPK9, GmMAPK16, GmMAPK20, and GmMAPK23), along with two paralogs of each of six MAPK genes (GmMAPK3, GmMAPK4, GmMAPK6, GmMAPK11, GmMAPK13, and GmMAPK22), were identified. We did not find orthologs of AtMAPK8, AtMAPK10, AtMAPK12, AtMAPK15, AtMAPK17, or OsMAPK21 in soybean.
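
As a quick consistency check, the clade sizes and motif assignments reported in this paragraph can be tallied. The short Python sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Clade sizes reported for the GmMAPKs (clades A to E).
clade_sizes = {"A": 4, "B": 10, "C": 4, "D": 14, "E": 6}

# Motif assignments as described in the text:
#   clades A, B, and C carry TEY, except one TVY gene (GmMAPK5-2) in clade B;
#   clade D carries TDY; clade E holds four TEY and two TQY genes.
motif_counts = {
    "TEY": clade_sizes["A"] + (clade_sizes["B"] - 1) + clade_sizes["C"] + 4,
    "TVY": 1,
    "TDY": clade_sizes["D"],
    "TQY": 2,
}

total = sum(clade_sizes.values())

if __name__ == "__main__":
    print(total)
    print(motif_counts)
```

The tally gives 38 genes in total and 21 TEY, 14 TDY, two TQY, and one TVY member, which agrees with the motif counts given in the Discussion.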
Figure 1

Maximum likelihood analysis of GmMAPKs and their orthologs in Arabidopsis, poplar, and rice.

Notes: The values above the branches are bootstrap support of 100 replicates. The JTT+G+I evolutionary model was employed in MEGA5.2.2 to perform maximum likelihood analysis. The members with phosphorylation motif TEY are included in clades A, B, and C; TDY in clade D, and members with the TQY (denoted by *) and the TVY (denoted by **) motif in clades E and B, respectively. The MAPK gene models were accepted for phylogenetic analysis using protein sequences of the serine/threonine kinase subfamily having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif and the TXY phosphorylation motif in their activation loop.
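
The acceptance criteria in the notes above (a D[L/I/V]K catalytic motif plus a TXY phosphorylation motif) can be approximated with regular expressions. This is a simplified sketch, not the authors' pipeline: it scans the whole sequence rather than restricting the TXY search to the activation loop between kinase subdomains VII and VIII, and the test sequences are hypothetical fragments, not real GmMAPK sequences.

```python
import re

CATALYTIC_RE = re.compile(r"D[LIV]K")  # D[L/I/V]K motif of the catalytic domain
TXY_RE = re.compile(r"T[A-Z]Y")        # TXY motif (X = any residue)


def passes_mapk_filter(protein_seq):
    """Return True if both the catalytic motif and a TXY motif are present."""
    seq = protein_seq.upper()
    return bool(CATALYTIC_RE.search(seq)) and bool(TXY_RE.search(seq))


def txy_type(activation_loop):
    """Return the first TXY motif found in an activation-loop fragment, or None."""
    match = TXY_RE.search(activation_loop.upper())
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical fragments for illustration only.
    print(passes_mapk_filter("AAHRDLKPSNAAFGLARTEYVATRW"))
    print(txy_type("FGLARTQYVATRW"))
```

In practice the activation-loop coordinates would come from a domain annotation or an alignment, since a bare `T[A-Z]Y` pattern can match by chance elsewhere in a protein.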

Functional divergence among the paralogs of MAPK gene members, including their functional conservation across taxa, was assessed through transcriptomic data analyses. Expression profiles based on transcriptomic data showed that GmMAPK5-2, GmMAPK5-3, GmMAPK5-4, GmMAPK9-3, GmMAPK9-4, and GmMAPK11-1 had the lowest or no expression in the tissues examined under the given experimental conditions, while GmMAPK1, GmMAPK2, GmMAPK20-4, GmMAPK23-1, GmMAPK23-2, GmMAPK23-3, and GmMAPK23-4 had relatively higher expression values (Fig. 2).
Figure 2

Heatmap visualization of GmMAPKs.

Note: Log 2-based value was employed to construct the heatmap for MAPK gene expression in different tissues and treatment conditions.
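
The log2 scaling mentioned in the note can be illustrated as follows. This is a sketch under stated assumptions: the source does not say how zero expression values were handled, so the pseudocount of 1 is an assumption, and the expression values and the function name `log2_matrix` are hypothetical.

```python
import math


def log2_matrix(expression, pseudocount=1.0):
    """Log2-transform a gene -> list-of-values mapping.

    The pseudocount (an assumption, not stated in the source) keeps
    zero expression values finite.
    """
    return {
        gene: [math.log2(value + pseudocount) for value in values]
        for gene, values in expression.items()
    }


if __name__ == "__main__":
    # Hypothetical expression values across four tissues.
    raw = {"GmMAPK1": [7.0, 15.0, 3.0, 31.0], "GmMAPK5-2": [0.0, 0.0, 1.0, 0.0]}
    for gene, row in log2_matrix(raw).items():
        print(gene, row)
```

The transformed matrix can then be passed to any heatmap plotting routine; the transform compresses the dynamic range so that weakly and strongly expressed genes remain visible on one color scale.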

MAPKKs

The GmMAPKK amino acid sequence length ranged from 227 to 526 residues. As illustrated in Figure 3 and Additional file 2, eleven GmMAPKKs were identified, including one new MAPKK (i.e., GmMAPKK11) that lacked a potential ortholog in Arabidopsis. The GmMAPKK members formed four distinct clades that corresponded to the four clades of the Arabidopsis MAPKKs (A–D). The topologies of the ML tree and the MP tree (Additional file 2) were similar, with slight variations in bootstrap support. We did not find orthologs of AtMAPKK7 and AtMAPKK9 in soybean. Clade A, with five GmMAPKKs, included GmMAPKK1 and two paralogs each of GmMAPKK2 and GmMAPKK6; clade B included two paralogs of GmMAPKK3; clade C had two genes, GmMAPKK4 and GmMAPKK5; and clade D included two genes, GmMAPKK8 and GmMAPKK10. Expression data showed that GmMAPKK8 and GmMAPKK10 had the lowest expression values, while GmMAPKK4 and GmMAPKK5 had the highest expression values among GmMAPKK genes (Fig. 4).
Figure 3

Maximum likelihood analysis of GmMAPKKs and their orthologs in Arabidopsis, poplar, and rice.

Notes: In the ML phylogram, the values above the branches are bootstrap support of 100 replicates. The JTT+G+I evolutionary model was employed in MEGA5.2.2 to perform maximum likelihood analysis. The MAPKK gene models were accepted for phylogenetic analysis using dual-specificity protein kinases having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif and the S-X5-T phosphorylation motif along with their activation loop.
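
The MAPKK activation motif named in the notes (S-X5-T, written more generally in the Discussion as S/TxxxxxS/T) can be searched for in the same way. The sketch below uses the more general pattern and hypothetical sequence fragments; it is an illustration of the criterion, not the screening code used in the study.

```python
import re

# S/T, five arbitrary residues, S/T (covers the S-X5-T form as a special case).
ACTIVATION_RE = re.compile(r"[ST][A-Z]{5}[ST]")


def find_mapkk_activation_motif(seq):
    """Return the first S/T-x5-S/T match in the sequence, or None if absent."""
    match = ACTIVATION_RE.search(seq.upper())
    return match.group(0) if match else None


if __name__ == "__main__":
    # Hypothetical fragments for illustration only.
    print(find_mapkk_activation_motif("AASMAQRDTAA"))
    print(find_mapkk_activation_motif("AAAGGGAAA"))
```

Because serine and threonine are common residues, a match is only meaningful when the search is restricted to the activation loop region.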

Figure 4

Heatmap visualization of GmMAPKKs.

Note: Log 2-based value was employed to construct the heatmap for MAPKK gene expression in different tissues and treatment conditions.

MAPKKKs

The GmMAPKKK amino acid sequence length ranged from 228 to 1411 residues. Altogether, 150 GmMAPKKKs were identified (Fig. 5 and Additional file 3). The GmMAPKKK members formed three distinct clades, MEKK-like (34 genes), Raf-like (92 genes), and ZIK-like (24 genes), consistent with those in Arabidopsis.2 The MEKK-like, Raf-like, and ZIK-like genes are represented by blue, black, and red branches, respectively (Fig. 5). The ML tree topologies were similar to those of the MP tree (Additional file 3), with slight variations in branch support. We found multiple paralogs of MAPKKKs in soybean, orthologs of which were not recognized as paralogs in Arabidopsis. Orthologs of some of the MEKK-like gene members (AtMAPKKK8, AtMAPKKK9, AtMAPKKK12, AtMAPKKK15, AtMAPKKK16, AtMAPKKK19, AtMAPKKK20, and AtMAPKKK21), Raf-like members (AtRaf5, AtRaf7–AtRaf9, AtRaf14, AtRaf15, AtRaf24, AtRaf25, AtRaf44–AtRaf46, and AtRaf48), and a ZIK-like member (AtZIK7) of Arabidopsis could not be recovered in soybean. However, new MAPKKK members of each of the MEKK-like, Raf-like, and ZIK-like subgroups were recovered in soybean. The new MEKK-like gene members in soybean included the paralogs of GmMAPKKK22, GmMAPKKK23, and GmMAPKKK24; the new Raf-like gene members included GmRaf49, GmRaf50, GmRaf51, GmRaf52, and GmRaf53; and the new ZIK-like members included two paralogs of GmZIK12. 
Expression profiles showed the following: (1) GmMAPKKK5-1, GmMAPKKK17, GmMAPKKK18-1, GmMAPKKK18-2, GmMAPKKK18-3, GmMAPKKK22-1, and GmMAPKKK22-2 had the lowest to no expression values, while GmMAPKKK3-1, GmMAPKKK3-2, GmMAPKKK4-4, and GmMAPKKK23-1 had the highest expression values among MEKK-like genes (Fig. 6); (2) GmRaf4-1, GmRaf4-2, GmRaf6-2, GmRaf6-3, GmRaf6-4, GmRaf12, GmRaf16-1, GmRaf16-2, GmRaf19-1, GmRaf19-2, GmRaf19-3, GmRaf19-4, GmRaf26, GmRaf30-2, GmRaf34-2, GmRaf41-2, GmRaf42-3, GmRaf49-3, GmRaf51-1, GmRaf51-2, and GmRaf52 had the lowest expression values, while GmRaf2-2, GmRaf17-2, GmRaf20-1, GmRaf20-2, GmRaf21, GmRaf28, GmRaf29, GmRaf33-2, GmRaf49-2, and GmRaf54-1 had the highest expression values among Raf-like genes (Fig. 7A and B); and (3) GmZIK2-2, GmZIK2-3, GmZIK5, GmZIK8-1, GmZIK8-2, GmZIK8-3, GmZIK12-1, and all the paralogs of GmZIK1 (except in root tissue) had the lowest expression values, while GmZIK6-1, GmZIK6-2, GmZIK9-2, and GmZIK11 had the highest expression values among ZIK-like genes in all examined tissues (Fig. 8).
Figure 5

Maximum likelihood analysis of GmMAPKKKs and their orthologs in Arabidopsis.

Notes: The phylogenetic representation of GmMAPKKKs in circular tree format shows the three subgroups, GmMEKK-like, GmRaf-like, and GmZIK-like, indicated by blue, black, and red branches, respectively. The JTT+G+I evolutionary model was employed in MEGA5.2.2 to perform maximum likelihood analysis with 100 bootstrap replicates. The MAPKKK gene models were accepted for phylogenetic analysis using protein sequences of the serine/threonine kinase subfamily having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif, and the members in each subgroup were categorized based on their signature motifs.

Figure 6

Heatmap visualization of MEKK-like GmMAPKKKs.

Note: Log 2-based value was employed to construct the heatmap for the MEKK-like GmMAPKKK gene expression in different tissues and treatment conditions.

Figure 7

Heatmap visualization of Raf-like GmMAPKKKs.

Note: (A and B) Log 2-based value was employed to construct the heatmap for Raf-like GmMAPKKK gene expression in different tissues and treatment conditions.

Figure 8

Heatmap visualization of ZIK-like GmMAPKKKs.

Note: The log 2-based value was employed to construct the heatmap for ZIK-like GmMAPKKK gene expression in different tissues and treatment conditions.

Discussion

Genomic structure and comparative genomics

Soybean is an allopolyploid species,43 which has undergone two major polyploidization events approximately 59 million and 13 million years ago, followed by chromosomal rearrangements that perhaps resulted in the loss and diversification of its genes.41 Approximately 25% of the duplicated genes of the soybean genome are estimated to have been lost, averaging 3.1 retained copies.44 Ancient duplications of individual genes, chromosomal segments, or whole genomes are thought to have contributed to the evolutionary novelties resulting in functional complexity.45,46 Gene duplication can occur by three processes: unequal crossing over (which results in tandem gene duplication), retroposition (which results in random gene insertions), and chromosomal or genome duplication.46 Two paralogs of each of GmMAPK23 (GmMAPK23-3 and GmMAPK23-4 on chromosome 16), GmMAPKKK18 (GmMAPKKK18-1 and GmMAPKKK18-2 on chromosome 12), GmRaf13 (GmRaf13-1 and GmRaf13-4 on chromosome 12), GmRaf18 (GmRaf18-1 and GmRaf18-3 on chromosome 8; GmRaf18-2 and GmRaf18-4 on chromosome 15), and GmRaf43 (GmRaf43-1 and GmRaf43-2 on chromosome 15) were found on the same chromosome (Additional file 4), indicating tandem duplications. The rest of the GmMAPK, GmMAPKK, and GmMAPKKK paralogs of soybean are located on different chromosomes, ruling out their duplication through unequal crossing over. The four paralogs of GmRaf18 are located on two separate chromosomes (Additional file 4), two on each, indicating that tandem duplication preceded chromosomal duplication. Among the 38 GmMAPKs, 21 genes were of the TEY type, 14 were TDY, two were TQY, and one was TVY. 
Since the unicellular alga Chlamydomonas reinhardtii has MAPKs with both TEY and TDY motifs, we infer that the split of the TEY and TDY MAPKs is an ancient event, which is in agreement with similar inferences on the MAPKs of Arabidopsis, rice, and poplar.13 However, TDY MAPKs could have descended from a TEY-type ancestor (Fig. 1 and Additional file 1). Clade E comprises genes with both TEY and TQY motifs, indicating a recent evolution of MAPK22 in soybean. MAPKs with the TQY motif apparently also occur in other legume genomes: Phaseolus (Phytozome: Phvul.010G023600), Lotus (Kazusa: chr3.CM0423.40.r2.a), and Medicago (Phytozome: Medtr8g012450). In the case of MAPKKs, the number of genes was almost equal to that in Arabidopsis. The GmMAPKKs were nested in four clades (A–D; Fig. 3), consistent with those in Arabidopsis. In soybean, we found MAPKK1 and two paralogs each of MAPKK2 and MAPKK6 in clade A, two paralogs of MAPKK3 in clade B, MAPKK4 and MAPKK5 in clade C, and MAPKK8 and MAPKK10 in clade D. Similarly, the newly identified GmMAPKKK members formed three clades (34 genes in the MEKK-like, 92 in the Raf-like, and 24 in the ZIK-like subgroup) that are consistent with those reported in Arabidopsis2 and Oryza.15 Multiple Expectation-maximization for Motif Elicitation (MEME)80 analysis of approximate sequence patterns, including logos of protein motifs related to MAPK-specific “signals”, also showed the conservation of domains among the gene members of each subfamily and subgroup (Figs. 9–11). The conserved domains predicted by Pfam and PROSITE for MAPKs and MAPKKs included the protein kinase catalytic domain, the MAPK-conserved site (specific to MAPKs), the serine/threonine-active site, and the adenosine triphosphate (ATP) binding site. 
Some of the domains specific to MAPKKKs included the phenylacetic acid catabolic site, ATP-binding cassette, transporter region, bipartite nuclear localization signal domain, generalized PAS site, prokaryotic membrane lipoprotein lipid attachment site, predicted transmembrane region, septum formation initiator, and protein tyrosine kinase domain. MEME analysis indicated that at least six different motifs are shared by all members within the GmMAPK subfamily (Fig. 9), five motifs shared within GmMAPKK subfamily (Fig. 10), eight within the GmMEKK subfamily, five within the GmRaf-like, and six motifs shared by the members within the GmZIK-like subgroups of the GmMAPKKK subfamily (Fig. 11A–D).
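
The same-chromosome criterion used earlier in this section to flag candidate tandem duplicates can be sketched in Python. The sketch assumes that the two digits following "Glyma" in the Phytozome IDs of Table 2 encode the chromosome number (consistent with GmMAPK23-3 and GmMAPK23-4 being reported on chromosome 16); the function names are illustrative. Sharing a chromosome is only a first screen, since true tandem duplicates should also be physically adjacent.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed ID layout: "Glyma" + two-digit chromosome + "g" + locus number.
GLYMA_RE = re.compile(r"^Glyma(\d{2})g\d+$")


def chromosome(glyma_id):
    """Extract the chromosome number from a Glyma gene model ID."""
    match = GLYMA_RE.match(glyma_id)
    if match is None:
        raise ValueError(f"unrecognized Glyma ID: {glyma_id}")
    return int(match.group(1))


def same_chromosome_pairs(paralogs):
    """List (gene_a, gene_b, chromosome) for paralogs that share a chromosome.

    `paralogs` maps gene names of one paralog group to their Glyma IDs.
    """
    by_chromosome = defaultdict(list)
    for name, glyma_id in paralogs.items():
        by_chromosome[chromosome(glyma_id)].append(name)
    pairs = []
    for chrom, names in sorted(by_chromosome.items()):
        for gene_a, gene_b in combinations(sorted(names), 2):
            pairs.append((gene_a, gene_b, chrom))
    return pairs


# GmMAPK23 paralogs and IDs as listed in Table 2.
gm_mapk23 = {
    "GmMAPK23-1": "Glyma01g35190",
    "GmMAPK23-2": "Glyma09g34610",
    "GmMAPK23-3": "Glyma16g08080",
    "GmMAPK23-4": "Glyma16g17580",
}

if __name__ == "__main__":
    print(same_chromosome_pairs(gm_mapk23))
```

Applied to the GmMAPK20 paralogs, whose IDs fall on four different chromosomes, the same function returns no pairs.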
Figure 9

Predicted domain structure of GmMAPKs.

Notes: Conserved domain structures as predicted by MEME analysis of the GmMAPK subfamily. Ten different sites were analyzed for the prediction of conserved domain structures in the MAPK subfamily. Each stack height in the logos for ten different predicted motifs represents the sequence conservation in that region which is measured in bits, whereas the height of each residue within the stack indicates the frequency of corresponding amino acid competing for that position.
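
The relationship between stack height (in bits) and residue height described in the notes can be made concrete with a small calculation. The sketch below uses the classic sequence-logo definition (maximum entropy of a 20-letter alphabet minus the observed column entropy) without a small-sample correction; MEME's own logos may use a background-adjusted measure, so this is an approximation, and the example columns are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column: log2(N) - H."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def residue_heights(column, alphabet_size=20):
    """Height of each residue in the stack: frequency times column information."""
    info = column_information(column, alphabet_size)
    n = len(column)
    return {residue: (c / n) * info for residue, c in Counter(column).items()}


if __name__ == "__main__":
    # Hypothetical columns: one fully conserved, one split between two residues.
    print(column_information("TTTT"))
    print(residue_heights("EEDD"))
```

A fully conserved protein column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly one bit.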

Figure 10

Predicted domain structure of GmMAPKKs.

Notes: Conserved domain structures as predicted by MEME analysis of the GmMAPKK subfamily. Ten different sites were analyzed for the prediction of conserved domain structures in the MAPKK subfamily. Each stack height in the logos for ten different predicted motifs represents the sequence conservation in that region which is measured in bits, whereas the height of each residue within the stack indicates the frequency of the corresponding amino acid competing for that position.

Figure 11

Predicted domain structure of GmMAPKKKs.

Notes: Conserved domain structures as predicted by MEME analysis of the GmMAPKKK subfamily: (A) MEKK-like; (B and C) Raf-like; and (D) ZIK-like. Ten different sites were analyzed for the prediction of conserved domain structures in MAPKKK gene subfamily. Each stack height in the logos for ten different predicted motifs represents the sequence conservation in that region, which is measured in bits, whereas the height of each residue within the stack indicates the frequency of the corresponding amino acid competing for that position.

The genome of soybean (1115 Mb; 46,430 protein-coding genes)41,47 is larger than those of Arabidopsis (125 Mb; 27,416 genes),48 grapevine (487 Mb; 30,434 genes),49 poplar (~485 Mb; 45,555 genes),50 and rice (389 Mb; ~37,544 genes).51 The number of paralogs in the MAPK family is greater in soybean than in Arabidopsis, grapevine, poplar, and rice as a result of evolutionary processes involving gene duplication and protein diversification.52 For example, MAPK16 has four paralogs in soybean, two in poplar, one each in Arabidopsis and Brachypodium, and none in rice, suggesting higher diversity of these genes in dicots. In contrast, MAPKK10 paralogs are more diversified in monocots than in dicots, as evidenced by three paralogs in rice, five in Brachypodium, and one each in soybean, poplar, and Arabidopsis. Several of these duplication events are deemed to have taken place long before the eudicot–monocot split. For example, the numbers of paralogs of MAPK20 in soybean, rice, and poplar are four, five, and two, respectively. This inference in soybean is consistent with that in rice and poplar.13 Previous studies in rice13 and Brachypodium14 showed the absence of MAPK1, MAPK2, MAPK5, MAPK9, and MAPK11, whereas the orthologs of these genes are conserved in most eudicot species, suggesting that these orthologs evolved in eudicots. Conversely, the MAPK21 paralogs were found to have evolved exclusively in monocot species. Within the eudicot species, our analyses revealed the absence of some Arabidopsis orthologs in soybean and poplar. For example, the ortholog of Arabidopsis MAPK12 is absent in both poplar and soybean. Additionally, MAPK13 and MAPK10 are absent in poplar and soybean, respectively. 
In our separate analysis of MAPK gene members from four legume and three nonlegume species, we consistently recovered the putative legume-specific clade (clade E) consisting of both TEY and TQY members, which was distinctly separated from the four traditional clades of the MAPK subfamily. The presence of the new clade (clade E; Fig. 1), including its members with the TQY motif that most likely descended from the TEY type, is an evolutionary innovation within the legume species. As shown in Figure 1, the five paralogs of OsMAPK20 are present on three different chromosomes: OsMAPK20-1 and OsMAPK20-4 on chromosome 1; OsMAPK20-2 and OsMAPK20-5 on chromosome 5; and OsMAPK20-3 on chromosome 6. A restriction fragment length polymorphism test performed in rice showed that chromosome 1 and chromosome 5 are ancient duplicates.53 Likewise, chromosome 2 and chromosome 6 of rice are believed to be ancient duplicates,54 each housing a paralog of OsMAPK17 (Phytozome: LOC_Os02g04230 and Phytozome: LOC_Os06g49430). The ortholog of AtMAPK20 in Glycine max is present as four paralogs, all on different chromosomes, which possibly resulted from two rounds of whole genome duplication. Segmental or chromosomal duplication seems only a remote possibility for this pattern, as there is a slim chance for the same locus to undergo two duplication events. Similarly, the ortholog of AtMAPK16 in soybean has evolved four paralogs, compared to only one in rice and two in poplar, possibly as a result of successive whole genome duplication events. These differences in the number of duplicates are plausible considering the number of chromosomes in soybean (2n = 40)41 compared to that in rice (2n = 24),55 along with the differences in their genome sizes. From our combined phylogenetic analysis of Arabidopsis, rice, poplar, and soybean (Fig. 1), the paralogous relationship between MAPK7 and MAPK14 (ref. 13) was also recognized in soybean. 
In parallel to the findings from Arabidopsis,8 rice,13,42 grapevine,56 and poplar,13 genes of clades A, B, and C of GmMAPKs (Fig. 1) have TEY, and those of clade D have TDY amino acid motifs.13 Interestingly, two new types of MAPKs that were not previously reported in plants were identified in soybean: MAPKs containing the TVY motif (GmMAPK5-2) in clade B and the TQY motif (GmMAPK22-1 and GmMAPK22-2) in clade E. Surprisingly, when soybean MAPK amino acid sequences with the TQY motif were used as queries for the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information, MAPKs with this motif were also found in nematodes such as Caenorhabditis elegans (GenBank: NP_494947.2), Caenorhabditis brenneri (GenBank: EGT51072.1), Brugia malayi (GenBank: XP_001896626.1), Loa loa (GenBank: XP_003140630.1), Caenorhabditis remanei (GenBank: XP_003109019.1), and Caenorhabditis briggsae (GenBank: XP_002630655.1). Further investigation into the role of these genes in legume-nematode interactions would provide valuable insights as to whether these genes with the TQY motif have evolved independently in legume and nematode species. The presence of MAPKs with the TQY motif in distantly related groups (legumes and nematodes) is perhaps due to evolutionary convergence. As in MAPKK10 of Arabidopsis,8 poplar, and rice,13 GmMAPKK10 also lacks a complete activation loop motif (S/TxxxxxS/T) that is required for the phosphorylation of MAPKKs in plants. Nonetheless, GmMAPKK10, like its orthologs in Arabidopsis, poplar, and rice,13 retains the lysine and aspartate residues. These residues are believed to be required within the designated motif, D(L/I/V)K, of the catalytic loop for its kinase activity.13,57,58 It is also worth noting that GmMAPKKs in clades C and D (GmMAPKK4, GmMAPKK5, GmMAPKK8, and GmMAPKK10) have only one exon, consistent with the Arabidopsis model.8 MAPKKK members in soybean are found in relatively large numbers (150 MAPKKKs) compared to 80 members in Arabidopsis and 75 in rice. 
The comparative genomics of MAPKKK genes is beyond the scope of this study and will be discussed elsewhere.
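The motif-based grouping described above (TEY in clades A, B, and C; TDY in clade D; TQY in clade E; TVY in one clade B member) can be illustrated with a short pattern search. This is only a minimal sketch, not the procedure used in the study: the function name, the lookup table, and the example segment are hypothetical, and the search assumes the input is already restricted to the activation-loop region, since a T-x-Y tripeptide can occur elsewhere in a protein by chance.

```python
import re

# T-x-Y phosphorylation motif; x is any single residue (one-letter code).
TXY = re.compile(r"T([A-Z])Y")

# Motif-to-clade associations as summarized in the text for soybean MAPKs.
MOTIF_CLADES = {
    "TEY": "clades A, B, and C (also four members of clade E)",
    "TDY": "clade D",
    "TQY": "clade E",
    "TVY": "clade B (GmMAPK5-2)",
}

def classify_txy(activation_loop):
    """Return the first T-x-Y tripeptide in the segment, or None if absent."""
    match = TXY.search(activation_loop.upper())
    return match.group(0) if match else None

# Hypothetical activation-loop segment, used for illustration only.
segment = "DFGLARTTSETDFMTEYVVTRWYRAPE"
motif = classify_txy(segment)
print(motif, "->", MOTIF_CLADES.get(motif, "unclassified"))
```

A motif outside the lookup table is reported as unclassified rather than forced into a clade, which mirrors how the TQY and TVY types stood out as new in this analysis.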

Functional genomics

The presence of large numbers of duplicate MAPK genes in the soybean genome led us to survey the functional genomics of these genes. Gene duplication may result in functional redundancy, divergence, and diversification, including neofunctionalization (where one of the copies acquires a new function), nonfunctionalization (where one of the copies loses function), subfunctionalization (where duplicated gene copies evolve to complement one another to retain the ancestral gene function), or hypofunctionalization (where expression of one of the copies is diminished).59–63 With the recent surge in whole genome sequencing and the availability of sequencing data, it is not feasible to characterize every gene in different species through laboratory experiments. Therefore, integrated approaches including bioinformatics and functional genomics are vital to the study of gene functions and their evolution. We used transcriptomic data (www.soybase.org, http://plantgrn.noble.org/LegumeIP and http://mpss.udel.edu/) and mapped them onto phylogenies to assess evolution and functional divergence of GmMAPKs. Genes involved in stress-specific physiological responses are evolutionarily more conserved and the genes involved in deoxyribonucleic acid (DNA) repair are likely to be lost; this is as expected of allopolyploidization. Some genes are deemed to become nonfunctional and some will evolve to gain new functions.64,65 Using phylogenetic placements and expression profiles, we inferred functional divergence in paralogs of GmMAPK3, GmMAPK4, GmMAPK5, and GmMAPK11 (Fig. 2). In addition, GmMAPK5-2, which carries a TVY motif in the activation loop, differs from its paralog GmMAPK5-1, though the functional significance of this change is yet to be investigated. Our inference about the subfunctionalization of MAPK16 paralogs in soybean (Fig. 2) is consistent with the functional divergence reported for their orthologs in poplar.13 
On the other hand, those genes with very low to null expression in all tissues are perhaps undergoing nonfunctionalization. Interestingly, out of six GmMAPK genes nested into clade E, four paralogs of GmMAPK23 have the TEY motif and two paralogs of GmMAPK22 have the TQY motif (Fig. 1). The majority of these genes have higher expression values. The presence of the novel clade of these genes (two of them with the TQY motif) led us to predict that this clade of genes is perhaps involved in legume-specific physiology and organogenesis. Previous work has shown that nematodes such as C. elegans play an important mediatory role in establishing legume-rhizobia symbiotic interactions by transferring Sinorhizobium meliloti to the root tissues of Medicago truncatula in response to nematode-attracting signaling molecules released by the plant.66 Interestingly, C. elegans, which has some MAPKs with the TQY motif, is able to identify the bacteria for its food needs under varying environmental conditions.67,68 Further investigation of these genes and potential nematode-rhizobia-legume interaction(s) would provide valuable insights into their roles in root nodulation or any legume-specific physiology. In the transcriptomic data survey for MAPKK genes, Figure 4 shows relatively higher expression values for GmMAPKK4 and GmMAPKK5 than for other GmMAPKKs. The expression pattern of these two genes is consistent with that of their orthologs in Arabidopsis (source: MPSS database).90 Protein-protein interactions of AtMAPKK4 and AtMAPKK5 with AtMAPK6 have also been well established in a previous study, relative to other AtMAPKKs.69 Interestingly, the ortholog of AtMAPKK4 in rice (Phytozome: LOC_Os02g54600) also shows higher protein expression values relative to other OsMAPKKs. 
This rice ortholog acts as an upstream MAPKK of OsMAPK3 (Phytozome: LOC_Os03g17700).70 Phosphorylation of OsMAPKK4 is undertaken by six upstream MAPKKKs (Phytozome: LOC_Os01g50370; LOC_Os05g46760; LOC_Os01g50400; LOC_Os01g50410; LOC_Os01g50420; and LOC_Os05g46750), prompting the pathways that regulate myriad stress responses including pathogen, insect, drought, salinity, flood, and cold.71 This pathway in rice seems to parallel the predicted AtMKK4-AtMAPK3/AtMAPK6 pathway in Arabidopsis, directing cellular responses in various pathogen-related stresses.3,31,35,72,73 In soybean, GmMAPKK8 and GmMAPKK10 show no expression in any of the tissues examined (Fig. 4). This result in soybean was also consistent with the MPSS expression data for MAPKK8 and MAPKK10 of Arabidopsis, and with all the paralogs of MAPKK10 in poplar and in rice, except for OsMAPKK10-2, as previously shown.13 Protein-protein interaction assays on MAPK and MAPKK of Arabidopsis also suggest no evidence of the interaction of AtMAPKK8 with any of the AtMAPKs, and a very weak interaction of AtMAPKK10 with only AtMAPK17.69 In the same study, there was no evidence of the interaction of AtMAPK8, AtMAPK9, AtMAPK12, AtMAPK16, AtMAPK18, and AtMAPK19 with any of the AtMAPKK members. This perhaps could serve as evidence that the majority of plant signal transduction pathways involve a few regulatory MAPKs, and that the same signaling pathway might be used for multiple responses.74 In the case of GmMAPKKKs, a large number of MEKK-like, Raf-like, and ZIK-like genes present a more complex evolutionary pattern. Relative to the evolutionarily less dynamic MAPKKKs of Arabidopsis, the GmMAPKKK subfamily seems to be extensively amplified and functionally diversified. Differentially expressed paralogs of different genes of the MAPKKK subfamily also showed an interesting pattern of functional divergence. 
One or more paralogs of GmMAPKKKs such as GmMAPKKK3 (GmMAPKKK3-1 and GmMAPKKK3-2), GmMAPKKK4 (GmMAPKKK4-1, GmMAPKKK4-2, and GmMAPKKK4-3), GmMAPKKK11 (GmMAPKKK11-1 and GmMAPKKK11-2), GmMAPKKK13 (GmMAPKKK13-1 and GmMAPKKK13-2), GmMAPKKK24 (GmMAPKKK24-1 and GmMAPKKK24-2), GmRaf3 (GmRaf3-1 and GmRaf3-2), GmRaf20 (GmRaf20-1 and GmRaf20-2), GmRaf23 (GmRaf23-1 and GmRaf23-2), GmRaf27 (GmRaf27-1 and GmRaf27-2), GmRaf31 (GmRaf31-1, GmRaf31-2, and GmRaf31-3), GmRaf32 (GmRaf32-1 and GmRaf32-2), GmRaf33 (GmRaf33-1 and GmRaf33-2), GmRaf35 (GmRaf35-1 and GmRaf35-2), GmRaf36 (GmRaf36-1, GmRaf36-2, and GmRaf36-3), GmZIK6 (GmZIK6-1 and GmZIK6-2), and GmZIK9 (GmZIK9-1 and GmZIK9-2) are inferred to have undergone functional divergence through either subfunctionalization or neofunctionalization of the duplicated copies (Figs. 6–8). Similarly, some of the duplicated genes are retaining or gaining significant levels of expression in one or a few tissues, as found in the case of GmMAPKKK22 (GmMAPKKK22-1 and GmMAPKKK22-2), GmRaf16-2, GmRaf19-3, GmRaf51-1, and GmZIK1 (GmZIK1-1 and GmZIK1-2). We speculate that this gain in function might have occurred after the duplication of genes that had undergone severe mutation, destroying their ability to process information for biological processes and functions. From the expression data, one or more duplicated copies of the MAPKKK genes in soybean, such as GmMAPKKK5-1, GmMAPKKK18 (GmMAPKKK18-1, GmMAPKKK18-2, and GmMAPKKK18-3), GmMAPKKK17, GmRaf4 (GmRaf4-1 and GmRaf4-2), GmRaf6 (GmRaf6-2, GmRaf6-3, and GmRaf6-4), GmRaf16 (GmRaf16-1 and GmRaf16-2), GmRaf19 (GmRaf19-1, GmRaf19-2, GmRaf19-3, and GmRaf19-4), GmRaf26, GmRaf30 (GmRaf30-2), GmRaf34 (GmRaf34-2), GmRaf41 (GmRaf41-2), GmRaf42 (GmRaf42-3), GmRaf49 (GmRaf49-3), GmRaf51 (GmRaf51-1 and GmRaf51-2), GmZIK1 (GmZIK1-3), GmZIK8 (GmZIK8-1 and GmZIK8-3), and GmZIK12 (GmZIK12-1) are inferred to be undergoing nonfunctionalization (Figs. 6–8). 
A thorough study of the expression data for these genes is required to understand the evolutionary processes involved in gene/genome duplications in polyploid species such as soybean. We also realized the need for functional genomics and rigorous analyses to test our inferences, but they were beyond the scope of this project.
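The expression-based reasoning in this section, where copies with very low to null expression in all tissues are treated as candidates for nonfunctionalization, can be sketched as a simple screen. This is only an illustration under stated assumptions: the threshold of 1.0, the pseudocount used before log2 scaling, the gene labels, and the expression values are all hypothetical and are not taken from the study.

```python
import math

def log2_scale(values, pseudocount=1.0):
    """Log2-scale expression values; the pseudocount (an assumption) keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]

def is_silent(values, threshold=1.0):
    """True when expression falls below the (hypothetical) threshold in every tissue."""
    return all(v < threshold for v in values)

def flag_paralogs(expression, threshold=1.0):
    """Label each gene copy by whether any tissue shows expression above the threshold."""
    return {
        gene: "low to null in all tissues" if is_silent(values, threshold) else "expressed"
        for gene, values in expression.items()
    }

# Hypothetical normalized expression values for two duplicated copies in three tissues.
example = {"copy_1": [12.0, 30.0, 0.0], "copy_2": [0.0, 0.2, 0.0]}
flags = flag_paralogs(example)
for gene, label in flags.items():
    print(gene, label, log2_scale(example[gene]))
```

A flag of this kind only nominates a copy for closer study; distinguishing nonfunctionalization from tissue- or condition-specific expression would still require the experimental follow-up the authors call for.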

MAPK nomenclature in plants

Arabidopsis is the first plant species to have its complete genome sequenced, and also to have its two MAPK gene families systematically named. We used the Arabidopsis model8 for the nomenclature of GmMAPKs and GmMAPKKs. There is no published literature on the MAPKKK nomenclature, except for the identification of putative MAPKKKs of Arabidopsis11 and rice.15 For GmMAPKKK nomenclature (see Fig. 5 and Additional file 3), we followed “The Arabidopsis Information Resources” website (TAIR, http://www.arabidopsis.org).2 Although the nomenclature model presented in the study of Arabidopsis MAPK8 is described as being robust enough to be adopted and expanded to the MAPKs of newly sequenced genomes,13 the model alone does not seem to capture the paralogous/orthologous status of all MAPK genes. Later in the study of rice and poplar MAPKs,13 the nomenclature model was redesigned to manifest the evolutionary relationships among the duplicated MAPK genes. MAPK nomenclature adopted by Hamel et al13 seems to be an appropriate step towards naming MAPK genes that resulted from duplication events including ancient polyploidization. One of these ancient polyploid genomes is Arabidopsis thaliana itself.75 Homology inferred on the basis of phylogenetic placements and sequence identity of MAPKs in Arabidopsis revealed several gene members such as AtMAPK1 and AtMAPK2, AtMAPK4 and AtMAPK11, AtMAPK7 and AtMAPK14, AtMAPK8 and AtMAPK15, AtMAPK18 and AtMAPK19 in the MAPK subfamily, as well as AtMAPKK1 and AtMAPKK2; AtMAPKK4, AtMAPKK5, and AtMAPKK7; and AtMAPKK8 and AtMAPKK9 in the MAPKK subfamily to be paralogous. Based on the phylogenetic analyses of the Arabidopsis MAPKKK protein sequences from soybean, we have assigned paralogous status to numerous genes in soybean, which could be attributable to both recent and ancient duplication events. 
A comparative genomic analysis of the dataset, including both legume and nonlegume species, was also performed to further confirm the paralogous status of some of these gene members. We encountered several plant MAPK nomenclatural inconsistencies while searching for nomenclatural codes for naming the identified GmMAPKs. In poplar,13 most of the genes were systematically named using the Arabidopsis model. Nonetheless, most of the MAPKs previously studied in rice15,42 and grapevine56 were found to be inconsistently named, without following the Arabidopsis model (Table 2). In the worst scenario, genes of a species given the same names by different authors13,42,76 do not correspond to each other. Oftentimes, the same gene name/number has been used to name different genes as orthologs in different species of prokaryotes, neglecting the fact that genes thus named are not automatically orthologs.77 A common problem is that, without knowing all of the alternative names of a MAPK gene, it is difficult to determine exactly which gene is being presented in the literature. This might be partly due to evolutionary processes, including gene and genome duplication or chromosomal rearrangements, that complicate identification and nomenclature. The problem is not confined to the MAPK family: a broader lack of consistency in nomenclature is evident in the haphazard labeling of “p21”, from p21ras to p21waf1 (also known as WAF1, CIP1, SDI1, and CAP20), each with strict functional differences, and in the lack of uniformity in the names of regulatory proteins known variously as FLIP, Casper, FLAME, CASH, and I-FLICE.78 Inconsistent practices of gene nomenclature and the lack of a universal code could potentially reduce the reliability of cross-species functional analysis.77 One of the most challenging tasks is the nomenclature of the MAPKKK subfamily, particularly in polyploid species such as soybean. 
For the nomenclature of MAPKs in plants, including those with duplicated genomes, we suggest comparison among multiple species to identify the homologs of these genes. Comparative studies including species with a larger number of identified MAPK orthologs, and perhaps the most basal diploid plant species, can facilitate the naming process. If the genome of the species does not have orthologs of a particular gene in Arabidopsis or in previously studied species, a new name should be proposed. It is therefore important to confirm whether the intended names have already been assigned to any MAPKs in previously studied organisms, and to determine whether the homology model could be applied to the nomenclature of the MAPKs of interest. Such homology-based nomenclature, however, is not always a definitive, error-free method for naming genes. In such cases, a suite of practices including phylogenetics, protein function analysis, and structural comparisons may be required for correct gene nomenclature.
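The homology-based naming pattern visible in this paper (a species prefix, the family, the number of the Arabidopsis ortholog, and a numeric suffix for duplicated copies, as in GmMAPK5-1 and GmMAPK5-2) can be sketched as follows. This is a hypothetical helper rather than the authors' procedure: the function name and the locus identifiers are invented for illustration, and ordering the suffixes by sorted locus identifier is an assumption, since the text does not state how suffixes were ordered.

```python
from collections import defaultdict

def name_paralogs(assignments, species="Gm", family="MAPK"):
    """Build gene names from a mapping of locus ID -> ortholog number.

    Single-copy genes get no suffix; duplicated copies get -1, -2, ...
    in sorted locus order (an assumed, arbitrary ordering).
    """
    groups = defaultdict(list)
    for locus, number in assignments.items():
        groups[number].append(locus)

    names = {}
    for number in sorted(groups):
        loci = sorted(groups[number])
        if len(loci) == 1:
            names[loci[0]] = f"{species}{family}{number}"
        else:
            for index, locus in enumerate(loci, start=1):
                names[locus] = f"{species}{family}{number}-{index}"
    return names

# Hypothetical locus identifiers and ortholog assignments.
example = {"locus_c": 5, "locus_a": 5, "locus_b": 10}
print(name_paralogs(example, family="MAPKK"))
```

Whatever ordering rule is chosen, recording it alongside the names would help avoid the cross-study mismatches described above.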

Conclusion

Systematic identification of MAPK genes in soybean is vital to our understanding of their roles in stress response, growth, development, and defense mechanisms. In this study, we have presented genome-wide identification and nomenclature of GmMAPK families. A total of 38 GmMAPKs, 11 GmMAPKKs, and 150 GmMAPKKKs were identified, and our results suggest the expansion of GmMAPK families in soybean due to ancient genome duplications and recent chromosomal rearrangements. Expression profiles based on transcriptomic data showed expression patterns on a continuum from null expression to a high level of expression in almost all examined tissues under the given experimental conditions. The expression profiles, when mapped onto the phylogeny, provided evidence of strong functional divergence in GmMAPKs. Comparative genomics of legume and nonlegume species and characterization of legume-specific genes using ribonucleic acid interference (RNAi) and protein-protein interaction are underway in the authors’ labs. In addition, our ongoing study on structural conservation, selection pressure, and expression experiments is expected to add another dimension to the current findings. Evolutionary processes driving the expansion of the MAPK families can be better understood through comparative genomics of MAPKs from multiple representative species at various taxonomic levels. Advancement in DNA sequencing technology is yielding a wealth of genome sequence data from an increasing number of taxa, making the systematic identification and nomenclature of their genes challenging. The results from this study may facilitate functional dissection of the important genes and communication about them among scientific communities.

Materials and Methods

Profile hidden Markov models (HMMs)

In order to perform a thorough search for MAPK gene members in soybean, we performed HMM searches based on the multiple sequence alignments of 20 MAPKs, ten MAPKKs, and 80 MAPKKKs of Arabidopsis. The HMM profiles were built using HMMER version 3.0 (HMMER Project; Howard Hughes Medical Institute, Chevy Chase, MD, USA) on our Linux system, and homology searches were executed against the whole protein dataset from the soybean genome available at phytozome.net. The resulting gene models from the hmmbuild/hmmsearch runs were accepted only if they were within the inclusion threshold (E-value of 0.01). The resulting sequences were aligned in ClustalW version 2.0,79 as described previously.13 The MAPK gene models were accepted only if they displayed the consensus for MAPK-specific motifs and domains.2,8
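The E-value acceptance step can be illustrated with a small parser for tabular hmmsearch output. This is a minimal sketch, assuming the search was run with the HMMER3 `--tblout` option, whose whitespace-separated rows begin with the target name and carry the full-sequence E-value in the fifth field; the function name and the two example rows are hypothetical and do not reproduce any result from the study.

```python
def filter_tblout(text, max_evalue=0.01):
    """Return (target, E-value) pairs at or below the E-value threshold.

    Assumes HMMER3 --tblout layout: comment lines start with '#',
    field 1 is the target name, and field 5 is the full-sequence E-value.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.append((target, evalue))
    return hits

# Hypothetical rows in --tblout style (values invented for illustration).
example = """# target name  accession  query name  accession  E-value  score  bias
seqA - mapk_profile - 1.2e-80 270.1 0.1 1.5e-80 269.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical
seqB - mapk_profile - 0.5 10.2 0.0 0.7 9.8 0.0 1.0 1 0 0 1 1 1 0 hypothetical
"""
print(filter_tblout(example))
```

Hits that pass this filter would still need the motif and domain checks described in the text before being accepted as MAPK gene models.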

Identification of conserved domains

MEME was used to predict similarities among protein sequences and visualize conserved motifs in specific subdomains of MAPK, MAPKK, and MAPKKK under two conditions: (1) the motif widths were set to be between six and 50; and (2) the motif search was set to identify ten motif regions.80 We also chose to use a large set of protein sequences from each subfamily of MAPK and MAPKK, and from each subgroup of MAPKKK (MEKK-like, Raf-like, and ZIK-like), for each input dataset. This allowed us to predict a higher number of conserved domains while minimizing noise. The identified genes were confirmed using their signature motifs and published domains.2,3,8,9,14,15,81 The conserved protein domains of the MAPK gene members were analyzed using the PROSITE protein database,82 and the domains within each query sequence were analyzed for the presence of serine and threonine kinase residues using Pfam (pfam.sanger.ac.uk) and SMART (www.smart.emblheidelberg.de). We also confirmed the presence of residues required for kinase activity in the catalytic loop.57,58
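The two sequence checks mentioned in this paper, the D(L/I/V)K motif of the catalytic loop and the S/TxxxxxS/T motif of the MAPKK activation loop, can be expressed as regular expressions. This is a hedged sketch and not the pipeline used by the authors: the function names and the example segments are hypothetical, and the activation-loop pattern is only meaningful when applied to the activation-loop segment, because such a loose pattern would match almost any full-length protein by chance.

```python
import re

# Catalytic-loop motif D(L/I/V)K: aspartate, then L, I, or V, then lysine.
CATALYTIC = re.compile(r"D[LIV]K")

# MAPKK activation-loop motif S/T-x5-S/T: two phosphorylatable residues
# separated by exactly five residues of any kind.
MAPKK_ACTIVATION = re.compile(r"[ST].{5}[ST]")

def has_catalytic_motif(sequence):
    """True if the sequence contains a D(L/I/V)K tripeptide."""
    return CATALYTIC.search(sequence.upper()) is not None

def has_mapkk_activation_motif(loop_segment):
    """True if the activation-loop segment contains S/T-x5-S/T."""
    return MAPKK_ACTIVATION.search(loop_segment.upper()) is not None

# Hypothetical segments, for illustration only.
print(has_catalytic_motif("HRDLKPSN"))           # contains DLK
print(has_mapkk_activation_motif("GVSAAAAATF"))  # S, five residues, then T
```

A protein such as the MAPKK10 orthologs discussed earlier would be expected to pass the first check while failing the second.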

Phylogenetic analysis of GmMAPK families

The MAPK and MAPKK protein sequences for rice, poplar,13 and Arabidopsis2 (TAIR; http://www.arabidopsis.org) were directly adopted from the published data sources. The MAPK protein dataset was aligned using ClustalW. The aligned amino acid sequences were manually edited using the program Se-AL (v2.0a11 Carbon; http://tree.bio.ed.ac.uk/software/seal) in order to avoid errors related to residue misplacements due to highly variable indel sizes. The data matrices were analyzed using the MP method in the program PAUP* 4.0b10.83 Heuristic searches were performed with all characters treated as equally weighted and unordered. Portions of the data matrices with ambiguous alignments were excluded from the analysis. Data matrices were also analyzed using the ML method under the best evolutionary model in the program MEGA5.2.2.84 Branch supports were computed using bootstrapping85 with 2,000 and 100 replicates for MP and ML analyses, respectively. Human MAPK sequences HsMAPK1 (GenBank: NP_002736.3), HsMAPKK1 (GenBank: AAI37460.1), and HsMAPKKK10 (GenBank: NP_002437.2) were used as outgroups for the analyses of the MAPK, MAPKK, and MAPKKK families, respectively. Gene-gene relationships were examined based on the phylogenetic trees, the genetic distance for each member of all three MAPK subfamilies, and their sequence identities (Additional file 5).
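Bootstrap support of the kind reported here rests on resampling alignment columns with replacement and re-inferring a tree from each pseudo-replicate. The resampling step alone can be sketched as below; this is an illustration and not the PAUP* or MEGA implementation, the function name and the toy alignment are hypothetical, and the tree inference that would follow each replicate is deliberately left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns with replacement.

    alignment maps sequence names to equal-length aligned sequences;
    the same sampled columns are applied to every sequence.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(sequence[c] for c in columns)
        for name, sequence in alignment.items()
    }

# Toy alignment, for illustration only.
toy = {"seq_a": "ACDE", "seq_b": "ACDF", "seq_c": "GCDE"}
rng = random.Random(1)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

The support value for a clade is then the fraction of replicate trees in which that clade appears, which is why the replicate counts (2,000 for MP and 100 for ML) are reported alongside the trees.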

Homology assessment, nomenclature, and transcriptomic data

A sequence identity greater than 40% was used to detect homology among the protein sequences.86–88 Homology confidence among the protein sequences was established using sequence identity (40% or higher identity), ML estimates of pairwise genetic distances (smallest genetic distance among the potential homologs), and phylogenetic placements to define the paralogous status of the duplicated genes and also to assign a number to the identified soybean MAPK genes. Orthologous and paralogous relationships were also inferred through both parametric (ML) and nonparametric (MP) methods. Available RNAseq expression data were downloaded from the RNAseq atlas of soybean89 (www.soybase.org). Arabidopsis MAPK expression data were also downloaded from the MPSS database90 (http://mpss.udel.edu/; accessed May 12th, 2012), which was the database used for comparative analysis. Mayday workbench91 was used to create a heatmap visualization for the expression data of soybean and Arabidopsis MAPK genes. For heatmap visualization, normalized gene expression values at different growth conditions, tissues, and stress levels were log2-transformed to explore the functional divergence across the soybean MAPK genes and their orthologs in Arabidopsis.

Additional file 1: maximum parsimony analysis. The values above the branches are bootstrap support of 2,000 replicates. The members with the phosphorylation motif TEY are included in clades A, B, and C, TDY in clade D, and the members with the TQY (denoted by *) and TVY (denoted by **) motif in clades E and B, respectively. The MAPK gene models were accepted for phylogenetic analysis using protein sequences of the serine/threonine kinase subfamily having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif and TXY phosphorylation motif in their activation loop.

Additional file 2: maximum parsimony analysis. The values above the branches are bootstrap support of 2,000 replicates. The MAPKK gene models were accepted for phylogenetic analysis using dual-specificity protein kinases having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif and the S-X5-T phosphorylation motif along their activation loop.

Additional file 3: maximum parsimony analysis. Phylogenetic representation of GmMAPKKKs in circular tree format shows the three subgroups: GmMEKK-like, GmRaf-like, and GmZIK-like, as labeled. The values above the branches are bootstrap support of 2,000 replicates. The MAPKKK gene models were accepted for phylogenetic analysis using protein sequences of the serine/threonine kinase subfamily having conserved aspartate and lysine residues in their catalytic domain with the (D[L/I/V]K) motif, and the members in each subgroup were confirmed based on their signature motifs.

Additional file 4: chromosomal distribution. Distribution of gene members of the GmMAPK family in 20 sets of soybean chromosomes.

Additional file 5: plant MAPK gene nomenclature and homology assessment based on sequence identity, genetic distance, and phylogenetic placement.
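The identity criterion used in the homology assessment (sequence identity of 40% or higher) can be illustrated with a small function over a pairwise alignment. This is a sketch under stated assumptions, not the authors' calculation: the function names and example sequences are hypothetical, and excluding gapped columns from the denominator is one of several common conventions, since the text does not say which was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Skipping gapped columns is an assumed convention, not one stated in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def is_candidate_homolog(aligned_a, aligned_b, cutoff=40.0):
    """Apply the 40%-or-higher identity criterion to one aligned pair."""
    return percent_identity(aligned_a, aligned_b) >= cutoff

# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ACDE", "ACDF"))
print(is_candidate_homolog("ACDEFGHIKL", "ACDEXXXXXX"))
```

In the study, identity was only one of three lines of evidence; pairwise genetic distance and phylogenetic placement were combined with it before a paralogous status or gene number was assigned.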
  102 in total

Review 1.  Preservation of duplicate genes by complementary, degenerative mutations.

Authors:  A Force; M Lynch; F B Pickett; A Amores; Y L Yan; J Postlethwait
Journal:  Genetics       Date:  1999-04       Impact factor: 4.562

2.  MEKK1 is required for flg22-induced MPK4 activation in Arabidopsis plants.

Authors:  Maria Cristina Suarez-Rodriguez; Lori Adams-Phillips; Yidong Liu; Huachun Wang; Shih-Heng Su; Peter J Jester; Shuqun Zhang; Andrew F Bent; Patrick J Krysan
Journal:  Plant Physiol       Date:  2006-12-01       Impact factor: 8.340

3.  Diverse stress signals activate the C1 subgroup MAP kinases of Arabidopsis.

Authors:  Dolores Ortiz-Masia; Miguel A Perez-Amador; Juan Carbonell; Maria J Marcote
Journal:  FEBS Lett       Date:  2007-04-09       Impact factor: 4.124

4.  Soil nematodes mediate positive interactions between legume plants and rhizobium bacteria.

Authors:  Jun-ichiro Horiuchi; Balakrishnan Prithiviraj; Harsh P Bais; Bruce A Kimball; Jorge M Vivanco
Journal:  Planta       Date:  2005-07-15       Impact factor: 4.116

5.  HINKEL kinesin, ANP MAPKKKs and MKK6/ANQ MAPKK, which phosphorylates and activates MPK4 MAPK, constitute a pathway that is required for cytokinesis in Arabidopsis thaliana.

Authors:  Yuji Takahashi; Takashi Soyano; Ken Kosetsu; Michiko Sasabe; Yasunori Machida
Journal:  Plant Cell Physiol       Date:  2010-08-27       Impact factor: 4.927

6.  Isolation and characterization of two growth factor-stimulated protein kinases that phosphorylate the epidermal growth factor receptor at threonine 669.

Authors:  I C Northwood; F A Gonzalez; M Wartmann; D L Raden; R J Davis
Journal:  J Biol Chem       Date:  1991-08-15       Impact factor: 5.157

7.  Evidence that CTR1-mediated ethylene signal transduction in tomato is encoded by a multigene family whose members display distinct regulatory features.

Authors:  Lori Adams-Phillips; Cornelius Barry; Priya Kannan; Julie Leclercq; Mondher Bouzayen; Jim Giovannoni
Journal:  Plant Mol Biol       Date:  2004-02       Impact factor: 4.076

Review 8.  Specificity of receptor tyrosine kinase signaling: transient versus sustained extracellular signal-regulated kinase activation.

Authors:  C J Marshall
Journal:  Cell       Date:  1995-01-27       Impact factor: 41.582

9.  Negative regulation of defense responses in plants by a conserved MAPKK kinase.

Authors:  C A Frye; D Tang; R W Innes
Journal:  Proc Natl Acad Sci U S A       Date:  2001-01-02       Impact factor: 11.205

10.  MAPK phosphatase AP2C3 induces ectopic proliferation of epidermal cells leading to stomata development in Arabidopsis.

Authors:  Julija Umbrasaite; Alois Schweighofer; Vaiva Kazanaviciute; Zoltan Magyar; Zahra Ayatollahi; Verena Unterwurzacher; Chonnanit Choopayak; Justyna Boniecka; James A H Murray; Laszlo Bogre; Irute Meskiene
Journal:  PLoS One       Date:  2010-12-23       Impact factor: 3.240
