Abstract
BACKGROUND: The late embryogenesis abundant (LEA) proteins cover a number of loosely related groups of proteins, originally found in plants but now being found in non-plant species. Their precise function is unknown, though considerable evidence suggests that LEA proteins are involved in desiccation resistance. Using a number of statistically-based bioinformatics tools the classification of a large set of LEA proteins, covering all Groups, is reexamined together with some previous findings. Searches based on peptide composition return proteins with similar composition to different LEA Groups; keyword clustering is then applied to reveal keywords and phrases suggestive of the Groups' properties.
Year: 2003 PMID: 14583099 PMCID: PMC280651 DOI: 10.1186/1471-2105-4-52
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
LEA classification rule set induced by supervised learning
| Group | Rule |
| 2a | H <= 0.15 and aromatic >= 0.077 and min_hyph <= -1.97 and charged <= 0.42 |
| 2b | L >= 0.23 and H <= 0.3 and ave_hyph >= -1.233 and ave_hyph <= -0.978 |
| 2c | aromatic >= 0.077 and min_hyph <= -2.743 and charged >= 0.4 |
| 3 | H >= 0.34 |
| 1 | E >= 0.02 and ave_hyph <= -1.241 |
| LE5 | max_hyph >= 1.0 and ave_hyph <= -0.3 |
| LE14 | aliphatic >= 0.25 |
| 6 | H >= 0.25 and max_hyph >= 0.5 |
| 4 | Otherwise |
The rules are applied top-down, in an if .. else if manner. For example, a protein is classified as a Group 3 LEA protein if the percentage of predicted helical conformation (expressed as a number in the range 0 .. 1.0) is greater than or equal to 0.34, but only if each of the rules above it has failed, e.g. because the percentage of aromatic residues is less than 0.077 in the Group 2 rules. min_hyph and max_hyph are, respectively, the minimum and maximum values over the hydrophobicity windows, while ave_hyph is the average across all the hydrophobicity windows. H, E and L refer, respectively, to the percentage of amino acids that ProteinPredict (in four-state mode) finds to be alpha-helical, beta-sheet or loop.
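The top-down application of the rule set can be sketched as an ordered list of predicates, where the first match wins and Group 4 is the fallback. This is only an illustration of the table above: the `Features` container and its field names are assumptions introduced here, and the feature values in the usage note are hypothetical, not taken from any protein in the study.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical container for the per-protein features named in the rules.

    All fractions are in the range 0..1; hydrophobicity values are window scores.
    """
    H: float          # fraction predicted alpha-helical
    E: float          # fraction predicted beta-sheet
    L: float          # fraction predicted loop
    aromatic: float   # fraction of aromatic residues
    aliphatic: float  # fraction of aliphatic residues
    charged: float    # fraction of charged residues
    min_hyph: float   # minimum hydrophobicity window value
    max_hyph: float   # maximum hydrophobicity window value
    ave_hyph: float   # average over all hydrophobicity windows


# Rules in the order given in the table; order matters (if .. else if).
RULES = [
    ("2a", lambda f: f.H <= 0.15 and f.aromatic >= 0.077
                     and f.min_hyph <= -1.97 and f.charged <= 0.42),
    ("2b", lambda f: f.L >= 0.23 and f.H <= 0.3
                     and -1.233 <= f.ave_hyph <= -0.978),
    ("2c", lambda f: f.aromatic >= 0.077 and f.min_hyph <= -2.743
                     and f.charged >= 0.4),
    ("3", lambda f: f.H >= 0.34),
    ("1", lambda f: f.E >= 0.02 and f.ave_hyph <= -1.241),
    ("LE5", lambda f: f.max_hyph >= 1.0 and f.ave_hyph <= -0.3),
    ("LE14", lambda f: f.aliphatic >= 0.25),
    ("6", lambda f: f.H >= 0.25 and f.max_hyph >= 0.5),
]


def classify(f: Features) -> str:
    """Return the label of the first rule that fires, or '4' if none does."""
    for group, predicate in RULES:
        if predicate(f):
            return group
    return "4"
```

With hypothetical features such as `Features(H=0.5, E=0.0, L=0.1, aromatic=0.01, aliphatic=0.1, charged=0.3, min_hyph=-1.0, max_hyph=0.2, ave_hyph=-0.5)`, the three Group 2 rules fail (low aromatic content, low loop fraction) and the helical rule then assigns Group 3.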
LEA Protein Group 2 (D11) Exemplar(s): DH11_GOSHI
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| DH11_GOSHI | GOSHI | Seed | ABA, | 2Y(2), 2S(9), k(4) | 1 | PF00257_ma |
| DH14_LYCES | LYCES | Root, Stem Leaf | ABA, Salt notCold | 2Y(1), 2K(1), 2S(5) | 1 | PF00257_ma; DH1B_ORYSA (5e-17) |
| DH15_WHEAT | WHEAT | Root | ABA, Desc | 2Y(1), 2K(2) 2S(8) | 1 | PF00257_ma; DH1B_ORYSA (1e-37) |
| DH18_ARATH | ARATH | Leaf Stem | ABA, Desc notCold | 2Y(2), 2K(2) 2S(5) | 1,3 | PF00257_ma EC40_DAUCA (3e-25) |
| DH1B_ORYSA | ORYSA | Seed, Shoot | ABA, | 2Y(2), 2K(2), 2S(8) | 1,10 | PF00257_ma |
| DH1C_ORYSA | ORYSA | Seed, Shoot | ABA, | 2Y(2), 2K(2), 2S(8) | 1,10 | PF00257_hmm |
| DH1_MAIZE | MAIZE | Shoot | ABA, Desc | 2Y(1), 2K(2) 2S(7) | 1 | PF00257_ma DH1B_ORYSA (4e-47) |
| DH25_ORYSA | ORYSA | Callus | ABA, Desc notCold | 2Y(2), 2K(2), 2S(9), k(3) | 1,8 | PF00257_ma DHLE_RAPSA (2e-26) |
| DHA_CRAPL | CRAPL | Leaf | ABA, Desc | 2K(1), 2S(7) | 10 | PF00257_ma; DHLE_RAPSA (8e-20) |
| DHB_CRAPL | CRAPL | Leaf | ABA, Desc | 2Y(1), 2K(2) 2S(8) | 1 | PF00257_ma; DH1B_ORYSA |
| DHLE_RAPSA | RAPSA | Seed | | 2Y(3), 2K(1) 2S(7) | 1 | PF00257_ma |
| DHN1_PEA | PEA | Shoot Cotyledon | ABA, Desc notCanon | 2Y(3), 2K(2) | 1 | PF00257_ma DH1B_ORYSA (1e-17) |
| EC40_DAUCA | DAUCA | Seed, Embryo cells | ABA, | 2Y(3), 2K(1) 2S(6) | 1 | PF00257_hmm; |
| O22623 | VACCO | Floral buds, Leaf | Cold | None | 1,3 | DH1B_ORYSA (1e-5) |
| O65216 | WHEAT | Leaf, Root Crown, Seed | ABA, Cold, Desc | 2K(6) | 1,3 | PF00257_hmm; DH1B_ORYSA (8e-18) |
| P93701 | VIGUN | Leaf | ABA, notCold Desc, Salt | 2Y(2), 2K(1) | 1 | PF00257_hmm; EC40_DAUCA (6e-23) |
| Q39937 | HELAN | Leaf | ABA, Desc | 2Y(3), 2K(1), 2S(7) | 1 | PF00257_hmm; EC40_DAUCA (1e-38), |
| Q39938 | HELAN | Leaf | ABA, Desc | 2K(2), 2S(5), k(3) | 1 | PF00257_hmm; DH1B_ORYSA (6e-21) |
| Q40331 | MEDFA | Callus | notABA, Cold, notDesc | 2K(1) | 1 | PF00257_hmm; DH1C_ORYSA (1e-7) |
| Q40968 | PRUPE | Bark | Cold, Desc | 2Y(2), 2K(4) | 1 | PF00257_hmm; EC40_DAUCA(3e-13) |
| Q41306 | SOLCO | Leaf, Stem | ABA, Cold | 2Y(2), 2K(2), 2S(7), k(3) | 1 | PF00257_hmm; DHLE_RAPSA (9e-29) |
| Q41451 | SOLTU | Leaf, Stem | ABA, Cold | 2Y(2), 2K(2), 2S(7), k(3) | 1 | PF00257_hmm; DHLE_RAPSA (2e-29) |
| Q9SBI7 | HORVU | Seedling | Desc, ABA, Cold | 2K(9) | 1,3 (295) | PF00257_hmm; DH1B_ORYSA (5e-16) |
| Q9SPL8 | VIGUN | Seed | Cold | 2Y(2), 2K(1) | 1 | PF00257_hmm; EC40_DAUCA (1.8e-25) |
| Q9ZTR2 | HORVU | Seedling | Desc, notABA notCold | 2Y(2), 2K(3), 2S(9) | 1 | PF00257_hmm; DH1C_ORYSA (4e-33) |
| Q9ZTR3 | HORVU | Seedling | Desc, ABA notCold | 2Y(1), 2K(2), 2S(8) | 1 | PF00257_hmm; DH1B_ORYSA (2e-44) |
| Q9ZTR4 | HORVU | Seedling | Desc, ABA, notCold | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_hmm; DH1C_ORYSA (1e-47) |
| Q9ZTR5 | HORVU | Seedling | Desc, ABA, notCold | 2Y(2), 2K(3), 2S(9) | 1,9 | PF00257_hmm; EC40_DAUCA (1e-43) |
| COR4_WHEAT | WHEAT | Root, Leaf Crown | ABA, Cold Desc | 2K(1), 2S(9), k(7) | 3 | PF00257_ma, EC40_DAUCA (1e-7) |
| CS12_WHEAT | WHEAT | Shoot | Cold, notABA notDesc | 2K(6) | 1,3 | PF00257_ma, EC40_DAUCA (8e-17) |
| CS66_WHEAT | WHEAT | Shoot | Cold, notABA notDesc | 2K(5) | 1,3 | PF00257_hmm, DH1B_ORYSA (6e-17) |
| DH14_ARATH | ARATH | Leaf, Stem, Root, Seed, Flower | ABA, Desc notCanon Cold | 2K(2), 2S(7), k(10) | 3 | PF00257_ma; EC40_DAUCA (2e-10) |
| DH1D_ORYSA | ORYSA | Seed Shoot | ABA Salt | 2Y(1), 2K(2), 2S(4) | 1 | PF00257_ma; DH1B_ORYSA (9e-43) |
| DH1_HORVU | HORVU | Shoot | ABA, Desc Desc | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_ma; DH1B_ORYSA (2e-35) |
| DH21_ORYSA | ORYSA | Seed | ABA, Desc | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_hmm; DH1B_ORYSA (3e-50) |
| DH2_HORVU | HORVU | Shoot | ABA, Desc | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_hmm; DH1C_ORYSA (4e-35) |
| DH3_HORVU | HORVU | Shoot | ABA, Desc | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_ma; DH1C_ORYSA (7e-45) |
| DH4_HORVU | HORVU | Shoot | ABA, Desc | 2Y(1), 2K(2), 2S(7) | 1 | PF00257_hmm; DH1C_ORYSA (2e-38) |
| DH47_ARATH | ARATH | Leaf, Stem Seed | ABA, Cold, Desc, notCanon | 2K(3), 2S(7), k(4) | 3 | PF00257_hmm DHLE_RAPSA (2e-12) |
| DHX2_ARATH | ARATH | Leaf, Stem | Cold, weak ABA weak Desc | 2K(3) | 1 | PF00257_hmm |
| O64939 | LOPEL | Root | Salt | 2K(6) | 1,3,9 | PF00257_hmm; DH1B_ORYSA (7e-16) |
| Q41347 | STELP | Leaf | ABA, Desc, PEG | 2K(1), 2S(5), 2S(6), k(3) | 8 | PF00257_hmm; DHLE_RAPSA (4e-8) |
| Q42409 | TRITU | Root, Shoot | ABA, Desc | 2K(2) | 1 | PF00257_hmm; DH1C_ORYSA (6e-16) |
| Q43488 | HORVU | Leaf | notABA, Cold Desc | 2K(1), 2S(9), k(10) | 3 | PF00257_hmm; DH1B_ORYSA (4e-9) |
| DH10_ARATH | ARATH | Leaf, Stem, Root, Seeds Flower | weak ABA, weak Desc, notCanon Cold | 2K(2), 2S(7), k(11) | 3 | PF00257_ma EC40_DAUCA (4e-9) |
| O04232 | SOLTU | Tuber | Cold | 2K(2), 2S(9), k(9) | 3 | PF00257_hmm; DH1C_ORYSA (3e-11) |
| O48622 | SPIOL | Shoot | Cold, Desc | 2K(4), k(3) | 3, (280) | EC40_DAUCA (1e-6) |
| Q41091 | PONTR | Leaf Leaf | Cold, notSalt notDesc | 2S(5), k(8) | 3 | DH1C_ORYSA (1e-8) |
| Q9XEL3 | PICGL | Bud, Stem | ABA, Cold, Desc | 2K(3), 2S(8), k(12) | 3 | PF00257_hmm; DH1C_ORYSA (2e-11) |
| Q9ZR21 | CITUN | Leaf | Cold | 2S(5), k(9) | 3 | DH1C_ORYSA (1e-9) |
LEA Group 2 proteins, with the exemplar being DH11_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether any of the Close LEA Group 2 motifs 2Y, 2K or 2S, or poly-lysine stutters are detected using agrep and the number of times each is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group 2.
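The Pep column packs motif labels and hit counts into strings such as `2Y(2), 2S(9), k(4)`. A minimal sketch of turning such a cell into counts follows; the function name is hypothetical, the reading of `k` as the poly-lysine stutter is inferred from the caption, and summing a label that appears twice in one cell is an assumption rather than something the table specifies.

```python
import re
from collections import Counter

# A motif label (e.g. 2Y, 2K, 2S, k, 1, 3) followed by a count in parentheses.
_MOTIF = re.compile(r"([0-9A-Za-z]+)\((\d+)\)")


def parse_pep_cell(cell: str) -> Counter:
    """Parse a Pep cell into {motif label: number of hits}.

    Separators between entries are ignored, so both '2K(2), 2S(8)' and
    '2K(2) 2S(8)' parse the same way. Cells such as 'None' or '' give an
    empty Counter. Repeated labels are summed (an assumption).
    """
    counts = Counter()
    for label, n in _MOTIF.findall(cell):
        counts[label] += int(n)
    return counts
```

For the DH15_WHEAT row, `parse_pep_cell("2Y(1), 2K(2) 2S(8)")` gives one 2Y hit, two 2K hits and eight 2S hits, regardless of the missing comma.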
Mapping from SwissProt Species Codes to Species Names Used in LEA Protein Group Tables
| Code | Species |
| APHAV | |
| ARATH | |
| BRANA | |
| CHLVU | |
| CICAR | |
| CITSI | |
| CITUN | |
| CRAPL | |
| DAUCA | |
| GOSHI | |
| HELAN | |
| HORVU | |
| LOPEL | |
| LYCES | |
| MAIZE | |
| MEDFA | |
| MORBO | |
| ORYSA | |
| PEA | |
| PHAVU | |
| PICGL | |
| PONTR | |
| PRUPE | |
| PSEMZ | |
| RAPSA | |
| RICFL | |
| SOLCO | |
| SOLTU | |
| SOYBN | |
| SPIOL | |
| STELP | |
| TRITU | |
| VACCO | |
| VIGUN | |
| WHEAT |
The codes are those used in forming SwissProt protein identifiers. With a small number of exceptions for the most common species such as PEA, WHEAT and MAIZE, identifiers are generally made up of the first three letters of the genus name followed by the first two letters of the species name.
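The naming convention described above can be illustrated with a short sketch. The function and the exceptions table are hypothetical; only the three common-species codes named in the text are included, so the table is not exhaustive (SOYBN, for instance, is another code in the list that does not follow the three-plus-two pattern).

```python
# Exceptions named in the text, keyed by binomial name (not exhaustive).
EXCEPTIONS = {
    "Pisum sativum": "PEA",
    "Triticum aestivum": "WHEAT",
    "Zea mays": "MAIZE",
}


def species_code(binomial: str) -> str:
    """Derive a species code from a 'Genus species' name.

    Uses the exception table first; otherwise takes the first three letters
    of the genus and the first two letters of the species, upper-cased.
    """
    if binomial in EXCEPTIONS:
        return EXCEPTIONS[binomial]
    genus, species = binomial.split()[:2]
    return (genus[:3] + species[:2]).upper()
```

Under this rule `species_code("Gossypium hirsutum")` yields `GOSHI` and `species_code("Arabidopsis thaliana")` yields `ARATH`, matching the codes used in the tables.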
Highly significantly over- and under-represented peptides across LEA Protein Groups
| Grp | Threshold Pr | Representation | Sample of Significant Peptides (negative p-values indicate under-representation) |
| 1 | 3.9e-07 | over | G (2.6e-49), E (7.9e-36), Q (2.6e-10), R (4.2e-08), GG (5.8e-47), KGG (1.6e-41), EMG (9.5e-33), QMG (2.1e-19) |
| 1 | 3.9e-07 | under | I (-3.7e-16), V (-1.1e-14), P (-1.7e-14), F (-7.4e-14), N (-1.3e-12), L (-4.2e-11), C (-6.6e-11), W (-1.7e-09) |
| 2 | 1.1e-10 | over | G (0), TG (0), H (7.4e-291), GG (6.4e-178), T (6.8e-122), K (1.4e-59), Q (8.7e-28), HG (5.6e-187), KLP (2.8e-170), EK (1.1e-120), YG (4.7e-101), SSS (2.0e-43) |
| 2 | 1.1e-10 | under | L (-5.6e-123), F (-2.2e-81), I (-1.1e-52), V (-9.8e-47), N (-3.7e-43), R (-3.8e-39), C (-1.0e-36), W (-1.7e-35), S (-5.2e-25) |
| 3 | 4.3e-08 | over | A (3.6e-246), K (7.3e-140), T (3.2e-48), E (2.9e-37), Q (7.8e-32), KD (1.6e-93), AKD (1.6e-83), AKE (2.4e-45), KDY (2.1e-46), EK (1.8e-45) |
| 3 | 4.3e-08 | under | L (-1.2e-89), I (-3.9e-61), P (-2.8e-51), F (-6.1e-35), W (-8.3e-27), C (-4.2e-19), N (-1.5e-14), R (-2.3e-12) |
| 4 | 4.1e-04 | over | TG (1.9e-17), G (4.7e-14), T (9.5e-12), GH (8.7e-09), A (7.8e-08), AKA (1.9e-07), EK (8.6e-07), AA (3.8e-05) |
| 4 | 4.1e-04 | under | L (-5.1e-19), I (-4.8e-09), F (-1.4e-08), V (-5.6e-06), C (-4.6e-05), W (-1.8e-04), S (-8.5e-04) |
| 5 | 6.2e-04 | over | AKE (1.2e-17), K (9.2e-13), A (4.1e-10), E (1.0e-08), EK (2.7e-05) |
| 5 | 6.2e-04 | under | L (-5.2e-11), P (-4.3e-08), I (-1.5e-06) |
| 6 | 8.4e-04 | over | A (6.5e-30), AA (3.1e-18), AT (1.7e-08), AE (1.8e-07), QS (2.2e-06), GV (3.7e-06), GG (8.6e-06), Q (4.1e-05), V (1.6e-04), QSA (2e-13) |
| 6 | 8.4e-04 | under | L (-2.2e-10), F (-8.9e-09), C (-7.9e-07), Y (-6.0e-06), I (-3.5e-05), K (-3.3e-04), W (-3.0e-04) |
| Lea5 | 1.2e-03 | over | A (4.4e-05), GA (4.1e-05), GY (8.8e-05), SS (1.3e-04), R (2.6e-04), S (6.9e-04) |
| Lea5 | 1.2e-03 | under | Q (-4.9e-04) |
| Lea14 | 1.3e-03 | over | IP (1.1e-07), D (7.7e-05), K (3.3e-04), I (1.2e-03) |
| Lea14 | 1.3e-03 | under | R (-4.1e-06), Q (-8.0e-06), F (-3.3e-03) |
Applying popp_create.py to each group of LEA protein sequences taken as a whole, the table lists a sample of the peptides that are highly over-represented or highly under-represented, i.e. their probabilities are more stringent than the thresholds listed in the second column. The different thresholds arise due to differences in the numbers of sequences, hence differing amino acid counts, corresponding to each Group.
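The sign convention and thresholding used in the table can be illustrated as follows. This sketch does not reproduce popp_create.py, whose internals are not described here; it only assumes that each peptide carries a signed p-value (negative meaning under-representation) and that "more stringent" means a magnitude strictly below the Group's threshold. The function name is hypothetical.

```python
def split_by_representation(signed_p: dict, threshold: float):
    """Split peptides into (over, under) lists by signed p-value.

    signed_p maps peptide -> p-value, negative for under-representation.
    Only peptides with |p| below the threshold are kept; each list is
    ordered from most to least significant.
    """
    kept = {pep: p for pep, p in signed_p.items() if abs(p) < threshold}
    ranked = sorted(kept, key=lambda pep: abs(kept[pep]))
    over = [pep for pep in ranked if kept[pep] >= 0]
    under = [pep for pep in ranked if kept[pep] < 0]
    return over, under


# Group 5 values from the table, plus one hypothetical non-significant entry.
group5 = {"AKE": 1.2e-17, "K": 9.2e-13, "L": -5.2e-11, "P": -4.3e-08, "XX": 0.01}
over, under = split_by_representation(group5, threshold=6.2e-04)
```

Here `over` holds AKE and K, `under` holds L and P, and the hypothetical `XX` entry is dropped because its p-value does not pass the Group 5 threshold.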
LEA Protein Group 1 (D19) Exemplar(s): LE19_GOSHI
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| EM1_ARATH | ARATH | Seed | ABA, | | 4 | PF00477_hmm; L194_HORVU (1e-67) |
| EM1_WHEAT | WHEAT | Seed | ABA, | 1(1) | 6 | PF00477_ma |
| EM2_WHEAT | WHEAT | Seed | ABA, | 1(1) | 6 | PF00477_hmm, EMP1_ORYSA (2e-41) |
| EM6_ARATH | ARATH | Seed | ABA, | 1(1) | 6 | PF00477_ma; EMB1_DAUCA (9e-38) |
| EMB1_DAUCA | DAUCA | Seed | | 1(1) | 4 | PF00477_hmm |
| EMB5_MAIZE | MAIZE | Seed | ABA, | 1(1) | 6 | PF00477_hmm |
| EMP1_ORYSA | ORYSA | Seed | ABA, | 1(1) | 6 | PF00477_hmm |
| L193_HORVU | HORVU | Seed | ABA, notCold, | 1(3) | 4 | PF00477_hmm |
| L194_HORVU | HORVU | Seed | ABA, notCold | 1(4) | 4 | PF00477_hmm |
| L19A_HORVU | HORVU | Seed | ABA, notCold, | 1(1) | 6 | PF00477_hmm |
| L19B_HORVU | HORVU | Seed | ABA, notCold, | 1(1) | 6 | PF00477_hmm |
| LE10_HELAN | HELAN | Seed | ABA, | | 4 | PF00477_hmm |
| LE19_GOSHI | GOSHI | Seed | ABA, | | 4 | PF00477_ma |
| SEEP_RAPSA | RAPSA | Seed | | 1(1) | 3,6 (280) | PF00477_hmm PF00477_hmm |
LEA Group 1 proteins, with the exemplar being LE19_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether the LEA Group 1 motif is detected using agrep and the number of times it is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group 1.
Uncharacterised LEA Proteins
| ID | Species | Tissue | Expression | SF |
| O24439 | PHAVU | Root, Stem, Embryo | ABA, Desc, | (295) |
| O81483 | ARATH | Seed | notABA, notDesc, notSalt, | (279) |
| Q9S7S3 | ARATH | Seed | notABA, Cold, notDesc, notSalt, | (279) |
Currently uncharacterised proteins which have expression patterns that are literally late embryogenesis, but which have no similarity to any of the previously described proteins. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, and 5) the stand-alone clusters in which the protein is found. Note that one pair clusters only with each other, while the third is found in a stand-alone cluster together with a Group 2 LEA protein.
Consensus POPPs for the anchor families of each superfamily
| Group | SF | Anchor Family Consensus POPP |
| 1 | 4 | +E, +G, +EG, +GE, +GG, +KG, +QE, +RK, +GGE, +KGG |
| 1 | 6 | +E, +G, +DE, +EG, +ES, +GG, +GQ, +RE, +RK, +ARE, +DES, +REG |
| 2 | 1 | +G, -L, +EK, +GG, +GT, +EKL, +IKE, +KEK, +KIK, +KKG, +KLP, +LPG |
| 2 | 3 | -F, -I, -L, -R, -W, +DK, +EK, +KK, +KL, +LP, +TH, +EKK, +KEK, +KLP, +LPG |
| 2 | 8 | +EK, +SS, +EKI, +KEK, +KIK, +SSS |
| 2 | 9 | -F, +G, -I, -L, -V, +AG, +EK, +GG, +GH, +GT, +TA, +TG, +GGT, +GTG, +TAG, +TGG |
| 2 | 10 | -F, +G, -I, +AG, +EK, +GG, +GQ, +KE, +SS, +EKL, +GAG, +IKE, +KEK, +KLP, +LPG, +SSS |
| 3 | 2 | +A, -C, +E, -F, -I, +K, -L, -P, +AE, +AK, +EK, +ET, +GE, +GK, +KE, +AAE, +AKD, +EKA |
| 3 | 5 | +A, -I, +K, -L, -P, +Q, +T, -V, +AA, +AQ, +EK, +KE, +KT, +QA, +QQ, +QS, +QT, +TQ, +AAK, +AQA, +EKT, +QAA, +TQQ |
| 6 | 7 | +A, -F, -L, +AA, +AE, +MQ, +QS, +VA, +AAA, +GVA, +QSA, +SAA |
| Lea5 | 299 | +A, +R, +S, +AM, +GA, +GY, +RP, +SF, +SS, +YS |
| Lea14 | 297 | +D, -R, +AS, +IP, +KV, +VS, +TIP |
Clusters, families and superfamilies closely mirror the structure of the LEA Groups, with the exception of Group 4 and Group 5. Against each LEA Group are listed the superfamilies that contain proteins from that Group (column 2) and the peptides forming the consensus POPP of the anchor (i.e. most typical) family in the superfamily. '+' before a peptide indicates significant over-representation; '-' indicates significant under-representation.
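The '+'/'-' notation of the consensus POPP column lends itself to a simple parse into over- and under-represented peptide lists. The sketch below only handles the textual notation of the table; the function name is hypothetical and nothing here reflects how the POPP vectors themselves are computed or stored.

```python
def parse_consensus_popp(text: str):
    """Parse a consensus POPP string such as '+D, -R, +AS' into two lists.

    Returns (over, under): peptides prefixed with '+' and '-' respectively,
    in the order in which they appear.
    """
    over, under = [], []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray or trailing commas
        sign, peptide = token[0], token[1:]
        if sign == "+":
            over.append(peptide)
        elif sign == "-":
            under.append(peptide)
        else:
            raise ValueError(f"peptide without +/- prefix: {token!r}")
    return over, under


# Lea14 (cluster 297) row from the table above.
lea14_over, lea14_under = parse_consensus_popp("+D, -R, +AS, +IP, +KV, +VS, +TIP")
```

For the Lea14 row this gives six over-represented peptides (D, AS, IP, KV, VS, TIP) and one under-represented residue (R).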
LEA Protein Group 3 (D7) Exemplar(s): LE7_GOSHI, LE76_BRANA
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| DRPF_CRAPL | CRAPL | Leaf | ABA, Desc | 3(1), k(3) | 2 | PF02987_ma; LE76_BRANA (3e-12) |
| EDC8_DAUCA | DAUCA | Seed | ABA, | 3(5) | 2 | PF02987_ma |
| LE76_BRANA | BRANA | Seed | ABA, | 3(5) | 5 | PF02987_ma |
| LE7_GOSHI | GOSHI | Seed | ABA, | 3(2) | 2 | PF02987_ma |
| LEA1_HORVU | HORVU | Aleurone | ABA, Cold, Desc | 3(7) | 2 | PF02987_ma |
| LEA3_MAIZE | MAIZE | Seed, Leaf Shoot | ABA, Desc, | 3(4) | 2 | PF02987_ma |
| LEA3_WHEAT | WHEAT | Shoot | ABA, Desc | 3(7) | 2 | PF02987_hmm; LEA1_HORVU (1e-100) |
| LED3_DAUCA | DAUCA | Seed | | 3(4) | 2 | LE7_GOSHI (3e-27) |
| O49816 | CICAR | Mesocotyl | notABA, notCold, Desc, Salt | 3(5) | 5 | PF02987_ma; LE76_BRANA (1e-46) |
| O49817 | CICAR | Mesocotyl | notABA, notCold Desc, Salt | 3(4) | 5 | PF02987_ma; LE76_BRANA (5e-36) |
| Q03967 | WHEAT | Shoot | Desc, | | 2 | PF02987_ma |
| Q06540 | WHEAT | Shoot | Cold, notABA notDesc, notSalt | 2S(3) | 2 | Q39660 (6e-8) DRPF_CRAPL (5e-5) |
| Q39058 | ARATH | Shoot | ABA, Cold, notDesc | | 2 | Q39873 (7e-8) |
| Q39660 | CHLVU | Whole cells | Cold | | 2 | PF02987_ma; LE76_BRANA (2e-5) |
| Q39873 | SOYBN | Seed, Leaf, Root | ABA, | | 2 | PF02987_ma; EDC8_DAUCA (8e-37) |
| Q40696 | ORYSA | Root | ABA, Salt | 3(5) | 2 | PF02987_ma; LEA1_HORVU (2e-51) |
| Q40709 | ORYSA | Shoot | notABA, Cold Mannitol | 3(3) | 2 | PF02987_ma; LEA3_MAIZE (2e-44) |
| Q40869 | PICGL | Embryo | ABA | | 2 | PF02987_ma; LEA1_HORVU (9e-14) |
| Q40929 | PSEMZ | Seed | Cold, | 3(1) | 5 | PF02987_ma; LE76_BRANA (2e-23) |
| Q41060 | PEA | Seed | notABA, Sucrose, notCanon | 3(1) | 2 | PF02987_ma; LE76_BRANA (8e-14) |
| Q41154 | RICFL | Thalli | ABA, Desc | | 2 | PF02987_ma; EDC8_DAUCA (1e-31) |
| Q41213 | BRANA | Shoot, Seed | notABA, Cold notDesc, notCanon | | 2 | PF02987_hmm; EDC8_DAUCA (2e-5) |
| Q42386 | BRANA | Leaf | notABA, Cold | | 2 | PF02987_hmm; EDC8_DAUCA (5e-6) |
| Q42512 | ARATH | Shoot | ABA, Cold, Desc | | 2 | LEA3_MAIZE (6e-5) |
| Q95V77 | APHAV | Whole animal | Desc | 3(1) | 2 | LEA1_HORVU (1e-13) |
| Q96246 | ARATH | Seed, immature silique | ABA, | 3(1) | 2 | PF02987_ma; EDC8_DAUCA (1e-73) |
| Q9M4T9 | WHEAT | Shoot | ABA, Cold | 3(1) | 2 | PF02987_ma; LEA1_HORVU (1e-26) |
| Q9SDV6 | WHEAT | Shoot | Cold, notABA notDesc, notSalt | 2S(3) | 2 | PF02987_ma Q39873 (2e-4) |
| Q9XET0 | SOYBN | Seed | | 3(4) | 5 | PF02987_ma; LE7_GOSHI (2e-28) |
| Q9XFD0 | WHEAT | Shoot | ABA, Cold | 3(4) | 2 | PF02987_ma; LEA1_HORVU (4e-56) |
LEA Group 3 proteins, with the exemplars being LE7_GOSHI and LE76_BRANA. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether the Group 3 motif or poly-lysine stutters are detected using agrep and the number of times each is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group 3. Note the presence in two cases of the 2S (i.e. poly-serine) motif.
LEA Protein Group 4 (D113) Exemplar(s): LE13_GOSHI
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| LE11_HELAN | HELAN | Seed, Shoot | ABA, Desc, | | 2 | PF03760_hmm; PM1_SOYBN (7e-27) |
| LE13_GOSHI | GOSHI | Seed | ABA, | | 9 | PF03760_hmm; PM1_SOYBN (1e-19) |
| LE25_LYCES | LYCES | Leaf | ABA, Desc | | 2 | PF03760_hmm; PM1_SOYBN (9e-18) |
| O24442 | PHAVU | Root, Embryo | ABA, Desc, | | 1 | PF03760_hmm PM1_SOYBN (2e-32) |
| PM1_SOYBN | SOYBN | Seed | ABA, | 2Y(1) | 1 | PF03760_ma |
LEA Group 4 proteins, with the exemplar being LE13_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether any of the LEA Group 1, 2 or 3 motifs or poly-lysine stutters are detected using agrep and the number of times each is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group 4. Note the presence in one case of the 2Y motif.
LEA Protein Group 5 (D29) Exemplar(s): LE29_GOSHI
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| LE29_GOSHI | GOSHI | Seed | ABA, | k(3) | 2 | PF02987_ma |
| Q93Y63 | MORBO | Cortical parenchymal cells | ABA, Cold, Desc | 3(2) | 2 | LE29_GOSHI (2e-26) |
LEA Group 5 proteins, with the exemplar being LE29_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether any of the LEA Group 1, 2 or 3 motifs or poly-lysine stutters are detected using agrep and the number of times each is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group 5. Note the presence of the Group 3 motif and poly-lysine.
LEA Protein Group 6 (D34) Exemplar(s): LE34_GOSHI
| ID | Species | Tissue | Expression | SF | Evidence |
| LE34_GOSHI | GOSHI | Seed | ABA, | 7 | |
| Q41850 | MAIZE | Embryo, Leaf | ABA, Desc, | 7 | LE34_GOSHI (6e-61) |
| Q43424 | DAUCA | Embryo | ABA, | 7 | LE34_GOSHI (4e-75) |
| Q96245 | ARATH | Seed | | 7 | LE34_GOSHI (2e-77) |
LEA Group 6 proteins, with the exemplar being LE34_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) the superfamilies/stand-alone clusters in which the protein is found and 6) other evidence for accepting the protein as LEA Group 6. None of the LEA Group 1, 2 or 3 motifs match these protein sequences.
LEA Protein Group Lea5 (D73) Exemplar(s): LE5A_GOSHI
| ID | Species | Tissue | Expression | Pep | SF | Evidence |
| LE5A_GOSHI | GOSHI | Leaf | Desc | 2S(4) | (299) | PF03242_hmm |
| LE5D_GOSHI | GOSHI | Leaf | Desc | 2S(4) | (299) | PF03242_ma |
| Q39644 | CITSI | Leaf, Ovule | Salt, notCold | | | PF03242_hmm, LE5D_GOSHI (2.4e-46) |
Lea5/D73 proteins – currently not part of any numbering scheme for LEA proteins – with the exemplar being LE5A_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) whether any of the LEA Group 1, 2 or 3 motifs or poly-lysine stutters are detected using agrep and the number of times each is found, 6) the superfamilies/stand-alone clusters in which the protein is found and 7) other evidence for accepting the protein as LEA Group Lea5/D73. Note the presence in two cases of the 2S motif (poly-serine stutter). Note also that two of the three proteins are found in a single, stand-alone cluster containing just the pair of proteins, while the other sequence is not found in any cluster.
LEA Protein Group Lea14 (D95) Exemplar(s): LE14_GOSHI
| ID | Species | Tissue | Expression | SF | Evidence |
| DRPD_CRAPL | CRAPL | Leaf | ABA, Desc | | PF03168_ma; LE14_SOYBN (2e-52) |
| LE14_GOSHI | GOSHI | Leaf | Desc | (297) | PF03168_ma; LE14_SOYBN (2e-64) |
| LE14_SOYBN | SOYBN | Leaf | ABA, | (297) | PF03168_ma |
| Q40159 | LYCES | Root | notABA, notDesc, possible osmotic stress | 3 | PF03168_ma, LE14_SOYBN (8e-53) |
Lea14/D95 proteins, with the exemplar being LE14_GOSHI. The columns are: 1) the protein identifier, 2) a code for the species (see Table 10), 3) the tissue(s) in which it has been found, 4) the conditions that give rise (or fail to give rise) to the expression of the gene, 5) the superfamilies/stand-alone clusters in which the protein is found and 6) other evidence for accepting the protein as LEA Group Lea14/D95. None of the LEA Group 1, 2 or 3 motifs match these protein sequences. Note that two of the four proteins are found in a single, stand-alone cluster containing just the pair of proteins, one protein is found clustered with Group 2 LEA proteins in SF 3, while the other sequence is not found in any cluster.
Keywords/Phrases for each Group and Superfamily
| Group | SF | Principal Keywords/Phrases |
| 1 | 4 | histone H4, chromosomal protein, nuclear protein, DNA binding |
| 1 | 6 | dsRNA binding, DNA gyrase, breakage, CLP, ATP binding |
| 2 | 1 | break, ATP binding, DNA topoisomerase, protein biosynthesis, topoisomerase, repair |
| 2 | 3 | coiled, coil, nuclear protein, caldesmon, histone H1, chaperone, tropomyosin filament, break, DNA topoisomerase |
| 2 | 8 | DNA topoisomerase, nuclear protein, HMG box, coiled coil |
| 2 | 9 | transcriptional inhibition, glycosyl hydrolase, nuclear protein |
| 2 | 10 | nuclear protein, DNA binding, transcription regulation, intermediate filament, keratin, chaperone, homeobox, coiled coil, HMG box domain, cytoskeletal |
| 3 | 2 | chaperone, coiled coil, tropomyosin, stress, filament, phosphorylation, caldesmon, elongation factor, neurofilament, actin binding, cytoskeleton, rotamase |
| 3 | 5 | coiled coil, histone H1, filament, nuclear protein, neurofilament, flagella, HAMP domain, synuclein, DNA binding, hsp70 |
| 6 | 7 | groel protein, nuclear protein, histone H1, chaperonin, DNA binding, HAMP domain, synuclein, transcription regulation |
| Group | Cluster | Principal Keywords/Phrases |
| Lea5 | 299 | DNA binding, transcription regulation, nuclear protein, gata, zinc finger, homeobox |
| Lea14 | 297 | esterase, gapdh, chaperone protein DNA, glycoprotein |
For each superfamily, the consensus POPP (Table 13) has been set as a query against a database of POPP vectors representing SwissProt. The protein hits for each query, excluding LEA proteins, were submitted to the Protein Annotators' Assistant, which returns a list of keywords and phrases shared by sets of the submitted proteins. A sample of the most prominent are listed against the Group/Class and the corresponding superfamilies/clusters. Matches based on shared biases in peptide composition should not be understood as actual functions that the search hits share with the LEA proteins; rather, they can indicate shared mechanisms or structural elements.
Comparison of New LEA Protein Classes with Previous Group Classifications
| Class | Baker/Dure | Bray 1994 | Bray 2000 | Cuming | Comments |
| I | D19 | 1 | 1 | 1 | |
| IIa | D11 | 2 | 2 | 2 | Includes some Group 4 (D113); Subgroup 2a from Rules |
| IIb | D11 | 2 | 2 | 2 | Subgroups 2b and 2c from the Rules |
| III | D7 | 3 | 3 | 3 | Includes Group 5 (D29) and remainder of Group 4 (D113) |
| IV | D34 | 6 | - | 5 | |
| V | - | - | - | - | Lea5/D73 – Named in [ |
| VI | - | - | 4 | 5 | Lea14/D95 – Named in [ |
A proposed LEA Class numbering scheme (column 1), encompassing all the Groups listed above with the exception of Group 4 and Group 5, is compared with past numbering schemes from: Baker/Dure (column 2), Bray 1994 (column 3), Bray 2000 (column 4) and Cuming (column 5).