Emmanuel Jaspard, David Macherel, Gilles Hunault.
Abstract
Late Embryogenesis Abundant Proteins (LEAPs) are ubiquitous proteins expected to play major roles in desiccation tolerance. Little is known about their structure-function relationships because of the scarcity of 3-D structures for LEAPs. The previous building of LEAPdb, a database dedicated to LEAPs from plants and other organisms, led to the classification of 710 LEAPs into 12 non-overlapping classes with distinct properties. Using this resource, numerous physico-chemical properties of LEAPs and amino acid usage by LEAPs have been computed and statistically analyzed, revealing distinctive features for each class. This unprecedented analysis allowed a rigorous characterization of the 12 LEAP classes, which also differed in multiple structural and physico-chemical features. Although most LEAPs can be predicted as intrinsically disordered proteins, the analysis indicates that LEAP class 7 (PF03168) and probably LEAP class 11 (PF04927) are natively folded proteins. This study thus provides a detailed description of the structural properties of this protein family, opening the path toward further LEAP structure-function analysis. Finally, since each LEAP class can be clearly characterized by a unique set of physico-chemical properties, this will allow development of software to predict proteins as LEAPs.
Year: 2012 PMID: 22615859 PMCID: PMC3353982 DOI: 10.1371/journal.pone.0036968
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Some characteristics of the 12 LEAP classes.
| Class | PFAM | Motifs | LEAP number | Length range | Consensus: total characters | Consensus: gap number | Consensus: % gap | Consensus: % similarity (35%<CV<60%) |
| 1 | PF00257 | [GS]SSE.[DEG] | 145 | 117–507 | 790 | 680 | 86.1 | 13.9 |
| 2 | PF00257 | S{5,}[ADGV][DES][DEGKT]. | 65 | 140–292 | 478 | 338 | 70.7 | 29.3 |
| 3 | PF00257 | DSD$ | 20 | 86–179 | 219 | 129 | 58.9 | 41.1 |
| 4 | PF00257 | Motif class 4 | 63 | 83–616 | 724 | 664 | 91.7 | 8.3 |
| 5 | PF00477 | G[AG][ENQT].R[AKR][DEQ] | 58 | 83–217 | 235 | 145 | 61.7 | 38.3 |
| 6 | PF02987 | Motif class 6 | 125 | 67–742 | 847 | 767 | 90.6 | 9.4 |
| 7 | PF03168 | NPY.{4,}P[IV].[ADEQ] | 30 | 95–181 | 212 | 72 | 34.0 | 66.0 |
| 8 | PF03168 | [AILV].{0,1}NPN.[FIRSVY] | 35 | 153–368 | 440 | 329 | 74.8 | 25.2 |
| 9 | PF03242 | W.{1,3}DP.{1,3}G | 64 | 78–144 | 191 | 134 | 70.2 | 29.8 |
| 10 | PF03760 | [AS].{3,3}[EG][HK].[DE].{3,3}[AT].{4,4}[DEKQ].{3,3}[AT] | 68 | 88–173 | 195 | 68 | 34.9 | 65.1 |
| 11 | PF04927 | (T.GEAL[EH]A)|(PGGVA) | 20 | 159–278 | 307 | 133 | 43.3 | 56.7 |
| 12 | PF10714 | [HY]K.{2,2}[AG]Y | 17 | 71–117 | 124 | 58 | 46.8 | 53.2 |
Meaning of the regular expression syntax used for motifs: «.» = any amino acid; X{n,} = at least n times X; X{n,m} = n to m times X; [XY] = X or Y; [^XY] = neither X nor Y; (XY) = X followed by Y; X? = X present or not; XY$ = XY at the end; (M1)|(M2) = motif M1 or motif M2 or both.
LEAP number: number of sequences in LEAPdb using the motif indicated.
Length range: amino acid sequence length range of the LEAP classes in LEAPdb.
Class consensus sequence: consensus sequences of the LEAP classes were obtained using Multalin [74]. All sequences of each LEAP class were aligned with a low consensus value (CV) of 35% and a high consensus value of 60% (i.e., above the «twilight zone» [31]) and a PAM matrix (since the sequences of each LEAP class may or may not be distant). Gap penalty values (gap open penalty = 2; gap extension penalty = 0; no gap penalty for extremities) were chosen to give non-stringent alignment conditions, thus introducing numerous gaps (see the gap percentages). This «local - global alignment» of the sequences of each LEAP class leads to a consensus sequence for each class, revealing a high level of similarity between those sequences (also well above the «twilight zone»), especially for LEAP classes 3, 5, 7, 10, 11 and 12.
Motif class 4: STTAPGHY|HKTGTTTS|GGGGIGTG|HS[DR]N?K$|DVE$|LH(TRASHEES)?$|C?TGH$|DKLPGQH$|QQN(KTGCD)?$|RGD$|KEGY$|GHRPQI$|GHNN$|SFKS$|GTHKGL$|SSRDNY$|GQSK$|HRDV$|NDL$.
Motif class 6: [^LNP][^G][ADEGILMQRSTVY][AEKQRSTY].[KR][AT].[ADENT][^DP][EGIKLMQST].{1,67}[^DER][^AS]K[AD][^IL][^N].[^E]?.{1,6}G?
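The motif syntax described in the footnotes is close to that of common regular-expression engines, so a motif can be tested against a sequence directly. The following is a minimal sketch, not the LEAPdb classification procedure: it assumes the table rows are listed in class order 1 to 12, copies only a few of the shorter motifs, and uses made-up toy peptides rather than real LEAP sequences. The function name `matching_classes` is hypothetical.

```python
import re

# A few motifs copied from the table; keys are the assumed class numbers
# (rows taken in order 1 to 12). Negated sets such as those of motif
# class 6 would be written with an ASCII caret, e.g. [^LNP].
CLASS_MOTIFS = {
    1: r"[GS]SSE.[DEG]",
    3: r"DSD$",
    9: r"W.{1,3}DP.{1,3}G",
    11: r"(T.GEAL[EH]A)|(PGGVA)",
    12: r"[HY]K.{2,2}[AG]Y",
}


def matching_classes(sequence, motifs=CLASS_MOTIFS):
    """Return the sorted class numbers whose motif occurs in the sequence."""
    seq = sequence.upper()
    return sorted(
        number for number, pattern in motifs.items() if re.search(pattern, seq)
    )


# Toy peptides invented for illustration only.
print(matching_classes("MAGSSEKDAAA"))
print(matching_classes("MKWAADPTTGDSD"))
```

A sequence may match several motifs in such a naive scan, so the result is a list rather than a single class.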
Figure 1. Schematic representation of the 12 LEAP classes.
(A) Consensus sequences of the LEAP classes. The way they were obtained introduced gaps (see Material and Methods). Therefore, the lengths indicated in the figure do not reflect real LEAP sequence lengths. (B) Radial phylogram obtained with the 12 LEAP class consensus sequences.
Physico-chemical properties and combinations of plain percentages of amino acids of LEAPs.
| Physico-chemical properties | |
| Length | Number of amino acids |
| MW | Molecular weight |
| MW/Length | Mean molecular weight |
| pI | Isoelectric point |
| Foldindex | Numerical prediction of intrinsic folding propensity |
| Net charge | Mean net charge at pH 7 |
| Hydrophilicity | Mean hydrophilicity (scale: Hopp & Woods) |
| GRAVY | Grand average of hydropathy (scale: Kyte & Doolittle) |
| Hydrophobicity | Mean hydrophobicity (<H>) (scale: Eisenberg, Schwarz, Wall) |
| Bulkiness | Mean bulkiness (scale: Zimmerman, Eliezer & Simha) |
| Flexibility | Mean flexibility (scale: Bhaskaran & Ponnuswamy) |
| Residues accessibility | Mean value of the molar fraction of 3220 accessible values per residue (scale: Janin) |
| Buried residues | Mean value of the molar fraction of 2001 buried values per residue (scale: Janin) |
| Transmembrane tendency | Mean transmembrane tendency value (scale: Zhao & London) |
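Several of the properties listed above are simple averages of a per-residue scale over the sequence. As an illustration, the sketch below computes GRAVY with the Kyte & Doolittle hydropathy index. The second function is a deliberately crude stand-in for the mean net charge: it only counts Lys/Arg against Asp/Glu and ignores histidine, the termini and pKa values, so it is an assumption for illustration and not the calculation used for the table. Both function names are hypothetical, and the peptide is made up.

```python
# Kyte & Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte & Doolittle value per residue."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def crude_mean_net_charge(sequence):
    """Simplified mean net charge: (K + R - D - E) / length.

    Ignores His, terminal groups and pKa values (illustrative assumption).
    """
    seq = sequence.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return (positive - negative) / len(seq)


peptide = "MSEEKKDAGV"  # made-up peptide, not a real LEAP
print(round(gravy(peptide), 3), crude_mean_net_charge(peptide))
```

A negative GRAVY value indicates an overall hydrophilic sequence on this scale.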
Binary representation of the distribution of physico-chemical properties among LEAP classes, IDP and FS proteins.
| LEAP Class | MW/Length | FoldIndex | Mean bulkiness | Mean flexibility | MBR | MAR | MTT | pI | MNC pH 7 | Mean hydrophilicity | GRAVY | <H> |
| 1 | −1 | −1 | −1 | +1 | +1 | −1 | +1 | +1 | +1 | +1 | −1 | −1 |
| 2 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 |
| 3 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 |
| 4 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 |
| 5 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 |
| 6 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | +1 | +1 | −1 | −1 |
| 7 | +1 | +1 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 |
| 8 | +1 | +1 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 |
| 9 | +1 | +1 | +1 | −1 | +1 | −1 | +1 | +1 | +1 | +1 | −1 | −1 |
| 10 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 | +1 | +1 | −1 | −1 |
| 11 | −1 | +1 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 |
| 12 | −1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 |
| IDP | +1 | −1 | +1 | −1 | −1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 |
| FS | +1 | +1 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 |
Values +1 and −1 indicate that the physico-chemical property considered is higher or lower than either the overall median value or a reference value (e.g., 7 for pI).
MBR: mean molar fraction of buried residues.
MAR: mean molar fraction of accessible residues.
MTT: mean transmembrane tendency.
MNC pH 7: mean net charge at pH 7.
GRAVY: grand average of hydropathy.
<H>: mean hydrophobicity.
IDP: intrinsically disordered proteins dataset.
FS: fully structured proteins dataset.
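The +1/−1 coding used in this table can be sketched as a comparison of each group against a threshold, which is either a fixed reference value or the median of all pooled values. In the sketch below, summarising each group by its own median and coding ties as −1 are assumptions made for illustration, the function names are hypothetical, and the pI values are invented.

```python
from statistics import median


def overall_median(values_by_class):
    """Median of all values pooled across groups."""
    pooled = [v for values in values_by_class.values() for v in values]
    return median(pooled)


def binarize(values_by_class, threshold):
    """Code each group +1 if its median exceeds the threshold, else -1.

    Assumption: ties are coded -1; the source does not specify this case.
    """
    return {
        name: (1 if median(values) > threshold else -1)
        for name, values in values_by_class.items()
    }


# Hypothetical pI values for two invented groups.
toy_pi = {"class A": [5.1, 6.0, 6.4], "class B": [8.9, 9.3, 10.1]}

print(binarize(toy_pi, threshold=7))                       # fixed reference
print(binarize(toy_pi, threshold=overall_median(toy_pi)))  # pooled median
```

Passing the threshold explicitly keeps the two cases named in the footnote (reference value or overall median) in a single function.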
Binary representation of amino acid usage by LEAPs, IDP and FS proteins compared to all proteins contained in UniProt.
| LEAP Class | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
| 1 | −1 | −1 | −1 | −1 | −1 | +1 | +1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | +1 |
| 2 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | −1 | −1 |
| 3 | −1 | −1 | +1 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 |
| 4 | −1 | −1 | −1 | −1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | +1 |
| 5 | −1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 |
| 6 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | −1 |
| 7 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 |
| 8 | −1 | −1 | +1 | +1 | −1 | +1 | −1 | +1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 | −1 | +1 | +1 | −1 | −1 |
| 9 | +1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | +1 | −1 | −1 | −1 | +1 |
| 10 | +1 | −1 | −1 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | +1 | +1 | +1 | +1 | −1 | −1 | +1 | −1 | −1 | −1 |
| 11 | +1 | −1 | +1 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 | +1 | +1 | −1 | −1 |
| 12 | +1 | −1 | +1 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | +1 | −1 | −1 | +1 | −1 | −1 | +1 |
| IDP | −1 | −1 | −1 | +1 | −1 | +1 | +1 | −1 | +1 | −1 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | −1 | −1 | −1 |
| FS | −1 | −1 | +1 | −1 | −1 | +1 | +1 | −1 | −1 | −1 | −1 | +1 | −1 | −1 | −1 | −1 | +1 | +1 | +1 | +1 |
Values +1 and −1 indicate that the median value of the ratio (% amino acid considered in LEAP/% amino acid considered in UniProt) is higher or lower than 1 (Figure 3 and Figures S4, S5, S6 and S7).
IDP: intrinsically disordered proteins dataset.
FS: fully structured proteins dataset.
Figure 3. Boxplot representation of charged amino acid usage by the 12 LEAP classes, IDP and FS proteins.
The percentage of each amino acid was first calculated for each LEAP class. This value was then divided by the percentage of each amino acid found in the release 2010_04 of UniProtKB/Swiss-Prot [41]. This ratio thus describes the frequency of usage of each amino acid by LEAPs. The line corresponds to a ratio equal to 1. (A) Ratio for Asp. (B) Ratio for Glu. (C) Ratio for Arg. (D) Ratio for Lys.
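The usage ratio described in this legend divides an observed amino acid percentage by a reference percentage, so a value above 1 means the residue is over-represented relative to the reference. The sketch below computes it for a single sequence; applying it per sequence is an assumption, the function names are hypothetical, and the reference percentages are placeholders that are NOT the UniProtKB/Swiss-Prot release 2010_04 values.

```python
from collections import Counter


def aa_percentages(sequence):
    """Percentage of each amino acid present in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def usage_ratio(sequence, reference_percent):
    """Observed % divided by reference %, for each residue in the reference."""
    observed = aa_percentages(sequence)
    return {
        aa: observed.get(aa, 0.0) / ref
        for aa, ref in reference_percent.items()
    }


# Placeholder reference percentages (hypothetical, for illustration only).
toy_reference = {"D": 5.0, "E": 5.0, "K": 5.0, "R": 5.0}

# Made-up peptide: 10% Asp, 20% Glu, 40% Lys, no Arg.
print(usage_ratio("DEEKKKKAAA", toy_reference))
```

With real reference frequencies, the median of these ratios within a class would be compared to 1, as in the binary table.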
Figure 2. Boxplot representation of MW/length ratio, FoldIndex, mean bulkiness and mean flexibility of the 12 LEAP classes, IDP and FS proteins.
The line indicates either the mean or the median value calculated for the 12 LEAP classes. In the case of the FoldIndex, the line corresponds to 0. (A) MW/Length. (B) FoldIndex. Data are presented as a graded scale for a better comparison of classes vs. IDP and FS datasets. (C) Mean bulkiness. (D) Mean flexibility.
Figure 4. Some normal and non-normal distributions of the variables for the 710 LEAPs contained in LEAPdb.
The red line corresponds to the normal distribution associated with the data, whereas the blue line corresponds to the estimated density curve. (A) and (B) Normal distributions for mean net charge at pH 7 and the [D+E−K−R] combination. (C) and (D) Non-normal distributions for isoelectric point and mean bulkiness.
Figure 5. The main projection of the variables on the first two axes of the PCA.
This is the correlation circle of the projection of the 45 variables using the first two components of the PCA. Axis I is horizontal and axis II is vertical. The most-contributing variables are in black, the others in grey.
Groups of highly inter-correlated variables and other high correlation coefficients among variables using Spearman's r.
| Groups of highly inter-correlated variables |
| Count | Highest r | Lowest r | Variables |
| 5 | 0.984 | 0.805 | GRAVY; FoldIndex; Mean hydrophobicity; Mean molar fraction of buried residues; Mean transmembrane tendency |
| 4 | 0.929 | 0.798 | [D+E+K+R] combination; [D+E] combination; Mean hydrophilicity; (% Glu LEAP/% Glu Uniprot) |
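Spearman's r, used for the groups above, is the Pearson correlation computed on ranks rather than on raw values. The following minimal sketch uses only the standard library and assigns average ranks to ties; the function names and the toy data are hypothetical and do not reproduce the analysis behind the table.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's r: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values for two variables measured on five proteins.
toy_gravy = [-1.2, -0.9, -0.4, 0.1, 0.3]
toy_buried = [0.20, 0.24, 0.31, 0.30, 0.40]
print(round(spearman(toy_gravy, toy_buried), 3))
```

Because only ranks are used, the coefficient is insensitive to monotonic rescaling of either variable, which suits variables that are not normally distributed.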
Figure 6. Mean normalized hydrophobicity (
The two areas are delimited by the following equations:
Figure 7. Mean normalized hydrophobicity (
The two areas are delimited by the following equations, respectively:
Compilation of secondary structure data available for LEAPs.
| ACCESSION | CLASS | STRUCTURAL FEATURES AND TRANSITIONS | METHODS | SPECIES | REF. |
| P_201441 | 1 | 12% PII helix; no α-helix induction with TFE | CD | | |
| AAK00404 | 1 | 15% α-helix; 30% α-helix with TFE; no structural change with lipid vesicles | 1H-NMR, CD | | |
| CAA33364 | 1 | Largely unstructured; 9–10% α-helix with lipid vesicles or SDS (attributed to K segment, a 15 amino acid peptide) | CD | | |
| NP_850947 | 2 | 15% α-helix; 30% α-helix with TFE; no structural change with lipid vesicles | 1H-NMR, CD | | |
| NP_850947 | 2 | 12% PII helix; 20–30% α-helix with TFE | CD | | |
| NP_173468 | 2 | 5% α-helix, 15% PII helix; 50% α-helix with TFE | CD | | |
| ADK66263 | 2 | Largely unstructured, presence of PII helix at low temperature; 3–10% α-helix and 50–75% β-sheet with lipid vesicles; structural transitions stimulated by phosphorylation, Zn | FTIR | | |
| AEE78733 | 4 | 5% α-helix, 12% PII helix; 20–30% α-helix with TFE | CD | | |
| AAA18834 | 4 | 27% PII helix at 12°C, decrease to 15% at 80°C; no α-helix induction with TFE or SDS | CD | | |
| CAA77508 | 5 | 1% α-helix and 18% β-sheet; increase to 38% α-helix upon drying | CD | | |
| NP_190749 | 5 | 3% α-helix and 19% β-sheet; increase to 23% α-helix upon drying | CD | | |
| ABB13462 | 5 | 33% α-helix; increase to 56% α-helix upon drying | FTIR | | |
| AAB68027 | 5 | 14% PII helix, 8% α-helix; 30% α-helix with TFE | CD | | |
| CAA36323 | 5 | 13% α-helix and 17% β-sheet; 29% α-helix with TFE | CD | | |
| NP_181782 | 6 | 10% α-helix in the hydrated state, 65% α-helix in the dry state | CD | | |
| NP_181781 | 6 | 17% α-helix in the hydrated state, 57% α-helix in the dry state | CD | | |
| NP_175678 | 6 | 27% α-helix and 15% β-sheet in the dry state; α-helix formation is favored by lipid vesicles in the hydrated and dry states. β-sheet is attributed to aggregation. | CD, FTIR | | |
| CAF32327 | 6 | 3% α-helix and 14% β-sheet in the hydrated state; 70% α-helix with TFE or in the dry state | CD, FTIR | | |
| AAL18843 | 6 | Largely unstructured; α-helix and coiled coil formation upon drying | CD, FTIR | | |
| O03983 | 7 | Structured protein: 9% α-helix and 41% β-sheet | NMR | | |
| NP_182137 | 7 | Structured protein: 18% α-helix and 42% β-sheet | NMR | | |
| ACJ46652 | 9 | 3% α-helix and 25% β-sheet; 40% α-helix with TFE or upon drying | Sync rad, CD, 1H NMR, FTIR | | |
| AAC61808 | 10 | 15% β-sheet in the hydrated state, 35% β-sheet with lipid vesicles | CD | | |
| ABB72365 | 10 | 25% α-helix; 90% α-helix with TFE, SDS or upon drying | CD, FTIR | | |
| Q01417 | 10 | 15–17% α-helix and 15–16% β-sheet; 36% α-helix upon drying | CD, FTIR | | |
| AAD09208 | 10 | Largely unstructured | CD | | |
| AAF21311 | 11 | 33% α-helix, 18% β-sheet; increase to 56% α-helix, 25% β-sheet upon drying | FTIR | | |
| AAF21311 | 11 | FTIR spectroscopy | CD | | |
| NP_179892 | 12 | 3% α-helix and 20% β-sheet; increase to 25% α-helix upon drying | CD | | |
| NP_565548 | 12 | 3% α-helix and 16% β-sheet; increase to 20% α-helix and 31% β-sheet upon drying | CD | | |
| AAS47599 | 12 | 1.5% α-helix and 22% β-sheet; increase to 19% α-helix upon drying | CD | | |
| DN776754.1 | NC | Largely unstructured, presence of PII helix at low temperature; 5–30% α-helix and 30–80% β-sheet with lipid vesicles; structural transitions stimulated by phosphorylation, Zn | FTIR | | |
NC, not classified in LEAPdb; CD, circular dichroism; NMR, nuclear magnetic resonance; FTIR, Fourier transform infrared spectroscopy; Sync rad, synchrotron radiation.