Martin Pendola, Gaurav Jain, John Spencer Evans.
Abstract
The formation of the sea urchin spicule skeleton requires the participation of hydrogel-forming protein families that regulate the mineral nucleation and nanoparticle assembly processes that give rise to the spicule. However, the structure and molecular behavior of these proteins are not well established, and thus our ability to understand this process is hampered. We embarked on a study of sea urchin spicule proteins using a combination of biophysical and bioinformatics techniques. Our biophysical findings indicate that recombinant variants of the two most studied spicule matrix proteins, SpSM50 and SpSM30B/C (S. purpuratus), have a conformational landscape that includes a C-terminal random coil/intrinsically disordered MAQPG sequence coupled to a conserved, folded N-terminal C-type lectin-like (CTLL) domain, with SpSM50 showing more intrinsic disorder than SpSM30B/C. Both proteins possess solvent-accessible unfolded MAQPG sequence regions where Asn, Gln, and Arg residues may be accessible for protein hydrogel interactions with water molecules. Our bioinformatics study included seven other spicule matrix proteins, and we note similarities between these proteins and rare, unusual proteins that possess both folded and unfolded traits. Moreover, spicule matrix proteins possess three types of sequences: intrinsically disordered, amyloid-like, and folded protein-protein interactive. Collectively, these reactive domains would be capable of driving protein assembly and hydrogel formation. Interestingly, three types of global conformations are predicted for the nine-member protein set, wherein we note variations in the arrangement of intrinsically disordered and interactive globular domains. These variations may reflect species-specific requirements for spiculogenesis. We conclude that the molecular landscape of spicule matrix protein families enables them to function as hydrogelators, nucleators, and assemblers of mineral nanoparticles.
Year: 2019 PMID: 31574084 PMCID: PMC6771980 DOI: 10.1371/journal.pone.0222068
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Spicule matrix protein sequences.
| Spicule Matrix Protein | Sea Urchin Species | Accession Number |
|---|---|---|
| SpSM50 | S. purpuratus | P11994, SM50_STRPU |
| SpSM37 | S. purpuratus | Uniprot O76450, GenBank AAC33762.1 |
| SpSM32 | S. purpuratus | Uniprot Q8MUL1, GenBank AAM70486.1 |
| SpSM30B/C | S. purpuratus | P28163, SM30_STRPU |
| SpSM29 | S. purpuratus | Uniprot Q8MUL0, GenBank AAM70487.1 |
| LSM34 | | Uniprot Q05904, GenBank CAA42179.1 |
| HSM30 | | Uniprot Q25116 |
| HSM41 | | Uniprot Q26264, GenBank AAB24285 |
| PM27 | | Uniprot Q95W96 |
Fig 1. Far-UV circular dichroism spectra of rSpSM30B/C-G (7.5 μM) and rSpSM50 (3 μM) proteins in 100 μM HEPES, pH 8.0.
Dashed line extrapolates the ellipticity minima for each protein.
Fig 2. Homonuclear 800 MHz 1H TOCSY spectra (exchangeable sidechain amide chemical shift region) of 22 μM rSpSM30B/C-G and rSpSM50 hydrogel particle samples, 100 μM HEPES, pH 7.5.
Diagonal and off-diagonal regions for sidechain and backbone NH Arg, Asn, and Gln resonances are shown, along with corresponding 1-D spectra.
Fig 3. Primary sequences of SpSM30B/C-G and SpSM50.
Arg, Gln, and Asn residues are presented in red. MAQPG domains are highlighted in yellow. Note the high concentration of Arg, Asn, and Gln within the disordered MAQPG regions.
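The enrichment noted in this caption can be quantified as the fraction of Arg, Asn, and Gln residues in a region. The following is a minimal sketch, not the authors' procedure; the function name `rnq_fraction` and the toy fragment are hypothetical and the fragment is not a real spicule matrix protein sequence.

```python
# One-letter codes for Arg, Asn, and Gln.
RNQ = {"R", "N", "Q"}


def rnq_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are Arg, Asn, or Gln."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in RNQ) / len(seq)


# Hypothetical toy fragment used only to illustrate the calculation.
toy_fragment = "MAQPGNQRPGMAQPG"
print(f"R/N/Q fraction: {rnq_fraction(toy_fragment):.3f}")
```

Applying the same function to a MAQPG region and to the full-length sequence would give a simple comparison of local versus global Arg/Asn/Gln content.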
Fig 4. Predicted regions of intrinsic disorder (GLOBPLOT 2.3, DISOPRED, IUP) and aggregation-prone amyloid-like sequences (AGGRESCAN, FOLD_AMYLOID, ZIPPER_DB).
Shaded areas (red = intrinsic disorder; blue = amyloid-like cross-beta strand) denote sequence regions predicted as positive by each cohort of algorithms. Grey area denotes regions that do not score as positive for either intrinsic disorder or amyloid-like sequences. Purple color denotes sequence region overlap between aggregation-prone and intrinsic disorder.
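The color scheme in this caption amounts to a per-residue labeling from two yes/no predictions. The sketch below illustrates that mapping only. It assumes each cohort of algorithms has already been reduced to one boolean mask per sequence; how the individual predictors within a cohort are combined is not stated here, and the function name `label_residues` is hypothetical.

```python
def label_residues(disorder, amyloid):
    """Label each residue from two equal-length boolean masks.

    disorder[i] is True if residue i is predicted intrinsically disordered;
    amyloid[i] is True if residue i is predicted amyloid-like.
    """
    if len(disorder) != len(amyloid):
        raise ValueError("masks must have the same length")
    labels = []
    for d, a in zip(disorder, amyloid):
        if d and a:
            labels.append("overlap")   # purple in the figure
        elif d:
            labels.append("disorder")  # red
        elif a:
            labels.append("amyloid")   # blue
        else:
            labels.append("neither")   # grey
    return labels


# Hypothetical four-residue example covering every case.
print(label_residues([True, True, False, False], [True, False, True, False]))
```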
Fig 5. (A) Four-quadrant (1–4) CH-CDF plot for spicule matrix protein sequences. (B) Enlargement of the relevant Quadrant 1. The Y-coordinate in the CH-CDF plot corresponds to the distance from the obtained ordinate value to the correlation line separating the structured and unstructured conformational states of the protein on the CH (charge-hydrophobicity) plot. The X-coordinate in the CH-CDF plot corresponds to the distance from the obtained ordinate value to the correlation line separating the structured and unstructured conformational states of the protein in the CDF (cumulative distribution function). There are four quadrants. Quadrant 1 (CH > 0, CDF > 0) represents rare proteins whose state cannot be determined accurately, i.e., their CDF scores correspond to structured domains but their CH scores correspond to unstructured proteins. Quadrant 2 (CH > 0, CDF < 0) represents unfolded proteins (U). Quadrant 3 (CH < 0, CDF < 0) represents the molten globule state (MG). Quadrant 4 (CH < 0, CDF > 0) represents structured or folded proteins (F) [44,45].
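The quadrant assignment described in this caption can be written as a small lookup on the signs of the two distances. This is a minimal sketch that follows the quadrant numbering given in the caption; the function name `ch_cdf_quadrant` and the example values are hypothetical, and the handling of points lying exactly on an axis is an assumption because the caption defines only strict inequalities.

```python
def ch_cdf_quadrant(ch: float, cdf: float) -> str:
    """Map signed CH and CDF distances to the quadrant labels of the caption."""
    if ch == 0 or cdf == 0:
        # Assumption: points on an axis are left unassigned.
        return "boundary"
    if ch > 0:
        # CH score suggests an unstructured protein.
        return "Q1 (rare, state ambiguous)" if cdf > 0 else "Q2 (unfolded, U)"
    # CH score suggests a compact protein.
    return "Q4 (structured or folded, F)" if cdf > 0 else "Q3 (molten globule, MG)"


# Hypothetical coordinates, not values taken from the figure.
for ch, cdf in [(0.1, 0.2), (0.1, -0.2), (-0.1, -0.2), (-0.1, 0.2)]:
    print(ch, cdf, ch_cdf_quadrant(ch, cdf))
```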
Fig 6. Categories of spicule matrix protein backbone conformations predicted by DISOclust/Intfold 4.0 (ribbon representation, lowest energy conformer) for nine sea urchin spicule matrix proteins (Table 1).
Under each Type is a cartoon representation of the global conformation (circle = folded conformation; squiggly line = disordered conformation). The best template model for the globular domain, confidence levels, P scores, and global model quality scores can be found in Table 2. N- and C-terminal ends are denoted.
Table 2. DISOclust/INTFOLD4 fitted crystal structure template models homologous to conserved globular domains in sea urchin spicule matrix proteins.
| Protein | Model Template (globular domain) | Confidence/P value | Global model quality score |
|---|---|---|---|
| SpSM50 | 3alsA | High/3.23 E-3 | 0.5112 |
| SpSM30B/C | 1qddA, 1jznA, 1eggB | High/4.53 E-3 | 0.5092 |
| SpSM37 | 3alsA | Medium/1.552 E-2 | 0.3794 |
| SpSM32 | 1wmyA, 1jznA | High/2.684 E-3 | 0.4205 |
| SpSM29 | 2ox9C | Cert/2.406 E-5 | 0.5333 |
| LSM34 | 1wmyA | High/5.13 E-3 | 0.4055 |
| HSM41 | 2nbiA, 2pff | High/8.79 E-3 | 0.3920 |
| HSM30 | 1qddA, 1jznA | High/6.215 E-3 | 0.4788 |
| PM27 | 1wmyA | Cert/4.182 E-5 | 0.5158 |
3alsA, 1wmyA = C-type lectin CEL-I, Cucumaria echinata
1jznA = Galactose-specific C-type lectin, Crotalus atrox
1qddA = Lithostathine, Homo sapiens
2h2r = CD23 lectin domain, Homo sapiens
2ox9C = Mouse scavenger receptor C-type Lectin carbohydrate-recognition domain, Mus musculus
1eggB = C-type carbohydrate recognition domain (CRD-4), macrophage mannose receptor, Homo sapiens
2pff = fatty acid synthase subunit alpha, Saccharomyces cerevisiae
2nbiA = pscd-region of the cell wall protein pleuralin-1, Cylindrotheca fusiformis