Christopher R So, Kenan P Fears, Dagmar H Leary, Jenifer M Scancella, Zheng Wang, Jinny L Liu, Beatriz Orihuela, Dan Rittschof, Christopher M Spillmann, Kathryn J Wahl.
Abstract
Barnacles adhere by producing a mixture of cement proteins (CPs) that organize into a permanently bonded layer displayed as nanoscale fibers. These cement proteins share no homology with any other marine adhesives, and a common sequence basis that defines how nanostructures function as adhesives remains undiscovered. Here we demonstrate that a significant unidentified portion of acorn barnacle cement is composed of low complexity proteins; they are organized into repetitive sequence blocks and maintain homology to silk motifs. Proteomic analysis of aggregate bands from PAGE gels reveals an abundance of Gly/Ala/Ser/Thr repeats, exemplified by a prominent, previously unidentified 43 kDa protein in the solubilized adhesive. Low complexity regions found throughout the cement proteome, as well as multiple lysyl oxidases and peroxidases, establish homology with silk-associated materials such as fibroin, silk gum sericin, and pyriform spidroins from spider silk. Distinct primary structures defined by homologous domains shed light on how barnacles use low complexity in nanofibers to enable adhesion, and serve as a starting point for unraveling the molecular architecture of a robust and unique class of adhesive nanostructures.
Year: 2016 PMID: 27824121 PMCID: PMC5099703 DOI: 10.1038/srep36219
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Collection and breakdown of cement samples from adult A. amphitrite barnacles.
(a) Examples of fine nanofibrils networked into dense materials, observed with multiple collection methods: (i) thick and opaque cement layer secreted from the barnacle underside; (ii) SEM micrographs of (i) showing a mat consisting of fine nanomaterials, scale bar represents 1 μm; (iii) underside of a barnacle one day after settling on a bed of glass microspheres, with large accumulations at the periphery; (iv) SEM micrographs of barnacle-adhered microspheres entrapped by cement secretions, scale bar is 100 μm. Inset, 2.25 μm² image showing a close-up of nanofibrils on beads, scale bar is 500 nm. (b) Breakdown of collected cement using ethyl acetate (lanes 1 and 5), urea (lanes 2 and 6), hexafluoroisopropanol (HFIP; lanes 3 and 7), and dithiothreitol (lanes 4 and 8) solvents; eluted proteins were analyzed by SDS-PAGE. (c) Full-length PAGE run of ‘opaque’ and ‘microsphere’ adhesives after dissolution by HFIP, with an abundant protein released at 63 kDa among other well-known bands at 250, 100, 35, and 19 kDa. Left is ‘opaque’ cement, right is ‘microsphere’. Uncropped whole gels from (b,c) are available in Supplementary Figure S7.
Figure 2. Broad sequence properties of the combined proteome. (a) Number of identified proteins shared among sample collection methods that contain at least four peptides. (b) Bar graph of proteins sorted by GRAVY values, showing that 80% are hydrophilic with hydropathy values around −0.4, while 14% are more hydrophobic with values above 0. (c) Graph of isoelectric point (pI) values in increasing order, with an average of 8.9 ± 2.4, where 40% of the proteins lie above 10. (d) Pairwise E-values for 52 protein sequences represented by a two-toned lookup table, clustered into 7 regions of homologous proteins with 10 outlying individual pairs. Darker blue regions have no homology. Yellow squares indicate identity. (e) Outline and identification of self-similar protein families, named by the highest scoring protein sequence.
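The GRAVY values in panel (b) can be illustrated with a minimal sketch. It assumes the standard definition of GRAVY (the mean Kyte-Doolittle hydropathy per residue); the record does not state which tool produced the values, and the example sequences are hypothetical toy strings, not cement proteins.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Characters outside the 20 standard residues are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def is_hydrophilic(sequence: str) -> bool:
    """Negative GRAVY is read here as hydrophilic, matching panel (b)."""
    return gravy(sequence) < 0


if __name__ == "__main__":
    # Hypothetical toy sequences: a Gly/Ser-rich repeat and a Leu/Val-rich one.
    for name, seq in [("GS-rich toy", "GSGSGAGT"), ("LV-rich toy", "LLVV")]:
        print(name, round(gravy(seq), 3), is_hydrophilic(seq))
```

Under this definition a Gly/Ala/Ser/Thr-rich stretch scores slightly below zero, while a Leu/Val-rich stretch scores well above zero.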
Classification of cement-related proteins from aggregate sequencing into Glycine/Serine rich cement proteins (GSrCPs) and Leucine rich Cement Proteins (LrCPs).
| Protein Name | Sequence Homology | PAGE MW (kDa) | Protein MW (Da) | emPAI | pI | GRAVY | Aliphatic Index | State | Corresponding Transcript ID | ORF |
|---|---|---|---|---|---|---|---|---|---|---|
| Glycine/Serine rich Cement Proteins (GSrCP) | | | | | | | | | | |
| Aa43-1 | 43 | 100, 66, 63, 58, AG | 43460 | 17.87 | 9.73 | −0.423 | 56.14 | Full | comp41238_c0_seq1 | 4 |
| Aa43-2 | 43 | 100, 67, 58, AG | 28183 | 10.67 | 6.81 | −0.47 | | Partial | comp27343_c0_seq1 | 4 |
| Aa43-3 | 43 | 100, 63, 58, AG | 20450 | 3.17 | 6.76 | −0.167 | | Full | comp27593_c0_seq1 | 5 |
| Aa43-4 | 43 | 63, 58, AG-MS | 9710 | — | 6.39 | 0.219 | | Full | comp49909_c1_seq1 | 6 |
| Aa43-5 | 43 | Transcript | — | — | — | — | — | Partial | comp121192_c0_seq1 | 4 |
| Aa43-6 | 43 | 63-MS | — | — | — | — | — | C-term | comp47087_c0_seq1 | 6 |
| Aa19-1 | 19 | 19, AG | 20154 | 53.75 | 10.45 | 0.013 | 92.17 | Full | Aacp19k | |
| Aa19-2 | 19 | 63, AG-GG | 49400 | 2.63 | 10.52 | −0.3 | | Full | comp46137_c1_seq5 | 5 |
| Aa19-3 | 19 | 250, 66, 63, AG | 45320 | 1.36 | 9.52 | −0.387 | 67.65 | Full | comp40515_c0_seq1 | 5 |
| Aa19-4 | 19 | 100, 66, 63, 58, AG | 48610 | 0.82 | 8.4 | −0.702 | 56.12 | Full | comp45928_c1_seq1 | 6 |
| Aa19-5 | 19 | 250, 63 | 38260 | 0.33 | 11.01 | −0.515 | 62.34 | Full | comp46330_c0_seq1 | 6 |
| Aa19-6 | 19 | 63, AG-MS | 9710 | 0.15 | 7.72 | −0.459 | | C-term | comp33102_c0_seq1 | 4 |
| Aa19-7 | 19 | 63, AG-GG | 9620 | 0.74 | 10.77 | −0.5 | | N-term | comp170921_c0_seq1 | 1 |
| Aa19-8 | 19 | 63 | 8010 | 0.8 | 11.48 | −0.511 | | N-term | comp87199_c0_seq1 | 5 |
| Aa19-9 | 19 | Transcript | — | — | — | — | — | C-term | comp49344_c0_seq1 | 2 |
| Aa57-1 | 57 | 250, 100, 63 | 57120 | 2.26 | 8.74 | −0.519 | 71.46 | Full | comp47308_c0_seq1 | 4 |
| Aa57-2 | 57 | | 56710 | 0.16 | 9.15 | −0.668 | | Full | comp46196_c0_seq1 | 5 |
| Aa57-3 | 57 | AG | 59740 | — | 10.45 | −0.768 | | Full | comp25334_c1_seq1 | 5 |
| Aa105-1 | 105 | 250, 100, 66, 58, 63, 30 | 105009 | | | | | Full | comp40644_c0_seq1 | 4 |
| Aa105-2 | 105 | 100, GG | 92610 | | 8.88 | −0.550 | | Full | comp39288_c0_seq1 | 5 |
| Leucine rich Cement Proteins (LrCP) | | | | | | | | | | |
| Aa100-1 | 100 | 100, AG | 129500 | 4.77 | 10.25 | 0.178 | 117.08 | Full | Aacp100k-1 | |
| Aa100-2 | 100 | 100, AG | 114220 | 1.71 | 10.01 | 0.133 | | Full | Aacp114k-2 | |
| Aa52-1 | 52 | Transcript | 73010 | 0.19 | 11.02 | −0.08 | | Full | Aacp52 | |
| Aa52-2 | 52 | AG | 28070 | — | 10.75 | −0.022 | | Full | comp33562_c1_seq1 | 4 |
| Aa52-3 | 52 | AG-MS | 46260 | 3.83 | 10.28 | −0.426 | | C-term | comp48220_c0_seq1 | 6 |
Amino acid sequences are coded by an open reading frame (ORF) identifier for protein MWs and sequence annotation; the ORF number is appended to the Transcript ID in the proteome repository. Abbreviations: AG, aggregate; AG-MS, aggregate microsphere; AG-GG, aggregate gummy glue. For comparison, all emPAI values are taken from a single dataset of gummy MS/MS analysis.
Annotations for cement proteins determined by searching nrNCBI.
| Protein Name | NCBI Annotation | Accession | PAGE MW (kDa) | Protein MW (Da) | E Value | Corresponding Transcript ID | ORF |
|---|---|---|---|---|---|---|---|
| AAmulti-1 | MULTIFUNCin [Chthamalus fissus] | AFY13482.1 | AG, 250 | 182231 | 0 | comp48163_c0_seq1 | 4 |
| AAmulti-2 | MULTIFUNCin [Balanus glandula] | AFY13480.1 | AG, 250 | 227597 | 0 | comp39924_c0_seq1 | 5 |
| AAmulti-3 | MULTIFUNCin [Chthamalus fissus] | AFY13482.1 | | 22835 | 3E-18 | comp79394_c0_seq1 | 2 |
| AAwap-1 | Whey acidic protein protein 2 [Dufourea novaeangliae] | KZC10758.1 | 14 | 10903 | 4E-05 | comp45426_c4_seq1 | 5 |
| AAwap-2 | Single whey acidic protein domain-containing protein isoform 2 [Penaeus monodon] | ACF28465.1 (XP_009052071.1) | | 22181 | 4E-15 | comp32063_c0_seq1 | 4 |
| AAsp | Neurotrypsin precursor [Mus musculus] | NP_032965.1 | 14, 63 | 16625 | 2E-10 | comp44772_c1_seq1 | 2 |
| AApxt-1 | Peroxinectin [Pacifastacus leniusculus] | CAA62752.1 (EFX80390.1) | 100, 58, 30, AG | 62231 | 3E-48 | comp45999_c0_seq1 | 4 |
| AApxt-2 | Peroxinectin [Eriocheir sinensis] | ADF87945.1 (XP_001946672.2) | 58, AG | 53067 | 2E-75 | comp27704_c0_seq1 | 6 |
| AApxt-3 | Peroxinectin [Scylla serrata] | AAT12270.1 (XP_013775632.1) | | 15547 | 3E-22 | comp83572_c0_seq1 | 4 |
| AApxt-4 | Chorion peroxidase [Daphnia magna] | KZS19290.1 (XP_015373617.1) | | 33033 | 3E-57 | comp42709_c1_seq1 | 4 |
| AAlox-1 | Lysyl Oxidase-like 2 [Drosophila melanogaster] | CAB99481.1 (XP_001960248.1) | 100, 63, AG | 55870 | 2E-102 | comp44772_c0_seq1 | 4 |
| AAlox-2 | Lysyl Oxidase-like protein [Stegodyphus mimosarum] | KFM62898.1 | AG | 9460 | 6.2E-2 | comp81853_c0_seq1 | 6 |
| AAlox-3 | Lox2 [Drosophila busckii] | ALC40946.1 (EFX81056.1) | | 41383 | 1E-83 | comp43852_c1_seq1 | 2 |
| AApi-1 | Protease inhibitor [Ctenocephalides felis] | AKR52931.1 (XP_014606419.1) | AG | 42380 | 3E-44 | comp44898_c1_seq1 | 4 |
| AApi-2 | Serpin B10 [Trachymyrmex septentrionalis] | KYN35588.1 (XP_011183022.1) | 250, 66, 58 | 63948 | 2E-34 | comp45011_c0_seq2 | 6 |
| AApi-3 | Protease inhibitor [Ctenocephalides felis] | AKR52931.1 (XP_013104691.1) | AG | 44390 | 7E-44 | comp44997_c0_seq1 | 6 |
| AAmuc* | Intestinal mucin [Operophtera brumata] | KOB70769.1 (AGR65306.1) | 63, 58, AG | 46260 | 1E-85 | comp43534_c0_seq1 | 4 |
| AAtrx | Thioredoxin | | AG | | 0 | Thioredoxin | |
| AAsilk-1* | cross-beta structure silk protein 1 [Mallada signata] | ACN87361.1 (XP_012246550.1) | | 55051 | 3E-10 | comp46330_c0_seq1 | 6 |
| AAsilk-2* | cross-beta structure silk protein 1 [Mallada signata] | ACN87361.1 | | 21890 | 2E-11 | comp27593_c0_seq1 | 5 |
| AAsilk-3* | cross-beta structure silk protein 1 [Mallada signata] | ACN87361.1 (CEP02176.1) | | 28200 | 7E-12 | comp27343_c0_seq1 | 4 |
| AAsilk-4* | cross-beta structure silk protein 2 [Mallada signata] | ACN87362.1 (XP_013410251.1) | | 54012 | 1E-17 | comp48220_c0_seq1 | 6 |
ORF identifiers are appended to the Transcript ID in the proteome repository. Starred entries indicate results from unfiltered BLAST searching, that is, without low complexity and compositional bias filters. Accession numbers in parentheses correspond to hypothetical proteins or proteins with predicted function that have higher E-values.
Figure 3. Classification of homologous cement proteins by amino acid composition.
(a) Corresponding values of hydropathy ranging from hydrophobic to hydrophilic, (b) Aliphatic index values of cement proteins are bimodal, when relative volumes of alanine, valine, isoleucine and leucine side chains are compared. (c) Left, polar GSrCPs defined by an abundance of glycine, alanine, serine and threonine residues. Right, aliphatic LrCPs defined by an abundance of valine, isoleucine and leucine residues and higher aliphatic index values. Bottom right, acid hydrolysis of whole cement from microspheres yields an abundance of glycine, alanine, serine and threonine in addition to hydrophobic residues. Asterisks indicate residues that are abundant in individual components.
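The aliphatic index in panel (b) and the Gly/Ala/Ser/Thr enrichment in panel (c) can be sketched as follows. This assumes the conventional aliphatic index formula (mole percent Ala + 2.9 × Val + 3.9 × (Ile + Leu)); the record does not say how the values were computed. The function names and toy sequences are hypothetical.

```python
def aliphatic_index(sequence: str) -> float:
    """Conventional aliphatic index from mole percentages.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def gast_fraction(sequence: str) -> float:
    """Fraction of residues that are Gly, Ala, Ser or Thr."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(seq.count(residue) for residue in "GAST") / len(seq)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the cement proteome.
    for name, seq in [("polar toy", "GSGTGAGS"), ("aliphatic toy", "LVILKLVD")]:
        print(name, round(aliphatic_index(seq), 2), round(gast_fraction(seq), 2))
```

A sequence dominated by Gly/Ser/Thr yields a low aliphatic index and a high G/A/S/T fraction, the pattern the caption attributes to GSrCPs, whereas a Val/Ile/Leu-rich sequence shows the reverse.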
Figure 4. Sequence homology of Aa43 and Aa19 GSrCPs.
(a) Primary structure of Aa43-1 showing three homologous domains spanning ca. 100 residues each. (b) Multiple sequence alignment of three low complexity domains in Aa43-1 showing conservation of 60% of residues with shared chemistry. (c) Primary structure of four GSrCPs with repeated domains of high homology to AaCP19. (d) Multiple sequence alignment of all 19-homologous domains, where conserved regions are composed mainly of low complexity residues. Black lines highlight regions of low complexity.
Search results for GSrCPs against unfiltered nrNCBI, showing highest alignment with silk and cement proteins.
| Protein Name | Cement Homology | E value | Coverage | Silk Homology | E value | Coverage |
|---|---|---|---|---|---|---|
| Leucine rich Cement Proteins (LrCPs) | | | | | | |
| Aa100-1 | cement protein 100k [Amphibalanus amphitrite] | 0 | 1 | — | — | — |
| | 114 kDa cement protein [Amphibalanus amphitrite] | 0 | 0.84 | — | — | — |
| | cement protein-100k [Megabalanus rosa] | 0 | 0.84 | — | — | — |
| Aa100-2 | 114 kDa cement protein [Amphibalanus amphitrite] | 0 | 1 | — | — | — |
| | cement protein 100k [Amphibalanus amphitrite] | 0 | 0.98 | — | — | — |
| | cement protein-100k [Megabalanus rosa] | 0 | 1 | — | — | — |
| Aa52-1 | 52 kDa cement protein [Amphibalanus amphitrite] | 0 | 1 | — | — | — |
| | 52kDa cement protein [Megabalanus rosa] | 2.00E-158 | 0.99 | — | — | — |
| AaCP20-1 | cement protein 20 kDa-1 [Amphibalanus amphitrite] | 7.00E-87 | 0.92 | — | — | — |
| | cement protein 20 kDa-3 [Amphibalanus amphitrite] | 1.00E-40 | 0.84 | — | — | — |
| | cement protein 20 kDa-2 [Amphibalanus amphitrite] | 2.00E-27 | 0.91 | — | — | — |
| Glycine/Serine rich Cement Proteins (GSrCPs) | | | | | | |
| AaCP19 | 19 kDa cement protein [Amphibalanus amphitrite] | 7.00E-141 | 1 | — | — | — |
| | cement protein-19k [Fistulobalanus albicostatus] | 2.00E-77 | 0.85 | — | — | — |
| | cement protein-19k [Balanus improvisus] | 3.00E-60 | 0.85 | — | — | — |
| Aa19-2 | cement protein-19k [Fistulobalanus albicostatus] | 2.00E-20 | 0.86 | cross-beta structure silk protein 1 [Mallada signata] | 9.00E-32 | 0.98 |
| | cement protein-19k [Megabalanus rosa] | 4.00E-19 | 0.92 | silk sericin MG-1 [Galleria mellonella] | 7.00E-25 | 0.92 |
| | 19 kDa cement protein [Amphibalanus amphitrite] | 8.00E-19 | 0.88 | cross-beta structure silk protein 2 [Mallada signata] | 8.00E-25 | 0.91 |
| Aa19-3 | cement protein-19k [Megabalanus rosa] | 4.00E-29 | 0.81 | pyriform spidroin 2 [Nephila clavipes] | 5.00E-15 | 0.93 |
| | cement protein-19k [Fistulobalanus albicostatus] | 1.00E-27 | 0.74 | piriform spidroin [Nephila clavipes] | 3.00E-11 | 0.96 |
| | 19 kDa cement protein [Amphibalanus amphitrite] | 3.00E-25 | 0.84 | piriform-like spidroin [Nephilengys cruentata] | 1.00E-10 | 0.86 |
| Aa19-4 | cement protein-19k [Fistulobalanus albicostatus] | 1.00E-28 | 0.69 | silk sericin MG-1 [Galleria mellonella] | 2.00E-17 | 0.7 |
| | 19 kDa cement protein [Amphibalanus amphitrite] | 2.00E-22 | 0.81 | cross-beta structure silk protein 1 [Mallada signata] | 1.00E-16 | 0.72 |
| | cement protein-19k [Megabalanus rosa] | 3.00E-20 | 0.81 | cross-beta structure silk protein 2 [Mallada signata] | 2.00E-14 | 0.73 |
| Aa19-5 | cement protein-19k [Balanus improvisus] | 4.00E-08 | 0.69 | cross-beta structure silk protein 1 [Mallada signata] | 1.00E-10 | 0.95 |
| | — | — | — | sericin 3 precursor [Bombyx mori] | 1.00E-09 | 0.88 |
| | — | — | — | cross-beta structure silk protein 2 [Mallada signata] | 1.00E-09 | 0.94 |
| Aa43-1 | cement protein-19k [Balanus improvisus] | 1.50E-01 | 0.26 | cross-beta structure silk protein 1 [Mallada signata] | 3.00E-33 | 0.8 |
| | — | — | — | silk sericin MG-1 [Galleria mellonella] | 7.00E-29 | 0.81 |
| | — | — | — | cross-beta structure silk protein 2 [Mallada signata] | 3.00E-26 | 0.8 |
Figure 5. Shared primary structure of polar GSrCPs.
(a) Recursive dot plot analysis of GSrCP-Aa43-1, GSrCP-Aa19-3, and LrCP-Aa52-1 against archetypal silk proteins, highlighting regions of low complexity along primary sequences. (i) AaCP19, (ii) Egg Stalk Silk from M. signata, (iii) Heavy Chain Fibroin from B. mori, (iv) Heavy Chain Fibroin from R. fugax, (v) Sericin I from B. mori, (vi) Spidroin I from N. clavipes. Dots represent silk homologous domains (SHDs). (b) Distinct primary structure observed in GSrCPs, defined by alternating silk homologous and complex domains. (c) Pairwise alignment of archetypal silk proteins to representative SHDs, demarcated by dot plotting, where bold letters are identical to the query sequence and italicized bold letters are chemically similar. (d) Stringency of primary sequence in GSrCP low complexity regions measured by pairwise alignment to de novo silk-like motifs, showing higher stringency with [SS] based models over [GG]. Percentages are the number of similar and identical alignments between pairs, divided by cement protein sequence length.
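The percentage described for panel (d) (similar plus identical aligned positions, divided by the cement protein sequence length) can be sketched as below. The chemical-similarity groups, the function name, and the toy alignments are assumptions for illustration; the record does not specify the similarity scheme or alignment tool the authors used.

```python
# Hypothetical chemical-similarity groups (an assumption, not from the source).
SIMILARITY_GROUPS = [
    set("GAST"),  # small / polar
    set("VILM"),  # aliphatic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("NQ"),    # amide
]


def residues_match(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return a == b or any(a in group and b in group for group in SIMILARITY_GROUPS)


def percent_similar_or_identical(cement_aligned: str, motif_aligned: str) -> float:
    """Similar + identical aligned positions over ungapped cement length, in %.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    """
    if len(cement_aligned) != len(motif_aligned):
        raise ValueError("aligned sequences must have equal length")
    cement_length = sum(1 for residue in cement_aligned if residue != "-")
    if cement_length == 0:
        raise ValueError("cement sequence has no residues")
    matches = sum(
        1
        for c, m in zip(cement_aligned.upper(), motif_aligned.upper())
        if c != "-" and m != "-" and residues_match(c, m)
    )
    return 100.0 * matches / cement_length


if __name__ == "__main__":
    # Toy alignment: two identical positions, one similar (S/T), one gap.
    print(percent_similar_or_identical("GSGA", "GTG-"))
```

Dividing by the ungapped cement protein length, rather than by the alignment length, means that unaligned cement residues lower the score.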