David J Rees1, Arash Hanifi1, Angelico Obille1, Robert Alexander2, Eli D Sone3,4,5.
Abstract
The European freshwater mollusk Dreissena bugensis (quagga mussel), an invasive species in North America, adheres to surfaces underwater via the byssus: a non-living protein 'anchor'. In spite of its importance as a biofouling species, the sequences of the majority of byssal proteins responsible for adhesion are not known, and little genomic data is available. To determine protein sequence information, we utilized next-generation RNA sequencing and de novo assembly to construct a cDNA library of the quagga mussel foot transcriptome, which contains over 200,000 transcripts. Quagga mussel byssal proteins were extracted from freshly induced secretions and analyzed using LC-MS/MS; peptide spectra were matched to the transcriptome to fingerprint the complete primary sequences of the proteins. We present the full sequences of fourteen novel quagga mussel byssal proteins, named Dreissena bugensis foot proteins 4 to 17 (Dbfp4-Dbfp17), and new sequence data for two previously observed byssal proteins, Dbfp1 and Dbfp2. Theoretical masses of the newly discovered proteins range from 4.3 kDa to 21.6 kDa. These protein sequences are unique but contain features similar to glue proteins from other species, including a high degree of polymorphism, proteins with repeated peptide motifs, disordered protein structure, and block structures.
Year: 2019 PMID: 31004089 PMCID: PMC6474901 DOI: 10.1038/s41598-019-41976-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of LC-MS/MS spectra data and protein matches to the quagga mussel transcriptome library.
| Sample | MS Scans | MS/MS Scans | Spectrum matches | Protein Groups | Proteins | |
|---|---|---|---|---|---|---|
| Whole TP | 17,999 | 15,282 | 2,958 | 1,414 | 599 | 1,066 |
| 6 kDa | 10,272 | 5,909 | 201 | 129 | 85 | 202 |
| 7 kDa | 9,603 | 7,249 | 299 | 239 | 117 | 241 |
| 14 kDa | 9,975 | 6,510 | 157 | 283 | 107 | 192 |
Byssal proteins identified by LC-MS/MS analysis using the foot transcriptome library.
| Protein name | Protein sequence (# of amino acids) (a, b) | MW, pI (a) | Score (c) | Spectral matches (#), TP | Spectral matches (#), 6 kDa | Spectral matches (#), 7 kDa | Spectral matches (#), 14 kDa |
|---|---|---|---|---|---|---|---|
| Dbfp1-f1 | WNDKYPGDGKDKYPGDGDDK | 4.6 kDa, pI 4.4 | 75.07 | 3 | 1 | 1 | 1 |
| Dbfp1-f2 | KYPGDGNPKYPGGGNPKYPGGGNPKYPGGGNPKYPGGGNAK | 5.1 kDa, pI 10 | 27.12 | 1 | 0 | 0 | 0 |
| Dbfp2 | | 20.2 kDa, pI 9.7 | 90.13 | 16 | 4 | 6 | 4 |
| Dbfp4 | | 21.8 kDa, pI 8.1 | 149.48 | 20 | 0 | 0 | 0 |
| Dbfp5 | | 21.3 kDa, pI 8.7 | 85.71 | 3 | 1 | 0 | 0 |
| Dbfp5β variant MW: 15.6 kDa, pI: 5.7 | | | | | | | |
| Dbfp6 | | 17.6 kDa, pI 6.6 | 110.57 | 9 | 1 | 1 | 1 |
| Dbfp7α | | 13.2 kDa, pI 4.7 | 147.16 | 16 | 10 | 10 | 9 |
| Dbfp7α–Dbfp7ζ variants MW/pI: 14.8/6.6, 14.2/4.3, 13.2/4.7, 12.3/4.7, 10.3/8.7, 9.7/9.1 | | | | | | | |
| Dbfp8β | | 16.3 kDa | 68.3 | 3 | 0 | 0 | 0 |
| Dbfp8α variant MW, pI: 16.7 kDa, pI 10.7 | | | | | | | |
| Dbfp9β | | 7.9 kDa, pI 4.6 | 106.28 | 9 | 9 | 7 | 1 |
| Dbfp9α variant MW, pI: 9.0 kDa, pI 4.3 | | | | | | | |
| Dbfp10α | | 10.3 kDa, pI 3.5 | 101.96 | 4 | 0 | 0 | 0 |
| Dbfp10β and Dbfp10γ variants MW/pI (kDa/pH): 7.9/4.7, 6.6/8.6 | | | | | | | |
| Dbfp11γ | | 12.7 kDa, pI 4.4 | 137.49 | 0 | 0 | 0 | 0 |
| Dbfp11α–Dbfp11ε variants MW/pI (kDa/pH): 13.1/5.5, 12.7/4.4, 12.5/4.9, 12.4/4.9 | | | | | | | |
| Dbfp12δ | | 10.5 kDa, pI 7.8 | 119 | 3 | 0 | 0 | 3 |
| Dbfp12α–Dbfp12δ variants MW/pI (kDa/pH): 10.9/8.3, 10.7/8.1, 10.7/8.1 | | | | | | | |
| Dbfp13α | | 9.6 kDa, pI 10.8 | 52.4 | 2 | 0 | 0 | 0 |
| Dbfp13α–Dbfp13ε variants MW/pI (kDa/pI): 9.6/10.8, 9.2/10.6, 8.3/10.2, 7.1/9.94, 7.1/10.0 | | | | | | | |
| Dbfp14γ | | 5.6 kDa, pI 8.6 | 89.84 | 6 | 6 | 14 | 7 |
| Dbfp14α–Dbfp14δ variants MW/pI (kDa/pH): 6.6/8.2, 6.5/8.6, 5.6/8.6, 5.4/8.2 | | | | | | | |
| Dbfp15α | | 5.5 kDa, pI 9.9 | 69.24 | 2 | 0 | 2 | 1 |
| Dbfp15β variant MW/pI (kDa/pH): 5.0/9.0 | | | | | | | |
| Dbfp16 | | 5.3 kDa, pI 4.4 | 68.66 | 3 | 1 | 1 | 0 |
| Dbfp17 | | 4.3 kDa, pI 9.2 | 47.92 | 2 | 0 | 0 | 0 |
Novel proteins have been named Dbfp4 to Dbfp17. Signal peptides are underlined and matching peptide spectra are indicated in bold. For proteins with isoforms, the transcript with the highest number of observed spectra or the largest molecular weight is shown as a representative sequence. Protein variants are listed beneath the representative sequence.
(a) Sequence properties were calculated after removing the predicted signal peptide sequence.
(b) N and Q signify asparagine and glutamine deamidation, respectively, and Y signifies tyrosine hydroxylation.
(c) PEAKS scoring method: a −10LogP score cut-off of 20 is equivalent to a P-value of 0.01.
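The relation behind the score footnote is score = −10·log10(P), so a cut-off of 20 corresponds to P = 10^(−2) = 0.01. A minimal sketch of the conversion in both directions follows; the function names are illustrative and are not part of PEAKS.

```python
import math


def score_to_p_value(score: float) -> float:
    """Convert a -10*log10(P) score into the corresponding P-value."""
    return 10 ** (-score / 10)


def p_value_to_score(p_value: float) -> float:
    """Convert a P-value into a -10*log10(P) score."""
    if p_value <= 0:
        raise ValueError("P-value must be positive")
    return -10 * math.log10(p_value)


# The cut-off quoted in the footnote, and one score taken from the table (Dbfp4).
print(score_to_p_value(20))
print(score_to_p_value(149.48))
print(p_value_to_score(0.01))
```

Read this way, every score in the table sits above the cut-off of 20, with Dbfp1-f2 (27.12) the closest to it.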
Prominent amino acids in quagga mussel byssal proteins and their respective protein category.
| Protein | 1st (mol%) | 2nd (mol%) | 3rd (mol%) | Notable (mol%) | Protein category |
|---|---|---|---|---|---|
| Dbfp1 (a) | G (26.6) | Y (14.7) | P (12.2) | K (12.3) | G, Y rich |
| Dbfp7α | G (18.6) | N (13.6) | Y (10.2) | Q/P (9.2) | G, Y rich |
| Dbfp9β | G (37.8) | Y (21.6) | N (10.8) | | G, Y rich |
| Dbfp15 | G (30.8) | Y (13.5) | S (11.5) | | G, Y rich |
| Dbfp2 | P (25) | Y (23.2) | K (17.7) | T (10.4) | P rich |
| Dbfp4 | P (8.9) | L (8.9) | V (7.4) | | P rich |
| Dbfp5 | P (18.4) | Y (11.1) | G (11.1) | Q (10.5) | P rich |
| Dbfp6 | P (9.5) | G (7.6) | L (7.6) | V (7.6) | P rich |
| Dbfp10α | P (18.2) | D (15.9) | Y (13.6) | C (6.8) | P rich |
| Dbfp13α | P (15.9) | R (12.2) | T (12.2) | Q (6.1) | P rich |
| Dbfp11γ | C (15.7) | P (13.0) | D (10.3) | Y (5.2) | C rich |
| Dbfp12δ | C (12.5) | G (11.5) | R (10.4) | P (8.3) | C rich |
| Dbfp14γ | R (18.4) | C (16.3) | P (12) | G (10.2) | C rich |
| Dbfp16 | C (13.6) | W (13.6) | P (6.8) | | C rich |
| Dbfp17 | C (15.0) | K (15.0) | G (10.0) | | C rich |
| Dbfp8 | S (22.7) | L (12.0) | G (11.3) | | — |
(a) Rzepecki and Waite 1993b [12].
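The mol% values above are residue frequencies: the count of one amino acid divided by the sequence length, times 100. A minimal sketch of that calculation is shown here on the Dbfp1-f1 fragment listed in the byssal protein table. The function name is illustrative, and because the input is only a short fragment, the output is not expected to reproduce the Dbfp1 row, which is taken from Rzepecki and Waite.

```python
from collections import Counter


def mol_percent(sequence: str) -> dict:
    """Return the mol% of each amino acid, most abundant first."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: 100 * n / total for aa, n in counts.most_common()}


# Dbfp1-f1 fragment as listed in the byssal protein table.
dbfp1_f1 = "WNDKYPGDGKDKYPGDGDDK"
for amino_acid, percent in mol_percent(dbfp1_f1).items():
    print(f"{amino_acid}: {percent:.1f} mol%")
```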
Expression analysis of quagga mussel byssal proteins.
| Protein | Transcriptome component | Expected count | Transcripts per million (TPM) | Fragments per million (FPM) | Rank |
|---|---|---|---|---|---|
| Dbfp9 | comp42199_c1 | 31302209 | 130366 | 252497 | 2 |
| Dbfp14 | comp35860_c0 | 18547819 | 51358 | 98942 | 4 |
| Dbfp9-start | comp31072_c0 | 14439051 | 71373 | 137830 | 5 |
| Dbfp7 | comp52765_c1 | 10955530 | 12930 | 25112 | 7 |
| Dbfp9-alt-start | comp31079_c0 | 9617925 | 68934 | 133137 | 9 |
| Dbfp2-middle | comp12693_c0 | 9558067 | 48823 | 94269 | 10 |
| Dbfp11 | comp49590_c4 | 9360859 | 13623 | 26269 | 12 |
| Dbfp1-fragment | comp39272_c0 | 9347159 | 83646 | 162151 | 13 |
| Dbfp2-end | comp31091_c1 | 5283169 | 21456 | 41387 | 16 |
| Dbfp2-start | comp9618_c0 | 5014310 | 27942 | 53928 | 18 |
| Dbfp10α/β | comp35857_c0 | 4541260 | 3914 | 7545 | 22 |
| Dbfp1-start | comp34314_c0 | 4163681 | 30984 | 60032 | 25 |
| Dbfp1-middle | comp9617_c0 | 2604072 | 23918 | 46162 | 30 |
| Dbfp15 | comp37514_c0 | 2561050 | 7518 | 14572 | 31 |
| Dbfp4 | comp37554_c0 | 1999121 | 1957 | 3745 | 35 |
| Dbfp10c | comp40957_c0 | 1487483 | 2738 | 5323 | 44 |
| Dbfp9η/θ | comp35873_c0 | 1348059 | 2258 | 4381 | 51 |
| Dbfp1-end | comp30958_c0 | 1103356 | 4990 | 9635 | 60 |
| Dbfp9α | comp58518_c0 | 769975 | 3094 | 6074 | 76 |
| Dbfp13α-δ | comp49016_c0 | 640379 | 656 | 1270 | 82 |
| Dbfp17 | comp12176_c0 | 814345 | 1572 | 3022 | 103 |
| Dbfp16 | comp13545_c0 | 436715 | 2001 | 3876 | 120 |
| Dbfp6 | comp31337_c0 | 143463 | 192 | 368 | 141 |
| Dbfp5 | comp47359_c0 | 309503 | 46 | 90 | 172 |
| Dbfp13ε | comp57838_c0 | 199337 | 32 | 62 | 259 |
| Dbfp12 | comp45612_c1 | 62698 | 81 | 152 | 675 |
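This record does not say how the TPM column was computed. Under the standard definition, each transcript's expected count is divided by its effective length, and the resulting rates are rescaled so that they sum to one million. The sketch below follows that definition. The effective lengths are hypothetical placeholders, since the table does not report transcript lengths, so the output is not expected to match the TPM column.

```python
def transcripts_per_million(expected_counts, effective_lengths):
    """Standard TPM: length-normalized counts rescaled to sum to 1e6."""
    if len(expected_counts) != len(effective_lengths):
        raise ValueError("counts and lengths must have the same size")
    rates = [c / l for c, l in zip(expected_counts, effective_lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]


# Toy input: three transcripts with made-up counts and made-up lengths
# (hypothetical values, not taken from the expression table).
counts = [3000.0, 1500.0, 500.0]
lengths = [600.0, 300.0, 1000.0]
for value in transcripts_per_million(counts, lengths):
    print(round(value, 1))
```

Because the rescaling depends on every transcript in the library, TPM values cannot be recovered from a subset of rows alone.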
Figure 1. Dbfp7 variants with signal peptide removed, aligned using Clustal to illustrate the effects of alternative splicing. Greek letters are replaced by Latin letters. An asterisk beneath the sequences indicates an exact match, a colon indicates a substitution with strongly similar properties, and a single dot indicates a substitution with weakly similar properties.
Figure 2. Dbfp2 assembly, with tandem repeats highlighted in green, yellow, and blue.
Figure 3. Kyte-Doolittle hydropathy plot (right) for Dbfp4 indicating alternating domains; higher score indicates higher hydrophobicity. Hydrophobic residues are highlighted in red (left).
Figure 4. Dbfp5 protein with interesting repeats highlighted (left), and IUPred disorder prediction chart for Dbfp5, where values above 0.5 indicate a tendency to be disordered (right). Note how the cysteine-containing region has low disorder.
Figure 5. IUPred disorder prediction chart for Dbfp13α, where values above 0.5 indicate a tendency to be disordered.
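A Kyte-Doolittle plot is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale. The window size of 9 is an assumption, since the caption does not state one. The Dbfp4 sequence is not included in this record, so the Dbfp1-f2 fragment from the byssal protein table stands in as input.

```python
# Kyte-Doolittle hydropathy scale (higher value = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean Kyte-Doolittle score for each window position along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Stand-in input: the Dbfp1-f2 fragment (not Dbfp4, whose sequence is not shown here).
dbfp1_f2 = "KYPGDGNPKYPGGGNPKYPGGGNPKYPGGGNAK"
profile = hydropathy_profile(dbfp1_f2, window=9)
print(len(profile), "window positions")
print("most hydrophobic window score:", round(max(profile), 2))
```

A smaller window resolves shorter hydrophobic stretches at the cost of a noisier profile.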