Identification and Characterization of a Novel Family of Cysteine-Rich Peptides (MgCRP-I) from Mytilus galloprovincialis.

Marco Gerdol¹, Nicolas Puillandre², Gianluca De Moro¹, Corrado Guarnaccia³, Marianna Lucafò¹, Monica Benincasa¹, Ventislav Zlatev³, Chiara Manfrin¹, Valentina Torboli¹, Piero Giulio Giulianini¹, Gianni Sava¹, Paola Venier⁴, Alberto Pallavicini⁵.

Abstract

We report the identification of a novel gene family (named MgCRP-I) encoding short secreted cysteine-rich peptides in the Mediterranean mussel Mytilus galloprovincialis. These peptides display a highly conserved pre-pro region and a hypervariable mature peptide comprising six invariant cysteine residues arranged in three intramolecular disulfide bridges. Although their cysteine pattern is similar to that of cysteine-rich neurotoxic peptides of distantly related protostomes such as cone snails and arachnids, the different organization of the disulfide bridges observed in synthetic peptides and phylogenetic analyses revealed MgCRP-I as a novel protein family. Genome- and transcriptome-wide searches for orthologous sequences in other bivalve species indicated the unique presence of this gene family in Mytilus spp. Like many antimicrobial peptides and neurotoxins, MgCRP-I peptides are produced as pre-propeptides, usually have a net positive charge and likely derive from similar evolutionary mechanisms, that is, gene duplication and positive selection within the mature peptide region; however, synthetic MgCRP-I peptides did not display significant toxicity in cultured mammalian cells, nor insecticidal, antimicrobial, or antifungal activities. The functional role of MgCRP-I peptides in mussel physiology remains puzzling.

Keywords: antimicrobial peptide; bivalve mollusk; mussel; toxin; transcriptome

Year: 2015 PMID: 26201648 PMCID: PMC4558851 DOI: 10.1093/gbe/evv133

Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416

Introduction

Marine ecosystems are characterized by an astonishing species diversity, with over 2 million different eukaryotic species belonging to various phyla estimated to compose the marine fauna (Mora et al. 2011). Thus, marine organisms and environments can be regarded as a virtually unlimited source of bioactive compounds, either produced through complex biochemical synthetic reactions or gene-encoded peptides (Mayer et al. 2011). Nowadays, computer-assisted data mining coupled with the advent of next-generation sequencing (NGS) technologies allows the in silico identification of bioactive molecules also in nonmodel marine organisms (Li et al. 2011; Sperstad et al. 2011). The quick increase of bivalve transcriptome data sets (Suárez-Ulloa, Fernández-Tajes, Manfrin, et al. 2013) and the recent genome sequencing of the oysters Crassostrea gigas (Zhang et al. 2012) and Pinctada fucata (Takeuchi et al. 2012) further broaden the horizons of genetic and genomic studies in bivalve mollusks. Due to their relevance as seafood and sentinel organisms, significant RNA sequencing (RNA-seq) efforts, both with 454 and Illumina technologies, have been performed on Mytilus spp. (Craft et al. 2010; Philipp et al. 2012; Suárez-Ulloa, Fernández-Tajes, Aguiar-Pulido, et al. 2013; Bassim et al. 2014; Freer et al. 2014; Gerdol et al. 2014; Romiguier et al. 2014; González et al. 2015). Moreover, a recently released unrefined genome of the Mediterranean mussel Mytilus galloprovincialis further extends the molecular data available for this species (Nguyen et al. 2014). The bioinformatic analysis of the mussel data has already contributed to the discovery of important immune-related molecules, including pathogen-recognition receptors, signaling intermediates, and antimicrobial peptides (AMPs) (Gerdol et al. 2011, 2012; Gerdol and Venier 2015; Rosani et al. 2011; Toubiana et al. 2013, 2014).
As reported in this article, large-scale bioinformatic analyses can also drive the discovery of novel gene families encoding peptides with unique chemico-physical properties and/or sequence patterns. Cysteine-rich peptides (CRPs) encompass a large and widespread group of secreted bioactive molecules, heterogeneous in primary sequence and structural arrangement, with different functional roles and present in almost all living organisms, from bacteria to fungi, animals, and plants (Gruber et al. 2007; Taylor et al. 2008; Marshall et al. 2011). Invertebrate CRPs are particularly abundant and they have been frequently related to the immune defense against potential pathogens (Mitta, Vandenbulcke, Noël, et al. 2000). According to the number of cysteine residues and their arrangement in three-dimensional space, many families of cysteine-rich AMPs have been described in invertebrates, as for instance in crustaceans (Destoumieux et al. 1997; Bartlett et al. 2002), insects (Bulet and Stöcklin 2005), and arachnids (Ehret-Sabatier et al. 1996; Fogaça et al. 2004). In Mytilus spp., different families of cysteine-rich AMPs have been progressively discovered starting from the mid-1990s. Peptides similar to arthropod defensins were purified from active fractions of hemolymph almost simultaneously in Mytilus edulis and M. galloprovincialis (Charlet et al. 1996; Hubert et al. 1996), together with different novel AMPs whose structure and biological activities were characterized in the following years. Those included mytilins (Mitta, Vandenbulcke, Hubert, et al. 2000), myticins (Mitta et al. 1999), and the strictly antifungal mytimycins. Only very recently were other AMP families described in mussels, either by cloning from hemolymph cDNA libraries, as in the case of myticusins (Liao et al. 2013), or by detection in high-throughput sequencing data sets, as in the case of mytimacins and big defensins (Gerdol et al. 2012).
Other evolutionarily related CRPs are animal venom components which possess neurotoxic properties, as they can selectively block various types of ion channels for predation or defense (Froy and Gurevitz 2004; Rodríguez de la Vega and Possani 2005). Notably, spider and scorpion venoms contain an extraordinary mixture of CRPs whose complexity has been only recently fully appreciated by “omics” approaches (Ma et al. 2009; Zhang et al. 2010). Even within the Mollusca phylum some species have developed a lethal venom arsenal to be used for predation: Marine gastropods of the genus Conus indeed use a modified radula as a sting to inject and paralyze their prey with a powerful venom cocktail, mostly of peptidic nature (Olivera et al. 2012). Due to their biological properties, many CRPs have been studied to guide the development of new drugs for therapeutic applications in both human and veterinary medicine (Adams et al. 1999; Otero-González et al. 2010; Saez et al. 2010). Despite having physico-chemical properties similar to cysteine-rich AMPs and toxins, certain animal CRPs lack the expected activities and are instead involved in diverse functions: Among these, Kunitz-type (Ranasinghe and McManus 2013) and Kazal-type (Rimphanitchayakit and Tassanakajon 2010) proteinase inhibitors represent two widespread groups. The abundance and diversity of the CRPs described in protostomes is remarkable and, given the poor genomic knowledge of many taxonomic groups, a large part of these peptides probably still remain to be uncovered. In this article, we report the application of a genome- and transcriptome-scale approach to the identification of sequences encoding novel CRPs from the Mediterranean mussel M. galloprovincialis. 
In agreement with nomenclature criteria reported elsewhere (Gerdol and Venier, forthcoming), we present the new MgCRP-I family, characterized by a conserved pre-pro region and a highly variable mature peptide with six conserved cysteine residues organized in the consensus C(X3–6)C(X1–7)CC(X3–4)C(X3–5)C. We investigated the organization and evolution of mussel MgCRP-I genes and pseudogenes, as well as the main features and possible functional roles of the encoded peptides.

Materials and Methods

Identification of MgCRP-I Sequences in Mussel Transcriptomes

The M. galloprovincialis Illumina transcriptomes available at the NCBI (National Center for Biotechnology Information) Sequence Read Archive (retrieved in February 2015) were assembled with Trinity v.2014-07-17 (Grabherr et al. 2011) and with the CLC Genomics Workbench 7.5 (CLC Bio, Aarhus, Denmark), using default parameters. Following translation of the assembled contigs into the six possible reading frames with EMBOSS Transeq (Rice et al. 2000), we searched the virtual mussel proteins for the C-C-CC-C-C signature, allowing a spacing between cysteine residues of one to ten amino acids, using a Perl script developed in-house (available upon request to the corresponding author). Matching sequences were aligned with MUSCLE (Edgar 2004) to generate a HMMER v3.0 profile (Eddy 2011), which was then used to retrieve partial-matching cases within the assembly (e-value cutoff 1 × 10⁻⁵). The procedure was reiterated until no additional matches could be retrieved. Sequences showing an identity higher than 95% at the nucleotide level were considered as redundant and collapsed in a single consensus sequence, unless they were confirmed by genomic evidence (see section below). With just two exceptions (see the Discussion section), all the sequences retrieved contained at least one C(X3–6)C(X1–7)CC(X3–4)C(X3–5)C motif.
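The in-house Perl script is not reproduced in the article. As a minimal sketch of this kind of signature scan, the following Python function searches a translated sequence for C-C-CC-C-C with one to ten residues between cysteines. The names `SIGNATURE` and `find_signatures` are hypothetical, and the exclusion of cysteines and stop symbols (`*`) from the spacers is an assumption, not a documented detail of the original script.

```python
import re

# Assumed reading of the signature: six cysteines, Cys3 and Cys4 adjacent,
# 1-10 non-cysteine residues in each of the other gaps. '*' (a stop codon
# in six-frame translations) is excluded so that matches stay within one ORF.
SPACER = r"[^C*]{1,10}"
SIGNATURE = re.compile(
    r"(?=(C" + SPACER + "C" + SPACER + "CC" + SPACER + "C" + SPACER + "C))"
)

def find_signatures(protein):
    """Return (start, matched_text) for each signature found in `protein`.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    return [(m.start(), m.group(1)) for m in SIGNATURE.finditer(protein.upper())]

if __name__ == "__main__":
    toy = "MKACAAACAACCAAACAAACK"  # invented sequence, for illustration only
    for start, hit in find_signatures(toy):
        print(start, hit)
```

Hits from such a scan would only be candidates; in the workflow described above they are subsequently aligned and used to build the HMMER profile.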

Identification of MgCRP-I Genes in the Mussel Genome

The M. galloprovincialis genomic contigs (Nguyen et al. 2014) were downloaded from GenBank and scanned for the presence of MgCRP-I genes as follows: 1) Genes were identified based on BLASTn identity (Altschul et al. 1990) to the previously identified MgCRP-I transcripts (e-value threshold of 1 × 10⁻⁵), and 2) genomic scaffolds were translated into the six possible reading frames with the EMBOSS Transeq tool (Rice et al. 2000) and novel MgCRP-I loci were identified with HMMER v3.0 (Eddy 2011). The genes identified were manually annotated with mRNA and coding sequences (CDS) traces, based on: 1) MUSCLE alignment between genomic contigs and the corresponding assembled transcripts, whenever available; 2) mapping of the available M. galloprovincialis sequencing reads (see above), with the CLC Genomics Workbench “large gap mapping” tool; and 3) refinement of splice site positions with Genie (Reese et al. 1997). An example of the results of the annotation procedure is shown in supplementary figure S1, Supplementary Material online. Results obtained from the genome and transcriptome analyses were compared and redundant results (identity percentage higher than 95%) were removed, unless multiple gene copies were confirmed in the mussel genome (i.e., the presence of paralogous genes was tolerated).

MgCRP-I Protein Sequence Analysis

Protein translations of mussel genes and transcripts identified with the strategy mentioned above were further analyzed as follows: The presence of a signal peptide was detected with SignalP 4.0 (Petersen et al. 2011), and discriminated from transmembrane domains with Phobius (Käll et al. 2004). Potential sites of posttranslational proprotein convertase cleavage were identified with ProP 1.0 (Duckert et al. 2004). Possible posttranslational C-terminal cleavage sites by carboxypeptidase E or by peptidylglycine α-amidating monooxygenase were detected with ELM (Dinkel et al. 2011). The subcellular localization was predicted (for full-length peptides only) with TargetP 1.1 (Emanuelsson et al. 2007). Isoelectric point and molecular weight of the predicted mature peptides were calculated at ExPASy (http://web.expasy.org/compute_pi/, last accessed July 24, 2015). Structural homologies with proteins deposited in the RCSB Protein Data Bank were investigated with Phyre2 (Kelley and Sternberg 2009). The probabilities of codon bias for the six conserved cysteines and for the arginine residue responsible for the posttranslational pro-region cleavage were calculated assuming a binomial distribution, based on the codon usage inferred from the M. galloprovincialis transcriptome (Gerdol et al. 2014) and using the tool “cusp” included in the EMBOSS package (Rice et al. 2000). We used PAML 4.7 (Yang 2007) and the graphical interface PAMLX 1.2 (Xu and Yang 2013) to test whether some sites in the codon-based alignments of MgCRP-I nucleotide sequences were under positive selection. In detail, only full-length CDS were processed and two site models were compared: M1, which assumes that the dN/dS ratio (ω) along the sequence ranges from 0 to 1 (purifying selection to neutral drift), and M2, which additionally allows a few sites to have a dN/dS ratio greater than 1 (ω > 1; positive selection). The likelihoods of the two models were compared using a likelihood ratio test (LRT) against a χ² distribution with 2 degrees of freedom. The Empirical Bayes approach was used to calculate the posterior probabilities (PP) for site classes. Positive selection was concluded at PP > 0.95.
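The LRT step can be sketched in a few lines. The test statistic is twice the difference between the log-likelihoods of the two nested models, and for 2 degrees of freedom the χ² survival function has the closed form exp(−x/2), so no statistics library is needed. The function name and the log-likelihood values in the usage example are hypothetical; they are not results from this study.

```python
import math

def lrt_pvalue_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    lnl_null: log-likelihood of the simpler model (e.g., M1)
    lnl_alt:  log-likelihood of the richer model (e.g., M2)
    Returns (statistic, p_value).
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    # For a chi-square distribution with 2 degrees of freedom,
    # P(X > x) = exp(-x / 2) exactly.
    return stat, math.exp(-stat / 2.0)

if __name__ == "__main__":
    # Invented log-likelihoods, for illustration only.
    stat, p = lrt_pvalue_df2(-1000.0, -997.0)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

With these invented values the statistic is 6.0, just above the 5.99 critical value for α = 0.05 at 2 degrees of freedom.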

Comparative Genomics Analyses

The NGS Illumina transcriptome data available for 71 bivalve species were downloaded from the Sequence Read Archive (SRA). The full list and the corresponding Bioproject accession IDs are shown in supplementary table S1, Supplementary Material online. The bivalve sequence data sets were independently de novo assembled with the CLC Genomics Workbench 7.5 (CLC Bio, Aarhus, Denmark). All transcriptomes were translated into the six possible open-reading frames (ORFs) with EMBOSS Transeq (Rice et al. 2000) and significant similarity with MgCRP-I proteins was assessed with BLASTp (e-value threshold 0.01) and HMMER v3.0 using the protein profile mentioned above (P-value threshold 0.01). The complete UniProtKB/Swiss-Prot protein sequence database and the whole set of peptides predicted from the fully sequenced genomes of C. gigas (release 9) (Zhang et al. 2012) and P. fucata (v 1.0) (Takeuchi et al. 2012) were screened for the presence of the C-C-CC-C-C pattern with a custom Perl script, without any constraint on the spacing between cysteine residues. Only full-length sequences shorter than 100 amino acids and showing a signal peptide by SignalP 4.0 (Petersen et al. 2011) were selected. Sequences bearing more than seven cysteine residues within the mature region were considered as characterized by more complex disulfide arrays and discarded. The possible presence of misannotated CRP-I-like genes in C. gigas and P. fucata was evaluated by performing the same analyses on the genomic scaffolds translated into the six possible reading frames with EMBOSS Transeq.
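The selection criteria applied to the database hits can be illustrated with a small filter. This is a sketch under stated assumptions rather than the authors' script: the function name `passes_filter` is hypothetical, the signal peptide length is assumed to come from an external predictor such as SignalP (passed in as `signal_peptide_len`, or `None` when no signal peptide is predicted), and the "mature region" is approximated as everything downstream of the signal peptide.

```python
import re

# C-C-CC-C-C with no constraint on spacing (gaps may even be empty here,
# which is an assumption about how "no constraint" was implemented).
PATTERN = re.compile(r"C.*C.*CC.*C.*C")

def passes_filter(precursor, signal_peptide_len, max_len=100, max_cys=7):
    """Apply the three criteria described in the text to one precursor."""
    if signal_peptide_len is None:       # no predicted signal peptide
        return False
    if len(precursor) >= max_len:        # keep only sequences shorter than 100 aa
        return False
    mature = precursor[signal_peptide_len:]
    if mature.count("C") > max_cys:      # more than seven cysteines: discard
        return False
    return PATTERN.search(mature) is not None

if __name__ == "__main__":
    toy = "M" + "L" * 19 + "CAAACAACCAAACAAAC"  # invented precursor
    print(passes_filter(toy, signal_peptide_len=20))
```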

Phylogeny and Evolutionary Tests

We used all the available MgCRP-I proteins, their orthologous sequences identified in M. edulis (MeCRP-I), Mytilus californianus (McCRP-I) and Mytilus trossulus (MtCRP-I), and the positive hits resulting from the data mining, to infer the phylogenetic relationships among sequences bearing a similar cysteine signature. Given the high sequence diversity, only the signal peptides, as predicted using SignalP 4.0 (Petersen et al. 2011), were retained to facilitate sequence alignment. Following alignment with MUSCLE (Edgar 2004) and manual refinement, maximum-likelihood analyses were performed with RAxML (Stamatakis 2006) as implemented on the CIPRES Portal (www.phylo.org/portal2), using the RAxML-HPC2 on TG tool. The robustness of the nodes was assessed with a bootstrapping procedure of 100 replicates. Because the evolutionary relationships of the peptides included in the analysis with other peptides were unknown, no outgroup was considered and a midpoint rooting strategy was applied. A similar analysis was performed with a data set that included only the M. galloprovincialis CRP-I peptides. Partial MgCRP-I peptides with an incomplete signal peptide were excluded from the analyses.

Peptide Synthesis, Oxidative Folding, and Disulfide Mapping

The peptides MgCRP-I 7 (26 amino acids) and MgCRP-I 9 (26 amino acids) were selected for solid phase peptide synthesis (SPPS) for their primary sequence features: A low number of hydrophobic residues, especially if not consecutive, facilitates synthesis and improves solubility during purification and folding, proline acts as secondary structure breaker during SPPS, and the presence of aromatic amino acids allows easy UV quantitation. The two peptides were synthesized according to standard solid-phase Fmoc chemistry using four equivalents of HCTU/Fmoc-Xaa-OH/DIEA (0.95/1.00/1.90) with respect to the resin loading (Tentagel S-Trt, 0.2 mmol/g; Sigma-Aldrich, St. Louis, MO). The synthesis was semiautomatically performed with a customized Gilson Aspec XL peptide synthesizer (Middleton, WI) on a 0.05-mmol scale. Cysteines were manually added as N-α-Fmoc-S-trityl-L-cysteine pentafluorophenyl ester in order to minimize racemization. After cleavage from the resin the peptides were precipitated with diethyl ether, washed, and freeze-dried. The peptides reduced by TCEP (tris(2-carboxyethyl)phosphine) treatment were purified by RP-HPLC (Reversed-Phase High-Performance Liquid Chromatography) on a semipreparative Zorbax 300SB-C18 9.4 × 250 mm column (Agilent, Santa Clara, CA) using a gradient from A (0.1% trifluoroacetic acid [TFA] in water) to B (0.1% TFA/60% acetonitrile in water) in 100 min at 4 ml/min. The calculated K* (retention factor) is 4.12 assuming a shape selectivity factor (S) for the peptides of 0.25 × Mw^0.5.
Peptide fractions from the semipreparative RP-HPLC were checked by electrospray mass spectrometry (amaZonSL ion trap; Bruker, Billerica, MA) and fractions with at least 95% purity were quantified by UV absorbance at 280 nm and immediately diluted to 0.1 mg/ml in either of the following refolding buffers: 1) RefoldA: 0.2 M Tris–HCl, 2 mM ethylenediaminetetraacetic acid (EDTA), 10 mM glutathione (GSH), 1 mM glutathione disulfide (GSSG), pH 8, previously degassed with argon bubbling; and 2) RefoldB: 50 mM NaOAc, 1 mM EDTA, 1 mM GSH, 0.1 mM GSSG, 2 M (NH4)2SO4, pH 7.7. The oxidative refolding proceeded for 18 h at 4 °C, was quenched by TFA addition and finally checked by Liquid Chromatography-Mass Spectrometry (LC-MS) analysis. All proteolysis reactions were carried out at 37 °C for 18–48 h in sodium acetate buffer (100 mM, pH 5.5) containing 1 M GuHCl and 5 mM CaCl2. The purified MgCRP-I 7 peptide (60 μg) was dissolved in 90 µl of buffer and trypsin (3 μg) was added. A second aliquot of MgCRP-I 7 (60 µg in 90 µl) was incubated for 48 h at 37 °C in the presence of chymotrypsin (6 μg). Digestions of MgCRP-I 9 with trypsin and chymotrypsin were carried out in the same conditions. The digestions were quenched using formic acid (1% final) and the proteolytic fragments were fractionated by RP-HPLC (Jupiter C18 column, 1 × 50 mm; Phenomenex, Torrance, CA) using a gradient from water/0.1% formic acid to 60% acetonitrile and analyzed by Liquid Chromatography-tandem Mass Spectrometry (LC-MS/MS) (amaZonSL; Bruker).

Gene Expression Analysis

The expression levels of selected MgCRP-I genes were evaluated in samples representing hemolymph, digestive gland, inner mantle, mantle rim, gills, foot, and posterior adductor muscle. Total RNA was extracted from the tissues of 30 adult specimens (5–7 cm shell length) collected from the Gulf of Trieste, Italy, homogenized in equal quantity in Trizol (Life Technologies, Carlsbad, CA) according to the manufacturer’s protocol. RNA quality was assessed by electrophoresis on denaturing agarose gel and its quantity was estimated by UV-spectrophotometry. cDNAs were prepared using a qScript cDNA Synthesis Kit (Quanta BioSciences Inc., Gaithersburg, MD) according to the manufacturer’s instructions. Primer pairs were designed to obtain the specific polymerase chain reaction (PCR) amplicons (table 1), with the exception of the primer pairs coamplifying the paralogous sequences MgCRP-I 3/25 and MgCRP-I 10/26. The 15 µl PCR reaction mix comprised 7.5 µl of SsoAdvanced SYBR Green Supermix (Bio-Rad, Hercules, CA), 0.3 µl of each of the two 10 µM primers, and 2 µl of a 1:20 cDNA dilution.
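As a quick arithmetic check of the reaction mix, the stated components sum to 10.1 µl, which leaves 4.9 µl of the 15 µl reaction unaccounted for, and each primer ends up at a final concentration of 0.2 µM. The sketch below assumes that the remaining volume is water, which the text does not state; the function name is hypothetical.

```python
def qpcr_mix(total_ul=15.0, supermix_ul=7.5, primer_ul=0.3,
             primer_stock_um=10.0, cdna_ul=2.0):
    """Return (remaining volume in µl, final concentration of each primer in µM)."""
    used_ul = supermix_ul + 2 * primer_ul + cdna_ul       # two primers per reaction
    remaining_ul = total_ul - used_ul                     # assumed to be water
    final_primer_um = primer_stock_um * primer_ul / total_ul
    return round(remaining_ul, 3), round(final_primer_um, 3)

if __name__ == "__main__":
    print(qpcr_mix())
```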

Table 1

Primers Designed for Assessing the Tissue-Specific Expression Levels of MgCRP-I Genes by Real-Time PCR

Primer Name	Primer Sequence
MgCRP-I 1 for	TGTGTGTTGTTGGTCGTCGT
MgCRP-I 1 rev	GTAACCGGAACGACAAAAGC
MgCRP-I 2 for	AGCCTCAAGTAAGAAGTAAAACAGA
MgCRP-I 2 rev	CAGCTTMTTCTACCGCATCC
MgCRP-I 3/25 for	GACAAAGTGAACTAAAGCATTTCA
MgCRP-I 3/25 rev	CTCCGTTTTCTCCAAAGCTG
MgCRP-I 4 for	CATGGCACATGAMGAAATGC
MgCRP-I 4 rev	TTAGCCACCATAGCGTTTGC
MgCRP-I 5 for	TGGATAAAAGGTGACCCACAG
MgCRP-I 5 rev	TCTTCCAGCATTTCGTCCTT
MgCRP-I 6 for	AAYATGGCGAAGGAAGACAT
MgCRP-I 6 rev	AAGTTCAGTCGCGCCTACAT
MgCRP-I 7 for	GTTGGAGTCAACATGGCAAA
MgCRP-I 7 rev	GCGCATGCATTTTCTGTAAG
MgCRP-I 8 for	GCATTTGCTTATAGTGTTGCAGA
MgCRP-I 8 rev	TKCAAATGATGGATGGCTAA
MgCRP-I 9 for	GCTTTTTGTTTGTTTGGTAGCC
MgCRP-I 9 rev	CGAACACATCTTCTGTATGAGCA
MgCRP-I 10/26 for	GGCACATGAAGAAATGTTCG
MgCRP-I 10/26 rev	CCTGCATACGCCAAAACAT
MgCRP-I 11 for	TAAACCCCTTGTTCGGTCAC
MgCRP-I 11 rev	AGTGTGACGGATGCAAACAA
MgCRP-I 14 for	AGCCTTCGTTGGAACTAGCA
MgCRP-I 14 rev	TCGAGCGAGATTGACATCTG
multi-MgCRP-I 1 for	CTGACGAAATGGTGGAGGAT
multi-MgCRP-I 1 rev	TACAGCATTGACGGCTGTTT
multi-MgCRP-I 2 for	GCAAACATGGCCAAAGAAGT
multi-MgCRP-I 2 rev	GTCACGGGTCTTTTTGCATT
multi-MgCRP-I 3 for	AAGAGCTCCTGCATGTGGAT
multi-MgCRP-I 3 rev	TCCTCCTCCCGTTCTCTTTT
EF-1 alpha for	CCTCCCACCATCAAGACCTA
EF-1 alpha rev	GGCTGGAGCAAGGTAACAA

The following thermal profile was used for quantitative PCR (qPCR) amplification in a C1000 thermal cycler (Bio-Rad): An initial denaturation step at 95 °C for 3 min, followed by 40 cycles at 95 °C for 5 s and 55 °C for 30 s. The products of amplification were analyzed with a 65/95 °C melting curve. The expression of the selected genes was calculated with the delta Ct method; Ct values were corrected based on the PCR efficiencies of the primer pairs using LinRegPCR (Ramakers et al. 2003) and expression values were normalized using the elongation factor EF-1 alpha as a housekeeping gene. Results are shown as the mean with standard deviation of three technical replicates.
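One common way to combine a delta Ct calculation with per-primer-pair efficiencies is sketched below. The exact formula used in the study is not given, so this form, the function names, and the Ct values in the usage example are assumptions; with both efficiencies equal to 2 it reduces to the familiar 2^(−ΔCt).

```python
import statistics

def relative_expression(ct_target, ct_ref, eff_target=2.0, eff_ref=2.0):
    """Efficiency-corrected expression of a target gene relative to a reference.

    eff_* is the amplification efficiency expressed as fold increase per
    cycle (2.0 means perfect doubling).
    """
    return (eff_ref ** ct_ref) / (eff_target ** ct_target)

def summarize(ct_targets, ct_refs, eff_target=2.0, eff_ref=2.0):
    """Mean and standard deviation over technical replicates."""
    values = [relative_expression(t, r, eff_target, eff_ref)
              for t, r in zip(ct_targets, ct_refs)]
    return statistics.mean(values), statistics.stdev(values)

if __name__ == "__main__":
    # Invented Ct values for three technical replicates.
    mean, sd = summarize([25.1, 24.9, 25.0], [20.0, 20.1, 19.9])
    print(f"relative expression = {mean:.4f} +/- {sd:.4f}")
```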

Cytotoxicity Assays

Human colorectal carcinoma (HT-29), human neuroblastoma (SHSY5Y), and breast cancer (MDAMB231) cell lines were used for the cytotoxicity assays. HT-29 was maintained in RPMI-1640 and MDAMB231 was maintained in Dulbecco’s Modified Eagle’s Medium (DMEM): The culture medium was supplemented with 10% (v/v) fetal bovine serum (FBS), penicillin (100 U/ml), streptomycin (100 µg/ml), and 2 mM L-glutamine. SHSY5Y was cultured in DMEM medium supplemented with penicillin (100 U/ml), streptomycin (100 µg/ml), 2 mM L-glutamine, and 10% heat-inactivated FBS. Cells were grown at 37 °C in a 95% air and 5% CO2 humidified incubator. HT-29, SHSY5Y, and MDAMB231 were harvested by trypsinization and plated into 96-well culture plates at a density of approximately 1.5 × 10⁴ cells per well. After 24 h of incubation, different concentrations of MgCRP-I 7 and 9 (10, 1, 0.1, and 0.01 µM) dissolved in culture medium were added to each well. Then, the samples were incubated for 24 h at 37 °C in the humidified atmosphere (5% CO2). The colorimetric 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT) assay was performed to assess the metabolic activity of cells treated as described above. Aliquots of 20 µl stock MTT (5 mg/ml) were added to each well, and cells were then incubated for 4 h at 37 °C. Cells were lysed with 0.04 N HCl in isopropanol. Absorbance was measured at 540 and 630 nm using a microplate reader (Automated Microplate Reader EL311; BIOTEK Instruments, VT). All measurements were done in six technical replicates, and three independent experiments were carried out.
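The text does not spell out how the two absorbance readings were combined. A common convention, assumed here, is to subtract the 630 nm reading as a background reference from the 540 nm reading and to express treated wells as a percentage of untreated controls. The function names and absorbance values below are hypothetical.

```python
def corrected_absorbance(a540, a630):
    """Background-corrected signal (assumes 630 nm is the reference wavelength)."""
    return a540 - a630

def percent_viability(treated, control):
    """treated, control: lists of (A540, A630) pairs, one pair per replicate well."""
    t = [corrected_absorbance(*well) for well in treated]
    c = [corrected_absorbance(*well) for well in control]
    return 100.0 * (sum(t) / len(t)) / (sum(c) / len(c))

if __name__ == "__main__":
    # Invented readings for six technical replicates per condition.
    treated = [(0.82, 0.10), (0.80, 0.09), (0.85, 0.11),
               (0.79, 0.10), (0.83, 0.10), (0.81, 0.09)]
    control = [(0.84, 0.10), (0.86, 0.11), (0.83, 0.09),
               (0.85, 0.10), (0.82, 0.10), (0.84, 0.10)]
    print(f"{percent_viability(treated, control):.1f}% of control")
```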

Insecticidal Test

The synthetic peptides MgCRP-I 7 and 9 were dissolved in Phosphate Buffered Saline (PBS) and injected with a sterile syringe into Zophobas morio larvae (∼50 mm long, weighing ∼500 mg). The control group (n = 10) was injected with a volume of 50 µl PBS. Two experimental groups of larvae for each peptide (n = 10) were injected with 30 and 300 µg peptide/kg body weight, respectively, for an injection volume of 50 µl. Larvae were monitored for signs of neurotoxicity for 48 h, including lack of movement, twitching, and death. During the experimental time course, larvae were not fed and kept at room temperature. The median lethal dose (LD50) for the two mussel peptides was calculated according to Tedford et al. (2001).
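For orientation, the doses can be converted into absolute amounts per animal. For a larva of about 500 mg, 30 and 300 µg/kg correspond to roughly 15 and 150 ng of peptide, that is, about 0.3 and 3 ng/µl in a 50 µl injection. The helper names below are hypothetical and the body mass is the approximate value given in the text.

```python
def dose_per_larva_ng(dose_ug_per_kg, body_mass_mg):
    """Absolute peptide amount (ng) for one larva of the given mass."""
    body_mass_kg = body_mass_mg / 1e6
    return dose_ug_per_kg * body_mass_kg * 1000.0   # µg -> ng

def injection_conc_ng_per_ul(dose_ug_per_kg, body_mass_mg, volume_ul=50.0):
    """Peptide concentration of the injected solution (ng/µl)."""
    return dose_per_larva_ng(dose_ug_per_kg, body_mass_mg) / volume_ul

if __name__ == "__main__":
    for dose in (30, 300):
        amount = dose_per_larva_ng(dose, 500)
        conc = injection_conc_ng_per_ul(dose, 500)
        print(f"{dose} µg/kg: {amount:.0f} ng per larva, {conc:.1f} ng/µl")
```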

Bacterial/Fungal Strains and Minimum Inhibitory Concentration (MIC) Assay

The growth inhibitory effect of MgCRP-I 7 and 9 was tested on Escherichia coli ATCC 25922, Staphylococcus aureus ATCC 25923, two strains of Candida albicans (ATCC 90029 and a clinical isolate), four strains of Cryptococcus neoformans (ATCC 90112, ATCC 52816, ATCC 52817, and a clinical isolate), and two strains of filamentous fungi (a clinical isolate of Aspergillus fumigatus and Aspergillus brasiliensis ATCC 16404). The bacterial inoculum was incubated overnight in Mueller–Hinton Broth (MHB, Difco) at 37 °C with shaking. For the assays, the overnight bacterial cultures were diluted 1:30 in fresh MHB and incubated at 37 °C with shaking for approximately 2 h to obtain a midlogarithmic phase bacterial culture. Fungi were grown on Sabouraud agar (Difco) plates at 30 °C for 48 h. Fungal suspensions were prepared by picking and suspending five colonies in 5 ml of sterile PBS. The turbidity of the bacterial or fungal suspensions was measured at 600 nm and was adjusted to obtain the appropriate inoculum according to previously derived curves relating the number of colony forming units with absorbance. Filamentous fungi were grown on Sabouraud agar slants at 30 °C for 7 days. The fungal colonies were then covered with 3 ml of PBS and gently scraped with a sterile pipette. The resulting suspensions were transferred to sterile tubes, and heavy particles were allowed to settle. The turbidity of the conidial spore suspensions was measured at 600 nm and was adjusted in Sabouraud broth to obtain an appropriate inoculum. The antimicrobial activity was evaluated by the broth microdilution susceptibility assay performed according to the guidelines of the Clinical and Laboratory Standards Institute and as previously described (Benincasa et al. 2004, 2010). Briefly, 2-fold serial dilutions of MgCRP-I 7 and MgCRP-I 9 were prepared in 96-well microplates in the appropriate medium, to a final volume of 50 μl. Fifty microliters of bacterial suspension in MHB, or fungal suspension in Sabouraud, was added to each well to a final concentration of 1–5 × 10⁵ cells/ml for bacteria, and 5 × 10⁴ cells/ml for fungi. Bacterial and fungal samples were then incubated at 37 °C for 24 h or 30 °C for 48 h, respectively. The MIC (Minimum Inhibitory Concentration) was taken as the lowest concentration of peptide resulting in the complete inhibition of visible growth after incubation. All tests were performed in triplicate.
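The dilution scheme and the MIC readout can be sketched as follows. The starting concentration and the growth pattern are invented for illustration, and the function names are hypothetical. Note that adding 50 µl of inoculum to 50 µl of peptide solution halves the peptide concentration in each well.

```python
def twofold_series(top_conc, n_wells):
    """Concentrations of a 2-fold serial dilution, highest first."""
    return [top_conc / (2 ** i) for i in range(n_wells)]

def mic(concentrations, visible_growth):
    """Lowest concentration with no visible growth, or None if every well grew.

    visible_growth[i] is True when growth was observed at concentrations[i].
    """
    inhibitory = [c for c, grew in zip(concentrations, visible_growth) if not grew]
    return min(inhibitory) if inhibitory else None

if __name__ == "__main__":
    prepared = twofold_series(128.0, 6)       # hypothetical series before inoculation
    final = [c / 2 for c in prepared]         # 50 µl peptide + 50 µl inoculum
    growth = [False, False, True, True, True, True]  # invented readout
    print(final, "MIC =", mic(final, growth))
```

A readout in which every well shows growth returns `None`, which corresponds to reporting the MIC as greater than the highest concentration tested.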

Results and Discussion

MgCRP-I Sequence Features

Overall, we identified 67 different MgCRP-I sequences (table 2). In more detail, 48 sequences could be identified in over 2 million genomic contig sequences, which provide a preliminary view of the mussel genome (Nguyen et al. 2014). Twenty-three of them were also detected as expressed transcripts in publicly available RNA-seq data. In addition, 19 expressed sequences with no match in the genomic contigs were identified, likely corresponding to genes located in genomic regions which are still not covered by the assembly. Overall, 41 sequences can be considered full length, as the entire CDS from the initial ATG to the STOP codon was represented. The remaining partial sequences (table 2) lacked either the 5′- or the 3′-end, due to truncated genomic contigs or low read coverage from RNA-seq data. In addition, BLAST (Basic Local Alignment Search Tool) and HMMER approaches revealed at least six pseudogenes with an ORF interrupted by nonsense or frameshift mutations and which lacked expression in RNA-seq experiments (supplementary table S5, Supplementary Material online).

Table 2

List of the MgCRP-I Sequences Identified in This Work

Sequence Name	Status^a	Evidence	Genomic Scaffold	Cysteine-Rich Domains
MytiCRP-I 1	Complete	T	/	1
MytiCRP-I 2	Complete	G, T	APJB011836849.1	1
MytiCRP-I 3	Complete	T	/	1
MytiCRP-I 4	Complete	G, T	APJB011405634.1	1
MytiCRP-I 5	Complete	G, T	APJB010137175.1	1
MytiCRP-I 6	Complete	T	/	1
MytiCRP-I 7	Complete	T	/	1
MytiCRP-I 8	Complete	G, T	APJB0118677191.1	1
MytiCRP-I 9	Complete	G, T	APJB010390130.1	1
MytiCRP-I 10	Complete	T	/	1
MytiCRP-I 11	Incomplete	T	/	1
MytiCRP-I 12	Complete	T	/	None
MytiCRP-I 13	Complete	G, T	APJB010539225.1	1
MytiCRP-I 14	Complete	G, T	APJB011981014.1	1
MytiCRP-I 15	Complete	G, T	APJB012303283.1	1
MytiCRP-I 16	Complete	T	/	1
MytiCRP-I 17	Complete	T	/	1
MytiCRP-I 18	Complete	G, T	APJB010096024.1	1
MytiCRP-I 19	Complete	T	/	1
MytiCRP-I 20	Incomplete	T	/	1
MytiCRP-I 21	Complete	T	/	1
MytiCRP-I 22	Complete	G, T	APJB010149223.1	1
MytiCRP-I 23	Complete	T	/	None
MytiCRP-I 24	Complete	G, T	APJB011420996.1	1
MytiCRP-I 25	Complete	G, T	APJB011525896.1	1
MytiCRP-I 26	Complete	G, T	APJB011451595.1	1
MytiCRP-I 27	Incomplete	G	APJB010022937.1	1
MytiCRP-I 28	Complete	G	APJB010019019.1	1
MytiCRP-I 29	Incomplete	G, T	APJB010215939.1	1
MytiCRP-I 30	Incomplete	G	APJB010309773.1	1
MytiCRP-I 31	Complete	G, T	APJB010337167.1	1
MytiCRP-I 32	Incomplete	G	APJB010405325.1	1
MytiCRP-I 33	Incomplete	G	APJB010538560.1	1
MytiCRP-I 34	Complete	G, T	APJB010602145.1	1
MytiCRP-I 35	Incomplete	G	APJB010726714.1	1
MytiCRP-I 36	Complete	G, T	APJB010858750.1	1
MytiCRP-I 37	Incomplete	G	APJB011013544.1	1
MytiCRP-I 38	Incomplete	G, T	APJB011377302.1	1
MytiCRP-I 39	Incomplete	G, T	APJB011417411.1	1
MytiCRP-I 40	Complete	G	APJB011602152.1	1
MytiCRP-I 41	Incomplete	G	APJB01171896.1	1
MytiCRP-I 42	Incomplete	G	APJB011833940.1	1
MytiCRP-I 43	Incomplete	G	APJB011892489.1	1
MytiCRP-I 44	Incomplete	G	APJB011902451.1	1
MytiCRP-I 45	Complete	G, T	APJB012001676.1	1
MytiCRP-I 46	Incomplete	G	APJB012002994.1	1
MytiCRP-I 47	Incomplete	G	APJB012084462.1	1
MytiCRP-I 48	Incomplete	G	APJB011591868.1	1
MytiCRP-I 49	Incomplete	G	APJB011815456.1	1
MytiCRP-I 50	Complete	T	/	1
MytiCRP-I 51	Complete	T	/	1
MultiMytiCRP-I 1	Complete	T	/	3
MultiMytiCRP-I 2	Complete	G, T	APJB011508508.1	4
MultiMytiCRP-I 3	Complete	G, T	APJB012209485.1	2
MultiMytiCRP-I 4	Incomplete	T	/	2
MultiMytiCRP-I 5	Complete	T	/	2
MultiMytiCRP-I 6	Incomplete	G	APJB010388843.1	2 or more
MultiMytiCRP-I 7	Incomplete	G	APJB010167718.1	2 or more
MultiMytiCRP-I 8	Incomplete	G	APJB010303175.1	3 or more
MultiMytiCRP-I 9	Incomplete	G	APJB010305277.1	4
MultiMytiCRP-I 10	Incomplete	G	APJB010449694.1	4
MultiMytiCRP-I 11	Complete	G	APJB010750334.1	2
MultiMytiCRP-I 12	Complete	G	APJB011083975.1	2
MultiMytiCRP-I 13	Complete	G	APJB011153262.1	2
MultiMytiCRP-I 14	Complete	G, T	APJB011903515.1	4
MultiMytiCRP-I 15	Incomplete	G	APJB011965594.1	2
MultiMytiCRP-I 16	Complete	T	/	2

Note.—T, transcriptome; G, genome.

^a Complete sequence corresponds to a full-length coding sequence, from the initial ATG to the STOP codon.

The MgCRP-I peptides are secreted pro-peptides characterized by two features: A conserved pre-pro region and the presence of at least one conserved cysteine array C-C-CC-C-C. In detail, all the members of this family display an unambiguous N-terminal signal peptide cleavage site, followed by a pro-region of approximately 15 residues ending with a highly conserved dibasic cleavage site for proprotein convertases (KR or, more rarely, RR). Although the signal peptide and pro-peptide regions show low sequence variability, the C-terminal region of MgCRP-I corresponding to the putative mature peptide appears hypervariable (fig. 1). The six invariant cysteine residues involved in the formation of intramolecular disulfide bridges are embedded within this highly variable region. The two central cysteine residues (Cys3 and Cys4) are directly linked by a peptide bond. As a result, the consensus of these peptides can be defined as C(X3–6)C(X1–7)CC(X3–4)C(X3–5)C (figs. 1 and 2). We also detected a limited number of protein-coding genes sharing significant sequence similarity with MgCRP-I peptides but which lacked the expected cysteine array (these sequences will be described in detail in the sections below); these, together with noncoding pseudogenes, should be defined as MgCRP-I-like sequences.
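The family consensus and the dibasic cleavage motif can be written as regular expressions. This is only an illustrative sketch: the function names are hypothetical, X is assumed to stand for any residue other than cysteine, and the toy sequences are invented rather than taken from table 2.

```python
import re

# C(X3-6)C(X1-7)CC(X3-4)C(X3-5)C, with X assumed to be any non-cysteine residue.
CONSENSUS = re.compile(r"C[^C]{3,6}C[^C]{1,7}CC[^C]{3,4}C[^C]{3,5}C")
DIBASIC = re.compile(r"KR|RR")

def match_consensus(seq):
    """Return (start, end) of the first consensus match in seq, or None."""
    m = CONSENSUS.search(seq)
    return m.span() if m else None

def dibasic_sites(seq):
    """Start positions of non-overlapping KR or RR motifs in seq."""
    return [m.start() for m in DIBASIC.finditer(seq)]

if __name__ == "__main__":
    toy_pro_region = "DEAEMEDLNAEKR"          # invented, ends with a KR site
    toy_mature = "GCAAACAACCAAACAAACG"        # invented, fits the consensus
    print(dibasic_sites(toy_pro_region), match_consensus(toy_mature))
```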

Exon/intron structure of the complete coding regions of the MgCRP-I 13, 14, 28, 45 and multi-MgCRP-I 2 genes (A) and corresponding organization of the encoded peptide precursors (B). The positions of the signal peptide, pro-region and mature peptide regions are highlighted, and each cysteine-rich module is marked by a box.

Sequence variability of MgCRP-I sequences; variability index W is plotted in the upper panel, whereas the sequence consensus, obtained with Weblogo (http://weblogo.berkeley.edu, last accessed July 24, 2015), is shown in the lower panel. Only positions covered by at least 50% of the sequences in the global alignment of MgCRP-I peptides are shown. Sites under positive selection are indicated by an asterisk.

Most MgCRP-I peptides present a short C-terminal extension and, after the sixth cysteine, they often display dibasic amino acid motifs which might be the target of posttranslational cleavage by carboxypeptidase E, one of the most common modifications observed in neurotoxic peptides from invertebrates such as scorpion venoms (Xiong et al. 1997) and conotoxins (Fan et al. 2003; Wang et al. 2003). Secreted CRPs are among the molecules undergoing the largest number of distinct posttranslational modifications, as demonstrated by the case of conotoxins (Craig et al. 1999; Bergeron et al. 2013), and the case of MgCRP-I peptides could be similar. Given the difficulty of obtaining purified peptides from mussel tissues due to their low expression levels, and in the absence of proteomic studies, we had to rely on in silico prediction to identify the most likely modification sites. Based on the predicted proteolytic cleavages, and with a few exceptions, the virtual MgCRP-I mature peptides are 25–38 amino acids long, with an estimated molecular weight of 2.5–4 kDa.
Almost invariably, the mature peptides have a basic isoelectric point (mostly between 8 and 9.5), indicative of a positive net charge at physiological pH, which might be balanced by the presence of conserved negatively charged residues in the pro-region (fig. 1); this feature might be important for the biological activity of MgCRP-I peptides, as it is maintained in all sequences despite their remarkable sequence variability. In addition to the standard pre-pro-peptide organization described above, several peculiar transcripts, named “multi-MgCRP-I” and encoding precursors characterized by multiple cysteine-rich modules, were also identified (table 2). The modules, ranging from 2 to 4 in number, are structurally close to each other, as each Cys6 of the N-terminal module is separated by just 2–4 residues from the Cys1 of the following one. Cysteine-rich domains of multi-MgCRP-I do not show any peculiarity compared with those of regular, mono-domain peptides (see supplementary fig. S3, Supplementary Material online), and the different domains within the same sequence are likely derived from the duplication of a single original module. Although the functional significance of the multi-MgCRP-I peptides is still unknown, the maintenance of the cysteine pattern and isoelectric points within each single domain (data not shown) suggests that these long precursors might be posttranslationally cleaved into smaller functional peptides. In this case, such a process would be an interesting strategy to achieve the coexpression of different variants, in a similar fashion to other invertebrate AMPs characterized by several sequential tandem repeats of conserved motifs (Casteels-Josson et al. 1993; Destoumieux-Garzón et al. 2009; Rayaprolu et al. 2010; Ratzka et al. 2012).
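The modular layout of the multi-MgCRP-I precursors (2 to 4 modules, with Cys6 of one module separated by 2–4 residues from Cys1 of the next) can be illustrated by counting non-overlapping matches of the cysteine array and measuring the spacers between them. This is only a sketch: the regular expression assumes that X excludes cysteine, and the two-module precursor below is an invented toy sequence, not one of the sequences in table 2.

```python
import re

# Cysteine array C(X3-6)C(X1-7)CC(X3-4)C(X3-5)C, with X assumed non-cysteine.
MODULE = re.compile(r"C[^C]{3,6}C[^C]{1,7}CC[^C]{3,4}C[^C]{3,5}C")


def module_linkers(precursor):
    """Return (number of cysteine-rich modules, list of linker lengths).

    A linker length is the number of residues between Cys6 of one module
    and Cys1 of the following module.
    """
    modules = list(MODULE.finditer(precursor.upper()))
    linkers = [following.start() - current.end()
               for current, following in zip(modules, modules[1:])]
    return len(modules), linkers


# Invented two-module toy precursor (hypothetical): module, 3-residue linker, module.
toy_precursor = "AKCASGRCNPWKGCCSRYTCKLGHCGRSCPTDKCQAYGCCNKWSCRLNGCK"
print(module_linkers(toy_precursor))  # (2, [3])
```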

Structure of MgCRP-I Genes

Despite the low average size of the assembled genomic contigs of M. galloprovincialis, we could identify 11 MgCRP-I gene regions corresponding to a full-length coding sequence, from the initial ATG to the STOP codon (MgCRP-I 13, 14, 18, 24, 28, 40, 45, multi-MgCRP-I 2, 11, 12, and 13), and several partial matches, either to the 3′ or to the 5′ region of the CDS (table 2). The structure of MgCRP-I genes is conserved, with four exons and three introns (fig. 2). The first exon, which could only be annotated in five sequences thanks to the alignment with RNA-seq data, includes part of the 5′-UTR region. In most cases, the second exon encodes the first 17 amino acids of the precursor protein, thus comprising most of the signal peptide. A phase-1 intron separates the second and the third exon. The third exon is approximately 100 nt long and covers the signal peptide cleavage site and most of the propeptide region. The ORF is interrupted by a phase-2 intron, which separates the third and the fourth exons. The last exon invariably comprises the final ten nucleotides of the pro-region, including the highly conserved dibasic precursor cleavage site, the complete cysteine-rich C-terminal region, and the entire 3′-UTR region. Although the modular organization of the multi-MgCRP-I precursor proteins shows striking similarities to other invertebrate AMP-related cysteine-rich proteins (i.e., the Lepidoptera X-Tox family), their genomic organization is remarkably different: indeed, although the defensin-like motifs in X-Tox are encoded by separate exons (d’Alençon et al. 2013), all cysteine-rich motifs in multi-MgCRP-I precursors are encoded within a single exon. A schematic representation of the structural organization of MgCRP-I genes and encoded precursor proteins is shown in figure 2.
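The intron phases quoted above follow from the number of coding nucleotides that precede each intron: phase 0 falls between codons, phase 1 after the first nucleotide of a codon, and phase 2 after the second. The sketch below makes this arithmetic explicit. The coding lengths are illustrative assumptions chosen to be compatible with the description (17 codons plus one nucleotide in the second exon, exactly 100 coding nucleotides in the third, an arbitrary length for the fourth); they are not measured values from any specific MgCRP-I gene.

```python
def intron_phases(coding_lengths):
    """Phase of the intron that follows each exon except the last.

    coding_lengths lists the number of coding nucleotides carried by each
    coding exon, in 5'-to-3' order. The phase is the cumulative coding
    length modulo 3.
    """
    phases = []
    cumulative = 0
    for length in coding_lengths[:-1]:
        cumulative += length
        phases.append(cumulative % 3)
    return phases


# Hypothetical coding lengths (nt) for exons 2, 3, and 4; exon 1 is assumed
# to carry only 5'-UTR sequence and is therefore omitted.
exon2 = 17 * 3 + 1   # 17 complete codons plus the first nucleotide of codon 18
exon3 = 100          # "approximately 100 nt" in the text; exact value assumed
exon4 = 118          # arbitrary, brings the CDS to a multiple of 3

print(intron_phases([exon2, exon3, exon4]))  # [1, 2]
```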

Gene Duplication and Positive Selection Are Driving the Evolution of the MgCRP-I Gene Family

The combined genomic/transcriptomic analyses indicate the presence of at least 67 different potentially functional CRP-I loci in the genome of the Mediterranean mussel. Owing to the preliminary nature of the released mussel genome (Nguyen et al. 2014), this has to be considered a conservative estimate. The phylogenetic analysis of the CRP-I signal peptide regions (fig. 3) revealed several highly similar paralogous genes, highlighting the important role of gene duplication events in the evolution of the MgCRP-I gene family; in some cases these events appear to have occurred very recently (supplementary fig. S4, Supplementary Material online). A number of MgCRP-I pseudogenes with frameshift or missense mutations were identified in the mussel genome (supplementary table S5, Supplementary Material online) and, at the same time, the frequent sequence truncations caused by the small size of the genomic contigs make it impossible to infer how many of the incomplete MgCRP-I loci are fully functional (see table 2). For the same reason, the presence of common regulatory regions and transposable elements which could have driven the expansion of this gene family will be a matter for future investigation.

Maximum-likelihood tree obtained with the MgCRP-I peptides based on the alignment of the signal peptide region only. Only bootstrap values greater than 75 are shown. Some sequences were not considered in this analysis as their N-terminal region was incomplete (see table 2). Arrows indicate two MgCRP-I-like peptides with a disrupted cysteine array (MgCRP-I 12 and 23), marking an unconventional mature region.

However, gene duplication is not sufficient by itself to explain some peculiar features of MgCRP-I genes: indeed, the amino acid diversity of the peptide precursors is strikingly higher within the mature peptide region compared with the signal peptide and pro-region which are, in turn, highly conserved (fig. 1). This observation suggests an increased rate of mutations within the fourth exon, and an accelerated evolutionary rate. The LRTs we performed to assess this hypothesis strongly support positive selection of the MgCRP-I genes (P = 1.717 × 10⁻⁸). In fact, we could identify nine positively selected sites (PP > 0.95), all located within the mature peptide region, after the cleavage site of the pro-peptide (fig. 1). Five of the six invariable cysteines engaged in disulfide bridges, buried within this hypervariable and positively selected region, undergo site-specific codon preservation (fig. 4). This peculiar phenomenon, which was also observed for the arginine responsible for the pro-peptide cleavage, is likely driven by the unique properties of these residues for the maintenance of the tridimensional structure and an efficient biosynthesis and folding of the mature peptide, as suggested for many other protein families, including conotoxins (Conticello et al. 2001; Steiner et al. 2013).
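For readers unfamiliar with likelihood ratio tests (LRTs), the sketch below shows how a P value of this kind is typically obtained from the log-likelihoods of two nested codon models. It rests on assumptions not stated in this section: that the statistic 2ΔlnL is compared with a chi-square distribution with two degrees of freedom (as in common site-model comparisons), for which the survival function has the closed form exp(−x/2). The log-likelihood values are hypothetical and are not those of the present analysis.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alternative):
    """P value of a likelihood ratio test, assuming a chi-square null
    distribution with 2 degrees of freedom (survival function exp(-x/2))."""
    statistic = 2.0 * (lnl_alternative - lnl_null)
    if statistic <= 0.0:
        return 1.0
    return math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods: null model (no positive selection) versus
# alternative model (a class of sites with omega > 1).
p_value = lrt_pvalue_df2(lnl_null=-1010.0, lnl_alternative=-1000.0)
print(f"2*dlnL = 20.0, P = {p_value:.3e}")
```

Under the same two-degree-of-freedom assumption, a P value on the order of 10⁻⁸ would correspond to a test statistic of roughly 36.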

Codon usage for the Arg residue responsible for the pro-peptide cleavage site and for the six cysteine residues engaged in disulfide bridges, calculated on Mytilus galloprovincialis MgCRP-I peptides. The probabilities of finding the observed codon biases were calculated assuming a binomial distribution and the codon usage (a priori probabilities) inferred from the transcriptome published by Gerdol et al. (2014) (75.1–24.9% for TGT-TGC, encoding Cys, 49.3–16.3–13.3–12.5–4.7–3.8% for AGA-AGG-CGA-CGT-CGG-CGC, encoding Arg). Significant (P < 0.05) and highly significant (P < 0.01) deviations from the expected distributions are marked by * and **, respectively. NS, not significant.

Although structurally important codons appear to be somehow protected from variation, the high selective pressure acting on the fourth exon (encoding the entire mature peptide) of MgCRP-I genes in some cases introduced mutations which resulted in the loss of cysteine residues (supplementary fig. S5, Supplementary Material online). MgCRP-I 12 and MgCRP-I 23 represent two instructive examples, as the deduced proteins maintain the highly conserved signal peptide and pro-peptide regions, features clearly identifying them as CRP-I-related sequences in a phylogenetic analysis, although they lack the canonical cysteine array. Indeed, MgCRP-I 12 lacks four of the six conserved cysteine residues and just the two adjacent residues Cys3 and Cys4 are retained; MgCRP-I 23 is an even more extreme case, as it is completely devoid of cysteines. Altogether, we propose to identify cases such as MgCRP-I 12 and 23 and the six noncoding pseudogenes we identified as MgCRP-I-like sequences, as they lack one of the two main distinctive features of the MgCRP-I family (a conserved signal peptide and the C(X3–6)C(X1–7)CC(X3–4)C(X3–5)C array), but they retain significant similarity with known MgCRP-I sequences (detectable by HMMER or BLASTn with an e-value threshold of 1 × 10⁻⁵). These criteria will be important to identify further MgCRP-I-related loci once the mussel genome is fully released.
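The codon preservation test described in the caption of figure 4 compares observed codon counts at a given position with the a priori codon usage under a binomial model. The sketch below illustrates the calculation for one cysteine position using the background TGT frequency of 75.1% given in the caption. The observed counts (60 TGT codons out of 67 sequences) are invented for illustration, and the choice of a one-sided upper-tail probability is an assumption, since the sidedness of the test is not specified here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# A priori usage of the TGT codon for Cys (from the figure 4 caption).
P_TGT = 0.751

# Hypothetical observation: 60 of 67 sequences use TGT at one cysteine position.
observed_tgt, n_sequences = 60, 67
p_value = upper_tail(observed_tgt, n_sequences, P_TGT)
print(f"P(X >= {observed_tgt}) = {p_value:.4f}")
```

With these invented counts the deviation from the background usage would be flagged as significant; positions whose counts stay close to the 75.1/24.9 split would be reported as NS.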

CRP-I Sequences Are Only Found in the Order Mytiloida

Following BLAST searches, the MgCRP-I peptides did not show significant sequence similarity with any other sequence deposited in public databases and the prediction of their tridimensional structure was considered unreliable by Phyre 2, due to the absence of models sharing sufficient sequence affinity within the PDB database. For this reason, we investigated the presence of MgCRP-I-like sequences in genomic and transcriptomic data sets, which are increasingly available also for bivalve mollusks (Suárez-Ulloa, Fernández-Tajes, Manfrin, et al. 2013). Looking for short secreted peptides with a C-C-CC-C-C motif in the transcriptomes of 71 different species (see Materials and Methods and supplementary table S1, Supplementary Material online), we could find this signature only in the mussels M. edulis, M. trossulus, and M. californianus. The first two species and M. galloprovincialis are widespread and genetically close to each other, as evidenced by the presence of natural hybrid populations (Beaumont et al. 2004), whereas M. californianus is distributed along the Pacific coast of North America and is more distantly related to the other mussel ecotypes (Hilbish et al. 2000). The full-length CRP-I sequences identified, here named McCRP-I, MeCRP-I, or MtCRP-I on the basis of the species name, are reported in supplementary material, Supplementary Material online. No CRP-I-like sequence was detected in any of the other bivalve transcriptomes, which implies either their complete absence or no detectable expression in the analyzed bivalve species. Due to the high depth of NGS technologies, the latter hypothesis is unlikely, and the genomic analysis of the oysters C. gigas and P. fucata strongly supports the absence of CRP-I-like genes, thus ruling out the possibility of a missed detection of poorly expressed transcripts.
In addition, no evidence of CRP-I-like peptides was found in Limnoperna fortunei, Perna viridis, and Bathymodiolus platifrons, the only three species among the over 50 different genera of the order Mytiloida, besides Mytilus spp., which have been subjected to Illumina RNA-seq so far. Based on the available data, CRP-Is are certainly present in Mytilus spp. and appear to be absent in other mytiloids. As the C-C-CC-C-C array is not common in bivalves and nothing similar was found in the fully sequenced genomes of C. gigas (order Ostreoida) and P. fucata (order Pterioida), the CRP-I-like sequences appear to have a narrow taxonomical distribution, comparable to that of other mussel CRPs (i.e., mytilins, myticins, and mytimycins, which cannot be found outside Mytiloida), and can therefore be considered a taxonomically restricted gene family (Khalturin et al. 2009).

Occurrence of the C-C-CC-C-C Array in Protein Databases and Relationship with CRP-I Peptides

Large-scale bioinformatic analyses revealed the presence of a CRP-I-like cysteine pattern mostly in animals and, among them, almost exclusively in invertebrates (see table 3). More in detail, within the UniProtKB/Swiss-Prot database we found 452 peptides (279 if considering nonredundant peptides based on a 95% sequence identity criterion) mostly belonging to invertebrate animals: Cone snails, turrids and terebrids (grouped as Conoidea in fig. 5), spiders, scorpions, pancrustaceans (with a single horseshoe crab and 35 insect sequences), and nematodes. Almost the totality of these peptides displays a neurotoxic activity due to their high affinity to ion channels, like in the case of conotoxins, turritoxins, and teretoxins (Imperial et al. 2003; Terlau and Olivera 2004; Aguilar et al. 2009) and peptides produced in the venom gland of spiders (Zhang et al. 2010) and scorpions (Ma et al. 2009). With the exception of a wasp toxin, all the entries from insects were related to bombyxin, a prothoracicotropic hormone involved in morphogenesis (Nijhout and Grunert 2002).

Table 3

Number of Peptides with an MgCRP-I-Like Cysteine Array Found in the UniProtKB/Swiss-Prot Protein Sequence Database, Listed per Taxonomic Group (Function Is Indicated If Available)

Taxonomic Group	Number of Identified CRPs in UniprotKB^a	Molecular Function
Cone snails	163	Conotoxins
Spiders	223	Venom toxins
Fungi	2	Uncharacterized
Viruses (Baculoviridae)	4	Uncharacterized
Insects	35	Hormones/venom toxins
Green Plants	5	AMPs
Scorpions	8	Venom toxins
Terebrids	2	Teretoxins
Horseshoe crabs	1	AMPs
Nematodes	1	Insulin-like
Vertebrates	6	Unknown/AMPs
Turrids	2	Turritoxins

aNonredundant positive matches based on threshold criteria of a 95% sequence identity.

Maximum-likelihood tree obtained with the signal peptide of MgCRP-I, the orthologous sequences from other mussel species, and all the CRPs mined from UniProtKB/Swiss-Prot (see Materials and Methods). Peptides from pancrustaceans, conoideans, spiders, scorpions, nematodes, chordates, plants, fungi, viruses, and mussels are shown. Mussel CRP-I peptides are highlighted in a gray background.

No CRP-I-like cysteine pattern was found in chordates, with the exception of six peptides (five intestinal trefoil factors, with unclear function, and veswaprin, a snake AMP). A limited number of such peptides were found in fungi, with the positive hits corresponding to uncharacterized proteins, and in green plants, with all cases representing AMPs. The CRP-I-like array was also found in Baculoviruses (in a viral family of conotoxin-like peptides) (Eldridge et al. 1992). The full list of these peptides with their taxonomic origin and reported function is shown in supplementary table S2, Supplementary Material online. Given the great sequence diversity, likely dependent on a fast evolutionary rate of molecular substitutions, the relationships among the MgCRP-I peptides (fig. 3) and with other CRPs (fig. 5) remain unresolved. Nevertheless, several features could be underlined by the phylogenetic analyses. All MgCRP-I sequences and the orthologous sequences from M. edulis, M. trossulus, and M. californianus clustered together in a single clade, highlighting that the conservation of the signal peptide is a relevant criterion for the identification of CRP-I protein precursors.
Most of the other CRP sequences clustered in distinct groups which included peptides from the same taxon (Pancrustacea, Conoidea, Spiders, Scorpions, Nematodes, Chordates, Plants, Fungi, and Viruses), even if: 1) peptides from a single taxon were found in different clusters (e.g., conopeptides typically cluster in different groups that correspond to different superfamilies [Kaas et al. 2010; Puillandre et al. 2012]); and 2) some peptides, characterized by a long branch in the tree, clustered in a group mostly composed of peptides from another taxon (as an example, two conopeptides clustered within the group of mussel CRP-I peptides, with long branches, suggesting that they may belong to completely different structural classes). Overall, no distinct traits can unequivocally link the evolutionary history of CRP-I peptides with those of other protein families characterized by the same cysteine array. As CRP-I-like peptides are absent in bivalves other than mussels, the only other molluscan group where the C-C-CC-C-C array is present is represented by Conoidea (Gastropoda). Nevertheless, the absence of this molecular motif in the fully sequenced genomes of other gastropods (Aplysia californica, Lottia gigantea, and Biomphalaria glabrata) suggests that it might have been acquired independently in these two molluscan groups. Even though the large sequence divergence prevents definitive conclusions, the study of the disulfide bond topology in the MgCRP-I synthetic peptides provided further support to this hypothesis, as reported in the next section.

Oxidative Refolding and Disulfide Bond Topology of Synthetic MgCRP-I Peptides

Optimization of the oxidative folding yield for peptides with disulfide bridges is still an empirical exercise. Various parameters affect folding yields, such as temperature, additives, redox couples, peptide concentration, and duration of the folding reaction (Bulaj 2005; Bulaj and Olivera 2008). We synthesized the MgCRP-I 7 and MgCRP-I 9 peptides and their purified fractions were subjected to oxidative folding reactions in the presence of the redox reagents GSH and GSSG (10:1) at 4 °C, buffered at pH 8 (RefoldA; see Materials and Methods). This protocol, with slight modifications, has been commonly used in the refolding of disulfide-rich proteins: for example, it yielded good amounts (∼90%) of the cystine knot peptide Huwentoxin-IV in its native structure (Deng et al. 2013) and was also used in the synthesis of ω-conotoxin MVIIC (∼50%). We also tested a high-salt refolding mixture (RefoldB) which had shown improved yields of the same conotoxin (Kubo et al. 1996). In our hands each peptide refolding mixture (RefoldA) produced one major component, with purity of approximately 85% and approximately 75% for MgCRP-I 7 and MgCRP-I 9, respectively, as determined by RP-HPLC peak area integration. RefoldB gave similar results but with slightly lower yields (data not shown). According to Electrospray Ionisation Mass Spectrometry (ESI-MS), the molecular masses of folded MgCRP-I 7 and folded MgCRP-I 9 were in good agreement with those of the fully oxidized products (see supplementary figs. S3 and S4, Supplementary Material online): 3,160.0 (calculated monoisotopic mass [calc. mono.] 3,160.2) and 3,090.1 (calc. mono. 3,090.2), respectively.
The evidence of one dominant product in the refolding of both peptides is consistent with the assumption of native conformation but, on the other hand, no comparison is currently possible between the synthetic products and the native counterparts possibly present in Mytilus, in particular due to the very low expression of MgCRP-I gene products in all tissues in physiological conditions (see section below). We performed a disulfide connectivity prediction using the DiANNA (DiAminoacid Neural Network Application) Web Server of Boston College (Ferrè and Clote 2005); the algorithm predicted a 1-2, 3-4, 5-6 topology for MgCRP-I 9 and a 1-4, 2-6, 3-5 topology for MgCRP-I 7 but with a very high score (0.76 on a maximum of 1) in favor of a Cys14–Cys15 disulfide. At this point, we experimentally determined the disulfide bond geometry using enzymatic fragmentation (trypsin and chymotrypsin) and LC/MS/MS. The analysis of both peptides revealed that the cysteine connectivity follows the nearest-neighbor pattern (1-2, 3-4, 5-6), namely Cys3–Cys8, Cys14–Cys15, and Cys20–Cys25 (see supplementary tables S3 and S4, Supplementary Material online). In a published detailed disulfide classification based on SwissProt and PFam databases, the topology 1-2, 3-4, 5-6 is largely represented and contains a very heterogeneous ensemble of protein families (Gupta et al. 2004); notably, the vicinal disulfide bond present in our MgCRP-I peptides and formed between the side chains of adjacent cysteines (Cys14–Cys15) represents a rare structural element. The vicinal disulfide, due to its intrinsically constrained nature, is usually described as being accompanied by the formation of a tight turn of the protein backbone (Carugo et al. 2003); additionally, the oxidized and reduced states of this bond present very different structural features, suggesting a possible role as a conformational switch (Carugo et al. 2003).
At the present time, we do not know the significance that this vicinal bond could have on the activity of the MgCRP-I peptides but the observed 1-2, 3-4, 5-6 topology is distinctively different from the 1-4, 2-5, 3-6 topology common to knottins, which comprise conopeptides and most of the other peptides represented in figure 5 with an experimentally determined tridimensional structure (Hartig et al. 2005). Further studies will be aimed in the future at the purification of native peptides to confirm the experimental results obtained concerning the folding of MgCRP-I 7 and 9 synthetic peptides.
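The difference between the nearest-neighbor topology determined here and the knottin topology can be made concrete by mapping each topology, written as cysteine ranks, onto the residue numbers reported above (Cys3, Cys8, Cys14, Cys15, Cys20, Cys25). The sketch below is purely illustrative; the function names are hypothetical and do not correspond to any tool used in this study.

```python
def disulfide_pairs(cys_positions, topology):
    """Translate a topology given as 1-based cysteine ranks into residue numbers."""
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in topology]


def vicinal_bonds(pairs):
    """Return the disulfides formed between sequence-adjacent residues."""
    return [(i, j) for i, j in pairs if j - i == 1]


# Residue numbers of the six cysteines as reported for MgCRP-I 7 and 9.
CYS = [3, 8, 14, 15, 20, 25]

NEAREST_NEIGHBOR = [(1, 2), (3, 4), (5, 6)]   # topology determined experimentally
KNOTTIN = [(1, 4), (2, 5), (3, 6)]            # topology common to knottins

observed = disulfide_pairs(CYS, NEAREST_NEIGHBOR)
print(observed)                        # [(3, 8), (14, 15), (20, 25)]
print(vicinal_bonds(observed))         # [(14, 15)]
print(disulfide_pairs(CYS, KNOTTIN))   # [(3, 15), (8, 20), (14, 25)]
```

Only the nearest-neighbor arrangement produces a vicinal disulfide; with the same cysteine spacing, a knottin-type arrangement would pair Cys14 and Cys15 with distant partners.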

MgCRP-I Transcript Levels

The number of sequences related to the MgCRP-I family identified in the many mussel transcriptome data sets analyzed was extremely low (table 4) and suggests a very limited basal expression of these genes in different tissues under physiological conditions. More in detail, no evidence of MgCRP-Is was found in Sanger sequencing-based EST (expressed sequence tag) collections, with the exception of a single M. edulis sequence detected in an SSH (suppressive subtractive hybridization) library (digestive gland of mussels exposed to styrene). The number of CRP-I sequences detected in the pyrosequencing-based data sets increased, even though in many cases MgCRP-I transcripts could not be detected. Finally, the analysis of Illumina sequencing-based transcriptomes clearly pointed out the high sequencing depth necessary to detect MgCRP-I messenger RNAs, which can be estimated to cumulatively contribute to less than 0.01% (but often to even less than 0.001%) of the total gene expression in most tissues.

Table 4

Number of CRP-I Sequencing Reads Identified in the Publicly Available Transcriptome Data Sets from Mytilus spp. (Retrieved from NCBI SRA, February 2015)

Database	Reference	Tissue	Sequencing Strategy	Total Number of Sequences	Sequences Related to MgCRP-I	%
Mytilus galloprovincialis	Venier et al. (2009)	Mixed tissues	Sanger	19,617	0	0
Mytilus californianus	NA	Mixed tissues	Sanger	42,354	0	0
Mytilus coruscus	NA	Foot	Sanger	719	0	0
Mytilus galloprovincialis	Craft et al. (2010)	Foot	454	31,227	0	0
Mytilus galloprovincialis	Craft et al. (2010)	Mantle	454	52,057	0	0
Mytilus galloprovincialis	Suárez-Ulloa, Fernández-Tajes, Aguiar-Pulido, et al. (2013)	Digestive gland	454	2,206,478	0	0
Mytilus trossulus	Romiguier et al. (2014)	Mixed tissues	Illumina	∼58 million	142	<0.001
Mytilus galloprovincialis	NA	Hemocytes	Illumina	∼106 million	490	<0.001
Mytilus galloprovincialis	NA	Gills	Illumina	∼52 million	182	<0.001
Mytilus edulis	Philipp et al. (2012)	Hemocytes	454	407,061	2	<0.001
Mytilus edulis	Bassim et al. (2014)	Larvae	Illumina	∼295 million	3,423	0.001
Mytilus californianus	Romiguier et al. (2014)	Mixed tissues	Illumina	∼78 million	644	0.001
Mytilus edulis	Philipp et al. (2012)	Mixed tissues	454	365,626	3	0.001
Mytilus edulis	Freer et al. (2014)	Mantle	454	494,391	8	0.002
Mytilus galloprovincialis	Craft et al. (2010)	Gill	454	58,271	1	0.002
Mytilus galloprovincialis	NA	Gills	Illumina	∼120 million	3,103	0.003
Mytilus edulis	Philipp et al. (2012)	Digestive gland	454	1,112,061	30	0.003
Mytilus galloprovincialis	Gerdol et al. (2014)	Digestive gland	Illumina	∼54 million	3,269	0.006
Mytilus galloprovincialis	NA	Posterior adductor muscle	Illumina	∼103 million	9,429	0.009
Mytilus galloprovincialis	Romiguier et al. (2014)	Mixed tissues	Illumina	∼108 million	11,695	0.011
Mytilus edulis	Philipp et al. (2012)	Inner mantle	454	323,482	46	0.014
Mytilus edulis	Romiguier et al. (2014)	Mixed tissues	Illumina	∼103 million	15,086	0.015
Mytilus edulis	González et al. (2015)	Mantle/foot	Illumina	∼49 million	8,492	0.017
Mytilus edulis	NA	Mixed tissues	Sanger	5,300	1	0.019
Mytilus galloprovincialis	NA	Mantle	Illumina	∼108 million	30,802	0.029
Mytilus galloprovincialis	Craft et al. (2010)	Digestive gland	454	33,992	2	0.059
Mytilus edulis	Philipp et al. (2012)	Mantle rim	454	324,592	299	0.092
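The percentage column of table 4 is the number of CRP-I-related reads divided by the total number of reads in each data set. The sketch below recomputes it for three rows as a worked example; because the Illumina totals are reported only approximately (∼), the corresponding results are approximate as well.

```python
def percent_of_reads(crp_reads, total_reads):
    """Share of CRP-I-related reads in a data set, as a percentage."""
    return 100.0 * crp_reads / total_reads


# (CRP-I-related reads, total reads) taken from table 4.
rows = {
    "M. edulis, mantle rim (454)": (299, 324_592),
    "M. galloprovincialis, mantle (Illumina)": (30_802, 108_000_000),
    "M. edulis, larvae (Illumina)": (3_423, 295_000_000),
}

for label, (hits, total) in rows.items():
    print(f"{label}: {percent_of_reads(hits, total):.3f}%")
# M. edulis, mantle rim (454): 0.092%
# M. galloprovincialis, mantle (Illumina): 0.029%
# M. edulis, larvae (Illumina): 0.001%
```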

To better evaluate the MgCRP-I tissue-specificity, we analyzed the expression levels of 17 MgCRP-I transcripts by qPCR in different tissues (hemolymph, digestive gland, inner mantle, mantle rim, foot, posterior adductor muscle, and gills) of a pool of 30 naïve adult mussels (M. galloprovincialis). We found large variability among the expression profiles of individual CRP-I sequences, with the overall expression levels being almost invariably very low in all tissues. However, three tissues emerged as main sites of MgCRP-I expression, namely the digestive gland, the inner mantle, and the mantle rim. Several MgCRP-I displayed, at least to some extent, a certain degree of tissue specificity (fig. 6). In most cases these genes were not expressed at all in hemolymph, foot, gills, and posterior adductor muscle, indicating that these are not the primary sites of production of MgCRP-I peptides, which is consistent with RNA-seq data (table 4).

Gene expression of 17 selected MgCRP-I genes in seven tissues (HE, hemolymph; DG, digestive gland; IM, inner mantle; MR, mantle rim; FO, foot; GI, gills; AM, posterior adductor muscle); primers for MgCRP-I 3 also target MgCRP-I 25, primers for MgCRP-I 10 also target MgCRP-I 26. Bars represent the expression relative to EF-1 alpha; results are mean ± standard deviation of three replicates. MgCRP-I sequences are divided into three panels based on their expression level: (A) genes with maximum relative expression value between 0.05 and 0.25, (B) genes with maximum relative expression value between 0.004 and 0.01, (C) genes with maximum relative expression value lower than 0.003. (D) A schematic representation of Mytilus galloprovincialis anatomy, highlighting the sampled tissues.

Overall, the gene expression data leave room for different hypotheses which need to be tested in future experiments: 1) the expression of these peptides may be induced by still unknown specific stimuli, 2) MgCRP-I are expressed by a low number of highly specialized cells and therefore the global contribution to mRNAs extracted from macrotissues is low, and 3) MgCRP-I are not expressed in adult individuals, but they play an important role in the early developmental stages (although this hypothesis seems to be disproved by the analysis of M. edulis larvae RNA-seq data; see table 4).
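Expression values relative to a reference gene such as EF-1 alpha, like those plotted in figure 6, are commonly derived from qPCR threshold cycles (Ct) as efficiency^(−ΔCt). The sketch below assumes this approach with an amplification efficiency of 2 (perfect doubling per cycle); the exact calculation used in this study is described in Materials and Methods and may differ. All Ct values are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Target expression relative to the reference gene: efficiency ** -(dCt),
    where dCt = ct_target - ct_reference."""
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values for three replicates (not data from this study).
ct_crp = [24.1, 23.9, 24.0]    # an MgCRP-I transcript
ct_ef1a = [20.0, 20.1, 19.9]   # EF-1 alpha reference

values = [relative_expression(t, r) for t, r in zip(ct_crp, ct_ef1a)]
print(f"relative expression: {mean(values):.3f} +/- {stdev(values):.3f}")
```

A ΔCt of 4 cycles corresponds to a relative expression of 2⁻⁴ = 0.0625, which would place a transcript in the range shown in panel A.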

MgCRP-I Synthetic Peptides Do Not Show Any Significant Cytotoxic, Insecticidal, Antifungal, and Antimicrobial Activity

In an attempt to characterize the biological activity of the synthetic peptides MgCRP-I 7 and MgCRP-I 9, we evaluated their cytotoxicity on human tumor cell lines and insect larvae, and their antimicrobial activity on the bacteria E. coli and S. aureus, and on fungal strains of Ca. albicans, Cry. neoformans, A. fumigatus and A. brasiliensis (see Materials and Methods). The MTT assays indicated that both synthetic peptides were not cytotoxic on the HT-29, SHSY5Y, and MDAMB231 cell lines up to 10 µM concentration. Similarly, no insecticidal effect was observed in Z. morio larvae 48 h after the injection of 300 µg peptide/kg body weight, a quantity much higher than those determining visible neurotoxic effects, or even death, for other invertebrate toxins (Yang et al. 2012; Zhong et al. 2014). Finally, the antimicrobial activity assay evidenced that both MgCRP-I 7 and 9 did not show any effect on the selected bacterial and fungal strains at concentrations up to 32 µM. Although these results indicate that MgCRP-I synthetic peptides did not display antimicrobial or cytotoxic activity in the tested conditions, their involvement in defense processes cannot be ruled out. In fact, posttranslational modifications might occur in mussel cells but this can hardly be investigated due to the low expression levels of MgCRP-I genes which, in turn, makes the purification of native peptides difficult. Hence, the absence of biological effects could depend on a variety of modifications not present in the synthetic MgCRP-Is but often reported as fundamental for the antimicrobial or toxic activity of short CRPs (Guder et al. 2000; Buczek, Bulaj, et al. 2005; Buczek, Yoshikami, et al. 2005; Bergeron et al. 2013). 
In addition, as we have previously stated, given the difficulty in purifying peptides expressed at such low levels from tissue extracts, we cannot certify that the folding observed for synthetic peptides is identical to that of native peptides, even though the fact that one dominant product was obtained in the refolding of both peptides is consistent with this assumption. These considerations are important in perspective and, although the characterization of the activity of native MgCRP-I peptides is beyond the scope of this study, this will be an important task to be accomplished in future studies.

Conclusions

Thanks to an exploratory bioinformatics approach applied to NGS data, we could identify a novel family of cysteine-rich peptides, named MgCRP-I, which appears to be exclusively present in Mytiloida, an order of marine filter-feeding mussels. The MgCRP-I gene family and the encoded peptides share a number of structural and evolutionary traits with other families of CRPs, which almost invariably have an antimicrobial or toxic function. These marked similarities initially suggested that MgCRP-I peptides could have similar biological functions, thus making them intriguing targets for possible future biotechnological and pharmacological applications. However, all the tests performed on two synthetic MgCRP-I peptides led to inconclusive results, leaving their biological role still puzzling. In addition, the biological targets (both at the molecular and at the species levels) of MgCRP-I peptides are still unknown and the events triggering the expression of these molecules are still elusive. In the absence of further indications, these questions remain unsolved. Overall, we have provided a preliminary overview of MgCRP-I peptides, which is intended as a starting point for further investigations on their possible action on prokaryotic or eukaryotic cells. Our work also highlights the possibility of identifying previously uncharacterized, potentially bioactive peptides from whole genomes and transcriptomes of nonmodel organisms without any previous knowledge about their primary sequence, an experimental approach which could speed up the discovery or the design of novel molecules with potential biotechnological applications. Because genomic knowledge about them is still limited, marine invertebrates in particular represent a virtually unlimited and almost unexplored source of novel bioactive compounds.

Supplementary Material

Supplementary material, figures S1–S6, and tables S1–S5 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
