
Machine Learning To Predict Cell-Penetrating Peptides for Antisense Delivery.

Justin M Wolfe¹, Colin M Fadzen¹, Zi-Ning Choo¹, Rebecca L Holden¹, Monica Yao², Gunnar J Hanson², Bradley L Pentelute¹.

Abstract

Cell-penetrating peptides (CPPs) can facilitate the intracellular delivery of large therapeutically relevant molecules, including proteins and oligonucleotides. Although hundreds of CPP sequences are described in the literature, predicting efficacious sequences remains difficult. Here, we focus specifically on predicting CPPs for the delivery of phosphorodiamidate morpholino oligonucleotides (PMOs), a compelling type of antisense therapeutic that has recently been FDA approved for the treatment of Duchenne muscular dystrophy. Using literature CPP sequences, 64 covalent PMO-CPP conjugates were synthesized and evaluated in a fluorescence-based reporter assay for PMO activity. Significant discrepancies were observed between the sequences that performed well in this assay and the sequences that performed well when conjugated to only a small-molecule fluorophore. As a result, we envisioned that our PMO-CPP library would be a useful training set for a computational model to predict CPPs for PMO delivery. We used the PMO activity data to fit a random decision forest classifier to predict whether or not covalent attachment of a given peptide would enhance PMO activity at least 3-fold. To validate the model experimentally, seven novel sequences were generated, synthesized, and tested in the fluorescence reporter assay. All computationally predicted positive sequences were positive in the assay, and one sequence performed better than 80% of the tested literature CPPs. These results demonstrate the power of machine learning algorithms to identify peptide sequences with particular functions and illustrate the importance of tailoring a CPP sequence to the cargo of interest.


Year: 2018 PMID: 29721534 PMCID: PMC5920612 DOI: 10.1021/acscentsci.8b00098

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553

Introduction

Although small molecules can generally diffuse through the plasma membrane, many large molecules have limited uptake into cells.[1,2] These macromolecules are unable to diffuse across the plasma membrane and, if endocytosed, often remain trapped in endosomes. For example, gene-editing proteins, antisense oligonucleotides, and peptide-based proteolysis targeting chimeras (PROTACS) all mediate their effects on intracellular targets, and poor delivery limits their therapeutic potential.[3−5] One promising solution to improve the intracellular delivery of these macromolecules is the covalent conjugation of cell-penetrating peptides (CPPs).[6]

Over the past few decades, hundreds of CPPs have been documented in the literature, and yet predicting which peptide sequences improve cytosolic delivery remains difficult. Due in part to the diverse nature of CPPs, the properties and characteristics that are necessary for cell penetration are not well understood. CPPs range from 5 to 40 residues in length, and the sequences can be highly cationic, amphipathic, or hydrophobic.[6−8] Many CPPs are derived from fragments of natural proteins, such as viral proteins, DNA- or RNA-binding proteins, heparin-binding proteins, or antimicrobial peptides.[9] Some sequences were rationally designed after recognizing that cationic residues or amphipathicity can improve cell penetration, while others were discovered using DNA-encoded peptide libraries.[10−13] Taking advantage of machine learning techniques, one recent strategy to predict new CPPs combines experimental data sets of known CPPs with computational models, such as support vector machines or neural networks.[14−17]

Unfortunately, it is generally acknowledged that the existing computational models to predict CPPs are intrinsically limited.[14,15,18] These models were all trained on a similar heterogeneous data set compiled from multiple experimental papers on CPPs.[14−17] Since the original papers investigated CPPs for different applications, different experimental parameters were employed. For example, CPP treatment concentrations ranged from 10 to 400 μM, some included serum in the media and others did not, and different cell types were utilized including HeLa cells and primary rat cortex cells.[10,19−21] All of these variables affect cellular uptake, and therefore standardized treatment conditions should be used to improve model accuracy.

Additionally, there is a need for computational models that predict CPPs specifically for macromolecule delivery. Experiments to determine putative CPP sequences generally involve the conjugation of a small-molecule fluorophore to the CPP, and the uptake of the fluorophore-CPP is then analyzed by flow cytometry or live-cell confocal imaging.[22,23] However, experiments with fluorophore-labeled CPPs do not assess whether or not the CPP is suitable for the delivery of macromolecular cargo. Further, it is likely that the optimal CPP for the delivery of one type of macromolecule is different from the optimal CPP for a different type of macromolecule. One approach to manage this cargo dependence is to evaluate CPPs in the context of a functional readout for a specific macromolecule. For example, several activity-based assays have been developed to evaluate successful delivery of peptides, proteins, and antisense oligonucleotides.[24−27]

Phosphorodiamidate morpholino oligonucleotides (PMOs) are one particular type of macromolecule that benefits from conjugation to CPPs. PMOs are a charge-neutral antisense oligonucleotide therapeutic in which the ribose sugar is replaced with a methylenemorpholine ring and the phosphodiester backbone is replaced with a phosphorodiamidate backbone (Figure 1A).[28] PMOs can be designed to bind to pre-mRNA and can alter gene splicing, resulting in the exclusion or inclusion of particular genetic fragments in the mature mRNA. To improve PMO delivery, arginine-rich CPPs and their derivatives have been covalently conjugated to PMO and investigated using gene-splicing assays, in which cellular fluorescence increases in the presence of PMO.[29−32] Other types of neutral antisense oligonucleotides, such as peptide nucleic acids, have also been investigated after conjugation to CPPs.[33,34]

Figure 1

PMOs alter gene splicing. (A) The backbone structure of phosphorodiamidate morpholino oligonucleotides, a type of antisense oligonucleotide. (B) The exon-skipping assay used in this study. HeLa-654 cells are stably transfected with a split eGFP construct that contains a mutant intron. In the absence of PMO, a nonfluorescent truncated protein is expressed. If PMO IVS2-654 is present, it hybridizes to the mutant intron, alters pre-mRNA splicing, and produces functional mRNA that is translated into fluorescent eGFP.

Here, we seek to predict CPPs specifically for PMO delivery. We hypothesized that PMO–CPP conjugates evaluated in a functional assay with standardized conditions would provide a data set for training a computational model. We synthesized a library of 64 PMO–CPP conjugates, utilizing previously reported CPPs. To benchmark each CPP, we measured the amount of PMO activity in an exon-skipping assay. We tested all PMO–CPP conjugates using the same concentration, cell-line, amount of serum in the media, and treatment time. Using select CPP sequences from our library, we directly compared the relative effectiveness of a given CPP for the delivery of small-molecule fluorophore to the delivery of PMO. We then developed a random forest classifier that can discriminate whether or not a given peptide sequence can improve PMO activity more than 3-fold. Lastly, we predicted custom peptides for PMO delivery and experimentally validated the sequences for successful delivery.

Results and Discussion

We began by measuring the effectiveness of literature-reported CPPs specifically for PMO delivery. We synthesized a library of CPPs consisting of the sequences listed in the comprehensive review by Milletti in 2012.[9] All the CPPs were capped at the N-terminus with an alkyne for further conjugation. Peptides that synthesized poorly or exhibited limited solubility were discarded from the library. After purification using reversed-phase high-performance liquid chromatography (RP-HPLC), we used copper-catalyzed click chemistry to conjugate the peptides to a 6212 Da, 18-mer PMO. The PMO we chose can trigger functional eGFP expression in a modified HeLa cell line (Figure 1B).[27] After another round of purification, we obtained a library of 64 PMO–CPP conjugates (Table 1). The library included all of the canonical CPPs, including TAT, pVEC, TP10, penetratin, and polyarginine. Other less commonly reported peptides, such as the heparin binding proteins (DPV3-15) and proline-rich CPPs such as Bac7, were also included. With regard to classes of CPPs, our library contained 25 sequences generally classified as cationic sequences, 8 classified as hydrophobic, and 23 classified as amphipathic.

Table 1

List of CPPs That Were Conjugated to PMO and Tested in the eGFP Assay

CPP name	CPP class	amino acid sequence	theoretical net charge	activity relative to PMO
arginine-12	cationic	RRRRRRRRRRRR	12	10.4
MPG	amphipathic	GLAFLGFLGAAGSTMGAWSQPKKKRKV	5	7.5
Bac7	proline rich	RRIRPRPPRLPRPRPRPLPFPRPG	9	6.1
TAT	cationic	RKKRRQRRR	8	6.0
arginine-10	cationic	RRRRRRRRRR	10	5.8
DPV6	cationic	GRPRESGKKRKRKRLKP	9	5.4
S413-PVrev	amphipathic	ALWKTLLKKVLKAPKKKRKV	9	5.2
HRSV	cationic	RRIPNRRPRR	6	4.9
HTLV-II Rex	cationic	TRRQRTRRARRNR	8	4.7
L-2	amphipathic	HARIKPTFRRLKWKYKGKFW	9	4.6
melittin	amphipathic	GIGAVLKVLTTGLPALISWIKRKRQQ	5	4.5
DPV15	cationic	LRRERQSRLRRERQSR	6	4.3
arginine-9	cationic	RRRRRRRRR	9	4.2
penetratin	cationic	RQIKIWFQNRRMKWKK	7	4.2
yeast GCN4	cationic	KRARNTEAARRSRARKLQRMKQ	9	4.2
PDX-1	cationic	RHIKIWFQNRRMKWKK	8	4.1
arginine-8	cationic	RRRRRRRR	8	4.0
BMV Gag	cationic	KMTRAQRRAAARRNRWTAR	8	3.9
SynB1	amphipathic	RGGRLSYSRRRFSTSTGR	6	3.9
knotted-1	cationic	KQINNWFINQRKRHWK	6	3.8
IVV-14	hydrophobic	KLWMRWYSPTTRRYG	4	3.7
W/R	amphipathic	RRWWRRWRR	6	3.5
engrailed-2	cationic	SQIKIWFQNKRAKIKK	6	3.4
DPV15b	cationic	GAYDLRRRERQSRLRRRERQSR	7	3.3
yeast PrP6	cationic	TRRNKRNRIQEQLNRK	6	3.3
DPV7	cationic	GKRKKKGKLGKKRDP	8	3.2
HoxA-13	cationic	RQVTIWFQNRRVKEKK	5	3.1
AIP6	amphipathic	RLRWR	3	2.9
(PPR)5	cationic	PPRPPRPPRPPRPPR	5	2.7
CAYH		CAYHRLRRC	4	2.6
DPV10	cationic	SRRARRSPRHLGSG	6	2.5
(PPR)4	amphipathic	PPRPPRPPRPPR	4	2.4
P22 N	cationic	NAKTRRHERRRKLAIER	7	2.4
DPV1047	cationic	VKRGLKLRHVRPRVTRMDV	7	2.4
SVM4	amphipathic	LYKKGPAKKGRPPLRGWFH	7	2.2
cp21N(12–29)	cationic	TAKTRYKARRAELIAERR	5	2.1
SVM3	amphipathic	KGTYKKKLMRIPLKGT	6	2.1
(PPR)3	amphipathic	PPRPPRPPR	3	1.9
SVM2		RASKRDGSWVKKLHRILE	5	1.9
buforin 2	amphipathic	TRSSRAGLQWPVGRVHRLLRK	7	1.9
SVM1		FKIYDKKVRTRVVKH	6	1.7
SAP	amphipathic	VRLPPPVRLPPPVRLPPP	3	1.7
435b	hydrophobic	GPFHFYQFLFPPV	1	1.7
Pept1	hydrophobic	PLILLRLLRGQF	2	1.7
YTA2		YTAIAWVKAFIRKLRK	5	1.5
Pep-1	amphipathic	KETWWETWWTEWSQPKKRKV	2	1.4
EB-1	amphipathic	LIRLWSHLIHIWFQNRRLKWKKK	9	1.4
pyrrhocoricin	proline rich	VDKGSYLPRPTPPRPIYNRN	3	1.4
AN(1–22)	cationic	MDAQTRRRERRAEKQAQWKAAN	4	1.4
439a	hydrophobic	GSPWGLQHHPPRT	3	1.3
MAP	amphipathic	KLALKALKALKAALKLA	5	1.3
Bip	hydrophobic	IPALK	1	1.3
Bip	hydrophobic	VPALR	1	1.3
pVEC	amphipathic	LLIILRRRIRKQAHAHSK	8	1.2
YTA4		IAWVKAFIRKLRKGPLG	5	1.2
K-FGF + NLS	amphipathic	AAVLLPVLLAAPVQRKRQKLP	4	1.2
HN-1	hydrophobic	TSPLNIHNGQKL	2	1.2
Bip	hydrophobic	VPTLK	1	1.2
Bip	hydrophobic	VSALK	1	1.1
VT5	amphipathic	DPKGDPKGVTVTVTVTVTGKGDPKPD	0	0.8
transportan 10	amphipathic	AGYLLGKINLKALAALAKKIL	4	0.8
SAP(E)	amphipathic	VELPPPVELPPPVELPPP	–3	0.8
CADY	amphipathic	GLWRALWRLLRSLWRLLWRA	5	0.6
PreS2-TLM	amphipathic	PLSSIFSRIGDP	0	0.6

Each previously reported CPP was synthesized, purified, and conjugated to PMO IVS2-654. The conjugates were tested for functional PMO activity in the HeLa-654 cell assay. Individual CPPs are ranked by their activity relative to unconjugated PMO.

For a functional readout, the PMO–CPP conjugates were tested in the eGFP HeLa PMO assay (Figure 1B). In this assay, HeLa-654 cells are stably transfected with an eGFP coding sequence interrupted by an intron from the human β-globin gene (IVS2-654) containing a mutation that alters the normal pre-mRNA splice site to a formerly cryptic splice site. The change in splicing leads to retention of an unnatural mRNA fragment in the spliced eGFP mRNA and the translation of a nonfluorescent form of eGFP. The PMO IVS2-654 hybridizes to the mutant β-globin intron in the stably transfected HeLa cells, altering gene splicing and leading to full-length eGFP expression. The amount of PMO delivered is therefore correlated to the amount of functional eGFP expressed. In the experiment, eGFP HeLa cells are incubated with 5 μM of each PMO–CPP conjugate in media containing 10% fetal bovine serum (FBS) for 22 h. Then the cellular fluorescence is analyzed by flow cytometry. Given that the effectiveness of CPPs can be sensitive to treatment conditions such as the amount of serum in the treatment media and the amount of time treated, we kept these variables constant, enabling us to directly compare all of the CPPs under similar conditions.

Testing the library under these unified conditions led to the observation that several literature CPPs had little effect on promoting PMO delivery (Table 1). While seven peptides increased PMO activity above 5-fold, many peptides exhibited marginal improvement for PMO delivery. Twenty-seven CPPs (42% of library) led to under a 2-fold increase in eGFP fluorescence, and five CPPs actually decreased the amount of eGFP fluorescence compared to unconjugated PMO. In particular, the commonly used CPP transportan-10 (TP10) exhibited a negative effect on eGFP fluorescence, suggesting it is ineffective for PMO delivery under these conditions. Additionally, we observed that net positive charge was one of the strongest predictors of a successful CPP. Cationic sequences represented 70% of the CPPs with over a 3-fold improvement in PMO activity (19 out of 27 sequences). This trend is especially pronounced for the number of arginine residues, with more arginine residues leading to more observed eGFP fluorescence. In particular, attachment of the arginine-12 CPP led to the greatest enhancement of PMO activity. However, a high net charge is by no means necessary: the peptide MPG has just one arginine residue and a relatively minor theoretical net charge of +5, yet it exhibited the second highest activity of all the CPPs that we tested.

To understand if the trends in CPP effectiveness were specific for PMO delivery, we evaluated select members of our CPP library for fluorophore delivery (Figure 2A). The chosen CPPs cover the range of physicochemical properties present in our library, as well as the most commonly utilized CPP sequences. We used copper-catalyzed click chemistry to conjugate the CPPs to cyanine 5.5 (Cy5.5 – λex 684 nm, λem 710 nm). Next, eGFP HeLa cells were treated for 2 h with 5 μM of each Cy5.5–CPP conjugate in media containing 10% serum, and the cellular fluorescence was analyzed by flow cytometry. The amount of Cy5.5 fluorescence was normalized to the mean fluorescence intensity of Cy5.5–YTA4, the conjugate with the highest fluorescence intensity. Then, the amount of Cy5.5 fluorescence measured for each Cy5.5–CPP conjugate was compared to the relative amount of eGFP fluorescence for the equivalent PMO–CPP conjugate. We observed little correlation between the relative effectiveness of PMO and Cy5.5 delivery for a given CPP (Figure 2B). For example, arginine-12 led to the highest eGFP fluorescence of the CPPs evaluated, yet only moderate Cy5.5 fluorescence. In contrast, YTA4 and TP10 both led to substantial Cy5.5 fluorescence but demonstrated no practical improvement in PMO delivery.
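The per-cargo normalization used for this comparison can be illustrated with a short Python sketch, in which each activity is divided by that of the best-performing CPP for the same cargo. The PMO values below are taken from Table 1; the function name `normalize_to_best` is hypothetical, and because the Cy5.5 intensities are not tabulated in the text, only the PMO side is shown. The same function would apply to measured Cy5.5 mean fluorescence intensities.

```python
# PMO activity relative to unconjugated PMO, copied from Table 1.
PMO_ACTIVITY = {
    "arginine-12": 10.4,
    "MPG": 7.5,
    "TAT": 6.0,
    "penetratin": 4.2,
    "YTA4": 1.2,
    "transportan 10": 0.8,
}


def normalize_to_best(values):
    """Rescale so that the highest-performing CPP for this cargo equals 1.0."""
    best = max(values.values())
    return {name: value / best for name, value in values.items()}


pmo_normalized = normalize_to_best(PMO_ACTIVITY)
for name, score in pmo_normalized.items():
    print(f"{name:15s} {score:.2f}")
```

Plotting the two normalized series against each other, one point per CPP, would give a comparison of the kind described for Figure 2B.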

Figure 2

Cargo identity alters relative CPP efficacy. (A) Each CPP sequence was analyzed for delivery of both a fluorophore (Cy5.5, MW = 665 Da) and a PMO (IVS2-654, MW = 6212 Da). For Cy5.5-CPP conjugates, HeLa-654 cells were treated with 5 μM of the conjugate in media containing 10% FBS. After 2 h, the Cy5.5 fluorescence was measured by flow cytometry. For PMO–CPP conjugates, HeLa-654 cells were treated with 5 μM of the conjugate in media containing 10% FBS. After 22 h, the eGFP fluorescence was measured by flow cytometry. (B) The cargo dependency of a given CPP, normalized to the activity of the highest-performing CPP for each type of cargo. There is little relationship between the CPPs that led to the most cellular Cy5.5 fluorescence and the CPPs that led to the most eGFP fluorescence.

It is important to note that the PMO assay does not distinguish between cellular uptake and other downstream effects that influence exon skipping (e.g., nuclear delivery or mRNA splicing). The amount of eGFP fluorescence is not a measure of intracellular concentration, even though they are correlated. We focused on an activity measurement because we believe that, in the context of macromolecule delivery, the final functional output of a cargo represents the most relevant criterion for judging a CPP.

After benchmarking literature CPPs and noting the cargo dependency, we sought to develop a method to identify efficacious CPPs for PMO delivery. We hypothesized that we could leverage the consistency of our data set (identical concentrations, treatment times, serum-containing media, and assay) along with techniques from machine learning to build a predictive computational model for PMO delivery. While the algorithms we employed are standard in the computer science field, this is the first time they have been applied to predicting peptides for antisense oligonucleotide delivery. We chose to train a random forest classifier to select CPPs specific to PMO delivery (Figure 3A). Random forests are sets of decision trees, each fit to a randomly selected subset of features and training examples, and we selected this ensemble learning method due to its scalability and robustness to overfitting.[35]

We calculated 23 features for each peptide sequence. Two features were peptide molecular weight and sequence length (total number of residues). The theoretical net charge of the sequence was averaged across the five N-terminal residues, the five C-terminal residues, and the entire peptide sequence to give three features. The remaining 18 features were derived from six previously described amino acid physicochemical descriptors. These six descriptors were produced by factor analysis of 384 molecular properties calculated for 22 natural and 593 non-natural amino acids.[36] For each peptide sequence, the six descriptors were also averaged across the five N-terminal residues, the five C-terminal residues, and the entire peptide sequence.
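A minimal Python sketch of the length and charge features is given below, under stated assumptions. The charge convention (K, R, and H counted as +1; D and E as −1; termini ignored) is inferred because it reproduces the Table 1 net charges for the sequences checked here (TAT, PDX-1, and SAP(E)); it is not stated in the text. Reading "averaged" as the mean charge per residue within each window is likewise an assumption, and the function names are hypothetical. Molecular weight and the six physicochemical descriptors are omitted because their values are not given in the text.

```python
# Assumed convention: K, R, H = +1; D, E = -1; all other residues = 0.
CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}


def net_charge(seq: str) -> int:
    """Theoretical net charge summed over the residues of a sequence."""
    return sum(CHARGE.get(aa, 0) for aa in seq)


def mean_charge(seq: str) -> float:
    """Net charge averaged per residue (one reading of 'averaged')."""
    return net_charge(seq) / len(seq)


def simple_features(seq: str, window: int = 5) -> dict:
    """Length plus charge averaged over N-terminal, C-terminal, and full windows."""
    return {
        "length": len(seq),
        "charge_nterm": mean_charge(seq[:window]),
        "charge_cterm": mean_charge(seq[-window:]),
        "charge_all": mean_charge(seq),
    }


tat = simple_features("RKKRRQRRR")  # TAT sequence from Table 1
print(tat)
```

Peptides shorter than the window, such as the five-residue Bip sequences, simply yield identical values for all three charge features in this sketch.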

Figure 3

Random forest ensemble learning methods can be used to predict peptide sequences that facilitate PMO delivery. (A) Scheme of the workflow for the development of computationally derived peptide sequences for antisense delivery. One component of the workflow was random forest training using the properties of the CPPs in the library to build a model. Another component was randomly generating peptide sequences and using the model to predict the peptide activity. Lastly, select sequences were synthesized for experimental validation. Performance metrics for the model based on the test set are given in the table. (B) Table of five predicted PMO carriers (PPC) and two predicted negative sequences (NS). All five PMO–PPC conjugates exhibited above a 3-fold improvement in eGFP fluorescence, whereas both PMO–NS conjugates exhibited less than a 3-fold change with respect to PMO. (C) The mean fluorescence intensity of eGFP HeLa cells treated at a concentration of 5 μM with each of the PMO–PPCs and PMO–NSs in serum-containing media. Each individual experiment consisted of the average of three different wells with the same treatment conditions, and the experiment was repeated three times. The error bars represent the standard deviation across the experimental replicates.

The CPP sequences were classified as either positive or negative examples based on whether or not they exhibited above a 3-fold change in eGFP fluorescence with respect to the unconjugated PMO. Forty-four sequences were used as the training set for the random forest model. The other 20 sequences were held out to serve as a test set to evaluate the degree to which the model properly fit the data and could successfully predict the exon skipping activity of a sequence. The performance metrics of the model are shown in Figure 3A.

After testing our model computationally, we sought to validate it experimentally. Random peptide sequences were generated by selecting a peptide length and amino acid composition with probability proportional to the distribution observed in the training data set from the CPP library. Of the random peptides, we selected five positive sequences predicted to lead to above a 3-fold increase in eGFP fluorescence and two negative sequences (NSs). We selected more positive sequences as our goal was to develop novel peptide sequences for PMO delivery, which we have termed predicted PMO carriers (PPCs). These PPCs were synthesized by solid-phase peptide synthesis, conjugated to PMO IVS2-654, and purified by RP-HPLC. We tested the PPCs in the eGFP assay, and all five PPCs had a greater than 3-fold change in cellular fluorescence with respect to unconjugated PMO (Figure 3B,C). In fact, PMO–PPC3 increased fluorescence 4.4-fold, which is a larger increase than 80% of the literature CPPs. The negative sequences demonstrated less than a 3-fold change, indicating that our model could accurately discriminate between positive and negative sequences.

To demonstrate that our model is sensitive to the effects of cargo on CPP effectiveness, we next assessed whether or not our novel PPCs were CPPs with regard to small-molecule fluorophore delivery. We prepared variants of our PPCs labeled with Cy5.5, rather than the PMO, and measured the uptake by flow cytometry (Figure S1). Here, the trends were completely different, as cells treated with Cy5.5-NS2 exhibited fluorescence similar to cells treated with our positive control Cy5.5–YTA4 and twice the fluorescence of cells treated with Cy5.5–PPC1. These data parallel our observations of the literature CPPs and suggest that our computational model is truly specific for PMO-based cargo, providing further evidence that CPPs must be chosen in the context of the cargo of interest.

Next, we sought to characterize the mechanisms by which our PMO–PPC conjugates are internalized into cells, focusing on PPC3 and PPC5. All mechanistic studies were conducted at concentrations of 5 μM to remain consistent with the conditions with which we evaluated our PMO–PPC library. The experiments were performed in a pulse-chase format, in which the cells were preincubated in a particular treatment condition for 30 min followed by addition of PMO–PPC. After 3 h, the media was exchanged for fresh media that did not contain the conjugate, and the cells were allowed to incubate for an additional 22 h at 37 °C. First, we compared eGFP fluorescence after 3 h of treatment at 4 °C vs 37 °C. We found that for PMO alone and both PMO–PPC conjugates, eGFP fluorescence decreased when treatment occurred at 4 °C, suggesting that energy-dependent mechanisms play a role in the uptake of our conjugates (Figure 4A). One plausible explanation for the residual fluorescence in the 4 °C condition is that the cells incubate for an additional 22 h at 37 °C after treatment, so any conjugate that binds to the surface of the cells during treatment at 4 °C may be subsequently internalized and trigger eGFP expression.
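The random sequence generation step described above (sampling a peptide length and amino acid composition in proportion to the training data) could be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the five sequences are a small stand-in taken from Table 1, since the actual 44-sequence training split is not listed in the text, and `make_sampler` is a hypothetical helper. Residues are drawn independently from the pooled amino acid frequencies, which is one simple reading of "amino acid composition".

```python
import random
from collections import Counter

# Stand-in training sequences from Table 1 (arginine-12, TAT, penetratin,
# MAP, Pept1); the real training split is not given in the text.
TRAINING_SEQS = [
    "RRRRRRRRRRRR",
    "RKKRRQRRR",
    "RQIKIWFQNRRMKWKK",
    "KLALKALKALKAALKLA",
    "PLILLRLLRGQF",
]


def make_sampler(seqs, seed=0):
    """Return a function that draws random peptides resembling `seqs`."""
    rng = random.Random(seed)  # seeded for reproducibility
    lengths = [len(s) for s in seqs]
    counts = Counter("".join(seqs))
    residues = sorted(counts)
    weights = [counts[r] for r in residues]

    def sample():
        # Picking uniformly from observed lengths weights each length
        # by how often it occurs in the training data.
        n = rng.choice(lengths)
        return "".join(rng.choices(residues, weights=weights, k=n))

    return sample


sample_peptide = make_sampler(TRAINING_SEQS)
candidates = [sample_peptide() for _ in range(5)]
print(candidates)
```

In a full workflow, each candidate would then be featurized and scored by the trained classifier before any sequence is selected for synthesis.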

Figure 4

The PMO–PPC conjugates engage endocytic mechanisms in their uptake into cells. (A) Effect of temperature on PMO–PPC activity. eGFP HeLa cells were incubated at 4 °C for 30 min before incubation with either PMO or PMO–PPC conjugates at 4 °C at a concentration of 5 μM. After 3 h, the treatment media was replaced with fresh, untreated media, and the cells were allowed to grow for an additional 22 h at 37 °C. CO refers to cell-only. (B) Effect of cytochalasin D on PMO–PPC activity. eGFP HeLa cells were treated with cytochalasin D for 30 min in serum-containing media before the addition of either PMO or PMO–PPC conjugates at a concentration of 5 μM. After 3 h, the treatment media was replaced with fresh, untreated media, and the cells were allowed to grow for an additional 22 h. The cells were analyzed for eGFP fluorescence by flow cytometry, and the results are shown in terms of the mean fluorescence intensity. These experiments were conducted on a plate alongside several other inhibitors, and the full results are in the Supporting Information.

We also treated the HeLa eGFP cells with a panel of endocytosis disruptors and assessed the effects on internalization (Figure 4B and Figure S2). While eGFP fluorescence was relatively unchanged after preincubation with many of the inhibitors, preincubation with cytochalasin D led to a notable decrease in eGFP fluorescence. Cytochalasin D binds to the barbed, fast growing ends of actin microfilaments, which prevents assembly and disassembly of actin monomers.[37] This affects not only the cytoskeleton of the cell, but also the ability of the membrane to ruffle and reorganize to facilitate macropinocytosis. While it is possible that the decrease in eGFP fluorescence is due to effects downstream in the exon skipping pathway, these results suggest that macropinocytosis plays a significant role in the internalization of our conjugates.

Concluding Remarks

Serendipity is not necessary to discover new cell-penetrating peptide sequences for macromolecule delivery. Although many CPPs have been discovered using peptide fragments from natural proteins, computational models challenge the notion that CPP sequences must be found in nature. Here, we generated a random forest classifier that could accurately predict whether or not conjugation of a given peptide sequence would increase PMO activity 3-fold. Our model enabled the discovery of five completely novel sequences that increased PMO activity, and accurately discriminated between active and inactive sequences. A BLAST search revealed that these new sequences are not found in nature.

One key component of our computational model is the standardized assay conditions used in the data set for training. Multiple experimental variables influence cellular uptake, including treatment concentration and the presence of serum. To enable the accurate classification of multiple CPPs, the peptides must be tested under similar conditions. Additionally, as noted previously in the CPP field, the computational models are only as valuable as the data used to train a given model. By testing a PMO–CPP library under standardized conditions, we obtained a high-quality training set to improve the accuracy of the predictions from the model.

The second key component of our model is the type of cargo utilized to assess cell penetration. Functional macromolecules have diverse chemical structures, mechanisms of action, and sites of activity inside of cells. Although many CPPs are investigated using only an attached fluorophore as a metric of their efficacy, our experiments indicated that fluorophore studies have little predictive value with regard to the optimal CPP for macromolecular delivery. Therefore, to develop an optimal computational model, the training set should involve CPPs tested in the appropriate context. For our experiments, we investigated a library of PMO–CPP conjugates to focus on the context of improving PMO delivery and promoting exon skipping.

Combining these two components leads to a computational model that enables exploration of the design space for cell-penetrating peptides. Although many, potentially infinite, peptide sequences may facilitate PMO delivery, our model can be employed to select sequences based on certain desirable parameters. For example, peptide sequences with a large number of arginine residues can lead to toxicity in vivo and so, in our computational model, sequence space can be restricted to sequences containing fewer than three arginine residues. Similar approaches can be utilized for identifying and avoiding peptide sequences that are immunogenic (e.g., by referencing a database of immunogenic sequences), or peptide sequences that will be synthetically challenging (e.g., peptides with multiple β-branched residues). We envision that using our computational model, vast numbers of putative sequences can be tested in silico, reducing experimental burden and drastically increasing the chemical space that can be investigated. Then, only the optimized sequences that meet the desired criteria can be evaluated experimentally.

Moving forward, we seek to understand the generalizability of our current computational model. If our model extends to clinically relevant PMO sequences, it could serve as a valuable resource to optimize therapeutic PMO–peptide conjugates. Additionally, we will investigate the utility of our computational model for improving the delivery of different classes of antisense oligonucleotides. Understanding the strengths and limitations of computational prediction will be critical for applying machine learning to the challenge of intracellular delivery. We envision that careful experimental design coupled with the appropriate machine learning algorithm will significantly increase the portfolio of CPPs for macromolecule delivery.

Experimental Section

Peptide Synthesis

Peptides were synthesized using either manual flow peptide synthesis or various iterations of an automated flow peptide synthesizer.[38,39] For detailed methods on peptide synthesis and purification, please see the Supporting Information.

PMO Azide Synthesis

PMO IVS2-654 was provided by Sarepta Therapeutics. The sequence is shown in Figure A. To conjugate the azide to the 3′ end, PMO IVS2-654 was dissolved in DMSO (53 mM). To the solution was added 4 equiv of 5-azidopentanoic acid activated with HBTU and 4 equiv of DIEA dissolved in DMF. The reaction proceeded for 25 min before being quenched with water and ammonium hydroxide. The ammonium hydroxide was used to hydrolyze any ester formed during the course of the reaction. After 1 h, the solution was diluted and purified by reversed-phase HPLC using a linear gradient from 2% to 60% B over 58 min. Mobile phase A: water. Mobile phase B: acetonitrile. For LC-MS characterization, please see Supporting Information.

PMO Peptide Conjugation

PMO peptide conjugates were synthesized by copper-catalyzed azide alkyne cycloaddition using a copper bromide catalyst in DMF. Under nitrogen gas, a mixture of peptide alkyne (1.1 μmol), PMO azide (0.95 μmol), and copper bromide (0.05 mmol) was dissolved in 1 mL of DMF, vortexed, and allowed to react for 1 h. The reaction was quenched with the addition of 10 mL of 50 mM Tris (pH 8). Our optimized purification procedure utilized reversed-phase HPLC with a linear gradient from 5 to 45% B over 20 min. Mobile phase A: 100 mM ammonium acetate pH 7.2 in water. Mobile phase B: acetonitrile. For additional purification procedures and LC-MS characterization of all PMO-peptide conjugates, please see the Supporting Information.

Fluorophore Conjugation

Cy5.5 azide was conjugated to peptide alkyne by copper-catalyzed azide alkyne cycloaddition using copper sulfate and ascorbic acid. Briefly, 0.5 μmol of peptide alkyne was dissolved in 200 μL of 50:50 t-butanol/water in a 1.7 mL microcentrifuge tube. The following solutions were added in order: 10 μL of 50 mM Cy5.5 azide in DMSO, 100 μL of 500 mM Tris pH 8 in water, 50 μL of 100 mM copper(II) sulfate in water, 10 μL of 10 mM Tris(benzyltriazolylmethyl)amine (TBTA) in DMSO, 530 μL of 50:50 t-butanol/water, and 100 μL of 1 M ascorbic acid in water. After 1 h, the reactions were purified by reverse-phase HPLC using a linear gradient from 5 to 45% B over 80 min. Mobile phase A: water with 0.1% TFA. Mobile phase B: acetonitrile with 0.1% TFA. For LC-MS characterization of Cy5.5–PPC conjugates, please see the Supporting Information.
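The reagent amounts implied by this recipe can be checked with a short Python calculation. It uses only the volumes and stock concentrations stated above and relies on the identity µL × mM = nmol; the variable names are hypothetical, and equivalents are expressed relative to the 0.5 μmol of peptide alkyne.

```python
# (volume in µL, stock concentration in mM) for each added solution.
ADDITIONS = {
    "Cy5.5 azide": (10, 50),
    "Tris pH 8": (100, 500),
    "copper(II) sulfate": (50, 100),
    "TBTA": (10, 10),
    "ascorbic acid": (100, 1000),
}
SOLVENT_UL = 200 + 530   # the two portions of 50:50 t-butanol/water
PEPTIDE_NMOL = 500       # 0.5 µmol of peptide alkyne

total_ul = SOLVENT_UL + sum(vol for vol, _ in ADDITIONS.values())

summary = {}
for name, (vol_ul, conc_mm) in ADDITIONS.items():
    nmol = vol_ul * conc_mm                    # µL x mM = nmol
    summary[name] = {
        "nmol": nmol,
        "equiv": nmol / PEPTIDE_NMOL,          # relative to peptide alkyne
        "final_mM": nmol / total_ul,           # nmol per µL = mM
    }

print(f"total volume: {total_ul} µL")
for name, row in summary.items():
    print(f"{name:20s} {row['equiv']:7.1f} equiv  {row['final_mM']:6.1f} mM")
```

Under these numbers the fluorophore azide is stoichiometric with the peptide, while copper and ascorbate are in large excess.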

Computational Design

Random forest classifier hyperparameters were optimized through grid search, with classification accuracy estimated by 3-fold cross-validation. The selected number of features per tree, number of trees, and maximum tree depth were 11, 50, and 20, respectively. The change in accuracy from varying select hyperparameters or removing each feature is shown in the Supporting Information. Performance metrics from classifier evaluation on a held-out test set of 20 sequences are given in Figure A. The performance metrics are defined below, where TP refers to true positive, TN refers to true negative, FP refers to false positive, and FN refers to false negative.
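For reference, the conventional confusion-matrix metrics can be computed from TP, TN, FP, and FN as sketched here. Which of these metrics were actually reported in the figure is an assumption; the function name and the example counts are hypothetical and are not the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (assumed set, not confirmed by the text)."""

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    total = tp + tn + fp + fn
    return {
        "accuracy": ratio(tp + tn, total),     # (TP + TN) / all predictions
        "precision": ratio(tp, tp + fp),       # TP / (TP + FP)
        "recall": ratio(tp, tp + fn),          # TP / (TP + FN), also called sensitivity
        "specificity": ratio(tn, tn + fp),     # TN / (TN + FP)
    }


# Hypothetical counts for a 20-sequence test set, for illustration only.
example = classification_metrics(tp=8, tn=9, fp=1, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With a test set of only 20 sequences, each misclassified peptide shifts accuracy by 5 percentage points, so these values are best read alongside the raw counts.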

Flow Cytometry

For testing the library of PMO–CPP conjugates, flow cytometry analysis of GFP fluorescence was conducted as previously described.[27] For testing the PMO–PPC conjugates, HeLa 654 cells were maintained in MEM supplemented with 10% (v/v) fetal bovine serum (FBS) and 1% (v/v) penicillin-streptomycin at 37 °C and 5% CO2. Stocks of each PMO–PPC conjugate were prepared in phosphate-buffered saline (PBS). The concentration of each stock was determined by measuring the absorbance at 260 nm and using an extinction coefficient of 168,700 L mol⁻¹ cm⁻¹. Cells were incubated with each conjugate at a concentration of 5 μM in MEM supplemented with 10% FBS and 1% penicillin-streptomycin for 22 h at 37 °C and 5% CO2. Next, the treatment media was aspirated. The cells were incubated with 0.25% trypsin-EDTA for 15 min at 37 °C and 5% CO2, washed 1× with PBS, and resuspended in PBS with 2% FBS and 2 μg/mL propidium iodide. Flow cytometry analysis was carried out on a BD LSRII flow cytometer. Gates were applied to exclude cells that were highly positive for propidium iodide or whose forward/side scatter readings differed substantially from the main cell population. Each histogram contained at least 10,000 gated events. Representative histograms are shown in the Supporting Information.
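The stock quantification described above follows the Beer-Lambert law, c = A / (ε·l), and the 5 μM treatment then follows from C1·V1 = C2·V2. The sketch below illustrates both steps; the 1 cm path length, the 200-fold dilution, the absorbance reading, and the 100 μL well volume are all hypothetical values chosen for the example, and the function names are not from the source.

```python
EXTINCTION_COEFF = 168_700.0  # L mol^-1 cm^-1 at 260 nm, as given in the text


def stock_concentration_uM(a260, dilution_factor=1.0, path_cm=1.0):
    """Beer-Lambert estimate of stock concentration in uM.

    a260 is the absorbance of the (possibly diluted) sample; dilution_factor
    scales the result back to the undiluted stock. Path length is assumed.
    """
    molar = a260 * dilution_factor / (EXTINCTION_COEFF * path_cm)
    return molar * 1e6


def stock_volume_uL(stock_uM, final_uM=5.0, final_volume_uL=100.0):
    """Volume of stock needed for the target concentration (C1*V1 = C2*V2)."""
    if stock_uM < final_uM:
        raise ValueError("stock is more dilute than the target concentration")
    return final_uM * final_volume_uL / stock_uM


# Hypothetical reading: A260 = 0.8435 measured on a 200-fold dilution of the stock.
stock = stock_concentration_uM(0.8435, dilution_factor=200.0)
volume = stock_volume_uL(stock, final_uM=5.0, final_volume_uL=100.0)
print(f"stock = {stock:.1f} uM; add {volume:.2f} uL per 100 uL well for 5 uM")
```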

Inhibitor Experiments

To inhibit a variety of endocytic mechanisms, a pulse-chase experiment was performed. Briefly, HeLa 654 cells were plated at a density of 5000 cells per well in a 96-well plate in MEM supplemented with 10% FBS and 1% penicillin-streptomycin. The next day, the cells were treated with each inhibitor at the indicated concentration. After 30 min, PMO–peptide conjugate was added to each well at a concentration of 5 μM. After incubation at 37 °C and 5% CO2 for 3 h, the treatment media was replaced with fresh media (containing neither inhibitor nor PMO–peptide), and the cells were allowed to grow for another 22 h at 37 °C and 5% CO2. For the 4 °C experiments, the day after plating, the cells were preincubated for 30 min at 4 °C, followed by the addition of PMO–peptide conjugate to each well at a concentration of 5 μM. After incubation at 4 °C for 3 h, the treatment media was replaced with fresh media, and the cells were allowed to grow for another 22 h at 37 °C and 5% CO2. Sample preparation and flow cytometry were then performed as described above. Each histogram contains at least 2000 gated events, with the exception of treatment with 20 μM cytochalasin D and 200 nM wortmannin.

37 in total

1. Deletion analogues of transportan.

Authors: U Soomets; M Lindgren; X Gallet; M Hällbrink; A Elmquist; L Balaspiri; M Zorko; M Pooga; R Brasseur; U Langel
Journal: Biochim Biophys Acta Date: 2000-07-31

2. Live-cell analysis of cell penetration ability and toxicity of oligo-arginines.

Authors: Gisela Tünnemann; Gohar Ter-Avetisyan; Robert M Martin; Martin Stöckl; Andreas Herrmann; M Cristina Cardoso
Journal: J Pept Sci Date: 2008-04 Impact factor: 1.905

3. A fully automated flow-based approach for accelerated peptide synthesis.

Authors: Alexander J Mijalis; Dale A Thomas; Mark D Simon; Andrea Adamo; Ryan Beaumont; Klavs F Jensen; Bradley L Pentelute
Journal: Nat Chem Biol Date: 2017-02-28 Impact factor: 15.040

Review 4. Cell-penetrating peptides: design, synthesis, and applications.

Authors: Dana Maria Copolovici; Kent Langel; Elo Eriste; Ülo Langel
Journal: ACS Nano Date: 2014-02-28 Impact factor: 15.881

5. Up-regulation of luciferase gene expression with antisense oligonucleotides: implications and applications in functional assay development.

Authors: S H Kang; M J Cho; R Kole
Journal: Biochemistry Date: 1998-05-05 Impact factor: 3.162

6. Interaction of cytochalasin D with actin filaments in the presence of ADP and ATP.

Authors: M F Carlier; P Criquet; D Pantaloni; E D Korn
Journal: J Biol Chem Date: 1986-02-15 Impact factor: 5.157

Review 7. Bax-inhibiting peptides derived from Ku70 and cell-penetrating pentapeptides.

Authors: J A Gomez; V Gama; T Yoshida; W Sun; P Hayes; K Leskov; D Boothman; S Matsuyama
Journal: Biochem Soc Trans Date: 2007-08 Impact factor: 5.407

8. Prediction of cell penetrating peptides by support vector machines.

Authors: William S Sanders; C Ian Johnston; Susan M Bridges; Shane C Burgess; Kenneth O Willeford
Journal: PLoS Comput Biol Date: 2011-07-14 Impact factor: 4.475

9. Cell-penetrating peptides as transporters for morpholino oligomers: effects of amino acid composition on intracellular delivery and cytotoxicity.

Authors: Rebecca P Wu; Derek S Youngblood; Jed N Hassinger; Candace E Lovejoy; Michelle H Nelson; Patrick L Iversen; Hong M Moulton
Journal: Nucleic Acids Res Date: 2007-08-01 Impact factor: 16.971

10. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides.

Authors: Piyush Agrawal; Sherry Bhalla; Salman Sadullah Usmani; Sandeep Singh; Kumardeep Chaudhary; Gajendra P S Raghava; Ankur Gautam
Journal: Nucleic Acids Res Date: 2015-11-19 Impact factor: 16.971
