Eunkyoung Jung, Junhyoung Kim, Minkyoung Kim, Dong Hyun Jung, Hokyoung Rhee, Jae-Min Shin, Kihang Choi, Sang-Kee Kang, Min-Kook Kim, Cheol-Heui Yun, Yun-Jaie Choi, Seung-Hoon Choi.
Abstract
BACKGROUND: Oral delivery is a highly desirable property for candidate drugs under development. Computational modeling could provide a quick and inexpensive way to assess the intestinal permeability of a molecule. Although there have been several studies aimed at predicting the intestinal absorption of chemical compounds, there have been no attempts to predict intestinal permeability on the basis of peptide sequence information. To develop models for predicting the intestinal permeability of peptides, we adopted an artificial neural network as a machine-learning algorithm. The positive control data consisted of intestinal barrier-permeable peptides obtained by the peroral phage display technique, and the negative control data were prepared from random sequences.
Year: 2007 PMID: 17623108 PMCID: PMC1955455 DOI: 10.1186/1471-2105-8-245
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison for relative hydrophilicity and hydrophobicity of amino acids for the real data sets.
| | | | | Hydrophobicity* | | | | |
| Amino acid | Homing (a) | Random (b) | Ratio (c) | Calculated (d) | Side-chain analogues (e) | Amino acids (f) | N-acetyl amides (g) | Hydrophilicity* |
| Alanine | 6.85 | 6.50 | 1.05 | -0.39 | -0.87 | -0.50 | -0.31 | -0.45 |
| Glycine | 4.02 | 2.20 | 1.83 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Isoleucine | 1.56 | 2.10 | 0.74 | -1.82 | -3.98 | -1.80 | -1.80 | -0.24 |
| Leucine | 7.19 | 9.60 | 0.75 | -1.82 | -3.98 | -1.80 | -1.70 | -0.11 |
| Valine | 1.98 | 1.90 | 1.04 | -1.30 | -3.10 | -1.50 | -1.22 | -0.40 |
| Methionine | 2.69 | 3.30 | 0.82 | -0.96 | -1.41 | -1.30 | -1.23 | -3.87 |
| Phenylalanine | 1.56 | 2.10 | 0.74 | -2.27 | -2.04 | -2.50 | -1.79 | -3.15 |
| Tryptophan | 0.66 | 1.90 | 0.35 | -2.13 | -1.39 | -3.40 | -2.25 | -8.27 |
| Proline | 11.41 | 10.70 | 1.07 | -0.99 | - | -1.40 | -0.72 | - |
| Cysteine | 0.02 | 0.00 | - | -0.99 | -0.34 | -1.00 | -1.54 | -3.63 |
| Serine | 13.42 | 8.60 | 1.56 | 1.24 | 4.34 | 0.30 | 0.04 | -7.45 |
| Threonine | 10.72 | 13.10 | 0.82 | 1.00 | 3.51 | -0.40 | -0.26 | -7.27 |
| Tyrosine | 1.95 | 2.40 | 0.81 | -1.47 | 1.08 | -2.30 | -0.96 | -8.50 |
| Asparagine | 6.22 | 6.40 | 0.97 | 1.91 | 7.58 | 0.20 | 0.60 | -12.07 |
| Glutamine | 7.56 | 7.10 | 1.06 | 1.30 | 6.48 | 0.20 | 0.22 | -11.77 |
| Histidine | 6.74 | 6.90 | 0.98 | 0.64 | 5.60 | -0.50 | -0.13 | -12.66 |
| Lysine | 5.33 | 3.80 | 1.40 | 2.77 | 6.49 | 3.00 | 0.99 | -11.91 |
| Arginine | 4.89 | 4.30 | 1.14 | 3.95 | 15.86 | 3.00 | 1.01 | -22.31 |
| Aspartic acid | 3.06 | 4.10 | 0.75 | 3.81 | 9.66 | 2.50 | 0.77 | -13.34 |
| Glutamic acid | 2.18 | 3.10 | 0.70 | 2.91 | 7.75 | 2.50 | 0.64 | -12.63 |
| Correlation coefficient (h) | | | | 0.17 | 0.21 | 0.40 | 0.50 | 0.03 |
a Observed frequency of each amino acid in the tissue-homing heptapeptide set obtained from peroral phage display
b Observed frequency of each amino acid in the phage library (Ph.D.-C7C™ library)
c Ratio of the amino acid frequency in the homing peptides (a) to the amino acid frequency in the phage library (b)
d Calculated from hydrophobicities of the individual groups that make up each side chain, using data for the partition coefficient between water and octanol of many model compounds.
e Hydrophilicity was measured by the partition coefficient K of the model for each side chain from vapor → water; hydrophobicity for water → cyclohexane. For ionizing side chains, the values were corrected for the fraction of each side chain that is ionized at pH 7. Both scales were normalized to zero for the value of Gly.
f Some values were measured from the relative solubilities of the amino acid in water and ethanol or dioxane.
g Measured from the partition coefficient between water and octanol of the N-acetyl amino acid amides.
h Correlation coefficients between the relative ratio (c) and each hydrophobicity/hydrophilicity scale.
*Reference [26].
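As a minimal sketch of how the Ratio column and the correlation row of this table could be reproduced, the Python below divides the homing frequency by the library frequency and correlates the result with one hydrophobicity scale. Only four rows are copied from the table for brevity, so `r_subset` is not comparable to the reported coefficients. The use of the Pearson coefficient is an assumption (the source does not say which coefficient was used), and the function names are illustrative.

```python
from math import sqrt

# (homing frequency, library frequency, N-acetyl amide hydrophobicity)
# Values copied from the table above; only a subset of rows is shown.
ROWS = {
    "Alanine": (6.85, 6.50, -0.31),
    "Glycine": (4.02, 2.20, 0.00),
    "Serine": (13.42, 8.60, 0.04),
    "Lysine": (5.33, 3.80, 0.99),
}


def frequency_ratio(homing, library):
    """Ratio of the frequency in homing peptides to that in the library."""
    return homing / library


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


ratios = [frequency_ratio(h, lib) for h, lib, _ in ROWS.values()]
scale = [s for _, _, s in ROWS.values()]
# Correlation on this 4-row subset only (assumed Pearson).
r_subset = pearson(ratios, scale)
```

Rounded to two decimals, the ratios for alanine and glycine agree with the tabulated 1.05 and 1.83.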
Figure 1. Predictive features of the model. The model was constructed with zero neurons in the hidden layer and one neuron in the output layer, using binary descriptors. (A) Enrichment curve, (B) histogram of actives vs. model values, and (C) Receiver Operating Characteristic (ROC) curve. The features for the training and test sets are plotted in the left and right panels, respectively.
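The area under a ROC curve such as the one in panel (C) can be read as the probability that a randomly chosen permeable peptide receives a higher model score than a randomly chosen impermeable one. The sketch below computes it by direct pairwise comparison. This is a generic illustration, not the software used in the study, and the scores are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model outputs for two permeable and two impermeable peptides.
auc = roc_auc([0.9, 0.4], [0.6, 0.1])
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.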
Prediction accuracy for models with various network architectures (a).
| | Binary Descriptor | | | | | VHSE Descriptor | | | |
| | 1 : 1 Data set | | 1 : 3 Data set | | | 1 : 1 Data set | | 1 : 3 Data set | |
| B (b) | Training | Test | Training | Test | B (b) | Training | Test | Training | Test |
| 0 | 0.84 | 0.77 | 0.83 | 0.79 | 0 | 0.80 | 0.76 | 0.79 | 0.77 |
| 1 | 0.92 | 0.73 | 0.90 | 0.76 | 1 | 0.87 | 0.70 | 0.84 | 0.75 |
| 2 | 0.97 | 0.71 | 0.94 | 0.77 | 2 | 0.89 | 0.71 | 0.86 | 0.75 |
| 3 | 0.98 | 0.71 | 0.97 | 0.74 | 3 | 0.92 | 0.70 | 0.90 | 0.72 |
a In the network architecture A-B-C, A is the total number of descriptors in the input layer, equal to 7 (the sequence length of a peptide) × (the number of descriptors for each amino acid); B and C are the numbers of neurons in the hidden and output layers, respectively. For instance, the network architecture (7 × 20)-0-1 specifies a model constructed with zero neurons in the hidden layer and one neuron in the output layer, using the binary descriptor. All the models have one neuron in the output layer.
b The number (B) of neurons in the hidden layer.
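To make the (7 × 20)-0-1 architecture concrete, the following sketch encodes a heptapeptide as 7 × 20 = 140 binary inputs and passes them to a single output neuron with no hidden layer. Several details are assumptions rather than facts from the source: the binary descriptor is read here as a one-hot encoding over the 20 standard amino acids, the residue ordering is arbitrary, the output activation is taken to be a sigmoid, and the weights are untrained placeholders.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order


def encode_binary(peptide):
    """One-hot encode a heptapeptide into a flat 7 x 20 = 140 vector."""
    if len(peptide) != 7:
        raise ValueError("expected a 7-residue peptide")
    vector = []
    for residue in peptide:
        one_hot = [0.0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1.0
        vector.extend(one_hot)
    return vector


def predict(vector, weights, bias):
    """Single output neuron, no hidden layer: sigmoid of a weighted sum."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


x = encode_binary("ACDEFGH")   # hypothetical sequence
weights = [0.0] * len(x)       # untrained placeholder weights
score = predict(x, weights, 0.0)
```

With zero hidden neurons the model is effectively a logistic regression on the encoded sequence, which is consistent with the table above, where adding hidden neurons raises the training score but not the test score.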
Figure 2. Distribution of prediction scores for all permutations of three peptide sequences.
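A permutation analysis of this kind can be sketched by enumerating every distinct reordering of a heptapeptide and scoring each one with the trained model. The snippet below covers only the enumeration step. The sequence is hypothetical, and collapsing duplicate orderings (which arise when a residue is repeated) is an assumption about how such an analysis might be set up.

```python
from itertools import permutations


def distinct_permutations(peptide):
    """Return all distinct reorderings of the residues in a peptide."""
    return sorted({"".join(p) for p in permutations(peptide)})


# A heptapeptide with seven different residues has 7! = 5040 orderings.
perms = distinct_permutations("ACDEFGH")  # hypothetical sequence
```

Each entry of `perms` would then be encoded and passed to the model, and the resulting scores collected into a histogram.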
The results of validation for models with network architecture (7 × 20)-0-1 (a).
| Leave-5%-out cross-validation (b) | | Decoy analysis (c) | | | |
| 1 : 1 Data set | | Real set | | Decoy set | |
| Training (d) | Test (d) | Training | Test | Training | Test |
| 0.841 ± 0.002 | 0.760 ± 0.005 | 0.82 | 0.74 | 0.70 | 0.47 |
a The network architecture (7 × 20)-0-1 specifies a model constructed with zero neurons in the hidden layer and one neuron in the output layer, using the binary descriptor.
b The results of a rigorous test using the leave-5%-out method on the 1 : 1 data sets.
c Comparison of ROC scores between the real and decoy sets using non-redundant data.
d The results of 20 rigorous tests are averaged and expressed as mean ± standard deviation.
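The leave-5%-out procedure and the mean ± standard deviation summary can be sketched as follows. The source states only that 20 tests were averaged; treating them as 20 disjoint folds of about 5% each, the shuffling seed, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def leave_5_percent_out_splits(n_samples, seed=0):
    """Yield 20 (train, test) index splits, each holding out about 5%."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_folds = 20
    for k in range(n_folds):
        test = indices[k::n_folds]          # every 20th shuffled index
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def summarize(scores):
    """Mean and sample standard deviation of per-split scores."""
    return mean(scores), stdev(scores)


splits = list(leave_5_percent_out_splits(200))  # hypothetical data set size
```

In use, a model would be trained on each `train` split, scored on the matching `test` split, and the 20 scores passed to `summarize`.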
Figure 3. The features of the model constructed with the decoy set. The models were constructed with zero neurons in the hidden layer and one neuron in the output layer, using the binary descriptor. (A) Training set and (B) test set.
Comparison of truth table statistics for the test sets for two models
| Network architecture | 1 : 1 Data set | | | | | 1 : 3 Data set | | | | |
| | SE (a) | SP (b) | PPV (c) | NPV (d) | Acc (e) | SE (a) | SP (b) | PPV (c) | NPV (d) | Acc (e) |
| (7 × 20)-0-1 | 74 | 67 | 69 | 72 | 70 | 32 | 94 | 65 | 81 | 79 |
| (7 × 8)-0-1 | 70 | 72 | 71 | 70 | 71 | 19 | 96 | 59 | 78 | 76 |
a SE = Sensitivity : the proportion of all intestinal barrier-permeable peptides correctly predicted, SE = TP/(TP + FN) where TP is the number of intestinal barrier-permeable peptides correctly predicted and FN is the number of intestinal barrier-permeable peptides incorrectly predicted as impermeable peptides.
b SP = Specificity : the proportion of intestinal barrier-impermeable peptides correctly predicted, SP = TN/(TN + FP) where TN is the number of intestinal barrier-impermeable peptides correctly predicted and FP is the number of intestinal barrier-impermeable peptides incorrectly predicted as permeable peptides.
c PPV = Positive Predictive Value : the probability that a predicted permeable peptide is in fact a barrier-permeable peptide, PPV = TP/(TP + FP).
d NPV = Negative Predictive Value : the probability that a predicted intestinal barrier-impermeable peptide is in fact an impermeable peptide, NPV = TN/(TN + FN).
e Acc = Accuracy : the percentage of all predictions that are correct, Acc = (TP + TN)/Total.
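The five definitions above translate directly into code. In the sketch below the confusion-matrix counts are hypothetical: they were chosen so that SE and SP match the first table row under the assumption of a balanced test set of 100 permeable and 100 impermeable peptides, which the source does not state.

```python
def truth_table_stats(tp, fp, tn, fn):
    """Compute SE, SP, PPV, NPV and Acc from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "SE": tp / (tp + fn),      # sensitivity
        "SP": tn / (tn + fp),      # specificity
        "PPV": tp / (tp + fp),     # positive predictive value
        "NPV": tn / (tn + fn),     # negative predictive value
        "Acc": (tp + tn) / total,  # accuracy
    }


# Hypothetical counts for a balanced 100 + 100 test set.
stats = truth_table_stats(tp=74, fp=33, tn=67, fn=26)
```

Under that assumption the derived PPV and NPV (about 0.69 and 0.72) agree with the tabulated 69 and 72. The 1 : 3 columns show how SE drops sharply while Acc rises when negatives dominate, which is why accuracy alone is a poor summary on imbalanced data.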
Figure 4. A schematic view of the peroral phage display procedure. After the third round of biopanning, individual recombinant phages were randomly selected from each organ tissue eluate, and the peptide sequences encoded in their genomes were analyzed.