Liliana I Barbosa-Santillán1, Juan J Sánchez-Escobar2, M Angeles Calixto-Romo3, Luis F Barbosa-Santillán1.
Abstract
We present Identify Selective Antibacterial Peptides (ISAP), an approach based on the meaning of abstracts. Laboratories and researchers have significantly increased their reporting of discoveries related to antibacterial peptides in primary publications. Finding the antibacterial peptides reported in these publications is important because they can yield antibiotics of different generations that attack and destroy bacteria. Unfortunately, researchers use heterogeneous forms of natural language to describe their discoveries, sometimes without giving the peptide sequence. We therefore propose that antibacterial peptides reported in PubMed can be identified and predicted by learning the meaning of words rather than the peptide sequences. The ISAP approach consists of two stages: training and discovering. ISAP found that 35% of the sampled abstracts contained antibacterial peptides, a result we tested against the updated Antimicrobial Peptide Database 2 (APD2). ISAP predicted that 45% of the abstracts contained antibacterial peptides; that is, it found 810 antibacterial peptides that had not been classified as such and are therefore not reported in APD2. As a result, this new search tool would complement APD2 with a set of candidate antibacterial peptides. Finally, 20% of the abstracts were not semantically related to APD2.
Year: 2016 PMID: 27366202 PMCID: PMC4913023 DOI: 10.1155/2016/1505261
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. (R1) percentage > 85: antibacterial (592.0); (R2) percentage ≤ 85 and percentage ≤ 79: noAntibacterial (360.0); and (R3) percentage ≤ 85 and percentage > 79: canBeAntibacterial (848.0/38.0).
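The three rules in Figure 1 amount to two thresholds on a single attribute, so they can be sketched as a small function. This is a minimal illustration that assumes the percentage is a plain number between 0 and 100; the function name `classify` is hypothetical and does not come from the original work.

```python
def classify(percentage: float) -> str:
    """Apply rules R1-R3 from Figure 1 to one similarity percentage."""
    if percentage > 85:
        # R1: percentage > 85
        return "antibacterial"
    if percentage > 79:
        # R3: percentage <= 85 and percentage > 79
        return "canBeAntibacterial"
    # R2: percentage <= 85 and percentage <= 79
    return "noAntibacterial"


if __name__ == "__main__":
    for value in (98.116781, 82.0, 13.075785):
        print(value, classify(value))
```

The boundary values fall on the lower side of each rule: exactly 85 is canBeAntibacterial and exactly 79 is noAntibacterial, as the ≤ signs in the caption require.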
Some of the 1762 antibacterial-peptide patterns.
| ID | Peptide |
|---|---|
| AP00001 | GLWSKIKEVGKEAAKAAAKAAGKAALGAVSEAV |
| AP00002 | YVPLPNVPQPGRRPFPTFPGQGPFNPKIKWPQGY |
| AP00004 | NLCERASLTWTGNCGNTGHCDTQCRNWESAKHGACHKRGNWKCFCYFDC |
| AP00005 | VFIDILDKVENAIHNAAQVGIGFAKPFEKLINPK |
| AP00006 | GNNRPVYIPQPRPPHPRI |
| AP00007 | GNNRPVYIPQPRPPHPRL |
| AP00008 | RLCRIVVIRVCR |
| AP00009 | RFRPPIRRPPIRPPFYPPFRPPIRPPIFPPIRPPFRPPLGPFP |
| AP00010 | RRIRPRPPRLPRPRPRPLPFPRPGPRPIPRPLPFPRPGPRPIPRPLPFPRPGPRPIPRPL |
| AP00011 | WNPFKELERAGQRVRDAVISAAPAVATVGQAAAIARG |
| AP00012 | GLFDIIKKIAESI |
| AP00013 | GLFDIIKKIAESF |
| ⋮ | ⋮ |
| AP02172 | FFGSLLSLGSKLLPSVFKLFQRKKE |
Box 1. One example of a pattern abstract.
Partial view of the terms-abstracts matrix, which was generated with 8892 terms and contains data related to the antibacterial-pattern abstract collection.
| Terms | | | | |
|---|---|---|---|---|
| antimicrobi | 0.0735581075541086 | − 0.0244847301347335 | − 0.00710470395642022 | − 0.0252782143833409 |
| peptid | 0.0215719016953859 | − 0.00659599218340233 | 0.00526833571530154 | − 0.0027931356297511 |
| identifi | 0.0778285095429026 | 0.0363743255478556 | − 0.0125611882021721 | 0.0335513355175925 |
| puparum | 0.0031890062317054 | − 0.00433904701191521 | 0.000278800242086269 | − 0.000536875590521908 |
| cdna | 0.0752034473971184 | − 0.100948929444321 | − 0.120231206062553 | 0.00808336458293407 |
| clone | 0.0506850130947056 | − 0.0638498278798138 | − 0.0757289501422613 | 0.00855381871117201 |
| shen | 0.0015945031158527 | − 0.0021695235059576 | 0.000139400121043145 | − 0.000268437795260781 |
| cheng | 0.00269501661729059 | − 0.00524105545332141 | 0.0058427721000811 | 0.00350622976781349 |
| altosaar | 0.0015945031158527 | − 0.0021695235059576 | 0.000139400121043145 | − 0.000268437795260801 |
| state | 0.0113803981224781 | − 0.0145520357690956 | 0.0142282472637731 | − 0.000138326518028741 |
| kei | 0.0197729657477364 | − 0.0367429529088927 | − 0.0172473645432955 | 0.000106625559641722 |
| laboratori | 0.0262664068350282 | − 0.0268798478912976 | − 0.0148428311950068 | − 0.0498138841648604 |
| rice | 0.0015945031158527 | − 0.0021695235059576 | 0.000139400121043145 | − 0.000268437795260802 |
| biologi | 0.0310863666985836 | − 0.027616438954597 | 0.00141517510229895 | − 0.0374922667242602 |
| ministri | 0.00401909815031002 | − 0.00553150270576137 | − 0.00244184305151988 | − 0.00110105794405476 |
| agricultur | 0.0196836003644828 | − 0.0365550561462568 | − 0.0376858882953384 | 0.00278678534624657 |
| molecular | 0.0410216058415589 | − 0.037285345518391 | 0.00109576583310193 | − 0.0519376532603683 |
| crop | 0.0015945031158527 | − 0.0021695235059576 | 0.000139400121043145 | − 0.000268437795260802 |
| pathogen | 0.0276770473353559 | 0.00130713009997107 | 0.0051845473156494 | − 0.00667894125821291 |
| insect | 0.00541558702390805 | − 0.00725187831084491 | 0.00234699169994147 | − 0.00139023706808898 |
| | X | X | X | X |
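To make the idea of a terms-abstracts matrix concrete, the sketch below builds one from raw term counts and compares two abstracts by cosine similarity expressed as a percentage. Everything here is an assumption for illustration: the three toy abstracts are invented from stems that appear in the table, the function names are hypothetical, and raw counts are used even though the published values are signed and fractional, which suggests a further weighting or projection step that this record does not describe.

```python
import math
from collections import Counter


def term_abstract_matrix(abstracts):
    """Return (vocabulary, matrix) with matrix[i][j] = count of term i in abstract j."""
    counts = [Counter(text.lower().split()) for text in abstracts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix


def column(matrix, j):
    """Extract the vector of one abstract (one column of the matrix)."""
    return [row[j] for row in matrix]


def cosine_percent(u, v):
    """Cosine similarity of two vectors, scaled to a percentage."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 100.0 * dot / norm if norm else 0.0


# Hypothetical, already-stemmed toy abstracts (not taken from the collection).
toy_abstracts = [
    "antimicrobi peptid identifi cdna clone",
    "antimicrobi peptid pathogen insect",
    "rice crop agricultur ministri",
]
vocabulary, matrix = term_abstract_matrix(toy_abstracts)
similar = cosine_percent(column(matrix, 0), column(matrix, 1))
unrelated = cosine_percent(column(matrix, 0), column(matrix, 2))

if __name__ == "__main__":
    print(f"abstract 0 vs 1: {similar:.2f}%")
    print(f"abstract 0 vs 2: {unrelated:.2f}%")
```

Abstracts that share stems score above zero, while abstracts with no terms in common score exactly zero, which is the behaviour a percentage-based ranking relies on.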
Overall classification accuracy is 97.89%, and the misclassification (error) rate is 2.11%.
| Correctly classified instances | 1762 (97.8889%) |
| Incorrectly classified instances | 38 (2.1111%) |
| Kappa statistic | 0.9666 |
| Mean absolute error | 0.0269 |
| Root mean squared error | 0.1161 |
| Relative absolute error | 6.3555% |
| Root relative squared error | 25.2452% |
| Total number of instances | 1800 |
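The accuracy, error rate, and Kappa statistic in this table can be recomputed from the counts in the confusion matrix reported further down in this record. The sketch below assumes the standard definition of Cohen's kappa (observed agreement corrected for chance agreement); the variable names are illustrative.

```python
# Confusion matrix counts as reported in this record (rows: actual, columns: predicted).
confusion = [
    [592, 0, 38],
    [0, 360, 0],
    [0, 0, 810],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))

accuracy = correct / total
error_rate = 1 - accuracy

# Chance agreement: sum over classes of (row total * column total) / total^2.
row_totals = [sum(row) for row in confusion]
col_totals = [sum(col) for col in zip(*confusion)]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

kappa = (accuracy - expected) / (1 - expected)

if __name__ == "__main__":
    print(f"instances: {total}, correct: {correct}")
    print(f"accuracy: {100 * accuracy:.4f}%  error: {100 * error_rate:.4f}%")
    print(f"kappa: {kappa:.4f}")
```

With these counts the script reproduces 1762 of 1800 correct instances, 97.8889% accuracy, a 2.1111% error rate, and a kappa of 0.9666.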
Detailed accuracy by class.
| TP rate | FP rate | Precision | Recall | F-measure | ROC area | Class |
|---|---|---|---|---|---|---|
| 0.94 | 0 | 1 | 0.94 | 0.969 | 0.973 | antibacterial |
| 1 | 0 | 1 | 1 | 1 | 1 | noAntibacterial |
| 1 | 0.038 | 0.955 | 1 | 0.977 | 0.975 | canBeAntibacterial |
The confusion matrix.
| a | b | c | Classified as |
|---|---|---|---|
| 592 | 0 | 38 | a = antibacterial |
| 0 | 360 | 0 | b = noAntibacterial |
| 0 | 0 | 810 | c = canBeAntibacterial |
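The per-class figures in the detailed accuracy table follow from the confusion matrix. The sketch below assumes that rows are actual classes and columns are predicted classes, in the order antibacterial, noAntibacterial, canBeAntibacterial; that ordering is inferred from the per-class rates and is not stated in the record. The function name is hypothetical.

```python
LABELS = ["antibacterial", "noAntibacterial", "canBeAntibacterial"]
CONFUSION = [
    [592, 0, 38],
    [0, 360, 0],
    [0, 0, 810],
]


def per_class_metrics(matrix, labels):
    """Compute TP rate (recall), FP rate, precision and F-measure for each class."""
    total = sum(sum(row) for row in matrix)
    col_totals = [sum(col) for col in zip(*matrix)]
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp   # actual class i, predicted as something else
        fp = col_totals[i] - tp    # predicted class i, actually something else
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {
            "tp_rate": recall,
            "fp_rate": fp_rate,
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)

if __name__ == "__main__":
    for label, values in metrics.items():
        print(label, {name: round(value, 3) for name, value in values.items()})
```

The 38 misclassified instances are all actual antibacterial abstracts predicted as canBeAntibacterial, which is why only those two classes deviate from perfect scores.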
Five peptide patterns selected at random.
| ID | Peptide |
|---|---|
| Pattern 1 | AACSDRAHGHICESFKSFCKDSGRNGVKLRANCKKTCGLC |
| Pattern 2 | GLFDVIKKVASVIGGL |
| Pattern 3 | AILTTLANWARKFL |
| Pattern 4 | AKKVFKRLEKLFSKIQNDK |
| Pattern 5 | ALSILRGLEKLAKMGIALTNCKATKKC |
Figure 2. The results of the first pattern “AACSDRAHGHICESFKSFCKDSGRNGVKLRANCKKTCGLC.”
For the second pattern, the following research articles were obtained, listed in descending order of percentage.
| PMID | Title | Percentage |
|---|---|---|
| 23069634 | Structural and activity changes in three bioactive | 98.116781% |
| 11478963 | Enhancing the hypotensive effect and diminishing | 13.075785% |
| 15917539 | Proline conformation-dependent antimicrobial activity | 10.655851% |
| 16470724 | Host-defence skin peptides of the Australian streambank | 9.804270% |
| 9231329 | Hydrophobic effects on antibacterial and channel-forming | 8.624214% |
| ⋮ | ⋮ | |
Figure 3. A distribution of the antibacterial-peptide patterns according to their precision.
Nine classes of the antibacterial-peptide patterns according to their precision. Frequency tabulation for similaritypatterns.similarity.
| Class | Lower limit | Upper limit | Midpoint | Frequency | Relative frequency | Cumulative frequency | Cum. rel. frequency |
|---|---|---|---|---|---|---|---|
| at or below | | −10.0 | | 0 | 0.0000 | 0 | 0.0000 |
| 1 | −10.0 | 3.33333 | −3.33333 | 67 | 0.2815 | 67 | 0.2815 |
| 2 | 3.33333 | 16.6667 | 10.0 | 0 | 0.0000 | 67 | 0.2815 |
| 3 | 16.6667 | 30.0 | 23.3333 | 0 | 0.0000 | 67 | 0.2815 |
| 4 | 30.0 | 43.3333 | 36.6667 | 0 | 0.0000 | 67 | 0.2815 |
| 5 | 43.3333 | 56.6667 | 50.0 | 0 | 0.0000 | 67 | 0.2815 |
| 6 | 56.6667 | 70.0 | 63.3333 | 4 | 0.0168 | 71 | 0.2983 |
| 7 | 70.0 | 83.3333 | 76.6667 | 10 | 0.0420 | 81 | 0.3403 |
| 8 | 83.3333 | 96.6667 | 90.0 | 83 | 0.3487 | 164 | 0.6891 |
| 9 | 96.6667 | 110.0 | 103.333 | 74 | 0.3109 | 238 | 1.0000 |
| above | 110.0 | | | 0 | 0.0000 | 238 | 1.0000 |
Mean = 67.0756; standard deviation = 42.5446.
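The class limits, midpoints, and relative and cumulative frequencies in the tabulation can be rebuilt from the nine class counts and the overall range of −10 to 110. The sketch below assumes equal-width classes, which the listed limits support; the function name is hypothetical. It also estimates the mean from class midpoints, which is only a grouped approximation and is not expected to equal the reported mean, since that was presumably computed from the individual values.

```python
def frequency_table(lower, upper, counts):
    """Build equal-width class rows with relative and cumulative frequencies."""
    width = (upper - lower) / len(counts)
    total = sum(counts)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        low = lower + i * width
        high = low + width
        cumulative += count
        rows.append({
            "class": i + 1,
            "lower": low,
            "upper": high,
            "midpoint": (low + high) / 2,
            "frequency": count,
            "relative": count / total,
            "cumulative": cumulative,
            "cum_relative": cumulative / total,
        })
    return rows


# Frequencies of classes 1-9 as listed in the tabulation.
counts = [67, 0, 0, 0, 0, 4, 10, 83, 74]
rows = frequency_table(-10.0, 110.0, counts)

# Grouped estimate: each observation is replaced by its class midpoint.
grouped_mean = sum(r["midpoint"] * r["frequency"] for r in rows) / sum(counts)

if __name__ == "__main__":
    for r in rows:
        print(r["class"], round(r["lower"], 4), round(r["upper"], 4),
              r["frequency"], round(r["relative"], 4), round(r["cum_relative"], 4))
    print("grouped mean estimate:", round(grouped_mean, 4))
```

The grouped estimate lands near 66.86, close to but not identical with the reported mean of 67.0756, as expected for a midpoint approximation.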
Figure 4. Twelve antibacterial-peptide patterns and their first eleven research articles related to their precision.