Abstract
Chloroplasts are organelles found in cells of green plants and eukaryotic algae that conduct photosynthesis. Knowing a protein's subchloroplast location provides insight into the protein's function and the microenvironment in which it interacts with other molecules. In this paper, we present BS-KNN, a bit-score weighted K-nearest neighbor method for predicting proteins' subchloroplast locations. The method makes predictions based on the bit-score weighted Euclidean distance calculated from the composition of selected pseudo-amino acids. Our method achieved 76.4% overall accuracy in assigning proteins to four subchloroplast locations in cross-validation. When tested on an independent set that the method did not see during training and feature selection, it achieved a consistent overall accuracy of 76.0%. The method was also applied to predict subchloroplast locations of proteins in the chloroplast proteome and validated against proteins in Arabidopsis thaliana. The software and datasets of the proposed method are available at https://edisk.fandm.edu/jing.hu/bsknn/bsknn.html.
Keywords: bit-score weighted K-nearest neighbor method; feature selection; pseudo-amino acids; subchloroplast localization
Year: 2012 PMID: 22267906 PMCID: PMC3256996 DOI: 10.4137/EBO.S8681
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Prediction accuracies of K-NN for various K values (1–15) based on Euclidean distance (ED) vs. bit-score weighted Euclidean distance (BS-WED).
The performance of BS-KNN based on selected pseudo-amino acid composition.
| Subchloroplast location | 4-fold cross-validation (S60_A): recall | 4-fold cross-validation (S60_A): precision | Independent test (S60_B): recall | Independent test (S60_B): precision |
|---|---|---|---|---|
| Thylakoid lumen | 75.0% | 77.4% | 50.0% | 66.7% |
| Stroma | 64.9% | 63.2% | 88.9% | 57.1% |
| Thylakoid membrane | 85.3% | 84.5% | 84.0% | 91.3% |
| Envelope | 62.5% | 64.5% | 62.5% | 71.4% |
| Overall accuracy | | | | |
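The recall and precision columns reported for each location follow the usual per-class definitions: recall is the fraction of proteins truly in a location that were assigned to it, and precision is the fraction of proteins assigned to a location that truly belong there. The sketch below shows one way these values, together with overall accuracy, could be computed from paired true and predicted labels. The function name `per_class_metrics` and the toy labels are illustrative assumptions and are not taken from the BS-KNN software.

```python
from collections import Counter


def per_class_metrics(true_labels, predicted_labels):
    """Return ({label: (recall, precision)}, overall_accuracy).

    Hypothetical helper for illustration only.
    """
    true_count = Counter(true_labels)        # proteins truly in each location
    pred_count = Counter(predicted_labels)   # proteins assigned to each location
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    metrics = {}
    for label in true_count:
        recall = correct[label] / true_count[label]
        # Precision is undefined when a location is never predicted; use 0.0.
        precision = (
            correct[label] / pred_count[label] if pred_count[label] else 0.0
        )
        metrics[label] = (recall, precision)

    accuracy = sum(correct.values()) / len(true_labels)
    return metrics, accuracy


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    truth = ["lumen", "lumen", "stroma", "stroma"]
    guess = ["lumen", "stroma", "stroma", "stroma"]
    print(per_class_metrics(truth, guess))
```

A location can therefore combine high recall with low precision, as stroma does on the independent test set, when many proteins from other locations are also assigned to it.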
The performance of BS-KNN using selected pseudo-amino acid composition on S60 dataset by self-consistency test and jackknife test.
| Subchloroplast location | Self-consistency test: recall | Self-consistency test: precision | Jackknife test: recall | Jackknife test: precision |
|---|---|---|---|---|
| Thylakoid lumen | 97.5% | 97.5% | 77.5% | 79.5% |
| Stroma | 100% | 100% | 73.9% | 61.8% |
| Thylakoid membrane | 99.2% | 98.4% | 85.0% | 83.7% |
| Envelope | 97.5% | 100% | 47.5% | 63.3% |
| Overall accuracy | | | | |
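The gap between the self-consistency and jackknife columns reflects how the two tests treat the protein being predicted. In a self-consistency (resubstitution) test the protein stays in the training set, whereas in a jackknife (leave-one-out) test it is removed before prediction. The following minimal sketch illustrates the two protocols under stated assumptions: `nearest_neighbor` is a toy 1-NN stand-in and not the BS-KNN classifier, and all names and data are hypothetical.

```python
def nearest_neighbor(train_x, train_y, query):
    """Toy 1-NN classifier (a stand-in, not BS-KNN)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def self_consistency_accuracy(samples, labels, classify):
    """Resubstitution: each sample remains in the training set."""
    hits = sum(classify(samples, labels, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)


def jackknife_accuracy(samples, labels, classify):
    """Leave-one-out: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, samples[i]) == labels[i]:
            hits += 1
    return hits / len(samples)


if __name__ == "__main__":
    # Toy one-dimensional features, not data from the paper.
    xs = [(0.0,), (0.1,), (1.0,), (1.1,), (0.6,)]
    ys = ["a", "a", "b", "b", "a"]
    print(self_consistency_accuracy(xs, ys, nearest_neighbor))
    print(jackknife_accuracy(xs, ys, nearest_neighbor))
```

Because a nearest-neighbor classifier can always find the query itself in a resubstitution test, the self-consistency figures are optimistic, and the jackknife figures are the more realistic estimate of performance on unseen proteins.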
Comparison of BS-KNN with previously published subchloroplast localization methods on S60 dataset by jackknife test.
| Location | SubChlo accuracy | ChloroRF accuracy | SubIdent accuracy | BS-KNN accuracy |
|---|---|---|---|---|
| Thylakoid lumen | 43.2% | 38.6% | 64.4% | |
| Stroma | 67.4% | 57.1% | 85.7% | |
| Thylakoid membrane | 83.7% | 87.5% | 98.2% | |
| Envelope | 40.0% | 47.5% | 80.0% | |
| Overall accuracy | 67.2% | 67.4% | 89.3% | |
List of computational methods for protein subchloroplast localization.
| Method | Details | Application |
|---|---|---|
| SubChlo | The method was based on the ET-KNN (evidence theoretic K-nearest neighbor). Using pseudo-amino acid compositions, the method achieved 67.2% overall prediction accuracy. | Predicting subchloroplast localizations of chloroplast proteins from protein sequence. |
| ChloroRF | The method was based on Random Forest. Using 531 physicochemical properties obtained from AAindex dataset, the method achieved 67.4% overall prediction accuracy. | Predicting subchloroplast localizations of chloroplast proteins from protein sequence. |
| SubIdent | The method was based on Support Vector Machine. Using features extracted by discrete wavelet transform (DWT) from amino acids’ hydrophobicity and polarity values, the method achieved 89.3% overall prediction accuracy in subchloroplast location. | The method can be applied to classify whether a protein is a mitochondrial or a chloroplast protein. If the protein is mitochondrial, the method predicts its submitochondrial location; otherwise it predicts its subchloroplast location. |
| BS-KNN | The method was based on a bit-score weighted K-nearest neighbor (BS-KNN) classifier for predicting protein subchloroplast locations. The method makes predictions based on the bit-score weighted Euclidean distance calculated from the composition of selected pseudo-amino acids. It achieved about 76% overall prediction accuracy. | Predicting subchloroplast localizations of chloroplast proteins from protein sequence. |
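The BS-KNN row describes a K-nearest neighbor vote in which the Euclidean distance between feature vectors is weighted by an alignment bit score. This record does not give the exact weighting formula, so the sketch below makes an explicit assumption: the Euclidean distance is divided by the bit score, so that training proteins with higher sequence similarity to the query appear closer. The function names, the bit-score floor of 1.0, and the toy vectors are all hypothetical; in practice the bit scores would be precomputed by a sequence alignment tool.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def bs_knn_predict(query_vec, query_bits, train_vecs, train_labels, k=3):
    """Predict a location by majority vote among the k nearest neighbors.

    query_bits[i] is the alignment bit score between the query and
    training protein i (hypothetical precomputed input).
    """
    scored = []
    for i, vec in enumerate(train_vecs):
        # ASSUMPTION: weight the distance by dividing by the bit score.
        # Scores are floored at 1.0 to avoid division by zero.
        weight = max(query_bits[i], 1.0)
        scored.append((euclidean(query_vec, vec) / weight, train_labels[i]))

    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-dimensional features, not data from the paper.
    vecs = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
    labs = ["stroma", "stroma", "envelope", "envelope"]
    print(bs_knn_predict((0.4, 0.4), [1.0, 1.0, 1.0, 1.0], vecs, labs))
    print(bs_knn_predict((0.4, 0.4), [1.0, 1.0, 50.0, 50.0], vecs, labs))
```

With uniform bit scores the sketch reduces to ordinary K-NN on Euclidean distance. Raising the bit scores of the two envelope proteins pulls them ahead of the stroma proteins in the neighbor ranking, which is the kind of effect the ED vs. BS-WED comparison in Figure 1 evaluates.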