Shunsuke Shigemitsu, Wei Cao, Tohru Terada, Kentaro Shimizu.
Abstract
BACKGROUND: "Tail-anchored (TA) proteins" is a collective term for transmembrane proteins with a C-terminal transmembrane domain (TMD) and without an N-terminal signal sequence. TA proteins account for approximately 3-5% of all transmembrane proteins and mediate membrane fusion, regulation of apoptosis, and vesicular transport. The combined use of TMD and signal sequence prediction tools is typically required to predict TA proteins.
Keywords: HMMs; Machine learning; Membrane proteins; Prediction; TA proteins
Year: 2016 PMID: 27634135 PMCID: PMC5025589 DOI: 10.1186/s12859-016-1202-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Likelihood score classification
| Negative data | AUC | Sensitivity | Specificity | TP | FP | FN | TN |
|---|---|---|---|---|---|---|---|
| SP | 0.957 | 0.877 | 0.917 | 142 | 108 | 20 | 1199 |
| MP | 0.971 | 0.901 | 0.934 | 146 | 27 | 16 | 383 |
| NO | 0.967 | 0.895 | 0.929 | 145 | 355 | 17 | 4675 |
| Total | 0.963 | 0.864 | 0.952 | 140 | 327 | 22 | 6420 |
Each row corresponds to the specified individual negative data. The “total” at the bottom of the table shows the cross-validation results for all negative data. TP, FP, FN, and TN represent true-positive, false-positive, false-negative, and true-negative values, respectively
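The sensitivity and specificity columns follow directly from the four counts: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The following is a minimal sketch that recomputes both for the SP row; the class name `Confusion` is an illustrative assumption and not part of the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one negative dataset (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        # Fraction of true TA proteins that were predicted as TA.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of negative sequences that were correctly rejected.
        return self.tn / (self.tn + self.fp)


# Counts copied from the SP row of the likelihood score table.
sp = Confusion(tp=142, fp=108, fn=20, tn=1199)
print(round(sp.sensitivity, 3), round(sp.specificity, 3))  # 0.877 0.917
```

The same check can be run on any other row to confirm that the reported rates agree with the reported counts.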
Fig. 1 Results of likelihood score-based prediction. In panels a to d, the upper graphs show the ROC curves, and the lower graphs show the histograms of likelihood scores. The vertical red lines represent the thresholds used to calculate sensitivity and specificity. For the NO and ALL sets, the MP scores are used as the likelihood scores (denoted as Smp*), and the threshold value of the MP dataset is used as their threshold value.
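To make the relationship between a score histogram, a threshold, and the ROC curve concrete, here is a small sketch. It computes the AUC as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count as one half), and it reads sensitivity and specificity at a single threshold. The scores and the threshold are invented toy values, and the function names are assumptions; none of them come from the paper.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'TA protein'."""
    tp = sum(s >= threshold for s in pos_scores)
    tn = sum(s < threshold for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


# Toy likelihood scores (hypothetical, for illustration only).
ta_scores = [0.9, 0.8, 0.4]
negative_scores = [0.7, 0.3, 0.2, 0.1]

toy_auc = auc(ta_scores, negative_scores)
toy_sens, toy_spec = sens_spec(ta_scores, negative_scores, threshold=0.5)
print(toy_auc, toy_sens, toy_spec)
```

Moving the threshold trades sensitivity against specificity, whereas the AUC summarizes the ranking quality over all thresholds.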
Classification by decoding
| Name | Sensitivity | Specificity | TP | FP | FN | TN |
|---|---|---|---|---|---|---|
| SP | 0.901 | 0.881 | 146 | 153 | 16 | 1154 |
| MP | 0.901 | 0.924 | 146 | 23 | 16 | 387 |
| NO | 0.901 | 0.960 | 146 | 199 | 16 | 4831 |
| Total | 0.901 | 0.944 | 146 | 375 | 16 | 6372 |
Each row corresponds to the specified individual negative data. The “total” at the bottom of the table shows the cross-validation results for all negative data. The area under the receiver operating characteristic curve (AUC) cannot be calculated because this method yields no index (e.g., a likelihood score) over which a threshold could be varied.
Fig. 2 Distribution of predicted lengths of tail and TMD regions. a Histogram of the predicted lengths of the TMD regions in sequences successfully predicted to have TMD and tail regions (146 sequences). b Histogram of the predicted lengths of the tail regions in sequences successfully predicted to have TMD and tail regions (146 sequences).
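As an illustration of how region lengths can be read off a decoded state path, the sketch below runs Viterbi decoding on a toy three-state, left-to-right HMM (globular, TMD, tail). Everything in it is an assumption made for illustration: the two-symbol hydrophobic/polar alphabet, the transition and emission probabilities, and the rule that a sequence counts as TA when the path contains a TMD and ends in the tail state. It is not the TAPPM model.

```python
import math

HYDROPHOBIC = set("AILMFVWC")  # assumed hydrophobic residues for the toy model
STATES = ["globular", "tmd", "tail"]
# Hypothetical parameters; transitions that are absent have probability 0.
START = {"globular": 1.0, "tmd": 0.0, "tail": 0.0}
TRANS = {
    "globular": {"globular": 0.95, "tmd": 0.05},
    "tmd": {"tmd": 0.9, "tail": 0.1},
    "tail": {"tail": 1.0},
}
P_HYDROPHOBIC = {"globular": 0.3, "tmd": 0.9, "tail": 0.3}


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def _emit(state, is_hydrophobic):
    p = P_HYDROPHOBIC[state]
    return _log(p if is_hydrophobic else 1.0 - p)


def viterbi(seq):
    """Return the most probable state path for a non-empty sequence."""
    obs = [aa in HYDROPHOBIC for aa in seq]
    score = {s: _log(START[s]) + _emit(s, obs[0]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for is_h in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + _log(TRANS[r].get(s, 0.0)))
            new_score[s] = score[prev] + _log(TRANS[prev].get(s, 0.0)) + _emit(s, is_h)
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):  # trace back from the last position
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def looks_tail_anchored(path):
    # Assumed criterion: a TMD was decoded and the sequence ends in the tail.
    return "tmd" in path and path[-1] == "tail"


toy_path = viterbi("KDESKRDE" + "LLIVALLVIALLVILAVLLI" + "KR")
print(toy_path.count("tmd"), toy_path.count("tail"), looks_tail_anchored(toy_path))
```

Counting the `tmd` and `tail` labels in each decoded path gives per-sequence region lengths, which is the kind of quantity summarized in the two histograms.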
Predictions using likelihood scores and decoding
| Name | Sensitivity | Specificity | TP | FP | FN | TN |
|---|---|---|---|---|---|---|
| SP | 0.846 | 0.947 | 137 | 69 | 25 | 1238 |
| MP | 0.870 | 0.956 | 141 | 18 | 21 | 392 |
| NO | 0.920 | 0.980 | 149 | 101 | 22 | 4929 |
| Total | 0.852 | 0.978 | 138 | 151 | 24 | 6596 |
Each row corresponds to the specified individual negative data. The “total” at the bottom of the table shows the cross-validation results for all negative data. The area under the receiver operating characteristic curve (AUC) cannot be calculated because this method yields no index (e.g., a likelihood score) over which a threshold could be varied.
Sequences not predicted by TMHMM or by our method, TAPPM
| TAPPM failures | TMHMM failures | TMHMM failures and TAPPM successes | TAPPM failures and TMHMM successes |
|---|---|---|---|
| TOM7_YEAST | GDAP1_HUMAN | TOM22_YEAST | O22825_ARATH |
| PGC1_YEAST | PEX15_YEAST | YBM6_YEAST | MAVS_HUMAN |
| TOM7_HUMAN | PGC1_YEAST | GDAP1_HUMAN | TOM6_YEAST |
| MAVS_HUMAN | SEC20_YEAST | PEX15_YEAST | TLG2_YEAST |
| O22825_ARATH | TOM22_YEAST | VPS64_YEAST | TOM7_HUMAN |
| GEX2_ARATH | TOM7_YEAST | UFE1_YEAST | GEX2_ARATH |
| MTX1_HUMAN | UFE1_YEAST | | MTX1_HUMAN |
| YD012_YEAST | VPS64_YEAST | | Q9FNB2_ARATH |
| TOM6_YEAST | YBM6_YEAST | | |
| TLG2_YEAST | YD012_YEAST | | |
| Q9FNB2_ARATH | | | |
| SEC20_YEAST | | | |
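The two "failures and successes" columns can be derived from the two failure lists by set difference: a sequence that one method misses and the other does not must have been predicted correctly by the other. A short sketch using the identifiers from the table; the variable names are illustrative assumptions.

```python
# Failure lists copied from the first two columns of the table.
tappm_failures = {
    "TOM7_YEAST", "PGC1_YEAST", "TOM7_HUMAN", "MAVS_HUMAN", "O22825_ARATH",
    "GEX2_ARATH", "MTX1_HUMAN", "YD012_YEAST", "TOM6_YEAST", "TLG2_YEAST",
    "Q9FNB2_ARATH", "SEC20_YEAST",
}
tmhmm_failures = {
    "GDAP1_HUMAN", "PEX15_YEAST", "PGC1_YEAST", "SEC20_YEAST", "TOM22_YEAST",
    "TOM7_YEAST", "UFE1_YEAST", "VPS64_YEAST", "YBM6_YEAST", "YD012_YEAST",
}

# Missed by TMHMM but found by TAPPM, and the reverse.
tmhmm_fail_tappm_ok = tmhmm_failures - tappm_failures
tappm_fail_tmhmm_ok = tappm_failures - tmhmm_failures
# Missed by both methods.
missed_by_both = tappm_failures & tmhmm_failures

print(sorted(tmhmm_fail_tappm_ok))
print(sorted(tappm_fail_tmhmm_ok))
print(sorted(missed_by_both))
```

Four sequences (PGC1_YEAST, SEC20_YEAST, TOM7_YEAST, and YD012_YEAST) appear in both failure lists and therefore in neither of the last two columns.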
Subcellular locations of the collected TA protein sequences
| Subcellular location | # of sequences |
|---|---|
| Endoplasmic reticulum | 52 |
| Plasma membrane | 46 |
| Golgi apparatus | 33 |
| Plastid | 13 |
| Nucleus | 7 |
| Vacuole | 8 |
| Peroxisome | 3 |
| Synaptic vesicle membrane | 3 |
Certain sequences localize to several subcellular locations; therefore, the sum exceeds the 162 collected sequences.
Fig. 3 Transition-state diagram of HMM models. a TA model: Circles represent nodes (hidden states) and arrows signify transitions. Two models were constructed with either two or four states in the tail region. b SP model: Although globular region (1) and globular region (2) have different transitional states, their initial output probability conditions were identical. c MP model: Under the initial conditions, the output and transition probabilities of the SP, cap, and TMD regions were identical to those of the SP model. The loop region may comprise 1 to 20 residues.