Vladimir Espinosa Angarica, Salvador Ventura, Javier Sancho.
Abstract
BACKGROUND: Prion proteins form a special class among amyloids because of their ability to transmit aggregative folds. Prions are known to act as infectious agents in neurodegenerative diseases in animals, or as key elements in transcription and translation processes in yeast. It has been suggested that prions contain specific sequential domains with distinctive amino acid composition and physicochemical properties that allow them to control the switch between soluble and β-sheet aggregated states. These prion-forming domains are low-complexity segments enriched in glutamine/asparagine and depleted in charged residues and prolines. Different predictive methods have been developed to discover novel prions by either assessing the compositional bias of these stretches or estimating the propensity of protein sequences to form amyloid aggregates. However, the algorithms available so far lack a thorough statistical calibration against large sequence databases, so they cannot accurately predict prions without retrieving a large number of false positives.
Year: 2013 PMID: 23663289 PMCID: PMC3654983 DOI: 10.1186/1471-2164-14-316
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Amino acid propensities in PrD and PrD-cores
| Odds ratio (PrD) | Log-odds (PrD) | Ratio (Toombs et al. [30]) | Log-odds (Toombs et al. [30]) | |
| 0.675 | −0.568 | 0.670 | −0.578 | |
| 0.071 | −3.807 | 1.520 | 0.604 | |
| 0.352 | −1.507 | 0.280 | −1.837 | |
| 0.147 | −2.766 | 0.550 | −0.862 | |
| 0.718 | −0.478 | 2.310 | 1.208 | |
| 1.028 | 0.040 | 0.960 | −0.059 | |
| 0.913 | −0.131 | 0.760 | −0.396 | |
| 0.350 | −1.515 | 2.260 | 1.176 | |
| 0.271 | −1.883 | 0.210 | −2.252 | |
| 0.340 | −1.556 | 0.960 | −0.059 | |
| 1.125 | 0.170 | 1.960 | 0.971 | |
| 5.700 | 2.511 | 1.080 | 0.111 | |
| 1.170 | 0.227 | 0.300 | −1.737 | |
| 4.125 | 2.044 | 1.070 | 0.098 | |
| 0.436 | −1.196 | 0.670 | −0.578 | |
| 1.662 | 0.733 | 1.140 | 0.189 | |
| 0.830 | −0.268 | 0.890 | −0.168 | |
| 0.304 | −1.716 | 2.260 | 1.176 | |
| 0.091 | −3.459 | 1.950 | 0.963 | |
| 1.724 | 0.786 | 2.180 | 1.124 | |
The observed frequencies of occurrence of the different amino acid residues were transformed into the corresponding statistical potentials using the equation described in Methods. Columns 2 and 3 show, respectively, the calculated odds ratios for the complete prion and the statistical potentials (log-odds, LOr) corresponding to the odds ratios of the PrD. Columns 4 and 5 contain the ratios and log-odds obtained experimentally by means of a random mutagenesis assay with library 1, as described by Toombs et al. [30].
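The tabulated log-odds appear to be base-2 logarithms of the odds ratios (for example, log2(5.700) ≈ 2.511), which would be consistent with scores reported in bits. The following is a minimal sketch under that assumption; the Methods equation itself is not reproduced here, and the function name `log_odds_bits` is hypothetical.

```python
import math


def log_odds_bits(odds_ratio: float) -> float:
    """Return the base-2 logarithm of an odds ratio.

    Assumption: the statistical potential is log2(odds ratio), as
    suggested by the tabulated values; this is not taken from Methods.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log2(odds_ratio)


# Odds ratios copied from columns 2 and 4 of the table above.
for ratio in (5.700, 4.125, 2.310, 0.670):
    print(f"{ratio:.3f} -> {log_odds_bits(ratio):+.3f} bits")
```

Small differences in the third decimal place against the table are expected, because the tabulated odds ratios are themselves rounded.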
Figure 1. Observed frequency of P-(X)-P patterns in proteins. A representative non-redundant dataset of 4606913 sequences from UniRef50 was analyzed to assess the significance of proline patterns in the protein universe. The chart plots the observed frequency of each pattern of two prolines separated by a given distance of between 1 and 60 residues.
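Counting such proline patterns in a single sequence could be sketched as follows. This is an illustration only: it assumes that "distance" means the number of residues strictly between the two prolines and that the intervening residues may be of any type (including proline). The function name `proline_pair_spacings` is hypothetical.

```python
from collections import Counter


def proline_pair_spacings(sequence: str, max_gap: int = 60) -> Counter:
    """Count proline pairs by the number of residues between them.

    Assumption: a pair P-(X)n-P is counted for every n in 1..max_gap,
    whatever the intervening residues are.
    """
    positions = [i for i, aa in enumerate(sequence) if aa == "P"]
    counts: Counter = Counter()
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            gap = second - first - 1  # residues strictly between the prolines
            if gap > max_gap:
                break  # positions are sorted, so later gaps only grow
            if gap >= 1:
                counts[gap] += 1
    return counts


# Toy sequence, not taken from the dataset described in the caption.
print(sorted(proline_pair_spacings("MPAPQQPNNP").items()))
```

Summing these counters over a whole dataset and dividing by the total number of pairs would give a frequency per spacing.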
Figure 2. ROC plots of the PrD recovery and bootstrapping assays. The score histograms of the negative and positive datasets were processed, and the true positive rate (TPR) was plotted against the false positive rate (FPR) in a test in which the known PrDs (i.e. positives in all four experimental tests [38]) are retrieved from a test dataset of non-prions (i.e. negatives in all four experimental tests [38]). The plot obtained with our model, shown in red, has an area under the curve (AUC) of 0.90. We also include the result of a bootstrap assay in which the 18 prions used as the training set were resampled 10⁶ times, forming partial training sets of 9 prions and using the remaining 9 prions as positive test sets for the ROC analysis. One million ROC plots were generated, always using the same negative set, and the average ROC curve was calculated (shown in blue); its AUC is 0.85.
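The resampling scheme in this caption can be illustrated with a small sketch. It is not the authors' implementation: `fit` is a hypothetical callable standing in for model training, and the sketch averages AUC values over rounds, whereas the caption describes averaging the ROC curves themselves.

```python
import random
from typing import Callable, Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (rank-based formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def resampled_auc(positives: Sequence, negatives: Sequence,
                  fit: Callable, rounds: int = 1000, seed: int = 0) -> float:
    """Mean AUC over random half/half splits of the positive set.

    `fit(train)` is a hypothetical stand-in that must return a scoring
    function; the negative set is the same in every round.
    """
    rng = random.Random(seed)
    half = len(positives) // 2
    aucs = []
    for _ in range(rounds):
        shuffled = rng.sample(list(positives), len(positives))
        train, test = shuffled[:half], shuffled[half:]
        score = fit(train)
        aucs.append(roc_auc([score(x) for x in test],
                            [score(x) for x in negatives]))
    return sum(aucs) / len(aucs)
```

With 18 positives, each round trains on 9 and tests on the other 9, matching the split sizes given in the caption.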
Figure 3. Scoring of PrDs in yeast with respect to the complete proteome. Panel A shows the density histogram of the scores of all proteins in the yeast genome. In panel B, the left ordinate axis shows the observed p-values for the 29 known prions in this organism (blue line connecting open triangles), and the right ordinate axis shows the cumulative ratio, i.e. the percent of known prions with a p-value equal to or less than a given value (red line connecting open squares).
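One way to obtain quantities like those in panel B is an empirical p-value, i.e. the fraction of proteome scores at or above a given score. The sketch below assumes that definition; the source does not state how its p-values were computed, and both function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p_value(score: float, background_sorted: Sequence[float]) -> float:
    """Fraction of background scores that are >= score.

    `background_sorted` must be sorted in ascending order.
    """
    n = len(background_sorted)
    at_or_above = n - bisect_left(background_sorted, score)
    return at_or_above / n


def cumulative_percent(p_values: Sequence[float], threshold: float) -> float:
    """Percent of p-values that are less than or equal to threshold."""
    return 100.0 * sum(p <= threshold for p in p_values) / len(p_values)
```

Evaluating `cumulative_percent` over a grid of thresholds would trace a cumulative curve of the kind plotted on the right axis.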
Figure 4. Precision-recall plots for the comparison of PrD and non-prionogenic sequence distributions. For each of the three additional negative datasets, comprising proteins from UniProt, DisProt and the PDB, we follow the evolution of the classifier’s precision in correctly identifying known PrD segments within a pool of non-prionogenic sequences. These values are plotted against the TPR (i.e. recall) of the corresponding classification step. The ratio between the number of instances in each positive and negative distribution is also shown.
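Each point of a precision-recall plot corresponds to one score cutoff. A minimal sketch of that calculation, assuming a sequence is called positive when its score is at or above the cutoff (the function name and the toy scores are hypothetical):

```python
from typing import Sequence, Tuple


def precision_recall(pos_scores: Sequence[float], neg_scores: Sequence[float],
                     cutoff: float) -> Tuple[float, float]:
    """Return (precision, recall) for calls made at score >= cutoff."""
    tp = sum(s >= cutoff for s in pos_scores)  # known PrDs recovered
    fp = sum(s >= cutoff for s in neg_scores)  # non-prionogenic sequences called
    recall = tp / len(pos_scores)
    # By convention, precision is set to 1.0 when nothing is called positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return precision, recall


# Toy scores in bits, not taken from the paper.
print(precision_recall([60, 55, 40], [58, 30, 20, 10], cutoff=50))
```

Because precision depends on how many negatives are scanned, the ratio of positive to negative instances matters when comparing curves across datasets.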
Figure 5. Accuracy-cutoff plot of the classifier against the negative test set. The accuracy obtained for the correct classification of TP and TN is plotted against decreasing cutoffs spanning the score range of the corresponding negative and positive distributions. The highest accuracy of the assay, which was used to set the predictive cutoff of 50 bits, is highlighted.
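Selecting a cutoff by maximum accuracy can be sketched as a scan over candidate scores. This is an illustrative sketch under the assumption that calls are made at score >= cutoff and that ties are resolved in favour of the highest cutoff; function names and toy scores are hypothetical.

```python
from typing import Sequence


def accuracy(pos_scores: Sequence[float], neg_scores: Sequence[float],
             cutoff: float) -> float:
    """(TP + TN) / total, calling positive when score >= cutoff."""
    tp = sum(s >= cutoff for s in pos_scores)
    tn = sum(s < cutoff for s in neg_scores)
    return (tp + tn) / (len(pos_scores) + len(neg_scores))


def best_cutoff(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Scan observed scores from high to low and keep the most accurate one.

    max() returns the first maximum it meets, so ties go to the highest cutoff.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    return max(candidates, key=lambda c: accuracy(pos_scores, neg_scores, c))


# Toy scores in bits, not taken from the paper.
pos, neg = [60, 55, 52], [58, 30, 20, 10]
print(best_cutoff(pos, neg), accuracy(pos, neg, best_cutoff(pos, neg)))
```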
Summary of the prion predictions in different taxa
| Taxon | Organisms | Proteins scanned | Proteins with PrDs | |
| Archaea | 14 | 5769 | 22 | |
| Bacteria | 839 | 860337 | 2220 | |
| Viruses | 29 | 5807 | 115 | |
| Fungi | 114 | 965461 | 3330 | |
| Invertebrates | 220 | 1064320 | 13609 | |
| Vertebrates | 30 | 213915 | 190 | |
| Plants | 104 | 591244 | 518 | |
| Rodents | 7 | 137372 | 170 | |
| Mammals | 36 | 388018 | 275 | |
| Human | 1 | 96088 | 111 |
The predictions obtained for all the organisms analyzed are organized by taxon, and the table includes the following information: column 1, the index of the Additional file containing the predictions for a taxon; column 2, the taxon; column 3, the number of organisms for which we obtained predictions; column 4, the number of proteins scanned in the search for PrDs; and column 5, the number of proteins bearing prion-forming domains.
Ratio of prion domains in the proteomes of representative organisms
| Predictions | Proteome with PrDs (%) | |
| 117 | 3.90 | |
| 89 | 1.64 | |
| 468 | 17.9 | |
| 60 | 1.57 | |
| 2692 | 20.1 | |
| 992 | 8.01 | |
| 853 | 10.2 | |
| 11 | 0.50 | |
| 15 | 0.16 | |
| 169 | 2.62 | |
| 632 | 10.7 | |
| 150 | 2.58 | |
| 56 | 0.20 | |
| 50 | 0.08 | |
| 509 | 2.48 | |
| 486 | 3.33 | |
| 115 | 0.84 | |
| 98 | 0.42 | |
| 111 | 0.29 |
The percent of the proteome corresponding to proteins bearing a putative prion domain (column 3) is shown for a representative group of model organisms (column 1) from different evolutionary classifications, some of which have been extensively studied and have well-characterized complete genomes. The organisms included correspond to different species of (1) bacteria, (2) protozoans, (3) yeast, (4) plants, (5) dipterans, (6) nematode and (7) human. The number of predictions obtained for each organism is shown in column 2.
Association between proteins bearing PrD predictions and diseases in human
| Gene | Associated disease |
| ATXN1 | Spinocerebellar ataxia |
| | Huntington’s disease |
| ATXN3 | Machado-Joseph disease |
| | Spinocerebellar ataxias |
| ATXN8 | Spinocerebellar ataxia type 8 |
| BMP2K | Internuclear ophthalmoplegia |
| | Ulnar neuropathy |
| FOXP2 | Speech-language disorders |
| | Blepharophimosis |
| | Premature ovarian failure |
| | Autism |
| | Dyslexia |
| HTT | Huntington’s disease |
| | Spinocerebellar ataxia |
| MAML | Mucoepidermoid carcinoma |
| | Hidradenoma |
| | Lipoadenoma |
| | Epithelial-myoepithelial carcinoma |
| MED12 | FG syndrome |
| | Intellectual disability |
| | Schizophrenia |
| MED15 | Epicondylitis |
| NCOA3 | Breast cancer |
| | Ovarian carcinoma |
| PAXIP1 | Spinocerebellar ataxia |
| TAF15 | Chondrosarcoma |
| | Peripheral primitive neuroectodermal tumor |
| | Amyotrophic lateral sclerosis |
| | Sarcoma |
| | Liposarcoma |
| TOX3 | Breast cancer |
| TPB | Spinocerebellar ataxia |
| | Tuberculosis |
| | Huntington’s disease |
We compiled the different diseases associated with the genes in humans for which we found PrD predictions.