Djamel Harbi, Marimuthu Parthiban, Deena M A Gendoo, Sepehr Ehsani, Manish Kumar, Gerold Schmitt-Ulms, Ramanathan Sowdhamini, Paul M Harrison.
Abstract
Prions are units of propagation of an altered state of a protein or proteins; they can propagate from organism to organism through co-option of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data were collated from various public and private resources and filtered for redundancy. They were then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts.
Using the database, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion.
Year: 2012 PMID: 22363733 PMCID: PMC3282748 DOI: 10.1371/journal.pone.0031785
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A screenshot of an entry in the PrionHome database, for the Shadoo transcribed pseudogene in the human genome.
Figure 2. A flow-chart showing a summary of the data collation and curation.
Summary of database content.
| PrionHome Classification | Number of Database Entries |
| Prionogenic | 51 |
| Prionoid | 6 |
| Other prion-related phenomenon | 10 |
| Interactor | 460 |
| Ortholog | 958 |
| Paralog | 205 |
| Candidate-Prionogenic | 411 |
| Pseudogene | 13 |
| TOTAL | 2003* |
*This is not an arithmetic sum of the individual PrionHome Classification categories, since some database entries have multiple PrionHome Classifications.
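The footnote can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only the counts from the table above; the variable names are illustrative and not part of the database.

```python
# Counts per PrionHome classification, copied from the summary table.
counts = {
    "Prionogenic": 51,
    "Prionoid": 6,
    "Other prion-related phenomenon": 10,
    "Interactor": 460,
    "Ortholog": 958,
    "Paralog": 205,
    "Candidate-Prionogenic": 411,
    "Pseudogene": 13,
}
reported_total = 2003  # distinct database entries, from the TOTAL row

# Each entry is counted once per classification it carries.
assignments = sum(counts.values())

# An entry with k classifications contributes k - 1 to this surplus.
surplus = assignments - reported_total

print(assignments, surplus)
```

The per-classification counts add up to more than the reported total, and the difference is the number of additional classification assignments carried by entries that have more than one classification.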
Figure 3. A graphical depiction of how transmission differs for prions and prionoids, as defined in the database.
A key that explains the symbols in the figure is included.
Figure 4. An example of the results of a keyword search for the term ‘Anolis’ in the ‘Organism’ field.
Five entries for Anolis carolinensis (a lizard) are listed.
Counts of mammalian sequences in the PrionHome database with the sequence motif tyrosine-tyrosine-arginine (YYR).
| PrionHome Classification | # of sequences with the YYR sequence motif | # of sequences without the YYR sequence motif |
| Prionogenic PrP sequences (Major Prion Protein) | 20 | 0 |
| Orthologs of Major Prion Protein | 413 | 9 |
| Paralogs of Major Prion Protein | 0 | 120 |
As an example, the SQL query to obtain this data is: SELECT prion_id, name FROM main WHERE sequence REGEXP 'YYR' AND prion_type = 'Ortholog' AND taxonomy LIKE '%mammal%'.
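To show how such a query behaves, here is a minimal sketch that runs it against an in-memory SQLite table from Python. It is not the PrionHome server setup: the column list is inferred from the query itself, the four rows (identifiers, names, sequences, taxonomy strings) are invented toy data, and because SQLite has no built-in REGEXP implementation, one is registered by hand.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Assumed schema: only the columns named in the query above.
conn.execute(
    "CREATE TABLE main (prion_id TEXT, name TEXT, sequence TEXT, "
    "prion_type TEXT, taxonomy TEXT)"
)

# Hypothetical toy rows, invented for illustration only.
rows = [
    ("X0001", "toy_ortholog_with_motif", "MANLGYYRENM", "Ortholog", "Eukaryota; mammal; toy"),
    ("X0002", "toy_ortholog_without_motif", "MANLGAARENM", "Ortholog", "Eukaryota; mammal; toy"),
    ("X0003", "toy_paralog", "MKKYYRLL", "Paralog", "Eukaryota; mammal; toy"),
    ("X0004", "toy_lizard_ortholog", "MAYYRQ", "Ortholog", "Eukaryota; reptile; toy"),
]
conn.executemany("INSERT INTO main VALUES (?, ?, ?, ?, ?)", rows)

query = (
    "SELECT prion_id, name FROM main "
    "WHERE sequence REGEXP 'YYR' AND prion_type = 'Ortholog' "
    "AND taxonomy LIKE '%mammal%'"
)
hits = conn.execute(query).fetchall()
print(hits)
```

Only the first toy row satisfies all three conditions. Replacing REGEXP with NOT REGEXP in the same query would select the orthologs that lack the motif, which corresponds to the right-hand column of the table.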
A summary of prionogenic (Amyloid Prion Type Am) and candidate-prionogenic sequences from Saccharomyces cerevisiae in the PrionHome database that have N/Q and F/Y compositional biases.
| PrionHome Classification | # of sequences that have N/Q bias, but no F/Y bias | # of sequences that have N/Q bias, and F/Y bias |
| Prionogenic | 19 | 6 |
| Candidate-Prionogenic | 329 | 48 |
| Candidate-Prionogenic experimentally shown not to form amyloid | 16 | 2 |
*F/Y compositional biases with binomial P-values ≤ 10⁻⁶ are considered (see Database Construction & Content for details).
**This data includes the Alberti, et al. (2009) data from screens for candidate prionogenic sequences.
***As an example, the SQL query to obtain this data is: SELECT prion_id, name FROM main WHERE prion_type LIKE '%prionogenic%' AND bias REGEXP 'Y|F' AND bias REGEXP 'N|Q' AND organism LIKE '%cerevisiae%'.
****Polymorphic sequences have been removed from these totals. This is the total list of Prion-Like sequences from Harrison, et al. (2006) and Alberti, et al. (2009).
*****Prion-like sequences that failed to form amyloid by any tests in Alberti, et al. (2009).
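The two REGEXP conditions in the query above amount to asking whether any listed bias signature contains N or Q, and whether any contains F or Y. A minimal sketch of that logic in Python follows. It assumes the `bias` field holds signatures of the kind shown in the detailed table of compositional biases; the four example entries are taken from that table, while the `classify` helper is hypothetical.

```python
import re

# Bias signatures copied from the detailed compositional-bias table.
signatures = {
    "PD0040": ["N"],                          # URE2
    "PD0033": ["N"],                          # Lsm4
    "PD0026": ["NY", "Y", "SLV"],             # NEW1
    "PD0024": ["QYNG", "EK", "Y", "K", "N"],  # SUP35
}

def classify(sigs):
    """Mirror the REGEXP 'N|Q' and REGEXP 'Y|F' tests on the joined signatures."""
    joined = ";".join(sigs)
    has_nq = re.search(r"N|Q", joined) is not None
    has_fy = re.search(r"Y|F", joined) is not None
    if has_nq and has_fy:
        return "N/Q and F/Y"
    if has_nq:
        return "N/Q only"
    return "no N/Q bias"

classes = {pid: classify(sigs) for pid, sigs in signatures.items()}
for pid, label in classes.items():
    print(pid, label)
```

Under these assumptions, URE2 and Lsm4 fall into the "N/Q only" column, while NEW1 and SUP35 fall into the "N/Q and F/Y" column.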
Detailed analysis of compositional biases in budding yeast amyloid-type Prionogenic sequences.
| Database Identifier | Name and Prion [in square brackets] | Compositional biases |
| PD0023 | Serine/threonine-protein_kinase_CBK1 [Alberti, et al. 2009 data] | 188/249/5.4e-46/Q; 85/179/2.2e-12/SNP |
| PD0026 | Prion_formation_protein_1_NEW1 [NU+] | 68/94/8.5e-27/NY; 60/103/3.6e-10/Y; 332/417/4.7e-08/SLV |
| PD0028 | Asparagine-rich_protein_NRP1 [Alberti, et al. 2009 data] | 391/567/4.7e-56/N; 383/715/1.6e-48/ |
| PD0029 | SWI/SNF_chromatin-remodeling_complex_subunit_SWI1 [SWI+] | 4/322/3.2e-67/N; 336/384/1.1e-36/Q; 913/1260/1.6e-20/ |
| PD0033 | U6_snRNA-associated_Sm-like_protein_Lsm4 [Alberti, et al. 2009 data] | 93/166/6.5e-26/N |
| PD0034 | Uncharacterized_protein_YBL081W [Alberti, et al. 2009 data] | 30/353/2.3e-48/ |
| PD0035 | Nuclear_and_cytoplasmic_polyadenylated_RNA-binding_protein_PUB1 [Alberti, et al. 2009 data] | 242/287/4.7e-26/NM; 419/452/1.5e-21/Q; 273/303/8.4e-11/M |
| PD0040 | Protein_URE2 [URE3] | 2/78/3.2e-25/N |
| PD0044 | Nitrogen_regulatory_protein_GLN3 [Alberti, et al. 2009 data] | 142/630/4.0e-59/ |
| PD0734 | G_protein-coupled_receptor_GPR1 [Alberti, et al. 2009 data] | 490/557/2.1e-59/N; 91/288/1.7e-14/IFNYW; 675/736/2.9e-12/KWY; 854/947/4.1e-07/SN |
| PD0920 | Mediator_of_RNA_polymerase_II_transcription_subunit_3_PGD1 [Alberti, et al. 2009 data] | 277/373/7.6e-29/QNM; 256/392/7.1e-14/N; 202/259/2.2e-08/AP |
| PD0921 | Uncharacterized_RNA-binding_protein_YPL184C [Alberti, et al. 2009 data] | 5/27/2.7e-27/N; 98/124/1.2e-10/Q |
| PD2094 | global_transcriptional_regulator_Sfp1 [ISP+] | 21/517/3.2e-34/ |
| PD0021 | General_transcriptional_corepressor_CYC8 [OCT+] | 492/586/2.2e-81/QA; 698/952/7.5e-30/ESTPNAQ; 14/29/2.9e-23/Q; 509/555/4.6e-14/A; 116/373/2.7e-08/YW |
| PD0022 | Uncharacterized_protein_YBR016W [Alberti, et al. 2009 data] | 40/100/6.2e-33/QYN; 5/96/1.0e-07/Y |
| PD0024 | Eukaryotic_peptide_chain_release_factor_GTP-binding_subunit_SUP35 [PSI+] | 4/122/1.8e-49/QYNG; 158/221/3.7e-16/EK; 12/112/2.3e-11/Y; 138/218/7.9e-10/K; 4/108/1.1e-08/N |
| PD0025 | RNQ1 [RNQ+]/[PIN+] | 123/402/5.6e-82/QNSGY; 185/402/6.5e-16/N; 7/344/7.1e-13/S; 50/381/1.3e-10/G |
| PD0027 | mRNA-binding_protein_PUF2 [Alberti, et al. 2009 data] | 909/1062/2.5e-51/N; 52/486/3.2e-31/SQTNP; 241/470/4.7e-10/Q; 710/756/5.8e-09/LTIN; 854/1015/2.8e-08/SQ; 55/254/6.2e-08/T; 140/476/8.5e-08/N |
| PD0031 | Zinc_finger_protein_YPR022C [Alberti, et al. 2009 data] | 158/319/8.1e-42/Q; 20/472/1.2e-15/NPS; 732/1040/2.8e-09/NI |
| PD0032 | transcription_factor_RLM1 [Alberti, et al. 2009 data] | 212/629/4.5e-55/ |
| PD0036 | transcriptional_activator/repressor_MOT3 [MOT3+] | 4/488/4.4e-45/NH; 7/34/3.0e-26/Q; 231/337/2.1e-10/P; 4/396/1.4e-09/H; 431/449/2.6e-07/AS |
| PD0037 | Serine/threonine-protein_kinase_KSP1 [Alberti, et al. 2009 data] | 538/939/4.7e-33/ |
| PD0038 | Nucleoporin_ASM4 [Alberti, et al. 2009 data] | 52/496/5.7e-25/ |
| PD0043 | Nucleoporin_NSP1 [Alberti, et al. 2009 data] | 1/515/8.1e-54/ |
| PD2217 | transcriptional_regulatory_protein_SAP30 [Alberti, et al. 2009 data] | 25/57/4.6e-30/N |
*The biases are in the following format: start point/end point/binomial P-value/bias signature. Residues contributing significantly to the bias are sorted in decreasing order of precedence, i.e., for the bias signature NSHIT, N is the main bias, S is the most important subsidiary bias, and so on. NS biases are in bold text. All biases with P-value ≤ 10⁻⁶ are listed.
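The bias format described in the footnote is easy to parse programmatically. The sketch below is one possible reading of it in Python; the `Bias` type and `parse_biases` function are hypothetical helpers, not part of PrionHome. Signatures that are empty in this record are kept as empty strings rather than guessed.

```python
from typing import List, NamedTuple

class Bias(NamedTuple):
    start: int       # start point of the biased region
    end: int         # end point of the biased region
    p_value: float   # binomial P-value
    signature: str   # residues in decreasing order of precedence; may be empty

def parse_biases(cell: str) -> List[Bias]:
    """Parse a 'start/end/P-value/signature; ...' table cell into Bias records."""
    biases = []
    for item in cell.split(";"):
        item = item.strip()
        if not item:
            continue
        start, end, p_value, signature = item.split("/")
        biases.append(Bias(int(start), int(end), float(p_value), signature.strip()))
    return biases

# Cell for PD0026 (NEW1), copied from the table above.
cell = "68/94/8.5e-27/NY; 60/103/3.6e-10/Y; 332/417/4.7e-08/SLV"
parsed = parse_biases(cell)

# The most significant bias is the one with the smallest P-value.
strongest = min(parsed, key=lambda b: b.p_value)
print(strongest)
```

For the NEW1 entry, the strongest bias is the NY region from 68 to 94; the first letter of its signature (N) is the main bias and Y is the subsidiary one.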