Michelle S Scott, François-Michel Boisvert, Angus I Lamond, Geoffrey J Barton.
Abstract
BACKGROUND: Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction of each protein pool that is nucleolus-associated nor whether their association is permanent or conditional.
Year: 2011 PMID: 21272300 PMCID: PMC3038921 DOI: 10.1186/1471-2164-12-74
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Protein nucleolar association classes considered. PNAC classifies human proteins into four distinct classes according to their degree of nucleolar association. The nucleolar-enriched protein group (red) consists of proteins that are predominantly nucleolar in all cell types and conditions. The nucleolar-nucleoplasmic group (purple) is composed of proteins identified in both the nucleolus and any other nuclear region. The nucleolar-cytoplasmic group (blue) consists of cytoplasmic proteins that can also localise to the nucleolus. The non-nucleolar group (yellow) comprises all proteins that never localise to the nucleolus. The non-nucleolar proteins can localise to other regions of the nucleus, the cytoplasm, the plasma membrane or the extracellular space.
Features considered in the prediction of nucleolar association
| Features | Data source | Description | Bins |
|---|---|---|---|
| Amino acid frequency | Protein sequences from IPI | PNAC considers the relative proportion of leucine, isoleucine, lysine and serine residues | 5 bins for each distinct amino acid considered |
| Targeting motifs | Phobius | The predicted presence of signal peptides, transmembrane domains (TMDs), mitochondrial targeting peptides and nucleolar localisation sequences (NoLSs) | 9 bins detailed in the Methods |
| Gene co-expression | GDS596 from the Gene Expression Omnibus | The average Pearson correlation of expression between the query protein and proteins in the nucleolar-cytoplasmic training group using expression profiles from 79 physiologically normal tissues | 5 bins |
| GO | EBI Gene Ontology (GO) annotations | Biological process and molecular function Gene Ontology (GO) annotations for the query protein are compared to those of the training set proteins | 4 bins |
| Subcellular localisation of interactors | HPRD | A nucleolar proximity score is calculated for all the interactors of the query protein | 5 bins |
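The gene co-expression feature in the table above can be illustrated with a minimal sketch. It assumes each protein is represented by a list of expression values (one per tissue) and that the averaged correlation is then mapped onto five bins. The bin edges in `HYPOTHETICAL_EDGES` and all function names are illustrative assumptions, not values or identifiers taken from PNAC.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_coexpression(query_profile, training_profiles):
    """Average Pearson correlation between a query and each training profile."""
    total = sum(pearson(query_profile, p) for p in training_profiles)
    return total / len(training_profiles)


def to_bin(value, edges):
    """Return the index of the first edge that `value` lies below.

    With four edges this yields five bins, numbered 0 to 4.
    """
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


# Assumption: these edges are made up for illustration only.
HYPOTHETICAL_EDGES = (-0.2, 0.0, 0.2, 0.4)
```

For example, `to_bin(mean_coexpression(query, training), HYPOTHETICAL_EDGES)` would give the discretised feature value for a query profile under these assumed edges.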
Tests of accuracy
| Class | Prior | Count, test set 1 | Count, test set 2 | Count, test set 3 | Sensitivity, test set 1 | Sensitivity, test set 2 | Sensitivity, test set 3 | PPV, test set 1 | PPV, test set 2 | PPV, test set 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| Nucleolar-enriched | 0.20 | 30 | 15 | 52 | 0.72 | 0.78 | 0.56 | 0.68 | 0.61 | 0.47 |
| Nucleolar-nucleoplasmic | 0.15 | 22 | 7 | 24 | 0.73 | 0.57 | 0.50 | 0.63 | 0.60 | 0.41 |
| Nucleolar-cytoplasmic | 0.15 | 24 | 16 | 63 | 0.77 | 0.69 | 0.59 | 0.70 | 0.74 | 0.53 |
| Non-nucleolar | 0.50 | 200 | 100 | 344 | 0.90 | 0.91 | 0.81 | 0.94 | 0.93 | 0.87 |
a Test set 1: leave-one-out cross-validation test
b Test set 2: independent literature-based test
c Test set 3: SILAC experimental independent test
d These sensitivity measures represent average sensitivity values over ten runs. Their standard deviations were all below 0.03.
e PPV: positive predictive value. These measures represent average PPV values over ten runs. Their standard deviations were all below 0.1.
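As a reminder of how the two measures in this table are defined, the sketch below computes per-class sensitivity (true positives divided by all proteins truly in the class) and PPV (true positives divided by all proteins predicted to be in the class) from paired label lists. The function name and the label strings in the usage note are illustrative assumptions and are not taken from the PNAC implementation.

```python
from collections import Counter


def sensitivity_and_ppv(true_labels, predicted_labels):
    """Return {class: (sensitivity, ppv)} for paired true/predicted labels.

    A measure is None when its denominator is zero.
    """
    true_positives = Counter()
    actual = Counter()
    predicted = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        actual[truth] += 1
        predicted[guess] += 1
        if truth == guess:
            true_positives[truth] += 1

    result = {}
    for cls in actual.keys() | predicted.keys():
        sens = true_positives[cls] / actual[cls] if actual[cls] else None
        ppv = true_positives[cls] / predicted[cls] if predicted[cls] else None
        result[cls] = (sens, ppv)
    return result
```

Calling `sensitivity_and_ppv(["enriched", "non"], ["enriched", "enriched"])` would report a sensitivity of 1.0 and a PPV of 0.5 for the hypothetical `"enriched"` class.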
Figure 2. Generation of the training and testing sets. Two datasets were used to generate the training and testing sets. A manually curated literature-based nucleolar association dataset (blue list) was used to construct the training set (which is also used in the leave-one-out cross-validation test and referred to as test set 1) and a non-overlapping independent literature-based test set (test set 2). An experimental SILAC dataset (red list) was used to construct the independent SILAC-derived test set (test set 3). The intersection of the manually curated literature dataset (blue list) and the experimental SILAC dataset (red list) is shown in purple and was used to map the SILAC data points to our nucleolar association groups to create the SILAC test set. The generation of the training and testing sets is described in more detail in the Methods section.
Examples of disagreements between SILAC classification and our predictions
| Accession | Name | SILAC classification | PNAC classification | Experimental observations from literature |
|---|---|---|---|---|
| NP_006588 | HSPA8 | Non-nucleolar (highly cytoplasmic) | Nucleolar-cytoplasmic | Usually cytoplasmic but accumulates in nucleoli after heat-shock |
| NP_001013 | RPS19 | Non-nucleolar (highly cytoplasmic) | Nucleolar-cytoplasmic | Ribosomal protein which accumulates in the nucleolus |
| NP_919223 | HNRNPA3 | Non-nucleolar (mainly nucleoplasmic) | Nucleolar-enriched | The Human Proteome Atlas finds it in the nucleolus, nucleus and cytoplasm |
| NP_002120 | HMGB2 | Non-nucleolar (mainly cytoplasmic but also nucleoplasmic) | Nucleolar-nucleoplasmic | The Human Proteome Atlas finds it to be strongly nucleolar |
| NP_061185 | RCC2 | Mainly nucleoplasmic | Nucleolar-nucleoplasmic | Annotated in UniProt as nucleolar, cytoplasmic and centromere |
| NP_002408 | Antigen KI-67 | Highly enriched in nucleolus | Nucleolar-nucleoplasmic | Annotated in UniProt as predominantly perinucleolar in G1 and in later phases predominantly localised in the nuclear matrix |
Figure 3. Reliability analysis. The sensitivity (panel A) and positive predictive value (PPV; panel B) are plotted as a function of the minimum reliability score for all four classes considered. The error bars represent standard deviation over 10 independent runs.
Most abundant biological process GO annotations of nucleolar-associated proteins with reliability index above 10
| Biological process GO term | Protein count |
|---|---|
| RNA metabolic process (GO:0016070) | 163 |
| of which rRNA processing (GO:0006364) | 35 |
| tRNA processing (GO:0008033) | 27 |
| Transcription, DNA-dependent (GO:0006351) | 35 |
| Cellular component organization (GO:0016043) | 81 |
| Ribosome biogenesis (GO:0042254) | 50 |
| of which rRNA processing (GO:0006364) | 35 |
| Regulation of biological process (GO:0050789) | 45 |
| Nucleobase, nucleoside, nucleotide and nucleic acid metabolic process (GO:0006139) | 118 |
| of which DNA repair (GO:0006281) | 52 |
| DNA replication (GO:0006260) | 43 |
| Regulation of biological process (GO:0050789) | 107 |
| of which Regulation of transcription, DNA-dependent (GO:0006355) | 35 |
| Signal transduction (GO:0007165) | 36 |
| Cellular component organization (GO:0016043) | 92 |
| of which Chromosome organisation (GO:0051276) | 68 |
| Cell cycle (GO:0007049) | 89 |
| Multicellular organismal development (GO:0007275) | 40 |
| Cell proliferation (GO:0008283) | 39 |
| Cell death (GO:0008219) | 27 |
| Protein metabolic process (GO:0019538) | 127 |
| of which Translation (GO:0006412) | 106 |
| Nucleobase, nucleoside, nucleotide and nucleic acid metabolic process (GO:0006139) | 33 |
| Regulation of biological process (GO:0050789) | 27 |
| of which Signal transduction (GO:0007165) | 10 |
| Cellular component organisation (GO:0016043) | 18 |
| of which Organelle organisation (GO:0006996) | 11 |
a count includes proteins annotated with child terms
Figure 4. Conservation analysis of nucleolar-associated proteins. For each nucleolar-association group and the non-nucleolar group, the fraction of proteins with orthologues in a given eukaryotic organism was examined. Only proteins with reliability index above 10 were considered. The error bars represent standard deviation as estimated using a bootstrap procedure.
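The record does not describe the bootstrap procedure behind these error bars. One common way to estimate the standard deviation of such a fraction is sketched below, assuming resampling of proteins with replacement. The function name, the number of resamples and the seed are illustrative assumptions, not details from the paper.

```python
import random
from statistics import stdev


def bootstrap_sd_of_fraction(has_orthologue, n_resamples=1000, seed=0):
    """Bootstrap estimate of the standard deviation of a fraction.

    `has_orthologue` is a list of booleans, one per protein in a group.
    Each resample draws len(has_orthologue) proteins with replacement and
    records the fraction that have an orthologue.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(has_orthologue)
    fractions = []
    for _ in range(n_resamples):
        sample = rng.choices(has_orthologue, k=n)
        fractions.append(sum(sample) / n)
    return stdev(fractions)
```

For a hypothetical group of 100 proteins of which 70 have an orthologue, `bootstrap_sd_of_fraction([True] * 70 + [False] * 30)` should land close to the binomial standard error of about 0.046.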