Nadav Rappoport, Menachem Fromer, Regev Schweiger, Michal Linial.
Abstract
Derivation of biological meaning from large sets of proteins or genes is a frequent task in genomic and proteomic studies. Such sets often arise from experimental methods including large-scale gene expression experiments and mass spectrometry (MS) proteomics. Large sets of genes or proteins are also the outcome of computational methods such as BLAST search and homology-based classifications. We have developed the PANDORA web server, which functions as a platform for the advanced biological analysis of sets of genes, proteins, or proteolytic peptides. First, the input set is mapped to a set of corresponding proteins. Then, an analysis of the protein set produces a graph-based hierarchy which highlights intrinsic relations amongst biological subsets, in light of their different annotations from multiple annotation resources. PANDORA integrates a large collection of annotation sources (GO, UniProt Keywords, InterPro, Enzyme, SCOP, CATH, Gene-3D, NCBI taxonomy and more) that comprise approximately 200,000 different annotation terms associated with approximately 3.2 million sequences from UniProtKB. Statistical enrichment, based on a binomial approximation of the hypergeometric distribution and corrected for multiple hypothesis testing, is calculated using several background sets, including major gene-expression DNA-chip platforms. Users can also visualize either standard or user-defined binary and quantitative properties alongside the proteins. PANDORA 4.2 is available at http://www.pandora.cs.huji.ac.il.
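The enrichment statistic mentioned in the abstract can be illustrated with a minimal Python sketch. It compares the exact hypergeometric upper tail with its binomial approximation for a single annotation term. The function names, the choice of a Bonferroni correction, and all numbers in the usage lines are assumptions made for illustration; the abstract does not state which correction PANDORA applies or how the calculation is implemented.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn without replacement from a
    background of N proteins, K of which carry the annotation."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrichment(set_hits, set_size, bg_hits, bg_size, n_tests):
    """Raw and corrected p-value for one annotation term.

    The raw value uses the binomial approximation with p = bg_hits / bg_size.
    Bonferroni is used for the correction purely as an assumption.
    """
    raw = binomial_tail(set_hits, set_size, bg_hits / bg_size)
    return raw, min(1.0, raw * n_tests)


# Illustrative numbers only: 5 of 15 input proteins carry a term that
# annotates 400 of 20 000 background proteins; 200 terms are tested.
raw_p, adjusted_p = enrichment(5, 15, 400, 20_000, 200)
exact_p = hypergeom_tail(5, 15, 400, 20_000)
print(raw_p, exact_p, adjusted_p)
```

When the input set is small relative to the background, sampling with and without replacement behave almost identically, which is why the binomial tail tracks the hypergeometric tail closely in this regime.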
Year: 2010 PMID: 20444873 PMCID: PMC2896089 DOI: 10.1093/nar/gkq320
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of sequences supported by older and new versions of PANDORA, for model organism representatives
| Species | PANDORA 2.0 | PANDORA 3.0 | PANDORA 4.2 |
|---|---|---|---|
|  | 8507 | 47 641 | 106 529 |
|  | 5678 | 41 813 | 61 783 |
|  | 2049 | 22 603 | 27 942 |
|  | 1680 | 39 367 | 46 671 |
|  | 153 | 8434 | 11 029 |
| Total proteins | 114 033 | 1 072 911 | 3 188 835 |
Sample of the supported annotations and their coverage in the PANDORA database
| Annotation resource | Percentage coverage | Number of annotations |
|---|---|---|
| ENZYME (10/2006) | 8 | 5010 |
| GENE3D (3.0) | 8 | 410 |
| SMART (5.0) | 15 | 704 |
| CATH (v3.1.0) | 19 | 3301 |
| GO (6/2006) | 22 | 13 603 |
| SCOP (1.71) | 25 | 6039 |
| PFAM (19.0) | 73 | 8534 |
| UniProt (8.1) | 78 | 879 |
| InterPro (12.1) | 78 | 13 147 |
| NCBI Taxonomy | 100 | 283 050 |
Figure 1. Result page from a PANDORA analysis of a user set. The set of 15 proteins was included in the input set (marked as Basic Set, BS). (A) Approximately 40 annotation resources can be selected by the user from a menu; multiple selections are encouraged. (B) Sample of the keywords annotation source, color-coded by type. (C) PANDORA graph for a user set associated with quantitative properties: user-input expression levels (red to yellow) and pre-calculated pI (blue to green). (D) Summary and statistics for the quantitative data of the analyzed protein set. (E) Distribution histogram of the expression range for a node. (F) Table of the statistical significance of the annotations, including a correction for multiple hypothesis testing.
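One way to picture the graph-based hierarchy is the following Python sketch. It assumes that each node is the subset of input proteins sharing an annotation term and that an edge links a subset to its nearest proper superset. This construction, the function name `build_hierarchy`, and the toy protein and term labels are all hypothetical; the text here does not specify how PANDORA builds its graph.

```python
from collections import defaultdict


def build_hierarchy(annotations):
    """Build an inclusion hierarchy from a mapping protein -> set of terms.

    Returns (nodes, edges): nodes maps each distinct protein subset to the
    sorted list of terms that define it; edges are (parent, child) pairs
    where child is a proper subset of parent with no node in between.
    """
    # Collect the proteins that carry each term.
    by_term = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            by_term[term].add(protein)

    # Terms covering the identical protein subset share a single node.
    nodes = defaultdict(list)
    for term, proteins in by_term.items():
        nodes[frozenset(proteins)].append(term)
    nodes = {subset: sorted(terms) for subset, terms in nodes.items()}

    # Keep only direct inclusions (skip edges implied by transitivity).
    edges = []
    for child in nodes:
        supersets = [s for s in nodes if child < s]
        for parent in supersets:
            if not any(mid < parent for mid in supersets):
                edges.append((parent, child))
    return nodes, edges


# Toy input with made-up protein and term labels.
toy = {
    "P1": {"kinase", "membrane", "ATP-binding"},
    "P2": {"kinase", "membrane"},
    "P3": {"kinase"},
    "P4": {"nuclear"},
}
nodes, edges = build_hierarchy(toy)
for parent, child in edges:
    print(nodes[parent], "->", nodes[child])
```

Dropping transitive edges keeps the drawing sparse: a subset is linked only to the closest enclosing subsets, and broader containment can be read off by following paths.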