James L Chen, Yang Liu, Lee T Sam, Jianrong Li, Yves A Lussier.
Abstract
BACKGROUND: Biological data that are well organized by an ontology, such as the Gene Ontology, enable high-throughput availability of the semantic web. Such data can also be used to facilitate high-throughput classification of biomedical information. However, to our knowledge, no evaluation has been published on automating the classification of human disease genes using Gene Ontology. In this study, we evaluated automated classifications of well-defined human disease genes using their Gene Ontology annotations and compared them to a gold standard. This gold standard was independently conceived by Valle's research group and contains 923 human disease genes organized into 14 categories of protein function.
Year: 2007 PMID: 17493290 PMCID: PMC1892104 DOI: 10.1186/1471-2105-8-S3-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
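GO terms mapped to Valle's HDG functional categories
| Valle's category | GO ID | GO term (mf: molecular function; bp: biological process; cc: cellular component) |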
| Enzyme | GO:0050662 | mf: coenzyme binding |
| Enzyme | GO:0019899 | mf: enzyme binding |
| Enzyme | GO:0050790 | bp: regulation of enzyme activity |
| Enzyme | GO:0003824 | mf: catalytic activity |
| Enzyme | GO:0016591 | cc: DNA-directed RNA polymerase II, holoenzyme |
| Enzyme | GO:0005697 | cc: telomerase holoenzyme complex |
| Enzyme | GO:0017101 | cc: aminoacyl-tRNA synthetase multienzyme complex |
| Modulator protein function | GO:0003754 | mf: chaperone activity |
| Modulator protein function | GO:0003757 | mf: chaperone activity |
| Modulator protein function | GO:0003758 | mf: chaperone activity |
| Modulator protein function | GO:0003760 | mf: chaperone activity |
| Modulator protein function | GO:0003761 | mf: chaperone activity |
| Modulator protein function | GO:0003767 | mf: co-chaperone activity |
| Modulator protein function | GO:0003768 | mf: co-chaperone activity |
| Modulator protein function | GO:0003769 | mf: co-chaperone activity |
| Modulator protein function | GO:0003770 | mf: co-chaperone activity |
| Modulator protein function | GO:0003771 | mf: co-chaperone activity |
| Modulator protein function | GO:0016238 | bp: chaperone-related autophagy |
| Modulator protein function | GO:0007022 | bp: chaperone-mediated tubulin folding |
| Modulator protein function | GO:0007023 | bp: post-chaperonin tubulin folding pathway |
| Modulator protein function | GO:0006462 | bp: protein complex assembly, multichaperone pathway |
| Modulator protein function | GO:0016465 | cc: chaperonin ATPase complex |
| Modulator protein function | GO:0005832 | cc: chaperonin-containing T-complex |
| Receptor | GO:0004872 | mf: receptor activity |
| Receptor | GO:0005102 | mf: receptor binding |
| Receptor | GO:0007166 | bp: cell surface receptor linked signal transduction |
| Receptor | GO:0005057 | mf: receptor signaling protein activity |
| Transcription factor | GO:0003700 | mf: transcription factor activity |
| Transcription factor | GO:0000130 | mf: transcription factor activity |
| Transcription factor | GO:0005667 | cc: transcription factor complex |
| Transcription factor | GO:0042990 | bp: regulation of transcription factor-nucleus import |
| Transcription factor | GO:0042991 | bp: transcription factor-nucleus import |
| Intracellular matrix component | GO:0005622 | cc: intracellular |
| Intracellular matrix component | GO:0046907 | bp: intracellular transport |
| Intracellular matrix component | GO:0008092 | mf: cytoskeletal protein binding |
| Extracellular matrix component | GO:0007160 | bp: cell-matrix adhesion |
| Extracellular matrix component | GO:0009989 | bp: cell-matrix recognition |
| Extracellular matrix component | GO:0005578 | cc: extracellular matrix |
| Extracellular matrix component | GO:0005921 | cc: gap junction |
| Extracellular matrix component | GO:0030055 | cc: cell-matrix junction |
| Extracellular matrix component | GO:0005201 | mf: extracellular matrix structural constituent |
| Extracellular matrix component | GO:0050840 | mf: extracellular matrix binding |
| Transmembrane transporter | GO:0005215 | mf: transporter activity |
| Transmembrane transporter | GO:0000036 | mf: acyl carrier activity |
| Transmembrane transporter | GO:0019793 | mf: ISG15 carrier activity |
| Channel | GO:0015267 | mf: channel/pore class transporter activity |
| Channel | GO:0008282 | cc: ATP-sensitive potassium channel complex |
| Channel | GO:0005891 | cc: voltage-gated calcium channel complex |
| Channel | GO:0016935 | cc: glycine-gated chloride channel complex |
| Channel | GO:0019183 | cc: histamine-gated chloride channel complex |
| Channel | GO:0005892 | cc: nicotinic acetylcholine-gated receptor-channel complex |
| Channel | GO:0008076 | cc: voltage-gated potassium channel complex |
| Channel | GO:0001518 | cc: voltage-gated sodium channel complex |
| Hormone | GO:0042562 | mf: hormone binding |
| Hormone | GO:0005179 | mf: hormone activity |
| Hormone | GO:0005131 | mf: growth hormone receptor binding |
| Hormone | GO:0046879 | bp: hormone secretion |
| Hormone | GO:0009725 | bp: response to hormone stimulus |
| Hormone | GO:0009755 | bp: hormone mediated signaling |
| Hormone | GO:0005831 | cc: steroid hormone aporeceptor complex |
| Hormone | GO:0016914 | cc: follicle-stimulating hormone complex |
| Immunoglobulin | GO:0019865 | mf: immunoglobulin binding |
| Immunoglobulin | GO:0019763 | mf: immunoglobulin receptor activity |
| Immunoglobulin | GO:0048305 | bp: IG secretion |
| Immunoglobulin | GO:0045190 | bp: isotype switching |
| Immunoglobulin | GO:0019814 | cc: immunoglobulin complex |
| Cell signalling | GO:0019955 | mf: cytokine binding |
| Cell signalling | GO:0005125 | mf: cytokine activity |
| Cell signalling | GO:0019221 | bp: cytokine and chemokine mediated signaling pathway |
| Cell signalling | GO:0042089 | bp: cytokine biosynthesis |
| Cell signalling | GO:0019838 | mf: growth factor binding |
| Cell signalling | GO:0008083 | mf: growth factor activity |
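Comparison of GO mapping classification to Valle's categories
| HDGs in gold standard | True positives | False negatives | False positives | Precision | Recall | |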
| 20 | 11 | 9 | 20 | 35% | 55% | |
| 32 | 25 | 7 | 15 | 61% | 78% | |
| 232 | 208 | 24 | 107 | 66% | 90% | |
| 54 | 41 | 13 | 25 | 62% | 76% | |
| 14 | 14 | 0 | 1 | 93% | 100% | |
| 4 | 1 | 3 | 2 | 33% | 25% | |
| 50 | 23 | 27 | 207 | 10% | 46% | |
| 105 | 4 | 101 | 5 | 44% | 4% | |
| 86 | 82 | 4 | 96 | 46% | 95% | |
| 79 | 62 | 17 | 10 | 85% | 78% | |
| 35 | 42 | 3 | 102 | 29% | 93% | |
* Because more than one GO term can be assigned to a gene, the overall FP rate is overestimated: a gene classified multiple times through multiple GO annotations is counted as a false positive more than once.
** Overall accuracy scores are calculated from the overall true positive, false positive, and false negative counts (they are not an average of the categorical accuracy scores).
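The distinction drawn in the second footnote, pooled counts versus an average of per-category scores, can be illustrated with a minimal Python sketch. The three (TP, FN, FP) triples are copied from rows of the GO mapping table; the function names and the choice of only three rows are assumptions made for illustration, so the pooled values shown here are not the overall scores of the study.

```python
from typing import Optional


def precision(tp: int, fp: int) -> Optional[float]:
    """TP / (TP + FP); None when nothing was assigned (zero divided by zero)."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp: int, fn: int) -> Optional[float]:
    """TP / (TP + FN); None when the category has no gold-standard genes."""
    return tp / (tp + fn) if (tp + fn) else None


# (TP, FN, FP) for three rows of the GO mapping table (illustrative subset).
rows = [(11, 9, 20), (208, 24, 107), (14, 0, 1)]

# Pooled ("overall") scores: sum the counts first, then divide once.
tp_all = sum(tp for tp, _, _ in rows)
fn_all = sum(fn for _, fn, _ in rows)
fp_all = sum(fp for _, _, fp in rows)
overall_precision = precision(tp_all, fp_all)
overall_recall = recall(tp_all, fn_all)

# Average of the per-category precisions, shown only for contrast.
mean_category_precision = sum(precision(tp, fp) for tp, _, fp in rows) / len(rows)
```

Pooling weights each category by its size, so a large category dominates the overall score, whereas a plain average treats every category equally.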
Figure 2. Frequency distribution of the proteins associated with GO annotations. Note that, to construct the GO mappings, all genes annotated to GO terms subsumed by each of the 74 selected GO classes were aggregated into that higher-level class, which accounts for the large number of GO terms per class in the GO mappings. In contrast, the GO Clustering method retains the original granularity of the GO annotations and allows for GO terms not subsumed by the 74 GO classes selected for GO Mapping.
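The aggregation step described in this caption, in which genes annotated to subsumed terms are rolled up into a selected higher-level class, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the term hierarchy, gene names, and function names are hypothetical, and the hierarchy is a simplified toy rather than the actual GO structure.

```python
from collections import defaultdict

# Toy child -> parents relation (hypothetical, simplified; not the real GO graph).
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "ion channel activity": ["channel activity"],
    "catalytic activity": [],
    "channel activity": [],
}

# Higher-level classes chosen for the mapping (stand-ins for the selected GO classes).
SELECTED = {"catalytic activity", "channel activity"}

# Hypothetical gene annotations.
ANNOTATIONS = {
    "GENE1": ["protein kinase activity"],
    "GENE2": ["ion channel activity", "kinase activity"],
}


def ancestors(term, parents):
    """Return the term itself plus every term that subsumes it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def aggregate(annotations, parents, selected):
    """Collect, for each selected class, the genes annotated to it or to any subsumed term."""
    genes_by_class = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for cls in ancestors(term, parents) & selected:
                genes_by_class[cls].add(gene)
    return dict(genes_by_class)


result = aggregate(ANNOTATIONS, PARENTS, SELECTED)
```

In this toy example GENE2 lands in both selected classes, which is the kind of multiple assignment that can inflate false positive counts when a gene carries several annotations.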
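Comparison of GO clustering classification to Valle's categories
| HDGs in gold standard | True positives | False negatives | False positives | Precision | Recall | |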
| 20 | 0 | 20 | 0 | N/A | 0% | |
| 32 | 28 | 4 | 7 | 34% | 88% | |
| 232 | 143 | 89 | 15 | 85% | 61% | |
| 54 | 17 | 37 | 21 | 63% | 31% | |
| 14 | 0 | 14 | 0 | N/A | 0% | |
| 4 | 0 | 4 | 0 | N/A | 0% | |
| 50 | 42 | 8 | 10 | 43% | 84% | |
| 105 | 28 | 77 | 231 | 25% | 27% | |
| 86 | 58 | 28 | 10 | 73% | 67% | |
| 79 | 68 | 11 | 4 | 73% | 86% | |
| 35 | 15 | 20 | 42 | 32% | 43% | |
** Overall accuracy scores are calculated from the overall true positive, false positive, and false negative counts (they are not an average of the categorical accuracy scores).
N/A: not applicable because precision cannot be calculated for categories with 0 true positive results and 0 false positives (zero divided by zero).
Correlations of 14 clusters generated by the GO clustering method and Valle's categories of Human Disease Genes
| 1 | 3 | 4 | 2 | 8 | 4 | 2 | 1 | 1 | 8 | 2 | 2 | 2 | ||||
| 2 | 4 | 10 | 4 | 5 | 2 | 2 | 7 | 1 | 6 | |||||||
| 3 | 1 | 7 | 2 | 1 | ||||||||||||
| 4 | 7 | 11 | 2 | 1 | 1 | 6 | ||||||||||
| 5 | 11 | 1 | 2 | |||||||||||||
| 6 | 4 | |||||||||||||||
| 7 | 19 | 1 | ||||||||||||||
| 8 | 1 | 1 | 3 | 1 | 1 | 3 | ||||||||||
| 9 | 5 | 16 | 4 | 17 | 8 | 6 | 3 | 8 | 17 | 7 | 2 | |||||
| 10 | 4 | 2 | 5 | 5 | 3 | 3 | 1 | 1 | 3 | |||||||
| 11 | 8 | 5 | 1 | 2 | 2 | 2 | 14 | 3 | ||||||||
| 12 | 3 | 1 | 1 | 2 | 1 | |||||||||||
| 13 | 1 | 2 | 1 | |||||||||||||
| 14 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||||
Mapping of 14 clusters to 14 of Valle's classifications of HDGs. Numbers in the table denote the count of HDGs in each category. By design, multiple clusters could map to one protein function category, but each cluster could not be mapped to more than one category. The bold underlined numbers represent the true positive HDGs and the Valle category chosen for each GO cluster. Other numbers in the cluster are considered false positives in the evaluation. Valle's categories "unknown" and "others" were not evaluated because of their ambiguity.
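The evaluation rule in this caption can be sketched in Python. The sketch assumes that the category chosen for a cluster is the one with the largest HDG count in that cluster; the caption does not state the selection rule, so this is an assumption, and the cluster counts below are hypothetical rather than taken from the table.

```python
from collections import defaultdict


def assign_clusters(counts):
    """Map each cluster to one category and tally true and false positives.

    counts: {cluster_id: {category: number_of_HDGs}}
    Assumption: a cluster is assigned to its largest category. Genes of that
    category count as true positives; all other genes in the cluster count as
    false positives for the assigned category.
    """
    assignment = {}
    tp = defaultdict(int)
    fp = defaultdict(int)
    for cluster, by_category in counts.items():
        best = max(by_category, key=by_category.get)
        assignment[cluster] = best
        tp[best] += by_category[best]
        fp[best] += sum(n for cat, n in by_category.items() if cat != best)
    return assignment, dict(tp), dict(fp)


# Hypothetical counts for three clusters.
toy_counts = {
    1: {"Enzyme": 19, "Receptor": 1},
    2: {"Receptor": 7, "Enzyme": 2, "Hormone": 1},
    3: {"Enzyme": 11, "Channel": 3},
}

assignment, tp, fp = assign_clusters(toy_counts)
```

Several clusters may share a category (clusters 1 and 3 here), but each cluster contributes to exactly one, matching the constraint stated in the caption. A category that no cluster selects receives no true or false positives, which is the situation in which precision is reported as N/A.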
Figure 1. Precision-recall graph comparing the accuracies of the GO Mapping and GO Clustering methods. The data points represented by a larger circle and square with empty centers correspond to the overall accuracy scores of the two methods. Additional points on the precision-recall curve were obtained by progressively removing classes with poor precision from the evaluated set. Valle's Human Disease Gene annotations were used as the gold standard to calculate precision and recall, the task being to recapitulate Valle's categorization of human disease genes via GO Mapping or GO Clustering as described in the methods. Note that between 30% and 55% recall, the GO Clustering method provides higher precision than GO Mapping. Overall, GO Mapping provides about 15% higher recall, but 10% lower precision, than GO Clustering.
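One way to generate such a curve is sketched below in Python. It assumes that recall is always measured against the full gold standard, so that dropping a class lowers recall while raising precision; the caption does not specify the denominator, so this is an assumption. The four (TP, FN, FP) triples come from rows of the GO mapping table, and the function name is hypothetical.

```python
def pr_curve(rows):
    """Return (recall, precision) points obtained by dropping the worst class each step.

    rows: list of (TP, FN, FP) per class.
    Assumption: recall uses the gold-standard total of all classes as its
    denominator, including classes that have already been removed.
    """
    total_positives = sum(tp + fn for tp, fn, _ in rows)

    def class_precision(row):
        tp, _, fp = row
        return tp / (tp + fp) if (tp + fp) else 0.0

    # Lowest-precision class first, so it is the first to be removed.
    remaining = sorted(rows, key=class_precision)
    points = []
    while remaining:
        tp = sum(r[0] for r in remaining)
        fp = sum(r[2] for r in remaining)
        pooled_precision = tp / (tp + fp) if (tp + fp) else None
        points.append((tp / total_positives, pooled_precision))
        remaining.pop(0)
    return points


# Illustrative subset of the GO mapping table: (TP, FN, FP).
mapping_rows = [(11, 9, 20), (208, 24, 107), (14, 0, 1), (23, 27, 207)]
points = pr_curve(mapping_rows)
```

The first point uses every class, and each later point trades recall for precision as the least precise remaining class is excluded.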