Alex A Freitas, Olga Vasieva, João Pedro de Magalhães.
Abstract
BACKGROUND: The ageing of the worldwide population means there is a growing need for research on the biology of ageing. DNA damage is likely a key contributor to the ageing process, and elucidating the role of different DNA repair systems in ageing is of great interest. In this paper we propose a data mining approach, based on classification methods (decision trees and Naive Bayes), for analysing data about human DNA repair genes. The goal is to build classification models that allow us to discriminate between ageing-related and non-ageing-related DNA repair genes, in order to better understand their different properties.
Year: 2011 PMID: 21226956 PMCID: PMC3031233 DOI: 10.1186/1471-2164-12-27
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Area under ROC curve (AUC, in %) for J48 algorithm, for different datasets and different values of the GO term occurrence threshold (t)
| Dataset Id | PPI-related attributes | t = 3 (301 GO terms) | t = 7 (157 GO terms) | t = 11 (101 GO terms) |
|---|---|---|---|---|
| D1 | none | 63.0 | 68.0 | 65.3 |
| D2 | #partners | 66.1 | 63.3 | 59.6 |
| D3 | #partners + 10 BPI attr's | 72.3 | 74.2 | 75.4 |
| D4 | #partners + 20 BPI attr's | 73.5 | 74.6 | |
| D5 | #partners + 30 BPI attr's | 79.2 | 67.7 | 77.5 |
Area under ROC curve (AUC, in %) for Naive Bayes, for different datasets and different values of the GO term occurrence threshold (t)
| Dataset Id | PPI-related attributes | t = 3 (301 GO terms) | t = 7 (157 GO terms) | t = 11 (101 GO terms) |
|---|---|---|---|---|
| D1 | none | 75.9 | 74.9 | 71.9 |
| D2 | #partners | 76.0 | 75.3 | 74.0 |
| D3 | #partners + 10 BPI attr's | 78.3 | 77.1 | 76.6 |
| D4 | #partners + 20 BPI attr's | 80.5 | 80.1 | 79.4 |
| D5 | #partners + 30 BPI attr's | 80.7 | 80.2 | |
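For readers unfamiliar with the measure reported in the two tables above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It uses the standard pairwise interpretation of AUC: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The function name `auc`, the scores, and the labels are illustrative assumptions, not values or code from the paper.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    `labels` uses 1 for the positive class (here, hypothetically,
    ageing-related) and 0 for the negative class. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for four genes (illustration only).
example_scores = [0.9, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
example_auc_percent = 100 * auc(example_scores, example_labels)
print(example_auc_percent)  # 75.0: three of the four pairs are ranked correctly
```

On this scale, 50% corresponds to random ranking and 100% to a perfect ranking, which is how the percentages in the tables can be read.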
Frequency of occurrence as root node in decision tree built by J48
| Attribute | Frequency |
|---|---|
| WRN_interaction | 6 (out of 6) |
| XRCC5_interaction | 2 (out of 9) |
| #partners | 2 (out of 12) |
| GO:0009719 (response to endogenous stimulus) | 3 (out of 5) |
| GO:0042221 (response to chemical stimulus) | 2 (out of 15) |
Figure 1. Network of genes/proteins and biological processes associated with the ageing-related patterns discovered via data mining. Pink links connect proteins to the process of double-strand DNA break repair, green links connect proteins to the process of telomere maintenance, dark blue to T cell development, light blue to V(D)J recombination, and yellow to apoptosis. Figure generated through the use of Ingenuity Pathways Analysis.
Figure 2. Summary of the procedure for creating a set of predictor attributes involving GO terms. First, a list of gene IDs is used to download from UniProt the specific GO terms annotated for each gene. Next, information about GO term definitions is used to select only the biological process (BP) terms for each gene, and then to find the ancestors of those terms in the GO hierarchy. (The notation "anc(term1)" denotes the set of all ancestors of term1, "anc1(term1)" denotes the first ancestor of term1, etc.) After adding those ancestor GO terms to the list of GO terms per gene, the dataset is transformed into a format having a fixed-length list of binary attributes (representing GO terms) for each gene, where each attribute value indicates whether or not the gene is annotated with the corresponding GO term.
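The procedure summarised in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the term IDs, gene names, the `PARENTS` and `NAMESPACE` dictionaries, and all function names are hypothetical, and the UniProt download step is replaced by a hard-coded toy mapping. The sketch also assumes that the occurrence threshold t keeps a GO term only if at least t genes are annotated with it.

```python
from collections import Counter

# Hypothetical toy GO hierarchy: term -> direct parents.
PARENTS = {
    "GO:C": ["GO:B"],
    "GO:B": ["GO:A"],
    "GO:D": ["GO:A"],
    "GO:A": [],
    "GO:M": [],
}

# Hypothetical namespace of each term: "BP" = biological process.
NAMESPACE = {"GO:A": "BP", "GO:B": "BP", "GO:C": "BP", "GO:D": "BP", "GO:M": "MF"}

# Stand-in for the per-gene annotations that would be downloaded from UniProt.
GENE_TERMS = {
    "geneX": ["GO:C"],
    "geneY": ["GO:D", "GO:M"],
    "geneZ": ["GO:B"],
}


def ancestors(term, parents):
    """Return anc(term): all ancestors of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def expand_bp_terms(gene_terms, parents, namespace):
    """Keep only BP terms per gene, then add all of their ancestors."""
    expanded = {}
    for gene, terms in gene_terms.items():
        bp_terms = [t for t in terms if namespace.get(t) == "BP"]
        full = set(bp_terms)
        for term in bp_terms:
            full |= ancestors(term, parents)
        expanded[gene] = full
    return expanded


def binary_attributes(expanded, t=1):
    """Build one fixed-length 0/1 vector per gene over terms seen in >= t genes."""
    counts = Counter(term for terms in expanded.values() for term in terms)
    columns = sorted(term for term, n in counts.items() if n >= t)
    rows = {
        gene: [int(term in terms) for term in columns]
        for gene, terms in expanded.items()
    }
    return columns, rows


expanded = expand_bp_terms(GENE_TERMS, PARENTS, NAMESPACE)
columns, rows = binary_attributes(expanded, t=2)
print(columns)
for gene, vector in rows.items():
    print(gene, vector)
```

With the toy data, the non-BP term is dropped, each gene inherits the ancestors of its annotated terms, and only terms shared by at least two genes remain as binary columns. Raising t shrinks the attribute set, which matches the direction of the term counts reported for t = 3, 7, and 11 in the tables.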