Lev A Soinov, Maria A Krestyaninova, Alvis Brazma.
Abstract
BACKGROUND: Microarray experiments are generating datasets that can help in reconstructing gene networks. One of the most important problems in network reconstruction is finding, for each gene in the network, which genes can affect it and how. We use a supervised learning approach to address this question by building decision-tree-related classifiers, which predict gene expression from the expression data of other genes.
Year: 2003 PMID: 12540298 PMCID: PMC151290 DOI: 10.1186/gb-2003-4-1-r6
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. The decision tree for gene CLN2 of S. cerevisiae. Here CLN2 is the predicted gene; SWI5, CLN1 and CDC28 are the explaining genes. Each arc is labeled with the expression threshold of the corresponding explaining gene.
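The threshold tests that label the arcs of such a tree can be illustrated with a minimal sketch. It assumes an entropy-based information-gain criterion and searches for the single best threshold on one explaining gene. C4.5 itself uses gain ratio and splits recursively, so this is only a one-level illustration, and the expression values and the helper names `entropy` and `best_split` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_split(rows, labels):
    """Return (gene, threshold, gain) of the single best threshold test.

    rows   : list of dicts mapping explaining-gene name -> expression value
    labels : state of the predicted gene in each sample (e.g. "up"/"down")
    """
    base = entropy(labels)
    best = (None, None, 0.0)
    for gene in rows[0]:
        values = sorted({row[gene] for row in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[gene] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[gene] > threshold]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[2]:
                best = (gene, threshold, gain)
    return best

# Hypothetical log-ratio expression values; NOT taken from the cdc15 dataset.
rows = [
    {"SWI5": -0.9, "CLN1": 1.2, "CDC28": 0.1},
    {"SWI5": -0.4, "CLN1": 0.8, "CDC28": -0.2},
    {"SWI5": 0.7, "CLN1": -0.6, "CDC28": 0.3},
    {"SWI5": 1.1, "CLN1": -1.0, "CDC28": -0.1},
]
cln2_state = ["up", "up", "down", "down"]  # hypothetical states of the predicted gene

gene, threshold, gain = best_split(rows, cln2_state)
print(gene, round(threshold, 2), round(gain, 2))
```

In this toy data both SWI5 and CLN1 separate the two states perfectly; the sketch keeps the first gene that reaches the highest gain, which is an arbitrary tie-breaking choice.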
Table 1. The list of genes considered
| ORF | Gene name | Description |
| YMR199W | | Cyclin, G1/S-specific |
| YPL256C | | Cyclin, G1/S-specific |
| YAL040C | | Cyclin, G1/S-specific |
| YGR108W | | Cyclin, G2/M-specific |
| YPR119W | | Cyclin, G2/M-specific |
| YLR210W | | Cyclin, G2/M-specific |
| YPR120C | | Cyclin, B-type |
| YGR109C | | Cyclin, B-type |
| YMR043W | | Transcription factor of the MADS box family |
| YLR079W | | Inhibitor of Cdc28p-Clb protein kinase complex |
| YLR182W | | Transcription factor, subunit of SBF and MBF factors |
| YBR160W | | Cyclin-dependent protein kinase |
| YDL132W | | Controls G1/S transition, component of SCF-ubiquitin ligase complexes |
| YDL056W | | Transcription factor, subunit of the MBF factor |
| YDR054C | | E2 ubiquitin-conjugating enzyme |
| YDR146C | | Transcription factor |
| YDR328C | | Core component of SCF-ubiquitin ligase complexes |
| YER111C | | Transcription factor, subunit of SBF factor |
| YGL116W | | Cell division control protein |
| YGL003C | | Substrate-specific activator of APC-dependent proteolysis |
Table 2. Classification rules
| Gene name | 'Simultaneous' rules | Supporting information | 'Time delay' and 'changes' rules | Supporting information |
| - | [ | - | - | |
| - | [ | |||
| +CLN2⇔+CLN1 | [ | - | - | |
| -CDC20⇔+CLN1 | ||||
| - | [ | ± | [ | |
| + | [ | [ | ||
| ± | [ | + | [ | |
| ↑↓ | [ | |||
| - | [ | + | [ | |
| ↑↓ | [ | |||
| + | [ | - | - | |
| + | [ | - | - | |
| [ | ↑↓ | [ | ||
| + | [ | - | - | |
| + | [ | ↑↓ | [ | |
| - | - | ↑ | [ | |
| ± | [ | |||
| - | ||||
| - | - | + | [ | |
| - | ||||
| - | - | ↑↓ | [ |
Classification rules with high accuracy in all three accuracy tests (CV-10 and cdc28, alpha-factor test sets) are shown in the upper part of the table. Questionable rules (see Classification rules for explanation) are shown in the lower part of the table.
Table 3. Accuracy of final classifiers for 'simultaneous' events in the cdc15 dataset
| Gene name | 10-CV | cdc28 | α | 10-CV | cdc28 | α |
| | 82.8% | 76.5% | 94.4% | 91.1% | 76.5% | 94.4% |
| | 69.9% | 88.2% | 77.8% | 73.5% | 88.2% | 77.8% |
| | 95.8% | 88.2% | 88.9% | 95.3% | 88.2% | 88.9% |
| | 95.8% | 88.2% | 77.8% | 95.0% | 88.2% | 77.8% |
| | 76.0% | 94.1% | 83.3% | 76.0% | 94.1% | 83.3% |
| | 83.7% | 88.2% | 88.9% | 84.4% | 88.2% | 88.9% |
| | 73.7% | 88.2% | 83.3% | | | |
Estimates are shown for C4.5 by Quinlan with wrappers by Kohavi on the cdc15 dataset, with continuous features discretized by the Fayyad and Irani method. 10-CV, 10-fold cross-validation; cdc28 and α, accuracy estimates obtained when the cdc28 and alpha-factor datasets were used as test sets. See Materials and methods for the algorithm description.
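The 10-fold cross-validation estimate mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the folds are contiguous and unshuffled, the classifier `fit_threshold` is a simple stand-in rather than C4.5, and the 20 expression values are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(xs, ys, fit, k=10):
    """Fraction of samples classified correctly when each fold is held out in turn."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)  # train only on the other k-1 folds
        correct += sum(predict(xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)

def fit_threshold(train_x, train_y):
    """Stand-in classifier (NOT C4.5): cut at the midpoint of the two class means."""
    up = [x for x, y in zip(train_x, train_y) if y == "up"]
    down = [x for x, y in zip(train_x, train_y) if y == "down"]
    cut = (sum(up) / len(up) + sum(down) / len(down)) / 2
    return lambda x: "up" if x > cut else "down"

# Hypothetical expression values of one explaining gene and states of a predicted gene.
xs = [0.5 + 0.1 * i if i % 2 == 0 else -0.5 - 0.1 * i for i in range(20)]
ys = ["up" if i % 2 == 0 else "down" for i in range(20)]

accuracy = cross_val_accuracy(xs, ys, fit_threshold, k=10)
print(f"10-CV accuracy: {accuracy:.1%}")
```

Accuracy on an independent test set, as in the cdc28 and α columns, would instead fit the classifier once on the full training dataset and score it on the other dataset.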
Figure 2. The network of gene interactions constructed using the decision rules for the cdc15 dataset (see Table 2). The network is a graphical representation of the information contained in the extracted decision rules. Every node in this graph represents a gene and every arc indicates the relation between the genes defined by the corresponding decision rule. Note the existence of two separate modules in the constructed network.
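A network of this kind can be represented as an adjacency map built from gene pairs linked by a decision rule, with separate modules found as connected components. The sketch below is an illustration only: it ignores arc direction and rule sign, the first four pairs come from Figure 1 and the classification rules table, and the `GENE_A`, `GENE_B` and `GENE_C` pairs are hypothetical placeholders added so that a second module appears.

```python
from collections import defaultdict

def build_network(pairs):
    """Undirected adjacency map from pairs of genes linked by a decision rule."""
    graph = defaultdict(set)
    for gene_a, gene_b in pairs:
        graph[gene_a].add(gene_b)
        graph[gene_b].add(gene_a)
    return graph

def modules(graph):
    """Connected components of the network, each as a sorted list of gene names."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start gene
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(graph[gene] - component)
        seen |= component
        components.append(sorted(component))
    return components

pairs = [
    ("SWI5", "CLN2"), ("CLN1", "CLN2"), ("CDC28", "CLN2"),  # explaining genes in Figure 1
    ("CDC20", "CLN1"),                                      # rule -CDC20 <=> +CLN1
    ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"),             # hypothetical placeholders
]

network = build_network(pairs)
for module in modules(network):
    print(module)
```

A directed, signed representation would be closer to the figure; the undirected form is used here only because it is enough to detect separate modules.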