Lee T Sam, George Michailidis.
Abstract
The availability of large-scale, genome-wide data about the molecular interactome of entire organisms has made possible new types of integrative studies, making use of rapidly accumulating knowledge of gene-disease associations. Previous studies have established the presence of functional biomodules in the molecular interaction network of living organisms, a number of which have been associated with the pathogenesis and progression of human disease. While a number of studies have examined the networks and biomodules associated with disease, the properties that contribute to the particular susceptibility of these subnetworks to disruptions leading to disease phenotypes have not been extensively studied. We take a machine learning approach to the characterization of these disease subnetworks associated with complex and single-gene diseases, taking into account both the biological roles of their constituent genes and topological properties of the networks they form.
Year: 2009 PMID: 21347156 PMCID: PMC3041579
Source DB: PubMed Journal: Summit Transl Bioinform ISSN: 2153-6430
| === Run information === |
| Scheme: weka.clusterers.SimpleKMeans -N 3 -S 10 |
| Relation: combined_data |
| Instances: 2944 |
| Attributes: 20 |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Ignored: |
| source |
| phenotype code |
| Test mode: Classes to clusters evaluation on training data |
| === Model and evaluation on training set === |
| kMeans |
| ====== |
| Number of iterations: 6 |
| Within cluster sum of squared errors: 1660.859140812153 |
Biological parameters only: dataset split into “disease” and “normal” classes
| Out of bag error: 0.0309 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2836 | 96.2988 % | |||
| Incorrectly Classified Instances | 109 | 3.7012 % | |||
| Kappa statistic | 0.8064 | ||||
| Mean absolute error | 0.1287 | ||||
| Root mean squared error | 0.216 | ||||
| Relative absolute error | 60.339 % | ||||
| Root relative squared error | 66.1667 % | ||||
| Total Number of Instances | 2945 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.995 | 0.272 | 0.964 | 0.995 | 0.979 | GO |
| 0.728 | 0.005 | 0.956 | 0.728 | 0.827 | Disease |
Biological parameters only: dataset split into CD, SGD, and GO classes
| Out of bag error: 0.0309 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2832 | 96.163 % | |||
| Incorrectly Classified Instances | 113 | 3.837 % | |||
| Kappa statistic | 0.8008 | ||||
| Mean absolute error | 0.0893 | ||||
| Root mean squared error | 0.1801 | ||||
| Relative absolute error | 61.2569 % | ||||
| Root relative squared error | 66.7931 % | ||||
| Total Number of Instances | 2945 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.165 | 0.003 | 0.565 | 0.165 | 0.255 | SGD |
| 0.867 | 0.001 | 0.992 | 0.867 | 0.925 | CD |
| 0.996 | 0.283 | 0.962 | 0.996 | 0.979 | GO |
Biological parameters only: SGD and GO classes
| Out of bag error: 0.0274 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2590 | 97.1129 % | |||
| Incorrectly Classified Instances | 77 | 2.8871 % | |||
| Kappa statistic | 0.1974 | ||||
| Mean absolute error | 0.0527 | ||||
| Root mean squared error | 0.1661 | ||||
| Relative absolute error | 91.1176 % | ||||
| Root relative squared error | 97.9961 % | ||||
| Total Number of Instances | 2667 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.127 | 0.003 | 0.556 | 0.127 | 0.206 | SGD |
| 0.997 | 0.873 | 0.974 | 0.997 | 0.985 | GO |
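The SGD-versus-GO run above pairs 97.1 % accuracy with a kappa of only 0.1974. Kappa subtracts the agreement expected by chance, so a classifier that mostly predicts the majority class scores low on it even when raw accuracy is high. The following is a minimal sketch of that calculation. The confusion matrix is an assumption: the source does not report it, and the counts below are one set inferred to be consistent with the reported totals and the rounded per-class rates.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the marginals, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# ASSUMPTION: this matrix is not given in the source. It is inferred from the
# reported 2590 correct / 77 incorrect instances and the rounded TP/FP rates.
inferred = [
    [10, 69],    # actual SGD: predicted SGD, predicted GO
    [8, 2580],   # actual GO:  predicted SGD, predicted GO
]

accuracy = (inferred[0][0] + inferred[1][1]) / sum(map(sum, inferred))
print(f"accuracy = {accuracy:.4f}, kappa = {cohen_kappa(inferred):.4f}")
```

Under these assumed counts, nearly all of the observed agreement comes from GO instances, which is why the chance-corrected score stays far below the raw accuracy.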
Topological Parameters Only: dataset split into “disease” and “normal” classes
| Out of bag error: 0.0853 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2675 | 90.8628 % | |||
| Incorrectly Classified Instances | 269 | 9.1372 % | |||
| Kappa statistic | 0.4646 | ||||
| Mean absolute error | 0.1475 | ||||
| Root mean squared error | 0.2732 | ||||
| Relative absolute error | 69.1481 % | ||||
| Root relative squared error | 83.7012 % | ||||
| Total Number of Instances | 2944 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.392 | 0.02 | 0.729 | 0.392 | 0.51 | Disease |
| 0.98 | 0.608 | 0.921 | 0.98 | 0.95 | GO |
Topological Parameters Only: dataset split into CD, SGD, and GO classes
| Out of bag error: 0.0832 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2688 | 91.30% | |||
| Incorrectly Classified Instances | 256 | 8.70% | |||
| Kappa statistic | 0.4863 | ||||
| Mean absolute error | 0.1016 | ||||
| Root mean squared error | 0.2241 | ||||
| Relative absolute error | 69.7015 % | ||||
| Root relative squared error | 83.1102 % | ||||
| Total Number of Instances | 2944 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.038 | 0.004 | 0.214 | 0.038 | 0.065 | SGD |
| 0.493 | 0.011 | 0.83 | 0.493 | 0.619 | CD |
| 0.985 | 0.608 | 0.922 | 0.985 | 0.952 | GO |
Topological Parameters Only: SGD and GO classes
| Out of bag error: 0.0315 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2581 | 96.81% | |||
| Incorrectly Classified Instances | 85 | 3.19% | |||
| Kappa statistic | 0.0586 | ||||
| Mean absolute error | 0.0543 | ||||
| Root mean squared error | 0.1716 | ||||
| Relative absolute error | 93.8315 % | ||||
| Root relative squared error | 101.201 % | ||||
| Total Number of Instances | 2666 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.038 | 0.003 | 0.25 | 0.038 | 0.066 | SGD |
| 0.997 | 0.962 | 0.971 | 0.997 | 0.984 | GO |
All parameters: dataset split into “disease” and “normal” classes
| Out of bag error: 0.0452 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2791 | 94.803 % | |||
| Incorrectly Classified Instances | 153 | 5.197 % | |||
| Kappa statistic | 0.7128 | ||||
| Mean absolute error | 0.1269 | ||||
| Root mean squared error | 0.2191 | ||||
| Relative absolute error | 59.5021 % | ||||
| Root relative squared error | 67.1287 % | ||||
| Total Number of Instances | 2944 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.611 | 0.005 | 0.94 | 0.611 | 0.74 | Disease |
| 0.995 | 0.389 | 0.949 | 0.995 | 0.971 | GO |
All parameters: dataset split into CD, SGD, and GO classes
| Out of bag error: 0.0438 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2795 | 94.9389 % | |||
| Incorrectly Classified Instances | 149 | 5.0611 % | |||
| Kappa statistic | 0.7225 | ||||
| Mean absolute error | 0.0886 | ||||
| Root mean squared error | 0.1815 | ||||
| Relative absolute error | 60.7398 % | ||||
| Root relative squared error | 67.2984 % | ||||
| Total Number of Instances | 2944 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.101 | 0.003 | 0.5 | 0.101 | 0.168 | SGD |
| 0.997 | 0.387 | 0.949 | 0.997 | 0.972 | GO |
| 0.752 | 0.001 | 0.986 | 0.752 | 0.853 | CD |
All parameters: SGD and GO classes
| Out of bag error: 0.0281 | |||||
|---|---|---|---|---|---|
| Correctly Classified Instances | 2591 | 97.1868 % | |||
| Incorrectly Classified Instances | 75 | 2.8132 % | |||
| Kappa statistic | 0.2332 | ||||
| Mean absolute error | 0.0498 | ||||
| Root mean squared error | 0.1594 | ||||
| Relative absolute error | 86.0831 % | ||||
| Root relative squared error | 93.9883 % | ||||
| Total Number of Instances | 2666 | ||||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.152 | 0.003 | 0.6 | 0.152 | 0.242 | SGD |
| 0.997 | 0.848 | 0.975 | 0.997 | 0.986 | GO |
All parameters: SGD and CD classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: OMIM-PhenoGO-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 357 |
| Attributes: 19 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Biological Parameters Only
| Variable | GO | SGD/OMIM | CD/PhenoGO | MeanDecreaseAccuracy | MeanDecreaseGini |
|---|---|---|---|---|---|
| averageGeneStart | 0.2783482 | 1.0280783 | 0.9059960 | 0.2757494 | 84.56684 |
| averageGeneEnd | 0.2768157 | 0.9394527 | 0.8925733 | 0.2747467 | 82.32455 |
| averageLength | 0.2644807 | 1.2301754 | 0.9510359 | 0.2876197 | 89.97404 |
| averageGeneStrand | 0.1758904 | 0.1357294 | 0.9539724 | 0.2776031 | 63.51283 |
| averagePfamCount | 0.2730130 | 0.5254745 | 0.8856997 | 0.2717815 | 68.71366 |
| averagePrositeCount | 0.2732054 | 0.7780531 | 0.8667791 | 0.2729219 | 71.44485 |
| averageSingnalDomainCount | 0.2126032 | 1.1321489 | 0.9215645 | 0.2744301 | 46.04487 |
| averageTransmembraneDomainsCount | 0.2369126 | 0.7511460 | 0.9107138 | 0.2746473 | 41.26618 |
| averageGCContent | 0.2527932 | 1.1863229 | 0.9633071 | 0.2872784 | 90.52120 |
Topological Parameters Only
| Variable | GO | SGD/OMIM | CD/PhenoGO | MeanDecreaseAccuracy | MeanDecreaseGini |
|---|---|---|---|---|---|
| observedEdgeFraction | 0.23001163 | 0.5940764 | 0.90312482 | 0.24675347 | 93.847995 |
| averageNodeDegree | 0.18907358 | −0.1896722 | 0.92494854 | 0.25118579 | 73.325193 |
| maxNodeDegree | 0.23248537 | −0.0195584 | 0.75146118 | 0.23964507 | 45.595834 |
| radius | 0.14363009 | 0.3341730 | 0.73797260 | 0.17620126 | 10.558500 |
| diameter | 0.16504637 | 0.3258433 | 0.89950612 | 0.21990106 | 24.283709 |
| nodeCount | 0.24716779 | 0.1174077 | 0.62814917 | 0.24756213 | 47.349672 |
| cyclicity | 0.07668406 | 0.1599157 | 0.05666838 | 0.08233893 | 2.229017 |
| biconnectivity | 0.05281318 | 0.2182699 | 0.47637630 | 0.10961336 | 3.538654 |
| clusteringCoefficent | 0.28966769 | 0.9925351 | 0.96101890 | 0.28810431 | 97.553541 |
Combined Parameterization
| Variable | GO | SGD/OMIM | CD/PhenoGO | MeanDecreaseAccuracy | MeanDecreaseGini |
|---|---|---|---|---|---|
| averageGeneStart | 0.25577147 | 0.6187922 | 0.8782965 | 0.2631096 | 58.025555 |
| averageGeneEnd | 0.24189366 | 0.8649050 | 0.8823725 | 0.2517155 | 54.866536 |
| averageLength | 0.21860181 | 1.0476172 | 0.9157395 | 0.2702029 | 53.928221 |
| averageGeneStrand | 0.21222727 | 0.4779712 | 0.8899448 | 0.2613027 | 37.971447 |
| averagePfamCount | 0.24589871 | 0.7138733 | 0.8139401 | 0.2557329 | 51.837923 |
| averagePrositeCount | 0.24653767 | 0.8026352 | 0.8288924 | 0.2553449 | 51.873560 |
| averageSingnalDomainCount | 0.17608440 | 0.8725259 | 0.8494207 | 0.2504462 | 28.695867 |
| averageTransmembraneDomainsCount | 0.17643006 | 0.8016404 | 0.8587388 | 0.2398903 | 25.758543 |
| averageGCContent | 0.20630777 | 1.0249891 | 0.9042456 | 0.2621500 | 57.568889 |
| observedEdgeFraction | 0.22721854 | 0.9567640 | 0.8553682 | 0.2424491 | 39.992423 |
| averageNodeDegree | 0.24357245 | 0.6044357 | 0.8350696 | 0.2586311 | 33.451690 |
| maxNodeDegree | 0.23311884 | 0.5687222 | 0.7704013 | 0.2418791 | 23.089282 |
| radius | 0.19372018 | 0.5507024 | 0.5725879 | 0.1942285 | 9.303571 |
| diameter | 0.22263432 | 0.7683270 | 0.7232573 | 0.2295851 | 16.473967 |
| nodeCount | 0.23954530 | 0.7925986 | 0.8041791 | 0.2430081 | 25.501844 |
| cyclicity | 0.11050759 | 0.2201386 | 0.5157050 | 0.1559355 | 3.013125 |
| biconnectivity | 0.07642597 | 0.1890160 | 0.2993280 | 0.1074229 | 1.420956 |
| clusteringCoefficent | 0.26042896 | 1.4008805 | 0.8991804 | 0.2705517 | 61.914586 |
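To read the combined-parameterization table as a ranking, the variables can be sorted by their MeanDecreaseGini column. The sketch below does only that: the values are copied from the table above with the variable names kept exactly as printed, and the choice of MeanDecreaseGini over MeanDecreaseAccuracy as the sort key is an illustrative assumption, not something the source prescribes.

```python
# MeanDecreaseGini values copied from the combined-parameterization table.
# Variable names are kept exactly as they appear in the table.
mean_decrease_gini = {
    "averageGeneStart": 58.025555,
    "averageGeneEnd": 54.866536,
    "averageLength": 53.928221,
    "averageGeneStrand": 37.971447,
    "averagePfamCount": 51.837923,
    "averagePrositeCount": 51.873560,
    "averageSingnalDomainCount": 28.695867,
    "averageTransmembraneDomainsCount": 25.758543,
    "averageGCContent": 57.568889,
    "observedEdgeFraction": 39.992423,
    "averageNodeDegree": 33.451690,
    "maxNodeDegree": 23.089282,
    "radius": 9.303571,
    "diameter": 16.473967,
    "nodeCount": 25.501844,
    "cyclicity": 3.013125,
    "biconnectivity": 1.420956,
    "clusteringCoefficent": 61.914586,
}

# Sort from most to least important under this measure.
ranked = sorted(mean_decrease_gini.items(), key=lambda item: item[1], reverse=True)

for position, (name, score) in enumerate(ranked, start=1):
    print(f"{position:2d}. {name:34s} {score:9.3f}")
```

Sorted this way, the clustering coefficient ranks first and biconnectivity last among the 18 variables.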
Biological parameters only: dataset split into “disease” and “normal” classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Instances: 1706 |
| Attributes: 10 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Biological parameters only: dataset split into CD, SGD, and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Instances: 1706 |
| Attributes: 10 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Biological parameters only: SGD and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 6 -S 1 |
| Relation: filtered_biological_2class_OMIM_omly-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1428 |
| Attributes: 10 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Topological Parameters Only: dataset split into “disease” and “normal” classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_2class_topological_data-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1705 |
| Attributes: 10 |
| state |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Topological Parameters Only: dataset split into CD, SGD, and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_3class_topological_data-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1705 |
| Attributes: 10 |
| source |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Topological Parameters Only: SGD and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_2class_omimonly_topological_data-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1427 |
| Attributes: 10 |
| source |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
All parameters: dataset split into “disease” and “normal” classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_combined_data_2class-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1705 |
| Attributes: 19 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
All parameters: dataset split into CD, SGD, and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_combined_data-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1705 |
| Attributes: 19 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
All parameters: SGD and GO classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_combined_data_2class_omim_only-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 1427 |
| Attributes: 19 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
All parameters: SGD and CD classes
| === Run information === |
| Scheme: weka.classifiers.trees.RandomForest -I 100 -K 4 -S 1 |
| Relation: filtered_OMIM-PhenoGO-weka.filters.unsupervised.attribute.Remove-R2 |
| Instances: 357 |
| Attributes: 19 |
| source |
| average gene start |
| average gene end |
| average length |
| average gene strand |
| average pfam count |
| average prosite count |
| average # of signal domains |
| average # transmembrane domains |
| average GC content |
| observed edges/total possible edges |
| average node degree |
| max node degree |
| radius |
| diameter |
| node count |
| cyclicity |
| biconnectivity |
| clustering coefficient |
| Test mode: 10-fold cross-validation |
| === Classifier model (full training set) === |
Unsupervised k-means clustering illustrates the poor separability of the data, with 1631 (55.4%) instances incorrectly clustered
| Class | Cluster 0 | Cluster 1 | Cluster 2 |
|---|---|---|---|
| SGD | 59 | 4 | 16 |
| GO | 1220 | 435 | 932 |
| CD | 158 | 31 | 89 |
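The figure of 1631 incorrectly clustered instances can be recovered from the counts in the table above. The sketch below assumes that rows are true classes, that columns are k-means clusters, and that each cluster is mapped to exactly one class so as to maximize agreement. That is one plausible reading of a classes-to-clusters evaluation and not necessarily the exact procedure Weka applies.

```python
from itertools import permutations

# Counts copied from the table above.
# ASSUMPTION: rows are true classes and columns are k-means clusters.
counts = [
    [59, 4, 16],
    [1220, 435, 932],
    [158, 31, 89],
]


def classes_to_clusters_errors(matrix):
    """Return (errors, total) under the best one-to-one class/cluster mapping."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Try every assignment of one distinct cluster per class and keep the
    # assignment that places the most instances in their own class's cluster.
    best_correct = max(
        sum(matrix[row][col] for row, col in enumerate(assignment))
        for assignment in permutations(range(size))
    )
    return total - best_correct, total


errors, total = classes_to_clusters_errors(counts)
print(f"{errors} of {total} instances incorrectly clustered "
      f"({100 * errors / total:.1f} %)")
```

Brute-force enumeration is adequate here because three clusters give only six possible mappings; a larger number of clusters would call for an assignment algorithm instead.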
Classification of CD, SGD, and GO classes using all variables
| Correctly Classified Instances | 2795 | 94.94 % | |||
| Incorrectly Classified Instances | 149 | 5.06% | |||
| TP Rate | FP Rate | Precision | Recall | F-Measure | Class |
| 0.101 | 0.003 | 0.5 | 0.101 | 0.168 | SGD |
| 0.997 | 0.387 | 0.949 | 0.997 | 0.972 | GO |
| 0.752 | 0.001 | 0.986 | 0.752 | 0.853 | CD |
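The F-measure column in the table above is the harmonic mean of precision and recall, so it can be checked against the other two columns. The following sketch recomputes it from the rounded values in the table; because the inputs are already rounded to three decimals, the check allows a small tolerance (the 0.001 threshold is an assumption chosen for illustration).

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) per class, copied from the table.
rows = {
    "SGD": (0.5, 0.101, 0.168),
    "GO": (0.949, 0.997, 0.972),
    "CD": (0.986, 0.752, 0.853),
}

for label, (precision, recall, reported) in rows.items():
    computed = f_measure(precision, recall)
    # Inputs are rounded, so compare with a tolerance rather than exactly.
    agrees = abs(computed - reported) < 0.001
    print(f"{label}: computed {computed:.3f}, reported {reported}, agrees: {agrees}")
```

The SGD row shows why the harmonic mean is informative here: a precision of 0.5 combined with a recall of 0.101 yields an F-measure close to the lower of the two values.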