Emad Ramadan, Sadiq Alinsaif, Md Rafiul Hassan.
Abstract
BACKGROUND: Massive biological datasets are generated all over the world. Analyzing these datasets is necessary to extract knowledge that may help biologists, physicians, and pharmacists. The analysis of biological networks has recently received a lot of attention, because understanding a network can reveal information about life at the cellular level. Biological networks can be built to examine the interactions between proteins or the relationships among genes at the expression level. Extracting information from biological networks is recognized as a significant challenge because of the inherent complexity of their structures. Computational techniques are used to analyze such complex networks, with varying success.
Keywords: Biological networks; Machine learning; Phenotype-gene association
Year: 2016 PMID: 27454166 PMCID: PMC4965731 DOI: 10.1186/s12859-016-1095-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Topological measures
| Category | Measure |
|---|---|
| Degree–based measurements | Degree |
| | Coreness |
| | Clustering coefficient |
| Shortest–path–based measurements | Betweenness |
| | Closeness |
| | Proximity prestige |
| | Bary center score |
| Eigenvector–based measurements | Eigenvector centrality |
| | Katz status index |
| Subgraph–based measurements | Subgraph centrality |
| | Within–module |
| Random–walk–based measurements | |
| Social–capital–based measurements | Structural holes |
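To make a few of the listed measures concrete, here is a minimal sketch in plain Python. It uses common textbook definitions of degree, clustering coefficient, and closeness; the exact variants and normalizations used in the study are not given in this record, so treat these as assumptions. The toy graph `toy_graph` and all function names are hypothetical.

```python
from collections import deque

def degree(adj, v):
    """Number of neighbours of node v."""
    return len(adj[v])

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))

def closeness(adj, v):
    """(Reachable nodes) / (sum of shortest-path distances from v), via BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical undirected toy network, stored as adjacency sets.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

for node in sorted(toy_graph):
    print(node, degree(toy_graph, node),
          round(clustering_coefficient(toy_graph, node), 3),
          round(closeness(toy_graph, node), 3))
```

In practice a graph library would compute all of the measures in the table; the hand-written versions here only illustrate what the degree-based and shortest-path-based families capture.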
Fig. 1 SMOTE’d data example (sample data)
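As a rough illustration of what SMOTE oversampling does, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. This follows the general idea of SMOTE only; the study's actual settings (number of neighbours, oversampling ratio, implementation) are not stated in this record. The function name `smote_sketch`, its parameters, and the sample points are hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical two-feature minority samples.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stays inside the region the minority class already occupies.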
Comparison of classification results obtained with SMOTE sampling on the public networks (GCN, PPI, FI)
| Classifier | Metric | GCN | PPI | FI |
|---|---|---|---|---|
| DTB | ACC | 0.89 ± 0.02 | 0.88 ± 0.02 | 0.90 ± 0.02 |
| | F | 0.89 ± 0.02 | 0.89 ± 0.02 | 0.90 ± 0.02 |
| | AUC | 0.89 ± 0.02 | 0.88 ± 0.02 | 0.90 ± 0.02 |
| | G-Means | 0.89 ± 0.02 | 0.88 ± 0.02 | 0.90 ± 0.02 |
| RUSBOOST | ACC | 0.80 ± 0.04 | 0.82 ± 0.02 | 0.82 ± 0.03 |
| | F | 0.80 ± 0.04 | 0.83 ± 0.02 | 0.82 ± 0.02 |
| | AUC | 0.80 ± 0.04 | 0.82 ± 0.02 | 0.82 ± 0.03 |
| | G-Means | 0.80 ± 0.04 | 0.82 ± 0.02 | 0.81 ± 0.03 |
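For readers unfamiliar with the metrics in this table, the sketch below computes ACC, the F-measure, and G-Means from a binary confusion matrix using their usual definitions (G-Means taken as the geometric mean of sensitivity and specificity). Whether the study uses exactly these variants is an assumption, and AUC is omitted because it requires ranked scores rather than counts. The function name `binary_metrics` and the confusion-matrix counts are hypothetical.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return ACC, F-measure and G-Means for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    g_means = math.sqrt(sensitivity * specificity)
    return {"ACC": accuracy, "F": f_measure, "G-Means": g_means}

# Hypothetical counts, not taken from the study.
scores = binary_metrics(tp=90, fp=20, tn=80, fn=10)
print({name: round(value, 3) for name, value in scores.items()})
```

G-Means drops sharply when either class is poorly recognized, which is why it is often reported alongside accuracy for imbalanced problems.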
Fig. 2 Comparison of classification results on the FI network data
Feature importance analysis: Accuracy, Gini Index, and the combined score are listed
| Topological measures | Accuracy | Gini index | Combined score |
|---|---|---|---|
| Structural holes | 0.3081 | 579.9545 | 13.37 |
| Degree | 0.3088 | 578.1108 | 13.36 |
| Coreness | 0.3056 | 474.7823 | 12.05 |
| | 0.2958 | 371.1454 | 10.47 |
| Subgraph centrality | 0.3032 | 354.3712 | 10.36 |
| Within–module | 0.2704 | 291.5019 | 8.88 |
| Katz status index | 0.2882 | 259.1472 | 8.64 |
| Closeness | 0.2943 | 227.2495 | 8.18 |
| Proximity prestige | 0.2962 | 222.5109 | 8.12 |
| Eigenvector centrality | 0.2834 | 230.7507 | 8.09 |
| Betweenness | 0.2731 | 230.3441 | 7.93 |
| Bary center score | 0.2742 | 118.4802 | 5.70 |
| Clustering coefficient | 0.0632 | 0.3585 | 0.15 |
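The tabulated combined scores are numerically consistent with the geometric mean of the Accuracy and Gini index columns. This is an observation drawn from the numbers in the table, not a definition confirmed by this record, so the sketch below should be read as a consistency check under that assumption. The function name `combined_score` is hypothetical.

```python
import math

def combined_score(accuracy_importance, gini_importance):
    """Geometric mean of the two importance values (assumed formula)."""
    return math.sqrt(accuracy_importance * gini_importance)

# Rows copied from the table: (measure, accuracy, Gini index, reported combined score).
rows = [
    ("Structural holes", 0.3081, 579.9545, 13.37),
    ("Degree", 0.3088, 578.1108, 13.36),
    ("Coreness", 0.3056, 474.7823, 12.05),
    ("Clustering coefficient", 0.0632, 0.3585, 0.15),
]

for name, acc, gini, reported in rows:
    recomputed = round(combined_score(acc, gini), 2)
    print(f"{name}: recomputed {recomputed:.2f}, reported {reported:.2f}")
```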
List of some genes that the method misclassified as breast-cancer-related genes
| Gene symbol | Gene name | OMIM disease |
|---|---|---|
| | CD4 molecule | CD4+ lymphocyte deficiency |
| | amyloid beta (A4) precursor protein | Alzheimer disease 1, Amyloidosis, Dementia, early-onset progressive, autosomal recessive |
| | cyclin-dependent kinase 2 | A novel susceptibility locus for type 1 diabetes. |
| | fibronectin 1 | Glomerulopathy with fibronectin deposits. |
| | interferon regulatory factor 1 | Gastric cancer, Macrocytic anemia, Myelodysplastic syndrome, preleukemic, Myelogenous leukemia, acute, Nonsmall cell lung cancer. |
| | presenilin 1 | Alzheimer disease, Cardiomyopathy, Pick disease. |
| | signal transducer and activator of transcription 1 | Mycobacterial infection, atypical, familial disseminated. |
| | solute carrier family 25 | Mitochondrial phosphate carrier deficiency. |
| | son of sevenless homolog 1 | Fibromatosis, gingival, Noonan syndrome 4. |