Madhavi K Ganapathiraju, Naoki Orii.
Abstract
BACKGROUND: Advances in biotechnology have created "big-data" situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck in translating this large number of biological hypotheses is that each of them must be studied experimentally to interpret its functional significance. Even when the predictions are estimated to be very accurate, a biologist chooses which of them to study further based on factors such as the availability of reagents and resources and the possibility of formulating a reasonable hypothesis about biological relevance. Viewed from a global perspective, say that of a federal funding agency, the choice would ideally be based on which predictions can make the most translational impact.
Year: 2013 PMID: 24001106 PMCID: PMC3844564 DOI: 10.1186/2047-217X-2-11
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1. Inference analytics. (A) Data analytics typically analyzes large datasets to draw inferences; these inferences are usually used directly, or they may be evaluated with a relatively small investment of resources or through crowdsourcing. (B) In areas such as biology, it is desirable that data analytics be followed by inference analytics: algorithms that analyze the large number of data-analytic inferences and re-rank them by various criteria to help users select which inference to pursue. The work presented here corresponds to inference analytics for the “scientific impact prediction” criterion.
Figure 2. PPI data. From the human interactome, those PPIs (edges) are selected that have a 1–1 relation with a publication; that is, the publication reports only one interaction, and that interaction is not reported by any other publication. The classification model is trained and evaluated using this 1–1 dataset. After the approach has been evaluated in this way, the entire 1–1 dataset is used to train a new model, which is then used to classify every edge in the interactome to identify high-impact edges. The PPI network diagram was created with Cytoscape [29-31].
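The 1–1 selection described in the Figure 2 caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `one_to_one_pairs`, the `(interaction, pmid)` record layout, and the toy records are all assumptions made for the example.

```python
from collections import Counter


def one_to_one_pairs(records):
    """Keep (interaction, pmid) pairs that have a 1-1 relation.

    A pair is kept only if the publication reports exactly one interaction
    and that interaction is reported by no other publication.
    """
    unique = set(records)  # ignore duplicate rows
    per_pmid = Counter(pmid for _, pmid in unique)
    per_ppi = Counter(ppi for ppi, _ in unique)
    return sorted(
        (ppi, pmid)
        for ppi, pmid in unique
        if per_pmid[pmid] == 1 and per_ppi[ppi] == 1
    )


# Hypothetical toy records (interaction, PMID); not taken from the paper.
records = [
    ("A-B", "p1"),  # 1-1 relation: kept
    ("C-D", "p2"),  # p2 also reports E-F: dropped
    ("E-F", "p2"),
    ("G-H", "p3"),  # G-H is also reported by p4: dropped
    ("G-H", "p4"),
]
selected = one_to_one_pairs(records)
print(selected)
```

Counting in both directions (interactions per publication and publications per interaction) is what enforces the two conditions in the caption.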
Calculated centrality measures
| Centrality type | Measure |
|---|---|
| Node centrality | Degree centrality |
| | Closeness centrality |
| | Betweenness centrality |
| | Eigenvector centrality |
| | Network constraint |
| | Clustering coefficient |
| | PageRank |
| | Hub centrality (authority centrality) |
| Edge centrality | Brandes’ betweenness centrality |
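To make two of the listed node measures concrete, the sketch below computes degree centrality and the local clustering coefficient for a small undirected graph in plain Python. The page does not say which software was used for the paper, so the helper names (`build_adjacency`, `degree_centrality`, `clustering_coefficient`) and the toy edge list are assumptions for illustration only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Fraction of neighbour pairs of each node that are themselves linked."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        result[node] = 2 * links / (k * (k - 1))
    return result


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
dc = degree_centrality(adj)
cc = clustering_coefficient(adj)
print(dc["C"], cc["C"])
```

In this toy network, node C touches every other node, so its degree centrality is 1, while only one of its three neighbour pairs is connected, giving a clustering coefficient of 1/3.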
The number of positive/negative instances and their ratio under the different threshold settings in our dataset
| Threshold | Positive instances | Negative instances | Positive ratio |
|---|---|---|---|
| 5 | 3,393 | 3,474 | 0.494 |
| 10 | 1,686 | 5,181 | 0.246 |
| 30 | 267 | 6,600 | 0.039 |
| 50 | 93 | 6,774 | 0.014 |
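The relationship between a threshold, the resulting labels, and the reported ratio can be sketched as below. Two assumptions are made here that the page does not state: that the threshold applies to a publication's citation count, and that an instance is positive when its count is at or above the threshold. The function names and the toy citation counts are hypothetical; the counts passed to `positive_ratio` are taken from the table.

```python
def label_by_threshold(citation_counts, threshold):
    """Label each instance 1 (positive) or 0 (negative).

    Assumption: positive means citation count >= threshold.
    """
    return [1 if count >= threshold else 0 for count in citation_counts]


def positive_ratio(n_positive, n_negative):
    """Share of positive instances among all instances."""
    return n_positive / (n_positive + n_negative)


# Hypothetical citation counts, for illustration only.
counts = [2, 5, 12, 31, 75]
labels_at_10 = label_by_threshold(counts, 10)

# Ratios recomputed from the table's counts at thresholds 5, 10 and 50.
ratio_5 = round(positive_ratio(3393, 3474), 3)
ratio_10 = round(positive_ratio(1686, 5181), 3)
ratio_50 = round(positive_ratio(93, 6774), 3)
print(labels_at_10, ratio_5, ratio_10, ratio_50)
```

Recomputing the ratios this way shows that the last column is positives divided by all instances, and that the classes become heavily imbalanced as the threshold rises.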
Figure 3. Precision-recall curve. Average precision-recall curves are shown for the two methods at thresholds of 5, 10, 30 and 50. The blue solid line corresponds to the random forest model. The red dashed line corresponds to the random probability assignment.
Figure 4. Distribution of the number of PPIs reported in a paper.
Figure 5. Distribution of the number of papers that report a PPI.
Results on the dataset
| Threshold | Method | AUPR | R50 |
|---|---|---|---|
| 5 | Random forest | 0.5718 | 0.1450 |
| | Random | 0.4975 | 0.0661 |
| 10 | Random forest | 0.3204 | 0.0762 |
| | Random | 0.2510 | 0.0272 |
| 30 | Random forest | 0.0868 | 0.0426 |
| | Random | 0.0600 | 0.0197 |
| 50 | Random forest | 0.1115 | 0.0792 |
| | Random | 0.0734 | 0.0283 |
The table shows the area under the precision-recall curve (AUPR) and R50, which is the area under the precision-recall curve up to the point at which 50 negative predictions have been reached.
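One way to compute both quantities from a ranked list of predictions is sketched below. It uses a step-wise sum (precision multiplied by the increase in recall at each positive), which is only one of several conventions for the area under a precision-recall curve; the page does not say which convention or tool the authors used. The function name `pr_area`, the `max_negatives` parameter, and the toy scores are assumptions.

```python
def pr_area(scores, labels, max_negatives=None):
    """Step-wise area under the precision-recall curve.

    Instances are ranked by descending score. If max_negatives is given,
    the ranking is cut off once that many negatives have been seen, which
    gives an R50-style measure when max_negatives=50.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    area = 0.0
    for _, label in ranked:
        if label:
            tp += 1
            # Recall rises by 1 / total_pos here; add precision * that step.
            area += (tp / (tp + fp)) / total_pos
        else:
            fp += 1
            if max_negatives is not None and fp >= max_negatives:
                break
    return area


# Hypothetical scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 1]
full_area = pr_area(scores, labels)
truncated_area = pr_area(scores, labels, max_negatives=1)
print(full_area, truncated_area)
```

Because the truncated measure only accumulates area over the top of the ranking, it can never exceed the full AUPR, which is consistent with the R50 column being smaller than the AUPR column in every row.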
Figure 6. Temporal citation patterns.
Figure 7. Random forest Gini importance measures for each feature.
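For readers unfamiliar with the measure named in Figure 7, Gini importance is built from the decrease in Gini impurity achieved by splits on a feature, accumulated over the trees of the forest. The sketch below shows only that elementary building block for a single split; it is not the authors' pipeline, and the function names and toy labels are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical labels at one tree node and two candidate splits.
parent = [1, 1, 0, 0]
perfect_split = gini_decrease(parent, [1, 1], [0, 0])
useless_split = gini_decrease(parent, [1, 0], [1, 0])
print(perfect_split, useless_split)
```

A feature whose splits separate the classes cleanly earns a large decrease, while a feature whose splits leave the class mix unchanged earns none.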
The top 10 interactions that are predicted to be of high impact
| Rank | Interaction | Gene 1 name | Gene 2 name | PMID | Citations | Year |
|---|---|---|---|---|---|---|
| 1 | ADIPOQ - ADIPOR2 | adiponectin, C1Q and collagen domain containing | adiponectin receptor 2 | 12802337 | 160 | 2003 |
| 2 | NMB - NMBR | neuromedin B | neuromedin B receptor | 8392057 | 8 | 1993 |
| 3 | NUP93 - TMEM48 | nucleoporin 93kDa | transmembrane protein 48 | 16600873 | 37 | 2006 |
| 4 | DAO - DAOA | D-amino-acid oxidase | D-amino acid oxidase activator | 12364586 | 72 | 2002 |
| 5 | PCM1 - TTC8 | pericentriolar material 1 | tetratricopeptide repeat domain 8 | 14520415 | 99 | 2003 |
| 6 | PCM1 - KIAA0368 | pericentriolar material 1 | KIAA0368 | 16189514 | 413 | 2005 |
| 7 | BBS4 - PCM1 | Bardet-Biedl syndrome 4 | pericentriolar material 1 | 15107855 | 55 | 2004 |
| 8 | SRC - YWHAG | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, gamma polypeptide | 8702721 | 21 | 1996 |
| 9 | GRB2 - SRC | growth factor receptor-bound protein 2 | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | 11964172 | 2 | 2002 |
| 10 | HCN2 - HCN4 | hyperpolarization activated cyclic nucleotide-gated potassium channel 2 | hyperpolarization activated cyclic nucleotide-gated potassium channel 4 | 12928435 | 26 | 2003 |
Note that for each interaction, only the publication with the highest citation count among those reporting that interaction is shown.
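The selection rule in the note above (one publication per interaction, chosen by citation count) can be sketched as follows. The function name `top_cited_publication` and the record layout are assumptions; the first and third records reuse values from the table, while the second is a made-up competing report added only to exercise the rule.

```python
def top_cited_publication(reports):
    """Map each interaction to its (pmid, citations) pair with most citations."""
    best = {}
    for interaction, pmid, citations in reports:
        current = best.get(interaction)
        if current is None or citations > current[1]:
            best[interaction] = (pmid, citations)
    return best


reports = [
    ("ADIPOQ - ADIPOR2", "12802337", 160),        # values from the table
    ("ADIPOQ - ADIPOR2", "hypothetical-1", 12),   # hypothetical second report
    ("NMB - NMBR", "8392057", 8),                 # values from the table
]
best = top_cited_publication(reports)
print(best["ADIPOQ - ADIPOR2"])
```

When two reports have the same citation count, this sketch keeps the one seen first; the source does not say how such ties were handled.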
GWAS genes in the top 50 high-impact interactions
| Rank | PMID | Citations | Year | Gene | Gene name | GWAS trait(s) | GWAS trait(s) |
|---|---|---|---|---|---|---|---|
| 1 | 12802337 | 160 | 2003 | ADIPOR2 | adiponectin receptor 2 | adiponectin levels | |
| 2 | 8392057 | 8 | 1993 | NMB | neuromedin B | | Retinal vascular caliber |
| 3 | 16600873 | 37 | 2006 | TMEM48 | transmembrane protein 48 | HDL cholesterol | |
| 4 | 12364586 | 72 | 2002 | DAO | D-amino-acid oxidase | | Bipolar disorder and schizophrenia |
| 8 | 8702721 | 21 | 1996 | SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | | Multiple sclerosis |
| 12 | 2996780 | 64 | 1985 | SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | Ventricular conduction, Height | |
| 14 | 10766163 | 6 | 2000 | TP53 | tumor protein p53 | Bone mineral density (hip), Height, Alcohol dependence, Sudden cardiac arrest, Bone mineral density (spine), Chronic myeloid leukemia, Breast cancer | |
| 15 | 15173068 | 8 | 2004 | GRB2 | growth factor receptor-bound protein 2 | Bone mineral density (hip), Height, Alcohol dependence, Sudden cardiac arrest, Bone mineral density (spine), Chronic myeloid leukemia, Breast cancer | |
| 16 | 9568714 | 278 | 1998 | Retinal vascular caliber | Height | | |
| 17 | 10433554 | 4 | 1999 | Ventricular conduction, Height | Multiple sclerosis | | |
| 18 | 8266076 | 105 | 1993 | IL2RG | interleukin 2 receptor, gamma | | IgE levels |
| 22 | 12732139 | 46 | 2003 | TP53 | tumor protein p53 | Coronary heart disease, Asthma, Crohn's disease | |
| 23 | 1454855 | 45 | 1992 | TP53 | tumor protein p53 | Ventricular conduction, Height | |
| 24 | 8266077 | 115 | 1993 | IL2RG | interleukin 2 receptor, gamma | | Type 1 diabetes, Ulcerative colitis, Multiple sclerosis, Primary biliary cirrhosis |
| 28 | 15324660 | 72 | 2004 | TP53 | tumor protein p53 | | Multiple sclerosis |
| 31 | 15140878 | 17 | 2004 | SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | Bone mineral density (hip), Height, Alcohol dependence, Sudden cardiac arrest, Bone mineral density (spine), Chronic myeloid leukemia, Breast cancer | |
| 34 | 12878187 | 1 | 2003 | Glioma | Ventricular conduction, Height | | |
| 36 | 16192271 | 88 | 2005 | DLG3 | discs, large homolog 3 (Drosophila) | MRI atrophy measures, HDL cholesterol, Lipid metabolism phenotypes, Cholesterol, total, Coronary heart disease | |
| 38 | 17057718 | 66 | 2006 | Hodgkin's lymphoma | Platelet counts | | |
| 39 | 15466214 | 27 | 2004 | SRC | v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene homolog (avian) | Prostate cancer, Male-pattern baldness, LDL cholesterol | |
| 44 | 8994038 | 82 | 1997 | GRB2 | growth factor receptor-bound protein 2 | | Multiple sclerosis |
| 46 | 11782371 | 15 | 2002 | EP300 | E1A binding protein p300 | | Bone mineral density (hip), Height, Alcohol dependence, Sudden cardiac arrest, Bone mineral density (spine), Chronic myeloid leukemia, Breast cancer |
| 47 | 12200137 | 1 | 2002 | IL2RG | interleukin 2 receptor, gamma | | Brain structure |
| 50 | 7477400 | 72 | 1995 | HLA-DRB3 | major histocompatibility complex, class II, DR beta 3 | Type 1 diabetes, Height, Body mass index | |
Of the top 50 high-impact interactions, those in which one or both genes have been associated with a disease or trait by genome-wide association studies (GWAS) are shown. GWAS-associated genes are shown in bold.