Sean Robinson, Jaakko Nevalainen, Guillaume Pinna, Anna Campalans, J Pablo Radicella, Laurent Guyon.
Abstract
MOTIVATION: Incorporating gene interaction data into the identification of 'hit' genes in genomic experiments is a well-established approach that leverages the 'guilt by association' assumption to obtain a network-based hit list of functionally related genes. We aim to develop a method that allows for multivariate gene scores and multiple hit labels, in order to extend the analysis of genomic screening data within such an approach.
Year: 2017 PMID: 28881978 PMCID: PMC5870666 DOI: 10.1093/bioinformatics/btx244
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
The proposed similarity matrices for calculating the Knode (Cornish and Markowetz, 2014), NePhe (Wang ) and NEST (Jiang ) scores
| | Knode | NePhe | NEST |
|---|---|---|---|
| Adjacency | X | X | |
| Common neighbours | X | ||
| Mean steps between | X | ||
| Shortest path | X | X | |
| Diffusion kernel | X | X | |
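
Three of the similarity measures named in the table can be illustrated on a small graph. The following is a minimal sketch in plain Python, assuming an undirected, unweighted network; the function names and the four-vertex path graph are hypothetical, and the code is not taken from the Knode, NePhe or NEST implementations. The diffusion kernel and the mean number of steps between vertices need matrix routines and are left out.

```python
from collections import deque


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph on n vertices."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def common_neighbours(A):
    """Number of shared neighbours for each pair of distinct vertices."""
    n = len(A)
    return [
        [sum(A[i][k] * A[k][j] for k in range(n)) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def shortest_paths(A):
    """Shortest path length (in edges) between all pairs, by breadth-first search."""
    n = len(A)
    inf = float("inf")
    D = [[inf] * n for _ in range(n)]
    for source in range(n):
        D[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if A[u][v] and D[source][v] == inf:
                    D[source][v] = D[source][u] + 1
                    queue.append(v)
    return D


# Hypothetical toy network: the path 0 - 1 - 2 - 3.
EDGES = [(0, 1), (1, 2), (2, 3)]
A = adjacency(4, EDGES)
C = common_neighbours(A)
D = shortest_paths(A)
```

Note that the shortest path is a distance rather than a similarity, so it has to be transformed before it can be used as a similarity matrix; the sketch does not choose a transformation.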
Fig. 1. Toy example. (a) Original observations and underlying label (‘blue’ or ‘red’) for each vertex. The vertices on the left are true ‘blue’ vertices and the vertices on the right are true ‘red’ vertices, as indicated in the vertex labels. The observed value for each vertex is indicated by the colour of the vertex itself. (b) Densities corresponding to the ‘blue’ and ‘red’ labels. (c) Minimum energy labels for a range of values of β. (d) The MRF scores.
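The effect of β on the minimum energy labels in panel (c) can be sketched with a brute-force search on a tiny graph. The example below assumes a standard Potts-type energy (negative log-density of each observation under its label, minus β for every edge whose endpoints share a label); this form, the two normal densities, the graph and the observed values are all assumptions made for illustration and may differ from the paper's definitions.

```python
import itertools
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a normal distribution (assumed label densities)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# ASSUMPTION: one density per label; the paper's densities are not reproduced here.
DENSITIES = {
    "blue": lambda x: normal_pdf(x, -1.0),
    "red": lambda x: normal_pdf(x, 1.0),
}


def energy(labels, values, edges, beta):
    """Potts-type energy: data term minus beta per edge with agreeing labels."""
    data_term = -sum(math.log(DENSITIES[lab](x)) for lab, x in zip(labels, values))
    agreeing_edges = sum(1 for u, v in edges if labels[u] == labels[v])
    return data_term - beta * agreeing_edges


def min_energy_labels(values, edges, beta):
    """Exhaustive search over all labellings (only feasible for tiny graphs)."""
    return min(
        itertools.product(DENSITIES, repeat=len(values)),
        key=lambda labels: energy(labels, values, edges, beta),
    )


# Hypothetical graph: two triangles joined by the edge (2, 3).
# Vertex 2 sits in the 'blue' triangle but has a 'red'-looking observation.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
VALUES = [-1.0, -0.8, 0.3, 1.0, 0.9, 1.2]

labels_without_network = min_energy_labels(VALUES, EDGES, beta=0.0)
labels_with_network = min_energy_labels(VALUES, EDGES, beta=1.0)
```

With β = 0 each vertex is labelled from its own observation only, so vertex 2 is labelled ‘red’. With β = 1 the agreement with its two ‘blue’ neighbours outweighs the data term and vertex 2 is relabelled ‘blue’.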
Proportion of edges with contributions from different sources in the overall ‘combined’ PPI network for the OGG1 data
| Source | Contribute | Uniquely contribute |
|---|---|---|
| Text mining | 0.9268 | 0.5364 |
| Experimental | 0.3294 | 0.0295 |
| Co-expression | 0.1134 | 0.0095 |
| Database | 0.0722 | 0.0236 |
| Neighbourhood | 0.0161 | 0.0006 |
| Co-occurrence | 0.0119 | 0.0015 |
| Fusion | 0.0005 | 0.0000 |
Fig. 2. Schematic diagram of the construction of the trivariate densities corresponding to each label for the OGG1 data. In each dimension there are two possibilities and an associated univariate density: ‘null case’ (U(0, 1)) and ‘negative siRNA’ (). Trivariate densities are defined for each of four labels of interest and are schematically represented by encompassing a single possibility in each dimension. The four labels are: ‘3 negative siRNAs’ (3 low P-values), ‘2 negative siRNAs’ (2 low P-values), ‘1 negative siRNA’ (1 low P-value) and ‘no negative siRNAs’ (no low P-values). Note that the ordering of the P-values limits the possible combinations of trivariate densities across the three dimensions in this case.
Fig. 3. Log10(P-values) for Fisher exact tests of enrichment or depletion in functional/GO annotations for the hit lists obtained from the MRF method and from the median P-value. The annotation term ‘nuclear lumen (GO:0031981)’ has been highlighted.
Fig. 4. TOP3A and neighbours present in the top 200 MRF hits. Vertices are coloured based on median score/P-value. Diamond vertices correspond to the genes with the ‘nuclear lumen (GO:0031981)’ annotation. The network was visualized in Cytoscape 3.3.0 (Shannon ) using the yFiles Organic layout.
Fig. 5. Schematic diagram of the construction of the bivariate densities corresponding to each label for the lymphoma data. In each dimension there are two possibilities and an associated univariate density: ‘non-hit’ (U(0, 1)) and ‘hit’ (). Bivariate densities are defined for each of four labels of interest and are schematically represented by encompassing a single possibility in each dimension. The four labels are: ‘S and T hits’ (low P-value in both dimensions), ‘S hits only’ (low P-value in S dimension), ‘T hits only’ (low P-value in T dimension) and ‘no hits’ (no low P-value in either dimension).
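The construction in this caption can be sketched as one univariate density per dimension for each label. The ‘hit’ density is missing from the caption, so the example below substitutes a Beta(a, 1) density with a < 1 purely as a placeholder that concentrates mass near zero. It also assumes that the bivariate density is the product of the two univariate densities. Both choices, and all function names, are assumptions and not the paper's definitions.

```python
def uniform_density(p):
    """'non-hit' density: U(0, 1), as stated in the caption."""
    return 1.0 if 0.0 <= p <= 1.0 else 0.0


def hit_density(p, a=0.2):
    """ASSUMED 'hit' density: Beta(a, 1), i.e. a * p**(a - 1), favouring low P-values."""
    if not 0.0 < p <= 1.0:
        return 0.0
    return a * p ** (a - 1.0)


# One univariate density per dimension (S, T) for each of the four labels.
LABELS = {
    "S and T hits": (hit_density, hit_density),
    "S hits only": (hit_density, uniform_density),
    "T hits only": (uniform_density, hit_density),
    "no hits": (uniform_density, uniform_density),
}


def bivariate_density(label, p_s, p_t):
    """ASSUMPTION: dimensions are combined as a product of univariate densities."""
    density_s, density_t = LABELS[label]
    return density_s(p_s) * density_t(p_t)


def most_likely_label(p_s, p_t):
    """Label whose bivariate density is largest at the observed pair of P-values."""
    return max(LABELS, key=lambda label: bivariate_density(label, p_s, p_t))
```

For instance, `most_likely_label(0.001, 0.9)` selects ‘S hits only’ under these placeholder densities, because a high P-value in the T dimension is better explained by the uniform density. In the MRF setting these densities would feed the data term, while the network supplies the interaction between neighbouring genes.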
Fig. 6. Venn diagram of the hit lists for the lymphoma data overlaid with genes associated with the NFκB pathway. There are 46 genes in each hit list (black) along with the genes associated with the NFκB pathway (grey and listed at the sides in blue). The NFκB annotation was obtained from KEGG (Kanehisa ). Below each method is the P-value for the Fisher exact test for enrichment or depletion of the NFκB terms in the hit list. The Venn diagram is based on the layout from http://bioinformatics.psb.ugent.be/webtools/Venn/
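A two-sided Fisher exact test of this kind can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library and the common convention of summing the probabilities of all tables that are no more likely than the observed one; whether the paper used this exact two-sided convention is an assumption, and the counts in the usage example are hypothetical apart from the hit list size of 46.

```python
from math import comb


def fisher_exact_two_sided(overlap, hit_list_size, annotated_size, universe_size):
    """Two-sided Fisher exact P-value for enrichment or depletion.

    overlap        -- annotated genes found in the hit list
    hit_list_size  -- number of genes in the hit list
    annotated_size -- number of genes carrying the annotation
    universe_size  -- number of genes screened
    """

    def prob(k):
        # Hypergeometric probability of seeing exactly k annotated genes in the hit list.
        return (
            comb(annotated_size, k)
            * comb(universe_size - annotated_size, hit_list_size - k)
            / comb(universe_size, hit_list_size)
        )

    k_min = max(0, hit_list_size - (universe_size - annotated_size))
    k_max = min(hit_list_size, annotated_size)
    p_observed = prob(overlap)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: 46 hits (from the caption), 8 of them annotated,
# 50 annotated genes in a screen of 1000 genes.
p_value = fisher_exact_two_sided(overlap=8, hit_list_size=46, annotated_size=50, universe_size=1000)
```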
Fig. 7. Box plots of the proportion of true hit vertices identified for each method in the ‘3 cluster’ Knode simulation scheme (1000 simulation runs) (Cornish and Markowetz, 2014).