Ulykbek Kairov, Tatyana Karpenyuk, Erlan Ramanculov, Andrei Zinovyev.
Abstract
Many genome-scale studies in molecular biology deliver their results as a ranked list of gene names, ordered according to some scoring method. This always raises the question of how many top-ranked genes to consider for further analysis, for example, in order to create a diagnostic or predictive gene signature for a disease. This question is usually approached from a statistical point of view, without considering any biological properties of the top-ranked genes or how they are related to each other functionally. Here we suggest a new method for selecting the number of genes in a ranked gene list such that the selected set forms the Optimally Functionally Enriched Network (OFTEN), formed by known physical interactions between genes or their products. The method associates a network with the gene list, which makes the results easier to interpret and allows the genes or proteins to be classified according to their position in the resulting network. We demonstrate the method on four breast cancer datasets and show that 1) the resulting gene signatures are more reproducible from one dataset to another than those obtained with standard statistical procedures and 2) the overlap of these signatures has significant prognostic potential. The method is implemented in the BiNoM Cytoscape plugin (http://binom.curie.fr).
Year: 2012 PMID: 23055628 PMCID: PMC3449386 DOI: 10.6026/97320630008773
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. OFTEN analysis and META-OFTEN network of genes differentially expressed between metastatic and non-metastatic patient samples. A) Example of the behavior of the percolation score S as a function of the number of top-ranked differentially expressed genes selected (GSE2034 dataset). B) META-OFTEN network constructed for four breast cancer datasets. Different edge types correspond to different types of evidence for protein-protein interactions, as described in HPRD. Node color shows the average t-test value over those datasets in which the gene is included in the OFTEN network. Node size indicates the number of OFTEN networks in which the gene is found: small nodes appear in two datasets, medium nodes in three, and large nodes in all four. C) Survival analysis performed on the genes of the network, using an unsupervised scoring strategy. The plot shows the percentage of metastasis-free survival for three groups of patients: those with high, intermediate (within one standard deviation of the mean) and low score values.
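The idea of scanning the number of top-ranked genes and scoring how well each candidate set connects in an interaction network can be illustrated with a minimal Python sketch. It is not the BiNoM implementation, and this record does not define the percolation score S. As an assumption, the sketch scores a set by the size of its largest connected component, expressed as a z-score against uniformly sampled random gene sets of the same size. All function names, parameters and toy data below are hypothetical.

```python
import random
from statistics import mean, pstdev


def build_adjacency(edges):
    """Turn a list of undirected (gene, gene) interactions into an adjacency map."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def largest_component_size(genes, adjacency):
    """Size of the largest connected component of the subgraph induced by `genes`."""
    genes = set(genes)
    seen = set()
    best = 0
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # depth-first traversal restricted to the selected genes
            node = stack.pop()
            size += 1
            for neighbour in adjacency.get(node, ()):
                if neighbour in genes and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best


def connectivity_scores(ranked_genes, edges, sizes, n_random=200, seed=0):
    """Assumed score: z-score of the largest component of the top-k genes
    against random gene sets of size k (not the paper's definition of S)."""
    rng = random.Random(seed)
    adjacency = build_adjacency(edges)
    universe = sorted(set(ranked_genes) | set(adjacency))
    scores = {}
    for k in sizes:
        observed = largest_component_size(ranked_genes[:k], adjacency)
        null = [
            largest_component_size(rng.sample(universe, k), adjacency)
            for _ in range(n_random)
        ]
        spread = pstdev(null)
        scores[k] = (observed - mean(null)) / spread if spread > 0 else 0.0
    return scores


def best_cutoff(scores):
    """Number of top-ranked genes with the highest assumed score."""
    return max(scores, key=scores.get)


# Toy data (hypothetical): 12 ranked genes; the top five form a chain.
toy_ranked = [f"g{i}" for i in range(1, 13)]
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"),
             ("g9", "g10"), ("g11", "g12")]
toy_scores = connectivity_scores(toy_ranked, toy_edges, sizes=[3, 5, 8])
print(toy_scores, best_cutoff(toy_scores))
```

A real analysis would use a curated interaction network such as HPRD and a much finer grid of list sizes. The random baseline could also be made degree-aware, which this sketch does not attempt.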