Neel S Madhukar, Olivier Elemento, Gaurav Pandey.
Abstract
A genetic interaction (GI) is a type of interaction where the effect of one gene is modified by the effect of one or several other genes. These interactions are important for delineating functional relationships among genes and their corresponding proteins, as well as for elucidating complex biological processes and diseases. An important type of GI, synthetic sickness or synthetic lethality, involves two or more genes, where the loss of any one gene alone has little impact on cell viability, but the combined loss of all genes leads to a severe decrease in fitness (sickness) or cell death (lethality). Identifying GIs is an important problem because it can help delineate pathways, protein complexes, and regulatory dependencies. Synthetic lethal interactions have important clinical and biological significance, such as providing therapeutically exploitable weaknesses in tumors. While near-systematic high-content screening for GIs is possible in single-celled organisms such as yeast, the systematic discovery of GIs is extremely difficult in mammalian cells. Therefore, there is a great need for computational approaches that reliably predict GIs, including synthetic lethal interactions, in mammalian cells. Here, we review the state-of-the-art approaches, strategies, and rigorous evaluation methods for learning and predicting GIs, both under general (healthy/standard laboratory) conditions and under specific contexts, such as diseases.
Keywords: cancer; drug discovery; genetic interactions; machine learning; network analysis; prediction
Year: 2015 PMID: 26579514 PMCID: PMC4620407 DOI: 10.3389/fbioe.2015.00172
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Overview of the most commonly used approach to predicting genetic interactions (GIs). Here, a generally large number and variety of features are extracted from diverse data sources, examples of both of which are shown in the top panel. The feature data are combined with known GI data from public databases like BioGRID, leading to a feature + label table/matrix. Some of the gene pairs in this table, whose GI status is known, are used as training examples, from which a GI prediction model is learnt using an appropriate algorithm. Finally, the model is applied to test gene pairs to make predictions of their GI status, which can be used for downstream evaluations and/or applications.
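The workflow in this caption (feature + label table, training on pairs with known GI status, prediction on the remaining pairs) can be sketched roughly as follows. This is a minimal illustration only: the gene names, feature values, and labels are invented, and the nearest-centroid classifier is a deliberately simple stand-in for "an appropriate algorithm", not the method used in the reviewed work.

```python
from statistics import mean

# Hypothetical per-pair features: [same-pathway flag, expression correlation].
features = {
    ("geneA", "geneB"): [1.0, 0.82],
    ("geneA", "geneC"): [0.0, 0.10],
    ("geneB", "geneD"): [1.0, 0.75],
    ("geneC", "geneD"): [0.0, 0.05],
    ("geneA", "geneD"): [1.0, 0.90],  # GI status unknown
    ("geneB", "geneC"): [0.0, 0.12],  # GI status unknown
}

# Hypothetical labels standing in for known GIs from a database such as BioGRID
# (1 = GI, 0 = non-GI).
known_labels = {
    ("geneA", "geneB"): 1,
    ("geneA", "geneC"): 0,
    ("geneB", "geneD"): 1,
    ("geneC", "geneD"): 0,
}


def build_table(features, labels):
    """Join features with labels into (pair, feature_vector, label) rows.

    Pairs without a known label get label None.
    """
    return [(pair, vec, labels.get(pair)) for pair, vec in features.items()]


def train_centroids(rows):
    """Learn one mean feature vector (centroid) per class from labelled rows."""
    centroids = {}
    for label in {lab for _, _, lab in rows}:
        vecs = [vec for _, vec, lab in rows if lab == label]
        centroids[label] = [mean(col) for col in zip(*vecs)]
    return centroids


def predict(centroids, vec):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


table = build_table(features, known_labels)
train_rows = [row for row in table if row[2] is not None]
test_rows = [row for row in table if row[2] is None]

model = train_centroids(train_rows)
predictions = {pair: predict(model, vec) for pair, vec, _ in test_rows}
```

In practice the feature vectors would be much wider and the classifier more capable; the point here is only the shape of the data flow from table to model to predictions.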
Examples of features derived from a variety of data sources that were found to be discriminative between GI and non-GI gene pairs in our previous work on GI prediction (Pandey et al.).
| Category/data source | Feature description | KS statistic | KS p-value |
|---|---|---|---|
| Functional information | Co-membership in the same KEGG pathway | 0.4388 | 0 |
| | Similarity of two genes using their annotations to GO BP terms and semantic similarity between the terms (Tao et al.) | 0.2306 | 0 |
| | Number of functions shared by two genes [calculated here using the 138 most populated GO BP terms recommended by Myers et al.] | 0.1861 | 0 |
| | Similarity of two genes using their annotations to GO CC terms and semantic similarity between the terms (Tao et al.) | 0.1826 | 0 |
| | Similarity of two genes using their annotations to GO MF terms and semantic similarity between the terms (Tao et al.) | 0.0763 | 0 |
| Protein–protein interaction (PPI) network | Number of communities derived from PPI network that two proteins are co-members of | 0.2257 | 0 |
| | Length of shortest path between two proteins | 0.14 | 0 |
| | Common neighborhood similarity [topological overlap (Zhang and Horvath)] | 0.0991 | 0 |
| | Number of cliques in the PPI network (Zhu et al.) | 0.0839 | 0 |
| | Co-membership in modules discovered from PPI network (Zhang and Horvath) | 0.0456 | 3.33E-15 |
| | Degree of vertex corresponding to an edge in the PPI network in its edge graph version (edge degree) | 0.0444 | 2.08E-14 |
| | Betweenness of the edge between two proteins in the PPI network | 0.0444 | 2.13E-14 |
| | Presence (1)/absence (0) of an interaction between two proteins | 0.0443 | 2.20E-14 |
| Gene expression data (pairwise correlation of expression profiles) | From Brem et al. | 0.0904 | 0 |
| | From Spellman et al. | 0.0594 | 0 |
| | From Mnaimneh et al. | 0.0471 | 0 |
| | From Hughes et al. | 0.0219 | 2.76E-04 |
| Sequence similarity (pairwise BLAST comparison of protein sequences) | Length of alignment | 0.0272 | 2.25E-06 |
| | E-value of alignment | 0.0271 | 2.30E-06 |
| | Bit score of alignment | 0.0271 | 2.38E-06 |
| | Percentage identity in alignment | 0.0271 | 2.38E-06 |
| | Number of mismatches in alignment | 0.0268 | 3.21E-06 |
| | Number of gaps included in alignment | 0.0235 | 7.00E-05 |
| Others | Mutual information between the phylogenetic profiles of two proteins | 0.0673 | 0 |
| | Number of mutant phenotypes shared by two genes | 0.0268 | 3.41E-06 |
Also shown are the Kolmogorov–Smirnov (KS) test statistic scores and p-values.
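For readers unfamiliar with the statistic reported in the table, the two-sample Kolmogorov–Smirnov statistic is the largest absolute gap between the empirical cumulative distribution functions of a feature in the two groups (here, GI and non-GI pairs). The sketch below computes only the statistic, not the p-value, and the feature values are invented for illustration; the function name `ks_statistic` is our own.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        # bisect_right counts the values <= x, i.e. n * ECDF(x)
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical binary feature (e.g. co-membership in a pathway) for each group.
gi_pairs = [1, 1, 1, 0, 1]
non_gi_pairs = [0, 0, 1, 0, 0]

d = ks_statistic(gi_pairs, non_gi_pairs)
```

A statistic near 0 means the feature is distributed almost identically in both groups; larger values indicate a more discriminative feature, which matches how the table ranks features within each category.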
Visual depiction of the leave-one-out cross-validation (CV) procedure.
| CV round | Training examples | Test example |
|---|---|---|
| 1 | 2, 3, 4, 5, 6, …, n | 1 |
| 2 | 1, 3, 4, 5, 6, …, n | 2 |
| 3 | 1, 2, 4, 5, 6, …, n | 3 |
| … | … | … |
| n | 1, 2, 3, 4, 5, …, n − 1 | n |
In each round, one of the examples is reserved for testing the predictive model learnt over the remaining training examples. Other forms of .
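The round structure shown in the table, where each example is held out once while the model is learnt on all the others, can be sketched as below. The helper names (`leave_one_out_splits`, `loocv_accuracy`) and the toy one-feature threshold classifier are assumptions made for illustration, and the data are invented; indices are 0-based here, unlike the 1-based table.

```python
from statistics import mean


def leave_one_out_splits(n):
    """Yield (train_indices, test_index) for each of the n CV rounds."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def loocv_accuracy(examples, labels, fit, predict):
    """Fraction of held-out examples predicted correctly across all rounds."""
    correct = 0
    for train_idx, test_idx in leave_one_out_splits(len(examples)):
        model = fit([examples[j] for j in train_idx],
                    [labels[j] for j in train_idx])
        correct += predict(model, examples[test_idx]) == labels[test_idx]
    return correct / len(examples)


# Toy stand-in classifier: threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical single-feature scores for six gene pairs (1 = GI, 0 = non-GI).
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]
labels = [1, 1, 1, 0, 0, 0]

accuracy = loocv_accuracy(scores, labels, fit_threshold, predict_threshold)
```

Because the model is refit in every round, the held-out example never influences the model that is used to predict it.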