Angelo Nuzzo, Giovanni Carapezza, Sebastiano Di Bella, Alfredo Pulvirenti, Antonella Isacchi, Roberta Bosotti.
Abstract
BACKGROUND: Kinase over-expression and activation as a consequence of gene amplification or gene fusion events is a well-known mechanism of tumorigenesis. The search for novel rearrangements of kinases or other druggable genes may contribute to understanding the biology of carcinogenesis and lead to the identification of new candidate targets for drug discovery. However, this requires the ability to query large datasets to identify rare events occurring in very small fractions (1-3 %) of different tumor subtypes. This task differs from what conventional tools do, which is to find genes differentially expressed between two experimental conditions.
Keywords: Gene expression; Gene fusion; Kinase; Outlier
Year: 2016 PMID: 28185541 PMCID: PMC5123341 DOI: 10.1186/s12859-016-1188-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Outlier detection. The outlier detection method. a Statistical detection: for each kinase, the gene expression level in all analysed samples belonging to a specific tumor type is shown as a histogram (left panel) and as a boxplot (right panel). “Rare events” (a kinase over-expressed in one or a few cell lines and low or not expressed in the others) are identified by means of the Grubbs test and marked with a red circle. b Prioritization and filtering: the most relevant outlier kinases are selected by applying specific filter criteria (minimal expression threshold; maximum median level of expression over the tumor type; minimum distance from the 75th percentile of the tissue-specific distribution; proportion of outliers with respect to the whole dataset of outlier occurrences). Samples that do not consistently pass the imposed filters are removed (shown in the figure as red crosses)
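The two steps in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the KAOS implementation: the function names, the parameter names (`g_crit`, `min_expr`, `max_median`, `min_gap_q3`) and the choice of a one-sided Grubbs statistic for the maximum are assumptions made for illustration, the critical value has to be supplied by the caller from a Grubbs table, and the fourth filter (proportion of outlier occurrences) is left out because the caption does not define it precisely.

```python
from statistics import mean, median, stdev, quantiles


def grubbs_statistic(values):
    """One-sided Grubbs statistic for the maximum: G = (max - mean) / sd."""
    sd = stdev(values)
    if sd == 0:
        return 0.0  # all values identical, so nothing can stand out
    return (max(values) - mean(values)) / sd


def detect_outlier(expr, g_crit, min_expr, max_median, min_gap_q3):
    """Return the outlier sample for one kinase in one tumor type, or None.

    expr maps sample name -> expression value. g_crit is the Grubbs critical
    value for the chosen sample size and significance level (hypothetical
    parameter, taken from a published table). The other three arguments are
    hypothetical names for the filters listed in the caption.
    """
    values = list(expr.values())
    top = max(expr, key=expr.get)

    # Step a: statistical detection of the highest-expressing sample.
    if grubbs_statistic(values) <= g_crit:
        return None

    # Step b: prioritization and filtering.
    q3 = quantiles(values, n=4)[2]  # 75th percentile
    passes = (
        expr[top] >= min_expr              # minimal expression threshold
        and median(values) <= max_median   # tumor type mostly low/not expressed
        and expr[top] - q3 >= min_gap_q3   # distance from the 75th percentile
    )
    return top if passes else None
```

For ten samples, a commonly tabulated one-sided critical value at the 5 % level is 2.176; a call such as `detect_outlier(expr, g_crit=2.176, min_expr=8.0, max_median=6.0, min_gap_q3=3.0)` then returns the single high-expressing sample only when it also clears all three filters.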
Fig. 2 The ranking algorithm. a 2-d plot of the two measured distances: M1 is the distance from the upper whisker; M2 is the distance from the median. The “best” outliers lie in the top right corner of the graph, corresponding to a large distance from both the upper whisker and the median, and are shown as red dots. b The metrics used for ranking: M1 (red arrow) is the distance from the upper whisker; M2 (orange arrow) is the distance from the median; M3 (yellow circle) is the number of samples in which the gene has an outlier expression value
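The three metrics in the Fig. 2 caption can be computed as in the following Python sketch. Several details are assumptions, since the caption does not specify them: the upper whisker follows the usual Tukey boxplot convention (largest value not exceeding Q3 + 1.5 × IQR), M1 and M2 are measured for the highest outlier of each gene, and genes are ordered by M1 and then M2. The function names are hypothetical.

```python
from statistics import median, quantiles


def upper_whisker(values):
    """Largest value within Q3 + 1.5 * IQR (Tukey boxplot convention, assumed)."""
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return max(v for v in values if v <= fence)


def outlier_metrics(values):
    """Return M1, M2 and M3 for one gene, or None if nothing exceeds the whisker."""
    whisker = upper_whisker(values)
    outliers = [v for v in values if v > whisker]
    if not outliers:
        return None
    top = max(outliers)
    return {
        "M1": top - whisker,         # distance from the upper whisker
        "M2": top - median(values),  # distance from the median
        "M3": len(outliers),         # number of samples with an outlier value
    }


def rank_genes(expr_by_gene):
    """Order genes so that the largest M1, then the largest M2, come first.

    This ordering is an illustrative assumption, not the published ranking rule.
    """
    scored = {g: outlier_metrics(v) for g, v in expr_by_gene.items()}
    scored = {g: m for g, m in scored.items() if m is not None}
    return sorted(scored, key=lambda g: (scored[g]["M1"], scored[g]["M2"]), reverse=True)
```

Genes whose highest outlier is far from both the whisker and the median end up at the head of the list, which corresponds to the top right corner of the plot in panel a.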
Fig. 3 Graphical User Interface. The KAOS graphical user interface, developed in Java. The interface displays information on the detected outliers (top panel) and a graphical representation of the results (central panel) at the same time. It also allows users to customize query parameters and to filter the results (left panel)
Tools comparison on simulated data for k = 1
| | k = 1, T = 10 | | | k = 1, T = 20 | | | k = 1, T = 50 | | |
|---|---|---|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure |
| KAOS | | | | | | | 0.162 | 0.162 | 0.162 |
| Zodet | 0.232 | 0.232 | 0.232 | 0.243 | 0.243 | 0.243 | | | |
| GTI | 0.182 | 0.182 | 0.182 | 0.175 | 0.175 | 0.175 | 0.146 | 0.146 | 0.146 |
| Khotary et al. | - | 0.038 | - | 0.114 | 0.032 | 0.050 | 0.121 | 0.012 | 0.022 |
The comparison of KAOS performance is based on 50 simulations on a synthetic dataset of expression values for 1000 genes in 30 cases and 30 cancer test samples. The expression values were drawn from a normal distribution with mean 7 and standard deviation 1; k is the number of samples marked as outlier samples (see Methods section for further details) and T is the number of top-ranked outlier genes considered. The table shows average Precision, Recall and F-measure for k = 1 and T ranging from 10 to 50
The values obtained by the best performing tool in each condition are reported in bold
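The three scores reported in these tables can be computed for a single simulation run as in the Python sketch below. It assumes that each tool returns a ranked gene list and that the set of genes with injected outliers is known; the function and argument names are hypothetical, and returning 0.0 when a score is undefined is a convention chosen here, not one taken from the paper.

```python
def precision_recall_f(ranked_genes, true_outlier_genes, top_t):
    """Precision, recall and F-measure for the top T genes of one ranking."""
    selected = ranked_genes[:top_t]
    truth = set(true_outlier_genes)
    hits = sum(1 for g in selected if g in truth)

    # Precision: fraction of the selected genes that are true outliers.
    precision = hits / len(selected) if selected else 0.0
    # Recall: fraction of the true outlier genes that were selected.
    recall = hits / len(truth) if truth else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

Averaging these values over the 50 simulations gives one table entry per tool and per T. Precision and recall coincide whenever the number of selected genes equals the number of true outlier genes.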
Tools comparison on simulated data for k = 5
| | k = 5, T = 10 | | | k = 5, T = 20 | | | k = 5, T = 50 | | |
|---|---|---|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure |
| KAOS | 0.526 | 0.526 | 0.526 | 0.374 | 0.374 | 0.374 | 0.244 | 0.244 | 0.244 |
| Zodet | 0.828 | 0.828 | 0.828 | 0.699 | 0.699 | 0.699 | 0.516 | 0.516 | 0.516 |
| GTI | | | | | | | | | |
| Khotary et al. | 0.454 | 0.246 | 0.319 | 0.450 | 0.124 | 0.194 | - | 0.041 | - |
The table shows the same simulation results as Table 1 when k = 5
The values obtained by the best performing tool in each condition are reported in bold
Tools comparison (k = 10)
| | k = 10, T = 10 | | | k = 10, T = 20 | | | k = 10, T = 50 | | |
|---|---|---|---|---|---|---|---|---|---|
| | Precision | Recall | F-measure | Precision | Recall | F-measure | Precision | Recall | F-measure |
| KAOS | 0.268 | 0.268 | 0.268 | 0.188 | 0.188 | 0.188 | 0.109 | 0.109 | 0.109 |
| Zodet | 0.986 | 0.986 | 0.986 | 0.948 | 0.948 | 0.948 | 0.765 | 0.765 | 0.765 |
| GTI | | | | | | | | | |
| Khotary et al. | 0.837 | 0.776 | 0.805 | 0.754 | 0.389 | 0.513 | 0.767 | 0.151 | 0.252 |
The table shows the same simulation results as Table 1 when k = 10
The values obtained by the best performing tool in each condition are reported in bold
Fig. 4 Identification of known and new overexpressed kinases. The left panel shows the gene expression level of a selected kinase in 917 cancer cell lines belonging to 24 different tumor types (CCLE data) as a histogram. Tumor types are shown in different colors. The boxplot of the tissue-specific distribution of the kinase is shown in the right panel. Outlier samples are shown as black circles. a NTRK1 is generally expressed in hematopoietic and lymphoid tissues and autonomic ganglia. No expression is observed in large intestine (colon), except in the KM12 colorectal cancer cell line, highlighted as an outlier in this tissue; b RET tyrosine kinase is generally expressed in tissues such as autonomic ganglia and haematopoietic tissues, but no expression is observed in thyroid tumors. In this tissue a dramatic expression of RET can be detected only in the TT papillary tumor cell line, flagged as an outlier by the tool; c ROS1 tyrosine kinase is typically poorly expressed, except in colon, where the HCC-78 lung cancer cell line stands out as a clear outlier; d FGFR4 is highly expressed in a few breast cancer cell lines, among which the MDA-MB-453 breast cancer cell line appears as highly overexpressed; e ZAP-70 tyrosine kinase expression is observed in haematopoietic and lymphoid tissues only. No expression is seen in breast cancer cell lines, with the exception of a significant overexpression of the gene in the DU4475 breast cancer cell line