Itziar Frades, Erik Andreasson, Jose Maria Mato, Erik Alexandersson, Rune Matthiesen, Maria Luz Martínez-Chantar.
Abstract
Nonalcoholic fatty liver disease (NAFLD) is a risk factor for hepatocellular carcinoma (HCC), but the transition from NAFLD to HCC is poorly understood. Feature selection algorithms were applied to NAFLD and HCC microarray data from humans and genetically modified mice to generate signatures of NAFLD progression and HCC differential survival. These signatures were used to study the pathogenesis of NAFLD-derived HCC and to explore which cancer subtypes can be investigated using mouse models. Our findings show that: (I) HNF4 is a common potential transcription factor mediating the transcription of NAFLD progression genes; (II) mouse HCC derived from NAFLD co-clusters with a less aggressive human HCC subtype of differential prognosis and mixed etiology; (III) the HCC survival signature correctly classifies 95% of the samples and gives Fgf20 and Tgfb1i1 as the most robust genes for prediction; (IV) the expression values of the genes composing the signature in an independent human HCC dataset revealed different HCC subtypes showing differences in survival time by a log-rank test. In summary, we present marker signatures for NAFLD-derived HCC molecular pathogenesis at both the gene and pathway level.
Year: 2015 PMID: 25993042 PMCID: PMC4439034 DOI: 10.1371/journal.pone.0124544
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Microarray samples (biological replicates), platforms and GEO accession numbers.
| Samples | Platform |
|---|---|
| 5 biological replicates of 3 month MAT1A KO mouse | Affymetrix Mouse430_2.na21 platform |
| 5 biological replicates of 3 month GNMT KO; 5 biological replicates of 8 month MAT1A KO mouse | Affymetrix Mouse430_2.na21 platform |
| 4 biological replicates of 8 month GNMT KO; 5 biological replicates of 15 month MAT1A KO mouse | Affymetrix Mouse430_2.na21 platform |
| 9 human biological replicates | Affymetrix HG-U133_Plus_2.na22 platform |
| 4 biological replicates of 8 month GNMT KO; 5 biological replicates of 15 month MAT1A KO mouse | Affymetrix Mouse430_2.na21 platform |
| 91 human biological replicates | GPL1528 human microarray platform in GSE1898 series |
| 87 human biological replicates | GPL257 human microarray platform in GSE364 series |
| 2 human biological replicates | Affymetrix HG-U133_Plus_2.na22 platform |
| 9 human biological replicates | Affymetrix HG-U133_Plus_2.na22 platform |
The 26 feature selection methods.
| Search strategies | |
|---|---|
| Sequential | Evolutionary approach |
| Backward Elimination | Forward feature selection |
The methods are described in terms of the search and evaluation procedures they use, whether they tackle redundancy (r, redundant; nr, non-redundant), the name of the feature selection method, and whether they are univariate (u), multivariate (m) or a hybrid of the two (h).
Fig 1. Data partition and aggregation procedures.
The data are randomly partitioned into mutually exclusive sets P1, P2, P3, P4 and P5. Feature selection is performed in each partition, yielding one feature subset per partition. Frequency-based aggregation then adds features from these subsets one at a time, most frequent first, and stops when the performance of a mining algorithm starts to decrease. The result is a unique ensemble subset.
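The aggregation step described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `aggregate_by_frequency`, the `evaluate` callback (standing in for the "mining algorithm" performance) and the alphabetical tie-breaking are all assumptions introduced here.

```python
from collections import Counter


def aggregate_by_frequency(subsets, evaluate):
    """Greedy frequency-based aggregation of per-partition feature subsets.

    subsets  : list of feature-name collections, one per partition
    evaluate : hypothetical callback returning a score (higher is better)
               for a candidate list of features
    """
    # Count in how many partitions each feature was selected.
    counts = Counter(f for subset in subsets for f in set(subset))
    # Most frequent first; ties broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda f: (-counts[f], f))

    ensemble, best = [], float("-inf")
    for feature in ranked:
        score = evaluate(ensemble + [feature])
        if score < best:  # performance started to decrease: stop adding
            break
        ensemble.append(feature)
        best = score

    # Selection frequency doubles as a simple stability measure.
    frequency = {f: counts[f] / len(subsets) for f in ensemble}
    return ensemble, frequency


# Toy data with made-up feature names; not taken from the study.
toy_subsets = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB", "geneD"],
    ["geneA", "geneB", "geneC", "geneD"],
]
informative = {"geneA", "geneB", "geneC"}


def toy_score(features):
    # Reward informative features, penalise the rest.
    return sum(1 if f in informative else -1 for f in features)


ensemble, frequency = aggregate_by_frequency(toy_subsets, toy_score)
```

In this toy run the loop stops at `geneD`, the first feature whose addition lowers the score, so the ensemble keeps the three more frequent features.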
Fig 2. Tree structure in which each stage of the disease has been grouped into a single cluster. The RFE_clust_Dunn algorithm selected the variables used as input to pvclust [43], which performed the hierarchical clustering.
Fig 3. Mouse and human HCC clustering.
The gene expression data of human HCC of mixed etiologies were integrated with HCC samples from GNMT and MAT1A KO mouse models of NAFLD-derived HCC by selecting orthologous genes using the HomoloGene database. The integrated data hold 1691 genes, obtained by matching orthologs among the genes with at least 9 samples of two-fold regulation in the human HCC series, the 15 month MAT1A KO and the 8 month GNMT KO mouse models. Complete hierarchical clustering with Pearson correlation distinguishes clusters A and B, which differ significantly in survival length; the mouse models fall together in cluster A.
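As an illustration of the clustering named in this caption, the sketch below implements complete-linkage agglomerative clustering with a Pearson-correlation distance (1 − r) in plain Python. The helper names, the choice of 1 − r as the distance and the toy expression profiles are assumptions; the study's actual software and settings are not given in this record.

```python
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def complete_linkage(profiles, n_clusters):
    """Merge samples until n_clusters remain.

    profiles : dict mapping sample name -> expression vector
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Toy profiles (made-up values): "m1" stands for a mouse sample whose
# expression pattern resembles human samples "h1" and "h2".
toy_profiles = {
    "h1": [1.0, 2.0, 3.0, 4.0],
    "h2": [2.0, 4.0, 6.0, 9.0],
    "m1": [1.5, 2.5, 3.5, 4.4],
    "h3": [4.0, 3.0, 2.0, 1.0],
    "h4": [9.0, 6.0, 5.0, 2.0],
}
toy_clusters = complete_linkage(toy_profiles, 2)
```

Because the distance depends only on the shape of each profile, samples with different absolute expression levels can still cluster together, which is one reason correlation-based distances are common when integrating data from different platforms.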
Fig 4. Survival signature common to human and mouse in an independent HCC dataset. Complete hierarchical clustering, with Pearson correlation as the similarity measure over the expression values of the genes composing the signature, renders 3 main clusters (A, C and B) representing HCC subtypes of differential survival.
5 fold cross-validation classification performance, stability calculated as the Average Normalized Hamming Distance (ANHD) and number of selected genes in the signatures of NAFLD progression from smoothed and raw data.
| Method | 5-fold cross-validation classification performance (smoothed data) | 5-fold cross-validation classification performance (raw data) | Genes (smoothed data) | Genes (raw data) | ANHD (smoothed data) | ANHD (raw data) | Ensemble error (smoothed data) | Ensemble error (raw data) |
|---|---|---|---|---|---|---|---|---|
| | 0.065±0.009 | 0.084±0.016 | 28 | 39 | 0 | 6.577 | 0.08 | 0.092 |
| | 0.070±0.010 | 0.087±0.019 | 39 | 39 | 0 | 8.156 | 0.061 | 0.093 |
| | 0.077±0.012 | 0.086±0.019 | 43 | 54 | 0 | 8.020 | 0.054 | 0.095 |
| | 0.033±0.015 | 0.043±0.011 | 28 | 61 | 0 | 3.955 | 0.054 | 0.067 |
| | 0.067±0.009 | 0.085±0.020 | 50 | 373 | 0 | 5.065 | 0.061 | 0.093 |
| | 0.135±0.048 | 0.232±0.130 | 11 | 26 | 0 | 0.756 | 0.144 | 0.091 |
| | 0.042±0.044 | 0.072±0.036 | 58 | 84 | 0 | 5.678 | 0.064 | 0.101 |
| | 0.217±0.082 | 0.217±0.061 | 49 | 70 | 0 | 3.152 | 0.054 | 0.051 |
| | 0.027±0.009 | 0.042±0.007 | 111 | 67 | 0 | 5.665 | 0.058 | 0.058 |
| | 0.060±0.020 | 0.076±0.015 | 35 | 371 | 0 | 5.140 | 0.08 | 0.097 |
| | 0.070±0.014 | 0.090±0.021 | 50 | 85 | 0 | 4.582 | 0.067 | 0.092 |
| | 0.068±0.026 | 0.088±0.017 | 218 | 93 | 0 | 5.658 | 0.077 | 0.085 |
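One common way to compute an average normalized Hamming distance between feature subsets is sketched below. The record does not give the exact formula, so the normalization used here (dividing by the total number of candidate features, which bounds the result between 0 and 1) is an assumption; the raw-data values in the table exceed 1, so the study evidently reports the measure on a different scale. The function name `anhd` and the toy subsets are hypothetical.

```python
from itertools import combinations


def anhd(subsets, n_features):
    """Average normalized Hamming distance over all pairs of subsets.

    subsets    : list of feature-name collections, one per selection run
    n_features : total number of candidate features (normalization constant,
                 an assumption of this sketch)
    """
    sets = [set(s) for s in subsets]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    # The Hamming distance between two selection masks equals the size of
    # the symmetric difference between the two selected sets.
    return sum(len(a ^ b) / n_features for a, b in pairs) / len(pairs)


# Identical subsets are perfectly stable; differing subsets are not.
stable = anhd([["g1", "g2"], ["g1", "g2"], ["g1", "g2"]], n_features=4)
unstable = anhd([["g1", "g2"], ["g1", "g3"]], n_features=4)
```

Under this definition a value of 0 means every run selected exactly the same features, which matches the reading of the ANHD column for smoothed data.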
Fig 6. Kaplan-Meier plots showing the survival probability over time (days) of the 3 main clusters representing HCC subtypes of differential survival found in the independent HCC dataset when performing clustering analysis over the expression values of the genes composing the survival signature common for human and mouse.
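For readers unfamiliar with how such curves are built, here is a minimal sketch of the Kaplan-Meier product-limit estimator for a single group. The function name and the toy follow-up data are assumptions made for illustration; they are not values from the study.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for one group.

    times  : follow-up time of each subject (e.g. days)
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        at_risk -= removed
    return curve


# Toy data: five subjects, two of them censored.
toy_curve = kaplan_meier([100, 200, 200, 300, 400], [1, 1, 0, 1, 0])
```

Computing one such curve per cluster and comparing them with a log-rank test is the usual way to check whether the clusters differ in survival.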
Fig 5. Enriched KEGG pathway signatures selected by the two supervised clustering-based feature selection methods that produced the optimal clustering result on smoothed data, and by the two ensemble signatures derived from 14 feature selection algorithms on raw and smoothed data, used to build the signatures of NAFLD progression.
KEGG enrichment analysis was performed on the genes selected in the 5 feature selection runs of the external 5 fold crossvalidation procedure and those pathways having a significant p-value (p<0.05) were selected.
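A pathway enrichment p-value of this kind is typically the upper tail of a hypergeometric distribution. The sketch below shows that calculation; the function name, argument names and the numbers in the usage lines are assumptions, and the study's actual background gene set and software are not stated in this record.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, selected, overlap):
    """P(X >= overlap) when drawing `selected` genes without replacement.

    population   : number of genes in the background set
    pathway_size : number of background genes annotated to the pathway
    selected     : number of genes in the signature
    overlap      : number of signature genes annotated to the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    favourable = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers: 3 signature genes, all falling in a 3-gene pathway,
# drawn from a background of 10 genes.
p_all_hit = hypergeom_enrichment_p(10, 3, 3, 3)
p_any = hypergeom_enrichment_p(10, 3, 3, 0)
```

A pathway would then be reported as enriched when this p-value falls below the chosen threshold, here 0.05.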
Ensemble unique gene survival signature common for human and mouse resulting from the frequency based aggregation of the signatures produced by the 5 feature selection methods.
| Gene ID | Frequency |
|---|---|
| Tgfb1i1 | 1 |
| Fgf20 | 1 |
| Kcnk2 | 0.8 |
| Pfkfb2 | 0.8 |
| Kcnk3 | 0.8 |
| Pigr | 0.8 |
| Egr4 | 0.8 |
| Kera | 0.8 |
| Foxf2 | 0.8 |
| Adprh | 0.4 |
| Cecr6 | 0.2 |
| Slco1b2 | 0.2 |
| Slc5a6 | 0.2 |
| Xkr4 | 0.2 |
| Camk1g | 0.2 |
| Brd7 | 0.2 |
| Mdfic | 0.2 |
| D3Bwg0562e | 0.2 |
| Tnfsf13b | 0.2 |
| Muc13 | 0.2 |
| Elf1 | 0.2 |
| Ube2g2 | 0.2 |
| Ddx46 | 0.2 |
The frequency of appearance of the selected genes among the 5 feature selection methods is recorded as a measure of stability.
Survival signature of pathways common for human and mouse resulting from the signatures produced by the 5 runs of the 5 feature selection methods.
| Enriched KEGG pathways | Hypergeometric tests p-value | Standard deviation of p-value | Frequency |
|---|---|---|---|
| Regulation of autophagy | 0.0103 | 0.0076 | 7 |
| Reductive carboxylate cycle (CO2 fixation) | 0.0257 | 0.0029 | 6 |
| Neuroactive ligand-receptor interaction | 0.0260 | 0.0020 | 4 |
| Hematopoietic cell lineage | 0.0272 | 0.0077 | 4 |
| Folate biosynthesis | 0.0382 | 0.0138 | 2 |
| Starch and sucrose metabolism | 0.0204 | 0.0109 | 2 |
| Leukocyte transendothelial migration | 0.0145 | 0.0028 | 2 |
| Cell adhesion molecules (CAMs) | 0.0081 | 0.0040 | 2 |
The frequency of appearance of the selected pathways among the 5 runs of the 5 feature selection methods is recorded as a measure of stability. Another measure of stability is the standard deviation of the hypergeometric test p-values.