
Pathway-based analysis of the hidden genetic heterogeneities in cancers.

Xiaolei Zhao¹, Shouqiang Zhong², Xiaoyu Zuo³, Meihua Lin¹, Jiheng Qin¹, Yizhao Luan¹, Naizun Zhang², Yan Liang⁴, Shaoqi Rao⁵.

Abstract

Many cancers that appear phenotypically similar are actually distinct at the molecular level, leading to very different responses to the same treatment. It has recently been demonstrated that pathway-based approaches are robust and reliable for genetic analysis of cancers. Nevertheless, it remains unclear whether such function-based approaches are useful in deciphering molecular heterogeneities in cancers. Therefore, we aimed to test this possibility in the present study. First, we used an NCI60 dataset to validate the ability of pathways to correctly partition samples. Next, we applied the proposed method to identify the hidden subtypes in diffuse large B-cell lymphoma (DLBCL). Finally, the clinical significance of the identified subtypes was verified using survival analysis. For the NCI60 dataset, we achieved highly accurate partitions that best fit the clinical cancer phenotypes. Subsequently, for a DLBCL dataset, we identified three hidden subtypes that showed very different 10-year overall survival rates (90%, 46% and 20%) and were highly significantly (P = 0.008) correlated with the clinical survival rate. This study demonstrated that the pathway-based approach is promising for unveiling genetic heterogeneities in complex human diseases.


Keywords: Cancer; Enrichment analysis; Genetic heterogeneity; Pathway-based approach; Sample partitioning; Survival analysis


Year: 2014 PMID: 24462714 PMCID: PMC4411334 DOI: 10.1016/j.gpb.2013.12.001

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

Genetic heterogeneity has attracted increasing attention in the study of the genetic mechanisms of complex diseases. The term describes the biological complexity whereby apparently similar characters may result from different genes or different genetic mechanisms [1]. In the clinical setting, patients whose diseases display a similar phenotype but arise from different genetic causes frequently respond very differently to the same treatment and thus have markedly different prognoses. Therefore, elucidating the genetic heterogeneities underlying complex diseases has a profound influence on both modern clinical practice and basic biomedical research. Rapidly accumulating genome-scale molecular data provide good opportunities to unveil the genetic heterogeneities in complex diseases at the molecular level, and methods and applications for analyzing genetic heterogeneity have improved significantly in the past decades. The usefulness of large-scale gene expression data, as measured by microarrays, has notably been demonstrated by the successful stratification of diffuse large B-cell lymphoma (DLBCL) [2-5]. In these pioneering studies, an unsupervised clustering algorithm was used to partition both gene expression data and patients, with the aim of defining genetically homogeneous novel subgroups among cancer patients. The underlying principle is that patients within the same cluster probably share a similar molecular pathogenesis and hence can be grouped into the same molecular subphenotype [6]. Although traditional clustering analysis based on individual gene expression profiles has achieved great success in unveiling genetic heterogeneity, it has seldom considered the combined actions of multiple functionally dependent genes. It is increasingly recognized that complex diseases such as cancers are a consequence of alterations in a complicated cascade of events involving multiple biological processes and pathways. 
Thus, subtypes identified by individual genes often lack good biological interpretations. In this sense, the development of function-based methods for cancer subtyping is warranted. Gene Ontology (GO) [7] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [8] are the two most common databases currently used for gene functional annotation. GO terms are used primarily for the annotation of individual gene products, whereas KEGG pathway terms are used for the annotation of classes of gene products, thus providing a more precise delineation of functionalities for a group of genes that act together to some extent. KEGG pathway is a collection of manually drawn pathway maps that represent the knowledge on molecular interactions and reaction networks for human diseases, environmental information processing, genetic information processing, etc., thus possibly providing biological interpretations of higher-level systemic functions [9]. Hence, a pathway-based approach can integrate the effects of genetic factors and biological networks [10] and has been used for disease classification [11]. In our previous work [1], we proposed a GO-based approach to unveil the hidden heterogeneities in cancers and demonstrated that it can successfully integrate cellular function with the gene expression profile, showing the advantage of GO in classifying cancer types. In principle, a similar pathway-based approach should have comparable performance in the genetic analysis of molecular heterogeneities in cancers. Numerous studies have shown that cancer subtypes are, in essence, related to multiple pathways [12-14]. For example, recent evidence has shown that molecular subtypes of DLBCL arise from distinct genetic pathways [15]. Therefore, this study aimed to verify whether a pathway-based approach is useful in deciphering molecular heterogeneities in complex diseases such as cancer. 
In this study, we proposed a pathway-based clustering approach to unveil disease heterogeneities based on multiple pathways. The starting point was to select genes associated with specific disease conditions. It should be noted that tests such as the t test or F test are not appropriate for selecting differentially expressed genes in the presence of genetic heterogeneity, because the validity of these tests relies on accurately and unambiguously defined phenotype classes. Hence, we took a robust metric, the overall variability of gene expression, to guide gene selection. First, genes with top-ranked expression variations across samples, which explain most of the total variance potentially contributed by known or unknown factors (for example, the hidden cancer subtypes), were selected as “feature genes”, as implemented in several previous studies [16,17]. Then, we identified KEGG pathways enriched with feature genes as “putative signature pathways” (here, “enriched” means that a pathway contains markedly more high-variance feature genes than a random gene set of the same size does). Finally, we classified samples to identify the hidden disease subtypes using the expression profiles of genes annotated to these well-characterized pathways. In the numerical analysis, we first validated the ability of the proposed approach to accurately partition cancer phenotypes using a large publicly available cancer dataset. Subsequently, we used the approach to identify the hidden subtypes of a notoriously heterogeneous phenotype, DLBCL. Our results demonstrated that three new subtypes identified using signature pathways had very different 10-year overall survival rates, and the partitions were highly significantly correlated with the clinical survival rates.

Results

Validation of the proposed pathway-based approach using a large microarray dataset

We selected the signature pathways that were significantly (FDR ⩽ 0.01, see the Materials and methods section for the details) enriched with the 10% top-ranked genes with the largest expression variances based on the NCI60 dataset [18]. As a result, three pathways were identified and used for the subsequent analyses: the small cell lung cancer pathway (hsa05222), the extracellular matrix (ECM)–receptor interaction pathway (hsa04512) and the focal adhesion pathway (hsa04510) (Table 1). First, we evaluated the ability of each signature pathway to accurately partition the samples into the known cancer types using clustering analysis based only on the expression profiles of genes within the pathway. Our results based on each of the three pathways agreed well with the original clinical labels. The observed values of the adjusted Rand index (ARI) [19], which measures the agreement between the identified clusters and the original partitions (expected value of 0 for random partitions and 1 for perfect agreement; see the Materials and methods section for the details), were 0.83, 0.69 and 0.78, respectively. Subsequently, to determine the empirical significance of each pathway, we randomly selected 1000 gene subsets of the same size as the pathway from the null gene population described in the Materials and methods section. No random subset achieved an ARI value higher than that of the corresponding pathway, so all identified signature pathways performed significantly better (P < 0.001) in correctly partitioning the samples (that is, they are more likely to be relevant to the phenotypic partitions). Furthermore, after applying majority-rule voting to integrate the results from the three signature pathways, we achieved an ARI value of 0.83, with only two tumor samples misallocated. Alternatively, four samples were misclassified when a decision tree was constructed (Figure 1).

Table 1

Signature pathways for NCI60

Signature pathway	Number of annotated genes	Nominal P (pathway)^a	FDR (pathway)^b	ARI	Number of misallocated samples	P (ARI)^c
hsa05222: small cell lung cancer	19	7.83E−06	9.82E−03	0.83	2	<0.001
hsa04512: ECM–receptor interaction	21	3.03E−07	3.81E−04	0.69	8	<0.001
hsa04510: focal adhesion	36	1.54E−10	1.93E−07	0.78	3	<0.001

Note: Signature pathways for NCI60 were identified by using FDR for multiple tests correction (adjusted α = 0.01). Details of the NCI60 dataset were described previously [18]. a Modified Fisher Exact P value. b FDR stands for false discovery rate, which is used for adjustment of multiple tests for 201 pathways. c Statistical significance of ARI for the selected pathway. ARI stands for adjusted Rand index.

Figure 1

Decision tree based on three signature pathways for five cancer types. The internal nodes of the tree are the signature pathways. The leaf nodes represent the classification for five types of cancer (renal cancer, central nervous system cancer, melanoma, colon cancer and leukemia). Each leaf node shows the total number of samples over the number of incorrectly predicted samples for the specific type of cancer indicated.

We also assessed the robustness of the proposed pathway-based approach to the method of feature gene selection. With the feature genes selected as the top 10%, 15% and 20% ranked genes with the largest variances, we found that the identified signature pathways largely overlapped. Compared to using the top 10% ranked genes as feature genes, no additional pathways were identified when using the top 15% genes, and only one more pathway was identified when using the top 20% genes. These data suggest that the identified pathways are robust to the threshold used for selecting feature genes. Biological experiments provide evidence supporting the involvement of the three pathways in the molecular mechanisms underlying the various cancer types. For example, the focal adhesion pathway and the ECM–receptor interaction pathway were identified as functional gene sets that were significantly differentially expressed in leukemia [20]. In addition, a search for oncogenes in the KEGG database shows that the three pathways, particularly the small cell lung cancer pathway and the focal adhesion pathway, are enriched with various oncogenes. Taken together, this evidence supports a link between these three pathways and cancer.

Unveiling the hidden genetic heterogeneities in DLBCL

The genetic heterogeneities in DLBCL have been extensively investigated previously [2,21,22]. Inspired by the success of the proposed pathway-based approach in classifying the known NCI60 cancer types, we then applied it to discover the hidden molecular types of DLBCL. Based on the DLBCL dataset [2], we identified three subtypes using two signature pathways, i.e., the hematopoietic cell lineage pathway (hsa04640) and the cytokine receptor interaction pathway (hsa04060) (Table 2). These two pathways might be substantially responsible for the incidence and progression of DLBCL. The former pathway is associated with the immune system, and the latter pathway is associated with various signaling molecules and their interactions. Abnormalities in the immune function and/or in the signaling molecules and their interactions were considered to be the major causes of DLBCL [23]. The sensitivity analysis based on different criteria for feature gene selection (top 10%, 15% and 20% ranked genes with the largest variances) revealed that the identified signature pathways largely overlapped. Compared to using the top 10% ranked genes as feature genes, two additional pathways were identified when using the top 15% genes, and only one more pathway was identified when using the top 20% genes. Thus, only the results for the criterion of the top 10% ranked genes are presented in this study.

Table 2

Signature pathways for DLBCL

Signature pathway	Number of annotated genes	Nominal P (pathway)^a	FDR (pathway)^b
hsa04640: hematopoietic cell lineage	22	3.80E−10	4.76E−07
hsa04060: cytokine receptor interaction	24	1.00E−06	1.26E−03

Note: Signature pathways for DLBCL were identified by using FDR for multiple tests correction (adjusted α = 0.01). a Modified Fisher Exact P value. b FDR stands for false discovery rate, which is used for adjustment of multiple tests for 201 pathways. DLBCL stands for diffuse large B-cell lymphoma.

The survival results for these subtypes are shown in Figure 2. The 10-year overall survival rates for the three newly defined molecular subtypes were 90%, 46% and 20%, respectively. The log-rank test showed that the survival times of the three subtypes were significantly different (P = 0.008), separating the differential survival profiles somewhat more clearly than the original partitions did (the clinical labels, P = 0.010, see [2]). Compared with the partitioning results from a GO module (P = 0.007) obtained previously by our group [1], the pathway-based approach performed equally well and identified one more molecular subtype.

Figure 2

Clinically distinct DLBCL subtypes defined by gene expression profiling of two signature pathways. Kaplan–Meier plot of the overall survival of the three molecular subtypes of DLBCL, partitioned using the expression profiles of the genes contained in two signature pathways, hsa04640 and hsa04060.

To further explore a compact model for clinical use, we analyzed the genes included in the two signature pathways using Cox proportional-hazards models. In the univariate analysis, nine genes were found to be significant at the liberal significance level of 0.1. Subsequently, using the stepwise variable selection option (with the same inclusion and exclusion P values of 0.05) for the multivariate Cox proportional-hazards regression model, we identified three genes, CD10, CD21 and IL2RB, as predictors (Table 3). CD10 encodes a common acute lymphocytic leukemia (ALL) antigen that serves as an important cell surface marker in the diagnosis of ALL [24]. CD21 encodes a membrane protein that functions as a receptor for Epstein–Barr virus (EBV) binding on B and T lymphocytes. A previous study [25] reported that the prognosis of CD21-positive DLBCL was significantly more favorable than that of CD21-negative DLBCL. A later in vivo experiment [26] showed that CD21 was closely related to LFA-1 expression in B-cell lymphoma (BCL), and that the absence of CD21/LFA-1 expression was associated with pleural/peritoneal fluid involvement caused by BCL, which is a potential indicator of BCL progression. It is interesting to note that interleukin-2 receptor beta (IL2RB) was significant in the Cox proportional-hazards model. Although no study has directly shown that IL2RB is a predictor for DLBCL, IL2RB has been reported to be a potential prognostic biomarker for chronic lymphocytic leukemia (CLL) [27].

Table 3

Multivariate Cox proportional-hazard model built using the genes in the two signature pathways

Variable	Estimated coefficient	Wald χ²	P value	Hazard ratio (95% CI)
CD10	−0.762	10.635	0.001	0.467 (0.295–0.738)
CD21	−0.735	6.210	0.013	0.479 (0.269–0.855)
IL2RB	−0.630	6.377	0.012	0.533 (0.327–0.869)

Note: CI stands for confidence interval.

Discussion

From a biological perspective, compared to GO that reflects the functional similarities of genes, KEGG pathway reflects an integration of several specific functions. It is more systematic in revealing and elucidating the sophisticated molecular mechanisms underlying complex diseases such as cancer. Several studies have suggested the link between cancer subtypes and pathways. Therefore, the proposed pathway-based clustering approach for unveiling genetic heterogeneities of complex diseases would facilitate better understanding of the mechanisms underlying these phenomena. In this study, we evaluated this approach using a public benchmark dataset. Our results demonstrated that the gene expression profiles of pathways effectively distinguished well-characterized clinical types of cancers. Hence, there was sufficient reason to believe that the putative signature pathways for a heterogeneous disease could depict the underlying molecular mechanism(s) leading to the molecular subtypes. Further application of this proposed pathway-based approach to DLBCL demonstrated its effectiveness in dissecting genetic heterogeneities in complex diseases. Similar to the GO-based approach, the proposed pathway-based approach is also an efficient unsupervised feature selection method, which yields multiple feature gene sets (i.e., genes annotated to identified signature pathways) of functional compactness. The genes with top-rank expression variations across samples were selected as the initial feature genes [16,17]. Subsequently, the feature genes were further filtered or organized by significant KEGG pathways. Similar to the GO-based approach, this approach is not only useful in identifying both the gene expression signatures and the functional signatures of disease subtypes but can also provide guidance for functional studies on the molecular pathogenesis of the diseases investigated. 
Although some previous studies [2] that clustered disease subtypes based on gene expression profiles achieved great success in dissecting the genetic heterogeneities involved in DLBCL, such a clustering algorithm itself does not provide proof of the best grouping of genes in terms of biological functions [28]. Thus, biological interpretation of the grouping requires expert knowledge, which is often subjective [29]. In this study, we proposed to directly use an external annotation database such as the KEGG pathway to extract multiple functionally compact and coherent gene sets. Three hidden subtypes were identified by applying the proposed pathway-based approach to DLBCL. In terms of the survival analysis and the implications of the signature pathways, the proposed pathway-based approach provided a novel and feasible avenue for the genetic analysis of the hidden subtypes of complex human diseases such as cancer. In this study, we took the known cluster number suggested by the preassigned clinical labels to validate the proposed approach, assuming the absence of heterogeneities in the NCI60 data for several well-characterized cancers. Although the clustering results provided good fits to the known phenotypic partitions, this assumption might not be true [1]. Meanwhile, the problem of estimating the correct number of clusters to unveil hidden cancer subtypes remains largely unresolved. In addition, although the proposed pathway-based approach has achieved some success in the genetic analysis of the underlying molecular stratifications in cancer, we should recognize the limitations of this study. First, only one dataset for DLBCL was analyzed; it is thus very likely that only a small proportion of the relevant pathways were identified, owing to the limited information provided by a single dataset. 
Second, the current knowledge about pathways is largely fragmented and far from complete; this limitation compromises the aspects of the analysis that rely on pathway knowledge. Third, although we tried our best to control type I errors (incorrect rejections of true null hypotheses) in the various steps toward the identification of either pathways or hidden cancer subtypes, whether the overall type I error was well controlled remains unclear. In this sense, we considered our analysis exploratory in nature. Further studies using large-scale datasets and refined pathway knowledge are needed, which could increase the effectiveness in detecting pathways with modest effects. Finally, although in principle the proposed pathway-based method for the analysis of genetic heterogeneities could be extended to other types of data, such as genome-wide SNP and next-generation sequencing data, such an extension has to be carefully assessed. This assessment will be the next focus of our research group.

Materials and methods

Description of datasets

NCI60 [18], a large classical multiple-class dataset consisting of 9703 cDNAs measured in 60 cell lines of nine cancers, was used as the benchmark dataset to validate the efficiency of the proposed pathway-based approach. The data for prostate cancer were excluded because they consisted of only two samples. Samples of breast tumors, ovarian cancer and non-small cell lung carcinoma were also excluded because of the possible existence of heterogeneous hidden subtypes or mislabeled samples [21,30]. Thus, a subset of the NCI60 data (35 samples of five cancer types) was used in the study, including eight samples of renal cancer (RE), six samples of central nervous system cancer (CNS), eight samples of melanoma (ME), seven samples of colon cancer (CO) and six samples of leukemia (LE). After evaluating its ability to accurately partition this diverse data structure, we used the proposed pathway-based approach to analyze the hidden subtypes of DLBCL, which has been demonstrated to be notoriously heterogeneous [21,22,30]. The independent dataset for DLBCL consists of 4026 cDNAs measured in 42 samples [2]. We verified the identified hidden partitions by survival analysis of the clinical profiles of patients in each molecular-based partition. A detailed procedure chart for the pathway-based approach is shown in Figure 3 and described below. The corresponding source code is freely available upon written request. For data preprocessing, we adopted a unified criterion for the initial selection of genes from the previously described cDNA microarray datasets. First, we discarded clones with missing data in more than 5% of the arrays and applied a base-2 logarithmic transformation to the expression data. Second, similar to our previous paper [1], we imputed the remaining missing data with zeros. Third, the data for genes were centered by subtracting the observed median value. The final NCI60 and DLBCL datasets comprised 5124 and 3148 genes, respectively.
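The three preprocessing steps can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `preprocess`, the use of `None` for missing values, the assumption that inputs are positive expression ratios, and the choice to center each gene on its own median after imputation are all assumptions made for the example.

```python
import math
from statistics import median

def preprocess(matrix, max_missing_frac=0.05):
    """Filter, log-transform, impute and median-center an expression matrix.

    matrix: list of rows, one per clone; each row holds positive expression
    ratios across arrays, with None marking a missing value (assumed layout).
    """
    cleaned = []
    for row in matrix:
        # Step 1a: discard clones missing in more than 5% of the arrays.
        n_missing = sum(v is None for v in row)
        if n_missing / len(row) > max_missing_frac:
            continue
        # Step 1b: base-2 logarithmic transformation of observed values.
        logged = [None if v is None else math.log2(v) for v in row]
        # Step 2: impute the remaining missing values with zeros.
        filled = [0.0 if v is None else v for v in logged]
        # Step 3: center the gene by subtracting its median
        # (per-gene centering is an assumption of this sketch).
        center = median(filled)
        cleaned.append([v - center for v in filled])
    return cleaned
```

For example, `preprocess([[1.0, 2.0, 4.0]])` log-transforms the row to 0, 1, 2 and centers it on its median, giving `[[-1.0, 0.0, 1.0]]`.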

Figure 3

A detailed procedure chart for pathway-based analysis of genetic heterogeneities

Selecting putative signature pathways from KEGG

For the NCI60 and DLBCL datasets, the top x percent of genes with largest expression variances were selected as feature genes. Subsequently, we loaded these feature genes into the Database for Annotation, Visualization and Integrated Discovery (DAVID) [31] software to test their enrichment in pathways based on a modified Fisher Exact test. Finally, we identified the significantly enriched pathways at a false discovery rate (FDR) of 0.01 to adjust for multiple tests of 201 pathways in the DAVID database. To demonstrate the robustness of the pathways, we compared the pathways identified at different top percentage levels (x = 10, 15 and 20) of the feature genes with the largest variances for NCI60 or DLBCL.
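The selection and enrichment steps can be illustrated with a short sketch. It is not a reimplementation of DAVID: it uses the plain one-sided hypergeometric (Fisher exact) tail rather than DAVID's modified test, and it uses the Benjamini-Hochberg adjustment as a stand-in for DAVID's own FDR computation, which may differ. All function names and the dictionary-based data layout are hypothetical.

```python
from math import comb
from statistics import pvariance

def top_variance_genes(expr, percent):
    """Return the genes whose expression variance ranks in the top `percent`.

    expr: dict mapping gene name -> list of expression values across samples.
    """
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    n_keep = max(1, round(len(ranked) * percent / 100))
    return set(ranked[:n_keep])

def enrichment_p(n_hits, n_pathway, n_features, n_background):
    """Upper-tail hypergeometric probability P(X >= n_hits).

    n_hits: feature genes annotated to the pathway
    n_pathway: background genes annotated to the pathway
    n_features: number of feature genes
    n_background: number of genes in the background
    """
    total = comb(n_background, n_features)
    upper = min(n_pathway, n_features)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_features - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under these assumptions, a pathway would be retained as a putative signature pathway when its adjusted value is at most 0.01, and the robustness check amounts to repeating the procedure with `percent` set to 10, 15 and 20.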

Clustering samples based on individual pathways

For each signature pathway, we extracted the expression profiles of the measured genes that were annotated to it. In the agglomerative hierarchical clustering algorithm, each sample was initially assigned to its own cluster, the distances between clusters were then computed, and the two clusters with the smallest distance were merged. Distance computation and merging were repeated until only one cluster was left. In this work, correlation (uncentered) was used as the distance metric and the average linkage method was used for merging. The software can be downloaded from the website of the Eisen Lab (http://rana.lbl.gov/EisenSoftware.html). We pruned the hierarchical tree to allocate the samples into clusters. To evaluate the performance of the proposed approach, the number of clusters in the validation dataset was set to the number of predefined clusters from the original data source. Additionally, we assessed the classification performance of these signature pathways using a decision-tree based approach [32]. Finally, the ARI [19] was calculated to measure the agreement between the identified clusters and the original partitions. The expected value of the ARI is 0 when the partitions are randomly drawn, and the ARI is 1 when two partitions agree perfectly. A larger ARI indicates a higher correspondence between the two partitions. To assess the significance of the ARI, we compared the observed ARI value with that of same-sized gene subsets randomly selected from the whole microarray. The aim of this statistical test was to empirically verify whether the profiles of the genes in the signature pathways performed significantly better at clustering than gene groups randomly selected from a null (or contrast) population, in which the genes had little or no functional relationship. 
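The clustering and agreement measures described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Eisen Lab software: the distance is taken as one minus the uncentered correlation (cosine similarity), average linkage follows the pairwise-average (UPGMA) definition, which the original software may implement differently, and merging simply stops once the requested number of clusters remains. All function names are hypothetical, and sample profiles are assumed to be non-zero vectors.

```python
from collections import Counter
from math import comb, sqrt

def uncentered_correlation_distance(x, y):
    """1 minus the uncentered correlation (cosine similarity) of two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering of samples; returns one label per sample.

    profiles: list of per-sample expression vectors over the pathway genes.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(
                    uncentered_correlation_distance(profiles[i], profiles[j])
                    for i in clusters[a] for j in clusters[b]
                ) / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(profiles)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Note that the index is corrected for chance, so partitions that agree less than expected at random yield values below 0; for instance, `adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])` evaluates to -0.5.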
It is well known that similarly expressed (co-expressed) genes tend to share the same or similar function(s) and in fact, the gene co-expression information is often used for predicting gene functions [33]. Similar to our previous study [1], we constructed the null gene population using the silenced genes among all the annotated genes from the original expression profiles after excluding the genes annotated to the identified signature pathways and the genes significantly co-expressed with at least one gene in the signature pathways. Here, two genes were classified as co-expressed when the absolute value of Pearson correlation coefficient of their expression was larger than a threshold at significance level α = 0.005, as determined using 10,000 gene pairs randomly sampled from the original expression profiles. Subsequently, for each signature pathway, 1000 gene subsets of the same gene set size were randomly sampled from the null population. By applying the same clustering procedure to the 1000 random gene subsets, we defined the empirical P value for the observed ARI of each signature pathway as the fraction (proportion) of 1000 random subsets having ARIs larger than that of the signature pathway. The P value was used to assess whether an identified signature pathway had significantly better performance at correctly partitioning the samples (i.e., more likely relevant to the phenotypic partitions) than the random gene subsets that were less likely to be functionally related.
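The permutation-style test can be sketched as follows. The sketch makes two loudly stated assumptions: the co-expression threshold is read as the upper (1 − α) quantile of |r| over randomly sampled gene pairs, which is one plausible interpretation of the description above, and `score_fn` is a hypothetical callable that clusters the samples on a given gene subset and returns the resulting ARI. Genes are assumed to have non-constant expression so that the Pearson correlation is defined.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coexpression_threshold(expr, n_pairs=10000, alpha=0.005, seed=0):
    """Empirical |r| cutoff from random gene pairs (assumed interpretation).

    expr: dict mapping gene name -> list of expression values across samples.
    """
    rng = random.Random(seed)
    genes = sorted(expr)
    abs_r = []
    for _ in range(n_pairs):
        a, b = rng.sample(genes, 2)
        abs_r.append(abs(pearson(expr[a], expr[b])))
    abs_r.sort()
    cut = min(len(abs_r) - 1, int((1 - alpha) * len(abs_r)))
    return abs_r[cut]

def empirical_p(observed_ari, null_genes, subset_size, score_fn,
                n_draws=1000, seed=0):
    """Fraction of random null subsets whose ARI exceeds the observed ARI.

    null_genes: list of genes in the null population.
    score_fn: hypothetical callable, gene subset -> ARI of its clustering.
    """
    rng = random.Random(seed)
    n_larger = 0
    for _ in range(n_draws):
        subset = rng.sample(null_genes, subset_size)
        if score_fn(subset) > observed_ari:
            n_larger += 1
    return n_larger / n_draws
```

With 1000 draws, a fraction of zero corresponds to the P < 0.001 reported for each NCI60 signature pathway.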

Clustering based on multiple pathways

The utility of a partition based on a single pathway might be limited, because some samples could be misclassified when information from only one or a few pathways is used. To increase the accuracy of phenotypic partition, we applied a voting step to comprehensively integrate the partition results drawn from each signature pathway. Specifically, for a sample that had multiple membership labels obtained from different pathways, we applied a simple majority rule to determine the sample’s membership. If several classes tied in the vote, we randomly assigned one of the tied class labels to the sample. The agreement between the clustering results based on multiple pathways and the original partitions was also evaluated by calculating the ARI. Alternatively, assuming that the original phenotypic labels for samples were correct, we evaluated the signature pathways by building a decision tree [32]. We then used the approach to unveil the hidden subtypes of DLBCL.
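The voting step can be sketched as below. The sketch assumes that the cluster labels from the different pathways have already been matched to a common set of class names, a step the text does not specify; the function name `majority_vote` and the fixed random seed are choices made for the example only.

```python
import random
from collections import Counter

def majority_vote(labels_per_pathway, seed=0):
    """Combine per-pathway sample labels by simple majority rule.

    labels_per_pathway: list of label lists, one per signature pathway,
    all in the same sample order and sharing one label vocabulary (assumed).
    Ties are broken by drawing one of the tied labels at random.
    """
    rng = random.Random(seed)
    consensus = []
    for votes in zip(*labels_per_pathway):
        counts = Counter(votes)
        top = max(counts.values())
        tied = sorted(label for label, c in counts.items() if c == top)
        consensus.append(tied[0] if len(tied) == 1 else rng.choice(tied))
    return consensus
```

For three pathways voting `["A", "B", "C"]`, `["A", "B", "B"]` and `["A", "C", "B"]` on three samples, the consensus is `["A", "B", "B"]`.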

Survival analysis

To verify the clinical significance of the identified hidden DLBCL subtypes, we estimated the survival curves of the subtypes using the Kaplan–Meier product-limit method and assessed the difference between the survival curves using the log-rank test [34]. To explore a compact model for clinical use, we also evaluated the potential of genes in the signature pathways for predicting phenotypes. First, we applied a univariate Cox proportional-hazards model to identify the genes whose marginal effects on the overall survival time were significant. Subsequently, a multivariate Cox proportional-hazards model was used to analyze the power of the significant genes for predicting the overall survival time. The Wald χ² test was used to determine the significance of each predictor’s hazard toward the patients’ survival time.
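As an illustration of the product-limit method, the following sketch computes a Kaplan–Meier curve for one subtype. It covers only the survival estimate; the log-rank test and the Cox models are not shown. The function name and the 1/0 coding of events (1 for death, 0 for censoring) are assumptions of the example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate of the survival function.

    times: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored (assumed coding).
    Returns a list of (time, survival probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve
```

Applying the function separately to the patients of each subtype gives the curves that a log-rank test would then compare.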

Authors’ contributions

SR, XZ and SZ conceived the project, performed the analysis and wrote the manuscript. All the remaining authors participated in writing the computing codes and analyzing the public datasets. All authors read and approved the final manuscript.

Competing interests

The authors declare no competing interests.

31 in total

1. The KEGG databases at GenomeNet.

Authors: Minoru Kanehisa; Susumu Goto; Shuichi Kawashima; Akihiro Nakaya
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

2. DAVID: Database for Annotation, Visualization, and Integrated Discovery.

Authors: Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal: Genome Biol Date: 2003-04-03 Impact factor: 13.583

3. The Gene Ontology (GO) database and informatics resource.

Authors: M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. Unsupervised feature selection via two-way ordering in gene expression analysis.

Authors: Chris H Q Ding
Journal: Bioinformatics Date: 2003-07-01 Impact factor: 6.937

5. Judging the quality of gene expression-based clustering methods using gene annotation.

Authors: Francis D Gibbons; Frederick P Roth
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

6. Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia.

Authors: Lars Bullinger; Konstanze Döhner; Eric Bair; Stefan Fröhling; Richard F Schlenk; Robert Tibshirani; Hartmut Döhner; Jonathan R Pollack
Journal: N Engl J Med Date: 2004-04-15 Impact factor: 91.245

7. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma.

Authors: Andreas Rosenwald; George Wright; Wing C Chan; Joseph M Connors; Elias Campo; Richard I Fisher; Randy D Gascoyne; H Konrad Muller-Hermelink; Erlend B Smeland; Jena M Giltnane; Elaine M Hurt; Hong Zhao; Lauren Averett; Liming Yang; Wyndham H Wilson; Elaine S Jaffe; Richard Simon; Richard D Klausner; John Powell; Patricia L Duffey; Dan L Longo; Timothy C Greiner; Dennis D Weisenburger; Warren G Sanger; Bhavana J Dave; James C Lynch; Julie Vose; James O Armitage; Emilio Montserrat; Armando López-Guillermo; Thomas M Grogan; Thomas P Miller; Michel LeBlanc; German Ott; Stein Kvaloy; Jan Delabie; Harald Holte; Peter Krajci; Trond Stokke; Louis M Staudt
Journal: N Engl J Med Date: 2002-06-20 Impact factor: 91.245