Mark A Levenstien, Yaning Yang, Jürg Ott.
Abstract
BACKGROUND: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on the set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must account for the clustering process, which modifies the grouping structure of the data and often removes variation.
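The abstract states the problem but not a procedure. One standard way to make the overall significance reflect the search over clustering steps is a permutation (min-p) test: compute a local p-value at every step, keep the smallest, then shuffle the phenotype labels and ask how often the smallest local p-value over all steps is at least as extreme. The sketch below assumes a binary case/control status, a fixed list of partitions (one label list per clustering step) that was built without looking at the status, and a Pearson chi-square test of independence as the per-step test. It is an illustration under those assumptions, not necessarily the authors' exact method; the names `chi2_sf`, `local_p` and `global_p` are hypothetical.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0  # df = 2 case (gamma shape a = 1)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # df = 1 case (gamma shape a = 1/2)
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1): each pass adds 2 degrees of freedom.
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1)) if y > 0 else 0.0
        a += 1.0
    return min(q, 1.0)


def local_p(labels, status):
    """Pearson chi-square test of independence between class labels and a 0/1 status."""
    n, n_case = len(labels), sum(status)
    classes = sorted(set(labels))
    stat = 0.0
    for c in classes:
        members = [s for lab, s in zip(labels, status) if lab == c]
        size, cases = len(members), sum(members)
        for observed, margin in ((cases, n_case), (size - cases, n - n_case)):
            expected = size * margin / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = len(classes) - 1
    return 1.0 if df == 0 else chi2_sf(stat, df)


def global_p(partitions, status, n_perm=999, seed=0):
    """Local p-value at every step plus a permutation p-value for the smallest one.

    The partitions are held fixed because the clustering never sees `status`, so
    shuffling the status labels re-runs the whole "test every step, keep the best"
    search under the null hypothesis of no association.
    """
    rng = random.Random(seed)
    observed = [local_p(labels, status) for labels in partitions]
    best = min(observed)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min(local_p(labels, shuffled) for labels in partitions) <= best:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 8 samples, three clustering steps.
status = [1, 1, 1, 1, 0, 0, 0, 0]
partitions = [
    [0, 0, 1, 1, 2, 2, 3, 3],  # step with 4 classes
    [0, 0, 0, 0, 1, 1, 1, 1],  # step with 2 classes
    [0, 0, 0, 0, 0, 0, 0, 0],  # final single class
]
local, p_global = global_p(partitions, status)
```

Here the smallest local p-value is erfc(2), about 0.0047, at the two-class step; `p_global` is the figure that describes the experiment as a whole. With only eight samples the chi-square approximation behind each local p-value is rough, and the permutation calibration absorbs that as well.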
Year: 2003; PMID: 14667254; PMCID: PMC328091; DOI: 10.1186/1471-2105-4-62
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Figure 1. Results from haplotype association tests applied to all steps of the hierarchical structure formed by clustering data from Hoehe et al. [2]. This bar graph presents the local p-values computed by our group at all steps within the hierarchical structure.
Figure 2. Results from log-rank tests applied to steps of the hierarchical structure formed by clustering data from Garber et al. [11]. A, This schematized dendrogram reflects the process of clustering microarray samples according to the similarity of their gene expression profiles as measured by the Pearson correlation coefficient. Distances between array sample clusters are approximated (not to scale) by the vertical axis. Along the bottom of the dendrogram are the microarray tissue samples from individuals for which survival data was available [11]. B, This bar graph displays the local p-values we compute at each step within the structure created by hierarchical clustering.
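The caption names the similarity measure (Pearson correlation) but not the linkage rule. The sketch below assumes average linkage with the distance 1 − r between two samples' expression profiles, and records the class labels after every merge, which gives exactly the kind of step-by-step hierarchy whose steps are then tested. The function names are hypothetical, and the code assumes no profile is constant (otherwise r is undefined).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage_steps(profiles):
    """Label list after each merge, from n singleton classes down to one class."""
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def labels():
        lab = [0] * n
        for k, members in enumerate(clusters):
            for i in members:
                lab[i] = k
        return lab

    steps = [labels()]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so position a is unaffected
        steps.append(labels())
    return steps


profiles = [  # hypothetical expression profiles, one row per sample
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [4.0, 3.0, 2.0, 0.5],
]
steps = average_linkage_steps(profiles)
# steps[2] == [0, 0, 1, 1]: samples 0-1 and 2-3 form the two classes
```

The naive double loop recomputes cluster distances at every merge, which is fine for a few dozen arrays; a production analysis would more likely call a library routine such as SciPy's hierarchical clustering.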
Figure 3. Results from log-rank tests applied to steps of the hierarchical structure formed by clustering data from Alizadeh et al. [12]. A, This schematized dendrogram reflects the process of clustering microarray samples according to the similarity of their gene expression profiles as measured by the Pearson correlation coefficient. Distances between array sample clusters are approximated (not to scale) by the vertical axis. Along the bottom of the dendrogram are the microarray tissue samples from individuals for which survival data was available [12]. B, This bar graph displays the local p-values we compute at each step within the structure created by hierarchical clustering.
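The local p-values in panels B of Figures 2 and 3 come from log-rank tests of survival across classes. As a minimal sketch, assuming a step where the samples fall into just two classes, the two-group log-rank statistic sums observed minus expected deaths in one class over all distinct death times and divides its square by the hypergeometric variance; the result is referred to a chi-square distribution with 1 degree of freedom. Steps with more than two classes need the k-group version with a covariance matrix, which is not shown. The function name and toy data are hypothetical.

```python
import math


def logrank_two_group(times, events, group):
    """Two-group log-rank test.

    times: follow-up times; events: 1 = death observed, 0 = censored;
    group: 0/1 class label per sample (e.g. the two classes at one dendrogram step).
    Returns (chi-square statistic with 1 df, p-value).
    """
    o_minus_e = 0.0
    var = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, group) if ti >= t]  # still under follow-up at t
        n, n1 = len(at_risk), sum(at_risk)
        deaths = [g for ti, e, g in zip(times, events, group) if ti == t and e == 1]
        d, d1 = len(deaths), sum(deaths)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


times = [1, 2, 3, 4, 5, 6]   # hypothetical survival times
events = [1, 1, 1, 1, 1, 1]  # every death observed (no censoring)
group = [1, 1, 1, 0, 0, 0]   # class membership at one two-class step
stat, p = logrank_two_group(times, events, group)
# stat is about 5.05 (1 df)
```

Samples censored at a death time are kept in the risk set at that time, which is the usual convention for the log-rank test.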