Abstract
BACKGROUND: In the work of Chari et al. entitled "Effect of active smoking on the human bronchial epithelium transcriptome", the authors use SAGE to identify candidate gene expression changes in bronchial brushings from never, former, and current smokers. These gene expression changes are categorized into those that are reversible or irreversible upon smoking cessation. A subset of the identified genes is validated on an independent cohort using RT-PCR. The authors conclude that their results support the notion of gene expression changes in the lungs of smokers that persist even after an individual has quit.
Year: 2009 PMID: 19224643 PMCID: PMC2656532 DOI: 10.1186/1471-2164-10-82
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of criticisms.
| criticism | consequence |
| poor definition of "preferential" expression | introduces unchecked bias from different group sizes |
| incorrect use of Venn diagram | confounds overall sense of group-specific differences |
| use of raw tag counts to determine "preferential" expression | introduces unchecked bias from different library sizes |
| data filtered using criteria that include the variable to be tested | pre-selects for data more likely to be found significant, confounding the estimate of the false discovery rate (FDR) |
| significance threshold set to | false discovery rate (FDR) could be very high |
| "significant" results undergo | low tag counts more likely to pass the filter, yet these more likely to represent random variation |
| other possible null hypotheses not tested | not possible to check for consistency with known biology |
| null hypotheses formed with 2 of the 3 sample types | loss of power |
| data selected for differential expression is clustered | formation of distinct clusters is meaningless |
| genes tested for consistency with third sample group restricted to genes pre-selected as different between original two groups | flaws in implementation of first hypothesis test become propagated and amplified in second hypothesis test |
| no RT-PCR of irreversible genes | no validation of irreversible gene expression hypothesis |
| evidence for GSK3B as an irreversible gene is weak or supports reversible hypothesis | selection of GSK3B for further experimentation is not indicated |
| tags per million (TPM) used in statistical testing rather than for reporting purposes only | artificially inflates non-zero counts |
| some SAGE tags incorrectly mapped | a) follow-up RT-PCR is not validation, b) evidence for involvement of COX2 pathway is weaker than implied |
A short description of each of the criticisms addressed in this correspondence, and their consequences for Chari et al.
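The criticisms concerning raw tag counts and tags per million (TPM) both turn on library size. The following minimal sketch illustrates the scaling; the function name and the two library sizes are hypothetical and are not taken from Chari et al.

```python
def tags_per_million(count: int, library_size: int) -> float:
    """Scale a raw SAGE tag count to tags per million (TPM)."""
    return count * 1_000_000 / library_size


# Hypothetical library sizes, chosen only for illustration.
small_library = 50_000
large_library = 200_000

# The same raw count of one tag gives very different scaled values.
tpm_small = tags_per_million(1, small_library)
tpm_large = tags_per_million(1, large_library)

print(f"1 tag in {small_library} tags -> {tpm_small} TPM")
print(f"1 tag in {large_library} tags -> {tpm_large} TPM")
```

Under these assumed sizes, a single tag in the smaller library is scaled to 20 TPM, so a statistical test applied to TPM values would treat it as 20 observations rather than one. Comparing raw counts instead would ignore the fourfold difference in sequencing depth between the two libraries.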
Figure 1. "Venn" diagram using Chari et al. The left-hand diagram shows values obtained using a null (randomized) dataset, and the right-hand diagram shows values obtained using the actual dataset. Note that these are not properly formed Venn diagrams (see text for details).
Estimated false discovery rate (FDR) of null hypotheses using different combinations of the three groups
| null hypothesis | null dataset: ≥20 TPM | null dataset: p ≤ 0.05 | null dataset: fold-change ≥ 2 | actual dataset: ≥20 TPM | actual dataset: p ≤ 0.05 | actual dataset: fold-change ≥ 2 | FDR (%) |
| N = C | 7406 | 418 | 195 | 7764 | 885 | 609 | 47.2 |
| N = F | 7323 | 384 | 157 | 7547 | 765 | 447 | 50.2 |
| F = C | 7102 | 416 | 92 | 7318 | 895 | 433 | 46.5 |
| (N, F) = C | 7726 | 411 | 82 | 8148 | 959 | 460 | 42.8 |
| N = (F, C) | 7726 | 382 | 67 | 8148 | 836 | 475 | 45.7 |
| (N, C) = F | 7726 | 388 | 157 | 8148 | 818 | 314 | 47.4 |
The estimated false discovery rate (FDR) of different null hypotheses when grouping the 24 SAGE libraries of Chari et al. into never, former, and current smokers. A p-value cutoff of 0.05 was used, as in the original paper.
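The FDR column appears consistent with dividing the number of tags passing p ≤ 0.05 in the null dataset by the number passing in the actual dataset. That relationship is inferred here from the tabulated values and is an assumption, not a formula stated in the table. A minimal sketch that checks it against the reported figures:

```python
# Counts of tags with p <= 0.05, copied from the table:
# (null dataset, actual dataset, reported FDR in percent)
rows = {
    "N = C": (418, 885, 47.2),
    "N = F": (384, 765, 50.2),
    "F = C": (416, 895, 46.5),
    "(N, F) = C": (411, 959, 42.8),
    "N = (F, C)": (382, 836, 45.7),
    "(N, C) = F": (388, 818, 47.4),
}


def estimated_fdr(null_hits: int, actual_hits: int) -> float:
    """Assumed estimator: null hits as a percentage of actual hits."""
    return 100.0 * null_hits / actual_hits


for name, (null_hits, actual_hits, reported) in rows.items():
    fdr = estimated_fdr(null_hits, actual_hits)
    print(f"{name:12s} computed {fdr:.2f}%  reported {reported:.1f}%")
```

Under this assumed estimator, each computed value lies within 0.1 percentage point of the reported one. Roughly half of the tags called significant at p ≤ 0.05 would therefore be expected from a randomized dataset alone.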
Figure 2. Estimated false discovery rate at different threshold p-values for comparisons of two groups. The x-axis represents different p-value cutoffs (0.00–0.10) to determine differential expression, and the y-axis represents the estimated false discovery rate (FDR) expected for each cutoff. The color key to differentiate the three possible null hypotheses comprising two of the three groups is shown in the legend.
Figure 3. Estimated false discovery rate at different threshold p-values for comparisons of three groups. The x-axis represents different p-value cutoffs (0.00–0.10) to determine differential expression, and the y-axis represents the estimated false discovery rate (FDR) expected for each cutoff. The color key to differentiate the three possible null hypotheses involving pair-wise combinations of all three groups is shown in the legend.
Figure 4. Hierarchical clustering using Chari et al. Counts are shown for SAGE tags that met a threshold p-value cutoff of 0.05 for the null hypothesis of never = current smokers. The counts were row-normalized and underwent single-link hierarchical clustering using Pearson correlation as the distance metric. The left-hand tree represents the actual dataset and the right-hand tree represents the null dataset.
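The point that clustering pre-selected data is uninformative can be illustrated with a small simulation. This sketch is not the pipeline of Chari et al. or of this correspondence. It assumes Gaussian noise rather than SAGE counts, a simple difference-of-means filter rather than their statistical test, and average pairwise Pearson correlation rather than hierarchical clustering. All names and parameters are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)
n_tags, n_per_group = 2000, 8

# Pure noise: no tag truly differs between the two arbitrary groups.
data = [[random.gauss(0, 1) for _ in range(2 * n_per_group)]
        for _ in range(n_tags)]

# Keep only tags whose group means happen to differ by more than 1.0.
selected = [row for row in data
            if abs(mean(row[:n_per_group]) - mean(row[n_per_group:])) > 1.0]

# Row-normalize (centre each tag), then transpose to per-sample profiles.
centred = [[v - mean(row) for v in row] for row in selected]
samples = list(zip(*centred))

# Compare sample-to-sample correlation within and between groups.
within, between = [], []
for i in range(len(samples)):
    for j in range(i + 1, len(samples)):
        same_group = (i < n_per_group) == (j < n_per_group)
        (within if same_group else between).append(
            pearson(samples[i], samples[j]))

mean_within = mean(within)
mean_between = mean(between)
print(f"{len(selected)} tags selected from noise")
print(f"mean correlation within groups:  {mean_within:.2f}")
print(f"mean correlation between groups: {mean_between:.2f}")
```

Because the tags were chosen for separating the two groups, samples from the same group are expected to correlate more strongly with each other than with samples from the other group, even though the data contain no real signal. Apparent clusters in data selected this way therefore do not provide independent evidence of group differences.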