Rong-Lin Wang, David Bencic, Adam Biales, Robert Flick, Jim Lazorchak, Daniel Villeneuve, Gerald T Ankley.
Abstract
BACKGROUND: Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those in the biomedical sciences. Many such classifiers discovered thus far lack rigorous statistical and experimental validation. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset.
Year: 2012 PMID: 22849515 PMCID: PMC3469349 DOI: 10.1186/1471-2164-13-358
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sample size across chemical-tissue conditions for classifier search
| 17α-Ethynyl Estradiol | 10b | 5 | 9 | 30 |
| Fadrozole | 15c | 15 | 30 | |
| Fipronil | 10c | 10 | 10 | 30 |
| Flutamide | 10 | 15 | 25 | |
| Ketoconazole | 10c | 15 | 5 | 40 |
| Muscimol | 20c | 5 | 25 | |
| Prochloraz | 10d | 10 | 10 | 30 |
| 17β-Trenbolone | 5 | 15 | 25 | |
| Trilostane | 10/5 | 20/15 | 30/20 | |
| Vinclozolin | 10 | 15 | 25 | |
| Total | 80 | 105 | 84 | 269/290 |
Except for trilostane-ovary and trilostane-testis, each entry represents an equal number of independent replicates of treated and control samples (footnote a). For trilostane, where some of the same controls were paired with different treatment conditions, only unique controls were counted. Conditions with fewer than nine samples in either class were excluded from the classifier search. “NA”, a condition not included in the study. The “all tissues” dataset includes brain, ovary, and testis, as well as a few additional liver samples for selected conditions.
a For example, 17α-Ethynyl Estradiol-brain had 10 treated samples and 10 control samples; b male only; c male and female; d female only.
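The exclusion rule in the note above (fewer than nine samples in either class) can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the function name and the condition labels prefixed `hypo` are hypothetical, and only the EE2-brain counts (10 treated, 10 control) come from the footnote.

```python
from typing import Dict, Tuple

MIN_PER_CLASS = 9  # threshold stated in the table note


def eligible_conditions(
    counts: Dict[str, Tuple[int, int]],
    min_per_class: int = MIN_PER_CLASS,
) -> Dict[str, Tuple[int, int]]:
    """Keep conditions with at least `min_per_class` samples in BOTH classes.

    `counts` maps a condition label to (n_treated, n_control).
    """
    return {
        condition: (n_treated, n_control)
        for condition, (n_treated, n_control) in counts.items()
        if n_treated >= min_per_class and n_control >= min_per_class
    }


# EE2-brain is taken from footnote a; the "hypo" entries are made up
# purely to exercise the rule.
counts = {
    "EE2-brain": (10, 10),
    "hypoA-ovary": (5, 5),
    "hypoB-testis": (9, 9),
    "hypoC-brain": (12, 8),
}
eligible = eligible_conditions(counts)
print(sorted(eligible))
```

Requiring the minimum in both classes, rather than in the pooled total, mirrors the wording "in either class" in the note.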
Figure 1. Flow chart for gene classifier discovery and validation.
Chemical-tissue conditions found with (1) or without (0) a qualified gene classifier
| 17α-Ethynyl Estradiol | 0 | 0 | --- | 0 |
| Fadrozole | 0 | 0 | 0 | |
| Fipronil | 0 | 0 | 0 | 1 |
| Flutamide | 0 | 1 | 0 | |
| Ketoconazole | 0 | 0 | 1 | --- |
| Muscimol | 0 | 1 | --- | |
| Prochloraz | 0 | 0 | 1 | 1 |
| 17β-Trenbolone | 0 | --- | 0 | |
| Trilostane | 0 | --- | 0 | |
| Vinclozolin | 0 | 0 | 0 | |
Searches were conducted transcriptome-wide using a genetic algorithm-support vector machine, followed by visual screening of heat maps. Each condition refers to the tissue sampled from fish treated with a particular chemical. “NA”, a condition not included in the study; “---”, insufficient sample size for classifier search.
Summary statistics of qualified gene classifiers from network-specific searches
| Prochloraz | & | 177 | 68 | 76 | 321 |
| Flutamide | & | 124 | 0 | 124 | |
| Fipronil | & | 71 | 17 | 20 | 108 |
| 17α-ethynyl estradiol | 0 | 41 | --- | 50 | 91 |
| Vinclozolin | 0 | 28 | 1 | 29 | |
| Ketoconazole | & | 18 | 7 | --- | 25 |
| Muscimol | & | 22 | --- | 22 | |
| Fadrozole | & | 4 | 0 | 4 | |
| 17β-trenbolone | & | --- | 0 | | |
| trilostane | 0 | --- | 0 | | |
| Subtotal by tissue | 333 | 243 | 147 |
A total of 517 individual networks were analyzed by chemical-tissue condition using genetic algorithm-K nearest neighbors. Within a condition, each TF network contributed one classifier composed of multiple gene features. “NA”, a condition not included in the study; “---”, insufficient sample size for classifier search; “0”, searched but no qualified classifier; “&”, no search conducted.
a Between 20 and 22 of the 517 networks did not contain a sufficient number of genes to be searched, owing to variation in gene availability across datasets.
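The general idea of a genetic algorithm wrapped around a K-nearest-neighbors classifier can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure used in the study: the fitness function (leave-one-out accuracy), the crossover and mutation scheme, all parameter values, and the simulated data are assumptions chosen for brevity.

```python
import random


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out accuracy of a k-NN classifier restricted to `genes`."""
    correct = 0
    for i in range(len(X)):
        dists = []
        for j in range(len(X)):
            if j == i:
                continue  # hold out sample i
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            dists.append((d, y[j]))
        dists.sort(key=lambda pair: pair[0])
        votes = sum(label for _, label in dists[:k])  # labels are 0 or 1
        predicted = 1 if votes * 2 > k else 0
        correct += int(predicted == y[i])
    return correct / len(X)


def ga_knn(X, y, n_select=3, pop_size=20, generations=15,
           mutation_rate=0.2, seed=0):
    """Search for a gene subset of size `n_select` with high k-NN accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(chromosome):
        return knn_loo_accuracy(X, y, chromosome)

    pop = [tuple(sorted(rng.sample(range(n_genes), n_select)))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # elitism: the top half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))        # crossover: pooled genes
            child = rng.sample(pool, n_select)
            if rng.random() < mutation_rate:    # mutation: swap in a new gene
                candidates = [g for g in range(n_genes) if g not in child]
                child[rng.randrange(n_select)] = rng.choice(candidates)
            children.append(tuple(sorted(child)))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Simulated data (an assumption): 20 samples, 30 genes, with genes 0 and 1
# shifted upward in the treated class (label 1).
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[data_rng.gauss(3.0 * label if g < 2 else 0.0, 1.0) for g in range(30)]
     for label in y]
best_genes, best_acc = ga_knn(X, y)
print(best_genes, best_acc)
```

In a network-specific search, the candidate genes would be limited to the members of one TF network, which is why networks with too few available genes cannot be searched.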
Figure 2. A comparison of heat maps of selected gene classifiers. Heat maps were generated for gene classifiers of prochloraz-ovary and prochloraz-brain based on the corresponding gene expression profiles of the same chemical-tissue conditions in the respective training and validation datasets. All datasets were prepared tissue-specifically. The classifiers were unique to each chemical-tissue condition and were cross-mapped between the Agilent designs 015064 and 019161 based on exact matches of their probe sequences. The red and green bars above each heat map indicate treated and control samples in a condition. Pairs of heat maps in each column compare a classifier of the same group of gene features between the training and validation data. Pro, prochloraz; TF network, transcription factor network.
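Cross-mapping probes between two array designs by exact sequence match can be sketched as a dictionary lookup. This is a minimal sketch: the probe IDs and sequences are invented, and dropping sequences that occur more than once in the target design is an assumption, since the caption does not say how duplicates were handled.

```python
def map_probes_by_sequence(design_a, design_b):
    """Return {probe_id_in_a: probe_id_in_b} for identical probe sequences.

    Each design is a dict mapping probe ID to probe sequence. Sequences
    that appear more than once in `design_b` are treated as ambiguous and
    skipped (an assumption, not something stated in the source).
    """
    seq_to_b = {}
    ambiguous = set()
    for probe_id, seq in design_b.items():
        if seq in seq_to_b:
            ambiguous.add(seq)
        seq_to_b[seq] = probe_id

    mapping = {}
    for probe_id, seq in design_a.items():
        if seq in seq_to_b and seq not in ambiguous:
            mapping[probe_id] = seq_to_b[seq]
    return mapping


# Hypothetical probe IDs and sequences, for illustration only.
design_015064 = {"A_1": "ACGTACGT", "A_2": "TTTTCCCC", "A_3": "GGGGAAAA"}
design_019161 = {
    "B_9": "ACGTACGT",   # exact match for A_1
    "B_7": "GGGGAAAT",   # one base differs from A_3, so no match
    "B_5": "TTTTCCCC",   # duplicated sequence ...
    "B_6": "TTTTCCCC",   # ... so A_2 is left unmapped
}
probe_map = map_probes_by_sequence(design_015064, design_019161)
print(probe_map)
```

Matching on sequence rather than on probe name avoids relying on identifiers staying stable between design versions.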
Number of gene features in qualified classifiers for various chemical-tissue conditions
| Prochloraz | 1236 (3391) | 277 (1437) | 335 (1550) | 1848 (6378) |
| Flutamide | 814 (2491) | 0 | 814 (2491) | |
| 17α-ethynyl estradiol | 221 (897) | --- | 252 (737) | 473 (1634) |
| Fipronil | 261 (1370) | 47 (338) | 53 (353) | 361 (2061) |
| Vinclozolin | 131 (662) | 3 (6) | 134 (668) | |
| Ketoconazole | 84 (519) | 30 (185) | --- | 114 (704) |
| Fadrozole | 14 (115) | 0 | 14 (115) | |
| Muscimol | 78 (656) | --- | 78 (656) | |
| 17β-trenbolone | --- | 0 | | |
| Trilostane | --- | 0 | | |
| Subtotal by tissue | 1894 | 1299 | 643 | |
Based on Agilent probe IDs, each total count includes gene features that overlap among multiple classifiers. Although the search was network-specific, gene features were summarized across multiple networks in a classifier for a given condition where available. A total of 7522 features were selected into classifiers for the 15 conditions listed. “NA”, not included in the study; “---”, insufficient sample size for classifier search; “0”, searched but no qualified classifier.
Gene set enrichment analysis (GSEA) of classifiers on their original gene expression profiles (GEPs) for training
| Fadrozole (FAD) | No enrichment | FAD-ovary (2.01)a | PRO-ovary (0.005) | |
| | | | | KET-ovary (0.006) |
| | | | | FLU-ovary (0.01) |
| 17α-ethynyl estradiol (EE2) | EE2-brain (0.028) | EE2-ovary (22.4)b | KET-ovary (0.002) | |
| | | | | PRO-ovary (0.004) |
| | | | | FLU-ovary (0.023) |
| Muscimol (MUS) | MUS-brain (0.0) | MUS-ovary (42.62)b | FIP-ovary (0.027) | |
| | | | | KET-ovary (0.035) |
| | | | | PRO-ovary (0.052) |
| | | | | FLU-ovary (0.077) |
| Fipronil (FIP) | FIP-brain (0.002); MUS-brain (0.012) | FIP-ovary (0.006); | ||
| | | | | PRO-ovary (0.0) |
| | | | | KET-ovary (0.0) |
| Flutamide (FLU) | | KET-ovary (0.002) | ||
| | | | | PRO-ovary (0.006) |
| | | | | FLU-ovary (0.007) |
| Ketoconazole (KET) | KET-brain (0.006); | KET-ovary (0.0) | ||
| | | | | PRO-ovary (0.008) |
| | | | | FLU-ovary (0.18) |
| Prochloraz (PRO) | PRO-brain (0.0) | PRO-ovary (0.0) | ||
| | | | | KET-ovary (0.009) |
| | | | | FLU-ovary (0.008) |
| 17β-trenbolone (TRE) | TRE-brain (22.94)b | No enrichment | TRE-ovary (3.28)a | KET-ovary (0.1) |
| | | | | PRO-ovary (0.234); |
| | | | | FIP-ovary (0.236) |
| Trilostane (TRI) | | TRI-ovary (11.16)b | PRO-ovary (0.121) | |
| Vinclozolin (VIN) | | VIN-ovary (0.024); | ||
| | | | | KET-ovary (0.0) |
| | | | | PRO-ovary (0.107) |
| | | | | FLU-ovary (0.106) |
| F-ratio mean = 10.59 | F-ratio mean = 18.44 |
Each classifier was formatted into a gene set, and multiple gene sets were grouped by their tissue of origin. GSEA was conducted by individual chemical-tissue condition, at a threshold of FDR ≤ 0.25. Classifiers enriched at the top or bottom of a ranked gene list are separated by a semicolon. If two classifiers of the same condition are both significant, only the most significant one is listed. A GEP in bold indicates the availability of its corresponding classifier as a gene set. FDR, false discovery rate; F-ratio, maximum Fisher’s discriminant ratio; “NA”, a condition not included in the study.
a Searched but no qualified classifier found for the condition. Its GSEA was conducted using classifiers for other, unrelated chemical-tissue conditions.
b No classifier search was conducted due to insufficient sample size for the condition. Its GSEA was conducted using classifiers for other, unrelated chemical-tissue conditions.
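Enrichment "at the top or bottom of a ranked gene list" can be illustrated with the running-sum statistic that underlies GSEA. The sketch below uses the simplified unweighted form (every hit counts equally) and omits the permutation step from which FDR values are derived, so it is not a substitute for the GSEA software; the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walks down the ranked list, stepping up at each gene in the set and
    down otherwise, and returns the largest deviation from zero.
    Positive: enriched toward the top. Negative: toward the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, most up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})
es_bottom = enrichment_score(ranked, {"g5", "g6"})
print(es_top, es_bottom)
```

The sign of the score is what distinguishes the two groups of classifiers separated by a semicolon in the table.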
Gene set enrichment analysis (GSEA) of classifiers on their validation data
| Flutamide (FLU) | FLU-ovary (13.76) | VIN-ovary (0.174) | ||
| | | | | FLU-ovary (0.21)a; |
| | | | | KET-ovary (0.039) |
| | | | | PRO-ovary (0.123) |
| Prochloraz (PRO) | PRO-brain (24.75) | No enrichment | PRO-ovary (15.2) | PRO-ovary (0.067) |
| KET-ovary (0.163) |
Although the gene expression profiles (GEPs) were preprocessed as a whole by combining data from both brain and ovary, each GSEA was conducted by individual chemical-tissue condition at a threshold of FDR ≤ 0.25. Classifiers enriched at the top or bottom of a ranked gene list are separated by a semicolon. If two classifiers for the same condition were both significant, only the most significant one is listed. Mapping of the probe IDs between the Agilent designs 015064 and 019161 was based on identical probe sequences only. FDR, false discovery rate; F-ratio, maximum Fisher’s discriminant ratio. “NA”, a condition not included in the study.
a When the GEPs were preprocessed with data from ovary only, the FDR for flutamide-ovary was 0.53.
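The F-ratio reported in the tables is described as the maximum Fisher's discriminant ratio. One common definition, assumed here because the notes do not give a formula, takes for each feature the squared difference of class means divided by the sum of class variances, then keeps the largest value across features. The use of population variance and the toy values are also assumptions.

```python
from statistics import mean, pvariance


def max_fisher_ratio(treated, control):
    """Maximum over features of (mean_t - mean_c)^2 / (var_t + var_c).

    `treated` and `control` are lists of samples; each sample is a list of
    feature values in the same feature order.
    """
    n_features = len(treated[0])
    ratios = []
    for g in range(n_features):
        t = [sample[g] for sample in treated]
        c = [sample[g] for sample in control]
        diff_sq = (mean(t) - mean(c)) ** 2
        denom = pvariance(t) + pvariance(c)
        if denom == 0:
            # Zero variance in both classes: perfectly separated if the
            # means differ, uninformative otherwise.
            ratios.append(float("inf") if diff_sq > 0 else 0.0)
        else:
            ratios.append(diff_sq / denom)
    return max(ratios)


# Toy values: feature 0 does not separate the classes, feature 1 does.
treated = [[1.0, 5.0], [3.0, 5.5]]
control = [[2.0, 1.0], [2.0, 1.5]]
f_ratio = max_fisher_ratio(treated, control)
print(f_ratio)
```

Larger values indicate that at least one feature separates treated from control samples well, which is how the F-ratio columns can be read alongside the FDR values.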