Xiaomou Wei, Junmei Ai, Youping Deng, Xin Guan, David R Johnson, Choo Y Ang, Chaoyang Zhang, Edward J Perkins.
Abstract
BACKGROUND: High throughput transcriptomics profiles such as those generated using microarrays have been useful in identifying biomarkers for different classification and toxicity prediction purposes. Here, we investigated the use of microarrays to predict chemical toxicants and their possible mechanisms of action.
Year: 2014 PMID: 24678894 PMCID: PMC4051169 DOI: 10.1186/1471-2164-15-248
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Effect of normalization methods on the classification accuracy. Microarray experiments were performed using the Agilent rat whole genome array (4 × 44K). Cultured primary hepatocytes were treated with 105 distinct compounds (Additional file 1) or the respective vehicle controls for 24 h, after which RNA was isolated for array hybridization. The samples treated with the 105 compounds and the control samples were divided into 14 classes. Two normalization methods (median-based and control-based) were compared for their effect on the classification accuracy of the 14 classes.
Figure 2. Effect of initial feature filtering methods on the classification accuracy. Three initial feature filtering methods (one-way ANOVA, Kruskal-Wallis, and one-way ANOVA with unequal variance) were compared for their classification accuracy on the 14 compound classes. For each method, the mean prediction accuracy over the 14 classes was compared at different feature (gene) sizes, averaging the prediction accuracies of the different classification algorithms.
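As a rough illustration of what an ANOVA-based initial filter does, the sketch below computes a one-way ANOVA F statistic per gene and keeps the highest-ranking genes. It is a minimal pure-Python sketch, not the authors' pipeline: the function names `anova_f` and `top_genes`, the toy expression values, and the choice to rank by F statistic rather than by p-value are all assumptions made for illustration.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    `groups` holds one list of expression values per compound class
    (at least two classes are expected).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf")  # no within-class spread at all
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def top_genes(expression, n_keep):
    """Rank genes by F statistic (largest first) and keep the top `n_keep`."""
    ranked = sorted(expression, key=lambda gene: anova_f(expression[gene]),
                    reverse=True)
    return ranked[:n_keep]

# Hypothetical toy data: gene -> per-class expression values (two classes).
toy = {
    "gene_a": [[1, 2, 3], [7, 8, 9]],   # classes well separated
    "gene_b": [[1, 2, 3], [2, 3, 4]],   # classes overlap
}
print(top_genes(toy, 1))
```

A Kruskal-Wallis filter would follow the same pattern with a rank-based statistic in place of `anova_f`.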
Figure 3. Effect of classification algorithms on the classification accuracy. Six classification algorithms (J48, LibSVM, NB, RF, SMO and SL) were compared. The prediction accuracy shown is the mean obtained by averaging the prediction accuracies of 6 feature selection methods (Chisquare, Gainratio, Inforgain, PCA, SVM-RFE and Relief) for different feature (gene) sizes (10, 25, 50, 100, 200, 300, 400 and 500).
Figure 4. Effect of feature selection algorithms on the classification accuracy. The figure compares prediction results for 6 feature selection methods: PCA, Chisquare, Gainratio, Inforgain, Relief, and SVM-RFE. The prediction accuracy shown is the mean obtained by averaging over the classification algorithms (J48, LibSVM, NB, RF, SMO and SL) for each feature size (10 to 500).
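The averaging described in the captions of Figures 3 and 4 can be sketched as a simple group-and-mean step. The snippet below is illustrative only: the accuracy values are hypothetical placeholders rather than results from the study, and the function name `mean_accuracy` and the key layout `(feature_selection, feature_size, classifier)` are assumptions.

```python
from collections import defaultdict
from statistics import mean

def mean_accuracy(results, average_over):
    """Average accuracies over one factor of the experiment grid.

    `results` maps (feature_selection, feature_size, classifier) to an
    accuracy in percent. `average_over` is "classifier" (as in Figure 4)
    or "feature_selection" (as in Figure 3).
    """
    grouped = defaultdict(list)
    for (selection, size, classifier), accuracy in results.items():
        if average_over == "classifier":
            key = (selection, size)
        elif average_over == "feature_selection":
            key = (classifier, size)
        else:
            raise ValueError("average_over must be 'classifier' or 'feature_selection'")
        grouped[key].append(accuracy)
    return {key: mean(values) for key, values in grouped.items()}

# Hypothetical accuracies (percent), for illustration only.
results = {
    ("Relief", 10, "J48"): 60.0,
    ("Relief", 10, "SL"): 70.0,
    ("PCA", 10, "J48"): 50.0,
    ("PCA", 10, "SL"): 54.0,
}
print(mean_accuracy(results, "classifier"))
```

Averaging over `"classifier"` yields one value per feature selection method and feature size; averaging over `"feature_selection"` yields one value per classifier and feature size.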
Figure 5. The best models for the classification of the 14 compound classes. Seven feature selection methods (PCA, Chisquare, Gradient, Gainratio, Inforgain, Relief, and SVM-RFE) were compared for their impact on the classification accuracy of the 14 compound classes using the LibSVM classification algorithm. Different feature sizes (10 to 500) were applied for each feature selection method.
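Prediction of 14 class compounds using an independent dataset
| Feature selection method | Feature size | Classification algorithm | Training accuracy (%) | Prediction accuracy (%) |
|---|---|---|---|---|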
| SVM-RFE | 200 | LibSVM | 100 | 64.9 |
| Relief | 500 | SL | 83.7 | 72.6 |
| Inforgain | 500 | SL | 83.7 | 66.7 |
| Chisquare | 400 | SL | 82.6 | 72.6 |
| Gainratio | 500 | SL | 82.9 | 66.1 |
| PCA | 200 | SMO | 73.3 | 66.7 |
| Gradient | 300 | SMO | 85.7 | 79.7 |
*Dataset 1 (D1) has a total of 168 array samples that were produced in 2007, and dataset 2 (D2) includes 363 array samples that were hybridized in 2008. Each dataset included the complete set of 105 compounds.
Figure 6. Comparison of the prediction overfitting rates of various feature selection methods. The overfitting rates of the feature selection methods PCA, Chisquare, Gradient, Gainratio, Inforgain, Relief, and SVM-RFE were compared across three classification algorithms (LibSVM, SMO and SL). For a given method, the overfitting rate was calculated as the difference between the training accuracy and the prediction accuracy, expressed as a percentage of the sum of the two accuracies.
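The overfitting rate defined in this caption can be written out as a short calculation. The sketch below reads the caption as (training − prediction) / (training + prediction) × 100, and it assumes that the last two columns of the prediction table above are training accuracy and prediction accuracy in percent; both readings are interpretations, and the function name `overfitting_rate` is illustrative.

```python
def overfitting_rate(training_acc, prediction_acc):
    """Difference between the two accuracies as a percentage of their sum."""
    total = training_acc + prediction_acc
    if total == 0:
        raise ValueError("accuracies must not both be zero")
    return 100.0 * (training_acc - prediction_acc) / total

# (feature selection, classifier, assumed training %, assumed prediction %)
# taken from the table rows above.
rows = [
    ("SVM-RFE", "LibSVM", 100.0, 64.9),
    ("Relief", "SL", 83.7, 72.6),
    ("Gradient", "SMO", 85.7, 79.7),
]

for selection, classifier, train, pred in rows:
    rate = overfitting_rate(train, pred)
    print(f"{selection:8s} {classifier:7s} overfitting rate = {rate:.2f}%")
```

Under this reading, a method whose prediction accuracy stays close to its training accuracy gets a rate near zero, while a model that fits the training set perfectly but generalizes poorly gets a much larger rate.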
Figure 7. Gene expression pattern analysis of biomarkers. A. The 300 transcripts (horizontal axis) selected by the Gradient algorithm were used to perform a two-way hierarchical clustering across the 14 classes (vertical axis). B. 104 transcripts (horizontal axis) were used to perform a hierarchical clustering across the compounds in the antimicrobial, cancer related drug, pesticide, PXR mediator, inflammatory mediator, and metal classes, as well as the controls (vertical axis). Euclidean distance was used to calculate the distances between transcripts or between conditions. The relative level of gene expression is indicated by the color scale at the bottom of Figure 7B.
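To make the clustering step concrete, here is a minimal pure-Python sketch of agglomerative hierarchical clustering with Euclidean distance. The caption specifies only the distance measure; the average-linkage rule, the function name `agglomerate`, and the toy expression profiles are assumptions chosen for illustration and do not reproduce the published heat map.

```python
import math
from itertools import combinations

def agglomerate(profiles):
    """Naive average-linkage clustering on Euclidean distance.

    `profiles` maps a condition name to its expression vector.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average pairwise Euclidean distance between the two clusters.
            pair_dists = [math.dist(profiles[x], profiles[y])
                          for x in clusters[a] for y in clusters[b]]
            d = sum(pair_dists) / len(pair_dists)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges

# Hypothetical expression profiles over three transcripts.
profiles = {
    "control": [0.0, 0.0, 0.0],
    "metals": [-1.0, -1.2, -0.8],
    "inflammatory": [-0.9, -1.1, -1.0],
    "pesticides": [1.0, 0.9, 1.1],
}
for left, right, dist in agglomerate(profiles):
    print(left, "+", right, f"at distance {dist:.3f}")
```

A two-way clustering, as in panel A, applies the same procedure twice: once to the rows (conditions) and once to the columns (transcripts) of the expression matrix.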
Functional analysis of biomarkers that distinguish 14 class compounds
| Function | p-value range | Number of molecules | Molecules |
|---|---|---|---|
| Cell Cycle | 3.88E-29-2.23E-02 | 42 | KIF23, KIF20A, CDC20, PTTG1, CCNB2, DSN1, KNTC1, MKI67, NUF2, AURKB, TTK, SKA1, BIRC5, RAD51, CCNA2, NEK2, TOP2A, CDKN3, KIF2C, ECT2, KIFC1, NCAPH, TRIP13, TACC3, ESPL1, CCNB1, RACGAP1, SPC25, PRC1, KIF4A, CKAP2, PLK1, FOXM1, BUB1B, CDC2, NCAPG2, BUB1, CCNE1, MCM2, KIF20B, NDC80, UBE2C |
| Cellular assembly and organization | 8.61E-23-2.16E-02 | 33 | KIF23, PTTG1, CCNB2, DSN1, AURKB, TTK, NUF2, SKA1, BIRC5, RAD51, CCNA2, NEK2, EZR, TOP2A, KIF2C, ECT2, NCAPH, KIFC1, TACC3, ESPL1, CCNB1, KIF4A, PRC1, SPC25, PLK1, CKAP2, BUB1B, NCAPG2, CDC2, BUB1, CCNE1, KIF20B, NDC80 |
| DNA replication, recombination, and repair | 8.61E-23-2.16E-02 | 44 | KIF23, MCM6, CDC20, KIAA0101, PTTG1, CCNB2, DSN1, KNTC1, AURKB, TTK, NUF2, PBK, SKA1, BIRC5, RAD51, CCNA2, NEK2, TOP2A, KIF2C, ECT2, NCAPH, KIFC1, EXO1, MCM5, TRIP13, TACC3, ESPL1, CCNB1, KIF4A, SPC25, PRC1, CKAP2, PLK1, FOXM1, BUB1B, NCAPG2, CDC2, MCM3, BUB1, CCNE1, MCM2, POLA2, NDC80, TK1 |
| Cellular movement | 3.82E-18-1.73E-02 | 13 | KIF23, KIF20A, CCNB1, CDC20, RACGAP1, KIF4A, PRC1, PLK1, AURKB, KIF20B, TOP2A, ECT2, KIFC1 |
| Cell death | 5.84E-08-2.44E-02 | 28 | CDC20, PTTG1, TTK, NUF2, LMNB1, BIRC5, RAD51, CCNA2, NEK2, EZR, TOP2A, TACC3, CCNB1, ESPL1, PCSK9, SPC25, RRM2, PLK1, CKAP2, BUB1B, FOXM1, CDC2, NCAPG2, BUB1, CCNE1, MCM2, TK1, UBE2C |
| Cellular compromise | 1.87E-05-1.73E-02 | 13 | KIF23, TACC3, CCNB1, PTTG1, PLK1, CDC2, BIRC5, NEK2, EZR, TOP2A, NDC80, KIF2C, ECT2 |
| Cellular growth and proliferation | 5.43E-05-2.39E-02 | 32 | KIF20A, KIF23, KIAA0101, PTTG1, MKI67, TTK, PBK, BIRC5, RAD51, CCNA2, NEK2, EZR, CDKN3, KIF2C, E2F8, MCM5, TACC3, ESPL1, CCNB1, PRC1, RRM2, PLK1, FOXM1, BUB1B, CDC2, MCM3, BUB1, CCNE1, MCM2, KIF20B, TCF19, UBE2C |
Pathway analysis of biomarkers that distinguish 14 class compounds
| Pathway | -log(p-value) | Ratio | Molecules |
|---|---|---|---|
| Mitotic roles of polo-like kinase | 11.6 | 1.45E-01 | KIF23, CCNB1, ESPL1, CDC20, PTTG1, PRC1, CCNB2, PLK1, CDC2 |
| Cell Cycle: G2/M DNA damage checkpoint regulation | 6.17 | 1.16E-01 | CCNB1, TOP2A, CCNB2, PLK1, CDC2 |
| ATM signaling | 4.16 | 7.55E-02 | RAD51, CCNB1, CCNB2, CDC2 |
| Nicotinate and nicotinamide metabolism | 3.03 | 2.94E-02 | NEK2, PLK1, TTK, CDC2 |
| Inositol phosphate metabolism | 2.57 | 2.27E-02 | NEK2, PLK1, TTK, CDC2 |
| Sonic hedgehog signaling | 2.15 | 6.06E-02 | CCNB1, CDC2 |
| Pancreatic adenocarcinoma signaling | 2 | 2.59E-02 | RAD51, CCNE1, BIRC5 |
| Hereditary breast cancer signaling | 1.85 | 2.33E-02 | RAD51, CCNB1, CDC2 |
| Role of BRCA1 in DNA damage response | 1.62 | 3.28E-02 | RAD51, PLK1 |
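The numeric columns of the pathway table can be unpacked with a little arithmetic. The sketch below assumes the usual Ingenuity Pathway Analysis layout, in which the first numeric column is −log10 of the enrichment p-value and the second is the ratio of biomarker genes found in the pathway to the total number of genes in that pathway. That column reading is an assumption, as are the function names, and the estimated pathway sizes are only approximate because the ratios are rounded.

```python
def p_from_neg_log10(score):
    """Convert a -log10(p) score back to a p-value."""
    return 10.0 ** (-score)

def estimated_pathway_size(n_hits, ratio):
    """Approximate total genes in a pathway from hit count and hit ratio."""
    return round(n_hits / ratio)

# (pathway, assumed -log10(p), assumed ratio, number of genes listed in the row)
rows = [
    ("Mitotic roles of polo-like kinase", 11.6, 0.145, 9),
    ("ATM signaling", 4.16, 0.0755, 4),
    ("Sonic hedgehog signaling", 2.15, 0.0606, 2),
]

for name, score, ratio, n_hits in rows:
    p_value = p_from_neg_log10(score)
    size = estimated_pathway_size(n_hits, ratio)
    print(f"{name}: p ~ {p_value:.2e}, about {size} genes in pathway")
```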
Figure 8. Mitotic roles of Polo-like kinase pathway. Most of the genes in the mitotic roles of Polo-like kinase pathway were downregulated (highlighted in green) by most of the compounds in the metal and inflammatory mediator classes, but upregulated by most of the compounds in the antimicrobial, cancer related drug, pesticide, and PXR mediator classes.
Figure 9. Cell cycle related gene network. A cell cycle network was constructed using the Ingenuity Knowledge Base tool. Most of the genes in the network were downregulated (highlighted in green) by most of the compounds in the metal and inflammatory mediator classes, but upregulated by most of the compounds in the antimicrobial, cancer related drug, pesticide, and PXR mediator classes. The NF-κB complex is connected with the cell cycle genes.