Deling Wang, Jia-Rui Li, Yu-Hang Zhang, Lei Chen, Tao Huang, Yu-Dong Cai.
Abstract
Breast cancer is one of the most common malignancies in women. The patient-derived tumor xenograft (PDX) model is a cutting-edge approach for drug research on breast cancer. However, PDX tumors still differ from the original human tumors, which complicates the molecular understanding of tumorigenesis. In particular, gene expression changes after tissue is transplanted from human to mouse. In this study, we propose a novel computational method that combines several machine learning algorithms, including Monte Carlo feature selection (MCFS), random forest (RF), and rough set-based rule learning, to identify genes with significant expression differences between PDX and original human tumors. First, 831 breast tumors, comprising 657 PDX and 174 human tumors, were collected. Based on MCFS and RF, 32 genes were then identified as informative for distinguishing PDX from human tumors and can be used to construct a prediction model. The prediction model achieves a Matthews correlation coefficient of 0.777. Seven interpretable interactions among the informative genes were detected by rough set-based rule learning, and these interactions are well supported by previous experimental studies. Our study not only presents a method for identifying informative genes with differential expression but also provides insights into the mechanism through which gene expression changes after transplantation from human tumor into the mouse model. This work should be helpful for research and drug development for breast cancer.
Keywords: Monte Carlo feature selection; breast cancer; patient-derived tumor xenograft; random forest
Year: 2018 PMID: 29534550 PMCID: PMC5867876 DOI: 10.3390/genes9030155
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. The repeated incremental pruning to produce error reduction (RIPPER) algorithm. RIPPER starts from an empty rule and splits the training set into a growing set and a pruning set. It then repeatedly grows the rule that achieves the highest FOIL's information gain on the growing set and prunes it on the pruning set until a stopping condition is met. Finally, a global pruning step is applied to obtain the final rule set.
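The growing criterion named in the caption can be illustrated with a minimal sketch of FOIL's information gain in its commonly used form. The function name, argument names, and the handling of the zero-coverage case are assumptions made for this illustration; they are not taken from the authors' implementation.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for adding one condition to a rule.

    p0, n0: positive/negative examples covered before adding the condition.
    p1, n1: positive/negative examples still covered after adding it.
    Assumes p0 > 0 (the rule covered at least one positive before).
    """
    if p1 == 0:
        # No positives remain; treating the gain as zero is a convention
        # chosen for this sketch, since log2(0) is undefined.
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    # Weight the purity improvement by the positives that are still covered.
    return p1 * (after - before)
```

A condition that keeps most positives while removing many negatives scores high, whereas a condition that leaves the positive fraction unchanged scores zero.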
Figure 2. Incremental forward selection (IFS) curve illustrating the relationship between prediction performance and the number of features used to build the prediction engine. MCC: Matthews correlation coefficient.
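The idea behind an IFS curve can be sketched as follows: features are added one at a time in ranked order, a model is scored on each nested feature set, and the size with the best score is kept. The `evaluate` callback is hypothetical and stands in for whatever cross-validated scoring is used (for example, the MCC of a random forest); the toy scores below are made up purely to show the mechanics.

```python
def incremental_forward_selection(ranked_features, evaluate):
    """Score nested top-k feature sets and return the best size.

    ranked_features: feature names, most important first.
    evaluate: hypothetical callback mapping a feature list to a score
              (higher is better), supplied by the caller.
    Returns (best_k, best_score, curve), where curve is a list of (k, score).
    """
    curve = []
    for k in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:k])
        curve.append((k, score))
    # max() keeps the first maximum, so ties favor the smaller feature set.
    best_k, best_score = max(curve, key=lambda point: point[1])
    return best_k, best_score, curve


# Toy usage with invented scores indexed by feature-set size.
toy_scores = {1: 0.40, 2: 0.62, 3: 0.71, 4: 0.69}
best_k, best_score, curve = incremental_forward_selection(
    ["g1", "g2", "g3", "g4"], lambda feats: toy_scores[len(feats)]
)
```

Plotting `curve` gives the kind of performance-versus-feature-count relationship the figure describes.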
Performance comparison of different models. The RF model with the top 32 features achieved the highest MCC (0.777). RF: random forest; SVM: support vector machine; MCFS: Monte Carlo feature selection.
| Model | Feature No. | MCC | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|
| RF | 32 | 0.777 | 0.996 | 0.672 | 0.929 |
| RF | 57 (MCFS cutoff) | 0.695 | 0.995 | 0.598 | 0.905 |
| Rough Set | 57 (MCFS cutoff) | 0.665 | 0.950 | 0.680 | 0.893 |
| SVM | 41 | 0.695 | 0.995 | 0.563 | 0.904 |
| Dagging | 58 | 0.599 | 0.996 | 0.436 | 0.878 |
Seven rules quantitatively defining the criteria for classification. These rules were produced by the MCFS software package. If a tumor satisfies all criteria of any one of the first six rules, it is classified as a human tumor; otherwise, it is classified as a PDX tumor.
| Rule | Criteria | Classification |
|---|---|---|
| Rule 1 | KRT19 ≥ 1.939224 | Human tumor |
| | KRT5 ≤ 0.148786 | |
| | CDH3 ≤ 0.868794 | |
| Rule 2 | EMP1 ≥ 4.237572 | Human tumor |
| | CAV2 ≤ 1.610886 | |
| Rule 3 | TP53 ≤ 0.291193 | Human tumor |
| | CXCR4 ≥ 4.367387 | |
| | TGFBR2 ≤ 1.868461 | |
| Rule 4 | CXCR4 ≤ −2.474571 | Human tumor |
| | CD44 ≥ 0.086944 | |
| | PTEN ≥ 0.143515 | |
| | VIM ≤ 0.647694 | |
| Rule 5 | PARP2 ≥ 3.111536 | Human tumor |
| Rule 6 | PLCB4 ≥ 3.744729 | Human tumor |
| | AKT1 ≤ −0.070679 | |
| Rule 7 | Other conditions | PDX tumor |
PDX: Patient-derived tumor xenograft.
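To make the rule table concrete, the following sketch encodes the seven rules as data and applies them to one expression profile. It assumes that the criteria within a rule are combined with a logical AND, that rules are checked in order, and that the input values are on the same scale as the thresholds in the table. The names `RULES`, `OPS`, and `classify` are hypothetical and do not come from the MCFS software.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

# Thresholds copied from the rule table; each rule is a list of
# (gene, comparison, threshold) criteria that must all hold.
RULES = [
    ("Rule 1", [("KRT19", ">=", 1.939224), ("KRT5", "<=", 0.148786),
                ("CDH3", "<=", 0.868794)]),
    ("Rule 2", [("EMP1", ">=", 4.237572), ("CAV2", "<=", 1.610886)]),
    ("Rule 3", [("TP53", "<=", 0.291193), ("CXCR4", ">=", 4.367387),
                ("TGFBR2", "<=", 1.868461)]),
    ("Rule 4", [("CXCR4", "<=", -2.474571), ("CD44", ">=", 0.086944),
                ("PTEN", ">=", 0.143515), ("VIM", "<=", 0.647694)]),
    ("Rule 5", [("PARP2", ">=", 3.111536)]),
    ("Rule 6", [("PLCB4", ">=", 3.744729), ("AKT1", "<=", -0.070679)]),
]


def classify(expression):
    """Classify one tumor from a dict mapping gene symbol to expression.

    Returns (label, rule_name). Raises KeyError if a required gene is absent.
    """
    for name, criteria in RULES:
        if all(OPS[op](expression[gene], thr) for gene, op, thr in criteria):
            return "Human tumor", name
    # Rule 7: none of the first six rules fired.
    return "PDX tumor", "Rule 7"
```

Keeping the thresholds in a data structure rather than in nested `if` statements makes it easy to compare the encoded rules against the table line by line.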
Thirty-two differentially expressed genes identified as optimal features.
| HUGO Symbol | HUGO Name | RI |
|---|---|---|
| EMP1 | epithelial membrane protein 1 | 0.16895404 |
| PARP2 | poly(ADP-ribose) polymerase 2 | 0.15058246 |
| KRT19 | keratin 19 | 0.12158414 |
| MUC1 | mucin 1, cell surface associated | 0.11115772 |
| CXCR4 | C-X-C motif chemokine receptor 4 | 0.07917199 |
| PROM1 | prominin 1 | 0.06480689 |
| ERBB2 | erb-b2 receptor tyrosine kinase 2 | 0.048957534 |
| ERBB3 | erb-b2 receptor tyrosine kinase 3 | 0.04209958 |
| KRT5 | keratin 5 | 0.037512265 |
| ID4 | inhibitor of DNA binding 4, HLH protein | 0.03389286 |
| PTEN | phosphatase and tensin homolog | 0.029668033 |
| NTRK2 | neurotrophic receptor tyrosine kinase 2 | 0.022596486 |
| PGR | progesterone receptor | 0.020494139 |
| TP53 | tumor protein p53 | 0.019557578 |
| CDH3 | cadherin 3 | 0.01846532 |
| BMI1 | BMI1 proto-oncogene, polycomb ring finger | 0.013900218 |
| TGFBR2 | transforming growth factor beta receptor 2 | 0.013375987 |
| CCNB1 | cyclin B1 | 0.013296658 |
| PLCB4 | phospholipase C beta 4 | 0.013219586 |
| CLDN4 | claudin 4 | 0.013182897 |
| CXCL12 | C-X-C motif chemokine ligand 12 | 0.010324035 |
| EGFR | epidermal growth factor receptor | 0.010273729 |
| CD44 | CD44 molecule (Indian blood group) | 0.009676576 |
| LGR5 | leucine rich repeat containing G protein-coupled receptor 5 | 0.008659011 |
| NOTCH4 | notch 4 | 0.007799821 |
| BCL2 | BCL2, apoptosis regulator | 0.007518955 |
| CAV2 | caveolin 2 | 0.007474113 |
| VEGFC | vascular endothelial growth factor C | 0.006789302 |
| TGFBR1 | transforming growth factor beta receptor 1 | 0.006149265 |
| VIM | vimentin | 0.005953075 |
| TGFB2 | transforming growth factor beta 2 | 0.005226418 |
| KRT8 | keratin 8 | 0.00506866 |
HUGO: Human Genome Organisation; RI: relative importance.
Figure 3. Rule networks for the seven detected rules, generated by Ciruvis [74]. For visualization, Ciruvis assigns a specific color to each gene and arranges the genes in a circle. The intensity of red on each connection indicates the strength of the interaction: the darker the red, the stronger the interaction between genes.