Jung Hun Oh¹, Sarah Kerns², Harry Ostrer³, Simon N Powell⁴, Barry Rosenstein⁵, Joseph O Deasy¹
Abstract
The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that individually confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method resulted in better predictive performance than existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis were used to identify key biological processes and proteins, which are plausible in light of other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs) that yield clinically useful risk stratification and also identify important biological processes underlying radiation damage and tissue repair. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.
Year: 2017 PMID: 28233873 PMCID: PMC5324069 DOI: 10.1038/srep43381
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
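The abstract does not spell out how the pre-conditioning step of PRFR works, so the following is only a minimal sketch of the general pre-conditioning idea: replace a noisy (here binary) outcome with a denoised, continuous estimate before fitting the final learner. It assumes a supervised-principal-components style first step: screen SNPs by univariate correlation with the outcome, project patients onto the first principal component of the retained SNPs, and use the least-squares fitted values as the new target. A continuous target of this kind is one plausible reason a *regression* forest would be used for a binary endpoint, but that is an inference, not a statement from the source. The function names, the `n_top` cutoff, and the synthetic data are hypothetical, and the random forest regression that would be trained on the pre-conditioned target is omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def first_pc_scores(rows, iters=100):
    """Scores on the first principal component, via power iteration on X^T X
    where X is the column-centred data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in X]           # X v
        w = [sum(X[i][j] * xv[i] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in X]


def precondition(genotypes, outcome, n_top=20):
    """Hypothetical pre-conditioning step: keep the n_top SNPs most correlated with
    the outcome, project patients onto their first PC, and return the least-squares
    fitted values of the outcome on that PC (a continuous, denoised target)."""
    p = len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(p)]
    ranked = sorted(range(p), key=lambda j: abs(pearson(cols[j], outcome)), reverse=True)
    keep = ranked[:n_top]
    pc = first_pc_scores([[row[j] for j in keep] for row in genotypes])
    my = sum(outcome) / len(outcome)
    spp = sum(s * s for s in pc)  # PC scores are already mean-centred
    slope = sum(s * (y - my) for s, y in zip(pc, outcome)) / spp if spp else 0.0
    return [my + slope * s for s in pc]


# Toy usage on synthetic data (not study data): 120 patients, 60 SNPs coded 0/1/2,
# binary outcome driven weakly by the first 5 SNPs.
rng = random.Random(0)
G = [[rng.choice((0, 1, 2)) for _ in range(60)] for _ in range(120)]
y = [1 if sum(row[:5]) + rng.gauss(0, 2) > 5 else 0 for row in G]
y_pre = precondition(G, y)
# y_pre would then be the regression target for a random forest (not shown).
```

Because the regression includes an intercept and the PC scores are centred, the pre-conditioned target keeps the original outcome mean (the observed event rate) while its spread now reflects only the genotype signal captured by the retained SNPs.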
Number of samples in the training and validation datasets used to build the rectal bleeding and erectile dysfunction models.
| Endpoint | Group | Training samples | Validation samples |
|---|---|---|---|
| Rectal bleeding | Cases | 49 | 25 |
| | Controls | 194 | 97 |
| | Total | 243 | 122 |
| Erectile dysfunction | Cases | 88 | 45 |
| | Controls | 69 | 34 |
| | Total | 157 | 79 |
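Within each endpoint, the case and control counts in this table are split roughly 2:1 between training and validation. One way to obtain such a split, sketched here as an assumption rather than the authors' documented procedure, is to shuffle and split each outcome group separately so that both sets keep the same case:control ratio. The `stratified_split` function and its parameters are hypothetical.

```python
import random


def stratified_split(labels, train_frac=2 / 3, seed=0):
    """Shuffle and split patient indices separately within each outcome group
    (e.g. 1 = case, 0 = control) so both sets keep the case:control ratio."""
    rng = random.Random(seed)
    train, valid = [], []
    for group in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_frac)  # training share of this group
        train.extend(idx[:n_train])
        valid.extend(idx[n_train:])
    return sorted(train), sorted(valid)


# Toy usage with the rectal-bleeding group sizes from the table (74 cases, 291 controls).
labels = [1] * 74 + [0] * 291
train, valid = stratified_split(labels)
```

With this rounding rule the rectal-bleeding sizes come out as 49/25 cases and 194/97 controls, matching the table; the same rule would put 89 rather than 88 erectile-dysfunction cases in training, so the actual split rule may differ.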
Figure 1. Quantile-quantile (Q-Q) plot of p-values from single-SNP association tests (chi-square test) on the rectal bleeding training dataset.
It is typical of GWAS that many of the strongest single-SNP associations are false positives. We address this problem with multi-SNP machine learning methods.
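A Q-Q plot compares observed -log10 p-values with those expected if no SNP were associated with the outcome; points rising above the diagonal only at the extreme tail are what a few true signals look like, whereas a uniform lift suggests inflation. The caption does not say whether the chi-square test used genotype or allele counts, so this sketch assumes a 2x3 genotype table (cases vs controls by three genotypes), for which the statistic has 2 degrees of freedom and the p-value is exactly exp(-x/2). It also computes Q-Q coordinates and the genomic inflation factor λ. All function names are hypothetical.

```python
import math
import statistics


def genotype_chi2(case_counts, control_counts):
    """Pearson chi-square test of association on a 2x3 table of genotype counts
    (e.g. AA, Aa, aa) for cases vs controls. Assumes all three genotype columns
    are observed, so df = 2 and the p-value is exactly exp(-stat / 2)."""
    table = [case_counts, control_counts]
    row_tot = [sum(r) for r in table]
    col_tot = [a + b for a, b in zip(case_counts, control_counts)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


def qq_points(pvalues):
    """(expected, observed) -log10(p) pairs for a Q-Q plot against the uniform null,
    both sorted from most to least significant."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


def inflation_lambda(stats):
    """Genomic inflation factor: median observed 2-df statistic divided by the
    median of a chi-square(2) distribution, which is 2 ln 2."""
    return statistics.median(stats) / (2 * math.log(2))


# Toy usage with hypothetical genotype counts (not study data).
stat, p = genotype_chi2([30, 10, 10], [10, 30, 10])  # stat = 20.0, p = exp(-10)
```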
Figure 2. Performance evaluation of the proposed method on a validation dataset.
Performance of our proposed method (pre-conditioned random forest regression) compared with other computational methods on the hold-out validation data for (A) rectal bleeding and (B) erectile dysfunction. STD: standard deviation. (C) Box plots of the performance of our proposed predictive models for rectal bleeding and erectile dysfunction, built from the list of SNPs retained after the biological relevance test of gene ontology biological processes. For the erectile dysfunction model, two clinical variables, androgen deprivation therapy (ADT) and age, were also included in model building. The filled circle indicates the AUC obtained when the final model for each endpoint, built using all training data, was tested on the hold-out validation data. (D, E) Comparison of predicted and actual incidence for the final models on the hold-out validation data for (D) rectal bleeding and (E) erectile dysfunction. Patients were sorted by predicted outcome and binned into 6 groups, from 1 (lowest risk) to 6 (highest risk). Error bars indicate the standard error.
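The validation metrics in this figure can be sketched as follows, under stated assumptions: AUC computed as the Mann-Whitney probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, and the panel (D)/(E) check as sorting patients by predicted risk, splitting them into 6 near-equal groups, and comparing mean predicted risk with observed incidence. The caption does not define its standard error, so using the binomial standard error of each group's observed incidence is an assumption, as are the function names.

```python
import math


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the fraction of (case, control) pairs
    in which the case has the higher predicted risk, counting ties as 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def risk_bins(scores, labels, n_bins=6):
    """Sort patients by predicted risk and split them into n_bins near-equal groups
    (group 1 = lowest risk). Returns (mean predicted risk, observed incidence,
    binomial standard error) per group. Assumes at least n_bins patients."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        predicted = sum(scores[i] for i in idx) / len(idx)
        observed = sum(labels[i] for i in idx) / len(idx)
        se = math.sqrt(observed * (1 - observed) / len(idx))
        groups.append((predicted, observed, se))
    return groups
```

The pairwise AUC is O(cases × controls), which is trivial at the validation-set sizes in the table above; a rank-based formula gives the same value faster for large cohorts.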
Figure 3. SNP importance scores measured during model building for (A) rectal bleeding and (B) erectile dysfunction. The red dashed lines mark the cutoffs for the top 25%, 50%, and 75% of SNPs. (C, D) Directly connected protein-protein interaction networks for (C) rectal bleeding and (D) erectile dysfunction, generated with the MetaCore software from the top 50% of SNPs. Line colors indicate activation (green), inhibition (red), and unspecified (grey) effects.
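The selection behind panels (C) and (D) can be illustrated with a small sketch: rank SNPs by importance score, keep the top fraction (0.5 for the top 50%), map them to genes, and keep only interactions whose two endpoints are both in that gene set. MetaCore is a commercial tool with its own curated interaction database and network-building options, so this is not its algorithm; the SNP ids, gene names, SNP-to-gene map, and edge list below are hypothetical placeholders.

```python
import math


def top_fraction(importance, frac):
    """SNP ids whose importance score ranks in the top `frac` (0.5 = top 50%)."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:math.ceil(len(ranked) * frac)]


def direct_subnetwork(genes, edges):
    """Keep only interactions whose two endpoints are both in `genes`.
    `edges` are (source, target, effect) triples, effect being e.g.
    'activation', 'inhibition' or 'unspecified'."""
    keep = set(genes)
    return [(a, b, effect) for a, b, effect in edges if a in keep and b in keep]


# Hypothetical placeholders, not data from the study.
importance = {"snp1": 5.1, "snp2": 0.2, "snp3": 3.3, "snp4": 1.0}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_B", "snp3": "GENE_C", "snp4": "GENE_D"}
edges = [("GENE_A", "GENE_C", "activation"),
         ("GENE_C", "GENE_D", "inhibition"),
         ("GENE_A", "GENE_B", "unspecified")]

top = top_fraction(importance, 0.5)  # ['snp1', 'snp3']
network = direct_subnetwork({snp_to_gene[s] for s in top}, edges)
# network == [('GENE_A', 'GENE_C', 'activation')]
```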
Top 10 biological processes and corresponding genes for rectal bleeding.
| Rank | GO biological process | Genes | FDR |
|---|---|---|---|
| 1 | Regulation of ion transport | CACNA1D, CCL13, DPP10, DPP6, GCK, GNB4, GPR63, HOMER1, IL1RAPL1, JDP2, KCNIP4, KCNJ6, NLGN1, NOS1AP, PRKCB, PRKG1, VDR | 4.70E-06 |
| 2 | Regulation of potassium ion transport | CACNA1D, DPP10, DPP6, GCK, JDP2, KCNIP4, NOS1AP, PRKG1 | 5.33E-06 |
| 3 | Regulation of metal ion transport | CACNA1D, CCL13, DPP10, DPP6, GCK, GNB4, GPR63, HOMER1, JDP2, KCNIP4, NOS1AP, PRKCB, PRKG1, VDR | 8.92E-06 |
| 4 | Regulation of cation transmembrane transport | CACNA1D, DPP10, DPP6, GNB4, HOMER1, KCNIP4, NOS1AP, PRKCB, PRKG1 | 1.76E-05 |
| 5 | Regulation of potassium ion transmembrane transport | CACNA1D, DPP10, DPP6, KCNIP4, NOS1AP, PRKG1 | 1.89E-05 |
| 6 | Regulation of ion transmembrane transport | CACNA1D, CCL13, DPP10, DPP6, GNB4, HOMER1, IL1RAPL1, KCNIP4, KCNJ6, NLGN1, NOS1AP, PRKCB, PRKG1 | 2.31E-05 |
| 7 | Regulation of transmembrane transport | CACNA1D, CCL13, DPP10, DPP6, GNB4, HOMER1, IL1RAPL1, KCNIP4, KCNJ6, NLGN1, NOS1AP, PRKCB, PRKG1 | 5.04E-05 |
| 8 | Cellular calcium ion homeostasis | CACNA1D, CCL1, CCL13, GCK, GNB4, GPR63, HERPUD1, PRKCB, PRKG1, TMEM165, VDR | 1.27E-04 |
| 9 | Regulation of system process | CACNA1D, CTNNA2, FST, GPR63, GUCY1A2, NLGN1, NOS1AP, PRKCB, PRKG1, SLC1A1, TENM4, TNFRSF21, TNR | 1.27E-04 |
| 10 | Regulation of ion transmembrane transporter activity | CACNA1D, GNB4, HOMER1, NLGN1, NOS1AP, PRKCB, PRKG1 | 1.27E-04 |
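The FDR column reports multiple-testing-adjusted enrichment significance. The enrichment tool and test are not named in this section; a common approach, assumed here, is a one-sided hypergeometric test for over-representation of a GO process's genes in the input gene list, followed by Benjamini-Hochberg adjustment across all processes tested. The function names and the toy counts are hypothetical.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the process, n in the list):
    the chance of seeing at least k process genes in a list of n genes."""
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total  # exact integer ratio, converted to float once


def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage with hypothetical counts: 20,000 annotated genes, a 150-gene process,
# and a 300-gene input list containing 12 of that process's genes.
p = hypergeom_sf(12, 20000, 150, 300)
q = bh_fdr([p, 0.2, 0.03])
```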
Top 10 biological processes and corresponding genes for erectile dysfunction.
| Rank | GO biological process | Genes | FDR |
|---|---|---|---|
| 1 | Negative regulation of heart contraction | CXCR5, PDE4D, PRKCA, SPX | 8.38E-10 |
| 2 | Negative regulation of blood circulation | CXCR5, PDE4D, PRKCA, SPX | 2.18E-08 |
| 3 | Neutrophil chemotaxis | CXCR5, PDE4D, PRKCA | 5.03E-08 |
| 4 | Neutrophil migration | CXCR5, PDE4D, PRKCA | 5.88E-08 |
| 5 | Granulocyte chemotaxis | CXCR5, PDE4D, PRKCA | 9.68E-08 |
| 6 | Granulocyte migration | CXCR5, PDE4D, PRKCA | 1.30E-07 |
| 7 | Positive regulation of locomotion | CXCR5, DAB2IP, MAP2K1, OPRK1, PDE4D, PRKCA, SEMA5A, SMAD3 | 2.63E-07 |
| 8 | Regulation of muscle system process | CXCR5, GLRX3, MAP2K1, PDE4D, PRKCA | 5.51E-07 |
| 9 | Regulation of muscle contraction | CXCR5, MAP2K1, PDE4D, PRKCA | 5.51E-07 |
| 10 | Positive regulation of cell migration | CXCR5, DAB2IP, MAP2K1, PDE4D, PRKCA, SEMA5A, SMAD3 | 8.96E-07 |