John Kang, Tiziana Rancati, Sangkyu Lee, Jung Hun Oh, Sarah L Kerns, Jacob G Scott, Russell Schwartz, Seyoung Kim, Barry S Rosenstein.
Abstract
Due to the rapid increase in the availability of patient data, there is significant interest in precision medicine that could facilitate the development of a personalized treatment plan for each patient. Radiation oncology is particularly suited for predictive machine learning (ML) models due to the enormous amount of diagnostic data used as input and therapeutic data generated as output. An emerging field in precision radiation oncology that can take advantage of ML approaches is radiogenomics, which is the study of the impact of genomic variations on the sensitivity of normal and tumor tissue to radiation. Currently, patients undergoing radiotherapy are treated using uniform dose constraints specific to the tumor and surrounding normal tissues. This is suboptimal in many ways. First, the dose that can be delivered to the target volume may be insufficient for control but is constrained by the surrounding normal tissue, as dose escalation can lead to significant morbidity and rare fatalities. Second, two patients with nearly identical dose distributions can have substantially different acute and late toxicities, resulting in lengthy treatment breaks and suboptimal control, or chronic morbidities leading to poor quality of life. Despite significant advances in radiogenomics, the magnitude of the genetic contribution to radiation response far exceeds our current understanding of individual risk variants. In the field of genomics, ML methods are being used to extract harder-to-detect knowledge, but these methods have yet to fully penetrate radiogenomics. Hence, the goal of this publication is to provide an overview of ML as it applies to radiogenomics. We begin with a brief history of radiogenomics and its relationship to precision medicine. We then introduce ML and compare it to statistical hypothesis testing to reflect on shared lessons and to avoid common pitfalls. Current ML approaches to genome-wide association studies are examined.
The application of ML specifically to radiogenomics is then presented. We end with important lessons for the proper integration of ML into radiogenomics.
Keywords: big data; computational genomics; machine learning in radiation oncology; precision oncology; predictive modeling; radiation oncology; statistical genetics and genomics
Year: 2018 PMID: 29977864 PMCID: PMC6021505 DOI: 10.3389/fonc.2018.00228
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Schematic outline of functional biology modeling via generative or discriminative models.
Figure 2. Typical machine learning project workflow.
Figure 3. Sample plots of statistical power and learning curve error. Statistical power graph derived using Genomic Association Studies power calculator (137). Learning curve assuming an inverse power law common to multiple machine learning methods (80, 139, 140).
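The inverse power law mentioned in the Figure 3 caption can be illustrated with a minimal sketch. The functional form `a * n**(-b) + c`, the parameter names, and the numeric values below are illustrative assumptions, not values taken from the paper; the asymptotic error `c` is assumed to be known so that the fit reduces to a straight line in log-log space.

```python
import math


def power_law_error(n, a, b, c):
    """Expected error after training on n samples: a * n**(-b) + c."""
    return a * n ** (-b) + c


def fit_power_law(sizes, errors, c):
    """Estimate (a, b) by least squares on log(error - c) = log(a) - b * log(n).

    The asymptotic error c is assumed known (an assumption of this sketch).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e - c) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Noise-free synthetic learning curve (hypothetical parameters, not real data).
sizes = [100, 200, 400, 800, 1600]
errors = [power_law_error(n, a=2.0, b=0.5, c=0.1) for n in sizes]

a_hat, b_hat = fit_power_law(sizes, errors, c=0.1)

# Extrapolate the fitted curve to a larger, not yet collected, sample size.
extrapolated = power_law_error(10_000, a_hat, b_hat, 0.1)
print(f"a={a_hat:.3f}, b={b_hat:.3f}, error at n=10000: {extrapolated:.3f}")
```

With real, noisy learning-curve measurements the fit would be approximate, and `c` would typically need to be estimated as well, for example with a nonlinear least-squares routine.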
Three representative machine learning methods with select pre-processing tips and tuning methods for complexity control.
| Method | Pre-process | Complexity control | Reference |
|---|---|---|---|
| Support vector machine (SVM) | Encode features as binary; normalize to uniform distribution; imputation for balancing data | Recursive feature elimination for linear SVM; soft margin width (C-parameter); kernel hyperparameters | ( |
| Bayesian networks | Feature discretization; variable selection to reduce graph search space; imputation not necessary when using expectation maximization | Constraints to a graph search space based on prior knowledge; graph scoring functions that penalize complexity | ( |
| Random forest | No discretization or normalization necessary; imputation required | Number of features to sample at each node split (mtry); minimum number of samples in a terminal node | ( |
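As a rough illustration of the complexity controls listed in the table, the sketch below tunes a linear SVM and a random forest with scikit-learn. The library choice, the synthetic data, and the parameter grids are assumptions made for this example and do not come from the paper. In scikit-learn, `max_features` plays the role of mtry and `min_samples_leaf` sets the minimum number of samples in a terminal node.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for a feature matrix (not real patient or genotype data).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Linear SVM: recursive feature elimination, then tune the soft-margin C.
svm_pipeline = Pipeline(
    [
        ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=10)),
        ("svm", SVC(kernel="linear")),
    ]
)
svm_search = GridSearchCV(svm_pipeline, {"svm__C": [0.01, 0.1, 1.0]}, cv=5)
svm_search.fit(X, y)

# Random forest: tune features sampled per split (mtry) and terminal node size.
rf_search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    {"max_features": [3, 5, 10], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
rf_search.fit(X, y)

print("SVM best parameters:", svm_search.best_params_)
print("Random forest best parameters:", rf_search.best_params_)
```

Placing the feature elimination step inside the pipeline means it is refit within each cross-validation fold, so the selected features are never chosen using the held-out data.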
Figure 4. Possible representation of a Bayesian network directed acyclic graph (DAG) for predicting late rectal bleeding after radiotherapy for prostate cancer. The network includes tumor-related characteristics (PSA, Gleason pattern score, and clinical T stage), which determine risk class and consequently radiotherapy targets (irradiation of pelvic lymph nodes and of seminal vesicles) and use of concomitant hormone therapy. Treatment variables influence the dosimetry of organs at risk [rectal dose–volume histogram (DVH)], and this has a causal effect on late rectal bleeding probability. Clinical (presence of a previous abdominal surgery and of cardiovascular diseases) and genetic [single-nucleotide polymorphism (SNP) signature] variables with (causal) associations with rectal bleeding are also included in the DAG.
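The graph structure described in the Figure 4 caption can be sketched as a mapping from each node to its parents. The node names are hypothetical, and the exact edge set is an assumption read off the caption rather than the figure itself: in particular, only the two irradiation variables are linked to the rectal DVH here, and hormone therapy is left as a child of risk class with no outgoing edge. The standard-library `graphlib` module (Python 3.9+) then confirms that the structure is acyclic and yields an ordering in which every parent precedes its children.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a node; its value is the set of parent nodes (assumed edges).
parents = {
    "risk_class": {"psa", "gleason_score", "clinical_t_stage"},
    "pelvic_node_irradiation": {"risk_class"},
    "seminal_vesicle_irradiation": {"risk_class"},
    "hormone_therapy": {"risk_class"},
    "rectal_dvh": {"pelvic_node_irradiation", "seminal_vesicle_irradiation"},
    "late_rectal_bleeding": {
        "rectal_dvh",
        "abdominal_surgery",
        "cardiovascular_disease",
        "snp_signature",
    },
}

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so a successful call confirms the structure is a valid DAG.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

A topological ordering of this kind is also the order in which variables would be sampled when simulating from a Bayesian network, since each node depends only on its parents.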