Prediction of Nephrotoxicity Associated With Cisplatin-Based Chemotherapy in Testicular Cancer Patients.

Sara L Garcia¹, Jakob Lauritsen², Zeyu Zhang¹,³, Mikkel Bandak², Marlene D Dalgaard¹, Rikke L Nielsen¹,⁴, Gedske Daugaard², Ramneek Gupta¹.

Abstract

BACKGROUND: Cisplatin-based chemotherapy may induce nephrotoxicity. This study presents a random forest predictive model that identifies testicular cancer patients at risk of nephrotoxicity before treatment.
METHODS: Clinical data and DNA from saliva samples were collected for 433 patients. These were genotyped on Illumina HumanOmniExpressExome-8 v1.2 (964 193 markers). Clinical and genomics-based random forest models generated a risk score for each individual to develop nephrotoxicity defined as a 20% drop in isotopic glomerular filtration rate during chemotherapy. The area under the receiver operating characteristic curve was the primary measure to evaluate models. Sensitivity, specificity, and positive and negative predictive values were used to discuss model clinical utility.
RESULTS: Of 433 patients assessed in this study, 26.8% developed nephrotoxicity after bleomycin-etoposide-cisplatin treatment. Genomic markers found to be associated with nephrotoxicity were located at NAT1, NAT2, and the intergenic region of CNTN6 and CNTN4. These, in addition to previously associated markers located at ERCC1, ERCC2, and SLC22A2, were found to improve predictions in a clinical feature-trained random forest model. Using only clinical data for training the model, an area under the receiver operating characteristic curve of 0.635 (95% confidence interval [CI] = 0.629 to 0.640) was obtained. Retraining the classifier by adding genomics markers increased performance to 0.731 (95% CI = 0.726 to 0.736) and 0.692 (95% CI = 0.688 to 0.696) on the holdout set.
CONCLUSIONS: A clinical and genomics-based machine learning algorithm improved the ability to identify patients at risk of nephrotoxicity compared with using clinical variables alone. Novel genetics associations with cisplatin-induced nephrotoxicity were found for NAT1, NAT2, CNTN6, and CNTN4 that require replication in larger studies before application to clinical practice.

Year: 2020 PMID: 32617516 PMCID: PMC7315098 DOI: 10.1093/jncics/pkaa032

Source DB: PubMed Journal: JNCI Cancer Spectr ISSN: 2515-5091

Standard treatment in patients with disseminated testicular cancer is chemotherapy consisting of bleomycin-etoposide-cisplatin (BEP). Cisplatin is also central in the treatment of many other solid tumors such as bladder, ovarian, and lung cancer (1). Treatment containing cisplatin has a wide range of side effects, one of which is nephrotoxicity (2,3). Cisplatin is excreted by the kidneys and may induce nephrotoxicity resulting in glomerular filtration rate (GFR) decline (4). Maintenance of sufficient renal function during treatment with chemotherapy is vital, and identification of patients at risk for developing nephrotoxicity could influence the treatment of choice if alternatives exist. Additionally, impaired renal function has been associated with increased risk of cardiovascular disease (5), which may pose a problem in long-term cancer survivors. Previous studies have improved the understanding of molecular mechanisms of cisplatin-induced nephrotoxicity (6), and several candidate gene studies have identified single-nucleotide polymorphisms (SNPs) associated with cisplatin-induced nephrotoxicity (7–9). However, these studies were conducted with surrogate measures of GFR (creatinine clearance or estimated GFR) rather than measured GFR as outcome. The scope of this study was 2-fold: first, to conduct a genome-wide association study (GWAS) using a linear model controlling for cisplatin dosage (high or normal) to identify new genetic variants associated with cisplatin-induced nephrotoxicity; and second, to investigate the utility of germline genetic markers together with clinical prognostic factors to predict nephrotoxicity using a random forest-recursive feature elimination algorithm. Patients treated for disseminated testicular cancer were chosen for this study because this patient group does not normally have comorbidity, which could influence renal function.

Methods

Patients

Patients were identified in the Danish Testicular Cancer-Late cohort (10), which includes 2572 Danish patients treated for testicular cancer from 1984 through 2007. Clinical features from 433 patients were originally extracted from hospital files as registered in the Danish Testicular Cancer database (Table 1). In 2014, all patients with measurements of renal function before and after treatment with BEP were invited to deliver a saliva sample for DNA analysis (Supplementary Figure 1, available online). Patients provided informed consent, and the study was approved by the regional ethical committee (H-2-2012-044) and the National Board of Data Protection (2012-41-0751).

Table 1.

Comparison of baseline characteristics between affected (GFR high-drop) and nonaffected patients

Characteristics	Affected, No. (%)	Nonaffected, No. (%)	P value
No. of patients	116 (26.8)	317 (73.2)
Clinical characteristics
Age, median (IQR)	34 (27-43)	30 (26-37)	.001
BEP regimen
Normal dose	92 (79.3)	295 (93.4)	<.001
Double dose	24 (20.7)	21 (6.6)
Unknown	—	1
GFR before treatment, median (IQR), mL/min/1.73 m²	128 (115-139)	119 (110-131)	.001
GFR after treatment, median (IQR), mL/min/1.73 m²	88 (75-99)	109 (100-119)	<.001
Cisplatin, median (IQR), mg/m²	400 (391-410)	400 (300-400)	<.001
Treatment cycles
3	20 (17.2)	97 (30.6)	<.001
4	72 (62.1)	199 (62.8)
5 or more	6 (5.2)	14 (4.4)
High dose	18 (15.5)	7 (2.2)
Histology
Seminoma	23 (19.8)	68 (21.5)	.78
Nonseminoma	93 (80.2)	249 (78.5)
Prognostic group
Good	71 (61.2)	277 (87.4)	<.001
Intermediate	30 (25.9)	35 (11.0)
Poor	15 (12.9)	5 (1.6)
Stage
Extragonadal	15 (12.9)	15 (4.7)	.87
Stage IM	7 (6.0)	30 (9.6)
Stage IIA	22 (19.1)	80 (25.5)
Stage IIB	21 (18.1)	77 (24.5)
Stage IIC	23 (19.8)	42 (13.4)
Stage III	28 (24.1)	70 (22.3)
Unknown	—	3

BEP = bleomycin-etoposide-cisplatin; GFR = glomerular filtration rate; IQR = interquartile range.

P values were calculated by 2-sided Mann-Whitney U test for continuous or ordinal characteristics. For “histology,” P value was calculated by χ2 test.

Treatment and Renal Measurement

All 433 patients received 3 cycles or more of BEP. The majority received normal-dose cisplatin 20 mg/m² × 5 q3w, etoposide 100 mg/m² × 5 q3w, and bleomycin 15 IU/m² q1w, and 25 patients received double-dose cisplatin and etoposide: cisplatin 40 mg/m² × 5 q3w, etoposide 200 mg/m² × 5 q3w, and bleomycin 15 IU/m² q1w. Hydration remained uniform over time with 2 L isotonic saline before cisplatin and an additional 1-2 L after. Diuretics were administered only in special cases, and no magnesium was added to hydration. There was no predefined cutoff of renal function where patients would not receive cisplatin-based triplets; however, to ensure toxicity was related to treatment, only patients with a GFR greater than 90 mL/min/1.73 m² before chemotherapy were included. GFR was measured by the 1-sample ⁵¹Cr-ethylenediaminetetraacetic acid clearance technique using 2 samples 200 minutes after tracer injection and normalized to a body surface area (BSA) of 1.73 m².

Genomic Information

Genomic DNA was collected and purified using GeneFiX Saliva DNA Midi Kit from Isohelix (Harrietsham, UK). DNA samples were prepared at DTU Multi-Assay Core (Lyngby, Denmark) and genotyped at AROS Applied Biotechnology A/S (Aarhus, Denmark) using Illumina HumanOmniExpressExome-8 v1.2 chip (964 193 markers). Genomic data were filtered using standard quality control steps (Supplementary Figure 2, available online). GWAS testing for single SNP association was conducted using PLINK (11) (v1.9beta3), with the GFR decline after chemotherapy as the measure of toxicity and discretized cisplatin dosage as covariate with double-dose and normal-dose groups. The cutoff of 5 cycles was made to differentiate between normal and historically higher doses of cisplatin. SNPs were annotated by ANNOVAR (v2015-06-17) (12) against the human reference genome hg19. Gene expression profiles were retrieved from GTExPortal (13). We used a suggestive P value threshold of 1 × 10⁻⁵ (14) and a stringent threshold of 8.02 × 10⁻⁸ [Bonferroni corrected (15)]. In addition to the GWAS hits, 4 SNPs, rs11615 and rs3212986 (ERCC1), rs13181 (ERCC2), and rs316019 (SLC22A2), found in previous literature to be associated with cisplatin-induced nephrotoxicity (9), were added to the input feature search space in the machine learning modeling.
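The stringent threshold can be reproduced with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05 (a conventional value, not stated in the text) divided by the 623 289 SNPs that passed quality control, as reported in the Results.

```python
# Bonferroni-corrected significance threshold for a GWAS (illustrative sketch).
ALPHA = 0.05        # assumed family-wise error rate; not stated explicitly in the text
N_TESTS = 623_289   # SNPs eligible for GWAS after quality control (from the Results)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"{threshold:.2e}")  # 8.02e-08
```

Under these assumptions the result agrees with the reported threshold of 8.02 × 10⁻⁸.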

Clinical Information

The clinical features used as input feature variables in the machine learning model were age at time of treatment, GFR before treatment, cumulative cisplatin dose per square meter of BSA, normal dose vs double-dose BEP, number of treatment cycles, histology (seminoma vs nonseminoma), prognostic classification as per IGCCCG (16), and stage of the disease as a surrogate for retroperitoneal tumor size, which was represented as 3 features in the model (details in Supplementary Methods, available online).

Statistical Analysis and Model Development

A random forest model (17), which identified different risk subgroups of GFR drop, was developed using SciKit-learn (18) in Python (v3.7.1). A GFR decline of more than 20% after chemotherapy was chosen as outcome to indicate a clinically significant change and to avoid selection of cases due to random variation. A 20% decline has been associated with, for example, cognitive deterioration (19) and risk of cardiovascular and all-cause mortality compared with those with stable GFR (20). As a first stage, the predictive power of a model driven by clinical features only was established. In a second stage, genomic markers were added to the model. From all 433 individuals, about 20% (78 individuals: 20 nephrotoxicity affected) of the data, with no missing values, was randomly separated ahead of time to be used as a holdout set. Therefore, for machine learning model training, we omitted the 78 individuals in the holdout set and excluded individuals with missing data in either clinical or genomic data (Supplementary Figure 1, available online). Patients’ baseline characteristics in each of these sets are available in Supplementary Table 2 (available online). Training and testing of the algorithm was performed with a 5 outer, 2 inner fold nested cross-validation (21,22) (Supplementary Figure 3, available online). The sample-splitting process for training and testing cohorts was random and repeated 100 times. Area under the receiver operating characteristic curve (ROC-AUC) was used as the primary performance measure for model optimization. A recursive backwards feature elimination approach was used for feature selection initiated with 10 clinical features and then reduced (23). To identify when the algorithm should stop removing features, a paired t test (level of statistical significance, P < .05) was calculated for each round of feature elimination on mean ROC-AUCs (Figure 1, A and B). A statistically significant AUC drop (P < .05) was indicative of an important feature being eliminated. All statistical tests were 2-sided. Details on model optimization and variable importance are described in the Supplementary Methods (available online).
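As a minimal sketch of two ideas in this section, the code below labels the outcome from a relative GFR decline of more than 20% and generates index splits for a 5 outer, 2 inner fold nested cross-validation. It uses only the Python standard library rather than SciKit-learn. The function names, the unstratified shuffling, and the sample size of 100 are illustrative assumptions and not the authors' implementation.

```python
import random


def is_nephrotoxic(gfr_before: float, gfr_after: float, cutoff: float = 0.20) -> bool:
    """True when the relative GFR decline exceeds the cutoff (more than 20% here)."""
    return (gfr_before - gfr_after) / gfr_before > cutoff


def k_folds(indices, k, rng):
    """Shuffle the indices and deal them into k roughly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_splits(n_samples, outer_k=5, inner_k=2, seed=0):
    """Yield (inner_train, inner_validation, outer_test) index lists.

    The inner folds would be used for model tuning and the outer test fold
    for performance estimation; changing the seed gives a new random repeat.
    """
    rng = random.Random(seed)
    outer = k_folds(range(n_samples), outer_k, rng)
    for i, test in enumerate(outer):
        train = [idx for j, fold in enumerate(outer) if j != i for idx in fold]
        inner = k_folds(train, inner_k, rng)
        for m, validation in enumerate(inner):
            fit = [idx for n, fold in enumerate(inner) if n != m for idx in fold]
            yield fit, validation, test


# Group medians from Table 1 used purely as illustrative inputs.
print(is_nephrotoxic(128, 88))   # True: decline of about 31%
print(is_nephrotoxic(119, 109))  # False: decline of about 8%

splits = list(nested_splits(n_samples=100))  # hypothetical sample size
print(len(splits))  # 10 (5 outer folds x 2 inner folds)
```

Repeating the procedure with 100 different seeds would mirror the 100 random replications described above, although the actual splitting strategy (for example, stratification by outcome) is not specified in the text.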

Figure 1.

Feature selection using random forest-recursive feature elimination algorithm and diagnostic performances. A and B) Boxplots with different number of features, 10 to 1 and 27 to 5, for clinical and clinical plus genomics, respectively, and respective area under the receiver operating characteristic curve (ROC-AUC) throughout 100 different replications for data shuffling. Asterisks between boxplots represent P values (paired t test) of >.05 (*), .05 (**), and .01 (***). All tests were 2-sided. The red arrow represents the block chosen for further analysis. C) The features chosen the most on the 15-features clinical and SNP-based models. D) Performances obtained (mean and 95% confidence intervals) on the clinical models (6 features) and on the clinical and SNP-based models (15 features) using 0.50 cutoff for classification for sensitivity, specificity, positive predictive value, and negative predictive value. NPV = negative predictive value; Perfs. = performances; PPV = positive predictive value; ROC-AUC = area under the receiver operating characteristic curve; SNP = single-nucleotide polymorphism.

Polygenic Risk Score (PRS)-Derived Models

We also calculated PRS-derived models weighted by effect sizes estimated by the GWAS using the R package PRSice (24). These were tested in the random forest models in place of individual SNPs. Two different approaches were used: the risks associated with all the 21 SNPs were combined to determine a single PRS, and a separate PRS was estimated per gene.
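In its simplest form, a PRS is a sum of effect-allele counts weighted by GWAS effect sizes. The sketch below illustrates that idea for an overall score and for per-gene scores. The SNP names, gene names, effect sizes, and dosages are hypothetical placeholders, and PRSice performs additional steps that are not reproduced here.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele counts (0, 1, or 2) across SNPs."""
    if dosages.keys() != effect_sizes.keys():
        raise ValueError("dosages and effect sizes must cover the same SNPs")
    return sum(effect_sizes[snp] * dosages[snp] for snp in effect_sizes)


def per_gene_scores(dosages, effect_sizes, snp_to_gene):
    """One weighted sum per gene, grouping SNPs by their annotated gene."""
    scores = {}
    for snp, beta in effect_sizes.items():
        gene = snp_to_gene[snp]
        scores[gene] = scores.get(gene, 0.0) + beta * dosages[snp]
    return scores


# Hypothetical effect sizes and genotypes, for illustration only.
effect_sizes = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 1.0}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
snp_to_gene = {"snp_a": "gene_1", "snp_b": "gene_1", "snp_c": "gene_2"}

overall = polygenic_risk_score(dosages, effect_sizes)
by_gene = per_gene_scores(dosages, effect_sizes, snp_to_gene)
print(overall)  # 0.75
print(by_gene)  # {'gene_1': 0.75, 'gene_2': 0.0}
```

Because each score is a linear combination, it cannot represent interactions between SNPs, which a random forest trained on individual SNPs can in principle capture.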

Model Performances and Risk Groups

The primary reported performance was assessed with a 0.50 cutoff on the random forest model scores. In addition, to determine clinical applicability, we assessed different cutoffs on the random forest scores with a goal of 10% false discovery or omission rate (positive or negative predictive values >90%). For the SNPs and clinical-based models from the best round, the split that had a representative ROC-AUC close to the mean was used to assess different cutoffs (25) (Supplementary Figure 4, available online). Based on this, specific cutoffs for detection of 3 risk groups were used on the holdout set: a high-risk group for developing nephrotoxicity; a low-risk group for developing nephrotoxicity; and an intermediate group, which refers to individuals whose prediction is not adequately compelling to change the clinical decision.
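The performance measures named above follow directly from the confusion matrix at a chosen cutoff. The following sketch computes them from model scores and observed outcomes. The toy scores and labels are invented for illustration, and treating a score equal to the cutoff as positive is an assumption.

```python
def diagnostic_metrics(scores, labels, cutoff=0.50):
    """Sensitivity, specificity, PPV, and NPV at a score cutoff.

    labels: 1 = affected, 0 = nonaffected. A score >= cutoff is treated as a
    positive prediction (an assumption; the text does not specify the boundary).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # 1 - false discovery rate
        "npv": ratio(tn, tn + fn),  # 1 - false omission rate
    }


# Invented toy data, not study data.
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
metrics = diagnostic_metrics(toy_scores, toy_labels)
print(metrics)
```

Scanning the cutoff over the score range and recording PPV and NPV at each value is one way to look for thresholds that meet a 10% false discovery or false omission target.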

Results

Study Population

Overall, 433 individuals (26.8% nephrotoxicity affected) were assessed in this study, with a median (interquartile range [IQR]) age of 34 (27-43) years for affected patients (N = 116) and 30 (26-37) years for nonaffected patients (N = 317). The majority received 3 or 4 cycles of BEP. Before treatment, the median (IQR) GFR (mL/min/1.73 m²) was 128 (115-139) for affected and 119 (110-131) for nonaffected, and after treatment it decreased to 88 (75-99) for affected and 109 (100-119) for nonaffected (Table 1).

Genome-Wide Association Study

Of 433 saliva samples received, 8 failed to yield high-quality genetic data. After quality control filtering, a total of 411 patients and 623 289 SNPs were eligible for GWAS (Supplementary Figures 1 and 2, available online). There was no indication of population stratification or inflation in the quantile-quantile plot of observed vs expected −log₁₀(P values) (Supplementary Figure 5, available online). GWAS controlling for cisplatin-based chemotherapy dosage identified 17 SNPs associated with GFR decline. Seven contiguous SNPs on chromosome 14 within the intergenic region between LINC00645 and FOXG1 passed a genome-wide statistical significance threshold of P = 8.02 × 10⁻⁸ (Figure 2; Table 2). Nine additional SNPs located on chromosome 8, cytoband p22, passed a suggestive threshold of P = 1 × 10⁻⁵ and were located in the intron and 3′ untranslated region of NAT1 or the intergenic region between NAT1 and NAT2. SNP rs17038909 (P = 6.70 × 10⁻⁸), located in the intergenic region between CNTN6 and CNTN4, passed the genome-wide statistical significance threshold.

Figure 2.

Genome-wide association study. Manhattan plot for association of 623 289 single-nucleotide polymorphisms with glomerular filtration rate decline. Linear model adjusted for cisplatin dosage was performed. The black dashed line represents a suggestive threshold: 1 × 10⁻⁵, and the red dashed line represents a stringent Bonferroni corrected threshold: 8.02 × 10⁻⁸. Markers in a contiguous pattern that pass the suggestive threshold are marked with a dotted box.

Risk Prediction Model

A baseline predictive model with only clinical features was trained using random forests. Of the initial 10 clinical features, 6 features were prioritized through recursive backwards elimination (Figure 1A): age at time of treatment, GFR before treatment, cumulative cisplatin dose per square meter of BSA, number of treatment cycles, prognostic classification as per IGCCCG (16), and stage of the disease; BEP dose group and histology were excluded. Univariate analysis also highlighted features selected in the random forest model (Table 1).

SNPs and Clinical-Based Model

A selection of genomic markers was added to the baseline clinical prediction model: 17 SNPs from the GWAS and 4 additional SNPs from prior literature. Through recursive backwards elimination, 15 features were prioritized (6 clinical and 9 SNPs). The selected SNPs were rs11615 and rs3212986 (ERCC1), rs13181 (ERCC2), rs4986993, rs15561, rs8190870 (NAT1), rs1353035 (NAT1/NAT2), rs316019 (SLC22A2), and rs17038909 (CNTN6/CNTN4) (Figure 1, B and C). None of the SNPs located within the intergenic region between LINC00645 and FOXG1 were selected. By adding genomic markers, ROC-AUC increased from 0.635 (95% confidence interval [CI] = 0.629 to 0.640) to 0.731 (95% CI = 0.726 to 0.736) (Figure 1D for additional performance metrics). Additionally, 2 PRS were added independently to the baseline clinical model but did not outperform the individual SNPs (Supplementary Table 1, available online).

Model Robustness

As a further validation, we tested for random outcome, simulated by permuting the labels 2000 times. This generated random performance for the model based on the clinical traits in combination with the 9 SNPs previously reported, with a ROC-AUC mean of 0.498 (95% CI = 0.497 to 0.500). Furthermore, to assess if the SNP selection was meaningful, the performance of 9 random GWAS SNPs instead of the previously described 9 selected SNPs was tested when combined with the selected clinical traits; this process was repeated 2000 times. This performed very similarly to clinical traits alone, with a ROC-AUC mean of 0.661 (95% CI = 0.660 to 0.661) against the model scores with a ROC-AUC mean of 0.742 (95% CI = 0.741 to 0.743) (Figure 3).
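The logic of the label permutation benchmark can be sketched as follows. ROC-AUC is computed as the probability that a randomly chosen affected individual receives a higher score than a randomly chosen nonaffected individual, and the labels are then shuffled repeatedly. This is a simplification: the sketch permutes labels against fixed scores and does not retrain any model, and the scores and labels are invented placeholders.

```python
import random


def roc_auc(scores, labels):
    """AUC as the fraction of (affected, nonaffected) pairs ranked correctly.

    Ties between an affected and a nonaffected score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def permuted_aucs(scores, labels, n_permutations=2000, seed=0):
    """AUCs obtained after randomly shuffling the outcome labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    aucs = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        aucs.append(roc_auc(scores, shuffled))
    return aucs


# Invented toy data: 25 nonaffected followed by 15 affected, perfectly separated.
toy_scores = [i / 40 for i in range(40)]
toy_labels = [0] * 25 + [1] * 15

observed_auc = roc_auc(toy_scores, toy_labels)
null_aucs = permuted_aucs(toy_scores, toy_labels)
null_mean = sum(null_aucs) / len(null_aucs)
print(observed_auc)         # 1.0 for this perfectly separated toy example
print(round(null_mean, 2))  # close to 0.5, as expected for random labels
```

A null mean close to 0.5, as reported above for the permuted models, indicates that the pipeline does not produce above-chance performance when the outcome carries no signal.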

Figure 3.

Benchmarking of the models. A) Test for random outcome simulated by permuting the labels 2000 times. B) Test for random single-nucleotide polymorphisms selection by combining 9 random markers, instead of the 9 selected markers, with the selected clinical traits. ROC-AUC = area under the receiver operating characteristic curve; SNP = single nucleotide polymorphism.

Replication Dataset

The holdout set (78 individuals: 20 nephrotoxicity affected) was used for replication of the random forest models with clinical and genetic features. A ROC-AUC of 0.692 (95% CI = 0.688 to 0.696) was obtained on the final evaluation (Figure 4A).

Figure 4.

Final model evaluation (clinical and genomic markers) on the holdout set. A) Area under the receiver operating characteristic curve (ROC-AUC; mean and 95% confidence interval) analysis of clinical risk factors and genetic variables for prediction of cisplatin-based nephrotoxicity in testicular cancer patients using the holdout dataset. B) Diagnostic performances obtained with 3 prediction cutoffs and independent evaluation (random forest score) for each individual: 78 individuals (×5 cross-validated models) (blue: affected; red: nonaffected). One external validation set was used. The 3 groups are represented: low-risk group (8% false negatives), undetermined zone, and high-risk group (33% false positives). Perfs. = performances; PPV = positive predictive value; NPV = negative predictive value; FN = false negatives; FP = false positives.

Prediction cutoffs of 0.90 and 0.30 for high risk and low risk, respectively, of developing nephrotoxicity were chosen for further analysis on 1 external validation set to discuss the model's clinical utility. A random forest score between 0.30 and 0.90 was not enough to make a clinical decision. In the high-risk group, we had a positive predictive value of 0.67 (33% false discovery rate) and specificity of 0.99 while capturing 6% of all nephrotoxicity, whereas in the low-risk group we had a sensitivity of 0.92 and negative predictive value of 0.92 (8% false omission rate), which captured 32% of all nonaffected patients (Figure 4B).
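The three-zone stratification described above can be sketched with the reported cutoffs of 0.30 and 0.90. How scores exactly equal to a cutoff are handled is an assumption, and the toy scores and labels are invented for illustration.

```python
LOW_CUTOFF = 0.30   # at or below: low-risk group (boundary handling assumed)
HIGH_CUTOFF = 0.90  # at or above: high-risk group (boundary handling assumed)


def risk_group(score: float) -> str:
    """Assign a random forest score to the high, low, or intermediate group."""
    if score >= HIGH_CUTOFF:
        return "high"
    if score <= LOW_CUTOFF:
        return "low"
    return "intermediate"


def summarize(scores, labels):
    """Group sizes, PPV in the high-risk group, and NPV in the low-risk group."""
    groups = [risk_group(s) for s in scores]
    high = [y for g, y in zip(groups, labels) if g == "high"]
    low = [y for g, y in zip(groups, labels) if g == "low"]
    return {
        "n_high": len(high),
        "n_low": len(low),
        "n_intermediate": len(scores) - len(high) - len(low),
        "ppv_high": sum(high) / len(high) if high else float("nan"),
        "npv_low": (len(low) - sum(low)) / len(low) if low else float("nan"),
    }


# Invented toy data (1 = affected, 0 = nonaffected), not study data.
toy_scores = [0.95, 0.92, 0.91, 0.60, 0.50, 0.25, 0.20, 0.10]
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0]
summary = summarize(toy_scores, toy_labels)
print(summary)
```

Individuals in the intermediate group are left unclassified, which trades coverage for higher predictive values at the two extremes.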

Discussion

In this study, we were able to predict patients at risk of developing nephrotoxicity after BEP chemotherapy based on clinical and genetic features with a machine learning algorithm. Clinical features selected on the random forests–driven baseline clinical model were known risk factors of renal toxicity (2) and were statistically significant in univariate analysis. The aim of the baseline model was to mimic and codify clinical intuition, which relies on the available clinical information at the time of treatment. When genomic markers were added to the baseline model, prediction power substantially improved. We believe that genomic information, although not predictive on its own, improves a baseline clinical model for identification of patients at risk for nephrotoxicity. PRS did not perform as well as independent SNPs when added to the model, suggesting that nonlinear correlations between SNPs drove the increase in performance rather than the linear combination that PRS offer, as has also been suggested elsewhere (26). SNPs located in the intergenic region between LINC00645 and FOXG1, although strongly associated in the GWAS (P < 5 × 10⁻⁸), were not selected in the machine learning model because of either limited contribution or low minor allele frequencies (Table 2) that made them harder to detect in cross-validated setups.

Table 2.

Top GWAS hits and literature SNP hits for cisplatin-based nephrotoxicity in testicular cancer patients

SNP	Gene	CHR	Position	Region/Consequence	Alleles (ref/alt)	MAF (all)	MAF (EUR)	P value
Top GWAS
rs17038909	CNTN6, CNTN4	3	1467145	Intergenic	A/G	G: 0.10	G: 0.08	6.70 × 10⁻⁸
rs8190845	NAT1	8	18078628	Intronic	G/A	A: 0.20	A: 0.15	1.79 × 10⁻⁶
rs15561	NAT1	8	18080651	3′ UTR	A/C	A: 0.44	A: 0.28	2.29 × 10⁻⁷
rs4986993	NAT1	8	18080747	3′ UTR	T/G	T: 0.44	T: 0.28	5.25 × 10⁻⁷
rs8190870	NAT1	8	18081272	Downstream	C/T	T: 0.14	T: 0.15	1.12 × 10⁻⁶
rs13270034	NAT1, NAT2	8	18082354	Intergenic	G/A	A: 0.08	A: 0.13	7.64 × 10⁻⁶
rs13277177	NAT1, NAT2	8	18086096	Intergenic	A/G	G: 0.06	G: 0.10	9.72 × 10⁻⁶
rs13277481	NAT1, NAT2	8	18086217	Intergenic	A/G	G: 0.08	G: 0.13	5.47 × 10⁻⁶
rs13270961	NAT1, NAT2	8	18139163	Intergenic	T/C	C: 0.08	C: 0.11	7.31 × 10⁻⁶
rs1353035	NAT1, NAT2	8	18140633	Intergenic	C/T	C: 0.15	C: 0.17	5.35 × 10⁻⁶
rs17095485	LINC00645, FOXG1	14	28500775	Intergenic	C/T	T: 0.07	T: 0.06	1.13 × 10⁻⁸
rs17382424	LINC00645, FOXG1	14	28529219	Intergenic	C/T	T: 0.02	T: 0.06	1.29 × 10⁻⁸
rs4551947	LINC00645, FOXG1	14	28584430	Intergenic	C/A	A: 0.05	A: 0.06	2.26 × 10⁻⁸
rs8020589	LINC00645, FOXG1	14	28604708	Intergenic	C/T	T: 0.07	T: 0.06	1.44 × 10⁻⁸
rs10131751	LINC00645, FOXG1	14	28681216	Intergenic	C/A	A: 0.07	A: 0.07	1.45 × 10⁻⁸
rs9671720	LINC00645, FOXG1	14	28714229	Intergenic	C/T	T: 0.05	T: 0.04	8.81 × 10⁻⁹
rs12323487	LINC00645, FOXG1	14	28837771	Intergenic	C/A/T	A: 0.09	A: 0.05	1.19 × 10⁻⁸
Literature
rs316019	SLC22A2	6	160670282	Missense	A/C	A: 0.14	A: 0.11	0.21
rs13181	ERCC2	19	45854919	Stop gained	T/A/G	G: 0.24	G: 0.36	0.03
rs3212986	ERCC1	19	45912736	Stop gained	C/A/G/T	A: 0.30	A: 0.25	0.11
rs11615	ERCC1	19	45923653	Synonymous	A/G	A: 0.33	G: 0.38	0.004

Positions refer to assembly GRCh37. alt = alternative(s); CHR = chromosome; EUR = Europe; GWAS = genome-wide association study; MAF = minor allele frequency; ref = reference; SNP = single-nucleotide polymorphism; UTR = untranslated region.

A linear model was adjusted for cisplatin dosage and scored by P values representing how likely the variant association was by random chance.

SNPs rs4986993, rs15561, and rs8190870 (NAT1), rs1353035 (NAT1/NAT2), and rs17038909 (CNTN6/CNTN4) were newly discovered in the present GWAS to be associated with nephrotoxicity and added performance to the machine learning model. NAT1 and NAT2 encode arylamine N-acetyltransferases that take part in metabolizing drugs and chemical compounds in humans with a role in folate metabolism (27). These 2 genes encode similar protein sequences [identity = 81.03%, Clustal-Omega, Uniprot (28)], yet differ in expression profiles (13). NAT1 is ubiquitously expressed in the central nervous system, and NAT2 is specifically expressed in the liver, colon, and small intestine (Supplementary Figure 6, available online). It has been reported that cisplatin can impair NAT1 by blocking its transferase activity in human breast cancer cells and impair murine Nat2 activity in cultured mouse tissues (liver and kidney) (29), which on one hand contributes to the therapeutic effects of cisplatin, but on the other hand may lead to accumulation of cisplatin in the kidneys. CNTN6 and CNTN4 encode contactin proteins, which mediate cell surface interactions during nervous system development and have been suggested to be associated with neurodevelopmental disorders (30–32), though the association with nephrotoxicity needs to be further explored.

SNPs found previously to be associated with nephrotoxicity were incorporated in this model. These SNPs were located at ERCC1, ERCC2, and SLC22A2. ERCC1 and ERCC2 encode excision repair proteins, and polymorphisms in ERCC1/2 have been reported to alter ERCC1/2 DNA repair function (33–35), which may affect nephron repair capacity after cisplatin exposure during chemotherapy (36–39). If not adequately repaired, cisplatin-induced DNA damage can induce cell death (40,41). SLC22A2 encodes organic cation transporter 2 (OCT2) protein, which is expressed in the proximal tubule epithelial cells of the kidney and involved in the absorption and excretion of xenobiotics and metabolites (42). OCT2 efficiently mediates cisplatin cellular uptake, leading to high cisplatin accumulation in renal proximal tubule cells (43) where cisplatin-induced nephrotoxicity typically occurs (44). OCT2 may be a key regulator in the renal accumulation of cisplatin, affecting drug handling and inducing nephrotoxicity (42,45).

During primary treatment of disseminated testicular cancer, about one-third of the patients develop cisplatin-induced nephrotoxicity (46,47). This clinical and genomics-based model could be used as an early assessment for nephrotoxicity risk, assisting in identifying patients at high and low nephrotoxicity risk and influencing decisions on cisplatin chemotherapy cycles. Using a 0.50 cutoff on the random forest model scores, we were able to achieve a sensitivity of 0.65, positive predictive value of 0.35, specificity of 0.60, and negative predictive value of 0.83. Differential thresholding of the nephrotoxicity model classified patients into high, low, and intermediate risk. For the high-risk group, 67% of the patients flagged by the model developed nephrotoxicity (positive predictive value of 0.67), yet only a small fraction of affected individuals was captured (0.06 sensitivity). On the other hand, 92% of the patients assigned to the low-risk group did not develop nephrotoxicity (negative predictive value of 0.92), and this group captured 32% of the nonaffected population (Figure 4B). Even though the model shows utility in the ability to predict toxicity throughout the score range, extreme cutoffs to identify the highest and lowest risk patients could point at the least disruptive implementation of such a model within current practice.

A strength of this study is the large dataset with a good representation of patients who developed nephrotoxicity after cisplatin-based chemotherapy, using exact renal measurements, and the first application, to our knowledge, of artificial intelligence on predicting such a phenotype. The machine learning models appeared to be robust with stable performance across 100 random cross-validation splits of the training data, demonstrating performance of 0.731 mean ROC-AUC in cross-validation and 0.692 (95% CI = 0.688 to 0.696) ROC-AUC in the holdout set. Yet, as a limitation, the machine learning setups use some of the association results from the GWAS on the same cohort; therefore, replication on another cohort from an external dataset would be of substantial interest. NAT1 and NAT2 appear as interesting genetic targets to prioritize for assaying in future nephrotoxicity studies and would benefit from functional validation.

The ability to develop machine learning models for patient stratification in different nephrotoxicity risk groups has the potential to balance aggressive treatment against predicted toxicity risk. In the future, toxicity may play a larger role in guiding treatment across several complex diseases, where data-driven prediction models may aid in decision making. Some of the clinical features used in this model, such as age at the time of treatment and GFR before chemotherapy as well as some of the identified genomics markers, could be applicable to other tumor types. Cisplatin is one of the most compelling drugs used in cancer treatment, and nephrotoxicity is a well-known side effect from its use. Our model could be applicable to ovarian, bladder, and lung cancer, where more elderly patients are at risk of nephrotoxicity and early identification of toxicity risks (or lack thereof) may influence treatment aggression or increase monitoring for selected patients.

Funding

This work was supported by the Danish Cancer Society (R40-A2119). SLG was supported by the Idella Foundation. ZZ and RLN were supported by the Sino-Danish Center for Education and Research.

Notes

Role of the funder: The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit. Conflicts of interest: RG is employed with Novo Nordisk Research Centre Oxford since February 2020. The other authors have no conflicts of interest to disclose. Author contributions: JL, GD, RG: Study concept and design. SLG, JL, ZZ, MB, RLN, RG: Acquisition, analysis, or interpretation of data. SLG, JL, ZZ: Drafting of the manuscript. SLG, JL, ZZ, MB, MDD, RLN, GD, RG: Critical revision of the manuscript for important intellectual content. SLG, ZZ: Statistical Analysis. MDD, GD, RG: Study supervision.
