
Identifying causal relationships of cancer treatment and long-term health effects among 5-year survivors of childhood cancer in Southern Sweden.

Anders Holst¹, Jan Ekman¹, Magnus Petersson-Ahrholt², Thomas Relander³, Thomas Wiebe⁴, Helena M Linge⁴.

Abstract

Background: Survivors of childhood cancer can develop adverse health events later in life. Infrequent occurrences and scarcity of structured information result in analytical and statistical challenges. Alternative statistical approaches are required to investigate the basis of late effects in smaller data sets.
Methods: Here we describe sex-specific health care use, mortality and causal associations between primary diagnosis, treatment and outcomes in a small cohort (n = 2315) of 5-year survivors of childhood cancer (n = 2129) in southern Sweden and a control group (n = 11,882; age-, sex- and region-matched from the general population). We developed a constraint-based method for causal inference based on Bayesian estimation of distributions, and used it to investigate health care use and causal associations between diagnoses, treatments and outcomes. Mortality was analyzed by the Kaplan-Meier method.
Results: Our results confirm a significantly higher health care usage and premature mortality among childhood cancer survivors as compared to controls. The developed method for causal inference identifies 98 significant associations (p < 0.0001) where most are well known (n = 73; 74.5%). Hitherto undescribed associations are identified (n = 5; 5.1%). These were between use of alkylating agents and eye conditions; topoisomerase inhibitors and viral infections; pituitary surgery and intestinal infections; and cervical cancer and endometritis. We discuss study-related biases (n = 20; 20.4%) and limitations.
Conclusions: The findings contribute to a broader understanding of the consequences of cancer treatment. The study shows relevance for small data sets and causal inference, and presents the method as a complement to traditional statistical approaches.

Entities: Chemical

Keywords: Databases; Paediatric cancer

Year: 2022 PMID: 35603279 PMCID: PMC9053221 DOI: 10.1038/s43856-022-00081-z

Source DB: PubMed Journal: Commun Med (Lond) ISSN: 2730-664X

Introduction

Survival after childhood cancer has increased and resulted in a growing population of survivors at risk for developing late complications[1,2]. The most apparent late effects have been clinically evident since the beginning of the combinational therapy era in the 1970s and have in recent years been transformed into evidence-based guidelines to aid medical follow-up[3-6]. Extensive studies have resulted in risk associations between adverse outcomes, e.g., cardiovascular events[7], impaired cognition[8], and specific chemotherapeutic drugs, targets and doses of radiation therapy as well as stem cell transplantation[9]. In the current study, we developed a method for causal inference and used it to investigate causal relations between details of the primary disease, its treatment history and health-related events that occur at least 5 years after the first childhood cancer diagnosis (CCD), using a population-based childhood cancer cohort with years of diagnoses ranging from 1970 to 2016[10]. The purpose of the study was to (1) briefly describe the newly established cohort, (2) describe and discuss the causal inference method as suitable for smaller data sets and (3) report the links found between primary disease, treatment and outcomes. We hypothesized that the analysis would identify novel associations, which may enrich clinical follow-up care of childhood cancer survivors (CCS), and contribute to an improved understanding of the basis of adverse effects after cancer treatment. The study confirmed a higher health care usage and mortality among CCS than in the control group from the general population, and identified 98 links between primary disease, treatment and outcomes. Most of the links were well known but some have not been described previously, to the best of our knowledge. These were between use of alkylating agents and eye conditions; topoisomerase inhibitors and viral infections; pituitary surgery and intestinal infections; and cervical cancer and endometritis. The developed method for causal inference for limited size data sets is a substantial result of this study.

Methods

Ethical considerations

Ethical approval of the study was granted by the Regional Ethical Review Board in Lund (2018-022) and the treatment data were extracted from the regional pediatric oncology registry BORISS with permission from the Council of Skane. The law governing quality registries does not impose a requirement of consent from the individual. At the time of diagnosis, the parents of the patients were informed and presented with the possibility of opting out of several quality registries. All data in the project were pseudonymized to allow identification by health care services in case life-threatening late effects were identified.

Data assembly and coding

The demographical data and detailed treatment data of all 5-year survivors in the regional quality registry[10] were included in the study (n = 2315). They were diagnosed with childhood cancer between the years 1970 and 2012. The patients diagnosed with childhood cancer in 2013–2015 were not yet 5-year survivors (n = 186) at the time of data extraction. The detailed treatment data from these latter patients contributed to the analyses but their outcomes did not. Matching control subjects from the general population were selected based on sex, year of birth and place of residency, and served as a control group (n = 11,882). Although outcome data are routinely recorded in national registries for all Swedish citizens including CCS, detailed and structured childhood cancer treatment data were available only for the regional subset of CCS. For this study, the outcomes data included in-patient care, outpatient care, and causes of death, represented as International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) codes. The median time of follow-up was 15.1 years and the mean 16.4 years. The outpatient care registrations started in 1997, resulting in a median time of follow-up of 13.3 years and the mean 11.8 years for outpatient data. Treatment data included chemotherapeutic agents (grouped into 10 categories), anatomical site of radiation therapy (10 categories), type of surgery (86 categories), and stem cell transplantation (allogeneic or autologous). Predisposing medical conditions or syndromes (multiple endocrine neoplasia type 2, neurofibromatosis, Down syndrome, Beckwith–Wiedemann syndrome, and tuberous sclerosis) in the CCS cohort were included as a subgroup (n = 101).

Causal inference approach

The analysis approach was to use a constraint-based method for causal inference, the PC algorithm[11,12] (named after the inventors Peter Spirtes and Clark Glymour), to build a causal graph between all individual properties, childhood cancer diagnoses, cancer treatments, and potential late effects. In order to use the method with a limited amount of data, we developed a specially tailored conditional independence test, based on Bayesian estimation of distributions and a measure of correlation, which is invariant under conditioning. This makes it possible to base the estimation of the correlation on the whole data set also when conditioning on one or more variables, and thus makes it possible to maintain a high level of significance despite a limited amount of data. Each potential late effect, as given by the register data records, is represented by an ICD-10 code, henceforth termed “outcome diagnosis code” or just “outcome”, and has a certain base frequency of occurrence in the population. We wish to test whether the frequency of a certain outcome is significantly higher for CCS having received specific treatment. We make two model assumptions: first, we assume an exponential distribution of the occurrence of a specific outcome in the population, i.e., the time t until the first occurrence of an outcome with frequency λ is distributed as follows:

$$P(t \mid \lambda) = \lambda e^{-\lambda t} \qquad (1)$$

Second, we assume that each treatment or other exposure that may have an impact on an outcome, will independently of the other exposures cause a possible increase Δλ to the base frequency of that outcome. Both assumptions are only approximately true: the first because many outcomes increase in frequency with increasing age[13], and the second because there may be exposures that instead increase the sensitivity to other exposures, making the different exposures statistically dependent. Justification of the assumptions and discussion of the possible implications of violations to them is included in the Discussion section.

Derivation of the unconditional independence test

Given the exponential distribution assumption, we can characterize the observations D in a population regarding a specific outcome in terms of the number of individuals n who had this outcome, and the total observation time t of all individuals, up to the first occurrence of the outcome in each individual. Then we can express the Bayesian estimation of the distribution of the frequency λ of the outcome as follows:

$$P(\lambda \mid D) \propto P(D \mid \lambda)\, P(\lambda) \propto \lambda^{n} e^{-\lambda t} \qquad (2)$$

where we have assumed a uniform prior over λ. To calculate whether the frequency in one subpopulation is significantly higher than in another subpopulation, we consider the distribution of the difference between the two frequencies, Δλ. Once we have this distribution we can make a hypothesis test with the null hypothesis that Δλ = 0. First, we express the joint distribution of the two frequencies of the outcome in the two subpopulations, set the first frequency to λ and the second to λ + Δλ, and integrate over λ:

$$P(\Delta\lambda \mid D_1, D_2) = \int P(\lambda \mid D_1)\, P(\lambda + \Delta\lambda \mid D_2)\, d\lambda \propto \int \lambda^{n_1} e^{-\lambda t_1} (\lambda + \Delta\lambda)^{n_2} e^{-(\lambda + \Delta\lambda) t_2}\, d\lambda \qquad (3)$$

Here (n_1, t_1) and (n_2, t_2) summarize the observations D_1 and D_2 in the two subpopulations, and the integral runs over the values of λ for which both frequencies are non-negative. The result of the integral can be expressed in terms of the hypergeometric function ${}_1F_1$. With this distribution, we can now calculate both the expected value of Δλ as an estimate of how large the effect is, and the significance of the difference by considering the area of the tail reaching “beyond” zero (that is, the probability that we mistake a positive difference for a negative, or vice versa).
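As a hedged illustration of the unconditional test, the following Python sketch reduces per-individual records to the pair (n, t), reports the posterior mean of each frequency under a uniform prior, and computes the tail probability P(Δλ ≤ 0). It does not evaluate the hypergeometric expression used in the study. Instead it swaps in an equivalent identity for two independent Gamma posteriors with integer shape (a binomial tail sum), chosen here only for simplicity. All function names and example numbers are hypothetical, and Δλ is assumed to mean the frequency in the exposed group minus the frequency in the reference group.

```python
import math


def summarise(records):
    """Reduce (observation_time, had_outcome) pairs to (n, t).

    n counts individuals with the outcome; t is the total observation time,
    each individual counted up to first occurrence or end of follow-up.
    """
    n = sum(1 for _, had_outcome in records if had_outcome)
    t = sum(time for time, _ in records)
    return n, t


def posterior_mean(n, t):
    """Mean of the posterior proportional to lam**n * exp(-lam * t)."""
    return (n + 1) / t


def expected_delta(exposed, reference):
    """Expected difference in frequency, exposed minus reference."""
    return posterior_mean(*exposed) - posterior_mean(*reference)


def tail_beyond_zero(exposed, reference):
    """P(lam_exposed <= lam_reference | data) for independent posteriors.

    With a = n_exposed + 1 and b = n_reference + 1, this equals
    P(Binomial(a + b - 1, x) >= a), where x = t_exposed / (t_exposed + t_reference).
    Terms are evaluated in log space to avoid overflow for large counts.
    """
    (n1, t1), (n0, t0) = exposed, reference
    a, b = n1 + 1, n0 + 1
    x = t1 / (t1 + t0)
    trials = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, trials + 1):
        log_term = (math.lgamma(trials + 1) - math.lgamma(j + 1)
                    - math.lgamma(trials - j + 1)
                    + j * log_x + (trials - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)


if __name__ == "__main__":
    exposed, reference = (30, 100.0), (5, 100.0)  # hypothetical (n, t) pairs
    print(expected_delta(exposed, reference), tail_beyond_zero(exposed, reference))
```

A small tail value indicates that the exposed frequency is very unlikely to be at or below the reference frequency. The sketch is one-sided by construction; a two-sided reading would take the smaller of this value and its complement.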

Derivation of the conditional independence test

To determine whether a detected effect is direct or indirect, we systematically condition on alternative causes. We keep only those effects that remain significant in this process, thus removing all effects that can be explained by other causes. For this, we need a conditional independence test, i.e., it is necessary to check the significance of non-zero Δλ under conditioning on a set of other potential causes. Here the assumption that each cause has an (approximately) additive effect on the frequency comes in. The frequencies themselves may vary between the different subpopulations. However, if we assume that each treatment adds to the frequency of an outcome, then this frequency difference Δλ will be invariant over the subpopulations. Instead of estimating separate distributions for each subpopulation and assessing the significance in each of them, we can thus use Bayesian estimation of this common Δλ from all subpopulations, dramatically reducing the loss of significance that would otherwise occur due to conditioning. More formally, assume that all conditioning factors are collected into a joint variable G that can assume K different values. That is, there are K different subpopulations, each with a different set of values of the factors in G. Every subpopulation, corresponding to k ∈ {1, …, K}, is in turn divided in two parts, D_1^(k) and D_2^(k), respectively with and without the cause under observation. Then the conditional distribution of the difference of frequencies between those with and without the potential cause can be expressed as follows:

$$P(\Delta\lambda \mid D^{(1)}, \ldots, D^{(K)}) \propto \prod_{k=1}^{K} P(\Delta\lambda \mid D_1^{(k)}, D_2^{(k)}) \qquad (4)$$

where each factor is the two-subpopulation distribution of Δλ derived in the previous section. We have once again assumed a uniform prior for Δλ. The expected value corresponds to the size of the conditional effect, i.e., the direct effect associated with the cause after having removed the effects of the causes in G. The area of the tail of the distribution beyond zero is again used as the significance of the difference.
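The pooling idea can be sketched numerically. The Python example below is a minimal illustration, not the study's implementation: it assumes that the posterior of the common Δλ is proportional to the product of the per-subpopulation distributions of Δλ (uniform priors throughout), evaluates each distribution by a simple midpoint rule on a grid, and returns the posterior mean and the tail mass at or below zero. The function names, grid sizes, and the ten-standard-deviation cut-off for the integration range are hypothetical choices, and Δλ is taken as exposed minus reference.

```python
import math


def rate_logpdf(lam, n, t):
    """Log posterior density of a rate after n events in total time t."""
    return (n + 1) * math.log(t) + n * math.log(lam) - lam * t - math.lgamma(n + 1)


def delta_density(delta, exposed, reference, lam_max, steps):
    """Density of delta = lam_exposed - lam_reference in one stratum."""
    (n1, t1), (n0, t0) = exposed, reference
    h = lam_max / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h          # reference rate at a midpoint, always > 0
        if lam + delta > 0:          # the exposed rate must stay positive
            total += math.exp(rate_logpdf(lam, n0, t0)
                              + rate_logpdf(lam + delta, n1, t1))
    return total * h


def pooled_delta(strata, grid=201, steps=300):
    """Posterior mean of the common delta and its tail mass at or below zero.

    strata: list of ((n_exposed, t_exposed), (n_reference, t_reference)).
    grid should be odd so that delta = 0 is a grid point.
    """
    # Integration range: posterior mean plus ten standard deviations of every rate.
    lam_max = max((n + 1 + 10 * math.sqrt(n + 1)) / t
                  for pair in strata for n, t in pair)
    deltas = [(2 * j / (grid - 1) - 1) * lam_max for j in range(grid)]
    log_post = []
    for d in deltas:
        lp = 0.0
        for exposed, reference in strata:
            dens = delta_density(d, exposed, reference, lam_max, steps)
            lp += math.log(dens) if dens > 0 else -math.inf
        log_post.append(lp)
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    mean = sum(d * w for d, w in zip(deltas, weights)) / norm
    # The grid point at exactly zero is shared equally between both tails.
    tail = sum(w if d < 0 else 0.5 * w
               for d, w in zip(deltas, weights) if d <= 0) / norm
    return mean, tail
```

Calling `pooled_delta([((20, 100.0), (5, 100.0)), ((15, 80.0), (3, 90.0))])` with these made-up counts pools two strata into one estimate. Because every stratum contributes to the same Δλ, the pooled distribution is narrower than any single-stratum distribution, which is the reason conditioning costs less significance under the additive assumption.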

Resulting algorithm description

We now have an invariant Bayesian conditional independence test, suitable for the current domain with its limited amount of data, which we can use to identify the causal structure of the domain. This is based on the first two parts of the PC algorithm with a minor modification of the second step to make the algorithm independent of the order of evaluation. The first step is to identify all significant correlations (direct or indirect) from the potential causes (i.e., predisposing factors, first CCD, and childhood cancer treatment) to each outcome using Eq. 3. In the second step, each identified relation is reconsidered when conditioning on combinations of the other identified potential causes, to remove those relations that can be explained as indirect relations via other factors. In more detail, this step identifies the smallest set of potential causes that cannot make each other non-significant by conditioning, but which together render all other relations non-significant. The resulting causes are assumed to be the direct causal links to the outcome.

The PC algorithm relies on the faithfulness condition, which states that if there is a direct causal link between two variables, there should be a statistical correlation between them. This makes it possible to only condition on potential causes that are themselves correlated with the effect, rather than testing all possible subsets of potential causes. While it is theoretically possible that a set of causal links exactly cancel each other out, such that a cause and its effect appear completely uncorrelated, it is highly unlikely. In this study, where we assume that each cause monotonically increases the risk of a particular effect, such a cancellation of effects is even less likely.

It may occur that no single unambiguous set can be found, but two or more sets of potential causes mutually make each other non-significant. This may happen if, e.g., two treatments are usually given together and thus highly correlated. In this case, conditioning on one treatment will make the other one independent of any late effects and vice versa. If this occurs, it is not possible to determine which of the treatments is the actual cause. In this study, if no single unambiguous set of potential causes could be found, then all sets of potential causes were noted as alternative explanations and left for manual domain-specific analysis to determine whether one was more biologically relevant than the other.
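The two steps can be sketched in Python as follows. This is a deliberately simplified illustration under stated assumptions, not the algorithm as implemented in the study: `p_value` is a hypothetical callable that returns the significance of a cause for one outcome given a conditioning set, `max_cond` is a hypothetical cap on the size of conditioning sets, and every candidate is tested against the full list from step one so that the result does not depend on evaluation order.

```python
from itertools import combinations


def step_one(causes, p_value, alpha):
    """Keep the causes that are significantly correlated with the outcome."""
    return [c for c in causes if p_value(c, frozenset()) < alpha]


def step_two(candidates, p_value, alpha, max_cond=2):
    """Drop candidates that become non-significant under conditioning.

    Each candidate is conditioned on subsets of the *other* step-one
    candidates only, which is what the faithfulness condition permits.
    """
    kept = []
    for cause in candidates:
        others = [c for c in candidates if c != cause]
        explained_away = any(
            p_value(cause, frozenset(subset)) >= alpha
            for size in range(1, min(max_cond, len(others)) + 1)
            for subset in combinations(others, size)
        )
        if not explained_away:
            kept.append(cause)
    return kept


def toy_p_value(cause, given):
    """Made-up significances: B is only an indirect effect via A, C is noise."""
    if cause == "C":
        return 0.5
    if cause == "B" and "A" in given:
        return 0.3
    return 0.0001


if __name__ == "__main__":
    candidates = step_one(["A", "B", "C"], toy_p_value, alpha=0.01)
    print(step_two(candidates, toy_p_value, alpha=0.01))
```

One limitation of this simplification is worth noting: if two candidates explain each other away, both are dropped here, whereas the study recorded such sets as alternative explanations for manual review.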

Analysis

The cohort was analyzed using the described method. A significance level of 0.01 was used in the conditional independence test to find potential causes and to sieve out non-direct correlations. In total, 108 treatment components, 51 primary cancer diagnoses (occurring between ages 0 and 18 years), and five predisposing conditions were investigated as potential causes for 1332 diagnostic outcome codes. Outcomes occurring at least 5 years after the first primary malignancy were counted as diagnostic outcomes, but relapses were not. This resulted in 814 significant correlations after the first step, and 274 remaining proposed causal relations after the second step. Of the 274 relations, 146 were at a significance level in the interval 0.001–0.01, 32 at a significance level in the interval 0.0001–0.001, and 98 at a significance level in the interval 0.0–0.0001. Only the last interval was considered sufficiently significant (see Supplementary Methods “Choice of significance level” for further motivation of this choice), and the resultant 98 relations were taken for validation. Mortality rates were analyzed according to the Kaplan–Meier method[14]. The frequency of the presumed novel associations was calculated by dividing the number of individuals by the accrued number of person-years for the control population, CCS without the specific outcome code, and CCS with the specific outcome code, respectively.
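For readers unfamiliar with the two descriptive calculations mentioned here, the following Python sketch shows a textbook Kaplan–Meier product-limit estimator and a crude frequency per person-year. It is a generic illustration, not the software used in the study; the function names and the toy data are hypothetical.

```python
def kaplan_meier(durations, died):
    """Product-limit survival estimate.

    durations: follow-up time per individual.
    died: True if the individual died at that time, False if censored.
    Returns [(time, survival just after time)] for each distinct death time.
    """
    data = sorted(zip(durations, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who leaves the risk set at this time.
        while i < len(data) and data[i][0] == time:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


def events_per_person_year(n_individuals, person_years):
    """Crude frequency: individuals with the outcome per accrued person-year."""
    return n_individuals / person_years


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
    print(events_per_person_year(5, 1000.0))
```

Censored individuals leave the risk set without lowering the survival estimate, which is what distinguishes the product-limit estimate from a simple proportion of deaths.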

Validation of results

The results were graded together with medical expertise and extensive literature searches into classes: Category A: previously known or reported effect, Category B: previously unknown effect, and Category C: other; further divided into: C1: probable bias in registering, C2: probable bias due to increased surveillance or awareness after the first cancer, C3: data quality and congregation issues, and C4: suspicion of time era as a confounding factor. As the pursuit of confirmations of actual outcomes in medical charts was not possible, the 98 associations were tested for reliability by determination of time of diagnosis code registration. Associations that (1) preceded the CCD or (2) besides occurring 5 years post CCD also occurred within the period 0.5–5 years post CCD were identified. We chose 0.5 years post CCD to exclude acute effects of the induction phase of childhood cancer treatment. A subset of associations contained ICD-10 chapter C or chapter D outcome codes as consequences of a pediatric malignancy and was scrutinized for the source of the outcome code. The associations where the outcome codes originated solely in the outpatient setting are indicated, as the reliability of these outcome codes is more uncertain. A significance level of 0.001 was used to verify whether an association remained intact during the additional tests controlling for the factors time and source of registration code.

Table 1

Descriptive characteristics of the cohort and the control population.

	Control group	CCS
Total	11,882	2315
Alive	11,716 (98.6)	2145 (92.6)
Deceased	166 (1.4)	171 (7.4)
Male	5877 (49.5)	1152 (49.7)
Female	6005 (50.5)	1164 (50.3)
Number of 5-year survivors per pediatric cancer diagnosis group[81]
Group		n, (%)
I	Leukemias	481 (20.8)
II	Lymphomas	292 (12.6)
III	CNS	561 (24.2)
IV	Neuroblastoma	87 (3.8)
V	Retinoblastoma	49 (2.1)
VI	Renal tumors	113 (4.9)
VII	Hepatic tumors	13 (0.6)
VIII	Malignant bone tumors	108 (4.7)
IX	Soft tissue sarcomas	95 (4.1)
X	Germ cell tumors	110 (4.7)
XI	Other malignant epithelial neoplasms and malignant melanomas	135 (5.8)
XII	Other malignant neoplasms	271 (11.7)
	Total	2315 (100)

The characteristics of the individuals contributing to the study are shown for the control group and childhood cancer survivors (CCS), respectively (total number and % of total). For the CCS group, the distribution of childhood cancer diagnoses is shown.

n, number; CCS, childhood cancer survivor; CNS, central nervous system.

Table 2

The presumed novel associations (p < 0.0001) between childhood cancer diagnosis (CCD), treatment details and outcomes (shown in plain text and International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) codes).

Association number (as found in Supplementary Data 3)	Possible cause	Outcome	Literature references
2	Alkylating agents	Disorders of lacrimal system (H04)	[70–72]
7	Topoisomerase inhibitors	Viral infection of unspecified site (B34)	n.a.
49	Surgery in pituitary region	Viral intestinal infection (A08)	n.a.
50	Surgery in pituitary region	Gastroenteritis and colitis of infectious origin (A09)	n.a.
72	Malignant neoplasm of cervix uteri (C53)	Inflammatory disease of uterus, except cervix (N71 | female)	n.a.

Analysis was performed with the developed method as described and validated with domain expertise. The sample size was n = 2315. These five associations were graded B as presumed novel in the validation step with domain expertise and literature searches. They were regarded as intact as they did not originate from the outpatient setting, were not present before the CCD, and did not occur within the 0.5–5 years post CCD time frame. The 98 associations are shown in full in Supplementary Data 3 and the statistical basis for them is shown in Supplementary Data 4.

“| female” or “| male” indicates that the association was based on only one sex.

International Statistical Classification of Diseases and Related Health Problems 10th Revision (ICD-10) codes in parentheses.

n.a., not applicable, as none were found.

75 in total

1. Health care use of long-term survivors of childhood cancer: the British Childhood Cancer Survivor Study.

Authors: Cornelia E Rebholz; Raoul C Reulen; Andrew A Toogood; Clare Frobisher; Emma R Lancashire; David L Winter; Claudia E Kuehni; Michael M Hawkins
Journal: J Clin Oncol Date: 2011-09-26 Impact factor: 44.544

2. Increased Risk of All Cardiovascular Disease Subtypes Among Childhood Cancer Survivors: Population-Based Matched Cohort Study.

Authors: Ashna Khanna; Priscila Pequeno; Sumit Gupta; Paaladinesh Thavendiranathan; Douglas S Lee; Husam Abdel-Qadir; Paul C Nathan
Journal: Circulation Date: 2019-08-26 Impact factor: 29.690

Review 3. Recommendations for gonadotoxicity surveillance in male childhood, adolescent, and young adult cancer survivors: a report from the International Late Effects of Childhood Cancer Guideline Harmonization Group in collaboration with the PanCareSurFup Consortium.

Authors: Roderick Skinner; Renee L Mulder; Leontien C Kremer; Melissa M Hudson; Louis S Constine; Edit Bardi; Annelies Boekhout; Anja Borgmann-Staudt; Morven C Brown; Richard Cohn; Uta Dirksen; Alexsander Giwercman; Hiroyuki Ishiguro; Kirsi Jahnukainen; Lisa B Kenney; Jacqueline J Loonen; Lilian Meacham; Sebastian Neggers; Stephen Nussey; Cecilia Petersen; Margarett Shnorhavorian; Marry M van den Heuvel-Eibrink; Hanneke M van Santen; William H B Wallace; Daniel M Green
Journal: Lancet Oncol Date: 2017-02 Impact factor: 41.316

4. Risk Factors of Subsequent Central Nervous System Tumors after Childhood and Adolescent Cancers: Findings from the French Childhood Cancer Survivor Study.

Authors: Neige Marie Yvanne Journy; Wael Salem Zrafi; Stéphanie Bolle; Brice Fresneau; Claire Alapetite; Rodrigue Setcheou Allodji; Delphine Berchery; Nadia Haddy; Isao Kobayashi; Martine Labbé; Hélène Pacquement; Claire Pluchart; Boris Schwartz; Vincent Souchard; Cécile Thomas-Teinturier; Cristina Veres; Giao Vu-Bezin; Ibrahima Diallo; Florent de Vathaire
Journal: Cancer Epidemiol Biomarkers Prev Date: 2020-10-08 Impact factor: 4.254

Review 5. Male fertility and strategies for fertility preservation following childhood cancer treatment.

Authors: R T Mitchell; P T K Saunders; R M Sharpe; C J H Kelnar; W H B Wallace
Journal: Endocr Dev Date: 2009-03-03

6. Ultrasound screening for thyroid carcinoma in childhood cancer survivors: a case series.

Authors: Enrico Brignardello; Andrea Corrias; Giuseppe Isolato; Nicola Palestini; Luca Cordero di Montezemolo; Franca Fagioli; Giuseppe Boccuzzi
Journal: J Clin Endocrinol Metab Date: 2008-09-23 Impact factor: 5.958

7. Hypothyroidism after Radiation Therapy for Childhood Cancer: A Report from the Childhood Cancer Survivor Study.

Authors: Peter D Inskip; Lene H S Veiga; Alina V Brenner; Alice J Sigurdson; Evgenia Ostroumova; Eric J Chow; Marilyn Stovall; Susan A Smith; Rita E Weathers; Wendy Leisenring; Leslie L Robison; Gregory T Armstrong; Charles A Sklar; Jay H Lubin
Journal: Radiat Res Date: 2018-05-15 Impact factor: 2.841

8. Late Cardiotoxicity in Aging Adult Survivors of Childhood Cancer.

Authors: Gregory T Armstrong; Jordan D Ross
Journal: Prog Pediatr Cardiol Date: 2014-09-01

9. Hearing Loss in Patients Who Received Cranial Radiation Therapy for Childhood Cancer.

Authors: Johnnie K Bass; Chia-Ho Hua; Jie Huang; Arzu Onar-Thomas; Kirsten K Ness; Skye Jones; Stephanie White; Shaum P Bhagat; Kay W Chang; Thomas E Merchant
Journal: J Clin Oncol Date: 2016-01-25 Impact factor: 44.544

10. Long-term inpatient disease burden in the Adult Life after Childhood Cancer in Scandinavia (ALiCCS) study: A cohort study of 21,297 childhood cancer survivors.

Authors: Sofie de Fine Licht; Kathrine Rugbjerg; Thorgerdur Gudmundsdottir; Trine G Bonnesen; Peter Haubjerg Asdahl; Anna Sällfors Holmqvist; Laura Madanat-Harjuoja; Laufey Tryggvadottir; Finn Wesenberg; Henrik Hasle; Jeanette F Winther; Jørgen H Olsen
Journal: PLoS Med Date: 2017-05-09 Impact factor: 11.069