
Classification tree analysis of second neoplasms in survivors of childhood cancer.

Janez Jazbec, Ljupco Todorovski, Berta Jereb.

Abstract

BACKGROUND: Reports on childhood cancer survivors put the cumulative probability of developing a secondary neoplasm at 3.3% to 25% at 25 years from diagnosis, and the risk of developing another cancer at several times that of the general population.
METHODS: In this retrospective study, we applied the classification tree multivariate method to a group of 849 first-cancer survivors to identify the childhood cancer patients at greatest risk of developing a secondary neoplasm.
RESULTS: In the observed group, 34 patients developed a secondary neoplasm after treatment of their primary cancer. Analysis of the parameters present at treatment of the first cancer revealed two groups of patients at particular risk of a secondary neoplasm. The first comprises female patients treated for Hodgkin's disease between the ages of 10 and 15 years whose treatment included radiotherapy. The second comprises male patients with acute lymphoblastic leukemia who were treated between the ages of 4.6 and 6.6 years.
CONCLUSION: The risk groups identified in our study are similar to those found by studies using more conventional approaches. The usefulness of our approach for studying the occurrence of second neoplasms should be confirmed in a larger study, but the user-friendly presentation of its results makes it attractive for further studies.


Year:  2007        PMID: 17270060      PMCID: PMC1802085          DOI: 10.1186/1471-2407-7-27

Source DB:  PubMed          Journal:  BMC Cancer        ISSN: 1471-2407            Impact factor:   4.430


Background

As the number of childhood cancer survivors grows and the period of follow-up lengthens, increasing attention is directed towards the delayed adverse effects of therapy. The late effects of treatment have been described for many organ systems, including the cardiovascular, skeletal, endocrine, dental, hepatic, pulmonary and renal systems. Psychosocial, educational and neuropsychological problems are also common, but among the most serious delayed complications is the appearance of second neoplasms (SN). The better the treatment results for the primary malignancy become, the more the long-term results may be compromised by secondary cancers [1]. Reported cumulative probabilities of developing an SN range from 3.3% to 25% at 25 years from diagnosis, and the risk of developing another cancer can be up to 35 times greater than in the general population [2]. SNs develop through the interaction of many independent factors to which the patient is exposed before, during and after treatment of the first malignancy. Some of these factors may have synergistic oncogenic effects, and because of the long latency period, prospective studies designed to identify them are difficult to carry out. In this retrospective study, we used the decision tree multivariate method to identify the group of childhood cancer patients at greatest risk of developing an SN.

Methods

Patients

The study included 1577 cancer patients younger than 16 years of age registered at the Cancer Registry of Slovenia between 1-1-1961 and 12-10-2000. The decision tree analysis was performed on a group of 849 first-cancer survivors, of whom 34 developed an SN. An SN was defined as a malignant neoplasm in a new location that was neither the result of direct spread nor a metastasis of the primary neoplasm. A neoplasm in the same location as the primary but of a different histological type was also counted as an SN [3]. Primary neoplasms were categorized by histology as: leukemia, Hodgkin's disease, non-Hodgkin lymphoma, Ewing sarcoma, osteogenic tumors, nephroblastoma, neuroblastoma, hepatoblastoma, rhabdomyosarcoma, retinoblastoma, thyroid cancer, germ-cell tumors, tumors of the central nervous system (CNS) and others. The "others" group consisted of 41 carcinomas of various organ systems and two melanomas; these were pooled because each individual category was too small for separate analysis. The database recorded the patient's name, sex, date of birth, clinical diagnosis, histologic type of the neoplasm, date of diagnosis, treatment modality, and date and status at last follow-up. Detailed information on chemotherapy and radiotherapy was not included in the database. Table 1 presents the independent and dependent variables used for the multivariate analysis.
Table 1

Description and values of the independent variables and the dependent variable (last row) used for multivariate analysis.

Variable name | Description
Sex | male (485), female (346)
age_at_diagnosis | numeric
histology_type (categories) | leukemia, Hodgkin's disease, non-Hodgkin lymphoma, Ewing sarcoma, osteogenic tumors, nephroblastoma, neuroblastoma, hepatoblastoma, rhabdomyosarcoma, retinoblastoma, thyroid cancer, germ-cell tumors, tumors of central nervous system, others
Surgery | yes (481), no (368)
Radiotherapy | yes (500), no (349)
Chemotherapy | yes (598), no (251)
second_neoplasm | yes (34), no (815)
All data were collected through the childhood cancer follow-up program in Slovenia. A single pediatric oncology center in the Department of Pediatrics, University Medical Center, Ljubljana, serves as the national referral center for all pediatric patients with malignant diseases, covering the population of Slovenia of approximately 2 million. After the end of treatment, all children are followed at the center until the end of schooling or for at least four years. After that, they are followed at the outpatient Clinic for Late Effects at the Institute of Oncology, where a team headed by an oncologist known to the patient as a member of the pediatric follow-up team continues follow-up for life [4]. Fewer than 5% of patients, all treated before 1990, were lost to follow-up because of permanent migration outside the territory of the Republic of Slovenia. The study was performed in compliance with the Declaration of Helsinki, with approval N° 38/11/96 of the National Medical Ethics Committee of Slovenia.

Classification tree analysis

A classification tree is a multivariate method that allows the study of the simultaneous influence of a series of independent variables on a single dependent variable. The analysis proceeds by successively dividing the original group of cases into pairs of subgroups, each division being based on the value of a single independent variable. The variable that produces the purest pair of subgroups is chosen for the division (often referred to as a split). Purity reflects the fraction of cases in a group that share the same value of the dependent variable: a completely pure group contains only cases with the same outcome. Each subgroup of the pair becomes a parent group in the next step of the analysis and is divided further in the same way. Division stops when a group is completely pure or when it contains fewer than a user-defined minimum number of cases. In our study, the C4.5 program [5] was used to build classification trees. C4.5 allows the setting of several parameters that influence the branching and quality of the final tree: most notably, one parameter determines the smallest number of cases in a single group (mentioned above), and another determines the degree of post-pruning. For details, please refer to the description in [5]. The optimal values of these parameters were determined using a standard cross-validation method [6-8]: we systematically tried different combinations of parameter settings, used cross-validation to estimate the performance of each resulting tree on unseen cases, and chose the settings that led to the best performance. Using these optimal settings, we built the tree that is used in further analysis and presented in the next section.
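To make the splitting procedure concrete, here is a minimal Python sketch of greedy binary recursive partitioning with the minimum-group-size stopping rule described above. It is not C4.5: C4.5 scores candidate splits with an entropy-based gain ratio, handles numeric thresholds and applies post-pruning, whereas this sketch uses plain information gain (entropy reduction), only tests `feature == value`, and does no pruning. Entropy is used instead of a raw majority fraction because, with a class as rare as SN, both children of most splits stay majority non-SN and the size-weighted majority fraction barely moves. All function names and the toy cohort are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the outcome labels; 0.0 means a completely pure group."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, features, min_cases):
    """Return (gain, feature, value) for the test `row[feature] == value` that most
    reduces entropy while leaving at least `min_cases` cases on each side, or None."""
    parent, n = entropy(labels), len(labels)
    best = None
    for f in features:
        for v in sorted({r[f] for r in rows}, key=str):
            yes = [y for r, y in zip(rows, labels) if r[f] == v]
            no = [y for r, y in zip(rows, labels) if r[f] != v]
            if len(yes) < min_cases or len(no) < min_cases:
                continue  # one subgroup would be too small
            gain = parent - (len(yes) * entropy(yes) + len(no) * entropy(no)) / n
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, f, v)
    return best

def grow(rows, labels, features, min_cases=2):
    """Split recursively until a group is pure or no admissible split remains."""
    majority = Counter(labels).most_common(1)[0][0]
    split = None if entropy(labels) == 0.0 else best_split(rows, labels, features, min_cases)
    if split is None:
        return {"leaf": majority, "n": len(labels)}
    _, f, v = split
    yes = [i for i, r in enumerate(rows) if r[f] == v]
    no = [i for i, r in enumerate(rows) if r[f] != v]
    return {
        "test": (f, v),
        "yes": grow([rows[i] for i in yes], [labels[i] for i in yes], features, min_cases),
        "no": grow([rows[i] for i in no], [labels[i] for i in no], features, min_cases),
    }

def predict(tree, row):
    """Follow the tests from the root to a leaf and return the leaf's majority outcome."""
    while "leaf" not in tree:
        f, v = tree["test"]
        tree = tree["yes"] if row[f] == v else tree["no"]
    return tree["leaf"]

# Hypothetical toy cohort in which SN occurs only in irradiated girls.
rows = [
    {"radiotherapy": "yes", "sex": "female"}, {"radiotherapy": "yes", "sex": "female"},
    {"radiotherapy": "yes", "sex": "male"}, {"radiotherapy": "yes", "sex": "male"},
    {"radiotherapy": "no", "sex": "female"}, {"radiotherapy": "no", "sex": "female"},
    {"radiotherapy": "no", "sex": "male"}, {"radiotherapy": "no", "sex": "male"},
]
labels = ["SN", "SN", "non-SN", "non-SN", "non-SN", "non-SN", "non-SN", "non-SN"]
tree = grow(rows, labels, ["radiotherapy", "sex"], min_cases=2)
print(tree)
```

Each call to `grow` plays the role of one "parent group" step in the text; raising `min_cases` stops branching earlier, which is the same lever as the minimal-cases parameter tuned in the study.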
We tried 5 values for the minimal number of cases in a group (1 to 5) and 7 values for the post-pruning confidence (1, 5, 10, 25, 50, 75 and 99%), giving 5 × 7 = 35 parameter settings. The usual performance measure for classification trees is accuracy in predicting the outcome (the value of the dependent variable) on cases not seen during tree building. However, because SN was observed in only a small minority of patients (about 4%), the classification tree algorithm tends to build a single group that classifies all patients as non-SN cases. This trivial tree has a prediction accuracy of 96%, which cannot be significantly improved upon, yet it misclassifies every patient with an SN as a non-SN case. For the patient, this misclassification is much more serious than the opposite one, in which a non-SN patient is predicted to develop an SN. The standard way to deal with this in classification trees is to assign different costs to the two types of misclassification, i.e., to specify that misclassifying an SN patient as a non-SN case is X times worse (more costly) than misclassifying a patient in the opposite direction, where X is a user-specified parameter. In business applications, misclassifications can often be related directly to costs, which can then be used to set X. In our case this is non-trivial: we know only that X is larger than 1. We therefore used the cross-validation procedure outlined above to find the optimal setting for X as well, choosing the setting that led to the minimal number of SN patients misclassified as non-SN cases.
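The arithmetic behind the trivial tree and the cost parameter X can be sketched as follows. A common way to make a tree cost-sensitive, assumed here because the paper does not say how its C4.5 set-up applies costs, is to label each group (leaf) with the class of lower total misclassification cost. The function names are hypothetical.

```python
def trivial_accuracy(n_sn: int, n_non_sn: int) -> float:
    """Accuracy of a one-group tree that labels every patient as non-SN."""
    return n_non_sn / (n_sn + n_non_sn)

def leaf_label(n_sn: int, n_non_sn: int, x: float) -> str:
    """Minimum-cost label for a group holding n_sn SN and n_non_sn non-SN cases.

    Labelling the group non-SN misclassifies its n_sn SN patients at cost x each;
    labelling it SN misclassifies its n_non_sn other patients at cost 1 each.
    Ties go to non-SN.
    """
    return "SN" if n_sn * x > n_non_sn else "non-SN"

# Whole cohort of 849 survivors, 34 with SN.
print(f"trivial tree accuracy: {trivial_accuracy(34, 815):.1%}")

# A group with 5 SN and 6 non-SN cases, like the first risk group in Table 3.
for x in (1, 5, 25):
    print(f"X = {x}: group labelled {leaf_label(5, 6, x)}")
```

With X = 1 such a group is labelled non-SN; any X above 6/5 flips it to SN. Raising X therefore pushes more groups, and with them more non-SN patients, into the SN class, which is the trade-off the cross-validation search has to balance.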
In the experiments with C4.5, we increased the cost of this type of misclassification using 7 different settings, from the default of 1:1 (equal costs for both types of misclassification) through 5:1, 10:1, 25:1, 50:1, 100:1 and 200:1 (cost of misclassifying an SN patient : cost of misclassifying a non-SN patient). Finally, note that because we used an alternative performance criterion, the classification tree obtained with the cross-validation procedure outlined above is not expected to classify cases accurately into SN and non-SN classes. Rather than using the tree as an accurate predictor, we are interested in analyzing its structure and identifying risk groups in which the incidence of SN is significantly higher than that observed in the whole population of 849 cancer survivors included in the study.
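The tuning loop can be sketched as below. Two assumptions are made that the paper does not state: stratified k-fold cross-validation (a common choice when one class is rare) with k = 10, and crossing the 7 cost settings with all 35 pruning/minimal-cases settings, giving 7 × 35 = 245 combinations. Training and evaluating a tree inside each fold is left out; all names are hypothetical.

```python
import random
from itertools import product

# Settings listed in the text; crossing all three lists is an assumption.
MIN_CASES = [1, 2, 3, 4, 5]
PRUNING_CONFIDENCE = [1, 5, 10, 25, 50, 75, 99]  # percent
COST_RATIOS = [1, 5, 10, 25, 50, 100, 200]      # X in the X:1 notation

def parameter_grid():
    """Every (cost ratio, pruning confidence, minimal cases) combination."""
    return list(product(COST_RATIOS, PRUNING_CONFIDENCE, MIN_CASES))

def stratified_folds(labels, k=10, seed=0):
    """Split case indices into k folds, dealing each class out round-robin so
    every fold keeps roughly the cohort's SN:non-SN ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

def select_setting(results):
    """results maps a setting to (missed_sn, false_alarms) summed over the folds.
    Choose the fewest SN patients classified as non-SN; ties are broken by fewer
    false alarms (the tie-break is an assumption, not taken from the study)."""
    return min(results, key=lambda s: results[s])

labels = ["SN"] * 34 + ["non-SN"] * 815
folds = stratified_folds(labels, k=10)
print(len(parameter_grid()), "settings;",
      "SN cases per fold:", [sum(labels[i] == "SN" for i in f) for f in folds])

# Cross-validated counts for the three settings reported in Table 2,
# keyed as (cost ratio, pruning confidence %, minimal cases).
table_2 = {(5, 1, 2): (34, 0), (10, 1, 2): (26, 66), (25, 50, 2): (3, 18)}
print("selected:", select_setting(table_2))
```

Stratification matters here: with only 34 SN cases, an unstratified split could leave some folds with almost no SN patients, making the missed-SN count noisy.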

Results

A highly branched tree in which most cases are misclassified may result from a low event rate or from the low predictive value of the factors used to build the tree. In the analysis of the entire group of 1577 childhood cancer patients, SN cases made up less than 3%, and all of them were classified as non-SN cases. We therefore restricted our analysis to the group of children who survived their first cancer: 849 patients, of whom 34 developed an SN. We built several classification tree models with different misclassification costs, treating the misclassification of an SN case as non-SN as a more severe mistake than the reverse. At one extreme, with a misclassification cost of 5:1, the tree allocated all SN cases to the group without SN. At the other, when the misclassification cost was set too high, too many cases without SN were classified as SN patients. Table 2 presents a sample of classification results obtained with three different parameter settings, including the optimal setting, in which the misclassification cost was set to 25:1, the post-pruning confidence to 50% and the minimal number of cases in a group to 2.
Table 2

Comparison of classification results obtained using cross-validation for three different C4.5 parameter settings.

Misclassification cost setting 5:1, post-pruning confidence 1%
Observed \ Classified as | SN | Non-SN
SN | 0 | 34
Non-SN | 0 | 815

Misclassification cost setting 10:1, post-pruning confidence 1%
Observed \ Classified as | SN | Non-SN
SN | 8 | 26
Non-SN | 66 | 749

Misclassification cost setting 25:1, post-pruning confidence 50%
Observed \ Classified as | SN | Non-SN
SN | 31 | 3
Non-SN | 18 | 797

The settings comprise three different misclassification costs, two different post-pruning confidence values, and the default value of 2 for the minimal number of cases in a group. In each table, the number of SN patients misclassified as non-SN is the value in the observed SN row, Non-SN column. To induce the final tree, presented in Figure 1, we selected the parameter setting that led to the minimal number of such misclassifications.
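For reading Table 2, sensitivity (the share of SN patients classified as SN), specificity (the share of non-SN patients classified as non-SN) and overall accuracy can be computed from each confusion matrix. The helper below is a hypothetical illustration; the counts are those of Table 2.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity and accuracy of a 2x2 confusion matrix.

    tp: SN classified as SN        fn: SN classified as non-SN
    fp: non-SN classified as SN    tn: non-SN classified as non-SN
    """
    n = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / n,
    }

# Counts from Table 2 as (tp, fn, fp, tn).
TABLE_2 = {
    "5:1, 1%": (0, 34, 0, 815),
    "10:1, 1%": (8, 26, 66, 749),
    "25:1, 50%": (31, 3, 18, 797),
}

for setting, counts in TABLE_2.items():
    m = summarize(*counts)
    print(f"{setting:>10}  sensitivity={m['sensitivity']:.3f}  "
          f"specificity={m['specificity']:.3f}  accuracy={m['accuracy']:.3f}")
```

Under these counts the 25:1 setting misses 3 of the 34 SN patients, against 26 at 10:1 and all 34 at 5:1.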

On the basis of the results presented in Table 2, we chose the parameter setting that gave the lowest number of SN cases misclassified as non-SN cases. Figure 1 depicts the classification tree obtained with this setting.
Figure 1

Classification tree for the risk of a secondary neoplasm after treatment for childhood cancer. Analysis of 849 childhood cancer survivors from the Cancer Registry of Slovenia.

Even with the optimal setting, the tree remains considerably branched. There are many groups containing single SN cases and some clusters in which misclassified non-SN cases predominate. In the pruned tree, the first factor that divides our cohort is radiotherapy. Among patients treated without radiotherapy, only 1.4% developed an SN, considerably less than among irradiated patients (5.8%). From this point, two paths can be followed. The first encompasses patients with Hodgkin's disease: at the end of this branch, a group of females aged between 10 and 15 years at first diagnosis and treated with chemotherapy can be identified, in which the risk of an SN reaches 45%. The other path reveals a group of male patients with acute leukemia who were aged between 4.6 and 6.6 years at diagnosis; in this group, the risk of an SN reaches 40%. Both incidence rates, 45% and 40%, are significantly higher than the 4% incidence observed in the whole population, as a simple chi-squared test confirms (Table 3).
Table 3

Comparison of SN incidence in the two risk groups, identified using the classification tree from Figure 1, with the incidence of SN in the whole observed population

First identified risk group (girls with Hodgkin's lymphoma): chi-squared p-value much smaller than 0.01 (in the range of 10^-11)
Group of patients | SN | Non-SN | Total
Identified risk group | 5 | 6 | 11
Others | 29 | 809 | 838
All included in the study | 34 | 815 | 849

Second identified risk group (boys with acute lymphoblastic leukemia): chi-squared p-value much smaller than 0.01 (in the range of 10^-9)
Group of patients | SN | Non-SN | Total
Identified risk group | 4 | 6 | 10
Others | 30 | 809 | 839
All included in the study | 34 | 815 | 849

The chi-squared test shows a significant difference in both cases.
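The Table 3 comparison can be reproduced with a minimal, standard-library-only sketch of Pearson's chi-squared test on a 2x2 table, here comparing each risk group with the remaining survivors. The paper does not state whether a continuity correction was applied or whether each group was compared with the other survivors or with the whole cohort, so the p-values printed here need not match Table 3 digit for digit, although both fall far below 0.01. The function name is hypothetical.

```python
import math

def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared test (1 degree of freedom, no continuity correction)
    for the table
                SN   non-SN
        group    a      b
        others   c      d
    Returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df the statistic is a squared standard normal, so
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# (SN, non-SN) in the risk group, then (SN, non-SN) among the other survivors.
groups = {
    "girls with Hodgkin's disease": (5, 6, 29, 809),
    "boys with acute lymphoblastic leukemia": (4, 6, 30, 809),
}
for name, (a, b, c, d) in groups.items():
    stat, p = chi_squared_2x2(a, b, c, d)
    print(f"{name}: incidence {a / (a + b):.0%} vs {(a + c) / (a + b + c + d):.0%} overall, "
          f"chi2 = {stat:.1f}, p = {p:.1e}")
```

The expected SN count in an 11-patient group is only about 0.4, so the chi-squared approximation is rough here; an exact test would be the more conservative check, but the conclusion of a strongly elevated incidence does not depend on it.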

Discussion and conclusion

Risk estimates generally vary between hospital-based and population-based studies [2], probably owing to more complete follow-up in the former. Our population-based study also differs from similar studies in defining the period at risk for SN as starting from the diagnosis of the primary cancer. Varying cure rates in different time periods also affect the estimated risk; the period covered by our study starts in the early seventies, when the cure rate of childhood cancer was still very low. In our study of 849 childhood cancer survivors, we performed a multivariate analysis using classification trees to identify groups at special risk of developing an SN. The group at highest risk was identified as girls with Hodgkin's disease, aged between 10 and 16 years at first diagnosis, who were treated with combined chemotherapy and radiotherapy. In all of these cases, the SN was a carcinoma, with a latent period ranging from 3 to 16.5 years after treatment of the Hodgkin's disease. These results are similar to the observation of Beaty et al. [9], who found a statistically significantly higher risk of SN in adolescents treated for Hodgkin's disease. Bhatia and coworkers found a 6.7-fold higher risk of SN in patients treated for Hodgkin's disease between 10 and 16 years of age [10]. They also found the risk of secondary solid tumors after combined chemotherapy and radiotherapy to be twice that after chemotherapy without radiotherapy. It is possible that some tissues are particularly vulnerable to the carcinogenic effects of chemotherapy and radiotherapy during puberty. The challenge is to maintain the high cure rate in Hodgkin's disease while reducing the risk of second malignancies. Some modern treatment protocols for Hodgkin's disease have already reduced or completely omitted radiotherapy for patients with low-stage disease.
Löning et al. found radiation therapy to be a significant risk factor for SN after treatment of childhood acute lymphoblastic leukemia [11], in contrast with the results of Dalton et al. [12]. Löning also states that young children are at particularly increased risk when irradiation has been used. As reported in several studies [13,14], intensive chemotherapy regimens do not predict a higher risk. In the Childhood Cancer Survivor Study, a diagnosis of leukemia was independently associated with the occurrence of a second malignant tumor of the central nervous system, as was younger age at diagnosis [15]. The improved survival rate of children with cancer should not be overshadowed by the incidence of SNs. Nonetheless, patients and health care providers should be aware of the populations at greatest risk for this serious complication and focus their efforts on primary and secondary prevention in this vulnerable population.
Using the C4.5 algorithm for building classification trees, we were able to construct subgroups at different levels of risk through logical combinations of patients' characteristics. The risk groups identified in our study are similar to those found in studies using more conventional approaches. In contrast to traditional regression methods (e.g., Cox proportional hazards regression), which compute a prognostic index as a weighted sum of patients' characteristics, the subgroups of a classification tree model are based directly on the patients' characteristics. The model shows the relationships among the independent variables and their influence on the outcome [16]. Another advantage of the method is its simple and intuitive nature (i.e., find the best split by examining all possible splits on all available variables, form subgroups based on this split, and repeat within each subgroup) [17].
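To illustrate how tree paths become logical combinations of patient characteristics rather than a weighted score, the two risk groups described in the Results can be written as plain rules. This is a hypothetical sketch: the `Patient` fields follow Table 1, the age boundaries are assumed to be inclusive, and the leukemia rule does not require radiotherapy because the Abstract does not mention it; the exact conditions would have to be read off Figure 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patient:
    sex: str                 # "male" or "female"
    age_at_diagnosis: float  # years
    histology_type: str      # a Table 1 category, e.g. "Hodgkin's disease", "leukemia"
    radiotherapy: bool
    chemotherapy: bool

def risk_group(p: Patient) -> Optional[str]:
    """Return the name of the tree-derived risk group the patient falls into, or None."""
    # Path 1: irradiated girls with Hodgkin's disease, aged 10-15, also given chemotherapy.
    if (p.radiotherapy and p.chemotherapy and p.sex == "female"
            and p.histology_type == "Hodgkin's disease"
            and 10 <= p.age_at_diagnosis <= 15):
        return "girls with Hodgkin's disease"
    # Path 2: boys with leukemia diagnosed between 4.6 and 6.6 years of age.
    if (p.sex == "male" and p.histology_type == "leukemia"
            and 4.6 <= p.age_at_diagnosis <= 6.6):
        return "boys with leukemia"
    return None

print(risk_group(Patient("female", 12.0, "Hodgkin's disease", True, True)))
print(risk_group(Patient("male", 5.2, "leukemia", False, True)))
print(risk_group(Patient("female", 12.0, "Hodgkin's disease", False, True)))
```

A Cox model would instead give every patient one continuous score; here membership is a yes/no answer to a conjunction of conditions, which is what makes the tree's output easy to read and communicate.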
Classification trees have been used in medical and health care applications for more than 20 years and have proved to be a powerful classification tool in various areas [18]. In oncology, the method has been used for tumor classification and the evaluation of biomarkers [19-23]. The sample size is a limitation of our study, but the method used is a potentially powerful tool for investigating multilevel interactions [24]. The occurrence of secondary neoplasms may well result from complex interactions among several independent factors, such as genetic predisposition, treatment-related factors and environmental exposures. Applying the approach described here to a larger sample might serve to validate the technique.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

All authors read and approved the final manuscript. JJ carried out patient recruitment and the acquisition and interpretation of the data, and drafted the manuscript. LT performed the statistical analysis and drafted the manuscript. BJ, leader of the Late effect study group, participated in the design of the study, carried out patient recruitment and gave final approval of the version to be published.

Pre-publication history

The pre-publication history for this paper can be accessed here:

References (10 of 19 listed)

Secondary malignancies after multimodality treatment regimens.
Authors: Reinhard Kodym
Journal: Front Radiat Ther Oncol   Date: 2002

Classification tree analysis: a statistical tool to investigate risk factor interactions with an example for colon cancer (United States).
Authors: Nicola J Camp; Martha L Slattery
Journal: Cancer Causes Control   Date: 2002-11

Decision trees: an overview and their use in medicine.
Authors: Vili Podgorelec; Peter Kokol; Bruno Stiglic; Ivan Rozman
Journal: J Med Syst   Date: 2002-10

Secondary neoplasms subsequent to Berlin-Frankfurt-Münster therapy of acute lymphoblastic leukemia in childhood: significantly lower risk without cranial radiotherapy.
Authors: L Löning; M Zimmermann; A Reiter; P Kaatsch; G Henze; H Riehm; M Schrappe
Journal: Blood   Date: 2000-05-01

Classification and regression tree analysis of 1000 consecutive patients with unknown primary carcinoma.
Authors: K R Hess; M C Abbruzzese; R Lenzi; M N Raber; J L Abbruzzese
Journal: Clin Cancer Res   Date: 1999-11

Application of classification tree and neural network algorithms to the identification of serological liver marker profiles for the diagnosis of hepatocellular carcinoma.
Authors: T C Poon; A T Chan; B Zee; S K Ho; T S Mok; T W Leung; P J Johnson
Journal: Oncology   Date: 2001

Second malignancies in patients treated for childhood acute lymphoblastic leukemia.
Authors: V M Kimball Dalton; R D Gelber; F Li; M J Donnelly; N J Tarbell; S E Sallan
Journal: J Clin Oncol   Date: 1998-08

Identification of patients with head and neck cancer using serum protein profiles.
Authors: J Trad Wadsworth; Kenneth D Somers; Brendan C Stack; Lisa Cazares; Gunjan Malik; Bao-Ling Adam; George L Wright; O John Semmes
Journal: Arch Otolaryngol Head Neck Surg   Date: 2004-01

Long-term sequelae in children treated for brain tumors: impairments, disability, and handicap.
Authors: Marta Macedoni-Luksic; Berta Jereb; Lupco Todorovski
Journal: Pediatr Hematol Oncol   Date: 2003-03

Second malignant neoplasms in five-year survivors of childhood cancer: childhood cancer survivor study.
Authors: J P Neglia; D L Friedman; Y Yasui; A C Mertens; S Hammond; M Stovall; S S Donaldson; A T Meadows; L L Robison
Journal: J Natl Cancer Inst   Date: 2001-04-18