
Identifying the association rules between clinicopathologic factors and higher survival performance in operation-centric oral cancer patients using the Apriori algorithm.

Jen-Yang Tang, Li-Yeh Chuang, Edward Hsi, Yu-Da Lin, Cheng-Hong Yang, Hsueh-Wei Chang.

Abstract

This study computationally determines the contribution of clinicopathologic factors correlated with 5-year survival in oral squamous cell carcinoma (OSCC) patients primarily treated by surgical operation (OP) followed by other treatments. From 2004 to 2010, 493 OSCC patients were enrolled at the Kaohsiung Medical University Hospital. The clinicopathologic records were retrospectively reviewed and compared for survival analysis. The Apriori algorithm was applied to mine the association rules between these factors and improved survival. Univariate analysis of demographic data showed that grade/differentiation, clinical tumor size, pathology tumor size, and OP grouping were associated with survival longer than 36 months. Using the Apriori algorithm, multivariate correlation analysis identified the factors that jointly accompany good survival with higher lift values, such as grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP. Without the OP, the lift values are lower. In conclusion, this hospital-based analysis suggests that early OP and other treatments starting from OP are the key to improving the survival of OSCC patients, especially for early stage tongue cancer with moderate differentiation, which shows better survival (>36 months) with varied OP approaches.

Year:  2013        PMID: 23984353      PMCID: PMC3741931          DOI: 10.1155/2013/359634

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.411


1. Introduction

In Taiwan, betel nut chewing, cigarette smoking, and alcohol consumption have been found to be highly associated with oral cancer [1], with habitual betel nut chewers showing a particularly high prevalence [2-4]. Oral cancer is one of the 10 most prevalent cancers in Taiwan and is mostly classified as oral squamous cell carcinoma (OSCC) [5], which has high rates of morbidity and mortality [6] because diagnosis often takes place only in the later stages [7]. Although many tumor markers [8-10] and single nucleotide polymorphism (SNP) markers [11] have been reported as being associated with oral cancer, outcome-based studies focusing on oral cancer therapy are lacking. The survival of OSCC patients following surgical therapy has been reported to be affected by tumor size, nodal metastasis, staging, and differentiation [12]. Some researchers have further examined the factors involved in outcomes of postoperative radiotherapy for OSCC patients [13]. However, the correlation among the multiple survival-affecting factors for predicting good survival after OSCC therapy has received less attention and remains a challenge. Recently, several computational methodologies have been introduced to analyze the relationship between multiple factors and therapies for several non-OSCC diseases, including machine learning algorithms [14], data mining [15], decision tree-based learning [16], and rule-based multiscale simulations [17]. The Apriori algorithm is used here to explore the correlation between clinical factors and good survival outcomes (i.e., >36 months) in operation- (surgery-) centric treatments, including operation alone, operation/IA, and operation/IA, CT, IV, and RT, where IA, IV, CT, and RT stand for intra-arterial chemotherapy, intravenous chemotherapy, oral chemotherapy, and radiotherapy, respectively.
The study aims to computationally evaluate the correlation between clinicopathological factors and survival outcomes in 493 OSCC patients treated by operation alone or by operation followed by other nonsurgical treatments.

2. Materials and Methods

2.1. Data Source

The database used to construct our case and control groups was obtained from the chart registry of the cancer center of the Kaohsiung Medical University Hospital from 2004 to 2010. Patients were excluded if they had distant metastases at presentation, did not complete the therapeutic protocol in Kaohsiung Medical University Hospital, or had incomplete records. A total of 493 patients fulfilled the requirements and were included for further analyses (the raw data set is available at http://bioinfo.kmu.edu.tw/OP_high-OP_low_groups.xlsx). The patients were followed at Kaohsiung Medical University Hospital. The last follow-up was recorded as the last outpatient visit or the date of death. This use of patient data and the study design were reviewed and approved by the Institutional Review Board of Kaohsiung Medical University Hospital (KMUH-IRB-EXEMPT-20130029).

2.2. Introduction of the Apriori Algorithm

The problem of association rule learning can be stated as follows. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let a transaction $T$ be a set of items, where $T \subseteq I$, and let $D$ be a set of transactions. An association rule is an implication of the form $A \Rightarrow B$, where $A \subset I$, $B \subset I$, and $A \cap B = \emptyset$. The rule $A \Rightarrow B$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $A$ also contain $B$. The rule $A \Rightarrow B$ has support $s$ in $D$ if $s\%$ of the transactions in $D$ contain $A \cup B$. Itemsets that reach the minimum support are called large itemsets, and the others small itemsets.
The Apriori algorithm was proposed by Agrawal and Srikant in 1994 [18] and has been widely used for frequent itemset mining and association rule learning in databases. It aims to generate the desired rules from large itemsets. The general idea is that if ABCD is a large itemset, then any rule formed from ABCD, for example AB ⇒ CD, has at least the minimum required support because ABCD is large.
The Apriori algorithm can be divided into three steps, and Algorithm 1 shows its pseudocode. The first pass counts item occurrences to screen the large 1-itemsets $L_1$ (Section 2.2.1). Each subsequent pass $k$ generates the candidate itemsets $C_k$ from the large itemsets $L_{k-1}$ using the apriori-gen function (Section 2.2.2). Next, for each transaction $t$, the subset function described in Section 2.2.3 finds the candidate $k$-itemsets in $C_k$ that are contained in $t$. Finally, the count of each such candidate $c$ is incremented, and $c$ is stored in $L_k$ if c.count ≥ minimum support. The algorithm terminates when $L_k$ is empty; that is, no frequent set of $k$ or more items is present in $D$.
Algorithm 1

Pseudocode of the Apriori algorithm.
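
The original pseudocode figure is not reproduced here. As a stand-in, the level-wise loop described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `apriori`, the use of an absolute count for `min_support` (the text defines support as a percentage), and the toy records in `toy_transactions` are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support.

    min_support is an absolute transaction count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # First pass: count single items and keep the large 1-itemsets L_1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_support}

    result = dict(large)
    k = 2
    while large:  # stop when L_{k-1} is empty
        # Candidate generation: join pairs of large (k-1)-itemsets into
        # k-itemsets, keeping only those whose (k-1)-subsets are all large.
        candidates = set()
        for a, b in combinations(list(large), 2):
            c = a | b
            if len(c) == k and all(
                frozenset(s) in large for s in combinations(c, k - 1)
            ):
                candidates.add(c)

        # Counting pass: increment each candidate contained in a transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        large = {c: n for c, n in counts.items() if n >= min_support}
        result.update(large)
        k += 1
    return result


# Hypothetical records, for illustration only (not patient data).
toy_transactions = [
    {"tongue", "early", "OP", "grade2"},
    {"tongue", "early", "OP"},
    {"tongue", "OP", "grade2"},
    {"cheek", "early", "OP"},
    {"tongue", "early", "grade2"},
]
frequent = apriori(toy_transactions, min_support=3)
```

The counting pass here scans every candidate for every transaction, which is the simplest correct choice; the hash tree of Section 2.2.3 exists to avoid exactly that scan.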

2.2.1. Screening the Large 1-Itemsets

Algorithm 2 shows the pseudocode of the first pass, which simply counts the occurrences of each item in $I = \{i_1, i_2, \ldots, i_m\}$ to determine the large 1-itemsets. The array Item-counts holds the occurrence count of each item, and the elements of Item-counts that reach the minimum support are included in the set $L_1$.
Algorithm 2

The first pass of the Apriori algorithm.
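
A first pass of this kind might look as follows in Python. The sketch assumes an absolute count threshold, and the name `large_1_itemsets` and the example records are hypothetical; `collections.Counter` plays the role of the Item-counts array.

```python
from collections import Counter


def large_1_itemsets(transactions, min_support):
    """Count item occurrences and return {item: count} for items meeting min_support."""
    item_counts = Counter()
    for t in transactions:
        # set(t) ensures an item repeated inside one transaction is counted once.
        item_counts.update(set(t))
    return {item: n for item, n in item_counts.items() if n >= min_support}


# Hypothetical records, for illustration only.
example = [
    ["tongue", "early", "OP", "grade2"],
    ["tongue", "early", "OP"],
    ["tongue", "OP", "grade2"],
    ["cheek", "early", "OP"],
    ["tongue", "early", "grade2"],
]
L1 = large_1_itemsets(example, min_support=3)
```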

2.2.2. Candidate Set Generations

The function apriori-gen($L_{k-1}$) generates $C_k$ from $L_{k-1}$ and returns a superset of the set of all large $k$-itemsets. Algorithm 3 shows the pseudocode of apriori-gen($L_{k-1}$). A set $c = \{L_{k-1}.\mathrm{item}[i]\}$, for all $i \in \{1, \ldots, k-1\}$, is used to store the frequent $(k-1)$-itemsets in $L_{k-1}$. The pairs selected for joining are denoted $L_{k-1}.\mathrm{item}_p$, $L_{k-1}.\mathrm{item}_q \in L_{k-1}$. For each $L_{k-1}.\mathrm{item}_p$ in $L_{k-1}$, the search runs over the tuples $L_{k-1}.\mathrm{item}_q$ and stops as soon as an $L_{k-1}.\mathrm{item}_q$ is found whose items 1 to $k-2$ differ from items 1 to $k-2$ of $L_{k-1}.\mathrm{item}_p$. Only when $L_{k-1}.\mathrm{item}_p[i] = L_{k-1}.\mathrm{item}_q[i]$ for all $i \in \{1, \ldots, k-2\}$ is the $k$-itemset $c = \{L_{k-1}.\mathrm{item}_p[1], \ldots, L_{k-1}.\mathrm{item}_p[k-2], L_{k-1}.\mathrm{item}_p[k-1], L_{k-1}.\mathrm{item}_q[k-1]\}$ created. Finally, $c$ is kept only if all of its $(k-1)$-subsets are included in $L_{k-1}$.
Algorithm 3

Pseudocode of the function apriori-gen().
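
The join and prune steps described above can be illustrated with a short Python sketch. It assumes that each large itemset is stored as a sorted tuple, so that itemsets sharing their first k − 2 items sit next to each other after sorting; the function name `apriori_gen` and the example input are illustrative, not taken from the paper.

```python
from itertools import combinations


def apriori_gen(large_prev):
    """Generate candidate k-itemsets from large (k-1)-itemsets.

    large_prev: iterable of sorted tuples, all of length k-1.
    Returns a list of sorted k-tuples.
    """
    prev = sorted(set(large_prev))
    prev_set = set(prev)
    candidates = []
    for idx, p in enumerate(prev):
        for q in prev[idx + 1:]:
            # Join step: p and q must agree on their first k-2 items.
            # Because prev is sorted, the first mismatch ends the search for p.
            if p[:-1] != q[:-1]:
                break
            c = p + (q[-1],)
            # Prune step: every (k-1)-subset of c must itself be large.
            if all(s in prev_set for s in combinations(c, len(c) - 1)):
                candidates.append(c)
    return candidates


# Small worked example with integer item labels.
L3 = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
C4 = apriori_gen(L3)
```

In this example the join produces (1, 2, 3, 4) and (1, 3, 4, 5), and the prune step drops the latter because (1, 4, 5) is not a large 3-itemset.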

2.2.3. Candidate Set Counts Using Hash Tree

After the candidate sets $C_k$ are generated, they are stored in a hash tree created by the function subset($C_k$, $t$). Each leaf of the hash tree holds pointers to candidates in $C_k$ together with their associated counters, and the leaves correspond to distinct partitions of $C_k$. In the hash tree, the hash function is used both to insert the candidate itemsets and to search for the transaction subsets in $C_k$. The hash function is $\mathrm{hash}(i) = i \bmod T$, $T < m$, where $T$ is a constant and $m$ is the number of items. The function subset($C_k$, $t$) is recursive: it traverses the tree from the root node to the leaves, with each item in $t$ chosen as a possible starting item of a candidate itemset, and this is applied at every level of the tree. When the traversal for $t$ reaches a leaf, all candidate itemsets in that leaf are checked against $t$ and their counters are updated.
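
To make the counting step concrete, the sketch below replaces the hash tree with a flat Python dictionary, which is itself a hash table. This is a deliberately simpler substitute, not the structure described above: instead of walking a tree, it enumerates the k-item subsets of each transaction and looks each one up directly. The function name `count_candidates` and the example data are assumptions.

```python
from itertools import combinations


def count_candidates(candidates, transactions, k):
    """Count how many transactions contain each candidate k-itemset.

    Uses a dict keyed by sorted tuples in place of a hash tree.
    """
    counts = {tuple(sorted(c)): 0 for c in candidates}
    for t in transactions:
        items = sorted(set(t))
        # Every k-subset of the transaction is a potential candidate match.
        for subset in combinations(items, k):
            if subset in counts:
                counts[subset] += 1
    return counts


# Integer item labels, for illustration only.
example_transactions = [[1, 2, 3, 4], [1, 2, 4], [2, 3, 4], [1, 3]]
example_candidates = [(1, 2), (2, 4), (3, 4)]
support_counts = count_candidates(example_candidates, example_transactions, k=2)
```

Enumerating all k-subsets becomes expensive for long transactions, which is the case the hash tree is designed to handle by visiting only the branches that can still lead to a candidate.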

2.3. Statistical Analysis

Statistical analysis was performed with JMP version 9. All statistical tests were done at a 0.05 significance level.

3. Results and Discussion

3.1. Demographic Data and Survival

3.1.1. Age and Survival

As shown in Table 1, all patients were categorized into 2 groups according to whether survival was greater or less than 36 months. No significant difference among the age groups was found (P = 0.7786 for the >36 versus <36 months comparison and P = 0.5556 for 5-year survival). This is probably because anyone who was eligible for surgical resection would have comparable survival rates.
Table 1

Demographic data of 493 enrolled patients with OSCC.

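| Characteristics | Total | Survived >36 months | Survived <36 months | P value∗1 | 5-year survival (%) | P value∗2 |
|---|---|---|---|---|---|---|
| Age | | | | 0.7786 | | 0.5556 |
| <30 | 7 | 3 | 4 | | 71.4 | |
| 30~50 | 228 | 125 | 103 | | 77.2 | |
| 50~70 | 236 | 129 | 107 | | 79.2 | |
| >70 | 22 | 14 | 8 | | 63.6 | |
| Primary site | | | | 0.7915 | | 0.1957 |
| Lip | 36 | 24 | 12 | | 86.1 | |
| Cheek mucosa | 184 | 103 | 81 | | 83.2 | |
| Gum | 42 | 25 | 17 | | 71.4 | |
| Tongue | 175 | 88 | 87 | | 72.0 | |
| Mouth floor | 19 | 11 | 8 | | 68.4 | |
| Palate | 5 | 3 | 2 | | 60.0 | |
| Retromolar | 27 | 15 | 12 | | 77.8 | |
| Vestibule | 2 | 1 | 1 | | 100.0 | |
| Nonspecific | 3 | 1 | 2 | | 100.0 | |
| Laterality∗3 | | | | 0.3965 | | 0.8612 |
| 00 | 37 | 22 | 15 | | 73.0 | |
| 01 | 230 | 123 | 107 | | 79.1 | |
| 02 | 223 | 123 | 100 | | 76.7 | |
| 03 | 3 | 3 | 0 | | 66.7 | |
| 04 | 0 | 0 | 0 | | NA | |
| Grade/differentiation | | | | 0.1476 | | 0.0006 |
| 01 | 287 | 156 | 131 | | 80.1 | |
| 02 | 123 | 60 | 63 | | 65.0 | |
| 03 | 7 | 5 | 2 | | 57.1 | |
| 04 | 1 | 1 | 0 | | 100.0 | |
| 09 | 75 | 49 | 26 | | 89.3 | |
| Regional lymph nodes examined | | | | 0.1550 | | 0.1424 |
| <5 | 285 | 160 | 125 | | 80.4 | |
| >10 | 134 | 65 | 69 | | 73.1 | |
| 5~10 | 73 | 45 | 28 | | 74.0 | |
| Clinical stage group | | | | 0.0749 | | 0.5689 |
| Stage 0 | 4 | 0 | 4 | | 75.0 | |
| Stage 1 | 141 | 79 | 62 | | 80.1 | |
| Stage 2 | 73 | 47 | 26 | | 71.2 | |
| Stage 3 | 131 | 69 | 62 | | 77.1 | |
| Stage 4 | 82 | 50 | 32 | | 72.0 | |
| Pathologic stage group | | | | 0.2540 | | 0.0514 |
| Stage 0 | 2 | 2 | 0 | | 100.0 | |
| Stage 1 | 215 | 112 | 103 | | 82.3 | |
| Stage 2 | 92 | 52 | 40 | | 75.0 | |
| Stage 3 | 31 | 15 | 16 | | 74.2 | |
| Stage 4 | 58 | 24 | 34 | | 67.2 | |
| Clinical tumor size | | | | 0.3967 | | 0.0004 |
| <2 cm | 162 | 100 | 62 | | 87.0 | |
| 2~4 cm | 244 | 134 | 110 | | 71.3 | |
| >4 cm | 33 | 19 | 14 | | 66.7 | |
| Pathology tumor size | | | | 0.4417 | | 0.0141 |
| <2 cm | 197 | 114 | 83 | | 81.7 | |
| 2~4 cm | 183 | 94 | 89 | | 69.4 | |
| >4 cm | 25 | 14 | 11 | | 72.0 | |
| OP group∗4 | | | | <0.0001 | | <0.0001 |
| 01 | 385 | 238 | 147 | | 81.6 | |
| 02 | 27 | 14 | 13 | | 66.7 | |
| 03 | 81 | 19 | 62 | | 61.7 | |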

∗1 P value for the comparison of the survival between >36 and <36 months groups.

∗2 P value for 5-year survival among the items of the same characteristics group.

∗3 Laterality codes: 0: unknown primary site or the shape of the organ is not paired; 1: the primary site originated from the right side; 2: the primary site originated from the left side; 3: only one side is invaded but it is not clear on which side (right or left) the tumor originated; 4: both sides are invaded but the origin of the primary site is not clear and the chart record describes only one primary site.

∗4 OP group codes: 01: OP only; 02: OP→IA; 03: OP→CT, OP→CT + IV, OP→CT→RT, OP→IA→RT, OP→IV, OP→IV→RT, OP→RT, OP→RT + CT, OP→RT + IV, OP→RT→CT, OP→RT→IA, OP→RT→IV. Symbols: OP: operation; IA: intra-arterial chemotherapy; CT: oral chemotherapy; IV: intravenous chemotherapy; RT: radiotherapy; →: then.

3.1.2. Subsites and Survival

As shown in Table 1, the site distribution of the 493 oral cancer patients showed that commonly affected sites included the cheek mucosa, gum, tongue, and retromolar trigone. Postsurgical organ function and cosmetic outcome may vary with the surgical site, but no difference in survival could be found.

3.1.3. Laterality and Survival

As shown in Table 1, laterality is recorded in the database of cancer registries and is a mixed expression of clinical/pathological tumor size and location. It does not play a significant role in the surgical group.

3.1.4. Grade and Survival

As shown in Table 1, comparison of the pathological characteristics between >5-year (n = 271) and <5-year survival (n = 222) revealed better treatment outcomes for low grade tumors (P = 0.0006), suggesting that well-differentiated tumors are less aggressive and thus are associated with better overall survival.

3.1.5. Regional Lymph Nodes and Survival

As shown in Table 1, regional lymph node examination might express the details and quality of surgical resection. However, the number of examined lymph nodes was not found to have an effect on survival. This might be due to cross-interaction between clinical lymph node stages and overall survival.

3.1.6. Clinical Stages, Pathology Stages, Clinical/Pathology Tumor Sizes, and Survival

As shown in Table 1, neither clinical nor pathological stage was found to have an impact on 5-year survival. There might be some influencing factors between low and high tumor stages which cannot be simply explained by surgery. However, for clinical and pathological tumor size alone, significant differences between the >5-year and <5-year groups were found (P = 0.0004 and P = 0.0141, resp.). A smaller tumor means less tumor burden and less infiltration of the surrounding tissue, which may explain the improved overall outcomes.

3.1.7. Surgical Modalities and Survival

As shown in Table 1, treatment modalities (OP) were further differentiated into 3 groups based on different adjuvant therapies, that is, surgery alone, surgery plus intra-arterial chemotherapy, and surgery plus concomitant chemoradiotherapy. Significant differences between groups were found (P < 0.0001), and further analysis of surgical modalities based on the clinical/pathological stages could produce interesting insights.
This hospital-based study followed nearly 500 patients with oral squamous cell carcinoma after surgical treatment. Results showed that age of onset and laterality of tumor location did not influence the treatment outcome. The latter might be attributed to oral cancer being a less multifocal or multicentric disease than, for example, breast cancer; hence, laterality of the primary tumor has less influence on survival. These findings are in line with previous findings [19, 20]. Advanced tumor stage or failure of locoregional control negatively influences survival in patients with OSCC [21]. However, we did not observe a significant influence of either clinical or pathological tumor stage. Similarly, Pandey et al. reported no difference in survival rates by extent of tumor [22], and this observation might be due to the fact that all tumor stages were pooled in the analysis. In the present study, multimodality treatment proved to be a prognostic factor. Benefit from systemic or adjuvant local therapies might correlate with disease biology, as the grade of tumor differentiation was also an important influencing factor.

3.2. Data Mining Results Using Apriori Algorithm

Table 2 shows the best rules for OP with survival >36 months. The body X and head Y represent a class association rule $X \Rightarrow Y$, in which the head $Y$ (with rule body $X$) must be restricted to one attribute-value pair; the attribute of that attribute-value pair is thus the class attribute. The resulting rules can be evaluated according to four metrics: confidence, lift, leverage, and conviction. Lift (or improvement) is computed as the confidence of the rule divided by the support of the right-hand side (RHS), and a minimum lift of 1.5 was required here. Lift is a ratio of probabilities: given a rule $X \Rightarrow Y$, it compares the probability that $X$ and $Y$ occur together with the product of the two individual probabilities of $X$ and $Y$; that is,
$$\mathrm{lift}(X \Rightarrow Y) = \frac{P(X, Y)}{P(X)\,P(Y)}.$$
Table 2

Ranking of the top 10 best rules found in survival larger than 36 months.

| Body∗1 | No. | Head∗1 | No. | Confidence | Lift∗2 | Leverage | Conviction |
|---|---|---|---|---|---|---|---|
| Grade/differentiation = 2; Clinical stage group = early | 49 | Primary site = tongue; Group = OP | 27 | 0.55 | 1.91 | 0.05 | 1.52 |
| Primary site = tongue; Group = OP | 78 | Grade/differentiation = 2; Clinical stage group = early | 27 | 0.35 | 1.91 | 0.05 | 1.23 |
| Primary site = tongue; Clinical stage group = early | 70 | Grade/differentiation = 2; Group = OP | 27 | 0.39 | 1.9 | 0.05 | 1.27 |
| Grade/differentiation = 2; Group = OP | 55 | Primary site = tongue; Clinical stage group = early | 27 | 0.49 | 1.9 | 0.05 | 1.41 |
| Grade/differentiation = 2 | 60 | Primary site = tongue; Clinical stage group = early; Group = OP | 27 | 0.45 | 1.88 | 0.05 | 1.34 |
| Primary site = tongue; Clinical stage group = early; Group = OP | 65 | Grade/differentiation = 2 | 27 | 0.42 | 1.88 | 0.05 | 1.3 |
| Primary site = tongue | 88 | Grade/differentiation = 2; Clinical stage group = early; Group = OP | 27 | 0.31 | 1.81 | 0.04 | 1.18 |
| Grade/differentiation = 2; Clinical stage group = early; Group = OP | 46 | Primary site = tongue | 27 | 0.59 | 1.81 | 0.04 | 1.55 |
| Grade/differentiation = 2 | 60 | Primary site = tongue; Clinical stage group = early | 27 | 0.45 | 1.74 | 0.04 | 1.31 |
| Primary site = tongue; Clinical stage group = early | 70 | Grade/differentiation = 2 | 27 | 0.39 | 1.74 | 0.04 | 1.24 |

∗1 Stages 0 to 3 of the clinical stage group and pathologic stage group shown in Table 1 are regarded as early stage, and stage 4 is regarded as late stage, in Table 2.

∗2 Only the best rules, with lift >1.5, are shown here.

If lift is 1, $X$ and $Y$ are independent. The further lift rises above 1, the more likely it is that the co-occurrence of $X$ and $Y$ in a transaction reflects a relationship between them and not just random occurrence. Unlike lift, leverage measures the difference between the probability of co-occurrence of $X$ and $Y$ and the product of their individual probabilities; that is,
$$\mathrm{leverage}(X \Rightarrow Y) = P(X, Y) - P(X)\,P(Y).$$
Leverage measures the proportion of additional cases covered by both $X$ and $Y$ above those expected if $X$ and $Y$ were independent of each other. Thus, values above 0 are desirable for leverage, whereas values greater than 1 are desirable for lift. Finally, conviction is similar to lift, but it measures the effect of the right-hand side not being true and also inverts the ratio. Conviction is measured as
$$\mathrm{conviction}(X \Rightarrow Y) = \frac{P(X)\,P(\neg Y)}{P(X, \neg Y)}.$$
Table 2 shows that the body “grade/differentiation = 2 and clinical stage group = early” is associated with the head “primary site = tongue and group = OP.” The rule shows 49 patients with grade/differentiation = 2 and clinical stage group = early, and 27 of these 49 patients also fulfill “primary site = tongue and group = OP.” The confidence is the proportion of patients matching the body who also match the head, that is, 27/49. The lift is 1.91, meaning that the co-occurrence of the body “grade/differentiation = 2 and clinical stage group = early” and the head “primary site = tongue and group = OP” in a transaction is not just a random occurrence. The leverage value of 0.05 means that the proportion of cases covered by both the body and the head is greater than would be expected if the two were independent of each other. The conviction value of 1.52 reflects the effect of the right-hand side not being true.
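
As a worked check of these definitions, the short Python sketch below recomputes the metrics for the first rule in Table 2 from its counts. Two inputs are assumptions rather than values stated in the text: the transaction total of 271 is taken to be the size of the >36 months group in Table 1, and the head count of 78 is read from the body count of the second rule in Table 2. The function name `rule_metrics` is illustrative.

```python
def rule_metrics(n_total, n_body, n_head, n_both):
    """Confidence, lift, leverage, and conviction of X => Y from raw counts.

    n_body: transactions containing X; n_head: transactions containing Y;
    n_both: transactions containing both X and Y.
    """
    p_x = n_body / n_total
    p_y = n_head / n_total
    p_xy = n_both / n_total

    confidence = p_xy / p_x
    lift = p_xy / (p_x * p_y)
    leverage = p_xy - p_x * p_y
    # P(X, not Y) = P(X) - P(X, Y); conviction is unbounded when it is zero.
    p_x_not_y = p_x - p_xy
    conviction = float("inf") if p_x_not_y == 0 else p_x * (1 - p_y) / p_x_not_y
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# First rule of Table 2; n_total = 271 and n_head = 78 are assumptions (see above).
first_rule = rule_metrics(n_total=271, n_body=49, n_head=78, n_both=27)
```

Under these assumptions, confidence, lift, and leverage round to the tabulated 0.55, 1.91, and 0.05. The textbook conviction formula gives about 1.59 rather than the tabulated 1.52; adding one to the count of body-without-head transactions in the denominator happens to yield 1.52, which suggests the software used may apply such a correction, although the source does not say so.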
From the top down in Table 2, the lift values gradually decrease but still show a strong association between the body/head items and survival of >36 months. When the lift value of the items listed in the “body” and “head” of Table 2 is high, there is less chance of misinterpreting the relationships between the items. In the top 8 rules, the same four items (grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP) alternate between the “body” and the “head”. These data suggest that early stage tongue cancer with moderate differentiation will have better survival (>36 months) with varied surgical approaches, where the OP comprises three kinds of treatment. Rules 9 and 10, however, include only three items, without group = OP, and their lift values decrease to 1.74. Rules 9 and 10 therefore do not depend on “group = OP” and are less strongly associated than the top 8 rules, which implies that the OP plays an important role in the association with improved survival (>36 months). In clinical settings, this might be due to the good treatment outcome which often accompanies surgery. Accordingly, the Apriori algorithm as applied here is a relatively simple form of rule-based computation for identifying potential rules involving various factors, such as grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP. The algorithm can reveal the combined effect of these factors on the outcome of OSCC therapy.

4. Conclusion

This hospital-based analysis reviewed 493 patients with OSCC to mine survival factors in operation-centric patients. The results identify the importance of grade/differentiation = 2, clinical stage group = early, primary site = tongue, and group = OP in predicting higher survival for OSCC patients.
  21 in total

1.  Tumor-associated carbonic anhydrase XII is linked to the growth of primary oral squamous cell carcinoma and its poor prognosis.

Authors:  Ming-Hsien Chien; Tsung-Ho Ying; Yi-Hsien Hsieh; Chien-Huang Lin; Chun-Han Shih; Lin-Hung Wei; Shun-Fa Yang
Journal:  Oral Oncol       Date:  2011-12-14       Impact factor: 5.337

2.  Outcomes of oral squamous cell carcinoma in Taiwan after surgical therapy: factors affecting survival.

Authors:  Wen-Liang Lo; Shou-Yen Kao; Lin-Yang Chi; Yong-Kie Wong; Richard Che-Shoa Chang
Journal:  J Oral Maxillofac Surg       Date:  2003-07       Impact factor: 1.895

3.  Does multicentric/multifocal breast cancer differ from unifocal breast cancer? An analysis of survival and contralateral breast cancer incidence.

Authors:  Rinat Yerushalmi; Hagen Kennecke; Ryan Woods; Ivo A Olivotto; Caroline Speers; Karen A Gelmon
Journal:  Breast Cancer Res Treat       Date:  2008-12-11       Impact factor: 4.872

4.  Matrix metalloproteinases (MMP) 1 and MMP10 but not MMP12 are potential oral cancer markers.

Authors:  Ching-Yu Yen; Chung-Ho Chen; Chao-Hsiang Chang; Hung-Fu Tseng; Shyun-Yeu Liu; Li-Yeh Chuang; Cheng-Hao Wen; Hsueh-Wei Chang
Journal:  Biomarkers       Date:  2009-06       Impact factor: 2.658

5.  Betel quid chewing, cigarette smoking and alcohol consumption related to oral cancer in Taiwan.

Authors:  Y C Ko; Y L Huang; C H Lee; M J Chen; L M Lin; C C Tsai
Journal:  J Oral Pathol Med       Date:  1995-11       Impact factor: 4.253

6.  Prevalence of betel quid chewing habit in Taiwan and related sociodemographic factors.

Authors:  Y C Ko; T A Chiang; S J Chang; S F Hsieh
Journal:  J Oral Pathol Med       Date:  1992-07       Impact factor: 4.253

7.  Decision tree-based learning to predict patient controlled analgesia consumption and readjustment.

Authors:  Yuh-Jyh Hu; Tien-Hsiung Ku; Rong-Hong Jan; Kuochen Wang; Yu-Chee Tseng; Shu-Fen Yang
Journal:  BMC Med Inform Decis Mak       Date:  2012-11-14       Impact factor: 2.796

8.  Current aspects on oral squamous cell carcinoma.

Authors:  Anastasios K Markopoulos
Journal:  Open Dent J       Date:  2012-08-10

9.  Predictors of betel quid chewing behavior and cessation patterns in Taiwan aborigines.

Authors:  Chin-Feng Lin; Jung-Der Wang; Ping-Ho Chen; Shun-Jen Chang; Yi-Hsin Yang; Ying-Chin Ko
Journal:  BMC Public Health       Date:  2006-11-03       Impact factor: 3.295

10.  Rule-based multi-scale simulation for drug effect pathway analysis.

Authors:  Woochang Hwang; Yongdeuk Hwang; Sunjae Lee; Doheon Lee
Journal:  BMC Med Inform Decis Mak       Date:  2013-04-05       Impact factor: 2.796
