Srinivas Vinnakota, Nina S N Lam.
Abstract
BACKGROUND: The objective of this study was to demonstrate the use of an association rule mining approach to discover associations between selected socioeconomic variables and the four leading causes of cancer mortality in the United States. An association rule mining algorithm was applied to extract associations between the 1988-1992 cancer mortality rates for colorectal, lung, breast, and prostate cancers, defined at the Health Service Area (HSA) level, and selected socioeconomic variables from the 1990 United States census. Geographic information system technology was used to integrate these data, which were defined at different spatial resolutions, and to visualize and analyze the results of the association rule mining process.
Year: 2006 PMID: 16480491 PMCID: PMC1397822 DOI: 10.1186/1476-072X-5-9
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Association rules that have the highest support value for each of the four cancer sites used in the study
| | Association Rule | Support%, Confidence%, lift |
| a | Lung cancer mortality among white men is high → proportion of whites with low educational attainment is high | 8.955%, 46.45%, 2.32 (72/155) |
| b | Colorectal cancer mortality among white men is high → proportion of households with white male householder with no wife present and with no children under the age of 18 years is high | 7.463%, 45.80%, 2.29 (60/131) |
| c | Prostate cancer among black men is medium high → proportion of households with black female householder with no husband present and with no children under the age of 18 years is high | 3.731%, 61.22%, 3.06 (30/49) |
| d | Breast cancer among black females is high → proportion of households with black female householder with no husband present and with the presence of own children under the age of 18 years in the house is high | 2.861%, 74.19%, 3.59 (23/31) |
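The support and confidence figures in the table above can be reproduced from the parenthesized counts. The sketch below assumes that each pair reads as (HSAs matching both sides of the rule / HSAs matching the left-hand side) and that the total is the 804 HSAs listed in the descriptive statistics for the socioeconomic variables; both readings are assumptions, although they agree with the reported percentages. The names `rule_metrics` and `implied_consequent_share` are illustrative. Lift is confidence divided by the overall share of HSAs matching the right-hand side, a share the table does not report, so the sketch only back-calculates the share implied by the reported lift.

```python
# Counts read from the table: (HSAs matching both sides, HSAs matching the left-hand side).
RULE_COUNTS = {"a": (72, 155), "b": (60, 131), "c": (30, 49), "d": (23, 31)}
# Lift values as reported in the table.
REPORTED_LIFT = {"a": 2.32, "b": 2.29, "c": 3.06, "d": 3.59}
# Assumption: total number of HSAs, taken from the descriptive statistics tables.
N_HSAS = 804


def rule_metrics(both, antecedent, total=N_HSAS):
    """Return (support %, confidence %) for a rule 'antecedent -> consequent'."""
    support = 100.0 * both / total            # share of all HSAs matching both sides
    confidence = 100.0 * both / antecedent    # share of antecedent HSAs that also match
    return support, confidence


def implied_consequent_share(confidence_pct, lift):
    """Back out the consequent's overall share from lift = confidence / share."""
    return (confidence_pct / 100.0) / lift


if __name__ == "__main__":
    for label, (both, antecedent) in RULE_COUNTS.items():
        support, confidence = rule_metrics(both, antecedent)
        share = implied_consequent_share(confidence, REPORTED_LIFT[label])
        print(f"rule {label}: support={support:.3f}% "
              f"confidence={confidence:.2f}% implied consequent share={share:.3f}")
```

For all four rules the implied share comes out close to 0.20, which would be consistent with the "high" class covering roughly a fifth of the HSAs; the table itself does not state how the classes were defined.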
Top 3 HSAs ordered on the cancer mortality rate for the association rules based on the highest support value
| Rule | HSA | Mortality rate | Socio-economic variable value |
| a: Lung cancer mortality among white men is high → Whites with low educational attainment is high | Madison, MO | 119.293 | 0.787 |
| | Scott, TN | 115.034 | 0.839 |
| | Pike, KY – Logan, WV | 100.694 | 0.801 |
| b: Colorectal cancer mortality among white men is high → density of households with white male householder with no wife present and with no children under the age of 18 years is high | Jefferson (Steubenville), OH – Harrison, OH | 24.503 | 0.017 |
| | White, IL – Hamilton, IL | 24.427 | 0.014 |
| | Lackawanna (Scranton), PA – Wayne, PA | 24.157 | 0.021 |
| c: Prostate cancer among black men is medium high → density of households with black female householder with no husband present and with no children under the age of 18 years is high | Dallas, AL – Marengo, AL | 41.445 | 0.074 |
| | Ouachita, AR – Dallas, AR | 41.364 | 0.033 |
| | Upson, GA – Lamar, GA | 41.017 | 0.035 |
| d: Breast cancer among black females is high → density of households with black female householder with no husband present and with the presence of own children under the age of 18 years in the house is high | Halifax, VA – Mecklenburg, VA | 39.206 | 0.041 |
| | Lincoln, LA – Union, LA | 38.517 | 0.051 |
| | Gregg (Longview), TX – Rusk, TX | 36.581 | 0.028 |
Figure 1. Lung cancer mortality among white men and associated socioeconomic characteristic with the highest support. Shown in red is the spatial distribution of areas having a high rate of lung cancer mortality among white men and a high density of whites with low educational attainment. This pattern has the highest support among the association rules involving lung cancer mortality rate. The beige color indicates areas having a high rate of lung cancer mortality among white men.
Figure 2. Colorectal cancer mortality among white men and associated socioeconomic characteristic with the highest support. Shown in red is the spatial distribution of areas having a high rate of colorectal cancer mortality among white men and a high density of households with white male householder and no wife present and no children under the age of 18 years. This pattern has the highest support among the association rules involving colorectal cancer mortality rate. The beige color indicates areas having a high rate of colorectal cancer mortality among white men.
Figure 3. Prostate cancer mortality among black men and associated socioeconomic characteristic with the highest support. Shown in red is the spatial distribution of areas having a medium-high rate of prostate cancer mortality among black men and a high density of households with black female householder with no husband present and with no children under the age of 18 years. This pattern has the highest support among the association rules involving prostate cancer mortality rate. The beige color indicates areas having a medium-high rate of prostate cancer mortality among black men.
Figure 4. Breast cancer mortality among black women and associated socioeconomic characteristic with the highest support. Shown in red is the spatial distribution of areas having a high rate of breast cancer mortality among black women and a high density of households with black female householder with no husband present and with the presence of own children under the age of 18 years. This pattern has the highest support among the association rules involving breast cancer mortality rate. The beige color indicates areas having a high rate of breast cancer mortality among black women.
Results of correlation analysis between cancer mortality rates and socioeconomic variables using Pearson's product-moment correlation
| Cancer mortality rate | Socioeconomic variable | Correlation coefficient |
| Black female lung cancer | Women who are widowed | -0.468 |
| Black female lung cancer | Women who are divorced | 0.449 |
| Black female lung cancer | Houses that have complete plumbing | 0.438 |
| Black female breast cancer | Women who are never married | 0.438 |
| Black female breast cancer | Women who are married and living with their spouse | -0.418 |
| Black female breast cancer | Black female unemployment rate | 0.486 |
| Black female colorectal cancer | Women who are never married | 0.422 |
| Black female colorectal cancer | Black female unemployment rate | 0.514 |
| Black female colorectal cancer | Population employed in Agricultural services industry | -0.435 |
| White male lung cancer | Population born in the south but living elsewhere | 0.451 |
| White male lung cancer | Population working in the same county as residence | -0.455 |
| White male lung cancer | White population with less educational attainment | 0.458 |
| White male lung cancer | White population with high educational attainment | -0.458 |
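For readers unfamiliar with the statistic in the table above, Pearson's product-moment correlation is the covariance of two variables divided by the product of their standard deviations. The following is a minimal sketch of that formula; the function name `pearson_r` and the sample values are made up for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (numerator) and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical values only: a socioeconomic proportion and a mortality rate per area.
    proportion = [0.10, 0.20, 0.30, 0.40, 0.50]
    rate = [20.0, 40.0, 50.0, 40.0, 50.0]
    print(f"r = {pearson_r(proportion, rate):.3f}")
```

A coefficient summarizes the linear relationship across all areas at once, whereas an association rule describes only the subset of areas falling in particular classes (for example "high" and "high").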
Comparison between selected results of statistical correlation analysis and association rule mining
| Type | Rule | Rule value: correlation coefficient for correlation rule and (support, confidence, lift) for association rule. |
| Correlation rule | White male lung cancer and percent population working in the same county as residence | -0.455 |
| Correlation rule | White male lung cancer and percent white population with less educational attainment | 0.458 |
| Correlation rule | White male lung cancer and percent white population with high educational attainment | -0.458 |
| Association rule | White male lung cancer mortality is low → percent population working in the same county as residence is high | 8.706%, 45.45%, 2.27 |
| Association rule | White male lung cancer mortality is high → percent white with less educational attainment is high | 8.955%, 46.45%, 2.32 |
| Association rule | White male lung cancer mortality is high → percent white population with high educational attainment is low | 8.831%, 45.81%, 2.30 |
Descriptive statistics for cancer mortality rates used in the analysis. Mortality rates are reported per 100,000 population and are age-, sex-, and race-adjusted to the 1940 standard US population
| Cancer mortality rate | Number of records | Minimum | Maximum | Mean | Standard Deviation |
| White male lung cancer | 774 | 9.41 | 119.29 | 58.9193 | 14.46512 |
| White male colorectal cancer | 654 | 7.52 | 25.46 | 16.1918 | 3.07422 |
| White male prostate cancer | 730 | 6.46 | 27.2 | 15.218 | 2.53059 |
| White female lung cancer | 684 | 8.61 | 49.36 | 24.7711 | 5.65126 |
| White female breast cancer | 699 | 9.46 | 37.31 | 21.5052 | 3.23601 |
| White female colorectal cancer | 622 | 4.3 | 21.18 | 11.0973 | 2.12812 |
| Black male lung cancer | 290 | 0 | 245.2 | 88.2525 | 20.70256 |
| Black male colorectal cancer | 114 | 0 | 36.89 | 21.3869 | 4.7887 |
| Black male prostate cancer | 244 | 0 | 61.38 | 35.7475 | 8.09248 |
| Black female lung cancer | 153 | 0 | 56.06 | 24.8152 | 9.62457 |
| Black female breast cancer | 158 | 0 | 43.21 | 26.3151 | 7.58142 |
| Black female colorectal cancer | 131 | 0 | 29.48 | 15.0637 | 4.82983 |
Data variables used in association rule mining of cancer mortality rates and socioeconomic characteristics
| Dataset | Variable | Source |
| Cancer mortality rates | Lung cancer among white/black men/women. | All cause mortality shapefile from National Atlas of United States of America. |
| Socioeconomic characteristics | Number of persons in household: | United States Census Bureau: 1990 Census |
Descriptive statistics for selected socioeconomic variables used in the study. Except for unemployment rates, which are expressed as percentages, the variables are proportions of their corresponding universe
| Socioeconomic variable | Number of HSAs | Minimum | Maximum | Mean | Std. Deviation |
| White male householder with no wife present and with no own children under the age of 18 years. | 804 | 0 | 0.03 | 0.00969 | 0.001735 |
| Black female householder with no husband present and with presence of own children under the age of 18 years in the same household | 804 | 0 | 0.13 | 0.01279 | 0.020477 |
| Black female householder with no husband present and with no own children under the age of 18 years | 804 | 0 | 0.08 | 0.00757 | 0.012746 |
| Women who are never married | 804 | 0.08 | 0.77 | 0.1991 | 0.060248 |
| Women who are married and now living with their spouse | 804 | 0.31 | 0.86 | 0.5731 | 0.064284 |
| Women who are widowed | 804 | 0.05 | 0.37 | 0.13571 | 0.03033 |
| Women who are divorced | 804 | 0.008 | 0.3 | 0.08327 | 0.025265 |
| Population born in the south but now living elsewhere | 804 | 0.004 | 0.38 | 0.08357 | 0.062408 |
| Population working in the same county as residence | 804 | 0.24 | 0.7 | 0.44745 | 0.077879 |
| Households with complete plumbing facilities | 804 | 0.64 | 0.99 | 0.97546 | 0.021435 |
| Households with no plumbing facilities | 804 | 0.001 | 0.35 | 0.01678 | 0.020128 |
| Black female unemployment rate | 804 | 0 | 100 | 11.93057 | 11.867286 |
| Population employed in services industry | 804 | 0 | 0.88 | 0.27529 | 0.078398 |
| Population employed in agricultural – services industry | 804 | 0 | 0.23 | 0.01511 | 0.01745 |
| White with less educational attainment | 804 | 0.23 | 0.85 | 0.59597 | 0.097935 |
| White with high educational attainment | 804 | 0.14 | 0.76 | 0.39404 | 0.097948 |