Chu Yu Chin, Meng Yu Weng, Tzu Chieh Lin, Shyr Yuan Cheng, Yea Huei Kao Yang, Vincent S Tseng.
Abstract
Rheumatoid arthritis (RA) is a chronic autoimmune rheumatic disease that can cause painful swelling in the joint lining, morning stiffness, and joint deformation/destruction. These symptoms decrease both quality of life and life expectancy. However, if RA can be diagnosed in the early stages, it can be controlled with pharmacotherapy. Although many studies have examined the possibility of early assessment and diagnosis, few have considered the relationship between significant risk factors and the early assessment of RA. In this paper, we present a novel framework for early RA assessment that utilizes data preprocessing, risk pattern mining, validation, and analysis. Under our proposed framework, two types of risk pattern can be discovered. Type I refers to well-known risk patterns that have been identified by existing studies, whereas Type II denotes risk patterns whose relationship to RA has rarely or never been reported in the literature. These Type II patterns are valuable in supporting novel hypotheses in clinical trials of RA, and constitute the main contribution of this work. To ensure the robustness of our experimental evaluation, we use a nationwide clinical database containing information on 1,314 RA-diagnosed patients over a 12-year follow-up period (1997–2008) and 965,279 non-RA patients. Our proposed framework is applied to this large-scale population-based dataset, and is shown to effectively discover rich RA risk patterns. These patterns may assist physicians in patient assessment, and enhance opportunities for early detection of RA. The proposed framework is broadly applicable to the mining of risk patterns for major disease assessments, enabling the identification of early risk patterns that are significantly associated with a target disease.
Year: 2015 PMID: 25875441 PMCID: PMC4395408 DOI: 10.1371/journal.pone.0122508
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Workflow of data mining process for RA disease early detection.
Fig 2. Timeline for data collection of RA group.
Fig 3. Risk Pattern Viewer.
Clinical course variables among patients categorized by RA diagnosis and gender.
| Characteristic | Sex | RA group | Non-RA group | p-value |
|---|---|---|---|---|
| Age (mean ± SD) | All | 58.7±15.4 | 42.5±20.6 | 0.00 |
| | Female | 58.1±14.9 | 42.53±20.5 | 0.00 |
| | Male | 60.5±16.8 | 42.4±20.7 | 0.00 |
| Number of patients | Female | 1,013 | 470,520 | 0.00 |
| | Male | 301 | 494,759 | 0.00 |
| | | F:485,934 M:513,703 | | |
| Follow-up period | | 1997–2008 | | |
| Database | | National Health Insurance Research Database (NHIRD) | | |
Fig 4. Age distribution of cohort.
Fig 5. Trend of RA assessment effects under different support thresholds.
Fig 6. Distribution of RA patients under different numbers of diagnosis records.
Fig 7. Trend of RA assessment effects under different diagnosis record numbers.
Characteristics of mined patterns.
| | RA group | Non-RA group |
|---|---|---|
| | 281,208 | 227,453,049 |
| | 3,169 | 16,518 |
| | 32,560 | 2,914 |
| | 497 (16%) | 308 (2%) |
Risk pattern distribution between the RA group and non-RA group for all diseases categorized in ICD-9-CM.
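| Disease Category | ICD-9-CM code | RA | Non-RA | RR | CI (95%) |
|---|---|---|---|---|---|
| Infectious and parasitic diseases | 001–139 | 2.46% | 7.62% | 0.32 | (0.27–0.37) |
| Neoplasms | 140–239 | 1.92% | 0.31% | 6.21 | (5.71–7.28) |
| Endocrine, nutritional and metabolic diseases, and immunity disorders | 240–279 | 9.70% | 0.55% | | (15.36–21.74) |
| Diseases of the blood and blood-forming organs | 280–289 | 1.66% | 0.03% | | (29.23–286.57) |
| Mental disorders | 290–319 | 9.85% | 0.14% | | (51.16–144.86) |
| Diseases of the nervous system and sense organs | 320–389 | 20.10% | 18.74% | 1.07 | (1.04–1.10) |
| Diseases of the circulatory system | 390–459 | 10.54% | 0.82% | | (11.55–14.79) |
| Diseases of the respiratory system | 460–519 | 31.45% | 44.89% | 0.7 | (0.68–0.72) |
| Diseases of the digestive system | 520–579 | 35.32% | 42.62% | 0.83 | (0.81–0.85) |
| Diseases of the genitourinary system | 580–629 | 16.06% | 6.49% | 2.48 | (2.46–2.49) |
| Complications of pregnancy, childbirth, and the puerperium | 630–679 | 0% | 0.58% | 0 | (0.00–0.00) |
| Diseases of the skin and subcutaneous tissue | 680–709 | 8.52% | 18.98% | 0.45 | (0.41–0.48) |
| Diseases of the musculoskeletal system and connective tissue | 710–739 | 57.18% | 1.30% | | (38.33–51.8) |
| Congenital anomalies | 740–759 | 0% | 0% | 0 | (0.00–0.00) |
| Certain conditions originating in the perinatal period | 760–779 | 0% | 0% | 0 | (0.00–0.00) |
| Symptoms, signs, and ill-defined conditions | 780–799 | 21.91% | 16.61% | 1.32 | (1.29–1.35) |
| Injury and poisoning | 800–999 | 9.52% | 17.02% | 0.56 | (0.52–0.60) |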
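The RR column above is a relative risk: the proportion affected in the RA group divided by the proportion in the non-RA group (for example, 20.10% / 18.74% ≈ 1.07). The following is a minimal sketch of that calculation with a 95% confidence interval. The interval uses the Katz log method, which is an assumption here, since this record does not say how the paper derived its intervals. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def relative_risk(exposed_cases, exposed_total, control_cases, control_total, z=1.96):
    """Return (RR, lower, upper) for a two-group comparison.

    The interval uses the Katz log method (an assumption; the paper's
    own method is not given in this record).
    """
    if exposed_cases <= 0 or control_cases <= 0:
        raise ValueError("the log method needs at least one case in each group")
    p_exposed = exposed_cases / exposed_total
    p_control = control_cases / control_total
    rr = p_exposed / p_control
    # Standard error of ln(RR)
    se = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / control_cases - 1 / control_total
    )
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts for illustration only.
rr, ci_low, ci_high = relative_risk(30, 100, 10, 100)
print(f"RR = {rr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An RR above 1 with an interval that excludes 1 indicates that the category is more frequent in the RA group than in the non-RA group.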
Fig 8. Trend in literature for single disease risk patterns in PubMed.
Fig 9. Trend in literature for PubMed pattern related mining.
Sensitivity and specificity of the RA disease risk model with ten-fold cross-validation.
| Ten-fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| | 78% | 81% | 78% | 85% | 82% | 73% | 84% | 78% | 82% | 82% |
| | 74% | 72% | 74% | 73% | 73% | 71% | 70% | 73% | 72% | 74% |
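Per-fold sensitivity and specificity of the kind reported above can be computed as in the sketch below. It is only an illustration: the labels and predictions are hypothetical, the helper names are not from the paper, and no model is trained (in a real evaluation the model would be refit on the training indices of each fold before predicting the held-out fold).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = RA, 0 = non-RA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 0, 0] * 5
y_pred = [1, 0, 0, 0] * 5

fold_scores = []
for _train, test in k_fold_indices(len(y_true), k=5):
    fold_scores.append(
        sensitivity_specificity([y_true[i] for i in test], [y_pred[i] for i in test])
    )

for fold, (sens, spec) in enumerate(fold_scores, start=1):
    print(f"fold {fold}: sensitivity={sens:.0%} specificity={spec:.0%}")
```

Reporting each fold separately, as the table does, shows how much the two measures vary with the choice of held-out patients.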
RA risk patterns of autoimmune-related diseases.
| RA Risk Patterns | Confidence | Support | RR | Number of related papers, T1 (a) | Number of related papers, T2 (b) |
|---|---|---|---|---|---|
| | 93.09% | 1.31% | 13.5 | 20, 1852, 59, 169 | 0 |
| Benign | 90.56% | 1.64% | 9.6 | 74, 426, 30 | 0 |
| | 90.46% | 1.64% | 9.5 | 4, 25 | 0 |
| | 90.29% | 1.31% | 9.3 | 15, 1290, 374 | 0 |
| | 89.75% | 1.64% | 8.75 | 16, 5946, 68 | 0 |
| | 88.74% | 1.64% | 7.9 | 916, 1508, 126 | 0 |
| | 88.49% | 1.31% | 7.7 | 123 | 0 |
| | 88.23% | 1.8% | 1.7 | 121 | 0 |
| | 88.06% | 1.96% | 7.4 | 62, 2503, 273 | 0 |
| | 87.45% | 2.29% | 6.9 | 49 | 0 |
| | 86.55% | 1.64% | 6.4 | 66, 817, 72 | 1 |
| | 84.94% | 1.64% | 5.6 | 794, 662, 137 | 0 |
| | 83.57% | 1.96% | 5.1 | 41, 25 | 0 |
| | 83.56% | 1.64% | 5.1 | 10, 7 | 0 |
| | 77.14% | 5.56% | 3.4 | 47, 5 | 0 |
| | 76.58% | 11.45% | 3.2 | 21 | 2 |
| | 74.80% | 26.84% | 2.9 | 744 | 2 |
| | 72.95% | 12.6% | 2.7 | 13, 2 | 0 |
| | 72.9% | 5.24% | 2.7 | 42 | 0 |
| | 72.21% | 16.36% | 2.6 | 264 | 0 |
| | 70.75% | 16.85% | 2.4 | 65 | 1 |
| | 70.64% | 8.84% | 2.4 | 5461 | 0 |
| | 70.43% | 8.35% | 2.4 | 2, 0 | 0 |
| | 70% | 8.18% | 2.3 | 122 | 0 |

Bold text denotes autoimmune-related diseases. The query term has been underscored, and repetitions are not marked again.
(a) T1 = Number of papers related to Type I pattern.
(b) T2 = Number of papers related to Type II pattern.
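The Confidence and Support columns can be read in the usual association-rule sense, sketched below for a rule of the form "pattern → RA". This is an assumption: the record does not give the paper's exact definitions, and the diagnosis codes, patient records, and function name here are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Each record is (set of diagnosis codes in the patient's history, has_ra).
Record = Tuple[FrozenSet[str], bool]


def pattern_metrics(records: List[Record], pattern: FrozenSet[str]) -> Tuple[float, float]:
    """Return (support, confidence) for the rule pattern -> RA.

    support    = share of all records that contain the pattern and have RA
    confidence = share of pattern-containing records that have RA
    (standard association-rule definitions, assumed rather than taken
    from the paper)
    """
    n = len(records)
    outcomes = [has_ra for codes, has_ra in records if pattern <= codes]
    both = sum(outcomes)
    support = both / n if n else 0.0
    confidence = both / len(outcomes) if outcomes else 0.0
    return support, confidence


# Hypothetical records with placeholder codes "A", "B", "C".
records: List[Record] = [
    (frozenset({"A", "B"}), True),
    (frozenset({"A", "B", "C"}), True),
    (frozenset({"A", "B"}), False),
    (frozenset({"A"}), False),
    (frozenset({"C"}), False),
]

support, confidence = pattern_metrics(records, frozenset({"A", "B"}))
print(f"support={support:.2%} confidence={confidence:.2%}")
```

Under these definitions a pattern can have high confidence yet low support, which matches the shape of the top rows of the table: the pattern is rare, but most patients who show it are in the RA group.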