Satoshi Irino, Yukio Kurihara.
Abstract
We evaluated quasi-healthy cohorts (model cohorts), derived from clinical data, to determine how well they simulated control cohorts. Control cohorts comprised individuals extracted from a public checkup database in Japan, under the condition that their values for 3 basic laboratory tests fall within specific reference ranges (3Ts condition). Model cohorts comprised outpatients, extracted from a clinical database at a hospital, under the 3Ts condition or under the condition that their values for 4 laboratory tests fall within specific reference ranges (4Ts condition). Because even a patient with a serious illness, such as cancer, may present with normal values on basic laboratory tests, one additional condition was added: the duration (1 or 3 months; 1M or 3M) during which patients were not hospitalized after their first laboratory test. For evaluations, cohorts were specified by age and sex. The 4Ts + 3M condition was the most effective condition, under which model cohorts were used to successfully simulate age-dependent changes and sex differences in laboratory test values for control cohorts. Therefore, by properly setting the conditions for extracting quasi-healthy individuals, we can derive cohorts from clinical data to simulate various types of cohorts. Although some issues with the proposed method remain to be solved, this approach presents new possibilities for using clinical data for cohort studies.
Keywords: Cohort study; clinical data; derivation of cohorts; secondary use; statistical analysis
Year: 2018 PMID: 29872307 PMCID: PMC5977427 DOI: 10.1177/1178222618777758
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Number of checkup examinees, and their ratio to the population of Kochi Prefecture, for the period from 2003 to 2007.
| Age brackets, y | 40-49 | 50-59 | 60-69 | 70-79 | All age brackets |
|---|---|---|---|---|---|
| Male | 7579 (3%) | 15 306 (5%) | 31 947 (13%) | 35 081 (17%) | 89 914 (9%) |
| Female | 16 471 (7%) | 37 033 (12%) | 61 440 (22%) | 58 104 (21%) | 173 047 (16%) |
| Males + females | 24 050 (5%) | 52 339 (9%) | 93 387 (18%) | 93 185 (19%) | 262 961 (13%) |
Figure 1. Procedure for extracting the cohorts.
Clinical reference ranges used at the Kochi Medical School Hospital.
| Tests | Males, lower limit | Males, upper limit | Females, lower limit | Females, upper limit |
|---|---|---|---|---|
| Hb, g/dL | 13.2 | 17.2 | 10.8 | 14.9 |
| ALT, IU/dL | 8 | 42 | 6 | 27 |
| CRN, mg/dL | 0.6 | 1.1 | 0.4 | 0.8 |
| WBC, cells/μL | 3600 | 9600 | 3000 | 8500 |
Abbreviations: ALT, alanine aminotransferase; CRN, blood creatinine level; Hb, hemoglobin; WBC, white blood cell.
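The extraction conditions described in the abstract can be sketched as a simple filter over patient records. This is a minimal illustration, not the authors' implementation: the reference ranges are copied from the table above, but the assignment of tests to the 3Ts and 4Ts conditions, the inclusive bounds, and the record layout (`sex`, `values`, `months_without_admission`) are assumptions made for the example.

```python
# Reference ranges from the table above: (lower limit, upper limit) per sex.
REFERENCE_RANGES = {
    "Hb":  {"M": (13.2, 17.2), "F": (10.8, 14.9)},
    "ALT": {"M": (8, 42),      "F": (6, 27)},
    "CRN": {"M": (0.6, 1.1),   "F": (0.4, 0.8)},
    "WBC": {"M": (3600, 9600), "F": (3000, 8500)},
}

# ASSUMPTION: this section does not state which tests form the 3Ts and 4Ts
# conditions; the split below is hypothetical and only illustrates the idea.
TESTS_3TS = ("Hb", "ALT", "CRN")
TESTS_4TS = ("Hb", "ALT", "CRN", "WBC")


def within_ranges(sex, values, tests):
    """Return True if every listed test value lies inside its reference range.

    ASSUMPTION: bounds are treated as inclusive; a missing value fails.
    """
    for test in tests:
        lower, upper = REFERENCE_RANGES[test][sex]
        value = values.get(test)
        if value is None or not (lower <= value <= upper):
            return False
    return True


def is_quasi_healthy(patient, tests=TESTS_4TS, free_months=3):
    """Apply a condition such as 4Ts + 3M to one (hypothetical) patient record.

    `patient` is a dict with keys 'sex' ('M' or 'F'), 'values' (test -> number),
    and 'months_without_admission' (months without hospitalization after the
    first laboratory test).
    """
    if not within_ranges(patient["sex"], patient["values"], tests):
        return False
    return patient["months_without_admission"] >= free_months


example = {
    "sex": "M",
    "values": {"Hb": 14.0, "ALT": 20, "CRN": 0.9, "WBC": 5000},
    "months_without_admission": 3,
}
print(is_quasi_healthy(example))  # 4Ts + 3M
print(is_quasi_healthy(example, tests=TESTS_3TS, free_months=0))  # 3Ts only
```

Passing `free_months=1` gives the 1M variant, and `free_months=0` drops the hospitalization condition entirely.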
The number and percentages of public checkup examinees extracted for the control cohorts and of the patients used for the model cohorts.
| Cohorts | Sex | Conditions | 40-49 y | 50-59 y | 60-69 y | 70-79 y | All age brackets |
|---|---|---|---|---|---|---|---|
| Control cohorts | Males | | 5684 (75%) | 12 245 (80%) | 24 919 (78%) | 24 206 (69%) | 67 054 (75%) |
| | Females | | 12 847 (78%) | 29 997 (81%) | 49 766 (81%) | 45 902 (79%) | 138 512 (80%) |
| | Males + females | | 18 531 (77%) | 42 242 (81%) | 74 685 (80%) | 70 108 (75%) | 205 566 (78%) |
| Model cohorts | Males | 3Ts | 346 (53%) | 685 (55%) | 815 (52%) | 722 (43%) | 2568 (50%) |
| | | 4Ts | 301 (46%) | 600 (48%) | 740 (47%) | 669 (40%) | 2310 (45%) |
| | | 4Ts + 1M | 172 (26%) | 291 (23%) | 310 (20%) | 232 (14%) | 1005 (20%) |
| | | 4Ts + 3M | 146 (22%) | 232 (19%) | 233 (15%) | 174 (10%) | 785 (15%) |
| | Females | 3Ts | 593 (65%) | 913 (65%) | 978 (63%) | 1160 (62%) | 3644 (63%) |
| | | 4Ts | 526 (58%) | 833 (60%) | 914 (59%) | 1074 (58%) | 3347 (58%) |
| | | 4Ts + 1M | 299 (33%) | 479 (34%) | 467 (30%) | 419 (22%) | 1664 (29%) |
| | | 4Ts + 3M | 240 (26%) | 395 (28%) | 390 (25%) | 342 (18%) | 1367 (24%) |
| | Males + females | 4Ts + 3M | 386 (25%) | 627 (24%) | 623 (20%) | 516 (15%) | 2152 (20%) |
The average ages for the control and model cohorts under the 4Ts + 3M condition.
| Age brackets, y | Males, control cohorts: average ± SD | Males, control cohorts: median | Males, model cohorts (4Ts + 3M): average ± SD | Males, model cohorts (4Ts + 3M): median | Females, control cohorts: average ± SD | Females, control cohorts: median | Females, model cohorts (4Ts + 3M): average ± SD | Females, model cohorts (4Ts + 3M): median |
|---|---|---|---|---|---|---|---|---|
| 40-49 | 45.0 ± 2.9 | 45.0 | 44.5 ± 2.9 | 45.0 | 44.8 ± 2.9 | 45.0 | 44.9 ± 2.9 | 45.0 |
| 50-59 | 55.1 ± 2.8 | 55.0 | 54.8 ± 2.8 | 55.0 | 55.2 ± 2.7 | 55.0 | 54.7 ± 2.7 | 55.0 |
| 60-69 | 65.0 ± 2.8 | 65.0 | 64.2 ± 2.9 | 64.0 | 64.7 ± 2.8 | 65.0 | 64.3 ± 2.8 | 65.0 |
| 70-79 | 73.8 ± 2.8 | 74.0 | 73.2 ± 2.8 | 73.0 | 73.9 ± 2.8 | 74.0 | 74.4 ± 2.8 | 74.0 |
The P values for the Mann-Whitney and Kolmogorov-Smirnov tests.
| Sex | Age brackets, y | Conditions | Mann-Whitney: Hb | Mann-Whitney: ALT | Mann-Whitney: TC | Mann-Whitney: γ-GTP | Kolmogorov-Smirnov: Hb | Kolmogorov-Smirnov: ALT | Kolmogorov-Smirnov: TC | Kolmogorov-Smirnov: γ-GTP |
|---|---|---|---|---|---|---|---|---|---|---|
| Males | 40-49 | 3Ts | .875 | .260 | .242 | .603 | .946 | .653 | .363 | .911 |
| | | 4Ts | .781 | .388 | .153 | .503 | .985 | .840 | .207 | .574 |
| | | 4Ts + 1M | .365 | .725 | .425 | .139 | .716 | .796 | .600 | .282 |
| | | 4Ts + 3M | .361 | .330 | .157 | .122 | .543 | .647 | .192 | .321 |
| | 50-59 | 3Ts | . | .738 | | .652 | . | .520 | . | .709 |
| | | 4Ts | . | .607 | . | .988 | . | .667 | . | .800 |
| | | 4Ts + 1M | .287 | .990 | .214 | .623 | .724 | .637 | .445 | .609 |
| | | 4Ts + 3M | .178 | .917 | .160 | .923 | .401 | .801 | .258 | .869 |
| | 60-69 | 3Ts | . | . | . | | . | . | . | |
| | | 4Ts | . | . | . | | . | . | . | |
| | | 4Ts + 1M | .886 | . | .420 | . | .836 | . | .648 | . |
| | | 4Ts + 3M | .866 | . | .494 | .054 | .868 | .284 | .651 | . |
| | 70-79 | 3Ts | . | .299 | . | . | .095 | .099 | . | . |
| | | 4Ts | . | .401 | . | . | .146 | .157 | . | . |
| | | 4Ts + 1M | .210 | .956 | .303 | .267 | .511 | .995 | .589 | .101 |
| | | 4Ts + 3M | .487 | .917 | .144 | .141 | .956 | .997 | .397 | .111 |
| Females | 40-49 | 3Ts | .188 | .252 | .557 | .427 | .189 | .255 | .144 | .228 |
| | | 4Ts | .436 | .424 | .575 | .963 | .405 | .594 | .291 | .431 |
| | | 4Ts + 1M | .122 | .647 | .297 | .956 | .206 | .767 | .050 | .666 |
| | | 4Ts + 3M | . | .594 | .268 | .667 | .070 | .636 | .077 | .415 |
| | 50-59 | 3Ts | .464 | . | . | .123 | . | . | | .227 |
| | | 4Ts | .857 | . | . | .172 | .100 | . | . | .377 |
| | | 4Ts + 1M | .942 | .344 | .450 | .308 | .45 | .435 | .452 | .575 |
| | | 4Ts + 3M | .659 | .630 | .382 | .668 | .307 | .675 | .300 | .913 |
| | 60-69 | 3Ts | .841 | . | . | . | .327 | . | . | . |
| | | 4Ts | .778 | .077 | .053 | . | .539 | . | . | .050 |
| | | 4Ts + 1M | . | .265 | .153 | . | .114 | . | .083 | .194 |
| | | 4Ts + 3M | .205 | .246 | .246 | .267 | .366 | .074 | .177 | .673 |
| | 70-79 | 3Ts | .150 | | | . | .078 | | | . |
| | | 4Ts | .159 | . | . | .082 | .107 | | . | .088 |
| | | 4Ts + 1M | .181 | .169 | .251 | .264 | .229 | .118 | .205 | .092 |
| | | 4Ts + 3M | .111 | .240 | .707 | .240 | .201 | .195 | .653 | .153 |
Abbreviations: γ-GTP, γ-glutamyltransferase; ALT, alanine aminotransferase; Hb, hemoglobin; TC, total cholesterol.
*P < .05; **P < .01.
Test items with a significant difference are shown in italics.
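For readers who want to see how P values of this kind are obtained, the two tests can be sketched with the Python standard library alone. This is a hedged illustration and not the software used in the study, which this section does not name: the Mann-Whitney P value uses the normal approximation with a tie correction, and the Kolmogorov-Smirnov P value uses the asymptotic Kolmogorov series with a common small-sample adjustment. The function names and the sample values are hypothetical.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney test: return (U for x, approximate P value).

    ASSUMPTION: normal approximation with tie correction, no continuity
    correction; reasonable for the large cohorts compared here.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


def kolmogorov_smirnov_p(x, y):
    """Two-sample Kolmogorov-Smirnov test: return (D, approximate P value)."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))      # gap between the two ECDFs
    if d == 0:
        return 0.0, 1.0
    en = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * d         # small-sample adjustment
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)


# Hypothetical values for illustration only (not data from the study).
control = [13.5, 14.1, 14.8, 15.0, 15.2, 15.9, 16.3, 14.4, 15.5, 14.9]
model = [13.8, 14.0, 14.6, 15.1, 15.3, 15.7, 16.0, 14.5, 15.4, 15.0]
print(mann_whitney_p(control, model))
print(kolmogorov_smirnov_p(control, model))
```

The Mann-Whitney test is mainly sensitive to a shift in location, whereas the Kolmogorov-Smirnov test responds to any difference between the two empirical distributions, which is presumably why both are reported side by side.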
Figure 2. Evaluation of statistical errors in the 25th, 50th, and 75th percentiles, for males in the control and model cohorts, by the bootstrap method.
Figure 3. Evaluation of statistical errors in the 25th, 50th, and 75th percentiles, for females in the control and model cohorts, by the bootstrap method.
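The bootstrap evaluation named in the captions of Figures 2 and 3 can be illustrated with a short standard-library sketch. The number of resamples, the linear-interpolation percentile rule, the fixed seed, and the function names are assumptions chosen for the example; this section does not describe how the authors configured their bootstrap.

```python
import random
import statistics


def percentile(sorted_values, q):
    """Percentile by linear interpolation between order statistics (q in 0..100)."""
    if not sorted_values:
        raise ValueError("percentile of an empty sample is undefined")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_percentile_errors(values, qs=(25, 50, 75), n_resamples=1000, seed=0):
    """Return {q: (point estimate, bootstrap standard error)} for each percentile.

    ASSUMPTION: resampling with replacement at the original sample size, with
    the standard deviation of the resampled percentiles taken as the error.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = {q: [] for q in qs}
    for _ in range(n_resamples):
        resample = sorted(rng.choices(values, k=n))
        for q in qs:
            estimates[q].append(percentile(resample, q))
    ordered = sorted(values)
    return {q: (percentile(ordered, q), statistics.stdev(estimates[q])) for q in qs}


# Hypothetical laboratory values for illustration only.
sample = [float(v) for v in range(1, 102)]
for q, (estimate, error) in bootstrap_percentile_errors(sample).items():
    print(q, estimate, round(error, 2))
```

Because the model cohorts are far smaller than the control cohorts, their bootstrap errors would be expected to be wider, which is the kind of comparison the two figures present.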