Polina V Kukhareva, Tanner J Caverly, Haojia Li, Hormuzd A Katki, Li C Cheung, Thomas J Reese, Guilherme Del Fiol, Rachel Hess, David W Wetter, Yue Zhang, Teresa Y Taft, Michael C Flynn, Kensaku Kawamoto.
Abstract
OBJECTIVE: The US Preventive Services Task Force (USPSTF) requires an estimate of lifetime pack-years to determine lung cancer screening eligibility. Leading electronic health record (EHR) vendors calculate pack-years using only the most recently recorded smoking data. The objective was to characterize issues in EHR smoking data and to propose an approach that addresses them using longitudinal smoking data.
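Pack-years are packs smoked per day multiplied by years smoked. The sketch below is a minimal Python illustration, not the study's code, of a Baseline-style calculation (only the most recent record is used) combined with the screening criteria reflected in the eligibility table (age 50–80, at least 20 pack-years, and currently smoking or quit fewer than 15 years ago). The `SmokingRecord` class, its field names, and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingRecord:
    """One EHR smoking observation (hypothetical field names)."""
    status: str                     # "current", "former", or "never"
    packs_per_day: Optional[float]  # None when not documented
    years_smoked: Optional[float]   # None when not documented


def baseline_pack_years(records: list[SmokingRecord]) -> Optional[float]:
    """Baseline-style: packs-per-day x years-smoked from the most recent
    record only (records are assumed to be in time order)."""
    latest = records[-1]
    if latest.packs_per_day is None or latest.years_smoked is None:
        return None  # insufficient data, as in issue 1 of the data-issues table
    return latest.packs_per_day * latest.years_smoked


def meets_screening_criteria(age: int, status: str, pack_years: Optional[float],
                             years_since_quit: Optional[float]) -> Optional[bool]:
    """Age 50-80, >= 20 pack-years, and current smoker or quit < 15 years ago.
    Returns None when the available data cannot decide eligibility."""
    if not 50 <= age <= 80 or status == "never":
        return False
    if pack_years is None:
        return None
    if pack_years < 20:
        return False
    if status == "current":
        return True
    if years_since_quit is None:
        return None
    return years_since_quit < 15


history = [SmokingRecord("current", 1.0, 20), SmokingRecord("current", 0.5, 20)]
pack_years = baseline_pack_years(history)  # uses only the latest record: 0.5 x 20
print(pack_years, meets_screening_criteria(62, "current", pack_years, None))  # 10.0 False
```

Returning `None` rather than `False` keeps "not enough data" separate from "not eligible", mirroring the "sufficient data" rows of the results table.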
Keywords: electronic health records; lung cancer screening; lung cancer screening eligibility; pack-years; self-reported smoking history
Year: 2022 PMID: 35167675 PMCID: PMC9006678 DOI: 10.1093/jamia/ocac020
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. Smoking history collection form.
Figure 2. Example of applying the Baseline and Longitudinal Approaches to a patient who smoked 1 pack-per-day for 20 years and then switched to 0.5 packs-per-day.
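The Figure 2 scenario can be reproduced with a simplified model in which a smoking history is a list of constant-rate periods. This Python sketch assumes the patient smoked 4 years at the reduced rate (a hypothetical value; the caption does not state it) and that the most recent record's years-smoked covers the whole history. `SmokingPeriod`, `baseline_estimate`, and `longitudinal_estimate` are illustrative names; the study's algorithms work from dated EHR observations rather than ready-made periods.

```python
from typing import NamedTuple


class SmokingPeriod(NamedTuple):
    """A stretch of smoking at a constant rate (hypothetical structure)."""
    packs_per_day: float
    years: float


def baseline_estimate(periods: list[SmokingPeriod]) -> float:
    """Baseline-style: the most recent packs-per-day is applied to the
    total years smoked, overwriting the rate of earlier periods."""
    total_years = sum(p.years for p in periods)
    return periods[-1].packs_per_day * total_years


def longitudinal_estimate(periods: list[SmokingPeriod]) -> float:
    """Longitudinal-style: each period contributes its own
    packs-per-day x years, so a later change leaves earlier periods intact."""
    return sum(p.packs_per_day * p.years for p in periods)


# Figure 2 scenario: 1 pack/day for 20 years, then 0.5 pack/day
# (the 4-year second period is a hypothetical value).
history = [SmokingPeriod(1.0, 20), SmokingPeriod(0.5, 4)]
print(baseline_estimate(history))      # 12.0
print(longitudinal_estimate(history))  # 22.0
```

Under these assumptions the Baseline-style estimate falls below the 20 pack-year threshold while the period-based sum exceeds it, which is the kind of eligibility change reported in the results table.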
Figure 3. Patient flow through inclusion and exclusion criteria.
Patient characteristics
| Characteristic | Patients (N = 16 874), n (%) |
|---|---|
| Age, years | |
| 50–59 | 5926 (35.1%) |
| 60–69 | 6438 (38.2%) |
| 70–80 | 4510 (26.7%) |
| Female gender | 8057 (47.7%) |
| Race/ethnicity | |
| Non-Hispanic White | 13 230 (78.4%) |
| Non-Hispanic Black/African American | 324 (1.9%) |
| Hispanic | 1822 (10.8%) |
| Other race/ethnicity | 1498 (8.9%) |
| Smoking status based on last record | |
| Current smoker | 4151 (24.6%) |
| Former smoker | 12 723 (75.4%) |
Other race/ethnicity includes non-Hispanic participants with race other than White or Black or those who chose not to disclose race.
Data issues that could affect Baseline Approach
| Data issue that could affect Baseline Approach | Patients affected, n (%) | Illustrative example | How issue is addressed in Longitudinal Approach |
|---|---|---|---|
| 1. Insufficient data to calculate pack-years from the most recent smoking record | 7204 (42.7%) | The patient is recorded as a smoker, but the most recent smoking record contains no information on packs-per-day or smoking duration. | Packs-per-day are taken from the first available observation; if never available, they are imputed as the sample median (0.5 packs-per-day). If years-smoked are missing from the most recent observation, the algorithm uses data from previous observations, increasing the duration by the time elapsed since those observations. |
| 2. Insufficient data to calculate pack-years from the longitudinal records | 6708 (39.8%) | The patient is recorded as a smoker, but the longitudinal records contain no information on packs-per-day or smoking duration. | The Longitudinal Approach does not address this issue. |
| 3. Pack-years did not change for over a year while the patient was a current smoker | 4235 (25.1%) | A patient remained recorded as a 20-year smoker for the past 15 years. | Instead of using years-smoked directly to calculate pack-years, years-smoked is used to calculate the start date of the first smoking period, and pack-years are calculated from the duration of that period. |
| 4. Unknown smoking quit date for a former smoker | 2942 (17.4%) | A patient was recorded as a smoker and later as a former smoker, but no quit date was entered. | When the quit date is missing for a former smoker, the date of the transition to former smoker is used as the quit date. |
| 5. An increase or decrease in packs-per-day in the most recent observation changes pack-years for previous periods | 2852 (16.9%) | A patient smoked 1 pack-per-day for 20 years, then cut down to 0.5 packs-per-day, so pack-years decreased from 20 to 10. | Decreases and increases in packs-per-day do not affect results for previous periods. This assumption was tested in the sensitivity analysis. |
| 6. Pack-years decreased over time (caused by decreased years-smoked or packs-per-day) | 2001 (11.9%) | A patient originally reported smoking for 30 years, then later reported smoking for 20 years. | Pack-years for each subsequent period are added to the pack-years of the first period. |
| 7. Smoking quit date changed to an earlier date | 711 (4.2%) | A patient was documented as having quit smoking on January 01, 2010, but the most recent observation overrode the quit date with January 01, 2000. | The latest smoking quit date on record is used to determine how long a patient has been a former smoker. |
| 8. Patient became a ‘never smoker’ after being a current or former smoker | 512 (3%) | A patient was recorded as a smoker for a year, then was recorded as a never smoker. | If detailed smoking information is available, it is used to estimate tobacco exposure. |
| 9. New smoking quit date not recorded after the patient quit smoking more than once | 234 (1.4%) | A patient quit smoking on January 01, 1999, then restarted smoking and quit again on January 01, 2009. However, January 01, 1999 was the only quit date recorded. | If the recorded quit date is before the last known smoking date, the last known transition from smoking to not smoking is used as the quit date. |
| 10. The duration between recorded smoking start and end dates did not correspond to recorded years-smoked | 213 (1.3%) | A patient was documented as having started smoking on January 01, 2000 and having quit smoking on January 01, 2010. However, the smoking duration was recorded as 20 years. | The start date of the first period is calculated as the earliest date across all records indicating when the patient started smoking. |
| 11. Recorded as having started smoking before age 5 | 91 (0.5%) | A patient was recorded as having started smoking at age 3. | For patients recorded as starting to smoke before age 5, the smoking start date was moved to the 5th birthday. |
| 12. Recorded as smoking more than 5 packs per day | 36 (0.2%) | A patient was recorded as smoking 10 packs-per-day for 20 years. | Packs-per-day values >5 are divided by 20, on the assumption that cigarettes-per-day were mistakenly entered as packs-per-day. |
| Any of the above | 13 833 (82%) | | |
One record can have more than one issue.
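Several of the Longitudinal Approach rules in the table (issues 1, 4, 7, 9, 11, and 12) are simple enough to sketch directly. The Python functions below are a hypothetical illustration of those rules as the table describes them, not the published implementation; the function names, the `(observation date, status, recorded quit date)` tuple layout, applying the >5 rescaling to the first documented value, and the 1 March fallback for 29 February birthdays are all assumptions.

```python
from datetime import date
from typing import Optional

MEDIAN_PPD = 0.5          # sample median packs-per-day used for imputation (issue 1)
MAX_PLAUSIBLE_PPD = 5.0   # values above this are treated as cigarettes (issue 12)
CIGARETTES_PER_PACK = 20
MIN_START_AGE = 5         # earliest accepted smoking-start age (issue 11)


def clean_packs_per_day(observed: list[Optional[float]]) -> float:
    """Issues 1 and 12: use the first documented packs-per-day (values in
    time order), rescale values > 5 as cigarettes, else impute the median."""
    for value in observed:
        if value is not None:
            if value > MAX_PLAUSIBLE_PPD:
                return value / CIGARETTES_PER_PACK
            return value
    return MEDIAN_PPD


def clean_start_date(birth_date: date, start_date: date) -> date:
    """Issue 11: move start dates earlier than age 5 to the 5th birthday."""
    try:
        fifth_birthday = birth_date.replace(year=birth_date.year + MIN_START_AGE)
    except ValueError:  # born on 29 February; the target year is not a leap year
        fifth_birthday = date(birth_date.year + MIN_START_AGE, 3, 1)
    return max(start_date, fifth_birthday)


def resolve_quit_date(
    observations: list[tuple[date, str, Optional[date]]],
) -> Optional[date]:
    """Issues 4, 7, and 9. Each observation is (observation date, status,
    recorded quit date or None), sorted by observation date."""
    last_smoking: Optional[date] = None  # last observation as a current smoker
    transition: Optional[date] = None    # first 'former' observation after it
    for obs_date, status, _ in observations:
        if status == "current":
            last_smoking, transition = obs_date, None
        elif status == "former" and last_smoking is not None and transition is None:
            transition = obs_date
    recorded = [qd for _, _, qd in observations if qd is not None]
    latest_recorded = max(recorded) if recorded else None  # issue 7: latest quit date
    if latest_recorded is not None and (
        last_smoking is None or latest_recorded >= last_smoking
    ):
        return latest_recorded
    # Issue 4 (no quit date) and issue 9 (quit date predates later smoking):
    # fall back to the last observed current-to-former transition.
    return transition
```

For the issue 9 example, a recorded quit date of 1999 followed by a later "current" observation is ignored in favour of the later transition to former smoker.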
Figure 4. Diagram of patient eligibility for lung cancer screening according to the Baseline and Longitudinal Approaches.
Figure 5. Number of eligible patients identified using the Baseline and Longitudinal Approaches.
Figure 6. Scatter plot of pack-years estimated by the Baseline and Longitudinal Approaches for current smokers. The red line divides the plane into equal parts. The blue line is a regression curve, fitted by local polynomial regression, relating the pack-years estimated by the 2 algorithms. Forty-seven points are omitted because pack-years estimated by either algorithm exceeded 100.
Patient eligibility for lung cancer screening according to the Baseline and Combined Approaches
| Measure | Baseline Approach | Combined Approach | Absolute change | Relative change | P value |
|---|---|---|---|---|---|
| Sufficient data to calculate pack-years and years-quit | 8164 (48.4%) | 10 171 (60.3%) | 2007 (11.9%) | 24.6% (23.3%, 25.9%) | <.001 |
| Current smoker, sufficient data to calculate pack-years | 2592 (15.4%) | 2769 (16.4%) | 177 (1%) | 6.8% (5.8%, 7.9%) | <.001 |
| Smoked <20 pack-years (did not meet criteria) | 1405 (8.3%) | 1620 (9.6%) | 215 (1.3%) | 15.3% (13.2%, 17.5%) | <.001 |
| Smoked ≥20 pack-years (met criteria) | 1187 (7%) | 1759 (10.4%) | 572 (3.4%) | 48.2% (43.4%, 53.2%) | <.001 |
| Former smoker, sufficient data to calculate pack-years and years-quit | 5572 (33%) | 7441 (44.1%) | 1869 (11.1%) | 33.5% (31.6%, 35.6%) | <.001 |
| Quit ≥15 years ago (did not meet criteria) | 3298 (19.5%) | 4225 (25%) | 927 (5.5%) | 28.1% (26%, 30.3%) | <.001 |
| Smoked <20 pack-years (did not meet criteria) | 1233 (7.3%) | 2067 (12.2%) | 834 (4.9%) | 67.6% (61.9%, 73.3%) | <.001 |
| Smoked ≥20 pack-years (met criteria) | 1041 (6.2%) | 1577 (9.3%) | 536 (3.2%) | 51.5% (46.7%, 56.8%) | <.001 |
| Average pack-years for current smokers with sufficient data for both algorithms | 22.7 (27.5) | 30.5 (22.2) | 7.8 (6.7, 8.8) | 34.3% (28.5%, 39.8%) | <.001 |
| Average pack-years for former smokers with sufficient data for both algorithms | 18.1 (23.4) | 20.6 (21.5) | 2.5 (1.8, 3) | 13.8% (9.9%, 17.4%) | <.001 |
| Average years-quit for former smokers with sufficient data for both algorithms | 21.4 (15.3) | 20.6 (15.1) | −0.7 (−0.9, −0.6) | −3.3% (−4.3%, −2.8%) | <.001 |
| Met USPSTF lung cancer screening criteria (combining current and former smokers) | 2228 (13.2%) | 3329 (19.7%) | 1101 (6.5%) | 49.4% (46%, 53%) | <.001 |
| Met USPSTF lung cancer screening criteria, high-benefit population (Bach model) | 988 (5.9%) | 1387 (8.2%) | 399 (2.4%) | 40.4% (36%, 45.2%) | <.001 |
USPSTF: US Preventive Services Task Force.
Average pack-years and years-quit are calculated for Baseline and Longitudinal Approaches.
P < .05.
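The point estimates in the change columns follow from the counts: absolute change is Combined minus Baseline, also expressed as a percentage of all 16 874 patients (the sum of the age groups in the patient-characteristics table), and relative change is that difference divided by the Baseline count. A short Python check with a hypothetical `change_summary` helper reproduces the row for patients meeting USPSTF criteria; it does not reproduce the intervals or P values, whose method is not described here.

```python
N_PATIENTS = 16_874  # total of the age groups in the patient-characteristics table


def change_summary(baseline: int, combined: int,
                   n_total: int = N_PATIENTS) -> tuple[int, float, float]:
    """Return (absolute change, absolute change as % of all patients,
    relative change as % of the Baseline count)."""
    absolute = combined - baseline
    return absolute, 100 * absolute / n_total, 100 * absolute / baseline


# Row: met USPSTF criteria, current and former smokers combined.
absolute, pct_of_cohort, relative = change_summary(2228, 3329)
print(absolute, round(pct_of_cohort, 1), round(relative, 1))  # 1101 6.5 49.4
```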