Kenneth F Adams, Eric A Johnson, Jessica Chubak, Aruna Kamineni, Chyke A Doubeni, Diana S M Buist, Andrew E Williams, Sheila Weinmann, V Paul Doria-Rose, Carolyn M Rutter.
Abstract
INTRODUCTION: Electronic health data are potentially valuable resources for evaluating colonoscopy screening utilization and effectiveness. The ability to distinguish screening colonoscopies from exams performed for other purposes is critical for research that examines factors related to screening uptake and adherence and the impact of screening on patient outcomes, but distinguishing between these indications in secondary health data is challenging. The objective of this study is to develop a new and more accurate algorithm for identifying screening colonoscopies using electronic health data.
Keywords: LASSO; ROC; classification; cohort identification; colonoscopy; data use and quality; health information technology; screening
Year: 2015 PMID: 26290883 PMCID: PMC4537082 DOI: 10.13063/2327-9214.1171
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
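Table 1. Candidate Predictors: Variables Included in Selection Algorithms Used for Algorithm Development
| Variable | Codes | Look-back interval (days) | Footnote marker |
| --- | --- | --- | --- |
| Age on procedure date, age squared | | 0 | |
| Gender | | 0 | |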
| Iron deficiency anemia | 280.0, 280.9 | 0–365, 1–365 | g |
| Anemia (1) | 285.1, 285.9 | 0–365, 1–365 | |
| Anemia (2) | 281.9, 285.1, 285.9 | 0–365, 1–365 | g |
| Colitis (1) | 558.9 | 0–365, 1–365 | |
| Colitis (2) | 558.x | 0–365, 1–365 | g |
| Diarrhea (1) | 558.9, 564.5 | 0–365, 1–365 | |
| Diarrhea (2) | 9.2, 9.3, 564.5, 787.91 | 0–365, 1–365 | |
| Diarrhea (3) | 9.2, 9.3, 558.9, 564.5, 787.91 | 0–365, 1–365 | g |
| Diarrhea (4) | 9, 9.0, 9.1, 9.2, 9.3, 558.9, 564.5, 787.91 | 1–365 | g |
| Intestinal obstruction | 560, 560.0, 560.89, 560.9 | 0–365 | |
| Diverticula | 562.1x | 0–365, 1–365 | g |
| Constipation | 564.00, 564.09 | 1–365 | g |
| Irritable colon (Irritable bowel syndrome) | 564.1 | 0–365 | |
| Functional digestive disorder | 564.0, 564.00, 564.09, 564.1, 564.7, 564.81, 564.89, 564.9 | 0–365, 1–365 | g |
| Rectal bleeding, Hemorrhage, BRBPR | 569.3 | 0–365, 1–365 | |
| GI bleed, stool (1), Melena | 578.1 | 0–365, 1–365 | |
| GI bleed, stool (2), Hematochezia | 578.9 | 0–365, 1–365 | |
| GI bleed, stool (3), Melena or hematochezia | 578.1, 578.9 | 0–365, 1–365 | |
| Heme-positive stool | 792.1 | 0–365, 1–365 | |
| GI bleed, stool (4) | 578, 578.1, 578.9, 792.1 | 0–365, 1–365 | |
| Abdominal pain | 789.0, 789.00, 789.01, 789.02, 789.03, 789.04, 789.05, 789.06, 789.07, 789.09 | 0–365, 1–365 | |
| Abdominal swelling | 789.3, 789.30, 789.31, 789.32, 789.33, 789.34, 789.35, 789.36, 789.37, 789.39 | 0–365, 1–365 | |
| Other digestive system symptoms | 787.99 | 0–365, 1–365 | |
| Weight loss | 783.2X, 799.4 | 0–365, 1–365 | |
| Nausea/vomiting | 787.0, 787.01, 787.02, 787.03, 787.04 | 0–365, 1–365 | |
| Abdominal distension | 787.3 | 0–365 | |
| Abnormal GI findings | 793.4 | 0–365 | |
| Rectal polyp | 569.0 | 0–365, 1–365 | g |
| Cancer screening | G0344, V76.41, V76.50, V76.51 | 0–31, 0–180, 1–180, 0–365, 1–365 | g |
| Family history of CRC | V16.0 | 0–365, 1–365 | |
| Personal history of colon polyps | V12.72 | 0–365, 1–365 | |
| FOBT | 82270, 82271, 82272, 82273, 82274, G0107, G0328, G0394 | 1–365 | |
| Flexible sigmoidoscopy | 45300, 45303, 45305, 45307, 45308, 45309, 45315, 45317, 45320, 45321, 45327, 45330, 45331, 45332, 45333, 45334, 45335, 45336, 45337, 45338, 45339, 45340, 45341, 45342, 45345, G0104; 45.24, 48.21, 48.22, 48.23, 48.24, 48.36 | 1–365 | |
| Previous colonoscopy | 44388, 44389, 44390, 44391, 44392–44394, 44397, 45355, 45378, 45379, 45380, 45381, G0105, G0121, 45382, 45383, 45384, 45385, 45386, 45387, 45388, 45391, 45392; 45.23, 45.21, 45.25, 45.43, 98.04 | 1–365 | |
Notes: Abbreviations: BRBPR, bright red blood per rectum; GI, gastrointestinal; CRC, colorectal cancer; FOBT, fecal occult blood test; ICD, International Classification of Diseases; CPT, Current Procedural Terminology; HCPCS, Healthcare Common Procedure Coding System.
These variables are constructed from ICD-9 diagnosis codes and code groups.
We included multiple candidate variables for anemia, colitis, diarrhea, and GI bleed, each coded slightly differently depending on its source. For example, we created “anemia (1)” using diagnosis codes based on our own judgment and adapted “anemia (2)” from the Cooper et al. and El-Serag et al. algorithms [9,13].
These variables are constructed from ICD-9 V-codes and HCPCS procedure codes.
Included as candidate predictors for the primary algorithm only.
These variables are constructed from CPT, HCPCS, and ICD-9 procedure codes.
Most variables had two versions: one that included codes assigned on the day of the procedure and one that did not. The 0–365 day look-back interval included symptoms, signs, and conditions assigned on the day of the procedure (day 0) and up to 365 days prior to the procedure. The 1–365 day look-back interval excluded codes assigned on the day of the procedure.
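The two look-back versions can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `lookback_flags` and the example dates are hypothetical, and it assumes each code group has already been reduced to a list of dates on which a matching code was recorded.

```python
from datetime import date

def lookback_flags(procedure_date, code_dates):
    """Return (flag_0_365, flag_1_365) for one code group.

    code_dates: dates on which any code in the group was recorded
    (hypothetical input format, assumed for this sketch).
    """
    days_before = [(procedure_date - d).days for d in code_dates]
    # 0-365: includes codes assigned on the procedure day itself (day 0)
    flag_0_365 = any(0 <= n <= 365 for n in days_before)
    # 1-365: excludes codes assigned on the procedure day
    flag_1_365 = any(1 <= n <= 365 for n in days_before)
    return flag_0_365, flag_1_365

procedure = date(2007, 6, 15)
print(lookback_flags(procedure, [date(2007, 6, 15)]))  # (True, False)
print(lookback_flags(procedure, [date(2007, 1, 10)]))  # (True, True)
print(lookback_flags(procedure, [date(2005, 1, 10)]))  # (False, False)
```

A code recorded only on the procedure day sets the 0–365 flag but not the 1–365 flag, which is the distinction the two versions are meant to capture.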
Table 2. Characteristics of Colonoscopies in the Reference Data Set
| Characteristic | n | % | Weighted % |
| --- | --- | --- | --- |
| Male | 314 | 52.7 | 52.8 |
| Female | 282 | 47.3 | 47.2 |
| 50–59 | 83 | 13.9 | 25.9 |
| 60–75 | 323 | 54.2 | 59.7 |
| 76–85 | 190 | 31.9 | 14.5 |
| Case | 404 | 67.8 | 4.0 |
| Control | 192 | 32.2 | 96.0 |
| Nonscreening | 536 | 90.0 | 75.4 |
| Screening | 60 | 10.1 | 24.6 |
| 1996–2000 | 53 | 8.9 | 11.1 |
| 2001–2005 | 167 | 28.0 | 45.2 |
| 2006–2008 | 376 | 63.1 | 43.7 |
Notes:
Only colonoscopies classified as “definitely screening” in the reference data set were considered average-risk screening in the present analyses. Reference colonoscopies classified as “definite diagnostic,” “probable diagnostic,” surveillance, and “probably screening (with symptoms)” were dichotomized as nonscreening.
Weighted by the inverse sampling fraction based on age, calendar year, and CRC case status.
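Inverse-sampling-fraction weighting can be sketched as follows. The sampling fractions in the example are invented for illustration (the source does not report them), and the function name `weighted_percent` and the record layout are assumptions of this sketch.

```python
def weighted_percent(records, key):
    """Weighted percentage of each category of `key`.

    Each record carries a 'sampling_fraction' for its stratum; its weight
    is the inverse of that fraction.
    """
    totals = {}
    for rec in records:
        weight = 1.0 / rec["sampling_fraction"]
        totals[rec[key]] = totals.get(rec[key], 0.0) + weight
    grand_total = sum(totals.values())
    return {cat: 100.0 * w / grand_total for cat, w in totals.items()}

# Hypothetical strata: every case sampled, 1 in 10 controls sampled.
sample = [
    {"status": "case", "sampling_fraction": 1.0},
    {"status": "case", "sampling_fraction": 1.0},
    {"status": "control", "sampling_fraction": 0.1},
]
print(weighted_percent(sample, "status"))
```

In this toy sample, cases are two of three records but only about 17% after weighting, which shows how an oversampled group (such as CRC cases, 67.8% unweighted versus 4.0% weighted in the table) shrinks once the weights are applied.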
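Table 3. Logistic Regression Coefficients for Colonoscopy Indication Algorithms
| Variable | Look-back interval (days) | Restricted algorithm | Extended algorithm |
| --- | --- | --- | --- |
| (Intercept) | | 1.64852 | 1.74247 |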
| Age | 0 | 0 | −0.04521 |
| Age squared | 0 | −0.00046 | −0.00033 |
| Iron deficiency anemia | 0–365 | −0.91922 | −0.59150 |
| Anemia (1) | 0–365 | 0 | −0.05479 |
| Functional digestive disorder | 0–365 | −0.86536 | −0.25237 |
| Rectal bleeding, Hemorrhage, BRBPR | 1–365 | 0 | −0.07013 |
| Rectal bleeding, Hemorrhage, BRBPR | 0–365 | −1.34397 | −1.45248 |
| GI bleed, stool (4) | 0–365 | −1.16778 | −0.83575 |
| Abdominal distension | 0–365 | −0.73345 | −0.73452 |
| Abdominal pain | 0–365 | −0.18429 | −0.27980 |
| Nausea/vomiting | 1–365 | −0.05580 | 0 |
| Nausea/vomiting | 0–365 | −0.64995 | −0.02046 |
| Rectal polyp | 1–365 | −1.81902 | −2.05020 |
| Cancer screening | 0–31 | – | 0.63791 |
| Cancer screening | 1–180 | – | −0.28645 |
| Cancer screening | 0–180 | – | 1.56963 |
Notes: Abbreviations: BRBPR, bright red blood per rectum; GI, gastrointestinal.
The algorithms use the logistic regression coefficients to estimate the probability that a colonoscopy is a screening exam, using the equation $\hat{p} = 1 / (1 + \exp(-X\beta))$, where $X\beta$ is the linear combination of the covariates and coefficient values.
The 0–365 day look-back interval included code groupings assigned on the day of the procedure (day 0) and up to 365 days prior. The 1–365 day look-back excluded codes assigned on day 0.
The secondary, extended algorithm was developed using a model that included screening V-codes (V76.41, V76.50, V76.51) and the HCPCS average risk procedure code (G0344) as candidate variables, whereas the primary, restricted model excludes these codes as candidates.
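As a worked illustration of the probability equation, the sketch below applies the restricted-algorithm coefficients from the table above. Several things here are assumptions rather than statements from the source: that the first coefficient column belongs to the restricted algorithm (the column with "–" for the cancer screening variables), that age enters in uncentered years, and the dictionary keys and function name, which are hypothetical labels.

```python
import math

# Restricted-algorithm coefficients transcribed from the coefficient table.
# Key names are hypothetical labels chosen for this sketch.
RESTRICTED = {
    "intercept": 1.64852,
    "age": 0.0,
    "age_squared": -0.00046,
    "iron_deficiency_anemia_0_365": -0.91922,
    "anemia_1_0_365": 0.0,
    "functional_digestive_disorder_0_365": -0.86536,
    "rectal_bleeding_1_365": 0.0,
    "rectal_bleeding_0_365": -1.34397,
    "gi_bleed_stool_4_0_365": -1.16778,
    "abdominal_distension_0_365": -0.73345,
    "abdominal_pain_0_365": -0.18429,
    "nausea_vomiting_1_365": -0.05580,
    "nausea_vomiting_0_365": -0.64995,
    "rectal_polyp_1_365": -1.81902,
}

def screening_probability(age, present_flags, coef=RESTRICTED):
    """Estimate p-hat = 1 / (1 + exp(-X*beta)) for one colonoscopy.

    age: age in years on the procedure date (assumed uncentered).
    present_flags: names of the indicator variables equal to 1.
    """
    xb = coef["intercept"] + coef["age"] * age + coef["age_squared"] * age ** 2
    for name in present_flags:
        xb += coef[name]
    return 1.0 / (1.0 + math.exp(-xb))

no_symptoms = screening_probability(60, [])
with_bleeding = screening_probability(60, ["rectal_bleeding_0_365"])
print(round(no_symptoms, 3), round(with_bleeding, 3))
```

Under these assumptions, a 60-year-old with no coded symptoms gets a predicted probability near 0.50, and a rectal bleeding code in the 0–365 day window lowers it to roughly 0.21, consistent with the negative coefficient marking a nonscreening indication.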
Figure 1. Area under the Receiver Operating Characteristic (ROC) Curves for the Restricted and Extended Algorithms.
The primary, restricted algorithm did not include ICD-9 screening V-codes (V76.41, V76.50, V76.51) and the HCPCS preventative examination code (G0344) as candidate variables, whereas the secondary, extended algorithm included these variables.
Table 4. Predictive Model Test Characteristics for the Restricted and Extended Algorithms across a Range of Values[a,b]
| Algorithm | Sensitivity | Specificity | PPV | NPV | Predicted probability threshold |
| --- | --- | --- | --- | --- | --- |
| Restricted | 0.487 | 0.985 | 0.912 | 0.855 | 0.509 |
| Restricted | 0.611 | 0.950 | 0.798 | 0.882 | 0.438 |
| Restricted | 0.693 | 0.909 | 0.712 | 0.901 | 0.379 |
| Restricted | 0.778 | 0.879 | 0.677 | 0.924 | 0.364 |
| Restricted | 0.824 | 0.862 | 0.661 | 0.938 | 0.349 |
| Restricted | 0.909 | 0.825 | 0.628 | 0.965 | 0.305 |
| Restricted | 0.933 | 0.799 | 0.602 | 0.973 | 0.291 |
| Restricted | 0.961 | 0.719 | 0.527 | 0.983 | 0.236 |
| Restricted | 0.974 | 0.677 | 0.496 | 0.988 | 0.214 |
| Extended | 0.492 | 0.989 | 0.938 | 0.857 | 0.651 |
| Extended | 0.694 | 0.962 | 0.857 | 0.906 | 0.536 |
| Extended | 0.706 | 0.949 | 0.819 | 0.908 | 0.476 |
| Extended | 0.750 | 0.921 | 0.756 | 0.919 | 0.356 |
| Extended | 0.833 | 0.907 | 0.744 | 0.943 | 0.311 |
| Extended | 0.885 | 0.905 | 0.753 | 0.960 | 0.261 |
| Extended | 0.918 | 0.832 | 0.640 | 0.969 | 0.216 |
| Extended | 0.938 | 0.761 | 0.561 | 0.974 | 0.184 |
| Extended | 0.975 | 0.687 | 0.503 | 0.988 | 0.156 |
Notes:
The primary, restricted algorithm included the predictor variables selected by the primary regression model (see Table 3) from the candidate predictors shown in Table 1 and a model intercept. The secondary, extended algorithm included the same predictors as selected by the restricted regression model but with different coefficient values, and a model intercept. The secondary algorithm additionally included cancer screening variables representing 3 look-back intervals.
Selected combinations of sensitivity, specificity, positive predictive value, and negative predictive value. At each combination of test characteristics, the predicted probability threshold is the level at or above which colonoscopies are classified as screening. The optimal point on the ROC curve is defined by the Youden Index, the point at which the sum of sensitivity and specificity is maximized.
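The Youden Index and the predictive values can be checked with a short Python sketch. It rests on assumptions not stated in the source: that the table columns are, in order, sensitivity, specificity, PPV, NPV, and threshold; that the first nine rows belong to the restricted algorithm and the last nine to the extended algorithm; and that the weighted screening prevalence of 24.6% from the reference data set is the prevalence behind the tabulated PPV and NPV. Function names are hypothetical.

```python
# (sensitivity, specificity, threshold) transcribed from the table above.
RESTRICTED_ROWS = [
    (0.487, 0.985, 0.509), (0.611, 0.950, 0.438), (0.693, 0.909, 0.379),
    (0.778, 0.879, 0.364), (0.824, 0.862, 0.349), (0.909, 0.825, 0.305),
    (0.933, 0.799, 0.291), (0.961, 0.719, 0.236), (0.974, 0.677, 0.214),
]
EXTENDED_ROWS = [
    (0.492, 0.989, 0.651), (0.694, 0.962, 0.536), (0.706, 0.949, 0.476),
    (0.750, 0.921, 0.356), (0.833, 0.907, 0.311), (0.885, 0.905, 0.261),
    (0.918, 0.832, 0.216), (0.938, 0.761, 0.184), (0.975, 0.687, 0.156),
]

def youden_optimum(rows):
    """Return the row maximizing J = sensitivity + specificity - 1."""
    return max(rows, key=lambda row: row[0] + row[1] - 1.0)

def ppv_npv(sensitivity, specificity, prevalence):
    """Predictive values from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

print(youden_optimum(RESTRICTED_ROWS))  # (0.909, 0.825, 0.305)
print(youden_optimum(EXTENDED_ROWS))    # (0.885, 0.905, 0.261)
print(ppv_npv(0.909, 0.825, 0.246))
```

Among the tabulated rows, the sum of sensitivity and specificity peaks at a threshold of 0.305 for the restricted algorithm and 0.261 for the extended algorithm. With a prevalence of 0.246, the computed PPV and NPV at the restricted optimum come out close to the tabulated 0.628 and 0.965.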