Yaguang Zheng, Victoria Vaughan Dickson, Saul Blecker, Jason M Ng, Brynne Campbell Rice, Gail D'Eramo Melkus, Liat Shenkar, Marie Claire R Mortejo, Stephen B Johnson.
Abstract
BACKGROUND: Accurately identifying patients with hypoglycemia is key to preventing adverse events and mortality. Natural language processing (NLP), a form of artificial intelligence, uses computational algorithms to extract information from text data. NLP is a scalable, efficient, and quick method to extract hypoglycemia-related information when using electronic health record data sources from a large population.
Keywords: diabetes; electronic health records; hypoglycemia; natural language processing
Year: 2022 PMID: 35576579 PMCID: PMC9152713 DOI: 10.2196/34681
Source DB: PubMed Journal: JMIR Diabetes ISSN: 2371-4379
Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart. In the case of Google Scholar, the first 100 results based on relevancy ranking are suggested to identify additional articles, and in the case of ACL Anthology, all the citations found were added to the irrelevant set (excluded based on title and abstract) [30]. NLP: natural language processing.
Summary of studies on natural language processing (NLP) and hypoglycemia.
| Author, year, country | Sample characteristics | Medical conditions | Antihyperglycemic medication | Study design |
| Nunes et al, 2016 [ | N=844,683; age (years; n [%]): <30: 10,138 (1.20), 30 to 39: 38,491 (4.56), 40 to 49: 105,476 (12.49), 50 to 59: 196,494 (23.26), 60 to 69: 232,885 (27.57), >69: 261,199 (30.92); female (n [%]): 433,322 (51.30); White (n [%]): 655,474 (77.60); T2Da (n [%]): 844,683 (100); baseline measures, mean (SD): BMI (kg/m2): 31.8 (10.2), HbA1cb (%): 7.0 (1.9), blood glucose level (mg/dL): 139.0 (82) | Atrial fibrillation (n [%]): 60,773 (7.19); hypertension (n [%]): 555,482 (65.76); hyperlipidemia (n [%]): 510,944 (60.49); cerebrovascular disease (n [%]): 54,336 (6.43); chronic kidney disease: retinopathy (n [%]): 10,356 (1.23), neuropathy (n [%]): 44,352 (5.25), nephropathy (n [%]): 26,498 (3.14); ischemic heart disease (n [%]): 154,049 (18.24); congestive heart failure (n [%]): 59,438 (7.04) | Not specified | Retrospective cohort study |
| Nunes et al, 2017 [ | N=143,635; age (years; n [%]): <30: 1333 (0.93), 30 to 39: 5420 (3.77), 40 to 49: 15,645 (10.89), 50 to 59: 32,796 (22.83), 60 to 69: 39,852 (27.75), >69: 48,491 (33.76); female (n [%]): 69,879 (48.65); White (n [%]): 116,701 (81.25); T2D (n [%]): 143,635 (100); baseline measures (median [IQR]): BMI (kg/m2): 32.3 (28.1-37.6), HbA1c (%): 7.1 (6.5-8.1), blood glucose level (mg/dL): 146.0 (116.0-191.0) | N=143,635; cerebrovascular disease (n [%]): 11,903 (8.29); retinopathy (n [%]): 3091 (2.15); neuropathy (n [%]): 12,961 (9.02); nephropathy (n [%]): 8338 (5.80); ischemic heart disease (n [%]): 33,570 (23.37) | N=143,635; sulfonylureas (n [%]): 143,635 (100) | Retrospective cohort study |
| Loughlin et al, 2018 [ | N=6024; EQWc cohort (n [%]): 2008 (33.33%); age (years): —d; female (n [%]): 1004 (50); White (n [%]): 1630 (81.17); T2D (n [%]): 2008 (100); baseline measures: —; BIe cohort (n [%]): 4016 (66.67%); age (years): —; female (n [%]): 2036 (50.70); White (n [%]): 3277 (81.60); T2D (n [%]): 4016 (100); baseline measures: — | — | N=6024; insulin (n [%]): 6024 (100) | Retrospective cohort study |
| Pettus et al, 2019 [ | N=831,456; BI switchers (n=3920 to 19,256); age (years): range 58.2-60.1; female (%): range 49.8-52.0; White (%): —; T2D: (831,456, 100%); baseline measures: BMI (kg/m2): range 33.8-35.0; HbA1c (%): range 8.91-9.02; blood glucose level (mg/dL): —; smoking (%): —. Insulin naïve (n=2279 to 47,085); age (years): range 58.8-60.4; female (%): range 48.6-52.1; White (%): —; T2D (n [%]): (100); baseline measures: BMI (kg/m2): range 34.0-34.6; HbA1c (%): range 9.39-9.64; blood glucose level (mg/dL): —; smoking (%): — | BI switchers: hypertension: 63.4-73.4, hyperlipidemia: 68.1-77.8, microvascular complication: 44.7-55.7, macrovascular complication: 44.2-63.5. Insulin naïve: hypertension: 56.8-74.2; hyperlipidemia: 61.5-77.8, microvascular complication: 25.3-34.6, macrovascular complication: 32.7-63.5 | BI switchers: sulfonylureas: 24.5-28.3; any OADf: 63.6- 75.2. Insulin naïve: sulfonylureas: 47.6-56.6; | Retrospective cohort study |
| Li et al, 2019 [ | N=38,780; age (years), mean: 57.0; female (n [%]): 21,716 (56); White (%): 18,226 (47); T2D (%): —; baseline measures: BMI (kg/m2), mean (SD): 35.7 (9.8); HbA1c (n [%]): ≤6.5%: 5321 (13.72), >6.5% to <7%: 1840 (4.74), ≥7% to <8%: 3155 (8.14), ≥8% to <9%: 1773 (4.57), ≥9%: 3977 (10.26), missing: 22,714 (58.57) | N=38,780; coronary artery disease (n [%]): 2021 (5.21); chronic heart failure (n [%]): 1582 (4.08); diabetic neuropathy (n [%]): 1414 (3.65) | N=38,780; long-acting insulin (LAIg): 615 (1.59); sulfonylureas: 8727 (22.50) | Retrospective cohort study |
| Misra-Hebert et al, 2020 [ | N=204,517; the values provided herein are from a subsample (n=46,302); age (years): 61.48; female (n [%]): 22,633 (48.90); White (n [%]): 34,004 (73.40); T2D (n [%]): 46,302 (100); baseline measures: BMI (kg/m2): 32.2; HbA1c (%): 6.6; blood glucose level (mg/dL): — | n=46,302; cardiovascular disease (n [%]): 13,372 (28.9); congestive heart failure (n [%]): 2195 (4.7); chronic kidney disease (n [%]): 2460 (5.3) | n=46,302; insulin (n [%]): 8050 (17.4); glucagon-like peptide-1 receptor agonist (n [%]): 1781 (3.8); dipeptidyl peptidase 4: 4437 (9.6); sodium-glucose cotransporter-2 inhibitor (n [%]): 791 (1.7); metformin: 28,851 (62.3); sulfonylureas (n [%]): 10,098 (21.8); alpha-glucosidase inhibitor (n [%]): 107 (0.2) | Retrospective cohort study |
| Uzoigwe et al, 2020 [ | N=359,087; T2D (n [%]): 317,399 (88.39); age (years), median (IQR): 68.0 (18); female (n [%]): 154,512 (48.68); White (n [%]): 121,468 (38.27); baseline measures: BMI (kg/m2): —; HbA1c (%): —; blood glucose level (mg/dL): —; smoking (n [%]): 106,760 (33.63). T1Dh (n [%]): 41,688 (11.61); age (years): median (IQR) 55.0 (30); female (n [%]): 21,034 (50.46); White (n [%]): 16,072 (38.55); baseline measures: BMI (kg/m2): —; HbA1c (%): —; blood glucose level (mg/dL): —; smoking (n [%]): 9174 (22) | T2D: N=317,399; hypertension (n [%]): 257,093 (81); hyperlipidemia (n [%]): 193,616 (61); cardiovascular disease (n [%]): 158,699 (50). T1D: N=41,688; high blood sugar level or diabetic ketoacidosis (n [%]): 14,067 (33.74); cancer (n [%]): 6752 (16.20); stroke (n [%]): 7377 (17.70); substance use or abuse (n [%]): 4917 (11.79) | T2D: N=317,399; insulin (n [%]): 174,569 (55); sulfonylureas (n [%]): 55,710 (17.55); metformin (n [%]): 114,263 (36). T1D: N=41,688; insulin (n [%]): 37,279 (89.42); sulfonylureas (n [%]): 1846 (4.43); metformin (n [%]): 5059 (12.14) | Retrospective cohort study |
| Ganz et al, 2014 [ | N=7235; T2D (n [%]): 7235 (100); age (years), mean (SD): 60.82 (11.65); female (n [%]): 3668 (50.70); White (n [%]): 4576 (63.25); baseline measures: BMI (kg/m2): —; HbA1c (%): —; blood glucose level (mg/dL): —; smoking (%): — | T2D (n [%]): 7235 (100) | Insulin: glargine (%): 77.24; neutral protamine Hagedorn insulin (%): 5.86; detemir (%): 16.90. Sulfonylureas (%): 38.06; metformin (%): 36.66; other OADs (%): 25.82 | Retrospective cohort study |
aT2D: type 2 diabetes.
bHbA1c: glycated hemoglobin.
cEQW: exenatide once weekly.
dNot available.
eBI: basal insulin.
fOAD: oral antidiabetic drug.
gLAI: long-acting insulin.
hT1D: type 1 diabetes.
Summary of studies on natural language processing (NLP) and hypoglycemia.
| Author, year, country | Data source | Definition of hypoglycemia | Method used to identify hypoglycemia | NLP algorithm: rule-based or machine learning | NLP algorithm validation | Outcomes |
| Nunes et al, 2016 [ | Optum Humedica EHRa database, which incorporates EHRs from 35 large medical provider organizations (including >195 hospitals), >25,000 physicians, and >25 million patients, making up the largest EHR database within the United States (January 2009 to March 2014) | Serious: ICD-9b identified events were characterized as serious or nonserious if the diagnosis was identified within a problem list; NLP-identified categories included serious (eg, serious, acute, severe, and profound); mild to moderate: NLP-identified categories included mild to moderate (eg, mild, moderate, slight, and minor) | ICD-9 algorithm (structured diagnostic codes only); NLP algorithm (NLP of clinical notes); combined algorithm (either ICD-9 diagnostic codes or NLP of clinical notes) | Rule-based | The final algorithm was validated by manual review: precision (PPVc)=0.77, recall (sensitivity)=0.67 | Period prevalence (%): any conditions: ICD-9: 12.37, NLP: 25.11, combined: 32.19; serious: ICD-9: 11.93, NLP: 10.71, combined: 18.72; mild to moderate: ICD-9: 0.00, NLP: 0.76, combined: 0.78. Incidence rate (per 100 person-years): any conditions: ICD-9: 2.25, NLP: 4.78, combined: 6.28. Serious: ICD-9: 2.12, NLP: 1.72, combined: 3.19; mild to moderate: ICD-9: 0.00, NLP: 0.09, combined: 0.08. Event rate (per 100 person-years): any conditions: ICD-9: 6.92, NLP: 10.03, combined: 16.12; serious: ICD-9: 6.63, NLP: 3.06, combined: 8.90; mild to moderate: ICD-9: 0.00, NLP: 0.20, combined: 0.19 |
| Nunes et al, 2017 [ | Optum EHR database (January 2009 to December 2014) | Serious: ICDd and CPTe evidence of medical intervention or abstracted descriptors suggestive of serious event; nonserious, mild to moderate: No ICD or CPT evidence of medical intervention but with abstracted descriptors suggestive of mild to moderate event; nonserious, unspecified: no ICD or CPT evidence of medical intervention and no descriptors of event seriousness | ICD codes and NLP | Rule-based | The final algorithm was validated by manual review: precision (PPV)=0.77, recall (sensitivity)=0.67 | Incidence rate (per 100 person-years; 95% CI): any conditions: overall: 11.76 (11.49-12.04), sulfonylureas use: 12.77 (12.40-13.15), sulfonylureas nonuse: 10.39 (10.00-10.79). Serious: overall: 5.06 (4.88-5.24), sulfonylureas use: 5.77 (5.52-6.03), sulfonylureas nonuse: 4.09 (3.84-4.34). Nonserious, mild to moderate: overall: 0.14 (0.11-0.17), sulfonylureas use: 0.17 (0.13-0.22), sulfonylureas nonuse: 0.09 (0.06-0.13). Nonserious, unspecified: overall: 6.57 (6.37-6.78), sulfonylureas use: 6.83 (6.56-7.11), sulfonylureas nonuse: 6.21 (5.91-6.52) |
| Loughlin et al, 2018 [ | Optum EHR database (January 2012 to January 2015) | Documented blood glucose level <3.9 mmol/L or emergency physician–charted diagnosis of hypoglycemia | Hypoglycemia and gastrointestinal symptoms (vomiting, nausea, diarrhea, or constipation) were identified by using both ICD-9 Clinical Modification diagnostic codes within structured fields and NLP clinical notes; hypoglycemia was identified using an algorithm developed by Optum, incorporated diagnostic codes, and NLP of clinical notes | Rule-based | The final algorithm was validated by manual review: precision (PPV)=0.77, recall (sensitivity)=0.67 | Incidence rate (per 1000 person-years; 95% CI): EQWf cohort: 52.5 (44.4-61.6), BIg cohort: 65.7 (59.1-72.7). Any gastrointestinal symptoms: EQW cohort: 225.5 (206.8-245.5), BI cohort: 191.0 (179.1-203.6). Participants with at least one event (n/N [%]): EQW cohort: 149/2008 (7.42), BI cohort: 368/4016 (9.16). Any gastrointestinal symptoms (n/N [%]): EQW cohort: 534/2008 (26.60), BI cohort: 946/4016 (23.56) |
| Pettus et al, 2019 [ | Optum Humedica EHR database (January 1, 2007, to March 31, 2017) | Hypoglycemia: ICD-9 and ICD-10h codes for hypoglycemia; plasma glucose level measures ≤70 mg/dL; IMi glucagon administration; NLP: mention of hypoglycemia; severe hypoglycemia: ICD-9 and ICD-10 codes for hypoglycemia that is severe by default or ICD-9 and ICD-10 codes for hypoglycemia and hypoglycemia is reason for care on discharge or admission or hypoglycemia index date on same day as emergency department visit or inpatient diagnosis on admission (all related to hypoglycemic coma); plasma glucose level measures <54 mg/dL; IM glucagon administration; NLP: mention of hypoglycemia with either a descriptor of hypoglycemia severity, including severity terms (eg, severe) and attributes (eg, emergency), or emergency department visit or inpatient admission on same day as medical record was written | ICD-9 and ICD-10 codes; plasma glucose measures ≤70 mg/dL; IM glucagon administration; NLP | Rule-based | The final algorithm was validated by manual review: precision (PPV)=0.77, recall (sensitivity)=0.67 | Any hypoglycemia (%): BI switchers: 42.2-46.2. Insulin naïve: 22.8-28.8. Severe hypoglycemia: BI switchers: 8.2-17.4, insulin naïve: 2.7-8.6 |
| Li et al, 2019 [ | Regenstrief Medical Record System, which is an urban safety-net medical institution in Indianapolis, Indiana, United States. In 2012, Eskenazi Health had 1081 physicians on staff and serviced 950,592 outpatient visits, including 234,637 community health center visits (January 1, 2004, to December 31, 2013) | Plasma or point-of-care glucose value of at least 5 mg/dL and <70 mg/dL, documented in the medical record; ICD-9 code: 251.1 or 251.2; ICD code 250.8 without any of the following codes: 259.8, 272.7, 681.xx, 682.x, 686.9, 707.1x, 707.2x, 707.8, 707.9, 709.3, 730.0x, 730.1x, 730.2x, 731.8; text note indicating hypoglycemia, including a blood glucose value | Laboratory tests; diagnostic codes; NLP | Rule-based | — | A 1-year window for prior episodes of hypoglycemia: overall prevalence (n/N [%]): 8182/38,780 (21); non-LAIj and sulfonylureas within 90 days (%): 42.92; sulfonylureas without insulin (%): 23.82; no insulin, no sulfonylureas (%): 17.85%; blood glucose value between 5 mg/dL and 70 mg/dL (n/N [%]): 7070/38,780 (18.23); blood glucose value<54 mg/dL (n/N [%]): 4784/38,780 (12.34); NLP (n/N [%]): 3751/38,780 (9.67), with 539/38,780 (1.39), identified only by NLP |
| Misra-Hebert et al, 2020 [ | Cleveland Clinic Health System patient records (2005 to 2017) | Hypoglycemia: blood glucose level <70 mg/dL; severe hypoglycemia: patients with T2Dk require hospitalization or emergency department visit; nonsevere hypoglycemia: does not require assistance for recovery | NLP; ICD-9 codes: 251.0, 251.1, 251.2; ICD-10 codes: E08.641, E11.641, E11.649, E13.64, E13.641, E13.649, E16.0, E16.1, E15, E16.2 | Rule-based | Validated against manual chart review by clinicians: PPV=93% | Prevalence: among 204,517 patients with no codes for nonsevere hypoglycemia, evidence of nonsevere hypoglycemia was found in 7035 (3.4%) using NLP. Number of nonsevere hypoglycemia events: ICD codes (n/N [%]): 10,205/204,517 (4.99), NLP: 14,763/204,517 (7.22), with overlap of only 5 events. Incidence proportion of patients from 2005 to 2017 ICD codes (%): severe hypoglycemia: 0.3 to 1.7, nonsevere hypoglycemia: 0.4 to 1.3; NLP+ICD (%): nonsevere hypoglycemia: 0.8 to 2.6 |
| Uzoigwe et al, 2020 [ | Amplity Insights database, unstructured health records, generated from provider notes as transcribed from verbal to written form (January 1, 2016, to April 30, 2018) | Nonsymptom-based: mention of hypoglycemia, low blood glucose level or blood glucose value ≤70 mg/dL; symptom-based: keywords identified by endocrinologists, used by patients to describe hypoglycemia | ICD codes; NLP | Rule-based | — | Prevalence during 2 years (%): T2D: ICD: 52 (<0.1); combined symptom and nonsymptom-based: 11.4; nonsymptom-based: 7.59; symptom-based: irritable or anxious: 14.50; cognitive issues: 12.14; elevated or irregular heart rate: 10.21. T1Dl: ICD codes: 30 (0.1); combined symptom and nonsymptom-based: 20.4; nonsymptom-based: 18.12; symptom-based: irritable or anxious: 16.00; cognitive issues: 8.17; elevated or irregular heart rate: 8.17 |
| Ganz et al, 2014 [ | Humedica real-time longitudinal clinical data patient-level EHR database (January 2008 to December 2011) | Severe hypoglycemia: blood glucose level≤40 mg/dL | ICD-9 codes 251.0x, 251.1x, 251.2x, or 250.3x on different days; NLP | Rule-based | The final algorithm was validated by manual review: precision (PPV)=0.77, recall (sensitivity)=0.67 | Posttitration follow-up period (1.8 years): incidence rate (per 100 patient-years; 95% CI)=4.63 (4.59-4.67); total severe hypoglycemia rate (per 100 patient-years)=9.69 (9.64-9.75). Incidence rate for patients with history of severe hypoglycemia events (95% CI)=5.91 (5.76-6.06). Total severe hypoglycemia rate for patients with history of severe hypoglycemia events (95% CI)=9.00 (8.87-9.12) |
aEHR: electronic health record.
bICD-9: International Classification of Diseases, Ninth Revision.
cPPV: positive predictive value.
dICD: International Classification of Diseases.
eCPT: Current Procedures Terminology.
fEQW: exenatide once weekly.
gBI: basal insulin.
hICD-10: International Classification of Diseases, Tenth Revision.
iIM: intramuscular.
jLAI: long-acting insulin.
kT2D: type 2 diabetes.
lT1D: type 1 diabetes.
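The validation and outcome columns above rest on a few simple formulas: precision (PPV) and recall (sensitivity) from a manual chart review, a rate per 100 person-years, and a combined algorithm that flags a patient when either the diagnostic codes or the NLP step finds an event. The following Python sketch illustrates them under stated assumptions. The function names and all counts are hypothetical and chosen only so the arithmetic reproduces values of the same form as in the table; they are not data from any reviewed study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from chart-review counts.

    tp: events flagged by the algorithm and confirmed by reviewers
    fp: events flagged by the algorithm but rejected by reviewers
    fn: true events the algorithm missed
    """
    precision = tp / (tp + fp)  # also called positive predictive value (PPV)
    recall = tp / (tp + fn)     # also called sensitivity
    return precision, recall


def rate_per_100_person_years(events, person_years):
    """Events per 100 person-years of follow-up."""
    return 100 * events / person_years


def combined_flag(icd_hit, nlp_hit):
    """Combined algorithm: positive if either source reports hypoglycemia."""
    return icd_hit or nlp_hit


# Hypothetical review counts (not taken from the studies).
p, r = precision_recall(tp=77, fp=23, fn=38)
print(round(p, 2), round(r, 2))
```

Because the combined flag is a logical OR, it can only match or exceed the count from either source alone, which is consistent with the combined estimates in the table being the largest.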
Natural language processing (NLP) algorithms applied in the reviewed studies.
| Study | NLP algorithm type | Details of NLP algorithms |
| Ganz et al, 2014 [ | Rule-based | Identify terms consistent with hypoglycemia (including alternative or incorrect spellings and abbreviations); identify descriptive attributes of the hypoglycemia mention (eg, seriousness, duration, and frequency); identify sentiment of the mention (eg, denial and affirmation, including “has,” “diagnosed,” and “present”); identify contextual information (eg, note section headers and neighboring text). Sections such as “history of present illness,” “assessment,” “hospital course,” “reason,” “review of symptoms,” and “chief complaint” generally reflected occurrence of hypoglycemia |
| Li et al, 2019 [ | Rule-based | A formally defined pattern (regular expression), which identified clinical reports mentioning a “blood sugar word” followed within 5 words by what could be a low blood sugar value represented by a number ranging from 10 to 69 |
| Misra-Hebert et al, 2020 [ | Rule-based | Split clinical notes into sentences and phrases; filter sentences and phrases to those containing a hypoglycemia-related Unified Medical Language System concept; identify temporal phrases (when the event occurred); classify polarity (assertion or negation) into no, nonsevere, and severe event |
| Uzoigwe et al, 2020 [ | Rule-based | Identify keywords or concepts of interest: symptom-based and nonsymptom-based hypoglycemic events. Symptom-based terms: neuroglycopenic and adrenergic symptomology associated with hypoglycemia. Adrenergic symptomology: elevated or irregular heart rate, sweating, tremor, trembling, tingling, or shaking, and vision impairment. Neuroglycopenic symptomology: cognitive issues, irritable or anxious, mood or behavior change+NOT substance abuse or alcohol, slurred speech+NOT stroke+NOT substance abuse or alcohol. Nonsymptom-based definition: mention of “hypoglycemia”; relevant medical ontology such as “low glucose”; a blood glucose laboratory value ≤70 mg/dL documented |
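To make the rule-based approach concrete, the pattern described for Li et al (a “blood sugar word” followed within 5 words by a number from 10 to 69) can be approximated with a regular expression. The Python sketch below is illustrative only and is not the authors' pattern: the vocabulary in `SUGAR_WORDS` is a hypothetical stand-in because the table does not list the actual terms, "within 5 words" is read here as allowing up to 4 intervening words, and decimal values are skipped.

```python
import re

# Hypothetical vocabulary; the reviewed study does not list its terms here.
SUGAR_WORDS = r"(?:blood\s+sugar|blood\s+glucose|glucose|bg|fsbs|fingerstick)"

# Two-digit integers from 10 to 69.
LOW_VALUE = r"[1-6][0-9]"

PATTERN = re.compile(
    rf"\b{SUGAR_WORDS}\b"               # a blood sugar word
    rf"(?:\W+\w+){{0,4}}?"              # up to 4 intervening words (assumption)
    rf"\W+(?<!\.)(?P<value>{LOW_VALUE})"  # the candidate value, not a decimal tail
    rf"\b(?!\.\d)",                     # not the start of a longer or decimal number
    re.IGNORECASE,
)


def find_low_glucose_mentions(text):
    """Return candidate low glucose values (as ints) mentioned in a note."""
    return [int(m.group("value")) for m in PATTERN.finditer(text)]


print(find_low_glucose_mentions("Blood sugar dropped to 52 this morning."))
print(find_low_glucose_mentions("Glucose was 165 mg/dL after lunch."))
```

A pattern like this only surfaces candidate mentions. Negation ("denies low blood sugar"), units, and timing would still need handling of the kind the other reviewed algorithms describe.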