| Literature DB >> 33371009 |
Hee Yun Seol¹,², Mary C Rolfes³, Wi Chung¹, Sunghwan Sohn⁴, Euijung Ryu⁵, Miguel A Park⁶, Hirohito Kita⁶, Junya Ono⁷, Ivana Croghan⁸, Sebastian M Armasu⁵, Jose A Castro-Rodriguez⁹, Jill D Weston¹, Hongfang Liu⁴, Young Juhn¹⁰.
Abstract
INTRODUCTION: The lack of effective, consistent, reproducible and efficient asthma ascertainment methods results in inconsistent asthma cohorts and study results for clinical trials or other studies. We aimed to assess whether application of expert artificial intelligence (AI)-based natural language processing (NLP) algorithms for two existing asthma criteria to electronic health records of a paediatric population systematically identifies childhood asthma and its subgroups with distinctive characteristics.
Keywords: asthma; asthma epidemiology; paediatric asthma
Year: 2020 PMID: 33371009 PMCID: PMC7011897 DOI: 10.1136/bmjresp-2019-000524
Source DB: PubMed Journal: BMJ Open Respir Res ISSN: 2052-4439
Asthma criteria
1–1. Predetermined Asthma Criteria (PAC)
Patients were considered to have …
Substantial variability in symptoms from time to time or periods of weeks or more when symptoms were absent.
Two or more of the following: sleep disturbance by nocturnal cough and wheeze; non-smoker (14 years or older); nasal polyps; blood eosinophilia higher than 300/µL; positive weal and flare skin tests OR elevated serum IgE; history of hay fever or infantile eczema OR cough, dyspnoea and wheezing regularly on exposure to an antigen; pulmonary function tests showing one FEV1 or FVC less than 70% predicted and another with at least 20% improvement to an FEV1 higher than 70% predicted OR a methacholine challenge test showing a 20% or greater decrease in FEV1; favourable clinical response to bronchodilator.
Exclusion conditions: pulmonary function tests showing FEV1 consistently below 50% predicted or diminished diffusion capacity; tracheobronchial foreign body at or about the incidence date; hypogammaglobulinaemia (IgG less than 2.0 mg/mL) or other immunodeficiency disorder; wheezing occurring only in response to anaesthesia or medications; bullous emphysema or pulmonary fibrosis on chest radiograph; PiZZ alpha1-antitrypsin; cystic fibrosis; other major chest disease such as juvenile kyphoscoliosis or bronchiectasis.
Asthma Predictive Index (API)*
Major criteria: physician diagnosis of asthma for parents; physician diagnosis of eczema for patient.
Minor criteria: physician diagnosis of allergic rhinitis for patient; wheezing apart from colds; eosinophilia (≥4%).
*Asthma is determined by frequent wheezing episodes (two or more) plus at least one of two major criteria or two of three minor criteria.
FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity.
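The API decision rule stated in the footnote (frequent wheezing, two or more episodes, plus at least one of two major criteria or two of three minor criteria) can be expressed as a small predicate. This is a minimal sketch, not the study's NLP algorithm: it assumes the findings have already been extracted from the record, and the `ApiFindings` fields and the `meets_api` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiFindings:
    """Hypothetical container for findings already extracted from a child's record."""
    wheezing_episodes: int                      # number of documented wheezing episodes
    parental_asthma: bool                       # major: physician diagnosis of asthma for a parent
    eczema: bool                                # major: physician diagnosis of eczema for the patient
    allergic_rhinitis: bool                     # minor: physician diagnosis of allergic rhinitis
    wheezing_apart_from_colds: bool             # minor
    eosinophil_percent: Optional[float] = None  # minor if >= 4%; None when not measured


def meets_api(f: ApiFindings) -> bool:
    """Frequent wheezing (>= 2 episodes) plus >= 1 major or >= 2 minor criteria."""
    frequent_wheezing = f.wheezing_episodes >= 2
    major = sum([f.parental_asthma, f.eczema])
    minor = sum([
        f.allergic_rhinitis,
        f.wheezing_apart_from_colds,
        f.eosinophil_percent is not None and f.eosinophil_percent >= 4.0,
    ])
    return frequent_wheezing and (major >= 1 or minor >= 2)


# Usage: two wheezing episodes plus eczema (one major criterion) satisfy the rule.
print(meets_api(ApiFindings(2, False, True, False, False)))  # True
```

A missing eosinophil count is treated as "criterion not met" rather than as an error, which mirrors how unavailable laboratory data are reported separately in the tables below.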
Sociodemographic and clinical characteristics for subgroups of asthma by NLP-PAC and NLP-API.
| Characteristic | NLP-PAC+/NLP-API+ (n=1614) | NLP-PAC+ only (n=954) | NLP-API+ only (n=105) | NLP-PAC−/NLP-API− | Total | P value |
| Age at the last follow-up date, years, mean (SD) | 12.2 (3.1) | 12.2 (3.1) | 11.8 (3.2) | 11.6 (3.2) | 11.8 (3.2) | <0.001 |
| Male, n (%) | 959 (59) | 521 (55) | 58 (55) | 2664 (48) | 4202 (51) | <0.001 |
| Ethnicity, n (%) | | | | | | 0.007 |
| White | 1289 (80) | 779 (82) | 77 (73) | 4414 (80) | 6559 (80) | |
| Black | 93 (6) | 53 (6) | 8 (8) | 244 (4) | 398 (5) | |
| Hispanic | 50 (3) | 27 (3) | 7 (7) | 233 (4) | 317 (4) | |
| Asian | 69 (4) | 26 (3) | 5 (5) | 270 (5) | 370 (5) | |
| Others | 101 (6) | 60 (6) | 7 (7) | 314 (6) | 482 (6) | |
| HOUSES* at birth in the lowest quartile, n (%) | 343 (21) | 191 (20) | 23 (22) | 971 (18) | 1528 (19) | 0.004 |
| Overweight†, n (%) | 281 (17) | 179 (19) | 20 (19) | N/A | 480 (18) | 0.62‡ |
| Maternal smoking history during pregnancy, n (%) | 131 (8) | 88 (9) | 12 (11) | 105 (2) | 336 (4) | 0.52 |
| Family history of asthma, n (%) | 586 (36) | 135 (14) | 36 (34) | 826 (15) | 1583 (19) | <0.001 |
| Well-child visit per year, mean (SD) | 0.95 (0.31) | 0.90 (0.29) | 0.96 (0.36) | 0.93 (0.36) | 0.93 (0.34) | <0.001 |
The percentage of each variable was calculated with the number of each group (column %).
*HOUSES: individual-level housing-based socioeconomic status measure in quartile.
†Overweight at asthma index date: for children aged 2 years or more, body mass index-for-age at or above the 85th percentile (https://www.cdc.gov/obesity/childhood/defining.html); for those aged less than 2 years, weight-for-length at or above the 95th percentile (https://www.cdc.gov/nccdphp/dnpao/growthcharts/who/using/).
‡Overweight for NLP-PAC−/NLP-API− group is not available; p values were calculated among asthmatic group, n=2673 (NLP-PAC+/NLP-API+, NLP-PAC+ only and NLP-API+ only).
NLP, natural language processing; NLP-API, NLP algorithms for Asthma Predictive Index; NLP-PAC, NLP algorithms for Predetermined Asthma Criteria.
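The overweight footnote applies two age-dependent cut-offs, read here as the 85th percentile of BMI-for-age (age 2 years or more) and the 95th percentile of weight-for-length (under 2 years). The sketch below only applies those cut-offs; it assumes the percentiles were already computed from the CDC or WHO growth references, and `is_overweight` is a hypothetical helper rather than part of the study's code.

```python
from typing import Optional


def is_overweight(age_years: float,
                  bmi_for_age_pct: Optional[float] = None,
                  weight_for_length_pct: Optional[float] = None) -> Optional[bool]:
    """Overweight flag at the asthma index date.

    Percentiles are assumed to be precomputed from growth references
    (CDC for age >= 2 years, WHO for age < 2 years). Returns None when
    the percentile needed for the child's age is missing.
    """
    if age_years >= 2:
        if bmi_for_age_pct is None:
            return None
        return bmi_for_age_pct >= 85.0        # BMI-for-age >= 85th percentile
    if weight_for_length_pct is None:
        return None
    return weight_for_length_pct >= 95.0      # weight-for-length >= 95th percentile


print(is_overweight(6.5, bmi_for_age_pct=88.0))        # True
print(is_overweight(1.2, weight_for_length_pct=90.0))  # False
```

Returning `None` rather than `False` for a missing measurement keeps "not overweight" distinct from "unknown", matching the separate "Unknown" rows in the cluster table.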
Asthma-specific clinical and laboratory characteristics for the study subjects by subgroup
| Characteristic | NLP-PAC+/NLP-API+ (n=1614) | NLP-PAC+ only (n=954) | NLP-API+ only (n=105) | NLP-PAC−/NLP-API− | Total | P value |
| Physician-diagnosed asthma, n (%) | 1123 (70) | 554 (58) | 2 (2) | N/A | 1679§ (21) | <0.001 |
| Age at first physician diagnosis of asthma, years, mean (SD) | 4.3 (3.4) | 6.1 (4.3) | 8.8 (0.5) | N/A | 4.9§ (3.8) | <0.001 |
| Age at asthma onset by PAC or API, years, mean (SD) | 3.3 | 4.6 (4.2) | 6.4 (4.8) | N/A | 3.9§ (3.8) | <0.001 |
| Eczema, n (%) | 794 (49) | 127 (13) | 62 (59) | 1403 (25) | 2386 (29) | <0.001 |
| Allergic rhinitis, n (%) | 646 (40) | 150 (16) | 30 (29) | 660 (12) | 1486 (18) | <0.001 |
| Eosinophilia*, n (%) | 565 (35) | 120 (13) | 33 (31) | 905 (16) | 1623 (20) | <0.001 |
| Unavailable, n (%) | 527 (33) | 443 (46) | 30 (29) | 2740 (50) | 3740 (46) | |
| Total IgE (kU/L)>300, n (%) | 50 (3) | 8 (1) | 1 (1) | 19 (0) | 78 (1) | 0.02 |
| Unavailable, n (%) | 1438 (89) | 918 (96) | 97 (92) | 5387 (98) | 7840 (96) | |
| Elevated IgE to any aeroallergen†, n (%) | 59 (4) | 13 (1) | 1 (1) | 38 (1) | 111 (1) | 0.55 |
| Unavailable, n (%) | 1459 (90) | 922 (97) | 99 (94) | 5405 (98) | 7885 (96) | |
| FEV1/FVC <0.85, n (%) | 346 (21) | 91 (10) | 4 (4) | 37 (1) | 478 (6) | <0.001 |
| Unavailable, n (%) | 1021 (63) | 756 (79) | 100 (95) | 5424 (98) | 7301 (89) | |
| Acute exacerbation of asthma, n (%) | 359 (22) | 72 (8) | 4 (4) | N/A | 435§ (16) | <0.001 |
| HEDIS-defined persistent asthma, n (%) | 423 (26) | 112 (12) | 1 (1) | N/A | 536§ (20) | <0.001 |
| Pneumonia, n (%) | 539 (33) | 245 (26) | 23 (22) | 764 (14) | 1571 (19) | <0.001 |
| Tympanostomy tube, n (%) | 192 (12) | 80 (8) | 13 (12) | 337 (6) | 622 (8) | <0.001 |
| Pertussis, n (%) | 39 (2) | 18 (2) | 5 (5) | 71 (1) | 133 (2) | <0.001 |
| Zoster, n (%) | 37 (2) | 14 (2) | 3 (3) | 77 (1) | 131 (2) | 0.05 |
| Appendicitis, n (%) | 30 (2) | 9 (1) | 8 (8) | 67 (1) | 114 (1) | <0.001 |
| Frequency of viral infection per year, mean (SD) | 0.67 (0.43) | 0.51 (0.34) | 0.50 (0.42) | 0.32 (0.28) | 0.41 (0.35) | <0.001 |
| Frequency of strep infection‡ per year, mean (SD) | 0.15 (0.18) | 0.14 (0.18) | 0.13 (0.15) | 0.12 (0.16) | 0.13 (0.17) | <0.001 |
| Coeliac disease, n (%) | 21 (1) | 5 (1) | 0 (0) | 44 (1) | 70 (1) | 0.10 |
The percentage of each variable was calculated with the number of each group (column %).
*Eosinophilia defined by >300/µL (PAC) or ≥4% (API).
†Elevated IgE defined by >0.35 kU/L to any aeroallergen among alternaria tenuis, cat epithelium, dog dander, house dust mite/D.F., house dust mite/D.P., elm, oak, short ragweed and timothy grass.
‡Strep infection, Streptococcus pyogenes upper respiratory infection.
§Physician-diagnosed asthma, age at first physician diagnosis of asthma, age at asthma onset by PAC or API, acute exacerbation of asthma and HEDIS-defined persistent asthma for NLP-PAC−/NLP-API− group are not available; p values were calculated among asthmatic group, n=2673 (NLP-PAC+/NLP-API+, NLP-PAC+ only and NLP-API+ only).
FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; HEDIS, Healthcare Effectiveness Data and Information Set; NLP-API, NLP algorithms for Asthma Predictive Index; NLP-PAC, NLP algorithms for Predetermined Asthma Criteria.
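The laboratory footnotes define eosinophilia differently for each criteria set (>300/µL for PAC, ≥4% for API), elevated total IgE as >300 kU/L, and elevated specific IgE as >0.35 kU/L to any allergen on a fixed aeroallergen panel. A minimal sketch of those thresholds follows; the function names and the dictionary input format are assumptions, and `None` stands for the "Unavailable" rows.

```python
from typing import Mapping, Optional

# Aeroallergen panel listed in the footnote, lower-cased for case-insensitive matching.
AEROALLERGEN_PANEL = {
    "alternaria tenuis", "cat epithelium", "dog dander",
    "house dust mite/d.f.", "house dust mite/d.p.",
    "elm", "oak", "short ragweed", "timothy grass",
}


def eosinophilia(criteria: str,
                 abs_count_per_ul: Optional[float] = None,
                 percent: Optional[float] = None) -> Optional[bool]:
    """>300/µL under PAC, >=4% under API; None if the relevant value is unavailable."""
    if criteria == "PAC":
        return None if abs_count_per_ul is None else abs_count_per_ul > 300
    if criteria == "API":
        return None if percent is None else percent >= 4.0
    raise ValueError(f"unknown criteria: {criteria!r}")


def high_total_ige(total_ige_ku_per_l: Optional[float]) -> Optional[bool]:
    """Total IgE > 300 kU/L; None if not measured."""
    return None if total_ige_ku_per_l is None else total_ige_ku_per_l > 300


def elevated_aeroallergen_ige(results_ku_per_l: Mapping[str, float]) -> Optional[bool]:
    """Specific IgE > 0.35 kU/L to any panel aeroallergen; None if no panel allergen was tested."""
    tested = {name.lower(): value for name, value in results_ku_per_l.items()
              if name.lower() in AEROALLERGEN_PANEL}
    if not tested:
        return None
    return any(value > 0.35 for value in tested.values())


print(eosinophilia("PAC", abs_count_per_ul=350))           # True
print(elevated_aeroallergen_ige({"Cat epithelium": 0.5}))  # True
```

Allergens outside the listed panel are ignored, so a child tested only for other allergens is reported as unavailable rather than negative.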
Figure 1. Characteristics of laboratory and pulmonary function test results among a random sample of the original study cohort (n=300). NLP, natural language processing; NLP-API, NLP algorithms for Asthma Predictive Index; NLP-PAC, NLP algorithms for Predetermined Asthma Criteria.
Figure 2. A heatmap of variable clusters (rows) and subject clusters (columns) among asthmatics (NLP-PAC+/NLP-API+, NLP-PAC+ only and NLP-API+ only groups), identified by non-negative matrix factorisation with three optimal clusters (see Statistical analysis section for details). The heatmap has two cluster axes: the three horizontal (row) clusters represent sociodemographic and clinical variable clusters (eg, cluster 1 includes more children with spring birth, family history of asthma and history of eczema), and the three vertical (column) clusters represent patient clusters (eg, 82% of cluster A belonged to the NLP-PAC+/NLP-API+ group (light blue)). Each red box in the map represents the presence of a variable (eg, cluster 1 includes more children with a history of allergic rhinitis than the other two clusters), and each blue box represents its absence. Compared with clusters B and C, subject cluster A (n=655) had more spring births; more frequent family history of asthma, eczema, allergic rhinitis, eosinophilia, persistent asthma, asthma exacerbation, pneumonia, pertussis, tympanostomy tube placement, coeliac disease, and viral and streptococcal infections; less frequent maternal smoking during pregnancy; and higher SES as defined by HOUSES (see online supplementary table 4). HOUSES, HOUsing-based Index of SocioEconomic Status; NLP-API, NLP algorithms for Asthma Predictive Index; NLP-PAC, NLP algorithms for Predetermined Asthma Criteria; SES, socioeconomic status.
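The heatmap rests on non-negative matrix factorisation: a non-negative variables × subjects matrix V is approximated as W·H with rank 3, so rows of W group the variables and columns of H group the subjects. The paper's exact solver and rank-selection procedure are in its Statistical analysis section and are not reproduced here. The sketch below is a generic illustration under stated assumptions: it uses the Lee–Seung multiplicative updates for the Frobenius loss, assigns each variable and subject to the factor with the largest loading (argmax), and runs on a toy random binary matrix. In practice a library implementation such as scikit-learn's `sklearn.decomposition.NMF` would normally be used.

```python
from typing import Tuple

import numpy as np


def nmf(V: np.ndarray, k: int = 3, n_iter: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[np.ndarray, np.ndarray]:
    """Factorise non-negative V (variables x subjects) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    n_vars, n_subjects = V.shape
    W = rng.random((n_vars, k)) + eps       # variable loadings (heatmap rows)
    H = rng.random((k, n_subjects)) + eps   # subject loadings (heatmap columns)
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_clusters(W: np.ndarray, H: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Assign each variable and each subject to the factor it loads on most."""
    return W.argmax(axis=1), H.argmax(axis=0)


# Toy presence/absence matrix (1 = variable present): 12 variables x 40 subjects.
rng = np.random.default_rng(1)
V = (rng.random((12, 40)) < 0.3).astype(float)
W, H = nmf(V, k=3)
var_clusters, subject_clusters = assign_clusters(W, H)
print(var_clusters, np.bincount(subject_clusters, minlength=3))
```

Argmax assignment is only one way to turn the factors into hard clusters; consensus clustering over repeated random starts is a common, more stable alternative.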
Characteristics for the clusters using non-negative matrix factorisation technique among asthmatics (NLP-PAC+/NLP-API+, NLP-PAC+ only and NLP-API+ only group) (the three clusters were depicted in the heatmap analysis in figure 2)
| Characteristic | Cluster A (n=655) | Cluster B (n=843) | Cluster C (n=1175) | Total (n=2673) | P value |
| Group defined by NLP, n (%) | | | | | <0.001* |
| Group 1 (NLP-PAC+/NLP-API+) | 536 (82) | 430 (51) | 648 (55) | 1614 (60) | |
| Group 2 (NLP-PAC+ only) | 110 (17) | 369 (44) | 475 (40) | 954 (36) | |
| Group 3 (NLP-API+ only) | 9 (1) | 44 (5) | 52 (4) | 105 (4) | |
| Male, n (%) | 369 (56) | 442 (52) | 727 (62) | 1538 (58) | <0.001* |
| Ethnicity, n (%) | | | | | <0.001* |
| White | 575 (88) | 765 (91) | 805 (69) | 2145 (80) | |
| HOUSES at birth in lowest quartile, n (%) | 114 (17) | 112 (13) | 331 (28) | 557 (21) | <0.001* |
| Overweight†, n (%) | 108 (17) | 176 (21) | 196 (17) | 480 (18) | 0.021* |
| Unknown, n (%) | 2 (0) | 13 (2) | 9 (1) | 24 (1) | |
| Maternal smoking history during pregnancy, n (%) | 42 (6) | 63 (8) | 126 (11) | 231 (9) | <0.001* |
| Unknown, n (%) | 44 (7) | 104 (12) | 162 (14) | 310 (12) | |
| Infrequent well-child visits per year (lowest quartile), n (%) | 101 (15) | 332 (39) | 235 (20) | 668 (25) | <0.001* |
| Family history of asthma, n (%) | 242 (37) | 244 (29) | 271 (23) | 757 (28) | <0.001* |
| Early onset asthma‡, n (%) | 475 (73) | 532 (63) | 1031 (88) | 2038 (76) | <0.001* |
| Eczema, n (%) | 329 (50) | 254 (30) | 400 (34) | 983 (37) | <0.001* |
| Allergic rhinitis, n (%) | 389 (59) | 227 (27) | 210 (18) | 826 (31) | <0.001* |
| Eosinophilia, n (%)§ | 271 (41) | 147 (17) | 300 (26) | 718 (27) | <0.001* |
| Unknown, n (%) | 153 (23) | 329 (39) | 518 (44) | 1000 (37) | |
| Total IgE (kU/L)>300, n (%) | 41 (6) | 7 (1) | 11 (1) | 59 (2) | 0.32* |
| Unknown, n (%) | 520 (79) | 809 (96) | 1124 (96) | 2453 (92) | |
| Elevated IgE to any aeroallergen, n (%)¶ | 48 (7) | 11 (1) | 14 (1) | 73 (3) | 0.68* |
| Unknown, n (%) | 535 (82) | 813 (96) | 1132 (96) | 2480 (93) | |
| FEV1/FVC<0.85, n (%) | 238 (36) | 136 (16) | 67 (6) | 441 (17) | 0.65* |
| Unknown, n (%) | 237 (36) | 590 (70) | 1050 (89) | 1877 (70) | |
| Acute exacerbation of asthma, n (%)** | 262 (40) | 77 (9) | 96 (8) | 435 (16) | <0.001* |
| HEDIS††-defined persistent asthma, n (%) | 306 (47) | 123 (15) | 107 (9) | 536 (20) | <0.001* |
| Pneumonia, n (%) | 301 (46) | 179 (21) | 327 (28) | 807 (30) | <0.001* |
| PE tube‡‡, n (%) | 106 (16) | 57 (7) | 122 (10) | 285 (11) | <0.001* |
| Pertussis, n (%) | 23 (4) | 19 (2) | 20 (2) | 62 (2) | 0.047* |
| Zoster, n (%) | 17 (3) | 23 (3) | 14 (1) | 54 (2) | 0.026* |
| Appendicitis, n (%) | 11 (2) | 23 (3) | 13 (1) | 47 (2) | 0.023* |
| Coeliac disease, n (%) | 9 (1) | 6 (1) | 11 (1) | 26 (1) | 0.42* |
| Frequent viral infection per year (highest quartile), n (%) | 403 (62) | 13 (2) | 253 (22) | 669 (25) | <0.001* |
| Frequent strep infection§§ per year (highest quartile), n (%) | 323 (49) | 9 (1) | 102 (9) | 434 (16) | <0.001* |
*Pearson’s χ² test.
†Overweight at asthma index date (for children aged 2 years or more, BMI-for-age at or above the 85th percentile (https://www.cdc.gov/obesity/childhood/defining.html)).
‡Early onset asthma defined as age <6 years at the date either criterion (PAC or API) was met.
§Eosinophilia defined by >300/µL (PAC) or ≥4% (API).
¶Elevated IgE defined by >0.35 kU/L to any aeroallergen among alternaria tenuis, cat epithelium, dog dander, house dust mite/D.F., house dust mite/D.P., elm, oak, short ragweed, timothy grass.
**Acute exacerbation of asthma defined as any ER visit, hospitalisation or systemic corticosteroid use for asthma during the follow-up period.
††Healthcare Effectiveness Data and Information Set.
‡‡PE tube, pressure equalising tube as a surrogate marker for frequent ear infection.
§§Strep infection, Streptococcus pyogenes upper respiratory infection.
¶¶Season: spring (March–May), summer (June–August), autumn (September–November) and winter (December, January and February).
BMI, body mass index; HOUSES, HOUsing-based Index of SocioEconomic Status; NLP, natural language processing; NLP-API, NLP algorithms for Asthma Predictive Index; NLP-PAC, NLP algorithms for Predetermined Asthma Criteria.
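Several rows of the cluster table are derived variables defined in the footnotes (early onset before age 6, season of birth, any exacerbation event), and the P values come from Pearson's χ² test. The sketch below encodes those definitions and a plain Pearson test of independence without continuity correction; the function names are hypothetical, and the χ² tail probability is computed with the standard recurrence for the regularised upper incomplete gamma function rather than a statistics library. As a check, it is applied to the Male row (369/655, 442/843 and 727/1175 boys in clusters A, B and C).

```python
import math
from datetime import date
from typing import Sequence, Tuple


def is_early_onset(age_at_criteria_met_years: float) -> bool:
    """Early-onset asthma: age < 6 years when either criterion was met."""
    return age_at_criteria_met_years < 6


def season(d: date) -> str:
    """Spring Mar-May, summer Jun-Aug, autumn Sep-Nov, winter Dec-Feb."""
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    if d.month in (9, 10, 11):
        return "autumn"
    return "winter"


def had_exacerbation(er_visits: int, hospitalisations: int, systemic_steroid_courses: int) -> bool:
    """Any ER visit, hospitalisation or systemic corticosteroid use for asthma."""
    return er_visits > 0 or hospitalisations > 0 or systemic_steroid_courses > 0


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-squared probability Q(df/2, x/2) via the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)."""
    y = x / 2.0
    # Base cases: Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1.0
    return q


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson's chi-squared test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Male row of the cluster table: boys and girls in clusters A, B and C.
males = [369, 442, 727]
cluster_sizes = [655, 843, 1175]
stat, df, p = pearson_chi2([males, [n - m for n, m in zip(cluster_sizes, males)]])
print(f"chi2={stat:.1f}, df={df}, p={p:.1e}")
```

For the Male row this yields χ² of about 18.4 on 2 degrees of freedom and a P value below 0.001, in line with the table entry.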
List of ICD-9 and Current Procedural Terminology (CPT) codes used for identifying asthma-associated infectious and inflammatory comorbidities
| Asthma-associated comorbidities | ICD-9 codes | CPT codes | Lab result |
| Pneumonia | 486 | N/A | N/A |
| Pertussis | 033, V01.89 (pertussis only) | N/A | |
| Zoster | 053 | N/A | N/A |
| Appendicitis | 540–541 | N/A | N/A |
| PE tube placement | 20.01–20.1 | 126, 69421, 69433, 69436, 69620, | |
| Coeliac disease | 579 | N/A | N/A |
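The diagnosis columns of this table amount to a lookup from ICD-9 code categories to comorbidity flags. The sketch below shows one way such a lookup could work; it is an assumption-laden illustration, not the study's extraction code. It assumes codes are stored with their decimal point (so "033" matches "033.9" but not "0339"), reads the zoster entry as ICD-9 053 (herpes zoster), and omits the PE tube procedure codes (ICD-9 procedure 20.01–20.1 and the CPT list), which belong to separate coding systems.

```python
from typing import Dict, Iterable, Set, Tuple

# ICD-9 diagnosis code patterns from the table. A pattern matches the code
# itself or any decimal subcode (e.g. "033" matches "033.9").
COMORBIDITY_ICD9: Dict[str, Tuple[str, ...]] = {
    "pneumonia": ("486",),
    "pertussis": ("033", "V01.89"),
    "zoster": ("053",),              # assumption: zoster entry read as ICD-9 053
    "appendicitis": ("540", "541"),
    "coeliac_disease": ("579",),
}


def code_matches(code: str, pattern: str) -> bool:
    """True if the code equals the pattern or is a decimal subcode of it."""
    code = code.strip().upper()
    return code == pattern or code.startswith(pattern + ".")


def flag_comorbidities(codes: Iterable[str]) -> Set[str]:
    """Return the comorbidities whose ICD-9 patterns match any of the given codes."""
    codes = list(codes)
    return {
        name
        for name, patterns in COMORBIDITY_ICD9.items()
        if any(code_matches(c, p) for c in codes for p in patterns)
    }


print(sorted(flag_comorbidities(["486", "540.9", "493.00", "v01.89"])))
# ['appendicitis', 'pertussis', 'pneumonia']
```

Matching on whole categories plus a decimal separator avoids false hits such as "V01.8" (contact with other communicable diseases) being counted as pertussis.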