Mary F Davis, Subramaniam Sriram, William S Bush, Joshua C Denny, Jonathan L Haines.
Abstract
OBJECTIVES: The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time-consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course.
Keywords: Multiple sclerosis; electronic health records
Year: 2013 PMID: 24148554 PMCID: PMC3861927 DOI: 10.1136/amiajnl-2013-001999
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. Schematic of how the algorithm that determines the origin of the first neurological symptom works.
Counts of individuals selected by case algorithms
| Algorithm | No of samples | PPV* (%) | PPV† (%) |
|---|---|---|---|
| Definitive type 1 | 3975 | 96 | 96 |
| Definitive type 2 | 85 | 64 | 79 |
| Possible type 1 | 1315 | 16 | 64 |
| Possible type 2 | 414 | 72 | 86 |
| Total | 5789 | – | – |
Algorithm details are available at http://www.phekb.org/phenotype/multiple-sclerosis-demonstration-project.
*Possible cases counted as false positives.
†Possible cases counted as true positives.
PPV, positive predictive value.
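The two PPV columns differ only in how charts judged "possible" MS at review are scored. A minimal sketch of that calculation, assuming each chart selected by an algorithm is labeled at review as a definite case, a possible case, or a non-case; the function name `ppv` and the review counts below are hypothetical, since the review sample sizes are not given here.

```python
def ppv(true_cases: int, possible_cases: int, non_cases: int,
        possible_as_true: bool) -> float:
    """Positive predictive value (%) of one case algorithm from chart-review counts.

    possible_as_true=False mirrors the '*' column (possible cases are false positives);
    possible_as_true=True mirrors the '†' column (possible cases are true positives).
    """
    total = true_cases + possible_cases + non_cases
    if total == 0:
        raise ValueError("no reviewed charts")
    positives = true_cases + (possible_cases if possible_as_true else 0)
    return 100.0 * positives / total


# Hypothetical review of 50 charts selected by one algorithm (illustrative only).
strict = ppv(30, 10, 10, possible_as_true=False)
lenient = ppv(30, 10, 10, possible_as_true=True)
print(f"PPV*: {strict:.0f}%  PPV†: {lenient:.0f}%")  # PPV*: 60%  PPV†: 80%
```

The gap between the two columns therefore grows with the share of possible cases, which is why it is widest for "Possible type 1" (16% vs 64%).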
Demographics of all extracted cases
| Characteristic | No of individuals |
|---|---|
| Gender | |
| Female | 4484 |
| Male | 1305 |
| Age, years | |
| Median | 54 |
| Range | 8–107 |
| Deceased | 508 |
| Ethnicity | |
| White | 3513 |
| Black | 440 |
| Asian | 11 |
| Hispanic | 16 |
| Native American | 1 |
| Unknown | 1808 |
Age is calculated for the year 2013 from birth year. Deceased includes individuals reported as deceased in the EMR through linkage to the Social Security Death Index.
EMR, electronic medical record.
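Because only birth year is used, age is a simple subtraction from the reference year, and the median and range follow directly. A minimal sketch, assuming a list of birth years per individual; the birth years below are hypothetical and the helper name `age_in_reference_year` is illustrative.

```python
from statistics import median

REFERENCE_YEAR = 2013  # the footnote computes age for the year 2013


def age_in_reference_year(birth_year: int, reference_year: int = REFERENCE_YEAR) -> int:
    """Age from birth year only; the birthday within the year is ignored."""
    return reference_year - birth_year


# Hypothetical birth years, for illustration only.
birth_years = [1959, 1970, 1948, 1985, 1962]
ages = [age_in_reference_year(y) for y in birth_years]
print(ages, median(ages), (min(ages), max(ages)))  # [54, 43, 65, 28, 51] 51 (28, 65)
```

Ignoring the birth date means each computed age can overstate the true age by up to one year, which is negligible for a median over 5789 individuals.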
Number of individuals, out of 5789, for whom information was extracted for each clinical trait
| Clinical trait | Individuals, n |
|---|---|
| Clinical subtype | 3140 |
| Oligoclonal bands | 1043 |
| Year of diagnosis | 1053 |
| EDSS | 903 |
| Timed 25-foot walk | 3523 |
| Year of first symptom | 2301 |
| Origin of first symptom | 1288 |
| MS medications | 2586 |
EDSS, Expanded Disability Status Scale; MS, multiple sclerosis.
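Expressing the counts above as a share of the 5789 cases makes coverage easier to compare across traits. A small sketch using only the numbers in the table; the variable names are illustrative.

```python
TOTAL_CASES = 5789  # all extracted cases

# Individuals with each trait extracted (counts from the table above).
extracted = {
    "Clinical subtype": 3140,
    "Oligoclonal bands": 1043,
    "Year of diagnosis": 1053,
    "EDSS": 903,
    "Timed 25-foot walk": 3523,
    "Year of first symptom": 2301,
    "Origin of first symptom": 1288,
    "MS medications": 2586,
}

# Percentage of all cases with each trait, rounded to one decimal place.
coverage = {trait: round(100 * n / TOTAL_CASES, 1) for trait, n in extracted.items()}
for trait, pct in sorted(coverage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{trait:<25} {pct:5.1f}%")
```

Timed 25-foot walk has the widest coverage (about 60.9% of cases) and EDSS the narrowest (about 15.6%).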
Performance of the algorithms compared with blinded manual review of 100 charts, for all clinical traits
| Clinical trait | Gold standard positives, n* | Correctly identified, n* | Recall, % | Precision, % | Specificity, % | F-measure, % |
|---|---|---|---|---|---|---|
| Clinical MS subtype | 61 | 60 | 98 | 88 | 81 | 93 |
| Oligoclonal bands | 28 | 20 | 71 | 87 | 97 | 78 |
| Year of diagnosis | 51 | 17 | 33 | 89 | 100 | 49 |
| Expanded disability status scale | 75 | 61 | 81 | 94 | 100 | 87 |
| Timed 25-foot walk | 120 | 99 | 83 | 99 | 100 | 90 |
| Year of first symptom | 56 | 24 | 43 | 100 | 100 | 60 |
| Origin of first symptom | 62 | 14 | 23 | 88 | 100 | 36 |
| MS medications | 99 | 63 | 64 | 95 | 93 | 76 |
*n refers to the number of recorded instances, not the number of individuals. For EDSS, clinical subtype, timed 25-foot walk, MS medications, and origin of first symptom, an individual could contribute more than one instance.
EDSS, Expanded Disability Status Scale; MS, multiple sclerosis.
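Recall can be recomputed from the two count columns (correctly identified divided by gold standard positives). Precision cannot, because false-positive counts are not reported, so the sketch below takes the reported precision as input. It assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the reported values are consistent with that definition. Since the reported precision is rounded, a recomputed F-measure may differ by a point in some rows.

```python
def recall_pct(correct: int, gold_positives: int) -> float:
    """Recall (%) = correctly identified instances / gold-standard instances."""
    return 100.0 * correct / gold_positives


def f_measure(precision_pct: float, recall_pct_value: float) -> float:
    """Balanced F-measure (F1), in %: harmonic mean of precision and recall."""
    if precision_pct + recall_pct_value == 0:
        return 0.0
    return 2 * precision_pct * recall_pct_value / (precision_pct + recall_pct_value)


# Clinical MS subtype row: 60 of 61 gold-standard instances found, precision 88%.
r = recall_pct(60, 61)
f = f_measure(88, r)
print(round(r), round(f))  # 98 93
```

The harmonic mean is pulled toward the weaker of the two scores, which is why high-precision traits with low recall, such as origin of first symptom, end up with low F-measures.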
Performance of the algorithms after additional modifications
| Clinical trait | Gold standard positives, n* | Correctly identified, n* | Recall, % | Precision, % | Specificity, % | F-measure, % |
|---|---|---|---|---|---|---|
| Timed 25-foot walk | 120 | 108 | 90 | 99 | 100 | 94 |
| Year of first symptom | 56 | 31 | 55 | 97 | 100 | 70 |
| Origin of first symptom | 62 | 21 | 34 | 88 | 93 | 49 |
*n refers to the number of recorded instances, not the number of individuals. For timed 25-foot walk and origin of first symptom, an individual could contribute more than one instance.
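The gold standard positive counts are the same before and after the modifications, so the recall gain can be read directly from the change in correctly identified instances. A short sketch using the counts from the two performance tables; the dictionary names are illustrative.

```python
# Counts for the three modified traits, taken from the two performance tables.
GOLD = {"Timed 25-foot walk": 120, "Year of first symptom": 56, "Origin of first symptom": 62}
BEFORE = {"Timed 25-foot walk": 99, "Year of first symptom": 24, "Origin of first symptom": 14}
AFTER = {"Timed 25-foot walk": 108, "Year of first symptom": 31, "Origin of first symptom": 21}

# Recall gain in percentage points: extra correct instances / gold-standard instances.
gains = {
    trait: round(100 * (AFTER[trait] - BEFORE[trait]) / gold, 1)
    for trait, gold in GOLD.items()
}
for trait, gain in gains.items():
    print(f"{trait:<25} +{gain:.1f} recall points")
```

The gains are 7.5, 12.5, and 11.3 points. As the tables show, they came with small drops elsewhere: precision for year of first symptom fell from 100% to 97%, and specificity for origin of first symptom fell from 100% to 93%.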
Figure 2. Distributions of timed 25-foot walk scores found in the structured fields and extracted from the text of the clinical records.
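Figure 2 contrasts values from structured fields with values pulled from free text. To illustrate what free-text extraction of this trait involves, here is a minimal regular-expression sketch. It is not the authors' algorithm (their algorithm details are at the PheKB link above). The pattern `T25FW_PATTERN`, the helper `extract_t25fw`, and the note phrasings it handles are all hypothetical.

```python
import re

# Hypothetical pattern: matches phrasings such as "timed 25-foot walk: 6.2 seconds"
# or "T25FW 7.8 s". Real clinical notes vary far more; this is illustrative only.
T25FW_PATTERN = re.compile(
    r"(?:timed\s+25[-\s]?(?:foot|ft)\s+walk|T25FW)"  # trait mention
    r"\D{0,20}?"                                     # short non-numeric gap, e.g. ": "
    r"(\d+(?:\.\d+)?)\s*(?:seconds|secs?|s)\b",      # value followed by a time unit
    re.IGNORECASE,
)


def extract_t25fw(note: str) -> list[float]:
    """Return every timed 25-foot walk value (in seconds) found in a note."""
    return [float(m.group(1)) for m in T25FW_PATTERN.finditer(note)]


note = "Timed 25-foot walk: 6.2 seconds. Repeat T25FW 5.9 s with cane."
print(extract_t25fw(note))  # [6.2, 5.9]
```

Requiring a time unit after the number keeps the pattern from grabbing unrelated numbers near the mention. The trade-off is that values written without units are missed, which is one way text-derived and structured distributions can diverge.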