Oliver J Bear Don't Walk, Harry Reyes Nieva, Sandra Soo-Jin Lee, Noémie Elhadad.
Abstract
Objectives: To review, through an ethics lens, the state of research in clinical natural language processing (NLP) for the study of bias and fairness, and to identify gaps in research.
Keywords: bias; ethically informed; fairness; natural language processing
Year: 2022 PMID: 35663112 PMCID: PMC9154253 DOI: 10.1093/jamiaopen/ooac039
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Search terms and queries for PubMed and Google Scholar
| Source | Search terms: NLP | Search terms: clinical data | Search terms: bias | Query |
|---|---|---|---|---|
| PubMed | “natural language processing”, “machine learning”, “artificial intelligence”, “information storage and retrieval” | “unstructured”, “electronic health records”, “clinical” | “bias*”, “fair”, “fairness”, “health disparities”, “explicability”, “interpretab*”, “explainab*” | (“natural language processing” OR “machine learning” OR “artificial intelligence” OR “information storage and retrieval”) AND (“unstructured” OR “electronic health records” OR “clinical”) AND (“bias*” OR “fair” OR “fairness” OR “health disparities” OR “explicability” OR “interpretab*” OR “explainab*”) |
| Google Scholar | “natural language processing”, “machine learning” | “clinical note”, “clinical text”, “electronic health records” | “bias*”, “fairness”, “health disparities” | (“natural language processing” OR (“machine learning” AND (“clinical note” OR “clinical text”))) AND (“electronic health records”) AND (“bias*” OR “fairness” OR “health disparities”) |
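The PubMed query in the table follows a regular pattern: terms within a concept group are joined with OR, and the groups are joined with AND. A minimal sketch of that pattern is shown here. The helper name `build_query` is hypothetical, the sketch uses straight quotes rather than the typographic quotes shown in the table, and it does not reproduce the Google Scholar query, which nests its terms differently.

```python
from typing import Sequence


def build_query(term_groups: Sequence[Sequence[str]]) -> str:
    """OR the quoted terms inside each group, then AND the groups together."""
    clauses = []
    for terms in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Term groups copied from the PubMed row of the table.
nlp_terms = [
    "natural language processing",
    "machine learning",
    "artificial intelligence",
    "information storage and retrieval",
]
clinical_terms = ["unstructured", "electronic health records", "clinical"]
bias_terms = [
    "bias*", "fair", "fairness", "health disparities",
    "explicability", "interpretab*", "explainab*",
]

pubmed_query = build_query([nlp_terms, clinical_terms, bias_terms])
print(pubmed_query)
```

Keeping the term lists separate from the query string makes it easier to see which concept a given term belongs to when the search is revised.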
Figure 1. Proposed stages of the ML development process. Design, data, and algorithm capture stages discussed in prior work, while the critique stage incorporates Box’s Loop and illustrates the cyclic nature inherent in development. ML: machine learning.
Figure 2. The ethical framework proposed by “AI4People—An Ethical Framework for a Good AI Society: Opportunities, Risks, Principles, and Recommendations” and used to understand the complex interactions between multiple actors and clinical NLP technologies in this work. The framework focuses on the 4 traditional bioethics principles and introduces explicability to enable the other principles for application to AI. AI: artificial intelligence; NLP: natural language processing.
Figure 3. Articles were analyzed according to the ML development process and an ethical framework, resulting in this matrix of analysis. ML: machine learning.
Figure 4. A flowchart of the article screening process in accordance with PRISMA guidelines.
Different measures for biased models discussed throughout the work identified in this scoping review
| Bias measure | Description | Relevant article(s) |
|---|---|---|
| Parity gap | Positive prediction differences between 2 groups | Zhang et al |
| Recall gap | Recall difference between 2 groups | Zhang et al |
| Specificity gap | Specificity difference between 2 groups | Zhang et al |
| AUC gap | AUC difference between 2 groups | Tsui et al |
| Zero-one loss gap | Zero-one loss difference between 2 groups | Chen et al |
| Sentence log probability gap | Difference in a language model’s sentence log probability when swapping out demographic information (eg, discussion of race) | Zhang et al |
| Rank-turbulence divergence | Ranks occurrences of | Minot et al |
| Conditional prediction parity | Fairness criteria that assess conditional independence between a model outcome and a demographic class. Encompasses notions of the parity gap. | Pfohl et al |
| Calibration fairness criteria | Measures model calibration across groups. | Pfohl et al |
| Cross-group ranking measures | Variation on AUC that measures how often positive instances in 1 group are ranked above negative instances in another | Pfohl et al |
| Sensitive attribute recovery | Measures how well a sensitive attribute (eg, gender) can be recovered | Minot et al |
| Demographic association with outcome | Significant association between patient demographics and model outcome using regression parameters | Wissel et al |
| Gold-standard bias comparison | Compare group representation to previous standard’s representation | Weber et al, Polling et al |
Note: This does not include measures of bias for data or healthcare delivery.
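
Several of the gap measures in the table reduce to simple arithmetic on per-group confusion counts. The sketch below illustrates the parity, recall, specificity, and zero-one loss gaps, plus a cross-group ranking measure, for binary labels and predictions. It is an illustration under stated assumptions and not the implementation used in any of the cited articles: the function names are hypothetical, and the sign convention (group A minus group B) is a choice made here, where some work reports absolute differences instead.

```python
from typing import Dict, List, Sequence


def _rate(numerator: int, denominator: int) -> float:
    # Undefined rates (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Per-group rates from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    return {
        "parity": _rate(tp + fp, n),         # share of positive predictions
        "recall": _rate(tp, tp + fn),
        "specificity": _rate(tn, tn + fp),
        "zero_one_loss": _rate(fp + fn, n),  # share of wrong predictions
    }


def gaps(y_true, y_pred, groups, group_a, group_b) -> Dict[str, float]:
    """Difference in each rate between two groups (assumed: A minus B)."""
    def subset(group):
        idx: List[int] = [i for i, g in enumerate(groups) if g == group]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    rates_a = group_rates(*subset(group_a))
    rates_b = group_rates(*subset(group_b))
    return {f"{name}_gap": rates_a[name] - rates_b[name] for name in rates_a}


def cross_group_auc(pos_scores_a: Sequence[float], neg_scores_b: Sequence[float]) -> float:
    """Fraction of (positive in A, negative in B) pairs ranked correctly; ties count half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores_a
        for n in neg_scores_b
    )
    return wins / (len(pos_scores_a) * len(neg_scores_b))


# Toy data: four patients in each of two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4
result = gaps(y_true, y_pred, groups, "a", "b")
print(result)
```

On this toy data, group "a" receives more positive predictions and has higher recall but lower specificity than group "b", while the zero-one loss is the same in both groups. A single gap can therefore hide differences that another gap exposes.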