Pericles S Giannaris, Zainab Al-Taie, Mikhail Kovalenko, Nattapon Thanintorn, Olha Kholod, Yulia Innokenteva, Emily Coberly, Shellaine Frazier, Katsiarina Laziuk, Mihail Popescu, Chi-Ren Shyu, Dong Xu, Richard D Hammer, Dmitriy Shin.
Abstract
BACKGROUND: Free-text sections of pathology reports contain the most important information from a diagnostic standpoint. However, this information is largely underutilized for computer-based analytics. The vast majority of NLP-based methods lack the capacity to accurately extract complex diagnostic entities and the relationships among them, or to provide an adequate knowledge representation for downstream data-mining applications.
Keywords: Free-text pathology reports; information extraction; n-ary modeling; structurization
Year: 2020 PMID: 32166042 PMCID: PMC7045509 DOI: 10.4103/jpi.jpi_30_19
Source DB: PubMed Journal: J Pathol Inform
Figure 1. Architecture of the pipeline for extraction and structurization of diagnostic information
Examples of compound relational triples
(a) “…neoplastic cells have a single large folded multilobulated nucleus with an inconspicuous nucleolus and scant cytoplasm consistent with a popcorn or lymphocyte-predominant-lp cell variant. …”

| Subject | Predicate | Object |
|---|---|---|
| neoplastic cells | have | large folded multilobulated nucleus with inconspicuous nucleolus consistent with lymphocyte-predominant-lp cell variant |
| neoplastic cells | are positive for | cd30 cd15 |
Figure 2. Workflow of the structurization process
Relational triples extracted from the example in the case illustration
| RT | Subject | Predicate | Object |
|---|---|---|---|
| 1 | neoplastic cell | are negative for | cd45 cd20 bcl-6 cd10 cd23 alk |
| 2 | mum-1 cd79a | highlight | plasma cells |
RT: Relational triple
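
A relational triple of the kind tabulated above can be held in a small record type. The sketch below is illustrative only and is not taken from the authors' pipeline: the class name `RelationalTriple`, the field `obj`, and the `expand` helper are hypothetical. `expand` assumes that a compound object is a whitespace-separated list of single-token markers (as in RT 1); it would be wrong for multi-word objects such as "plasma cells".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RelationalTriple:
    """Hypothetical container for one (subject, predicate, object) triple."""
    subject: str
    predicate: str
    obj: str

    def expand(self) -> list[RelationalTriple]:
        # Assumption: every whitespace-separated token in the object is a
        # separate marker, so a compound triple yields one triple per token.
        return [
            RelationalTriple(self.subject, self.predicate, token)
            for token in self.obj.split()
        ]


# The two triples from the table above, copied verbatim.
rt1 = RelationalTriple("neoplastic cell", "are negative for",
                       "cd45 cd20 bcl-6 cd10 cd23 alk")
rt2 = RelationalTriple("mum-1 cd79a", "highlight", "plasma cells")

# RT 1 lists several markers in its object, so it can be split per marker.
simple_triples = rt1.expand()
for triple in simple_triples:
    print(triple.subject, "|", triple.predicate, "|", triple.obj)
```

Keeping the compound triple and its expanded form side by side preserves the original extraction while giving downstream queries one marker per row.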
Figure 3. N-ary modeling representation of the free-text pathology report in the case illustration
Comparative results of extraction of diagnostic information from free-text microscopic description section of pathology reports by different open information extraction methods
| Metric | Stanford OpenIE | OLLIE | ClauseIE | CSD | ReVerb |
|---|---|---|---|---|---|
| Precision | 9.80% | 18.18% | 23.81% | 14.29% | 16.67% |
| Recall | 18.52% | 7.41% | 18.52% | 11.11% | 3.70% |
Figure 4. Example of a pathology report demonstrating (i) complex diagnostic entities, (ii) complex relations among these diagnostic entities, and (iii) context in which these complex relations take place
Contingency table for the Fisher’s exact test
| RTs | RTs with score 3 | RTs with score <3 |
|---|---|---|
| Returned | TP: 3,551 | FP: 69 |
| Missed | FN: 485 | TN: 186 |
TP: True positives, FN: False negatives, TN: True negatives, FP: False positives, RTs: Relational triples
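
The counts in the contingency table are enough to recompute precision and recall and to run a two-sided Fisher's exact test. The following is a minimal pure-Python sketch, not the authors' analysis code: the function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention that the source does not specify.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Fraction:
    """Exact two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1
    denom = comb(n, row1)

    def prob(k: int) -> Fraction:
        # Hypergeometric probability that the top-left cell equals k
        # when all row and column totals are held fixed.
        return Fraction(comb(col1, k) * comb(col2, row1 - k), denom)

    p_observed = prob(a)
    lowest, highest = max(0, row1 - col2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed)


# Counts copied from the contingency table above.
tp, fp, fn, tn = 3551, 69, 485, 186

precision = tp / (tp + fp)            # share of returned RTs with score 3
recall = tp / (tp + fn)               # share of score-3 RTs that were returned
odds_ratio = (tp * tn) / (fp * fn)    # sample odds ratio of the table
p_value = fisher_exact_two_sided(tp, fp, fn, tn)

print(f"precision={precision:.4f} recall={recall:.4f} "
      f"odds_ratio={odds_ratio:.2f}")
print("p < 0.001:", p_value < Fraction(1, 1000))
```

Exact rational arithmetic avoids the floating-point underflow that very small p-values can cause; where SciPy is available, `scipy.stats.fisher_exact` is the more usual choice.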