Danqing Hu, Shaolei Li, Huanyao Zhang, Nan Wu, Xudong Lu.
Abstract
BACKGROUND: Lymph node metastasis (LNM) is critical for treatment decision-making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.
Keywords: algorithm; decision making; electronic medical records; forest modeling; lung cancer; lymph node metastasis prediction; machine learning; natural language processing; non–small cell lung cancer; prediction models
Year: 2022 PMID: 35468085 PMCID: PMC9086872 DOI: 10.2196/35475
Source DB: PubMed Journal: JMIR Med Inform
Questions and entity types for natural language processing–extracted features.
| Question (Chinese) | Question (English) | Answer notation | Entity type |
| 原发肿物的相关描述是什么? | What is the description about the primary tumor? | Head1 | Tumor |
| 淋巴结的相关描述是什么? | What is the description about the lymph nodes? | Head2 | Lymph node |
| Head1 位于什么地方? | Where is Head1 located? | Tail1 | Location |
| Head1 的大小是多少? | What is the size of Head1? | Tail2 | Size |
| Head1 的形状是什么? | What is the shape of Head1? | Tail3 | Shape |
| Head1 的密度是什么? | What is the density of Head1? | Tail4 | Density |
| 与Head1 相关的胸膜侵犯的描述是什么? | What is the description about the pleura invasion related to Head1? | Tail5 | Pleura |
| 与Head1 相关的血管侵犯的描述是什么? | What is the description about the vessel invasion related to Head1? | Tail6 | Vessel |
| Head2 位于什么地方? | Where is Head2 located? | Tail7 | Location |
| Head2 的大小是多少? | What is the size of Head2? | Tail8 | Size |
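The table above implies a two-stage flow: a head question is asked first, and its answer is substituted into the tail question templates. The following is a minimal sketch of that substitution step only, using the English questions from the table. The names `TAIL_TEMPLATES` and `build_tail_questions` are hypothetical and not taken from the article, and the question answering model itself is not shown.

```python
# Hypothetical template store mirroring the question table (English column).
# Each entry: (answer notation, entity type, question template).
TAIL_TEMPLATES = {
    "Head1": [
        ("Tail1", "Location", "Where is {head} located?"),
        ("Tail2", "Size", "What is the size of {head}?"),
        ("Tail3", "Shape", "What is the shape of {head}?"),
        ("Tail4", "Density", "What is the density of {head}?"),
        ("Tail5", "Pleura",
         "What is the description about the pleura invasion related to {head}?"),
        ("Tail6", "Vessel",
         "What is the description about the vessel invasion related to {head}?"),
    ],
    "Head2": [
        ("Tail7", "Location", "Where is {head} located?"),
        ("Tail8", "Size", "What is the size of {head}?"),
    ],
}


def build_tail_questions(head_key, head_answer):
    """Fill the tail templates of one head entity with the extracted head span."""
    return [
        (tail, entity_type, template.format(head=head_answer))
        for tail, entity_type, template in TAIL_TEMPLATES[head_key]
    ]


if __name__ == "__main__":
    # "the nodule" stands in for a span returned by the first-turn question.
    for tail, entity_type, question in build_tail_questions("Head1", "the nodule"):
        print(tail, entity_type, question)
```

Each generated question would then be posed against the same report text in a second turn, so that every tail answer stays tied to the head entity it describes.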
Figure 1. A case of multiturn question answering application. BERT: bidirectional encoder representations from transformers.
Patient characteristics.
| Characteristic | Total (n=794) | LNMa status: pN2b (n=105) | LNMa status: pN0c or pN1d (n=689) | P value |
| Age (years), mean (SD) | 60.92 (51.48 to 70.36) | 60.87 (51.87 to 69.86) | 60.93 (51.42 to 70.44) | .45 |
| Sex | —e | — | — | .06 |
| Male | 397 | 62 | 335 | — |
| Female | 397 | 43 | 354 | — |
|  | — | — | — | .04 |
| Yes | 337 | 55 | 282 | — |
| No | 457 | 50 | 407 | — |
|  | — | — | — | .94 |
| Yes | 183 | 25 | 158 | — |
| No | 611 | 80 | 531 | — |
|  | — | — | — | .32 |
| Yes | 137 | 14 | 123 | — |
| No | 657 | 91 | 566 | — |
|  | — | — | — | .18 |
| Yes | 232 | 37 | 195 | — |
| No | 562 | 68 | 494 | — |
|  | — | — | — | .25 |
| Yes | 84 | 15 | 69 | — |
| No | 710 | 90 | 620 | — |
|  | — | — | — | .33 |
| Yes | 33 | 2 | 31 | — |
| No | 761 | 103 | 658 | — |
|  | — | — | — | .06 |
| Yes | 36 | 9 | 27 | — |
| No | 758 | 96 | 662 | — |
|  | — | — | — | .35 |
| Yes | 29 | 6 | 23 | — |
| No | 765 | 99 | 666 | — |
| Tumor locationf | — | — | — | .22 |
| RULg | 249 | 27 | 222 | — |
| RMLh | 59 | 4 | 55 | — |
| RLLi | 150 | 18 | 132 | — |
| LULj | 185 | 31 | 154 | — |
| LLLk | 126 | 21 | 105 | — |
| Other | 25 | 4 | 21 | — |
| TLAf,l, median (IQR) | 2.61 (1.20 to 4.01) | 3.02 (1.64 to 4.39) | 2.55 (1.15 to 3.94) | <.001 |
| TSAf,m, median (IQR) | 2.03 (0.88 to 3.18) | 2.38 (1.27 to 3.48) | 1.98 (0.83 to 3.13) | <.001 |
|  | — | — | — | .08 |
| Yes | 255 | 42 | 213 | — |
| No | 539 | 63 | 476 | — |
|  | — | — | — | <.001 |
| Yes | 211 | 48 | 163 | — |
| No | 583 | 57 | 526 | — |
| Tumor densityf | — | — | — | <.001 |
| pGGOn | 124 | 0 | 124 | — |
| mGGOo | 96 | 3 | 93 | — |
| Solid nodule | 574 | 102 | 472 | — |
|  | — | — | — | .87 |
| Yes | 52 | 6 | 46 | — |
| No | 742 | 99 | 643 | — |
|  | — | — | — | .001 |
| Yes | 406 | 70 | 336 | — |
| No | 388 | 35 | 353 | — |
| HLNLAf,p | — | — | — | .008 |
| >10 mm | 148 | 30 | 118 | — |
| ≤10 mm | 646 | 75 | 571 | — |
| HLNSAf,q | — | — | — | <.001 |
| >10 mm | 66 | 19 | 47 | — |
| ≤10 mm | 728 | 86 | 642 | — |
| MLNLAf,r | — | — | — | <.001 |
| >10 mm | 191 | 50 | 141 | — |
| ≤10 mm | 603 | 55 | 548 | — |
| MLNSAf,s | — | — | — | <.001 |
| >10 mm | 72 | 27 | 45 | — |
| ≤10 mm | 722 | 78 | 644 | — |
| CEAt, median (IQR) | 5.31 (–6.66 to 17.27) | 12.66 (–8.44 to 33.76) | 4.18 (–5.17 to 13.54) | <.001 |
| CA199u, median (IQR) | 14.41 (–3.24 to 32.06) | 15.80 (–5.08 to 36.68) | 14.20 (–2.90 to 31.29) | .47 |
| CA125v, median (IQR) | 14.46 (0.03 to 28.90) | 19.88 (–5.56 to 45.32) | 13.64 (1.96 to 25.32) | <.001 |
| NSEw, median (IQR) | 15.81 (8.85 to 22.78) | 16.26 (10.19 to 22.33) | 15.75 (8.66 to 22.83) | .048 |
| Cyfra211x, median (IQR) | 3.20 (–0.23 to 6.62) | 3.55 (–0.64 to 7.75) | 3.14 (–0.15 to 6.43) | .06 |
| SCCAgy, median (IQR) | 0.96 (–0.16 to 2.08) | 1.18 (–0.62 to 2.99) | 0.93 (–0.04 to 1.90) | .14 |
aLNM: lymph node metastasis.
bpN2: pathological N stage 2.
cpN0: pathological N stage 0.
dpN1: pathological N stage 1.
eNot applicable.
fFeatures recorded in computed tomography reports.
gRUL: right upper lobe.
hRML: right middle lobe.
iRLL: right lower lobe.
jLUL: left upper lobe.
kLLL: left lower lobe.
lTLA: tumor long axis.
mTSA: tumor short axis.
npGGO: pure ground glass opacity.
omGGO: mixed ground glass opacity.
pHLNLA: hilar lymph node long axis.
qHLNSA: hilar lymph node short axis.
rMLNLA: mediastinal lymph node long axis.
sMLNSA: mediastinal lymph node short axis.
tCEA: carcinoembryonic antigen.
uCA199: carbohydrate antigen 19-9.
vCA125: carbohydrate antigen 12-5.
wNSE: neuron-specific enolase.
xCyfra211: cytokeratin 19-fragments.
ySCCAg: squamous cell carcinoma antigen.
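This excerpt does not state which statistical tests produced the P values in the table. As an assumption for illustration only, the sketch below applies a chi-square test with Yates continuity correction to a 2×2 count table, using the Male and Female rows as input. The function name `yates_chi_square_2x2` is hypothetical. For those counts it returns a P value of about .06, which matches the reported figure but does not establish that this is the test the authors used.

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square test with Yates correction for the 2x2 table [[a, b], [c, d]].

    Returns (chi-square statistic, two-sided P value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: male, female. Columns: pN2, pN0 or pN1 (counts from the table).
    chi2, p = yates_chi_square_2x2(62, 335, 43, 354)
    print(f"chi2={chi2:.3f}, P={p:.3f}")
```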
Performances of pN2 lymph node metastasis prediction models.
| Model | AUCa, mean | AUC, SD | AUC, 95% CI | APb, mean | AP, SD | AP, 95% CI |
| LRc | 0.778 | 0.041 | 0.747-0.809 | 0.442 | 0.075 | 0.385-0.499 |
| L2-LRd | 0.768 | 0.038 | 0.739-0.796 | 0.413 | 0.072 | 0.359-0.467 |
| ANNe | 0.769 | 0.051 | 0.730-0.808 | 0.434 | 0.095 | 0.363-0.506 |
| SVMf | 0.771 | 0.071 | 0.718-0.825 | 0.453 | 0.084 | 0.389-0.516 |
| RFg | 0.792 | 0.042 | 0.760-0.825 | 0.456 | 0.075 | 0.399-0.512 |
| LGBMh | 0.787 | 0.044 | 0.755-0.820 | 0.457 | 0.101 | 0.381-0.534 |
aAUC: area under the receiver operating characteristic curve.
bAP: average precision.
cLR: logistic regression.
dL2-LR: L2-logistic regression.
eANN: artificial neural network.
fSVM: support vector machine.
gRF: random forest.
hLGBM: LightGBM.
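For readers unfamiliar with the two metrics in the table, the following is a minimal pure-Python sketch of how AUC and AP can be computed from binary labels and predicted scores. It follows the standard definitions (AUC as the probability that a positive case outranks a negative one, AP as the mean of precision values at each positive hit) and is not the authors' evaluation code. The function names are illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AP: mean of precision@k over the ranks k at which a positive appears.

    Tied scores are not grouped in this sketch; they keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(y, s), average_precision(y, s))
```

AP is sensitive to class imbalance in a way AUC is not, which is one reason both are commonly reported when the positive class is a small share of the cohort (here, 105 of 794 patients are pN2).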
Figure 2. The receiver operating characteristic curves (A) and precision-recall curves (B) of pN2 prediction models.
Performances of pN1&N2 lymph node metastasis prediction models.
| Model | AUCa, mean | AUC, SD | AUC, 95% CI | APb, mean | AP, SD | AP, 95% CI |
| LRc | 0.740 | 0.035 | 0.714-0.766 | 0.467 | 0.058 | 0.423-0.510 |
| L2-LRd | 0.736 | 0.044 | 0.704-0.769 | 0.465 | 0.058 | 0.422-0.509 |
| ANNe | 0.734 | 0.047 | 0.698-0.770 | 0.479 | 0.087 | 0.413-0.545 |
| SVMf | 0.735 | 0.023 | 0.717-0.752 | 0.474 | 0.047 | 0.439-0.509 |
| LGBMg | 0.768 | 0.030 | 0.745-0.791 | 0.524 | 0.044 | 0.491-0.557 |
| RFh | 0.771 | 0.026 | 0.752-0.791 | 0.524 | 0.057 | 0.481-0.567 |
aAUC: area under the receiver operating characteristic curve.
bAP: average precision.
cLR: logistic regression.
dL2-LR: L2-logistic regression.
eANN: artificial neural network.
fSVM: support vector machine.
gLGBM: LightGBM.
hRF: random forest.
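The performance tables summarize each metric as a mean, an SD, and a 95% CI, presumably over repeated evaluation runs. The excerpt does not say how many runs there were or how the interval was formed, so the following is only a sketch under stated assumptions: a Student t interval around the mean, using the sample SD. The lookup table `T_CRIT_95` and the function `summarize` are hypothetical helpers that cover just a few run counts.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only a few entries are listed; extend as needed for other run counts.
T_CRIT_95 = {2: 4.303, 4: 2.776, 9: 2.262}


def summarize(scores):
    """Return (mean, sample SD, (CI low, CI high)) for a list of metric values."""
    n = len(scores)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError(f"no critical value stored for {df} degrees of freedom")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # n - 1 in the denominator
    half_width = T_CRIT_95[df] * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    # Made-up AUC values from three hypothetical runs, for illustration only.
    mean, sd, (low, high) = summarize([0.7, 0.8, 0.9])
    print(f"mean={mean:.3f} SD={sd:.3f} 95% CI {low:.3f}-{high:.3f}")
```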
Figure 3. The receiver operating characteristic curves (A) and precision-recall curves (B) of pN1&N2 prediction models.
Top 10 important features for pN2 lymph node metastasis prediction.
| Rank | LRa, feature | LR, weight | L2-LRb, feature | L2-LR, weight | RFc, feature | RF, weight | LGBMd, feature | LGBM, weight | All |
| 1 | pGGOe,f | –10.383 | CEAg | 3.530 | CEA | 0.229 | CEA | 46.0 | CEA |
| 2 | CEA | 6.010 | CA125h | 3.067 | CA125 | 0.094 | Age | 23.3 | Solid nodulef |
| 3 | CA125 | 4.728 | pGGOf | –1.799 | Solid nodulef | 0.094 | Solid nodulef | 18.8 | CA125 |
| 4 | Solid nodulef | 3.683 | Solid nodulef | 1.773 | MLNSAf,i | 0.073 | TLAf,j | 17.6 | Age |
| 5 | TLAf | –2.701 | Age | –1.315 | MLNLAf,k | 0.072 | TSAf,l | 15.1 | MLNLAf |
| 6 | Age | –1.908 | SCCAgm | 0.944 | TLAf | 0.054 | CA125 | 13.3 | TLAf |
| 7 | SCCAg | 1.763 | MLNLAf | 0.896 | TSAf | 0.048 | Cyfra211n | 12.9 | pGGOf |
| 8 | mGGOf,o | 1.759 | Pleural indentationf | 0.836 | Cyfra211 | 0.038 | NSEp | 12.7 | SCCAg |
| 9 | RMLf,q | –1.729 | Cardiovascular disease | 0.807 | SCCAg | 0.037 | MLNLAf | 11.6 | Lobulationf |
| 10 | TSAf | 1.601 | Lobulationf | 0.725 | Lobulationf | 0.036 | SCCAg | 9.0 | TSAf |
aLR: logistic regression.
bL2-LR: L2-logistic regression.
cRF: random forest.
dLGBM: LightGBM.
epGGO: pure ground glass opacity.
fFeatures recorded in computed tomography reports.
gCEA: carcinoembryonic antigen.
hCA125: carbohydrate antigen 12-5.
iMLNSA: mediastinal lymph node short axis.
jTLA: tumor long axis.
kMLNLA: mediastinal lymph node long axis.
lTSA: tumor short axis.
mSCCAg: squamous cell carcinoma antigen.
nCyfra211: cytokeratin 19-fragments.
omGGO: mixed ground glass opacity.
pNSE: neuron-specific enolase.
qRML: right middle lobe.
Performance of the multiturn question answering model and baseline models.
| Feature | BiLSTMa-pipeline, Pd | BiLSTM-pipeline, Re | BiLSTM-pipeline, Ff | BERTb-pipeline, P | BERT-pipeline, R | BERT-pipeline, F | BERT-MTQAc, P | BERT-MTQA, R | BERT-MTQA, F |
| Tumor density | 0.882 | 0.625 | 0.732 | 0.889 | 0.667 | 0.762 | 0.938 | 0.938 | 0.938 |
| MLNLAg | 1.000 | 0.640 | 0.780 | 1.000 | 0.720 | 0.837 | 1.000 | 0.960 | 0.980 |
| TLAh | 0.967 | 0.892 | 0.928 | 0.984 | 0.938 | 0.961 | 0.984 | 0.954 | 0.969 |
| Lobulation | 0.889 | 0.533 | 0.667 | 0.909 | 0.667 | 0.769 | 1.000 | 0.867 | 0.929 |
| TSAi | 0.967 | 0.892 | 0.928 | 0.984 | 0.938 | 0.961 | 0.984 | 0.954 | 0.969 |
| MLNSAj | 1.000 | 0.750 | 0.857 | 1.000 | 0.750 | 0.857 | 1.000 | 0.938 | 0.968 |
| Pleural indentation | 0.931 | 0.818 | 0.871 | 0.964 | 0.818 | 0.885 | 1.000 | 0.848 | 0.918 |
| Tumor location | 0.984 | 0.897 | 0.938 | 0.968 | 0.897 | 0.931 | 0.985 | 0.985 | 0.985 |
| Spiculation | 1.000 | 0.727 | 0.842 | 1.000 | 0.773 | 0.872 | 1.000 | 1.000 | 1.000 |
| Vessel invasion | 1.000 | 0.111 | 0.200 | 1.000 | 0.222 | 0.364 | 1.000 | 0.556 | 0.714 |
| HLNLAk | 1.000 | 0.778 | 0.875 | 1.000 | 0.833 | 0.909 | 1.000 | 1.000 | 1.000 |
| HLNSAl | 1.000 | 0.750 | 0.857 | 1.000 | 0.750 | 0.857 | 1.000 | 1.000 | 1.000 |
| Average | 0.968 | 0.701 | 0.790 | 0.975 | 0.748 | 0.830 | 0.991 | 0.917 | 0.948 |
aBiLSTM: bidirectional long short-term memory.
bBERT: bidirectional encoder representations from transformers.
cMTQA: multiturn question answering.
dP: precision.
eR: recall.
fF: F1 score.
gMLNLA: mediastinal lymph node long axis.
hTLA: tumor long axis.
iTSA: tumor short axis.
jMLNSA: mediastinal lymph node short axis.
kHLNLA: hilar lymph node long axis.
lHLNSA: hilar lymph node short axis.
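The F column in the table is the harmonic mean of the P and R columns. As a quick check of that relationship, the sketch below recomputes F1 from precision and recall; the helper names are illustrative. With the BiLSTM-pipeline values for tumor density (P=0.882, R=0.625) it gives 0.732, as reported.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute (P, R, F1) from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Tumor density, BiLSTM-pipeline row of the table.
    print(round(f1_score(0.882, 0.625), 3))
```

The per-feature pattern in the table, with precision near 1.000 and much lower recall for the pipeline baselines, means those models rarely extract a wrong value but often miss one, and F1 penalizes that imbalance.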
Performance of the multiturn question answering model for feature extraction.
| Feature | Accuracy | Precision | Recall | F1 score |
| Tumor density | 0.940 | 0.875 | 0.915 | 0.893 |
| MLNLAa | 0.965 | 0.927 | 0.927 | 0.927 |
| TLAb | 0.974 | 0.974 | 0.974 | 0.974 |
| Lobulation | 0.923 | 0.993 | 0.716 | 0.832 |
| TSAc | 0.972 | 0.972 | 0.972 | 0.972 |
| MLNSAd | 0.986 | 0.918 | 0.931 | 0.924 |
| Pleural indentation | 0.917 | 0.903 | 0.938 | 0.920 |
| Tumor location | 0.994 | 0.990 | 0.990 | 0.990 |
| Spiculation | 0.979 | 0.988 | 0.945 | 0.966 |
| Vessel invasion | 0.982 | 0.932 | 0.788 | 0.854 |
| HLNLAe | 0.965 | 1.000 | 0.811 | 0.896 |
| HLNSAf | 0.986 | 0.982 | 0.848 | 0.911 |
aMLNLA: mediastinal lymph node long axis.
bTLA: tumor long axis.
cTSA: tumor short axis.
dMLNSA: mediastinal lymph node short axis.
eHLNLA: hilar lymph node long axis.
fHLNSA: hilar lymph node short axis.
Figure 4. Concordance correlation values between pN2 prediction models using complete and partial gold standard features. LR: logistic regression; L2-LR: L2-logistic regression; RF: random forest; LGBM: LightGBM; SVM: support vector machine; ANN: artificial neural network; NLP: natural language processing; pGGO: pure ground glass opacity; MLNLA: mediastinal lymph node long axis; TLA: tumor long axis; TSA: tumor short axis.
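The caption does not define the concordance correlation it reports. Assuming it refers to Lin's concordance correlation coefficient, the sketch below shows how agreement between two sets of predicted probabilities (for example, from a model fed complete versus partial gold standard features) could be measured. The function name and the example numbers are illustrative and not drawn from the article.

```python
import statistics


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two equal-length sequences.

    Uses population (1/n) variances and covariance:
    CCC = 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("both sequences are constant and identical")
    return 2 * cov / denom


if __name__ == "__main__":
    full = [0.1, 0.4, 0.8]          # made-up predictions with all features
    partial = [0.2, 0.5, 0.9]       # made-up predictions with some features replaced
    print(round(concordance_correlation(full, partial), 4))
```

Unlike Pearson correlation, this coefficient drops below 1 when one set of predictions is systematically shifted or rescaled relative to the other, which suits a comparison where the predicted probabilities themselves should match.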