Mayla R Boguslav, Nourah M Salem, Elizabeth K White, Sonia M Leach, Lawrence E Hunter.
Abstract
MOTIVATION: Science progresses by posing good questions, yet work in biomedical text mining has not focused on them much. We propose a novel idea for biomedical natural language processing: identifying and characterizing the questions stated in the biomedical literature. Formally, the task is to identify and characterize statements of ignorance, statements where scientific knowledge is missing or incomplete. The creation of such technology could have many significant impacts, from the training of PhD students to ranking publications and prioritizing funding based on particular questions of interest. The work presented here is intended as the first step towards these goals.
Year: 2021 PMID: 34661112 PMCID: PMC8508177 DOI: 10.1093/bioadv/vbab012
Source DB: PubMed Journal: Bioinform Adv ISSN: 2635-0041
Fig. 1. Methods flowchart. A flowchart of the methods.
Fig. 2. Classification flowchart. A flowchart of the different classification problems.
The ignorance taxonomy with definitions, knowledge goals, example cues, and total cue count.
| Ignorance category | Definition | Knowledge goal | Example cues | Total cues |
|---|---|---|---|---|
| **Question answered by this work** | A statement of a goal or objective of a study that is attempted or completed during the study. | To find the answer(s) in the article; determine if the question(s) is (are) fully answered in the article | Aim, goal, objective, our study, sought, to determine | 58 |
| Full unknown | A statement that indicates something is not known (a lack of information), or information is presented for the first time (new or novel) and a significant amount of research is needed; not a statement about the absence of something. | To explore the unknown further to gain any insights | Could not find, do not know, elusive, not…established, uncertain, still unclear | 137 |
| Explicit question | An explicit statement of inquiry (with a question mark or question word such as how, where, what, why). | To find answers to the question and/or discover methodologies that will help answer the question | ?, what, where, wondered, why | 17 |
| Incomplete evidence | A positive or negative statement proposing a possible/feasible explanation for a phenomenon on the basis of limited evidence as a starting point for further investigation OR a statement that information is needed to support an assertion or claim, including both positive and negative statements. Either a statement that some evidence already exists, explaining how current findings support previous work, adding confidence to a claim OR a statement that information is limited, more research is needed or is ongoing, including limitations (biases or shortcomings related to the study design and execution). | To gather more evidence to support the claim OR conduct more research to determine the validity of the claim; complete the partial picture; consider the shortcomings and biases for the next experiment and how they can be addressed. | A good understanding, believe, evidence…limited, has been suggested, hypothesis, no studies, possibly, preliminary stage, remains under investigation, still being discovered, support, trend | 619 |
| Superficial relationship | A statement about a connection, link or association between at least two variables; connectedness between entities and/or interactions representing their relatedness or influence. | To confirm the connection, link or association between variables; determine the full underlying relationship between variables | Affect, associated, correlate, factor, influence, interact, link, pattern, tend | 133 |
| Probable understanding | A statement staking a claim to the most likely explanation, relationship, or phenomenon; assumes that there is a good chance this understanding is correct. | To determine if the most likely option is correct or if another option is more feasible | Almost all, assumed, concluding, evident, it is clear, most likely, thus | 119 |
| **Anomaly/curious finding** | A statement of a surprising result, conclusion, observation or situation; the researchers were not expecting the result, conclusion, observation or situation but are intrigued by it. | To explore the surprising result, conclusion or situation more and determine if the result, conclusion, observation or situation is repeatable | Appeared to be, interestingly, noteworthy, surprisingly | 89 |
| Alternative options/controversy | Either an explicit statement of multiple (at least 2) choices, actions, approaches or methods that need to be experimentally determined, including statements with an implied second option, such as ‘whether’. This includes a statement of disagreement amongst researchers OR a lack of consensus OR at least two possible answers presented as results from different researchers—usually in reference to previous results and stated when results disagree with each other OR contradictions. | To determine the correct option or a better option and if there are disagreements, or to determine the truth to break any disagreements | Cannot rule out, claims, has been challenged, whether, whilst | 193 |
| Difficult task | A statement of something not easily done, accomplished, comprehended or solved; or a complicated thing with a multitude of underlying pieces or parts; heterogeneity; excludes medical complications. | To create methods to study the complicated system and to better understand any piece of the complicated system; potentially requires new experiments or better techniques | Not feasible, remains…challenge, variability, rarely able to | 69 |
| Problem/complication | A statement of issues, problems, mistakes or medical complications that are cause for anxiety and/or worry. | To determine the gravity of the concern and determine if it needs to be dealt with before the next experiment or study | Issue, error, insufficient, lack of reproducibility, publication bias, underestimated | 86 |
| Future work | A statement of extensions, including next steps, directions, opportunities, approaches or considerations of the described work that may be implemented at some future time point. This also includes a statement of suggestion or a proposal as to the next best course of action, especially one put forward by an authoritative body; advice telling someone the best action to take. | To determine the next course of action based on this future work proposal | Additional research, are needed, continue to explore, further study, more…studies, recommend, warrants, worthy of closer attention | 201 |
| Future prediction | A statement of extrapolation of given data into the future and/or from past observations, without reference to next steps. | To run the simulation or experiment to determine if the prediction is correct; publicize the outcomes of the study to the correct people | Allow, expect, if so, serve as a basis, will | 17 |
| Important consideration | A statement calling for attention including an action needed to be taken immediately or information that needs to be disseminated immediately OR critical: being in or verging on a state of crisis or emergency OR urgently needed OR absolutely necessary. | To take the urgent action ASAP or distribute the knowledge ASAP | Call for action, cautious, crucial, emphasis, global problem, high on the agenda, necessary, relevant to note, vital | 152 |
Note: The categories in bold are both broad and narrow categories.
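To make the role of the cue lists concrete, here is a minimal Python sketch of dictionary-style cue matching. It is an illustration only and not the classifiers evaluated in this work. The cues are a small subset of the example cues in the table, and the names `EXAMPLE_CUES`, `cue_to_regex` and `find_cues` are hypothetical. Reading "…" inside a cue as a gap of up to five intervening words is an assumption made for this sketch.

```python
import re

# A small subset of the example cues from the taxonomy table above.
EXAMPLE_CUES = {
    "Full unknown": ["do not know", "elusive", "not…established", "still unclear"],
    "Explicit question": ["what", "where", "why", "?"],
    "Incomplete evidence": ["evidence…limited", "has been suggested", "possibly"],
    "Future work": ["further study", "more…studies", "are needed"],
}


def cue_to_regex(cue):
    """Compile one cue into a case-insensitive regular expression.

    Assumption for this sketch: "…" inside a cue stands for a gap of
    zero to five intervening words.
    """
    parts = [re.escape(part.strip()) for part in cue.split("…")]
    body = r"(?:\s+\S+){0,5}?\s+".join(parts)
    # Word boundaries are only added next to word characters, so "?" still works.
    prefix = r"\b" if re.match(r"\w", cue) else ""
    suffix = r"\b" if re.search(r"\w$", cue) else ""
    return re.compile(prefix + body + suffix, re.IGNORECASE)


COMPILED = {
    category: [cue_to_regex(cue) for cue in cues]
    for category, cues in EXAMPLE_CUES.items()
}


def find_cues(sentence):
    """Return {category: [matched text, ...]} for every category with a hit."""
    hits = {}
    for category, patterns in COMPILED.items():
        matched = [m.group(0) for p in patterns for m in p.finditer(sentence)]
        if matched:
            hits[category] = matched
    return hits


if __name__ == "__main__":
    print(find_cues("The mechanism has not yet been established, "
                    "and further study is warranted."))
```

A single sentence can hit several categories at once, which is consistent with the multi-category annotation counts reported in the tables that follow. Pure string matching ignores context, so a sketch like this would over-match common words such as "what" or "support".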
Per article counts of annotations
| Category | # Total annotations | Average # annotations | Median # annotations | Maximum # annotations |
|---|---|---|---|---|
| Question answered by this work | 310 | 5.17 | 2 | 23 |
| Full unknown | 191 | 3.18 | 1 | 20 |
| Explicit question | 84 | 1.4 | 0.5 | 27 |
| Incomplete evidence | 3 628 | 60.47 | 39.5 | 330 |
| Superficial relationship | 1 953 | 32.55 | 18 | 161 |
| Probable understanding | 749 | 12.48 | 4.5 | 107 |
| Anomaly/curious finding | 501 | 8.35 | 4 | 39 |
| Alternative options/controversy | 933 | 15.55 | 7.5 | 94 |
| Difficult task | 164 | 2.73 | 1 | 25 |
| Problem/complication | 352 | 5.87 | 2.5 | 30 |
| Future work | 535 | 8.92 | 3.5 | 79 |
| Future prediction | 173 | 2.88 | 1 | 29 |
| Important consideration | 717 | 11.95 | 4 | 119 |
| All categories | 10 289 | 171.5 | 126.5 | 1 021 |
| Subject | 3 852 | 64.2 | 56.5 | 262 |
Notes: Total number of annotations across all articles, with per-article statistics. The minimum number of annotations per article is zero for every category except ALL CATEGORIES and SUBJECT, for which it is 1.
Per article unique counts of annotations
| Category | # Total unique annotations | Average # unique annotations | Median # unique annotations | Maximum # unique annotations |
|---|---|---|---|---|
| Question answered by this work | 36 | 3.13 | 2 | 10 |
| Full unknown | 47 | 2.1 | 1 | 7 |
| Explicit question | 12 | 0.87 | 0.5 | 5 |
| Incomplete evidence | 268 | 25.28 | 22 | 84 |
| Superficial relationship | 84 | 10.88 | 9.5 | 45 |
| Probable understanding | 51 | 4.47 | 3 | 22 |
| Anomaly/curious finding | 43 | 3.85 | 2 | 14 |
| Alternative options/controversy | 77 | 6.17 | 4.5 | 24 |
| Difficult task | 28 | 1.85 | 1 | 11 |
| Problem/complication | 32 | 2.78 | 1 | 13 |
| Future work | 67 | 4.52 | 3 | 20 |
| Future prediction | 11 | 1.47 | 1 | 6 |
| Important consideration | 50 | 4.12 | 3 | 26 |
| All categories | 806 | 71.5 | 66.5 | 273 |
Notes: Total number of unique annotations across all articles, with per-article statistics. The minimum number of unique annotations per article is zero for every category except ALL CATEGORIES, for which it is 1.
Per article annotation counts per section
| Section | # Total articles | # Total annotations | Average # annotations | Median # annotations | Minimum # annotations | Maximum # annotations |
|---|---|---|---|---|---|---|
| Abstract | 42 | 80 | 1.9 | 1 | 0 | 29 |
| Introduction | 55 | 1416 | 25.75 | 14 | 0 | 571 |
| Methods | 35 | 1403 | 40.09 | 30 | 1 | 367 |
| Results | 29 | 323 | 11.14 | 6 | 0 | 46 |
| Discussion | 31 | 1940 | 62.58 | 37 | 3 | 258 |
| Conclusion | 28 | 5127 | 183.11 | 107.5 | 4 | 990 |
Notes: Total number of annotations by section in all articles with section delineation and statistics per article.
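The average column in the section table is the total number of annotations divided by the number of articles that contain that section. The short Python sketch below recomputes it from the table values as a consistency check. The names `SECTION_ROWS`, `per_article_average` and `check_rows` are hypothetical, and rounding to two decimals is an assumption about how the averages were reported.

```python
# (section, total articles, total annotations, reported average) from the table above.
SECTION_ROWS = [
    ("Abstract", 42, 80, 1.9),
    ("Introduction", 55, 1416, 25.75),
    ("Methods", 35, 1403, 40.09),
    ("Results", 29, 323, 11.14),
    ("Discussion", 31, 1940, 62.58),
    ("Conclusion", 28, 5127, 183.11),
]


def per_article_average(total_annotations, total_articles):
    """Average annotations per article, rounded to two decimals (assumed)."""
    return round(total_annotations / total_articles, 2)


def check_rows(rows):
    """Map each section to True if the recomputed average equals the reported one."""
    return {
        section: per_article_average(total, articles) == reported
        for section, articles, total, reported in rows
    }


if __name__ == "__main__":
    for section, ok in check_rows(SECTION_ROWS).items():
        print(section, "consistent" if ok else "MISMATCH")
```

The median, minimum and maximum columns cannot be recomputed this way, because they depend on the per-article counts, which are not listed here.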
Sentence classification: binary (all categories) and each of the 13 categories
| Ignorance category | Training F1 score | Training support | Testing F1 score | Testing support |
|---|---|---|---|---|
| All categories binary | 0.97 | 3 390 | 0.85 | 377 |
| Question answered by this work | >0.99 | 1 965 | 0.89 | 109 |
| Full unknown | >0.99 | 2 223 | 0.90 | 123 |
| Explicit question | 0.99 | 2 782 | 0.84 | 155 |
| Incomplete evidence | >0.99 | 1 389 | 0.90 | 77 |
| Superficial relationship | >0.99 | 3 288 | 0.87 | 183 |
| Probable understanding | >0.99 | 2 179 | 0.86 | 121 |
| Anomaly/curious finding | >0.99 | 1 288 | 0.87 | 72 |
| Alternative options/controversy | >0.99 | 1 416 | 0.89 | 79 |
| Difficult task | >0.99 | 532 | 0.89 | 30 |
| Problem/complication | >0.99 | 988 | 0.81 | 55 |
| Future work | >0.99 | 1 270 | 0.88 | 71 |
| Future prediction | >0.99 | 489 | 0.94 | 27 |
| Important consideration | >0.99 | 1 677 | 0.82 | 93 |
Notes: One sentence can map to more than one category, so the per-category supports do not sum to the binary total.
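The sketch below illustrates the two ideas behind this table: an F1 score for one binary decision, and why per-category supports can exceed the binary support when sentences carry several labels. It assumes the standard definition of F1 (the harmonic mean of precision and recall), and the toy labels are invented for illustration rather than taken from the corpus. The function names are hypothetical.

```python
def f1_score(gold, pred):
    """F1 for one binary problem; gold and pred are equal-length lists of 0/1."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_support(gold_labels):
    """Number of sentences that carry at least one ignorance category."""
    return sum(1 for labels in gold_labels if labels)


def per_category_support(gold_labels, categories):
    """Number of sentences labelled with each category."""
    return {c: sum(1 for labels in gold_labels if c in labels) for c in categories}


if __name__ == "__main__":
    # Invented toy data: each set holds the categories of one sentence.
    toy_gold = [{"Full unknown", "Future work"}, {"Future work"}, set(), {"Full unknown"}]
    print(binary_support(toy_gold))
    print(per_category_support(toy_gold, ["Full unknown", "Future work"]))
    print(f1_score([1, 1, 0, 1], [1, 0, 0, 1]))
```

In the toy data, three sentences are ignorance statements, but the two category supports add up to four, because the first sentence counts once for each of its categories.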
Word classification: binary (all categories), each of the 13 categories, and all categories combined
| Ignorance category | Training F1 score | Training support | Testing F1 score | Testing support |
|---|---|---|---|---|
| All categories binary | 0.95 | 11 552 | 0.93 | 1 210 |
| Question answered by this work | 0.91 | 474 | 0.83 | 51 |
| Full unknown | 0.85 | 299 | 0.8 | 28 |
| Explicit question | 0.98 | 68 | 0.9 | 15 |
| Incomplete evidence | 0.96 | 4 342 | 0.96 | 520 |
| Superficial relationship | 0.98 | 1 812 | 0.99 | 199 |
| Probable understanding | 0.96 | 753 | 0.95 | 73 |
| Anomaly/curious finding | 0.91 | 522 | 0.94 | 61 |
| Alternative options/controversy | 0.92 | 961 | 0.9 | 117 |
| Difficult task | 0.87 | 199 | >0.99 | 19 |
| Problem/complication | 0.93 | 415 | 0.94 | 36 |
| Future work | 0.91 | 716 | 0.87 | 84 |
| Future prediction | 0.97 | 180 | 0.94 | 18 |
| Important consideration | 0.98 | 737 | 0.97 | 85 |
| All categories combined | 0.84 | 9 416 | 0.85 | 1 073 |