Yufan Guo, Anna Korhonen, Maria Liakata, Ilona Silins, Johan Hogberg, Ulla Stenius.
Abstract
BACKGROUND: Many practical tasks in biomedicine require access to specific types of information in the scientific literature, e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts to sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.
Year: 2011 PMID: 21385430 PMCID: PMC3060841 DOI: 10.1186/1471-2105-12-69
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The three schemes
| Scheme | Category | Definition |
|---|---|---|
| S1 | Objective | The background and the aim of the research |
| S1 | Method | The way to achieve the goal |
| S1 | Result | The principal findings |
| S1 | Conclusion | Analysis, discussion and the main conclusions |
| S2 | Background | The circumstances pertaining to the current work, situation, or its causes, history, etc. |
| S2 | Objective | A thing aimed at or sought, a target or goal |
| S2 | Method | A way of doing research, esp. according to a defined and regular plan; a special form of procedure or characteristic set of procedures employed in a field of study as a mode of investigation and inquiry |
| S2 | Result | The effect, consequence, issue or outcome of an experiment; the quantity, formula, etc. obtained by calculation |
| S2 | Conclusion | A judgment or statement arrived at by any reasoning process; an inference, deduction, induction; a proposition deduced by reasoning from other propositions; the result of a discussion, or examination of a question, final determination, decision, resolution, final arrangement or agreement |
| S2 | Related work | A comparison between the current work and the related work |
| S2 | Future work | The work that needs to be done in the future |
| S3 | Hypothesis | A statement that has not yet been confirmed rather than a factual statement |
| S3 | Motivation | The reason for carrying out the investigation |
| S3 | Background | Description of generally accepted background knowledge and previous work |
| S3 | Goal | The target state of the investigation where intended discoveries are made |
| S3 | Object | An entity which is a product or main theme of the investigation |
| S3 | Experiment | Experiment details |
| S3 | Model | A statement about a theoretical model or framework |
| S3 | Method | The means by which the authors seek to achieve a goal of the investigation |
| S3 | Observation | The data/phenomena recorded within an investigation |
| S3 | Result | Factual statements about the outputs of an investigation |
| S3 | Conclusion | Statements inferred from observations and results, relating to research hypothesis |

Figure 1. An example of an abstract annotated manually according to S1 (A), S2 (B) and S3 (C).
Figure 2. An example of the user test for S3 abstracts. The figure shows questions Q1 (A) and Q2 (B), respectively, and the scheme-annotated sentences useful for answering these questions.
The mapping between the questions in the CRA questionnaire and scheme categories
| S1 (i) Possible | S1 (ii) Dominant | S3 (i) Possible | S3 (ii) Dominant |
|---|---|---|---|

(i) Possible: all the possible categories for each question; (ii) Dominant: the dominant categories, which were the ones used in the user test.
The distribution of words and sentences in the scheme-annotated CRA corpus
| S1 | Objective | Method | Result | Conclusion |
|---|---|---|---|---|
| Words | 61483 | 39163 | 89575 | 35564 |
| Sentences | 2145 | 1396 | 3203 | 1241 |
| Sentences (%) | 27% | 17% | 40% | 16% |

| S2 | Background | Objective | Method | Result | Conclusion | Related work | Future work |
|---|---|---|---|---|---|---|---|
| Words | 36828 | 23493 | 41544 | 89538 | 30752 | 2456 | 1174 |
| Sentences | 1429 | 674 | 1473 | 3185 | 1082 | 95 | 47 |
| Sentences (%) | 18% | 8% | 18% | 40% | 14% | 1% | 1% |

| S3 | Hypothesis | Motivation | Background | Goal | Object | Experiment | Model | Method | Observation | Result | Conclusion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Words | 2676 | 4277 | 28028 | 10612 | 15894 | 22444 | 1157 | 17982 | 17402 | 75951 | 29362 |
| Sentences | 99 | 172 | 1088 | 294 | 474 | 805 | 41 | 637 | 744 | 2582 | 1049 |
| Sentences (%) | 1% | 2% | 14% | 4% | 6% | 10% | 1% | 8% | 9% | 32% | 13% |

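In each scheme the sentence counts sum to the same 7,985 sentences, and the percentage row is each category's share of that total. A minimal Python sketch that recomputes the S1 shares from the counts above, assuming plain rounding to the nearest whole percent (the helper name `sentence_shares` is illustrative):

```python
# Sentence counts per S1 category, copied from the table above.
s1_sentences = {"Objective": 2145, "Method": 1396, "Result": 3203, "Conclusion": 1241}


def sentence_shares(counts):
    """Return each category's share of all sentences as a rounded percentage."""
    total = sum(counts.values())
    return {cat: round(100 * n / total) for cat, n in counts.items()}


total = sum(s1_sentences.values())
shares = sentence_shares(s1_sentences)
print(total)   # 7985
print(shares)  # {'Objective': 27, 'Method': 17, 'Result': 40, 'Conclusion': 16}
```

The same function reproduces the S2 and S3 percentage rows when given their sentence counts.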
Confusion matrix for inter-annotator agreement on the CRA corpus: linguist (L) vs. domain expert (E) - S1
| L/E | ||||
|---|---|---|---|---|
| 29 | 9 | 4 | ||
| 88 | 19 | 1 | ||
| 5 | 17 | 18 | ||
| 11 | 4 | 158 | ||
Confusion matrix for inter-annotator agreement on the CRA corpus: linguist (L) vs. domain expert (E) - S2
| L/E | |||||||
|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 3 | 2 | 0 | ||
| 22 | 53 | 7 | 2 | 0 | 0 | ||
| 5 | 17 | 16 | 0 | 1 | 0 | ||
| 9 | 4 | 18 | 17 | 2 | 0 | ||
| 7 | 1 | 5 | 131 | 4 | 7 | ||
| 2 | 0 | 0 | 11 | 13 | 0 | ||
| 0 | 0 | 0 | 0 | 2 | 0 | ||
Confusion matrix for inter-annotator agreement on the CRA corpus: linguist (L) vs. domain expert (E) - S3
| L/E | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | ||
| 5 | 13 | 0 | 2 | 0 | 0 | 4 | 0 | 1 | 0 | ||
| 11 | 6 | 0 | 3 | 0 | 0 | 4 | 1 | 1 | 2 | ||
| 2 | 4 | 4 | 80 | 7 | 0 | 10 | 0 | 0 | 0 | ||
| 0 | 0 | 0 | 2 | 3 | 0 | 9 | 0 | 0 | 0 | ||
| 0 | 0 | 1 | 1 | 7 | 0 | 66 | 5 | 8 | 0 | ||
| 0 | 0 | 3 | 1 | 4 | 7 | 13 | 0 | 2 | 1 | ||
| 0 | 0 | 8 | 7 | 25 | 63 | 5 | 3 | 10 | 1 | ||
| 0 | 0 | 4 | 0 | 1 | 3 | 0 | 9 | 285 | 1 | ||
| 0 | 0 | 3 | 0 | 0 | 2 | 3 | 8 | 53 | 10 | ||
| 0 | 1 | 5 | 1 | 1 | 0 | 1 | 3 | 9 | 105 | ||
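Agreement in a confusion matrix like these is commonly summarised with Cohen's kappa, which corrects the observed agreement (the diagonal) for the agreement expected by chance from the row and column totals. The extracted rows above have lost cells, so the matrix in this minimal sketch is hypothetical and not data from the paper. It assumes a square matrix of counts with one annotator on the rows and the other on the columns:

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix of counts (rows: annotator 1, columns: annotator 2)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: probability that both annotators pick the same category independently.
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


# Hypothetical two-category example (not data from the paper).
example = [[20, 5],
           [10, 15]]
print(round(cohens_kappa(example), 2))  # 0.4
```

Here observed agreement is 0.7 and chance agreement 0.5, so kappa is 0.4.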
Association measures between schemes S1, S2, S3
| S1 vs S2 | S1 vs S3 | S2 vs S3 | |||||||
|---|---|---|---|---|---|---|---|---|---|
| 5577.1 | 18 | 0 | 5363.6 | 30 | 0 | 6293.4 | 60 | 0 | |
| 6613.0 | 18 | 0 | 6371.0 | 30 | 0 | 8554.7 | 60 | 0 | |
| 0.842 | 0.837 | 0.871 | |||||||
| 0.901 | 0.885 | 0.725 | |||||||
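The values 18, 30 and 60 fit the degrees of freedom of a chi-squared test of independence on the category-by-category contingency table, df = (r - 1)(c - 1): (4 - 1)(7 - 1) = 18 for S1 vs S2, (4 - 1)(11 - 1) = 30 for S1 vs S3 and (7 - 1)(11 - 1) = 60 for S2 vs S3. A minimal sketch of Pearson's statistic under that reading, plus Cramér's V as one common normalised association measure. Whether the last two rows of the table use this or another coefficient is not stated here, and the example tables are hypothetical:

```python
import math


def chi_squared_independence(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table of counts."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:  # skip cells in empty rows/columns
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, (r - 1) * (c - 1)


def cramers_v(table):
    """Cramér's V: chi-squared scaled to [0, 1] by sample size and table shape."""
    chi2, _ = chi_squared_independence(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


# Degrees of freedom implied by the scheme sizes (S1: 4, S2: 7, S3: 11 categories).
for name, (r, c) in {"S1 vs S2": (4, 7), "S1 vs S3": (4, 11), "S2 vs S3": (7, 11)}.items():
    print(name, (r - 1) * (c - 1))  # 18, 30, 60
```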
Figure 3. Comparison of the three schemes in terms of manual annotations. The figure shows pairwise interpretation of categories of one scheme in terms of the categories of the other: S2 to S1 mapping in A, S3 to S1 mapping in B and S3 to S2 mapping in C.
F-Measure results when using each individual feature alone
| a | b | c | d | e | f | g | h | i | j | k | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .39 | .83 | .71 | .69 | .52 | .45 | .45 | .45 | .54 | .39 | - | ||
| - | .47 | .81 | .74 | .63 | .49 | - | .46 | .03 | .42 | .51 | ||
| - | .76 | .85 | .86 | .76 | .70 | .72 | .69 | .70 | .68 | .54 | ||
| - | .72 | .70 | .65 | .63 | .53 | .49 | .57 | .68 | .20 | - | ||
| .26 | .73 | .69 | .67 | .45 | .38 | .56 | .33 | .33 | .29 | - | ||
| - | .13 | .72 | .68 | .54 | .63 | - | .49 | .48 | .20 | - | ||
| - | .50 | .81 | .72 | .64 | .47 | - | .47 | .03 | .42 | .51 | ||
| - | .76 | .85 | .87 | .76 | .72 | .72 | .70 | .69 | .68 | .54 | ||
| - | .70 | .73 | .71 | .62 | .51 | .40 | .61 | .67 | .23 | - | ||
| - | - | - | - | - | - | - | - | - | - | - | ||
| - | - | - | - | - | - | - | - | - | - | - | ||
| - | - | - | - | .67 | - | - | - | - | - | - | ||
| .18 | .57 | .70 | .49 | .39 | .13 | .36 | .33 | .30 | .40 | - | ||
| - | - | .54 | .40 | .21 | - | - | .11 | .06 | .06 | - | ||
| - | - | .53 | .33 | .22 | - | .19 | .31 | - | .25 | - | ||
| - | - | .73 | .63 | .60 | .10 | - | .26 | .32 | - | - | ||
| - | .22 | .63 | .46 | .33 | .30 | - | .31 | .07 | .44 | .25 | ||
| - | - | - | - | - | - | - | - | - | - | - | ||
| - | - | .82 | .61 | .39 | .39 | - | .50 | - | .37 | - | ||
| - | .59 | .75 | .71 | .63 | .56 | .56 | .54 | .48 | .52 | .47 | ||
| - | - | .87 | .73 | .41 | .34 | - | .38 | .24 | .35 | - | ||
| - | .74 | .68 | .65 | .65 | .50 | .48 | .49 | .55 | .21 | - | ||
a-k: History, Location, Word, Bi-gram, Verb, Verb Class, POS, GR, Subj, Obj, Voice
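Each cell is an F-measure for one category. A minimal sketch assuming the usual balanced F1 (the harmonic mean of precision and recall, with the category scored against all others). The label sequences are made up for illustration:

```python
def f_measure(gold, predicted, category):
    """Balanced F-measure (F1) for one category, scored one-vs-rest."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up gold and predicted S1 labels for five sentences.
gold = ["Objective", "Method", "Result", "Result", "Conclusion"]
pred = ["Objective", "Result", "Result", "Result", "Conclusion"]
for cat in ["Objective", "Method", "Result", "Conclusion"]:
    print(cat, round(f_measure(gold, pred, cat), 2))
# Objective 1.0, Method 0.0, Result 0.8, Conclusion 1.0
```

For Result, precision is 2/3 (one Method sentence was mislabelled as Result) and recall is 1, giving F1 = 0.8.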
F-Measure results using all the features and all but one of the features
| ALL | A | B | C | D | E | F | G | H | I | J | K | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .90 | .89 | .87 | .92 | .90 | .90 | .91 | .91 | .91 | .92 | .91 | .88 | ||
| .80 | .81 | .80 | .80 | .79 | .81 | .79 | .80 | .80 | .80 | .81 | .81 | ||
| .88 | .90 | .88 | .90 | .88 | .90 | .88 | .88 | .88 | .89 | .89 | .90 | ||
| .86 | .85 | .82 | .87 | .88 | .90 | .90 | .88 | .89 | .88 | .88 | .90 | ||
| .91 | .94 | .90 | .90 | .93 | .94 | .94 | .91 | .93 | .94 | .92 | .94 | ||
| .72 | .78 | .84 | .78 | .83 | .88 | .84 | .81 | .83 | .84 | .78 | .83 | ||
| .81 | .83 | .80 | .81 | .80 | .85 | .80 | .78 | .81 | .81 | .82 | .83 | ||
| .88 | .90 | .88 | .89 | .88 | .91 | .89 | .89 | .90 | .90 | .90 | .89 | ||
| .84 | .83 | .77 | .83 | .86 | .88 | .86 | .87 | .88 | .89 | .88 | .81 | ||
| - | - | - | - | - | - | - | - | - | - | - | - | ||
| - | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | ||
| - | - | - | - | - | - | - | - | - | - | - | - | ||
| .82 | .84 | .80 | .76 | .82 | .82 | .83 | .78 | .83 | .83 | .83 | .83 | ||
| .59 | .60 | .60 | .54 | .67 | .62 | .62 | .59 | .61 | .61 | .62 | .61 | ||
| .62 | .67 | .67 | .62 | .71 | .62 | .67 | .43 | .67 | .67 | .67 | .62 | ||
| .88 | .85 | .83 | .74 | .83 | .85 | .83 | .74 | .83 | .83 | .83 | .85 | ||
| .72 | .68 | .72 | .53 | .65 | .70 | .72 | .73 | .74 | .74 | .72 | .68 | ||
| - | - | - | - | - | - | - | - | - | - | - | - | ||
| .87 | .86 | .87 | .66 | .85 | .89 | .87 | .88 | .86 | .86 | .87 | .86 | ||
| .82 | .81 | .84 | .72 | .80 | .82 | .81 | .80 | .82 | .82 | .81 | .81 | ||
| .87 | .87 | .88 | .74 | .87 | .86 | .87 | .86 | .87 | .87 | .87 | .88 | ||
| .88 | .88 | .82 | .88 | .83 | .87 | .87 | .84 | .87 | .88 | .87 | .86 | ||
A-K: History, Location, Word, Bi-gram, Verb, Verb Class, POS, GR, Subj, Obj, Voice
The F-measure of 1.0 for FUT (Future work) in S2 is probably because the size of the training data is just right and the model does not overfit the data. We make this assumption because the model reaches 1.0 on the training data for almost all categories, but on the test data only for FUT.
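Each lettered column gives the F-measure with one feature removed, so its difference from ALL indicates how much the classifier relies on that feature: a positive drop means the feature helped, and a negative one means removing it helped. A short sketch applied to the first data row of the table, with the values copied from it (the helper name `ablation_drops` is illustrative):

```python
FEATURES = ["History", "Location", "Word", "Bi-gram", "Verb", "Verb Class",
            "POS", "GR", "Subj", "Obj", "Voice"]

# First data row of the table: F-measure with ALL features, then without A..K.
all_features = 0.90
without = [0.89, 0.87, 0.92, 0.90, 0.90, 0.91, 0.91, 0.91, 0.92, 0.91, 0.88]


def ablation_drops(all_score, scores_without, names):
    """Drop in F-measure when each feature is removed (positive = the feature helped)."""
    return {name: round(all_score - s, 2) for name, s in zip(names, scores_without)}


drops = ablation_drops(all_features, without, FEATURES)
most_useful = max(drops, key=drops.get)
print(most_useful, drops[most_useful])  # Location 0.03
```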
Baseline and best results for NB, SVM, CRF
| F-Measure | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| .29 | .23 | .23 | .39 | .18 | |||||||||
| .82 | .85 | .75 | .85 | .71 | |||||||||
| .89 | .90 | .81 | .90 | .90 | |||||||||
| .85 | .87 | .72 | .87 | .81 | |||||||||
| .25 | .13 | .08 | .22 | .40 | .13 | - | - | ||||||
| .76 | .79 | .25 | .70 | .83 | .66 | - | - | ||||||
| .90 | .94 | .88 | .85 | .91 | .88 | 1.0 | - | ||||||
| .85 | .92 | .69 | .77 | .88 | .75 | - | .33 | ||||||
| .15 | - | .10 | .06 | .04 | .06 | .11 | - | .13 | .24 | .15 | .17 | ||
| .53 | - | .56 | - | - | - | .30 | - | .32 | .61 | .59 | .62 | ||
| .81 | - | .82 | .62 | .62 | .85 | .70 | - | .89 | .82 | .86 | .87 | ||
| .71 | - | .74 | .49 | .72 | .67 | .59 | - | .58 | .71 | .56 | .82 | ||
Time measures for the user test
| Q1 | Q2 | Q3a | Q3b | Q3c | Q4 | Q5 | TOTAL | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15.3 | 9.4 | 8.7 | 4.4 | 8.9 | 9.4 | 13.3 | 69.5 | ||||||||||
| 27.1 | 18.8 | 15.5 | 8.5 | 14.8 | 13.5 | 18.0 | 116.1 | ||||||||||
| 19.3 | 12.0 | 17.9 | 4.9 | 9.6 | 18.4 | 20.9 | 102.9 | ||||||||||
| 15.3 | 0% | 7.4 | 22% | 6.2 | 28% | 4.2 | 5% | 5.8 | 35% | 7.4 | 21% | 11.9 | 11% | 58.2 | 16% | ||
| 17.0 | 37% | 7.9 | 58% | 8.5 | 45% | 4.8 | 43% | 6.8 | 54% | 9.9 | 27% | 13.1 | 27% | 67.9 | 42% | ||
| 15.8 | 18% | 6.4 | 47% | 8.8 | 51% | 3.8 | 21% | 5.8 | 40% | 12.5 | 32% | 12.5 | 40% | 65.6 | 36% | ||
| 13.1 | 15% | 7.9 | 17% | 5.6 | 35% | 3.9 | 11% | 5.7 | 36% | 6.4 | 32% | 11.9 | 11% | 54.5 | 22% | ||
| 15.9 | 41% | 8.9 | 53% | 7.2 | 53% | 4.7 | 45% | 7.8 | 47% | 6.1 | 55% | 12.0 | 33% | 62.6 | 46% | ||
| 15.4 | 20% | 5.9 | 51% | 8.5 | 53% | 3.8 | 22% | 6.9 | 29% | 11.8 | 36% | 11.4 | 45% | 63.7 | 38% | ||
| 15.0 | 3% | 9.4 | 1% | 6.3 | 28% | 4.2 | 5% | 7.1 | 20% | 7.4 | 22% | 12.5 | 6% | 61.8 | 11% | ||
| 18.4 | 32% | 12.4 | 34% | 9.4 | 39% | 8.1 | 5% | 8.2 | 44% | 6.7 | 50% | 14.2 | 21% | 77.5 | 33% | ||
| 18.5 | 4% | 12.3 | -3% | 13.9 | 22% | 6.3 | -29% | 8.6 | 11% | 12.8 | 31% | 12.9 | 38% | 85.3 | 17% | ||
| 13.0 | 16% | 8.3 | 12% | 6.6 | 24% | 4.9 | -11% | 6.5 | 27% | 6.8 | 28% | 11.5 | 14% | 57.6 | 17% | ||
| 23.9 | 12% | 14.5 | 23% | 11.4 | 26% | 7.8 | 8% | 10.1 | 32% | 7.2 | 47% | 15.3 | 15% | 90.2 | 22% | ||
| 17.1 | 11% | 12.0 | 0% | 15.1 | 16% | 4.8 | 1% | 8.3 | 14% | 11.9 | 35% | 15.8 | 24% | 84.9 | 17% | ||
The table shows the sample mean of the time each CRA expert (A, B and C) takes to answer each question in the questionnaire, and the percentage of time saved when using scheme annotations.
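The first three rows appear to hold the experts' times without annotations, and the later rows cycle through A, B and C, pairing each mean time with the saving relative to that expert's first-row time. Under that reading, the percentage is the relative reduction (baseline - annotated) / baseline. For expert A's TOTAL in the first annotated condition, (69.5 - 58.2) / 69.5 ≈ 16%. A minimal sketch:

```python
def percent_time_saved(baseline, annotated):
    """Relative time saving in percent; negative values mean the task took longer."""
    return 100 * (baseline - annotated) / baseline


# TOTAL column: first three rows (A, B, C) vs the next three rows.
baseline_totals = {"A": 69.5, "B": 116.1, "C": 102.9}
annotated_totals = {"A": 58.2, "B": 67.9, "C": 65.6}
for expert in baseline_totals:
    saved = percent_time_saved(baseline_totals[expert], annotated_totals[expert])
    print(expert, round(saved))
# prints: A 16 / B 42 / C 36
```

Recomputing from the rounded means can be off by about one point in some cells. For example, Q2 for expert A gives 21% where the table shows 22%, presumably because the percentages were computed from unrounded times.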
Significance of the results in the previous table according to the Mann-Whitney U Test (p-value)
| Q1 | Q2 | Q3a | Q3b | Q3c | Q4 | Q5 | TOTAL | ||
|---|---|---|---|---|---|---|---|---|---|
| .594 | .058 | .247 | .081 | .109 | |||||
| .352 | |||||||||
| .192 | .076 | .146 | |||||||
| .074 | .096 | ||||||||
| .058 | |||||||||
| .369 | .574 | ||||||||
| .663 | .190 | .743 | .676 | .175 | .486 | .152 | |||
| .508 | .592 | .174 | .729 | .170 | .623 | .340 | |||
| .488 | .800 | .855 | .338 | .357 | .405 | .673 | .420 | ||
| .443 | .286 | .590 | .546 | .294 | .599 | .351 | .316 | ||
| .201 | .673 | .188 | .106 | .600 | .058 | ||||
| .394 | .053 | .900 | .538 | ||||||
| .677 | .350 | .315 | .094 | .102 | .720 | .719 | .458 | ||
| .144 | .356 | .542 | .058 | ||||||
| .253 | .052 | .341 | .579 | ||||||
| .600 | .331 | .627 | .118 | .709 | |||||
| .382 | .894 | .066 | |||||||
| .576 | .820 | .127 | .076 | .668 | |||||
| .285 | .118 | .704 | |||||||
| .232 | .747 | .073 | .919 | .107 | .144 | ||||
| .362 | .619 | .468 | .252 | .810 | .134 | ||||
| .695 | .589 | .458 | .873 | .750 | .107 | .141 | |||
| .075 | .600 | .358 | .601 | .260 | .873 | .474 | |||
| .693 | .898 | .291 | .377 | .732 | .898 | .127 | .907 | ||
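The Mann-Whitney U test compares two samples (here, presumably answering times with and without annotations) by their ranks, so it does not assume normally distributed times. This is a minimal illustrative sketch, not necessarily the procedure or software used in the paper. It uses the normal approximation with a continuity correction, gives tied values average ranks and does not correct the variance for ties. For very small samples an exact test is more appropriate:

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(abs(u1 - mu) - 0.5, 0.0) / sigma  # continuity-corrected |z|
    p = math.erfc(z / math.sqrt(2))           # equals 2 * (1 - Phi(z))
    return u1, min(p, 1.0)


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
```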