Anna Divoli, Preslav Nakov, Marti A Hearst.
Abstract
Recent years have seen a gradual shift in the freely accessible content of biomedical publications, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has raised some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that of citances, that is, sentences containing citations to that article. We contrast the important points of an article as judged by its authors with those seen by its peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most of the information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.
Year: 2012 PMID: 23227044 PMCID: PMC3514807 DOI: 10.1155/2012/750214
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Summary of the data used for the manual analysis.
| PubMed ID of the target | Year of publication | Sentences analyzed: abstract | Sentences analyzed: citances | Annotations: abstract | Annotations: citances | Number of papers the |
|---|---|---|---|---|---|---|
| 8939603 | 1996 | 17 | 51 | 192 | 728 | 27 |
| 11346650 | 2001 | 11 | 45 | 141 | 761 | 24 |
| 11125146 | 2000 | 8 | 10 | 91 | 144 | 10 |
| 11251070 | 2001 | 12 | 10 | 142 | 128 | 10 |
| 11298456 | 2001 | 9 | 10 | 146 | 178 | 9 |
| 11850621 | 2002 | 8 | 10 | 132 | 157 | 8 |
| All | | 65 | 136 | 844 | 2096 | 88 |
Categories used in the manual annotation.
| Categories | Description | Examples | MeSH Tree IDs |
|---|---|---|---|
| E (entities) | Genes and proteins | MCM, protein, ORC, Skp2 | D06, D08, D12, and D23.529 |
| F (function) | Biological function or process | Regulation, pathway, and function | G, F01, F02 |
| D (dependency) | Relationship type | Involve, cause | N/A |
| X (characteristic) | Modifier | Unstable, common, and ionizing | N/A |
| L (location) | Cellular or molecular part | C-terminal, cytosol, and motif | A |
| S (species) | Any taxonomic description | Human, mammal, and | B |
| T (time) | Temporal information | During, after, and following | N/A |
| M (exp methods) | Methods and their components | Recombination, transfect | E |
| H (chemicals) | Not including genes/proteins | DNA, thymidine, and phosphoryl | D (except: D06, D08, D12, and D23.529) |
| R (disorders) | Names and associated terms | Cancer, tumor, and patient | C, F03 |
| Special types: | | | |
| IDs with subtypes | Subtype of a BASIC type | Cell cycle → G phase, CDK → CDK2 | |
| IDs with opposites | Opposite of a BASIC type | Retain/change, common/distinct | |
| Complex IDs | Combination of BASIC types | Radio-resistant DNA synthesis | |
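The MeSH column of the category table amounts to a prefix lookup with one twist: tree D maps to chemicals (H) except for the subtrees assigned to entities (E). A minimal sketch of that lookup, assuming MeSH tree numbers are dot-separated strings and that the most specific (longest) matching prefix wins; the rule list, `mesh_category`, and the example tree numbers are hypothetical illustrations, not the authors' code. Categories with no MeSH mapping (D, X, T) can never be produced by it.

```python
from typing import Optional

# Hypothetical encoding of the table's MeSH column: (tree-number prefix, category).
MESH_RULES: list[tuple[str, str]] = [
    ("D06", "E"), ("D08", "E"), ("D12", "E"), ("D23.529", "E"),  # entities
    ("G", "F"), ("F01", "F"), ("F02", "F"),                      # function
    ("A", "L"),                                                  # location
    ("B", "S"),                                                  # species
    ("E", "M"),                                                  # experimental methods
    ("D", "H"),                                                  # chemicals (rest of tree D)
    ("C", "R"), ("F03", "R"),                                    # disorders
]


def _matches(tree_number: str, prefix: str) -> bool:
    """True if prefix covers tree_number on a segment boundary."""
    if len(prefix) == 1:
        # A single letter stands for a whole top-level MeSH category.
        return tree_number.startswith(prefix)
    return tree_number == prefix or tree_number.startswith(prefix + ".")


def mesh_category(tree_number: str) -> Optional[str]:
    """Category of the longest matching prefix, or None if no rule applies."""
    best: Optional[tuple[str, str]] = None
    for prefix, category in MESH_RULES:
        if _matches(tree_number, prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, category)
    return best[1] if best else None


print(mesh_category("D23.529.100"), mesh_category("D23.050"))  # E H
```

Longest-prefix matching lets the broad rule for tree D coexist with the more specific entity subtrees without enumerating every chemical subtree.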
Figure 1. Example of an annotated citance. The citance is for PMID 11346650 and demonstrates different annotation categories (e.g., E, D, F, H…), subtypes (e.g., E1.64, L4.s, E2.2…), opposite concepts (e.g., F6.o), and complex IDs (e.g., L4.s.F6).
Figure 2. Semantic annotation groups. This figure depicts all annotation types associated with abstract sentences and with citances. The overlap between the automatic and manual annotation categories and, where possible, their mapping are also shown. See also Table 2 for details on the mapping of MeSH IDs to categories from the manual annotation.
Figure 3. Distribution (in %) of the manually annotated categories for abstracts and citances. Shown are results for all abstracts and for the one with PubMed ID 11346650.
Figure 4. Number of unique concepts found in abstracts, all citances, and citances with 0 adjoining citations. Also shown is the overlap between all citances and abstracts.
Figure 5. Unique annotations found in abstracts, citances, and their overlap for the annotation categories defined only in the automatic analysis.
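The unique-concept and overlap counts in Figures 4 and 5 reduce to set operations over the concepts found in each source. A minimal sketch, assuming each source has already been reduced to a set of normalized concept strings; `overlap_summary` and the toy concepts are hypothetical, and the sketch does not reproduce the paper's exact counting procedure.

```python
def overlap_summary(abstract: set[str], citances: set[str]) -> dict[str, float]:
    """Unique-concept counts for an abstract, its citances, and their overlap."""
    shared = abstract & citances
    return {
        "abstract": len(abstract),
        "citances": len(citances),
        "overlap": len(shared),
        # Fraction of the abstract's concepts that some citance also mentions.
        "coverage": len(shared) / len(abstract) if abstract else 0.0,
        # Concepts mentioned only by citances, never in the abstract.
        "citance_only": len(citances - abstract),
    }


abstract = {"MCM", "ORC", "cell cycle", "replication"}
citances = {"MCM", "ORC", "replication", "Skp2", "degradation"}
print(overlap_summary(abstract, citances))
```

Here the citances cover three of the four abstract concepts and add two concepts of their own.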
Comparison of the number of distinct annotation types in abstracts and in citances with zero adjoining citations. For this analysis, we used all sentences from the 6 abstracts and all 23 citances that cite only one paper.
| PubMed ID | Abstract | Abstract and citances_0 | Difference | n/a | In full text | In MeSH or substances | Not found |
|---|---|---|---|---|---|---|---|
| 8939603 | 52 | 65 | 13 | 1 | 10 | | 2 |
| 11346650 | 52 | 75 | 23 | 3 | 14 | | 6 |
| 11251070 | 57 | 73 | 16 | 2 | 3 | 2 | 9 |
| 11298456 | 60 | 71 | 11 | | 6 | | 5 |
| 11850621 | 61 | 71 | 10 | | 9 | | 1 |
| Total | 282 | 355 | 73 | 6 | 42 | 2 | 25 |
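The Difference column is |abstract ∪ citances_0| − |abstract|, which equals the number of types that appear in the citances but not in the abstract. A short worked check using the per-article numbers from the table, assuming the breakdown cells of each row partition that row's Difference; `new_types` and `ROWS` are hypothetical names introduced for illustration.

```python
# Rows of the table: (PMID, types in abstract, types in abstract plus
# citances_0, Difference, non-empty breakdown cells in table order).
ROWS = [
    ("8939603", 52, 65, 13, [1, 10, 2]),
    ("11346650", 52, 75, 23, [3, 14, 6]),
    ("11251070", 57, 73, 16, [2, 3, 2, 9]),
    ("11298456", 60, 71, 11, [6, 5]),
    ("11850621", 61, 71, 10, [9, 1]),
]


def new_types(abstract_types: set[str], citance_types: set[str]) -> set[str]:
    """Types the citances add; len(A | C) - len(A) == len(C - A)."""
    return citance_types - abstract_types


for pmid, n_abstract, n_combined, difference, breakdown in ROWS:
    # Difference is the growth from the abstract alone to abstract + citances_0.
    assert n_combined - n_abstract == difference, pmid
    # The breakdown cells in each row sum to that row's Difference.
    assert sum(breakdown) == difference, pmid

total_abstract = sum(row[1] for row in ROWS)
total_combined = sum(row[2] for row in ROWS)
print(total_abstract, total_combined, total_combined - total_abstract)  # 282 355 73
```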
Figure 6. Categories of distinct manual annotation types not found in abstracts.
Number of citances in each target article grouped by their number of adjoining citations, and the number of distinct annotation types found in the abstract together with each group of citances. These statistics are for the manual analysis; for the automatic analysis, see Figure 4 and the supplementary material.
| PMID | Citances: all | Citances: Cit_0 | Citances: Cit_1 | Citances: Cit_2 | Citances: Cit_3 | Citances: Cit_4+ | Distinct types (abstract and citances): all cit. | Distinct types: Cit_0 | Distinct types: Cit_1 | Distinct types: Cit_2 | Distinct types: Cit_3 | Distinct types: Cit_4+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8939603 | 51 | 3 | 8 | 12 | 10 | 18 | 121 | 65 | 68 | 63 | 87 | 85 |
| 11346650 | 45 | 7 | 3 | 4 | 7 | 24 | 170 | 75 | 66 | 66 | 73 | 144 |
| 11125146 | 10 | 0 | 6 | 3 | 1 | 0 | 80 | | 67 | 65 | 43 | |
| 11251070 | 10 | 7 | 0 | 0 | 0 | 3 | 88 | 73 | | | | 73 |
| 11298456 | 10 | 3 | 3 | 2 | 0 | 2 | 96 | 71 | 72 | 66 | | 70 |
| 11850621 | 10 | 3 | 4 | 1 | 0 | 2 | 98 | 71 | 76 | 67 | | 71 |
| Total | 136 | 23 | 24 | 22 | 18 | 49 | 653 | 355 | 349 | 327 | 203 | 443 |
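The grouping in this table can be sketched as bucketing citances by how many other papers they cite alongside the target, then counting distinct types over the abstract plus each bucket. A minimal sketch, assuming "adjoining citations" means the other citations in the same citance and that each citance has already been annotated with a set of type IDs; `bucket_label`, `types_per_bucket`, and the toy IDs are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def bucket_label(n_adjoining: int) -> str:
    """Column label for a citance with n_adjoining other citations."""
    return f"Cit_{n_adjoining}" if n_adjoining < 4 else "Cit_4+"


def types_per_bucket(
    abstract_types: set[str],
    citances: Iterable[tuple[int, set[str]]],
) -> dict[str, tuple[int, int]]:
    """Return {label: (number of citances, distinct types incl. abstract)}."""
    grouped: dict[str, set[str]] = defaultdict(set)
    counts: dict[str, int] = defaultdict(int)
    for n_adjoining, types in citances:
        label = bucket_label(n_adjoining)
        grouped[label] |= types
        counts[label] += 1
    # As in the table, each bucket's type count is taken over abstract + citances.
    return {
        label: (counts[label], len(abstract_types | grouped[label]))
        for label in grouped
    }


abstract = {"E1", "F1"}
citances = [
    (0, {"E1", "E2"}),   # cites only the target paper
    (0, {"F2"}),
    (2, {"E1"}),         # two other citations in the same sentence
    (5, {"H1", "R1"}),   # five others, so it falls into Cit_4+
]
print(types_per_bucket(abstract, citances))
```

Empty buckets simply do not appear in the result, mirroring the blank cells in the table.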
Figure 7. The effect of time. We show the unique semantic categories mentioned in citances published in the same year as the target paper and how they overlap with the semantic categories matched in the target abstracts. Semantic annotations and their overlap with the abstract are also shown for the following 1, 2, 3, and 4+ years. Note that only new unique semantic annotations are counted; for example, the annotations of “citances of year 2” do not include any annotations that already appeared in years 0 or 1.
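The counting rule in the Figure 7 caption, where a type is credited only to the first year bin in which it appears, is a running set difference. A minimal sketch, assuming citances have already been grouped into year-offset bins with 4 standing for "4+ years" and that the abstract overlap is measured on the newly credited types only; `new_types_per_year` and the toy data are hypothetical.

```python
def new_types_per_year(
    abstract_types: set[str],
    types_by_year: dict[int, set[str]],
) -> dict[int, tuple[int, int]]:
    """Credit each type only to the earliest year bin that mentions it.

    types_by_year maps a year offset (0 = same year as the target paper,
    4 = four or more years later) to the union of annotation types in
    that bin's citances. Returns {offset: (new types, new types also in
    the abstract)}.
    """
    seen: set[str] = set()
    result: dict[int, tuple[int, int]] = {}
    for offset in sorted(types_by_year):
        new = types_by_year[offset] - seen
        result[offset] = (len(new), len(new & abstract_types))
        seen |= new
    return result


abstract = {"a", "b", "c"}
by_year = {0: {"a", "x"}, 1: {"a", "b", "y"}, 2: {"x", "y"}, 4: {"c", "z"}}
print(new_types_per_year(abstract, by_year))
# {0: (2, 1), 1: (2, 1), 2: (0, 0), 4: (2, 1)}
```

Year 2 contributes nothing here because both of its types were already credited to earlier bins.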