Dietrich Rebholz-Schuhmann, Graham Cameron, Dominic Clark, Erik van Mulligen, Jean-Louis Coatrieux, Eva Del Hoyo Barbolla, Fernando Martin-Sanchez, Luciano Milanesi, Ivan Porro, Francesco Beltrame, Ioannis Tollis, Johan Van der Lei.
Abstract
BACKGROUND: The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project, experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics.
Year: 2007 PMID: 17430562 PMCID: PMC1885847 DOI: 10.1186/1471-2105-8-S1-S18
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Number of bigrams in the BI and MI corpora.
| Corpus | Documents (2000 – 2005) | Bigrams, all (2000 – 2005) | Bigrams, Df > 20 (2000 – 2005) | Emerging bigrams, Df > 20 (2000 – 2005) | Documents (1990 – 1999) | Bigrams, all (1990 – 1999) | Bigrams, Df > 20 (1990 – 1999) |
| BI journal corpus | 5,968 | 12,992 | 701 | 172 | 2,728 | 10,777 | 119 |
| BI query corpus | 90,082 | 50,248 | 15,406 | 4,666 | 52,574 | 33,438 | 8,862 |
| MI journal corpus | 3,330 | 8,604 | 257 | 15 | 2,979 | 8,569 | 186 |
| MI query corpus | 21,609 | 34,432 | 2,284 | 60 | 27,510 | 44,043 | 2,463 |
Four different sets of Medline abstracts were analyzed (ref. to text). All documents were categorized as recent documents (2000 – 2005) or past documents (1990 – 1999). Bigrams were extracted from the noun phrases of all documents (for details see text). The analysis was restricted to bigrams with a document frequency of at least 20. In the set of recent documents we identified those bigrams that were not mentioned before 2000 ("emerging"). The BI journal corpus and the MI journal corpus are similar in size, in terms of both the number of documents and the number of bigrams they contain.
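The selection of emerging bigrams described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that each document has already been reduced to a list of bigrams (the noun-phrase extraction is not shown), the function names `document_frequency` and `emerging_bigrams` are hypothetical, and the threshold is read as "at least 20", as the caption states.

```python
from collections import Counter


def document_frequency(docs):
    """Return a Counter mapping each bigram to the number of documents containing it."""
    df = Counter()
    for bigrams in docs:
        df.update(set(bigrams))  # a bigram counts once per document
    return df


def emerging_bigrams(recent_docs, past_docs, min_df=20):
    """Bigrams with Df >= min_df in recent documents that never occur in past documents."""
    recent_df = document_frequency(recent_docs)
    seen_before = set(document_frequency(past_docs))
    return {
        bigram: df
        for bigram, df in recent_df.items()
        if df >= min_df and bigram not in seen_before
    }


# Toy data, invented for illustration; each document is a list of bigrams.
past = [["gene expression", "protein sequence"]]
recent = [
    ["gene expression", "microarray data"],
    ["microarray data", "gene ontology"],
    ["gene expression", "microarray data", "microarray data"],
]
print(emerging_bigrams(recent, past, min_df=2))
```

With the toy data and a threshold of 2, only "microarray data" qualifies: "gene expression" already occurs in the past documents, and "gene ontology" appears in a single recent document.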
Overlap between the query and the journal corpora.
| | BiQueryCorpus (142,656 docs) | MiQueryCorpus (49,119 docs) |
| BiJournalCorpus (8,696 docs) | 3,837 | 731 |
| MiJournalCorpus (6,309 docs) | 215 | 3,925 |
The table displays the number of Medline abstracts shared between each journal corpus and each query corpus (ref. to text). As expected, there is a strong overlap between the BI journal corpus and the BI query corpus, and between the MI journal corpus and the MI query corpus. The intersection between the BI journal corpus and the MI query corpus is small, as is the intersection between the MI journal corpus and the BI query corpus. This shows that selecting the corpora by journal title already yields documents that represent the BI domain as distinct from the MI domain. In the case of the BI journal corpus, less than half of the documents (3,837 of 8,696) are contained in the BI query corpus. This finding indicates that the query terms for the BI query corpus might still be too restrictive to cover the whole of BI domain knowledge.
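The overlap figures can be turned into coverage shares with simple arithmetic. The sketch below only restates the numbers from the table; the helper name `coverage` is hypothetical, and "coverage" here means the fraction of a journal corpus that also appears in a query corpus.

```python
# Journal corpus sizes and pairwise overlaps, copied from the table above.
journal_sizes = {"BiJournalCorpus": 8696, "MiJournalCorpus": 6309}
overlaps = {
    ("BiJournalCorpus", "BiQueryCorpus"): 3837,
    ("BiJournalCorpus", "MiQueryCorpus"): 731,
    ("MiJournalCorpus", "BiQueryCorpus"): 215,
    ("MiJournalCorpus", "MiQueryCorpus"): 3925,
}


def coverage(journal, query):
    """Fraction of a journal corpus that is also contained in a query corpus."""
    return overlaps[(journal, query)] / journal_sizes[journal]


for journal, query in overlaps:
    print(f"{journal} in {query}: {coverage(journal, query):.1%}")
```

The BI journal corpus comes out at roughly 44% coverage by the BI query corpus, which matches the "less than half" observation, while the MI journal corpus is covered to about 62% by the MI query corpus.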
Figure 1. Distribution of the publications from the BI journal corpus and from the MI journal corpus over time. In the BI domain, the number of publications in the selected journals increased continuously. In the MI field the number of publications fluctuated over time, which might be the result of conferences that did not take place every year. The relatively low publication figures in 2005 are partly due to the fact that not all publications of 2005 have yet been incorporated into the Medline distribution. Nevertheless, the number of BI publications in 2005 already showed an increase, which could be the result of open access publishing.
Emerging bigrams in the BI journal corpus.
| Bigram | Df | Rank (emerging) | Rank (2000–2005) | Rank (1990–1999) | Bigram | Df | Rank (emerging) | Rank (2000–2005) | Rank (1990–1999) |
| gene expression | 711 | | 1 | 1 | microarray experiment | 184 | 2 | 22 | |
| amino acid | 490 | | 2 | | not only | 181 | | 23 | 15 |
| protein sequence | 438 | | 3 | 2 | microarray data | 169 | 3 | 25 | |
| expression datum | 339 | | 4 | | expression profile | 168 | 4 | 26 | |
| sequence alignment | 321 | | 5 | | gene ontology | 135 | 5 | 37 | |
| supplementary information | 321 | | 6 | | support vector | 133 | 6 | 38 | |
| sequence alignment | 321 | | | 3 | vector machine | 130 | 7 | 41 | |
| dna sequence | 313 | | 7 | 4 | protein interaction | 99 | 8 | 62 | |
| protein structure | 313 | | 8 | 5 | whole genome | 80 | 9 | 74 | |
| freely available | 306 | | 9 | | nucleotide polymorphism | 76 | 10 | 80 | |
| binding site | 295 | | 10 | 6 | cdna microarray | 73 | 11 | 83 | |
| large number | 288 | | 11 | 7 | microarray technology | 73 | 12 | 84 | |
| microarray datum | 268 | 1 | 12 | | microarray gene | 66 | 13 | 85 | |
| neural network | 250 | | 13 | 8 | data mining | 60 | 14 | 87 | |
| secondary structure | 246 | | 14 | 9 | interaction network | 60 | 15 | 88 | |
| new method | 244 | | 15 | 10 | | | | | |
| data set | 236 | | 16 | 11 | | | | | |
| datum set | 224 | | 17 | 12 | | | | | |
| source code | 208 | | 18 | 13 | | | | | |
| markov model | 187 | | 21 | 14 | | | | | |
The table shows bigrams extracted from the BI journal corpus (col. 1) together with their document frequency (col. 2) and their ranks. The first rank refers to emerging bigrams (ref. to text, col. 3), the second rank is for bigrams with their highest document frequency during 2000–2005 (col. 4) and the last rank uses the highest document frequency during 1990–1999 (col. 5). The table shows that new topics emerged at a high frequency over the last five years.
Emerging bigrams in the MI journal corpus.
| Bigram | Df | Rank (emerging) | Rank (2000–2005) | Rank (1990–1999) | Bigram | Df | Rank (emerging) | Rank (2000–2005) | Rank (1990–1999) |
| information system | 899 | | 1 | 1 | patient safety | 64 | 1 | 75 | |
| health care | 881 | | 2 | 2 | gene expression | 44 | 2 | 87 | |
| decision support | 536 | | 3 | 3 | medical error | 41 | 3 | 92 | |
| medical record | 445 | | 4 | 4 | digital assistant | 35 | 4 | 94 | |
| patient record | 427 | | 5 | 5 | personal digital | 35 | 5 | 95 | |
| medical informatics | 397 | | 6 | 6 | disease management | 31 | 6 | | |
| clinical information | 330 | | 7 | 7 | open source | 28 | 7 | | |
| health information | 294 | | 8 | | provider order | 25 | 8 | | |
| patient care | 285 | | 9 | 8 | clinical documentation | 23 | 9 | | |
| support system | 284 | | 10 | 9 | clinical document | 23 | 10 | | |
| electronic medical | 261 | | 11 | | support vector | 23 | 11 | | |
| information technology | 245 | | 12 | | vector machine | 22 | 12 | | |
| clinical practice | 210 | | 13 | 10 | expression datum | 21 | 13 | | |
| medical information | 203 | | 14 | | study objective | 21 | 14 | | |
| neural network | 203 | | | 11 | snomed ct | 20 | 15 | 100 | |
| knowledge base | 198 | | 15 | | | | | | |
| natural language | 196 | | | 12 | | | | | |
| clinical datum | 194 | | | 13 | | | | | |
| hospital information | 191 | | 16 | 14 | | | | | |
| electronic patient | 180 | | | 15 | | | | | |
The table shows bigrams from the MI journal corpus (ref. to table 1 for details). The table shows that emerging topics played only a minor role in recent documents.
New bigrams in the BI journal corpus in recent years.
| Bigram (2004–2005) | Df | Bigram (2003–2004) | Df | Bigram (2000–2001) | Df |
| protein background | 16 | false discovery | 41 | microarray datum | 268 |
| method conclusion | 12 | discovery rate | 40 | microarray experiment | 183 |
| annotation method | 11 | datum background | 40 | expression profile | 168 |
| dataset result | 11 | microarray study | 36 | microarray data | 161 |
| array cgh | 10 | text mining | 35 | gene ontology | 135 |
| protein localization | 10 | association study | 28 | support vector | 133 |
| organism database | 10 | r package | 26 | vector machine | 130 |
| ontology database | 10 | normalization method | 25 | protein interaction | 99 |
| biocreative task | 9 | multiple testing | 23 | nucleotide polymorphism | 76 |
| entity recognition | 9 | ontology term | 22 | cdna microarray | 73 |
| splicing event | 8 | go term | 21 | microarray technology | 73 |
| name recognition | 8 | gene list | 20 | microarray gene | 65 |
| lowess normalization | 8 | human protein | 20 | differential expression | 59 |
| anatomy ontology | 7 | biomedical text | 19 | open source | 54 |
| novo sequencing | 7 | complex disease | 19 | biological network | 50 |
| task 2 | 6 | microarray result | 18 | microarray analysis | 48 |
| task 1a | 6 | homo sapiens | 18 | widely used | 48 |
| venn diagram | 4 | named entity | 17 | gene selection | 46 |
| database identifier | 4 | synonymous codon | 16 | interaction datum | 37 |
| | | gene clustering | 16 | system biology | 34 |
| | | mammalian genome | 16 | interacting protein | 33 |
| | | bioinformatics analysis | 15 | alternative splicing | 32 |
| | | haplotype block | 14 | oligonucleotide microarray | 29 |
| | | go annotation | 13 | related gene | 27 |
| | | two dataset | 13 | web application | 27 |
| | | expression result | 13 | biological sample | 26 |
| | | marker gene | 12 | expression value | 23 |
| | | dimensionality reduction | 12 | primer design | 22 |
The table shows bigrams from the BI journal corpus that were new during the period 2004–2005 (col. 1 and 2), the period 2003–2004 (col. 3 and 4) and the period 2000–2001 (col. 5 and 6). All bigrams were selected and ranked according to their document frequency (ref. to text), which had to be above 3. During 2000–2001 a large number of bigrams referring to microarray experiments emerged. "task 1a" and "task 2" are exclusively linked to BioCreAtive. "false discovery" refers to the false discovery rate (FDR) in DNA microarray analysis.
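The per-period selection of new bigrams can be illustrated as follows. This is a hedged sketch rather than the procedure used in the study: it assumes documents are given as `(year, bigrams)` pairs, that a bigram is "new" in a period if its first mention falls inside that period, and that the document frequency is counted within the same period. The source does not spell out these details, and the function name `new_bigrams_by_period` is hypothetical.

```python
from collections import Counter


def new_bigrams_by_period(docs, periods, min_df=4):
    """Rank bigrams first mentioned in each period by their Df within that period.

    docs    -- iterable of (year, list_of_bigrams) pairs
    periods -- list of (start_year, end_year) tuples, both inclusive
    min_df  -- minimum document frequency ("above 3" corresponds to 4)
    """
    docs = list(docs)

    # Year of first mention for every bigram.
    first_year = {}
    for year, bigrams in docs:
        for bigram in set(bigrams):
            first_year[bigram] = min(year, first_year.get(bigram, year))

    result = {}
    for start, end in periods:
        df = Counter()
        for year, bigrams in docs:
            if start <= year <= end:
                # Count only bigrams whose first mention lies in this period.
                df.update(b for b in set(bigrams) if start <= first_year[b] <= end)
        result[(start, end)] = [(b, n) for b, n in df.most_common() if n >= min_df]
    return result


# Toy data, invented for illustration.
toy_docs = [
    (2000, ["microarray data"]),
    (2001, ["microarray data", "gene ontology"]),
    (2001, ["microarray data"]),
    (2004, ["text mining", "microarray data"]),
    (2005, ["text mining"]),
]
print(new_bigrams_by_period(toy_docs, [(2000, 2001), (2004, 2005)], min_df=2))
```

In the toy data, "microarray data" is new in 2000–2001 and is therefore not counted again in 2004–2005, where only "text mining" qualifies.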
New bigrams in the MI journal corpus in recent years.
| Bigram | Df | Bigram | Df | Bigram | Df |
| grid technology | 10 | microarray experiment | 14 | medical error | 41 |
| computation time | 7 | microarray datum | 14 | open source | 28 |
| grid service | 6 | dna microarray | 13 | support vector | 23 |
| grid objective | 6 | microarray data | 12 | vector machine | 22 |
| grid infrastructure | 5 | computerized provider | 10 | expression datum | 21 |
| respiratory syndrome | 5 | syndromic surveillance | 10 | snomed ct | 20 |
| gene selection | 5 | year 2013 | 10 | system conclusion | 19 |
| microarray gene | 4 | semantic web | 10 | study background | 14 |
| secondary structure | 4 | setting method | 10 | patient method | 12 |
| hierarchical clustering | 4 | expression level | 9 | medication order | 12 |
| | | result conclusion | 9 | cpoe system | 11 |
| | | cpoe implementation | 8 | search tool | 11 |
| | | gene ontology | 8 | method conclusion | 11 |
| | | system functionality | 8 | patient empowerment | 10 |
| | | mobile phone | 7 | search filter | 10 |
| | | exclamation mark | 7 | partner healthcare | 10 |
| | | inverted exclamation | 7 | detection system | 10 |
| | | ubiquitous computing | 6 | intermountain health | 10 |
| | | online evidence | 6 | guideline element | 10 |
| | | health literacy | 6 | overall goal | 10 |
| | | expression profile | 6 | xml schema | 9 |
| | | electronic prescribing | 6 | original study | 9 |
| | | wireless handheld | 5 | snomed clinical | 9 |
| | | pda use | 5 | exploratory study | 9 |
| | | digital pen | 5 | informatics method | 9 |
| | | computational modeling | 5 | hl7 rim | 9 |
| | | collaborative clinical | 5 | mesh thesaurus | 9 |
| | | evidence system | 4 | search method | 9 |
| | | | | online health | 8 |
| | | | | functional magnetic | 8 |
The table shows bigrams that were extracted from the MI journal corpus. Again, all bigrams were first mentioned in the years 2000 to 2005 (ref. to Table 5). New technologies are: grid technology, tissue engineering, clinical bioinformatics, tissue microarray (TMA) and the TMA data exchange specification (TMA DES by the Association for Pathology Informatics, PMIDs 15871741 and 16086837), gene ontology and semantic Web. "Year 2013" refers to a set of publications related to the subject "Quo vadis Health care" (PMIDs 1245355{2,4,5,6,7,8,9}, 1245356{1,4,5}).