
Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016.

Artur Cieslewicz¹, Jakub Dutkiewicz², Czeslaw Jedrzejek²

Abstract

Database URL: https://biocaddie.org/benchmark-data.


Year:  2018        PMID: 29688372      PMCID: PMC5846287          DOI: 10.1093/database/bax103

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

Biomedical research produces an ever-increasing amount of digital data, which is stored in a variety of formats and hosted at a multitude of different sites. These data may be generated by the original researchers, attached to journal articles as supplementary material, or organized as datasets and kept in databases or repositories. The most common information source is the literature, in the form of indexed journals that, in electronic form, reside on the PubMed platform or on publisher portals. The article format has one advantage: ease of reading. However, articles contain mostly unstructured information that is hard to use for specialized processing, comparison, aggregation and integration. Therefore, this information needs to be transformed into a more structured form that can be stored in databases, collections and repositories. This process requires the development of useful data structures as well as indexing and extraction tools. Data is a set of values of qualitative or quantitative variables; pieces of data are individual pieces of information. A dataset, or collection of data, often corresponds to the contents of a single database table or a single statistical data matrix, where every column of the table represents a particular variable. Generic ontologies and metadata models designed to describe datasets supplement domain-specific ontologies that describe the research field. The enormous amount of biomedical literature, the existence of data of different granularity, data heterogeneity and the lack of common metadata make it difficult to selectively access increasingly complex relevant information. As pointed out in (20), ‘A typical dataset available in, for instance, the gene expression repositories may contain a description, a list of keywords and a list of organisms. A typical dataset available in the protein structure repositories contains, in addition, a list of genes and a list of research articles’.
For instance, a global pharmaceutical company may need close to 30 different databases to complete a clinical study. Such data sources require recording of dataset provenance and data curation. Moreover, the data resulting from biomedical experiments often possess an implicit hierarchy (1). In terms of the granularity needed for specific databases, a PubMed article needs to be decomposed into snippets that describe structured data markup. Snippets may be organized using a comprehensive data type ontology that provides definitions of data types (Protein, Phenotype, Gene Expression, Nucleotide Sequence, Clinical Trials, Imaging Data, Morphology, Proteomics Data, Physiological Signals, Epigenetic Data, Data from Papers, Omics Data, Survey Data, Cell Signalling and Unspecified). Snippets in different databases may often be found at different levels of a database schema. Because different types of metadata are important for different specialized databases, their schemas were historically developed independently and do not conform to any standardized pattern. Since datasets are a combination of structured and unstructured data, often presented in incompatible ways (e.g. the same information under different tags), using them in complex processing can be quite difficult. Furthermore, a significant percentage of the specific data reported in clinical reports does not make its way into journals (2). Nevertheless, data need to be compared and verified. Often, cost and utility considerations make it necessary to try a multi-sponsored clinical development approach, termed Portfolio of Innovative Platform Engines, Longitudinal Investigations and Novel Effectiveness, to generate new hypotheses.
In such an environment (3), the need for shared, collaborative data governance forces the use of integrated data; therefore, improving the effectiveness of retrieval is paramount to finding state-of-the-art methods of diagnosis, testing and treatment for individual patients. Existing platforms such as Google and PubMed serve their purpose, providing up-to-date sources of information with various additional functionalities, but it is difficult to assess their effectiveness. Thus, the crucial aspect for addressing this complexity is the availability of annotated, distributed datasets created by the scientific community, with which researchers can test the effectiveness of various approaches. That, in turn, leads to better data structures and indexes of various granularities. This can be achieved only within a shared task environment, which enables researchers from many different institutions to work together at solving important scientific problems. In the biomedical area, the Text REtrieval Conference (TREC) and bioASQ have contributed the most towards achieving this goal. Collaboration occurs at multiple levels: definition of test collections, task definition, evaluation and analysis of results. For the last several years, TREC, organized by the National Institute of Standards and Technology (NIST), has concentrated on finding the most relevant PubMed articles and clinical trial data in response to selected medical records within its clinical decision support (CDS) track, which has evolved into the Precision Medicine track (4). In this context, the bioASQ (5) challenge concentrates mainly on two broad tasks: the bioASQ Task on Online Biomedical Semantic Indexing, i.e. classification of new PubMed documents into MeSH hierarchy concepts; and the bioASQ Task on Biomedical Semantic question answering (QA), related to information retrieval and question answering, one of the most complex semantic tasks in natural language processing (NLP).
Previous TREC CDS and earlier medical tracks, as well as bioASQ challenges, had many specific task orientations, data sources and retrieval conditions. For example, TREC sources were either full publications or abstracts. Topics could be electronic health record (EHR) admission notes curated by physicians, of the Diagnosis, Test or Treatment type. These notes could be much longer than the concise bioCADDIE questions. Currently, run submissions for both TREC and bioCADDIE use the standard trec_eval format. The bioASQ contest shares with bioCADDIE a deep semantic approach to answering questions when word embeddings (WEs) are used for query expansion or within a document vector framework. Building on these tasks, the Biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium, funded by the US National Institutes of Health Big Data to Knowledge program, aims to empower researchers to find data in the most efficient way and to expand the sources and types of data. These would include opinions on research expressed on non-scientific portals (i.e. conversations about scholarly content), together with monitoring of the attention surrounding a particular work (altmetrics). BioCADDIE (6) has developed DataMed, a prototype search engine for the Data Discovery Index (DDI), using the data tag suite (DATS) model to support the DataMed discovery index (7). This enables searching data of various types and formats (while maintaining a core set of elements) curated by separate institutions. DataMed, based on ISA-formatted metadata, aims to facilitate the discovery of digital objects. At the time of writing, DataMed had indexed close to 1 400 000 datasets drawn from 66 repositories (8). The bioCADDIE challenge concerned finding the most relevant docnos (elements of datasets) in response to 15 questions provided by bioCADDIE experts.
The structure of the questions followed the DataMed prototype idea of RDF-type relations between entities (‘data type’ = w, ‘biological process’ = x, ‘species/organism’ = y and ‘phenotype’ = z) (9). The graph structure of a query suggests that, if documents were also transformed into a graph structure, matching could be performed at the level of relations rather than keywords. The aim of the 2016 bioCADDIE Challenge (9) was to retrieve, from a collection, the datasets relevant to the needs of biomedical researchers; the purpose was to facilitate the reuse of collected data and to enable the replication of published results. Such work is the focus of WG4 of the bioCADDIE consortium: Use Cases and Testing Benchmarks. The goal is to develop usability specifications/requirements and appropriate benchmarks with associated testing content for DataMed. To address this goal, the following sections discuss these aspects:
- The Related work section discusses the content of already published bioCADDIE articles.
- The Methodology section presents the methods, algorithms and solutions prepared by our team, divided into the following subsections:
  - The overview, describing the model of our information retrieval system;
  - The collection, with information on the bioCADDIE datasets;
  - Analysis of document structure and content, presenting the differences among various repositories;
  - Selection of documents with valuable data for indexing, describing our algorithm for evaluating whether a document is worth indexing;
  - Index of data, including information on corpus preparation for indexing;
  - Query preprocessing;
  - Query expansion, describing the methods chosen to expand the query;
  - Information retrieval and evaluation, with information on the retrieval platform.
- The Results and discussion section is divided into the following subsections: Selection of the optimal baseline system; Query expansion; Further analysis.
- The Conclusions and future work section summarizes the main outcome of the article.

Related work

At present, detailed descriptions of bioCADDIE Challenge systems are available for selected contributions. Apart from standard preprocessing similar to that presented in this work, processing can be divided into advanced preprocessing, retrieval and re-ranking. The University of California San Diego (UCSD) team, which obtained the top infNDCG result (9), implemented a two-step ‘retrieval plus re-ranking’ strategy (10). Based on this idea, they took the top 10 documents returned by Google and transformed them into queries for relevant datasets. This strategy was used by East China University in their winning contribution to TREC CDS 2015 (11). Their baseline was Elasticsearch (a Lucene-based search engine that is part of the DataMed technology). The top 5000 datasets retrieved by Elasticsearch were re-ranked based on the concatenated documents using the pseudo sequential dependence (PSD) model (12). The best run used the PSD-allwords model. UCSD used the concept matching formula with Dirichlet smoothing, with weights based on the annotated dataset repository. In contrast to the original algorithm in (12), the actual term frequency was increased by a constant equal to 5. UCSD found (as did we) that neither ordered nor unordered bigrams improved performance. We note that the UCSD results presented in (10) do not exactly match the official results (9). Elsevier (13) used two approaches, word embeddings and ontology-based indexing (queries and data sources were tagged with named entities from MeSH and Entrez Gene), with the Apache Solr indexing and search platform. For WEs, fastText (14) gave better results than word2vec (15) and GloVe (16), both of which we used. FastText, based on a skip-gram model, uses character n-grams and smaller windows, which translate into better WEs for query expansion. Elsevier also applied additional, more advanced query modifications. Abbreviated species names were expanded to full names (e.g. M to Mus).
Greek characters were replaced with their English spelling. For example, it was noted in (13) that for ‘glycolysis’ (a word that does not appear in the bioCADDIE questions), the word2vec model returned ‘tca_cycle’, ‘mitochondria_remodelling’ and ‘reroute’, whereas fastText delivered more reasonable similar words/phrases: the top three similar phrases returned by fastText for ‘glycolysis’ were ‘gluconeogenesis’, ‘glycolytic’ and ‘glycolytic_pathway’. However, it is well known that WE methods are extremely sensitive to the training corpus (we used PubMed abstracts). With word2vec, we obtained the following most similar words to ‘glycolysis’ (with their cosine similarity): [(gluconeogenesis, 0.804), (glycogenolysis, 0.797), (glyconeogenesis, 0.771), (gluconeogenic, 0.751), (glycogen, 0.7405), (lipogenesis, 0.738), (ureogenesis, 0.738), (glycogenic, 0.738), (ketogenesis, 0.737) and (glycogenolytic, 0.734)]. Elsevier obtained its best result with its fourth run (modified queries with all additional modifications + concept expansion + multi-phase execution; search: Apache Solr, stemmed index), but this was only 2% better than their baseline. SIBTex (17) divided query terms into non-relevant, relevant and key terms, assigning larger weights to key terms than to relevant terms. This is the same strategy that we used for expanded terms. The Universal Protein Resource (UniProt) was used to constrain queries and datasets to a set of 14 biomedical topics (18). They used the Gensim word2vec library (as we did) to find expansion candidates. Their best run, SIBTex 3, combined the baseline with query expansion using weighted terms and result categorization in the post-processing phase. In work after the challenge, OHSU varied the number and relative weighting of MeSH terms used for query expansion; additional runs determined the optimal number of MeSH terms and their weighting.
Their best overall score used five MeSH terms with a 1:5 terms:words weighting ratio (19). This is the same ratio we used in our best run when query expansion terms are derived from word2vec. The University of Melbourne (UM) team (20) usefully determined in which of the repositories used in bioCADDIE the most important metadata appear. This information could help to determine whether a query term belongs to a concept expressed by metadata, or to weight answers coming from different repositories. UM transformed the initial query into a multi-field query that is then enriched with terms likely to occur in the relevant datasets.
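The 1:5 terms:words ratio mentioned above can be made concrete with a small sketch. It ranks expansion candidates by cosine similarity, in the same way as the word2vec neighbour list for ‘glycolysis’ given earlier, and gives each expansion term one fifth of the weight of an original query word. The toy three-dimensional vectors, the `topn` cut-off and all helper names are hypothetical; a real system would load vectors from a trained model (e.g. a Gensim word2vec model). The `term^weight` rendering follows Lucene-style boost syntax, so the syntax accepted by the target engine should be checked before use.

```python
import math

# Hypothetical toy embeddings standing in for a word2vec model trained on
# PubMed abstracts; the values are invented for illustration only.
TOY_VECTORS = {
    "glycolysis": [0.9, 0.1, 0.0],
    "gluconeogenesis": [0.8, 0.2, 0.1],
    "glycogenolysis": [0.7, 0.3, 0.0],
    "melanoma": [0.0, 0.1, 0.9],
    "mouse": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=2):
    """Return the `topn` (word, similarity) pairs closest to `word`, excluding itself."""
    target = vectors[word]
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def expand_query(words, vectors, topn=2, term_weight=1 / 5):
    """Original words get weight 1.0; each new expansion term gets `term_weight` (1:5)."""
    weights = {w: 1.0 for w in words}
    for w in words:
        if w not in vectors:
            continue  # out-of-vocabulary words are kept but not expanded
        for candidate, _ in most_similar(w, vectors, topn):
            weights.setdefault(candidate, term_weight)  # never down-weight an original word
    return weights


def to_weighted_query(weights):
    """Render weights as a Lucene-style `term^weight` string (engine syntax may differ)."""
    return " ".join(f"{term}^{weight:g}" for term, weight in weights.items())


print(to_weighted_query(expand_query(["glycolysis"], TOY_VECTORS)))
# glycolysis^1 gluconeogenesis^0.2 glycogenolysis^0.2
```

Using `setdefault` means that a word already present in the query keeps its full weight even if it is also a neighbour of another query word.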

Methodology

The overview

The information retrieval process we used was divided into four steps:
1. Analysis of the repositories’ structure and information content.
2. Selection of the optimal baseline system.
3. Selection of the best possible system extension.
4. Optimization of the parameters of the complete system.
The model of the system developed for information retrieval in the bioCADDIE challenge includes the following elements:
1. Preparation of a database with valuable information from the datasets.
2. Indexing of the data collection.
3. Query preprocessing.
4. Preparation of two vector space models based on data from the bioCADDIE datasets and PubMed abstracts.
5. Query expansion using the prepared vector space models and pseudo-relevance feedback (PRF) (provided by Terrier).
6. Information retrieval by the Terrier engine.
7. Evaluation of the results.
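The query-expansion element above relies in part on pseudo-relevance feedback (PRF), which was provided by Terrier. Terrier ships divergence-from-randomness expansion models such as Bo1 for this purpose; the sketch below deliberately swaps in a much simpler frequency-based variant, only to show the underlying idea: treat the top-k documents of a first retrieval as relevant and add their most frequent new terms to the query. The stopword list, the `k` and `n_terms` defaults and the toy documents are all hypothetical.

```python
from collections import Counter

# Hypothetical minimal stopword list for the toy example.
STOPWORDS = {"the", "of", "in", "and", "a", "for", "from"}


def prf_expand(query_terms, ranked_docs, k=3, n_terms=2):
    """Frequency-based pseudo-relevance feedback (a simplification, not Terrier's Bo1).

    Treats the top-k documents of an initial ranking as relevant, counts their
    tokens and returns the n_terms most frequent ones that are neither
    stopwords nor already in the query.
    """
    query = {t.lower() for t in query_terms}
    counts = Counter()
    for doc in ranked_docs[:k]:
        for token in doc.lower().split():
            if token not in STOPWORDS and token not in query:
                counts[token] += 1
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n_terms]]


initial_ranking = [
    "melanoma gene expression in mouse",
    "gene expression profiling of melanoma cells",
    "melanoma cell lines and gene expression",
    "protein structure of a kinase",
]
print(prf_expand(["melanoma"], initial_ranking))  # ['expression', 'gene']
```

The fourth document is ignored because only the top three are assumed relevant; in a real run the expansion terms would then be weighted and appended to the original query before a second retrieval.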

The collection

The bioCADDIE corpus was a collection of metadata (structured and unstructured) from biomedical datasets generated from a set of 20 individual repositories (Table 1). A total of 794 992 XML documents were made available, taken from the set of indices frozen from the DataMed backend on 24 March 2016 (21). The data in each document were organized into four tags, which are listed after Table 1.
Table 1.

Characteristics of the collection

Repository | Description of repository | Number of documents | Number of different JSON key patterns within <METADATA> tag | Documents with valid Title | Documents with valid Keywords | Documents with valid Description
arrayexpress | Data from high-throughput functional genomics experiments | 60 881 | 17 | 60 817 | 0 | 60 804
bioproject | Collection of genomics, functional genomics and genetics studies and links to their resulting datasets | 155 850 | 41 | 155 631 | 117 577 | 149 399
cia | Archive of cancer imaging data | 63 | 1 | 44 | 0 | 63
clinicaltrials | Collection of data concerning publicly and privately supported clinical studies of human participants conducted around the world | 192 500 | 5518 | 192 486 | 138 983 | 191 934
ctn | Repository of data from the National Drug Abuse Treatment Clinical Trials Network | 46 | 1 | 46 | 44 | 46
cvrg | CardioVascular Research Grid | 29 | 5 | 29 | 0 | 28
dataverse | Open-source research data repository software | 60 303 | 7 | 60 037 | 0 | 60 303
dryad | General-purpose database for a wide diversity of databases | 67 455 | 98 | 62 795 | 60 957 | 58 421
gemma | Database for genomics data (especially gene expression profiles) | 2285 | 1 | 2272 | 0 | 2285
geo | Datasets focused on gene expression | 105 033 | 4 | 96 264 | 0 | 105 033
mpd | Collection of measured data on laboratory mouse strains and populations | 235 | 1 | 235 | 0 | 235
neuromorpho | Collection of digitally reconstructed neurons associated with peer-reviewed publications | 34 082 | 1 | 30 016 | 0 | 34 082
nursadatasets | Repository of data on the role of nuclear receptors (NRs) in human diseases and conditions in which NRs play an integral role | 389 | 2 | 389 | 387 | 389
openfmri | Collection of magnetic resonance imaging data | 36 | 1 | 35 | 0 | 36
pdb | Database of protein amino acid sequences | 113 493 | 1410 | 113 424 | 113 492 | 113 331
peptideatlas | Public compendium of peptides identified in mass spectrometry proteomics experiments | 76 | 1 | 55 | 0 | 76
phenodisco | Repository of data from studies investigating the interaction of genotype and phenotype in humans | 429 | 1 | 429 | 0 | 429
physiobank | Archive of digital recordings of physiologic signals and related data | 70 | 1 | 70 | 0 | 70
Proteomexchange | Mass spectrometry proteomics data | 1716 | 1 | 1706 | 1716 | 1716
Yped | Open-source proteomics database for high-throughput proteomic and small molecule data | 21 | 1 | 21 | 0 | 21
Total | | 794 992 | 7113 | 776 801 | 433 156 | 778 701

In many documents, certain data are missing or were removed as uninformative. Of all documents, 97.71% had a valid title, 54.49% valid keywords and 97.95% a valid description. In total, 99.98% had at least one valid title, keywords or description field.
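The percentages above follow directly from the Total row of Table 1, and the ‘number of different JSON key patterns’ column can be approximated by flattening each <METADATA> JSON object into its set of key paths. The sketch below does both. Treating a key pattern as the sorted set of dotted key paths is an assumption (the text does not define the term precisely), and the sample JSON records are hypothetical.

```python
import json

# Totals taken from the last row of Table 1.
TOTAL_DOCS = 794_992
VALID_COUNTS = {"title": 776_801, "keywords": 433_156, "description": 778_701}


def valid_percentages(valid, total):
    """Share of documents with a valid field, as a percentage rounded to two decimals."""
    return {field: round(100 * n / total, 2) for field, n in valid.items()}


def key_pattern(obj, prefix=""):
    """Flatten parsed JSON into a sorted tuple of dotted key paths.

    Assumption: a 'JSON key pattern' is the set of key paths present in a
    document's <METADATA> object; list elements share their parent's path.
    """
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths.update(key_pattern(value, path))
    elif isinstance(obj, list):
        for item in obj:
            paths.update(key_pattern(item, prefix))
    return tuple(sorted(paths))


def count_key_patterns(metadata_strings):
    """Number of distinct key patterns among raw <METADATA> JSON strings."""
    return len({key_pattern(json.loads(s)) for s in metadata_strings})


# Hypothetical <METADATA> payloads: the first two share one pattern.
samples = [
    '{"dataset": {"title": "A", "description": "B"}}',
    '{"dataset": {"title": "C", "description": "D"}}',
    '{"dataset": {"title": "E"}, "organism": [{"name": "Homo sapiens"}]}',
]
print(valid_percentages(VALID_COUNTS, TOTAL_DOCS))
# {'title': 97.71, 'keywords': 54.49, 'description': 97.95}
print(count_key_patterns(samples))  # 2
```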

: document number, : document title,</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span><REPOSITORY>: biomedical repository used to generate document,</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span><METADATA>: various data from the repository presented in json format.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>Characteristics of the collection</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>In many documents, certain data are missing or are removed as uninformative. Of all documents, 97.71% had valid title, 54.49% keywords and 97.95% description. In total, 99.98% had no valid title, keywords or description.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <h2><span>Analysis of document structures and their information content </span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button> </h2> <pxy><span>Each repository uses different <span class="Disease">json schema</span> to organize data. 
Moreover, in some cases a variation was noted within the same repository (Table 1).</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>To prepare a text corpus for indexing, tags and keys with potentially valuable information were selected and their values were exported to the SQL database. The data was then assigned to one of three categories: Title, Keywords or generalized Description. For one of the repositories (geo), the generalized description contained additional text data, obtained from geo database online resources, based on the ‘geo_accesion’ code found in the metadata (Table 2).</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <div class="xtable"><div class="fig"><b>Table 2.</b><p><span>Preparation of text data for title, keywords and description categories</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><thead><tr><th rowspan="2" colspan="1">Repository</th><th colspan="3" rowspan="1">Categories<hr></th></tr><tr><th rowspan="1" colspan="1">Title</th><th rowspan="1" colspan="1">Keywords</th><th rowspan="1" colspan="1">Description</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">Arrayexpress</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">description</td></tr><tr><td rowspan="1" colspan="1">Bioproject</td><td rowspan="1" colspan="1">title</td><td 
rowspan="1" colspan="1">dataItemkeywords</td><td rowspan="1" colspan="1">organismtargetspecies, dataItemdescription</td></tr><tr><td rowspan="1" colspan="1">Cia</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">anatomicalPartname, diseasename, organismname, organismscientificname</td></tr><tr><td rowspan="1" colspan="1">Clinicaltrials</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">keyword</td><td rowspan="1" colspan="1">criteria, StudyGroupdescription, Diseasename, Treatmentdescription, Treatmentagent, Datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Ctn</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">datasetkeywords</td><td rowspan="1" colspan="1">datasetdescription, organismscientificName, organismname</td></tr><tr><td rowspan="1" colspan="1">Cvrg</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Dataverse</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">publicationdescription, datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Dryad</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">datasetkeywords</td><td rowspan="1" colspan="1">datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Gemma</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">dataItemdescription, organismcommonName</td></tr><tr><td rowspan="1" colspan="1">Geo</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">dataItemsource_name, dataItemorganism, dataItemdescription, text data downloaded from geo database on the basis of the geo_accesion code</td></tr><tr><td rowspan="1" colspan="1">Mpd</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" 
colspan="1">datasetdescription, organismscientificName, organismname</td></tr><tr><td rowspan="1" colspan="1">Neuromorpho</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">anatomicalPartname, cellname, organismscientificName, organismname</td></tr><tr><td rowspan="1" colspan="1">Nursadatasets</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">datasetkeywords</td><td rowspan="1" colspan="1">datasetdescription, organismname</td></tr><tr><td rowspan="1" colspan="1">Openfmri</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Pdb</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">dataItemkeywords</td><td rowspan="1" colspan="1">dataItemdescription, organismsourcescientificName, organismhostscientificName, genename</td></tr><tr><td rowspan="1" colspan="1">Peptideatlas</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">datasetdescription, treatmentdescription</td></tr><tr><td rowspan="1" colspan="1">Phenodisco</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">inexclude, desc, disease, history</td></tr><tr><td rowspan="1" colspan="1">Physiobank</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">datasetdescription</td></tr><tr><td rowspan="1" colspan="1">Proteomexchange</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1">keywords</td><td rowspan="1" colspan="1">organismname</td></tr><tr><td rowspan="1" colspan="1">Yped</td><td rowspan="1" colspan="1">title</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">datasetdescription, organismname</td></tr></tbody></table><p><span>Items in the table represent column names from the SQL database (prepared on the basis of documents’ JSON keys). 
In most cases, more than one column was used to prepare text categorized as Description. Thirteen repositories did not provide any keywords.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></p></div></div><pxy><span>Preparation of text data for title, keywords and description categories</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>Items in the table represent column names from the SQL database (prepared on the basis of documents’ JSON keys). In most cases, more than one column was used to prepare text categorized as Description. Thirteen repositories did not provide any keywords.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <h2><span>Selection of documents with valuable data for indexing </span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button> </h2> <pxy><span>Because documents from some repositories (e.g. dryad, geo) contained very little useful information (see examples in Table 3), we decided to assess if a document’s content is worth indexing using MeSH. 
MeSH, which stands for ‘Medical Subject Headings’, is a vocabulary thesaurus used by the National Library of Medicine (NLM) to index articles stored in PubMed (22).</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <div class="xtable"><div class="fig"><b>Table 3.</b><p><span>Examples of datasets having very little useful information</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><thead><tr><th rowspan="1" colspan="1">No.</th><th rowspan="1" colspan="1">Docu-ment number</th><th rowspan="1" colspan="1">Repository</th><th rowspan="1" colspan="1">Title</th><th rowspan="1" colspan="1">Keywords</th><th rowspan="1" colspan="1">Description</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">1</td><td rowspan="1" colspan="1">104242</td><td rowspan="1" colspan="1">dryad</td><td rowspan="1" colspan="1">chr19</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">NULL</td></tr><tr><td rowspan="1" colspan="1">2</td><td rowspan="1" colspan="1">108196</td><td rowspan="1" colspan="1">dryad</td><td rowspan="1" colspan="1">Chr8</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">NULL</td></tr><tr><td rowspan="1" colspan="1">3</td><td rowspan="1" colspan="1">124757</td><td rowspan="1" colspan="1">bioproject</td><td rowspan="1" colspan="1">Sobemovirus</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">NULL</td></tr><tr><td rowspan="1" colspan="1">4</td><td rowspan="1" colspan="1">151909</td><td rowspan="1" colspan="1">bioproject</td><td rowspan="1" 
colspan="1">Alphaflexiviridae</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">NULL</td></tr><tr><td rowspan="1" colspan="1">5</td><td rowspan="1" colspan="1">500000</td><td rowspan="1" colspan="1">geo</td><td rowspan="1" colspan="1">A375R_RPL10a_vivo__ Ronly_vem10d_rep2</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">melanoma</td></tr><tr><td rowspan="1" colspan="1">6</td><td rowspan="1" colspan="1">500002</td><td rowspan="1" colspan="1">geo</td><td rowspan="1" colspan="1">A375_vitro_vehicle_rep3</td><td rowspan="1" colspan="1">NULL</td><td rowspan="1" colspan="1">melanoma</td></tr></tbody></table><p><span>NULL means that in the source file there was no information that could be categorized as ‘Title’, ‘Keywords’ or ‘Description’.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></p></div></div><pxy><span>Examples of datasets having very little useful information</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>NULL means that in the source file there was no information that could be categorized as ‘Title’, ‘Keywords’ or ‘Description’.</span> <button onclick="translate_abc(this)" style="border:none;outline:none;color:#5577AA;font-size:10px;margin-bottom:0px;" title="Translate into Chinese"> <span class="glyphicon glyphicon-transfer"></span> </button></pxy> <pxy><span>At this point it does matter whether we use words or lemmatized words, so we chose to remain with the former. Terrier tokenises a query and documents so various <span class="Disease">word</span> forms are treated exactly the same. 
In the WE methodology, different word forms are represented by different points in the vector space, but once these words become expansion terms, only the token form counts.</span></pxy> <pxy><span>For each category (Title, Keywords or Description) of each record in the previously prepared SQL database, a score was calculated with the following heuristic algorithm, which removes meaningless records (such as those shown in Table 3) before indexing:</span></pxy> <pxy><span>1. Let X be the total number of words in the field being scored.</span></pxy> <pxy><span>2. Let Y1 be the number of words that are recognized as English words.</span></pxy> <pxy><span>3. Let Y2 be the number of words that are not recognized as English words (e.g. ‘MIP-2’, ‘CD69’ and ‘LDLR’) but are found among the words of MeSH terms (descriptors and their synonyms in the MeSH database).</span></pxy> <pxy><span>4. If there are no words (e.g. the record lacks keywords), set the Score to −1.</span></pxy> <pxy><span>5. Otherwise, calculate Score = (Y1 + Y2)/X.</span></pxy> <pxy><span>6. For the Title and Description categories, if Score > 0 and X > 2, take the record for indexing.</span></pxy> <pxy><span>7. For the Keywords category, if Score > 0 (cf. the condition in Step 4), take the record for indexing.</span></pxy> <pxy><span>The Descriptor/Concept/Term structure makes it possible to attach various data elements in MeSH to the appropriate object. This sentence is quoted directly from https://www.nlm.nih.gov/mesh/concept_structure.html. 
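</span></pxy>
<pxy><span>The record-scoring steps above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code. The tokenizer (whitespace splitting plus stripping of surrounding punctuation), the case-insensitive English lookup, the case-sensitive MeSH lookup, the toy vocabularies ENGLISH_WORDS and MESH_WORDS, and all function names are assumptions. In the real pipeline, MESH_WORDS would be the MeSH word list whose preparation is described below. Treating a record as discarded when none of its fields passes is also an assumption of this sketch.</span></pxy>

```python
# Sketch of the per-field scoring heuristic (hypothetical helper names).
PUNCT = ".,;:()[]'\""

# Toy vocabularies for illustration only; the real lists are far larger.
ENGLISH_WORDS = {"melanoma", "cells", "treated", "with", "in", "vitro"}
MESH_WORDS = {"CD69", "LDLR", "MIP-2"}


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (assumed tokenizer)."""
    tokens = (w.strip(PUNCT) for w in (text or "").split())
    return [t for t in tokens if t]


def field_score(text, english=ENGLISH_WORDS, mesh=MESH_WORDS):
    """Return (Y1 + Y2) / X, or -1.0 when the field has no words."""
    words = tokenize(text)
    x = len(words)                     # X: all words in the field
    if x == 0:
        return -1.0                    # e.g. missing (NULL) keywords
    y1 = sum(1 for w in words if w.lower() in english)       # English words
    y2 = sum(1 for w in words
             if w.lower() not in english and w in mesh)      # MeSH-only words
    return (y1 + y2) / x


def keep_field(category, text):
    """Decide whether one field of a record is taken for indexing."""
    score = field_score(text)
    if category in ("Title", "Description"):
        return score > 0 and len(tokenize(text)) > 2
    if category == "Keywords":
        return score > 0
    raise ValueError(f"unknown category: {category!r}")


def valid_fields(record):
    """List the fields of a record (dict, values may be None) that pass."""
    return [c for c in ("Title", "Keywords", "Description")
            if keep_field(c, record.get(c))]


if __name__ == "__main__":
    row = {"Title": "A375_vitro_vehicle_rep3", "Keywords": None,
           "Description": "melanoma"}
    print(valid_fields(row))  # [] -> nothing in this record would be indexed
```

<pxy><span>The demo record is a toy modelled on the kind of entries shown in Table 3: a cryptic sample label as the title, no keywords and a one-word description. No field passes, so nothing from the record would be indexed. Because X is computed from the same token list that feeds the Score, the X > 2 length condition and the Score always agree on what counts as a word.</span></pxy>
<pxy><span>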
The notions of word (a linguistic unit), term (as it appears in a query) and data element (part of a taxonomy structure) differ in context; here they are all used in the sense of word.</span></pxy> <pxy><span>The MeSH terms used in the algorithm above were prepared as follows:</span></pxy> <pxy><span>The ‘Descriptor’, ‘Substances with pharmacologic action’, ‘Qualifiers’ and ‘Supplementary records’ files were downloaded from the MeSH database website (20).</span></pxy> <pxy><span>Words were collected from file-specific tags (&lt;DescriptorName&gt; and the &lt;ConceptList&gt; tree from ‘desc2017.xml’; &lt;DescriptorName&gt; and the &lt;PharmacologicalActionSubstanceList&gt; tree from ‘pa2017.xml’; &lt;QualifierName&gt; and the &lt;ConceptList&gt; tree from ‘qual2017.xml’; &lt;SupplementalRecordName&gt; and the &lt;ConceptList&gt; tree from ‘suppl2017.xml’).</span></pxy> <pxy><span>Characters such as ‘,’, ‘(’ and ‘)’ were removed from each word.</span></pxy> <pxy><span>Duplicate words were then removed from the list.</span></pxy> <pxy><span>The resulting list of unique terms consisted of 479 545 words.</span></pxy> <h2><span>Indexing of data</span></h2> <pxy><span>After documents without valuable data had been removed, the text corpus for indexing was prepared as an XML file, with the content of every document placed within a DOC tag (the format required by Terrier). The corpus was then tokenized and indexed with the Terrier 4.2 engine (8).</span></pxy> <h2><span>Query preprocessing</span></h2> <pxy><span>The queries were provided as natural-language sentences containing many noise words. To improve retrieval, stop-words and common non-informative phrases (e.g. 
‘find’, ‘data’ and ‘related to’) were removed from each query.</span></pxy> <h2><span>Query expansion</span></h2> <pxy><span>To expand the queries, we used WEs computed with the word2vec algorithm (15). Two vector space models were calculated: the first on the corpus of the bioCADDIE collection and the second on a much larger text corpus of PubMed article abstracts. The resulting vectors were then used to find the words most similar to the query terms. To allow different weights to be set for original and expanded query terms, the query was not passed through the tokenizer (class SingleLineTRECQuery).</span></pxy> <pxy><span>Additional query expansion was carried out by the Terrier engine in the form of PRF using the Rocchio algorithm.</span></pxy> <h2><span>Information retrieval and evaluation</span></h2> <pxy><span>Information retrieval was performed on the Terrier 4.2 platform. 
The results were then evaluated against the qrels file provided by the challenge organizers.</span></pxy> <h1>Results and discussion</h1> <pxy><span>The complexity and fragmentation of the repositories made the data difficult to index. For the original challenge, owing to lack of time and our team's inexperience with DataMed, the data was not fully indexed and we achieved a poor result, shown in Table 4 (9).</span></pxy> <div class="xtable"><div class="fig"><b>Table 4.</b><p><span>Original Poznan consortium results as submitted for the challenge vs. the best participant results for a given evaluation measure (in bold font)</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="top" span="1"><col align="left" valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="center" valign="top" span="1"><col align="char" char="." 
valign="top" span="1"><col align="center" valign="top" span="1"></colgroup><thead><tr><th rowspan="1" colspan="1">Group</th><th rowspan="1" colspan="1">Submission</th><th rowspan="1" colspan="1">infAP</th><th rowspan="1" colspan="1">infNDCG</th><th rowspan="1" colspan="1">NDCG@10</th><th rowspan="1" colspan="1">P@10 (+partial)</th><th rowspan="1" colspan="1">P@10 (−partial)</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">IAII_PUT</td><td rowspan="1" colspan="1">Biocaddie dphresults.txt</td><td rowspan="1" colspan="1">0.0876</td><td rowspan="1" colspan="1">0.3580</td><td rowspan="1" colspan="1">0.4265</td><td rowspan="1" colspan="1">0.5333</td><td rowspan="1" colspan="1">0.1600</td></tr><tr><td rowspan="1" colspan="1">UCSD</td><td rowspan="1" colspan="1">armyofucsdgrads-3.txt</td><td rowspan="1" colspan="1">0.1468</td><td rowspan="1" colspan="1"><bold>0.5132</bold></td><td rowspan="1" colspan="1">0.5303</td><td rowspan="1" colspan="1">0.7133</td><td rowspan="1" colspan="1">0.2400</td></tr><tr><td rowspan="1" colspan="1">SIBTex</td><td rowspan="1" colspan="1">sibtex-5_0.txt</td><td rowspan="1" colspan="1"><bold>0.3664</bold></td><td rowspan="1" colspan="1">0.4188</td><td rowspan="1" colspan="1">0.6271</td><td rowspan="1" colspan="1">0.7533</td><td rowspan="1" colspan="1">0.3467</td></tr><tr><td rowspan="1" colspan="1">Elsevier</td><td rowspan="1" colspan="1">elsevier4.txt</td><td rowspan="1" colspan="1">0.3049</td><td rowspan="1" colspan="1">0.4368</td><td rowspan="1" colspan="1"><bold>0.6861</bold></td><td rowspan="1" colspan="1"><bold>0.8267</bold></td><td rowspan="1" colspan="1"><bold>0.4267</bold></td></tr><tr><td rowspan="1" colspan="1">UIUC GSIS</td><td rowspan="1" colspan="1">sdm-0.75-0.1-0.15.krovetz.txt</td><td rowspan="1" colspan="1">0.3228</td><td rowspan="1" colspan="1">0.4502</td><td rowspan="1" colspan="1">0.5569</td><td rowspan="1" colspan="1">0.7133</td><td rowspan="1" colspan="1">0.2867</td></tr><tr><td rowspan="1" colspan="1">BioMelb</td><td 
rowspan="1" colspan="1">Post-challenge</td><td rowspan="1" colspan="1">0.3575</td><td rowspan="1" colspan="1">0.4219</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">0.7733</td><td rowspan="1" colspan="1"></td></tr><tr><td rowspan="1" colspan="1"><italic>Poznan—this work</italic></td><td rowspan="1" colspan="1"><italic>LGD word2vec and Terrier Rocchio</italic></td><td rowspan="1" colspan="1"><bold><italic>0.3978</italic></bold></td><td rowspan="1" colspan="1"><italic>0.4539</italic></td><td rowspan="1" colspan="1"><italic>0.6375</italic></td><td rowspan="1" colspan="1"><italic>0.7700</italic></td><td rowspan="1" colspan="1"><italic>0.4000</italic></td></tr></tbody></table><p><span>The results of the current Poznan consortium work are shown in italics.</span></p></div></div><pxy><span>After these modifications to our system, our present results are much better. 
Applying our algorithm for selecting documents with valuable data showed that 97.71% of documents had a ‘Title’ assessed as valid for indexing (see Table 1 for details). A similar value was observed for ‘Description’ (97.95%). Only slightly more than half of the documents (54.49%) had valid keywords, mainly because keywords were absent from many datasets. One hundred and fifty-five datasets were assessed as having no valid ‘Title’, ‘Keywords’ or ‘Description’. Only one of them (dataset no. 5322) was present in the qrels file, where it was marked as ‘non-judged’ (−1).</span></pxy> <h2><span>Selection of the optimal baseline system</span></h2> <pxy><span>We selected Terrier (23), the open-source search engine written in Java, because of its maturity and its implementation of state-of-the-art weighting models and techniques that can be used to index large collections of diverse documents.</span></pxy> <pxy><span>Notable weighting models implemented in Terrier include Okapi BM25 (best matching model), term frequency–inverse document frequency (TFIDF) and a whole family of models from the Divergence From Randomness (DFR) framework [mostly originating in (24)]. DFR models have their origin in information theory (Amati, Encyclopedia). 
A word whose occurrences in the documents follow a random distribution is not informative, whereas a word that deviates from this distribution conveys information. Each model is obtained by instantiating the three components of the framework: selecting a basic randomness model, applying the first normalization and normalizing the term frequencies with respect to document length. In this work, the so-called Normalization 2 was applied with the hyper-parameter c = 1.</span></pxy> <pxy><span>The following DFR models, as implemented in Terrier (DFR Framework, http://terrier.org/docs/v3.5/dfr_description.html), were used:</span></pxy> <pxy><span>BB2 (Bose–Einstein model with Bernoulli after-effect and Normalization 2),</span></pxy> <pxy><span>DFR_BM25 (DFR version of BM25) and DFree (parameter-free DFR model),</span></pxy> <pxy><span>DLH and its improved version DLH13 (parameter-free DFR model, assuming hypergeometric term frequency 
distribution),</span></pxy> <pxy><span>DPH (parameter-free hypergeometric model with Popper’s normalization),</span></pxy> <pxy><span>IFB2 (inverse term frequency model with Bernoulli after-effect and Normalization 2),</span></pxy> <pxy><span>In_ExpB2 (inverse expected document frequency model with Bernoulli after-effect and Normalization 2; it uses base-2 logarithms),</span></pxy> <pxy><span>In_ExpC2 (same as the previous one but with base-e logarithms),</span></pxy> <pxy><span>InL2 (inverse document frequency model with Laplace after-effect and Normalization 2),</span></pxy> <pxy><span>LGD (a log-logistic model for information retrieval) (23, 25) and</span></pxy> <pxy><span>PL2 (Poisson model with Laplace after-effect and Normalization 2).</span></pxy> <pxy><span>For the full model formulas, we refer the reader to the original source (26). So far, it has not been demonstrated theoretically why some of these models perform better than others.</span></pxy> <pxy><span>Another valuable feature of Terrier is PRF query expansion, a mechanism that extracts the n most informative terms from the m top-ranked documents of a first retrieval run and adds them to the original query for a second retrieval run. Terrier provides both parameter-free (Bose–Einstein 1, Bose–Einstein 2, Kullback–Leibler) and parameterized (Rocchio) models for query expansion (27). The Rocchio feedback approach was developed within the vector space model. 
The query vector is then moved closer to the relevant documents and farther away from the non-relevant ones.</span></pxy> <pxy><span>In recent work (28), several leading systems were evaluated on the Gov2 test collection within the Open-Source Information Retrieval (IR) Reproducibility Challenge to select the best DFR variant. Among them was Terrier 4.0 with the DPH ranking function, a hypergeometric parameter-free model from the DFR family (8). Its query expansion variant, ‘DPH + Bo1 QE’, uses PRF, which finds potentially relevant terms by first querying the index and then looking for new terms in the top-ranked documents. Specifically, 10 terms are added from the three top-ranked PRF documents.</span></pxy> <pxy><span>The study (28) found that the ‘DPH + Bo1 QE’ run of Terrier 4.0 was statistically significantly better than all other runs, including Terrier’s BM25 run, with all other differences not significant. In particular, its mean average precision (MAP) at 1000 was 0.04 higher than that of the Lucene-based solutions. 
Our relatively successful Poznan University of Technology (PUT) contribution to TREC CDS 2016 (29), which used Terrier DPH Bo1 on a subset of PubMed articles, corroborated this finding.</span></pxy> <pxy><span>The baseline information retrieval results are presented in Table 5. Fourteen weighting models implemented in Terrier were tested, with the log-logistic DFR model (LGD) providing the best infNDCG.</span></pxy> <div class="xtable"><div class="fig"><b>Table 5.</b><p><span>Baseline information retrieval results</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." 
valign="top" span="1"></colgroup><thead><tr><th rowspan="1" colspan="1">Algorithm</th><th rowspan="1" colspan="1">infAP</th><th rowspan="1" colspan="1">infNDCG</th><th rowspan="1" colspan="1">P@10 (+partial)</th><th rowspan="1" colspan="1">P@10 (−partial)</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">BB2</td><td rowspan="1" colspan="1">0.3550</td><td rowspan="1" colspan="1">0.4184</td><td rowspan="1" colspan="1">0.7133</td><td rowspan="1" colspan="1">0.3533</td></tr><tr><td rowspan="1" colspan="1">BM25</td><td rowspan="1" colspan="1">0.3547</td><td rowspan="1" colspan="1">0.4055</td><td rowspan="1" colspan="1">0.7067</td><td rowspan="1" colspan="1">0.3400</td></tr><tr><td rowspan="1" colspan="1">DFR_BM25</td><td rowspan="1" colspan="1">0.3723</td><td rowspan="1" colspan="1">0.4085</td><td rowspan="1" colspan="1">0.7067</td><td rowspan="1" colspan="1">0.3533</td></tr><tr><td rowspan="1" colspan="1">Dfree</td><td rowspan="1" colspan="1">0.3664</td><td rowspan="1" colspan="1">0.4248</td><td rowspan="1" colspan="1"><bold>0.7533</bold></td><td rowspan="1" colspan="1"><bold>0.4067</bold></td></tr><tr><td rowspan="1" colspan="1">DLH</td><td rowspan="1" colspan="1">0.3617</td><td rowspan="1" colspan="1">0.4120</td><td rowspan="1" colspan="1">0.7200</td><td rowspan="1" colspan="1">0.3333</td></tr><tr><td rowspan="1" colspan="1">DLH13</td><td rowspan="1" colspan="1">0.3640</td><td rowspan="1" colspan="1">0.4207</td><td rowspan="1" colspan="1">0.7533</td><td rowspan="1" colspan="1">0.3733</td></tr><tr><td rowspan="1" colspan="1">DPH</td><td rowspan="1" colspan="1">0.3442</td><td rowspan="1" colspan="1">0.4125</td><td rowspan="1" colspan="1">0.7200</td><td rowspan="1" colspan="1">0.3400</td></tr><tr><td rowspan="1" colspan="1">IFB2</td><td rowspan="1" colspan="1">0.3494</td><td rowspan="1" colspan="1">0.3948</td><td rowspan="1" colspan="1">0.6853</td><td rowspan="1" colspan="1">0.3400</td></tr><tr><td rowspan="1" colspan="1">In_ExpB2</td><td rowspan="1" 
colspan="1">0.3534</td><td rowspan="1" colspan="1">0.4079</td><td rowspan="1" colspan="1">0.7222</td><td rowspan="1" colspan="1">0.3667</td></tr><tr><td rowspan="1" colspan="1">In_ExpC2</td><td rowspan="1" colspan="1">0.3379</td><td rowspan="1" colspan="1">0.4015</td><td rowspan="1" colspan="1">0.7367</td><td rowspan="1" colspan="1">0.3333</td></tr><tr><td rowspan="1" colspan="1">InL2</td><td rowspan="1" colspan="1"><bold>0.3791</bold></td><td rowspan="1" colspan="1">0.4181</td><td rowspan="1" colspan="1">0.7367</td><td rowspan="1" colspan="1">0.3600</td></tr><tr><td rowspan="1" colspan="1">LGD</td><td rowspan="1" colspan="1">0.3773</td><td rowspan="1" colspan="1"><bold>0.4355</bold></td><td rowspan="1" colspan="1">0.7333</td><td rowspan="1" colspan="1">0.3933</td></tr><tr><td rowspan="1" colspan="1">PL2</td><td rowspan="1" colspan="1">0.3474</td><td rowspan="1" colspan="1">0.4009</td><td rowspan="1" colspan="1">0.7222</td><td rowspan="1" colspan="1">0.3067</td></tr><tr><td rowspan="1" colspan="1">TFIDF</td><td rowspan="1" colspan="1">0.3530</td><td rowspan="1" colspan="1">0.4120</td><td rowspan="1" colspan="1">0.7067</td><td rowspan="1" colspan="1">0.3400</td></tr></tbody></table><p><span>Bold font indicates the highest values for a given measure.</span></p></div></div><pxy><span>For the bioCADDIE data, which are not continuous data, the best infNDCG was, surprisingly, achieved with LGD rather than with BB2 (DPH Bo1), which provides the best results for infAP and P@10. These results could not have been predicted before the challenge results were evaluated. Therefore, in the original challenge our results could have been about 0.02 lower than those we present now.</span></pxy> <pxy><span>Our baseline results compare quite favourably with the best original baseline results of the bioCADDIE teams, even though no advanced preprocessing was used. The best Terrier option, LGD, gives an infNDCG of 0.4355, compared with 0.4498 (official bioCADDIE evaluation) or 0.433 (10) for UCSD, 0.4292 for Elsevier (13), 0.4207 for UIUC GSIS and 0.3898 for SIBTex (17).</span></pxy> <pxy><span>Expanding queries by adding potentially relevant terms is a common practice for improving relevance in IR systems, and there are many query expansion methods. Relevance feedback takes the documents at the top of a ranking list and adds terms appearing in these documents to a new query. In this work, we add synonyms and other similar terms to the query terms before applying PRF. This type of expansion can be divided into two categories. The first category involves the use of ontologies or lexicons (relational knowledge). 
In the biomedical domain, UMLS, MeSH (22), SNOMED-CT, ICD-10, WordNet and Wikipedia are used (30). The result of lexicon-based expansion is generally positive (for the bioCADDIE challenge, see e.g. (19, 20)). We did not use this method because we lacked access to the MeSH medical text indexer service. The second category is WE, e.g. word2vec, which maps each word to a corresponding vector. WE belong to distributional semantics, a class of feature-learning techniques in natural language processing. Such language modelling derives a word space from linguistic items in context: a space with one dimension per word is transformed into a continuous vector space of much lower dimension, and meaning is captured by a distance measure between the vectors of lexical entities (here, words). In WE-based query expansion methods, terms are added to a query based on their similarity to the original query terms. Goodwin and Harabagiu (31) used the skip-gram word2vec method for query expansion and observed a negative effect relative to the baseline, as we also observed for TREC CDS (29).</span></pxy> <pxy><span>Analysis of the effects of query expansion is difficult, as stressed in (32). That work showed that different methods gave very different top expansion terms for the query ‘foreign minorities Germany’ (Google, as of April 2009). The methods compared were automatic query expansion, mutual information, local context analysis, Rocchio, the binary independence model, Chi-square, Robertson selection value, Kullback–Leibler and the relevance model. Only the binary independence model, Chi-square and Kullback–Leibler gave ‘frisians’ and ‘sorbs’ as the top two expansion terms. 
Some of the methods produced none of the intended terms among their top eight expansion terms.</span></pxy> <pxy><span>In this work, we used MeSH only for filtering, so that the query expansion terms stayed in the medical domain. The query was expanded with the most similar terms obtained from two collections: PubMed biomedical journal citations (titles and abstracts) and the bioCADDIE challenge collection. Similarity was calculated separately for each collection using word2vec, an efficient model for learning vector representations of words from unstructured text (15), with the following parameters:</span></pxy> <pxy><span>PubMed collection: number of dimensions = 100; window size = 5; minimum word count = 10; this resulted in a vocabulary of 1 498 219 words;</span></pxy> <pxy><span>bioCADDIE collection: number of dimensions = 100; window size = 20; minimum word count = 5; this resulted in a vocabulary of 296 503 words.</span></pxy> <pxy><span>The similarity threshold was set to 0.9 for vectors generated from PubMed abstracts and 0.8 for vectors calculated on the basis of 
bioCADDIE datasets (lower thresholds resulted in dissimilar expansion terms).</span></pxy> <pxy><span>As in (29) and (31), if WE-derived terms are added to the query with the same weight as the original terms, the results generally get worse, because query drift is introduced. For Question 9 (which concerns ‘ob’ and Mus musculus), adding terms such as ‘mouse’ or ‘mice’ to the query does not improve the result.</span></pxy> <pxy><span>The most important result of this work is the observation that the results improve if query expansion terms are given a much smaller weight than the original terms.</span></pxy> <pxy><span>The weight of the original query terms was set to 100, that of terms obtained from the PubMed embeddings to 20, and that of terms obtained from the bioCADDIE embeddings to 1. The very low weight of the bioCADDIE terms is justified by the relatively small size of the bioCADDIE collection.</span></pxy> <pxy><span>In (26), we used MeSH not only for filtering but also for query expansion, with positive results. 
In this work, we use MeSH only for filtering because its free-access interface was discontinued.</span></pxy> <pxy><span>We tried query expansion with WE using two approaches:</span></pxy> <pxy><span>The skip-gram method (15), trained on abstracts of the entire PubMed with the Gensim library (33).</span></pxy> <pxy><span>The GloVe method (16), trained on the freely available TREC 2016 PubMed documents.</span></pxy> <pxy><span>In our case, the vectors obtained with word2vec and GloVe were quite different, and GloVe-based expansion did not improve the results (data not shown). However, this may be related to the relatively small size of the corpora used. We plan to extend this work by training the embeddings on larger corpora (e.g. 34).</span></pxy> <pxy><span>We focused on the Terrier Rocchio method, tuning the beta parameter, the number of top documents and the number of extracted terms to maximize infNDCG. 
Under the same conditions, the Rocchio query expansion method slightly outperforms Terrier's parameter-free expansion method Bo1 (http://terrier.org/docs/v3.5/javadoc/org/terrier/matching/models/queryexpansion/Bo1.html). For LGD with word2vec, the infNDCG difference is 0.0049. For infAP, the reverse occurs: the parameter-free expansion slightly outperforms Rocchio, by 0.0034.</span></pxy> <pxy><span>Terrier PRF was configured to use the Rocchio algorithm with the following parameters: number of top documents used for query expansion = 2; number of terms extracted from each document = 2; beta parameter of the Rocchio algorithm = 0.5.</span></pxy> <pxy><span>Table 6 presents the retrieval results with the expanded queries. Once again, LGD gave the best infNDCG. The relative gain of query expansion over the baseline is a little over 4%, smaller than that achieved in (29). 
However, the bioCADDIE data have a rather irregular structure (some data types are missing from many documents), which might account for the difference.</span></pxy> <div class="xtable"><div class="fig"><b>Table 6.</b><p><span>Baseline information retrieval results with the best word2vec query expansion and PRF</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="top" span="1"><col align="left" valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." 
valign="top" span="1"></colgroup><thead><tr><th rowspan="1" colspan="1">Algorithm</th><th rowspan="1" colspan="1">Run parameters</th><th rowspan="1" colspan="1">infAP</th><th rowspan="1" colspan="1">infNDCG</th><th rowspan="1" colspan="1">P@10 (+partial)</th><th rowspan="1" colspan="1">P@10 (−partial)</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">BB2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3911</td><td rowspan="1" colspan="1">0.4325</td><td rowspan="1" colspan="1"><b>0.7900</b></td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" colspan="1">BB2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1"><b>0.4001</b></td><td rowspan="1" colspan="1">0.4533</td><td rowspan="1" colspan="1"><b>0.7900</b></td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" colspan="1">BM25</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3719</td><td rowspan="1" colspan="1">0.4158</td><td rowspan="1" colspan="1">0.7067</td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" colspan="1">BM25</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3601</td><td rowspan="1" colspan="1">0.4286</td><td rowspan="1" colspan="1">0.6933</td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" colspan="1">DFR_BM25</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3883</td><td rowspan="1" colspan="1">0.4066</td><td rowspan="1" colspan="1">0.7214</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">DFR_BM25</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3801</td><td rowspan="1" colspan="1">0.4311</td><td rowspan="1" colspan="1">0.7267</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">DFRee</td><td rowspan="1" colspan="1">terrier 
Rocchio</td><td rowspan="1" colspan="1">0.3910</td><td rowspan="1" colspan="1">0.4371</td><td rowspan="1" colspan="1">0.7500</td><td rowspan="1" colspan="1">0.3667</td></tr><tr><td rowspan="1" colspan="1">DFRee</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3888</td><td rowspan="1" colspan="1">0.4454</td><td rowspan="1" colspan="1">0.7567</td><td rowspan="1" colspan="1">0.3733</td></tr><tr><td rowspan="1" colspan="1">DLH</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3683</td><td rowspan="1" colspan="1">0.4181</td><td rowspan="1" colspan="1">0.7400</td><td rowspan="1" colspan="1">0.3000</td></tr><tr><td rowspan="1" colspan="1">DLH</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3604</td><td rowspan="1" colspan="1">0.4292</td><td rowspan="1" colspan="1">0.7400</td><td rowspan="1" colspan="1">0.3000</td></tr><tr><td rowspan="1" colspan="1">DLH13</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3759</td><td rowspan="1" colspan="1">0.4324</td><td rowspan="1" colspan="1">0.7733</td><td rowspan="1" colspan="1">0.3467</td></tr><tr><td rowspan="1" colspan="1">DLH13</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3692</td><td rowspan="1" colspan="1">0.4422</td><td rowspan="1" colspan="1">0.7733</td><td rowspan="1" colspan="1">0.3467</td></tr><tr><td rowspan="1" colspan="1">DPH</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3779</td><td rowspan="1" colspan="1">0.4194</td><td rowspan="1" colspan="1">0.7500</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">DPH</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3751</td><td rowspan="1" colspan="1">0.4276</td><td rowspan="1" colspan="1">0.7567</td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" 
colspan="1">IFB2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3669</td><td rowspan="1" colspan="1">0.4005</td><td rowspan="1" colspan="1">0.7233</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">IFB2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3813</td><td rowspan="1" colspan="1">0.4284</td><td rowspan="1" colspan="1">0.7367</td><td rowspan="1" colspan="1">0.3067</td></tr><tr><td rowspan="1" colspan="1">In_expB2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3720</td><td rowspan="1" colspan="1">0.4108</td><td rowspan="1" colspan="1">0.7433</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">In_expB2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3816</td><td rowspan="1" colspan="1">0.4330</td><td rowspan="1" colspan="1">0.7433</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">In_expC2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3720</td><td rowspan="1" colspan="1">0.3999</td><td rowspan="1" colspan="1">0.7367</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">In_expC2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3672</td><td rowspan="1" colspan="1">0.4157</td><td rowspan="1" colspan="1">0.7367</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">InL2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1"><b>0.4001</b></td><td rowspan="1" colspan="1">0.4259</td><td rowspan="1" colspan="1">0.7533</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">InL2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3902</td><td rowspan="1" colspan="1">0.4360</td><td rowspan="1" 
colspan="1">0.7467</td><td rowspan="1" colspan="1">0.3200</td></tr><tr><td rowspan="1" colspan="1">LGD</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3990</td><td rowspan="1" colspan="1">0.4456</td><td rowspan="1" colspan="1">0.7633</td><td rowspan="1" colspan="1">0.3867</td></tr><tr><td rowspan="1" colspan="1">LGD</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3978</td><td rowspan="1" colspan="1"><b>0.4539</b></td><td rowspan="1" colspan="1">0.7700</td><td rowspan="1" colspan="1"><b>0.3933</b></td></tr><tr><td rowspan="1" colspan="1">PL2</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3648</td><td rowspan="1" colspan="1">0.4082</td><td rowspan="1" colspan="1">0.7467</td><td rowspan="1" colspan="1">0.2800</td></tr><tr><td rowspan="1" colspan="1">PL2</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3542</td><td rowspan="1" colspan="1">0.4213</td><td rowspan="1" colspan="1">0.7467</td><td rowspan="1" colspan="1">0.2800</td></tr><tr><td rowspan="1" colspan="1">TF_IDF</td><td rowspan="1" colspan="1">terrier Rocchio</td><td rowspan="1" colspan="1">0.3641</td><td rowspan="1" colspan="1">0.4023</td><td rowspan="1" colspan="1">0.7317</td><td rowspan="1" colspan="1">0.3133</td></tr><tr><td rowspan="1" colspan="1">TF_IDF</td><td rowspan="1" colspan="1">word2vec and terrier Rocchio</td><td rowspan="1" colspan="1">0.3523</td><td rowspan="1" colspan="1">0.4154</td><td rowspan="1" colspan="1">0.7250</td><td rowspan="1" colspan="1">0.3133</td></tr></tbody></table><p><span>Bold font indicates the highest value for each measure.</span></p></div></div> <h2><span>Further analysis</span></h2> <pxy><span>To better understand the results, we evaluated the individual questions (Table 7) for our best run: LGD with the query expanded with word2vec and Terrier PRF. Strikingly, the highest infNDCG is obtained for Question 15, for which, as for Question 7, no relevance score of 2 was assigned in the evaluation.</span></pxy> <div class="xtable"><div class="fig"><b>Table 7.</b><p><span>Variation of measures for each bioCADDIE question</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." 
valign="top" span="1"></colgroup><thead><tr><th rowspan="1" colspan="1">Query number</th><th rowspan="1" colspan="1">infAP</th><th rowspan="1" colspan="1">infNDCG</th><th rowspan="1" colspan="1">P@10 (+partial)</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">1</td><td rowspan="1" colspan="1">0.4217</td><td rowspan="1" colspan="1">0.6504</td><td rowspan="1" colspan="1">0.9000</td></tr><tr><td rowspan="1" colspan="1">2</td><td rowspan="1" colspan="1">0.3933</td><td rowspan="1" colspan="1">0.3338</td><td rowspan="1" colspan="1">0.8000</td></tr><tr><td rowspan="1" colspan="1">3</td><td rowspan="1" colspan="1">0.5832</td><td rowspan="1" colspan="1">0.6898</td><td rowspan="1" colspan="1">0.9000</td></tr><tr><td rowspan="1" colspan="1">4</td><td rowspan="1" colspan="1">0.6999</td><td rowspan="1" colspan="1">0.5177</td><td rowspan="1" colspan="1">1.0000</td></tr><tr><td rowspan="1" colspan="1">5</td><td rowspan="1" colspan="1">0.1620</td><td rowspan="1" colspan="1">0.2897</td><td rowspan="1" colspan="1">0.4000</td></tr><tr><td rowspan="1" colspan="1">6</td><td rowspan="1" colspan="1">0.3256</td><td rowspan="1" colspan="1">0.4938</td><td rowspan="1" colspan="1">1.0000</td></tr><tr><td rowspan="1" colspan="1">7</td><td rowspan="1" colspan="1">0.1931</td><td rowspan="1" colspan="1">0.6197</td><td rowspan="1" colspan="1">0.2500</td></tr><tr><td rowspan="1" colspan="1">8</td><td rowspan="1" colspan="1">0.0856</td><td rowspan="1" colspan="1">0.4547</td><td rowspan="1" colspan="1">0.3000</td></tr><tr><td rowspan="1" colspan="1">9</td><td rowspan="1" colspan="1">0.2207</td><td rowspan="1" colspan="1">0.2607</td><td rowspan="1" colspan="1">0.8000</td></tr><tr><td rowspan="1" colspan="1">10</td><td rowspan="1" colspan="1">0.1186</td><td rowspan="1" colspan="1">0.1961</td><td rowspan="1" colspan="1">0.5000</td></tr><tr><td rowspan="1" colspan="1">11</td><td rowspan="1" colspan="1">0.6373</td><td rowspan="1" colspan="1">0.3402</td><td rowspan="1" 
colspan="1">1.0000</td></tr><tr><td rowspan="1" colspan="1">12</td><td rowspan="1" colspan="1">0.5860</td><td rowspan="1" colspan="1">0.4011</td><td rowspan="1" colspan="1">0.9000</td></tr><tr><td rowspan="1" colspan="1">13</td><td rowspan="1" colspan="1">0.3171</td><td rowspan="1" colspan="1">0.2919</td><td rowspan="1" colspan="1">0.9000</td></tr><tr><td rowspan="1" colspan="1">14</td><td rowspan="1" colspan="1">0.7005</td><td rowspan="1" colspan="1">0.3300</td><td rowspan="1" colspan="1">0.9000</td></tr><tr><td rowspan="1" colspan="1">15</td><td rowspan="1" colspan="1">0.5228</td><td rowspan="1" colspan="1">0.9384</td><td rowspan="1" colspan="1">1.0000</td></tr><tr><td rowspan="1" colspan="1">Average</td><td rowspan="1" colspan="1">0.3978</td><td rowspan="1" colspan="1">0.4539</td><td rowspan="1" colspan="1">0.7700</td></tr></tbody></table></div></div> <pxy><span>Further analysis is required to determine which particular source databases contribute most to the retrieval gain. 
For example, neuromorpho accounted for 11% of the infNDCG score, although it constitutes less than 5% of the data volume.</span></pxy> <pxy><span>Table 8 details the LGD runs in which original and expanded terms have the same or different weights, and shows that expansion terms should not be given the same weight as the original terms.</span></pxy> <div class="xtable"><div class="fig"><b>Table 8.</b><p><span>Evaluation of search results obtained with the LGD algorithm using the same or different weights for original and expanded terms</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><thead><tr><th rowspan="1" colspan="1">Run</th><th rowspan="1" colspan="1">infAP</th><th rowspan="1" colspan="1">infNDCG</th><th rowspan="1" colspan="1">NDCG@10</th><th rowspan="1" colspan="1">P@10 (+partial)</th><th rowspan="1" colspan="1">P@10 (−partial)</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">Separate words; terms added manually; same weight for all terms</td><td rowspan="1" colspan="1">0.2896</td><td rowspan="1" colspan="1">0.3329</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">0.6656</td><td rowspan="1" colspan="1"></td></tr><tr><td rowspan="1" colspan="1">Separate words; terms added manually; original query words 
weight = 100</td><td rowspan="1" colspan="1">0.3922</td><td rowspan="1" colspan="1">0.4525</td><td rowspan="1" colspan="1"></td><td rowspan="1" colspan="1">0.7633</td><td rowspan="1" colspan="1"></td></tr><tr><td rowspan="1" colspan="1">Terms from query as separate words without query expansion</td><td rowspan="1" colspan="1">0.3773</td><td rowspan="1" colspan="1">0.4355</td><td rowspan="1" colspan="1">0.6375</td><td rowspan="1" colspan="1">0.7333</td><td rowspan="1" colspan="1">0.4000</td></tr><tr><td rowspan="1" colspan="1">Terms from query as separate words; Terrier query expansion (PRF)</td><td rowspan="1" colspan="1">0.3990</td><td rowspan="1" colspan="1">0.4456</td><td rowspan="1" colspan="1">0.6425</td><td rowspan="1" colspan="1">0.7633</td><td rowspan="1" colspan="1">0.3867</td></tr><tr><td rowspan="1" colspan="1">Terms from query (weight 100) + word2vec (weight 20 for PubMed or 1 for bioCADDIE embeddings) + Terrier query expansion (PRF)</td><td rowspan="1" colspan="1">0.3978</td><td rowspan="1" colspan="1">0.4539</td><td rowspan="1" colspan="1">0.6425</td><td rowspan="1" colspan="1">0.7700</td><td rowspan="1" colspan="1">0.3933</td></tr></tbody></table><p><span>Manually added terms were chosen by a biology specialist.</span></p></div></div> <pxy><span>We also evaluated the results using a query relevance file in which partially relevant documents are treated as non-relevant. Search results generally benefit from query expansion in any form. We compared three settings: no expansion (denoted NoEXP), Terrier default query expansion (denoted Terrier) and query expansion with WEs (denoted Emb). The results are presented in Table 9.</span></pxy> <div class="xtable"><div class="fig"><b>Table 9.</b><p><span>Evaluation of search results obtained with various algorithms, with partially relevant documents counted as non-relevant</span></p><table xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" frame="hsides" rules="groups"><colgroup span="1"><col align="left" valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." valign="top" span="1"><col align="char" char="." 
valign="top" span="1"></colgroup><thead><tr><th rowspan="2" colspan="1">Retrieval model</th><th rowspan="1" colspan="3">infAP</th><th rowspan="1" colspan="3">infNDCG</th></tr><tr><th rowspan="1" colspan="1">NoEXP</th><th rowspan="1" colspan="1">Terrier</th><th rowspan="1" colspan="1">Emb</th><th rowspan="1" colspan="1">NoEXP</th><th rowspan="1" colspan="1">Terrier</th><th rowspan="1" colspan="1">Emb</th></tr></thead><tbody><tr><td rowspan="1" colspan="1">InL2c</td><td rowspan="1" colspan="1">0.1940</td><td rowspan="1" colspan="1"><b>0.2056</b></td><td rowspan="1" colspan="1">0.2085</td><td rowspan="1" colspan="1">0.2524</td><td rowspan="1" colspan="1"><b>0.2689</b></td><td rowspan="1" colspan="1"><b>0.2687</b></td></tr><tr><td rowspan="1" colspan="1">BB2</td><td rowspan="1" colspan="1">0.1853</td><td rowspan="1" colspan="1">0.2023</td><td rowspan="1" colspan="1">0.2079</td><td rowspan="1" colspan="1">0.2469</td><td rowspan="1" colspan="1">0.2624</td><td rowspan="1" colspan="1">0.2642</td></tr><tr><td rowspan="1" colspan="1">BM25</td><td rowspan="1" colspan="1">0.1813</td><td rowspan="1" colspan="1">0.1950</td><td rowspan="1" colspan="1">0.1980</td><td rowspan="1" colspan="1">0.2437</td><td rowspan="1" colspan="1">0.2591</td><td rowspan="1" colspan="1">0.2610</td></tr><tr><td rowspan="1" colspan="1">DFR_BM25</td><td rowspan="1" colspan="1">0.1893</td><td rowspan="1" colspan="1">0.1996</td><td rowspan="1" colspan="1">0.2040</td><td rowspan="1" colspan="1">0.2469</td><td rowspan="1" colspan="1">0.2590</td><td rowspan="1" colspan="1">0.2601</td></tr><tr><td rowspan="1" colspan="1">In_expB2</td><td rowspan="1" colspan="1">0.1841</td><td rowspan="1" colspan="1">0.1954</td><td rowspan="1" colspan="1">0.1995</td><td rowspan="1" colspan="1">0.2439</td><td 
rowspan="1" colspan="1">0.2578</td><td rowspan="1" colspan="1">0.2587</td></tr><tr><td rowspan="1" colspan="1">DLH13</td><td rowspan="1" colspan="1">0.1780</td><td rowspan="1" colspan="1">0.1815</td><td rowspan="1" colspan="1">0.1845</td><td rowspan="1" colspan="1">0.2495</td><td rowspan="1" colspan="1">0.2529</td><td rowspan="1" colspan="1">0.2585</td></tr><tr><td rowspan="1" colspan="1">LGD</td><td rowspan="1" colspan="1"><b>0.1946</b></td><td rowspan="1" colspan="1">0.2013</td><td rowspan="1" colspan="1"><b>0.2086</b></td><td rowspan="1" colspan="1"><b>0.2599</b></td><td rowspan="1" colspan="1">0.2569</td><td rowspan="1" colspan="1">0.2579</td></tr><tr><td rowspan="1" colspan="1">DLH</td><td rowspan="1" colspan="1">0.1633</td><td rowspan="1" colspan="1">0.1664</td><td rowspan="1" colspan="1">0.1688</td><td rowspan="1" colspan="1">0.2392</td><td rowspan="1" colspan="1">0.2560</td><td rowspan="1" colspan="1">0.2579</td></tr><tr><td rowspan="1" colspan="1">DFRee</td><td rowspan="1" colspan="1">0.1779</td><td rowspan="1" colspan="1">0.1905</td><td rowspan="1" colspan="1">0.1981</td><td rowspan="1" colspan="1">0.2489</td><td rowspan="1" colspan="1">0.2547</td><td rowspan="1" colspan="1">0.2560</td></tr><tr><td rowspan="1" colspan="1">IFB</td><td rowspan="1" colspan="1">0.1684</td><td rowspan="1" colspan="1">0.1762</td><td rowspan="1" colspan="1">0.1824</td><td rowspan="1" colspan="1">0.2360</td><td rowspan="1" colspan="1">0.2467</td><td rowspan="1" colspan="1">0.2485</td></tr><tr><td rowspan="1" colspan="1">In_expC2</td><td rowspan="1" colspan="1">0.1754</td><td rowspan="1" colspan="1">0.1786</td><td rowspan="1" colspan="1">0.1829</td><td rowspan="1" colspan="1">0.2350</td><td rowspan="1" colspan="1">0.2408</td><td rowspan="1" colspan="1">0.2420</td></tr><tr><td rowspan="1" colspan="1">PL2</td><td rowspan="1" colspan="1">0.1660</td><td rowspan="1" colspan="1">0.1708</td><td rowspan="1" colspan="1">0.1735</td><td rowspan="1" 
colspan="1">0.2367</td><td rowspan="1" colspan="1">0.2371</td><td rowspan="1" colspan="1">0.2383</td></tr><tr><td rowspan="1" colspan="1">DPH</td><td rowspan="1" colspan="1">0.1584</td><td rowspan="1" colspan="1">0.1691</td><td rowspan="1" colspan="1">0.1777</td><td rowspan="1" colspan="1">0.2422</td><td rowspan="1" colspan="1">0.2343</td><td rowspan="1" colspan="1">0.2360</td></tr><tr><td rowspan="1" colspan="1">TF_IDF</td><td rowspan="1" colspan="1">0.1827</td><td rowspan="1" colspan="1">0.1880</td><td rowspan="1" colspan="1">0.1904</td><td rowspan="1" colspan="1">0.2456</td><td rowspan="1" colspan="1">0.2327</td><td rowspan="1" colspan="1">0.2345</td></tr></tbody></table><p><span>Bold font indicates the highest value in each column.</span></p></div></div> <pxy><span>The commonly used BM25 and InL2 models give surprisingly good results: with query expansion, their infNDCG exceeds that of LGD, the best-performing algorithm in the full evaluation. With query expansion, TF_IDF has the lowest infNDCG of all algorithms. For infAP, query expansion improves the results of every algorithm; for infNDCG, it improves all of them except LGD, DPH and TF_IDF. 
Combining both types of query expansion gives the best results, reaching an infNDCG of 0.2687 for InL2 (practically equal to the 0.2689 obtained with Terrier expansion alone) and an infAP of 0.2086 for LGD.</span></pxy> <h1>Conclusions and future work</h1> <pxy><span>The bioCADDIE shared-task challenge fulfilled an important role in advancing biomedical information retrieval methods for datasets represented as data snippets. Our post-challenge analysis indicates that bioCADDIE data is quite different from continuous biomedical data. A considerable number of documents present essentially the same information, duplicated across NLM databases. Manual expansion with all terms weighted equally makes the results worse. Word2vec-based query expansion improves the results, but the weights of expansion terms have to be much smaller than those of the original terms. For word2vec to be effective, the method used to calculate the similarity of candidate expansion terms to the original query terms is crucial; in this work, we use plain word2vec.</span></pxy> <pxy><span>Several recently proposed approaches to query expansion that report positive results (34–42), including word2vec-based ones (38), deserve to be applied in the bioCADDIE context. 
There are many studies on WE-based information retrieval in the biomedical domain (34, 39).</span></pxy> <pxy><span>The work of the Fudan group in the BioASQ challenge (43) used deep semantic representations (D2V, document vectors) to compare query and document text on a sentence basis. D2V-TFIDF, which concatenates dense and sparse semantic representations, performed very well when applied to ranking in MeSHLabeler.</span></pxy> <pxy><span>It should be stressed that in (15) the plain word2vec method (with cosine similarity) was presented as better than it actually is, because an easy type of test data was chosen, such as countries and their capitals. Much better results are obtained when sense disambiguation (44) and hubness reduction are applied to the vector space. For similarity tasks, the results in (45), where three different corrections to word2vec were used (retrofitting, hubness removal and rank-based similarity), are up to 30% better than those of the method in (15). Such a method, extended from similarity to relatedness, could allow direct comparison of query and target terms.</span></pxy> <pxy><span>Other query expansion schemes based on WE also exist (41–42). 
Terrier provides a state-of-the-art baseline system, but in our view PRF and phrase query expansion could be significantly improved within Terrier.</span></pxy> <pxy><span>A direct comparison of the results of this work with the original bioCADDIE results is not warranted. Nevertheless, our results are strong: they are close to the top in most measures and the best in the infAP measure.</span></pxy> <pxy><span>To summarize, the main conclusions of this article are the following:</span></pxy> <pxy><span>Using language models built on distributional semantics (WE) to expand the query has the potential to significantly improve retrieval results in the near future.</span></pxy> <pxy><span>Assigning different weights to query words, depending on whether they were added during the expansion process or come from the original query, significantly improves the results.</span></pxy> 
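<pxy><span>The weighting principle above, together with the word2vec-style selection of expansion terms, can be illustrated with a minimal Python sketch. It is not the authors' Terrier configuration: the function name expand_query, the down-weighting factor of 0.1, scoring each candidate by its maximum cosine similarity to any original query term, and the three-dimensional toy vectors are all illustrative assumptions. Original query terms keep weight 1.0, while each expansion term receives a weight proportional to its similarity, so it always counts less than the original terms.</span></pxy>

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(
    query_terms: List[str],
    embeddings: Dict[str, Vector],
    k: int = 2,
    expansion_weight: float = 0.1,  # hypothetical down-weighting factor
) -> List[Tuple[str, float]]:
    """Return (term, weight) pairs: original terms at 1.0, plus the top-k
    embedding neighbours, each weighted expansion_weight * similarity."""
    weighted = [(term, 1.0) for term in query_terms]
    known = [term for term in query_terms if term in embeddings]
    candidates = []
    for word, vec in embeddings.items():
        if word in query_terms:
            continue  # never re-add an original query term
        # Score a candidate by its best cosine match to any original query term.
        score = max((cosine(vec, embeddings[q]) for q in known), default=0.0)
        if score > 0.0:
            candidates.append((score, word))
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    for score, word in candidates[:k]:
        weighted.append((word, round(expansion_weight * score, 4)))
    return weighted


# Toy 3-dimensional "embeddings" (illustrative only, not trained vectors).
toy = {
    "tumor": [1.0, 0.0, 0.0],
    "cancer": [0.9, 0.1, 0.0],
    "neoplasm": [0.8, 0.2, 0.0],
    "kidney": [0.0, 1.0, 0.0],
}
print(expand_query(["tumor"], toy))
# [('tumor', 1.0), ('cancer', 0.0994), ('neoplasm', 0.097)]
```

<pxy><span>Raising expansion_weight toward 1.0 would let expansion terms compete with the original terms, which the findings above suggest should be avoided. How the weighted pairs are passed to a search engine depends on that engine's query language and is not shown here.</span></pxy>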
<pxy><span>Filtering out documents that do not convey informative content (based on PyEnchant and MeSH) likewise improves the results.</span></pxy> <pxy><span>Important elements influencing the final result were the selection of an appropriate ranking function and the tuning of the PRF expansion parameters (a parametric expansion with the coefficient β, using the two best-ranked documents instead of the standard three).</span></pxy> <pxy><span>The competitive results of this work were achieved without advanced preprocessing, manual tasks or system training, so they could be treated as a new baseline. We believe that, by adding the more sophisticated elements discussed above, particularly when applied to individual questions, infNDCG could potentially be improved by 0.05. Even a small improvement amounts to a large economic gain: a 2012 survey (46) found that doctors performed an average of six professional searches a day in the course of their work.</span></pxy> <pxy><span>The bioCADDIE challenge results need further analysis to understand which features of the participating teams' algorithms contributed to their effectiveness on particular measures. 
Such an extended analysis was performed for TREC CDS 2014 (47).</span></pxy> <pxy><span>Comparing all bioCADDIE runs on infAP, infNDCG, NDCG and P@10 reveals surprisingly little correlation between the results for these measures (20). The UCSD team was ranked first in terms of infNDCG but would rank ninth by the classic NDCG metric. The UCSD method was optimized for infNDCG and was not uniformly strong across measures. This challenge deserves further work and should contribute to the development of a DDI prototype.</span></pxy> <pxy><span>Finally, the results of the bioCADDIE effort could be useful for determining the relevance of particular data. For example, the evaluation performed in (48) showed that the genome-wide association studies (GWAS) dataset finder significantly outperformed PubMed in retrieving literature with the desired datasets. This could indicate that, for some semantic tasks, datasets are more useful than literature.</span></pxy> <h1>Funding</h1> <pxy><span>Poznan University of Technology grant (04/45/DSPB/0149). 
The bioCADDIE Dataset Retrieval Challenge was supported by the National Institutes of Health (U24AI117966).</span></pxy> <pxy><span>Conflict of interest. None declared.</span></pxy> </div> <div class="tab-pane fade" id="refx"> <h2><span class="s2"></span> <a href="si.php?db=pubmed&id=29220475" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">1.  Improving average ranking precision in user searches for biomedical research datasets.</span></a></h2> <span class="author">Authors:  Douglas Teodoro; Luc Mottin; Julien Gobeill; Arnaud Gaudinat; Thérèse Vachon; Patrick Ruch </span><br> <span class="journal">Journal:  Database (Oxford) </span>      <span class="year">Date:  2017-01-01 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=25734591" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">2.  
Problems with the nested granularity of feature domains in bioinformatics: the eXtasy case.</span></a></h2> <span class="author">Authors:  Dusan Popovic; Alejandro Sifrim; Jesse Davis; Yves Moreau; Bart De Moor </span><br> <span class="journal">Journal:  BMC Bioinformatics </span>      <span class="year">Date:  2015-02-23 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=27307646" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">3.  DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.</span></a></h2> <span class="author">Authors:  Shengwen Peng; Ronghui You; Hongning Wang; Chengxiang Zhai; Hiroshi Mamitsuka; Shanfeng Zhu </span><br> <span class="journal">Journal:  Bioinformatics </span>      <span class="year">Date:  2016-06-15 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=27643536" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">4.  PIPELINEs: Creating Comparable Clinical Knowledge Efficiently by Linking Trial Platforms.</span></a></h2> <span class="author">Authors:  M R Trusheim; A A Shrier; Z Antonijevic; R A Beckman; R K Campbell; C Chen; K T Flaherty; J Loewy; D Lacombe; S Madhavan; H P Selker; L J Esserman </span><br> <span class="journal">Journal:  Clin Pharmacol Ther </span>      <span class="year">Date:  2016-10-19 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=29220453" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">5.  
A publicly available benchmark for biomedical dataset retrieval: the reference standard for the 2016 bioCADDIE dataset retrieval challenge.</span></a></h2> <span class="author">Authors:  Trevor Cohen; Kirk Roberts; Anupama E Gururaj; Xiaoling Chen; Saeid Pournejati; George Alter; William R Hersh; Dina Demner-Fushman; Lucila Ohno-Machado; Hua Xu </span><br> <span class="journal">Journal:  Database (Oxford) </span>      <span class="year">Date:  2017-01-01 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=28815103" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">6.  Search Datasets in Literature: A Case Study of GWAS.</span></a></h2> <span class="author">Authors:  Xiao Dong; Yaoyun Zhang; Hua Xu </span><br> <span class="journal">Journal:  AMIA Jt Summits Transl Sci Proc </span>      <span class="year">Date:  2017-07-26 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=29220457" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">7.  Multi-field query expansion is effective for biomedical dataset retrieval.</span></a></h2> <span class="author">Authors:  Mohamed Reda Bouadjenek; Karin Verspoor </span><br> <span class="journal">Journal:  Database (Oxford) </span>      <span class="year">Date:  2017-01-01 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=29220467" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">8.  
Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge.</span></a></h2> <span class="author">Authors:  Theodore B Wright; David Ball; William Hersh </span><br> <span class="journal">Journal:  Database (Oxford) </span>      <span class="year">Date:  2017-01-01 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=28585923" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">9.  DATS, the data tag suite to enable discoverability of datasets.</span></a></h2> <span class="author">Authors:  Susanna-Assunta Sansone; Alejandra Gonzalez-Beltran; Philippe Rocca-Serra; George Alter; Jeffrey S Grethe; Hua Xu; Ian M Fore; Jared Lyle; Anupama E Gururaj; Xiaoling Chen; Hyeon-Eui Kim; Nansu Zong; Yueling Li; Ruiling Liu; I Burak Ozyurt; Lucila Ohno-Machado </span><br> <span class="journal">Journal:  Sci Data </span>      <span class="year">Date:  2017-06-06 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /><h2><span class="s2"></span> <a href="si.php?db=pubmed&id=26269118" target="_blank" style="cursor:pointer;"><span style="font-weight:500;font-size:15px;color:#337AB7;">10.  
Comparison of serious adverse events posted at ClinicalTrials.gov and published in corresponding journal articles.</span></a></h2> <span class="author">Authors:  Eve Tang; Philippe Ravaud; Carolina Riveros; Elodie Perrodeau; Agnes Dechartres </span><br> <span class="journal">Journal:  BMC Med </span>      <span class="year">Date:  2015-08-14 </span><br><hr style="padding:0px;margin:10px;margin-left:0px;" /> </div> </div> </div> </div> </div> </body> </html>