Yanhui Hu, Verena Chung, Aram Comjean, Jonathan Rodiger, Fnu Nipun, Norbert Perrimon, Stephanie E Mohr.
Abstract
The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name because of gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify which MeSH terms are represented in the set of publications associated with a given gene of interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH term, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
Keywords: Co-citation network; Enrichment analysis; Literature mining; Model organisms; Pubmed derivatives
Year: 2020 PMID: 33028629 PMCID: PMC7718760 DOI: 10.1534/g3.120.401775
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. The BioLitMine literature mining and annotation informatics pipeline. Literature and gene associations are imported monthly and automatically from the MEDLINE database and NCBI PubMed release files. Literature records are then filtered to remove publications not associated with any gene. The resulting set of gene-associated records is then associated with pathway annotations and processed. DIOPT ortholog mapping is used to associate genes with orthologs in other species. All information is imported into a MySQL database and presented to users via an online user interface that facilitates search and retrieval using gene, MeSH term, people, and other information.
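The filtering and ortholog-mapping steps described in the caption can be sketched as follows. This is a minimal illustration, not the BioLitMine implementation: the `Record` type, both function names, and the toy ortholog table are hypothetical, and the real pipeline reads MEDLINE/PubMed release files and uses DIOPT rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stand-in for one literature record."""
    pmid: int
    mesh_terms: list = field(default_factory=list)
    gene_ids: list = field(default_factory=list)


def filter_gene_associated(records):
    """Drop publications that are not associated with any gene."""
    return [r for r in records if r.gene_ids]


def add_orthologs(gene_ids, ortholog_map):
    """Expand a gene list with orthologs from a lookup table (assumed format)."""
    expanded = set(gene_ids)
    for gene in gene_ids:
        expanded.update(ortholog_map.get(gene, ()))
    return expanded


# Toy data for illustration only.
records = [
    Record(pmid=1, mesh_terms=["Germ Cells"], gene_ids=["vas"]),
    Record(pmid=2, mesh_terms=["Neoplasms"], gene_ids=[]),
]
kept = filter_gene_associated(records)
orthologs = add_orthologs(["vas"], {"vas": ["DDX4"]})
```

Only the record with at least one gene survives the filter, and the ortholog step widens the gene set so that later searches can match publications from other species.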
Figure 2. BioLitMine gene search. A. Example gene-to-MeSH term workflow using “Drosophila” as the selected species and “vasa” as the input gene (top panel). MeSH terms associated with the input gene or any of its orthologs can be displayed and filtered to select a specific sub-category of terms (middle panel). In this example, only “anatomy” MeSH terms are displayed. Associated publications are provided as active links to records at NCBI PubMed. Users can also click to view the results in an alternative ‘word cloud’ visualization (overlay panel) and can download tabular results as a comma-separated value (csv) file. B. Example search for genes co-cited with an input gene. As for (A), “Drosophila” was selected as the species and “vasa” as the input gene (top panel). Tabular results display genes co-cited with the input gene (bottom panel). Associated publications are provided as active links to records at NCBI PubMed. Users can also click to view the results in an alternative network visualization format (overlay panel) and can download tabular results as a csv file. C. Multiple genes can be searched using the Batch Search option. As shown in the example output, a count of publications is shown for each gene, along with links to the associated PubMed records.
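The co-citation search in panel B amounts to finding genes that share publications with the input gene. A minimal sketch, assuming gene-to-publication annotations are available as a mapping from gene symbol to a set of PMIDs; the function name and the toy PMIDs are hypothetical and do not reflect the tool's database schema.

```python
from collections import Counter


def co_cited_genes(query, gene_to_pmids):
    """Rank genes by the number of publications they share with `query`."""
    query_pmids = gene_to_pmids.get(query, set())
    counts = Counter()
    for gene, pmids in gene_to_pmids.items():
        if gene == query:
            continue
        shared = len(query_pmids & pmids)
        if shared:
            counts[gene] = shared
    # Highest shared-publication count first.
    return counts.most_common()


# Toy annotations; the PMIDs are placeholders, not real records.
gene_to_pmids = {
    "vas": {1, 2, 3},
    "osk": {2, 3},
    "nos": {3, 9},
    "w": {7},
}
ranked = co_cited_genes("vas", gene_to_pmids)
```

Each pair in `ranked` can serve as a weighted edge from the input gene, which is the information a co-citation network view needs.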
Figure 3. BioLitMine medical subject heading (MeSH) search. Example MeSH term-to-gene search using “human” as the selected species and “breast neoplasm” as the input term (top panel). Users are first given the option to choose ‘child’ terms in addition to the input term (not shown). Next, a table of results is displayed. Associated publications are provided as active links to records at NCBI PubMed. Users can download tabular results as a csv file. In addition, the resulting gene list can be mapped to orthologs in another species (bottom overlay panels; “zebrafish” in this example).
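Because MeSH is hierarchical, the 'child' terms of an input term can be found by prefix-matching tree numbers. The sketch below illustrates that idea under stated assumptions: the tree numbers are made-up placeholders (not values from a MeSH release), the gene annotations are toy data, and both function names are hypothetical.

```python
def descendant_terms(term, tree_numbers):
    """Return terms whose tree number extends one of `term`'s tree numbers."""
    roots = tree_numbers[term]
    found = set()
    for other, numbers in tree_numbers.items():
        if other == term:
            continue
        if any(n.startswith(r + ".") for n in numbers for r in roots):
            found.add(other)
    return found


def genes_for_terms(terms, term_to_genes):
    """Union of the genes annotated to any of the selected terms."""
    genes = set()
    for t in terms:
        genes |= term_to_genes.get(t, set())
    return genes


# Placeholder tree numbers and toy gene annotations for illustration.
tree_numbers = {
    "Breast Neoplasms": ["T01.10"],
    "Breast Neoplasms, Male": ["T01.10.05"],
    "Stem Cells": ["T02.20"],
}
term_to_genes = {
    "Breast Neoplasms": {"BRCA1", "ERBB2"},
    "Breast Neoplasms, Male": {"BRCA2"},
    "Stem Cells": {"POU5F1"},
}
children = descendant_terms("Breast Neoplasms", tree_numbers)
gene_list = genes_for_terms({"Breast Neoplasms"} | children, term_to_genes)
```

Matching on `r + "."` rather than on `r` alone avoids treating a sibling such as `T01.100` as a child of `T01.10`.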
Figure 4. BioLitMine people search. Example of a pathway-to-people search using “mouse” as the selected species and “Notch” as the selected pathway (top panel). Gene-to-people searches are also supported (not shown). Results are shown in a table (bottom panel) that includes a list of genes associated with both the pathway and the person (last author); the count of genes and publications associated with both the pathway and person; a link to the most recent relevant publication and the year of that publication; and an address extracted from the most recent publication. Results can be downloaded as a csv file.
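The results table described above can be approximated by grouping pathway-relevant publications by last author. This is a sketch under assumptions: the publication dictionaries, their field names, and the function name are hypothetical, and address extraction is omitted.

```python
def people_for_pathway(pathway_genes, publications):
    """Summarize last authors whose publications cite genes in the pathway."""
    summary = {}
    for pub in publications:
        shared = pathway_genes & set(pub["genes"])
        if not shared:
            continue
        person = pub["authors"][-1]  # last author, as in the results table
        entry = summary.setdefault(
            person, {"genes": set(), "pmids": set(), "latest": (0, None)}
        )
        entry["genes"] |= shared
        entry["pmids"].add(pub["pmid"])
        # Track (year, pmid) of the most recent relevant publication.
        entry["latest"] = max(entry["latest"], (pub["year"], pub["pmid"]))
    return summary


# Toy records; author names and PMIDs are placeholders.
publications = [
    {"pmid": 11, "year": 2015, "genes": ["Notch1", "Hes1"],
     "authors": ["A. First", "B. Last"]},
    {"pmid": 12, "year": 2019, "genes": ["Dll1"],
     "authors": ["C. First", "B. Last"]},
    {"pmid": 13, "year": 2018, "genes": ["Trp53"],
     "authors": ["D. Solo"]},
]
notch_genes = {"Notch1", "Dll1", "Hes1"}
people = people_for_pathway(notch_genes, publications)
```

The gene count and publication count shown per person are then `len(entry["genes"])` and `len(entry["pmids"])`.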
Figure 5. BioLitMine enrichment analysis. Example of an enrichment analysis of a gene list. In this example, the input is a list of 208 autophagy-associated genes from the Gene List Annotation for Drosophila (GLAD) online resource, and the output was limited to MeSH terms in the category “phenomena and processes” (top panel). As expected, enrichment analysis identifies “autophagy” as a highly enriched term (bottom panel). The results can be downloaded as a csv file.
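The text here does not state which statistical test underlies the enrichment P-values. A one-sided hypergeometric test is a common choice for over-representation analysis and is assumed in the sketch below; the function name and the toy numbers are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background
    K: background genes annotated with the MeSH term
    n: genes in the input list
    k: input-list genes annotated with the MeSH term
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 background genes, 3 annotated, 4 in the input list,
# 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(k=2, N=10, K=3, n=4)
```

Terms are then ranked by ascending P-value; in practice a multiple-testing correction would usually be applied across all tested terms.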
Comparison of the top 25 terms of GO enrichment analysis and MeSH enrichment analysis results for the set of autophagy genes provided at GLAD
| GO “Biological Process” enrichment | MeSH “Phenomena and Processes” enrichment |
|---|---|
| | Glycogenolysis |
| | Cellular Microenvironment |
| | Protein Interaction Maps |
| cellular response to starvation | |
| | MAP Kinase Signaling System |
| | Proteolysis |
| | Cell Communication |
| cellular response to DNA damage stimulus | |
| cellular response to nitrogen starvation | |
| positive regulation of RNA polymerase II transcriptional preinitiation complex assembly | |
| neuron remodeling | Gene Expression Regulation, Enzymologic |
| determination of adult lifespan | |
| regulation of growth | Larva |
| regulation of cell growth | Down-Regulation |
| | Lipid Metabolism |
| phosphatidylinositol dephosphorylation | Protein Binding |
| Rab protein signal transduction | Mitosis |
| | Amino Acid Sequence |
| | Enzyme Activation |
Note: Terms shown in bold were found using both enrichment analysis methods; these might be similar terms with different wording, e.g., “Positive regulation of cell size” (GO term) and “Cell Size” (MeSH phenomena and process term). Each list is sorted in ascending order by enrichment P-values. GLAD, Gene List Annotation for Drosophila (https://www.flyrnai.org/tools/glad/).
Figure 6. The trend of anatomy MeSH terms in MEDLINE publications for human, mouse, zebrafish and Drosophila studies. The plot shows the count over time of gene-associated publications annotated with anatomy MeSH terms for human, mouse, zebrafish and Drosophila. MeSH terms are organized in a hierarchical structure; for this analysis, we grouped publications based on root-level anatomy terms.
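Grouping by root-level anatomy terms can be sketched by truncating each MeSH tree number to its first segment. Anatomy terms sit in MeSH category A, so their tree numbers begin with "A". The record layout, the function name, and the specific tree numbers below are illustrative assumptions, not the analysis code behind the figure.

```python
from collections import Counter


def anatomy_root_counts(publications):
    """Count publications per (year, root-level anatomy branch)."""
    counts = Counter()
    for pub in publications:
        # Keep anatomy tree numbers and reduce each to its root, e.g. "A08".
        roots = {
            tn.split(".")[0]
            for tn in pub["tree_numbers"]
            if tn.startswith("A")
        }
        # A publication counts once per root, even with several terms under it.
        for root in roots:
            counts[(pub["year"], root)] += 1
    return counts


# Toy records; tree numbers are illustrative placeholders.
publications = [
    {"pmid": 1, "year": 2000, "tree_numbers": ["A08.186.211", "A08.186"]},
    {"pmid": 2, "year": 2000, "tree_numbers": ["A11.872", "C04.588"]},
    {"pmid": 3, "year": 2001, "tree_numbers": ["A08.186.211"]},
]
trend = anatomy_root_counts(publications)
```

Running this per species and plotting each root's counts against year gives the kind of trend lines the caption describes.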