Heejung Yang, Namgil Lee, Beomjun Park, Jinyoung Park, Jiho Lee, Hyeon Seok Jang, Hojin Yoo.
Abstract
Biomedical databases grow by more than a thousand new publications every day. This volume of literature, published at an unprecedented rate, makes it difficult to discover relevant knowledge from keywords of interest, gather new insights, and form hypotheses. The text-mining tool PubTator automatically annotates bioentities, such as species, chemicals, genes, and diseases, in PubMed abstracts and full-text articles. However, manually re-organizing and analyzing these bioentities is a non-trivial and highly time-consuming task. ChexMix was designed to extract the unique identifiers of bioentities from query results. Herein, ChexMix was used to construct a taxonomic tree of allied species among Korean native plants and to extract the Medical Subject Headings (MeSH) unique identifiers of the bioentities that co-occurred with the keywords in the same literature. ChexMix discovered the allied species related to a keyword of interest, and its usefulness for multi-species analysis was demonstrated experimentally.
Year: 2022 PMID: 35550589 PMCID: PMC9098521 DOI: 10.1038/s41598-022-12093-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Network and hierarchical tree of biomedical entities using ChexMix.
Figure 2. (A) The recommendation process for Korean native plants related to the query keyword using ChexMix. (B) Network obtained by entering ‘amentoflavone’ as the input keyword in ChexMix. The unique identifiers (TaxID, pale green nodes) of species co-occurring with the input keyword in the literature are linked to their higher taxonomic rank (genus, sky blue nodes). Orange nodes represent species names that appear only in the list of Korean medicinal plants of the KPEB; they are linked to the node of the genus to which each species belongs. (C) Detailed subnetwork under the Viburnum genus. Each node is displayed as ‘ID: name’, giving the TaxID and the genus or species name. The networks were drawn with Gephi software (ver. 0.9.2, https://gephi.org/)[30].
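The linking rule described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not ChexMix code: the function name `build_genus_network`, the numeric identifiers, and the non-Viburnum species name are hypothetical placeholders, and the genus is taken from the first word of the species name as a simplifying assumption, whereas the real tool would presumably use the taxonomy database.

```python
def build_genus_network(literature_species, native_species):
    """Link species to genus nodes and colour them as in the Figure 2 caption.

    literature_species: dict mapping a (placeholder) TaxID to a species name
        that co-occurs with the keyword in the literature.
    native_species: iterable of species names from a native-plant list.
    """
    nodes = {}     # node id -> {"label": ..., "color": ...}
    edges = set()  # (species node id, genus node id)
    literature_names = set(literature_species.values())

    # Species found in the literature: pale green, linked to a sky blue genus.
    for taxid, name in literature_species.items():
        genus = name.split()[0]  # assumption: genus is the first word
        species_id = f"tax:{taxid}"
        genus_id = f"genus:{genus}"
        nodes[species_id] = {"label": f"{taxid}: {name}", "color": "pale green"}
        nodes[genus_id] = {"label": genus, "color": "sky blue"}
        edges.add((species_id, genus_id))

    # Native species that share a genus with a literature hit but were not
    # themselves found in the literature: orange (candidate allied species).
    for name in native_species:
        genus_id = f"genus:{name.split()[0]}"
        if genus_id in nodes and name not in literature_names:
            native_id = f"native:{name}"
            nodes[native_id] = {"label": name, "color": "orange"}
            edges.add((native_id, genus_id))

    return nodes, edges


# Invented inputs for illustration only; they are not results from the paper.
literature = {1001: "Viburnum erosum"}  # 1001 is a placeholder, not a real TaxID
native_list = ["Viburnum erosum", "Viburnum carlesii",
               "Viburnum dilatatum", "Genusb speciesx"]

nodes, edges = build_genus_network(literature, native_list)
candidates = sorted(n["label"] for n in nodes.values() if n["color"] == "orange")
print(candidates)  # ['Viburnum carlesii', 'Viburnum dilatatum']
```

The orange nodes are the allied species a user would then examine further: they belong to a genus that the literature connects to the keyword, but they were not themselves reported alongside it.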
Figure 3. (A) Chemical structure of amentoflavone. (B) Chromatograms of the five samples with the highest amentoflavone content determined as described in the “Methods” section. AMEN, amentoflavone; VCL, leaves of Viburnum carlesii; VDF, fruits of V. furcatum; VDSt, leaves of V. dilatatum; VEL, leaves of V. erosum; VESt, stems of V. erosum.
Figure 4. (A) Network obtained using ‘taxus cuspidata’ and ‘Podophyllum peltatum’ as input keywords in ChexMix. MeSH terms co-occurring in the literature with the input keywords were reorganized according to the hierarchy rules of the MeSH Tree Structures in the MeSH browser (https://meshb-prev.nlm.nih.gov/treeView). Nodes for bioentities that co-occurred with both keywords are colored orange. (B) Details of the subnetwork of bioentities that co-occurred with both keywords. Each node is displayed as ‘Tree Number: MeSH Heading’, giving the MeSH identifier and the MeSH term. The networks were drawn with Gephi software (ver. 0.9.2, https://gephi.org/)[30].
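The hierarchy and overlap described in the Figure 4 caption can be sketched in a few lines of Python. The sketch assumes that a MeSH tree number is a dot-separated string whose parent is obtained by dropping the last segment. The function names, tree numbers (`X01...`), and headings below are hypothetical placeholders, and this is not the ChexMix implementation.

```python
def parent_tree_number(tree_number):
    """Return the parent tree number, or None for a top-level node."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def build_mesh_tree(terms_a, terms_b):
    """Build parent-child edges and find terms shared by two keywords.

    terms_a, terms_b: dicts mapping a tree number to a MeSH heading, one
        dict per input keyword.
    Returns (edges, shared), where edges is a set of (parent, child) tree
    numbers and shared holds tree numbers present for both keywords.
    """
    shared = set(terms_a) & set(terms_b)
    edges = set()
    for tree_number in set(terms_a) | set(terms_b):
        child = tree_number
        parent = parent_tree_number(child)
        # Walk up to the root so intermediate levels are connected too.
        while parent is not None:
            edges.add((parent, child))
            child, parent = parent, parent_tree_number(parent)
    return edges, shared


def node_label(tree_number, heading):
    """Format a node as 'Tree Number: MeSH Heading', as in the caption."""
    return f"{tree_number}: {heading}"


# Placeholder tree numbers and headings, invented for illustration only.
keyword_a = {"X01.100.200": "Heading A", "X01.100.300": "Heading B"}
keyword_b = {"X01.100.200": "Heading A", "X01.400": "Heading C"}

edges, shared = build_mesh_tree(keyword_a, keyword_b)
for tree_number in sorted(shared):
    print(node_label(tree_number, keyword_a[tree_number]), "(orange)")
```

Walking each tree number up to its root means that two terms from different keywords end up in the same subtree whenever they share an ancestor, which is what lets the overlap be read directly from the drawn network.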