Shinji Kanazawa, Satoshi Shimizu, Shigeki Kajihara, Norio Mukai, Junko Iida, Fumio Matsuda.
Abstract
Metabolomics can help identify candidate biomarker metabolites whose levels are altered in response to disease development or drug administration. However, assessing the underlying molecular mechanism is challenging because it depends on the researcher's knowledge. This study reports a novel method for the automated recommendation of keywords that are known in the literature but may be overlooked by researchers. The proposed method identifies Medical Subject Headings (MeSH) terms in PubMed using MeSH co-occurrence data. The intended users are biocurators who have identified specific biomarker metabolites in a metabolomics study and would like to find literature-reported molecular mechanisms associated with both the metabolite and their research area of interest. The method finds MeSH terms that co-occur with the MeSH term of the candidate biomarker metabolite as well as with the MeSH term of a keyword known to the researcher, such as the name of a disease. The connectivity score S was determined using association analysis. Pilot analyses demonstrated that, while the biological significance of the obtained MeSH terms could not be guaranteed, the developed method can be useful for finding keywords to further investigate molecular mechanisms associated with candidate biomarker molecules.
Keywords: MeSH co-occurrence; Medical Subject Headings terms; association analysis; biomarker discovery; keyword recommendation
Year: 2022 PMID: 35208208 PMCID: PMC8875447 DOI: 10.3390/metabo12020133
Source DB: PubMed Journal: Metabolites ISSN: 2218-1989
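The co-occurrence step described in the abstract can be illustrated with a minimal Python sketch. It assumes that each article is represented simply as a list of MeSH terms; the function names and the toy articles are hypothetical and are not taken from the study, which works on a PubMed subset and additionally scores and filters the candidates.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(articles):
    """Count articles per MeSH term and per unordered pair of MeSH terms."""
    term_count = defaultdict(int)
    pair_count = defaultdict(int)
    for terms in articles:
        unique = sorted(set(terms))  # ignore duplicate terms within one article
        for term in unique:
            term_count[term] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1  # keys are stored in sorted order
    return term_count, pair_count


def cooccurrence(pair_count, a, b):
    """Number of articles indexed with both a and b."""
    return pair_count.get(tuple(sorted((a, b))), 0)


def candidate_keywords(articles, c, k):
    """Terms k' that co-occur with metabolite c and with known keyword k."""
    term_count, pair_count = build_cooccurrence(articles)
    return sorted(
        t for t in term_count
        if t not in (c, k)
        and cooccurrence(pair_count, c, t) > 0
        and cooccurrence(pair_count, t, k) > 0
    )


# Toy data (hypothetical): four articles with their MeSH terms.
articles = [
    ["Sarcosine", "Sarcosine Dehydrogenase"],
    ["Sarcosine Dehydrogenase", "Prostate Neoplasm"],
    ["Sarcosine", "Glycine"],
    ["Prostate Neoplasm", "Androgens"],
]
candidates = candidate_keywords(articles, "Sarcosine", "Prostate Neoplasm")
print(candidates)
```

In this toy example only "Sarcosine Dehydrogenase" is linked to both the metabolite and the disease term, even though no single article mentions all three terms together.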
Figure 1. An automated method for finding Medical Subject Heading (MeSH) terms highlighting an association between metabolome data and the researcher’s knowledge. (a) A typical metabolomics study for biomarker discovery. (b) Tasks of a researcher to find research keywords suggesting a molecular mechanism. (c) Relationships among the MeSH terms of a metabolite c obtained via metabolome analysis, a keyword k′, and the researcher’s known keyword k. (d) Novel method for keyword recommendation. The connectivity score S(c, k′, k) is determined based on the association scores A(c, k′) and A(k′, k) using the MeSH co-occurrence data derived from the PubMed subset. The significance of the connectivity score is statistically tested using the null distribution of S derived from a randomized database (DB), and the false discovery rate (FDR) is estimated by performing the Benjamini–Hochberg adjustment. MeSH terms below the threshold are retrieved and used to guide a literature search. (e) Relationship between PubMed, the PubMed subset, and the randomized DB used in this study.
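The statistical testing step in panel (d) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the empirical p-value formula (with a pseudo-count of one, and treating larger S as more extreme) is an assumption made here, and the example scores are hypothetical. The Benjamini–Hochberg adjustment itself follows its standard definition.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    Assumption: larger S is more extreme; a pseudo-count of one avoids p = 0.
    """
    at_least = sum(1 for s in null_scores if s >= observed)
    return (at_least + 1) / (len(null_scores) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four candidate terms k'.
p_values = [0.01, 0.04, 0.03, 0.20]
fdr = benjamini_hochberg(p_values)
selected = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, selected)
```

Terms whose adjusted value falls below the chosen FDR level (0.01 in the tables of this record, 0.05 in the toy call above) would be retained for the literature search.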
Comparison of the scoring methods using two example MeSH terms (false discovery rate level < 0.01).
Example 1: Sarcosine and Prostate Neoplasm (1). Example 2: Leucine and Diabetes Mellitus, Type 2 (2).

| Methods for Association Scoring | Example 1: Number of Obtained MeSH Terms | Example 1: Ranking of Dimethylglycine Dehydrogenase | Example 1: Ranking of One-Carbon Group Transferases | Example 2: Number of Obtained MeSH Terms | Example 2: Ranking of Insulin Resistance | Example 2: Ranking of Mechanistic Target of Rapamycin Complex 1 |
|---|---|---|---|---|---|---|
| Simpson | 4 | 4th | No hit | 2 | No hit | No hit |
| Lift | 0 | No hit | No hit | 0 | No hit | No hit |
| Cosine | 1 | No hit | No hit | 54 | No hit | No hit |
| Confidence (RR) | 0 | No hit | No hit | 0 | No hit | No hit |
| Confidence (RL) | 6 | No hit | No hit | 4 | No hit | No hit |
| Confidence (LR) | 5 | 3rd | 5th | 291 | 53rd | 77th |
| Confidence (LL) | 1 | No hit | No hit | 0 | No hit | No hit |
(1) MeSH terms (k′) were obtained from sarcosine (metabolite, c) and prostate neoplasm (the researcher’s known keyword, k). Results were checked by the occurrence of MeSH terms, “dimethylglycine dehydrogenase” and “one-carbon group transferases”. (2) MeSH terms (k′) were obtained from leucine (metabolite, c) and diabetes mellitus, type 2 (the researcher’s known keyword, k). Results were checked by the occurrence of MeSH terms, “insulin resistance” and “mechanistic target of rapamycin complex 1”.
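For orientation, the association measures compared in the table above can be written out from article counts. The sketch below uses the textbook definitions from association analysis; it is an assumption that these match the study's formulas exactly, and the LL, LR, RL, and RR variants of confidence are not defined in this record, so the sketch only exposes the choice of antecedent explicitly. All counts are hypothetical.

```python
import math


def simpson(n_xy, n_x, n_y):
    """Simpson (overlap) coefficient: shared articles over the smaller set."""
    return n_xy / min(n_x, n_y)


def lift(n_xy, n_x, n_y, n_total):
    """Lift: observed co-occurrence relative to that expected by chance."""
    return n_xy * n_total / (n_x * n_y)


def cosine(n_xy, n_x, n_y):
    """Cosine similarity between the two article sets."""
    return n_xy / math.sqrt(n_x * n_y)


def confidence(n_xy, n_antecedent):
    """Confidence: shared articles over the articles of the chosen antecedent."""
    return n_xy / n_antecedent


# Hypothetical counts: 5 shared articles, 20 with term x, 50 with term y,
# in a collection of 1000 articles.
n_xy, n_x, n_y, n_total = 5, 20, 50, 1000
scores = {
    "simpson": simpson(n_xy, n_x, n_y),
    "lift": lift(n_xy, n_x, n_y, n_total),
    "cosine": cosine(n_xy, n_x, n_y),
    "confidence_x": confidence(n_xy, n_x),
    "confidence_y": confidence(n_xy, n_y),
}
print(scores)
```

Confidence is the only asymmetric measure here: dividing by the count of x or of y gives different values, which is why several directional variants can be defined for a pair of associations.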
Medical Subject Heading (MeSH) terms (k′) obtained from sarcosine (metabolite, c) and prostate neoplasm (the researcher’s known keyword, k) using the confidence (LR) method at a false discovery rate (FDR) level of <0.01.
| Ranking | Obtained MeSH Terms, k′ | Co-Occurrence (c, k′) | A(c, k′) | Co-Occurrence (k′, k) | A(k′, k) | p-Value | FDR | PubMed Search Hit (1) |
|---|---|---|---|---|---|---|---|---|
| 1 | Sarcosine Dehydrogenase | 25 | 0.431 | 5 | 0.086 | 1.00 × 10⁻⁸ | 1.4 × 10⁻⁴ | 5 |
| 2 | Sarcosine Oxidase | 38 | 0.245 | 7 | 0.045 | 8.00 × 10⁻⁸ | 5.6 × 10⁻⁴ | 7 |
| 3 | Dimethylglycine Dehydrogenase | 15 | 0.326 | 1 | 0.022 | 1.70 × 10⁻⁷ | 7.9 × 10⁻⁴ | 1 |
| 4 | Glycine N-Methyltransferase | 19 | 0.075 | 14 | 0.055 | 3.00 × 10⁻⁷ | 1.0 × 10⁻³ | 6 |
| 5 | One-Carbon Group Transferases | 1 | 0.019 | 3 | 0.056 | 3.38 × 10⁻⁶ | 9.4 × 10⁻³ | 7 |
(1) From the three MeSH terms (metabolite c, known keyword k, and answer keyword k′), a query for the PubMed (https://pubmed.ncbi.nlm.nih.gov/, accessed on 28 January 2022) search was created as “sarcosine”[MeSH Terms] AND “prostate neoplasm”[MeSH Terms] AND “k′”[MeSH Terms]. PubMed searches were conducted in October 2021.
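The query construction described in this footnote can be sketched in a few lines of Python. The helper names are hypothetical; the sketch relies only on PubMed's square-bracket field tag syntax and on the `term` parameter of the PubMed web search URL, and it does not send any request.

```python
from urllib.parse import urlencode


def build_pubmed_query(*mesh_terms):
    """Join MeSH terms into a single AND query using the [MeSH Terms] field tag."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in mesh_terms)


def build_pubmed_url(query):
    """Build a PubMed web search URL for the query (no request is made)."""
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})


# c = sarcosine, k = prostate neoplasm, k' = one retrieved answer keyword.
query = build_pubmed_query("sarcosine", "prostate neoplasm", "sarcosine dehydrogenase")
url = build_pubmed_url(query)
print(query)
print(url)
```

Running one such query per retrieved k′ would give hit counts of the kind listed in the last column of the table.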
MeSH terms under the enzyme (D08.811) in the over-representation analysis of the 291 MeSH terms obtained from “Leucine” and “Diabetes Mellitus, Type 2” at a false discovery rate (FDR) level of <0.01 (1).
| MeSH Tree ID | MeSH ID | MeSH Term | Number of Obtained MeSH Terms in the Lower Hierarchy | Total Number of MeSH Terms in the Lower Hierarchy | p-Value | FDR |
|---|---|---|---|---|---|---|
| D08.811.277.656 | D010447 | Peptide Hydrolases | 28 | 358 | 5.32 × 10⁻⁶ | 2.05 × 10⁻⁵ |
| D08.811.277.656.350 | D020689 | Exopeptidases | 10 | 35 | 4.44 × 10⁻¹⁶ | 5.53 × 10⁻¹⁵ |
| D08.811.277.656.350.100 | D000626 | Aminopeptidases | 2 | 6 | 5.96 × 10⁻⁵ | 1.96 × 10⁻⁴ |
| D08.811.277.656.350.350 | D004152 | Dipeptidyl-Peptidases and Tripeptidyl-Peptidases | 2 | 3 | 1.92 × 10⁻⁹ | 1.09 × 10⁻⁸ |
| D08.811.277.656.350.555 | D045727 | Metalloexopeptidases | 3 | 10 | 4.13 × 10⁻⁶ | 1.63 × 10⁻⁵ |
| D08.811.277.656.675.555 | D045727 | Metalloexopeptidases | 3 | 10 | 4.13 × 10⁻⁶ | 1.63 × 10⁻⁵ |
| D08.811.277.656.837 | D043484 | Proprotein Convertases | 4 | 9 | 1.53 × 10⁻¹¹ | 9.91 × 10⁻¹¹ |
| D08.811.913.696.620.682.700.931 | D058570 | TOR Serine-Threonine Kinases | 3 | 5 | 4.08 × 10⁻¹² | 2.80 × 10⁻¹¹ |
| D08.811.913.696.620.682.700.931.500 | D000076222 | Mechanistic Target of Rapamycin Complex 1 | 2 | 2 | 7.02 × 10⁻¹⁴ | 5.48 × 10⁻¹³ |
(1) All MeSH terms in the over-representation analysis are available in Data S3.
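An over-representation analysis of this kind is commonly carried out with a one-sided hypergeometric test on each branch of the MeSH tree. The sketch below shows that generic test; whether the study used exactly this test is an assumption, and the size of the MeSH term universe is a hypothetical placeholder because it is not stated in this record.

```python
from math import comb


def hypergeom_upper_tail(x, branch_size, n_obtained, n_universe):
    """P(X >= x) when n_obtained terms are drawn without replacement from
    n_universe terms, of which branch_size belong to the branch of interest."""
    total = comb(n_universe, n_obtained)
    upper = min(branch_size, n_obtained)
    favourable = sum(
        comb(branch_size, i) * comb(n_universe - branch_size, n_obtained - i)
        for i in range(x, upper + 1)
    )
    return favourable / total


# Small worked case: 10 terms in total, 4 in the branch, 3 obtained,
# at least 2 of them in the branch.
p_small = hypergeom_upper_tail(2, 4, 3, 10)

# Shape of a call for one table row (10 of 35 Exopeptidases terms among the
# 291 obtained terms); UNIVERSE_SIZE is a hypothetical placeholder.
UNIVERSE_SIZE = 30_000
p_branch = hypergeom_upper_tail(10, 35, 291, UNIVERSE_SIZE)
print(p_small, p_branch)
```

The resulting p-values for all branches would then be adjusted for multiple testing to obtain an FDR column like the one in the table.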
Figure 2. Connectivity between metabolites and diseases in the literature. A metabolite and disease pair was counted when at least one MeSH term was obtained by the developed method. The results were summarized for each metabolite against 20 cancers and 19 metabolic diseases. The complete figure with metabolite names is shown in Figure S2.