Pascal Steffen, Jemma Wu, Shubhang Hariharan, Hannah Voss, Vijay Raghunath, Mark P Molloy, Hartmut Schlüter.
Abstract
Proteomics and genomics discovery experiments generate increasingly large result tables, so researchers need more time to convert the biological data into new knowledge. Literature review is an important step in this process and can be tedious for large-scale experiments. Making an informed, strategic decision about which biomolecule targets to pursue in follow-up experiments thus remains a considerable challenge. To streamline and formalise literature retrieval and analysis for discovery-based 'omics data, and to support decisions about follow-up experiments, we present OmixLitMiner, a package written in the programming language R. The tool automates the retrieval of literature from PubMed based on UniProt protein identifiers, gene names and their synonyms, combined with a user-defined contextual keyword search (e.g., gene ontology based). The search strategy can be set to either strict or more lenient literature retrieval, and the outputs are assigned to three categories describing how well characterized a regulated gene or protein is. The category supports the decision as to which genes or proteins merit follow-up experiments to gain new knowledge, and helps to avoid pursuing already known biomarkers. We demonstrate the tool's usefulness in this retrospective study assessing three cancer proteomics publications and one cancer genomics publication. Using the tool, we were able to corroborate most of the decisions in these papers and to detect additional biomolecule leads that may be valuable for future research.
Keywords: bioinformatics; data mining; genomics; literature retrieval; mass spectrometry; proteomics
Year: 2020 PMID: 32092871 PMCID: PMC7073124 DOI: 10.3390/ijms21041374
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Extract of the OmixLitMiner output of the three comparisons found in the supplement of Martinez-Aguilar et al. [9] using the keyword ‘cancer’.
Keyword: Cancer. The last four columns are the OmixLitMiner results.

| UniProt ID | Gene Name | Summary from Martinez-Aguilar | Total | Category | False | Comments |
|---|---|---|---|---|---|---|
| Q15582 | | Was elevated in 6/8 FTC samples compared to FA samples. Interested in it as a potential marker for the progression of adenoma to malignancy in follicular thyroid tumours | 18 | 1 | 1 | |
| P12830 | | PTC tumours express lower levels of E-cadherin | 204 | 1 | 4 | |
| Q07654 | | Dysregulated in both PTC and FTC (compared to normal) | 22 | 1 | 1 | 6 papers were published after Martinez-Aguilar |
| P60709 | | Overexpressed in PTC, associated with actin cytoskeleton remodeling | 4 | 1 | 0 | One paper was published after Martinez-Aguilar |
| P07585 | | Decorin was lost in both FA and FTC, but not PTC. Decorin expression supposedly decreases metastasis | 2 | 2 | 0 | Both papers were published after Martinez-Aguilar |
| P08294 | | Dysregulated in both PTC and FTC (compared to normal) | 3 | 2 | 0 | One paper was published after Martinez-Aguilar |
| O15511 | | Overexpressed in PTC, associated with actin cytoskeleton remodeling | 0 | 3 | 0 | |
Extract of the OmixLitMiner output of the differentially expressed proteins at both distant sites compared to the primary tumour as reported in the supplement of Hanel et al. [10] using the keyword ‘Cancer’.
Keyword: Cancer. The last four columns are our results.

| UniProt ID | Gene Name | Summary from Hanel et al. [ | Total | Category | False | Comments |
|---|---|---|---|---|---|---|
| P23588 | | Was strongly upregulated at both sites, and is upregulated by a downstream signaling cascade initiated by | 2 | 2 | 0 | |
| Q9C005 | | Was strongly upregulated at both sites | 2 | 2 | 0 | One publication after Hanel et al. [ |
| P47929 | | Was strongly downregulated at both sites | 0 | 3 | 0 | |
| Q9WTL7 | | Was upregulated at both sites. The paper has also recognised that it is currently under investigation as a potential anti-cancer target | 0 | 3 | 0 | This version of the LYPLA2 protein is found in mice, not humans. As a mouse model was used in their investigation, its inclusion may or may not have been intentional |
| P41219 | | Was downregulated at both sites | 0 | 3 | 0 | |
| P08553 | | Was strongly downregulated at both sites, though its involvement in metastasis is unclear | 0 | 3 | 0 | This version of the NEFM protein is found in mice, not humans. |
Extract of the OmixLitMiner output for the list of 55 proteins reported in the supplement of Mori et al. [11] using the keyword ‘Metastasis’.
Keyword: Metastasis. The last four columns are the OmixLitMiner results.

| UniProt ID | Gene Name | Summary from Mori et al. [ | Total | Category | False | Comments |
|---|---|---|---|---|---|---|
| P15311 | | Was prominently associated with metastasis and is functionally involved in the ‘development process’. mRNA expression was also elevated in CRC with LN Metastasis | 1 | 2 | 0 | Published subsequent to Mori et al. [ |
| Q12906 | | Was prominently associated with metastasis and is functionally involved in the ‘development process’ | 2 | 2 | 0 | |
| P06753-2 | | Was prominently associated with metastasis and is functionally involved in the ‘development process’ | 11 | 0 | 11 | This is one of the isoforms of |
| P05783 | | Was prominently associated with metastasis and is functionally involved in the ‘development process’ | 0 | 3 | 0 | |
Figure 1. Distribution of the different categories for all proteins analysed by the OmixLitMiner tool. (a) The results of Martinez-Aguilar et al. [9]; (b) the results of Hanel et al. [10]; (c) the results of Mori et al. [11]. Category 1: review articles found; Category 2: original papers found, but no reviews; Category 3: no articles found; Category 0: protein absent in reviewed UniProt database (see Section 4.2 for details).
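
The category definitions in the Figure 1 caption amount to a small decision rule, sketched below in Python. This is an illustration only: OmixLitMiner itself is an R package, and the function name `assign_category` and its parameters are hypothetical, not taken from the tool.

```python
def assign_category(in_reviewed_uniprot: bool, n_reviews: int, n_original: int) -> int:
    """Map literature counts to the categories listed in the Figure 1 caption.

    All names here are hypothetical; only the four category definitions
    come from the caption.
    """
    if not in_reviewed_uniprot:
        return 0  # protein absent in the reviewed UniProt database
    if n_reviews > 0:
        return 1  # review articles found
    if n_original > 0:
        return 2  # original papers found, but no reviews
    return 3      # no articles found
```

Under this rule a protein with no hits at all falls into Category 3, and an identifier that is absent from the reviewed UniProt database falls into Category 0 regardless of how many papers were retrieved.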
Figure 2. Schematic workflow showing the different steps the OmixLitMiner tool goes through before generating an output file.
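
One step of such a workflow, combining gene names and synonyms with a contextual keyword into a PubMed search term, could be sketched as follows. This is a sketch under stated assumptions, not the tool's actual code: OmixLitMiner is written in R, the function `build_pubmed_term` is hypothetical, and mapping "strict" to a title-only search and "lenient" to a title/abstract search is an assumption made for illustration. The `[Title]` and `[Title/Abstract]` field tags are standard PubMed search syntax.

```python
def build_pubmed_term(names, keyword, strict=True):
    """Build a PubMed search term from gene/protein names and a keyword.

    Assumption (not from the source): "strict" searches titles only,
    "lenient" searches titles and abstracts.
    """
    if not names:
        raise ValueError("at least one gene or protein name is required")
    field = "Title" if strict else "Title/Abstract"
    # Any of the names or synonyms may match ...
    name_clause = " OR ".join(f'"{name}"[{field}]' for name in names)
    # ... but the contextual keyword must also be present.
    return f'({name_clause}) AND "{keyword}"[{field}]'


# Illustrative input: E-cadherin (gene symbol CDH1) with the keyword "cancer".
term = build_pubmed_term(["E-cadherin", "CDH1"], "cancer")
print(term)
```

The resulting string can be pasted into the PubMed search box or passed as the `term` parameter of an NCBI E-utilities ESearch request.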