Stefano Pirrò, Emanuela Gadaleta, Andrea Galgani, Vittorio Colizzi, Claude Chelala.
Abstract
High-throughput technologies have produced a large amount of experimental and biomedical data, creating an urgent need for comprehensive and automated mining approaches. To meet this need, we developed SMAC (SMart Automatic Classification method): a tool to extract, prioritise, integrate and analyse biomedical and molecular data according to user-defined terms. The robust ranking step performed on Medical Subject Headings (MeSH) ensures that papers are prioritised based on specific user requirements. SMAC then retrieves any related molecular data from the Gene Expression Omnibus and performs a wide range of bioinformatics analyses to extract biological insights. These features make SMAC a robust tool to explore the literature around any biomedical topic. SMAC can easily be customised/expanded and is distributed as a Docker container (https://hub.docker.com/r/hfx320/smac) ready to use on Windows, Mac and Linux OS. SMAC's functionalities have already been adapted and integrated into the Breast Cancer Now Tissue Bank bioinformatics platform and the Pancreatic Expression Database.
Year: 2019 PMID: 31324861 PMCID: PMC6642118 DOI: 10.1038/s41598-019-47046-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic representation of the workflow performed by SMAC.
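
The abstract states that papers are ranked on their MeSH headings against user-defined terms, but this record does not describe the scoring itself. As a rough illustration of the idea only, the sketch below ranks papers by Jaccard overlap between their MeSH headings and the query terms. The Jaccard score, the function names and the PMIDs are all assumptions made for this example and should not be read as SMAC's actual ranking method.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of terms (case-insensitive)."""
    set_a = {term.lower() for term in a}
    set_b = {term.lower() for term in b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_papers(papers, query_terms):
    """Return (pmid, score) pairs sorted by decreasing score, ties by PMID.

    `papers` maps a PMID to its list of MeSH headings.
    """
    scored = [(pmid, jaccard(mesh, query_terms)) for pmid, mesh in papers.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical PMIDs and headings, used only to exercise the function.
papers = {
    "1001": ["Breast Neoplasms", "Gene Expression Profiling", "Humans"],
    "1002": ["Pancreatic Neoplasms", "Humans"],
    "1003": ["Breast Neoplasms", "Humans"],
}
ranking = rank_papers(papers, ["Breast Neoplasms", "Gene Expression Profiling"])
for pmid, score in ranking:
    print(pmid, round(score, 3))
```

A set-overlap score is the simplest choice that rewards shared headings and penalises unrelated ones; any weighting of headings would need details that this record does not give.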
Description of the metadata retrieved for each publication.
| Type of information | Description |
|---|---|
| PMID | Unique identifier number for publications stored in PubMed |
| Title | Full title of the publication |
| Authors | List of authors, separated by comma |
| Journal | Full name of the journal that published the paper |
| Date of publication | Date of publication is always reported using the |
| MeSH headings | List of the medical subject headings associated with the publication |
| GSE codes | List of the GEO datasets linked to the PMID |
| Platforms | Experimental platforms used for generating the data |
| ftp-links | Web link for the direct download of the GEO raw data |
| Analyses | List of analyses performed by SMAC on each tuple |
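
The metadata fields in the table above map naturally onto a small record type. The following is a minimal sketch of such a record as a Python dataclass; the class name, field names and field types are assumptions chosen for illustration, and the date is kept as a plain string because its format is not specified here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PublicationRecord:
    """One row of per-publication metadata (hypothetical structure)."""
    pmid: str                      # PubMed identifier
    title: str                     # full title of the publication
    authors: List[str]             # one entry per author
    journal: str                   # full journal name
    publication_date: str          # format not specified in the source
    mesh_headings: List[str] = field(default_factory=list)
    gse_codes: List[str] = field(default_factory=list)   # linked GEO series
    platforms: List[str] = field(default_factory=list)   # experimental platforms
    ftp_links: List[str] = field(default_factory=list)   # raw-data download links
    analyses: List[str] = field(default_factory=list)    # analyses performed

    def authors_field(self) -> str:
        """Authors as a single comma-separated string, as in the table."""
        return ", ".join(self.authors)


# Placeholder values; list fields are left empty rather than invented.
record = PublicationRecord(
    pmid="31324861",
    title="(title placeholder)",
    authors=["Stefano Pirrò", "Claude Chelala"],
    journal="Scientific Reports",
    publication_date="2019",
)
print(record.authors_field())
```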
Figure 2. Bioinformatics analyses performed by SMAC. The Principal Component Analysis (A) highlights the key sources of variation. The gene expression heatmap (B) shows the normalised expression levels of the most variable genes. An interactive gene network (C) reflects the association rate among the genes in the selected publications. The cellular purity of cancer samples is presented in a single, interactive scatterplot (D). Two interactive barplots (E) show the percentage of breast cancer samples belonging to each molecular subtype and receptor status profile.
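
To make panels A and B more concrete, the sketch below selects the most variable genes from an expression matrix and projects the samples onto their first principal components using a singular value decomposition. It is a generic NumPy illustration under stated assumptions (a genes-by-samples matrix of already normalised values, synthetic random data, hypothetical function names) and is not taken from SMAC's code.

```python
import numpy as np


def top_variable_genes(expr, n_top):
    """Row indices of the n_top genes with the highest variance across samples."""
    variances = expr.var(axis=1)
    return np.argsort(variances)[::-1][:n_top]


def pca(expr, n_components=2):
    """PCA of samples via SVD; `expr` is a genes x samples matrix."""
    samples = expr.T                            # samples x genes
    centred = samples - samples.mean(axis=0)    # centre each gene
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)       # fraction of variance per component
    return scores, explained[:n_components]


# Synthetic data: 50 genes, 6 samples, with 5 genes shifted in the last 3 samples.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 6))
expr[:5, 3:] += 4.0

idx = top_variable_genes(expr, 10)
scores, explained = pca(expr[idx], n_components=2)
print(scores.shape, explained)
```

The rows selected by `top_variable_genes` are also the natural input for a heatmap such as the one in panel B.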
Figure 3. Semantic similarity benchmarks between SMAC and Polysearch2. A value of 0.5 is set as the minimum threshold for statistically significant comparisons.
Comparison of SMAC with other tools focused on the reanalysis of GEO datasets. The last six columns indicate the types of analyses supported.
| Tool | Description | Single/multiple | PCA | DEGs | Tumour purity | Molecular classification | Enrichment analysis | Meta-analysis |
|---|---|---|---|---|---|---|---|---|
| SMAC | Download and analyse multiple GEO datasets | Multiple | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
| GEO2R | Compares two or more groups of samples in a GEO dataset | Single | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| shinyGEO | Shiny extension of GEO2R | Single | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| GEOquery | R package for | Single | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
| ImaGEO | Meta-analyses across multiple GEO studies | Multiple | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ |
| ScanGEO | Identifies Differentially Expressed Genes across multiple GEO studies | Multiple | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ |
| GEO2Enrichr | Performs enrichment analyses on DEGs extracted from GEO datasets | Single | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ |
| BART | Download and analyse microarray data from GEO | Multiple | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
Figure 4. Evaluation of the computational burden for the download and analysis tasks. Both curves follow a quadratic (second-degree polynomial) trend, shown as dashed lines.
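
A quadratic trend of the kind described in the caption can be fitted by least squares with `numpy.polyfit`. The sketch below uses synthetic timings generated from a known quadratic, purely to show the fitting step; the numbers, variable names and units are assumptions and are not the measurements behind Figure 4.

```python
import numpy as np

# Synthetic example: number of datasets vs. elapsed seconds (not real data).
n_datasets = np.array([1, 2, 4, 8, 16], dtype=float)
seconds = 0.5 * n_datasets ** 2 + 3.0 * n_datasets + 10.0

# Least-squares fit of a second-degree polynomial: a*x^2 + b*x + c.
coeffs = np.polyfit(n_datasets, seconds, deg=2)
predict = np.poly1d(coeffs)

print(coeffs)        # fitted [a, b, c]
print(predict(10))   # estimated time for 10 datasets
```

With real timings the fit would not be exact, and comparing residuals against a linear fit would show whether the quadratic term is actually needed.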