
Available Software for Meta-analyses of Genome-wide Expression Studies.

Diego A Forero.

Abstract

Advances in transcriptomic methods have led to a large number of published Genome-Wide Expression Studies (GWES), in humans and model organisms. For several years, GWES involved the use of microarray platforms to compare genome-expression data for two or more groups of samples of interest. Meta-analysis of GWES is a powerful approach for the identification of differentially expressed genes in biological topics or diseases of interest, combining information from multiple primary studies. In this article, the main features of available software for carrying out meta-analysis of GWES have been reviewed and seven packages from the Bioconductor platform and five packages from the CRAN platform have been described. In addition, nine previously described programs and four online programs are reviewed. Finally, advantages and disadvantages of these available programs and proposed key points for future developments have been discussed.
© 2019 Bentham Science Publishers.

Keywords:  Genomics; bioinformatics; genome-wide expression; meta-analysis; microarray experiment; transcriptomics

Year:  2019        PMID: 32476989      PMCID: PMC7235394          DOI: 10.2174/1389202920666190822113912

Source DB:  PubMed          Journal:  Curr Genomics        ISSN: 1389-2029            Impact factor:   2.236


INTRODUCTION

Advances in transcriptomic methods have led to a large number of published Genome-Wide Expression Studies (GWES), in humans and model organisms [1]. Broad application of international guidelines, such as the Minimum Information About a Microarray Experiment (MIAME) [2], has facilitated transparency in the reporting of results from GWES [3]. For several years, GWES were mainly based on the use of microarray platforms (for example, the chips developed by commercial companies such as Affymetrix and Illumina, which have tens of thousands of probes targeting a large number of transcripts) [4] to compare genome-wide expression data for two or more groups of samples of interest [5]. In recent years, GWES have also involved sequencing of the transcriptome (RNA-seq), which is based on the use of next-generation sequencing platforms [6]. Repositories of GWES results, with freely available, structured and complete data, are one of the best examples of open science [7], which is beneficial for replication of initial findings and for meta-research [8, 9]. In the latest version of NCBI GEO (https://www.ncbi.nlm.nih.gov/geo), there is information for more than 3 million samples from more than 112,000 series, originating from 19,000 platforms [10, 11]. ArrayExpress is an online database (https://www.ebi.ac.uk/arrayexpress), created in 2002 and maintained by the European Bioinformatics Institute [1]. A large number of recent submissions to ArrayExpress correspond to results from RNA-seq experiments, and ArrayExpress has data from 72,000 experiments, for a total of 54.4 TB of available data [1]. Several available programs are useful for different steps in the bioinformatic analysis of individual GWES, such as GEOquery [12], GEO2R [11], shinyGEO [13] and Babelomics [14], among others. Meta-analysis of GWES is a powerful approach for the identification of differentially expressed genes in biological topics or diseases of interest, combining information from multiple primary studies [3, 15-18].
Multiple bioinformatic analyses are needed to carry out a meta-analysis of GWES, such as data quality checks, inclusion of data from technical replicates, annotation of probes and statistical procedures for meta-analysis [16], among others [3]. Existing statistical methods for meta-analyses of GWES have been based on the combination of p values (such as Stouffer's and Fisher's methods), effect sizes (such as the fixed-effects and random-effects models) or ranks (such as the product of ranks and the sum of ranks) [3, 16, 19]. Several programs have been developed for carrying out meta-analysis of GWES [15, 20-23]. In addition to nominal statistical significance results, many of these programs provide the option of corrections for multiple comparisons, such as the False Discovery Rate (FDR) [24], among other bioinformatic procedures. In this article, the main features of available software for carrying out meta-analysis of GWES have been reviewed. In order to identify the available programs for meta-analysis of GWES, a search in the PubMed and Google Scholar databases [25] was carried out, which was complemented with a review of the reference lists of key original and review articles [18].
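
As an illustration of the p value combination methods and the FDR correction mentioned above, the following is a minimal Python sketch for a single gene measured in several studies. It is not taken from any of the reviewed packages: the function names are hypothetical, the p values are assumed to be one-sided and strictly between 0 and 1, and the FDR step assumes the Benjamini-Hochberg procedure.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under the null."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for an even number of df (2k):
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def stouffer_combine(pvalues, weights=None):
    """Stouffer's (inverse normal) method, optionally weighted per study."""
    norm = NormalDist()
    if weights is None:
        weights = [1.0] * len(pvalues)
    z_scores = [norm.inv_cdf(1.0 - p) for p in pvalues]
    z = sum(w * zi for w, zi in zip(weights, z_scores))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - norm.cdf(z)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p values for one gene in three primary studies.
    gene_pvalues = [0.04, 0.10, 0.03]
    print("Fisher:", fisher_combine(gene_pvalues))
    print("Stouffer:", stouffer_combine(gene_pvalues))
```

In a genome-wide setting, the combination would be applied gene by gene, and the resulting vector of combined p values would then be passed to the FDR step.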

AVAILABLE PACKAGES IN THE BIOCONDUCTOR PLATFORM

The Bioconductor platform (www.bioconductor.org) was developed as an open and collaborative resource for the development and availability of software for bioinformatics and computational biology [26]. It has been broadly used and supported by the international scientific community and the latest release contains 1649 packages. In Bioconductor, the BiocManager::install() function is used for the installation of packages. Table 1 describes available software for meta-analyses of GWES in the Bioconductor platform, although some of those packages have not been described in articles published in indexed journals. In addition to the reference manuals, these packages have tutorials or vignettes, which provide useful examples for users. One of the most used packages is RankProd, which is based on the rank product method [27] and is useful for integrating results from different microarray platforms [20, 28]. OrderedList measures the similarity between gene lists and generates random scores from perturbed data to evaluate statistical significance [21]. GeneMeta allows the use of fixed-effects or random-effects models [29] for meta-analysis of GWES data. MetaArray uses two methods (based on Markov Chain Monte Carlo techniques and the expectation-maximization algorithm) [30] for obtaining a probability of expression in a meta-analysis. Crossmeta is a package that facilitates analyses for different platforms and species, carrying out effect size and pathway meta-analyses. MetaSeq is based on the non-parametric method NOISeq [31] for carrying out meta-analysis of RNA-seq studies. Metahdep is a package for meta-analysis of GWES using fixed-effects or random-effects models, taking hierarchical dependence into account [22].
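
To make the distinction between fixed-effects and random-effects models concrete, here is a minimal Python sketch that pools per-study effect sizes for one gene. It uses inverse-variance weighting and, as an assumption, the DerSimonian-Laird estimator for the between-study variance; it is a generic illustration with hypothetical function names and does not reproduce the code of GeneMeta, metahdep or any other package.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_w
    return pooled, math.sqrt(1.0 / total_w)


def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimate."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (d - pooled_fe) ** 2 for w, d in zip(weights, effects))
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Between-study variance is added to each within-study variance.
    weights_re = [1.0 / (v + tau2) for v in variances]
    total_re = sum(weights_re)
    pooled = sum(w * d for w, d in zip(weights_re, effects)) / total_re
    return pooled, math.sqrt(1.0 / total_re), tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and variances for one gene in three studies.
    effects = [0.8, 0.2, 0.5]
    variances = [0.05, 0.04, 0.10]
    print("fixed:", fixed_effect(effects, variances))
    print("random:", random_effects_dl(effects, variances))
```

When the studies agree closely, the estimated between-study variance is zero and both models give the same result; with heterogeneous studies, the random-effects standard error is larger.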

AVAILABLE PACKAGES IN THE CRAN PLATFORM

The Comprehensive R Archive Network (CRAN) (https://cran.r-project.org) was created more than 20 years ago, as a public repository of packages for the R platform contributed by the international scientific community [32]. It currently has more than 14,000 available packages, and the install.packages() and library() functions are used for installing and loading packages, respectively. Table 2 describes available software for meta-analyses of GWES in the CRAN platform; all those packages have been described in articles published in indexed journals. Some of these programs have detailed tutorials available. RankAggreg allows combining gene lists from different studies and platforms, using the Genetic or the Cross-Entropy Monte Carlo algorithms [33]. metaMA is a package for meta-analysis using moderated effect sizes and p value combinations [34]. MetaPath facilitates pathway enrichment meta-analyses, with an exploration of significance for entire pathways or for each gene [35]. MetaRNASeq is a package for meta-analysis of RNA-seq studies, using the inverse normal and Fisher combination methods for p value combinations [36]. MetaIntegrator allows meta-analysis based on effect sizes and on the combination of p values [37].
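
As a small illustration of what a per-study effect size looks like before pooling, the sketch below computes a standardized mean difference (Hedges' g) and its approximate variance for one gene in one study. This is the plain, unmoderated estimator and is only an assumption for illustration; it is not the moderated effect size used by metaMA, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g (bias-corrected standardized mean difference) and its variance."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled standard deviation from the two sample variances.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    )
    d = (mean(group_a) - mean(group_b)) / pooled_sd
    # Small-sample correction factor.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))
    return j * d, j * j * var_d


if __name__ == "__main__":
    # Hypothetical log2 expression values for one gene: cases vs. controls.
    cases = [7.9, 8.4, 8.1, 8.8]
    controls = [7.2, 7.6, 7.5, 7.9]
    g, var_g = hedges_g(cases, controls)
    print("g =", g, "variance =", var_g)
```

The effect size and variance obtained in each study are the inputs that a fixed-effects or random-effects model would then combine across studies.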

OTHER PREVIOUSLY DESCRIBED PROGRAMS

In addition to the packages available in the Bioconductor and CRAN platforms and reviewed above, there are other freely available programs for meta-analysis of GWES, which are described in Table 3. Some of these programs are R packages that are not available on the CRAN or Bioconductor platforms, and some other programs are no longer available online. A-MADMAN is a program written in Python, available for Windows and Linux operating systems, for meta-analysis of GWES using data combination [38]. BayesPoolMicro is a program for Windows (it needs the WinBUGS software) and Linux, using a Bayesian hierarchical model for meta-analysis [39]. ICS is a program written in C++ that identifies the consistency of findings between GWES [40]. MAAMD runs on Windows and Mac OS X systems, allowing the preparation of data for meta-analysis of results obtained with the Affymetrix platforms [41], with the help of the AltAnalyze program [42]. MAID is an R package for carrying out meta-analyses for two-channel microarrays, in addition to one-channel platforms [43]. metaGEM is an R package that was developed by Ramasamy et al. for carrying out meta-analysis of GWES data, for example, using a random-effects model [3]. MetaOmics is a pipeline with several modules for different types of computational studies, including a module with 12 methods for meta-analysis of GWES [44]. METRADISC is a program for carrying out meta-analysis of GWES, based on the rank of genes between studies and using a non-parametric method (Monte Carlo permutations) [45]. MTGDR is an R package based on the meta threshold gradient descent regularization method for meta-analysis of GWES [46].
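
The general idea of a rank-based meta-analysis with Monte Carlo permutations can be sketched as follows. This Python example uses the rank product (geometric mean of a gene's ranks across studies) as the statistic and builds a null distribution by independently shuffling the ranks within each study. It is a generic sketch under these assumptions, with hypothetical function names, and is not the algorithm implemented in METRADISC or RankProd.

```python
import bisect
import math
import random


def rank_products(rank_lists):
    """Geometric mean of each gene's ranks across studies (rank 1 = strongest)."""
    k = len(rank_lists)
    n = len(rank_lists[0])
    return [
        math.exp(sum(math.log(ranks[g]) for ranks in rank_lists) / k)
        for g in range(n)
    ]


def permutation_pvalues(rank_lists, n_perm=1000, seed=0):
    """Monte Carlo p values: how often a null rank product is <= the observed one."""
    rng = random.Random(seed)
    observed = rank_products(rank_lists)
    n = len(observed)
    counts = [0] * n
    for _ in range(n_perm):
        # Shuffle ranks independently within each study to break gene identity.
        shuffled = [rng.sample(ranks, len(ranks)) for ranks in rank_lists]
        null = sorted(rank_products(shuffled))
        for g, rp in enumerate(observed):
            counts[g] += bisect.bisect_right(null, rp)
    total_null = n_perm * n
    return [(c + 1) / (total_null + 1) for c in counts]


if __name__ == "__main__":
    # Hypothetical ranks of six genes in three studies.
    study_ranks = [
        [1, 2, 3, 4, 5, 6],
        [1, 3, 2, 5, 4, 6],
        [1, 2, 4, 3, 6, 5],
    ]
    for gene, p in enumerate(permutation_pvalues(study_ranks, n_perm=500)):
        print("gene", gene, "p =", round(p, 4))
```

A gene that ranks near the top in every study obtains a small rank product, which is rarely matched when ranks are assigned at random.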

AVAILABLE ONLINE PROGRAMS

There are only four available online programs for meta-analysis of GWES. NetworkAnalyst (https://www.networkanalyst.ca) is a user-friendly online platform that accepts as input GWES results from a large number of microarray platforms for different species [23]. In addition, expression data can be entered in the program using gene symbols as identifiers (instead of probe IDs) and it allows meta-analyses of RNA-seq studies. Users have a limit of 1000 samples for meta-analyses, and NetworkAnalyst provides tools for annotation, normalization and exploration of batch effects. It has several options for meta-analytical procedures, such as random-effects and fixed-effects models, combination of p values and vote counting [23]. It was previously called INMEX [47], and two of the primary articles, [48] and [49], have 135 and 199 citations, respectively. ExAtlas (https://lgsun.irp.nia.nih.gov/exatlas) is an online program for meta-analyses of GWES that includes four statistical approaches, among them random-effects and fixed-effects models and Fisher's method [50]. It has options for helping with the extraction of data from the NCBI GEO database and for carrying out other types of bioinformatic analyses, such as correlations between datasets and gene set enrichment and overlap, among others [50]. SMAGEXP (https://github.com/sblanck/smagexp) is an initiative [51] that incorporates the metaMA [34] and metaRNAseq [36] packages into the Galaxy online platform (https://usegalaxy.org) [52, 53]. Finally, RNA Meta Analysis (https://rnama.com) is an online program that facilitates several steps, such as preprocessing and annotation, for carrying out meta-analyses of GWES.
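
Vote counting, one of the procedures listed above, is simple enough to sketch in a few lines. The Python example below counts, for each gene, the number of studies in which it is significantly up-regulated or down-regulated and keeps the genes with enough votes in a single direction. The input layout, the thresholds and the function names are assumptions for illustration and do not describe how NetworkAnalyst implements this procedure.

```python
def vote_count(study_results, alpha=0.05):
    """Count significant up/down calls per gene.

    study_results: list of dicts mapping gene -> (log_fold_change, p_value).
    Returns a dict mapping gene -> (up_votes, down_votes).
    """
    votes = {}
    for study in study_results:
        for gene, (log_fc, p_value) in study.items():
            up, down = votes.get(gene, (0, 0))
            if p_value < alpha:
                if log_fc > 0:
                    up += 1
                elif log_fc < 0:
                    down += 1
            votes[gene] = (up, down)
    return votes


def consistent_genes(votes, min_votes):
    """Genes with at least min_votes in one direction and none in the other."""
    return sorted(
        gene
        for gene, (up, down) in votes.items()
        if max(up, down) >= min_votes and min(up, down) == 0
    )


if __name__ == "__main__":
    # Hypothetical results for three genes in three studies.
    studies = [
        {"GENE_A": (1.2, 0.001), "GENE_B": (-0.8, 0.020), "GENE_C": (0.3, 0.400)},
        {"GENE_A": (0.9, 0.010), "GENE_B": (0.7, 0.030), "GENE_C": (0.5, 0.040)},
        {"GENE_A": (1.1, 0.004), "GENE_B": (-0.6, 0.200)},
    ]
    votes = vote_count(studies)
    print(votes)
    print(consistent_genes(votes, min_votes=2))
```

Vote counting ignores the magnitude of effects and the sample size of each study, which is why it is usually regarded as a coarser option than effect size or p value combination.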

EXAMPLES OF PUBLISHED META-ANALYSES USING AVAILABLE SOFTWARE

The use of available software, particularly those programs that have been initially well described in peer-reviewed publications, is quite helpful for carrying out meta-analyses of GWES, in humans and other organisms [15, 18]. It facilitates statistical analyses of genome-wide data and their replication by other researchers [15, 18]. In this section, some illustrative examples of the use of available software for meta-analyses of GWES, reported in international articles, are highlighted. Wang et al. carried out a meta-analysis of 7 GWES for Alzheimer's disease and 9 GWES for Parkinson's disease, which are available in the NCBI GEO and ArrayExpress databases. The primary studies that were included used different microarray platforms and analyzed samples from several brain tissues. The authors used the RankProd package [20] for carrying out the meta-analyses of GWES [54]. Jha et al. used 5 GWES available in the NCBI GEO database to carry out a meta-analysis for venous thrombosis, polycythemia vera and essential thrombocythemia. The authors used the metaMA package [34] to carry out the meta-analysis [55]. Piras et al. carried out a meta-analysis of 3 GWES that included samples from peripheral tissues from schizophrenia patients and controls; they used the GeneMeta package [56]. Forero et al. carried out a meta-analysis of GWES, using data from 24 previously published studies of patients with major depressive disorder and controls, with RNA extracted from different tissues and analyzed on multiple microarray platforms. They used the NetworkAnalyst program [23] for carrying out the meta-analysis of available GWES data [57]. Manchia et al. carried out a meta-analysis of 5 GWES for schizophrenia, which are available in the NCBI GEO database. One of these GWES was focused on human induced pluripotent stem cell-derived neurons and the other 4 GWES were carried out in post-mortem brain samples. They used the GeneMeta package for the meta-analysis of GWES data [58].

CONCLUSION AND FUTURE PERSPECTIVES

As discussed above, there are multiple available programs for carrying out meta-analysis of GWES. Packages on the Bioconductor and CRAN platforms have the advantage of being easy to install [26], but some of them, from the perspective of researchers in genomics, are not completely user-friendly. On the other hand, available online programs have the advantage of being user-friendly but have limitations in the number of samples or in file sizes. Given the size and complexity of recent GWES datasets (which in several cases have hundreds of samples), it would be important to have novel or updated programs that are both user-friendly and computationally powerful and that facilitate, in addition to the analysis of microarray data, the meta-analysis of large RNA-seq studies [6, 59]. In this context, broad implementation of standards for the reporting of RNA-seq experiments, similar to the MIAME guidelines, would be quite helpful [60]. Programs that have detailed use guidelines might have a higher possibility of being employed extensively and adequately by the international scientific community. As the maintenance and updating of bioinformatics software is a critical issue [61], it is important to highlight that dependence on other programs leads to some issues when running some Bioconductor and CRAN packages. A considerable number of the citations of the articles describing the different CRAN and Bioconductor packages were related to methodological developments. On the other hand, several published articles describing results of meta-analyses of GWES used in-house scripts, instead of available software [62-64]. Some of these programs can also be used for meta-analysis of other types of larger -omics datasets, such as genome-wide methylation studies [65], which have a larger number of probes.
Integration of meta-analysis of GWES with experimental approaches and with other in silico explorations [66-70] would lead to a better and deeper understanding of multiple biological processes and of pathophysiology of diseases.
Table 1

Available packages in the Bioconductor platform.

Package | Article | Citations | Depends on | Rank
RankProd | Hong 2006 | 634 | R >= 3.2.1, stats, methods, Rmpfr, gmp | 200
OrderedList | Lottaz 2006 | 68 | R >= 2.1.0, Biobase, twilight, methods | 332
GeneMeta | NA | NA | R >= 2.10, methods, Biobase, genefilter | 601
MetaArray | NA | NA | NA | 689
crossmeta | NA | NA | R >= 3.3 | 749
metaSeq | NA | NA | R >= 2.13.0, NOISeq, snow, Rcpp | 876
metahdep | Stevens 2009 | 13 | R >= 2.10, methods | 1344

NA: Not Available.

Table 2

Available packages in the CRAN platform.

Package | Article | Citations | Tutorial | Depends on
RankAggreg | Pihur 2009 | 223 | Yes | R ≥ 2.12.0, gtools
metaMA | Marot 2009 | 86 | Yes | R ≥ 3.1.2, limma, SMVar
metaPath | Shen 2010 | 69 | No | R ≥ 3.0.0, Biobase, GSEABase, genefilter, impute
metaRNASeq | Rau 2014 | 34 | Yes | R ≥ 2.15.0
MetaIntegrator | Haynes 2017 | 19 | Yes | R ≥ 3.3, rmeta, multtest, ggplot2, parallel, Rmisc, gplots, Biobase, RMySQL, DBI, stringr, preprocessCore, GEOquery, GEOmetadb, RSQLite, data.table, ggpubr, ROCR, zoo, pracma, COCONUT, Metrics, manhattanly, snplist, DT, pheatmap, plyr, boot, dplyr, reshape2, rmarkdown, AnnotationDbi, HGNChelper, magrittr, readr

NA: Not Available.

Table 3

Other previously described software.

Program | Article | Citations | Link
A-MADMAN | Bisognin 2009 | 35 | compgen.bio.unipd.it/bioinfo/amadman/
BayesPoolMicro | Conlon 2006 | 52 | www.math.umass.edu/~conlon/research/BayesPoolMicro
ICS | Rajaram 2009 | 5 | NA
MAAMD | Gan 2014 | 11 | www.biokepler.org/use_cases/maamd-workflow-standardize-meta-analyses-affymetrix-microarray-data
MAID | Borozan 2008 | 13 | NA
metaGEM | Ramasamy 2008 | NA | NA
MetaOmics | Ma 2018 | 1 | https://github.com/metaOmics/metaOmics
METRADISC | Zintzaras 2008 | 51 | NA
MTGDR | Ma 2009 | 49 | http://www.cs.uiowa.edu/~jian/MTGDR/main.html

NA: Not Available.

  69 in total

1.  Moderated effect size and P-value combinations for microarray meta-analyses.

Authors:  Guillemette Marot; Jean-Louis Foulley; Claus-Dieter Mayer; Florence Jaffrézic
Journal:  Bioinformatics       Date:  2009-07-23       Impact factor: 6.937

2.  Differential expression in RNA-seq: a matter of depth.

Authors:  Sonia Tarazona; Fernando García-Alcalde; Joaquín Dopazo; Alberto Ferrer; Ana Conesa
Journal:  Genome Res       Date:  2011-09-08       Impact factor: 9.043

3.  Pattern of gene expression in different stages of schizophrenia: Down-regulation of NPTX2 gene revealed by a meta-analysis of microarray datasets.

Authors:  Mirko Manchia; Ignazio S Piras; Matthew J Huentelman; Federica Pinna; Clement C Zai; James L Kennedy; Bernardo Carpiniello
Journal:  Eur Neuropsychopharmacol       Date:  2017-07-18       Impact factor: 4.600

Review 4.  Meta-analysis methods for genome-wide association studies and beyond.

Authors:  Evangelos Evangelou; John P A Ioannidis
Journal:  Nat Rev Genet       Date:  2013-05-09       Impact factor: 53.242

5.  Meta-Analysis in Gene Expression Studies.

Authors:  Levi Waldron; Markus Riester
Journal:  Methods Mol Biol       Date:  2016

6.  Meta-Analysis of Parkinson's Disease and Alzheimer's Disease Revealed Commonly Impaired Pathways and Dysregulation of NRF2-Dependent Genes.

Authors:  Qian Wang; Wen-Xing Li; Shao-Xing Dai; Yi-Cheng Guo; Fei-Fei Han; Jun-Juan Zheng; Gong-Hua Li; Jing-Fei Huang
Journal:  J Alzheimers Dis       Date:  2017       Impact factor: 4.472

7.  NetworkAnalyst--integrative approaches for protein-protein interaction network analysis and visual exploration.

Authors:  Jianguo Xia; Maia J Benner; Robert E W Hancock
Journal:  Nucleic Acids Res       Date:  2014-05-26       Impact factor: 16.971

8.  SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis.

Authors:  Samuel Blanck; Guillemette Marot
Journal:  Gigascience       Date:  2019-02-01       Impact factor: 6.524

9.  NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis.

Authors:  Guangyan Zhou; Othman Soufan; Jessica Ewald; Robert E W Hancock; Niladri Basu; Jianguo Xia
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

10.  Meta-research matters: Meta-spin cycles, the blindness of bias, and rebuilding trust.

Authors:  Lisa Bero
Journal:  PLoS Biol       Date:  2018-04-02       Impact factor: 8.029
