
Model-based clustering of multi-tissue gene expression data.

Pau Erola^1,2, Johan L M Björkegren^3,4, Tom Michoel^1,5.

Abstract

MOTIVATION: Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or to merge tissues in a single, monolithic dataset, ignoring individual characteristics of tissues.
RESULTS: We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals.
AVAILABILITY AND IMPLEMENTATION: Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2020 PMID: 31688915 PMCID: PMC7162352 DOI: 10.1093/bioinformatics/btz805

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Clustering gene expression data into groups of genes sharing the same expression profile across multiple conditions remains one of the most important methods for reducing the dimensionality and complexity of large-scale microarray and RNA-sequencing datasets (Andreopoulos ; D’haeseleer, 2005; van Dam ). Coexpression clusters group functionally related genes together and reveal how diverse biological processes and pathways respond to the underlying perturbation of the biological system of interest. Traditionally, clustering is performed by collecting data from multiple experimental treatments (Eisen ), time points (Spellman ), cell or tissue types (Freeman ), or genetically diverse individuals (Ghazalpour ) in a single data matrix from which meaningful patterns are extracted using any of a wide range of statistical and algorithmic approaches. More recently, it has become feasible to probe systems along two or more of these dimensions simultaneously. In particular, we are interested in multi-tissue data, where gene expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals (Foroughi Asl ; Franzén ; Fu ; Greenawalt ; Grundberg ; GTEx Consortium, 2017; Hägg ; Keller ). These data can potentially reveal the similarities and differences in (co)expression between tissues as well as the tissue-specific variation in (co)expression across individuals. However, when traditional clustering methods are applied to this type of data, important information is lost. For instance, if each tissue-specific sub-dataset is clustered independently, the resulting sets of clusters will rarely align, and to compare clusters across tissues, one faces the general problem of determining cluster preservation statistics (Langfelder ).
If instead the data are concatenated ‘horizontally’ in a single gene-by-sample matrix, a common set of clusters will be found, but these will be biased heavily towards housekeeping processes that are coexpressed in all tissues. A potentially more promising approach is to concatenate data ‘vertically’ in a tissue-gene-by-individual matrix, where the entities being clustered are ‘tissue-genes’, the tissue-specific expression profiles of genes (Dobrin ; Talukdar ). However, in studies with a large number of tissues, the number of individuals with available data in all tissues is typically very small, i.e. a large number of samples will have to be discarded to obtain a tissue-gene-by-individual matrix without missing data. Dedicated clustering algorithms for multi-tissue expression data are scarce and mostly based on using the higher-order generalized singular value decomposition or related matrix decomposition techniques to identify common and differential clusters across multiple conditions (Li ; Ponnapalli ; Xiao ). However, these methods either require that all tissues have the same number of one-to-one matching samples (Ponnapalli ), or that tissue-specific coexpression networks are reconstructed for each tissue separately as a preliminary step (Li ; Xiao ). Bayesian model-based clustering methods, which model the data as a whole using mixtures of probability distributions (Fraley and Raftery, 2002; Ickstadt ; Si ), are an attractive alternative approach for clustering multi-tissue data, because they would, at least in principle, make it possible to account for different noise levels and sample sizes in different tissues and to incorporate prior information on the relative similarity between certain tissues based on their known physiological function.
Here we present a novel statistical framework and inference algorithm for model-based clustering of multi-tissue gene expression data, which can incorporate prior information on tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues.

2 Materials and methods

2.1 Approach

In model-based clustering, a partitioning of genes into non-overlapping clusters parameterizes a probabilistic model from which the expression data are assumed to have been generated, typically in the form of a mixture distribution where each cluster corresponds to one mixture component. Using Bayes’ theorem, this can be recast as a probability distribution on the set of all possible clusterings parameterized by the expression data, from which maximum-likelihood solutions can be obtained using expectation-maximization or Gibbs sampling. Our approach to clustering multi-tissue data combines ideas from existing ordinary (‘single-tissue’) and multi-species model-based clustering methods. We use the generative model of Qin (2006) and Joshi to obtain the posterior probability for a (single-tissue) clustering given a (single-tissue) dataset. From Roy we use the idea that a multi-tissue clustering consists of a set of linked clusters, where cluster k in one tissue corresponds to cluster k in any other tissue, and each cluster k contains a core set of genes, belonging to cluster k in all tissues, and a differential set of tissue-specific genes, belonging to cluster k in one or more, but not all, tissues. Like Roy, we assume that the data from one tissue can influence the clustering in another tissue, albeit via a simpler mechanism, as we do not aim to reconstruct any phylogenetic histories among tissues. In brief, we assume that the posterior probability distribution of clusterings in tissue t is given by its ordinary single-tissue distribution given the expression data for tissue t, multiplied by a tempered distribution for observing that same clustering given the expression data for all other tissues. The degree of tempering determines the degree of influence of one tissue on another, and can be used to model known prior relationships between tissues.
For instance, we expect a priori that coexpression clusters will be more similar between vascular tissues than between vascular and metabolic tissues.
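The effect of tempering can be illustrated numerically. The Python sketch below is our own illustration, not code from revamp or Lemon-Tree, and the function names `temper` and `entropy` are hypothetical. It raises a toy discrete distribution over three candidate clusterings to a power λ and renormalizes it: λ = 1 leaves the distribution unchanged, λ = 0 makes it uniform, and intermediate values flatten it. This is the sense in which a small exponent weakens the influence of another tissue's data.

```python
import numpy as np


def temper(p, lam):
    """Raise a discrete distribution to the power lam (0 <= lam <= 1) and renormalize."""
    q = np.asarray(p, dtype=float) ** lam
    return q / q.sum()


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; zero entries are skipped."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Toy posterior over three candidate clusterings, as supported by another tissue's data.
p = np.array([0.7, 0.2, 0.1])
for lam in (1.0, 0.5, 0.0):
    q = temper(p, lam)
    print(lam, np.round(q, 3), round(entropy(q), 3))
```

A larger entropy means a flatter distribution, so the tempered factor discriminates less between candidate clusterings.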

2.2 Statistical model for single-tissue clustering

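Our method is based on previous single-tissue, model-based clustering algorithms (Joshi ; Qin, 2006). In brief, for an expression data matrix $X = (x_{gn})$ for G genes and N samples, a clustering $\mathcal{C} = \{C_1, \dots, C_K\}$ is defined as a partition of the genes into K non-overlapping sets $C_k$. We assume that the data points for the genes in each cluster and each sample are normally distributed with an unknown mean and an unknown variance/precision. Given a clustering $\mathcal{C}$ and a set of means $\mu_{kn}$ and precisions $\tau_{kn}$ for each cluster k and sample n, we obtain a distribution on expression data matrices as

$$p(X \mid \mathcal{C}, \mu, \tau) = \prod_{k=1}^{K} \prod_{n=1}^{N} \prod_{g \in C_k} p(x_{gn} \mid \mu_{kn}, \tau_{kn}), \qquad p(x \mid \mu, \tau) = \sqrt{\frac{\tau}{2\pi}}\, \exp\!\Big(-\frac{\tau}{2}(x - \mu)^2\Big).$$

Assuming a uniform prior on the clusterings and independent normal-gamma priors on the normal distribution parameters, we can use Bayes’ rule to find the marginal posterior probability of observing a clustering given the data, up to a normalization constant:

$$P(\mathcal{C} \mid X) \propto \prod_{k=1}^{K} \prod_{n=1}^{N} \int\!\!\int p(\mu, \tau \mid \mu_0, \lambda_0, \alpha_0, \beta_0) \prod_{g \in C_k} p(x_{gn} \mid \mu, \tau)\, d\mu\, d\tau. \qquad (1)$$

Note that we use a capital ‘P’ to indicate that this is a discrete distribution. Here $p(\mu, \tau \mid \mu_0, \lambda_0, \alpha_0, \beta_0)$ is the normal-gamma prior, with $\mu_0, \lambda_0$ and $\alpha_0, \beta_0$ being the parameters of the normal-gamma prior distribution; we use values of these parameters that result in a non-informative prior. The double integral in (1) can be solved exactly in terms of the sufficient statistics for each cluster and sample (the number of genes in the cluster, and the sum and sum of squares of their expression values in that sample), see Joshi for details. For computational purposes, the decomposition of Eq. (1) into a product of independent factors, one for each cluster and sample, is important. We write the log-likelihood or Bayesian score accordingly as (up to an additive constant):

$$S(\mathcal{C} \mid X) = \log P(\mathcal{C} \mid X) = \sum_{n=1}^{N} S_n(\mathcal{C}), \qquad S_n(\mathcal{C}) = \sum_{k=1}^{K} \log \int\!\!\int p(\mu, \tau \mid \mu_0, \lambda_0, \alpha_0, \beta_0) \prod_{g \in C_k} p(x_{gn} \mid \mu, \tau)\, d\mu\, d\tau. \qquad (2)$$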
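The double integral in Eq. (1) has the standard closed form of a normal-gamma marginal likelihood. The Python sketch below computes the single-tissue Bayesian score of Eq. (2) from per-cluster, per-sample sufficient statistics. It assumes the common conjugate parameterization μ | τ ~ N(μ0, 1/(λ0 τ)), τ ~ Gamma(α0, rate β0). The function names are hypothetical, the prior values in the example are illustrative only (not necessarily those used in the paper), and the code is not the Lemon-Tree implementation.

```python
import math

import numpy as np


def log_marginal(s0, s1, s2, mu0, lambda0, alpha0, beta0):
    """Log normal-gamma marginal likelihood of s0 values with sum s1 and sum of squares s2."""
    if s0 == 0:
        return 0.0  # an empty cluster contributes nothing to the score
    mean = s1 / s0
    ss = s2 - s1 * s1 / s0  # sum of squared deviations from the mean
    lambda_n = lambda0 + s0
    alpha_n = alpha0 + 0.5 * s0
    beta_n = beta0 + 0.5 * ss + lambda0 * s0 * (mean - mu0) ** 2 / (2.0 * lambda_n)
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * math.log(lambda0 / lambda_n)
            - 0.5 * s0 * math.log(2.0 * math.pi))


def per_sample_scores(X, labels, K, prior):
    """S_n(C) for every sample n: sum over clusters of the log marginal likelihood."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    scores = np.zeros(X.shape[1])
    for k in range(K):
        block = X[labels == k]          # genes in cluster k (rows), all samples
        s0 = block.shape[0]             # sufficient statistics per sample
        s1 = block.sum(axis=0)
        s2 = (block ** 2).sum(axis=0)
        for n in range(X.shape[1]):
            scores[n] += log_marginal(s0, s1[n], s2[n], *prior)
    return scores


def bayesian_score(X, labels, K, prior):
    """S(C | X) = sum_n S_n(C), cf. Eq. (2)."""
    return float(per_sample_scores(X, labels, K, prior).sum())


# Illustrative prior (mu0, lambda0, alpha0, beta0); not necessarily the paper's values.
prior = (0.0, 0.1, 0.1, 0.1)
rng = np.random.default_rng(0)
pattern_a, pattern_b = rng.normal(size=20), rng.normal(size=20)
X = np.vstack([pattern_a + 0.1 * rng.normal(size=(5, 20)),
               pattern_b + 0.1 * rng.normal(size=(5, 20))])
true_labels = np.array([0] * 5 + [1] * 5)
mixed_labels = np.array([0, 1] * 5)
print(bayesian_score(X, true_labels, 2, prior) > bayesian_score(X, mixed_labels, 2, prior))
```

Keeping the score as a sum of per-sample terms mirrors the decomposition that the multi-tissue model below relies on.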

2.3 Statistical model for multi-tissue clustering

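Next, we assume that expression data are available for G genes in T tissues, with $N_t$ samples in tissue t collected in a data matrix $X_t$, $t = 1, \dots, T$. We define a multi-tissue clustering as a collection of single-tissue clusterings $\mathbf{C} = (\mathcal{C}_1, \dots, \mathcal{C}_T)$, and assume that the probability of observing $\mathbf{C}$ given data $\mathbf{X} = (X_1, \dots, X_T)$ is given by

$$P(\mathbf{C} \mid \mathbf{X}) = \frac{1}{Z} \prod_{t=1}^{T} \prod_{t'=1}^{T} P(\mathcal{C}_t \mid X_{t'})^{\lambda_{tt'}}, \qquad (3)$$

where Z is a normalization constant which we will henceforth ignore, each factor $P(\mathcal{C}_t \mid X_{t'})$ is a single-tissue posterior probability distribution defined in Eq. (1), and $\lambda = (\lambda_{tt'})$ is a set of hyper-parameters that define the prior tissue similarities; for notational convenience we define $\lambda_{tt} = 1$. Note that $P(\mathcal{C}_t \mid X_{t'})$ is a discrete distribution measuring how well clustering $\mathcal{C}_t$ is supported by data $X_{t'}$. Raising a discrete distribution to a power less than 1 (and renormalizing) has the effect of making the distribution more uniform. Hence, in Eq. (3), we are asking that clustering $\mathcal{C}_t$ is supported predominantly by data from its own tissue, but also, albeit to a lesser extent depending on the values of $\lambda_{tt'}$, by data from the other tissues. Optimizing Eq. (3) across all multi-tissue clusterings is challenging. A considerable simplification is obtained if we constrain the problem to multi-tissue clusterings with the same number of clusters K in each tissue. Denoting by $\mathcal{I}_t$ the set of samples/individuals in tissue t and by $N = \sum_t N_t$ the total number of samples, the decomposition in Eq. (2) allows us to write, up to additive constants,

$$\log P(\mathbf{C} \mid \mathbf{X}) = \sum_{t=1}^{T} \sum_{t'=1}^{T} \lambda_{tt'} \log P(\mathcal{C}_t \mid X_{t'}) = \sum_{t=1}^{T} \sum_{t'=1}^{T} \lambda_{tt'} \sum_{n \in \mathcal{I}_{t'}} S_n(\mathcal{C}_t) = \sum_{t=1}^{T} S_t(\mathcal{C}_t), \qquad (4)$$

$$S_t(\mathcal{C}_t) = \sum_{n=1}^{N} \lambda_{t,t(n)}\, S_n(\mathcal{C}_t), \qquad (5)$$

where we used Eq. (2), defined $\lambda_{t,t(n)}$ with t(n) the tissue to which sample n belongs, and wrote $S_n(\mathcal{C}_t)$ to denote the Bayesian score of clustering $\mathcal{C}_t$ with respect to sample n. Two extremal choices for the hyper-parameters are of interest. If $\lambda_{tt'} = 1$ for all $t, t'$, then the Bayesian score is the same for each tissue t and identical to Eq. (2) for the concatenated data matrix $X = (X_1, \dots, X_T)$. Hence this is equivalent to clustering the entire dataset as if it came from a single tissue (‘horizontal’ data concatenation). If $\lambda_{tt'} = 0$ for $t \neq t'$, then Eq. (3) decomposes as a product of independent single-tissue factors. This is equivalent to clustering each tissue sub-dataset independently.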
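Once the per-sample scores $S_n(\mathcal{C}_t)$ are known, Eqs. (4) and (5) reduce to weighted sums. The Python sketch below uses hypothetical function names and takes the per-sample scores as a precomputed T × N array rather than recomputing them. It evaluates the per-tissue scores and illustrates the two extremal choices of the tissue similarities.

```python
import numpy as np


def tissue_scores(per_sample, tissue_of, lam):
    """Per-tissue scores S_t(C_t) = sum_n lam[t, t(n)] * S_n(C_t), cf. Eq. (5).

    per_sample: T x N array, row t holding S_n(C_t) for every sample n.
    tissue_of:  length-N array with the tissue index t(n) of each sample.
    lam:        T x T matrix of prior tissue similarities with lam[t, t] = 1.
    """
    per_sample = np.asarray(per_sample, dtype=float)
    lam = np.asarray(lam, dtype=float)
    weights = lam[:, np.asarray(tissue_of)]   # weights[t, n] = lam[t, t(n)]
    return (weights * per_sample).sum(axis=1)


def multi_tissue_score(per_sample, tissue_of, lam):
    """Log of Eq. (3) up to a constant: the sum of the per-tissue scores, cf. Eq. (4)."""
    return float(tissue_scores(per_sample, tissue_of, lam).sum())


# Toy example: two tissues, three samples (two from tissue 0, one from tissue 1).
per_sample = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
tissue_of = np.array([0, 0, 1])
independent = np.eye(2)            # lam[t, t'] = 0 for t != t'
horizontal = np.ones((2, 2))       # lam[t, t'] = 1 for all t, t'
print(tissue_scores(per_sample, tissue_of, independent))
print(tissue_scores(per_sample, tissue_of, horizontal))
```

With the identity matrix, each tissue only sees its own samples; with the all-ones matrix, every tissue scores its clustering on the full concatenated dataset.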

2.4 Optimization algorithm

To find a local maximum of the Bayesian score in Eq. (4), the following heuristic, greedy optimization algorithm was used:
1. Data standardization: using appropriately normalized gene expression data, each gene is standardized to have mean zero and standard deviation one on the concatenated data $X = (X_1, \dots, X_T)$.
2. Determine the number of clusters: k-means clustering is run on the concatenated data with the number of clusters ranging from 2 to 100. The optimal number K is selected by visual inspection of an elbow plot.
3. Initialize multi-tissue clustering: starting from the k-means clustering output at the selected number of clusters, genes are reassigned until a local optimum is reached for the single-tissue score Eq. (2) on the concatenated data X. All $\mathcal{C}_t$ are initialized by this clustering.
4. Optimize multi-tissue clustering: for each tissue t, optimize $\mathcal{C}_t$ by finding a local maximum for the Bayesian score Eq. (5) using single-gene reassignments; only gene reassignments that improve the score by at least a minimum threshold ϵ are considered.
Note that even in the case $\lambda_{tt'} = 0$ for $t \neq t'$, which removes all tissue dependencies in the Bayesian score (4), this algorithm still results in a multi-tissue clustering with linked clusters, because each tissue is initialized by the same clustering and converges to a local optimum.
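The last step, optimizing each tissue's clustering by single-gene reassignments subject to a minimum gain ϵ, can be sketched as a generic greedy hill-climbing loop. In the Python sketch below, `greedy_reassign` and the toy within-cluster sum-of-squares score are our own illustrative choices. In revamp, the role of `score` would be played by the Bayesian score of Eq. (5) for tissue t. An efficient implementation would also update only the affected cluster terms after each trial move, instead of rescoring the whole clustering.

```python
import numpy as np


def greedy_reassign(labels, K, score, eps, max_sweeps=100):
    """Move single genes between clusters while a move raises score(labels) by more than eps.

    labels: initial cluster index per gene; score: callable returning the score to maximize.
    Returns the locally optimal labels and their score.
    """
    labels = np.array(labels)
    current = score(labels)
    for _ in range(max_sweeps):
        moved = False
        for g in range(len(labels)):
            original = labels[g]
            best_k, best_gain = original, eps
            for k in range(K):
                if k == original:
                    continue
                labels[g] = k                      # trial move
                gain = score(labels) - current
                if gain > best_gain:
                    best_k, best_gain = k, gain
            labels[g] = best_k                     # keep the best move, or undo
            if best_k != original:
                current = score(labels)
                moved = True
        if not moved:                              # no move gains more than eps
            break
    return labels, current


# Toy score (an assumption for illustration): minus the within-cluster sum of squares.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])


def neg_within_ss(labels):
    return -sum(((x[labels == k] - x[labels == k].mean()) ** 2).sum()
                for k in np.unique(labels))


labels, best = greedy_reassign([0, 1, 0, 1, 0, 1], K=2, score=neg_within_ss, eps=1e-9)
print(labels, best)
```

A small positive `eps` prevents the loop from cycling on moves whose gain is only numerical round-off.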

2.5 Implementation

The statistical model and optimization algorithm have been implemented in Java as an extension of the revamp task in the Lemon-Tree software (Bonnet ; Erola ), available at https://github.com/eb00/lemon-tree.

2.6 The Stockholm Atherosclerosis Gene Expression dataset

In the STockholm Atherosclerosis Gene Expression (STAGE) study, 612 tissue samples from 121 individuals were obtained during coronary artery bypass grafting surgery from the atherosclerotic arterial wall (AAW, n = 73), internal mammary artery (IMA, n = 88), liver (n = 87), skeletal muscle (SM, n = 89), subcutaneous fat (SF, n = 72) and visceral fat (VF, n = 98) of well-characterized coronary artery disease (CAD) patients; fasting whole blood (WB) was obtained for isolation of DNA (n = 109) and RNA (n = 105) and for biochemical analyses. Gene expression profiles from RNA samples of different tissues were jointly normalized to enable comparison across tissues (Foroughi Asl ; Hägg ; Talukdar ). A total of 4956 genes with variance greater than 1 across all 612 samples were selected for further analysis, and subsequently standardized to have mean zero and standard deviation one, again across all 612 samples.
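The gene filtering and standardization steps can be sketched in Python as follows. The sketch assumes a genes × samples matrix of jointly normalized expression values and uses the unbiased sample variance (`ddof=1`). Which variance estimator was used in the study is not stated, and the function name is hypothetical.

```python
import numpy as np


def select_and_standardize(expr, min_var=1.0):
    """Keep genes (rows) whose variance across all samples exceeds min_var,
    then standardize each kept gene to mean 0 and standard deviation 1 across all samples."""
    expr = np.asarray(expr, dtype=float)
    keep = expr.var(axis=1, ddof=1) > min_var
    kept = expr[keep]
    centered = kept - kept.mean(axis=1, keepdims=True)
    return centered / kept.std(axis=1, ddof=1, keepdims=True), keep


expr = np.array([[1.0, 1.0, 1.0, 1.0],   # constant gene: dropped
                 [0.0, 2.0, 4.0, 6.0],   # high-variance gene: kept
                 [0.0, 1.0, 0.0, 1.0]])  # low-variance gene: dropped
z, keep = select_and_standardize(expr)
print(keep, z)
```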

2.7 Multi-tissue clustering methods for comparison

We ran four multi-tissue clustering methods (see Supplementary Fig. S1):
1. Revamp with reassignment threshold ϵ and prior tissue similarities $\lambda_{tt'}$ obtained by scaling the average correlation coefficient between samples from tissues t and t' measured in the same individual with a dissipation parameter. Here we suggest deriving the similarity coefficients from Pearson’s correlation, but other measures could be used.
2. Revamp with reassignment threshold ϵ and prior tissue similarities $\lambda_{tt'} = 0$ for $t \neq t'$.
3. An alternative method, which treats the expression profile of each gene g in each tissue t as a separate (gene, tissue) variable and clusters the resulting (gene, tissue)-by-individual expression matrix using the single-tissue clustering algorithm (Section 2.2). This results in a single set of clusters, which are disentangled into a set of linked clusters by assigning gene g to cluster m in tissue t whenever (g, t) belongs to original cluster m. This method was called ‘vertical data concatenation’ above, and relies on having expression data from multiple tissues in the same individual. In STAGE, 21 individuals had data in all 7 tissues.
4. Single-tissue clustering on the entire dataset of 612 samples (called ‘horizontal data concatenation’ above). This results in an identical clustering across all tissues. It is not a true multi-tissue clustering method, but is used as an overall benchmark to determine the relevance of a multi-tissue approach.
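The vertical concatenation method involves two steps: building the (gene, tissue)-by-individual matrix, and mapping its clusters back to linked per-tissue clusterings. Both can be sketched as below. The input layout is an assumption for illustration: a dictionary from tissue name to a list of individual identifiers plus a genes × individuals matrix, with the same gene order in every tissue. The function names are hypothetical, and the clustering of the stacked matrix itself is omitted.

```python
import numpy as np


def vertical_concatenation(tissue_data):
    """Stack tissues into a (gene, tissue)-by-individual matrix.

    tissue_data: dict tissue -> (list of individual ids, G x n_t array), same gene order
    in every tissue. Only individuals with data in all tissues are kept.
    Row i * G + g of the result holds gene g in tissue tissues[i].
    """
    tissues = sorted(tissue_data)
    common = sorted(set.intersection(*(set(ids) for ids, _ in tissue_data.values())))
    blocks = []
    for t in tissues:
        ids, mat = tissue_data[t]
        column = {ind: j for j, ind in enumerate(ids)}
        blocks.append(np.asarray(mat, dtype=float)[:, [column[i] for i in common]])
    return np.vstack(blocks), tissues, common


def disentangle(labels, tissues, n_genes):
    """Gene g is assigned to cluster m in tissue t whenever (g, t) belongs to cluster m."""
    labels = np.asarray(labels).reshape(len(tissues), n_genes)
    return {t: labels[i] for i, t in enumerate(tissues)}


# Two genes, two tissues; only individuals "p1" and "p3" have data in both tissues.
data = {"liver": (["p1", "p2", "p3"], [[1, 2, 3], [4, 5, 6]]),
        "AAW": (["p3", "p1"], [[7, 8], [9, 10]])}
stacked, tissues, common = vertical_concatenation(data)
print(tissues, common)
print(stacked)
print(disentangle([0, 1, 1, 1], tissues, n_genes=2))
```

The example also shows why this method loses data: individual "p2" is dropped because it lacks a sample in one tissue.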

2.8 Validation data

To evaluate the biological relevance of each multi-tissue clustering method, we used the following data and analyses:
1. Gene Ontology: we performed gene set enrichment analysis (GSEA), first using the GO Slim ontology, which gives a broad overview of the ontology content without the detail of the specific fine-grained terms (http://www.geneontology.org/page/go-slim-and-subset-guide), and then using the full set of GO terms (http://www.geneontology.org/page/download-ontology).
2. Candidate regulators: we assigned sets of ‘regulators’ to each of the modules, considering as candidate regulators the tissue-specific sets of genes with significant eQTLs identified in Foroughi Asl (2464 AAW, 3209 IMA, 4491 liver, 2534 SM, 2373 SF, 2994 VF and 5691 WB genes).
3. Protein–protein interactions: we obtained human tissue protein–protein interaction (PPI) networks from Barshir. Specifically, we used TissueNet v2 networks consisting of curated, experimentally detected PPIs between proteins expressed in the Genotype-Tissue Expression (GTEx) tissues ‘Artery Aorta’, ‘Liver’, ‘Muscle Skeletal’, ‘Adipose Subcutaneous’, ‘Adipose Visceral’ and ‘Whole Blood’, available for download at http://netbio.bgu.ac.il/labwebsite/?q=tissuenet2-download.

2.9 Validation methods

We tested for GO functional enrichment using the go_annotation task in the Lemon-Tree software, and the regulators task was used to identify gene ‘regulators’ using a probabilistic scoring (Joshi ). To test for enrichment of known PPIs in a given clustering, we calculated the fold-change enrichment of known PPIs among pairs of co-clustered genes. All clustering methods were run on the seven available STAGE tissues, and the results for six tissues were used for validation (IMA did not have a matching tissue in the TissueNet database). To evaluate the clustering of a particular tissue, we used all PPIs for that tissue. To evaluate the core gene set of a cluster (for cluster m, the set of genes belonging to m in all tissues), we used the set of PPIs shared across all tissues. Because the fold-change value is influenced by the number of clusters (more clusters result in fewer co-clustered pairs), we used the same number of clusters (k = 12) for all compared methods (Section 2.7).
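The PPI fold-change enrichment can be sketched as the ratio of observed to expected known interactions among co-clustered gene pairs. The definition below is our assumption of a standard form, not a formula quoted from the paper: the fraction of co-clustered gene pairs that are known PPIs, divided by the fraction of all gene pairs that are known PPIs. Gene indices and the function name are hypothetical.

```python
from collections import Counter


def ppi_fold_change(labels, ppis):
    """Fold-change enrichment of known PPIs among co-clustered gene pairs.

    labels: cluster index per gene (genes are indexed 0..G-1).
    ppis:   iterable of (i, j) gene-index pairs with a known interaction.
    """
    G = len(labels)
    all_pairs = G * (G - 1) // 2
    co_pairs = sum(n * (n - 1) // 2 for n in Counter(labels).values())
    edges = {tuple(sorted(p)) for p in ppis if p[0] != p[1]}  # undirected, no self-loops
    if co_pairs == 0 or not edges:
        return float("nan")  # enrichment is undefined without pairs or interactions
    hits = sum(1 for i, j in edges if labels[i] == labels[j])
    observed = hits / co_pairs
    expected = len(edges) / all_pairs
    return observed / expected


print(ppi_fold_change([0, 0, 1, 1], [(0, 1), (0, 2)]))
```

Because `co_pairs` shrinks as the number of clusters grows, the value depends on K, which is why all compared methods were run with the same number of clusters.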

3 Results

3.1 Multi-tissue clustering with revamp produces mappable clusters with tunable overlap levels

To identify coexpression clusters that reflect biological similarities and differences across tissues, we analyzed samples from seven tissues from the STAGE study. First, we initialized revamp with the partition obtained by clustering all tissue samples using k-means with k = 12 clusters, a value used in all our analyses because it was near the inflection point of the elbow plots in all tissues (Supplementary Fig. S2). Then we updated the cluster assignments for each tissue independently using our Bayesian model-based score, which depends on a set of hyper-parameters $\lambda_{tt'}$ expressing prior beliefs on pairwise tissue similarities (Section 2.3), with a greedy optimization algorithm that has one free parameter ϵ, the minimum gain in Bayesian score for reassigning a gene from one cluster to another (Section 2.4). The resulting multi-tissue clustering consists of a set of linked clusters, where cluster k in one tissue corresponds to cluster k in any other tissue. Genes that belong to a particular cluster k in all tissues form a core set of genes with conserved coexpression across tissues, whereas genes that belong to cluster k in one or more, but not all, tissues form tissue-specific sets of genes that are differentially coexpressed with the core of cluster k. To test the influence of the method parameters, we systematically explored a large space of parameter combinations (Supplementary Fig. S3). Both the reassignment threshold ϵ and the tissue similarities $\lambda_{tt'}$ ultimately govern the degree of overlap across tissues of the linked clusters: small thresholds and near-zero similarities lead to nearly tissue-independent clusterings, whereas large thresholds and/or near-one similarities lead to nearly identical clusterings. Although ϵ and $\lambda_{tt'}$ are to some extent interchangeable (i.e. a smaller threshold value can be compensated by a uniform increase in similarity values), setting ϵ to a small, non-zero value is recommended to avoid spurious reassignments due to numerical round-off errors in the Bayesian score calculation. Compared with clustering tissues independently, this partitioning improves cluster quality (Supplementary Table S1) and yields stronger similarities between tissues. The functional enrichment analysis revealed that a larger proportion of functionally enriched categories were shared across two or more tissues (Supplementary Fig. S4). Moreover, similarity heatmaps showed that the degree of shared enrichment between tissues in our clustering reflected the degree of overall expression similarity (Supplementary Fig. S5). It is worth noting, however, that multi-tissue clustering methods, and in particular revamp when prior tissue similarities are used to optimize the score in Eq. (5), may show fuzzy borders when assessed with traditional validation methods such as silhouette scores (see Supplementary Fig. S6).
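Extracting the core and tissue-specific gene sets of a linked cluster from the per-tissue clusterings is a simple set operation. The Python sketch below assumes each tissue's clustering is given as an array of cluster labels over the same ordered list of genes; the function name is hypothetical.

```python
import numpy as np


def core_and_differential(tissue_labels, k):
    """Split linked cluster k into its core and tissue-specific (differential) genes.

    tissue_labels: dict tissue -> array of cluster labels, one per gene (same gene order).
    Returns (core gene indices, {gene index: tissues in which it belongs to cluster k}).
    """
    tissues = sorted(tissue_labels)
    member = np.array([np.asarray(tissue_labels[t]) == k for t in tissues])  # T x G
    in_all = member.all(axis=0)
    core = np.flatnonzero(in_all)
    partial = member.any(axis=0) & ~in_all  # in cluster k in some, but not all, tissues
    differential = {int(g): [tissues[i] for i in np.flatnonzero(member[:, g])]
                    for g in np.flatnonzero(partial)}
    return core, differential


labels = {"AAW": [0, 0, 1, 1], "liver": [0, 1, 1, 0]}
core, diff = core_and_differential(labels, k=0)
print(core, diff)
```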

3.2 Revamp multi-tissue clustering is more enriched for tissue protein–protein interactions than other approaches

To evaluate the performance of revamp, we ran four different multi-tissue clustering methods (see Section 2.7) and tested each for enrichment of human tissue protein–protein interactions (PPIs) from the TissueNet database (Barshir ) among co-clustered genes, using the six tissues that matched between STAGE and TissueNet. On a tissue-by-tissue basis, running revamp with or without prior tissue similarity values resulted in fold-change enrichment values for tissue PPIs (average fold-change over 6 tissues of 1.49 and 1.48, respectively) similar to those of single-tissue clustering on all samples together (average fold-change 1.50), and considerably higher than those obtained using vertically concatenated data (average fold-change 1.22) (Fig. 1). As a baseline reference, we also calculated enrichment for each tissue clustered individually using the single-tissue clustering method. Consistent with the assumption that analyzing data integratively using multi-tissue clustering should improve biological relevance, single-tissue clustering resulted in lower fold-change values (average fold-change 1.31) (Fig. 1).

Fig. 1.

Fold-change enrichment of tissue PPIs in tissue clusters for four multi-tissue clustering methods and individual single-tissue clustering. RW4: revamp with prior tissue similarities set according to their overall expression correlation; RA: revamp with prior tissue similarities set to zero; VERT: vertical data concatenation; HORIZ: horizontal data concatenation; INDIV: each tissue clustered individually. Each colored bar shows the fold-change overlap of tissue PPIs in clusters for the matching tissue; the black bar shows the fold-change overlap of tissue-shared PPIs in tissue-shared genes of linked clusters. See Section 2 for details. (Color version of this figure is available at Bioinformatics online.)

We further reasoned that genes assigned consistently to the same cluster across all tissues (‘core’ cluster genes) should reflect tissue-independent interactions between these genes. To test this hypothesis, we calculated enrichment of tissue-independent PPIs (i.e. PPIs present in all six tissue PPI networks) among core cluster genes. For revamp with prior tissue similarity values, a significant increase in enrichment for tissue-independent PPIs was observed (fold-change 1.72), whereas for revamp without prior tissue similarities and for horizontal data concatenation, no difference was observed compared with all tissue PPIs (fold-changes 1.47 and 1.57, respectively) (Fig. 1). Vertical data concatenation resulted in very small core gene sets, containing no known tissue-independent PPIs (see also Supplementary Table S2).

3.3 Functional predictions by Revamp clusters and gene regulators associated with CAD

To test whether the clustering algorithm accurately captures the higher-level biological process represented by each module, we first performed Gene Ontology enrichment analysis (see top enrichments in Supplementary Table S3). Network analysis revealed three connected components: clusters 5, 9 and 10 were related to the immune system response; the lipid metabolic process was enriched in clusters 4, 6 and 7; and clusters 0 and 8 were associated with cell adhesion and extracellular matrix organization. We then ran the regulators probabilistic scoring task (see Section 2.9) independently on each tissue to predict upstream regulatory genes, considering as candidate regulators the tissue-specific genes with genetic variants in their regulatory regions affecting gene expression (‘cis-eQTL effects’). The regulatory network of the most significant regulators for the inferred modules is depicted in Figure 2.

Fig. 2.

Module regulatory network for all seven tissues. Regulators are presented as squares and clusters as circles with size proportional to the number of genes in the cluster. Only the regulators with a score greater than 20 in the regulators task are represented, and we named those with a score above 60. Edges are colored per tissue as per Figure 3, and their width is proportional to the regulator score. (Color version of this figure is available at Bioinformatics online.)

Fig. 3.

Network representation of the correlation between the eigengenes, the first principal component of a given module, and relevant CAD phenotypes (squares), aggregated per tissue (circles). Edge width is inversely proportional to the correlation P-value.

The development of atherosclerosis is in large part mediated by the inflammatory cascade (Crowther, 2005). Our results indicated that the inflammatory response in AAW may be regulated by PTAFR, a mediator of platelet aggregation and the inflammatory response (Perisic ; Rastogi ). In SF and VF, the predicted regulators were SIGLEC10 and CD247, respectively, genes that have been previously associated with CAD (Ammirati ; Shen ). Other tissues were linked to the previously identified inflammatory regulators BIN2 (Liao ), CD2 (Hansson and Libby, 2006), RAC2, which also directs plaque osteogenesis (Ceneri ), and RASSF5, a pro-apoptotic regulator of the RAS protein (Dejeans ). Lipid metabolism also plays a key role in the development of atheroma plaques. The metabolism-related clusters 6 and 7 were predicted to be regulated by AGXT2 and SPP2 in SF and VF, respectively. AGXT2 polymorphisms were identified as a risk factor for CAD in Asian populations (Yoshino ; Zhou ), and SPP2 may contribute to the atheroprotective effects of HDL (Abdel-Latif ). AADAC, which controls the export of sterols (Tiwari ), may also be a regulator in SM. In WB, we found MASP1, a gene associated with decreased lectin pathway activity in acute myocardial infarction patients (Yan ). The atherogenic pathway involves inflammation of the arterial wall, injury of the intima, lipid infiltration and activation of angiogenic signaling, processes that involve dysfunctional cell adhesion (Sun, 2014). Our analysis predicted that RAB31, which induces lipid accumulation in atheroma plaques (Fu ), regulates the morphogenesis-related clusters 3 and 8 in SM. Cluster 3 was also predicted to be regulated by CACNA1C in SF, a gene encoding a voltage-gated calcium channel subunit that is associated with inherited cardiac arrhythmia (Kawashiri ), and by COL18A1 in VF, which may control angiogenesis and vascular permeability (Moulton ).
The expression levels of PCDH7, a gene involved in cell adhesion, and TUBA1 have also previously been correlated with CAD (Chittur ; Eyster ; Sinnaeve ).

3.4 Revamp discovers multi-tissue clusters underlying CAD phenotypes

The systems genetics paradigm posits that genetic variants in regulatory regions affect nearby gene expression (‘cis-eQTL effects’), which then causes variation in downstream gene networks (‘trans-eQTL effects’) and clinical phenotypes. Ultimately, gene-gene interactions across metabolic and vascular tissues are expected to carry this information to end-stage phenotypic changes in CAD. We therefore used regression analysis to identify associations between module gene expression and CAD phenotypes (see Talukdar ), as presented in Figure 3. The aggregated results revealed that AAW and SF are the main tissues associated with very-low-density lipoprotein (VLDL) and low-density lipoprotein (LDL) cholesterol levels, while the liver was the main tissue associated with high-density lipoprotein (HDL) cholesterol. Fat has previously been identified as the main contributor to CAD heritability, and the top regulatory networks in CAD have been shown to be strongly enriched in associations with plasma levels of HDL, LDL and pro-insulin (Zeng ), as depicted in the left part of Figure 3. In addition, IMA was found to be associated in cluster 3 with thyroid-stimulating hormone, which has many hemodynamic effects and influences the structure of the heart and circulatory system (Grais and Sowers, 2014), and with alcohol consumption in clusters 5 and 9, a factor whose associations with cardiovascular diseases are heterogeneous (Bell ). On the other hand, phenotypes related to anthropometric measurements were mostly associated with SM, liver and IMA, less significantly with WB and AAW, and not with SF and VF.
If we focus on clusters related to body weight, as a typical example of a trait that is regulated by, and affects, multiple tissues, we find gene regulators such as PTAFR (in AAW) and CD2 (in IMA), which have been reported to affect food intake and body weight in addition to the inflammatory response (Are Hanssen ; Li and McIntyre, 2015). In SM, RAB31 may influence body weight by mediating insulin-stimulated glucose uptake (Lyons ). Finally, the candidate regulators BIN2 and RAC2 have also been associated with obesity and metabolic syndrome (Aguilera ; Zhang ).
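Figure 3 relates modules to phenotypes through their eigengenes, the first principal component of each module. This can be sketched as follows. Two choices are assumptions for illustration: the sign convention, which orients the eigengene to correlate positively with the module's average expression, and the use of a plain Pearson correlation instead of the regression model used in the study. Function names are hypothetical.

```python
import numpy as np


def module_eigengene(module_expr):
    """First principal component scores (one per sample) of a genes x samples module matrix."""
    X = np.asarray(module_expr, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)        # center each gene across samples
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    eigengene = s[0] * vt[0]                     # projection of samples on the first PC
    if np.corrcoef(eigengene, X.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene                   # SVD sign is arbitrary; align with mean expression
    return eigengene


def eigengene_phenotype_correlation(module_expr, phenotype):
    """Pearson correlation between a module eigengene and a phenotype measured in the same samples."""
    return float(np.corrcoef(module_eigengene(module_expr), phenotype)[0, 1])


rng = np.random.default_rng(1)
latent = rng.normal(size=30)                          # shared module activity in 30 individuals
module = latent + 0.2 * rng.normal(size=(8, 30))      # 8 coexpressed genes
phenotype = 0.8 * latent + 0.6 * rng.normal(size=30)  # toy phenotype driven by the module
print(round(eigengene_phenotype_correlation(module, phenotype), 2))
```

In practice, a P-value for each eigengene-phenotype association would be added (for example with `scipy.stats.pearsonr` or a regression model), since Figure 3 encodes edge width by P-value.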

4 Conclusion

Herein we proposed a Bayesian model-based multi-tissue clustering algorithm, revamp, which incorporates prior information on physiological tissue similarity, and which results in a set of clusters consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from the STAGE study, we demonstrated that our method resulted in multi-tissue clusters with higher enrichment of tissue-specific protein-protein interactions than comparable clustering algorithms. Moreover, the multi-tissue clusters highlighted the ability of revamp to link together regulatory genes, biological processes and clinical patient characteristics in a meaningful way across multiple tissues, and we believe this makes it an attractive and statistically sound method for analyzing multi-tissue gene expression datasets in general. Revamp is implemented and freely available in the Lemon-Tree software at https://github.com/eb00/lemon-tree.

Funding

This work was supported by BBSRC [Roslin Institute Strategic Programme, BB/P013732/1] and the NIH [NHLBI R01HL125863]. P.E. has been partially supported by CRUK [C18281/A19169]. Conflict of Interest: none declared.

References (58 in total; 10 shown)

1. Lysophospholipids in coronary artery and chronic ischemic heart disease.

Authors: Ahmed Abdel-Latif; Paula M Heron; Andrew J Morris; Susan S Smyth
Journal: Curr Opin Lipidol Date: 2015-10

2. Learning Differential Module Networks Across Multiple Experimental Conditions.

Authors: Pau Erola; Eric Bonnet; Tom Michoel
Journal: Methods Mol Biol Date: 2019

3. BRAP Activates Inflammatory Cascades and Increases the Risk for Carotid Atherosclerosis.

Authors: Yi-Chu Liao; Yung-Song Wang; Yuh-Cherng Guo; Kouichi Ozaki; Toshihiro Tanaka; Hsiu-Fen Lin; Ming-Hong Chang; Ku-Chung Chen; Ming-Lung Yu; Sheng-Hsiung Sheu; Suh-Hang Hank Juo
Journal: Mol Med Date: 2011-06-10

4. The immune response in atherosclerosis: a double-edged sword.

Authors: Göran K Hansson; Peter Libby
Journal: Nat Rev Immunol Date: 2006-06-16

5. Multi-organ expression profiling uncovers a gene module in coronary artery disease involving transendothelial migration of leukocytes and LIM domain binding 2: the Stockholm Atherosclerosis Gene Expression (STAGE) study.

Authors: Sara Hägg; Josefin Skogsberg; Jesper Lundström; Peri Noori; Roland Nilsson; Hua Zhong; Shohreh Maleki; Ming-Mei Shang; Björn Brinne; Maria Bradshaw; Vladimir B Bajic; Ann Samnegård; Angela Silveira; Lee M Kaplan; Bruna Gigante; Karin Leander; Ulf de Faire; Stefan Rosfors; Ulf Lockowandt; Jan Liska; Peter Konrad; Rabbe Takolander; Anders Franco-Cereceda; Eric E Schadt; Torbjörn Ivert; Anders Hamsten; Jesper Tegnér; Johan Björkegren
Journal: PLoS Genet Date: 2009-12-04

6. An acetylation/deacetylation cycle controls the export of sterols and steroids from S. cerevisiae.

Authors: Rashi Tiwari; René Köffel; Roger Schneiter
Journal: EMBO J Date: 2007-11-22

7. Arboretum: reconstruction and analysis of the evolutionary history of condition-specific transcriptional modules.

Authors: Sushmita Roy; Ilan Wapinski; Jenna Pfiffner; Courtney French; Amanda Socha; Jay Konieczka; Naomi Habib; Manolis Kellis; Dawn Thompson; Aviv Regev
Journal: Genome Res Date: 2013-05-02

8. Integrating genetic and network analysis to characterize genes related to mouse weight.

Authors: Anatole Ghazalpour; Sudheer Doss; Bin Zhang; Susanna Wang; Christopher Plaisier; Ruth Castellanos; Alec Brozell; Eric E Schadt; Thomas A Drake; Aldons J Lusis; Steve Horvath
Journal: PLoS Genet Date: 2006-07-05

9. Atherosclerosis and atheroma plaque rupture: normal anatomy of vasa vasorum and their role associated with atherosclerosis.

Authors: Zhonghua Sun
Journal: ScientificWorldJournal Date: 2014-03-20

10. Association between clinically recorded alcohol consumption and initial presentation of 12 cardiovascular diseases: population based cohort study using linked health records.

Authors: Steven Bell; Marina Daskalopoulou; Eleni Rapsomaniki; Julie George; Annie Britton; Martin Bobak; Juan P Casas; Caroline E Dale; Spiros Denaxas; Anoop D Shah; Harry Hemingway
Journal: BMJ Date: 2017-03-22
