Andrew R Ghazi, Kathleen Sucipto, Ali Rahnavard, Eric A Franzosa, Lauren J McIver, Jason Lloyd-Price, Emma Schwager, George Weingart, Yo Sup Moon, Xochitl C Morgan, Levi Waldron, Curtis Huttenhower.
Abstract
MOTIVATION: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.
Year: 2022 PMID: 35758795 PMCID: PMC9235493 DOI: 10.1093/bioinformatics/btac232
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Hierarchical all-against-all (HAllA) association testing. (A) HAllA provides a novel method for heterogeneous association discovery in high-dimensional data. Input data are represented in matrix form, with features as rows and samples as columns. By default, features within each dataset are hierarchically clustered using average linkage and Spearman association. (B) Starting with the feature hierarchies and the full pairwise association matrix, HAllA descends through both trees, rejecting putative blocks that fail the false negative tolerance (FNT) threshold, where the significance of pairwise associations within a block is determined by the Benjamini–Hochberg FDR threshold. When a block fails the FNT threshold, the choice of whether to cut the X or Y feature tree is guided by whichever cut yields the greater improvement in Gini impurity. The descent stops when a block is dense with marginal associations. (C) Significant associations are reported in a block-wise manner once the hierarchical descent step has terminated. In this example, the X3:Y2 pair is correctly included among the significant associations, whereas it would have been missed by a naive all-against-all (AllA) approach.
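The block test and cut decision described in panel (B) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HAllA's actual implementation: it treats the FNT as the tolerated fraction of non-significant pairs within a block, and it scores a cut by the drop in Gini impurity of the significant/non-significant labels. The function names (`bh_cutoff`, `block_is_dense`, `split_gain`) and the toy p-values are hypothetical.

```python
def bh_cutoff(p_values, target_fdr):
    """Benjamini-Hochberg step-up rule: largest significant p-value (0.0 if none)."""
    ranked = sorted(p_values)
    m = len(ranked)
    cutoff = 0.0
    for rank, p in enumerate(ranked, start=1):
        if p <= target_fdr * rank / m:
            cutoff = p  # keep the largest p-value that satisfies the rule
    return cutoff


def block_is_dense(block, cutoff, fnt):
    """Assumed FNT rule: share of non-significant pairs in the block is at most fnt."""
    flat = [p for row in block for p in row]
    misses = sum(1 for p in flat if p > cutoff)
    return misses / len(flat) <= fnt


def gini(flags):
    """Gini impurity of a list of booleans (significant or not)."""
    if not flags:
        return 0.0
    q = sum(flags) / len(flags)
    return 2.0 * q * (1.0 - q)


def split_gain(parent, left, right):
    """Reduction in Gini impurity when the parent block is split in two."""
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
    return gini(parent) - weighted


# Toy p-value matrix (made-up numbers): rows are X features, columns are Y features.
p = [[0.001, 0.002, 0.8],
     [0.003, 0.004, 0.9]]
cutoff = bh_cutoff([v for row in p for v in row], target_fdr=0.05)
whole_ok = block_is_dense(p, cutoff, fnt=0.2)  # the full block is too sparse

# Compare cutting the X tree (rows) against cutting the Y tree (columns).
sig = [[v <= cutoff for v in row] for row in p]
all_flags = [f for row in sig for f in row]
gain_cut_x = split_gain(all_flags, sig[0], sig[1])
gain_cut_y = split_gain(all_flags,
                        [row[c] for row in sig for c in (0, 1)],
                        [row[2] for row in sig])
cut = "Y" if gain_cut_y > gain_cut_x else "X"
```

In this toy case the full block fails the assumed FNT rule, and cutting the Y tree separates the significant columns from the non-significant one, so the descent would continue along Y.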
Fig. 4. Association of fatty acids with host transcriptional activity in murine liver. We applied HAllA to paired data comprising 120 hepatic transcript levels and 21 liver lipid levels in a set of 40 previously profiled mice (Martin ). In this ‘HAllAgram’ visualization of results, block associations are numbered in descending order of significance, with each numbered block corresponding to a group of coexpressed transcripts related to a group of co-occurring lipids. A white dot indicates the marginal significance of a particular pair of features. A total of 109 block associations achieved significance at FDR 0.05, matching the previous study’s threshold based on canonical correlation (González ). HAllA’s associations were a strict superset of those found earlier by CCA. Spearman correlation was used as a similarity metric.
Fig. 2. HAllA improves statistical power while controlling the FDR. Fifty paired, synthetic datasets with 200 features and 50 samples containing clusters with linear block associations were analyzed. (A) With FNT = 0.2, HAllA maintains the simulated FDR below the target (here 0.05, 0.1, 0.25 and 0.5), with associated tradeoffs in statistical power. In addition, HAllA is consistently better powered than AllA association testing across this range of target FDR values. Dashed lines parallel to the x-axis indicate the target FDR value in each comparison. (B) By increasing the FNT, HAllA can improve the true positive rate with a comparatively minor increase in FDR.
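For readers who want to reproduce this kind of evaluation, the two quantities compared in the figure can be computed from a set of reported feature pairs and a set of known true pairs. The sketch below assumes associations are scored at the level of individual (X feature, Y feature) pairs; the function name `fdr_and_power` and the example pairs are hypothetical.

```python
def fdr_and_power(reported, truth):
    """Empirical FDR and power for reported feature pairs against known true pairs."""
    reported, truth = set(reported), set(truth)
    true_pos = len(reported & truth)
    false_pos = len(reported - truth)
    # FDR: share of reported pairs that are false; defined as 0 when nothing is reported.
    fdr = false_pos / len(reported) if reported else 0.0
    # Power (true positive rate): share of true pairs that were recovered.
    power = true_pos / len(truth) if truth else 0.0
    return fdr, power


# Made-up example: four true pairs, four reported pairs, one of them spurious.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
reported = {(0, 0), (0, 1), (1, 1), (2, 2)}
fdr, power = fdr_and_power(reported, truth)
```

Averaging these two values over repeated simulated datasets gives the kind of summary that can be compared against a target FDR.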
Fig. 3. HAllA discovers block-structured associations while controlling FDR. (A) For a variety of feature linkage relationships, we simulated 50 independent paired datasets, each containing 200 features, 50 samples and clusters of correlated features. We then evaluated the ability of hierarchical versus AllA testing to recover these associations using a variety of similarity metrics. (B) Performance was evaluated by comparing power and FDR. Our hierarchical AllA approach improved sensitivity relative to naive AllA approaches at a comparable FDR. Similarity metrics that do not accept categorical data were not evaluated for the categorical or mixed association types. Other similarity metrics included in HAllA (dCor, NMI) were not applied in these simulations because their reliance on permutation tests made them too slow for simulations of this size (i.e. with many repeated iterations), although they are typically practical for individual real-world datasets.
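A paired dataset with block structure of the kind described in panel (A) can be generated roughly as follows. This is a minimal sketch of one possible linear simulation, not the authors' generator: only the 200 features and 50 samples come from the caption, while the number of clusters, cluster size, noise level, slope and the function name `simulate_paired` are assumptions chosen for illustration.

```python
import random


def simulate_paired(n_features=200, n_samples=50, n_clusters=4,
                    cluster_size=10, noise=0.5, seed=0):
    """Return X, Y (features x samples) and the set of truly associated (i, j) pairs."""
    if n_clusters * cluster_size > n_features:
        raise ValueError("clusters do not fit into the feature count")
    rng = random.Random(seed)
    # Start from independent noise so that most feature pairs are unassociated.
    x = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    y = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    true_pairs = set()
    for c in range(n_clusters):
        # One shared latent signal drives a cluster in X and a cluster in Y.
        latent = [rng.gauss(0, 1) for _ in range(n_samples)]
        members = range(c * cluster_size, (c + 1) * cluster_size)
        for i in members:
            x[i] = [v + noise * rng.gauss(0, 1) for v in latent]
            y[i] = [2.0 * v + noise * rng.gauss(0, 1) for v in latent]  # linear link
        true_pairs.update((i, j) for i in members for j in members)
    return x, y, true_pairs


x, y, true_pairs = simulate_paired()
```

Each cluster contributes a `cluster_size` by `cluster_size` block of true associations, which is the ground truth against which recovered blocks would be scored.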
Fig. 5. HAllAgram for block-wise associations. (A) Using HAllA to associate multiomic data for the analysis of metabolome–microbiome interactions. We used HAllA to associate paired stool metabolomic and 16S rRNA gene sequencing data from the DIABIMMUNE (Kostic ) cohort, in which infants were recruited at birth and sampled monthly for the first 3 years of life. The data comprise 104 samples and describe the abundance of 20 genera and 284 labeled metabolites. A white dot indicates the marginal significance of a particular pair of features. Here, we show the 30 strongest associations ranked by P-value (target FDR = 0.05). (B) Relating host transcriptome and microbial taxa in inflammatory bowel disease (IBD) patients. We applied HAllA to identify associations between the human gut microbiome and transcriptome in 204 patients receiving IPAA surgeries (Morgan ). Block associations are numbered in descending order of significance based on best P-values in each block with each numbered block corresponding to a group of coexpressed transcripts related to a group of co-occurring microbial taxa (OTUs).
Fig. 6. Non-linear relationships detected between RNA and protein expression in a breast cancer cohort. By using an association metric sensitive to non-linear relationships (XICOR), HAllA detects U-shaped relationships between FOXC1 RNA expression and the protein expression of three genes. Overlaying the PAM50 subtype reveals that the U-shapes seem to emerge from a varying response to increased FOXC1 RNA expression by subtype. This effect seems to have gone unnoticed in the literature, thus demonstrating the ease with which HAllA can aid in the discovery of complicated relationships that might be missed otherwise.
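To see why a metric such as XICOR can pick up a U-shape that a rank correlation misses, the sketch below computes Chatterjee's xi coefficient next to Spearman's rho on a toy U-shaped series. It assumes no ties in either variable and uses the simple no-ties formulas; it is not the implementation used by HAllA, and the helper names and data are hypothetical.

```python
def ranks(values):
    """Ranks starting at 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def xi_correlation(x, y):
    """Chatterjee's xi (no-ties formula): 1 - 3 * sum|r[k+1] - r[k]| / (n^2 - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rank_y = ranks(y)
    # Walk through the y-ranks in order of increasing x.
    r = [rank_y[i] for i in sorted(range(n), key=lambda i: x[i])]
    total = sum(abs(r[k + 1] - r[k]) for k in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def spearman(x, y):
    """Spearman's rho (no-ties formula): 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy U-shape, shifted slightly off centre so that no two y values tie.
xs = list(range(-5, 6))
ys = [(v - 0.25) ** 2 for v in xs]
xi_u = xi_correlation(xs, ys)
rho_u = spearman(xs, ys)
```

On this toy series xi is clearly positive while Spearman's rho stays close to zero, which is the behaviour that makes a non-monotone relationship visible to the former and easy to miss with the latter.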