Peilin Jia, Ruifeng Hu, Fangfang Yan, Yulin Dai, Zhongming Zhao.
Abstract
BACKGROUND: The rapid accumulation of single-cell RNA sequencing (scRNA-seq) data presents unique opportunities to decode the genetically mediated cell-type specificity of complex diseases. Here, we develop a new method, scGWAS, which effectively leverages scRNA-seq data to achieve two goals: (1) to infer the cell types in which disease-associated genes manifest and (2) to construct cellular modules that imply disease-specific activation of different processes.
Keywords: Cell-type specificity; Complex diseases; GWAS; Single-cell RNA sequencing; scGWAS; scRNA-seq
Year: 2022 PMID: 36253801 PMCID: PMC9575201 DOI: 10.1186/s13059-022-02785-w
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Summary of complex traits/diseases and implicated tissues
| Trait/disease name | Abbr | Year | # samples | Implied tissue | p-value |
|---|---|---|---|---|---|
| Alcohol use disorder | AUD | 2019 | 202,004 | Brain—cerebellum | 0.08 |
| Alzheimer’s disease | AD | 2018 | 455,258 | Whole blood | 9.94 × 10⁻⁹ |
| Amyotrophic lateral sclerosis | ALS | 2016 | 36,052 | Heart—atrial appendage | 0.28 |
| Anxiety, anxiety-continuous | ANX | 2016 | 18,186 | Brain—spinal cord (cervical c-1) | 0.10 |
| Anxiety tension-special-factor-of-neuroticism | ANEU | 2019 | 270,059 | Brain—anterior cingulate cortex (BA24) | 6.49 × 10⁻³ |
| Asthma | Asthma | 2017 | 127,669 | Spleen | 9.71 × 10⁻³ |
| Attention-deficit-hyperactivity disorder | ADHD | 2017 | 53,293 | Brain—anterior cingulate cortex (BA24) | 9.05 × 10⁻⁴ |
| PGC Autism, Autism-Europeans | ASD | 2017 | 13,574 | Heart—atrial appendage | 0.40 |
| Bipolar disorder | BD | 2018 | 51,710 | Brain—anterior cingulate cortex (BA24) | 6.85 × 10⁻⁴ |
| Blood lipids, high-density lipoprotein | HDL | 2010 | 99,900 | Liver | 6.62 × 10⁻⁴ |
| Blood lipids, low-density lipoprotein | LDL | 2010 | 95,454 | Liver | 2.11 × 10⁻⁸ |
| Blood lipids, total cholesterol | TC | 2010 | 100,184 | Liver | 6.91 × 10⁻¹⁰ |
| Blood lipids, triglycerides | TG | 2010 | 96,598 | Liver | 1.31 × 10⁻⁸ |
| Body mass index | BMI | 2015 | 234,069 | Colon—sigmoid | 0.02 |
| CAD resting heart rate | CAD_RHR | 2016 | 265,046 | Liver | 4.80 × 10⁻³ |
| Coronary artery disease | CAD | 2017 | 63,731 | Artery—aorta | 0.01 |
| Depressive symptoms | DS | 2019 | 181,045 | Adrenal gland | 0.18 |
| Educational attainment, education years all | EDU | 2016 | 293,723 | Brain—frontal cortex (BA9) | 3.31 × 10⁻⁴ |
| General factor of neuroticism | GNEU | 2019 | 270,059 | Brain—nucleus accumbens (basal ganglia) | 3.10 × 10⁻⁵ |
| Heart failure | HF | 2018 | 394,156 | Artery—aorta | 0.19 |
| Height | Height | 2014 | 253,288 | Artery—tibial | 0.02 |
| Internalizing problems | IP | 2014 | 4,596 | Adrenal gland | 0.44 |
| Lipoprotein concentrations, HDL | LIP_HDL | 2009 | 19,840 | Adipose—visceral (omentum) | 0.03 |
| Lung function, FEV1/FVC | FEV1 | 2019 | 316,614 | Artery—tibial | 1.52 × 10⁻⁵ |
| Lung function, FVC | FVC | 2019 | 317,222 | Colon—sigmoid | 1.04 × 10⁻³ |
| Major depressive disorder | MDD | 2018 | 42,455 | Brain—anterior cingulate cortex (BA24) | 0.04 |
| Multiple sclerosis | MS | 2018 | 41,505 | Spleen | 9.03 × 10⁻¹⁵ |
| Neuroticism | NEU | 2019 | 523,783 | Brain—cerebellar hemisphere | 9.18 × 10⁻³ |
| Obsessive–compulsive disorder | OCD | 2017 | 9,725 | Brain—cerebellum | 0.05 |
| Pancreatic cancer | PanCan | 2009 | 3,576 | Adipose—subcutaneous | 0.73 |
| Parkinson’s disease | PD | 2012 | 8,477 | Brain—cerebellum | 3.65 × 10⁻⁴ |
| Resting heart rate | RHR | 2019 | 458,969 | Heart—atrial appendage | 9.27 × 10⁻¹⁷ |
| Rheumatoid arthritis | RA | 2014 | 58,284 | Spleen | 2.80 × 10⁻¹⁰ |
| Schizophrenia | SCZ | 2018 | 74,626 | Brain—frontal cortex (BA9) | 6.60 × 10⁻³ |
| SSGAC College | COL | 2013 | 101,069 | Brain—cerebellar hemisphere | 2.76 × 10⁻³ |
| Subjective wellbeing | SWB | 2016 | 298,420 | Brain—amygdala | 0.20 |
| Type 2 diabetes | T2D | 2017 | 159,208 | Brain—spinal cord (cervical c-1) | 0.10 |
| Type 1 diabetes | T1D | 2011 | 26,890 | Spleen | 8.64 × 10⁻⁹ |
| Type 1 diabetes, childhood adiposity age under17 | T1D_C | 2017 | 14,741 | Spleen | 7.61 × 10⁻¹⁰ |
| Waist format 2: Waist hip ratio | WHR | 2015 | 143,480 | Esophagus—Muscularis | 0.04 |
PGC: Psychiatric Genomics Consortium
Fig. 1 Analysis framework to decode trait-associated tissues and cell types. A Illustration of the integration of GWAS and cell-type expression data at the cellular level. B Tissue-specific enrichment analysis of the traits. The color reflects the significance level [−log10(pBH)]. C Demonstration of the proportional test. D Demonstration of a case showing an association between GWAS and a cell-type transcriptome (left) and another case without such an association (right). In each plot, a dot represents a module, with its GWAS-based score on the x-axis and its scRNA-seq-based score on the y-axis. Gray dots are random modules from the randomization process. Green and red dots are modules from the real data; red dots indicate significant modules. Cyan and red circles indicate the 95% confidence interval (CI) of the random modules and the actual modules, respectively. The horizontal and vertical dashed lines indicate nominal significance (z = 1.96). E Illustration of disease subnetworks and the enrichment results for their component genes in each of the two heterogeneous data sets
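Panel B colors tissues by −log10(pBH). Assuming pBH denotes Benjamini–Hochberg-adjusted p-values (the usual meaning of the subscript; the caption does not define it), the adjustment and the color-scale transform can be sketched as below. The p-values are hypothetical and are not taken from the paper.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical tissue-level p-values for one trait (illustrative only).
p_raw = [0.001, 0.02, 0.03, 0.5]
p_bh = benjamini_hochberg(p_raw)
heat = [-math.log10(p) for p in p_bh]  # the quantity mapped to color in panel B
print([round(p, 4) for p in p_bh])  # [0.004, 0.04, 0.04, 0.5]
```

Walking from the largest p-value downward with a running minimum is what keeps the adjusted values monotone in rank, so a smaller raw p-value never receives a larger adjusted one.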
Fig. 2 Illustration of the scGWAS method. A–J Illustration of the normalization process. A Distribution of gene-based p-values calculated from GWAS summary statistics, using bipolar disorder as an example. B Estimation of lambda values in the Box-Cox transformation. C QQ plot of the original gene-based p-values from GWAS. D QQ plot of the Box-Cox-transformed gene-based p-values. E QQ plot of the Box-Cox-transformed and calibrated gene-based p-values. F–J Distribution and QQ plots of gene expression for the astrocyte cell type in the DER20 panel, in the same order as A–E. K, L Illustration using seven cell types from the DER22 panel to show the impact of the penalty factor: the top panel shows modules identified with the penalty factor (K) and the bottom panel shows modules identified without it (L). A full comparison using all cell types can be found in Additional file 1: Fig. S2. M Comparison of normalization methods for module scores. For each cell type, three module score distributions are shown: the raw module score, the permutation-based z-score, and the z-score based on size-matched random modules from the virtual search process. More comparison examples can be found in Additional file 1: Fig. S3
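Panels A–E describe normalizing gene-based p-values with a Box-Cox transformation whose lambda is estimated from the data, followed by a calibration step. The caption does not say how lambda is fitted or what the calibration does, so the sketch below is an assumption throughout: it picks lambda by maximizing the standard Box-Cox profile log-likelihood over a grid and, as a stand-in for calibration, standardizes the transformed values to mean 0 and SD 1. The function names are hypothetical, and the input must be strictly positive.

```python
import math
import random
import statistics


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def profile_loglik(values, lam):
    """Box-Cox profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    transformed = [box_cox(v, lam) for v in values]
    var = statistics.pvariance(transformed)  # maximum-likelihood variance (divides by n)
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_lambda(values, lo=-2.0, hi=2.0, steps=401):
    """Pick lambda by grid search over [lo, hi] (step 0.01 with the defaults)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda lam: profile_loglik(values, lam))


def normalize(values):
    """Box-Cox transform, then standardize to mean 0 and SD 1 (assumed calibration)."""
    lam = fit_lambda(values)
    t = [box_cox(v, lam) for v in values]
    mu, sd = statistics.fmean(t), statistics.pstdev(t)
    return lam, [(v - mu) / sd for v in t]


# Toy right-skewed input: log-normal draws, for which the fitted lambda should be near 0.
rng = random.Random(0)
sample = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(1000)]
lam, z = normalize(sample)
print(f"estimated lambda = {lam:.2f}")
```

A grid search keeps the sketch dependency-free; a numerical optimizer would give a finer lambda estimate.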
Fig. 3 Illustration of the scGWAS results. A Heatmap of all scGWAS results across 18 scRNA-seq panels and 40 traits. B, C Module score distributions for schizophrenia (B) and major depressive disorder (C) in cell types from the DER20 panel. The last plot shows the scale of the axes: normalized module score from scRNA-seq on the y-axis and normalized module score from GWAS on the x-axis. In each panel, the red circle indicates the 95% confidence interval (CI) estimated using the random modules and the blue circle indicates the 95% CI estimated using the real modules. Significant modules are highlighted in red; all other modules, including non-significant modules from real data and all random modules from the virtual runs, are not plotted for simplicity
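The normalized module scores on both axes are z-scores relative to size-matched random modules, and Fig. 1D marks nominal significance at z = 1.96. In scGWAS the random modules come from a virtual search process. The sketch below swaps that for plain random gene sets of the same size, a simpler stand-in. It also assumes two things the captions do not state: a module score is the mean per-gene score, and a module is flagged only when both z-scores exceed 1.96. All names are hypothetical.

```python
import random
import statistics


def module_score(genes, gene_scores):
    """Hypothetical module score: the mean per-gene score of the module's genes."""
    return statistics.fmean(gene_scores[g] for g in genes)


def null_scores(gene_scores, size, n_random=1000, seed=0):
    """Scores of random gene sets with the same size as the real module."""
    rng = random.Random(seed)
    pool = list(gene_scores)
    return [module_score(rng.sample(pool, size), gene_scores) for _ in range(n_random)]


def module_z(observed, random_scores):
    """Normalize an observed score against its size-matched null distribution."""
    return (observed - statistics.fmean(random_scores)) / statistics.stdev(random_scores)


def is_significant(z_gwas, z_expr, threshold=1.96):
    """Assumed rule: both normalized scores must pass the nominal z threshold."""
    return z_gwas > threshold and z_expr > threshold


# Toy data: 100 genes with standard-normal scores; the "module" is the top 10 genes.
rng = random.Random(1)
gene_scores = {f"gene{i}": rng.gauss(0.0, 1.0) for i in range(100)}
module = sorted(gene_scores, key=gene_scores.get, reverse=True)[:10]
z = module_z(module_score(module, gene_scores), null_scores(gene_scores, size=10))
print(f"module z = {z:.2f}")
```

Matching the null sets to the module size matters because the spread of a mean score shrinks as the module grows. Without size matching, large modules would look more (or less) extreme simply because of their size.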
Fig. 4 Independent validation of scGWAS results. A Validation of module genes using pLI, ClinVar, and OMIM annotations. For each scRNA-seq panel, the forest plot compares three gene sets (the scGWAS-identified module gene set, the GWAS-promoted set, and the expression-promoted set) using the mean odds ratio (OR) with the 25–75% range. The values on the left of each plot are the mean OR (25–75% OR) for module genes identified by scGWAS. All: module gene sets from all trait and cell-type associations. All2: module gene sets with ≥ 20 genes. Results are also shown for module genes identified using GWAS data only (GWASonly). B Cross-panel validation of the trait and cell-type associations. Brain_DER20_N, Brain_DER20_A, and Brain_DER20_M are short for the neuron, astrocyte, and microglia cell types in the DER20 panel. Brain_DER20_E and Brain_DER20_I refer to the excitatory and inhibitory neurons in the DER20 panel. Lung10x_B, Lung10x_T, Lung10x_M, and Lung10x_DC refer to the B cells (including subtypes), T cells, macrophages, and dendritic cells in the Lung10x panel
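Panel A summarizes how strongly module genes are enriched in an annotation set (for example, high-pLI, ClinVar, or OMIM genes) as a mean OR with a 25–75% range across gene sets. The caption does not give the 2×2 layout, the background gene set, or how zero counts are handled. The sketch below uses a standard 2×2 odds ratio against a background set, adds a hypothetical 0.5 pseudocount, and summarizes across gene sets with the mean and quartiles.

```python
import statistics


def odds_ratio(module_genes, annotated_genes, background, pseudocount=0.5):
    """2x2 odds ratio for overlap between a module and an annotation set."""
    module = set(module_genes) & background
    annotated = set(annotated_genes) & background
    a = len(module & annotated)                # in module, annotated
    b = len(module - annotated)                # in module, not annotated
    c = len(annotated - module)                # not in module, annotated
    d = len(background - module - annotated)   # in neither
    # The pseudocount (Haldane-Anscombe style) keeps the ratio finite when a cell is 0.
    return ((a + pseudocount) * (d + pseudocount)) / ((b + pseudocount) * (c + pseudocount))


def summarize(odds_ratios):
    """Mean OR with the 25th and 75th percentiles across module gene sets."""
    q1, _, q3 = statistics.quantiles(odds_ratios, n=4)
    return statistics.fmean(odds_ratios), q1, q3


# Toy example: 100 background genes, 20 of them annotated (a hypothetical gene list).
background = set(range(100))
annotated = set(range(20))
module = set(range(10)) | set(range(50, 60))
print(odds_ratio(module, annotated, background, pseudocount=0.0))  # 7.0
```

In the toy case, 10 of the 20 module genes are annotated, compared with 10 of the 80 non-module genes. The odds are therefore 1 versus 1/7, which gives an OR of 7.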
Fig. 5 Investigation of the correlation among traits based on module genes. A Trait-trait correlation based on shared module genes across significantly enriched cell types. B Bubble plot of the trait-trait correlation in each cell type. Each dot represents a trait-trait correlation in a particular cell type, and its size is proportional to the Jaccard Index (JI) of the module genes
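The Jaccard Index sizing the dots in panel B (and in Fig. 7C) is the number of module genes two traits share divided by the number in their union. The sketch below computes it for every trait pair. The gene sets are hypothetical placeholders, and how the authors turn overlap into the correlation shown in panel A is not described here.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard Index |A ∩ B| / |A ∪ B| of two gene sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(modules_by_trait):
    """Jaccard Index for every unordered pair of traits."""
    traits = sorted(modules_by_trait)
    return {(t1, t2): jaccard(modules_by_trait[t1], modules_by_trait[t2])
            for t1, t2 in combinations(traits, 2)}


# Hypothetical module genes for three traits in one cell type (placeholder gene names).
modules = {
    "SCZ": {"G1", "G2", "G3"},
    "BD": {"G2", "G3", "G4"},
    "MDD": {"G5"},
}
for pair, ji in pairwise_jaccard(modules).items():
    print(pair, round(ji, 2))
```

Here SCZ and BD share two of four distinct genes (JI = 0.5), while MDD shares none with either trait.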
Fig. 6 Trait-cell type association using the pancreas panels. A Association results using the five pancreas panels. B–D Demonstration and comparison of three subnetworks: T1D in the beta cell from the GSE85241 panel (B), T2D in the beta cell from the E-MTAB-5061 panel (C), and T2D in the beta cell from the GSE85241 panel (D). In all networks, node color is proportional to the corresponding GWAS signal and node size is proportional to the average gene expression in the corresponding cell type
Fig. 7 Trait-cell type association using the liver panel. A Heatmap of the identified trait-cell type associations in the liver panel. B Distribution of module scores using selected trait and cell type pairs as examples. In each panel, the cyan circle indicates the 95% confidence interval (CI) estimated using the random modules and the red circle indicates the 95% CI estimated using the real modules. Significant modules are highlighted in red; all other modules, including non-significant modules from real data and all random modules from the virtual runs, are plotted as gray dots. C Trait-trait correlation based on shared module genes in the Hep_1 cell. The size of each red dot is proportional to the Jaccard Index between a pair of traits, computed from their module genes identified in Hep_1. D Demonstration of subnetworks for AUD, CAD, CAD-RHR, and HDL. Note that the AUD network was constructed using all genes identified in Hep cells, whereas the networks for CAD, CAD-RHR, and HDL were constructed using genes identified in a specific Hep cell type. In all networks, node color is proportional to the corresponding GWAS signal and node size is proportional to the average gene expression in the corresponding Hep cells
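Several captions (Figs. 1D, 3, and 7B) draw 95% "circles" around the random and real modules in the plane of GWAS score versus scRNA-seq score, without saying how they are built. One common construction, assumed here and not necessarily the authors', is a bivariate-normal ellipse. A point falls inside when its squared Mahalanobis distance from the cloud's mean is at most the 95% quantile of a chi-square distribution with 2 degrees of freedom. Function names and the null cloud are hypothetical.

```python
import math
import random


def mean_and_cov(points):
    """Sample mean and covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# For 2 degrees of freedom the chi-square CDF is 1 - exp(-q / 2),
# so the 95% quantile is exactly -2 ln(0.05), about 5.991.
CHI2_2DF_95 = -2.0 * math.log(0.05)


def inside_95_region(point, reference_points):
    """True if point lies inside the 95% normal-theory ellipse of the reference cloud."""
    mean, cov = mean_and_cov(reference_points)
    return mahalanobis_sq(point, mean, cov) <= CHI2_2DF_95


# Hypothetical random modules: (GWAS z, scRNA-seq z) pairs drawn from a standard normal null.
rng = random.Random(2)
null = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
print(inside_95_region((0.5, 0.5), null), inside_95_region((3.0, 3.0), null))
```

The 2-degree-of-freedom case has a closed-form quantile, so no statistics library is needed. In higher dimensions, the quantile would have to come from a chi-square inverse CDF.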
Fig. 8 Trait-cell type association using the lung panels. A Heatmap of the identified trait-cell type associations in the Lung10x and Madissoon_Lung panels. B Cluster of cell types from different panels for MS. C Heatmap of genes that were frequently identified in different traits in the macrophage cells or the dendritic cells
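Panel C collects genes that recur across traits within one cell type. The caption gives neither the frequency cutoff nor the values shown in the heatmap. The sketch below assumes a gene counts as "frequent" when it appears in the modules of at least `min_traits` traits (a hypothetical parameter) and builds a gene-by-trait 0/1 presence matrix that could be rendered as such a heatmap. The trait-to-gene assignments are made up.

```python
from collections import Counter


def frequent_gene_matrix(modules_by_trait, min_traits=2):
    """Genes found in at least min_traits traits, with a gene x trait 0/1 matrix."""
    # Count each gene once per trait, then keep genes shared by enough traits.
    counts = Counter(g for genes in modules_by_trait.values() for g in set(genes))
    genes = sorted(g for g, c in counts.items() if c >= min_traits)
    traits = sorted(modules_by_trait)
    matrix = [[1 if g in modules_by_trait[t] else 0 for t in traits] for g in genes]
    return genes, traits, matrix


# Hypothetical macrophage module genes for three immune-related traits.
macrophage_modules = {
    "MS": {"G1", "G2"},
    "RA": {"G2", "G3"},
    "Asthma": {"G2", "G3", "G4"},
}
genes, traits, matrix = frequent_gene_matrix(macrophage_modules)
for g, row in zip(genes, matrix):
    print(g, dict(zip(traits, row)))
```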