Jonas Carlsson Almlöf1, Andrei Alexsson2, Juliana Imgenberg-Kreuz3, Lina Sylwan3,4, Christofer Bäcklin3, Dag Leonard2, Gunnel Nordmark2, Karolina Tandre2, Maija-Leena Eloranta2, Leonid Padyukov5, Christine Bengtsson6, Andreas Jönsen7, Solbritt Rantapää Dahlqvist6, Christopher Sjöwall8, Anders A Bengtsson7, Iva Gunnarsson5, Elisabet Svenungsson5, Lars Rönnblom2, Johanna K Sandling3,2, Ann-Christine Syvänen3.
Abstract
Genome-wide association studies have identified risk loci for SLE, but a large proportion of the genetic contribution to SLE remains unexplained. To detect novel risk genes and to predict an individual's SLE risk, we designed a random forest classifier using SNP genotype data generated on the "Immunochip" from 1,160 patients with SLE and 2,711 controls. Using gene importance scores defined by the random forest classifier, we identified 15 potential novel risk genes for SLE. Of these, 12 are associated with autoimmune diseases other than SLE, whereas three genes (ZNF804A, CDK1, and MANF) have not previously been associated with autoimmunity. Random forest classification also allowed prediction of patients at risk for lupus nephritis with an area under the curve of 0.94. By allele-specific gene expression analysis we detected cis-regulatory SNPs that affect the expression levels of six of the top 40 genes identified by the random forest analysis, indicating a regulatory role for the identified risk variants. The top 40 genes from the prediction were enriched for differential expression between B and T cells according to RNA-sequencing of samples from five healthy donors, with over-expression in B cells more frequent than in T cells.
Year: 2017 PMID: 28740209 PMCID: PMC5524838 DOI: 10.1038/s41598-017-06516-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Prediction accuracy. Prediction accuracy measured by the area under the curve (AUC) using genotype data from the Immunochip. All data from 1,160 SLE patients and 2,711 controls were used to predict SLE disease status, both by random forests (RF) and by a risk score based on the single-SNP association analysis. The random forest classification was also applied to the subgroup of SLE patients diagnosed with lupus nephritis (n = 274) together with all control samples.
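The AUC used as the accuracy measure can be read as the probability that a randomly chosen patient receives a higher predicted score than a randomly chosen control. The following is a minimal sketch of that pairwise definition, not the authors' implementation; the function name `auc` and the toy scores and labels are assumptions made up for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores (higher means more likely a case)
    labels: 1 for cases, 0 for controls
    Ties between a case and a control count as half a correct ordering.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical toy data: three cases and three controls.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

# 8 of the 9 case-control pairs are ordered correctly.
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation of cases from controls.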
Top 40 risk genes for SLE identified by random forest prediction using Immunochip genotype data from SLE patients and controls.
| Predicted genes1 | Gene importance score4 | Association with autoimmune diseases in the GWAS catalog5 | Differential expression in B and T cells8,9 |
|---|---|---|---|
| | 51.8 | SLE, RA, KD, pSS | B > T*** |
| | 49.2 | SLE, IBD, UC, T1D, MS, CD, Psoriasis | B > T |
| | 39.8 | SLE, UC, IBD, RA, CD, pSS, Celiac, PBC | T > B*** |
| | 33.5 | SLE, RA, Celiac, Psoriasis | T > B |
| | 33.4 | New | B > T*** |
| | 33.0 | New | T > B |
| | 30.5 | SLE, IBD, CD | B > T*** |
| | 27.5 | IBD, UC, CD, AS | T > B |
| | 27.2 | SLE, IBD, Psoriasis | B > T |
| | 26.7 | SLE6, CD, IBD, MS | Low |
| | 26.6 | SLE, PBC, pSS | Low |
| | 25.9 | SLE, IBD, CD, UC | T > B |
| | 24.4 | IBD, CD, UC, AS, MS | T > B |
| | 23.9 | SLE, Vitiligo | T > B |
| | 23.1 | SLE, UC, IBD, RA | B > T*** |
| | 22.8 | SLE6,7, IBD, CD, T1D, RA, MS, Vitiligo | T > B |
| | 21.6 | SLE, CD, RA, Celiac, MS | Low |
| | 20.4 | SLE | B > T |
| | 20.1 | SLE6, Celiac, PBC, MS, pSS | Low |
| | 19.7 | SLE | B > T |
| | 18.2 | SLE | Low |
| | 17.8 | IBD, CD | B > T* |
| | 17.6 | IBD, CD, UC, RA | T > B* |
| | 17.2 | SLE6, IBD, CD, MS | B > T |
| | 16.7 | T1D | Low |
| | 16.7 | SLE | B > T |
| | 16.6 | SLE | T > B |
| | 16.4 | SLE6, CD, RA, Psoriasis | T > B |
| | 16.4 | IBD, UC, CD, AS | Low |
| | 16.1 | IBD, CD, UC | B > T*** |
| | 15.8 | IBD, CD | T > B |
| | 15.4 | SLE6, IBD, CD, MS | T > B |
| | 15.4 | SLE6, RA, MS | T > B |
| | 15.3 | IBD, CD, MS, Vitiligo, Psoriasis | Low |
| | 14.9 | CD, RA, PBC, Psoriasis | B > T |
| | 14.9 | SLE, RA | B > T*** |
| | 14.8 | IBD, CD | B > T |
| | 14.7 | New | Low |
| | 14.7 | UC, IBD | Low |
| | 14.6 | SLE, IBD, UC, RA, PBC, CD | B > T*** |
1 Human leukocyte antigen (HLA) genes not included. 2 Alternative candidate autoimmunity gene in the region reported in the GWAS catalog or functional studies. 3 Cis-regulatory SNPs significantly associated with allele-specific gene expression in B or T cells. 4 The random forest generates an importance score for each SNP based on its contribution to the prediction; the SNP scores are summed over a gene region to obtain the final gene importance score. 5 SLE = systemic lupus erythematosus, RA = rheumatoid arthritis, IBD = inflammatory bowel disease, CD = Crohn’s disease, T1D = type 1 diabetes mellitus, MS = multiple sclerosis, PBC = primary biliary cirrhosis, UC = ulcerative colitis, KD = Kawasaki disease, Celiac = celiac disease, AS = ankylosing spondylitis, pSS = primary Sjögren’s syndrome, New = previously unknown SLE risk gene. 6 Langefeld, C. D. et al. Transancestral mapping and genetic load in systemic lupus erythematosus, submitted manuscript. 7 Evidence of SLE association from the literature[44]. 8 Genes are annotated according to their expression level in B or T cells based on RNA-sequencing data. 9 Low = expression below 1 fragment per kilobase of exon per million fragments mapped (FPKM) in both cell types. * Bonferroni-corrected p-value < 0.05; *** Bonferroni-corrected p-value < 0.001.
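The gene importance score described in the table notes is a sum of per-SNP importance scores over a gene region. A minimal sketch of that aggregation step follows, assuming each SNP has already been assigned to at most one gene region; the function name `gene_importance`, the SNP identifiers, the gene labels, and the scores are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def gene_importance(snp_scores, snp_to_gene):
    """Sum per-SNP importance scores over gene regions.

    snp_scores:  dict mapping SNP id -> importance score from the classifier
    snp_to_gene: dict mapping SNP id -> gene region (SNPs outside any
                 annotated region are simply absent from this dict)
    Returns a list of (gene, summed score), highest score first.
    """
    totals = defaultdict(float)
    for snp, score in snp_scores.items():
        gene = snp_to_gene.get(snp)
        if gene is not None:  # skip SNPs that map to no gene region
            totals[gene] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs for illustration only.
toy_snp_scores = {"snp_a": 1.5, "snp_b": 2.0, "snp_c": 0.5, "snp_d": 3.0}
toy_snp_to_gene = {"snp_a": "GENE_A", "snp_b": "GENE_A", "snp_c": "GENE_B"}

ranking = gene_importance(toy_snp_scores, toy_snp_to_gene)
print(ranking)
```

Because the score is a sum, a gene region covered by many moderately informative SNPs can outrank a region with a single strongly associated SNP, which is one way a ranking of this kind can differ from a single-SNP p-value ranking.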
Top 40 risk genes for SLE identified by standard single-SNP association analysis using Immunochip genotype data from SLE patients and controls.
| Predicted genes1 | Association p-value | Association with autoimmune diseases in the GWAS catalog4 | Differential expression in B and T cells6,7 | Rank in random forest prediction8 |
|---|---|---|---|---|
| | 4.08E-24*** | SLE, UC, IBD, RA, pSS | B > T*** | 15 |
| | 1.76E-20*** | SLE, UC, IBD, RA, CD, pSS, Celiac, PBC | T > B*** | 3 |
| | 1.05E-14*** | SLE5, pSS | NA | 3473 |
| | 5.69E-11*** | SLE | Low | 265 |
| | 6.82E-9*** | IBD, CD, T1D | B > T** | 185 |
| | 1.79E-8** | SLE | B > T | 78 |
| | 3.24E-8** | SLE, UC, IBD, RA, CD, T1D, Psoriasis | B > T | 106 |
| | 5.41E-8** | T1D | T > B | 66 |
| | 7.77E-8* | CD, Celiac, AS | T > B | 85 |
| | 1.09E-7* | SLE | B > T | 18 |
| | 3.59E-7* | SLE, IBD, Psoriasis, pSS | B > T | 9 |
| | 4.99E-7 | SLE | B > T | 20 |
| | 5.28E-7 | SLE, CD, RA, Celiac, MS | Low | 17 |
| | 6.67E-7 | SLE | T > B | 27 |
| | 9.35E-7 | SLE, RA, KD, pSS | B > T*** | 1 |
| | 1.15E-6 | SLE5, CD, IBD, MS | Low | 10 |
| | 1.67E-6 | SLE, Vitiligo | T > B | 14 |
| | 2.18E-6 | SLE, IBD, CD, UC | NA | 12 |
| | 3.53E-6 | SLE, IBD, UC, T1D, MS, CD, Psoriasis | B > T | 2 |
| | 4.19E-6 | SLE, UC, CD, IBD, PBC | B > T | 566 |
| | 6.71E-6 | SLE, IBD, UC, T1D, CD | T > B | 122 |
| | 7.19E-6 | IBD, CD | Low | 263 |
| | 1.86E-5 | New | Low | 458 |
| | 1.89E-5 | New | T > B | 6 |
| | 1.99E-5 | SLE, IBD, UC, CD | Low | 119 |
| | 2.93E-5 | SLE, IBD, UC, T1D, Vitiligo, Psoriasis | B > T | 908 |
| | 3.19E-5 | CD | T > B | 83 |
| | 3.38E-5 | RA, T1D | T > B | 102 |
| | 3.90E-5 | SLE, IBD, CD | B > T*** | 7 |
| | 4.70E-5 | New | T > B* | 767 |
| | 6.00E-5 | IBD, CD, T1D, RA | T > B** | 149 |
| | 6.02E-5 | SLE, IBD, CD, RA, Celiac | T > B | 345 |
| | 6.77E-5 | SLE, RA, Celiac, Psoriasis | T > B | 4 |
| | 8.63E-5 | Psoriasis | Low | 2054 |
| | 9.36E-5 | SLE | B > T | 195 |
| | 9.81E-5 | New | Low | 392 |
| | 9.85E-5 | CD, T1D, RA, Vitiligo | B > T | 306 |
| | 9.97E-5 | T1D | B > T | 1098 |
| | 1.16E-4 | New | Low | 1475 |
| | 1.29E-4 | IBD, UC, CD, T1D, RA | T > B | 228 |
1 Human leukocyte antigen (HLA) genes not included. 2 Alternative candidate autoimmunity gene in the region reported in the GWAS catalog or functional studies. 3 Cis-regulatory SNP significantly associated with allele-specific gene expression in B or T cells. 4 SLE = systemic lupus erythematosus, RA = rheumatoid arthritis, IBD = inflammatory bowel disease, CD = Crohn’s disease, T1D = type 1 diabetes mellitus, MS = multiple sclerosis, PBC = primary biliary cirrhosis, UC = ulcerative colitis, KD = Kawasaki disease, Celiac = celiac disease, AS = ankylosing spondylitis, pSS = primary Sjögren’s syndrome, New = previously unknown SLE risk gene. 5 Langefeld, C. D. et al. Transancestral mapping and genetic load in systemic lupus erythematosus, submitted manuscript. 6 Genes are annotated according to their expression level in B or T cells based on RNA-sequencing data. 7 Low = expression below 1 FPKM in both cell types. 8 Gene ranking in the random forest prediction. * Bonferroni-corrected p-value < 0.05; ** Bonferroni-corrected p-value < 0.01; *** Bonferroni-corrected p-value < 0.001.
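The star annotations in the table mark Bonferroni-corrected significance levels. As a rough sketch of how such an annotation can be derived, the function below multiplies a raw p-value by the number of tests and maps the result to the three thresholds from the table notes. The source does not state how many tests were corrected for, so `N_TESTS_ASSUMED` is a placeholder chosen only for illustration, and `bonferroni_stars` is a hypothetical helper name.

```python
def bonferroni_stars(p_value, n_tests):
    """Return '', '*', '**' or '***' for a Bonferroni-corrected p-value.

    The corrected p-value is the raw p-value times the number of tests,
    capped at 1.0.
    """
    corrected = min(1.0, p_value * n_tests)
    if corrected < 0.001:
        return "***"
    if corrected < 0.01:
        return "**"
    if corrected < 0.05:
        return "*"
    return ""


# Placeholder only: the number of tested SNPs is NOT given in the source.
N_TESTS_ASSUMED = 130_000

# Raw p-values as they appear in the table above.
for p in (4.08e-24, 1.79e-8, 1.09e-7, 4.99e-7):
    print(f"{p:.2e}{bonferroni_stars(p, N_TESTS_ASSUMED)}")
```

With a different number of tests the boundaries between the star levels shift, so the placeholder should be replaced by the actual number of SNPs tested before drawing any conclusion.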
Figure 2. Overlapping genes. Overlap between the top 40 genes identified by the random forest prediction and the top 40 genes from the single-SNP association analysis.
Figure 3. Over-representation of genes with allele-specific expression (ASE) among disease-associated genes. Fold difference in the proportion of genes showing ASE in more than 80% of the individuals, comparing expressed predicted SLE genes and autoimmunity-associated genes with all other genes. In T cells, genes with ASE were significantly over-represented in all three gene sets: the top 40 genes from random forest classification (p = 0.0079), the top 40 genes from logistic regression (p = 0.015), and autoimmunity-associated genes (p < 0.0001). In B cells, the enrichment was significant for the autoimmunity-associated genes (p = 0.007).
Figure 4. Over-representation of differentially expressed genes. Over-representation of genes differentially expressed between B cells and T cells among the top 40 genes from the random forest prediction of SLE, at different significance cutoffs. The blue curve shows all genes; the green curve shows genes expressed at a higher level in B cells than in T cells; the yellow curve shows genes expressed at a higher level in T cells than in B cells.
Difference in expression between B cells and T cells for the 30 expressed genes among the top 40 genes from the random forest prediction.
| Fold difference in expression1 | B > X × T | T > X × B |
|---|---|---|
| X = 1 | 16 | 14 |
| X = 2 | 8 | 5 |
| X = 5 | 7 | 2 |
| X = 10 | 6 | 2 |
| X = 30 | 5 | 0 |
| Significant difference | 8 | 2 |
1 X is the fold difference in expression between the two cell types.
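The counts in this table can be reproduced from per-gene expression values by asking, for each threshold X, how many genes have B-cell expression above X times the T-cell expression, and vice versa. The sketch below assumes expression is given as FPKM pairs and that genes below 1 FPKM in both cell types are excluded as "Low", as in the table notes; the function name `fold_counts`, the gene labels, and the FPKM values are hypothetical.

```python
def fold_counts(fpkm, thresholds=(1, 2, 5, 10, 30), min_fpkm=1.0):
    """Count genes exceeding each fold-difference threshold X.

    fpkm: dict mapping gene -> (B-cell FPKM, T-cell FPKM)
    A gene is kept if it reaches min_fpkm in at least one cell type.
    Returns a list of (X, n genes with B > X*T, n genes with T > X*B).
    """
    expressed = [
        (b, t) for b, t in fpkm.values() if b >= min_fpkm or t >= min_fpkm
    ]
    rows = []
    for x in thresholds:
        b_higher = sum(1 for b, t in expressed if b > x * t)
        t_higher = sum(1 for b, t in expressed if t > x * b)
        rows.append((x, b_higher, t_higher))
    return rows


# Hypothetical FPKM values for illustration only.
toy_fpkm = {
    "GENE_A": (60.0, 1.5),  # 40-fold higher in B cells
    "GENE_B": (12.0, 4.0),  # 3-fold higher in B cells
    "GENE_C": (2.0, 9.0),   # 4.5-fold higher in T cells
    "GENE_D": (0.2, 0.4),   # below 1 FPKM in both cell types, excluded
    "GENE_E": (5.0, 5.0),   # equal expression, counted in neither column
}

for x, b_higher, t_higher in fold_counts(toy_fpkm):
    print(f"X = {x}: B > X*T in {b_higher} genes, T > X*B in {t_higher} genes")
```

The "Significant difference" row of the table is a separate statistical call and is not derived from a fold threshold, so it is not covered by this sketch.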