Joanne H Wang, Derek Pappas, Philip L De Jager, Daniel Pelletier, Paul IW de Bakker, Ludwig Kappos, Chris H Polman, Lori B Chibnik, David A Hafler, Paul M Matthews, Stephen L Hauser, Sergio E Baranzini, Jorge R Oksenberg.
Abstract
BACKGROUND: Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed.
Year: 2011 PMID: 21244703 PMCID: PMC3092088 DOI: 10.1186/gm217
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Demographic statistics of study participants
| Stratum^a | Discovery dataset (N = 8,844): Case (N = 2,124) | Discovery dataset: Control (N = 6,720) | Validation dataset^b (N = 3,606): Case (N = 1,618) | Validation dataset^b: Control (N = 1,988) |
|---|---|---|---|---|
| IMSGC UK, Affy 500K | 17.5% | 40.9% | - | - |
| IMSGC US, Affy 500K | 13.2% | 23.3% | - | - |
| BWH, Affy 6.0 | 32.2% | 23.9% | - | - |
| Gene MSA CH, Illumina 550K | 9.6% | 2.9% | - | - |
| Gene MSA NL, Illumina 550K | 8.9% | 3.1% | - | - |
| Gene MSA US, Illumina 550K | 18.6% | 5.9% | - | - |
| Male | 27.9% | 50.3% | 27.5% | 38.1% |
| Female | 72.1% | 49.7% | 72.5% | 61.9% |
| | 52.7% | 25.1% | 56.9% | 29.8% |
| | 47.3% | 74.9% | 43.1% | 70.2% |
^a Datasets described in [24]. In each pair of matched cases and controls, all subjects were genotyped on the same genome-wide platform. ^b Datasets described in [12], with 1,618 Australian and New Zealand cases (Illumina Hap370CNV) matched with 1,988 US controls (Illumina Infinium).
Estimated cumulative genetic risk using 12 validated multiple sclerosis genes^a
| Group | Probability of being an MS case: 25% quartile | Median | 75% quartile |
|---|---|---|---|
| Case (N = 2,062) | 0.228 | 0.379 | 0.589 |
| Control (N = 6,360) | 0.072 | 0.134 | 0.268 |
Classification results in the discovery dataset: sensitivity, 35.1%; specificity, 93.5%; accuracy rate, 63.8%; model fit analysis, P = 0.007. The Hosmer-Lemeshow goodness-of-fit test [30] was used to assess lack of fit of the selected model; P > 0.05 indicates no evidence of lack of fit. ^a HLA-DRB1, CD58, CLEC16A, EVI5, IL2RA, IRF8, RGS1, CD226, TNFRSF1A, CD6, GPC5 and IL7R.
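As a rough illustration of how sensitivity and specificity like those reported above follow from per-subject risk estimates, the sketch below thresholds P-Hat values and counts correct calls. The 0.5 cutoff is an assumption (the record does not state the cutoff used for these figures), the function names are hypothetical, and the P-Hat values are made up for the example rather than taken from the study.

```python
def classify(p_hat, threshold=0.5):
    """Call a subject a predicted case when P-Hat reaches the threshold."""
    return p_hat >= threshold


def sensitivity_specificity(p_hats, is_case, threshold=0.5):
    """Return (sensitivity, specificity) for estimated risks and true labels."""
    tp = fn = tn = fp = 0
    for p_hat, case in zip(p_hats, is_case):
        predicted_case = classify(p_hat, threshold)
        if case and predicted_case:
            tp += 1  # case correctly called a case
        elif case:
            fn += 1  # case missed
        elif predicted_case:
            fp += 1  # control wrongly called a case
        else:
            tn += 1  # control correctly called a control
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical P-Hat values (NOT study data): four cases, then four controls.
p_hats = [0.62, 0.38, 0.23, 0.81, 0.07, 0.13, 0.27, 0.55]
is_case = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(p_hats, is_case)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is one way to read the low-sensitivity, high-specificity pattern of the 12-gene model.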
Top significant markers (-Log10(p) > 6) after adjusting for DRB1*15:01 among the 700-independent-gene set
| rs ID | Position | Chrom. | Gene name | Allele 1 | Allele 2 | -Log10 p | OR | Lower CL | Upper CL |
|---|---|---|---|---|---|---|---|---|---|
| rs9268148 | 32367505 | 6 | | A | G | 13.13 | 0.58 | 0.50 | 0.67 |
| rs1611715 | 29937461 | 6 | | C | A | 11.49 | 0.74 | 0.68 | 0.81 |
| rs7772297 | 31436805 | 6 | | C | G | 9.14 | 1.40 | 1.26 | 1.56 |
| rs4939490 | 60550227 | 11 | | G | C | 9.00 | 1.30 | 1.19 | 1.42 |
| rs9275596 | 32789609 | 6 | | T | C | 7.85 | 0.76 | 0.69 | 0.84 |
| rs10244467 | 22584456 | 7 | | T | C | 7.23 | 0.57 | 0.47 | 0.70 |
| rs9596270 | 49740441 | 13 | | T | C | 7.08 | 1.56 | 1.31 | 1.85 |
| rs12025416 | 116750329 | 1 | | C | T | 6.83 | 0.69 | 0.59 | 0.80 |
| rs6836440 | 100405684 | 4 | | A | G | 6.74 | 0.68 | 0.58 | 0.79 |
| rs7137953 | 119357405 | 12 | | C | T | 6.47 | 0.77 | 0.70 | 0.85 |
| rs10846336 | 16413619 | 12 | | T | C | 6.43 | 0.42 | 0.30 | 0.59 |
| rs931555 | 35839334 | 5 | | C | T | 6.41 | 1.25 | 1.15 | 1.36 |
| rs10203141 | 179015804 | 2 | | C | G | 6.40 | 0.81 | 0.75 | 0.88 |
| rs2328523 | 20575342 | 6 | | G | A | 6.28 | 0.79 | 0.72 | 0.87 |
| rs4368946 | 98497864 | 8 | | T | C | 6.25 | 0.70 | 0.61 | 0.80 |
| rs3934035 | 281714 | 3 | | C | T | 6.23 | 0.46 | 0.34 | 0.62 |
| rs17062281 | 73654880 | 13 | | C | G | 6.13 | 0.44 | 0.31 | 0.61 |
| rs1356122 | 155666264 | 3 | | G | C | 6.13 | 1.26 | 1.14 | 1.40 |
| rs4447 | 31599694 | 22 | | T | C | 6.10 | 0.74 | 0.66 | 0.83 |
| rs655763 | 108682027 | 11 | | C | T | 6.03 | 1.59 | 1.32 | 1.92 |
| rs12419184 | 125561518 | 11 | | C | T | 6.03 | 0.72 | 0.63 | 0.82 |
Chrom., chromosome; lower CL, lower bound of the confidence interval; OR, odds ratio; upper CL, upper bound of the confidence interval.
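The columns of this table can be read with a few small conversions, sketched below: -Log10 p back to a plain p-value, a check that the confidence interval excludes an odds ratio of 1, and an approximate standard error of the log odds ratio. The last step assumes a 95% Wald-type interval that is symmetric on the log scale; the record does not state the confidence level, so treat `z_crit=1.96` and the helper names as assumptions for illustration only.

```python
import math


def p_from_neglog10(neglog10_p):
    """Convert a -log10(p) value back to a plain p-value."""
    return 10.0 ** (-neglog10_p)


def ci_excludes_one(lower_cl, upper_cl):
    """True when the interval lies entirely below or entirely above OR = 1."""
    return lower_cl > 1.0 or upper_cl < 1.0


def log_or_standard_error(lower_cl, upper_cl, z_crit=1.96):
    """Approximate SE of log(OR) from the interval bounds.

    ASSUMPTION: a 95% interval symmetric on the log scale (z_crit = 1.96);
    the confidence level is not given in the table.
    """
    return (math.log(upper_cl) - math.log(lower_cl)) / (2.0 * z_crit)


# Three rows copied from the table: (rs ID, -log10 p, OR, lower CL, upper CL).
rows = [
    ("rs9268148", 13.13, 0.58, 0.50, 0.67),
    ("rs4939490", 9.00, 1.30, 1.19, 1.42),
    ("rs655763", 6.03, 1.59, 1.32, 1.92),
]

for rs_id, neglog10_p, odds_ratio, lower_cl, upper_cl in rows:
    direction = "OR > 1" if odds_ratio > 1.0 else "OR < 1"
    print(
        f"{rs_id}: p = {p_from_neglog10(neglog10_p):.2e}, {direction}, "
        f"CI excludes 1: {ci_excludes_one(lower_cl, upper_cl)}, "
        f"SE(log OR) ~ {log_or_standard_error(lower_cl, upper_cl):.3f}"
    )
```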
Classification results using different genetic models
| Genetic model | Classification sensitivity | Classification specificity | P-Hat 25% quantile (case / control) | P-Hat 50% quantile (case / control) | P-Hat 75% quantile (case / control) |
|---|---|---|---|---|---|
| Discovery dataset (N = 8,844) | | | | | |
| 12 genes^a | 35.1% | 93.5% | 0.23 / 0.07 | 0.38 / 0.13 | 0.59 / 0.27 |
| 350 genes^b | 79.9% | 95.8% | 0.65 / 0.00 | 0.90 / 0.01 | 0.99 / 0.06 |
| Validation dataset (N = 3,606) | | | | | |
| 12 genes^a | 54.3% | 74.0% | 0.36 / 0.30 | 0.53 / 0.36 | 0.63 / 0.51 |
| 350 genes^b | 62.3% | 75.9% | 0.41 / 0.19 | 0.59 / 0.32 | 0.74 / 0.49 |
^a The 12-gene set includes HLA-DRB1 and 11 additional validated susceptibility genes. ^b The 350-gene set includes HLA-DRB1 and 349 additional genes identified in the genetic profile.
Figure 1. ROC curves of different genetic models using the discovery dataset (N = 8,844). Stepwise selection from the 700-gene list yielded gene sets of different sizes for the predictive model: 255 genes (P = 0.01), 350 genes (P = 0.05), and 391 genes (P = 0.10).
Figure 2. ROC curves of different genetic models using the validation dataset (N = 3,606). Models were fitted by logistic regression with forward selection; the 350 genetic markers were entered into the model by rank of significance.
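For readers unfamiliar with how an ROC curve is summarized as a single number, the area under the curve equals the probability that a randomly chosen case receives a higher risk estimate than a randomly chosen control. The sketch below computes it by direct pairwise comparison. It is a generic illustration, not the authors' procedure: `roc_auc` is a hypothetical helper and the scores are made-up values.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; tied pairs count as half.
    """
    wins = 0.0
    for case_score in case_scores:
        for control_score in control_scores:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk estimates (NOT study data).
case_scores = [0.9, 0.8, 0.4]
control_scores = [0.5, 0.3, 0.2]

print(f"AUC = {roc_auc(case_scores, control_scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for datasets as large as those in this study.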
Clinical and demographic characteristics of various genetic-load groups
| Clinical and demographic variables | Genetic-load group by level of estimated cumulative genetic risk: High (P-Hat ≥ 0.95) | Medium (P-Hat = 0.75-0.95) | Low (P-Hat = 0.5-0.75) | Misclassified (P-Hat < 0.5) | Test |
|---|---|---|---|---|---|
| Sample size, N (%) | 383 (39.6%) | 313 (32.3%) | 142 (14.7%) | 130 (13.4%) | |
| MSSS (least-square mean)^a | 1.77 | 1.82 | 1.83 | 1.81 | F = 0.41, |
| T2-lesion load (mm³) (least-square mean)^b | 15.41 | 15.40 | 14.32 | 15.81 | F = 0.98, |
| Age of disease onset (years) | 33.81 | 33.55 | 33.18 | 35.90 | F = 2.71, |
| | 242 (63.2%) | 146 (46.7%) | 51 (35.9%) | 31 (23.9%) | χ² = 74.13^e |
| | 141 (36.8%) | 167 (53.4%) | 91 (64.1%) | 99 (76.1%) | |
| Female, N (%) | 285 (74.4%) | 206 (65.8%) | 85 (59.9%) | 68 (52.3%) | χ² = 25.41^e |
| Male, N (%) | 98 (25.6%) | 107 (34.2%) | 57 (40.1%) | 62 (47.7%) | |
^a MSSS (Multiple Sclerosis Severity Score [44]) after square-root transformation to meet the normality assumption. ^b T2-lesion volumes after cube-root transformation to meet the normality assumption. ^c ANCOVA test result, with 'age of disease onset' and gender as covariates. ^d ANCOVA test result, with gender as covariate. ^e Chi-square test result.
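The chi-square result for gender can be checked from the counts in the table. The sketch below computes Pearson's chi-square statistic for the 2 x 4 table of female and male counts across the four genetic-load groups. It assumes the reported value is an uncorrected Pearson statistic, and `pearson_chi_square` is a hypothetical helper written for this illustration.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` is a list of rows; expected counts come from the row and column
    totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Counts from the table: High, Medium, Low, Misclassified groups.
gender_by_group = [
    [285, 206, 85, 68],  # Female
    [98, 107, 57, 62],   # Male
]

chi2, dof = pearson_chi_square(gender_by_group)
print(f"chi-square = {chi2:.2f}, df = {dof}")
```

With these counts the statistic should land close to the χ² = 25.41 reported in the table, on 3 degrees of freedom.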
Functional annotation of the 350 genes
| Gene Ontology^a | DAVID^b |
|---|---|
| Cell adhesion (GO:0007155) | 0.0000148 |
| Cell communication | 0.000632 |
| G-protein signaling, coupled to cyclic nucleotide second messenger | 0.001940^c |
| System development (GO:0048731) | 0.000000016 |
| Central nervous system development | 0.000293^c |
| Organ development (GO:0048513) | 0.000017 |
| Integral to membrane (GO:0016021) | 0.0000018 |
| Integral to plasma membrane (GO:0005887) | 0.000000026 |
| Dystrophin-associated glycoprotein complex | 0.002081^c |
| Sarcoglycan complex | 0.004398^c |
| Signal transducer activity (GO:0004871) | 0.0000025 |
| Transmembrane receptor activity (GO:0004888) | 0.0000274 |
| Transmembrane receptor protein phosphatase activity | 0.003811^c |
| Amine receptor activity | 0.004557^c |
| Hematopoietin/interferon-class (D200-domain) cytokine receptor activity | 0.001526^c |
| Phosphoinositide binding | 0.000737^c |
| GPI anchor binding | 0.003257^c |
| Calcium-release channel activity | 0.004102^c |
| Delayed rectifier potassium channel activity | 0.001212^c |
| | PE^b |
| Cell adhesion molecules (CAMs) | 0.00000036 |
| Neuroactive ligand-receptor interaction | 0.000542 |
| Allograft rejection | 0.001545 |
| Type I diabetes mellitus | 0.003487 |
^a Only significant Gene Ontology levels 4 or higher are indicated for clarity. ^b P-value correction: DAVID, Benjamini; Pathway Express (PE), FDR. ^c Analysis results using GOTree Machine [32]. KEGG, Kyoto Encyclopedia of Genes and Genomes.
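The footnote on P-value correction names the Benjamini method for DAVID. Assuming this refers to the Benjamini-Hochberg step-up procedure, the sketch below shows how raw p-values from many enrichment tests would be adjusted. The function name and the raw p-values are hypothetical; the values in the table are already corrected and are not reused here.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four annotation terms (NOT study data).
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f} -> adjusted p = {q:.3f}")
```

A term is then called significant when its adjusted value falls below the chosen false discovery rate.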
The percentage of variance (R²) explained by predictors in the regression model
| Dataset | Center | Gender | | 12 genes^a | 350 genes^b |
|---|---|---|---|---|---|
| Discovery dataset (n = 8,844) | 15% | 4% | 7% | 10% | 57% |
| Validation dataset (n = 3,606) | NA | 2% | 9% | 11% (AUC^c = 0.68) | 27% (AUC^c = 0.769) |
^a The 12-gene set includes HLA-DRB1 and 11 additional validated genes. ^b The 350-gene set includes HLA-DRB1 and 349 additional genes identified in the genetic profile. ^c AUC, area under the curve from the ROC analysis.
Figure 3. Distribution of the estimated cumulative genetic risk (P-Hat) of case and control groups using the 12-gene set and 350-gene set in the validation dataset. P-Hat is the estimated cumulative genetic risk (the probability of being an MS case). With the 350-gene set, the median cumulative genetic risk (50% quantile) is 0.59 in the case group and 0.32 in the control group. The genetic profile produced a significant difference in P-Hat between the case and control groups.