Conor Smyth¹, Iva Špakulová¹, Owen Cotton-Barratt¹, Sajjad Rafiq², William Tapper³, Rosanna Upstill-Goddard³, John L Hopper⁴, Enes Makalic⁴, Daniel F Schmidt⁴, Miroslav Kapuscinski⁴, Jörg Fliege¹, Andrew Collins³, Jacek Brodzki¹, Diana M Eccles², Ben D MacArthur⁵.
Abstract
Many common diseases have a complex genetic basis in which large numbers of genetic variants combine with environmental factors to determine risk. However, quantifying such polygenic effects has been challenging. To address these difficulties we developed a global measure of the information content of an individual's genome relative to a reference population, which may be used to assess differences in global genome structure between cases and appropriate controls. Informally, this measure, which we call relative genome information (RGI), quantifies the relative "disorder" of an individual's genome. To test its ability to predict disease risk, we used RGI to compare single-nucleotide polymorphism (SNP) genotypes from two independent samples of women with early-onset breast cancer against three independent sets of controls. We found that RGI was significantly elevated in both sets of breast cancer cases in comparison with all three sets of controls, with disease risk rising sharply with RGI. Furthermore, these differences were not due to associations with common variants at a small number of disease-associated loci, but rather to the combined associations of thousands of markers distributed throughout the genome. Our results indicate that the information content of an individual's genome may be used to measure the risk of a complex disease, and suggest that early-onset breast cancer has a strongly polygenic component.
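The abstract defines RGI only informally. As a rough illustration of what a per-locus information measure relative to a reference population can look like, the sketch below scores each SNP genotype by its self-information (surprisal, in bits) under Hardy-Weinberg proportions computed from reference-population allele frequencies, then averages over loci. This is an assumption for illustration only: the function names are hypothetical, and the published RGI/EIL definition may estimate genotype probabilities differently or handle linkage disequilibrium between markers.

```python
import math
from typing import Sequence


def genotype_probability(genotype: int, ref_allele_freq: float) -> float:
    """Probability of a genotype (0, 1 or 2 copies of the counted allele).

    Assumption: Hardy-Weinberg proportions using the allele frequency p
    observed in a reference population.
    """
    p = ref_allele_freq
    q = 1.0 - p
    if genotype == 0:
        return q * q
    if genotype == 1:
        return 2.0 * p * q
    if genotype == 2:
        return p * p
    raise ValueError("genotype must be 0, 1 or 2")


def information_per_locus(
    genotypes: Sequence[int], ref_freqs: Sequence[float], eps: float = 1e-6
) -> float:
    """Mean self-information (bits per locus) of one individual's genotypes.

    Hypothetical illustration of an 'information relative to a reference
    population' score; not the paper's exact formula.
    """
    if len(genotypes) != len(ref_freqs) or not genotypes:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for g, p in zip(genotypes, ref_freqs):
        # Clamp to eps so monomorphic reference loci do not give log(0).
        prob = max(genotype_probability(g, p), eps)
        total -= math.log2(prob)
    return total / len(genotypes)


if __name__ == "__main__":
    # Toy reference allele frequencies and two toy individuals.
    freqs = [0.1, 0.1, 0.5]
    common = [0, 0, 1]  # common genotypes at every locus
    rare = [2, 2, 1]    # rare homozygotes at the first two loci
    print(information_per_locus(common, freqs))
    print(information_per_locus(rare, freqs))
```

Under this sketch, rare genotypes contribute many more bits than common ones, so an individual carrying many individually rare genotypes across the genome scores higher. This matches the informal reading of RGI as relative "disorder" spread over thousands of markers, without implying that it is the authors' implementation.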
Keywords: breast cancer; information theory; polygenic disorder
Year: 2015 PMID: 26029704 PMCID: PMC4444159 DOI: 10.1002/mgg3.129
Source DB: PubMed Journal: Mol Genet Genomic Med ISSN: 2324-9269 Impact factor: 2.183
Overview of case and control data sets
| Data set | Samples (n) | Samples after QC (n) | Gender | Ethnicity | Genotyping platform |
|---|---|---|---|---|---|
| ABCFS cases | 204 | 201 | Female | Caucasian | Illumina 610-Quad SNP array |
| POSH cases | 574 | 536 | Female | Caucasian | Illumina 660-Quad SNP array |
| ABCFS control | 287 | 280 | Female | Caucasian | Illumina 610-Quad SNP array |
| NBS control | 2501 | 2501 | Both | Caucasian | Illumina 1.2M chip |
| 1958 control | 2699 | 2699 | Both | Caucasian | Illumina 1.2M chip |
Figure 1. Breast cancer risk is associated with increased genome-wide disorder. (A) Multidimensional scaling plot of all samples and HapMap2 populations genotyped for ∼133,000 SNPs. (B) Expected information per locus (EIL) for each of the different data sets. Medians with 95% confidence intervals are shown. (C) Matrix of FDR-adjusted P-values for comparisons of medians (two-sided Wilcoxon rank-sum test). (D) Q-Q plot of EIL in cases versus controls. The P-value from a two-sample Kolmogorov–Smirnov test is shown. (E) Estimated odds ratio as a function of EIL. (F) Median number of loci required to account for the differences in EIL observed between cases and controls, by percentile. The 95% confidence intervals are smaller than the markers and so are not shown.
Figure 2. Disorder is not localized to specific regions of the genome. (A) Expected information per locus (EIL) by chromosome. (B) EIL by SNP annotation. (C) EIL in males and females in the controls. In all panels, medians with 95% confidence intervals are shown. Stars indicate significant changes at FDR-adjusted P < 0.05 (one-sided Wilcoxon rank-sum test).
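Both figure captions report P-values adjusted for the false discovery rate (FDR), but they do not name the procedure. The sketch below assumes the Benjamini-Hochberg step-up adjustment, a common choice, written in plain Python; the function name is hypothetical and the P-values in the example are illustrative, not results from the study. The rank-sum tests themselves are available in SciPy as `scipy.stats.mannwhitneyu`, whose `alternative` argument selects a two-sided or one-sided test.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Assumption: the FDR adjustment used in the figures is Benjamini-Hochberg;
    the captions do not state which method was applied.
    """
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    # Starting at 1.0 caps every adjusted value at 1.
    running_min = 1.0
    # Walk from the largest P-value down so that
    # adjusted p_(k) = min over j >= k of p_(j) * m / j (monotone in rank).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative raw P-values, e.g. from pairwise data-set comparisons.
    raw = [0.01, 0.04, 0.03, 0.20]
    print(benjamini_hochberg(raw))
```

Adjusting all pairwise comparisons together, as in the matrix of panel C of Figure 1, controls the expected proportion of false positives among the comparisons declared significant rather than the chance of any single false positive.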