Joyce Y Tung, Chuong B Do, David A Hinds, Amy K Kiefer, J Michael Macpherson, Arnab B Chowdry, Uta Francke, Brian T Naughton, Joanna L Mountain, Anne Wojcicki, Nicholas Eriksson.
Abstract
While the cost and speed of generating genomic data have come down dramatically in recent years, the slow pace of collecting medical data for large cohorts continues to hamper genetic research. Here we evaluate a novel online framework for obtaining large amounts of medical information from a recontactable cohort by assessing our ability to replicate genetic associations using these data. Using web-based questionnaires, we gathered self-reported data on 50 medical phenotypes from a generally unselected cohort of over 20,000 genotyped individuals. Of a list of genetic associations curated by NHGRI, we successfully replicated about 75% of the associations that we expected to (based on the number of cases in our cohort and reported odds ratios, and excluding a set of associations with contradictory published evidence). Altogether we replicated over 180 previously reported associations, including many for type 2 diabetes, prostate cancer, cholesterol levels, and multiple sclerosis. We found significant variation across categories of conditions in the percentage of expected associations that we were able to replicate, which may reflect systematic inflation of the effects in some initial reports, or differences across diseases in the likelihood of misdiagnosis or misreport. We also demonstrated that we could improve replication success by taking advantage of our recontactable cohort, offering more in-depth questions to refine self-reported diagnoses. Our data suggest that online collection of self-reported data from a recontactable cohort may be a viable method for both broad and deep phenotyping in large populations.
Year: 2011 PMID: 21858135 PMCID: PMC3157390 DOI: 10.1371/journal.pone.0023473
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Replicated SNPs for binary traits.
Our log ORs and 95% confidence intervals are shown as black circles and lines. Published ORs are shown as blue Xs.
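As an illustration of the quantities plotted in Figure 1, the following is a minimal sketch of how a log odds ratio and its 95% confidence interval can be computed from allele counts using the standard Woolf (log-scale normal) approximation. The function name `log_or_with_ci` and the allele counts are hypothetical, and this excerpt does not state which statistical model the authors actually used, so the sketch should not be read as their method.

```python
import math
from statistics import NormalDist


def log_or_with_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, level=0.95):
    """Return (log OR, lower bound, upper bound) from a 2x2 allele-count table.

    Uses the Woolf approximation: the standard error of the log odds ratio
    is the square root of the sum of reciprocal cell counts.
    """
    log_or = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return log_or, log_or - z * se, log_or + z * se


# Hypothetical allele counts, for illustration only (not data from the paper).
log_or, lower, upper = log_or_with_ci(300, 700, 2000, 6000)
print(f"OR = {math.exp(log_or):.3f}, "
      f"95% CI = ({math.exp(lower):.3f}, {math.exp(upper):.3f})")
```

An interval on the log scale that excludes zero corresponds to an odds ratio interval that excludes 1.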
Figure 2. Success rate (versus total power) by disease class.
Replications = number of associations we successfully replicated. Expected = number of associations we expected to replicate. Attempts = number of associations we attempted to replicate. The blue dot represents our success ratio (number of successful replications divided by number of expected replications). The black line represents the 95% prediction interval for the success ratio. The nine associations that we had high power to detect but had known conflicting data were not included in this figure (see text and Table 1). Conditions assigned to each class (also see Methods S1): Asthma: childhood asthma; Autoimmune: Crohn's disease, inflammatory bowel disease, lupus, multiple sclerosis, psoriasis, type 1 diabetes, ulcerative colitis; Cancer: basal cell carcinoma, bladder cancer, breast cancer, colorectal cancer, prostate cancer, lung cancer, melanoma, pancreatic cancer, scleroderma, testicular cancer, thyroid cancer; Celiac: celiac disease; Diabetes: type 2 diabetes; Heart: blood clots, coronary artery disease, heart attack; Pigment/Hair: eye color, freckling, hair color, red hair color, male pattern baldness; Neuro: Alzheimer's disease, autism, Parkinson's disease; Other: chronic obstructive pulmonary disease, kidney stones, stroke, osteoarthritis; Psychiatric: alcohol abuse, bipolar disorder, schizophrenia.
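The success ratio and its prediction interval described in this caption can be sketched as follows. The sketch assumes that the expected number of replications is the sum of the per-association power estimates (as the phrase "versus total power" suggests) and that each association replicates independently with probability equal to its power. Both assumptions, the function names, and the power values are illustrative and are not taken from the paper.

```python
def success_ratio(replicated, powers):
    """Observed replications divided by expected replications (sum of powers)."""
    return replicated / sum(powers)


def replication_count_pmf(powers):
    """Exact distribution of the number of replications (Poisson-binomial),
    assuming association i replicates independently with probability powers[i]."""
    pmf = [1.0]  # before any association is considered: zero replications
    for p in powers:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this association fails to replicate
            nxt[k + 1] += mass * p     # this association replicates
        pmf = nxt
    return pmf


def prediction_interval(powers, level=0.95):
    """Central prediction interval for the success ratio under the model above."""
    expected = sum(powers)
    tail = (1 - level) / 2
    cumulative, low, high = 0.0, None, None
    for k, mass in enumerate(replication_count_pmf(powers)):
        cumulative += mass
        if low is None and cumulative >= tail:
            low = k
        if high is None and cumulative >= 1 - tail:
            high = k
            break
    return low / expected, high / expected


# Hypothetical power estimates for one disease class (not values from the paper).
powers = [0.90, 0.80, 0.95, 0.85]
print(success_ratio(3, powers), prediction_interval(powers))
```

Under this model, a disease class whose observed ratio falls below the lower bound replicated less often than its power estimates would predict.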
Associations with sufficient power for detection (≥ 80%) that failed to replicate.
| Phenotype | SNP | Pub OR | Rep OR | P-value | Power | Cases | Controls | Replications in the Literature |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Alcohol abuse | rs7590720 | 1.35 | 0.955 | 0.875 | 1 | 1811 | 8549 | Failed to replicate |
| Bipolar disorder | rs1012053 | 1.59 | 1 | 0.485 | 1 | 366 | 13030 | Failed to replicate |
| Bipolar disorder | rs420259 | 2.08 | 0.966 | 0.659 | 1 | 366 | 13030 | Failed to replicate |
| COPD | rs13180 | 1.3 | 1.05 | 0.26 | 0.89 | 403 | 2306 | Replicated |
| COPD | rs7671167 | 1.32 | 1.11 | 0.0968 | 0.93 | 403 | 2306 | Replicated |
| COPD | rs1828591 | 1.38 | 1 | 0.489 | 0.97 | 403 | 2306 | Replicated |
| Crohn's disease | rs2066847 | 3.99 | 1.54 | 0.151 | 0.88 | 84 | 13288 | Replicated |
| IBD | rs7517847 | 1.61 | 0.855 | 0.954 | 1 | 250 | 12808 | Replicated |
| Juvenile allergic asthma | rs2786098 | 1.43 | 1.07 | 0.181 | 1 | 641 | 6584 | Failed to replicate |
| Lupus | rs3131379 | 2.36 | 1.38 | 0.133 | 0.82 | 52 | 11675 | Not yet replicated |
| Parkinson's disease | rs17115100 | 1.25 | 0.992 | 0.555 | 0.97 | 2274 | 5336 | Not claimed |
| Parkinson's disease | rs823128 | 1.52 | 1.17 | 0.0531 | 1 | 2274 | 5336 | Not claimed |
| Psoriasis | rs20541 | 1.27 | 1.09 | 0.0953 | 0.92 | 833 | 4291 | Replicated |
| Rheumatoid arthritis | rs10499194 | 1.33 | 0.927 | 0.797 | 0.9 | 308 | 12845 | Failed to replicate |
| Rheumatoid arthritis | rs3761847 | 1.32 | 1.01 | 0.437 | 0.93 | 308 | 12845 | Between-study heterogeneity |
| Type 2 diabetes | rs9300039 | 1.48 | 0.976 | 0.595 | 0.97 | 778 | 3273 | Between-study heterogeneity |
| Type 2 diabetes | rs2943641 | 1.19 | 1.03 | 0.328 | 0.81 | 778 | 3273 | Not yet replicated |
| Thyroid cancer | rs965513 | 1.75 | 1.37 | 0.0559 | 0.83 | 52 | 11234 | Replicated |
| Ulcerative colitis | rs11209026 | 1.79 | 1.47 | 0.0577 | 0.85 | 181 | 13100 | Replicated |
Pub OR = published odds ratio. Rep OR = 23andMe attempted replication odds ratio. Power = estimated power to detect association.
COPD = Chronic Obstructive Pulmonary Disease. This analysis included smokers only.
IBD = Inflammatory Bowel Disease.
rs7517847 was initially associated with IBD, but replicated only for Crohn's disease [39], which is a subtype of IBD. Latiano et al. also replicated rs7517847 with Crohn's disease, but not with ulcerative colitis, the other major subtype of IBD [14].
Not claimed = the association was curated into the GWAS catalog as significantly associated with Parkinson's disease but was not identified by the original authors as significant.
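To show how a Power value of the kind tabulated above can arise from the published odds ratio and the cohort's case and control counts, here is a rough normal-approximation sketch for a one-sided allelic test. The function `approx_power`, the significance level `alpha=0.05`, and the risk-allele frequency of 0.3 are all assumptions made for illustration; the excerpt does not give the authors' actual power formula or the allele frequencies.

```python
import math
from statistics import NormalDist


def approx_power(odds_ratio, n_cases, n_controls, allele_freq, alpha=0.05):
    """Approximate power of a one-sided allelic test (normal approximation).

    Assumption: the standard error of the log OR is derived from a single
    allele frequency applied to both cases and controls, with two alleles
    counted per person.
    """
    per_allele_var = 1 / (allele_freq * (1 - allele_freq))
    se = math.sqrt(per_allele_var * (1 / (2 * n_cases) + 1 / (2 * n_controls)))
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(math.log(odds_ratio) / se - z_alpha)


# Published OR and cohort sizes from the alcohol abuse row (rs7590720);
# the allele frequency of 0.3 is a hypothetical placeholder.
print(approx_power(1.35, 1811, 8549, allele_freq=0.3))
```

With thousands of cases, the sketch gives power close to 1 for an odds ratio of 1.35, which is consistent in spirit with the rows that report a power of 1. For small case counts the result depends strongly on the assumed allele frequency.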
Replications without strictly matching phenotypes.
| 23andMe Phenotype | Published Phenotype | # Replications | Genes |
| --- | --- | --- | --- |
| Liver test | Bilirubin levels | 4 | CHUK, GGT1, SAMM50, UGT1A1 |
| High cholesterol | Cholesterol levels (quantitative) | 19 | ABCG8, APOA1, APOB, CELSR2, CILP2, DNAH11, DOCK7, FADS1, GCKR, HNF1A, LDLR, LIPC (×2), MAFB, NCAN, PCSK9 (×2), TOMM40, TRIB1 |
| Gall bladder removal | Gallstones | 1 | ABCG8 |
| High blood pressure | Blood pressure (quantitative) | 8 | ATP2B1, CYP17A1 (×2), CYP1A1, FGF5, SH2B3, ULK4, ZNF652 |
| Osteoporosis | Bone mineral density (quantitative) | 5 | MEF2C, MEPE, OSX, SOX6, SPTBN1 |
| Macular degeneration | Advanced age-related macular degeneration | 2 | C2, C3 |
| Nicotine abuse | Nicotine dependence | 1 | CHRNA3 |
Phenotypes marked "(quantitative)" were measured quantitatively in the published reports, but the corresponding 23andMe phenotypes listed here were measured qualitatively (yes/no).