Richard J Biedrzycki, Ashley E Sier, Dongjing Liu, Erika N Dreikorn, Daniel E Weeks.
Abstract
When interpreting genome-wide association peaks, it is common to annotate each peak by searching for genes with plausible relationships to the trait. However, "all that glitters is not gold": one might interpret apparent patterns in the data as plausible even when the peak is a false positive. Accordingly, we sought to examine how human annotators interpreted association results containing a mixture of peaks from both the original trait and a genetically uncorrelated "synthetic" trait. Two of us prepared a mix of original and synthetic peaks of three significance categories from five different scans, along with relevant literature search results, and then we all annotated these regions. Three annotators also scored the strength of evidence connecting each peak to the scanned trait and the likelihood of further studying that region. While annotators found original peaks to have stronger evidence (Bonferroni-corrected p = 0.017) and a higher likelihood of further study (Bonferroni-corrected p = 0.006) than synthetic peaks, annotators often made convincing connections between the synthetic peaks and the original trait, finding these connections 55% of the time. These results show that it is not difficult for annotators to make convincing connections between synthetic association signals and genes found in those regions.
Keywords: association peaks; false positives; genome-wide association studies; literature review
Year: 2019 PMID: 30657194 PMCID: PMC6590226 DOI: 10.1002/gepi.22189
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Traits annotated by each annotator. Column headers give each scanned trait pair as (original, synthetic); “+” indicates that the annotator annotated that pair, and “–” that they did not.
| Annotator | (SCZ, LDL) | (TG, CD) | (HDL, UC) | (CAD, AD) | (AD, CAD) |
|---|---|---|---|---|---|
| 1 (R.J.B.) | + | + | + | – | + |
| 2 (E.N.D.) | + | + | – | – | – |
| 3 (A.E.S.) | + | + | + | + | – |
| 4 (D.L.) | + | + | + | + | – |
| 5 (D.E.W.) | + | + | – | – | – |
Note. AD: Alzheimer’s disease; CAD: coronary artery disease; CD: Crohn’s disease; HDL: high‐density lipoprotein; LDL: low‐density lipoprotein; SCZ: schizophrenia; TG: triglycerides; UC: ulcerative colitis.
Figure 1. Annotation status of SNPs showing evidence for association, by annotator. (a) Schizophrenia and LDL, (b) triglycerides and Crohn's disease, (c) HDL and ulcerative colitis, (d) coronary artery disease and Alzheimer's disease, and (e) Alzheimer's disease and coronary artery disease. “O” indicates an original peak, and “S” indicates a synthetic peak. HDL: high‐density lipoprotein; LDL: low‐density lipoprotein; SNPs: single‐nucleotide polymorphisms.
Figure 2. Strength of evidence and likelihood of further study of annotated peaks. (a) Strength of evidence and (b) likelihood of further study scores from Annotators 1, 4, and 5, by peak significance category and peak type (original or synthetic). Red indicates that no convincing connection was made, and teal that a convincing connection was made.
Convincing connection status by peak type; cells give n (row %)
| Peak type | No convincing connection | Convincing connection |
|---|---|---|
| Synthetic | 43 (45) | 52 (55) |
| Original | 9 (19) | 39 (81) |
χ² = 8.58, df = 1, p = 0.003.
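As a check on the 2 × 2 test, the following Python sketch recomputes the statistic from the counts in the table. The use of Yates' continuity correction is an inference, not something the table states: for these counts the uncorrected Pearson statistic is about 9.69, whereas the Yates-corrected one is about 8.57, close to the reported 8.58. The helper names (`chi2_sf`, `yates_chi2_2x2`) are hypothetical, and only the standard library is used; for one degree of freedom the chi-square upper tail equals `erfc(sqrt(x / 2))`.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable; closed forms for df 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def yates_chi2_2x2(table: list[list[int]]) -> float:
    """Pearson chi-square with Yates' continuity correction for a 2 x 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Shrink |O - E| by 0.5 toward zero (never below zero) before squaring.
            stat += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    return stat


# Rows: synthetic, original; columns: no, yes (counts from the table above).
peak_type = [[43, 52], [9, 39]]
stat = yates_chi2_2x2(peak_type)
print(f"chi2 = {stat:.3f}, p = {chi2_sf(stat, df=1):.3f}")
```

For comparison, `scipy.stats.chi2_contingency` applies the same continuity correction by default when the table has one degree of freedom.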
Convincing connection status by peak significance category; cells give n (row %)
| Peak significance category | No convincing connection | Convincing connection |
|---|---|---|
| Highly | 11 (23) | 37 (77) |
| Moderate | 16 (34) | 31 (66) |
| Suggestive | 25 (52) | 23 (48) |
χ² = 8.99, df = 2, p = 0.011.
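The 3 × 2 test can be checked the same way. In this sketch (hypothetical helper `pearson_chi2`, standard library only) no continuity correction is applied, and for these counts the uncorrected Pearson statistic reproduces the reported χ² = 8.99; with two degrees of freedom the chi-square upper tail has the exact closed form exp(-x/2).

```python
import math


def pearson_chi2(table: list[list[int]]) -> tuple[float, int]:
    """Uncorrected Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Rows: highly, moderate, suggestive; columns: no, yes (counts from the table above).
significance = [[11, 37], [16, 31], [25, 23]]
stat, df = pearson_chi2(significance)
# With df = 2, the chi-square upper tail is exactly exp(-x / 2).
p = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")  # prints: chi2 = 8.99, df = 2, p = 0.011
```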
Convincing connection status by peak type and peak significance category; cells give n (row %)
| Peak significance category | Peak type | No convincing connection | Convincing connection |
|---|---|---|---|
| Highly | Original | 0 (0) | 16 (100) |
| Highly | Synthetic | 11 (34) | 21 (66) |
| Moderate | Original | 3 (19) | 13 (81) |
| Moderate | Synthetic | 13 (42) | 18 (58) |
| Suggestive | Original | 6 (38) | 10 (62) |
| Suggestive | Synthetic | 19 (59) | 13 (41) |
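Summed over one factor, the stratified counts should reproduce the peak-type and significance-category tables. This Python sketch (hypothetical names `strata`, `row_percentages`, `collapse`) checks that and recomputes the row percentages. Python's `round()` sends exact halves to the nearest even integer, which yields 38 and 62 for the 6 versus 10 split in the suggestive original row, as printed; that the table was produced with the same rounding rule is an assumption.

```python
# (no, yes) counts keyed by (significance category, peak type), from the table above.
strata = {
    ("Highly", "Original"): (0, 16),
    ("Highly", "Synthetic"): (11, 21),
    ("Moderate", "Original"): (3, 13),
    ("Moderate", "Synthetic"): (13, 18),
    ("Suggestive", "Original"): (6, 10),
    ("Suggestive", "Synthetic"): (19, 13),
}


def row_percentages(no: int, yes: int) -> tuple[int, int]:
    """Row percentages as whole numbers; round() sends exact halves to the even neighbor."""
    total = no + yes
    return round(100 * no / total), round(100 * yes / total)


def collapse(strata: dict, key_index: int) -> dict:
    """Sum (no, yes) counts over the other stratifying factor."""
    totals: dict = {}
    for key, (no, yes) in strata.items():
        level = key[key_index]
        prev_no, prev_yes = totals.get(level, (0, 0))
        totals[level] = (prev_no + no, prev_yes + yes)
    return totals


for key, counts in strata.items():
    print(key, counts, row_percentages(*counts))
print(collapse(strata, 1))  # by peak type: should match the peak-type table
print(collapse(strata, 0))  # by significance category: should match that table
```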