Matthew P Salomon, Wai Lok Sibon Li, Christopher K Edlund, John Morrison, Barbara K Fortini, Aung Ko Win, David V Conti, Duncan C Thomas, David Duggan, Daniel D Buchanan, Mark A Jenkins, John L Hopper, Steven Gallinger, Loïc Le Marchand, Polly A Newcomb, Graham Casey, Paul Marjoram.
Abstract
BACKGROUND: For the last decade, the conceptual framework of the Genome-Wide Association Study (GWAS) has dominated the investigation of human disease and other complex traits. While GWAS have been successful in identifying a large number of variants associated with various phenotypes, the overall amount of heritability explained by these variants remains small. This raises the question of how best to follow up on a GWAS, localize causal variants accounting for GWAS hits, and as a consequence explain more of the so-called "missing" heritability. Advances in high-throughput sequencing technologies now allow for the efficient and cost-effective collection of vast amounts of fine-scale genomic data to complement GWAS.
Year: 2016 PMID: 26940994 PMCID: PMC4776370 DOI: 10.1186/s12864-016-2459-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sample information for all samples sequenced in this study
| CCFR center | Num. samples | Population based | Pedigree | Buccal |
|---|---|---|---|---|
| Australia | 1,664 | 1,155 | 509 | 2 |
| USC | 370 | 88 | 282 | 266 |
| Seattle | 910 | 778 | 132 | 0 |
| Mt. Sinai | 1,007 | 924 | 83 | 0 |
| Hawaii | 101 | 101 | 0 | 0 |
| Totals | 4,052 | 3,046 | 1,006 | 268 |
Both population-based and pedigree-based samples were included in the sequencing. The majority of samples were sequenced from genomic DNA extracted from stored blood; a subset was sequenced from stored buccal swabs.
Summary of 11 genomic regions sequenced
| SNP | Band | Region sequenced | Total sequenced (bp) | Mean coverage | % of target with ≥ 30X | Uncorrected | Uncorrected |
|---|---|---|---|---|---|---|---|
| rs16892766 | 8q23.3 | 8:117,291,701–117,930,819 | 639,118 | 69.58 | 77.25 | 0.04119 | 0.03253 |
| rs10505477 | 8q24 | 8:127,830,818–128,730,818 | 900,000 | 80.17 | 82.35 | 0.02315 | 0.01395 |
| rs719725 | 9p24 | 9:5,891,100–6,558,270 | 667,170 | 62.40 | 66.81 | 0.05862 | 0.05821 |
| rs10795668 | 10p14 | 10:8,376,087–8,772,195 | 396,108 | 81.17 | 81.86 | 0.002037 | 0.00187 |
| rs3802842 | 11q23 | 11:110,644,790–110,794,790 | 150,000 | 85.54 | 86.64 | 0.002948 | 0.002393 |
| rs3802842 | 11q23 | 11:111,047,966–111,504,790 | 456,824 | 85.54 | 86.64 | - | - |
| rs7136702 | 12q13.13 | 12:50,497,179–51,330,290 | 833,111 | 64.97 | 80.70 | 0.003239 | 0.006124 |
| rs4444235 | 14q22.2 | 14:54,370,768–54,840,250 | 469,482 | 73.01 | 78.49 | 0.5992 | 0.746 |
| rs4779584 | 15q13.3 | 15:32,958,831–33,432,615 | 473,784 | 78.31 | 82.85 | 0.08478 | 0.4264 |
| rs4939827 | 18q21 | 18:45,936,002–46,556,002 | 620,000 | 92.27 | 86.52 | 0.02014 | 0.008508 |
| rs4925386 | 20q13.33 | 20:60,840,110–60,995,164 | 155,054 | 74.18 | 72.77 | 0.1239 | 0.0981 |
| Totals | | | 5,760,651 | 76.16 | 79.62 | | |
The first column indicates the focal GWAS SNP that the region was designed around. Sequencing coverage for each region was calculated as the mean coverage across the entire targeted region and as the breadth of coverage. The breadth of coverage is defined as the number of bases per targeted region that are covered at ≥ 30X, reported in the table as a percentage of the target.
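The two coverage summaries defined above can be illustrated with a minimal sketch. It assumes per-base read depths are already available as a list of integers; the function name `coverage_summary` and the toy depths are hypothetical and are not taken from the study.

```python
def coverage_summary(depths, threshold=30):
    """Return (mean depth, percent of bases with depth >= threshold)."""
    if not depths:
        raise ValueError("depths must be non-empty")
    mean_depth = sum(depths) / len(depths)
    # Breadth of coverage: share of targeted bases at or above the threshold.
    n_covered = sum(1 for d in depths if d >= threshold)
    breadth_pct = 100.0 * n_covered / len(depths)
    return mean_depth, breadth_pct


# Toy per-base depths for a hypothetical 8 bp target (not data from the study).
toy_depths = [10, 25, 30, 45, 60, 80, 90, 100]
mean_depth, breadth_pct = coverage_summary(toy_depths)
print(f"mean coverage = {mean_depth:.2f}, breadth at >=30X = {breadth_pct:.2f}%")
```

In practice the depths would come from a per-base depth report for each targeted region, computed once per sample and then averaged.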
Fig. 1 Mean coverage across all 11 targeted regions. The x-axis is the mean coverage for each sample. The y-axis is the number of samples at a given coverage. See text for discussion of the bimodal distribution of coverage across chr20:60840111–60995164
Fig. 2 Distribution of alleles with a MAF of less than 0.01 for the 2,838 population-based samples
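As a rough illustration of the MAF threshold used in Fig. 2, the sketch below computes a minor allele frequency from diploid genotypes and flags variants below 0.01. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the coding, the function name, and the toy data are assumptions, not details from the study.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2).

    Missing calls are passed as None and ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


# Toy example: one heterozygote among 100 called samples, plus one missing call.
toy_genotypes = [0] * 99 + [1, None]
maf = minor_allele_frequency(toy_genotypes)
is_rare = maf < 0.01
print(f"MAF = {maf:.4f}, rare (MAF < 0.01): {is_rare}")
```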
Fig. 3a and b. PCA analysis of the population based samples. a. PCAs colored by CCFR center. b. PCAs colored by race with all non-Caucasian individuals colored in red and Caucasian samples colored in black
Most significantly associated SNPs identified in the PLINK analysis
| Chr: Position | rs ID number | Feature | Base change | MAF: Cases (Controls) | Gene (distance to nearest genes) | Uncorrected p-value (0PCs) | FDR corrected (0PCs) | Uncorrected (2PCs) | FDR corrected (2PCs) |
|---|---|---|---|---|---|---|---|---|---|
| 9: 5,980,030 | Novel | intronic | A to G | 0.048(0.065) | KIAA2026 | 5.323e–008 | 0.0002117 | 1.587e–007 | 0.0005407 |
| 10: 8,542,529 | Novel | Intergenic | T to G | 0.016(0.028) | LINC00708(232261), LINC00709(775047) | 3.576e–007 | 0.0009478 | 2.461e–007 | 0.000734 |
| 12: 50,554,103 | Novel | intronic | A to G | 0.026(0.033) | CERS5 | 2.926e–007 | 0.0008725 | 6.119e–007 | 0.001622 |
| 12: 51,243,510 | Novel | intronic | A to T | 0.049(0.057) | TMPRSS12 | 7.362e–008 | 0.0002509 | 1.104e–007 | 0.0004388 |
| 14: 54,603,486 | rs116055771 | Intergenic | A to T | 0.013(0.022) | BMP4(179,932), CDKN3(260,187) | 4.085e–011 | 3.248e–007 | 2.014e–010 | 1.602e–006 |
| 15: 33,008,360 | Novel | Intergenic | A to C | 0.034(0.054) | SCG5(19,062), GREM1(1,845) | 6.384e–009 | 3.046e–005 | 2.652e–008 | 0.0001265 |
| 15: 33,345,877 | Novel | intronic | A to C | 0.004(0.011) | FMN1 | 2.048e–011 | 2.443e–007 | 6.591e–011 | 7.861e–007 |
| 18: 46,119,756 | Novel | intronic | T to C | 0.010(0.021) | CTIF | 2.128e–009 | 1.736e–006 | 5.41e–010 | 3.226e–006 |
| 18: 46,119,757 | rs76590328 | intronic | C to T | 0.031(0.054) | CTIF | 6.103e–013 | 1.456e–008 | 1.98e–012 | 4.724e–008 |
| 18: 46,503,254 | Novel | intronic | A to G | 0.031(0.057) | LOXHD1 | 2.722e–006 | 0.006494 | 4.838e–006 | 0.01154 |
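The table reports FDR-corrected p-values alongside the uncorrected ones, but this record does not say which FDR procedure was used. The sketch below assumes the Benjamini-Hochberg step-up adjustment, which is one common choice; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: BH is used purely for illustration; the source does not
    name the FDR procedure behind its corrected columns.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (not values from the table above).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
print(toy_adjusted)
```

Because the adjustment scales each p-value by the total number of tests, corrected values in a table like this one depend on how many variants were tested, not only on the rows shown.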
Fig. 4 QQ-plot for marginal associations between common polymorphism and cancer status across all sequenced regions. X-axis shows expected –log(p-value); y-axis shows observed –log(p-value)
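The coordinates of a QQ-plot like the one in Fig. 4 can be sketched as follows. The sketch assumes base-10 logarithms and the plotting position i/(n+1) for the i-th smallest of n p-values under the null; both are common conventions but are assumptions here, since the caption specifies neither.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ-plot.

    Assumes p-values are in (0, 1] and uniform under the null, so the
    i-th smallest of n has expected value i / (n + 1).
    """
    n = len(pvalues)
    sorted_p = sorted(pvalues)
    return [
        (-math.log10((i + 1) / (n + 1)), -math.log10(p))
        for i, p in enumerate(sorted_p)
    ]


# Toy p-values (not values from the study).
points = qq_points([0.5, 0.05, 0.005])
for expected, observed in points:
    print(f"expected = {expected:.3f}, observed = {observed:.3f}")
```

Points lying above the diagonal indicate observed p-values smaller than expected under the null.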
Composition of the significantly associated SNP sets identified in the SKAT combined analysis
| GWAS SNP | Gene | Feature | Position | rs ID number | MAF: Cases (Controls) | PLINK p-value | p-value for SNP set |
|---|---|---|---|---|---|---|---|
| rs3802842 | C11orf53 | 3′ UTR | 11:111,156,836 | rs3087967 | 0.32(0.27) | 1.00 | Uncorrected 3.17e–004 (1.79e–004) |
| | C11orf53 | 3′ UTR | 11:111,156,857 | Novel | 0(5.55e–004) | NA | FDR corrected 0.0486 (0.0275) |
| | C11orf53 | 3′ UTR | 11:111,156,877 | Novel | 2.51e–004(0) | | |
| | C11orf53 | 3′ UTR | 11:111,156,937 | Novel | 0(5.55e–004) | NA | |
| rs7136702 | ATF1 | 5′ UTR | 12:51,157,849 | Novel | 4.92e–003(7.96e–003) | 0.8771 | Uncorrected 1.04e–005 (1.83e–005) |
| | ATF1 | 5′ UTR | 12:51,157,852 | Novel | 1.04e–003(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,157,863 | rs61926301 | 0.58(0.61) | 1.00 | FDR corrected 0.0032 (0.0056) |
| | ATF1 | 5′ UTR | 12:51,157,886 | Novel | 2.55e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,157,960 | Novel | 2.59e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,157,996 | Novel | 2.62e–004(0) | | |
| | ATF1 | 5′ UTR | 12:51,158,010 | Novel | 2.61e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,027 | Novel | 0(5.97e–004) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,045 | Novel | 7.82e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,047 | Novel | 2.61e–004(0) | NA | |
Composition of the significantly associated SNP sets identified in the SKAT combined analysis
| GWAS SNP | Gene | Feature | Position | rs ID number | MAF: Cases (Controls) | PLINK p-value | p-value for SNP set |
|---|---|---|---|---|---|---|---|
| rs7136702 | ATF1 | 5′ UTR | 12:51,157,849 | Novel | 4.92e–003(7.96e–003) | 0.8771 | Uncorrected 8.08e–005 (7.59e–005) |
| | ATF1 | 5′ UTR | 12:51,157,852 | Novel | 1.04e–003(0) | NA | FDR corrected 0.0055 (0.0052) |
| | ATF1 | 5′ UTR | 12:51,157,863 | rs61926301 | 0.58(0.61) | 1.00 | |
| | ATF1 | 5′ UTR | 12:51,157,886 | Novel | 2.55e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,157,960 | Novel | 2.59e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,157,996 | Novel | 2.62e–004(0) | | |
| | ATF1 | 5′ UTR | 12:51,158,010 | Novel | 2.61e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,027 | Novel | 0(5.97e–004) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,045 | Novel | 7.82e–004(0) | NA | |
| | ATF1 | 5′ UTR | 12:51,158,047 | Novel | 2.61e–004(0) | NA | |
This analysis was performed on each targeted sequencing region separately and included the focal GWAS SNP as a covariate