Dimitrios Avramopoulos, Peter Zandi, Adrian Gherman, M Daniele Fallin, Susan S Bassett.
Abstract
Genes for complex disorders have proven hard to find using linkage analysis. The results rarely reach the desired level of significance, and researchers have often failed to replicate positive findings. There is, however, a wealth of information from other scientific approaches that enables the formation of hypotheses about groups of genes or genomic regions likely to be enriched in disease loci. Examples include genes belonging to specific pathways or producing proteins that interact with known risk factors, genes that show altered expression levels in patients, or even the group of top-scoring locations in a linkage study. We show here that this hypothesis of enrichment for disease loci can be tested using genome-wide linkage data, provided that these data are independent of the data used to generate the hypothesis. Our method is based on the fact that non-parametric linkage analyses are expected to show increased scores at each of the disease loci, although this increase might not rise above the noise of stochastic variation. By using a summary statistic and calculating its empirical significance, we show that enrichment hypotheses can be tested with greater power than the linkage scan data have to identify individual loci. Via simulated linkage scans for a number of different models, we gain insight into the interpretation of genome scan results and test the power of our proposed method. We present an application of the method to real data from a late-onset Alzheimer's disease linkage scan as a proof of principle.
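The enrichment test described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the scan is available as a list of NPL scores at discrete positions, uses the sum of the scores at the hypothesized positions as the summary statistic, and builds the null distribution by drawing same-size groups of positions uniformly at random from the whole scan. The function name `enrichment_test` and its parameters are hypothetical.

```python
import random
from typing import Sequence, Tuple


def enrichment_test(npl_scores: Sequence[float], group: Sequence[int],
                    n_resamples: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed statistic, empirical p-value) for a hypothesized group.

    Assumption: the statistic is the sum of NPL scores at the group's positions.
    Assumption: the null draws same-size groups uniformly without replacement
    from all scan positions.
    """
    rng = random.Random(seed)
    k = len(group)
    observed = sum(npl_scores[i] for i in group)
    positions = range(len(npl_scores))
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(positions, k)
        # Count random groups that score at least as high as the hypothesis.
        if sum(npl_scores[i] for i in sample) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value
```

Drawing positions uniformly ignores the correlation between nearby markers on a chromosome, so p-values from this sketch should be read as approximate. As the abstract stresses, the group must come from data independent of the scan being tested.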
Year: 2006 PMID: 16848972 PMCID: PMC3525155 DOI: 10.1186/1479-7364-2-6-345
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
The power of our method for different simulated models (five, ten or 20 disease loci; 1,180 or 590 sibling pairs; relative risk (RR) of 2 or 3), different levels of enrichment for true loci, and different levels of significance.
| Disease loci in model | Real loci in group | False loci in group | Power (%), 1,180 pairs, RR = 3, α = 0.05 | Power (%), 1,180 pairs, RR = 3, α = 0.01 | Power (%), 1,180 pairs, RR = 2, α = 0.05 | Power (%), 1,180 pairs, RR = 2, α = 0.01 | Power (%), 590 pairs, RR = 3, α = 0.05 | Power (%), 590 pairs, RR = 3, α = 0.01 | Power (%), 590 pairs, RR = 2, α = 0.05 | Power (%), 590 pairs, RR = 2, α = 0.01 |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 5 | 0 | 100 | 100 | 100 | 92 | 100 | 96 | 100 | 84 |
| 5 | 5 | 3 | 100 | 98 | 90 | 71 | 94 | 88 | 80 | 51 |
| 5 | 5 | 5 | 99 | 94 | 81 | 58 | 93 | 79 | 70 | 40 |
| 5 | 5 | 10 | 95 | 76 | 69 | 42 | 84 | 58 | 53 | 25 |
| 5 | 3 | 0 | 100 | 95 | 88 | 72 | 97 | 88 | 82 | 52 |
| 5 | 3 | 2 | 94 | 78 | 71 | 43 | 84 | 61 | 57 | 30 |
| 5 | 3 | 5 | 80 | 52 | 54 | 27 | 67 | 37 | 38 | 16 |
| 10 | 10 | 0 | 100 | 100 | 100 | 92 | 100 | 92 | 72 | 48 |
| 10 | 10 | 5 | 99 | 92 | 89 | 63 | 88 | 64 | 54 | 29 |
| 10 | 10 | 10 | 94 | 77 | 78 | 48 | 75 | 48 | 46 | 22 |
| 10 | 10 | 20 | 80 | 54 | 58 | 30 | 60 | 30 | 35 | 14 |
| 10 | 5 | 0 | 90 | 72 | 75 | 48 | 77 | 44 | 48 | 20 |
| 10 | 5 | 3 | 75 | 49 | 56 | 28 | 53 | 26 | 32 | 13 |
| 10 | 5 | 5 | 68 | 42 | 49 | 24 | 47 | 21 | 29 | 11 |
| 10 | 5 | 10 | 53 | 28 | 35 | 15 | 35 | 16 | 22 | 7 |
| 20 | 20 | 0 | 100 | 76 | 68 | 44 | 65 | 35 | 32 | 12 |
| 20 | 20 | 10 | 80 | 49 | 56 | 31 | 47 | 23 | 26 | 10 |
| 20 | 20 | 20 | 67 | 37 | 47 | 21 | 40 | 18 | 24 | 10 |
| 20 | 20 | 40 | 49 | 24 | 36 | 16 | 31 | 14 | 17 | 6 |
| 20 | 15 | 0 | 84 | 58 | 61 | 35 | 53 | 26 | 29 | 13 |
| 20 | 15 | 10 | 64 | 34 | 45 | 20 | 40 | 16 | 23 | 9 |
| 20 | 15 | 20 | 51 | 23 | 36 | 15 | 29 | 13 | 18 | 7 |
| 20 | 15 | 30 | 41 | 19 | 29 | 12 | 26 | 9 | 14 | 5 |
| 20 | 10 | 0 | 67 | 40 | 47 | 23 | 41 | 18 | 25 | 9 |
| 20 | 10 | 5 | 51 | 24 | 38 | 15 | 32 | 13 | 19 | 6 |
| 20 | 10 | 10 | 41 | 19 | 31 | 11 | 26 | 9 | 16 | 6 |
| 20 | 10 | 15 | 35 | 14 | 24 | 9 | 23 | 8 | 15 | 5 |
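Each power entry can be read as the share of simulated replicates in which the enrichment test reached significance at the stated α. The sketch below shows that bookkeeping, together with the fraction of real loci in a tested group (the quantity on the Y-axis of Figure 1). It assumes power is the percentage of replicates with an empirical p-value at or below α; the function names and the example p-values are hypothetical, not data from the study.

```python
from typing import Sequence


def estimated_power(p_values: Sequence[float], alpha: float) -> float:
    """Percentage of simulated replicates whose empirical p-value is <= alpha.

    Assumption: one p-value per simulated replicate for a given model and
    tested group composition.
    """
    if not p_values:
        raise ValueError("need at least one simulated replicate")
    rejections = sum(1 for p in p_values if p <= alpha)
    return 100.0 * rejections / len(p_values)


def fraction_real(real_in_group: int, false_in_group: int) -> float:
    """Fraction of true disease loci among all loci in the tested group."""
    return real_in_group / (real_in_group + false_in_group)


# Hypothetical p-values from four simulated replicates (not study data).
example_p = [0.001, 0.02, 0.03, 0.2]
print(estimated_power(example_p, 0.05))  # 75.0
print(estimated_power(example_p, 0.01))  # 25.0
print(fraction_real(5, 5))               # 0.5
```

For instance, a group with five real and five false loci has a real-locus fraction of 0.5, which is how the table rows map onto the Y-axis of Figure 1.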
Success of simulated scans with 1,180 sibling pairs in identifying the disease gene locations.
| Simulated model | Disease loci | Top 1 | Top 3 | Top 5 | Top 10 | Top 20 |
|---|---|---|---|---|---|---|
| H = 0.7, K ≈ 0.03, RR = 3 | 5 | 0.92 (92%) | 2.28 (76%) | 3.16 (63%) | 4.04 (40%) | 4.4 (22%) |
| H = 0.7, K ≈ 0.03, RR = 3 | 10 | 0.72 (72%) | 2.12 (71%) | 3.24 (65%) | 4.8 (48%) | 6.32 (32%) |
| H = 0.7, K ≈ 0.03, RR = 3 | 20 | 0.52 (52%) | 1.48 (49%) | 2.24 (45%) | 4.24 (42%) | 8.04 (40%) |
| H = 0.7, K ≈ 0.03, RR = 2 | 5 | 0.75 (75%) | 1.54 (51%) | 1.96 (39%) | 2.79 (28%) | 3.58 (18%) |
| H = 0.7, K ≈ 0.03, RR = 2 | 10 | 0.56 (56%) | 1.44 (48%) | 2.16 (43%) | 3.32 (33%) | 5.08 (25%) |
| H = 0.7, K ≈ 0.03, RR = 2 | 20 | 0.4 (40%) | 1 (33%) | 1.64 (33%) | 3.24 (32%) | 5.56 (28%) |
Simulation parameters: H = heritability, K = prevalence, RR = relative risk for each risk allele. The number of simulated disease loci is shown. For the one, three, five, ten and 20 top-scoring loci in each of the 25 simulated genome scans, we show the average number that coincided with true disease loci (with the percentage in parentheses), as well as their average non-parametric linkage (NPL) score.
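The entries above are averages over the simulated scans, and the percentage is that average divided by the number of top findings (for example, 2.28 of the top 3 findings is 76%). A minimal sketch of this bookkeeping follows. It assumes each index in a scan is one candidate locus and that a finding "coincides" with a disease locus only when the indices are equal; a real analysis would match peaks to loci within some distance window. The function names and toy scans are hypothetical.

```python
from typing import Sequence, Set, Tuple


def top_k_hits(scan_scores: Sequence[float], true_loci: Set[int], k: int) -> int:
    """Count how many of the k highest-scoring positions are true disease loci.

    Assumption: exact index match stands in for "coincided with a disease locus".
    """
    ranked = sorted(range(len(scan_scores)), key=lambda i: scan_scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if i in true_loci)


def summarize_top_k(scans: Sequence[Sequence[float]], true_loci: Set[int],
                    k: int) -> Tuple[float, float]:
    """Average hits per scan, and that average as a percentage of k."""
    hits = [top_k_hits(scan, true_loci, k) for scan in scans]
    mean_hits = sum(hits) / len(hits)
    return mean_hits, 100.0 * mean_hits / k


# Two toy scans over five candidate positions; positions 0 and 2 are "true" loci.
toy_scans = [[5.0, 1.0, 4.0, 0.0, 3.0], [0.0, 9.0, 1.0, 8.0, 2.0]]
print(summarize_top_k(toy_scans, {0, 2}, k=2))  # (1.0, 50.0)
```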
Figure 1. Power of our method when using a linkage scan of 1,180 sibling pairs and five (A), ten (B) or 20 (C) disease loci with a relative risk of 3. X-axis: number of real loci included in the tested group; Y-axis: fraction of real loci in the group; Z-axis: power to detect the enrichment.
Application of our method to real data.
| Number of top locations from scan A | Empirical p-value on scan B |
|---|---|
| 5 | 0.220 |
| 10 | 0.016 |
| 15 | 0.048 |
| 20 | 0.077 |
| 25 | 0.089 |
| 30 | 0.094 |
Two separate sets of pedigrees were used for scans A and B. Groups of top linkage peaks from scan A (group size in column 1) were then tested for enrichment in the results of scan B. Column 2 shows the empirical p-value for each group.
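The two-scan design in this application (choose groups from scan A, evaluate them on scan B) can be sketched as below. The details are assumptions rather than the authors' procedure: both scans are taken to report NPL scores at the same list of positions, the "top locations" are single positions rather than distinct linkage peaks, and the statistic is the summed score against random same-size groups of positions. The group sizes default to those in the table; the function names are hypothetical.

```python
import random
from typing import Dict, Sequence


def empirical_p(scores: Sequence[float], group: Sequence[int],
                n_resamples: int, rng: random.Random) -> float:
    """Empirical p-value of the group's summed score versus random same-size groups."""
    observed = sum(scores[i] for i in group)
    positions = range(len(scores))
    hits = 0
    for _ in range(n_resamples):
        sample = rng.sample(positions, len(group))
        if sum(scores[i] for i in sample) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


def split_scan_enrichment(scan_a: Sequence[float], scan_b: Sequence[float],
                          sizes: Sequence[int] = (5, 10, 15, 20, 25, 30),
                          n_resamples: int = 10_000, seed: int = 0) -> Dict[int, float]:
    """Select the top-k positions in scan A and test each group on scan B."""
    if len(scan_a) != len(scan_b):
        raise ValueError("scans must report scores at the same positions")
    rng = random.Random(seed)
    # Groups are chosen from scan A alone; scan B is used only for testing.
    ranked_a = sorted(range(len(scan_a)), key=lambda i: scan_a[i], reverse=True)
    return {k: empirical_p(scan_b, ranked_a[:k], n_resamples, rng) for k in sizes}
```

Keeping selection and testing on different pedigree sets is what makes the p-values meaningful; ranking and testing on the same scan would make every top group look enriched.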