Sukanya Horpaopan, Cathy S J Fann, Mark Lathrop, Jurg Ott.
Abstract
An important aspect of disease gene mapping is replication, that is, the confirmation of a putative finding from one group of individuals in another set of individuals. Because individuals can share an estimated disease position by chance, we developed a statistical approach to determine the p-value for multiple individuals or families to share a possibly small number of candidate susceptibility variants. Here, we focus on candidate variants for dominant traits that have been obtained by our previously developed heterozygosity analysis, and we test the sharing of candidate variants obtained for different individuals. Our approach allows for multiple pathogenic variants in a gene to contribute to disease, and for estimated disease variant positions to be imprecise. Statistically, the method developed here falls into the category of equivalence testing, where the classical null and alternative hypotheses of homogeneity and heterogeneity are reversed. The null hypothesis situation is created by permuting genomic locations of variants for one individual after another. We applied our methodology to the ALSPAC data set of 1,927 whole-genome sequenced individuals, where some individuals carry a pathogenic variant in the BRCA1 gene, but no two individuals carry the same variant. Our shared genomic segment analysis found significant evidence for BRCA1 pathogenic variants within ±5 kb of a given DNA variant.
Keywords: ALSPAC; computer simulation; equivalence testing; gene mapping; genetic association analysis; sequence variants
Year: 2020 PMID: 32677112 PMCID: PMC7540579 DOI: 10.1002/gepi.22335
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Table 1. Significance levels associated with numbers N_f of individuals within a distance d from a given variant
| N_f | d = 5 kb | d = 10 kb | d = 20 kb | d = 50 kb | d = 100 kb | d = 200 kb | d = 500 kb |
|---|---|---|---|---|---|---|---|
| 1 | 0.0907 | 0.1695 | 0.3093 | 0.5956 | 0.8296 | 0.9709 | 0.9999 |
| 2 |  |  | 0.0516 | 0.2279 | 0.5209 | 0.8604 | 0.9987 |
| 3 | 0.0001 | 0.0009 |  | 0.0606 | 0.2548 | 0.6679 | 0.9924 |
| 4 | 0.0001 | 0.0001 | 0.0006 |  | 0.0995 | 0.4484 | 0.9735 |
| 5 | 0.0001 | 0.0001 | 0.0001 | 0.0024 |  | 0.2583 | 0.9296 |
| 6 | 0.0001 | 0.0001 | 0.0001 | 0.0004 | 0.0099 | 0.1333 | 0.8475 |
| 7 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0031 | 0.0568 | 0.7308 |
| 8 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0006 |  | 0.5897 |
| 9 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0002 | 0.0088 | 0.4360 |
| 10 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0021 | 0.2990 |
| 11 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0007 | 0.1903 |
| 12 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.1101 |
| 13 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0624 |
| 14 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 |  |
| 15 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0155 |
| 16 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0001 | 0.0064 |
Note: The bold values are the largest value of p < .05 for each value of d.
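Significance levels of this kind can be estimated by permutation. The following is a minimal Python sketch, not the authors' implementation. It assumes a toy linear genome of length `genome_length` (in kb), creates the null situation by relocating each individual's candidate variants uniformly at random, and uses as test statistic the largest number of individuals having a candidate within ±d of any candidate variant. The function names, the uniform relocation, and the choice of statistic are illustrative assumptions.

```python
import random
from bisect import bisect_left, bisect_right


def count_sharing(center, individuals, d):
    """Count individuals with at least one candidate within +/- d of center.

    individuals: list of sorted position lists, one list per individual.
    """
    n = 0
    for positions in individuals:
        lo = bisect_left(positions, center - d)
        hi = bisect_right(positions, center + d)
        if hi > lo:  # at least one position falls inside the window
            n += 1
    return n


def max_sharing(individuals, d):
    """Largest sharing count over all candidate positions (assumed statistic)."""
    return max(
        count_sharing(x, individuals, d)
        for positions in individuals
        for x in positions
    )


def permutation_p_value(individuals, d, genome_length, n_perm=1000, seed=0):
    """Empirical p-value for the observed maximum sharing count.

    Assumption: the null is generated by placing every candidate variant
    uniformly at random on [0, genome_length]. Each individual is assumed
    to have at least one candidate variant.
    """
    rng = random.Random(seed)
    individuals = [sorted(p) for p in individuals]
    observed = max_sharing(individuals, d)
    hits = 0
    for _ in range(n_perm):
        shuffled = [
            sorted(rng.uniform(0, genome_length) for _ in positions)
            for positions in individuals
        ]
        if max_sharing(shuffled, d) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(individuals, d=5, genome_length=1_000_000)` with one list of positions per individual returns the observed maximum sharing count together with its empirical p-value; repeating this over a grid of d values would give a table of the same shape as the one above.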
Table 2. Number of runs, N_r, of variants capturing a significant (p < .05) number, N ≥ N_crit, of individuals, where H_max positions within ±d kb of a given variant are considered
| d (kb) | N_r | N_crit | L_avg | L_1 | L_2 | N_var |
|---|---|---|---|---|---|---|
| 2 | 7 | 3 | 0.076 | – | – | 217 |
| 5 | 14 | 3 | 0.043 | 0.005 | – | 940 |
| 10 | 24 | 3 | 0.033 | 0.015 | – | 2,845 |
| 20 | 50 | 3 | 0.022 | 0.035 | – | 10,443 |
| 50 | 46 | 4 | 0.044 | 0.021 | – | 19,075 |
| 100 | 59 | 5 | 0.067 | 0.017 | 0.039 | 37,327 |
| 200 | 58 | 8 | 0.120 | – | – | 34,236 |
| 500 | 17 | 14 | 0.275 | – | – | 44,447 |
Note: L_avg, average length of runs in kb. L_1 and L_2 are the lengths of runs overlapping the BRCA1 area (there was usually only one such run). N_var, number of variants capturing a number of individuals N ≥ N_crit. N_crit is the smallest number of families captured with p < .05, subject to a minimum of N_crit ≥ 3.
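The run statistics in the table can be illustrated with a short Python sketch. It assumes, hypothetically, that a run is a maximal stretch of consecutive variants (ordered by position) that each capture at least `n_crit` individuals; the exact definition is not spelled out here, so the names `Run`, `find_runs`, and `summarize` and the input format are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    start_kb: float        # position of the first variant in the run
    end_kb: float          # position of the last variant in the run
    n_variants: int        # number of variants in the run
    max_individuals: int   # largest number of individuals captured in the run


def find_runs(variants, n_crit):
    """Group consecutive qualifying variants into runs.

    variants: iterable of (position_kb, n_individuals), sorted by position.
    A variant qualifies when n_individuals >= n_crit (assumed definition).
    """
    runs = []
    current = None
    for pos, n in variants:
        if n >= n_crit:
            if current is None:
                current = Run(pos, pos, 1, n)  # open a new run
            else:
                current.end_kb = pos           # extend the open run
                current.n_variants += 1
                current.max_individuals = max(current.max_individuals, n)
        elif current is not None:
            runs.append(current)               # a non-qualifying variant closes the run
            current = None
    if current is not None:
        runs.append(current)
    return runs


def summarize(runs):
    """Return (number of runs, average run length in kb, variants in runs)."""
    n_r = len(runs)
    l_avg = sum(r.end_kb - r.start_kb for r in runs) / n_r if n_r else 0.0
    n_var = sum(r.n_variants for r in runs)
    return n_r, l_avg, n_var
```

Under these assumptions, `summarize(find_runs(variants, n_crit))` yields quantities analogous to N_r, L_avg, and N_var for one value of d, and plotting `max_individuals` against the run index gives a bar chart of the kind described for Figure 1.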
Figure 1. The graph shows the 46 runs of variants shared by N = 4 or more individuals within d = 50 kb of a variant (see Table 2). Runs are numbered consecutively, n_r (x-axis), with N = number of individuals (y-axis) in a given run. The red bar (run 25) represents a run containing a variant within the BRCA1 area.