Razvan G Romanescu, Jessica Green, Irene L Andrulis, Shelley B Bull.
Abstract
Next generation sequencing technologies have made it possible to investigate the role of rare variants (RVs) in disease etiology. Because RVs associated with disease susceptibility tend to be enriched in families with affected individuals, study designs based on affected sib pairs (ASP) can be more powerful than case-control studies. We construct tests of RV-set association in ASPs for single genomic regions as well as for multiple regions. Single-region tests can efficiently detect a gene region harboring susceptibility variants, while multiple-region extensions are meant to capture signals dispersed across a biological pathway, potentially as a result of locus heterogeneity. Within ascertained ASPs, the test statistics contrast the frequencies of duplicate rare alleles (usually appearing on a shared haplotype) against frequencies of a single rare allele copy (appearing on a nonshared haplotype); we call these allelic parity tests. Incorporation of minor allele frequency estimates from reference populations can markedly improve test efficiency. Under various genetic penetrance models, application of the tests in simulated ASP data sets demonstrates good type I error properties as well as power gains over approaches that regress ASP rare allele counts on sharing state, especially in small samples. We discuss robustness of the allelic parity methods to the presence of genetic linkage, misspecification of reference population allele frequencies, sequencing error and de novo mutations, and population stratification. As proof of principle, we apply single- and multiple-region tests in a motivating study data set consisting of whole exome sequencing of sisters ascertained with early onset breast cancer.
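The contrast between single and duplicate rare alleles described in the abstract can be illustrated with a small counting sketch. This is not the authors' test statistic: it only tallies, for each sib pair, the loci where exactly one sib carries a rare allele ("single") and the loci where both do ("duplicate"). The function names `parity_counts` and `sample_counts`, and the 0/1 carrier encoding per locus, are assumptions made for illustration; a real analysis would also need IBD sharing information and the null means and variances of the counts.

```python
from typing import Iterable, Sequence, Tuple


def parity_counts(sib1: Sequence[int], sib2: Sequence[int]) -> Tuple[int, int]:
    """Return (single, duplicate) rare-allele counts for one sib pair.

    sib1 and sib2 are 0/1 carrier indicators, one entry per rare-variant
    locus in the region (hypothetical encoding; homozygous rare genotypes
    are not modeled in this sketch).
    """
    if len(sib1) != len(sib2):
        raise ValueError("both sibs must be genotyped at the same loci")
    # Single: the rare allele is seen in exactly one of the two sibs.
    single = sum(1 for a, b in zip(sib1, sib2) if a != b)
    # Duplicate: the rare allele is seen in both sibs at the same locus.
    duplicate = sum(1 for a, b in zip(sib1, sib2) if a == 1 and b == 1)
    return single, duplicate


def sample_counts(
    pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]
) -> Tuple[int, int]:
    """Sum single and duplicate counts over all sib pairs in the sample."""
    total_single = 0
    total_duplicate = 0
    for s1, s2 in pairs:
        s, d = parity_counts(s1, s2)
        total_single += s
        total_duplicate += d
    return total_single, total_duplicate


# Toy genotypes (made up for illustration, not study data).
toy_pairs = [([0, 1, 0], [0, 1, 1]), ([1, 0, 0], [0, 0, 0])]
print(sample_counts(toy_pairs))
```

In this toy input the first pair contributes one duplicate and one single allele and the second pair contributes one single allele. An excess of duplicates relative to singles, beyond what is expected under the null, is the kind of signal the allelic parity tests are designed to detect.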
Keywords: burden tests; familial tests; pathway testing; rare variant tests; sib-pair testing
Year: 2020 PMID: 32237178 PMCID: PMC7318298 DOI: 10.1002/gepi.22291
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Means and variances of the single and duplicate rare allele counts, conditional on identical by descent (IBD) sharing
|  |  |  |  |  |
|---|---|---|---|---|
| 0 |  |  |  |  |
| 1 |  |  |  |  |
| 2 |  |  | 0 | 0 |
Note: Expressions are accurate to second order in .
Expected total counts of single and duplicate alleles in the sample, stratified by IBD sharing, under the null
| IBD sharing | Contribution to | Contribution to |
|---|---|---|
|  | 0 |  |
|  |  |  |
|  |  | 0 |
| Total |  | 2 |
Note: Bold indicates that a higher magnitude of proportional increase is expected under the alternative. Here, , and expressions are accurate to first order.
Genetic models used to generate ascertained datasets in the power simulations
| Model | Description | Simulation settings | Population MAF |
|---|---|---|---|
| Single region |  |  | <0.001 |
| Multiple region, Additive model |  |  | <0.001 |
| Multiple region, Epistatic model |  |  | <0.005 |
Figure 1. Q–Q plots of single‐region test statistic p‐values under the null hypothesis for sample sizes N = 100 and 1,000 families and 100,000 replicated data sets
Figure 2. Power curves for testing at α = .0005 (left) and α = .05 (right) for a sample size N = 500, and 10,000 replicated data sets. Results for single region (top panels) and two‐region pathway under the additive model (bottom panels). The horizontal black lines represent the significance threshold α
Figure 3. Power of single‐region testing versus significance threshold for medium penetrance variants (HR = 4) for sample size N = 500 sib pairs, and 100,000 replicated data sets
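Power and type I error summaries over replicated data sets, such as those in the figures above, are conventionally estimated as the fraction of replicates whose p‐value falls below the threshold α. The sketch below shows that calculation only; the function name `rejection_rate` and the example p‐values are hypothetical and are not taken from the study.

```python
from typing import Iterable


def rejection_rate(p_values: Iterable[float], alpha: float) -> float:
    """Fraction of replicates rejecting the null at level alpha.

    With replicates simulated under the null this estimates the type I
    error rate; under an alternative it estimates power.
    """
    p = list(p_values)
    if not p:
        raise ValueError("need at least one replicate")
    return sum(1 for x in p if x < alpha) / len(p)


# Made-up p-values for four replicates (illustration only).
print(rejection_rate([0.01, 0.2, 0.04, 0.5], alpha=0.05))
```

Here two of the four made-up replicates fall below α = 0.05, so the estimated rejection rate is 0.5.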
Top hits for genes in DNA repair pathways (p value (ap‐w) < 0.1). Rows are ordered by the p value of the allelic parity‐weighted test
| Gene | Chrom | R |  |  |  |  | Pathway |
|---|---|---|---|---|---|---|---|
| MLH1 | 3 | 5 | 0.08 | 0.04 | 0.001 | 0.0002 | 2 |
| BLM | 15 | 3 | 0.35 | 0.13 | 0.003 | 0.0009 | 4 |
| ERCC4 | 16 | 1 | 0.31 | 0.14 | 0.009 | 0.0014 | 3 |
| XPC | 3 | 3 | 0.18 | 0.29 | 0.047 | 0.0076 | 3 |
| POLL | 10 | 2 | 0.24 | 0.11 | 0.047 | 0.0082 | 5 |
| POLD3 | 11 | 1 | 0.32 | – | 0.085 | 0.022 | 1, 3 |
| XRCC3 | 14 | 1 | 0.50 | 0.45 | 0.085 | 0.030 | 4 |
Note: Full results are given in Table S1.
R is the number of RV loci in the gene.
Pathway codes are: 1, base excision repair; 2, mismatch repair; 3, nucleotide excision repair; 4, homologous recombination; and 5, nonhomologous end‐joining.
Pathway testing of DNA repair mechanisms (separately and jointly)
| Pathway | R |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| Base excision repair | 26 | −1.14 | 0.87 | 0.81 | 0.21 | 1.35 | 0.09 |
| Mismatch repair | 16 | 0.90 | 0.19 | 1.54 | 0.06 | 0.97 | 0.17 |
| Nucleotide excision repair | 20 | 1.58 | 0.07 | 1.19 | 0.12 | 3.46 | 2.7E−04 |
| Homologous recombination | 26 | −0.31 | 0.62 | −0.42 | 0.66 | 0.39 | 0.35 |
| Nonhomologous end‐joining | 17 | 0.21 | 0.42 | 0.94 | 0.17 | 2.02 | 0.02 |
| DNA repair (all mechanisms) | 83 | 0.43 | 0.34 | 0.83 | 0.20 | 3.67 | 1.2E−04 |
R is the number of RV loci in the pathway.