Wan-Ping Lee, Albert A Tucci, Mitchell Conery, Yuk Yee Leung, Amanda B Kuzma, Otto Valladares, Yi-Fan Chou, Wenbin Lu, Li-San Wang, Gerard D Schellenberg, Jung-Ying Tzeng.
Abstract
Alzheimer's Disease (AD) is a progressive neurologic disease and the most common form of dementia. While the causes of AD are not completely understood, genetics plays a key role in its etiology, and thus finding genetic factors holds the potential to uncover novel AD mechanisms. In this study, we focus on copy number variation (CNV) detection and burden analysis. Leveraging whole-genome sequence (WGS) data released by the Alzheimer's Disease Sequencing Project (ADSP), we developed a scalable bioinformatics pipeline to identify CNVs. This pipeline was applied to 1,737 AD cases and 2,063 cognitively normal controls. As a result, we observed 237,306 deletions and 42,767 duplications, with an average of 2,255 deletions and 1,820 duplications per subject. The burden tests show that Non-Hispanic White cases have on average 16 more duplications than controls (p-value 2e-6), and Hispanic cases have larger deletions than controls (p-value 6.8e-5).
Keywords: Alzheimer's disease; CNV association test; next generation sequencing (NGS); copy number variation (CNV); whole-genome sequence (WGS)
Year: 2021 PMID: 34804120 PMCID: PMC8599981 DOI: 10.3389/fgene.2021.752390
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Overview of the CNV identification workflow from WGS data, consisting of four steps: 1) alignment coverage check; 2) sample-level CNV calling by CNVnator, JAX-CNV, and Smoove, followed by merging of the three callsets with Svimmer (because Svimmer takes VCF input, the JAX-CNV results in BED format were converted to VCF); 3) project-level CNV re-genotyping; 4) CNV list assembly for PLINK burden analysis. The figure illustrates three samples, denoted S1, S2, and S3; 3,800 samples were processed in the study.
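The BED-to-VCF conversion mentioned in step 2 can be sketched as follows. This is a minimal illustration, not the conversion used in the study: the four-column BED layout (chromosome, 0-based start, end, event type) is an assumed, hypothetical layout, and the function name `bed_to_vcf_lines` is introduced here for illustration only. Genotypes are left missing because a BED interval alone does not carry them.

```python
from typing import Iterable, List


def bed_to_vcf_lines(bed_lines: Iterable[str], sample: str = "S1") -> List[str]:
    """Convert BED CNV intervals to minimal VCF records with symbolic alleles.

    Assumed (hypothetical) BED layout: chrom, start (0-based), end, type,
    where type is DEL or DUP.
    """
    header = [
        "##fileformat=VCFv4.2",
        '##ALT=<ID=DEL,Description="Deletion">',
        '##ALT=<ID=DUP,Description="Duplication">',
        '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">',
        '##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant">',
        '##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Length difference between REF and ALT">',
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    records = []
    for line in bed_lines:
        # Skip blank lines and BED comment/track lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start_s, end_s, svtype = line.split()[:4]
        start, end = int(start_s), int(end_s)
        svtype = svtype.upper()
        if svtype not in ("DEL", "DUP"):
            raise ValueError(f"unsupported event type: {svtype}")
        # BED start is 0-based. With a symbolic ALT allele, VCF POS is the
        # 1-based base preceding the event, which is numerically the BED start
        # (clamped to 1 for events at the very start of a chromosome).
        pos = max(start, 1)
        length = end - start
        svlen = -length if svtype == "DEL" else length
        info = f"SVTYPE={svtype};END={end};SVLEN={svlen}"
        # REF is written as N and the genotype as missing: both are placeholders.
        records.append(
            "\t".join([chrom, str(pos), ".", "N", f"<{svtype}>", ".", "PASS", info, "GT", "./."])
        )
    return header + records


for vcf_line in bed_to_vcf_lines(["chr1\t1000\t6000\tDEL", "chr2\t500\t2500\tDUP"]):
    print(vcf_line)
```

A production conversion would also fill REF from the reference genome and carry over any caller-specific quality fields.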
FIGURE 2. Alignment coverage of 15 samples with uneven sequence data. Each line is a sample, and each dot represents the alignment coverage for one chromosome.
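One way to flag uneven coverage of the kind shown in this figure is to compare each chromosome's coverage with the sample's own median. The sketch below is an illustration only: the 20% tolerance, the restriction to autosomes, and the function name `uneven_chromosomes` are assumptions, not the criterion used in the study.

```python
from statistics import median
from typing import Dict, List


def uneven_chromosomes(coverage: Dict[str, float], tolerance: float = 0.2) -> List[str]:
    """Return chromosomes whose coverage deviates from the sample median.

    `coverage` maps chromosome name to mean alignment coverage. The input is
    assumed to hold autosomes only, since sex chromosomes differ by design.
    `tolerance` is a hypothetical relative threshold (0.2 means 20%).
    """
    med = median(coverage.values())
    if med <= 0:
        raise ValueError("median coverage must be positive")
    return [chrom for chrom, cov in coverage.items() if abs(cov - med) / med > tolerance]


sample = {"chr1": 30.0, "chr2": 31.0, "chr3": 29.0, "chr4": 45.0}
print(uneven_chromosomes(sample))
```

A sample with a non-empty result would be a candidate for removal before CNV calling, because read-depth callers interpret coverage shifts as copy number changes.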
The Total column gives the number of samples remaining after each quality filtering step.
| Filtering step | AA case | AA control | AA unknown | Hispanic case | Hispanic control | Hispanic unknown | NHW case | NHW control | NHW unknown | Others case | Others control | Others unknown | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ADSP 5K | 472 | 521 | 44 | 826 | 746 | 40 | 910 | 820 | 393 | 5 | 4 | 7 | 4,788 |
| Replicate Removal | 467 | 521 | 44 | 810 | 733 | 40 | 910 | 815 | 393 | 5 | 4 | 7 | 4,749 |
| Unknown Status Removal | 467 | 521 | 0 | 810 | 733 | 0 | 910 | 815 | 0 | 5 | 4 | 0 | 4,265 |
| Uneven Coverage Removal | 466 | 521 | 0 | 808 | 731 | 0 | 902 | 813 | 0 | 5 | 4 | 0 | 4,250 |
| Relatedness Removal | 457 | 521 | 0 | 520 | 727 | 0 | 755 | 811 | 0 | 5 | 4 | 0 | 3,800 |
3,800 samples remained after all filtering steps.
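The totals in the sample filtering table can be recomputed from the per-group counts. The sketch below copies the counts from the table and assumes that the three columns under each ancestry group are, in order, cases, controls, and samples of unknown status; that order is inferred from the final row, whose sums match the 1,737 cases and 2,063 controls reported in the abstract.

```python
from typing import Dict, List, Tuple

# (case, control, unknown) for AA, Hispanic, NHW, Others, copied from the table.
STEPS: Dict[str, List[Tuple[int, int, int]]] = {
    "ADSP 5K": [(472, 521, 44), (826, 746, 40), (910, 820, 393), (5, 4, 7)],
    "Replicate Removal": [(467, 521, 44), (810, 733, 40), (910, 815, 393), (5, 4, 7)],
    "Unknown Status Removal": [(467, 521, 0), (810, 733, 0), (910, 815, 0), (5, 4, 0)],
    "Uneven Coverage Removal": [(466, 521, 0), (808, 731, 0), (902, 813, 0), (5, 4, 0)],
    "Relatedness Removal": [(457, 521, 0), (520, 727, 0), (755, 811, 0), (5, 4, 0)],
}


def totals(groups: List[Tuple[int, int, int]]) -> Tuple[int, int, int, int]:
    """Return (cases, controls, unknown, total) summed over ancestry groups."""
    cases = sum(g[0] for g in groups)
    controls = sum(g[1] for g in groups)
    unknown = sum(g[2] for g in groups)
    return cases, controls, unknown, cases + controls + unknown


for step, groups in STEPS.items():
    print(step, totals(groups))
```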
FIGURE 3. Characteristics of the project-level CNV callset. Counts on the y-axes of the subfigures are on a log10 scale. (A) Average numbers of deletions and duplications detected by CNVnator, JAX-CNV, Smoove, and GraphTyper2. "Consolidated" shows CNV counts after CNV merging and conflict removal. (B) Length distribution of CNVs after applying GraphTyper2 and the PASS flag filter. Lengths of deletions are shown as negative values and lengths of duplications as positive values. (C) Allele frequency of CNVs from GraphTyper2.
CNV concordance checks against the 1000 Genomes Project Phase 3 (1KG_P3), gnomAD, and Decipher callsets. Each column represents the proportion of CNVs in that callset overlapping with the callsets listed in the rows. Callset sizes are given in parentheses.
| Overlap criterion | Row callset | Ours (280,073) | 1KG_P3 (48,131) | gnomAD (188,842) | Decipher (54,422) |
|---|---|---|---|---|---|
| At least 1 bp overlap | Ours | 1 | 0.828 | 0.762 | 0.878 |
| At least 1 bp overlap | 1KG_P3 | 0.398 | 1 | 0.410 | 0.679 |
| At least 1 bp overlap | gnomAD | 0.799 | 0.861 | 1 | 0.832 |
| At least 1 bp overlap | Decipher | 0.763 | 0.662 | 0.500 | 1 |
| At least 50% overlap | Ours | 1 | 0.772 | 0.726 | 0.816 |
| At least 50% overlap | 1KG_P3 | 0.293 | 1 | 0.337 | 0.544 |
| At least 50% overlap | gnomAD | 0.668 | 0.767 | 1 | 0.712 |
| At least 50% overlap | Decipher | 0.724 | 0.600 | 0.458 | 1 |
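The two overlap criteria in this table can be illustrated with a small interval comparison. This is a sketch under stated assumptions, not the procedure used in the study: intervals are half-open `(chrom, start, end)` tuples, the 50% criterion is taken to mean that a single CNV from the other callset covers at least half of the query CNV's length (a reciprocal definition is equally plausible), and CNV type is ignored. The function name `overlap_fraction` is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # chromosome, start, end (half-open)


def overlap_fraction(query: List[Interval], others: List[Interval], min_frac: float = 0.0) -> float:
    """Fraction of `query` CNVs overlapped by at least one CNV in `others`.

    min_frac=0.0 requires at least 1 bp of overlap; min_frac=0.5 requires one
    CNV in `others` to cover at least 50% of the query CNV's length.
    """
    if not query:
        return 0.0
    hits = 0
    for chrom, start, end in query:
        length = end - start
        for o_chrom, o_start, o_end in others:
            if o_chrom != chrom:
                continue
            shared = min(end, o_end) - max(start, o_start)
            if shared >= 1 and shared >= min_frac * length:
                hits += 1
                break  # one qualifying overlap is enough for this query CNV
    return hits / len(query)


ours = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
reference = [("chr1", 150, 400), ("chr1", 590, 700)]
print(overlap_fraction(ours, reference))        # at least 1 bp
print(overlap_fraction(ours, reference, 0.5))   # at least 50%
```

The nested loop is quadratic and only suitable for small inputs; callsets with hundreds of thousands of CNVs would need a sorted sweep or an interval tree.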
Four burden features were considered: 1) total event numbers, 2) proportion of samples with ≥1 events, 3) total event length in kb, and 4) average event length in kb.
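| Feature | Group | Statistic | DelDup All | DelDup Rare | Del All | Del Rare | Dup All | Dup Rare |
|---|---|---|---|---|---|---|---|---|
| Total event numbers | All | Mean (cases) | 4,073 | 59.29 | 2,249 | 47.67 | 1,823 | 11.62 |
| Total event numbers | All | Mean (controls) | 4,079 | 55.61 | 2,261 | 44.41 | 1,818 | 11.2 |
| Total event numbers | All | p-value | 0.736247 | 0.0709259 | 0.876096 | 0.0723559 | 0.021826 | 0.132316 |
| Total event numbers | AA | Mean (cases) | 4,072 | 60.24 | 2,268 | 46.81 | 1,803 | 13.43 |
| Total event numbers | AA | Mean (controls) | 4,106 | 63.23 | 2,295 | 49.7 | 1,811 | 13.53 |
| Total event numbers | AA | p-value | 0.990162 | 0.743957 | 0.989694 | 0.753128 | 0.882104 | 0.578805 |
| Total event numbers | Hispanic | Mean (cases) | 4,193 | 42.63 | 2,408 | 33.98 | 1,785 | 8.654 |
| Total event numbers | Hispanic | Mean (controls) | 4,177 | 59.35 | 2,384 | 48.04 | 1,793 | 11.31 |
| Total event numbers | Hispanic | p-value | 0.108108 | 1 | 0.016028 | 1 | 0.974318 | 1 |
| Total event numbers | NHW | Mean (cases) | 3,991 | 45.33 | 2,129 | 34.1 | 1,861 | 11.23 |
| Total event numbers | NHW | Mean (controls) | 3,972 | 38.81 | 2,127 | 29.01 | 1,845 | 9.8 |
| Total event numbers | NHW | p-value | 0.158684 | 0.0287979 | 0.461645 | 0.0303239 | **2e-6** | 0.0354999 |
| Proportion of samples with ≥1 events | All | Mean (cases) | 0.9988 | 0.9988 | 0.9988 | 0.9988 | 0.9988 | 0.9988 |
| Proportion of samples with ≥1 events | All | Mean (controls) | 0.9995 | 0.9995 | 0.9995 | 0.9995 | 0.9995 | 0.9985 |
| Proportion of samples with ≥1 events | All | p-value | 0.904246 | 0.905054 | 0.905188 | 0.905048 | 0.904122 | 0.581927 |
| Proportion of samples with ≥1 events | AA | Mean (cases) | 0.9956 | 0.9956 | 0.9956 | 0.9956 | 0.9956 | 0.9956 |
| Proportion of samples with ≥1 events | AA | Mean (controls) | 1 | 1 | 1 | 1 | 1 | 1 |
| Proportion of samples with ≥1 events | AA | p-value | 1 | 1 | 1 | 1 | 1 | 1 |
| Proportion of samples with ≥1 events | Hispanic | Mean (cases) | 1 | 1 | 1 | 1 | 1 | 0.9981 |
| Proportion of samples with ≥1 events | Hispanic | Mean (controls) | 0.9986 | 0.9986 | 0.9986 | 0.9986 | 0.9986 | 0.9986 |
| Proportion of samples with ≥1 events | Hispanic | p-value | 0.583197 | 0.583439 | 0.582637 | 0.582109 | 0.583935 | 0.826018 |
| Proportion of samples with ≥1 events | NHW | Mean (cases) | 1 | 1 | 1 | 1 | 1 | 1 |
| Proportion of samples with ≥1 events | NHW | Mean (controls) | 1 | 1 | 1 | 1 | 1 | 0.9975 |
| Proportion of samples with ≥1 events | NHW | p-value | 1 | 1 | 1 | 1 | 1 | 0.269673 |
| Total event length in kb | All | Mean (cases) | 1.856e+05 | 1,053 | 2.983e+04 | 546.2 | 1.558e+05 | 507.2 |
| Total event length in kb | All | Mean (controls) | 1.852e+05 | 941.4 | 2.974e+04 | 457.1 | 1.555e+05 | 484.8 |
| Total event length in kb | All | p-value | 0.017098 | 0.01129 | 0.254809 | 0.00759198 | 0.0602339 | 0.148482 |
| Total event length in kb | AA | Mean (cases) | 1.859e+05 | 1,013 | 3.185e+04 | 502.7 | 1.54e+05 | 510.8 |
| Total event length in kb | AA | Mean (controls) | 1.857e+05 | 1,055 | 3.175e+04 | 477.6 | 1.54e+05 | 577.4 |
| Total event length in kb | AA | p-value | 0.318897 | 0.704127 | 0.330257 | 0.291605 | 0.409045 | 0.942028 |
| Total event length in kb | Hispanic | Mean (cases) | 1.83e+05 | 750.3 | 3.183e+04 | 408.3 | 1.511e+05 | 342.7 |
| Total event length in kb | Hispanic | Mean (controls) | 1.837e+05 | 911.8 | 3.097e+04 | 392.5 | 1.527e+05 | 519.4 |
| Total event length in kb | Hispanic | p-value | 0.982962 | 0.989972 | **6.8e-5** | 0.356709 | 1 | 0.999998 |
| Total event length in kb | NHW | Mean (cases) | 1.873e+05 | 943.1 | 2.725e+04 | 455.1 | 1.601e+05 | 487.9 |
| Total event length in kb | NHW | Mean (controls) | 1.863e+05 | 713.7 | 2.734e+04 | 301.8 | 1.59e+05 | 412.9 |
| Total event length in kb | NHW | p-value | 0.000591999 | 0.00145 | 0.670983 | 0.00347599 | 0.000116 | 0.013062 |
| Average event length in kb | All | Mean (cases) | 45.74 | 19.02 | 13.34 | 12.54 | 85.34 | 40.48 |
| Average event length in kb | All | Mean (controls) | 45.57 | 17.96 | 13.24 | 11.43 | 85.47 | 40.56 |
| Average event length in kb | All | p-value | 0.0489579 | 0.0469619 | 0.0544019 | 0.0573059 | 0.995478 | 0.523385 |
| Average event length in kb | AA | Mean (cases) | 45.5 | 16.3 | 14.03 | 10.58 | 85.04 | 36.11 |
| Average event length in kb | AA | Mean (controls) | 45.29 | 16.09 | 13.89 | 9.459 | 85.01 | 38.94 |
| Average event length in kb | AA | p-value | 0.0487079 | 0.384237 | 0.108808 | 0.0501099 | 0.362197 | 0.935694 |
| Average event length in kb | Hispanic | Mean (cases) | 43.67 | 16.67 | 13.26 | 11.59 | 84.68 | 35.79 |
| Average event length in kb | Hispanic | Mean (controls) | 44 | 15.33 | 13.02 | 9.041 | 85.03 | 39.45 |
| Average event length in kb | Hispanic | p-value | 0.9966 | 0.0739319 | 0.00848998 | 0.00603599 | 0.999998 | 0.966586 |
| Average event length in kb | NHW | Mean (cases) | 47.31 | 20.6 | 12.98 | 12.82 | 85.98 | 41.66 |
| Average event length in kb | NHW | Mean (controls) | 47.16 | 18.64 | 13.02 | 10.99 | 86.16 | 39.69 |
| Average event length in kb | NHW | p-value | 0.22001 | 0.0258339 | 0.645417 | 0.0592879 | 0.98735 | 0.161868 |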
Tests were run for all CNVs and for rare CNVs, and separately for deletions and duplications combined (DelDup), deletions only (Del), and duplications only (Dup). For each feature and group, three values are reported: mean of cases, mean of controls, and p-value. The two p-values marked in bold are statistically significant.
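The four burden features, and a case-versus-control comparison of one of them, can be sketched as follows. This is an illustration under stated assumptions: the input is a hypothetical mapping from sample ID to that sample's CNV lengths in kb, the average event length is taken as the mean of per-sample averages over samples with at least one event, and the test is a generic one-sided label-permutation test, which is not necessarily the procedure PLINK runs. The names `burden_features` and `permutation_p` are introduced here for illustration.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def burden_features(cnv_lengths_kb: Dict[str, List[float]]) -> Dict[str, float]:
    """Group-level burden features from per-sample CNV lengths (kb)."""
    counts = [len(v) for v in cnv_lengths_kb.values()]
    totals = [sum(v) for v in cnv_lengths_kb.values()]
    # Per-sample average length, defined only for samples carrying an event.
    per_sample_avg = [t / c for t, c in zip(totals, counts) if c > 0]
    return {
        "total_event_number": mean(counts),
        "proportion_with_event": sum(c > 0 for c in counts) / len(counts),
        "total_length_kb": mean(totals),
        "average_length_kb": mean(per_sample_avg) if per_sample_avg else 0.0,
    }


def permutation_p(cases: Sequence[float], controls: Sequence[float],
                  n_perm: int = 999, seed: int = 0) -> float:
    """One-sided permutation p-value for mean(cases) > mean(controls)."""
    rng = random.Random(seed)
    observed = mean(cases) - mean(controls)
    pooled = list(cases) + list(controls)
    k = len(cases)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


group = {"s1": [10.0, 20.0], "s2": [], "s3": [30.0]}
print(burden_features(group))
print(permutation_p([10, 11, 12, 13], [1, 2, 3, 4]))
```

With the add-one correction, the smallest attainable p-value is 1 / (n_perm + 1), so resolving very small p-values requires a correspondingly large number of permutations.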
FIGURE 4. Summary of CNV burden results for all and rare CNVs by CNV event type (DelDup, Del, or Dup) and by ethnicity (All, AA, Hispanic, NHW). (A) Total event numbers per sample. (B) Total rare event numbers per sample. (C) Total event length in kb per sample. (D) Total rare event length in kb per sample.