Rakesh Chettier, Kenneth Ward, Hans M Albertsen.
Abstract
Endometriosis is a complex gynecological condition that affects 6-10% of women in their reproductive years and is defined by the presence of endometrial glands and stroma outside the uterus. Twin, family, and genome-wide association (GWA) studies have confirmed a genetic role, yet only a small part of the genetic risk can be explained by SNP variation. Copy number variants (CNVs) account for a greater portion of human genetic variation than SNPs and include more recent mutations of large effect. CNVs, likely to be prominent in conditions with decreased reproductive fitness, have not previously been examined as a genetic contributor to endometriosis. Here we employ a high-density genotyping microarray in a genome-wide survey of CNVs in a case-control population that includes 2,126 surgically confirmed endometriosis cases and 17,974 population controls of European ancestry. We apply stringent quality filters to reduce the false positive rate common to many CNV-detection algorithms from 77.7% to 7.3% without noticeable reduction in the true positive rate. On the global level we detected no differences in the CNV landscape between cases and controls, with an average of 1.92 CNVs per individual and an average CNV size of 142.3 kb. On the local level we identify 22 CNV-regions at the nominal significance threshold (P<0.05), which is greater than the 8.15 CNV-regions expected based on permutation analysis (P<0.001). Three CNVs passed a genome-wide P-value threshold of 9.3 × 10^-4: a deletion at SGCZ on 8p22 (P = 7.3 × 10^-4, OR = 8.5, CI = 2.3-31.7), a deletion in MALRD1 on 10p12.31 (P = 5.6 × 10^-4, OR = 14.1, CI = 2.7-90.9), and a deletion at 11q14.1 (P = 5.7 × 10^-4, OR = 33.8, CI = 3.3-1651). Two SNPs within the 22 CNVRs show significant genotypic association with endometriosis after adjusting for multiple testing: rs758316 in DPP6 on 7q36.2 (P = 0.0045) and rs4837864 in ASTN2 on 9q33.1 (P = 0.0002). Together, the CNV-loci are detected in 6.9% of affected women compared to 2.1% in the general population.
Year: 2014 PMID: 25083881 PMCID: PMC4118997 DOI: 10.1371/journal.pone.0103968
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Pre and Post-filter CNV counts.
| Filter level | Total CNV counts | FPR |
| Raw PennCNV | 450,779 (20.96) | - |
| PennCNV 10-SNP | 157,545 (7.84) | 0.78 |
| Post-filter | 43,560 (2.17) | 0.14 |
| Outlier-filter | 38,609 (1.92) | 0.07 |
The table shows the CNV counts for cases and controls combined (n = 20,100) at four levels of filtering. First, Raw PennCNV reflects the total CNV counts initially identified by PennCNV. The second level, PennCNV 10-SNP, shows the counts after applying a minimum 10-SNP window. Next, the Post-filter counts are shown, which reflect the counts after a series of empirically derived CNV-quality filters were applied. The empirically derived criteria were set after a meticulous review of a large number of the candidate CNVs included in PennCNV 10-SNP. After grouping the Post-filter CNVs by sample it became evident that a small subset of samples (2–3%) had very high CNV counts, and visual inspection revealed a majority of the CNVs in these samples to be false. To eliminate the excessive CNV counts an outlier filter was applied. The CNVs remaining after the Outlier-filter were used in the association analysis. The average CNV counts per individual are shown in parentheses. The right-most column shows the False Positive Rate (FPR) determined by visual inspection of 1,000 CNVs randomly selected at each step.
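The filtering cascade described in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the `CnvCall` record and the function names are hypothetical, the empirically derived Post-filter criteria are not specified in the text and are therefore omitted, and the outlier threshold of 6 CNVs per sample is taken from the later table captions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical record for one CNV call; field names are illustrative."""
    sample_id: str
    n_snps: int  # number of SNP probes spanned by the call


def min_snp_filter(calls, min_snps=10):
    """Keep calls spanning at least `min_snps` probes (the 10-SNP window)."""
    return [c for c in calls if c.n_snps >= min_snps]


def outlier_filter(calls, max_cnvs_per_sample=6):
    """Drop every call from samples carrying more than the allowed CNV count."""
    per_sample = Counter(c.sample_id for c in calls)
    return [c for c in calls if per_sample[c.sample_id] <= max_cnvs_per_sample]


def average_per_individual(calls, n_individuals):
    """Average CNV count over all individuals, including those with no calls."""
    return len(calls) / n_individuals


# Toy data (invented for illustration): s1 has one long and one short call,
# s2 has seven calls and is therefore treated as an outlier sample.
calls = [CnvCall("s1", 12), CnvCall("s1", 4)] + [CnvCall("s2", 15)] * 7
kept = outlier_filter(min_snp_filter(calls))
```

In this toy run only the 12-probe call from `s1` survives, because the short call fails the probe minimum and all calls from `s2` are discarded together.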
Post-filter CNV counts and relative CNV frequency.
| | Count by Individual | | Count by CNV | | Frequency of CNV | |
| CNV counts | CTL | ENDM | CTL | ENDM | CTL | ENDM |
| 0 | 2765 | 299 | 0 | 0 | 0.154 | 0.141 |
| 1 | 4814 | 539 | 4814 | 539 | 0.268 | 0.254 |
| 2 | 4520 | 517 | 9040 | 1034 | 0.251 | 0.243 |
| 3 | 2939 | 350 | 8817 | 1050 | 0.164 | 0.165 |
| 4 | 1513 | 196 | 6052 | 784 | 0.084 | 0.092 |
| 5 | 746 | 95 | 3730 | 475 | 0.042 | 0.045 |
| 6 | 326 | 53 | 1956 | 318 | 0.018 | 0.025 |
| 7 | 138 | 22 | 966 | 154 | 0.008 | 0.010 |
| 8 | 73 | 12 | 584 | 96 | 0.004 | 0.006 |
| 9 | 40 | 7 | 360 | 63 | 0.002 | 0.003 |
| 10 | 26 | 9 | 260 | 90 | 0.001 | 0.004 |
| 11 | 14 | 6 | 154 | 66 | 0.001 | 0.003 |
| 12 | 21 | 5 | 252 | 60 | 0.001 | 0.002 |
| 13 | 7 | 1 | 91 | 13 | 0.000 | 0.000 |
| 14 | 4 | 1 | 56 | 14 | 0.000 | 0.000 |
| 15 | 2 | 2 | 30 | 30 | 0.000 | 0.001 |
| 16 | 0 | 0 | 0 | 0 | 0.000 | 0.000 |
| 17 | 3 | 0 | 51 | 0 | 0.000 | 0.000 |
| 18 | 2 | 1 | 36 | 18 | 0.000 | 0.000 |
| 19 | 3 | 4 | 57 | 76 | 0.000 | 0.002 |
| 20 | 3 | 2 | 60 | 40 | 0.000 | 0.001 |
| 21 | 0 | 1 | 0 | 21 | 0.000 | 0.000 |
| 22 | 1 | 1 | 22 | 22 | 0.000 | 0.000 |
| 23 | 1 | 0 | 23 | 0 | 0.000 | 0.000 |
| 24 | 2 | 2 | 48 | 48 | 0.000 | 0.001 |
| ≥ 25 | 11 | 1 | 1049 | 41 | 0.001 | 0.000 |
| Total | 17974 | 2126 | 38508 | 5052 | 1.000 | 1.000 |
The CNV counts shown here represent the 43,560 candidate CNVs that remain after applying the Post-filter. The first column shows a specific CNV count. The second set of columns shows the number of control and case individuals observed at each given CNV count. The center columns show the total number of CNVs contributed by the individuals at each count, and the last columns show the frequency at which a given number of CNVs is observed in each group of study participants. A small subset of both case and control samples shows highly inflated CNV counts, with the highest counts being 41 in cases and 231 in controls. A review of the individual CNVs in this group revealed that the vast majority are short (fewer than 20 SNPs), incorrectly called variants of the type CN = 1 and CN = 3. Based on visual inspection, we generally found about 1–3 true CNVs per sample in this group. Using a systematic assessment to identify outliers, we classified samples with more than 6 CNVs as outliers prone to increasingly high false-CNV counts.
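The effect of the more-than-6 outlier threshold can be checked directly against the table above. The sketch below is only an arithmetic illustration: the per-count numbers of individuals (0 to 6 CNVs) and the group totals are copied from the table, while the function and variable names are hypothetical.

```python
# Individuals observed at each CNV count from 0 to 6, copied from the table.
CONTROLS_0_TO_6 = [2765, 4814, 4520, 2939, 1513, 746, 326]
CASES_0_TO_6 = [299, 539, 517, 350, 196, 95, 53]
TOTAL_CONTROLS, TOTAL_CASES = 17974, 2126


def summarize(kept_by_count, total_individuals):
    """Summarize a group after removing samples with more than 6 CNVs."""
    retained_samples = sum(kept_by_count)
    return {
        # Samples above the threshold are everything not listed at 0..6.
        "outlier_samples": total_individuals - retained_samples,
        # Each individual at count k contributes k CNVs.
        "retained_cnvs": sum(k * n for k, n in enumerate(kept_by_count)),
        # Retained samples carrying at least one CNV.
        "samples_with_cnv": retained_samples - kept_by_count[0],
    }


controls = summarize(CONTROLS_0_TO_6, TOTAL_CONTROLS)
cases = summarize(CASES_0_TO_6, TOTAL_CASES)
total_retained_cnvs = controls["retained_cnvs"] + cases["retained_cnvs"]
```

With these inputs the sketch yields 77 outlier cases and 351 outlier controls, and 38,609 retained CNVs in total, which agrees with the Outlier-filter count reported in the first table.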
Filtered CNV counts stratified by copy-number state before and after outlier removal.
| | Before outlier removal | | After outlier removal | |
| CN state | ENDM | CTL | ENDM | CTL |
| 0 | 5 (0.1%) | 20 (0.1%) | 5 (0.1%) | 19 (0.1%) |
| 1 | 2,501 (49.5%) | 16,186 (42.0%) | 1,917 (45.6%) | 9,957 (47.1%) |
| 3 | 2,540 (50.2%) | 22,243 (57.8%) | 2,273 (54.1%) | 11,087 (52.5%) |
| 4 | 6 (0.1%) | 59 (0.2%) | 5 (0.1%) | 55 (0.3%) |
The table summarizes the filtered CNV counts by copy-number state before and after outlier removal. A group of 77 cases (3.6%) and 351 controls (2.0%) was found to have very high CNV counts (>6). Visual inspection of many of these CNVs revealed that a majority are false positives and that these samples generally have 1–3 true CNVs. Based on this observation we applied an outlier-removal filter to minimize the inflation of CNV counts caused by sample-specific and systematic effects. The frequency of each CN state is shown in parentheses. After outlier removal the frequencies of the different CN states are quite similar in the case and control populations.
Figure 1. Overall comparisons of autosomal CNVs observed across the case and control populations after filtering.
The data represented here reflect samples with CNV-count ≥1 after outlier removal (cases = 1,750, controls = 14,858), autosomal probes with call-rate ≥99% (n = 533,512), and filtered CNVs (n = 38,609). Panel A shows the frequency of CNVs by probe count in various bin sizes (10–14 probes; 15–19 probes; etc.), and Panel B shows observed CNV lengths in various bin sizes (25 kb–49 kb; 50 kb–99 kb; etc.). The combined length of CNVs observed per individual is shown in Panel C. The case and control distributions in each panel are statistically similar, implying that on a global level there is no difference between cases and controls in this study.
Average CNV profiles in Cases and Controls with outliers removed.
| | ENDM | CTL |
| Probe count per CNV | 32 | 33 |
| Average CNV Count per Individual | 1.98 | 1.91 |
| Average CNV Length (kb) | 135.3 | 143.1 |
| Total genomic CNV (kb) per Individual | 324.6 | 331.4 |
Table 4 shows the average CNV profiles in cases and controls after outlier removal. The probe count is specific to the Illumina Omniexpress platform and dependent on the SNP-filters we applied, while the CNV count and lengths are likely to reflect true population averages for CNVs about 50 kb in length or larger.
Endometriosis CNV association results at specific loci.
| CNV characteristics | | | | | | Statistics | | CNV Counts | |
| Locus | Cytoband | Gene | Probes | CNV Size (bp) | gain/loss | p-Value | OR[95%CI] | Case (n = 2,126) | Control (n = 17,974) |
| chr3:4050668–4076977 | 3p26.1 | | 13 | 26,309 | loss | 2.3×10^-2 | 2.61[1.02–5.94] | 8 | 26 |
| chr4:186997766–187054136 | 4q35.1 | | 15 | 56,370 | gain | 1.0×10^-2 | 6.77[1.34–31.5] | 4 | 5 |
| chr6:29096413–29161434 | 6p22.1 | | 20 | 65,021 | loss | 3.6×10^-2 | 2.03[0.98–3.88] | 12 | 50 |
| chr6:162828828–162864838 | 6q26 | | 11 | 36,010 | loss | 5.0×10^-3 | 3.03[1.31–6.43] | 10 | 28 |
| chr7:153586831–153654045 | 7q36.2 | DPP6 | 32 | 67,214 | gain | 2.9×10^-2 | 2.99[0.96–7.96] | 6 | 17 |
| chr8:4231556–4261356 | 8p23.2 | | 20 | 29,800 | loss | 3.1×10^-2 | 4.23[0.93–15.8] | 4 | 8 |
| chr8:13864062–13924509 | 8p22 | SGCZ | 21 | 60,447 | loss | 7.3×10^-4 | 8.47[2.26–31.7] | 6 | 6 |
| chr9:119533737–119576802 | 9q33.1 | ASTN2 | 21 | 43,065 | loss | 2.1×10^-2 | 3.85[1.05–12.0] | 5 | 11 |
| chr10:14987439–15045633 | 10p13 | | 14 | 58,194 | gain | 1.8×10^-2 | 0.15[0.00–0.88] | 1 | 56 |
| chr10:19669367–19687397 | 10p12.31 | MALRD1 | 9 | 23,660 | loss | 5.6×10^-4 | 14.1[2.74–90.9] | 5 | 3 |
| chr11:80984220–81014974 | 11q14.1 | none | 19 | 30,754 | loss | 5.7×10^-4 | 33.84[3.3–1651] | 4 | 1 |
| chr11:97419744–97460206 | 11q22.1 | none | 7 | 40,462 | loss | 2.2×10^-2 | 4.84[1.04–19.0] | 4 | 7 |
| chr12:73231378–73363956 | 12q21.1 | none | 13 | 132,578 | loss | 3.4×10^-3 | 11.29[1.91–77.1] | 4 | 3 |
| chr13:20216910–20524704 | 13q12.11 | | 28 | 307,794 | gain | 3.5×10^-3 | 7.06[1.7–27.77] | 5 | 6 |
| chr13:84111773–84238849 | 13q31.1 | | 27 | 127,076 | loss | 1.2×10^-2 | 1.91[1.11–3.13] | 20 | 89 |
| chr14:63775684–63870930 | 14q23.2 | | 22 | 95,246 | gain | 1.3×10^-2 | 3.30[1.16–8.28] | 7 | 18 |
| chr19:20081948–20170214 | 19p12 | | 41 | 88,266 | gain | 1.2×10^-3 | 10.59[2.28–53.4] | 5 | 4 |
| chr19:53611187–53627882 | 19q13.42 | | 12 | 16,695 | loss | 3.5×10^-3 | 7.06[1.7–27.77] | 5 | 6 |
| chr22:42884997–42895634 | 22q13.2 | | 2 | 10,637 | gain | 3.1×10^-2 | 4.23[0.93–15.8] | 4 | 8 |
| chrX:8407958–8463228 | Xp22.31 | | 5 | 55,270 | gain | 9.8×10^-3 | 5.4[1.28–26.1] | 6 | 4 |
| chrX:115941080–115986516 | Xq23 | none | 13 | 45,436 | loss | 9.2×10^-3 | 14.4[1.4–707.2] | 4 | 1 |
| chrX:140348507–140363732 | Xq27.2 | | 2 | 15,225 | gain | 1.3×10^-2 | 2.17[1.14–4.03] | 18 | 30 |
Copy-number variant (CNV) data from a case-control cohort were analyzed for association with endometriosis. Of 34 candidate loci identified using ParseCNV, 22 loci passed a nominal significance threshold upon individual inspection, and three of these passed the genome-wide significance threshold of 9.3×10^-4. The coordinates reported are based on NCBI build 37, hg19 reference sequence.
p-Values were calculated using Fisher's exact test.
CNV is located 20,000 bp downstream of SGCZ.
Flanking genes over 90 kb away.
The analysis of the X chromosome included 1,845 endometriosis cases and 6,640 female population control subjects.
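The footnotes above state that p-values were calculated using Fisher's exact test. As a rough, standard-library-only sketch (not the authors' code), the test and the sample odds ratio can be computed from carrier counts as follows. The function names are hypothetical, and the one-sided alternative is an assumption, since the source does not state which alternative was used. The example uses the 8p22 row of the table: 6 carriers among 2,126 cases and 6 among 17,974 controls.

```python
from math import comb


def fisher_exact_greater(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for enrichment of carriers among cases.

    Conditions on the total number of carriers and sums the hypergeometric
    probabilities of seeing at least `case_carriers` carriers among cases.
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    denominator = comb(total, carriers)
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / denominator


def sample_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Cross-product odds ratio of the 2x2 carrier table."""
    case_non = n_cases - case_carriers
    control_non = n_controls - control_carriers
    return (case_carriers * control_non) / (case_non * control_carriers)


# 8p22 deletion: 6 of 2,126 cases versus 6 of 17,974 controls.
p_value = fisher_exact_greater(6, 2126, 6, 17974)
odds_ratio = sample_odds_ratio(6, 2126, 6, 17974)
```

Under these assumptions the odds ratio comes out close to the tabulated 8.47 and the p-value is of the same order as the reported 7.3×10^-4. The confidence intervals in the table are not reproduced here because the source does not say how they were derived.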
Figure 2. The genomic coverage of three rare copy number variant regions that show strong association with endometriosis.
The deletion at SGCZ on 8p22 (P = 7.3×10^-4, OR = 8.5, CI = 2.3–31.7) is shown in panel A, a deletion in MALRD1 on 10p12.31 (P = 5.6×10^-4, OR = 14.1, CI = 2.7–90.9) is shown in panel B, and a deletion at 11q14.1 (P = 5.7×10^-4, OR = 33.8, CI = 3.3–1651) is shown in panel C. The genomic coverage of CNVs observed in endometriosis cases is represented by red bars and that of the population controls by brown bars, with genes represented in blue. The red box on each ideogram shows the chromosomal location of the CNVs. To ensure correct CNV calls in the three regions we performed a visual inspection of the LRR and BAF plots for all samples in the study population. LRR and BAF plots for each of the individuals represented above are shown in Figure S1 in File S2. CNVs with apparently identical boundaries were grouped as indicated by the number in parentheses. Haplotypes in each group were compared to determine if the CNVs in each group have a shared ancestral origin.
SNP association in CNV regions.
| Genomic position | | | Genotype count | | Genotype frequency | | SNP count | | SNP association | | Gene location | |
| SNP ID | Chr | Pos | ENDM | CTL | ENDM | CTL | CNV region | non-LD | OR | p-Value | Gene | Location |
| rs758316 | 7 | 153,631,648 | 303/1111/847 | 4046/11897/9468 | 0.378 | 0.393 | 34 | 7 | 0.94 | 0.0045 | DPP6 | INTRON |
| rs4837864 | 9 | 119,544,433 | 49/445/1789 | 300/4780/20384 | 0.119 | 0.106 | 22 | 7 | 1.14 | 0.0002 | ASTN2 | INTRON |
The SNPs located within the eighteen autosomal CNV regions have previously been evaluated for association with endometriosis (Albertsen et al.), where they failed to pass the genome-wide significance threshold (p<5×10^-8) applied in GWA studies. Using the less stringent criteria applied to candidate genes, we found that both regions have 7 non-LD SNPs (r^2<0.2), which provide an adjusted significance threshold of 0.007. The two SNPs listed here have genotypic p-Values that are significant at this threshold. Both SNPs were in Hardy-Weinberg equilibrium (p>10^-3).
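The adjusted threshold of 0.007 is consistent with dividing a nominal level of 0.05 by the 7 non-LD SNPs per region. The sketch below assumes that Bonferroni-style reading, which the source does not name explicitly. The p-values are taken from the table above, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni-style correction."""
    return alpha / n_tests


# Assumption: nominal alpha of 0.05 and 7 independent (non-LD) SNPs per region.
threshold = bonferroni_threshold(0.05, 7)

# Genotypic p-values reported for the two SNPs.
snp_p_values = {"rs758316": 0.0045, "rs4837864": 0.0002}
significant = {snp: p < threshold for snp, p in snp_p_values.items()}
```

Both reported p-values fall below the resulting threshold of roughly 0.0071, in line with the statement that the two SNPs are significant at the adjusted level.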