Lide Han, Xuefang Zhao, Mary Lauren Benton, Thaneer Perumal, Ryan L Collins, Gabriel E Hoffman, Jessica S Johnson, Laura Sloofman, Harold Z Wang, Matthew R Stone, Kristen J Brennand, Harrison Brand, Solveig K Sieberts, Stefano Marenco, Mette A Peters, Barbara K Lipska, Panos Roussos, John A Capra, Michael Talkowski, Douglas M Ruderfer.
Abstract
Structural variants (SVs) contribute to many disorders, yet functionally annotating them remains a major challenge. Here, we integrate SVs with RNA-sequencing from human post-mortem brains to quantify their dosage and regulatory effects. We show that genic and regulatory SVs exist at significantly lower frequencies than intergenic SVs. The functional impact of copy number variants (CNVs) stems from both the proportion of genic and regulatory content altered and the loss-of-function intolerance of the gene. We train a linear model to predict expression effects of rare CNVs and use it to annotate regulatory disruption of CNVs from 14,891 independent genome-sequenced individuals. Pathogenic deletions implicated in neurodevelopmental disorders show significantly more extreme regulatory disruption scores and, if rank ordered, would be prioritized higher than by frequency or length alone. This work shows the deleteriousness of regulatory SVs, particularly those altering CTCF sites, and provides a simple approach for functionally annotating the regulatory consequences of CNVs.
Year: 2020 PMID: 32533064 PMCID: PMC7293301 DOI: 10.1038/s41467-020-16736-1
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 Details of CMC SV dataset.
Characterization of the high-confidence rare (<0.5%) SV dataset stratified by (a) type of SV, (b) allele frequency, and (c) length (log10-scaled), colored by type of SV. SV types include Alu (Alu), complex (CPX), translocation (CTX), deletion (DEL), duplication (DUP), insertion (INS), inversion (INV), long interspersed nuclear element-1 (LINE1), and SINE-VNTR-Alu (SVA; short interspersed nuclear element, variable number tandem repeat, and Alu).
Fig. 2 Genic and regulatory SVs occur at significantly lower frequencies.
Proportion of variants seen only once in the sample (singletons), with bootstrapped 95% confidence intervals, stratified by annotation overlap: any annotation, allowing multiple annotations per variant (CMC); only a single annotation (CMC unique); and any annotation in gnomAD SV.
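A bootstrapped confidence interval for a singleton proportion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the percentile bootstrap, the number of resamples, and the toy allele counts are all hypothetical choices.

```python
import random


def singleton_proportion(allele_counts):
    """Fraction of variants observed exactly once (allele count == 1)."""
    return sum(1 for ac in allele_counts if ac == 1) / len(allele_counts)


def bootstrap_ci(allele_counts, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the singleton proportion.

    Assumption: variants are resampled with replacement; the caption does
    not say which bootstrap variant was used.
    """
    rng = random.Random(seed)
    n = len(allele_counts)
    stats = sorted(
        singleton_proportion(rng.choices(allele_counts, k=n))
        for _ in range(n_boot)
    )
    lower_idx = int(round((alpha / 2) * n_boot))
    upper_idx = int(round((1 - alpha / 2) * n_boot)) - 1
    return stats[lower_idx], stats[upper_idx]


# Hypothetical allele counts for variants overlapping one annotation class.
counts = [1] * 60 + [2] * 25 + [5] * 15
point = singleton_proportion(counts)
low, high = bootstrap_ci(counts)
print(f"singleton proportion = {point:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Running the same function per annotation class would give one interval per bar in a plot of this kind.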
Fig. 3 Genic SVs induce observable changes in expression.
Expression presented as a z-score for (a) all CNVs that overlap any proportion of the exonic sequence of a gene, (b) CNVs that delete or duplicate 100% of the exonic sequence of a gene, and (c) all inversions with any gene overlap (green) compared to all other SVs (gray). Deletions are red, duplications are blue. The dashed lines mark z-scores of 2 and −2.
Genes affected by CNVs are significantly more likely to be expression outliers.
| SV type | Annotation class | Total | Outliers (z > 2) | Proportion (z > 2) | P (z > 2) | Outliers (z < −2) | Proportion (z < −2) | P (z < −2) | Outliers (all) | Proportion (all) | P (all) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Deletions | Coding | 1670 | 30 | 0.018 | 8.59E−02 | 237 | 0.142 | 3.1E−132* | 267 | 0.160 | 3.62E−104* |
| | Intergenic | 585,882 | 7980 | 0.014 | 9.99E−01 | 10,173 | 0.017 | 1.0E+00 | 18,153 | 0.031 | 1.00E+00 |
| | Intronic | 15,997 | 264 | 0.017 | 1.50E−03 | 286 | 0.018 | 5.1E−01 | 550 | 0.034 | 2.31E−02 |
| | Other transcribed product | 566 | 7 | 0.012 | 6.58E−01 | 114 | 0.201 | 1.5E−81* | 121 | 0.214 | 2.28E−62* |
| Duplications | Coding | 2374 | 324 | 0.136 | 7.99E−181* | 36 | 0.015 | 7.9E−01 | 360 | 0.152 | 1.45E−122* |
| | Intergenic | 230,007 | 3619 | 0.016 | 1.00E+00 | 3942 | 0.017 | 7.4E−02 | 7561 | 0.033 | 1.00E+00 |
| | Intronic | 4560 | 74 | 0.016 | 7.60E−01 | 76 | 0.017 | 6.0E−01 | 150 | 0.033 | 7.44E−01 |
| | Other transcribed product | 895 | 143 | 0.160 | 2.81E−89* | 5 | 0.006 | 1.0E+00 | 148 | 0.165 | 7.45E−56* |
| Insertions | Coding | 1395 | 21 | 0.015 | 2.57E−01 | 28 | 0.020 | 1.7E−01 | 49 | 0.035 | 1.16E−01 |
| | Intergenic | 82,414 | 1049 | 0.013 | 9.11E−01 | 1357 | 0.016 | 8.7E−01 | 2406 | 0.029 | 9.53E−01 |
| | Intronic | 829 | 12 | 0.014 | 3.75E−01 | 12 | 0.014 | 7.2E−01 | 24 | 0.029 | 5.57E−01 |
| | Other transcribed product | 337 | 7 | 0.021 | 1.45E−01 | 9 | 0.027 | 1.1E−01 | 16 | 0.047 | 4.24E−02 |
| Inversions | Coding | 515 | 14 | 0.027 | 1.46E−02 | 21 | 0.041 | 4.5E−04* | 35 | 0.068 | 2.39E−05* |
| | Intergenic | 2495 | 29 | 0.012 | 9.98E−01 | 42 | 0.017 | 9.8E−01 | 71 | 0.028 | 1.00E+00 |
| | Intronic | 57 | 0 | 0.000 | 1.00E+00 | 0 | 0.000 | 1.0E+00 | 0 | 0.000 | 1.00E+00 |
| | Other transcribed product | 162 | 5 | 0.031 | 9.02E−02 | 0 | 0.000 | 1.0E+00 | 5 | 0.031 | 6.63E−01 |
| Alu | Coding | 174 | 1 | 0.006 | 9.07E−01 | 6 | 0.034 | 6.8E−02 | 7 | 0.040 | 2.68E−01 |
| | Intergenic | 129,005 | 1751 | 0.014 | 3.69E−01 | 2116 | 0.016 | 7.9E−01 | 3867 | 0.030 | 5.94E−01 |
| | Intronic | 361 | 5 | 0.014 | 5.42E−01 | 4 | 0.011 | 8.4E−01 | 9 | 0.025 | 7.56E−01 |
| | Other transcribed product | 22 | 0 | 0.000 | 1.00E+00 | 1 | 0.045 | 3.1E−01 | 1 | 0.045 | 4.88E−01 |
Number and proportion of expression outliers by SV type and annotation. P values are from a one-sided Fisher's exact test comparing SVs in each annotation class with all others of the same SV type. * indicates significance after Bonferroni multiple test correction for 60 tests (P < 0.00083).
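The test described in this caption can be sketched with the standard library alone. The snippet below is an illustration, not the authors' code: it computes the upper-tail hypergeometric probability in log space and applies the Bonferroni threshold for 60 tests. How the 2×2 table was built is an assumption. Here the coding deletions (237 outliers among 1670, from the second outlier column group of the table) are compared against the remaining deletion rows summed.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_one_sided(a, b, c, d):
    """Upper-tail Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: outliers in the class, b: non-outliers in the class,
    c: outliers in all other classes, d: non-outliers in all other classes.
    Returns P(X >= a) with all margins held fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denom = log_comb(total, col1)
    log_terms = [
        log_comb(row1, k) + log_comb(total - row1, col1 - k) - log_denom
        for k in range(a, min(row1, col1) + 1)
    ]
    peak = max(log_terms)  # factor out the largest term for numerical stability
    return min(1.0, exp(peak) * sum(exp(t - peak) for t in log_terms))


bonferroni_threshold = 0.05 / 60  # about 0.00083, as stated in the caption

# Assumed table construction: coding deletions versus all other deletion rows.
others_outliers = 10173 + 286 + 114
others_total = 585882 + 15997 + 566
p_coding_del = fisher_one_sided(
    237, 1670 - 237, others_outliers, others_total - others_outliers
)
print(p_coding_del < bonferroni_threshold)
```

Working in log space matters here because the individual hypergeometric terms are far too small or too large to form directly from factorials at these sample sizes.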
Genic and regulatory features significantly contribute to predicting transcriptional consequences of CNVs.
| CNV class | Variable | Beta | SE | T | P | Beta | SE | T | P | Beta | SE | T | P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Deletions | Exonic Proportion | −1.7762 | 0.0664 | −26.77 | 9.9E−158 | −2.0563 | 0.0852 | −24.12 | 1.8E−128 | −1.5571 | 0.0939 | −16.59 | 8.68E−62 |
| | Enhancer sum | −0.0152 | 0.0083 | −1.83 | 6.7E−02 | −0.0093 | 0.0094 | −0.99 | 3.2E−01 | −0.0188 | 0.0112 | −1.69 | 9.14E−02 |
| | Promoter proportion | −0.1726 | 0.0589 | −2.93 | 3.4E−03 | −0.1719 | 0.0747 | −2.30 | 2.1E−02 | −0.0872 | 0.0843 | −1.04 | 3.01E−01 |
| | SV Length | −2.14E−07 | 1.46E−08 | −14.70 | 6.9E−49 | −1.92E−07 | 1.70E−08 | −11.26 | 2.0E−29 | −1.89E−07 | 3.42E−08 | −5.53 | 3.19E−08 |
| | Within TAD | −0.0090 | 0.0022 | −4.03 | 5.7E−05 | −0.0113 | 0.0031 | −3.61 | 3.0E−04 | −0.0046 | 0.0028 | −1.65 | 9.83E−02 |
| Duplications | Exonic Proportion | 0.7825 | 0.0352 | 22.22 | 3.0E−109 | 1.1285 | 0.0546 | 20.67 | 1.0E−94 | 0.5043 | 0.0442 | 11.42 | 3.43E−30 |
| | Enhancer sum | −0.0157 | 0.0027 | −5.77 | 8.1E−09 | 0.0015 | 0.0062 | 0.24 | 8.1E−01 | −0.0164 | 0.0030 | −5.42 | 5.84E−08 |
| | Promoter proportion | 0.3735 | 0.0326 | 11.45 | 2.5E−30 | 0.3523 | 0.0509 | 6.92 | 4.5E−12 | 0.3438 | 0.0403 | 8.54 | 1.37E−17 |
| | SV Length | 3.99E−07 | 2.60E−08 | 15.34 | 4.6E−53 | 4.07E−07 | 3.52E−08 | 11.58 | 5.3E−31 | 3.05E−07 | 3.68E−08 | 8.30 | 1.04E−16 |
| | Within TAD | 0.0046 | 0.0036 | 1.26 | 2.1E−01 | 0.0072 | 0.0052 | 1.38 | 1.7E−01 | 0.0038 | 0.0045 | 0.86 | 3.89E−01 |
Coefficients of linear regression model to predict expression z-scores in deletions and duplications, across all samples and stratified by cohort.
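To make the coefficients concrete, the sketch below applies them as a plain linear combination. It rests on several assumptions: the first Beta column is taken to be the all-sample fit, no intercept is reported in this table so it defaults to zero, and the feature names, units (proportions in [0, 1], length in base pairs), and example CNV are hypothetical. It is not the authors' scoring implementation.

```python
# Coefficients copied from the first Beta column of the table
# (assumed here to be the all-sample fit).
DELETION_BETAS = {
    "exonic_proportion": -1.7762,
    "enhancer_sum": -0.0152,
    "promoter_proportion": -0.1726,
    "sv_length": -2.14e-07,
    "within_tad": -0.0090,
}
DUPLICATION_BETAS = {
    "exonic_proportion": 0.7825,
    "enhancer_sum": -0.0157,
    "promoter_proportion": 0.3735,
    "sv_length": 3.99e-07,
    "within_tad": 0.0046,
}


def predicted_z(features, betas, intercept=0.0):
    """Linear prediction of an expression z-score for one CNV-gene pair.

    The intercept is not given in the table, so zero is an assumption.
    """
    return intercept + sum(betas[name] * features[name] for name in betas)


# Hypothetical 100 kb CNV covering all exons and the whole promoter of a gene.
example_cnv = {
    "exonic_proportion": 1.0,
    "enhancer_sum": 0,
    "promoter_proportion": 1.0,
    "sv_length": 100_000,
    "within_tad": 0,
}
z_del = predicted_z(example_cnv, DELETION_BETAS)     # about -1.97
z_dup = predicted_z(example_cnv, DUPLICATION_BETAS)  # about 1.20
print(round(z_del, 4), round(z_dup, 4))
```

For this toy CNV the exonic proportion term dominates both predictions, which mirrors its much larger T statistic in the table.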
Fig. 4 Genes intolerant to variation are less likely to be affected by genic or regulatory SVs.
Each plot stratifies genes by either the LoF intolerance metric or the CNV intolerance metric, split into quintiles (20% bins) ordered left to right from least to most intolerant, and by deletion (red) and duplication (blue). The plots show the effect of this stratification on (a) the proportion of exonic sequence affected (mean and standard deviation), (b) the deviation from the expected 20% of CNVs that alter exonic sequence, (c) the deviation from expected for noncoding CNVs that alter promoters, and (d) the deviation from expected for noncoding CNVs that alter enhancers.
Fig. 5 Transcriptional consequences of rare CNVs can be significantly predicted.
SV expression prediction performance and associated R² from building the same linear model using different training and test datasets: (a) CMC into CMC_HBCC, (b) CMC_HBCC into CMC, (c) CMC into CMC, and (d) CMC_HBCC into CMC_HBCC. The best-fit line with confidence interval was produced using generalized additive model smoothing.
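One way to score predictions against held-out observations is the squared Pearson correlation, sketched below. Whether the reported R² is this quantity or the coefficient of determination is not stated in the caption, so the choice is an assumption, and the numbers are toy values rather than data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed z-scores."""
    return pearson_r(predicted, observed) ** 2


# Toy values: predictions from a model fit on one cohort, observations
# from the other cohort.
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.0, 3.0, 2.0, 4.0]
print(round(r_squared(predicted, observed), 2))  # 0.64
```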
Fig. 6 Regulatory disruption scores prioritize pathogenic CNVs better than standard annotations.
Number of pathogenic variants (defined as 50% overlap with a known pathogenic variant in ClinGen; 84 deletions and 84 duplications) identified by rank ordering (a) deletions and (b) duplications by length (yellow), number of genes deleted (green), number of intolerant genes deleted (purple), allele frequency (red), and regulatory disruption (blue). Where multiple variants had the same value, the order was random.
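The ranking comparison in this caption, including the random ordering of ties, can be sketched as below. The variant records, field names, and values are hypothetical toy data chosen only to exercise the functions; they are not results from the study. For deletions, a more negative score is assumed to mean a more extreme predicted disruption.

```python
import random


def rank_with_random_ties(variants, key, descending=True, seed=0):
    """Sort variants by key, placing tied variants in random order.

    Shuffling first and then using Python's stable sort leaves ties in
    the shuffled (random) order.
    """
    rng = random.Random(seed)
    shuffled = list(variants)
    rng.shuffle(shuffled)
    return sorted(shuffled, key=key, reverse=descending)


def pathogenic_in_top(ranked, k):
    """Count pathogenic variants among the first k ranked variants."""
    return sum(1 for v in ranked[:k] if v["pathogenic"])


# Hypothetical deletions (toy values, not data from the paper).
deletions = [
    {"id": "del1", "length": 500_000, "score": -0.4, "pathogenic": False},
    {"id": "del2", "length": 120_000, "score": -2.1, "pathogenic": True},
    {"id": "del3", "length": 300_000, "score": -0.2, "pathogenic": False},
    {"id": "del4", "length": 80_000, "score": -1.8, "pathogenic": True},
]

by_length = rank_with_random_ties(deletions, key=lambda v: v["length"])
by_score = rank_with_random_ties(
    deletions, key=lambda v: v["score"], descending=False
)
top_length = pathogenic_in_top(by_length, 2)
top_score = pathogenic_in_top(by_score, 2)
print(top_length, top_score)
```

Plotting `pathogenic_in_top` against `k` for each ranking criterion would give one curve per criterion, which is the kind of comparison the figure describes.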