Douglas M Ruderfer1,2,3, Tymor Hamamsy1, Monkol Lek3,4, Konrad J Karczewski3,4, David Kavanagh1,2, Kaitlin E Samocha3,4, Mark J Daly3,4, Daniel G MacArthur3,4, Menachem Fromer1,2,3,4, Shaun M Purcell1,2,3,4,5.
Abstract
Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than those affected by CNVs in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
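The per-individual averages in the abstract are gene-level rates: the total number of genes hit by rare deletions (or duplications) divided by the number of individuals, including those who carry no CNV at all. The sketch below illustrates that arithmetic on toy data. The `CnvCall` record, its field names, and the sample identifiers are hypothetical and do not reflect the ExAC file format; the only parameter taken from the text is the <0.5% rarity threshold.

```python
from dataclasses import dataclass

# Hypothetical record for one CNV call; field names are illustrative, not ExAC's schema.
@dataclass(frozen=True)
class CnvCall:
    individual: str
    kind: str          # "DEL" or "DUP"
    genes: frozenset   # genes overlapped by the call
    frequency: float   # population frequency of this CNV

RARE_THRESHOLD = 0.005  # "rare" means frequency below 0.5%

def genic_summary(calls, individuals):
    """Mean deleted and duplicated genes per individual, and the fraction
    of individuals carrying at least one rare genic CNV."""
    rare = [c for c in calls if c.frequency < RARE_THRESHOLD and c.genes]
    deleted = sum(len(c.genes) for c in rare if c.kind == "DEL")
    duplicated = sum(len(c.genes) for c in rare if c.kind == "DUP")
    carriers = {c.individual for c in rare}
    n = len(individuals)  # every sequenced individual counts, even with zero CNVs
    return deleted / n, duplicated / n, len(carriers) / n

people = ["s1", "s2", "s3", "s4"]
calls = [
    CnvCall("s1", "DEL", frozenset({"GENE_A"}), 0.001),
    CnvCall("s2", "DUP", frozenset({"GENE_B", "GENE_C"}), 0.002),
    CnvCall("s3", "DUP", frozenset({"GENE_D"}), 0.02),  # common CNV: excluded
]
print(genic_summary(calls, people))  # (0.25, 0.5, 0.5)
```

This toy version counts a gene once per call, so overlapping calls in one person would be double-counted; a real pipeline would likely deduplicate genes per individual first.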
Year: 2016 PMID: 27533299 PMCID: PMC5042837 DOI: 10.1038/ng.3638
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Figure 1. Distribution of the number and amount (in kb) of CNVs across 59,898 exome-sequenced individuals, shown as a histogram of the number of CNVs per individual (top), a two-dimensional density plot of CNV number and amount (middle), and a density plot of the amount of CNV per individual (right).
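The two per-individual quantities plotted in Figure 1 are a count of CNV calls and their summed length in kb. A minimal sketch of how they could be tabulated, assuming a hypothetical input of `(individual, start, end)` tuples with half-open base-pair coordinates (not the actual ExAC call format):

```python
from collections import Counter

def per_individual_burden(calls, individuals):
    """Return {individual: (number of CNVs, total CNV length in kb)}.

    `calls` holds hypothetical (individual, start, end) tuples in base pairs,
    with half-open coordinates so that length = end - start.
    """
    burden = {ind: [0, 0.0] for ind in individuals}  # zero-CNV individuals stay in
    for ind, start, end in calls:
        burden[ind][0] += 1
        burden[ind][1] += (end - start) / 1000.0
    return {ind: (n, kb) for ind, (n, kb) in burden.items()}

calls = [
    ("s1", 1_000_000, 1_050_000),  # 50 kb
    ("s1", 2_000_000, 2_010_000),  # 10 kb
    ("s2", 5_000, 25_000),         # 20 kb
]
burden = per_individual_burden(calls, ["s1", "s2", "s3"])
print(burden)  # {'s1': (2, 60.0), 's2': (1, 20.0), 's3': (0, 0.0)}

# Counts behind the top histogram: how many individuals carry 0, 1, 2, ... CNVs.
count_histogram = Counter(n for n, _ in burden.values())
print(sorted(count_histogram.items()))  # [(0, 1), (1, 1), (2, 1)]
```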
Figure 2. Genic summary of rare deletions and duplications in the ExAC sample
a. Proportion of individuals with 0 to 10 or more genes deleted (red) or duplicated (blue).
b. Proportion of CNVs that affect multiple genes (multi-gene), the entirety of a single gene (full-gene), or part of a single gene (partial-gene). The two rightmost bars split these proportions for deletions and duplications, respectively.
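The three categories in panel b follow from how a CNV interval overlaps gene intervals. The sketch below is one plausible way to assign them, assuming gene-span overlap with half-open coordinates; the `classify_cnv` helper, the `(name, start, end)` gene format, and the extra "non-genic" label are hypothetical, and an exome-based pipeline may instead measure overlap at the level of targeted exons.

```python
from collections import Counter

def classify_cnv(cnv_start, cnv_end, genes):
    """Classify one CNV by how it overlaps gene spans.

    `genes` is a hypothetical list of (name, start, end) tuples using
    half-open base-pair coordinates.
    """
    # Genes whose span overlaps the CNV at all.
    hits = [(start, end) for _, start, end in genes if start < cnv_end and cnv_start < end]
    if not hits:
        return "non-genic"
    if len(hits) > 1:
        return "multi-gene"
    start, end = hits[0]
    # A single gene fully contained in the CNV is a full-gene event; otherwise partial.
    return "full-gene" if cnv_start <= start and end <= cnv_end else "partial-gene"

genes = [("GENE_A", 100, 200), ("GENE_B", 300, 400)]
cnvs = [(50, 250), (150, 250), (150, 350), (500, 600)]
labels = [classify_cnv(s, e, genes) for s, e in cnvs]
print(labels)  # ['full-gene', 'partial-gene', 'multi-gene', 'non-genic']

# Proportions as in panel b, computed over genic CNVs only.
genic = Counter(label for label in labels if label != "non-genic")
total = sum(genic.values())
print({label: count / total for label, count in genic.items()})
```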
Number of genes impacted (N) and mean number of gene-level CNVs per individual (rate). The first pair of N/rate columns combines deletions and duplications; the second and third pairs split out deletions and duplications, respectively. The bottom two rows consider only CNVs affecting a single entire gene (single-gene) or only part of a single gene (partial-gene).
| Genes (n=15,734) | All CNVs: N | All CNVs: rate | Deletions: N | Deletions: rate | Duplications: N | Duplications: rate |
|---|---|---|---|---|---|---|
| All | 13,862 | 2.565 | 9,156 | 0.817 | 12,696 | 1.747 |
| Single-gene | 7,159 | 0.881 | 4,723 | 0.399 | 5,268 | 0.481 |
| Partial-gene | 4,886 | 0.543 | 3,358 | 0.251 | 3,435 | 0.292 |
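Two internal consistency properties of the table can be checked directly. Because every gene-level CNV is either a deletion or a duplication, the combined rate should equal the sum of the deletion and duplication rates up to rounding (three values rounded to three decimals can disagree by at most 0.0015). The combined gene count N is read here as a union (a gene hit by both a deletion and a duplication is counted once), which is an interpretation, so it should lie between the larger of the two separate counts and their sum. The sketch below transcribes the table values; `TABLE` and `check_row` are illustrative names.

```python
# Values transcribed from the table: (N, rate) for all CNVs, deletions, duplications.
TABLE = {
    "All":          ((13_862, 2.565), (9_156, 0.817), (12_696, 1.747)),
    "Single-gene":  ((7_159, 0.881), (4_723, 0.399), (5_268, 0.481)),
    "Partial-gene": ((4_886, 0.543), (3_358, 0.251), (3_435, 0.292)),
}
TOTAL_GENES = 15_734

def check_row(row, tol=0.0015):
    (n_all, r_all), (n_del, r_del), (n_dup, r_dup) = row
    # Rates are additive: each gene-level CNV is a deletion or a duplication.
    rates_add_up = abs(r_all - (r_del + r_dup)) <= tol
    # Assumed union: a gene can be both deleted and duplicated but is counted once.
    union_bounds = max(n_del, n_dup) <= n_all <= min(n_del + n_dup, TOTAL_GENES)
    return rates_add_up and union_bounds

for name, row in TABLE.items():
    print(name, check_row(row))  # each row prints True
```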
Figure 3. Brain-relevant genes demonstrate the greatest intolerance to dosage changes from CNVs
a. After removing genes highly expressed in all tissues (FPKM > 20), 27 tissues[30] were rank-ordered by the mean ExAC CNV intolerance score of the genes highly expressed in each tissue; the mean and standard error of the intolerance score are indicated by the bold line and box width, respectively. Box color denotes the significance of a two-sided t-test for a difference in intolerance scores between tissue-expressed genes and all other genes; white bars indicate no significant difference (p > 0.05). The vertical dashed blue line marks the mean CNV intolerance score across all genes.
b. Network diagrams of pathways significantly enriched for the 5% most CNV-intolerant (red) and most CNV-tolerant (blue) genes, created using the Enrichment Map Cytoscape plug-in[38]. Results are based on tests of 9 categories of pathways (GO molecular, GO biological, GO cellular, Human Phenotype, Mouse Phenotype, Domain, Pathway, Gene Family, and Disease); only pathways passing both Bonferroni-corrected significance (p < 0.05) and FDR significance are shown. Node size represents the number of genes in a pathway, node color represents the significance of enrichment, and the thickness of an edge corresponds to the proportion of genes shared by the two gene sets it connects. Each grouping was manually assigned a label, and the genes listed are those present in all significant pathways within a group.
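Panel b keeps only pathways that pass both a Bonferroni correction and an FDR threshold. The caption does not name the FDR procedure or say whether correction was applied within each of the 9 categories or across all of them, so the sketch below assumes the Benjamini-Hochberg procedure and corrects over whatever set of p-values it is given. The pathway names and p-values are invented placeholders, not results from the study.

```python
def significant_pathways(pvalues, alpha=0.05):
    """Return names passing both Bonferroni and Benjamini-Hochberg FDR at level alpha.

    `pvalues` is a hypothetical dict mapping pathway name -> enrichment p-value.
    """
    m = len(pvalues)
    if m == 0:
        return set()
    # Bonferroni: reject when the corrected p-value p * m is below alpha.
    bonferroni = {name for name, p in pvalues.items() if p * m < alpha}
    # Benjamini-Hochberg: find the largest rank k with p_(k) <= (k / m) * alpha,
    # then reject every hypothesis ranked 1..k.
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    k_max = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * alpha:
            k_max = k
    fdr = {name for name, _ in ranked[:k_max]}
    return bonferroni & fdr

# Placeholder inputs for illustration only.
pvals = {"set_A": 1e-6, "set_B": 0.004, "set_C": 0.03, "set_D": 0.2}
print(sorted(significant_pathways(pvals)))  # ['set_A', 'set_B']
```

Because Bonferroni is the stricter of the two, any p-value below alpha / m also passes Benjamini-Hochberg, so in this formulation the intersection reduces to the Bonferroni set; here set_C passes FDR alone but is dropped.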