Alan M Rice, Aoife McLysaght.
Abstract
Human copy number variants (CNVs) account for genome variation an order of magnitude larger than single-nucleotide polymorphisms. Although much of this variation has no phenotypic consequences, some variants have been associated with disease, in particular neurodevelopmental disorders. Pathogenic CNVs are typically very large and contain multiple genes, and understanding the cause of the pathogenicity remains a major challenge. Here we show that pathogenic CNVs are significantly enriched for genes involved in development and genes that have greater evolutionary copy number conservation across mammals, indicative of functional constraints. Conversely, genes found in benign CNV regions have more variable copy number. These evolutionary constraints are characteristic of genes in pathogenic CNVs and can only be explained by dosage sensitivity of those genes. These results implicate dosage sensitivity of individual genes as a common cause of CNV pathogenicity. These evolutionary metrics suggest a path to identifying disease genes in pathogenic CNVs.
Year: 2017 PMID: 28176757 PMCID: PMC5309798 DOI: 10.1038/ncomms14366
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Summary of human CNVs used in CNV analysis.
Figure 1. Illustration of CNVRs and intersection with genes.
Illustrative CNVs are shown with benign CNVs above the genomic region and pathogenic CNVs below (blue and pink lines, respectively). Shaded boxes bound CNVRs, with local peak coverage regions indicated by darker shading. Genes overlapped by both benign and pathogenic CNVs are termed Class X ‘passenger’ genes here (yellow). Where only a single non-passenger pathogenic gene is within a region, it is termed a ‘solitary Class P’ gene (orange).
Figure 2. Patterns of gene duplication and loss across mammals for orthologues of human genes in CNVs.
(a) Venn diagram showing the number of protein-coding genes overlapped by different combinations of CNV types (blue, benign copy number gains (CNGs); yellow, benign copy number losses (CNLs); green, pathogenic gain peak coverage regions; red, pathogenic loss peak regions). (b) Genes that are covered exclusively by benign CNVs are labelled ‘Class B’ (shaded red), those exclusive to pathogenic CNVs ‘Class P’ (shaded blue), and those falling in CNVs with both clinical interpretations for gain or loss are considered likely to be passenger genes and labelled ‘Class X’ (shaded grey). Note that the classification refers to the CNVs that the genes fall within rather than to the genes themselves. (c) Phylogenetic tree of the 13 mammalian species used for the gene conservation analysis, with examples of human genes from each CNV overlap pattern type (Venn diagram segment) showing the orthologue distribution in the mammals. A dash indicates no change. (d) Box plot of the number of mammalian species in which copy number is unchanged (black), duplication has occurred (green) or no orthologue is found (orange) for different categories of CNV overlap, as indicated below the box plots. Lower and upper hinges of the boxes correspond to the first and third quartiles. The median is shown within each box. Whiskers extend to values within 1.5 × the interquartile range of the hinges. These data were calculated per gene as illustrated in c. The sample size is shown below each box plot.
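The classification in panel b and the per-gene tallies summarised in panels c and d can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the placeholder species names and the orthologue counts are all hypothetical. It also simplifies the scheme by ignoring the gain/loss distinction, and it assumes that an orthologue count of 1 means "unchanged", a count above 1 means "duplicated" and a count of 0 means "no orthologue".

```python
from collections import Counter
from typing import Dict, Optional


def classify_gene(in_benign: bool, in_pathogenic: bool) -> Optional[str]:
    """Assign a CNV class from overlap flags (simplified: gain/loss ignored)."""
    if in_benign and in_pathogenic:
        return "X"  # overlapped by both interpretations: likely passenger
    if in_benign:
        return "B"  # overlapped exclusively by benign CNVs
    if in_pathogenic:
        return "P"  # overlapped exclusively by pathogenic CNVs
    return None  # not overlapped by any CNV


def tally_copy_number(orthologue_counts: Dict[str, int]) -> Counter:
    """Count species by copy-number status for one human gene."""
    tally = Counter({"unchanged": 0, "duplicated": 0, "absent": 0})
    for count in orthologue_counts.values():
        if count == 0:
            tally["absent"] += 1      # no orthologue found in this species
        elif count == 1:
            tally["unchanged"] += 1   # single copy, as in human
        else:
            tally["duplicated"] += 1  # more than one orthologue
    return tally


# Hypothetical orthologue counts for one gene in four placeholder species.
toy_counts = {"species_a": 1, "species_b": 2, "species_c": 0, "species_d": 1}
toy_tally = tally_copy_number(toy_counts)
print(classify_gene(in_benign=False, in_pathogenic=True), dict(toy_tally))
```

Computing the tally once per gene and then grouping genes by class gives the three distributions (unchanged, duplicated, absent) that a box plot like panel d summarises.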
Genes included in different types of CNV have different genetic and functional characteristics.
| Developmental genes | 22.4% (117) | 20.2% (19) | 25.5% (28) | 2.7 × 10⁻¹¹ |
| Protein complex members | 33.0% (31) | 35.5% (39) | 7.5 × 10⁻⁹ |
| Ohnologues | 29.8% (28) | 41.8% (46) | 4.1 × 10⁻¹⁰ |
| Haploinsufficient genes | 13.4% (11) | 14.9% (15) | 4.3 × 10⁻⁶ |
| Haploinsufficiency score (median) | 0.014 | 0.028 | 0.009 | 0.002 | 0.001 |
| · | · | 0.005 |
| Maximal expression in RPKMs (median) | 9.6 | 19.6 | 12.6 | 20.4 | 14.1 |
| · | · | <1.0 × 10⁻¹⁶ |
| · | · | 0.005 |
| · | · | 4.5 × 10⁻¹³ |
BG/PL, genes exclusively overlapped by benign gain CNVRs and pathogenic loss peak CNVRs; BL/PG, genes exclusively overlapped by benign loss CNVRs and pathogenic gain peak CNVRs; CNV, copy number variant; CNVR, CNV regions; RPKM, reads per kilobase of transcript per million mapped reads.
*Genes exclusively observed in benign CNVRs.
†Genes exclusively observed in pathogenic CNVRs.
‡Genes observed in contradictory CNV types and clinical interpretations.
§All P-values are Bonferroni corrected. Values in bold have adjusted residuals beyond ±2 in the χ²-test.
‖Pairwise comparisons are indicated with dots. All P-values are Bonferroni corrected.
¶Genes with probability of loss-of-function mutation intolerance >90% inferred in ref. 49.
#Probability of loss-of-function mutation intolerance inferred in ref. 49.
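The two statistical conventions named in the footnotes, adjusted residuals from a χ²-test and Bonferroni correction, can be illustrated with a short sketch. It uses the standard textbook definitions and a made-up contingency table; the function names and all counts are hypothetical and are not taken from the table above, and the source does not say which software the authors used.

```python
import math
from typing import List


def chi_square_statistic(table: List[List[float]]) -> float:
    """Pearson chi-square statistic for a contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def adjusted_residuals(table: List[List[float]]) -> List[List[float]]:
    """Adjusted standardised residual for every cell of the table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: rows are two gene classes, columns are
# "in category" and "not in category".
toy_table = [[10, 90], [30, 70]]
toy_chi2 = chi_square_statistic(toy_table)
toy_residuals = adjusted_residuals(toy_table)
print(round(toy_chi2, 3), [[round(r, 3) for r in row] for row in toy_residuals])
```

A cell whose adjusted residual lies beyond ±2 deviates from the count expected under independence by roughly two standard errors, which is the threshold the footnote uses to mark values in bold.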
Figure 3. Mammalian copy number changes for genes within known pathogenic CNVRs.
(a) Copy number changes across mammalian species for genes within known pathogenic CNVRs associated with schizophrenia and other neurodevelopmental disorders, obtained from ref. 36. The minimal (min) CNVR (shaded dark grey) is typically the smallest region associated with the disease phenotype, while the maximal (max) CNVR (shaded light grey) is the largest region typically observed. Ten flanking genes on each side are also plotted where possible. Each region is labelled above with the chromosomal band, and the position along the chromosome in megabases is shown on the x axis. Genes are plotted by start position. For each human gene, points show the number of duplications (green) and losses (orange). Genes within regions are listed in Supplementary Data 6. (b) For each protein-coding gene within the 22q11 region, copy number changes across 13 mammalian species are shown. Green circles indicate where orthologues are duplicated and orange circles where orthologues are missing. Genes highlighted in light red have at least one orthologue present in all species, and genes highlighted in dark red have conserved one-to-one orthology across the mammalian species tested (completely conserved genes). Grey dashed outlines group orthologues that are neighbouring on their respective chromosome or scaffold in each species. Genes with greyed-out names were not included in the copy number analysis, so no data are displayed for them.
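The two highlighting categories in panel b can be expressed as a simple rule on per-species orthologue counts. The sketch below is illustrative only: the function name, the category labels, the placeholder species names and the counts are hypothetical. It assumes that the dark red category (one-to-one orthology everywhere) takes priority over the light red category (at least one orthologue everywhere).

```python
from typing import Dict


def conservation_category(orthologue_counts: Dict[str, int]) -> str:
    """Categorise one human gene by its orthologue counts across species."""
    counts = list(orthologue_counts.values())
    if not counts:
        raise ValueError("at least one species is required")
    if all(c == 1 for c in counts):
        return "completely conserved"    # one-to-one in every species
    if all(c >= 1 for c in counts):
        return "present in all species"  # never lost, duplicated somewhere
    return "variable"                    # missing in at least one species


# Hypothetical genes with counts in three placeholder species.
toy_genes = {
    "gene_1": {"species_a": 1, "species_b": 1, "species_c": 1},
    "gene_2": {"species_a": 1, "species_b": 3, "species_c": 1},
    "gene_3": {"species_a": 1, "species_b": 0, "species_c": 2},
}
toy_categories = {g: conservation_category(c) for g, c in toy_genes.items()}
print(toy_categories)
```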