Joseph T Glessner, Jin Li, Hakon Hakonarson.
Abstract
A number of copy number variation (CNV) calling algorithms exist; however, comprehensive software tools for CNV association studies are lacking. We describe ParseCNV, unique software that takes CNV calls and creates probe-based statistics for CNV occurrence in both case-control designs and family-based studies, addressing both de novo and inheritance events, which are then summarized based on CNV regions (CNVRs). CNVRs are defined in a dynamic manner to allow for complex CNV overlap while maintaining a precise association region. Using this approach, we avoid the failure-to-converge and non-monotonic curve-fitting weaknesses of programs such as CNVtools and CNVassoc. Although Plink is easy to use, it provides only combined CNV state probe-based statistics, not state-specific CNVRs. Existing CNV association methods do not provide any quality tracking information to filter confident associations, a key issue which is fully addressed by ParseCNV. In addition, uncertainty in CNV calls underlying CNV associations is evaluated to verify significant results, including CNV overlap profiles, genomic context, number of probes supporting the CNV and single-probe intensities. When optimal quality control parameters are followed using ParseCNV, 90% of CNVs validate by polymerase chain reaction, an often problematic stage because of inadequate significant association review. ParseCNV is freely available at http://parsecnv.sourceforge.net.
Year: 2013 PMID: 23293001 PMCID: PMC3597648 DOI: 10.1093/nar/gks1346
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. CNV Analysis Workflow. Pre-processing, file formats and post-processing. This general framework shows the stepwise procedure to prepare input data to use and evaluate ParseCNV output. ‘…’ represents additional columns not shown.
Figure 2. Possible statistical contingency table definitions to capture CNV frequency difference in cases versus control subjects. The middle statistical definition of deletions signifying loss of function mutations and duplications signifying gain of function mutations is used predominantly. This is in contrast to a view that all CNVs are similarly detrimental put forth by the top statistical definition and the view that all CNV states lead to a unique outcome put forth by the bottom statistical definition.
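The three definitions in Figure 2 can be illustrated with a minimal Python sketch. It is not ParseCNV code: the function name, the input shape (one state code per CNV-carrying sample at a probe) and the toy counts are assumptions made for illustration. The state codes and the use of diploid samples as the reference group follow the output field descriptions in the table below.

```python
# State codes as listed in the output field descriptions:
# 1 -> CN 0, 2 -> CN 1 (deletions); 5 -> CN 3, 6 -> CN 4 (duplications).
DELETION_STATES = {1, 2}
DUPLICATION_STATES = {5, 6}
ALL_STATES = DELETION_STATES | DUPLICATION_STATES


def contingency_tables(case_states, control_states, n_cases, n_controls):
    """Return 2x2 tables as (cases with, cases diploid, controls with, controls diploid).

    case_states / control_states hold one state code per sample carrying a CNV
    call at the probe (an assumed input shape); all other samples count as diploid.
    """
    case_diploid = n_cases - len(case_states)
    control_diploid = n_controls - len(control_states)

    def table(selected):
        cases_with = sum(1 for s in case_states if s in selected)
        controls_with = sum(1 for s in control_states if s in selected)
        return (cases_with, case_diploid, controls_with, control_diploid)

    tables = {
        "combined": table(ALL_STATES),             # top definition: all CNVs alike
        "deletion": table(DELETION_STATES),        # middle definition: deletions
        "duplication": table(DUPLICATION_STATES),  # middle definition: duplications
    }
    for state in sorted(ALL_STATES):               # bottom definition: one table per state
        tables[f"state_{state}"] = table({state})
    return tables


# Toy data (hypothetical): 3 case carriers among 10 cases, 2 control carriers among 12.
example = contingency_tables([2, 2, 5], [1, 6], n_cases=10, n_controls=12)
```

Each resulting table can then be passed to a 2x2 test such as Fisher's exact test.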
Figure 3. Complex CNV Overlap and CNVR definition examples. Rectangles represent individual sample CNV call boundaries as provided by a CNV calling algorithm. Each assayed point represented by the probe framework listed in the map file input determines the possible boundary assignments. The CNVR definition assigned by ParseCNV is shown as a dashed box. Small variance in individual CNV call boundaries allows extension of CNVR definition. CNV peninsula is shown as the most common false-positive result based on variable extension of CNV boundary (typically the region common to cases and controls has many probes, whereas the case only extension has few probes).
Significant CNVR output fields description
| Column | Description |
|---|---|
| CNVR | CNV region of greatest significance and overlap coordinates. |
| CountSNPs | The number of probes available in the CNVR for this data set. In this case, contributing individual CNV calls may be larger. |
| SNP | Tag SNP for ease and clarity of reporting and replication. |
| DelTwoTailed | Two-tailed Fisher’s exact P-value for deletions. |
| DupTwoTailed | Two-tailed Fisher’s exact P-value for duplications. |
| ORDel | The odds ratio for deletion. |
| ORDup | The odds ratio for duplication. |
| Cases Del | The number of cases with a deletion detected in this region by PennCNV. |
| Cases Diploid | The number of cases without a deletion or duplication detected in this region by PennCNV. |
| Control Del | The number of control subjects with a deletion detected in this region by PennCNV. |
| Control Diploid | The number of control subjects without a deletion or duplication detected in this region by PennCNV. |
| Cases Dup | The number of cases with a duplication detected in this region by PennCNV. |
| Cases Diploid | The number of cases without a deletion or duplication detected in this region by PennCNV. |
| Control Dup | The number of control subjects with a duplication detected in this region by PennCNV. |
| Control Diploid | The number of control subjects without a deletion or duplication detected in this region by PennCNV. |
| IDsCasesDel | The sample IDs of cases corresponding to the Cases Del column for clinical data lookup. To convert to list in Excel: Data-TextToColumns-Delimited-Space then Copy-PasteSpecial-Transpose. |
| IDsCasesDup | The sample IDs of cases corresponding to the Cases Dup column for clinical data lookup. To convert to list in Excel: Data-TextToColumns-Delimited-Space then Copy-PasteSpecial-Transpose. |
| StatesCasesDel | CN states listed corresponding to IDsCasesDel [1 (CN = 0)/2 (CN = 1)]. |
| StatesCasesDup | CN states listed corresponding to IDsCasesDup [5 (CN = 3)/6 (CN = 4)]. |
| TotalStatesCases(1) | The number of cases in Cases Del with a homozygous deletion or both copies lost. |
| TotalStatesCases(2) | The number of cases in Cases Del with a hemizygous deletion or one copy lost. |
| TotalStatesCases(5) | The number of cases in Cases Dup with a hemizygous duplication or one copy gained. |
| TotalStatesCases(6) | The number of cases in Cases Dup with a homozygous duplication or two copies gained. |
| IDsDelControl | The sample IDs of control subjects corresponding to the Control Del column for clinical data lookup. |
| IDsDupControl | The sample IDs of control subjects corresponding to the Control Dup column for clinical data lookup. |
| StatesDelControl | CN states listed corresponding to IDsDelControl [1 (CN = 0)/2 (CN = 1)]. |
| StatesDupControl | CN states listed corresponding to IDsDupControl [5 (CN = 3)/6 (CN = 4)]. |
| TotalStates(1) | The number of control subjects in Control Del with a homozygous deletion or both copies lost. |
| TotalStates(2) | The number of control subjects in Control Del with a hemizygous deletion or one copy lost. |
| TotalStates(5) | The number of control subjects in Control Dup with a hemizygous duplication or one copy gained. |
| TotalStates(6) | The number of control subjects in Control Dup with a homozygous duplication or two copies gained. |
| ALLTwoTailed | Two-tailed P-value with all CNV states considered together. |
| ORALL | Odds ratio with all CNV states considered together. |
| ZeroTwoTailed | Two-tailed P-value considering only the CN = 0 state. |
| ORZero | Odds ratio considering only the CN = 0 state. |
| OneTwoTailed | Two-tailed P-value considering only the CN = 1 state. |
| OROne | Odds ratio considering only the CN = 1 state. |
| ThreeTwoTailed | Two-tailed P-value considering only the CN = 3 state. |
| ORThree | Odds ratio considering only the CN = 3 state. |
| FourTwoTailed | Two-tailed P-value considering only the CN = 4 state. |
| ORFour | Odds ratio considering only the CN = 4 state. |
| Gene | The closest proximal gene based on UCSC Genes, which includes both RefSeq Genes and Hypothetical Gene transcripts. |
| Distance | The distance from the CNVR to the closest proximal gene annotated. If the value is 0, the CNVR resides directly on the gene. |
| Description | The gene description delimited by ‘/’ for multiple gene transcripts or multiple genes listed. |
| Pathway | Annotated pathway membership of gene with reference compiled from Gene Ontology database, BioCarta database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (definition files in GeneRef folder). |
| AverageNumsnpsCaseDel | The average numsnp of CNV calls contributing to Case Del CNVR. Allows for much more informative CNV size (confidence) filtering. |
| AverageLengthCaseDel | The average length of CNV calls contributing to Case Del CNVR. Allows for much more informative CNV size (confidence) filtering. |
| CNVRangeCaseDel | Alternative larger CNV Range Case Del definition compared with minimal common overlap definition of CNVR. |
| AverageNumsnpsControlDel | The average numsnp of CNV calls contributing to Control Del CNVR. Allows for much more informative CNV size (confidence) filtering. |
| AverageLengthControlDel | The average length of CNV calls contributing to Control Del CNVR. Allows for much more informative CNV size (confidence) filtering. |
| CNVRangeControlDel | Alternative larger CNV Range Control Del definition compared with minimal common overlap definition of CNVR. |
| CNVType | Deletion or duplication CNVR significant in combined report. |
| Cytoband | Cytoband genomic landmark designations. |
| redFlagCount | Count of red flags raised in association review (see text; briefly: Segmental Duplications, Database of Genomic Variants, Centromere/Telomere, GC base content, Probe Count, Population Frequency, Peninsula, Inflated). |
| redFlagReasons | The failing metrics for association review and their values. |
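As a scripted alternative to the Excel steps given for the ID columns, the following Python sketch pairs each sample ID with its copy number and tallies states per CNVR. It assumes that the ID and state fields are both space-delimited and listed in the same order, which the table implies but does not state outright. The function names and sample IDs are hypothetical.

```python
from collections import Counter

# State code -> copy number, as given in the StatesCasesDel / StatesCasesDup descriptions.
STATE_TO_CN = {1: 0, 2: 1, 5: 3, 6: 4}


def pair_ids_with_copy_number(ids_field, states_field):
    """Pair each sample ID with the copy number of its CNV call.

    Assumes both fields are space-delimited and in matching order.
    """
    ids = ids_field.split()
    states = [int(s) for s in states_field.split()]
    if len(ids) != len(states):
        raise ValueError("ID and state lists differ in length")
    return [(sample_id, STATE_TO_CN[state]) for sample_id, state in zip(ids, states)]


def total_states(states_field):
    """Count samples per state code, analogous to the TotalStates columns."""
    counts = Counter(int(s) for s in states_field.split())
    return {state: counts.get(state, 0) for state in STATE_TO_CN}


# Hypothetical field values for one deletion CNVR.
pairs = pair_ids_with_copy_number("S1 S2 S3", "2 2 1")
tallies = total_states("2 2 1")
```

The paired list can be joined directly against a clinical data table keyed on sample ID.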
Quantitative PCR validation of CNVR associations
| Project | Validations attempted | Cases | Control subjects | Loci | Count Del | CN 0 | CN 1 | CN 2 | CN 3 | CN 4 | PCR failed | Validation failed | Success rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autism | 37 | 2195 | 2519 | 25 | 13 | 0 | 8 | 13 | 13 | 3 | 0 | 4 | 0.89 |
| Schizophrenia | 52 | 1735 | 3485 | 8 | 47 | 14 | 21 | 14 | 3 | 0 | 0 | 10 | 0.81 |
| Obesity | 104 | 2559 | 4075 | 35 | 36 | 0 | 31 | 45 | 27 | 0 | 10 | 5 | 0.95 |
| ADHD | 135 | 3506 | 13 327 | 12 | 57 | 0 | 35 | 56 | 37 | 7 | 7 | 11 | 0.92 |
| AutSczAdhd | 10 | 9 | 1 | 1 | 10 | 0 | 9 | 1 | 0 | 0 | 0 | 0 | 1 |
| OldYoung | 23 | 9392 | 7393 | 23 | 12 | 0 | 9 | 3 | 11 | 0 | 1 | 3 | 0.87 |
| Progressive supranuclear palsy | 48 | 1855 | 6701 | 24 | 38 | 0 | 32 | 9 | 7 | 0 | 4 | 9 | 0.81 |
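The success rates in this table appear consistent with 1 − (validation failed / validations attempted), with PCR failures left in the denominator. This formula is inferred from the tabulated numbers, not stated in the source. The short Python check below recomputes each rate and the pooled rate across projects, which comes out close to the 90% quoted in the abstract.

```python
# (project, validations attempted, validation failed, reported success rate),
# copied from the table above.
ROWS = [
    ("Autism", 37, 4, 0.89),
    ("Schizophrenia", 52, 10, 0.81),
    ("Obesity", 104, 5, 0.95),
    ("ADHD", 135, 11, 0.92),
    ("AutSczAdhd", 10, 0, 1.0),
    ("OldYoung", 23, 3, 0.87),
    ("Progressive supranuclear palsy", 48, 9, 0.81),
]


def success_rate(attempted, failed):
    """Inferred definition: fraction of attempted validations that did not fail."""
    return 1 - failed / attempted


# Per-project rates, to compare against the reported column.
computed = {name: success_rate(attempted, failed) for name, attempted, failed, _ in ROWS}

# Pooled rate over all projects.
total_attempted = sum(attempted for _, attempted, _, _ in ROWS)
total_failed = sum(failed for _, _, failed, _ in ROWS)
pooled = success_rate(total_attempted, total_failed)
```

Each recomputed rate agrees with the reported value to within rounding at two decimal places.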
Figure 4. Quantitative PCR validation of CNVR associations. Each sample with attempted validation for a specific CNV at a specific locus is shown. The validation data output is 0.5 for deletions, 1 for diploid and 1.5 for duplications, with standard error values from triplicate runs.
Figure 5. Sampling of different settings of distance (1 MB) and significance (±1 power of 10 P-value). Based on a data set of 785 cases versus 1110 control subjects and 561 308 probes. By this sampling procedure, we show these defaults are justifiable based on balancing CNVR extension to allow boundary variability while maintaining unique loci, except in rare instances. The x-axis shows the CNVR type and distance setting. The colour shows the P-value variance setting. The y-axis shows the count of CNVRs resulting from these settings.
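The two defaults sampled in Figure 5 can be read as merge criteria for neighbouring probes. The Python sketch below is one possible interpretation, not ParseCNV's implementation: it merges sorted probes into a region when the next probe lies within the distance setting and its P-value is within one power of ten of the previously merged probe. The function name, the input tuples and the choice to compare against the previous probe (not the region minimum) are all assumptions.

```python
import math


def merge_probes_into_cnvrs(probes, max_gap=1_000_000, max_log10_diff=1.0):
    """Greedily merge probes into regions.

    probes: list of (chrom, position, p_value), sorted by chrom then position.
    A probe joins the current region if it is on the same chromosome, within
    max_gap bases of the region end, and its P-value differs from the last
    merged probe by at most max_log10_diff powers of ten (assumed criteria).
    """
    regions = []
    for chrom, pos, p_value in probes:
        last = regions[-1] if regions else None
        if (
            last is not None
            and last["chrom"] == chrom
            and pos - last["end"] <= max_gap
            and abs(math.log10(p_value) - math.log10(last["last_p"])) <= max_log10_diff
        ):
            last["end"] = pos
            last["last_p"] = p_value
            last["min_p"] = min(last["min_p"], p_value)
        else:
            regions.append(
                {"chrom": chrom, "start": pos, "end": pos, "last_p": p_value, "min_p": p_value}
            )
    return regions


# Hypothetical probes: two merge, one is too far away, and two differ too much in P-value.
toy_probes = [
    ("chr1", 100, 1e-4),
    ("chr1", 5_000, 5e-4),
    ("chr1", 2_000_000, 1e-4),
    ("chr2", 100, 1e-4),
    ("chr2", 200, 1e-7),
]
toy_regions = merge_probes_into_cnvrs(toy_probes)
```

Tightening `max_gap` or `max_log10_diff` splits regions more readily, which is the trade-off the figure explores.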
Figure 6. Increased frequency of specific CNV state in cases. chr14:104241048–104348254: 4:0 (case:control) deletions, 2:11 duplications, 6:11 combined. ParseCNV provides case-enriched deletion significance for this region, P = 0.03 (duplication control-enriched, P = 0.09). As Plink uses only the combined count definition, P = 1 and the region is missed. chr11:133663955–133715739: 1:3 deletions, 5:0 duplications, 6:3 combined. ParseCNV provides case-enriched duplication significance for this region, P = 0.01 (deletion control-enriched, P = 0.65). As Plink uses only the combined count definition, P = 0.12 and the region is missed.
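The state-specific P-values in Figure 6 can be approximated with a two-tailed Fisher's exact test. The sketch below is a plain-Python implementation for illustration, not ParseCNV code. The caption does not give cohort sizes, so the example assumes the 785 cases and 1110 control subjects mentioned for Figure 5, with diploid counts obtained by subtracting all CNV carriers at the locus. Under those assumptions the case-enriched deletion (chr14) and duplication (chr11) tests give values close to the reported 0.03 and 0.01.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, total)


# Assumed cohort sizes (taken from the Figure 5 caption, not stated for Figure 6).
N_CASES, N_CONTROLS = 785, 1110

# chr14: 4:0 deletions, 6:11 combined carriers -> diploid = total minus all carriers.
p_chr14_del = fisher_exact_two_tailed(4, N_CASES - 6, 0, N_CONTROLS - 11)

# chr11: 5:0 duplications, 6:3 combined carriers.
p_chr11_dup = fisher_exact_two_tailed(5, N_CASES - 6, 0, N_CONTROLS - 3)
```

Because the combined counts (6:11 and 6:3) mix deletions and duplications enriched in opposite groups, a single combined test dilutes the signal, which is the point the figure makes.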