Michael Wittig, Ingo Helbig, Stefan Schreiber, Andre Franke.
Abstract
MOTIVATION: Copy number variation (CNV), a major contributor to human genetic variation, comprises genomic deletions and insertions of ≥1 kb. Yet the identification of CNVs from microarray data is still hampered by high false-negative and false-positive prediction rates, owing to the noisy nature of the raw data. Here, we present CNVineta, an R package for rapid data mining and visualization of CNVs in large case-control datasets genotyped with single nucleotide polymorphism (SNP) oligonucleotide arrays. CNVineta is compatible with various established CNV prediction algorithms, can be used for genome-wide association analysis of rare and common CNVs, and enables rapid and serial display of log2 ratios of the raw data as well as B-allele frequencies for visual quality inspection. In summary, CNVineta aids in the interpretation of large-scale CNV datasets and in the prioritization of target regions for follow-up experiments.
Year: 2010 PMID: 20605930 PMCID: PMC2922892 DOI: 10.1093/bioinformatics/btq356
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Workflow and screenshots from CNVineta. (A) Workflow. Before the CNV screening is started, the SNP array data have to be processed with a third-party CNV prediction algorithm. Subsequent CNV association screening can be performed for rare and/or common CNVs. The functions automatically generate result tables and graphs for all regions that CNVineta identified as associated. The visual data mining can be performed in a stepwise fashion. (B–D) Plotting results for a known common deletion at the IRGM gene locus (McCarroll et al., 2008) in the Affymetrix® 6.0 HapMap dataset (International HapMap Consortium, 2003) comprising 180 samples (CEU and YRI). (B) Regional overview plot. From top to bottom: the predicted CNVs (deletions highlighted by red horizontal lines), the array probe sets within the region (SNP markers shown as black and non-polymorphic probe sets as blue vertical lines) and the annotated genes (purple arrows). (C) Raw data plots. For each sample, the raw data visualization includes the log R ratio (LRR, upper panel) and the B-allele frequency (BAF, lower panel). (D) Heat map. Heat maps of LRR data can be generated to give an impression of the particular CNV across the whole sample set and to identify potential false-positive and false-negative CNVs that should be subjected to further follow-up.
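As a rough illustration of the LRR values shown in the raw data plots and heat maps, the following Python sketch computes a log2 ratio of observed to expected probe intensity. It is not CNVineta code (CNVineta is an R package): the function names and the intensity values are hypothetical, and the example is idealized, since real array data are noisy and rarely reach the theoretical value.

```python
import math


def log_r_ratio(observed_intensity, expected_intensity):
    """Return log2(observed / expected) for a single probe.

    Both arguments are hypothetical, already-normalized total intensities.
    """
    if observed_intensity <= 0 or expected_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(observed_intensity / expected_intensity)


def mean_lrr(observed, expected):
    """Average the per-probe log2 ratios across a region."""
    values = [log_r_ratio(o, e) for o, e in zip(observed, expected)]
    return sum(values) / len(values)


# Hypothetical intensities for three probes in one region.
expected = [1000.0, 1200.0, 800.0]
two_copies = [1000.0, 1200.0, 800.0]   # matches the reference signal
one_copy = [500.0, 600.0, 400.0]       # idealized loss of one of two copies

print(mean_lrr(two_copies, expected))  # 0.0
print(mean_lrr(one_copy, expected))    # -1.0
```

In this idealized case a region with the expected copy number has a mean LRR of 0, and halving the signal gives log2(0.5) = -1. Deletions therefore appear as a downward shift of the LRR across neighbouring probes, which is what the per-sample plots and the heat map make visible.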