Platform comparison of detecting copy number variants with microarrays and whole-exome sequencing.

Joep de Ligt, Philip M Boone, Rolph Pfundt, Lisenka E L M Vissers, Nicole de Leeuw, Christine Shaw, Han G Brunner, James R Lupski, Joris A Veltman, Jayne Y Hehir-Kwa.

Abstract

Copy number variation (CNV) is a common source of genetic variation that has been implicated in many genomic disorders, Mendelian diseases, and common/complex traits. Genomic microarrays are often employed for CNV detection. More recently, whole-exome sequencing (WES) has enabled exome-wide detection of clinically relevant point mutations and small insertions and deletions. We evaluated (de Ligt et al. 2013) [1] the utility of short-read WES (SOLiD 5500xl) to detect clinically relevant CNVs in DNA from 10 patients with intellectual disability and compared these results to data from three independent high-resolution microarray platforms. Calls made by the different platforms and detection software are available at dbVar under nstd84.

Keywords:  Copy number variation; Microarray; Whole exome sequencing

Year: 2014; PMID: 26258046; PMCID: PMC4526866; DOI: 10.1016/j.gdata.2014.06.009

Source DB: PubMed; Journal: Genom Data; ISSN: 2213-5960


Direct link to deposited data

http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd84/ (CNV calls from all platforms/programs)
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46060 (raw 250K data)

Experimental design, materials and methods

Sample selection

Ten samples were selected that had previously been diagnostically reported as containing at least one clinically relevant, rare de novo CNV associated with intellectual disability (ID), detected by routine microarray-based screening within the Department of Human Genetics, Radboud University Medical Centre, Nijmegen [1]. These CNVs were chosen to represent the wide range of clinically relevant CNVs detected by microarray-based analysis in our Genome Diagnostics division. The selected CNVs (1) contained at least one coding region, (2) were validated as de novo using the same microarray platform on parental DNAs, (3) occurred across a variety of chromosomes, (4) ranged in copy number state from zero to three, and (5) ranged in genomic size from 15 kb to 24 Mb (Table 1).
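
Table 1

Overview of the detection of 12 clinically relevant de novo CNVs. Positions, size, copy number state and number of genes are from the discovery microarray. The last column holds the marks from the WES read depth algorithms (CONTRA, cn.MOPS, ExomeDepth, CoNIFER) in the order in which they survive in this copy; their assignment to the individual algorithms could not be recovered.

| Patient | Chromosome | Estimated start position (kb) | Estimated end position (kb) | CNV size (kb) | Copy number state | Nr. genes | WES detection marks |
|---|---|---|---|---|---|---|---|
| 1 | chr10 | 89,642.6 | 89,657.5 | 14.9 | 1 | 1 (a) | |
| 2 | chr19 | 33,371.1 | 33,394.2 | 23.0 | 0 | 1 | VV |
| 3 | chr8 | 77,745.6 | 77,795.2 | 49.6 | 1 | 1 | –VV |
| 4 | chr17 | 1,203.6 | 1,516.5 | 312.9 | 3 | 8 | –VV |
| 5 | chr16 | 29,673.2 | 29,988.3 | 315.1 | 1 | 16 | VV |
| 6 | chr15 | 43,759.8 | 44,862.9 | 1103.2 | 1 | 24 | V |
| 7 | chr2 | 233,166.3 | 233,886.7 | 720.5 | 3 | 16 | VV |
| 8 | chrX | 6495.3 | 7951.7 | 1456.4 | 0 | 5 | VV |
| 9 | chr2 | 239,952.7 | 241,373.1 | 1420.5 | 3 | 14 | VV |
| | chr2 | 241,442.7 | 243,001.9 | 1559.2 | 1 | 31 | VV |
| | chr15 | 60,489.7 | 62,906.5 | 24,603.6 | 3 | 210 | VV |
| 10 | chr20 | 77,771.0 | 102,374.6 | 2416.8 | 3 | 91 | VVV |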

CNVs as detected by the discovery microarray (hg19), genomic location, size, predicted copy number state and the number of genes in the region. a. A single exon deletion. Detection by the different WES approaches; –, CNV is not detected with a minimum overlap of 30%, and V, detected with a minimum overlap of 30%.

Eleven of these de novo CNVs were detected using an Affymetrix 250K NspI (Affymetrix, Santa Clara, CA) microarray and one, in patient 1, with the Affymetrix 2.7M microarray platform (Table 1).

Whole exome sequencing

Genomic DNA from these 10 samples was isolated from blood using the QIAamp DNA Mini Kit (Qiagen, Venlo, The Netherlands). Whole exome sequencing (WES) was performed at the University Medical Centre Nijmegen (UMCN) according to the manufacturer's guidelines. The samples were enriched using the Agilent SureSelect V2 protocol, and sequencing was performed on a SOLiD 5500xl system (Life Technologies) to a median read depth of 67 across targeted regions. Read correction and mapping were performed with Lifescope v1.3 (Life Technologies), using default settings. After mapping, reads with a mapping quality (MAPQ) value below 20 were discarded so that only reliably mapped reads were retained. The threshold of 20 was chosen from the X-chromosome read depth ratio between female and male samples: it was the lowest quality value that resulted in a 0.5 ratio.

The WES data were analyzed with four published CNV detection programs: (1) cn.MOPS v1.6.4 [5], (2) CONTRA v2.0.3 [4], (3) CoNIFER v0.2.0 [3], and (4) ExomeDepth v0.8.4 [6], with hg19-based RefSeq gene exon definitions as target regions. Overlapping exonic regions were merged, resulting in a list of unique genomic regions for further analysis. The tools were selected based on their availability and their ability to perform rare CNV detection on .bam files. CNV segments identified by WES underwent additional merging: calls of the same copy number were merged if they were within 5 Mb of each other and fewer than 30 informative data points lay between them. These values protect against overcalling while allowing for gene deserts, resulting in a more robust and uniform call set.
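The two merging steps described for the WES data (collapsing overlapping exon targets, then joining nearby calls of the same copy number) can be sketched as follows. This is a minimal illustration and not the authors' code: the `Call` class, the function names, and the choice to count an "informative data point" as a target lying entirely inside the gap between two calls are assumptions made for the example.

```python
from dataclasses import dataclass

MAX_GAP_BP = 5_000_000     # calls must lie within 5 Mb of each other
MAX_POINTS_BETWEEN = 30    # and have fewer than 30 data points between them


@dataclass
class Call:
    """Hypothetical container for one CNV call (coordinates in bp)."""
    chrom: str
    start: int
    end: int
    copy_number: int


def merge_targets(targets):
    """Merge overlapping (chrom, start, end) exon intervals into unique regions."""
    merged = []
    for chrom, start, end in sorted(targets):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def points_between(targets, chrom, left_end, right_start):
    """Count targets lying entirely in the gap between two calls (assumed definition)."""
    return sum(1 for c, s, e in targets
               if c == chrom and s >= left_end and e <= right_start)


def merge_calls(calls, targets):
    """Join adjacent calls with equal copy number that satisfy both distance rules."""
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start)):
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.chrom == call.chrom
                and prev.copy_number == call.copy_number
                and call.start - prev.end <= MAX_GAP_BP
                and points_between(targets, call.chrom, prev.end, call.start)
                < MAX_POINTS_BETWEEN):
            prev.end = max(prev.end, call.end)
        else:
            merged.append(Call(call.chrom, call.start, call.end, call.copy_number))
    return merged
```

Because only neighbouring calls are compared after sorting, a call with a different copy number lying between two otherwise mergeable calls keeps them apart in this sketch; whether the original pipeline behaved the same way is not stated in the text.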

Affymetrix 250K NspI & Affymetrix 2.7M microarray

Samples were processed in accordance with the manufacturer's instructions. Hybridization, washing and scanning were performed with the appropriate Affymetrix GeneChip products. Image processing for the 250K microarrays was performed with Affymetrix GeneChip Command Console software. Genotypes were called with Affymetrix Genotyping Console Software v2.1 using the BRLMM algorithm with a default calling threshold of 0.5 and a prior size of 10,000 bases. Samples were required to have a minimum Quality Control SNP call rate of 90%. CNV identification was performed using CNAG v2.0 with default HMM settings [2]. Image processing, CNV calling and merging for the 2.7M microarray were performed via Affymetrix Power Tools v1.14.3 with default settings and calling thresholds. Data viewing and analysis were performed with the Affymetrix Chromosome Analysis Suite (ChAS) software v2.0 and the UCSC Genome Browser (Human Feb. 2009, GRCh37/hg19, NCBI Build 37.3). Samples were required to meet the following quality thresholds: MAPD ≤ 0.2049, SNP-QC > 1.1 and WavinessSegCount ≤ 10.

Affymetrix CytoScanHD (2.6M) microarray

The Affymetrix CytoScanHD platform, with 2.6 million probes, was used as a benchmark for the high-resolution microarrays currently used in a diagnostic setting at the UMCN genetics department. Experiments were performed at the UMCN according to the manufacturer's specifications. CNVs were called with Affymetrix Power Tools v1.14.3 using default settings and calling thresholds. Data viewing and analysis were performed with ChAS v2.0 and the UCSC Genome Browser (Human Feb. 2009, GRCh37/hg19, NCBI Build 37.3). Samples were required to meet the following quality thresholds: MAPD < 0.25, SNPQC > 15 and Waviness-SD < 0.12.

NimbleGen custom design ExonArray (4.2 M) microarray

A custom comparative genomic hybridization (CGH) array with approximately 4.2 million oligonucleotide probes was manufactured for BCM by Roche NimbleGen (RNG); this array, referred to as the “ExonArray”, served as an orthogonal, independent experimental approach and a high-resolution benchmark for CNV detection. The custom array design included 2.15 million backbone probes and 1.85 million supplemental exonic probes targeted to the RNG “Big Exome” (the exome definition includes all exons from RNG Exome v2.0, Agilent SureSelect 50Mb, RefSeq, CCDS, and the BCM Human Genome Sequencing Center (HGSC) content, both VCRome and HGSCv1 designs). The aim of the design was to cover each exon (and flanking sequence, if necessary) with at least 8 probes. A series of test arrays was manufactured with 10X oversampling of exon-targeted probes (i.e. 80 per exon) and run with control DNA to empirically identify the 8 probes per exon with the best linear signal response to a range of DNA concentrations; these probes were included in the final ExonArray design. In the ExonArray, the ideal coverage of 8 or more probes was achieved for > 135,000 (~ 86%) of the targeted exons; 249 (0.16%) of the exons could not be targeted at all.

The 10 patient samples were analyzed with the ExonArray at BCM according to the manufacturer's specifications, using gender-matched control DNA (HapMap individuals NA10851 and NA15510). Segmentation was performed with RNG DEVA software using default settings (requiring a minimum of five probes per segment), except that the maximum number of segments allowed per chromosome was increased to 500 to account for the high resolution of the platform. CNVs were derived from the segments using a log2 ratio of ≤ −0.415 (the theoretical log2 ratio of a 50% mosaic heterozygous loss) for deletions or ≥ 0.322 (the theoretical log2 ratio of a 50% mosaic heterozygous gain) for duplications and higher-order gains. Segments were merged when (1) they were within 1 Mb of each other, (2) fewer than 100 probes were located between the events, and (3) their average log2 values differed by at most 0.5.
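The two log2 cut-offs follow from the average copy number in a sample where half of the cells carry a heterozygous event: log2(1.5/2) for a loss and log2(2.5/2) for a gain. The sketch below reproduces that arithmetic and applies the stated calling and merging rules; the function and parameter names are illustrative assumptions and do not correspond to the DEVA software.

```python
import math

# Cut-offs as stated in the text (rounded to three decimals).
LOSS_THRESHOLD = -0.415
GAIN_THRESHOLD = 0.322


def mosaic_log2_ratio(altered_copies, mosaic_fraction, normal_copies=2):
    """Expected log2 ratio when a fraction of cells carries the altered copy number."""
    mean_copies = ((1 - mosaic_fraction) * normal_copies
                   + mosaic_fraction * altered_copies)
    return math.log2(mean_copies / normal_copies)


def classify_segment(mean_log2):
    """Label a segment from its mean log2 ratio using the stated cut-offs."""
    if mean_log2 <= LOSS_THRESHOLD:
        return "deletion"
    if mean_log2 >= GAIN_THRESHOLD:
        return "gain"
    return "no call"


def should_merge(gap_bp, probes_between, mean_log2_a, mean_log2_b):
    """Apply the three merging conditions listed for ExonArray segments."""
    return (gap_bp <= 1_000_000
            and probes_between < 100
            and abs(mean_log2_a - mean_log2_b) <= 0.5)


if __name__ == "__main__":
    print(round(mosaic_log2_ratio(1, 0.5), 3))  # 50% mosaic heterozygous loss
    print(round(mosaic_log2_ratio(3, 0.5), 3))  # 50% mosaic heterozygous gain
```

A non-mosaic heterozygous deletion would give log2(1/2) = −1 under the same formula, so the chosen cut-offs are deliberately permissive.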

Evaluating the CNV detection power of WES

The false negative (FN) detection rate of WES was calculated by counting the CNV events detected by the high-resolution microarray platforms that were missed by WES. To prevent overestimation due to platform design (exon-targeted vs. whole-genome), we accounted for both the exome enrichment targets and the detection power of WES. We selected CNVs that were identified by at least two independent microarray platforms (minimum overlap of 30% of the CNV region, to allow for breakpoint inaccuracies due to the large differences in probe densities) and that encompassed at least three exons. For each CNV, the largest region detected by the CytoScanHD or the ExonArray was used for further analysis. After applying these selection criteria to the total set of 6074 CNVs identified by the different microarray experiments, the resulting consensus dataset contained 97 CNVs. Of these 97 consensus CNVs, 25 did not occur in the common CNV dataset and were considered rare CNVs. A consensus CNV was considered detected by WES only if a CNV was called in the same region and overlapped at least 30% of the consensus CNV region.
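The 30% overlap criterion and the resulting false negative rate can be illustrated with a short sketch. It assumes CNVs are represented as `(chrom, start, end)` tuples and that overlap is measured as a fraction of the consensus region, as the text describes; the function names are hypothetical, and the sketch ignores whether the WES call has the same direction (loss or gain) as the consensus CNV.

```python
def overlap_fraction(consensus, call):
    """Fraction of the consensus interval that is covered by the call."""
    ref_chrom, ref_start, ref_end = consensus
    chrom, start, end = call
    if chrom != ref_chrom or ref_end <= ref_start:
        return 0.0
    shared = min(ref_end, end) - max(ref_start, start)
    return max(0, shared) / (ref_end - ref_start)


def is_detected(consensus, wes_calls, min_overlap=0.30):
    """True if any WES call covers at least min_overlap of the consensus CNV."""
    return any(overlap_fraction(consensus, call) >= min_overlap
               for call in wes_calls)


def false_negative_rate(consensus_cnvs, wes_calls, min_overlap=0.30):
    """Share of consensus CNVs that no WES call detects."""
    if not consensus_cnvs:
        raise ValueError("consensus set is empty")
    missed = sum(1 for cnv in consensus_cnvs
                 if not is_detected(cnv, wes_calls, min_overlap))
    return missed / len(consensus_cnvs)
```

For example, with consensus CNVs `("chr1", 0, 1000)` and `("chr2", 0, 1000)` and a single WES call `("chr1", 600, 2000)`, the first consensus CNV is covered for 40% and counts as detected, the second is missed, and the function returns 0.5.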

Discussion

We present a high-quality data set of CNVs in 10 individuals. The use of different techniques and algorithms allowed a systematic assessment of detection power and accuracy. The high-resolution array datasets could be studied in more detail for their discrepancies or for the mechanisms of genomic instability leading to small (coding) events. The WES call set is useful for developers of calling software to identify the caveats and advantages of the different models evaluated in our study.

Conflict of interest

The authors declare no conflict of interest.

Specifications

Organism/cell line/tissue: Human blood
Sex: (not specified)
Sequencer or array type: Affymetrix 250K, Affymetrix CytoScanHD, NimbleGen custom ExonArray, SOLiD 5500xl
Data format: Analyzed
Experimental factors: Normal
Experimental features: Positive samples with rare de novo coding CNV
Consent: All patients gave their written informed consent before study entry.
Sample source location: NA

References

1. Detection of clinically relevant copy number variants with whole-exome sequencing.
Authors: Joep de Ligt; Philip M Boone; Rolph Pfundt; Lisenka E L M Vissers; Todd Richmond; Joel Geoghegan; Kathleen O'Moore; Nicole de Leeuw; Christine Shaw; Han G Brunner; James R Lupski; Joris A Veltman; Jayne Y Hehir-Kwa
Journal: Hum Mutat; Date: 2013-08-30

2. A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays.
Authors: Yasuhito Nannya; Masashi Sanada; Kumi Nakazaki; Noriko Hosoya; Lili Wang; Akira Hangaishi; Mineo Kurokawa; Shigeru Chiba; Dione K Bailey; Giulia C Kennedy; Seishi Ogawa
Journal: Cancer Res; Date: 2005-07-15

3. Copy number variation detection and genotyping from exome sequence data.
Authors: Niklas Krumm; Peter H Sudmant; Arthur Ko; Brian J O'Roak; Maika Malig; Bradley P Coe; Aaron R Quinlan; Deborah A Nickerson; Evan E Eichler
Journal: Genome Res; Date: 2012-05-14

4. CONTRA: copy number analysis for targeted resequencing.
Authors: Jason Li; Richard Lupat; Kaushalya C Amarasinghe; Ella R Thompson; Maria A Doyle; Georgina L Ryland; Richard W Tothill; Saman K Halgamuge; Ian G Campbell; Kylie L Gorringe
Journal: Bioinformatics; Date: 2012-04-02

5. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.
Authors: Günter Klambauer; Karin Schwarzbauer; Andreas Mayr; Djork-Arné Clevert; Andreas Mitterecker; Ulrich Bodenhofer; Sepp Hochreiter
Journal: Nucleic Acids Res; Date: 2012-02-01

6. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling.
Authors: Vincent Plagnol; James Curtis; Michael Epstein; Kin Y Mok; Emma Stebbings; Sofia Grigoriadou; Nicholas W Wood; Sophie Hambleton; Siobhan O Burns; Adrian J Thrasher; Dinakantha Kumararatne; Rainer Doffinger; Sergey Nejentsev
Journal: Bioinformatics; Date: 2012-08-31

