Literature DB >> 25161258

cnvOffSeq: detecting intergenic copy number variation using off-target exome sequencing data.

Abstract

MOTIVATION: Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffseq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions.
RESULTS: cnvOffSeq was benchmarked on whole-exome sequencing samples from the 1000 Genomes Project. In a set of 104 gold standard intergenic deletions, our method achieved a sensitivity of 57.5% and a specificity of 99.2%, while maintaining a low FDR of 5%. For gold standard deletions longer than 5 kb, cnvOffSeq achieves a sensitivity of 90.4% without increasing the FDR. cnvOffSeq outperforms both whole-genome and whole-exome CNV detection methods considerably and is shown to offer a substantial improvement over naïve local SVD.
AVAILABILITY AND IMPLEMENTATION: cnvOffSeq is available at http://sourceforge.net/p/cnvoffseq/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical Disease Gene Species

Mesh：

Substances：
DNA, Intergenic

Year: 2014 PMID： 25161258 PMCID： PMC4147927 DOI： 10.1093/bioinformatics/btu475

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 INTRODUCTION

The advent of whole-exome sequencing (WES) has led to a renaissance in the field of Mendelian genetics (Bamshad ; Hoischen ; Ng ,b). WES offers deep coverage for protein-coding regions at a lower cost than whole-genome sequencing (WGS) and has thus sparked renewed interest in elucidating rare diseases. Exome sequence analysis typically focuses on detecting previously unobserved (or very low-frequency) coding single-nucleotide polymorphisms (SNPs) and small frame-shift indels that are absent from a reference set. This approach is based on the prior hypothesis that such loss-of-function mutations are more likely to cause severe phenotypic effects commonly seen in rare diseases. A number of studies have also identified exonic copy number variation (CNV) as the underlying genetic basis for various Mendelian disorders (Lango Allen ; Rohrer ). Exome-based CNV detection is complicated by the presence of strong batch effects introduced by the enrichment process. Various approaches have been developed to detect, genotype and perform association testing on exonic CNVs in population exome datasets (Coin ; Krumm ; Sathirapongsasuti ). The majority of these approaches operate under the assumption that most of the variation in read depth (RD) is driven by systematic bias which swamps the real signal in global noise. Exome CNV methods attempt to mitigate such bias either through control-based normalization or by identifying and removing the strongest components of variation across the population. Most exome sequencing technologies rely on a hybridization step to enrich for DNA fragments arising from specific genomic targets. Hybridization is a sensitive but imperfect process that captures large amounts of off-target fragments along with the intended exonic regions. Although hybridization efficiency is highly variable and platform-specific, off-target sequence has been consistently shown to comprise as much as 50% of whole-exome datasets (Hedges ). Despite its potential to generate high-quality SNP genotypes in non-coding regions (Guo ), non-target reads are almost always treated as a contaminant and ignored by exome analysis pipelines. We postulate that off-target data are a rich, but overlooked, source of information that could be mined to detect intergenic CNV. The off-target sequence coverage from a WES experiment can range from 0.5× to 3×. We have previously shown (Bellos ) that it is possible to accurately identify and genotype CNVs longer than 1 kb from low-coverage WGS data, which itself can be as little as 2× coverage. Despite the similar levels of coverage, analysis of off-target exome sequence data is confounded by the highly uneven nature of off-target coverage. The mechanism behind off-target enrichment is highly complex and contingent on both sequence properties and stochastic processes. Different parts of the genome are subject to distinct biases, such that off-target RD behaves like a combination of whole-genome and whole-exome data and cannot be handled uniformly. To address the heterogeneity of off-target sequencing coverage, we have developed a dynamic RD normalization pipeline called cnvOffSeq. The pipeline is based on a modified version of singular value decomposition (SVD) that allows for flexible, region-specific noise reduction and signal enhancement. Global (or exome-wide) SVD has been established as a robust framework for on-target RD normalization and CNV calling using WES data. Here we propose a local adaptive SVD approach for detection and genotyping of intergenic CNVs based solely on off-target reads of WES experiments. In principle, introns could also be analysed using our framework, but we focused on intergenic regions to avoid contamination from exome targets.

2 METHODS

2.1 Local adaptive SVD

The enrichment step that is essential for most targeted sequencing technologies gives rise to biases that heavily affect the depth of coverage. The resulting read depth is highly variable across target regions, rendering CNV detection problematic. Global SVD can alleviate the effects of enrichment bias and has formed the basis of numerous WES normalization strategies for CNV identification (Fromer ; Krumm ). In this paradigm, the noise is assumed to be non-random and highly correlated across samples, as the confounders (such as capture specificity and capture probe design) are shared among them. Therefore, by exome-wide application of SVD and elimination of the strongest singular components, CNV methods can effectively de-noise RD and achieve accurate segmentation. Contrary to on-target exome data, off-target reads are unintentional by-products of the enrichment technologies and are thus subject to a mixture of sequencing biases that are highly variable and dependent on genomic context. Due to the repetitive nature of the human genome, some non-coding regions may be enriched if they have a certain degree of homology with exonic sequence. Such off-target regions will share properties with on-target data and exhibit a similar pattern of highly correlated noise. Hybridization-based whole-exome enrichment involves a washing step to remove uncaptured and poorly hybridized fragments. The efficiency of this washing step varies according to the desired level of hybridization stringency and can therefore also introduce varying amounts of off-target sequence into the results. In this case, the off-target noise is expected to be random and decorrelated among samples due to the stochastic nature of its origin. Consequently, off-target data are highly diverse and not amenable to global normalization approaches that treat noise as either entirely random or entirely correlated. To that end, we introduce a localized variant of the SVD, which segments the non-coding genome according to the observed RD noise pattern (Fig. 1). To maintain contiguity and facilitate CNV calling, we retain the coding regions in our analysis but substitute their RD with the genome-wide off-target coverage.

Fig. 1.

Flowchart describing the local adaptive SVD normalization process. (a) Iterative segmentation process. (b) Differential normalization of each segment according to the observed RD pattern

Flowchart describing the local adaptive SVD normalization process. (a) Iterative segmentation process. (b) Differential normalization of each segment according to the observed RD pattern Our normalization framework divides the non-coding genome into regions with distinct RD noise profiles through an iterative application of SVD to genomic windows of varying size: The columns of matrix U represent the left-singular vectors, the rows of V* represent the right-singular vectors and Σ is a diagonal matrix that contains the singular values in decreasing order. Each normalization window shrinks repeatedly by 100 bp until it achieves a local maximum for the largest singular value, σ1. When such a maximum has been detected, we have identified the most ‘singular’ or degenerate region and the window shifts forward by the size of the shrunken window and repeats the process until the whole genome is covered. The resulting segmentation provides the means for differentiating between regions of correlated and random noise. This can be achieved by examining the relative contribution of each region’s maximum singular value as defined by The larger the contribution the more likely the region resembles an exonic target with a highly consistent noise pattern across samples. The lower the contribution, the more randomly distributed the noise will tend to be across samples. We applied heuristic thresholds of 30 and 70% for the RCσ1 to define three region classes: If RCσ1<30%, then the noise appears to be random and highly represented in the lower singular components. Thus, we normalize these regions by keeping the first singular component and eliminating the rest. If 30% ≤ RCσ1 ≤ 70%, the noise remains mostly random but there appear to be some signs of some systematic bias. Thus, we eliminate all but the first two singular components. If RCσ1>70%, then the RD is dominated by systematic bias and is therefore normalized by removing the first singular component following the paradigm of WES normalization. The local RD matrices are then reconstructed using either the truncated or the modified Σ matrix to obtain the filtered signal that can now be used for CNV detection.

2.2 Data modelling and CNV calling

Following normalization, we use the hidden Markov model (HMM) framework described in cnvHiTSeq (Bellos ) to perform CNV segmentation and genotyping. Like its predecessor, cnvOffseq models the RD at the population level to achieve optimal results. The observed continuous RD is considered to be generated by the hidden underlying discrete copy number states. The emission distribution of RD is modelled using the negative binomial distribution to account for its documented overdispersed nature (Bentley ). The transition probabilities of our HMM are determined by the combination of a global transition rate matrix and a local transition rate, to capture the overall transition between copy number states across the region as well as position-specific changes. Despite the substantial de-noising achieved through our local adaptive SVD, the very low off-target coverage combined with the high RD variability informs a more stringent prior on the transition rate than previously used for WGS. As previously described, the parameters of the HMM are estimated using a generalized expectation-maximization algorithm and the most likely CNV segmentation for the trained model is obtained by the Viterbi algorithm.

2.3 Samples and datasets

Our normalization method was developed and evaluated using whole-exome sequencing samples from the 1000 Genomes Project. We only considered samples for which the 1000 Genomes consortium has also generated genome-wide CNV calls, based on the low-coverage and trio phase datasets. Furthermore, we excluded samples that exhibited off-target coverage <1×. The aforementioned criteria were fulfilled by 50 samples spanning 7 populations (Supplementary Table S1) that were sequenced either by the Broad Institute (BI) or Washington University, Genome Sequencing Center (WUGSC). For target enrichment, BI used the Agilent SureSelect All Exon assay, while WUGSC also used the NimbleGen SeqCap EZ Exome assay. The samples were sequenced either on Illumina Genome Analyzer II or HiSeq 2000 using a paired-end protocol, with the insert size varying between 100 and 400 bp across samples. Sequencing reads were aligned to the human reference genome (assembly NCBI37) using BWA (Li and Durbin, 2009). Because we are interested only in off-target data, we excluded regions that were reported as captured by either enrichment assay along with the surrounding 5 kb to avoid possible contamination. In our 50 samples, the resulting on-target coverage ranges from 49× to 248× with a mean of 96×, while the off-target coverage ranges from 1.07× to 2.66× with a mean of 1.97×. Our off-target CNV calls were compared against the 1000 Genomes gold standard set of genotyped deletions that were created by combining the predictions of five computational methods to maximize confidence.

2.4 cnvOffSeq implementation

cnvOffSeq is implemented as a collection of JAR-packaged Java tools and UNIX shell scripts. The main input of the algorithm is BAM alignment files, which are then pre-processed using SAMtools (Li ). cnvOffSeq also requires the coordinates of targeted coding regions (to be excluded) in BED format. Such files are typically provided by capture assay vendors. cnvOffSeq produces normalized RD files in text and binary format that can be disentangled from our CNV calling pipeline and used by third party segmentation algorithms. When used in conjunction with our HMM framework, cnvOffSeq generates CNV calls in text format and optional segmentation plots. The sampling density of RD is a user-specified parameter that determines the CNV breakpoint resolution and the computational requirements of the algorithm. At the default high-resolution setting of 100 bp, the normalization of chromosome 6 in 50 samples required 4GB of memory and one hour of processing time. The software is freely available from http://sourceforge.net/p/cnvoffseq.

3 RESULTS

cnvOffSeq’s performance was benchmarked against chromosome 6 gold standard deletions for 50 WES samples. We excluded CNVs located less than 5 kb from the nearest gene target as well as CNVs overlapping regions that were blacklisted by the ENCODE project for anomalous RD patterns. We also eliminated gold standard deletions for which none of the 50 samples passed quality control. Our CNV detection resolution is limited by the very low off-target coverage, so we only considered events larger than 500 bp. The final gold standard dataset consisted of 104 deleted regions harbouring 497 confident calls across all samples (Supplementary Table S2). These deletions range from 606 bp to 69 281 bp with a median length of 2375 bp.

3.1 Normalization

First we sampled our RD data every 100 bp and calculated the off-target coverage for each sample. Then we applied the local adaptive SVD with a starting window size of 50 kb and a minimum window of 2.5 kb. Our dynamic normalization framework divides chromosome 6 into 5354 segments with an average length of 32.5 kb. 582 of these segments, amounting to almost 1 Mb of sequence, exhibit highly correlated RD patterns across samples. These segments resemble on-target data and are dominated by systematic bias, which can be mitigated by removing the first singular component (Fig. 2b). The remaining segments are affected by varying degrees of random noise and would suffer from loss of actual signal if normalized the same way (Fig. 2b). Instead, such segments are de-noised by eliminating low-order singular components (Fig. 2c). Consequently, global SVD methods developed for exome sequencing tend to over-correct off-target RD (Fig. 2b), while low-order component filtering tends to under-correct the most outlying observations (Fig. 2c). Depending on the local RD profile our adaptive SVD algorithm applies either first component filtering or low-order component filtering, thus combining the best of both worlds (Fig. 2d).

Fig. 2.

Comparison of normalization techniques for two samples on chr6:48 911 000–48 959 000. The red dashed lines denote the breakpoints of a gold standard deletion that is present only in sample NA06989. (a) Raw RD data. The noisy nature of unnormalized RD makes the deletion difficult to detect in NA06989. The two regions denoted by the yellow dashed lines, show highly correlated RD profiles that most likely correspond to systematic bias. (b) Removing the first singular component mitigates the systematic bias, but also suppresses the signal of the true deletion. (c) Filtering low-order singular components de-noises the signal and enhances the true deletion but leaves systematic bias unaffected. (d) Our local adaptive SVD algorithm achieves the best results by combining the two approaches in one cohesive framework. The yellow shaded regions were normalized by filtering the first singular component (as in b) while the red shaded region was normalized by filtering low-order components (as in c)

3.2 Benchmark

Because off-target regions comprise the majority of the genome, we first set out to compare cnvOffSeq with CNV methods designed for WGS. WGS approaches largely rely on accurate depth of coverage estimations to obtain a baseline for comparison. This is problematic in our case, as the on-target coverage is at least an order of magnitude higher than the off-target. Thus, the genome-wide coverage calculation is confounded by the on-target read depth, leading to coverage estimates that are too high for off-target regions and too low for on-target regions. One plausible solution is to exclude exome target regions from the calculations, but this is not trivial for most existing WGS methods. In fact, when we applied CNVnator (Abyzov ) to our whole-exome dataset, the results reflected the skewed coverage estimation, as most of the genome was deemed either deleted or duplicated in every sample. Therefore, to ensure an accurate and fair comparison we modified our previously described WGS framework, cnvHiTSeq, to accommodate discontiguous RD datasets, such as those represented by whole-exome off-target analyses. cnvHiTSeq utilizes LOESS smoothing and GC/alignability correction to mitigate sequencing biases, but relies on the same HMM framework as cnvOffSeq. Thus, by comparing cnvHiTSeq with cnvOffSeq we provide a benchmark of ‘naïve’ smoothing versus local adaptive SVD. Although cnvHiTSeq detects only 26 gold standard deletions fewer than cnvOffSeq (55.0% versus 57.5% sensitivity), it suffers from much higher rates of false-positive calls (82.7% versus 5.0% false discovery rate), resulting in considerably lower accuracy (Table 1). This demonstrates the advantage of local adaptive SVD in minimizing aberrant CNV calls while enhancing the true RD signal (Fig. 3).

Table 1.

cnvOffSeq performance comparison

Metric	cnvOffSeq (%)	cnvHiTSeq (%)	CoNIFER (%)	Local Static SVD
Sensitivity	57.5	55.0	7.6	49.4
Specificity	99.2	79.1	99.9	97.4
PPV	95.0	17.3	96.0	82.2
NPV	89.8	95.7	81.1	88.7
FPR	0.8	20.9	0.1	2.6
FDR	5.0	82.7	4.0	17.8
Accuracy	90.4	77.4	81.3	88.0

Note: PPV, positive predictive value; NPV, negative predictive value; FPR, false positive rate; FDR, false discovery rate.

Fig. 3.

Normalization results for seven gold standard regions that account for 30% of the total deletion calls. The top panels in each plot represent LOESS smoothed RD. The bottom panels represent RD that is normalized using local adaptive SVD. Each colour corresponds to a different sample while the red dashed lines denote the breakpoints of the deletions as determined by the 1000 Genomes Project cnvOffSeq performance comparison Note: PPV, positive predictive value; NPV, negative predictive value; FPR, false positive rate; FDR, false discovery rate. Next, we examined the performance of CoNIFER (Krumm ), as a representative of the global SVD approaches for whole-exome CNV detection. Like all exome CNV methods, CoNIFER requires an explicit list of target coordinates to function. In order to expand CoNIFER’s functionality to off-target data, we created pseudo-targets comprising the 104 gold standard regions (extended by 50 kb on either side). These pseudo-targets were then broken down into pseudo-probes of 100 bp and 1000 bp to allow for higher resolution. The best results were obtained by using 1000 bp long pseudo-probes and removing only the first singular component. However, CoNIFER’s performance falls far short of cnvOffSeq’s in terms of sensitivity. CoNIFER detects only 12 of the gold standard calls (corresponding to a sensitivity of 7.6%), most of which were genotyped as homozygous deletions by the 1000 Genomes Project (Table 1). This highlights the highly stringent nature of global SVD normalization, which renders it unsuitable for off-target CNV detection. cnvOffSeq uses low-order singular component filtering to de-noise a large portion of the off-target RD. To demonstrate why low-order filtering alone is not sufficient for optimal results, we also developed a variation of our normalization algorithm called local static SVD. Like its adaptive counterpart, static SVD segments the genome using patterns of RD, but then applies low-order filtering to every segment. This approach achieves a sensitivity of 49.4%, which is comparable with cnvOffSeq. As expected, however, the non-adaptive normalization does not cope well with segments dominated by systematic bias and is therefore more prone to false positives (Fig. 2b). Thus static SVD exhibits an elevated FDR of 17.8% compared to 5% for adaptive SVD (Table 1). We also investigated the genotyping performance of cnvOffSeq as compared with cnvHiTSeq and local static SVD. CoNIFER was excluded from this comparison, as it doesn’t provide absolute copy numbers. All methods in this analysis generate posterior probabilities for each CNV call, which were used to exclude results of low confidence (designated as missing). cnvOffSeq achieved an overall genotyping accuracy of 96.3% versus 73.0% for cnvHiTSeq and 90.8% for local static SVD (Table 2). Furthermore cnvOffSeq and local static SVD exhibited comparable missing rates, while cnvHiTSeq’s was almost twice as high signifying a higher proportion of low quality genotype calls.

Table 2.

Genotyping accuracy across methods

Method	Genotyping accuracy (%)	Missing rate (%)
cnvOffSeq	96.3	10.4
cnvHiTSeq	73.0	19.0
Local Static SVD	90.8	8.3

Genotyping accuracy across methods Finally, we set out to explore how our off-target results compare to those obtained using whole-genome data. To that end, we applied cnvHiTSeq and CNVnator to low-coverage WGS data (3×–16×) for the same 50 samples. The higher and more even coverage of the WGS dataset leads to higher overall sensitivity (75.5% for cnvHiTSeq and 62.6% for CNVnator versus 57.5% for cnvOffSeq). The improvement was more pronounced in CNVs smaller than 3 kb for which cnvHiTSeq achieves a sensitivity of 41.8% compared to 19.5% for cnvOffSeq. The FDR does not appear to benefit and remains comparable between WGS and off-target results (5.5% for cnvHiTSeq and 11.1% for CNVnator versus 5% for cnvOffSeq).

3.3 Performance versus CNV length

The inherently irregular nature of off-target RD may pose limits in the attainable resolution for CNV detection. Therefore, we investigated cnvOffSeq’s performance as a function of the gold standard CNV length, by only considering events larger than a certain (variable) threshold. The results indicate that both the sensitivity and the accuracy improve significantly for longer CNVs (Table 3).

Table 3.

cnvOffSeq performance across CNV lengths

CNV length threshold (bp)	Sensitivity (%)	Specificity (%)	FDR (%)	Accuracy (%)	Number of CNV loci
>500	57.5	99.2	5.0	90.4	104
>2000	70.2	99.1	5.0	93.2	89
>3000	73.5	98.9	5.0	93.5	67
>4000	83.4	98.6	5.4	95.3	58
>5000	90.4	98.4	5.8	96.7	49
>6000	89.9	98.3	6.8	96.6	39
>7000	88.5	99.0	4.4	96.8	36
>8000	87.4	98.9	4.9	96.6	31
>9000	94.2	99.0	4.0	98.0	25
>10 000	93.9	98.9	4.2	97.8	22
>11 000	94.4	98.7	5.6	97.8	20
>13 000	95.7	98.4	5.7	97.8	16
>14 000	98.5	98.1	5.7	98.2	13
>15 000	100.0	98.2	4.3	98.7	11
>22 000	100.0	97.7	3.9	98.5	5
>25 000	100.0	97.3	9.5	97.8	2

cnvOffSeq performance across CNV lengths Specifically, the sensitivity exceeds 90% for deletions above 5 kb, while the specificity reaches 97%. False positives appear to be evenly distributed across CNV lengths, as the FDR remains consistent throughout. The performance deteriorates for lengths higher than 25 kb, simply because there are only two deletions in the gold standard that exceed this threshold. Thus, we conclude that cnvOffSeq is especially well suited for longer deletions (>5 kb) which are detected with very high sensitivity and consistently low false discovery rate.

4 DISCUSSION

Exome sequencing is a relatively nascent technology that has nevertheless achieved near ubiquity, particularly in the field of Mendelian genetics. Due to its cost- and time-effectiveness, exome sequencing has largely superseded both WGS and more traditional linkage studies for investigating rare genetic disorders. Furthermore, exome sequencing has proven to be a powerful diagnostic tool that has revolutionized clinical genetics. By elucidating disorders of unknown genetic aetiology, exome sequencing can inform custom treatment options, thus ushering in a new era of truly personalized medicine. Off-target data are an integral, but overlooked, component of exome sequencing. Regardless of the underlying technology, enrichment strategies attempt to strike a balance between on-target stringency and coverage maximization. The fortunate side effect of this delicate equilibrium is off-target reads, which are often regarded as wasteful and dispensable. As exome sequencing is maturing, it will continue to generate increasing amounts of off-target data that represent an unexplored treasure of genomic information. cnvOffSeq is the first method to tap into this valuable resource for the purpose of CNV detection. Off-target enrichment arises through various processes leading to highly heterogeneous off-target coverage. Off-target read depth is affected by a mixture of deterministic biases and random noise that require individualized treatment, but are difficult to determine a priori. As a result off-target data pose a significant challenge for CNV detection that prompted the development of a tailored data-driven normalization approach, implemented in cnvOffSeq. We have demonstrated that our local adaptive SVD approach provides a flexible and robust framework for off-target read depth normalization that can enhance true signal while eliminating capture artifacts. cnvOffSeq clearly outperforms CNV detection methods that were designed for WGS. Furthermore, it was shown to improve upon both high-order and low-order singular component filtering, thus amounting to more than the sum of its parts. cnvOffSeq was tested on data from both Agilent and Nimblegen capture assays but remains platform-agnostic. Like all SVD-based techniques, cnvOffSeq’s ability to mitigate systematic bias improves with larger sample sizes. However, even in the absence of sufficiently large datasets, cnvOffSeq retains some of its de-noising capacity by essentially reverting to low-order singular component filtering. Finally, the modular nature of our pipeline allows for the RD normalization to be separated from the CNV calling and incorporated into future applications as a pre-processing step.

5 CONCLUSION

cnvOffSeq offers a new perspective on exome sequencing analysis and provides the tools for repurposing previously discarded surplus into meaningful data. Given the abundance of exome datasets, cnvOffSeq constitutes a powerful, novel method for investigating intergenic CNV and exploring its contribution to disease and phenotypic variation. Funding: Research leading to these results has received funding from the European Union’s seventh Framework program (EC-GA 279185). This work was also supported by the Australian Research Council (FT110100972 to L.J.M.C). Conflict of interest: none declared.

17 in total

1. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome.

Authors: Alexander Hoischen; Bregje W M van Bon; Christian Gilissen; Peer Arts; Bart van Lier; Marloes Steehouwer; Petra de Vries; Rick de Reuver; Nienke Wieskamp; Geert Mortier; Koen Devriendt; Marta Z Amorim; Nicole Revencu; Alexa Kidd; Mafalda Barbosa; Anne Turner; Janine Smith; Christina Oley; Alex Henderson; Ian M Hayes; Elizabeth M Thompson; Han G Brunner; Bert B A de Vries; Joris A Veltman
Journal: Nat Genet Date: 2010-05-02 Impact factor: 38.330

2. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Authors: Alexej Abyzov; Alexander E Urban; Michael Snyder; Mark Gerstein
Journal: Genome Res Date: 2011-02-07 Impact factor: 9.043

3. Exome sequencing-based copy-number variation and loss of heterozygosity detection: ExomeCNV.

Authors: Jarupon Fah Sathirapongsasuti; Hane Lee; Basil A J Horst; Georg Brunner; Alistair J Cochran; Scott Binder; John Quackenbush; Stanley F Nelson
Journal: Bioinformatics Date: 2011-08-09 Impact factor: 6.937

Review 4. Exome sequencing as a tool for Mendelian disease gene discovery.

Authors: Michael J Bamshad; Sarah B Ng; Abigail W Bigham; Holly K Tabor; Mary J Emond; Deborah A Nickerson; Jay Shendure
Journal: Nat Rev Genet Date: 2011-09-27 Impact factor: 53.242

5. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

6. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome.

Authors: Sarah B Ng; Abigail W Bigham; Kati J Buckingham; Mark C Hannibal; Margaret J McMillin; Heidi I Gildersleeve; Anita E Beck; Holly K Tabor; Gregory M Cooper; Heather C Mefford; Choli Lee; Emily H Turner; Joshua D Smith; Mark J Rieder; Koh-Ichiro Yoshiura; Naomichi Matsumoto; Tohru Ohta; Norio Niikawa; Deborah A Nickerson; Michael J Bamshad; Jay Shendure
Journal: Nat Genet Date: 2010-08-15 Impact factor: 38.330

7. Comparison of three targeted enrichment strategies on the SOLiD sequencing platform.

Authors: Dale J Hedges; Toumy Guettouche; Shan Yang; Guney Bademci; Ashley Diaz; Ashley Andersen; William F Hulme; Sara Linker; Arpit Mehta; Yvonne J K Edwards; Gary W Beecham; Eden R Martin; Margaret A Pericak-Vance; Stephan Zuchner; Jeffery M Vance; John R Gilbert
Journal: PLoS One Date: 2011-04-29 Impact factor: 3.240

8. Exome sequencing identifies the cause of a mendelian disorder.

Authors: Sarah B Ng; Kati J Buckingham; Choli Lee; Abigail W Bigham; Holly K Tabor; Karin M Dent; Chad D Huff; Paul T Shannon; Ethylin Wang Jabs; Deborah A Nickerson; Jay Shendure; Michael J Bamshad
Journal: Nat Genet Date: 2009-11-13 Impact factor: 38.330

9. Accurate whole human genome sequencing using reversible terminator chemistry.

Authors: David R Bentley; Shankar Balasubramanian; Harold P Swerdlow; Geoffrey P Smith; John Milton; Clive G Brown; Kevin P Hall; Dirk J Evers; Colin L Barnes; Helen R Bignell; Jonathan M Boutell; Jason Bryant; Richard J Carter; R Keira Cheetham; Anthony J Cox; Darren J Ellis; Michael R Flatbush; Niall A Gormley; Sean J Humphray; Leslie J Irving; Mirian S Karbelashvili; Scott M Kirk; Heng Li; Xiaohai Liu; Klaus S Maisinger; Lisa J Murray; Bojan Obradovic; Tobias Ost; Michael L Parkinson; Mark R Pratt; Isabelle M J Rasolonjatovo; Mark T Reed; Roberto Rigatti; Chiara Rodighiero; Mark T Ross; Andrea Sabot; Subramanian V Sankar; Aylwyn Scally; Gary P Schroth; Mark E Smith; Vincent P Smith; Anastassia Spiridou; Peta E Torrance; Svilen S Tzonev; Eric H Vermaas; Klaudia Walter; Xiaolin Wu; Lu Zhang; Mohammed D Alam; Carole Anastasi; Ify C Aniebo; David M D Bailey; Iain R Bancarz; Saibal Banerjee; Selena G Barbour; Primo A Baybayan; Vincent A Benoit; Kevin F Benson; Claire Bevis; Phillip J Black; Asha Boodhun; Joe S Brennan; John A Bridgham; Rob C Brown; Andrew A Brown; Dale H Buermann; Abass A Bundu; James C Burrows; Nigel P Carter; Nestor Castillo; Maria Chiara E Catenazzi; Simon Chang; R Neil Cooley; Natasha R Crake; Olubunmi O Dada; Konstantinos D Diakoumakos; Belen Dominguez-Fernandez; David J Earnshaw; Ugonna C Egbujor; David W Elmore; Sergey S Etchin; Mark R Ewan; Milan Fedurco; Louise J Fraser; Karin V Fuentes Fajardo; W Scott Furey; David George; Kimberley J Gietzen; Colin P Goddard; George S Golda; Philip A Granieri; David E Green; David L Gustafson; Nancy F Hansen; Kevin Harnish; Christian D Haudenschild; Narinder I Heyer; Matthew M Hims; Johnny T Ho; Adrian M Horgan; Katya Hoschler; Steve Hurwitz; Denis V Ivanov; Maria Q Johnson; Terena James; T A Huw Jones; Gyoung-Dong Kang; Tzvetana H Kerelska; Alan D Kersey; Irina Khrebtukova; Alex P Kindwall; Zoya Kingsbury; Paula I Kokko-Gonzales; Anil Kumar; Marc A Laurent; Cynthia T Lawley; Sarah E Lee; Xavier Lee; Arnold K Liao; Jennifer A Loch; Mitch Lok; Shujun Luo; Radhika M Mammen; John W Martin; Patrick G McCauley; Paul McNitt; Parul Mehta; Keith W Moon; Joe W Mullens; Taksina Newington; Zemin Ning; Bee Ling Ng; Sonia M Novo; Michael J O'Neill; Mark A Osborne; Andrew Osnowski; Omead Ostadan; Lambros L Paraschos; Lea Pickering; Andrew C Pike; Alger C Pike; D Chris Pinkard; Daniel P Pliskin; Joe Podhasky; Victor J Quijano; Come Raczy; Vicki H Rae; Stephen R Rawlings; Ana Chiva Rodriguez; Phyllida M Roe; John Rogers; Maria C Rogert Bacigalupo; Nikolai Romanov; Anthony Romieu; Rithy K Roth; Natalie J Rourke; Silke T Ruediger; Eli Rusman; Raquel M Sanches-Kuiper; Martin R Schenker; Josefina M Seoane; Richard J Shaw; Mitch K Shiver; Steven W Short; Ning L Sizto; Johannes P Sluis; Melanie A Smith; Jean Ernest Sohna Sohna; Eric J Spence; Kim Stevens; Neil Sutton; Lukasz Szajkowski; Carolyn L Tregidgo; Gerardo Turcatti; Stephanie Vandevondele; Yuli Verhovsky; Selene M Virk; Suzanne Wakelin; Gregory C Walcott; Jingwen Wang; Graham J Worsley; Juying Yan; Ling Yau; Mike Zuerlein; Jane Rogers; James C Mullikin; Matthew E Hurles; Nick J McCooke; John S West; Frank L Oaks; Peter L Lundberg; David Klenerman; Richard Durbin; Anthony J Smith
Journal: Nature Date: 2008-11-06 Impact factor: 49.962

10. cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data.

Authors: Evangelos Bellos; Michael R Johnson; Lachlan J M Coin
Journal: Genome Biol Date: 2012-12-22 Impact factor: 13.583

5 in total

1. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing.

Authors: Eric Talevich; A Hunter Shain; Thomas Botton; Boris C Bastian
Journal: PLoS Comput Biol Date: 2016-04-21 Impact factor: 4.475

2. Discovery of targetable genetic alterations in advanced non-small cell lung cancer using a next-generation sequencing-based circulating tumor DNA assay.

Authors: Helei Hou; Xiaonan Yang; Jinping Zhang; Zhe Zhang; Xiaomei Xu; Xiaoping Zhang; Chuantao Zhang; Dong Liu; Weihua Yan; Na Zhou; Hongmei Zhu; Zhaoyang Qian; Zhuokun Li; Xiaochun Zhang
Journal: Sci Rep Date: 2017-11-06 Impact factor: 4.379