Literature DB >> 28630945

The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data.

Shazia Mahamdallie^1,2, Elise Ruark^1,2, Shawn Yost^1,2, Emma Ramsay^1,2, Imran Uddin^1,2, Harriett Wylie^1,2, Anna Elliott^1,2, Ann Strydom^1,2, Anthony Renwick¹, Sheila Seal^1,2, Nazneen Rahman^1,2,3.

Abstract

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly-available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European-Genome phenome Archive (EGA) under the accession number EGAS00001002428.

Entities: Chemical Disease Gene Species

Keywords: BRCA1; BRCA2; CNV calling; NGS; TruSight Cancer; cancer predisposition gene; next-generation sequencing

Year: 2017 PMID： 28630945 PMCID： PMC5473400 DOI： 10.12688/wellcomeopenres.11689.1

Source DB: PubMed Journal: Wellcome Open Res ISSN： 2398-502X

Introduction

The use of targeted next-generation sequencing (NGS) in clinical genomics has increased the capacity, throughput and affordability of gene testing [1– 3]. Use of NGS data in the clinical setting requires comprehensive validation of methods. Ideally, this should include evaluation of the NGS test performance in samples with pre-determined positive and negative results to provide information on sensitivity, specificity and false detection rate. Deletions and duplications of whole exons, termed ‘exon copy number variants’ or ‘exon CNVs’, are an important class of clinically relevant gene mutations [4]. Accurate exon CNV detection has proved difficult in targeted NGS data, particularly if only a single exon is affected [5]. This has led many research and clinical laboratories to either exclude exon CNV detection or to use separate methods for their detection, which can substantially increase the time and cost of tests. Datasets are available for base substitutions and small insertions and deletions [6, 7] and for copy number variants [6], but datasets with experimentally validated exon CNV data are not widely available. As a result, methods for detecting exon CNVs in NGS data are usually evaluated using simulated and/or in-house data. This hinders tool comparisons, transparency and reproducibility. We recently released DECoN ( www.icr.ac.uk/DECoN), a tool optimised to detect exon CNVs in targeted NGS panels in the clinical setting [8]. During our validation of DECoN performance, we utilised samples with orthogonally generated exon CNV data. This proved extremely valuable in our evaluations and we believe such data will also be highly useful to others. We have therefore put together the ICR96 exon CNV validation series, which we present here. This was undertaken as part of the Transforming Genetic Medicine Initiative (TGMI, www.thetgmi.org), a Wellcome funded initiative which is developing frameworks and resources to facilitate genetic medicine. The ICR96 exon CNV validation series includes data from 96 samples. Each sample has sequencing data generated using the TruSight Cancer Panel (TSCP) a gene-targeted NGS assay for analysis of cancer predisposition genes [8– 11] and Multiplex Ligation-dependent Probe Amplification (MLPA) data [12]. 66 samples contain at least one validated exon CNV, including 25 single exon CNVs and 30 samples have validated negative results for 26 genes. Of note, the series has excellent representation of the cancer predisposition genes most frequently tested in clinical practice with 46 exon CNVs in one of BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN. The ICR96 exon CNV validation series has been extremely helpful in our assessment of exon CNV detection tools, and the comprehensive orthogonal data allows evaluation of sensitivity, specificity and false detection rate. We believe the ICR96 exon CNV validation series could serve as a benchmarking set, particularly for the many clinical and research laboratories now undertaking cancer predisposition gene testing.

Materials and methods

The data included in this resource were generated from two types of studies. Firstly, through the BOCS, FACT and COG studies, which aimed to discover and characterise disease predisposition genes. All patients gave informed consent for use of their DNA in genetic research. The studies have been approved by the London Multicentre Research Ethics Committee (MREC/01/2/18, MREC/01/2/044, 05/MRE02/17 respectively). Secondly, the data included here was obtained through clinical testing by the TGLclinical laboratory, an ISO 15189 accredited genetic testing laboratory that we run. The consent given from patients tested through TGLclinical includes, as standard, consent for the use of samples for quality-control. It also provides the option of consenting to the use of samples/data in research; all patients whose data was included in the ICR96 series approved this option. We generated high-quality targeted NGS data for the ICR96 exon CNV validation series using the TruSight Cancer Panel (TSCP) v2 which targets exons from 100 cancer predisposition genes ( Supplementary File 1). We prepared targeted DNA libraries from 50 ng genomic DNA using the TSCP and TruSight Rapid Capture kit (Illumina). We followed the manufacturer’s protocol, with the exception of library enrichment pool complexity, which we performed in 48-plex. We sequenced a final 10 pM pooled library on a HiSeq 2500 platform set in Rapid-run mode following standard protocols: 96-plex pool per flow cell, HiSeq® Rapid SBS Kit v2, 101 bp paired-end dual index run, and onboard clustering using HiSeq® Rapid PE Cluster Kit v2. CASAVA v1.8.1 (Illumina) was used to demultiplex and create FASTQ files per sample from the raw base call files. All 96 samples also had independently generated exon CNV data available. Standard MLPA and/or MLPA by NGS was performed for one or more of the following 32 genes: APC, ATM, BAP1, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A, CHEK2, EPCAM (exon 9 only) , FH, MLH1, MSH2, MSH6, MUTYH, NBN, NF1, NSD1, PALB2, PMS2 (excluding exons 12-15), PTEN, RAD51C, RAD51D, RB1, SDHB, SMAD4, STK11, TP53 and WT1 ( Supplementary Table 1). The EZH2 exon 1-20 deletion was identified by comparative genomic hybridisation (CGH) array and was also confirmed by fluorescent in situ hybridisation (FISH). For simplicity, we included this one CGH result with the MLPA results. We provide genomic coordinates in both build 37 and build 38 for all results ( Supplementary Table 1). The genomic coordinates are the most 5’ and most 3’ coordinates of the exons involved in the exon CNV, as determined by MLPA, according to the specified transcript. Of note, these are not the actual breakpoints; neither MLPA nor targeted NGS data can provide breakpoint sequence information for exon CNVs. We provide the MLPA results for all exon CNVs using the following notation “Exon X deletion/duplication” for single exon CNVs and “Exon X-Y deletion/duplication” for exon CNVs involving more than one exon, where X specifies the number of the first exon involved in the exon CNV with respect to the transcript, Y specifies the number of the last exon involved in the exon CNV with respect to the transcript, and deletion or duplication is specified as appropriate. For all genes except BRCA1 the numbering is consecutive from the first non-coding exon in the transcript. For BRCA1 we use the conventional clinical numbering system which does not include exon 4.

Dataset

The ICR96 exon CNV validation series includes samples from 96 individuals. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes ( Supplementary Table 1). Two of the 66 individuals had an exon CNV in two different genes, such that the dataset includes a total of 68 exon CNVs. This includes 25 single exon CNVs, the most difficult type of exon CNV to detect. The dataset can be used to evaluate the performance of any tool that aims to detect exon CNVs in NGS data. It has particular utility in validating cancer predisposition gene exon CNV detection. The dataset has excellent representation of the cancer predisposition genes most frequently tested in clinical practice. BRCA1 and BRCA2 are particularly well represented, with 15 BRCA1 exon CNVs and 10 BRCA2 exon CNVs, of which 11 and four respectively, are single exon CNVs. The 25 BRCA1 and BRCA2 exon CNVs include 22 different mutations. We deliberately included, in the same pool, two separate samples with a BRCA1 exon 13 duplication. This small exon duplication is one of the most common BRCA1 mutations in the UK [13] and hence we wanted to cover the clinical scenario of having two different individuals with this mutation in the same sequencing run. To provide further representation of the cancer predisposition genes most frequently tested in clinical practice, the dataset includes 21 exon CNVs in MLH1, MSH2, MSH6, PMS2, EPCAM, PTEN or TP53. Between the two pools, we ensured there was no difference in the representation of exon CNVs in any particular gene or in the proportion of samples without an exon CNV, to minimise potential batch effects ( Table 1).

Table 1.

MLPA results for each pool in the ICR96 exon CNV validation series.

	Pool 1	Pool 2
Gene	Number of exon CNVs
ATM	1	1
BRCA1	7	8
BRCA2	5	5
CHEK2	2	3
EPCAM	0	1
EZH2	0	1
FH	1	0
MLH1	1	0
MSH2	4	4
MSH6	1	1
NF1	1	0
NSD1	3	3
PALB2	1	0
PMS2	3	2
PTEN	0	1
RAD51C	0	1
RB1	1	0
SDHB	1	1
TP53	1	2
WT1	0	1
Total	33	35
	Samples with no exon CNV
APC, ATM, BAP1, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDK4, CDKN2A, CHEK2, EPCAM, MLH1, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51C, RAD51D, SMAD4, STK11, TP53	15	15

Data availability

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2017 Mahamdallie S et al. We have deposited the FASTQ files for all 96 samples in the European Genome-phenome archive (EGA). The accession number is EGAS00001002428. Details of how to access the data is available at EGA or from www.icr.ac.uk/icr96. Researchers and authors that use the ICR96 exon CNV validation series should reference this paper and should include the following acknowledgement: “This study makes use of the ICR96 exon CNV validation series data generated by Professor Nazneen Rahman’s team at The Institute of Cancer Research, London as part of the TGMI”. This manuscript describes a dataset consisting of massive paralleled sequencing data (FASTQ files) from 96 human DNA samples (ICR96 samples) which are enriched in samples (66/96) with MLPA validated exon CNVs (especially single exon CNVs). The authors used this dataset to evaluate their recently released method DECoN, a CNV detection tool for targeted NGS panel data and provide these data for similar use to other laboratories. We evaluated the usability of this dataset with our recently published method panelcn.MOPS. Our data access was processed quickly without problems. Analysis of all 96 samples with panelcn.MOPS revealed that the dataset is less homogenous than the data used in our study comparing panelcn.MOPS to five different CNV detection tools. Compared to the dataset used in our study, more samples and regions of the ICR96 samples were classified as low-quality and low correlation of read counts between test and control samples was observed in many cases. Additional optimization of panelcn.MOPS to the provided dataset was required, showing that any method needs to be adapted to the data to be analyzed. Overall, the manuscript clearly describes a very useful dataset for the evaluation of CNV detection in targeted NGS data. Minor comments: Although it is mentioned briefly in Materials and Methods, the origin of the two different pools should be described again in the Dataset section. Since both, GRCh37 and GRCh38, coordinates are provided in Supplementary Table 1, it should be specified which coordinates are given in the BED file that is provided as Supplementary File 1. For easier use in method evaluations, it would be helpful to include the exon number used in Supplementary Table 1 also in Supplementary File 1 e.g. writing SDHB.E8.chr1.17345375.17345454 in column 4 instead of SDHB.chr1.17345375.17345454. We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. This paper describes a resource which is publicly available to laboratories wishing to validate protocols for using NGS data to detect exon CNVs in a clinical setting. The CNV validation series appears to be soundly designed and internally validated against a dataset generated by MLPA of normal, single exon and multi exon CNVs. The focus of the series is on the genes most commonly implicated in Mendelian cancer predisposition syndromes with data generated using the TruSight Cancer Panel assay. It is not entirely clear how valid this dataset would be for assessing CNV detection tools for other targeted gene panels or for data generated by other NGS chemistries and/or platforms. Nevertheless, this paper describes a useful quality and benchmarking resource for clinical laboratories offering testing for cancer predisposition genes, particularly using the TruSight Cancer Panel assay. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

14 in total

1. Detection of clinically relevant copy number variants with whole-exome sequencing.

Authors: Joep de Ligt; Philip M Boone; Rolph Pfundt; Lisenka E L M Vissers; Todd Richmond; Joel Geoghegan; Kathleen O'Moore; Nicole de Leeuw; Christine Shaw; Han G Brunner; James R Lupski; Joris A Veltman; Jayne Y Hehir-Kwa
Journal: Hum Mutat Date: 2013-08-30 Impact factor: 4.878

2. Detection of high frequency of mutations in a breast and/or ovarian cancer cohort: implications of embracing a multi-gene panel in molecular diagnosis in India.

Authors: Ashraf U Mannan; Jaya Singh; Ravikiran Lakshmikeshava; Nishita Thota; Suhasini Singh; T S Sowmya; Avshesh Mishra; Aditi Sinha; Shivani Deshwal; Megha R Soni; Anbukayalvizhi Chandrasekar; Bhargavi Ramesh; Bharat Ramamurthy; Shila Padhi; Payal Manek; Ravi Ramalingam; Suman Kapoor; Mithua Ghosh; Satish Sankaran; Arunabha Ghosh; Vamsi Veeramachaneni; Preveen Ramamoorthy; Ramesh Hariharan; Kalyanasundaram Subramanian
Journal: J Hum Genet Date: 2016-02-25 Impact factor: 3.172

3. Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Authors: Justin M Zook; David Catoe; Jennifer McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E Mason; Noah Alexander; Elizabeth Henaff; Alexa B R McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric Schadt; Robert Sebra; Ali Bashir; Rebecca M Truty; Christopher C Chang; Natali Gulbahce; Keyan Zhao; Srinka Ghosh; Fiona Hyland; Yutao Fu; Mark Chaisson; Chunlin Xiao; Jonathan Trow; Stephen T Sherry; Alexander W Zaranek; Madeleine Ball; Jason Bobe; Preston Estep; George M Church; Patrick Marks; Sofia Kyriazopoulou-Panagiotopoulou; Grace X Y Zheng; Michael Schnall-Levin; Heather S Ordonez; Patrice A Mudivarti; Kristina Giorda; Ying Sheng; Karoline Bjarnesdatter Rypdal; Marc Salit
Journal: Sci Data Date: 2016-06-07 Impact factor: 6.444

4. Next-generation sequencing for the diagnosis of hereditary breast and ovarian cancer using genomic capture targeting multiple candidate genes.

Authors: Laurent Castéra; Sophie Krieger; Antoine Rousselin; Angélina Legros; Jean-Jacques Baumann; Olivia Bruet; Baptiste Brault; Robin Fouillet; Nicolas Goardon; Olivier Letac; Stéphanie Baert-Desurmont; Julie Tinat; Odile Bera; Catherine Dugast; Pascaline Berthet; Florence Polycarpe; Valérie Layet; Agnes Hardouin; Thierry Frébourg; Dominique Vaur
Journal: Eur J Hum Genet Date: 2014-02-19 Impact factor: 4.246

Review 5. Disease-targeted sequencing: a cornerstone in the clinic.

Authors: Heidi L Rehm
Journal: Nat Rev Genet Date: 2013-03-12 Impact factor: 53.242

6. The ICR142 NGS validation series: a resource for orthogonal assessment of NGS analysis.

Authors: Elise Ruark; Anthony Renwick; Matthew Clarke; Katie Snape; Emma Ramsay; Anna Elliott; Sandra Hanks; Ann Strydom; Sheila Seal; Nazneen Rahman
Journal: F1000Res Date: 2016-03-22

7. Benchmarking of Whole Exome Sequencing and Ad Hoc Designed Panels for Genetic Testing of Hereditary Cancer.

Authors: Lídia Feliubadaló; Raúl Tonda; Mireia Gausachs; Jean-Rémi Trotta; Elisabeth Castellanos; Adriana López-Doriga; Àlex Teulé; Eva Tornero; Jesús Del Valle; Bernat Gel; Marta Gut; Marta Pineda; Sara González; Mireia Menéndez; Matilde Navarro; Gabriel Capellá; Ivo Gut; Eduard Serra; Joan Brunet; Sergi Beltran; Conxi Lázaro
Journal: Sci Rep Date: 2017-01-04 Impact factor: 4.379

8. panelcn.MOPS: Copy-number detection in targeted NGS panel data for clinical diagnostics.

Authors: Gundula Povysil; Antigoni Tzika; Julia Vogt; Verena Haunschmid; Ludwine Messiaen; Johannes Zschocke; Günter Klambauer; Sepp Hochreiter; Katharina Wimmer
Journal: Hum Mutat Date: 2017-05-16 Impact factor: 4.878

9. Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN.

Authors: Anna Fowler; Shazia Mahamdallie; Elise Ruark; Sheila Seal; Emma Ramsay; Matthew Clarke; Imran Uddin; Harriet Wylie; Ann Strydom; Gerton Lunter; Nazneen Rahman
Journal: Wellcome Open Res Date: 2016-11-25

10. Development of a Comprehensive Sequencing Assay for Inherited Cardiac Condition Genes.

Authors: Chee Jian Pua; Jaydutt Bhalshankar; Kui Miao; Roddy Walsh; Shibu John; Shi Qi Lim; Kingsley Chow; Rachel Buchan; Bee Yong Soh; Pei Min Lio; Jaclyn Lim; Sebastian Schafer; Jing Quan Lim; Patrick Tan; Nicola Whiffin; Paul J Barton; James S Ware; Stuart A Cook
Journal: J Cardiovasc Transl Res Date: 2016-02-17 Impact factor: 4.132

4 in total

1. The ICR639 CPG NGS validation series: A resource to assess analytical sensitivity of cancer predisposition gene testing.

Authors: Shazia Mahamdallie; Elise Ruark; Esty Holt; Emma Poyastro-Pearson; Anthony Renwick; Ann Strydom; Sheila Seal; Nazneen Rahman
Journal: Wellcome Open Res Date: 2018-06-12

2. SavvyCNV: Genome-wide CNV calling from off-target reads.

Authors: Thomas W Laver; Elisa De Franco; Matthew B Johnson; Kashyap A Patel; Sian Ellard; Michael N Weedon; Sarah E Flanagan; Matthew N Wakeling
Journal: PLoS Comput Biol Date: 2022-03-16 Impact factor: 4.475

3. ifCNV: A novel isolation-forest-based package to detect copy-number variations from various targeted NGS datasets.

Authors: Simon Cabello-Aguilar; Julie A Vendrell; Charles Van Goethem; Mehdi Brousse; Catherine Gozé; Laurent Frantz; Jérôme Solassol
Journal: Mol Ther Nucleic Acids Date: 2022-09-22 Impact factor: 10.183

4. The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing data quality assurance.

Authors: Shazia Mahamdallie; Elise Ruark; Shawn Yost; Márton Münz; Anthony Renwick; Emma Poyastro-Pearson; Ann Strydom; Sheila Seal; Nazneen Rahman
Journal: Wellcome Open Res Date: 2018-04-04

4 in total