The ICR639 CPG NGS validation series: A resource to assess analytical sensitivity of cancer predisposition gene testing.

Shazia Mahamdallie, Elise Ruark, Esty Holt, Emma Poyastro-Pearson, Anthony Renwick, Ann Strydom, Sheila Seal, Nazneen Rahman.

Abstract

The analytical sensitivity of a next generation sequencing (NGS) test reflects the ability of the test to detect real sequence variation. The evaluation of analytical sensitivity relies on the availability of gold-standard, validated benchmarking datasets. For NGS analysis, the availability of suitable datasets has been limited. Most laboratories undertake small-scale evaluations using in-house data, and/or rely on in silico generated datasets to evaluate the performance of NGS variant detection pipelines. Cancer predisposition genes (CPGs), such as BRCA1 and BRCA2, are amongst the most widely tested genes in clinical practice today. Hundreds of providers across the world now offer CPG testing using NGS methods. Validating and comparing the analytical sensitivity of CPG tests has proved difficult, due to the absence of comprehensive, orthogonally validated benchmarking datasets of CPG pathogenic variants. To address this, we present the ICR639 CPG NGS validation series. This dataset comprises data from 639 individuals. Each individual has sequencing data generated using the TruSight Cancer Panel (TSCP), a targeted NGS assay for the analysis of CPGs, together with orthogonally generated data showing the presence of at least one CPG pathogenic variant per individual. The set consists of 645 pathogenic variants in total. There is strong representation of the most challenging types of variants to detect, with 339 indels, including 16 complex indels and 24 with length greater than five base pairs, and 74 exon copy number variations (CNVs), including 23 single exon CNVs. The series includes pathogenic variants in 31 CPGs, including 502 pathogenic variants in BRCA1 or BRCA2, making this an important comprehensive validation dataset for providers of BRCA1 and BRCA2 NGS testing. We have deposited the TSCP FASTQ files of the ICR639 series in the European Genome-phenome Archive (EGA) under accession number EGAD00001004134.

Keywords:  BRCA1 and BRCA2; analytical sensitivity; cancer predisposition gene; genetic testing; next generation sequencing; validation; variant benchmarking

Year:  2018        PMID: 30175241      PMCID: PMC6081973          DOI: 10.12688/wellcomeopenres.14594.1

Source DB:  PubMed          Journal:  Wellcome Open Res        ISSN: 2398-502X


Introduction

For a clinical test based on next generation sequencing (NGS) to be approved for use, its performance with respect to accuracy, analytical sensitivity, analytical specificity and precision must be evaluated [1–4]. Analytical sensitivity refers to the ability of a sequencing test to detect real sequence variation. The evaluation of analytical sensitivity therefore relies on the availability of gold-standard, validated benchmarking datasets. For NGS analysis, the availability of suitable datasets has been limited. Most laboratories undertake small-scale evaluations, using in-house data that seldom comprehensively cover the spectrum of variant types the test must detect [5, 6]. Many laboratories also rely on in silico generated datasets to evaluate the performance of NGS variant detection pipelines. Whilst of value, in silico data cannot completely replace experimental data generated from biological samples that have been orthogonally validated [7].
Cancer predisposition genes (CPGs), such as BRCA1 and BRCA2, are amongst the most widely tested genes in clinical practice [6, 8–10]. Hundreds of providers across the world now offer CPG testing using NGS methods, through panel, exome, or whole genome testing [9]. Increasingly, data analysis is performed separately from data generation and the clinical reporting of results, sometimes by outsourcing the analysis to a separate provider. This makes assessments and comparisons of analytical sensitivity even more challenging.
We have conducted CPG testing in research and clinical settings for over a decade, identifying many hundreds of pathogenic variants. We have generated extensive sequence-based data on thousands of samples using a variety of technologies including NGS methods, PCR amplification with Sanger sequencing, standard Multiplex Ligation-dependent Probe Amplification (MLPA), MLPA by NGS and Conformation Sensitive Gel Electrophoresis (CSGE).
To validate the analytical sensitivity of our ISO 15189 accredited CPG NGS clinical testing pipeline, we used data from 639 individuals known to have pathogenic variants in CPGs through testing by other methods. This validation resource has proved invaluable for ensuring optimal analytical sensitivity during the initial and ongoing development of our NGS pipelines. To assist those without access to extensive validated datasets, we have assembled the ICR639 CPG NGS validation series, which we present here.
The ICR639 CPG NGS validation series comprises data from 639 individuals. Each individual has sequencing data generated using the TruSight Cancer Panel (TSCP), a targeted NGS assay for the analysis of CPGs [11], together with orthogonally generated data showing the presence of at least one CPG pathogenic variant per individual. The set consists of 645 pathogenic variants in total. There is strong representation of the most challenging types of variants, with 339 indels, including 16 complex indels and 24 with length greater than five base pairs, and 74 exon copy number variations (CNVs), including 23 single exon CNVs (Table 1). The series includes pathogenic variants in 31 CPGs. There are 502 pathogenic variants in BRCA1 or BRCA2, making this an important comprehensive validation dataset for providers of BRCA1 and BRCA2 NGS testing. The vast majority of variants occur in extremely high-quality sequencing data, fulfilling a Quality Sequencing Minimum (QSM) of C50_B10(85)_M20(95) [12]. As such, it is anticipated that any accredited test provider will be able to detect these variants.
Table 1. Summary of pathogenic variant types by gene in the ICR639 CPG NGS validation series.

          Pathogenic variant type
Gene      Base substitutions  Deletions  Insertions  Complex indels  Exon CNVs  Total
APC                        2          1           1               0          0      4
ATM                        3          1           1               0          1      6
BRCA1                     85        102          24               3         30    244
BRCA2                     86        120          30              11         11    258
BRIP1                      4          4           3               0          0     11
CDH1                       1          0           0               0          0      1
CDKN2A                     2          1           0               0          0      3
CHEK2                      0          2           0               0          4      6
DICER1                     1          0           2               1          0      4
EPCAM                      0          0           0               0          1      1
FH                         2          0           0               0          1      3
GPC3                       1          0           0               0          0      1
HRAS                       1          0           0               0          0      1
MEN1                       1          0           1               0          0      2
MLH1                       4          1           2               0          1      8
MSH2                       4          1           1               0          8     14
MSH6                       6          2           3               0          2     13
MUTYH                      1          0           0               0          0      1
NF1                        1          2           0               0          1      4
PALB2                      6          9           3               0          1     19
PMS2                       2          1           0               0          4      7
PTEN                       2          2           1               0          1      6
RAD51C                     3          0           0               0          1      4
RAD51D                     2          1           1               0          0      4
RB1                        0          0           0               0          1      1
RET                        3          0           0               0          0      3
SDHB                       1          0           0               0          2      3
SDHD                       1          0           0               0          0      1
SMARCB1                    2          0           0               0          0      2
TP53                       2          0           0               1          3      6
WT1                        3          0           0               0          1      4

The dataset size and comprehensive representation of variant types that can be detected by targeted sequencing make the ICR639 CPG NGS validation series a valuable benchmarking resource for providers of CPG testing by NGS. The dataset may also be of value to laboratories analysing other genes, and to those performing exome or genome testing, which will encompass CPGs. The ICR639 CPG NGS validation series was constructed as part of the Transforming Genetic Medicine Initiative (TGMI, www.thetgmi.org), a Wellcome-funded initiative that is developing frameworks and resources to facilitate genetic medicine.

Methods

We used lymphocyte DNA from 639 individuals. The individuals either were recruited to our studies to discover and characterise disease predisposition genes, which have been approved by the London Multicentre Research Ethics Committee (05/MRE02/17, MREC/01/2/044, MREC/01/2/18), or came from the TGLclinical laboratory, an ISO 15189 accredited genetic testing laboratory. Written informed consent from patients tested through TGLclinical includes use of samples for quality control and research.
We generated high-quality targeted NGS data for the ICR639 CPG NGS validation series using the TruSight Cancer Panel (TSCP) v2 (Supporting File 1). We prepared targeted DNA libraries from 50 ng genomic DNA using the TSCP and TruSight Rapid Capture kit (Illumina, San Diego, CA, USA). We followed the manufacturer’s protocol with the exception of library enrichment pool complexity, which we performed in 48-plex. For every sample, we sequenced a final 10 pM pooled library on a HiSeq 2500 platform set in Rapid-run mode following standard protocols: 96-plex pool per flow cell, HiSeq® Rapid SBS Kit v2, 101 bp paired-end dual index run, and onboard clustering using HiSeq® Rapid PE Cluster Kit v2. CASAVA v.1.8.2 was used to demultiplex and create FASTQ files per sample from the raw base call files.
To evaluate data quality, we mapped the sequencing reads to the human reference genome (GRCh37) using Stampy v.1.0.20 [13] with BWA v.0.7.5a [14] for pre-mapping. For all base substitutions and indels, we used CoverView v.1.1.0 [15] to flag fragments containing the pathogenic variant that did not fulfil a QSM [12] of C50_B10(85)_M20(95). All samples with an exon CNV pathogenic variant passed the default settings of DECoN v.1.0.0 [16].
All 639 individuals also had orthogonally generated data available. These data were generated through either PCR amplification with Sanger sequencing [17], standard MLPA or MLPA by NGS [11, 18].
Annotation of base substitutions and indels follows Clinical Sequencing Notation (CSN) v.1.0 [19] using the RefSeq mRNA transcripts. For all genes except WT1, the coding annotation (c.) starts at 1, the A of the ATG translation initiation codon. For WT1, c.1 is the A of the first in-frame AUG translation initiation codon and the KTS exon 9 sequence is included. We used Ensembl ENST transcripts from release 65 for exon CNV annotation, as RefSeq mRNA transcripts do not specify intron/exon boundaries. All exon CNVs are described using the notation “Exon X deletion/duplication” for single exon CNVs and “Exon X-Y deletion/duplication” for exon CNVs involving more than one exon, where X is the number of the first exon involved in the exon CNV with respect to the transcript, Y is the number of the last exon involved, and deletion or duplication is specified as appropriate. For all genes except BRCA1, exon numbering is consecutive from the first non-coding exon in the transcript. For BRCA1 we use the conventional clinical numbering system that does not include exon 4. We provide the left-aligned CHR, POS, REF and ALT information according to GRCh37 for base substitutions and indels to allow comparison with Variant Call Format (VCF) files. All exon CNVs were validated by MLPA. We provide the most 5’ and most 3’ genomic coordinates of the exons involved in the exon CNV according to the exon numbering of the specified transcript. Of note, these are not the actual breakpoints; neither MLPA nor targeted NGS data routinely provide breakpoint sequence information for exon CNVs.
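The exon CNV notation described above is regular enough to be parsed mechanically. The following Python sketch is illustrative only: the names `ExonCNV` and `parse_exon_cnv` are hypothetical, and the exact spacing and hyphen character assumed by the pattern may differ from the strings in the supporting file, so the expression may need adjusting.

```python
import re
from typing import NamedTuple


class ExonCNV(NamedTuple):
    """Parsed exon CNV description (hypothetical helper type)."""
    first_exon: int
    last_exon: int
    kind: str  # "deletion" or "duplication"


# Assumed pattern: "Exon X deletion", "Exon X duplication",
# "Exon X-Y deletion" or "Exon X-Y duplication".
_EXON_CNV = re.compile(r"^Exon (\d+)(?:-(\d+))? (deletion|duplication)$")


def parse_exon_cnv(text: str) -> ExonCNV:
    """Parse an exon CNV description into first exon, last exon and type."""
    match = _EXON_CNV.match(text.strip())
    if match is None:
        raise ValueError(f"not an exon CNV description: {text!r}")
    first = int(match.group(1))
    # A single exon CNV has no "-Y" part, so the last exon equals the first.
    last = int(match.group(2)) if match.group(2) else first
    if last < first:
        raise ValueError(f"last exon precedes first exon: {text!r}")
    return ExonCNV(first, last, match.group(3))


def is_single_exon(cnv: ExonCNV) -> bool:
    """Single exon CNVs are the harder class to detect in NGS data."""
    return cnv.first_exon == cnv.last_exon
```

Note that exon numbers are only meaningful relative to the annotation transcript, and that BRCA1 uses the conventional clinical numbering, so parsed numbers should not be compared across transcripts.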

Dataset

The ICR639 CPG NGS validation series includes data from 639 individuals. Six individuals have two different pathogenic variants, so the dataset contains a total of 645 pathogenic CPG variants (Supporting File 2). The pathogenic variants occur in 31 different genes that are all proven disease-causing genes and are routinely tested in clinical practice [6, 8–10]. The series includes 502 pathogenic variants in BRCA1 or BRCA2, which are the most commonly tested CPGs, and 43 pathogenic variants in the Lynch syndrome genes (MLH1, MSH2, MSH6, PMS2 and EPCAM). All 645 pathogenic CPG variants are different and together they cover the variant types routinely detected and reported in clinical genetic testing (Table 1). There are 232 base substitutions, 323 insertions or deletions, 16 complex indels and 74 exon CNVs. Of note, the set includes 24 insertions or deletions with length greater than five base pairs and 23 single exon CNVs, two variant classes that are challenging to detect in NGS data.
The ICR639 CPG NGS validation series comprises high-quality sequencing data. For 561 of the 571 base substitutions and indels (98%), the fragment containing the variant fulfilled a QSM of C50_B10(85)_M20(95) [12]. This represents a minimum quality requirement whereby 100% of bases in the fragment had at least 50x depth of coverage, with a base quality score of ≥10 in at least 85% of reads and a mapping quality score of ≥20 in at least 95% of reads. For the remaining ten pathogenic variants, the fragment containing the variant did not meet the QSM requirement for either the base quality (n=5) or the mapping quality (n=5). All fragments fulfilled the coverage requirement. We include these variants to allow evaluation of variant detection performance in data with suboptimal base or mapping quality, as such data are commonly encountered in genetic testing.
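To illustrate how the three parts of the C50_B10(85)_M20(95) requirement combine, the sketch below applies the thresholds to per-position summaries of one fragment. It is not CoverView's implementation: the `PositionMetrics` structure, the `qsm_failures` function and the "Coverage" label are hypothetical, and the sketch assumes the per-position fractions have already been computed from the aligned reads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PositionMetrics:
    """Summary of one base position in a fragment (hypothetical structure)."""
    depth: int             # number of reads covering the position
    frac_bq_ge_10: float   # fraction of reads with base quality >= 10
    frac_mq_ge_20: float   # fraction of reads with mapping quality >= 20


def qsm_failures(fragment: List[PositionMetrics],
                 min_depth: int = 50,
                 min_frac_bq: float = 0.85,
                 min_frac_mq: float = 0.95) -> List[str]:
    """Return the QSM criteria that a fragment fails; an empty list is a pass.

    Every position must satisfy every criterion, so a single position
    below a threshold is enough to fail that criterion for the fragment.
    """
    if not fragment:
        raise ValueError("fragment must contain at least one position")
    failures = []
    if any(p.depth < min_depth for p in fragment):
        failures.append("Coverage")
    if any(p.frac_bq_ge_10 < min_frac_bq for p in fragment):
        failures.append("BaseQuality")
    if any(p.frac_mq_ge_20 < min_frac_mq for p in fragment):
        failures.append("MappingQuality")
    return failures


# Toy values, not taken from the dataset.
passing = [PositionMetrics(60, 0.90, 0.99), PositionMetrics(50, 0.85, 0.95)]
low_base_quality = [PositionMetrics(60, 0.90, 0.99), PositionMetrics(70, 0.80, 0.99)]
print(qsm_failures(passing))           # []
print(qsm_failures(low_base_quality))  # ['BaseQuality']
```

Returning a list rather than a single label keeps the sketch neutral about how a fragment failing more than one criterion should be reported.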
The sequencing data for all 74 exon CNVs fulfilled the minimum quality requirements of DECoN, a batch-based exon CNV calling tool [16], namely a minimum correlation of 0.98 with other samples in its batch and a minimum median coverage metric of 100 across all exons in the target. The ICR639 CPG NGS validation series is thus a high-quality sequencing dataset and users are expected to detect all pathogenic variants in CPG(s) of relevance to their pipeline. We have previously made freely available other datasets that groups may find useful in conjunction with the ICR639 CPG NGS validation series. For example, we generated TSCP data for the NIST-led Genome in a Bottle (GIAB) Consortium reference material (RM) 8398 [15, 20]. We have also made available the ICR142 exome validation series [17] and ICR96 exon CNV validation series [11]. These resources allow evaluation of both sensitivity and specificity, for small variants and exon CNVs respectively. Of note, 50 exon CNVs are included in both the ICR96 exon CNV validation series and the ICR639 CPG NGS validation series.
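One way a user might summarise analytical sensitivity against the series is to compare the known pathogenic variants with the calls from their own pipeline, per variant type. The sketch below is a minimal illustration under stated assumptions: it covers base substitutions and indels only, it assumes calls have been left-aligned and normalised to the same (CHR, POS, REF, ALT) representation as the truth set, and the function name and toy coordinates are hypothetical rather than taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (CHR, POS, REF, ALT), left-aligned, GRCh37 coordinates.
VariantKey = Tuple[str, int, str, str]


def sensitivity_by_type(truth: Iterable[Tuple[VariantKey, str]],
                        called: Set[VariantKey]) -> Dict[str, float]:
    """Fraction of known variants of each type that appear among the calls.

    Sensitivity = detected / (detected + missed), computed per variant type.
    """
    total: Dict[str, int] = defaultdict(int)
    detected: Dict[str, int] = defaultdict(int)
    for key, variant_type in truth:
        total[variant_type] += 1
        if key in called:
            detected[variant_type] += 1
    return {t: detected[t] / total[t] for t in total}


# Toy example with made-up coordinates (not real variants from the series).
truth = [
    (("1", 100, "A", "G"), "bs"),
    (("2", 200, "CT", "C"), "del"),
    (("2", 300, "GAA", "G"), "del"),
]
called = {("1", 100, "A", "G"), ("2", 200, "CT", "C"), ("3", 5, "T", "C")}
print(sensitivity_by_type(truth, called))  # {'bs': 1.0, 'del': 0.5}
```

Calls that are absent from the truth set, such as the third toy call, do not affect sensitivity. Exact matching of this kind can miss complex indels that a caller represents differently, so apparent false negatives are worth inspecting manually.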

Data availability

We have deposited the TSCP FASTQ files for all 639 individuals in the European Genome-phenome Archive (EGA). The accession number is EGAS00001002993. Details of how to access the data are available at EGA or from www.icr.ac.uk/icr639. The individual-level genetic data on EGA are under managed access, in line with general recommendations for use of patient information, the specific consent obtained for use of data from these samples and our institutional data access committee. The ICR-GSR data access application form should be completed and returned to rahmanlab@icr.ac.uk. Applications will only be accepted electronically. Access to the data will require the completion of a Data Access Agreement. Any queries regarding access procedures or completion of the forms should be sent to rahmanlab@icr.ac.uk.
Supporting data files have been archived on Open Science Framework: http://doi.org/10.17605/OSF.IO/N2VWR [21] under a CC0 1.0 Universal licence.
Supporting File 1. TSCP targeted BED file. Targets of the Illumina TruSight Cancer Panel (TSCP) v2 in BED file format.
Supporting File 2. Pathogenic variants in the ICR639 CPG NGS validation series. Descriptions of the column headings are given below:
SampleID – sample ID in the ICR639 CPG NGS validation series
AnnotationTranscript – the transcript used to annotate the variant, either the RefSeq NM ID or the Ensembl v65 ENST ID
Gene – HGNC symbol
ReportedVariant – base substitutions and indels are in accordance with CSN v.1.0. Exon CNVs are described with notation “Exon X deletion/duplication” for single exon CNVs and “Exon X-Y deletion/duplication” for multi-exon CNVs, where X is the first exon and Y the last exon involved, and deletion/duplication as appropriate
VariantType – “bs”, “del”, “ins”, “complex”, or “exonCNV” for base substitutions, deletions, insertions, complex indels, or exon CNV variants, respectively
Zygosity – “heterozygous” for a pathogenic variant that is present on only one allele
AdjacentVariant – annotation according to CSN v.1.0 if a variant was detected adjacent to the reported variant, “.” if no adjacent variant was detected
QSMFragmentResult – “PASS” if the fragment containing the variant fulfilled QSM C50_B10(85)_M20(95), “FLAG – BaseQuality” if at least one position in the fragment containing the variant did not fulfil B10(85), “FLAG – MappingQuality” if at least one position in the fragment containing the variant did not fulfil M20(95), “.” if the variant was an exon CNV
CHR – chromosome
POS – the left-aligned position in GRCh37 coordinates, “.” if the variant was an exon CNV
REF – the reference allele in GRCh37, “.” if the variant was an exon CNV
ALT – the alternative allele, “.” if the variant was an exon CNV
5PrimeExon37 – most 5’ genomic coordinate of most 5’ exon in GRCh37
3PrimeExon37 – most 3’ genomic coordinate of most 3’ exon in GRCh37
Researchers and authors that use the ICR639 CPG NGS validation series should reference this paper and should include the following acknowledgement: “This study makes use of the ICR639 CPG NGS validation series data generated by Professor Nazneen Rahman’s team at The Institute of Cancer Research, London as part of the TGMI”.

Rahman et al., in this study, analyzed cancer predisposition genes (CPGs) using NGS methods in 639 individuals, offering an important new dataset of CPG pathogenic variants for the diagnostic community. The paper is well written and clear in its meaning and analysis.
In my opinion the article can be accepted for indexing. I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
The manuscript is clearly written overall and the provided dataset will be a valuable tool for the diagnostics community, especially for those working on hereditary cancer. As the biological samples are not offered, the wet-lab part of the diagnostic validation process of any reader’s diagnostics service will require its own samples. However, the data analysis step has acquired paramount importance in next generation sequencing techniques. Thus, increasing the dataset size for the validation of the NGS data analysis step allows for a better assessment of the sensitivity and specificity of the technique. Information regarding the methods used to obtain the sequence data and its quality is adequate and the mutation list is detailed. Nevertheless, I missed some information on the results of the analytical sensitivity assessment of the authors’ lab, and on the criteria used to choose the dataset samples and variants.
Minor points: The dataset is described as a collection of FASTQs of sequence data from patients and genes where 639 pathogenic variants have been detected by orthogonal techniques, but I cannot find a paragraph explaining whether all or some of these mutations have also been detected with the described NGS pipeline. This information would also be very useful. If any of the mutations were not found, a short explanation of the suspected reason for the false negative would also be welcome. The dataset includes variants in regions with suboptimal base quality or mapping quality. Have the authors, in their research and clinical experience with previous techniques, detected any point mutations located in the region of interest of the TruSight Cancer Panel that they have not been able to detect with the Panel? If so, are they in the dataset or were they left aside?
We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
  20 in total

1.  Extensive sequencing of seven human genomes to characterize benchmark reference materials.

Authors:  Justin M Zook; David Catoe; Jennifer McDaniel; Lindsay Vang; Noah Spies; Arend Sidow; Ziming Weng; Yuling Liu; Christopher E Mason; Noah Alexander; Elizabeth Henaff; Alexa B R McIntyre; Dhruva Chandramohan; Feng Chen; Erich Jaeger; Ali Moshrefi; Khoa Pham; William Stedman; Tiffany Liang; Michael Saghbini; Zeljko Dzakula; Alex Hastie; Han Cao; Gintaras Deikus; Eric Schadt; Robert Sebra; Ali Bashir; Rebecca M Truty; Christopher C Chang; Natali Gulbahce; Keyan Zhao; Srinka Ghosh; Fiona Hyland; Yutao Fu; Mark Chaisson; Chunlin Xiao; Jonathan Trow; Stephen T Sherry; Alexander W Zaranek; Madeleine Ball; Jason Bobe; Preston Estep; George M Church; Patrick Marks; Sofia Kyriazopoulou-Panagiotopoulou; Grace X Y Zheng; Michael Schnall-Levin; Heather S Ordonez; Patrice A Mudivarti; Kristina Giorda; Ying Sheng; Karoline Bjarnesdatter Rypdal; Marc Salit
Journal:  Sci Data       Date:  2016-06-07       Impact factor: 6.444

2.  Digital Multiplex Ligation-Dependent Probe Amplification for Detection of Key Copy Number Alterations in T- and B-Cell Lymphoblastic Leukemia.

Authors:  Anne Benard-Slagter; Ilse Zondervan; Karel de Groot; Farzaneh Ghazavi; Virinder Sarhadi; Pieter Van Vlierberghe; Barbara De Moerloose; Claire Schwab; Kim Vettenranta; Christine J Harrison; Sakari Knuutila; Jan Schouten; Tim Lammens; Suvi Savola
Journal:  J Mol Diagn       Date:  2017-07-19       Impact factor: 5.568

Review 3.  In Silico Proficiency Testing for Clinical Next-Generation Sequencing.

Authors:  Eric J Duncavage; Haley J Abel; John D Pfeifer
Journal:  J Mol Diagn       Date:  2016-11-15       Impact factor: 5.568

4.  Validation of a Next-Generation Sequencing Pipeline for the Molecular Diagnosis of Multiple Inherited Cancer Predisposing Syndromes.

Authors:  Paula Paulo; Pedro Pinto; Ana Peixoto; Catarina Santos; Carla Pinto; Patrícia Rocha; Isabel Veiga; Gabriela Soares; Catarina Machado; Fabiana Ramos; Manuel R Teixeira
Journal:  J Mol Diagn       Date:  2017-05-18       Impact factor: 5.568

5.  Accurate clinical detection of exon copy number variants in a targeted NGS panel using DECoN.

Authors:  Anna Fowler; Shazia Mahamdallie; Elise Ruark; Sheila Seal; Emma Ramsay; Matthew Clarke; Imran Uddin; Harriet Wylie; Ann Strydom; Gerton Lunter; Nazneen Rahman
Journal:  Wellcome Open Res       Date:  2016-11-25

6.  The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data.

Authors:  Shazia Mahamdallie; Elise Ruark; Shawn Yost; Emma Ramsay; Imran Uddin; Harriett Wylie; Anna Elliott; Ann Strydom; Anthony Renwick; Sheila Seal; Nazneen Rahman
Journal:  Wellcome Open Res       Date:  2017-05-26

Review 7.  Realizing the promise of cancer predisposition genes.

Authors:  Nazneen Rahman
Journal:  Nature       Date:  2014-01-16       Impact factor: 49.962

8.  CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting.

Authors:  Márton Münz; Elise Ruark; Anthony Renwick; Emma Ramsay; Matthew Clarke; Shazia Mahamdallie; Victoria Cloke; Sheila Seal; Ann Strydom; Gerton Lunter; Nazneen Rahman
Journal:  Genome Med       Date:  2015-07-28       Impact factor: 11.117

Review 9.  New challenges for BRCA testing: a view from the diagnostic laboratory.

Authors:  Andrew J Wallace
Journal:  Eur J Hum Genet       Date:  2016-09       Impact factor: 4.246

10.  Clinical testing of BRCA1 and BRCA2: a worldwide snapshot of technological practices.

Authors:  Amanda Ewart Toland; Andrea Forman; Fergus J Couch; Julie O Culver; Diana M Eccles; William D Foulkes; Frans B L Hogervorst; Claude Houdayer; Ephrat Levy-Lahad; Alvaro N Monteiro; Susan L Neuhausen; Sharon E Plon; Shyam K Sharan; Amanda B Spurdle; Csilla Szabo; Lawrence C Brody
Journal:  NPJ Genom Med       Date:  2018-02-15       Impact factor: 8.617
