
Pisces: an accurate and versatile variant caller for somatic and germline next-generation sequencing data.

Tamsen Dunn, Gwenn Berry, Dorothea Emig-Agius, Yu Jiang, Serena Lei, Anita Iyer, Nitin Udar, Han-Yu Chuang, Jeff Hegarty, Michael Dickover, Brandy Klotzle, Justin Robbins, Marina Bibikova, Marc Peeters, Michael Strömberg.

Abstract

MOTIVATION: Next-generation sequencing technology is transitioning quickly from research labs to clinical settings. The diagnosis and treatment selection for many acquired and autosomal conditions necessitate a method for accurately detecting somatic and germline variants.
RESULTS: We have developed Pisces, a rapid, versatile and accurate small-variant calling suite designed for somatic and germline amplicon sequencing applications. Accuracy is achieved by four distinct modules, each incorporating a number of novel algorithmic strategies.
AVAILABILITY AND IMPLEMENTATION: Pisces is distributed under an open source license and can be downloaded from https://github.com/Illumina/Pisces. Pisces is available on the BaseSpace™ SequenceHub. It is distributed on Illumina sequencing platforms such as the MiSeq™ and is included in the Praxis™ Extended RAS Panel test, which was recently approved by the FDA.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press.

Year: 2019; PMID: 30304370; PMCID: PMC6499249; DOI: 10.1093/bioinformatics/bty849

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

The diagnosis and treatment of many oncological conditions require accurate detection of somatic and germline variants (Dietel et al.; Dong et al.). Many algorithms have been developed for somatic single nucleotide variant (SNV) detection in matched tumor-normal DNA sequencing, and many others for germline variant detection, GATK being the best known (McKenna et al.). However, there is no single front runner, and different callers dominate in different situations. Particularly in the context of amplicon workflows, the standardization of variant calling pipelines remains elusive (Betge et al.; Horak et al.). Pisces is unique primarily because it excels in the difficult and common situation where no matched normal sample exists for a given tumor sample. Pisces also performs well on germline samples. Pisces requires only aligned sequence data (BAM files) and a reference genome, and it returns a variant call file (VCF) with SNVs and small indels. We present an overview of the Pisces algorithms and compare the results to alternative small-variant calling tools.

2 Materials and methods

Pisces comprises four modules, each with a novel algorithmic strategy:

1.  Pisces read stitcher: reduces noise by stitching paired reads into consensus reads.
2.  Pisces Variant Caller: calls small variants and includes a collapsing algorithm to rescue variants broken up by read boundaries.
3.  Pisces variant quality recalibrator: when the variant calls overwhelmingly follow a pattern associated with thermal damage or formalin-fixed paraffin-embedded (FFPE) deamination, this step recalculates the variant QScore given the signature of the detected noise.
4.  Pisces variant phaser (Scylla): uses a read-backed greedy clustering method to assemble small variants into complex alleles.

Runtime for the Pisces Variant Caller on a 470 MB BAM (8 million reads) is 85 s; for a 2 GB BAM (60 million reads) it is about 4 min. All runs used 20 threads on 2.60 GHz processors.
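The idea behind read stitching can be illustrated with a toy sketch. This is not the Pisces stitcher: it assumes both mates are already aligned without indels, and the rule for resolving disagreements (keep the higher-quality base, emit N on a tie) is an assumption chosen for illustration. The name `stitch_pair` and the `Read` tuple layout are hypothetical.

```python
from typing import List, Tuple

# A read is (reference start position, base string, per-base Phred qualities).
# This layout is a hypothetical simplification, not a Pisces data structure.
Read = Tuple[int, str, List[int]]


def stitch_pair(r1: Read, r2: Read) -> Read:
    """Merge two gap-free, already-aligned mates into one consensus read."""
    start = min(r1[0], r2[0])
    end = max(r1[0] + len(r1[1]), r2[0] + len(r2[1]))
    bases: List[str] = []
    quals: List[int] = []
    for pos in range(start, end):
        # Collect the (base, quality) call each mate makes at this position.
        calls = [
            (seq[pos - s], q[pos - s])
            for s, seq, q in (r1, r2)
            if 0 <= pos - s < len(seq)
        ]
        if not calls:
            base, qual = "N", 0            # uncovered gap between the mates
        elif len(calls) == 1:
            base, qual = calls[0]          # only one mate covers this base
        else:
            (b1, q1), (b2, q2) = calls
            if b1 == b2:
                base, qual = b1, max(q1, q2)   # mates agree
            elif q1 == q2:
                base, qual = "N", 0            # unresolvable disagreement
            else:
                base = b1 if q1 > q2 else b2   # assumed rule: trust higher quality
                qual = abs(q1 - q2)
        bases.append(base)
        quals.append(qual)
    return start, "".join(bases), quals


mate1 = (100, "ACGT", [30, 30, 30, 30])
mate2 = (102, "GAAC", [30, 10, 30, 30])
print(stitch_pair(mate1, mate2))
```

In this example the mates overlap at positions 102 and 103; the low-quality mismatch in the second mate is outvoted, which is the sense in which stitching reduces noise before variant calling.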

2.1 Testing methodology

We compared Pisces performance with the following alternative small-variant calling tools: the GATK HaplotypeCaller, LoFreq, VarDict and VarScan (Koboldt et al.; Lai et al.; Wilm et al.). The third-party tools were selected because they showed superior performance in previous benchmarking studies (Dietel et al.; Horak et al.). Each tool offers a different variant calling strategy and might be optimal in other situations. A comprehensive comparison of tools is given elsewhere (Sandmann et al.). For our testing, we generated BAMs from four amplicon datasets using the Illumina amplicon aligner, and then processed the BAMs through the variant callers. The results were assessed using the Hap.py accuracy assessment tool (https://github.com/Illumina/hap.py). The datasets were selected to include both well-characterized samples and realistic cancer samples.
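The recall and precision reported by an accuracy assessment can be illustrated with a simplified sketch. This is not how Hap.py works internally: Hap.py performs haplotype-aware comparison, whereas the code below assumes variants match only when chromosome, position, reference and alternate alleles are identical. The function name `compare_calls` and the example variants are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant is (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def compare_calls(truth: Set[Variant], calls: Set[Variant]) -> Dict[str, float]:
    """Exact-match comparison of a call set against a truth set."""
    tp = len(truth & calls)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    recall = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if calls else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "recall": recall, "precision": precision}


# Made-up example data, for illustration only.
truth = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr3", 300, "TC", "T"),
}
calls = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr1", 5000, "G", "A"),   # false positive
    ("chr2", 7000, "T", "C"),   # false positive
}
print(compare_calls(truth, calls))
```

Here three of four truth variants are recovered (recall 0.75) and three of five calls are correct (precision 0.6).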

2.2 Datasets

All germline testing was done using established cell line samples from individuals NA12878 and NA12877 from the Coriell Institute. High-confidence variant calls are available for these individuals via Platinum Genomes build 2016-1.0 (Eberle et al.). These samples were run on two different panels to produce two distinct datasets. The Variant Panel was designed to target known variants in the NA12878 and NA12877 samples, specifically for the purpose of assessing the accuracy of sequencing applications. The Myeloid Panel is a commercial panel that targets genes frequently mutated in blood cancers.

The somatic datasets are as follows. The Titration dataset is a mixture of the NA12878 and NA12877 cell line samples, serially diluted to present a range of variant frequencies, observed down to 1%. The titrated samples were run with the Variant Panel and cover the same high-confidence variants. The RAS Panel dataset was generated from a set of colorectal cancer tissue blocks that were FFPE treated and extracted 8–9 years later. Those samples were evaluated by alternative methods (Sanger sequencing and the therascreen KRAS test by Qiagen) to provide a gold standard.
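As a worked illustration of how a titration yields a range of variant frequencies, the sketch below computes the expected variant allele frequency for a mixture of two samples. It assumes both samples are diploid at the site and contribute DNA in proportion to the mixing fraction; the text does not state which sample was diluted into which or which mixing fractions were used, so the labels "spike" and "background", the function name `expected_vaf` and the example fractions are all hypothetical.

```python
def expected_vaf(
    spike_fraction: float,
    spike_alt_copies: int,
    background_alt_copies: int = 0,
    ploidy: int = 2,
) -> float:
    """Expected alt-allele frequency at one site in a two-sample mixture.

    spike_fraction: proportion of DNA from the spiked-in sample (0 to 1).
    *_alt_copies: number of alt alleles each sample carries at the site
    (0 = hom-ref, 1 = het, 2 = hom-alt for a diploid site).
    """
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    alt = (
        spike_fraction * spike_alt_copies
        + (1.0 - spike_fraction) * background_alt_copies
    )
    return alt / ploidy


# A heterozygous variant private to the spiked-in sample, mixed in at 2%,
# is expected at a frequency of about 1%.
print(expected_vaf(0.02, spike_alt_copies=1))
# The same site at a 10% mixing fraction is expected at about 5%.
print(expected_vaf(0.10, spike_alt_copies=1))
```

Under these assumptions, a variant frequency of 1% corresponds to a 2% admixture for a heterozygous private variant, or a 1% admixture for a homozygous one.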

3 Results

In Table 1, we show average accuracy metrics by variant caller across all samples for each dataset. The F-score given is the average of the F1 for SNVs and the F1 for indels. Pste means the full Pisces Suite was used and Pvc means only the Pisces Variant Caller was used. In each of the four datasets, Pisces attained the highest number of best-performing metrics. For germline calling, the Pisces Variant Caller alone does slightly better than the more complex pipeline. In the somatic case, the best results are achieved with the full Pisces Suite. Pisces' success in indel calling is due to its variant collapsing algorithm, while the stitching algorithm enables higher accuracy for low-frequency datasets. Further discussion is given in the Supplementary Results section. To conclude, Pisces is an accurate tool for small-variant detection.

Table 1. Accuracy metrics by Variant Caller

Workflow  Dataset  Tool     SNV recall  SNV precision  Indel recall  Indel precision  #Truth Var  F1
Somatic   Titr     Pste     99.9        99.1           97.9          91.3             2100        97.0
Somatic   Titr     Pvc      99.9        98.4           97.9          87.2             2100        95.7
Somatic   Titr     LoFreq   99.2        91.0           99.8          72.8             2100        89.9
Somatic   Titr     VarDict  96.8        75.6           82.2          85.0             2100        84.7
Somatic   RAS      Pste     98.1        84.1           NA            NA               638         90.5
Somatic   RAS      Pvc      98.3        78.4           NA            NA               638         87.2
Somatic   RAS      LoFreq   98.3        66.7           NA            NA               638         79.5
Somatic   RAS      VarDict  98.0        66.8           NA            NA               638         79.4
Germline  VP       Pste     100.0       100.0          98.9          100.0            3376        99.7
Germline  VP       Pvc      100.0       100.0          100.0         100.0            3376        100.0
Germline  VP       GATK     79.2        97.0           91.0          97.1             3376        90.7
Germline  VP       VarScan  94.3        94.5           97.8          87.7             3376        93.5
Germline  Myl      Pste     93.6        94.8           91.4          98.8             749         94.6
Germline  Myl      Pvc      93.6        94.8           92.6          99.0             749         94.9
Germline  Myl      GATK     90.0        94.0           63.6          38.9             749         71.2
Germline  Myl      VarScan  84.4        93.9           95.7          58.0             749         82.4

Dataset abbreviations: Titr, Titration dataset; RAS, RAS Panel dataset; VP, Variant Panel; Myl, Myeloid Panel.
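
The F-score definition used for Table 1 (the average of the SNV F1 and the indel F1) can be sketched as follows. F1 is the harmonic mean of recall and precision. Treating rows with NA indel metrics as SNV-only is an assumption made here, and the function names are hypothetical. Because the table reports averages across samples, recomputing the F-score from the averaged recall and precision reproduces the published values only approximately.

```python
from typing import Optional


def f1(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same units as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def table_f_score(
    snv_recall: float,
    snv_precision: float,
    indel_recall: Optional[float] = None,
    indel_precision: Optional[float] = None,
) -> float:
    """Average of the SNV F1 and the indel F1.

    Assumption: when indel metrics are missing (NA), only the SNV F1 is used.
    """
    scores = [f1(snv_recall, snv_precision)]
    if indel_recall is not None and indel_precision is not None:
        scores.append(f1(indel_recall, indel_precision))
    return sum(scores) / len(scores)


# Titration dataset, full Pisces Suite (values in percent, from Table 1).
print(round(table_f_score(99.9, 99.1, 97.9, 91.3), 1))  # 97.0
```

For the Titration dataset with the full Pisces Suite, the SNV F1 is about 99.5 and the indel F1 about 94.5, giving an F-score of about 97.0, in line with the table.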
References

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

2.  A 2015 update on predictive molecular pathology and its role in targeted cancer therapy: a review focussing on clinical relevance.

Authors:  M Dietel; K Jöhrens; M V Laffert; M Hummel; H Bläker; B M Pfitzner; A Lehmann; C Denkert; S Darb-Esfahani; D Lenze; F L Heppner; A Koch; C Sers; F Klauschen; I Anagnostopoulos
Journal:  Cancer Gene Ther       Date:  2015-09-11       Impact factor: 5.987

3.  Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection.

Authors:  Daniel C Koboldt; David E Larson; Richard K Wilson
Journal:  Curr Protoc Bioinformatics       Date:  2013-12

4.  Amplicon sequencing of colorectal cancer: variant calling in frozen and formalin-fixed samples.

Authors:  Johannes Betge; Grainne Kerr; Thilo Miersch; Svenja Leible; Gerrit Erdmann; Christian L Galata; Tianzuo Zhan; Timo Gaiser; Stefan Post; Matthias P Ebert; Karoline Horisberger; Michael Boutros
Journal:  PLoS One       Date:  2015-05-26       Impact factor: 3.240

5.  Integrating next-generation sequencing into clinical oncology: strategies, promises and pitfalls.

Authors:  Peter Horak; Stefan Fröhling; Hanno Glimm
Journal:  ESMO Open       Date:  2016-11-18

6.  A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree.

Authors:  Michael A Eberle; Epameinondas Fritzilas; Peter Krusche; Morten Källberg; Benjamin L Moore; Mitchell A Bekritsky; Zamin Iqbal; Han-Yu Chuang; Sean J Humphray; Aaron L Halpern; Semyon Kruglyak; Elliott H Margulies; Gil McVean; David R Bentley
Journal:  Genome Res       Date:  2016-11-30       Impact factor: 9.043

7.  Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data.

Authors:  Sarah Sandmann; Aniek O de Graaf; Mohsen Karimi; Bert A van der Reijden; Eva Hellström-Lindberg; Joop H Jansen; Martin Dugas
Journal:  Sci Rep       Date:  2017-02-24       Impact factor: 4.379

8.  LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets.

Authors:  Andreas Wilm; Pauline Poh Kim Aw; Denis Bertrand; Grace Hui Ting Yeo; Swee Hoe Ong; Chang Hua Wong; Chiea Chuen Khor; Rosemary Petric; Martin Lloyd Hibberd; Niranjan Nagarajan
Journal:  Nucleic Acids Res       Date:  2012-10-12       Impact factor: 16.971

9.  Clinical Next Generation Sequencing for Precision Medicine in Cancer.

Authors:  Ling Dong; Wanheng Wang; Alvin Li; Rina Kansal; Yuhan Chen; Hong Chen; Xinmin Li
Journal:  Curr Genomics       Date:  2015-08       Impact factor: 2.236

10.  VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research.

Authors:  Zhongwu Lai; Aleksandra Markovets; Miika Ahdesmaki; Brad Chapman; Oliver Hofmann; Robert McEwen; Justin Johnson; Brian Dougherty; J Carl Barrett; Jonathan R Dry
Journal:  Nucleic Acids Res       Date:  2016-04-07       Impact factor: 16.971
