amplimap: a versatile tool to process and analyze targeted NGS data.

Nils Koelling, Marie Bernkopf, Eduardo Calpena, Geoffrey J Maher, Kerry A Miller, Hannah K Ralph, Anne Goriely, Andrew O M Wilkie

Abstract

SUMMARY: amplimap is a command-line tool to automate the processing and analysis of data from targeted next-generation sequencing experiments with PCR-based amplicons or capture-based enrichment systems. From raw sequencing reads, amplimap generates output such as read alignments, annotated variant calls, target coverage statistics and variant allele counts and frequencies for each target base pair. In addition to its focus on user-friendliness and reproducibility, amplimap supports advanced features such as consensus base calling for read families based on unique molecular identifiers and filtering false positive variant calls caused by amplification of off-target loci.
AVAILABILITY AND IMPLEMENTATION: amplimap is available as a free Python package under the open-source Apache 2.0 License. Documentation, source code and installation instructions are available at https://github.com/koelling/amplimap.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31350555      PMCID: PMC6954648          DOI: 10.1093/bioinformatics/btz582

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Targeted next-generation sequencing (NGS), for example from PCR-generated amplicons or capture-based methods, is widely used for screening of candidate disease genes in patient cohorts (Fenwick et al., 2016) or for quantification of variant allele frequencies (VAFs) to detect allele-specific expression or mosaic mutations (Bernkopf et al., 2017; Reijnders et al., 2018). Recently, targeted NGS techniques have also been extended to redundantly sequence the same original molecule of DNA multiple times to achieve very low error rates (Salk et al., 2018). This enables the detection of somatic, sub-clonal mutations from cancer samples or mosaicism down to low levels (Acuna-Hidalgo et al., 2017; Maher et al., 2018). These high-fidelity protocols typically rely on the inclusion of unique molecular identifier (UMI) sequences, for example with single-molecule molecular inversion probes (smMIPs; Hiatt et al., 2013). However, significant computational work is needed to translate the raw sequencing reads generated by these protocols into interpretable genomic data, such as variant calls or VAFs. In practice, the processing and analysis of targeted NGS data often involves custom scripts, written specifically for the experimental design and dataset. Thus, each new analysis requires a significant amount of hands-on work from computational specialists and may be difficult to reproduce or repeat later. Furthermore, a common challenge is the unintended amplification of highly homologous loci, such as pseudogenes (Claes and De Leeneer, 2014). These loci may be amplified when primers inadvertently hybridize to highly homologous regions, creating chimeric reads that may lead to false variant calls (Fig. 1a). Currently, such false positives are often only identified through manual comparison to pseudogene sequences.
Fig. 1. Illustration of custom features in amplimap. (a) False positive variants due to off-target events: Amplification of pseudogenes may generate chimeric reads that match the target gene better than the pseudogene, resulting in misalignment (left alignment, shown in green/dark grey) and pseudogene-specific bases being called as variants. (b) Trimming primers before alignment helps detect chimeric reads generated by off-target events: The trimmed read aligns to the pseudogene (right alignment, shown in green/dark grey), avoiding a false positive variant call. (c) Consensus calls are determined within each read family and filtered with user-defined stringency thresholds, resulting in more accurate allele counts. (Color version of this figure is available at Bioinformatics online.)

Here, we describe amplimap, a versatile tool that solves these challenges and enables the efficient processing and analysis of data from targeted NGS experiments.

2 Features and implementation

amplimap was developed in close collaboration with experimental scientists to ensure maximum user-friendliness and wide applicability. Processing a dataset only requires running a single command. Analyses can be customized in a variety of ways to fit the exact experiment performed. Input files are automatically checked for problems and tutorials are provided to act as a blueprint for common experiments. Results are provided as tables in CSV format, which can easily be loaded into Python, R or Excel for further analysis. The main component of amplimap is the amplimap command-line executable, which has been tested with Python 3.5+ on current versions of Linux, macOS and Windows (through the Windows Subsystem for Linux). Internally, amplimap uses the Snakemake workflow management package (Köster and Rahmann, 2012) to automate the efficient execution of external software as well as custom code. An overview of the pipeline and its input and output files is available from https://amplimap.readthedocs.io.
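
Because results are written as CSV tables, they can be inspected with a few lines of Python. The sketch below is illustrative only: the column names (`sample`, `target`, `ref_count`, `alt_count`) and the inline data are hypothetical and are not taken from amplimap's actual output format, which is described in its documentation.

```python
import csv
import io

# Hypothetical table contents; real amplimap output uses its own column names.
EXAMPLE_CSV = """sample,target,ref_count,alt_count
S1,geneA_ex1,980,20
S2,geneA_ex1,1000,0
S3,geneA_ex1,0,0
"""

def load_allele_fractions(handle):
    """Read a CSV table and return {(sample, target): alternate allele fraction}."""
    fractions = {}
    for row in csv.DictReader(handle):
        ref = int(row["ref_count"])
        alt = int(row["alt_count"])
        total = ref + alt
        # Positions without any coverage get None instead of a fraction.
        fractions[(row["sample"], row["target"])] = alt / total if total else None
    return fractions

# In practice the handle would come from open("some_table.csv", newline="").
fractions = load_allele_fractions(io.StringIO(EXAMPLE_CSV))
```

The same pattern applies to any of the tables once the real column names are substituted.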

2.1 Annotated variant table and target coverage data

amplimap can create an aggregate table of all germline variants in all samples, including annotation such as the affected gene, the predicted deleteriousness of the variant and its frequency in reference populations. Additional target coverage tables give an overview of how thoroughly each target was sequenced in each sample.

2.2 Primer trimming and detection of off-target events

Primer sequences can be detected and trimmed by amplimap before alignment. This helps remove false positive variant calls caused by off-target amplification (Fig. 1b). The locations of off-target alignments are reported to allow further investigation. In addition, primer trimming also corrects skewed VAFs in regions where primers overlap another amplicon.
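
The idea behind primer trimming can be sketched in a few lines. This is a simplified illustration and not amplimap's implementation: the function names, the mismatch tolerance and the example sequences are all assumptions made for the example. The point is that once the primer is removed, the remaining insert has to align on its own sequence, so a chimeric read derived from a pseudogene is no longer anchored to the intended target by its primer.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def trim_primer(read, primer, max_mismatches=1):
    """Remove an expected 5' primer from a read.

    Returns the remaining insert sequence, or None if the read does not
    start with the primer within the allowed number of mismatches.
    """
    prefix = read[:len(primer)]
    if len(prefix) < len(primer) or hamming(prefix, primer) > max_mismatches:
        return None
    return read[len(primer):]

primer = "ACGTAC"                                # hypothetical primer sequence
trimmed = trim_primer("ACGTACGGGTTT", primer)    # exact primer match
tolerant = trim_primer("ACGAACGGGTTT", primer)   # one mismatch is tolerated
rejected = trim_primer("TTTTTTGGGTTT", primer)   # no primer at the read start
```

Only the insert sequence returned by `trim_primer` would then be passed on to alignment.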

2.3 Read family consensus calls

To enable high-fidelity VAF quantification, amplimap can group reads based on identical UMIs (e.g. from smMIPs). When calculating allele counts and fractions, amplimap uses these read families instead of the individual reads to obtain more accurate, error-corrected consensus calls (Fig. 1c). The stringency of the consensus calls can be adjusted by specifying the minimum number of reads required to form a read family as well as the minimum fraction of reads supporting a consensus call. amplimap also calculates metrics such as the number of reads per family to provide quality control information and support future experimental design.
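
The two stringency settings described above can be illustrated with a minimal sketch of consensus calling at a single position. This is not amplimap's code: the function name, parameter names, default thresholds and example UMIs are assumptions chosen for the illustration.

```python
from collections import Counter, defaultdict

def consensus_calls(observations, min_family_size=2, min_fraction=0.75):
    """Count one consensus base per read family at a single position.

    observations: iterable of (umi, base) pairs, one per read.
    Families smaller than min_family_size, or whose most common base is
    supported by less than min_fraction of their reads, are discarded.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    counts = Counter()
    for bases in families.values():
        if len(bases) < min_family_size:
            continue  # too few reads to form a family
        base, support = Counter(bases).most_common(1)[0]
        if support / len(bases) >= min_fraction:
            counts[base] += 1  # each accepted family contributes one count
    return counts

observations = [
    ("AAAC", "G"), ("AAAC", "G"), ("AAAC", "G"),                 # unanimous G
    ("CCGT", "G"), ("CCGT", "T"), ("CCGT", "G"),
    ("CCGT", "G"), ("CCGT", "G"),                                # 4 of 5 reads support G
    ("TTAG", "T"), ("TTAG", "T"),                                # unanimous T
    ("GGCA", "T"),                                               # single read, dropped
    ("ACAC", "G"), ("ACAC", "T"),                                # tie, no consensus
]
counts = consensus_calls(observations)
vaf_t = counts["T"] / sum(counts.values())  # fraction of families supporting T
```

In this toy input the isolated `T` in family `CCGT` is treated as an error and does not contribute to the allele counts, whereas counting raw reads would have included it.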

2.4 Tutorials

Tutorials are available from https://amplimap.readthedocs.io showing how to apply amplimap in three different contexts: variant calling in a patient panel, quantification of allele-specific expression and the identification of low-frequency somatic mutations with UMI-tagged smMIPs.

References

1.  Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation.

Authors:  Joseph B Hiatt; Colin C Pritchard; Stephen J Salipante; Brian J O'Roak; Jay Shendure
Journal:  Genome Res       Date:  2013-02-04       Impact factor: 9.043

2.  Snakemake--a scalable bioinformatics workflow engine.

Authors:  Johannes Köster; Sven Rahmann
Journal:  Bioinformatics       Date:  2012-08-20       Impact factor: 6.937

3.  Ultra-sensitive Sequencing Identifies High Prevalence of Clonal Hematopoiesis-Associated Mutations throughout Adult Life.

Authors:  Rocio Acuna-Hidalgo; Hilal Sengul; Marloes Steehouwer; Maartje van de Vorst; Sita H Vermeulen; Lambertus A L M Kiemeney; Joris A Veltman; Christian Gilissen; Alexander Hoischen
Journal:  Am J Hum Genet       Date:  2017-06-29       Impact factor: 11.025

4.  Dealing with pseudogenes in molecular diagnostics in the next-generation sequencing era.

Authors:  Kathleen B M Claes; Kim De Leeneer
Journal:  Methods Mol Biol       Date:  2014

5.  Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations.

Authors:  Jesse J Salk; Michael W Schmitt; Lawrence A Loeb
Journal:  Nat Rev Genet       Date:  2018-03-26       Impact factor: 53.242

6.  Quantification of transmission risk in a male patient with a FLNB mosaic mutation causing Larsen syndrome: Implications for genetic counseling in postzygotic mosaicism cases.

Authors:  Marie Bernkopf; David Hunt; Nils Koelling; Tim Morgan; Amanda L Collins; Joanna Fairhurst; Stephen P Robertson; Andrew G L Douglas; Anne Goriely
Journal:  Hum Mutat       Date:  2017-07-06       Impact factor: 4.878

7.  Mutations in CDC45, Encoding an Essential Component of the Pre-initiation Complex, Cause Meier-Gorlin Syndrome and Craniosynostosis.

Authors:  Aimee L Fenwick; Maciej Kliszczak; Fay Cooper; Jennie Murray; Luis Sanchez-Pulido; Stephen R F Twigg; Anne Goriely; Simon J McGowan; Kerry A Miller; Indira B Taylor; Clare Logan; Sevcan Bozdogan; Sumita Danda; Joanne Dixon; Solaf M Elsayed; Ezzat Elsobky; Alice Gardham; Mariette J V Hoffer; Marije Koopmans; Donna M McDonald-McGinn; Gijs W E Santen; Ravi Savarirayan; Deepthi de Silva; Olivier Vanakker; Steven A Wall; Louise C Wilson; Ozge Ozalp Yuregir; Elaine H Zackai; Chris P Ponting; Andrew P Jackson; Andrew O M Wilkie; Wojciech Niedzwiedz; Louise S Bicknell
Journal:  Am J Hum Genet       Date:  2016-06-30       Impact factor: 11.025

8.  De Novo and Inherited Loss-of-Function Variants in TLK2: Clinical and Genotype-Phenotype Evaluation of a Distinct Neurodevelopmental Disorder.

Authors:  Margot R F Reijnders; Kerry A Miller; Mohsan Alvi; Jacqueline A C Goos; Melissa M Lees; Anna de Burca; Alex Henderson; Alison Kraus; Barbara Mikat; Bert B A de Vries; Bertrand Isidor; Bronwyn Kerr; Carlo Marcelis; Caroline Schluth-Bolard; Charu Deshpande; Claudia A L Ruivenkamp; Dagmar Wieczorek; Diana Baralle; Edward M Blair; Hartmut Engels; Hermann-Josef Lüdecke; Jacqueline Eason; Gijs W E Santen; Jill Clayton-Smith; Kate Chandler; Katrina Tatton-Brown; Katelyn Payne; Katherine Helbig; Kelly Radtke; Kimberly M Nugent; Kirsten Cremer; Tim M Strom; Lynne M Bird; Margje Sinnema; Maria Bitner-Glindzicz; Marieke F van Dooren; Marielle Alders; Marije Koopmans; Lauren Brick; Mariya Kozenko; Megan L Harline; Merel Klaassens; Michelle Steinraths; Nicola S Cooper; Patrick Edery; Patrick Yap; Paulien A Terhal; Peter J van der Spek; Phillis Lakeman; Rachel L Taylor; Rebecca O Littlejohn; Rolph Pfundt; Saadet Mercimek-Andrews; Alexander P A Stegmann; Sarina G Kant; Scott McLean; Shelagh Joss; Sigrid M A Swagemakers; Sofia Douzgou; Steven A Wall; Sébastien Küry; Eduardo Calpena; Nils Koelling; Simon J McGowan; Stephen R F Twigg; Irene M J Mathijssen; Christoffer Nellaker; Han G Brunner; Andrew O M Wilkie
Journal:  Am J Hum Genet       Date:  2018-05-31       Impact factor: 11.025

9.  Selfish mutations dysregulating RAS-MAPK signaling are pervasive in aged human testes.

Authors:  Geoffrey J Maher; Hannah K Ralph; Zhihao Ding; Nils Koelling; Hana Mlcochova; Eleni Giannoulatou; Pawan Dhami; Dirk S Paul; Stefan H Stricker; Stephan Beck; Gilean McVean; Andrew O M Wilkie; Anne Goriely
Journal:  Genome Res       Date:  2018-10-24       Impact factor: 9.043
