Literature DB >> 28484262

BAMClipper: removing primers from alignments to minimize false-negative mutations in amplicon next-generation sequencing.

Chun Hang Au1, Dona N Ho1, Ava Kwong2,3,4, Tsun Leung Chan1, Edmond S K Ma5.   

Abstract

Amplicon-based next-generation sequencing (NGS) has been widely adopted for genetic variation detection in human and other organisms. Conventional data analysis paradigm includes primer trimming before read mapping. Here we introduce BAMClipper that removes primer sequences after mapping original sequencing reads by soft-clipping SAM/BAM alignments. Mutation detection accuracy was affected by the choice of primer handling approach based on real NGS datasets of 7 human peripheral blood or breast cancer tissue samples with known BRCA1/BRCA2 mutations and >130000 simulated NGS datasets with unique mutations. BAMClipper approach detected a BRCA1 deletion (c.1620_1636del) that was otherwise missed due to edge effect. Simulation showed high false-negative rate when primers were perfectly trimmed as in conventional practice. Among the other 6 samples, variant allele frequencies of 5 BRCA1/BRCA2 mutations (indel or single-nucleotide variants) were diluted by apparently wild-type primer sequences from an overlapping amplicon (17 to 82% under-estimation). BAMClipper was robust in both situations and all 7 mutations were detected. When compared with Cutadapt, BAMClipper was faster and maintained equally high primer removal effectiveness. BAMClipper is implemented in Perl and is available under an open source MIT license at https://github.com/tommyau/bamclipper.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28484262      PMCID: PMC5431517          DOI: 10.1038/s41598-017-01703-6

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Amplicon-based next-generation sequencing (NGS) has been one of the major platforms for high-throughput germline[1, 2] and somatic mutation detection[3-7]. Various experimental strategies[8], including traditional or multiplex polymerase chain reaction (PCR) and selective circularization probes, all converge to a common molecular design of NGS sequencing library (Fig. 1A).
Figure 1

Amplicon library design and bioinformatics approaches of handling gene-specific primers. (A) Gene-specific primer sequences are present as part of NGS reads. The observed primer sequences are usually identical to reference genome sequence but may be slightly different due to errors in primer synthesis and/or sequencing. Common read mapping tools are not aware of the amplicon library design and thus map the primers as if they are part of region of interest that lies in between. Although sequencing adapters may exist as part of NGS reads (depending of amplicon length and sequencing read length), they will become soft-clipped after mapping due to the lack of similarity to reference genome by design. Soft-clipped part of alignments is usually ignored by downstream processing. (B) In primer handling approach 1, primer sequence was trimmed from sequencing reads (in FASTQ format) and the shorter trimmed reads are mapped to give BAM alignments for downstream variant calling and quality control such as sequencing depth statistics. In approach 2, original reads are directly mapped that primers are present in BAM alignments as if they are part of region of interest. In approach 3 represented by BAMClipper, reads are also directly mapped but BAM alignments are further processed to soft-clip primer sequences as if they were sequencing adapters.

Amplicon library design and bioinformatics approaches of handling gene-specific primers. (A) Gene-specific primer sequences are present as part of NGS reads. The observed primer sequences are usually identical to reference genome sequence but may be slightly different due to errors in primer synthesis and/or sequencing. Common read mapping tools are not aware of the amplicon library design and thus map the primers as if they are part of region of interest that lies in between. Although sequencing adapters may exist as part of NGS reads (depending of amplicon length and sequencing read length), they will become soft-clipped after mapping due to the lack of similarity to reference genome by design. Soft-clipped part of alignments is usually ignored by downstream processing. (B) In primer handling approach 1, primer sequence was trimmed from sequencing reads (in FASTQ format) and the shorter trimmed reads are mapped to give BAM alignments for downstream variant calling and quality control such as sequencing depth statistics. In approach 2, original reads are directly mapped that primers are present in BAM alignments as if they are part of region of interest. In approach 3 represented by BAMClipper, reads are also directly mapped but BAM alignments are further processed to soft-clip primer sequences as if they were sequencing adapters. The NGS reads usually contain both gene-specific primers (from the reagent) and region of interest (ROI, from DNA sample of interest) and have fixed start positions defined by primer sites. Common read mapping tools map primer sequences in the same way as ROI because of their equally high similarity to reference genome sequence. Common bioinformatics workflows may thus include a primer trimming step before mapping to remove primer sequence from NGS reads and avoid potential interference (approach 1, Fig. 1B), for example variant allele frequency (VAF) dilution by apparently wild-type primer sequences. However, variant calling edge effect occurred for single-nucleotide variants (SNV) such that SNV in ROI near primer sites were prone to escape variant calling[9]. In some specific situations, primer trimming is not necessary (approach 2, Fig. 1B). Some genes of clinical importance have well-known hotspot mutational profile[10] (e.g. gain-of-function mutations of EGFR and KRAS) and primers are designed to avoid any potential edge effect. In case of short exons or ROI, amplicon tiling is not needed so that ROI of one amplicon is not overlapped with and not interfered by primer of another amplicon. However, there are genes having diverse mutational profile[10] (e.g. loss-of-function mutations of BRCA1 and BRCA2) or less characterized profile and full exons are usually covered by tiling amplicons. For gene panels having these 2 kinds of genes, both approaches 1 and 2 are needed to complement each other for comprehensive mutation detection. Mutation detection accuracy is maximized, but computing and manual interpretation efforts are doubled. In this study, we devised primer clipping after mapping as an alternative approach of primer removal (approach 3, Fig. 1B). Existing tools of approach 3 include PcrClipReads (https://github.com/lindenb/jvarkit/wiki/PcrClipReads) that performs primer clipping on non-overlapping amplicons by design. GATK[11] ClipReads performs primer clipping based on exact sequence matching regardless of its position within sequencing reads. Thus, GATK ClipReads will improperly handle overlapping amplicons due to sequence cross-matching and cannot recognize primers in case of any mismatch (sequencing errors for example). MiSeq Reporter Custom Amplicon Workflow (https://www.illumina.com/systems/miseq/software/miseq-reporter.html) properly clips primers from overlapping amplicons but the software is proprietary and supports specific sequencing platforms and commercial amplicon designs only. We implemented a tool called BAMClipper, which natively supports overlapping amplicons and any gene panel design. During evaluation of a breast and ovarian cancer gene panel for peripheral blood and breast cancer tumour tissue, BAMClipper approach detected BRCA1 and BRCA2 mutations that could otherwise be missed in conventional approaches due to edge or dilution effect. By experiments and simulation, insertion or deletions (indels) were shown to be susceptible to variant calling edge effect as in SNV. In addition to improving mutation detection accuracy, BAMClipper was more efficient in computation and maintained equally high primer removal effectiveness when compared with conventional primer trimming approach.

Results

Variant calling edge effect when primers are trimmed

During evaluation of an amplicon-based breast and ovarian cancer gene panel with positive controls, we encountered a false-negative result for a BRCA1 17 nt deletion (c.1620_1636del). BWA-MEM alignments of the corresponding ROI were found to be soft-clipped when gene-specific primers were trimmed from NGS sequencing reads before read mapping (approach 1, Fig. 2). The soft-clipping removed wild-type sequence adjacent to the deletion site from read alignments of deletion allele. Alignment pileup therefore did not contain any deletion events for conventional variant callers to report. When primer trimming was skipped (approach 2) or primer clipping was performed after mapping (approach 3), the deletion was properly aligned and detected by variant callers (Fig. 2). These observations remained the same when the alignment algorithm was replaced with BowTie 2 (see Supplementary Fig. S1). Since SNV were known to be susceptible to variant calling edge effects in amplicon sequencing[9], we hypothesized that indels were also susceptible to similar edge effects.
Figure 2

A BRCA1 deletion escaped from variant calling when primers were trimmed before mapping. NGS read alignments of BRCA1 c.1620_1636del allele from three primer handling approaches are shown in conjunction with the amplicon design and reference genome sequence. Individual forward and reverse sequencing reads after any soft-clipping were represented by red and purple horizontal lines, respectively. The expected deletion event (black box) was present in the alignments from approaches 2 and 3 only.

A BRCA1 deletion escaped from variant calling when primers were trimmed before mapping. NGS read alignments of BRCA1 c.1620_1636del allele from three primer handling approaches are shown in conjunction with the amplicon design and reference genome sequence. Individual forward and reverse sequencing reads after any soft-clipping were represented by red and purple horizontal lines, respectively. The expected deletion event (black box) was present in the alignments from approaches 2 and 3 only. We simulated NGS reads representing 420 unique insertions and 420 unique deletions of different lengths (1 to 20 nt) at 21 different positions of the aforementioned BRCA1 amplicon (Fig. 3A). All three approaches of primer handling were applied to the simulated reads for variant calling. In approaches 1, 2 and 3, 40%, 100% and 93% of insertions were detected and 30%, 100%, 98% of deletions were detected, respectively (Fig. 3B). If primers were trimmed from sequencing reads before mapping, indels were susceptible to variant calling edge effects. Longer indels and indels closer to gene-specific primer boundary were more susceptible to the edge effects (Fig. 3C). Regarding the 29 insertions (7%) and 7 deletions (2%) missed in approach 3, although BAMClipper clipped the primer sequences properly and the corresponding indels remained in both alignments and alignment pileup, they were not called by variant caller SAMtools.
Figure 3

Indels are susceptible to variant calling edge effects as shown by simulation. (A) Simulation scheme of 420 insertions and 420 deletions with 20 different lengths at 21 different positions. (B) Venn diagram of called insertions or deletions in 3 approaches of primer handling. (C) Heat map of length and position of called insertions or deletions.

Indels are susceptible to variant calling edge effects as shown by simulation. (A) Simulation scheme of 420 insertions and 420 deletions with 20 different lengths at 21 different positions. (B) Venn diagram of called insertions or deletions in 3 approaches of primer handling. (C) Heat map of length and position of called insertions or deletions. To explore whether the edge effects also applied to overlapping amplicons, we simulated NGS reads for 68045 unique insertions and 64637 unique deletions similarly (length 1 to 20 nt) at 3446 different positions for the entire BRCA1 exon 11. A total of 38 overlapping amplicons covered the region. In approaches 1, 2 and 3, 88.0%, 96.7% and 96.5% of insertions were detected and 76.9%, 96.7%, 93.2% of deletions were detected, respectively (see Supplementary Fig. S2). Similar edge effect was observed even in overlapping amplicon setting.

VAF dilution effect when primers are not removed

Since higher indel detection rate was demonstrated in approaches 2 and 3, we evaluated these approaches with 6 additional BRCA1/BRCA2 positive controls (Table 1). Although all 6 mutations were detected in both approaches 2 and 3, the variant allele frequencies (VAF) of approach 2 (range: 9–60%) were consistently lower than approach 3 (range: 46–74%). Five of the 6 mutations were found to overlap with a gene-specific primer of another overlapping amplicon owing to the tiling amplicon design (Fig. 4). Primer sequences remained in approach 2 contributed to wild-type allele frequency and in turn underestimated VAF within region of interest of other overlapping amplicons. VAF underestimation ranged from 17% to 82% (Fig. 4) and VAF was not affected by BAMClipper if the mutation did not overlap with other primer site (Table 1). Since approach 3 properly handled primer sequences in NGS reads, downstream VAF determination was not interfered by overlapping primer sites. Overall, although approach 2 was robust to detect indels near primer, it was susceptible to VAF dilution effect. Approach 3 was versatile to detect indels near primer and mutations overlapping with other primer sequences.
Table 1

Variant allele frequency underestimation due to overlapping primer site.

SampleMutationVAFOverlap with other primer site?VAF underestimation
Approach 2Approach 3
NDH1 BRCA1 c.4372C>T9%51%Yes82%
PMH1 BRCA2 c.8023A>G19%49%Yes61%
TWH1 BRCA2 c.1261C>T50%74%Yes32%
TWH2 BRCA2 c.3109C>T51%51%No0%
TWH3 BRCA1 c.502A>T17%46%Yes63%
QMH1 BRCA1 c.3858_3861del60%72%Yes17%
Figure 4

Dilution of variant allele frequency when primers are not clipped after mapping. NGS read alignments in BRCA1 c.4372C>T region from primer handling approaches 2 and 3 are shown in conjunction with the amplicon design and reference genome sequence. The c.4372C>T mutation is located in the region of interest of one amplicon and gene-specific primer site of another amplicon. Since primer sequences retained in approach 2 contributed to wild-type allele frequency, VAF of c.4372C>T was in turn underestimated by 82% (9% in approach 2 and 51% in approach 3).

Variant allele frequency underestimation due to overlapping primer site. Dilution of variant allele frequency when primers are not clipped after mapping. NGS read alignments in BRCA1 c.4372C>T region from primer handling approaches 2 and 3 are shown in conjunction with the amplicon design and reference genome sequence. The c.4372C>T mutation is located in the region of interest of one amplicon and gene-specific primer site of another amplicon. Since primer sequences retained in approach 2 contributed to wild-type allele frequency, VAF of c.4372C>T was in turn underestimated by 82% (9% in approach 2 and 51% in approach 3).

Effective primer removal and efficient computing

Performance characteristics of BAMClipper were further evaluated by various scenarios of primer clipping. Cutadapt (a representative tool of approach 1) was also applied to the same scenarios for benchmarking. BAMClipper removed 99.96% of gene-specific primers without affecting the regions of interest (i.e. without over-clipping), achieving comparable primer removal efficiency as Cutadapt (99.82% of gene-specific primers removed) (see Supplementary Fig. S3). For genes with well-known hotspot mutation profile, mutation detection sensitivity of approach 3 was as good as approaches 1 and 2 (see Supplementary Table S3), with one notable difference: the 52 nt hotspot deletion in CALR was detected in approaches 2 and 3 but not approach 1 due to similar edge effect. BAMClipper was consistently faster than Cutadapt, particularly for larger amplicon panel, longer sequencing reads and more sequencing reads (see Supplementary Fig. S3).

Discussion

Amplicon-based next-generation sequencing has been widely adopted for genomic DNA mutation detection in both clinical diagnostics and research settings. Common bioinformatics workflows utilize tools originally designed for random fragment sequencing library (e.g. whole-genome sequencing, exome sequencing by hybridization capture). Although the tools are largely compatible with amplicon library, bioinformatics challenges exist in some scenarios, such as the detection of FLT3 internal tandem duplication in acute myeloid leukaemia by amplicon sequencing[3]. In this study, we presented the bioinformatics challenges encountered during evaluation of a breast and ovarian cancer gene panel, namely variant calling edge effect and variant allele frequency dilution effect when handling gene-specific primers in amplicon sequencing library. Edge effect led to a false-negative result of a BRCA1 17 nt deletion if primer trimming was performed according to common bioinformatics workflows. We further showed that such edge effect was not specific to this particular mutation but generally applicable to both single amplicon setting (65.1% false-negative among 840 unique indels) and overlapping amplicon setting (17.4% false-negative among 132682 unique indels). Indels were thus shown to be susceptible to similar edge effect reported for single-nucleotide variants[9]. When primer trimming was skipped for a panel with amplicon tiling, evaluation of 6 additional BRCA1/BRCA2 mutations revealed that 5 of the mutations were susceptible to VAF dilution effect (under-estimation by 17% to 82%). The dilution effect applied to both SNV and indels. Since the evaluated breast and ovarian cancer panel used amplicon tiling to cover protein-coding regions, 31% of ROI are overlapping with primer of other amplicon(s) and susceptible to the dilution effect. Whether a mutation affected by dilution effect would be missed by variant calling might depend on the variant calling algorithms (e.g. any VAF threshold) and sequencing depth uniformity among overlapping amplicons. Even though the 6 tumour DNA samples were derived from patients with a germline mutation (VAF >30% expected), VAF of a mutation dropped to below 10%. The situation represented possible false-negative scenarios of two common primer handling approaches in germline and somatic mutation detection due to the two effects. Suitability of primer trimming depends on the genes of interest. For genes with well-known hotspot mutational profile (e.g. gain-of-function mutations of EGFR and KRAS) and their hotspot regions well covered by primers without amplicon tiling, primer trimming is not mandatory. However, for genes with diverse mutational profile (e.g. loss-of-function mutations of BRCA1 and BRCA2) that necessitated their whole coding regions to be covered by amplicon tiling, primer trimming avoided dilution effect at the expense of sensitivity of mutations near primers due to edge effect. Taking the evaluated breast and ovarian cancer panel as an example, 40 nt of region of interest (20 nt of each end) for each amplicon was susceptible to edge effects if indels up to 20 nt are targeted. Since the mean ROI length is 110 nt for this panel to accommodate somatic mutation detection from degraded DNA (e.g. formalin-fixed paraffin-embedded tissue), 36% of ROI are susceptible to edge effect. When a single panel comprise both kinds of genes or genes with less characterized mutational profile, both primer handling approaches were needed to complement each other. We demonstrated that primer clipping was a general approach for mutation detection in both genes with diverse mutational profile and genes with well-known hotspot mutational profile (Fig. 2 and Supplementary Table S1). Although some commercial gene panel products implemented proprietary primer clipping solution to handle both kinds of genes properly (e.g. MiSeq Reporter results of TruSight Myeloid panel did not need redundant primer clipping by BAMClipper), there was a lack of general bioinformatics tool for primer clipping of most other gene panel designs. We implemented BAMClipper as a novel approach of primer handling. By clipping primers after mapping, mutation detection accuracy was improved by minimizing false-negative results caused by edge effect and dilution effect. Well-established variant calling and quality control tools (e.g. SAMtools and bam-readcount) were tested to be compatible with this approach. Although the junction of primer and ROI was shown to be a variant calling blind spot by simulation of a single amplicon (even though the primers were properly clipped), the junction was covered by another amplicon in the evaluated panel that ROI overlapped by at least 1 nt. Since amplicon tiling was a common practice in commercial or custom panel design, few existing panels were actually affected. The choice of primer handling approaches was shown to affect germline and somatic mutation detection accuracy. False-negative BRCA1/BRCA2 mutation could affect the management of a high-risk cancer patient and his/her family members and could result in a missed opportunity of targeted therapy (e.g. poly(ADP-ribose) polymerases inhibitors in breast and ovarian cancers[12]). Since BAMClipper was implemented in Perl and did not involve new file format, it could be deployed in existing bioinformatics workflows with minimal efforts. It was computation-efficient because the primers were already aligned by mapping tools and BAMClipper manipulated the alignments. Computing time saved by mapping shorter trimmed reads in conventional approach was usually exceeded by the trimming process. Even though advances in trimming algorithms could result in perfect trimming with minimal computing time, edge effect could persist as shown in the simulation. BAMClipper maintained the same high level of primer removal effectiveness as conventional trimming approach. We expected the new primer handling approach could improve amplicon-based germline and somatic mutation detection workflows. To conclude, primers should be removed after mapping original sequencing reads for optimized genetic variation detection by amplicon sequencing. BAMClipper is an open-source, fast and effective tool to remove primers from mapped alignments of amplicon sequencing reads.

Methods

Implementation of BAMClipper algorithm

The BAMClipper algorithm was implemented mainly in Perl to read mapped alignments (standard SAM/BAM format[13]) and corresponding primer design (standard BEDPE format[14]) and return modified alignments in the same format. Gene-specific primer sequences were identified by comparing mapped positions of input alignments and reference positions of primer design (no de novo sequence alignment process was involved and thus the algorithm tolerated sequencing errors in observed primer sequences). Input alignments (a series of Concise Idiosyncratic Gapped Alignment Report or CIGAR operators in the alignment field of SAM/BAM format) were modified so that primer sequences were represented as S operators instead of M, I or D operators, as if they were sequencing adapters not matching to the reference genome and hence became soft-clipped during mapping. Output primer-clipped alignments remained in the standard BAM format that no modification of downstream variant calling (e.g. SAMtools[13] and GATK[11]) or quality control (e.g. coverage-depth analysis by bam-readcount) was needed for analysis. BAMClipper was tested to be sequencing platform-independent by clipping primers from amplicon sequencing reads from both Illumina MiSeq and Ion Torrent PGM platforms. It was shown to be aligner-independent by processing BWA-MEM and BowTie 2 alignments of same sequencing data. It was tested on CentOS Linux 5.5 and depended on the built-in Perl 5 installation of any Linux/Unix-like environment without compilation. A multi-threaded pipeline was also implemented (a single shell script bamclipper.sh) to leverage multiple processor cores and required SAMtools[13] and GNU Parallel (https://www.gnu.org/software/parallel/).

Germline and somatic mutation detection by a breast and ovarian cancer panel

Totally 7 positive control DNA samples were employed to evaluate the Qiagen GeneRead DNAseq v2 Human Breast Cancer Panel, which comprised 2915 amplicons targeting 44 genes. DNA was extracted from 1 peripheral blood sample and 6 frozen tumour tissue samples of 7 breast cancer patients with a known germline BRCA1/BRCA2 mutation recruited from Hong Kong Hereditary and High Risk Breast Cancer Programme and samples obtained from the tissue bank of Hong Kong Hereditary Breast Cancer Family Registry[1]. PCR amplicons were prepared according to manufacturer’s instructions followed by sequencing library preparation based on KAPA Library Preparation Kit for Illumina platforms version 1.13. Paired-end sequencing runs were performed on a MiSeq with reagent kit v3 (2 × 230 nucleotides (nt)). Primer sequences were trimmed from sequencing reads by Cutadapt[15] version 1.4.1 (parameters -m 0 -e 0.1 -n 2). Original sequencing reads or primer-trimmed reads were mapped to reference genome sequence using BWA-MEM[16] version 0.7.7 (default parameters) or BowTie 2[17] version 2.2.9 (default parameters). Variant calling and variant allele frequency (VAF) measurement was performed by SAMtools[13] version 1.3 and assisted by manual alignment examination using IGV[18] version 2.1.30.

Simulation of sequencing reads for insertions and deletions

In silico simulation targeted the single BRCA1 amplicon from the breast and ovarian cancer panel covering the BRCA1 mutation c.1620_1636del (amplicon: chr17:41245872-41246030; ROI excluding both gene-specific primers chr17:41245895-41246009). A single pair of sequencing reads was simulated for each unique insertion or deletion with 0% error rate and 100% VAF. Insertion sequence was based on a template without known similarity to reference genome (5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′) and 3′ trimmed to desirable length. Simulated reads of 840 unique indels were mapped to reference genome by BWA-MEM version 0.7.7 (default parameters) followed by variant calling using SAMtools version 1.3. Another similar simulation was performed for the entire BRCA1 exon 11 (chr17:41243442-41246887) in the same breast and ovariant cancer panel. Sequencing reads of all 38 overlapping amplicons were simulated for each unique insertion or deletion with 0% error rate and 100% VAF. Indels were excluded for simulation if they overlapped with primer binding site and sequencing reads could not represent the mutations.

Performance characterization

Performance of BAMClipper and Cutadapt was characterized using amplicon NGS datasets of a myeloid neoplasm gene panel with 568 amplicons (Illumina TruSight Myeloid sequencing panel). Data based on peripheral blood sample of a de-identified healthy adult with normal complete blood profile[3] was used to assess computing time with variable amplicon count, sequencing read length and sequencing read count, while data of the HapMap sample NA12878 (sample 2 of the public datasets “MiSeq v3: TruSight Myeloid (Coriell & HorizonDx, Pool 1)” available from https://basespace.illumina.com) was used to assess primer removal effectiveness. Data based on 8 mutation control samples from a previous study[3] were used to compare the mutation detection sensitivity of all 3 primer handling approaches. Identical Cutadapt parameters as in the previous Methods were employed to work on FASTQ sequencing reads. Default BAMClipper parameters were used to work on corresponding BWA-MEM alignments (default parameters with BWA-MEM computing time ignored). Wall-clock computing time of BAMClipper and Cutadapt was measured based on 16 CPU cores (Intel Xeon E5 processors) (see Supplementary Note). For primer removal effectiveness, BWA-MEM alignment depth of original sequencing reads (2 × 151 nt) of ROI or gene-specific primers was measured by bam-readcount (https://github.com/genome/bam-readcount) and served as reference. Reference positions of depth <500X were excluded from the following analysis. Alignment depth of BAMClipper and Cutadapt final BAM files was also measured similarly to calculate median decrease of depth after primer removal for each amplicon. Data from each amplicon were handled separately so that depth measurement was not interfered by any overlapping amplicons. For amplicon count, one million (1 M) read pairs (2 × 275 nt) were first randomly sampled. Full panel (568 amplicons, 5 replicates) or random sample of 50 to 550 amplicons (5 random sampling without replacement for each amplicon count) was passed to BAMClipper and Cutadapt for primer removal of the same 1 M read pairs. For sequencing read length, the same 1 M read pairs were further trimmed to pairs of 100, 125, 150, 175, 200, 225 or 250 nt sequencing reads using Seqtk (https://github.com/lh3/seqtk) (trimfq -q 1). For sequencing read count, 1000 to 1000000 sequencing read pairs (2 × 275 nt; 5 random sampling without replacement for each read count) were randomly sampled.

Reference sequences

Human reference genome sequence: GRCh37/hg19, BRCA1: NM_007294.3, BRCA2: NM_000059.3. Variants were described according to the recommendations of Human Genome Variation Society (HGVS)[19]. Variant descriptions were checked by Mutalyzer Name Checker[19].

Ethics statement

The study involving the breast and ovarian cancer panel was approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority West Cluster and other contributing hospitals in Hong Kong. The study involving the myeloid neoplasm panel was approved by the Research Ethics Committee of Hong Kong Sanatorium & Hospital (reference number: REC-2015-02). All participants gave written informed consent; with the exception that informed consent is not needed for the use of a pre-existing peripheral blood sample from a de-identified healthy adult and 8 de-identified mutation control samples. All methods were performed in accordance with the relevant guidelines and regulations of the institution.
  18 in total

1.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

2.  Homoharringtonine (omacetaxine mepesuccinate) as an adjunct for FLT3-ITD acute myeloid leukemia.

Authors:  Stephen S Y Lam; Eric S K Ho; Bai-Liang He; Wui-Wing Wong; Chae-Yin Cher; Nelson K L Ng; Cheuk-Him Man; Harinder Gill; Alice M S Cheung; Ho-Wan Ip; Chi-Chiu So; Jerome Tamburini; Chi Wai Eric So; Dona N Ho; Chun-Hang Au; Tsun-Leung Chan; Edmond S K Ma; Raymond Liang; Yok-Lam Kwong; Anskar Y H Leung
Journal:  Sci Transl Med       Date:  2016-10-05       Impact factor: 17.956

3.  Panel Testing for Familial Breast Cancer: Calibrating the Tension Between Research and Clinical Care.

Authors:  Ella R Thompson; Simone M Rowley; Na Li; Simone McInerny; Lisa Devereux; Michelle W Wong-Brown; Alison H Trainer; Gillian Mitchell; Rodney J Scott; Paul A James; Ian G Campbell
Journal:  J Clin Oncol       Date:  2016-01-19       Impact factor: 44.544

4.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

5.  BEDTools: a flexible suite of utilities for comparing genomic features.

Authors:  Aaron R Quinlan; Ira M Hall
Journal:  Bioinformatics       Date:  2010-01-28       Impact factor: 6.937

6.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

7.  Edge effects in calling variants from targeted amplicon sequencing.

Authors:  Ravi Vijaya Satya; John DiCarlo
Journal:  BMC Genomics       Date:  2014-12-05       Impact factor: 3.969

8.  COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors:  Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal:  Nucleic Acids Res       Date:  2014-10-29       Impact factor: 16.971

9.  Next-generation sequencing with a myeloid gene panel in core-binding factor AML showed KIT activation loop and TET2 mutations predictive of outcome.

Authors:  C Y Cher; G M K Leung; C H Au; T L Chan; E S K Ma; J P Y Sim; H Gill; A K W Lie; R Liang; K F Wong; L L P Siu; C S P Tsui; C C So; H W W Wong; S F Yip; H K K Lee; H S Y Liu; J S M Lau; T H Luk; C K Lau; S Y Lin; Y L Kwong; A Y H Leung
Journal:  Blood Cancer J       Date:  2016-07-08       Impact factor: 11.037

10.  Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms.

Authors:  Chun Hang Au; Anna Wa; Dona N Ho; Tsun Leung Chan; Edmond S K Ma
Journal:  Diagn Pathol       Date:  2016-01-22       Impact factor: 2.644

View more
  10 in total

1.  Clonal Hematopoiesis-Associated Gene Mutations in a Clinical Cohort of 448 Patients With Ovarian Cancer.

Authors:  Konstantin Weber-Lassalle; Corinna Ernst; Alexander Reuss; Kathrin Möllenhoff; Klaus Baumann; Christian Jackisch; Jan Hauke; Dimo Dietrich; Julika Borde; Tjoung-Won Park-Simon; Lars Hanker; Katharina Prieske; Sandra Schmidt; Nana Weber-Lassalle; Esther Pohl-Rescigno; Stefan Kommoss; Frederik Marmé; Florian Heitz; Julia C Stingl; Rita K Schmutzler; Philipp Harter; Eric Hahnen
Journal:  J Natl Cancer Inst       Date:  2022-04-11       Impact factor: 13.506

2.  Polygenic risk scores indicate extreme ages at onset of breast cancer in female BRCA1/2 pathogenic variant carriers.

Authors:  Rita K Schmutzler; Eitan Friedman; Eric Hahnen; Corinna Ernst; Julika Borde; Yael Laitman; Britta Blümcke; Dieter Niederacher; Konstantin Weber-Lassalle; Christian Sutter; Andreas Rump; Norbert Arnold; Shan Wang-Gohrke; Judit Horváth; Andrea Gehrig; Gunnar Schmidt; Véronique Dutrannoy; Juliane Ramser; Julia Hentschel; Alfons Meindl; Christopher Schroeder; Barbara Wappenschmidt; Christoph Engel; Karoline Kuchenbaecker
Journal:  BMC Cancer       Date:  2022-06-27       Impact factor: 4.638

3.  Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples.

Authors:  Minfeng Xiao; Xiaoqing Liu; Jingkai Ji; Min Li; Jiandong Li; Lin Yang; Wanying Sun; Peidi Ren; Guifang Yang; Jincun Zhao; Tianzhu Liang; Huahui Ren; Tian Chen; Huanzi Zhong; Wenchen Song; Yanqun Wang; Ziqing Deng; Yanping Zhao; Zhihua Ou; Daxi Wang; Jielun Cai; Xinyi Cheng; Taiqing Feng; Honglong Wu; Yanping Gong; Huanming Yang; Jian Wang; Xun Xu; Shida Zhu; Fang Chen; Yanyan Zhang; Weijun Chen; Yimin Li; Junhua Li
Journal:  Genome Med       Date:  2020-06-30       Impact factor: 11.117

4.  pTrimmer: An efficient tool to trim primers of multiplex deep sequencing data.

Authors:  Xiaolong Zhang; Yanyan Shao; Jichao Tian; Yuwei Liao; Peiying Li; Yu Zhang; Jun Chen; Zhiguang Li
Journal:  BMC Bioinformatics       Date:  2019-05-10       Impact factor: 3.169

5.  Assessment of a multiplex PCR and Nanopore-based method for dengue virus sequencing in Indonesia.

Authors:  Samuel C B Stubbs; Barbara A Blacklaws; Benediktus Yohan; Frilasita A Yudhaputri; Rahma F Hayati; Brian Schwem; Edsel M Salvaña; Raul V Destura; James S Lester; Khin S Myint; R Tedjo Sasmono; Simon D W Frost
Journal:  Virol J       Date:  2020-02-13       Impact factor: 4.099

6.  Mutation screening of germline TP53 mutations in high-risk Chinese breast cancer patients.

Authors:  Ava Kwong; Vivian Yvonne Shin; Cecilia Y S Ho; Chun Hang Au; Thomas P Slavin; Jeffrey N Weitzel; Tsun-Leung Chan; Edmond S K Ma
Journal:  BMC Cancer       Date:  2020-11-02       Impact factor: 4.430

7.  Development and Analytical Validation of a Targeted Next-Generation Sequencing Panel to Detect Actionable Mutations for Targeted Therapy.

Authors:  Dandan Wang; Kai Ma; Wei Deng; Jingyu Li; Shaohua Xiang; Yang Zhang; Ying Fu; Heng Dai; Bingding Huang
Journal:  Onco Targets Ther       Date:  2021-04-07       Impact factor: 4.147

8.  MIPP-Seq: ultra-sensitive rapid detection and validation of low-frequency mosaic mutations.

Authors:  Ryan N Doan; Michael B Miller; Sonia N Kim; Rachel E Rodin; Javier Ganz; Sara Bizzotto; Katherine S Morillo; August Yue Huang; Reethika Digumarthy; Zachary Zemmel; Christopher A Walsh
Journal:  BMC Med Genomics       Date:  2021-02-12       Impact factor: 3.063

9.  Genome Sequences of SARS-CoV-2 Strains Detected in Hong Kong.

Authors:  Chun Hang Au; Wai Sing Chan; Ho Yin Lam; Dona N Ho; Simon Y M Lam; Jonpaul S T Zee; Tsun Leung Chan; Edmond S K Ma
Journal:  Microbiol Resour Announc       Date:  2020-07-30

10.  Mitigating the effects of reference sequence bias in single-multiplex massively parallel sequencing of the mitochondrial DNA control region.

Authors:  Tunde I Huszar; Jon H Wetton; Mark A Jobling
Journal:  Forensic Sci Int Genet       Date:  2019-01-16       Impact factor: 4.882

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.