
Misaligned sequencing reads from the GNAQ-pseudogene locus may yield GNAQ artefact variants.

Jing Quan Lim, Soon Thye Lim, Choon Kiat Ong.


Year:  2022        PMID: 35075133      PMCID: PMC8786957          DOI: 10.1038/s41467-022-28115-z

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Arising from: Zhaoming Li et al. Nature Communications 10.1038/s41467-019-12032-9 (2019)

Next-generation sequencing (NGS) has enabled the interrogation of DNA sequences at an unprecedented scale. After a genomic DNA library is sequenced, every reference-based bioinformatics analysis requires an ‘alignment’ step before downstream analyses can take place. A bioinformatics tool such as BWA[1,2] performs this step and reports the positional coordinates of each NGS read with respect to the reference genome on which the alignments are based. An aligner scores each seed alignment with a scoring function that accounts for the matches, mismatches and gaps between the read and the region of the reference genome to which the aligner assigns it. In practice, the seed-extended alignment with the highest score becomes the primary alignment for a read. However, the primary alignment is not always the correct one. For instance, a single-nucleotide polymorphism (SNP) causes a mismatch between a read and the reference genome, so the alignment is no longer an exact match and its score is reduced. As a result, a correct alignment that covers common polymorphisms will not be reported as the ‘better’ hit if the aligner finds an incorrect alignment that contains fewer mismatches. Sequencing reads from homologous genomic loci, such as genes and their corresponding pseudogenes, are therefore very likely to be misaligned to one or the other.

Formalin-fixed paraffin-embedded (FFPE) archival materials present great opportunities to study various diseases. However, FFPE DNA is often more fragmented and yields shorter NGS reads than DNA from fresh/frozen (FF) tissue. In general, a shorter read carries less information with which to be aligned uniquely and is misaligned more often than a longer read.
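The scoring argument above can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions, not a description of how BWA works internally: the two 20-bp "loci" and the read are invented sequences, the alignment is ungapped and end-to-end, and only the match score (1) and mismatch penalty (4) are taken from BWA-MEM's documented defaults. The read is drawn from the hypothetical pseudogene but carries two polymorphisms that happen to restore the gene's bases, so the wrong locus wins on score.

```python
# Toy illustration only: sequences are invented, not real GNAQ/GNAQP sequence.
MATCH, MISMATCH = 1, -4  # BWA-MEM default match score and mismatch penalty


def ungapped_score(read, ref_window):
    """Score an ungapped, end-to-end alignment of read against ref_window."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    return sum(MATCH if a == b else MISMATCH for a, b in zip(read, ref_window))


def best_locus(read, loci):
    """Return (locus name, score) for the highest-scoring candidate locus."""
    scores = {name: ungapped_score(read, seq) for name, seq in loci.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Two hypothetical homologous loci that differ at three positions (3, 10, 18).
loci = {
    "gene":       "ACGTACGTACGTACGTACGT",
    "pseudogene": "ACGAACGTACCTACGTACTT",
}

# A read that truly comes from the pseudogene, but carries two SNPs
# (positions 3 and 10) whose alleles match the gene's reference bases.
read = "ACGTACGTACGTACGTACTT"

print(best_locus(read, loci))                    # the gene wins: 1 mismatch
print(ungapped_score(read, loci["pseudogene"]))  # true origin: 2 mismatches
```

In this sketch the read has one mismatch against the gene and two against its true origin, so a highest-score rule places it at the gene. A real aligner also uses local alignment, clipping and mapping-quality estimates, which this example ignores.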
As such, subsequent analysis of misaligned, SNP-bearing NGS reads can cascade into misleading results. A recent study found a recurrent GNAQ mutation encoding p.T96S in 8.7% (11/127) of natural-killer/T cell lymphoma (NKTCL) cases using NGS technologies[3]. The study demonstrated that GNAQ deficiency led to enhanced NK cell survival in conditional knockout mice (Ncr1-Cre-Gnaq) via the inhibition of AKT and MAPK signalling pathways. The mutation was also reported to be clinically important, as patients with GNAQ p.T96S had inferior survival, and it could be relevant for the development of therapies.

As the Zhaoming Li et al.[2] study used FFPE materials for all their sequencing work, we investigated the recurrent GNAQ mutations encoding p.T96S and p.Y101X. It was of particular interest to us that the two GNAQ hotspot somatic mutations (p.T96S and p.Y101X) reported in the study were not reported in other NKTCL studies that also used NGS[4-9]. We analyzed the Sanger sequences provided in Supplementary Fig. 4 of the work in question and found that the single-nucleotide variant (SNV) encoding p.T96S had a minor allele frequency (MAF) of 1.18% (1386/117782, ExAC v1.0[10] database; dbSNP151[11], rs753716491), which we considered too common for a variant that contributes substantially to the pathogenesis of NKTCL. Moreover, the authors wrote in the published work that the GNAQ somatic mutation encoding p.Y101X tended to co-occur with p.T96S. However, the mutation encoding p.Y101X was not marked as a common SNP by germline databases, and a missense mutation (p.T96S) co-occurring with a stop-gain mutation (p.Y101X) in the same gene would be functionally redundant. This suggested to us that the alignments to the GNAQ locus that encoded both p.T96S and p.Y101X were erroneous. In an attempt to reproduce the findings of Zhaoming Li et al., we analyzed the sequencing data of the GNAQ-mutant cases from the original paper.
The original sample IDs of these cases are 9622, 9634, 8186, 9626 and 8188. The read depths supporting the GNAQ-mutant allele, out of the total read depth, are 3/37, 9/71, 10/69, 7/69 and 7/44, respectively. However, all the mutant reads could be aligned non-uniquely to both the GNAQ and GNAQP loci. Of these five samples, 9626 and 9622 had matched normal samples, which had longer reads (125 bp) than the matched FFPE tumor samples (<~100 bp) at the GNAQ locus concerned. This allowed the artefact variants from the tumors to leak through the germline filter during somatic variant calling.

Next, we further analyzed the NGS reads that encoded both the p.T96S and p.Y101X somatic mutations and found that they were indeed misaligned. We simulated 100-bp NGS reads encoding both p.T96S and p.Y101X from the genomic locus of GNAQ, using the same hg19 reference that the authors used, and realigned the in silico reads to the same reference (Fig. 1a). The reads were multi-mapped to the genomic loci of GNAQ and GNAQ-pseudogene-1 (GNAQP) at chr9q21.2 and chr2q21.1, respectively. As expected, the read was realigned to the GNAQ locus from which it was simulated and recapitulated the two simulated SNVs: chr9:80537095[G>T] (p.Y101X) and chr9:80537112[T>A] (p.T96S, rs753716491). Fig. 1b shows that the realignment also mapped the read to GNAQP and yielded three SNVs, all of which are common SNPs as denoted by their respective dbSNP IDs: chr2:132182138[G>T] (rs3730150), chr2:132182159[T>C] (rs3730148) and chr2:132182199[C>T] (rs3730153).
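The read-depth figures listed above for the five GNAQ-mutant cases can be turned into per-sample fractions with a few lines of Python. This is a minimal sketch: the counts and sample IDs are copied from the text, while the term "variant allele fraction" and the helper name are introduced here for illustration only.

```python
# Mutant-allele read depth and total read depth per sample, as listed in the text.
read_support = {
    "9622": (3, 37),
    "9634": (9, 71),
    "8186": (10, 69),
    "9626": (7, 69),
    "8188": (7, 44),
}


def variant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at the site that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total read depth must be positive")
    return alt_reads / total_reads


vaf = {
    sample: variant_allele_fraction(alt, total)
    for sample, (alt, total) in read_support.items()
}

for sample, fraction in vaf.items():
    print(f"{sample}: {fraction:.1%}")
```

All five fractions fall between roughly 8% and 16%, that is, only a minority of the reads at the site carry the mutant allele in each sample.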
Fig. 1

GNAQ p.T96S and p.Y101X mutations could be the results of misaligned sequencing reads from GNAQ-Pseudogene-1.

a Reference sequence from the GNAQ locus (top), in silico simulated read that would encode the GNAQ p.T96S and p.Y101X mutations (middle, green box) and in silico read that represents the co-occurring SNPs rs3730150, rs3730148 and rs3730153 (bottom, orange box; the co-occurring SNPs are in red). b Top-scoring alignments of the read that would encode GNAQ p.T96S and p.Y101X. The read aligns to both GNAQ (with one mismatch and one SNP) and GNAQP (with three SNPs). Linkage disequilibrium analysis of the three SNPs from the GNAQP locus also showed that they tend to co-occur and cause a misalignment to the GNAQ locus. This misalignment would yield erroneous calls of the GNAQ p.T96S and p.Y101X mutations. c Homologous GNAQ-GNAQP regions containing p.T96S and p.Y101X in the GNAQ locus and rs3730150, rs3730148 and rs3730153 in the GNAQP locus. The regions immediately outside chr9:80537082-80537173 are unique to GNAQ and could help Zhaoming Li et al. to validate their findings further.

We performed linkage disequilibrium (LD) analysis with LDlink[12] on all three possible pairwise combinations of the three SNPs within GNAQP and found that they are likely to co-occur as a triplet (Fig. 1b, D′ = 1, R² ≥ 0.9403). As such, NGS reads carrying these SNPs would be misaligned to GNAQ and misinterpreted as somatic mutations encoding p.T96S and p.Y101X. By performing a pairwise Smith-Waterman alignment[13] between the genomic sequences of GNAQ and GNAQP, we found that chr9:80537082–80537222 and chr2:132182125–132182265 are homologous and encompass all the SNPs and variants that bear on the validity of the reported GNAQ somatic mutations (Fig. 1c). To confirm the reported mutations, the following two criteria need to be satisfied. 1) The alignment must represent GNAQ mutations that encode p.T96S and p.Y101X. 2) The alignment must extend without error beyond chr9:80537082-80537222.
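One way to read the two criteria above is as a per-read check, sketched here in Python. This is an interpretation rather than the authors' procedure: the `ReadAlignment` class and the function names are hypothetical, criterion 1 is taken to require both substitutions on the same read, and indels and soft-clipping are ignored. Only the interval and the two SNV coordinates come from the text.

```python
from dataclasses import dataclass, field

# Homologous GNAQ interval reported in the text (hg19, chr9, inclusive).
HOMOLOGY_START, HOMOLOGY_END = 80537082, 80537222

# The two reported GNAQ SNVs: position -> (reference base, read base).
REPORTED_SNVS = {
    80537095: ("G", "T"),  # p.Y101X
    80537112: ("T", "A"),  # p.T96S
}


@dataclass
class ReadAlignment:
    """Hypothetical, simplified view of one read aligned to chr9."""
    start: int  # first aligned reference position (inclusive)
    end: int    # last aligned reference position (inclusive)
    mismatches: dict = field(default_factory=dict)  # position -> (ref, alt)


def carries_reported_snvs(aln):
    """Criterion 1 (as interpreted here): both substitutions are on the read."""
    return all(aln.mismatches.get(pos) == change
               for pos, change in REPORTED_SNVS.items())


def extends_cleanly_beyond_homology(aln):
    """Criterion 2: the read reaches GNAQ-unique flank with no mismatch there."""
    reaches_flank = aln.start < HOMOLOGY_START or aln.end > HOMOLOGY_END
    flank_errors = [pos for pos in aln.mismatches
                    if pos < HOMOLOGY_START or pos > HOMOLOGY_END]
    return reaches_flank and not flank_errors


def supports_reported_mutations(aln):
    return carries_reported_snvs(aln) and extends_cleanly_beyond_homology(aln)


# A 100-bp read lying entirely inside the homologous interval: ambiguous.
inside = ReadAlignment(80537085, 80537184, dict(REPORTED_SNVS))
# A 100-bp read anchored in GNAQ-unique sequence upstream of the interval.
anchored = ReadAlignment(80537040, 80537139, dict(REPORTED_SNVS))

print(supports_reported_mutations(inside))    # False
print(supports_reported_mutations(anchored))  # True
```

Under these assumptions, a read that sits wholly within the homologous interval cannot by itself distinguish GNAQ from GNAQP, whereas a read anchored in unique flanking sequence can.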
If either of the two criteria cannot be satisfied, then the validity of the reported GNAQ somatic mutations in NKTCL is questionable. The 127 NKTCLs studied by Zhaoming Li et al.[2] were all FFPE archival materials, and 101 of them had matched whole blood as the germline counterpart. DNA extracted from whole blood is typically less fragmented and tends to yield longer NGS reads than DNA extracted from FFPE archival materials. This allows NGS reads sequenced from whole blood to align to a reference genome more accurately than those sequenced from FFPE archival materials. Conversely, the shorter FFPE reads that originate from one genomic locus can be mapped to more than one genomic locus and yield variant artefacts in downstream analyses. In an analysis for somatic mutations, the germline variants are subtracted from the tumor variants. In this case, the GNAQ p.T96S and p.Y101X somatic artefacts may have leaked through the subtraction step because reads sequenced from the GNAQ and GNAQP loci were aligned differently in the FFPE archival tumor samples and the normal whole-blood samples. Thus, a combination of three factors may have contributed to the GNAQ p.T96S and p.Y101X artefacts: 1) short tumor reads that failed to align correctly, 2) long germline reads that aligned correctly and 3) a SNP-rich genomic region from which the tumor reads were sequenced.
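The leak through the germline subtraction step can be illustrated with a naive set difference in Python. This is a sketch under stated assumptions: the call sets are hypothetical and contain only the coordinates quoted in the text, and real somatic callers use statistical models rather than plain set subtraction.

```python
def somatic_calls(tumor_calls, normal_calls):
    """Naive subtraction: tumor variants that are absent from the matched normal."""
    return set(tumor_calls) - set(normal_calls)


# Hypothetical call sets, each variant as (chrom, pos, ref, alt) on hg19.
# Short FFPE tumor reads: GNAQP-derived reads are misplaced onto GNAQ (chr9).
tumor_calls = {
    ("chr9", 80537095, "G", "T"),  # read as p.Y101X
    ("chr9", 80537112, "T", "A"),  # read as p.T96S
}

# Longer whole-blood reads: the same haplotype is placed on GNAQP (chr2).
normal_calls = {
    ("chr2", 132182138, "G", "T"),  # rs3730150
    ("chr2", 132182159, "T", "C"),  # rs3730148
    ("chr2", 132182199, "C", "T"),  # rs3730153
}

leaked = somatic_calls(tumor_calls, normal_calls)
print(sorted(leaked))
```

Because the same underlying haplotype is reported at different coordinates in the two samples, nothing in the normal call set matches the tumor calls, and both artefacts survive the subtraction.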

Methods

Realignment of sequencing reads from GNAQ-pseudogene locus

The genomic aligner BWA-MEM (v0.7.17-r1188) and the hg19 reference genome were used to realign the sequencing data described in this study[2]. LDlink (version: March 2020; public web tool: https://ldlink.nci.nih.gov/) was used to interrogate the prevalence of the co-occurring polymorphisms that caused sequencing reads to misalign and produce the artefact calls reported by Zhaoming Li et al. Nature Communications 2019[12]. The Smith-Waterman alignment algorithm (version: March 2020; public web tool: https://www.ebi.ac.uk/Tools/psa/emboss_water/) was used to derive the homologous GNAQ and GNAQP loci[13].
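For readers unfamiliar with the Smith-Waterman step, a minimal Python sketch of local alignment follows. It is not the EMBOSS implementation used in this study: the scoring values (match 2, mismatch -1, linear gap -2) are arbitrary choices for illustration, whereas the web tool may use a substitution matrix and affine gap penalties, and the example sequences are invented.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for one best local alignment.

    Uses a linear gap penalty; all scoring values are illustrative defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Invented sequences that share the core "ACGTACG" inside unrelated flanks.
score, aligned_a, aligned_b = smith_waterman("TTTACGTACGTTT", "GGACGTACGGG")
print(score, aligned_a, aligned_b)
```

The shared core is recovered while the unrelated flanks are left out, which is the property that makes local alignment suitable for delimiting a homologous interval between two longer genomic sequences.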
  12 in total

1.  LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants.

Authors:  Mitchell J Machiela; Stephen J Chanock
Journal:  Bioinformatics       Date:  2015-07-02       Impact factor: 6.937

2.  Genomic and Transcriptomic Characterization of Natural Killer T Cell Lymphoma.

Authors:  Jie Xiong; Bo-Wen Cui; Nan Wang; Yu-Ting Dai; Hao Zhang; Chao-Fu Wang; Hui-Juan Zhong; Shu Cheng; Bin-Shen Ou-Yang; Yu Hu; Xi Zhang; Bin Xu; Wen-Bin Qian; Rong Tao; Feng Yan; Jian-Da Hu; Ming Hou; Xue-Jun Ma; Xin Wang; Yuan-Hua Liu; Zun-Min Zhu; Xiao-Bin Huang; Li Liu; Chong-Yang Wu; Li Huang; Yun-Feng Shen; Rui-Bin Huang; Jing-Yan Xu; Chun Wang; De-Pei Wu; Li Yu; Jian-Feng Li; Peng-Peng Xu; Li Wang; Jin-Yan Huang; Sai-Juan Chen; Wei-Li Zhao
Journal:  Cancer Cell       Date:  2020-03-16       Impact factor: 31.743

3.  Recurrent ECSIT mutation encoding V140A triggers hyperinflammation and promotes hemophagocytic syndrome in extranodal NK/T cell lymphoma.

Authors:  Haijun Wen; Huajuan Ma; Qichun Cai; Suxia Lin; Xinxing Lei; Bin He; Sijin Wu; Zifeng Wang; Yan Gao; Wensheng Liu; Weiping Liu; Qian Tao; Zijie Long; Min Yan; Dali Li; Keith W Kelley; Yongliang Yang; Huiqiang Huang; Quentin Liu
Journal:  Nat Med       Date:  2018-01-01       Impact factor: 53.440

4.  Janus kinase 3-activating mutations identified in natural killer/T-cell lymphoma.

Authors:  Ghee Chong Koo; Soo Yong Tan; Tiffany Tang; Song Ling Poon; George E Allen; Leonard Tan; Soo Ching Chong; Whee Sze Ong; Kevin Tay; Miriam Tao; Richard Quek; Susan Loong; Kheng-Wei Yeoh; Swee Peng Yap; Kuo Ann Lee; Lay Cheng Lim; Daryl Tan; Christopher Goh; Ioana Cutcutache; Willie Yu; Cedric Chuan Young Ng; Vikneswari Rajasegaran; Hong Lee Heng; Anna Gan; Choon Kiat Ong; Steve Rozen; Patrick Tan; Bin Tean Teh; Soon Thye Lim
Journal:  Cancer Discov       Date:  2012-06-15       Impact factor: 39.397

5.  Oncogenic activation of the STAT3 pathway drives PD-L1 expression in natural killer/T-cell lymphoma.

Authors:  Tammy Linlin Song; Maarja-Liisa Nairismägi; Yurike Laurensia; Jing-Quan Lim; Jing Tan; Zhi-Mei Li; Wan-Lu Pang; Atish Kizhakeyil; Giovani-Claresta Wijaya; Da-Chuan Huang; Sanjanaa Nagarajan; Burton Kuan-Hui Chia; Daryl Cheah; Yan-Hui Liu; Fen Zhang; Hui-Lan Rao; Tiffany Tang; Esther Kam-Yin Wong; Jin-Xin Bei; Jabed Iqbal; Nicholas-Francis Grigoropoulos; Siok-Bian Ng; Wee-Joo Chng; Bin-Tean Teh; Soo-Yong Tan; Navin Kumar Verma; Hao Fan; Soon-Thye Lim; Choon-Kiat Ong
Journal:  Blood       Date:  2018-07-27       Impact factor: 22.113

6.  Exome sequencing identifies somatic mutations of DDX3X in natural killer/T-cell lymphoma.

Authors:  Lu Jiang; Zhao-Hui Gu; Zi-Xun Yan; Xia Zhao; Yin-Yin Xie; Zi-Guan Zhang; Chun-Ming Pan; Yuan Hu; Chang-Ping Cai; Ying Dong; Jin-Yan Huang; Li Wang; Yang Shen; Guoyu Meng; Jian-Feng Zhou; Jian-Da Hu; Jin-Fen Wang; Yuan-Hua Liu; Lin-Hua Yang; Feng Zhang; Jian-Min Wang; Zhao Wang; Zhi-Gang Peng; Fang-Yuan Chen; Zi-Min Sun; Hao Ding; Ju-Mei Shi; Jian Hou; Jin-Song Yan; Jing-Yi Shi; Lan Xu; Yang Li; Jing Lu; Zhong Zheng; Wen Xue; Wei-Li Zhao; Zhu Chen; Sai-Juan Chen
Journal:  Nat Genet       Date:  2015-07-20       Impact factor: 38.330

7.  Analysis of protein-coding genetic variation in 60,706 humans.

Authors:  Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2016-08-18       Impact factor: 49.962

8.  The EMBL-EBI search and sequence analysis tools APIs in 2019.

Authors:  Fábio Madeira; Young Mi Park; Joon Lee; Nicola Buso; Tamer Gur; Nandana Madhusoodanan; Prasad Basutkar; Adrian R N Tivey; Simon C Potter; Robert D Finn; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

9.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

10.  Recurrent GNAQ mutation encoding T96S in natural killer/T cell lymphoma.

Authors:  Zhaoming Li; Xudong Zhang; Weili Xue; Yanjie Zhang; Chaoping Li; Yue Song; Mei Mei; Lisha Lu; Yingjun Wang; Zhiyuan Zhou; Mengyuan Jin; Yangyang Bian; Lei Zhang; Xinhua Wang; Ling Li; Xin Li; Xiaorui Fu; Zhenchang Sun; Jingjing Wu; Feifei Nan; Yu Chang; Jiaqin Yan; Hui Yu; Xiaoyan Feng; Guannan Wang; Dandan Zhang; Xuefei Fu; Yuan Zhang; Ken H Young; Wencai Li; Mingzhi Zhang
Journal:  Nat Commun       Date:  2019-09-16       Impact factor: 14.919
