Annotation of genes having candidate somatic mutations in acute myeloid leukemia with whole-exome sequencing using concept lattice analysis.

Kye Hwa Lee¹, Jae Hyeun Lim, Ju Han Kim.

Abstract

In cancer genome studies, the annotation of newly detected oncogene/tumor suppressor gene candidates is a challenging process. We propose using concept lattice analysis for the annotation and interpretation of genes having candidate somatic mutations in whole-exome sequencing in acute myeloid leukemia (AML). Using whole-exome sequencing of 10 tumor samples of the AML-M2 subtype with matched normal samples, we selected 45 highly mutated genes. To evaluate these genes, we performed concept lattice analysis and annotated them with existing knowledge databases.

Keywords: DNA mutational analysis; DNA sequence analysis; acute myeloid leukemia; biolattice; concept lattice analysis; oncogenes

Year: 2013 PMID: 23613681 PMCID: PMC3630384 DOI: 10.5808/GI.2013.11.1.38

Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X

Introduction

Acute myeloid leukemia (AML) is one of the most well-studied diseases in genomic research [1, 2]. AML usually occurs in middle-aged people and is diagnosed when leukemic myeloblasts in the blood exceed 30% [3]. AML is a genetically heterogeneous disease: about one-third of AML patients have chromosomal rearrangements, such as t(8;21) and t(15;17), whereas other AML patients have normal karyotypes [4]. With recent advances in high-throughput genomic technology, a favorable prognosis has been observed with some genetic changes in cytogenetically normal AML [5]. These results were reflected in the World Health Organization (WHO) diagnostic criteria; NPM1 and CEBPA mutations were included in the 2008 revision of these criteria [6]. The molecular change underlying AML is considered to be the accumulation of somatic mutations in hematopoietic progenitor cells [7]. Next-generation sequencing technology has given new insight into the clonal heterogeneity of leukemic mutations, which helps explain why some of these mutations are highly reproducible but others are very rare [8]. Still, in 30% of cytogenetically normal AML, the causal or strongly associated genetic changes have not yet been discovered [9, 10]. With advances in high-throughput technology, the number of known disease-associated genes is growing [11]. As a consequence, genetic knowledge databases are growing rapidly, and the annotation of candidate causal genes in genetic studies has become a very challenging process for researchers. We propose a workflow for the detection of somatic mutation candidates in 10 AML samples with matched normal samples and introduce concept lattice analysis for clustering the samples that have highly mutated genes in common.

Methods

Primary sequence analysis

We received the FASTQ files of whole-exome sequencing results for tumor and matched normal samples of 10 AML patients from the Korea Genome Organization in December 2012. No patient-related medical or characteristic data were available. We aligned the sequencing reads to the human reference genome (hg19, GRCh37) from UCSC with BWA 0.6.2 [12] (Figs. 1 and 2). To filter known single nucleotide polymorphisms (SNPs), we used dbSNP build 137. We removed PCR duplicates and filtered low-quality SNPs with SAMtools 0.1.18 [13], Picard 1.68 [14], and GATK 2.3.4 [15]. After the filtering process, the SAM file was converted to a VCF file with VCFtools 0.1.10 [16]. To detect somatic mutation candidates, we obtained the difference between the VCF files of the tumor and normal samples. For annotation of these somatic mutation candidates, we used the ANNOVAR tool [17].
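The tumor-minus-normal comparison described above can be illustrated with a minimal Python sketch. It assumes that a variant is identified by its chromosome, position, reference allele, and alternate allele, and that the VCF text is already in memory. The function names (`variant_keys`, `somatic_candidates`) and the toy records are hypothetical illustrations and are not the authors' pipeline, which relied on the tools cited above.

```python
def variant_keys(vcf_text):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF-formatted text."""
    keys = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and header lines (meta-information and column header).
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        # A record may list several alternate alleles separated by commas.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def somatic_candidates(tumor_vcf, normal_vcf):
    """Variants present in the tumor sample but absent from the matched normal."""
    return sorted(variant_keys(tumor_vcf) - variant_keys(normal_vcf))


# Toy records, invented for illustration only.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
TUMOR_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    HEADER,
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr2\t2000\t.\tC\tT\t60\tPASS\t.",
])
NORMAL_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    HEADER,
    "chr1\t1000\t.\tA\tG\t55\tPASS\t.",
    "chr3\t3000\t.\tG\tA\t40\tPASS\t.",
])

print(somatic_candidates(TUMOR_VCF, NORMAL_VCF))
```

In this sketch the variant shared by both samples is treated as germline and dropped, so only the tumor-specific record remains. A plain set difference ignores genotype, read depth, and quality, so it is only a first approximation of somatic calling.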

Fig. 1

Primary sequence analysis pipeline.

Fig. 2

Workflow for the detection of somatic mutation candidates from exome sequencing of normal-matched samples from 10 acute myeloid leukemia patients.

Formal concept analysis

We used formal concept analysis (FCA) to construct hierarchical relationships among samples sharing highly mutated genes [18]. FCA is a useful method for the conceptual clustering of objects, attributes, and their binary relationship. In FCA, the sets of formal objects and formal attributes, together with their relation to each other, form a "formal context," which can be represented by a cross table. In our case, the objects are the 10 AML samples, and the attributes are the 45 highly mutated genes. We defined the formal context as K = (G, M, I), where G is a set of objects (i.e., samples), M is a set of attributes (i.e., mutated genes), and I ⊆ G × M is the incidence relation, where (g, m) ∈ I if object g has attribute m. For A ⊆ G and B ⊆ M, we define the operators A' = {m ∈ M | gIm for all g ∈ A} (i.e., the set of attributes common to the objects in A) and B' = {g ∈ G | gIm for all m ∈ B} (i.e., the set of objects that have all attributes in B). A pair (A, B) is a formal concept of K = (G, M, I) if and only if A ⊆ G, B ⊆ M, A' = B, and A = B'. A is called the extent and B the intent of the concept (A, B). The extent consists of all objects belonging to the concept, while the intent contains all attributes shared by those objects. The concepts of a given context are naturally ordered by the subconcept-superconcept relation, defined by (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1). The ordered set of all concepts of the context (G, M, I) is denoted by C(G, M, I) and is called the concept lattice of (G, M, I). We represent the structure of this concept lattice with a Hasse diagram, in which the nodes are the concepts and the edges correspond to the neighborhood relationship among the concepts. All concepts above an object label include that object, and all concepts below an attribute label include that attribute. The top element of a lattice is a unit concept, representing a concept that contains all objects. The bottom element is a zero concept having no object.
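To make the definitions above concrete, the following Python sketch implements the two derivation operators and enumerates all formal concepts of a small context by brute force. The sample and gene names in `TOY_CONTEXT` are placeholders invented for illustration, not data from this study, and the function names are hypothetical. Enumerating every subset of objects is exponential in the number of objects, which is assumed to be acceptable only for small contexts such as 10 samples.

```python
from itertools import combinations


def common_attributes(objects, context, all_attributes):
    """A': the attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for g in objects:
        shared &= context[g]
    return frozenset(shared)


def common_objects(attributes, context):
    """B': the objects that have every attribute in `attributes`."""
    return frozenset(g for g, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """Return all (extent, intent) pairs with extent' = intent and intent' = extent."""
    all_attributes = set().union(*context.values())
    objects = sorted(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = common_objects(intent, context)  # closure of the subset
            concepts.add((extent, intent))
    return concepts


def is_subconcept(c1, c2):
    """(A1, B1) <= (A2, B2) if and only if A1 is a subset of A2."""
    return c1[0] <= c2[0]


# Toy context: objects are samples, attributes are mutated genes (placeholders).
TOY_CONTEXT = {
    "s1": {"geneA", "geneB"},
    "s2": {"geneA", "geneC"},
    "s3": {"geneD"},
}

concepts = formal_concepts(TOY_CONTEXT)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

For this toy context the enumeration yields six concepts, including the unit concept (all three samples, no shared gene), the zero concept (no sample, all four genes), and the concept grouping `s1` and `s2` through their shared `geneA`.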

Results

Overview of mutations

We identified 12,908 somatic mutation candidates in 10 AML sequenced exomes, including 1,281 point mutations and 625 insertions/deletions (indels) (Table 1, Fig. 3). The point mutations include 7,483 synonymous single nucleotide variations (SNVs), 4,297 nonsynonymous SNVs, 282 stopgain SNVs, 14 stoploss SNVs, and 5 frameshift substitutions, and the indels include 247 frameshift insertions, 310 frameshift deletions, and 68 nonframeshift indels (Fig. 4). For each patient, the average nonsynonymous mutation count was 429.7 (SD, 97.16).

Table 1

The distribution of somatic mutation candidates in 10 AML samples

AML, acute myeloid leukemia; SNP, single nucleotide polymorphism; NS, nonsynonymous; SNV, single nucleotide variation.

Fig. 3

Distribution of somatic mutation candidates in all samples. NS SNV, nonsynonymous single nucleotide variation.

Fig. 4

Distribution of nonsynonymous somatic mutations in 10 acute myeloid leukemia samples. NS SNPs, nonsynonymous single nucleotide polymorphisms.

Between 342 and 665 genes had at least one nonsynonymous somatic mutation candidate in each AML sample (Table 2). Recurrently mutated genes were observed in all samples.

Table 2

Classification of genes according to the count of mutations in each sample

Mutation analysis

The most frequently mutated genes across all samples were USP9Y and MUC5B, each of which was mutated in 5 samples. These genes were also highly mutated within individual samples; for example, USP9Y had 6 nonsynonymous mutations in sample 3. We selected 45 highly mutated genes (1.5%) from the 2,981 mutated genes. We defined highly mutated genes as genes having 3 or more nonsynonymous mutations within a single sample (Table 3). In a comparison of mutations with the COSMIC database [19], among the 45 highly mutated genes, 21 genes matched hematopoietic and lymphoid tissue malignancy terms, and 21 genes matched other cancer types. For 3 genes, there was no matched term in COSMIC (Table 4).
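The selection rule for highly mutated genes can be sketched in Python as follows. The sketch assumes that a gene qualifies when it carries 3 or more nonsynonymous mutations in at least one sample, which is one reading of the definition above. The function name `highly_mutated_genes`, the input layout, and the toy counts are hypothetical and do not reproduce the study data.

```python
def highly_mutated_genes(counts, threshold=3):
    """Map each qualifying gene to the sorted list of samples in which it has
    at least `threshold` nonsynonymous mutations.

    `counts` has the form {sample: {gene: number_of_nonsynonymous_mutations}}.
    """
    selected = {}
    for sample, gene_counts in counts.items():
        for gene, n in gene_counts.items():
            if n >= threshold:
                selected.setdefault(gene, []).append(sample)
    return {gene: sorted(samples) for gene, samples in selected.items()}


# Toy per-sample counts with placeholder gene names (not study data).
TOY_COUNTS = {
    "s1": {"geneA": 3, "geneB": 1},
    "s2": {"geneA": 4, "geneC": 2},
    "s3": {"geneC": 5},
}

selected = highly_mutated_genes(TOY_COUNTS)
print(selected)
```

The resulting gene-to-samples mapping is a binary incidence relation, so under the same assumption it could serve directly as the cross table of a formal context with samples as objects and genes as attributes.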

Table 3

List of genes with 3 or more mutations in 10 AML samples

a, genes mutated in 5 samples; b, genes mutated in 3 samples; c, genes mutated in 2 samples.

Table 4

Comparison of the list of genes with 3 or more mutations with the COSMIC database

We used the concept lattice to construct the hierarchical relationship between the samples that had the 45 highly mutated genes. Biolattice analysis is a mathematical framework based on concept lattice analysis for better biological interpretation of genomic data. The top element of a lattice is a unit concept, representing a concept that contains all objects. The bottom element is a zero concept having no object [20, 21]. For comparison with the concept lattice (Fig. 5), we also performed hierarchical clustering analysis by Ward's method. In hierarchical clustering, cluster 1 has 5 samples (nos. 1, 2, 5, 9, and 10), cluster 2 has 2 samples (nos. 4 and 7), and the others have 1 sample each (Fig. 6). We divided the samples into 4 subgroups by interpreting the concept lattice result (Fig. 7). Lattice subgroup 1 shared the SYNE1 gene mutation, and samples 3, 4, and 7 were included in this subgroup. Subgroup 2 comprised 5 samples (nos. 1, 2, 5, 6, and 9) that had MUC5B gene mutations in common. Samples 10 and 8 could be isolated by the uniqueness of their mutated gene sharing patterns.

Fig. 5

Concept lattice of 45 genes having 3 or more nonsynonymous mutations and 10 acute myeloid leukemia patients, annotated by the COSMIC database. Red rectangles represent genes annotated with hematopoietic and lymphoid tissue malignancy; yellow rectangles represent genes annotated with other cancer types in the COSMIC database.

Fig. 6

Hierarchical clustering of samples by binarized score of 45 highly mutated gene states.

Fig. 7

Subgroup analysis by concept lattice. (A) Subgroup 1 shares the SYNE1 mutation in samples 3, 4, and 7. (B) Subgroup 2 shares the MUC5B mutation in samples 1, 2, 5, 6, and 9. (C) Subgroup 3 consists of sample 8, which has genes mutated only in that sample, such as MUC2 and HELZ2. (D) Subgroup 4 consists of sample 10, which has genes mutated only in that sample, such as MUC6, CDH23, BRD2, OR6V1, and DNAH17.

Discussion

In this study, we proposed a workflow for exome sequencing analysis of AML samples with matched normal samples and a framework for defining sample subgroups. We observed that every sample had a nonsynonymous mutation in genes associated with hematological and lymphoid malignancy, but the candidate oncogenes showed diverse characteristics. We selected 45 genes that had 3 or more nonsynonymous mutations and performed hierarchical clustering analysis of the samples by these genes. In classic hierarchical clustering analysis by Ward's method, we could not identify the genetic relationship among the clusters. On the other hand, the result of concept lattice analysis gave us insight into the mutational pattern of each sample. In subgroup 1, samples 3, 4, and 7 shared SYNE1 gene mutations. The SYNE1 gene encodes a spectrin repeat-containing protein that is expressed in skeletal and smooth muscle and peripheral blood lymphocytes and localizes to the nuclear membrane [21]. This gene is not a well-known leukemic gene but is observed in some hematological malignancies and other cancer types [22]. In glioblastoma, SYNE1 mutation is significantly correlated with the overexpression of several known glioblastoma survival genes [23]. In the case of sample 3, the ITGAX gene, encoding integrin alpha X, was mutated. This gene is well known for its association with leukemia [24] and lung cancer [25]. For sample 4, the possible oncogene is LAMC3. LAMC3 encodes the laminin gamma 3 chain; laminins are the major non-collagenous constituents of the basement membrane. LAMC3 mutations are associated with several cancers, including colon cancer, lung cancer, and melanoma, and LAMC3 is a candidate tumor suppressor gene in bladder transitional cell carcinoma [26]. LAMC3 is involved in the phosphatidylinositol 3-kinase-Akt signaling pathway, since it has a role in cell adhesion. The ANKRD18A gene, encoding ankyrin repeat domain 18A, is the oncogene candidate for sample 7 and is a novel epigenetic regulation gene in lung cancer [25].
Therefore, it is possible that the pairwise relationships of those genes (ITGAX-SYNE1, LAMC3-SYNE1, and ANKRD18A-SYNE1) contribute together to leukemic cell transformation. The major limitation of our study is that we could not validate the mutation results by the Sanger method or deep sequencing. We defined highly mutated genes as those having 3 or more mutations, but this definition is arbitrary, so we might have missed candidate oncogenes in some patients. In this study, we suggest the concept of clustering samples that have diverse mutated genes. AML is a very heterogeneous genetic disease. Even in the small number of samples we studied, the genetic variation patterns were not common to all samples. Evaluating more samples would strengthen the clustering analysis.

24 in total

1. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

Review 2. Molecular genetics of adult acute myeloid leukemia: prognostic and therapeutic implications.

Authors: Guido Marcucci; Torsten Haferlach; Hartmut Döhner
Journal: J Clin Oncol Date: 2011-01-10 Impact factor: 44.544

3. Correlation of somatic mutation and expression identifies genes important in human glioblastoma progression and survival.

Authors: David L Masica; Rachel Karchin
Journal: Cancer Res Date: 2011-05-09 Impact factor: 12.701

4. DNMT3A mutations in acute myeloid leukemia.

Authors: Timothy J Ley; Li Ding; Matthew J Walter; Michael D McLellan; Tamara Lamprecht; David E Larson; Cyriac Kandoth; Jacqueline E Payton; Jack Baty; John Welch; Christopher C Harris; Cheryl F Lichti; R Reid Townsend; Robert S Fulton; David J Dooling; Daniel C Koboldt; Heather Schmidt; Qunyuan Zhang; John R Osborne; Ling Lin; Michelle O'Laughlin; Joshua F McMichael; Kim D Delehaunty; Sean D McGrath; Lucinda A Fulton; Vincent J Magrini; Tammi L Vickery; Jasreet Hundal; Lisa L Cook; Joshua J Conyers; Gary W Swift; Jerry P Reed; Patricia A Alldredge; Todd Wylie; Jason Walker; Joelle Kalicki; Mark A Watson; Sharon Heath; William D Shannon; Nobish Varghese; Rakesh Nagarajan; Peter Westervelt; Michael H Tomasson; Daniel C Link; Timothy A Graubert; John F DiPersio; Elaine R Mardis; Richard K Wilson
Journal: N Engl J Med Date: 2010-11-10 Impact factor: 91.245

5. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

Review 6. The 2008 revision of the World Health Organization (WHO) classification of myeloid neoplasms and acute leukemia: rationale and important changes.

Authors: James W Vardiman; Jüergen Thiele; Daniel A Arber; Richard D Brunning; Michael J Borowitz; Anna Porwit; Nancy Lee Harris; Michelle M Le Beau; Eva Hellström-Lindberg; Ayalew Tefferi; Clara D Bloomfield
Journal: Blood Date: 2009-04-08 Impact factor: 22.113

7. In-silico human genomics with GeneCards.

Authors: Gil Stelzer; Irina Dalah; Tsippi Iny Stein; Yigeal Satanower; Naomi Rosen; Noam Nativ; Danit Oz-Levi; Tsviya Olender; Frida Belinky; Iris Bahir; Hagit Krug; Paul Perco; Bernd Mayer; Eugene Kolker; Marilyn Safran; Doron Lancet
Journal: Hum Genomics Date: 2011-10 Impact factor: 4.639

8. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.

Authors: Simon A Forbes; Nidhi Bindal; Sally Bamford; Charlotte Cole; Chai Yin Kok; David Beare; Mingming Jia; Rebecca Shepherd; Kenric Leung; Andrew Menzies; Jon W Teague; Peter J Campbell; Michael R Stratton; P Andrew Futreal
Journal: Nucleic Acids Res Date: 2010-10-15 Impact factor: 16.971

Review 9. Genetic risk prediction in complex disease.

Authors: Luke Jostins; Jeffrey C Barrett
Journal: Hum Mol Genet Date: 2011-08-25 Impact factor: 6.150

10. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome.

Authors: Timothy J Ley; Elaine R Mardis; Li Ding; Bob Fulton; Michael D McLellan; Ken Chen; David Dooling; Brian H Dunford-Shore; Sean McGrath; Matthew Hickenbotham; Lisa Cook; Rachel Abbott; David E Larson; Dan C Koboldt; Craig Pohl; Scott Smith; Amy Hawkins; Scott Abbott; Devin Locke; Ladeana W Hillier; Tracie Miner; Lucinda Fulton; Vincent Magrini; Todd Wylie; Jarret Glasscock; Joshua Conyers; Nathan Sander; Xiaoqi Shi; John R Osborne; Patrick Minx; David Gordon; Asif Chinwalla; Yu Zhao; Rhonda E Ries; Jacqueline E Payton; Peter Westervelt; Michael H Tomasson; Mark Watson; Jack Baty; Jennifer Ivanovich; Sharon Heath; William D Shannon; Rakesh Nagarajan; Matthew J Walter; Daniel C Link; Timothy A Graubert; John F DiPersio; Richard K Wilson
Journal: Nature Date: 2008-11-06 Impact factor: 49.962
