
ChimerDB--a knowledgebase for fusion sequences.

Namshin Kim, Pora Kim, Seungyoon Nam, Seokmin Shin, Sanghyuk Lee.

Abstract

Chromosome translocation and gene fusion are frequent events in the human genome and are often the cause of many types of tumor. ChimerDB is a database of fusion sequences encompassing bioinformatics analysis of mRNA and expressed sequence tag (EST) sequences in GenBank, manual collection of literature data and integration with other known databases such as OMIM. Our bioinformatics analysis identifies the fusion transcripts that have non-overlapping alignments at multiple genomic loci. Fusion events at exon-exon borders are selected to filter out the cloning artifacts in cDNA library preparation. The result is classified into two groups--genuine chromosome translocation and fusion between neighboring genes owing to intergenic splicing. We also integrated manually collected literature and OMIM data for chromosome translocation as an aid to assess the validity of each fusion event. The database is available at http://genome.ewha.ac.kr/ChimerDB/ for human, mouse and rat genomes.

Year:  2006        PMID: 16381848      PMCID: PMC1347382          DOI: 10.1093/nar/gkj019

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Chromosomal aberrations are frequently observed in many hematologic and solid tumors (1,2). Various large-scale and high-throughput techniques, such as chromosome banding (1,3), comparative genomic hybridization (CGH) (4) and fluorescence in situ hybridization (FISH) (5), are being used in modern cancer cytogenetics to detect structural and copy number changes in chromosomes. The most common type of mutation among the known cancer genes is chromosomal translocation (6). A translocation can deregulate gene expression by disrupting the promoter region of the gene or by joining the gene with enhancer elements such as those of the immunoglobulin or T-cell receptor genes (7). Alternatively, fusion of two coding regions creates a chimeric gene that encodes a fusion protein that interferes with the normal regulatory pathways (2,8). The most famous example is the fusion protein BCR–ABL, the target of the drug Gleevec used to treat chronic myeloid leukemia (CML) (8,9). CML is associated in most cases with a chromosomal translocation between chromosomes 9 and 22 that creates the Philadelphia chromosome. The BCR gene on chromosome 22 is fused with the ABL gene on chromosome 9 in the so-called t(9;22)(q34;q11) translocation. The tyrosine kinase activity of ABL is constantly activated by the BCR gene (GTPase activator) in the fusion protein, resulting in rapid cell division and an inability of the cell to undergo apoptosis. Gleevec inhibits the tyrosine kinase activity of the BCR–ABL fusion protein. The successful development of Gleevec opened an era of targeted molecular therapy.
Chimeric sequences can be generated by other mechanisms too. Two adjacent, independent genes may be co-transcribed and the intergenic region spliced out, so that the resulting fused transcript possesses exons from both genes (10). This phenomenon, termed cotranscription and intergenic splicing (CoTIS), can lead to a fusion protein or an altered promoter region for the downstream gene in the same way as chromosome translocation.
Furthermore, trans-splicing can join two independently transcribed mRNA sequences at canonical exon–exon borders. Even though several cases of natural trans-splicing have been reported in humans (11,12), it is generally considered to be rare and will be ignored in this work.
Given the importance of chromosome aberration in cancer detection, prognosis and target identification, it is quite natural to search for chimeric sequences in GenBank. Alberti and co-workers (13) developed a screening procedure to identify heterologous, spliced mRNAs with potential origin from chromosomal translocation, mRNA trans-splicing and multi-locus transcription. Hahn et al. (14) extended this approach to include expressed sequence tag (EST) sequences, which expanded the search scope significantly. They experimentally verified the predicted IRA1–RGS17 fusion in the breast cancer cell line MCF7. However, they deliberately discarded fusion cases between neighboring genes.
Curated databases are also available from the cancer cytogenetics community. NCBI's database of cancer chromosomes (15) integrated the NCI/NCBI SKY/M-FISH and CGH database and the NCI Mitelman Database of Chromosome Aberrations in Cancers. The Cancer Genome Project at the Sanger Institute maintains a list of cancer genes based on the published scientific literature (6). Mutation data and associated information for these cancer genes are stored in the COSMIC database (16). The ‘Atlas of Genetics and Cytogenetics in Oncology and Haematology’ is a peer-reviewed resource that contains concise and updated cards on genes involved in cancer, cytogenetics and clinical entities in oncology, and cancer-prone diseases (17).
In this paper, we describe a new database of fusion genes, ChimerDB. It aims to be a knowledgebase that integrates bioinformatics analysis of transcript sequences (mRNA and EST), literature data from scientific journals (6) and OMIM data on translocation (18). It should be a valuable resource for developing cancer biomarkers and drug targets.

DATABASE CONSTRUCTION

In silico identification of fusion transcripts

All mRNA and EST sequences in GenBank (Release 148; June 15, 2005) were aligned onto the human genome (NCBI Build 35) using the BLAT program (19). The minimum length and percent identity of valid alignments were 100 bp and 93%, respectively. Transcripts with two non-overlapping, contiguous alignments were selected as fusion candidates. A small overlap (<10 bp) was allowed owing to uncertainty in BLAT alignments. Alignments on the same chromosome were required to be in opposite orientations to exclude fusion by CoTIS. We found 261 mRNA and 2484 EST sequences as fusion candidates, including artificial chimeras created by accidental ligation of different cDNAs during the cloning procedure. Genuine and artificial chimeras can be distinguished by examining the fusion boundaries. Fusion points in true chimeras usually coincide with a splice site, since chromosome breakage tends to take place in long intronic regions rather than in short exons (14). Allowing a 10 bp deviation from known splice sites, we obtained 159 mRNA and 258 EST sequences as reliable fusion transcripts. They constitute 355 fusion cases involving 638 genes. Fusion cases owing to CoTIS can be identified using the ECgene clustering system (20,21). ECgene clusters sequences that share any splice sites in the genomic alignment, taking gene orientation into consideration. Fusion transcripts therefore cause two neighboring genes to be joined into a single cluster in the ECgene system. Accordingly, we searched for ECgene clusters (Version 1.2) that contained two non-overlapping known genes and identified the fusion transcripts within them. We found 223 mRNA and 396 EST sequences encoding 337 cases of CoTIS. Fusion by CoTIS creates a subtle problem in the genome-based EST clustering procedure that groups sequences sharing any splice sites: such transcripts should be identified and removed in advance.
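The selection criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Alignment` record and all function names are hypothetical, and only the numeric thresholds (100 bp, 93%, <10 bp overlap, 10 bp splice-site tolerance) come from the text. Contiguity of the two alignments on the transcript is not checked, because the text gives no threshold for it.

```python
from dataclasses import dataclass

MIN_LEN = 100        # minimum alignment length in bp (from the text)
MIN_IDENTITY = 93.0  # minimum percent identity (from the text)
MAX_OVERLAP = 10     # overlaps shorter than this are tolerated (from the text)
SPLICE_TOL = 10      # allowed deviation from a known splice site (from the text)


@dataclass(frozen=True)
class Alignment:
    """One aligned block of a transcript (hypothetical record layout)."""
    chrom: str
    strand: str      # "+" or "-"
    q_start: int     # 0-based start on the transcript
    q_end: int       # exclusive end on the transcript
    identity: float  # percent identity


def is_valid(aln):
    """Apply the length and identity cutoffs to a single alignment."""
    return (aln.q_end - aln.q_start) >= MIN_LEN and aln.identity >= MIN_IDENTITY


def query_overlap(a, b):
    """Number of transcript bases covered by both alignments (0 if disjoint)."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start))


def is_fusion_candidate(a, b):
    """True if two alignments of one transcript pass the candidate filters."""
    if not (is_valid(a) and is_valid(b)):
        return False
    if query_overlap(a, b) >= MAX_OVERLAP:
        return False
    # Same-chromosome pairs must be in opposite orientation (excludes CoTIS).
    if a.chrom == b.chrom and a.strand == b.strand:
        return False
    return True


def near_splice_site(fusion_point, splice_sites, tol=SPLICE_TOL):
    """True if the genomic fusion point lies within tol bp of a known splice site."""
    return any(abs(fusion_point - site) <= tol for site in splice_sites)
```

A transcript would count as a reliable fusion only if `is_fusion_candidate` holds for its two alignments and `near_splice_site` holds for the fusion boundary, which is how the cloning artifacts are filtered out in this sketch.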

Literature database

Journal publication is the single most important source of scientific knowledge. A PubMed search for publications reporting fusion events owing to chromosome translocation gave 2945 manuscripts. Manual inspection of the abstracts produced 254 fusion cases involving 286 genes. We also imported the list of cancer genes associated with chromosome translocation from the Cancer Genome Project at the Sanger Institute (6). The current cancer gene census contains 257 translocation cases involving 346 genes, most of which coincide with our PubMed search result. The OMIM database is another knowledgebase of human genes and genetic disorders (18). We searched the OMIM database for records with chromosome translocations. Manual inspection of ∼850 records gave 320 translocation cases with 597 genes. Literature databases should greatly extend the utility of the fusion database by providing literature support and relevant medical information for each computationally identified event.

Database integration

One of the major problems in dealing with heterogeneous databases, especially the literature data, is the use of aliases for the same gene. This is a source of redundant and fragmented entries. All records use the official HUGO gene symbol to avoid this problem. The alias field in the Entrez Gene database (22) was used to deal with different names for the same gene. In silico results from transcript mapping, literature and OMIM data were all integrated according to these official gene symbols. Furthermore, Mitelman's recurrent aberration database and the Atlas Chromosomes in Cancer data were also integrated into ChimerDB. Table 1 gives the summary statistics of ChimerDB. Currently, ChimerDB contains 1258 fusion cases that involve 1777 genes, 381 mRNA and 654 EST sequences. Assuming that the total number of human genes is ∼25 000, this implies that ∼4.4% of human genes are involved in chromosome translocation and another ∼2.7% of human genes show fusion between neighboring genes (CoTIS). It should be noted that the overlap between the transcript mapping data and other known databases is not large. This suggests that the majority of known chromosome translocations are not supported by transcript data, such as mRNA and EST. Unless the relevant transcripts were discarded owing to low alignment quality, these translocations were presumably identified by non-sequence-based methods, and it would be interesting to obtain clone-based sequence data for those cases.
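The alias problem can be illustrated with a short Python sketch that maps every gene name to an official symbol before records from different sources are merged. The alias table below is a tiny hand-written subset chosen for illustration (it is not the Entrez Gene alias field itself), and the function names and record layout are hypothetical.

```python
# Illustrative subset only: official symbol -> known aliases.
ALIASES = {
    "RUNX1": ["AML1", "CBFA2"],
    "RUNX1T1": ["ETO", "MTG8"],
    "ABL1": ["ABL"],
    "BCR": [],
}


def build_lookup(aliases):
    """Map every official symbol and alias (case-insensitively) to the official symbol."""
    lookup = {}
    for official, names in aliases.items():
        lookup[official.upper()] = official
        for name in names:
            lookup[name.upper()] = official
    return lookup


def normalize_pair(pair, lookup):
    """Replace each gene name in a fusion pair by its official symbol, if known."""
    return tuple(lookup.get(gene.upper(), gene) for gene in pair)


def merge_records(records, lookup):
    """Group (source, gene_pair) records by normalized gene pair."""
    merged = {}
    for source, pair in records:
        merged.setdefault(normalize_pair(pair, lookup), set()).add(source)
    return merged


lookup = build_lookup(ALIASES)
records = [
    ("PubMed", ("AML1", "ETO")),
    ("OMIM", ("RUNX1", "RUNX1T1")),
    ("Transcript mapping", ("BCR", "ABL")),
]
merged = merge_records(records, lookup)
```

Without normalization, the first two records would appear as two separate fusion cases; after it, both fall under the single key `("RUNX1", "RUNX1T1")` with two supporting sources.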
Table 1

Summary statistics of ChimerDB

Data source               Fusion cases   Genes (a)   mRNA   EST
Transcript mapping (b)
    Translocation         355            638         159    258
    CoTIS                 337            674         223    396
PubMed literature         254            286 (76)
Sanger CGP                257            346 (80)
OMIM records              320            597 (66)
Mitelman breakpoint       144            158 (54)
Atlas chromosomes                        307 (61)
Total (non-redundant)     1258           1777        381    654
    Known genes           1009           1528
    EST clusters (c)      249            249

(a) Numbers in parentheses indicate genes in common with the translocation data.

(b) Transcript mapping data include EST clusters as well as known genes.

(c) EST clusters come from 151 translocation and 98 CoTIS cases.
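The percentages quoted in the Database integration section can be reproduced from the gene counts in Table 1 under one plausible reading: the CoTIS share uses the 674 CoTIS genes, and the translocation share uses the remaining genes of the non-redundant total. That split is an assumption made here for illustration; the text does not state how the figures were derived. The sketch also checks that the sub-rows of the total add up.

```python
TOTAL_HUMAN_GENES = 25_000  # approximate gene count assumed in the text

genes_total = 1777   # Table 1, Total (non-redundant), Genes
genes_cotis = 674    # Table 1, CoTIS, Genes
# Assumption: every gene not counted under CoTIS is attributed to translocation.
genes_translocation = genes_total - genes_cotis

pct_translocation = 100 * genes_translocation / TOTAL_HUMAN_GENES
pct_cotis = 100 * genes_cotis / TOTAL_HUMAN_GENES

# Internal consistency of the "Total (non-redundant)" row.
cases_consistent = (1009 + 249 == 1258)   # known genes + EST clusters, fusion cases
genes_consistent = (1528 + 249 == 1777)   # known genes + EST clusters, genes

print(f"translocation: {pct_translocation:.1f}%  CoTIS: {pct_cotis:.1f}%")
```

Under this reading the values round to the ∼4.4% and ∼2.7% given in the text.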

WEB INTERFACE

The contents of ChimerDB can be accessed at http://genome.ewha.ac.kr/ChimerDB/. It supports various types of queries, such as gene name and cytogenetic band position. A query can be a breakpoint (e.g. AML1) or a fusion event (e.g. BCR–ABL1). We also support searches by site and/or diagnosis, as in NCBI Cancer Chromosomes. The search result page shows all relevant fusion cases with the available types of data. The details page for a specific fusion case consists of a summary table, a detailed information table and a fusion transcript table, as shown in Figure 1A. It includes extensive links to relevant resources, such as the Entrez Gene, OMIM and PubMed databases. Links to NCBI Cancer Chromosomes provide detailed information from the SKY/M-FISH and CGH and Mitelman databases, the primary databases for cancer cytogenetics. Links to the Atlas of Genetics and Cytogenetics in Oncology and Haematology database allow access to community efforts to annotate cancer genes, which are rich in cytogenetic and clinical information. The transcript table in Figure 1A shows the tissue and pathology information for EST sequences. It also describes properties of the fusion: transcript direction, aligned region, number of exons, deviation of the fusion boundary from a known splice site and so on. Intact and affected domains before and after translocation are also summarized using the InterPro database (23).
Figure 1

(A) Part of the output page from ChimerDB. (B) Custom genome browser for RUNX1T1 genomic locus to visualize fusion transcripts. Red dot indicates the fusion point.

Figure 1B is the custom genome browser showing alignment of fusion transcripts in each gene. Breakpoints and fusion partner genes can be immediately recognized in the viewer. It also shows the position of functional domains present in the gene. Most fusion genes owing to CoTIS do not have detailed information on their functional significance yet. Therefore, we simply provide the minimal information—fused genes, genomic locus, functional domains, alignment browser and exon/intron properties.

FUTURE DIRECTIONS

ChimerDB is an integrated database for fusion sequences that includes bioinformatics analysis, literature data and OMIM data. However, the functional significance of fusion events should be examined thoroughly before these events can serve as drug targets for cancer treatment. Expression analysis of fused transcripts in different histological and pathological conditions should be performed together with bioinformatics analysis of features such as domain and promoter changes, frame shifts and so on. An integrative approach that combines high-throughput techniques, such as SKY, CGH, SNP chip, microarray, proteomics, interaction and pathway analysis, would prove powerful in elucidating the functional significance of fusion genes. ChimerDB will continue to integrate relevant publicly available data.
  23 in total

1.  Finding fusion genes resulting from chromosome rearrangement by analyzing the expressed sequence databases.

Authors:  Yoonsoo Hahn; Tapan Kumar Bera; Kristen Gehlhaus; Ilan R Kirsch; Ira H Pastan; Byungkook Lee
Journal:  Proc Natl Acad Sci U S A       Date:  2004-08-23       Impact factor: 11.205

2.  ECgene: genome-based EST clustering and gene modeling for alternative splicing.

Authors:  Namshin Kim; Seokmin Shin; Sanghyuk Lee
Journal:  Genome Res       Date:  2005-04       Impact factor: 9.043

3.  Differential expression of the translocated and the untranslocated c-myc oncogene in Burkitt lymphoma.

Authors:  A ar-Rushdi; K Nishikura; J Erikson; R Watt; G Rovera; C M Croce
Journal:  Science       Date:  1983-10-28       Impact factor: 47.728

4.  Prevalence estimates of recurrent balanced cytogenetic aberrations and gene fusions in unselected patients with neoplastic disorders.

Authors:  Felix Mitelman; Fredrik Mertens; Bertil Johansson
Journal:  Genes Chromosomes Cancer       Date:  2005-08       Impact factor: 5.006

5.  Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors.

Authors:  A Kallioniemi; O P Kallioniemi; D Sudar; D Rutovitz; J W Gray; F Waldman; D Pinkel
Journal:  Science       Date:  1992-10-30       Impact factor: 47.728

Review 6.  A census of human cancer genes.

Authors:  P Andrew Futreal; Lachlan Coin; Mhairi Marshall; Thomas Down; Timothy Hubbard; Richard Wooster; Nazneen Rahman; Michael R Stratton
Journal:  Nat Rev Cancer       Date:  2004-03       Impact factor: 60.716

Review 7.  Chromosome aberrations in solid tumors.

Authors:  Donna G Albertson; Colin Collins; Frank McCormick; Joe W Gray
Journal:  Nat Genet       Date:  2003-08       Impact factor: 38.330

8.  Detection and analysis of spliced chimeric mRNAs in sequence databanks.

Authors:  Antonello Romani; Emanuela Guerra; Marco Trerotola; Saverio Alberti
Journal:  Nucleic Acids Res       Date:  2003-02-15       Impact factor: 16.971

9.  InterPro, progress and status in 2005.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Alex Bateman; David Binns; Paul Bradley; Peer Bork; Phillip Bucher; Lorenzo Cerutti; Richard Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Wolfgang Fleischmann; Julian Gough; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; David Lonsdale; Rodrigo Lopez; Ivica Letunic; Martin Madera; John Maslen; Jennifer McDowall; Alex Mitchell; Anastasia N Nikolskaya; Sandra Orchard; Marco Pagni; Chris P Ponting; Emmanuel Quevillon; Jeremy Selengut; Christian J A Sigrist; Ville Silventoinen; David J Studholme; Robert Vaughan; Cathy H Wu
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

10.  The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.

Authors:  S Bamford; E Dawson; S Forbes; J Clements; R Pettett; A Dogan; A Flanagan; J Teague; P A Futreal; M R Stratton; R Wooster
Journal:  Br J Cancer       Date:  2004-07-19       Impact factor: 7.640

