
AVATAR: a database for genome-wide alternative splicing event detection using large scale ESTs and mRNAs.

Fang Rong Hsu, Hwan-You Chang, Yaw-Lin Lin, Yin-Te Tsai, Hui-Ling Peng, Ying Tsong Chen, Chia Yang Cheng, Min Yao Shih, Chia-Hung Liu, Chin-Feng Chen.

Abstract

In recent years, the identification of alternative splicing (AS) variants has been gaining momentum. We developed AVATAR, a database documenting AS, using 5,469,433 human EST sequences and 26,159 human mRNA sequences. AVATAR contains 12,000 alternative splicing sites identified by mapping ESTs and mRNAs to the whole human genome sequence. AVATAR also contains AS information for 6 eukaryotes. We mapped EST alignment information into a graph model in which exons and introns are represented by vertices and edges, respectively. AVATAR can be queried using search parameters such as (1) gene name, (2) number of identified AS events in a gene, and (3) minimal number of ESTs supporting a splice site. The system provides visualized AS information for queried genes. AVAILABILITY: The database is available for free at http://avatar.iecs.fcu.edu.tw/


Year:  2005        PMID: 17597845      PMCID: PMC1891626          DOI: 10.6026/97320630001016

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Alternative splicing (AS) is an important mechanism for functional diversity in eukaryotic cells. AS allows one pre-mRNA to be processed into different transcripts within a cell type, resulting in protein diversity, with each protein having a distinct function. [1–3] To detect AS variants, we used mRNA sequences and ESTs (short, single-pass cDNA sequences generated in a high-throughput manner from randomly selected library clones drawn from different tissues, individuals and conditions). The detected variants (from 5,469,433 EST and 26,159 mRNA sequences) were stored in a database called AVATAR. Although AS databases are available in the public domain, few contain AS information for multiple eukaryotes (a comparison is summarized on the AVATAR web site). We therefore developed AVATAR to contain AS information for six eukaryotes. Here, we describe the development of AVATAR, its content and its utility.

Methodology

Dataset used

The dbEST database (Jan 16, 2004) at NCBI contains nearly 5.4 million human EST sequences, and this dataset was used in the current analysis. [4] The human genome sequence (CONTIG build 3.4) was obtained in GenBank format from NCBI. [5] Gene information and mRNA sequences were downloaded from the NCBI RefSeq project.

Identification of AS

The identification of AS in AVATAR is performed in three steps (described below) as illustrated in Figure 1.
Figure 1

Process of EST alignment and screening. (a) Search for the genomic location of each EST. (b) Screen for ESTs with alignment scores greater than 94%. (c) Group introns whose splice sites match within 3 bp. (d) Detect AS sites.

Step 1: Alignment of EST and mRNA with the genome sequence

EST sequences were aligned to the whole genome sequence using Mugup. [6] Mugup is a sequence alignment program developed for the Windows platform. This procedure identified splice sites in the ESTs (Figure 1 panels A and B). The matched regions and gaps correspond to exons and introns, respectively. EST and mRNA alignments with scores greater than 94% were used for further analysis.
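As an illustration of this step, the following minimal Python sketch filters alignments by the 94% score threshold and reads introns off as the gaps between matched blocks. It is not Mugup's code or output format: the `Alignment` record, its field names, and the inclusive coordinate convention are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (genomic start, genomic end), inclusive; a convention assumed for this sketch
Block = Tuple[int, int]


@dataclass
class Alignment:
    """Hypothetical record for one EST or mRNA aligned to the genome."""
    seq_id: str
    chrom: str
    identity: float       # alignment score as a percentage, 0-100
    blocks: List[Block]   # matched regions (putative exons), sorted by start


def passes_threshold(aln: Alignment, min_identity: float = 94.0) -> bool:
    """Keep only alignments scoring strictly greater than the threshold."""
    return aln.identity > min_identity


def introns_from_blocks(blocks: List[Block]) -> List[Block]:
    """Treat each gap between consecutive matched blocks as a putative intron."""
    introns = []
    for (_, end_a), (start_b, _) in zip(blocks, blocks[1:]):
        if start_b > end_a + 1:  # skip blocks that touch (no gap between them)
            introns.append((end_a + 1, start_b - 1))
    return introns


aln = Alignment("EST1", "chr1", 96.5, [(100, 200), (301, 400), (501, 600)])
if passes_threshold(aln):
    print(introns_from_blocks(aln.blocks))
```

A real pipeline would also need to handle strand and small alignment gaps that are sequencing errors rather than introns; the sketch ignores both.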

Step 2: Clustering EST and mRNA

ESTs and mRNAs were clustered according to their location in the genome (Figure 1 panel C). ESTs and mRNAs with overlapping regions were then assembled together.
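Clustering by genomic location can be sketched as a sweep over sorted intervals. The Python example below is only an illustration under stated assumptions: each sequence is reduced to a single hypothetical `(seq_id, chrom, start, end)` span, strand is ignored, and any positional overlap on the same chromosome places two sequences in the same cluster. The text does not specify the actual clustering rule used in AVATAR.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (seq_id, chrom, start, end): a hypothetical summary of one aligned sequence
Span = Tuple[str, str, int, int]


def cluster_by_overlap(spans: List[Span]) -> List[List[str]]:
    """Group sequence IDs whose genomic spans overlap on the same chromosome."""
    by_chrom: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        by_chrom[span[1]].append(span)

    clusters: List[List[str]] = []
    for chrom in sorted(by_chrom):
        current: List[str] = []
        current_end = -1
        # Sweep left to right; a span starting past the running end opens a new cluster.
        for seq_id, _, start, end in sorted(by_chrom[chrom], key=lambda s: (s[2], s[3])):
            if current and start > current_end:
                clusters.append(current)
                current = []
            current_end = end if not current else max(current_end, end)
            current.append(seq_id)
        if current:
            clusters.append(current)
    return clusters


spans = [
    ("est_a", "chr1", 100, 500),
    ("est_b", "chr1", 400, 900),
    ("mrna_c", "chr1", 1000, 1200),
    ("est_d", "chr2", 100, 200),
]
print(cluster_by_overlap(spans))
```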

Step 3: Detection of AS sites

Mapping the EST-to-genome alignments onto intron positions helps to identify skipped and included exons.
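One way to picture this step is sketched below in Python. The 3 bp tolerance comes from the Figure 1 caption; everything else is an assumption for illustration: the function names are hypothetical, coordinates are inclusive, grouping is a simple greedy pass that keeps the most frequently observed intron as the representative, and an exon is called skipped when one intron spans exactly from the start of a second intron to the end of a third. This is not presented as the procedure implemented in AVATAR.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Intron = Tuple[int, int]  # (first intronic base, last intronic base), inclusive


def group_introns(introns: List[Intron], tol: int = 3) -> Dict[Intron, int]:
    """Greedily merge introns whose two boundaries both lie within `tol` bp
    of a representative intron; return representative -> number of observations."""
    groups: Dict[Intron, int] = {}
    # Visit the most frequent introns first so they become the representatives.
    ordered = sorted(Counter(introns).items(), key=lambda kv: (-kv[1], kv[0]))
    for (start, end), n in ordered:
        for rep in groups:
            if abs(rep[0] - start) <= tol and abs(rep[1] - end) <= tol:
                groups[rep] += n
                break
        else:
            groups[(start, end)] = n
    return groups


def find_skipped_exons(introns: Iterable[Intron]) -> List[Tuple[Intron, Intron]]:
    """Return (spanning intron, skipped exon) pairs.

    An exon is reported when a long intron shares its start with one shorter
    intron and its end with another, leaving an exon between the two."""
    uniq = sorted(set(introns))
    events = []
    for long in uniq:
        for left in uniq:
            for right in uniq:
                if (left[0] == long[0] and right[1] == long[1]
                        and left[1] + 1 < right[0]):
                    events.append((long, (left[1] + 1, right[0] - 1)))
    return events


observed = [(201, 300), (202, 301), (401, 500), (201, 500)]
grouped = group_introns(observed)
print(grouped)
print(find_skipped_exons(grouped))
```

The triple loop is cubic in the number of introns per cluster, which is acceptable for a sketch; indexing introns by start and end position would be the obvious refinement.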

Searching AVATAR

AVATAR can be queried using keywords, including accession number, gene name, gene isoform, gene location, cytogenetic location, chromosome number and number of AS events. A database search produces AS visualizations for the queried gene.

Utility to the Biological Community

AVATAR is a collection of AS information for 6 eukaryotic organisms, all of which can be queried simultaneously. It can also be searched by gene name and by desired number of AS events. EST sequences are error-prone, which can result in the detection of aberrant transcripts. AVATAR therefore uses the frequency of EST alignments at a specific site to improve detection.
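The idea of using alignment frequency to guard against error-prone ESTs can be illustrated with a small Python sketch. It assumes that each EST has already been reduced to a list of intron coordinates and that a splice site is simply an intron boundary position; the function names and the default cutoff of 2 are hypothetical and not taken from AVATAR.

```python
from collections import Counter
from typing import Dict, List, Tuple

Intron = Tuple[int, int]  # (first intronic base, last intronic base), inclusive


def site_support(est_introns: Dict[str, List[Intron]]) -> Counter:
    """Count, for every intron boundary position, how many distinct ESTs contain it."""
    support: Counter = Counter()
    for introns in est_introns.values():
        # A set ensures each EST is counted at most once per position.
        sites = {pos for intron in introns for pos in intron}
        support.update(sites)
    return support


def well_supported(support: Counter, min_ests: int = 2) -> List[int]:
    """Keep boundary positions supported by at least `min_ests` ESTs."""
    return sorted(pos for pos, n in support.items() if n >= min_ests)


est_introns = {
    "est_1": [(201, 300)],
    "est_2": [(201, 300), (401, 500)],
    "est_3": [(201, 350)],   # the end position 350 is seen in one EST only
}
print(well_supported(site_support(est_introns), min_ests=2))
```

Raising `min_ests` trades sensitivity for specificity, which mirrors the "minimal number of ESTs supporting a splicing site" search parameter described for the database.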

Caveats

AS information on paralogous genes in eukaryotic genomes is not included in AVATAR because of the difficulty of identifying their corresponding chromosomal locations using EST sequences.

Future developments

New EST sequences are generated in laboratories every day, so keeping AS databases updated as genome and mRNA sequences grow is time-consuming. We are therefore developing a computer agent that can update AVATAR automatically. We also plan to include tumor-specific AS data.
Table 1

Database statistics

Organism                   Exon skipping   3’ AS   5’ AS   Total
Homo sapiens               5800            3227    3213    12240
Mus musculus               2772            1504    1488    5764
Rattus norvegicus          158             145     162     465
Drosophila melanogaster    8               100     106     214
Caenorhabditis elegans     7               50      63      120
Arabidopsis thaliana       2               59      76      137
References

Review 1.  Alternative splicing: increasing diversity in the proteomic world.

Authors:  B R Graveley
Journal:  Trends Genet       Date:  2001-02       Impact factor: 11.639

Review 2.  Alternative RNA splicing in the nervous system.

Authors:  P J Grabowski; D L Black
Journal:  Prog Neurobiol       Date:  2001-10       Impact factor: 11.685

Review 3.  Alternative splicing: a ubiquitous mechanism for the generation of multiple protein isoforms from single genes.

Authors:  R E Breitbart; A Andreadis; B Nadal-Ginard
Journal:  Annu Rev Biochem       Date:  1987       Impact factor: 23.643
