
CEPiNS: Conserved Exon Prediction in Novel Species.

Shihab Hasan, Christopher W Wheat.

Abstract

Exon structure is relatively well conserved among orthologs in several large clades of species (e.g. Mammalia, Diptera, Lepidoptera) across evolutionary distances of up to 80 million years. Thus, it should be straightforward to predict the exon structures in novel species based upon the known exon structures of species that have had their genomes sequenced and well assembled. Being able to predict the exon boundaries in the genes of novel species is important given the quickly growing numbers of transcriptome sequencing projects. CEPiNS is a new pipeline for mining exon boundaries of predicted gene sets from model species and then using this information to identify the exon boundaries in a novel species through codon based alignment. The pipeline uses the freeware SPIDEY, an exon boundary prediction tool, and BLAST (BLASTN, BLASTP, TBLASTX), both of which are part of NCBI's toolkit. CEPiNS provides an important tool to analyze the transcriptome of novel species.


Keywords:  Bioinformatics Software; Evolutionary and Comparative genomics; Exon prediction; Gene structure; Model species; Novel species; Transcriptomics

Year:  2013        PMID: 23519394      PMCID: PMC3602892          DOI: 10.6026/97320630009210

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Genes contain exons, which are regions coding for proteins, and introns, which are non-coding regions. During RNA processing, introns are removed by splicing and the exons are joined together to form the functional messenger RNA (mRNA) [1]. Accurate prediction of exon–intron boundaries in genes is an essential step in the analysis of genomic sequences [2]. For the majority of genes, this gene structure is conserved between closely related species [3]. Over evolutionary time, conserved gene structure may be a record of core events [4]. The aim of this project is to develop a new pipeline that predicts exon sequences and their boundaries in a novel species by comparison with a model species, using a sequence similarity method.

Methodology

CEPiNS is a bioinformatics tool for large-scale exon prediction. It supports the study of gene structure in model and novel species by predicting exon boundaries and sequences. Given a set of gene sequences and the corresponding genomic sequences for a model species, CEPiNS generates a table of exon boundaries for these genes. CEPiNS then uses BLAST [5] to identify orthologs between the genes of the reference species and the predicted genes from the assembled transcriptome of a novel species. Once orthology has been established, the exons of the genomic reference species can be transferred to the novel species. The output is therefore the set of predicted exons in the novel species. The workflow for CEPiNS is illustrated in Figure 1.

Figure 1: Workflow in CEPiNS. CEPiNS uses the exon table obtained from SPIDEY to select the exon boundaries in the cDNA of the GRS. The exon sequences of the GRS are then used to find exon boundaries in the novel species. GRS = Genomic Reference Species; WGS = Whole Genome Sequence.

Preprocessing of Dataset:

CEPiNS includes a preprocessing tool that removes alternative splice variants and predicts orthologous sequences using BLAST. For both the model and novel species, multiple copies of the same gene within a genome are removed by identifying sequences with at least 95% similarity at the nucleotide level using BLASTN and retaining only the longest. Gene sequences in the model and novel species that share at least 60% similarity at the protein level, as determined by BLASTP, are treated as orthologous.
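
The two thresholds above can be illustrated with a minimal Python sketch. It is not the CEPiNS implementation: it assumes the BLAST hits have already been reduced to (query ID, subject ID, percent identity) tuples, which correspond to the first three columns of BLAST tabular output. The function names, the greedy longest-first strategy and the best-hit rule for orthologs are assumptions made for illustration.

```python
from collections import defaultdict


def remove_redundant(lengths, hits, min_identity=95.0):
    """Keep the longest sequence among near-identical copies (greedy sketch).

    lengths: dict mapping sequence ID -> sequence length
    hits: iterable of (query_id, subject_id, percent_identity) tuples
          from a self-versus-self BLASTN search (assumed input shape)
    """
    similar = defaultdict(set)
    for query, subject, pident in hits:
        if query != subject and pident >= min_identity:
            similar[query].add(subject)
            similar[subject].add(query)

    kept = []
    kept_set = set()
    # Visit sequences longest first; ties are broken by ID for determinism.
    for seq_id in sorted(lengths, key=lambda s: (-lengths[s], s)):
        # Drop a sequence if a longer, near-identical one was already kept.
        if any(other in kept_set for other in similar[seq_id]):
            continue
        kept.append(seq_id)
        kept_set.add(seq_id)
    return kept


def call_orthologs(hits, min_identity=60.0):
    """Map each query to its highest-identity BLASTP hit above the threshold.

    Choosing only the single best hit is an assumption of this sketch.
    """
    best = {}
    for query, subject, pident in hits:
        if pident < min_identity:
            continue
        if query not in best or pident > best[query][1]:
            best[query] = (subject, pident)
    return {query: subject for query, (subject, _) in best.items()}
```

For example, `remove_redundant({"a": 1000, "b": 900}, [("a", "b", 97.0)])` keeps only `"a"`, because `"b"` is shorter and at least 95% identical to it.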

Exon Prediction in Model Species:

SPIDEY is an mRNA-to-genomic DNA alignment program [6]. When intron-exon boundaries are not already annotated in the reference (model) species, SPIDEY determines the exon boundaries from a set of mRNA or cDNA sequences and their corresponding genomic sequences. From the table created by SPIDEY, CEPiNS generates a table containing the gene ID, the genomic sequence ID, the exon boundaries in both the gene and the genomic sequence, and the length of each exon. It also creates a file of exon sequences in FASTA format.
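
As a rough illustration of how an exon table can be turned into exon sequences, the Python sketch below slices each exon out of its cDNA and formats the result as FASTA. The row layout (gene ID, exon number, start, end, with 1-based inclusive mRNA coordinates), the `_exonN` naming scheme and the function names are assumptions for this example and do not reproduce SPIDEY's or CEPiNS's actual file formats.

```python
def extract_exons(cdna, exon_rows):
    """Slice exon sequences out of cDNA sequences.

    cdna: dict mapping gene ID -> cDNA sequence
    exon_rows: iterable of (gene_id, exon_number, start, end) tuples,
               with 1-based inclusive mRNA coordinates (assumed layout)
    """
    records = []
    for gene_id, exon_number, start, end in exon_rows:
        # Convert 1-based inclusive coordinates to a Python slice.
        sequence = cdna[gene_id][start - 1:end]
        records.append((f"{gene_id}_exon{exon_number}", sequence))
    return records


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping long lines."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

The hypothetical exon names carry the gene ID, so that each exon sequence can later be traced back to its gene of origin.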

Exon Prediction in Novel Species:

CEPiNS uses TBLASTX to align, at the amino acid level and in all six reading frames, each predicted gene from the transcriptome assembly of the novel species against the corresponding exon sequences of the reference species, which were generated in the SPIDEY step. CEPiNS creates an output table with the cDNA ID of the reference species, the cDNA ID of the novel species, exon number, genomic coordinates, mRNA coordinates, length and percent identity. It also creates a FASTA file of exon sequences for the novel species.
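
The step from alignment hits to an exon table might look like the following Python sketch. It assumes that the reference exons were used as queries and the novel transcripts as subjects, and that each TBLASTX hit has been reduced to the fields of the hypothetical `Hit` tuple below. Keeping only the highest-scoring hit per exon and transcript is an assumption of this sketch, not a documented CEPiNS rule.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment hit (hypothetical container for this sketch)."""
    exon_id: str        # reference exon (query)
    transcript_id: str  # novel-species transcript (subject)
    pident: float       # percent identity
    sstart: int         # alignment start on the transcript
    send: int           # alignment end on the transcript
    bitscore: float


def predict_exons(hits):
    """Build an exon table for the novel species from alignment hits."""
    best = {}
    for hit in hits:
        key = (hit.exon_id, hit.transcript_id)
        # Keep the highest-scoring hit for each exon/transcript pair.
        if key not in best or hit.bitscore > best[key].bitscore:
            best[key] = hit

    rows = []
    for hit in best.values():
        # Reverse-frame hits report start > end, so normalise the order.
        start, end = sorted((hit.sstart, hit.send))
        rows.append({
            "exon": hit.exon_id,
            "transcript": hit.transcript_id,
            "start": start,
            "end": end,
            "length": end - start + 1,
            "pident": hit.pident,
        })
    # Order exons along each transcript.
    rows.sort(key=lambda row: (row["transcript"], row["start"]))
    return rows
```

Sorting by start position lists the predicted exons in the order in which they occur along each transcript.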

Software Input and Output

CEPiNS requires three inputs in FASTA format: a set of transcribed genes of a model species, the genomic sequence of the same species, and a predicted gene set from the transcriptome assembly of a novel species. The log screen tracks the steps performed, and results can be viewed by clicking the corresponding buttons. The final outputs for the novel species are the predicted exon boundaries in a text file and the exon sequences in a FASTA file.
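
For readers who want to inspect such inputs outside the GUI, a minimal FASTA reader can be sketched in Python as follows. The function name is hypothetical, and using the first whitespace-delimited token of each header line as the sequence ID is an assumption of this sketch.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of sequence ID -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence ID")
            name = fields[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first FASTA header")
        else:
            records[name].append(line)
    # Join the wrapped lines of each record into one sequence string.
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}
```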

Conclusion

CEPiNS is a package for large-scale exon prediction in novel species. It provides a graphical user interface (GUI) so that biologists, ecologists, geneticists and people from other backgrounds can use it easily. The output data offer several opportunities for further work and for the development of other tools.

References

1.  Spidey: a tool for mRNA-to-genomic alignments.

Authors:  S J Wheelan; D M Church; J M Ostell
Journal:  Genome Res       Date:  2001-11       Impact factor: 9.043

2.  Exon structure conservation despite low sequence similarity: a relic of dramatic events in evolution?

Authors:  M J Betts; R Guigó; P Agarwal; R B Russell
Journal:  EMBO J       Date:  2001-10-01       Impact factor: 11.598

3.  Cross-species sequence comparisons: a review of methods and available resources.

Authors:  Kelly A Frazer; Laura Elnitski; Deanna M Church; Inna Dubchak; Ross C Hardison
Journal:  Genome Res       Date:  2003-01       Impact factor: 9.043

4.  Basic local alignment search tool.

Authors:  S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal:  J Mol Biol       Date:  1990-10-05       Impact factor: 5.469

5.  Statistical features of human exons and their flanking regions.

Authors:  M Q Zhang
Journal:  Hum Mol Genet       Date:  1998-05       Impact factor: 6.150

6.  GeneAlign: a coding exon prediction tool based on phylogenetical comparisons.

Authors:  Shu Ju Hsieh; Chun Yuan Lin; Ning Han Liu; Wei Yuan Chow; Chuan Yi Tang
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971
