
Human-mouse gene identification by comparative evidence integration and evolutionary analysis.

Lingang Zhang, Vladimir Pavlovic, Charles R Cantor, Simon Kasif.

Abstract

Identifying genes in the human genome remains a challenge: predictions disagree substantially and vary dramatically depending on the gene-finding methodology used. Because the pattern of conservation in coding regions is expected to differ from that in intronic or intergenic regions, a comparative analysis against a reference genome, such as the mouse genome, can in principle improve the computational identification of human genes. However, this comparative methodology depends critically on three factors: (1) the choice of the most appropriate reference genome (in particular, it is not clear whether the mouse is at the right evolutionary distance from the human to provide sufficiently distinct conservation levels in different genomic regions); (2) the choice of comparative features that provide the most benefit to gene recognition; and (3) the choice of an evidence integration architecture that effectively interprets the comparative features. We address the first issue with a novel evolutionary analysis that explicitly correlates the performance of the gene recognition system with the evolutionary distance (time) between the two genomes. Our simulation results indicate that a wide range of reference genomes at different evolutionary time points appear to deliver reasonable comparative prediction of human genes. In particular, the evolutionary time between human and mouse generally falls in the region of good performance; however, better accuracy might be achieved with a reference genome more distant than mouse. To address the second issue, we propose several natural comparative measures of conservation for identifying exons and exon boundaries. Finally, we experiment with Bayesian networks for integrating comparative and compositional evidence.
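The first factor listed in the abstract, choosing a reference genome at a suitable evolutionary distance, can be illustrated with a toy model. The sketch below is not the authors' simulation: it swaps in the Jukes-Cantor substitution model and assumes hypothetical per-site rates for coding and noncoding DNA (`CODING_RATE`, `NONCODING_RATE`, and all function names are illustrative). It shows why distance matters: for a very close reference both region types look nearly identical, for a very distant one both look random, and the gap in expected identity, which is the signal a comparative gene finder exploits, peaks in between.

```python
import math

# Hypothetical substitution rates (substitutions per site per unit time).
# Illustrative assumptions only; not values taken from the paper.
CODING_RATE = 0.2
NONCODING_RATE = 1.0

JC_FACTOR = 4.0 / 3.0  # constant in the Jukes-Cantor identity formula


def expected_identity(rate: float, t: float) -> float:
    """Expected fraction of identical sites between two sequences separated
    by total branch length t, under the Jukes-Cantor model.

    The expected number of substitutions per site is rate * t.
    """
    return 0.25 + 0.75 * math.exp(-JC_FACTOR * rate * t)


def identity_gap(t: float) -> float:
    """Coding minus noncoding expected identity at divergence time t.
    A larger gap means conservation separates the two region types better."""
    return expected_identity(CODING_RATE, t) - expected_identity(NONCODING_RATE, t)


def best_divergence_time(r_coding: float, r_noncoding: float) -> float:
    """Divergence time maximizing the identity gap, from d(gap)/dt = 0:
    t* = ln(r_noncoding / r_coding) / (4/3 * (r_noncoding - r_coding))."""
    return math.log(r_noncoding / r_coding) / (JC_FACTOR * (r_noncoding - r_coding))


if __name__ == "__main__":
    t_star = best_divergence_time(CODING_RATE, NONCODING_RATE)
    for t in (0.05, 0.5, t_star, 5.0, 20.0):
        print(
            f"t={t:6.3f}  coding={expected_identity(CODING_RATE, t):.3f}  "
            f"noncoding={expected_identity(NONCODING_RATE, t):.3f}  "
            f"gap={identity_gap(t):.3f}"
        )
```

In this toy model the optimum depends only on the two assumed rates. Real genomes add rate heterogeneity, insertions and deletions, and rearrangements, so the curve is qualitative and says nothing about where the mouse actually falls.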

Year: 2003    PMID: 12743024    PMCID: PMC403647    DOI: 10.1101/gr.703903

Source DB: PubMed    Journal: Genome Res    ISSN: 1088-9051    Impact factor: 9.043
