MOTIVATION: Retrocopies are important genes in the genomes of almost all higher eukaryotes. However, the annotation of such genes is a non-trivial task. Intronless genes have often been considered to be retroposed copies of intron-containing paralogs. Such categorization relies on the implicit premise that alignable regions of the duplicates should be long enough to cover exon-exon junctions of the intron-containing genes, and thus intron loss events can be inferred. Here, we examined the alternative possibility that intronless genes could be generated by partial DNA-based duplication of intron-containing genes in the fruitfly genome. RESULTS: By building pairwise protein-, transcript- and genome-level DNA alignments between intronless genes and their corresponding intron-containing paralogs, we found that alignments do not cover exon-exon junctions in 40% of cases and thus no intron loss could be inferred. For these cases, the candidate parental proteins tend to be partially duplicated, and intergenic sequences or neighboring genes are included in the intronless paralog. Moreover, we observed that it is significantly less likely for these paralogs to show inter-chromosomal duplication and testis-dominant transcription, compared to the remaining 60% of cases with evidence of clear intron loss (retrogenes). These lines of analysis reveal that DNA-based duplication contributes significantly to the 40% of cases of single exon gene duplication. Finally, we performed an analogous survey in the human genome and the result is similar, wherein 34% of the cases do not cover exon-exon junctions. Thus, genome annotation for retrogene identification should discard candidates without clear evidence of intron loss. CONTACT: mlong@uchicago.edu; zhangy@uchicago.edu
MOTIVATION: Retrocopies are important genes in the genomes of almost all higher eukaryotes. However, the annotation of such genes is a non-trivial task. Intronless genes have often been considered to be retroposed copies of intron-containing paralogs. Such categorization relies on the implicit premise that alignable regions of the duplicates should be long enough to cover exon-exon junctions of the intron-containing genes, and thus intron loss events can be inferred. Here, we examined the alternative possibility that intronless genes could be generated by partial DNA-based duplication of intron-containing genes in the fruitfly genome. RESULTS: By building pairwise protein-, transcript- and genome-level DNA alignments between intronless genes and their corresponding intron-containing paralogs, we found that alignments do not cover exon-exon junctions in 40% of cases and thus no intron loss could be inferred. For these cases, the candidate parental proteins tend to be partially duplicated, and intergenic sequences or neighboring genes are included in the intronless paralog. Moreover, we observed that it is significantly less likely for these paralogs to show inter-chromosomal duplication and testis-dominant transcription, compared to the remaining 60% of cases with evidence of clear intron loss (retrogenes). These lines of analysis reveal that DNA-based duplication contributes significantly to the 40% of cases of single exon gene duplication. Finally, we performed an analogous survey in the human genome and the result is similar, wherein 34% of the cases do not cover exon-exon junctions. Thus, genome annotation for retrogene identification should discard candidates without clear evidence of intron loss. CONTACT: mlong@uchicago.edu; zhangy@uchicago.edu
Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney Journal: Genome Res Date: 2002-10 Impact factor: 9.043
Authors: Yong E Zhang; Maria D Vibranovski; Patrick Landback; Gabriel A B Marais; Manyuan Long Journal: PLoS Biol Date: 2010-10-05 Impact factor: 8.029
Authors: R M Kuhn; D Karolchik; A S Zweig; H Trumbower; D J Thomas; A Thakkapallayil; C W Sugnet; M Stanke; K E Smith; A Siepel; K R Rosenbloom; B Rhead; B J Raney; A Pohl; J S Pedersen; F Hsu; A S Hinrichs; R A Harte; M Diekhans; H Clawson; G Bejerano; G P Barber; R Baertsch; D Haussler; W J Kent Journal: Nucleic Acids Res Date: 2006-11-16 Impact factor: 16.971
Authors: Roddy Jorquera; Rodrigo Ortiz; F Ossandon; Juan Pablo Cárdenas; Rene Sepúlveda; Carolina González; David S Holmes Journal: Database (Oxford) Date: 2016-06-07 Impact factor: 3.451