
RetrogeneDB--a database of animal retrogenes.

Michał Kabza, Joanna Ciomborowska, Izabela Makałowska.

Abstract

Retrocopies of protein-coding genes, that is, copies of mature RNA that were reverse transcribed and inserted into the genome, have commonly been categorized as pseudogenes with no biological importance. However, recent studies have shown that they play an important role in genome evolution and in shaping interspecies differences. Here, we present RetrogeneDB, a database of retrocopies in 62 animal genomes. RetrogeneDB contains information about retrocopies, their genomic localization, parental genes, ORF conservation, and expression. To the best of our knowledge, this is the most complete retrocopy database, providing information for dozens of species never previously analyzed in the context of protein-coding gene retroposition. The database is available at http://retrogenedb.amu.edu.pl.
© The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  database; gene duplication; retrogene; retroposition

Year:  2014        PMID: 24739306      PMCID: PMC4069623          DOI: 10.1093/molbev/msu139

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Retrogenes, long considered unimportant copies of parental genes, are nowadays called “seeds of evolution” because they have contributed significantly to genome evolution (Brosius 1991). It has been shown that they play a very important role in the diversification of transcriptomes and proteomes and may be responsible for a wealth of species-specific features (Betrán et al. 2002; Balasubramanian et al. 2009; Szcześniak et al. 2011). As duplicates of their parental genes, they evolve relatively fast and may therefore acquire novel functions. Retrocopies of protein-coding genes are also known to be involved in many diseases (Prendergast 2001; Ciomborowska et al. 2013). Analyses of retroduplications have been mostly limited to a few mammalian model species (mainly human and mouse) and the fruit fly (Kaessmann et al. 2009). Nonmammalian vertebrates have been largely overlooked in retrocopy studies, and our knowledge of retrocopy evolution in other animals is even more limited.

Although retrocopies are annotated in major genomic databases (Ensembl [Flicek et al. 2014], UCSC Genome Browser [Meyer et al. 2013], National Center for Biotechnology Information Gene [Maglott et al. 2011]), they are often annotated simply as “pseudogenes,” in the same way as duplicates that originated via DNA-based mechanisms. The same problem applies to the more specialized database Pseudogene.org (www.pseudogene.org, last accessed January 2014). The most complete retrocopy annotations are in the Ensembl database; although they are very good for human and mouse, their quality is very poor for the remaining genomes. There are only two databases fully dedicated to retrocopies: RCPedia (Navarro and Galante 2013) and HOPPSIGEN (Khelifi et al. 2005). However, the former contains data for only a few primate species, and the latter is limited to human and mouse. We have analyzed the genomes of 62 animal species to identify retrocopies.
The search was based on similarities between the reference genomic sequence and the proteins encoded by multiexon genes in a given species. To increase accuracy, we applied several criteria to call a genomic region a retrocopy: alignment length of at least 150 bp, a minimum of 50% coverage of the parental gene, a minimum of 50% identity, and loss of at least two introns, among others (for details, see supplementary file S1, Supplementary Material online). The resulting data set was additionally inspected manually to exclude potential false positives, especially copies of transposons annotated as protein-coding genes, which in some genomes numbered as many as a few thousand. Our strategy led to the identification of 84,808 retrocopies, including 6,277 protein-coding genes not previously recognized as retrogenes. A total of 64,225 retrocopies identified by us are not present in the Ensembl database; these include 139 retrocopies in the human genome and as many as 2,205 in the mouse genome, which are among the best annotated. Because of the stringent requirements we applied in order to generate a high-quality data set, the number of identified retrocopies in a given species is considerably lower than in most other databases. However, this method gave consistently good results in both well-annotated genomes and poorly annotated, low-coverage genomes (for example, alpaca or dolphin). The number of retrocopies differs significantly even between closely related species, for example, 4,927 in human vs. 3,285 in chimpanzee. This may result from differences in annotation and from species-specific retroposition events. In addition, retrocopies are polymorphic, and the higher number of retrocopies in human (vs. chimpanzee) may reflect the large amount of human population data (Abyzov et al. 2013).

Retrocopies, as second copies of existing genes, evolve relatively quickly and accumulate mutations. However, many of them gain functionality and become subject to purifying selection (Vinckenbosch et al. 2006; Yu et al. 2007). We compared retrocopies with their progenitors to single out those with a conserved ORF, that is, without internal stop codons or frameshifts over the entire alignment. Conserved ORFs in mammals account for 10–25% of retrocopies. In nonmammalian animals, the fraction is much higher, considerably over 50% and in some species close to 100%. However, conservation of the ORF over the length of the alignment does not automatically imply that a retrocopy is efficiently translated, even if it is expressed.

In selected species, we also identified expressed retrocopies based on RNA-seq data. Because retrocopies are highly similar to their parental genes, during read mapping we required reads to map uniquely and perfectly to retrocopies (supplementary file S1, Supplementary Material online). This led to underestimation of retrocopy expression levels but prevented false-positive predictions of expressed retrocopies. Approximately 10–20% of mammalian retrocopies are expressed in at least one library at a minimal level of 1 RPM (reads per million mapped). In lizard, this number is higher, with almost 40% of retrocopies expressed. The majority of expressed retrocopies in marsupials, egg-laying mammals, and nonmammalian species have conserved ORFs. However, in placental mammals, the fraction of expressed retrocopies with a conserved ORF is lower, from only 30% in human up to 65% in horse.

All data are stored in a MySQL database (www.mysql.com, last accessed September 2013), and the web interface was developed using the Django framework (www.djangoproject.com, last accessed January 2014). The database is available at http://retrogenedb.amu.edu.pl (last accessed April 26, 2014) and can be searched from either the retrocopy or the parental gene perspective. The retrocopy search can be based on genomic localization, keywords, parental gene name, or retrocopy ID, and results can be filtered by retrocopy type, ORF conservation, or expression.
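The expression criterion described above (at least 1 RPM in at least one library) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dictionary layout, and the library names used in the example are assumptions made for illustration, and the read counts are assumed to already be restricted to reads that map uniquely and perfectly to the retrocopy.

```python
from typing import Mapping

# Minimal expression level stated in the text: 1 read per million mapped reads.
RPM_THRESHOLD = 1.0


def reads_per_million(read_count: int, total_mapped_reads: int) -> float:
    """Normalize a raw read count by library size, in reads per million (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return read_count * 1_000_000 / total_mapped_reads


def is_expressed(
    counts_by_library: Mapping[str, int],
    totals_by_library: Mapping[str, int],
    threshold: float = RPM_THRESHOLD,
) -> bool:
    """Return True if the retrocopy reaches the threshold in at least one library.

    counts_by_library: reads assigned to the retrocopy, keyed by library name
                       (hypothetical layout).
    totals_by_library: total mapped reads of each library, same keys.
    """
    return any(
        reads_per_million(count, totals_by_library[library]) >= threshold
        for library, count in counts_by_library.items()
    )


# Hypothetical example: 0.5 RPM in one library, 2.0 RPM in another.
example_counts = {"brain": 25, "testis": 80}
example_totals = {"brain": 50_000_000, "testis": 40_000_000}
example_call = is_expressed(example_counts, example_totals)
```

Because only uniquely and perfectly mapping reads are counted, a call made this way is conservative: it can miss weakly expressed retrocopies but is unlikely to report expression that actually comes from the parental gene.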
In addition, a JBrowse genome browser was implemented, allowing retrocopies to be inspected in their genomic context (fig. 1). The search from the parental gene perspective makes it possible to identify all retrocopies of a given gene or all orthologs that were retroposed in any other species. Users can also perform sequence-based searches using the BLAST tool.
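The core retrocopy-calling thresholds given in the text (alignment length of at least 150 bp, at least 50% coverage of the parental gene, at least 50% identity, and loss of at least two introns) can be sketched as a simple filter. The Python below is illustrative only: the `Candidate` record and its field names are hypothetical, coverage and identity are assumed to be expressed as fractions, and the additional criteria in supplementary file S1 as well as the manual inspection step are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted in the text.
MIN_ALIGNMENT_LENGTH_BP = 150
MIN_PARENT_COVERAGE = 0.50  # fraction of the parental gene covered
MIN_IDENTITY = 0.50         # fraction of identical aligned positions
MIN_INTRONS_LOST = 2


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one genomic region aligned to a parental protein."""
    candidate_id: str
    alignment_length_bp: int
    parent_coverage: float
    identity: float
    introns_lost: int


def passes_core_criteria(candidate: Candidate) -> bool:
    """Check the four thresholds named in the text; all must hold."""
    return (
        candidate.alignment_length_bp >= MIN_ALIGNMENT_LENGTH_BP
        and candidate.parent_coverage >= MIN_PARENT_COVERAGE
        and candidate.identity >= MIN_IDENTITY
        and candidate.introns_lost >= MIN_INTRONS_LOST
    )


def select_retrocopies(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only the candidates that satisfy every core criterion."""
    return [c for c in candidates if passes_core_criteria(c)]


# Hypothetical candidates: only the first one meets all four thresholds.
examples = [
    Candidate("cand_1", 900, 0.85, 0.92, 4),
    Candidate("cand_2", 120, 0.90, 0.95, 3),  # alignment too short
    Candidate("cand_3", 600, 0.70, 0.88, 1),  # too few introns lost
]
kept_ids = [c.candidate_id for c in select_retrocopies(examples)]
```

Requiring all conditions at once mirrors the stringent approach described in the text, which trades a lower retrocopy count for a higher-quality data set.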
Fig. 1. Example of RetrogeneDB record with selected data.

Supplementary Material

Supplementary file S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
References (15 in total; first 10 listed)

Review 1.  Actin' up: RhoB in cancer and apoptosis.

Authors:  G C Prendergast
Journal:  Nat Rev Cancer       Date:  2001-11       Impact factor: 60.716

2.  Evolutionary fate of retroposed gene copies in the human genome.

Authors:  Nicolas Vinckenbosch; Isabelle Dupanloup; Henrik Kaessmann
Journal:  Proc Natl Acad Sci U S A       Date:  2006-02-21       Impact factor: 11.205

Review 3.  Retroposons--seeds of evolution.

Authors:  J Brosius
Journal:  Science       Date:  1991-02-15       Impact factor: 47.728

Review 4.  RNA-based gene duplication: mechanistic and evolutionary insights.

Authors:  Henrik Kaessmann; Nicolas Vinckenbosch; Manyuan Long
Journal:  Nat Rev Genet       Date:  2009-01       Impact factor: 53.242

5.  Primate and rodent specific intron gains and the origin of retrogenes with splice variants.

Authors:  Michal W Szcześniak; Joanna Ciomborowska; Witold Nowak; Igor B Rogozin; Izabela Makałowska
Journal:  Mol Biol Evol       Date:  2010-10-01       Impact factor: 16.240

6.  Entrez Gene: gene-centered information at NCBI.

Authors:  Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal:  Nucleic Acids Res       Date:  2010-11-28       Impact factor: 16.971

7.  HOPPSIGEN: a database of human and mouse processed pseudogenes.

Authors:  Adel Khelifi; Laurent Duret; Dominique Mouchiroud
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971

8.  Ensembl 2014.

Authors:  Paul Flicek; M Ridwan Amode; Daniel Barrell; Kathryn Beal; Konstantinos Billis; Simon Brent; Denise Carvalho-Silva; Peter Clapham; Guy Coates; Stephen Fitzgerald; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah Hunt; Nathan Johnson; Thomas Juettemann; Andreas K Kähäri; Stephen Keenan; Eugene Kulesha; Fergal J Martin; Thomas Maurel; William M McLaren; Daniel N Murphy; Rishi Nag; Bert Overduin; Miguel Pignatelli; Bethan Pritchard; Emily Pritchard; Harpreet S Riat; Magali Ruffier; Daniel Sheppard; Kieron Taylor; Anja Thormann; Stephen J Trevanion; Alessandro Vullo; Steven P Wilder; Mark Wilson; Amonida Zadissa; Bronwen L Aken; Ewan Birney; Fiona Cunningham; Jennifer Harrow; Javier Herrero; Tim J P Hubbard; Rhoda Kinsella; Matthieu Muffato; Anne Parker; Giulietta Spudich; Andy Yates; Daniel R Zerbino; Stephen M J Searle
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

9.  Comparative analysis of processed ribosomal protein pseudogenes in four mammalian genomes.

Authors:  Suganthi Balasubramanian; Deyou Zheng; Yuen-Jong Liu; Gang Fang; Adam Frankish; Nicholas Carriero; Rebecca Robilotto; Philip Cayting; Mark Gerstein
Journal:  Genome Biol       Date:  2009-01-05       Impact factor: 13.583

10.  Analysis of the role of retrotransposition in gene evolution in vertebrates.

Authors:  Zhan Yu; David Morais; Mahine Ivanga; Paul M Harrison
Journal:  BMC Bioinformatics       Date:  2007-08-24       Impact factor: 3.169

