So Nakagawa, Mahoko Ueda Takahashi.
Abstract
In mammals, approximately 10% of the genome sequence corresponds to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs have acquired functions in their hosts. However, EVE ORFs usually remain unannotated in genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. By analyzing RNA-seq data with the gEVE database, we successfully identified expressed EVE genes, suggesting that the gEVE database facilitates genomic analyses of various mammalian species.
Database URL: http://geve.med.u-tokai.ac.jp
Year: 2016 PMID: 27242033 PMCID: PMC4885607 DOI: 10.1093/database/baw087
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Comparison of EVE databases.
| Database | Species | Methods | Released date | Last update date | Reference |
|---|---|---|---|---|---|
| HERVd | Human | RepeatMasker with Repbase | Jul 2000 | Sep 2003 | |
| ERE database | Mouse | PCR library for LTR U3 sequences; homology search (Megablast) | Nov 2007 | Feb 2008 | |
| gEVE database | 19 mammalian species | RetroTector; RepeatMasker with Repbase; homology search (BLAT) | Apr 2014 | Apr 2015 | This paper |
Genome data used in the gEVE database and EVE ORF viral profiles for each genome.
| Species | Genome ID | Genome, released date | EVEs (Met) | gag | pro | pol (LINE-derived) | env | others |
|---|---|---|---|---|---|---|---|---|
| Human | Hsap38 | GRCh38, Dec 2013 | 33 966 (31 292) | 1782 | 1482 | 29 120 (21 087) | 1731 | 11 |
| Chimpanzee | Ptro214 | CSAC 2.1.4, Feb 2011 | 30 099 (28 136) | 1813 | 1125 | 25 572 (19 043) | 1719 | 10 |
| Gorilla | Ggor31 | gorGor3.1, May 2011 | 26 335 (24 409) | 1456 | 1034 | 22 462 (16 140) | 1486 | 8 |
| Orangutan | Pabe2 | PPYG2, Sep 2007 | 28 315 (26 716) | 1214 | 846 | 24 919 (19 492) | 1400 | 14 |
| Baboon | Panu2 | Panu_2.0, Jun 2012 | 27 230 (25 192) | 2101 | 1240 | 22 125 (15 476) | 1962 | 5 |
| Macaque | Mmul1 | MMUL 1.0, Feb 2006 | 26 941 (25 043) | 1980 | 1130 | 21 968 (15 745) | 2020 | 7 |
| Marmoset | Cjac321 | C_jacchus3.2.1, Jan 2010 | 21 802 (20 614) | 992 | 406 | 19 575 (16 070) | 888 | 3 |
| Mouse | Mmus38 | GRCm38.p1, Jan 2012 | 61 184 (58 805) | 7494 | 5602 | 46 784 (29 122) | 3075 | 16 |
| Rat | Rnor50 | Rnor_5.0, Mar 2012 | 34 861 (32 525) | 2570 | 1491 | 29 258 (21 517) | 1771 | 6 |
| Rabbit | Ocun2 | oryCun2, Nov 2009 | 13 214 (12 909) | 438 | 237 | 12 275 (10 473) | 292 | 2 |
| Cow | BtauUMD31 | UMD3.1, Dec 2009 | 105 654 (104 674) | 1023 | 673 | 103 402 (98 952) | 648 | 1 |
| Cow | Btau461 | Btau_4.6.1, Nov 2011 | 98 016 (97 150) | 860 | 641 | 96 065 (92 153) | 585 | 0 |
| Dog | Cfam31 | CanFam3.1, Sep 2011 | 11 393 (11 011) | 399 | 135 | 10 815 (10 019) | 78 | 0 |
| Cat | Fcat62 | Felis_catus_6.2, Sep 2011 | 11 132 (10 625) | 694 | 203 | 9898 (8505) | 391 | 1 |
| Horse | Ecab2 | EquCab2.0, Sep 2007 | 14 391 (13 972) | 190 | 142 | 13 904 (12 554) | 167 | 0 |
| Sheep | Oari31 | Oar_v3.1, Sep 2012 | 61 093 (60 184) | 1099 | 517 | 58 940 (55 274) | 628 | 1 |
| Pig | Sscr102 | Sscrofa10.2, Aug 2011 | 15 210 (14 761) | 456 | 155 | 14 350 (13 207) | 285 | 9 |
| Goat | Chir1 | CHIR_1.0, Jan 2013 | 37 003 (36 060) | 1106 | 508 | 34 797 (31 146) | 653 | 0 |
| Opossum | Mdom5 | monDom5, Oct 2006 | 77 190 (73 029) | 2546 | 2723 | 71 821 (46 874) | 1134 | 0 |
| Platypus | Oana5 | OANA5, Dec 2005 | 1742 (1365) | 2 | 1 | 1732 (1658) | 7 | 0 |
In the EVEs (Met) column, the number of EVE sequences containing at least one methionine residue is shown in parentheses.
For pol, the number in parentheses indicates pol genes thought to be derived from LINEs, i.e. those annotated as ‘LINE’ by RepeatMasker and/or matching ‘YP_073558.1’ or ‘NP_048132.1’ in a BLASTP search against the NCBI Viral Genome Database.
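The "total (subset)" cells in the table above can be turned into numbers for simple per-genome ratios. The following is a minimal sketch, not part of the gEVE tooling; the function names `parse_count_cell` and `subset_fraction` are hypothetical, and it assumes only the cell layout visible in the table (digits grouped by spaces or commas, with an optional parenthesised second number).

```python
import re
from typing import Optional, Tuple


def parse_count_cell(cell: str) -> Tuple[int, Optional[int]]:
    """Parse a cell such as '33 966 (31 292)' into (total, subset).

    The subset is None when the cell has no parenthesised number.
    """
    # Drop whitespace (including thin or no-break spaces) and commas
    # that are used as thousands separators in the table.
    compact = re.sub(r"[\s,]", "", cell)
    match = re.fullmatch(r"(\d+)(?:\((\d+)\))?", compact)
    if match is None:
        raise ValueError(f"unrecognised count cell: {cell!r}")
    total = int(match.group(1))
    subset = int(match.group(2)) if match.group(2) is not None else None
    return total, subset


def subset_fraction(cell: str) -> float:
    """Return subset / total for a 'total (subset)' cell."""
    total, subset = parse_count_cell(cell)
    if subset is None or total == 0:
        raise ValueError(f"cell has no usable subset count: {cell!r}")
    return subset / total
```

For example, `subset_fraction("33 966 (31 292)")` gives the share of human EVE ORFs that contain at least one methionine, and the same call on a pol cell gives the share of pol ORFs attributed to LINEs.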
Figure 1. Schematic workflow of the four-step procedure for identifying EVE ORFs in 20 mammalian genomes. (A) First extraction: EVE candidates are identified by RetroTector and RepeatMasker (STEP1), followed by ORF extraction in each genome (STEP2). (B) Second extraction: a BLAT search retrieves EVE candidates missed in STEP2 (STEP3). As in the first extraction, EVE ORF datasets are then generated by ORF extraction (STEP4); this is the final dataset of the gEVE database. The numbers of EVE ORF sequences in (A) and (B) are the total numbers of non-overlapping sequences in the 20 mammalian genomes. The numbers of extracted EVE sequences at STEP2 and STEP4 for each genome are shown in Supplementary Table S3.
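The ORF extraction in STEP2 and STEP4 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the gEVE pipeline: it treats an ORF as any stop-codon-free stretch in one of the six reading frames (which would be consistent with some EVE sequences containing no methionine), and `min_codons` is a hypothetical length threshold. All function names are made up for this example.

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _to_forward(start: int, end: int, strand: str, n: int) -> Tuple[int, int, str]:
    """Map 0-based half-open strand coordinates to forward-strand coordinates."""
    if strand == "+":
        return start, end, strand
    return n - end, n - start, strand


def stop_free_segments(seq: str, min_codons: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, strand) for stop-free stretches in all six frames.

    Coordinates are 0-based, half-open, and refer to the forward strand.
    min_codons (assumed threshold, must be >= 1) is the minimum number of
    codons a stretch must contain to be reported.
    """
    if min_codons < 1:
        raise ValueError("min_codons must be at least 1")
    seq = seq.upper()
    n = len(seq)
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            seg_start = frame
            pos = frame
            while pos + 3 <= n:
                if strand_seq[pos:pos + 3] in STOP_CODONS:
                    # Close the current stretch just before the stop codon.
                    if (pos - seg_start) // 3 >= min_codons:
                        yield _to_forward(seg_start, pos, strand, n)
                    seg_start = pos + 3
                pos += 3
            # Report the stretch that runs to the end of the sequence.
            if (pos - seg_start) // 3 >= min_codons:
                yield _to_forward(seg_start, pos, strand, n)
```

Each reported stretch can then be translated and compared with viral profiles; that annotation step is outside the scope of this sketch.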
Figure 2. Web interface of the gEVE database. (a) A menu bar is shown at the top; the current page is ‘Annotation Datasheet’. (b) Display options allow selection of the annotations of interest (boxed in gray dashed line, left). (c) Advanced search criteria for the EVE annotations, such as genome ID, viral HMM profile, chromosome ID and amino acid length, can be entered in a new window (boxed in gray dashed line, right). (d) The annotation table or the sequences (nucleotide and/or amino acid) shown in the window can be downloaded in tab-delimited or FASTA format, respectively.
Figure 3. Phylogenetic tree of syncytin-1-like sequences. All sequences over 400 amino acids were extracted from BLASTP hits with e-values
Top 10 highly expressed gEVE sequences in the RNA-seq data of ERR315374
| gEVE ID | HMM profile | Known EVE | FPKM |
|---|---|---|---|
| Hsap38.chr7.94664474.94665679.+ | | PEG10 | 481.4 |
| Hsap38.chr7.94663299.94664531.+ | | PEG10 | 392.9 |
| Hsap38.chr3.129171078.129171320.- | | – | 210.5 |
| Hsap38.chr21.42917294.42917818.- | | (suppressyn) | 158.5 |
| Hsap38.chr21.42918527.42919045.- | | suppressyn | 131.1 |
| Hsap38.chr7.92468768.92470387.- | | syncytin-1 | 44.5 |
| Hsap38.chr21.42919026.42919586.- | | (suppressyn) | 30.7 |
| Hsap38.chr6.11103697.11105316.- | | syncytin-2 | 24.6 |
| Hsap38.chr21.42921853.42922110.- | | (suppressyn) | 24.2 |
| Hsap38.chr16.20680984.20681253.+ | | – | 20.7 |
A gene name in parentheses indicates that the EVE sequence is located close to the known functional EVE sequence. A dash (‘–’) indicates that, to our knowledge, the corresponding sequence has not been reported.
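The gEVE IDs in the table above appear to encode a genomic locus as genome ID, chromosome, start, end and strand, separated by dots. The sketch below splits an ID into those fields. The field layout is inferred from the IDs shown here rather than taken from a format specification, the 1-based inclusive reading of the coordinates is an assumption, and `GeveLocus` and `parse_geve_id` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeveLocus:
    genome: str
    chrom: str
    start: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        # Assumes 1-based inclusive coordinates (not confirmed by the source).
        return self.end - self.start + 1


def parse_geve_id(geve_id: str) -> GeveLocus:
    """Split an ID such as 'Hsap38.chr7.94664474.94665679.+' into its fields."""
    try:
        # Split from the right so that a chromosome or scaffold name
        # containing dots would stay intact.
        prefix, start, end, strand = geve_id.rsplit(".", 3)
        genome, chrom = prefix.split(".", 1)
    except ValueError as exc:
        raise ValueError(f"unexpected gEVE ID layout: {geve_id!r}") from exc
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand in gEVE ID: {geve_id!r}")
    return GeveLocus(genome, chrom, int(start), int(end), strand)
```

Under the inclusive-coordinate assumption, the first entry in the table has a length of 1206 nucleotides, a multiple of three, which is what one would expect for an ORF.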