Daniel N Murphy, Aoife McLysaght.
Abstract
BACKGROUND: New genes in eukaryotes are created through a variety of different mechanisms. De novo origin from non-coding DNA is a mechanism that has recently gained attention. So far, de novo genes have been described in a handful of organisms, with Drosophila being the most extensively studied. We searched for genes that have appeared de novo in the mouse and rat lineages.
Year: 2012 PMID: 23185269 PMCID: PMC3504067 DOI: 10.1371/journal.pone.0048650
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Flowchart summary of methods used.
Each of the steps taken to obtain the sets of mouse and rat de novo genes is shown in yellow boxes. The numbers of mouse and rat genes remaining after each step are shown in blue boxes.
Summary of the 69 candidate mouse novel genes.
| EnsEMBL ID | Genomic location | Length (aa) | Overlapping genes | Number of exons [a] | Knockouts | Peptide evidence [b] | Expression evidence [c] |
| ENSMUSG00000075472 | 11:106683070..106683258:-1 | 62 | ENSMUSG00000018363 | 1 | | PeptideAtlas (3) | ArrayExpress, Genevestigator |
| ENSMUSG00000078251 | 11:113331568..113331885:1 | 106 | ENSMUSG00000041654 | 1 | | PeptideAtlas (2) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000074740 | 19:60865923..60867569:1 | 84 | ENSMUSG00000024991 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (1), PeptideAtlas (4) | Bgee, Genevestigator |
| ENSMUSG00000066371 | 9:107774792..107775570:1 | 129 | ENSMUSG00000032582 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (3), PeptideAtlas (4) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000051562 | 9:122929985..122930362:-1 | 125 | | 1 | | PRIDE (4), PeptideAtlas (5) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000072684 | 14:26484519..26486217:1 | 121 | ENSMUSG00000007817 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (3), PeptideAtlas (7) | Bgee, Genevestigator |
| ENSMUSG00000075582 | 14:57719305..57719652:1 | 115 | ENSMUSG00000046352 | 1 | | PRIDE (1), PeptideAtlas (4) | Genevestigator |
| ENSMUSG00000056640 | 14:70057982..70058305:-1 | 107 | ENSMUSG00000085092, ENSMUSG00000034205 | 1 | | PRIDE (2), PeptideAtlas (6) | Genevestigator |
| ENSMUSG00000054990 | 18:25288507..25288980:-1 | 157 | ENSMUSG00000034295, ENSMUSG00000024269 | 1 | | PRIDE (2), PeptideAtlas (3) | ArrayExpress, Genevestigator |
| ENSMUSG00000074880 | 10:80257231..80258205:1 | 115 | ENSMUSG00000061589 | 1 | | PeptideAtlas (3) | Genevestigator |
| ENSMUSG00000055108 | 10:98588633..98588830:-1 | 65 | ENSMUSG00000019952 | 1 | | PRIDE (1), PeptideAtlas (3) | ArrayExpress, Bgee, CleanEx, Genevestigator, GermOnline |
| ENSMUSG00000074246 | 8:74148472..74148777:-1 | 101 | ENSMUSG00000034807 | 1 | | PRIDE (2), PeptideAtlas (6) | Bgee, Genevestigator |
| ENSMUSG00000072431 | 8:80040955..80042477:1 | 122 | ENSMUSG00000037148 | 1 | | PeptideAtlas (5) | |
| ENSMUSG00000037982 | 8:83537426..83539566:1 | 164 | ENSMUSG00000038250 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (3), PeptideAtlas (10) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000078283 | 6:91712358..91712738:-1 | 126 | ENSMUSG00000030098 | 1 | | PRIDE (4), PeptideAtlas (5) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000079446 | 6:100654749..100657668:1 | 108 | ENSMUSG00000030074 | 2 (1) | | PeptideAtlas (6) | Genevestigator |
| ENSMUSG00000073546 | 18:65466084..65469530:1 | 103 | ENSMUSG00000032845 | 2 (1) | | PRIDE (3), PeptideAtlas (5) | ArrayExpress, Genevestigator |
| ENSMUSG00000072655 | 6:149234588..149234908:-1 | 106 | | 1 | | PeptideAtlas (8) | Genevestigator |
| ENSMUSG00000063757 | 7:4985680..4988370:1 | 138 | ENSMUSG00000043290 | 1 | | PeptideAtlas (5) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000078384 | 7:28886093..28886566:-1 | 157 | ENSMUSG00000047730 | 1 | | PeptideAtlas (3) | Genevestigator |
| ENSMUSG00000070574 | 7:51932335..51933605:1 | 172 | | 2 (2) | | PRIDE (4), PeptideAtlas (16) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000074118 | 7:53038302..53039837:-1 | 106 | ENSMUSG00000062044 | 2 (2) | | PRIDE (1) | Genevestigator |
| ENSMUSG00000074087 | 7:66486831..66487037:-1 | 68 | ENSMUSG00000025326 | 1 | | PRIDE (2), PeptideAtlas (3) | Genevestigator |
| ENSMUSG00000073994 | 7:107610439..107610750:-1 | 103 | ENSMUSG00000047248 | 1 | | PeptideAtlas (5) | Bgee, Genevestigator |
| ENSMUSG00000044407 | 17:10512525..10513094:-1 | 189 | | 1 | knockout and mutations in other databases cause phenotypes including mortality | PRIDE (3), PeptideAtlas (10) | Bgee, Genevestigator, Eurexpress |
| ENSMUSG00000073464 | 17:11924400..11924819:-1 | 139 | ENSMUSG00000023826 | 1 | | PRIDE (3), PeptideAtlas (4) | Genevestigator |
| ENSMUSG00000049740 | 7:114768047..114768379:-1 | 110 | ENSMUSG00000036528 | 1 | | PeptideAtlas (3) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000067798 | 5:20045621..20046234:1 | 129 | ENSMUSG00000040003 | 2 (2) | several phenotypes including mortality | PRIDE (1), PeptideAtlas (4) | Bgee, Genevestigator |
| ENSMUSG00000078181 | 5:31947374..31948728:1 | 110 | ENSMUSG00000029142, ENSMUSG00000029136 | 1 | | PeptideAtlas (8) | Bgee, Genevestigator |
| ENSMUSG00000072962 | 5:44491681..44493752:1 | 153 | ENSMUSG00000029086 | 1 | | PRIDE (5), PeptideAtlas (12) | Bgee, Genevestigator |
| ENSMUSG00000057354 | 5:115886084..115886548:-1 | 154 | ENSMUSG00000054256 | 1 | | PRIDE (4), PeptideAtlas (8) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000072639 | 5:122689186..122689530:-1 | 114 | ENSMUSG00000064267 | 1 | | PRIDE (1), PeptideAtlas (4) | Bgee, Genevestigator |
| ENSMUSG00000063155 | 5:130698013..130698477:-1 | 154 | ENSMUSG00000053094 | 1 | | PeptideAtlas (3) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000021206 | 5:139853651..139856007:1 | 139 | ENSMUSG00000053553, ENSMUSG00000044197 | 1 | | PRIDE (4), PeptideAtlas (7) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000073875 | 4:42090151..42090576:-1 | 141 | | 1 | | PeptideAtlas (1) | |
| ENSMUSG00000070700 | 4:133559278..133560802:1 | 120 | ENSMUSG00000050966 | 1 | | PRIDE (1), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000053280 | 17:31570894..31571349:-1 | 151 | ENSMUSG00000041119 | 1 | | PRIDE (2), PeptideAtlas (4) | ArrayExpress, Genevestigator |
| ENSMUSG00000066178 | 4:136018165..136019892:1 | 148 | | 1 | knocked out in cell line, no phenotype in live mouse yet | PeptideAtlas (5) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000073719 | 4:144562623..144563060:-1 | 145 | ENSMUSG00000020220 | 1 | knocked out in cell line, no phenotype in live mouse yet | PeptideAtlas (6) | ArrayExpress, Genevestigator |
| ENSMUSG00000054354 | 17:34118437..34122296:1 | 113 | | 1 | | PRIDE (1), PeptideAtlas (1) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000069012 | 3:54517759..54517881:-1 | 40 | ENSMUSG00000027751 | 1 | | PeptideAtlas (2) | Genevestigator |
| ENSMUSG00000074517 | 3:82931484..82932008:-1 | 174 | | 1 | | PRIDE (3), PeptideAtlas (4) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000074318 | 3:107690458..107690682:-1 | 74 | ENSMUSG00000040600 | 1 | | PeptideAtlas (4) | Bgee, Genevestigator |
| ENSMUSG00000074237 | 3:127741612..127743579:1 | 127 | | 1 | | PRIDE (2), PeptideAtlas (4) | Genevestigator |
| ENSMUSG00000054773 | 3:156871295..156871534:-1 | 79 | ENSMUSG00000040037 | 1 | | PeptideAtlas (2) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000049276 | X:12605226..12605390:-1 | 54 | | 1 | | PRIDE (1), PeptideAtlas (2) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000073231 | X:39880439..39881044:1 | 80 | ENSMUSG00000016150 | 1 | | PeptideAtlas (2) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000072960 | X:133276321..133277682:1 | 141 | ENSMUSG00000031422, ENSMUSG00000087368 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (2), PeptideAtlas (4) | ArrayExpress, Genevestigator |
| ENSMUSG00000072913 | X:147275576..147275953:-1 | 125 | ENSMUSG00000087149 | 1 | | PeptideAtlas (12) | Genevestigator |
| ENSMUSG00000069875 | 2:4088273..4088632:1 | 119 | ENSMUSG00000026657 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (2), PeptideAtlas (14) | Genevestigator |
| ENSMUSG00000073388 | 17:47026744..47027154:-1 | 136 | | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (1), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000074989 | 2:104836539..104836859:-1 | 106 | ENSMUSG00000045106 | 1 | | PeptideAtlas (3) | Bgee, Genevestigator |
| ENSMUSG00000074940 | 2:112201851..112202165:-1 | 104 | ENSMUSG00000027130 | 1 | | PRIDE (2), PeptideAtlas (12) | ArrayExpress, Genevestigator |
| ENSMUSG00000044744 | 1:33726688..33727557:1 | 184 | | 1 | | PRIDE (2), PeptideAtlas (7) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000080025 | 1:37474621..37474944:-1 | 107 | ENSMUSG00000026112 | 1 | | PeptideAtlas (3) | ArrayExpress, Genevestigator |
| ENSMUSG00000073694 | 1:46232431..46232727:1 | 98 | | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (2), PeptideAtlas (5) | Genevestigator |
| ENSMUSG00000073531 | 1:158973680..158974325:1 | 71 | | 1 | | PeptideAtlas (4) | Bgee, Genevestigator |
| ENSMUSG00000054546 | 15:27505467..27505868:-1 | 133 | ENSMUSG00000022265 | 1 | | PRIDE (1), PeptideAtlas (4) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000078299 | 15:64119151..64119294:-1 | 47 | | 1 | | PeptideAtlas (2) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000078298 | 15:64160033..64160371:-1 | 112 | | 1 | | PeptideAtlas (4) | Genevestigator |
| ENSMUSG00000018006 | 15:78501215..78506545:1 | 158 | ENSMUSG00000043460 | 4 (4) | | PeptideAtlas (4) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000075433 | 15:97580292..97581586:1 | 169 | ENSMUSG00000022469 | 1 | | PeptideAtlas (8) | Bgee, Genevestigator |
| ENSMUSG00000043805 | 15:102888711..102889043:-1 | 111 | | 1 | | PeptideAtlas (1) | Genevestigator |
| ENSMUSG00000055849 | 13:25081073..25081390:-1 | 105 | ENSMUSG00000021340 | 1 | | PRIDE (1), PeptideAtlas (3) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000047061 | 13:45173740..45174554:1 | 144 | ENSMUSG00000078915 | 1 | | PeptideAtlas (12) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000051555 | 13:55723728..55724168:-1 | 146 | | 1 | knocked out in cell line, no phenotype in live mouse yet | PeptideAtlas (5) | Genevestigator |
| ENSMUSG00000048603 | 13:99086377..99087259:1 | 124 | ENSMUSG00000021660 | 1 | knocked out in cell line, no phenotype in live mouse yet | PRIDE (4), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator |
| ENSMUSG00000053556 | 12:81291858..81292307:1 | 149 | ENSMUSG00000015143 | 1 | | PeptideAtlas (8) | ArrayExpress, Genevestigator |
| ENSMUSG00000084085 | 11:18909131..18911186:1 | 122 | ENSMUSG00000020160 | 2 (2) | | PRIDE (1), PeptideAtlas (5) | Genevestigator |
a – If the number of exons is greater than 1, the number of exons in which the coding sequence is contained is shown in brackets.
b – Peptide evidence is shown with the databases in which the peptides are found followed by the number of unique peptides.
c – Databases are shown that contain the expression evidence, in the form of EST and microarray data, for each of the respective genes.
Retired in EnsEMBL version 61.
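The location and peptide-evidence cells in the table above follow regular patterns (`chromosome:start..end:strand` and `Database (count)`), so they can be read programmatically. The following is a minimal sketch, assuming Python and the standard `re` module; the names `Location`, `parse_location` and `parse_peptide_evidence` are hypothetical helpers introduced here for illustration and are not part of the original study.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    """One 'Genomic location' cell (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    strand: int  # 1 for the forward strand, -1 for the reverse strand


def parse_location(text: str) -> Location:
    """Parse 'chrom:start..end:strand', e.g. '11:106683070..106683258:-1'."""
    match = re.fullmatch(r"(\w+):(\d+)\.\.(\d+):(-?1)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    chrom, start, end, strand = match.groups()
    return Location(chrom, int(start), int(end), int(strand))


def parse_peptide_evidence(text: str) -> dict:
    """Parse 'PRIDE (1), PeptideAtlas (4)' into {'PRIDE': 1, 'PeptideAtlas': 4}."""
    return {db: int(count) for db, count in re.findall(r"(\w+)\s*\((\d+)\)", text)}


# First row of the mouse table.
loc = parse_location("11:106683070..106683258:-1")
span = loc.end - loc.start + 1  # inclusive genomic span in nucleotides
evidence = parse_peptide_evidence("PRIDE (1), PeptideAtlas (4)")
```

The span is the full genomic extent of the annotated gene, so for multi-exon entries it includes intronic sequence and should not be read as coding length.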
Summary of the 6 candidate rat novel genes.
| EnsEMBL ID | Genomic location | Length (aa) | Overlapping genes | Number of exons | Expression evidence |
| ENSRNOG00000038369 | X:68776250..68790582:1 | 208 | | 4 (4) | Genevestigator |
| ENSRNOG00000028932 | 4:80911025..80914836:1 | 97 | Intronic sequence of ENSRNOG00000008063 on opposite strand | 2 (2) | Genevestigator |
| ENSRNOG00000030156 | 18:18805826..18808748:-1 | 135 | | 3 (3) | Genevestigator |
| ENSRNOG00000042175 | 11:64631466..64632612:1 | 70 | | 1 | Genevestigator |
| ENSRNOG00000013433 | 15:47304008..47304328:-1 | 106 | Intronic and exonic sequence of ENSRNOG00000013441 on opposite strand | 1 | ArrayExpress, Genevestigator, GermOnline |
| ENSRNOG00000029808 | 15:60840404..60841246:-1 | 125 | 5′ UTR, some intronic and coding sequence of ENSRNOG00000012594 on opposite strand | 2 (2) | ArrayExpress, Genevestigator |
If the number of exons is greater than 1, the number of exons in which the coding sequence is contained is shown in brackets.
Mouse candidates with evidence for transcription, translation and lineage-specific enabler.
| EnsEMBL ID | Length (aa) | Peptide evidence [a] | Expression evidence [b] | Enabler in mouse |
| ENSMUSG00000075472 | 62 | PeptideAtlas (3) | Gene Expression Atlas, ArrayExpress, Genevestigator | deletion of 5nt causing frameshift |
| ENSMUSG00000075582 | 115 | PRIDE (1), PeptideAtlas (4) | Genevestigator | G->A creating start codon |
| ENSMUSG00000037982 | 164 | PRIDE (3), PeptideAtlas (10) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | T->G removing stop codon |
| ENSMUSG00000078384 | 157 | PeptideAtlas (3) | Genevestigator | deletion of G resulting in frameshift |
| ENSMUSG00000057354 | 154 | PRIDE (4), PeptideAtlas (8) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | deletion of G resulting in frameshift |
| ENSMUSG00000070700 | 120 | PRIDE (1), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | deletion of A resulting in frameshift |
| ENSMUSG00000074517 | 174 | PRIDE (3), PeptideAtlas (4) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | C->T creating start codon and deletion of C causing frameshift |
| ENSMUSG00000073388 | 136 | PRIDE (1), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | insertion of 38nt resulting in frameshift and novel protein |
| ENSMUSG00000075433 | 169 | PeptideAtlas (8) | Bgee, Genevestigator | T->G creating start codon |
| ENSMUSG00000043805 | 111 | PeptideAtlas (1) | Genevestigator | deletion of C creating start codon, 3 other separate indels causing frameshifts |
| ENSMUSG00000048603 | 124 | PRIDE (4), PeptideAtlas (6) | ArrayExpress, Bgee, Genevestigator, Gene Expression Atlas | indel of several nt causing a frameshift |
a – Peptide evidence is shown with the databases in which the peptides are found followed by the number of unique peptides.
b – Databases are shown that contain the expression evidence, in the form of EST and microarray data, for each of the respective genes.
Retired in EnsEMBL version 61.
Only one unique peptide is considered to be weak evidence for the protein-coding potential of the gene.
Large ORFs are present in ancestral location in other species but a frameshift means they encode completely different proteins.
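The three kinds of enabler listed in the table (a substitution creating a start codon, a substitution removing a stop codon, and an indel causing a frameshift) can be illustrated on toy sequences. The sketch below is only an illustration under stated assumptions: the sequences are invented, not taken from any mouse gene, the standard genetic code is assumed, and `first_orf_peptide`, `substitute` and `delete` are hypothetical helper names.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def first_orf_peptide(seq: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon."""
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at 0-based position pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, pos: int) -> str:
    """Return seq with the base at 0-based position pos removed."""
    return seq[:pos] + seq[pos + 1:]


# Substitution creating a start codon (toy sequence, G->A at position 2).
no_start = "CCGTGGCATAA"
with_start = substitute(no_start, 2, "A")

# Substitution removing a stop codon (toy sequence, T->G at position 6).
early_stop = "ATGGCCTAAGGCTGA"
read_through = substitute(early_stop, 6, "G")

# Single-base deletion causing a frameshift (toy sequence, G lost at position 4).
ancestral = "ATGAGCTGAACCGTAA"
shifted = delete(ancestral, 4)
```

In each pair the derived sequence encodes a longer or entirely different peptide than the ancestral one, which is the sense in which a single small mutation can enable an open reading frame.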
Knockout experiments and SNPs within mouse de novo genes.
| EnsEMBL ID | Overlapping genes | Knockouts | SNPs |
| ENSMUSG00000075472 | 3′ UTR of ENSMUSG00000018363 on the same strand | | 1NS in PWK/PhJ |
| ENSMUSG00000075582 | 1st intron and some coding sequence of ENSMUSG00000046352 on opposite strand | | 2NS total: 1NS in 2 strains, 1NS in WSB/EiJ |
| ENSMUSG00000037982 | 5′ UTR and 1st exon and intron of ENSMUSG00000038250 on opposite strand | knocked out in cell line, no phenotype in live mouse yet | 5NS and 1S total: 4NS in Spretus/EiJ, 1NS and 1S in 14 strains |
| ENSMUSG00000078384 | some coding and intronic sequence of ENSMUSG00000047730 on opposite strand | | 3NS and 1S total: 2NS and 1S in Spretus/EiJ, 1NS in 6 strains |
| ENSMUSG00000057354 | intronic sequence of ENSMUSG00000054256 on opposite strand | | 4NS and 2S total: 2NS in 5 strains, 2S in Spretus/EiJ, 1NS in 2 strains, 1NS in 3 strains |
| ENSMUSG00000070700 | some coding sequence of ENSMUSG00000050966 on opposite strand | | 3NS total: 2NS in Spretus/EiJ and 1NS in 2 strains |
| ENSMUSG00000074517 | | | 5NS and 2S total: 3NS in PWK/PhJ (2 in same codon producing premature stop), 1S in Spretus/EiJ, 2NS in 2 strains, 1S in 6 strains |
| ENSMUSG00000073388 | | knocked out in cell line, no phenotype in live mouse yet | 2NS and 3S total: 3S in Spretus/EiJ, 1NS in Spretus/EiJ, 1NS in CAST/EiJ |
| ENSMUSG00000075433 | some intronic and coding sequence of ENSMUSG00000022469 on opposite strand | | 8NS and 4S total: 1NS and 3S in CAST/EiJ, 4NS in 12 strains, 1NS in 11 strains (removing start codon), 1S in Spretus/EiJ, 1NS in 2 strains, 1NS in 10 strains |
| ENSMUSG00000043805 | | | 3NS in Spretus/EiJ |
| ENSMUSG00000048603 | 5′ UTR and 1st exon and intron of ENSMUSG00000021660 on opposite strand | knocked out in cell line, no phenotype in live mouse yet | 7NS and 4S total: 1S and 5NS in Spretus/EiJ, 1S in LPJ, 1S in CAST/EiJ, 1NS in 8 strains, 1S in 8 strains |
SNPs disrupt the valid ORF.
NS – nonsynonymous SNPs.
S – synonymous SNPs.
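The SNP column uses a compact notation such as "5NS and 1S total: ...", where the part before "total:" gives the overall counts and the remainder breaks them down by strain. As a rough sketch, assuming Python's `re` module and a hypothetical helper named `snp_totals`, the headline counts can be extracted as follows. Entries without a "total:" prefix are assumed to list every SNP once, so their counts are simply summed.

```python
import re


def snp_totals(summary: str) -> tuple:
    """Return (nonsynonymous, synonymous) counts from one SNPs cell.

    If the cell contains 'total:', only the text before it is read;
    otherwise every 'nNS' / 'nS' token in the cell is summed.
    """
    head, _, _ = summary.partition("total:")
    counts = {"NS": 0, "S": 0}
    # 'NS' is tried before 'S' so that '5NS' is not read as '5N' + 'S'.
    for number, kind in re.findall(r"(\d+)\s*(NS|S)\b", head):
        counts[kind] += int(number)
    return counts["NS"], counts["S"]


with_total = snp_totals(
    "5NS and 1S total: 4NS in Spretus/EiJ, 1NS and 1S in 14 strains"
)
without_total = snp_totals("3NS in Spretus/EiJ")
```

Returning a plain pair keeps the helper easy to use when tabulating the ratio of nonsynonymous to synonymous changes per gene.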
Figure 2. Ancestral regions of mouse gene ENSMUSG00000037982.
2A: Conserved synteny of the orthologous region containing the ancestral sequence of the gene in mouse, rat, guinea pig and human. Red boxes indicate orthologous genes, yellow boxes indicate non-orthologous genes, and the green box represents the location of the de novo gene. 2B: Alignment of the coding sequence of ENSMUSG00000037982 with the ancestral sequence present in rat, guinea pig and human. Red boxes indicate the locations of stop codons and empty triangles indicate the positions of the enabling mutations.
Figure 3. Alignment of the coding sequence of ENSMUSG00000037982 with 17 different mouse strains.
In each alignment the mouse reference sequence taken from Ensembl is in the top row. 3A: Sections of the coding sequence available from Ensembl are aligned with the sequences for 17 different mouse strains taken from the Mouse Genome Project database. SNPs are indicated by empty triangles. 3B: Translated peptide sequences for each of the sections in 3A. The locations of each of the non-synonymous and synonymous SNPs are again indicated by empty triangles.
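Whether a SNP in a coding sequence is synonymous or non-synonymous depends on whether the altered codon still encodes the same amino acid. The sketch below shows that check on an invented nine-nucleotide coding sequence; it assumes the standard genetic code, 0-based positions counted from the first base of the start codon, and a hypothetical helper named `classify_snp`. It is not the procedure used to produce the figure.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Return 'S' if the SNP leaves the encoded amino acid unchanged, else 'NS'.

    cds is an in-frame coding sequence, pos is a 0-based position in it,
    and alt is the alternative base observed at that position.
    """
    offset = pos % 3                 # position of the SNP within its codon
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "S" if same else "NS"


toy_cds = "ATGGCCGAA"  # invented sequence: Met-Ala-Glu

third_base_change = classify_snp(toy_cds, 5, "T")  # GCC -> GCT, still Ala
premature_stop = classify_snp(toy_cds, 6, "T")     # GAA -> TAA, stop codon
```

A change that introduces a stop codon is counted as non-synonymous here, because the stop symbol differs from the reference amino acid.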
Figure 4. Ancestral regions of mouse gene ENSMUSG00000078384.
4A: Conserved synteny of the orthologous region containing the ancestral sequence of the gene in mouse, rat, guinea pig and human. Red boxes indicate orthologous genes, yellow boxes indicate non-orthologous genes, and the green box represents the location of the de novo gene. 4B: Alignment of the coding sequence of ENSMUSG00000078384 with the ancestral sequence present in rat, guinea pig and human. Red boxes indicate the locations of stop codons and empty triangles indicate the positions of the enabling mutations.
Figure 5. Alignment of the coding sequence of ENSMUSG00000078384 with 17 different mouse strains.
In each alignment the mouse reference sequence taken from Ensembl is in the top row. 5A: Sections of the coding sequence available from Ensembl are aligned with the sequences for 17 different mouse strains taken from the Mouse Genome Project database. SNPs are indicated by empty triangles. 5B: Translated peptide sequences for each of the sections in 5A. The locations of each of the non-synonymous and synonymous SNPs are again indicated by empty triangles.