Lothar Wissler, Jürgen Gadau, Daniel F. Simola, Martin Helmkampf, Erich Bornberg-Bauer.
Abstract
Orphan genes are defined as genes that lack detectable similarity to genes in other species, so that no clear signal of common descent (i.e., homology) can be inferred. Orphans are an enigmatic part of the genome: their origin and function are mostly unknown, yet they typically make up 10% to 30% of all genes in a genome. Several case studies have demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows us to analyze a major metazoan taxon and to compare social Hymenoptera (ants and bees) with nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, comparing the two insect orders Hymenoptera and Diptera, we find that orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular in leaf-cutter ants. With respect to intragenomic localization, ant orphan genes show little clustering, which suggests that they are scattered uniformly over the genome, interspersed with nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes (gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements) act at different rates in insects, primates, and plants. In Formicidae, the largest share of orphan genes originates in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss; this supports a recently proposed dynamic model of frequent gene birth and death.
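The orphan definition and the clustering result above can be made concrete with a short sketch. It assumes that similarity-search hits are available as `(query_gene, subject_species, evalue)` tuples and uses a hypothetical e-value cutoff of 1e-3. The helper names `find_orphans`, `adjacent_orphan_pairs`, and `clustering_p_value` are illustrative, and the adjacency permutation test is one generic way to probe clustering, not necessarily the procedure used in the study.

```python
import random
from typing import Iterable

# Hypothetical input format: (query_gene, subject_species, evalue) tuples.
Hit = tuple[str, str, float]


def find_orphans(genes: Iterable[str], hits: Iterable[Hit],
                 own_species: str, evalue_cutoff: float = 1e-3) -> set[str]:
    """Genes with no hit at or below the cutoff in any *other* species."""
    has_homolog = {
        gene for gene, species, evalue in hits
        if species != own_species and evalue <= evalue_cutoff
    }
    return {g for g in genes if g not in has_homolog}


def adjacent_orphan_pairs(gene_order: list[str], orphans: set[str]) -> int:
    """Count neighbouring gene pairs in which both genes are orphans."""
    return sum(1 for a, b in zip(gene_order, gene_order[1:])
               if a in orphans and b in orphans)


def clustering_p_value(gene_order: list[str], orphans: set[str],
                       n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided permutation p-value: fraction of shuffled gene orders with at
    least as many adjacent orphan pairs as observed. Small values point to
    clustering; large values are compatible with a scattered distribution."""
    rng = random.Random(seed)
    observed = adjacent_orphan_pairs(gene_order, orphans)
    shuffled = list(gene_order)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if adjacent_orphan_pairs(shuffled, orphans) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]
    hits = [("g1", "speciesB", 1e-30),   # significant hit in another species
            ("g2", "speciesA", 1e-50),   # hit only to its own species
            ("g3", "speciesB", 0.5)]     # hit above the cutoff
    orphans = find_orphans(genes, hits, own_species="speciesA")
    print(sorted(orphans))
    print(adjacent_orphan_pairs(genes, orphans))
```

Running the demo prints `['g2', 'g3', 'g4']` and `2`: g2 matches only its own species, g3's hit exceeds the cutoff, and g4 has no hits at all.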
Year: 2013; PMID: 23348040; PMCID: PMC3590893; DOI: 10.1093/gbe/evt009
Source DB: PubMed; Journal: Genome Biol Evol; ISSN: 1759-6653; Impact factor: 3.416
Data Set Overview: Genomes and Selected Features of 30 Arthropod Species
| Species | Genome Version | Source | Abbreviation | Genes | Missing Genes | SSOGs (Standard) | SSOGs (Refined) | Distance to MRCA (Myr) |
|---|---|---|---|---|---|---|---|---|
| | L1.2 | [V] | | 17,399 | | 1,161 | 1,105 | 52 |
| | 1.2 | [H] | | 18,093 | 989 | 2,074 | 1,970 | 10 |
| | 1 | [A] | | 34,821 | | 12,522 | 12,151 | 280 |
| | 2.0/3.8 | [H] | | 17,278 | 1,253 | 2,730 | 2,644 | 10 |
| | P3.6 | [V] | | 14,324 | | 840 | 769 | 150 |
| | 2 | [H] | | 11,062 | 223 | 431 | 289 | 140 |
| | 2 | [S] | | 14,623 | | 2,866 | 2,701 | 285 |
| | 3.3 | [H] | | 17,064 | 676 | 1,896 | 1,761 | 115 |
| | J1.2 | [V] | | 18,882 | | 902 | 821 | 52 |
| | 1.3 | [F] | | 15,070 | | 944 | 810 | 12 |
| | 1.3 | [F] | | 15,048 | | 666 | 558 | 4 |
| | 1.3 | [F] | | 14,986 | | 811 | 702 | 32 |
| | 5.37 | [F] | | 13,914 | | 318 | 230 | 4 |
| | 1.3 | [F] | | 14,595 | | 1,118 | 948 | 24 |
| | 1.3 | [F] | | 16,878 | | 878 | 755 | 1 |
| | 1.3 | [F] | | 16,029 | | 567 | 464 | 1 |
| | 1.1 | [J] | | 30,907 | | 13,709 | 13,181 | 470 |
| | 1.3 | [F] | | 16,471 | | 645 | 527 | 1 |
| | 1.3 | [F] | | 15,415 | | 637 | 522 | 1 |
| | 1.2 | [F] | | 14,491 | | 792 | 680 | 24 |
| | 1.3 | [F] | | 15,513 | | 1,230 | 1,105 | 36 |
| | 1.3 | [F] | | 16,082 | | 959 | 829 | 4 |
| | 3.3 | [H] | | 18,564 | 622 | 1,919 | 1,391 | 125 |
| | W1.1 | [V] | | 20,486 | | 7,007 | 6,677 | 550 |
| | 1.2 | [H] | | 16,116 | 678 | 1,448 | 1,349 | 120 |
| | 1.2 | [H] | | 18,822 | 150 | 2,305 | 2,191 | 150 |
| | 1.2 | [H] | | 17,189 | 729 | 2,173 | 2,054 | 105 |
| | U1.2 | [V] | | 10,774 | | 1,176 | 1,096 | 280 |
| | 2.2.3 | [H] | | 16,522 | 1,049 | 926 | 885 | 60 |
| | 3 | [B] | | 16,645 | | 3,757 | 3,623 | 300 |
Note.—For each genome, the species name, genome version, and download source are given. A: AphidBase (Legeai et al. 2010); B: BeetleBase (Kim et al. 2010); F: FlyBase (McQuilton et al. 2012); H: Hymenoptera Genome Database (Munoz-Torres et al. 2011); J: DOE Joint Genome Institute (http://www.jgi.doe.gov/); S: SilkDB (Duan et al. 2010); V: VectorBase (Lawson et al. 2009). Abbreviation: Four-letter species abbreviation used throughout this manuscript; Genes: Number of protein-coding genes in the OGS (i.e., excluding possibly missing genes); SSOGs: derived with standard methods (Standard) or comprehensive filtering (Refined); Distance to the MRCA: Evolutionary distance (Myr) to the MRCA node in the phylogenetic tree of these 30 arthropods.
Figure. Abundance of SSOGs and their dependence on the distance to the MRCA (table 1). SSOGs per species were plotted against the distance to the MRCA node in the phylogenetic tree. A linear regression (solid black line) was constructed to fit the observed SSOG counts from 12 Drosophilidae, 3 Culicidae, Bombyx, Tribolium, and Nasonia (circles, R² = 0.85). The confidence interval and the prediction interval of the linear model are shown in dark and light gray, respectively. The ant SSOG data points were added after fitting the linear model and are shown as triangles.
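The regression in this figure fits SSOG counts against distance to the MRCA and draws both a confidence interval and a prediction interval. The standard-library sketch below computes both for an ordinary least-squares line. It is a generic illustration, not the authors' code: the toy data are made up, `fit_line` and `LinearFit` are hypothetical names, and because the standard library has no Student-t distribution, the caller passes the two-sided critical value `t_crit` for n - 2 degrees of freedom.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearFit:
    slope: float
    intercept: float
    r2: float
    s: float        # residual standard error, sqrt(SSE / (n - 2))
    n: int
    x_mean: float
    sxx: float      # sum of squared deviations of x from its mean

    def predict(self, x0: float) -> float:
        return self.intercept + self.slope * x0

    def intervals(self, x0: float, t_crit: float):
        """(confidence interval of the mean, prediction interval) at x0."""
        y0 = self.predict(x0)
        leverage = 1 / self.n + (x0 - self.x_mean) ** 2 / self.sxx
        half_ci = t_crit * self.s * math.sqrt(leverage)
        half_pi = t_crit * self.s * math.sqrt(1 + leverage)
        return (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)


def fit_line(x: list[float], y: list[float]) -> LinearFit:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return LinearFit(slope, intercept, 1 - sse / sst,
                     math.sqrt(sse / (n - 2)), n, x_mean, sxx)


if __name__ == "__main__":
    # Toy values (hypothetical), not taken from the data set table.
    fit = fit_line([0, 1, 2, 3], [1, 3, 2, 4])
    ci, pi = fit.intervals(1.5, t_crit=4.303)  # t(0.975) with df = 2
    print(round(fit.slope, 3), round(fit.intercept, 3), round(fit.r2, 3))
    print(ci, pi)
```

The prediction interval is always wider than the confidence interval because it adds the residual variance of a single new observation. A genome added after fitting, as the ant points were, that lies above the upper prediction bound has more SSOGs than the fitted model would expect for a new genome at that distance.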
Figure. Contrasting the abundance and rate of emergence of orphan genes between partially overlapping taxonomic groups of Hymenoptera and Diptera. Each tested group is highlighted by a rectangle, and the associated group data (number of species, distance to the MRCA, total orphan gene count, and rate of orphan gene emergence) are shown on the right side. Branch lengths in the phylogenetic tree are approximate values and were obtained from the timetree.org database (Hedges et al. 2006).
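The caption lists, for each group, the distance to the MRCA, the total orphan count, and a rate of orphan gene emergence. Assuming that rate is simply the orphan count divided by the distance in Myr (the caption does not spell out the formula), the comparison can be sketched as follows. The class name `TaxonGroup` and all group names and numbers are hypothetical placeholders, not values from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonGroup:
    name: str
    n_species: int
    distance_myr: float   # distance to the group's MRCA
    orphan_count: int     # orphans assigned to the group

    @property
    def rate_per_myr(self) -> float:
        """Assumed definition: orphans gained per million years."""
        if self.distance_myr <= 0:
            raise ValueError("distance to the MRCA must be positive")
        return self.orphan_count / self.distance_myr


if __name__ == "__main__":
    groups = [
        TaxonGroup("group A (hypothetical)", 2, 10.0, 1200),
        TaxonGroup("group B (hypothetical)", 3, 50.0, 1500),
    ]
    for g in sorted(groups, key=lambda g: g.rate_per_myr, reverse=True):
        print(f"{g.name}: {g.rate_per_myr:.1f} orphans/Myr")
```

Here group B has more orphans in total yet a lower rate (30.0 versus 120.0 orphans/Myr), which is why abundance and rate are reported separately.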
Inferred Origins of Orphan Genes in Data Sets from Different Studies
| Data set | Formicidae SSOGs | Attini SSOGs | Primate shared TSOGs | |
|---|---|---|---|---|
| Study | This study | This study | | |
| Genome size (Mb) | 250–450 | 300–335 | 1,600–2,870 | 125 |
| Genomic TE content (%) | 8–30 | 25–28 | ∼50 | ∼10 |
| Origins (%) | | | | |
| Gene duplication | 9.9 | 6.4 | 24 | 22 |
| Overlap with TE | 12.4 | 10.6 | 53 | 10 |
| Frame shift | 2.2 | 2.2 | NA | 7 |
| Overlapping genes | 11.1 | 13.3 | NA | 1 |
| Intergenic match | 43.5 | 61.2 | 6 (de novo) | 25 |
| HGT | 0.1 | 0.0 | NA | NA |
| Unexplained | 20.8 | 6.3 | 17 | 35 |
Note.—The Formicidae and Attini data sets consist of 12,054 and 4,614 SSOGs, respectively. Genome statistics were obtained from the Arabidopsis Genome Initiative (2000), Lander et al. (2001), Bonasio et al. (2010), Nygaard et al. (2011), Smith CD et al. (2011), Smith CR et al. (2011), and Suen et al. (2011).
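The totals in this note can be cross-checked against the data set overview table: the refined SSOG counts of seven Hymenoptera Genome Database [H] rows sum exactly to 12,054, and the two [H] rows at 10 Myr from their MRCA sum to 4,614. Treating those rows as the ant and Attini genomes is an inference from these sums, because the species column did not survive extraction. The sketch also checks that each origins column sums to 100% and converts the rounded intergenic-match percentages into approximate SSOG counts.

```python
# Refined SSOG counts copied from the data set overview table.
# Assumption: these seven [H] rows are the ant genomes (the remaining two [H]
# rows, with 289 and 2,191 refined SSOGs, are taken to be non-ant species).
ant_refined = [1970, 2644, 1761, 1391, 1349, 2054, 885]
# Assumption: the two [H] rows at 10 Myr from the MRCA are the Attini.
attini_refined = [1970, 2644]

formicidae_total = sum(ant_refined)
attini_total = sum(attini_refined)

# Origin percentages in table row order: duplication, TE overlap, frame shift,
# overlapping genes, intergenic match, HGT, unexplained.
origins = {
    "Formicidae": [9.9, 12.4, 2.2, 11.1, 43.5, 0.1, 20.8],
    "Attini": [6.4, 10.6, 2.2, 13.3, 61.2, 0.0, 6.3],
}
column_sums = {k: round(sum(v), 1) for k, v in origins.items()}

# Approximate counts implied by the (rounded) intergenic-match percentages.
intergenic_counts = {
    "Formicidae": round(origins["Formicidae"][4] / 100 * formicidae_total),
    "Attini": round(origins["Attini"][4] / 100 * attini_total),
}

print(formicidae_total, attini_total)   # 12054 4614
print(column_sums)                      # {'Formicidae': 100.0, 'Attini': 100.0}
print(intergenic_counts)                # {'Formicidae': 5243, 'Attini': 2824}
```

Because the published percentages are rounded to one decimal, the derived counts are approximations, not the study's exact tallies.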
Expression Support across Formicidae Genes
| Species | Gene Count | Genes Supported | SSOG Count | SSOGs Supported |
|---|---|---|---|---|
| Successful mapping of ESTs to annotated gene models | ||||
| | 18,093 | 2,837 (16%) | 1,970 | 217 (11%) |
| | 16,522 | 4,075 (25%) | 885 | 244 (28%) |
| | 17,189 | 5,920 (34%) | 2,054 | 401 (20%) |
| | 17,740 | 1,480 (8%) | 1,761 | 30 (2%) |
| | 16,116 | 3,089 (19%) | 1,349 | 136 (10%) |
| | 18,564 | 1,727 (9%) | 1,391 | 26 (2%) |
| Successful mapping of RNA-seq reads to annotated gene models | ||||
| | 17,064 | 14,407 (84%) | 1,761 | 1,187 (67%) |
| | 18,564 | 15,913 (86%) | 1,391 | 859 (62%) |
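Each percentage in parentheses is the supported count divided by the corresponding total. A short sketch reproducing two rows of the table (the function name `support_pct` is hypothetical):

```python
def support_pct(supported: int, total: int) -> int:
    """Share of gene models with expression support, in whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * supported / total)


# (gene count, genes supported, SSOG count, SSOGs supported), copied from the table
rows = {
    "EST, first row": (18093, 2837, 1970, 217),
    "RNA-seq, first row": (17064, 14407, 1761, 1187),
}
for label, (genes, genes_sup, ssogs, ssogs_sup) in rows.items():
    print(label, support_pct(genes_sup, genes), support_pct(ssogs_sup, ssogs))
# EST, first row 16 11
# RNA-seq, first row 84 67
```

Applied to every row, the same ratio shows that SSOGs have lower EST support than the gene set as a whole in every species except the one with 885 SSOGs (28% versus 25%), and lower RNA-seq support in both species tested.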