Philipp H Schiffer, Jan Gravemeyer, Martina Rauscher, Thomas Wiehe.
Abstract
Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra-large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR) genes, which we previously found in large numbers on one chromosome arm of the zebrafish genome. We asked whether such tight clustering is characteristic of ultra-large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we identify only very few gene clusters. Based on these observations, we hypothesise that, beyond a certain size threshold, ultra-large gene families continue to proliferate through a mechanism we term "run-away evolution". This process might ultimately lead to a loss of genomic integrity and drive species to extinction.
Keywords: NLR-genes; adaptation; gene clusters; gene family; genome evolution; neutral evolution; run-away-evolution; selection
Year: 2016 PMID: 27509525 PMCID: PMC5041008 DOI: 10.3390/life6030032
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Databases mined for genome, proteome and annotation files, with the corresponding genome assembly statistics.
| Species | Database (assembly) | N50 (bp) | Contigs | Largest contig | Total length |
|---|---|---|---|---|---|
|  | NCBI (adi_v0.9) | 41904 | 18834 | 0.48 Mb | 412 Mb |
|  | Ensembl (Aqu1.29) | 120365 | 13397 | 1.9 Mb | 166.7 Mb |
|  | Carpbase (v2.0) | 7828866 | 9376 | 29.1 Mb | 1713.7 Mb |
|  | Ensembl (Zv9) | 54093808 | 1133 | 77.3 Mb | 1412.5 Mb |
|  | NCBI (v2.0) | 2586727 | 398 | 11.5 Mb | 521.9 Mb |
|  | NCBI (Spur_4.2) | 421711 | 27578 | 2.5 Mb | 989.4 Mb |
|  | Flybase (r1.04) | 18748788 | 5103 | 26.6 Mb | 152.7 Mb |
|  | Flybase (r6.07) | 25286936 | 1870 | 32.1 Mb | 143.7 Mb |
|  | Flybase (r3.03) | 12541198 | 4463 | 30.8 Mb | 152.6 Mb |
|  | Flybase (r2.01) | 23539531 | 2601 | 27.2 Mb | 123.6 Mb |
|  | Wormbase (WS249) | 87708 | 11453 | 0.87 Mb | 99.01 Mb |
|  | Wormbase (WS249) | 381961 | 3305 | 4.1 Mb | 190.4 Mb |
|  | Wormbase (WS249) | 17485439 | 12 | 21.5 Mb | 108.4 Mb |
|  | Wormbase (WS249) | 17493829 | 7 | 20.9 Mb | 100.3 Mb |
|  | Wormbase (WS249) | 94149 | 18808 | 1.1 Mb | 166.3 Mb |
|  | Wormbase (WS249) | 435512 | 3670 | 4.5 Mb | 145.4 Mb |
|  | NCBI (v3.1) | 145327772 | 49216 | 229.5 Mb | 3035 Mb |
|  | NCBI (v2.0.2) | 135191526 | 61534 | 229.9 Mb | 3411 Mb |
|  | NCBI (v2.1.4) | 143986469 | 24128 | 247.5 Mb | 3309 Mb |
|  | NCBI (v1.1) | 144709823 | 10209 | 247.9 Mb | 3286 Mb |
|  | NCBI (GRCh38.p5) | 145138636 | 517 | 145.1 Mb | 3230 Mb |
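The N50 column is a standard contiguity statistic: the length L such that contigs of length L or longer together cover at least half of the assembly. A minimal sketch of how N50 and the other three statistics in the table could be computed from a list of contig lengths follows; the function names are hypothetical, and the source does not say which tool produced these numbers.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    covered = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer-safe form of covered >= total / 2
            return length


def assembly_stats(contig_lengths):
    """Hypothetical helper mirroring the table columns (lengths in bp)."""
    return {
        "N50": n50(contig_lengths),
        "Contigs": len(contig_lengths),
        "Largest contig": max(contig_lengths),
        "Total length": sum(contig_lengths),
    }
```

For a toy assembly with contig lengths 2 to 10 bp (54 bp in total), the three longest contigs (10 + 9 + 8 = 27 bp) reach half the total, so the N50 is 8 bp.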
Figure 1. Domains and proteins with large families in Caenorhabditis, Drosophila and the great apes. Trees are based on NCBI taxonomy, with branch lengths scaled to divergence time following [14,15,16]; grey branches lack a divergence time estimate. Gene family size divergence is calculated for each pair of species as the mean squared difference in gene counts. Pairwise per-nucleotide substitution rates are calculated with the program Andi [17]. Axes are scaled linearly in the large plot and logarithmically in the small plot; linear regression lines are shown only in the large plot.
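Written out, the divergence in the caption for species A and B over F gene families is D_AB = (1/F) Σ_f (n_A,f − n_B,f)², where n_A,f is the number of genes of family f in species A. A minimal sketch of that definition and of the least-squares line fitted against substitution rate follows. It assumes that gene counts are stored as dictionaries keyed by family and that a family missing from one species counts as zero genes there; all names are hypothetical, and the substitution rates themselves would come from Andi, not from this code.

```python
from itertools import combinations


def family_size_divergence(counts_a, counts_b):
    """Mean squared difference in gene counts between two species.

    counts_a, counts_b: dicts mapping a gene family (e.g. a Pfam domain)
    to its gene count. A family absent from one dict is assumed to have
    zero genes in that species.
    """
    families = set(counts_a) | set(counts_b)
    if not families:
        raise ValueError("no gene families to compare")
    squared = [(counts_a.get(f, 0) - counts_b.get(f, 0)) ** 2 for f in families]
    return sum(squared) / len(squared)


def pairwise_divergences(counts_by_species):
    """Family size divergence for every unordered pair of species."""
    return {
        (a, b): family_size_divergence(counts_by_species[a], counts_by_species[b])
        for a, b in combinations(sorted(counts_by_species), 2)
    }


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

To reproduce a plot like the large panel, one would pair each species pair's divergence with its substitution rate and pass the two lists to `linear_fit`. Because the differences are squared, a single strongly inflated family dominates D_AB, which is one reason a logarithmic axis can be helpful.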
NACHT-domain and leucine-rich-repeat-containing (NLR) gene candidates (genes encoding NACHT domains and LRRs), identified with Pfam, Gene3D and SUPERFAMILY in InterProScan and supplemented with PANTHER annotations.
| Species | NLR genes |
|---|---|
|  | 276 |
|  | 95 |
|  | 44 |
|  | 153 |
|  | 65 |
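The candidate definition above (a protein with both a NACHT domain and LRRs, detected by Pfam, Gene3D or SUPERFAMILY inside InterProScan) can be sketched as a filter over InterProScan's tab-separated output. This assumes the standard TSV layout (column 1 protein ID, column 4 analysis, column 6 signature description). The keyword matching on descriptions is a simplified, hypothetical rule, not the authors' exact criterion, and PANTHER supplementation is only modelled by optionally adding it to `analyses`.

```python
# Hypothetical, simplified matching rules: a hit counts as NACHT or LRR if its
# (lower-cased) signature description contains one of these keywords.
NACHT_KEYWORDS = ("nacht",)
LRR_KEYWORDS = ("leucine rich repeat", "leucine-rich repeat", "lrr")
DEFAULT_ANALYSES = frozenset({"Pfam", "Gene3D", "SUPERFAMILY"})


def nlr_candidates(tsv_lines, analyses=DEFAULT_ANALYSES):
    """Return IDs of proteins with at least one NACHT and one LRR hit.

    tsv_lines: iterable of InterProScan TSV lines (e.g. an open file).
    Only hits from the member databases listed in `analyses` are used.
    """
    has_nacht, has_lrr = set(), set()
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6 or cols[3] not in analyses:
            continue  # malformed line or an analysis we do not use
        protein, description = cols[0], cols[5].lower()
        if any(k in description for k in NACHT_KEYWORDS):
            has_nacht.add(protein)
        if any(k in description for k in LRR_KEYWORDS):
            has_lrr.add(protein)
    # NLR candidates need both domain types on the same protein.
    return has_nacht & has_lrr
```

Passing an open file handle works directly, e.g. `nlr_candidates(open("proteins.tsv"))` for a hypothetical output file. Turning protein IDs into gene counts would additionally need an isoform-to-gene mapping, which this sketch does not attempt.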
Figure 2. Clusters of F-box genes found in C. remanei. Each dot marks the position of an F-box gene on one of four contigs, each contig shown as a separate line.
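The caption does not define how a run of genes qualifies as a cluster. One simple rule, used here purely as an illustrative assumption, is single-linkage grouping along each contig: neighbouring genes closer than a maximum gap join the same cluster, and only groups above a minimum size are kept. Both thresholds and the contig names in the usage note are hypothetical.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=10_000, min_size=3):
    """Group gene positions into clusters along each contig.

    genes: iterable of (contig, position) pairs, position in bp.
    Consecutive genes on the same contig at most max_gap bp apart are
    joined (single linkage); groups smaller than min_size are dropped.
    """
    by_contig = defaultdict(list)
    for contig, pos in genes:
        by_contig[contig].append(pos)

    clusters = []
    for contig in sorted(by_contig):
        positions = sorted(by_contig[contig])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)  # close enough: extend current cluster
            else:
                if len(current) >= min_size:
                    clusters.append((contig, current))
                current = [pos]  # start a new candidate cluster
        if len(current) >= min_size:
            clusters.append((contig, current))
    return clusters
```

With genes at 100, 2000, 5000 and 500000 bp on a contig `c1`, the default thresholds return one cluster of the first three genes; the isolated gene is discarded. Changing `max_gap` changes which dots in a plot like Figure 2 would be called clustered, so any count of clusters depends on this choice.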
Pairwise comparisons using Fisher’s exact test to identify enriched Pfam domains.
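For one Pfam domain and one pair of species, Fisher's exact test works on a 2×2 table of genes with and without the domain in each species. A minimal one-sided sketch (testing for enrichment in species 1) using only the standard library follows. It assumes that the table counts genes rather than domain occurrences, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test p-value for enrichment in species 1.

    2x2 table:       with domain   without domain
        species 1         a              b
        species 2         c              d
    Returns P(X >= a), where X is hypergeometric with the table's margins.
    """
    row1 = a + b  # genes in species 1
    col1 = a + c  # genes carrying the domain in either species
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)
```

For the table [[3, 0], [0, 3]] the p-value is 1/20 = 0.05. Because every domain is tested separately, comparing many domains would normally call for a multiple-testing correction on these p-values.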
| Species 1 | Compared with |
|---|---|