Yun Zhu, Peng Du, Luay Nakhleh.
Abstract
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures and characteristics across the Tree of Life. Much research has been conducted on elucidating the fate of duplicated genes in a variety of organisms, as well as the factors that affect a gene's duplicability, that is, the tendency of certain genes to retain more duplicates than others. In particular, two studies have looked at the correlation between a gene's duplicability and its degree in a protein-protein interaction network in yeast, mouse, and human, and another has looked at the correlation between a gene's duplicability and its complexity (length, number of domains, etc.) in yeast. In this paper, we extend these studies to six species, and two trends emerge. First, the duplicability-connectivity correlation increases in a way that agrees with the increase in genome size as well as with the phylogenetic relationship of the species. Second, the duplicability-complexity correlation seems to be constant across the species. We argue that the observed correlations can be explained by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate genomic characteristics of the respective species. Our results call for more research into factors, adaptive and non-adaptive alike, that determine a gene's duplicability.
Year: 2012 PMID: 22984517 PMCID: PMC3439388 DOI: 10.1371/journal.pone.0044491
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Duplicability-connectivity correlations.
| | Ecol | Scer | Dmel | Cele | Mmus | Hsap |
| Number of gene families | 2906 | 5383 | 8054 | 10260 | 9247 | 10158 |
| Number of genes | 4258 | 6692 | 13917 | 20389 | 22791 | 21227 |
| ρ | −0.138 | 0.081 | 0.172 | 0.221 | 0.224 | 0.290 |
| p-value | 10⁻¹³ | 10⁻⁸ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ |
Correlations between gene duplicability and connectivity in six species: H. sapiens (Hsap), M. musculus (Mmus), D. melanogaster (Dmel), C. elegans (Cele), S. cerevisiae (Scer), and E. coli (Ecol). The ‘Number of gene families’ row contains, for each species, the number of gene families that had at least one member in that species. The ‘Number of genes’ row contains, for each species, the number of genes covered by those gene families. The ρ value is Spearman’s rank correlation coefficient between duplicability and connectivity, and the p-value is computed for that correlation.
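As an illustration of the statistic reported in the table, the following is a minimal pure-Python sketch of Spearman's rank correlation: both variables are converted to ranks (ties receive the average rank) and Pearson's correlation is computed on the ranks. The function names and the toy data are hypothetical, and using gene-family size as a stand-in for duplicability is an assumption made only for this example; it is not taken from the paper's pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one entry per gene.
family_size = [1, 1, 2, 3, 5]   # assumed proxy for duplicability
degree = [2, 4, 3, 8, 10]       # number of interaction partners
rho = spearman_rho(family_size, degree)
```

Because the statistic depends only on ranks, it is insensitive to the heavy-tailed degree distributions typical of interaction networks, which is one reason a rank-based measure is a natural choice here.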
Figure 1. Duplicability-connectivity correlations vs. genome sizes and evolutionary relationship.
Spearman’s rank correlation coefficient (ρ) between gene duplicability and gene connectivity for six species: H. sapiens (Hsap), M. musculus (Mmus), D. melanogaster (Dmel), C. elegans (Cele), S. cerevisiae (Scer), and E. coli (Ecol). The evolutionary relationship of the species is based in part on [46]. Genome size (in Mbp) information for all species except E. coli was obtained from the Animal Genome Size Database and the Fungal Genome Database.
Table 2. Parameters and results for four simulation settings under the subfunctionalization model (model Ib in [5]) and neofunctionalization model (model IIc in [5]).
| | setting I | setting II | setting III | setting IV |
| duplication rate | 0.00001 | 0.000012 | 0.000014 | 0.000016 |
| fraction of edge loss (for model Ib) | 0.8 | 0.4 | 0.2 | 0.1 |
| fraction of edge gain (for model IIc) | 0.1 | 0.2 | 0.4 | 0.8 |
| | −0.685 | −0.349 | −0.245 | −0.089 |
| | 0.807 | 0.672 | 0.371 | 0.284 |
| | 0.186 | 0.453 | 0.737 | 0.892 |
| | −0.099 | −0.390 | −0.613 | −0.782 |
Fraction of edge loss indicates the number of edges that a duplicated gene loses, when it undergoes subfunctionalization, as a proportion of the number of that gene’s existing edges. Fraction of edge gain indicates the number of new edges a duplicated gene gains, when it acquires a new function, as a proportion of the number of that gene’s existing edges. The correlations are calculated by applying Spearman’s rank correlation. (p-values are less than .).
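The two fractions defined above can be made concrete with a small sketch. It is not the simulator used in the paper: the function names, the uniform random choice of which edges are lost or gained, and the rounding of the edge count to the nearest integer are all assumptions introduced for illustration.

```python
import random


def subfunctionalize(neighbors, fraction_loss, rng):
    """Neighbor set of a duplicate after losing a fraction of its edges.

    Assumption: the lost edges are drawn uniformly at random and the
    count is rounded to the nearest integer.
    """
    n_lost = round(fraction_loss * len(neighbors))
    lost = set(rng.sample(sorted(neighbors), n_lost))
    return set(neighbors) - lost


def neofunctionalize(neighbors, fraction_gain, candidates, rng):
    """Neighbor set of a duplicate after gaining a fraction of new edges.

    Assumption: new partners are drawn uniformly at random from
    `candidates` that are not already neighbors.
    """
    n_gain = round(fraction_gain * len(neighbors))
    pool = sorted(set(candidates) - set(neighbors))
    gained = set(rng.sample(pool, min(n_gain, len(pool))))
    return set(neighbors) | gained


rng = random.Random(0)  # fixed seed so the sketch is reproducible
neighbors = {f"g{i}" for i in range(10)}        # hypothetical gene with 10 edges
candidates = {f"g{i}" for i in range(10, 30)}   # hypothetical other genes

# Setting I of the table: edge loss 0.8 (model Ib), edge gain 0.1 (model IIc).
after_loss = subfunctionalize(neighbors, 0.8, rng)
after_gain = neofunctionalize(neighbors, 0.1, candidates, rng)
```

With ten existing edges, setting I leaves the subfunctionalized copy with two edges and gives the neofunctionalized copy eleven, which shows how the two fractions move a duplicate's connectivity in opposite directions.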
Figure 2. Duplicability-connectivity correlations in simulations.
Spearman’s rank correlation coefficient (ρ) between gene duplicability and gene connectivity for different settings under the subfunctionalization model (model Ib in [5]) and the neofunctionalization model (model IIc in [5]). The parameter values in each of the four settings are given in Table 2.
Table 3. Duplicability-complexity correlations.
| | Ecol | Scer | Dmel | Cele | Mmus | Hsap |
| #families | 2906 | 5383 | 8054 | 10260 | 9247 | 10158 |
| #genes | 4258 | 6692 | 13917 | 20389 | 22791 | 21227 |
| ρ (length) | 0.234 | 0.137 | 0.137 | 0.183 | 0.240 | 0.255 |
| p-value (length) | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ |
| ρ (number of domains) | 0.232 | 0.133 | 0.270 | 0.282 | 0.379 | 0.325 |
| p-value (number of domains) | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ | 10⁻¹⁵ |
Correlations between gene duplicability and length, and between gene duplicability and number of domains, in six species: H. sapiens (Hsap), M. musculus (Mmus), D. melanogaster (Dmel), C. elegans (Cele), S. cerevisiae (Scer), and E. coli (Ecol). The numbers of gene families and genes for each of the six species are the same as in Table 1. The ρ value is Spearman’s rank correlation coefficient between duplicability and the respective complexity measure, and the p-value is computed for that correlation.
Table 4. Nine models of gene duplication; reproduced from [5].
| Model | Description | Mutation | Fitness |
| Ia | Extra copies of a gene are redundant and can be relieved from purifying selection | pseudogenization and very rare new functionalization | maintained at |
| Ib | Each gene has subfunctions; functionally complementary copies produce one function | mutation removes a subfunction or whole function | same as Ia, with complementary copies treated as a functioning copy |
| Ic | functionally complementary copies can specialize and be more advantageous | same as Ib | specialized copy has increased fitness value |
| IIa | Extra copies are always beneficial | same as Ia | increase in dosage results in increase in fitness |
| IIb | Extra copies can shield genes against deleterious mutations | same as Ia; simulated with a higher mutation rate | same as Ia |
| IIc | Gene duplication develops a modified function | mutation can introduce new functions to the extra copies | new functions increase fitness |
| IIIa | Original gene carries multiple subfunctions which can adapt to full-fledged functions in extra copies | mutation can adapt the subfunction to full function in extra copies | extra new full function increases fitness |
| IIIb | Different allele types pre-exist in population; duplication and recombination together can create advantageous heterozygote | pseudogenization | heterozygote genes have higher fitness |
| IIIc | Similar to IIIb, with multi-allelic diversity being advantageous | pseudogenization | genes that accumulate several different alleles have higher fitness |
Table 5. Parameter settings used in the simulations (units for all rates are “per gene per generation”).
| population size | |
| number of generations | |
| fitness coefficient ( | |
| duplication fitness coefficients ( | |
| duplication rate | |
| null function mutation rate | 10⁻⁵ |
| edge mutation rate | 10⁻⁵ |
| functional innovation rate | 10⁻⁷ |
| gene conversion rate | |
| recombination rate | |