Kevin Hannay, Edward M Marcotte, Christine Vogel.
Abstract
BACKGROUND: One mechanism that accounts for robustness against gene knockouts or knockdowns is buffering by gene duplicates, but the extent and general correlates of this process across organisms are still a matter of debate. To reveal general trends, we provide a comprehensive comparison of gene essentiality, duplication and buffering by duplicates across seven bacteria (Mycoplasma genitalium, Bacillus subtilis, Helicobacter pylori, Haemophilus influenzae, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Escherichia coli) and four eukaryotes (Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse)).
Year: 2008 PMID: 19087332 PMCID: PMC2627895 DOI: 10.1186/1471-2164-9-609
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Essentiality and gene duplicates in eleven bacterial and eukaryotic organisms.
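| Type of KO/KD experiment | Genes tested | Essential genes | Genes with duplicates (D ≥ 1) | C | Correlation of P(S) with D | Correlation of P(S) with distance to nearest neighbor | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Random insertion (clones) | 460 | 364 | 89 | -0.13 | 0.35 | 0.16 | |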
| Random insertion (population) | 1,559 | 329 | 358 | 0.13*** | 0.26 | 0.14 | |
| Random insertion (population) | 1,704 | 631 | 400 | 0.01* | 0.27 | 0.20 | |
| Random insertion (population) | 3,920 | 614 | 1,683 | 0.06*** | 0.63* | 0.40 | |
| Random insertion (clones) | 5,566 | 364 | 2,689 | 0.07*** | 0.64** | 0.80** | |
| Targeted insertion (clones) | 4,105 | 191 | 1,857 | 0.0045* | 0.37 | 0.01 | |
| Targeted knockout (clones) | 3,221 | 291 | 1,940 | 0.06*** | 0.64** | 0.82** | |
| Targeted knockout (clones) | 5,318 | 952 | 2,531 | 0.12*** | 0.00 | 0.72* | |
| Targeted knockdown (clones) | 13,915 | 1,345 | 9,203 | 0.09*** | 0.74** | 0.92*** | |
| Targeted knockdown in cell line (clones) | 12,145 | 318 | 7,004 | 0.01*** | 0.00 | 0.60* | |
| Collection of individual experiments | 4,267 | 1,438 | 3,664 | -0.07** | 0.03 | 0.00 | |
The table summarizes properties of the eleven organisms in our analysis. From left to right: the name of the organism; the type of KO/KD experiment; the number of genes tested for essentiality in gene-KO or KD experiments; the number of genes resulting in lethal phenotypes (essential genes); the number of genes with one or more duplicates (D ≥ 1) amongst the tested genes; the contribution of duplicates to buffering, C = P(S|D ≥ 1)/P(S|D = 0) - 1; the correlation between P(S) and the effective family size D of the genes (D ranges from 0 to 8+, see text); and the correlation between P(S) and the distance of a gene to its nearest neighbor (measured in -log(E-value), bin size 5). In the experimental descriptions, 'clones' refers to clonal outgrowth on plates or in cultures; 'population' refers to (mixed) population outgrowth in liquid culture. P-value thresholds of 0.05, 0.01, and 0.001 are marked with *, **, and ***, respectively.
KD – knockdown; KO – knockout; P(S) – probability of survival; D – effective gene family size (number of additional gene duplicates)
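The buffering contribution C defined in the table caption can be computed directly from survival counts. The following is a minimal sketch, not the authors' code; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def buffering_contribution(surv_dup, n_dup, surv_single, n_single):
    """Return C = P(S | D >= 1) / P(S | D = 0) - 1.

    surv_dup / n_dup:       surviving and tested genes with >= 1 duplicate
    surv_single / n_single: surviving and tested singleton genes (D = 0)
    """
    if n_dup <= 0 or n_single <= 0:
        raise ValueError("both gene sets must be non-empty")
    p_dup = surv_dup / n_dup            # P(S | D >= 1)
    p_single = surv_single / n_single   # P(S | D = 0)
    if p_single == 0:
        raise ValueError("P(S | D = 0) is zero, so C is undefined")
    return p_dup / p_single - 1.0


# Hypothetical counts (not taken from the table above).
c = buffering_contribution(surv_dup=90, n_dup=100, surv_single=160, n_single=200)
print(round(c, 3))  # 0.125
```

A positive C means genes with duplicates survive KO/KD more often than singletons; a negative C, as in the first and last table rows, means they survive less often.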
Figure 1. Chances of survival upon gene-KO/KD vary between organisms. While the number and fraction of duplicate genes increase from prokaryotes to single- and multi-cellular eukaryotes, the fraction of essential genes (and hence the chance of survival upon gene-KO/KD) varies widely. The three panels show the probability of survival P(S) (A), the gene family distribution and number of genes with duplicates (D ≥ 1) (B). Singleton genes are labeled D = 0, members of two-gene families are labeled D = 1, and members of larger gene families are labeled D ≥ 2. Red bars indicate values for all genes, as also listed in Table 1. High (black) and low (white) gene expression levels are estimated by codon bias indices (see methods). Significant differences between genes of high and low expression (χ² test) are marked with ** (P-value ≤ 0.01) and *** (P-value ≤ 0.001). D – effective gene family size (number of additional duplicates of a gene); S – survival upon gene deletion (1-essentiality). Mgen – Mycoplasma genitalium; Hpyl – Helicobacter pylori; Hinf – Haemophilus influenzae; Mtub – Mycobacterium tuberculosis; Paer – Pseudomonas aeruginosa; Bsub – Bacillus subtilis; Ecol – Escherichia coli; Scer – Saccharomyces cerevisiae (yeast); Cele – Caenorhabditis elegans (worm); Dmel – Drosophila melanogaster (fly); Mmus – Mus musculus (mouse).
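The χ² comparison between genes of high and low expression mentioned in the caption can be illustrated on a 2×2 table of survival versus expression class. This sketch assumes a plain Pearson χ² test with one degree of freedom and no continuity correction; the source does not state the exact variant used, and the counts below are hypothetical.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test on a 2x2 table (no continuity correction).

    Rows:    high expression (a, b), low expression (c, d)
    Columns: survived (a, c), lethal (b, d)
    Returns (chi2 statistic, P-value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one gene")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 20 of 30 highly expressed genes survive, 10 of 30 lowly expressed genes survive.
stat, p = chi2_2x2(20, 10, 10, 20)
print(f"chi2 = {stat:.3f}, P = {p:.4f}")
```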
Figure 2. Small but significant buffering of duplicate genes against gene-KO/KD. In most organisms of our analysis, duplicates contribute significantly to survival against gene-KO/KD (P-value ≤ 0.05), although to only a small extent. Buffering is increased amongst genes of high expression levels (high CBI, black bars) compared to genes of lower expression levels (white bars). In highly expressed genes, duplicates contribute to survival by up to 23% (E. coli). Significant enrichment of duplicates amongst non-essential genes (hypergeometric distribution) and significant differences between genes of high and low expression (χ² test) are marked with *, **, and *** for P-value thresholds of 0.05, 0.01, and 0.001, respectively. For abbreviations see Figure 1.
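The enrichment of duplicates amongst non-essential genes is assessed with the hypergeometric distribution. A minimal sketch of such a test follows, assuming a one-sided upper-tail P-value (the caption does not say which tail was used); the function name and the counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_total, n_dup, n_nonessential, n_dup_nonessential):
    """Upper-tail hypergeometric P-value P(X >= n_dup_nonessential).

    n_total:            all genes tested
    n_dup:              tested genes with at least one duplicate (D >= 1)
    n_nonessential:     tested genes that are non-essential
    n_dup_nonessential: genes that have a duplicate AND are non-essential
    """
    upper = min(n_dup, n_nonessential)
    denominator = comb(n_total, n_nonessential)
    # math.comb returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(n_dup, k) * comb(n_total - n_dup, n_nonessential - k)
        for k in range(n_dup_nonessential, upper + 1)
    )
    return tail / denominator


# Hypothetical example: 10 genes, 4 with duplicates, 5 non-essential,
# 3 of the non-essential genes have a duplicate.
print(hypergeom_enrichment_pvalue(10, 4, 5, 3))
```

Exact integer binomials avoid floating-point underflow for genome-sized inputs, at the cost of some speed.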
Figure 3. Survival upon single gene-KO/KD is correlated with the number of duplicates present and their distance to the gene only in some organisms. For E. coli, yeast and worm, we deconvolute the set of duplicates into different effective family sizes (A), or according to the sequence distance between the deleted gene and its nearest homolog (B). In E. coli and worm, chances of survival increase slightly with an increasing number of duplicates present per gene (D) or increasing sequence similarity (as measured by the E-value). Yeast shows no correlation between the effective family size and survival (A), but chances of survival are higher in two-gene families (D = 1) than in larger families (D ≥ 2). For abbreviations see Figure 1.
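The relation between survival and effective family size can be sketched by binning genes by D, with all families of eight or more duplicates pooled into one '8+' bin as in Table 1, and then correlating P(S) with D. The use of a Pearson correlation on the binned values is an assumption here, since the source does not name the correlation measure; the input records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def survival_by_family_size(genes, cap=8):
    """Map effective family size D (capped at `cap`, the '8+' bin) to P(S).

    genes: iterable of (number_of_duplicates, survived) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # D -> [survivors, genes tested]
    for n_dup, survived in genes:
        d = min(n_dup, cap)
        counts[d][0] += int(bool(survived))
        counts[d][1] += 1
    return {d: s / t for d, (s, t) in sorted(counts.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical gene records: (duplicates, survived KO/KD?)
genes = [(0, True), (0, False), (1, True), (1, True), (12, True)]
p_s = survival_by_family_size(genes)
print(p_s)
print(pearson(list(p_s.keys()), list(p_s.values())))
```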
Characteristics of buffering and non-buffering yeast two-gene families
| [ | 4.948 | 91 | 0.906 | 14 | 4.04* | |
| [ | 35040 | 29 | 2116 | 4 | 2.84 | |
| [ | 66299.9 | 99 | 91885.0 | 16 | -2.33 | |
| [ | 0.232 | 99 | 0.134 | 16 | 4.97* | |
| [ | 0.187 | 99 | 0.051 | 16 | 5.18* | |
| [ | 0.632 | 90 | 0.056 | 12 | 3.45* | |
| [ | 5.733 | 85 | 1.388 | 11 | 4.07* | |
| [ | 0.109 | 85 | 0.040 | 11 | 2.87 | |
| [ | 108.5 | 74 | 177.1 | 13 | -0.50 | |
| [ | 0.056 | 56 | 0.113 | 8 | -1.95 | |
| [ | 8.1 | 94 | 5.8 | 15 | 1.52 | |
| [ | 15.2 | 84 | 4.3 | 14 | 4.50* | |
| BLAST output | 54.3 | 50 | 32.5 | 8 | 4.91* | |
| [ | 1.27 | 48 | 1.63 | 8 | -1.26 | |
| [ | 0.15 | 23 | 0.04 | 7 | 2.04 | |
| [ | 0.13 | 25 | 0.03 | 8 | 2.01 | |
| See methods | 0.01 | 26 | 0.07 | 7 | -1.49 | |
| [ | 0.17 | 10 | 0.11 | 2 | 0.27 | |
| [ | 1556 | 254 | 1359 | 32 | 1.10 | |
| [ | 0.396 | 254 | 0.326 | 32 | 2.46 | |
| Analysis by [ | 0.34 | 0.50 | ||||
The table lists a selection of characteristics tested for the two sets of buffering and non-buffering yeast two-gene families, respectively. Also see Table 3 for description of the data. A small number of characteristics could also be tested for worm two-gene families, identified in published work [33]. Due to multiple hypothesis testing, a t-score > 3.26 should be considered significant at an adjusted P-value of 0.05 (Bonferroni); significant scores are marked with *. An E-value of '0' signifies an E-value that is smaller than 10^-360.
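The comparison of a characteristic between buffering and non-buffering genes can be illustrated with a two-sample t-score and a Bonferroni-adjusted significance level. This sketch assumes Welch's unequal-variance t-statistic, which the source does not confirm, and it applies the 3.26 threshold from the caption to the absolute value of t; the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    if len(sample_a) < 2 or len(sample_b) < 2:
        raise ValueError("each sample needs at least two values")
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    if se == 0:
        raise ValueError("both samples are constant, so t is undefined")
    return (mean(sample_a) - mean(sample_b)) / se


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def is_significant(t_score, threshold=3.26):
    """Apply the caption's t-score cutoff (assumed here to be two-sided)."""
    return abs(t_score) > threshold


# Hypothetical measurements of one characteristic in the two gene sets.
buffering = [2.0, 4.0, 6.0]
non_buffering = [1.0, 2.0, 3.0]
t = welch_t(buffering, non_buffering)
print(round(t, 3), is_significant(t))
```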
Examples of yeast buffering two-gene families (SSL double-KO phenotype)
| Formin, nucleates the formation of linear actin filaments, involved in cell processes such as budding and mitotic spindle orientation which require the formation of polarized actin cables, functionally redundant with BNI1 | Formin, nucleates the formation of linear actin filaments, involved in cell processes such as budding and mitotic spindle orientation which require the formation of polarized actin cables, functionally redundant with BNR1 | 1E-82 | 32 | ||
| One of two isozymes of HMG-CoA reductase that catalyzes the conversion of HMG-CoA to mevalonate, which is a rate-limiting step in sterol biosynthesis; localizes to the nuclear envelope; overproduction induces the formation of karmellae | One of two isozymes of HMG-CoA reductase that convert HMG-CoA to mevalonate, a rate-limiting step in sterol biosynthesis; overproduction induces assembly of peripheral ER membrane arrays and short nuclear-associated membrane stacks | 0 | 62 | ||
| Glycerol-3-phosphate acyltransferase located in both lipid particles and the ER; involved in the stepwise acylation of glycerol-3-phosphate and dihydroxyacetone, which are intermediate steps in lipid biosynthesis | Glycerol 3-phosphate/dihydroxyacetone phosphate dual substrate-specific sn-1 acyltransferase of the glycerolipid biosynthesis pathway, prefers 16-carbon fatty acids, similar to Gpt2p, gene is constitutively transcribed | 2E-118 | 36 | ||
| Guanosine diphosphatase located in the Golgi, involved in the transport of GDP-mannose into the Golgi lumen by converting GDP to GMP after mannose is transferred to its substrate | Apyrase with wide substrate specificity, involved in preventing the inhibition of glycosylation by hydrolyzing nucleoside tri- and diphosphates which are inhibitors of glycotransferases; partially redundant with Gda1p | 5E-28 | 27 | ||
| ER membrane protein involved in regulation of OLE1 transcription, acts with homolog Mga2p; inactive ER form dimerizes and one subunit is then activated by ubiquitin/proteasome-dependent processing followed by nuclear targeting | ER membrane protein involved in regulation of OLE1 transcription, acts with homolog Spt23p; inactive ER form dimerizes and one subunit is then activated by ubiquitin/proteasome-dependent processing followed by nuclear targeting | 1E-163 | 37 | ||
| Evolutionarily conserved protein with similarity to Orm2p, required for resistance to agents that induce the unfolded protein response; human ortholog is located in the endoplasmic reticulum | Evolutionarily conserved protein with similarity to Orm1p, required for resistance to agents that induce the unfolded protein response; human ortholog is located in the endoplasmic reticulum | 3E-68 | 72 | ||
| Beta subunit of the Sec61p ER translocation complex (Sec61p-Sss1p-Sbh1p); involved in protein translocation into the endoplasmic reticulum; interacts with the exocyst complex | Ssh1p-Sss1p-Sbh2p complex component, involved in protein translocation into the endoplasmic reticulum | 8E-19 | 55 | ||
| Ceramide synthase component, involved in synthesis of ceramide from C26(acyl)-coenzyme A and dihydrosphingosine or phytosphingosine, functionally equivalent to Lac1p | Ceramide synthase component, involved in synthesis of ceramide from C26(acyl)-coenzyme A and dihydrosphingosine or phytosphingosine, functionally equivalent to Lag1p | 6E-169 | 73 | ||
| Constituent of 66S pre-ribosomal particles, required for ribosomal large subunit maturation; functionally redundant with Ssf2p | Protein required for ribosomal large subunit maturation, functionally redundant with Ssf1p | 0 | 94 | ||
| Protein required for beta-1,6 glucan biosynthesis; putative beta-glucan synthase; appears functionally redundant with Skn1p | Protein involved in sphingolipid biosynthesis; type II membrane protein with similarity to Kre6p | 0 | 68 | ||
Two-gene families and their phenotypes in double-KOs are a good model for buffering by gene duplicates. We distinguish between 'buffering genes' (50 pairs), i.e. gene pairs that result in a synthetic sick or lethal (SSL) phenotype upon double-KO, and 'non-buffering genes' (eight pairs), i.e. gene pairs that result in a viable phenotype upon double gene-KO and are thus unlikely to buffer for each other in single gene-KO.
Tables 3 and 4 list the functions of a subset of buffering and all eight non-buffering gene pairs, respectively, with one pair per row. The ten buffering gene pairs in this table originate from the same large-scale screens as the eight non-buffering pairs in Table 4. The remaining 40 buffering gene pairs originate from small-scale screens and are listed in Additional file 2. The descriptions of functions are taken from SGD [66]. Buffering genes (this table) are more often described as having identical functions than non-buffering genes (Table 4).
Examples of yeast non-buffering two-gene families (viable phenotype in double-KO)
| Alpha-1,6-mannosyltransferase involved in cell wall mannan biosynthesis; subunit of a Golgi-localized complex that also contains Anp1p, Mnn9p, Mnn11p, and Mnn10p; identified as a suppressor of a cell lysis sensitive pkc1-371 allele | Mannosyltransferase of the cis-Golgi apparatus, initiates the polymannose outer chain elongation of N-linked oligosaccharides of glycoproteins | 2E-40 | 27 | ||
| Protein kinase that forms a complex with Mad1p and Bub3p that is crucial in the checkpoint mechanism required to prevent cell cycle progression into anaphase in the presence of spindle damage, associates with centromere DNA via Skp1p | Component of the spindle-assembly checkpoint complex, which delays the onset of anaphase in cells with defects in mitotic spindle assembly; interacts physically with the spindle checkpoint proteins Bub3p and Mad2p | 2E-50 | 35 | ||
| Histone methyltransferase, subunit of the COMPASS (Set1C) complex which methylates histone H3 on lysine 4; required in transcriptional silencing near telomeres and at the silent mating type loci; contains a SET domain | Histone methyltransferase with a role in transcriptional elongation, methylates a lysine residue of histone H3; associates with the C-terminal domain of Rpo21p; histone methylation activity is regulated by phosphorylation status of Rpo21p | 2E-16 | 30 | ||
| Protein involved in regulation of cell wall composition and integrity and response to osmotic stress; overproduction suppresses a lysis sensitive PKC mutation; similar to Lre1p, which functions antagonistically to protein kinase A | Protein involved in control of cell wall structure and stress response; inhibits Cbk1p protein kinase activity; overproduction confers resistance to cell-wall degrading enzymes | 5E-34 | 34 | ||
| Alpha-1,2-mannosidase involved in ER quality control; catalyzes the removal of one mannose residue from Man9GlcNAc to produce a single isomer of Man8GlcNAc in N-linked oligosaccharide biosynthesis; integral to ER membrane | Alpha mannosidase-like protein of the endoplasmic reticulum required for degradation of glycoproteins but not for processing of N-linked oligosaccharides | 9E-25 | 25 | ||
| Serine/threonine rich cell surface protein that contains an EF hand motif; involved in the regulation of cell wall beta-1,3 glucan synthesis and bud site selection; overexpression confers resistance to Hansenula mrakii killer toxin, HM-1 | Mucin family member at the head of the Cdc42p- and MAP kinase-dependent filamentous growth signaling pathway; also functions as an osmosensor in parallel to the Sho1p-mediated pathway; potential Cdc28p substrate | 6E-12 | 29 | ||
| DNA helicase involved in telomere formation and elongation; acts as a catalytic inhibitor of telomerase; also plays a role in repair and recombination of mitochondrial DNA | DNA helicase involved in rDNA replication and Ty1 transposition; relieves replication fork pauses at telomeric regions; structurally and functionally related to Pif1p | 5E-102 | 40 | ||
| DNA helicase and DNA-dependent ATPase involved in DNA repair, required for proper timing of commitment to meiotic recombination and the transition from Meiosis I to Meiosis II; potential Cdc28p substrate | Mitochondrial inner membrane localized ATP-dependent DNA helicase, required for the maintenance of the mitochondrial genome; not required for mitochondrial transcription; has homology to E. coli helicase uvrD | 2E-18 | 21 | ||
See Table 3 for description. Tables 3 and 4 list the functions of a subset of buffering and all eight non-buffering gene pairs, respectively, with one pair per row. The descriptions of functions are taken from SGD [66]. Buffering genes (Table 3) are more often described as having identical functions than non-buffering genes (this table).
Orthologs of yeast buffering and non-buffering two-gene families
| - essential | 11 | 0 |
| - non-essential | 13 | 3 |
| - all duplicates essential | 1 | 0 |
| - all duplicates non-essential | 6 | 0 |
| 24 | 6 | |
This table lists, for the buffering and non-buffering yeast two-gene families, respectively, the number of instances in which single or multiple orthologs were found in fly, worm or mouse, together with their KO phenotype if known. Also see Table 3 for description of the data. Orthologs are divided into single-gene orthologs (no additional homologs in the organism) and multi-gene orthologs (additional paralogs). Single- or multi-gene orthologs can be essential or non-essential in the other organism.