Jeanne Boyer, Gwenaël Badis, Cécile Fairhead, Emmanuel Talla, Florence Hantraye, Emmanuelle Fabre, Gilles Fischer, Christophe Hennequin, Romain Koszul, Ingrid Lafontaine, Odile Ozier-Kalogeropoulos, Miria Ricchetti, Guy-Franck Richard, Agnès Thierry, Bernard Dujon.
Abstract
We have screened the genome of Saccharomyces cerevisiae for fragments that confer a growth-retardation phenotype when overexpressed in a multicopy plasmid with a tetracycline-regulatable (Tet-off) promoter. We selected 714 such fragments with a mean size of 700 base-pairs out of around 84,000 clones tested. These include 493 in-frame open reading frame fragments corresponding to 454 distinct genes (of which 91 are of unknown function), and 162 out-of-frame, antisense and intergenic genomic fragments, representing the largest collection of toxic inserts published so far in yeast.
Year: 2004 PMID: 15345056 PMCID: PMC522879 DOI: 10.1186/gb-2004-5-9-r72
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Overexpression library construction and screening. (a) Construction of an HA-tagged vector. The pCMha190 vector used here was constructed by insertion of a linker (gray box) in place of the multiple cloning site in vector pCM190 [31]. Features shown include the promoter and TATA box as well as the terminator from the original plasmid (open boxes), and the start codon, HA-tag, BamHI site and stop codons (thick vertical bars) from the introduced linker sequence. The linker was composed from the following annealed oligonucleotides: EXP3: 5'-GATCGTTTAAACCATATGTACCCATACGACGTCCCAGACTACGCTGGATCCTGACTGACTGATC-3', EXP4: 5'-GGCCGATCAGTCAGTCAGGATCCAGCGTAGTCTGGGACGTCGTATGGGTACATATGGTTTAAAC-3'. (b) Library construction in pCMha190 (see Materials and methods for experimental details). The resulting ligation product is schematized, with the insert as a striped box and adaptors as hatched boxes. Sequences shown below are from junctions, with uppercase letters corresponding to vector (the extra nucleotide from filling-in is underlined), lowercase letters to adaptors and bold nnn's to insert. Arrows indicate the different primers used: SEQ8 and SEQ4 are used for PCR amplification of the insert, and SEQ1 for sequencing (see sequences in Additional data file 8). (c) First-round screening of toxic phenotypes. The growth of random and control clones on selective medium in uninduced and overexpression conditions is shown. Drops of serial dilutions (1/100 to 1/100,000) of cultures were grown for 45 h at 30°C. A3, non-toxic control clone transformed by pCMha190; H1, toxic control clone transformed by MCM1 gene cloned in pCMha190; G1, B2, D2, E3, library transformed clones, exhibiting different levels of toxicity in overexpression conditions (see Figure 2).
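The linker design in panel (a) can be checked with a few lines of code. The sketch below is only an illustration: it assumes that the spaces inside the printed oligonucleotide sequences are line-break artifacts (they are removed here), and the helper names `reverse_complement` and `translate_from_start` are hypothetical, not taken from the paper. It tests whether EXP3 and EXP4 are complementary apart from a 4-nucleotide 5' overhang on each strand, and translates EXP3 from its first ATG to the first stop codon.

```python
from itertools import product

# Oligonucleotides from the Figure 1(a) legend. Assumption: internal spaces
# in the printed sequences are line-break artifacts and have been removed.
EXP3 = "GATCGTTTAAACCATATGTACCCATACGACGTCCCAGACTACGCTGGATCCTGACTGACTGATC"
EXP4 = "GGCCGATCAGTCAGTCAGGATCCAGCGTAGTCTGGGACGTCGTATGGGTACATATGGTTTAAAC"

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate_from_start(seq):
    """Translate from the first ATG up to (not including) the first stop."""
    start = seq.find("ATG")
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

OVERHANG = 4  # assumed length of the single-stranded 5' end on each oligo
duplex_ok = EXP3[OVERHANG:] == reverse_complement(EXP4)[:-OVERHANG]
peptide = translate_from_start(EXP3)
has_bamhi = "GGATCC" in EXP3  # BamHI recognition site

print(duplex_ok, peptide, has_bamhi)
```

Under these assumptions the two strands pair over 60 bp, and the translated linker contains the HA epitope YPYDVPDYA followed by the two residues encoded by the BamHI site.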
Figure 2. Second-round scoring of toxic phenotypes and control. (a) Selected clones from the first round were diluted and three drops (1/100, 1/1,000 and 1/10,000) were spotted and grown for 42 h at 30°C, with controls on the same plates, for confirmation of toxicity. Growth levels in the presence and absence of doxycycline were scored as described in the text. Each clone was assigned a growth index in which the first number represents growth in uninduced conditions and the second number growth in induced conditions; for example, 3/3 indicates a non-toxic insert and 3/0 a highly toxic insert. Clone numbers are the same as in the tables describing the toxic inserts (see Additional data files 1, 2, 3 and 4). (b) After 5-FOA-induced plasmid loss, growth of surviving clones was scored in the same way as in (a). Wild-type phenotypes in overexpression conditions are indicative of plasmid-borne toxicity.
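The growth index lends itself to simple programmatic handling. A minimal sketch, assuming scores are integers from 0 (no growth) to 3 (wild-type growth) and that an insert is called toxic whenever growth in induced conditions is lower than in uninduced conditions; the function names and that rule are illustrative assumptions, not the authors' scoring procedure.

```python
def parse_growth_index(index):
    """Split an index such as '3/0' into (uninduced, induced) scores."""
    uninduced, induced = (int(part) for part in index.split("/"))
    return uninduced, induced

def is_toxic(index):
    """Assumed rule: toxic if induced growth is below uninduced growth."""
    uninduced, induced = parse_growth_index(index)
    return induced < uninduced

for example in ("3/3", "3/2", "3/0"):
    print(example, is_toxic(example))
```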
Distribution of the toxic inserts between the different genetic objects
| Genetic objects represented | Number of toxic inserts | Percentage of total | Mean size ± SD (nucleotides) (minimum-maximum) | Phenotypes 3/0, 3/1 | Phenotype 3/2 | Phenotypes 2/0, 2/1 | Phenotype 1/0 | Inserts encoding artificial peptides |
| In-frame ORF fragments | 493 | 68.7 | 743 ± 311 (220-2,120) | 375 | 87 | 23 | 8 | _ |
| Antiparallel ORF fragments | 68 | 9.6 | 532 ± 247 (140-1,220) | 37 | 11 | 12 | 8 | 53 |
| Out-of-frame ORF fragments | 53 | 7.5 | 733 ± 306 (170-1,620) | 12 | 11 | 22 | 8 | 12 |
| Intergenic regions | 41 | 6.0 | 625 ± 358 (170-1,820) | 13 | 4 | 16 | 8 | 27 |
| LTRs | 2 | 0.3 | 595 (320-1,120) | 1 | 0 | 0 | 1 | 1 |
| Ty elements | 15 (10) | 2.1 | 633 ± 265 (320-870) | 7 | 4 | 2 | 2 | _ |
| Y' elements | 9 (3) | 1.2 | 678 ± 370 (320-1,320) | 9 | 0 | 0 | 0 | 6 |
| RNA genes | 4 | 0.5 | 662 ± 246 (470-1,020) | 3 | 0 | 1 | 0 | 3 |
| 2 μm plasmid | 17 (10) | 2.4 | 564 ± 288 (170-1,220) | 13 | 3 | 1 | 0 | 5 |
| Mitochondrial DNA | 12 | 1.7 | 483 ± 201 (200-920) | 9 | 3 | 0 | 0 | 10 |
| Total | 714 | 100 | 703 ± 313 (140-2,120) | 479 | 123 | 77 | 35 | 117 |
The first column indicates the nature of the sequence in the toxic inserts. The second and third columns contain, respectively, the number of inserts of each type and the corresponding percentage. For Tys, Y' and 2 μm plasmid, numbers in brackets represent numbers of in-frame fragments of natural ORFs. The fourth column shows the mean insert size in nucleotides ± standard deviation (SD), with minimum and maximum sizes in brackets. The next four columns give the number of inserts scored for each type of phenotype. The last column shows the number of inserts in which artificial ORFs of more than 24 codons were detected.
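The internal consistency of this table can be verified with a short script. The sketch below uses counts transcribed by hand from the table above (so any transcription slip is ours); it checks that the four phenotype classes of each row add up to the number of toxic inserts for that row, and that the columns add up to the Total row.

```python
# (genetic object, number of toxic inserts, counts per phenotype class)
# Phenotype classes, in order: 3/0 or 3/1, 3/2, 2/0 or 2/1, 1/0.
ROWS = [
    ("In-frame ORF fragments", 493, (375, 87, 23, 8)),
    ("Antiparallel ORF fragments", 68, (37, 11, 12, 8)),
    ("Out-of-frame ORF fragments", 53, (12, 11, 22, 8)),
    ("Intergenic regions", 41, (13, 4, 16, 8)),
    ("LTRs", 2, (1, 0, 0, 1)),
    ("Ty elements", 15, (7, 4, 2, 2)),
    ("Y' elements", 9, (9, 0, 0, 0)),
    ("RNA genes", 4, (3, 0, 1, 0)),
    ("2 um plasmid", 17, (13, 3, 1, 0)),
    ("Mitochondrial DNA", 12, (9, 3, 0, 0)),
]
TOTAL_ROW = (714, (479, 123, 77, 35))  # values printed in the Total row

# Each row: phenotype classes should sum to the insert count.
rows_consistent = all(n == sum(classes) for _, n, classes in ROWS)

# Columns: sums over rows should match the Total row.
total_inserts = sum(n for _, n, _ in ROWS)
column_totals = tuple(sum(col) for col in zip(*(c for _, _, c in ROWS)))

print(rows_consistent, total_inserts, column_totals)
```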
Conserved domains found more than once among the toxic in-frame ORF fragments
| Domain reference | Domain name | Whole genome (S. cerevisiae) | Toxic inserts | Mean | 95% confidence interval | Result | Domain description |
| COG0471 | CitT | 4 | 4 | 0.21 | 0.17-1.25 | + | Di- and tricarboxylate transporter |
| pfam03169 | OPT | 3 | 3 | 0.16 | 0.11-1.17 | + | Oligopeptide transporter protein |
| COG1953 | FUI1 | 9 | 3 | 0.48 | 0.44-1.56 | + | Nucleotide transporter |
| pfam00324 | aa_permeases | 22 | 7 | 1.16 | 1.04-2.22 | + | Amino acid permease |
| pfam00153 | mito_carr | 97 | 24 | 5.13 | 5.07-6.45 | + | Mitochondrial carrier protein |
| COG0531 | PotE | 26 | 5 | 1.38 | 1.28-2.48 | + | Amino acid transporter |
| COG0474 | MgtA | 23 | 4 | 1.22 | 1.12-2.30 | + | Cation transport ATPase |
| cd00267 | ABC_ATPase | 58 | 6 | 3.07 | 2.93-4.22 | + | ABC transporter nucleotide-binding domain |
| pfam00664 | ABC_membrane | 14 | 2 | 0.74 | 0.68-1.82 | + | ABC transporter transmembrane region |
| COG0842 | COG0842 | 6 | 3 | 0.32 | 0.29-1.38 | + | ABC-type multidrug transport system, permease component |
| COG1131 | CcmA | 54 | 4 | 2.86 | 2.74-4.01 | NS | ABC-type multidrug transport system, ATPase component |
| pfam00083 | Sugar_tr | 58 | 5 | 3.07 | 2.94-4.23 | + | Sugar (and other) transporter |
| pfam00076 | rrm | 72 | 11 | 3.81 | 3.62-4.95 | + | RNA recognition motif (transcription) |
| COG5099 | (PUF) | 9 | 5 | 0.48 | 0.44-1.56 | + | Pumilio family RNA-binding repeat (translational repression) |
| smart00322 | KH | 11 | 4 | 0.58 | 0.54-1.66 | + | K homology: RNA-binding domain (transcription, RNA metabolism) |
| smart00356 | ZnF_C3H1 | 5 | 4 | 0.26 | 0.21-1.30 | + | Zinc finger, C3H1 type (transcription) |
| COG5048 | C2H2-type Zn_finger | 15 | 4 | 0.79 | 0.74-1.89 | + | Zn-finger (C2H2-type) (transcription) |
| COG0210 | UvrD | 4 | 2 | 0.21 | 0.17-1.24 | + | DNA and RNA helicases, superfamily I (DNA replication, recombination, repair) |
| cd00086 | Homeodomain | 9 | 2 | 0.48 | 0.45-1.57 | + | DNA binding domain (eukaryotic development) |
| pfam00249 | myb_DNA-binding | 13 | 2 | 0.69 | 0.66-1.80 | + | Myb-like DNA-binding domain (transcription) |
| pfam00170 | bZIP | 4 | 2 | 0.21 | 0.17-1.25 | + | Basic-leucine zipper DNA binding and dimerization domains (transcription) |
| smart00066 | GAL4 | 48 | 2 | 2.54 | 2.44-3.72 | NS | GAL4-like Zn(II)2Cys6 DNA-binding domain (fungal) (transcription) |
| pfam04082 | Fungal_trans | 26 | 2 | 1.38 | 1.29-2.48 | NS | Fungal specific transcription factor domain. |
| pfam00270 | DEAD | 48 | 3 | 2.54 | 2.38-3.63 | NS | DEAD/DEAH box helicase (replication, repair, transcription) |
| cd00079 | HELICc | 60 | 2 | 3.18 | 3.08-4.34 | - | Helicase superfamily, C-ter domain (replication, repair, transcription) |
| cd00200 | WD40 | 327 | 29 | 17.31 | 16.87-18.54 | + | Tandem repeats of about 40 residues interacting with peptides |
| pfam01602 | Adaptin_N | 9 | 2 | 0.48 | 0.43-1.54 | + | N-ter region of adaptor proteins (clathrin-coated pits and vesicles) |
| pfam00786 | PBD | 4 | 2 | 0.21 | 0.20-1.27 | + | P21-Rho-binding domain (or CRIB) |
| pfam00169 | PH | 11 | 3 | 0.58 | 0.55-1.67 | + | PH: pleckstrin homology; binds phosphoinositides or other ligands (signalling) |
| COG5271 | MDN1 | 16 | 3 | 0.85 | 0.78-1.93 | + | AAA: ATPase with von Willebrand factor type A domain (multiprotein complexes) |
| smart00268 | ACTIN | 14 | 2 | 0.74 | 0.67-1.82 | + | ACTIN, cytoskeleton/motor protein |
| COG5022 | Myosin heavy chain | 7 | 5 | 0.37 | 0.33-1.43 | + | ATPase, molecular motor |
| COG5043 | MRS6 | 4 | 2 | 0.21 | 0.17-1.24 | + | Vacuolar protein sorting-associated protein |
| KOG0446* | Dynamin | 3 | 3 | 0.16 | 0.13-1.20 | + | GTPase that mediates vesicle trafficking |
| pfam03901 | PMP | 5 | 2 | 0.21 | 0.21-1.29 | + | Mannosyltransferase |
| COG1928 | PMT1 | 7 | 4 | 0.37 | 0.30-1.40 | + | Mannosyltransferase |
| pfam00561 | Abhydrolase | 18 | 3 | 0.95 | 0.88-2.05 | + | Abhydrolase, alpha/beta hydrolase fold (catalytic domain) |
| pfam00107 | ADH_zinc_N | 21 | 2 | 1.11 | 1.01-2.19 | NS | Zinc-binding dehydrogenase |
| pfam00501 | AMP-binding | 11 | 2 | 0.58 | 0.51-1.64 | + | AMP-binding synthetase |
| pfam00674 | DUP | 35 | 3 | 1.85 | 1.81-3.03 | NS | DUP family (proteins of unknown functions) |
| COG5384 | Mpp10 | 1 | 2 | 0.05 | 0.03-1.07 | + | M phase phosphoprotein 10 (U3 small nucleolar ribonucleoprotein component) |
| COG5032 | TEL1 | 8 | 4 | 0.42 | 0.34-1.44 | + | PI kinase and protein kinases of the PI kinase family |
| COG1025 | Ptr | 5 | 2 | 0.26 | 0.22-1.31 | + | Zn-dependent peptidases (secreted/periplasmic, insulinase-like) |
| pfam02902 | Peptidase_C48 | 2 | 2 | 0.11 | 0.08-1.13 | + | Ulp1 protease family, C-terminal catalytic domain |
| pfam00004 | AAA | 43 | 3 | 2.28 | 2.15-3.39 | NS | AAA, ATPase family associated with various cellular activities (AAA) |
| smart00220 | S_TKc | 125 | 4 | 6.52 | 6.31-7.72 | - | Serine/threonine protein kinases, catalytic domain |
Peptide sequences of toxic natural ORF fragments were searched for domains (see text), and the frequency of domains found more than once was compared to the frequency in the whole proteome. References and names of domains are in the first two columns; occurrences in the whole genome (S. cerevisiae) and in the toxic inserts are in the third and fourth columns, respectively. The next three columns show the statistical analysis performed as follows: 1,000 random selections of 843 domains (total number of occurrences in the toxic inserts) were made from the set of 15,925 domains identified in S. cerevisiae (see Materials and methods); mean (column 5) represents the mean number of occurrences of each domain among the toxic inserts; the 95% confidence interval (column 6) was calculated using the SD of the 1,000 random drawings; column 7 shows the result of this analysis for each domain: NS, not significant; +, domain over-represented in toxic inserts; -, domain under-represented in toxic inserts. The last column gives a brief description of domains from NCBI Conserved Domain Database [65]. *KOG0446 was found using cdd.v1.63 of NCBI CD-Search [64].
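The resampling procedure described in this note can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the interval is taken as mean ± 1.96 SD of the random drawings (the note does not give the exact formula, and this choice may not reproduce the intervals printed in the table), drawings are made without replacement, and the function name `resample_domain` is hypothetical. The WD40 row (327 occurrences in the genome, 29 in the toxic inserts) serves as the example.

```python
import random
import statistics

TOTAL_DOMAINS = 15925  # domains identified in S. cerevisiae (from the note)
SAMPLE_SIZE = 843      # domain occurrences in the toxic inserts (from the note)
N_DRAWS = 1000         # number of random selections (from the note)

def resample_domain(genome_count, observed, seed=0):
    """Estimate the expected count of one domain in random selections.

    Domains are represented as indices 0..TOTAL_DOMAINS-1; indices below
    genome_count stand for the domain of interest.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(N_DRAWS):
        draw = rng.sample(range(TOTAL_DOMAINS), SAMPLE_SIZE)
        counts.append(sum(1 for index in draw if index < genome_count))
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    # Assumed interval: mean +/- 1.96 SD of the random drawings.
    low, high = mean - 1.96 * sd, mean + 1.96 * sd
    if observed > high:
        result = "+"   # over-represented
    elif observed < low:
        result = "-"   # under-represented
    else:
        result = "NS"  # not significant
    return mean, (low, high), result

mean, interval, result = resample_domain(genome_count=327, observed=29)
print(round(mean, 2), result)
```

The simulated mean should lie close to the analytical expectation 843 × 327 / 15,925 ≈ 17.31, the value listed for WD40 in the Mean column.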
Distribution of selected genes versus all S. cerevisiae genes
| | All | Percentage of total | Selected toxic genes | Percentage of total |
| | 670 | | 75 | |
| | 486 | | 66 | |
| Cell rescue, defense and virulence | 288 | 5.0 | 23 | 5.1 |
| Cellular communication/signal transduction mechanism | 59 | 1.0 | 6 | 1.3 |
| | 525 | | 67 | |
| Classification not yet clear-cut | 112 | 1.9 | 6 | 1.3 |
| Control of cellular organization | 207 | 3.6 | 22 | 4.8 |
| Energy | 244 | 4.2 | 12 | 2.6 |
| Metabolism | 1,061 | 18.3 | 88 | 19.4 |
| Protein fate (folding, modification, destination) | 593 | 10.2 | 47 | 10.4 |
| | 377 | | 17 | |
| | 197 | | 29 | |
| | 801 | | 88 | |
| | 321 | | 61 | |
| | 1,706 | | 91 | |
| Extracellular | 54 | 1.4 | 5 | 1.6 |
| Cell wall | 38 | 1.0 | 4 | 1.3 |
| Golgi | 103 | 2.6 | 8 | 2.5 |
| Transport vesicles | 54 | 1.4 | 3 | 0.9 |
| | 171 | | 34 | |
| | 1,367 | | 130 | |
| | 2,001 | | 137 | |
| Peroxisome | 42 | 1.1 | 3 | 0.9 |
| Endosome | 20 | 0.5 | 2 | 0.6 |
| | 154 | | 22 | |
| Vacuole | 82 | 2.1 | 8 | 2.5 |
| Endoplasmic reticulum | 353 | 9.0 | 27 | 8.5 |
| Mitochondria | 562 | 14.3 | 37 | 11.6 |
| | 939 | | 96 | |
| Essential or not | 160 | 2.8 | 20 | 4.4 |
| | 3,717 | | 336 | |
| | 1,674 | | 106 | |
| | 412 | | 10 | |
The distribution of genes was examined with respect to four classifications: function, cellular localization of the gene product, viability and phylogeny. Data are from MIPS [38] and Génolevures [37]. Cellular localization was known for 3,928 out of the 5,803 proteins in the entire genome and for 319 proteins out of the 454 that yield toxic inserts. For other comparisons, the set of 454 selected genes was compared to the set of 5,803 genes of S. cerevisiae. Note that a given gene may be present in more than one MIPS class. Significant evidence that a given gene class is over- or under-represented among toxic genes as compared to all S. cerevisiae genes is emphasized by bold characters. *p < 0.005; †p < 0.025.
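The percentages in this table follow from the denominators given in the note. A minimal sketch, assuming that localization rows are divided by 3,928 (all genes) and 319 (toxic genes) and all other rows by 5,803 and 454; the function name `class_percentages` is illustrative only.

```python
ALL_GENES = 5803        # genes of S. cerevisiae (from the note)
TOXIC_GENES = 454       # genes yielding toxic inserts (from the note)
ALL_LOCALIZED = 3928    # proteins with known localization, whole genome
TOXIC_LOCALIZED = 319   # proteins with known localization, toxic set

def class_percentages(all_count, toxic_count, localization=False):
    """Return (percentage among all genes, percentage among toxic genes)."""
    all_total = ALL_LOCALIZED if localization else ALL_GENES
    toxic_total = TOXIC_LOCALIZED if localization else TOXIC_GENES
    return (round(100 * all_count / all_total, 1),
            round(100 * toxic_count / toxic_total, 1))

# Rows transcribed from the table above.
metabolism = class_percentages(1061, 88)
mitochondria = class_percentages(562, 37, localization=True)
print(metabolism, mitochondria)
```

For the Metabolism and Mitochondria rows this reproduces the percentages printed in the table.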
Toxicity of fragments versus whole ORF products
| ORF/Gene name | Gene description | Phenotype of gene deletion | Conserved domain or TMS in entire protein | Phenotype of gene overexpression | Conserved domain or TMS in insert | Phenotype of insert overexpression |
| YDL112w/TRM3* | tRNA 2'-O-ribose methyltransferase | Viable | SpoU_methylase | 3/3 | - | 3/1 |
| YML128C/MSC1/GIN3*† | Weak similarity to | Viable | 1 TMS | 3/3 | - | 3/0 |
| YGR149w/_* ‡§ | Similar to | Viable | 5 TMS | 3/2 to 3/3 | 3 TMS | 3/2 |
| YGL023c/PIB2* § | Phosphatidylinositol 3-phosphate binding | Viable | FYVE | 3/1 | FYVE | 3/0 |
| YPL043w/NOP4¶ | Nucleolar protein, RNA processing | Lethal | RRM (4 motifs) | 3/0 | Bias D, E, K | 3/0 |
| YOR166c/_ * § | Similarity to hypothetical | Viable | PINc (nucleotide binding) | 3/0 | PINc | 3/0 |
| YJL212c/OPT1¶¥ | Oligopeptide transporter | Viable | OPT | 3/1 | 2 TMS, OPT | 3/1 |
| YNL003c/PET8¥¤ | Mitochondrial carrier | Viable | mito_carrier | 3/2 | mito_carrier | 3/2 |
| YJL092w/HPR5¥# | DNA helicase involved in DNA repair | Viable | UvrD | 2/0 | UvrD (central) | 3/2 |
| YMR190c/SGS1¥¤ | DNA helicase of DEAD/DEAH family | Viable | DEAD, HELICc, HRDC | 3/0 | DEAD | 3/2 |
| YNL033W/_§ | Strong similarity to YNL019c | Viable | 2 TMS | 3/1 | 1 TMS | 3/2 |
| YHR067w/_* § | Weak similarity to | Viable | MaoC: Acyldehydratase | 3/1 | MaoC | 3/2 |
| YGL263w/COS12§¥¤ | Similarity to subtelomeric encoded proteins | Viable | DUP | 3/0 | DUP | 3/1 |
Systematic nomenclature and gene name, where applicable, are given in the first column. *Singleton: the gene has no paralog in S. cerevisiae. †Gene fragment and #entire gene, respectively, were already known as toxic upon overexpression. ‡Putative uncharacterized transporter (see [35]). §Gene of unknown classification. ¶Two non-overlapping inserts of the ORF were selected. ¥One or several paralogs of this gene have also been selected as toxic inserts in this work (see Additional data file 3). ¤Gene having a paralog in S. cerevisiae already known as toxic upon overexpression. Columns 2 and 3 contain, respectively, a brief description of the function of the gene product and the phenotype of the disruption mutant (MIPS [38]). The results of a search for conserved domains are shown in columns 4 (in the whole protein) and 6 (in inserts). Phenotypes in uninduced and overexpression conditions of the entire gene and of fragments are given in columns 5 and 7, respectively (see Figure 3 for illustrations of the phenotypes).
Figure 3. Toxic phenotypes of overexpressed fragments versus whole ORF products. Complete ORFs are cloned in pCMha191 (tryptophan marker); inserts are cloned in pCMha190 (uracil marker). Eleven out of the 13 cases are represented in this figure. + doxycycline, uninduced conditions; - doxycycline, overexpressed conditions.
Figure 4. Positions of selected toxic fragments relative to the structure of genes of the PI kinase family. Names of the selected genes and protein lengths (in amino acids) are indicated. Coordinates of the toxic fragments selected in this work and of known toxic domains (see text) are also given. Conserved domains in the proteins have been positioned using the NCBI CD-Search program [64] (see Materials and methods). Domain abbreviations: FAT (pfam00259) is named after FRAP, ATM and TRRAP, which are human homologs of yeast TOR, TEL1 and TRA1, respectively; PI3Kc (smart00146) is the PI kinase catalytic domain; FATC (pfam02260.11) is named after FRAP, ATM, TRRAP carboxy-terminal region. Complete COG5032 TEL1 (2,105 residues) spans the carboxy-terminal regions of the four proteins. The drawing is not to scale.