
A public genome-scale lentiviral expression library of human ORFs.

Xiaoping Yang, Jesse S Boehm, Xinping Yang, Kourosh Salehi-Ashtiani, Tong Hao, Yun Shen, Rakela Lubonja, Sapana R Thomas, Ozan Alkan, Tashfeen Bhimdi, Thomas M Green, Cory M Johannessen, Serena J Silver, Cindy Nguyen, Ryan R Murray, Haley Hieronymus, Dawit Balcha, Changyu Fan, Chenwei Lin, Lila Ghamsari, Marc Vidal, William C Hahn, David E Hill, David E Root.

Abstract

Functional characterization of the human genome requires tools for systematically modulating gene expression in both loss-of-function and gain-of-function experiments. We describe the production of a sequence-confirmed, clonal collection of over 16,100 human open-reading frames (ORFs) encoded in a versatile Gateway vector system. Using this ORFeome resource, we created a genome-scale expression collection in a lentiviral vector, thereby enabling both targeted experiments and high-throughput screens in diverse cell types.

Year:  2011        PMID: 21706014      PMCID: PMC3234135          DOI: 10.1038/nmeth.1638

Source DB:  PubMed          Journal:  Nat Methods        ISSN: 1548-7091            Impact factor:   28.547


While recent technological advances provide the means to efficiently scan the human genome to identify genes associated with diseases[1-2], the subsequent functional characterization of these genes is a bottleneck in translating these discoveries into mechanistic insights and ultimately into therapeutics. Genome-scale RNA interference reagents have recently been created to enable systematic loss-of-function mammalian genomics[3]. To perform complementary gain-of-function gene studies, comparable libraries of arrayed cDNAs or open-reading frames (ORFs) are required along with efficient methods to employ these reagents in cell-based assays. We[4-5] and others[6-9] have previously reported the construction of genome-scale ORFeome collections. These collections are useful templates for subcloning[9] or recombinational transfer[8] between vectors and protein production[8], and they enable certain applications including interaction mapping[10], but there are limits to the direct applications of these collections. They vary dramatically in terms of gene representation, format, and functionality, as well as quality measures such as the extent of clonality, sequence annotation, and experimental validation (see Supplementary Table 1). Tellingly, gain-of-function screening now lags behind the use of RNAi. Here, we report the creation and characterization of two publicly available genome-scale human ORFeome collections: the human ORFeome version 8.1 Entry Clone Collection (hORFeome V8.1) and the CCSB-Broad Lentiviral Expression Library. 
Together, these collections are: (i) extensive, comprising 16,172 distinct ORFs mapping to 13,833 genes; (ii) clonal and sequenced, as each ORF plasmid is derived from a single bacterial colony and nearly all clones are fully sequenced; (iii) versatile, due to the use of Gateway recombinational cloning[11-12]; (iv) enabling of cell-based functional screens, as the Expression Library encodes these clones in a lentiviral expression vector that produces consistent titers and gene expression levels and permits delivery to most cell types; and (v) available via the ORFeome Collaboration (Supplementary Note 1).

We assembled these collections in four phases. First, we expanded our previous collections to 19,281 ORFs in polyclonal format, largely using existing protocols[4-5]; second, we derived clonal plasmid isolates from single bacterial colonies; third, we sequenced these clonal isolates and used the sequence data to choose clones for inclusion in hORFeome V8.1; and fourth, we transferred clones to a lentiviral expression vector to create the CCSB-Broad Lentiviral Expression Library.

We expanded our library by transferring recently available ORFs from Mammalian Gene Collection (MGC)[9] cDNAs into the Gateway system using directed PCR[4-5] to create Entry vector clones while removing stop codons (Fig. 1a, top). To maximize throughput during this initial phase, clones were represented as a non-clonal pool of bacteria derived from recombinational cloning of PCR products. We next resolved this polyclonal library into clonal isolates using a robust and efficient workflow (Fig. 1a, middle) in which we isolated two colonies per ORF bacterial stock (see online Methods, Supplementary Note 2), from which we prepared ORF templates for sequencing.
Figure 1

Overview of hORFeome V8.1

(a) Schematic of hORFeome V8.1 creation. Templates from the Mammalian Gene Collection (MGC) were transferred into the Gateway system via PCR and recombinational cloning, resolved as clonal isolates, fully sequenced and rearrayed. (b) Sequencing outcomes for 19,281 ORF samples in polyclonal format, from which single colonies were isolated. 14,524 ORFs were fully sequenced and accepted into the final collection (Complete, accepted). 198 ORFs were fully sequenced but rejected for lacking a start codon (Complete, rejected). 825 ORFs were partially sequenced, including undetermined nucleotides (Partial). 823 ORFs were made clonal but were intentionally not sequenced, since these ORFs were isoforms of other ORFs in the sequencing pool and could cause ambiguous read mapping (Not attempted). See Supplementary Figure 3 for more details. (c) Alignment of the 14,524 completely sequenced clones with MGC templates. 12,736 clones are identical in sequence to their templates or have one synonymous error only (Perfect), and another 1,788 clones have additional mutations (Mutant). (d) Alignment of the 14,524 completely sequenced clones with NCBI RefSeq transcripts. 10,216 ORFs represent full-length coding sequences with > 99% homology (Full), 1,545 ORFs are partial-length coding sequences with > 85% homology (Partial) and 2,763 clones fall into other categories (Other). See Supplementary Figure 4 for more details.
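The clone counts in panels (b) to (d) can be cross-checked against each other and against the 16,172-ORF total given in the text. The short Python sketch below does only that arithmetic; the dictionary keys and function names are illustrative choices, not terms from the paper.

```python
# Clone counts copied from the Figure 1 caption and the main text.
sequencing_outcomes = {
    "complete_accepted": 14_524,
    "complete_rejected": 198,   # fully sequenced but lacking a start codon
    "partial": 825,
    "not_attempted": 823,
}
mgc_alignment = {"perfect": 12_736, "mutant": 1_788}                    # panel (c)
refseq_alignment = {"full": 10_216, "partial": 1_545, "other": 2_763}   # panel (d)
COLLECTION_TOTAL = 16_172  # distinct ORFs in hORFeome V8.1 (main text)


def collection_size(outcomes):
    """Clones kept in the collection: every outcome except the rejected ones."""
    return sum(n for name, n in outcomes.items() if name != "complete_rejected")


def check_counts():
    """Return a dict of named consistency checks (True means the counts agree)."""
    accepted = sequencing_outcomes["complete_accepted"]
    return {
        "mgc_panel_sums_to_accepted": sum(mgc_alignment.values()) == accepted,
        "refseq_panel_sums_to_accepted": sum(refseq_alignment.values()) == accepted,
        "collection_total_matches": collection_size(sequencing_outcomes) == COLLECTION_TOTAL,
    }


if __name__ == "__main__":
    for name, ok in check_counts().items():
        print(f"{name}: {ok}")
    fraction = sequencing_outcomes["complete_accepted"] / COLLECTION_TOTAL
    print(f"fully sequenced fraction of the collection: {fraction:.1%}")
```

All three checks pass with the published numbers: the accepted, partially sequenced and unsequenced clones add up to the full collection, and panels (c) and (d) each partition the 14,524 accepted clones.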

We developed an optimized process that leverages next-generation sequencing and efficient alignment algorithms[13] to sequence large numbers of ORF clones at high coverage (Fig. 1a, bottom). Clonal isolates for each ORF were pooled. Using Illumina sequencing technology, we compared the efficiencies of sequencing full vectors versus purified ORF inserts only (Supplementary Fig. 1a). While purified ORF inserts yielded higher median sequence coverage, the added clone manipulation led to substantially greater coverage variability (Supplementary Fig. 1b-g, Supplementary Table 2), so we proceeded to sequence pools of full entry-clone plasmids. For some clone pools, we employed an alternative approach in which we PCR-amplified ORF sequences from individual bacterial colonies and sequenced the amplicons in a previously reported multiplexed, pooled format[14] using 454 technology. Both methods were effective at sequencing ORF clones (Supplementary Fig. 2), but our protocol based on Illumina technology gave higher yields at lower cost per attempted clone (data not shown), and was therefore used to sequence the majority (84%) of the collection.

To assemble ORF sequences, reads from each clone pool were aligned to MGC reference sequences. Adequate reads were obtained to produce full ORF alignments for > 27,000 clonal isolates from 14,722 polyclonal ORFs, at the fold-coverage required for accurate base-calling (Supplementary Fig. 3). ORF sequences were annotated for mismatches, insertions, and deletions (Supplementary Tables 3,4). To evaluate the sequence accuracy of multiplexed Illumina and 454 sequencing combined with our automated alignment algorithms, we re-sequenced > 121,000 nucleotides from 287 ORFs by the Sanger method, and found a confirmation rate of > 99.99% of nucleotides. For each original ORF stock, the clonal isolate that most closely matched the MGC reference sequence was selected for inclusion in the hORFeome V8.1 collection.
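The selection step described above can be illustrated with a minimal Python sketch. The text does not say how "most closely matched" was scored, so the rule used here (fewest total mismatches, insertions and deletions among isolates with a full alignment) is an assumption, and the `Isolate` record and `pick_best_isolate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Isolate:
    """Hypothetical per-isolate summary of an alignment to the MGC reference."""
    isolate_id: str
    mismatches: int
    insertions: int
    deletions: int
    fully_covered: bool  # True if reads span the whole reference ORF

    @property
    def differences(self) -> int:
        # Assumed score: all annotated difference types count equally.
        return self.mismatches + self.insertions + self.deletions


def pick_best_isolate(isolates: Sequence[Isolate]) -> Optional[Isolate]:
    """Return the fully covered isolate with the fewest differences, or None."""
    candidates = [iso for iso in isolates if iso.fully_covered]
    if not candidates:
        return None
    # min() keeps the first isolate on ties, so input order breaks ties.
    return min(candidates, key=lambda iso: iso.differences)


if __name__ == "__main__":
    stock = [
        Isolate("colony_A", mismatches=2, insertions=0, deletions=0, fully_covered=True),
        Isolate("colony_B", mismatches=0, insertions=0, deletions=0, fully_covered=True),
    ]
    print(pick_best_isolate(stock).isolate_id)
```

A real pipeline would probably also weigh the type of each difference, for example treating a synonymous change as less severe than a frameshift; this sketch does not attempt that.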
198 clones with missing start codons were omitted. Of 14,524 retained sequenced ORFs (Fig. 1b), 82% (12,736) were either sequence-identical to the MGC reference or had one synonymous error, and comprise the majority of the hORFeome V8.1 collection (Fig. 1c, Supplementary Fig. 3). Another component of the V8.1 collection, denoted the hORFeome V8.1 Mutant Subcollection, consists of 1,788 ORFs that had more than one synonymous mutation, any non-synonymous mutation, or other errors; these were retained since the plasmids may prove useful in some applications. We supplemented the fully sequenced set of ORFs with 825 clones comprising the hORFeome V8.1 Partially Sequenced Subcollection, including 597 clones from our recently described subcollection of kinases and kinase-related ORFs (clonal isolates, end-read Sanger sequencing in 2 directions)[15] (see Supplementary Note 3) and 228 clones that were sequenced using next-generation technology over only part of the intended MGC ORF sequences. Finally, we denote the 823 clonal versions of isoforms that were removed prior to pooled sequencing as the hORFeome V8.1 Unsequenced Subcollection. Overall, hORFeome V8.1 includes 16,172 clonal ORFs, mapping to 13,833 human genes, of which 14,524 clones (90%) for 12,940 genes (94%) are fully sequenced (Supplementary Fig. 3, Supplementary Tables 3,4).

We next determined which currently annotated human transcripts are represented in hORFeome V8.1. The 14,524 fully sequenced library clones were mapped to National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) coding transcripts, and we found that 10,216 ORFs map with > 99% homology and constitute full-length coding sequences (Fig. 1d, Supplementary Fig. 4). 1,545 additional ORFs represent partial coding sequences. The remaining 2,763 sequenced ORFs map to non-coding transcripts or to RefSeq transcripts with lower homology, or are not currently found in RefSeq.
Since the original MGC cDNA source templates for these clones were derived from expressed cellular transcripts, some of these non-full-length clones may represent un- or mis-annotated but physiologically relevant transcripts. Indeed, incomplete knowledge of the transcriptome is a major challenge to obtaining a comprehensive ORF resource (Supplementary Note 4).

hORFeome V8.1 enables many applications as it permits rapid ORF shuttling into any Gateway-compatible expression vector. To enable large-scale screening of this collection in mammalian cells, we developed, optimized and validated a series of Gateway-compatible mammalian expression vectors (pLX series, Supplementary Fig. 5) encoding numerous desirable elements (see online Methods). We elected to shuttle the entire hORFeome V8.1 collection into the pLX304-Blast-V5 vector to create the CCSB-Broad Lentiviral Expression Library (Fig. 2a).
Figure 2

Overview and performance of the CCSB-Broad Lentiviral Expression Library

(a) Schematic of the creation of the CCSB-Broad Lentiviral Expression Library. pLX304-Blast-V5 is a custom lentiviral vector validated for high-throughput screening, encoding Blasticidin (Blast) resistance and a C-terminal V5-epitope tag. (b) Evaluation of high-throughput ORF transduction and expression in the A549 lung cancer cell line. The micrographs show images of cells stained for cellular DNA (to assess cell number) and with antibodies recognizing the V5 epitope (to assess ORF expression) after lentiviral infection and three days of growth in blasticidin. Wells to which no virus was added are highlighted with a yellow outline. (c) Distribution of ORF sizes and average viral titer as a function of ORF size. (d) ORF expression as a function of ORF size. ORFs larger than 3 kb showed a decreased yet detectable above-background level of expression. Background was assessed from cells expressing a control vector without V5 expression.
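Panel (d) compares V5 signal with background from control-vector cells, and the text counts an ORF as expressed when its signal exceeds the control mean by more than 2 standard deviations. A minimal Python sketch of that criterion follows. The function names and example values are hypothetical, and using the sample standard deviation (rather than the population one) is an assumption, since the text does not specify.

```python
from statistics import mean, stdev


def expression_threshold(control_signals, n_sd=2.0):
    """Cutoff = control mean + n_sd sample standard deviations."""
    return mean(control_signals) + n_sd * stdev(control_signals)


def fraction_expressed(orf_signals, control_signals, n_sd=2.0):
    """Fraction of ORF wells whose signal is strictly above the cutoff."""
    cutoff = expression_threshold(control_signals, n_sd)
    expressed = [s for s in orf_signals if s > cutoff]
    return len(expressed) / len(orf_signals)


if __name__ == "__main__":
    # Made-up V5 intensities in arbitrary units, for illustration only.
    controls = [10.0, 12.0, 8.0, 10.0]
    orfs = [50.0, 14.0, 13.0, 9.0]
    print(f"cutoff: {expression_threshold(controls):.2f}")
    print(f"fraction expressed: {fraction_expressed(orfs, controls):.2f}")
```

With only a handful of control wells the standard deviation estimate is noisy, so the cutoff moves noticeably depending on which controls are included.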

We conducted a pilot experiment on 509 ORF clones to assess: (i) protocols to transfer the entry library, (ii) high-throughput production of DNA and virus, and (iii) ORF expression in A549 cells (Fig. 2b-d, Supplementary Figs. 6-9). Plasmid DNA production and viral packaging were achieved in 96-well format with consistent DNA yields and titers averaging 2.1 × 10^6 infectious units (IU)/ml (Fig. 2c, Supplementary Fig. 7a). Titers were preserved across all ORF sizes (Fig. 2c, Supplementary Fig. 8a). We assessed ORF expression via quantification of V5-epitope tag expression and observed that approximately 90% of ORF lentiviruses induced expression signals greater than 2 standard deviations above the control mean (Fig. 2b, d, Supplementary Figs. 7b, 8b, 9).

Using the optimized protocols, we then produced the CCSB-Broad Lentiviral Expression Library in the pLX304-Blast-V5 vector, successfully isolating a single bacterial colony from 98.5% of reactions (15,935 total clones). To estimate the accuracy of the final collection of expression vectors, we performed end-read sequencing of 325 colonies and confirmed 98.2% accurate transfers (see online Methods).

The utility of this resource for systematic functional genomic screens in mammalian cells is illustrated by recent results from a screen of a pilot subset of this collection (597 genes), which identified novel mediators of resistance to RAF inhibition in melanoma[15]. Additional pilot experiments confirm that this resource enables other readouts including immunofluorescence (Supplementary Fig. 10).

In summary, we report here the construction of the most fully sequenced, flexible and annotated version of the human ORFeome to date. The entire collection, comprising both source (entry) clones and lentivirus vector expression clones, is available without restriction through the ORFeome Collaboration (Supplementary Note 1).
We anticipate that these collections will greatly facilitate the systematic functional assessment of human genes that mediate cellular phenotypes.

Supplementary Figure 1. Pilot experiments to optimize pooling strategy for next-generation sequencing of ORF clones.
Supplementary Figure 2. Coverage histograms of sequencing hORFeome V8.1.
Supplementary Figure 3. Flowchart of hORFeome V8.1 creation.
Supplementary Figure 4. Alignment results of 14,524 completely sequenced clones with current NCBI RefSeq transcripts.
Supplementary Figure 5. Plasmid maps of pLX lentiviral expression vectors created as part of this study.
Supplementary Figure 6. Confirmation of viral preparations.
Supplementary Figure 7. Determination of virus titer and ORF expression.
Supplementary Figure 8. Virus titer and ORF expression are maintained across a wide range of ORF lengths.
Supplementary Figure 9. Western blot showing expressed ORFs.
Supplementary Figure 10. Viral preparations enable immunofluorescence high-throughput screens.
Supplementary Table 1a. Clonal and sequenced ORF Gateway entry clone collections.
Supplementary Table 1b. Comparison of Nomura and CCSB-Broad ORF collections.
Supplementary Table 2. Illumina sequencing pilot data.
Supplementary Table 3. Overview of next-generation sequencing results.
Supplementary Table 4. Annotated list of hORFeome V8.1 and CCSB-Broad Lentiviral Expression Library.
Supplementary Note 1. Availability of clones and distribution procedures.
Supplementary Note 2. Pilot experiments to determine number of colonies to isolate per polyclonal ORF.
Supplementary Note 3. Supplementing hORFeome V8.1 with kinase ORFs.
Supplementary Note 4. Challenges of completing the human ORFeome.
Supplementary Note 5. Detailed high-throughput protocol of single colony isolation.
Supplementary Note 6. Details of pooled sequencing protocol optimization experiments.
Supplementary Note 7. Computing virus titers.
Supplementary Note 8. Li-COR in-cell Western and immunoblotting.
  19 in total

1.  Woodchuck hepatitis virus posttranscriptional regulatory element enhances expression of transgenes delivered by retroviral vectors.

Authors:  R Zufferey; J E Donello; D Trono; T J Hope
Journal:  J Virol       Date:  1999-04       Impact factor: 5.103

2.  GATEWAY recombinational cloning: application to the cloning of large numbers of open reading frames or ORFeomes.

Authors:  A J Walhout; G F Temple; M A Brasch; J L Hartley; M A Lorson; S van den Heuvel; M Vidal
Journal:  Methods Enzymol       Date:  2000       Impact factor: 1.600

3.  Towards a proteome-scale map of the human protein-protein interaction network.

Authors:  Jean-François Rual; Kavitha Venkatesan; Tong Hao; Tomoko Hirozane-Kishikawa; Amélie Dricot; Ning Li; Gabriel F Berriz; Francis D Gibbons; Matija Dreze; Nono Ayivi-Guedehoussou; Niels Klitgord; Christophe Simon; Mike Boxem; Stuart Milstein; Jennifer Rosenberg; Debra S Goldberg; Lan V Zhang; Sharyl L Wong; Giovanni Franklin; Siming Li; Joanna S Albala; Janghoo Lim; Carlene Fraughton; Estelle Llamosas; Sebiha Cevik; Camille Bex; Philippe Lamesch; Robert S Sikorski; Jean Vandenhaute; Huda Y Zoghbi; Alex Smolyar; Stephanie Bosak; Reynaldo Sequerra; Lynn Doucette-Stamm; Michael E Cusick; David E Hill; Frederick P Roth; Marc Vidal
Journal:  Nature       Date:  2005-09-28       Impact factor: 49.962

4.  OSP: a computer program for choosing PCR and DNA sequencing primers.

Authors:  L Hillier; P Green
Journal:  PCR Methods Appl       Date:  1991-11

5.  A lentiviral RNAi library for human and mouse genes applied to an arrayed viral high-content screen.

Authors:  Jason Moffat; Dorre A Grueneberg; Xiaoping Yang; So Young Kim; Angela M Kloepfer; Gregory Hinkle; Bruno Piqani; Thomas M Eisenhaure; Biao Luo; Jennifer K Grenier; Anne E Carpenter; Shi Yin Foo; Sheila A Stewart; Brent R Stockwell; Nir Hacohen; William C Hahn; Eric S Lander; David M Sabatini; David E Root
Journal:  Cell       Date:  2006-03-24       Impact factor: 41.582

6.  Identification of an epitope on the P and V proteins of simian virus 5 that distinguishes between two isolates with different biological characteristics.

Authors:  J A Southern; D F Young; F Heaney; W K Baumgärtner; R E Randall
Journal:  J Gen Virol       Date:  1991-07       Impact factor: 3.891

7.  Human ORFeome version 1.1: a platform for reverse proteomics.

Authors:  Jean-François Rual; Tomoko Hirozane-Kishikawa; Tong Hao; Nicolas Bertin; Siming Li; Amélie Dricot; Ning Li; Jennifer Rosenberg; Philippe Lamesch; Pierre-Olivier Vidalain; Tracey R Clingingsmith; James L Hartley; Dominic Esposito; David Cheo; Troy Moore; Blake Simmons; Reynaldo Sequerra; Stephanie Bosak; Lynn Doucette-Stamm; Christian Le Peuch; Jean Vandenhaute; Michael E Cusick; Joanna S Albala; David E Hill; Marc Vidal
Journal:  Genome Res       Date:  2004-10       Impact factor: 9.043

8.  COT drives resistance to RAF inhibition through MAP kinase pathway reactivation.

Authors:  Cory M Johannessen; Jesse S Boehm; So Young Kim; Sapana R Thomas; Leslie Wardwell; Laura A Johnson; Caroline M Emery; Nicolas Stransky; Alexandria P Cogdill; Jordi Barretina; Giordano Caponigro; Haley Hieronymus; Ryan R Murray; Kourosh Salehi-Ashtiani; David E Hill; Marc Vidal; Jean J Zhao; Xiaoping Yang; Ozan Alkan; Sungjoon Kim; Jennifer L Harris; Christopher J Wilson; Vic E Myer; Peter M Finan; David E Root; Thomas M Roberts; Todd Golub; Keith T Flaherty; Reinhard Dummer; Barbara L Weber; William R Sellers; Robert Schlegel; Jennifer A Wargo; William C Hahn; Levi A Garraway
Journal:  Nature       Date:  2010-11-24       Impact factor: 49.962

9.  hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes.

Authors:  Philippe Lamesch; Ning Li; Stuart Milstein; Changyu Fan; Tong Hao; Gabor Szabo; Zhenjun Hu; Kavitha Venkatesan; Graeme Bethel; Paul Martin; Jane Rogers; Stephanie Lawlor; Stuart McLaren; Amélie Dricot; Heather Borick; Michael E Cusick; Jean Vandenhaute; Ian Dunham; David E Hill; Marc Vidal
Journal:  Genomics       Date:  2007-01-05       Impact factor: 5.736

10.  The full-ORF clone resource of the German cDNA Consortium.

Authors:  Stephanie Bechtel; Heiko Rosenfelder; Anny Duda; Christian Peter Schmidt; Ute Ernst; Ruth Wellenreuther; Alexander Mehrle; Claudia Schuster; Andre Bahr; Helmut Blöcker; Dagmar Heubner; Andreas Hoerlein; Guenter Michel; Holger Wedler; Karl Köhrer; Birgit Ottenwälder; Annemarie Poustka; Stefan Wiemann; Ingo Schupp
Journal:  BMC Genomics       Date:  2007-10-31       Impact factor: 3.969
