Isaac Shamie, Sascha H Duttke, Karen J la Cour Karottki, Claudia Z Han, Anders H Hansen, Hooman Hefzi, Kai Xiong, Shangzhong Li, Samuel J Roth, Jenhan Tao, Gyun Min Lee, Christopher K Glass, Helene Faustrup Kildegaard, Christopher Benner, Nathan E Lewis.
Abstract
Chinese hamster ovary (CHO) cells are widely used for producing biopharmaceuticals, and engineering gene expression in CHO is key to improving drug quality and affordability. However, engineering gene expression or activating silent genes requires accurate annotation of the underlying regulatory elements and transcription start sites (TSSs). Unfortunately, most TSSs in the published Chinese hamster genome sequence were computationally predicted and are frequently inaccurate. Here, we use nascent transcription start site sequencing methods to revise TSS annotations for 15 308 Chinese hamster genes and 3034 non-coding RNAs based on experimental data from CHO-K1 cells and 10 hamster tissues. We further capture tens of thousands of putative transcribed enhancer regions with this method. Our revised TSSs improve upon the RefSeq annotation by revealing core sequence features of gene regulation such as the TATA box and the Initiator and, as exemplified by targeting the glycosyltransferase gene Mgat3, facilitate activating silent genes by CRISPRa. Together, we envision our revised annotation and data will provide a rich resource for the CHO community, improve genome engineering efforts and aid comparative and evolutionary studies.
Year: 2021 PMID: 34268494 PMCID: PMC8276764 DOI: 10.1093/nargab/lqab061
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
Figure 2. An experimental realignment of TSS annotation for the Chinese hamster uncovers expected genomic elements. A comparison of our TSSs to Chinese hamster RefSeq annotation GCF_003668045.1. (A and B) Average normalized CPM around protein-coding reference TSSs. (A) Comparison of experimentally defined TSSs from CHO cells by 5′GRO-seq and csRNA-seq relative to the RefSeq annotation. (B) Comparison of experimentally defined TSSs from representative tissues relative to the RefSeq annotation. (C) Nucleotide frequency plots of TSSs and their relative information content in human RefSeq, Chinese hamster RefSeq, and our revised Chinese hamster annotation. (D) Frequency of positional core promoter elements: the TATA box and the Initiator, which are commonly found at -30 and +1, respectively, relative to the TSS. (E) Frequency of distances between the revised TSSs observed and the nearest RefSeq TSS. (F) Summary of total protein-coding and non-distal ncRNA TSSs observed and their distances to RefSeq TSSs.
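The distance comparison in panels (E) and (F) can be illustrated with a minimal sketch. This is not the authors' pipeline; the function name `nearest_reference_distance` and the `(chromosome, strand, position)` tuple layout are assumptions made for illustration. It reports the signed genomic distance from each revised TSS to the nearest reference TSS on the same chromosome and strand.

```python
from bisect import bisect_left
from collections import defaultdict


def nearest_reference_distance(revised, reference):
    """Signed distance (bp) from each revised TSS to its nearest reference TSS.

    Both inputs are lists of (chromosome, strand, position) tuples
    (a hypothetical layout). A positive value means the revised TSS lies
    at a higher genomic coordinate than the nearest reference TSS.
    Returns None for a revised TSS with no reference TSS on the same
    chromosome and strand.
    """
    # Index reference positions by (chromosome, strand) and sort them once.
    by_key = defaultdict(list)
    for chrom, strand, pos in reference:
        by_key[(chrom, strand)].append(pos)
    for positions in by_key.values():
        positions.sort()

    distances = []
    for chrom, strand, pos in revised:
        positions = by_key.get((chrom, strand))
        if not positions:
            distances.append(None)
            continue
        # The nearest reference TSS is one of the two neighbours of the
        # insertion point found by binary search.
        i = bisect_left(positions, pos)
        candidates = positions[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda p: abs(p - pos))
        distances.append(pos - nearest)
    return distances


# Toy coordinates, not data from the study.
reference = [("chr1", "+", 100), ("chr1", "+", 500), ("chr1", "-", 300)]
revised = [("chr1", "+", 120), ("chr1", "+", 480),
           ("chr1", "-", 300), ("chr2", "+", 10)]
distances = nearest_reference_distance(revised, reference)
```

Binning these distances would give a histogram of the kind described in panel (E). A strand-aware sign convention (upstream versus downstream) would need the sign flipped for minus-strand TSSs.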
Figure 3. Composition of diverse tissue-specific Chinese hamster transcriptomes. (A) Experimentally detected genes and the number of tissues in which they were confidently expressed, as defined by csRNA-seq and 5′GRO-seq. (B) Cumulative plot of the distribution of transcript abundance as defined by RNA-seq in various tissues. The transcriptomes of highly specialized tissues such as the heart or the pancreas are more dominated by the high expression of a small set of specific RNAs than those of complex tissues such as the brain. (C) Comparison of gene expression distributions across tissues as defined by csRNA-seq and 5′GRO-seq. (D) Tissue-specific gene enrichment analysis (TSEA) comparing the gene expression patterns of our samples, as defined by csRNA-seq and 5′GRO-seq, to pre-defined orthologous human tissue-specific genes. -log_e(P-value) values are shown. (E and F) Motif analysis with Homer. Significance of hypergeometric enrichment of the motifs is shown as -log_e(P-value). (E) Transcription factor motifs (top 3 per sample) enriched in TSSs for each tissue highlight conservation and factors involved in maintaining tissue-specific expression patterns. (F) Transcription factor motifs enriched in all protein-coding gene-associated TSSs in the revised TSS annotation.
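The -log_e(P-value) scale used in panels (D) to (F) can be made concrete with a small sketch of a one-sided hypergeometric enrichment test. This is a generic illustration, not Homer's implementation; the function name and argument names are assumptions.

```python
from math import comb, inf, log


def neg_ln_hypergeom_pvalue(k, n, K, N):
    """Return -ln P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total number of sequences (targets plus background)
    K: sequences in the total set that contain the motif
    n: number of target sequences
    k: target sequences that contain the motif
    """
    total = comb(N, n)
    # Upper tail: probability of seeing k or more motif-containing targets.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    if tail == 0:
        return inf  # k is impossible under the null, so P = 0
    # Subtract logarithms of the exact integers to avoid float underflow.
    return -(log(tail) - log(total))


# Toy numbers, not data from the study: all 5 targets carry a motif
# present in 5 of 10 sequences, so P = 1/252.
score = neg_ln_hypergeom_pvalue(k=5, n=5, K=5, N=10)
```

Larger scores correspond to smaller P-values, so a motif with a score of 20 is far more strongly enriched than one with a score of 5.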
Figure 1. A Chinese hamster Transcriptome Atlas. (A) Overview of datasets generated to identify transcription start sites. * denotes cell lines, ** denotes primary cells. (B and C) IGV viewer of data. Units are in counts per million (CPM). (B) Example transcription start site at single-nucleotide resolution as defined by 5′GRO-seq and csRNA-seq (using GRO-seq and sRNA-seq as input, respectively) of the focused Eukaryotic Translation Elongation Factor 1 Alpha (Eef1A1) promoter in CHO cells and diverse tissues. Brain RNA-seq reads are shown in orange. (C) Example of unstable transcription start sites of enhancer RNAs that are poorly detected by conventional RNA-seq at the Sp1 ‘super enhancer’ locus in CHO cells. Note: Raw IGV browser visualization data are provided in Supplementary Figure S3. (D) Number of TSSs captured, grouped by TSS type and the samples in which they were detected. (E) Cumulative plot across all samples of protein-coding genes with a TSS detected by csRNA-seq and/or 5′GRO-seq enrichment over GRO-seq and/or csRNA-seq. Sorted by taking CHO as the first sample, followed by hamster tissues.
Figure 4. Experimentally measured TSSs facilitate genome engineering to humanize glycosylation. (A) List of human glycosylation enzyme classes detected in our samples as defined by 5′GRO-seq/csRNA-seq in the Chinese hamster. The number of genes expressed in CHO cells (blue) and additional genes for which experimental TSSs were discovered in our tissue samples (red) are shown. (B) Overview of the RefSeq TSS targeted by guide RNAs with CRISPRa to induce Mgat3 expression in CHO cells. The Mgat3-encoded glycosyltransferase catalyzes the addition of bisecting N‐acetylglucosamines on glycoproteins, but is silenced in CHO cells. (C) Quantitative RT-PCR measurement of Mgat3 expression in CHO cells and upon activation by the three designed gRNAs using our new TSSs. As a control, the cells were transfected with NT-gRNA (gRNA-Ctrl) or NT-gRNA and VPR-dCas9 (Cas9-Ctrl). (D) Comparison of the levels of bisecting N‐acetylglucosamines in the secretome following CRISPRa. As a control, the cells were transfected with NT-gRNA (gRNA-Ctrl). (E) Overview: Experimental TSS facilitates efficient engineering of Mgat3 in an upstream promoter.