Joseph R Brady, Melody C Tan, Charles A Whittaker, Noelle A Colant, Neil C Dalvie, Kerry Routenberg Love, J Christopher Love.
Abstract
Constructing efficient cellular factories often requires integration of heterologous pathways for synthesis of novel compounds and improved cellular productivity. Few genomic sites are routinely used, however, for efficient integration and expression of heterologous genes, especially in nonmodel hosts. Here, a data-guided framework for informing suitable integration sites for heterologous genes based on ATAC-seq was developed in the nonmodel yeast Komagataella phaffii. Single-copy GFP constructs were integrated using CRISPR/Cas9 into 38 intergenic regions (IGRs) to evaluate the effects of IGR size, intensity of ATAC-seq peaks, and orientation and expression of adjacent genes. Only the intensity of accessibility peaks was observed to have a significant effect, with higher expression observed from IGRs with low- to moderate-intensity peaks than from high-intensity peaks. This effect diminished for tandem, multicopy integrations, suggesting that the additional copies of exogenous sequence buffered the transcriptional unit of the transgene against effects from endogenous sequence context. The approach developed from these results should provide a basis for nominating suitable IGRs in other eukaryotic hosts from an annotated genome and ATAC-seq data.
Keywords: ATAC-seq; RNA-seq; genome engineering; heterologous gene; locus
Year: 2020 PMID: 32786350 PMCID: PMC7506950 DOI: 10.1021/acssynbio.0c00299
Source DB: PubMed Journal: ACS Synth Biol ISSN: 2161-5063 Impact factor: 5.110
Figure 1. Genome-wide analysis of gene expression (RNA-seq) and chromatin accessibility (ATAC-seq). (A) Workflow for cultivation and sampling for RNA-seq and ATAC-seq after 24 h growth on glycerol and an additional 24 h growth on methanol. (B) Relative frequency of mapped fragment sizes recovered in ATAC-seq libraries. (C) Log2(fold-change) in gene expression and accessibility score relative to genome-wide averages for 7.5 kbp intervals across each chromosome. Approximate positions of centromeres are depicted for each chromosome. (D) Nucleosome positioning around translation start sites in K. phaffii as determined by NucleoATAC.
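The per-interval comparison in panel (C) can be illustrated with a minimal sketch. It assumes per-position signal values that are summed into 7.5 kbp bins and compared with the mean over all bins. The function names, the 0-based coordinates, and the pseudocount are illustrative assumptions, not details taken from the paper.

```python
import math

BIN_SIZE = 7500  # 7.5 kbp intervals, as stated in the caption


def bin_signal(positions, values, chrom_length, bin_size=BIN_SIZE):
    """Sum per-position signal into fixed-width bins (positions are 0-based)."""
    n_bins = math.ceil(chrom_length / bin_size)
    sums = [0.0] * n_bins
    for pos, val in zip(positions, values):
        sums[pos // bin_size] += val
    return sums


def log2_fold_change(bin_values, pseudocount=1.0):
    """log2 ratio of each bin to the mean of all bins.

    The pseudocount is an assumption added here to avoid log2(0) for empty bins.
    """
    mean = sum(bin_values) / len(bin_values)
    return [math.log2((v + pseudocount) / (mean + pseudocount)) for v in bin_values]
```

In practice the mean would be taken over bins from all chromosomes rather than a single list, so that each interval is compared with a genome-wide average.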
Figure 2. Characterization of genomic properties of IGRs that potentially influence suitability for integration of a heterologous gene. (A) Distribution of intergenic region (IGR) length for each chromosome. (B) Distribution of IGR length for each orientation of adjacent genes. (C,D) Overall accessibility score in the promoter region (defined as 600 bp upstream of translation start site) for low (bottom 25%), medium (middle 50%), and high (top 25%) expression of genes under growth on (C) glycerol or (D) methanol.
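The low, medium, and high expression classes in panels (C,D) correspond to a quartile split, which might be sketched as follows. The function name, the quantile interpolation method, and the handling of values that fall exactly on a cutoff are assumptions made for illustration.

```python
from statistics import quantiles


def expression_category(values):
    """Label each value 'low' (bottom 25%), 'medium' (middle 50%), or 'high' (top 25%)."""
    # First and third quartile cut points; quantiles() sorts the data internally.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    labels = []
    for v in values:
        if v <= q1:
            labels.append("low")
        elif v >= q3:
            labels.append("high")
        else:
            labels.append("medium")
    return labels
```

Labels are returned in the input order, so they can be paired directly with gene identifiers and promoter accessibility scores.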
Figure 3. Evaluation of the impact of IGR properties on transgene expression and cell growth. (A) Workflow diagram for creation and analysis of an IGR targeting library. (B) Normalized GFP fluorescence versus IGR size. (C) Fluorescence versus IGR orientation. (D) Fluorescence versus category of adjacent gene expression and overall IGR accessibility. (E) Normalized max growth rate versus IGR size. (F) Growth versus IGR orientation. (G) Growth versus expression-accessibility category. Adjusted p-values computed using a Wilcoxon signed-rank test with the Benjamini–Hochberg correction for multiple hypotheses.
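The Benjamini–Hochberg correction named in the caption can be written as a short, dependency-free sketch. This is the standard step-up procedure; the function name is illustrative, and the raw p-values would come from whichever test is applied upstream.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by m/rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.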
Figure 4. Comparison of transgene expression among insertion loci across three promoters (PAOX1, PDAS2, and POLE1) and two heterologous genes (G-CSF and hGH). Transgene expression was measured by RNA-seq as log2(TPM + 1). Each point is a unique clone and represents the average of triplicate samples analyzed by RNA-seq. The transgene copy number is indicated for each clone.
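For readers unfamiliar with the log2(TPM + 1) scale used here, a minimal sketch of the transformation follows. It assumes raw read counts and transcript lengths as inputs. The function names are illustrative, and the quantification pipeline actually used in the paper is not specified in this record.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalized counts rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return [r / total * 1e6 for r in rates]


def log2_tpm_plus_one(tpm_values):
    """log2(TPM + 1); the +1 keeps unexpressed genes at 0 instead of -infinity."""
    return [math.log2(t + 1.0) for t in tpm_values]
```

On this scale a difference of one unit corresponds to roughly a two-fold difference in expression for genes with TPM well above 1.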
Figure 5. Influence of expression of heterologous genes on the downstream native gene. (A) Expression of TDH3, the transgene (hGH or G-CSF), and PIF1 as a function of integration either upstream of the TDH3 promoter or elsewhere in the genome. Each point represents the average of triplicate RNA-seq measurements of expression in log2(TPM + 1). Adjusted p-values were calculated using DESeq2. (B) Expression of TDH3, the transgene (hGH or G-CSF), and PIF1 as a function of integration either upstream of the PIF1 promoter or elsewhere in the genome.