Putative Phenotypically Neutral Genomic Insertion Points in Prokaryotes.

Casey B Bernhards^1,2, Alvin T Liem^1,3, Kimberly L Berk^1, Pierce A Roth^1,3, Henry S Gibbons^1, Matthew W Lux^1.

Abstract

The barriers to effective genome editing in diverse prokaryotic organisms have been falling at an accelerated rate. As editing becomes easier in more organisms, quickly identifying genomic locations to insert new genetic functions without disrupting organism fitness becomes increasingly useful. When the insertion is noncoding DNA for applications such as information storage or barcoding, a neutral insertion point can be especially important. Here we describe an approach to identify putatively neutral insertion sites in prokaryotes. An algorithm (targetFinder) finds convergently transcribed genes with gap sizes within a specified range, and looks for annotations within the gaps. We report putative editing targets for 10 common synthetic biology chassis organisms, including coverage of available RNA-seq data, and provide software to apply to others. We further experimentally evaluate the neutrality of six identified targets in Escherichia coli through insertion of a DNA barcode. We anticipate this information and the accompanying tool will prove useful for synthetic biologists seeking neutral insertion points for genome editing.

Keywords: CRISPR; genetic barcoding; genome editing; insertion sites; neutral sites; synthetic biology

Year: 2022 PMID: 35271248 PMCID: PMC9016761 DOI: 10.1021/acssynbio.1c00531

Source DB: PubMed Journal: ACS Synth Biol ISSN: 2161-5063 Impact factor: 5.249

Tools for genomic editing have expanded rapidly in recent years, driven in large part by the development of CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats-CRISPR associated) editing systems.[1] For prokaryotes, CRISPR-Cas editing has been demonstrated for at least 36 species.[2] Genomic editing tools based on phage-derived homologous recombination systems are also widely used in multiple prokaryotes, sometimes in tandem with CRISPR-Cas.[3] In parallel, numerous efforts have sought to expand the number of chassis organisms available for tractable engineering by developing genetic toolboxes for increasing numbers of nonmodel prokaryotes.[4,5] A recent trend in CRISPR-Cas editing research, harnessing the native CRISPR-Cas system of a target prokaryote rather than using a well-studied, transferable system, promises to expand the range of editable organisms further.[6] While many editing applications modify endogenous host function, others aim to add entirely new capabilities. The genomic insertion point can be extremely important, both because of potential fitness effects and because of poorly understood variation in expression levels.[7,8] Of particular motivation here is the case of phenotypically neutral insertions, with applications including use of genomic DNA as a digital data storage medium,[9,10] event recording,[11] and genetic barcodes for environmental tracking,[12,13] studying disease bottlenecks,[14−16] or provenance identification.[17,18] Given that the definition of function in genomes can be a controversial topic,[19] reliably predicting functionally neutral insertion points remains a challenge. Previous studies have used transposon mutagenesis,[20] low transcriptional activity,[21,22] or gaps between convergently transcribed genes[12,21] to select neutral insertion locations in several organisms. To facilitate this selection, we developed a bioinformatics pipeline combining the latter two approaches and present recommended gaps for 10 synthetic biology chassis organisms, with experimental evidence that no obvious fitness disruption occurs for the gaps identified for Escherichia coli.

Results and Discussion

The core of our approach is to identify gaps between convergently transcribed genes where critical intergenic regulatory sequences, such as promoters or 5′ UTRs, are less likely to occur. While transcriptional terminators and unknown regulatory sequences could certainly be disrupted, we postulate that these regions reduce the probability of fitness effects. Moreover, transcriptional activity across any insertion is likely buffered in each direction by flanking terminators corresponding to the convergent genes. The tool, called targetFinder, identifies convergently transcribed genes within a range of gap sizes (300 to 2000 bp used here), checks for annotations in the gap, and looks for repetitive structures that may inhibit insertion (Supplementary Methods). While our approach was designed with prokaryotes in mind, a similar approach could be effective for higher organisms. We applied this approach to 10 synthetic biology chassis organisms (Table 1, Table S1). The number of identified gaps per genome ranged from 6 to 44 (Table 1). Interestingly, the number of gaps is not correlated with genome size (Figure S1A, R² = 0.1386, p-value that slope is nonzero = 0.29), suggesting that either genome architecture or the degree of annotation influences gap frequency. For E. coli and Bacillus subtilis, the two best-studied organisms on the list, the largest gap is 603 bp, while 47 larger gaps were found across seven of the other eight organisms, which may suggest the presence of unannotated coding sequences (Figure S1B). The gaps are well distributed across the genome in each case (Figure S1C).
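The gap search described above can be illustrated with a minimal Python sketch. This is not the targetFinder source code: the `Gene` record, the `convergent_gaps` function, and the example coordinates are hypothetical, and the sketch covers only the convergent-orientation and gap-size checks (300 to 2000 bp), not the repeat screening or wrap-around at the origin of a circular chromosome.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def convergent_gaps(genes: List[Gene], min_gap: int = 300,
                    max_gap: int = 2000) -> List[Tuple[str, str, int, int]]:
    """Return (left gene, right gene, gap start, gap end) for each
    annotation-free gap between a "+" gene and the next "-" gene."""
    ordered = sorted(genes, key=lambda g: g.start)
    gaps = []
    if not ordered:
        return gaps
    # Track the feature that reaches furthest to the right so that nested or
    # overlapping annotations cannot create a false gap.
    furthest = ordered[0]
    for gene in ordered[1:]:
        if gene.start > furthest.end:
            size = gene.start - furthest.end - 1
            # Convergent: the left gene reads rightward, the right gene leftward.
            if (furthest.strand == "+" and gene.strand == "-"
                    and min_gap <= size <= max_gap):
                gaps.append((furthest.name, gene.name,
                             furthest.end + 1, gene.start - 1))
        if gene.end > furthest.end:
            furthest = gene
    return gaps


# Hypothetical annotations: only a/b are convergent with a gap in range.
example_genes = [
    Gene("a", 100, 900, "+"),
    Gene("b", 1300, 2000, "-"),   # a -> <- b, gap of 399 bp
    Gene("c", 2100, 3000, "-"),
    Gene("d", 3500, 4000, "+"),   # c and d are divergent
    Gene("e", 4100, 5000, "-"),   # d -> <- e, but the gap is only 99 bp
]
example_gaps = convergent_gaps(example_genes)
```

Because every annotated feature is part of the sorted list, any gap reported between consecutive features is free of annotations by construction, which stands in for the separate annotation check in the description above.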

Table 1

Putative Phenotypically Neutral Insertion Sites for Common Synthetic Biology Chassis Organisms^a

organism	chassis information	genome size (bp)	number of gaps identified by targetFinder	RNA-seq coverage
Escherichia coli	Ubiquitous model Gram-negative organism[24]	4 641 652	7	Figure S2
Bacillus subtilis	Spore former, model Gram-positive organism[24]	4 215 606	21	Figure S3
Vibrio natriegens	Ultrafast growing marine organism[25]	5 175 153	39	Figure S4–S5
Clostridium acetobutylicum	Robust anaerobe, industrial solvent production[26,27]	3 940 880	19	Figure S6
Marinobacter atlanticus	Marine electrogen[28]	4 768 422	21	Figure S7
Lactobacillus plantarum	Gut resident, common probiotic[29]	3 308 273	44	Figure S8–S9
Bacteroides thetaiotaomicron	Gut resident, common probiotic[30]	6 260 361	42	Figure S10–S11
Pseudomonas putida	Soil dweller and industrial host[31]	6 181 873	23	Figure S12
Deinococcus radiodurans	Extremely hardy[32]	3 060 986	6	Figure S13
Synechocystis sp. PCC 6803	Photosynthetic model cyanobacterium[33]	3 573 470	9	Figure S14

^a See Figure S1 for an assessment of gaps across organisms.
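As a rough check on the relationship between genome size and gap count, the coefficient of determination can be computed directly from the values in Table 1. The sketch below uses only the numbers as listed in the table. The helper `r_squared` is a hypothetical name, and the result need not match the reported R² = 0.1386 exactly, since Figure S1A may rest on inputs that differ from the table (for example, whether plasmids or secondary chromosomes are counted).

```python
# Genome sizes (bp) and gap counts, in the row order of Table 1.
genome_sizes_bp = [4641652, 4215606, 5175153, 3940880, 4768422,
                   3308273, 6260361, 6181873, 3060986, 3573470]
gap_counts = [7, 21, 39, 19, 21, 44, 42, 23, 6, 9]


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 for a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


r2 = r_squared(genome_sizes_bp, gap_counts)
print(f"R^2 = {r2:.4f}")
```

A small value here is consistent with the statement that gap count is not well explained by genome size alone.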

A number of approaches can be used to further reduce the set of candidate gaps suggested by targetFinder. For each organism, we analyzed randomly selected RNA-seq data sets (Table S2, Supplementary Methods) to assess whether each identified gap had low apparent transcriptional activity relative to the rest of the genome (Figure 1A, Figures S2–S14). The inclusion of transcriptomic coverage of the DNA immediately flanking each gap can also provide important context, depending on the editing application. For insertion of a functionally neutral sequence, low transcriptional activity across the gap and flanking DNA may be ideal, as the insertion is less likely to interfere with native gene expression. For applications where high expression from an inserted sequence is desired, low transcription levels in the gap and high levels in the flanking DNA may point to a well-expressed region of the genome. Assessment of local genetic functions can further inform selection of candidate gaps depending on the application. For example, proximity to rRNA operons may influence expression levels,[7,23] or knowledge that flanking genes are essential may motivate exclusion over fitness disruption concerns. Consulting alternative annotation sources to identify missed annotations is another way to eliminate candidate gaps identified by targetFinder. We note that gap information can change as genome annotations are updated, as indeed happened for E. coli during the course of this study (Supplementary Results).
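The coverage comparison described above could be sketched as follows, assuming a per-base read depth list is already available from an aligned RNA-seq data set. The function name `summarize_gap_coverage` and the toy depth profile are hypothetical. The 500 bp flank and the 8000× ceiling mirror the values used for Figure 1, while capping per-base depth is only a stand-in for discarding reads at over-covered locations, and the "gap median below genome median" flag is an illustrative criterion rather than the one used in the study.

```python
from statistics import median


def _median_or_none(values):
    """Median of a list, or None when the list is empty (e.g. no flank)."""
    return median(values) if values else None


def summarize_gap_coverage(depth, gap_start, gap_end, flank=500, cap=8000):
    """Compare coverage in a gap and its flanks with the genome-wide median.

    depth: per-base read depth, where index 0 is genome position 1.
    gap_start, gap_end: 1-based inclusive gap coordinates.
    """
    capped = [min(d, cap) for d in depth]          # ceiling on depth
    genome_median = median(capped)
    gap = capped[gap_start - 1:gap_end]
    left = capped[max(0, gap_start - 1 - flank):gap_start - 1]
    right = capped[gap_end:gap_end + flank]
    gap_median = _median_or_none(gap)
    return {
        "genome_median": genome_median,
        "gap_median": gap_median,
        "left_flank_median": _median_or_none(left),
        "right_flank_median": _median_or_none(right),
        # Illustrative criterion only: gap is quieter than the typical position.
        "gap_below_genome_median": gap_median is not None
        and gap_median < genome_median,
    }


# Toy depth profile: a quiet 400 bp gap between two transcribed regions.
toy_depth = [10] * 1000 + [2] * 400 + [50] * 1000
summary = summarize_gap_coverage(toy_depth, gap_start=1001, gap_end=1400)
```

Reporting the flank medians alongside the gap median supports both use cases discussed above: a quiet gap with quiet flanks for neutral insertions, or a quiet gap with highly transcribed flanks when expression from the insert is desired.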

Figure 1

Recommended insertion sites are phenotypically neutral. (A) RNA-seq coverage of each identified E. coli gap. RNA-seq data were selected randomly from a public repository (Supplementary Methods, Table S2). Panels display the identified gap indicated by the panel title (black shading) plus the 500 bp flanking either side (gray shading). Dashed lines indicate the median coverage depth across the genome. To limit computational intensity, reads aligning to locations with larger than 8000× depth were discarded, and thus the maximum depth is 8000. (B) Growth curves of wild-type E. coli and strains barcoded at each of six putative targets grown in rich medium (left) and minimal medium (right). Error bars represent the standard deviation; n = 9. (C) Abundance of each strain during repeated passages of competitive growth in rich medium as assessed by qPCR of the six barcoded strains. Error bars represent the standard deviation; n = 9 (triplicate cultures and triplicate qPCR reactions).

In addition to the chassis organisms in Table 1, we previously used a similar approach to insert barcodes into two pathogens, Yersinia pestis and Bacillus anthracis, and the simulant Bacillus thuringiensis kurstaki for applications in disease and dissemination tracking.[12,13,34] Here, we experimentally validate the neutrality of the sites identified for E. coli by inserting genetic barcodes at the putative insertion sites via CRISPR-Cas gene editing (see Supplementary Methods).[35] We eliminated one of the seven gaps due to problems with the commercial synthesis of the homology-directed DNA repair template for editing this region. We found that growth curves for the wild-type and six barcoded strains were highly similar in rich and minimal media (Figure 1B). To further evaluate fitness impacts, we grew the strains together and assessed relative abundance by qPCR of the barcodes (Figure 1C, Figure S15, Table S4). We found no apparent fitness differences after six serial passages and approximately 46 doublings (p > 0.05 within each gap; variation between gaps presumed to result from different relative initial concentrations). We note that defects may exist under different conditions, or when inserting an expressed gene rather than a neutral barcode. Interestingly, of the seven gaps initially identified for E. coli, Gap 7 (genomic coordinates 4510134–4510689) is in close proximity to a site (position 4506858) selected by Park et al. for insertion of a “landing pad” based on high expression from an inserted reporter gene and lack of obvious fitness defects.[20] The study by Park et al. also supports our use of gaps between convergent genes, as an insertion at another intergenic location between nonconvergent genes abolished transcription of the downstream operon.

In conclusion, we present a straightforward approach to identify phenotypically neutral genomic insertion points in prokaryotes and provide putative sites for 10 commonly used organisms in the synthetic biology community. We experimentally validate the sites recommended for E. coli. We anticipate the utility of this resource will grow as techniques to genetically engineer these organisms continue to expand.

Methods

Gaps were identified using the targetFinder software (https://github.com/ECBCgit/targetFinder), and then insertion points within gaps were selected using GuideFinder[36] and Cas-OFFinder[37] software. Detailed information about the alignment of RNA-seq data, generation of barcoded strains, phenotypic assays, and qPCR is available in the Supplementary Methods.
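To give a sense of what selecting an insertion point within a gap involves, the sketch below enumerates candidate protospacers next to an NGG PAM on both strands of a gap sequence. It is not the GuideFinder or Cas-OFFinder algorithm and does no off-target scoring. The NGG PAM and 20 nt guide length are assumptions (typical of the commonly used Cas9 from Streptococcus pyogenes), and the function names and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_protospacers(seq, guide_len=20):
    """List (strand, offset, protospacer, PAM) for every NGG PAM site.

    Offsets are 0-based and measured on the strand that was scanned, so
    "-" strand offsets refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Stop early enough that the 3 nt PAM still fits after the guide.
        for i in range(len(s) - guide_len - 2):
            pam = s[i + guide_len:i + guide_len + 3]
            if pam[1:] == "GG":   # assumed NGG PAM
                hits.append((strand, i, s[i:i + guide_len], pam))
    return hits


# Hypothetical gap sequence with a single NGG site on the "+" strand.
example_gap = "A" * 20 + "TGG" + "A" * 5
example_hits = find_ngg_protospacers(example_gap)
```

In practice the candidates from such a scan would still need to be screened against the rest of the genome for off-target matches, which is the role the dedicated tools play in the workflow above.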

37 in total

1. Development of a Genetic System for Marinobacter atlanticus CP1 (sp. nov.), a Wax Ester Producing Strain Isolated From an Autotrophic Biocathode.

Authors: Lina J Bird; Zheng Wang; Anthony P Malanoski; Elizabeth L Onderko; Brandy J Johnson; Martin H Moore; Daniel A Phillips; Brandon J Chu; J Fitzpatrick Doyle; Brian J Eddie; Sarah M Glaven
Journal: Front Microbiol Date: 2018-12-21 Impact factor: 5.640

2. Multigene editing in the Escherichia coli genome via the CRISPR-Cas9 system.

Authors: Yu Jiang; Biao Chen; Chunlan Duan; Bingbing Sun; Junjie Yang; Sheng Yang
Journal: Appl Environ Microbiol Date: 2015-01-30 Impact factor: 4.792

3. Is junk DNA bunk? A critique of ENCODE.

Authors: W Ford Doolittle
Journal: Proc Natl Acad Sci U S A Date: 2013-03-11 Impact factor: 11.205

4. Molecular Toolkit for Gene Expression Control and Genome Modification in Rhodococcus opacus PD630.

Authors: Drew M DeLorenzo; Austin G Rottinghaus; William R Henson; Tae Seok Moon
Journal: ACS Synth Biol Date: 2018-02-05 Impact factor: 5.110

5. Synthetic Biology Tools for the Fast-Growing Marine Bacterium Vibrio natriegens.

Authors: Tanya Tschirhart; Vrinda Shukla; Erin E Kelly; Zachary Schultzhaus; Erin NewRingeisen; Jeffrey S Erickson; Zheng Wang; Whitney Garcia; Emaleigh Curl; Robert G Egbert; Enoch Yeung; Gary J Vora
Journal: ACS Synth Biol Date: 2019-09-09 Impact factor: 5.110

6. Detection and tracking of a novel genetically tagged biological simulant in the environment.

Authors: Peter A Emanuel; Patricia E Buckley; Tiffany A Sutton; Jason M Edmonds; Andrew M Bailey; Bryan A Rivers; Michael H Kim; William J Ginley; Christopher C Keiser; Robert W Doherty; F Joseph Kragl; Fiona E Narayanan; Sarah E Katoski; Sari Paikoff; Samuel P Leppert; John B Strawbridge; Daniel R VanReenen; Sally S Biberos; Douglas Moore; Douglas W Phillips; Lisa R Mingioni; Ogba Melles; Daniel G Ondercin; Beth Hirsh; Kendall M Bieschke; Crystal L Harris; Kristin M Omberg; Vipin K Rastogi; Sheila Van Cuyk; Henry S Gibbons
Journal: Appl Environ Microbiol Date: 2012-09-21 Impact factor: 4.792

7. Digitally Barcoding Mycobacterium tuberculosis Reveals In Vivo Infection Dynamics in the Macaque Model of Tuberculosis.

Authors: Constance J Martin; Anthony M Cadena; Vivian W Leung; Philana Ling Lin; Pauline Maiello; Nathan Hicks; Michael R Chase; JoAnne L Flynn; Sarah M Fortune
Journal: mBio Date: 2017-05-09 Impact factor: 7.867

8. A Universal, Genomewide GuideFinder for CRISPR/Cas9 Targeting in Microbial Genomes.

Authors: Michelle Spoto; Changhui Guan; Elizabeth Fleming; Julia Oh
Journal: mSphere Date: 2020-02-12 Impact factor: 4.389

9. The role of host and microbial factors in the pathogenesis of pneumococcal bacteraemia arising from a single bacterial cell bottleneck.

Authors: Alice Gerlini; Leonarda Colomba; Leonardo Furi; Tiziana Braccini; Ana Sousa Manso; Andrea Pammolli; Bo Wang; Antonio Vivi; Maria Tassini; Nico van Rooijen; Gianni Pozzi; Susanna Ricci; Peter W Andrew; Uwe Koedel; E Richard Moxon; Marco R Oggioni
Journal: PLoS Pathog Date: 2014-03-20 Impact factor: 6.823