
In vivo processing of digital information molecularly with targeted specificity and robust reliability.

Yangyi Liu1, Yubin Ren1, Jingjing Li2, Fan Wang2, Fei Wang3, Chao Ma1, Dong Chen4, Xingyu Jiang5, Chunhai Fan3, Hongjie Zhang1,2, Kai Liu1,2.   

Abstract

DNA has attracted increasing interest as an appealing medium for information storage. However, target-specific rewriting of the digital data stored in intracellular DNA remains a grand challenge because the highly repetitive nature and uneven guanine-cytosine content render the encoded DNA sequences poorly compatible with endogenous ones. In this study, a dual-plasmid system based on gene editing tools was introduced into Escherichia coli to process information accurately. Digital data containing large repeat units in binary codes, such as text, codebook, or image, were involved in the realization of target-specific rewriting in vivo, yielding up to 94% rewriting reliability. An optical reporter was introduced as an advanced tool for presenting data processing at the molecular level. Rewritten information was stored stably and amplified over hundreds of generations. Our work demonstrates a digital-to-biological information processing approach for highly efficient data storage, amplification, and rewriting, thus robustly promoting the application of DNA-based information technology.


Year:  2022        PMID: 35930647      PMCID: PMC9355361          DOI: 10.1126/sciadv.abo7415

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.957


INTRODUCTION

As a biological carrier of genetic information, deoxyribonucleic acid (DNA) provides a molecular-scale storage option for the vast amount of digital information being generated. However, high-precision addressing in DNA sequences has not been achieved via in vitro enzyme reactions, which leaves DNA as a medium solely for writing or reading. Because of the wide variety of digital data as well as the uneven distribution of bases in information-encoded DNA sequences, target-specific addressing and processing of digital information in living cells is still a great challenge. The latest developments in genome editing provide tools that can address target sites, repair damaged sequences, and revise specific genes within living cells. These advances inspired us to develop a versatile biotechnology for rewriting exogenous information in vivo. CRISPR systems are well known for gene editing in chemical biology. A variety of CRISPR-associated proteins (Cas), which are guided by their corresponding CRISPR RNA (crRNA) to cleave a target locus in a DNA sequence, appear promising for addressing and rewriting information-encoded DNA sequences accurately. However, the recognition function of Cas is limited by its protospacer-adjacent-motif site, and editing efficiency is severely affected by the secondary structure of the guide RNA, which makes the CRISPR-Cas tools highly restricted. Thus, arbitrary editing of gene sequences is difficult. In addition, information-encoded DNA sequences record a diverse range of digital information with highly repetitive binary codes and have low compatibility with endogenous DNA sequences, which poses great challenges for the practical application of CRISPR-Cas tools in DNA-based information storage. Therefore, target-specific processing of information-encoded DNA sequences within living cells remains unexplored.
In this study, a dual-plasmid system based on a single crRNA-guided endonuclease (CRISPR-Cas12a) was designed and developed for DNA-based information storage and processing in living cells. High storage density, outstanding rewriting reliability, and outstanding amplification stability of digital text, codebook, and image information were successfully demonstrated in vivo using the system. This study explored the application of a dual-plasmid system for DNA-based information storage as well as the specific processing of information-encoded DNA sequences within living cells, such as target-specific rewriting.

RESULTS

DNA-based information storage and rewriting within living cells

The entire process of DNA-based information storage and rewriting within living cells is schematically illustrated in Fig. 1. The binary codes of the original digital information were first encoded into DNA sequences using a designed high-density encoding algorithm such as Base64 or Huffman (fig. S1). Once the digital binary codes were converted to DNA sequences, the information was not easily accessible and could be retrieved only by using the corresponding decoding algorithm. These exogenous information-encoded DNA sequences were then synthesized in vitro by solid-state reaction and cloned into a plasmid carrying the encoded information (info plasmid) in Escherichia coli strain MG1655 for long-term storage.
Fig. 1.

Schematic illustration of DNA-based information storage and rewriting within living cells.

The digital information stored in binary codes was encoded into DNA sequences using a high-density encoding algorithm. The sequences were then cloned into the plasmid (info, info plasmid) of living cells for long-term storage, data amplification, and information rewriting. Information rewriting was achieved using the dual-plasmid system based on CRISPR-Cas12a-λRed, whose encoding template was cloned in the plasmid (help, help plasmid). The original information was target-specifically rewritten by replacing the target DNA fragment within the info plasmid with a donor DNA fragment. Following precise revision, the rewritten information was decoded by decoding the DNA sequence within the new info plasmid (info*).

To rewrite the complex information, a dual-plasmid system based on CRISPR-Cas12a and phage-derived recombinases (λRed) was constructed. The info plasmid contained the target DNA sequence carrying the original digital information, whereas the other plasmid contained the templates for the expressions of Cas12a and λRed (help plasmid). Following activation of λRed and Cas12a expressions by tetracycline and arabinose, respectively, the target DNA fragment present in the info plasmid was selectively excised by Cas12a under the guidance of crRNA. The donor DNA, which carried the rewritten information and two homologous arms of ~500 base pairs (bp) each, then replaced the target DNA fragment and recombined the info plasmid with the help of λRed (fig. S2). Thus, the original information encoded in the DNA sequence within the info plasmid was selectively revised by replacing the target DNA with donor DNA using the dual-plasmid system. Following target-specific information rewriting, the rewritten DNA sequence in the info plasmid was sequenced using the universal M13 forward primer (M13F). The precisely revised information was then retrieved by decoding the rewritten DNA sequence using the corresponding decoding algorithm.
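The replacement step can be pictured with a toy string model. The sketch below is only an illustration under strong assumptions: it treats the plasmid as a plain string, treats the homologous arms as exact substring matches, and ignores Cas12a cleavage, recombination efficiency, and circular topology. The function name `rewrite_by_homology` and the short sequences are hypothetical and are not taken from the study.

```python
def rewrite_by_homology(plasmid: str, left_arm: str, right_arm: str,
                        new_payload: str) -> str:
    """Replace the sequence between two homology arms with new_payload.

    Toy model only: the arms are located by exact string matching, which
    stands in for homologous recombination between donor and plasmid.
    """
    left_start = plasmid.find(left_arm)
    if left_start == -1:
        raise ValueError("left homology arm not found in plasmid")
    payload_start = left_start + len(left_arm)

    # Search for the right arm only downstream of the left arm.
    right_start = plasmid.find(right_arm, payload_start)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")

    # Keep everything up to the end of the left arm, insert the new
    # payload, then keep everything from the start of the right arm.
    return plasmid[:payload_start] + new_payload + plasmid[right_start:]


# Hypothetical 10-nt arms; the study used arms of ~500 bp each.
LEFT_ARM = "ACGTACGTAC"
RIGHT_ARM = "TTGGCCAATT"
info = "GGG" + LEFT_ARM + "AAAAAA" + RIGHT_ARM + "CCC"       # original info plasmid
info_star = rewrite_by_homology(info, LEFT_ARM, RIGHT_ARM, "CTCTCT")
print(info_star)
```

In this picture the arms play the role of an address: only the region they flank is replaced, and the rest of the plasmid is left untouched.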

Target-specific rewriting of information encoded by Base64

To demonstrate the process of information rewriting within living cells, the binary codes of the digital message “IT WAS THE WORST OF TIMES.” from A Tale of Two Cities by Charles Dickens were encoded into a 128-bp DNA sequence using the Base64 encoding algorithm (table S1). The Base64 algorithm encodes digital information into DNA sequences with controlled guanine-cytosine (GC) contents and reduced homopolymers, thus reducing the error rates in DNA synthesis and sequencing. The information-encoded DNA sequence produced by the Base64 algorithm was synthesized by solid-state reaction and stored in the info plasmid, pUC57-Base64-text (table S1). The info plasmid also contained the template for constitutive transcription of crRNA, which recognizes the location of the target DNA fragment for subsequent information revision. In addition, the template for green fluorescent protein (GFP) was cloned without a promoter and fused at the 3′ end of the information-encoded DNA sequence (Fig. 2A). The help plasmid (p46Cpf1-OP2) expresses λRed recombinases under the control of a tetracycline-inducible promoter (Ptet), as well as Cas12a under the control of the arabinose-inducible promoter (ParaBAD). The info plasmid and help plasmid were introduced into MG1655 by heat shock and electroporation, respectively, yielding the E. coli strain MG1655BT for long-term storage, data amplification, and information rewriting (Fig. 2B and table S2).
Fig. 2.

Selective and precise rewriting of information stored in the DNA sequence using the dual-plasmid system.

(A) The binary codes of a digital text were encoded into a DNA sequence using the Base64 encoding algorithm and synthesized by solid-state reaction. The info plasmid also carried the templates for the constitutive transcription of crRNA and GFP without promoter. (B) The info plasmid, pUC57-Base64-text, and the help plasmid, p46Cpf1-OP2, were simultaneously introduced into MG1655 to obtain MG1655BT. The help plasmid expressed λRed recombinases under the control of Ptet and Cas12a under the control of ParaBAD. (C) Target DNA fragment present in the info plasmid was recognized by crRNA and excised by Cas12a, forming a break in the dsDNA. The donor DNA fragment, which carried the rewritten information, Ptac, and two ~500-bp homologous arms, replaced the target DNA fragment and recombined the dsDNA assisted by λRed recombinase. (D) After rewriting, the original info plasmid was revised into a new info plasmid, pUC57-Base64-newtext. Because the Ptac promoter was also inserted upstream of the template for GFP (GFP*) expression in the new info plasmid, GFP was successfully expressed in the rewritten strain MG1655BNT, thus serving as a reporter for successful information revision. (E) Sequencing of the DNA fragment before and after information rewriting, to retrieve original and rewritten information, respectively. (F) GFP expression level in 48-hour-old cells (black contour line: MG1655BT; gray contour line: MG1655BNT) and (G) ratio of fluorescent cells before and after information rewriting. (H) Successful information rewriting in 9 of 10 fluorescent strains.

To rewrite the message to “IT WAS THE BEST OF TIMES.” by replacing the word “WORST” with “BEST,” the donor DNA fragment was prepared. The donor DNA, containing 75 bp of rewritten information, a 10-bp hybrid promoter (Ptac), and two homologous arms of ~500 bp each, was introduced exogenously into MG1655BT by electroporation (Fig. 2C). 
The two ~500-bp arms were consistent with the upstream and downstream DNA sequences of the target DNA fragment. The target DNA fragment present in the info plasmid was recognized by crRNA. In addition, it was cut by Cas12a, forming a break in the info plasmid. Subsequently, the donor DNA fragment replaced the target DNA fragment and recombined the info plasmid assisted by λRed recombinases. Thus, the original info plasmid was revised to a new info plasmid, pUC57-Base64-newtext (table S1). A rewritten strain MG1655BNT was obtained (Fig. 2D and table S2). After that, to stop rewriting, the strain solution was spread and cultured in the LB medium without the inducer of Cas12a. In the new info plasmid, the Ptac promoter was inserted upstream of the template for GFP expression. Because of the trace component of lactose in LB medium, the reporter GFP was successfully expressed in the rewritten strain MG1655BNT. The green fluorescence of reporter GFP enabled the direct visualization and identification of living cells with edited information. To verify the accuracy of the information before and after rewriting, we selected the nonfluorescent strains containing the encoded DNA sequences with original information and the fluorescent strains with the rewritten information. They were verified by sequencing using the universal primer M13F (Fig. 2E). After decoding from the DNA sequences using the Base64 decoding algorithm, both the original and rewritten information were correct, confirming the successful rewriting of fluorescent strains. This was distinguishable by flow cytometry analysis (Fig. 2F). In addition, the ratio of cells showing green fluorescence after rewriting to the total number of cells was up to 89%, suggesting that the rewritten plasmids dominated in most of the cells and the rewriting system exhibited a rather high efficiency regarding data revision (Fig. 2G). 
To further test the reliability of information rewriting in fluorescent strains after 48 hours of cell culture, we randomly selected 10 fluorescent strains for sequencing. In nine of these strains, the rewritten info plasmids consistently showed the correct information, as summarized in Fig. 2H. Thus, the reliability of information rewriting in fluorescent strains, which is defined as the ratio of fluorescent strains with correct rewritten information to the total number of fluorescent strains, reached 90%. The above results suggested that the dual-plasmid system based on CRISPR-Cas12a-λRed was applicable for long-term storage, data amplification, and information rewriting of DNA-based information systems. Furthermore, the information-encoded DNA sequence in the info plasmid was selectively and precisely rewritten within living cells. The successful expression of GFP in the rewritten strains greatly facilitated the identification of strains with rewritten information.

Selective rewriting of information encoded by 15-ary Huffman

The Base64 algorithm can transform binary information into DNA sequences with controlled GC contents and reduced homopolymers. However, for certain information containing a large number of repeats, the Base64 algorithm could not reduce the length of the DNA sequence to improve coding efficiency, and homopolymers may still appear. Therefore, the 15-ary Huffman algorithm was developed to compress information by reducing the length of the DNA sequence and to avoid the appearance of homopolymers. To demonstrate that the dual-plasmid system is also capable of information rewriting using the 15-ary Huffman encoding algorithm, the same text message “IT WAS THE WORST OF TIMES.” was encoded into a 52-bp DNA sequence. The DNA sequence encoded by the 15-ary Huffman encoding algorithm was shorter than that produced by the Base64 encoding algorithm and contained no homopolymers, as shown in table S1. The word “WORST” was then revised to “BEST” following a similar strategy (Fig. 3, A and B).
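The two sequence properties discussed here, GC content and homopolymer runs, are straightforward to check on any candidate sequence. The following is a minimal sketch of such a check; the helper names `gc_content` and `longest_homopolymer` and the example sequences are hypothetical, and the actual sequences used in the study are listed in table S1.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    # groupby yields one group per run of equal neighbouring characters.
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


# Hypothetical example sequences, not from the study.
balanced = "ACGTCAGTGCAT"
repetitive = "ACGTTTTTGCAT"
print(gc_content(balanced), longest_homopolymer(balanced))
print(gc_content(repetitive), longest_homopolymer(repetitive))
```

A longest run of 1 means that no base is ever repeated consecutively, which is the property reported for the Huffman-encoded sequence.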
Fig. 3.

In vivo rewriting of text information encoded by the 15-ary Huffman algorithm.

(A) The original strain MG1655HT with two plasmids, the info plasmid, pUC57-Huffman-text, and the help plasmid (help), was introduced into E. coli MG1655. The rewriting process formed a new info plasmid, pUC57-Huffman-newtext, in the rewritten strain MG1655HNT. (B) The binary codes of digital text were encoded into a different DNA sequence using the 15-ary Huffman encoding algorithm, and the synthesized DNA sequence was then stored in the info plasmid of bacteria. The target DNA fragment present in the info plasmid was recognized by crRNA and excised by Cas12a, forming a break in the dsDNA. The donor DNA fragment then replaced the target DNA fragment and recombined the info plasmid assisted by λRed recombinase, thus forming a new info plasmid, pUC57-Huffman-newtext, in the rewritten strain MG1655HNT. (C) Green fluorescence of rewritten strains under bright field (BF) and UV light. (D) Sequencing of the DNA fragment before and after information rewriting to retrieve original and rewritten information, respectively. (E) GFP expression level in 48-hour-old cells (black contour line: MG1655HT; gray contour line: MG1655HNT) and (F) ratio of fluorescent cells before and after information rewriting. (G) Successful information rewriting in 14 of 15 fluorescent strains.

The info plasmid carrying the text encoded by the 15-ary Huffman algorithm was named pUC57-Huffman-text (table S1). The help plasmid (p46Cpf1-OP2) for rewriting was the same as in the previous section. These two plasmids were transferred into MG1655 by heat shock and electroporation, respectively, generating the E. coli strain MG1655HT (table S2). To efficiently guide the cleavage of the info plasmid by Cas12a in vitro and in vivo, a 30-bp sequence was designed as the binding site for crRNA. It was located between the 52-bp information-encoded DNA sequence and the 705-bp sequence of frame code for GFP expression in the info plasmid. 
The template for the transcription of crRNA was fused at the 3′ end of the 705-bp GFP sequence to easily construct the info plasmid for information rewriting. The target DNA fragment present in the info plasmid was revised in the same way as the previous section, forming a new info plasmid pUC57-Huffman-newtext in the E. coli strain MG1655HNT (tables S1 and S2). The revised DNA sequence within living cells was successfully confirmed by the fluorescent signal of the rewritten strains under ultraviolet (UV) light (Fig. 3C). The information-encoded DNA fragments were then sequenced from nonfluorescent and fluorescent strains, allowing retrieval of original and rewritten information, respectively (Fig. 3D and fig. S6A). Strains were easily distinguished before and after information rewriting by the level of GFP expression (Fig. 3E). After rewriting the information encoded by the 15-ary Huffman encoding algorithm, the ratio of cells showing green fluorescence to the total number of cells was up to 95% (Fig. 3F). In addition, 14 of 15 fluorescent strains showed the correct rewritten information, with the reliability of information rewriting in fluorescent strains being at least 93% (Fig. 3G). Thus, the dual-plasmid system based on CRISPR-Cas12a-λRed is suitable for information rewriting using the 15-ary Huffman algorithm as well. Compared to the information encoded by the Base64 algorithm, the compression algorithm in the 15-ary Huffman algorithm shortened the information sequence and the donor DNA fragment, improving the coding efficiency to 4.0 bits per nucleotide. However, the compression algorithm increased the difficulty of information rewriting, because small modifications of the original information might result in a large change in the entire DNA sequence, making the 15-ary Huffman algorithm especially suitable for information of small size or those with large repeats. 
Nevertheless, homopolymers were completely eliminated in the DNA sequence encoded by the 15-ary Huffman algorithm, which greatly reduced the error rates in DNA synthesis and sequencing. Generally, it is difficult to store digital information with large repeat units in binary codes, such as a codebook or an image, in DNA sequences. Such sequences are hard to synthesize and hard for crRNA to recognize during rewriting within living cells. To address this challenge, a codebook with a size of 56 bytes, which contained nine characters in a 3 × 3 matrix, was converted into a 140-bp DNA sequence with a coding efficiency of 3.3 bits per nucleotide by the 15-ary Huffman encoding algorithm. The second row in the codebook was successfully switched with the third following the same strategy as that used for text information rewriting, as schematically illustrated in Fig. 4A. The information-encoded DNA fragments were sequenced from nonfluorescent and fluorescent strains, which carry the original and revised information, respectively (fig. S6B). The original and revised information decoded from the DNA sequences were 100% correct.
Fig. 4.

Rewriting of codebook and image stored in the DNA sequences using CRISPR-Cas12a-λRed.

(A) Schematics showing row switch in the codebook and (B) color change in the image, achieved by information rewriting. (C) Green fluorescent color of rewritten strains under UV light, which confirmed the revision of codebook and image encoded in the DNA sequences using the dual-plasmid system based on CRISPR-Cas12a-λRed. (D) GFP expression level in 48-hour-old cells (black contour line: MG1655HC and MG1655HI; gray contour line: MG1655HNC and MG1655HNI) and (E) ratio of fluorescent cells before and after information rewriting. (F) Successful row switch of codebook information in 17 of 18 fluorescent strains and successful color change of image information in 19 of 21 fluorescent strains. (G) Flow cytometry analysis of cell size and GFP expression level in 48-hour-old cells of unrevised strains (MG1655HT, MG1655HC, and MG1655HI) and rewritten strains (MG1655HNT, MG1655HNC, and MG1655HNI) after the fifth inoculation. The high GFP expression level observed after the fifth inoculation suggests that the rewritten information can be passed on to future generations.

Furthermore, we tested an image, which is notably larger than the codebook. The 376-byte image was encoded into a 748-bp sequence with a coding efficiency of 4.0 bits per nucleotide by the 15-ary Huffman encoding algorithm. After rewriting the DNA sequence, the color of the apple pattern changed from red to green, as illustrated in Fig. 4B. The revised information was further confirmed by DNA sequencing (fig. S6C). In the case of codebook and image information, the expression of GFP and the resulting green fluorescence in the rewritten strains under UV light served as a good reporter for successful information rewriting (Fig. 4, C and D). The ratio of cells showing green fluorescence to the total number of rewritten cells was up to 94% for the codebook and 90% for the image (Fig. 4E); the reliability of information rewriting in fluorescent strains reached 94% for the codebook and 91% for the image (Fig. 4F). The reliability of information rewriting for the image was slightly lower than that for the codebook, which can be attributed to the large size of the image. Owing to its advanced algorithm and the explored crRNA sequence, the capability of the dual-plasmid system to precisely rewrite the exogenous information sequence is comparable to that of the existing gene editing system. Notably, this dual-plasmid system based on the prokaryotic genome editing tool, CRISPR-Cas12a-λRed, proved to be a feasible strategy for long-term storage, amplification, and rewriting of different types of information containing large repeat units in binary codes, including text, codebook, and image.
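The coding efficiencies quoted for the text and the image can be reproduced as the number of stored bits divided by the number of nucleotides. The sketch below assumes one byte (8 bits) per character of the text message; the function name `bits_per_nucleotide` is hypothetical, and the sequence lengths are the ones reported above.

```python
def bits_per_nucleotide(n_bytes: int, n_bp: int) -> float:
    """Coding efficiency: stored bits divided by encoded sequence length."""
    return 8 * n_bytes / n_bp


message = "IT WAS THE WORST OF TIMES."
text_bytes = len(message)  # 26 characters, assumed to be 1 byte each

huffman_text = bits_per_nucleotide(text_bytes, 52)   # 52-bp Huffman sequence
base64_text = bits_per_nucleotide(text_bytes, 128)   # 128-bp Base64 sequence
huffman_image = bits_per_nucleotide(376, 748)        # 376-byte image, 748 bp

print(f"text, 15-ary Huffman:  {huffman_text:.1f} bits/nt")
print(f"text, Base64:          {base64_text:.1f} bits/nt")
print(f"image, 15-ary Huffman: {huffman_image:.1f} bits/nt")
```

Under this assumption the same 26-character message occupies 128 bp with Base64 and 52 bp with the 15-ary Huffman algorithm, which is where the difference in efficiency comes from.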

Stability of DNA-based information stored in living cells

E. coli bacteria generally proliferate into the next generation within 20 min under optimal conditions. Mutations during plasmid replication or changes in cellular fitness may compromise the fidelity of DNA-based information stored within living cells. To test the genetic stability of the information, E. coli strains were randomly chosen and inoculated consecutively five times. MG1655HT, MG1655HC, and MG1655HI, storing three different types of information encoded by the 15-ary Huffman encoding algorithm, were cultured. Following information revision, the rewritten fluorescent strains MG1655HNT, MG1655HNC, and MG1655HNI were reinoculated every 12 hours at 30°C, during which bacteria were allowed to proliferate under optimal culture conditions. After the fifth inoculation, the bacteria were cultured for 36 hours at 30°C and showed a high GFP expression level as detected by flow cytometry (Fig. 4G and fig. S7). During the whole process, bacteria were cultured for a total of 84 hours, and one bacterium could proliferate over 252 generations into 7.2 × 10^75 bacteria. The copies of information stored in the bacteria were exponentially amplified at the same time. Notably, the information-encoded DNA fragment was sequenced from the 252nd generation of the fluorescent strains. The results indicated that the rewritten information was 100% correct, suggesting that the DNA-based information stored in bacteria was robust and that the expression of GFP had little effect on the stability of the DNA sequence (fig. S8). Furthermore, the genomic DNA of the information-encoded E. coli strains before and after rewriting was studied by next-generation sequencing. The average rate of single-nucleotide variants among the rewritten information-encoded strains was only 0.000030, which was much lower than that of the control group (fig. S9). This confirmed that there were almost no off-target effects or mutations in the process of rewriting and storing. 
Thus, the DNA-based information stored in bacteria could be passed on to future generations, providing a versatile and reliable way to amplify the information.
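The generation count and cell number quoted above follow from simple doubling arithmetic. The sketch below assumes uninterrupted division every 20 min for the full 84 hours, which is an idealized upper bound rather than a measured population size; the variable names are chosen for illustration.

```python
CULTURE_HOURS = 84            # total culture time reported above
MINUTES_PER_GENERATION = 20   # doubling time under optimal conditions

# Number of doublings that fit into the total culture time.
generations = CULTURE_HOURS * 60 // MINUTES_PER_GENERATION

# Descendants of a single cell if every division succeeds.
cells = 2 ** generations

print(generations)       # number of generations
print(f"{cells:.1e}")    # cell count in scientific notation
```

Each doubling also duplicates the stored information, so the number of copies grows by the same factor.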

DISCUSSION

We developed an in vivo dual-plasmid system using a rational design of coding algorithm and an information editing tool. It is suitable and universal for storing, rewriting, and reading various types of information, including text, codebook, and image. The system fully explores the coding capability of DNA sequences without requiring any addressing indices or backup sequences and is compatible with various kinds of coding algorithms, thus enabling a high coding efficiency. Notably, the coding efficiency of DNA-based information was achieved by incorporating a compression algorithm with rotation mapping in the 15-ary Huffman algorithm. This aimed to reduce the length of the encoded DNA sequence for information containing a large number of repeats and to avoid the appearance of homopolymers. Eventually, the efficiency was up to 4.0 bits per nucleotide in the present systems, which can be an alternative strategy to improve coding efficiency, as shown in fig. S10. In addition, the digital information encoded in info plasmids can be stored and hidden in microbial colonies stably over hundreds of generations, greatly improving the security of information during storage and transportation. To achieve efficiency as well as reliability in the rewriting of the complex information stored in exogenous DNA sequences in vivo, the high specificity between complementary pairs of nucleic acid molecules was used to accurately construct new information. By optimizing the crRNA sequence, the information rewriting tool became highly adaptable to complex information, resulting in a high rewriting reliability of up to 94%, comparable to that of existing gene editing systems. In addition, because the Cas12a/crRNA complex is activated by exogenous inducers, the rewriting process could be controlled to start at a desired time and place. 
The introduction of fluorescent proteins as reporters also greatly improved the readability of living cells carrying rewritten information and made the molecular-level editing of DNA sequences directly visible. Hence, this study realized the application of the CRISPR-Cas tool for the processing and revision of various types of digital information within living cells. By combining the CRISPR-Cas tool and the coding algorithm, the dual-plasmid system served as a universal platform for DNA-based information rewriting in vivo, thus offering a new strategy for information processing and target-specific rewriting of large and complicated data on a molecular level. In summary, digital information stored in exogenous DNA sequences was specifically targeted and reliably rewritten within living cells using the dual-plasmid CRISPR-Cas12a system. The digital-to-biological system enables DNA, a molecular-level information carrier, to be processed like traditional physical memory with target-specific access and information editing. The system also serves as a novel paradigm for flexible and dynamic processing of digital information within living cells. While this dual-plasmid system is low cost and nimble, the storage capacity is limited by the host genome size and the length of inserted DNA. Future exploration of the system in a living host with a larger genome and adoption of vectors such as yeast artificial chromosomes, which allow the insertion of longer artificial DNA sequences, would further pave the way for practical applications regarding big data storage. Furthermore, this system could be combined with multiple functional elements, such as silica biomineralization peptides and light-inducible transcription factors, for long-term data storage and adjustable information processing, respectively. The system could also be integrated with microfluidic technology to build a living cell information storage array. 
Overall, this study broadens the platform of DNA-based information storage and highlights the potential applications of chemical biology across multiple disciplines.
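
The rotation mapping mentioned in the Discussion can be illustrated with a short sketch. It assumes the widely used variant in which each base-3 digit selects one of the three nucleotides that differ from the previous one; the exact mapping used in this work, and its coupling to the 15-ary Huffman code, are not specified here, so the function names and the virtual start base are hypothetical.

```python
# Minimal sketch of a rotation mapping (assumption: base-3 digits choose
# among the three nucleotides that differ from the previous nucleotide).
# This is an illustration, not the encoder used in the study.
NUCLEOTIDES = "ACGT"


def rotate_encode(trits, start="A"):
    """Map base-3 digits to nucleotides so that no base repeats its predecessor."""
    prev = start  # virtual "previous" base before the first real nucleotide
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError("digits must be 0, 1, or 2")
        choices = [n for n in NUCLEOTIDES if n != prev]  # three candidates
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def rotate_decode(seq, start="A"):
    """Invert rotate_encode by recovering the index chosen at each step."""
    prev = start
    trits = []
    for n in seq:
        choices = [x for x in NUCLEOTIDES if x != prev]
        trits.append(choices.index(n))
        prev = n
    return trits


def has_homopolymer(seq):
    """Return True if any two adjacent nucleotides are identical."""
    return any(a == b for a, b in zip(seq, seq[1:]))


if __name__ == "__main__":
    digits = [0, 0, 0, 1, 2, 2, 1, 0]
    dna = rotate_encode(digits)
    print(dna, rotate_decode(dna) == digits, has_homopolymer(dna))
```

Because each step has only three choices, such a mapping alone carries at most log2(3) ≈ 1.58 bits per nucleotide; any higher effective figure depends on compressing the source data before mapping.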

MATERIALS AND METHODS

Materials

Propagation, subcloning, and storage of plasmids were performed in E. coli DH5α purchased from TransGen (China). Recombinant proteins were expressed in E. coli BL21(DE3) purchased from TransGen (China). E. coli MG1655 purchased from ZOMANBIO (China) was used for information storage and rewriting. DNA sequences including the info plasmid, new information sequences, and the template for GFP expression (705 bp) were synthesized by GENEWIZ (China) and are listed in table S1. The strains used in this study are listed in table S2. Primers in table S3 were synthesized by Tsingke (China). After DNA amplification, DNA cloning, DNA fragment fusion, and polymerase chain reaction (PCR) tests were performed to construct the donor DNA fragment using enzymes purchased from Vazyme (China). DNA purification and plasmid isolation kits were purchased from TIANGEN (China).

Construction of the dual-plasmid system

The dual-plasmid system was constructed by first introducing 100 ng of info plasmid into 50 μl of chemically competent MG1655 cells via heat shock. The resulting cells were incubated at 37°C for 1 hour and spread on a 1.5% agarose plate with kanamycin (50 μg/ml) to isolate the strains that contained the info plasmid. The plate was cultured at 37°C for 12 hours, and one isolated strain was cultured in 1 ml of LB medium with kanamycin (50 μg/ml) at 30°C with shaking at 220 rpm overnight. After centrifugation at 4000g for 2 min at 4°C, the collected cells were washed two times with 1 ml of ice-cold 10 vol% glycerol solution, suspended in 200 μl of ice-cold 10 vol% glycerol solution, and divided into 40-μl aliquots. One hundred nanograms of help plasmid was then added to 40 μl of MG1655 cells containing the info plasmid and incubated on ice for 1 min. The cells were then transferred to a chilled electroporation cuvette with a 2-mm gap (Bio-Rad, USA) and pulsed once at 2.5 kV for 2.5 ms with a Bio-Rad GenePulser (Bio-Rad, USA). After electroporation, MG1655 cells carrying both the info and help plasmids were incubated with kanamycin (50 μg/ml) and chloramphenicol (12.5 μg/ml) at 30°C.

Construction of the donor DNA

The DNA fragment carrying the rewritten information was synthesized by solid-state reaction. To construct the donor DNA, two ~500-bp homologous arms, which match the DNA sequences upstream and downstream of the target DNA fragment, were fused to the upstream and downstream ends of the new fragment, respectively. Each homologous arm was amplified from the info plasmid with a pair of primers. For text information encoded by the Base64 algorithm, the upstream homologous arm was amplified by Ba-rewrite-F and Ba-UParm-R from the sequence upstream of the target DNA fragment in the info plasmid, and the downstream homologous arm was amplified by Ba-DWarm-F and Ba-rewrite-R from the sequence downstream of the target DNA fragment. For information encoded by the 15-ary Huffman algorithm, the upstream homologous arm was amplified by Hu-rewrite-F and Hu-UParm-R from the sequence upstream of the target DNA fragment in the info plasmid, and the downstream homologous arm was amplified by Hu-DWarm-F and Hu-rewrite-R from the sequence downstream of the target DNA fragment. Before information rewriting, the template for GFP expression has no promoter. The promoter Ptac, which was contained in the donor DNA, was fused to the 5′ end of the GFP template upon information rewriting; GFP was therefore expressed in rewritten strains and serves as a reporter for them.
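
The layout of the donor DNA described above (upstream arm, new fragment, downstream arm) can be sketched in a few lines. This is an illustration only: the function name `build_donor`, the 0-based half-open coordinates, and the treatment of the plasmid as a linear string are assumptions, and the wet-lab construct is made by PCR and fragment fusion rather than string slicing.

```python
# Minimal sketch: assemble a donor sequence in silico from a plasmid string.
# Assumptions: plasmid is treated as linear; coordinates are 0-based and
# half-open [target_start, target_end); arm_len defaults to the ~500 bp
# stated in the text.
def build_donor(plasmid, target_start, target_end, new_fragment, arm_len=500):
    """Return upstream arm + new fragment + downstream arm."""
    if not (0 <= target_start < target_end <= len(plasmid)):
        raise ValueError("target coordinates are out of range")
    if target_start < arm_len or len(plasmid) - target_end < arm_len:
        raise ValueError("not enough flanking sequence for the homologous arms")
    upstream_arm = plasmid[target_start - arm_len:target_start]
    downstream_arm = plasmid[target_end:target_end + arm_len]
    return upstream_arm + new_fragment + downstream_arm


if __name__ == "__main__":
    # Toy example with 5-nt arms instead of ~500-bp arms.
    toy_plasmid = "AAAAACCCCCGGGGGTTTTT"
    print(build_donor(toy_plasmid, 10, 15, "ACGT", arm_len=5))
```

In the toy call the target `GGGGG` is replaced by `ACGT`, flanked by the five nucleotides on either side of the original target.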

Functions of crRNA, Cas12a, λRed, and donor DNA in the dual-plasmid system

The cutting of info plasmids by Cas12a under the guidance of crRNA was tested in vitro. For the info plasmid pUC57-Base64-text, the selected cutting site was located within the information sequence. For the info plasmid pUC57-Huffman-text, however, the cutting site recognized by crRNA was fused to the 3′ end of the information sequence. To synthesize the guide crRNA in vitro, DNA templates were prepared that contained a T7 promoter, a 25-nt guide sequence, and a crRNA scaffold. The templates were generated by annealing a pair of primers: Ba-crRNA-F and Ba-crRNA-R for info plasmid pUC57-Base64-text, or Hu-crRNA-F and Hu-crRNA-R for info plasmid pUC57-Huffman-text. Each pair of primers was dissolved in 20 μl of sterile water at a concentration of 1 μM. The annealing process included heating at 95°C for 5 min and cooling from 95° to 4°C at a rate of −1°C/min. The annealed double-stranded DNA (dsDNA) templates were then transcribed in vitro and purified using the T7 High-Yield RNA Transcription Kit and VAHTS RNA Clean Beads purchased from Vazyme (China). For the expression of Cas12a, the plasmid pET-28b-T7-henAsCas12a-HF1 was purchased from Addgene (USA, #114073) and introduced into E. coli BL21(DE3). Subsequently, recombinant Cas12a was expressed in E. coli and purified on a His-tag column (fig. S3A). Cutting of the info plasmid by the purified Cas12a in vitro was then performed according to the manufacturer’s instructions. Briefly, the in vitro reaction containing info plasmid (12 ng/μl), Cas12a protein (50 ng/μl), and crRNA (35 ng/μl) in 10 μl of solution (containing 20 mM HEPES, 150 mM KCl, 1 mM MgCl2, and 10% glycerol) was performed at 37°C for 1 hour. The sample was then treated with 4 μg of ribonuclease purchased from Vazyme (China) and incubated at 37°C for 15 min; subsequently, the sample was treated with 2.5 μg of proteinase K purchased from Sigma-Aldrich (USA) and incubated at 55°C for 10 min. After 5 min at room temperature, the sample received 1 μl of STOP solution (30% glycerol, 1.2% SDS, and 250 mM EDTA at pH 8.0) and was incubated at 37°C for 15 min. Last, the sample was analyzed by gel electrophoresis in a 1% agarose gel (fig. S3B). 
E. coli MG1655 bearing the dual-plasmid system was incubated with kanamycin (50 μg/ml) and chloramphenicol (12.5 μg/ml) at 30°C. In the living cells, crRNA was constitutively transcribed under the control of PspeI. λRed recombinase, which was expressed under the control of Ptet, was induced by the addition of tetracycline (80 ng/ml). In the presence of l-arabinose (5 mg/ml) and the absence of glucose (20 mg/ml), Cas12a was expressed and cut the target DNA sequence in the info plasmid under the guidance of crRNA. Rewritten information was encoded in the donor DNA sequence, and 2 μl of purified donor DNA (100 ng/μl) was introduced into the dual-plasmid system. The donor DNA sequence is incorporated into the info plasmid to accomplish information rewriting. The functions of crRNA, Cas12a, λRed, and donor DNA in the dual-plasmid system were therefore tested in the crRNA-free, Cas12a-free, λRed-free, and donor DNA–free groups, respectively, by turning off each individual induction signal and incubating the system at 30°C for 1 day. The results are shown in fig. S4.
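
Choosing a cutting site for a 25-nt guide can be illustrated with a simple scan for candidate sites. The sketch assumes the canonical Cas12a PAM, TTTV (V = A, C, or G), located immediately 5′ of the protospacer on the scanned strand; the enhanced AsCas12a variant named above may accept additional PAMs, which are not modeled, and only the given strand is searched. The function name and return format are hypothetical.

```python
import re

# Minimal sketch: list candidate Cas12a target sites on one strand.
# Assumptions: canonical TTTV PAM directly 5' of the protospacer; the
# 25-nt guide length is taken from the text; reverse strand not scanned.
def find_cas12a_sites(seq, guide_len=25):
    """Return (pam_index, pam, protospacer) tuples for each TTTV PAM."""
    seq = seq.upper()
    sites = []
    # Lookahead allows overlapping PAMs (e.g. within a run of T's).
    for match in re.finditer(r"(?=(TTT[ACG]))", seq):
        start = match.start() + 4  # protospacer begins right after the PAM
        protospacer = seq[start:start + guide_len]
        if len(protospacer) == guide_len:  # skip sites truncated by the end
            sites.append((match.start(), match.group(1), protospacer))
    return sites


if __name__ == "__main__":
    toy = "GG" + "TTTA" + "ACGT" * 6 + "A" + "CC"
    for site in find_cas12a_sites(toy):
        print(site)
```

Candidate guides found this way would still need screening for secondary structure and for similarity to host sequences before use.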

Information rewriting using the dual-plasmid system

A schematic diagram illustrating the process for information rewriting using the dual-plasmid system is shown in fig. S2. The procedure is described in detail below: (i) Day 1: An individual colony containing the info plasmid and help plasmid was picked and grown in LB medium with kanamycin (50 μg/ml), chloramphenicol (12.5 μg/ml), and glucose (20 mg/ml) at 30°C with shaking at 220 rpm overnight. (ii) Day 2: 10 μl of the above bacterial culture was added to 1 ml of LB medium with kanamycin (50 μg/ml) and chloramphenicol (12.5 μg/ml) in a 1.5-ml Eppendorf (EP) tube and then cultured at 30°C with shaking at 220 rpm for 6 hours. l-Arabinose (5 mg/ml) and tetracycline (80 ng/ml) were then added to the bacterial culture. After incubation at 30°C for 2 hours, the sample was centrifuged at 4000g for 2 min at 4°C. The cells were washed two times with 1 ml of ice-cold 10 vol% glycerol solution, then suspended in 200 μl of ice-cold 10 vol% glycerol solution, and divided into 40-μl aliquots. Two hundred nanograms of donor DNA was added to 40 μl of resuspended cells for electroporation. The electroporation conditions were the same as those described in "Construction of the dual-plasmid system." After electroporation, the cells were resuspended in 1 ml of Super Optimal broth with Catabolite repression (SOC) medium and transferred into a 1.5-ml EP tube. After incubation at 30°C for 1 hour, l-arabinose (5 mg/ml) and tetracycline (80 ng/ml) were added to the bacterial culture, which was incubated for another 1 hour. The bacterial culture was then centrifuged at 4000g for 5 min at room temperature, and the collected cells were redispersed in 100 μl of LB medium, spread on a 1.5% agarose plate with kanamycin (50 μg/ml), chloramphenicol (12.5 μg/ml), and glucose (20 mg/ml), and cultured at 30°C for 48 hours. 
(iii) Days 3 to 4: Individual strains were observed under UV light to identify green fluorescent strains, which were then sequenced by the first-generation sequencing method using the universal primer M13F.
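
The quantities in the procedure above can be cross-checked with simple arithmetic. The sketch below uses the 100 ng/μl donor DNA concentration stated in the preceding section; the inducer stock concentration in the second example is a hypothetical value chosen only for illustration, and the dilution formula ignores the small volume added by the stock itself.

```python
# Minimal sketch of the volume arithmetic behind the protocol.
def volume_for_mass(mass_ng, conc_ng_per_ul):
    """Volume (in microliters) that contains a given mass of DNA."""
    return mass_ng / conc_ng_per_ul


def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """C1*V1 = C2*V2: stock volume for a target final concentration.

    Concentrations must share the same unit; the added volume is neglected.
    """
    return final_conc * final_volume_ul / stock_conc


if __name__ == "__main__":
    # 200 ng of donor DNA at 100 ng/ul (both values from the text).
    print(volume_for_mass(200, 100))
    # l-arabinose to 5 mg/ml in 1 ml, from a hypothetical 200 mg/ml stock.
    print(stock_volume_ul(5, 200, 1000))
```

The first call gives 2 μl, which matches the 2 μl of purified donor DNA reported in the functional test of the dual-plasmid system.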

Flow cytometry analysis of fluorescent strains

After information rewriting and incubation for 48 hours, all strains on the plate were harvested and suspended in 1 ml of PBS in a 1.5-ml EP tube. The cells were washed two times with 1 ml of PBS and resuspended in 1 ml of PBS. The cell suspension was filtered through a stainless steel mesh (300 mesh), and the green fluorescence attributed to GFP expression was detected with a flow cytometer. The blank control consisted of strains before information rewriting, which were likewise harvested from a 1.5% agarose plate with kanamycin (50 μg/ml), chloramphenicol (12.5 μg/ml), and glucose (20 mg/ml).
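
One common way to turn such measurements into a fraction of fluorescent cells is to set a gate from the blank control and count sample events above it. The sketch below is an assumption about the analysis, not the gating used in this study: the 99th-percentile threshold, the nearest-rank percentile rule, and the function name are all hypothetical.

```python
import math

# Minimal sketch: estimate the GFP-positive fraction by gating on a blank.
# Assumption: per-event fluorescence intensities are plain lists of numbers.
def gfp_positive_fraction(sample, blank, percentile=99.0):
    """Return (threshold, fraction of sample events above the threshold)."""
    if not sample or not blank:
        raise ValueError("sample and blank must be non-empty")
    ordered = sorted(blank)
    # Nearest-rank percentile of the blank control defines the gate.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    threshold = ordered[rank - 1]
    positives = sum(1 for value in sample if value > threshold)
    return threshold, positives / len(sample)


if __name__ == "__main__":
    blank_events = list(range(1, 101))        # toy blank intensities
    sample_events = [50, 99, 100, 150, 200]   # toy sample intensities
    print(gfp_positive_fraction(sample_events, blank_events))
```

With real data the gate would usually be set in the cytometer software after excluding debris and doublets, steps this sketch omits.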

Next-generation sequencing of the information-encoded strains

After information rewriting and incubation for 48 hours, the E. coli strains MG1655HT, MG1655HNT, MG1655HC, MG1655HNC, MG1655HI, and MG1655HNI were harvested, suspended in 200 μl of 20 vol% glycerin in a PCR tube, and stored at 4°C. The negative control was the MG1655 strain with p46Cpf1-OP2, which was cultured at 30°C overnight and stimulated with l-arabinose (5 mg/ml) at 30°C for 24 hours. Each strain was inoculated into 5 ml of LB medium, grown at 30°C for 20 hours, and collected by centrifugation for sequencing (GENEWIZ, China). For next-generation sequencing, each sample was sequenced on the Illumina HiSeq X Ten/NovaSeq/MGISEQ-2000 System. Single-nucleotide variants were annotated with Annovar (V21 April 2018) using the MG1655 complete genome as the reference (National Center for Biotechnology Information reference sequence: NC_000913.3, https://ncbi.nlm.nih.gov/nuccore/556503834?report=fasta).
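
Once variants have been called for a rewritten strain and for the negative control, one way to isolate changes specific to the rewritten strain is a set difference. The sketch below assumes a hypothetical tab-separated layout (chromosome, position, reference allele, alternate allele) and is not tied to Annovar's actual output files; whether the authors compared strains in this way is also an assumption.

```python
# Minimal sketch: variants present in a sample but absent from a control.
# Assumption: each line is "chrom<TAB>pos<TAB>ref<TAB>alt"; extra columns
# are ignored and lines starting with "#" are treated as comments.
def parse_variants(lines):
    """Parse variant lines into a set of (chrom, pos, ref, alt) tuples."""
    variants = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos, ref, alt = line.split("\t")[:4]
        variants.add((chrom, int(pos), ref, alt))
    return variants


def strain_specific(sample_lines, control_lines):
    """Return sorted variants found only in the sample."""
    return sorted(parse_variants(sample_lines) - parse_variants(control_lines))


if __name__ == "__main__":
    control = ["#chrom\tpos\tref\talt", "NC_000913.3\t1000\tA\tG"]
    sample = ["NC_000913.3\t1000\tA\tG", "NC_000913.3\t2500\tC\tT"]
    print(strain_specific(sample, control))
```

The positions and alleles in the example are invented for illustration and do not correspond to any variant reported in this study.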