Je H Lee1. 1. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
Abstract
The spatial information associated with gene expression is important for elucidating the context-dependent transcriptional regulation during development. Recently, high-resolution sampling approaches, such as RNA tomography or single-cell RNA-seq combined with fluorescence in situ hybridization (FISH), have provided indirect ways to view global gene expression patterns in three dimensions. Now in situ sequencing technologies, such as fluorescent in situ sequencing (FISSEQ), are attempting to visualize the genetic signature directly in microscope images. This article will examine the basic principle of modern in situ and single-cell genetic methods, hurdles in quantifying intrinsic and extrinsic forces that influence cell decision-making, and technological requirements for making a visual map of gene regulation, form, and function. Successfully addressing these challenges will be essential for investigating the functional evolution of regulatory sequences during growth, development, and cancer progression. WIREs Syst Biol Med 2017, 9:e1369. doi: 10.1002/wsbm.1369
Comprehensive cataloguing of the molecular and cellular heterogeneity is important for classifying regulatory pathways into functional categories; however, cataloguing alone cannot reveal how sequence‐to‐function relationships evolve over space and time. Because cells interpret functional sequences in a cell state‐, history‐, or environment‐dependent manner, retracing their decision‐making algorithm requires examining cell identities, lineage, and external cellular interactions, respectively, across a range of landscapes (Figure 1). Here, modern high‐throughput approaches are becoming indispensable for understanding how cells recapitulate complex form and function using a defined set of conserved signaling pathways, while overcoming or responding to biological noise and perturbations.
Figure 1
Molecular or cellular taxonomy alone is insufficient for understanding the functional dynamics of genetic and phenotypic evolution, as it requires analyzing how the selection pressure from the environment changes the phenotype from a common ancestor. To properly address this question, one needs to compare multiple cell lineages, cell states/types, and microenvironments in parallel. Traditionally, the genetic material was isolated from pulverized tissues, masking the cellular heterogeneity as well as their spatial context. Methods now exist to sample randomly chosen cells or from spatially defined regions; however, they all lack the precise spatial resolution for understanding cell–cell or cell–environment interactions.
For the past 100 years, optical microscopy and general tissue stains were the main tools for surveying the tissue landscape; however, elucidating the genetic mechanism required extracting the nucleic acids from pulverized tissues and destroying their spatial information. Recently, several new technologies have emerged to address the tissue heterogeneity, including single‐cell sequencing and multiplexed in situ probe hybridization.1, 2 Single‐cell RNA‐seq (scRNA‐seq) can measure the transcriptional heterogeneity and reconstruct the cellular location using the known gene expression pattern in space,3, 4, 5 while single‐molecule fluorescence in situ hybridization (smFISH) can directly visualize a modest number of genes simultaneously in situ.6, 7, 8, 9 Other approaches combine geographically defined single‐cell isolation,10, 11 laser‐assisted sampling,12 or serial tissue sectioning13 with genome14 or transcriptome profiling.15, 16
In situ sequencing is a set of proof‐of‐concept technologies that can sequence the DNA or the RNA within fixed cells without destroying their spatial context,17, 18, 19 and it can be used to directly link spatial features to particular genetic elements in native tissue specimens, as long as sequencing libraries can be constructed from the sequence‐of‐interest inside the cell. Technologically, in situ sequencing is an extension of high‐throughput DNA sequencing, in which the nucleic acid sequence is read‐out on a solid surface using optical microscopy.20, 21 In next‐generation sequencing (NGS), the temporal alignment of colors from fluorescent amplicons determines the sequence of nucleotide bases, and in situ enzyme reactions on the glass surface allow for saturation kinetics and robust molecular imaging, leading to longer reads and higher read densities.
Because FISH also involves fixing the nucleic acids onto a solid matrix, the possibility of NGS inside fixed cells had existed. In 2004, Nilsson and coworkers demonstrated a padlock probe and rolling circle amplification (RCA)‐based approach to capture, amplify, and image the DNA with single‐nucleotide resolution in fixed cells and tissues.22 Later, the Church laboratory incorporated padlock probes for targeted NGS, suggesting that the Nilsson method could be scaled up for NGS inside the cell.23, 24, 25, 26, 27 Eventually, the Nilsson laboratory demonstrated targeted RNA detection using in situ sequencing to identify a small number of transcripts and mutations in tissue sections,17, 18 which was later followed by the report of transcriptome‐wide fluorescent in situ sequencing (FISSEQ) from the Church laboratory.19, 28
At the present moment, in situ sequencing is at the proof‐of‐concept stage (e.g., known cancer mutations in tissues,18 cell culture models,19 and short barcodes29), unlike smFISH or single‐cell sequencing, but it may be able to address questions that other methods cannot if a number of technical limitations can be overcome. With that in mind, this article will focus on the biological and technological hurdles that are addressed by the previous generation of in situ hybridization and single‐cell technologies and the challenges that remain. To summarize the history of DNA sequencing and in situ methods in a couple of paragraphs will undoubtedly neglect seminal contributions from multiple groups, so this article will focus on technologies that immediately preceded or followed the development of in situ sequencing. Beyond providing a historical account, the article will also address several key questions in biology, explore whether current methods are ready to tackle these problems, and conclude with a set of technological goals for investigating the spatial dynamics and the functional evolution of gene regulation during the emergence of form and function in development.
A BRIEF WALK THROUGH MODERN IN SITU AND SINGLE‐CELL METHODS
The padlock probe‐based method from the Nilsson laboratory was a conceptually elegant approach to quantifying multiple DNA or RNA molecules with single‐nucleotide resolution in situ;17, 18, 22 however, the variable sensitivity of individual probes across different samples and gene targets required careful validation and limited its scalability. Moreover, it faced competition from smFISH, which was more uniformly sensitive and easier to implement, and which soon became the gold standard in single‐cell analysis in situ. With the lower cost of generic oligonucleotide synthesis, smFISH was also more scalable, although it lacked the exquisite single‐base specificity of the Nilsson approach2, 30 (Figure 2).
Figure 2
Quantitative methods for detecting multiple RNA molecules in situ. (a) The Nilsson method uses target‐specific reverse transcription (RT) primers (typically locked nucleic acid (LNA) derivatives) to make cDNA molecules in situ used for padlock probe‐based T4 DNA ligation.17 The intramolecular ligation reaction here is less efficient and specific than the sequencing‐by‐ligation reaction kinetics. The circular padlock probe is then amplified using rolling circle amplification (RCA) that increases the number of barcode‐binding sites by 100‐fold or more for robust imaging; however, it is not known how the physical constraints or molecular crowding around individual transcripts in tissues affect the RCA bias that is observed in fluorescent in situ sequencing (FISSEQ).19 (b) Compared to the previous method that involves long customized probes and multiple enzymatic steps, single‐molecule fluorescence in situ hybridization (smFISH) offers the unmatched sensitivity, spatial resolution, ease of use, affordability, and scalability, as long as one can optically resolve individual signals under a microscope.7 Because smFISH is so sensitive, this can be challenging for most abundantly expressed transcripts, especially at low magnification.31 In addition, smFISH does not have the single‐nucleotide specificity of the Nilsson method.
For decades, FISH was used to localize gene transcripts in situ; however, quantifying gene expression was difficult due to the variable signal‐to‐noise ratio and the nonspecific signal from nucleic acid hybridization. Singer and coworkers demonstrated that smFISH can obtain quantitative measurements of the transcript abundance in single cells and that genetic barcoding can be used to multiplex transcript detection.6, 32 Raj, along with colleagues in Tyagi and van Oudenaarden laboratories, made smFISH practical and robust by using multiple short probes that co‐localize on the transcript for a high signal‐to‐noise ratio7, 33, 34 (Figure 2). This also permitted better tissue penetration, reducing the need to optimize fixatives and partial tissue digestion in FISH experiments. In addition, these advances affirmed an important concept in NGS and in situ studies: the high specificity and sensitivity can be achieved by co‐localizing numerous short reads or probes with quality scores based on a statistical model.
Because counting individual molecules using microscopy is limited by the optical resolution, conceptual approaches in super‐resolution microscopy (SRM) were important in inspiring new iterations of smFISH. For example, localization microscopy, such as Stochastic Optical Reconstruction Microscopy (STORM), achieves higher resolution by using stochastic molecular fluorescence over many cycles,35, 36, 37 suggesting that cyclical imaging of barcoded probes may enable multiplexing in lieu of SRM38 (Figure 3). In fact, Cai and coworkers reported generating a unique temporal barcode by sequentially hybridizing different colored probes to the same transcript, enabling them to multiplex smFISH.8 Taking a step further, Zhuang and coworkers used redundancy coding to detect more than 1000 genes using smFISH in an error‐resistant manner.9 They also demonstrated faster sequential color read‐out by inactivating fluorophores with light, thereby shortening the read‐out cycle time.
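The arithmetic behind temporal barcoding, and the reason redundancy coding tolerates read‐out errors, can be illustrated with a short Python sketch. The function names, the 8‐bit word length, and the greedy codebook construction are hypothetical choices made for illustration; they are not the published probe designs. The sketch assumes only that a barcode is a sequence of per‐cycle read‐outs and that an error‐resistant codebook keeps every pair of codewords separated by a minimum Hamming distance.

```python
from itertools import product


def barcode_capacity(n_colors: int, n_cycles: int) -> int:
    """Number of distinct temporal barcodes from sequential color read-outs."""
    return n_colors ** n_cycles


def hamming(a, b) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def build_codebook(n_bits: int, weight: int, min_distance: int):
    """Greedily pick binary words of fixed weight that are pairwise
    at least `min_distance` apart (illustrative, not an optimal code)."""
    codebook = []
    for word in product((0, 1), repeat=n_bits):
        if sum(word) != weight:
            continue
        if all(hamming(word, c) >= min_distance for c in codebook):
            codebook.append(word)
    return codebook


def decode(readout, codebook, max_errors: int = 1):
    """Return the nearest codeword if it lies within `max_errors`
    bit flips of the read-out; otherwise reject the read-out (None)."""
    best = min(codebook, key=lambda c: hamming(readout, c))
    return best if hamming(readout, best) <= max_errors else None


if __name__ == "__main__":
    # Four colors over seven hybridization cycles.
    print(barcode_capacity(4, 7))
    # A small toy codebook with minimum distance 4.
    toy = build_codebook(n_bits=8, weight=4, min_distance=4)
    noisy = (1 - toy[0][0],) + toy[0][1:]  # flip one bit of a codeword
    print(decode(noisy, toy) == toy[0])
```

The trade‐off is visible in the two functions: using every possible barcode maximizes the number of addressable genes, whereas restricting the codebook to well‐separated words sacrifices capacity so that a single misread cycle can still be assigned to the correct transcript.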
Figure 3
Spatial versus temporal genetic barcoding for multiplexed RNA detection. (a) Genetic barcoding to multiplex transcript detection was first demonstrated in situ by Singer and coworkers.6 Hood and coworkers then popularized this concept and made it commercially successful for single‐molecule RNA quantification (Nanostring, Seattle, WA).39 Here, the target RNA serves as a splint that pulls down a complementary probe with a spatial barcode composed of fluorescent nucleic acid segments (~1–2 µm). Optically resolving the color sequence associated with each RNA molecule enables target identification and quantification. (b) Large barcodes cannot be used for multiplexing in single cells; however, Cai and coworkers showed that targeting the same loci repeatedly with single‐molecule fluorescence in situ hybridization (smFISH) but using different colors generates a temporal barcode.8 Using four‐color imaging and seven hybridization cycles, one could interrogate over 16,000 genes in theory, despite a number of practical challenges due to the diffraction limit of optical microscopy and the imaging time required for super‐resolution microscopy.
In theory, smFISH can be combined with SRM to profile the messenger RNA (mRNA; ~100,000 molecules per cell);40 however, the imaging time required even for a small tissue makes it impractical. Because most biologically relevant gene expression patterns occur across many microns and millimeters, SRM is akin to asking a single Google Street View driver to map the whole continent, and it is poorly suited for whole‐tissue mapping. One solution could be to have an army of automated drivers, but it would require an enormous amount of time and resources to update the map frequently across multiple continents. Compounding the problem is the fact that many RNA molecules are tightly packaged into subcellular granules, making their quantification using optical imaging challenging at any fixed resolution (Figure 4).
Figure 4
Single‐molecule fluorescence in situ hybridization (smFISH) requires spatially resolving individual molecules for quantification and multiplexing.8, 9 The addition of multiple small hybridization probes to cells and tissues generates significant nonspecific fluorescence;7 however, the co‐localization of many independent probe sequences on the same transcript can be detected as a diffraction‐limited spot, resulting in a high signal‐to‐noise ratio (SNR). But when the transcript density is too high or the imaging magnification is too low, it becomes difficult to discriminate signal from noise, rendering smFISH largely qualitative and challenging for a high degree of multiplexing.31
In contrast, a global approach (satellite view) to RNA imaging enables comprehensive and efficient mapping of whole tissues and organs, albeit at much lower resolution. For example, Deisseroth and coworkers developed CLARITY to make whole organs optically transparent for high‐resolution three‐dimensional (3D) imaging.41 Since then, other groups have shown faster and easier methods for whole‐tissue clearing,42 and recently the Deisseroth laboratory and others have demonstrated sequential FISH in optically cleared tissues using serial hybridization of modified RNA targets.43, 44 Here, a high‐resolution objective for single‐molecule imaging is not a practical option for whole‐tissue imaging; therefore, RNA detection methods that depend on single‐molecule imaging for multiplexing cannot be used,44 although potential solutions are starting to emerge.31, 45, 46
While smFISH is relatively straightforward to implement, it requires a list of candidate genes, typically based on existing annotations. As NGS has shown, unbiased sequencing often allows for new discoveries compared with targeted gene expression arrays, which are best suited for profiling already known genetic elements. For example, NGS has shown that a large fraction of the long noncoding RNA (lncRNA) is more tissue specific than the mRNA.47 In addition, many regulatory enhancers express noncoding RNAs, whose spatial distribution may provide clues for understanding gene regulation.48 Because many mRNAs still lack detailed functional annotations, targeted assays are generally limited to well‐annotated genes. In addition, massively parallel functional analysis uses expression constructs containing synthetic barcode sequences for de novo short‐read sequencing.49, 50, 51
For these and other reasons, direct RNA sequencing is preferable over targeted hybridization‐based methods in many applications (Figure 5). Over the past several years, improvements in the detection sensitivity and reliability have allowed transcriptome‐wide scRNA‐seq.15, 16 To investigate the cell‐type composition in space, Quake and coworkers used a microfluidics platform for scRNA‐seq (Fluidigm, San Francisco, CA), sequencing a couple of hundred single cells from the lung alveoli to infer their location using known anatomic biomarkers.3 Several laboratories took it a step further by integrating high‐throughput scRNA‐seq with an online database of FISH experiments to reconstruct the cellular composition of 3D tissue structures.4, 5 These approaches, however, depend on having a database of positional biomarkers, which may not be available in most cases.
Figure 5
Location‐aware sampling methods for next‐generation sequencing (NGS). For unbiased profiling of transcriptome‐wide gene expression, NGS is currently the only widely available method. Tissue samples can be dissociated into random or sorted single cells and virtually reconstructed later using the known spatial patterns of gene expression (single‐cell RNA‐seq, scRNA‐seq).4, 5 Alternatively, they can be spatially dissected [i.e., laser capture microdissection (LCM), transcriptome in vivo analysis (TIVA)],12 sectioned (RNA tomography),13 or systematically subsampled (i.e., RNA capture array)52 for RNA‐seq. Here, the practical sampling number limits the spatial resolution as the specimen size increases along multiple dimensions. Practically, only a small fraction of all possible sampling points are used for multiplexed RNA‐seq with a lower sequencing depth. In addition, the sampling noise requires pooling multiple regions or single cells together to detect subtle variations, further handicapping NGS‐based methods from achieving the single‐cell spatial resolution based on less abundant and possibly more tissue‐specific transcripts.
To enable de novo spatial reconstruction of gene expression, van Oudenaarden and coworkers developed RNA tomography, in which vertebrate embryos or organs were cut into topographically mapped 10–100‐µm sections, sequenced, and reconstructed leveraging the scRNA‐seq pipeline.13 Although one can generate thinner tissue sections, smaller tissue sections lead to more sampling noise, requiring averaging of data from adjacent sections and limiting the spatial resolution to several hundred microns for less abundant transcripts1, 53 (Figure 5). Recently, a start‐up from Joakim Lundeberg's laboratory reported a method of extracting RNA from tissue sections placed directly on a barcoded poly‐dT array, followed by NGS and computational reconstruction.52 Each spatial element captures ~10K unique mRNA transcripts per 100‐µm feature, which is several times lower than that of scRNA‐seq. For coarse multiregional sampling, however, this method may provide a less destructive alternative to microdissection (Figure 5).
For higher spatial resolution, Eberwine and coworkers developed transcriptome in vivo analysis (TIVA), in which photo‐caged biotinylated poly‐dT primers are released exactly at the point of laser excitation inside living cells.12 By pulling down the captured mRNA for sequencing, one could theoretically sequence almost any region in the cell with submicron resolution as long as enough of the identical regions are sampled. If automated, TIVA could be a promising option for studying the transcriptome composition in different subcellular compartments and cellular regions using unbiased deep sequencing; however, the cell type‐ and tissue‐specific penetration of the TIVA reagent currently limits its application.
To conclude, the imaging resolution, the transcript density, and the imaging time limit multiplexed methods that depend on single‐molecule color coding to simpler systems suited for high‐resolution microscopy. Multiplexed error‐robust fluorescence in situ hybridization (MERFISH)9 can cycle through multiple probe sets and deduce the transcript identity more accurately than other smFISH methods. Because of its reliance on single‐molecule imaging, however, profiling a large number of transcripts at the ‘‐omics’ scale is currently out of reach for MERFISH or other smFISH techniques.8, 9
Unbiased deep transcriptome profiling is a major benefit of scRNA‐seq, but the limited detection sensitivity as well as the sampling noise requires pooling of data from multiple sources to detect small variations. This in turn reduces the sequencing depth for each cell sharing a sequencing lane. RNA tomography, TIVA, RNA‐capture arrays, and other NGS‐based methods typically sacrifice the sampling resolution for the sequencing depth. In general, biological patterns or structures that occur reproducibly, such as in embryos, stable cell types, or clinically defined histologic features, benefit most from the NGS‐dependent methods, as sampling of many identical cell types or structures will enable the statistically significant detection of small variations with higher spatial resolution, whereas smFISH‐based methods allow for quantifying the spatial variation in any small specimens, but mostly for validating a limited number of preselected genes.
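The trade‐off between sampling noise and spatial resolution described above can be made concrete with a small Python sketch. It assumes, purely for illustration, that read counts per sampling unit follow Poisson counting statistics, so that the coefficient of variation (CV) of a count with expected value n is 1/√n; real data also carry capture and amplification noise that this model ignores. The function names and the example numbers are hypothetical.

```python
import math


def poisson_cv(expected_count: float) -> float:
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_count)


def units_to_pool(count_per_unit: float, target_cv: float) -> int:
    """Smallest number of sampling units (sections, spots, or cells) whose
    pooled count reaches the target CV under the Poisson assumption."""
    needed = 1.0 / (target_cv ** 2 * count_per_unit)
    # Round first so floating-point error does not push an exact
    # integer answer up to the next integer.
    return max(1, math.ceil(round(needed, 9)))


def effective_resolution_um(unit_size_um: float,
                            count_per_unit: float,
                            target_cv: float) -> float:
    """Spatial extent covered once enough adjacent units are pooled."""
    return unit_size_um * units_to_pool(count_per_unit, target_cv)


if __name__ == "__main__":
    # Hypothetical low-abundance transcript: 5 reads per 10-um section,
    # with a 10% CV required to detect a subtle difference.
    print(units_to_pool(5, 0.1))
    print(effective_resolution_um(10, 5, 0.1))
```

Under these assumptions, a transcript yielding about five reads per 10‐µm section requires pooling 20 adjacent sections for a 10% CV, which corresponds to an effective resolution of 200 µm. This is consistent with the several‐hundred‐micron limit quoted above for less abundant transcripts, although the actual figure depends on the capture efficiency and noise sources of each method.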
ISSUES IN QUANTIFYING INTRINSIC AND EXTRINSIC VARIABLES IN CELLULAR DECISION‐MAKING
In development, cells use information regarding lineage, intrinsic, and extrinsic factors to make decisions, and the genetic code specifies how such information is interpreted in different regions. The lineage information includes a historical record of master transcription factors and epigenetic modifications, whereas intrinsic factors include the cell state and the cell morphology. Extrinsic factors are composed of cell–cell or cell–extracellular matrix (ECM) interactions, such as the effect of stromal, endothelial, and inflammatory cells. The scale of cell–cell interactions can span a few microns to many millimeters, implying that the quantification of extrinsic factors needs to occur at multiple scales.
The lineage information can be obtained by observing individual cells under a microscope;54 however, this approach is limited to small and relatively transparent organisms, and it is low‐throughput. Alternative approaches include combinatorial cell labeling using fluorescent proteins or somatic mutations for retrospective phylogenetic reconstruction.55, 56 The former (i.e., Brainbow) lacks the combinatoric capacity to trace complete cell lineages, but it shows the cellular morphology, the cell–cell interaction, and the local tissue context.55 In contrast, the latter (i.e., GESTALT) can be scaled up for complete phylogeny reconstruction of whole organisms;56 however, it requires dissecting the whole organism into bulk tissues or single cells for NGS, destroying the spatial information. In situ DNA or RNA sequencing may provide a solution;18, 22, 29 however, sufficiently sensitive methods for sequencing multiple mutated loci in single cells in situ do not yet exist.
Intrinsic cellular variations can be measured using some of the existing single‐cell analysis tools. For example, scRNA‐seq can address the transcriptome heterogeneity, and the increasing throughput in combination with smFISH could make high‐resolution cell‐type mapping practical for well‐defined developmental structures.57 To put this into perspective, however, a single 1‐cm tissue mass contains up to a billion cells; therefore, scaling single‐cell analyses to large tissues or organs may be impractical. A focused approach that combines labeled cells, fluorescence‐activated cell sorting (FACS), and scRNA‐seq may better quantify cellular variations in defined cell populations.10, 11 Of note, a variety of single‐cell technologies, such as protein mass spectrometry, electrophysiology, and cellular perturbations, are all useful; however, many are relatively low‐throughput compared with NGS‐based single‐cell methods, and they are more suited for addressing specific questions rather than whole‐tissue mapping.
Unlike single‐cell variations, the environmental context does not have a well‐defined perimeter, as it can span multiple spatial scales. This is a well‐known problem in functional genomics, where genetic features are visualized across a continuum of chromosomal window sizes and resolutions. It is also relevant in epidemiology, in which the emergence of nonrandom spatial patterns can alert one to a spreading epidemic, and the statistical significance is measured across a continuum of resolutions and scales.58 Therefore, technological and statistical approaches to quantify nonrandom spatial patterns across various scales are necessary to quantify how genetic information is interpreted in a context‐dependent manner in vivo.59, 60, 61
Practically, in vitro experiments are useful for constraining extrinsic parameters, but they cannot simulate the native environment, especially across different spatial scales. On the other hand, direct sampling generates data that are dependent on the size and breadth of biopsies, and it introduces a significant amount of sampling noise at the lower end of measurement values. Pair‐wise comparisons of intrinsic and extrinsic factors are incompatible with dissociated single‐cell analysis, unless cells are labeled for reproducible sampling around individually marked cells. In contrast, in situ experiments can examine single cells and their native surroundings across a limited number of spatial scales; however, high‐throughput molecular profiling in situ requires new types of technologies that need further development. One particular method that is potentially capable of co‐localizing and quantifying lineage histories, intrinsic factors, and extrinsic forces across multiple tissue landscapes is the focus of the next section.
IN SITU SEQUENCING: KEY FEATURES, UNANSWERED QUESTIONS, AND CURRENT IMPLEMENTATION
High‐throughput NGS has been used to read genetic barcode variants for lineage tracing, single‐cell gene expression profiling, and microenvironmental gene expression profiling; however, it lacks the ability to co‐localize such information onto the native tissue landscape. Because several popular NGS platforms sequence polymerase chain reaction amplicons on glass in situ,62 it was inevitable that some would attempt NGS inside single cells.19 Sequencing flow cells, however, are very different than the cellular environment, in terms of the subcellular transcript density, the tissue thickness, and the spatial scale at which imaging is typically done. The key aspect of FISSEQ is that it can change the random transcript sampling rate during 3D imaging to quantify the gene expression heterogeneity across multiple scales and resolutions.19, 28 To do so, FISSEQ amplicons are orders of magnitude brighter than smFISH signals, as low‐magnification 3D microscopy makes single‐molecule imaging demanding. Also, FISSEQ enables programmable subsampling using in situ sequencing primers so that individual transcript sequence can be resolved at any magnification (Figure 6).
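The idea of programmable subsampling can be sketched numerically in Python. The sketch rests on an assumption that is not spelled out in the text above: that each additional fixed base placed at the end of a sequencing primer restricts priming to roughly one quarter of the amplicons, as would be expected for random sequence. The function names and the density units (amplicons per unit of imaged area or volume) are hypothetical.

```python
def retained_fraction(n_extra_bases: int) -> float:
    """Expected fraction of amplicons primed when the sequencing primer
    carries n extra fixed bases (assumes roughly random sequence)."""
    return 0.25 ** n_extra_bases


def extra_bases_needed(amplicon_density: float,
                       resolvable_density: float) -> int:
    """Smallest number of extra primer bases that brings the expected
    signal density down to what the optics can resolve."""
    if resolvable_density <= 0:
        raise ValueError("resolvable_density must be positive")
    n = 0
    while amplicon_density * retained_fraction(n) > resolvable_density:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical case: 100 amplicons per unit area, of which the
    # objective in use can resolve about 10.
    n = extra_bases_needed(100, 10)
    print(n, retained_fraction(n))
```

Under this assumption, the signal density is tuned in steps of four, so a lower‐magnification objective would simply call for a primer with more fixed bases, at the cost of sampling a smaller random fraction of the transcripts in each imaging round.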
Figure 6
Fluorescent in situ sequencing (FISSEQ) can utilize the programmable signal density to acquire a similar amount of information from multiplexed RNA detection regardless of the optical magnification or the transcript density.19, 28 In this way, small and large biological patterns can be observed in the same specimen across a range of spatial scales, similar to how modern maps display the similar density of geographical information regardless of their resolution, making the map useful at all scales.
To briefly summarize the FISSEQ workflow, which is detailed elsewhere,28 fixed cells are immersed in an enzyme cocktail with random hexamers for reverse transcription in situ. The cDNA fragments are circularized and amplified using RCA, and sequencing‐by‐ligation is performed on a four‐color confocal microscope (Figure 7). Base calling and sequence mapping against RefSeq are done for every pixel. The unmapped pixels are filtered out, and the mapped pixels are clustered using the sequence identity, allowing one to compare gene expression in multiple cellular regions, compartments, and morphologies. In the original publication,19 FISSEQ identified coding and noncoding RNAs associated with quiescent and proliferating fibroblasts, including those that regulated cellular migration and metabolism. In addition, it identified differential splicing of fibronectin in situ and detected a large shift in ribosomal RNA (rRNA) transcription in proliferating cells treated with EGF.
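The per‐pixel analysis described above (base calling, mapping, filtering of unmapped pixels, and grouping by sequence identity) can be illustrated with a toy sketch. This is not the published FISSEQ pipeline: the channel‐to‐base order, the brightest‐channel rule, the exact‐match lookup, and all names and values below are assumptions chosen for illustration, and real sequencing‐by‐ligation decoding and RefSeq alignment are considerably more involved.

```python
from collections import Counter

# Assumed mapping from the four imaging channels to bases (illustrative only).
BASES = "ACGT"


def call_base(intensities):
    """Call one base at one pixel in one cycle: the brightest channel wins."""
    brightest = max(range(4), key=lambda i: intensities[i])
    return BASES[brightest]


def call_read(pixel_cycles):
    """Concatenate the per-cycle base calls of one pixel into a read."""
    return "".join(call_base(cycle) for cycle in pixel_cycles)


def map_and_count(pixel_reads, reference):
    """Map reads by exact match, drop unmapped pixels, count pixels per gene."""
    counts = Counter()
    for read in pixel_reads:
        gene = reference.get(read)
        if gene is not None:  # unmapped pixels are filtered out
            counts[gene] += 1
    return counts


# Hypothetical 4-base reference and four pixels imaged over four cycles.
reference = {"ACGT": "GENE_A", "TTGA": "GENE_B"}
pixels = [
    [(9, 1, 1, 1), (1, 9, 1, 1), (1, 1, 9, 1), (1, 1, 1, 9)],
    [(1, 1, 1, 9), (1, 1, 1, 9), (1, 1, 9, 1), (9, 1, 1, 1)],
    [(9, 1, 1, 1), (9, 1, 1, 1), (9, 1, 1, 1), (9, 1, 1, 1)],
    [(8, 2, 1, 1), (2, 7, 1, 1), (1, 2, 9, 1), (1, 1, 2, 6)],
]

reads = [call_read(p) for p in pixels]
counts = map_and_count(reads, reference)
print(reads)
print(dict(counts))
```

In this toy input, the third pixel yields a read that is absent from the reference and is therefore discarded, which mirrors the filtering of unmapped pixels before gene‐level comparison.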
Figure 7
Fluorescent in situ sequencing (FISSEQ) converts endogenous RNA molecules in fixed cells or tissues into short cDNA fragments in situ using random hexamer‐primed reverse transcription (RT). (a) Each cDNA fragment contains a common sequencing adapter, which is then circularized prior to rolling circle amplification (RCA) in situ. RCA amplicons are then crosslinked to generate a stable 3D matrix of DNA molecules for in situ next‐generation sequencing (NGS) reactions. FISSEQ then generates 3D images containing NGS reads at each pixel for data analysis.19, 28 (b) Currently, the efficiency of RCA is not uniform across subcellular compartments, especially across different cell types.28 We hypothesize that molecular crowding or liquid droplet phase transition may contribute to such observations; however, it is not clear whether such phenomenon is responsible for the relative paucity of housekeeping genes in FISSEQ.
Interestingly, data from random hexamer‐initiated in situ sequencing libraries contain relatively few reads from major housekeeping genes despite their high abundance.19, 28 Because FISSEQ amplicons are 100–1000 times larger than RNA–DNA duplexes in smFISH, we suspect that molecular crowding or sequestration of particular transcripts impacts the RCA efficiency and imaging (Figure 7). Our ongoing experiments indicate that the efficiency of generating RCA amplicons from an smFISH probe is higher for the FISSEQ‐compatible transcripts than for β‐actin or tubulin mRNAs. Because of its potential implications for stress‐ or cell type‐associated RNA compartmentalization, our group is investigating the biological basis of this bias in different cell and tissue types, including the differential subcellular cDNA amplification that is possibly indicative of liquid‐phase transitions in the cytoplasm and the nucleus.63

From a practical point of view, dedicated FISSEQ hardware is most cost‐effective when using modular microscopy platforms capable of various 2D and 3D imaging modalities for examining subcellular regions, 3D organoids, and whole tissues at multiple resolutions and scales. While the existing protocols work well in most 2D or 3D cultured cells, RCA appears to be hindered in compact tissue sections due to the crowding effect, fixation, or embedding reagents. Limited tissue clearing may improve the RCA yield,44, 45 but it is not obvious how each clearing method will affect the detection bias and sensitivity. For now, a dedicated high‐speed spinning disk confocal microscope with four‐color imaging appears to meet our needs (~$350K, including customization). Our FISSEQ microscope from Nikon (Melville, NY) takes 1 min to acquire 25 optical slices per field‐of‐view (FOV) and images approximately 1000 FOVs per day in tiled mode using PerfectFocus (~20 million mapped FISSEQ reads). With the installation of a perfusion set‐up on the microscope stage for wash cycles, the hands‐on time is less than 5 min per sequencing cycle.
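As a rough back‐of‐envelope check on the imaging figures quoted above, the following sketch derives the implied imaging time and read yield per FOV. It assumes that the 1 min per FOV, ~1000 FOVs, and ~20 million mapped reads all refer to the same day of tiled imaging, which the text does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the text; treating them as one consistent daily run is an assumption.
MINUTES_PER_FOV = 1
SLICES_PER_FOV = 25
FOVS_PER_DAY = 1000
MAPPED_READS_PER_DAY = 20_000_000

imaging_hours = MINUTES_PER_FOV * FOVS_PER_DAY / 60   # pure acquisition time per day
reads_per_fov = MAPPED_READS_PER_DAY / FOVS_PER_DAY   # average mapped reads per FOV
reads_per_slice = reads_per_fov / SLICES_PER_FOV      # average mapped reads per optical slice

print(f"Imaging time per day: {imaging_hours:.1f} h")
print(f"Mapped reads per FOV: {reads_per_fov:.0f}")
print(f"Mapped reads per optical slice: {reads_per_slice:.0f}")
```

Under these assumptions, acquisition alone occupies roughly two‐thirds of a day, so any wash or sequencing chemistry time has to fit in the remainder or extend the run.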
While FISSEQ enables unbiased transcriptome sampling in morphologically distinct subcellular and microenvironment regions, its estimated detection sensitivity is less than 0.005% compared to smFISH.28 Yet, nonnormalized reads contain few major housekeeping genes from cultured cells19 and tissue sections (in preparation), even without depleting the rRNA.19 For example, common genes such as ACTB, TUBA1B, HSP90B1, and ribosomal proteins (e.g., RPS6, RPS23, RPS12) are significantly underrepresented in FISSEQ versus RNA‐seq compared with cell type‐specific transcripts (odds ratio = 0.002–0.02; P‐value < 10^-12).19 We hypothesize that this finding could be useful for investigating RNA compartmentalization or localization, and we are currently extending our findings to multiple different cell and tissue types (Figure 7). Regardless, this causes targeted RNA‐seq in situ to behave unpredictably depending on the cell or tissue type. Combined with the low detection sensitivity, the unpredictable or unknown nature of the FISSEQ detection bias makes selective sequencing of posttranscriptional modifications, mutations, indels, barcodes, and transcriptional reporters inefficient and prone to false negatives, especially in tissue sections. This means that results from investigating cell lineage, neural connections, and signaling reporters using in situ sequencing could reflect the technical bias rather than the biological variation. While tissue‐clearing methods may reduce the detection heterogeneity,44, 45 they could also worsen the detection sensitivity due to the prolonged nature of tissue clearing steps.

To overcome these limitations, smFISH‐like approaches are needed, in which small oligonucleotides are bound directly to the RNA template for single‐molecule imaging; however, this requires single‐nucleotide resolution for sequential sequencing reactions in situ. Our laboratory is developing Heuristic In Situ Targeted Oligopaint sequencing (HISTO‐seq), in which sequencing‐by‐ligation occurs directly on targeted RNA molecules in situ with high sensitivity and specificity using saturation sequencing enzyme kinetics (in progress). By incorporating the oligonucleotide cleavage and re‐ligation chemistry from SOLiD (ThermoFisher), we are now determining the maximum HISTO‐seq read length in situ, which is especially important for cell and lineage barcoding applications. We are also developing an approach to remove nonspecific probe binding, so that molecular quantification can occur in the absence of single‐molecule imaging. By using programmable detection primers, targeted multiplex RNA‐seq in situ with smFISH‐like sensitivity may now be possible.

To summarize, FISSEQ is useful for finding novel biomarkers and quantifying extrinsic factors in the environmental landscape in an unbiased manner; however, it lacks the sensitivity and the reproducibility for targeted RNA detection and genetic barcode sequencing in single cells. Moreover, much remains unknown about why FISSEQ detects predominantly non‐housekeeping genes, emphasizing the need to see whether this extends to other cell and tissue types. Technologies like HISTO‐seq require knowing the target genes of interest; however, if their sensitivity and ease of use can parallel those of smFISH, they can become a powerful tool for investigating the single‐cell intrinsic variation and cell lineage information with subcellular, single‐cell, and single‐base resolution in situ.
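The false‐negative concern raised in this section can be made concrete with a simple binomial sketch. It assumes that each transcript copy is converted into a detectable amplicon independently and with the same probability, which is an idealization given the detection bias discussed above. The per‐copy probability used here is taken from the figure quoted in the Conclusion (~10 targeted amplicons per ~5000 specific transcripts); the function names and the chosen copy numbers are illustrative.

```python
def expected_amplicons(n_transcripts, p_detect):
    """Mean number of amplicons under an independent, equal-probability model."""
    return n_transcripts * p_detect


def prob_no_detection(n_transcripts, p_detect):
    """Probability that none of the n transcript copies yields an amplicon."""
    return (1.0 - p_detect) ** n_transcripts


# Assumed per-copy detection probability: ~10 amplicons from ~5000 transcripts.
p = 10 / 5000

results = {}
for n in (5000, 500, 50, 5):
    results[n] = (expected_amplicons(n, p), prob_no_detection(n, p))
    print(f"{n:>5} copies/cell: expect {results[n][0]:.2f} amplicons, "
          f"P(no signal) = {results[n][1]:.3f}")
```

Under this model, an abundant target is almost always seen at least once, whereas a transcript or barcode present at tens of copies per cell is missed in the large majority of cells, which is why a targeted method with smFISH‐like sensitivity is needed for barcode and reporter readouts.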
BUILDING TOOLS TO INVESTIGATE HOW CELLS KNOW WHERE THEY ARE
One of the foundational ideas in biology is that of positional information in normal development.64 Turing first coined the term ‘morphogens’ and predicted that a simple regulatory network of diffusible molecules can generate robust patterns, suggesting how gene networks may encode patterns and forms in development. Wolpert then proposed the ‘French Flag’ model, in which the morphogen concentration gradient was interpreted differently through multiple signaling thresholds.65 While they remain some of the most fundamental ideas in biology today, testing these hypotheses in vivo has proven difficult, except in a limited number of model organisms and isolated experimental systems. Regardless, the key assumption in these idealized models is that the diffusion rate is consistent across various tissue landscapes and that all responding cells interpret a particular morphogen concentration in the same way.

Despite many pioneers in the field, however, it is still not clear how cells interpret positional information to make decisions about their growth and differentiation.66 If morphogen is one of the extrinsic factors, is the morphogen concentration the only variable in position‐dependent cell fate specification? Do intrinsic cellular variations such as the cell cycle state,67 biophysical forces,68 morphology,69 and cell–cell interactions play a role? If these decisions are also dependent on their cellular memory or lineage information that is not directly observable, how does one quantify the effect of intrinsic, extrinsic, and lineage information in cellular decision‐making (Figure 8)?
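The idealized threshold reading of a morphogen gradient described above can be sketched in a few lines. The exponential gradient shape, the two threshold values, the decay length, and the fate labels are all assumptions chosen for illustration and are not taken from the cited work. The sketch also shows why the model is sensitive to its key assumption: changing the source strength moves every fate boundary.

```python
import math


def morphogen_concentration(x, c0=1.0, decay_length=0.3):
    """Assumed exponential gradient: concentration at distance x from the source."""
    return c0 * math.exp(-x / decay_length)


def french_flag_fate(conc, high=0.5, low=0.15):
    """Every cell applies the same two thresholds to the local concentration."""
    if conc >= high:
        return "blue"
    if conc >= low:
        return "white"
    return "red"


def boundary_position(threshold, c0=1.0, decay_length=0.3):
    """Distance at which the gradient falls to the given threshold."""
    return decay_length * math.log(c0 / threshold)


# Eleven cells evenly spaced across a tissue of unit length.
positions = [i / 10 for i in range(11)]
fates = [french_flag_fate(morphogen_concentration(x)) for x in positions]
print(fates)

# A 20% stronger source shifts the blue/white boundary away from the source.
shift = boundary_position(0.5, c0=1.2) - boundary_position(0.5, c0=1.0)
print(f"Boundary shift for a 20% stronger source: {shift:.3f} tissue lengths")
```

With identical thresholds in every cell, the pattern is fully determined by the gradient, so any variation in source strength, transport, or cellular interpretation translates directly into a displaced boundary. This is the kind of fragility that measurements of intrinsic, extrinsic, and lineage factors are meant to probe.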
Figure 8
Focusing technology development around a central question enables multiple creative approaches necessary to measure specific elements, rather than simply scaling up existing technologies. For example, positional information in developmental biology has been investigated under the assumption of idealized morphogen gradients without considering cellular and environmental variations. If true, any fluctuations in the signal strength due to environmental factors can make precise tissue patterning difficult over a global scale. To understand what makes interpreting positional information robust, our laboratory is developing three distinct in situ sequencing methods capable of measuring single‐cell variations, microenvironmental heterogeneity, and cell lineages. In the future, it should be possible to perform selective knockdown of genetic pathways or optogenetic induction of morphogen signaling in vivo and quantify how intrinsic, extrinsic, and lineage‐specific factors drive location‐specific cell fate commitment and differentiation.
The location awareness of cells is critical to understanding the origin of cancer as well.70 The vast majority of cancers are derived from the epithelial cells involved in developmental patterning.71 Normally, such cells proliferate and differentiate progressively within a defined zone, and the cells that become lost undergo apoptosis.72 In cancer‐prone conditions, a disproportionate number of cells become lost and escape the geo‐location fences set up around the epithelial layer.73 Therefore, investigating how cancer cells lose the ability to choose their fate based on their location may require examining intrinsic, extrinsic, and lineage information with single‐cell resolution across various tissue regions.

Our collaborators and we are developing tools to sequence complete cell lineage information, similar to GESTALT56 but in situ, using HISTO‐seq, and we are also using it to classify single‐cell transcriptional, morphological, and behavioral variations. Such information is then superimposed on the FISSEQ data to reveal how the landscape heterogeneity affects cellular decision‐making. The most important benefit of in situ sequencing is that one can develop methods to capture other types of genetic data, including the somatic variation, the methylation signature, the DNA accessibility, and the chromatin topology. Rather than using bulk sampling‐based or context‐unaware technologies, adding the spatial dimension to these sequencing approaches could have a dramatic impact on our ability to select functionally relevant information.
CONCLUSION
The take‐home message is that methods that depend on standard NGS (i.e., scRNA‐seq, RNA tomography, LCM RNA‐seq, spatial array capture RNA‐seq) have limited spatial resolution because the higher the sampling rate (i.e., single cells, ultra‐thin tissue sections, microdissected regions), the larger the random sampling noise for less abundant transcripts, requiring pooling of data from multiple individual samples. Also, the number of sampling points increases as the specimen size increases, making the economics of high‐resolution sampling unfavorable. While shallow sequencing can provide some relief, it misses many low abundance and tissue‐specific genes, capping its ability to spatially discriminate subtle cell type‐ or state‐specific variations. These considerations make determining cell–cell and cell–microenvironment interactions challenging using NGS‐based methods.

On the other hand, multiplexed smFISH depends on single‐molecule imaging to discriminate signal from noise, hampering its application to whole‐tissue imaging using low‐magnification objectives. Because of its high sensitivity and nonspecific background probe binding, the lack of single‐molecule imaging can lead to frequent false positives. Multiplexing approaches using single‐molecule barcoding face a similar set of issues, as they are restricted by the optical diffraction limit, the practical voxel size, and the imaging time. Given these considerations, quantitatively surveying large tissues using multiplexed smFISH will likely be limited to a relatively small number of less abundant genes.

While FISSEQ is promising for investigating cell–cell and cell–microenvironmental interactions in situ, it is premature to say whether the method will be practical or informative in a wide range of cell and tissue types. FISSEQ on cultured cells, organoids, and whole‐mount embryos is straightforward; however, a protocol for a wide range of fixed tissues is not yet available. The most interesting aspect of FISSEQ is its ability to selectively detect cell type‐ and state‐specific transcripts without any human intervention, suggesting that the subcellular organization of RNA molecules could have functional implications for cell type‐ or tissue‐specific transcripts.

Currently, the low sensitivity of FISSEQ limits its application for the targeted detection of specific RNA transcripts, including those bearing genetic barcode sequences. For example, it is typical for ~5000 specific transcripts to yield ~10 targeted FISSEQ amplicons per cultured cell, resulting in a high false negative rate due to stochastic biological or technical variations. This requires developing a targeted RNA sequencing method with smFISH‐like sensitivity and single‐base resolution, such as HISTO‐seq. If successful, such approaches could allow for sequencing of various barcodes associated with cell lineage tracing, signaling pathways, promoter activities, and Cas9‐targeted gene perturbations in situ. In addition, they could allow for tracking RNA processing, transport, and compartmentalization as well as discriminating multiple odorant receptors and their connectivity based on the transcript sequence.

Finally, finding a way out of the maze of new technologies can be challenging. The real‐world information regarding the sensitivity, the specificity, the usability, the economics, and the availability requires some digging, but it is generally available; however, a self‐critical assessment of whether such technologies will actually address important biological questions is harder to find. For example, given that complex traits and cancer clonal evolution involve spatial mosaics of gene expression and somatic mutations, will cheaper sequencing of a large patient population be sufficient to map causative genetic elements? The current crop of sequencing technologies uses abundance as a metric of functional significance, which favors highly expressed transcripts over less abundant, less conserved, but often more tissue‐specific noncoding RNAs. Will scaling up single‐cell sequencing lead to a better understanding of genome regulation, especially since ultra high‐throughput single‐cell approaches focus on the less noisy and more abundant transcripts?

Cutting‐edge technology development can be as important as basic biological research because both can lead to surprising and unexpected outcomes that can revolutionize science. Given the financial and opportunity cost of technology development and large‐scale studies using such tools, however, the broad research community faces an important challenge of balancing the call to scale up the latest technologies versus the need to find the right technologies capable of addressing key biological questions. Because identifying fundamental biological questions and the variables that need to be measured is an important element of new technology paradigms, we advocate training young scientists familiar with technology development to ask fundamental biological questions, and those comfortable with rigorous experimental approaches to learn cutting‐edge technology development. In the long run, we believe that such scientists may be best prepared to utilize emerging technologies, recognize their deficiencies, and build tools necessary to tackle many of the unaddressed questions in biology.