Profiling Chromatin Accessibility at Single-cell Resolution.

Sarthak Sinha¹, Ansuman T Satpathy², Weiqiang Zhou³, Hongkai Ji³, Jo A Stratton⁴, Arzina Jaffer⁵, Nizar Bahlis⁶, Sorana Morrissy⁷, Jeff A Biernaskie⁸.

Abstract

How distinct transcriptional programs are enacted to generate cellular heterogeneity and plasticity, and enable complex fate decisions are important open questions. One key regulator is the cell's epigenome state that drives distinct transcriptional programs by regulating chromatin accessibility. Genome-wide chromatin accessibility measurements can impart insights into regulatory sequences (in)accessible to DNA-binding proteins at a single-cell resolution. This review outlines molecular methods and bioinformatic tools for capturing cell-to-cell chromatin variation using single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) in a scalable fashion. It also covers joint profiling of chromatin with transcriptome/proteome measurements, computational strategies to integrate multi-omic measurements, and predictive bioinformatic tools to infer chromatin accessibility from single-cell transcriptomic datasets. Methodological refinements that increase power for cell discovery through robust chromatin coverage and integrate measurements from multiple modalities will further expand our understanding of gene regulation during homeostasis and disease.

Keywords: Cis-regulatory elements; Epigenetics; Gene regulation; Single-cell ATAC-seq; Single-cell multi-omics

Substances: Chromatin; Transposases

Year: 2021 PMID: 33581341 PMCID: PMC8602754 DOI: 10.1016/j.gpb.2020.06.010

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

Since 1665, when Robert Hooke first described that single cells form the building blocks of complex tissues, biologists have sought to delineate cellular taxonomies based on form and function. This task, however, has proven to be extraordinarily difficult due to the remarkable diversity in function, regulation, and morphology, even between cells grouped as one type. Single-cell sequencing has become a turning point for cell biologists as it enables profiling of single cells without experimental purifications and provides an unbiased view of cells, their different states, and their interactions with neighboring cells through molecular cross-talk within an intact tissue [1], [2], [3]. Leading the revolution in single-cell measurement technologies is single-cell RNA sequencing (scRNA-seq), which has imparted significant insights into gene expression. Owing to rapid developments in technology, such as droplet-based microfluidics capable of profiling thousands of single cells by counting the 3′ ends of transcripts [4], [5], organism-scale atlases have started to disentangle transcriptional heterogeneity in cells comprising complex tissues [6], [7], [8], [9], [10], [11], [12], [13], [14]. Although powerful in resolving cell-specific gene programs, scRNA-seq fails to capture the diverse and dynamic chromatin landscape that regulates gene expression. Waddington famously conceptualized all cells as derivatives of a single totipotent cell canalized through different troughs in the sinuous epigenetic landscape, similar to a ball (cell) rolling down a hill [15]. Analysis of transcriptomes, detached from properties of chromosomes that form this epigenetic landscape, is akin to studying the features of the ball without appreciating its relationship and residence on this hill.
Indeed, bulk assay for transposase-accessible chromatin (ATAC-seq) [16], [17] and Deoxyribonuclease I digestion (DNase-seq) [18] have revealed that chromatin accessibility is a key component of the epigenetic landscape. By exposing different promoters, enhancers, and regulatory elements, cells enable transcription factors (TFs) to regulate gene expression and to guide the cell through bifurcating valleys in the epigenetic landscape. For example, during development, specialized cells progressively diverge from their progenitors by gaining or losing chromatin accessibility [19], [20]. At any given time point through this process, measurements of chromatin state can impart insights into a cell’s past (lineage relationships) and provide a window into its future (terminal fate) that cannot be captured by gene expression alone [20]. In contrast to developmental trajectories, tumorigenesis is a reversion towards a de-differentiated state as transformed cells reactivate chromatin states reminiscent of embryonic stem cells [20], [21]. However, their upward climb on the epigenetic landscape is haphazard as they also adopt chromatin profiles (and subsequently activate gene programs) of noncognate lineages [20]. Is there an example of epigenomic reversion where cells truly de-differentiate by climbing the same hill they previously rolled down? This would imply retention and maintenance of information regarding prior fate decisions. A number of models have emerged to test this hypothesis in tissues like the skin [22], [23], jaw [24], digit tips [25], and the intestine [26], where injury induces an embryonic phenotype, enabling the tissue to regenerate and restore function. Fully appreciating the fundamental mechanisms by which cells navigate the epigenetic landscape requires techniques that capture cell-to-cell chromatin variations in defined populations.
To this end, epigenomic techniques such as ATAC-seq [27], [28], ChIP-seq [29], Hi-C [30], [31], and DNase-seq [32] have recently been developed to assess chromatin structure at single-cell resolution. While the sensitivity of ChIP-seq traditionally required large (~ 10⁷ cells) inputs, recent developments such as nano-ChIP [33], iChIP [34], MOWChIP [35], SurfaceChIP [36], and single-cell ChIP using droplet separation termed Drop-ChIP [37] have significantly reduced the input material. Impressively, Drop-ChIP profiled post-translational histone modifications that either enable (H3K4me3) or repress (H3K27me3) transcription at single-cell resolution with relatively high coverage (~ 1000 unique reads per cell) [37]. This review focuses on single-cell ATAC-seq as it has demonstrated significant promise in capturing large cell numbers (up to 100,000 single cells from 13 rodent tissues profiled at 9000–23,000 unique reads per cell depth [38]) to create organism-scale atlases comparable to those created using scRNA-seq. These datasets are now being integrated with single-cell transcriptomics and proteomics to generate multi-omic measurements. This review catalogs and compares recently developed scATAC-seq technologies and analysis tools that can be exploited to delineate principles of gene regulation and features of regulatory sequences, and extrapolates on what may be possible by integrating chromatin accessibility with other single-cell measurements.

scATAC-seq technologies

scATAC-seq library construction

The key element employed in ATAC-seq for detecting nucleosome-free regions in the genome is Tn5 transposase, a dimer of two chemically identical monomers that typically catalyzes the movement of transposons to different parts of the genome through a “cut-and-paste” mechanism (Figure 1). Since endogenous transposase is relatively inactive (as transpositions are highly mutagenic), a hyperactive version of the Tn5 transposase [17], [39] is first loaded with sequencing adapters to create a dimeric transposome complex (Figure 1A), and then introduced into intact nuclei where it simultaneously cuts exposed DNA and ligates the loaded sequences (Figure 1B). Since tightly packed heterochromatin has high steric hindrance, these sites remain inaccessible to transposase, making fragmentation at these sites less probable. Following fragmentation, transposase inserts its adaptor payload to permanently mark active regulatory sites. Adapter-ligated fragments are isolated, and inserts that contain two different ends (s5 and s7) can be extended to add identifiers such as cell barcodes and sample indexes [16], [40] (Figure 1B and C). Highly accessible regions return significantly more sequencing reads that form peaks when called with peak-calling algorithms like ZINBA [41] (Figure 1D and E). Interestingly, close inspection of bulk ATAC peaks also reveals subregions (usually a few bp in length) that escaped tagmentation because they were actively occupied by DNA-binding proteins; these are considered “footprints” of the bound proteins [16]. Bulk ATAC became a useful method for profiling accessible chromatin since: 1) its measurements are highly correlated with single- and double-cut DNase-seq [16], [42], 2) libraries can be generated from 500 cells (compared to >100,000 needed for DNase-seq), and 3) library generation requires two hours (compared to multiday DNase-seq preparation) [16].
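The requirement that an insert carry two different ends (s5 and s7) can be illustrated with a short calculation. The sketch below is only a toy model: it assumes each fragment end is tagged independently, and that s5 and s7 are picked with fixed probabilities (equal by default, mirroring the equal loading of the two oligos). The function name and the independence assumption are ours, not part of any published protocol.

```python
from fractions import Fraction
from itertools import product

ADAPTERS = ("s5", "s7")


def amplifiable_fraction(p_s5=Fraction(1, 2)):
    """Expected fraction of fragments flanked by two different adapters.

    Toy assumption: each end of a fragment is tagged independently,
    receiving s5 with probability p_s5 and s7 otherwise.
    """
    prob = {"s5": p_s5, "s7": 1 - p_s5}
    total = Fraction(0)
    # Enumerate the four (left, right) end combinations and keep the
    # two mixed ones (s5/s7 and s7/s5), which are the amplifiable products.
    for left, right in product(ADAPTERS, repeat=2):
        if left != right:
            total += prob[left] * prob[right]
    return total


print(amplifiable_fraction())  # 1/2 under equal loading
```

Under this simplification, equal loading maximizes the share of usable fragments, and any imbalance between the two adapters lowers it.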

Figure 1

ATAC-seq probes genome-wide chromatin accessibility using hyperactive Tn5 transposase

A. Schematic illustrating hyperactive Tn5 being loaded with sequencing adapters by mixing equal amounts of two indexed oligos (s5 and s7) with Tn5 and incubating the mixture for approximately one hour. B. During Tn5 tagmentation (fragmentation and tagging), the transposase cleaves accessible DNA and attaches adaptor overhangs within intact nuclei. Since nuclei are not fragmented in this process, bulk Tn5 tagging can be performed in scATAC reactions prior to partitioning tagged nuclei. C. Tagmentation generates three different products: 1) sequence with s5 at both ends, 2) sequence with s7 at both ends, or ideally, 3) sequence with s5 and s7 at opposite ends (as shown in the diagram). Only the third product (containing different ends) is amplifiable. The final library is generated by appending additional identifiers such as cell/sample-specific barcodes using PCR. D. scATAC libraries are paired-end sequenced and mapped to a reference genome. E. Peak-calling algorithms identify enriched (peak) regions which correspond to open chromatin states. ATAC-seq, assay for transposase-accessible chromatin using sequencing; scATAC, single-cell assay for transposase-accessible chromatin.

To resolve chromatin accessibility at a single-cell resolution, the Shendure and Chang/Greenleaf laboratories modified methods that were originally developed for single-cell transcriptomics and de novo genome assembly to tag transposed DNA fragments from the same cell with a unique molecular barcode (Figure 2) [27], [28]. The two groups employed different molecular methods to achieve this.
The study by Shendure’s group profiled >15,000 single cells using combinatorial cellular indexing, where nuclei were first tagged with barcoded Tn5 transposase in a 96-well plate, then pooled and diluted, and 15–25 nuclei were redistributed to each well of another plate where a second barcode was added during PCR with primers targeting Tn5 adapters (Figure 2A) [28]. They reasoned that this “split-pool” approach would generate a large array of barcode combinations, and that all fragments originating from the same cell could be identified by their shared combinatorial barcode. Although this approach overcame the need to compartmentalize single nuclei individually, the libraries had low complexity (number of unique fragments recovered) because they were prepared from thousands of single cells sequenced to an average depth of only 2500 reads per barcode. The Chang/Greenleaf laboratories programmed Fluidigm’s microfluidics chip to capture, transpose, and amplify DNA fragments from a single nucleus captured in a microchamber (Figure 2B) [27]. This approach sampled fewer cells (~ 1600), but each cell received roughly 30 times the sequencing coverage, for an average of 73,000 reads per cell, generating at least a two-fold increase in library complexity. However, even with this approach, only 10% of all ubiquitously open regions were recovered from a single cell [43]. Notwithstanding limitations of sparse coverage, combinatorial indexing has been employed to profile accessibility from 13 different tissues (~ 100,000 cells) [38], developing Drosophila embryos (23,085 cells) [44], mouse forebrains (~ 15,000 cells) [45], and cortices of mice with Down Syndrome (13,766 cells) [46].
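The barcode-sharing risk inherent to the split-pool design can be estimated with a simple occupancy calculation. The sketch below assumes that each nucleus sorted into a second-round well carries one of 96 first-round barcodes chosen uniformly at random, and it approximates the collision rate by a ratio of expected values. Both the function name and this simplified model are our own assumptions, so the result should only be expected to fall in the same range as published estimates, not to reproduce them exactly.

```python
def collision_fraction(nuclei_per_well, n_barcodes=96):
    """Approximate fraction of observed barcode combinations in one
    second-round well that are shared by two or more nuclei.

    Assumption: first-round barcodes are uniformly and independently
    distributed among the nuclei sorted into the well.
    """
    miss = (n_barcodes - 1) / n_barcodes
    # Expected number of first-round barcodes carried by exactly one nucleus.
    singletons = nuclei_per_well * miss ** (nuclei_per_well - 1)
    # Expected number of first-round barcodes carried by at least one nucleus.
    occupied = n_barcodes * (1 - miss ** nuclei_per_well)
    return 1 - singletons / occupied


for n in (15, 25):
    print(f"{n} nuclei per well: {collision_fraction(n):.1%} shared barcodes")
```

The calculation makes the trade-off explicit: sorting more nuclei per well raises throughput, but the fraction of barcode combinations that represent more than one cell rises with it.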

Figure 2

Schematic diagrams showing methods for transposition, barcoding, and library preparation for scATAC-seq

A. In combinatorial indexing, nuclei are first tagged in bulk via barcoded Tn5 transposase in a 96-well plate. Then, cells are pooled, and 15 to 25 nuclei are randomly sorted into each well of another 96-well plate where a second barcode is added during PCR. The probability for two cells to share the same combination of barcodes is between 6% and 11%. B. Micro-chamber capture utilizes a plate with thousands of non-adherent, barcode-containing microwells with a central and a serpentine microfluidic flow path. The first cell entering a microwell gets trapped in the central path and blocks entry of subsequent cells, forcing them to take the serpentine path and be captured by a downstream chamber. The cell trapped in the central path is subsequently subjected to lysis, transposition, and downstream library construction within the chamber. C. Nanodispensers use non-contact dispensing to place a single cell (stained with live/dead Hoechst stain) into a nanowell containing preprinted barcodes. Only wells containing a dispensed cell (approximately one third of the 5184 nanowells) are transposed to generate sequencing libraries. D. In Drop-seq, transposed nuclei are compartmentalized into nanoliter-sized aqueous droplets containing unique barcodes that are carried in a continuous oil phase.

Recently, two additional microfluidic capture strategies have been developed. First, Takara Bio’s SMARTer ICELL8 platform employs individually indexed nanoliter-scale wells that can theoretically enable 5184 ATAC reactions to run in parallel (Figure 2C) [47]. A particular advantage of this platform is the integration of multi-color fluorescence imaging to determine wells that received single cells to selectively add reagents, thereby reducing library preparation/sequencing cost to ~ 81 per cell.
Integration of imaging may also foreshadow methods that enable simultaneous imaging of accessible chromatin to reveal its positional identity within the native configuration of the nucleus, similar to its bulk counterpart ATAC-see [48]. Second, Bio-Rad’s SureCell® ATAC kit (http://www.bio-rad.com/en-ca/life-science-research/news/bio-rad-launches-scatac-seq-solution-for-early-access-customers?vertical=LSR&ID=Bio-Rad-Launches-scA_1537378901) [49] and 10x Genomics’ scATAC solution [40] employ microfluidics to partition up to 10,000 transposed nuclei into nanosized droplets, where content from each droplet is distinguished by one of ~ 750,000 unique barcodes (Figure 2D). Using the 10x technology, Satpathy and colleagues demonstrated that up to 6000 single cells can be individually barcoded whilst maintaining a reasonable multiplet rate (~ 1%), an approach that costs ~ 25 per cell [40]. Impressively, Lareau and colleagues further extended Bio-Rad’s microfluidics platform by combining droplet capture with combinatorial indexing (termed droplet-based sciATAC-seq) to enable simultaneous profiling of ~ 100,000 cells per microfluidic run. This combined approach profiled 502,207 cells from diverse mammalian and human tissues, including the rodent brain (~ 39 K cells), human blood (~ 60 K cells), and experimentally perturbed cell populations (~ 76 K cells) [49]. Since both commercial kits leverage their widely adopted barcoding platforms, ddSEQ Single-Cell Isolator and the Chromium System, an advantage for existing customers is that the scATAC-seq assay can be easily integrated into existing infrastructure. Finally, it is worth recognizing that scATAC is still in its infancy and its evolution is constrained by what is computationally and technologically feasible. One example is whether single cells or single nuclei should be the starting point for ATAC assays.
Currently, scATAC-seq technologies have been optimized to isolate single nuclei to reduce mitochondrial reads and downstream sequencing costs (https://www.10xgenomics.com/solutions/single-cell-atac/) [42], [50]. However, two recent reports reveal that the 10- to 100-fold higher mutation rate in mtDNA (compared to nuclear DNA) can be employed as a natural barcode to reconstruct clonal architecture and lineage relationships in complex organisms [51], [52]. Impressively, both studies demonstrate mitochondrial mutation tracking in human tissues and tumors, a system where there is currently no method for fate mapping and studying clonal dynamics. Indeed, how scATAC will be further customized to impart novel single-cell measurements and go beyond deciphering causality between gene expression and chromatin accessibility are important open questions. In this regard, we highlight recent developments (Table 1) and extrapolate on what may be possible.

Table 1

Summary of molecular methods for single-cell isolation and ATAC library preparation

Note: PCR, polymerase chain reaction; CRISPR, clustered regularly interspaced short palindromic repeats; sgRNA, single guide RNA; TCR, T cell receptor; TRA, T cell receptor alpha; TRB, T cell receptor beta; sciATAC-seq, single-cell combinatorial indexing; μATAC-seq, nano-well scATAC-seq; SMARTer, switching mechanism at the 5′ end of the RNA transcript; Perturb-ATAC, perturbation-indexed scATAC-seq; T-ATAC-seq, transcript-indexed ATAC-seq; scPi-ATAC-seq, single-cell protein-indexed ATAC; scCAT-seq, single-cell chromatin accessibility and transcriptome sequencing; sciCAR-seq, single-cell combinatorial indexing-based chromatin accessibility and mRNA.

Customizations to scATAC-seq library construction

To corroborate chromatin accessibility with a cell’s function and to quantitate the impact of genetic perturbations on the epigenome, the Chang and Khavari laboratories repurposed a candidate reverse transcription (RT)-based approach previously paired with single-cell transcriptomics (Figure 3) [53], [54], [55]. Briefly, this approach, referred to as “Indexed Single-cell Seq,” seeks to enrich specific amplicons during PCR steps of library construction to provide an orthogonal measurement of the cell’s state or of the perturbations introduced.

Figure 3

Customizations to scATAC-seq enable high-throughput CRISPR screening and T cell clonotyping

A. Perturb-ATAC maps the impact of CRISPR perturbation on chromatin accessibility in single cells. First, cells are transduced by sgRNA vectors containing a reporter sequence. FACS-enriched cells are captured on microchambers (Figure 1C) and transposed with Tn5 enzyme. Following transposition, CRISPR sgRNAs are reverse transcribed using primers targeting the common 3′ end of sgRNA vectors. sgRNA and ATAC amplicons are amplified, pooled, sequenced, and analyzed for changes in TF features following genetic perturbations. B. T-ATAC-seq simultaneously profiles chromatin accessibility and TCRs in single T cells. Single CD4+ T cells are captured on microchambers (Figure 1C) where they are lysed, and their accessible chromatin is transposed with Tn5 enzyme. TRα and TRβ transcripts (TRA and TRB) are reverse transcribed with primers targeting TRA and TRB, and ATAC amplicons are PCR amplified with well-specific barcodes, pooled, and sequenced. TF, transcription factor; sgRNA, single guide RNA; CRISPR, clustered regularly interspaced short palindromic repeats; FACS, fluorescence-activated cell sorting; Perturb-ATAC, perturbation-indexed scATAC-seq; T-ATAC-seq, transcript-indexed ATAC-seq; TCR, T cell receptor; TRA, T cell receptor alpha; TRB, T cell receptor beta; RT, reverse transcription; CDR3, complementarity-determining region 3.

By recovering the single guide RNA (sgRNA) through RT of an index called “guide barcode” during a microfluidics-based scATAC-seq run, Rubin and colleagues identified regions of the genome deleted in individual Cas9-expressing cells, and then analyzed the impact of those deletions on the regulatory dynamics broadly (Figure 3A) [56].
Since sgRNAs can target any region of the genome, this combinatorial deletion approach, termed perturbation-indexed single-cell ATAC-seq (or Perturb-ATAC), can unearth new insights into how coding (i.e., TFs, chromatin regulators) and noncoding genomes enact distinct chromatin regulatory networks in each cell. Already, a knockout screen on differentiating keratinocytes in vitro has revealed positive and negative regulation amongst interacting TFs and delineated how differentiation trajectories can be rerouted as a result of TF perturbation. These analyses also revealed conceptual models of TF interactions that can only be observed by perturbing TFs in pairs. For example, while the majority of TF interactions were additive (no interaction between the pair perturbed), a number of them were synergistic (positive interaction) or buffering (negative interaction). Further measurements supported the notion that genomic co-localization and co-binding of TF pairs may regulate synergistic TF activation, a mechanism that may be especially important for activating key regulatory genes that, in turn, enact a distinct cell state. Combining Perturb-ATAC with other candidate index or mitochondrial lineage tracing methods could yield further insights. For example, it is theoretically possible to combine Satpathy and colleagues’ TCR sequencing with ATAC approach (termed ‘T-ATAC-seq’, Figure 3B), which enables studying chromatin landscapes in clonal T cells [57], with Perturb-ATAC [56] and mitochondrial lineage tracing [51]. Since T cells exhibit particularly high clonal dynamics, and since understanding these dynamics is critical in settings such as tumor-infiltrating T lymphocytes to boost anti-tumor response, simultaneous assessments of perturbations that enhance T cell efficacy alongside TCR and epigenome/transcriptome sequencing would be critical for identifying the repertoire and regulatory drivers that can be targeted.
Indeed, Yost, Satpathy, and colleagues forecast the potential of such multiomic T cell profiling approaches, as their combination of TCR and transcriptome sequencing revealed novel repertoires of T cell clones that may be suitable candidates for modulation in skin cancers [58].
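The distinction drawn above between additive, synergistic, and buffering TF pairs can be made concrete with a small classifier. This is a minimal sketch rather than the statistical procedure used in the Perturb-ATAC study: it assumes each perturbation effect is summarized as a single signed number on a common scale (for example, a change in an accessibility score), that the additive expectation is the sum of the two single-perturbation effects, and that a hypothetical tolerance separates "no interaction" from a real deviation.

```python
def classify_interaction(effect_a, effect_b, effect_ab, tolerance=0.1):
    """Label a TF pair from single- and double-perturbation effects.

    effect_a, effect_b: effects of perturbing each TF alone.
    effect_ab: effect of perturbing both TFs together.
    tolerance: hypothetical threshold below which the deviation from
        the additive expectation is treated as noise.
    """
    expected = effect_a + effect_b      # additive (no-interaction) model
    deviation = effect_ab - expected
    if abs(deviation) <= tolerance:
        return "additive"
    # Positive deviation: the pair does more than the sum of its parts.
    return "synergistic" if deviation > 0 else "buffering"


print(classify_interaction(1.0, 1.0, 2.0))  # additive
print(classify_interaction(1.0, 1.0, 3.0))  # synergistic
print(classify_interaction(1.0, 1.0, 1.2))  # buffering
```

In practice the threshold would need to reflect measurement noise, and effects of opposite sign would require a more careful definition than this sign-based rule.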

Methods integrating scATAC-seq with other omic approaches

Deciphering regulatory associations between different genomic layers can impart novel perspectives on how cellular information flows from one layer to the next [59]. This approach, applied to single cells, can further reveal the layering of regulatory controls (and its evolution over time) that underlies complex processes such as fate choice during development or disease. Combining single-cell chromatin accessibility measurements with transcriptomics is of particular interest, as their joint analysis can reveal novel cis-regulatory sequences (non-coding DNA regions which modulate TF activity) and help catalog their impact on gene expression (Figure 4). This is particularly important as these regulatory sequences may play a more important role than their gene-coding counterparts [60], [61] and their purposeful correction/modulation may be crucial for regulating cell function [62].

Figure 4

Methods for single-cell multi-omics that integrate chromatin accessibility with proteomics and transcriptomics

A. scCAT-seq separates the nucleus and the cytoplasm from single cells sorted in a 96-well plate. The cytoplasm is subjected to full-length transcript capture using Smart-seq2 and the nucleus to transposition, and both are marked by a barcode unique to each well. B. sci-CAR-seq profiling starts with nuclei distributed in a 96-well plate. First, nuclear RNA is indexed by reverse transcribing poly(A) mRNA with a poly(T) primer carrying a well-specific barcode and a UMI. Then, accessible chromatin is indexed with transposase carrying a well-specific barcode. All nuclei are pooled, and 15 to 25 are randomly sorted into each well of another 96-well plate where a second barcode is added during indexed PCR for RNA-seq or for ATAC-seq. Amplicons from both libraries are pooled and sequenced. C. scPi-ATAC-seq starts with fixed and permeabilized cells that are subjected to antibody staining and bulk transposition. Cells are then sorted into a 96-well plate where fluorescence emitted by antibodies is quantified, proteins are reverse crosslinked, and barcodes are added by indexing PCR. scCAT-seq, single-cell chromatin accessibility and transcriptome sequencing; sci-CAR-seq, single-cell combinatorial indexing-based chromatin accessibility and mRNA; scPi-ATAC-seq, single-cell protein-indexed ATAC-seq; UMI, unique molecular identifier.

To this end, two methods have been developed to simultaneously profile chromatin and transcriptome from single cells. The first method, called single-cell Chromatin Accessibility and Transcriptome Sequencing (scCAT-seq), sorts single cells into a 96-well plate, where the cell membrane is lysed and the nucleus separated from the cytoplasm due to physical dissociation (Figure 4A) [63].
Physical separation of the nucleus and the cytoplasm enables simultaneous transposition of the nucleus and full-length transcript capture from the cytoplasm using Smart-seq2. This generates high-confidence regulatory interactions for a small number of cells. The second method, called single-cell combinatorial indexing-based profiling of chromatin accessibility and mRNA (sci-CAR), trades the high sensitivity of Smart-seq2 for the high throughput offered by combinatorial indexing [64]. The key modification to the pool-sort strategy used for ATAC alone [28] is the addition of an RNA-seq index through RT that contains: 1) a poly(T) sequence to capture transcripts, 2) a well/pooling round-specific barcode to trace the origin of transcripts, and 3) a unique molecular identifier (UMI) to unambiguously quantitate the number of transcript copies recovered (Figure 4B). Finally, although there is a paucity of methods that combine chromatin with comprehensive proteomics, the recently developed Protein-indexed ATAC (Pi-ATAC) seeks to correlate chromatin accessibility to phenotypes that are delineated based on protein markers [65]. By measuring fluorescence from antibodies raised against epitopes of interest in each well and using well-specific barcodes for ATAC fragments, this approach can generate phenotype-resolved accessibility signatures (Figure 4C). While such a method is limited by the number of resolvable fluorophores, it is well-suited for systems where (relative) expression of key markers robustly identifies cell states, which can then be correlated to their chromatin status. Albeit in its infancy, co-assaying the ‘input’ (chromatin accessibility) alongside gene or protein ‘outputs’ will reveal the combinatorial logic that generates diversity and dynamics in biological systems and eventually inform in silico models that predict functional impacts of perturbations to (non)coding regions of the genome [66], [67].
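The role of the UMI in the RT index described above can be shown with a minimal counting sketch. It assumes reads have already been parsed into (cell barcode, UMI, gene) tuples; the tuple layout, the function name, and the toy sequences are hypothetical and are not the output format of any particular pipeline. Reads sharing the same cell barcode, gene, and UMI are treated as PCR copies of one captured transcript.

```python
from collections import defaultdict


def count_transcripts(reads):
    """Collapse PCR duplicates and count unique transcripts.

    reads: iterable of (cell_barcode, umi, gene) tuples (assumed layout).
    Returns a dict mapping (cell_barcode, gene) to the number of distinct UMIs.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Toy input: the first two reads are PCR duplicates of one molecule.
reads = [
    ("AAAC", "TTGCA", "geneA"),
    ("AAAC", "TTGCA", "geneA"),
    ("AAAC", "GGATC", "geneA"),
    ("CCGT", "TTGCA", "geneA"),
]
print(count_transcripts(reads))
# {('AAAC', 'geneA'): 2, ('CCGT', 'geneA'): 1}
```

Exact matching is the simplest possible rule; real pipelines typically also tolerate sequencing errors within barcodes and UMIs, which this sketch ignores.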

scATAC-seq analysis

Computational challenges in scATAC-seq analysis

In comparison to bulk chromatin accessibility, where cell ensembles exhibit a range of signal intensities across accessible regions, data from single cells are essentially binary. Although it is possible to have multiple Tn5 transposase insertions at one site in a single cell, such events are rare [27], so binarization of the read count matrix is often the first processing step. By the same account, if a site receives no read, it is difficult to discern whether it is truly an inaccessible chromatin region or a site that was missed by the transposase or lost during amplification [68]. scATAC-seq data analysis is also more challenging than scRNA-seq analysis because its data matrices are considerably larger and much sparser. Since scATAC-seq matrices can contain counts from hundreds of thousands of regulatory sites (compared to the roughly 20,000 protein-coding genes typically assessed in RNA-seq) across thousands of single cells, the result is large datasets containing sparsely mapped reads. Similar to tackling the dropout problem in scRNA-seq data, where only a small percentage of the transcriptome is retrieved for each cell, resulting in a sparse gene-barcode matrix, an antidote to scATAC-seq sparsity is aggregating signals across similar cells and across sites that share a common feature [27], [28]. However, aggregating cells results in a loss of the single-cell nature of scATAC-seq. Moreover, grouping sites by a common feature (e.g., a TF motif) requires significant a priori knowledge. Despite these challenges, scATAC-seq tools can: 1) perform discrete and continuous cell groupings [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], 2) identify cis-regulatory regions and their target genes [76], [83], and impressively, 3) integrate single-cell chromatin datasets with other single-cell measurements, such as scRNA-seq [84], [85], [86].
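To make the binarization step and the resulting sparsity concrete, the following is a minimal sketch in plain Python. The peak-by-cell matrix and the function names `binarize` and `sparsity` are illustrative assumptions and are not taken from any published scATAC-seq tool.

```python
# Toy peak-by-cell count matrix: rows are peaks, columns are cells.
# The values are invented purely for illustration.
counts = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]


def binarize(matrix):
    """Map every non-zero count to 1 (accessible) and keep zeros as 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


def sparsity(matrix):
    """Return the fraction of matrix entries that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


binary = binarize(counts)
print(binary)            # binarized accessibility matrix
print(sparsity(binary))  # fraction of zero entries
```

Note that a zero in the binarized matrix remains ambiguous: it may reflect closed chromatin or simply a missed insertion, which is why downstream steps typically aggregate across cells or sites.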

Bioinformatics workflow for scATAC-seq analysis

A full comparative overview of bioinformatic tools that are accessible through open code repositories, either published through a peer-reviewed stream (9/19) or uploaded on preprint servers (10/19), is provided in Table 2. Briefly, scATAC-seq processing comprises the following five sequential steps.

Table 2

Summary of open source bioinformatic tools for scATAC-seq analysis

Note: CRAN, comprehensive R archive network; scABC, single cell Accessibility Based Clustering; SOM, self-organizing maps; coupled NMF, coupled nonnegative matrix factorizations; STREAM, single-cell trajectories reconstruction, exploration and mapping; Scasat, single-cell ATAC-seq analysis tool; Destin, detection of cell-type specific difference in chromatin accessibility; SCRAT, single cell R analysis toolkit; BROCKMAN, brockman representation of chromatin by K-mers in mark-associated nucleotides; SCENIC, single-cell rEgulatory network inference and clustering; TFBS, transcription factor binding site; MATCHER, manifold alignment to CHaracterize experimental relationship; BIRD, Big data regression for predicting DNase I hypersensitivity.

Processing raw sequencing data

This includes trimming sequencing adapters, eliminating poor quality reads, mapping paired-end reads, eliminating cells with a library size that falls below a chosen threshold, and aggregating cell barcodes for downstream processing. For users employing a commercial platform like 10X Genomics, accompanying analysis pipelines such as CellRanger-ATAC (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/what-is-cell-ranger-atac) may readily perform these steps.
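The library-size filter mentioned above can be sketched as follows. This is a simplified illustration only: the barcodes, fragment counts, and the threshold of 1,000 fragments are hypothetical, and an appropriate cutoff depends on the dataset and platform.

```python
def filter_cells_by_library_size(fragments_per_cell, min_fragments):
    """Keep only barcodes whose fragment count meets the chosen threshold."""
    return {
        barcode: n_fragments
        for barcode, n_fragments in fragments_per_cell.items()
        if n_fragments >= min_fragments
    }


# Hypothetical barcodes and per-cell fragment counts.
fragments_per_cell = {"AAAC": 12000, "AAAG": 300, "AACT": 5400}

# The threshold is an assumption chosen for this toy example.
kept = filter_cells_by_library_size(fragments_per_cell, min_fragments=1000)
print(sorted(kept))
```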

Pre-defined feature selection

Typically, the complete list of chromosomal locations is not inspected for differential accessibility. Instead, the overall variance in a dataset is reduced by restricting the analysis to pre-defined motifs or a list of genomic/gene set annotations. For example, chromVAR, one of the first scATAC-seq tools developed by the Greenleaf lab, aggregated reads based on known TF motifs. This aggregation converted sparse accessibility by cell barcode matrix to a more stable, bias-corrected deviation per TF motif matrix, enabling cell-to-cell similarity measurements by analyzing gain or loss of accessibility within defined genomic features [69].
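The idea of collapsing a sparse peak-by-cell matrix into a denser motif-by-cell matrix can be illustrated with the sketch below. It computes, for each cell, the fraction of accessible peaks that carry a given motif. This is a deliberately simplified stand-in and does not reproduce chromVAR's bias-corrected deviation scores; the motif names, peak indices, and the function `motif_accessibility` are hypothetical.

```python
def motif_accessibility(binary, motif_to_peaks):
    """Per-cell fraction of accessible peaks that contain each motif.

    binary: peak-by-cell matrix of 0/1 values (rows are peaks).
    motif_to_peaks: dict mapping a motif name to the row indices of peaks
    annotated with that motif.
    """
    n_cells = len(binary[0])
    # Total number of accessible peaks in each cell, used for normalization.
    totals = [sum(row[cell] for row in binary) for cell in range(n_cells)]
    scores = {}
    for motif, peaks in motif_to_peaks.items():
        per_cell = []
        for cell in range(n_cells):
            hits = sum(binary[peak][cell] for peak in peaks)
            per_cell.append(hits / totals[cell] if totals[cell] else 0.0)
        scores[motif] = per_cell
    return scores


# Toy data: 4 peaks (rows) by 2 cells (columns).
binary = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
]
# Hypothetical motif annotations (peak row indices).
motif_to_peaks = {"motif_A": [0, 1], "motif_B": [2, 3]}

scores = motif_accessibility(binary, motif_to_peaks)
print(scores)
```

Even in this toy case, the two cells separate clearly at the motif level although they share only one accessible peak, which is the intuition behind feature-level aggregation.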

Heterogeneity calculation of single cells

Using the filtered accessibility–barcode matrix, a dissimilarity measurement that quantitates the extent of divergence between two cells is calculated based on differential peak accessibility. For example, Scasat employs Jaccard distance as a dissimilarity index [87] by identifying shared and distinct peaks. SCRAT first aggregates co-activated sites to derive pathway-level accessibility. It then uses pathway-level accessibility as a feature to compute cell dissimilarity. It has been shown that aggregation can increase the signal-to-noise ratio and lead to improved dissimilarity measures that better separate cells by cell type [75].
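As a worked example of a Jaccard-based dissimilarity, the sketch below compares the sets of accessible peaks in two cells. It illustrates the general measure only and is not Scasat's implementation; the peak coordinates are invented.

```python
def jaccard_distance(peaks_a, peaks_b):
    """Jaccard distance: 1 - |intersection| / |union| of two peak sets."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    if not union:
        # Two cells with no accessible peaks are treated as identical here;
        # this convention is an assumption of the sketch.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical accessible peaks for two cells.
cell_1 = {"chr1:100-600", "chr1:900-1400", "chr2:50-550"}
cell_2 = {"chr1:100-600", "chr2:50-550", "chr3:10-510"}

# Two shared peaks out of four distinct peaks gives a distance of 0.5.
print(jaccard_distance(cell_1, cell_2))
```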

Dimensionality reduction

Since the accessibility matrix is a high-dimensional dataset, reducing its dimensionality, such that biological variance is retained but random variables contributing to noise are eliminated, is critical for downstream analysis and visualization. Multidimensional scaling (MDS) [88], t-distributed stochastic neighbor embedding (tSNE) [89], and Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) [90] are common dimensionality reduction techniques where the number of dimensions included is user-specified and depends on the overall complexity of the dataset.

Cell clustering and differential accessibility analysis

Based on heterogeneity in chromatin accessibility, clustering algorithms modified for scATAC-seq (e.g., k-medoid giving higher weight to cells with higher sequencing coverage [77]) hierarchically cluster cells into different groups. A number of different statistical tests (e.g., Fisher exact, information gain) can then be employed to calculate differential accessibility signatures between two clusters. A more targeted two-sided t-test can be used to determine whether a particular motif is differentially accessible between two clusters. Extending differential accessibility measurements, a number of tools have started to impart novel perspectives on the chromatin regulatory landscape. Below, we highlight three particularly impressive approaches that enable: 1) mapping co-accessible DNA elements to connect regulatory sequences with their targets, 2) predicting cell type-specific transcription factor activity, and 3) integrating scATAC-seq data with scRNA-seq, spatial transcriptomics, and other single-cell measurements. We conclude by highlighting that methods capable of predicting chromatin status from scRNA-seq datasets will further improve the power of both techniques and enable superior integration across single-cell modalities.
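For a single peak, a Fisher exact test on binarized data reduces to a 2×2 table of accessible versus inaccessible cells in each cluster. The sketch below implements the two-sided test from the hypergeometric distribution using only the standard library. The cell counts are invented, and in practice one would also correct for testing many peaks, which is omitted here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: accessible / inaccessible cell counts in cluster 1.
    c, d: accessible / inaccessible cell counts in cluster 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x accessible cells in cluster 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one (small tolerance guards against floating-point ties).
    return sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-9)
    )


# Hypothetical peak: open in 8 of 10 cells in cluster 1, 1 of 10 in cluster 2.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(p_value)
```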

Predicting DNA interactions by mapping chromatin co-accessibility

The mechanisms by which cis-regulatory sequences engage in long-range interactions to recruit transcriptional machinery and control gene expression are highly cell type-, state-, and context-specific [91], [92]. Indeed, such widespread and context-dependent enhancer–promoter associations pose a significant challenge to generating computational models of gene expression and the way it changes as a result of perturbation [66]. Although long-range contacts can be quantified in cell ensembles using structural genomic approaches, such as ChIA-PET [93], Hi-C [94], and HiChIP [95], [96], there are no methods to infer them in single cells. Cicero, a software developed in the Shendure/Trapnell labs, seeks to infer these interactions by mapping the co-accessibility of promoters with other regulatory sites. This infers the cis-regulatory landscapes that enact a given homeostatic gene program and enables tracking changes in these landscapes over pseudotime by modifying trajectory inference algorithms developed for Monocle2 [97], [98]. As a proof-of-concept, Cicero’s application to myoblast differentiation revealed that while accessible promoter analysis explained 17% of the variance in expression, the addition of promoter–distal site linkages explained 2.27-fold more variance. Extending this idea further, sequences that are in close physical proximity, share common TF targets, and undergo synchronized changes in accessibility over pseudotime may qualify as “chromatin hubs” [99], and their modulation could preferentially bias a cell’s fate. Tracking chromatin co-accessibility enables interrogating fundamental biological questions in a new light, including the fate reversion question posed in the beginning of this review. Knowing the extent to which reacquisition of embryonic-exclusive TFs is driven by interactions with the same distal regulatory sequences after injury, and their accessibility/co-accessibility status in the uninjured adult tissue, will be critical for assessing whether a cell remembers its lineage history and harbors a latent capacity to navigate Waddington’s valley in a reverse direction.
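The notion of co-accessibility can be illustrated with a plain correlation between a promoter and candidate distal sites across groups of aggregated cells. The sketch below is a much simpler stand-in than Cicero's actual model and is intended only to convey the intuition; the accessibility values and site names are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length accessibility profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile carries no co-variation signal.
        return 0.0
    return covariance / sqrt(var_x * var_y)


# Hypothetical mean accessibility of each site across four cell groups
# (e.g., groups of similar cells ordered along pseudotime).
promoter = [0.1, 0.4, 0.7, 0.9]
distal_sites = {
    "distal_a": [0.2, 0.5, 0.8, 1.0],  # opens together with the promoter
    "distal_b": [0.9, 0.6, 0.3, 0.1],  # closes as the promoter opens
    "distal_c": [0.5, 0.5, 0.5, 0.5],  # constitutively accessible
}

co_accessibility = {
    name: pearson(promoter, profile) for name, profile in distal_sites.items()
}
print(co_accessibility)
```

In this toy example only `distal_a` would be a candidate promoter partner; aggregating similar cells before correlating is what makes such a calculation feasible on sparse binary data.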

Predicting cell type-specific TF function

An emerging concept in single-cell regulomics is that expression, activation, and function of any particular TF is determined by the complete composition of co-factors active at a given time. As such, methods that use differences in features for cell grouping fail to appreciate how the presence or absence of other factors may impact the function of each feature used. To circumvent this and to improve prediction of the cell type-specific impact of a TF, the Aerts group’s cisTopic [83] employs topic modelling, where co-accessible regions are assigned to one or multiple regulatory topics with varying strengths, and these topics (and not their constituent features) are then used for cell grouping. This approach revealed an interesting feature about the SOXE TFs: while a core SOXE topic (SOX9/10) was robustly present in a variety of cell types (e.g., melanocytes, melanoma, oligodendrocytes, astrocytes), its predicted motif and co-regulatory factors were cell type- and cell state-dependent. These observations were explained by a model where SOXE proteins function as pioneering TFs that unmask chromatin domains and recruit cell type-specific co-factors (e.g., OLIG in oligodendrocytes) to enforce somatic cellular programs. A recent assessment comparing ten scATAC tools on a standardized benchmarking platform revealed that, while cisTopic was slightly more sensitive to noise, along with SnapATAC [100] and Cusanovich2018 [38], it outperformed other methods used for separating cell types/states [101]. Revisiting the discussion on the epigenetic basis of injury-induced fate reversions, it should be possible, at least in theory, to assess somatic cells’ propensity to activate pioneering TFs (perhaps by alleviating repression that prevents their function during adulthood) that subsequently recruit lineage-specific TFs to reenact development.

Leveraging strengths of different single-cell modalities

Understanding biochemical mechanisms that endow cell type-specific functions requires comprehensive integration of modalities that measure distinct cell states and purposeful use of one measurement to inform interpretations of another. For example, out of all the single-cell measurement assays, transcriptomics is the most powered for detecting cell types, yet it is devoid of positional context of each cell in a three-dimensional tissue. While spatial transcriptomics can resolve the distribution of transcripts, limitations in how closely positional barcodes (or barcoded beads) can be placed preclude single-cell [102], [103] or subcellular resolution [104]. Similarly, while scATAC-seq profiles chromatin accessibility, and with inclusion of mitochondrial reads enables robust lineage reconstruction [51], [52], its sparse and binarized nature means that statistical strength comes from cluster-level (not single-cell) analysis. As such, algorithms that integrate each modality to generate harmonized atlases add significant value. To this end, algorithms have been devised to perform joint dimensionality reduction to compare two single-cell datasets. In Seurat Version 3 [84] and Linked Inference of Genomic Experimental Relationships (LIGER) [105], a concept similar to mutual nearest neighbors is applied, where each cell in the first dataset (say scATAC-seq) is queried against all cells in a second dataset (say scRNA-seq) to identify its closest neighbor across datasets [106]. This is subsequently repeated, but now starting from the second (scRNA-seq) dataset where each cell is queried against cells in the first (scATAC-seq) dataset. If both analyses identify a mutual pair, these two separate measurements are thought to be taken from the same underlying cell state and the pairs are integrated. 
MATCHER is another tool that integrates data types by aligning pseudotime from transcriptome and epigenome data, although it works well only on trajectories that are linear rather than branched [107]. A similar but more generalizable approach utilizes self-organizing maps (SOMs) [85], a type of artificial neural network that is trained to generate low-dimensional representations of either scRNA-seq or scATAC data. Resulting SOMs for each data type can then be used to identify groups of co-expressed genes and open chromatin elements in order to identify those that jointly change across datasets. Advantages of this approach include the ability to analyze time series and a lack of reliance on inferred pseudotime. While benchmarking studies have compared scRNA-seq analysis methods [108] (e.g., tools for Gene Regulatory Network construction [109], [110]) and a recent study extended this approach to compare ten scATAC tools [101], none have systematically compared data integration algorithms. Parameters that will need to be assessed using standardized benchmarking frameworks include joint dimensionality reduction/clustering, label transfer, resistance to variable sequencing coverage, impact of changing signal-to-noise ratios, as well as practical considerations such as running times and CPU/memory requirements, especially when considering integration of large datasets (e.g., >80,000 single cells in the sci-ATAC-seq mouse atlas [38]) across multiple modalities. An essential prerequisite to integration is sufficient batch correction of all datasets being integrated [111], especially if the variance introduced due to batch is comparable to variance across cell types/states. An ever-evolving challenge continues to be the need to improve molecular methods to recover a greater percentage of the data for each cell and to develop imputation and predictive methods, both of which can increase statistical power for integration. 
Together, robust integrative analyses will enable the transfer of biological and computational phenotypes from one dataset to another. For example, a lineage trajectory predicted based on RNA splicing kinetics from transcriptomics [112] can be directly compared with one delineated based on mtDNA mutations from accessible chromatin [51], [52]. Such analyses will provide unprecedented insights and inform both the biology and the bioinformatic tools employed to decipher it.
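The mutual nearest neighbor idea described in this section can be sketched in a few lines. The example assumes that cells from both modalities have already been projected into a shared low-dimensional space, which is itself the hard part that the published tools address; the coordinates and function names are hypothetical, and this is not the implementation used by Seurat or LIGER.

```python
def nearest(query, reference):
    """Index of the reference cell closest to the query (Euclidean distance)."""
    return min(
        range(len(reference)),
        key=lambda j: sum((q - r) ** 2 for q, r in zip(query, reference[j])),
    )


def mutual_nearest_pairs(dataset_a, dataset_b):
    """Return (i, j) pairs where cell i in A and cell j in B pick each other."""
    a_to_b = [nearest(cell, dataset_b) for cell in dataset_a]
    b_to_a = [nearest(cell, dataset_a) for cell in dataset_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


# Hypothetical coordinates in an assumed shared two-dimensional embedding.
atac_cells = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
rna_cells = [(0.2, 0.1), (5.1, 4.8), (5.5, 5.5)]

pairs = mutual_nearest_pairs(atac_cells, rna_cells)
print(pairs)
```

Here the third scATAC-seq cell has a nearest scRNA-seq neighbor, but that neighbor prefers a different scATAC-seq cell, so no pair is formed. Requiring reciprocity in this way guards against forcing matches for cell states present in only one dataset.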

Predicting chromatin accessibility from transcriptomics

Single-cell RNA-seq is currently the most widely used single-cell functional genomic technology. Most scRNA-seq datasets are generated without accompanying scATAC-seq data. Conventionally, scRNA-seq is used only for analyzing the transcriptome and not chromatin accessibility. However, a recent study from the Ji lab has shown that chromatin accessibility can be predicted using scRNA-seq (Figure 5) [113]. They used DNase-seq and RNA-seq samples from matching cell types generated by the ENCODE and Roadmap Epigenomics projects to train prediction models, leveraging the recently developed Big Data Regression for Predicting DNase I hypersensitivity (BIRD) algorithm [114]. The trained prediction models are then applied to new bulk or single-cell RNA-seq samples to predict genome-wide chromatin accessibility (Figure 5A). Interestingly, using bulk DNase-seq data from the corresponding cell type [115] as ‘ground truth’, BIRD prediction using low-input RNA-seq data from 30 and 100 cells outperformed ATAC-seq signals from 500 cells. Moreover, compared to scATAC-seq performed using the same number of cells, chromatin accessibility predicted by BIRD from scRNA-seq is more continuous, less sparse, and can more accurately reconstruct gold standard bulk DNase-seq and bulk ATAC-seq profiles in the same cell type (Figure 5B).

Figure 5

Predicting chromatin accessibility from single-cell transcriptomics

A. Overview of the prediction approach. B. BIRD-predicted chromatin accessibility, experimental scATAC-seq data, and bulk ATAC-seq data for two cell types (GM12878 and H1) are compared in a sample genomic region. The scATAC-seq signals are sparse and discrete, while BIRD-predicted signals are more continuous and correlate better with the bulk ATAC-seq signals. BIRD, Big Data Regression for predicting DNase I hypersensitivity.

This prediction approach has at least three potential applications. First, when an investigator has scRNA-seq data but not scATAC-seq data, one can use scRNA-seq to predict the chromatin accessibility landscape and cis-regulatory element activities. Because the vast majority of single-cell genomic data are scRNA-seq, prediction may substantially increase the value of both existing and new scRNA-seq experiments. Second, when scATAC-seq data are available, chromatin accessibility predicted by scRNA-seq can be used as pseudo-replicates to improve signal quality. It has been shown that combining these pseudo-replicates with experimental scATAC-seq data can tremendously increase the signal-to-noise ratio and the accuracy for inferring cis-regulatory element activities [113]. Third, when both scRNA-seq and scATAC-seq data are available, prediction may be used as a bridge to align and integrate these two distinct data types. Thus, predicting chromatin accessibility using scRNA-seq offers a new tool for single-cell regulome analysis. It is useful both when scATAC-seq data are unavailable and when they are available. One limitation of prediction is that it requires training data. Currently, only human and mouse have matched chromatin accessibility and transcriptome data from a broad spectrum of cell types that may support training prediction models. Most other species do not have enough training data. However, as more data become available, this limitation will gradually be resolved. Moreover, most of today’s scRNA-seq data (94% of scRNA-seq samples in SRA) are from human and mouse. Thus, prediction in these two species alone would be tremendously useful. Overall, predicting chromatin accessibility from scRNA-seq is a new area that has yet to be fully explored. There is plenty of room to improve prediction accuracy. New methods that can better handle technical biases in single-cell data and more effectively train prediction models for ultra-high-dimensional responses (i.e., genome-wide chromatin accessibility) using ultra-high-dimensional predictors (i.e., transcriptome) could make this approach more powerful in the future.

Conclusion

How organisms achieve a division of labor amongst trillions of cells that enables emergent properties like consciousness to arise has intrigued philosophers for centuries. Scientists have since appreciated that specialized cell function, which forms the basis of this division of labor, is the final product of many layers of combinatorial control. Single-cell genomics has started to impart a detailed glimpse of the epigenetic landscapes that maintain this specialized cell function and has started to correlate these landscapes with distinct gene and protein expression outputs. Methods that assess chromatin accessibility at single-cell resolution are becoming powerful tools for inferring this epigenetic landscape and for asking how it changes following perturbation, injury, disease, or more gradually over micro/macro-evolutionary timescales. We predict that their applications will impart novel insights into important questions, like: Do somatic cells retain an epigenetic memory of their lineage? If so, how do they recall the trajectories they traversed to adopt a final fate? Such memories may explain their latent capacity to revert to an embryonic stage and reactivate tissue-specific developmental programs after sustaining an injury [22], [23], [24], [25], [26]. Understanding how specialized cells maintain such memories can inform drug treatments that purposefully activate regenerative programs to improve the quality of healing and restore native tissue function. Single-cell chromatin measurements can also unearth novel dynamics in non-coding regions that underlie changes over evolutionary or disease processes. Comparative epigenomic studies have implicated changes in chromatin accessibility in driving cis-regulatory evolution across different species [116], [117]. 
The ability to profile cell-to-cell divergences in chromatin landscapes can enable “cellular anthropology” with unprecedented resolution and track how non-coding regulatory elements change with the emergence of new cell types and species. Indeed, integration of single-cell transcriptomic and chromatin accessibility data from rodents has revealed that enhancers may facilitate modifications to existing gene regulatory networks by integrating de novo genes transcribed from regions of the genome that were previously non-coding. Such wiring and rewiring of gene regulatory networks may explain phenomena like the birth of de novo gene sets across eumetazoan evolution [118]. In the context of human diseases, the majority of pathogenic variants lie within non-coding functional elements and lead to gene dysregulation. Since multiple layers of the epigenome are dysfunctional in multiple cell types, leading to tumor initiation and progression [119], [120], integration of scATAC-seq with low-input or single-cell ChIP-seq for selected TFs/histone modifications may be an attractive approach. Simultaneous measurements of TF binding and chromatin accessibility enabled by ‘ChIPmentation’ [121] or ‘CUT&Tag’ [122] can reveal definitive binding sites of key TFs (eliminating reliance on motif inferences or deciphering ATAC TF footprints). Since ‘CUT&Tag’ relies on tethering pA-Tn5 transposomes to antibodies bound to a target chromatin protein, it is well suited for low cell inputs and can be configured with commercial single-cell platforms, such as the 10X Genomics droplet separation system (Figure 2B) or the SMARTer ICELL8 nano-dispensation system (Figure 2D) [122]. Alternatively, orthogonal single-cell ChIP and ATAC measurements assaying the same underlying cell types/states may be bioinformatically integrated by identifying shared biological states, such that information captured by one technique may inform the interpretation of another. 
Notwithstanding these expectations, single-cell chromatin accessibility profiling will lead to many unexpected discoveries, each revealing mechanisms of gene regulation that underlie the remarkable cell diversity and plasticity seen in complex tissues.

CRediT author statement

Sarthak Sinha: Conceptualization, Writing - original draft, Visualization, Writing - review & editing. Ansuman T. Satpathy: Writing - review & editing. Weiqiang Zhou: Writing - original draft, Writing - review & editing, Visualization. Hongkai Ji: Writing - original draft, Writing - review & editing, Visualization. Jo A. Stratton: Writing - review & editing. Arzina Jaffer: Visualization, Writing - review & editing. Nizar Bahlis: Writing - review & editing, Supervision. Sorana Morrissy: Writing - review & editing, Supervision. Jeff A. Biernaskie: Writing - review & editing, Supervision, Funding acquisition. All authors read and approved the final manuscript.

Competing interests

ATS is an advisor to Immunai. There are no other conflicts of interest to be declared.

113 in total

1. Towards a map of cis-regulatory sequences in the human genome.

Authors: Meng Niu; Ehsan Tabari; Pengyu Ni; Zhengchang Su
Journal: Nucleic Acids Res Date: 2018-06-20 Impact factor: 16.971

2. Global prediction of chromatin accessibility using small-cell-number and single-cell RNA-seq.

Authors: Weiqiang Zhou; Zhicheng Ji; Weixiang Fang; Hongkai Ji
Journal: Nucleic Acids Res Date: 2019-11-04 Impact factor: 16.971

3. High-throughput single-cell ChIP-seq identifies heterogeneity of chromatin states in breast cancer.

Authors: Kevin Grosselin; Adeline Durand; Justine Marsolier; Adeline Poitou; Elisabetta Marangoni; Fariba Nemati; Ahmed Dahmani; Sonia Lameiras; Fabien Reyal; Olivia Frenoy; Yannick Pousse; Marcel Reichen; Adam Woolfe; Colin Brenan; Andrew D Griffiths; Céline Vallot; Annabelle Gérard
Journal: Nat Genet Date: 2019-05-31 Impact factor: 38.330

4. Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data.

Authors: Hannah A Pliner; Jonathan S Packer; José L McFaline-Figueroa; Darren A Cusanovich; Riza M Daza; Delasa Aghamirzaie; Sanjay Srivatsan; Xiaojie Qiu; Dana Jackson; Anna Minkina; Andrew C Adey; Frank J Steemers; Jay Shendure; Cole Trapnell
Journal: Mol Cell Date: 2018-08-02 Impact factor: 17.970

Review 5. Chromatin modifiers and remodellers: regulators of cellular differentiation.

Authors: Taiping Chen; Sharon Y R Dent
Journal: Nat Rev Genet Date: 2013-12-24 Impact factor: 53.242

6. Droplet Barcoding-Based Single Cell Transcriptomics of Adult Mammalian Tissues.

Authors: Jo Anne Stratton; Sarthak Sinha; Wisoo Shin; Elodie Labit; Tak-Ho Chu; Prajay T Shah; Rajiv Midha; Jeff Biernaskie
Journal: J Vis Exp Date: 2019-01-10 Impact factor: 1.355

Review 7. Multi-omics approaches to disease.

Authors: Yehudit Hasin; Marcus Seldin; Aldons Lusis
Journal: Genome Biol Date: 2017-05-05 Impact factor: 13.583

8. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations.

Authors: Zhana Duren; Xi Chen; Mahdi Zamanighomi; Wanwen Zeng; Ansuman T Satpathy; Howard Y Chang; Yong Wang; Wing Hung Wong
Journal: Proc Natl Acad Sci U S A Date: 2018-07-09 Impact factor: 11.205

9. BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization.

Authors: Carl G de Boer; Aviv Regev
Journal: BMC Bioinformatics Date: 2018-07-03 Impact factor: 3.169

10. Joint single-cell DNA accessibility and protein epitope profiling reveals environmental regulation of epigenomic heterogeneity.

Authors: Xingqi Chen; Ulrike M Litzenburger; Yuning Wei; Alicia N Schep; Edward L LaGory; Hani Choudhry; Amato J Giaccia; William J Greenleaf; Howard Y Chang
Journal: Nat Commun Date: 2018-11-02 Impact factor: 14.919
