Application of single cell genomics to focal epilepsies: A call to action.

Sattar Khoshkhoo^1,2,3,4,5, Dennis Lal^5,6,7,8, Christopher A Walsh^2,3,4,5,9,10.

Abstract

Focal epilepsies are the largest epilepsy subtype and are associated with significant morbidity. Somatic variation is a newly recognized genetic mechanism underlying a subset of focal epilepsies, but little is known about the processes through which somatic mosaicism causes seizures, the cell types carrying the pathogenic variants, or their developmental origin. Meanwhile, the inception of single cell biology has completely revolutionized the study of neurological diseases and has the potential to answer some of these key questions. Focusing on single cell genomics, transcriptomics, and epigenomics in focal epilepsy research circumvents the averaging artifact associated with studying bulk brain tissue and offers the kind of granularity that is needed for investigating the consequences of somatic mosaicism. Here we provide a brief overview of some of the most developed single cell techniques and the major considerations around applying them to focal epilepsy research.

Keywords: focal epilepsy; single cell genomics; somatic variant

Year: 2021 PMID： 34196990 PMCID： PMC8412079 DOI： 10.1111/bpa.12958

Source DB: PubMed Journal: Brain Pathol ISSN： 1015-6305 Impact factor: 6.508

INTRODUCTION

Focal epilepsies are a heterogeneous group of disorders that are associated with significant morbidity, and approximately one‐third of focal epilepsy patients do not respond to available anti‐seizure medications (1, 2, 3). The most common type of focal epilepsy, temporal lobe epilepsy (TLE), is notoriously pharmacoresistant and in roughly two‐thirds of the medically refractory cases requires surgical intervention, which is not always effective (4) and can have negative effects on cognition (5). In the pediatric population, malformations of cortical development (MCD) account for the majority of focal epilepsies and may need surgical resection if deemed medically refractory, although outcomes vary greatly based on pathology (6). Notably, the success rate is lower among patients who do not have a lesion visible on MRI (7, 8, 9, 10). One of the most important advances in the field over the past two decades is the establishment of a clear link between somatic variants and MCD. Somatic variants arise when spontaneous DNA damage escapes DNA repair machinery and most commonly gives rise to single nucleotide variants (SNVs) and indels (11, 12, 13), or to larger structural abnormalities such as copy number variants (CNVs) (14, 15). Historically, twin studies (16) and de novo variant discovery in genetic generalized epilepsies (17, 18) represented the first wave of investigation in epilepsy genetics, but naturally, the main focus was on identifying damaging germline variants, an approach that essentially excluded most focal epilepsies. Despite epilepsy surgery providing a unique opportunity to investigate the affected brain tissue directly (19), the recognition of somatic variation as a major contributor to focal epilepsies was delayed partly due to technical factors such as sequencing technology and partly as the result of the barriers to routinely testing surgical resections.
Naturally, identifying somatic mosaicism in subtypes of focal epilepsies created a new wave of excitement as it explained the focality of lesions and/or neuronal circuit disruptions that are typically observed. This created a paradigm shift, moving us away from descriptive pathology to molecular genetics as the standard diagnostic approach to focal epilepsy. While sequencing genomic DNA in blood or saliva is sufficient to diagnose germline variants and a small fraction of somatic variants (20), the recognition of somatic mosaicism in focal epilepsies brought to light the importance of studying the affected tissue directly. This is not only key for the detection of somatic variants with low variant allele fraction (VAF), but also to discern the impact of these variants in situ. In order to understand whether a somatic variant contributes to disease in a cell‐autonomous manner or if it acts through disruption of complex cellular networks, it is important to know which cells carry the variant and how they differ from their genotypically normal counterparts. In the next phase of discovery that ensued, targeted sequencing of the affected brain tissue helped identify a large set of somatic variants, the majority of which were associated with MCD (21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33). This experimental approach has not been as effective for non‐lesional cases, particularly most adult‐onset focal epilepsies, although these cases are not as well studied. Furthermore, even in “solved” cases, bulk sequencing is inherently blind to the specific cell types or clones carrying the pathogenic variants and rarely provides a mechanistic explanation for epileptogenesis. The absence of a clear correlation between genotype and clinical phenotype hinders the efforts to devise new targeted treatments. 
Unlike some germline genetic epilepsies where the genetic diagnosis is now informing treatment, the limited understanding of molecular mechanisms in focal epilepsies has hindered the use of genetic diagnosis in clinical decision making. While we grapple with these fundamental challenges in studying and treating focal epilepsies, the inception of single cell biology has completely revolutionized cancer research and treatment and is beginning to permeate the study of all other human diseases. Focusing on single cell genomics, transcriptomics, and epigenomics in focal epilepsy research circumvents the averaging artifact associated with studying bulk brain tissue and offers the kind of granularity that is needed for investigating the consequences of somatic variants. Moreover, since most focal epilepsies are not yet associated with a causal gene variant, identifying defective transcriptional or epigenetic changes at the single cell level may shed light on the affected cell types/pathways and provide an opportunity for intervention. Here we will review some of the key single cell genomic techniques with their unique advantages and limitations, and highlight a few areas in focal epilepsy where they could be immediately applied.

SOMATIC MOSAICISM IN FOCAL EPILEPSIES AND THE NEED FOR SINGLE CELL GENOMIC APPROACHES

It is estimated that somatic variants develop at a rate of ~1–3 variants per division per cell during early embryonic development (11, 13, 34, 35). This suggests that a typical individual carries ~80 somatic single nucleotide variants (sSNVs) in ≥2% of cells with ~2% of these being exonic variants (34). Some striking estimates predict ~37%–45% of new exonic variants that have not yet undergone evolutionary selection to be damaging (34, 36, 37), which means about half the population carries ≥1 damaging exonic sSNV in ≥2% of cells (34). This number does not include the potentially toxic sSNVs in non‐coding parts of the genome that have also been linked to neurodevelopmental disorders such as autism (34, 38, 39, 40, 41, 42, 43). Nor does it take into account the effect of somatic structural variants such as CNVs that are known causes of focal epilepsies (44, 45). In other words, the contribution of somatic variants to focal epilepsies is likely more far‐reaching than what is known to date. For example, only a handful of genes have been implicated in focal epilepsies so far, while larger scale investigation is underway that will certainly expand that list. Animal studies have demonstrated that cell type specific knock‐ins or deletions of ion channel or small molecule transporter genes such as SCN1A (46, 47, 48), SCN2A (49), SCN8A (50), KCNQ2 (51), and GLT1 (52) can generate a seizure phenotype, which supports the notion that a subset of focal epilepsies may be caused by somatic variation in these classic germline epilepsy gene families. It is important to note, though, that even in MCD, where causal somatic variants have been identified in a significant number of cases, the downstream effects of these variants on cells and on neuronal circuits are largely unknown.
This is partly due to the fact that epigenetic factors such as chromatin accessibility and DNA methylation heavily and dynamically regulate gene expression in each cell, making it very difficult to study the effects of somatic variation in bulk tissue. These interactions may be even more intricate in focal epilepsies that are associated with germline variants in genes such as TSC1, TSC2, DEPDC5, NPRL2, which require a somatic second‐hit (21, 32). Focal cortical dysplasia type 2 (FCD II) is an example that highlights some of these challenges in assessing somatic variants. The majority of known somatic variants that cause FCD II, either directly or indirectly activate the PI3K/Akt/mTOR signaling cascade (21, 53). This is a crucial intracellular signaling pathway that is involved in a range of intracellular processes such as protein synthesis, gene transcription, autophagy, nutrient sensing, growth, etc (54). However, depending on the timing of mTOR overactivation, the specific part of the pathway affected, and the particular clones involved, cellular phenotypes could range from FCD, to neurodegeneration, to death (55). In FCD II, a simplistic explanation may be that mTOR overactivation during development disrupts normal neuro‐glial differentiation, migration, and integration into the neuronal circuit, which gives rise to epilepsy. While this hypothesis may be partially valid, it is unlikely to capture the full spectrum of mTOR functions, in particular the ongoing effects of mTOR overactivation on cellular excitability and altered inhibition in mature neuronal circuits (53). The fact that mTOR inhibitors improve seizure control in animal models of FCD (23), as well as patients with FCD and tuberous sclerosis complex (TSC) (56) is a clear demonstration of that point. Several RNA‐sequencing and whole‐genome methylation profiling studies have had modest success in characterizing some of the differentially expressed genes and unique methylation signatures in FCD (57, 58). 
For example, Kobow et al. identified unique methylation signatures that distinguished subtypes of FCD from TLE and non‐epileptic controls (57). However, these studies are inherently limited in scope and more granular investigation of single cell transcriptomes and epigenomes is necessary to push the study of FCD and other focal epilepsies to the next level.
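
The estimate at the start of this section, that about half the population carries at least one damaging exonic sSNV, can be checked with a short back-of-the-envelope calculation. The sketch below is only an illustration: it takes the figures quoted above (~80 sSNVs, ~2% exonic, ~37%–45% damaging) and assumes, as a simplification not stated in the source, that the number of damaging variants per person follows a Poisson distribution. The function name is hypothetical.

```python
import math

def prob_at_least_one_damaging(n_ssnv=80, exonic_fraction=0.02,
                               damaging_fraction=0.41):
    """Probability that an individual carries >= 1 damaging exonic sSNV.

    ASSUMPTION (not from the source): the count of damaging exonic sSNVs
    is Poisson distributed with mean n_ssnv * exonic_fraction * damaging_fraction.
    """
    expected = n_ssnv * exonic_fraction * damaging_fraction
    return 1.0 - math.exp(-expected)

# Evaluate at both ends of the quoted 37%-45% range.
low = prob_at_least_one_damaging(damaging_fraction=0.37)
high = prob_at_least_one_damaging(damaging_fraction=0.45)
print(f"P(>=1 damaging exonic sSNV): {low:.2f} to {high:.2f}")
```

Under these assumptions the expected number of damaging exonic sSNVs per person is roughly 0.6 to 0.7, and the resulting probability falls close to one half, in line with the figure cited in the text.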

TRANSCRIPTIONAL AND EPIGENETIC PROFILING OF SINGLE CELLS

Single cell (nucleus) RNA sequencing

mRNA is the molecule through which the identity of a cell, its activities, and function are determined—it is the direct link between genotype and phenotype in a single cell. Naturally, studying mRNA expression in the affected tissue can provide a great deal of information about the pathogenesis and downstream consequences of a diseased state. To study gene expression, we previously relied on reverse transcription (RT)‐PCR of specific genes or microarray analysis of the transcriptome, which are low throughput and riddled with technical limitations (59). But with the advent of RNA sequencing (60, 61, 62), high throughput and accurate identification, quantification, and discovery of new genes and their splicing isoforms became possible (59, 60, 61, 62). Development of reliable single cell transcriptome amplification (63, 64, 65) paved the way for single cell RNA sequencing (scRNA‐seq) (64, 65). In just over a decade since the first proof of concept scRNA‐seq experiment (65), there has been an explosion in the number of scRNA‐seq tools with significant improvements in the technology (66, 67, 68). Adaptation of single cell cDNA library preparation to single nuclei (69) facilitated the application of this important technology to human tissue where isolating whole cells is frequently not possible (70, 71). Additionally, the adoption of unique molecular identifiers (UMIs) allowed for absolute quantification of molecular counts, increasing accuracy, reducing cost, and improving throughput (72, 73). scRNA‐seq methods can be divided based on two key features: single cell isolation and cDNA library preparation strategies. The initial step in all single cell molecular genetics techniques is assigning a unique identifier to each cell, either through physical separation of individual cells/nuclei or through single cell combinatorial indexing (SCI). 
Physical isolation of cells in individual wells of microtiter plates (74, 75, 76, 77) can be achieved through limiting dilution (71, 78), micromanipulation with a capillary pipette (79), fluorescence‐activated cell sorting (FACS) (74, 80), or laser capture microdissection (81). But most recent techniques use either microfluidic devices to isolate cells in nanoliter droplets (82, 83) or apply split‐pool barcoding to index cDNA molecules from each cell with a unique molecular tag without attempting physical separation (84) (Figure 1). The key distinction between cDNA library preparation techniques is whether full‐length transcripts are sequenced (66, 75) or if only the 3’‐ (82) or 5’‐ends (85) of the transcripts are captured.
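
The role of UMIs in transcript quantification, mentioned above, can be illustrated with a minimal sketch. The idea is that reads sharing the same cell barcode, gene, and UMI are treated as PCR copies of one original molecule and are counted once. The function name, the read tuples, and the example gene labels below are hypothetical, and the sketch ignores sequencing errors in the UMI itself, which real pipelines typically correct for.

```python
from collections import defaultdict

def umi_counts(reads):
    """Collapse reads into per-cell, per-gene molecule counts.

    reads: iterable of (cell_barcode, gene, umi) tuples (hypothetical format).
    Reads with identical (cell, gene, umi) are assumed to be PCR duplicates
    of a single captured molecule and are counted once.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)

    counts = {}
    for (cell, gene), umis in seen.items():
        counts.setdefault(cell, {})[gene] = len(umis)
    return counts

# Illustrative input: three reads for one gene in cellA, two of them duplicates.
reads = [
    ("cellA", "GAD1", "AACG"),
    ("cellA", "GAD1", "AACG"),  # PCR duplicate of the read above
    ("cellA", "GAD1", "TTGC"),
    ("cellB", "GFAP", "CCAT"),
]
counts = umi_counts(reads)
print(counts)
```

Counting distinct UMIs rather than raw reads is what removes amplification bias from the expression estimate in this simplified picture.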

FIGURE 1

SCI‐ and droplet‐based cell isolation are the most popular barcoding strategies used by most high‐throughput single cell genomic applications. SCI uniquely tags the nucleic acid molecules in each cell through serial mixing, splitting, and barcoding. The higher the number of barcoding steps, the higher the number of cells that can be uniquely tagged in each experiment. Droplet‐based techniques rely on physical isolation of individual cells and engineered barcoded beads in nanoliter droplets, which limits their scalability, but they produce less noisy results.

Each scRNA‐seq strategy has its own advantages and drawbacks, which should be carefully considered for the specific application in mind (68). Full‐transcript scRNA‐seq technologies have lower transcript dropout rates, allow for isoform detection and RNA editing analysis, and are in general superior in capturing rare and lowly expressed transcripts (86). However, they require physical isolation of single cells in microtiter plates and rely on significant amounts of sequencing per cell to cover the relatively large cDNA library, which renders them labor‐intensive, inefficient, and expensive for most high‐throughput applications. The most popular single cell full‐transcript sequencing technique is Smart‐seq2 (75) that is commercially available and was recently updated to include UMIs for improved isoform quantification (Smart‐seq3 (73)). Microfluidic droplet‐based approaches are highly scalable, optimized for transcript quantification, and capture only one end of the transcript to reduce sequencing cost through smaller cDNA libraries (87). The main drawbacks are that only a fraction of the transcripts in each cell is captured, diminishing their efficiency in detecting low‐abundance transcripts, the 3’‐ or 5’‐bias significantly reduces their utility for isoform quantification or allele expression detection, and although cheaper than full‐transcript techniques in cost of sequencing per cell, they are still expensive.
Recently, commercial droplet‐based scRNA‐seq technologies, such as the product by 10X Genomics, have become the standard approach for most scRNA‐seq applications. It is noteworthy that SCI‐based scRNA‐seq is gaining some traction due to theoretically unlimited scalability and lower cost. These techniques do not require physical separation of cells and are entirely performed at the benchtop (84), but until recently had lower transcript‐capture efficiency compared to droplet‐based approaches (84), which limited their utility (88). While not a standard application of the technique, it is noteworthy that RNA‐seq (89, 90) and scRNA‐seq (86, 91) data could also be used for germline and somatic variant discovery, although with a lot of limitations.
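
The scalability of SCI noted in the Figure 1 legend follows from simple combinatorics: each additional round of split‐pool barcoding multiplies the number of distinct barcode combinations. The sketch below illustrates this under stated assumptions that are not taken from the source: 96 barcodes per round (a standard plate size), uniform random assignment of cells to wells, and 10,000 cells per experiment. The function names are hypothetical.

```python
def barcode_space(barcodes_per_round, rounds):
    """Number of distinct barcode combinations after split-pool indexing."""
    return barcodes_per_round ** rounds

def expected_collision_fraction(n_cells, n_combinations):
    """Expected fraction of cells that share their barcode combination
    with at least one other cell.

    ASSUMPTION: every cell picks a combination uniformly at random and
    independently of all other cells.
    """
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)

n_cells = 10_000  # illustrative experiment size
for rounds in (2, 3, 4):
    space = barcode_space(96, rounds)
    collisions = expected_collision_fraction(n_cells, space)
    print(f"{rounds} rounds: {space:,} combinations, "
          f"collision fraction {collisions:.4f}")
```

In this idealized model, moving from three to four rounds shrinks the expected collision fraction by about two orders of magnitude, which is one way to read the statement that more barcoding steps allow more cells to be uniquely tagged.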

Single cell characterization of chromatin accessibility

The nucleosome, which consists of an octamer of histone proteins encircled by DNA, is the basic structural element of chromatin (92), the complex responsible for packaging the long DNA molecule in eukaryotic cells. Nucleosomes, along with other chromatin binding structures, indirectly affect cellular function by restricting access to parts of DNA. For example, internucleosomal DNA is rich in gene regulatory elements (GREs) such as enhancers, promoters, insulators, as well as transcription factor binding sites (93, 94). Nucleosome occupancy is not a binary variable and it can dynamically change from closed and inaccessible chromatin to open and fully accessible chromatin (95). These features create an intricate and dynamic process through which gene expression is regulated in each cell. In other words, chromatin accessibility is a surrogate marker for transcription factor binding and regulatory potential of a given locus (92), and not only provides information about the current state of a cell, but it can also predict its future function (88, 96). The most commonly used techniques for measuring chromatin accessibility across the genome are DNAse I hypersensitive site sequencing (DNAse‐seq) (97), assay for transposase‐accessible chromatin (ATAC‐seq) (98), and micrococcal nuclease sequencing (MNase‐seq) (99), which all rely on enzymatic cleavage of the DNA molecule to mark open regions of chromatin. Both DNAse‐seq and ATAC‐seq have been adapted for single cell applications (100, 101), but due to the ability of Tn5 transposase to easily tag cleaved oligonucleotide fragments in each nucleus, ATAC‐seq has been the most easily scalable and widely used. An array of commercial and non‐commercial technologies have been developed for scATAC‐seq that use microfluidic capture (102), individually indexable wells of a nano‐well array (103), SCI (104), and droplet‐based microfluidic isolation (105).
Similar to scRNA‐seq, combinatorial strategies offer excellent scalability, but the library complexity is lower than the microfluidic‐based approaches (92). Given that only ~10% of the DNAse I hypersensitive sites are detected via scATAC‐seq (101), this could appear as a major drawback of the technique. However, similar to scRNA‐seq, SCI‐based protocols are rapidly improving (88) and will likely be the standard in the future.

Single cell methylation profiling

DNA methylation is an important epigenetic modification that plays a key role in the regulation of transcription, X chromosome inactivation, genomic imprinting, and chromosomal stability through silencing of transposable elements (106, 107, 108, 109). 5‐methylcytosine is the most common methylated DNA base in vertebrates (110). Cytosine is generally methylated in the context of a CpG dinucleotide (cytosine followed by guanine on the same DNA strand), which clusters together in distinct genomic regions called CpG islands (111). CpG islands are frequently associated with promoters and enhancers of gene expression, so hypermethylation indicates repression of these GREs whereas DNA hypomethylation is a surrogate for active regulation of gene transcription (111, 112). Even though DNA methylation has been an active area of scientific exploration for years (113), techniques for measurement of single cell genome‐wide methylation still face some technical hurdles. Recently whole‐genome bisulfite sequencing (WGBS) has established itself as the gold standard for bulk DNA methylation sequencing, covering as much as ~95% of the CpG sites (114). In WGBS, unmethylated cytosines are deaminated into uracil, while methylated cytosines remain unaltered (115). When combined with next‐generation sequencing, methylated cytosines can be detected at single‐base resolution (116). To overcome the costs associated with deep whole‐genome sequencing required for WGBS, reduced representation bisulfite sequencing (RRBS) was developed that uses methylation‐insensitive restriction enzymes to generate smaller sequencing libraries (117, 118). The main drawback for RRBS is that it only covers ~10% of the total CpG sites, which means regions of low CpG density such as enhancers are not well‐represented (119).
Several different single cell adaptations of both bisulfite‐based and restriction enzyme‐based methylation sequencing have been developed, but they have several key differences and should be chosen based on the biologic question in mind (115). The first iteration of single cell methylation profiling used a bisulfite‐based approach, but suffered from poor and inconsistent coverage across different cells (120). Some of these limitations are inherent to bisulfite conversion, as it causes DNA degradation (115), but using UMIs (121), bisulfite conversion prior to adapter ligation (122), and PCR amplification of the tagged fragments have extended the coverage rate to ~18% of all CpG sites (scBS‐seq) (123). Further improvements in library preparation and read mapping have increased uniformity and reduced artifactual reads (snmC‐seq2) (124, 125). However, they are lower throughput compared to SCI‐based strategies (sci‐MET) (126) that are highly scalable at the cost of lower data quality. To circumvent the problems associated with bisulfite conversion, a small number of single cell methylation techniques utilize methylation‐sensitive restriction enzymes (127, 128), but their lower resolution and non‐quantitative design limit their application. Overall, single cell methylation profiling is more challenging and less developed compared to other single cell molecular genetics tools, but if applied to the right biological question, it could be quite powerful, particularly when combined with scRNA‐seq or scATAC‐seq.
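
The principle behind bisulfite sequencing described above (unmethylated cytosines are converted and read as thymine, methylated cytosines are left intact) can be shown with a toy in silico example. This is only a sketch under simplifying assumptions that are not from the source: a single forward strand, complete conversion, and no sequencing errors. The function names and the short reference sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C is deaminated to U and read as T after amplification;
    methylated C (positions given as 0-based indices) is left unchanged.
    ASSUMPTION: conversion is complete and error free.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, converted_read):
    """Compare a converted read with the reference at every cytosine.

    Returns {position: True if methylated, False if unmethylated}.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted_read)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True    # protected from conversion, so methylated
        elif obs == "T":
            calls[i] = False   # converted, so unmethylated
    return calls

reference = "ACGTCGACCGT"   # toy sequence; cytosines at positions 1, 4, 7, 8
methylated = {1, 8}         # illustrative methylation state
read = bisulfite_convert(reference, methylated)
calls = call_methylation(reference, read)
print(read, calls)
```

Because the methylation state is read out as a C versus T difference at each reference cytosine, the approach gives single‐base resolution, but it also depends on having enough intact, mappable reads per cell, which is where the degradation and coverage problems discussed above arise.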

Combined transcriptional and epigenetic analysis of single cells

It has been demonstrated that DNA hypomethylated and DNAse I hypersensitive sites overlap at a high rate, suggesting that a combination of these signatures may reflect stages of enhancer activation (119, 129). Importantly though, GRE accessibility and DNA methylation state change independently during cell fate transitions with delayed loss of methylation in regions of open chromatin (129). This creates a complex dynamic between these important epigenetic regulatory mechanisms through which transcription is regulated. In other words, independent measurements of gene transcription, chromatin accessibility, and DNA methylation will not tell the full story on epigenetic regulation of gene expression at the single cell level. A combined assay could be quite impactful in unraveling how different cell types are affected by focal epilepsies and even predict their ongoing adaptive response to seizures. Since most combined single cell approaches are derived from techniques that have already been described, we will not elaborate on each method, but rather list some of the more popular options available. The most robust co‐assays developed to date profile RNA and chromatin accessibility in single cells (scRNA+ATAC‐seq). The first generation of these techniques were sci‐CAR (130), Paired‐seq (131), and SNARE‐seq (132), which used similar protocols, but relied on SCI vs droplet‐based barcoding of single cells. While an important achievement, these initial methods produced lower quality data compared to what could be generated by individual scRNA‐seq or scATAC‐seq assays (88). The second‐generation scRNA+ATAC‐seq techniques, which include SHARE‐seq (88) and a commercial product by 10X Genomics, have closed that gap considerably with remarkable improvement in data quality.
Other available co‐assays simultaneously profile single cell transcriptome and methylome (scM&T‐seq (133)), single cell nucleosome occupancy and DNA methylation (scNOMe‐seq (134)), and single cell transcriptome, chromatin accessibility, and DNA methylation (scNMT‐seq (135)), but they have lower throughput and are less developed compared to scRNA+ATAC‐seq. It is important to note that combined single cell techniques are in their nascency and have not been extensively applied to the study of human brain. In other words, they may have important limitations that are not yet known; however, a handful of examples offer a glimpse of their future potential in studying the human brain and focal epilepsies (88, 131, 136).

APPLICATION OF SOMATIC VARIANTS TO LINEAGE TRACING IN NORMAL AND DISEASED BRAINS

During neurogenesis approximately 10^5 neurons are generated per minute (137, 138, 139), making the brain particularly susceptible to somatic variants that have been estimated to accumulate at rates as high as ~5.1 sSNVs per day per cerebral cortical progenitor (11). Different studies have estimated as many as ~300–900 somatic SNVs for a post‐zygotic neuron soon after birth (11, 140). Somatic CNVs are not thought to be as common, though they are more difficult to detect, but they reportedly happen in up to 41% of human neurons (141). Somatic transposon insertion events are an important and well‐studied cause of somatic subchromosomal CNVs that happen at rates significantly lower than sSNVs (15, 142). When discussing somatic mutagenesis, typically the primary focus is on its role in disease causation. Importantly though, somatic variants that are present in all the clones of a progenitor can also serve as a lineage map to determine both their origin and timing of development (11). For example, when a neuroglial progenitor has spontaneous DNA damage that escapes the DNA repair machinery, it accumulates variants that are unique to that cell and are passed down to its progeny as a lineage barcode (Figure 2). If the lineage‐defining clonal variants in a given tissue are known, typically through bulk or synthetic bulk WGS (143), it is possible to trace back the developmental origin of cortical neurons or glia by identifying the variants that they share. The cell types and the fraction of cells carrying a specific somatic variant could serve as surrogate markers to infer the developmental timing of when a mutation occurred (11, 144). This technique is particularly powerful when combined with RNA sequencing, as it can draw a connection between specific cell types, their developmental origins, and their timeline of differentiation (143).
Lineage tracing in the human brain has so far been mostly limited to the study of normal tissue (11, 143, 144), but it is easy to imagine how this transformative technology could elucidate the mechanisms of genetic focal epilepsies. Lineage tracing could shed light on the timing of when pathogenic somatic variants arise and the conditions under which they cause disease. Beyond the obvious diagnostic and treatment implications, knowing when and how pathogenic somatic variants occur could provide clues about potential modifiable factors and may eventually lead to preventive measures.
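
The reasoning that the fraction of cells carrying a variant reflects when it arose can be sketched with a toy model. The example below assumes an idealized lineage in which every cell divides symmetrically and no cells are lost, so a variant arising after k divisions marks about 1/2^k of the final population. Real tissue violates these assumptions (asymmetric division, cell death, clonal expansion, allelic dropout), so this is an illustration of the logic rather than an analysis method. The cell and variant identifiers are hypothetical.

```python
import math

def mosaic_fraction(cell_genotypes, variant):
    """Fraction of genotyped cells that carry a given somatic variant."""
    carriers = sum(1 for variants in cell_genotypes.values() if variant in variants)
    return carriers / len(cell_genotypes)

def approx_division_of_origin(fraction):
    """Estimate the cell division at which a variant arose.

    ASSUMPTION: idealized symmetric lineage tree with no cell loss, so a
    variant arising after k divisions is present in 1 / 2**k of all cells.
    """
    return math.log2(1.0 / fraction)

# Illustrative genotypes: each cell maps to the set of sSNVs detected in it.
cells = {
    "c1": {"v1", "v2"}, "c2": {"v1", "v2"},
    "c3": {"v1"},       "c4": {"v1"},
    "c5": set(), "c6": set(), "c7": set(), "c8": set(),
}

fractions = {v: mosaic_fraction(cells, v) for v in ("v1", "v2")}
# Variants shared by more cells are inferred to have arisen earlier.
order = sorted(fractions, key=fractions.get, reverse=True)
for v in order:
    print(v, fractions[v], approx_division_of_origin(fractions[v]))
```

In this toy data set, every cell carrying v2 also carries v1, so v2 defines a sub‐clone nested inside the v1 clone, which is the nesting structure that lineage reconstruction exploits.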

FIGURE 2

Somatic variants are spontaneously acquired during development. All the somatic variants in a progenitor cell are passed down to its daughter cells. The number of cells carrying a specific variant is an indirect marker for the developmental timepoint at which it was generated.

Let's continue using FCD II as an example. One of the hallmarks of FCD II is the presence of dysmorphic neurons (DN) and balloon cells (BC, FCD IIb) (145). Naturally, understanding the developmental lineage of these aberrant cell types is an important step in deciphering how FCD lesions develop and how they give rise to epilepsy. One of the first attempts at lineage tracing in FCD used single cell microdissection and an X‐androgen receptor (XAR) inactivation assay (146) to show disparate XAR CAG repeat lengths in single DNs and BCs, and proposed a possible role for random X‐inactivation in FCD (146). Several follow‐up studies have used more advanced techniques such as laser capture microdissection and SNP genotyping to detect the presence of somatic SNVs in individual DNs and BCs. One such study looking at a DEPDC5‐related FCD IIa demonstrated that the second‐hit somatic DEPDC5 pathogenic variant was enriched in DNs compared to their normal‐appearing counterparts (32, 147). Another study that used a similar experimental approach in FCD IIb showed enrichment of pathogenic MTOR and PIK3CA variants in DNs and BCs compared to morphologically normal‐appearing neurons and glial cells (32). While DNs typically have neuronal properties and BCs express some glial markers, there is a range of intermediate cellular phenotypes that share properties of both as well as markers of immaturity (53, 145). Older studies relying on immunohistochemistry and cell type‐specific antibodies have suggested that DNs and BCs arise from the telencephalic ventricular zone and neuroglial progenitors (148).
However, cytomegalic interneurons have also been reported (149), calling the developmental origin of these cells into question. In a more recent effort to characterize the developmental lineage of FCD II, D’Gama et al. used FACS to sort neurons vs non‐neuronal cells followed by scWGA and SNP genotyping to demonstrate an apparent enrichment of pathogenic variants in the neuronal lineage (21). However, due to technical limitations, they stopped short of characterizing the specific neuronal cell types. Animal studies have had modest success homing in on the developmental timing of somatic mutagenesis in FCD (23, 27), but much is left to be desired. Meanwhile, with the advent of single cell DNA sequencing (scDNA‐seq), single cell lineage tracing in normal human brain has been advancing rapidly. Application of this technology to FCD and other focal epilepsies could help answer many of these important questions.

Single cell whole‐genome amplification and sequencing

To better understand the advantages and drawbacks of scDNA‐seq technologies it is important to review some concepts in bulk DNA sequencing first. The gold standard for unbiased discovery of clonal somatic variants is unamplified bulk whole‐genome sequencing (bWGS) (15). However, the human genome is approximately 3 billion base pairs in size, which means bWGS can be prohibitively expensive at the high sequencing depth that is required for the detection of rare somatic variants (21, 139). To circumvent the enormous financial burden of bWGS, target capture/amplification techniques such as whole‐exome sequencing (WES) and gene panel sequencing have gained popularity in disease‐associated somatic variant discovery. These techniques, though very efficient and high yield, suffer from major artifacts associated with PCR duplication errors (150) that reduce mosaic variant validation rate (151), and by definition miss any pathogenic variants outside the selected genomic regions. In broad terms, accuracy, resolution, and efficiency are competing interests in bulk DNA sequencing technologies. The same challenges persist in scDNA‐seq, but are even more magnified since the starting material is just one DNA molecule. A normal diploid human genome contains about 6‐7 picograms of DNA (152), but several nanograms of input DNA are required for NGS. Logically, a DNA amplification step is necessary. All single cell whole‐genome amplification (scWGA) strategies are imperfect, and to that end, the utility of a specific scDNA‐seq approach is determined by the scWGA strategy applied (Table 1). Some common considerations include amplification bias with preferential amplification of specific genomic regions (uniformity), allelic dropouts (coverage), and nucleotide copy errors (false positive mutations) (153). The Holy Grail of single cell genomics is a scWGA technique that covers the entire genome, is uniform, has low copy errors, and is scalable.
Since the first attempt at scWGA (154), there has been a lot of progress although two scWGA techniques have dominated the field so far. These methods that rely on non‐linear exponential amplification are degenerate oligonucleotide‐primed PCR (DOP‐PCR) (155) and multiple displacement amplification (MDA) (156). DOP‐PCR is fast, uniform, and readily accessible through popular commercial products such as GenomePlex (Sigma‐Aldrich) (157), but it has low coverage of the genome and is error prone (158). MDA is another popular and commercially available strategy for scWGA (159) that offers high coverage of the genome and low error rate, but it lacks uniformity (15). Given their highly uniform product, PCR‐based approaches are more suitable for CNV analysis, while high coverage and low error rate make MDA ideal for SNV detection (14, 144, 160). To minimize the random non‐uniform amplification associated with MDA, a semi‐linear scWGA technique, multiple annealing and looping‐based amplification cycles (MALBAC) (161) was created. MALBAC generates self‐annealing amplicons to facilitate several cycles of linear amplification. However, it requires exponential amplification during the final steps and has a higher false‐positive SNV rate compared to MDA (162). Linear amplification via transposon insertion (LIANTI) (163) took a major leap by solely utilizing linear amplification, offering more uniformity and improved coverage. LIANTI is not yet commercially available and it relies on a more complex protocol and Tn5 transposases so its adoption has been slow. A very recent development in scWGA is primary template‐directed amplification (PTA) (164) that similar to MDA uses the high‐fidelity Phi29 DNA polymerase, but unlike MDA it generates the majority of the copies from the primary DNA strand to achieve linear amplification. PTA is a commercial product and relatively expensive, but it achieves uniform, high coverage, low error rate scWGA through a simple protocol.
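
The need for whole‐genome amplification follows directly from the quantities quoted above: a single diploid genome holds about 6‐7 picograms of DNA, while library preparation needs nanograms. The sketch below works through that arithmetic. The 10 ng target is an illustrative assumption (the text says only "several nanograms"), the doubling count assumes perfectly efficient exponential amplification, and the function names are hypothetical.

```python
import math

def fold_amplification(input_pg, required_ng):
    """Fold increase in DNA mass needed to reach the required input."""
    return (required_ng * 1000.0) / input_pg  # 1 ng = 1000 pg

def doublings_needed(fold):
    """Minimum number of perfect doublings that achieves the fold increase.

    ASSUMPTION: every cycle exactly doubles the material, which real
    amplification chemistries do not achieve.
    """
    return math.ceil(math.log2(fold))

single_cell_pg = 6.5   # midpoint of the 6-7 pg range quoted in the text
target_ng = 10.0       # ASSUMPTION: illustrative library input requirement

fold = fold_amplification(single_cell_pg, target_ng)
print(f"fold amplification: {fold:.0f}x")
print(f"perfect doublings needed: {doublings_needed(fold)}")
```

An amplification of more than a thousandfold from a single template copy is why small early biases or copy errors are propagated so strongly, and why uniformity, coverage, and error rate are the axes on which scWGA methods are compared.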

TABLE 1

Comparison between common scWGA techniques

	Amplification strategy	Uniformity	Coverage	Error rate

DOP‐PCR	Exponential
MDA	Exponential
MALBAC	Semi‐linear
LIANTI	Linear
PTA	Linear

Number of upward arrows reflects assay reliability in each represented category; one arrow denotes the least reliable and three arrows the most reliable.

High coverage scWGS is not easily scalable due to prohibitive sequencing costs, limiting the study of SNVs to a small subset of cells (140, 144). On the other hand, low coverage sequencing is sufficient for large CNV detection, making it the primary high throughput application of scWGS (14). Many microfluidic‐based and droplet‐based adaptations of PCR‐ (165), MDA‐ (166, 167), and MALBAC‐based (168) scWGA techniques exist. SCI has also been applied to scWGS (169), including a new technique that combines SCI with linear amplification (170). However, due to the inherent limitations outlined above, all of these techniques are only optimized for studying CNVs at a large scale, which is helpful if CNVs play a major role in the disease under investigation, such as hemimegalencephaly (HME) (44). Another barrier to using scWGS for large‐scale lineage tracing experiments is limited access to transcriptional information to perform detailed cell type analysis, forcing us to resort to nuclear sorting for the determination of cellular identity. A handful of techniques have had modest success in combining scRNA‐seq and scDNA‐seq (G&T‐seq (171) and DR‐seq (172)), but the protocols are laborious and transcript dropout limits their utility. A recent method, parallel RNA and DNA analysis after deep sequencing (PRDD‐seq), used a microfluidic approach to simultaneously interrogate cell‐type‐specific cDNA and lineage‐informative sSNVs using single cell qPCR (143). This approach facilitated large‐scale lineage tracing with the incorporation of more granular cellular identities. Another promising new tool that has not yet been applied to lineage tracing, sci‐L3, utilizes SCI to perform single cell RNA and SNP‐genotyping in thousands of cells (170). 
Sci‐L3 does not require a priori knowledge about patterns of gene expression, which allows for the identification of new and rare cell types. While new and better techniques will be emerging in the near future, many of the available tools can be immediately used to study cellular lineage in focal epilepsies.
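The contrast drawn above between high coverage sequencing for SNVs and low coverage sequencing for large CNVs can be illustrated with simple Poisson sampling arithmetic. This is a minimal sketch under stated assumptions: reads are taken to fall uniformly and independently across the genome, which ignores the amplification bias and allelic dropout that affect real single cell data, and the coverage, bin size, and read length in the example are hypothetical.

```python
import math


def reads_per_bin(coverage: float, bin_size_bp: int, read_length_bp: int) -> float:
    """Expected number of reads falling in one genomic bin."""
    return coverage * bin_size_bp / read_length_bp


def relative_noise(expected_reads: float) -> float:
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_reads)


def fraction_of_bases_covered(coverage: float, min_reads: int = 1) -> float:
    """Poisson probability that a base is covered by at least min_reads reads."""
    p_below = sum(
        math.exp(-coverage) * coverage ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical low-coverage run: 0.1x, 500 kb bins, 100 bp reads
    n = reads_per_bin(0.1, 500_000, 100)
    print(f"reads per bin: {n:.0f}, sampling noise: {relative_noise(n):.1%}")
    print(f"bases seen at least once at 0.1x: {fraction_of_bases_covered(0.1):.1%}")
    print(f"bases seen at least 10 times at 30x: {fraction_of_bases_covered(30, 10):.3%}")
```

With these example values, sampling noise per bin is a few percent, well below the 50% depth change expected from a single‐copy gain in a diploid region, whereas fewer than one in ten individual bases is sequenced even once. Base‐level SNV calls therefore require far deeper sequencing per cell.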

PRIVATE VARIANTS AND THE POTENTIAL FOR GENOTOXIC DAMAGE IN EPILEPSY

Surprisingly, even terminally differentiated neurons continue to accumulate private variants (somatic variants unique to each cell) at a rate of ~23 SNVs per year per neuron in the prefrontal cortex and ~40 SNVs per year per neuron in the dentate gyrus (140). It is unlikely that focal epilepsy is caused by these truly private variants. It is, however, plausible that cells exposed to chronic seizures acquire private somatic variants at an accelerated rate, due to increased oxidative stress and disruption of normal cellular homeostasis. If this were the case, a subset of these private somatic variants in exonic or regulatory regions may have direct function‐altering or toxic effects with deleterious consequences at the single cell level. It has been previously demonstrated that neurons in FCD and HME express abnormally high levels of phosphorylated tau (173, 174), which is typically a molecular feature of neurodegenerative diseases (175). Interestingly, patients with Alzheimer's disease (AD) are at increased risk of having seizures (176), a finding that has also been corroborated in animal models of AD (177). In other words, it is possible that genotoxic damage from chronic seizures plays a role in treatment‐refractory epilepsy and could be a plausible explanation for the accelerated neurodegeneration seen in patients with epilepsy (178, 179).
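The accumulation rates quoted above translate into per‐neuron burdens with simple arithmetic. In the sketch below, only the two yearly rates come from the text. The `rate_multiplier` argument is a purely hypothetical stand‐in for seizure‐related acceleration, for which no measured value is given here, and the target fraction used in the example (about 1.5% of the genome being protein coding, with variants assumed to land uniformly) is a rough assumption.

```python
# SNVs per neuron per year, as quoted in the text (140)
RATE_PER_YEAR = {"prefrontal cortex": 23, "dentate gyrus": 40}


def expected_private_snvs(region: str, years: float, rate_multiplier: float = 1.0) -> float:
    """Expected private SNVs accumulated by one neuron over the given time.

    rate_multiplier is hypothetical: 1.0 is the quoted baseline rate,
    values above 1.0 model an accelerated rate.
    """
    return RATE_PER_YEAR[region] * years * rate_multiplier


def expected_in_target(total_snvs: float, target_fraction: float) -> float:
    """Expected SNVs landing in a genomic target, assuming uniform placement."""
    return total_snvs * target_fraction


if __name__ == "__main__":
    baseline = expected_private_snvs("prefrontal cortex", years=40)
    doubled = expected_private_snvs("prefrontal cortex", years=40, rate_multiplier=2.0)
    print(baseline, doubled)
    # 0.015 is an assumed coding fraction, used only for illustration
    print(expected_in_target(baseline, 0.015), expected_in_target(doubled, 0.015))
```

Even at the baseline rate, a neuron would be expected to carry hundreds of private SNVs by mid‐adulthood, so any sustained increase in the rate would add proportionally to the small number expected to fall in functionally relevant sequence.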

THE CHALLENGE OF NON‐LESIONAL FOCAL EPILEPSIES AND OPPORTUNITY FOR SINGLE CELL INVESTIGATION

For somatic variants to be detected by current diagnostic approaches, the variant allele fraction (VAF) should typically exceed 1% in the tested tissue (21). This may partly account for the fact that most somatic variants have been detected in MCD, where the affected tissue is easy to identify and the pathogenic variants arise mid‐gestation, so they are present in a higher percentage of cells (53). Since some neurogenesis continues after birth (180, 181), clonal somatic variants may continue to be passed down to a small subset of daughter cells postnatally. While the percentage of cells harboring such variants is likely extremely small, it is nevertheless theoretically possible that these variants contribute to the pathogenesis of a subset of adolescent‐ and adult‐onset focal epilepsies such as TLE. At this juncture, this claim is purely theoretical with no experimental evidence to support it. Nevertheless, irrespective of whether somatic mosaicism contributes to the development of non‐lesional focal epilepsies, single cell DNA, RNA, and epigenomic sequencing will give us the opportunity to shed light on the specific cell types affected, the burden of clonal and private somatic variants, and the GREs involved in the disease process.
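The practical meaning of a 1% VAF threshold can be sketched with binomial sampling. The example below is a simplified illustration rather than a description of any diagnostic pipeline: it assumes a heterozygous variant in a diploid region, reads sampled independently, no sequencing errors, and a hypothetical calling rule that requires at least three variant‐supporting reads.

```python
from math import comb


def mosaic_cell_fraction(vaf: float) -> float:
    """Fraction of cells carrying a heterozygous variant in a diploid region."""
    return min(1.0, 2.0 * vaf)


def detection_probability(vaf: float, depth: int, min_alt_reads: int = 3) -> float:
    """Probability of seeing at least min_alt_reads variant reads at a site
    sequenced to the given depth, under binomial sampling."""
    p_too_few = sum(
        comb(depth, k) * vaf ** k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    vaf = 0.01
    print(f"cells carrying the variant: {mosaic_cell_fraction(vaf):.0%}")
    for depth in (100, 300, 1000):  # hypothetical sequencing depths
        print(depth, round(detection_probability(vaf, depth), 3))
```

Under these assumptions a variant at 1% VAF is carried by about 2% of cells, is usually missed at a depth of 100, and is detected almost always at a depth of 1000. Variants confined to an even smaller subset of cells would fall below what bulk sequencing can reliably resolve, which is where single cell approaches become informative.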

CONCLUDING REMARKS AND OUTLOOK

Epilepsy is one of the oldest diseases described in human literature and one of the most studied neurologic diseases, yet our approach to treating it has remained unchanged for centuries. Part of the challenge is that the limited clinical classification of seizures does not reflect the great molecular heterogeneity underlying different seizure types. The growing influence of somatic mosaicism in the scientific discourse surrounding focal epilepsies has generated a novel, mechanistic framework that takes into account genetic diversity at the single cell level. In light of this, to adequately investigate the molecular mechanisms underlying focal epilepsies, it is essential that we take advantage of single cell genomic approaches. Here we have provided a brief overview of a few available single cell techniques and some major considerations around using them. It is important to note that, despite their differences, many of these techniques have reached maturity and can be immediately utilized to study focal epilepsies.
