Réka Barbara Bod1, János Rokai2,3, Domokos Meszéna2,4, Richárd Fiáth2,4, István Ulbert2,4, Gergely Márton2,4. 1. Laboratory of Experimental Neurophysiology, Department of Physiology, Faculty of Medicine, George Emil Palade University of Medicine, Pharmacy, Science and Technology of Târgu Mureş, Târgu Mureş, Romania. 2. Integrative Neuroscience Group, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Budapest, Hungary. 3. School of PhD Studies, Semmelweis University, Budapest, Hungary. 4. Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.
Abstract
Understanding the meaning behind neural single unit activity has always been a challenge, and it will remain one in the foreseeable future. One of the most widely used strategies is spike sorting: detecting neural activity in high-resolution neural sensor recordings and then correctly attributing each event to its source neuron. Supported by ever-improving recording techniques, sophisticated algorithms for extracting worthwhile information, and an abundance of clustering procedures, spike sorting has become an indispensable tool in electrophysiological analysis. This review attempts to illustrate that, in all stages of spike sorting algorithms, the innovations of the past 5 years have brought about concepts, results, and questions worth sharing with even the non-expert user community. By thoroughly inspecting the latest innovations in the field of neural sensors, recording procedures, and various spike sorting strategies, we lay out a skeleton of the relevant knowledge here, with the aim of getting one step closer to the original objective: deciphering and making sense of the neural transcript.
Electrophysiology is built on the electrical properties of biological membranes, which arise from ion exchange between the extra- and intracellular fluids. This exchange gives rise to an electrochemical gradient across every eukaryotic cell membrane; the resulting state of net equilibrium is called the resting membrane potential. Transitory perturbations of this balanced state, and the tendency of these sudden changes to spread, render excitable cells such as neurons capable of electrochemical signal propagation. An action potential (AP) occurs when the resting membrane potential of a neuron, around −70 mV, is reversed toward positive values within about a millisecond and then restored (Raghavan et al., 2019). Regarded as a cornerstone of neurophysiology, recorded extracellular APs (commonly referred to as spikes) are the fingerprints of single neurons' activity, an observation that has been fueling neuroscientific research for almost a century (Carlson and Carin, 2019; Zhang and Constandinou, 2021a). Analyzing spike trains and the spatiotemporal properties of extracellular AP waveforms provides valuable evidence of a cell's functional profile and morphology, including dendritic tree architecture, surrounding environment, and relative position of the recording site (Chaure et al., 2018; Rodriguez-Collado and Rueda, 2021; Soleymankhani and Shalchyan, 2021), and sheds light on the meticulously orchestrated functioning of neural networks (Leibig et al., 2016; Luan et al., 2018).
Besides providing insight into brain activity at the highest temporal resolution currently available (Rey et al., 2015; Wouters et al., 2021) and facilitating the “reverse-engineering” of the brain (Petrantonakis and Poirazi, 2017), extracellular APs are also eagerly used in the development of brain-machine interfaces (Hammad et al., 2016).
However, merely collecting APs does not reveal much about representation among neural populations or activity correlations, let alone higher order brain functions (Lefebvre et al., 2016; Abbott et al., 2020; Valencia and Alimohammad, 2021). To bridge the gap between a signal [i.e., a voltage change recorded as a waveform or, colloquially speaking, a spike (Dallal et al., 2016)] and its actual meaning, one must detect neural activity and correctly attribute it to its source neuron, a process named spike sorting (Pachitariu et al., 2016; Pakman et al., 2020; Guzman et al., 2021; Pagin, 2021) (Figure 1). Spike sorting is commonly treated as a clustering problem at its heart (Souza et al., 2019), but the preceding steps presented below, namely spike detection, feature extraction, and alignment, are indispensable as well (Steinmetz, 2017).
Figure 1
From brain to spike. Obtaining, sorting, and employing spikes begin with the acquisition of neural signals from either human or animal neural tissue. Microelectrode arrays and micro-assembled probes are two of the most frequently used modalities for recording from neural tissue. These recordings are routinely filtered to make them more amenable to the conventional spike sorting procedure. As a result, single unit activities are recognized and finally organized into clusters based on similar morphology.
It is enough to look back on half a decade of spike sorting techniques to demonstrate the urgent need for more efficient algorithms (Rossant et al., 2016). With an ever-growing number of recording sites capable of detecting thousands of APs simultaneously, and with considerable computational power in our hands (Navratilova et al., 2016; Haessig et al., 2020), we must turn our attention to challenges that remain unresolved or require fine-tuning (Mena et al., 2017; Zhang et al., 2018; Jurczynski et al., 2021).
In this study, we aim to outline up-to-date improvements in the field of extracellular neurophysiological data recording and then present and compare the most elaborate and promising approaches in each spike sorting step. Next, we describe some pivotal points of spike sorting that we consider worthy of enhancement. Finally, we summarize a set of probable future directions.
Data Acquisition: From Single Electrodes to Neuropixels Probes
According to the adaptation of Moore's law to neural recording devices, the number of channels from which we can record simultaneously doubles roughly every 7 years (Radmanesh et al., 2021). In light of this, we are inclined to expect that the number of spikes recorded should likewise grow steeply. For this assumption to hold, several conditions must be met; for instance, our recording devices must provide a decent signal-to-noise ratio and a signal quality that ultimately enables stable artifact rejection.
Neural sensors should be able to detect voltage variations on a large scale (Saggese et al., 2021). By default, each electrode records extracellular de- and hyperpolarizations in the close vicinity of its nano-to-micrometer-wide tip (to a distance of about 140 μm in the case of a single wire electrode). Given the propagating nature of action potentials, these deflections are not only APs assigned to a particular neuron, i.e., single unit activity (SUA), but rather the spatio-temporal summation of the activity of the neural population in close proximity: multi-unit activity (MUA) and local field potentials (Hong and Lieber, 2019; Tambaro et al., 2021). Most spike sorting techniques discard local field potentials by simply high-pass filtering the data and concentrate on the spatial and temporal contexts of signal propagation (Abbott et al., 2020), although invasive brain-machine interface (BMI) systems could possibly also profit from this frequency range (Hammad et al., 2016).
Biocompatibility and Physical Considerations
To increase the quality of neurophysiological recordings, it is worthwhile to improve the biocompatibility of the recording tool, which means that its physical and chemical properties should approximate those of intact neural tissue. One step forward could be an electrode coating or insulation with a boron-doped diamond material (Klempír et al., 2020), or polyethylene glycol (PEG) 4000 for increasing the rigidity of ultra-flexible probes (Guan et al., 2019; Vasileva and Bondar, 2021). Another promising material that unites strength and flexibility on a neural probe scale is carbon fiber, which is attracting increasing attention in neurophysiological studies (Cetinkaya et al., 2018). Damage during implantation can also be decreased by inserting the recording devices slowly. For even better results, administration of certain steroidal anti-inflammatory drugs or vasoconstrictive agents, dura mater preparation, and blood vessel-sparing penetration may be attempted (Fiáth et al., 2019a; He et al., 2020). Reducing shank volume likewise reduces tissue damage (Musk, 2019). Nevertheless, flexibility has its own drawbacks when surgical procedures are considered; therefore, stiffening supports or even implantation shuttles can be of benefit (Wang et al., 2021). The latter solution is of key importance when implantation targets demand high precision or when vascular structures must be carefully spared (Fiani et al., 2021).
There have been serious efforts to decrease electrode impedance by applying special coatings to recording probes, for instance gold plating and polyethylene glycol additives (Kuperstein, 2021), poly(3,4-ethylenedioxythiophene) (PEDOT) (Saunier et al., 2020), or polystyrene sulfonate (Neto et al., 2018). However, it turned out that impedance control is of less importance when designing an electrode (Neto et al., 2018).
Furthermore, impedance can be lowered simply by enlarging the surface of the contacts, although this may entail signal dissipation and a decrease in amplitude (Camuñas-Mesa and Quiroga, 2013). A more relevant aspect is the reference used when recording potentials of interest: it is either theoretically set at zero potential or positioned far enough from the recording probes to be considered at least uncorrelated with them (Jurczynski et al., 2021). Electrode geometry or channel density can also raise non-trivial surgical questions (Rivnay et al., 2017) but, more importantly, they influence the quality of recordings; namely, recording sites located closer to the edge of rigid silicon shanks tend to yield signals with a higher signal-to-noise ratio (Fiáth et al., 2021). Wider shanks are double-edged, too: the more volume covered and neural activity observed by the electrode, the more significant the tissue scarring will be (Tóth et al., 2021).
Neural Sensors
Progress in electrode fabrication has not only multiplied the number of recording sites at a fast pace (Kim et al., 2018) but has also provided suitable tools for virtually any electrophysiological experiment. For example, 3D self-rolled biosensor arrays are meant to collect data from three-dimensional cortical spheroid cultures (Kalmykov et al., 2021), whereas Neuropixels probes, with their high-density channel profile, are ideal for functional connectivity studies (Wang et al., 2021). In the following paragraphs, some of the most common sensor types used for electrophysiological recordings are presented.
Single Electrodes
Neural recordings may readily be acquired with single glass electrodes or coated microwires, but their usefulness hinges on the experimental hypothesis being tested. When densely packed structures, for instance pyramidal cells of the hippocampus, are under investigation, and population activity in the form of the local field potential carries sufficient information, individual electrodes constitute an ideal choice (Rutishauser et al., 2006). On the other hand, it is also evident that single unit activity can hardly be resolved from a single electrode recording because of the abundance of independently firing nearby neurons and the lack of any knowledge of their spatial coordinates (Petrantonakis and Poirazi, 2017).
Tetrodes
An ensemble of four closely packed electrodes for recording extracellular potentials defines a tetrode. Tetrodes may be made from simple copper wires (Lu et al., 2018), from wires with a more refined gold covering (Kuperstein, 2021), or from a quartz-coated platinum-tungsten alloy (Ravikumar, 2021). Their superiority over single electrodes lies in their spatial configuration, which allows waveforms belonging to distinct neurons to be differentiated (Rey et al., 2015). Even so, tetrodes fall short when extracellular AP distributions must be investigated considering a complex set of attributes, such as distributions of multidimensional extracellular potentials. One should also bear in mind that tetrodes are used as recording devices predominantly in animal studies (Vasileva and Bondar, 2021), since human applications are sparse (Despouy et al., 2020).
Polytrodes
A step up from tetrodes, polytrodes, typically insulated metal wires (Francoeur et al., 2021) or micro-assembled silicon probes, have greatly increased the number of neurons captured in a single observation (Neto et al., 2016). With a channel count ranging from 8 to 64, each channel set 25–200 μm apart and arranged in 1–3 columns, U-/V- or S-probes are reliable tools for studying different cortical layers (Wang et al., 2021). As the time span of recorded data is increasingly substantial, it is recommended that, from polytrodes onward, data be split into subsets in a divide-and-conquer fashion (Swindale and Spacek, 2014; Lee et al., 2017; Diggelmann et al., 2018). Besides the probes by Plexon described above, there have been endeavors to scale up channel count mainly by placing electrodes as close together as possible, thus keeping the physical size of the channel ensembles to a minimum (Pimenta et al., 2021; Steinmetz et al., 2021; Wang et al., 2021). The rationale behind spatial oversampling, i.e., spacing recording sites so tightly that SUAs and background activity are clearly separable, is not only to boost recording quality and neuron discriminability (Diggelmann et al., 2018) but also to assess the accuracy of a newly engineered spike sorting algorithm without knowing the ground truth data (Zhang and Constandinou, 2021a).
Microelectrode Arrays
Standing as the foundation for BMIs, microelectrode arrays (MEAs) are capable of direct signal acquisition or transmission (Ravikumar, 2021). Given that the traditional single glass electrode, and later its wire version, was quickly replaced by a bundle of four wires, it was quite reasonable that a demand arose, and a solution was eventually found, for microelectromechanical systems (De Dorigo et al., 2018) and, finally, densely packed MEAs with 8, 32, 4,096, 11,000, or even 65,536 recording sites. Another direction for sensor development is to create a flexible, mesh-like grid of surface electrodes, namely, electrocorticogram (ECoG) arrays; this too serves as a potential interface for invasive BMIs without direct damage to brain tissue. There are MEAs that record along a single side of the shaft, such as Michigan arrays, while Utah arrays receive signals from the tips of silicon needles (Kim et al., 2019).
Targeting the vertical direction, Michigan probes are remarkably suitable for electrophysiological recordings from deep structures. Their shaft length is usually between 2 and 15 mm (Choi et al., 2018; Ravikumar, 2021). Utah arrays, on the other hand, consist of 1.5-mm long, sharpened, metal-layered silicon needles arranged in a 10 × 10 matrix; they are large enough to cover a 16-mm² cortical area and are consequently suitable, and approved by the United States Food and Drug Administration, for clinical neurophysiology investigations (Saif-Ur-Rehman et al., 2019; Ravikumar, 2021; Sahasrabuddhe et al., 2021).
Apart from these traditional arrays, non-conventional array architectures pursue effectiveness and biocompatibility through multiple strategies. Array shielding may be reduced by multiplanar, robust arrays (Shin et al., 2017), and folding arrays in an origami style may also increase the surface from which recordings are obtained (Goshi et al., 2018). Spatial resolution may also be improved by creating tubular recording devices (Wang et al., 2017).
Conic recording structures help in the chronic stabilization of recordings by enabling tissue to grow inside perforations (Hara et al., 2016), while ECoG-like designs (Fu et al., 2017), extreme volume reduction (Ereifej et al., 2018), or even self-softening materials (Hess-Dunning and Tyler, 2018) aim to reduce adverse tissue reactions (Kim et al., 2019).
MEAs are excellent tools for long-term recordings, resulting in hundreds of relatively well-isolated single units (Chung et al., 2020). Nevertheless, we should not forget that the detector/cell ratios that MEAs can provide are not always appropriate; hence, for tasks that require higher spatial resolution, probes may constitute a better choice (Negri et al., 2020).
Complementary Metal Oxide Semiconductors: CMOS Technologies
In the race for superior recording instruments, lithographically printed and highly scalable probes turned out to be ideal candidates (Sahasrabuddhe et al., 2021). Among these, complementary metal oxide semiconductor (CMOS) applications combine integrated circuits with recording electrodes, thus equipping probes with more compact input/output connections than ever (Hong and Lieber, 2019). CMOS probes maintain their compactness through local amplification and time-division multiplexing (Dimitriadis et al., 2018a) and support hardware-accelerated processing by integrating application-specific integrated circuits or field programmable gate arrays at a very modest cost in energy and space (Schaffer et al., 2021); this is how they enabled the appearance of high-density microelectrode arrays (Dragas et al., 2017). An outstanding example is the Neuropixels probe: its silicon structure is built with CMOS technology, with recording site numbers reaching up to 960 (Wang et al., 2021) or 5,120 (Steinmetz et al., 2021). Similarly ambitious projects provided evidence that these types of probes are worth “scaling up” (Tsai et al., 2017; Sahasrabuddhe et al., 2021), be it by reducing their inter-electrode spacing even further (Fiáth et al., 2019b), using them intracellularly (Abbott et al., 2020), or employing them for the sake of the neuromorphic computational paradigm (Milo et al., 2020).
Raw Output Data
The plethora of neural sensors does not entitle us to choose the recording parameters discussed below carelessly; on the contrary, we should strive to precisely select our targets within brain structures and their specific physiological conditions for data acquisition, as the quality of spike sorting leans heavily on this step (Hildebrandt et al., 2017). Technological factors such as sampling frequency (Irwin et al., 2016), referencing procedures (Jurczynski et al., 2021), and whether the recorded data might be subdivided later (Hassan et al., 2021) all influence quality and computational cost. We should also take into account that not every channel will provide the highest quality signal possible, since electrodes detached from their amplifier may deliver their data intermittently or the data may simply be distorted by noise. In such cases, real-time rejection of corrupted channels would be favorable (Swindale and Spacek, 2016; Li et al., 2020a).
Need for Compression
If we record with 1,000 channels at a sampling frequency of 40 kHz, an hour is just enough to produce 30 GB of electrophysiological data (Hadianpour et al., 2021); furthermore, a Neuropixels probe of 384 channels generates 90 GB when sampling is set to 30 kHz. Unless virtually infinite storage capacity is at our disposal, we should make use of compression. A more compact dataset is not just stored more efficiently but also speeds up the computational process (Rokai et al., 2021). Several methods have been described for data reduction of up to four-fold, including pure compression (Pagin and Ortmanns, 2018), thresholded signal transmission (Irwin et al., 2016), or the recently introduced on-chip spike sorting procedures (Saeed et al., 2017; Xu et al., 2019). Local data reduction may enable wireless transmission (Schiavone et al., 2020; Sahasrabuddhe et al., 2021; Voitiuk et al., 2021) but cannot handle massive multiplexing (Muratore et al., 2019); thus, a large-enough on-chip memory is an absolute prerequisite (Yu et al., 2011): for instance, operating with 128 channels, with each of them generating 32.5 samples per second, would have a memory requirement of 768 to even 2,400 kbits (Park et al., 2017). Such a chip could be just robust enough for deep learning applications, namely, compressing inputs into an output that can be deconvolved on the receiver side (Wu et al., 2018b). Instead of transmitting each recorded sample, transmitting only the difference between consecutive samples, i.e., delta compression, seems a reasonable routine (Mukhopadhyay et al., 2018; Chou et al., 2021; Pagin, 2021), although compressed sensing techniques can render data even more compact (Xiong et al., 2018).
Sensor types can play an additional role in compression.
Metal oxide memristive integrative sensors record and compress information in parallel (Gupta et al., 2016), while neuromorphic sensors are capable of event-driven recording and transmission, therefore improving temporal precision and reducing power consumption and data bandwidth (Liu et al., 2018; Soleymankhani and Shalchyan, 2021).
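The idea behind delta compression mentioned above can be illustrated with a minimal sketch. It is not taken from any of the cited implementations: the function names, the integer sample values, and the bit-counting helper are hypothetical and serve only to show why differences between consecutive samples usually need fewer bits than the samples themselves.

```python
def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    deltas = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original samples by cumulative summation."""
    samples = []
    total = 0
    for d in deltas:
        total += d
        samples.append(total)
    return samples


def bits_needed(values):
    """Smallest signed two's-complement word width that holds every value."""
    lo, hi = min(values), max(values)
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


# Hypothetical ADC codes (assumption): a slowly varying baseline with one spike.
raw = [512, 514, 515, 513, 510, 511, 530, 560, 520, 505, 509, 511]
encoded = delta_encode(raw)

print("bits per raw sample:  ", bits_needed(raw))
print("bits per delta sample:", bits_needed(encoded[1:]))
print("lossless round trip:  ", delta_decode(encoded) == raw)
```

Only the first value has to be sent at full width; the remaining differences fit into a narrower word as long as the signal changes slowly relative to the sampling rate, which is the property such schemes rely on.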
Digitalization of Neural Data
To be computationally analyzed, analog signals must be digitized, a task performed by analog-to-digital converters (ADCs). Various studies reported that for optimal spike sorting conditions, ADC resolution must be at least 7–8 bits (Zamani and Demosthenous, 2015; Liu et al., 2017; Pagin and Ortmanns, 2017). Moreover, logarithmic ADCs, as opposed to their linear counterparts, favor small signals and offer a more distributed dynamic range, which matches the characteristics of neural recordings.
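To make the role of ADC resolution more tangible, the following sketch models an idealized linear ADC. It is an illustration only: the ±500 μV input range, the function names, and the mid-bin reconstruction are assumptions and do not describe any converter from the cited studies.

```python
import math


def quantize(voltage, n_bits, full_scale):
    """Map a voltage in [-full_scale, +full_scale) to an integer ADC code."""
    levels = 1 << n_bits
    step = 2.0 * full_scale / levels            # width of one quantization step
    code = math.floor((voltage + full_scale) / step)
    return max(0, min(levels - 1, code))        # clip out-of-range inputs


def reconstruct(code, n_bits, full_scale):
    """Return the voltage at the center of the quantization bin of `code`."""
    step = 2.0 * full_scale / (1 << n_bits)
    return -full_scale + (code + 0.5) * step


FULL_SCALE_UV = 500.0   # hypothetical input range of +/-500 microvolts
sample_uv = -73.2       # hypothetical spike amplitude

for n_bits in (6, 8, 10):
    step = 2.0 * FULL_SCALE_UV / (1 << n_bits)
    code = quantize(sample_uv, n_bits, FULL_SCALE_UV)
    error = abs(reconstruct(code, n_bits, FULL_SCALE_UV) - sample_uv)
    print(n_bits, "bits: step", step, "uV, code", code, ", error", round(error, 3), "uV")
```

Each additional bit halves the step size, so the worst-case rounding error of an in-range sample is half a step. A logarithmic converter would instead use narrower steps near zero and wider steps for large amplitudes.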
The Common Spike Sorting Procedure
After data acquisition and conversion to a digital signal, the search for and contextualization of extracellular action potentials follow. This mining-and-meaning procedure has been termed spike sorting and is subdivided into a varying number of tasks, such as waveform identification, feature extraction and low-dimensional re-representation, and, finally, projection-based group formation (Fournier et al., 2016). The different stages of spike sorting could be divided into preprocessing (cleaning) and processing per se, but, for practical and computational reasons, even the major steps tend to interweave.
Filters and Detectors
Prior to action potential detection, it is worth considering a filtering stage, as lower frequency local field potentials, mostly defined as frequencies below 300–500 Hz, may encumber further analyses (Issar et al., 2020). This step should also enhance the quality of spikes; hence, filters act as balancing factors between incorrectly detected and incorrectly discarded events, even without previous thresholding (Zhang and Constandinou, 2021b). Take for example plain bandpass filtering (e.g., causal infinite impulse response filters), which facilitates amplitude threshold detection: feasible as it is, one must also take care to avoid secondary phase distortion (Schaffer et al., 2021). Another, albeit computationally expensive, non-linear filtering option is wavelet denoising; the Haar mother wavelet is especially useful when the background noise does not follow a Gaussian distribution (Barabino et al., 2017; Baldazzi et al., 2020; Pakman et al., 2020). Statistical filtering is based on certain calculated parameters, like average absolute values or the standard deviation of sample waveforms (Toosi et al., 2020), while reverse filtering reduces noise by waveform encoding and restoration (Mizuhiki et al., 2020). If abrupt changes in particular data are suspected, particle filtering may reliably detect them, at the price of heavy computational demands (Hu et al., 2018). Artifacts arise not only from imperfect signal filtering; human-induced signals such as stimulation artifacts may also contaminate data, and as these are highly structured artifacts, statistical filtering may circumvent this source of bias (Mena et al., 2017; Toosi et al., 2020). Although most algorithms strive to suppress noise, some suggest excluding highly contaminated snippets (Evangelou, 2020).
Paradoxical as it may seem at first sight, introducing certain artifacts in the pre-emphasis stage (Ravikumar, 2021), with an optimum flicker noise intensity, a phenomenon called stochastic resonance, significantly improves signal detection rates (Güngör and Töreyin, 2020; Güngör et al., 2021).
As can be seen, filtering and spike detection are the lead-in operations in spike sorting; therefore, the quality of feature extraction and clustering is greatly impacted by the performance of the detection algorithm, and even if data have been rigorously curated, spotting spike candidates remains a challenge (Okkesim et al., 2021). Filters may be an excellent support for threshold crossing event detection algorithms (Yang et al., 2017; Saggese et al., 2021), although more complicated methods, such as correlation-based detection, wavelet decomposition (Gao et al., 2018), Bayesian shrinkage methods (Sousa et al., 2021), and Teager or smoothed non-linear energy operators may also profit from them (Pagin and Ortmanns, 2017; Tambaro et al., 2020). Once the noise level is estimated, the amplitude threshold can be set to a proper value (Barabino et al., 2017), although dynamic changes in noise variance are mostly neglected (Toosi et al., 2020). But where is the optimum for picking a threshold? Most authors agree that a threshold of three to five standard deviations correctly estimates spike prominence (Laboy-Juárez et al., 2019), while others focus on loss minimization and push threshold values lower (Bigelow and Malone, 2021; Chou et al., 2021). Instead of basing the event detection threshold on the signal standard deviation, it may be reasonable to increase robustness against varying firing rates by calculating with median values (recognizing that high amplitudes or spiking activity represent only a small fraction of the recorded data) (Pregowska et al., 2019).
Therefore, an automatically set threshold value could be determined as follows (Quian and Nadasdy, 2004):

$$\mathrm{Thr} = k\,\sigma_n, \qquad \sigma_n = \mathrm{median}\left\{\frac{|x|}{0.6745}\right\}$$

where x stands for the filtered signal and k is a multiplier. The approximation constant 0.6745 is the inverse of the cumulative standard normal distribution function evaluated at 0.75, and the threshold may be set to a two- to four-fold value of σn (Quian and Nadasdy, 2004).
Besides threshold crossing, plenty of algorithms enable action potential detection. Smoothed or common non-linear energy operators may be capable of sub-millisecond on-chip spike detection (Malik et al., 2016; Schaffer et al., 2017; Tambaro et al., 2021). The signal-to-noise ratio can be further augmented by amplitude-slope operators (Zhang and Constandinou, 2021b). When a Teager energy operator-based detector is applied to the data, even higher noise levels are well tolerated; thus, filtering stages may be skipped (Lieb et al., 2017). Another noise-resilient approach is the fractal analysis of neural recordings: after concluding that segments containing spikes have a lower dimensionality than noise, spike detection can be achieved (Salmasi et al., 2016). As can readily be imagined, an action potential and its propagation in the extracellular space do not leave the entropy content of the temporal dimension unchanged, so calculating entropy with a sliding window method proved to detect spikes with greater specificity (Farashi, 2018). Combined methods that filter and set the threshold in parallel, with adjustable weights depending on the source, are signal-to-noise ratio optimal filters, proposed to reduce computational complexity and improve discriminating capability (Wouters and Kloosterman, 2019). As a next chapter in filtering and detection paradigms, neural networks with merely one hidden layer can fulfill the tasks of preprocessing and event detection (Issar et al., 2020).
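The median-based threshold and the non-linear energy operator discussed in this section can be sketched in a few lines. This is a simplified illustration rather than a reference implementation: the multiplier `k`, the `refractory` dead time, the synthetic trace, and all function names are assumptions, and the constant 0.6745 is the commonly used scaling between the median absolute value and the standard deviation of Gaussian noise.

```python
import random
import statistics


def noise_sigma(x):
    """Robust noise estimate: sigma_n = median(|x|) / 0.6745."""
    return statistics.median([abs(v) for v in x]) / 0.6745


def detect_threshold_crossings(x, k=4.0, refractory=30):
    """Return (events, threshold): indices where |x| exceeds k * sigma_n.

    After each event, further crossings are ignored for `refractory`
    samples so that one spike is not counted several times.
    """
    thr = k * noise_sigma(x)
    events = []
    last = -refractory
    for i, v in enumerate(x):
        if abs(v) > thr and i - last >= refractory:
            events.append(i)
            last = i
    return events, thr


def neo(x):
    """Non-linear energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1]."""
    if len(x) < 3:
        return [0.0] * len(x)
    inner = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [0.0] + inner + [0.0]    # edge samples lack a two-sided neighborhood


# Synthetic filtered trace (assumption): Gaussian noise plus three injected spikes.
rng = random.Random(0)
trace = [rng.gauss(0.0, 10.0) for _ in range(1000)]
spike_shape = [-40.0, -120.0, -60.0, 20.0, 10.0]
spike_starts = [200, 500, 800]
for start in spike_starts:
    for offset, amplitude in enumerate(spike_shape):
        trace[start + offset] += amplitude

events, thr = detect_threshold_crossings(trace, k=4.0)
energy = neo(trace)
print("threshold:", round(thr, 1), "detected events:", events)
```

Because the median is barely affected by the few high-amplitude samples that belong to spikes, the estimate of σn stays close to the true noise level even when firing rates change. The energy trace returned by `neo` emphasizes samples that are both large and rapidly changing, and it could be thresholded in the same way.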
Alignment
After successful data filtering and detection of action potentials, spike characteristics should also be explored and mapped. Before the actual solution, which is feature extraction, it is reasonable to line up spikes in a way that facilitates further processing and eventual visualization. This can be done by “binning” all spikes into a fixed-length window and then aligning them to a common temporal reference point, for instance the maximum value or slope (Metcalfe et al., 2017; Valencia and Alimohammad, 2021). This method is uncomplicated and vital when opting for clustering alternatives but may fail when noise corruption is high (Valencia and Alimohammad, 2019). Such issues may be circumvented by upsampling the data and thereby performing super-resolution alignment (Lee et al., 2017).
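A minimal sketch of the windowing and alignment step described above is given below. It assumes that detection has already produced approximate event indices and that the sample with the largest absolute amplitude serves as the temporal reference point; the window lengths `pre` and `post`, the `search` radius, and the function name are hypothetical choices, and upsampling for super-resolution alignment is not included.

```python
def extract_aligned(x, events, pre=10, post=22, search=8):
    """Cut fixed-length snippets so that each peak sits at index `pre`.

    x      : filtered signal as a list of floats
    events : approximate spike times (sample indices) from the detector
    search : half-width of the neighborhood searched for the true peak
    """
    snippets = []
    for t in events:
        lo = max(0, t - search)
        hi = min(len(x), t + search + 1)
        # Reference point: sample with the largest absolute amplitude nearby.
        peak = max(range(lo, hi), key=lambda i: abs(x[i]))
        start, stop = peak - pre, peak + post
        if start < 0 or stop > len(x):
            continue    # skip spikes too close to the edges of the recording
        snippets.append(x[start:stop])
    return snippets


# Toy signal (assumption): one spike peaking at sample 50, one near the edge.
signal = [0.0] * 100
signal[49], signal[50], signal[51] = -2.0, -5.0, -1.0
signal[3] = -4.0

# Two slightly different detection times for the same spike, plus an edge event.
aligned = extract_aligned(signal, events=[47, 53, 2])
print(len(aligned), "snippets of length", len(aligned[0]))
print("peak position in each snippet:", [s.index(min(s)) for s in aligned])
```

Both detection times of the central spike lead to identical snippets, which is the purpose of alignment: jitter introduced by the detector no longer shifts the waveform inside the window before features are extracted.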
Feature Extraction
The main pillar of accurate signal decoding is unarguably finding distinctive features in spikes that practically reveal their source. This extraction of waveform information is called feature extraction, where only the most critical elements, the so-called principal components, are retained for further assessment (Ravikumar, 2021), which means that dimensionality reduction also takes place (Mahallati et al., 2019). To maximize spike sorting accuracy, it is reasonable to choose principal components wisely, preferably ones that are noise-independent (Soleymankhani and Shalchyan, 2021) and distinctly discriminative (Lefebvre et al., 2016; Zamani et al., 2020b) but cheap to implement (Zamani et al., 2020a); by this means, we can map neural data into an informative but lower dimensional space. There is also a difference between the first (waveform amplitude) and second (slope of the waveform) principal components (Navratilova et al., 2016). Principal component analysis (PCA), one of the most popular dimensionality reduction methods (Salmasi et al., 2016; Allen et al., 2018), constructs a matrix of the orthogonal basis vectors that capture the largest variation in the feature space (Chen et al., 2021), but extensive computations and storage requirements are inevitable (Regalia et al., 2016; Yang et al., 2017).
The need for unsupervised analysis prompted thinking beyond PCA. As a result, independent component analysis (ICA) was developed, which, similarly to PCA, benefits from redundancy reduction and an improved signal-to-noise ratio; moreover, its fastICA implementation reduced computation time, thereby turning the initial method into a suitable option for high-density channel recordings (Leibig et al., 2016). While appropriate for high-density MEAs, ICA presumes that the number of sources does not exceed the number of recording channels and consequently fails when tetrode or low-density neural recordings are analyzed (Buccino et al., 2018).
ICA generates both temporal and spatial redundancy, an advantage that can be brilliantly exploited by deep learning: convolutive ICA methods, therefore, are ready to extract features and cluster them in an unsupervised fashion (Leibig et al., 2016).There is another dissimilar branch of methods that prioritizes cutting back on hardware complexity and template matching (Laboy-Juárez et al., 2019) that is an increasingly popular alternative for clustering. Discrete derivatives (Zamani et al., 2018) or optimal wavelet transforms (Yang and Mason, 2017; Soleymankhani and Shalchyan, 2021), which are sub-band selective, can stand for filtering as well (Soleymankhani and Shalchyan, 2021), whereas zero crossing features (Oh et al., 2017) or first and second derivative spike features (Caro-Martín et al., 2018) are methods that can tackle this condition. These methods are concentrated on global features gripping waveform morphology similarities of action potentials, but local feature extraction, e.g., Laplacian eigenmaps, could constitute another strategy as well (Chah et al., 2011; Huang et al., 2021). Regardless of the choice of feature extraction algorithms, by the end of this step, a well-represented feature space should be received, mapping each spike snippet as part of a highly distinguished and densely populated area (Chung et al., 2017).
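As a minimal sketch of the PCA step described above, the following Python snippet projects spike snippets onto their leading principal components through a singular value decomposition. NumPy is assumed to be available; the function name `pca_features`, the synthetic spike shape, and all parameter values are illustrative assumptions and do not reproduce any of the cited implementations.

```python
import numpy as np

def pca_features(snippets, n_components=2):
    """Project spike snippets (n_spikes x n_samples) onto leading principal components."""
    snippets = np.asarray(snippets, dtype=float)
    centered = snippets - snippets.mean(axis=0)
    # Rows of vt are orthogonal principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, explained[:n_components]

# Purely synthetic toy data: two hypothetical units that share a spike shape
# but differ in amplitude, with additive Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 32)
shape = -np.exp(-((t - 0.3) ** 2) / 0.005)
unit_a = 1.0 * shape + 0.05 * rng.standard_normal((50, 32))
unit_b = 2.0 * shape + 0.05 * rng.standard_normal((50, 32))

features, explained = pca_features(np.vstack([unit_a, unit_b]), n_components=2)
print(features.shape, explained)
```

In this toy case the first component mostly reflects the amplitude difference between the two units, so the two groups fall on opposite sides of zero along the first feature axis.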
Clustering: The Core of Spike Sorting
Clustering is the practice of grouping spikes, or their calculated features, so that all members of a group share the same source neuron (Knieling et al., 2017). All the algorithms and technologies presented so far converge toward clustering, since the goal of decoding extracellular action potentials is achieved at this step. The ideal clustering algorithm runs in real time, processes data sequentially, is fully unsupervised, and is preferably simple enough that clustering and parallel operations can be carried out on the recording device (Wood et al., 2004; Li et al., 2019; Toosi et al., 2020). For simplicity, clustering algorithms may be arranged into model-based and non-model-based categories, although algorithms differ greatly even within these groups (Figure 2). This section outlines the major clustering strategies without attempting to compare all of them in terms of performance, execution speed, and other properties.
Figure 2
Feature classifications of the most widely used clustering algorithms. When choosing the ideal algorithm for a given dataset, it is straightforward to imagine a decision tree and consider which features are essential during the clustering process: supervised or unsupervised, and then the level of accuracy, even with acceptance of enormous computational costs.
Model-based, or simply probabilistic, approaches rely on a source-dependent spike probability distribution provided by a generative model: Bayesian methods, expectation maximization procedures, and maximum likelihood estimation are typical instances of this class. These approaches are unusually resilient to noise, and they also facilitate cluster visualization (Mahallati et al., 2019). Modeling a mixture of drifting t-distributions helps to distinguish overlapping clusters from heavy tails (Shan et al., 2017), while hidden Markov models have been successfully used for joint detection and sorting (Li et al., 2019); however, the main burden of computational requirements remains largely unresolved. The risk of cluster overseparation is also considerable, especially when clusters that are in fact non-Gaussian are assumed to follow a Gaussian distribution (Keshtkaran and Yang, 2017; Rezaei et al., 2021).
Non-model-based methods, in contrast, treat sorting as a classification task (Shan et al., 2017). Of almost historical relevance, manual clustering is the most transparent approach, in which obvious parameters such as spike amplitude, duration, and channel location substitute for a model (Sun et al., 2021). Manual methods have been gradually replaced by minimally supervised or supervision-free practices, one of them being k-means clustering from the partitional subclass.
Today, k-means dominates spike sorting protocols because of its simplicity (Dallal et al., 2016; Fournier et al., 2016; Lefebvre et al., 2016; Park et al., 2020; Rácz et al., 2020), along with hierarchical, graph-based, fuzzy logic, density-based, grid-based, and learning-based methods (Zhang et al., 2018; Veerabhadrappa et al., 2020). Hierarchical solutions are mainly represented by algorithms employing Euclidean distance (Knieling et al., 2017), which also form the basis of optimal filter estimation methods (Hassan et al., 2020). Graph-based clustering centers on nearest-neighbor interactions, but spectral clustering (Huang et al., 2021) and the super-paramagnetic clustering of the well-known wave_clus algorithm also belong to this group (Quian and Nadasdy, 2004). Fuzzy C-means considers each action potential a member of every delineated cluster, and only its degree of membership makes decoding possible (Regalia et al., 2016). Density-based algorithms most closely resemble the human clustering strategy, since they focus on dense regions and the low-density belts between them in the feature space (Chung et al., 2017; Hilgen et al., 2017; Hennig et al., 2019). Learning-based clustering incorporates a variety of tools in the service of spike sorting, ranging from single-layer perceptrons to state-of-the-art spiking neural networks (Veerabhadrappa et al., 2020).
Choosing a certain clustering approach does not rule out correction procedures or even merging it with another algorithm. Minor corrections include Laplacian eigenmaps, which boost k-means clustering accuracy (Chah et al., 2011), or subset clustering together with pre-whitening for parameter-free spike clustering (Diggelmann et al., 2018).
Consensus and ensemble clustering exploit the variability of distinct clustering algorithms (Fournier et al., 2016; Vitale et al., 2019; Zhu et al., 2020); similar combined measures are the Euclidean distance of scatter-plotted features (Berjis and Al-sulaifanie, 2020), k-means clustering combined with a mean-shift strategy [with the advantage of calculating the optimal number of clusters and analyzing similarity degrees (Negri et al., 2020)], and hierarchical agglomerative clustering (Cleaver-Stigum, 2021).
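To make the partitional idea behind k-means concrete, here is a minimal pure-Python sketch of the standard Lloyd iteration (assign each point to its nearest centroid, then recompute centroids) applied to two-dimensional feature vectors. The function names, the fixed random seed, and the toy feature points are assumptions made for illustration; the sorters cited above use their own, more elaborate implementations.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical 2D features (e.g., two principal component scores per spike).
feature_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                  (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(feature_points, k=2)
print(labels, centroids)
```

Note that the number of clusters `k` has to be supplied in advance, which is one reason why the correction and combination strategies mentioned above are used alongside k-means.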
Automated Strategies
The gradual replacement of manual spike sorting by software techniques anticipated the rise of fully automated algorithms (Barnett et al., 2016). This aspect of spike sorting has attracted so much attention in the past half decade that this section is dedicated entirely to processes in which tedious and time-intensive human interaction has found its alternative.
Although most of the clustering algorithms presented previously are fully automated, many of them neglect aspects such as real-time application or the possibility of being implemented on a chip that records and analyzes data in parallel. Neural network-based methods, however, are on the way to satisfying both criteria, with a promise of clinical applications (Radmanesh et al., 2021). But why do learning-based methods excel where human sorting efficiency is often inconsistent? Deep learning takes advantage of non-linear relationship modeling: if the associations between inputs and outputs are not linear, a strategy that finds patterns in these links may actually outperform algorithmic methods (Markanday et al., 2020; Guido, 2021).
Automated algorithms are characterized by an unsupervised strategy, although exceptions exist, such as supervised training of a neural network followed by unsupervised execution, or the selection of meaningful channels before the sorting process begins (Saif-Ur-Rehman et al., 2021). It must be emphasized, though, that most learning-based algorithms perform a simpler form of spike sorting, namely classification based on what has been learned during the crucial process of training (Luan et al., 2018). Classification may be a sensible option when reducing runtime takes priority over near-maximum accuracy (Valencia and Alimohammad, 2019), thus enabling online sorting on a general-purpose computer or on the chip with which the neural data have been acquired (Schaffer et al., 2021).
Therefore, to reduce storage, neural networks may be shrunk to three layers of artificial neurons, complemented by additional attention elements (Bernert and Yvert, 2019). Sequentially constructed algorithms, such as those building upon multiple basic dense layers (Mahallati et al., 2019; Yeganegi et al., 2020), convolutional layers (Li et al., 2020b), or recurrent layers (Rácz et al., 2020), require extensive storage, although complexity may be cut back by binarizing weights and activation functions (Valencia and Alimohammad, 2021), or computation may be parallelized on graphical processing units (Tam and Yang, 2018). These layers may be arranged in different ways, mainly to mitigate or eliminate the need for hand-labeled neural data during training: autoencoders (Weiss, 2019; Radmanesh et al., 2021; Rokai et al., 2021) or networks trained under adversarial (Wu et al., 2019; Ciecierski, 2020) or reinforcement learning paradigms (Salman et al., 2018; Moghaddasi et al., 2020) have successfully clustered features originating from even the noisiest datasets. Likewise, a more sophisticated learning-based method may incorporate multiple steps of spike sorting, solving detection, feature extraction, and clustering as a compact solution (Eom et al., 2021; Rokai et al., 2021), although manual curation remains advisable (Horváth et al., 2021).
Learning-based methods in the service of automated spike sorting benefit considerably from additional refinements and optimization strategies (Table 1). If artificial neural networks perform clustering, the optimal number of clusters may be estimated by gap statistics (Tariq et al., 2019), and a method called Heuristic Spike Sorting Tuner even helps in selecting spatial or temporal features that ensure precise clustering (Bjånes et al., 2020).
Regarding ideal input dimensionality, that is, the number of features under analysis, four features are mostly sufficient for clustering (Hilgen et al., 2017).
Table 1
Presentation of the most used learning-based methods.
| Learning-based method | Particularities | References |
| --- | --- | --- |
| MLP (multilayer perceptron) | Non-linearly activating, fully connected nodes arranged in three or more layers | Park et al., 2020; Zamani et al., 2020b |
| CNN (convolutional neural network) | Regularized MLP: finding hierarchies in data by sliding convolutional kernels along the input matrix | Lee et al., 2017; Li et al., 2020b |
| RNN (recurrent neural network) | Feedback of certain node outputs to previous layers for fine-tuning node values | Rácz et al., 2020 |
| AE (autoencoder) | Two-pillar architecture with non-recurrent neural network: 1. encoder encoding the input; 2. decoder reconstructing the original input, based on the output of the encoder | Weiss, 2019; Eom et al., 2021; Rokai et al., 2021 |
| GAN (generative adversarial network) | Two-pillar architecture: 1. generative network creating samples for evaluation by the 2. discriminative network | Wu et al., 2019; Ciecierski, 2020 |
| RL (reinforcement learning agent) | Learning process based on maximizing the reward received after a correctly executed action | Salman et al., 2018; Moghaddasi et al., 2020 |
It is worthwhile to stress that the first three learning-based methods are the fundamental structures of modern artificial neural networks; moreover they may serve as building blocks when constructing the latter three.
Even with the greatest care during spike sorting, clustering quality must be inspected, and ground truth-containing datasets have been created to meet this need. These data provide a basis for fair comparison between spike sorting algorithms in terms of accuracy, speed, and memory requirements (Figures 3, 4).
Figure 3
(A) Algorithm accuracy vs. the number of units found above the accuracy threshold of 80%. (B) Algorithm accuracy vs. time needed for computation. Algorithms were tested on the Hybrid_Janelia dataset, with a minimum SNR of 10.
Figure 4
Comparison of clustering algorithms based on their accuracy achieved in the wave_clus dataset. We selected novel state-of-the-art algorithms that cannot yet be evaluated through the SpikeForest framework, but their performance in the wave_clus dataset has been made available with their publication. Generally, 16 samples of this dataset are applied during validation and divided into the subgroups “easy1,” “easy2,” “difficult1,” and “difficult2,” which appear in our x axis as “e1,” “e2,” “d1,” and “d2,” respectively. The values after the underscore reveal the noise content of each simulation, i.e., _005 means a 5% noise contamination, whereas _01, _015, and _02 mean 10, 15, and 20%, respectively.
Alternatives for Clustering
Clustering relies on feature space construction, so this two-step process may hide unanticipated problems. From this point of view, reducing the number of steps to one by introducing template matching holds great promise (Yang et al., 2017), especially when templates are interpreted in a Bayesian context (Franke et al., 2015). However, running templates through entire target signals in search of best-matching units is a time-consuming routine, markedly so when the targeted neurons are abundant (Chen et al., 2021). As might be expected, these approaches are worthwhile when neural recordings are less complex; for instance, electroneurogram/electromyogram decoding may be carried out with this type of pattern recognition (Noce et al., 2019). By repeating spike sorting on the same data, normalized template matching methods reportedly detect an additional 40–70% of spikes; however, computational costs should also be a concern (Laboy-Juárez et al., 2019).
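The idea of running a template through a target signal can be sketched with a normalized cross-correlation, which scores every window of the signal against the template regardless of amplitude scaling. This is a generic, brute-force illustration in pure Python, not the method of any cited study; the function names, the threshold of 0.9, and the toy signal are assumptions. The nested loop also makes the cost visible: every template is compared with every window.

```python
import math

def normalized_xcorr(signal, template):
    """Pearson correlation between the template and every signal window."""
    n = len(template)
    t_mean = sum(template) / n
    t_centered = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t_centered))
    scores = []
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_mean = sum(window) / n
        w_centered = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(v * v for v in w_centered))
        if w_norm == 0 or t_norm == 0:
            scores.append(0.0)  # flat window: no meaningful correlation
        else:
            dot = sum(a * b for a, b in zip(w_centered, t_centered))
            scores.append(dot / (w_norm * t_norm))
    return scores

def match_template(signal, template, threshold=0.9):
    """Return window offsets whose score is a local maximum above threshold."""
    scores = normalized_xcorr(signal, template)
    hits = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else -1.0
        right = scores[i + 1] if i + 1 < len(scores) else -1.0
        if s >= threshold and s >= left and s > right:
            hits.append(i)
    return hits

# Toy signal: the same hypothetical template embedded twice, once at half amplitude.
template = [0.0, -1.0, -3.0, -1.0, 0.5, 0.0]
signal = [0.0] * 40
for offset, gain in [(5, 1.0), (25, 0.5)]:
    for i, v in enumerate(template):
        signal[offset + i] += gain * v

hits = match_template(signal, template, threshold=0.9)
print(hits)
```

Because the score is normalized, the half-amplitude copy is matched as well as the full-amplitude one; whether that is desirable depends on whether amplitude is meant to discriminate between units.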
On-Chip Spike Sorting
Spikes may even be sorted on the recording device itself, to the benefit of both power consumption and data quantity (Liu et al., 2018). Contemporary tools for this task are mainly based on field-programmable gate arrays or application-specific integrated circuits (Barnett et al., 2016). Nevertheless, microcontroller units (Schiavone et al., 2020) and system-on-a-chip devices (Liu et al., 2017) are increasingly popular. These chips, however, must be trained in a preceding spike sorting procedure to allow subsequent fine tuning and accurate execution of functions (Zeinolabedin et al., 2016; Saeed et al., 2017). This approach is also vital in the field of neuromorphic computation, driven by very large-scale integration technologies (Mukhopadhyay et al., 2018); an implementation with 65,536 simultaneously recording and stimulating electrodes has recently been reported (Tsai et al., 2017). Similarly, extensive neural recordings are preferentially processed in batches and then subdivided into bins given the resistive state that calls for proper noise estimation (Gupta et al., 2019).
Clustering itself can use any algorithm introduced in the general clustering section, but the computational bottleneck of recording front ends is worth considering. As has been seen, most non-model-based clustering algorithms are intended to capture specific geometric features; therefore, finding the most prominent ones can also cut back on the number of features under analysis (Shaeri and Sodagar, 2020). Event-trace, template-value differences add a temporal dimension to the template matching procedure, so they can further reduce the number of comparison units (Haessig et al., 2020). Learning-based methods may also handle this constraint at a superior level; hence, it is sufficient to upload a ready-to-use pretrained artificial neural network that is as small as possible (Valencia et al., 2019).
In this field, hardware-embedded spiking neural networks are the greatest novelty, mostly owing to their ability to adapt to data recorded on the fly (Werner et al., 2016).
Toolboxes and Software Suites for Spike Sorting
The ever-increasing need for spike sorting has led to a wide range of open-source electrophysiological platforms, frameworks, and software (Pachitariu et al., 2016). Any related project aims either to bring spike sorting closer to users with little knowledge of the procedure or to provide a space for methodology comparison and dataset generation. Such tools are also required to support a wide range of data formats, as well as recordings of varying quality and length (Swindale et al., 2017). Here, we briefly present novel toolboxes from the past 5 years that can be applied in a straightforward manner even by non-expert users.
For data acquisition, the Parallel Ultra-Low-Power (PULP) platform merges data acquisition and single unit detection under a single framework (Schiavone et al., 2020). Another software, Neural Parallel Engine, is useful when the execution speed of spike sorting algorithms is crucial, since it enables highly parallelized computation on graphical processing units (Tam and Yang, 2018). Neurophysiological data can be curated and further analyzed with Phy, a graphical user interface written in Python. Another graphical user interface supporting manual curation is based on t-distributed stochastic neighbor embedding (Dimitriadis et al., 2018b).
Python frameworks have also been created for complete spike sorting procedures, such as OpenElectrophy (Rosenberg and Horn, 2016), herding_spikes (Muthmann et al., 2015), NeoAnalysis (Zhang et al., 2017), tridesclous, and spyke. All of them support multichannel architectures. At the center of MountainSort stands a density-based clustering approach, namely ISO-SPLIT; this suite is also open-source (Chung et al., 2017).
SpikeInterface is a framework that not only offers algorithms for spike sorting but also allows the most recent sorters to be used interchangeably (Buccino et al., 2020).
Similarly, SpikeForest is also available for a wide range of sorting approaches; moreover, their comparison has never been easier because of its intuitive graphical user interface (Magland et al., 2020). SpikeForest, which may be embedded into the SpikeInterface environment, evaluates algorithms automatically and systematically on some of the best-known datasets supplied with ground truth. The Spike-Sorting Evaluation Initiative and the 1st INCF Workshop on Validation of Analysis Methods (Denker et al., 2012) have pushed the spike sorting community toward sharing essential data for algorithm evaluation, and cloud computing is also being considered as a supporting feature (Mahmud and Vassanelli, 2019).
When a unified computational pipeline is needed for a more specific use, P-sort is a unique pipeline and software designed to sort cerebellar single unit activity (Sedaghat-Nejad et al., 2021). CellExplorer suits single-neuron characterization and signal visualization for those who are comfortable with MATLAB (Petersen et al., 2020), whereas the Big Neuronal Data Format (BNDF) emphasizes large-scale data processing and reduced overall runtime (Hadianpour et al., 2021). Combinato, on the other hand, is written in C/C++ and handles long-term, noisy recordings in a mostly unsupervised fashion (Knieling et al., 2016). For algorithm validation, SHYBRID is a surrogate platform for cases where ground truth information is expected, offering hybrid data as an evaluation tool (Wouters et al., 2021).
Another way of easing the burden of spike sorting is to simulate datasets on which algorithms can be trained, validated, and tested. ViSAPy (Hagen et al., 2015) and MEArec (Buccino and Einevoll, 2021) are Python-based, whereas Neurocube (Camuñas-Mesa and Quiroga, 2013), Neural Benchmark Simulator (Mondragón-González and Burguière, 2017), and SHYBRID (Wouters et al., 2021) are MATLAB-implemented frameworks (Figure 5).
Figure 5
Listing of signal processing toolboxes based on their most useful functions. “Signal acquisition” box: a wide variety of toolboxes addressing difficulties of the recording or signal generation process, most of them compressing or prefiltering data for further steps. “Data curation + data format” box: software intended for preprocessing previously recorded signals, most of it recommended even in the case of manual spike sorting. The “sorting” box lists the most used and trusted spike sorting algorithms or algorithm collections that can be applied to various datasets. Finally, the “algorithm evaluation” and “algorithm comparison” boxes offer help when a custom algorithm needs validation or measuring against the algorithms itemized in the “sorting” box.
Arising Challenges
Several factors may emerge during data collection that, despite often being predictable, complicate spike sorting. The search for accurate, real-time, and computationally efficient algorithms runs into difficulties such as involuntary drifts, temporally coincident (overlapping) spikes, and obstacles posed by the ever-increasing recording capacity. Disregarding energy consumption, especially when implantable devices are considered, can lead to obstacles in practice (Mukhopadhyay et al., 2018; Okkesim et al., 2021). Last but not least, the assumptions made before recording also deserve attention: electromagnetic field theory presumes that the extracellular space is isotropic and homogeneous; however, preferred cell orientations and the mere presence of neural probes in the tissue, for example, render simple models inaccurate (Buccino et al., 2019). These notions are generally applicable within the gray matter, since signal propagation in white matter is radically different.
High Channel Counts
By increasing channel counts and the brain areas covered, we gain insight into the activity of more neurons in parallel. Statistical models based on such recordings therefore also hold promise for superior interpretation of neural population activity (Hurwitz et al., 2021). Nevertheless, computational costs escalate almost exponentially with channel number, and the number of wires carrying signals would also grow, except for methods employing controllable switches to connect recording sites to single wires (Lee et al., 2021). Increasing channel density and adopting a divide-and-conquer processing strategy (Chen et al., 2021) allow for easier detection of duplicated or overlapping spikes (Larionov et al., 2019; Chou et al., 2021; Dehnen et al., 2021) and provide more detailed spatial information on action potential sources (Rácz et al., 2020). Meanwhile, as the number of sorted spikes increases, false positive and false negative detections become a conceivable source of error: even a small number of mistakenly identified spikes perturbs firing rate or interspike interval values (Chiarion and Mesin, 2021).
The Drifting Dilemma
For a short time, almost every detected spike waveform keeps its original shape; this is the pseudo-stationary stage of the recording, in which any signal variability is mainly due to Gaussian-distributed noise (Yu et al., 2011). On the other hand, in vivo and sometimes in vitro setups suffer from small changes in electrode position, even from the beginning of recordings, e.g., when perceptible, discrete displacements of the brain sample alter initially recognized waveforms to such an extent that clusters may split or even intersect each other and form new but misclassified groups of action potentials (Gong et al., 2016; Harris et al., 2016; Chaure et al., 2018). The most plausible solution for probe drift is to track and update the initial neuron templates (Lee et al., 2017) or to handle data as independent batches and ultimately merge similar-looking clusters; modeling spikes as a mixture of drifting t-distributions can also eliminate the drift-induced heavier tails of clusters in the data representation (Shan et al., 2017). Spike amplitude change may also be estimated with recursive least squares, but its utility has mostly been demonstrated on low-channel-count recordings (Davey et al., 2020).
Overlapping Spikes
Overlapping spikes frequently cause problems for spike sorting algorithms: when different neurons fire within such a narrow time window that their waveforms overlap (Rey et al., 2015), the features extracted from the individual components cannot be used for clustering. It is reasonable to suspect that the extent of overlap determines the difficulty of accurate classification; yet, quite paradoxically, evidence shows that firing rate and spike-train correlation levels do not interfere with algorithm performance (Garcia et al., 2021). If the data suggest that the incidence of these events is sufficiently low, simply censoring spikes with double peaks may overcome the problem, and for detections with only a mild overlap, deconvolution can be employed (Li et al., 2020b). Others emphasize the ubiquity of overlapping spikes, especially in recordings with high-density probes, and call for specific algorithms dedicated to preventing misclustering. A straightforward approach resolves the overlap in the feature space by treating the feature vectors of problematic spikes as linear superpositions (Wouters et al., 2020), enhanced by one-hot encoding (Wouters, 2020), or, quite the contrary, by fusing features to reduce dimensionality and missing information (Li et al., 2018), while others suggest an extra step of combining pair-wise action potential waveform templates at various time shifts, refining sorting accuracy by as much as 30% (Mokri et al., 2017). Additional strategies are biogeography-based optimization (Chiarion and Mesin, 2021), blind source separation methods (Leibig et al., 2016), or close examination of each spike cluster's center that is equally close to two other midpoints (Wouters and Kloosterman, 2021), by automated template merging (Chen et al., 2021).
Sparse representation or compressive sensing of neural data performs particularly well when spike waveforms are alike; therefore, overlapping spikes can be resolved by this means (Wu et al., 2018a; Huang et al., 2020). Wavelet Packets Decomposition and Mutual Information (WM sorting) is a clustering algorithm specially designed for overlapping spikes that outperforms most of the methods presented here; nevertheless, its computational intensity raises doubts about real-time applications (Huang et al., 2019).
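The notion of explaining an overlapping event as a superposition of templates at various time shifts can be illustrated with a brute-force search: every pair of templates is summed at every combination of shifts, and the combination with the smallest squared error against the recorded snippet is kept. This is only a toy sketch under strong assumptions (noise-free data, exactly two contributing units, hypothetical template shapes and function names); it is not the procedure of Mokri et al. (2017) or of any other cited work.

```python
def shifted(template, shift, length):
    """Place a template into a zero-padded trace of the given length."""
    out = [0.0] * length
    for i, v in enumerate(template):
        if 0 <= i + shift < length:
            out[i + shift] += v
    return out

def resolve_overlap(snippet, templates, max_shift):
    """Search all template pairs and shifts for the best two-unit explanation.

    Returns (sum_squared_error, unit_a, shift_a, unit_b, shift_b).
    """
    n = len(snippet)
    names = sorted(templates)
    best = None
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            for shift_a in range(max_shift + 1):
                for shift_b in range(max_shift + 1):
                    part_a = shifted(templates[a], shift_a, n)
                    part_b = shifted(templates[b], shift_b, n)
                    sse = sum((x + y - s) ** 2
                              for x, y, s in zip(part_a, part_b, snippet))
                    if best is None or sse < best[0]:
                        best = (sse, a, shift_a, b, shift_b)
    return best

# Hypothetical templates for two units and a synthetic overlapping event.
templates = {"A": [0.0, -2.0, -1.0, 0.5], "B": [0.0, -1.0, -3.0, 1.0]}
snippet = [x + y for x, y in zip(shifted(templates["A"], 1, 10),
                                 shifted(templates["B"], 3, 10))]

best = resolve_overlap(snippet, templates, max_shift=6)
print(best)
```

The four nested loops show why such exhaustive matching becomes expensive as the number of units and admissible shifts grows, which is in line with the computational concerns raised above.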
Neural Bursts
A set of action potentials in a recording channel with short, ~3–5-ms interspike intervals, mostly similar in shape but decreasing in amplitude, conventionally indicates a neural burst or “complex spike” (Rey et al., 2015; Evangelou, 2020). Bursts are usually emitted by pyramidal cells in the standard signal generator circuit and by interneurons in deeper structures, the latter acting by suppression (Gainutdinov, 2021). But why do they constitute a problem for spike sorting? First, decreasing amplitudes may hamper detection, and this decrescendo may also give rise to falsely separated clusters; second, bursts can mimic other problem sources: complex waveforms may originate from overlapping spikes (Rácz et al., 2020), and diminishing amplitudes may be due to electrode drift. Bursting neurons can be handled by assigning them a burst index (Sun et al., 2021) or a special spike label (Kapucu et al., 2016), or by simply decomposing a spike train and its successors into a parent wave (Chung et al., 2017).
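A simple way to flag candidate bursts is to group consecutive spikes whose interspike intervals stay below a limit. The sketch below uses the upper end of the ~3–5-ms range mentioned above as the default limit; the minimum of three spikes per burst, the function name, and the example spike times are assumptions for illustration, and the amplitude decrement criterion is deliberately left out.

```python
def find_bursts(spike_times_ms, max_isi_ms=5.0, min_spikes=3):
    """Group sorted spike times into bursts of closely spaced spikes."""
    bursts, current = [], []
    for t in spike_times_ms:
        # A gap longer than max_isi_ms closes the current group.
        if current and t - current[-1] > max_isi_ms:
            if len(current) >= min_spikes:
                bursts.append(current)
            current = []
        current.append(t)
    if len(current) >= min_spikes:
        bursts.append(current)
    return bursts

# Hypothetical spike times (ms) from a single channel, sorted in time.
spike_times = [10.0, 13.5, 17.0, 21.0, 80.0, 150.0, 153.0,
               300.0, 304.0, 308.5, 312.0]
bursts = find_bursts(spike_times)
print(bursts)
```

Spikes grouped this way could then be given a shared label before clustering, so that the amplitude decrement within a burst is less likely to split one unit into several clusters.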
Long-Term Recordings
Recording from freely moving animals with a promise of proper neural decoding over months, ideally with the same device, is the holy grail of spike sorting (Shan et al., 2017). Long-term recordings therefore encompass all the challenges that can arise in spike sorting, but the main problem is unit instability, when newly found spike waveforms replace former ones; hence, most template matching strategies fail (Toosi et al., 2020). Unit firing rate variations, neural plasticity, or the loss of recorded neurons may also influence sorting accuracy in the long run (Okun et al., 2016; Xiao et al., 2019; Vasileva and Bondar, 2021), although large-scale, multisite platforms may overcome even these obstacles (Chung et al., 2020; Vasileva and Bondar, 2021). Over time, neural sensors may induce leptomeningeal proliferation and fibrosis or even a foreign body reaction, all of which can lead to device encapsulation and further deteriorate recording quality (Szymanski et al., 2021).
The “Dark Neuron” Problem: Scarcely Firing Cells
It has been widely debated why recorded neural activity is generally sparser than anatomy-based expectations (Ahmadpour et al., 2019). Research suggests that the “dark neuron” problem could be attributed to neural sensors, since their actual electrode sensitivity is usually lower than anticipated; others therefore caution about the non-negligible nature of subthreshold signaling and urge pushing technical limitations further (Shmoel et al., 2016).
Validation of Spike Sorting Algorithms
It is commonly agreed that any novel algorithm must at least outperform its predecessors; however, inventing custom metrics, for instance isolation, noise overlap, or cluster signal-to-noise ratio, may bias performance evaluation (Chung et al., 2017). To prevent this, evaluation metrics should be kept as simple as possible, relying on accuracy, precision, recall, or F1 scores (Veerabhadrappa et al., 2020). However, external criteria indices, such as the Jaccard index, which compare data clustering structures, and internal criteria indices, e.g., the silhouette coefficient or the Calinski-Harabasz index, which are also meant to predict the optimal number of clusters, can all be beneficial (Zhang et al., 2018; Toosi et al., 2020) (Table 2). Another heuristic alternative to clustering accuracy is to measure stability, that is, to evaluate performance when perturbations are introduced into the testing dataset (Carlson and Carin, 2019). Besides algorithm validation, the performance of neural sensors must also be characterized: additional optical imaging with high spatiotemporal resolution can serve both purposes (Aqrawe et al., 2020).
Table 2
Formulas of the most common clustering metrics and indices.
| Category | Basis | Metric | Formula |
| --- | --- | --- | --- |
| Performance quantification | Relies on ground truth; based on confusion matrix results | Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Performance quantification | Relies on ground truth; based on confusion matrix results | Precision | $\frac{TP}{TP + FP}$ |
| Performance quantification | Relies on ground truth; based on confusion matrix results | Recall | $\frac{TP}{TP + FN}$ |
| Performance quantification | Relies on ground truth; based on confusion matrix results | F1 score | $\frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ |
| External criteria indices | Relies on ground truth; compares calculated labels with actual ground truth | Jaccard index | $\frac{\lvert D_{ground} \cap D_{calc} \rvert}{\lvert D_{ground} \cup D_{calc} \rvert}$ |
| Internal criteria indices | Requires no ground truth; gives information about cluster compactness and separation | Silhouette index | $\frac{d(x, \bar{n}) - d(x, \bar{s})}{\max(\bar{s}, \bar{n})}$ |
| Internal criteria indices | Requires no ground truth; gives information about cluster compactness and separation | Calinski-Harabasz index | $\frac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \frac{n_D - k}{k - 1}$ |
TP, true positive; TN, true negative; FP, false positive; FN, false negative; Dground, dataset of the ground truth labels; Dcalc, dataset of the calculated labels; x, given sample point.
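As a rough illustration of the Table 2 formulas, the following Python sketch computes the confusion-matrix metrics, the Jaccard index, and the Calinski-Harabasz index on toy inputs. The function names and the example numbers are illustrative assumptions and are not taken from any of the cited toolboxes; the silhouette index is left out here.

```python
from math import dist


def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def jaccard_index(ground, calc):
    """|ground ∩ calc| / |ground ∪ calc| for two collections of spike ids."""
    ground, calc = set(ground), set(calc)
    union = ground | calc
    return len(ground & calc) / len(union) if union else 1.0


def centroid(points):
    """Coordinate-wise mean of a list of equally sized feature vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def calinski_harabasz(clusters):
    """tr(B_k) / tr(W_k) * (n_D - k) / (k - 1) for a list of point lists."""
    k = len(clusters)
    all_points = [p for cluster in clusters for p in cluster]
    n = len(all_points)
    overall = centroid(all_points)
    # Trace of the between-cluster dispersion matrix.
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    # Trace of the within-cluster dispersion matrix.
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / within) * ((n - k) / (k - 1))


# Toy example: 8 correctly detected spikes, 2 false alarms, 5 misses.
scores = confusion_metrics(tp=8, tn=85, fp=2, fn=5)
overlap = jaccard_index({1, 2, 3, 4}, {3, 4, 5})
separation = calinski_harabasz([[(0, 0), (0, 2)], [(10, 0), (10, 2)]])
```

Higher Calinski-Harabasz values indicate compact, well-separated clusters, which is why the index is sometimes scanned over candidate cluster counts.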
Ground Truth
Validation of spike sorting performance is almost inconceivable without ground truth. This concept is based on a priori knowledge of the source of an action potential, more specifically, on knowing when and which neuron in the neighborhood was active (Neto et al., 2016). Ground truth data generation involves human interaction to a certain extent, which turns it into a supervised process (Wouters et al., 2021).
What can be done when “ground truthing” is not an option? The most straightforward and popular method is to generate synthetic data, for which ground truth is known by construction (Buccino and Einevoll, 2021). Hybrid ground truth is accessible through spatially oversampled data acquisition or synthesis (Pachitariu et al., 2016; Scholvin et al., 2016; Wouters et al., 2021), and enriching recorded data with previously isolated or artificial spikes, termed the hybrid method, is also an option (Yger et al., 2018). Classical approaches include manual labeling or simultaneous recording of intra- and extracellular activity, with the latter providing ground truth for a single neuron (Fournier et al., 2016; Diggelmann et al., 2018; Abbott et al., 2020). Recently, automated in vivo patch clamping has lightened the workload, but human intervention cannot yet be eliminated (Kodandaramaiah et al., 2016; Allen et al., 2018). Simultaneous juxta- and extracellular recordings are also applied instead of in vivo patch clamping (Hunt et al., 2019; Magland et al., 2020; Urai et al., 2021). An appealing substitute for ground truth is to follow the propagation of extracellular action potentials along single axonal arbors over the long run and to assess their correlation with relatively stable extracellular action potentials; in this way, empirical ground truth can be obtained (Tovar et al., 2018). A last compromise when ground truth is lacking is to apply the internal criteria indices presented previously (Zhang and Constandinou, 2021a). This choice may be overshadowed by certain tacit assumptions of internal criteria, such as the Gaussian nature of the noise distribution.
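The hybrid idea of enriching a recording with artificial spikes can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the procedure of any cited study: the three-sample template, the bounded uniform background noise, the fixed threshold, and all function names are hypothetical choices made only to keep the example short and deterministic.

```python
import random


def inject_spikes(trace, template, onsets):
    """Return a copy of `trace` with `template` added at each onset sample."""
    hybrid = list(trace)
    for onset in onsets:
        for offset, value in enumerate(template):
            hybrid[onset + offset] += value
    return hybrid


def detect_threshold_crossings(trace, threshold):
    """Indices where the trace rises from below to at-or-above threshold."""
    return [i for i in range(1, len(trace))
            if trace[i] >= threshold > trace[i - 1]]


def match_to_ground_truth(detected, truth, tolerance):
    """Greedy one-to-one matching; returns (TP, FP, FN) counts."""
    unmatched = sorted(truth)
    tp = 0
    for t in sorted(detected):
        hit = next((g for g in unmatched if abs(g - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    return tp, len(detected) - tp, len(unmatched)


rng = random.Random(0)
# Bounded toy noise (an assumption; real background noise is not uniform).
background = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
template = [3.0, 20.0, 6.0]   # hypothetical waveform, peak at offset 1
onsets = [100, 400, 700]      # injection times serve as ground truth
hybrid = inject_spikes(background, template, onsets)

detected = detect_threshold_crossings(hybrid, threshold=10.0)
tp, fp, fn = match_to_ground_truth(detected, onsets, tolerance=2)
```

Because the injection times are known, the resulting TP, FP, and FN counts can feed directly into precision, recall, and F1 scores.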
Simulated Datasets
For the past 15 years, the well-known wave_clus dataset (Quian and Nadasdy, 2004) has proved to be a reliable source for algorithm validation; moreover, its widespread use for this purpose has ensured a feasible and fair basis for comparing clustering procedures. Analogous but less-known synthetic datasets also allow proper validation (Rutishauser et al., 2006; Pedreira et al., 2012; Rossant et al., 2016). It is becoming increasingly popular to generate custom datasets, which makes it possible to set various complexity levels or recording site geometries (Smith and Mtetwa, 2007; Camuñas-Mesa and Quiroga, 2013; Hagen et al., 2015; Mondragón-González and Burguière, 2017; Buccino and Einevoll, 2021), but when producing synthetic waveforms that ideally mirror recorded ones, one should also pay attention to cell morphology and to the configuration and density of membrane ion channels (Tran et al., 2020). Attention to previously less-considered aspects, such as layer inhomogeneity and frequency dependence, has resulted in trailblazing dataset-simulating environments (Gherardi and Toreyin, 2021) (Table 3).
Table 3
Most relevant and widely used synthetic datasets for spike sorting algorithm validation.
| References | Specifications |
| --- | --- |
| Quian and Nadasdy (2004) | wave_clus dataset: activity of 3 simulated neurons over 4 difficulty levels. The noise level is well-defined in each case. |
| Rutishauser et al. (2006) | 3 simulated datasets, each containing 3 neurons, with the following scopes: (1) parameter evaluation for the algorithm tested; (2) limits of detectability; (3) limits of discriminability. |
| Smith and Mtetwa (2007) | Biophysical model incorporating signals that closely mimic in vitro single unit activities, adding non-Gaussian noise and featuring spatial configuration characteristics between neurons and recording electrodes. |
| Pedreira et al. (2012) | 95 simulations containing the action potentials of 2 to 20 neurons. The background activity appears as multiunit activity of varying weighting. |
| Camuñas-Mesa and Quiroga (2013) | A tetrode simulation with neural and thermal noise incorporated; the three-dimensional aspect is enhanced by the “neurons inside a cube” concept. |
| Hagen et al. (2015) | ViSAPy: in vivo-like signals generated with multi-compartmental neuron models. A Python-based software is also available. |
| Rossant et al. (2016) | Klusta: test dataset available for the algorithm proposed in the same paper. |
| Tran et al. (2020) | NEURON-based model software employing a morphological filter for signal accuracy. |
| Gherardi and Toreyin (2021) | NEURON-based model software with line-source approximation improvements. |
| Buccino and Einevoll (2021) | MEARec: signal generation considering problematic spike aspects; several electrode designs. |
Note that while the pioneering datasets were constructed to test a particular algorithm, data-synthesizing software, with all its custom implementations, prevails these days.
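To make the notion of a custom synthetic dataset concrete, the sketch below generates labelled spike times for a few simulated units. It is a deliberately simple toy, far less detailed than the simulators listed in Table 3: the exponential inter-spike intervals with a fixed dead time, the firing rates, and the function names are all assumptions chosen for illustration, and no waveforms or electrode geometry are modelled.

```python
import random


def refractory_poisson_train(rate_hz, duration_s, refractory_s, rng):
    """Spike times in seconds: exponential intervals plus a fixed dead time."""
    times, t = [], 0.0
    while True:
        t += refractory_s + rng.expovariate(rate_hz)
        if t >= duration_s:
            return times
        times.append(t)


def simulate_ground_truth(rates_hz, duration_s=10.0, refractory_s=0.002,
                          seed=0):
    """Merged, time-sorted list of (spike_time, unit_id) ground-truth labels."""
    rng = random.Random(seed)
    labelled = []
    for unit_id, rate in enumerate(rates_hz):
        train = refractory_poisson_train(rate, duration_s, refractory_s, rng)
        labelled.extend((t, unit_id) for t in train)
    return sorted(labelled)


# Three hypothetical units with different firing rates.
spikes = simulate_ground_truth(rates_hz=[5.0, 10.0, 20.0])
```

Fixing the seed makes the dataset reproducible, so different sorting algorithms can be compared on exactly the same ground truth.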
In vivo Datasets
Despite lacking ground truth information, in vivo recorded datasets are highly valuable, because all the features that simulated datasets must reproduce are inherently present in them. To mention a few, for rat cortical recordings, 32- or 128-channel polytrode (Neto et al., 2016), 128-channel (4 × 32) array (Horváth et al., 2021), and combined Neuropixels and patch clamp recordings (Marques-Smith et al., 2018) are available. A dataset recorded with Utah arrays in non-human primates executing different tasks is also suitable for single-unit activity clustering (Brochier et al., 2018). Human data are regularly acquired from patients whose epileptic seizure onset zone is under investigation. Among these, amygdala neurons during visual/emotional stimulation (Fedele et al., 2021) or medial temporal lobe populations during verbal working memory tasks were recorded by intracranial EEG and technically validated later (Boran et al., 2020) (Table 4).
Table 4
Most recent in vivo datasets suitable for validation of spike sorting algorithms.
| References | Source | Recording device | State or task | Dataset |
| --- | --- | --- | --- | --- |
| Neto et al. (2016) | Rat cortex | 32- or 128-channel polytrode | Under anesthesia | 23 neurons with juxtacellular pipette + nearby electrode |
| Marques-Smith et al. (2018) | Rat cortex | Patch clamp next to Neuropixels probe | Under anesthesia | 43 paired recordings |
| Boran et al. (2020) | Human medial temporal lobe | Depth electrode with 8 contacts | Working memory task | 9 patients, 37 recording sessions |
| Brochier et al. (2018) | Macaque motor cortex | 10-by-10 Utah arrays | Reach-and-grasp task | 2 macaques, 93 and 156 single unit activities |
| Fedele et al. (2021) | Human amygdala | Depth electrode with 8 contacts | Various emotional situations | 9 patients, 14 amygdala recordings |
| Horváth et al. (2021) | Rat neocortex | 128-channel polytrode | Under anesthesia | 20 rats, 109 recordings, 7,126 sorted spikes |
While there is a clear aspiration for acquiring continuous long-term recordings with as many channels as possible, and preferably although not necessarily from humans, all criteria cannot always be met. However, datasets focusing on just one of these aspects can reveal important strengths and weaknesses of a spike sorting algorithm under validation.
Discussion
On the Necessity of Spike Sorting
Spike sorting is not art for art's sake, since its applications are visibly boosting contemporary neuroscience: it has become essential for extracting individual neuronal activities from multi-electrode data, because each electrode reports the collective activity of multiple nearby neurons. But why is this task so crucial? Neighboring neurons are not necessarily the most closely connected, i.e., they can be activated by different pathways or stimuli (Rey et al., 2015), mainly owing to information processing and energy optimization through structural solutions (Pregowska et al., 2019); this is what holds out the promise of neural decoding. Neural decoding, although not equivalent to spike sorting, relies heavily on it and therefore inherits the risk of bias and incorrect intensity estimation introduced by spike sorting (Shibue and Komaki, 2017). Spike sorting also enables statistical analyses involving correlograms, inter-spike intervals, or spike rates (Veerabhadrappa et al., 2020).
Spike sorting classically comprises manual sorting alongside automated methods (Dai et al., 2019). Because it is time-consuming and prone to subjective bias (Febinger et al., 2018), manual sorting has largely been left behind; consequently, there is a pressing need for fully automated algorithms (Chah et al., 2011). High-density microelectrode arrays foster improving classification accuracies; however, computational capacity must keep pace with recording performance (Chen et al., 2021). Several authors, most of them stressing computational cost, argue against the pertinence of spike sorting. As may be expected, the alternatives to spike sorting all have their strengths and limitations. According to those who would rather bypass spike sorting, it can be omitted for neural decoding of motor imagery tasks when the sum (Li and Li, 2017) or moments (Sonia et al., 2014) of waveform features are calculated; however, even these methods fall short of real-time reconstruction. Similarly, frequency spectrum maps together with temporal energy heatmaps can predict imagined finger movements, but only within a well-defined force amplitude interval (Xu et al., 2020).
Spike sorting is indeed not compulsory when population activity is targeted (Trautmann et al., 2019); brain-computer interface systems that rely on multi-unit activity may therefore settle for less complex preprocessing techniques. One such method is “binning,” by which firing rates are estimated in a fixed time window (Ahmadi et al., 2020), while marked point models (Yousefi et al., 2020) or interspike interval histograms combined with power spectral density estimation allow complete firing patterns to be investigated (Guo et al., 2020): these methods are advantageous when the tuning curves of single neurons are distributed bimodally, as in the case of murine head direction cells (Liu and Lengyel, 2021). Whenever a clustering-free method is applied, one should keep in mind that its goodness-of-fit evaluation may differ from that of mainstream clustering algorithms (Tao et al., 2018). Nonetheless, at the level of individual cells, properties cannot be completely evaluated with spike sorting algorithms (Rossi-Pool and Romo, 2019). To conclude, possible conflicts between spike sorting methods or data should not discourage us from performing spike sorting whenever there is a clear indication for it.
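The “binning” approach and the interspike intervals mentioned above are simple enough to sketch directly. The Python snippet below is a minimal illustration with hypothetical function names and made-up spike times; it treats all threshold crossings on a channel as one unsorted multi-unit train, which is an assumption of this example rather than a description of any cited method.

```python
def binned_firing_rates(spike_times_s, duration_s, bin_width_s):
    """Firing rate (spikes/s) in consecutive fixed-width time windows."""
    n_bins = int(round(duration_s / bin_width_s))
    counts = [0] * n_bins
    for t in spike_times_s:
        index = int(t // bin_width_s)
        if 0 <= index < n_bins:   # ignore spikes outside the recording
            counts[index] += 1
    return [count / bin_width_s for count in counts]


def interspike_intervals(spike_times_s):
    """Differences between consecutive spike times, after sorting."""
    ordered = sorted(spike_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Made-up multi-unit spike times (seconds) from a 2 s recording.
multi_unit = [0.1, 0.2, 0.6, 1.4, 1.45, 1.9]
rates = binned_firing_rates(multi_unit, duration_s=2.0, bin_width_s=0.5)
isis = interspike_intervals(multi_unit)
```

The bin width trades temporal resolution against the variance of the rate estimate, so it is usually tuned to the decoding task at hand.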
Implementations of Spike Sorting
As the demand for spike sorting is clear-cut, we further stress the relevance of these algorithms by presenting research fields that greatly benefit from them. Spike sorting is mostly regarded as the essential step toward functional brain-machine interfaces and micro-electronic neural bridges (Huang et al., 2016), but it is also highly relevant in epilepsy research (Neumann et al., 2017; Richner et al., 2019) and the study of sleep (Kozák et al., 2020; Matsumoto, 2020). Besides the commonly studied single unit activity of the primary motor cortex, clustering in areas where visual coding or multisensory integration takes place is an intuitive approach in vision and adaptive behavior studies (Reber et al., 2019; Mizuhiki et al., 2020; Steinmetz, 2020). Cerebellar spikes should also be detected and clustered; however, clustering is made more difficult by the intricate morphology of Purkinje cells (Markanday et al., 2020). Complex spikes are also encountered in the thalamus, and spike sorting is one way to learn about its electrophysiological properties (Pastor and Vega-Zelaya, 2020). In the subthalamic nucleus, clustering of spike trains may help to understand the pathophysiology of movement disorders (Kaku et al., 2019; Sukiban et al., 2019), while the basolateral amygdala or the hippocampus can offer insights into general neural interactions (Hojjatinia et al., 2020; Oghazian et al., 2020). Correlations calculated after rigorous spike sorting of multichannel data gave rise to a promising hypothesis on neural encoding capacity (Isbister et al., 2021) and to insights into neural population activity dynamics (Theilman et al., 2021).
Furthermore, spike sorting may be just as pivotal outside of the brain. Remaining at the level of the central nervous system, superficial dorsal horn neurons of the spinal cord reveal much about spinal plasticity, and sorting their activities may ensure the isolation of relevant populations (Smith et al., 2020). In peripheral nervous system recordings, discrimination and identification of action potentials cannot do without spike sorting either (Metcalfe et al., 2021). Spike sorting is required to isolate retinal ganglion cells based on their multiunit activity recordings (Tsai et al., 2017; Pérez-Ortega et al., 2021) and to identify their electrical responses (Li et al., 2021), but non-neural tissues can also benefit from spike sorting algorithms, applied, e.g., to pancreatic biosignals (Iniguez-Lomeli et al., 2021).
Spike sorting has its own advantages during in vitro experiments, too. Cerebral organoids, also known as Minibrains, combined with spike sorting enable the study of neuro/gliogenesis and connectivity (Govindan et al., 2021), while spike sorting in its intracellular variations can elucidate synaptic plasticity mechanisms (Ghanbari et al., 2017). Recently, spike sorting proved to be indispensable when performing optogenetic stimulation on MEA-supported brain slices (Sacher et al., 2021).
Up to this point, we have presented spike sorting as a complete procedure that seeks to assign a signal to a particular source, but it is equally valid to regard it as one piece of the puzzle in solving another problem. Spike sorting can be embedded in synaptic connectivity estimation algorithms, thus helping to construct neuronal circuit diagrams (Endo et al., 2021). By combining spike sorting with phasic unit selection, we can recognize firing patterns in structures that have timekeeping properties (Chrobok et al., 2021).
Past Conclusions, Current Improvements, and Future Ideas
Our study intended to outline some of the most relevant issues that shape current spike sorting trends. Despite countless attempts and various strategies, clear spike sorting standards and algorithm comparison methods are still lacking, and any effort toward unified spike sorting frameworks and accuracy quantification metrics should be welcomed (Rey et al., 2015; Smith et al., 2020). While focusing closely on a set of algorithms that ascribe a well-defined signal to its presumed emitter, we should not lose sight of the bigger picture and should therefore also consider the non-independent nature of neural activities (Urai et al., 2021). In the search for the gold standard algorithm, we should also leave room for neural network-based or custom-tailored solutions, without which we cannot excel when miscellaneous temporal or spatial particularities are present (Vu et al., 2018; Sedaghat-Nejad et al., 2021). The authors of this article believe that fostering the concept of incremental learning in spike detection and clustering methods could offer a panacea for most difficulties arising during long-term recording analysis, since adaptation to changing circumstances would not demand repeated training on previous samples but would focus on the gradually growing set of data without having to worry about memory restrictions.
Another aspect that opens new perspectives is the myriad of neural sensors at our fingertips. As can now be anticipated, CMOS technologies could fulfill the requirements for implantable long-term sensors, and their ultra-low power consumption, neuromorphic design, and, lately, their capability to self-repair should be exploited (Rahiminejad et al., 2021). Even though a great variety of recording devices provides us with data of ever-improving signal-to-noise ratios and diminishing invasiveness, there are fundamental questions about extracellular action potentials that remain unanswered. In particular, a probe's interference with the adjacent neural tissue, a possible bias toward certain neuron subclasses, and the correspondence between neuron types and extracellular signatures are hotly debated topics (Neto et al., 2016). None of these subjects stands alone, since reflecting on the limitations of recording devices could also bring about better-built BMI systems.
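The incremental learning idea raised in this section, namely adapting to new spikes without retraining on previous samples, can be illustrated with a very small online clustering sketch. This is not a method from the cited literature: the class name, the running-mean centroid update, and the fixed distance threshold for opening a new unit are all assumptions made for the sake of a short example.

```python
from math import dist


class IncrementalClusterer:
    """Online nearest-centroid clustering that never revisits past samples."""

    def __init__(self, new_cluster_distance):
        self.new_cluster_distance = new_cluster_distance
        self.centroids = []   # one running mean per putative unit
        self.counts = []      # number of spikes absorbed by each unit

    def partial_fit(self, features):
        """Assign one feature vector, update that centroid, return its label."""
        if self.centroids:
            label = min(range(len(self.centroids)),
                        key=lambda i: dist(features, self.centroids[i]))
            if dist(features, self.centroids[label]) <= self.new_cluster_distance:
                self.counts[label] += 1
                step = 1.0 / self.counts[label]
                # Running mean: move the centroid toward the new sample.
                self.centroids[label] = [
                    c + step * (x - c)
                    for c, x in zip(self.centroids[label], features)]
                return label
        # No centroid is close enough: open a new putative unit.
        self.centroids.append(list(features))
        self.counts.append(1)
        return len(self.centroids) - 1


clusterer = IncrementalClusterer(new_cluster_distance=5.0)
stream = [(0.0, 0.0), (10.0, 10.0), (2.0, 0.0), (10.0, 12.0)]
labels = [clusterer.partial_fit(sample) for sample in stream]
```

Only the centroids and counts are stored, so memory use stays constant per unit regardless of recording length; replacing the `1 / count` step with a fixed learning rate would let centroids follow slow waveform drift.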
Author Contributions
RB conceived the study, conducted the relevant literature search, and wrote the first draft. JR and GM contributed to and supervised the drafting process. JR, GM, RF, IU, and DM revised the manuscript and suggested modifications. RF provided data for figure generation. JR helped with data formatting. All the authors contributed, read the manuscript, and approved the submitted version.
Funding
Project no. FK132823 has been implemented with the support provided by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the FK_19 funding scheme. This research was also funded by the Hungarian Brain Research Program (2017_1.2.1-NKP-2017-00002) and the TUDFO/51757-1/2019-ITM grant by the Hungarian National Research, Development and Innovation Office. JR is thankful to Semmelweis University for the EFOP-3.6.3-VEKOP-16-2017-00009 grant and to the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund for the ÚNKP-21-3-II-SE-1. Project no. 134196 has been implemented with the support provided by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the PD_20 funding scheme.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.