Dario Dematties1,2, Chenyu Wen3,4, Shi-Li Zhang5. 1. Northwestern Argonne Institute of Science and Engineering, Northwestern University, 2205 Tech Drive Suite 1-160, Evanston, 60208 Illinois, United States. 2. Mathematics and Computer Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Lemont, 60439 Illinois, United States. 3. NanoDynamicsLab, Laboratory of Biophysics, Wageningen University, Stippeneng 4, Wageningen 6708 WE, The Netherlands. 4. Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands. 5. Department of Electrical Engineering, Uppsala University, Lägerhyddsvägen 1, 752 37, SE-751 03 Uppsala, Sweden.
Abstract
Pulse-like signals are ubiquitous in the field of single molecule analysis, e.g., electrical or optical pulses caused by analyte translocations in nanopores. The primary challenge in processing pulse-like signals is to capture the pulses in noisy backgrounds, but current methods are subjectively based on a user-defined threshold for pulse recognition. Here, we propose a generalized machine-learning based method, named pulse detection transformer (PETR), for pulse detection. PETR determines the start and end time points of individual pulses, thereby singling out pulse segments in a time-sequential trace. It is objective without needing to specify any threshold. It provides a generalized interface for downstream algorithms for specific application scenarios. PETR is validated using both simulated and experimental nanopore translocation data. It returns a competitive performance in detecting pulses through assessing them with several standard metrics. Finally, the generalization nature of the PETR output is demonstrated using two representative algorithms for feature extraction.
Pulse-like signals are ubiquitous in the field of single molecule analysis, e.g., electrical or optical pulses caused by analyte translocations in nanopores. The primary challenge in processing pulse-like signals is to capture the pulses in noisy backgrounds, but current methods are subjectively based on a user-defined threshold for pulse recognition. Here, we propose a generalized machine-learning based method, named pulse detection transformer (PETR), for pulse detection. PETR determines the start and end time points of individual pulses, thereby singling out pulse segments in a time-sequential trace. It is objective without needing to specify any threshold. It provides a generalized interface for downstream algorithms for specific application scenarios. PETR is validated using both simulated and experimental nanopore translocation data. It returns a competitive performance in detecting pulses through assessing them with several standard metrics. Finally, the generalization nature of the PETR output is demonstrated using two representative algorithms for feature extraction.
Single-molecule analysis (SMA)
technologies are developed to interrogate individual molecules so
as to gain high-fidelity information about them, a task often difficult
or even impossible to attain using ensemble averages.[1] They offer a powerful toolbox for direct observation of
molecule dynamics in nanoscale, e.g., in peptide/protein
folding,[2,3] protein dynamics,[4,5] single-ion
electrochemical reactions,[6,7] DNA hybridization,[8,9]etc. Pulse-like signals are ubiquitously found
in the field of SMA. They carry comprehensive information about the
concerned molecules including their dynamics and interactions with
the surroundings. They can be in form of, e.g., variations
in electrical current through protein ion channels in cell membranes,[10] alternations in luminance caused by analyte
translocations in nanopores,[11] and changes
in electrical current related to single-molecule electrochemical reactions
on nanoscale electrodes.[12] As a typical
example, nanopore sensors have been used for molecular analysis at
the single-molecule level,[13] such as DNA
and protein sequencing,[14,15] protein profiling,[16] peptide recognition,[17] and small molecule detection.[18] In a
nanopore, the passage of single molecules, i.e.,
the analytes, leads to temporal blockages of the pore and, therefore,
sporadic appearance of spikes on the ionic current in electrical readout
or changes in luminance in optical readout. Abundant information about
the translocating analytes is hidden in the fluctuating monitoring
ionic current contributed from interactions between the analytes and
the nanopore.[19] Such subtle informative
details in the signal are inevitably affected by noise and physical
limitation of the signal readout such as the bandwidth.[20] Hence, a prerequisite to analyze them for various
purposes, such as feature extraction and classification, is to be
able to single them out from the noisy background.The commonly
adopted methods to detect the pulses from time-sequential
traces are based on user-defined thresholds referring to the baseline.[21−23] Additional mechanisms are involved to self-adapt to variations or
fluctuations of the baseline, typified by dynamic window average[21,24] and iterative detection.[22] By detection
of abrupt changes in the current traces, the CUSUM algorithm tolerates
the baseline fluctuations to a certain extent and adjusts the threshold
accordingly.[25] A pitfall with these approaches
is that they do not thoroughly resolve the subjectivity problem during
the spike recognition, and some predefined parameters are needed, e.g., size of the average window, rough amplitude of the
spikes or steps. Other algorithms, i.e., ADEPT[26] and second-order-differential-based calibration
(DBC) with an integration method,[27] can
also predict the time points at which spikes appear. They usually
require a prior rough knowledge of the spike position so that a segment
containing a single spike can be singled out for further precise fitting.
Furthermore, several advanced machine learning algorithms have been
developed with a focus on specific tasks of processing pulse-like
signals, such as denoising, feature extraction, and classification.
However, machine learning, including artificial neural networks (NNs),
has seldom been involved in the most essential part of signal processing
in this procedure: pulse recognition.[23] Different from the classification task, pulse recognition requires
acquisition of pulse features from a noisy background in a time sequential
manner, which poses a challenging task to directly adopting commonly
used NN structures including convolutional neural networks (CNN) and
fully connected deep neural networks. The hidden Markov model is one
such attempt to detect pulses from nanopore sensing signals. However,
an initial pre-processing step is required to enable the detection,
and it involves the use of a user-defined criterion to determine the
approximate position of current blockage events.[28] The determination of this threshold is based on user experience
and, therefore, varies from case to case.[29] Obviously, objective algorithms are desired for isolating pulses
from noisy time-sequential traces. Our recently introduced NN-based
algorithm named Bi-path Network (B-Net)[29] has addressed this subjectivity issue. B-Net is composed of two
branches, each one using a residual neural network (ResNet) structure.
The novelty with B-Net lies in its assignment of a different task
to each branch, one counting the number of pulses in signal segments
while the other measuring the features of the pulses in the segments.
This design gains some inspiration from certain streams in the brain
in which different pathways handle specific tasks. For instance, the
ventral/dorsal streams in the visual system process the “what/where”
of objects. In this way, the training process is easier for each branch
and the generalization performance is increased. Unfortunately, B-Net
can only predict averaged features, such as amplitude and duration,
of the pulses in input data. It falls short in singling out pulses
in temporal windows. A potential drawback with only obtaining average
features is loss of information about the analytes.Here, a
deep learning (DL) method is proposed for pulse recognition.
It is capable of predicting the start and end time points of a pulse.
The duration of the pulse can, thus, be naturally obtained. The method,
named pulse detection transformer (PETR), is based on a transformer
architecture. Primarily, transformers have achieved state-of-the-art
results in many natural language processing tasks.[30,31] Attentional maps of transformers have been applied with outstanding
success to accurately predict protein structures, an important research
problem that had been open for more than 50 years.[32] In recent years, this approach has seen increasing applications
in computer vision (CV) as well.[33] An architecture
combining a transformer NN with a CNN is adopted in the present work.
We focus on the application of this architecture in the field of object
detection.[34] The original application is
adapted from two-dimensional (2D) object detection in bidimensional
images and implemented for one-dimensional (1D) pulse detection in
unidimensional temporal signals. Our method simplifies the detection
pipeline by removing many hand-designed components commonly found
in object detection architectures.[34] Furthermore,
we avoid complex training procedures with many independent stages
in the pipeline and propose a new architecture that can be easily
trained with the end-to-end philosophy that has led to the significant
success in DL.[35] Crucially, the need for
subjective user-defined thresholds is eliminated.[21,22]Specifically, we use electrical nanopore sensing signals, i.e., the ionic current traces from nanopore sensors, as
typical examples to evaluate PETR. Following the idioms in nanopore
sensing, “spike” is usually used to refer to “pulse”
as the signal. In this work, “spike” and “pulse”
are used interchangeably. We will generate synthetic datasets of analyte-translocating
nanopores and use them for training and validating the PETR. Afterward,
we use experimental datasets of DNA- and streptavidin-translocating
solid-state nanopores to further evaluate PETR by systematically comparing
the results to the counterparts of B-Net and threshold-based traditional
methods. Moreover, two representative algorithms known in the community
are utilized to process these segments and extract features in order
to demonstrate the generalized usage of the output pulse segments
from PETR.To evaluate the performance of PETR, we refer to
the concept of
mean average precision (mAP) from object detection in CV and adapt
it to the requirements posed by nanopore translocation signals. The
performance of the PETR is better for longer translocation durations.
Finally, the performance for narrow translocation events can be boosted
by artificially transforming them into longer translocations with
larger apparent durations by means of an interpolation process.Therefore, this work offers an objective spike recognition algorithm
that delivers an accurate prediction of each spike in a temporal window
from a noisy nanopore translocation trace with an mAP close to 1.
PETR achieves it through returning spike segment predictions by specifying
the start and end time points of each pulse in a signal window. This
proposed method is expected to have a significant impact in the SMA
research community since it offers a flexible generalized interface
toward downstream algorithms such as feature extraction and classification.
Results
Model and Implementation
PETR is developed as an algorithm
that can help detect characteristic events in time-sequential 1D noisy
traces. This network is inspired by the model called detection transformer.[34] The architecture of PETR is depicted in Figure . The network receives
a 1D temporal window, which is a chunk of a noisy trace and returns
a set of bounding segment predictions. The bounding segments are composed
of start and end time points that predict the location of a pulse
in the temporal window generated by, e.g., a translocation
of an analyte in a nanopore. PETR consists of four main blocks: a
pulse counter, a backbone, a transformer, and a feed forward network.
The original detection transformer (DETR) architecture was developed
with the ultimate purpose of being readily implementable in any DL
framework that provides a common CNN backbone, a clear distinction
from many modern detectors.
Figure 1
PETR as an algorithm with the capacity of detecting
distinctive
acute events into 1D noisy signals. It uses the feature estimation
path, ResNet 2, of our previously developed B-Net model as the backbone.
The backbone is a pre-trained network fine-tuned to return better
1D representations that are more adapted to the detection task. The
outputs from the backbone are first added up to a 1D positional encoded
vector and then passed to the transformer encoder. The transformer
decoder receives a certain number of learned embeddings (called pulse
queries) and returns a set of output embeddings while attending to
the encoder outputs. Each decoder output is passed to an FFNN that
predicts two different aspects of a detection. A detection class can
be pulse (colored) or no-pulse (gray) and a bounding segment.
PETR as an algorithm with the capacity of detecting
distinctive
acute events into 1D noisy signals. It uses the feature estimation
path, ResNet 2, of our previously developed B-Net model as the backbone.
The backbone is a pre-trained network fine-tuned to return better
1D representations that are more adapted to the detection task. The
outputs from the backbone are first added up to a 1D positional encoded
vector and then passed to the transformer encoder. The transformer
decoder receives a certain number of learned embeddings (called pulse
queries) and returns a set of output embeddings while attending to
the encoder outputs. Each decoder output is passed to an FFNN that
predicts two different aspects of a detection. A detection class can
be pulse (colored) or no-pulse (gray) and a bounding segment.In our case, the modularity and simplicity in the
original implementation
of DETR[34] have been adopted by taking advantage
of a detection model whose main property is its end-to-end trainability.
We have further adapted the original architecture for 1D “images”
via decoupling the original backbone and using instead the feature
prediction path in our previously developed B-Net,[29] an architecture composed of two ResNets. ResNet 1 predicts
the number of pulses in a temporal window, while ResNet 2 predicts
the average duration and amplitude of all the pulses in the window.
Similarly, ResNet 1 in our algorithm PETR predicts the number of pulses
in the input temporal window. When and only when the number of pulses
is larger than zero, the complete system, i.e., the
backbone, transformer, and prediction heads, will process the temporal
window. Otherwise, the operation of the system is disabled, and zero
predicted segments are returned. ResNet 2 is used as the backbone
since it has already been trained in B-Net to condense the abstract
information from raw signals. The transformer encoder receives the
output from the CNN in ResNet 2. It is worth noting that PETR is not
an extension of B-Net. PETR’s core unit, the transformer, is
totally different from B-Net. Here, only the functional components
are disassembled from B-Net and reused in the PETR peripherals for
convenience. One could also independently train a normal CNN as the
backbone of PETR, instead of ResNet 2, and use other preprocessing
techniques to filter out the blank segments without pulse instead
of ResNet1. The use of a pretrained backbone in these kinds of architectures
is a recurrent practice in DL.[34]As to bounding boxes, bounding segments have been utilized in our
study to reshape the concept of bounding boxes from 2D to 1D. PETR
uses a detection architecture whose performance is not influenced
by human conducted heuristics. The network objectively abstracts the
main features from the noisy signals in order to attain the best detection
experience. The self-attention mechanism of the transformer prototypes
all pairwise interactions among elements in the positioned backbone
output.Finally, each output embedding from the transformer
decoder is
processed by feed-forward fully connected neural networks (FFNNs)
that classify the pulses as present or absent and predict the start
and end time points of the bounding segments. The gray bounding segments
in Figure are classified
as absent pulses by the network, while the colored ones are classified
as effectively present pulses in the window.It is worth recalling
that PETR predicts all pulses in parallel,
thereby avoiding recurrence as in autoregressive models. It is trained
end-to-end using a set of loss functions such as bipartite matching
between predicted and ground truth pulses. When this kind of architecture
predicts a bounding segment in a certain section of a trace chunk,
such a prediction is not just based on the patterns nearby the prediction
location but also determined by paying attention to all the surroundings
inside the window. The prediction is in fact influenced by, e.g., the frequency, amplitude, and duration of the pulses
in all the surroundings of the prediction location.We generate
artificial datasets and use them for network training,
validation, and testing. In each dataset, three important parameters
are systematically varied, i.e., the diameter of
the analytes (15 kinds), the concentration of the analytes (20 kinds)
and the duration of the translocation spikes (5 kinds). In order to
measure the PETR performance, the datasets with different signal-to-noise
ratios (SNR) ranging from 4 to 0.25 are also generated. In addition,
two experimental datasets of λ-DNA and streptavidin-translocating
solid-state nanopores are introduced to further validate the PETR
fidelity. As a final demonstration of the benefits from its generalized
output, two additional feature extraction algorithms are employed
as use cases. Details of the data preparation can be found in the Methods section.The training process starts
by providing a random batch of temporal
windows from the training data. Only temporal windows with at least
one pulse are used for training. Once the total number of temporal
windows is consumed in the dataset, one epoch is completed. Inside
each epoch, the learning rate is set to 1 × 10–5, with a learning rate decay of 10 in a period of 100 epochs. We
validate our model periodically by utilizing mAP, which is a widely
used performance metric for object detection in 2D images.[36] The model with the highest mAP is saved as the
best representation.An adapted mAP is used for evaluating the
model. Instead of using
intersection over union (IoU) as a threshold, the relative distance
between the midpoint of both the predicted pulse and the ground truth
is considered, referring to the duration of the ground truth pulse.
A 100% threshold means that their distance is equal to the duration
of the ground truth pulse. The associated mAP is calculated as an
average by varying the relative distance thresholds from 100 to 400%,
with steps of 10%. Any matched pair of the predicted pulse and the
corresponding ground truth with smaller distance than such threshold
is considered a true positive. Compared to the IoU thresholds adopted
in the standard mAP, the adopted thresholds are more tolerant. However,
the adapted mAP in the pulse detection scenario appropriately reflects
the necessary requirements in order to catch nanopore translocation
events in trace windows. Furthermore, coverage, which is defined as
the total number of true positives divided by the total number of
ground truth pulses, is calculated as another metric for pulse detection
performance. In addition, the duration error as well as the start
and end time errors are computed as indirect performance measures
of PETR.
Performance on the SNR = 4 Dataset
The PETR performance
is evaluated for our artificially generated dataset with SNR = 4.
Several typical examples of the detected spikes are depicted in Figure a with the start
and end time points respectively marked in red and green stars predicted
by our model. The adapted mAP of the detection of the spikes with
different widths (i.e., translocation duration) is
displayed in Figure b. PETR returns an almost-perfect detection with mAP ≈ 1 for
the long spikes, e.g., duration ≥ 1.5 ms.
However, the detection precision for spikes shorter than 1.5 ms decays
rapidly with decreasing duration. The overall mAP on the entire dataset
is 0.85. Evaluating our model by employing the standard mAP produces
results with an mAP of ≥0.3 for spikes of 5 ms duration, which
is comparable to detectors used in other application scenarios.[34,37−39] Details of the performance measured by the standard
mAP can be found in Supporting Information (SI).
Figure 2
Results of PETR processing the SNR = 4 dataset. (a) Typical examples
of the PETR output. In the current trace segment, PETR predicts the
start and end time points of the spikes, marked as red (start) and
green (end) stars, respectively. (b) mAP and (c) coverage of the spike
detection at different durations. (d) Relative errors of predicted
duration and (e) errors of start and end time points for spikes at
different durations. In each sub-figure, the corresponding overall
mean and standard deviation on the entire dataset are included. (f)
Box chart showing the distribution of extracted durations for the
spikes at different set durations in the dataset. (g) Box chart showing
the distribution of spike appearance frequency for the signal generated
at different concentrations of analytes. In (f) and (g), pink dots
(average values) with error bars (spread) mark the corresponding quantities
of the ground truth, indicating an excellent agreement between PETR
predictions and ground truths.
Results of PETR processing the SNR = 4 dataset. (a) Typical examples
of the PETR output. In the current trace segment, PETR predicts the
start and end time points of the spikes, marked as red (start) and
green (end) stars, respectively. (b) mAP and (c) coverage of the spike
detection at different durations. (d) Relative errors of predicted
duration and (e) errors of start and end time points for spikes at
different durations. In each sub-figure, the corresponding overall
mean and standard deviation on the entire dataset are included. (f)
Box chart showing the distribution of extracted durations for the
spikes at different set durations in the dataset. (g) Box chart showing
the distribution of spike appearance frequency for the signal generated
at different concentrations of analytes. In (f) and (g), pink dots
(average values) with error bars (spread) mark the corresponding quantities
of the ground truth, indicating an excellent agreement between PETR
predictions and ground truths.The coverage shows a similar trend with the spike
duration in Figure c. PETR has rarely
missed spikes with duration longer than 1.5 ms, as reflected by a
coverage close to 1 in the figure. The coverage drops for shorter
spikes, though it can still reach 60% for durations equal to 0.5 ms.
Thus, the overall coverage of PETR is above 90% on the entire dataset.The relative errors of the predicted duration by the start and
end time points from PETR are shown in Figure d for different duration spikes. The relative
error is smaller for longer spikes and the average error on the entire
dataset is below 9.3%. Furthermore, the absolute errors on the predicted
start and end time points are displayed in Figure e, showing an average value of ∼1.5
ms.For nanopore translocation, three features are most widely
discussed
and studied for a spike, i.e., duration, appearance
frequency, and amplitude.[40,41] The first two can be
naturally obtained by PETR as the by-products of the detection process.
The box charts of the predicted duration and frequency of the spikes
summarized in Figure f,g represent their statistical distributions on corresponding setting
parameters in the data generation, i.e., set duration
and analyte concentration, respectively. For a better comparison,
the average values of the ground truth are also shown as dotted lines
with error bars in the same figures. It is found that the predicted
values coincide well with the respective ground truths, indicating
that PETR can efficiently detect the spikes with high accuracy.An important reason for the good recognition performance is that
PETR naturally resists the baseline fluctuations. Two kinds of fluctuations
are commonly seen in single-molecule detection signals and are involved
in the artificially generated datasets as well: slow drift and sudden
jumps. The recognition of pulses by PETR is based on the pulse features
and their differences from the background noise, which is insensitive
to the baseline level shift and, thus, resistant to the slow drift
of baseline. Sudden jumps generate rapid changes of a signal, which
are similar to the rising or falling edges of pulses. One of the main
features of a pulse, the target of PETR, is that its rising and falling
edges appear in pairs, which rarely happens with sudden jumps. Even
in the atypical scenario of a rising edge and a falling edge occurring
together with a similar timing to the one found in a normal pulse
in a window, the network will not detect such a fluctuation as a pulse.
PETR acquires not only the timing of pulses but also the statistical
distribution behind their main morphological features. This feature
makes PETR immune to sporadic atypical fluctuations in the signal.
Several examples showing how PETR is immune to baseline fluctuations
can be found in the SI (Figure S3).
Datasets with Different SNRs
The PETR performance is
also evaluated on our artificially generated datasets with SNRs other
than 4. The overall mAP and coverage on all the datasets with SNR
ranging from 4 to 0.25 are summarized in Figure a. Furthermore, the relative error of predicted
duration and the absolute error of the start and end time points for
the different SNR datasets are displayed in Figure b. Details about these parameters, distributed
on different setting variables in the data generation process, such
as duration, analyte concentration, and analyte size, for each dataset
can be found in the SI.
Figure 3
PETR performance for the datasets with different
SNRs. (a) mAP
and coverage of the spike detection at different SNRs. (b) Relative
error of the predicted duration and error of the start and end time
points at different SNRs.
PETR performance for the datasets with different
SNRs. (a) mAP
and coverage of the spike detection at different SNRs. (b) Relative
error of the predicted duration and error of the start and end time
points at different SNRs.It can be clearly inferred from the charts that
the PETR performance
does not decay substantially even when SNR decreases from 4 to 1.
The mAP for SNR ≥ 1 stays above 0.8, and the coverage stays
above 87%. The relative error of duration is lower than 10.2%, and
the error of predicted start and end time points is smaller than 1.8
ms. The detection and prediction abilities of PETR sharply fall for
SNR < 1. However, even for the worst case with SNR = 0.25, PETR
can still capture 20% of the spikes and the relative error of duration
is only ∼50%. If an mAP of >0.8 and coverage of >85%
are set
as the criteria for an accurate processing at 10 kHz sampling rate,
the shortest pulses that can be detected accurately are 1, 1, 1, and
5 ms for SNR = 4, 2, 1, and 0.5, respectively. These results indicate
that PETR has an outstanding generalization ability to adapt to a
wide range of noise space.In this work, SNR-specific networks
are trained individually for
the five different SNRs studied instead of training one common network
for all data with various noise levels. This design is adopted by
considering the properties of PETR as a machine learning model: the
recognition mechanism is based on the acquisition of the statistical
distribution behind the pulse properties of the signal and the surrounding
background noises as the context. The differences between the pulse
and noise are implied in the signal–noise context. Each SNR
carries a distinct signal–noise relationship, which naturally
requires a specific set of trained weights to achieve the best performance.
On the contrary, training a common network for all SNR can gain an
added degree of automatization for data processing but largely at
the expense of detection performance. Obviously, this approach does
not bring any benefit. Furthermore, the background noise is usually
a stationary stochastic process from an experimental perspective.
Thus, the statistical properties of noise do not change considerably
in one record of the current trace or even during the entire measurement
period as long as the experimental conditions, e.g., bias voltage, analyte concentration, temperature, and surface cleanness,
are stable. In practice, well-trained networks for different SNRs
can cover most of the application scenarios. In order to compensate
for the loss of the degree of automatization due to the use of SNR-specific
networks, an extra pre-processing section can be added to detect the
rough level of SNR. Alternatively, it is also possible to label a
fraction of the target dataset and use it for determining the best-performing
model. While viable, this last option is expensive and tedious.
Compensation of Short Duration Spikes
The results shown
in Figures and 3 point to the critical dependence of PETR detection
precision and coverage on the spike width (duration). If the duration
of a spike is too short, the number of sampling points can be too
limited to ascertain its recognition from the noisy background. For
example, using a sampling rate of 10 kHz generates five sampling points
for a 0.5 ms spike and 50 sampling points for a 5 ms spike. In other
words, it is the number of sampling points, instead of the absolute
duration time span, that determines the detection performance of PETR.
Therefore, increasing the sampling rate of the data acquisition, as
well as the bandwidth of the experimental setup, can enhance the detection
performance for the short spikes. From a signal-processing perspective,
interpolation can reach a similar performance improvement. This approach
is validated using a generated dataset (SNR = 4) with 0.5 ms spikes
by linearly interpolating with nine points in between each two original
adjacent points. Hence, the apparent effective spike duration of PETR
becomes 10 times longer, i.e., 5 ms. However, the
interpolation can alter the noise characteristics and thereby interfere
with the decision-making of PETR. In order to compensate for this
adverse effect, a small amount of artificially generated noise, which
has the same components in the power spectrum as the ones in the training
dataset, is added to the interpolated data. By adding a noise component
of half of the original amplitude in the signal (measured on a small
segment of input signal traces), which only increases the total noise
power by 12 + 0.52 = 1.25 times, the resultant
SNR is only slightly worsened from 4 to 4/ = 3.58. However, this small amount of added
noise can significantly boost the detection performance of PETR because
the signal–noise characteristics become similar to that of
the original training case with SNR = 4.There are prerequisites
for the interpolation and compensation. First, the bandwidth of the
system is large enough for the target signal, i.e., the pulses, and the sampling frequency to discretize the signal
is sufficiently high in accordance to the Nyquist theorem. Second,
the general knowledge of the noise characteristics in the system should
be acquired, such as the noise components that dominate in different
frequency ranges. Here, the noise characteristics of nanopores have
been studied and noise models are well established.[42,43] Hence, the original signal with its raw data already contains sufficient
information for correct recognition. However, the data points are
often too few for PETR to perform a sensible detection of a pulse.
In detail, the number of data points in the pulse is simply insufficient
for the convolutional kernel to process them or for the transformer
to pay enough attention to them. Consequently, PETR does not perform
well on these short-duration-pulse traces. Therefore, interpolating
points that contain a small amount of noise with similar characteristics
to the noise present in the original signal can alleviate the challenge
with limited data points and improve the detection performance. It
is important to clarify, however, that the interpolating samples do
not generate additional information.Both mAP and coverage of
the PETR spike detection for the original
and interpolated 0.5 ms data are compared in Figure . The detection performance for the original
5 ms data is also shown as a reference. It is clear that both mAP
(Figure a) and coverage
(Figure b) are significantly
improved after data interpolation, and they are comparable to those
for the original 5 ms data. It indicates that after the interpolation
with a 10-fold increase in number of points, PETR “sees”
the data as if they came from the 5 ms spikes. The relative error
of duration for the interpolated data is also similar to that of the
5 ms data (Figure c). Since the errors of the start and end time points depend on the
number of points in a spike instead of the absolute value of duration
time, the higher the sampling rate equivalently achieved by interpolation,
the lower is the absolute error in time (Figure d). Details of these parameters distributed
on different setting variables can be found in the SI.
Figure 4
Comparison of the PETR performance with interpolation. (a) mAP,
(b) coverage, (c) relative error of duration, and (d) errors of the
start and end time points, for the interpolated 0.5 ms duration data
in comparison with those for the original 0.5 and 5 ms data.
Comparison of the PETR performance with interpolation. (a) mAP,
(b) coverage, (c) relative error of duration, and (d) errors of the
start and end time points, for the interpolated 0.5 ms duration data
in comparison with those for the original 0.5 and 5 ms data.
Performance Evaluation on Experimental Datasets
PETR
is also applied to processing experimental data from the translocation
of λ-DNA and streptavidin in solid-state nanopores. The duration
and appearance frequency of translocation events at various bias voltages
are shown in Figure . The latter shows an upward trend with increasing voltage in accordance
with the physics of the capture process. Higher voltage offers larger
capture area, thereby yielding higher frequency.[41] The observed constant duration of streptavidin translocation
at the bias voltages used is attributed to the limited bandwidth of
signal acquisition in our experiment; it is readily conceivable that
a high translocating speed of small molecules such as protein can
lead to a sharp and featureless spike.[44,45] The duration
does not display a monotonous trend in the DNA translocation data,
which may result from complicated interactions between long and densely-charged
DNA and nanopore.[44,46] The traditional method in which
different multiples of noise level are utilized as thresholds for
spike detection yields diversified results on both frequency and duration
(SI, Figure S15). In sharp contrast, physics-plausible,
stable, and consistent trends are obtained by PETR based on the same
experimental data (SI).
Figure 5
Results of PETR processing the experimental
datasets. Box charts
showing the PETR distribution of extracted duration at different bias
voltages for (a) DNA and (b) streptavidin translocation data. Box
charts showing the distribution of spike appearance frequency at different
bias voltages for (c) DNA and (d) streptavidin translocation. The
pink dot-on-lines (average values) with error bars (spread) in each
figure display the corresponding results from our previously developed
B-Net algorithm.
Results of PETR processing the experimental
datasets. Box charts
showing the PETR distribution of extracted duration at different bias
voltages for (a) DNA and (b) streptavidin translocation data. Box
charts showing the distribution of spike appearance frequency at different
bias voltages for (c) DNA and (d) streptavidin translocation. The
pink dot-on-lines (average values) with error bars (spread) in each
figure display the corresponding results from our previously developed
B-Net algorithm.The PETR results agree satisfactorily well with
those based on
our previous algorithm B-Net[29] (Figure ). Both algorithms
give consistent results even for the variation details of these two
parameters, duration and frequency of the translocation events, along
the different bias voltages. It is important to emphasize that the
two algorithms have completely different objectives along with distinct
NN structures and output formats. The unanimous results strongly support
that PETR is effective and reliable. Nonetheless, systematic deviations
between PETR and B-Net appear. PETR generally predicts a shorter duration
(Figure a,b), but
a higher appearance frequency (Figure c,d) than what B-Net gives. These observations consistently
point to the ability of PETR to capture more spikes of relatively
smaller amplitudes and shorter durations than B-Net. That PETR raises
the average appearance frequency and lowers the average duration can
be related to the essential difference in the core tasks of the two
architectures, PETR versus B-Net. As mentioned, B-Net predicts averaged
properties of the pulses in each input segment. This network does
not detect individual pulses, and it has a focus on the more obvious
spikes with larger amplitudes and longer durations. In contrast, PETR
detects pulses according to their distinct features from the noisy
background, i.e., the context around the pulses,
and treats pulses as individual entities. By this mechanism, PETR
could catch smaller spikes.
Demonstration of Generalized Output
The primary objective
of the PETR algorithm is to single out all spikes in a time-sequential
trace. Being endowed with the largest flexibility, the PETR outputs
in the form of spike segments can be later adopted and processed by
other algorithms for different purposes, including extracting the
features and then classifying and correlating them to the physicochemical
properties of the analytes. To demonstrate the general utility of
PETR, two established algorithms, ADEPT[26] and DBC,[27] are adopted to post-process
the spike segments singled out by PETR. In short, current segments
containing single spikes objectively recognized from PETR are directly
used as raw materials for ADEPT and DBC to more precisely determine
the spike amplitude.The ADEPT algorithm is based on the pulse
response of the nanopore system according to its equivalent circuit.
Rising and dropping periods in spikes are fitted by several exponential
functions with different time constants, leading to the extraction
of duration and amplitude of the spikes. In the DBC method, the spikes
are first fitted by a Fourier series for smoothing. The second-order
derivative of the smoothed waveform is then calculated for determination
of its extrema. The positions of the two largest minima are then
correlated to the start and end time points of the translocation events.
Finally, the amplitude is extracted by considering the area enclosed
by the spikes referring to the baseline. Detailed processing flows
for both algorithms can be found in the SI with typical examples.The extracted amplitude and duration of the spike segments by means
of the ADEPT and DBC methods show physics-plausible and stable results
(see SI, Figure S17). The spike segments
result from the PETR detection data of the λ-DNA and streptavidin
translocation datasets. In detail, the spike amplitude increases with
increasing bias voltage, which is reasonable since higher voltage
induces a larger ionic current through the nanopore. The spike duration
of λ-DNA and streptavidin follows similar trends as those extracted
directly by means of PETR for different bias voltages. The comparisons
confirm that the results from both methods, as well as the spike segments
singled out using PETR, are reliable.
Discussion
The detection of pulses using PETR is based
on the acquisition
of the signal–noise characteristics of the input by the NN.
With a synergic consideration of the abrupt changes of corresponding
signal properties during the pulses and the background noise surroundings
as the context, PETR can distinguish the differences between the pulses
and the noise and recognize them. This approach brings a completely
different strategy from the traditional threshold-based methods, since
the latter only consider the highly simplified amplitude information.
Therefore, the PETR results are consistent and free from user-defined
parameters. Furthermore, PETR can automatically adapt to complicated
real-world scenarios with baseline fluctuations, noise level alternations,
pulse amplitude variations, etc., thereby yielding
stable and reliable outputs. Thus, PETR, unlike the traditional methods,
is characterized by the ability of generalization. The strategy of
PETR in dealing with complex problems with pulsed signals also determines
that its generalization ability highly depends on the characteristics
of the pulses and the background noise as well as their differences.
Thus, PETR will return a better performance for datasets with more
similar noise–signal characteristics to the training dataset.
Moreover, the detection performance of PETR for the interpolated data
can be enhanced by adding a small amount of noise with the same spectrum
as the one found in the training dataset.Compared with our
previously developed B-Net, PETR has an entirely
different aim. Instead of extracting the features of pulses, it focuses
on isolating individual pulses from a noisy background in a time-sequential
trace. From the perspective of the network structure, a well-adopted
practice in object detection is to use the pre-trained features of
a CNN as a pre-processing (backbone) platform.[34] From its convolutional architecture, this pre-processing
platform provides the system with the needed inductive bias to capture
important features existent in signals found in nature, such as pulse-like
signals from molecular translocating events.PETR utilizes the
CNN part of the ResNet 2 in the B-Net as its
backbone. This section of the network contributes important features
about the signals to the system, playing a pre-processing role of
information abstraction and condensation. However, the essential process
of acute pulse recognition and localization is realized by the transformer
structure. The transformer plays the role of a memory bank, which
is not affected by inductive bias as the backbone. The transformer
focuses on learning the statistical distribution in the context of
the time-sequential features delivered by the backbone. The transformer
uses the backbone features to memorize the semantic structures of
the different situations present in the signals. For instance, if
a pulse of certain duration is found in a location, it is highly likely
that some pulses with a similar duration could be found in the vicinity
of such a detection, since the signals used for training have such
properties.[47] If the network has detected
some repeated patterns of pulses in a region of the temporal window,
this observation will help the network to take a decision about similar
trends in other regions of such a window. All these semantic features
about the structure of the signals are memorized by the transformer
by referring to the pre-processing features provided by the backbone.
It is worth noting that the achieved performance of PETR relies on
the fact that the backbone efficiently extracts and condenses basic
features in the signals and that the transformer acquires the semantic
properties of pulses, such as positions and appearance frequency,
by considering the context in the time-sequential signal.
Conclusions
With the transformer structure as its base,
PETR can successfully
single out pulses from a noisy background in nanopore-sensing signals.
The typical machine learning end-to-end training strategy of PETR
avoids user-defined thresholds and, thus, the subjectivity of users.
PETR is first trained on generated datasets with different SNR levels.
It is further validated by both generated and experimental datasets.
Outstanding PETR detection performance is demonstrated to be achievable
even for low SNR data. As the detection performance is largely influenced
by the number of sampling points in each pulse, a simple linear interpolation
can significantly improve the detection precision and coverage for
short pulses. PETR is proven to be a generalized method for spike
detection and offers a powerful tool for processing signals from various
single-molecule events in SMA. It further acts as an important link
in the pipeline of pulse-like signal processing by offering a seamless
connection to the downstream algorithms.
Methods
Pulse Detection Transformer
As is the case for the
original DETR model by Carion et al.(34) PETR possesses a reasoning-like behavior that predicts
each bounding segment by taking into account a global context in the
entire temporal window. At the end, the system produces all predictions
in parallel ,and each bounding segment is predicted based on the global
context surrounding such a prediction.
Pulse Counter
The pulse counter path (ResNet 1) in
our previously developed B-Net was used to count the number of pulses
in the temporal window to be processed by the system. The B-Net architecture
uses two ResNets that are pre-trained for regression tasks—ResNet
1 is trained for pulse counting in a trace chunk, and ResNet 2 is
trained for averaged duration and amplitude prediction of the pulses
inside such a chunk. As shown in Figure , the system would process the temporal window
only if the number of pulses counted by ResNet 1 was more than zero.
Otherwise, the window was discarded, the operation of the system disabled,
and zero predicted segments returned. This section of the network
was only involved in testing activities. ResNet 1 did not take part
in either network training or validation.
Backbone
The feature prediction path (ResNet 2) of
our previously developed B-Net was used as the backbone in PETR.[29] The original pre-trained B-Net uses the ResNet18
architecture. The FFNN layers in ResNet 2 were replaced by identity
layers that only passed the input to the output without modification.
Hence, the output returned by the backbone was the flattened version
of the output of the convolutional section in ResNet 2.In its
original function, the B-Net was trained to predict characteristic
features inside a temporal window extracted from a noisy trace. Based
on the DL architecture of the B-Net, with its end-to-end training
philosophy, it is highly feasible that the convolutional section in
each ResNet acquired important features in the statistical structure
of the translocating pulses in the signal. These features turned out
to be highly effective for training the entire detection system we
are introducing here. ResNet 2 was fine-tuned for the detection task,
and the learning rate applied to it was a constant but a smaller value
than the one used for the rest of the network.
Transformer
As its name alludes, transformer architectures
transform one sequence at its input into another sequence returned
by its output. Even though transformers materialize the state of the
art in today’s machine learning sequence processing, these
architectures completely dispense recurrence, thanks to their attention
mechanisms that process sequences integrally in parallel.[48]The output from the backbone is a sequence
of vectors. Each vector in the sequence has a number of channels (num_channels).
The sequence is processed by a kernel_size = 1 1D convolution that
reduces the number of channels in each vector in the sequence to hidden_dim.
In our case, since we used ResNet 18 in the backbone, num_channels
= hidden_dim = 512, there was no dimensionality reduction.Afterward,
the transformer encoder added the 1D positional embedding
vector to the input sequence, and the resulting positioned sequence
was passed through Nenc successive encoder layers
composed by a multi-head self-attention (MHSA) layer followed by an
FFNN layer as residual stages were bypassed by skipping connections.The transformer decoder received a sequence of pulse queries, which
was a set of learned embedding vectors. Each embedding vector had
hidden_dim components. First, the embedding vectors were passed to
an MHSA, then the output from this one was passed to a multi-head
attention (MHA) layer. The MHA layer also paid attention to the outputs
from the transformer encoder. The output from this MHA layer was passed
to a final FFNN. The transformer decoder had Mdec layers repeating
this processing pipeline.
Feed-Forward Fully Connected NN
Finally, the outputs
from the last layer of the transformer decoder were passed to two
FFNN modules. One of these modules classified the pulse queries as
present or absent pulses in the trace chunk, while the other predicted
the bounding segments for each classification, i.e., predicted the location of the pulse in the temporal chunk extracted
from the noisy trace.
Pulse Detection and Prediction Losses
In each single
pass through the decoder, PETR predicted a fixed-size set of N pulses in a temporal window in a noisy trace. N was a chosen parameter, and it was larger than or equal
to the maximum number of translocation events produced inside a temporal
chunk in the complete dataset. The system found the best bipartite
matching between predictions and ground truths based on class, position,
and size. The matching cost took into account the class prediction,
which was an existent or non-existent pulse, and the similarity between
the predicted and the ground truth bounding segments.Once the
matching was done, each prediction was assigned to a ground truth
bounding segment and the system now could compute the Hungarian loss
for all the pairs matched in the previous step. The Hungarian loss
is a combination of class prediction and bounding segment losses, i.e., this combines a class cost, a bounding segment cost,
and an intersection over union cost.
Data Preparation
Data Generation
The artificially generated data was
composed of three parts: (1) randomly appeared translocation spikes,
(2) background noise, and (3) baseline variations. The baseline current
level, random properties of a translocation, current blockage amplitude,
and the background noise spectrum were all determined using our established
physical models with given corresponding parameters such as geometry
of nanopore and analytes, electrolyte concentration, analyte concentration,
and bias voltage. In signal generation, both the diameter and thickness
of the nanopores were fixed to 20 nm. Typical experimental conditions
were selected, including a bias voltage of 300 mV, a 100 mM KCl electrolyte,
and −0.02 C/m2 surface charge density. Three parameters
in the signal generation program, i.e., the diameter
of translocating analytes, the concentration of the analytes, and
the duration of translocation, were systematically varied in each
dataset. In each dataset, the diameter of the nanospheres varied from
3 to 17 nm with a 1 nm step (15 different values). The concentration
of the nanospheres varied from 0.01 to 1 nM, changing in logarithmic
scale (20 different values). The duration of the translocation was
directly assigned to 0.5, 1, 1.5, 3, and 5 ms (five different values).
In addition, the SNR was varied from 0.25 to 4 (five different values).
Each dataset was composed of 1500 traces with combinations of different
values of these three varying parameters. It is worth noting that
the SNR is defined as the ratio of spike amplitude to the peak-to-peak
value of the background noise. Details of the data generation are
available in the literature.[29] We provide
training, validation, and testing datasets available online for SNR
= 4 as well as testing datasets available for all the SNRs.[49]
Experimental Data
λ-DNA and streptavidin were
selected as two typical examples of the translocating analytes, representing
the long strand-shaped and sphere-shaped objects, respectively. Electrical
measurements were controlled using a patch clamp amplifier (Axopatch
200B, Molecular Device Inc.). The ionic current was converted to digital
signal by Axon Digidata 1550A (Molecular Device LLC.) and recorded
by software Axon pCLAMP 10 (Molecular Device LLC.). The translocation
signal was measured under six different bias voltages for both λ-DNA
and streptavidin. The λ-DNA translocation was measured in a
10 kHz sample rate with 2 kHz analog bandwidth, while the streptavidin
translocation was detected at 20 kHz sampling frequency with a 10
kHz bandwidth. All the datasets have been published in Zenodo.[49]
Training and Validation
Training was conducted using
artificially generated traces as described
above. For this work, five datasets, each with a different SNR, were
used to train five different instances of the same detector. In the
training process, we split each 20 s trace in temporal windows of
0.5 s. Accordingly, we ended up with 60,000 windows per dataset. Datasets
for testing had traces of 10 s, i.e., we ended up
with 30,000 temporal windows per dataset for testing purposes.The training process consisted of providing a random batch of temporal
windows from the training data. Only temporal windows with at least
one pulse were used for training. Empty windows were discarded for
training purposes. Therefore, ResNet 1 was not used during training.
Once the total number of temporal windows had been consumed in the
dataset, one epoch was completed. Inside one epoch, a learning rate
of 1 × 10–5 was adopted, with a learning rate
decay of 10 in a period of 100 epochs. The number of epochs used to
train a model instance depended on the level of noise in the training
dataset. The batch size was of six temporal windows in all the cases.Validation was conducted periodically, first after epoch number
50, then after epoch number 100, and from then every five epochs, i.e., after epochs number 105, 110, 115, and so on. We validated
our model by utilizing standard mAP, which is a widely used performance
metric for object detection in 2D images.[36] The model with the highest mAP was saved as the best representation.For evaluating (testing), the adapted mAP was used. In our case,
instead of using IoU as a threshold, we used the relative distance
between the midpoint of both, predicted the ground truth segments,
and referred to the length of the ground truth segment. We computed
the mAP by considering relative distance thresholds between 100% and
400% with steps of 10%. A threshold of 100% means that the distance
between the two segments, predicted and ground truth, is equal to
the length of the ground truth segment. Likewise, a threshold of 400%
means that such distance is 4 times the ground truth segment length.
Any pair of matched pulses, predicted and ground truth, with shorter
distance than such thresholds was considered as a true positive. Even
when such thresholds could seem too tolerant compared to the IoU thresholds
adopted by the detection community,1 in the
nanopore translocation application scenarios, our adapted mAP adoption
for testing appropriately reflected the requirements at time of catching
nanopore translocation events in trace windows. Coverage is another
important performance metric, which is defined as the total number
of true positives divided by the total number of ground truth segments.
We also computed the duration error and the start and end time errors.
The duration error is the relative difference between the predicted
and ground truth segment lengths relative to the ground truth segment
length. The start/end time error is the difference, in milliseconds,
between the predicted and the ground truth of start/end time. Finally,
we computed the average and standard deviation of all these metrics
for each duration in the test datasets.It is important to highlight
that during testing, for the computation
of the adapted mAP, empty windows were discarded since mAP was inconsistent
for scenes without objects. ResNet 1 was used to discard empty windows
when the model was confronted with experimental data.
Authors: Jacob H Forstater; Kyle Briggs; Joseph W F Robertson; Jessica Ettedgui; Olivier Marie-Rose; Canute Vaz; John J Kasianowicz; Vincent Tabard-Cossa; Arvind Balijepalli Journal: Anal Chem Date: 2016-11-15 Impact factor: 6.986
Authors: John Jumper; Richard Evans; Alexander Pritzel; Tim Green; Michael Figurnov; Olaf Ronneberger; Kathryn Tunyasuvunakool; Russ Bates; Augustin Žídek; Anna Potapenko; Alex Bridgland; Clemens Meyer; Simon A A Kohl; Andrew J Ballard; Andrew Cowie; Bernardino Romera-Paredes; Stanislav Nikolov; Rishub Jain; Demis Hassabis; Jonas Adler; Trevor Back; Stig Petersen; David Reiman; Ellen Clancy; Michal Zielinski; Martin Steinegger; Michalina Pacholska; Tamas Berghammer; Sebastian Bodenstein; David Silver; Oriol Vinyals; Andrew W Senior; Koray Kavukcuoglu; Pushmeet Kohli Journal: Nature Date: 2021-07-15 Impact factor: 49.962