Fast Step Transition and State Identification (STaSI) for Discrete Single-Molecule Data Analysis.

Bo Shuang¹, David Cooper¹, J Nick Taylor², Lydia Kisley¹, Jixin Chen¹, Wenxiao Wang³, Chun Biu Li², Tamiki Komatsuzaki², Christy F Landes⁴.

Abstract

We introduce a step transition and state identification (STaSI) method for piecewise constant single-molecule data with a newly derived minimum description length equation as the objective function. We detect the step transitions using the Student's t test and group the segments into states by hierarchical clustering. The optimum number of states is determined based on the minimum description length equation. This method provides comprehensive, objective analysis of multiple traces requiring few user inputs about the underlying physical models and is faster and more precise in determining the number of states than established and cutting-edge methods for single-molecule data analysis. Perhaps most importantly, the method does not require either time-tagged photon counting or photon counting in general and thus can be applied to a broad range of experimental setups and analytes.

Year: 2014 PMID: 25247055 PMCID: PMC4167035 DOI: 10.1021/jz501435p

Source DB: PubMed Journal: J Phys Chem Lett ISSN: 1948-7185 Impact factor: 6.475

Single-molecule analysis often involves a compromise when our desire to quantify space/time heterogeneity is challenged by innately low signal-to-noise conditions. Only when these counterbalancing principles are optimized can we gain access to equilibrium and nonequilibrium details that are unobtainable by ensemble methods. Single-molecule Förster resonance energy transfer (smFRET) measurements explore conformations and dynamics of biomolecules that are unresolvable in the ensemble state distribution.[1−5] In smFRET experiments, single molecules visit different structural or conformational states and generate piecewise constant signals.[1,2] Identifying states and the step transitions between them is important for understanding the stationary state distribution of the system and the dynamics among different states, and for making testable mechanistic predictions. However, it is often challenging to identify states and transitions because of noise sources during the measurements.

Established state determination methods for smFRET data are designed to extract the heterogeneity of the system buried within the mitigating fluctuations due to noise and include the Watkins and Yang change-point method,[6,70] hidden Markov model-based FRET time trajectory analysis program (HaMMy)[7−9] combined with wavelet denoising,[10,11] and variational Bayesian inference for smFRET time series (vbFRET).[12] The Watkins and Yang change-point method uses few user inputs but is designed for continuous photon-by-photon traces[6] and thus is not practical for binned data. Although collecting time-tagged photon-by-photon data is similar to, and in most cases preferable to, collecting binned photon data, time-tagged collection systems require more complicated and expensive pulsed excitation sources and hardware to resolve photon arrival times on single-photon counting detectors. In addition, for many other detectors used in single-molecule experiments, the collection frequencies required for single-photon collection are often faster than their temporal resolution. Thus, continuous wave excitation sources and binned photon data collection are widely used experimental simplifications of the more accurate, but expensive, time-tagged methods.

Several of the most widely used single-molecule data processing algorithms (e.g., HaMMy[7] and vbFRET[12]) were specifically designed to analyze binned data because of its ubiquity and relative ease of acquisition. Both HaMMy and vbFRET assume that the data can be represented as a hidden Markov chain. HaMMy requires the user to decide the optimum number of states,[7] which is a challenge if a priori knowledge of the underlying states is unavailable. vbFRET automatically determines the optimum number of states based on maximum evidence inference,[12] but for noisy data (i.e., noise levels larger than the separation of states) or data with fast dynamics (i.e., with mean lifetimes no more than an order of magnitude longer than the sampling time) the method identifies redundant states due to noise- or binning-induced artifacts (Figure 3). Thus, the optimum solution for state determination remains an open question, especially for binned data.
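The binning-induced artifacts mentioned above can be illustrated with a minimal Python sketch. The helper name `bin_trace` and the example values are hypothetical and are not taken from the STaSI code; the sketch assumes only that binning averages consecutive samples.

```python
import numpy as np

def bin_trace(y, width):
    """Average consecutive samples in non-overlapping bins of `width` points.

    Trailing samples that do not fill a complete bin are dropped.
    """
    y = np.asarray(y, dtype=float)
    n_bins = len(y) // width
    return y[: n_bins * width].reshape(n_bins, width).mean(axis=1)

# Noise-free two-state trace whose transition falls in the middle of a bin.
raw = [0.2] * 15 + [0.8] * 15
binned = bin_trace(raw, 10)
```

Here the second ten-point bin averages five samples at 0.2 and five at 0.8, so the binned trace contains an intermediate value that corresponds to no real state. With faster dynamics, more bins straddle a transition and such in-between values become more frequent.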

Figure 3

Performance of STaSI using simulated five FRET states traces with fast dynamics. Only the first 200 (out of about 15 000) bin time (corresponding to 2000 sampling time for raw data in panel a) data points are shown for illustration. (a) Simulated raw data analyzed by STaSI and vbFRET. (b) Corresponding histograms of the STaSI fit, vbFRET fit, and the true states for raw data. (c) Simulated ten-point binned data analyzed by STaSI and vbFRET. (d) Corresponding histograms of the STaSI fit, vbFRET fit, and the true states for binned data.

All of these methods assume that smFRET data are generated by dynamics among several FRET states. This state distribution is usually sparse (meaning that the FRET states can be represented by several delta functions), even though experimental smFRET efficiency traces usually have a broad distribution due to noise. In this work, we introduce a step transition and state identification (STaSI) method to analyze smFRET data and recover the underlying sparse state distribution. STaSI is designed specifically for smFRET data but, in principle, is useful for any piecewise constant signal.

STaSI applies an equation we have derived for piecewise constant signals based on the minimum description length (MDL) principle[13,14] as the objective function,

MDL = F + G

where F measures the goodness of fit using the L1 norm and G measures the complexity of the fitting model. Compared with other information criteria, the MDL principle accounts for the detailed parameter complexity of the model.[13,14] In the expressions for F and G, σ is the overall noise level; y(t) and y_fit(t) are the real data and the fit value at time t, respectively; N is the total number of data points of the trace; k is the number of states; N_tp is the number of transition positions; V is the domain size (= y_max − y_min) for all y; n_i is the number of data points assigned to state i; and T_j is the difference of the fitting values before and after transition position j. Here we derived G for smFRET data to consider the sparseness of the states and the transitions among these states; the full derivation is provided in the Supporting Information. MDL reaches a minimum when the increase in the complexity of the model (G) using an additional state equals the decrease in the fitting error (F) as measured by the L1 norm.
Overall, the MDL equation accounts for the balance between simplicity and accuracy and guarantees the minimized solution to be the sparsest approximation.[15,16]

The solution domain for the MDL objective function is first reduced by searching for the optimum solution for each number of states. The Student’s t test with unequal sample size and global noise level is applied to detect all of the step transitions[17] and to break the trace into multiple segments. The recursive process in Figure 1a applies the Student’s t test to each segment until no further transition points are found. Similar to the change-point method,[6,70] we then group these segments iteratively, until only one state remains, to find the best grouping strategy for every possible number of states. In each grouping iteration, the two most similar states are merged into one state (Figure 1b). More details and the related equations on step transition and state grouping are explained in the Supporting Information.
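The recursive step detection described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published implementation: the function names, the significance threshold of 3.0, and the requirement that the caller supplies the global noise level `sigma` are all choices made for this example. The statistic is the standard two-sample t value for segments of unequal size that share a known noise level.

```python
import numpy as np

def best_split(y, sigma):
    """Return (index, t) of the most significant single step in y.

    For every candidate split the two sides are compared with
    t = |mean_left - mean_right| / (sigma * sqrt(1/n_left + 1/n_right)).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n < 2:
        return None, 0.0
    csum = np.cumsum(y)
    n_left = np.arange(1, n)                 # sizes of the left-hand side
    n_right = n - n_left
    mean_left = csum[:-1] / n_left
    mean_right = (csum[-1] - csum[:-1]) / n_right
    t = np.abs(mean_left - mean_right) / (
        sigma * np.sqrt(1.0 / n_left + 1.0 / n_right)
    )
    k = int(np.argmax(t))
    return int(n_left[k]), float(t[k])

def detect_steps(y, sigma, threshold=3.0, offset=0):
    """Recursively split y until no segment contains a significant step.

    `threshold` is an assumed cutoff on t, not a value from the paper.
    Returns the sorted indices at which a new segment starts.
    """
    split, t = best_split(y, sigma)
    if split is None or t < threshold:
        return []
    left = detect_steps(y[:split], sigma, threshold, offset)
    right = detect_steps(y[split:], sigma, threshold, offset + split)
    return left + [offset + split] + right
```

Lowering `sigma` or `threshold` in this sketch makes smaller steps count as significant, whereas raising either one keeps only the larger transitions.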
The MDL value of the best solution for each number of states is calculated (Figure 1c), and the number of states corresponding to the global minimum MDL value is considered the optimum fitting model (Figure 1d). In short, this strategy first calculates the best fit for each number of states, and the minimum MDL then determines the optimum number of states. While we do not have a rigorous proof that the final analysis reaches the global minimum MDL value, our performance tests (Figures 2 and 3) provide strong evidence that it does. Moreover, this preselection scenario dramatically reduces the complexity of the algorithm from a computationally intractable nondeterministic polynomial-time hard (NP-hard)[18] classification to one in which the computational time scales with N^2 (Supporting Information Figure S4).
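The grouping and model selection strategy can be sketched as follows. Several parts are assumptions made for illustration only: the Ward-style `merge_cost` stands in for the similarity measure given in the Supporting Information, and `placeholder_mdl` uses an L1 misfit in noise units plus a generic penalty that grows with the number of states and transitions. That penalty is not the G derived in the paper and serves only to show how the minimum over candidate state numbers is taken.

```python
import numpy as np

def state_stats(y, segs, members):
    """Mean and size of a state made of the listed segment indices."""
    data = np.concatenate([y[a:b] for a, b in (segs[i] for i in members)])
    return data.mean(), data.size

def merge_cost(y, segs, s1, s2):
    """Ward-style dissimilarity between two states (assumed, see lead-in)."""
    m1, n1 = state_stats(y, segs, s1)
    m2, n2 = state_stats(y, segs, s2)
    return n1 * n2 / (n1 + n2) * (m1 - m2) ** 2

def build_fit(y, segs, states):
    """Piecewise-constant fit and the state label of every segment."""
    fit = np.empty(len(y))
    labels = np.empty(len(segs), dtype=int)
    for label, members in enumerate(states):
        mean, _ = state_stats(y, segs, members)
        for i in members:
            a, b = segs[i]
            fit[a:b] = mean
            labels[i] = label
    return fit, labels

def group_segments(y, steps):
    """Merge the two most similar states until one remains.

    Returns {number_of_states: (fit, labels)} for every state count.
    """
    y = np.asarray(y, dtype=float)
    bounds = [0, *steps, len(y)]
    segs = list(zip(bounds[:-1], bounds[1:]))
    states = [[i] for i in range(len(segs))]   # start: one state per segment
    fits = {}
    while True:
        fits[len(states)] = build_fit(y, segs, states)
        if len(states) == 1:
            return fits
        pairs = [(i, j) for i in range(len(states))
                 for j in range(i + 1, len(states))]
        i, j = min(pairs, key=lambda p: merge_cost(y, segs, states[p[0]], states[p[1]]))
        states[i] = states[i] + states[j]
        del states[j]

def placeholder_mdl(y, fit, sigma, k, n_tp):
    """F + G with a stand-in complexity term (NOT the paper's G)."""
    F = np.sum(np.abs(y - fit)) / sigma          # L1 misfit in noise units
    G = 0.5 * (k + n_tp) * np.log(len(y))        # hypothetical penalty
    return F + G

def select_states(y, steps, sigma):
    """Pick the number of states with the lowest placeholder score."""
    y = np.asarray(y, dtype=float)
    fits = group_segments(y, steps)
    scores = {}
    for k, (fit, labels) in fits.items():
        n_tp = int(np.count_nonzero(np.diff(labels)))  # transitions between states
        scores[k] = placeholder_mdl(y, fit, sigma, k, n_tp)
    best_k = min(scores, key=scores.get)
    return best_k, fits[best_k][0], scores
```

Because every merge removes exactly one state, the loop evaluates one candidate fit per state count, and the search over models reduces to a minimum over that short list.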

Figure 1

Demonstration of STaSI using a simulated three-state FRET efficiency trace with added Gaussian noise. (a) Recursive process to detect step transitions using the Student’s t test. The step transition identified in each recursion is highlighted by the black arrows. The number of segments is indicated in the upper-left corners. (b) The iterative method to group the identified segments into states begins from the final result of step detection process and continues until only a single state remains. The merged segments from five to four states and from four to three states are highlighted by the black arrows. The number of assumed states is indicated in the upper-left corners. (c) The calculated MDL value for each state set. Clearly, the three-state set is the optimum number of states, with the global minimum MDL value. (d) The determined three-state fit (red) compared with true states (blue).

Figure 2

Comparison between the L1 norm and the L2 norm for data with different noise levels and mean lifetime of the states. The horizontal axis labels the four different noise levels, and the vertical axis labels the five different mean lifetimes of the states. The simulation uses five FRET states: 0.2, 0.25, 0.35, 0.5, and 0.7; a sampling time of 1 ms; and a binning time of 10 ms. Under each condition, 100 simulations are repeated. The different colors represent the success rates of correctly identifying the number of states. (a) Using the L1 norm analyzing raw data. (b) Using the L2 norm analyzing raw data. (c) Using the L1 norm analyzing binned data. (d) Using the L2 norm analyzing binned data.

Using the L1 norm to measure F, the goodness of fit, is important to find the sparsest approximation of the real solution[15,16] and is robust to high noise levels, non-Gaussian noise, and binning artifacts,[19] as shown in Figure 2. While F is usually measured by the L2 norm (squared error),[14] our simulations show that under the typical noisy conditions of single-molecule measurements the L2 norm generates many redundant states (Figure 2 and Supporting Information Figures S1–S3). Using the L1 norm, STaSI identifies the correct number of states successfully (success rate >70%) for noise level smaller than 0.12 and mean lifetime of the states longer than 0.05 s (Figure 2a). Using the L2 norm, STaSI only finds the correct number of states when the noise level is smaller than 0.06 and the mean-lifetime is longer than 0.25 s (Figure 2b). For the L1 norm, binning improves the success rate of finding the correct number of states for noisy data with mean-lifetimes longer than 0.25 s (Figure 2c). For the L2 norm, STaSI fails to find the correct number of states using binned data (Figure 2d). Overall, by using the L1 norm, STaSI can successfully recover the state distribution under broad noise and mean-lifetime conditions.
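One way to see why a squared-error misfit tends to favor redundant states is to compare how much each norm rewards fitting a single outlying point exactly. The Python sketch below is an illustration under assumed names and numbers (`misfit`, `gain_from_extra_state`, the residual values); it is not the argument or the code used in the paper.

```python
import numpy as np

def misfit(residuals, sigma, norm):
    """Goodness-of-fit term in noise units under the chosen norm."""
    r = np.abs(np.asarray(residuals, dtype=float)) / sigma
    if norm == "l1":
        return float(r.sum())
    if norm == "l2":
        return float((r ** 2).sum())
    raise ValueError("norm must be 'l1' or 'l2'")

def gain_from_extra_state(residuals, sigma, norm, idx):
    """Misfit reduction if the points at `idx` were fitted exactly
    by an additional state, leaving all other residuals unchanged."""
    r = np.asarray(residuals, dtype=float)
    return misfit(r, sigma, norm) - misfit(np.delete(r, idx), sigma, norm)

# Three ordinary residuals and one outlier five noise units from the fit.
residuals = [0.05, -0.05, 0.05, 0.5]
gain_l1 = gain_from_extra_state(residuals, 0.1, "l1", [3])
gain_l2 = gain_from_extra_state(residuals, 0.1, "l2", [3])
```

In this example the outlier contributes 5 units to the L1 misfit but 25 units to the L2 misfit, so a fixed complexity cost for an additional state is overcome far more easily under the L2 norm.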
Similar results comparing the L2 and L1 norms have been reported in other applications.[19] Therefore, for our desired application to single-molecule data, we use the L1 norm to quantify F.

The STaSI method can correctly analyze noisy smFRET data containing fast dynamics (i.e., when the interstate transition time is within ∼10× the collection bin time). The signal-to-noise ratio can usually be improved through binning, but in the presence of fast dynamics, binning introduces artifact states between the real states and limits the temporal resolution of single-molecule FRET. In Figure 3, STaSI identifies all of the states for both the noisy raw data (with 12% state assignment error, 8% state distribution error, and 0.008 absolute efficiency deviation) and the binned data (with 20% state assignment error, 9% state distribution error, and 0.013 absolute efficiency deviation). In comparison, vbFRET identifies four false states and fails to identify one state due to the influence of the noise for the raw data (Figure 3a,b). For binned data, vbFRET identifies six artifact states in between the real states (Figure 3c,d). This test demonstrates that STaSI provides a solution to interpret noisy data with fast dynamics, improving the experimental temporal resolution. The relatively large error for binned data is due to the presence of fast dynamics where multiple states are averaged in a single bin time.
Binned data are preferred for data with relatively slow dynamics and a large noise level (Figure 2c). This improved performance of STaSI is mainly due to the MDL equation we derived for piecewise constant signals.

Because STaSI is parameter-free in terms of both the analysis and the collection method, the analysis is not limited to smFRET data. STaSI can be applied directly to other piecewise constant signals such as those that occur in imaging,[20] optical tweezers,[21] or scanning probe analyses. Methods based on the hidden Markov chain, like HaMMy and vbFRET, can be extended to apply the MDL principle to search for the optimum number of states. Tuning the noise level parameter in the Student’s t test can make the method more sensitive to smaller transitions or allow it to capture only relatively large transitions. Overall, STaSI is a good example of applying different information theory techniques for robust single-molecule data analysis.

In summary, we have designed STaSI to analyze the states and interstate step transitions of piecewise constant signals. STaSI combines the Student’s t test and a new derivation of the MDL equation to optimize the analysis for piecewise constant signals. This method fills the gap of change-point detection with discontinuous binned data, especially in the single-molecule field, and improves the state determination for noisy data or data with fast dynamics. STaSI saves the effort and time of identifying transition positions manually and decreases user bias when analyzing complicated data. In the future, we plan to apply this algorithm to other situations such as single-molecule instantaneous displacements in heterogeneous environments,[22] engineered surface association and dissociation,[20] and aggregation of conjugated polymers.[23] The performance of STaSI under different assumptions will be explored in detail using simulated FRET traces generated under different models at the molecular dynamics level.[24]

20 in total

1. Dynamic disorder in single-enzyme experiments: facts and artifacts.

Authors: Tatyana G Terentyeva; Hans Engelkamp; Alan E Rowan; Tamiki Komatsuzaki; Johan Hofkens; Chun-Biu Li; Kerstin Blank
Journal: ACS Nano Date: 2011-12-06 Impact factor: 15.881

2. Assembly dynamics of microtubules at molecular resolution.

Authors: Jacob W J Kerssemakers; E Laura Munteanu; Liedewij Laan; Tim L Noetzel; Marcel E Janson; Marileen Dogterom
Journal: Nature Date: 2006-06-25 Impact factor: 49.962

3. Expectation-maximization of the potential of mean force and diffusion coefficient in Langevin dynamics from single molecule FRET data photon by photon.

Authors: Kevin R Haas; Haw Yang; Jhih-Wei Chu
Journal: J Phys Chem B Date: 2013-09-20 Impact factor: 2.991

4. Learning rates and states from biophysical time series: a Bayesian approach to model selection and single-molecule FRET data.

Authors: Jonathan E Bronson; Jingyi Fei; Jake M Hofman; Ruben L Gonzalez; Chris H Wiggins
Journal: Biophys J Date: 2009-12-16 Impact factor: 4.033

5. Denoising single-molecule FRET trajectories with wavelets and Bayesian inference.

Authors: J Nick Taylor; Dmitrii E Makarov; Christy F Landes
Journal: Biophys J Date: 2010-01-06 Impact factor: 4.033

Review 6. A practical guide to single-molecule FRET.

Authors: Rahul Roy; Sungchul Hohng; Taekjip Ha
Journal: Nat Methods Date: 2008-06 Impact factor: 28.547

7. Toward a practical face recognition system: robust alignment and illumination by sparse representation.

Authors: Andrew Wagner; John Wright; Arvind Ganesh; Zihan Zhou; Hossein Mobahi; Yi Ma
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2012-02 Impact factor: 6.226

8. Detecting the conformation of individual proteins in live cells.

Authors: John J Sakon; Keith R Weninger
Journal: Nat Methods Date: 2010-01-31 Impact factor: 28.547

9. Structural landscape of isolated agonist-binding domains from single AMPA receptors.

Authors: Christy F Landes; Anu Rambhadran; J Nick Taylor; Ferandre Salatan; Vasanthi Jayaraman
Journal: Nat Chem Biol Date: 2011-02-06 Impact factor: 15.040

10. Complex RNA folding kinetics revealed by single-molecule FRET and hidden Markov models.

Authors: Bettina G Keller; Andrei Kobitski; Andres Jäschke; G Ulrich Nienhaus; Frank Noé
Journal: J Am Chem Soc Date: 2014-03-14 Impact factor: 15.419
