
XLSY: Extra-Large NMR Spectroscopy.

Yulia Pustovalova, Maxim Mayzel, Vladislav Yu Orekhov.

Abstract

NMR studies of intrinsically disordered proteins and other complex biomolecular systems require spectra with the highest resolution and dimensionality. An efficient approach, extra-large NMR spectroscopy, is presented for experimental data collection, reconstruction, and handling of very large NMR spectra by a combination of the radial and non-uniform sampling, a new processing algorithm, and rigorous statistical validation. We demonstrate the first high-quality reconstruction of a full seven-dimensional HNCOCACONH and two five-dimensional HACACONH and HN(CA)CONH experiments for a representative intrinsically disordered protein α-synuclein. XLSY will significantly enhance the NMR toolbox in challenging biomolecular studies.
© 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

Keywords:  NMR spectroscopy; XLSY; intrinsically disordered protein; non-uniform sampling

Year:  2018        PMID: 30175546      PMCID: PMC6585689          DOI: 10.1002/anie.201806144

Source DB:  PubMed          Journal:  Angew Chem Int Ed Engl        ISSN: 1433-7851            Impact factor:   15.336


With the ever-increasing size and complexity of biomolecular systems studied by NMR spectroscopy, the number of peaks, and hence signal overlap, increases, which seriously complicates and compromises data analysis. The problem is addressed by enhancing spectral resolution and dimensionality with radial sampling (RS) and non-uniform sampling (NUS) [1]. However, the task of reconstructing and handling very large spectra still awaits a good solution. The RS approach, which is based on direct analysis of planar spectral projections, is an efficient way of detecting signals in high-dimensional experiments [2]. However, a set of planar projections is not a fair substitute for a true multidimensional experiment, especially in the case of a highly crowded spectrum of a challenging protein system such as an intrinsically disordered protein (IDP). The best algorithms for reconstructing spectra from NUS data [3] are impractical for large data sets owing to prohibitive computational and storage requirements. Existing methods for spectra with five dimensions require a three-dimensional reference spectrum or peak list [4], whereas six- and seven-dimensional spectra are produced only as their reduced-dimensionality projections [4b]. A possible solution for large data sets may be found in the family of parametric algorithms [5], although no examples of spectral reconstructions with more than four dimensions have been presented so far.

Herein we introduce XLSY: NMR spectroscopy for extra-large data sets. When dealing with a large spectrum, the main problem stems from its size, which demands huge amounts of computational power for processing and by far exceeds computer memory. Notably, a multidimensional NMR spectrum is sparse and thus can be represented in a compact form in both the time and frequency domains.
XLSY is a non-iterative procedure that converts a small number of RS/NUS measurements into a compact, high-quality sparse spectrum without ever handling the huge full data representation in either the time or the frequency domain. The XLSY algorithm for spectrum reconstruction consists of three steps: 1) frequency identification, 2) intensity evaluation, and 3) validation.

Frequency identification borrows part of the SFFT algorithm [6] to find a short list of frequencies in the spectrum that may have significant (that is, higher than noise) intensities. This part is based on radial sampling and the Fourier projection theorem [7]. To illustrate the algorithm, let us consider the simplest case of only two spectral dimensions, each spanning $N$ points. The two-dimensional spectrum contains $N \times N$ points with frequency coordinates $(f_1, f_2)$. In an experiment, we measure a one-dimensional projection that contains $N$ frequency points enumerated by the index $f$. To distinguish frequency points in the multidimensional spectrum $(f_1, f_2)$ from those in a 1D projection, we call the latter buckets. The value of each bucket at position $f$ is given by the sum of the $N$ points of the 2D spectrum whose positions $(f_1, f_2)$ fulfil the relation [Eq. (1a)]:

$$ f = (\alpha_1 f_1 + \alpha_2 f_2) \bmod N \tag{1a} $$

For a spectrum with many dimensions, the corresponding general relation is [Eq. (1b)]:

$$ f = \Big( \sum_i \alpha_i f_i \Big) \bmod N \tag{1b} $$

where mod is the modulo operator, $\alpha_1$ and $\alpha_2$ are integers, and $\alpha_1/\alpha_2$ represents the slope of the projection. Since the spectrum is sparse, most buckets contain only noise and are considered empty. A few buckets, whose intensities exceed a chosen noise threshold, correspond to one or a few non-zero frequency points $(f_1, f_2)$. To find the exact positions of these essential frequencies in the 2D plane, we measure and analyze several projections with different tilt angles. Figure 1 illustrates the cumulative analysis of several projections of a 2D spectrum with only three non-zero points.
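The bucket picture can be checked numerically on a small periodic grid. The sketch below is not the authors' code: it assumes the bucket index of a 2D point is `(a1*f1 + a2*f2) mod N`, that the time-domain ray is sampled at grid points `(a1*t, a2*t) mod N`, and NumPy's FFT normalization. The function names and the toy spectrum are hypothetical.

```python
import numpy as np

def bucket_sums(spectrum, a1, a2):
    """Sum 2D spectrum points into N buckets, f = (a1*f1 + a2*f2) mod N."""
    n = spectrum.shape[0]
    f1, f2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    buckets = np.zeros(n, dtype=complex)
    # np.add.at accumulates correctly when several points share a bucket
    np.add.at(buckets, (a1 * f1 + a2 * f2) % n, spectrum)
    return buckets

def radial_projection(time_signal, a1, a2):
    """1D spectrum from time-domain points on the ray (a1*t, a2*t) mod N."""
    n = time_signal.shape[0]
    t = np.arange(n)
    ray = time_signal[(a1 * t) % n, (a2 * t) % n]
    # factor n compensates the 1/N**2 normalization of np.fft.ifft2
    return n * np.fft.fft(ray)

# Toy sparse spectrum (hypothetical values) and its full time-domain signal
N = 16
spectrum = np.zeros((N, N), dtype=complex)
spectrum[3, 5], spectrum[10, 2], spectrum[7, 7] = 1.0, 0.5, 2.0
time_signal = np.fft.ifft2(spectrum)

slopes = [(1, 0), (0, 1), (1, 1), (1, -1)]
agree = [np.allclose(radial_projection(time_signal, a1, a2),
                     bucket_sums(spectrum, a1, a2)) for a1, a2 in slopes]
print(agree)
```

Only `N` time-domain points per projection are read in `radial_projection`, which is the reason a projection is cheap to measure compared with the full `N × N` grid.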
After the occupied buckets of the first projection are identified, all $N$ frequencies that contribute to each of those buckets get a vote. This procedure is repeated for different projections until a consistent short list of essential frequencies is singled out by the maximum number of votes. Points with frequencies that accumulated no or too few votes are considered to have exactly zero intensity in the spectrum and are omitted from further consideration and storage. To ensure detection of low-intensity peaks, the threshold in the projections can be as low as $2\sigma_{\text{noise}}$, and the voting cut-off is set at 70–80 % of the maximum, which is defined by the number of projections used. For example, in Figure 1d, the three correct frequencies each collect the maximal possible number of four votes (blue). There are also four points that collect three votes (the darkest brown), which are artefacts of the radial sampling. These points may be kept in the short list of frequencies and will be eliminated at the final validation step.
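The voting procedure can be sketched in a few lines. This is a minimal, noise-free illustration, not the published implementation: it assumes the bucket index is `(a1*f1 + a2*f2) mod N`, uses a 75 % voting cut-off from the range quoted above, and all names, peak positions, and the threshold value are hypothetical.

```python
import numpy as np

def grid(n):
    return np.meshgrid(np.arange(n), np.arange(n), indexing="ij")

def project(spectrum, a1, a2):
    """Noise-free projection: sum points into buckets (a1*f1 + a2*f2) mod N."""
    n = spectrum.shape[0]
    f1, f2 = grid(n)
    buckets = np.zeros(n)
    np.add.at(buckets, (a1 * f1 + a2 * f2) % n, spectrum)
    return buckets

def vote(projections, n, threshold):
    """Every 2D point that maps onto an occupied bucket receives one vote."""
    f1, f2 = grid(n)
    votes = np.zeros((n, n), dtype=int)
    for (a1, a2), buckets in projections:
        occupied = np.abs(buckets) > threshold
        votes += occupied[(a1 * f1 + a2 * f2) % n]
    return votes

def shortlist(votes, n_projections, cutoff=0.75):
    """Keep points whose vote count reaches the cut-off fraction."""
    keep = np.argwhere(votes >= cutoff * n_projections)
    return {(int(i), int(j)) for i, j in keep}

N = 16
peaks = {(3, 5): 1.0, (10, 2): 0.5, (7, 7): 2.0}   # hypothetical signals
spectrum = np.zeros((N, N))
for pos, intensity in peaks.items():
    spectrum[pos] = intensity

slopes = [(1, 0), (0, 1), (1, 1), (1, -1)]  # two orthogonal, two diagonal
projections = [(s, project(spectrum, *s)) for s in slopes]
votes = vote(projections, N, threshold=0.1)
candidates = shortlist(votes, len(slopes))
print(len(candidates), "shortlisted points; true peaks included:",
      set(peaks) <= candidates)
```

The shortlist may contain more points than true peaks; the extra entries are the radial-sampling artefacts that the text defers to the validation step.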
Figure 1

An illustration of the voting procedure using Eq. (1): a) positions of three signals, b) voting with two orthogonal projections, c) addition of the first diagonal projection, and d) voting with two orthogonal and two diagonal projections. Pixel colour in (b)–(d) from light to dark indicates the number of votes from one to four. See the text for more explanations.

Intensities for the frequencies shortlisted at the identification step are evaluated by solving a system of linear equations [Eq. (2a)]:

$$ A\,\mathbf{s} = \mathbf{t} \tag{2a} $$

where the vector $\mathbf{s}$ consists of the $N_f$ unknown spectral intensities and the vector $\mathbf{t}$ is composed of the $N_t$ experimental complex time-domain data points. $A$ is an $N_t \times N_f$ complex matrix obtained from the matrix of the discrete inverse $d$-dimensional Fourier transform by retaining only the columns and rows corresponding to the shortlisted frequencies and the available experimental points, respectively. The matrix elements of $A$ are calculated as [Eq. (2b)]:

$$ A_{nk} = \frac{1}{N^{d}} \exp\!\Big( \frac{2\pi i}{N}\, \hat{n} \cdot \hat{k} \Big) \tag{2b} $$

where $d$ is the number of indirect dimensions spanning $N$ points each (in our case the same for all dimensions); $\hat{k}$ is a $d$-dimensional vector of coordinates of the $k$-th point in the frequency domain, corresponding to intensity $s_k$; and $\hat{n}$ is a $d$-dimensional vector of coordinates of the $n$-th point in the time domain, corresponding to the measured value $t_n$. To obtain a unique and reliable solution, the matrix $A$ in Equation (2a) must be skinny, that is, the number of unknown spectral intensities $N_f$ must be lower than the number of linear equations $N_t$. In addition, to obtain a well-conditioned matrix $A$, it is essential to augment the RS data with NUS measurements.
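A small numerical sketch of the intensity evaluation step follows. It assumes the standard inverse-DFT form for the matrix elements, `exp(2*pi*i * (n_hat . k_hat) / N) / N**d`, which may differ from the authors' convention by a constant factor, and it uses an ordinary least-squares solve on noise-free synthetic data. The shortlist, sampling schedule, and function name are hypothetical.

```python
import numpy as np

def design_matrix(time_pts, freq_pts, n):
    """Rows: measured time points, columns: shortlisted frequencies.

    Assumed element form: exp(2*pi*i * (n_hat . k_hat) / N) / N**d.
    """
    d = freq_pts.shape[1]
    phase = 2j * np.pi * (time_pts @ freq_pts.T) / n
    return np.exp(phase) / n**d

rng = np.random.default_rng(0)
N = 16

# Shortlist: three true peaks followed by two voting artefacts (zero intensity)
freqs = np.array([(3, 5), (10, 2), (7, 7), (3, 2), (10, 5)])
s_true = np.array([1.0, 0.5, 2.0, 0.0, 0.0])

# 40 distinct randomly chosen time-domain grid points stand in for RS + NUS data
flat = rng.choice(N * N, size=40, replace=False)
times = np.column_stack(divmod(flat, N))

A = design_matrix(times, freqs, N)          # skinny: 40 equations, 5 unknowns
t = A @ s_true                              # simulated measurements
s_est, *_ = np.linalg.lstsq(A, t, rcond=None)
print(np.round(s_est.real, 6))
```

The full `N**d` grid never appears: only the `N_t × N_f` matrix is formed, so memory scales with the number of measurements times the length of the shortlist.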
The possibility of using NUS along with RS data is a key feature of the new method, which for the first time allows ambiguities to be resolved and the spurious aliasing peaks inherent in RS data to be avoided [1b, 3c]. To further stabilize the solution of the linear system in Equation (2a) for the most crowded spectral regions, we use a mild Tikhonov regularization [8].

At the final validation step, which is also unique to the XLSY algorithm, the bootstrap approach [9] is used to estimate individual uncertainties for every calculated point in the spectrum. This defines a local noise level that may vary significantly from one spectral region to another, depending on the signal density. The local noise estimate also sets an upper intensity bound for weak peaks that might be lost in the spectrum reconstruction. Furthermore, the uncertainties play an important role in the XLSY algorithm for detecting true weak peaks and for discarding a sizable fraction (20–80 %) of points in the frequency shortlist that originate from radial sampling artefacts. Since the NUS data are used for solving Equation (2), these frequencies are easily spotted as having statistically insignificant intensities.

An additional comment should be made about the sensitivity of the XLSY method. Although all the RS and NUS experimental data can be used together at the evaluation and validation steps of the algorithm, the identification step implies signal thresholding in the individual projections, each of which is recorded in a fraction of the total experiment time. As in the APSY approach [2d, 2e], the corresponding loss of sensitivity is largely offset by the combined analysis of the projections in the voting procedure. For our spectra, the signal threshold in the projections was set to 2–3 $\sigma_{\text{noise}}$, which is well below the threshold commonly used for peak detection in individual NMR spectra. Note that 5 % (1 %) of noise intensities exceed the 2 (3) $\sigma_{\text{noise}}$ threshold just by chance.
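The regularized solve and the bootstrap validation can be sketched as follows. The source does not spell out the resampling scheme, so this example assumes the simplest variant (resampling measured time points with replacement), a ridge term scaled relative to the mean diagonal of the Gram matrix, and a 3σ significance test. All names, the noise level, and the synthetic data are hypothetical.

```python
import numpy as np

def design_matrix(time_pts, freq_pts, n):
    """Assumed inverse-DFT elements: exp(2*pi*i * n_hat.k_hat / N) / N**d."""
    d = freq_pts.shape[1]
    return np.exp(2j * np.pi * (time_pts @ freq_pts.T) / n) / n**d

def tikhonov_solve(a, t, lam):
    """Minimize ||A s - t||^2 + reg * ||s||^2 through the normal equations.

    lam is relative: reg = lam * mean diagonal of A^H A (a choice made here).
    """
    gram = a.conj().T @ a
    reg = lam * np.trace(gram).real / gram.shape[0]
    return np.linalg.solve(gram + reg * np.eye(gram.shape[0]), a.conj().T @ t)

def bootstrap_sigma(a, t, lam, n_boot=200, seed=0):
    """Per-point uncertainty: spread of solutions over resampled measurements."""
    rng = np.random.default_rng(seed)
    n_t = a.shape[0]
    trials = np.empty((n_boot, a.shape[1]), dtype=complex)
    for b in range(n_boot):
        rows = rng.integers(0, n_t, size=n_t)   # sample rows with replacement
        trials[b] = tikhonov_solve(a[rows], t[rows], lam)
    return trials.std(axis=0)

rng = np.random.default_rng(1)
N = 16
freqs = np.array([(3, 5), (10, 2), (7, 7), (3, 2), (10, 5)])  # last two: artefacts
s_true = np.array([1.0, 0.5, 2.0, 0.0, 0.0])
flat = rng.choice(N * N, size=60, replace=False)
times = np.column_stack(divmod(flat, N))

A = design_matrix(times, freqs, N)
noise = 1e-4 * (rng.standard_normal(60) + 1j * rng.standard_normal(60))
t = A @ s_true + noise

s_est = tikhonov_solve(A, t, lam=1e-3)
sigma = bootstrap_sigma(A, t, lam=1e-3)
significant = np.abs(s_est) > 3 * sigma      # validation: keep only these points
print(np.round(np.abs(s_est), 3), significant)
```

Points that fail the significance test would be dropped from the stored spectrum, and `sigma` doubles as the local noise estimate for the points that remain.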
We demonstrate XLSY by the first high-quality reconstruction of three very large spectra consisting of 10¹⁰–10¹⁵ points. Figure 2 illustrates XLSY reconstructions of 7D HNCOCACONH [10], 5D HACACONH [11], and 5D HN(CA)CONH [12] spectra for a representative 14 kDa IDP, α-synuclein (Table 1). Figure 2b–d highlights the genuine resolution of the 7D spectrum. The peaks of E105 and E131 from the EEG sequence repeats, which are resolved in the 7D HNCOCACONH spectrum, fully overlap in the 5D spectra (insets of Figure 2e) and cannot be resolved in any radially sampled planar projection of the spectrum.
Figure 2

XLSY reconstructions of α-synuclein spectra. a) CO/N projection of the 7D HNCOCACONH spectrum. b)–d) Slices of the 7D spectrum through the peaks of E105 and E131. e) N/Cα projection of the 5D HACACONH spectrum. Insets show 1D cross-sections taken through the cross peak E131/E130 (overlapped with E105/E104). f) An example of a sequential assignment walk in the N/N projection of the 5D HN(CA)CONH spectrum.

Table 1

Parameters of NMR experiments and XLSY reconstructions.

| Parameter | 5D | 7D |
|---|---|---|
| Experimental time, RS/NUS/total (hours) | 23.5/3.5/27 | 49/68/117 |
| Time-domain projections | 19 | 14 |
| Number of NUS points | 75 | 310 |
| Number of shortlisted frequencies | 8×10⁴/6×10⁴ [a] | 3×10⁶ |
| Final size of the reconstruction after validation (points) | 3.9×10⁴/3.5×10⁴ [a] | 4.6×10⁴ |

[a] 5D HACACONH/ 5D HN(CA)CONH spectra, respectively.

The XLSY spectra can be easily handled and analyzed. They have a compact sparse-matrix representation and contain only statistically validated intensities. For visualization and detailed analysis, any spectral slice or projection can be obtained, including those that are very difficult or impossible to obtain in experiments with lower dimensionality. Figure 2a shows an example of a unique orthogonal CO/N projection of the 7D HNCOCACONH spectrum. Figures 2e,f illustrate the quality of the 5D HACACONH and 5D HN(CA)CONH spectra with two planar projections, N/Cα and N/N. Figure 2f shows a partial sequential assignment walk performed in the 5D HN(CA)CONH spectrum for the stretch of amino acids from A69 to T64. In the multidimensional spectra, peaks are well resolved and semi-automatic signal detection is straightforward. In the 5D HACACONH and 5D HN(CA)CONH spectra, we found all peaks expected for α-synuclein with the exception of five prolines and two residues at the N-terminus. In the 7D HNCOCACONH spectrum, we found all peaks that were present in the orthogonal projections of the experiment. The peak lists, together with the corresponding backbone assignment of α-synuclein, are deposited in the BMRB (entry 27586). The assignment is in line with the published assignment (BMRB entry 6968), which was obtained at a different sample temperature.
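To make the handling of a sparse spectrum concrete, the sketch below stores a spectrum as a dictionary from coordinate tuples to intensities and derives projections and slices from it. The storage layout, the choice of a sum (rather than skyline) projection, the function names, and the coordinate values are illustrative assumptions and are not taken from the XLSY software.

```python
from collections import defaultdict

def project(sparse_spectrum, axes):
    """Sum projection of {coords: intensity} onto the listed axes."""
    out = defaultdict(complex)
    for coords, value in sparse_spectrum.items():
        out[tuple(coords[a] for a in axes)] += value
    return dict(out)

def slice_at(sparse_spectrum, fixed):
    """Keep points whose coordinates match `fixed` ({axis: index}).

    The returned keys hold the remaining (non-fixed) coordinates.
    """
    out = {}
    for coords, value in sparse_spectrum.items():
        if all(coords[a] == i for a, i in fixed.items()):
            rest = tuple(c for a, c in enumerate(coords) if a not in fixed)
            out[rest] = value
    return out

# Three validated points of a hypothetical 7D spectrum (grid indices)
spectrum = {
    (1, 2, 3, 4, 5, 6, 7): 1.0,
    (1, 2, 9, 9, 9, 9, 9): 2.0,
    (0, 5, 3, 4, 5, 6, 7): 0.5,
}

plane = project(spectrum, axes=(0, 1))       # 2D projection onto axes 0 and 1
cut = slice_at(spectrum, fixed={0: 1, 1: 2}) # 5D slice through one 2D position
print(plane)
print(cut)
```

Two points that coincide in the 2D projection stay separate in the slice, which mirrors how overlapped peaks in a planar projection are resolved in the full-dimensional spectrum.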
In conclusion, by demonstrating the first high-quality reconstruction of complete 7D and 5D spectra of a representative IDP, we introduce the XLSY method, which removes the limits on spectrum dimensionality and resolution imposed by existing signal acquisition and processing approaches. We envisage that the method will be most useful in studies of IDPs and in automated high-throughput characterization of small and medium-sized globular protein systems, where experiments with high dimensionality and resolution are in the highest demand [13].

Conflict of interest

The authors declare no conflict of interest.