Jia Xu1, Steven R Van Doren1. 1. Department of Biochemistry, University of Missouri, 117 Schweitzer Hall, Columbia, Missouri 65211, United States.
Abstract
Evidence is presented that binding isotherms, simple or biphasic, can be extracted directly from noninterpreted, complex 2D NMR spectra using principal component analysis (PCA) to reveal the largest trend(s) across the series. This approach renders peak picking unnecessary for tracking population changes. In 1:1 binding, the first principal component captures the binding isotherm from NMR-detected titrations in fast, slow, and even intermediate and mixed exchange regimes, as illustrated for phospholigand associations with proteins. Although the sigmoidal shifts and line broadening of intermediate exchange distort binding isotherms constructed conventionally, applying PCA directly to these spectra along with Pareto scaling overcomes the distortion. Applying PCA to time-domain NMR data also yields binding isotherms from titrations in fast or slow exchange. The algorithm readily extracts time courses, such as breathing and heart rate in chest imaging, from magnetic resonance imaging movies. Similarly, two-step binding processes detected by NMR are easily captured by principal components 1 and 2. PCA obviates the customary focus on specific peaks or regions of images. Applying it directly to a series of complex data will easily delineate binding isotherms, equilibrium shifts, and time courses of reactions or fluctuations.
Affinity
measurements are essential
in understanding molecular recognition and in assessing drug discovery.
Time courses of chemical and biological transformations are of wide
interest. A theme shared in monitoring either equilibria or kinetics
is to describe the shifts in population, the central interest of this
Article. We propose to marshal a classic method of chemometrics to
follow such shifts more generally.

In the case of ligand associations,
a preferred spectral approach
has been heteronuclear NMR, due to its information on binding site
and suitability over a range of affinities.[1−5] Typically, the ligand-binding equilibrium is monitored
by shifts of NMR peaks.[1,2,4] Arriving
at affinities, however, has meant traveling through slow bottlenecks
of spectral peak picking to obtain binding isotherms, usually assignment
of the peaks, and global fitting of a binding isotherm consistent
with the shifts of multiple peaks of the protein or macromolecule.[6] Despite the advantages of this approach and rapidity
of modern collection of spectra,[7,8] the time invested in
interpreting these spectra is a barrier to wider and faster applications.
Below, we propose an improved strategy that bypasses the selection
of favorable peaks in spectra and favorable features in images for
analysis.

The stepwise population changes due to ligand binding
in a titration
are usually accompanied by changes in NMR peaks that depend on the
exchange regime, i.e., the time scale of chemical exchange relative
to the chemical shift differences between free and bound states. Behaviors
of fast, slow, and intermediate exchange regimes are depicted in Figure S1. Peak shifts in the fast exchange regime
are favored for modeling binding isotherms.[4,9] In
the slow exchange regime, peaks representing the free state can disappear
and reappear elsewhere in the bound state, complicating peak assignments.
In intermediate exchange, the nonlinearity of chemical shift changes
from titrations can corrupt binding isotherms with sigmoidal distortion,
resulting in skewed and unreliable fits of the association[4] (Figure S1).

Principal component analysis (PCA) reduces the dimensionality of
data to reveal a simpler set of shared features or patterns. It is
efficient, robust, and widely applied in chemometrics, analytical
spectroscopy, and imaging.[10,11] PCA is often implemented
using singular value decomposition (SVD). The approach has only occasionally
been applied to reactions monitored by 2D NMR spectra.[12−16] These included resolution of time-dependent[12] or pH-dependent components (using CS-PCA).[13] PCA filtered noise out of spectra to improve global fits of binding.[15] SVD of peak heights from in-cell NMR spectra
of proteins associating suggested the binding site.[16] In most of these NMR studies, SVD was applied to peak pick lists,[13−16] rather than to the stack of 2D NMR spectra “unfolded”
into a stack of vectors, an approach that avoided peak lists and worked well
on sparse 2D NMR spectra.[12] In NMR-detected
titrations, the applicability of PCA is regarded at this writing as
limited to the fast exchange regime.[14,17,18] The need for wide applicability to complex scenarios
such as binding of multiple ligands, mixtures of chemical exchange
regimes, and changing linewidths was articulated.[14] The work herein responds to this need.

PCA can be
computed by either SVD or eigenvector decomposition
of covariance, aiming at maximization of variance with minimization
of correlation and redundancy (see the Supporting Information for more detail). PCA computes new orthogonal components
that are linear combinations of the original experimental variables,
with the first principal component (PC1) reporting the largest variance.
Jolliffe asserts that PCA is often useful for data deviating from
Gaussian distributions and linear relationships of observed variables
to underlying components.[19]

Magnetic
resonance imaging (MRI) of brain and diseased tissues
presents opportunities for chemometrics, such as comparing and registering
images spatially, temporally, and metabolically.[20−24] Resolution of trends of change between the frames
of a stack of congruent images or 2D spectra can be undertaken by
three-way multiple image analysis such as “unfold”-PCA,
which simplifies the 3D stack into two dimensions for standard PCA.[12,25]

We demonstrate how to extend unfold-PCA to extract binding
isotherms
successfully from 2D NMR spectra of ligand titrations in slow exchange
and problematic intermediate exchange by introducing preprocessing
steps. Moreover, the improved approach needs no peak picking
or peak assignments. The algorithm is even successful in
deriving binding isotherms from the unprocessed free induction decays
(FIDs) from titrations in fast or slow exchange. When a second binding
process has been detected spectrally, PCA can also derive it as the
second component of the reaction. Likewise, this enhancement of unfold-PCA
is general enough to extract multiple and periodic time-varying components
from MRI movies. Applying PCA directly to a series of spectra or images
saves much time in handling them and in resolving the processes present.
Experimental Section
Preprocessing of Spectra and Images for SVD
Each spectrum
or image in the series of measurements is collected and processed
under identical conditions, except for the experimental variable changed
(concentration, time, pH, etc.). Each 2D spectrum or image (F1 ×
F2 points) is rearranged as a 1D vector arrayed over the experimental
variable[25] (Figure S2). Each vector is compressed, by deleting unchanging positions,
in order to expedite computational manipulations of the matrix X′. Low intensity regions of the vectorized
spectra were usually filtered out prior to SVD. Alternative choices
of no scaling, autoscaling, and Pareto scaling[26] of the rows of X′ were compared. The
rows were mean-centered.[11]
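The preprocessing steps above can be sketched in NumPy. This is a minimal illustration and not the authors' released software: the function name `unfold_and_preprocess`, its arguments, and the rule of retaining a point when it exceeds the threshold in at least one spectrum are assumptions made for the example. Pareto scaling is taken here in its usual chemometric sense of dividing each mean-centered row by the square root of its standard deviation.

```python
import numpy as np

def unfold_and_preprocess(stack, noise, threshold=3.0, scaling="pareto"):
    """Unfold a (n_spectra, F1, F2) stack into a preprocessed 2D matrix.

    Returns X with one row per retained spectral point and one column
    per spectrum (or image) in the series.
    """
    n = stack.shape[0]
    X = stack.reshape(n, -1).T.astype(float)   # rows = points, columns = spectra

    # Noise filter (assumed rule): keep points above threshold * noise
    # in at least one spectrum of the series.
    X = X[np.abs(X).max(axis=1) > threshold * noise]

    # Compression: drop positions that never change across the series.
    sd = X.std(axis=1, ddof=1)
    X, sd = X[sd > 0], sd[sd > 0]

    # Mean-center each row, then apply the chosen row scaling.
    X = X - X.mean(axis=1, keepdims=True)
    if scaling == "auto":
        X = X / sd[:, None]
    elif scaling == "pareto":
        X = X / np.sqrt(sd)[:, None]
    elif scaling != "none":
        raise ValueError("scaling must be 'none', 'auto', or 'pareto'")
    return X
```

The `noise` argument would be an estimate of the spectral noise level, so that `threshold` plays the role of the 3- to 7-fold multiples of noise discussed later in the text.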
Extraction of Principal Components
SVD of X′ can
be expressed as

$$X'_{m \times n} = U_{m \times m}\, S_{m \times n}\, V^{T}_{n \times n} \qquad (1)$$

where U and V are orthogonal matrices, S is a
diagonal matrix, and subscripts denote sizes of matrices. The eigenvectors
of X′ᵀ·X′ constitute
the matrix V containing the
singular vectors of interest, such as PC1 as the first row of Vᵀ with the
largest trend (Figure S2) and PC2 as the
second row with the second largest trend. PC1 may depend on time,[27] [ligand],[15] or other
conditions.[13] The simulations of NMR spectra
used for part of the testing of PCA applied directly to spectra are described
in the Supporting Information.
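As a hedged sketch of this extraction step, the snippet below takes a preprocessed matrix with rows as spectral points and columns as spectra, and returns the leading rows of Vᵀ together with the fraction of variance each explains. The helper names `principal_components` and `normalize_trend` are hypothetical. Because the sign of a singular vector is arbitrary, the trend is rescaled to run from 0 at the first point to 1 at the last before it is compared with a population curve.

```python
import numpy as np

def principal_components(X, n_components=2):
    """SVD of X (rows = points, columns = spectra); returns rows of V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    return Vt[:n_components], explained[:n_components]

def normalize_trend(pc):
    """Rescale a component to 0 at the first point and 1 at the last."""
    return (pc - pc[0]) / (pc[-1] - pc[0])

# Synthetic two-state series: each column mixes a "free" and a "bound"
# spectrum according to a known bound fraction f (illustrative values only).
rng = np.random.default_rng(1)
f = np.linspace(0.0, 1.0, 10) ** 0.5
free, bound = rng.random(200), rng.random(200)
X = np.outer(free, 1.0 - f) + np.outer(bound, f)
X = X - X.mean(axis=1, keepdims=True)        # mean-center the rows

pcs, explained = principal_components(X)
pc1 = normalize_trend(pcs[0])                # follows f for this noiseless case
```

In this noiseless two-state example the centered matrix has rank one, so PC1 reproduces the population change exactly; with real spectra the agreement is only approximate.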
Results and Discussion
PCA Capture of Time Courses
We extended
the unfold-PCA
strategy of converting a 3D stack of 2D NMR spectra (perturbed by
the experimental variable) into a 2D array of vectors for SVD.[12] To improve performance, we inserted preprocessing
steps for data compression, noise filtration, and scaling options
(Figure S2). We automated these processing
and calculation procedures for multiple data formats.[28] This algorithm avoids user selection of features in the
data (Figure S2). Its ability to capture
main trends is introduced using time-lapse images of a sunset or multiplying
bacteria (Figure S3). The trajectory of
the setting sun is marked by PC1 (Figure S3A,B). The exponential growth in bacteria is represented by PC1, despite
their motility (Figure S3C,D). Applying
the same PCA approach to time-lapse 2D NMR spectra captures a reaction
progress curve as PC1. Changes in 1H–15N correlation spectra have been used to track dephosphorylation or
phosphorylation rates.[29,30] PCA applied directly to time-lapse
TROSY spectra of a phosphoryl transfer enzyme reveals the time course
of dephosphorylation (Figure S3E,F). The
kinetics derived from unsupervised PCA of entire spectra echo those
obtained from global fitting of carefully selected peak height changes[29] but with new ease.
Fast Exchange Scenarios
PCA was demonstrated on peak
pick lists of titrations with NMR peaks in the fast exchange regime,
where the shifts of the peak positions are linear combinations of
the basis spectra and suffice to indicate population change.[13,14,16] However, applying PCA directly
to noninterpreted spectra means that more information is considered:
not only selected peak positions but also line shapes (widths, heights,
volumes, etc.) throughout the spectrum. Autoscaling[32] and Pareto scaling[26] perform
acceptably when applying the improved algorithm to fast exchange (Figure S4A,B). Autoscaling is, however, more
accurate and precise for fast exchange, especially with the threshold
for retention of spectral points set to 3- to 7-fold the noise level
(Figure S4A,B).

The list-based and
improved spectrum-based implementations of PCA reproduce conventional
results in obtaining binding isotherms. An example of 1:1 protein–ligand
binding in the fast exchange regime with KD set to 270 μM is shown with the simulated titration of Figure 1A. Application of
PCA to lists of all peaks provides an accurate binding isotherm as
PC1 plotted vs [ligand]. Fitting to standard eq S4 places KD at 271 ± 17 μM
(Figure 1B). This indicates
that PCA of all peak positions, whether shifted by the ligand or not,
matches conventional global fitting of only the large shifts of well-resolved
peaks. It is more convenient and thorough to apply the improved unfold-PCA
algorithm directly to the spectra (Figure S2). The binding isotherm captured as PC1 in this way reproduces the
true populations (Figure 1B). This is also illustrated for the titration of a phosphoprotein
binding domain with a phosphoThr peptide in fast exchange[31] (Figure 1C). PC1 direct from the spectra delineates the binding isotherm
fitted by KD of 36 ± 4 μM (Figure 1D), which closely
resembles the binding isotherms and KD of 40 ± 5 μM globally fitted previously to the shifts
of multiple amide peaks.[31] PCA of lists
of the spectral peaks picked from the titration provides PC1 fitted
by a similar KD of 34 ± 3 μM
(Figure 1D).
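Fitting PC1 to a 1:1 binding model can be sketched as follows. Eq S4 is in the Supporting Information and is not reproduced in this text, so the example assumes the usual quadratic expression for the bound fraction with ligand depletion. The function names, the grid search, and the concentrations are illustrative assumptions. Since PC1 has arbitrary sign and scale, its amplitude and offset are solved by linear least squares at each trial KD.

```python
import numpy as np

def fraction_bound(L_total, P_total, KD):
    """Bound fraction of protein for 1:1 binding (assumed standard form)."""
    b = P_total + L_total + KD
    return (b - np.sqrt(b**2 - 4.0 * P_total * L_total)) / (2.0 * P_total)

def fit_kd(L_total, P_total, pc1, kd_grid):
    """Grid search over KD; amplitude and offset of PC1 are fitted linearly."""
    best_sse, best_kd = np.inf, None
    for KD in kd_grid:
        f = fraction_bound(L_total, P_total, KD)
        A = np.column_stack([f, np.ones_like(f)])
        coef, *_ = np.linalg.lstsq(A, pc1, rcond=None)
        sse = np.sum((A @ coef - pc1) ** 2)
        if sse < best_sse:
            best_sse, best_kd = sse, KD
    return best_kd

# Hypothetical titration in uM: 600 uM protein, ligand up to 10-fold excess.
L = np.linspace(0.0, 6000.0, 15)
pc1 = -0.3 * fraction_bound(L, 600.0, 270.0) + 0.1   # arbitrary sign and scale
kd = fit_kd(L, 600.0, pc1, np.arange(10.0, 1001.0, 1.0))
```

A grid search is used only to keep the sketch dependency-free; a nonlinear least-squares routine would also report an uncertainty like those quoted in the text.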
Figure 1
PC1 from SVD
of titrations in fast exchange, simulated or measured,
represents Langmuir binding isotherms. (A) Simulated spectral shifts
in the fast exchange regime. The colors of the contours progress with
ligand additions up to 10-fold excess. (B) Binding isotherms were
obtained by applying SVD to the simulated spectra without peak picking
(triangles), peak pick lists (circles), or the simulated raw FIDs
(open squares). Black squares mark conventional, global fitting of
the shifts of individual peaks. ||..|| denotes normalization of the
peak shifts. (C) Superposed 15N HSQC spectra of a phosphoprotein-binding
FHA domain (600 μM) titrated with a phosphopeptide from a protein
kinase exhibit fast exchange behavior.[31] (D) Binding isotherms were derived from the titration shown in (C)
by applying SVD directly to the spectra (open triangles), lists of
the peaks of each spectrum (squares), or FIDs (circles). The KD of 40 ± 5 μM globally fitted to
the peak shifts of multiple amide peaks[31] is closest to the KD fitted to PC1 of
the spectra.
Parseval’s theorem
suggests that signals in time and frequency
domains can be considered equivalent.[33] With this in mind, PCA of the unprocessed FIDs was also evaluated
(Figure 1). PC1 derived
from the array of FIDs from the simulation of fast exchange yielded
a binding isotherm with nearly correct affinity but larger
uncertainty, i.e., KD of 290 ± 68
μM (Figure 1B).
This outcome is promising for PCA overcoming the high level of noise
added to the simulated example (S/N of 5 at the median peak height).
PCA of the sets of FIDs from the protein titration with phosphoThr
peptide in fast exchange[31] generated a
binding isotherm with KD close to the
33 ± 6 μM obtained by other methods (Figure 1D). The smaller uncertainties when applying
PCA after Fourier transformation might reflect increased sensitivity
from integration of the signals or from better signal resolution.
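The time and frequency equivalence invoked above can be checked numerically. In this sketch, which uses a hypothetical single-resonance titration, a unitary FFT leaves both the total signal energy (Parseval's theorem) and the singular values of the data matrix unchanged. The equality holds before any thresholding, row scaling, or apodization, none of which is unitary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_spectra = 256, 8
t = np.arange(n_points)

# Hypothetical titration: one resonance whose frequency shifts with each addition.
freqs = np.linspace(0.10, 0.12, n_spectra)
fids = np.exp(2j * np.pi * freqs[None, :] * t[:, None]) * np.exp(-t[:, None] / 60.0)
fids += 0.05 * (rng.normal(size=fids.shape) + 1j * rng.normal(size=fids.shape))

# Unitary FFT along the time axis (norm="ortho" preserves total energy).
spectra = np.fft.fft(fids, axis=0, norm="ortho")

energy_time = np.sum(np.abs(fids) ** 2)
energy_freq = np.sum(np.abs(spectra) ** 2)

# Singular values are invariant under a unitary transform of the rows.
sv_time = np.linalg.svd(fids, compute_uv=False)
sv_freq = np.linalg.svd(spectra, compute_uv=False)
```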
Slow Exchange Scenarios
Binding isotherms can be constructed
conventionally in the slow exchange regime (with slower koff and higher affinities) from changes of peak volumes
or heights, but this is more difficult and less common. Tracking the appearance
of bound state peaks is preferred[4] but
can be complicated by challenging peak assignments and peak attenuation
by line broadening. PCA of the simulated titration (KD set at 270 μM) in the slow exchange regime derives
a binding isotherm as PC1 that is virtually indistinguishable (KD of 262 ± 9 μM) from the simulated
populations (Figure 2B). SVD of the series of spectra derives robust binding isotherms
from titrations in slow exchange. The fits to them are precise with
all three options of scaling, provided that with autoscaling the threshold
for data inclusion is kept ≤7-fold the noise level (Figure S4E,F). PC1 extracted from simulated FIDs
provides a binding isotherm resembling the simulated populations,
with slight deviations in points and fitted KD of 290 ± 14 μM (Figure 2B). PCA was applied to the entirety of crowded 15N TROSY spectra of the 52 kDa PMM enzyme titrated by its
inhibitor xylose 1-phosphate (X1P), exhibiting slow exchange behavior
(Figure 2C). The binding
isotherm globally fitted to the increasing peak heights of several
selected bound state peaks estimates KD at 23 ± 6 μM. (The blue curve in Figure 2D summarizes many normalized peak heights
fitted.) The points of PC1 obtained directly from the spectra are
fitted by KD of 27 ± 13 μM
and PC1 from FIDs by KD of 32 ± 11
μM (Figure 2D).
These PC1-derived binding isotherms match well those obtained from
conventional global fitting of bound peak heights but with the advantages
of minimal data handling or interpretation.
Figure 2
SVD of titrations featuring
slow exchange, in simulated or measured
NMR spectra, distills binding isotherms as PC1. (A) Overlay of HSQC
spectra simulated with slow exchange. Protein ligand ratios of 1:0,
1:1.3, and 1:10 are represented by red, cyan, and darker blue, respectively.
Insets are 1D slices of peak pairs indicated by black arrows. (B)
PC1 derived from the simulated series of spectra (triangles) in panel
A provides binding isotherms equivalent to plotting heights of disappearing
peaks of the free state (black squares). PC1 was also calculated from
peak lists (circles) or the FIDs (open squares). (C) Spectra from
a slow exchange titration of an enzyme with an inhibitor. 15N TROSY spectra of PMM (52 kDa, 800 MHz, 25 °C) titrated with
X1P are superposed and contain amide peaks in slow exchange. PMM/X1P
ratios of 1:0, 1:0.6, and 1:8 are represented by red, cyan, and blue,
respectively. (D) PC1 of either the spectra or FIDs from this titration
captures the binding isotherm. Standard global fitting of peak heights
is shown with blue symbols for comparison.
Intermediate Exchange Scenarios
Intermediate exchange
is most problematic for estimating affinities due to its sigmoidal
plots of NMR peak shifts[4] vs [ligand] (Figures S1F and 3B). These
nonlinear shifts can be fitted erroneously, with deviations of up to 2
orders of magnitude from the actual affinity.[4] They can
also be misconstrued as evidence of cooperativity.
Figure 3
Suppressing the intermediate
exchange distortion of binding isotherms
by applying PCA directly to spectra. (A) HSQC spectra simulated to
be intermediate to fast in exchange for 1H chemical shift
changes and line shapes. The inset shows slices through a shifted
and broadened peak. (B) In intermediate to fast exchange, the ligand-induced
peak shifts deviate sigmoidally from a 1:1 binding isotherm when applying
PCA to the peak pick lists (dashed line). The lag is suppressed in
PC1 (green triangles) from SVD of Pareto-scaled spectra. (C) A region
of the 15N HSQC spectrum of the FHA domain titrated with
a phosphopeptide displays intermediate-fast exchange behavior at the
peaks of four amino acids labeled. (D) PC1 of the spectra yields a
binding isotherm fitted by KD of 21 ±
8 μM, which agrees with the KD of
20 μM measured by isothermal titration calorimetry.[31]
In intermediate exchange, both line shapes and peak positions
appear
to be critical for capturing population change. As a simple and extreme
case, NMR spectra of a titration were simulated with intermediate
exchange broadening in all peaks in the 1H dimension. The application of standard autoscaling[32] in the algorithm of Figure S2 falls short of the accuracy and precision needed (see purple box
in Figure S4C,D). For obtaining a binding
isotherm of high accuracy and precision from intermediate exchange
behavior, Pareto scaling of the rows is required and improved by the
threshold remaining small (Figure S4C,D). Though the shifts of all peaks are sigmoidal (Figure 3A,B), PCA of the Pareto-scaled,
linearized spectra avoids any such distortion of PC1; it is best fitted
by a KD of 102 ± 15 μM that
agrees with the simulated KD (Figure 3B). Pareto scaling
with a low threshold increases the weighting of weak peaks broadened
by intermediate exchange and appears to move the data closer to a
Gaussian (Figure S6), the distribution
optimal for PCA.[19]
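For reference, the two row scalings compared here can be written out, assuming the standard chemometric definitions were used. With $x_{ij}$ the intensity of spectral point $i$ in spectrum $j$, $\bar{x}_i$ the mean of row $i$, and $s_i$ its standard deviation:

$$\tilde{x}_{ij}^{\mathrm{auto}} = \frac{x_{ij} - \bar{x}_i}{s_i}, \qquad \tilde{x}_{ij}^{\mathrm{Pareto}} = \frac{x_{ij} - \bar{x}_i}{\sqrt{s_i}}$$

Autoscaling gives every retained row unit variance, so weakly varying rows count as much as the strongest peaks. Pareto scaling leaves row $i$ with a standard deviation of $\sqrt{s_i}$, which raises the weight of weak, broadened peaks relative to no scaling while keeping them below the strongest signals.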
Mixtures of Regimes
It is much more typical of titrations
with NMR peaks in intermediate exchange to be accompanied by other
peaks in fast or slow exchange. We simulated a titration with a mixture
of all three regimes and 34% of the peaks in intermediate exchange
(Figure S8A). The sigmoidal shifts of the
latter are enough to cause PCA of the lists of all picked peaks to
extract PC1 which is sigmoidal and unacceptable as a binding isotherm
(Figure S8B). The application of PCA to
these spectra instead (with Pareto scaling for accuracy)
successfully captures the simulated population change as PC1 with
fitted KD within 7% of the simulated value
(Figure S8B). When using only peaks in
intermediate exchange from this simulation (Figure S8C), the sigmoidal distortion of PC1 from PCA of peak lists
worsens, but PCA of the Pareto-scaled spectra still suppresses distortion
of PC1, as is evident from fitted KD within
13% of the actual value (Figure S8D).

15N HSQC spectra
of an FHA domain titrated with a phosphoThr peptide[31] exhibit intermediate-fast exchange (Figure 3C). Though numerous unaffected peaks are
also present, fitting of the PC1-derived binding isotherm matches
the KD of 20 ± 3 μM measured
independently by isothermal titration calorimetry (Figure 3D). PCA is not recommended
for application to FIDs with intermediate exchange broadening because
of the skewing of PC1 that results (Figure S9E,F).

Applying unfold-PCA to spectra along with the preprocessing
recommended
herein (Figures S2 and S4) reliably defines
the binding isotherms. This is much easier than seeking KD through fitting of line shapes or competition experiments[4] requiring prior knowledge of relative ligand
affinity. Use of PCA does not change the need for [protein] to be
0.2 to 0.8 of KD for best accuracy in
fitting KD and within 10-fold for acceptable
accuracy.[5,9] When affinities are too tight to use this
range (evident as an abrupt transition), competition can then be introduced
to weaken the affinity of interest into the concentration range where
it can be fitted accurately.[4,5,15]
Two-Step Binding
Next, we attempted resolution of two binding events, reactions determined to be sequential.[34] In the course of multiple ligand binding, mixed
exchange regimes are likely to complicate previous strategies of analysis.
Cogliati et al. reported a challenging mixture of exchange regimes
in the two-step binding of two molecules of sodium glycochenodeoxycholate
(GCDA) to bile acid binding protein[34] (Figure 4A). The titrations
display a mixture of fast, slow, and intermediate exchange regimes
accompanying the complex binding (Figure 4B). The authors applied line shape analysis
to selected amide NMR peaks undergoing intermediate exchange broadening;
see those marked with black arrows in Figure 4B.[34] This enabled
them to estimate the proportions of the apo (P), intermediate (PL),
and ligand-saturated (PL2) states through the course of
titrations[34] (green in Figure 4C).
Figure 4
Principal components
from SVD of spectra agree with the populations
estimated earlier by line shape analysis[34] for a titration of two sequential binding events. (A) Scheme of
the two-step binding mechanism hypothesized. (B) Chicken liver bile
acid binding protein with disulfide bridge was titrated with GCDA
and underwent intermediate exchange broadening, as is evident for
two peaks marked with arrows in the superposed HSQC spectra.[34] (B) HSQC spectra of this protein titrated with
GCDA, specifically ligand/protein ratios of 0, 0.1, 0.2, 0.4, 0.6,
0.8, 1.0, 1.3, 1.6, 2.0, 2.5, 3.0, and 3.5, with contours ranging
from red to blue. Black arrows indicate peaks in intermediate exchange.[34] (C) Comparison between normalized PCs (purple)
and populations of the states P, PL, and PL2 previously
calculated using line shape analysis (green, adapted from Figure 3e
in ref (34) with permission,
copyright 2010 John Wiley & Sons).
The application of SVD directly to the same spectra without
peak
picking and with Pareto scaling results in PC1 accounting for 61%
of the variances and PC2 accounting for 12% (Table S2). PC1 approximates the disappearance of the apo state P.
The quantity 1 – PC1 (not shown) resembles but slightly exceeds
the formation of the fully bound state PL2 (Figure 4C). PC2 resembles the rise and fall of the
population of the singly ligated intermediate PL, once PC2 is normalized
to the scale of PC1 (Figure 4C). Since the population changes of P and PL2 are highly correlated
(R = −0.93) and hence statistically related,
it is mathematically unrealistic to distinguish these two correlated
components by PCA, a decorrelation technique.

When no ligand
is present (L/P = 0) or the bile acid binding protein
is saturated with the GCDA ligand (e.g., L/P = 3.5), PC1 and PC2 sum
to 1.0 in agreement with the proportions of PL and PL2 summing to
1.0. Consequently, the sum of PC1 and PC2 is renormalized to 1.0.
This implies that PL2 should be modeled by 1 − PC1 − PC2, which
matches well the fractional concentrations of PL2 estimated
previously[34] (Figure 4C).
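The anticorrelation of P and PL2 can be illustrated with a small sequential binding model. This is a sketch under stated assumptions: the populations are written as functions of free ligand rather than total ligand, and the stepwise dissociation constants `K1` and `K2` are hypothetical values, not those of the bile acid binding protein.

```python
import numpy as np

def two_step_populations(L_free, K1, K2):
    """Fractions of P, PL, and PL2 for P + L <-> PL and PL + L <-> PL2."""
    w_pl = L_free / K1                      # statistical weight of PL
    w_pl2 = L_free**2 / (K1 * K2)           # statistical weight of PL2
    Z = 1.0 + w_pl + w_pl2
    return 1.0 / Z, w_pl / Z, w_pl2 / Z

L_free = np.linspace(0.0, 2000.0, 50)       # hypothetical free ligand, in uM
P, PL, PL2 = two_step_populations(L_free, K1=100.0, K2=300.0)

# P only falls and PL2 only rises, so their Pearson correlation is negative,
# while PL rises and then falls, the behavior attributed to PC2 in the text.
r = np.corrcoef(P, PL2)[0, 1]
```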
Nonlinearity and Applicability of PCA
Are the nonlinear
shifts of the peaks in intermediate exchange (see Figures 3, 4, and S8) suitable for PCA? Neither SVD
nor covariance calculations require Gaussian distributions.[19] The series of NMR spectra and time-lapse images
analyzed in this study all have a degree of the nonlinear
character (non-normal distributions) exemplified more dramatically
by a chaotic system (Figure S7). This may
result from the spectra and images containing more components than
lists of their peaks or features. It would require multiple PCs to
capture most of the greater complexity to reconstruct the original measurements (with matrix U in eq 1). However, for this study’s
more modest goal of extracting the largest population shifts among
the spectra or images, the nonlinearity (Figure S7) does not interfere with the largest PCs capturing the main
processes. When these largest trends are abstracted from matrix V (eq 1), they robustly withstand nonlinearity. The central limit theorem
generates an approximation of normality for most data sets, as they
have the large size required by the theorem. The scaling of the data
matrix of spectra appears to shift it toward a normal-like distribution
(Figure S6). Thus, discovering the main
trends requires far fewer PCs from matrix V than needed for faithful reconstruction of nonlinear
spectra and images using matrix U.
Periodic and Multiple Components from MRI by PCA
We
tested the fitness of this SVD approach for wider applications to
measurements paralleling macromolecular NMR spectra in being complex
and responsive to coordinated processes, e.g., MRI movies. The SVD
algorithm extracts from an MRI movie of brain fluctuations[35] the periodic flow of cerebral spinal fluid as
PC1 (Figure 5A,B).
PC1 from the full breadth of the movie frames appears similar to the
reported modulation of image intensities within the box confined to
the third ventricle[36] (Figure 5A,B). PC1 represents the 5
cycles of respiration, each with 2.5 s of inspiration and 2.5 s of
expiration, similarly to the conventional plot of the localized intensities
of the MRI signal[36] (Movie S1). PC1 being smoother than the local intensity changes
may reflect the integration of more covarying data and the noise filtering
that is intrinsic to PCA.
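The same unfold-PCA idea can be tried on a synthetic movie. The sketch below is illustrative only: the frame size, the frame spacing, the oscillating region, and the noise level are all assumptions. A static background is modulated in one region with a 5 s cycle, echoing the respiration period described above, and the period is then recovered from PC1 without selecting any region.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, ny, nx = 100, 32, 32
dt = 0.25                                          # s per frame (hypothetical)
time = np.arange(n_frames) * dt
breathing = np.sin(2 * np.pi * time / 5.0)         # five 5 s cycles in 25 s

movie = np.tile(rng.random((ny, nx)), (n_frames, 1, 1))   # static background
movie[:, 10:20, 12:18] += 0.5 * breathing[:, None, None]  # pulsating region
movie += 0.05 * rng.normal(size=movie.shape)              # noise

# Unfold each frame into a column, mean-center the rows, and apply SVD.
X = movie.reshape(n_frames, -1).T
X = X - X.mean(axis=1, keepdims=True)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]

# Dominant period of PC1 from its power spectrum (skipping the DC term).
power = np.abs(np.fft.rfft(pc1)) ** 2
freqs = np.fft.rfftfreq(n_frames, d=dt)
dominant_period = 1.0 / freqs[np.argmax(power[1:]) + 1]
similarity = abs(np.corrcoef(pc1, breathing)[0, 1])
```

The absolute value of the correlation is used because the sign of a singular vector is arbitrary.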
Figure 5
SVD extracts the time courses of pulsation in
MRI movies of cross
sections through the brain[35] or chest.[38] (A) Frames from the brain imaging (Movie S1, adapted from ref (35) with permission, copyright
BiomedNMR/CC-BY-SA-3.0) feature cerebral spinal fluid flow most apparent
within the box pointed out by an arrow in frame 2.[35,36] (B) PC1 from the movie captures five cycles of breathing, plotted
with the red line. Signal intensities within the boxed central region
with the arrow in the third ventricle are plotted with the black dashed
line. (C) A frame from the movie of ref (38) (adapted with permission, copyright 2014 John
Wiley & Sons) is labeled AA for ascending aorta, DA for descending
aorta, PT for pulmonary trunk, RPA for right pulmonary artery, and
SVC for superior vena cava. (D) The time courses of the four PCs generated
by unsupervised SVD are plotted and suggest four types of periodic
fluctuations. This movie[38] is synchronized
with plotting of its PC1 and PC2 in Movie S2.
We also applied this PCA approach
to an MRI movie of a chest cross-section[38] through the large arteries (the aorta and pulmonary
trunk) and vein (superior vena cava), each connected to the heart (Figure 5C). The aorta, pulmonary
trunk, and superior vena cava pulse in unison upon contraction of
the heart, while chest dimensions undulate more slowly with breathing[38] (Movie S2). Applying
unfold-PCA to the standard magnitude view of the MRI movie easily
extracts four time courses as PC1 to PC4. PC1 represents breathing
with three cycles of inspiration and expiration (red in Figure 5D and Movie S2). PC2 represents the pulsation of the major arteries and
superior vena cava upon heart contraction for ten consecutive
heartbeats; the troughs mark the expansion of the vessels (blue in Figure 5D and Movie S2). The process represented by PC3 is unclear, but it is synchronized to
breathing and repeats at exactly twice the frequency of PC1.
Movie reconstruction[28] using only PC3 suggests
subtle fluctuations in the pulmonary trunk (not shown), which ties
to the lungs. PC4 is clearly synchronized to the cardiac cycle. Movie
reconstruction[28] reveals that PC4 affects
the pulmonary trunk the most and the aorta slightly. The crests of
PC4 (Figure 5D) probably
represent contraction of the heart (systole) because they are narrow
and immediately precede the bolus of blood that appears in the arteries
(troughs in PC2). The broad troughs of PC4 probably represent the
relaxation of the heart known as diastole, with its rapid filling
and subsequent slower filling phases; these are evident as the steeper
and more gradual slopes at the bottom of the troughs (Figure 5D). Thus, the strategy of applying
PCA directly to the series of images resolves multiple concurrent
processes. Two PCs are as intuitive as breathing and heartbeat, while
another PC represents phases of the cardiac cycle.
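The separation of concurrent processes, and the reconstruction of a movie from a single PC, can be illustrated with a synthetic example. The sketch below is only an assumption-laden toy: two non-overlapping regions fluctuate at different rates (three and ten cycles), the noise level is arbitrary, and the helper `reconstruct_from_pc` is a hypothetical name for keeping one rank-1 term of the SVD. It is not the reconstruction procedure of ref 28.

```python
import numpy as np

def reconstruct_from_pc(U, s, Vt, k, frame_shape):
    """Rebuild a movie from component k alone (one rank-1 term of the SVD)."""
    X_k = s[k] * np.outer(U[:, k], Vt[k])      # (pixels x frames)
    return X_k.T.reshape((-1,) + frame_shape)  # (frames, height, width)

# Synthetic movie with two concurrent periodic processes (hypothetical).
rng = np.random.default_rng(1)
n_frames, height, width = 120, 20, 20
t = np.arange(n_frames)
breathing = np.sin(2 * np.pi * 3 * t / n_frames)   # three slow cycles
heartbeat = np.sin(2 * np.pi * 10 * t / n_frames)  # ten faster cycles

chest = np.zeros((height, width))
chest[2:8, 2:8] = 1.0        # larger region moving with breathing
vessel = np.zeros((height, width))
vessel[12:15, 12:15] = 1.0   # smaller region pulsing with the heart

movie = (1.0
         + 0.5 * breathing[:, None, None] * chest
         + 0.3 * heartbeat[:, None, None] * vessel
         + 0.02 * rng.standard_normal((n_frames, height, width)))

# Unfold to (pixels x frames) and mean-center each pixel (an assumption).
X = movie.reshape(n_frames, -1).T
X = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

r_breath = abs(np.corrcoef(Vt[0], breathing)[0, 1])
r_heart = abs(np.corrcoef(Vt[1], heartbeat)[0, 1])

# Movie rebuilt from PC2 only: where does its intensity change reside?
pc2_movie = reconstruct_from_pc(U, s, Vt, 1, (height, width))
energy = (pc2_movie ** 2).sum(axis=0)
fraction_in_vessel = energy[vessel > 0].sum() / energy.sum()
print(r_breath, r_heart, fraction_in_vessel)
```

In this toy case the single-PC movie localizes almost all of its fluctuation to the pulsing region, which is the kind of inspection used above to associate a PC with an anatomical feature. Real processes that overlap in space or share frequencies need not separate this cleanly.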
Tallying Meaningful Principal Components
Determining the number of meaningful PCs can become important when there are concurrent
processes. Scree plots of the contributions of PCs are widely trusted
and give especially clear suggestions of the significant PCs for the
peak lists and movies that we analyzed. Additional strategies for counting
significant PCs have been proposed (e.g., singular values and RMSD)[15,39] but appear inconclusive in all applications of unfold-PCA to the
series of spectra and images that we have examined, except to highlight
the ubiquity of nonlinear behavior (Figure S7). Even for a simple titration with NMR peaks in slow to intermediate
exchange, the percentage of variance accounted for cannot
establish the adequacy of the single component (Figure S6). The criterion that a PC be smooth (high
autocorrelation),[15] however, appears more
reliable for recognizing a meaningful component, when coupled with
some understanding of the processes. For example, in 1:1 protein–ligand
binding, the hyperbolic PC1 curve represents the binding isotherm
regardless of the proportion of variance contributed by PC1. This
inspection of PC1 works for the slow-intermediate exchange example
(Figure S6). When more than one significant
component is present, the shapes of lesser PCs need to be checked.[15] In analyses of protein–ligand titrations
with two reactions (see Figure ), PC1 and PC2 are smooth and clearly larger than other PCs
(Figure S10).
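The two checks discussed here, the scree contributions and the smoothness of each PC, can be computed side by side. The following sketch makes several assumptions that are not from the text: a synthetic series of 30 one-dimensional spectra in slow exchange, a simple hyperbolic 1:1 isotherm that ignores ligand depletion, Gaussian line shapes, and a plain lag-1 autocorrelation as the measure of smoothness (the exact definition used in ref 15 may differ).

```python
import numpy as np

def lag1_autocorrelation(v):
    """Lag-1 autocorrelation of a 1D trace; higher for smoother curves."""
    d = v - v.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

# Synthetic slow-exchange titration (all values hypothetical).
rng = np.random.default_rng(2)
n_points, n_steps = 200, 30
ligand = np.linspace(0.0, 10.0, n_steps)   # ligand in units of Kd
bound_fraction = ligand / (1.0 + ligand)   # simple 1:1 hyperbola

x = np.linspace(0.0, 1.0, n_points)

def peak(center, width=0.01):
    """Gaussian line shape on the axis x."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

free = peak(0.30) + peak(0.60)    # peaks of the free state
bound = peak(0.35) + peak(0.70)   # peaks of the bound state
spectra = (np.outer(free, 1.0 - bound_fraction)
           + np.outer(bound, bound_fraction))   # (points x spectra)
spectra = spectra + 0.01 * rng.standard_normal(spectra.shape)

# Mean-center each spectral point across the series, then SVD.
X = spectra - spectra.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

variance_fraction = s**2 / np.sum(s**2)   # values for a scree plot
n_inspect = 10                            # inspect the leading PCs only
smoothness = np.array([lag1_autocorrelation(Vt[k]) for k in range(n_inspect)])
r_isotherm = abs(np.corrcoef(Vt[0], bound_fraction)[0, 1])

for k in range(4):
    print(f"PC{k + 1}: variance {variance_fraction[k]:.3f}, "
          f"lag-1 autocorrelation {smoothness[k]:.2f}")
```

Only the leading PCs are inspected because mean-centering leaves the last singular value near zero, where the autocorrelation is numerically meaningless. In this idealized case both criteria agree; the text's point is that smoothness, together with knowledge of the expected curve shape, remains informative when the variance percentage does not.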
Limits to Applications of PCA to Spectra and Images
We have encountered instances of deterioration or failure of the
improved unfold-PCA algorithm. PCs were corrupted when spectral windows,
signal averaging, management of water suppression, or gain were not
uniform. This is usually overcome by applying SVD to peak pick lists.
SVD of unprocessed FIDs diminished by simulated intermediate exchange
failed to represent the binding isotherms of the titrations (Figure S9F). This is avoided by Fourier transformation.
When SVD is applied to 1D spectra of abnormally low digital resolution,
the accuracy of the binding isotherm deteriorates (Figure S6). However, PCA appears remarkably reliable in representing
at least two processes from a series of 2D measurements.
Potential Applications to Digital Data
Unfold-PCA, improved by the preprocessing steps described, can process many kinds
of series of comparable spectra and images. It makes most sense to
apply it to data that are complex but that respond to one or more
concerted processes, for the purpose of finding the main trends. Macromolecular
NMR and MRI provide good examples. Plotting the course of protein
folding intermediates recorded by expedited NMR spectra[40] is another potential application. Potential
applications may extend to other series of 2D measurements such as
spectra, gels, and imaging of microarrays,[41] chromatographic separations,[42] electrochemistry,[43] and chemical biology signals.[44,45]
Conclusions
The application of this PCA strategy (enhanced by preprocessing)
to a series of spectra or MRI images offers convenience and wide applicability
in characterizing concerted processes. Such applications will expand
the accessibility of affinities, equilibria, kinetics, and time-evolving
processes. This extends to noninterpreted, unassigned, and overlapped
features in spectra and movies, which may reflect two or more concurrent
processes. For example, NMR studies will be able to elucidate binding
isotherms masked by intermediate exchange and/or two or more concurrent
processes.