
Accurate assessment of mass, models and resolution by small-angle scattering.

Abstract

Modern small-angle scattering (SAS) experiments with X-rays or neutrons provide a comprehensive, resolution-limited observation of the thermodynamic state. However, methods for evaluating mass and validating SAS-based models and resolution have been inadequate. Here we define the volume of correlation, Vc, a SAS invariant derived from the scattered intensities that is specific to the structural state of the particle, but independent of concentration and the requirements of a compact, folded particle. We show that Vc defines a ratio, QR, that determines the molecular mass of proteins or RNA ranging from 10 to 1,000 kilodaltons. Furthermore, we propose a statistically robust method for assessing model-data agreements (χ(2)free) akin to cross-validation. Our approach prevents over-fitting of the SAS data and can be used with a newly defined metric, RSAS, for quantitative evaluation of resolution. Together, these metrics (Vc, QR, χ(2)free and RSAS) provide analytical tools for unbiased and accurate macromolecular structural characterizations in solution.

Year: 2013 PMID： 23619693 PMCID： PMC3714217 DOI： 10.1038/nature12070

Source DB: PubMed Journal: Nature ISSN： 0028-0836 Impact factor: 49.962

Achieving reliable, high-throughput structural characterizations of biological macromolecular complexes is a major challenge in the modern structural-genomics era[1]. In principle, small-angle scattering (SAS) with X-rays (SAXS) or neutrons (SANS) can meet this challenge by efficiently providing information that fully describes the structural state of a macromolecule in solution[2-4]. SAS can determine a scattering particle’s radius-of-gyration (Rg), volume (Vp), surface-to-volume ratio and correlation length (lc), with the latter three physical parameters dependent on the Porod invariant[5], Q, an empirical SAS value defined for compact folded particles. Q is unique to a scattering experiment and requires convergence of the SAS data at high scattering vectors (q, Å−1) in a q²⋅I(q) vs. q (Kratky) plot. Convergence defines an enclosed area where the degree of convergence reflects the compacted (bounded area), flexible, or unfolded (unbounded area) solution states (Fig. 1a). Consequently, non-convergence leaves Q undetermined and paradoxically implies Vp and lc are undefined for flexible particles (SI Fig. 1 and Notes). This observation leaves Rg as the only structural parameter that can be reliably derived from SAS data on flexible systems.

Figure 1

Concentration independence and conformational dependence of Vc

(a, b), Experimental SAXS data plotted on a relative scale for glucose isomerase (cyan), 94-nucleotide SAM-1 riboswitch in the absence of Mg2+ (orange) and RAD51AP1, an intrinsically unfolded protein (green). a, Data transformed as the Kratky plot, q² • I(q) vs. q, reveal the parabolic convergence for a folded particle (blue) and divergence for a flexible (orange) or fully unfolded (green) particle. b, Data plotted as q • I(q) vs. q show convergence for both folded and flexible particles. Inset demonstrates convergence for a fully unfolded polymer. c, Concentration independence of Vc for experimental SAXS data. For each of 7 samples, relative difference is calculated as the deviation from the mean normalized to the mean. Concentrations ranged from 0.2 to 3 mg/mL for glucose isomerase (cyan), P4–P6 domain (open red), xylanase (orange), TyMV UUAG TLS RNA (solid black), del8 RNA (open purple), Atu RNase P (open black), SAM-1 riboswitch with Mg2+ and ligand (closed purple), SAM-1 riboswitch in the absence of Mg2+ (solid green). X-axis (Sample Number) refers to the different concentrations for each sample increasing from left to right. d, Correlated changes in Vc (red) and Rg (cyan) for conformations of SAM-1 riboswitch (PDB 2GIS) simulated from molecular dynamics with CNS[28]. Horizontal lines demonstrate for Rg or Vc that a single value can map to multiple conformations. Dual specification of both Rg and Vc reduces multiplicity (vertical bars). Relative change represents the difference calculated from the starting model 2GIS. Asterisks denote the time step of the displayed conformation.

Defining the volume-of-correlation

SAS is uniquely capable of providing structural information on all particle types including flexible systems such as intrinsically unstructured proteins[6, 7]. Here, we overcome current limitations of SAS analyses by deriving a SAS invariant called the volume-of-correlation, Vc. Vc is defined as the ratio of the particle’s zero angle scattering intensity, I(0), to its total scattered intensity (SI Notes). The total scattered intensity is the integrated area of the SAS data[8, 9] transformed as q⋅I(q) vs. q. Unlike the Kratky plot, we observe that the integral of q⋅I(q) vs. q converges for both folded-compact and unfolded-flexible particles (Fig. 1b). The aforementioned ratio, given by Vc = I(0) / ∫ q⋅I(q) dq, reduces to the particle’s volume (Vp) per self-correlation length (lc) with units of Å2. This derivation asserts that Vc, like Rg, can be calculated from a single SAS curve and is concentration independent. We validated concentration independence using well-characterized macromolecules of differing composition and mass. Specifically, for the 173 kDa protein glucose isomerase and the 51 kDa P4–P6 RNA domain from the Tetrahymena group I intron[10], SAXS data collected at 7 concentrations ranging from 0.2 to 3 mg/mL exhibited concentration independence: 86% of the variance was contained within 4% of the mean. Further analysis of 7 additional protein and RNA samples confirmed the concentration independence (Fig. 1c): 65% of the variance was contained within 2% of the mean, suggesting Vc is constant across the concentration ranges for all macromolecular shapes and compositions tested. Vc is defined by the particle’s correlation length and implies that a change in conformation should change Vc (Fig. 1d). We observed this prediction for both the SAM-1 riboswitch[10] and PYR1, a plant hormone binding protein[11]. For these macromolecules, ligand binding decreased both Rg and Vc (Table 1) consistent with reported compaction upon binding[11-13].
Furthermore, we examined Mg2+-dependent structured RNAs for folding by SAXS. Measurements of both the SAM-1 riboswitch and TyMV TLS[14] without Mg2+ displayed the classic hyperbolic feature of a monodisperse multi-conformation Gaussian ensemble in the Kratky plot (SI Fig. 1). As predicted, flexibility in the absence of Mg2+ increased the experimentally determined Vc values (by 14.5% for TyMV TLS and 21% for SAM-1 RNA) compared to their compact Mg2+-folded states (Table 1). Collectively, the observed ligand-dependent changes in Vc for both PYR1 and SAM-1 RNA and the Mg2+-dependent changes in Vc for TyMV TLS and SAM-1 RNA assert that Vc is an informative descriptor of the macromolecular state.

Table 1

Condition-dependent changes in SAXS invariants

Macromolecule	V_c(Å²)	R_g(Å)	V_p(Å³)	SAXS mass(kDa)
SAM-1 (bound) : mixture	460 (± 2)	34.4 (± 0.3)	80,000	50.3
SAM-1 (free) : mixture	407 (± 2)	31.0 (± 0.2)	76,000	44.9
SAM-1 (bound)	280 (± 4)	22.8 (± 0.4)	40,000	31.4
SAM-1 (free)	295 (± 4)	24.7 (± 0.7)	48,000	32.0
SAM-1 (−) Mg²⁺	339 (± 12)	31.6 (± 1.0)	n.d.	32.8
P4P6 RNA domain : mixture	478 (± 1.0)	31.0 (± 0.1)	105,000	58.2
P4P6 RNA domain	414 (± 5)	29.4 (± 0.2)	73,000	50.8
PYR1 (bound)	319 (± 0.5)	20.6 (± 0.9)	59,000	41.9
PYR1 (free)	343 (± 8)	23.2 (± 0.8)	74,000	40.2
TyMV (+) Mg²⁺	324 (± 2)	25.9 (± 0.1)	49,000	35.9
TyMV (−) Mg²⁺	371 (± 1)	29.9 (± 0.1)	n.d.	39.8

• Vp denotes the particle’s Porod volume.

• n.d. denotes “not determined”.

• ‘mixture’ refers to non-gel filtration purified samples containing mis-folded RNA.

• Uncertainties are the standard deviation of 4 to 8 independent SAXS datasets.

Particle mass determination by Qr

Accurate determination of molecular mass has been a major difficulty in SAS analysis. Existing methods require an accurate particle concentration, the assumption of a compact near-spherical shape, or SAXS measurements on an absolute scale[15-18]. As these requirements hinder both accuracy and throughput of mass estimates by SAS, we sought to establish a SAS-based statistic suitable for determining the molecular mass of proteins, nucleic acids or mixed complexes in solution without concentration or shape assumptions. We calculated Rg and Vc from simulated SAXS profiles for 9,446 protein structures from the Protein Data Bank (PDB)[19], ranging in molecular weight from 8 to 400 kDa. We discovered that a parameter, Qr, defined as the ratio of the square of Vc to Rg with units of Å³, is linear versus molecular mass in a log-log plot (Fig. 2, 3 and SI Fig. 2). The linear relationship is a power-law relationship, given by Qr = c⋅(mass)^k, that determines the empirical mass of the scattering biological particle, allowing for the direct assessment of oligomeric state and sample quality. Parameters k and c are empirically determined and specific to the class of macromolecular particle (SI Fig. 3).
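As a minimal sketch of how the power-law relationship might be applied, the Python snippet below computes Qr from Vc and Rg and inverts Qr = c⋅(mass)^k for the mass. The function names are hypothetical, and the values of c and k in the example are placeholders chosen only for illustration; the fitted, class-specific parameters are those reported with Fig. 3 and are not reproduced here.

```python
def q_r(vc, rg):
    """Qr = Vc^2 / Rg, in Å^3 when Vc is in Å^2 and Rg is in Å."""
    return vc ** 2 / rg


def mass_from_qr(qr, c, k):
    """Invert the power law Qr = c * mass**k to obtain the mass."""
    return (qr / c) ** (1.0 / k)


# Vc and Rg for the ligand-bound SAM-1 riboswitch, taken from Table 1.
qr_sam1 = q_r(280.0, 22.8)

# PLACEHOLDER parameters (assumptions for illustration only,
# NOT the fitted values for protein or RNA).
c_demo, k_demo = 0.1, 1.0
mass_demo = mass_from_qr(qr_sam1, c_demo, k_demo)
print(f"Qr = {qr_sam1:.0f} Å^3; mass with placeholder c, k = {mass_demo:.0f}")
```

Because c and k depend on the class of particle, a real calculation would substitute the parameters fitted for protein-only or RNA-only samples.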

Figure 2

Defining the power-law relationship between Vc, Rg and protein mass

Vc and Rg were determined from theoretical atomic X-ray scattering profiles for 9,446 protein PDB[20] structures. For each profile, SAXS data were simulated to a maximum q = 0.5 Å−1 (~13 Å). Various ratios of Vc and Rg against protein mass were examined in a log-log plot. The linear relationship observed for the ratio Vc² • Rg⁻¹ (black) suggests a power law relationship exists between the ratio and particle mass of the form ratio = c • (mass)^k. The ratio, Vc² • Rg⁻¹, is defined by units of Å³ with mass in Daltons. Additional ratios examined (green, cyan, gray and red) displayed asymmetric non-linear relationships. In green, the fit included m (0.9246 ± 0.0008) and n (1.892 ± 0.0005) in a non-linear surface optimization with an average mass error of 4.9 ± 4.3%. Fitting the linear power-law relationship (black) produces an average mass error of 4.0 ± 3.6%. Truncation of the data to q = 0.3 Å−1 (~21 Å resolution) increases the mass error by 0.6% (Supplementary Fig. 2).

Figure 3

Power-law relationship between Qr and particle mass (MW) allows direct mass determination

a, Qr calculated from previously reported experimental SAXS data for protein only samples (Supplementary Table 1). Gel-filtration purified samples (orange) were plotted with experimental data taken from BioIsis.net (open circles). b, Qr calculated from experimental SAXS data for RNA only samples (blue) (Supplementary Table 2). Final equations in a and b can be used for mass determination of protein or RNA only samples. Due to a lack of available SAXS data for protein-nucleic acid complexes, parameters for k and c remain undetermined.

Vc and Rg are both contrast and concentration independent, thus the determination of molecular mass using Qr can be made from SAXS data collected under diverse buffer conditions and concentrations, albeit free of interparticle interference. In fact, this linear relationship produced an average mass error < 4% for the 9,446 proteins in the in vacuo simulated dataset (Fig. 2). Calculations of Qr from simulated and experimental (SI Tables 1 and 2) buffer-subtracted SAXS data of proteins, mixed protein-nucleic acid complexes or RNA alone (Fig. 3a, b) further verified the power-law relationship between Qr and mass. The mass errors for protein and RNA gel-filtration purified SAXS samples were 9.7 and 4.6%, respectively. Furthermore, for RNAs that were measured under folded and unfolded conditions, the average mass difference was 5.6%. The empirically determined mass power-law parameters (Fig. 3) are specific to macromolecular composition and analogous to empirical refractive index increments in light scattering studies[20]. Moreover, Qr, as a mass estimator, assesses SAXS data quality for modeling. For heterogeneous samples, neither Rg nor Vc alone can reliably suggest a corrupted sample. Applying Qr to P4P6 and SAM-1 RNA samples with known contaminants[10] (Table 1) shows that having 5 and 15% contaminants results in a 14 and 60% mass error, respectively, suggesting ab initio density models would not accurately represent the assumed homogenous solution state.
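The quoted mass errors for the contaminated samples can be checked against Table 1 with a few lines of Python. This is only an illustrative sketch: it assumes the reference mass is the SAXS mass of the corresponding gel-filtration purified sample, which may differ from the reference the authors used, and the function name is hypothetical.

```python
def percent_mass_error(observed_kda, reference_kda):
    """Relative deviation of an observed mass from a reference mass, in percent."""
    return 100.0 * abs(observed_kda - reference_kda) / reference_kda


# SAXS masses (kDa) from Table 1: 'mixture' row versus purified row.
p4p6_error = percent_mass_error(58.2, 50.8)   # P4P6 RNA domain
sam1_error = percent_mass_error(50.3, 31.4)   # SAM-1 (bound)

print(f"P4P6 mixture: {p4p6_error:.1f}%")
print(f"SAM-1 (bound) mixture: {sam1_error:.1f}%")
```

Under this assumption the two values come out near 14.6% and 60.2%, in line with the 14 and 60% quoted in the text.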

Cross-validating SAS model-data agreements

Atomistic modeling of SAS data relies on the reduced chi-square (chi) error-weighted scoring function[21, 22] that can be unreliable with moderately noisy datasets or over-estimated degrees-of-freedom (SI Fig. 4 and 5). This can lead to over-fitting and model misidentification. In crystallographic and NMR analyses, cross-validation statistical methods mitigate over-fitting and increase confidence in selected model(s)[23, 24]. Here, we present an analogous robust statistical method based on the Nyquist-Shannon sampling and the noisy-channel coding theorems (SI Notes) for evaluating structural models against SAS data. For a given maximum dimension (dmax), the sampling theorem[9] determines that the number of unique, evenly distributed observations, n, required to represent a particle to a maximum scattering vector (qmax) is given by n = (dmax⋅qmax)/π. For example, for SAS data to a qmax of 0.3 Å−1, the minimum numbers of observations for xylanase (dmax 44 Å) and the 30S ribosomal particle (dmax 240 Å) are 4 and 23, respectively. This represents a ~20- to 125-fold over-sampling of a SAS curve composed of 500 observations. The Nyquist-Shannon limit (n) is the set of maximally independent observations from the band-limited SAS curve (SI Fig. 7). We reasoned that calculating chi from a dataset reduced to n should more accurately assess the model-data agreement by restricting chi evaluations to the set of independent random variables (SI Notes). Due to over-sampling and the uncertainties in q, I(q) and dmax, determining the exact set of Nyquist-Shannon points will be difficult. Nevertheless, application of the noisy-channel coding theorem guarantees noise-free recovery of the SAS signal (SI Notes, Fig. 8 and 9); therefore, we propose the following sampling procedure for estimating chi that partitions a SAS dataset into n equal bins for a given dmax. A randomly sampled data point is taken from each bin, creating an n-length data vector that is used in chi.
To minimize outlier influence, chi is taken as the median over k sampling rounds (typically k = 1001), yielding a statistic we call χ²free, abbreviated X (or X2) hereafter. Analogous to the crystallographic Rfree, X uses a cross-validation scheme that excludes data from each bin during a round. This technique is akin to the robust least-trimmed squares method[25] and provides resistance to outliers, preventing over-fitting and the misidentification of models[26, 27].
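The Nyquist-Shannon count for the two examples above can be reproduced with a short Python sketch. The function name is hypothetical, and rounding n to the nearest integer is an assumption made here for illustration.

```python
import math


def shannon_points(dmax, qmax):
    """Number of independent observations, n = dmax * qmax / pi (dmax in Å, qmax in Å^-1)."""
    return dmax * qmax / math.pi


for name, dmax in [("xylanase", 44.0), ("30S ribosomal particle", 240.0)]:
    n = round(shannon_points(dmax, 0.3))  # rounding is an assumption
    oversampling = 500 / n                # for a curve of 500 observations
    print(f"{name}: n = {n}, over-sampling ~ {oversampling:.0f}-fold")
```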

Resisting over-fitting with X

We tested X on SAXS data for xylanase at pH 7.2 (Fig. 4a). Based on the fit to the crystallographic structure (PDB 1REF, chi = 3.9), SAXS data implies an alternate conformation in solution. Using 1REF as a reference structure, 1,600 conformations were generated and used in a conventional all data chi determination. ~7% of the models produced chi < 1 suggesting data over-fitting with the best model (chi = 1.0; Fig. 4a) showing a clear bias in the high q-region. Using X, no model was identified with a X < 1 and the best model (X = 1.39) demonstrated improved fitting in the high q-region, showing X distinguishes subtle conformational states. By minimizing on the median n-limited chi, X more accurately determines the true model-data agreement and is not prone to over-fitting (SI Fig. 5).

Figure 4

Objective, quantitative evaluation of models using the least median χ2 (X2)

a, Selection of the best PDB model from a pool of 1,600 conformations generated using CONCOORD[29]. The best selected model (model 44 of 1600) from CRYSOL (red) with a conventional χ2 = 1 demonstrates a bias in the high q-region of the residuals whereas the best selected model (model 560 of 1600) using X2 (cyan) displays an even distribution throughout the residuals with X2 = 1.39. The bias within the high q region (0.18 Å−1 < q < 0.24 Å−1) implies a conformational difference between the data (red) and target model due to over-fitting. The resistance to over-fitting by X2 enables the identification of different “best” models. b, Effects of noise on χ-values from X2 (cyan) and conventional χ (red) calculations. Varying empirical noise levels were transposed onto a simulated SAXS profile of a randomly selected xylanase model generated by CONCOORD. A specified noise level represents the average noise in the last third of the q-range in a. Conventional χ (red) is unstable and directly influenced by outliers producing erroneous χ-values whereas X2 is resistant and stable to noise (black line). Erroneous χ-values will increase the false-negative rate for an experiment. c, Distribution of χ-values determined from the set of models with an r.m.s.d < 1.5 at 19% noise. 30 randomly selected targets were fitted against 500 simulated SAXS curves at 19% noise from a pool of CONCOORD generated xylanase conformations. (Inset) Distribution of r.m.s.d for all models with a X2 < 1.5. At higher noise, X2 (cyan) produces narrower χ-value distributions than conventional χ (red) for near native conformations, thus reducing overall false negative rate.

To test how resistant X is to noise, we simulated noisy xylanase SAXS datasets using empirical noise from reference datasets and evaluated how well conventional chi2 and X can identify the true model from a set of randomly perturbed structures. Under low noise (≤ 12%), both X and conventional chi behave similarly. At higher noise levels, conventional chi becomes unstable, such that true models would be erroneously rejected. In contrast, X values were stable over the tested noise levels and effective at identifying matches (Fig. 4b). More importantly, for near-native conformations of the target (root-mean-square difference, r.m.s.d < 1.5), conventional chi values are widely distributed with nearly half greater than 2 (Fig. 4c). For X, the distribution is narrower suggesting near native conformations are better identified with fewer false negatives.

Validating model-data resolution limits

Determining resolution limits of model-data agreements cannot be achieved by chi alone and requires a metric we define as Rsas incorporating residuals between modeled and experimental values for both Rg and Vc given by: Rsas is a difference distance metric determined from the set of Q-independent SAS invariants. Calculation of Rsas at varying resolutions provides an objective basis to determine appropriate resolution limits for data-model agreements. For dilute xylanase (SI Fig. 4a, 4b), data were collected to a maximum q = 0.5 Å−1 (~13 Å resolution) and fit to PDB 1REF with a chi of 1.3 suggesting an acceptable data-model agreement. However, inspection of Rsas and X (20.3 and 1.8, respectively) reveal low agreement. Truncating the SAS data shows a significant decrease in Rsas with X increasing initially then decreasing as the data-model agreement improves (SI Fig 4b). Convergence of Rsas towards zero with a X ≤ 1.5 implies the limit of the data-model agreement to be q ≃ 0.2 Å−1 or a resolution of 31 Å. The combination of Rsas and X2, for a given model, provides a quantitative and graphical approach for determining the acceptable resolution between the data and model (SI Fig. 4b and 5). As SAXS data is often used to filter a large set of conformationally distinct models, the models themselves may not be capable of describing the SAXS data to high resolution; therefore, application of Rsas and X2 may provide the useful resolution of the data-model agreement. Nevertheless, as done recently for crystallography[27], a functional definition of resolution can come from the noisy-channel coding theorem. Here, the useful resolution of the data will be asserted by the highest Nyquist-Shannon point supported by the data.

Perspective

The SAS invariant Vc extends analysis to flexible biopolymers in solution. The volume-per-correlation length, like Rg, faithfully informs on the conformational state of the particle and can be calculated for models determined by other structural techniques including electron microscopy, X-ray crystallography, NMR and SANS. Vc provides a unique descriptor of the scattering experiment that is broadly applicable. We expect that Vc may further characterize voids in materials such as bone, polymeric beads or nano-materials. As the ratio of the square of Vc to Rg defines a mass parameter, Qr, SAS experiments can now inform on particle mass without requiring compactness and instrument calibration. Furthermore, X is a robust statistical metric that we envision will enable cross-validated determination of flexible ensembles against observed SAXS data. We anticipate that Vc, Qr, X, and Rsas will efficiently and objectively aid characterization of flexible macromolecules, check sample quality, determine mass and assembly states, detect concentration-dependent scattering, reduce model misidentification and over-fitting, and assess resolution for model to data agreement.

Methods

X2 Calculation

For a given dmax, the SAXS/SANS data collected between qmin and qmax can be divided into n equal bins where n is determined by the Nyquist-Shannon sampling theorem[9]. Here, dmax is measured from the atomistic model; however, dmax can be directly inferred using an indirect Fourier transform method such as GNOM. In the case of 500 data points, and n = 10, each bin will contain 50 data points such that a single randomly selected datapoint will represent that Nyquist-Shannon point. Since a selected data point may be biased by interparticle interference or uncertainties in q or I(q), the selection of the representative datapoint from the Nyquist-Shannon bin must occur through several selection rounds (k). During each round, the set of randomly selected points comprises the test set for calculating chi2 against the model. The accepted value is taken as the median over k rounds. The number of rounds, k, will vary with the average noise level of the SAXS/SANS dataset. The probability of selecting an erroneous datapoint from a bin scales directly with the noise. We have found for high quality data (< 10% noise), k can be as small as a few hundred whereas for high noise data, k should be 2000 to a maximum of 3000.
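The sampling procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the model intensities are already scaled to the data, uses equal-width bins in q, and takes the per-round score as the error-weighted sum of squares divided by the number of sampled points (no degrees-of-freedom correction). The function name and arguments are hypothetical.

```python
import math
import random
import statistics


def chi2_free(q, i_obs, sigma, i_model, dmax, rounds=1001, seed=None):
    """Median chi-square over `rounds` random draws of one point per Shannon bin."""
    rng = random.Random(seed)
    qmin, qmax = q[0], q[-1]
    n = max(1, round(dmax * qmax / math.pi))   # number of Nyquist-Shannon bins
    width = (qmax - qmin) / n

    # Assign every data point (by index) to one of n equal-width bins.
    bins = [[] for _ in range(n)]
    for idx, qi in enumerate(q):
        b = min(int((qi - qmin) / width), n - 1)
        bins[b].append(idx)
    bins = [b for b in bins if b]              # skip empty bins, if any

    scores = []
    for _ in range(rounds):
        picks = [rng.choice(b) for b in bins]  # one random point per bin
        chi2 = sum(((i_obs[j] - i_model[j]) / sigma[j]) ** 2 for j in picks)
        scores.append(chi2 / len(picks))
    return statistics.median(scores)


# Synthetic check: a model offset from the data by exactly one sigma everywhere.
q = [0.01 + 0.001 * j for j in range(300)]
i_obs = [math.exp(-(qi * 20.0) ** 2 / 3.0) for qi in q]
sigma = [0.05 * v for v in i_obs]
i_model = [v + s for v, s in zip(i_obs, sigma)]
print(chi2_free(q, i_obs, sigma, i_model, dmax=44.0, rounds=201, seed=1))
```

Taking the median rather than the mean means a round that happens to draw an outlying data point has limited influence on the reported value.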

Sample preparation

Protein and RNA samples were derived from a variety of sources. For glucose isomerase and xylanase, protein samples were obtained as suspended crystals (Hampton Research). Each protein was further purified by gel-filtration chromatography immediately before SAXS data collection in buffer containing either (A) 20 mM HEPES pH 7.2, 5 mM MgCl2, 100 mM KCl and 2 mM TCEP, (B) 40 mM MES pH 6.8, 8 mM MgCl2, and 100 mM KCl, or (C) 40 mM NaCitrate pH 5.0, 75 mM KCl and 1% glycerol. Proteins were resuspended by a 50-fold dilution of the crystals in buffers A or B for glucose isomerase and buffers A, B or C for xylanase. Diluted crystals were incubated at 37 °C on a nutator for 1 hour, concentrated to 10 mg/ml and injected on a pre-equilibrated Superdex 200 PC 3.2 column (GE Healthcare) for glucose isomerase and Superdex 75 PC 3.2 column (GE Healthcare) for xylanase. Fractions corresponding to peak elution were taken for SAXS and quantitated by absorbance at 280 nm. TAQ polymerase was recombinantly expressed and purified from Escherichia coli using cells transformed with pET vector conferring ampicillin resistance. Cells were grown at 37 °C, induced for 4 hours with IPTG at 0.8 OD260 before harvesting. Cells were lysed as described[30]. Lysate was clarified by low-speed spin in 50 mL falcon tubes and incubated at 65 °C for 20 minutes. Lysate was further clarified by high speed centrifugation at 20,000 × g for 40 minutes at 4 °C. Bound nucleic acids were removed by PEI treatment and ammonium sulfate precipitation. Protein was resuspended in buffer B and further purified to homogeneity using Superdex 200 HR 10/30 (GE Healthcare) for SAXS analysis. Catalase (human erythrocyte) was purchased from a commercial source (EMD). 1 mg was resuspended in 100 uL of buffer A and further purified using a Superose 6 PC 3.2 column (GE Healthcare) equilibrated in buffer A. Fraction corresponding to peak elution was taken for SAXS analysis. 
Thermosome from Sulfolobus solfataricus was purified from source and kindly provided by Steve Yannone (Lawrence Berkeley National Laboratory). Thermosome samples were prepared by purification on a Superose 6 HR 10/30 column in buffer equilibrated with 40 mM pH 5.5, 75 mM KCl, 75 mM NaCl, 5 mM MgCl2, and 2 mM TCEP. Fraction corresponding to peak elution was taken for SAXS analysis. Data for full-length and truncated TBL1 were kindly provided by Yoana Dimitrova and Walter Chazin (Vanderbilt University). Data for p65 were kindly provided by Andrea Berman and Tom Cech (University of Colorado at Boulder). Data for PYR1 samples were kindly provided by Kenichi Hitomi and Elizabeth Getzoff and purified as described[11]. Samples were purified and analyzed onsite by gel-filtration and MALS immediately before SAXS analysis.

Multi-angle light scattering (MALS)

Multi-angle light scattering (MALS) studies were performed inline with size-exclusion chromatography on protein and RNA samples to assess monodispersity and mass of the SAXS samples using an 18-angle DAWN HELEOS light scattering (LS) detector in which detector 12 was replaced with a DynaPro quasi-elastic light scattering detector (Wyatt Technology). Simultaneous concentration measurements were made with an Optilab rEX refractive index detector (Wyatt Technology) connected in tandem to the LS detector. For each buffer used, the MALS system was calibrated with BSA at 10 mg/mL to determine delay times and band broadening. For proteins, BSA, xylanase and glucose isomerase provided an additional calibration of the refractive index increment for protein samples. For RNA samples, the refractive index increment was determined from P4–P6 RNA samples[10, 30]. MALS analyses were performed on all the RNAs (except tRNAphe) in this study and a set of proteins comprising glucose isomerase, xylanase, thermosome, catalase, TBL1, PYR1, and p65 (Table S1 and S2).

PDB query

The Protein Data Bank (PDB) was used as a source for structural models for SAXS simulations. The comprehensive protein dataset was selected based on the following criteria: molecular mass range (10 to 1200 kDa), technique (X-ray crystallography), resolution limits (1.8 to 3.2 Å), exclude 90% similarity, protein only, and single models with 1 to 2 chains in the asymmetric unit. Further manual curation was performed for structures where the asymmetric unit produced two models physically separated in space without crystal contacts. For the RNA only datasets, the following criteria were used: RNA only, molecular mass range 10 to 250 kDa, exclude 95% similarity, technique (X-ray crystallography) and single model. Finally, for mixed protein-nucleic acid complexes, the following criteria were used: molecular mass range 8 to 1000 kDa, technique (X-ray crystallography), protein and RNA, protein and DNA, 95% similarity and single model.

SAXS data collection

SAXS data were collected at beamline 12.3.1 of the Advanced Light Source at the Lawrence Berkeley National Laboratory[2]. SAXS data were collected as a 2/3rds dilution series using 20 uL samples and three different exposures. Exposures generally follow a short, medium and long time consisting of 0.1, 1 and 6 seconds or 0.5, 1 and 8 seconds and were merged as described[10]. Samples after gel-filtration purification eluted within the range of 1.5 and 3 mg/mL and for each sample, buffer was collected from the gel-filtration column after 1.2 column volumes for corresponding matching SAXS buffers. For each sample, aggregation and interparticle interference were assessed using overlay plots of the concentration series in Gnuplot (http://www.gnuplot.org). Fits to the Guinier region (q⋅Rg < 1.3) were performed with software at beamline 12.3.1 (Robert Rambo, Lawrence Berkeley National Lab) and all data graphs were prepared with Kaleidagraph (http://www.synergy.com) and gnuplot. Figures with structural models were prepared with VMD and rendered with Povray (http://www.povray.org).

SAXS data analysis

For each SAXS dataset used in this study, linear fits to the Guinier region were performed with ruby scripts, rubyGSL (by Yoshiki Tsunesada) and the GNU Scientific Library (http://www.gnu.org/software/gsl/) for the determination of Rg and I(0). The Guinier parameters were subsequently used to calculate an extrapolated scattering dataset to zero angle at intervals determined from the average scattering vector increment, Δq. Based on an extrapolated dataset, Vc was calculated by dividing the Guinier I(0) by the area of the transformed intensity taken as the product of q·I(q) and integrating using the trapezoid rule. For simulated atomic SAXS profiles, extrapolation was not necessary. Simulated atomic SAXS profiles were calculated with FOXS as it can calculate scattering profiles at specified scattering vector increments consistent with experimental measurements whereas CRYSOL (without an input SAXS dataset) can only calculate a maximum of 256 scattering intensities at a specified maximum scattering vector. Typical datasets collected at a maximum q of 0.32 Å−1 at beamline 12.3.1 produce ~500 data points with the beamstop centered in the middle of the detector. Visual comparison of atomic SAXS profiles from FOXS with CRYSOL did not illustrate any systematic differences. For experimental SAXS datasets that were fit to an input PDB model, CRYSOL was used with default input parameters. In these cases, CRYSOL reports chi and not chi-square for the model fits in the output log file.
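The Vc calculation described above (Guinier extrapolation to zero angle followed by trapezoid integration of q·I(q)) can be sketched in Python. This is an illustrative reimplementation under stated assumptions, not the authors' Ruby code: I(0) and Rg are assumed to come from a prior Guinier fit, the function name is hypothetical, and the test curve is a pure Guinier profile chosen because its Vc is known analytically (2⋅Rg²/3), not because real particles follow it over the full q-range.

```python
import math


def volume_of_correlation(q, intensity, i0, rg):
    """Vc = I(0) / integral of q*I(q) dq, with Guinier extrapolation below q[0]."""
    dq = (q[-1] - q[0]) / (len(q) - 1)               # average increment, Δq
    n_low = int(math.ceil(q[0] / dq))
    q_low = [j * dq for j in range(n_low) if j * dq < q[0]]
    i_low = [i0 * math.exp(-(x * rg) ** 2 / 3.0) for x in q_low]

    qq = q_low + list(q)
    ii = i_low + list(intensity)

    # Trapezoid rule on the transformed intensity q*I(q).
    area = sum(
        0.5 * (qq[j + 1] - qq[j]) * (qq[j] * ii[j] + qq[j + 1] * ii[j + 1])
        for j in range(len(qq) - 1)
    )
    return i0 / area


# Synthetic Guinier-only curve: I(0) = 100, Rg = 20 Å, q from 0.01 to 0.5 Å^-1.
rg_demo, i0_demo = 20.0, 100.0
q_demo = [0.01 + 0.001 * j for j in range(491)]
i_demo = [i0_demo * math.exp(-(x * rg_demo) ** 2 / 3.0) for x in q_demo]
vc_demo = volume_of_correlation(q_demo, i_demo, i0_demo, rg_demo)
print(f"Vc = {vc_demo:.1f} Å^2 (analytical value for this test curve: {2 * rg_demo ** 2 / 3:.1f})")
```

Because I(0) appears in both the numerator and the integrand, rescaling the intensities leaves Vc unchanged, which is the basis of the concentration independence discussed earlier.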

Conformational Simulation

SAM-1 riboswitch molecular dynamics simulations were performed with CNS as described[13]. Briefly, the SAM crystal structure (PDB: 2GIS) was analyzed with FIRST and FRODA[31] at several energy cut-offs to determine plausible rigid and flexible regions within the structure. These were used to ascribe constraints within the structure for molecular dynamic simulations with CNS using anneal.inp. The CNS input file was modified to remove the electrical potential from the energy function and calculations were performed as torsional angle dynamics only. For each simulation, 2000 steps were recorded in the trajectory file and each step was written to file as a PDB. CONCOORD simulations with 1REF were performed with the following command line argument: to generate 1000 possible conformations close to the starting input structure. The resulting PDB files were fit to the experimental SAXS dataset with CRYSOL and the output intensity file for each PDB conformation was used to calculate Vc.

Simulating Noisy SAXS Datasets

SAS intensities over a single exposure will range over several decades and consequently, the noise levels will vary throughout the measured q-region. Therefore, we used intensity uncertainties from previously collected SAXS experiments as a source of realistic noise for the simulated SAXS datasets. The noise level of the empirical SAXS curve is reported as the average relative noise in the last third of the observed q-range (Fig. 4). For a selected q, the simulated I(q) was randomly displaced based on a random draw using the Box-Muller transform of a standard Gaussian distribution parameterized by the empirical intensity, I(q)_obs, and uncertainty, error(q)_obs. The Box-Muller transform returns two possible values and a random binary selection was used to provide a final single value for the displacement of the simulated I(q), I(q)_displaced. The simulated error(q) was reported as I(q)_displaced * error(q)_obs/I(q)_obs.
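One way to read the noise procedure above is sketched below in Python. The text does not fully specify how the Gaussian draw is scaled, so this sketch assumes the simulated intensity is displaced by a standard-normal deviate times the empirical relative uncertainty, error(q)_obs/I(q)_obs, applied to the simulated I(q). Function names are hypothetical, and the last-third noise level is taken by data index, which assumes evenly spaced q.

```python
import math
import random


def box_muller(rng):
    """Return the two standard-normal deviates of the Box-Muller transform."""
    u1 = 1.0 - rng.random()          # in (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)


def add_empirical_noise(i_sim, i_obs, err_obs, seed=None):
    """Displace simulated intensities using empirical relative uncertainties."""
    rng = random.Random(seed)
    displaced, errors = [], []
    for sim, obs, err in zip(i_sim, i_obs, err_obs):
        rel = err / obs                           # empirical relative noise
        z = box_muller(rng)[rng.randrange(2)]     # random binary selection
        value = sim + z * rel * sim               # ASSUMED scaling of the draw
        displaced.append(value)
        errors.append(value * rel)                # I_displaced * error_obs / I_obs
    return displaced, errors


def noise_level(i_obs, err_obs):
    """Average relative noise over the last third of the data points."""
    start = (2 * len(i_obs)) // 3
    ratios = [e / i for i, e in zip(i_obs[start:], err_obs[start:])]
    return sum(ratios) / len(ratios)


i_sim = [100.0, 50.0, 10.0, 5.0, 1.0, 0.5]
i_obs = [90.0, 45.0, 9.0, 4.5, 0.9, 0.45]
err_obs = [0.9, 0.9, 0.45, 0.45, 0.18, 0.09]
noisy, noisy_err = add_empirical_noise(i_sim, i_obs, err_obs, seed=42)
print(f"noise level: {100 * noise_level(i_obs, err_obs):.0f}%")
```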

22 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Comments on the NIGMS PSI.

Authors: Stephen C Harrison
Journal: Structure Date: 2007-11 Impact factor: 5.006

3. Analyzing the flexibility of RNA structures by constraint counting.

Authors: Simone Fulle; Holger Gohlke
Journal: Biophys J Date: 2008-02-15 Impact factor: 4.033

Review 4. X-ray solution scattering (SAXS) combined with crystallography and computation: defining accurate macromolecular structures, conformations and assemblies in solution.

Authors: Christopher D Putnam; Michal Hammel; Greg L Hura; John A Tainer
Journal: Q Rev Biophys Date: 2007-08 Impact factor: 5.318

5. Free R value: a novel statistical quantity for assessing the accuracy of crystal structures.

Authors: A T Brünger
Journal: Nature Date: 1992-01-30 Impact factor: 49.962

6. Crystallography & NMR system: A new software suite for macromolecular structure determination.

Authors: A T Brünger; P D Adams; G M Clore; W L DeLano; P Gros; R W Grosse-Kunstleve; J S Jiang; J Kuszewski; M Nilges; N S Pannu; R J Read; L M Rice; T Simonson; G L Warren
Journal: Acta Crystallogr D Biol Crystallogr Date: 1998-09-01

7. Prediction of protein conformational freedom from distance constraints.

Authors: B L de Groot; D M van Aalten; R M Scheek; A Amadei; G Vriend; H J Berendsen
Journal: Proteins Date: 1997-10

8. Linking crystallographic model and data quality.

Authors: P Andrew Karplus; Kay Diederichs
Journal: Science Date: 2012-05-25 Impact factor: 47.728

9. Assessing the quality of solution nuclear magnetic resonance structures by complete cross-validation.

Authors: A T Brünger; G M Clore; A M Gronenborn; R Saffrich; M Nilges
Journal: Science Date: 1993-07-16 Impact factor: 47.728

10. Probing counterion modulated repulsion and attraction between nucleic acid duplexes in solution.

Authors: Yu Bai; Rhiju Das; Ian S Millett; Daniel Herschlag; Sebastian Doniach
Journal: Proc Natl Acad Sci U S A Date: 2005-01-12 Impact factor: 11.205
