ARIAweb: a server for automated NMR structure calculation.

Fabrice Allain¹, Fabien Mareuil², Hervé Ménager², Michael Nilges¹, Benjamin Bardiaux¹.

Abstract

Nuclear magnetic resonance (NMR) spectroscopy is a method of choice to study the dynamics and determine the atomic structure of macromolecules in solution. The standalone program ARIA (Ambiguous Restraints for Iterative Assignment) for automated assignment of nuclear Overhauser enhancement (NOE) data and structure calculation is well established in the NMR community. To ultimately provide a fully transparent and easy-to-use service, we designed an online user interface to ARIA with additional functionalities. Data conversion, structure calculation setup and execution, followed by interactive visualization of the generated 3D structures, are all integrated in ARIAweb and freely accessible at https://ariaweb.pasteur.fr.

Year: 2020 PMID: 32383755 PMCID: PMC7319541 DOI: 10.1093/nar/gkaa362

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein structure determination is crucial for understanding protein function, as it paves the way to the discovery of new drugs and of new approaches to control pathological biological processes. Recent advances in structural biology now allow structural information to be collected from a variety of techniques at various resolutions (1). For decades, X-ray crystallography was the method of choice to determine structures of macromolecules at atomic resolution. Owing to the latest technical revolution in cryo-electron microscopy (cryo-EM), structures of large complexes can now be readily obtained at very high resolution without the need to obtain crystalline samples. Alongside these methods, nuclear magnetic resonance (NMR) spectroscopy is the only technique that allows the structure and dynamics of macromolecules to be determined in solution (2). Yet, high-resolution NMR is still limited by the size and solubility of the macromolecule as well as by the need to obtain isotopically labeled samples in sufficient amounts. Three-dimensional (3D) structure determination by NMR relies on efficient and reliable assignment strategies (3). Specifically, interpretation of nuclear Overhauser effects (NOEs) as structural data is often ambiguous due to chemical shift degeneracy and spectral overlap. Manually assigning NOE cross-peaks to the corresponding atoms in order to derive distance restraints is a tedious process that generally depends on examination, possibly subjective, by a trained expert. To automate this process and improve objectivity, software packages such as ARIA (4), CYANA (5), UNIO (6) or ASDP (7) are being actively developed. The ARIA (Ambiguous Restraints for Iterative Assignment) software package automates the treatment of NMR data and calculation of 3D structures (4,8). In an iterative manner, ARIA automatically assigns NOE cross-peaks and converts them into distance restraints in order to compute structure ensembles by molecular dynamics simulation (9).
In the last round of the CASD-NMR (Critical Assessment of Automated Structure Determination by NMR) challenge (10), ARIA consistently produced accurate structures with a mean backbone RMSD of 1.1 Å for 10 targets (11). Although ARIA is well established in the structural NMR community, its usage can be hindered by the complexity of its installation and the sophistication of the configuration and execution tasks. Although it is parallelizable, the iterative protocol in ARIA requires access to a computer cluster to reduce execution time. To improve the user experience for ARIA and provide broader access to automated NMR structure calculation procedures, we have developed the ARIAweb server, with the aim of guiding users (from novice to expert) through the different stages required for NMR structure determination and providing an online service for computationally intensive calculations. Various software packages are used by the NMR community to process and analyze NMR data prior to engaging in structure calculation, each having its own proprietary file formats, which need to be converted into the internal ARIA format. To circumvent this issue, we propose an online data conversion tool for which there is no equivalent user interface in the standalone ARIA package. Finally, outputs from an ARIA calculation, such as 3D structures and NMR restraint statistics, are displayed with graphical and interactive representations to ease inspection and interpretation of the results in terms of reliability and quality.

METHODS

ARIA protocol

ARIAweb implements the latest release of ARIA version 2.3 (4,12) in combination with CNS version 1.21 (13,14) modified with dedicated ARIA routines. The ARIA protocol matches NOE cross-peaks and assigned NMR resonances to convert them into ambiguous distance restraints that are subsequently used to calculate structures by means of molecular dynamics simulated annealing (9). Considering data and structure consistency, these ambiguous distance restraints can be further refined by iteratively detecting unsatisfied restraints and generating new conformers which better explain the input NMR data. This approach is fully automated in ARIA 2.3. In short, the molecular sequence, NMR cross-peak and chemical-shift lists and other known restraints are supplied to ARIA for automated NOE assignment and structure calculation (Figure 1). ARIA 2.3 uses an iterative protocol where NOE cross-peaks are partially assigned, calibrated to obtain distance estimations and analyzed for restraint violations from the structure ensemble calculated in the previous iteration. This produces a new distance restraint set for the next iteration. After all iterations (typically 9) are completed, the lowest energy conformers are refined in a shell of solvent molecules (15). The final ensemble of conformers is validated with PROCHECK (16), WHATCHECK (17) and MolProbity (18). ARIA version 2.3 integrates all the latest developments for objective and reliable NMR structure determination: a bounds-free log-harmonic distance restraint potential with optimal weighting factor (19,20) and an adaptive violation tolerance estimation (11) along with restraint-combination and network-anchoring procedures (21,22). Additionally, ARIA 2.3 includes routines to calculate structures of symmetric oligomers (C2 to C5 and D2 symmetries) (22,23).
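To make the notion of an ambiguous distance restraint more concrete, the following Python sketch evaluates one restraint against a set of candidate proton-pair distances. It assumes the r^-6-summed effective distance commonly used to describe ambiguous restraints in the ARIA literature; the function names, the target distance and the fixed tolerance are hypothetical illustrations and are not taken from the ARIA 2.3 code, which uses its own calibration and adaptive tolerance.

```python
from typing import Sequence


def effective_distance(distances: Sequence[float]) -> float:
    """Combine candidate distances (in Å) by r^-6 summation.

    Every possible assignment of the cross-peak contributes one distance;
    the shortest contributions dominate the result.
    """
    if not distances:
        raise ValueError("at least one candidate distance is required")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def is_violated(distances: Sequence[float], target: float, tolerance: float) -> bool:
    """Hypothetical check: the restraint counts as violated when the
    effective distance exceeds the target by more than the tolerance."""
    return effective_distance(distances) > target + tolerance


if __name__ == "__main__":
    # Two candidate assignments for the same cross-peak (made-up values).
    print(effective_distance([3.0, 8.0]))
    print(is_violated([6.0, 7.0], target=4.0, tolerance=0.5))
```

Because the summation is dominated by the shortest distance, a single close proton pair is enough to satisfy the restraint, which is what allows unassigned cross-peaks to be used before their assignment is known.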

Figure 1.

Schematic representation of the ARIAweb workflow. Input data can be converted to the ARIA XML format with the Data conversion tool. Data are then sent for structure calculation setup, where a user can define custom parameters. Once ready, the structure calculation job is started and enters the ARIA iterative procedure for NOE assignment and structure generation. After a validation step, structure calculation results are available for download and presented for interactive visualization.

SERVER FUNCTIONALITIES AND USAGE

For the sake of modularity, calculations performed on the ARIAweb server are organized as projects with three main sections corresponding to the provided services: Data conversion, Structure calculation and Result visualization (Figure 1).

Data conversion

The minimal input data to perform ARIA calculations are (i) the sequence of the molecular chain(s) composing the molecule for which a user wants to calculate the structure, (ii) one (or more) list(s) of NOE cross-peaks and (iii) one (or more) list(s) of assigned chemical shifts. The input data for molecular sequence and NMR data (NOE cross-peak and assigned chemical shift lists) consist of formatted text files produced by popular NMR analysis software. Owing to the variety of formats used to store spectra and chemical shift information, data have to be converted into the ARIA XML format before setting up a Structure calculation. For a Data conversion project, supported formats for NOE and chemical shift lists are XEASY, SPARKY, NMRVIEW and ANSIG. Once all input data have been uploaded, the conversion can be processed online (Figure 2). Upon completion of a conversion job, users have the choice to download an archive with the converted data ready to be used with a standalone ARIA installation or to initialize a Structure calculation directly within the main project space. Alternatively, CCPN (Collaborative Computing Project for NMR) projects containing all required data in a format-free manner (24), archived as a single tar.gz file, can be directly accepted as input for Structure calculation without any conversion. Currently, ARIAweb only supports definitions of standard amino-acid residues or DNA/RNA bases and the incorporation of zinc ions in tetrahedral coordination. We recommend using the standalone version of ARIA when modified residues or other organic ligands have to be included.
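As an illustration of the CCPN route described above, a project directory can be packed into a single tar.gz file with the Python standard library. This is a minimal sketch: the project and file names are hypothetical, and storing the project under its own top-level folder inside the archive is an assumption about the expected layout, not something stated by the server documentation quoted here.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def archive_project(project_dir: Path, archive_path: Path) -> List[str]:
    """Pack project_dir into a gzip-compressed tar archive.

    Returns the member names of the archive so the result can be checked
    before uploading it.
    """
    project_dir = Path(project_dir)
    if not project_dir.is_dir():
        raise NotADirectoryError(str(project_dir))
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Assumed layout: the project sits under its own folder name.
        tar.add(str(project_dir), arcname=project_dir.name)
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return tar.getnames()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_ccpn_project"  # hypothetical project name
        project.mkdir()
        (project / "placeholder.xml").write_text("<root/>")
        print(archive_project(project, Path(tmp) / "my_ccpn_project.tar.gz"))
```

Listing the members after writing is a cheap way to confirm that the archive is readable and contains the whole project before it is submitted.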

Figure 2.

Screenshots of the jobs management, structure calculation setup forms and the interactive results visualization page, which allows the user to view and analyze the results of a structure calculation job.

Structure calculation

Similarly to the graphical user interface (GUI) of the standalone ARIA program, the Structure calculation service allows users to specify parameters such as input data, NOE assignment criteria, number of conformers to generate or parameters for the solvent refinement. When initiating a Structure calculation from a Data conversion job, mandatory input data are pre-filled and shown in the corresponding forms (Figure 2). When using data from a CCPN project, all compatible data entries are listed for each appropriate input data type. Structure calculation on ARIAweb is designed to guide the user through the main categories of parameters that need to be checked (if default values are used) or specified (if a user wants to replace them with customized values). The four main categories are Setup (generic information about the project), Data (input data organized by their type), Protocol (parameters related to the ARIA iterative protocol) and Structure generation (generation of structures with CNS). After saving a Structure calculation project, users can download an XML file readily usable with a standalone ARIA installation, or directly submit the ARIA structure calculation job online.

Results visualization

Once a structure calculation job has been submitted and finished, users can download a ZIP archive containing the entire ARIA run directory with the ensemble of conformers (PDB format), NOE violation statistics, NOE assignments and structure validation scores (text format and graphical PDF) (25). If a CCPN project has been used as input, a new project is generated, notably including new assignments of NOE cross-peaks made by ARIA that can be analyzed further with the appropriate CcpNmr Analysis software (26). More importantly, results can be inspected online via an interactive visualization dashboard (Figure 2). Structure ensembles generated by the server are displayed interactively for all ARIA iterations with various representations. Aside from the traditional ‘cartoon’ representation with colored secondary structure elements, users can directly visualize, on the three-dimensional structure, color-coded statistics such as the average RMS (root mean square) of NOE violations per residue or the RMSF (root mean square fluctuation) of the generated ensemble in ‘sausage’ representation. Such representations allow users to easily identify unconverged regions in the final structure ensemble or residues for which NOE-derived distance restraints are not consistent. Both are indicators of problematic or unsatisfactory interpretation of some NOE data by ARIA and need more in-depth inspection by an expert (27). Restraint statistics, ensemble RMSD or color-coded structure quality scores are also presented as graphs and charts (Figure 2). Additionally, a standardized table reporting various restraint and structural statistics is generated.
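The per-residue RMSF mentioned above can be illustrated with a short Python sketch. It assumes that the conformers have already been superimposed and that each residue is represented by a single atom (for example Cα); the function name and the input layout are hypothetical and do not describe how ARIAweb computes the values it displays.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_rmsf(ensemble: Sequence[Sequence[Coord]]) -> List[float]:
    """RMSF of each residue around its mean position in the ensemble.

    ensemble[c][i] is the (x, y, z) position of residue i in conformer c.
    The conformers are assumed to be superimposed beforehand.
    """
    n_conf = len(ensemble)
    if n_conf == 0:
        raise ValueError("the ensemble is empty")
    n_res = len(ensemble[0])
    if any(len(conf) != n_res for conf in ensemble):
        raise ValueError("all conformers must have the same number of residues")

    rmsf = []
    for i in range(n_res):
        # Mean position of residue i over all conformers.
        mean = [sum(conf[i][k] for conf in ensemble) / n_conf for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((conf[i][k] - mean[k]) ** 2 for k in range(3))
            for conf in ensemble
        ) / n_conf
        rmsf.append(math.sqrt(msd))
    return rmsf


if __name__ == "__main__":
    # Two conformers, two residues (made-up coordinates in Å).
    toy = [
        [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
        [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    ]
    print(per_residue_rmsf(toy))
```

Residues with a large RMSF correspond to the thick regions of a ‘sausage’ representation, that is, to parts of the ensemble that have not converged.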

SERVER IMPLEMENTATION

An overview of the ARIAweb server implementation and the technologies employed is shown in Figure 3. The web application features dynamic forms, such as the Data conversion and Structure calculation forms, formatted with Bootstrap (https://getbootstrap.com). The content displayed on the client side comes from the templating system of a Django back-end server (https://www.djangoproject.com). Input data sent to the server are securely saved in a PostgreSQL database (https://www.postgresql.org) using the Object Relational Mapping of the Django framework. To perform and monitor the execution of ARIA jobs on an HPC (high performance computing) cluster, the Django server communicates with the Pasteur Galaxy server (28) through the BioBlend REST API (29). Galaxy is a scientific workflow management system that provides the means to build multi-step computational analyses akin to recipes (30). Job status is reported to the user in real time with the following stages: Building, Pending, Running and Success. Finally, the results are displayed as a single-page application written in ReactJS (https://reactjs.org) and connected to the back-end with the Django REST framework. This application makes use of the NGL JavaScript library for protein structure visualization (31,32).
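The four user-facing stages listed above have to be derived in some way from the state reported by the workflow system. The Python sketch below shows one possible translation from Galaxy job state strings (such as the `state` field of a job record) to those stages. The mapping itself is purely an assumption made for illustration: the source does not describe how ARIAweb derives its stages, and states without an obvious counterpart, such as `error`, are returned as `None`.

```python
from typing import Optional

# Hypothetical mapping from Galaxy job states to the stages named in the text.
_STAGE_BY_GALAXY_STATE = {
    "new": "Building",
    "upload": "Building",
    "waiting": "Building",
    "queued": "Pending",
    "running": "Running",
    "ok": "Success",
}


def ariaweb_stage(galaxy_state: str) -> Optional[str]:
    """Translate a Galaxy job state into one of the four reported stages.

    Returns None for states that have no counterpart among the four stages
    (for example "error" or "deleted").
    """
    return _STAGE_BY_GALAXY_STATE.get(galaxy_state.strip().lower())


if __name__ == "__main__":
    for state in ("new", "queued", "running", "ok", "error"):
        print(state, "->", ariaweb_stage(state))
```

Keeping the translation in a single lookup table makes it easy to adjust if the workflow system introduces new states or if the interface reports failures as an additional stage.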

Figure 3.

Diagram describing the ARIAweb implementation workflow. The front-end interface (in the web browser) corresponds to the visual aspect of the website and the logic related to graphical components. On the server side, back-end services handle the interaction with the database and the computing cluster, and generate the context for the front-end interface.

A detailed description of the server functionalities is provided in the online documentation along with example input files in various formats, a step-by-step tutorial and a set of recommendations to help users interpret the results. Additionally, all forms requiring an input file to be specified contain a link to a corresponding example file. Due to the complexity of the Structure calculation section, several customizable parameters are hidden by default. Advanced users can reveal and modify those additional parameters by enabling the ‘expert mode’ in their user profile. Jobs submitted via ARIAweb are executed on the Institut Pasteur HPC cluster with 7000+ cores shared by all institutional services. Each Structure calculation job uses 12 cores simultaneously. Typical execution times for Data conversion and Structure calculation jobs are around 5 min and 90 min, respectively.

CASE STUDIES

To validate the efficiency of the ARIAweb server, Structure calculation projects were set up and performed on a benchmark of seven proteins or protein/RNA complexes, ranging from 56 to 160 amino acids, for which the solution NMR structure has been solved by experts and deposited in the Protein Data Bank (33) (Table 1). Input data consisted of assigned chemical shift lists, lists of unassigned NOE cross-peaks (derived from 2D, 3D or 4D NOESY spectra) and dihedral angle restraints corresponding to the data originally used to determine the deposited PDB structure. The tested entries are: Tudor domain of the human Survival of Motor Neuron (SMN) protein (1G5V) (34), HRDC domain of the S. cerevisiae RecQ helicase (1D8B) (35), dimeric anti-sigma factor CsfB of B. subtilis (5N7Y) (36), mouse RBM20 RRM domain in complex with RNA (6SO9) (37), MANEC domain of the human hepatocyte growth factor activator inhibitor-1 (HAI-1) (2MSX) (38), Ni metallochaperone HypA from Helicobacter pylori (6G81) (39) and the N-terminal domain of the human mitotic checkpoint serine/threonine protein kinase BUB1 (CASD-NMR target HR5460A, 2LAH) (40). The precision (mean pairwise RMSD of backbone atoms of well-ordered residues) of ARIAweb-generated ensembles is consistently <1 Å, indicating well-converged calculations (Table 1). PROCHECK (16) analysis shows that >95% of the residues are in the allowed regions of the Ramachandran plots, as is expected for high-resolution NMR structures. More importantly, the bundle accuracy (mean pairwise RMSD between ARIAweb and PDB ensembles) is <1.6 Å for all tested entries except HypA, revealing that automated NMR structure calculation with ARIAweb yields structures very similar to the reference PDB structures (Figure 4).
The lower accuracy (1.96 Å) observed for the HypA protein can be explained by the slightly different orientation between the Zn- and Ni-binding domains relative to the reference PDB structure, for which the authors already noticed a variability (39).
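The three thresholds quoted in this section can be checked directly against the values reported in Table 1. The short Python sketch below transcribes the relevant columns and recomputes the claims. It assumes that "allowed regions" means the sum of the most favored and allowed Ramachandran categories; the variable and function names are only illustrative.

```python
# target: (most favored %, allowed %, bundle RMSD in Å, bundle accuracy in Å),
# transcribed from Table 1.
TABLE_1 = {
    "Tudor": (90.0, 10.0, 0.35, 1.01),
    "HRDC": (89.3, 8.4, 0.44, 1.36),
    "CsfB": (77.8, 18.9, 0.33, 1.12),
    "RBM20": (86.7, 10.1, 0.13, 1.57),
    "MANEC": (84.0, 15.2, 0.58, 1.41),
    "HypA": (82.0, 15.3, 0.74, 1.96),
    "HR5460A": (88.5, 11.2, 0.38, 1.25),
}


def summarize(table, accuracy_cutoff=1.6):
    """Recompute the quantities behind the thresholds quoted in the text."""
    # Assumption: "allowed regions" = most favored + allowed categories.
    allowed_pct = {
        name: round(favored + allowed, 1)
        for name, (favored, allowed, _, _) in table.items()
    }
    worst_precision = max(precision for _, _, precision, _ in table.values())
    accuracy_outliers = [
        name
        for name, (_, _, _, accuracy) in table.items()
        if accuracy >= accuracy_cutoff
    ]
    return {
        "allowed_pct": allowed_pct,
        "worst_precision": worst_precision,
        "accuracy_outliers": accuracy_outliers,
    }


if __name__ == "__main__":
    for key, value in summarize(TABLE_1).items():
        print(key, value)
```

Under that assumption, every target has more than 95% of its residues in the allowed regions, the largest bundle RMSD stays below 1 Å, and HypA is the only entry whose accuracy reaches the 1.6 Å cutoff.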

Table 1.

Summary of input data and output statistics for the case studies. For each target, the corresponding reference PDB entry is given along with the length of the protein sequence and the number of input NOE cross-peaks used. The number of NOE restraints generated by ARIAweb at the last iteration is also shown. Ramachandran statistics, RMS of NOE restraints violations and bundle precision (mean pairwise RMSD) are presented as indicators for the quality of the generated structure ensembles.

Target	PDB	Length	# NOE cross-peaks (input)	# NOE restraints (output)	Ramachandran statistics^b (%)	NOE RMS (Å)	Bundle RMSD^a (Å)	Bundle accuracy^a (Å)
Tudor	1G5V	56	2172	1160	90.0/10.0/0.0/0.0	0.112	0.35 (0.08)	1.01 (0.11)
HRDC	1D8B	91	3155	1698	89.3/8.4/1.3/1.0	0.064	0.44 (0.07)	1.36 (0.12)
CsfB^#	5N7Y	2 × 49	1658	2 × 709^c	77.8/18.9/0.0/3.3	0.143	0.33 (0.09)	1.12 (0.08)
RBM20	6SO9	112^d	6185	2982	86.7/10.1/2.6/0.6	0.142	0.13 (0.02)	1.57 (0.07)
MANEC	2MSX	114	5201	2349	84.0/15.2/0.2/0.6	0.288	0.58 (0.11)	1.41 (0.10)
HypA^#	6G81	117	5336	2048	82.0/15.3/1.7/1.0	0.173	0.74 (0.24)	1.96 (0.32)
HR5460A	2LAH	160	17250	4807	88.5/11.2/0.3/0.0	0.175	0.38 (0.06)	1.25 (0.11)

^a Mean RMSD (sd) for backbone atoms of well-ordered residues.

^b Percentage of residues of the full-length sequence in the most favored/allowed/generously allowed/disallowed regions of the Ramachandran plot.

^c Restraint list is automatically copied for each chain in symmetric dimer.

^d In complex with 6 bp RNA.

^# Protein containing Zn ion bound in tetrahedral coordination.

Figure 4.

Overview of structure ensembles obtained with ARIAweb for the case studies using automated NOE assignment. For each entry, ARIAweb ensembles (in blue) are shown as cartoon and superimposed on the reference PDB ensembles (in red).

CONCLUSIONS AND FUTURE WORK

In a field where the amount of experimental data and the sizes of studied macromolecules are continuously growing, improving the usability and accessibility of appropriate software plays an important role in speeding up research. The new ARIAweb server addresses most of the limitations of a standalone installation of the ARIA tool, not only by improving accessibility but also by gathering the successive conversion, structure calculation and analysis steps in one place. Offering the tool as a web service facilitates interaction with the user base, eases the learning curve and simplifies the deployment of new features in a cross-platform environment. ARIAweb will serve as a useful portal to narrow the differences in technical know-how between users by helping them to manage and perform NMR structure calculation projects through the different stages without the need to install ARIA and its dependencies, or to have access to a computing cluster. ARIAweb will continue to be enhanced by integrating more experimental or in silico data and by improving the robustness of its automated NOE assignment procedure, e.g. through the use of consensus bundles (41). Finally, we are committed to making ARIAweb compliant with the new NMR Exchange Format (NEF) standard (42) for better integration with next-generation NMR analysis software (43) and streamlined deposition of atomic coordinates and NMR restraints in public databases.

42 in total

1. ARIA: automated NOE assignment and NMR structure calculation.

Authors: Jens P Linge; Michael Habeck; Wolfgang Rieping; Michael Nilges
Journal: Bioinformatics Date: 2003-01-22 Impact factor: 6.937

2. Refinement of protein structures in explicit solvent.

Authors: Jens P Linge; Mark A Williams; Christian A E M Spronk; Alexandre M J J Bonvin; Michael Nilges
Journal: Proteins Date: 2003-02-15

3. A topology-constrained distance network algorithm for protein structure determination from NOESY data.

Authors: Yuanpeng Janet Huang; Roberto Tejero; Robert Powers; Gaetano T Montelione
Journal: Proteins Date: 2006-03-15

4. Graphical analysis of NMR structural quality and interactive contact map of NOE assignments in ARIA.

Authors: Benjamin Bardiaux; Aymeric Bernard; Wolfgang Rieping; Michael Habeck; Thérèse E Malliavin; Michael Nilges
Journal: BMC Struct Biol Date: 2008-06-05

Review 5. Advances in automated NMR protein structure determination.

Authors: Paul Guerry; Torsten Herrmann
Journal: Q Rev Biophys Date: 2011-03-17 Impact factor: 5.318

6. BioBlend: automating pipeline analyses within Galaxy and CloudMan.

Authors: Clare Sloggett; Nuwan Goonasekera; Enis Afgan
Journal: Bioinformatics Date: 2013-04-28 Impact factor: 6.937

7. Structure and dynamics of Helicobacter pylori nickel-chaperone HypA: an integrated approach using NMR spectroscopy, functional assays and computational tools.

Authors: Chris A E M Spronk; Szymon Żerko; Michał Górka; Wiktor Koźmiński; Benjamin Bardiaux; Barbara Zambelli; Francesco Musiani; Mario Piccioli; Priyanka Basak; Faith C Blum; Ryan C Johnson; Heidi Hu; D Scott Merrell; Michael Maroney; Stefano Ciurli
Journal: J Biol Inorg Chem Date: 2018-09-27 Impact factor: 3.358

8. A calculation strategy for the structure determination of symmetric dimers by 1H NMR.

Authors: M Nilges
Journal: Proteins Date: 1993-11

9. Structure calculation, refinement and validation using CcpNmr Analysis.

Authors: Simon P Skinner; Benjamin T Goult; Rasmus H Fogh; Wayne Boucher; Tim J Stevens; Ernest D Laue; Geerten W Vuister
Journal: Acta Crystallogr D Biol Crystallogr Date: 2015-01-01

10. The RCSB protein data bank: integrative view of protein, gene and 3D structural information.

Authors: Peter W Rose; Andreas Prlić; Ali Altunkaya; Chunxiao Bi; Anthony R Bradley; Cole H Christie; Luigi Di Costanzo; Jose M Duarte; Shuchismita Dutta; Zukang Feng; Rachel Kramer Green; David S Goodsell; Brian Hudson; Tara Kalro; Robert Lowe; Ezra Peisach; Christopher Randle; Alexander S Rose; Chenghua Shao; Yi-Ping Tao; Yana Valasatava; Maria Voigt; John D Westbrook; Jesse Woo; Huangwang Yang; Jasmine Y Young; Christine Zardecki; Helen M Berman; Stephen K Burley
Journal: Nucleic Acids Res Date: 2016-10-27 Impact factor: 16.971
