Kumaran Baskaran, D Levi Craft, Hamid R Eghbalnia, Michael R Gryk, Jeffrey C Hoch, Mark W Maciejewski, Adam D Schuyler, Jonathan R Wedell, Colin W Wilburn.
Abstract
The Biological Magnetic Resonance Data Bank (BMRB) has served the NMR structural biology community for 40 years, and has been instrumental in the development of many widely used tools. It fosters the reuse of data resources in structural biology by embodying the FAIR data principles (Findable, Accessible, Interoperable, and Reusable). NMRbox is less than a decade old, but complements BMRB by providing NMR software and high-performance computing resources, facilitating the reuse of software resources. BMRB and NMRbox both facilitate reproducible research. NMRbox also fosters the development and deployment of complex meta-software. Combining BMRB and NMRbox helps speed and simplify workflows that utilize BMRB, and enables facile federation of BMRB with other data repositories. Utilization of BMRB and NMRbox in tandem will enable additional advances, such as machine learning, that are poised to become increasingly powerful.
Keywords: data federation; data repositories; nuclear magnetic resonance; reproducible research; structural biology
Year: 2022 PMID: 35111815 PMCID: PMC8802229 DOI: 10.3389/fmolb.2021.817175
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Data flow in a hypothetical study involving federation of three separate databases. An application sends query requests to the three databases. Each database processes the query according to its server load, policies, maintenance cycle, and other factors. Because data records need to be merged, a synchronization step is required. Dotted arrows indicate transfers via the internet. The figure depicts a simple architecture involving 2-way merges. The number of steps and the computational cost of a merge are application-dependent, involving buffer lengths, differential latencies, format conversions, semantic translations, and other factors.
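The data flow in Figure 1 can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three "databases" are in-memory dictionaries, the latencies are simulated with `time.sleep`, and the names `query`, `merge_two`, and `federated_query` are hypothetical rather than part of any BMRB or NMRbox interface. The sketch shows only the synchronization step followed by successive 2-way merges.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical in-memory stand-ins for three remote databases, keyed by a shared ID.
DB_A = {"P1": {"shifts": 120}, "P2": {"shifts": 95}}
DB_B = {"P1": {"structure": "1ABC"}, "P2": {"structure": "2XYZ"}}
DB_C = {"P1": {"sequence_length": 76}, "P3": {"sequence_length": 140}}

def query(db, ids, latency):
    """Simulate a remote query whose response time differs per database."""
    time.sleep(latency)
    return {i: db[i] for i in ids if i in db}

def merge_two(left, right):
    """2-way merge: keep IDs present in both results and combine their fields."""
    return {k: {**left[k], **right[k]} for k in left.keys() & right.keys()}

def federated_query(ids):
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(query, DB_A, ids, 0.03),
            pool.submit(query, DB_B, ids, 0.01),
            pool.submit(query, DB_C, ids, 0.02),
        ]
        # Synchronization step: wait until every database has answered.
        res_a, res_b, res_c = (f.result() for f in futures)
    # Two successive 2-way merges combine the three result sets.
    return merge_two(merge_two(res_a, res_b), res_c)

merged = federated_query(["P1", "P2", "P3"])
print(merged)
```

In this toy setup the total wait is set by the slowest database, which is one reason differential latencies contribute to the cost of a federated merge.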
FIGURE 2. Schematic representation of the NMRbox platform from an end-user and software developer perspective. Solid lines illustrate direct services available to users and developers, while dotted lines represent services that are available transparently through connections controlled by the NMRbox infrastructure. Users connect to NMRbox PaaS VMs (production or custom built) through VNC, which provides a full GUI desktop environment, or through ssh for a command-line interface. The NMRbox website lets users check the status of VMs, start and end VNC sessions, and consult help documentation and a complete software listing with version information. File transfers are performed via ssh (scp/sftp) or the NMRbox Globus institutional endpoint. All NMRbox VMs (production, custom, and downloadable) are provisioned with software from the NMRbox software developer repository. All PaaS VMs are backed by significant computational resources and storage appliances in the HPC datacenter.
FIGURE 3. Metadata widget for exploring Varian/Agilent NMR datasets. The metadata widget shown above parses and extracts metadata from the various files within a Varian data directory. This data is used to construct a visual representation of the information layout of the data in terms of the protein atom types correlated along the various spectral dimensions. The mapping between spectral dimensions and axes is not unambiguously defined within the spectrometer metadata; the user is provided with pull-down menus for documenting these relationships.
FIGURE 4. Depending on the study, one of the two modes of the BMRB data deposition and dissemination workflow is used.
FIGURE 5. BMRB data content segmented by molecule type and information content. The type of information is segmented into 18 types, indicated along the left side. For each information type, the count of available data is shown to the right of the bar in the bar graph. Data was obtained on date id 2021-W50-4 (specified in ISO week format, year-week-day).
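For readers unfamiliar with the ISO week notation used in the Figure 5 caption, the short Python sketch below converts such a date id to a calendar date. The helper name `parse_iso_week_date` is hypothetical; `date.fromisocalendar` is part of the Python standard library (3.8 and later).

```python
from datetime import date

def parse_iso_week_date(text):
    """Convert an ISO 8601 week date such as 'YYYY-Www-D' to a calendar date."""
    year_part, week_part, day_part = text.split("-")
    # Strip the leading 'W' from the week field before converting to an integer.
    return date.fromisocalendar(int(year_part), int(week_part.lstrip("W")), int(day_part))

snapshot = parse_iso_week_date("2021-W50-4")
print(snapshot.isoformat())  # 2021-12-16
```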
FIGURE 6. The processing script and the data are co-located on the NMRbox platform. Access (read) and data merge (write) operations are managed in a manner similar to the way local files are accessed.
FIGURE 7. Panel (A) shows the fraction of amide protons with one or more NOE restraints to an aromatic ring proton (y-axis) as a function of the Z-score of the amide proton (x-axis). Proportions are calculated with respect to the total number of amide hydrogens with chemical shifts reported in entries with at least one amide-aromatic restraint. The numbers over each point in panel (A) are the total number of such amides (including those lacking any NOE restraints to a nearby aromatic) at the corresponding Z-score in units of standard deviation σ. The scatter plots in panel (B) show the distribution of amide chemical shifts of amino acids (HIS, TRP, PHE, TYR) as a function of distance (for distances less than 8 Å) of the amide proton from the center of the nearest aromatic ring. For Z-score values between −2σ and 2σ, the fractions of restrained amide protons are generally consistent among the amino acids studied. However, outside the range of −2σ to 2σ, the trend of restrained amide protons depends on the amino acid (HIS, TRP, PHE, TYR). The violin plot in panel (C) illustrates the dependence of this trend on the amino acids.
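The quantity plotted in panel (A) of Figure 7 can be sketched as a binned proportion. The Python example below is only an illustration under stated assumptions: the records, the reference mean and standard deviation, and the rounding of Z-scores to the nearest integer bin are all hypothetical, and the caption does not specify how the Z-scores were actually computed or binned.

```python
from collections import defaultdict

# Hypothetical records: (amide 1H shift in ppm, has an NOE restraint to an aromatic ring proton)
records = [
    (8.2, False), (8.3, False), (8.1, True), (8.4, False),
    (6.9, True), (7.0, True), (9.6, False), (8.25, False),
]

# Hypothetical reference statistics for amide proton shifts (assumed values).
MEAN_PPM, STD_PPM = 8.2, 0.6

def z_score(shift):
    """Distance of a shift from the reference mean, in units of sigma."""
    return (shift - MEAN_PPM) / STD_PPM

def restrained_fraction_by_bin(records):
    """Fraction of amides with an aromatic NOE restraint in each integer Z-score bin."""
    totals = defaultdict(int)
    restrained = defaultdict(int)
    for shift, has_restraint in records:
        z_bin = round(z_score(shift))  # nearest integer number of sigma
        totals[z_bin] += 1
        restrained[z_bin] += int(has_restraint)
    return {b: restrained[b] / totals[b] for b in sorted(totals)}

fractions = restrained_fraction_by_bin(records)
print(fractions)
```

The denominator in each bin counts all amides at that Z-score, including those without any aromatic restraint, which matches how the caption describes the totals shown over each point.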
| BMRB data standards | |
|---|---|
| Description | Link |
| Molecular and nomenclature standards | |
| Atom nomenclature of amino acids and nucleic acids | |
| Amino acid atom nomenclature conversion table | |
| Amino acid pseudo atom nomenclature conversion table | |
| Nucleic acid pseudo atom nomenclature conversion table | |
| Amino acid description | |
| Amino acid hydrophobicity table | |
| Secondary structure propensities | |
| Amino acid codons | |
| Amino acid properties | |
| Experimental standards | |
| Indirect chemical shift referencing | |
| Random coil chemical shifts (Dyson, Wright, and coworkers) | |
| Random coil chemical shifts (Wüthrich and coworkers) | |
| Chemical shift index parameters (Wishart and coworkers) | |
| Sequential and medium-range proton distances in polypeptide secondary structures | |
| Pulse sequence library | |
| Data format standards | |
| NMR-STAR documentation | |
| NMR-STAR dictionary (GitHub) | |
| mmCIF documentation | |

| BMRB software resources | |
|---|---|
| Software resource | Link |
| NMR-STAR parser | |
| BMRB-API | |
| Data visualization in Python (PyBMRB) | |
| Data visualization in R | |