
The exposome paradigm to predict environmental health in terms of systemic homeostasis and resource balance based on NMR data science.

Jun Kikuchi1,2,3, Shunji Yamada1,4,5.   

Abstract

The environment, from microbial ecosystems to recycled resources, fluctuates dynamically due to many physical, chemical and biological factors, the profile of which reflects changes in overall state, such as environmental illness caused by a collapse of homeostasis. To evaluate and predict environmental health in terms of systemic homeostasis and resource balance, a comprehensive understanding of these factors requires an approach based on the "exposome paradigm", namely the totality of exposure to all substances. Furthermore, in considering sustainable development to meet global population growth, it is important to gain an understanding of both the circulation of biological resources and waste recycling in human society. From this perspective, natural environment, agriculture, aquaculture, wastewater treatment in industry, biomass degradation and biodegradable materials design are at the forefront of current research. In this respect, nuclear magnetic resonance (NMR) offers tremendous advantages in the analysis of samples of molecular complexity, such as crude bio-extracts, intact cells and tissues, fibres, foods, feeds, fertilizers and environmental samples. Here we outline examples to promote an understanding of recent applications of solution-state, solid-state, time-domain NMR and magnetic resonance imaging (MRI) to the complex evaluation of organisms, materials and the environment. We also describe useful databases and informatics tools, as well as machine learning techniques for NMR analysis, demonstrating that NMR data science can be used to evaluate the exposome in both the natural environment and human society towards a sustainable future. This journal is © The Royal Society of Chemistry.


Year:  2021        PMID: 35480260      PMCID: PMC9041152          DOI: 10.1039/d1ra03008f

Source DB:  PubMed          Journal:  RSC Adv        ISSN: 2046-2069            Impact factor:   4.036


Introduction

The environment, from microbial ecosystems and materials to macrosystems such as the earth and its inhabitants, fluctuates within certain bounds due to many physical, chemical and biological factors. Within an “environmental system”, fluctuations in these factors at a certain level can be used to evaluate systemic homeostasis.[1] For example, a human can be thought of as a human–microbe hybrid or “superorganism”, whose homeostasis (i.e., healthy state) will be affected by (1) intrinsic properties,[2] (2) the “exposome” – namely, the totality of human physical, chemical and biological factors such as lifestyle choices (e.g., food, drink and drug intake), and (3) the acquisition of a stable “healthy” community of symbiotic microbes (the so-called “microbiome”).[3] Alterations in this superorganism will be manifested in the metabolic balance within human samples, such as serum, urine and faeces.[4] Similarly, natural ecosystems may be conceived of as interconnected environmental and metabolic systems; for example, carbon, nitrogen and phosphorus fluxes are biologically driven through ecosystems on a global scale, and also through biochemical pathways on an organismal or population scale. Therefore, naturally collected samples, including soil, sediment, water, algae, plants, fish and materials, contain much information on homeostatic fluctuations – in other words, the metabolic balance – within an ecosystem of interest.[5] With the aim of achieving a sustainable society, here we propose a shift to a research approach based on the exposome paradigm with a focus on nuclear magnetic resonance (NMR) data science to predict environmental health in terms of systemic homeostasis and resource balance. We first summarize the current status of world population growth and prospects for sustainable development (Section 1), and then outline current technology in NMR data science (Section 2).
As examples of the application of NMR data science, we outline issues and proposals across human society in diverse fields such as the natural environment, agriculture, aquaculture, industrial wastewater treatment, biomass decomposition and biodegradable plastics (Section 3).

Global human population growth and sustainable development

According to statistics from the Food and Agriculture Organization (FAO) of the United Nations, the world population and the production of agricultural, livestock and marine products have both been rising since 2010 (Fig. 1). In the future, the world population and the amount of macro- and microplastics in the environment are expected to increase further.[6] Therefore, sustainable development of the world is one of the major challenges of our society. By 2050, we will need to provide affluent lives to an estimated population of over 9 billion and build a sustainable society that takes into account the circulation of natural resources.[7] To increase production while lowering the impact on the environment, advances in science and technology to increase production capacity, and reduce or reuse waste from agriculture, aquaculture and industry, are required. Reducing waste is one way to meet demand without increasing production. Food fermentation techniques offer the potential to extend the shelf life of products and create new value-added food ingredients.[8] From the viewpoint of waste reuse, energy and polymer materials can be produced from biomass.[9] Sustainable diets will need to protect biodiversity and the environment, to optimize natural resources, and to be culturally acceptable, accessible and affordable to various populations, while being safe and nutritious.[10] A shift towards lower meat consumption is one strategy to reduce the loss of biodiversity and to offset the effects of climate change,[11] but alternative sustainable, non-animal sources of protein must be developed.[12] Currently, vegetal sources dominate the global protein supply. From a health perspective, fish, algae and insects offer attractive alternatives to animal protein. Clearly these challenges are complex, requiring a focus on both human and earth health. These topics are important, but they require a very broad perspective and have not been considered from a bird's-eye view.
In addressing them, “the exposome paradigm-based NMR approach” has many uses that can probe molecular complexity, including samples from natural ecosystems, agriculture, aquaculture and polymer materials development.[13]
Fig. 1

Global changes in population growth to 2050, and food production and emission of plastics to 2020. Values of world population and of meat and cereal production were taken from the FAOSTAT database (http://www.fao.org/faostat/en/#home) of the Food and Agriculture Organization (FAO) of the United Nations. Emissions of macroplastics and microplastics were taken from Lebreton et al.[6] The dotted lines for world population, macroplastics and microplastics are predicted values. Values of total fishery production, aquaculture production and capture fishery production were taken from the databases (http://www.fao.org/fishery/statistics/en) of the FAO Fisheries Division.

Circulation of biological resources and water, and waste recycling in human society

Throughout human history, useful substances such as food, wood and oil have been produced from organisms in the land sphere and the hydrosphere. Between the ecosphere and the humanosphere, it is expected that recycling resources of molecular complexity, such as water, foods, natural resources and plastics, will be necessary to achieve Sustainable Development Goals (Fig. 2). In recent years, our oceans have faced the problem of red tide due to the inflow of industrial wastewater and pesticides, and the environmental load from aquaculture. Moreover, human society faces the need for alternative resources to petroleum and the problem of microplastics due to the outflow of plastics to the environment. NMR will be useful in addressing some of these problems. It can analyse diverse biological and environmental samples and various materials, producing multiple data with inter-institution compatibility that are suitable for data science using informatics techniques such as multivariate analysis, machine learning and database creation.[13] By coupling NMR with techniques such as inductively coupled plasma optical emission spectrometry (ICP-OES), thermal analysis and next-generation sequencing (NGS), it is possible to collect data on chemical, physical and microbial factors.[14] Furthermore, the ecological environment will be improved by predicting and giving early warning of environmental changes, and by controlling key factors before the balance of the ecosystem is lost due to changes in these factors.[15,16]
Fig. 2

Recycling of sustainable resources between the ecosphere and humanosphere. Shown is a schematic summary of research broadly aimed at resource circulation of the global environment and human society. It is expected that recycling resources of molecular complexity, such as water, foods, natural resources and plastics will be necessary to achieve Sustainable Development Goals.

NMR data science approaches

Investigation of environmental homeostasis

“Exposome changes”, such as human lifestyle choices and environmental alterations in the ecosystem, will be manifested in the metabolic profile of human and environmental samples. NMR-based metabolic balance analysis (i.e., pattern recognition of metabolic profiles based on NMR spectroscopy of complex-mixture samples) can identify potential metabolites as diagnostic biomarkers associated with homeostasis as “environmental health” (Fig. 3). As mentioned in the Introduction, biologically driven carbon flows through ecosystems on a global level, but also through biochemical pathways at the organismal and population levels. The significant increase in carbon emission from soil in the past 40 years is reported to have made a small contribution to global climate change.[17] A more direct contribution of human activity to metabolic homeostasis results from resource extraction by humans. For example, overfishing can completely restructure patterns across trophic levels in coastal ecosystems.[18] Advances in agricultural/fishery science are providing approaches to overcome food shortages arising from population growth in developing countries.[19]
Fig. 3

Conceptual diagram of metabolic profiling using NMR spectra for evaluating homeostasis in ecosystems. Top, the schematic diagram shows collapse of an environmental ecosystem. Middle, the NMR signals of samples from environments can be profiled as a relative intensity change. Bottom, the corresponding variation across three states (normal, sign, and abnormal) is illustrated by hypothetical fluctuations in the concentrations of three metabolites (metabolite A, blue; metabolite B, yellow; and metabolite C, red).

Naturally collected samples, including soil, sediment, water, algae and fish, contain much information on homeostatic fluctuations or the metabolic balance of an ecosystem of interest.[5] Based on reports of the direct and indirect effects of agriculture, forestry and fisheries on natural ecosystems that produce biomass resources, we propose application of a recent analytical paradigm[20-29] to ecosystem research. As a specific example, we will consider changes in homeostasis in the coastal environment. Oceans cover 70% of the earth's surface, but the coastal environment is strongly influenced by human activities.[30] In aquaculture, for example, fishery products are grown at high density and fed with large amounts of nitrogen/phosphorus nutrient-rich feed to enable growth in a short period of time,[31] but most of the feed is released into the ocean without being metabolized. In a eutrophic environment, this can cause red tides and blue tides, which ultimately have a strong, negative impact on the fishery industry. Therefore, a method for predicting red and blue tides is an important research area. In addition to regular seawater sampling and metabolic profiling, we have developed a method for visualizing which factors are important by mapping various metabolites along with physical factors such as water temperature.[14] In addition, it has been shown that fish naturally have different metabolic and microbiota profiles, depending on diet and growing environment, which can be discriminated by modelling based on machine learning.[32-34]

NMR techniques

NMR data science involves analytical cycles based on the concept of measurement informatics, which consists of sample preparation,[35] NMR measurement,[36] database construction, data pre-processing and data analysis (Fig. 4). There are several types of NMR, which facilitate the study of chemical structure and dynamics. NMR can evaluate the components of mixtures from an exposome perspective. In addition, solid-state NMR can evaluate the higher-order structure of solid samples. High-field NMR is used for advanced analysis such as multidimensional NMR and solid-state NMR at the research level, while low-field NMR is used for simple measurements in the field. Regardless of approach, the measured data are stored in databases. As a result, chemical shifts in NMR spectra of samples can be assigned by reference to the spectral data of standard samples stored in databases. In the case of complex mixtures, the data are pre-processed to simplify multi-component signals. The processed data are then evaluated by data-driven analysis. Lastly, based on the evaluation and prediction, sample preparation is optimized through trial-and-error experiments.
Fig. 4

Strategy for the analysis of complex molecular systems by NMR. Shown is the concept of measurement informatics, which consists of a cycle of sample preparation, NMR measurement, database construction, data pre-processing and data analysis.

Solution-NMR

Recently, solution NMR has been applied to metabolic profiling, whereby NMR spectra of mixtures of small biological molecules are subjected to multivariate analysis to identify metabolite biomarkers,[37-41] or to evaluate nutrients in food.[22,42-44] Innovations in such non-targeted approaches to studying biological systems are important for improving biomass production and for better sustainability of these systems. Although one-dimensional (1D) solution NMR has often been used for metabolic analysis, the analysis of mixtures using traditional 1D NMR spectra is difficult because of signal overlap due to signal splitting caused by spin coupling. Therefore, multidimensional approaches to NMR-based metabolomic analysis have been applied.[45] For example, two-dimensional (2D) J-resolved spectroscopy can be used to determine spin–spin coupling constants, which are then used for structural analysis.[46] Solution-state multidimensional NMR has also been used to determine correlation signals in homo- and heteronuclear experiments to identify metabolites. In 1H–13C-heteronuclear single quantum coherence (HSQC), metabolites can be identified by comparison to a chemical shift database and previously reported HSQC spectra. To increase identification accuracy, 1H–1H correlation spectroscopy (COSY), 1H–1H total correlation spectroscopy (TOCSY) and 1H–1H single quantum-double quantum correlation spectroscopy can be used to confirm correlation signals with adjacent atoms.
By using HSQC-TOCSY and three-dimensional (3D) HCCH-COSY together, adjacent correlation signals are easily confirmed for annotated metabolites on HSQC spectra.[47] In solution NMR, solvent suppression is important for evaluating peaks of interest.[48] In this regard, pure shift techniques yield a spectrum in which all the J coupling multiplets are collapsed into singlets, while ultra-clean pure shift 1H-NMR has been applied to metabolomics profiling.[49] Furthermore, in metabolomic profiling of aqueous samples with 13C natural abundance, 2D real-time BIRD 1H–13C HSQC has been shown to improve spectral resolution in regions affected by spectral overlap.[50] To achieve sufficient sensitivity and resolution for a sample of metabolomic interest, acquisition of a 2D spectrum can take from tens of minutes to several hours. To reduce experimental duration, non-uniform sampling and ultrafast NMR have successfully been used in targeted and untargeted metabolomics and lipidomics.[51-53] Lastly, fast multi-scan single-shot COSY experiments have been applied to the quantification of major metabolites in biologically relevant samples.[54]

Solid-state NMR

Solid-state NMR spectroscopy provides information on native structures and dynamics useful for predicting and designing the physical properties of multi-component solid materials. It is being increasingly applied in material and life sciences.[5,55] In the characterization of solid-state samples with crystalline, interphase and amorphous domains (i.e., disordered materials), the anisotropy detected by static measurements is useful.[56] Many foods are solid or at least semi-solid, with restricted molecular motions relative to pure liquids.[57] The most important technique is magic angle spinning (MAS), in which the sample is spun about an axis inclined at 54.7° to the static magnetic field to average out the nuclear dipole–dipole interactions, chemical shift anisotropy and variations in magnetic susceptibility that cause line broadening in the NMR spectrum.[58] Towards automated acquisition during variable-temperature, static and/or MAS NMR experiments, an external automatic tuning and matching (eATM) robot that can be attached to commercial and/or home-built MAS or static NMR probeheads has been developed.[59] In both multidimensional and multinuclear spectroscopy, high-resolution-MAS (HR-MAS) can provide clear and informative molecular characterization of complex heterogeneous systems (e.g., soil organic matter and plant-derived materials) and help to unravel the environmental reactivity of inorganic and organic materials.[60] Plant biomass, which mainly comprises polysaccharides, is one of the most abundant biomaterials in nature. Determining the structure and composition of plant polysaccharides, which are categorized as storage polysaccharides such as starch[61] and structural polysaccharides such as cellulose,[62] is therefore one of the main challenges of biomass analysis.
We have acquired 1D and 2D MAS spectra in a cross-polarization (CP) time series and used peak fitting techniques to establish valid models of crystalline and amorphous cellulose,[63,64] showing that matrix polysaccharides are composed of hemicelluloses and pectins. We have also developed methods for the chemical profiling of macroalgae by solution and solid-state NMR, Fourier transform-infrared (FT-IR) spectroscopy and ICP-OES.[28] We combined these methods in an integrated analysis of 107 algal samples, using the multi-instrumental data derived to characterize the samples according to their chemical diversity. Correlation analysis yielded chemotaxonomic clusters based on chemical diversity that were consistent with genetic linkages. The integrated analysis also indicated a relationship between alginate and ions including cadmium and arsenic, which are toxic to mammals.[21] We have also profiled the cellular biomacromolecules of Euglena gracilis, a flagellated protist that accumulates the polysaccharide paramylon in crystal form and triple-helical β-1,3-glucan as a storage polysaccharide, similar to brown algae.[65] For this analysis, we developed a cellular NMR approach using cells of 13C and 15N-labelled E. gracilis without any purification or fractionation steps. The measurements included 2D-/3D-NMR pulse sequences of solid-state NMR (e.g., “incredible natural-abundance double-quantum transfer experiment” [INADEQUATE], second-order Hamiltonian among analogous nuclei plus [SHA+], and 3D dipolar-assisted rotational resonance [DARR]), which required considerable measurement time. To support the more rapid evaluation of macromolecular mixtures, we subsequently developed the webtool InterSpin, which integrates the programs SpinMacro for the assignment of macromolecules, and PeaK SeParation (PKSP) and SENsitivity improvement with Spectral Integration (SENSI) for the separation of broad spectra.[66]
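The 54.7° angle used in MAS follows from a standard result: to first order, the dipolar and chemical shift anisotropy terms scale with the second Legendre polynomial of cos θ, which vanishes at the magic angle. The short sketch below is an illustration added for clarity, not code from the cited work, and the function name is hypothetical.

```python
import math

def second_legendre(theta_rad):
    """P2(cos theta) = (3 cos^2 theta - 1) / 2, the angular scaling factor
    of first-order anisotropic interactions under sample rotation."""
    c = math.cos(theta_rad)
    return 0.5 * (3.0 * c * c - 1.0)

# The magic angle is the root of P2(cos theta): cos^2 theta = 1/3.
magic_angle_rad = math.acos(1.0 / math.sqrt(3.0))
magic_angle_deg = math.degrees(magic_angle_rad)

print(round(magic_angle_deg, 2))  # 54.74
```

At this angle the scaling factor is zero (up to floating-point error), which is why sufficiently fast spinning about this axis narrows the lines.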

Time-domain NMR and MRI

Time-domain NMR and relaxometry with low-field and compact NMR instruments have several advantages, including simple sample preparation, easy handling and relatively low costs.[67,68] This is a useful approach to evaluate the crystalline state[69] and determine carbon chain lengths of fatty acid mixtures.[70] As a result, NMR-based relaxometry is widely used in various industries and research fields such as plants,[71] point-of-care testing,[72] food[73,74] and materials.[75,76] For the characterization of constituents in complex materials using low-resolution NMR relaxometry data, Laplace inversion using sparse representation methods has been applied.[77] Applications of low-field benchtop NMR include continuous-flow process monitoring,[78-81] food[82] and metabolomics analysis.[83] Food evaluation by foodpro[84] is also an important technique using benchtop-NMR for taste discrimination in food samples. Benchtop-NMR has also been applied as part of an organic synthesis robot with machine learning to search for new reactivity.[85] Overall, a range of challenging wide-line solid-state NMR spectra can be acquired by a maintenance-free, low-cost benchtop/mobile NMR spectrometer.[86] In spatial encoding and spatial selection methods, several sub-experiments are conducted in parallel in different spatial regions of the sample spectra.[87,88] As an example of spatial methods, magnetic resonance imaging (MRI) can be used for the noninvasive evaluation of spatial 1H density, diffusion or relaxation, and chemical shift in biological and other samples. It is also put to practical use in the medical field as a powerful technique for diagnostic imaging. We have observed intact textural features by using 1D MRI and proposed new pulse sequences, collectively named spatial molecular-dynamically ordered spectroscopy (SMOOSY), for NMR measurements driven by diffusion, relaxation and REST encoding.[89] In this approach, pseudo-3D SMOOSY spectra similar to MRI spectra are recorded.
We have applied SMOOSY to non-invasive imaging of the shrimp body and two heterogeneous systems, showing that pseudo-2D SMOOSY spectral images can be used to assess the different dynamics of compounds at each spatial z-position of the samples.
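As a concrete illustration of the relaxometry measurements described at the start of this section, the sketch below estimates a transverse relaxation time T2 from a simulated decay curve. It assumes a single exponential component and multiplicative noise (so that the logarithm stays defined); real time-domain data from complex materials are usually multi-exponential, which is the situation the Laplace inversion methods cited above are designed for. The function name and all numerical values are hypothetical.

```python
import numpy as np

def fit_mono_exponential(t, signal):
    """Estimate (amplitude, T2) for S(t) = A * exp(-t / T2)
    by a straight-line least-squares fit to log(S) versus t."""
    slope, intercept = np.polyfit(t, np.log(signal), 1)
    return float(np.exp(intercept)), float(-1.0 / slope)

# Simulated decay: the true values are assumptions chosen for the example.
true_amplitude, true_t2 = 1.0, 0.12            # arbitrary units, seconds
t = np.linspace(0.001, 0.5, 200)               # echo times in seconds
rng = np.random.default_rng(0)
noise = 1.0 + 0.01 * rng.standard_normal(t.size)   # 1 % multiplicative noise
signal = true_amplitude * np.exp(-t / true_t2) * noise

amplitude, t2 = fit_mono_exponential(t, signal)
print(f"estimated T2 = {t2:.3f} s")
```

A log-linear fit is used here only because it needs no starting values; with additive noise or several components, a non-linear or regularized inversion would be the more appropriate choice.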

NMR-based profiling methods

Because experimentally obtained NMR spectra might include useful and valuable information (relationships between different samples) that is not obvious, data science approaches, such as multivariate statistical analysis[90] and machine learning, should be incorporated into the analytical flow of metabolomics studies. Within the analytical flow, data pre-processing steps are very important[91] and typically include baseline correction, alignment, binning, normalization and scaling.[92] Correction of baseline distortions[93] and peak alignment[94] are necessary steps in spectral processing because these factors affect intensity values and result in inaccuracy in peak assignment and quantification. In the spectral bucketing or binning approach, the spectrum is split into small bins or buckets containing a range of variations for a specific peak shift, and the intensity of each bucket is calculated as the area under the curve.[95] Scaling and normalization are also important in data pre-processing because extreme differences in scales among multiple parameters can obscure the data, leading to inadequate interpretation.[96] Typically, an internal standard, such as 4,4-dimethyl-4-silapentane-1-sulfonic acid, is used to normalize datasets. Alternatively, z-scoring uses a standard normal distribution with the average set as 0 and the standard deviation as 1;[97] when analysing an entire spectrum, however, z-scoring is problematic in that it also normalizes noise. 
Lastly, the method of probabilistic quotient normalization is not susceptible to outliers because the median of numerous estimated values is used.[98] An essential component of NMR metabolomic studies is multivariate analysis, with methods falling roughly into two categories: unsupervised (e.g., principal component analysis, PCA[99]) and supervised (e.g., partial least squares, PLS[100]).[101] PCA is the most widely used method for data overview and trend (or cluster) identification in metabolomics, but it is also used to reduce dimensions in dataset pre-processing. Other representative, unsupervised, multivariate analyses include hierarchical clustering,[27] k-means clustering[102] and self-organizing maps.[103] Correlation-based analysis is also widely used in metabolomics studies.[104] In addition, market basket analysis, a statistical approach for identifying the co-occurrence of variables in datasets, has been applied to these studies.[105] Importantly, in terms of the exposome, metabolomics has great potential in the diagnosis and monitoring of disease using samples such as blood, saliva, urine and faeces.
Recently, NMR metabolomic analysis of the human superorganism has provided insight into interactions between the host and intestinal microorganisms during the establishment of immunological homeostasis.[106-109] In addition, a Markov blanket-based feature selection method with an ecological–chemical–physical integrated network based on a Bayesian network inference algorithm has provided valuable information on fish home-range, and indicated that chemical and physical characterization of fish muscle can serve as an indicator for fish ecotyping and human impact monitoring.[110] Lastly, environmental homeostasis, which is growing in importance with the global decline in environmental health, has been evaluated on multiple levels using NMR,[26,111-115] while metabolomics analysis has been reported for diverse samples,[116-130] including food,[131,132] plant,[133-137] animal,[138] human,[92,139] bacterial,[140] virus[141] and environmental samples.[142,143]
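The pre-processing and overview steps described in this section (binning, probabilistic quotient normalization and PCA) can be sketched with NumPy as follows. This is a minimal illustration on simulated spectra, not the pipeline of any cited study: the function names are hypothetical, the bins have a fixed width in data points, and PQN is applied directly to the binned data without the total-area normalization that often precedes it in practice.

```python
import numpy as np

def bin_spectra(spectra, points_per_bin):
    """Sum intensities in fixed-width bins (a proxy for area under the curve)."""
    n_samples, n_points = spectra.shape
    n_bins = n_points // points_per_bin            # trailing points that do not fill a bin are dropped
    trimmed = spectra[:, :n_bins * points_per_bin]
    return trimmed.reshape(n_samples, n_bins, points_per_bin).sum(axis=2)

def pqn_normalize(binned):
    """Probabilistic quotient normalization against the median spectrum."""
    reference = np.median(binned, axis=0)
    quotients = binned / reference                 # assumes strictly positive reference bins
    dilution = np.median(quotients, axis=1)        # one robust dilution estimate per sample
    return binned / dilution[:, None]

def pca_scores(data, n_components=2):
    """PCA via SVD of the mean-centred matrix; returns scores and explained-variance ratios."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Simulated data (all values are assumptions): six samples, two groups of three.
rng = np.random.default_rng(0)
profile = 1.0 + rng.random(120)                    # strictly positive baseline profile
spectra = np.tile(profile, (6, 1))
spectra[3:, 40:50] += 2.0                          # second group carries an extra signal
spectra *= np.array([1.0, 0.5, 2.0, 1.5, 0.8, 1.2])[:, None]   # sample-specific dilution

binned = bin_spectra(spectra, points_per_bin=10)
normalized = pqn_normalize(binned)
scores, explained = pca_scores(normalized)
print(scores[:, 0])                                # PC1 scores, one per sample
```

Because PQN takes the median quotient across bins, the single bin that differs between groups does not distort the dilution estimate, and the first principal component then reflects the group difference rather than sample dilution.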

NMR databases and analytical tools

In NMR data-driven science, spectrum processing, data pre-processing, data analysis and signal assignment are carried out after NMR data acquisition.[144-146] In the standard approach, the samples are prepared, and the data are acquired by 1D or 2D NMR. After analysis, the data are then matched to those in relevant databases to identify biomarkers relevant to the exposome.[147] For this purpose, many useful informatics tools have been developed,[5,13,148-150] such as NOREVA,[151] PhenoMeNal,[152] Plasmodesma[153] and MetaboAnalyst3.[154-156] Solid-state NMR analysis is difficult because the spectra are broad with overlapping peaks.[157] To improve sensitivity and resolution, several methods have been developed for spectral separation,[66,158] apodization, zero filling, linear prediction, fitting and numerical simulation[159] such as covariance analysis,[160] SIMPSON,[161] SPINEVOLUTION,[162] dmfit,[163] EASY-GOING deconvolution,[164] INFOS,[165] Fityk,[166] ssNake,[167] signal deconvolution methods using short-time FT and non-negative tensor/matrix factorization[168] and noise reduction based on PCA.[169] Identifying factors related to homeostasis and materials recycling in living organisms and environments can be achieved by using Cytoscape[170] to integrate and analyse various sources of information such as genomic, transcriptomic and metabolomic data, environmental factors (chemical and physical) and behaviour. This method has been shown to be useful for clarifying health maintenance,[171] factors involved in disease[172] and the impact of environmental change,[21] as well as for the further development of predictive algorithms.[173] NMR is a widely used analytical technique with a growing number of available repositories. As a result, there is a need for an open-data format that will aid long-term storage of NMR data, ease data comparisons and encourage sharing and reuse of NMR data. 
The NMReDATA database stores chemical shift values, signal integrals, intensities, multiplicities, scalar coupling constants, lists of 2D correlations, relaxation times and diffusion rates related to the raw and spectral data.[174] Nmrshiftdb2, an open-data repository for organic structures and their NMR spectra, can import and export NMReDATA.[175] Similarly, nmrML serves as a storage format for the MetaboLights data repository,[176] while NMR-STAR is the archival format used by the Biological Magnetic Resonance Bank (BMRB),[177] the International Repository of Biomolecular NMR Data and the Worldwide Protein Data Bank.[178] An NMR database based on linked open data described in Resource Description Framework (RDF) has been developed by the National Bioscience Database Center,[179] Protein Data Bank Japan[180] and BMRB.[181] Constructing databases of compounds enables metabolites to be identified from complex spectra.[182] In addition, various databases have been developed for NMR-based metabolomics research,[13] such as HMDB,[183] MMCD,[184] BML-NMR,[185] NAPROC-13,[186] Mery-B,[187] Spektraris[188] and SDBS.[189] Regarding database and informatics tools for polymers, Polyinfo[190] and Polymer Genome[191] have been established. Various programs are available for automatically searching NMR peaks of compounds in these databases, such as InterSpin,[66] which includes tools for pre-processing of low-resolution NMR and signal assignment of macromolecules; SpinAssign[192] for HSQC spectra; SpinCouple[193] for 2D J-resolved spectra; and ECOMICS to assign chemical shifts of lignin and hemicellulose.[194] Particularly for studies of homeostasis, SpinCouple can quantitatively analyse metabolic changes in a time-course series of samples. 
Such quantification over time reveals the dynamics of homeostasis not only for humans but also for surrounding environments such as external ecosystems.[195] Lastly, the web server COLMAR has been developed for the analysis of complex mixtures and provides several tools.[196-201] Python, R and MATLAB are the most popular programming language interfaces for data processing and machine learning. Regarding Python-based applications, NMRbot improves automated data acquisition and has tools for optimizing experimental parameters on the fly.[202] When used in combination with Python scientific libraries, nmrglue[203] and nmrpy[204] provide a highly flexible and robust environment for spectral processing, analysis and visualization; include common utilities such as linear prediction, peak picking and lineshape fitting; and can facilitate reading, writing and conversion of data stored in Bruker, Agilent/Varian, NMRPipe, Sparky, SIMPSON and Rowland NMR Toolkit file formats. The nmrstarlib package has been developed for accessing data stored in the BMRB.[205] Other Python tools for NMR metabolomics include FOCUS,[206] Farseer-NMR[207] and KIMBLE.[208] There are also many R packages for NMR data handling, pre-processing and analysis,[209] such as AlpsNMR,[210] speaq[211] and PepsNMR.[212]
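Two of the spectral processing steps named in this section, apodization and zero filling, can be illustrated on a simulated free induction decay using NumPy alone. The sketch below is a simplified assumption-laden example (a single resonance, exponential line broadening, no phase correction or first-point scaling); the function name and parameter values are hypothetical, and dedicated packages such as those listed above provide far more complete routines.

```python
import numpy as np

def process_fid(fid, spectral_width_hz, line_broadening_hz=1.0, zero_fill_to=None):
    """Apodize, zero-fill and Fourier-transform a complex FID.

    Returns the frequency axis (Hz) and the complex spectrum, both ordered
    from negative to positive frequency."""
    n = fid.size
    t = np.arange(n) / spectral_width_hz                       # acquisition times in seconds
    apodized = fid * np.exp(-np.pi * line_broadening_hz * t)   # exponential window
    size = max(n, zero_fill_to or n)
    padded = np.zeros(size, dtype=complex)
    padded[:n] = apodized                                      # zero filling
    spectrum = np.fft.fftshift(np.fft.fft(padded))
    freqs = np.fft.fftshift(np.fft.fftfreq(size, d=1.0 / spectral_width_hz))
    return freqs, spectrum

# Simulated FID: one resonance at +100 Hz with T2* = 0.1 s (assumed values).
spectral_width_hz, n_points = 1000.0, 1024
t = np.arange(n_points) / spectral_width_hz
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.1)

freqs, spectrum = process_fid(fid, spectral_width_hz, line_broadening_hz=1.0, zero_fill_to=4096)
peak_hz = float(freqs[np.argmax(spectrum.real)])
print(f"peak found near {peak_hz:.1f} Hz")
```

Zero filling from 1024 to 4096 points does not add information, but it interpolates the spectrum onto a finer frequency grid, which makes peak positions easier to read off.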

Machine learning in NMR

Machine learning (ML) approaches are integral to artificial intelligence (i.e., systems that imitate human intelligence) and are now used in many scientific fields. The two main types of ML algorithm are supervised learning and unsupervised learning.[213] In unsupervised ML, the system learns complex patterns more autonomously and identifies them with the purpose of summarizing, exploring and discovering. This approach requires only a few prior assumptions and little previous knowledge of the data. Four of the most commonly used methods for unsupervised ML in metabolomics data analysis are PCA, k-means clustering, hierarchical clustering and self-organizing map. By contrast, supervised ML trains on an annotated dataset and has a defined output. Its purpose is to determine an association between the response variable and the predictors (often referred to as covariates) and to make accurate predictions. Analyses involving discrete variables (e.g., control group vs. diseased group) are referred to as classification problems, while those involving continuous variables (e.g., metabolite concentration or gene expression level) are known as regression problems. Two of the most widely used methods for supervised ML in metabolomics are partial least squares-discriminant analysis (PLS-DA)[214] and support vector machine (SVM). PCA and PLS are both linear methods, but more complex non-linear ML methods, such as random forest (RF),[215-217] SVM[218-222] with a non-linear kernel, and artificial neural networks (ANNs), may be applicable to NMR-based metabolomics. Traditionally, the first step in building a predictive model, known as feature selection, is to convert raw data into a form that can be manageably processed. For metabolomics, this would be a matrix of metabolite concentrations. In the second step, the semi-manually extracted data are modelled by using a relatively simple predictive algorithm (e.g., ANN, SVM or PLS). 
In deep learning, both steps are incorporated into a single algorithm; this requires multiple layers of neurons that are stacked to sequentially deconvolve data from their raw state, abstract latent structure, and then effect prediction. With the shift towards interpretability, PLS has become the standard supervised multivariate method used by the metabolomics community.[223] For non-linear ML models, interpretability remains a major challenge; however, the need for high accuracy of classification may overshadow the need for model interpretability. Deep learning applications in NMR spectroscopy are based predominantly on three types of model: deep neural networks (DNNs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs).[224] DNNs are well suited for complex high-dimensional data analysis, including both feature extraction and mapping, CNNs are useful for analysing spatial information, including spectrum reconstruction and denoising, and chemical shift prediction, while RNNs are often used for tasks that require processing of sequential inputs, such as time-domain signals (Fig. 5). Because free induction decay signals and time-series NMR spectra data are sequential, RNNs can provide guidance for processing time-domain data and time change data. We have successfully developed several ML-based analytical approaches, namely, a prediction method for metabolic mixture signals,[225,226] DNN-mean decrease accuracy,[32] ensemble DNN,[33] variable selection for regional feature extraction,[34] and methods for evaluation of surface water,[14] impact estimation of food intake on mice,[227] evaluation of human daily dietary intake[228] and relaxometric learning[229] in metabolomics studies. 
DNN has also been used to reconstruct non-uniformly sampled NMR spectra[230,231] and for solubility prediction,[232] while CNN has been used to remove noise from NMR spectra[233] and to reconstruct NMR spectra from non-uniformly sampled data.[234] DN-Unet combines structures of encoders–decoders, and CNN can be used to suppress noise in liquid-state NMR spectra to enhance SNR.[235] Lastly, NMR-TS automatically identifies a molecule from its NMR spectrum, and discovers candidate molecules whose NMR spectra match the target spectrum by using RNN and density functional theory-computed spectra.[236]
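As a minimal sketch of the unsupervised analysis described above, PCA can be computed from the singular value decomposition of a mean-centred data matrix. The data below are synthetic "binned spectra" for two hypothetical sample groups; the group sizes, the noise level and the affected bins are assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_group, n_bins = 20, 50

# Synthetic binned spectra: both groups share a baseline profile,
# and group B has elevated intensity in bins 10 to 14 (an assumption).
base = rng.normal(0.0, 1.0, size=n_bins)
shift = np.zeros(n_bins)
shift[10:15] = 3.0
group_a = base + rng.normal(0.0, 0.3, size=(n_per_group, n_bins))
group_b = base + shift + rng.normal(0.0, 0.3, size=(n_per_group, n_bins))
X = np.vstack([group_a, group_b])
labels = np.array([0] * n_per_group + [1] * n_per_group)

def pca(X, n_components=2):
    """PCA via SVD of the mean-centred matrix: returns scores, loadings and explained variance ratios."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    explained = (s ** 2)[:n_components] / np.sum(s ** 2)
    return scores, loadings, explained

scores, loadings, explained = pca(X)

# Bins with the largest absolute PC1 loadings drive the group separation.
top_bins = sorted(np.argsort(np.abs(loadings[0]))[-5:].tolist())
print("PC1 explained variance ratio:", round(float(explained[0]), 2))
print("Bins with largest PC1 loadings:", top_bins)
```

The scores play the role of the sample map and the loadings indicate which spectral regions separate the groups; no group labels are used in the decomposition itself.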
Fig. 5

Conceptual diagram of the analysis of time-series data by a recurrent neural network, a machine learning method. Examples of time-series data include changes in the NMR spectral intensity of metabolites, changes in the microbiome, and changes in the weather. Imputing missing values is an important step in the analysis of time-series data.
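The handling of missing values in a time series can be sketched as follows. The caption does not specify an imputation method, so linear interpolation between neighbouring observations is used here purely as an assumed, simple choice; the function name `impute_linear` and the example values are hypothetical.

```python
import numpy as np

def impute_linear(times, values):
    """Fill NaN entries by linear interpolation between the observed time points."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    filled = values.copy()
    filled[missing] = np.interp(times[missing], times[~missing], values[~missing])
    return filled

# Hypothetical NMR signal intensity of one metabolite sampled daily, with gaps.
days = [0, 1, 2, 3, 4, 5]
intensity = [1.0, np.nan, 3.0, np.nan, np.nan, 9.0]
filled = impute_linear(days, intensity)
print(filled)  # [1. 2. 3. 5. 7. 9.]
```

Linear interpolation is reasonable only for short gaps in slowly varying signals; model-based imputation may be preferable before training a sequence model.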

Several libraries have been constructed to ease implementation of ML, including MXNet (https://mxnet.apache.org/), PyTorch (https://pytorch.org/), TensorFlow (https://www.tensorflow.org/), scikit-multilearn (http://scikit.ml/), Keras (https://keras.io/), classyfire,[237] MetNormalizer,[238] Caret[239] and mlr.[240] By parallelising code to run on graphics processing units (GPUs), ML and simulation may be accelerated, opening new frontiers in NMR data science. GPU computing has been applied to Monte Carlo simulation,[241,242] prediction of NMR chemical shifts,[243,244] calculation of the diffusion tensor for flexible molecules, deep learning for metabolomics,[245] de novo pulse sequence design in solid-state NMR,[246] reconstruction of non-uniformly sampled NMR spectra[230,231] and denoising.[235,247] GPU computing is readily available through a workstation-class machine or cloud computing services (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure).

Application of NMR data science towards global sustainability

Natural environment

The global environment is a product of the interwoven activity of earth, plants, animals, microorganisms and human society. In terms of the exposome, organic matter and nutrients contained in wastewater from agriculture and human society, as well as minerals from the earth, exert great influence and are essential for healthy biological and chemical cycles (e.g., of microbes, organic matter, nutrients and minerals) in nature (Fig. 6a). NMR data science has applications in the integrated analysis of organic, inorganic and microbial data as the environmental exposome related to eutrophication. In one such study, organic matter was measured by NMR signal intensities, inorganic elements by abundance measurements using ICP-OES, microbial groups by next-generation sequencing, and nitrogen compounds by elemental analysis.[16] Subsequent correlation network analysis revealed that peptide signals from the HSQC spectra of sediments affected by eutrophication were correlated with sulfate-reducing bacteria, sulfur, iron and nitrogen compounds (Fig. 6b). The HSQC spectra were analyzed by PCA, and the peptide signal in the sediments associated with eutrophication was related to geographic variation (Fig. 6c). The organic matter that accumulates in estuarine environments and provides habitats for various microorganisms can be analysed by using NMR techniques such as HSQC spectroscopy.[248] Dissolved organic matter (DOM) in water bodies has been extensively investigated by NMR-based approaches. 
For example, DOM in estuarine environments has been evaluated by 2D correlation spectroscopy using 13C NMR and FT-IR spectra,[249] solid- and multidimensional solution-state NMR have determined the chemical composition of river DOM,[250] NMR coupled with isotope analysis has revealed refractory proteinaceous compounds in lake DOM,[251] while DOM in a shallow aquifer has been characterized by coupling 13C NMR spectroscopy with elemental analysis and ultraviolet-visible absorbance spectroscopy.[252] The agglomeration of organic matter in estuarine ecosystems has been evaluated by advanced solid-state NMR, revealing that a fraction of organic matter is analogous in composition to the cell wall components of bacteria.[253] Although microbial ecosystems in aquatic environments are often studied by NMR-based metabolomic approaches, concentrations of metabolites tend to be low. Thus, water sampling and pre-treatment are key steps in sample preparation for NMR measurements. In this respect, a method for concentrating metabolites derived from low-density planktonic communities for NMR measurements has been reported.[254] In marine environments, the chemical structures and compositions of particulate organic matter in water column and marine sediments have been characterized by solid-state NMR.[255,256] For solution-state NMR measurements, a method for the solid-phase extraction of DOM has been proposed,[257] and improved guidelines for extraction have been recently reported.[258] An NMR-based approach combined with ultrahigh resolution MS has shown that carboxyl-rich alicyclic material is abundant in subfractions of DOM.[259] Combined with high performance liquid chromatography measurements, 2D and 3D NMR has demonstrated that oxidized sterols and hopanoids may be dominant in DOM.[260] Multidimensional NMR to detect long-range proton–carbon correlations has been used to identify material derived from linear terpenoids in DOM.[261] Lastly, the rise in seawater temperature in 
recent years, coupled with urban industrialization and coastal eutrophication caused by fertilizer inflow from rural areas, has led to a collapse of the marine microbial ecosystem, resulting in marine damage such as red tides.[14] In this regard, we have studied analytically and numerically a stochastic phytoplankton–toxic phytoplankton–zooplankton system.[262]
Fig. 6

Integrated analysis of organic, inorganic and microbial data as the environmental exposome related to eutrophication of estuaries. (a) Conceptual diagram of wastewater from agriculture and industry flowing into the estuary. (b) Results of correlation analysis using organic matter measured by NMR, minerals measured by ICP-OES, microbes measured by next-generation sequencing, and nitrogen compounds measured by elemental analysis. (c) PCA of the HSQC spectra of geographically and seasonally variable sediments. Red signals indicate a positive loading (PC2) calculated by PCA, whereas blue signals indicate a negative loading.
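The correlation network analysis summarized in panel (b) can be sketched as a thresholded correlation matrix across variables from different measurement platforms. The source does not state which correlation coefficient or cut-off was used, so Pearson correlation and a threshold of 0.8 are assumptions, and all variable names and data below are synthetic.

```python
import numpy as np

def correlation_edges(data, names, threshold=0.8):
    """Return (variable_a, variable_b, r) for every pair with |Pearson r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                edges.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return edges

rng = np.random.default_rng(1)
n_samples = 30
gradient = rng.normal(size=n_samples)  # hypothetical latent eutrophication gradient

# Columns mimic variables from different platforms (NMR, ICP-OES, sequencing).
data = np.column_stack([
    gradient + 0.1 * rng.normal(size=n_samples),  # peptide HSQC signal intensity
    gradient + 0.1 * rng.normal(size=n_samples),  # sulfur abundance
    gradient + 0.1 * rng.normal(size=n_samples),  # sulfate-reducing bacteria
    rng.normal(size=n_samples),                   # an unrelated element
])
names = ["peptide_HSQC", "sulfur", "sulfate_reducers", "unrelated_element"]

edges = correlation_edges(data, names)
for edge in edges:
    print(edge)
```

Each returned tuple corresponds to one edge of the network; in practice, variables would be scaled per platform and the correlations corrected for multiple testing before interpretation.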

Agriculture

Plants and algae that live in natural environments are often precious resources that maintain the ecosystem.[263-265] In addition to providing food, they are sources of numerous compounds from low molecular weight molecules to polymers, have medicinal value, and provide energy.[266-268] NMR can help to target breeding strategies; for instance, NMR has been used to compare eight fleshy fruit species during development and ripening in order to understand the mechanisms that link metabolism to phenotype.[269] NMR-based metabolomics has been applied to the study of other foodstuffs, including truffle, kiwifruit, lettuce and sea bass.[270] In combination with multivariate analysis, NMR-based metabolomics can be used to characterize differences based on geographical origin and plant varieties. For example, NMR metabolomics has been used to profile tomato variety, fruit development, ripening, quality, geographical origin, and daily and seasonal changes.[43,271] ICA and PLS-DA are superior to classical PCA for the verification of rice authenticity.[272] In addition, lignocelluloses in plants change significantly in response to environmental changes; for example, a defence mechanism that prevents the invasion of microbes hardens cell walls by changing the composition of lignin.[273] Therefore, structural characterization of these polymers is important for effective production of biomass as a resource. 
Solid-state NMR can be used to determine the composition of insoluble polymers and biomass; in this respect, several studies using 13C-labelled Arabidopsis thaliana have been reported.[274-279] Lastly, HR-MAS enables the direct application of NMR spectroscopy to semi-solid and gel-like samples,[60] and can evaluate differences in chemical composition caused by geographical factors.[280] The major advantage of HR-MAS NMR over liquid-state NMR is that it requires no extraction step, which can lead to the loss of signals from non-soluble metabolites.[281] In agriculture, plant growth is affected by exposome factors such as the soil, rhizosphere and weather conditions (Fig. 7a). Relationships among numerous environmental factors can be explored by using NMR-based metabolic profiling with integrated analysis of various measurement data. For example, to examine an experimental agricultural field (“agroecosystem”), a data-driven approach has been used to evaluate the agricultural exposome in terms of the metabolome (an examination of various metabolites using NMR), ionome (the distribution of elements), microbiome (a comprehensive survey of the microbial profile) and phenome (an examination of plant phenotypes) (Fig. 7b).[282] In this case, NMR spectra of plant and soil detected 13C-labelled bondomers (i.e., succinic acid and proline) metabolized from root absorption of 13C-alanine (Fig. 7c). The network analysis revealed the significant role of organic nitrogen in increasing agricultural crop yield in the agroecosystem.
Fig. 7

Data-driven approach for evaluating the agricultural exposome. (a) Conceptual diagram of time-series variables such as the phenome, plant metabolome, rhizosphere microbiome, ionome, soil metabolome, and environment in the agricultural ecosystem. (b) Conceptual diagram of integrated network analysis of multi-omics data including the metabolome, phenome, microbiome and ionome. (c) Regions of interest in HSQC spectra of 13C-labelled compounds (i.e., succinic acid and proline in plant) metabolized from absorbed soil alanine and detected by integrated network analysis. 1JCC values are shown in blue. Deducible 13C–13C bondomers are indicated by red lines.

Soils contain numerous types of organic matter, ranging from small molecules to highly complex supermolecular structures.[283] Microorganisms and their degradation products are also important chemical components of the soil biomass.[284] In environmental research, NMR-based approaches have been applied to evaluate and characterize soil organic matter,[285] including chemical components and structures. For instance, a solution-state NMR study revealed that biomacromolecules derived from microorganisms and plants are the major components in soil humus,[286] a finding supported by another solution-state NMR study that used a solvent system solubilizing 70% of humin (a component of humus).[287] Solid-state NMR has also been applied to the characterization of organic matter using various approaches. For example, direct polarization (DP)-MAS, CP-MAS, DP-total suppression of spinning sidebands (TOSS), CP-TOSS and 1H–13C HETCOR have been used to characterize the chemical structures and composition of organic matter in peat,[288,289] anaerobically digested biosolids,[290] and terra preta and prairie soils.[291] NMR-based experiments have also evaluated sorption selectivity and adsorption properties in natural organic matter.[292-294] Both solid- and solution-state NMR measurements have been used to characterize the cellulosic supermolecular structures of rice straw and their degradation profiles produced by a microbial community in the paddy field.[295] Interestingly, the same chemical components of the rice straw biomass induced distinct metabolic reactions and reaction rates in different supermolecular structures when different key microbes were involved in degrading the plant biomass.[295] Moreover, an integrated metabolic dynamics approach using both 1D 1H and 1D 13C NMR spectra with time-course analyses has been proposed for the characterization of soil microbial activities in field environments.[296] Another NMR-based metabolomics study has compared agricultural and 
native state soils.[297] Lastly, NMR-based metabolomics has been used to analyse the interactions between soil microbial communities and plants, showing that soil microbial communities and their metabolites change in accordance with physical and chemical properties of the soil that are related to plant growth.[23]

Fisheries and aquaculture

Aquaculture provides an increasingly important source of protein for human consumption. In recent years, aquaculture technology has developed, and culture methods have become more intensive; however, there remain several issues, such as the aquaculture environment, feeding and disease. As an example of application of the exposome paradigm to aquaculture, the “muscle quality” of natural fish varies depending on the water temperature and nutrients in the environment where it grows, as well as the plankton and small fish that feed in this environment; as a result, the market value of fish meats also varies greatly.[110,298] Thus, “muscle quality” analysis data of natural fishes can be used to determine their origin and will contribute to maintaining and evaluating the market value. To this end, we have focused on the recent progress in NMR instruments not only for high-end laboratory analysis but also for bench-top analysis, and have developed a peak separation method based on multivariate analysis.[299] However, if a high-end instrument is used, it is possible to more clearly discriminate changes in chemical composition in the muscle due to differences in the living environment and food. Therefore, by introducing cutting-edge machine learning calculations into NMR analysis for evaluating the exposome during fish lifetime, we can extract important factors (i.e., NMR peaks of metabolites) involved in determining the origin of natural fish (Fig. 8).[32] An improved DNN-based analytical approach that incorporates an importance estimate for each variable based on mean decrease accuracy (DNN-MDA) achieved the best classification accuracy (97.8%) among the examined methods, which included PLS, SVM and RF. 
In addition, the DNN-MDA approach facilitates the identification of important variables such as trimethylamine N-oxide, inosinic acid and glycine, which are characteristic metabolites that contributed to the discrimination of geographical differences between fish caught in the Kanto region and those caught in other regions. The ensemble DNN (EDNN) approach has been applied to metabolomics data of fish muscles collected from Japanese coastal and estuarine environments.[33] The performance of EDNN regression for fish size based on metabolic profiles was superior to that of the DNN, RF and SVM algorithms.
Fig. 8

Data-driven approach for evaluating the exposome during fish lifetime. Several machine-learning algorithms, such as partial least squares (PLS), support vector machine (SVM), random forest (RF), and deep neural network (DNN), can be applied to 1H NMR spectra from goby muscle to (1) classify fish origins and (2) extract important factors based on features of goby muscle metabolites derived from their habitat environment. In the figure, FAs refers to fatty acids, IMP refers to inosinic acid, PC refers to phosphatidylcholine, PUFAs refers to polyunsaturated fatty acids, and TMAO refers to trimethylamine N-oxide.
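The idea behind a mean decrease accuracy importance estimate can be sketched as permutation importance: each variable is shuffled in turn and the resulting drop in classification accuracy is recorded. The sketch below is not the published DNN-MDA implementation. A nearest-centroid classifier stands in for the deep neural network so that the example stays short and dependency-free, and the data, function names and number of repeats are assumptions.

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in classifier (an assumption, not a DNN): store one centroid per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X):
    """Assign each sample to the class with the nearest centroid."""
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def mean_decrease_accuracy(model, X, y, n_repeats=20, seed=0):
    """Average drop in accuracy after randomly permuting each variable in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(model, X) == y)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(model, X_perm) == y))
        importance[j] = np.mean(drops)
    return importance

# Synthetic data: 5 hypothetical metabolite peaks, two fish origins;
# only peak 2 differs between origins.
rng = np.random.default_rng(42)
n = 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 5))
X[:, 2] += 4.0 * y

model = fit_centroids(X, y)
importance = mean_decrease_accuracy(model, X, y)
print(np.round(importance, 3))
```

For brevity the importance is evaluated on the training data; a held-out set would give a less optimistic estimate. The same loop applies unchanged to any classifier that exposes a prediction function.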

As another example, fish meal is mainly used for feed in aquaculture, but only about 20% of the nitrogen and phosphorus content is assimilated into fish, while the other 80% enters the environment as residual food and faeces. To lower the environmental load, therefore, the development of both feed[300] that does not rely on fish meal and digitization technology for evaluation of ecosystems is required. In addition, nutrition for high trophic species in aquaculture is faced with the problems of sustainable fish and plant-based diets. Insects seem particularly promising for supplementing fish and plant-based diets. In this respect, 1H-NMR metabolomics profiling has been used to compare metabolites in insect diets and in fish plasma, liver and muscle,[301] and to evaluate growth performance and feed conversion ratios in fish fed plant-based, commercial, insect, spirulina and yeast diets.[302] Aquaculture has been hampered by a scarcity of biological knowledge of the farmed animals and their symbiotic gut bacteria. However, high-throughput NMR-based technologies or “omics” for investigating the genome, transcriptome, proteome, metabolome and microbiome are being gradually applied to aquaculture, and may improve current farming technology. For example, gut microbiota affects both nutrient acquisition and energy homeostasis in the host, and is thought to play an important role in the immune system. Owing to the flow of water through the digestive tract, the gut microbiota of fish and shellfish are especially dependent on the external conditions and change markedly in response to environmental and biological stimuli.[303] NMR-based metabolomics can be used to characterize such changes in gut microbiota. For instance, one NMR-based metabolomics study has shown that dietary conditions strongly affect the gut microbiota in fish;[24] notably, changes in the feeding of Epinephelus septemfasciatus were found to affect the metabolic and microbiota profiles of faeces. 
Similarly, in host and symbiotic metabolic analyses of wild yellowfin goby[15] and 24 diverse natural fish species living in the Kanto and Tohoku regions of Japan,[34] we found that the microbiota and metabolic profiles of faeces showed clustering trends based on feeding conditions. Thus, as an aggregate of final metabolic products, faeces is affected by diet. Given that the intestinal environment is reflected in the faeces of fish, this type of analysis might be an informative non-invasive technique for use in aquaculture. We have estimated the dietary components of several eel larvae species and the marine environment in the western North Pacific using metagenomic analysis.[304] In the future, metabolomic analysis using NMR may lead to an improvement in the early survival of eel larvae during aquaculture. Lastly, another NMR-based metabolite profiling study has evaluated time-series data under different feeding conditions in the rearing environment of leopard coral grouper to identify fish nutritional biorhythms and improve feeding in aquaculture.[300] As a radical solution to environmental problems, closed-circulation aquaculture, in which fish and shellfish are cultivated in limited water on land, is drawing attention. By controlling the closed environment, it is possible to attain high growth, modify fish meat components, improve quality, and build an automated system for breeding. In these systems, ammonia is highly toxic and must be monitored and removed; therefore, wastewater treatment is an important issue. Aquaponics, a resource-recycling type of food production system, is an innovative and sustainable hydroponic plant production system using fish waste (ammonia) that can play an important role in the future of environmental and socio-economic sustainability.[305] In addition, fish wastewater can be optimized as fertilizer to meet the specific demands of plant species.[306]

Wastewater treatment in industry

Just as inadequate removal of feed in the environment causes eutrophication in rivers, lakes and coastal environments, wastewater treatment is also important for environmental health. In other words, inadequate wastewater treatment can be regarded as the “exposome for an aqueous environment”, including the flow of nitrogen and phosphorus downstream (Fig. 9). For example, in a practical wastewater treatment process, 31P NMR and the extreme gradient boosting (XGBoost) machine learning method were applied to model the phosphorus release performance of biofilm at different pH values.[307] The application of NMR spectroscopy to characterization of wastewater has been previously reviewed.[308] Here we focus on environmental metabolomics studies of microbial ecosystems utilized in industrial processes such as wastewater treatment and bioremediation. Anaerobic microbial ecosystems are particularly useful in industry because they can degrade supermolecular assemblies into small molecules. To characterize the degradation processes in anaerobic microbial ecosystems, NMR-based metabolomic approaches have been used in conjunction with microbial community analyses. For example, to determine “metabolic sequences” (i.e., metabolic dynamics over time) of anaerobic fermentation, correlation-based analysis has been used to integrate data from metabolic profiling (evaluated by NMR-based metabolomics) with data from microbial community profiling[104] (as evaluated by denaturing gradient gel electrophoresis (DGGE) fingerprinting[309]). 
This approach enabled the entire set of metabolic processes to be visualized with their responsible factors in an anaerobic microbial ecosystem.[104] Stable isotope probing is a useful tool for characterizing and identifying key factors in microbial ecosystems,[310] and has been combined with DGGE-NMR analysis to both monitor metabolic variations of microbial ecosystems and capture transient metabolic fluxes with their responsible factors within microbial communities.[311] NMR spectroscopy is also well-suited to characterizing cellulose polymer degradation into biogases by anaerobic microbial ecosystems because it can measure solid-, solution- and gas-state compounds.[312] Indeed, the degradation of cellulose polymers and production of methane gas via organic compounds in anaerobic microbial ecosystems has been successfully demonstrated using triple-phase (solid-, solution- and gas-state) NMR. The detailed metabolic reactions of degradation processes have been characterized by NMR-based metabolomics combined with metagenomics using stable isotope labelling, identifying key metabolites, enzymes and microbes in the ecosystem.[313]
Fig. 9

Data-driven solutions for wastewater treatment. Conceptual diagram of wastewater discharged from factories and aquaculture as the “exposome for an aqueous environment”. Operational data such as pH, dissolved oxygen (DO) and electrical conductivity (EC) are shown as examples of time-series data, together with NMR spectra of macromolecules and biofilm microflora data analyzed by principal component analysis (PCA). Finally, a conceptual diagram of data-driven analysis for water purification using a database of these data is shown.

Regarding the operation of wastewater treatment, the main obstacle preventing wider use of membrane bioreactors (MBRs) is membrane fouling (i.e., deterioration of membrane permeability), which increases operating costs. Thus, an understanding of the mechanisms of membrane fouling is important. In the study of membrane fouling, NMR has been used to investigate how the nature of the membrane foulant changes depending on the food-microorganism ratio (proteinaceous)[314] and on the season (proteinaceous at high temperatures, and polysaccharide-like or humic acid-like substances at low temperatures).[315] In these papers, consistent chemical structures were detected in both the 13C CP-MAS NMR spectra (proteins (55 and 175 ppm), carbohydrates (75 and 105 ppm), aromatic carbons (110–165 ppm)) and the FT-IR spectra (the amido-I and -II bands and aromatic C=C of proteins (1660, 1540 and 1620 cm−1), carbohydrates (1100 cm−1) and humic substances (1400 cm−1)). In an approach to non-invasively elucidate fluid–structure interactions in complex multispecies biofilms, pulsed-field gradient NMR has been used to measure the water diffusion in five different types of biomass aggregate, including sludge flocs, biofilms and granules.[316] Because the removal or degradation of particulate organic matter is a crucial part of biological wastewater treatment, particle transport into granular sludge beds has been visualized in 3D by MRI.[317] Lastly, NMR has been used to tune the hydrophobicity and fouling-liable properties of polyethersulfone membranes.[318]
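The 13C CP-MAS chemical shift assignments quoted above can be encoded as a small lookup, which is one way to label picked peaks by foulant class. The shift values are those given in the text; the matching tolerance of 5 ppm and the function name `assign_foulant_class` are assumptions for illustration only.

```python
# Approximate 13C CP-MAS assignments quoted in the text (ppm).
PEAK_CLASSES = {
    "protein": [55.0, 175.0],
    "carbohydrate": [75.0, 105.0],
}
AROMATIC_RANGE = (110.0, 165.0)
TOLERANCE_PPM = 5.0  # assumed matching window, not taken from the source

def assign_foulant_class(shift_ppm):
    """Return every class whose reference shift or range matches the observed shift."""
    matches = []
    for name, centres in PEAK_CLASSES.items():
        if any(abs(shift_ppm - centre) <= TOLERANCE_PPM for centre in centres):
            matches.append(name)
    low, high = AROMATIC_RANGE
    if low <= shift_ppm <= high:
        matches.append("aromatic carbon")
    return matches

# Hypothetical observed peak positions.
for shift in (56.2, 74.0, 130.5, 20.0):
    print(shift, assign_foulant_class(shift))
```

A list is returned rather than a single label because the assumed windows can overlap (for example near 110 ppm), in which case the assignment is ambiguous and should be resolved with complementary data such as FT-IR.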

Biomass degradation

Recycling in an ecosystem occurs through symbiosis between termite gut microbial communities, specific protists and fungi, and soil microbial communities. Plant-derived saccharides are degraded by soil microbial communities and eventually return to the soil.[296] Lignocellulose, a high-order structure and sparingly soluble polymer mixture formed from polysaccharide cellulose, hemicellulose and the polymer lignin, is the most abundant biomass on land.[319-321] The carbon source produced by trees is decomposed by the intestinal microbial community of termites, is further metabolized to amino acids through nitrogen fixation by the intestinal microorganisms, and eventually becomes the proteins that make up the termite body.[322] Furthermore, the decomposition of macromolecular complexes such as lignocellulose (comprising cellulose, hemicellulose and lignin) is an important process in the circulation of various elements. The carbon source from plants and nitrogen in the air are decomposed and metabolized to amino acids via nitrogen fixation by the gut microbes of termites. Carbon and nitrogen circulation via their faeces and remains represents the terrestrial exposome, and enriches and affects the forest and aquatic ecosystem (Fig. 10a).[323] Web tools and NMR databases for biomass macromolecules are useful for evaluating the terrestrial exposome, for example, through the evaluation of plant composition by solution NMR of ball-milled samples dissolved in dimethyl sulfoxide (DMSO)[324] and the evaluation of plant cell wall quality by solid-state NMR[320] (Fig. 10b). For example, Bm-Char in ECOMICS[194] can be used to characterize the HSQC spectra of biomass macromolecules measured by solution NMR (Fig. 10c). The database comprises 42 and 17 signals for, respectively, aromatic and aliphatic sites of lignin, and 26 signals for hemicellulosic sites and three uncategorized sites. Biomass samples are characterized by their composition of biomass macromolecules such as lignin and hemicellulose (Fig. 10d).
The resulting pie-charts are categorized according to ‘detailed category’ items described on the Bm-Char website: namely ‘syringyl’, ‘syringyl (oxidized alpha-ketone)’, ‘guaiacyl’, ‘guaiacyl (oxidized alpha-ketone)’, ‘p-hydroxyphenyl’, ‘ferulate’, ‘p-coumarates’, ‘cinnamyl alcohol end group’ and ‘p-hydroxybenzoates’ for aromatic sites of lignin; ‘β-O-4’, ‘β-O-4-S’, ‘β-O-4-H/G’, ‘β-5’, ‘β–β’ and ‘5–5/4-O-β’ for aliphatic sites of lignin; ‘acetylated xylopyranoside’, ‘xylopyranoside’, ‘xylopyranoside + glucopyranoside’, ‘glucopyranoside’, ‘galactopyranoside’, ‘arabinofuranoside’, ‘mannopyranoside’, ‘fucopyranoside’ and ‘methyl-glucuronic acid’ for hemicellulosic sites; and ‘others’. The humus that accumulates in soil after biomass decomposition can adsorb metal elements, which are then taken up by detritus-eating animals that ingest the humus and are responsible for the bioaccumulation of various metals. Plankton and algae such as E. gracilis[319] are deeply involved in the cycle of materials and the food chain of aquatic ecosystems. The chemical components contained in seaweeds are used in, for example, foods, feeds, fertilizers and fine chemicals, and the adsorption of metal by polysaccharides is also attracting attention in water treatment processes. However, the composition of seaweeds is known to fluctuate with the seasons and cannot be controlled by humans. To capture the complex seasonal changes in the components of naturally growing seaweeds, it is necessary to comprehensively measure the components over time and evaluate them in an integrated manner.[21]
Fig. 10

Characterization of biomass macromolecules as the terrestrial exposome. (a) Conceptual diagram of biomass macromolecules related to carbon and nitrogen circulation as the terrestrial exposome. (b) Development of a macromolecule database including solution and solid-state NMR data. In the case of solution NMR, biomass samples are ball-milled and dissolved in DMSO. (c) Overlay of the lignin aromatic region of 2D 1H–13C HSQC spectra of poplar (brown), Japanese cedar (light brown) and Erianthus sp. (green). Lignin signal assignments and their chemical structures are highlighted along with corresponding cross peaks. The assigned signals in the figure are syringyl (S2/6), guaiacyl (G2, G5 + G6), ferulate (FA2, FA6), p-hydroxyphenyl (H2/6) and p-coumarate (pCA2/6, pCA3/5, pCA/8). (d) Pie-charts representing the percentage of biomass-related signals of 11 plant samples (i.e., grasses: Erianthus, napier grass, guinea grass, Brachypodium, rice, and wheat, herbs: Arabidopsis, and trees: sudajii, Japanese cedar, and poplar). Pie-charts are categorized according to ‘detailed category’ items described on the Bm-Char website (http://ecomics.riken.jp/biomass/).

NMR data science can be applied to investigations of these anaerobic microbial communities. For instance, the microbes responsible for the digestion of plant saccharides into short chain fatty acids (SCFAs) have been investigated in several systems, including animal gut,[37,38] waste management,[104,312,313] and paddy field soil.[295] These symbiont-derived SCFAs have attracted interest because they may be essential for survival of the plant or animal host, which in many cases has evolved to utilize the metabolic activity of its microbial symbionts. SCFAs are also important because they act as electron donors and can reduce heavy metals such as Fe(III) and Mn(IV) under different environmental conditions. Moreover, SCFA-producing bacteria such as Lactobacillus spp. stimulate plant germination, and root and shoot growth in soil ecosystems; thus, SCFAs and the bacteria that produce them are central to the soil environment. On the whole, however, little information is available on the biomass degradation process, especially the dynamics of soil microbial communities and SCFA metabolism. In this respect, metabolic profiling based on NMR represents a powerful tool for the analysis of complex reactions in microbial ecosystems. This technique has been used to characterize total metabolic activity and identify biomarkers in a wide variety of systems.[325-327] Furthermore, the introduction of 13C-nuclei to NMR analysis has been a useful technological advance for the resolution of overlapping signals of saccharides that limit 1H-NMR.[328] Advantages of the 13C-labelling method also include the characterization of compounds in complex components, the determination of compound structures, and the tracking of microbial metabolic pathways by NMR or other analytical techniques.[328] We have previously successfully performed 13C-labelling in plants[329,330] and microbial systems,[331] as well as metabolic profiling using 2D and 3D NMR techniques.[276,332]

Biodegradable plastics

Plastics have been important components of disposable bottles and synthetic fibres; however, the highly stable microplastics[333] that result from the discharge of plastic waste into the environment and its outflow from rivers to oceans have led to a worldwide pollution problem affecting the marine exposome (Fig. 11a).[334-337] The degradation process of plastics can be investigated by in vitro experiments. Biodegradable plastics, which can decompose in the environment, are a promising solution for the environmental problems caused by disposal of plastics.[338-341] Environmental degradation rates and pathways for the major types of thermoplastic polymer have been investigated.[342] As alternatives to petroleum resources, microbial products and plant biomass can be used to produce macromolecular materials such as plastics and feedstock.[343] For example, polymers such as polylactic acid (PLA),[344] poly-ε-caprolactone[345] and cellulose[62-64,312,313,320,346,347] have been prepared as high-performance materials with various properties. In developing such materials, it is necessary to analyse microbial and plant biomass as complex biochemical systems with multiple macromolecular components. In this respect, NMR spectroscopy can characterize the native structure, components and dynamics of these samples at the atomic level (Fig. 11b).[5,55] In addition, time-series NMR measurements produce 1H anisotropic and 13C spectra as descriptors of structure and property during the biodegradation process.
Fig. 11

Data-driven approaches to the development of biodegradable plastics. (a) Plastic pollution of the ocean from human society is a problem affecting the marine exposome. The degradation process of plastics can be investigated by in vitro experiments. (b) NMR measurements of 1H anisotropy and 13C spectra can detect chemical structures and dynamics of solid-state materials. (c) In the development of biodegradable plastics, data-driven scientific methods such as Bayesian optimization and Generative Topographic Mapping Regression (GTMR) can predict (1) NMR spectra after degradation of materials and (2) NMR spectra of materials with target properties.

The higher-order structure of materials has a significant influence on their macroscopic properties.[348] Traditional trial-and-error design approaches for new materials face significant challenges due to the vast design space. In addition, computational technologies, such as density functional theory[225] and molecular dynamics,[62] are usually computationally expensive; thus, it is difficult to calculate molecular structures from material properties. To address these problems, ML-assisted materials design is emerging as a breakthrough tool in many areas of science.[349] Furthermore, a materials informatics approach has been recently considered for materials design.[350] In this approach, “big data” deposited in databases, as well as higher-order structural data obtained during the materials production process,[85,190] are used to predict physicochemical properties of new materials. Furthermore, moulding conditions are important in developing a material with desired physical properties. In this regard, NMR, especially low magnetic field NMR which is used for routine material evaluation, produces vast datasets[66] that can be used in the prediction of NMR signals to aid in the cycle of developing materials and identifying structures with desired properties. Coupled with NMR measurements, inverse design using ML can be used to design polymers with desired phase behaviour. 
For example, ML has been used for cloud-point engineering of polymers, whose compositions were determined by 1H NMR.[351] NMR chemical shifts are rich in information and encode structural features of the molecules contributing to the physical, chemical and biological properties of a material; thus, they have also been used as a descriptor in quantitative structure–activity/property relationship modelling studies.[352] In designing higher-order structures of biodegradable plastics, physical properties such as glass transition, melting and degradation temperatures (Tg, Tm, and Td) related to biodegradability and durability should be controlled (Fig. 11c). Bayesian optimization has been applied in real and virtual degradable experiments of bioplastics.[353] From the results of degradation experiments, the weight average molecular weight was obtained as the objective variable, while initial crystallinity, moisture content in the compost environment, degradation period and NMR spectra were used as explanatory variables. These values were subjected to a decomposition degree predictive model consisting of ML. The optimum decomposition degree, various analytical values, and experimental conditions were explored by using a combination approach with a decomposition-degree predictive model and Bayesian optimization. 
For efficient material design based on the iterative cycle of preparation, measurement of samples and analysis of measured data, GTMR has also been applied to the analysis of CP-MAS spectra to predict the 13C NMR spectrum of a material in its solid state based on its thermophysical properties.[168] Microalgae (e.g., Chlamydomonas reinhardtii and Phaeodactylum tricornutum) are also suitable for environmental and biotechnological applications based on the exploitation of solar light.[354] Polystyrene is generally considered to be durable and resistant to biodegradation, but recently biodegradation of polystyrene by the enterobacteria of mealworms has been reported as a new way to deal with plastic waste.[355] Bioderived polyesters such as PLA are interesting alternatives to petrochemical-based plastics.[356] PLA is unlikely to hydrolyse in the low temperatures of marine environments; however, the carbonyl group of PLA absorbs UV radiation below 280 nm, making the polymer susceptible to photodegradation in marine environments. Lastly, biodegradation of polyethylene microplastics by the marine fungus Zalerion maritimum has been reported on the basis of FT-IR, NMR and scanning electron microscope data.[357]
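
The combination of a predictive model with Bayesian optimization described above can be illustrated with a minimal sketch. It is not the method of ref. 353: the single explanatory variable (a normalized condition such as moisture content scaled to 0–1), the toy response function, the Gaussian-process surrogate with an RBF kernel, and all function names are assumptions made for illustration only. The sketch fits a surrogate to the conditions tried so far, scores untried conditions by expected improvement, and runs the most promising "virtual experiment" next.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar conditions."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Zero-mean Gaussian-process posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - best) * cdf + std * pdf

def toy_decomposition_degree(x):
    """Hypothetical response with a single optimum at x = 0.6 (not real data)."""
    return math.exp(-((x - 0.6) ** 2) / 0.02)

def bayes_opt(objective, n_iter=10):
    """Pick each next condition on a 0..1 grid by maximizing expected improvement."""
    xs = [0.0, 0.5, 1.0]                      # initial "experiments"
    ys = [objective(x) for x in xs]
    grid = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        best = max(ys)
        candidates = [g for g in grid if g not in xs]
        x_next = max(
            candidates,
            key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best),
        )
        xs.append(x_next)
        ys.append(objective(x_next))
    i = ys.index(max(ys))
    return xs[i], ys[i]

if __name__ == "__main__":
    print(bayes_opt(toy_decomposition_degree))
```

In a real study the toy response would be replaced by measured or model-predicted values, and the scalar condition by a vector of explanatory variables such as initial crystallinity, moisture content, degradation period and NMR spectral descriptors, which would call for a multidimensional kernel and a library implementation rather than this hand-written solver.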

Conclusions and future perspective

In the move towards a sustainable society to meet global population growth, the exposome paradigm based on NMR data science will play an important role in comprehensively evaluating and predicting environmental health in terms of systemic homeostasis and resource balance. Because environmental homeostasis (i.e., environmental health) is largely affected by the “exposome” of all substances over time, NMR approaches have the advantage of being able to characterize fluctuations in molecular complexity in a time-course manner based on a data science approach.[13,358] NMR measurements also support diverse samples from polar to non-polar solvent systems, and from small molecules to supermolecular assemblies; they can also be applied to various physicochemical states, including gas, sol, gel and solid samples, facilitating structure, component, interaction, adsorption and diffusion analyses. In particular, solid-state NMR has a unique advantage for the evaluation of higher-order structures and dynamics, such as those of crystalline and amorphous biomass and polymer materials. Because NMR produces vast amounts of data with inter-institution compatibility, it is highly suitable for data science using informatics techniques such as multivariate analysis, ML and database analysis. Many targeted analyses focus on a single compound or type of compound that requires pre-processing procedures for sample preparation, pre-treatment and extraction. In contrast, owing to its high reproducibility, quantitative nature and high throughput, the NMR-based non-targeted approach is superior for analysing both large-scale data from time-series samples in environmental evaluation and multiple samples generated in polymer development and manufacture. 
With advances in informatics technology for pre-processing of low-resolution NMR data, cost-effective benchtop NMR will become more applicable to on-site assessments of human health, natural environment, agriculture, aquaculture and industry, while artificial intelligence and robotics for automation of research and development will be a key technology of NMR data science. The challenge is to utilize data that are linked across the natural environment and human activities throughout society, and to develop and implement data utilization methods.[181]

Conflicts of interest

There are no conflicts to declare.
  295 in total

1.  Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics.

Authors:  Frank Dieterle; Alfred Ross; Götz Schlotterbeck; Hans Senn
Journal:  Anal Chem       Date:  2006-07-01       Impact factor: 6.986

2.  Enhanced efficiency of solid-state NMR investigations of energy materials using an external automatic tuning/matching (eATM) robot.

Authors:  Oliver Pecher; David M Halat; Jeongjae Lee; Zigeng Liu; Kent J Griffith; Marco Braun; Clare P Grey
Journal:  J Magn Reson       Date:  2016-12-23       Impact factor: 2.229

3.  Fast Wide-Line Solid-State NMR on a Low-Cost Benchtop Spectrometer.

Authors:  Morten K Sørensen; Nicholas M Balsgart; Ole Jensen; Niels Chr Nielsen; Thomas Vosegaard
Journal:  Chemphyschem       Date:  2018-10-09       Impact factor: 3.102

Review 4.  NMR window of molecular complexity showing homeostasis in superorganisms.

Authors:  Jun Kikuchi; Shunji Yamada
Journal:  Analyst       Date:  2017-11-06       Impact factor: 4.616

5.  Biodegradability of Plastics: Challenges and Misconceptions.

Authors:  Stephan Kubowicz; Andy M Booth
Journal:  Environ Sci Technol       Date:  2017-10-12       Impact factor: 9.028

6.  NMR-Based Metabolic Profiling of Field-Grown Leaves from Sugar Beet Plants Harbouring Different Levels of Resistance to Cercospora Leaf Spot Disease.

Authors:  Yasuyo Sekiyama; Kazuyuki Okazaki; Jun Kikuchi; Seishi Ikeda
Journal:  Metabolites       Date:  2017-01-26

7.  Gut microbiomes of Malawian twin pairs discordant for kwashiorkor.

Authors:  Michelle I Smith; Tanya Yatsunenko; Mark J Manary; Indi Trehan; Rajhab Mkakosya; Jiye Cheng; Andrew L Kau; Stephen S Rich; Patrick Concannon; Josyf C Mychaleckyj; Jie Liu; Eric Houpt; Jia V Li; Elaine Holmes; Jeremy Nicholson; Dan Knights; Luke K Ursell; Rob Knight; Jeffrey I Gordon
Journal:  Science       Date:  2013-01-30       Impact factor: 47.728

8.  MetaboAnalyst: a web server for metabolomic data analysis and interpretation.

Authors:  Jianguo Xia; Nick Psychogios; Nelson Young; David S Wishart
Journal:  Nucleic Acids Res       Date:  2009-05-08       Impact factor: 16.971

9.  Solid-, solution-, and gas-state NMR monitoring of ¹³C-cellulose degradation in an anaerobic microbial ecosystem.

Authors:  Akira Yamazawa; Tomohiro Iikura; Amiu Shino; Yasuhiro Date; Jun Kikuchi
Journal:  Molecules       Date:  2013-07-29       Impact factor: 4.411

