| Literature DB >> 35744782 |
Weiwei Wei¹, Yuxuan Liao², Yufei Wang², Shaoqi Wang², Wen Du¹, Hongmei Lu², Bo Kong¹, Huawu Yang³, Zhimin Zhang².
Abstract
Nuclear magnetic resonance (NMR) spectroscopy is highly unbiased and reproducible, making it a powerful tool for analyzing mixtures of small molecules. However, compound identification in NMR spectra of mixtures is highly challenging because the chemical shifts of a compound vary between mixtures and peaks of different molecules overlap. Here, we present a pseudo-Siamese convolutional neural network method (pSCNN) to identify compounds in mixtures by NMR spectroscopy. A data augmentation method was implemented that superposes several NMR spectra sampled from a spectral database and adds random noise. The augmented dataset was split and used to train, validate and test the pSCNN model. Two experimental NMR datasets (flavor mixtures and an additional flavor mixture) were acquired to benchmark its performance in real applications. The results show that the proposed method achieves good performance on the augmented test set (ACC = 99.80%, TPR = 99.70% and FPR = 0.10%), the flavor mixtures dataset (ACC = 97.62%, TPR = 96.44% and FPR = 2.29%) and the additional flavor mixture dataset (ACC = 91.67%, TPR = 100.00% and FPR = 10.53%). We have demonstrated that the translational invariance of convolutional neural networks can solve the chemical shift variation problem in NMR spectra. In summary, pSCNN is an off-the-shelf method for identifying compounds in mixtures by NMR spectroscopy because of its accuracy in compound identification and robustness to chemical shift variation.
Keywords: NMR; deep learning; identification; mixture analysis
Year: 2022 PMID: 35744782 PMCID: PMC9227391 DOI: 10.3390/molecules27123653
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
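The data augmentation step described in the abstract (superposing several database spectra and adding random noise) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, weighting scheme and noise level are assumptions.

```python
import numpy as np

def augment_mixture(spectra, rng, max_components=4, noise_scale=0.01):
    """Superpose a random subset of pure-compound spectra and add random noise.

    spectra: 2-D array (n_compounds, n_points) of pure-compound NMR spectra.
    Returns the synthetic mixture spectrum and the indices of its components.
    """
    n = rng.integers(2, max_components + 1)            # number of components (2..max)
    idx = rng.choice(len(spectra), size=n, replace=False)
    weights = rng.uniform(0.5, 1.5, size=n)            # assumed random relative intensities
    mixture = (weights[:, None] * spectra[idx]).sum(axis=0)
    mixture += rng.normal(0.0, noise_scale, size=mixture.shape)  # random noise
    return mixture, idx
```

Each synthetic mixture comes with the indices of its components, which provide the positive and negative compound/mixture pairs used to train the network.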
Figure 1. Schematic diagram of the proposed pSCNN method. (a) The network architecture of the pSCNN model. pSCNN consists of two subnetworks, each with six convolutional layers. The extracted features are concatenated and fed into two dense layers for prediction. (b) pSCNN model-based component identification. The inclusion relationship between each compound in the database and a mixture is predicted by the pSCNN model.
Figure 2. The detailed neural network architecture of pSCNN.
Figure 3. Augmented and experimental NMR spectra. (a) The spectrum of a mixture obtained by data augmentation and the spectra of its components. (b) The experimental NMR spectra of a mixture and its components. (1–3) are local zoomed-in views.
Figure 4. Optimization of the pSCNN model. (a) The accuracy and loss curves of the training and validation sets. (b) The accuracy of different models on the validation set.
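The two-branch forward pass described in Figure 1 can be sketched with plain NumPy. The layer widths, kernel size and dense dimensions below are illustrative assumptions (the paper's exact hyperparameters are in its Figure 2); the essential points are that each branch stacks six convolution + stride-2 max-pooling blocks, and that the branches share an architecture but not weights (hence "pseudo-Siamese").

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution with ReLU. x: (L, C_in), w: (k, C_in, C_out)."""
    k = w.shape[0]
    out = np.stack([x[i:i + k].reshape(-1) @ w.reshape(k * w.shape[1], -1) + b
                    for i in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def maxpool(x, stride=2):
    """Max pooling with stride 2 (drops a trailing odd point)."""
    L = (x.shape[0] // stride) * stride
    return x[:L].reshape(-1, stride, x.shape[1]).max(axis=1)

def subnetwork(x, params):
    """Six conv blocks, each followed by stride-2 max pooling; flattened output."""
    for w, b in params:
        x = maxpool(conv1d(x, w, b))
    return x.reshape(-1)

def init_branch(rng, channels=(1, 8, 8, 8, 8, 8, 8), k=3):
    """Random weights for one branch (illustrative sizes)."""
    return [(rng.normal(0, 0.1, (k, channels[i], channels[i + 1])),
             np.zeros(channels[i + 1])) for i in range(6)]

def pscnn_forward(pure, mix, branch_a, branch_b, dense1, dense2):
    # Pseudo-Siamese: same architecture, independent weights per branch.
    f = np.concatenate([subnetwork(pure, branch_a), subnetwork(mix, branch_b)])
    h = np.maximum(f @ dense1[0] + dense1[1], 0.0)     # first dense layer, ReLU
    logit = h @ dense2[0] + dense2[1]                  # second dense layer
    return 1.0 / (1.0 + np.exp(-logit))  # probability the compound is in the mixture
```

With a 512-point input and kernel size 3, the six conv + pool blocks reduce each branch to a 6 × 8 feature map, so the concatenated feature vector fed to the dense layers has 96 entries in this sketch.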
The accuracy of different pSCNN models on the validation set.
| Name of Models | Epoch | The Number of Convolutional Layers * | Learning Rate | ACC |
|---|---|---|---|---|
| M1 | 100 | 6 | 10⁻² | 0.4900 |
| M2 | 100 | 6 | 10⁻³ | 0.4935 |
| M3 | 100 | 6 | 10⁻⁴ | 0.9990 |
| M4 | 100 | 6 | 10⁻⁵ | 0.9935 |
| M5 | 100 | 5 | 10⁻⁴ | 0.9975 |
| M6 | 100 | 7 | 10⁻⁴ | 0.9975 |
| M7 | 100 | 8 | 10⁻⁴ | 0.9935 |
| M8 | 100 | 9 | 10⁻⁴ | 0.9925 |
| M9 | 100 | 10 | 10⁻⁴ | 0.9860 |
* Each convolutional layer is followed by a max pooling layer with stride 2.
Figure 5. Performance evaluation on the test set and application of pSCNN to the known flavor mixtures and the additional flavor mixture.
The results of the pSCNN model on the experimental NMR datasets.
| Datasets | ACC | TPR | FPR |
|---|---|---|---|
| flavor mixtures dataset | 97.62% | 96.44% | 2.29% |
| additional flavor mixture dataset | 91.67% | 100.00% | 10.53% |
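The ACC, TPR and FPR values reported above follow the standard binary-classification definitions, treating each compound/mixture pair as one prediction (1 = compound present). A minimal sketch:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """ACC, TPR and FPR for binary inclusion labels (1 = compound present)."""
    y_true = np.asarray(y_true, bool)
    y_pred = np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    acc = (tp + tn) / len(y_true)   # overall accuracy
    tpr = tp / (tp + fn)            # true positive rate (sensitivity)
    fpr = fp / (fp + tn)            # false positive rate
    return acc, tpr, fpr
```

Note that with few true negatives, as in the single additional flavor mixture, even one false positive produces a large FPR, which is consistent with the 10.53% reported above.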
Figure 6. Demonstration of the translation invariance of pSCNN. (a) The number of chemical shift variations between all mixtures and their corresponding components. (b–d) The probabilities of the corresponding components in mixtures predicted by pSCNN for spectral pairs with different chemical shift variations; (b–d) show spectral pairs with two, three and four components, respectively.
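The translation-invariance argument can be illustrated with a toy example: a convolutional feature followed by max pooling responds identically to the same peak at different positions, which is how the network tolerates chemical shift variation. This sketch uses a single filter and global max pooling for clarity, whereas pSCNN stacks six strided-pooling blocks; the peak shape and filter are arbitrary.

```python
import numpy as np

def conv_global_max(x, w):
    """Valid 1-D convolution with one filter, followed by global max pooling."""
    k = len(w)
    feats = np.array([x[i:i + k] @ w for i in range(len(x) - k + 1)])
    return feats.max()

rng = np.random.default_rng(0)
w = rng.normal(size=5)                                    # an arbitrary filter
peak = np.exp(-0.5 * ((np.arange(41) - 20) / 2.0) ** 2)   # a Gaussian peak

x1 = np.zeros(200); x1[50:91] = peak     # peak at one chemical shift
x2 = np.zeros(200); x2[120:161] = peak   # the same peak, shifted

# The pooled convolutional feature is identical for both positions.
assert np.isclose(conv_global_max(x1, w), conv_global_max(x2, w))
```

Because the pooled feature depends only on the presence of the peak pattern, not on where it sits along the chemical shift axis, a classifier built on such features is robust to the shift variations shown in panel (a).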