Xiayin Zhang1, Kai Zhang1,2, Duoru Lin1, Yi Zhu1,3, Chuan Chen1,4, Lin He2, Xusen Guo5, Kexin Chen1, Ruixin Wang1, Zhenzhen Liu1, Xiaohang Wu1, Erping Long1, Kai Huang5, Zhiqiang He6, Xiyang Liu2, Haotian Lin1,7.
Abstract
BACKGROUND: Color vision is the ability to detect, distinguish, and analyze the wavelength distributions of light independent of the total intensity. It mediates the interaction between an organism and its environment from multiple important aspects. However, the physicochemical basis of color coding has not been explored completely, and how color perception is integrated with other sensory input, typically odor, is unclear.
Keywords: color perception; deep belief network; odor perception; physicochemical features; random forest
Year: 2020 PMID: 32101298 PMCID: PMC7043059 DOI: 10.1093/gigascience/giaa011
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: The overall workflow of color prediction and odor prediction. A total of 1,267 structurally diverse molecules were labeled with 12 diverse colors, and 598 structurally diverse molecules were labeled with 12 diverse odors. In addition, 5,270 physicochemical features of each molecule were generated by Dragon. Random forest models and deep belief networks were built to predict colors or odors using their physicochemical features. Feature selection was conducted by random forest models and the genetic algorithm. With the selected features, random forest models and deep belief networks were reused for color and odor prediction. The models were evaluated on the basis of the means and variances of the accuracies between the labeled and predicted colors or odors.
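The evaluation step described in Figure 1 (means and variances of accuracies across cross-validation folds) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy labels are invented, and `majority_baseline` is a hypothetical stand-in for a trained random forest or deep belief network, used only so the example runs with the Python standard library.

```python
from collections import Counter
from statistics import mean, pvariance


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(labeled, predicted):
    """Fraction of molecules whose predicted label matches the assigned label."""
    return sum(l == p for l, p in zip(labeled, predicted)) / len(labeled)


def majority_baseline(train_labels, n_test):
    """Hypothetical stand-in model: always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_test


def cross_validate(labels, k=4):
    """Return (mean accuracy, population variance, per-fold accuracies)."""
    folds = k_fold_indices(len(labels), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training labels are all labels outside the held-out fold.
        train_labels = [labels[j] for n, fold in enumerate(folds) if n != i for j in fold]
        predicted = majority_baseline(train_labels, len(test_idx))
        scores.append(accuracy([labels[j] for j in test_idx], predicted))
    return mean(scores), pvariance(scores), scores


# Invented toy labels, not data from the study.
toy_labels = ["red", "red", "blue", "red", "yellow", "red", "blue", "red"]
mean_acc, var_acc, fold_scores = cross_validate(toy_labels, k=4)
print(fold_scores, mean_acc, var_acc)
```

In a real run, the baseline would be replaced by a model trained on the physicochemical features of the training folds; the fold bookkeeping and the summary statistics stay the same.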
Figure 2: Color prediction using the random forest model and DBN. A. Confusion matrix for the classification of color with 100% accuracy by the random forest. The X-axis presents the labeled colors of the molecules, and the Y-axis presents the predicted colors of the molecules. B. The classification results for color were as high as 95.23% using the DBN. The X-axis presents the learning rate, the Y-axis presents the algorithm parameter “momentum,” and the Z-axis presents the accuracy rate. C. Boxplot presenting the accuracy of color prediction from 4-fold cross-validations using the random forest with all features, the top 24 features selected by random forest models, the top 24 features selected by random forest and the genetic algorithm, and the total 48 features from above. The median values of these boxplots are labeled. D. Boxplot presenting the accuracy of color prediction using the DBN with all features, the top 24 features selected by random forest models, the top 24 features selected by random forest and the genetic algorithm, and the total 48 features from above. The median values of these boxplots are labeled. #Random forest models, *random forest models and genetic algorithm. E. Heat map of the correlation values between the top 24 features selected by random forest models and the 12 colors based on the hierarchical clustering framework. The connections between the colors and descriptors were calculated by the Euclidean distances.
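As a rough illustration of how a confusion matrix like the one in panel A relates to an accuracy figure, the sketch below follows the caption's orientation (labeled colors on the X-axis, i.e. columns; predicted colors on the Y-axis, i.e. rows). The class names and label lists are invented for the example and are not data from the study.

```python
def confusion_matrix(labeled, predicted, classes):
    """Count matrix with rows = predicted class and columns = labeled class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, pred_label in zip(labeled, predicted):
        matrix[index[pred_label]][index[true_label]] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal; accuracy is diagonal sum over total."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Invented toy example with three colors.
classes = ["red", "yellow", "blue"]
labeled = ["red", "red", "yellow", "blue", "blue", "yellow"]
predicted = ["red", "yellow", "yellow", "blue", "red", "yellow"]

cm = confusion_matrix(labeled, predicted, classes)
acc = accuracy_from_matrix(cm)
for name, row in zip(classes, cm):
    print(f"predicted {name:>6}: {row}")
print(f"accuracy: {acc:.4f}")
```

A matrix with all counts on the diagonal corresponds to 100% accuracy, as reported for the random forest in panel A.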
Figure 3: Odor prediction using the random forest model and DBN. A. The confusion matrix for the classification of odor with 93.40% accuracy by the random forest. B. The classification results for odor were as high as 94.75% using the DBN. The X-axis presents the learning rate, the Y-axis presents the algorithm parameter “momentum,” and the Z-axis presents the accuracy rate. C. Boxplot presenting the accuracy of odor prediction using the random forest with all features, the top 39 features selected by random forest models, the top 39 features selected by the random forest and the genetic algorithm, and the total 78 features from above. The median values of these boxplots are labeled. D. Boxplot presenting the accuracy of odor prediction using the DBN with all features, the top 39 features selected by random forest models, the top 39 features selected by random forest and the genetic algorithm, and the total 78 features from above. The median values of these boxplots are labeled. #Random forest models, *random forest models and genetic algorithm. E. Heat map of the correlation values between the top 39 features selected by random forest models and the 12 odors based on the hierarchical clustering framework. Connections between the odors and descriptors were calculated by the Euclidean distances.
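The heat maps in panel E of Figures 2 and 3 combine correlation values with Euclidean distances. One plausible reading, sketched below under stated assumptions, is that each feature gets a profile of correlations against per-class indicator vectors, and profiles are then compared by Euclidean distance before clustering. The use of Pearson correlation, the feature values, and the two-class setup are all assumptions made for illustration; the captions do not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_profile(feature, indicators):
    """Correlation of one feature with each class indicator (1 = molecule has the label)."""
    return [pearson(feature, ind) for ind in indicators]


# Invented toy data: 4 molecules, 3 features, 2 classes.
features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [4.0, 3.0, 2.0, 1.0],
    "feature_c": [2.0, 4.0, 6.0, 8.0],
}
indicators = [
    [0, 0, 1, 1],  # class 1 membership
    [1, 1, 0, 0],  # class 2 membership
]

profiles = {name: correlation_profile(vals, indicators) for name, vals in features.items()}

# Euclidean distance between correlation profiles; small distances mean the
# features relate to the classes in a similar way and would cluster together.
dist_ab = math.dist(profiles["feature_a"], profiles["feature_b"])
dist_ac = math.dist(profiles["feature_a"], profiles["feature_c"])
print(profiles)
print(dist_ab, dist_ac)
```

Here `feature_a` and `feature_c` differ only by scale, so their profiles coincide, while `feature_b` runs in the opposite direction and lands far away. A hierarchical clustering routine would take the full pairwise distance matrix as its input.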
Figure 4: The correlations between color and olfaction perception. A. Of the 1,267 molecules with color, 90 also had odor information. B. Schematic diagram of the key physicochemical features for color and odor perceptions in the interactome. The key features for color perception were closely connected with the key features for odor perception. The length of each line represents its correlation value.