Literature DB >> 33385118

SkinBug: an artificial intelligence approach to predict human skin microbiome-mediated metabolism of biotics and xenobiotics.

Shubham K Jaiswal1, Shitij Manojkumar Agarwal1, Parikshit Thodum1, Vineet K Sharma1.   

Abstract

In addition to being pivotal for the host health, the skin microbiome possesses a large reservoir of metabolic enzymes, which can metabolize molecules (cosmetics, medicines, pollutants, etc.) that form a major part of the skin exposome. Therefore, to predict the complete metabolism of any molecule by skin microbiome, a curated database of metabolic enzymes (1,094,153), reactions, and substrates from ∼900 bacterial species from 19 different skin sites were used to develop "SkinBug." It integrates machine learning, neural networks, and chemoinformatics methods, and displays a multiclass multilabel accuracy of up to 82.4% and binary accuracy of up to 90.0%. SkinBug predicts all possible metabolic reactions and associated enzymes, reaction centers, skin microbiome species harboring the enzyme, and the respective skin sites. Thus, SkinBug will be an indispensable tool to predict xenobiotic/biotic metabolism by skin microbiome and will find applications in exposome and microbiome studies, dermatology, and skin cancer research.
© 2020.

Entities:  

Keywords:  In Silico Biology; Metabolomics; Systems Biology

Year:  2020        PMID: 33385118      PMCID: PMC7772573          DOI: 10.1016/j.isci.2020.101925

Source DB:  PubMed          Journal:  iScience        ISSN: 2589-0042


Introduction

After the human gut, it is the skin that harbors the largest and most exposed human-associated microbiome known as the skin microbiome that plays a pivotal role in both health and disease (Grice et al., 2009; Grice and Segre, 2011; Kong and Segre, 2012). Skin is composed of three layers, among which the outermost layers (epidermis and dermis) serve as an elaborate host for building an ecosystem with more than a trillion microbial cells belonging to more than 1,000 microbial species from 19 different phyla (Edwards and Marks, 1995; Grice et al., 2009; Grice and Segre, 2011; Nakatsuji et al., 2013). Several specialized niches such as sebaceous, moist, and dry are found on the skin with differences in moisture level; aerobic, anaerobic, and semi-aerobic conditions; carbon and sulfur availability; secretions; and exogenous environmental factors (Grice and Segre, 2011). The aerobic niches include the skin surface and the epidermis layer, whereas the dermis layer and specifically the invaginations that form folliculo-sebaceous units or roots of the hair follicles and sweat glands form the anaerobic and lipid-rich niches for anaerobic species (Bay et al., 2020; Grice and Segre, 2011). The skin site-specific niches support the growth of diverse bacterial species and result in enormous compositional differences in skin microbiome (Grice and Segre, 2011). The commensal skin microbiome influences the physical characteristics of the epidermis layer and regulates the development of immune system, and a dysbiosis of this community is associated with several skin diseases such as acne, atopic dermatitis, and psoriatic lesions (Byrd et al., 2018; Grice et al., 2009; Kong et al., 2012; Naik et al., 2012; Picardo and Ottaviani, 2014; SanMiguel and Grice, 2015; Zeeuwen et al., 2013; Saxena, 2018). In addition to being crucial for the host health, the skin microbiome also possesses a large and diverse reservoir of metabolic enzymes that play a key role in metabolizing numerous biomolecules naturally produced by either host skin cells or other commensal microorganisms. One such example of a common skin commensal is Staphylococcus epidermidis that can ferment the glycerol present in the skin and can inhibit the growth of Propionibacterium acnes, which causes acne vulgaris (Wang et al., 2014). This gigantic pool of metabolic enzymes can also potentially metabolize the xenobiotic molecules that come in regular contact with our skin such as those present in cosmetics, pharmacological formulations, and the environmental pollutants, which together comprise a large part of the skin exposome (Stingley et al., 2010). Furthermore, due to the presence of both aerobic and anaerobic microbial species in the aerobic and anaerobic niches, the skin microbiome can perform the aerobic or anaerobic metabolism of any biotic/xenobiotic molecule. For example, the metabolism of synthetic azo dyes such as methyl red (MR) is reported by both aerobic and anaerobic bacterial species of skin microbiome that reduce the dye using their azoreductase enzyme (Stingley et al., 2010). The azo dyes are regularly used in cosmetics, tattoo inks, and other products, and the reduction of these dyes due to their undesirable metabolism by the skin microbiome-harbored metabolic enzymes can produce carcinogenic aromatic amines, which poses significant health risks (Chung, 1983; Nakayama et al., 1983). Another example is the aerobic oxidative metabolism of an abundant environment pollutant BaP by monooxygenase enzymes from different bacterial species of skin microbiome (Sowada et al., 2014). The anaerobic metabolism of different skin secretion substances such as triglyceride lipids and secretory proteins by lipases and proteases has also been reported by experimental studies (Byrd et al., 2018). Similarly, the metabolism of glycolic acid, cholesterol, and glycerol by aerobic bacterial species; the metabolism of arginine, triglyceride lipids, propylene glycol, and palmitic acid by anaerobic bacterial species; and the metabolism of alpha-tocopherol, uric acid, lactic acid, ethanolamine, linolenic acid by both aerobic and anaerobic bacterial species of skin microbiome have been reported recently by an experimental study (Timm et al., 2020). However, in most of these cases, the exact metabolic reactions and the metabolic enzymes responsible for the metabolism of these molecules are yet unknown. The skin microbiome-mediated undesired metabolism of commonly used cosmetic products and therapeutic drugs applied on the skin can significantly alter the bioavailability and therapeutic bioactivity of these molecules. Therefore, exploration of the metabolic potential of the enzymes present in the skin microbiome is much needed to understand their impact and role in human health and disease, in exposome studies, to determine the efficacy and bioavailability of the therapeutic and cosmetic molecules applied on the skin, and also to predict their potential toxicity due to the promiscuous metabolism by the skin microbiome. However, the comprehensive experimental metabolic investigation of each molecule separately by skin microbiome through experimental approaches is a very challenging and tedious task because skin microbiome shows the highest longitudinal variability and largest phylogenetic diversity (Grice and Segre, 2011). In this scenario, the development of an efficient computational method for the prediction of metabolism of chemical substances by microbial species present in skin microbiome appears as a promising approach. At present, there exists no tool for the prediction of metabolism of cosmetics or any xenobiotic molecule solely by the skin microbial species. Therefore, we developed a tool named “SkinBug” by integrating machine learning, neural networks, and chemoinformatics methods to predict the metabolic reactions, corresponding reaction centers, metabolic enzymes that can catalyze the predicted reactions, species containing predicted enzymes, and skin sites harboring predicted species for any given biotic/xenobiotic molecule by the skin microbiome. As the metabolic enzymes show promiscuity and are capable of metabolizing structurally similar substrates, the structural and chemical properties of substrate molecules from the known reactions were used as features for predicting the metabolism of biotic or xenobiotic/therapeutic molecules by skin microbiome (Babtie et al., 2010; Hult and Berglund, 2007; Khersonsky et al., 2006; Pandya et al., 2014; Khersonsky and Tawfik, 2010; Sharma, 2017). The predictions from the tool can help to determine the metabolic potential of the skin microbiome, in exposome studies, and to design more efficient skin therapeutic molecules and cosmetic agents by considering their metabolism by skin microbiome.

Results

In this study, we have developed a computational tool named “SkinBug” to predict the metabolic reaction, enzymes, species, and skin sites of the skin microbiome that can potentially metabolize the biotic/xenobiotic molecules by integrating chemoinformatics, machine learning, and neural network methodologies.

Construction of pangenomes for skin microbiome

Several metagenomic studies have explored the bacterial diversity of different skin sites; however, until now there is no single resource available that provides information on the bacterial species present at different skin sites. Therefore, the data and text mining of the available literature was performed to construct the database of bacterial species present at different skin sites. A total of 1,616 unique bacterial species were identified from the 19 different skin sites from literature. Of the 1,616 identified bacterial species, the complete genomes of 897 could be identified at the “NCBI RefSeq” database. The genomes of all the strains were used to construct the pangenome of a particular species. A total of 897 pangenomes were constructed for the human skin microbiome and were used for further analysis.

Transferases are the most abundant enzymes in skin microbiome

Using the ExPASy enzyme database a total of 1,094,153 metabolic enzymes were identified from the 897 pangenomes. Each of these metabolic enzymes was linked to a four-digit EC number based on the associated EC number of its closest homolog. From the distribution of metabolic enzymes across six reaction classes, it was apparent that the enzymes belonging to “Transferases” class were the most abundant in the skin microbiome, whereas “Isomerases” enzymes were the least abundant (Figure S1). The proportions of different reaction class were identified for each of the 19 skin sites (Figure 1A). For all the sites, we observed that the “Transferases” reactions were most abundant followed by the “Oxidoreductases” and “Hydrolases” reactions, perhaps due to the higher abundance of these classes of enzymes in the bacterial pangenomes. The inner wrist, perineum, and cheek had the highest proportions of the “Ligases”_underreactions and the lowest proportions of “Hydrolysis” reactions when compared with the other skin sites. The inner wrist and perineum also have the least proportion of “Oxidoreductases” reactions when compared with the other skin sites.
Figure 1

Diversity of metabolic enzymes on skin sites

(A) Proportions of different metabolic enzymes from six types of reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”) plotted as stacked bar plots across the 19 different skin sites.

(B) Silhouette plot for determining the optimum number of clusters for clustering different skin sites. The y axis is average silhouette width value, and the x axis is the number of clusters. The k-value at which the maximum of average silhouette width is achieved is the optimum number of clusters.

(C) The dendrogram showing the results of hierarchical clustering of the skin sites based on the 2,523 unique reactions that can be performed by the metabolic enzymes of species present in skin microbiome. The approximate unbiased p values (AUp) and the bootstrap probability (BP) values for each branch/cluster were calculated using multiscale bootstrap resampling and using normal bootstrap resampling, respectively. The AUp values are mentioned in red, and the BP values are mentioned in green.

See also Figures S1–S4.

Diversity of metabolic enzymes on skin sites (A) Proportions of different metabolic enzymes from six types of reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”) plotted as stacked bar plots across the 19 different skin sites. (B) Silhouette plot for determining the optimum number of clusters for clustering different skin sites. The y axis is average silhouette width value, and the x axis is the number of clusters. The k-value at which the maximum of average silhouette width is achieved is the optimum number of clusters. (C) The dendrogram showing the results of hierarchical clustering of the skin sites based on the 2,523 unique reactions that can be performed by the metabolic enzymes of species present in skin microbiome. The approximate unbiased p values (AUp) and the bootstrap probability (BP) values for each branch/cluster were calculated using multiscale bootstrap resampling and using normal bootstrap resampling, respectively. The AUp values are mentioned in red, and the BP values are mentioned in green. See also Figures S1–S4.

Skin sites are highly variable in enzymatic reactions

Of the 5,430 reactions on KEGG (Kyoto Encyclopedia of Genes and Genomes) database, the complete skin microbiome had enzymes for 2,523 reactions (Kanehisa and Goto, 2000). To identify the similarities and differences in skin sites with respect to the enzymatic reactions, hierarchical clustering was performed and the approximate unbiased p values (AUp) and the bootstrap probability (BP) values for each branch/cluster were calculated. The optimum number of clusters was identified to be two based on the average silhouette method (Figure 1B). Among the two most optimum clusters, the sites cheek, inner wrist, and perineum clustered together, and the rest of the sites formed another cluster (Figure 1C). Moreover, four clades were found to be statistically significant in the hierarchical clustering of skin sites. The first clade was formed by the cheek, inner wrist, and perineum; the second clade was formed by the arm and foot; the third clade was formed by the groin and retroauricular crease; and the fourth clade was formed by pressure ulcer and venous leg ulcer (Figure 1C). It is apparent that the sites with similar secretions, humidity, environmental exposure, or disease status clustered together. Also the total reactions present at each site and the reactions common to the different sites were identified using the matrix layout analysis (Supplementary Text S5). Foot and arm had the highest number of enzymatic reactions, whereas the perineum, inner writs, and cheek had the lowest number of enzymatic reactions (Figure S2). These observations are supported by a previous finding, which showed that dry niches of skin microbiome such as foot and arm have higher phylogenetic diversity than the other niches of skin microbiome (Grice and Segre, 2011). Only 277 reactions of 2,523 were common to all the 19 skin microbiome sites, and many reactions were unique to specific groups of sites. For example, 134 reactions were unique to foot and arm and occur only at these two sites (Figure S2). This suggests that the metabolic potential of different skin microbiome sites is variable, and site-specific metabolism should be considered while developing the tool to predict the metabolism of molecules by skin microbiome.

Metabolic complexity of microbiome

The principal-component analysis (PCA) with 2,322 variables was performed for the 3,769 substrates (Supplementary Text S6 and S7). In the PCA analysis, PC-1 (14.7%) and PC-2 (5.7%), collectively could explain only 20.4% of the variance present in the input dataset (Figure 2A). Furthermore, from the scree and cumulative scree plot it is apparent that more than 1,000 dimensions (PCs) were needed to explain the complete variance (>95%) present in the dataset (Figure 2A). This suggests that the dataset is very diverse and most of the variables add significant variance to the dataset (Figures 2A, S3, and S4).
Figure 2

Evaluating the diversity and complexity of metabolic dataset

(A) The cumulative scree plot and a normal scree plot from the PCA analysis of all the substrate molecules from the complete input dataset based on the 2,322 selected features. The x axis is the principal component number, y axis for the dot plot is the cumulative variance explained by the individual principal components, and the y axis for the bar plot is the percentage of variance explained by the individual principal components.

(B) The density-based clustering of substrate molecules using the principal component PC1 and PC2 from the PCA analysis. The density-based clustering was performed with MinPts value 20 and epsilon value 3.25. Three dense clusters were identified, which are colored as blue for cluster-1, yellow for cluster-2, and gray for cluster-3.

(C) The proportion of different reaction classes in each cluster is shown as pie chart with percentage value labeled for each reaction class. The proportion is calculated based on the number of substrate molecules that can undergo a particular kind of reaction class in that cluster. The pie chart is mentioned for all three above identified clusters: cluster-1, cluster-2, and cluster-3.

See also Figures S5 and S6.

Evaluating the diversity and complexity of metabolic dataset (A) The cumulative scree plot and a normal scree plot from the PCA analysis of all the substrate molecules from the complete input dataset based on the 2,322 selected features. The x axis is the principal component number, y axis for the dot plot is the cumulative variance explained by the individual principal components, and the y axis for the bar plot is the percentage of variance explained by the individual principal components. (B) The density-based clustering of substrate molecules using the principal component PC1 and PC2 from the PCA analysis. The density-based clustering was performed with MinPts value 20 and epsilon value 3.25. Three dense clusters were identified, which are colored as blue for cluster-1, yellow for cluster-2, and gray for cluster-3. (C) The proportion of different reaction classes in each cluster is shown as pie chart with percentage value labeled for each reaction class. The proportion is calculated based on the number of substrate molecules that can undergo a particular kind of reaction class in that cluster. The pie chart is mentioned for all three above identified clusters: cluster-1, cluster-2, and cluster-3. See also Figures S5 and S6. Furthermore, to identify if the different reaction classes are separable from each other or cluster separately, the density-based unsupervised clustering was performed on the PCA results using PC-1 and PC-2 (Kriegel et al., 2011). The density-based clustering is resistant to noise, outliers, and inherent irregular shapes of the clusters present in the complex datasets when compared with the other methods such as k-means, PAM (partition around medoids), and hierarchical clustering. The two parameters that affect the quality of clustering are epsilon (eps) and minimum points (MinPts). Larger datasets need a larger “MinPts” value, thus, a value of 20 was used. The epsilon value of 3.25 was chosen based on the k-distance graph, the best value being the knee point in the graph (Figure S5). In the density-based clustering we observed that the dataset formed three clusters with a lot of outliers marked as black points (Figure 2B). Moreover, each of these clusters had substrate molecules from all the six reaction classes; cluster-1 had the highest number of substrates for Transferases reaction, whereas cluster-2 and cluster-3 had the highest number of substrates for Oxidoreductase reaction (Figure 2C). Furthermore, the PCA analysis of pure substrates that can perform only one type of reaction was performed to check if any pattern of separation exists among these pure substrates. For these substrates also, more than 1,000 dimensions were required for explaining the complete variance present in the dataset (Figure S6), and no clear separation between different reactions classes was observed (Figure S6). Collectively, it is evident that the input dataset is very diverse and complex and there is no clear linear or non-linear separation between different reaction classes. Thus, robust machine learning and/or deep learning methods are required to build the models that can perform the multiclass multilabel classification for these reaction classes.

Prediction of metabolism by skin microbiome

The complete workflow of the construction of “SkinBug” is mentioned in Figure 3. “SkinBug” provides a comprehensive prediction of the possible metabolic reactions and their respective reaction centers, corresponding metabolic enzymes, species having those enzymes, and skin sites carrying those species. The five key steps involved in the prediction are classified into five modules. The first module predicts the reaction class, the second module predicts reaction subclass, the third module predicts complete reactions, the fourth module predicts the reaction centers in the molecule for the predicted reactions, and the fifth module provides information on the metabolic enzymes, species, and skin sites.
Figure 3

The algorithm and complete workflow for the construction of SkinBug

The algorithm and complete workflow for the construction of SkinBug

Module-1: Construction of reaction class-specific prediction models

Two kinds of approaches were used; one is machine learning that will be efficient at learning the class correlations and class discrimination, and the other is artificial neural network (ANN) models that will be efficient at learning the class-specific patterns. For the machine learning-based multiclass multilabel predictions, both the problem transformation (five types) and algorithm adaptation methods were used. For each type of problem transformation seven types of best suited core learners were used. Thus, the performances of a total of 34 different problem transformation models and the two available algorithm adaptation models were compared. To obtain a comprehensive and reliable comparison of the models, the comparison was performed using three types of datasets: (1) boruta selected fingerprints, boruta selected descriptor, and ECFP, (2) boruta selected fingerprints, boruta selected descriptor, and FCFP; and (3) boruta selected fingerprints, boruta selected descriptor, ECFP, and FCFP. The values of multilabel accuracy, multilabel precision, F1 score, and hamming loss for all the models for all the three datasets are plotted as line plot and are mentioned in Figures S7–S9. From the plots it is evident, that only Random forest survival regression classification (RFSRC) algorithm consistently performed better on all the three datasets, and thus, this algorithm was selected for further modeling. The complete dataset of 3,769 substrates was split into a working dataset and a blind dataset with a split ratio of 95:5. Furthermore, the working dataset was split into training and testing dataset using a customized stratified random sampling method (Methods). The RandomForestSRC model was trained on training dataset, and its performance was evaluated on the testing dataset. From the multilabel performance, it is apparent that the model showed a multilabel accuracy of up to 82.4%, multilabel sensitivity of up to 91.8%, multilabel precision of up to 83.3%, F1 score of up to 85.5%, and hamming loss of up to 0.067 in the three types of statistical testing methods (Figure 4A).
Figure 4

The multilabel and binary performance of reaction class-specific RFSRC prediction model

(A) The multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss mentioned for three type of statistical testing of the RFSRC model: stratified randomly sampled test dataset, 5-fold cross-validation, and blind dataset.

(B) The box plot for the performance across each fold of the 5-fold cross-validation for the five different performance measures: multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss. The y axis shows the fraction value for each of the performance measure. Fraction values were chosen so that all the measures including hamming loss could be plotted on the same y axis scale.

(C) The binary performance of the reaction class prediction RFSRC model for each of six different reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”). Fraction values were chosen so that all the measures could be shown on the same scale.

AUC = area under the curve, MMCE = binary mean misclassification error, FNR = binary false-negative rate, FPR = binary false-positive rate, ACC = binary accuracy, MCC = Matthews correlation coefficient, NPV = binary negative predictive value, PPV = binary positive predicted value, F1 = binary F1 score, FDR = binary false discovery rate, GPR = geometric mean of binary precision and binary recall.

See also Figures S7–S9.

The multilabel and binary performance of reaction class-specific RFSRC prediction model (A) The multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss mentioned for three type of statistical testing of the RFSRC model: stratified randomly sampled test dataset, 5-fold cross-validation, and blind dataset. (B) The box plot for the performance across each fold of the 5-fold cross-validation for the five different performance measures: multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss. The y axis shows the fraction value for each of the performance measure. Fraction values were chosen so that all the measures including hamming loss could be plotted on the same y axis scale. (C) The binary performance of the reaction class prediction RFSRC model for each of six different reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”). Fraction values were chosen so that all the measures could be shown on the same scale. AUC = area under the curve, MMCE = binary mean misclassification error, FNR = binary false-negative rate, FPR = binary false-positive rate, ACC = binary accuracy, MCC = Matthews correlation coefficient, NPV = binary negative predictive value, PPV = binary positive predicted value, F1 = binary F1 score, FDR = binary false discovery rate, GPR = geometric mean of binary precision and binary recall. See also Figures S7–S9. However, this performance could also be a result of over-fitting; thus, examining the over-fitting of the model is required. One way to examine over-fitting is cross validation. If the performance varies too much among the different folds of the cross-validation, it indicates the case of over-fitting. Hence for the RFSRC model, the performance on each fold of the 5-fold cross-validation was calculated and plotted as box plot to evaluate the over-fitting of the model (Figure 4B). From the box plot it is clear that the performance of the model across the folds of cross-validation is similar for all the five measures of multiclass multilabel performance. Thus, it is apparent that the RFSRC model does not show any sign of over-fitting. Additionally, the binary performance of the RFSRC model on test dataset was also evaluated for each of the reaction class and is shown as a heatmap (Figure 4C). The binary performance measures the quality of learning for each of the target class present in the multiclass multilabel dataset. The performance for each class is measured using the 11 different binary performance matrices. In the binary performance, the model could achieve the binary accuracy of up to 98.8%, area under the curve of up to 95.5%, and Matthews correlation coefficient of up to 0.78. The binary performance of the RFSRC model on the 5-fold cross-validation and blind dataset was also calculated and is mentioned in the Tables S1 and S2. For the ANN model, the final selected architecture had a total of three layers: input layer with size of 2,322 neurons based on the number of features selected, hidden layer with size of 1,162 neurons, and output layer with size of six neurons based on six types of reaction classes. As ANN model was to be used for the multilabel classification, the three evaluation matrices were categorical accuracy, binary accuracy, and log loss. The change in the three matrices along the epochs is measured for the training and test dataset to obtain the most optimum epoch value. The value of 1,500 epochs was found to be the most optimum using all the three matrices as all the three matrices showed a plateau after this value (Figures 5A–5C).
Figure 5

The optimization of epochs and performance evaluation of ANN model

(A) The plot of log loss or binary cross-entropy for different values of epochs for the training and testing dataset.

(B) The plot of binary accuracy for different values of epochs for the training and testing datasets.

(C) The plot of categorical accuracy for different values of epochs for the training and testing datasets.

(D and E) (D) The box plot for the performance across each fold of the 5-fold cross-validation for the three different performance measures of ANN model: categorical accuracy, binary accuracy, and log loss or binary cross-entropy. The y axis for the binary and categorical accuracy shows the percentage value, whereas the y axis for the log loss or binary cross-entropy shows the actual value. (E) The performance of ANN model measured using three evaluation measures, categorical accuracy, binary accuracy, and log loss or binary cross-entropy for two types of statistical testing methods: 5-fold cross-validation and testing on stratified randomly sampled testing dataset.

See also Figures S10–S14.

The optimization of epochs and performance evaluation of ANN model (A) The plot of log loss or binary cross-entropy for different values of epochs for the training and testing dataset. (B) The plot of binary accuracy for different values of epochs for the training and testing datasets. (C) The plot of categorical accuracy for different values of epochs for the training and testing datasets. (D and E) (D) The box plot for the performance across each fold of the 5-fold cross-validation for the three different performance measures of ANN model: categorical accuracy, binary accuracy, and log loss or binary cross-entropy. The y axis for the binary and categorical accuracy shows the percentage value, whereas the y axis for the log loss or binary cross-entropy shows the actual value. (E) The performance of ANN model measured using three evaluation measures, categorical accuracy, binary accuracy, and log loss or binary cross-entropy for two types of statistical testing methods: 5-fold cross-validation and testing on stratified randomly sampled testing dataset. See also Figures S10S14. The specific order for the hyperparameters tuning as suggested by Greff et al., 2016, was performed using grid search to achieve the most optimum value of all the parameters (Greff et al., 2016). The results of the grid search with 5-fold cross-validation-based hyperparameters tuning are mentioned in Figures S10S14. The final ANN model was trained with the hyperparameter values: epochs = 1,500, optimizer algorithm = RMSprop, learning rate = 0.001, weight initialization method = lecun_uniform, batch size = 150, dropout rate = 0.4, and weight constraint = 4. As it was a multiclass multilabel classification problem, for the hidden layer “rectified linear unit” activation function was used, whereas for the output layer the “sigmoid” activation function was used. The final ANN model on the optimized hyperparameters was tested using the 5-fold cross-validation on the training dataset. The values of all the three evaluation matrices were calculated for each fold of the 5-fold cross-validation and were plotted as box plot to evaluate for any kind of over-fitting (Figure 5D). From the box plot it is apparent that the values across the folds are very similar, thus conveying no over-fitting in the final ANN model. Furthermore, the performance of the ANN model was also evaluated on the test dataset. The ANN model could achieve a categorical accuracy of up to 70.7%, binary accuracy of up to 90.0%, and log loss of up to 0.813 (Figure 5E). As the learning capabilities of the RFSRC model and the ANN models are complementary to each other, the union of the predictions of multiclass multilabel RFSRC and ANN models was used to make the prediction of the reaction class.

Module-2: Construction of reaction subclass-specific prediction models

As one molecule could undergo multiple types of reactions belonging to different reaction subclasses, the multiclass multilabel prediction models were constructed for each type of reaction class to predict the reaction subclasses. The working dataset was split into six datasets, one for each reaction class, and these datasets were utilized for the construction of reaction subclass-specific prediction models (Methods). The previously optimized RFSRC algorithm was used to construct six different multilabel prediction models specific to the six types of reaction classes. The multiclass multilabel performance on testing dataset for RFSRC model for each reaction class is mentioned in Table 1, and for 5-fold cross-validation on training dataset is mentioned in Table 2. The RFSRC models showed the multilabel accuracy of 61.0–74.4%, multilabel sensitivity of 63.7%–77.7%, multilabel precision of 75.0%–91.4%, F1 score of 63.9%–76.5%, and hamming loss of 0.093–0.019 on the test dataset. The binary performance of RFSRC models for each of the corresponding reaction subclass for each reaction class for 5-fold cross-validation is mentioned in Tables S3–S8, and on test dataset, is mentioned in Tables S9–S14. After the different reaction classes are predicted by the reaction class prediction models, the prediction of reaction subclass was performed for each of the predicted reaction class using the respective reaction class-specific RFSRC models.
Table 1

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using stratified random sampling

Reaction classAccuracy (%)Sensitivity (%)Precision (%)F1 score (%)Hamming loss
Oxidoreductases73.977.788.676.50.019
Transferases61.063.783.363.90.072
Hydrolases72.476.879.274.80.04
Lyases74.476.785.676.20.058
Isomerases72.273.691.473.10.06
Ligases72.072.075.072.00.093

All the parameters mentioned above were calculated using multilabel evaluation methodology (see Tables S9–S14).

Table 2

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using 5-fold cross-validation

Reaction classAccuracy (%)Sensitivity (%)Precision (%)F1 score (%)Hamming loss
Oxidoreductases61.564.382.864.90.036
Transferases56.359.680.860.00.09
Hydrolases63.166.380.566.30.052
Lyases64.467.177.767.10.084
Isomerases65.267.782.567.00.089
Ligases77.780.084.579.90.075

All the parameters mentioned above were calculated using multilabel evaluation methodology (see also Tables S3–S8).

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using stratified random sampling All the parameters mentioned above were calculated using multilabel evaluation methodology (see Tables S9–S14). Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using 5-fold cross-validation All the parameters mentioned above were calculated using multilabel evaluation methodology (see also Tables S3–S8).

Module-3: Construction of complete reaction prediction models

The molecular similarity search was performed against all the predicted reaction subclasses from each of the predicted reaction class. The molecular similarity search measure known as Tanimoto coefficient was calculated, and the subclasses that had substrates with Tanimoto coefficient value of >0.80 were considered further. This step was essential to filter out the false-positive predictions from the previous steps. The k-nearest neighbors (KNN) were identified in the reaction subclasses that qualify the aforementioned criteria using the KNN method. In this approach, the molecular similarity searching and KNN methods together allow for identifying the best structural and chemical homolog of the input molecule. Thus, the complete reactions of all the molecules that qualify the molecular similarity searching and KNN method were assigned to the given molecule. This way all the complete metabolic reactions that could potentially occur to a given molecule were identified.

Module-4: Prediction of reaction centers

The reaction centers on the given molecule for all complete reactions predicted in the abovementioned steps are identified using the RDM (Reaction center, Difference region, and Matched region) patterns database. The RDM patterns are derived from the structure alignments of the substrates and contain the information about the KEGG atom type changes at the reaction center, matched region of the molecules, and the dissimilar region of the molecule (Yamanishi et al., 2009). The KEGG reaction-class pairs (RC pairs) were identified for each of the predicted complete reactions, and the corresponding RDM patterns were extracted from the RDM pattern database constructed in this study. Each of the RDM pattern is applied to the given molecule, and the respective reaction center is identified using the in-house python script while taking into account the atoms types, their valancies, and bonding information about the molecule.

Module-5: Prediction of metabolic enzymes, species, and skin sites

In this module, for each of the predicted complete reaction, the metabolic enzymes that could potentially perform those reactions, the corresponding species that harbors those metabolic enzymes, and all the skin sites that carry those species are retrieved from the in-house constructed skin microbiome specific metabolic database that contains information on 1,094,153 metabolic enzymes from 897 species pangenomes of skin microbiome from 19 skin sites.

The development of SkinBug web server

For easy and user-friendly usage of the SkinBug approach, a user-friendly web server tool was developed and is available at http://metagenomics.iiserb.ac.in/skinbug. On the web server, a user can upload a mol/sdf file of a molecule, or can provide the PubChem ID of a molecule. For a novel molecule, these files can also be prepared by drawing its structure in any of the molecular editor software, e.g., ChemDraw. The complete workflow of the web server-based predictions using SkinBug is mentioned in Figure 6, and the other details of the web server tool are mentioned in Supplementary Text S8.
Figure 6

The prediction steps and complete workflow for processing of any input molecule by the SkinBug web server

The prediction steps and complete workflow for processing of any input molecule by the SkinBug web server The results from the SkinBug tool are displayed as results sections: R1 to R6. All the possible reaction classes are provided in R1; reaction subclass in R2; complete reactions in R3; the RC pair, RDM pattern, and predicted reaction centers in R4; and the predicted skin sites in R5. The complete reaction annotated as a four-digit EC number, corresponding metabolic enzyme, species harboring the enzyme, and the homology parameters for enzyme annotations (percent identity, e-value, query coverage, subject coverage) are mentioned in R6. All the results could also be downloaded as a single text file by clicking on the “Download Results” icon.

Validation of the SkinBug tool

This is the first tool that can predict the enzymatic biotransformation of any biotic or xenobiotic molecule by the skin microbiome, and thus a direct comparison with any existing state-of-the-art tool was not possible. The biological validation of SkinBug was performed on diverse molecules including natural molecules present on skin, cosmetics, pharmaceuticals, etc. that come in regular contact with our skin, and their metabolism is known from the experimental studies (Table 3). A total of 28 diverse molecules selected for which information about their metabolism by human host or microbial species of skin microbiome through experimental studies is available in literature were evaluated by SkinBug and presented as case studies. This validation set also included the examples of molecules that can undergo the aerobic and/or anaerobic metabolism. The comparison of SkinBug predictions was performed with the metabolism information known from previous experimental studies, and information available from three reference databases, namely, EAWAG-BBD (The University of Minnesota Biocatalysis/Biodegradation Database), Transformer (Metabolism of Xenobiotics Database), and SMPDB (The Small Molecule Pathway Database) (Gao et al., 2010; Hoffmann et al., 2014; Jewison et al., 2014). The prediction results for these 28 molecules along with the information from literature are mentioned in Table 3. Some examples from the comparative analysis with the reference databases are provided below.
Table 3

The prediction of metabolism of the 28 selected molecules by SkinBug along with their known metabolism from literature

Sr. No.CompoundFunctionReaction subclass from literatureReaction subclass from SkinBugEnzyme from literatureEnzyme from SkinBugSkin microbial genus from literatureSkin microbes genus from SkinBugReferences
1Cinnamyl alcoholPerfuming, MaskingActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygenActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acyltransferasesAlcohol dehydrogenase; cytochrome P450Cinnamyl alcohol dehydrogenase; acetyl CoA benzyl alcohol acetyltransferaseNANo genus predicted(Jäckh et al., 2012; Walker et al., 2013)
2LeucineAntistatic, hair conditioning, skin conditioningTransferring nitrogenous groupsActing on the CH-NH2 group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acyltransferases; transferring nitrogenous groups; acting on carbon-nitrogen bonds, other than peptide bonds; intramolecular transferases; forming carbon-oxygen bondsBranched chain amino acid aminotransferaseIsoleucine N-monooxygenase; valine N-monooxygenase; leucine N-acetyltransferase; branched chain amino acid transferase; leucine transaminase; leucine 2,3-aminomutase; leucine tRNA ligase; etc.Corynebacterium; StaphylococcusZymomonas; Pseudomonas; Burkholderia; Streptomyces; Staphylococcus; Yersinia; Corynebacterium; etc.(Fredrich et al., 2013; Holeček, 2018)
3GlycerolPerfuming, denaturant, humectant, solvent, hair conditioning, viscosity controllingActing on the CH-OH group of donors; transferring phosphorus-containing groupsActing on the CH-OH group of donors; glycosyltransferases; transferring phosphorus-containing groups; acting on ester bonds; glycosylases; carbon-carbon lyases; carbon-oxygen lyasesGlycerol kinase; glycerol dehydrogenaseGlycerol kinase; glycerol dehydrogenase; 1,2-alpha-glucosylglycerol phosphorylase; acylglycerol lipase; glycerol phosphatase; dihydroxy acid dehydratase; etc.StaphylococcusZymomonas; Xylella; Xanthomonas; Gluconobacter; Geobacillus; Staphylococcus; Streptococcus; etc.(Fredrich et al., 2013; Ruzheinikov et al., 2001; Xue et al., 2017)
4Benzo(a)pyrenebEnvironmental pollutant found in soot, tobacco smoke, diesel exhaust etc.Acting on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygenCytochrome P450: CYP1A1, monooxygenases, dioxygenasesNaphthalene 1,2-dioxygenase; unspecific monooxygenases; steroid 21-monooxygenase; etc.Pseudomonas; Micrococcus; Bacillus; etc.Pseudomonas; Bacillus; Burkholderia; Polaromonas; Ralstonia(Ahmad and Mukhtar, 2004)
5Retinoic acidAntiseborrheicActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donors; GlycosyltransferasesCytochrome P450: CYP1A1Unspecific monooxygenase; alkane monooxygenase; retinal dehydrogenase; aldehyde oxidase; glucuronosyl transferase; etc.NAEscherichia; Pseudomonas; Alcanivorax; Methylobacterium; Jonesia; etc.(Ahmad and Mukhtar, 2004)
67,12-Dimethylbenz(a)anthraceneEnvironmental pollutant found in tobacco smokeActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygenCytochrome P450: CYP1A1, cytochrome P450: CYP1B1Unspecific monooxygenase; aromatase; estradiol 6-beta monooxygenaseNANo genus predicted(Ahmad and Mukhtar, 2004)
7Vitamin D3Skin conditioningActing on paired donors, with incorporation or reduction of molecular oxygenActing on the CH-CH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygenCytochrome P450Vitamin D3,24-hydroxylase; calcidiol 1-monooxygenase; vitamin D1, 25-hydroxylaseNANocardia; Verrucosispora; Stackebrandtia(Ahmad and Mukhtar, 2004)
84-AllylanisolePerfumingActing on paired donors, with incorporation or reduction of molecular oxygenActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groupsCytochrome P4504-(hydroxymethyl) benzenesulfonate dehydrogenase; (iso) eugenol O-methyltransferase; trans-anol O-methyltransferaseNANo genus predicted(Jäckh et al., 2012)
9IsoeugenolPerfuming, masking, drug: local antiseptic and analgesicActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygenActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groupsCytochrome P450, cytochrome P450: CYP2E1, alcohol dehydrogenaseAlcohol dehydrogenase; isoeugenol synthase; salicylate 1-monooxygenase; 4-hydroxyphenylacetate 3-monooxygenase; catechol O-methyltransferase; etc.NAThermus; Zymomonas; Staphylococcus; Pseudomonas; Bacillus; Acinetobacter; etc.(Jäckh et al., 2012)
104-PhenylenediamineHair dyeingActing on paired donors, with incorporation or reduction of molecular oxygenActing on other nitrogenous compounds as donors; acting on paired donors, with incorporation or reduction of molecular oxygen; carbon-carbon lyasesCytochrome P450Nitrobenzene nitroreductase; histidine decarboxylaseNAVibrio; Raoultella; Pseudomonas; Ralstonia; Variovorax; etc.(Jäckh et al., 2012)
11AnilinebOutdoor air, tobacco smokeActing on paired donors, with incorporation or reduction of molecular oxygenActing on other nitrogenous compounds as donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups; acyltransferases; glycosyltransferases; transferring sulfur-containing groups; acting on carbon-nitrogen bonds, other than peptide bonds; carbon-carbon lyases; forming carbon-nitrogen bondsCytochrome P450Phenol 2-monooxygenase; azobenzene reductase; arylamine N-acetyltransferase; arylamine glucosyltransferase; amine sulfotransferase; aryl acylamidase; aminobenzoate decarboxylase; gamma-glutamylanilide synthase; etc.NABacillus; Staphylococcus; Mycobacterium(Jäckh et al., 2012)
12Methyl red or azo dyeaTextile dyes, tattoo inks, and cosmetic colorantActing on other nitrogenous compounds as donorsActing on the CH-CH group of donors; acting on other nitrogenous compounds as donors; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on paired donors, with incorporation or reduction of molecular oxygenAzoreductase (Azo1)FMN-dependent NADPH azoreductase (Azo1); NAD(P)H-dependent oxidoreductase; 4-(dimethylamino)phenylazoxybenzene reductase; azobenzene reductaseStaphylococcus; Corynebacterium; Micrococcus; Dermacoccus; KocuriaStaphylococcus; Bacillus(Stingley et al., 2010)
13FluorouracilUsed in actinic keratosis and skin warts conditionsGlycosyltransferases; acting on the CH-CH group of donorsGlycosyltransferase; acting on the CH-CH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acting on CH or CH2 groups; glycosylases; acting on carbon-nitrogen bonds, other than peptide bondsOrotate phosphoribosyltransferase; uridine phosphorylase; thymidine phosphorylase; dihydropyrimidine dehydrogenaseOrotate phosphoribosyltransferase; uridine phosphorylase; thymidine phosphorylase; dihydropyrimidine dehydrogenase; unspecific monooxygenaseNAYersinia; Wolinella; Vibrio; Tolumonas; Shigella; Aeromonas; Xylella; etc.(Amirfallah et al., 2018; Longley et al., 2003)
14Propylene glycolUsed as demulcent and in skin lotions and ointmentsActing on the CH-OH group of donorsActing on the CH-OH group of donors; carbon-carbon lyases; carbon-oxygen lyasesAlcohol dehydrogenasesL-glycol dehydrogenase; aldehyde reductase; glycerol dehydrogenase; lactaldehyde reductase; propanediol dehydratase; etc.NASalmonella; Escherichia; Geobacillus; Citrobacter; Klebsiella; etc.(Ewaschuk et al., 2005; Kraut and Kurtz, 2008)
15HydroquinoneDemelanizing agentActing on single donors with incorporation of molecular oxygen; glycosyltransferasesActing on single donors with incorporation of molecular oxygen (oxygenases); acting on paired donors, with incorporation or reduction of molecular oxygen; glycosyltransferases; transferring alkyl or aryl groups, other than methyl groups; glycosylases; carbon-carbon lyases; etc.Dioxygenases; quinone oxidoreductaseCatechol 1 2-dioxygenase; hydroquinone 1 2-dioxygenase; toluene 4-monooxygenase; p-benzoquinone reductase; 4-hydroxybenzoate 1-hydroxylase; arylesterase; hydroquinone glucosyltransferase; etc.NAPseudomonas; Bacillus; Escherichia; Acinetobacter; etc.(McDonald et al., 2001; Zhang et al., 2012)
16Para-aminobenzoic acidUsed as sunscreen and melanizing agent. Also used in treatment of fibrotic skin disordersTransferring alkyl or aryl groups, other than methyl groupsTransferring alkyl or aryl groups, other than methyl groups; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups; carbon-carbon lyases; etc.Dihydropteroate synthaseBenzoate 1 2-dioxygenase; phenylethanolamine N-methyltransferase; dihydropteroate synthase; aminobenzoate decarboxylase; etc.NAYersinia; Vibrio; Streptococcus; Shigella; Streptomyces; etc.(Wegkamp et al., 2007)
17BenzophenoneUsed as organic sunscreen and melanizing agent. Also used as a fragrance enhancerActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donorsCytochrome P450 (CYPs)Nitrilotriacetate monooxygenaseNASorangium; Anabaena; Burkholderia; Sorangium; etc.(Watanabe et al., 2015)
18LindaneUsed in scabies and pediculosis skin conditionsActing on halide bonds; acting on paired donors, with incorporation or reduction of molecular oxygenActing on halide bonds; acting on paired donors, with incorporation or reduction of molecular oxygen; etc.Dehydrochlorinase enzymeHaloalkane dehalogenaseNASphingobium; Psychrobacter; Phenylobacterium; Caulobacter; etc.(Macholz and Kujawa, 1985; Nagata et al., 1993; Tanaka et al., 1979)
19Ethinyl estradiolUsed for the treatment of moderate acne vulgaris (common acne) in femalesActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; glycosyltransferases; etc.Cytochrome P450 (CYPs)Unspecific monooxygenase; aromatase; 3-beta-hydroxy-delta-5-steroid dehydrogenase; etc.NANo genus predicted(Stimmel et al., 1951; Wang et al., 2004)
20DocosanolUsed in the treatment of herpes virus infection-caused recurring episodes of small, painful, fluid-filled blisters on the skinActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygenActing on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygenNAHexadecanol dehydrogenase; long-chain-alcohol dehydrogenase; naphthalene 1 2-dioxygenase; etc.NAGeobacillus; Zymomonas; Staphylococcus; Bacillus; etc.(Pope et al., 1996)
21Lauric acidUsed on skin for its antibacterial properties and ability to effectively combat acneActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donorsActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donors; acting on ester bondsFatty acid beta-oxidation enzymes such as acyl CoA dehydrogenase, hydrolase, etc.Trimethyllysine dioxygenase; aldehyde dehydrogenase; oleoyl-(acyl-carrier-protein) hydrolase; clavaminate synthase; etc.NABacillus; Mycobacterium; Escherichia; Thermus; etc.(Dayrit, 2015)
22Methyl lactatebUsed as a soothing and cooling agent for skin. Also to provide relief against itching and irritation on skinActing on the CH-OH group of donorsActing on the CH-OH group of donors; acting on the aldehyde or oxo group of donors; acyltransferases; etc.Lactate dehydrogenaseL-lactate dehydrogenase; glyoxylate reductase; oxaloglycolate reductase (decarboxylating); pyruvate synthase; formate C-acetyltransferase; etc.Staphylococcus; Micrococcus; etc.Staphylococcus; Vibrio; Micrococcus; Shewanella; etc.(Gladden, 2004; Lam et al., 2018)
23MethylparabenbUsed in cosmetic products as preservative to give products a longer shelf lifeGlycosyltransferases; acting on ester bonds; acting on paired donors, with incorporation or reduction of molecular oxygenGlycosyltransferases; acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; etc.GlucuronosyltransferaseCyanohydrin beta-glucosyltransferase; 4-hydroxyphenylacetaldehyde oxime monooxygenase; 4-hydroxybenzoate 1-hydroxylase; 3-phenylpropanoate dioxygenase; etc.NAPseudomonas; Escherichia; Bacillus; Mycobacterium; etc.(Abbas et al., 2010; Moos et al., 2016)
24TriclosanUsed as antiseptic and antibacterial on the skinActing on paired donors, with incorporation or reduction of molecular oxygen; glycosyltransferasesActing on paired donors, with incorporation or reduction of molecular oxygen; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on the CH-CH group of donorsSulfur oxygenase/reductase; cytochrome P450 (CYPs); glucuronosyltransferaseSulfur oxygenase/reductase; carbazole 1,9a-dioxygenase; naphthalene 1,2-dioxygenase; persulfide dioxygenase; etc.NAPseudomonas; Bacillus; Streptococcus; Yersinia; etc.(Fang et al., 2016; Wang et al., 2018)
25TerbinafinebUsed to treat the fungal infections on skinActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygenCytochromes (CYPs)Ammonia monooxygenaseNANitrosomonas; Nitrosospira(Vickers et al., 1999)
26Alpha-tocopherolbUsed in cosmetic products to prevent UV damage to the skinActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; transferring one-carbon groupsCytochromes (CYPs)Validamycin A dioxygenases; aurachin C monooxygenase/isomerase; calcidiol 1-monooxygenase; thymidylate synthase; etc.Deinococcus; StenotrophomonasDeinococcus; Stenotrophomonas; Zymomonas; Yersinia; etc.(Johnson et al., 2013; Timm et al., 2020)
27CholesterolbUsed as an emollient in cosmetic products such as eye makeup, face makeup, skin lotions, creams, and hair care formulationsActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; acting on the CH-CH group of donors; intramolecular oxidoreductasesSterol 27-hydroxylaseCholesterol 25-hydroxylase; vitamin D 25-hydroxylase; 11-beta-hydroxysteroid dehydrogenase; 3-beta-hydroxysteroid 3-dehydrogenase; etc.Streptococcus; Bacillus; AerococcusStreptococcus; Bacillus; Mycobacterium; Aeromonas; etc.(Iuliano, 2011; Timm et al., 2020)
28Linoleic acidUsed in the conditions of skin irritation and to reduce acne breakoutsActing on paired donors, with incorporation or reduction of molecular oxygenActing on paired donors, with incorporation or reduction of molecular oxygen; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on ester bondsLipoxygenase; delta-6-desaturaseLinoleate 13S-lipoxygenase; linoleate 11-lipoxygenase; acyl-CoA 6-desaturase; palmitoyl-CoA hydrolase; etc.Paracoccus; Stenotrophomonas; Brevundimonas; Staphylococcus; etc.Paracoccus; Stenotrophomonas; Brevundimonas; Staphylococcus; etc.(Brown et al., 2000; Gardner, 1970; Timm et al., 2020)

Threshold for the subclass prediction models of “Oxidoreductases” was set from 0.5 to 0.1, and for “Transferases,” “Hydrolases,” and “Lyases” the thresholds were set from 0.5 to 0.2. The Tanimoto coefficient threshold was set to 0.5.

Tanimoto coefficient threshold was set to 0.5 or 0.3.

The prediction of metabolism of the 28 selected molecules by SkinBug along with their known metabolism from literature Threshold for the subclass prediction models of “Oxidoreductases” was set from 0.5 to 0.1, and for “Transferases,” “Hydrolases,” and “Lyases” the thresholds were set from 0.5 to 0.2. The Tanimoto coefficient threshold was set to 0.5. Tanimoto coefficient threshold was set to 0.5 or 0.3. For the metabolism of hydroquinone molecule (used as a demelanizing agent) SkinBug predicted monooxygenases, dioxygenases, hydroxylases, and arylesterases, and the reference databases also indicated the same four classes of enzymes for its metabolism. Likewise, for terbinafine (used to treat fungal infections on skin) and fluorouracil (used in actinic keratosis and skin warts conditions), the reference databases suggested their metabolism by human liver cytochromes, and SkinBug also predicted metabolic enzymes from skin microbiome with similar activity such as monooxygenases. For para-aminobenzoic acid, which is used as melanizing agent and for the treatment of different skin disorders, both SkinBug and reference databases suggested benzoate dioxygenases for its metabolism. Similarly, for methylparaben (used in cosmetic products), which is metabolized by 4-hydroxybenzoate 1-hydroxylase and monooxygenases as per the reference databases, SkinBug also predicted the same enzymes from skin microbiome for its metabolism. It is noteworthy that for some molecules such as cinnamyl alcohol, BaP, fluorouracil, and 7,12-dimethylbenz(a)anthracene molecules, SkinBug only predicted those reactions that are known in literature without any false-positives confirming to its robustness. However, the strength of SkinBug lies in the fact that for the other molecules as mentioned in Table 3, in addition to correctly predicting the known reactions it also predicted additional metabolic reactions, which seems correct as per the reactive functional groups present on those molecules, along with the implicated microbial enzymes and species. Thus, the predictions of SkinBug corroborated with the metabolic information available on these reference databases and experimental studies (Table 3), confirming to the accuracy and reliability of the predictions.

Metabolism of Benzo(a)pyreme by skin microbiome

BaP is an abundant environmental pollutant found in almost all types of soot and smoke generated by the incomplete combustion of fossil fuel, coal, and other biomass including tobacco. The metabolism of this polycyclic aromatic hydrocarbon molecule by oxidation reaction through cytochrome P450 enzyme from human host and monooxygenases and dioxygenase enzymes from bacterial species of skin microbiome is known from experimental studies, and reference databases such as the University of Minnesota Biocatalysis/Biodegradation Database (Gelboin, 1980; Gibson et al., 1975; Jiang et al., 2007; Schwarz et al., 2001; Sowada et al., 2014). For this molecule, SkinBug correctly predicted its metabolism by skin microbiome as shown by an experimental study from Sowada et al., 2014, and predicted its oxidation by naphthalene 1,2-dioxygenase and monooxygenase enzymes from multiple species of Burkholderia, Polaromonas, Pseudomonas, and Ralstonia genera from the skin microbiome. This case study further supports the validity, accuracy, and utility of SkinBug tool. The ingestion of BaP is known to be toxic due to its metabolism by human cytochrome P450 enzyme and the products of its oxidation are known to cause carcinogenicity and other adverse health effects by reacting with the host DNA (Gibson et al., 1975; Vo-Dinh et al., 1987; Zhou et al., 2017). Thus considering the previous findings and prediction results, it is apparent that its contact with skin could also be toxic due to its undesired metabolism by the skin microbiome. Similarly, the prediction of aniline metabolism that is largely present in the tobacco smoke reveals that this can also be metabolized by the microbial species present in the skin microbiome.

Metabolism of Azo dyes by skin microbiome

Azo dyes such as MR are widely used in cosmetics and other products that regularly come in contact with our skin in the form of tattoo ink, hair colors, and textile colors. These dyes are experimentally known to be reduced by azoreductase (Azo1) enzyme found in several species of the skin microbiome (Stingley et al., 2010). SkinBug also predicted the reduction of these dyes by the same metabolic enzyme and species and also predicted some other metabolic enzymes from different bacterial species of skin microbiome with azoreductase activity that could also potentially metabolize the azo dyes.

Discussion

The skin microbiome is capable of undertaking numerous metabolic reactions in addition to that of the human genome (Stingley et al., 2010). The regular contact of skin microbiome with cosmetic agents, pollutants, and topical substances like skincare products and medical ointments, etc., which are part of the skin exposome, creates unprecedented possibilities of their promiscuous metabolism that results in modulation of their efficacy, and occasional toxicity associated with skin rashes and cancer (Chung, 1983; Stingley et al., 2010). To predict all the possible metabolic reactions that can occur to such chemical substances by our skin microbiome, along with the information on respective reaction centers, metabolic enzymes, microbial species carrying these enzymes, and also the skin sites harboring these species, “SkinBug” is proposed to be a unique, reliable and user-friendly tool. One of the key contributions of this study is the construction of the skin microbiome-specific metabolic database using the pangenomes instead of individual genomes that helped in incorporating the metabolic potential of all the strains of a given species. Furthermore, we used only the high-quality pangenomes of 897 species that helped in the construction of the first comprehensive metabolic enzyme database of skin microbiome, which is the integral part of the SkinBug tool for making reliable metabolic predictions. The inclusion of information on bacterial species from 19 different skin sites and various skin-specific niches including anaerobic/aerobic niches and carbon- and sulfur-rich/-limiting niches in the skin microbiome database was very important for comprehensive training and achieving higher prediction accuracy. It was observed that the skin microbiome database had a higher representation of aerobic species because the metagenomic studies so far have been carried out majorly for the aerobic niches than the anaerobic niches due to the difficulty associated with anaerobic niches in sampling, isolation, and cultivation (Sfriso et al., 2020). Another key aspect of SkinBug is the inclusion of all the well-annotated and manually curated reactions of the KEGG database and their primary substrates for the construction of training set, analysis, and modeling. Likewise, the manual curation of database entries for the selected bacterial species and inclusion of only the complete bacterial genomes from NCBI RefSeq database, along with the usage of very strict thresholds, helped in the exclusion of any false-positives that often lead to errors in the predictions. Furthermore, SkinBug exploits the structural and chemical properties of substrates using chemical descriptors, linear fingerprints, and circular fingerprints, and thus is able to predict all the possible reactions for the given molecule. Therefore, it is also well-equipped to address the cases of enzymatic moonlighting and promiscuity because any additional substrate (for additional metabolic reaction) in such cases will also need to have similar structural and chemical properties as the original substrate. These features ensure the accuracy and wide applicability of the SkinBug tool for metabolic prediction of diverse biotic and xenobiotic molecules. The tool has a modular structure, which helps in easy updates of databases and models. Due to the different levels of secretions, environmental exposure, and topography of skin sites, microbiome variability and the consequential metabolic variability across sites are expected that was also observed in this study. Of the 2,523 unique reactions that were found to be occurring in skin microbiome, only 277 were common to all the sites and the rest were specific to the different skin sites. The results pointed toward the need for site-specific metabolic prediction for skin microbiome, and this is one of the key features of SkinBug. It was apparent from the hierarchical clustering of skin sites that the sites with similar physical and physiological properties clustered together. The exposed, dry, and desiccated skin sites such as the arm, foot, and forearm clustered together as they have similar physical and physiological properties such as moisture level, pH, and environmental exposure. Similarly, the sites with high amount of secretions such as diseased sites, face, and skin surface invaginations clustered together. Likewise, the groin and retroauricular crease sites are similar in terms of moisture levels and secretions, and pressure ulcer and venous leg ulcer are both the diseased sites with similar physiology, thus their clustering was also expected and observed in this study. Furthermore, it was intriguing to observe that cheek, perineum, and inner wrist clustered together. The clustering of cheek and perineum was expected because both sites have a lot of secretions by glands such as sebaceous, sweat, and hepatoid, and thus may harbor similar microbiome, which was also reported by other studies (Grice and Segre, 2011). However, the clustering of inner wrist, which is a desiccated and dry site, with check and perineum was surprising. The plausible explanation for it emerges from the recent studies which reported that the dry and desiccated sites harbor a highly dynamic flora, and a phylogenetic diversity that is even higher than the diversity of gut and oral cavity microbiomes of the same individual (Costello et al., 2009; Grice and Segre, 2011). In summary, it was evident that the bacterial species and the corresponding metabolic reactions observed at the different skin sites associates with the physiological and physical properties of the skin sites such as topographical location, different secretions, moisture levels, and environmental exposure (Grice et al., 2009; Grice and Segre, 2011). However, there were some key challenges during the development of SkinBug. The first one was to train the multiclass multilabel classification models from the limited dataset of well-annotated reactions with sufficient accuracy. To overcome this challenge, the integration of both the machine learning and neural network methods, selection of optimum algorithms, and rigorous optimization of selected models helped to achieve the high multiclass multilabel classification accuracy. The highly skewed distribution of metabolic reactions in the six reaction classes and their subclasses constituted the second challenge for optimal training of models. To tackle this challenge, the usage of a modified version of the stratified random sampling that included down-sampling, in addition to the standard stratified random sampling method, helped in effectively splitting of the data into training and testing for efficient predictive modeling. The third challenge was the hierarchical nature of the complete reactions annotated as the four-digit EC number, for which the prediction strategies were also needed to be hierarchical and accurate at each step of the prediction to achieve accurate four-digit EC number prediction. To overcome this challenge, an integrated approach employing machine learning and neural network models to predict the reaction class, followed by construction of six different machine learning models to predict the reaction subclasses, and a molecular similarity search-guided k-nearest neigbors method to predict the complete reactions were incorporated into SkinBug. Comprehensive validation of a prediction tool is important to ensure reliable and accurate predictions, which in the case of SkinBug was carried out using a diverse set of molecules (cosmetic, pharmaceutical, pollutants, natural molecules present on skin, etc.) that includes the cases of aerobic and anaerobic metabolism at various skin sites including the dry, moist, and sebaceous niches. The results of validation were well supported by the experimental studies and reference databases. The accuracy of the predictions was also exemplified by the selected case studies of BaP and azo dyes where SkinBug predicted the correct metabolic reactions, enzymes, and microbial species as known from the experimental studies and reference databases (Gibson et al., 1975; Jiang et al., 2007; Schwarz et al., 2001; Sowada et al., 2014; Zhou et al., 2017). Taken together, the robust methodology, training, comprehensive validations, and their biological significance make SkinBug a very useful, accurate, and reliable method to predict the metabolic reaction, enzyme, microbial species, and the specific skin site for any given molecule. However, for the occurrence of a metabolic reaction, the xenobiotic/biotic molecule needs to come in close vicinity of the active enzyme along with other favorable reaction conditions, and it is tempting to question if the predicted reaction will really occur given these conditions. The plausible biological reasoning to answer this emerges from the fact that bacterial species have an abundance of transporters such as outer membrane-associated β-barrel-containing proteins or porins, which may allow for the transport of biotic/xenobiotic molecules, and thus will facilitate their metabolism by the bacteria. One such example is the TonB-dependent transport system where the outer membrane-associated TonB-dependent transporter (TBDT), and other similar ATP-driven influx transporters, facilitate the active transport and subsequent metabolism of xenobiotic compounds by bacterial cells (Jindal et al., 2019; Samantarrai et al., 2020). The experimental supports to these processes are provided by a few recent studies that showed the metabolism of different biotic/xenobiotic molecules such as BaP, glycoholic acid, cholesterol, glycerol, azo dyes, arginine, triglyceride lipids, propylene glycol, palmitic acid, alpha-tocopherol, uric acid, lactic acid, ethanol amine, and linolenic acid on their incubation with the bacterial species of the skin microbiome (Sowada et al., 2014; Stingley et al., 2010; Timm et al., 2020). The case studies along with biological validations confirm the accuracy and reliability of SkinBug with strong control over false-positives and false-negatives, highlight the potential of SkinBug in revealing the metabolic consequences of undesired metabolism by our skin microbiome, and provide leads for further experimental studies. It is also anticipated to have applications in skin microbiome and exposome studies, in the development of novel diagnostic and therapeutic approaches in dermatology, and in cosmetics industry for developing safer, more effective and population-specific products.

Limitations of the study

The availability of complete and annotated genomes of bacterial species for skin microbiome is one of the limiting factors that determines the size and comprehensiveness of the metabolic database for the skin microbiome. Similarly, machine learning and neural network models are trained on available metabolic reaction information, which currently has a lower representation of Isomerases and Ligases reaction classes. Thus, the availability of more information on genomes and metabolic reactions in the reference databases will further strengthen and broaden the applicability of SkinBug.

Conclusion

Our skin comes in regular contact with many chemical molecules due to pollution or usage of skin care products, thus the prediction and evaluation of metabolism of these molecules by skin microbiome species was much needed for which SkinBug is a valuable contribution. The integrated approach of using machine learning, neural networks, and chemoinformatics along with the database of metabolic enzymes for skin microbiome species constructed in this study helped in prediction of all the possible metabolic reactions that can occur to a given molecule, their respective reaction centers, the metabolic enzymes that can perform the predict reactions, species that harbors these metabolic enzymes, and the skin sites that carry these species. The case studies along with biological validations presented here attest to the accuracy, applicability, and potential of SkinBug. This first state-of-the-art tool to predict the metabolism of a given molecule by the skin microbiome will be very useful for the development of novel therapeutic approaches in dermatology and cosmetics and will also provide leads for future experimental studies.

Resource availability

Lead contact

Further information, requests, and inquiries should be directed to the Lead Contact, Vineet K. Sharma (vineetks@iiserb.ac.in).

Materials availability

This study did not generate new data and only used the data available in public databases mentioned in the manuscript text.

Data and code availability

The published article includes all data used or analyzed during this study.

Methods

All methods can be found in the accompanying Transparent methods supplemental file.
  71 in total

Review 1.  Enzyme promiscuity: a mechanistic and evolutionary perspective.

Authors:  Olga Khersonsky; Dan S Tawfik
Journal:  Annu Rev Biochem       Date:  2010       Impact factor: 23.643

2.  LSTM: A Search Space Odyssey.

Authors:  Klaus Greff; Rupesh K Srivastava; Jan Koutnik; Bas R Steunebrink; Jurgen Schmidhuber
Journal:  IEEE Trans Neural Netw Learn Syst       Date:  2016-07-08       Impact factor: 10.451

Review 3.  The human skin microbiome.

Authors:  Allyson L Byrd; Yasmine Belkaid; Julia A Segre
Journal:  Nat Rev Microbiol       Date:  2018-01-15       Impact factor: 60.633

Review 4.  Lactate metabolism: a new paradigm for the third millennium.

Authors:  L B Gladden
Journal:  J Physiol       Date:  2004-05-06       Impact factor: 5.182

5.  Removal and metabolism of triclosan by three different microalgal species in aquatic environment.

Authors:  Shujuan Wang; Karen Poon; Zongwei Cai
Journal:  J Hazard Mater       Date:  2017-09-05       Impact factor: 10.588

6.  The involvement of CYP3A4 and CYP2C9 in the metabolism of 17 alpha-ethinylestradiol.

Authors:  Bonnie Wang; Rosa I Sanchez; Ronald B Franklin; David C Evans; Su-Er W Huskey
Journal:  Drug Metab Dispos       Date:  2004-08-10       Impact factor: 3.922

7.  The Transformer database: biotransformation of xenobiotics.

Authors:  Michael F Hoffmann; Sarah C Preissner; Janette Nickel; Mathias Dunkel; Robert Preissner; Saskia Preissner
Journal:  Nucleic Acids Res       Date:  2013-12-10       Impact factor: 16.971

Review 8.  Revealing the secret life of skin - with the microbiome you never walk alone.

Authors:  R Sfriso; M Egert; M Gempeler; R Voegeli; R Campiche
Journal:  Int J Cosmet Sci       Date:  2019-12-25       Impact factor: 2.970

9.  Isolation and characterization of diverse microbial representatives from the human skin microbiome.

Authors:  Collin M Timm; Kristin Loomis; William Stone; Thomas Mehoke; Bryan Brensinger; Matthew Pellicore; Phillip P A Staniczenko; Curtisha Charles; Seema Nayak; David K Karig
Journal:  Microbiome       Date:  2020-04-22       Impact factor: 14.650

10.  The University of Minnesota Biocatalysis/Biodegradation Database: improving public access.

Authors:  Junfeng Gao; Lynda B M Ellis; Lawrence P Wackett
Journal:  Nucleic Acids Res       Date:  2009-09-18       Impact factor: 16.971

View more
  4 in total

1.  SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata.

Authors:  Giulia Agostinetto; Davide Bozzi; Danilo Porro; Maurizio Casiraghi; Massimo Labra; Antonia Bruno
Journal:  Database (Oxford)       Date:  2022-05-16       Impact factor: 4.462

Review 2.  Biocosmetics: technological advances and future outlook.

Authors:  Nishu Goyal; Frankline Jerold
Journal:  Environ Sci Pollut Res Int       Date:  2021-11-25       Impact factor: 5.190

3.  Commensal-Related Changes in the Epidermal Barrier Function Lead to Alterations in the Benzo[a]Pyrene Metabolite Profile and Its Distribution in 3D Skin.

Authors:  Lisa Lemoine; Dilan Bayrambey; Alexander Roloff; Christoph Hutzler; Andreas Luch; Tewes Tralau
Journal:  mBio       Date:  2021-09-28       Impact factor: 7.867

Review 4.  Exposome and Skin: Part 1. Bibliometric Analysis and Review of the Impact of Exposome Approaches on Dermatology.

Authors:  Manuel Molina-García; Corinne Granger; Carles Trullàs; Susana Puig
Journal:  Dermatol Ther (Heidelb)       Date:  2022-02-03
  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.