Literature DB >> 33385118

SkinBug: an artificial intelligence approach to predict human skin microbiome-mediated metabolism of biotics and xenobiotics.

Shubham K Jaiswal¹, Shitij Manojkumar Agarwal¹, Parikshit Thodum¹, Vineet K Sharma¹.

Abstract

In addition to being pivotal for the host health, the skin microbiome possesses a large reservoir of metabolic enzymes, which can metabolize molecules (cosmetics, medicines, pollutants, etc.) that form a major part of the skin exposome. Therefore, to predict the complete metabolism of any molecule by skin microbiome, a curated database of metabolic enzymes (1,094,153), reactions, and substrates from ∼900 bacterial species from 19 different skin sites were used to develop "SkinBug." It integrates machine learning, neural networks, and chemoinformatics methods, and displays a multiclass multilabel accuracy of up to 82.4% and binary accuracy of up to 90.0%. SkinBug predicts all possible metabolic reactions and associated enzymes, reaction centers, skin microbiome species harboring the enzyme, and the respective skin sites. Thus, SkinBug will be an indispensable tool to predict xenobiotic/biotic metabolism by skin microbiome and will find applications in exposome and microbiome studies, dermatology, and skin cancer research.

Entities: CellLine Chemical Disease Gene Mutation Species

Keywords: In Silico Biology; Metabolomics; Systems Biology

Year: 2020 PMID： 33385118 PMCID： PMC7772573 DOI： 10.1016/j.isci.2020.101925

Source DB: PubMed Journal: iScience ISSN： 2589-0042

Introduction

After the human gut, it is the skin that harbors the largest and most exposed human-associated microbiome known as the skin microbiome that plays a pivotal role in both health and disease (Grice et al., 2009; Grice and Segre, 2011; Kong and Segre, 2012). Skin is composed of three layers, among which the outermost layers (epidermis and dermis) serve as an elaborate host for building an ecosystem with more than a trillion microbial cells belonging to more than 1,000 microbial species from 19 different phyla (Edwards and Marks, 1995; Grice et al., 2009; Grice and Segre, 2011; Nakatsuji et al., 2013). Several specialized niches such as sebaceous, moist, and dry are found on the skin with differences in moisture level; aerobic, anaerobic, and semi-aerobic conditions; carbon and sulfur availability; secretions; and exogenous environmental factors (Grice and Segre, 2011). The aerobic niches include the skin surface and the epidermis layer, whereas the dermis layer and specifically the invaginations that form folliculo-sebaceous units or roots of the hair follicles and sweat glands form the anaerobic and lipid-rich niches for anaerobic species (Bay et al., 2020; Grice and Segre, 2011). The skin site-specific niches support the growth of diverse bacterial species and result in enormous compositional differences in skin microbiome (Grice and Segre, 2011). The commensal skin microbiome influences the physical characteristics of the epidermis layer and regulates the development of immune system, and a dysbiosis of this community is associated with several skin diseases such as acne, atopic dermatitis, and psoriatic lesions (Byrd et al., 2018; Grice et al., 2009; Kong et al., 2012; Naik et al., 2012; Picardo and Ottaviani, 2014; SanMiguel and Grice, 2015; Zeeuwen et al., 2013; Saxena, 2018). In addition to being crucial for the host health, the skin microbiome also possesses a large and diverse reservoir of metabolic enzymes that play a key role in metabolizing numerous biomolecules naturally produced by either host skin cells or other commensal microorganisms. One such example of a common skin commensal is Staphylococcus epidermidis that can ferment the glycerol present in the skin and can inhibit the growth of Propionibacterium acnes, which causes acne vulgaris (Wang et al., 2014). This gigantic pool of metabolic enzymes can also potentially metabolize the xenobiotic molecules that come in regular contact with our skin such as those present in cosmetics, pharmacological formulations, and the environmental pollutants, which together comprise a large part of the skin exposome (Stingley et al., 2010). Furthermore, due to the presence of both aerobic and anaerobic microbial species in the aerobic and anaerobic niches, the skin microbiome can perform the aerobic or anaerobic metabolism of any biotic/xenobiotic molecule. For example, the metabolism of synthetic azo dyes such as methyl red (MR) is reported by both aerobic and anaerobic bacterial species of skin microbiome that reduce the dye using their azoreductase enzyme (Stingley et al., 2010). The azo dyes are regularly used in cosmetics, tattoo inks, and other products, and the reduction of these dyes due to their undesirable metabolism by the skin microbiome-harbored metabolic enzymes can produce carcinogenic aromatic amines, which poses significant health risks (Chung, 1983; Nakayama et al., 1983). Another example is the aerobic oxidative metabolism of an abundant environment pollutant BaP by monooxygenase enzymes from different bacterial species of skin microbiome (Sowada et al., 2014). The anaerobic metabolism of different skin secretion substances such as triglyceride lipids and secretory proteins by lipases and proteases has also been reported by experimental studies (Byrd et al., 2018). Similarly, the metabolism of glycolic acid, cholesterol, and glycerol by aerobic bacterial species; the metabolism of arginine, triglyceride lipids, propylene glycol, and palmitic acid by anaerobic bacterial species; and the metabolism of alpha-tocopherol, uric acid, lactic acid, ethanolamine, linolenic acid by both aerobic and anaerobic bacterial species of skin microbiome have been reported recently by an experimental study (Timm et al., 2020). However, in most of these cases, the exact metabolic reactions and the metabolic enzymes responsible for the metabolism of these molecules are yet unknown. The skin microbiome-mediated undesired metabolism of commonly used cosmetic products and therapeutic drugs applied on the skin can significantly alter the bioavailability and therapeutic bioactivity of these molecules. Therefore, exploration of the metabolic potential of the enzymes present in the skin microbiome is much needed to understand their impact and role in human health and disease, in exposome studies, to determine the efficacy and bioavailability of the therapeutic and cosmetic molecules applied on the skin, and also to predict their potential toxicity due to the promiscuous metabolism by the skin microbiome. However, the comprehensive experimental metabolic investigation of each molecule separately by skin microbiome through experimental approaches is a very challenging and tedious task because skin microbiome shows the highest longitudinal variability and largest phylogenetic diversity (Grice and Segre, 2011). In this scenario, the development of an efficient computational method for the prediction of metabolism of chemical substances by microbial species present in skin microbiome appears as a promising approach. At present, there exists no tool for the prediction of metabolism of cosmetics or any xenobiotic molecule solely by the skin microbial species. Therefore, we developed a tool named “SkinBug” by integrating machine learning, neural networks, and chemoinformatics methods to predict the metabolic reactions, corresponding reaction centers, metabolic enzymes that can catalyze the predicted reactions, species containing predicted enzymes, and skin sites harboring predicted species for any given biotic/xenobiotic molecule by the skin microbiome. As the metabolic enzymes show promiscuity and are capable of metabolizing structurally similar substrates, the structural and chemical properties of substrate molecules from the known reactions were used as features for predicting the metabolism of biotic or xenobiotic/therapeutic molecules by skin microbiome (Babtie et al., 2010; Hult and Berglund, 2007; Khersonsky et al., 2006; Pandya et al., 2014; Khersonsky and Tawfik, 2010; Sharma, 2017). The predictions from the tool can help to determine the metabolic potential of the skin microbiome, in exposome studies, and to design more efficient skin therapeutic molecules and cosmetic agents by considering their metabolism by skin microbiome.

Results

In this study, we have developed a computational tool named “SkinBug” to predict the metabolic reaction, enzymes, species, and skin sites of the skin microbiome that can potentially metabolize the biotic/xenobiotic molecules by integrating chemoinformatics, machine learning, and neural network methodologies.

Construction of pangenomes for skin microbiome

Several metagenomic studies have explored the bacterial diversity of different skin sites; however, until now there is no single resource available that provides information on the bacterial species present at different skin sites. Therefore, the data and text mining of the available literature was performed to construct the database of bacterial species present at different skin sites. A total of 1,616 unique bacterial species were identified from the 19 different skin sites from literature. Of the 1,616 identified bacterial species, the complete genomes of 897 could be identified at the “NCBI RefSeq” database. The genomes of all the strains were used to construct the pangenome of a particular species. A total of 897 pangenomes were constructed for the human skin microbiome and were used for further analysis.

Transferases are the most abundant enzymes in skin microbiome

Using the ExPASy enzyme database a total of 1,094,153 metabolic enzymes were identified from the 897 pangenomes. Each of these metabolic enzymes was linked to a four-digit EC number based on the associated EC number of its closest homolog. From the distribution of metabolic enzymes across six reaction classes, it was apparent that the enzymes belonging to “Transferases” class were the most abundant in the skin microbiome, whereas “Isomerases” enzymes were the least abundant (Figure S1). The proportions of different reaction class were identified for each of the 19 skin sites (Figure 1A). For all the sites, we observed that the “Transferases” reactions were most abundant followed by the “Oxidoreductases” and “Hydrolases” reactions, perhaps due to the higher abundance of these classes of enzymes in the bacterial pangenomes. The inner wrist, perineum, and cheek had the highest proportions of the “Ligases”_underreactions and the lowest proportions of “Hydrolysis” reactions when compared with the other skin sites. The inner wrist and perineum also have the least proportion of “Oxidoreductases” reactions when compared with the other skin sites.

Figure 1

Diversity of metabolic enzymes on skin sites

(A) Proportions of different metabolic enzymes from six types of reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”) plotted as stacked bar plots across the 19 different skin sites.

(B) Silhouette plot for determining the optimum number of clusters for clustering different skin sites. The y axis is average silhouette width value, and the x axis is the number of clusters. The k-value at which the maximum of average silhouette width is achieved is the optimum number of clusters.

(C) The dendrogram showing the results of hierarchical clustering of the skin sites based on the 2,523 unique reactions that can be performed by the metabolic enzymes of species present in skin microbiome. The approximate unbiased p values (AUp) and the bootstrap probability (BP) values for each branch/cluster were calculated using multiscale bootstrap resampling and using normal bootstrap resampling, respectively. The AUp values are mentioned in red, and the BP values are mentioned in green.

Skin sites are highly variable in enzymatic reactions

Of the 5,430 reactions on KEGG (Kyoto Encyclopedia of Genes and Genomes) database, the complete skin microbiome had enzymes for 2,523 reactions (Kanehisa and Goto, 2000). To identify the similarities and differences in skin sites with respect to the enzymatic reactions, hierarchical clustering was performed and the approximate unbiased p values (AUp) and the bootstrap probability (BP) values for each branch/cluster were calculated. The optimum number of clusters was identified to be two based on the average silhouette method (Figure 1B). Among the two most optimum clusters, the sites cheek, inner wrist, and perineum clustered together, and the rest of the sites formed another cluster (Figure 1C). Moreover, four clades were found to be statistically significant in the hierarchical clustering of skin sites. The first clade was formed by the cheek, inner wrist, and perineum; the second clade was formed by the arm and foot; the third clade was formed by the groin and retroauricular crease; and the fourth clade was formed by pressure ulcer and venous leg ulcer (Figure 1C). It is apparent that the sites with similar secretions, humidity, environmental exposure, or disease status clustered together. Also the total reactions present at each site and the reactions common to the different sites were identified using the matrix layout analysis (Supplementary Text S5). Foot and arm had the highest number of enzymatic reactions, whereas the perineum, inner writs, and cheek had the lowest number of enzymatic reactions (Figure S2). These observations are supported by a previous finding, which showed that dry niches of skin microbiome such as foot and arm have higher phylogenetic diversity than the other niches of skin microbiome (Grice and Segre, 2011). Only 277 reactions of 2,523 were common to all the 19 skin microbiome sites, and many reactions were unique to specific groups of sites. For example, 134 reactions were unique to foot and arm and occur only at these two sites (Figure S2). This suggests that the metabolic potential of different skin microbiome sites is variable, and site-specific metabolism should be considered while developing the tool to predict the metabolism of molecules by skin microbiome.

Metabolic complexity of microbiome

The principal-component analysis (PCA) with 2,322 variables was performed for the 3,769 substrates (Supplementary Text S6 and S7). In the PCA analysis, PC-1 (14.7%) and PC-2 (5.7%), collectively could explain only 20.4% of the variance present in the input dataset (Figure 2A). Furthermore, from the scree and cumulative scree plot it is apparent that more than 1,000 dimensions (PCs) were needed to explain the complete variance (>95%) present in the dataset (Figure 2A). This suggests that the dataset is very diverse and most of the variables add significant variance to the dataset (Figures 2A, S3, and S4).

Figure 2

Evaluating the diversity and complexity of metabolic dataset

(A) The cumulative scree plot and a normal scree plot from the PCA analysis of all the substrate molecules from the complete input dataset based on the 2,322 selected features. The x axis is the principal component number, y axis for the dot plot is the cumulative variance explained by the individual principal components, and the y axis for the bar plot is the percentage of variance explained by the individual principal components.

(B) The density-based clustering of substrate molecules using the principal component PC1 and PC2 from the PCA analysis. The density-based clustering was performed with MinPts value 20 and epsilon value 3.25. Three dense clusters were identified, which are colored as blue for cluster-1, yellow for cluster-2, and gray for cluster-3.

(C) The proportion of different reaction classes in each cluster is shown as pie chart with percentage value labeled for each reaction class. The proportion is calculated based on the number of substrate molecules that can undergo a particular kind of reaction class in that cluster. The pie chart is mentioned for all three above identified clusters: cluster-1, cluster-2, and cluster-3.

Prediction of metabolism by skin microbiome

The complete workflow of the construction of “SkinBug” is mentioned in Figure 3. “SkinBug” provides a comprehensive prediction of the possible metabolic reactions and their respective reaction centers, corresponding metabolic enzymes, species having those enzymes, and skin sites carrying those species. The five key steps involved in the prediction are classified into five modules. The first module predicts the reaction class, the second module predicts reaction subclass, the third module predicts complete reactions, the fourth module predicts the reaction centers in the molecule for the predicted reactions, and the fifth module provides information on the metabolic enzymes, species, and skin sites.

Figure 3

The algorithm and complete workflow for the construction of SkinBug

Module-1: Construction of reaction class-specific prediction models

Two kinds of approaches were used; one is machine learning that will be efficient at learning the class correlations and class discrimination, and the other is artificial neural network (ANN) models that will be efficient at learning the class-specific patterns. For the machine learning-based multiclass multilabel predictions, both the problem transformation (five types) and algorithm adaptation methods were used. For each type of problem transformation seven types of best suited core learners were used. Thus, the performances of a total of 34 different problem transformation models and the two available algorithm adaptation models were compared. To obtain a comprehensive and reliable comparison of the models, the comparison was performed using three types of datasets: (1) boruta selected fingerprints, boruta selected descriptor, and ECFP, (2) boruta selected fingerprints, boruta selected descriptor, and FCFP; and (3) boruta selected fingerprints, boruta selected descriptor, ECFP, and FCFP. The values of multilabel accuracy, multilabel precision, F1 score, and hamming loss for all the models for all the three datasets are plotted as line plot and are mentioned in Figures S7–S9. From the plots it is evident, that only Random forest survival regression classification (RFSRC) algorithm consistently performed better on all the three datasets, and thus, this algorithm was selected for further modeling. The complete dataset of 3,769 substrates was split into a working dataset and a blind dataset with a split ratio of 95:5. Furthermore, the working dataset was split into training and testing dataset using a customized stratified random sampling method (Methods). The RandomForestSRC model was trained on training dataset, and its performance was evaluated on the testing dataset. From the multilabel performance, it is apparent that the model showed a multilabel accuracy of up to 82.4%, multilabel sensitivity of up to 91.8%, multilabel precision of up to 83.3%, F1 score of up to 85.5%, and hamming loss of up to 0.067 in the three types of statistical testing methods (Figure 4A).

Figure 4

The multilabel and binary performance of reaction class-specific RFSRC prediction model

(A) The multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss mentioned for three type of statistical testing of the RFSRC model: stratified randomly sampled test dataset, 5-fold cross-validation, and blind dataset.

(B) The box plot for the performance across each fold of the 5-fold cross-validation for the five different performance measures: multilabel accuracy, multilabel sensitivity, multilabel precision, multilabel F1 score, and hamming loss. The y axis shows the fraction value for each of the performance measure. Fraction values were chosen so that all the measures including hamming loss could be plotted on the same y axis scale.

(C) The binary performance of the reaction class prediction RFSRC model for each of six different reaction classes (“Oxidoreductases,” “Hydrolases,” “Transferases,” “Lyases,” “Isomerases,” and “Ligases”). Fraction values were chosen so that all the measures could be shown on the same scale.

AUC = area under the curve, MMCE = binary mean misclassification error, FNR = binary false-negative rate, FPR = binary false-positive rate, ACC = binary accuracy, MCC = Matthews correlation coefficient, NPV = binary negative predictive value, PPV = binary positive predicted value, F1 = binary F1 score, FDR = binary false discovery rate, GPR = geometric mean of binary precision and binary recall.

Module-2: Construction of reaction subclass-specific prediction models

As one molecule could undergo multiple types of reactions belonging to different reaction subclasses, the multiclass multilabel prediction models were constructed for each type of reaction class to predict the reaction subclasses. The working dataset was split into six datasets, one for each reaction class, and these datasets were utilized for the construction of reaction subclass-specific prediction models (Methods). The previously optimized RFSRC algorithm was used to construct six different multilabel prediction models specific to the six types of reaction classes. The multiclass multilabel performance on testing dataset for RFSRC model for each reaction class is mentioned in Table 1, and for 5-fold cross-validation on training dataset is mentioned in Table 2. The RFSRC models showed the multilabel accuracy of 61.0–74.4%, multilabel sensitivity of 63.7%–77.7%, multilabel precision of 75.0%–91.4%, F1 score of 63.9%–76.5%, and hamming loss of 0.093–0.019 on the test dataset. The binary performance of RFSRC models for each of the corresponding reaction subclass for each reaction class for 5-fold cross-validation is mentioned in Tables S3–S8, and on test dataset, is mentioned in Tables S9–S14. After the different reaction classes are predicted by the reaction class prediction models, the prediction of reaction subclass was performed for each of the predicted reaction class using the respective reaction class-specific RFSRC models.

Table 1

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using stratified random sampling

Reaction class	Accuracy (%)	Sensitivity (%)	Precision (%)	F1 score (%)	Hamming loss
Oxidoreductases	73.9	77.7	88.6	76.5	0.019
Transferases	61.0	63.7	83.3	63.9	0.072
Hydrolases	72.4	76.8	79.2	74.8	0.04
Lyases	74.4	76.7	85.6	76.2	0.058
Isomerases	72.2	73.6	91.4	73.1	0.06
Ligases	72.0	72.0	75.0	72.0	0.093

All the parameters mentioned above were calculated using multilabel evaluation methodology (see Tables S9–S14).

Table 2

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using 5-fold cross-validation

Reaction class	Accuracy (%)	Sensitivity (%)	Precision (%)	F1 score (%)	Hamming loss
Oxidoreductases	61.5	64.3	82.8	64.9	0.036
Transferases	56.3	59.6	80.8	60.0	0.09
Hydrolases	63.1	66.3	80.5	66.3	0.052
Lyases	64.4	67.1	77.7	67.1	0.084
Isomerases	65.2	67.7	82.5	67.0	0.089
Ligases	77.7	80.0	84.5	79.9	0.075

All the parameters mentioned above were calculated using multilabel evaluation methodology (see also Tables S3–S8).

Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using stratified random sampling All the parameters mentioned above were calculated using multilabel evaluation methodology (see Tables S9–S14). Multiclass multilabel performance metrics for the reaction subclass-specific models calculated using 5-fold cross-validation All the parameters mentioned above were calculated using multilabel evaluation methodology (see also Tables S3–S8).

Module-3: Construction of complete reaction prediction models

The molecular similarity search was performed against all the predicted reaction subclasses from each of the predicted reaction class. The molecular similarity search measure known as Tanimoto coefficient was calculated, and the subclasses that had substrates with Tanimoto coefficient value of >0.80 were considered further. This step was essential to filter out the false-positive predictions from the previous steps. The k-nearest neighbors (KNN) were identified in the reaction subclasses that qualify the aforementioned criteria using the KNN method. In this approach, the molecular similarity searching and KNN methods together allow for identifying the best structural and chemical homolog of the input molecule. Thus, the complete reactions of all the molecules that qualify the molecular similarity searching and KNN method were assigned to the given molecule. This way all the complete metabolic reactions that could potentially occur to a given molecule were identified.

Module-4: Prediction of reaction centers

The reaction centers on the given molecule for all complete reactions predicted in the abovementioned steps are identified using the RDM (Reaction center, Difference region, and Matched region) patterns database. The RDM patterns are derived from the structure alignments of the substrates and contain the information about the KEGG atom type changes at the reaction center, matched region of the molecules, and the dissimilar region of the molecule (Yamanishi et al., 2009). The KEGG reaction-class pairs (RC pairs) were identified for each of the predicted complete reactions, and the corresponding RDM patterns were extracted from the RDM pattern database constructed in this study. Each of the RDM pattern is applied to the given molecule, and the respective reaction center is identified using the in-house python script while taking into account the atoms types, their valancies, and bonding information about the molecule.

Module-5: Prediction of metabolic enzymes, species, and skin sites

In this module, for each of the predicted complete reaction, the metabolic enzymes that could potentially perform those reactions, the corresponding species that harbors those metabolic enzymes, and all the skin sites that carry those species are retrieved from the in-house constructed skin microbiome specific metabolic database that contains information on 1,094,153 metabolic enzymes from 897 species pangenomes of skin microbiome from 19 skin sites.

The development of SkinBug web server

For easy and user-friendly usage of the SkinBug approach, a user-friendly web server tool was developed and is available at http://metagenomics.iiserb.ac.in/skinbug. On the web server, a user can upload a mol/sdf file of a molecule, or can provide the PubChem ID of a molecule. For a novel molecule, these files can also be prepared by drawing its structure in any of the molecular editor software, e.g., ChemDraw. The complete workflow of the web server-based predictions using SkinBug is mentioned in Figure 6, and the other details of the web server tool are mentioned in Supplementary Text S8.

Figure 6

The prediction steps and complete workflow for processing of any input molecule by the SkinBug web server

The prediction steps and complete workflow for processing of any input molecule by the SkinBug web server The results from the SkinBug tool are displayed as results sections: R1 to R6. All the possible reaction classes are provided in R1; reaction subclass in R2; complete reactions in R3; the RC pair, RDM pattern, and predicted reaction centers in R4; and the predicted skin sites in R5. The complete reaction annotated as a four-digit EC number, corresponding metabolic enzyme, species harboring the enzyme, and the homology parameters for enzyme annotations (percent identity, e-value, query coverage, subject coverage) are mentioned in R6. All the results could also be downloaded as a single text file by clicking on the “Download Results” icon.

Validation of the SkinBug tool

This is the first tool that can predict the enzymatic biotransformation of any biotic or xenobiotic molecule by the skin microbiome, and thus a direct comparison with any existing state-of-the-art tool was not possible. The biological validation of SkinBug was performed on diverse molecules including natural molecules present on skin, cosmetics, pharmaceuticals, etc. that come in regular contact with our skin, and their metabolism is known from the experimental studies (Table 3). A total of 28 diverse molecules selected for which information about their metabolism by human host or microbial species of skin microbiome through experimental studies is available in literature were evaluated by SkinBug and presented as case studies. This validation set also included the examples of molecules that can undergo the aerobic and/or anaerobic metabolism. The comparison of SkinBug predictions was performed with the metabolism information known from previous experimental studies, and information available from three reference databases, namely, EAWAG-BBD (The University of Minnesota Biocatalysis/Biodegradation Database), Transformer (Metabolism of Xenobiotics Database), and SMPDB (The Small Molecule Pathway Database) (Gao et al., 2010; Hoffmann et al., 2014; Jewison et al., 2014). The prediction results for these 28 molecules along with the information from literature are mentioned in Table 3. Some examples from the comparative analysis with the reference databases are provided below.

Table 3

The prediction of metabolism of the 28 selected molecules by SkinBug along with their known metabolism from literature

Sr. No.	Compound	Function	Reaction subclass from literature	Reaction subclass from SkinBug	Enzyme from literature	Enzyme from SkinBug	Skin microbial genus from literature	Skin microbes genus from SkinBug	References
1	Cinnamyl alcohol	Perfuming, Masking	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acyltransferases	Alcohol dehydrogenase; cytochrome P450	Cinnamyl alcohol dehydrogenase; acetyl CoA benzyl alcohol acetyltransferase	NA	No genus predicted	(Jäckh et al., 2012; Walker et al., 2013)
2	Leucine	Antistatic, hair conditioning, skin conditioning	Transferring nitrogenous groups	Acting on the CH-NH2 group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acyltransferases; transferring nitrogenous groups; acting on carbon-nitrogen bonds, other than peptide bonds; intramolecular transferases; forming carbon-oxygen bonds	Branched chain amino acid aminotransferase	Isoleucine N-monooxygenase; valine N-monooxygenase; leucine N-acetyltransferase; branched chain amino acid transferase; leucine transaminase; leucine 2,3-aminomutase; leucine tRNA ligase; etc.	Corynebacterium; Staphylococcus	Zymomonas; Pseudomonas; Burkholderia; Streptomyces; Staphylococcus; Yersinia; Corynebacterium; etc.	(Fredrich et al., 2013; Holeček, 2018)
3	Glycerol	Perfuming, denaturant, humectant, solvent, hair conditioning, viscosity controlling	Acting on the CH-OH group of donors; transferring phosphorus-containing groups	Acting on the CH-OH group of donors; glycosyltransferases; transferring phosphorus-containing groups; acting on ester bonds; glycosylases; carbon-carbon lyases; carbon-oxygen lyases	Glycerol kinase; glycerol dehydrogenase	Glycerol kinase; glycerol dehydrogenase; 1,2-alpha-glucosylglycerol phosphorylase; acylglycerol lipase; glycerol phosphatase; dihydroxy acid dehydratase; etc.	Staphylococcus	Zymomonas; Xylella; Xanthomonas; Gluconobacter; Geobacillus; Staphylococcus; Streptococcus; etc.	(Fredrich et al., 2013; Ruzheinikov et al., 2001; Xue et al., 2017)
4	Benzo(a)pyreneb	Environmental pollutant found in soot, tobacco smoke, diesel exhaust etc.	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen	Cytochrome P450: CYP1A1, monooxygenases, dioxygenases	Naphthalene 1,2-dioxygenase; unspecific monooxygenases; steroid 21-monooxygenase; etc.	Pseudomonas; Micrococcus; Bacillus; etc.	Pseudomonas; Bacillus; Burkholderia; Polaromonas; Ralstonia	(Ahmad and Mukhtar, 2004)
5	Retinoic acid	Antiseborrheic	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donors; Glycosyltransferases	Cytochrome P450: CYP1A1	Unspecific monooxygenase; alkane monooxygenase; retinal dehydrogenase; aldehyde oxidase; glucuronosyl transferase; etc.	NA	Escherichia; Pseudomonas; Alcanivorax; Methylobacterium; Jonesia; etc.	(Ahmad and Mukhtar, 2004)
6	7,12-Dimethylbenz(a)anthracene	Environmental pollutant found in tobacco smoke	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen	Cytochrome P450: CYP1A1, cytochrome P450: CYP1B1	Unspecific monooxygenase; aromatase; estradiol 6-beta monooxygenase	NA	No genus predicted	(Ahmad and Mukhtar, 2004)
7	Vitamin D3	Skin conditioning	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on the CH-CH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen	Cytochrome P450	Vitamin D3,24-hydroxylase; calcidiol 1-monooxygenase; vitamin D1, 25-hydroxylase	NA	Nocardia; Verrucosispora; Stackebrandtia	(Ahmad and Mukhtar, 2004)
8	4-Allylanisole	Perfuming	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups	Cytochrome P450	4-(hydroxymethyl) benzenesulfonate dehydrogenase; (iso) eugenol O-methyltransferase; trans-anol O-methyltransferase	NA	No genus predicted	(Jäckh et al., 2012)
9	Isoeugenol	Perfuming, masking, drug: local antiseptic and analgesic	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups	Cytochrome P450, cytochrome P450: CYP2E1, alcohol dehydrogenase	Alcohol dehydrogenase; isoeugenol synthase; salicylate 1-monooxygenase; 4-hydroxyphenylacetate 3-monooxygenase; catechol O-methyltransferase; etc.	NA	Thermus; Zymomonas; Staphylococcus; Pseudomonas; Bacillus; Acinetobacter; etc.	(Jäckh et al., 2012)
10	4-Phenylenediamine	Hair dyeing	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on other nitrogenous compounds as donors; acting on paired donors, with incorporation or reduction of molecular oxygen; carbon-carbon lyases	Cytochrome P450	Nitrobenzene nitroreductase; histidine decarboxylase	NA	Vibrio; Raoultella; Pseudomonas; Ralstonia; Variovorax; etc.	(Jäckh et al., 2012)
11	Anilineb	Outdoor air, tobacco smoke	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on other nitrogenous compounds as donors; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups; acyltransferases; glycosyltransferases; transferring sulfur-containing groups; acting on carbon-nitrogen bonds, other than peptide bonds; carbon-carbon lyases; forming carbon-nitrogen bonds	Cytochrome P450	Phenol 2-monooxygenase; azobenzene reductase; arylamine N-acetyltransferase; arylamine glucosyltransferase; amine sulfotransferase; aryl acylamidase; aminobenzoate decarboxylase; gamma-glutamylanilide synthase; etc.	NA	Bacillus; Staphylococcus; Mycobacterium	(Jäckh et al., 2012)
12	Methyl red or azo dyea	Textile dyes, tattoo inks, and cosmetic colorant	Acting on other nitrogenous compounds as donors	Acting on the CH-CH group of donors; acting on other nitrogenous compounds as donors; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on paired donors, with incorporation or reduction of molecular oxygen	Azoreductase (Azo1)	FMN-dependent NADPH azoreductase (Azo1); NAD(P)H-dependent oxidoreductase; 4-(dimethylamino)phenylazoxybenzene reductase; azobenzene reductase	Staphylococcus; Corynebacterium; Micrococcus; Dermacoccus; Kocuria	Staphylococcus; Bacillus	(Stingley et al., 2010)
13	Fluorouracil	Used in actinic keratosis and skin warts conditions	Glycosyltransferases; acting on the CH-CH group of donors	Glycosyltransferase; acting on the CH-CH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen; acting on CH or CH₂ groups; glycosylases; acting on carbon-nitrogen bonds, other than peptide bonds	Orotate phosphoribosyltransferase; uridine phosphorylase; thymidine phosphorylase; dihydropyrimidine dehydrogenase	Orotate phosphoribosyltransferase; uridine phosphorylase; thymidine phosphorylase; dihydropyrimidine dehydrogenase; unspecific monooxygenase	NA	Yersinia; Wolinella; Vibrio; Tolumonas; Shigella; Aeromonas; Xylella; etc.	(Amirfallah et al., 2018; Longley et al., 2003)
14	Propylene glycol	Used as demulcent and in skin lotions and ointments	Acting on the CH-OH group of donors	Acting on the CH-OH group of donors; carbon-carbon lyases; carbon-oxygen lyases	Alcohol dehydrogenases	L-glycol dehydrogenase; aldehyde reductase; glycerol dehydrogenase; lactaldehyde reductase; propanediol dehydratase; etc.	NA	Salmonella; Escherichia; Geobacillus; Citrobacter; Klebsiella; etc.	(Ewaschuk et al., 2005; Kraut and Kurtz, 2008)
15	Hydroquinone	Demelanizing agent	Acting on single donors with incorporation of molecular oxygen; glycosyltransferases	Acting on single donors with incorporation of molecular oxygen (oxygenases); acting on paired donors, with incorporation or reduction of molecular oxygen; glycosyltransferases; transferring alkyl or aryl groups, other than methyl groups; glycosylases; carbon-carbon lyases; etc.	Dioxygenases; quinone oxidoreductase	Catechol 1 2-dioxygenase; hydroquinone 1 2-dioxygenase; toluene 4-monooxygenase; p-benzoquinone reductase; 4-hydroxybenzoate 1-hydroxylase; arylesterase; hydroquinone glucosyltransferase; etc.	NA	Pseudomonas; Bacillus; Escherichia; Acinetobacter; etc.	(McDonald et al., 2001; Zhang et al., 2012)
16	Para-aminobenzoic acid	Used as sunscreen and melanizing agent. Also used in treatment of fibrotic skin disorders	Transferring alkyl or aryl groups, other than methyl groups	Transferring alkyl or aryl groups, other than methyl groups; acting on paired donors, with incorporation or reduction of molecular oxygen; transferring one-carbon groups; carbon-carbon lyases; etc.	Dihydropteroate synthase	Benzoate 1 2-dioxygenase; phenylethanolamine N-methyltransferase; dihydropteroate synthase; aminobenzoate decarboxylase; etc.	NA	Yersinia; Vibrio; Streptococcus; Shigella; Streptomyces; etc.	(Wegkamp et al., 2007)
17	Benzophenone	Used as organic sunscreen and melanizing agent. Also used as a fragrance enhancer	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors	Cytochrome P450 (CYPs)	Nitrilotriacetate monooxygenase	NA	Sorangium; Anabaena; Burkholderia; Sorangium; etc.	(Watanabe et al., 2015)
18	Lindane	Used in scabies and pediculosis skin conditions	Acting on halide bonds; acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on halide bonds; acting on paired donors, with incorporation or reduction of molecular oxygen; etc.	Dehydrochlorinase enzyme	Haloalkane dehalogenase	NA	Sphingobium; Psychrobacter; Phenylobacterium; Caulobacter; etc.	(Macholz and Kujawa, 1985; Nagata et al., 1993; Tanaka et al., 1979)
19	Ethinyl estradiol	Used for the treatment of moderate acne vulgaris (common acne) in females	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; glycosyltransferases; etc.	Cytochrome P450 (CYPs)	Unspecific monooxygenase; aromatase; 3-beta-hydroxy-delta-5-steroid dehydrogenase; etc.	NA	No genus predicted	(Stimmel et al., 1951; Wang et al., 2004)
20	Docosanol	Used in the treatment of herpes virus infection-caused recurring episodes of small, painful, fluid-filled blisters on the skin	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on the CH-OH group of donors; acting on paired donors, with incorporation or reduction of molecular oxygen	NA	Hexadecanol dehydrogenase; long-chain-alcohol dehydrogenase; naphthalene 1 2-dioxygenase; etc.	NA	Geobacillus; Zymomonas; Staphylococcus; Bacillus; etc.	(Pope et al., 1996)
21	Lauric acid	Used on skin for its antibacterial properties and ability to effectively combat acne	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donors	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the aldehyde or oxo group of donors; acting on ester bonds	Fatty acid beta-oxidation enzymes such as acyl CoA dehydrogenase, hydrolase, etc.	Trimethyllysine dioxygenase; aldehyde dehydrogenase; oleoyl-(acyl-carrier-protein) hydrolase; clavaminate synthase; etc.	NA	Bacillus; Mycobacterium; Escherichia; Thermus; etc.	(Dayrit, 2015)
22	Methyl lactateb	Used as a soothing and cooling agent for skin. Also to provide relief against itching and irritation on skin	Acting on the CH-OH group of donors	Acting on the CH-OH group of donors; acting on the aldehyde or oxo group of donors; acyltransferases; etc.	Lactate dehydrogenase	L-lactate dehydrogenase; glyoxylate reductase; oxaloglycolate reductase (decarboxylating); pyruvate synthase; formate C-acetyltransferase; etc.	Staphylococcus; Micrococcus; etc.	Staphylococcus; Vibrio; Micrococcus; Shewanella; etc.	(Gladden, 2004; Lam et al., 2018)
23	Methylparabenb	Used in cosmetic products as preservative to give products a longer shelf life	Glycosyltransferases; acting on ester bonds; acting on paired donors, with incorporation or reduction of molecular oxygen	Glycosyltransferases; acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; etc.	Glucuronosyltransferase	Cyanohydrin beta-glucosyltransferase; 4-hydroxyphenylacetaldehyde oxime monooxygenase; 4-hydroxybenzoate 1-hydroxylase; 3-phenylpropanoate dioxygenase; etc.	NA	Pseudomonas; Escherichia; Bacillus; Mycobacterium; etc.	(Abbas et al., 2010; Moos et al., 2016)
24	Triclosan	Used as antiseptic and antibacterial on the skin	Acting on paired donors, with incorporation or reduction of molecular oxygen; glycosyltransferases	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on the CH-CH group of donors	Sulfur oxygenase/reductase; cytochrome P450 (CYPs); glucuronosyltransferase	Sulfur oxygenase/reductase; carbazole 1,9a-dioxygenase; naphthalene 1,2-dioxygenase; persulfide dioxygenase; etc.	NA	Pseudomonas; Bacillus; Streptococcus; Yersinia; etc.	(Fang et al., 2016; Wang et al., 2018)
25	Terbinafineb	Used to treat the fungal infections on skin	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen	Cytochromes (CYPs)	Ammonia monooxygenase	NA	Nitrosomonas; Nitrosospira	(Vickers et al., 1999)
26	Alpha-tocopherolb	Used in cosmetic products to prevent UV damage to the skin	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; transferring one-carbon groups	Cytochromes (CYPs)	Validamycin A dioxygenases; aurachin C monooxygenase/isomerase; calcidiol 1-monooxygenase; thymidylate synthase; etc.	Deinococcus; Stenotrophomonas	Deinococcus; Stenotrophomonas; Zymomonas; Yersinia; etc.	(Johnson et al., 2013; Timm et al., 2020)
27	Cholesterolb	Used as an emollient in cosmetic products such as eye makeup, face makeup, skin lotions, creams, and hair care formulations	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on the CH-OH group of donors; acting on the CH-CH group of donors; intramolecular oxidoreductases	Sterol 27-hydroxylase	Cholesterol 25-hydroxylase; vitamin D 25-hydroxylase; 11-beta-hydroxysteroid dehydrogenase; 3-beta-hydroxysteroid 3-dehydrogenase; etc.	Streptococcus; Bacillus; Aerococcus	Streptococcus; Bacillus; Mycobacterium; Aeromonas; etc.	(Iuliano, 2011; Timm et al., 2020)
28	Linoleic acid	Used in the conditions of skin irritation and to reduce acne breakouts	Acting on paired donors, with incorporation or reduction of molecular oxygen	Acting on paired donors, with incorporation or reduction of molecular oxygen; acting on single donors with incorporation of molecular oxygen (oxygenases); acting on ester bonds	Lipoxygenase; delta-6-desaturase	Linoleate 13S-lipoxygenase; linoleate 11-lipoxygenase; acyl-CoA 6-desaturase; palmitoyl-CoA hydrolase; etc.	Paracoccus; Stenotrophomonas; Brevundimonas; Staphylococcus; etc.	Paracoccus; Stenotrophomonas; Brevundimonas; Staphylococcus; etc.	(Brown et al., 2000; Gardner, 1970; Timm et al., 2020)

Threshold for the subclass prediction models of “Oxidoreductases” was set from 0.5 to 0.1, and for “Transferases,” “Hydrolases,” and “Lyases” the thresholds were set from 0.5 to 0.2. The Tanimoto coefficient threshold was set to 0.5.

Tanimoto coefficient threshold was set to 0.5 or 0.3.

The prediction of metabolism of the 28 selected molecules by SkinBug along with their known metabolism from literature Threshold for the subclass prediction models of “Oxidoreductases” was set from 0.5 to 0.1, and for “Transferases,” “Hydrolases,” and “Lyases” the thresholds were set from 0.5 to 0.2. The Tanimoto coefficient threshold was set to 0.5. Tanimoto coefficient threshold was set to 0.5 or 0.3. For the metabolism of hydroquinone molecule (used as a demelanizing agent) SkinBug predicted monooxygenases, dioxygenases, hydroxylases, and arylesterases, and the reference databases also indicated the same four classes of enzymes for its metabolism. Likewise, for terbinafine (used to treat fungal infections on skin) and fluorouracil (used in actinic keratosis and skin warts conditions), the reference databases suggested their metabolism by human liver cytochromes, and SkinBug also predicted metabolic enzymes from skin microbiome with similar activity such as monooxygenases. For para-aminobenzoic acid, which is used as melanizing agent and for the treatment of different skin disorders, both SkinBug and reference databases suggested benzoate dioxygenases for its metabolism. Similarly, for methylparaben (used in cosmetic products), which is metabolized by 4-hydroxybenzoate 1-hydroxylase and monooxygenases as per the reference databases, SkinBug also predicted the same enzymes from skin microbiome for its metabolism. It is noteworthy that for some molecules such as cinnamyl alcohol, BaP, fluorouracil, and 7,12-dimethylbenz(a)anthracene molecules, SkinBug only predicted those reactions that are known in literature without any false-positives confirming to its robustness. However, the strength of SkinBug lies in the fact that for the other molecules as mentioned in Table 3, in addition to correctly predicting the known reactions it also predicted additional metabolic reactions, which seems correct as per the reactive functional groups present on those molecules, along with the implicated microbial enzymes and species. Thus, the predictions of SkinBug corroborated with the metabolic information available on these reference databases and experimental studies (Table 3), confirming to the accuracy and reliability of the predictions.

Metabolism of Benzo(a)pyreme by skin microbiome

BaP is an abundant environmental pollutant found in almost all types of soot and smoke generated by the incomplete combustion of fossil fuel, coal, and other biomass including tobacco. The metabolism of this polycyclic aromatic hydrocarbon molecule by oxidation reaction through cytochrome P450 enzyme from human host and monooxygenases and dioxygenase enzymes from bacterial species of skin microbiome is known from experimental studies, and reference databases such as the University of Minnesota Biocatalysis/Biodegradation Database (Gelboin, 1980; Gibson et al., 1975; Jiang et al., 2007; Schwarz et al., 2001; Sowada et al., 2014). For this molecule, SkinBug correctly predicted its metabolism by skin microbiome as shown by an experimental study from Sowada et al., 2014, and predicted its oxidation by naphthalene 1,2-dioxygenase and monooxygenase enzymes from multiple species of Burkholderia, Polaromonas, Pseudomonas, and Ralstonia genera from the skin microbiome. This case study further supports the validity, accuracy, and utility of SkinBug tool. The ingestion of BaP is known to be toxic due to its metabolism by human cytochrome P450 enzyme and the products of its oxidation are known to cause carcinogenicity and other adverse health effects by reacting with the host DNA (Gibson et al., 1975; Vo-Dinh et al., 1987; Zhou et al., 2017). Thus considering the previous findings and prediction results, it is apparent that its contact with skin could also be toxic due to its undesired metabolism by the skin microbiome. Similarly, the prediction of aniline metabolism that is largely present in the tobacco smoke reveals that this can also be metabolized by the microbial species present in the skin microbiome.

Metabolism of Azo dyes by skin microbiome

Azo dyes such as MR are widely used in cosmetics and other products that regularly come in contact with our skin in the form of tattoo ink, hair colors, and textile colors. These dyes are experimentally known to be reduced by azoreductase (Azo1) enzyme found in several species of the skin microbiome (Stingley et al., 2010). SkinBug also predicted the reduction of these dyes by the same metabolic enzyme and species and also predicted some other metabolic enzymes from different bacterial species of skin microbiome with azoreductase activity that could also potentially metabolize the azo dyes.

Discussion

The skin microbiome is capable of undertaking numerous metabolic reactions in addition to that of the human genome (Stingley et al., 2010). The regular contact of skin microbiome with cosmetic agents, pollutants, and topical substances like skincare products and medical ointments, etc., which are part of the skin exposome, creates unprecedented possibilities of their promiscuous metabolism that results in modulation of their efficacy, and occasional toxicity associated with skin rashes and cancer (Chung, 1983; Stingley et al., 2010). To predict all the possible metabolic reactions that can occur to such chemical substances by our skin microbiome, along with the information on respective reaction centers, metabolic enzymes, microbial species carrying these enzymes, and also the skin sites harboring these species, “SkinBug” is proposed to be a unique, reliable and user-friendly tool. One of the key contributions of this study is the construction of the skin microbiome-specific metabolic database using the pangenomes instead of individual genomes that helped in incorporating the metabolic potential of all the strains of a given species. Furthermore, we used only the high-quality pangenomes of 897 species that helped in the construction of the first comprehensive metabolic enzyme database of skin microbiome, which is the integral part of the SkinBug tool for making reliable metabolic predictions. The inclusion of information on bacterial species from 19 different skin sites and various skin-specific niches including anaerobic/aerobic niches and carbon- and sulfur-rich/-limiting niches in the skin microbiome database was very important for comprehensive training and achieving higher prediction accuracy. It was observed that the skin microbiome database had a higher representation of aerobic species because the metagenomic studies so far have been carried out majorly for the aerobic niches than the anaerobic niches due to the difficulty associated with anaerobic niches in sampling, isolation, and cultivation (Sfriso et al., 2020). Another key aspect of SkinBug is the inclusion of all the well-annotated and manually curated reactions of the KEGG database and their primary substrates for the construction of training set, analysis, and modeling. Likewise, the manual curation of database entries for the selected bacterial species and inclusion of only the complete bacterial genomes from NCBI RefSeq database, along with the usage of very strict thresholds, helped in the exclusion of any false-positives that often lead to errors in the predictions. Furthermore, SkinBug exploits the structural and chemical properties of substrates using chemical descriptors, linear fingerprints, and circular fingerprints, and thus is able to predict all the possible reactions for the given molecule. Therefore, it is also well-equipped to address the cases of enzymatic moonlighting and promiscuity because any additional substrate (for additional metabolic reaction) in such cases will also need to have similar structural and chemical properties as the original substrate. These features ensure the accuracy and wide applicability of the SkinBug tool for metabolic prediction of diverse biotic and xenobiotic molecules. The tool has a modular structure, which helps in easy updates of databases and models. Due to the different levels of secretions, environmental exposure, and topography of skin sites, microbiome variability and the consequential metabolic variability across sites are expected that was also observed in this study. Of the 2,523 unique reactions that were found to be occurring in skin microbiome, only 277 were common to all the sites and the rest were specific to the different skin sites. The results pointed toward the need for site-specific metabolic prediction for skin microbiome, and this is one of the key features of SkinBug. It was apparent from the hierarchical clustering of skin sites that the sites with similar physical and physiological properties clustered together. The exposed, dry, and desiccated skin sites such as the arm, foot, and forearm clustered together as they have similar physical and physiological properties such as moisture level, pH, and environmental exposure. Similarly, the sites with high amount of secretions such as diseased sites, face, and skin surface invaginations clustered together. Likewise, the groin and retroauricular crease sites are similar in terms of moisture levels and secretions, and pressure ulcer and venous leg ulcer are both the diseased sites with similar physiology, thus their clustering was also expected and observed in this study. Furthermore, it was intriguing to observe that cheek, perineum, and inner wrist clustered together. The clustering of cheek and perineum was expected because both sites have a lot of secretions by glands such as sebaceous, sweat, and hepatoid, and thus may harbor similar microbiome, which was also reported by other studies (Grice and Segre, 2011). However, the clustering of inner wrist, which is a desiccated and dry site, with check and perineum was surprising. The plausible explanation for it emerges from the recent studies which reported that the dry and desiccated sites harbor a highly dynamic flora, and a phylogenetic diversity that is even higher than the diversity of gut and oral cavity microbiomes of the same individual (Costello et al., 2009; Grice and Segre, 2011). In summary, it was evident that the bacterial species and the corresponding metabolic reactions observed at the different skin sites associates with the physiological and physical properties of the skin sites such as topographical location, different secretions, moisture levels, and environmental exposure (Grice et al., 2009; Grice and Segre, 2011). However, there were some key challenges during the development of SkinBug. The first one was to train the multiclass multilabel classification models from the limited dataset of well-annotated reactions with sufficient accuracy. To overcome this challenge, the integration of both the machine learning and neural network methods, selection of optimum algorithms, and rigorous optimization of selected models helped to achieve the high multiclass multilabel classification accuracy. The highly skewed distribution of metabolic reactions in the six reaction classes and their subclasses constituted the second challenge for optimal training of models. To tackle this challenge, the usage of a modified version of the stratified random sampling that included down-sampling, in addition to the standard stratified random sampling method, helped in effectively splitting of the data into training and testing for efficient predictive modeling. The third challenge was the hierarchical nature of the complete reactions annotated as the four-digit EC number, for which the prediction strategies were also needed to be hierarchical and accurate at each step of the prediction to achieve accurate four-digit EC number prediction. To overcome this challenge, an integrated approach employing machine learning and neural network models to predict the reaction class, followed by construction of six different machine learning models to predict the reaction subclasses, and a molecular similarity search-guided k-nearest neigbors method to predict the complete reactions were incorporated into SkinBug. Comprehensive validation of a prediction tool is important to ensure reliable and accurate predictions, which in the case of SkinBug was carried out using a diverse set of molecules (cosmetic, pharmaceutical, pollutants, natural molecules present on skin, etc.) that includes the cases of aerobic and anaerobic metabolism at various skin sites including the dry, moist, and sebaceous niches. The results of validation were well supported by the experimental studies and reference databases. The accuracy of the predictions was also exemplified by the selected case studies of BaP and azo dyes where SkinBug predicted the correct metabolic reactions, enzymes, and microbial species as known from the experimental studies and reference databases (Gibson et al., 1975; Jiang et al., 2007; Schwarz et al., 2001; Sowada et al., 2014; Zhou et al., 2017). Taken together, the robust methodology, training, comprehensive validations, and their biological significance make SkinBug a very useful, accurate, and reliable method to predict the metabolic reaction, enzyme, microbial species, and the specific skin site for any given molecule. However, for the occurrence of a metabolic reaction, the xenobiotic/biotic molecule needs to come in close vicinity of the active enzyme along with other favorable reaction conditions, and it is tempting to question if the predicted reaction will really occur given these conditions. The plausible biological reasoning to answer this emerges from the fact that bacterial species have an abundance of transporters such as outer membrane-associated β-barrel-containing proteins or porins, which may allow for the transport of biotic/xenobiotic molecules, and thus will facilitate their metabolism by the bacteria. One such example is the TonB-dependent transport system where the outer membrane-associated TonB-dependent transporter (TBDT), and other similar ATP-driven influx transporters, facilitate the active transport and subsequent metabolism of xenobiotic compounds by bacterial cells (Jindal et al., 2019; Samantarrai et al., 2020). The experimental supports to these processes are provided by a few recent studies that showed the metabolism of different biotic/xenobiotic molecules such as BaP, glycoholic acid, cholesterol, glycerol, azo dyes, arginine, triglyceride lipids, propylene glycol, palmitic acid, alpha-tocopherol, uric acid, lactic acid, ethanol amine, and linolenic acid on their incubation with the bacterial species of the skin microbiome (Sowada et al., 2014; Stingley et al., 2010; Timm et al., 2020). The case studies along with biological validations confirm the accuracy and reliability of SkinBug with strong control over false-positives and false-negatives, highlight the potential of SkinBug in revealing the metabolic consequences of undesired metabolism by our skin microbiome, and provide leads for further experimental studies. It is also anticipated to have applications in skin microbiome and exposome studies, in the development of novel diagnostic and therapeutic approaches in dermatology, and in cosmetics industry for developing safer, more effective and population-specific products.

Limitations of the study

The availability of complete and annotated genomes of bacterial species for skin microbiome is one of the limiting factors that determines the size and comprehensiveness of the metabolic database for the skin microbiome. Similarly, machine learning and neural network models are trained on available metabolic reaction information, which currently has a lower representation of Isomerases and Ligases reaction classes. Thus, the availability of more information on genomes and metabolic reactions in the reference databases will further strengthen and broaden the applicability of SkinBug.

Conclusion

Our skin comes in regular contact with many chemical molecules due to pollution or usage of skin care products, thus the prediction and evaluation of metabolism of these molecules by skin microbiome species was much needed for which SkinBug is a valuable contribution. The integrated approach of using machine learning, neural networks, and chemoinformatics along with the database of metabolic enzymes for skin microbiome species constructed in this study helped in prediction of all the possible metabolic reactions that can occur to a given molecule, their respective reaction centers, the metabolic enzymes that can perform the predict reactions, species that harbors these metabolic enzymes, and the skin sites that carry these species. The case studies along with biological validations presented here attest to the accuracy, applicability, and potential of SkinBug. This first state-of-the-art tool to predict the metabolism of a given molecule by the skin microbiome will be very useful for the development of novel therapeutic approaches in dermatology and cosmetics and will also provide leads for future experimental studies.

Resource availability

Lead contact

Further information, requests, and inquiries should be directed to the Lead Contact, Vineet K. Sharma (vineetks@iiserb.ac.in).

Materials availability

This study did not generate new data and only used the data available in public databases mentioned in the manuscript text.

Data and code availability

The published article includes all data used or analyzed during this study.

Methods

All methods can be found in the accompanying Transparent methods supplemental file.

71 in total

Review 1. Enzyme promiscuity: a mechanistic and evolutionary perspective.

Authors: Olga Khersonsky; Dan S Tawfik
Journal: Annu Rev Biochem Date: 2010 Impact factor: 23.643

2. LSTM: A Search Space Odyssey.

Authors: Klaus Greff; Rupesh K Srivastava; Jan Koutnik; Bas R Steunebrink; Jurgen Schmidhuber
Journal: IEEE Trans Neural Netw Learn Syst Date: 2016-07-08 Impact factor: 10.451

Review 3. The human skin microbiome.

Authors: Allyson L Byrd; Yasmine Belkaid; Julia A Segre
Journal: Nat Rev Microbiol Date: 2018-01-15 Impact factor: 60.633

Review 4. Lactate metabolism: a new paradigm for the third millennium.

Authors: L B Gladden
Journal: J Physiol Date: 2004-05-06 Impact factor: 5.182

5. Removal and metabolism of triclosan by three different microalgal species in aquatic environment.

Authors: Shujuan Wang; Karen Poon; Zongwei Cai
Journal: J Hazard Mater Date: 2017-09-05 Impact factor: 10.588

6. The involvement of CYP3A4 and CYP2C9 in the metabolism of 17 alpha-ethinylestradiol.

Authors: Bonnie Wang; Rosa I Sanchez; Ronald B Franklin; David C Evans; Su-Er W Huskey
Journal: Drug Metab Dispos Date: 2004-08-10 Impact factor: 3.922

7. The Transformer database: biotransformation of xenobiotics.

Authors: Michael F Hoffmann; Sarah C Preissner; Janette Nickel; Mathias Dunkel; Robert Preissner; Saskia Preissner
Journal: Nucleic Acids Res Date: 2013-12-10 Impact factor: 16.971

Review 8. Revealing the secret life of skin - with the microbiome you never walk alone.

Authors: R Sfriso; M Egert; M Gempeler; R Voegeli; R Campiche
Journal: Int J Cosmet Sci Date: 2019-12-25 Impact factor: 2.970

9. Isolation and characterization of diverse microbial representatives from the human skin microbiome.

Authors: Collin M Timm; Kristin Loomis; William Stone; Thomas Mehoke; Bryan Brensinger; Matthew Pellicore; Phillip P A Staniczenko; Curtisha Charles; Seema Nayak; David K Karig
Journal: Microbiome Date: 2020-04-22 Impact factor: 14.650

10. The University of Minnesota Biocatalysis/Biodegradation Database: improving public access.

Authors: Junfeng Gao; Lynda B M Ellis; Lawrence P Wackett
Journal: Nucleic Acids Res Date: 2009-09-18 Impact factor: 16.971

4 in total

1. SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata.

Authors: Giulia Agostinetto; Davide Bozzi; Danilo Porro; Maurizio Casiraghi; Massimo Labra; Antonia Bruno
Journal: Database (Oxford) Date: 2022-05-16 Impact factor: 4.462

Review 2. Biocosmetics: technological advances and future outlook.

Authors: Nishu Goyal; Frankline Jerold
Journal: Environ Sci Pollut Res Int Date: 2021-11-25 Impact factor: 5.190

3. Commensal-Related Changes in the Epidermal Barrier Function Lead to Alterations in the Benzo[a]Pyrene Metabolite Profile and Its Distribution in 3D Skin.

Authors: Lisa Lemoine; Dilan Bayrambey; Alexander Roloff; Christoph Hutzler; Andreas Luch; Tewes Tralau
Journal: mBio Date: 2021-09-28 Impact factor: 7.867

Review 4. Exposome and Skin: Part 1. Bibliometric Analysis and Review of the Impact of Exposome Approaches on Dermatology.

Authors: Manuel Molina-García; Corinne Granger; Carles Trullàs; Susana Puig
Journal: Dermatol Ther (Heidelb) Date: 2022-02-03

4 in total