Daniel Silva Junior, Esther Pacitti, Aline Paes, Daniel de Oliveira.
Abstract
Scientific Workflows (SWfs) have revolutionized how scientists in various domains conduct their experiments. SWfs are managed by complex tools that support workflow composition, monitoring, and execution, as well as the capture and storage of the data generated during execution. In some cases, these tools also provide components that ease the visualization and analysis of the generated data. During the workflow's composition phase, programs must be selected to perform the activities defined in the workflow specification. These programs often require additional parameters that adjust the program's behavior according to the experiment's goals. Consequently, workflows commonly have many parameters to be configured manually, in many cases more than one hundred. Choosing the wrong parameter values can crash a workflow execution or produce undesired results. As data- and compute-intensive workflows are commonly executed in a high-performance computing environment (e.g., a cluster, a supercomputer, or a public cloud), an unsuccessful execution wastes time and resources. In this article, we present FReeP (Feature Recommender from Preferences), a parameter value recommendation method designed to suggest values for workflow parameters, taking into account past user preferences. FReeP is based on Machine Learning techniques, particularly Preference Learning. FReeP is composed of three algorithms: two recommend the value of one parameter at a time, and the third makes recommendations for n parameters at once. Experimental results obtained with provenance data from two broadly used workflows showed FReeP's usefulness in recommending values for one parameter. Furthermore, the results indicate the potential of FReeP to recommend values for n parameters in scientific workflows.
Keywords: Machine Learning; Preference Learning; Recommender systems; Scientific workflows
Year: 2021 PMID: 34307859 PMCID: PMC8279147 DOI: 10.7717/peerj-cs.606
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Taxonomy of related types of recommender systems.
Figure 2. Example of the votes each candidate received, in voters' preference order.
Figure 3. A synthetic workflow: circles represent activities, arrows between the circles represent the links between activities (data dependencies), and the labels on each circle represent the configuration parameters of each activity.
Figure 4. FReeP architecture overview.
Algorithm: Naive FReeP-Discrete. [12-step pseudocode listing not preserved]
Figure 5. Example of FReeP's partitioning rules generation for the SciPhy provenance dataset using the user's preferences.
Figure 6. Example of FReeP's horizontal filter using one partitioning rule for the SciPhy provenance dataset.
Figure 7. FReeP's vertical filter step.
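The captions of Figures 5 to 7 name three steps: generating partitioning rules from the user's preferences, a horizontal filter, and a vertical filter. The sketch below is one possible reading of those steps on a toy provenance table, not the authors' implementation. It assumes that a partitioning rule is a non-empty subset of the preferred parameters, that the horizontal filter keeps the records satisfying the rule, and that the vertical filter keeps only the rule's columns plus the target. The `model` column, the record values, and all function names are hypothetical, and a most-frequent-value vote stands in for the trained models.

```python
from collections import Counter
from itertools import combinations

# Hypothetical provenance records: one dict per past workflow execution.
records = [
    {"model": "WAG", "num_aligns": 10, "length": 100},
    {"model": "WAG", "num_aligns": 10, "length": 120},
    {"model": "JTT", "num_aligns": 10, "length": 100},
    {"model": "JTT", "num_aligns": 9, "length": 100},
    {"model": "WAG", "num_aligns": 11, "length": 100},
    {"model": "WAG", "num_aligns": 10, "length": 100},
]

def partitioning_rules(preferences):
    """Assumed rule set: every non-empty subset of the preferred parameters."""
    names = sorted(preferences)
    return [set(c) for r in range(1, len(names) + 1) for c in combinations(names, r)]

def horizontal_filter(rows, preferences, rule):
    """Keep the records whose values match the preferences named in the rule."""
    return [row for row in rows if all(row[p] == preferences[p] for p in rule)]

def vertical_filter(rows, rule, target):
    """Keep only the rule's columns plus the target column (an assumption)."""
    keep = set(rule) | {target}
    return [{k: v for k, v in row.items() if k in keep} for row in rows]

def recommend(rows, preferences, target):
    """Each non-empty partition casts one vote; the most voted value wins."""
    votes = Counter()
    for rule in partitioning_rules(preferences):
        part = vertical_filter(horizontal_filter(rows, preferences, rule), rule, target)
        if part:
            # Stand-in for a trained model: the partition's most frequent target value.
            votes[Counter(row[target] for row in part).most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0] if votes else None

prefs = {"num_aligns": 10, "length": 100}
recommendation = recommend(records, prefs, target="model")  # "WAG" for this toy data
```

With two preferences the sketch builds three partitions, and each one votes for `"WAG"`. A real setup would replace the most-frequent-value stand-in with a classifier or regressor trained on each partition.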
Algorithm: Enhanced FReeP. [pseudocode listing not preserved]
Figure 8. Generic FReeP architecture overview.
Algorithm: Generic FReeP. [17-step pseudocode listing not preserved]
Input: N, the number of random sequence orders to be generated.
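The only surviving line of the Generic FReeP listing states that N random sequence orders are generated. One hedged reading is that each order fixes the sequence in which the n target parameters are recommended one at a time, that each recommendation joins the preferences used for the next target, and that the N resulting assignments are combined by vote. The sketch below assumes this reading and should not be taken as the published algorithm. `recommend_one` is a hypothetical most-frequent-value stand-in for a single-parameter recommender, and the rows are invented.

```python
import random
from collections import Counter

def recommend_one(rows, preferences, target):
    """Hypothetical single-parameter recommender: most frequent target value
    among records matching all current preferences (all rows if none match)."""
    matching = [r for r in rows if all(r[p] == v for p, v in preferences.items())]
    pool = matching or rows
    return Counter(r[target] for r in pool).most_common(1)[0][0]

def recommend_many(rows, preferences, targets, n_orders, seed=0):
    """Recommend several parameters using n_orders random target orderings."""
    rng = random.Random(seed)
    votes = {t: Counter() for t in targets}
    for _ in range(n_orders):
        order = rng.sample(targets, len(targets))  # one random sequence order
        known = dict(preferences)
        for target in order:
            value = recommend_one(rows, known, target)
            known[target] = value  # chained: later targets see this choice
            votes[target][value] += 1
    # Final answer per parameter: the value recommended most often.
    return {t: c.most_common(1)[0][0] for t, c in votes.items()}

rows = [
    {"model": "WAG", "num_aligns": 10, "length": 100},
    {"model": "WAG", "num_aligns": 10, "length": 100},
    {"model": "JTT", "num_aligns": 9, "length": 100},
    {"model": "JTT", "num_aligns": 9, "length": 120},
]
result = recommend_many(rows, {"length": 100}, ["model", "num_aligns"], n_orders=5)
```

In this toy data both possible orders lead to the same assignment, so the result does not depend on the seed. On real provenance data different orders can disagree, which is the case the vote is meant to settle.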
Figure 9. The abstract specification of (a) SciPhy and (b) Montage.
Dataset characteristics.
| Dataset | Total records | Total attributes | Categorical attributes | Numerical attributes |
|---|---|---|---|---|
| SciPhy | 376 | 6 | 2 | 4 |
| Montage | 1,565 | 8 | 2 | 6 |
SciPhy dataset statistics.
| Parameter | Minimum value | Maximum value | Standard deviation |
|---|---|---|---|
| num_aligns | 9.00 | 11.00 | 0.21 |
| length | 85.00 | 1,039.00 | 169.90 |
| prob1 | 634.67 | 5,753.52 | 1,103.43 |
| prob2 | 635.87 | 5,795.28 | 1,101.76 |
Montage dataset statistics.
| Parameter | Minimum value | Maximum value | Standard deviation |
|---|---|---|---|
| cntr | 0.00 | 134.00 | 35.34 |
| ra | 83.12 | 323.90 | 91.13 |
| dec | −27.17 | 28.85 | 17.90 |
| crval1 | 83.12 | 323.90 | 91.13 |
| crval2 | −27.17 | 28.85 | 17.90 |
| crota2 | 0.00 | 360.00 | 178.64 |
Figure 10. Correlation matrices of the dataset attributes.
Figure 11. Precision results with SciPhy data.
Figure 12. Recall results with SciPhy data.
Figure 13. Experiment recommendation execution time with SciPhy data.
Figure 14. Precision results with Montage data.
Figure 15. Recall results with Montage data.
Figure 16. Experiment recommendation execution time with Montage data.
Algorithm 2 values per parameter used in Experiment 2.
| Classifiers | Regressors | Partition strategy | Percentage |
|---|---|---|---|
| KNN | Linear Regression | PCA | 30 |
| SVM | KNR | ANOVA | 50 |
| Multi-Layer Perceptron | SVR | | 70 |
| | Multi-Layer Perceptron | | |
Figure 17. Precision results with SciPhy data.
Figure 18. Recall results with SciPhy data.
Figure 19. Experiment recommendation execution time with SciPhy data.
Figure 20. MSE results and recommendation execution time with SciPhy data.
Figure 21. Precision results with Montage data.
Figure 22. Recall results with Montage data.
Figure 23. Experiment recommendation execution time with Montage data.
Figure 24. MSE results and recommendation execution time with Montage data.
Experiment 3 results with SciPhy dataset.
| Classifier | Regressor | Partitioning strategy | MSE | Precision | Recall | Failures |
|---|---|---|---|---|---|---|
| KNN 5 | KNR 5 | ANOVA 50 | 0.0 | 1.0 | 1.0 | 6 |
| KNN 5 | KNR 7 | ANOVA 50 | 0.0 | 1.0 | 1.0 | 6 |
| KNN 5 | SVR | ANOVA 50 | 1.1075 | 1.0 | 1.0 | 6 |
| KNN 7 | KNR 5 | ANOVA 50 | 4,279.2240 | 1.0 | 1.0 | 5 |
| KNN 7 | KNR 7 | ANOVA 50 | 0.0 | 1.0 | 1.0 | 5 |
| KNN 7 | SVR | ANOVA 50 | 0.444 | 1.0 | 1.0 | 5 |
| SVM | KNR 5 | ANOVA 50 | 1,148.1876 | 0.75 | 0.75 | 6 |
| SVM | KNR 7 | ANOVA 50 | 0.0 | 1.0 | 1.0 | 7 |
| SVM | SVR | ANOVA 50 | 0.0 | 1.0 | 1.0 | 7 |
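The evaluation reports precision and recall alongside MSE (see also Figures 20 and 24). As a reminder of what these numbers measure, the sketch below computes them with their textbook definitions: precision and recall for one label of a categorical parameter, and mean squared error for a numerical one. How the source averages them across labels or runs is not stated here, so the per-label form and the sample values are assumptions.

```python
def mse(expected, predicted):
    """Mean squared error between expected and recommended numerical values."""
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)

def precision_recall(expected, predicted, label):
    """Precision and recall for a single label of a categorical parameter."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == label and p == label)
    fp = sum(1 for e, p in pairs if e != label and p == label)
    fn = sum(1 for e, p in pairs if e == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical recommendations versus the values actually used.
expected_models = ["WAG", "WAG", "JTT", "JTT"]
recommended_models = ["WAG", "JTT", "JTT", "JTT"]
p, r = precision_recall(expected_models, recommended_models, "WAG")  # (1.0, 0.5)

error = mse([10.0, 9.0], [10.0, 11.0])  # (0 + 4) / 2 = 2.0
```

An MSE of 0.0 therefore means every numerical recommendation matched the expected value exactly, and precision and recall of 1.0 mean the same for the categorical ones.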
Comparison between FReeP and related work.
| Approach | Domain | Search space | Considers dependencies | Requires execution | Life-cycle phase |
|---|---|---|---|---|---|
| | General | All | No | Yes | Execution |
| | General | Pruned | No | Yes | Execution |
| | General | Pruned | No | Yes | Execution |
| | General | Pruned | No | Yes | Execution |
| | General | Pruned | No | Yes | Execution |
| | Workflow | N/A | No | No | Composition |
| | Workflow | N/A | No | No | Composition |
| | Workflow | N/A | No | No | Composition |
| | Workflow | N/A | No | No | Composition |
| | Workflow | N/A | Yes | No | Composition |
| | Workflow | N/A | Yes | No | Composition |
| | Workflow | N/A | N/A | N/A | Composition/Execution |
| | Workflow | N/A | N/A | N/A | Composition/Execution |
| | Workflow | N/A | N/A | No | Analysis |
| | Workflow | N/A | N/A | No | Analysis |