Zhen Xu, Sergio Escalera, Adrien Pavão, Magali Richard, Wei-Wei Tu, Quanming Yao, Huan Zhao, Isabelle Guyon.
Abstract
Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, an open-source, community-driven meta-benchmark platform for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, attracting more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.
Keywords: benchmark platform; competitions; data science; machine learning; reproducibility
Year: 2022 PMID: 35845844 PMCID: PMC9278500 DOI: 10.1016/j.patter.2022.100543
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Comparison of various reproducible science platforms
| Platform | Bundle | Result/code submit | Dataset submit | Easy creation | Open source/free | API access | Compute queue | Reproducibility |
|---|---|---|---|---|---|---|---|---|
| Kaggle | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Tianchi | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | ✓ |
| CodaLab | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ |
| UCI | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| OpenML | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| PapersWithCode | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ |
| DAWNBench | ✗ | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ | ✓ |
| Codabench | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
The features are introduced in the section "Key features of Codabench" and fall under the broader headings of flexibility, ease of use, and reproducibility. Bundle indicates whether a benchmark is packaged as a self-contained bundle that can be reused or shared. Result/code/dataset submit indicates which kinds of submissions (results, code, or datasets) are supported, enabling flexible tasks. Compute queue indicates whether public or private compute resources can be provided or linked for convenient deployment.
Figure 1. Overview of Codabench
A meta-benchmark platform has three types of contributors: platform developers (yellow), benchmark organizers (green), and benchmark participants (red). Codabench operates at the meta level, supporting diverse benchmarks. Each benchmark is implemented as a benchmark bundle that contains one or more tasks.
Figure 2. Operational Codabench workflow
Left: task module specified by the benchmark organizers and executed on the platform. Right: web interface through which participants make submissions and retrieve results. Numbered blocks are specified by the benchmark organizers: (1) a scoring module, (2) an ingestion module, and (3) public information. An intermediate block handles the exchange of time budget, scoring, input data, ground-truth data, and predictions. Red bottom-right block: the participant prepares a submission "z" and uploads it to the platform. The submission is then executed by the ingestion program, and the scoring program produces the scores displayed on the leaderboard.
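To illustrate the ingestion/scoring split in this workflow, here is a minimal scoring-program sketch in Python. It assumes a CodaLab-style calling convention (`score.py <input_dir> <output_dir>`, ground truth under `ref/`, participant predictions under `res/`, scores written to `scores.txt`); the actual interface, file names (`truth.txt`, `predictions.txt`), and metric are defined by each organizer's bundle and are shown here only as assumptions.

```python
# Minimal scoring-program sketch (hypothetical; the real interface is defined by
# the organizer's bundle). Assumes a CodaLab-style convention in which the program
# is called as: python score.py <input_dir> <output_dir>, with ground truth under
# <input_dir>/ref and the participant's predictions under <input_dir>/res.
import os
import sys

def read_labels(path):
    """Read one label per line from a plain-text file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def main():
    input_dir, output_dir = sys.argv[1], sys.argv[2]
    truth = read_labels(os.path.join(input_dir, "ref", "truth.txt"))              # ground truth
    predictions = read_labels(os.path.join(input_dir, "res", "predictions.txt"))  # participant output

    # Example metric: simple accuracy over paired labels.
    correct = sum(t == p for t, p in zip(truth, predictions))
    accuracy = correct / len(truth) if truth else 0.0

    # Scores written here are parsed by the platform and shown on the leaderboard.
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "scores.txt"), "w") as f:
        f.write(f"accuracy: {accuracy:.4f}\n")

if __name__ == "__main__":
    main()
```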
Figure 3. Use-case illustrations
Four use cases are introduced: (A) AutoGraph, (B) DECONbench, (C) COMETH, and (D) job scheduling. The use-case details are introduced in the section Use cases of Codabench.
Figure 4. Bundle structure
The details of benchmark.yaml are given in Data S1.
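As a rough illustration of how a bundle configuration might be loaded and inspected, the following Python sketch parses a toy benchmark.yaml. The field names used here (title, tasks, ingestion_program, scoring_program, input_data, reference_data) are hypothetical placeholders; the authoritative schema is the benchmark.yaml specification in Data S1.

```python
# Illustrative sketch only: loads a toy benchmark.yaml and lists its tasks.
# The field names below are assumptions for illustration, not the actual
# Codabench schema (see Data S1 for the authoritative specification).
import yaml  # requires PyYAML (pip install pyyaml)

EXAMPLE_BUNDLE_CONFIG = """
title: My Benchmark
description: A toy benchmark bundle description.
tasks:
  - name: task-1
    ingestion_program: ingestion_program/
    scoring_program: scoring_program/
    input_data: input_data/
    reference_data: reference_data/
"""

config = yaml.safe_load(EXAMPLE_BUNDLE_CONFIG)
print(f"Benchmark: {config['title']}")
for task in config["tasks"]:
    print(f"  task {task['name']}: scoring program at {task['scoring_program']}")
```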