Alejandra González-Beltrán, Peter Li, Jun Zhao, Maria Susana Avila-Garcia, Marco Roos, Mark Thompson, Eelke van der Horst, Rajaram Kaliyaperumal, Ruibang Luo, Tin-Lap Lee, Tak-Wah Lam, Scott C Edmunds, Susanna-Assunta Sansone, Philippe Rocca-Serra.
Abstract
MOTIVATION: Reproducing the results from a scientific paper can be challenging due to the absence of the data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern because experiments are reported in natural language. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which the ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.
Year: 2015 PMID: 26154165 PMCID: PMC4495984 DOI: 10.1371/journal.pone.0127612
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Results from reproducing Table 2 of the original paper, where the original results are shown in parentheses.
| Species | Assembler | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| S. aureus | SOAPdenovo1 | 79 | 148.6 | 156 | 23 | 49 | 342 | 0 | 342 |
| S. aureus | SOAPdenovo2 | 80 | 98.6 | 25 | 71.5 | 38 | 1086 | 2 | 1078 |
| S. aureus | ALLPATHS-LG | 37 | 149.7 | 13 | | | 1477 | 1 | 1093 |
| R. sphaeroides | SOAPdenovo1 | | 3.5 | | 2.8 | 956 | | | |
| R. sphaeroides | SOAPdenovo2 | 721 | 18 | 106 | 14.1 | 333 | 2549 | 4 | 2540 |
| R. sphaeroides | ALLPATHS-LG | 190 | 41.9 | | 36.7 | 32 | 3191 | 0 | |
Fig 1. A graphical representation showing the role of the ISA, Nanopublication and Research Object models in progressively structuring experimental information, moving from handwritten notes in laboratory books to semi-structured tab-delimited files and fully explicit linked data.
Fig 2. Another view of the complementary aspects of these research object models, highlighting the reliance on persistent identifiers (such as ORCID) and references to Galaxy workflows hosted on GigaScience servers.
Fig 3. A detailed view showing the linked data representation of an ISA experiment (upper pane, green background), with one of the findings expressed as Nanopublication statements (lower pane); a red outline indicates the key statement.
Predictor and response variables for the SOAPdenovo2 study, as identified in the ISA-TAB documents.
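| Variable type | Variable | Value |
|---|---|---|
| Predictor | genome assembly algorithm (OBI:0001522) | ALLPATHS-LG |
| Predictor | genome assembly algorithm (OBI:0001522) | SOAPdenovo1 |
| Predictor | genome assembly algorithm (OBI:0001522) | SOAPdenovo2 |
| Predictor | genome size (PATO:0000117) | small |
| Predictor | genome size (PATO:0000117) | medium |
| Predictor | genome size (PATO:0000117) | large |
| Response | genome coverage | |
| Response | computation run time | |
| Response | memory consumption | |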
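
One way to read the predictor and response variables listed above is as a factorial layout: each combination of assembly algorithm and genome size defines a candidate study group, and each group is measured on the three response variables. The sketch below is illustrative only. The dictionary layout and the helper name `candidate_groups` are assumptions, not part of ISA-TAB or of the original study, and the enumeration lists every combination the factor levels allow, which is not necessarily the set of runs the authors performed.

```python
from itertools import product

# Factor levels copied from the table above (assembler spelled "ALLPATHS-LG",
# as in the results table). The dict layout is an assumption for illustration.
predictors = {
    "genome assembly algorithm": ["ALLPATHS-LG", "SOAPdenovo1", "SOAPdenovo2"],
    "genome size": ["small", "medium", "large"],
}

# Response variables named in the table; no measured values are given there.
responses = ["genome coverage", "computation run time", "memory consumption"]


def candidate_groups(factors):
    """Return every combination of factor levels as a list of dicts.

    Hypothetical helper: this is a plain full-factorial enumeration,
    not an ISA-TAB API call.
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


groups = candidate_groups(predictors)

for group in groups:
    # Each candidate group would be measured on every response variable.
    print(group, "->", responses)
```

With three levels for each of the two predictors, the enumeration yields nine candidate groups; a real design may use only a subset of them.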