Pablo Meyer, Julia Hoeng, J Jeremy Rice, Raquel Norel, Jörg Sprengel, Katrin Stolle, Thomas Bonk, Stephanie Corthesy, Ajay Royyuru, Manuel C Peitsch, Gustavo Stolovitzky.
Abstract
MOTIVATION: Analyses and algorithmic predictions based on high-throughput data are essential for the success of systems biology in academic and industrial settings. Organizations such as companies and academic consortia conduct large multi-year scientific studies that entail the collection and analysis of thousands of individual experiments, often across many physical sites and with both internal and outsourced components. To extract maximum value, the interested parties need to verify the accuracy and reproducibility of data and methods before such studies begin. However, no systematic, well-established verification procedures exist for automated data collection and analysis workflows in systems biology, and this gap could lead to inaccurate conclusions.
Year: 2012 PMID: 22423044 PMCID: PMC3338013 DOI: 10.1093/bioinformatics/bts116
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Additional information for the eight community-based efforts described in the paper. The final entry lists other efforts not discussed in the main text.
| Name | Full name or origin | Domain | Regularity |
|---|---|---|---|
| KDD Cup | Knowledge Discovery and Data Mining | Knowledge discovery and machine learning in various domains | Every year since its launch in 1997 |
| InnoCentive | The name mixes Innovation and Incentive | Crowd-sourcing for problems of commercial interest | Founded in 2001; new challenges are released on a rolling schedule |
| Netflix Prize | The name comes from the sponsoring company, Netflix | Prediction of user ratings for films, based on previous ratings | Only one challenge so far; released in 2006, it took 3 years to complete |
| CASP | Critical Assessment of Techniques for Protein Structure Prediction | Assessment of protein 3D structure prediction | Every 2 years since 1994 |
| CAPRI | Critical Assessment of PRedicted Interactions | Assessment of predictions of protein–protein docking or protein–DNA interaction from 3D structure | Running since 2001 and now at Round 22; a round starts whenever an experimentalist offers an adequate target, and predicted structures are submitted 6–8 weeks later |
| DREAM | Dialogue for Reverse Engineering Assessments and Methods | Assessment of quantitative modeling in systems biology | Every year since 2006 |
| BioCreAtIve | Assessment of Information Extraction Systems in Biology | Evaluation of text mining and information extraction systems applied to the biological literature | Every 2 years beginning in 2004 |
| FlowCAP | Flow Cytometry Critical Assessment of Population Id Methods | Evaluation of automated analysis of flow cytometry data | Only one iteration so far, in 2010; a second one is in the planning phase |
Other efforts: TunedIT: http://tunedit.org/; RGASP, RNA-seq Genome Annotation Assessment Project: www.sanger.ac.uk/PostGenomics/encode/RGASP.html; Pittsburgh Brain Competition: http://pbc.lrdc.pitt.edu/; CAMDA, Critical Assessment of Microarray Data Analysis: http://camda.bioinfo.cipf.es/camda2011/; Genetic Analysis Workshop, evaluation of statistical genetics approaches: http://www.gaworkshop.
Fig. 1. Organization of a research workflow by decomposition into building blocks amenable to verification. (A) Research pipelines are indicated by the gray arrows, whereas the orange blocks are the more specific building blocks needed to execute each pipeline. A concatenation of research pipelines forms a research workflow. Each building block in this diagram can be verified by the challenges indicated by the black arrows emerging from the orange blocks. (B) Example of a research pipeline including the challenges discussed in Section 3. For the internal challenge example, levels of RNA extracted from tissue or cells are measured with two different technologies, one of which serves as the reference. For the external challenge example, gene expression data from patients and control subjects are used to test whether a disease signature can be extracted and verified.
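As an illustration of the internal challenge in panel B, one simple way to check a technology against a reference is to correlate the RNA levels that both platforms report for the same genes. The sketch below is not the paper's procedure: it assumes each platform's output is a mapping from gene identifier to a non-negative expression value, and that the Pearson correlation of log2(value + 1) is an adequate agreement measure. The function names, gene identifiers, and toy values are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


def concordance_with_reference(test: Dict[str, float], reference: Dict[str, float]) -> float:
    """Hypothetical check: correlate log2(level + 1) over genes measured by both platforms."""
    shared = sorted(set(test) & set(reference))  # only genes present on both platforms
    x = [math.log2(test[gene] + 1.0) for gene in shared]
    y = [math.log2(reference[gene] + 1.0) for gene in shared]
    return pearson(x, y)


# Invented toy data: the test platform reads roughly twice the reference level.
reference = {"GENE_A": 100.0, "GENE_B": 20.0, "GENE_C": 5.0}
test = {"GENE_A": 200.0, "GENE_B": 40.0, "GENE_C": 10.0, "GENE_D": 7.0}
print(round(concordance_with_reference(test, reference), 3))
```

GENE_D is ignored because the reference platform does not measure it. A correlation near 1 only says the two technologies rank and scale genes consistently; a constant offset between platforms, as in the toy data, is not penalized, so a real verification would likely also examine bias.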
Fig. 2. Schematic diagram of the organization of the MS disease signature challenge. A dataset with both gene expression data and the corresponding clinical diagnoses or prognoses forms the basis of the challenge. The test data contain only the gene expression data and are transmitted to the participants via a web portal. Three participants are shown; actual challenges could involve many more. The participants generate predictions based on gene signatures, which are submitted back via the website. A trusted party blindly scores and ranks the predictions by comparing them with the gold standard dataset, which contains both the gene expression data and the actual clinical outcomes.
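The scoring step in Fig. 2 can be sketched as follows. This is a minimal illustration, not the challenge's actual scoring code: it assumes each submission maps sample identifiers to a predicted diagnosis label, that submissions are keyed by anonymized participant codes so the scorer does not know who submitted what, and that plain accuracy is the metric (the caption does not name one). All function names, codes, and data are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]  # sample identifier -> diagnosis label


def accuracy(gold: Labels, predicted: Labels) -> float:
    """Fraction of gold-standard samples whose predicted label matches the true outcome.

    Samples missing from a submission count as wrong, so an incomplete
    submission cannot raise its score by skipping hard cases.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    correct = sum(1 for sample, truth in gold.items() if predicted.get(sample) == truth)
    return correct / len(gold)


def rank_submissions(gold: Labels, submissions: Dict[str, Labels]) -> List[Tuple[int, str, float]]:
    """Score every anonymized submission against the hidden gold standard and rank them.

    Uses standard competition ranking: tied scores share the best position (1, 1, 3, ...).
    """
    scored = [(code, accuracy(gold, preds)) for code, preds in submissions.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))  # best score first; code orders ties
    ranking: List[Tuple[int, str, float]] = []
    for position, (code, score) in enumerate(scored, start=1):
        tied = bool(ranking) and ranking[-1][2] == score
        ranking.append((ranking[-1][0] if tied else position, code, score))
    return ranking


# Invented toy example: only the trusted scorer holds `gold`.
gold = {"s1": "MS", "s2": "control", "s3": "MS", "s4": "control"}
submissions = {
    "team_a": {"s1": "MS", "s2": "control", "s3": "MS", "s4": "MS"},
    "team_b": {"s1": "MS", "s2": "control", "s3": "MS", "s4": "control"},
    "team_c": {"s1": "control", "s2": "control", "s3": "MS"},  # s4 left unlabeled
}
for rank, code, score in rank_submissions(gold, submissions):
    print(rank, code, score)
```

With the toy data, team_b ranks first (accuracy 1.0), team_a second (0.75), and team_c third (0.5), the last being penalized for the unlabeled sample s4. A real challenge would more likely use a metric robust to class imbalance; accuracy keeps the sketch short.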