Michael E Cusick, Haiyuan Yu, Alex Smolyar, Kavitha Venkatesan, Anne-Ruxandra Carvunis, Nicolas Simonis, Jean-François Rual, Heather Borick, Pascal Braun, Matija Dreze, Jean Vandenhaute, Mary Galli, Junshi Yazaki, David E Hill, Joseph R Ecker, Frederick P Roth, Marc Vidal.
Abstract
High-quality datasets are needed to understand how global and local properties of protein-protein interaction, or 'interactome', networks relate to biological mechanisms, and to guide research on individual proteins. In an evaluation of existing curation of protein interaction experiments reported in the literature, we found that curation can be error-prone and possibly of lower quality than commonly assumed.
Year: 2009 PMID: 19116613 PMCID: PMC2683745 DOI: 10.1038/nmeth.1284
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Comparison of strategies towards completing an interactome map
| Attribute | High-throughput | Literature-curated |
|---|---|---|
| Investigation | discovery based | hypothesis driven |
| Functional inference | determinable from network? | determinable from study design? |
| Study bias | unbiased | biased |
| Completeness | estimable | inestimable |
| Reliability | determinable | indeterminable |
Figure 1. Distribution of the number of published papers supporting each interaction: in the dataset of yeast protein interactions downloaded from the BioGRID database (ref. 21); in the literature-curated dataset of human protein interactions; and in the literature-curated dataset of Arabidopsis protein interactions.
Figure 2. Distribution of the publications in literature-curated datasets by the number of interactions reported in each publication, for interactions supported by a single publication in the (a) yeast, (b) human and (c) Arabidopsis literature-curated PPI datasets.
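The two distributions in Figures 1 and 2 can be sketched from a flat list of curated records. This is a minimal illustration, not the authors' pipeline: it assumes a hypothetical record layout of `(protein_a, protein_b, pubmed_id)`, treats binary interactions as undirected, and, for the Figure 2-style count, assumes that only interactions backed by exactly one paper are tallied per publication. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable

# Hypothetical record layout: (protein_a, protein_b, pubmed_id), one row per
# interaction reported in a given publication.
Record = tuple[str, str, str]


def canonical_pair(a: str, b: str) -> tuple[str, str]:
    """Binary PPIs are undirected, so sort the partners to merge A-B with B-A."""
    return (a, b) if a <= b else (b, a)


def papers_per_interaction(records: Iterable[Record]) -> dict[tuple[str, str], set[str]]:
    """Collect the distinct publications supporting each interaction."""
    support: dict[tuple[str, str], set[str]] = defaultdict(set)
    for a, b, pmid in records:
        support[canonical_pair(a, b)].add(pmid)
    return support


def support_histogram(records: Iterable[Record]) -> Counter:
    """Figure 1-style count: number of supporting papers -> number of interactions."""
    support = papers_per_interaction(records)
    return Counter(len(pmids) for pmids in support.values())


def single_support_paper_histogram(records: Iterable[Record]) -> Counter:
    """Figure 2-style count, restricted (by assumption) to interactions backed by
    exactly one paper: interactions reported in a paper -> number of papers."""
    per_paper: Counter = Counter()
    for pmids in papers_per_interaction(records).values():
        if len(pmids) == 1:
            (pmid,) = pmids
            per_paper[pmid] += 1
    return Counter(per_paper.values())


if __name__ == "__main__":
    records = [
        ("P1", "P2", "1001"),
        ("P2", "P1", "1002"),  # same interaction, reported in a second paper
        ("P3", "P4", "1001"),
        ("P5", "P6", "1003"),
    ]
    print(support_histogram(records))
    print(single_support_paper_histogram(records))
```

Sorting the two partner names is a simple way to make A-B and B-A collapse to one key; a real dataset would also need identifier mapping across databases, which this sketch ignores.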
Figure 3. Overlaps of reported curation for yeast PPIs. (a) Overlaps of the total number of reported binary PPIs; numbers in parentheses give the overlaps after removing the largest high-throughput yeast PPI reports. (b) Overlaps of the PubMed reports curated. (c) Overlaps after removing multiply supported interactions.
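The overlap counts in Figure 3 amount to assigning each item to the exact combination of sources that contain it, i.e. the disjoint regions of a Venn diagram. The sketch below is an illustration under stated assumptions: the source names (`DB1`, `DB2`) and the function names are hypothetical, and the optional `exclude` argument stands in for removing a chosen subset (such as PPIs from the largest high-throughput reports) before counting.

```python
from collections import Counter, defaultdict
from collections.abc import Collection, Hashable, Iterable


def canonical_pair(a: str, b: str) -> tuple[str, str]:
    """Order the two partners so A-B and B-A count as the same binary PPI."""
    return (a, b) if a <= b else (b, a)


def overlap_regions(
    datasets: dict[str, Iterable[Hashable]],
    exclude: Collection[Hashable] = frozenset(),
) -> Counter:
    """Count items in each region of a Venn diagram.

    Every item is assigned to the exact combination of sources that contain it,
    so the regions are disjoint. Items in `exclude` are dropped first.
    """
    membership: dict[Hashable, set[str]] = defaultdict(set)
    for source, items in datasets.items():
        for item in items:
            if item not in exclude:
                membership[item].add(source)
    return Counter(frozenset(sources) for sources in membership.values())


if __name__ == "__main__":
    # Hypothetical curated PPI sets from two sources.
    db1 = {canonical_pair(a, b) for a, b in [("P1", "P2"), ("P3", "P4")]}
    db2 = {canonical_pair(a, b) for a, b in [("P2", "P1"), ("P5", "P6")]}
    regions = overlap_regions({"DB1": db1, "DB2": db2})
    print(regions[frozenset({"DB1", "DB2"})])  # PPIs shared by both sources
```

The same function works unchanged on sets of PubMed IDs, as in panel (b), and passing the multiply supported interactions as `exclude` gives a panel (c)-style count.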
Figure 4. Summary of recuration results. (a) Recuration of 100 interacting pairs randomly drawn from the yeast literature-curated dataset that are supported by only a single publication. Score 0: erroneous, not reported in the associated publication; score 1: reported in the associated publication but not verified; score 2: reported and verified. (b) Recuration results for the literature-curated sample of human PPIs reported in multiple publications: proportion of correct and erroneous curation units (left panel) and distribution of the types of curation errors (right panel). (c) Recuration results for randomly sampled human literature-curated interacting pairs reported in a single publication: correct and erroneous curation units (left panel) and distribution of the types of curation errors (right panel).
Summary of curation results for human and Arabidopsis
| Sampled Dataset | Interaction Units | Curation Units |
|---|---|---|
| Human LC-multiple | Correct: 172 (91.5%) | Correct: 362 (62%) |
| Human LC-single | Correct: 88 (55%) | Correct: 88 (55%) |
| Arabidopsis | Correct: 94 (94%) | Correct: 201 (89.3%) |
For human, a Curation Unit is an interaction reported in one publication, regardless of the number of databases curating the interaction; an interaction reported in three distinct papers and curated in two databases therefore represents three Curation Units. For Arabidopsis, a Curation Unit is an interaction reported in one publication and curated in one database; an interaction reported in three distinct papers, each curated in both Arabidopsis PPI databases, represents six Curation Units.
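The two counting rules differ only in whether the curating database is part of the unit's key. The sketch below illustrates this under assumptions: the record layout `(protein_a, protein_b, pubmed_id, database)` and the names `curation_units`, `DB1` and `DB2` are hypothetical, and the usage reproduces the three-papers, two-databases example given in the text.

```python
from collections.abc import Iterable

# Hypothetical record layout: (protein_a, protein_b, pubmed_id, database),
# one row per interaction curated from a publication by a database.
CurationRecord = tuple[str, str, str, str]


def canonical_pair(a: str, b: str) -> tuple[str, str]:
    """Order the partners so A-B and B-A are the same interaction."""
    return (a, b) if a <= b else (b, a)


def curation_units(records: Iterable[CurationRecord], per_database: bool) -> int:
    """Count Curation Units.

    per_database=False (human rule): one unit per distinct
    (interaction, publication), however many databases curated it.
    per_database=True (Arabidopsis rule): one unit per distinct
    (interaction, publication, database).
    """
    units: set[tuple] = set()
    for a, b, pmid, database in records:
        key = (canonical_pair(a, b), pmid)
        units.add((*key, database) if per_database else key)
    return len(units)


if __name__ == "__main__":
    # Worked example from the text: three papers, each curated in two databases.
    records = [
        ("P1", "P2", pmid, database)
        for pmid in ("paper1", "paper2", "paper3")
        for database in ("DB1", "DB2")
    ]
    print(curation_units(records, per_database=False))  # human rule: 3
    print(curation_units(records, per_database=True))   # Arabidopsis rule: 6
```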