Hailiang Huang, Bruno M Jedynak, Joel S Bader.
Abstract
Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture-recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in the supporting information and are also available from our Web site, http://www.baderzone.org.
Year: 2007 PMID: 18039026 PMCID: PMC2082503 DOI: 10.1371/journal.pcbi.0030214
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Flowchart for Yeast Two-Hybrid Screens Indicates Systematic and Stochastic Sources of False Negatives and Stochastic Sources of False Positives
Figure 2. Simplified Schematic Shows the Two-Hybrid Sampling Process
In this picture, true-positive interactions (black edges) are sampled uniformly with total probability 1 − α, and false-positive interactions (red edges) are sampled stochastically with total probability α. Sampling is with replacement, and multiple edges between a pair of vertices represent multiple observations of the same interaction. The example shows n = 12 edges sampled in the entire network, with w = 11 unique edges and s = 10 edges that are singletons observed once. The total number of true-positive edges, k, and the number of false-positive edges within the sample, f, are hidden. The actual experimental data are more complicated, with individual values of n, w, and s reported for each protein used as a bait. The statistical method presented here provides estimates for k and f together with parameter estimates for α and the distribution Pr(k).
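The sampling picture in this caption can be illustrated with a minimal simulation sketch. It is not the authors' software. It assumes each draw is a false positive with probability α and a true positive otherwise, that true edges are drawn uniformly with replacement, and that false positives come from a large pool of spurious pairs; the names `simulate_screen`, `summarize`, and `n_false_pool` are hypothetical, and f is counted here as the number of false-positive draws among the n samples.

```python
import random
from collections import Counter

def simulate_screen(n, k_true, alpha, n_false_pool=10_000, seed=0):
    """Draw n edges with replacement (illustrative assumptions only).

    Each draw is a false positive with probability alpha, picked uniformly
    from a hypothetical pool of n_false_pool spurious pairs; otherwise it is
    one of the k_true true edges, picked uniformly.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < alpha:
            draws.append(("false", rng.randrange(n_false_pool)))
        else:
            draws.append(("true", rng.randrange(k_true)))
    return draws

def summarize(draws):
    """Return (n, w, s, f) for a list of sampled edges.

    n: total draws; w: unique edges; s: edges seen exactly once;
    f: draws that were false positives (hidden in a real screen).
    """
    counts = Counter(draws)
    n = len(draws)
    w = len(counts)
    s = sum(1 for c in counts.values() if c == 1)
    f = sum(c for (kind, _), c in counts.items() if kind == "false")
    return n, w, s, f

# A hand-built sample matching the counts quoted in the caption:
# one edge seen twice, ten edges seen once, two of them false positives
# (the number of false positives is chosen arbitrarily for illustration).
example = ([("true", 0), ("true", 0)]
           + [("true", i) for i in range(1, 9)]
           + [("false", 0), ("false", 1)])
print(summarize(example))  # (12, 11, 10, 2)
```

In a real screen only n, w, and s are observable per bait, which is why k, f, and α have to be estimated statistically rather than read off as they are here.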
Definitions of Symbols
Known Properties of the Experimental Datasets: Total Number of Baits, N; Mean Number of Preys Sampled per Bait (Mean of n); Mean Number of Unique Preys (Mean of w); and Mean Number of Singleton Preys (Mean of s)
Error Rates and Projections for Full Coverage Provided for Yeast (PL-MIXTURE), Worm (TPL-MIXTURE), and Fly (TPL-MIXTURE) Models
Promiscuous Domains
Chaste Domains
Correlation of False-Discovery Rates with Hydrophobicity Scales and Length
The False-Discovery Rate for a Bait Protein, f/n, Positively Correlated with the Estimated Number of True Interaction Partners That Are Observed, w − f, and the Total Number, k
Parameter Estimates for the True-Positive Rates for Avoiding Systematic Losses
Protein Interaction Count Predictions Provided from This Method and from a Previous Method, k ∩
True-Positive Rates Estimated from Literature Comparisons