Abstract
Given prior human judgments of the condition of an object, it is possible to use these judgments to make a maximum likelihood estimate of what future human judgments of the condition of that object will be. However, if one has a reasonably large collection of similar objects, together with the prior judgments of a number of judges regarding the condition of each object in the collection, then it is possible to make predictions of future human judgments for the whole collection that are superior to the simple maximum likelihood estimate for each object in isolation. This is possible because the multiple judgments over the collection allow an analysis to determine the value of each judge relative to the other judges in the group, and this value can be used to increase or decrease that judge's influence in predicting future judgments. Here we study and compare five different methods for making such improved predictions and show that each is superior to simple maximum likelihood estimates.
Year: 2011 PMID: 21658292 PMCID: PMC3111591 DOI: 10.1186/1471-2105-12-S3-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
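The abstract's baseline, a per-object maximum likelihood estimate, amounts to the relative frequency of each class value among that object's prior judgments. The sketch below shows that baseline next to a hypothetical judge-weighted variant that illustrates the idea of increasing or decreasing a judge's influence. The function names, the dictionary layout, and the weighting scheme are assumptions for illustration only; the weighted variant is not one of the paper's five methods.

```python
from collections import Counter

def ml_estimate(judgments, classes):
    """Per-object maximum likelihood estimate: the relative frequency of each
    class value among the prior judgments of that object."""
    counts = Counter(judgments)
    total = len(judgments)
    return {c: counts[c] / total for c in classes}

def weighted_estimate(judgments_by_judge, weights, classes):
    """Hypothetical judge-weighted variant: each judge's vote counts in
    proportion to a per-judge weight (e.g. a reliability score estimated over
    the whole collection). Not one of the paper's five methods."""
    totals = {c: 0.0 for c in classes}
    for judge, label in judgments_by_judge.items():
        totals[label] += weights[judge]
    z = sum(totals.values())
    return {c: totals[c] / z for c in classes}

# Usage: class values 0, 1, 2 and hypothetical judges "a", "b", "c".
print(ml_estimate([2, 2, 1, 0], [0, 1, 2]))  # {0: 0.25, 1: 0.25, 2: 0.5}
print(weighted_estimate({"a": 2, "b": 2, "c": 1},
                        {"a": 1.0, "b": 1.0, "c": 2.0}, [0, 1, 2]))
# {0: 0.0, 1: 0.5, 2: 0.5}
```

Note that the plain estimate assigns probability 0 to any class value no judge has used yet, which is one reason a collection-level analysis can improve on it.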
Optimal parameters associated with methods M1, M4, and M5, accurate to two significant digits.
| M1 | M4 | M5 |
|---|---|---|
| 0.63 | 1.8 | 0.022 |
Log probability measures for all methods, using the rigorously induced parameter values. The best performance in each row is marked with an asterisk.
| Judge | ||||||
|---|---|---|---|---|---|---|
| 0 | -8897 | -8884 | -8384 | -8704 | -8501 | -8202* |
| 1 | -7103 | -7085 | -7006 | -6843 | -6940 | -6690* |
| 2 | -6900 | -6884 | -6889 | -6687 | -6701 | -6371* |
| 3 | -6806 | -6729 | -6699 | -6493 | -6734 | -6192* |
| 4 | -7694 | -7637 | -7501* | -7560 | -8121 | -9350 |
| 5 | -7131 | -7045 | -6912 | -6872* | -7259 | -7514 |
| 6 | -7044 | -6993 | -6884 | -6814* | -7026 | -7237 |
| 7 | -7110 | -7149 | -7035 | -6876 | -6557 | -6446* |
| 8 | -7354 | -7521 | -7374 | -7266 | -6559* | -6838 |
| 9 | -7122 | -7040 | -7100 | -6911* | -7004 | -7125 |
| 10 | -8032 | -8128 | -7862 | -7881 | -7576 | -7545* |
| 11 | -7281 | -7123 | -7071 | -7008* | -7450 | -7593 |
| 12 | -8153 | -8305 | -8044 | -8047 | -7694* | -8056 |
| Ave | -7433 | -7425 | -7289 | -7228 | -7240 | -7320 |
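As a reading aid, the sketch below shows one plausible way to compute a log probability measure. It assumes, without confirmation from the source, that the measure is the sum of natural-log predicted probabilities of the class values a held-out test judge actually assigned, so values closer to zero indicate better prediction. It also reproduces the "Ave" row as the rounded mean over the 13 judges. The function names are hypothetical.

```python
import math

def log_probability_measure(predictions, test_labels):
    """Sum of natural-log predicted probabilities of the class values a
    held-out test judge actually assigned; values closer to zero are better.

    predictions: one dict per query-document pair, class value -> probability
    test_labels: the test judge's class value for each pair
    Raises ValueError if the predicted probability of an observed label is 0.
    """
    return sum(math.log(p[label]) for p, label in zip(predictions, test_labels))

def average_row(per_judge_scores):
    """Mean over judges, rounded to an integer as in the 'Ave' rows."""
    return round(sum(per_judge_scores) / len(per_judge_scores))

# First method column of the table above (judges 0-12).
col1 = [-8897, -7103, -6900, -6806, -7694, -7131, -7044,
        -7110, -7354, -7122, -8032, -7281, -8153]
print(average_row(col1))  # -7433
```

Applying `average_row` to the last column gives -7320, which also matches the table.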
Log probability measures using single parameters optimized on the test set. The best performance in each row is marked with an asterisk.
| Judge | |||
|---|---|---|---|
| 0 | -8902 | -8425 | -7805* |
| 1 | -7087 | -6925 | -6833* |
| 2 | -6872 | -6641* | -6674 |
| 3 | -6760 | -6675 | -6462* |
| 4 | -7694* | -8068 | -7703 |
| 5 | -7121 | -7259 | -6942* |
| 6 | -7032 | -7015 | -6913* |
| 7 | -7094 | -6482* | -6836 |
| 8 | -7352 | -6487* | -7199 |
| 9 | -7113 | -6992 | -6874* |
| 10 | -8041 | -7576 | -7442* |
| 11 | -7275 | -7450 | -6909* |
| 12 | -8160 | -7694* | -7784 |
| Ave | -7423 | -7207 | -7106 |
To determine which of two methods better predicts the individual class values assigned by a test judge, we apply the signed rank test. We also count the query-document pairs for which each method predicts the larger probability for the class value, as well as ties. An asterisk marks the better result when the difference has a p-value less than 0.05 by the signed rank test. The optimal parameters are obtained through the rigorous induction method, as in Table 2.
| Judge | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| = | = | = | |||||||
| 0 | 2326 | 2674* | 0 | 1763 | 3237* | 0 | 2197 | 2803* | 0 |
| 1 | 2576* | 2423 | 1 | 1741 | 3259* | 0 | 1808 | 3192* | 0 |
| 2 | 2336* | 2664 | 0 | 1580 | 3420* | 0 | 1892 | 3108* | 0 |
| 3 | 2637* | 2363 | 0 | 1616 | 3384* | 0 | 1592 | 3408* | 0 |
| 4 | 3130* | 1870 | 0 | 2817* | 2183 | 0 | 2788 | 2212 | 0 |
| 5 | 2955* | 2045 | 0 | 2463 | 2537* | 0 | 2341 | 2659* | 0 |
| 6 | 2692* | 2308 | 0 | 2302 | 2698* | 0 | 2301 | 2699* | 0 |
| 7 | 1829 | 3171* | 0 | 1504 | 3496* | 0 | 1972 | 3028* | 0 |
| 8 | 1398 | 3602* | 0 | 1504 | 3496* | 0 | 2313 | 2687* | 0 |
| 9 | 2449* | 2551 | 0 | 1964 | 3036* | 0 | 2024 | 2976* | 0 |
| 10 | 1970 | 3030* | 0 | 1689 | 3311* | 0 | 2337 | 2663* | 0 |
| 11 | 3035* | 1965 | 0 | 2199 | 2801* | 0 | 2096 | 2904* | 0 |
| 12 | 1965 | 3035* | 0 | 1915 | 3085* | 0 | 2452 | 2548* | 0 |
| Total | 31298 | 33701 | 1 | 25057 | 39943 | 0 | 28113 | 36887 | 0 |
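The pairwise comparison in the table above can be sketched as follows: for each query-document pair, compare the probabilities two methods assign to the test judge's class value, count wins for each method and ties, and run a two-sided Wilcoxon signed-rank test on the paired differences. This is a minimal sketch under stated assumptions: it uses the large-sample normal approximation with average ranks for tied magnitudes and drops zero differences, which may differ from the exact variant the authors used; `probs_a` and `probs_b` are hypothetical names for the two methods' per-pair probabilities. Because the signed-rank statistic weighs the size of each difference, not only its sign, the significant method can be the one with fewer wins, as in several rows above.

```python
import math

def compare_methods(probs_a, probs_b):
    """Count pairs where method A or method B gives the larger probability
    to the test judge's class value, plus ties."""
    a_wins = sum(pa > pb for pa, pb in zip(probs_a, probs_b))
    b_wins = sum(pb > pa for pa, pb in zip(probs_a, probs_b))
    ties = len(probs_a) - a_wins - b_wins
    return a_wins, b_wins, ties

def signed_rank_test(probs_a, probs_b):
    """Two-sided Wilcoxon signed-rank test, normal approximation.
    Zero differences are dropped; tied |differences| get average ranks.
    Returns (z, p_value); z > 0 favors method A."""
    diffs = [pa - pb for pa, pb in zip(probs_a, probs_b) if pa != pb]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal |differences| starting at sorted position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Usage: method A gives a slightly higher probability on every pair.
a = [0.5 + 0.01 * k for k in range(1, 21)]
b = [0.5] * 20
print(compare_methods(a, b))  # (20, 0, 0)
z, p = signed_rank_test(a, b)
print(p < 0.05)  # True
```

In the table, each judge's three counts in a comparison sum to 5000 pairs, so the totals row sums to 65000.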
To determine which of two methods better predicts the individual class values assigned by a test judge, we apply the signed rank test. We also count the query-document pairs for which each method predicts the larger probability for the class value, as well as ties. An asterisk marks the better result when the difference has a p-value less than 0.05 by the signed rank test. The parameters are the single-parameter optimizations of Table 1.
| Judge | | | = |
|---|---|---|---|
| 0 | 1992 | 3008* | 0 |
| 1 | 2546 | 2454* | 0 |
| 2 | 2864* | 2136 | 0 |
| 3 | 2598 | 2402* | 0 |
| 4 | 2148 | 2851* | 1 |
| 5 | 2247 | 2753* | 0 |
| 6 | 2527 | 2473* | 0 |
| 7 | 3392* | 1608 | 0 |
| 8 | 3798* | 1202 | 0 |
| 9 | 2676 | 2324* | 0 |
| 10 | 2802* | 2198 | 0 |
| 11 | 2084 | 2916* | 0 |
| 12 | 2938* | 2062 | 0 |
| Total | 34612 | 30387 | 1 |