Ping Zhang, Weidan Cao, Zoran Obradovic.
Abstract
BACKGROUND: In many biomedical applications, classification models must be developed from noisy annotations. Recently, various methods have addressed this scenario by relying on unreliable annotations obtained from multiple sources.
Year: 2013 PMID: 24268030 PMCID: PMC3848820 DOI: 10.1186/1471-2105-14-S12-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Properties of multi-annotator algorithms.
| Algorithms | Unsupervised? | Integrate labels globally? | Data dependent? | Filter novice annotation? |
|---|---|---|---|---|
| MV | Y | N | N | N |
| MAP-ML | Y | Y | N | N |
| GMM-MAPML | Y | Y | Y | N |
| AEFN | Y | Y | Y | Y |
The properties of the multi-annotator algorithms are compared. 'Y' denotes that the algorithm has the property; 'N' denotes that it does not.
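As an illustration of the simplest baseline in the table, the following is a minimal sketch of label aggregation by majority voting, assuming that MV denotes plain majority voting over binary labels. The function name, the tie-breaking rule, and the toy votes are illustrative assumptions and are not taken from the paper.

```python
def majority_vote(labels_per_item):
    """Return one consensus 0/1 label per item by simple majority.

    labels_per_item: list of lists; each inner list holds the binary
    labels that the annotators assigned to one item.
    Ties are broken toward 1 (an arbitrary choice for this sketch).
    """
    consensus = []
    for labels in labels_per_item:
        positives = sum(labels)
        # Label the item 1 when at least half of the annotators said 1.
        consensus.append(1 if 2 * positives >= len(labels) else 0)
    return consensus


# Toy example: three items, five hypothetical annotators each.
votes = [
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
]
consensus_labels = majority_vote(votes)
```

Every annotator carries the same weight here and each item is decided on its own, which matches the 'N' entries for MV in the table: no global integration, no data dependence, and no filtering of novice annotators.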
AEFN based accuracy estimates on the text evidence classification task without using ground truth.
| Annotators | First Component: Estimated Sensitivity | First Component: Estimated Specificity | Second Component: Estimated Sensitivity | Second Component: Estimated Specificity | Third Component: Estimated Sensitivity | Third Component: Estimated Specificity |
|---|---|---|---|---|---|---|
| Annotator 1 | Filtered | Filtered | 0.7573 | 0.7737 | Filtered | Filtered |
| Annotator 2 | 0.8400 | 0.8445 | 0.8901 | 0.9303 | 0.8103 | 0.8798 |
| Annotator 3 | 0.8984 | 0.9061 | 0.8150 | 0.8870 | 0.8235 | 0.8196 |
| Annotator 4 | 0.7492 | 0.7553 | Filtered | Filtered | 0.7184 | 0.8197 |
| Annotator 5 | 0.8035 | 0.7810 | 0.7991 | 0.8199 | 0.8819 | 0.9152 |
The estimates by five annotators for three principal components on the text evidence task are shown.
AEFN based accuracy estimates on the text focus classification task without using ground truth.
| Annotators | First Component: Estimated Sensitivity | First Component: Estimated Specificity | Second Component: Estimated Sensitivity | Second Component: Estimated Specificity | Third Component: Estimated Sensitivity | Third Component: Estimated Specificity |
|---|---|---|---|---|---|---|
| Annotator 1 | 0.7672 | 0.7749 | 0.8005 | 0.7969 | 0.7634 | 0.7907 |
| Annotator 2 | 0.9373 | 0.8588 | 0.8753 | 0.8271 | 0.8958 | 0.8863 |
| Annotator 3 | 0.7383 | 0.8258 | Filtered | Filtered | Filtered | Filtered |
| Annotator 4 | 0.8059 | 0.8652 | 0.9010 | 0.8594 | 0.8318 | 0.8413 |
| Annotator 5 | Filtered | Filtered | Filtered | Filtered | Filtered | Filtered |
The estimates by five annotators for three principal components on the text focus task are shown.
Figure 1. Three logistic regression classifier ROC comparisons on the text evidence classification task. The ROC comparison on the biomedical evidence classification of three strategies for selecting an annotation source for logistic regression. Methods are sorted in the legend of the figure according to their AUC values.
Figure 2. Three logistic regression classifier ROC comparisons on the text focus classification task. The ROC comparison on the biomedical focus classification of three strategies for selecting an annotation source for logistic regression. Methods are sorted in the legend of the figure according to their AUC values.
CASP9 comparison on labelled data.
| Predictor Name | Institution | ACC | AUC |
|---|---|---|---|
| AEFN | | | |
| GMM-MAPML | | 0.785 | 0.874 |
| MAP-ML | | 0.764 | 0.859 |
| MV | | 0.735 | 0.776 |
| PRDOS2 | Tokyo Tech | 0.754 | 0.855 |
| MULTICOM-REFINE | U of Missouri | 0.750 | 0.822 |
| BIOMINE_DR_PDB | U of Alberta | 0.741 | 0.821 |
| GSMETADISORDERMD | IIMCB in Warsaw | 0.738 | 0.816 |
| MASON | George Mason U | 0.736 | 0.743 |
| ZHOU-SPINE-D | Indiana University | 0.731 | 0.832 |
| DISTILL-PUNCH1 | UCD Dublin | 0.726 | 0.800 |
| OND-CRF | Umea University | 0.706 | 0.759 |
| UNITED3D | Kitasato University | 0.704 | 0.780 |
| CBRC_POODLE | CBRC | 0.694 | 0.830 |
| MCGUFFIN | University of Reading | 0.688 | 0.817 |
| ISUNSTRUCT | IPR RAS | 0.676 | 0.739 |
| DISOPRED3C | UCL | 0.670 | 0.853 |
| ULG-GIGA | University of Liege | 0.588 | 0.726 |
| MEDOR | Aix-Marseille U | 0.579 | 0.679 |
Comparisons of AEFN vs. alternative multi-annotator methods (GMM-MAPML, MAP-ML and MV) and individual CASP9 protein disorder predictors.
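For readers unfamiliar with the AUC column, the following is a minimal sketch of how an area under the ROC curve can be computed from per-item scores, using its standard interpretation as the probability that a randomly chosen positive item is scored above a randomly chosen negative one. The function name and toy data are illustrative assumptions; this is not the CASP9 evaluation code, and the paper's exact evaluation procedure is not given in this record.

```python
def auc_from_scores(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    y_true: list of 0/1 ground-truth labels.
    scores: list of real-valued prediction scores (higher = more positive).
    A tied positive/negative pair counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with four items.
toy_auc = auc_from_scores([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

The pairwise loop is quadratic in the number of items, which is fine for a sketch; a rank-based formulation would be preferable for residue-level data of realistic size.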
Figure 3. Analysis of CASP9 disorder predictors at three components identified by AEFN. In panels a, b, and c: the black cross plots the actual sensitivity and specificity of each predictor; the red dot plots the sensitivity and specificity of the best predictors as estimated by the AEFN algorithm; the green squares show the predictors filtered as those less accurate in the experiment.
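The "actual" sensitivity and specificity in the caption above are presumably the standard quantities measured against ground-truth labels. A minimal sketch of that computation for binary labels follows; the function name and toy data are illustrative assumptions and do not reproduce the AEFN estimation procedure, which works without ground truth.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary 0/1 labels.

    sensitivity = TP / (TP + FN), the fraction of true positives recovered.
    specificity = TN / (TN + FP), the fraction of true negatives recovered.
    A rate is NaN when its denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy example: eight items labelled by one hypothetical predictor.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(truth, preds)
```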