Ying-Ying Xu¹, Fan Yang¹, Yang Zhang¹, Hong-Bin Shen²
1. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China; and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
2. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China; and Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Abstract
MOTIVATION: There is long-standing interest in the challenging task of finding translocated and mislocalized cancer biomarker proteins. Bioimages of subcellular protein distribution are a new data source that has attracted much attention in recent years because they give intuitive and detailed descriptions of where proteins are located. However, automated methods for large-scale biomarker screening are severely limited by the lack of subcellular location annotations for bioimages from cancer tissues. The transfer-prediction approach, which applies models trained on normal-tissue proteins to predict the subcellular locations of proteins in cancer tissues, is questionable because protein distribution patterns may differ between normal and cancerous states. RESULTS: We developed a new semi-supervised protocol that uses unlabeled cancer protein data in model construction through an iterative, incremental training strategy. Our approach lets us selectively use low-quality images from normal tissues to expand the training sample space, and it provides a general way to combine a small set of annotated images with a large pool of unannotated ones. Experiments demonstrate that the new semi-supervised protocol can improve the accuracy and sensitivity of detecting subcellular location differences. AVAILABILITY AND IMPLEMENTATION: The data and code are available at: www.csbio.sjtu.edu.cn/bioinf/SemiBiomarker/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
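The iterative, incremental training strategy described in the abstract resembles classic self-training: fit a model on the labeled images, label the unlabeled images it is confident about, add them to the training set, and repeat. The sketch below shows a generic self-training loop under stated assumptions; it is not the authors' protocol. The nearest-centroid classifier, the softmax-style confidence score, the 0.9 threshold, the single-label setting, the toy location names, and the helper names (`train_centroids`, `predict_with_confidence`, `self_train`) are all hypothetical choices made for illustration.

```python
"""Generic self-training loop (hypothetical sketch, not the authors' protocol).

Assumptions: image features are plain lists of floats, each image has a single
subcellular-location label, and a nearest-centroid classifier stands in for
whatever base model a real pipeline would use.
"""
import math
from collections import defaultdict


def train_centroids(features, labels):
    """Return the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in vec] for y, vec in sums.items()}


def predict_with_confidence(centroids, x):
    """Predict the nearest class; confidence is a softmax over negative distances."""
    dists = {y: math.dist(x, c) for y, c in centroids.items()}
    best = min(dists, key=dists.get)
    # Shift by the smallest distance before exponentiating to avoid underflow.
    weights = {y: math.exp(dists[best] - d) for y, d in dists.items()}
    return best, weights[best] / sum(weights.values())


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=10):
    """Move confidently predicted unlabeled samples into the training set, round by round."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = train_centroids(train_x, train_y)  # retrain on the enlarged set
        keep, added = [], 0
        for x in pool:
            label, conf = predict_with_confidence(centroids, x)
            if conf >= threshold:
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                keep.append(x)
        pool = keep
        if added == 0:  # no confident predictions left: stop early
            break
    return train_centroids(train_x, train_y), pool


# Toy usage with two hypothetical locations in a 2-D feature space.
labeled_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labeled_y = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
unlabeled_x = [[0.1, 0.3], [4.8, 5.2], [2.5, 2.5]]
model, leftover = self_train(labeled_x, labeled_y, unlabeled_x)
# The ambiguous point midway between the two classes stays unlabeled.
print(leftover)  # [[2.5, 2.5]]
```

Because the centroids are retrained after every round, samples added in one round can make borderline images confident in the next; the loop stops when a round adds nothing. The selective use of low-quality normal-tissue images mentioned in the abstract would need an additional filtering step that this sketch omits.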