Jonas Hein1,2, Matthias Seibold3,4, Federica Bogo5, Mazda Farshad6, Marc Pollefeys7,5, Philipp Fürnstahl8, Nassir Navab9. 1. Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland. heinj@student.ethz.ch. 2. Computer Vision and Geometry Group, ETH Zurich, Zurich, Switzerland. heinj@student.ethz.ch. 3. Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland. matthias.seibold@tum.de. 4. Computer Aided Medical Procedures, Technical University Munich, Garching, Germany. matthias.seibold@tum.de. 5. Mixed Reality & AI Zurich Lab, Microsoft, Zurich, Switzerland. 6. Balgrist University Hospital, University of Zurich, Zurich, Switzerland. 7. Computer Vision and Geometry Group, ETH Zurich, Zurich, Switzerland. 8. Research in Orthopedic Computer Science, University Hospital Balgrist, University of Zurich, Balgrist CAMPUS, Zurich, Switzerland. 9. Computer Aided Medical Procedures, Technical University Munich, Garching, Germany.
Abstract
PURPOSE: Tracking of tools and surgical activity is becoming increasingly important in the context of computer-assisted surgery. In this work, we present a data generation framework, a dataset, and baseline methods to facilitate further research on markerless hand and instrument pose estimation in realistic surgical scenarios. METHODS: We developed a rendering pipeline to create inexpensive and realistic synthetic data for model pretraining. Subsequently, we propose a pipeline to capture and label real data with hand and object pose ground truth in an experimental setup, yielding high-quality real data. We furthermore present three state-of-the-art RGB-based pose estimation baselines. RESULTS: We evaluate the three baseline models on the proposed datasets. The best-performing baseline achieves an average tool 3D vertex error of 16.7 mm on synthetic data and 13.8 mm on real data, which is comparable to the state of the art in RGB-based hand/object pose estimation. CONCLUSION: To the best of our knowledge, we propose the first synthetic and real data generation pipelines that produce hand and object pose labels for open surgery. We present three baseline models for object and joint hand/object pose estimation from RGB frames. Our realistic synthetic data generation pipeline may help overcome the data bottleneck in the surgical domain and can easily be transferred to other medical applications.
Keywords:
Deep learning; Hand pose; Object pose; Single-shot pose estimation; Synthetic data generation