Literature DB >> 27038963

Combined 2D and 3D tracking of surgical instruments for minimally invasive and robotic-assisted surgery.

Xiaofei Du1, Maximilian Allan2, Alessio Dore3, Sebastien Ourselin4, David Hawkes4, John D Kelly5, Danail Stoyanov2.   

Abstract

PURPOSE: Computer-assisted interventions for enhanced minimally invasive surgery (MIS) require tracking of the surgical instruments. Instrument tracking is a challenging problem in both conventional and robotic-assisted MIS, but vision-based approaches are a promising solution with minimal hardware integration requirements. However, vision-based methods suffer from drift, and in the case of occlusions, shadows and fast motion, they can be subject to complete tracking failure.
METHODS: In this paper, we develop a 2D tracker based on a Generalized Hough Transform using SIFT features which can both handle complex environmental changes and recover from tracking failure. We use this to initialize a 3D tracker at each frame which enables us to recover 3D instrument pose over long sequences and even during occlusions.
RESULTS: We quantitatively validate our method in 2D and 3D with ex vivo data collected from a DVRK controller as well as providing qualitative validation on robotic-assisted in vivo data.
CONCLUSIONS: We demonstrate from our extended sequences that our method provides drift-free robust and accurate tracking. Our occlusion-based sequences additionally demonstrate that our method can recover from occlusion-based failure. In both cases, we show an improvement over using 3D tracking alone suggesting that combining 2D and 3D tracking is a promising solution to challenges in surgical instrument tracking.

Entities:  

Keywords:  Instrument tracking and detection; Minimally invasive surgery; Robot-assisted surgery; Surgical vision

Mesh:

Year:  2016        PMID: 27038963      PMCID: PMC4893384          DOI: 10.1007/s11548-016-1393-4

Source DB:  PubMed          Journal:  Int J Comput Assist Radiol Surg        ISSN: 1861-6410            Impact factor:   2.924


Introduction

Detection and tracking of surgical instruments can provide an important information component of computer-assisted surgery (CAS) for MIS [22]. Control systems which can supply automated visual servoing [18], soft motion constraints [19] and tactile feedback [15] are reliant on knowing positional information about both the shaft and the tip of the articulated instrument. Hardware-based solutions such as optical tracking systems using fiducial markers [10] require modification to the instrument design posing ergonomic challenges and additionally suffer from robustness issues due to line-of-sight requirements. Direct use of robotic joint encoders and forward kinematics to track instruments is possible in robot-assisted interventions; however, tendon-driven systems, such as da Vinci (Intuitive Surgical Inc., CA), introduce errors in the position information which usually requires correction that can be achieved through visual methods [17, 18]. Entirely image-based solutions [3, 21, 23] directly estimate the instrument pose in the reference frame of the observing camera. This avoids complex calibration routines and can be implemented entirely through software which allows them to be applied retrospectively and without modification to the instruments or the surgical workflow. Early image-based methods predominantly estimated the instrument pose in 2D by estimating image-based translation parameters, scale and in-plane rotation without explicitly modeling the 3D shape of the instrument. These have been based around low-level image processing [20] which accumulate handcrafted visual features and more complex learned discriminative models [6, 23] which track an instrument by performing detection independently on each frame. Such methods are typically fast and robust, handling complex and fast motion as well as recovery when the instrument is occluded by the field of view of the camera or smoke and tissue as they perform a global or semi-global search of the entire image for the tracked instrument. Fewer methods have attempted to estimate the 3D pose of the instruments directly from image data. This typically is a much more complex problem as it involves estimating three additional degrees of freedom (DOF) from very weak small baseline stereo or monocular cues. However, it provides additional benefits over 2D methods as it allows reasoning about instrument–instrument occlusions and interactions with tissue surfaces. Most of these methods focus on the alignment of a 3D model with a probabilistic classification of the image [1, 2, 16] which allows the fusion of geometric constraints with image data without an offline learning phase. A significant challenge with 3D tracking methods is that they commonly fail when the instrument motion is fast or complex, as they restrict the parameter search to local regions close to the estimated parameters from the previous frame. In many cases, this can lead to drift which requires a manual reset of the tracking. In this paper, we present a new method which combines the strengths of a novel 2D tracker with a preexisting 3D tracking method [3] allowing us to robustly track surgical instruments through sequences that contain occlusions and challenging motion which cause the 3D tracker to fail. We achieve this by performing global tracking-by-detection in 2D with a keypoint-based tracker which is used to initialize the 3D tracker with image-based translation and rotation parameters as well as an estimate of scale. We then perform a normal gradient-based optimization to estimate the full set of 3D parameters. We quantitatively validate our method using ex vivo data collected from a DVRK controller and forward kinematics and additionally provide convincing qualitative validation on in vivo robot-assisted prostatectomy sequences. Our validation shows the our method provides state-of-the-art 2D tracking performance and significantly improves tracking accuracy in 3D. In the ex vivo sequences, we restrict the motion of rigid 3D tracking as the method we use [3] does not model articulations of the instrument tip. Our validation shows the our method provides state-of-the-art 2D tracking performance and improves tracking accuracy in 3D.

Methods

Our method assumes that we have the 3D pose of the instrument in the first frame which we use to initialize a 2D bounding box (see Fig. 1) around the instrument head where is the pixel coordinates of top left corner of the bounding box which has width w and height h. We define the 2D detection problem as the estimation of the parameters and the 3D estimation problem as the estimation of the parameters , where (u, v) are the pixel coordinates of the center of the instrument head and is the pitch/in-plane rotation of the instrument shaft around the optical axis. (x, y, z) are the 3D translation coordinate in metric units from the camera coordinate system origin to the instrument coordinate system origin, and are the x, y, z rotations of the instrument in 3D, respectively. For each new input frame, we detect the instrument, estimating the 2D parameters using our new tracker. Using these parameters, we then initialize a previously developed, open-source 3D tracker [3] which then converges using gradient descent to estimate the full 3D parameter vector .
Fig. 1

The left image shows the 2D detection and estimation of the parameters which are then used a to initialize the 3D parameters . After the 3D pose is estimated, a new frame is loaded b and 2D detection begins again

The left image shows the 2D detection and estimation of the parameters which are then used a to initialize the 3D parameters . After the 3D pose is estimated, a new frame is loaded b and 2D detection begins again

Generalized Hough transform for 2D detection

To estimate , we implement a keypoint-based tracker which relies on a Generalized Hough Transform (GHT) [5] and a global histogram segmentation model. The GHT extends the well-known Hough Transform to detect arbitrary shapes as maxima in a parameter space by describing shapes as collections of spatial features in a local coordinate system. Given an example image template containing the object of interest, a reference point which serves as the origin of the local coordinates is computed, usually as the center of the template window. Then, for keypoint-based features (e.g., SIFT [13]) in the template image, the feature orientation and the relative displacement and orientation to the reference point are computed and stored in a database known as an R-table, which fully defines the target object. To perform detection with the GHT, keypoints in a new image are computed and matched to the stored keypoints in the R-table. Each matched keypoint then “votes” for the origin of the coordinate system, and the center is chosen as the reference point with the most votes.

Initializing the model

Given a sequence of m frames  and the 2D bounding box on the template frame , we detect the parameters on every input frame. The object model M is represented by a set of keypointswhere  denotes the ith keypoint on the model,  represents the distance between keypoint , and the center of the instrument head (u, v). is the voting state of keypoint  at frame t: 0 for negative and 1 for positive. It is positive if the corresponding keypoint has contributed for the voting of the detected center; otherwise, it is negative. The voting states for all keypoints are initialized as positive for the template frame For each input frame  with , the keypoints in the model are matched. We gather the matched corresponding keypoints as the vote set .where  is the voting weight for each matched keypoint , which is defined based on the segmentation model introduced in Sect. 2.3. Segmentation model initialization and update strategy: a image region inside the convex hull (green polygon) of the positive keypoints (green circle) is used to initialize and update the foreground histogram; filled circle with magenta color indicates the reference center; b foreground probability colormap illustration, in which blue color indicates low probability, while red color indicates higher probability; c foreground/background classification binary map based on the probability model

Histogram-based segmentation model

To adapt object model accounting for appearance changes, we are inspired by the work of [7, 8] and we implemented a global probabilistic model based on color histogram by using a recursive Bayesian formulation to better discriminate foreground and background. The foreground probability of a pixel at frame t is based on the segmentation of previous frame .where  is the class of the pixel at frame t: 0 for background and 1 for foreground,  is the pixel’s color from frame 1 to t, and Z is a normalization constant to keep the probabilities sum to 1. The color distribution  is built with HSV color histograms with  bins for H and S channels and 8 separate bins for V channel. We omit the background probability  here since it is similar to Eq. 4. The transition probabilities for foreground and background where are empirical choices as in [8], which are not very sensitive. To detect more boundary keypoints, the bounding box is usually slightly larger than the object, which includes more background pixels. So instead of initializing the histogram from the bounding box on the template image as in [8], we initialize it from the detection result from the first frame after locating the object center. It is assumed that the positive keypoints are most likely located on the object, so we collect all the positive keypoint into The foreground histogram is then initialized from the image region inside the convex hull of all the positive keypoints , which contains less background pixels. The background histogram is initialized from the image region surrounding the detected object bounding box (with some margin between). For the following frames, the color distributions are adapted in the same way as the initialization (shown in Fig. 2)where  is the model update factor.
Fig. 2

Segmentation model initialization and update strategy: a image region inside the convex hull (green polygon) of the positive keypoints (green circle) is used to initialize and update the foreground histogram; filled circle with magenta color indicates the reference center; b foreground probability colormap illustration, in which blue color indicates low probability, while red color indicates higher probability; c foreground/background classification binary map based on the probability model

The voting weight of a keypoint is defined as the mean foreground probability of the image patch surrounding the keypointDuring the voting process, we set the weight threshold , only keypoints with higher weight  participate in the voting process, and the weighted votes accumulated based on the segmentation model. In regard to the voting, we developed a rotation-invariant voting scheme in Sect. 2.4.

Rotation-invariant Hough voting scheme

When the object undergoes scale change or in-plane rotation, the voting also needs to rotate and scale in order to locate the object center. Scale and rotation information can be obtained from most feature detectors, but since it is usually not reliable enough, in [14], the authors analyzed the pairwise Euclidean distance and angular change between keypoints. We illustrated their voting scheme and ours in Fig. 3: Keypoints on the model and on the input frame are matched in Fig. 3a1, a2; then in the input frame, median pairwise angular change between keypoints is computed by comparing with the initial constellation in Fig. 3b1, and correspondent keypoints rotate votes based on the median angular change in Fig. 3b2. It displays the ideal situation for rotation estimation, but when the percentage of outliers is high, votes will probably miss shoot the center based on unreliable rotation estimation. We develop a rotation-invariant voting strategy shown in Fig. 3c1, c2. For each keypoint, instead of voting for only one direction, it votes for a circle. In this way, our vote scheme does not rely on any pre-estimation of rotation, and the maximum vote still accumulated at the center without any potential error induced by the pre-voting rotation estimation. In order to improve the overshooting or fall short situation for scale estimation or out-of-plane rotation, we make it more robust by voting for a ring circle in Fig. 3d1, d2. The thickness ratio is set to be [0.95, 1.05]. The initial scale is set to be 1.0, the radius of the voting circle is based on the scale of the previous frame , and the distance of the keypoint to the reference center of the model isAfter voting, the scale and rotation are estimated based on the scale change and angle change of all the positive keypoints.
Fig. 3

Voting scheme illustration: a1 keypoints and reference center on the model (shown in color); a2 keypoints and the tracked center (u, v) on the input frame; in [14], keypoints vote for the reference center (b1); in the input frame, the rotation is estimated by pairwise angular change and vote based on the rotation estimation in (b2); our rotation-invariant voting scheme votes not only for one direction, but a circle (c1, c2), in order to improve robustness, keypoint votes for a ring circle, and the rotation and scale s are estimated after voting (d1, d2)

Voting scheme illustration: a1 keypoints and reference center on the model (shown in color); a2 keypoints and the tracked center (u, v) on the input frame; in [14], keypoints vote for the reference center (b1); in the input frame, the rotation is estimated by pairwise angular change and vote based on the rotation estimation in (b2); our rotation-invariant voting scheme votes not only for one direction, but a circle (c1, c2), in order to improve robustness, keypoint votes for a ring circle, and the rotation and scale s are estimated after voting (d1, d2)

Model adaptation

One of the challenges for visual tracking is how and when to adapt the object model to cope with appearance changes due to deformation, illumination variations, etc. In endoscopic images, when the object center is out of view or out of plane, instead of updating the model, we have to reset the detector to re-detect the object. To achieve this, we define the following updating strategy. Whenever the voted center is out of the convex hull of the positive keypoint set , we evaluate all the keypoints inside the bounding box  around the detected center based on the segmentation model. If the weight  of the keypoint candidate  is higher than the weight threshold , it is considered as a potential keypoint and is included in the keypoint candidate set ; otherwise, it will be discarded.Then, we analyze the distribution of the keypoint candidates with regard to the object center: (i) If the center (u, v) is inside the convex hull of the candidates  and the number of candidates is higher than certain threshold, we add the new candidates  into the model and remove negative features, and then use the updated model to continue tracking; (ii) if the center is outside of the convex hull, it indicates the object is most likely out of image or is under out-of-plane rotation, so we switch the detector into reset mode: If the object is matched, the detector will be switched back to normal mode. Example frames from our ex vivo sequences acquired using a da Vinci (Intuitive Surgical Inc., CA) classic stereo laparoscope. The images show typical challenges in instrument tracking, such as instrument and tissue-based occlusions and sequences where the instrument goes in and out of view repeatedly Numerical results for the 3D tracking for each of the ex-vivo sequences Each value shows the mean error (mm) of the translation error for our 2D3D method and for the 3D only tracking

Combining 2D and 3D tracking

We use an open-source 3D level set tracker [3] which is capable of recovering the full 3D pose of surgical instruments by aligning multiple-level set segmentations with Random Forest pixel classifications and additionally uses optical flow tracking to local track features on the instrument body. We use the 2D pose to initialize the 3D pose of this method in each frame, rather than using the tracking-by-initialization method of the original authors. The parameters (x, y, z) of are initialized by ray casting (u, v) and using the z estimate from the first frame, scaled by s. is used directly to initialize , and are retained from the previous frame. Effectively, we only retain the parameters in the 3D tracker which cannot be estimated by the 2D tracker. Given an initial estimate, we allow the 3D level set-based tracker to converge to a solution through gradient descent.

Experiments and results

In this section, we present validation on both our novel 2D tracker “GHT” and our 2D-initialised-3D (referred to as 2D3D) tracking. In this section, we refer to the 3D tracker without 2D initialization [3] as “3D only.” Our quantitative validation is performed on new ex vivo datasets which we have made available online (see Fig. 4).1 We hope that by releasing data, we will encourage other researchers to test their methods against our data, an idea which was explored recently in the Endoscopic Vision Challenge at MICCAI 2015 which provided labeled segmentation and tracking data for laparoscopic and robotic-assisted minimally invasive surgery.
Fig. 4

Example frames from our ex vivo sequences acquired using a da Vinci (Intuitive Surgical Inc., CA) classic stereo laparoscope. The images show typical challenges in instrument tracking, such as instrument and tissue-based occlusions and sequences where the instrument goes in and out of view repeatedly

Ex vivo experiments

To evaluate the ability of our method to robustly track a surgical instrument through challenging sequences, we constructed four datasets with porcine tissue samples. Our ex vivo sequences are collected using a da Vinci (Intuitive Surgical Inc., CA) robot where we obtained joint encoder data from a DVRK controller box [12]. Using forward kinematics, we can compute the 3D transform for the instrument in the reference frame of the stereo camera using manual calibration to remove the offset between the robot and camera coordinate system. This can be projected into the image plane to obtain validation for both the 2D and 2D3D tracking. We compare our 2D tracking method with the-state-of-art CST tracker [9] and TLD tracker [11] using precision and box plots based on location error metric and area under curve (AUC) to analyze the performance. These metrics are widely used to evaluate tracking performance [4, 24]. Precision plots show the percentage of frames (y axis) where the estimated position (u, v) is within a distance threshold (x axis) compared with the ground truth. In the box plot, edges of the box are 25 and 75 % percentiles, the whiskers extend to the most extreme data points not considered outliers, and the red markers are outliers plotted outside the box. Performance comparison for dataset I, which contains a tool occlusion between frames 250–400 We compare our 2D tracking method with the-state-of-art CST tracker [9] and TLD tracker [11] using precision and box plots based on location error metric and area under curve (AUC) to analyze the performance. These metrics are widely used to evaluate tracking performance [4, 24] and show the percentage of frames where the estimated position is within a threshold of the ground truth. We also summarise the numerical results for the 3D tracking in Table 1. In the table, mean translation errors for our 2D3D method and the 3D only tracking are shown for each of the ex vivo sequences.
Table 1

Numerical results for the 3D tracking for each of the ex-vivo sequences

Dataset IDataset IIDataset IIIDataset IV
2D3D3.70 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 2.2816.23 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 11.838.29 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 11.2911.54 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 7.94
3D only4.76 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 3.2838.47 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 32.1151.37 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 52.1016.79 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\pm $$\end{document}± 14.88

Each value shows the mean error (mm) of the translation error for our 2D3D method and for the 3D only tracking

Tracking through occlusions and out of view

Dataset I evaluates the ability of the method to track instruments when they are occluded by other instruments, effectively assessing our method’s ability to avoid tracking association errors between the target instrument and additional instruments in the frame, even when they violate each other’s image space. Dataset II evaluated the ability of the method to track instruments when they are occluded by tissue samples. Dataset III and dataset IV evaluate the ability of our method to recover when the instrument moves out of view of the camera. The trajectories of the tracked center and the precision plots for each sequence are shown in Figs. 5 and 6. The CST tracker lacks the ability to recover from occlusions or out-of view situations compared with the TLD tracker and our GHT tracker. Our GHT tracker has the highest AUC score among three trackers, which means our method can handle various occlusion and out-of-view challenges. The 3D tracker demonstrates similar performance with and without 2D initialization in Dataset I, with slight improvement in the z axis estimation during the occluded frames. Dataset II clearly demonstrates the improvement of our method as the 3D only tracker loses tracking at frame 380 and never recovers. The same effect occurs in dataset III where the 3D only tracker loses tracking after occlusion and does not recover. Figures  5, 6 and 7a show the trajectories of the tracked center for three 2D methods, (b) show the precision plot for three 2D methods, (c) show the box plot for three 2D methods, and (d–f) show the 3D trajectory of the proposed 2D3D tracker compared with using the 3D tracker directly. Dataset I evaluated the ability of the method to track instruments when they are occluded by tissue samples. Dataset II evaluates the ability of the method to track instruments when they are occluded by other instruments, effectively assessing our method’s ability to avoid tracking association errors between the target instrument and additional instruments in the frame, even when they violate each other’s image space. Dataset III evaluates the ability of our method to recover when the instrument moves out of view of the camera. The trajectories of the tracked center and the precision plots for each sequence are shown in Figs. 5, 6 and 7. The CST tracker lacks the ability to recover from occlusions or out-of view situations compared with the TLD tracker and our GHT tracker. Our GHT tracker has the highest AUC score among three trackers, which means our method can handle various occlusion and out-of-view challenges. The 3D tracker demonstrates similar performance with and without 2D initialization in Dataset I, with slight improvement in the z axis estimation during the occluded frames. In Dataset II, however, the improvement is significant as the 3D only tracker loses tracking at frame 380 and never recovers. The same effect occurs in dataset III where the 3D only tracker loses tracking after occlusion and does not recover. Figures  5, 6, 7 (a) show the trajectories of the tracked center for three 2D methods, (b) show the precision plot for three 2D methods, (c) show the box plot for three 2D methods, and (d–f) show the 3D trajectory of the proposed 2D3D tracker compared with using the 3D tracker directly.
Fig. 5

Performance comparison for dataset I, which contains a tool occlusion between frames 250–400

Fig. 6

Performance comparison for dataset II, which contains a tissue occlusion between frames 225–350

Fig. 7

Performance comparison for dataset III, which contains out-of-view occlusions between frames 325–350

Performance comparison for dataset II, which contains a tissue occlusion between frames 225–350 Performance comparison for dataset III, which contains out-of-view occlusions between frames 325–350 Performance comparison for the extended tracking sequence, dataset IV Frames showing an instrument tracked through an in vivo sequence. a–c Demonstrate good accuracy, whereas in d a failure mode for our algorithm is exhibited where poor classification on the instrument body causes the 3D tracked to fail to converge correctly

Long-term tracking

We construct an extended sequence (dataset IV) of over 4000 frames to demonstrate the capability of our method to track the pose of the instrument in 3D without failing from drift. We display the results in Fig. 8 where (a) shows the trajectories of tracked center, (b) shows a precision plot for three 2D methods over the whole sequence, (c) shows a precision plot for three 2D methods over frames where all methods report a positive detection, (d) shows a box plot for the three 2D methods, (e–g) show the 3D trajectory of the proposed 2D3D tracker compared with using the 3D tracker directly. The figures show that our method is capable of reliable long-term tracking although it does exhibit interesting failure cases. On dataset IV, our method fails to discriminate between out-of-view occlusions and out-of-plane-based appearance changes which results in non-detected output when the tracked points are rotated out of the field of view or such that the appearance of the patch changes beyond recognition. To display quantitative precision plot results in cases where each 2D detection method has some false non-detections, we display 2 types of plot: one where we display the results from the whole sequence where we set an infinite distance for missed detections (Fig. 8b) and one where we only consider frames where all of the 2D tracking methods report a detection (Fig. 8c).
Fig. 8

Performance comparison for the extended tracking sequence, dataset IV

In vivo experiments

We additionally qualitatively validate our method using robotic video data [16]. Example images showing our method performing detection on these images are shown in Fig. 9. This in vivo sequence shows that our method is capable of tracking through complex surgical images even when the instrument undergoes articulation, which our method does not explicitly model.
Fig. 9

Frames showing an instrument tracked through an in vivo sequence. a–c Demonstrate good accuracy, whereas in d a failure mode for our algorithm is exhibited where poor classification on the instrument body causes the 3D tracked to fail to converge correctly

Conclusions

In this paper, we present a novel GHT-based 2D tracker with a global histogram probabilistic segmentation model which we combine with a 3D tracking algorithm to robustly estimate the full 3D pose of instruments in minimally invasive surgery. Our extensive ex vivo validation demonstrates that our method is not only capable of tracking instruments over extended sequences but that it can also recover from tracking failures and occlusions, a feature that has not been demonstrated in any prior 3D tracking work in a minimally invasive surgical context. Future improvements to this method will focus around removing the requirement on a manual initialization. This can potentially be achieved with an enforced fixed position of the instrument, while the 3D pose estimator converges to a correct solution. Below is the link to the electronic supplementary material. Supplementary material 1 (mp4 36653 KB)
  11 in total

1.  FRACAS: a system for computer-aided image-guided long bone fracture surgery.

Authors:  L Joskowicz; C Milgrom; A Simkin; L Tockus; Z Yaniv
Journal:  Comput Aided Surg       Date:  1998

Review 2.  Surgical vision.

Authors:  Danail Stoyanov
Journal:  Ann Biomed Eng       Date:  2011-10-20       Impact factor: 3.934

3.  Tracking-Learning-Detection.

Authors:  Zdenek Kalal; Krystian Mikolajczyk; Jiri Matas
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2011-12-13       Impact factor: 6.226

4.  Robust Object Tracking with Online Multiple Instance Learning.

Authors:  Boris Babenko; Ming-Hsuan Yang; Serge Belongie
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2010-12-23       Impact factor: 6.226

5.  Detecting Surgical Tools by Modelling Local Appearance and Global Shape.

Authors:  David Bouget; Rodrigo Benenson; Mohamed Omran; Laurent Riffaud; Bernt Schiele; Pierre Jannin
Journal:  IEEE Trans Med Imaging       Date:  2015-12       Impact factor: 10.048

6.  Online selection of discriminative tracking features.

Authors:  Robert T Collins; Yanxi Liu; Marius Leordeanu
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2005-10       Impact factor: 6.226

7.  Dynamic 3-D virtual fixtures for minimally invasive beating heart procedures.

Authors:  Jing Ren; Rajni V Patel; Kenneth A McIsaac; Gerard Guiraudon; Terry M Peters
Journal:  IEEE Trans Med Imaging       Date:  2008-08       Impact factor: 10.048

8.  Fast part-based classification for instrument detection in minimally invasive surgery.

Authors:  Raphael Sznitman; Carlos Becker; Pascal Fua
Journal:  Med Image Comput Comput Assist Interv       Date:  2014

9.  Toward detection and localization of instruments in minimally invasive surgery.

Authors:  Max Allan; Sébastien Ourselin; Steve Thompson; David J Hawkes; John Kelly; Danail Stoyanov
Journal:  IEEE Trans Biomed Eng       Date:  2012-11-21       Impact factor: 4.538

10.  Feature classification for tracking articulated surgical tools.

Authors:  Austin Reiter; Peter K Allen; Tao Zhao
Journal:  Med Image Comput Comput Assist Interv       Date:  2012
View more
  5 in total

1.  Articulated Multi-Instrument 2-D Pose Estimation Using Fully Convolutional Networks.

Authors:  Xiaofei Du; Thomas Kurmann; Ping-Lin Chang; Maximilian Allan; Sebastien Ourselin; Raphael Sznitman; John D Kelly; Danail Stoyanov
Journal:  IEEE Trans Med Imaging       Date:  2018-05       Impact factor: 10.048

2.  A contextual detector of surgical tools in laparoscopic videos using deep learning.

Authors:  Babak Namazi; Ganesh Sankaranarayanan; Venkat Devarajan
Journal:  Surg Endosc       Date:  2021-02-08       Impact factor: 4.584

3.  Can surgical simulation be used to train detection and classification of neural networks?

Authors:  Odysseas Zisimopoulos; Evangello Flouty; Mark Stacey; Sam Muscroft; Petros Giataganas; Jean Nehme; Andre Chow; Danail Stoyanov
Journal:  Healthc Technol Lett       Date:  2017-09-14

4.  Convolutional neural network-based surgical instrument detection.

Authors:  Tongbiao Cai; Zijian Zhao
Journal:  Technol Health Care       Date:  2020       Impact factor: 1.285

5.  A computationally efficient method for hand-eye calibration.

Authors:  Zhiqiang Zhang; Lin Zhang; Guang-Zhong Yang
Journal:  Int J Comput Assist Radiol Surg       Date:  2017-07-19       Impact factor: 2.924

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.