
A new 2D-3D registration gold-standard dataset for the hip joint based on uncertainty modeling.

Fabio D'Isidoro1, Christophe Chênes2, Stephen J Ferguson1, Jérôme Schmid2.   

Abstract

PURPOSE: Estimation of the accuracy of 2D-3D registration is paramount for a correct evaluation of its outcome in both research and clinical studies. Publicly available datasets with standardized evaluation methodology are necessary for validation and comparison of 2D-3D registration techniques. Given the large use of 2D-3D registration in biomechanics, we introduced the first gold standard validation dataset for computed tomography (CT)-to-x-ray registration of the hip joint, based on fluoroscopic images with large rotation angles. As the ground truth computed with fiducial markers is affected by localization errors in the image datasets, we proposed a new methodology based on uncertainty propagation to estimate the accuracy of a gold standard dataset.
METHODS: The gold standard dataset included a 3D CT scan of a female hip phantom and 19 2D fluoroscopic images acquired at different views and voltages. The ground truth transformations were estimated based on the corresponding pairs of extracted 2D and 3D fiducial locations. These were assumed to be corrupted by Gaussian noise, without any restrictions of isotropy. We devised the multiple projective points criterion (MPPC) that jointly optimizes the transformations and the noisy 3D fiducial locations for all views. The accuracy of the transformations obtained with the MPPC was assessed in both synthetic and real experiments using different formulations of the target registration error (TRE), including a novel formulation of the TRE (uTRE) derived from the uncertainty analysis of the MPPC.
RESULTS: The proposed MPPC method was statistically more accurate than validation methods for 2D-3D registration that did not optimize the 3D fiducial positions or wrongly assumed the isotropy of the noise. The reported results were comparable to previously published gold standard datasets. However, a formulation of the TRE commonly found in these gold standard datasets was found to significantly miscalculate the true TRE computed in synthetic experiments with known ground truths. In contrast, the uncertainty-based uTRE was statistically closer to the true TRE.
CONCLUSIONS: We proposed a new gold standard dataset for the validation of CT-to-X-ray registration of the hip joint. The gold standard transformations were derived from a novel method modeling the uncertainty in the extracted 2D and 3D fiducials. Results showed that considering possible noise anisotropy and including the corrupted 3D fiducials in the optimization improved the accuracy of the gold standard. A new uncertainty-based formulation of the TRE also appeared to be a good alternative to the unknown true TRE, which previous works replaced by an alternative TRE that does not fully reflect the gold standard accuracy.
© 2021 The Authors. Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.

Keywords:  2D-3D registration; CT-to-X-ray image registration; gold standard dataset; uncertainty propagation

Year:  2021        PMID: 34287934      PMCID: PMC9290855          DOI: 10.1002/mp.15124

Source DB:  PubMed          Journal:  Med Phys        ISSN: 0094-2405            Impact factor:   4.506


INTRODUCTION

The goal of 2D-3D registration is to find the spatial transformation that best aligns 3D imaging data with one or more 2D projection images in the 3D physical space. Typically, the 3D volume consists of pre-intervention data such as computed tomography (CT) or magnetic resonance (MR) scans, while the 2D images are intra-intervention data such as radiographs or fluoroscopic images. Orthopedic applications of 2D-3D registration include spine surgery, total hip replacement, orthopedic diagnostics, and kinematic analysis. In spine surgery, the registration of single vertebrae is mostly used for pedicle screw placement and cement reinforcement. For total hip replacement, the registration is used for intra-operative positioning of the femoral implant and post-operative analysis of cup placement. In orthopedic diagnostics, the 3D curvature of the scoliotic spine and the scoliotic rib cage were analyzed. For kinematic analysis, 2D-3D registration between 3D models of the joint and fluoroscopic video sequences acquired during various activities of daily living was used to analyze the in vivo motion of the native knee and hip, as well as of the prosthetic knee and hip.

Evaluation of the accuracy of 2D-3D registration is paramount to determine the performance and limitations of proposed methods, and to clarify their potential clinical application and benefit compared to possible pre-existing methods. Typically, registration accuracy is estimated by comparison to an accurate "gold-standard" registration method applied on a sample dataset that is representative of the specific application. Due to the large number of techniques proposed in the literature, effective comparison between registration algorithms, or evaluation of the accuracy of one registration technique for different applications, is only possible with a standardized evaluation methodology and publicly available validation datasets.
To date, only a few gold standard datasets are publicly available for the validation of 2D-3D registration in orthopedics. They include sets of CT and MR volumes and x-ray images of human cadaveric spines and of a fresh porcine cadaver head and lungs, as well as a simulated dataset of CT and digitally reconstructed radiographs (DRRs) of a human pelvis and vertebrae from the Visible Human Project. Using synthetic DRR images provides an exactly known ground truth but usually results in not fully realistic x-ray images (e.g., absence of x-ray scattering or image noise). Some works proposed more realistic DRR images but were highly specific to body areas (e.g., chest), although recent work leveraging deep learning advances looks very promising (e.g., DeepDRR). To the authors' knowledge, only one validation dataset of the hip joint with real fluoroscopic images currently exists. Among the limitations of this dataset, we identified the limited rotation angles of the fluoroscopic images and, most importantly, the absence of the quantitative quality assessment targeted in this paper. Due to the large number of orthopedic research studies focused on the hip joint, the first aim of this work is to provide another gold-standard dataset to fill this gap. The second focus is the improvement of both the accuracy of a gold-standard dataset and of the method used to estimate it. For most non-synthetic datasets, the gold standard rigid transformations are retrieved with fiducial markers that are rigidly fixed to the phantom or patient. The locations of these markers are extracted from both the 3D volume and the 2D image datasets. The obtained corresponding 2D-3D pairs are used to compute the accurate spatial alignment of the 3D dataset in the calibrated coordinate system of the 2D image.
However, such gold standard transformations are not guaranteed to be completely accurate, due to x-ray system calibration errors (e.g., inaccurate computation of the source-to-detector distance) and to the fiducial localization error (FLE), that is, the error in the extraction of the 3D and 2D locations of the fiducials. We propose an approach to compute the gold-standard transformations for 2D-3D registration which accounts for isotropic and anisotropic Gaussian FLEs in both 2D and 3D. This approach could also be used in the interventional context, where preoperative data are brought into correspondence with intraoperative information via the estimation of 2D-3D transformations. Estimation of the accuracy of the ground truth is important, as it defines the "uncertainty" of the gold-standard solution against which the transformation obtained with a 2D-3D registration algorithm is evaluated. Most studies estimated the accuracy of their gold-standard dataset by computing the expected target registration error (TRE). The TRE measures the displacement of registered target points (not used as fiducials) from their true positions; these points are typically chosen within the region of interest of the 3D dataset. The expected TRE was computed based on the seminal works of Sibson and Fitzpatrick et al. under the assumption of isotropic, homogeneous, and independent Gaussian noise on the extracted locations of the 3D fiducials. In the present work, we investigate whether some of these conditions may not always be met and propose to use an alternative TRE computation grounded in uncertainty theory.

MATERIALS AND METHODS

Phantom preparation and image acquisition

We used a phantom including a female pelvis, proximal femurs, and lumbar spinal segments embedded in a resin substrate mimicking the radiological response of soft tissue. Metallic beads of 3 mm diameter (N = 21 fiducials) were rigidly attached to the outer surface of the phantom (Figure 1a). Fourteen retroreflective motion capture (MoCap) markers were additionally stuck to the surface.
FIGURE 1

X-ray phantom of a female pelvis embedded into a material mimicking the radiological response of soft tissue. (a) Example of a motion capture (MoCap) marker (white circle) and a metallic spherical fiducial (white arrow) stuck on the phantom surface, with further examples shown in (b) the CT volume and (c) an X-ray image acquired with a different phantom orientation. For illustration purposes, the depicted markers are not in correspondence between subfigures (a), (b), and (c)

A CT scan of the phantom with the fiducials and the MoCap markers was acquired with a Brilliance CT 64 scanner (Philips Medical Systems) at 140 kV (Figure 1b), which after cropping resulted in a 431 × 315 × 468 volume with a voxel size of 0.78 × 0.78 × 1.0 mm3. A video-fluoroscopy C-arm (BV Pulsera, Philips Medical Systems) was used to acquire a set of S = 19 2D images at different orientations of the phantom around the vertical axis of the lab (Figure 1c). For each view, the fluoroscope was operated at several different kV and mAs settings. The 2D fluoroscopic images have an image matrix size of 1000 × 1000 square pixels and a grayscale dynamic range of 12 bits. A schematic overview of the measurements and of the variables involved in the computation of the gold-standard dataset is provided in Figure 2. The fiducial locations M_j were defined in the coordinate system CS_CT of the CT scan. For each view s, the location of a 3D fiducial is transformed to the coordinate system CS_X relative to the X-ray image with the rigid transformation T_s, and subsequently projected onto the image plane as the 2D fiducial position m_js:

m_js = P · T_s · M_j,   (1)

where P is the 3 × 4 projection matrix and T_s is the transformation from CS_CT to CS_X.
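As an illustration of this projection model, a minimal NumPy sketch (function and variable names are ours, not the paper's) could read:

```python
import numpy as np

def project_fiducials(P, T, M):
    """Project 3D fiducial positions into an X-ray image.

    P: (3, 4) projection matrix of the calibrated fluoroscope.
    T: (4, 4) rigid CT-to-X-ray transformation of one view.
    M: (N, 3) fiducial positions in the CT coordinate system.
    Returns the (N, 2) pixel coordinates.
    """
    Mh = np.hstack([M, np.ones((len(M), 1))])  # homogeneous 3D coordinates
    m = (P @ T @ Mh.T).T                       # homogeneous pixel coordinates
    return m[:, :2] / m[:, 2:3]                # perspective division
```

With an identity transform and a pinhole-like P, a point on the optical axis projects to the principal point, which gives a quick sanity check of the convention used.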
FIGURE 2

Schematic overview of the generation of the gold-standard dataset. M_j represents the 3D coordinates of the j-th fiducial in the coordinate system CS_CT of the CT scan, while m_js represents the pixel coordinates of the j-th fiducial in the image at view s. T_s is the rigid transformation of the coordinate system of the phantom relative to the X-ray coordinate system CS_X and represents the ground truth transformation. The equation describing the projection of M_j onto m_js is m_js = P · T_s · M_j, where P is the intrinsic camera projection matrix relative to the X-ray imaging system. T_L is the rigid transformation of the lab coordinate system CS_lab relative to the X-ray coordinate system and is used to transform lab coordinates of the motion capture markers into corresponding coordinates in CS_X, in order to retrieve a coarse estimation of the ground truth from motion capture. In practice, the phantom was actually moved at each view with respect to a static imaging system. Hence T_L is in fact the same for all views

The aim of the gold-standard dataset is to provide the set of ground truth transformations T_s for each view based on the corresponding 3D-2D pairs (M_j, m_js). The ground truth of the present gold-standard dataset considers that the 3D and 2D locations of the fiducials are affected by errors. In our work, we regroup the corrupted measured 2D and 3D positions m̃_js and M̃_j in a measurement vector χ. The ideal 3D positions are similarly regrouped into the model vector M̂. We gather all the parameters of the unknown transformations in a transformation vector t_s = (t_1, …, t_6), where subscripts 1 to 3 and subscripts 4 to 6 refer to the rotational and the translational parameters, respectively. We chose the rotation vector representation, where the vector direction provides the rotation axis and its magnitude represents the rotation angle around this axis. The fluoroscopic system was considered to be calibrated because the projection matrix P was estimated from a calibration procedure described in Appendix 1 (Supplementary Material).
This work assumes that the re-projection error of 0.033 mm obtained from the calibration procedure is small enough that the propagation error originating from the calibration can be neglected in the uncertainty analysis.
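The rotation vector representation and the 6-parameter transformation vector can be sketched as follows; this is an illustrative helper (our own naming), not the authors' code:

```python
import numpy as np

def rotvec_to_matrix(r):
    """Rodrigues' formula: the direction of r is the rotation axis and
    its Euclidean norm is the rotation angle in radians."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0., -k[2], k[1]],
                  [k[2], 0., -k[0]],
                  [-k[1], k[0], 0.]])  # cross-product matrix of the unit axis
    return np.eye(3) + np.sin(theta) * K + (1. - np.cos(theta)) * (K @ K)

def params_to_transform(t):
    """Assemble a 4x4 rigid transform from a 6-parameter vector:
    t[0:3] rotation vector, t[3:6] translation."""
    T = np.eye(4)
    T[:3, :3] = rotvec_to_matrix(np.asarray(t[:3], float))
    T[:3, 3] = t[3:6]
    return T
```

This minimal, singularity-free 3-parameter rotation encoding is what makes the later Levenberg-Marquardt optimization over 6 parameters per view convenient.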

MoCap acquisition and processing

Optical MoCap was performed simultaneously with video fluoroscopy, in order to obtain a coarse estimate of the ground truth transforms and to automatically define the correspondences between 2D and 3D fiducial pairs. A VICON MX system (Oxford Metrics Group, UK) with 26 MX40 and T160 infrared cameras recorded at 100 Hz the positions, in the lab coordinate system CS_lab, of the MoCap markers attached to the phantom (Figure 1). The accuracy of 3D point computation of our MoCap setup is difficult to assess, as several factors, such as the number and coverage of cameras, generally impact the overall accuracy. The impact of the MoCap setup accuracy will be investigated in the experiments validating our gold-standard dataset. For each view s, the rigid transformation T_Ps of the phantom relative to the lab coordinate system was computed by 3D-3D registration between the positions of the optical markers measured in the lab and the positions of the markers in the CT coordinate system CS_CT. The obtained transformation was converted into the coordinate system of the imaging system by applying the conversion matrix T_L relating coordinates in the lab with coordinates in the imaging system (Figure 2). In practice, the imaging system was static in CS_lab while the phantom was moved at each view s; thus T_L needed to be computed only once. Finally, a MoCap-based estimate of the ground truth transform was obtained for each view as T_s^MC = T_L · T_Ps. Interested readers can refer to Appendix 2 of the Supplementary Material for further details.
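The paper does not specify the 3D-3D registration solver; a standard least-squares (Kabsch/Umeyama) fit, shown here as a sketch, is one common choice:

```python
import numpy as np

def rigid_register(A, B):
    """Least-squares rigid transform mapping point set A onto B
    (Kabsch/Umeyama), e.g., CT marker positions onto MoCap positions.

    A, B: (N, 3) corresponding point sets.
    Returns R (3x3), t (3,) such that B ~= A @ R.T + t.
    """
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)          # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    # sign correction to exclude reflections
    D = np.diag([1., 1., np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    return R, t
```

With noiseless corresponding markers the true rotation and translation are recovered exactly, which is the behavior exploited here to get the coarse MoCap-based transforms.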

Fiducial positions measurement and correspondence

Our regularized deformable model framework was used to automatically extract the centers of the 3D fiducials and MoCap markers, referred to as "spherical objects," from the CT volume. For each fiducial/MoCap marker, a spherical mesh was deformed until it best matched the boundaries of the spherical object based on the alignment of intensity gradients and the mesh vertex normals. The centers of gravity of the resulting fitted spheres were set as the positions of the 3D fiducials/MoCap markers. The pixel coordinates of the 2D fiducial centers in each fluoroscopic image, acquired at the kV value producing the best image contrast, were retrieved by means of an in-house developed semi-automatic method. This algorithm relied on a blob detection algorithm provided by the open source computer vision library "OpenCV" to interactively detect the 2D fiducial positions as the centers of ellipses fitted to the detected blobs. Once we computed the positions of the 3D and 2D fiducials, the 2D-3D correspondence was established in an automatic fashion by exploiting the coarse estimate T_s^MC of the transform obtained from motion capture. Transformation T_s^MC was used to project the positions of the 3D fiducials M_j to 2D positions p_js. Given a projected position p_js, the closest measured 2D position m_js was identified. If the Euclidean distance was below the threshold of 5 mm, the point was flagged as visible in the image and set in correspondence with the 3D fiducial M_j. We first estimated the 3D fiducial extraction accuracy in synthetic experiments, in which an artificial noisy CT scan-like 3D volume was created including 20 spheres. We varied the volume characteristics (voxel size and isotropy, levels of additive Gaussian noise), sphere properties (radius and intensity), and initialization positions for the automatic segmentation.
The resulting signed differences between expected and extracted centers of more than 24 000 spheres were 0.018 ± 0.04, 0.014 ± 0.074, and 0.006 ± 0.10 mm in the X-, Y-, and Z-directions, respectively, Z being the slice stacking direction. Then, we used a quality assurance (QA) phantom (Lucy 3D QA Phantom, Standard Imaging, Inc.) in an in vitro experiment. The QA phantom included twenty 2 mm diameter aluminum spheres spaced by 5 mm (manufacturing tolerance of 0.1 mm). We acquired a CT scan (120 kV, size 512 × 512 × 340, 0.31 × 0.31 × 0.5 mm3 voxel size, Philips Brilliance CT Big Bore model) of the QA phantom (Figure 3a) and extracted the centers of the segmented spheres (Figure 3b) in 400 trials, in which we randomly varied the initial centers within the 20 spheres according to a normal distribution, mimicking a user click around the sphere centers. These were rigidly registered to the reference centers of the QA phantom, with resulting average errors of 1.76e-09 ± 0.16, 1.17e-09 ± 0.16, and −2.86e-09 ± 0.18 mm in the X-, Y-, and Z-directions, respectively. By combining the signed differences of the synthetic and Lucy phantom experiments, we obtained average difference errors of 0.01 ± 0.09, 0.01 ± 0.10, and 0.004 ± 0.127 mm in the X-, Y-, and Z-directions, respectively.
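These per-axis standard deviations translate naturally into an anisotropic, diagonal 3D covariance for the Gaussian noise model used later; as a small illustration (variable names are ours):

```python
import numpy as np

# Per-axis standard deviations (mm) of the 3D fiducial extraction error,
# from the combined synthetic and QA-phantom experiments above.
std_3d = np.array([0.09, 0.10, 0.127])

# Anisotropic (diagonal) covariance for the 3D noise; an isotropic model
# would instead use np.eye(3) scaled by a single common variance.
Sigma_3d = np.diag(std_3d ** 2)
```

The slightly larger Z variance reflects the coarser slice spacing, which is precisely the kind of anisotropy the proposed criterion is designed to accommodate.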
FIGURE 3

Metallic sphere detection of quality assurance phantom. a) CT scan of the phantom showing four 2 mm diameter aluminum spheres. b) Example of 3D sphere extraction based on regularized deformable models where the larger red circle is the initialized model and the smaller green circle is the final result. c) Example of 2D extraction where the reference locations (centers of larger red circles) are compared with the extracted locations (centers of smaller green circles)

The same QA phantom was used to assess the 2D fiducial center extraction, which was performed on multiple DRRs of the QA phantom CT at various angles (Figure 3c), with a DRR spatial resolution of 0.29 × 0.29 mm2. The reference 2D centers were obtained by projection of the 3D centers extracted using our 3D segmentation approach. Differences between extracted 2D centers and reference 2D positions were 0.15 ± 0.19 mm and −0.02 ± 0.19 mm in the horizontal X- and vertical Y-directions, respectively.
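The nearest-projection correspondence step with the 5 mm visibility threshold, described above, can be sketched as follows (an illustrative helper, not the authors' implementation):

```python
import numpy as np

def match_fiducials(projected, measured, threshold=5.0):
    """Nearest-projection correspondence between 3D and 2D fiducials.

    projected: (N, 2) projections of the 3D fiducials obtained with the
               coarse MoCap-based transform.
    measured:  (K, 2) 2D fiducial centers extracted from the image.
    Returns (pairs, visible): index pairs (j, k) and a per-fiducial flag;
    a fiducial is visible if its closest measured point is within threshold.
    """
    projected = np.asarray(projected, float)
    measured = np.asarray(measured, float)
    pairs, visible = [], np.zeros(len(projected), dtype=bool)
    for j, p in enumerate(projected):
        d = np.linalg.norm(measured - p, axis=1)
        k = int(np.argmin(d))
        if d[k] < threshold:
            pairs.append((j, k))
            visible[j] = True
    return pairs, visible
```

Fiducials whose nearest detection lies beyond the threshold are simply marked invisible for that view, which is how partially occluded markers drop out of the later optimization.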

Multiple projective points criterion (MPPC)

Traditional Perspective-n-Point (PnP) algorithms are commonly used to compute the ground truth transformations T_s from the corresponding pairs (M_j, m_js) using Equation (1). However, standard PnP algorithms are not suited to account for inaccuracies in the measured 2D fiducial positions. Different extensions of the PnP algorithm were proposed to address this issue, such as the CEEPnP and ML-PnP approaches. Alternatively, optimization approaches were developed to reduce the impact of 2D inaccuracies by minimizing the 3D fiducial registration error (FRE), defined as the distance between the 3D fiducials segmented from the CT scan and the 3D fiducials reconstructed by triangulation of the extracted 2D fiducial image positions. For a set of 2D image positions in multiple views corresponding to the same fiducial, the triangulation computes the 3D reconstructed point as the 3D point closest to the back-projected lines passing through the 2D image positions. Other studies obtained the ground truth transformation by minimizing the 2D mean projection distance (mPD) between extracted 2D fiducial image positions and reprojected 3D fiducials. However, most of these approaches still assume that the positions of the 3D markers of the model are perfectly known or that errors in their detection are negligible. These assumptions may become invalid when fiducials are manually placed on gold-standard phantoms for 2D-3D registration, as in the current work and previous studies. In our work, we simultaneously optimized both the transformation parameters and the 3D marker locations, similar to the work of Nicolau et al. They defined the extended projective points criterion (EPPC) to determine the optimal transformations and the optimal 3D positions, hereafter referred to as t̂ and M̂, from a maximum likelihood estimator maximizing the conditional probability density function p(χ | t, M̂). Nicolau et al. considered that the 2D and 3D fiducials were corrupted by zero-mean Gaussian isotropic noise parameterized by variances σ2D² and σ3D².
In our work, we assume that the positions of the 2D fiducials and 3D fiducials are identically and independently corrupted by additive zero-mean Gaussian noises with covariance matrices Σ2D and Σ3D. We can thus model both isotropic and anisotropic noises. Furthermore, in our case the transformations relative to the different X-ray views are unknown, so we have to optimize multiple transforms t_s simultaneously. Based on these assumptions, the conditional probability of our measurement vector χ is written as a product of independent probabilities:

p(χ | t, M̂) = ∏_j p(M̃_j | M̂_j) · ∏_s ∏_j p(m̃_js | t_s, M̂_j)^(v_js),   (2)

where v_js = 1 or 0 if the 2D point m̃_js is visible or not in image s. Taking the negative logarithm of p, we aim at minimizing the proposed multiple projective points criterion (MPPC) C:

C(t, M̂) = ∑_j C3D_j + ∑_s ∑_j v_js · C2D_js,   (3)

with subcriteria similar to squared Mahalanobis distances:

C3D_j = (M̃_j − M̂_j)ᵀ · Σ3D⁻¹ · (M̃_j − M̂_j),   (4)

C2D_js = (m̃_js − p̂_js)ᵀ · Σ2D⁻¹ · (m̃_js − p̂_js),   (5)

where p̂_js is the projection of M̂_j at view s following Equation (1), and Σ2D⁻¹ and Σ3D⁻¹ are the inverses of the 2D and 3D covariance matrices. We point out that the proposed MPPC criterion is optimized over all views simultaneously. To minimize the criterion C, we initialize the unknown ideal fiducial positions M̂_j with the measured positions M̃_j and the transformations with the coarse transformations T_s^MC resulting from the MoCap analysis. Then the optimization of C is split into two sub-optimizations performed in an iterative, interleaved manner until convergence: M-optimization, in which the current estimates of the transformations are considered fixed and the positions M̂_j are optimized; and T-optimization, in which the last estimates of the positions are kept fixed while the transformations are optimized. In contrast to Nicolau et al.'s work, we used the Levenberg-Marquardt (LM) optimization algorithm, as the subcriteria are expressed as sums of squared residual terms (see Appendix 3 in the Supplementary Material).
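The interleaved optimization described above can be sketched with SciPy's Levenberg-Marquardt solver. This is a simplified, hypothetical illustration, not the authors' implementation: every fiducial is assumed visible in every view, and W2d/W3d are whitening matrices whose transpose-product equals the corresponding inverse covariance, so the summed squared residuals reproduce the Mahalanobis subcriteria.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(P, t, M):
    """Apply a 6-parameter transform (rotation vector + translation)
    and project (N, 3) points with the (3, 4) matrix P."""
    X = M @ Rotation.from_rotvec(t[:3]).as_matrix().T + t[3:6]
    h = np.hstack([X, np.ones((len(X), 1))]) @ P.T
    return h[:, :2] / h[:, 2:3]

def mppc(P, m_meas, M_meas, t_init, W2d, W3d, n_iter=5):
    """Interleaved M/T optimization of the multi-view criterion (sketch).

    m_meas: (S, N, 2) measured 2D fiducials for S views (all visible here).
    M_meas: (N, 3) measured 3D fiducials; t_init: list of S 6-vectors.
    """
    M_hat = M_meas.copy()
    ts = [np.asarray(t, float).copy() for t in t_init]

    def res_T(t, s):  # T-step residuals for view s (3D positions fixed)
        return (W2d @ (m_meas[s] - project(P, t, M_hat)).T).T.ravel()

    def res_M(x):     # M-step residuals (all transforms fixed)
        Mx = x.reshape(-1, 3)
        r = [(W3d @ (M_meas - Mx).T).T.ravel()]
        r += [(W2d @ (m_meas[s] - project(P, ts[s], Mx)).T).T.ravel()
              for s in range(len(ts))]
        return np.concatenate(r)

    for _ in range(n_iter):
        ts = [least_squares(res_T, ts[s], args=(s,), method='lm').x
              for s in range(len(ts))]
        M_hat = least_squares(res_M, M_hat.ravel(),
                              method='lm').x.reshape(-1, 3)
    return ts, M_hat
```

On real data, the per-view visibility flags would mask invisible fiducials out of the 2D residuals, and the transforms would be initialized from the coarse MoCap estimates.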

Accuracy of the gold standard transformations

The works of Sibson and Fitzpatrick et al. were used by most previous studies to compute the expected TRE in order to estimate the accuracy of their gold-standard datasets. Assuming that the FLE of each fiducial and of each corresponding transformed fiducial is identically and independently distributed (i.i.d.) as an isotropic zero-mean Gaussian distribution, and assuming a first-order approximation of the rotation component of the transformation, Fitzpatrick et al. proposed an estimation of the expected TRE at a target point p based on the expected FLE:

⟨TRE²(p)⟩ ≈ (⟨FLE²⟩ / N) · (1 + (1/3) ∑_k d_k² / f_k²),   (6)

where f_k is the root mean square (RMS) distance of the projections of the N fiducials to the k-th principal axis of the fiducial configuration, d_k is the RMS distance of p projected to the k-th principal axis, and ⟨·⟩ indicates the expected value. Sibson showed that under the same assumptions, the expected FLE can be retrieved from the expected FRE as:

⟨FLE²⟩ = N / (N − 2) · ⟨FRE²⟩.   (7)

This TRE, hereafter referred to as the reconstructed TRE (rTRE), has commonly been used to replace the "true" TRE (tTRE) that would be computed if the true transformations were available. However, we found that the conditions to use such an rTRE formulation are not met when using 3D fiducials reconstructed by multi-view triangulation of the 2D fiducials. In fact, their distribution was shown to be usually anisotropic, and its Gaussianity may be valid only as a local approximation. Furthermore, reconstructed fiducials will present heteroscedastic errors, characterized by inhomogeneous noise. Although alternative TRE computations were proposed to tackle these more complex noise models, most of the works on gold-standard datasets still used the original rTRE proposed by Fitzpatrick et al. We investigated an alternative formulation of the TRE for the MPPC method, which takes into account the propagated uncertainty of both 2D and 3D fiducial positions modeled as Gaussian and possibly anisotropic noise.
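The expected-TRE formula of Fitzpatrick et al. and Sibson's FLE-FRE relation can be implemented directly; a sketch, with our own function names:

```python
import numpy as np

def fle2_from_fre2(fre2, n):
    """Sibson's relation: <FLE^2> = n / (n - 2) * <FRE^2>."""
    return fre2 * n / (n - 2)

def expected_tre2(fiducials, target, fle2):
    """Fitzpatrick et al.'s expected squared TRE at a target point,
    (fle2 / n) * (1 + (1/3) * sum_k d_k^2 / f_k^2), with f_k (d_k) the
    RMS distance of the fiducials (of the target) to the k-th principal
    axis of the fiducial configuration."""
    F = np.asarray(fiducials, float)
    n = len(F)
    c = F.mean(axis=0)
    X = F - c
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows: principal axes
    tc = np.asarray(target, float) - c
    ratio = 0.0
    for axis in Vt:
        f2 = ((X ** 2).sum(axis=1) - (X @ axis) ** 2).mean()  # f_k^2
        d2 = tc @ tc - (tc @ axis) ** 2                       # d_k^2
        ratio += d2 / f2
    return (fle2 / n) * (1.0 + ratio / 3.0)
```

At the fiducial centroid all d_k vanish and the expression reduces to FLE²/n, a convenient sanity check of the implementation.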
Following Pennec and Thirion, we can state that criterion (3) will reach a well-defined local minimum at t̂ if and only if ∂C/∂t(χ, t̂) = 0 with a positive definite Hessian H = ∂²C/∂t²(χ, t̂). We can consider the measurement vector χ as a random vector of mean χ̄ and covariance Σχχ. Using the implicit function theorem and a first-order Taylor series expansion, we get:

t̂(χ) ≈ t̂(χ̄) − H⁻¹ · (∂²C/∂t∂χ) · (χ − χ̄).   (8)

By definition, Σtt = ⟨(t̂ − ⟨t̂⟩)(t̂ − ⟨t̂⟩)ᵀ⟩; hence we can extract Σtt from (8):

Σtt = H⁻¹ · (∂²C/∂t∂χ) · Σχχ · (∂²C/∂t∂χ)ᵀ · H⁻¹   (9)

(analytical details are provided in Appendix 4 of the Supplementary Material). The estimation of the uncertainty of a target position y = T_t̂(p) after application of the optimized transformation vector t̂ is obtained by uncertainty propagation:

Σyy = (∂y/∂t) · Σtt · (∂y/∂t)ᵀ,   (10)

where the expression of ∂y/∂t is given by Pennec and Thirion. From the same work, we derive an expression of the expected TRE for a target p transformed by the computed gold standard as the expectation of the squared distance between the true and estimated positions of the transformed target:

⟨TRE²(p)⟩ = Tr(Σyy).   (11)

Given a number N_t of target points p_i, we finally express the corresponding average TRE, coined hereafter as the uTRE, as the following RMS:

uTRE = √( (1/N_t) ∑_i ⟨TRE²(p_i)⟩ ).   (12)

Compared to the standard rTRE, the uTRE is expected to better account for both noise distributions of the 2D and 3D fiducials.
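The last propagation step, from the covariance of the estimated transform parameters to the expected TRE at target points, can be sketched as below. The 6×6 transform covariance is assumed to have been obtained beforehand from the Hessian-based uncertainty analysis of the criterion, and a finite-difference Jacobian stands in for the analytical expression of Pennec and Thirion:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def utre(targets, t_hat, cov_t, eps=1e-6):
    """Uncertainty-based TRE: RMS over targets of trace(Jp cov_t Jp^T),
    where Jp is the Jacobian of the transformed target with respect to the
    6 transform parameters (rotation vector + translation)."""
    def apply(t, p):
        return Rotation.from_rotvec(t[:3]).apply(p) + t[3:6]

    tre2 = []
    for p in np.atleast_2d(np.asarray(targets, float)):
        J = np.empty((3, 6))
        for i in range(6):
            e = np.zeros(6)
            e[i] = eps
            J[:, i] = (apply(t_hat + e, p) - apply(t_hat - e, p)) / (2 * eps)
        tre2.append(np.trace(J @ cov_t @ J.T))  # expected squared TRE at p
    return float(np.sqrt(np.mean(tre2)))
```

With purely translational uncertainty of unit variance per axis, the expected squared TRE is 3 at every target, so the uTRE equals √3 regardless of target position, which makes a simple consistency check.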

RESULTS

The performance of the proposed MPPC criterion was compared against the iterative PnP algorithm "solvePnP" of OpenCV, referred to as the iterative PnP (cvPnP) approach, in both synthetic and real experiments in the presence of 2D and 3D noise. cvPnP minimizes the reprojection error with the LM algorithm and, contrary to the MPPC approach, optimizes each view independently and does not explicitly model the 3D and 2D noises in the optimization. We used different formulations of the TRE as evaluation metrics for the comparison between the proposed MPPC and the cvPnP algorithms:
- The "standard" reconstructed TRE (rTRE) (6), relying on the measured FRE and on the known FLE for synthetic experiments or the estimated FLE for real experiments. Reconstructed points were expressed in the reconstruction coordinate system and computed as the points closest to the back-projected lines. In the case of MPPC, we used the optimized fiducial positions M̂_j instead of the perturbed positions for the computation of the FRE. The estimated rigid transform between fiducial points and reconstructed points was based on standard least-squares error minimization.
- The robust reconstructed TRE (hTRE), designed to tackle the heteroscedastic and anisotropic errors of the reconstructed fiducials. In this case, we also used a robust rigid transform estimation technique instead of the standard least-squares approach.
- The proposed uTRE (12), based on the uncertainty derivation, only valid for our MPPC approach.
- In synthetic experiments, the true TRE (tTRE), computed as the RMS of the Euclidean distances between the target points p_i transformed by the ground truth transforms T_s and by the tested transforms T̂_s: tTRE = √( (1/N_t) ∑_i ||T_s(p_i) − T̂_s(p_i)||² ).
Both the standard and robust rTRE rely on Euclidean distances calculated in the reconstruction coordinate system, while for the uTRE and tTRE these distances are computed in the CS_X of each view. This discrepancy of coordinate systems between TRE formulations prevents the direct comparison of the TRE values.
Assuming we know the transformations from the reconstruction coordinate system to CS_X, we can calculate additional TREs for these transformations and use the chain rule provided by West and Maurer to get comparable "composite" reconstructed TREs. The composite TRE can be derived for both the standard (rTREc) and robust (hTREc) variants. We tested the significance of the difference in paired observations using a paired two-sided t-test if the difference was normally distributed, or a paired two-sided Wilcoxon signed-rank test otherwise. Data normality was checked with a Shapiro-Wilk test. All tests used a confidence level of 99%.
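The statistical decision procedure can be reproduced with SciPy; this is a sketch of the logic, not the authors' script:

```python
import numpy as np
from scipy import stats

def compare_paired(a, b, alpha=0.01):
    """Paired comparison as described above: Shapiro-Wilk normality check
    on the differences, then a paired two-sided t-test if they look normal,
    otherwise a two-sided Wilcoxon signed-rank test.
    Returns (test_name, p_value, significant_at_alpha)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = a - b
    if stats.shapiro(d).pvalue > alpha:        # differences look normal
        name, p = "paired t-test", stats.ttest_rel(a, b).pvalue
    else:                                      # fall back to rank-based test
        name, p = "Wilcoxon signed-rank", stats.wilcoxon(a, b).pvalue
    return name, float(p), p < alpha
```

The alpha of 0.01 corresponds to the 99% confidence level used in the paper.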

Synthetic evaluation of the multiple projective points criterion

We considered the MoCap transforms T_s^MC and the extracted fiducial positions from the CT volume as the ground truths. Both the MPPC and cvPnP approaches were initialized with transformations computed using the ML-PnP algorithm of Urban et al. We produced various FLEs by perturbing the 3D positions of the fiducials and their 2D reprojections with different levels of zero-mean Gaussian noise, with standard deviations ranging from 0.15 to 1.45 mm for the 2D noise and from 0.5 to 2.0 mm for the 3D noise. The 2D covariance matrices were isotropic, while for the 3D noise we considered both isotropic and anisotropic cases. For each configuration of 2D and 3D (an)isotropic noises, we randomly drew 100 samples from the respective distributions, leading to a total of 3600 experiments involving the 19 views. Target points were regularly sampled in a 9 × 9 × 9 grid around the hip bones. Since the ground truth transforms were known, we computed the composite standard and robust rTREs. Results averaged over all trials and noise levels are reported in Table 1. Based on the average values of the TREs, we observed that the tTRE was statistically different from the corresponding robust or standard composite TREs (p values < 0.002), regardless of the chosen approach and of the 3D perturbation isotropy. The only exceptions without statistical difference were the robust composite TREs for the MPPC approach in the isotropic case with the highest level of 3D noise. The composite TREs obtained using the MPPC approach proved to be always statistically inferior to those obtained using the cvPnP approach, regardless of the noise levels and 3D noise isotropy. For the tTRE, average values suggested that the MPPC approach generally performed better than cvPnP (e.g., 2.05 mm vs. 2.38 mm for the isotropic case), although statistical significance was not observed for certain pairs of low noise levels in the isotropic case.
TABLE 1

Results for the different types of target registration errors (true TRE (tTRE), reconstructed composite TRE (for both the standard (rTREc) and robust (hTREc) approaches), and uncertainty-based TRE (uTRE)) from the synthetic experiments, averaged over 3600 trials with different 2D and 3D noise levels, the 3D noise having isotropic and anisotropic variants. An iterative PnP method (cvPnP) was tested against our method using the proposed multiple projective points criterion (MPPC)

Method | (iso/anisotropic voxel size) | rTREc [mm]  | hTREc [mm]  | tTRE [mm]   | uTRE [mm]
-------|------------------------------|-------------|-------------|-------------|------------
cvPnP  | isotropic                    | 2.52 ± 1.56 | 2.48 ± 1.58 | 2.38 ± 1.61 | –
cvPnP  | anisotropic                  | 2.80 ± 1.76 | 2.75 ± 1.78 | 2.67 ± 1.81 | –
MPPC   | isotropic                    | 2.22 ± 1.25 | 1.99 ± 1.34 | 2.05 ± 1.30 | 2.19 ± 0.51
MPPC   | anisotropic                  | 2.39 ± 1.38 | 2.13 ± 1.48 | 2.23 ± 1.44 | 2.29 ± 0.54
All TRE formulations were significantly higher when the 3D fiducials were perturbed by anisotropic noise, except for one formulation in the MPPC approach for which statistical significance was not observed. For the MPPC approach and over all noise levels, the average uTRE was considerably closer to the average tTRE than the composite TREs were. When considering the effect of varying 2D and 3D noise levels and 3D noise isotropy (Table 2), tTRE and uTRE were statistically different for some noise configurations with large (combined) noise levels. In those cases, the averaged uTRE generally overestimated the tTRE.
TABLE 2

Comparison of the difference between true TRE (tTRE) and uncertainty‐based TRE (uTRE) for our multiple projective points criterion (MPPC) method in synthetic experiments, which included 3600 trials with varying 2D and 3D Gaussian noise levels: 2D noise from 0.15 to 1.45 mm and 3D noise from 0.5 to 2.0 mm (with isotropic and anisotropic variants of the 3D covariance matrix)

tTRE − uTRE (isotropic) [mm]

3D \ 2D [mm] | 0.15  | 0.29  | 0.58  | 0.87   | 1.16   | 1.45
-------------|-------|-------|-------|--------|--------|-------
0.5          | 0.17  | 0.13  | 0.04  | −0.04  | −0.12  | −0.21*
1.0          | −0.00 | −0.08 | −0.19 | −0.27* | −0.34* | −0.40*
2.0          | −0.06 | −0.11 | −0.19 | −0.25  | −0.31  | −0.36*

tTRE − uTRE (anisotropic) [mm]

3D \ 2D [mm] | 0.15 | 0.29 | 0.58  | 0.87  | 1.16   | 1.45
-------------|------|------|-------|-------|--------|-------
0.5          | 0.26 | 0.21 | 0.08  | −0.08 | −0.20  | −0.33*
1.0          | 0.10 | 0.02 | −0.12 | −0.23 | −0.31* | −0.40*
2.0          | 0.18 | 0.11 | −0.00 | −0.10 | −0.19  | −0.27

The asterisk (*) highlights a statistically significant difference.

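The per-configuration significance results above can be reproduced with a paired nonparametric test on the per-trial TRE values. The excerpt does not restate which test the authors used, so this sketch assumes a Wilcoxon signed-rank test on hypothetical paired samples:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)

# Hypothetical per-trial TRE values for two formulations over 100 paired trials
# (values are simulated, not the study's measurements).
ttre = rng.gamma(shape=4.0, scale=0.5, size=100)
rtre = ttre + rng.normal(0.15, 0.05, size=100)  # simulated systematic offset

# Paired, nonparametric comparison of the two TRE formulations.
stat, p = wilcoxon(ttre, rtre)
significant = p < 0.002   # significance threshold reported in the experiments
```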

Validation of the MPPC‐based gold‐standard dataset

For the real experiments, we applied both the MPPC and cvPnP approaches to compute the gold‐standard transformations of our dataset. Both algorithms were initialized with the transformations from MoCap. In addition, the MPPC was initialized with the measured 3D fiducial positions. We set the 2D and 3D FLE covariance matrices based on the variances measured in dedicated experiments, both expressed in mm. The 3D covariance matrix modeled an anisotropic noise with a larger variance in the Z‐direction, which is common for medical imaging datasets with a lower resolution in the slice stacking direction in order to save acquisition time, improve signal‐to‐noise ratio, or reduce dose exposure. We set the 2D and 3D covariance matrices to have equal variances in the X‐ and Y‐directions because the computed variances in the experiments were quasi‐identical, and it was reasonable to assume that the noise would not be especially biased toward either the X‐ or Y‐direction. In order to assess the accuracy of the MoCap setup, we also considered the initializations as the result of an approach to compute ground truth transformations, denoted as the “MoCap” method. We tested the three approaches MPPC, cvPnP, and MoCap with different numbers of views: 2 (acquired in anteroposterior (AP) and quasilateral (LAT) positions), 9 (mimicking at best the angles of the work of Tomaževič et al.), and all 19 views. For the cvPnP and MoCap approaches, the number of views did not affect the computation of the transformations, but it affected the results of the subsequent evaluation metrics. For comparison with previous works, the 2D metrics for the evaluation of the accuracy of the ground truths included the mean (mPD) and RMS (rmsPD) projection distance errors, as well as the standard rTRE, since the robust variant was not used in those works. For the MPPC approach, we computed the metrics using both the measured 3D fiducial positions and the optimized ones (denoted as the “non‐noisy” case).
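The structure of the FLE covariance matrices described above (isotropic in-plane 2D noise; 3D noise with equal X/Y variances and a larger Z variance along the slice-stacking direction) can be written out as below. The numeric standard deviations are placeholders for illustration, not the variances measured in the paper's experiments:

```python
import numpy as np

# Placeholder standard deviations in mm (NOT the measured values).
s2d = 0.15                  # isotropic 2D detector-plane noise
s3d_xy, s3d_z = 0.20, 0.40  # equal X/Y variance, larger Z variance

# 2D FLE covariance: isotropic in the detector plane.
cov2d = np.diag([s2d**2, s2d**2])

# 3D FLE covariance: anisotropic, with the larger variance along Z.
cov3d = np.diag([s3d_xy**2, s3d_xy**2, s3d_z**2])

# Anisotropy shows up as a larger Z entry on the diagonal.
anisotropy = cov3d[2, 2] / cov3d[0, 0]
```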
The uncertainty‐based TRE (uTRE, Eq. 12) was only computed for the non‐noisy MPPC. It is worth noting that the formulations of the true TRE (tTRE) and of the composite TREs used in the previous experiments could not be used here, since the ground truth transformations were unknown. For the computation of the TREs, we defined 12 target points located at key anatomical landmarks such as the trochanters, the hip joint centers, and the anterior superior iliac spines. Results are summarized in Table 3 and Figure 4.
TABLE 3

Evaluation metrics of the accuracy of the ground truth transformations obtained with an iterative PnP method (cvPnP), with optical motion capture (MoCap), and with the proposed multiple projective points criterion (MPPC). Metrics for cvPnP and MoCap were computed based on the measured 3D fiducial positions only, while metrics for MPPC were computed with both the measured (MPPC) and the optimized 3D fiducial positions (MPPC non‐noisy)

Method           | (# views) | mPD [mm]    | rmsPD [mm] | FRE [mm] | FLE [mm] | rTRE [mm] | uTRE [mm]
-----------------|-----------|-------------|------------|----------|----------|-----------|----------
MoCap            | 2 views   | 0.76 ± 0.32 | 0.82       | 1.05     | 1.19     | 0.78      | –
MoCap            | 9 views   | 0.97 ± 0.50 | 1.08       | 0.90     | 0.94     | 0.35      | –
MoCap            | 19 views  | 0.93 ± 0.46 | 1.04       | 0.87     | 0.91     | 0.34      | –
cvPnP            | 2 views   | 0.22 ± 0.11 | 0.25       | 0.27     | 0.31     | 0.20      | –
cvPnP            | 9 views   | 0.27 ± 0.12 | 0.30       | 0.28     | 0.30     | 0.11      | –
cvPnP            | 19 views  | 0.25 ± 0.12 | 0.28       | 0.27     | 0.28     | 0.11      | –
MPPC             | 2 views   | 0.22 ± 0.11 | 0.25       | 0.26     | 0.30     | 0.20      | –
MPPC             | 9 views   | 0.29 ± 0.14 | 0.32       | 0.31     | 0.33     | 0.12      | –
MPPC             | 19 views  | 0.29 ± 0.14 | 0.32       | 0.32     | 0.34     | 0.12      | –
MPPC (non‐noisy) | 2 views   | 0.15 ± 0.07 | 0.17       | 0.20     | 0.22     | 0.15      | 0.65
MPPC (non‐noisy) | 9 views   | 0.14 ± 0.06 | 0.16       | 0.16     | 0.17     | 0.06      | 0.61
MPPC (non‐noisy) | 19 views  | 0.12 ± 0.06 | 0.13       | 0.13     | 0.14     | 0.05      | 0.59

Abbreviations: FLE, fiducial localization error; FRE, fiducial registration error; mPD, mean reprojection distance; rmsPD, root mean square projection distance; rTRE, standard reconstructed target registration error; uTRE, target registration error based on uncertainty theory.

FIGURE 4

Box plot for evaluation metrics of the accuracy of the ground truth transformations retrieved with an iterative PnP method (cvPnP), with optical motion capture (MoCap), and with the multiple projective points criterion (MPPC) using measured 3D fiducial positions (MPPC) and optimized ones (MPPC non‐noisy)

When using the non‐optimized (measured) 3D fiducial positions, the values of mPD, rmsPD, and rTRE obtained with the MPPC method were similar to those from the cvPnP method for all x‐ray views. However, when the accuracy evaluation metrics were computed using the MPPC method with optimized 3D fiducial positions, the rTRE was statistically smaller than the rTREs of the best cvPnP and MoCap results using 19 views (rTRE = 0.11 and 0.34 mm, respectively), regardless of the number of views used for the MPPC, for which the rTRE ranged from 0.05 to 0.15 mm. Increasing the number of x‐ray views improved the rTRE, especially for the MPPC using optimized 3D fiducials, for which the rTRE decreased by 60% from 2 to 19 views. The MoCap method performed poorly compared to the cvPnP and MPPC variants, with statistically higher evaluation metrics (both averages and standard deviations), regardless of the number of views.
As in the previous synthetic experiments, the computed uTRE was considerably higher than the rTRE, regardless of the number of views.

DISCUSSION

A new gold‐standard 2D‐3D registration dataset for the hip joint

We proposed a public dataset for the validation of CT‐to‐x‐ray 2D‐3D registration of the hip joint that consists of 19 real fluoroscopic images of a female hip phantom, acquired at different x‐ray voltages and different phantom orientations with large rotations. This dataset is useful for the standardized evaluation of registration accuracy in orthopedic applications. Markelj et al. generated a validation dataset including the human pelvis based on DRRs, which, however, were not fully realistic due to the absence of noise introduced by the imaging device and the discrete nature of the projected CT volumetric image. In the present work, the quality and the field of view of the fluoroscopic images were matched to those of typical in vivo acquisitions, such as fluoroscopy‐based analyses of the hip joint during motion, for which the required voltage varies depending on the body mass index of the patient, the target hip is not always centered in the image, and overexposed areas frequently exhibit saturated pixel intensities. Moreover, the different poses of the phantom reproduce the varying irradiation angles used in clinical practice depending on the instrumental setup (i.e., single‐plane vs. dual‐plane fluoroscopy), on the measured activity and subject, and on the limits on delivered radiation exposure. The dependence of the performance of a registration algorithm on the x‐ray voltage can also be investigated with the proposed dataset. Still, virtual radiographs as proposed by Markelj et al. are of interest since their ground truth transformations are exact, so we decided to also include synthetic radiographs in our dataset. We modified the DeepDRR approach (e.g., modeling of the detector response to x‐ray fluence, use of post‐filtering such as adaptive histogram equalization) and used a higher‐resolution CT (dimensions 431 × 315 × 1418 and voxel size 0.78 × 0.78 × 0.33 mm³) in order to produce better virtual radiographs.
As shown in Figure 5, the DeepDRR approach generally produced quite convincing radiographs, but some artifacts were visible (e.g., vertical stripes, grainy areas), and the realism of the scattering effect (e.g., the sacrum is too visible in the virtual radiographs) or of the overexposure could not really be reproduced.
FIGURE 5

Comparison of fluoroscopic images (left column) versus synthetic DRR images (right column) generated with the DeepDRR approach

Another novel aspect of the present work was the use of MoCap for the automatic definition of the 3D‐2D fiducial marker correspondences required by point‐based registration to obtain the ground truth transformation. However, this technique requires motion capture equipment and may not be suited for intraoperative validations. This was the focus of the work of Madan et al., who proposed a method for fully automatic marker extraction and identification for point‐based registration during endovascular image‐guided interventions. While the accuracy of the transformations computed with MoCap was sufficient to establish point correspondences, it was considerably lower than the accuracy of the other tested methods. Hence, MoCap may not be suited to build an accurate gold‐standard validation dataset. In fact, the accuracy of 3D point computation with MoCap systems depends on several factors such as coverage, the number and type of cameras, and the static or dynamic setup. For instance, previous works reported a 95th percentile error (mean + 2 * std) ranging from 0.073 to 6.7 mm.

A validated dataset based on the multiple projective points criterion (MPPC)

The proposed MPPC method for the computation of the gold‐standard transformations (a) modeled the noise of both corresponding 2D and 3D fiducials as independently and identically distributed zero‐mean Gaussian noise, with a tunable degree of anisotropy, and (b) optimized the noisy 3D fiducial locations together with the transformations, including all x‐ray views in a single optimization. In contrast, previous studies did not model 3D fiducial errors and mainly optimized the unknown transformations, while assuming zero‐mean Gaussian noise for the 2D fiducial positions. Optimizing the 3D fiducials together with the transformations had significant effects on the rTRE when it was evaluated with these optimized 3D fiducial locations: the rTRE was significantly smaller than the one evaluated from both the MPPC and the iterative cvPnP method with measured 3D fiducial locations, regardless of the number of x‐ray views. We observed that in both the 2D and 3D experiments the computed error distributions did not follow a Gaussian distribution according to univariate (Shapiro–Wilk) and multivariate (Mardia's) normality tests. However, the proposed MPPC performed better than the other methods despite the assumption of Gaussianity. The assumption of a 2D zero‐mean Gaussian distribution is commonly made by approaches minimizing the reprojection error in a least mean squares sense, although it is often not formally verified. As observed in previous works, the accuracy metrics of the MPPC improved when increasing the number of x‐ray views. For instance, an increase from 2 to 19 x‐ray views slightly improved the mPD from 0.15 to 0.12 mm, while the rTRE decreased from 0.15 to 0.05 mm. Results obtained with our dataset were comparable, and sometimes superior, to previously published works.
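The joint optimization of poses and noisy 3D fiducials described above can be sketched as a small bundle-adjustment-style least-squares problem. This is not the paper's MPPC implementation: the simple pinhole model, the scalar (isotropic) whitening weights (the actual MPPC admits full anisotropic covariances), and all numeric values are simplifying assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

F = 1000.0  # assumed focal length in pixels

def project(points, rvec, t):
    """Pinhole projection of 3D points under a rigid pose (rotation vector, translation)."""
    cam = Rotation.from_rotvec(rvec).apply(points) + t
    return F * cam[:, :2] / cam[:, 2:3]

def residuals(x, meas3d, meas2d, w2d, w3d):
    """Whitened residuals: 2D reprojection errors for every view, plus the
    deviation of the optimized 3D fiducials from their measured positions."""
    n_views = len(meas2d)
    poses = x[:6 * n_views].reshape(n_views, 6)
    pts = x[6 * n_views:].reshape(-1, 3)
    res = [w3d * (pts - meas3d).ravel()]
    for v in range(n_views):
        res.append(w2d * (project(pts, poses[v, :3], poses[v, 3:]) - meas2d[v]).ravel())
    return np.concatenate(res)

# Tiny synthetic setup: 6 fiducials seen in 2 views (all values illustrative).
rng = np.random.default_rng(2)
true_pts = rng.uniform(-30, 30, size=(6, 3)) + np.array([0.0, 0.0, 500.0])
true_poses = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.3, 0.0, 10.0, 0.0, 0.0]])
meas2d = np.stack([project(true_pts, p[:3], p[3:]) for p in true_poses])
meas3d = true_pts + rng.normal(0.0, 0.3, true_pts.shape)  # noisy 3D fiducials

# Jointly refine the poses AND the 3D fiducial positions.
x0 = np.concatenate([true_poses.ravel(), meas3d.ravel()])
sol = least_squares(residuals, x0, args=(meas3d, meas2d, 1.0, 1.0 / 0.3))
```

The key design point mirrored from the MPPC is that the 3D fiducials are decision variables with a prior anchored to their measured positions, rather than fixed inputs as in a plain PnP solve.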
However, comparison with other fiducial‐based validation datasets should be performed with caution due to the different types of fiducials, anatomies, and target points, as well as the varying quality of the x‐ray and volumetric images. Pawiro et al. generated gold‐standard CT‐to‐x‐ray transformations from two views of a fresh porcine cadaver head by minimizing the mPD in 2D. Their best values for FRE, rTRE, and mPD were 0.22, 0.17, and 0.51 mm, respectively, which are similar to the values of the present dataset for two views but higher than those for 19 views. Tomaževič et al. generated a CT‐to‐x‐ray validation dataset for the lumbar spine by minimizing the FRE for nine views, and reported rTREs smaller than 0.26 mm, which is higher than the largest rTRE for our dataset using two views. Mitrovic et al. and Madan et al. produced the first gold‐standard datasets based on pairs of clinical images, including 3D contrast‐enhanced cone beam CT and 2D angiograms of 20 patients. Using only two quasi‐orthogonal views, they both achieved a better accuracy than the one herein reported for 19 views (FRE between 0.038 and 0.060 mm and rTRE between 0.033 and 0.056 mm for Mitrovic et al.; FRE = 0.017 mm and rTRE smaller than 0.027 mm for Madan et al.). Their improved performance may originate from the better quality of the medical images and from a possibly more accurate fiducial position extraction technique. Lastly, while Grupp et al. proposed the first 2D‐3D gold‐standard dataset for the hip, no detailed information was given on the quantitative performance of the registration, especially with respect to the TRE. Similar to previously published gold‐standard datasets, a limitation of our dataset is that it cannot encompass all anatomical and pathological variations across individuals. As a result, the performance of any algorithm validated with it cannot be deemed representative of its general performance in clinical practice.
However, as we previously mentioned, the purpose of 2D‐3D datasets is to provide an objective way to benchmark 2D‐3D algorithms.

The need to account for data uncertainty

Our extraction of the 2D and 3D fiducial positions generated fairly isotropic noise, except along the slice stacking direction of the CT scan, which showed larger deviation values. The synthetic experiments showed an expected worsening of both the reconstructed and true TREs when anisotropy affected the 3D positions of the fiducials. Hence, our analysis showed that anisotropy should be explicitly modeled, usually along the direction of lower spatial resolution of the medical dataset, as commonly observed in clinical practice (e.g., CT and MRI). The proposed MPPC provides the mathematical framework to model uncertainty in both the 2D and 3D fiducial locations, and we highlighted the superiority of the MPPC in considering this noise to derive optimized transformations and 3D fiducial positions. It can be argued that the smaller values of the rTRE for the MPPC stem only from the optimization of the 3D fiducial positions and the use of these optimized values in the rTRE metric. In fact, when using the measured 3D fiducial positions instead of the optimized ones, the MPPC yielded accuracy metrics with our dataset that could not be statistically distinguished from those of the cvPnP approach. However, in our opinion the deeper issue is that the rTRE metric may not be representative of the true accuracy of a gold‐standard dataset. Indeed, the synthetic experiments showed that the rTRE was significantly different from the true TRE. While the robust variant of the rTRE produced values closer to the true TRE than the standard rTRE, by accounting for heteroscedastic errors and using a robust registration approach, statistical differences were still observed, with the only exception of the MPPC approach in the isotropic case with the highest level of 3D noise.
We suspect that this surprising result stems from the non‐Gaussian distribution of the 3D fiducials reconstructed by triangulation, which may hinder the performance of robust approaches designed under the assumption of Gaussianity. In contrast, the proposed TRE based on uncertainty analysis (uTRE) provided a better approximation of the true TRE in both the isotropic and anisotropic synthetic cases, and it only overestimated the tTRE for large values of the 3D and 2D noise, which should not be observed in practice. We think that the observed differences between the uTRE and the true TRE in the synthetic experiments may be the result of approximations such as the first‐order Taylor series truncation. Since in the real experiments the uTRE was considerably different from the reconstructed rTRE, we suggest that the rTRE may not reflect the accuracy of a gold standard dataset, and that reported rTREs may have to be considered with caution, especially when the error in fiducial extraction was not reported or was not negligible. While the uTRE provides more realistic results than the rTRE thanks to the inclusion of (anisotropic) Gaussian noise in the 3D fiducials, as shown by the synthetic experiments, it may still not be fully realistic in our gold standard dataset due to the non‐Gaussianity observed for both the 2D and 3D fiducials. A limitation of the proposed evaluation methodology is that the MPPC method did not take into account the propagation of possible errors in the estimated intrinsic camera parameters. We could take inspiration from previous works jointly optimizing the intrinsic and extrinsic parameters, but the derivation of a new criterion based on an uncertainty analysis would be much more complex. Furthermore, the noise on the 2D and 3D points was assumed to be Gaussian but was not measured as such, and it was also assumed to be unbiased.
Indeed, our 3D extraction technique applied to the synthetic datasets yielded an average bias of 0.013 mm, which we considered negligible with respect to the average voxel size of 0.7 mm. The 2D technique, however, had a higher bias of 0.15 mm compared to a 0.29 mm pixel size, which should be reduced with a better‐designed 2D extraction technique. More complex noise distributions (e.g., Gaussian noise with bias, as studied by Moghari and Abolmaesumi) could hence be investigated in future research.
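The first-order propagation mentioned above can be summarized schematically. This is a generic sketch of covariance propagation through a least-squares estimate, not the paper's exact Eq. (12); the symbols (parameters θ, observations z, residual Jacobian J, target-point Jacobian J_p) are introduced here for illustration:

```latex
% Covariance of the optimized parameters \theta (poses and 3D fiducials),
% from observations z with covariance \Sigma_z and residual Jacobian J:
\Sigma_\theta \approx \left( J^\top \Sigma_z^{-1} J \right)^{-1}

% Mapping a target point p through the estimated transform T_\theta,
% with Jacobian J_p = \partial (T_\theta p) / \partial \theta:
\Sigma_p \approx J_p \, \Sigma_\theta \, J_p^\top

% An uncertainty-based TRE then summarizes the dispersion of the mapped point:
\mathrm{uTRE}(p) \approx \sqrt{\operatorname{tr}\!\left( \Sigma_p \right)}
```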

CONCLUSIONS

We proposed the first publicly available dataset for the standardized validation of 2D‐3D registration of the hip joint based on real fluoroscopic images presenting large rotation angles. Our dataset complements the recently released public dataset of the hip joint in which the fluoroscopic images presented only slight rotations–hindering the study of multiview 2D‐3D reconstruction. In addition to the new anatomical target, the present paper introduces novel aspects in both the computation of the gold‐standard transformations and the evaluation of their accuracy based on uncertainty analysis. We presented approaches to extract the positions of the 2D and 3D fiducials from the x‐ray and volumetric images. The uncertainty in the measured 2D and 3D fiducials was modeled as independently and identically distributed zero‐mean isotropic and anisotropic Gaussian noise. This uncertainty was used to derive a new iterative PnP criterion (MPPC) that computes the ground truth transformations while also optimizing the noisy 3D fiducial positions. The proposed MPPC exhibited good performance in both synthetic and real experiments. Furthermore, a new target registration error (uTRE) was formulated, which includes the uncertainty and anisotropy in the extraction of the 2D and 3D fiducials. Failing to include such uncertainties may yield an incorrect estimation of the accuracy of a gold standard dataset. We demonstrated the utility of the MPPC algorithm for the estimation and assessment of 2D‐3D transformations for gold‐standard datasets. The proposed algorithm could also be used intraoperatively to put pre‐ and intra‐operative data into correspondence–while obtaining an estimation of the resulting uncertainty.

CONFLICT OF INTEREST

The authors have no conflict of interest to report.