Automatic Plane Adjustment of Orthopedic Intra-operative Flat Panel Detector CT-Volumes

Celia Martín Vicario^a, Florian Kordon^{a,b,c}, Felix Denzinger^{a,c}, Jan Siad El Barbari^d, Maxim Privalov^d, Jochen Franke^d, Sarina Thomas^e, Lisa Kausch^e, Andreas Maier^a, and Holger Kunze^{c,a}

^a Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

^b Erlangen Graduate School in Advanced Optical Technologies, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany

^c Siemens Healthcare GmbH, Forchheim, Germany

^d Department for Trauma and Orthopaedic Surgery, BG Trauma Center Ludwigshafen, Ludwigshafen

^e Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany

ABSTRACT

Purpose: 3D volumes are often acquired to assess the result in orthopedic trauma surgery. With a mobile C-arm system, these acquisitions can be performed intra-operatively, which reduces the number of required revision surgeries. However, due to the operating room setup, the acquisitions typically cannot be performed such that the acquired volumes are aligned to the anatomical regions. Thus, the multiplanar reconstructed (MPR) planes need to be adjusted manually during the review of the volume. In this paper, we present a detailed study of multi-task learning (MTL) regression networks to estimate the parameters of the MPR planes.

Approach: First, various mathematical descriptions for rotation, including Euler angle, quaternion, and matrix representation, are reviewed. Then, three different MTL network architectures based on the PoseNet are compared with a single task learning network.

Results: Using a matrix description rather than the Euler angle description, the accuracy of the regressed normals improves from 7.7° to 7.3° in the mean value for single anatomies. The multi-head approach improves the regression of the plane position from 7.4 mm to 6.1 mm, while the orientation does not benefit from this approach.

Conclusions: The results show that a multi-head approach can lead to slightly better results than the individual task networks. The most important benefit of the MTL approach is that it is a single network for standard plane regression for all body regions with a reduced number of stored parameters.

Keywords: Multiplanar Reconstruction, Orthopedics, Flat Panel CT, Plane Regression

1. INTRODUCTION

The default imaging modality to assess fracture reduction, implant position, and overall outcome during an orthopedic trauma surgery is X-ray imaging. However, in complex anatomical regions like calcaneus, ankle, wrist, or knee, the situation cannot be clearly judged from the X-ray image. This is due to ambiguities caused by overlapping or convex bones, so that the positions of implants with respect to the corresponding bones are difficult to judge. Therefore, the acquisition of 3D scans is recommended before releasing the patient from the hospital. If 3D imaging is performed post-operatively, e.g., using a diagnostic CT system, not every minor finding will lead to revision surgery, to spare the patient the risks of additional surgery. However, recent studies have shown that intra-operative 3D imaging has led to corrections in up to 40 % of surgeries, depending on the body region.1–10 Thus, intra-operative 3D imaging reduces the number of revision surgeries and improves the outcome of surgeries, as minor findings are usually corrected as well.

Further author information: (Send correspondence to C. Martín Vicario) C. Martín Vicario: E-mail: [email protected]

arXiv:2109.10731v1 [eess.IV] 15 Sep 2021

For intra-operative acquisition of 3D volumes, mobile C-arm systems are usually used, which are capable of cone-beam tomography (CBCT). These systems typically have a relatively limited field of view with a volume edge length of about 160 mm to 250 mm. Consequently, the captured anatomy section and thus the anatomical landmarks' position and visibility may vary substantially.

When reading a 3D volume, the volume should be aligned to the anatomical structures in a standardized way, as is done in the radiology department. The key slices that contain anatomical structures which are decisive for the assessment of intervention results are called standard planes. Typically there are three of them: the axial, coronal, and sagittal plane. From an intra-operative 3D volume, they are typically obtained by the multiplanar reconstruction (MPR) technique. Generally, the three planes are orthogonal to each other, but in some regions, instead of these three orthogonal planes, an oblique plane provides the required information. One example of an oblique plane is the semi-coronal plane in the calcaneus region, a modification of the coronal plane that is not orthogonal to the axial and sagittal planes and which allows the evaluation of the reconstruction of the posterior talar surface.11

In Kausch et al.12 it has been shown that the accuracy of surgeons adjusting the standard MPRs highly depends on the region. In the lumbar spine region, where the planes can be adjusted using well-defined landmarks, the inter-rater difference was about half of that in the proximal femur region, where these kinds of landmarks are missing. The mean inter-rater variance was measured at up to 6.3° for the normals and up to 9.3 mm for the plane position.

As mobile C-arm systems lack information about the spatial relationship between the system and the anatomical region, the adjustment of the plane position and orientation needs to be performed at the workstation in the operating room. This alignment of the planes is a manual task which takes 46 to 210 seconds, depending on the experience level of the surgeon, and thus is a time-consuming step in a surgery.13,14

Slice alignment in acquired volumes is a rather old topic. While the initial focus was on automatic rotation of the brain CT,15–17 the invention of 3D-capable mobile C-arm systems, which were used mainly in orthopedic and trauma surgery environments, also brought increased research attention to other body parts such as the extremities. Speeded Up Robust Features (SURF) were used in Brehler et al.14 to register the acquired volume with an atlas that has annotated MPR planes. This method requires careful choice of the atlas and feature extraction method, and even then it has a limited capture range of rotation. Therefore, in Thomas18 shape models with attached labels for the MPR planes were used. For generating the shape models, multiple volumes need to be manually segmented, which is time-consuming work.

To account for small volume sizes that lead to cropped bones, and to be invariant to different metal implant positions, much effort and domain knowledge was applied during the registration to obtain a robust algorithm for one region. This leads to a long execution time of 23 s for the shape model registration and the subsequent plane regression.

Artificial intelligence systems allow this task to be performed considerably faster. An active research field for the standard plane regression task is ultrasound imaging, for which in Lu et al.19 probabilistic boosting trees are used to estimate 9 transform parameters of the target MPRs using a multi-stage approach. Li et al.20 propose an iterative approach where a CNN repeatedly estimates the transform between a 2D plane and the standard plane. Using this approach, they can circumvent a fully 3D approach, as only a small number of plane samples and updates are necessary until the regression converges.

In a more general sense, spatial transformer networks (STN)21 predict the parameters of an affine transform matrix that is used to manipulate feature maps in a convolutional architecture spatially. No direct supervision for the transform is used, allowing the network to optimize towards a spatial configuration that maximizes the performance of the actual supervised target task. The Ω-Net by Vigneault et al.22 modifies this approach by estimating the transform parameters for direct manipulation of the input image data. Conditional on the feature maps of a prior segmentation CNN, direct ground truth for the transformation parameters is used to bring the input images to a canonical form that better suits the downstream segmentation task.

Martin et al.23 use a Pose-Net for the regression of the plane parameters. These plane parameters can be interpreted as transformation parameters. Comparing the structure of the Pose-Net with that of the STN, it can be clearly seen that the convolutional layers resemble the localization network, and the fully connected layers resemble the final regression layer. Thus, Martin et al.23 avoid the additional overhead of the segmentation introduced by the Ω-Net while retaining the approach of supervising the transform parameters, which are of interest for the current task.

This article contributes in multiple ways:

– We extend our initial ablation study presented in Martin et al.23 to give better insight into the performance of the algorithm.

– We analyze different multi-task learning (MTL) approaches to improve the performance of the baseline algorithm. Typically, the number of available volumes per body region is small. Caruana et al.24 showed that MTL can help to find the right shared representation for related tasks when only little data are available for the single tasks. MTL also helps to handle overfitting issues: Baxter25 showed that parameter sharing reduces this risk substantially. Therefore, simultaneous learning for several tasks can help to find more appropriate representations and thus reduce the risk of overfitting. Furthermore, such combined training of MPR regression for different body regions can help to improve regression performance. We want to make use of this property of MTL in this work.

The approach of MTL also has a practical benefit: single task networks are stored separately for each anatomical region and need to be loaded on demand. Measurements have shown that it takes up to 1 s to load them from a hard drive to a graphics card. A combined network for which the parameters can be loaded once and then stay in memory would be beneficial. Therefore, we compare the results of region-specific networks to combined networks with and without knowledge about the body region and a multi-head approach.

– We increase the number of evaluated body regions by adding proximal tibia (knee) and distal radius (wrist) to calcaneus and ankle, and we also show the results of an additional plane orientation representation and compare it to the already published results of Martin et al.23

In Section 2 we present the employed mathematical description of planes, including a newly introduced second version of the 6D method.26 We describe the normalization of the coordinate system and introduce the different neural network architectures we want to compare. Furthermore, the cost function for optimization is introduced. The implementation and the data we used for training and testing, as well as the study design, are described in Section 3. After that, we present the results of our experiments and discuss the results in Section 4.

2. METHODS

2.1 Plane Description

An MPR plane can be described by its center position $A$ and the vectors $e_u$ and $e_v$ pointing in the directions of the rows and columns. Its normal $e_w$ is the cross product of these two directions.

For each plane, a rigid homogeneous transformation $T$ from the volume coordinate system to the plane coordinate system exists, which can be decomposed into a 3 × 3 rotation matrix $R$ and a 3-element translation $t$.

$$T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \tag{1}$$

The 9 parameters of the 3 × 3 rotation matrix are highly coupled: the column vectors have unit length, the dot product of two different column vectors is zero, and one column vector can be calculated as the cross product of the other two vectors. These properties are utilized by the 6D method.26 With this method, the values of two vectors are estimated by the neural network. Typically, the first two columns are utilized. However, it might also be favorable to regress the first and the third column instead, since the third column encodes the normal of the plane, which itself is part of the score function (Eq. 6) introduced below. We denote the 6D method which regresses the parameters for the x and y direction with 6Dxy and the one which regresses the x and z direction with 6Dxz. After regression of the values, each column vector is normalized and the missing column vector is calculated as the cross product. As the matrix is a pure rotation matrix, its entries are in the range of [−1, 1].
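The reconstruction step described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names and the `variant` argument are hypothetical, and the sketch follows the text literally (normalize both regressed vectors, then take the cross product) without an additional re-orthogonalization of the two regressed vectors.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in v)

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotation_from_6d(values, variant="xy"):
    """Return the columns (x, y, z) of the rotation matrix.

    values  : six regressed numbers, i.e. two stacked 3-vectors
    variant : "xy" (6Dxy, regress x and y) or "xz" (6Dxz, regress x and z)
    """
    first = normalize(values[:3])
    second = normalize(values[3:6])
    if variant == "xy":
        col_x, col_y = first, second
        col_z = normalize(cross(col_x, col_y))   # z = x × y
    elif variant == "xz":
        col_x, col_z = first, second
        col_y = normalize(cross(col_z, col_x))   # y = z × x
    else:
        raise ValueError("variant must be 'xy' or 'xz'")
    return col_x, col_y, col_z
```

If the two regressed vectors are not exactly perpendicular, the result is only approximately a rotation matrix, which is one reason a later correction step can be useful.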

A more common way to regress rotation parameters is to decompose the matrix into Euler angles or use a unit quaternion representation. While Euler angles suffer from discontinuous values, the quaternion representation does not have this problem. To overcome this limitation of Euler angles, we do not regress the angular values directly but their sine and cosine values. The actual angle value is then calculated from the regressed values using the atan2 method. Another advantage of this method is that the parameter range of the values is compressed to the range [−1, 1]. The same range applies to the values of the quaternions.
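The sine/cosine encoding can be illustrated with a short sketch. The helper names are hypothetical; only `math.sin`, `math.cos`, and `math.atan2` from the Python standard library are used.

```python
import math

def encode_angle(theta_rad):
    """Regression target for one angle: (sin, cos), both in [-1, 1]."""
    return math.sin(theta_rad), math.cos(theta_rad)

def decode_angle(sin_value, cos_value):
    """Recover the angle in (-pi, pi]; the pair need not be exactly normalized."""
    return math.atan2(sin_value, cos_value)

def encode_euler(angles_rad):
    """Flatten three Euler angles into six regression targets."""
    targets = []
    for angle in angles_rad:
        targets.extend(encode_angle(angle))
    return targets

def decode_euler(values):
    """Inverse of encode_euler for a flat list of (sin, cos) pairs."""
    return [decode_angle(values[i], values[i + 1])
            for i in range(0, len(values), 2)]
```

Two angles that are physically close but numerically far apart, such as 179° and −179°, map to nearly identical (sin, cos) pairs, so the regression target no longer jumps at the wrap-around.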

The translation is normalized with respect to the volume's dimensions and thus also lies in the range of [−1, 1], with the origin placed at the center of the volume.
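One possible reading of this normalization is sketched below. The exact convention is not spelled out in the text, so the sketch assumes that positions are given in millimetres measured from a corner of the volume and that each axis is scaled by its own edge length; both function names are hypothetical.

```python
def normalize_position(position_mm, volume_size_mm):
    """Map a position inside the volume to [-1, 1] per axis (0 = volume center)."""
    return tuple(2.0 * p / s - 1.0 for p, s in zip(position_mm, volume_size_mm))

def denormalize_position(normalized, volume_size_mm):
    """Inverse mapping from [-1, 1] back to millimetres from the volume corner."""
    return tuple((n + 1.0) * s / 2.0 for n, s in zip(normalized, volume_size_mm))
```

For a (160 mm)³ volume, the center (80, 80, 80) maps to (0, 0, 0) and the corners map to ±1.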

2.2 Separate and Combined Networks

In Martin et al.23 separate networks were used for different anatomical regions. For each region, a single network for the regression of all three plane parameters achieved the best performance. However, it was not analyzed how one single network for all body regions performs.

In preliminary experiments we compared the performance of networks similar to VGG-16,27 ResNet-34,28 and PoseNet.29 We observed that the PoseNet generalizes better and is more robust than the other two architectures. Therefore, we chose the PoseNet as the baseline network for our study (Figure 1a).

The PoseNet network consists of 5 convolutional layers and 3 fully connected layers. The last layer has as many output nodes as regressed values. The topology of this baseline network is listed in Table 1. When we use this network for regression of the plane parameters, it is agnostic about the body region for which the planes' parameters need to be calculated.

In Martin et al.,23 this information was provided by selecting the correct individual network. We want to compare the performance of this base network with two extended versions which also use the additional class information. In the first version, we encode the information as an (N × 1) one-hot encoded tensor which bypasses the convolutional layers, is concatenated to the output of the last convolutional layer, and is fed into the first fully connected layer (Figure 1b). The idea behind this structure is that the convolutional layers are strictly limited to feature extraction and, by processing information from more volumes of multiple anatomies, can calculate more meaningful and generalizable features. Besides, the fully connected layers may benefit from this additional information.
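The bypass of the class information can be sketched without any deep learning framework. The region order in `REGIONS` and the helper names are assumptions made for illustration; the feature length of 10240 used in the usage note is the FC1 input size listed in Table 1.

```python
# Hypothetical, fixed ordering of the N = 4 body regions.
REGIONS = ("calcaneus", "ankle", "knee", "wrist")

def one_hot(region, regions=REGIONS):
    """(N x 1) one-hot code of the body region, returned as a flat list."""
    if region not in regions:
        raise ValueError("unknown body region: " + str(region))
    return [1.0 if name == region else 0.0 for name in regions]

def fc_input(conv_features, region):
    """Flattened convolutional features followed by the one-hot region code.

    The concatenated vector is what the first fully connected layer would see.
    """
    return list(conv_features) + one_hot(region)
```

With 10240 flattened features, `fc_input` yields a vector of length 10244 under this assumption, so the first fully connected layer would need four additional inputs.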

Lastly, we investigate a multi-head approach with a shared convolutional feature extraction but individual fully connected regression heads for each anatomical region (Figure 1c).30 During inference, the knowledge about the body region is used to select the head and output nodes that correspond to the given body region. During back-propagation, the error gradients for all other body regions are set to zero. Thus, only parameters within the fully connected layers belonging to the selected body region and those within the convolutional layers are updated.
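The head selection and gradient masking can be illustrated with a framework-free sketch. It computes a mean squared error and its gradient with respect to the outputs of the selected head only, and returns zero gradients for all other heads. The function name and the dictionary layout are hypothetical; in an actual training framework the same effect would typically arise from computing the loss on the selected head alone.

```python
def masked_mse_and_grads(head_outputs, target, region):
    """MSE for the head of `region`; zero output gradients for all other heads.

    head_outputs : dict mapping region name -> list of predicted values
    target       : list of ground-truth values for this sample
    region       : body region of the current sample
    """
    prediction = head_outputs[region]
    n = len(prediction)
    loss = sum((p - t) ** 2 for p, t in zip(prediction, target)) / n
    # Start with zeros everywhere, then fill in the selected head only.
    grads = {name: [0.0] * len(out) for name, out in head_outputs.items()}
    grads[region] = [2.0 * (p - t) / n for p, t in zip(prediction, target)]
    return loss, grads
```

Because the gradients of the unselected heads are zero, only the selected head and the shared convolutional layers would receive parameter updates for this sample.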

2.3 Augmentation and Value Normalization

During training, online augmentation of the volumes is employed. The spatial augmentation includes random rotation within the interval [−45°, 45°], random spatial scaling of the volume by a factor in the range [0.95, 1.05], translation by [−12, 12] mm, center cropping, and sub-sampling. All aforementioned augmentations were applied with a probability of 0.5 and were sampled uniformly from the respectively given range. Additionally, mirroring in x-direction is added with a probability of 0.5, which allows simulating left-right handedness of the volume.

Figure 1: Schematic visualization of the analyzed network architectures. (a) Baseline network without providing body region information to the network. The five convolutional blocks consist of a 3D convolutional layer (red), followed by a ReLU activation function (orange), batch normalization (brown), and a max pooling operation (yellow). The obtained features are fed into three fully connected layers (green). (b) Combined network architecture for all body regions with additional information about the body region fed into the first fully connected layer. (c) Multi-head network, convolutional blocks are shared across body regions, individual fully connected layers for the different body regions.

Table 1: Structure and parameter layout of the baseline network.

| Block | Input resolution | # Input channels | # Output channels |
|---|---|---|---|
| CNN1 | 72 × 72 × 72 | 1 | 8 |
| CNN2 | 31 × 37 × 31 | 8 | 16 |
| CNN3 | 16 × 19 × 16 | 16 | 32 |
| CNN4 | 8 × 10 × 8 | 32 | 64 |
| CNN5 | 4 × 5 × 4 | 64 | 228 |
| FC1 | 1 × 1 × 1 | 10240 | 1300 |
| FC2 | 1 × 1 × 1 | 1300 | 50 |
| FC3 | 1 × 1 × 1 | 50 | # parameters |

These spatial operations are composed by combining their homogeneous matrix representations into a single composite matrix. The homogeneous transform matrix is given by

$$T_m = T_r T_s T_t T_R \tag{2}$$

where $T_r$, $T_s$, $T_t$, and $T_R$ represent the sub-sampling, scaling, translation, and rotation homogeneous matrices, respectively. This way of implementation helps to speed up the calculation and reduces the number of performed interpolations to one.
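The composition in Eq. 2 can be sketched with plain 4 × 4 matrices. The sketch assumes column vectors (so the rotation is applied first and the sub-sampling last), models sub-sampling as an isotropic scaling, and uses a rotation about the z axis only for brevity; none of these choices are stated in the text, and all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(factor):
    """Isotropic scaling; also used here to model sub-sampling."""
    return [[factor, 0.0, 0.0, 0.0],
            [0.0, factor, 0.0, 0.0],
            [0.0, 0.0, factor, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(angle_deg):
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def compose(t_r, t_s, t_t, t_rot):
    """Composite matrix T_m = T_r T_s T_t T_R as in Eq. 2."""
    return matmul(matmul(matmul(t_r, t_s), t_t), t_rot)

def apply(matrix, point):
    """Apply a homogeneous 4x4 matrix to a 3D point."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][k] * vec[k] for k in range(4)) for i in range(3))

# Illustrative values only: sub-sampling 512 -> 72 voxels, 2 % scaling,
# a small shift in mm, and a 30 degree rotation.
t_m = compose(scaling(512 / 72), scaling(1.02),
              translation(5.0, -3.0, 0.0), rotation_z(30.0))
```

Since all four operations are folded into `t_m`, the volume only has to be resampled once with the composite matrix.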

Thereafter, an intensity augmentation is also applied, which simulates that the Hounsfield Unit (HU) values of mobile C-arm devices are generally not as well calibrated as those of CT systems. To this end, the value of 1000 HU is added to the interpolated HU values, and the result is multiplied by a factor uniformly sampled from the range [0.95, 1.05]. For normalization, a windowing function $w(x)$ is applied after clipping the volume intensity values to the range of [−490, 1040] HU and rescaling them to [0, 1]. The resulting intensity value before applying the windowing function is given by

$$c(x) = \begin{cases} 0 & \text{if } x < \min, \\ \dfrac{f\,(x + 1000) - \min}{\max - \min} & \text{if } \min < x < \max, \\ 1 & \text{if } x > \max, \end{cases} \tag{3}$$

where $f$ represents the random factor. The windowing function is defined as

$$w(x) = \frac{1}{1 + e^{g\,(0.5 - x)}} \tag{4}$$

with a gain factor $g$ that depends on the minimum and maximum values. The gain factor is given by

$$g = \log\left(\frac{1 - y}{y}\right) / 0.4 \tag{5}$$

where $y = 0.02(\max - \min)$. In contrast to min-max normalization, it reduces the signal variance of metal and air, which typically contain little to no information about the plane's parameters.
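The windowing step of Eqs. 4 and 5 can be sketched as follows. The sketch operates on a value that has already been clipped and rescaled to [0, 1], and it assumes that max − min in Eq. 5 refers to this rescaled range, so that y = 0.02 (with the raw HU bounds the argument of the logarithm would be negative). The function names are hypothetical.

```python
import math

def gain(y=0.02):
    """Gain factor of Eq. 5; y = 0.02 assumes a rescaled range of width 1."""
    return math.log((1.0 - y) / y) / 0.4

def window(x, g):
    """Sigmoid windowing function of Eq. 4 for a value x in [0, 1]."""
    return 1.0 / (1.0 + math.exp(g * (0.5 - x)))

g = gain()
# Under the stated assumption, the window maps 0.1 to about 0.02 and
# 0.9 to about 0.98, while 0.5 stays at 0.5.
low, mid, high = window(0.1, g), window(0.5, g), window(0.9, g)
```

With this gain, the central 80 % of the rescaled intensity range is spread over roughly 96 % of the output range, while the extremes are compressed.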

2.4 Post-processing of Regressed Values

In Martin et al.23 it was shown that a combined regression of the parameters of the three planes is beneficial compared with training separate networks for each plane. So the accuracy can be improved when the planes are redundantly regressed. In the same publication, it was also shown that the training does not benefit from an additional orthogonality constraint on the regressed values. Therefore, in all the presented architectures, we decided to regress the parameters of the planes decoupled from each other and to adjust them algorithmically afterwards.

As presented in Martin et al.,23 the axial plane is the most accurately regressed plane in the anatomical regions. Therefore, it is taken as the reference plane for the other planes. This means that the in-plane rotation of the coronal and the sagittal plane is corrected such that the intersection of the axial plane with these planes is at 0°. Thereafter, in cases in which the planes are orthogonal to each other, the normal direction of the sagittal plane is adjusted to be orthogonal to the axial and coronal planes.
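One way to realize the two corrections is sketched below. This is an interpretation and not the authors' algorithm: it assumes that "intersection at 0°" means the row direction of the corrected plane runs along the intersection line with the axial plane, and it resolves the sign ambiguity of each cross product by staying close to the regressed direction. All names are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / norm for c in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def align_sign(v, reference):
    """Flip v if it points away from the reference direction."""
    return v if dot(v, reference) >= 0.0 else tuple(-c for c in v)

def sagittal_normal(n_axial, n_coronal, n_sagittal_regressed):
    """Normal orthogonal to the axial and coronal normals, near the regressed one."""
    n = normalize(cross(n_axial, n_coronal))
    return align_sign(n, n_sagittal_regressed)

def in_plane_axes(n_plane, n_axial, e_u_regressed):
    """Row direction along the intersection line with the axial plane.

    n_plane is assumed to be a unit vector. Returns (e_u, e_v) such that
    e_u x e_v reproduces n_plane.
    """
    e_u = align_sign(normalize(cross(n_plane, n_axial)), e_u_regressed)
    e_v = cross(n_plane, e_u)
    return e_u, e_v
```

The first function is only meaningful for regions with mutually orthogonal standard planes; for the semi-coronal plane of the calcaneus it would not be applied.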

3. EXPERIMENTS

3.1 Data Sets

Our data set consists of 160 volumes of the calcaneus region, 220 volumes of the ankle region, 274 volumes of the knee, and 250 volumes of the wrist. All volumes were acquired with a mobile C-arm system Cios Spin from Siemens Healthineers and reconstructed offline with the Feldkamp-Davis-Kress algorithm using parameters equal to the product standard settings. The volumes have a uniform resolution of 512³ voxels and a field of view of (160 mm)³. They were partly acquired after an orthopedic surgery for assessing the surgical result and partly acquired from cadavers which were prepared for surgical training. The cadaver data sets were typically scanned twice: once without any metal and once with metal objects put on the surface of the cadaver. We also obtained volumes of cadavers with various metal implants acquired during surgical training. The exact distribution of the data sets is listed in Table 2. All available volumes were included in the data set, without any constraint on the positioning of the body part of interest. The volumes were corrected for wrong patient position descriptions in the DICOM meta information. For each body region, 5 data splits were created, taking care that volumes of the same patient belong to the same subset and that the distribution of the data set's origin is approximately the same as in the total data set. For all volumes, standard planes were defined according to the clinical definition provided in Grützner.11 Sketches of the planes are displayed in Figure 2.

For the ankle, knee, and wrist volumes, axial, coronal, and sagittal MPRs were annotated; for the calcaneus data sets, axial, sagittal, and semi-coronal planes were annotated. This was done by a medical engineer after five hours of training using a syngo XWorkplace VD20 which was modified to store the plane description. Axial, sagittal, and coronal MPRs were adjusted with coupled MPRs. The semi-coronal plane was adjusted thereafter with decoupled planes. The annotation validity was verified by an expert physician and additionally by a senior medical engineer.

3.2 Performance Metric

As an evaluation metric to compare the performance of the networks, we use a weighted average over the individual error values of the three regressed planes

$$p = \frac{1}{\#\text{planes}} \sum_{j \in \text{planes}} \left( 0.2\, d_j + 0.6\, \varepsilon_{n,j} + 0.2\, \varepsilon_{i,j} \right). \tag{6}$$

$d_j$ denotes the mean error of the absolute translation of the center in the direction of the $j$th plane's normal. $\varepsilon_{n,j}$ is the deviation of the normal vectors $e_w$, and $\varepsilon_{i,j}$ is the in-plane rotation error, calculated as the mean difference angle of $e_u$ and $e_v$ after projecting the directions onto the plane defined by the annotation. The different weights in (6) were chosen heuristically and reflect that the normal has the most complex effect on the result. For this normal to be corrected, out-of-plane rotations would be necessary, whereas in-plane rotation and plane translation are easy-to-fix components.

Figure 2: Representation of the 3D definition of axial (red), coronal (blue), and sagittal (green) standard planes in the calcaneus, ankle, knee, and wrist.

Table 2: Number, origin, and realism with respect to metallic objects of the volumes.

| Region | Cadaver: metal implants | Cadaver: metal outside | Cadaver: no metal | Clinical: metal implants | Total |
|---|---|---|---|---|---|
| Calcaneus | 9 | 63 | 62 | 26 | 160 |
| Ankle | 36 | 61 | 56 | 67 | 220 |
| Knee | 65 | 68 | 70 | 71 | 274 |
| Wrist | 0 | 101 | 102 | 46 | 249 |

In the results tables below, the mean and the standard deviation of the median prediction errors of the folds are reported.
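The score of Eq. 6 is a plain weighted average and can be sketched as follows. The helper `angle_deg` shows one common way to measure the deviation between two normal vectors; the function names and the tuple layout of `plane_errors` are hypothetical.

```python
import math

def angle_deg(a, b):
    """Angle between two 3-vectors in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.degrees(math.acos(cosine))

def score(plane_errors, weights=(0.2, 0.6, 0.2)):
    """Eq. 6: average over planes of 0.2*d + 0.6*eps_n + 0.2*eps_i.

    plane_errors : list of (d_mm, eps_n_deg, eps_i_deg), one tuple per plane
    """
    w_d, w_n, w_i = weights
    total = sum(w_d * d + w_n * eps_n + w_i * eps_i
                for d, eps_n, eps_i in plane_errors)
    return total / len(plane_errors)
```

Because of the 0.6 weight, a one-degree change in the normal error moves the score three times as much as a one-millimetre change in the position error.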

3.3 Study Design

Before investigating a combined regression network for multiple anatomies, some further experiments were carried out to evaluate the performance of the baseline network. We have seen in Section 2 that there are several possibilities to parameterize rotations. In addition to the representations used in Martin et al.,23 the 6Dxz method was introduced, taking into account that the main contribution to the performance metric comes from the angular deviation of the normals. Therefore, as a first experiment, the comparison of the representation with Euler angles, quaternions, 6Dxy, and 6Dxz is performed for the four body regions.

The best performing representation is used in the subsequent experiments. First, the influence of the post-processing of the regressed angles is evaluated. For that, εn and εi are calculated with and without post-processing and their values are compared.

In Martin et al.23 the question was kept open whether better results can be expected with more data samples. Since the number of available volumes is fixed, we incrementally reduce the number of volumes used for training. For this, the training for the different body regions is repeated using 100 %, 80 %, 60 %, and 40 % of the volumes in the training split, while keeping the test volumes unchanged.

Following the evaluation of the baseline model, different experiments were carried out to evaluate the performance of a single model for all the body regions. First, we trained a single network for all body regions without providing any further class information. Second, we extend this architecture by encoding the anatomical region information as a one-hot encoded vector and concatenating it with the output of the convolutional layers (Fig. 1b). Lastly, a multi-head architecture (Fig. 1c) is used where all body regions share the convolutional feature extraction layers but are individually processed in separate regression heads consisting of three fully connected layers for each anatomical region. In order to overcome the imbalance between the different classes, the volumes were randomly over-sampled from the minority classes with a weight given by the number of volumes from a given class. To gain a better understanding of the influence and benefits of the additional anatomy information in the case of the first extended model, this additional information is corrupted to varying degrees during model inference. For this purpose, all nodes of the one-hot tensor are set to the same expected diffuse probability of either 0.0, 0.5, or 1.0.
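The over-sampling and the corruption of the class information can be sketched as follows. The text does not give the exact weighting formula, so the sketch assumes a per-volume sampling weight inversely proportional to the class size, which makes every body region equally likely to be drawn. The function names are hypothetical; the volume counts are those stated in Section 3.1.

```python
def sampling_weights(class_counts):
    """Per-volume sampling weight, assumed inversely proportional to class size."""
    return {name: 1.0 / count for name, count in class_counts.items()}

def class_probabilities(class_counts):
    """Probability of drawing a volume of each class under these weights."""
    weights = sampling_weights(class_counts)
    mass = {name: class_counts[name] * weights[name] for name in class_counts}
    total = sum(mass.values())
    return {name: value / total for name, value in mass.items()}

def corrupted_one_hot(num_classes, value):
    """Replace the one-hot code by a constant vector (0.0, 0.5, or 1.0)."""
    return [float(value)] * num_classes

counts = {"calcaneus": 160, "ankle": 220, "knee": 274, "wrist": 250}
probabilities = class_probabilities(counts)  # equal share per region under this assumption
```

A constant vector carries no information about the body region, so comparing the results for 0.0, 0.5, and 1.0 against the intact one-hot code indicates how much the network relies on that input.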

3.4 Implementation

The models are implemented in PyTorch (v.1.5.1) and trained on Windows 10 systems with 32 GB RAM and an 8 GB NVIDIA RTX 2070S. The weights are initialized by the He et al. method.31 The network is trained by a mini-batch gradient descent optimizer with momentum. For optimization of the network parameters, the mean squared error between model prediction and ground truth was calculated at each output node. The total number of epochs was set to 400, verifying training convergence of all model variants. For the selection of the learning rate, learning rate decay, step size, momentum, and batch size, a hyper-parameter optimization using random sampling of the search space was performed. For that purpose, one fold was used and individual hyper-parameter optimizations were performed for the different rotation descriptions in the baseline network. In Table 3 the search space for each evaluated hyper-parameter as well as the sampling value for the 6Dxy representation are listed. This method results in an offset of typically 0.1 and maximum 0.4 score points.

Table 3: Search space hyper-parameters, sampling distribution and best configuration for the plane regression task as result of random search hyper-parameter optimization.

| Hyper-parameter | Sampling distribution | Sampling value |
|---|---|---|
| Learning rate | s ∼ logU(0.0001, 0.01) | 0.00164 |
| Learning rate decay | s ∼ logU(0.2, 0.9) | 0.27291 |
| Learning rate decay step | s ∼ U(20, 80) | 75 |
| Momentum | s ∼ logU(0.5, 0.99) | 0.957437 |
| Batch size | s ∼ U(5, 12) | 9 |
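The log-uniform sampling and the resulting learning rate schedule can be sketched as follows. The sketch assumes that the learning rate is multiplied by the decay factor once every "decay step" epochs, which is the usual step-decay scheme but is not stated explicitly in the text; the function names are hypothetical, and the default values are the 6Dxy configuration from Table 3.

```python
import math
import random

def log_uniform(low, high, rng=random):
    """Draw a sample whose logarithm is uniformly distributed on [log low, log high]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def step_decay_lr(epoch, base_lr=0.00164, decay=0.27291, step=75):
    """Learning rate at a given epoch, assuming a decay every `step` epochs."""
    return base_lr * decay ** (epoch // step)

# One random-search draw over the search space of Table 3.
rng = random.Random(0)
candidate = {
    "learning_rate": log_uniform(0.0001, 0.01, rng),
    "lr_decay": log_uniform(0.2, 0.9, rng),
    "lr_decay_step": rng.randint(20, 80),
    "momentum": log_uniform(0.5, 0.99, rng),
    "batch_size": rng.randint(5, 12),
}
```

With 400 epochs and a step of 75, this schedule would reduce the learning rate five times during training under the stated assumption.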

4. RESULTS

As can be observed in Table 6, the evaluation of the different rotation representations in the base model shows that the 6D method outperforms the Euler and quaternion representations in all body regions except the knee. For this region, a performance similar to that of the best representation, the Euler angles, is reached. Among the 6D methods, no significant difference in performance between 6Dxz and 6Dxy can be observed. Thus, using the normal in the directly obtained values, and consequently also in the cost function, does not generally improve the quality of the plane parameter regression. In two body regions we could observe a small reduction in the mean error of the estimated normals, whereas an error increase was registered for the other two regions. In all cases the in-plane rotation performance got significantly worse. The position estimation of the planes was approximately the same for both representations. For these reasons, the 6Dxy variant was chosen for the remaining experiments. Using sine and cosine representations of the Euler angles instead of the raw angle values yields superior performance over the quaternion representation for the estimation of the plane normal. Looking at the performance score that weights all metrics (Subsection 3.2), the Euler angles show better results in three body regions compared to the quaternions.

The analysis of the influence of the post-processing on the individual components of the score for the 6Dxy representation (Table 5) shows that the post-processing helps to significantly improve εn as well as εi, by up to 1.89°. As the translation remains untouched by the post-processing, no changes can be observed for the translation error d.

The performance analysis of the baseline model upon reduced amounts of training data (Table ??) reveals that in the ankle body region 174 volumes are sufficient to find good results. For the other body regions, the numbers of provided volumes should be increased to obtain the best possible results. Compared to the ankle, the other regions show a larger variance in shape and joint angulation, and thus more training data is needed to capture all different shapes. It can be observed that the calcaneus, knee, and wrist regions all show similar performance characteristics at reduced amounts of training data.

Table 4: Summarized results of evaluation of Euler angles, quaternions, 6Dxy, and 6Dxz rotation representations in standard plane regression of calcaneus, upper ankle, knee, and wrist regions.

| Region | Representation | d (mm) | εn (°) | εi (°) | Score |
|---|---|---|---|---|---|
| Calcaneus | Euler | 14.39 ± 1.64 | 8.93 ± 1.60 | 9.99 ± 0.75 | 10.23 ± 1.11 |
| Calcaneus | Quat. | 9.93 ± 2.53 | 9.96 ± 1.75 | 9.57 ± 1.52 | 9.87 ± 1.65 |
| Calcaneus | 6Dxy | 9.94 ± 1.92 | 8.08 ± 0.38 | 8.09 ± 0.45 | 8.46 ± 0.63 |
| Calcaneus | 6Dxz | 9.31 ± 1.10 | 8.23 ± 0.69 | 9.42 ± 1.02 | 8.68 ± 0.65 |
| Ankle | Euler | 7.78 ± 0.36 | 6.98 ± 0.77 | 7.52 ± 0.76 | 7.25 ± 0.66 |
| Ankle | Quat. | 5.00 ± 0.09 | 8.16 ± 0.79 | 8.31 ± 0.71 | 7.56 ± 0.63 |
| Ankle | 6Dxy | 5.43 ± 0.25 | 6.61 ± 0.34 | 6.37 ± 0.31 | 6.32 ± 0.25 |
| Ankle | 6Dxz | 5.41 ± 0.49 | 6.17 ± 0.78 | 7.32 ± 1.06 | 6.25 ± 0.65 |
| Knee | Euler | 6.81 ± 0.65 | 6.59 ± 1.05 | 7.36 ± 1.54 | 6.79 ± 0.96 |
| Knee | Quat. | 6.82 ± 0.72 | 9.45 ± 0.67 | 10.54 ± 1.22 | 9.15 ± 0.59 |
| Knee | 6Dxy | 6.81 ± 0.47 | 6.71 ± 0.63 | 7.07 ± 0.95 | 6.80 ± 0.55 |
| Knee | 6Dxz | 7.15 ± 1.16 | 7.19 ± 0.52 | 8.22 ± 0.62 | 7.39 ± 0.49 |
| Wrist | Euler | 7.45 ± 1.00 | 8.35 ± 1.66 | 9.82 ± 1.38 | 8.48 ± 1.31 |
| Wrist | Quat. | 8.46 ± 1.93 | 11.31 ± 1.87 | 13.47 ± 2.48 | 11.22 ± 1.83 |
| Wrist | 6Dxy | 7.27 ± 1.08 | 7.74 ± 1.14 | 8.72 ± 0.64 | 7.85 ± 0.94 |
| Wrist | 6Dxz | 7.21 ± 1.02 | 7.21 ± 1.06 | 9.37 ± 1.08 | 7.64 ± 0.84 |

Table 5: Comparison of the errors directly obtained by the network (Regressed) and after post-processing ensuring orthogonality of respective planes (Post-proc.) using the 6Dxy rotation representation.

| Region | Variant | d (mm) | εn (°) | εi (°) | Score |
|---|---|---|---|---|---|
| Calcaneus | Regressed | 9.94 ± 1.92 | 8.77 ± 0.60 | 8.34 ± 0.44 | 8.92 ± 0.51 |
| Calcaneus | Post-proc. | 9.94 ± 1.92 | 8.08 ± 0.38 | 8.09 ± 0.45 | 8.46 ± 0.63 |
| Ankle | Regressed | 5.43 ± 0.25 | 7.11 ± 0.48 | 6.58 ± 0.29 | 6.70 ± 0.35 |
| Ankle | Post-proc. | 5.43 ± 0.25 | 6.61 ± 0.34 | 6.37 ± 0.31 | 6.32 ± 0.25 |
| Knee | Regressed | 6.81 ± 0.47 | 8.60 ± 0.98 | 8.45 ± 0.63 | 8.22 ± 0.71 |
| Knee | Post-proc. | 6.81 ± 0.47 | 6.71 ± 0.63 | 7.07 ± 0.95 | 6.80 ± 0.55 |
| Wrist | Regressed | 7.27 ± 1.08 | 8.76 ± 1.19 | 8.84 ± 0.94 | 8.48 ± 1.09 |
| Wrist | Post-proc. | 7.27 ± 1.08 | 7.74 ± 1.14 | 8.72 ± 0.64 | 7.85 ± 0.94 |

The comparison of the multi-head networks (Table 7) shows that a combined network which jointly estimates the parameters of the planes for different body regions can in two cases improve the accuracy of the plane positions. However, for the angle regression task, this network variant yields inferior results. As the angular errors have a higher impact on the score, the overall performance is inferior.

The same holds true for the network architecture in which all layers are shared by all body regions and the body region is additionally provided to the fully connected layers. This architecture achieves approximately the same score as the network without the additional class information. Analyzing the influence of the body-region information (Table 8) shows that the current network and training configuration is not suitable for incorporating and interpreting this additional information.

Table 6: Summarized results of evaluation of Euler angles, quaternions, 6Dxy, and 6Dxz rotation representations in standard plane regression with (Post-proc.) and without (Regressed) post-processing.

Repr.   Variant      Calcaneus       Ankle          Knee           Wrist
Euler   Regressed    10.44 ± 1.17    7.42 ± 0.77    6.68 ± 0.85    8.30 ± 1.24
        Post-proc.   10.23 ± 1.11    7.25 ± 0.66    6.79 ± 0.96    8.48 ± 1.31
Quat.   Regressed    9.88 ± 1.65     7.70 ± 0.67    9.01 ± 0.65    11.38 ± 2.00
        Post-proc.   9.87 ± 1.65     7.56 ± 0.63    9.15 ± 0.59    11.22 ± 1.83
6Dxy    Regressed    8.92 ± 0.51     6.70 ± 0.35    8.22 ± 0.71    8.48 ± 1.09
        Post-proc.   8.46 ± 0.63     6.32 ± 0.25    6.80 ± 0.55    7.85 ± 0.94
6Dxz    Regressed    8.71 ± 0.66     6.58 ± 0.64    7.59 ± 0.60    8.05 ± 1.07
        Post-proc.   8.68 ± 0.65     6.25 ± 0.65    7.39 ± 0.49    7.64 ± 0.84

Table 7: Summarized results of the different networks including the use of multiple models (a model for each anatomy), of a model for training all the anatomies without class information, of a model trained with class information, and the multi-head model.

Region     Network      d (mm)         εn (°)          εi (°)          Score
Calcaneus  Multiple     9.94 ± 1.92    8.08 ± 0.38     8.09 ± 0.45     8.46 ± 0.63
           w/o class    9.17 ± 0.64    9.18 ± 1.21     8.87 ± 1.34     9.12 ± 1.03
           w/ class     9.38 ± 1.30    9.90 ± 2.16     10.31 ± 2.26    9.88 ± 1.71
           Multi-head   7.44 ± 0.31    9.16 ± 1.80     8.55 ± 0.88     8.69 ± 1.23
Ankle      Multiple     5.43 ± 0.25    6.61 ± 0.34     6.37 ± 0.31     6.32 ± 0.25
           w/o class    6.34 ± 0.77    9.71 ± 1.98     9.64 ± 1.90     9.02 ± 1.59
           w/ class     5.55 ± 1.11    7.71 ± 1.14     8.73 ± 1.68     7.49 ± 0.97
           Multi-head   4.47 ± 0.33    6.08 ± 0.45     6.61 ± 0.65     5.86 ± 0.40
Knee       Multiple     6.81 ± 0.47    6.71 ± 0.63     7.07 ± 0.95     6.80 ± 0.55
           w/o class    6.71 ± 0.72    8.04 ± 0.58     8.14 ± 1.10     7.79 ± 0.53
           w/ class     6.34 ± 0.82    8.16 ± 1.35     7.95 ± 1.21     7.75 ± 1.11
           Multi-head   5.62 ± 0.68    6.70 ± 1.28     6.77 ± 0.81     6.49 ± 1.05
Wrist      Multiple     7.27 ± 1.08    7.74 ± 1.14     8.72 ± 0.64     7.85 ± 0.94
           w/o class    6.42 ± 0.75    10.34 ± 2.82    10.52 ± 2.04    9.59 ± 2.15
           w/ class     6.46 ± 1.06    10.52 ± 2.33    10.00 ± 1.63    9.61 ± 1.77
           Multi-head   7.03 ± 1.16    10.50 ± 1.73    11.15 ± 1.29    9.93 ± 1.41

The multi-head network achieves the best performance score for 2 out of 4 body regions. For the calcaneus region, the single-task network and the multi-head network perform about equally, with their mean performance scores and rotation errors lying within each other's range of standard deviation. Only for the wrist region are the angle errors, and thus also the score, significantly worse than for the single-task network. For this region, the multi-head network achieved the worst values of all MTL network variants.
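One way to read the statement about means lying within each other's range of standard deviation is as a check that the difference of the two means does not exceed either standard deviation. The sketch below applies this reading to the Score column of Table 7 only. The function name is hypothetical, and the criterion itself is an assumption rather than a test stated by the authors.

```python
def within_each_others_std(mean_a, std_a, mean_b, std_b):
    """Return True if each mean lies inside the other's mean +/- std interval.

    Assumed reading of the comparison; not a formal significance test.
    """
    diff = abs(mean_a - mean_b)
    return diff <= std_a and diff <= std_b


# Score values taken from Table 7 (single-task "Multiple" vs. multi-head).
calcaneus_similar = within_each_others_std(8.46, 0.63, 8.69, 1.23)
wrist_similar = within_each_others_std(7.85, 0.94, 9.93, 1.41)

print(calcaneus_similar, wrist_similar)
```

Under this reading, the calcaneus scores count as comparable, while the wrist scores do not.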

Across all experiments, we could see that the estimation of the position can be improved by the MTL approaches (Table 7). However, the angle estimation for both the normals and the in-plane rotation, which typically has a higher nominal error than the position (Figure 3), does not benefit from the MTL approach.

Table 8: Performance comparison for different provision levels of anatomical class information. The network is provided once with true information about the volume class (one-hot) and for three different diffuse priors where equal probabilities are assigned at each class label node (0, 0.5, 1).

Class input   Calcaneus       Ankle          Knee           Wrist
True label    9.88 ± 1.71     7.49 ± 0.97    7.75 ± 1.11    9.61 ± 1.77
0.0           11.18 ± 1.08    7.45 ± 0.97    8.15 ± 1.00    9.76 ± 1.53
0.5           10.73 ± 1.11    6.67 ± 1.48    8.28 ± 1.35    9.33 ± 1.09
1.0           10.53 ± 0.67    7.95 ± 0.66    7.68 ± 1.11    9.38 ± 1.46
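The class inputs compared in Table 8 can be illustrated with a small sketch: a one-hot vector for the true label, and a constant vector (0, 0.5, or 1 at every class node) for the diffuse priors. The helper name `class_input` and the ordering of the class nodes are assumptions made for illustration; the section does not specify them.

```python
import numpy as np

# Assumed class order; the ordering of the label nodes is not given in the text.
CLASSES = ["calcaneus", "ankle", "knee", "wrist"]


def class_input(label=None, diffuse_value=0.0):
    """Build the class vector passed to the fully connected layers.

    Hypothetical helper: returns a one-hot vector if a label is given,
    otherwise a diffuse prior with the same value at every class node.
    """
    if label is not None:
        vec = np.zeros(len(CLASSES))
        vec[CLASSES.index(label)] = 1.0
        return vec
    return np.full(len(CLASSES), float(diffuse_value))


true_label = class_input("knee")
diffuse = [class_input(diffuse_value=v) for v in (0.0, 0.5, 1.0)]

print(true_label)
for vec in diffuse:
    print(vec)
```

If the network output barely changes between the one-hot and the diffuse inputs, as Table 8 suggests, the class nodes carry little weight in the regression.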

Figure 3: Individual distribution of plane and distance errors per anatomy obtained by the multi-head network (3.2). [Plot not reproduced; value axis from 0 to 140; quantities shown: Score, εn (°), εi (°), d (mm).]

For a better understanding of this result, we compared the volumes contributing to the 10 % best scoring results with those contributing to the 10 % worst scoring results. The presence of metallic objects such as screws or plates could not be identified as a source of these errors. Likewise, we could rule out that the regression error is higher in volumes where only a portion of the relevant anatomy is represented. For these problematic cases, the algorithm is quite robust. However, in the worst-scoring volumes, we found that the patient positioning deviated from the standard, e.g. prone or left instead of supine, or with the focus on the proximal femur instead of the tibial head.

Since we deliberately constrained the augmentation pipeline so that it does not fully cover these flips and rotations, more training data needs to be added to handle such cases.
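The selection of the 10 % best and 10 % worst scoring volumes described above can be sketched as follows. The sketch assumes that a lower score is better, since the score aggregates errors, and it uses made-up score values; the function name is hypothetical.

```python
import numpy as np


def best_and_worst_decile(scores):
    """Return the indices of the 10 % best and 10 % worst scoring volumes.

    Assumption: lower score means better result (error-based score).
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)  # ascending: best (lowest) score first
    k = max(1, int(round(0.1 * len(scores))))
    return order[:k], order[-k:]


# Made-up scores for ten volumes, for illustration only.
scores = np.array([6.1, 7.4, 5.2, 30.5, 8.0, 6.8, 7.9, 9.3, 5.9, 12.7])
best, worst = best_and_worst_decile(scores)

print(best, worst)
```

The two index sets can then be inspected side by side for properties such as metal implants, truncated anatomy, or patient positioning.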

Figures 4-6 show samples of the central planes through clinically acquired CBCT volumes and compare them to both the manually adjusted standard planes and the predictions automatically inferred by the multi-head network. In some cases, the algorithm could correct an in-plane rotation of 180° (Figure 4) or plane flips (Figure 5). However, the rotation by 90° of the axial plane in Figure 6 led to a very poor regression result.

5. DISCUSSION AND CONCLUSION

In this paper, we investigate the regression of standard planes for four different body regions. The volumes for which the standard planes are to be regressed are acquired with mobile C-arm devices and therefore have a limited field of view. Furthermore, there is no standardized relationship between the C-arm device and the body region of interest, which also means that the representation of the body region in the acquired volumes is not consistent. This also applies to the position of the body region in relation to the operating table. The target body regions are also in close proximity to flexible joints such as the knee, wrist, or ankle, which leads to great variability of the input data and thus to a significantly higher task complexity.

Figure 4: Example of automatic plane regression results by the multi-head network for the clinical wrist data set.

Despite this complex setting, our proposed method yields encouraging results with low median errors for the regressed angles and positions. The experimental results reveal that the single-task networks already achieve very good accuracy and that, in this context, MTL can improve the results only by a small margin. Only the multi-head approach could produce significantly better results. We argue that the configuration of the fully connected layers used in the combined network is not capable of learning an appropriate representation of data and task distribution, so that adding further information does not help to improve regression performance. Corrupting the anatomy information provided to the model with shared fully connected layers did not worsen the results. This observation leads to the conclusion that the additional information has no influence on the network's decision-making. These shortcomings could be addressed by performing feature abstraction and combination in smaller consecutive steps, for example by adding intermediate fully connected layers. This reasoning is supported by the observation that only the additional parameter capacity of the multi-head approach has led to a performance increase.

An important step to enhance the accuracy of the plane regression is to couple the planes in a post-processing step. We could show that this improves εn and εi by up to 1°. The improvement obtained by this post-processing method exceeds the one obtained by integrating the plane coupling into the training via a modified cost function, as studied in [23]. Most importantly, the post-processing guarantees the orthogonality of the coupled planes.
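How a post-processing step can guarantee orthogonality may be illustrated with a Gram-Schmidt orthonormalization, the construction commonly used to map a 6D rotation representation to a rotation matrix [26]. The following is a minimal sketch under that assumption; the function name `orthonormalize_6d` and the example vectors are hypothetical, and the post-processing used in this work may differ in detail.

```python
import numpy as np


def orthonormalize_6d(a1, a2):
    """Map two regressed 3-vectors to a rotation matrix via Gram-Schmidt.

    Illustrative assumption: a1 and a2 are the two (generally non-orthogonal)
    axis estimates produced by the network.
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    b1 = a1 / np.linalg.norm(a1)
    # Remove the component of a2 along b1, then normalize the remainder.
    u2 = a2 - np.dot(b1, a2) * b1
    b2 = u2 / np.linalg.norm(u2)
    # The cross product completes a right-handed orthonormal frame.
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)  # columns are the three axes


# Slightly non-orthogonal example input (made-up values).
R = orthonormalize_6d([1.0, 0.1, 0.0], [0.2, 1.0, 0.1])
print(np.round(R.T @ R, 6))
```

By construction, the columns of `R` are mutually orthogonal unit vectors, so planes whose normals are taken from these columns are orthogonal regardless of how far the raw regression output deviates from orthogonality.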

Figure 5: Example of automatic plane regression results by the multi-head network for the clinical calcaneus data set.

The task of regressing the plane parameters can be performed equally well for orthogonal and oblique planes. While for the specialized networks the worst values were obtained for the calcaneus body region with oblique planes, for the multi-head network the worst values were obtained for the wrist body region, for which the planes are orthogonal. While axial planes are typically well regressed, the overall score is degraded by the coronal and sagittal planes. The normals of these planes are typically less well defined, and small rotations by a few degrees are hardly noticed.
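To make the magnitude of such rotations concrete, the angular error between a regressed and a reference plane normal can be computed as the angle between the two vectors. The sketch below assumes that εn is defined in this way, which is not restated in this section; the function name and the example vectors are hypothetical.

```python
import numpy as np


def normal_angle_deg(n_pred, n_true):
    """Angle in degrees between a predicted and a reference plane normal."""
    n_pred = np.asarray(n_pred, dtype=float)
    n_true = np.asarray(n_true, dtype=float)
    cos = np.dot(n_pred, n_true) / (np.linalg.norm(n_pred) * np.linalg.norm(n_true))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


n_true = np.array([0.0, 0.0, 1.0])

# A normal tilted by 5 degrees about the y-axis (made-up example).
tilt = np.radians(5.0)
n_tilted = np.array([np.sin(tilt), 0.0, np.cos(tilt)])
err_tilt = normal_angle_deg(n_tilted, n_true)

# A flipped plane: the normal points in the opposite direction.
err_flip = normal_angle_deg(-n_true, n_true)

print(err_tilt, err_flip)
```

A tilt of a few degrees is visually subtle but already comparable to the mean errors in the tables, whereas a plane flip produces an error close to 180° and dominates the statistics.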

The results show that good angle regression performance is obtained when the volumes are acquired with the body moderately aligned to the imaging system axes, but that the regression fails when the body is rotated by more than 90°. For these cases, the applied augmentation pipeline does not help. Flips in the y-direction were not covered by the augmentation. This was done on purpose, since in clinical practice flipping a wrist upside down comes along with a modification of the configuration. In the case of the upper ankle or calcaneus, the upper ankle joint gets stretched more. Thus, applying such an augmentation does not lead to clinically relevant data sets. Since, at the present stage, additional clinical data is not available and such clinical acquisitions are rare, more cadaver data is needed to sufficiently represent those poses. This also means that the results presented in this work do not show the full potential of this approach.

Kausch et al. [12] have shown that human performance in adjusting the planes depends strongly on the target region. In regions with many well-defined landmarks and few anatomical variations, the inter-rater variance in the plane adjustment is low. However, in regions for which fewer reliable landmarks can be identified, this inter-rater variance is substantially higher.

For the presented anatomies, no such variance estimates are available yet. This limits the interpretability of our results, since no well-defined reference values for clinically required error limits can serve as a standard. Although such a comparative analysis should be addressed in follow-up studies, we generally see promising results of our proposed method, which fit well within the error bounds of related studies of anatomies with comparable complexity [12].

The benefit of direct standard plane parameter regression is clearly the reduced amount of annotation data per data set. Costly annotations of landmarks or even segmentations of bones can be omitted and are replaced by comparably cheap adjustments of the standard planes. Also, the implementation of specific rules per body region to obtain the parameters of the landmarks is omitted. Thus, the direct regression provides a generic tool for plane parameter estimation.

Figure 6: Failure example of the automatic plane regression by the multi-head network for the clinical calcaneus data set.

The MTL approach has proven beneficial in 2 out of the 4 body regions. If a larger amount of data becomes available, we see further potential to reduce the error for all network architectures. In that case, no substantial differences between the analyzed architecture variants are to be expected. However, the MTL approach will help to reduce the number of stored parameters and to facilitate a common network for standard plane regression. Thus, the network parameters need not be loaded depending on the scanned body part, which saves time during execution.

Ethical Standards: The data was obtained retrospectively from anonymized databases and not generated intentionally for the study. For this type of study formal consent is not required.

Informed Consent: The acquisition of data from living patients had a medical indication and informed consent was not required. The acquired data sets of cadavers were available retrospectively after they had been generated during surgical courses for physicians. The corresponding consent for body donation for these purposes has been obtained.

Acknowledgements: The authors gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the Bavarian State Ministry for Science and Art. This work was partially funded by Siemens Healthcare GmbH, Erlangen, Germany.

Disclosures: All authors declare that they have no conflict of interest.


Disclaimer: The methods and information presented here are based on research and are not commercially available.

REFERENCES

[1] Atesok, K. and others, “The use of intraoperative three-dimensional imaging (ISO-c-3d) in fixation of intraarticular fractures,” Injury 38(10), 1163–1169 (2007).

[2] Beck, M., Mittlmeier, T., Gierer, P., Harms, C., and Gradl, G., “Benefit and accuracy of intraoperative 3d-imaging after pedicle screw placement: a prospective study in stabilizing thoracolumbar fractures,” European spine journal : official publication of the European Spine Society, the European Spinal Deformity Society, and the European Section of the Cervical Spine Research Society 18(10), 1469–1477 (2009).

[3] Beisemann, N., Keil, H., Swartman, B., Schnetzke, M., Franke, J., Grützner, P. A., and Vetter, S. Y., “Intraoperative 3d imaging leads to substantial revision rate in management of tibial plateau fractures in 559 cases,” Journal of orthopaedic surgery and research 14(1), 236 (2019).

[4] Carelsen, B., Haverlag, R., Ubbink, D. T., Luitse, J. S. K., and Goslings, J. C., “Does intraoperative fluoroscopic 3d imaging provide extra information for fracture surgery?,” Archives of orthopaedic and trauma surgery 128(12), 1419–1424 (2008).

[5] Franke, J., von Recum, J., Suda, A. J., Grützner, P. A., and Wendl, K., “Intraoperative three-dimensional imaging in the treatment of acute unstable syndesmotic injuries,” The Journal of bone and joint surgery. American volume 94(15), 1386–1390 (2012).

[6] Franke, J., Wendl, K., Suda, A. J., Giese, T., Grützner, P. A., and von Recum, J., “Intraoperative three-dimensional imaging in the treatment of calcaneal fractures,” The Journal of Bone and Joint Surgery-American Volume 96(9), e72–1–7 (2014).

[7] Gwak, H.-C., Kim, J.-G., Kim, J.-H., and Roh, S.-M., “Intraoperative three-dimensional imaging in calcaneal fracture treatment,” Clinics in orthopedic surgery 7(4), 483–489 (2015).

[8] Keil, H., Aytac, S., Grützner, P. A., and Franke, J., “Intraoperative bildgebung bei der operativen therapie von beckenfrakturen,” Zeitschrift für Orthopädie und Unfallchirurgie 157(4), 367–377 (2019).

[9] Kendoff, D. and others, “Intraoperative 3d imaging: value and consequences in 248 cases,” The Journal of trauma 66(1), 232–238 (2009).

[10] Schnetzke, M. and others, “Intraoperative three-dimensional imaging in the treatment of distal radius fractures,” Archives of orthopaedic and trauma surgery 138(4), 487–493 (2018).

[11] Grützner, P. A., [Röntgenhelfer 3D Handbuch intraoperative 3D-Bildgebung mit mobilen C-Bögen], Bengelsdorf und Schimmel (2004). OCLC: 723498140.

[12] Kausch, L., Thomas, S., Kunze, H., Privalov, M., Vetter, S., Franke, J., Mahnken, A. H., Maier-Hein, L., and Maier-Hein, K., “Toward automatic c-arm positioning for standard projections in orthopedic surgery,” International Journal of Computer Assisted Radiology and Surgery 15, 1095–1105 (2020).

[13] Brehler, M., “Intra-operative visualization and assessment of articular surfaces in c-arm computed tomography images,” (2016).

[14] Brehler, M., Görres, J., Franke, J., Barth, K., Vetter, S. Y., Grützner, P. A., Meinzer, H.-P., Wolf, I., and Nabers, D., “Intra-operative adjustment of standard planes in c-arm CT image data,” International Journal of Computer Assisted Radiology and Surgery 11(3), 495–504 (2016).

[15] Hirshberg, D. A., Loper, M., Rachlin, E., Tsoli, A., Weiss, A., Corner, B., and Black, M. J., “Evaluating the automated alignment of 3d human body scans,” in [Proceedings of the 2nd International Conference on 3D Body Scanning Technologies, Lugano, Switzerland, 25-26 October 2011], 76–86, Hometrica Consulting - Dr. Nicola D’Apuzzo (2011).

[16] Tan, W., Kang, Y., Dong, Z., Chen, C., Yin, X., Su, Y., Zhang, Y., Zhang, L., and Xu, L., “An approach to extraction midsagittal plane of skull from brain CT images for oral and maxillofacial surgery,” IEEE Access 7, 118203–118217 (2019).

[17] Qi, X., Belle, A., Shandilya, S., Chen, W., Cockrell, C., Tang, Y., Ward, K. R., Hargraves, R. H., and Najarian, K., “Ideal midline detection using automated processing of brain CT image,” Open Journal of Medical Imaging 03(2), 51–59 (2013).


[18] Thomas, S., “Automatic image analysis of c-arm computed tomography images for ankle joint surgeries,” (2020).

[19] Lu, X., Georgescu, B., Zheng, Y., Otsuki, J., and Comaniciu, D., “AutoMPR: Automatic detection of standard planes in 3d echocardiography,” in [2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro], 1279–1282, IEEE (2008).

[20] Li, Y., Khanal, B., Hou, B., Alansary, A., Cerrolaza, J. J., Sinclair, M., Matthew, J., Gupta, C., Knight, C., Kainz, B., and Rueckert, D., “Standard plane detection in 3d fetal ultrasound using an iterative transformation network,” in [Medical Image Computing and Computer Assisted Intervention – MICCAI 2018], Frangi, A. F., Schnabel, J. A., Davatzikos, C., Alberola-López, C., and Fichtinger, G., eds., 392–400, Springer International Publishing, Cham (2018).

[21] Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K., “Spatial transformer networks,” in [Advances in Neural Information Processing Systems], Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., and Garnett, R., eds., 28, 2017–2025, Curran Associates, Inc. (2015).

[22] Vigneault, D. M., Xie, W., Ho, C. Y., Bluemke, D. A., and Noble, J. A., “ω-net (omega-net): Fully automatic, multi-view cardiac mr detection, orientation, and segmentation with deep neural networks,” Medical Image Analysis 48, 95–106 (2018).

[23] Martin Vicario, C., Kordon, F., Denzinger, F., Weiten, M., Thomas, S., Kausch, L., Jochen, F., Keil, H., Maier, A., and Kunze, H., “Automatic plane adjustment of orthopedic intraoperative flat panel detector ct-volumes,” in [Proc Med Image Comput Comput Assist Interv], Martel, A. L., Abolmaesumi, P., Stoyanov, D., Mateus, D., Zuluaga, M. A., Zhou, S. K., Racoceanu, D., and Joskowicz, L., eds., 486–495, Springer International Publishing (2020).

[24] Caruana, R., “Multitask learning,” Machine Learning 28(1), 41–75 (1997).

[25] Baxter, J., “A bayesian/information theoretic model of learning to learn via multiple task sampling,” Machine Learning 28(1), 7–39 (1997).

[26] Zhou, Y., Barnes, C., Lu, J., Yang, J., and Li, H., “On the continuity of rotation representations in neural networks,” in [2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)], 5738–5746, IEEE (2019).

[27] Simonyan, K. and Zisserman, A., “Very deep convolutional networks for large-scale image recognition,” in [International Conference on Learning Representations], (2015).

[28] He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)], 770–778 (2016).

[29] Bui, M., Albarqouni, S., Schrapp, M., Navab, N., and Ilic, S., “X-ray posenet: 6 dof pose estimation for mobile x-ray devices,” in [2017 IEEE Winter Conference on Applications of Computer Vision (WACV)], 1036–1044 (2017).

[30] Ruder, S., “An overview of multi-task learning in deep neural networks.”

[31] He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [Proceedings of the IEEE international conference on computer vision], 1026–1034 (2015).
