The Spine Journal 13 (2013) 99–109
Clinical Study
Spinal fusion for chronic low back pain: systematic review on the accuracy of tests for patient selection
Paul C. Willems, MD, PhD (a,*), J. Bart Staal, PT, PhD (b), Geert H.I.M. Walenkamp, MD, PhD (a), Rob A. de Bie, PT, PhD (c)
(a) Department of Orthopaedics, Research School CAPHRI, Maastricht University Medical Center, P. Debyelaan 25, PO Box 5800, 6202 AZ Maastricht, The Netherlands
(b) Scientific Institute for Quality of Healthcare, Radboud University Medical Center, PO Box 9101, 6500 HB Nijmegen, The Netherlands
(c) Department of Epidemiology, Research School CAPHRI, Maastricht University Medical Center, PO Box 5800, 6202 AZ Maastricht, The Netherlands
Received 8 December 2011; revised 27 April 2012; accepted 1 October 2012
Abstract
BACKGROUND CONTEXT: Spinal fusion is a common but controversial treatment for chronic low back pain (LBP) with outcomes similar to those of programmed conservative care. To improve the results of fusion, tests for patient selection are used in clinical practice.
PURPOSE: To determine the prognostic accuracy of tests for patient selection that are currently used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion.
STUDY DESIGN: Systematic review of the literature.
SAMPLE: Studies that compared the results of magnetic resonance imaging (MRI), provocative discography, facet joint blocks, orthosis immobilization, and temporary external fixation with the clinical outcome of patients who underwent spinal fusion for chronic LBP.
OUTCOME MEASURES: To determine the prognostic accuracy of tests to predict the clinical outcome of spinal fusion in terms of sensitivity, specificity, and likelihood ratios (LRs).
METHODS: Data sources PubMed (1966 to November 2010), EMBASE (1974 to November 2010), and reference lists were searched without restriction by language or publication status. Two reviewers independently selected studies for inclusion, extracted data for analysis, and assessed the risk of bias with the Quality Assessment of Diagnostic Accuracy Studies checklist, modified for prognostic studies. Discrepancies were resolved by consensus.
RESULTS: Ten studies met the eligibility criteria. Immobilization by an orthosis (median [range] positive LR, 1.10 [0.94–1.13] and negative LR, 0.92 [0.39–1.12]), provocative discography (median [range] positive LR, 1.18 [0.70–1.71] and negative LR, 0.74 [0.24–1.40]), and temporary external fixation (median [range] positive LR, 1.22 [1.02–1.74] and negative LR, 0.58 [0.15–0.94]) failed to show clinically useful prognostic accuracy. Statistical pooling was not feasible because of different test protocols, variability in outcome assessment, and heterogeneous patient populations. No studies reporting on facet joint blocks or MRI could satisfy the inclusion criteria. Obscure patient selection, high risk of verification bias, and outcome assessment with poorly validated instruments precluded strong conclusions for all tests.
CONCLUSIONS: No subset of patients with chronic LBP could be identified for whom spinal fusion is a predictable and effective treatment. Best evidence does not support the use of current tests for patient selection in clinical practice. © 2013 Elsevier Inc. All rights reserved.
Keywords: Chronic low back pain; Spinal fusion; Patient selection; Systematic review; Test accuracy
FDA device/drug status: Not applicable.
Author disclosures: PCW: Nothing to disclose. JBS: Nothing to disclose. GHIMW: Nothing to disclose. RAdB: Nothing to disclose.
* Corresponding author. Department of Orthopaedics, Research School CAPHRI, Maastricht University Medical Center, P. Debyelaan 25, PO Box 5800, 6202 AZ Maastricht, The Netherlands. Tel.: 0031-433875038; fax: 0031-433874893.
E-mail address: p[email protected] (P.C. Willems)
1529-9430/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.spinee.2012.10.001
Introduction
Chronic low back pain (LBP) imposes huge costs on society, either directly by health-care consumption or indirectly by lost productivity because of work absenteeism and early retirement [1,2]. If conservative treatment fails [3], lumbar spinal fusion may be performed to stabilize a painful segment. However, its results are variable and hard to predict for the individual patient [4]. Two randomized controlled trials compared fusion with cognitive behavioral exercise therapy [5] or an intensive rehabilitation program [6] and reported equal improvement for fusion and conservative treatment. As spinal fusion surgery is associated with greater complications [7], health-care costs [8], and morbidity [9,10], there is only a rationale for fusion if its results are improved by identifying and operating only on those patients who will actually benefit from fusion.

Context
Spinal surgeons often use the results of magnetic resonance imaging, discography, facet joint blocks, and brace immobilization when selecting patients with degenerative disease and chronic low back pain for surgeries such as fusion or disc replacement.
Contribution
In this review, the authors found that these tests have been inadequately studied in a systematic fashion. In the few cases in which the tests had the best data analyzed, none of them were demonstrated to be accurate or useful.
Implication
Surgery for chronic low back pain (without neurological impingement, instability, etc) is controversial at best. There is no clear pathognomonic or specific pathologic lesion, yet the authors discovered that strong data predicts clinically serious low back pain syndromes. Diagnostic tests have proven to be nonspecific and their accuracy poor in determining treatment success. Outcomes are universally inferior to those expected for clinically well-defined degenerative conditions (herniated nucleus pulposus, stenosis, degenerative spondylolisthesis). Despite a nearly 60-year concerted effort and the escalation of complex surgical approaches, little clinically significant progress has been made to improve the situation for these patients.
The Editors

For patient selection in practice, clinicians rely on tests that predict the outcome of spinal fusion. These are either genuine prognostic tests or diagnostic tests for prognostic purposes with the underlying assumption that the presence or absence of painful disc degeneration will identify subgroups of patients with a good or bad patient-important outcome of spinal fusion, respectively [11,12]. The most commonly used tests are magnetic resonance imaging (MRI), provocative discography, facet joint blocks (all tests that intend to identify the source of LBP), immobilization by an orthosis, and temporary external transpedicular fixation of suspect spinal levels (both prognostic tests that intend to mimic the immobilizing effect of a spinal fusion).
Considering that false-positive test results will lead to unnecessary invasive and expensive surgery with potential complications and false-negative test results will withhold adequate treatment from patients who may benefit from fusion, our systematic review aimed to determine the accuracy of tests currently used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion.
Methods
For the purpose of this review, we investigated the most commonly used tests in clinical practice: MRI, which has been recommended as the imaging study of choice for the evaluation of patients with back pain [13–15], provocative discography [16], facet joint blocks [17], immobilization by an orthosis [18], and immobilization by temporary external fixation [19]. These tests are described in detail in Table 1.
Data sources and searches
A literature search was conducted according to the guidelines by Devillé et al. [20]. PubMed (1966 to November 2010) and EMBASE (1974 to November 2010) databases were explored, and we used search terms for relevant test procedures, study design, and patient population. For the tests, the following terms were used: immobilization, thoracolumbosacral orthosis, surgical cast(s), provocative discography, discography, temporary external fixation, facet joint blocks, zygapophyseal joint blocks, imaging, and MRI. For study design, we used the terms prognosis, prognostic, accuracy, predictive, diagnosis, diagnostic test(s), and diagnostic technique(s), and for patient population, the terms lumbar spine, lumbar vertebrae, lumbosacral, spinal, LBP, degenerative disc disease, intervertebral disc(k), disc degeneration, failed back surgery syndrome, spondylosis, spinal fusion, and spondylodesis were used. Both Medical Subject Headings terms and free text words were entered.
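The three groups of search terms can be combined into one Boolean query. The sketch below is illustrative only: the review does not report its exact query syntax, so joining terms with OR inside a group and AND between groups is an assumption, the helper name `build_query` is hypothetical, and only a subset of the listed terms is shown.

```python
def build_query(*term_groups):
    """Join terms with OR inside each group and AND between groups.

    Assumed combination logic; the review does not state its exact syntax.
    """
    clauses = []
    for group in term_groups:
        # Quote multi-word terms so they are treated as phrases.
        quoted = [f'"{term}"' if " " in term else term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


# Subsets of the terms listed in the text, one list per concept.
tests = ["provocative discography", "temporary external fixation",
         "facet joint blocks", "MRI"]
design = ["prognosis", "accuracy", "diagnostic test"]
population = ["lumbar spine", "spinal fusion", "spondylodesis"]

query = build_query(tests, design, population)
print(query)
```

Keeping each concept in its own list makes it straightforward to add the remaining terms or to translate the result to a specific database syntax.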
Study selection
Two authors (PCW and JBS) screened the titles and abstracts of all references identified by the search to determine whether they met the following inclusion criteria:
1. Patients should suffer for at least 3 months from LBP without signs of nerve root impingement, spinal stenosis, instability, or deformity.
2. Studies should contain both patients with a positive and patients with a negative index test result, who subsequently underwent spinal fusion.
3. Clinical outcome after fusion, which was considered as the reference standard, should be presented per individual patient in such a way that a relevant clinical improvement cutoff could be defined for analysis and outcome could be dichotomized into success or failure.
4. Pain, subjective improvement, back-specific disability, disability for work, or patient satisfaction should have been incorporated as a clinically relevant outcome measure.
5. Studies should include at least 20 patients.
6. There were no restrictions by language.
7. Study populations with objective neurologic motor deficit, fracture, infectious disease, ankylosing spondylitis, neoplasm, congenital or adolescent idiopathic scoliosis, kyphosis, or adult scoliosis were excluded.

Table 1
Description of investigated tests for patient selection
MRI [14,15]
Facet joint degeneration and abnormal disc morphology can be identified on MRI of the spine. Loss of T2-signal intensity, collapse, Modic changes, and high-intensity zones are commonly observed in the disc and presumed to be a source of pain
Provocative discography [16]
Under sterile conditions, a stiletted needle is advanced into the center of the intended disc space. Under fluoroscopic control, a contrast agent is injected. If this injection provokes pain similar to the patient's usual pain and if one or two control discs adjacent to the suspect disc do not elicit usual pain, the test is considered positive. The extent of degeneration of the injected discs is determined on fluoroscopy or a computed tomography scan immediately after the procedure
Facet joint blocks [17]
Using an aseptic technique and fluoroscopic guidance, local anesthetic is injected into the facet joint. Between 0.5 and 3 h after injection, the amount of pain relief is recorded. In case of substantial pain relief, the test is considered positive
Orthosis immobilization [18]
A standard brace or corset is prescribed, or a plaster cast can be applied. In a pantaloon cast, one hip is fixed within the cast for better immobilization of the lumbosacral junction. Patients are expected to wear the orthosis for at least 2 to 4 wks and are encouraged to perform as many daily life activities as possible. In case of significant pain relief, the test is considered positive
Temporary external transpedicular fixation [19]
Under general anesthesia, antibiotic prophylaxis, and fluoroscopy, two screws are inserted percutaneously through the pedicles into the vertebra above and two screws into the vertebra below the suspect discs, respectively. Postoperatively, the protruding screw ends are fixed externally with two vertical bars, which immobilizes the discs of interest. In case of adequate pain relief, the test is considered positive. Optionally, immobilization can be discontinued without the knowledge of the patient by fixing the bars horizontally (dynamization), which should annul pain relief
MRI, magnetic resonance imaging.

Full publications of studies, which were considered as potentially relevant by both authors, were retrieved. The articles were read and checked for final inclusion independently. Any disagreement with regard to study selection was discussed in consensus meetings. In cases where disagreement persisted, a third reviewer (RAdB) was consulted for the final decision. The references of the articles identified by the search were checked for additional eligible studies.
Data extraction and assessment of bias
Relevant study data were retrieved by the same two reviewers using standardized forms. Extracted information included standard reference data (first author, journal, and publication year), number of patients, characteristics of study population before surgery (ie, age, sex, severity and duration of pain, and/or disability), index test, spinal fusion method, outcome measures, and clinical outcome.
The two reviewers independently assessed the risk of bias of included studies by means of a modified version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist [21]. The QUADAS is a generally acknowledged checklist to assess the quality of primary studies of diagnostic accuracy. As there are no gold standard criteria for quality assessment of studies of prognostic accuracy, we modified the QUADAS checklist as follows (Table 2): Items 1 and 2 of the original QUADAS remained in the modified version. The original Item 3 (Is the reference standard likely to correctly classify the target condition?) was left out because for the selected studies in the present review, the reference standard and target condition are the same (ie, clinical outcome after fusion). Instead, whether the reference standard was assessed by valid measures of acceptable quality was included as Item 3. Item 4 of the original QUADAS (Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change in the time period between these tests?) was replaced by a follow-up criterion: to obtain a reliable estimation of clinical outcome after lumbar spinal fusion (ie, reference standard), the length of the follow-up should be at least 2 years [22] (modified Item 4). The original Item 5 (Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis?) was left out because for inclusion in the present review, all analyzed patients from the selected studies had undergone fusion and subsequent clinical outcome assessment. Item 6 of the original QUADAS (Did patients receive the same reference standard regardless of the index test result?) remained unchanged (modified Item 5). Item 7 of the original QUADAS (Was the reference standard independent of the index test or did the index test form part of the reference standard?) was removed because the outcome of fusion was assessed much later than the index test. The original Item 8 was included in the modified QUADAS version as Item 6. The modified Item 7 was added to verify whether an objective and clearly defined cutoff point was mentioned to determine whether the index test was positive or negative. Item 9 of the original QUADAS (Was the execution of the reference standard described in sufficient detail in order to permit replication of the test?) was transformed into whether the assessment of clinical outcome after fusion was adequately addressed according to the accepted standards of clinical importance [23] (modified Item 8). Item 10 of the original QUADAS (Were the index test results interpreted without knowledge of the reference standard?) was left out because fusion outcome was assessed much later than the preoperative index test. The original Items 11, 12, 13, and 14 were included as Items 9, 10, 11, and 12, respectively. Disagreements between both reviewers were discussed and resolved in a consensus meeting.

Table 2
Modified QUADAS checklist: criteria to assess risk of bias
1. Was the spectrum of patients representative of the patients who will receive the index test in practice?
2. Were selection criteria clearly described?
3. Were the outcomes used to assess recovery collected by means of validated measures of acceptable quality?*
4. Was a sufficiently long follow-up period (2 y or more) used to assess the outcome of the spinal fusion operation?*
5. Did all patients receive spinal fusion followed by the outcome assessment regardless of the index test result?
6. Was the execution of the index test described in sufficient detail to permit replication of the test?
7. Was a clear cutoff point used to qualify positive versus negative results of the index test?*
8. Did the effect sizes that were used to consider patients as being recovered (ie, the reference standard) meet accepted standards of clinical relevance, that is, a minimal important change of 30% or more?*
9. Were the clinical outcomes after fusion assessed without knowledge of the results of the index test?
10. Were the same clinical data available when index test results were interpreted as would be available when the test is used in practice?
11. Were uninterpretable results of the index test reported?
12. Were withdrawals from the study explained?
QUADAS, Quality Assessment of Diagnostic Accuracy Studies [21].
* Items 3, 4, 7, and 8 are items modified for prognostic accuracy.

Data synthesis and analysis
By combining outcome (dichotomized into success or failure) with the test results (positive or negative), two-by-two tables with four cells (true positives, false negatives, false positives, and true negatives) could be generated, and test qualifiers, such as sensitivity, specificity, predictive values, and likelihood ratios (LRs) with 95% confidence intervals (CIs), were calculated. Calculations were done with Meta-DiSc statistical software version 1.4 (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) [24]. Statistical pooling was only performed if studies on a specific index test were not hampered by statistical or clinical heterogeneity. Statistical heterogeneity was defined as nonoverlapping 95% CIs for estimates of sensitivity and specificity and a difference in these estimates among the studies of more than 20% [25,26]. We considered studies as clinically heterogeneous when patient groups, outcome measures, or the execution of index tests were different. In cases of statistical or clinical heterogeneity, we refrained from statistical pooling, and the results were presented per individual study.
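The test qualifiers described above follow directly from the four cells of a two-by-two table. The sketch below is a minimal illustration with hypothetical counts, not a reproduction of Meta-DiSc: the confidence interval uses the common log method for a ratio of two proportions (whether Meta-DiSc computes it identically is not stated here), and the heterogeneity check reflects one possible reading of the stated definition (any pair of studies with nonoverlapping CIs and point estimates more than 0.20 apart). All function names are assumptions.

```python
import math
from itertools import combinations


def accuracy_from_2x2(tp, fn, fp, tn):
    """Test qualifiers from a two-by-two table (index test vs. fusion outcome)."""
    sens = tp / (tp + fn)  # share of successes with a positive test
    spec = tn / (tn + fp)  # share of failures with a negative test
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),   # undefined when specificity is 1
        "lr_neg": (1 - sens) / spec,   # undefined when specificity is 0
    }


def lr_confidence_interval(lr, a, n1, b, n2, z=1.96):
    """Log-method CI for a ratio of two proportions, (a/n1) / (b/n2)."""
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return lr * math.exp(-z * se), lr * math.exp(z * se)


def statistically_heterogeneous(estimates, max_diff=0.20):
    """estimates: list of (point, ci_low, ci_high), one tuple per study.

    Assumed reading: heterogeneous if any two studies have nonoverlapping
    CIs and point estimates that differ by more than max_diff.
    """
    for (p1, lo1, hi1), (p2, lo2, hi2) in combinations(estimates, 2):
        no_overlap = hi1 < lo2 or hi2 < lo1
        if no_overlap and abs(p1 - p2) > max_diff:
            return True
    return False


# Hypothetical counts, not taken from any included study.
tp, fn, fp, tn = 30, 10, 20, 20
result = accuracy_from_2x2(tp, fn, fp, tn)
# Positive LR = (tp / (tp + fn)) / (fp / (fp + tn))
lr_pos_ci = lr_confidence_interval(result["lr_pos"], tp, tp + fn, fp, fp + tn)
print(result)
print(lr_pos_ci)
```

With zero cells the ratios are undefined, so a real analysis would need a continuity correction or another convention; that choice is left open here.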
Results
The Figure shows the flow diagram of studies from initial results of database searches to final inclusion, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 2009 [27]. Of the 22 selected full articles, six studies in which only patients with a positive index test had been selected for lumbar fusion were excluded [28–33]. We also excluded six other studies, in which test accuracy could not be determined because only mean values of recovery were reported without proportions of patients with success or failure of treatment [17,34–38]. Finally, 10 studies met the inclusion criteria [18,39–47].
Characteristics of included studies
Study characteristics are listed in Table 3. Three articles concerned immobilization by an orthosis: a fiberglass pantaloon cast [18], a canvas corset [39], or a plaster pantaloon cast [40]. Four articles reported on discography, of which two studies focused on provocative discography of suspect levels [42,47], one study on provocation of the levels adjacent to the intended fusion [41], and the fourth study focused on the amount of degeneration as registered at discography in a group of patients with a positive discographic pain response [45]. Three articles evaluated trial immobilization by external fixation, either with [43,46] or without dynamization [44].
The sample sizes of the included studies ranged from 22 to 168. Two studies reported exclusively on patients without previous spine surgery [42,45]. The length of the follow-up ranged from 6 months or "when fusion was noted" [47] to 12 years. Either anterior or posterolateral fusion was performed. In the studies in which both procedures were performed, no difference in outcome between the two types of fusion was reported [40–43]. Pain appeared to be the only measure of outcome that was consistently reported in all included articles. Five studies used a visual analog scale to score pain, of which three studies [40,41,43] defined a cutoff point of at least 30% decrease in pain as a clinically relevant improvement; one study, a decrease of at least 75% [45]; and one study, "little or no pain on a visual analog scale" [44]. The other studies used a subjective pain scale (pain free or significant pain relief vs. insignificant or no pain relief) [18,39,42,46,47]. No study reporting on facet joint blocks or MRI could meet the inclusion criteria.

Figure. PRISMA flow diagram of combined literature search and selection.
1,055 records identified through literature search of PubMed; 115 records identified through literature search of EMBASE
1,143 records identified through combined literature search after removal of duplicates
1,143 records screened; 1,124 records excluded based on evaluation of titles and abstracts
19 full-text articles assessed for eligibility; 3 additional full-text articles included by reference screening
12 full-text articles excluded: 6 studies only included patients with a positive index test; 6 studies reported mean outcome instead of proportions of patients with successful outcome or failure
10 studies included in qualitative synthesis

Risk of bias
The two reviewers agreed on 77 of the 120 items (64%) scored. After discussion, consensus could be reached on all items. Most disagreements were because of reading errors or ambiguous reporting. The risk of bias ratings are listed in Table 4. The most prevalent shortcomings were as follows: uninterpretable index test results were not reported, no clear cutoff point was defined for positive and negative index test results, and in all but three studies, not all patients who were tested underwent fusion regardless of the index test result (verification bias).
Test accuracy
Table 5 summarizes test accuracies from the includedstudies.
Orthosis
An orthosis or cast immobilization could neither confirm nor rule out a good outcome after spinal fusion, with variable sensitivity (0.43–0.94; range of positive LR, 0.94–1.13) and specificity (0.14–0.61; range of negative LR, 0.39–1.12) [18,39,40].
Provocative discography
The four studies reporting on the prognostic value of discography showed variable sensitivity (0.40–0.88; range of positive LR, 0.70–1.71) and specificity (0.25–0.48; range of negative LR, 0.24–1.40) [41,42,45,47]. In one study, LRs were statistically significant (positive LR, 1.71; 95% CI, 1.21–2.41 and negative LR, 0.24; 95% CI, 0.13–0.43), but specificity was low, 0.48, meaning that only half of the patients who would not improve after fusion could be detected [42].
Temporary external transpedicular fixation
Sensitivity was generally high (0.80–0.93; range of positive LR, 1.02–1.74), but specificity was low (0.20–0.47; range of negative LR, 0.15–0.94) [43,44,46]. In one study, LRs were statistically significant (positive LR, 1.74; 95% CI, 1.07–2.83 and negative LR, 0.15; 95% CI, 0.03–0.65), but with a specificity of only 0.47 [46].
Statistical pooling was not feasible because of different test protocols with no clear cutoff point for a positive versus negative result, variability in outcome assessment, and heterogeneous patient populations (varying diagnoses and different mix of patients with or without prior spine surgery between studies, Table 3).

Table 3
Characteristics of included primary studies on the clinical outcome of lumbar spinal fusion

Thoracolumbosacral orthosis

Axelsson et al. [39]
Setting: Tertiary (university hospital)
Included patients (n): 50
Patients for current analysis (n): 50
Mean age ± SD (range): 44 (20–68)
Male, n (%): 28 (56)
Patient characteristics: Intractable LBP: spondylolisthesis (n=24), facet/disc degeneration (n=11), or postlaminectomy syndrome (n=15), duration not specified
Study design: Retrospective
Index test and criterion for positive result: Thoracolumbosacral orthosis or canvas corset; positive in case of ≥50% subjective pain relief
Follow-up in mo (range): 24
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Pain free or significantly improved on a five-point pain scale and satisfied on a three-point satisfaction scale*

Rask and Dall [18]
Setting: Tertiary (university hospital)
Included patients (n): 45
Patients for current analysis (n): 25
Mean age ± SD (range): 43.7 (20–61)
Male, n (%): Not specified
Patient characteristics: >6 mo back pain (mean, 3.9 y), no neurologic motor deficit, herniation, or olisthesis >2 mm, 38% with prior spine surgery
Study design: Retrospective
Index test and criterion for positive result: Fiberglass pantaloon cast; positive in case of pain relief that returned after the removal of the cast
Follow-up in mo (range): Minimum of 6
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Significant subjective pain relief*

Willems et al. [40]
Setting: Tertiary (specialized hospital)
Included patients (n): 257
Patients for current analysis (n): 107
Mean age ± SD (range): 40 ± 8.8 (range not specified)
Male, n (%): 39 (36)
Patient characteristics: Incapacitating LBP, mean, 3.7 (0.5–31) y, no neurologic motor deficit and routine testing indecisive, 65% with prior spine surgery (N=70)
Study design: Retrospective
Index test and criterion for positive result: Pantaloon plaster cast; positive in case of subjective substantial pain relief in the cast
Follow-up in mo (range): Median of 76 (15–144)
Method of fusion: Posterolateral (n=79) or anterior fusion (n=28)
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

Provocative discography

Colhoun et al. [42]
Setting: Tertiary (orthopedic hospital)
Included patients (n): 195
Patients for current analysis (n): 168
Mean age ± SD (range): 39.1 (17–70)
Male, n (%): 86 (51)
Patient characteristics: Persistent LBP, no previous back surgery, duration not specified
Study design: Retrospective
Index test and criterion for positive result: Provocative discography; positive in case of typical pain reproduction, which was not present in adjacent control discs
Follow-up in mo (range): Mean of 43 (24–120)
Method of fusion: Anterior or posterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): Complete pain relief or significant subjective improvement, resumption of work/normal duties, and no intake of analgesics*

Esses et al. [47]
Setting: Tertiary (university hospital)
Included patients (n): 32
Patients for current analysis (n): 22
Mean age ± SD (range): 41 (31–57)
Male, n (%): 18 (84)
Patient characteristics: Long-standing LBP, mean duration of 6.2 (1–20) y, 54% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: Provocative discography, no control discs; positive in case of typical pain reproduction
Follow-up in mo (range): "When fusion was noted"
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Complete or significant relief of pain*

Gill and Blumenthal [45]
Setting: Tertiary (orthopedic institute)
Included patients (n): 53
Patients for current analysis (n): 53
Mean age ± SD (range): 34 (21–50)
Male, n (%): 36 (68)
Patient characteristics: LBP with a mean disability of 11 (3–120) mo, all selected by concordant pain response provoked at discography L5–S1
Study design: Design not specified
Index test and criterion for positive result: Discography image of L5–S1, no control disc(s); positive in case of annular tear extending to the periphery
Follow-up in mo (range): Mean of 36 (24–54)
Method of fusion: Anterior fusion with allograft (n=48) or autograft (n=5)
Criteria for positive reference standard (= fusion outcome): Relief of ≥75% of initial back pain on VAS, return to work, and no use of narcotics

Willems et al. [41]
Setting: Tertiary (specialized hospital)
Included patients (n): 197
Patients for current analysis (n): 82
Mean age ± SD (range): 40 ± 8.5 (range not specified)
Male, n (%): 26 (32)
Patient characteristics: Incapacitating LBP >1 y (mean duration not specified), no neurologic motor deficit, and routine testing indecisive, 65% with prior spine surgery (N=53)
Study design: Retrospective
Index test and criterion for positive result: Provocative discography of levels adjacent to intended fusion; positive in case of no or unfamiliar pain reproduction
Follow-up in mo (range): Mean of 80 (15–144)
Method of fusion: Posterolateral or anterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

TETF

Elmans et al. [43]
Setting: Tertiary (specialized hospital)
Included patients (n): 330
Patients for current analysis (n): 123
Mean age ± SD (range): 42 ± 9 (range not specified)
Male, n (%): 45 (37)
Patient characteristics: Incapacitating LBP >1 y (mean duration of 6 ± 5 y) with inconclusive routine testing, 62% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: TETF with dynamization; positive if VAS in placebo was ≥30 points more than VAS in fixation
Follow-up in mo (range): Median of 79 (15–44)
Method of fusion: Anterior (n=33) or posterolateral (n=90) fusion
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

Heini et al. [44]
Setting: Tertiary (university hospital)
Included patients (n): 63
Patients for current analysis (n): 36
Mean age ± SD (range): 48 (26–67)
Male, n (%): 22 (62)
Patient characteristics: LBP with a mean duration of 5 (1–20) y, 67% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: TETF without dynamization; positive if pain on VAS and use of analgesics decreased sufficiently (estimated by surgeon)
Follow-up in mo (range): Mean of 32 (23–60)
Method of fusion: Posterolateral fusion, except three dynamic fixations and one anterior fusion
Criteria for positive reference standard (= fusion outcome): No or little pain on a VAS and no pain medication*

Jeanneret et al. [46]
Setting: Secondary (regional hospital)
Included patients (n): 101
Patients for current analysis (n): 43
Mean age ± SD (range): 48 (22–74)
Male, n (%): 25 (58)
Patient characteristics: Chronic LBP with or without leg pain, duration not specified, routine testing indeterminate
Study design: Design not specified
Index test and criterion for positive result: TETF with dynamization; positive if pain was reduced at stabilization and returned at destabilization
Follow-up in mo (range): Mean of 50 (24–92)
Method of fusion: Posterior or anterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): Almost completely pain free, no pain medication and working*

SD, standard deviation; LBP, low back pain; VAS, visual analog scale; TETF, temporary external transpedicular fixation.
* No validated outcome measure reported.

Table 4
Risk of bias: number of quality criteria of the modified QUADAS checklist* that were met

| Source | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Thoracolumbosacral orthosis | | | | | | | | | | | | | |
| Axelsson et al. [39] | + | - | - | + | + | + | + | - | ? | + | ? | ? | 6 |
| Rask and Dall [18] | + | + | - | - | - | + | - | - | + | + | - | + | 6 |
| Willems et al. [40] | - | + | + | + | - | + | - | + | + | + | - | + | 8 |
| Provocative discography | | | | | | | | | | | | | |
| Colhoun et al. [42] | + | ? | - | + | + | + | - | - | ? | + | ? | ? | 5 |
| Esses et al. [47] | + | - | - | - | - | - | ? | - | - | + | - | ? | 2 |
| Gill and Blumenthal [45] | - | - | ? | + | + | ? | + | - | - | ? | ? | + | 4 |
| Willems et al. [41] | - | + | + | + | - | + | ? | + | + | - | - | + | 7 |
| Temporary external transpedicular fixation | | | | | | | | | | | | | |
| Elmans et al. [43] | - | - | + | + | - | + | + | + | + | ? | ? | + | 7 |
| Heini et al. [44] | - | - | + | + | - | + | - | - | - | + | + | + | 6 |
| Jeanneret et al. [46] | + | ? | - | + | - | + | - | - | - | + | - | ? | 4 |

QUADAS, Quality Assessment of Diagnostic Accuracy Studies; +, yes; -, no; ?, unclear.
* See Table 2 for complete modified QUADAS checklist.

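Because pooling was not feasible, the review summarizes each test by the median and range of the per-study likelihood ratios. The short sketch below recomputes those summaries from the LR point estimates reported per study in Table 5; the dictionary layout and the helper name `summarize` are illustrative assumptions.

```python
from statistics import median

# Per-study LR point estimates as reported in Table 5.
positive_lr = {
    "orthosis": [0.94, 1.10, 1.13],
    "provocative discography": [1.71, 0.70, 1.37, 0.99],
    "temporary external fixation": [1.22, 1.02, 1.74],
}
negative_lr = {
    "orthosis": [1.12, 0.39, 0.92],
    "provocative discography": [0.24, 1.40, 0.47, 1.01],
    "temporary external fixation": [0.58, 0.94, 0.15],
}


def summarize(values):
    """Return (median, minimum, maximum), with the median rounded to 2 decimals."""
    return round(median(values), 2), min(values), max(values)


for test in positive_lr:
    print(test, summarize(positive_lr[test]), summarize(negative_lr[test]))
```

For an even number of studies, as with discography, `statistics.median` averages the two middle values, which matches the medians quoted in the abstract.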
Table 5
Summary prognostic accuracy of orthosis immobilization, provocative discography, and external transpedicular fixation for spinal fusion outcome

| Source | Sample size | Sensitivity | Specificity | Positive predictive value | Negative predictive value | Positive LR (95% CI) | Negative LR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Thoracolumbosacral orthosis | | | | | | | |
| Axelsson et al. [39] | 50 | 0.61 | 0.35 | 0.64 | 0.32 | 0.94 (0.60–1.46) | 1.12 (0.52–2.41) |
| Rask and Dall [18] | 25 | 0.94 | 0.14 | 0.74 | 0.50 | 1.10 (0.80–1.52) | 0.39 (0.03–5.40) |
| Willems et al. [40] | 107 | 0.43 | 0.61 | 0.44 | 0.61 | 1.13 (0.71–1.80) | 0.92 (0.67–1.28) |
| Provocative discography | | | | | | | |
| Colhoun et al. [42] | 168 | 0.88 | 0.48 | 0.88 | 0.48 | 1.71 (1.21–2.41) | 0.24 (0.13–0.43) |
| Esses et al. [47] | 22 | 0.40 | 0.43 | 0.60 | 0.25 | 0.70 (0.29–1.70) | 1.40 (0.54–3.62) |
| Gill and Blumenthal [45] | 53 | 0.81 | 0.41 | 0.74 | 0.50 | 1.37 (0.89–2.10) | 0.47 (0.20–1.13) |
| Willems et al. [41] | 82 | 0.73 | 0.27 | 0.45 | 0.55 | 0.99 (0.76–1.30) | 1.01 (0.49–2.08) |
| Temporary external transpedicular fixation | | | | | | | |
| Elmans et al. [43] | 123 | 0.80 | 0.34 | 0.46 | 0.71 | 1.22 (0.98–1.51) | 0.58 (0.31–1.11) |
| Heini et al. [44] | 36 | 0.81 | 0.20 | 0.45 | 0.57 | 1.02 (0.74–1.40) | 0.94 (0.24–3.60) |
| Jeanneret et al. [46] | 43 | 0.93 | 0.47 | 0.77 | 0.78 | 1.74 (1.07–2.83) | 0.15 (0.03–0.65) |

LR, likelihood ratio; 95% CI, 95% confidence interval.

Discussion
We systematically reviewed the accuracy of tests that are commonly used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion. With LRs approaching one, all tests failed to accurately predict the outcome of spinal fusion. In particular, specificity was consistently low, meaning that for all tests, high proportions of false-positive test results will lead to unnecessary invasive and expensive surgery.
The lack of proven accuracy of the current tests is reflected in the high degree of clinical uncertainty in decision making regarding fusion surgery for chronic LBP [4,48]. Studies among spine surgeons show that there is no consensus in treatment strategy [49,50], and our results confirm that in many clinical practices, patients are scheduled for fusion on the basis of tests, of which the accuracy is insufficient or at best unknown. In this respect, it is disappointing that for MRI, the recommended evaluation tool for LBP, no studies could be identified to determine its accuracy for surgical decision making.
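The relation between the sensitivity, specificity, and LRs reported in Table 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in this review. The function names and the pre-test probability of 0.60 are assumptions introduced here. The second function applies the standard odds form of Bayes' rule and shows why an LR close to one barely shifts the probability of a good outcome.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a dichotomous test."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


def post_test_probability(pre_test_probability, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Sensitivity and specificity of Jeanneret et al. [46] as listed in Table 5.
pos_lr, neg_lr = likelihood_ratios(0.93, 0.47)
print(f"positive LR {pos_lr:.2f}, negative LR {neg_lr:.2f}")

# Hypothetical pre-test probability of a good outcome (assumption, not a study result).
PRE_TEST = 0.60
for lr in (1.0, pos_lr, neg_lr):
    print(f"LR {lr:.2f} -> post-test probability {post_test_probability(PRE_TEST, lr):.2f}")
```

Values recomputed from the rounded sensitivity and specificity can differ in the second decimal from the LRs printed in Table 5, which were presumably derived from the underlying counts.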
Limitations
As pain appeared to be the only measure of outcome that was consistently incorporated in all studies, our analysis was limited to pain. Although it represents only one aspect of the complex chronic LBP problem, pain is the primary indication for operative treatment.
Substantial risk of bias in most selected studies precluded firm conclusions. A major drawback was that in all but three studies, a proportion of patients who had undergone the index test with a negative test result were denied fusion and had been excluded from analysis (verification bias). Ideally, fusion should have been performed completely independently of the index test result. Because of the substantial risk of bias, heterogeneous patient populations, different methods of fusion, and variability in test protocols, a meta-analysis would produce misleading results. Therefore, we refrained from statistical pooling of included studies [51].
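The direction in which verification bias can distort accuracy estimates is illustrated by the following sketch. All counts are hypothetical and do not come from any included study. The class `TwoByTwo` and the function `verify_fraction_of_negatives` are names introduced here, and the sketch assumes that the test-negative patients who still received fusion are a random sample of all test-negative patients.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Index test result (positive/negative) versus outcome after fusion."""
    tp: float  # test positive, good outcome
    fp: float  # test positive, poor outcome
    fn: float  # test negative, good outcome
    tn: float  # test negative, poor outcome

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)


def verify_fraction_of_negatives(full, fraction):
    """Keep all test positives but only a fraction of the test negatives."""
    return TwoByTwo(full.tp, full.fp, full.fn * fraction, full.tn * fraction)


# Hypothetical cohort in which every patient is fused regardless of the test.
full_cohort = TwoByTwo(tp=40, fp=20, fn=20, tn=20)
# Same cohort if only one in four test-negative patients is fused and analysed.
partial = verify_fraction_of_negatives(full_cohort, 0.25)

for label, table in (("full verification", full_cohort), ("partial verification", partial)):
    print(f"{label}: sensitivity {table.sensitivity():.2f}, "
          f"specificity {table.specificity():.2f}")
```

Under these assumptions, dropping test-negative patients raises the apparent sensitivity and lowers the apparent specificity, even though the test itself is unchanged.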
The current review focused on a limited number of individual tests and thus provides no evidence on other tests that may be used in clinical practice for decision making. Moreover, as there are no reports in the literature on the combined use of prognostic tests, no information is provided on clinical utility if some of these tests were used in an algorithmic approach. In addition, psychosocial patient factors (eg, workers' compensation and smoking) that may negatively affect treatment outcome and thus are very relevant for clinical decision making were not incorporated.
The QUADAS tool is an acknowledged tool to assess the quality of diagnostic studies. As there are no validated tools for prognostic purposes, the QUADAS tool was modified as described in the Methods section. It should be acknowledged that the modified QUADAS tool is not validated for prognostic studies and that the changes made are therefore debatable.
It should be noted that the present study design focused on the optimization of the results of spinal fusion. In such a design, a positive outcome of a high accuracy test does not necessarily imply an indication for spinal fusion, as the test may merely identify those patients with a better natural history, regardless of treatment. Only if tests for patient selection were embedded in a randomized controlled trial comparing fusion with programmed conservative care could it truly be determined which treatment is best for subgroups of patients. At present, such studies have not been performed.
From the vast amount of literature on spinal fusion for LBP, only 10 studies evaluating three tests met our inclusion criteria. It is disappointing that such a small number of studies focused on the true test accuracy of mainly expensive and invasive tests with potential complications. In the search selection, we excluded six studies [17,34–38] that reported mean clinical improvement after fusion separately for patient groups with a positive index test result and for patient groups with a negative index test result. Because no proportions of patients with good clinical outcome were reported [23], no two-by-two tables could be created to determine test accuracy. Three of these excluded studies reported on MRI with conflicting results. In one study, inflammatory vertebral end plate changes (Modic Type I) were significantly related to continued back pain after fusion [34], whereas the other two studies showed a significantly better [36] or a relatively better (no statistics) [35] outcome for patients with Modic Type I end plate changes. A study on preoperative test bracing revealed "a clear tendency for poorer prognosis for patients who had responded poorly to the brace" [37]. Another excluded study focusing on pressure-controlled discography reported no significant differences in long-term surgical outcome across the entire sample [38]. A study on facet joint blocks failed to show any significant correlation between test results and the outcome of spinal fusion [17].
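The need for two-by-two tables can be made concrete with a short sketch: every accuracy measure in Table 5 is a function of the four cell counts, which cannot be recovered from group means alone. The counts below are hypothetical, and the function name `accuracy_from_counts` is introduced here for illustration. The confidence intervals use the common log-scale approximation for ratios of proportions, which is an assumption of this sketch and not necessarily the method applied by the software used in this review.

```python
import math


def accuracy_from_counts(tp, fp, fn, tn, z=1.96):
    """Accuracy measures from a two-by-two table of test result versus outcome.

    tp: test positive, good outcome    fp: test positive, poor outcome
    fn: test negative, good outcome    tn: test negative, poor outcome
    """
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all four cells must be positive in this sketch")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of ln(LR) under the log-scale approximation.
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def interval(lr, se):
        return lr * math.exp(-z * se), lr * math.exp(z * se)

    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_neg),
    }


# Hypothetical counts, not taken from any included study.
result = accuracy_from_counts(tp=30, fp=20, fn=10, tn=15)
for name, value in result.items():
    if isinstance(value, tuple):
        print(f"{name}: {value[0]:.2f} to {value[1]:.2f}")
    else:
        print(f"{name}: {value:.2f}")
```

A confidence interval for an LR that includes one, as in most rows of Table 5, indicates that the test result may carry no prognostic information at all.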
Clinical relevance and implications for practice
Several studies have reported that cognitive behavioral therapy or intensive exercise programs [5,6,52] have treatment results similar to those of spinal fusion, but with considerably fewer complications and lower morbidity and costs [9,10]. The findings of the present review show that the currently used tests do not improve the results of fusion by better patient selection, which makes it hard to propose spinal fusion as a standard treatment for chronic LBP. Currently used tests for patient selection are not recommended for surgical decision making in standard care.
Implications for future research
To verify whether spinal fusion could be effective for a subset of patients with persisting symptoms after conservative care, future research should focus on studies that include both positively and negatively tested patients in a randomized design between fusion and programmed conservative care. Test protocols should be clearly described, and clinical outcome should be defined by a consensus cutoff point of improvement in pain and functional status, a so-called minimal clinically important change [23,53]. Consensus on relevant outcomes and choice of measurement tools would provide better consistency and comparability across studies. To further minimize the risk of bias, detailed reporting of methods and interventions would allow replication and appropriate interpretation of results [54,55]. Additionally, the role of MRI, as well as the relation between treatment outcome and psychosocial patient risk factors for persistent disabling LBP [56], should be further elucidated. Determination of reliable predictors of outcome would greatly help physicians to counsel their patients properly in weighing the risks and benefits of treatment options for chronic LBP.
Conclusions
No subset of patients with chronic LBP could be identified for whom spinal fusion is a predictable and effective treatment. Best evidence does not support the use of current tests for patient selection in clinical practice.
References
[1] Maniadakis N, Gray A. The economic burden of back pain in the UK. Pain 2000;84:95–103.
[2] Lambeek LC, van Tulder MW, Swinkels IC, et al. The trend in total cost of back pain in The Netherlands in the period 2002 to 2007. Spine 2011;36:1050–8.
[3] Airaksinen O, Brox JI, Cedraschi C, et al. Chapter 4. European guidelines for the management of chronic nonspecific low back pain. Eur Spine J 2006;15(Suppl 2):S192–300.
[4] Deyo RA, Nachemson A, Mirza SK. Spinal-fusion surgery—the case
for restraint. N Engl J Med 2004;350:722–6.
[5] Brox JI, Sorensen R, Friis A, et al. Randomized clinical trial of lumbar instrumented fusion and cognitive intervention and exercises in patients with chronic low back pain and disc degeneration. Spine 2003;28:1913–21.
[6] Fairbank J, Frost H, Wilson-MacDonald J, et al. Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial. BMJ 2005;330:1233.
[7] Wilson-MacDonald J, Fairbank J, Frost H, et al. The MRC spine stabilization trial: surgical methods, outcomes, costs, and complications of surgical stabilization. Spine 2008;33:2334–40.
[8] Rivero-Arias O, Campbell H, Gray A, et al. Surgical stabilisation of the spine compared with a programme of intensive rehabilitation for the management of patients with chronic low back pain: cost utility analysis based on a randomised controlled trial. BMJ 2005;330:1239.
[9] Fritzell P, Hagg O, Jonsson D, Nordwall A. Cost-effectiveness of lumbar fusion and nonsurgical treatment for chronic low back pain in the Swedish Lumbar Spine Study: a multicenter, randomized, controlled trial from the Swedish Lumbar Spine Study Group. Spine 2004;29:421–34; discussion Z3.
[10] Deyo RA, Mirza SK, Martin BI, et al. Trends, major medical complications, and charges associated with surgery for lumbar spinal stenosis in older adults. JAMA 2010;303:1259–65.
[11] Lord SJ, Staub LP, Bossuyt PM, Irwig LM. Target practice: choosing target conditions for test accuracy studies that are relevant to clinical practice. BMJ 2011;343:d4684.
[12] Schunemann HJ, Oxman AD, Brozek J, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008;336:1106–10.
[13] Modic MT, Masaryk TJ, Ross JS, Carter JR. Imaging of degenerative disk disease. Radiology 1988;168:177–86.
[14] Pfirrmann CW, Metzdorf A, Zanetti M, et al. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine 2001;26:1873–8.
[15] Resnick DK, Choudhri TF, Dailey AT, et al. Guidelines for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 2: assessment of functional outcome. J Neurosurg Spine 2005;2:639–46.
[16] Guyer RD, Ohnmeiss DD. Lumbar discography. Position statement from the North American Spine Society Diagnostic and Therapeutic Committee. Spine 1995;20:2048–59.
[17] Esses SI, Moro JK. The value of facet joint blocks in patient selection for lumbar fusion. Spine 1993;18:185–90.
[18] Rask B, Dall BE. Use of the pantaloon cast for the selection of fusion candidates in the treatment of chronic low back pain. Clin Orthop Relat Res 1993;288:148–57.
[19] van der Schaaf DB, van Limbeek J, Pavlov PW. Temporary external transpedicular fixation of the lumbosacral spine. Spine 1999;24:481–4; discussion 4–5.
[20] Devillé WL, Buntinx F, Bouter LM, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002;2:9.
[21] Whiting P, Rutjes AW, Reitsma JB, et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
[22] Turner JA, Deyo RA, Loeser JD, et al. The importance of placebo effects in pain treatment and research. JAMA 1994;271:1609–14.
[23] Ostelo RW, Deyo RA, Stratford P, et al. Interpreting change scores for pain and functional status in low back pain: towards international consensus regarding minimal important change. Spine 2008;33:90–4.
[24] Zamora J, Abraira V, Muriel A, et al. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 2006;6:31.
[25] Jellema P, van der Windt DA, Bruinvels DJ, et al. Value of symptoms and additional diagnostic tests for colorectal cancer in primary care: systematic review and meta-analysis. BMJ 2010;340:c1269.
[26] van der Windt DA, Jellema P, Mulder CJ, et al. Diagnostic testing for celiac disease among patients with abdominal symptoms: a systematic review. JAMA 2010;303:1738–46.
[27] Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097.
[28] Ohtori S, Kinoshita T, Yamashita M, et al. Results of surgery for discogenic low back pain: a randomized study using discography versus discoblock for diagnosis. Spine 2009;34:1345–8.
[29] Lovely TJ, Rastogi P. The value of provocative facet blocking as a predictor of success in lumbar spine fusion. J Spinal Disord 1997;10:512–7.
[30] Bednar DA, Raducan V. External spinal skeletal fixation in the management of back pain. Clin Orthop Relat Res 1996;322:131–9.
[31] Axelsson P, Johnsson R, Stromqvist B, Andreasson H. Temporary external pedicular fixation versus definitive bony fusion: a prospective comparative study on pain relief and function. Eur Spine J 2003;12:41–7.
[32] Madan S, Gundanna M, Harley JM, et al. Does provocative discography screening of discogenic back pain improve surgical outcome? J Spinal Disord Tech 2002;15:245–51.
[33] Peng B, Chen J, Kuang Z, et al. Diagnosis and surgical treatment of back pain originating from endplate. Eur Spine J 2009;18:1035–40.
[34] Buttermann GR, Heithoff KB, Ogilvie JW, et al. Vertebral body MRI related to lumbar fusion results. Eur Spine J 1997;6:115–20.
[35] Chataigner H, Onimus M, Polette A. [Surgery for degenerative lumbar disc disease. Should the black disc be grafted?]. Rev Chir Orthop Reparatrice Appar Mot 1998;84:583–9.
[36] Esposito P, Pinheiro-Franco JL, Froelich S, Maitrot D. Predictive value of MRI vertebral end-plate signal changes (Modic) on outcome of surgically treated degenerative disc disease. Results of a cohort study including 60 patients. Neurochirurgie 2006;52:315–22.
[37] Christensen FB, Karlsmose B, Hansen ES, Bunger CE. Radiological and functional outcome after anterior lumbar interbody spinal fusion. Eur Spine J 1996;5:293–8.
[38] Derby R, Howard MW, Grant JM, et al. The ability of pressure-controlled discography to predict surgical and nonsurgical outcomes. Spine 1999;24:364–71; discussion 71–2.
[39] Axelsson P, Johnsson R, Stromqvist B, et al. Orthosis as prognostic instrument in lumbar fusion: no predictive value in 50 cases followed prospectively. J Spinal Disord 1995;8:284–8.
[40] Willems PC, Elmans L, Anderson PG, et al. The value of a pantaloon cast test in surgical decision making for chronic low back pain patients: a systematic review of the literature supplemented with a prospective cohort study. Eur Spine J 2006;15:1487–94.
[41] Willems PC, Elmans L, Anderson PG, et al. Provocative discography and lumbar fusion: is preoperative assessment of adjacent discs useful? Spine 2007;32:1094–9; discussion 1100.
[42] Colhoun E, McCall IW, Williams L, Cassar Pullicino VN. Provocation discography as a guide to planning operations on the spine. J Bone Joint Surg Br 1988;70:267–71.
[43] Elmans L, Willems PC, Anderson PG, et al. Temporary external transpedicular fixation of the lumbosacral spine: a prospective, longitudinal study in 330 patients. Spine 2005;30:2813–6.
[44] Heini PF, Gahrich U, Orler R. The external fixator: a tool for evaluation of complex low back pain problems. J Spinal Disord Tech 2004;17:8–14.
[45] Gill K, Blumenthal SL. Functional results after anterior lumbar fusion at L5-S1 in patients with normal and abnormal MRI scans. Spine 1992;17:940–2.
[46] Jeanneret B, Jovanovic M, Magerl F. Percutaneous diagnostic stabilization for low back pain. Correlation with results after fusion operations. Clin Orthop Relat Res 1994;304:130–8.
[47] Esses SI, Botsford DJ, Kostuik JP. The role of external spinal skeletal fixation in the assessment of low-back disorders. Spine 1989;14:594–601.
[48] Weinstein JN, Lurie JD, Olson PR, et al. United States' trends and regional variations in lumbar spine surgery: 1992-2003. Spine 2006;31:2707–14.
[49] Irwin ZN, Hilibrand A, Gustavel M, et al. Variation in surgical decision making for degenerative spinal disorders. Part I: lumbar spine. Spine 2005;30:2208–13.
[50] Katz JN, Lipson SJ, Lew RA, et al. Lumbar laminectomy alone or with instrumented or noninstrumented arthrodesis in degenerative lumbar spinal stenosis. Patient selection, costs, and surgical outcomes. Spine 1997;22:1123–31.
[51] Macaskill P, Gatsonis C, Deeks J, et al. Chapter 10: analysing and presenting results. In: Deeks J, Bossuyt P, Gatsonis C, eds. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0. West Sussex, UK: The Cochrane Collaboration, John Wiley & Sons, 2010.
[52] Guzman J, Esmail R, Karjalainen K, et al. Multidisciplinary rehabilitation for chronic low back pain: systematic review. BMJ 2001;322:1511–6.
[53] van der Roer N, Ostelo RW, Bekkering GE, et al. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine 2006;31:578–82.
[54] Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med 2001;134:657–62.
[55] von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008;61:344–9.
[56] Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295–302.