Spinal fusion for chronic low back pain: systematic review on the accuracy of tests for patient selection

The Spine Journal 13 (2013) 99–109

Clinical Study

Paul C. Willems, MD, PhD (a,*), J. Bart Staal, PT, PhD (b), Geert H.I.M. Walenkamp, MD, PhD (a), Rob A. de Bie, PT, PhD (c)

(a) Department of Orthopaedics, Research School CAPHRI, Maastricht University Medical Center, P. Debyelaan 25, PO Box 5800, 6202 AZ Maastricht, The Netherlands

(b) Scientific Institute for Quality of Healthcare, Radboud University Medical Center, PO Box 9101, 6500 HB Nijmegen, The Netherlands

(c) Department of Epidemiology, Research School CAPHRI, Maastricht University Medical Center, PO Box 5800, 6202 AZ Maastricht, The Netherlands

Received 8 December 2011; revised 27 April 2012; accepted 1 October 2012

Abstract

BACKGROUND CONTEXT: Spinal fusion is a common but controversial treatment for chronic low back pain (LBP) with outcomes similar to those of programmed conservative care. To improve the results of fusion, tests for patient selection are used in clinical practice.

PURPOSE: To determine the prognostic accuracy of tests for patient selection that are currently used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion.

STUDY DESIGN: Systematic review of the literature.

SAMPLE: Studies that compared the results of magnetic resonance imaging (MRI), provocative discography, facet joint blocks, orthosis immobilization, and temporary external fixation with the clinical outcome of patients who underwent spinal fusion for chronic LBP.

OUTCOME MEASURES: To determine the prognostic accuracy of tests to predict the clinical outcome of spinal fusion in terms of sensitivity, specificity, and likelihood ratios (LRs).

METHODS: Data sources PubMed (1966 to November 2010), EMBASE (1974 to November 2010), and reference lists were searched without restriction by language or publication status. Two reviewers independently selected studies for inclusion, extracted data for analysis, and assessed the risk of bias with the Quality Assessment of Diagnostic Accuracy Studies checklist, modified for prognostic studies. Discrepancies were resolved by consensus.

RESULTS: Ten studies met the eligibility criteria. Immobilization by an orthosis (median [range] positive LR, 1.10 [0.94–1.13] and negative LR, 0.92 [0.39–1.12]), provocative discography (median [range] positive LR, 1.18 [0.70–1.71] and negative LR, 0.74 [0.24–1.40]), and temporary external fixation (median [range] positive LR, 1.22 [1.02–1.74] and negative LR, 0.58 [0.15–0.94]) failed to show clinically useful prognostic accuracy. Statistical pooling was not feasible because of different test protocols, variability in outcome assessment, and heterogeneous patient populations. No studies reporting on facet joint blocks or MRI could satisfy the inclusion criteria. Obscure patient selection, high risk of verification bias, and outcome assessment with poorly validated instruments precluded strong conclusions for all tests.

CONCLUSIONS: No subset of patients with chronic LBP could be identified for whom spinal fusion is a predictable and effective treatment. Best evidence does not support the use of current tests for patient selection in clinical practice. © 2013 Elsevier Inc. All rights reserved.

Keywords: Chronic low back pain; Spinal fusion; Patient selection; Systematic review; Test accuracy

FDA device/drug status: Not applicable.

Author disclosures: PCW: Nothing to disclose. JBS: Nothing to disclose. GHIMW: Nothing to disclose. RAdB: Nothing to disclose.

* Corresponding author. Department of Orthopaedics, Research School CAPHRI, Maastricht University Medical Center, P. Debyelaan 25, PO Box 5800, 6202 AZ Maastricht, The Netherlands. Tel.: 0031-433875038; fax: 0031-433874893.

E-mail address: [email protected] (P.C. Willems)

1529-9430/$ - see front matter © 2013 Elsevier Inc. All rights reserved.

http://dx.doi.org/10.1016/j.spinee.2012.10.001

Context

Spinal surgeons often use the results of magnetic resonance imaging, discography, facet joint blocks, and brace immobilization when selecting patients with degenerative disease and chronic low back pain for surgeries such as fusion or disc replacement.

Contribution

In this review, the authors found that these tests have been inadequately studied in a systematic fashion. In the few cases in which the tests had the best data analyzed, none of them were demonstrated to be accurate or useful.

Implication

Surgery for chronic low back pain (without neurological impingement, instability, etc) is controversial at best. No clear pathognomonic or specific pathologic lesion has yet been discovered that strongly predicts clinically serious low back pain syndromes. Diagnostic tests have proven to be nonspecific and their accuracy poor in determining treatment success. Outcomes are universally inferior to those expected for clinically well-defined degenerative conditions (herniated nucleus pulposus, stenosis, degenerative spondylolisthesis). Despite a nearly 60-year concerted effort and the escalation of complex surgical approaches, little clinically significant progress has been made to improve the situation for these patients.

The Editors

Introduction

Chronic low back pain (LBP) imposes huge costs on society, either directly by health-care consumption or indirectly by lost productivity because of work absenteeism and early retirement [1,2]. If conservative treatment fails [3], lumbar spinal fusion may be performed to stabilize a painful segment. However, its results are variable and hard to predict for the individual patient [4]. Two randomized controlled trials compared fusion with cognitive behavioral exercise therapy [5] or an intensive rehabilitation program [6] and reported equal improvement for fusion and conservative treatment. As spinal fusion surgery is associated with greater complications [7], health-care costs [8], and morbidity [9,10], there is only a rationale for fusion if its results are improved by identifying and operating on only those patients who will actually benefit from fusion.

For patient selection in practice, clinicians rely on tests that predict the outcome of spinal fusion. These are either genuine prognostic tests or diagnostic tests for prognostic purposes with the underlying assumption that the presence or absence of painful disc degeneration will identify subgroups of patients with a good or bad patient-important outcome of spinal fusion, respectively [11,12]. The most commonly used tests are magnetic resonance imaging (MRI), provocative discography, facet joint blocks (all tests intend to identify the source of LBP), immobilization by an orthosis, and temporary external transpedicular fixation of suspect spinal levels (both prognostic tests that intend to mimic the immobilizing effect of a spinal fusion).

Considering that false-positive test results will lead to unnecessary invasive and expensive surgery with potential complications and false-negative test results will withhold adequate treatment from patients who may benefit from fusion, our systematic review aimed to determine the accuracy of tests currently used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion.

Methods

For the purpose of this review, we investigated the most commonly used tests in clinical practice: MRI, which has been recommended as the imaging study of choice for the evaluation of patients with back pain [13–15], provocative discography [16], facet joint blocks [17], immobilization by an orthosis [18], and immobilization by temporary external fixation [19]. These tests are described in detail in Table 1.

Data sources and searches

A literature search was conducted according to the guidelines by Devillé et al. [20]. PubMed (1966 to November 2010) and EMBASE (1974 to November 2010) databases were explored, and we used search terms for relevant test procedures, study design, and patient population. For the tests, the following terms were used: immobilization, thoracolumbosacral orthosis, surgical cast(s), provocative discography, discography, temporary external fixation, facet joint blocks, zygapophyseal joint blocks, imaging, and MRI. For study design, we used the terms prognosis, prognostic, accuracy, predictive, diagnosis, diagnostic test(s), and diagnostic technique(s), and for patient population, the terms lumbar spine, lumbar vertebrae, lumbosacral, spinal, LBP, degenerative disc disease, intervertebral disc(k), disc degeneration, failed back surgery syndrome, spondylosis, spinal fusion, and spondylodesis were used. Both Medical Subject Headings terms and free text words were entered.
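The three term groups listed above can be combined into one Boolean search string. The sketch below is only an illustration: the article does not give the actual search syntax, so joining terms with OR inside a group and AND between groups is an assumption, and the helper names `or_block` and `build_query` are hypothetical.

```python
# Illustrative sketch only: the published search strings are not given in the
# article, so the OR-within-group / AND-between-groups structure is assumed.

TEST_TERMS = [
    "immobilization", "thoracolumbosacral orthosis", "surgical cast",
    "provocative discography", "discography", "temporary external fixation",
    "facet joint blocks", "zygapophyseal joint blocks", "imaging", "MRI",
]
DESIGN_TERMS = [
    "prognosis", "prognostic", "accuracy", "predictive",
    "diagnosis", "diagnostic test", "diagnostic technique",
]
POPULATION_TERMS = [
    "lumbar spine", "lumbar vertebrae", "lumbosacral", "spinal", "LBP",
    "degenerative disc disease", "intervertebral disc", "disc degeneration",
    "failed back surgery syndrome", "spondylosis", "spinal fusion",
    "spondylodesis",
]


def or_block(terms):
    """Quote multi-word terms and join one group of terms with OR."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Join the OR-blocks of all term groups with AND."""
    return " AND ".join(or_block(group) for group in groups)


query = build_query(TEST_TERMS, DESIGN_TERMS, POPULATION_TERMS)
print(query)
```

A query built this way would still need database-specific field tags (for example for Medical Subject Headings) before it could be run in PubMed or EMBASE.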

Table 1
Description of investigated tests for patient selection

MRI [14,15]
Facet joint degeneration and abnormal disc morphology can be identified on MRI of the spine. Loss of T2-signal intensity, collapse, Modic changes, and high-intensity zones are commonly observed in the disc and presumed to be a source of pain.

Provocative discography [16]
Under sterile conditions, a stiletted needle is advanced into the center of the intended disc space. Under fluoroscopic control, a contrast agent is injected. If this injection provokes pain similar to the patient's usual pain and if one or two control discs adjacent to the suspect disc do not elicit usual pain, the test is considered positive. The extent of degeneration of the injected discs is determined on fluoroscopy or a computed tomography scan immediately after the procedure.

Facet joint blocks [17]
Using an aseptic technique and fluoroscopic guidance, local anesthetic is injected into the facet joint. Between 0.5 and 3 h after injection, the amount of pain relief is recorded. In case of substantial pain relief, the test is considered positive.

Orthosis immobilization [18]
A standard brace or corset is prescribed, or a plaster cast can be applied. In a pantaloon cast, one hip is fixed within the cast for better immobilization of the lumbosacral junction. Patients are expected to wear the orthosis for at least 2 to 4 wks and are encouraged to perform as many daily life activities as possible. In case of significant pain relief, the test is considered positive.

Temporary external transpedicular fixation [19]
Under general anesthesia, antibiotic prophylaxis, and fluoroscopy, two screws are inserted percutaneously through the pedicles into the vertebra above and two screws into the vertebra below the suspect discs, respectively. Postoperatively, the protruding screw ends are fixed externally with two vertical bars, which immobilizes the discs of interest. In case of adequate pain relief, the test is considered positive. Optionally, immobilization can be discontinued without the knowledge of the patient by fixing the bars horizontally (dynamization), which should annul pain relief.

MRI, magnetic resonance imaging.

Study selection

Two authors (PCW and JBS) screened the titles and abstracts of all references identified by the search to determine whether they met the following inclusion criteria:

1. Patients should suffer for at least 3 months from LBP without signs of nerve root impingement, spinal stenosis, instability, or deformity.

2. Studies should contain both patients with a positive and patients with a negative index test result, who subsequently underwent spinal fusion.

3. Clinical outcome after fusion, which was considered as the reference standard, should be presented per individual patient in such a way that a relevant clinical improvement cutoff could be defined for analysis and outcome could be dichotomized into success or failure.

4. Pain, subjective improvement, back-specific disability, disability for work, or patient satisfaction should have been incorporated as a clinically relevant outcome measure.

5. Studies should include at least 20 patients.

6. There were no restrictions by language.

7. Study populations with objective neurologic motor deficit, fracture, infectious disease, ankylosing spondylitis, neoplasm, congenital or adolescent idiopathic scoliosis, kyphosis, or adult scoliosis were excluded.
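Criterion 3 requires that each patient's outcome can be dichotomized into success or failure. As a minimal sketch of what such a rule could look like, the function below classifies a patient as a success when pain on a visual analog scale falls by at least 30%, the minimal important change named in the modified QUADAS checklist. The function name `is_success` and the patient values are hypothetical and are not data from any included study.

```python
def is_success(vas_before, vas_after, min_relative_decrease=0.30):
    """Return True if pain on a 0-100 VAS fell by at least the given fraction.

    The 30% default is an illustrative cutoff; individual studies in the
    review used different definitions of success.
    """
    if vas_before <= 0:
        raise ValueError("baseline VAS must be positive to compute a relative change")
    relative_decrease = (vas_before - vas_after) / vas_before
    return relative_decrease >= min_relative_decrease


# Hypothetical (VAS before fusion, VAS at follow-up) pairs, not study data.
patients = [(80, 50), (70, 55), (60, 40), (90, 95)]
outcomes = [is_success(before, after) for before, after in patients]
print(outcomes)
```

Crossing these success/failure labels with the positive/negative index test results gives the two-by-two table on which all accuracy measures in this review are based.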

Full publications of studies, which were considered as potentially relevant by both authors, were retrieved. The articles were read and checked for final inclusion independently. Any disagreement with regard to study selection was discussed in consensus meetings. In cases where disagreement persisted, a third reviewer (RAdB) was consulted for the final decision. The references of the articles identified by the search were checked for additional eligible studies.

Data extraction and assessment of bias

Relevant study data were retrieved by the same two reviewers using standardized forms. Extracted information included standard reference data (first author, journal, and publication year), number of patients, characteristics of study population before surgery (ie, age, sex, severity and duration of pain, and/or disability), index test, spinal fusion method, outcome measures, and clinical outcome.

The two reviewers independently assessed the risk of bias of included studies by means of a modified version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist [21]. The QUADAS is a generally acknowledged checklist to assess the quality of primary studies of diagnostic accuracy. As there are no gold standard criteria for quality assessment of studies of prognostic accuracy, we modified the QUADAS checklist, as follows (Table 2): Items 1 and 2 of the original QUADAS remained in the modified version. The original Item 3 (Is the reference standard likely to correctly classify the target condition?) was left out because for the selected studies in the present review, the reference standard and target condition are the same (ie, clinical outcome after fusion). Instead, whether the reference standard was assessed by valid measures of acceptable quality was included as Item 3. Item 4 of the original QUADAS (Is the time period between the reference standard and the index test short enough to be reasonably sure that the target condition did not change in the time period between these tests?) was removed because to obtain a reliable estimation of clinical outcome after lumbar spinal fusion (ie, reference standard), the length of the follow-up should be at least 2 years [22] (modified Item 4). The original Item 5 (Did the whole sample or a random selection of the sample receive verification using a reference standard of diagnosis?) was left out because for inclusion in the present review, all analyzed patients from the selected studies had undergone fusion and subsequent clinical outcome assessment. Item 6 of the original QUADAS (Did patients receive the same reference standard regardless of the index test result?) remained unchanged (modified Item 5). Item 7 of the original QUADAS (Was the reference standard independent of the index test or did the index test form part of the reference standard?) was removed because the outcome of fusion was assessed much later than the index test. The original Item 8 was included in the modified QUADAS version as Item 6. The modified Item 7 was added to verify whether an objective and clearly defined cutoff point was mentioned to determine whether the index test was positive or negative. Item 9 of the original QUADAS (Was the execution of the reference standard described in sufficient detail in order to permit replication of the test?) was transformed into whether the assessment of clinical outcome after fusion was adequately addressed according to the accepted standards of clinical importance [23] (modified Item 8). Item 10 of the original QUADAS (Were the index test results interpreted without knowledge of the reference standard?) was left out because fusion outcome was assessed much later than the preoperative index test. The original Items 11, 12, 13, and 14 were included as Items 9, 10, 11, and 12, respectively. Disagreements between both reviewers were discussed and resolved in a consensus meeting.

Table 2
Modified QUADAS checklist: criteria to assess risk of bias

1. Was the spectrum of patients representative of the patients who will receive the index test in practice?

2. Were selection criteria clearly described?

3. Were the outcomes used to assess recovery collected by means of validated measures of acceptable quality?*

4. Was a sufficiently long follow-up period (2 y or more) used to assess the outcome of the spinal fusion operation?*

5. Did all patients receive spinal fusion followed by the outcome assessment regardless of the index test result?

6. Was the execution of the index test described in sufficient detail to permit replication of the test?

7. Was a clear cutoff point used to qualify positive versus negative results of the index test?*

8. Did the effect sizes that were used to consider patients as being recovered (ie, the reference standard) meet accepted standards of clinical relevance, that is, a minimal important change of 30% or more?*

9. Were the clinical outcomes after fusion assessed without knowledge of the results of the index test?

10. Were the same clinical data available when index test results were interpreted as would be available when the test is used in practice?

11. Were uninterpretable results of the index test reported?

12. Were withdrawals from the study explained?

QUADAS, Quality Assessment of Diagnostic Accuracy Studies [21].

* Items 3, 4, 7, and 8 are items modified for prognostic accuracy.

Data synthesis and analysis

By combining outcome (dichotomized into success or failure) with the test results (positive or negative), two-by-two tables with four cells (true positives, false negatives, false positives, and true negatives) could be generated, and test qualifiers, such as sensitivity, specificity, predictive values, and likelihood ratios (LRs) with 95% confidence intervals (CIs), were calculated. Calculations were done with Meta-DiSc statistical software version 1.4 (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) [24]. Statistical pooling was only performed if studies on a specific index test were not hampered by statistical or clinical heterogeneity. Statistical heterogeneity was defined as nonoverlapping 95% CIs for estimates of sensitivity and specificity and a difference in these estimates among the studies of more than 20% [25,26]. We considered studies as clinically heterogeneous when patient groups, outcome measures, or the execution of index tests were different. In cases of statistical or clinical heterogeneity, we refrained from statistical pooling, and the results were presented per individual study.
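The test qualifiers named above all follow from the four cells of the two-by-two table. The sketch below shows one common way to compute them, with 95% CIs for the LRs obtained by the standard log method. It is not claimed to reproduce Meta-DiSc output exactly. The counts are hypothetical, the function names are assumptions, and the pairwise reading of the heterogeneity rule (nonoverlapping CIs together with a difference of more than 0.20) is one possible interpretation of the definition in the text.

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn, z=1.96):
    """Compute accuracy measures from a two-by-two table.

    tp/fn: test-positive / test-negative patients with a successful fusion.
    fp/tn: test-positive / test-negative patients with a failed fusion.
    Assumes no cell is zero (otherwise a continuity correction is needed).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of ln(LR) by the log method.
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": lr_pos,
        "lr_pos_ci": (lr_pos * math.exp(-z * se_pos), lr_pos * math.exp(z * se_pos)),
        "lr_neg": lr_neg,
        "lr_neg_ci": (lr_neg * math.exp(-z * se_neg), lr_neg * math.exp(z * se_neg)),
    }


def statistically_heterogeneous(estimates, threshold=0.20):
    """Flag heterogeneity for a list of (estimate, ci_low, ci_high) tuples.

    Assumed reading of the rule: any pair of studies whose 95% CIs do not
    overlap and whose point estimates differ by more than the threshold.
    """
    for i, (p1, lo1, hi1) in enumerate(estimates):
        for p2, lo2, hi2 in estimates[i + 1:]:
            non_overlapping = hi1 < lo2 or hi2 < lo1
            if non_overlapping and abs(p1 - p2) > threshold:
                return True
    return False


# Hypothetical counts, not taken from any included study.
result = accuracy_from_2x2(tp=40, fp=30, fn=10, tn=20)
print(result)
```

With these hypothetical counts the sensitivity is 0.80 and the specificity 0.40, so the positive LR is 0.80/0.60 and the negative LR is 0.20/0.40.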

Results

The Figure shows the flow diagram of studies from the initial results of the database searches to final inclusion, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines 2009 [27]. Of the 22 selected full articles, six studies in which only patients with a positive index test had been selected for lumbar fusion were excluded [28–33]. We also excluded six other studies, in which test accuracy could not be determined because only mean values of recovery were reported without proportions of patients with success or failure of treatment [17,34–38]. Finally, 10 studies met the inclusion criteria [18,39–47].

Characteristics of included studies

Study characteristics are listed in Table 3. Three articles concerned immobilization by an orthosis, a fiberglass pantaloon cast [18], a canvas corset [39], or a plaster pantaloon cast [40]. Four articles reported on discography, of which two studies focused on provocative discography of suspect levels [42,47], one study on provocation of the levels adjacent to the intended fusion [41], and the fourth study focused on the amount of degeneration as registered at discography in a group of patients with a positive discographic pain response [45]. Three articles evaluated trial immobilization by external fixation, either with [43,46] or without dynamization [44].

The sample sizes of the included studies ranged from 22 to 168. Two studies reported exclusively on patients without previous spine surgery [42,45]. The length of the follow-up ranged from 6 months or "when fusion was noted" [47] to 12 years. Either anterior or posterolateral fusion was performed. In the studies in which both procedures were performed, no difference in outcome between the two types of fusion was reported [40–43]. Pain appeared to be the only measure of outcome that was consistently reported in all included articles. Five studies used a visual analog scale to score pain, of which three studies [40,41,43] defined a cutoff point of at least 30% decrease in pain as a clinically relevant improvement; one study, a decrease of at least 75% [45]; and one study, "little or no pain on a visual analog scale" [44]. The other studies used a subjective pain scale (pain free or significant pain relief vs. insignificant or no pain relief) [18,39,42,46,47]. No study reporting on facet joint blocks or MRI could meet the inclusion criteria.

Figure. PRISMA flow diagram of combined literature search and selection. Records identified through literature search of PubMed: 1,055. Records identified through literature search of EMBASE: 115. Records identified through combined literature search after removal of duplicates: 1,143. Records screened: 1,143. Records excluded based on evaluation of titles and abstracts: 1,124. Full-text articles assessed for eligibility: 19. Additional full-text articles included by reference screening: 3. Full-text articles excluded: 12 (6 studies only included patients with a positive index test; 6 studies reported mean outcome instead of proportions of patients with successful outcome or failure). Studies included in qualitative synthesis: 10.
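The counts in the flow diagram can be checked with simple arithmetic. The short sketch below recomputes each step from the numbers reported in the Figure and the Results. The number of duplicates is not stated in the article and is derived here as an assumption from the difference between the database totals and the combined total.

```python
# Counts as reported in the PRISMA flow diagram.
pubmed_records = 1055
embase_records = 115
records_after_deduplication = 1143
excluded_on_title_abstract = 1124
added_by_reference_screening = 3
excluded_full_text = 6 + 6  # positive-only designs + mean-only outcome reports

# Derived value (not stated in the article): duplicates removed.
duplicates_removed = pubmed_records + embase_records - records_after_deduplication

full_text_from_search = records_after_deduplication - excluded_on_title_abstract
full_text_total = full_text_from_search + added_by_reference_screening
included_studies = full_text_total - excluded_full_text

print(duplicates_removed, full_text_from_search, full_text_total, included_studies)
```

The derived totals agree with the 19 full-text articles assessed, the 22 selected full articles, and the 10 included studies reported in the text.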

Risk of bias

The two reviewers agreed on 77 of the 120 items (64%) scored. After discussion, consensus could be reached on all items. Most disagreements were because of reading errors or ambiguous reporting. The risk of bias ratings are listed in Table 4. The most prevalent shortcomings were as follows: uninterpretable index test results were not reported, no clear cutoff point was defined for positive and negative index test results, and in all but three studies, not all patients who were tested underwent fusion regardless of the index test result (verification bias).

Table 3
Characteristics of included primary studies on the clinical outcome of lumbar spinal fusion

Thoracolumbosacral orthosis

Axelsson et al. [39]
Setting: Tertiary (university hospital)
Included patients (n): 50
Patients for current analysis (n): 50
Mean age ± SD (range): 44 (20–68)
Male, n (%): 28 (56)
Patient characteristics: Intractable LBP: spondylolisthesis (n=24), facet/disc degeneration (n=11), or postlaminectomy syndrome (n=15), duration not specified
Study design: Retrospective
Index test and criterion for positive result: Thoracolumbosacral orthosis or canvas corset; positive in case of ≥50% subjective pain relief
Follow-up in mo (range): 24
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Pain free or significantly improved on a five-point pain scale and satisfied on a three-point satisfaction scale*

Rask and Dall [18]
Setting: Tertiary (university hospital)
Included patients (n): 45
Patients for current analysis (n): 25
Mean age ± SD (range): 43.7 (20–61)
Male, n (%): Not specified
Patient characteristics: >6 mo back pain (mean, 3.9 y), no neurologic motor deficit, herniation, or olisthesis >2 mm, 38% with prior spine surgery
Study design: Retrospective
Index test and criterion for positive result: Fiberglass pantaloon cast; positive in case of pain relief that returned after the removal of the cast
Follow-up in mo (range): Minimum of 6
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Significant subjective pain relief*

Willems et al. [40]
Setting: Tertiary (specialized hospital)
Included patients (n): 257
Patients for current analysis (n): 107
Mean age ± SD (range): 40 ± 8.8 (range not specified)
Male, n (%): 39 (36)
Patient characteristics: Incapacitating LBP, mean, 3.7 (0.5–31) y, no neurologic motor deficit and routine testing indecisive, 65% with prior spine surgery (N=70)
Study design: Retrospective
Index test and criterion for positive result: Pantaloon plaster cast; positive in case of subjective substantial pain relief in the cast
Follow-up in mo (range): Median of 76 (15–144)
Method of fusion: Posterolateral (n=79) or anterior fusion (n=28)
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

Provocative discography

Colhoun et al. [42]
Setting: Tertiary (orthopedic hospital)
Included patients (n): 195
Patients for current analysis (n): 168
Mean age ± SD (range): 39.1 (17–70)
Male, n (%): 86 (51)
Patient characteristics: Persistent LBP, no previous back surgery, duration not specified
Study design: Retrospective
Index test and criterion for positive result: Provocative discography; positive in case of typical pain reproduction, which was not present in adjacent control discs
Follow-up in mo (range): Mean of 43 (24–120)
Method of fusion: Anterior or posterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): Complete pain relief or significant subjective improvement, resumption of work/normal duties, and no intake of analgesics*

Esses et al. [47]
Setting: Tertiary (university hospital)
Included patients (n): 32
Patients for current analysis (n): 22
Mean age ± SD (range): 41 (31–57)
Male, n (%): 18 (84)
Patient characteristics: Long-standing LBP, mean duration of 6.2 (1–20) y, 54% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: Provocative discography, no control discs; positive in case of typical pain reproduction
Follow-up in mo (range): "When fusion was noted"
Method of fusion: Posterolateral fusion with autograft
Criteria for positive reference standard (= fusion outcome): Complete or significant relief of pain*

Gill and Blumenthal [45]
Setting: Tertiary (orthopedic institute)
Included patients (n): 53
Patients for current analysis (n): 53
Mean age ± SD (range): 34 (21–50)
Male, n (%): 36 (68)
Patient characteristics: LBP with a mean disability of 11 (3–120) mo, all selected by concordant pain response provoked at discography L5–S1
Study design: Design not specified
Index test and criterion for positive result: Discography image of L5–S1, no control disc(s); positive in case of annular tear extending to the periphery
Follow-up in mo (range): Mean of 36 (24–54)
Method of fusion: Anterior fusion with allograft (n=48) or autograft (n=5)
Criteria for positive reference standard (= fusion outcome): Relief of ≥75% of initial back pain on VAS, return to work, and no use of narcotics

Willems et al. [41]
Setting: Tertiary (specialized hospital)
Included patients (n): 197
Patients for current analysis (n): 82
Mean age ± SD (range): 40 ± 8.5 (range not specified)
Male, n (%): 26 (32)
Patient characteristics: Incapacitating LBP >1 y (mean duration not specified), no neurologic motor deficit, and routine testing indecisive, 65% with prior spine surgery (N=53)
Study design: Retrospective
Index test and criterion for positive result: Provocative discography of levels adjacent to intended fusion; positive in case of no or unfamiliar pain reproduction
Follow-up in mo (range): Mean of 80 (15–144)
Method of fusion: Posterolateral or anterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

TETF

Elmans et al. [43]
Setting: Tertiary (specialized hospital)
Included patients (n): 330
Patients for current analysis (n): 123
Mean age ± SD (range): 42 ± 9 (range not specified)
Male, n (%): 45 (37)
Patient characteristics: Incapacitating LBP >1 y (mean duration of 6 ± 5 y) with inconclusive routine testing, 62% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: TETF with dynamization; positive if VAS in placebo was ≥30 points more than VAS in fixation
Follow-up in mo (range): Median of 79 (15–44)
Method of fusion: Anterior (n=33) or posterolateral (n=90) fusion
Criteria for positive reference standard (= fusion outcome): ≥30% decrease in pain on a VAS (0–100)

Heini et al. [44]
Setting: Tertiary (university hospital)
Included patients (n): 63
Patients for current analysis (n): 36
Mean age ± SD (range): 48 (26–67)
Male, n (%): 22 (62)
Patient characteristics: LBP with a mean duration of 5 (1–20) y, 67% with prior spine surgery
Study design: Prospective
Index test and criterion for positive result: TETF without dynamization; positive if pain on VAS and use of analgesics decreased sufficiently (estimated by surgeon)
Follow-up in mo (range): Mean of 32 (23–60)
Method of fusion: Posterolateral fusion, except three dynamic fixations and one anterior fusion
Criteria for positive reference standard (= fusion outcome): No or little pain on a VAS and no pain medication*

Jeanneret et al. [46]
Setting: Secondary (regional hospital)
Included patients (n): 101
Patients for current analysis (n): 43
Mean age ± SD (range): 48 (22–74)
Male, n (%): 25 (58)
Patient characteristics: Chronic LBP with or without leg pain, duration not specified, routine testing indeterminate
Study design: Design not specified
Index test and criterion for positive result: TETF with dynamization; positive if pain was reduced at stabilization and returned at destabilization
Follow-up in mo (range): Mean of 50 (24–92)
Method of fusion: Posterior or anterior fusion (numbers not specified)
Criteria for positive reference standard (= fusion outcome): Almost completely pain free, no pain medication and working*

SD, standard deviation; LBP, low back pain; VAS, visual analog scale; TETF, temporary external transpedicular fixation.

* No validated outcome measure reported.

Table 4
Risk of bias: number of quality criteria of the modified QUADAS checklist* that were met

Source | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Total
Orthosis immobilization
Axelsson et al. [39] | + | − | − | + | + | + | + | − | ? | + | ? | ? | 6
Rask and Dall [18] | + | + | − | − | − | + | − | − | + | + | − | + | 6
Willems et al. [40] | − | + | + | + | − | + | − | + | + | + | − | + | 8
Provocative discography
Colhoun et al. [42] | + | ? | − | + | + | + | − | − | ? | + | ? | ? | 5
Esses et al. [47] | + | − | − | − | − | − | ? | − | − | + | − | ? | 2
Gill and Blumenthal [45] | − | − | ? | + | + | ? | + | − | − | ? | ? | + | 4
Willems et al. [41] | − | + | + | + | − | + | ? | + | + | − | − | + | 7
Temporary external transpedicular fixation
Elmans et al. [43] | − | − | + | + | − | + | + | + | + | ? | ? | + | 7
Heini et al. [44] | − | − | + | + | − | + | − | − | − | + | + | + | 6
Jeanneret et al. [46] | + | ? | − | + | − | + | − | − | − | + | − | ? | 4

QUADAS, Quality Assessment of Diagnostic Accuracy Studies; +, yes; −, no; ?, unclear.

* See Table 2 for complete modified QUADAS checklist.

Test accuracy

Table 5 summarizes test accuracies from the included studies.

Orthosis

An orthosis or cast immobilization could neither confirm nor rule out a good outcome after spinal fusion with variable sensitivity (0.43–0.94; range of positive LR, 0.94–1.13) and specificity (0.14–0.61; range of negative LR, 0.39–1.12) [18,39,40].

Provocative discography

The four studies reporting on the prognostic value of discography showed variable sensitivity (0.40–0.88; range of positive LR, 0.70–1.71) and specificity (0.27–0.48; range of negative LR, 0.24–1.40) [41,42,45,47]. In one study, LRs were statistically significant (positive LR, 1.71; 95% CI, 1.21–2.41 and negative LR, 0.24; 95% CI, 0.13–0.43), but specificity was low, 0.48, meaning that only half of the patients who would not improve after fusion could be detected [42].

Temporary external transpedicular fixation

Sensitivity was generally high (0.80–0.93; range of positive LR, 1.02–1.74), but specificity was low (0.20–0.47; range of negative LR, 0.15–0.94) [43,44,46,47]. In one study, LRs were statistically significant (positive LR, 1.74; 95% CI, 1.07–2.83 and negative LR, 0.15; 95% CI, 0.03–0.65), but with a specificity of only 0.47 [46].

Statistical pooling was not feasible because of different test protocols with no clear cutoff point for a positive versus negative result, variability in outcome assessment, and heterogeneous patient populations (varying diagnoses and different mix of patients with or without prior spine surgery between studies, Table 3).
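Because no pooling was done, the abstract summarizes each test by the median and range of the per-study LRs. As a small check, the sketch below recomputes those summaries from the LR point estimates reported per study in Table 5. The dictionary layout and the helper name `summarize` are assumptions made for this illustration.

```python
from statistics import median

# Per-study LR point estimates as reported in Table 5, grouped by index test.
LIKELIHOOD_RATIOS = {
    "orthosis": {
        "positive": [0.94, 1.10, 1.13],
        "negative": [1.12, 0.39, 0.92],
    },
    "provocative discography": {
        "positive": [1.71, 0.70, 1.37, 0.99],
        "negative": [0.24, 1.40, 0.47, 1.01],
    },
    "temporary external fixation": {
        "positive": [1.22, 1.02, 1.74],
        "negative": [0.58, 0.94, 0.15],
    },
}


def summarize(values):
    """Return (median rounded to 2 decimals, minimum, maximum)."""
    return round(median(values), 2), min(values), max(values)


summary = {
    test: {kind: summarize(values) for kind, values in lrs.items()}
    for test, lrs in LIKELIHOOD_RATIOS.items()
}

for test, kinds in summary.items():
    for kind, (med, low, high) in kinds.items():
        print(f"{test}: {kind} LR median {med:.2f} [{low:.2f}-{high:.2f}]")
```

For an even number of studies, as with discography, `statistics.median` averages the two middle values, which matches the medians quoted in the abstract.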

Table 5

Summary prognostic accuracy of orthosis immobilization, provocative discography, and external transpedicular fixation for spinal fusion outcome

| Source | Sample size | Sensitivity | Specificity | Positive predictive value | Negative predictive value | Positive LR (95% CI) | Negative LR (95% CI) |
|---|---|---|---|---|---|---|---|
| Orthosis immobilization | | | | | | | |
| Axelsson et al. [39] | 50 | 0.61 | 0.35 | 0.64 | 0.32 | 0.94 (0.60–1.46) | 1.12 (0.52–2.41) |
| Rask and Dall [18] | 25 | 0.94 | 0.14 | 0.74 | 0.50 | 1.10 (0.80–1.52) | 0.39 (0.03–5.40) |
| Willems et al. [40] | 107 | 0.43 | 0.61 | 0.44 | 0.61 | 1.13 (0.71–1.80) | 0.92 (0.67–1.28) |
| Provocative discography | | | | | | | |
| Colhoun et al. [42] | 168 | 0.88 | 0.48 | 0.88 | 0.48 | 1.71 (1.21–2.41) | 0.24 (0.13–0.43) |
| Esses et al. [47] | 22 | 0.40 | 0.43 | 0.60 | 0.25 | 0.70 (0.29–1.70) | 1.40 (0.54–3.62) |
| Gill and Blumenthal [45] | 53 | 0.81 | 0.41 | 0.74 | 0.50 | 1.37 (0.89–2.10) | 0.47 (0.20–1.13) |
| Willems et al. [41] | 82 | 0.73 | 0.27 | 0.45 | 0.55 | 0.99 (0.76–1.30) | 1.01 (0.49–2.08) |
| Temporary external transpedicular fixation | | | | | | | |
| Elmans et al. [43] | 123 | 0.80 | 0.34 | 0.46 | 0.71 | 1.22 (0.98–1.51) | 0.58 (0.31–1.11) |
| Heini et al. [44] | 36 | 0.81 | 0.20 | 0.45 | 0.57 | 1.02 (0.74–1.40) | 0.94 (0.24–3.60) |
| Jeanneret et al. [46] | 43 | 0.93 | 0.47 | 0.77 | 0.78 | 1.74 (1.07–2.83) | 0.15 (0.03–0.65) |

LR, likelihood ratio; 95% CI, 95% confidence interval.

Discussion

We systematically reviewed the accuracy of tests that are commonly used in clinical practice to identify those patients with chronic LBP who will benefit from spinal fusion. With LRs approaching one, all tests failed to accurately predict the outcome of spinal fusion. In particular, specificity was consistently low, meaning that for all tests, high proportions of false-positive test results will lead to unnecessary invasive and expensive surgery.

The lack of proven accuracy of the current tests is reflected in the high degree of clinical uncertainty in decision making regarding fusion surgery for chronic LBP [4,48]. Studies among spine surgeons show that there is no consensus in treatment strategy [49,50], and our results confirm that in many clinical practices, patients are scheduled for fusion on the basis of tests, of which the accuracy is insufficient or at best unknown. In this respect, it is disappointing that for MRI, the recommended evaluation tool for LBP, no studies could be identified to determine its accuracy for surgical decision making.
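The observation that LRs close to one add little prognostic information can be illustrated with the odds form of Bayes' theorem. The sketch below is ours and not part of the review's methods. The pretest probability of 0.60 for a good outcome after fusion is a purely hypothetical value chosen for illustration, and the LRs are the median values for provocative discography reported in this review.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Update a probability with a likelihood ratio via the odds form of Bayes' theorem."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


PRETEST = 0.60  # hypothetical probability of a good outcome, not taken from the review

after_positive = post_test_probability(PRETEST, 1.18)  # median positive LR, discography
after_negative = post_test_probability(PRETEST, 0.74)  # median negative LR, discography
print(f"after positive test: {after_positive:.2f}")
print(f"after negative test: {after_negative:.2f}")
```

Under these assumptions, the probability of a good outcome moves only from 0.60 to roughly 0.64 after a positive test and to roughly 0.53 after a negative test.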

Limitations

As pain appeared to be the only measure of outcome that was consistently incorporated in all studies, our analysis was limited to pain. Although it represents only one aspect of the complex chronic LBP problem, pain is the primary indication for operative treatment.

Substantial risk of bias in most selected studies precluded firm conclusions. A major drawback was that in all but three studies, a proportion of patients who had undergone the index test with a negative test result were denied fusion and had been excluded from analysis (verification bias). Ideally, fusion should have been performed completely independent of the index test result. Because of the substantial risk of bias, heterogeneous patient populations, different methods of fusion, and variability in test protocols, a meta-analysis would produce misleading results. Therefore, we refrained from statistical pooling of included studies [51].
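The effect of such partial verification can be illustrated with a small numerical sketch. All counts below are hypothetical and chosen only for illustration. The sketch also assumes that test-negative patients are withheld from fusion at random, which may not hold in the included studies.

```python
from fractions import Fraction


def accuracy(tp: int, fn: int, fp: int, tn: int) -> tuple[Fraction, Fraction]:
    """Return (sensitivity, specificity) from two-by-two table counts."""
    return Fraction(tp, tp + fn), Fraction(tn, tn + fp)


# Hypothetical cohort in which every patient is fused regardless of test result.
full = {"tp": 40, "fn": 20, "fp": 20, "tn": 20}

# Same cohort, but only one in four test-negative patients proceeds to fusion
# and therefore has an observed outcome (assumption: selection is random).
verified_fraction = Fraction(1, 4)
partial = {
    "tp": full["tp"],
    "fp": full["fp"],
    "fn": int(full["fn"] * verified_fraction),
    "tn": int(full["tn"] * verified_fraction),
}

for label, counts in (("fully verified", full), ("partially verified", partial)):
    sens, spec = accuracy(**counts)
    print(f"{label}: sensitivity = {float(sens):.2f}, specificity = {float(spec):.2f}")
```

In this hypothetical cohort, sensitivity rises from 0.67 to 0.89 and specificity falls from 0.50 to 0.20, although the test itself is unchanged.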

The current review focused on a limited number of individual tests and thus provides no evidence on other tests that may be used in clinical practice for decision making. Moreover, as there are no reports in the literature on the combined use of prognostic tests, no information is provided on clinical utility if some of these tests were used in an algorithmic approach. In addition, psychosocial patient factors (eg, worker’s compensation and smoking) that may negatively affect treatment outcome and thus are very relevant for clinical decision making were not incorporated.

The QUADAS tool is an acknowledged tool to assess the quality of diagnostic studies. As there are no validated tools for prognostic purposes, the QUADAS tool was modified as described in the Methods section. It should be acknowledged that the modified QUADAS tool is not validated for prognostic studies and that the changes made are therefore debatable.

It should be noted that the present study design focused on the optimization of the results of spinal fusion. In such a design, a positive result on a highly accurate test does not necessarily imply an indication for spinal fusion, as the test may merely identify those patients with a better natural history, regardless of treatment. Only if tests for patient selection were embedded in a randomized controlled trial comparing fusion with programmed conservative care could it truly be determined which treatment is best for subgroups of patients. At present, such studies have not been performed.

From the vast amount of literature on spinal fusion for LBP, only 10 studies evaluating three tests could meet our inclusion criteria. It is disappointing that such a small number of studies focused on true test accuracy of mainly expensive and invasive tests with potential complications. In the search selection, we excluded six studies [17,34–38] that reported mean clinical improvement after fusion for patient groups with a positive index test result and for patient groups with a negative index test result, respectively. Because no proportions of patients with good clinical outcome were reported [23], no two-by-two tables could be created to determine test accuracy. Three of these excluded studies reported on MRI with conflicting results. In one study, inflammatory vertebral end plate changes (Modic Type I) were significantly related to continued back pain after fusion [34], whereas the other two studies showed a significantly better [36] or a relatively better (no statistics) [35] outcome for patients with Modic Type I end plate changes, respectively. A study on preoperative test bracing revealed "a clear tendency for poorer prognosis for patients who had responded poorly to the brace" [37]. Another excluded study focusing on pressure-controlled discography reported no significant differences in long-term surgical outcome across the entire sample [38]. A study on facet joint blocks failed to show any significant correlation between test results and the outcome of spinal fusion [17].

Clinical relevance and implications for practice

Several studies have reported that cognitive behavioral therapy or intensive exercise programs [5,6,52] have treatment results similar to those of spinal fusion, but with considerably fewer complications and lower morbidity and costs [9,10]. The findings of the present review show that the currently used tests do not improve the results of fusion by better patient selection, which makes it hard to propose spinal fusion as a standard treatment for chronic LBP. Currently used tests for patient selection are not recommended for surgical decision making in standard care.

Implications for future research

To verify whether spinal fusion could be effective for a subset of patients with persisting symptoms after conservative care, future research should focus on studies that include both positively and negatively tested patients in a randomized design between fusion and programmed conservative care. Test protocols should be clearly described, and clinical outcome should be defined by a consensus cutoff point of improvement in pain and functional status, a so-called minimal clinically important change [23,53]. Consensus on relevant outcomes and choice of measurement tools would provide better consistency and comparability across studies. To further minimize the risk of bias, detailed reporting of methods and interventions would allow replication and appropriate interpretation of results [54,55]. Additionally, the role of MRI, as well as the relation between treatment outcome and psychosocial patient risk factors for persistent disabling LBP [56], should be further elucidated. Determination of reliable predictors of outcome would greatly help physicians to counsel their patients properly in weighing the risks and benefits of treatment options for chronic LBP.
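How a consensus cutoff turns continuous change scores into the two-by-two table needed for accuracy estimates can be sketched as follows. The patient records, the tuple layout, and the 30% threshold for relative pain improvement are all assumptions made for illustration. They are not data or definitions taken from the included studies.

```python
from collections import Counter

MIN_RELATIVE_IMPROVEMENT = 0.30  # assumed cutoff, for illustration only

# Hypothetical patients: (index test positive?, baseline pain, follow-up pain), 0-10 scale.
patients = [
    (True, 8, 3),
    (True, 7, 6),
    (True, 6, 2),
    (False, 8, 7),
    (False, 7, 4),
    (False, 9, 9),
]


def good_outcome(baseline: float, follow_up: float) -> bool:
    """Classify the outcome as good if pain improved by at least the cutoff.

    Assumes a baseline pain score greater than zero.
    """
    return (baseline - follow_up) / baseline >= MIN_RELATIVE_IMPROVEMENT


# Cross-tabulate the index test result against the dichotomized outcome.
table = Counter((test_positive, good_outcome(b, f)) for test_positive, b, f in patients)
tp, fp = table[(True, True)], table[(True, False)]
fn, tn = table[(False, True)], table[(False, False)]

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Studies that report only mean improvement per group cannot be tabulated this way, because the per-patient classification step is missing.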

Conclusions

No subset of patients with chronic LBP could be identified for whom spinal fusion is a predictable and effective treatment. Best evidence does not support the use of current tests for patient selection in clinical practice.

References

[1] Maniadakis N, Gray A. The economic burden of back pain in the UK. Pain 2000;84:95–103.

[2] Lambeek LC, van Tulder MW, Swinkels IC, et al. The trend in total cost of back pain in The Netherlands in the period 2002 to 2007. Spine 2011;36:1050–8.

[3] Airaksinen O, Brox JI, Cedraschi C, et al. Chapter 4. European guidelines for the management of chronic nonspecific low back pain. Eur Spine J 2006;15(Suppl 2):S192–300.

[4] Deyo RA, Nachemson A, Mirza SK. Spinal-fusion surgery - the case for restraint. N Engl J Med 2004;350:722–6.

[5] Brox JI, Sorensen R, Friis A, et al. Randomized clinical trial of lumbar instrumented fusion and cognitive intervention and exercises in patients with chronic low back pain and disc degeneration. Spine 2003;28:1913–21.

[6] Fairbank J, Frost H, Wilson-MacDonald J, et al. Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial. BMJ 2005;330:1233.

[7] Wilson-MacDonald J, Fairbank J, Frost H, et al. The MRC spine stabilization trial: surgical methods, outcomes, costs, and complications of surgical stabilization. Spine 2008;33:2334–40.

[8] Rivero-Arias O, Campbell H, Gray A, et al. Surgical stabilisation of the spine compared with a programme of intensive rehabilitation for the management of patients with chronic low back pain: cost utility analysis based on a randomised controlled trial. BMJ 2005;330:1239.

[9] Fritzell P, Hagg O, Jonsson D, Nordwall A. Cost-effectiveness of lumbar fusion and nonsurgical treatment for chronic low back pain in the Swedish Lumbar Spine Study: a multicenter, randomized, controlled trial from the Swedish Lumbar Spine Study Group. Spine 2004;29:421–34; discussion Z3.

[10] Deyo RA, Mirza SK, Martin BI, et al. Trends, major medical complications, and charges associated with surgery for lumbar spinal stenosis in older adults. JAMA 2010;303:1259–65.

[11] Lord SJ, Staub LP, Bossuyt PM, Irwig LM. Target practice: choosing target conditions for test accuracy studies that are relevant to clinical practice. BMJ 2011;343:d4684.

[12] Schunemann HJ, Oxman AD, Brozek J, et al. Grading quality of evidence and strength of recommendations for diagnostic tests and strategies. BMJ 2008;336:1106–10.

[13] Modic MT, Masaryk TJ, Ross JS, Carter JR. Imaging of degenerative disk disease. Radiology 1988;168:177–86.

[14] Pfirrmann CW, Metzdorf A, Zanetti M, et al. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine 2001;26:1873–8.

[15] Resnick DK, Choudhri TF, Dailey AT, et al. Guidelines for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 2: assessment of functional outcome. J Neurosurg Spine 2005;2:639–46.

[16] Guyer RD, Ohnmeiss DD. Lumbar discography. Position statement from the North American Spine Society Diagnostic and Therapeutic Committee. Spine 1995;20:2048–59.

[17] Esses SI, Moro JK. The value of facet joint blocks in patient selection for lumbar fusion. Spine 1993;18:185–90.

[18] Rask B, Dall BE. Use of the pantaloon cast for the selection of fusion candidates in the treatment of chronic low back pain. Clin Orthop Relat Res 1993;288:148–57.

[19] van der Schaaf DB, van Limbeek J, Pavlov PW. Temporary external transpedicular fixation of the lumbosacral spine. Spine 1999;24:481–4; discussion 4–5.

[20] Devillé WL, Buntinx F, Bouter LM, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002;2:9.

[21] Whiting P, Rutjes AW, Reitsma JB, et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.

[22] Turner JA, Deyo RA, Loeser JD, et al. The importance of placebo effects in pain treatment and research. JAMA 1994;271:1609–14.

[23] Ostelo RW, Deyo RA, Stratford P, et al. Interpreting change scores for pain and functional status in low back pain: towards international consensus regarding minimal important change. Spine 2008;33:90–4.

[24] Zamora J, Abraira V, Muriel A, et al. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 2006;6:31.

[25] Jellema P, van der Windt DA, Bruinvels DJ, et al. Value of symptoms and additional diagnostic tests for colorectal cancer in primary care: systematic review and meta-analysis. BMJ 2010;340:c1269.

[26] van der Windt DA, Jellema P, Mulder CJ, et al. Diagnostic testing for celiac disease among patients with abdominal symptoms: a systematic review. JAMA 2010;303:1738–46.

[27] Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097.

[28] Ohtori S, Kinoshita T, Yamashita M, et al. Results of surgery for discogenic low back pain: a randomized study using discography versus discoblock for diagnosis. Spine 2009;34:1345–8.

[29] Lovely TJ, Rastogi P. The value of provocative facet blocking as a predictor of success in lumbar spine fusion. J Spinal Disord 1997;10:512–7.

[30] Bednar DA, Raducan V. External spinal skeletal fixation in the management of back pain. Clin Orthop Relat Res 1996;322:131–9.

[31] Axelsson P, Johnsson R, Stromqvist B, Andreasson H. Temporary external pedicular fixation versus definitive bony fusion: a prospective comparative study on pain relief and function. Eur Spine J 2003;12:41–7.

[32] Madan S, Gundanna M, Harley JM, et al. Does provocative discography screening of discogenic back pain improve surgical outcome? J Spinal Disord Tech 2002;15:245–51.

[33] Peng B, Chen J, Kuang Z, et al. Diagnosis and surgical treatment of back pain originating from endplate. Eur Spine J 2009;18:1035–40.

[34] Buttermann GR, Heithoff KB, Ogilvie JW, et al. Vertebral body MRI related to lumbar fusion results. Eur Spine J 1997;6:115–20.

[35] Chataigner H, Onimus M, Polette A. [Surgery for degenerative lumbar disc disease. Should the black disc be grafted?]. Rev Chir Orthop Reparatrice Appar Mot 1998;84:583–9.

[36] Esposito P, Pinheiro-Franco JL, Froelich S, Maitrot D. Predictive value of MRI vertebral end-plate signal changes (Modic) on outcome of surgically treated degenerative disc disease. Results of a cohort study including 60 patients. Neurochirurgie 2006;52:315–22.

[37] Christensen FB, Karlsmose B, Hansen ES, Bunger CE. Radiological and functional outcome after anterior lumbar interbody spinal fusion. Eur Spine J 1996;5:293–8.

[38] Derby R, Howard MW, Grant JM, et al. The ability of pressure-controlled discography to predict surgical and nonsurgical outcomes. Spine 1999;24:364–71; discussion 71–2.

[39] Axelsson P, Johnsson R, Stromqvist B, et al. Orthosis as prognostic instrument in lumbar fusion: no predictive value in 50 cases followed prospectively. J Spinal Disord 1995;8:284–8.


[40] Willems PC, Elmans L, Anderson PG, et al. The value of a pantaloon cast test in surgical decision making for chronic low back pain patients: a systematic review of the literature supplemented with a prospective cohort study. Eur Spine J 2006;15:1487–94.

[41] Willems PC, Elmans L, Anderson PG, et al. Provocative discography and lumbar fusion: is preoperative assessment of adjacent discs useful? Spine 2007;32:1094–9; discussion 1100.

[42] Colhoun E, McCall IW, Williams L, Cassar Pullicino VN. Provocation discography as a guide to planning operations on the spine. J Bone Joint Surg Br 1988;70:267–71.

[43] Elmans L, Willems PC, Anderson PG, et al. Temporary external transpedicular fixation of the lumbosacral spine: a prospective, longitudinal study in 330 patients. Spine 2005;30:2813–6.

[44] Heini PF, Gahrich U, Orler R. The external fixator: a tool for evaluation of complex low back pain problems. J Spinal Disord Tech 2004;17:8–14.

[45] Gill K, Blumenthal SL. Functional results after anterior lumbar fusion at L5-S1 in patients with normal and abnormal MRI scans. Spine 1992;17:940–2.

[46] Jeanneret B, Jovanovic M, Magerl F. Percutaneous diagnostic stabilization for low back pain. Correlation with results after fusion operations. Clin Orthop Relat Res 1994;304:130–8.

[47] Esses SI, Botsford DJ, Kostuik JP. The role of external spinal skeletal fixation in the assessment of low-back disorders. Spine 1989;14:594–601.

[48] Weinstein JN, Lurie JD, Olson PR, et al. United States’ trends and regional variations in lumbar spine surgery: 1992-2003. Spine 2006;31:2707–14.

[49] Irwin ZN, Hilibrand A, Gustavel M, et al. Variation in surgical decision making for degenerative spinal disorders. Part I: lumbar spine. Spine 2005;30:2208–13.

[50] Katz JN, Lipson SJ, Lew RA, et al. Lumbar laminectomy alone or with instrumented or noninstrumented arthrodesis in degenerative lumbar spinal stenosis. Patient selection, costs, and surgical outcomes. Spine 1997;22:1123–31.

[51] Macaskill P, Gatsonis C, Deeks J, et al. Chapter 10: analysing and presenting results. In: Deeks J, Bossuyt P, Gatsonis C, eds. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0. West Sussex, UK: The Cochrane Collaboration, John Wiley & Sons, 2010.

[52] Guzman J, Esmail R, Karjalainen K, et al. Multidisciplinary rehabilitation for chronic low back pain: systematic review. BMJ 2001;322:1511–6.

[53] van der Roer N, Ostelo RW, Bekkering GE, et al. Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine 2006;31:578–82.

[54] Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med 2001;134:657–62.

[55] von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol 2008;61:344–9.

[56] Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295–302.
