Guide to scoring evidence using the Maryland Scientific Methods Scale

Updated June 2016

Contents

1. Introduction
2. Impact evaluation
   Using Counterfactuals
3. SMS 5 methods
   Randomised Control Trial (RCT)
4. SMS 4 methods
   Instrumental variables (IV)
   Regression Discontinuity Design (RDD)
5. SMS 3/4 methods
6. SMS 3 methods
   Difference-in-differences (DiD)
   Panel data methods
   Propensity Score Matching (PSM)
7. SMS 2 methods and below
   Cross-sectional regression
   Before-and-after
   Additionality (not SMS scoreable)
   Impact modelling (not SMS scoreable)
Appendix 1: Quick-scoring guide for the Maryland Scientific Method Scale
Appendix 2: Mixed methods scoring 4 or 3
   Hazard Regressions
   Heckman two-stage correction (H2S) / Control function (CF)
Appendix 3: Arellano-Bond method


Guide to scoring evidence using the Maryland Scientific Method Scale - Updated June 2016 1


Introduction

The What Works Centre for Local Economic Growth (WWG) produces systematic reviews of the evidence base on a broad range of policies in the area of local economic growth. An important step in the review process is the assessment of whether an evaluation provides convincing evidence on likely policy impacts. Our assessment is based on the scoring of papers on the Maryland Scientific Methods Scale (SMS), which ranks policy evaluations from 1 (least robust) to 5 (most robust) according to the robustness of the method used and the quality of its implementation. Robustness, as judged by the Maryland SMS, is the extent to which the method deals with the selection biases inherent to policy evaluations.

This document examines a wide variety of commonly employed methods and explains how we place them on the Maryland SMS. It then looks at a number of examples of policy evaluations for each method, scoring them on the quality of their implementation.

This scoring guide plays several important roles. Firstly, consistent with our commitment to openness, it serves as documentation as to how we rank studies by robustness for our series of systematic reviews and enables us to be confident that we are achieving a level of consistency in our ranking across the team. Secondly, it acts as a scoring handbook for anyone wanting to assess the robustness of a particular policy evaluation. This provides useful guidance on how much weight to put on a particular piece of evidence. Finally, it can help organisations undertaking evaluations either in assessing them after completion or in helping choose between methodologies beforehand.

This document is no substitute for better technical training and expert advice. But it should help those with some knowledge of evaluation techniques to better understand recent advances and the way that we treat these in our systematic reviews. It’s also important to note that the ranking of individual studies is not an exact science and often involves a degree of judgement. Statistical testing can only take us so far in assessing the suitability of a given method and the quality of its application in practice. When assessing an individual study (including the specific examples that we discuss here) there is always scope for some disagreement on the exact ranking. Indeed, anyone who has attended an academic seminar will know the extent to which such issues can be hotly disputed. That said, on average our scoring will tend to produce rankings on which many evaluation experts would broadly agree. Of course, when pulling together our evidence reviews we look at the overall balance of the evidence and it’s therefore highly unlikely that a specific ranking on any given study will influence the conclusions that we reach on policy effectiveness.

The examples that we use are drawn from a wide range of studies. Not all of them are specifically focussed on local economic growth but in all cases they demonstrate approaches that could feasibly be taken in future evaluations.


Impact evaluation

Governments around the world increasingly have strong systems to monitor policy inputs (such as the amount of loans guaranteed to SMEs, which policymakers hope will create new jobs) and outputs (such as the number of firms that have received loan guarantees). However, they are less good at identifying policy outcomes (such as the effect of providing a loan guarantee on firm employment). In particular, many government-sponsored evaluations that look at outcomes do not use credible strategies to assess the causal impact of policy interventions.

By causal impact, the evaluation literature means an estimate of the difference that can be expected between the outcome for (in this example) firms ‘treated’ in a programme, and the average outcome they would have experienced without it. Pinning down causality is a crucially important part of impact evaluation. Estimates of the benefits of a project are of limited use to policymakers unless those benefits can be attributed, with a reasonable degree of certainty, to that project.

The credibility with which evaluations establish causality is the criterion on which this review assesses the literature.

Using Counterfactuals

Establishing causality requires the construction of a valid counterfactual, i.e. what would have happened to programme participants had they not been treated under the programme. That outcome is fundamentally unobservable, so researchers spend a great deal of time trying to rebuild it. The way in which this counterfactual is (re)constructed is the key element of impact evaluation design.

A standard approach is to create a counterfactual group of similar places, people or businesses not undertaking the kind of project being evaluated. Changes in outcomes can then be compared between the ‘treatment group’ (those affected by the policy) and the ‘control group’ (similar places, people or businesses not exposed to the policy).

A key issue in creating the counterfactual group is dealing with the ‘selection into treatment’ problem. Selection into treatment occurs when participants in the programme differ from those who do not participate in the programme.


An example of this problem for access to finance programmes, as in our example above, would be when more ambitious firms apply for support. If this happens, estimates of policy impact may be biased upwards because we incorrectly attribute better firm outcomes to the policy, rather than to the fact that the more ambitious participants would have done better even without the programme.

Selection problems may also lead to downward bias. For example, firms that apply for support might be experiencing problems and such firms may be less likely to grow or succeed independent of any advice they receive. These factors are often unobservable to researchers.
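The upward bias from selection into treatment can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the firm 'ambition' variable, the effect sizes and the function name are all hypothetical and are not taken from any study discussed in this guide. Ambitious firms are more likely to apply and would also have grown faster anyway, so a naive comparison of applicants with non-applicants overstates the programme effect.

```python
import random
import statistics


def naive_treated_vs_untreated(n_firms=20_000, true_effect=2.0, seed=0):
    """Simulate firms that self-select into a support programme.

    Returns the naive difference in mean outcomes between firms that
    applied (treated) and firms that did not (untreated).
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n_firms):
        ambition = rng.gauss(0, 1)  # unobservable to the evaluator
        # More ambitious firms are more likely to apply for support.
        applies = ambition + rng.gauss(0, 1) > 0
        # Ambition raises growth with or without the programme.
        growth = 3.0 * ambition + rng.gauss(0, 1)
        if applies:
            treated.append(growth + true_effect)
        else:
            untreated.append(growth)
    return statistics.mean(treated) - statistics.mean(untreated)


TRUE_EFFECT = 2.0
naive_estimate = naive_treated_vs_untreated(true_effect=TRUE_EFFECT)
print(f"true effect: {TRUE_EFFECT:.2f}, naive estimate: {naive_estimate:.2f}")
```

In this set-up the naive estimate should come out well above the true effect, because part of the gap between the groups reflects ambition rather than the programme. Reversing the sign of the ambition term in the application rule would produce the downward bias described above.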

So the challenge for good programme evaluation is to deal with these issues, and to demonstrate that the control group is plausible. If the construction of plausible counterfactuals is central to good policy evaluation, then the crucial question becomes: how do we design counterfactuals? This scoring guide explains the ways in which we assess how well researchers have done in answering this question. In particular, it explains how we rank impact evaluations using an adjusted version of the Maryland Scientific Methods Scale (SMS).¹ The SMS is a five-point scale ranging from 1, for evaluations based on simple cross-sectional correlations, to 5 for randomised control trials. Box 1 provides an overview of the scale and highlights some of the approaches that we discuss in more detail in the remainder of this guide.

1 Sherman, Gottfredson, MacKenzie, Eck, Reuter, and Bushway (1998).


Box 1: Our robustness scores (based on an adjusted Maryland Scientific Methods Scale)

Level 1: Either (a) a cross-sectional comparison of treated groups with untreated groups, or (b) a before-and-after comparison of treated group, without an untreated comparison group. No use of control variables in statistical analysis to adjust for differences between treated and untreated groups or periods.

Level 2: Use of adequate control variables and either (a) a cross-sectional comparison of treated groups with untreated groups, or (b) a before-and-after comparison of treated group, without an untreated comparison group. In (a), control variables or matching techniques are used to account for cross-sectional differences between treated and control groups. In (b), control variables are used to account for before-and-after changes in macro-level factors.

Level 3: Comparison of outcomes in treated group after an intervention, with outcomes in the treated group before the intervention, and a comparison group used to provide a counterfactual (e.g. difference in difference). Justification given to choice of comparator group that is argued to be similar to the treatment group. Evidence presented on comparability of treatment and control groups. Techniques such as regression and (propensity score) matching may be used to adjust for difference between treated and untreated groups, but there are likely to be important unobserved differences remaining.

Level 4: Quasi-randomness in treatment is exploited, so that it can be credibly held that treatment and control groups differ only in their exposure to the random allocation of treatment. This often entails the use of an instrument or discontinuity in treatment, the suitability of which should be adequately demonstrated and defended.

Level 5: Reserved for research designs that involve explicit randomisation into treatment and control groups, with Randomised Control Trials (RCTs) providing the definitive example. Extensive evidence provided on comparability of treatment and control groups, showing no significant differences in terms of levels or trends. Control variables may be used to adjust for treatment and control group differences, but this adjustment should not have a large impact on the main results. Attention paid to problems of selective attrition from randomly assigned groups, which is shown to be of negligible importance. There should be limited or, ideally, no occurrence of ‘contamination’ of the control group with the treatment.

Note: These levels are based on but not identical to the original Maryland SMS. The levels here are generally a little stricter than the original scale to help to clearly separate levels 3, 4 and 5, which form the basis for our evidence reviews.


SMS 5 methods

SMS 5 methods require full randomisation of programme participation. This can only be done by a randomised control trial. As such there is only one method in this category on the SMS. Randomisation, properly applied, means there is no selection into the treatment. This ensures that there are no systematic differences between the treatment group and control group either on observable (e.g. age) or unobservable (e.g. ability) characteristics. Any difference post-treatment must therefore be an effect of the treatment. Evidence of this type is the highest quality of evidence and is considered the ‘gold standard’ for policy evaluations.

Randomised Control Trial (RCT)

An RCT is defined by random assignment to either the treatment or control group. A typical RCT will involve the following steps. First, the original programme applicants may be pre-screened on eligibility requirements. Second, a lottery (computer randomisation) assigns a percentage of the eligible applicants (usually 50%) to the control group and the remainder to the treatment group. Third, baseline data is collected (either from an existing data source or from a bespoke ‘baseline’ survey). Fourth, the treatment is applied. Fifth and finally, data is collected some time after the treatment (again, either from an existing data source or a bespoke ‘follow-up’ survey). Because individuals are randomly assigned to the treatment and control groups, there is no reason that they should differ on either observable or unobservable characteristics in the baseline data. This means that differences in the outcome variables in the post-treatment data will be entirely attributable to the treatment, i.e. are free from selection bias. For this reason an RCT scores the maximum score of 5 on the Maryland scale.
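The lottery step, and a simple comparison of one baseline characteristic across the two resulting groups, can be sketched as follows. This is an illustrative sketch only: the applicant data are simulated, and the function names, the 50% control share and the use of a Welch t-statistic are assumptions made for the example rather than a procedure prescribed by this guide.

```python
import math
import random
import statistics


def lottery(applicant_ids, share_control=0.5, seed=0):
    """Randomly split eligible applicants into (treatment, control) lists."""
    rng = random.Random(seed)
    shuffled = list(applicant_ids)
    rng.shuffle(shuffled)
    n_control = int(len(shuffled) * share_control)
    return shuffled[n_control:], shuffled[:n_control]


def welch_t(sample_a, sample_b):
    """Welch t-statistic for a difference in means between two groups."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error


# Hypothetical baseline data: applicant id -> age at baseline.
data_rng = random.Random(1)
baseline_age = {i: data_rng.gauss(35, 10) for i in range(2_000)}

treatment_ids, control_ids = lottery(baseline_age.keys())
t_stat = welch_t(
    [baseline_age[i] for i in treatment_ids],
    [baseline_age[i] for i in control_ids],
)
print(f"baseline age, t-statistic (treatment vs control): {t_stat:.2f}")
```

With genuinely random assignment, the t-statistic on a baseline characteristic would usually be small in absolute value (conventionally below about 2). A real evaluation would repeat the comparison across a range of baseline characteristics.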

In order for an RCT to achieve the maximum SMS 5 score on implementation, three criteria must be met. Firstly, the randomisation must be successful. Whether or not this is the case is often tested using “balancing tests” that compare treated and control individuals in the baseline data on a range of characteristics. If randomisation has been successful then the balancing tests should show no significant differences between the two groups. Secondly, attrition must be carefully addressed. Attrition happens whenever individuals drop out from the study, e.g. do not complete treatment or do not provide follow-up data (when this is collected using a follow-up survey). The availability of follow-up data is often a particular problem for individuals in the control group since they have less incentive to stay in the study. If dropout is on a random basis, then attrition is not an issue. However, there may be elements of selection bias for dropping out. For instance, it may be the case that the less skilled individuals in a control group choose to leave an unemployment training study. This would make the remaining control group more skilled on average, increasing the likelihood that they find employment. In turn, this would lead to a downwardly biased treatment effect. For this reason, policy evaluators should pay careful attention to the issue of attrition. If attrition is liable to selection bias, then evaluators should satisfactorily address the issue.

For instance, in a study that looks at the impact of neighbourhood crime on the propensity of refugees to commit crime themselves, if certain individuals drop out of the study because they move abroad, then there is a problem of potentially biased attrition. To overcome this, the authors could look into what factors lead refugees to go abroad, and include these controls in their study (we discuss an example of this below).

Thirdly, the experiment should be designed such that the treatment does not spill over to the control group, i.e. contamination must not be an issue. In order for the control group to be a suitable comparator, it must not in any way be exposed to the treatment. If this condition is not respected, then the treatment effect may be downwardly biased because the control group partly benefits from the treatment. Contamination may happen if, for example, individuals in a given neighbourhood are given financial management advice, and they share their knowledge with people in other neighbourhoods who are supposed to go untreated.
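The effect of selective attrition described above can be illustrated with a simulated training study. This is a hedged sketch: the skill variable, the dropout rule (all controls with below-average skill leave) and the effect size are invented for illustration and are deliberately extreme.

```python
import random
import statistics


def estimate_with_attrition(n=20_000, true_effect=1.0, seed=0):
    """Return (estimate with full follow-up, estimate after selective attrition)."""
    rng = random.Random(seed)
    treated, control_all, control_stayers = [], [], []
    for _ in range(n):
        skill = rng.gauss(0, 1)
        is_treated = rng.random() < 0.5  # random assignment
        outcome = skill + rng.gauss(0, 1) + (true_effect if is_treated else 0.0)
        if is_treated:
            treated.append(outcome)
        else:
            control_all.append(outcome)
            # Assumed dropout rule: less skilled controls leave the study.
            if skill > 0:
                control_stayers.append(outcome)
    full = statistics.mean(treated) - statistics.mean(control_all)
    attrited = statistics.mean(treated) - statistics.mean(control_stayers)
    return full, attrited


full_estimate, attrited_estimate = estimate_with_attrition()
print(f"full follow-up: {full_estimate:.2f}, after attrition: {attrited_estimate:.2f}")
```

With full follow-up the estimate should sit close to the true effect. Once the less skilled controls drop out, the remaining control group looks better than it should and the estimated effect shrinks, which is the downward bias discussed in the text.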

Method: Randomised Control Trial (RCT), a.k.a. Field Experiment

Maximum SMS score (method, implementation):

5, 5 if

• Randomisation is successful

• Attrition carefully addressed or not an issue

• Contamination not an issue

Adjusted SMS score (method, implementation):

5, 4 if

• One of the criteria is severely violated

5, 3 if

• Two or more of the criteria are severely violated
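The scoring rule in the table above can be written as a small function. This is a minimal sketch: the function name and boolean arguments are illustrative assumptions, and deciding whether a criterion is 'severely violated' remains a matter of judgement that the code simply takes as given.

```python
def score_rct_implementation(randomisation_ok, attrition_ok, contamination_ok):
    """Return the SMS implementation score for an RCT (method score is 5).

    Each argument is True if the criterion is met and False if it is
    judged to be severely violated.
    """
    violations = [randomisation_ok, attrition_ok, contamination_ok].count(False)
    if violations == 0:
        return 5
    if violations == 1:
        return 4
    return 3  # two or more criteria severely violated


print(score_rct_implementation(True, True, True))
print(score_rct_implementation(False, False, False))
```

The two calls correspond to a study that passes all three criteria and a study that fails all three.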

RCT Example 1 (SMS 5, 5)

A 2013 paper by Ludwig et al. published in the American Economic Review evaluates the impact of neighbourhood quality on life outcomes.² In this case, the evaluation is difficult because it is possible that the residents of disadvantaged neighbourhoods have worse outcomes because of individual or family characteristics. To an extent, certain types of families “self-select” into bad neighbourhoods, meaning that there is a problem of selection bias. In this instance, the authors overcome this issue by evaluating the impact of Moving to Opportunity (MTO), a programme that gave families within disadvantaged neighbourhoods the opportunity to move to more affluent areas on a random basis. In order for the study to achieve the maximum SMS score of 5, the three criteria must be met.

1. Randomisation is successful

Pass

The authors make a convincing argument in favour of the truly random nature of the MTO programme. In a preliminary stage, interested residents of poor neighbourhoods (4,604 low-income public housing families) enrolled in MTO. These families were randomly assigned to one of three groups: the Experimental group (which received housing vouchers that subsidised private-market rents in low-poverty communities), the Section 8 group (which received unconstrained housing vouchers), and the control group. Balancing tests show that treatment and control groups are similar according to a range of observable characteristics, meaning that the randomisation was successful.

2 Ludwig, J., Duncan, G., Gennetian, L., Katz, L., Kessler, R., Kling, J., Sanbonmatsu, L. (2013). Long-Term Neighborhood Effects on Low-Income Families: Evidence from Moving to Opportunity. American Economic Review, 226-231.

2. Attrition carefully addressed

Pass

Around 90 percent of the individuals that were originally enrolled in the programme were interviewed 10 to 15 years after the baseline year (from 1994 to 1998). This is a fairly high rate, so that sample attrition is not a problem in this instance.

3. Contamination not an issue

Pass

The vouchers were in no way transferable, meaning that contamination is not an issue in this case.

Given that all three criteria are satisfied, this study achieves SMS 5 for implementation.

RCT Example 2 (SMS 5, 5)

A 2014 paper by Damm and Dustmann published in the American Economic Review evaluates the impact of early exposure to neighbourhood crime on subsequent criminal behaviour.³ Isolating how neighbourhoods affect young people’s propensity to commit crimes is difficult, as it is possible that children growing up in crime-ridden communities come from family environments that are themselves conducive to criminal behaviour. To overcome this problem, the authors exploit an event in which asylum-seeking families were randomly assigned to communities with differing levels of crime by the Danish government.

1. Randomisation is successful

Pass

Upon being approved for refugee status, asylum-seekers were assigned temporary housing in one of Denmark’s 15 counties. Within the county, the council’s local office assigned each family to a municipality. This assignment was random, although councils were aware of birth dates, marital statuses, number of children, and nationalities. The authors highlight the fact that the council in no way based its decision on the family’s educational attainment, criminal record, or family income, simply because it was not privy to this information. It also did not know anything about the family besides what was written in the questionnaire. Furthermore, the family’s personal preferences were not taken into account. The baseline balancing test confirms that the treatment and control groups were indeed observably similar before the treatment.

2. Attrition carefully addressed

Pass

Of the sample of 5,615 refugee children, 975 left Denmark before the age of 21. Furthermore, an additional 215 children were not observed in every year between arrival and age 21. Nonetheless, the authors find no significant relationship between leaving Denmark or not being observed all years and the outcome variables. This indicates that there is no significant selection bias inherent to dropping out of the study.

3 Damm, A. & Dustmann, C. (2014). Does Growing Up in a High Crime Neighborhood Affect Youth Criminal Behavior? American Economic Review, 1806-1832.


3. Contamination not an issue

Pass

Although families were randomly assigned to a municipality and were encouraged to stay there for at least the duration of the 18-month introductory period, there were no strict relocation restrictions. This means that it is possible that families that received the treatment (being in a neighbourhood with more crime) did not necessarily adhere to the treatment, and families that were controls (lived in safer neighbourhoods) may have been exposed to the treatment. The authors find that around 80 percent of the families stayed in their assigned area after the first year, and half of all families stayed in the assigned area after eight years.

Given that all three criteria are satisfied, this study achieves SMS 5 for implementation.

RCT Example 3 (SMS 5, 3)

A 2012 report by Doyle evaluates the impact of Preparing for Life (PFL), a programme that provides support to families from pregnancy until the child is old enough to start school.⁴ It is likely that there is a significant selection into treatment, as more concerned parents may choose to join the programme. Therefore, in order to accurately understand the effect of the treatment, the author employs a randomised control trial.

1. Randomisation is successful

Fail

PFL is specific to certain deprived catchment areas in Ireland. Within these catchment areas, 52 percent of all pregnant women participated in the programme, with 26 percent rejecting the offer and another 22 percent not having been identified. In a subsequent stage, PFL recipients were randomly assigned to one of two groups: high treatment and low treatment. An observationally similar community was later used as a control, which implies that treatment was not, in fact, randomly assigned.

2. Attrition carefully addressed

Fail

The authors do not discuss the extent to which treated women dropped out of the programme.

3. Contamination not an issue

Fail

Given that a large part of PFL entails the provision of information (i.e. mentoring and development information packs) within a community, it is possible that neighbours share the contents of their treatment. This is particularly true for the high and low treatment groups, as they live in the same community.

Given that all three criteria are violated, this study achieves SMS 3 for implementation.

4 Doyle, O., & UCD Geary Institute PFL Evaluation Team. (2012). Preparing for Life Early Childhood Intervention: Assessing the Early Impact of Preparing for Life at Six Months.


SMS 4 methods

SMS 4 methods are characterised by the exploitation of some source of ‘quasi-randomness’. That is, randomness that has not been deliberately imposed but arises because of some other reason. Usually, this means identifying historical, social or natural factors that result in policy being implemented in a way that is to some extent random. By identifying cases where a policy was implemented to some extent randomly, the SMS 4 methods try to ensure that the treatment and control groups are similar on observable and unobservable characteristics. However, unlike controlled randomisation, they need to make the case that the resulting variation in treatment is truly random. If this argument fails in practice then treatment and control groups differ and this can lead us to incorrect conclusions about treatment effects. It is for this reason that these methods score 4, rather than a maximum 5, on the SMS scale.

Instrumental variables (IV)

To solve the problem of selection bias, a policy evaluation can use the instrumental variables (IV) method. This approach entails finding something that explains treatment but has no direct effect on the outcome of interest (and is not related to any other factors that might determine that outcome). This factor, the ‘instrument’, substitutes for the treatment variable that may itself be correlated with other characteristics that affect outcomes (and thus cause selection bias). For instance, when looking at the impact of highways on population density within cities, it is possible that not only might highways impact population density, but that the construction of highways might respond to changes in population density. A good way to circumvent this problem, then, is to employ an IV. For example (discussed further below), Baum-Snow (2007) uses a US highway grid that was planned in 1947 (but not necessarily built) as an instrument for actual US highways. The logic behind this instrument is that it is essentially random in its assignment (as far as population density goes), because the 1947 grid was planned for military and trade purposes. This element of randomness is essential for a successful IV. In this sense, it is a way of simulating the random assignment to treatment that is done with randomised control trials. A successful instrument has the potential to eliminate differences between treatment and control group on observable or unobservable characteristics and thus score a maximum of 4 on the SMS scale.

In order to achieve this score, the instrument must satisfy three main criteria. Firstly, the instrument must be exogenous, or random in assignment. Secondly, it must be relevant, or credibly related to the variable that it replaces. Thirdly, it must be excludable, meaning that it affects the outcome only through the treatment and not directly.
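The logic of the three criteria can be illustrated with simulated data in which all of them hold by construction. This is a sketch under invented assumptions: the variable names, coefficients and sample size are hypothetical. With a single instrument and a single treatment variable, the two-stage least squares estimate reduces to the ratio of two covariances, which is what the code computes.

```python
import random
import statistics


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate_iv(n=50_000, true_effect=1.0, seed=0):
    """Return (naive regression slope, IV estimate) on simulated data."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        confounder = rng.gauss(0, 1)  # unobserved, drives treatment and outcome
        instrument = rng.gauss(0, 1)  # exogenous: drawn independently of everything
        # Relevant: the instrument shifts the treatment.
        treatment = 0.8 * instrument + confounder + rng.gauss(0, 1)
        # Excludable: the instrument enters the outcome only through treatment.
        outcome = true_effect * treatment + 2.0 * confounder + rng.gauss(0, 1)
        z.append(instrument)
        x.append(treatment)
        y.append(outcome)
    ols = covariance(x, y) / covariance(x, x)
    iv = covariance(z, y) / covariance(z, x)
    return ols, iv


ols_estimate, iv_estimate = simulate_iv()
print(f"naive regression: {ols_estimate:.2f}, IV: {iv_estimate:.2f}")
```

The naive regression slope should be pushed away from the true effect by the unobserved confounder, while the IV estimate should lie close to it. If the instrument were allowed to enter the outcome equation directly, or were correlated with the confounder, the IV estimate would no longer be reliable.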

Method: Instrumental Variable (IV), a.k.a. Two-Stage Least Squares (2SLS)

Maximum SMS score (method, implementation):

4, 4 if instrument is:

• Relevant (explains treatment)

• Exogenous (not explained by outcome)

• Excludable (does not directly affect outcome)

Adjusted SMS score (method, implementation):

Scored as per underlying method if:

• Instrument is invalid

• e.g. cross section with invalid IV scores 2; difference-in-difference with invalid IV scores 3 (see below)
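The fallback rule in the table above, where an invalid instrument means the study is scored as its underlying method, can be sketched as follows. The function name and arguments are illustrative assumptions; the underlying method score has to be supplied by the person scoring the study.

```python
def score_iv_implementation(relevant, exogenous, excludable, underlying_method_score):
    """Return the SMS implementation score for an IV study.

    A valid instrument (all three criteria met) scores 4. Otherwise the
    study is scored as its underlying method, e.g. 2 for a cross section
    with controls or 3 for difference-in-difference.
    """
    if relevant and exogenous and excludable:
        return 4
    return underlying_method_score


# A cross-sectional study whose instrument is relevant but fails the
# exogeneity and exclusion criteria falls back to its underlying score of 2.
print(score_iv_implementation(True, False, False, underlying_method_score=2))
```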

IV Example 1 (SMS 4, 4)

A 2007 paper by Nathaniel Baum-Snow published in the Quarterly Journal of Economics evaluates the impact of highways on suburbanisation.⁵ This is a difficult question to evaluate because of the potential presence of selection bias: highways that were built may have been constructed to accommodate suburbanisation. In order to overcome this problem, the author uses the IV method, with highways that were originally planned (but not necessarily built) in 1947 as an instrument for actual highways. In order to achieve the maximum SMS score of 4, the instrument must satisfy the three criteria.

1. Instrument is relevant

Pass

There is no doubt that the original 1947 plan is a relevant instrument because the current highway system is partly based on it.

2. Instrument is exogenous

Pass

The 1947 plan is an exogenous instrument because it was not planned with suburbanisation in mind but to connect cities for trade purposes. One threat to exogeneity is that it may have been the case that larger cities were given denser highway systems in the 1947 plan. Therefore the author controls for the 1947 population of each city. Conditional on this important control, the instrument is convincingly exogenous.

3. Instrument is excludable

Pass

By virtue of the 1947 grid being simply a plan, there is no way it could have directly affected suburbanisation, hence the exclusion restriction holds.

Given that all three criteria are satisfied, this study achieves SMS 4 for implementation.

5 Baum-Snow, N. (2007). Did Highways Cause Suburbanization? Quarterly Journal of Economics, 775-805.


IV Example 2 (SMS 4, 4)

A 2013 paper by Collins and Shester evaluates the impact of urban renewal programmes on economic outcomes in various U.S. cities.⁶ These programmes are particularly problematic in terms of evaluation because it may be that poorer cities self-select into treatment because they are more in need of urban renewal schemes, or conversely, that richer cities self-select into treatment because they can afford to undertake more urban renewal. To overcome these issues, the authors use variation in the timing of when states approved the urban renewal laws as an instrument.

1. Instrument is relevant

Pass

During the roll-out of the policy, the instrument of state-level delays strongly predicted whether or not urban renewal programmes were in place.

2. Instrument is exogenous

Pass

The authors argue that the states’ decision to allow urban renewal laws is somewhat random in its precise timing. They concede, however, that states’ receptiveness to federal intervention may be an indicator of liberalness, which may in turn be positively related to economic progress. Thus, controls for state conservatism are included. Conditional on this control, the argument appears sound.

3. Instrument is excludable

Pass

They argue that if timing of state approval of urban renewal laws directly affected urban economic outcomes, then it would also affect the economy of the rest of the state. Finding that the rural areas of states that passed the laws later were not economically different from the rural areas of states that passed the laws earlier, the authors convincingly demonstrate that the exclusion restriction holds.

Given that all three criteria are satisfied, this study achieves SMS 4 for implementation.

IV Example 3 (SMS 4, 2)

A 2011 paper by Nishimura and Okamuro evaluates the impact of industrial clusters on the number of patents a firm applies for. This evaluation is problematic because firms that choose to locate within an industrial cluster are probably inherently different from those that do not. To overcome this, the authors use firm age as an instrument for cluster participation. The logic behind this instrument is that smaller firms are attracted to industrial clusters. Firm size is, in turn, related to firm age. Furthermore, the authors argue that the age of a firm is somewhat random given that the number of patents that a firm applies for does not affect its age.

1. Instrument is relevant

Pass

The authors argue that only small and medium firms gravitate towards clusters because large, international firms are less interested in regional collaborations. They maintain that the size of a firm is, in turn, significantly related to its age. This appears a reasonable argument.

6 Collins, W. & Shester, K. (2013). Slum Clearance and Urban Renewal in the United States. American Economic Journal: Applied Economics, 239-273.


2. Instrument is exogenous

Fail

This instrument is not exogenous because firm age is not randomly assigned; only relatively successful firms survive beyond a certain age.

3. Instrument is excludable

Fail

The authors claim that firm age does not directly influence the number of patents a firm applies for. However, young firms (e.g. start-ups) may be more innovative and apply for more patents. Conversely, old firms are typically larger and may have a greater budget for R&D that could lead to patents. If either is true then the exclusion restriction is violated. The authors do not address these concerns.

Given that only one of the three criteria is satisfied, this study is scored according to its underlying method (cross sectional with controls) and achieves SMS 2 for implementation.

Regression Discontinuity Design (RDD)

This method can be applied in cases where there are cut-offs for treatment eligibility. These discontinuities in treatment (whereby units are treated as long as they are above/below a certain observable threshold) can be exploited to generate quasi-random assignment into treatment. Essentially, this involves comparing units that are just above the threshold (and hence treated) to those that are just below the threshold (and hence untreated). Whilst in general the units that are treated are different to those that are not, the units that are just either side of the cut-off are likely to be similar (both in terms of observable and unobservable characteristics). This makes treatment around the cut-off almost random. Given that this type of study exploits a certain randomness in treatment (as with the IV method), it ensures that the treatment group and control group are similar on observable and unobservable characteristics, therefore warranting a maximum of SMS 4.

Method: Regression Discontinuity Design (RDD)

Maximum SMS score (method, implementation):

4, 4 if:

• Discontinuity in treatment is sharp (e.g. strict eligibility requirement) or fuzzy discontinuity method used

• Only treatment changes at boundary

• Behaviour is not manipulated to make the cut-off

Adjusted SMS score (method, implementation):

Scored as per underlying method if:

• Discontinuity conditions severely violated

• e.g. cross section with invalid RDD scores 2; difference-in-difference with invalid RDD scores 3 (see below)

However, in order for the study to achieve this maximum score, three assumptions must be satisfied. Firstly, the discontinuity in treatment must be sufficiently sharp. This means that the threshold for treatment must be clear and enforced in practice, so that when individuals above and below the threshold are compared the researcher is in fact comparing treated to untreated individuals. Secondly, only the treatment should change at the boundary. This is important to ensure comparability between treatment and control groups. This assumption would be violated if, for example, firms that had fewer than ten employees had to pay lower payroll taxes in addition to being eligible for a grant. Finally, behaviour mustn’t be manipulated around the cut-off, so as to guarantee that individuals around the threshold are in fact comparable in every way except for their exposure to treatment. This is particularly relevant if the threshold itself induces units to change their behaviour; for example, firms could artificially ensure that they have fewer than ten employees even though they would otherwise hire more, just because they want to be eligible for the grant. If this happens, then it cannot be credibly held that treatment is random around the boundary.
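The idea of comparing units just either side of a cut-off can be illustrated with simulated data that satisfy the three assumptions by construction. This is a sketch under invented assumptions: the running variable (a hypothetical score from 0 to 100), the cut-off, the bandwidth and the effect size are made up, and fitting a separate straight line on each side of the cut-off is one common way of implementing the comparison rather than a method prescribed by this guide.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def simulate_rdd(n=40_000, cutoff=50.0, bandwidth=10.0, true_effect=1.5, seed=0):
    """Return (naive treated-vs-untreated gap, RDD estimate at the cut-off)."""
    rng = random.Random(seed)
    below, above, treated_all, untreated_all = [], [], [], []
    for _ in range(n):
        score = rng.uniform(0, 100)  # running variable
        treated = score >= cutoff    # sharp eligibility rule
        outcome = 0.05 * score + (true_effect if treated else 0.0) + rng.gauss(0, 1)
        (treated_all if treated else untreated_all).append(outcome)
        if abs(score - cutoff) <= bandwidth:
            # Centre the score so each fitted intercept is the value at the cut-off.
            (above if treated else below).append((score - cutoff, outcome))
    naive = statistics.mean(treated_all) - statistics.mean(untreated_all)
    intercept_above, _ = fit_line(*zip(*above))
    intercept_below, _ = fit_line(*zip(*below))
    return naive, intercept_above - intercept_below


naive_gap, rdd_estimate = simulate_rdd()
print(f"naive gap: {naive_gap:.2f}, RDD estimate: {rdd_estimate:.2f}")
```

The naive gap mixes the treatment effect with the fact that treated units have higher scores in general, whereas the estimate at the cut-off should lie close to the true effect. If units could manipulate their score to land on the treated side, this comparison would no longer be valid.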

RDD Example 1 (SMS 4, 4)

A 2013 paper by Pop-Eleches and Urquiola published in the American Economic Review evaluates the impact of school quality on economic outcomes.⁷ This is a particularly difficult question to address, as pupils of better schools are academically more capable in the first place, and therefore more likely to get better results regardless of school quality. To overcome this selection bias, the authors employ the regression discontinuity method. They exploit a discontinuity in treatment, whereby students that achieve above a certain grade point average are admitted into a higher quality high school, while those that achieve just below the required grade average attend a lower quality high school. In order for the study to achieve the maximum SMS score of 4, the three criteria must be met.

1. Discontinuity in treatment is sufficiently sharp

Pass

In Romania, high school admissions are determined solely by school averages and examination results. The standards apply to all students in the same manner. Therefore, it can be argued with certainty that the discontinuity is sharp, in that those that score below the threshold definitely do not get a place in the best school, and those that score above definitely do.

2. Only treatment changes at the boundary

Pass

Because this study looks at school quality, it can be plausibly held that the only thing that changes at the boundary is the quality of school, since students that surpass the cut-off do not get further benefits (such as merit awards for example).

3. Behaviour is not manipulated to make the cut-off

Pass

In order for this condition to hold, it cannot be, for instance, that students just above the threshold implored their teachers for slightly better grades, as that would make them inherently different to those that did not. For this case, it is unlikely that this happened given that the threshold is only known once exam results are out.

Given that all three criteria are satisfied, this study achieves SMS 4 for implementation.

RDD Example 2 (SMS 4, 4)

A 2014 study by Anderson published in the American Economic Review evaluates the impact of public transit on highway congestion.8 In this case, it is difficult to isolate the direction of causality, as the number of people who use public transport can affect highway congestion, but at the same time, highway congestion can also affect the number of people that use public transport. To overcome this issue, the author exploits a natural experiment whereby Los Angeles public transport workers suddenly went on strike for 35 days. More specifically, the author uses the RDD method to compare highway congestion just before the strike started and highway congestion just after the strike started.

7 Pop-Eleches, C. & Urquiola, M. (2013). Going to a Better School: Effects and Behavioral Responses. American Economic Review, 1289-1384.

8 Anderson, M. (2014). Subways, Strikes, and Slowdowns: The Impacts of Public Transit on Traffic Congestion. American Economic Review, 2763-2796.

1. Discontinuity in treatment is sufficiently sharp

Pass

Here, the discontinuity is very sharp. From one day to another, transit workers went on strike, meaning that they worked normally one day, and did not work at all the next day, and the days thereafter. This led to the almost complete shutdown of public transport services, with minor exceptions of some contract-operated bus services. The author argues that these emergency services were only useful for a very small fraction of public transport users.

2. Only treatment changes at boundary

Pass

The strike was spurred by a disagreement between the transport mechanics and the employers over contributions to a healthcare fund. The author argues that the timing of the strike is exogenous because it started just after the expiration of a 60-day court-ordered injunction that prohibited striking.

3. Behaviour is not manipulated to make the cut-off

Pass

It is possible that if people were expecting the strike, they changed their work arrangements to adjust to the interruptions. For instance, it could be that people started working from home when they otherwise wouldn’t have. If this were the case, the effects of public transport on highway congestion would be downwardly biased. However, the author argues that the strike led to an abrupt and unexpected halt in service, meaning that it is unlikely that people anticipated it.

Given that all three criteria are satisfied, this study achieves SMS 4 for implementation.

RDD Example 3 (SMS 4, 2)

A 2007 report by Haegland and Moen evaluates the impact of an R&D tax credit on actual firm R&D investment.9 In principle, this is a tough policy to evaluate as firms opt in to receiving the tax credit. To overcome this problem, the authors use the RDD method and exploit the fact that firms that invested up to 4 million NOK (about £342,000) received an 18 per cent tax credit on every NOK spent, while firms that invested above this amount received the same credit up to the 4 million threshold, but no credit thereafter.

1. Discontinuity in treatment is sufficiently sharp

Fail

The discontinuity seems extremely fuzzy, as firms above the threshold still get the treatment, but to a lesser extent. This is because firms that spend more than the threshold still get a tax credit for the first 4 million NOK spent, but do not receive a credit for any amount invested beyond that. For example, a firm that invests 4 million (below threshold) gets an 18 per cent tax credit (i.e. 720,000, about £61,500) but a firm that invests 4.1 million (above threshold) also gets 720,000, which is about 17.6 per cent of its investment. Thus there is virtually no difference in treatment on either side of the boundary.

9 Haegland, T., & Moen, J. (2007). Input Additionality in the Norwegian R&D Tax Credit Scheme. Discussion Paper, Statistics Norway.

2. Only treatment changes at boundary

Pass

There does not seem to be any other element of variation at the 4 million NOK cut-off.

3. Behaviour is not manipulated to make the cut-off

Fail

In this case, firms have a clear incentive to spend less than 4 million NOK on R&D, as every NOK spent beyond that threshold is not subsidised. Therefore, it is possible that only much larger and more successful firms spend beyond the threshold, while smaller, less successful firms deliberately spend less than they otherwise would have in the absence of the 4 million cut-off.

Given that only one of the criteria is satisfied, this study achieves SMS 2 for implementation.
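The arithmetic behind the failed sharpness criterion in this example can be checked with a short sketch. The rate and cap are taken from the example; the function name and the use of whole-NOK integer arithmetic are illustrative assumptions, not part of the study.

```python
# Parameters from the example above; names are illustrative assumptions.
CREDIT_RATE_PERCENT = 18
CAP_NOK = 4_000_000


def tax_credit(investment_nok: int) -> int:
    """Credit is 18 per cent of R&D spending, counted only up to the 4 million NOK cap."""
    return min(investment_nok, CAP_NOK) * CREDIT_RATE_PERCENT // 100


# Compare a firm at the cap with a firm just above it.
for investment in (4_000_000, 4_100_000):
    credit = tax_credit(investment)
    share = 100 * credit / investment
    print(f"invest {investment:,} NOK -> credit {credit:,} NOK ({share:.1f}% of spending)")
```

Both firms receive the same 720,000 NOK, so the treatment received barely changes at the cut-off; what changes is only the subsidy on the marginal NOK spent.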


SMS 3/4 methods

There are a number of methods which may score a 4 or a 3 depending on the particular way in which they are specified. This is a particular issue for hazard regressions and Heckman selection (and other ‘control function’-type approaches). Because these approaches are not frequently encountered in existing studies of local economic growth, and because aspects of these approaches are particularly technical, we’ve relegated consideration of them to Appendix 2. These approaches may see increased application in future, but in the main text we focus on the more frequently encountered approaches.


SMS 3 methods

We define SMS 3 methods to be the minimum standard for our reviews. These methods must involve a comparison of treated units against a control group before and after the treatment date. These methods control for selection on observable characteristics, and through a before and after comparison, eliminate any fixed unobservable difference between treatment and control group. However, they do not inherently control for unobservable differences that do vary with time. For this reason, they score a maximum of 3 on the SMS. In order to achieve such a score, however, they must attempt to somehow control for time-varying unobservable selection bias. This is often done through the construction of a valid control group. Furthermore, they should control for observable differences between the treatment and control groups using some form of regression, matching, control for selection, or control variables.

Difference-in-differences (DiD)

The DiD method entails comparing a treatment and control group before and after treatment. More specifically, the treatment effect is calculated by first evaluating the change in the outcome variable for the treated group, and then subtracting the change in the control group over the same period. Here, the control group provides the counterfactual growth path, i.e. what would have happened in the treatment group had it not been treated. This is much better than a simple before and after treatment comparison, because it accounts for the fact that changes in outcome can be due to many different factors and not just the treatment effect. Moreover, because DiD subtracts the differences between treatment and control both before and after the treatment, it effectively controls for any unobserved, time-invariant differences between the two groups. However, it does not account for unobservable differences between the two that vary with time. For this reason, it achieves a maximum of SMS 3. In order to achieve this score, two main criteria must be satisfied. Firstly, it must be credibly argued that the treatment group would have followed the same trend as the control group. Secondly, there must also be a known time period for treatment so that the groups can be compared before and after the treatment.
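The calculation described above can be sketched in a few lines. The group averages below are hypothetical numbers chosen purely for illustration, and the function name is an assumption; real studies estimate this within a regression with controls.

```python
def did_estimate(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus the change in the control group."""
    change_treated = treated_after - treated_before
    change_control = control_after - control_before  # counterfactual growth path
    return change_treated - change_control


# Hypothetical average outcomes (e.g. employment per firm), for illustration only.
naive_before_after = 26.0 - 20.0  # ignores what would have happened anyway
effect = did_estimate(treated_before=20.0, treated_after=26.0,
                      control_before=18.0, control_after=21.0)
print(naive_before_after, effect)
```

In this made-up case the simple before-and-after comparison attributes the full change of 6 to the treatment, while DiD removes the 3 units of growth that the control group also experienced.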


Method: Difference-in-differences (DiD), a.k.a. diff-in-diff

Maximum SMS score (method, implementation): 3, 3 if
• Treatment group would have followed same trend as control group
• Known time period for treatment

Adjusted SMS score (method, implementation): 3, 2 if
• Either of the criteria is not satisfied

DiD Example 1 (SMS 3, 3)

A 2011 study by Currie, Greenstone, and Moretti evaluates the impact of hazardous waste site clean-ups on the birth outcomes of mothers living within 2,000 metres of the site.10 They use births to mothers who live between 2,000 and 5,000 metres away from the site as a control group. In order for the study to achieve the maximum SMS score of 3, it must satisfy the two criteria.

1. Treatment group would have followed same trend as control group

Pass

Firstly, the control group is chosen because it is sufficiently far away that it is unaffected by treatment: hazardous waste sites impact people through direct contact, fumes, and water supply, and only those living close enough to the site (i.e. within 2,000 metres) are affected. Secondly, the control group was chosen because it is similar to the treatment group, as the people living within 2,000 metres of a waste site are comparable to those living slightly farther away. Even so, the authors recognise that mothers living extremely close to the site may be slightly different to mothers living a bit farther away. For this reason, they control for maternal education, race, ethnicity, birth order, and smoking, as well as other neighbourhood characteristics.

2. Known time period for treatment

Pass

Because the authors use data from 154 sites that were cleaned up between 1989 and 2003, it is hard to pinpoint precise dates for each clean-up. However, the authors use data from the Environmental Protection Agency, which includes the date when the site was added to the National Priority List, when the clean-up was initiated, and when it was completed.

Given that the two criteria are satisfied, this study achieves SMS 3 for implementation.

DiD Example 2 (SMS 3, 3)

A 1993 study by Card and Krueger evaluates the impact of a New Jersey minimum wage increase on the employment levels of fast food restaurants.11 It is hard to accurately gauge the effects of such an increase, as it is likely that changes in employment are caused by other factors. To overcome this problem, the authors employ a difference-in-differences approach whereby they compare employment in New Jersey with employment in Pennsylvania, a neighbouring state that did not increase its minimum wage.

10 Currie, J., Greenstone, M., Moretti, E. (2011). Superfund Cleanups and Infant Health. MIT CEEPR Working Papers.

11 Card, D., Krueger, A. (1993). Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania. American Economic Review, Vol. 84, No. 4, pp. 772-793.

1. Treatment group would have followed same trend as control group

Pass

The authors make a convincing argument that New Jersey and Pennsylvania are, in fact, comparable states. They analyse the restaurant composition of each and find that there are no significant differences between the two. Furthermore, because they are neighbouring states, it can be credibly held that they are exposed to the same macroeconomic shocks.

2. Known time period for treatment

Pass

The minimum wage increased from $4.25 to $5.05 per hour on the 1st of April 1992. The authors show that the new minimum wage law was, in fact, binding in New Jersey, which means that we can reasonably assert that the treatment was also significant.

Given that the two criteria are satisfied, this study achieves SMS 3 for implementation.

DiD Example 3 (SMS 3, 2)

A 2005 study by Valentin and Lund Jensen evaluates the impact of a law that transferred patent rights from individual researchers to universities on collaborative drug discovery research in Denmark.12 Because the change in research following the law could be attributed to many different causes, the authors employ a DiD approach using Sweden as the control group.

1. Treatment group would have followed same trend as control group

Fail

The authors argue that the Danish and Swedish biotech industries are similar in size, history, response to the 2001 high-tech bubble, number of inventions, and number of inventors mobilised to bring inventions about. They argue that the only difference between them is that Danish scientists tend to work in firms, while Swedish scientists tend to be academics. To control for this, they employ a variable for the ratio of biopharmaceutical to small molecule patents (as biopharmaceutical research tends to be done in universities, while small molecule research tends to be done in firms). However, the authors do not adequately address the fact that Sweden and Denmark are subject to different macroeconomic climates, and have different sets of national laws. Given that they did not use any controls to tackle these issues, it cannot be credibly held that their treatment and control groups changed in similar ways, meaning that they are not directly comparable.

2. Known time period for treatment

Pass

The law that transferred the ownership of patents from individual researchers to universities was enacted in Denmark in 2000.

Given that only one of the criteria is satisfied, this study achieves SMS 2 for implementation.

12 Valentin, F. & Jensen, R. (2005). Effects on Academia-Industry Collaboration of Extending University Property Rights. Working Paper.

Panel data methods

These methods use data that follow the same individuals over time, allowing the researcher to control for things that remain constant for each individual across time. This can be done in a variety of ways. The most common is the fixed effects (FE) approach, which entails including dummy variables (the “fixed effects”) for both individual characteristics and anything that happened in a given time period (time dummies). Alternatively, the first differences (FD) approach can be used, whereby the outcome of a previous period is subtracted in the final regression. This essentially means that everything that happened in the previous period is removed from the final regression, therefore cancelling out anything that remains constant across time.13 In this case, because everything that remains constant across time is differenced out, it is unnecessary to include fixed effects.

Panel data methods successfully control for observable and unobservable characteristics that remain constant throughout time. However, they do not account for unobservable characteristics that change with time. For this reason, they can achieve a maximum SMS 3. In order to achieve the maximum score, three main criteria must be met. Firstly, yearly fixed effects must be accounted for, so that the treatment is not tainted by the context of a given time period. Secondly, when using the fixed effects method, the fixed effects must be at the unit of analysis. This means that if the sample is of people, then the fixed effects must refer to them. Lastly, although panel data methods control for characteristics that do not change with time, they do not control for characteristics that do change with time. For this reason, it is important to include other time-varying controls that may also affect the outcome.
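To illustrate why unit and year effects remove anything that is constant within a unit or within a period, here is a minimal sketch on a small, balanced, noise-free panel. All numbers are hypothetical and the helper names are assumptions; the two-way demeaning used here matches including unit and year dummies only because the panel is balanced, and it leaves out the time-varying controls that a real study would need.

```python
# Hypothetical balanced panel: 3 units observed over 3 periods.
BETA = 2.0                               # true treatment effect built into the data
unit_effects = [5.0, 2.0, -1.0]          # unobserved, constant within each unit
year_effects = [0.0, 1.0, 3.0]           # shocks common to all units in a period
treatment = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]

outcome = [
    [unit_effects[i] + year_effects[t] + BETA * treatment[i][t] for t in range(3)]
    for i in range(3)
]


def two_way_demean(panel):
    """Subtract unit means and period means, then add back the grand mean."""
    n_units, n_periods = len(panel), len(panel[0])
    unit_means = [sum(row) / n_periods for row in panel]
    year_means = [sum(panel[i][t] for i in range(n_units)) / n_units
                  for t in range(n_periods)]
    grand_mean = sum(unit_means) / n_units
    return [
        [panel[i][t] - unit_means[i] - year_means[t] + grand_mean
         for t in range(n_periods)]
        for i in range(n_units)
    ]


def fe_slope(y, x):
    """Slope from regressing demeaned outcome on demeaned treatment."""
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    numerator = sum(xv * yv for x_row, y_row in zip(x_dm, y_dm)
                    for xv, yv in zip(x_row, y_row))
    denominator = sum(xv * xv for x_row in x_dm for xv in x_row)
    return numerator / denominator


print(fe_slope(outcome, treatment))
```

Because the unit and year effects are wiped out by the demeaning, the estimate recovers the built-in effect even though the units start from very different levels. Anything unobserved that varies both across units and over time would not be removed in this way.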

Method: Panel methods

Panel Fixed Effects (FE)

Maximum SMS score (method, implementation): 3, 3 if
• Fixed effect is at the unit of observation
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score (method, implementation): 3, 2 if
• One or more of the three criteria is not satisfied

First Differences (FD)

Maximum SMS score (method, implementation): 3, 3 if
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score (method, implementation): 3, 2 if
• Either of the two criteria is not satisfied
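The scoring rule in this table can be written down as a small helper. This is only a sketch of the rule as stated above: the function and argument names are assumptions, and deciding whether each criterion is actually met still requires judgement about the study.

```python
def panel_implementation_score(method, year_effects, time_varying_controls,
                               unit_fixed_effects=None):
    """Return the implementation score (3 or 2) for a panel study.

    method: "FE" (fixed effects) or "FD" (first differences).
    unit_fixed_effects is only relevant for FE, where the fixed effect
    must be at the unit of observation.
    """
    if method == "FE":
        criteria = [bool(unit_fixed_effects), year_effects, time_varying_controls]
    elif method == "FD":
        criteria = [year_effects, time_varying_controls]
    else:
        raise ValueError("method must be 'FE' or 'FD'")
    return 3 if all(criteria) else 2


# An FE study meeting all three criteria, and one lacking adequate controls.
print(panel_implementation_score("FE", True, True, unit_fixed_effects=True))
print(panel_implementation_score("FE", True, False, unit_fixed_effects=True))
# An FD study with year effects but no time-varying controls.
print(panel_implementation_score("FD", True, False))
```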

FE Example 1 (SMS 3, 3)

A 2011 study by Dinkelman published in the American Economic Review evaluates the impact of electricity roll-out on the employment outcomes of rural communities.14 In this case, the evaluation may be tainted by selection bias, as it is possible that politically important or growing areas are favoured over others (the reverse could also be true). To overcome this issue, the author employs a fixed effects strategy. In order for this study to achieve the maximum SMS score of 3, it must satisfy the three criteria.

1. Fixed effect is at the unit of analysis

Pass

The author includes a variable for community fixed effects. Given that the unit of observation is the community, this criterion is met.

13 Sometimes researchers think that outcomes in previous periods may directly affect outcomes today, which causes complications in panel data settings that are sometimes addressed using the Arellano-Bond method (see Appendix 3).

14 Dinkelman, T. (2011). The Effects of Rural Electrification on Employment: New Evidence from South Africa. American Economic Review, 3078-3108.

2. Year effects are included

Pass

Controls for time trends within the district are included.

3. Appropriate time-varying controls used

Pass

In this particular example, it is likely that there are many key variables that vary over time, and are therefore not captured by the fixed effects. For this reason, the author includes a vector of community covariates which includes household density, fraction of households living below the poverty line, fraction of white adults, educational attainment of adults, share of female-headed households, and female to male ratio.

Given that all three criteria are satisfied, this study achieves SMS 3 for implementation.

FE Example 2 (SMS 3, 2)

A 2013 study by Atasoy analyses the impact of broadband internet expansion on employment.15 This evaluation poses the same issues of selectivity as the electricity roll-out example. To overcome this, the author employs the fixed effects strategy.

1. Fixed effect is at the unit of analysis

Pass

The author derives county-level broadband adoption rates using zip-code data. He subsequently uses county fixed effects, which are at the relevant unit of observation.

2. Year effects are included

Pass

Year dummy variables are included.

3. Appropriate time-varying controls used

Fail

The author includes controls for time-variant population characteristics such as income, age, gender, race, and density. However, he neglects certain county-level characteristics that affect employment, and are not captured by the county fixed effects because they change with time. These controls could include transportation infrastructure, investment in education and job training, and level of funding granted by the state.

Given that only two of the criteria are satisfied, this study achieves SMS 2 for implementation.

15 Atasoy, H. (2013). The Effects of Broadband Internet Expansion on Labour Market Outcomes. ILR Review.

FD Example 1 (SMS 3, 2)

A 1997 study by Guellec and de la Potterie evaluates the impact of government subsidies on R&D expenditure.16 In this case there is a clear potential problem of selection bias, as firms that choose to use the subsidies are different from firms that do not. To overcome this, they use the first differences method. In order for this study to achieve the maximum SMS score of 3, the two criteria must be satisfied.

1. Year effects are included

Pass

The authors include a year dummy variable in the final regression.

2. Appropriate time-varying controls used

Fail

The authors do not include any time-varying controls. This means that what is thought to be the impact of the government subsidies may in fact be due to things like growth in the number of firm employees.

Given that only one of the criteria is satisfied, this study achieves SMS 2 for implementation.

Box 2: Fixed effects in the cross-sectional data context

Sometimes, cross-sectional studies will use the term “fixed effects”. However, it is important to distinguish fixed effects in this context from fixed effects in the panel data context. While panel data considers a number of individuals across time, cross-sectional data only considers individuals in a certain time period. For this reason, panel fixed effects refer to what remains constant across time for a particular individual, and cross-sectional fixed effects simply refer to controls. For instance, in a 2014 study by Gibbons et al, the authors look at the impact of natural amenities on property prices. Although their data is cross-sectional, they include “Travel To Work Area (TTWA) level fixed effects”, which in this case means controls for the TTWA the property is located in.

It is worth noting that even cross-sectional studies that include relevant controls can only get a maximum SMS of 2.

16 Guellec, D. & de la Potterie, B. (1997). Does government support stimulate private R&D? OECD Economic Studies.

Propensity Score Matching (PSM)

This method entails matching every treated unit with an observationally similar untreated unit, and then looking at the differences between them to gauge the treatment effect. The treated and untreated units are not matched according to whether they have similar characteristics per se, but according to a propensity score. This score is essentially the probability that a unit is treated, according to a set of characteristics. The propensity score is typically calculated using a regression that deals with a 0 to 1 probability as the outcome variable (e.g. a logistic or probit model). However, even so, this method does not entirely correct for selection bias because it only accounts for observable characteristics. For this reason, if used in a purely cross-sectional context, it can achieve a maximum SMS 2. In order to achieve this score, it must be credibly held that the matching criteria are relevant to selection into treatment. Furthermore, a key issue with PSM is that units that are actually treated are more likely to have higher propensity scores, while units that are actually untreated are more likely to have lower propensity scores. Therefore, having a sufficiently large area of overlap (whereby treated units are matched with similar untreated units) is essential for the method to work. In order for the method to be of SMS 3 quality, it must be used with other methods that attempt to control for unobservable characteristics. For instance, it can be combined with methods that exploit time variation in treatment (e.g. DiD). In this case, in order for the method to achieve SMS 3, the criteria for DiD must be respected.
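The matching step and the role of overlap can be sketched as follows. This is a simplified illustration under stated assumptions: the propensity scores are taken as already estimated (for example from a logit model), the data are hypothetical, and nearest-neighbour matching with replacement and a caliper is only one of several matching schemes a study might use.

```python
def match_and_estimate(treated, controls, caliper):
    """Nearest-neighbour matching on the propensity score, with replacement.

    treated, controls: lists of (propensity_score, outcome) pairs.
    caliper: maximum score distance for a match to count (an assumed tuning choice).
    Returns the average treated-minus-matched-control difference and the
    number of treated units dropped for lack of a suitable match.
    """
    differences = []
    for score, outcome in treated:
        nearest_score, nearest_outcome = min(controls, key=lambda c: abs(c[0] - score))
        if abs(nearest_score - score) <= caliper:
            differences.append(outcome - nearest_outcome)
    if not differences:
        raise ValueError("no treated unit has a control within the caliper")
    dropped = len(treated) - len(differences)
    return sum(differences) / len(differences), dropped


# Hypothetical (score, outcome) pairs, for illustration only.
treated_units = [(0.80, 12.0), (0.60, 10.0), (0.95, 15.0)]
control_units = [(0.78, 9.0), (0.55, 8.5), (0.30, 7.0)]

effect, dropped = match_and_estimate(treated_units, control_units, caliper=0.1)
print(effect, dropped)
```

In this made-up sample the treated unit with a score of 0.95 has no comparable control and is dropped, which is what a lack of common support looks like in practice. The more units that are dropped this way, the less the estimate says about the treated group as a whole.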

Method: Propensity Score Matching (PSM), a.k.a. matching

With DiD or panel method

Maximum SMS score (method, implementation): 3, 3 if
• Matching criteria satisfied (see below)
• DiD or panel criteria satisfied

Adjusted SMS score (method, implementation): 3, 2 if
• One of the criteria is violated

Cross-sectional

Maximum SMS score (method, implementation): 2, 2 if
• Good matching variables (i.e. relevant to selection)
• Significant common support

Adjusted SMS score (method, implementation): 2, 1 if
• One of the criteria is violated

PSM Example 1 (SMS 3, 3)

A 2012 study by Buscha, Maurel, Page, and Speckesser evaluates the impact of high school employment on educational outcomes.17 This question is difficult to evaluate because high school students that have a part-time job may not be directly comparable to their non-working peers; it could be that they are more hard-working and driven and therefore have better grades anyway, or it could be that they come from disadvantaged backgrounds and therefore have worse grades. To overcome the fact that treated and untreated high school students may be intrinsically different, the authors employ the Propensity Score Matching approach in combination with the DiD method. In order for this study to achieve the maximum SMS score of 3, the four criteria must be satisfied.

1. Good matching variables

Pass

Students are matched according to socio-economic background, family background, school level and regional variables, parental expenditure, school absenteeism and conflict, alcohol and drug issues, and educational aspirations at grade 8.

2. Significant common support

Pass

There is a very large area of common support, meaning that few observations were dropped due to lack of a suitable match.

3. Treatment group would have followed same trend as control group

Pass

Using the matching method, the authors use observationally similar students as controls for working students. Although unobserved differences may remain, it can be reasonably argued that the two groups will likely follow the same trend.

17 Buscha, F., Maurel, A., Page, L., Speckesser, S. (2012). The Effect of Employment while in High School on Educational Attainment: A Conditional Difference-in-Differences Approach. Oxford Bulletin of Economics and Statistics, 380-396.

4. Treatment date known and singular

Pass

The authors use a dataset that tracks students from when they were in 8th grade in 1988, until 1992 when they graduated high school. The dataset includes information for three waves: 1988, 1990, and 1992. Because students are not legally allowed to work before the age of 14, the dataset has pre- and post-treatment information for those that choose to work during their high school years. Therefore, because students are only legally allowed to work from 1990 onwards, the treatment date is known and singular.

Given that the four criteria are satisfied, this study achieves SMS 3 for implementation.

PSM Example 2 (SMS 2, 2)

A 2011 study by Ghalib, Malki, and Imai evaluates the impact of microfinance on rural household poverty.18 In this case there is a clear problem of selection bias because households that apply for microfinance may be intrinsically different from those that do not. For instance, they may have more entrepreneurial drive or simply be more financially literate. To overcome this problem, the authors employ PSM.

1. Good matching variables

Pass

Treatment and control groups are matched on characteristics that include age of adults, child dependency ratio, access to electricity, home ownership status, consumption of luxury food, percentage of literate adults, and availability of toilets. These variables are important in the sense that they capture a family’s level of wealth and education, and it can be reasonably held that poorer households select into treatment (or apply for microcredit in this case).

2. Significant common support

Pass

When analysing the overlap between treatment and control, the authors find that only 11 observations (of the 1,132) are dropped due to a lack of a suitable match.

Given that the two criteria are satisfied, this study achieves SMS 2.

18 Ghalib, A., Malki, I., Imai, K. (2011). The Effect of Employment while in High School on Educational Attainment: A Conditional Difference-in-Differences Approach. RIEB Discussion Paper Series.

SMS 2 methods and below

Cross-sectional regression

Cross-sectional regressions entail using a dataset that features many different individuals at one point in time. This method simply compares treated individuals with untreated individuals, without taking into account the different characteristics of the two groups that may influence the outcome. An attempt to account for differences between treatment and control groups can entail the inclusion of control variables. However, even if some observable characteristics are controlled for, unobservable differences between the groups remain. For this reason, this method can only achieve a maximum SMS 2. In order to achieve this score, however, adequate control variables must be used.

Method: Cross-section regression

Maximum SMS score (method, implementation): 2, 2 if
• Adequate control variables are used

Adjusted SMS score (method, implementation): 2, 1 if
• Inadequate control variables

Cross-Sectional Regression Example 1 (SMS 2, 2)

A 2012 study by Sarzynski evaluates the impact of population size on the level of a city’s pollution.19 The author uses a dataset that features a sample of 8,038 cities worldwide for the year 2005. In order for the study to achieve the maximum score of SMS 2, the criterion must be satisfied.

1. Adequate control variables are used

Pass

Besides population, other city variables such as GDP per capita, population density, growth rate, climate, and development status are included in the regression. Here, it is important to think about whether there are any important variables that are omitted and may therefore bias the results. In this particular case, the author seems to have included an exhaustive list of factors that may influence pollution levels.

19 Sarzynski, A. (2012). Bigger Is Not Always Better: A Comparative Analysis of Cities and their Air Pollution Impact. Urban Studies, 3121-3138.

Given that the criterion is satisfied, this study achieves SMS 2 for implementation.

Cross-Sectional Regression Example 2 (SMS 2, 1)

A 2007 study by Patel et al. evaluates the impact of living in an urban versus rural setting on mental health.20 They use a cross-sectional dataset of households in Maputo and Cuamba in Mozambique.

1. Adequate control variables are used

Fail

To estimate the impact of living in a rural setting on mental health, the authors perform a simple bivariate regression.

Given that the criterion is not satisfied, this study achieves SMS 1 for implementation.

Before-and-After

This method often entails using a time series dataset for which one individual is tracked across time. An example of this is the variation of a specific country’s GDP over a number of different years. In this case, it is possible that the individual is observed before the treatment and after the treatment. However, the “after” does not necessarily capture the pure treatment effect: it is possible that other contextual factors affected the outcome, as well as the individual’s characteristics. For this reason, this method can achieve a maximum SMS score of 2. In order to achieve this score, however, relevant control variables must be used.

Method: Before-and-after

Maximum SMS score (method, implementation): 2, 2 if
• Adequate control variables are used

Adjusted SMS score (method, implementation): 2, 1 if
• Inadequate control variables

Before-and-After Example 1 (SMS 2, 2)

A 2014 study by Blank and Eggink evaluates the impact of different governmental health policies on hospital productivity. They exploit a time series dataset that features the overall productivity of Dutch hospitals from 1972 to 2010. It is worth noting that although this might seem like a panel dataset (because there are obviously many different hospitals), it is essentially a time series dataset simply because there is no variation in treatment, i.e. all hospitals are subjected to the same central government policies. The only variation in treatment in this case is temporal, as different policies were adopted in different time periods. In order for this study to achieve the maximum SMS of 2, the criterion must be satisfied.

20 Patel, V., Simbine, A., Soares, I., Weiss, H., Wheeler, E. (2007). Prevalence of Severe Mental and Neurological Disorders in Mozambique: a Population-Based Survey. Lancet, 1055-1060.

1. Adequate control variables are used

Pass

In this particular case, the authors do control for things like the price of inputs, wages, and number of admissions.

Given that the criterion is satisfied, the study achieves SMS 2 for implementation.

Before and After Example 2 (SMS 2, 1)

A 2000 study by Corman and Mocan evaluates the impact of drug use and levels of law enforcement on crime.21 More specifically, they look at how the number of arrests and police officers, as well as drug use, impact the number of murders, assaults, robberies, and burglaries. They exploit a dataset that features monthly crime rates for New York City. Here, the “treatment” is the variation in arrests, number of police officers, and hospitalisations related to drug use.

1. Adequate control variables are used

Fail

The authors do not include controls for other factors that may influence crime rates. For instance, it is possible that unemployment levels or inflation may impact crime. Because these factors are not accounted for, it may be the case that the treatment effects are not entirely attributable to number of arrests, police officers, and drug use.

Given that the criterion is not satisfied, the study achieves SMS 1 for implementation.

Additionality (not SMS scoreable)

This method entails asking participants in a programme about the changes in outcome that they attribute to the programme effects. An example would be to ask a firm in receipt of an R&D grant ‘do you think you do more R&D as a result of the grant?’. If, say, 75% of firms answer ‘yes’ then the remaining 25% is considered ‘deadweight’, i.e. R&D that would have happened anyway. Whilst this method does acknowledge deadweight, the counterfactual is based purely on opinion: what the firm thinks would have happened. Given the lack of observable counterfactual, this method is not SMS scoreable.

Additionality Example

A 2013 report commissioned by the Department for Business Innovation and Skills evaluates the impact of the Enterprise Finance Guarantee scheme (EFG), a programme that provided SMEs with an additional source of funding.22 The methodology of this evaluation relies chiefly on self-reported survey information. More specifically, EFG firms were asked to respond to questionnaires that asked about how their businesses benefited from the programme. Self-reported data are often subject to bias, and for this reason, this study is not SMS scoreable.

21 Corman, H., & Mocan, H. (2000). A Time-Series Analysis of Crime, Deterrence, and Drug Abuse in New York City. American Economic Review, 584-604.

22 Allinson, G., Robson, P., Stone, I. (2013). Economic Evaluation of the Enterprise Finance Guarantee (EFG) Scheme. Department for Business Innovation & Skills.

Impact modelling (not SMS scoreable)

Impact modelling is an approach to produce ex ante or ex post estimates of the economic impact of a policy or event. Estimates are made using ‘modelling assumptions’ about the way in which events or policies impact on local areas. For example, impact models adjust estimates using scale factors to account for issues such as displacement effects, where spending associated with a local event is offset by reduced spending elsewhere or in another time period. Since estimates are based (partly or entirely) on assumptions, rather than observed outcomes, the approach is particularly amenable to ex ante studies of impact where outcomes are not yet observable in the data (e.g. modelling the potential economic impacts of a festival or other big event).

However, the results from such modelling processes are very sensitive to the assumptions made, implying that inaccurate assumptions can lead to very unreliable estimates. In practice, the assumptions used tend not to be based on strong evidence and are often not relevant to the specific case at hand. For example, there is no reason to expect large local multiplier effects in a local economy that is also part of the larger national economy, and no straightforward way to divide the impacts across space. Similarly, it is never really possible to know accurately how much displacement occurs (e.g. in the festival example, how many individuals who would otherwise have visited during the festival period avoid the festival and either visit another time or not at all). Since this method does not provide a reasonable counterfactual, it is not SMS scoreable.

Impact modelling Example 1

A study commissioned by Chelmsford City Council evaluates the impact of the V Festival (a music festival) on the local economy. The authors start by calculating the festival’s total direct expenditure and its incidence on the city, county, and region. To do this, they use survey data on the expenditure by Metropolis Music (the organisers of the event), festival contractors, and visitors. In order to estimate the impact of this expenditure on the local area, they make various assumptions about the size of the local multiplier effect, leakage effects and displacement effects. Overall the direct expenditure (net impacts) are estimated at £7.4m (£6.6m) for the East of England region, £7.2m (£7.4m) for Essex county and £6.6m (£8.2m) for the city of Chelmsford. Notably, the net impact is higher than expenditure in Chelmsford but lower than expenditure in the region. This reflects the fact that some expenditure in Chelmsford is probably partly displaced from the greater region (50% of visitors to the festival came from the East of England region) whereas the local multiplier is assumed to be the same (about 1.45) for the city, county and region.

In practice though, it is very difficult to ensure these assumptions are correct. For example, one way these assumptions could fall down is that it is likely that the local multiplier is smaller in a single city than in an entire region. Alternatively, leakage and displacement effects may be much larger than thought. For Chelmsford, it is assumed that only around 12.8% of expenditure would have happened anyway or leaves the city. Given that the great majority of expenditure is on food on the festival site, this factor could be an underestimate if, for example, a lot of the catering firms come from outside of Chelmsford.
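The sensitivity to assumptions can be illustrated with a minimal sketch. It assumes, purely for illustration, that net impact is direct expenditure, reduced by a combined leakage and displacement share, then scaled by the local multiplier. This functional form and the name `net_impact` are our assumptions, not the study's documented method. With the rounded Chelmsford inputs quoted above (£6.6m, around 12.8%, about 1.45) it gives roughly £8.3m, close to but not identical to the reported £8.2m.

```python
def net_impact(direct_expenditure, leakage_displacement_share, multiplier):
    """Net impact under a simple, assumed impact-modelling formula.

    direct_expenditure: spending attributed to the event (GBP millions)
    leakage_displacement_share: share of spending assumed to leak out of
        the area or to have happened anyway (between 0 and 1)
    multiplier: assumed local multiplier
    """
    retained = direct_expenditure * (1.0 - leakage_displacement_share)
    return retained * multiplier


# Vary only the leakage/displacement assumption; the two larger shares
# are hypothetical values chosen to show the sensitivity.
for share in (0.128, 0.30, 0.50):
    print(share, round(net_impact(6.6, share, 1.45), 2))
```

Under these assumptions, moving the leakage and displacement share from around 13% to 50% is enough to turn an estimated net gain over direct expenditure into a net impact below it.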

In summary, making broad assumptions about effects is often a straightforward way to estimate impact but, due to the many sources of error, it is no substitute for observing actual change in outcomes (e.g. local firm GVA) compared with a control group. Since this method does not provide a counterfactual, it is not SMS scoreable.

Appendix 1: Quick-scoring guide for the Maryland Scientific Method Scale

Scores are given as (method, implementation).

Randomised Control Trial (RCT), a.k.a. Field Experiment

Maximum SMS score: 5, 5 if
• Randomisation is successful
• Attrition carefully addressed or not an issue
• Contamination not an issue

Adjusted SMS score: 5, 4 if
• One of the criteria is severely violated

Adjusted SMS score: 5, 3 if
• Two or more of the criteria are severely violated

Instrumental Variable (IV), a.k.a. Two-Stage Least Squares (2SLS)

Maximum SMS score: 4, 4 if instrument
• Relevant (explains treatment)
• Exogenous (not explained by outcome)
• Excludable (does not directly affect outcome)

Adjusted SMS score: scored as per underlying method if
• Instrument invalid
• e.g. cross section with invalid IV scores 2; difference-in-difference with invalid IV scores 3 (see below)

Regression Discontinuity Design (RDD)

Maximum SMS score: 4, 4 if
• Discontinuity in treatment is sharp (e.g. strict eligibility requirement) or fuzzy discontinuity method used
• Only treatment changes at boundary
• Behaviour is not manipulated to make the cut-off

Adjusted SMS score: scored as per underlying method if
• Discontinuity conditions severely violated
• e.g. cross section with invalid IV scores 2; difference-in-difference with invalid IV scores 3 (see below)

Difference in differences (DID), a.k.a. Diff in diff

Maximum SMS score: 3, 3 if
• Control group would have followed same trend as treatment group
• Known time period for treatment

Adjusted SMS score: 3, 2 if
• Either of the criteria is not satisfied

Panel methods: Panel Fixed Effects (FE)

Maximum SMS score: 3, 3 if
• Fixed effect is at the unit of observation
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score: 3, 2 if
• One or more of the three criteria is not satisfied

Panel methods: First Differences (FD)

Maximum SMS score: 3, 3 if
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score: 3, 2 if
• Either of the two criteria is not satisfied

Panel methods: Arellano-Bond (AB)

Maximum SMS score: 3, 3 if
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score: 3, 2 if
• One of the two criteria is not satisfied

Hazard Regressions: Mixed Proportional Hazards (MPH)

Maximum SMS score: 4, 4 if
• Key assumption of ‘no anticipation’ holds
• Variation in timing (e.g. people start training at different times relative to becoming unemployed)

Adjusted SMS score: 4, 3 if
• Anticipation of treatment likely
• Little variation in timing

Adjusted SMS score: 4, 2 if
• Neither of the criteria is satisfied

Hazard Regressions: Proportional Hazards (PH)

Maximum SMS score: 3, 3 if
• Adequate control group is established
• Treatment date is known and singular

Adjusted SMS score: 3, 2 if
• One of the two criteria is not satisfied

Heckman Two-Stage Approach (H2S) or Control Function (CF): with IV

Maximum SMS score: 4, 4 if
• Selection equation includes an IV that satisfies the three criteria (see IV)

Adjusted SMS score: scored as per underlying method if
• Instrument invalid
• e.g. cross section with invalid IV scores 2; difference-in-difference with invalid IV scores 3 (see below)

Heckman Two-Stage Approach (H2S) or Control Function (CF): with DID or panel method

Maximum SMS score: 3, 3 if
• Selection equation includes relevant observable variables
• DID or panel criteria met

Adjusted SMS score: 3, 2 if
• One of the criteria is not satisfied

Heckman Two-Stage Approach (H2S) or Control Function (CF): cross-sectional

Maximum SMS score: 2, 2 if
• Selection equation includes relevant observable variables

Adjusted SMS score: 2, 1 if
• Selection equation does not include relevant observable variables

Propensity Score Matching (PSM), a.k.a. Matching: with DID or panel method

Maximum SMS score: 3, 3 if
• Matching criteria satisfied (see below)
• DID or panel criteria satisfied

Adjusted SMS score: 3, 2 if
• One of the criteria is violated

Propensity Score Matching (PSM), a.k.a. Matching: cross-sectional

Maximum SMS score: 2, 2 if
• Good matching variables (i.e. relevant to selection)
• Significant common support

Adjusted SMS score: 2, 1 if
• One of the criteria is violated

Cross-sectional regression

Maximum SMS score: 2, 2 if
• Adequate control variables are used

Adjusted SMS score: 2, 1 if
• Inadequate control variables

Before-and-after

Maximum SMS score: 2, 2 if
• Adequate control variables are used

Adjusted SMS score: 2, 1 if
• Inadequate control variables

Stated effects/impact, a.k.a. Additionality, Stated experiences

Not SMS scoreable

Impact modelling

Not SMS scoreable
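The counting rules in this quick-scoring guide can be sketched in code for a few of the simpler rows. This is a minimal illustration, not an official scoring tool. The function name `quick_score` and the boolean encoding of each criterion are our assumptions. It covers only the RCT, DID, fixed effects and first differences rows, and treats any failed criterion in the latter three as lowering the implementation score to 2.

```python
def quick_score(method, criteria_met):
    """Return (method score, implementation score) for a few guide rows.

    criteria_met holds one boolean per criterion listed for the method.
    For RCTs, False stands for "severely violated" (a simplification).
    """
    failures = sum(1 for ok in criteria_met if not ok)
    if method == "RCT":
        # 5,5 with no violations; 5,4 with one; 5,3 with two or more.
        return (5, 5 - min(failures, 2))
    if method in ("DID", "FE", "FD"):
        # 3,3 when every criterion holds; 3,2 otherwise.
        return (3, 3 if failures == 0 else 2)
    raise ValueError(f"no rule sketched for method {method!r}")


print(quick_score("RCT", [True, True, False]))  # one criterion violated
print(quick_score("DID", [True, True]))         # both criteria hold
```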

Appendix 2: Mixed methods scoring 4 or 3.

As discussed in the main text, there are a number of methods that are less frequently encountered in local economic growth evaluations which may score a 4 or a 3 depending on the particular way in which they are specified. This section examines these methods and how we score them.

Hazard Regressions

Hazard functions are used to find out the impact of a policy on an outcome that represents a duration. They are common in labour economics, where they are often used to analyse the variables that impact strike or unemployment duration. The hazard function calculates the probability that an individual will leave a given state (for instance, unemployment) at a given moment in time. Proportional hazard models acknowledge that the dependent variable is a function of the treatment, some observed explanatory variables (such as age or education), a random variable that accounts for individual heterogeneity, and a base-line hazard. This base-line hazard is essentially the “average” hazard function (i.e. propensity to exit unemployment).

This method deals with selection into the programme on observable characteristics by controlling for the effect of these factors on the hazard rate. However, since selection into the treatment group may occur on unobservable variables, a bias may persist. Because this method controls well for observables but is not able to deal with unobservables, it scores a maximum of SMS 3. However, in order for the method to achieve SMS 3, two criteria must be satisfied. Firstly, a control group must be used, and it must be credibly argued that the treatment group would have followed the same trend as this control group, had it not been exposed to treatment. Secondly, there must also be a known and singular treatment date so that the groups can be compared before and after the treatment.
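A minimal sketch of the proportional hazard structure described above may help fix ideas. It assumes the common multiplicative form: base-line hazard times an exponential index of observables and treatment, times an individual heterogeneity term. All names and coefficient values here are hypothetical and chosen only for illustration.

```python
import math


def hazard(baseline, treated, x, beta, delta, v=1.0):
    """Proportional hazard: baseline * exp(x'beta + delta * treated) * v.

    baseline: base-line hazard at the moment considered
    treated:  whether the individual has received the treatment
    x, beta:  observed characteristics and their (assumed) coefficients
    delta:    (assumed) treatment coefficient
    v:        individual heterogeneity term
    """
    index = sum(b * xi for b, xi in zip(beta, x))
    index += delta * (1 if treated else 0)
    return baseline * math.exp(index) * v


x = [0.35, 1.0]       # hypothetical: scaled age, education dummy
beta = [-0.5, 0.3]    # hypothetical coefficients
delta = 0.2           # hypothetical treatment effect on the log hazard

ratio = hazard(0.1, True, x, beta, delta) / hazard(0.1, False, x, beta, delta)
print(ratio, math.exp(delta))
```

For two individuals who share the same observables and heterogeneity term, the treated-to-untreated hazard ratio equals exp(delta) whatever the base-line hazard. This is why unobserved differences between the groups, which the sketch holds fixed, are the main threat to the estimate.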

An extension of Proportional Hazard models is the Mixed Proportional Hazard (MPH) model. The MPH exploits time variation in treatment to construct a control group. More specifically, it hinges on the fact that individuals are exposed to treatment at different times. Therefore, individuals that received the treatment at the beginning of their unemployment spell are compared to individuals that received the treatment a few months after they entered unemployment. Here, the period where the individual did not have treatment is used as a control group. The basic idea is that the individual that got treatment from the outset and the individual that got treatment a few months later are very similar because they both self-selected into treatment, the only difference being the point at which they actually got it. Because the MPH features a control group that is plausibly unobservably and observably similar to the treatment group, it can achieve a maximum of SMS 4.

However, in order for it to achieve this maximum, it must meet two key criteria. Firstly, individuals cannot anticipate treatment. For instance, if an individual knows they are going to receive training in a month’s time, they may not search for jobs as intensely as they did before knowing. This means that the “control” group is effectively tainted and the impact of the policy will therefore likely be overstated. Secondly, there must be variation in timing of individuals’ treatment. The MPH model essentially compares individuals that got the treatment straight away with those that got the treatment a bit later. For this reason, it is extremely important that this variation exists.

Scores are given as (method, implementation).

Hazard Regressions: Mixed Proportional Hazards (MPH)

Maximum SMS score: 4, 4 if
• Key assumption of ‘no anticipation’ holds
• Variation in timing (e.g. people start training at different times relative to becoming unemployed)

Adjusted SMS score: 4, 3 if
• Anticipation of treatment likely
• Little variation in timing

Adjusted SMS score: 4, 2 if
• Neither of the criteria is satisfied

Hazard Regressions: Proportional Hazards (PH)

Maximum SMS score: 3, 3 if
• Adequate control group is established
• Treatment date is known and singular

Adjusted SMS score: 3, 2 if
• One of the two criteria is not satisfied
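The hazard regression rules can be written as a short sketch. The function names and boolean arguments are our own. The MPH rule reads "4, 3" as applying when exactly one criterion fails, and the PH rule treats any failed criterion as giving an implementation score of 2, which is how the worked examples in this appendix are scored.

```python
def score_mph(no_anticipation, timing_variation):
    """(method, implementation) score for a Mixed Proportional Hazards study."""
    criteria_met = int(no_anticipation) + int(timing_variation)
    # Both criteria met -> 4; exactly one -> 3; neither -> 2.
    return (4, {2: 4, 1: 3, 0: 2}[criteria_met])


def score_ph(adequate_control_group, known_single_treatment_date):
    """(method, implementation) score for a Proportional Hazards study."""
    both_met = adequate_control_group and known_single_treatment_date
    return (3, 3 if both_met else 2)


print(score_mph(True, True))    # no anticipation, timing varies
print(score_mph(False, True))   # anticipation likely, timing varies
print(score_ph(False, False))   # no control group, no single treatment date
```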

MPH Example 1 (SMS 4, 4)

A 2008 paper by Lalive, van Ours, and Zweimuller evaluates the impact of active labour market programmes in Switzerland.23 These programmes aim to help the unemployed by providing assistance in searching for jobs. As with most unemployment programmes, this is particularly hard to evaluate as the unemployed are likely to be different from the employed. To overcome this, the authors employ a Mixed Proportional Hazard (MPH) approach. In order for the study to achieve the maximum SMS score of 4, the two criteria must be satisfied.

1. No anticipation of treatment

Pass

In this particular case, it is argued that anticipation does not play a significant role because Swiss participants were only notified about actual participation one to two weeks in advance. Furthermore, the authors highlight that in Switzerland there are penalties for jobseekers that reduce their search efforts in anticipation of programme participation. Lastly, they point to the fact that the programme was oversubscribed, so jobseekers could not accurately know when and if they would be participating.

2. Variation in timing

Pass

The programme does not specify that individuals should be unemployed for a certain amount of time in order to be eligible. This means that the authors can study the effects of the programme for individuals that have been unemployed for different time periods.

Given that the two criteria are satisfied, this study achieves SMS 4 for implementation.

23 Lalive, R., Van Ours, J., Zweimuller, J. (2008). The Impact of Active Labour Market Programmes on the Duration of Unemployment in Switzerland. Economic Journal, 235-257.

MPH Example 2 (SMS 4, 3)

A 2006 study by Hujer and Zeiss evaluates the impact of a job creation scheme on employment.24 Policies that seek to improve individuals’ chances of finding a job are particularly hard to evaluate because unemployment is often determined by people’s observed and unobserved characteristics. Accordingly, the authors use the MPH model to overcome this issue.

1. No anticipation of treatment

Fail

It is extremely unlikely that individuals did not know beforehand that they would be participating in the job scheme in the future, meaning that this condition does not hold. It is therefore likely that participants stopped searching for jobs as intensely when they found out that they would be participating in the job creation scheme.

2. Variation in timing

Pass

People that participate in the programme do so with varying amounts of time in unemployment. In practice, this means that there aren’t any eligibility requirements in terms of unemployment time, therefore implying that the effect of the programme on people with differing times of unemployment can be studied.

Given that only one of the criteria is satisfied, this study achieves SMS 3 for implementation.

PH Example 3 (SMS 3, 3)

A 2013 study by Palali and van Ours evaluates the impact of living close to a coffee-shop on cannabis use.25 The authors use a hazard function to look at how long people “survive” without using cannabis. Furthermore, they exploit the fact that coffee-shops became widespread during the 1980s/90s, and that there weren’t any before then. They therefore use cohorts that were in prime drug-using age after the rise of coffee-shops as the treatment group, and cohorts that are too old to be affected by coffee-shops as a control.

1. Adequate control group is established

Pass

In this case, the treatment group is the cohort born between 1974 and 1992, while the control group is the cohort born between 1955 and 1973. It could be argued, however, that the different cohorts are exposed to different cultural trends, and are therefore inherently different to each other, particularly with respect to cannabis use. To counter this, the authors control for things like religious affiliation, urban versus rural residence, and migrant status.

2. Treatment date is known and singular

Pass

In 1980, the Dutch government publicly announced its policy of tolerance of coffee-shops. This led to a rapid increase in the number of establishments, so that by the mid-1990s there were around 1500 coffee-shops. It can be reasonably believed that this announcement is a singular and traceable treatment.

Given that the two criteria are satisfied, this study achieves SMS 3 for implementation.

24 Hujer, R., Zeiss, C. (2006). The Effects of Job Creation Schemes on the Unemployment Duration in East Germany. IAB Discussion Papers.

25 Palali, A., & Van Ours, J. (2013). Distance to Cannabis-Shops and Age of Onset of Cannabis Use. CentER Discussion Paper, 2013-2048.

PH Example 2 (SMS 3, 2)

A 1990 study by Gunderson and Melino evaluates the impact of public policy on strike duration.26 Because the outcome variable has to do with duration, the authors use a hazard regression. In order to achieve the maximum SMS score of 3, the two criteria must be satisfied.

1. Adequate control group is established

Fail

In this case, the authors do not attempt to find a control group. Instead, they simply run a regression whereby the strike durations of different units with different public policies are compared. Therefore, it can be held that the authors do not compare treated units with untreated units, but compare units with different intensities of treatment.

2. Treatment date is known and singular

Fail

There is no attempt to do a comparison before and after the treatment. Instead, places with initial and fixed differing levels of treatment are compared.

Given that none of the criteria are satisfied, this study achieves SMS 2 for implementation.

Heckman two-stage correction (H2S)/ Control function (CF)

These methods use a two-stage approach to overcome selection bias. The first stage entails calculating a “selection equation” that includes the characteristics that may lead someone to select into treatment. In the second stage, elements of this selection equation are incorporated into the final regression (the one that includes the treatment effect and outcome variable of interest). In this way, the inclusion of the selection equation effectively “absorbs” any of the pre-existing selection bias. However, although these methods provide a potential solution for selection bias, they do not necessarily achieve this aim. This is because the quality of these methods is largely dependent on the types of variables that are included in the selection equation.

If the selection equation includes a high-quality instrument, then it can be held that the selection equation successfully accounts for any unobservable selection bias, meaning that these methods can achieve a maximum of SMS 4. This score can be achieved if the instrument in the selection equation satisfies the three criteria for valid instruments (exogenous, relevant, and exclusionary). However, if the selection equation only includes observable characteristics and no instrument, then the unobservable element of selection bias remains an issue. In this case, these methods can achieve a maximum of SMS 2. This maximum score can be achieved if the variables in the selection equation adequately explain selection into treatment that is based on observable characteristics. However, even if only observable characteristics are included in the selection equation, if these methods are combined with other methods that attempt to control for unobservable characteristics (for instance, DID or panel data methods), then they can achieve a maximum SMS score of 3. In order to achieve this score, they must satisfy the criteria for the method that they are used in combination with.
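The text above does not say which elements of the selection equation enter the second stage. In the textbook Heckman set-up, the usual choice is a correction term built from a first-stage probit: the inverse Mills ratio. The sketch below computes that term under this assumption. The function names and the example index value are hypothetical, and the first-stage probit index is taken as given rather than estimated.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def selection_correction(z, treated):
    """Correction term added as an extra regressor in the second stage.

    z is the fitted first-stage probit index for one unit (assumed given).
    Treated units get pdf/cdf (the inverse Mills ratio); untreated units
    get -pdf/(1 - cdf).
    """
    if treated:
        return normal_pdf(z) / normal_cdf(z)
    return -normal_pdf(z) / (1.0 - normal_cdf(z))


# Hypothetical first-stage index of 0.4 for a treated and an untreated unit.
print(selection_correction(0.4, treated=True))
print(selection_correction(0.4, treated=False))
```

The term is a function of whatever variables enter the selection equation. If those are only observables that also appear in the outcome regression, the correction is identified from functional form alone, which is one way to see why the guide caps such specifications at SMS 2 unless a valid instrument is included.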

26 Gunderson, M. & Melino, A. (1990). The Effects of Public Policy on Strike Duration. Journal of Labor Economics.

Scores are given as (method, implementation).

Heckman Two-Stage Approach (H2S) or Control Function (CF): with IV

Maximum SMS score: 4, 4 if
• Selection equation includes an IV that satisfies the three criteria (see IV)

Adjusted SMS score: scored as per underlying method if
• Instrument invalid
• e.g. cross section with invalid IV scores 2; difference-in-difference with invalid IV scores 3 (see below)

Heckman Two-Stage Approach (H2S) or Control Function (CF): with DID or panel method

Maximum SMS score: 3, 3 if
• Selection equation includes relevant observable variables
• DID or panel criteria met

Adjusted SMS score: 3, 2 if
• One of the criteria is not satisfied

Heckman Two-Stage Approach (H2S) or Control Function (CF): cross-sectional

Maximum SMS score: 2, 2 if
• Selection equation includes relevant observable variables

Adjusted SMS score: 2, 1 if
• Selection equation does not include relevant observable variables

H2S with IV Example (SMS 4, 3)

A 2013 study by Chen, Sheng, and Findlay evaluates the impact of industry-level foreign direct investment (FDI) on the export value of domestic firms in China.27 There is a clear problem of selection bias, as industries that receive FDI are likely different to those that do not. To overcome this, the authors employ the Heckman two-stage approach. Furthermore, they use FDI inflows to ASEAN countries (per industry) as an instrument for FDI inflows, and include this variable in the selection equation.

Instrument is:

1. Instrument is relevant

Pass

It can be plausibly held that FDI in China is closely related to the FDI inflows of ASEAN countries as a whole.

2. Instrument is exogenous

Fail

Whether or not a particular industry attracts FDI is not random. For this reason, it cannot be reasonably argued that the instrument is exogenous.

3. Instrument is exclusionary

Pass

FDI in ASEAN countries does not directly impact the export value of Chinese firms.

Given that only two of the criteria are satisfied, this study achieves SMS 3 for implementation.

27 Chen, C., Sheng, Y., Findlay, C. (2013). Export Spillovers of FDI on China’s Domestic Firms. Review of International Economics, 841-856.

CF with DID Example (SMS 3, 3)

A 2006 study by Andren and Andren evaluates the impact of vocational training on employment.28 To overcome issues of selection bias, they employ the CF method. Furthermore, to control for unobservable selection into treatment, they construct a control group of individuals that did not receive the training. They then compare treated and control individuals’ propensity to leave unemployment before and after the training.

1. Selection equation includes relevant observable variables

Pass

The authors include variables such as age, sex, education, whether the individual has children or not, and region of residence. With respect to observables, this seems to be an adequate list.

2. Control group would have followed same trend as treatment group

Pass

The authors had access to a rich database of all unemployed individuals in Sweden. They therefore chose individuals that were observationally similar to treated individuals to form their control group. Accordingly, it can be credibly argued that treatment and control groups are at least observationally similar, and therefore will likely follow the same trends.

3. Treatment date is known and singular

Pass

Although different individuals started and ended the vocational training programme at different times, it is reasonable to presume that there is a clear and singular date of treatment for each person. This is because vocational training programmes have a clear start date, so that there are no “partial” treatment effects.

Given that the three criteria are satisfied, the study achieves SMS 3 for implementation.

H2S Cross-sectional Example (SMS 2, 2)

A 2009 study by Hammarstedt evaluates the impact of future earnings on the decision to seek self-employment instead of wage-employment.29 The decision to become an entrepreneur is likely tainted by problems of self-selection – those that choose to forgo wage employment are probably inherently different to those that do not. For this reason, the author uses the Heckman two-stage approach to correct for selection bias. However, because the study only features observations for the year 2003, it is purely cross-sectional.

1. Selection equation includes relevant observable variables

Pass

The author includes an extensive list of control variables that account for differences in education and location.

Given that the criterion is satisfied, this study achieves SMS 2 for implementation.

28 Andren, T. & Andren, D. (2006). Assessing the Employment Effects of Vocational Training Using a One-Factor Model. Applied Economics, 2469-2486.

29 Hammarstedt, M. (2009). Predicted Earnings and the Propensity for Self-Employment. International Journal of Manpower, 349-359.

Appendix 3: Arellano-Bond method

The Arellano-Bond (AB) method takes a similar approach to either the fixed effects or first differences method but includes a lag of the outcome variable in the regression. However, AB recognises that the lag may be correlated with the error term, meaning that the coefficients may be biased. For this reason, a further lag is used as an instrument for the original lag.
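The logic can be written out in a simple first-differenced form. The notation is ours, introduced only for illustration: y is the outcome, D the treatment, x the time-varying controls, alpha_i a unit fixed effect, tau_t a year effect and epsilon the error term.

```latex
\begin{align}
y_{it} &= \rho\, y_{i,t-1} + \beta D_{it} + x_{it}'\gamma + \alpha_i + \tau_t + \varepsilon_{it} \\
\Delta y_{it} &= \rho\, \Delta y_{i,t-1} + \beta\, \Delta D_{it} + \Delta x_{it}'\gamma + \Delta \tau_t + \Delta \varepsilon_{it}
\end{align}
```

Differencing removes the fixed effect, but the lagged difference, which equals y_{i,t-1} minus y_{i,t-2}, contains epsilon_{i,t-1}, and so does the differenced error, which equals epsilon_{it} minus epsilon_{i,t-1}. The two are therefore correlated. The further lag y_{i,t-2} is related to the lagged difference but, provided the errors are not serially correlated, is unrelated to the differenced error, which is what allows it to serve as an instrument.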

Scores are given as (method, implementation).

Panel Methods: Arellano-Bond (AB)

Maximum SMS score: 3, 3 if
• Year effects are included
• Appropriate time-varying controls are used

Adjusted SMS score: 3, 2 if
• One of the two criteria is not satisfied

AB Example 1 (SMS 3, 3)

A 2010 study by Bayona-Saez and Garcia-Marco evaluates the impact of the Eureka programme (a European initiative to support R&D) on firm performance.30 As with the evaluation of other R&D subsidy programmes, there is probably a problem of selection bias as firms that use the subsidies are likely to be different from those that do not. To overcome this, the authors use the Arellano-Bond method.

1. Year effects are included

Pass

The authors include a year variable in the final regression.

2. Appropriate time-varying controls used

Pass

In this particular case, it is likely that there are important variables that affect firm profitability and vary with time. Accordingly, the authors include a time-varying “firm size” variable.

Given that the two criteria are satisfied, this study achieves SMS 3 for implementation.

30 Bayona-Sáez, C. & García-Marco, T. (2010). Assessing the Effectiveness of the Eureka Program. Research Policy, 1375-1386.

The What Works Centre for Local Economic Growth is a collaboration between the London School of Economics and Political Science (LSE), Centre for Cities and Arup.

www.whatworksgrowth.org

This work is published by the What Works Centre for Local Economic Growth, which is funded by a grant from the Economic and Social Research Council, the Department for Business, Innovation and Skills and the Department of Communities and Local Government. The support of the Funders is acknowledged. The views expressed are those of the Centre and do not represent the views of the Funders.

Every effort has been made to ensure the accuracy of the report, but no legal responsibility is accepted for any errors, omissions or misleading statements.

The report includes reference to research and publications of third parties; the What Works Centre is not responsible for, and cannot guarantee the accuracy of, those third party materials or any related material.

June 2016

What Works Centre for Local Economic Growth

info@whatworksgrowth.org
@whatworksgrowth

© What Works Centre for Local Economic Growth 2016