
The Psychologist May 2012


This is a preview of the May issue of The Psychologist, published by the British Psychological Society. The whole issue will be available in PDF form via http://www.bpsshop.org.uk where you can also subscribe to the print version.


the psychologist
may 2012 vol 25 no 5

Replication – psychology’s house of cards?
An ‘opinion special’ on a cornerstone of science

interview with Trevor Robbins 360
origins of the human cultural mind 364
analytical decision making 390
one on one with Richard Freeman 396

letters 330
news 338
careers 382
looking back 394

£5 or free to members of The British Psychological Society
Incorporating Psychologist Appointments


We rely on your submissions, and in return we help you to get your message across to a large and diverse audience. See www.bps.org.uk/writeforpsycho

‘Please consider contributing to The Psychologist! The magazine relies on your support, and is always on the look-out for a range of content, from reviews, to interviews, to full articles. The editorial team are very supportive, and it is a great way of communicating your work and opinions to other psychologists.’
Richard Wiseman, Professor of the Public Understanding of Psychology at the University of Hertfordshire

Connect with The Psychologist and the Society’s free Research Digest service for more psychological news and analysis:

Subscribe by RSS or e-mail at www.thepsychologist.org.uk and www.researchdigest.org.uk/blog

Become a fan at tinyurl.com/thepsychomag and www.facebook.com/researchdigest

Follow us at www.twitter.com/psychmag and www.twitter.com/researchdigest

For all the latest psychology jobs and careers information, see www.psychapp.co.uk

We can help you to advertise to a large, well-qualified audience: see www.bps.org.uk/advertise and find out how.

For full details of the policy and procedures of The Psychologist, see www.thepsychologist.org.uk.

If you feel these policies and procedures have not been followed, contact the editor on [email protected], or the Chair of the Psychologist and Digest Policy Committee, Professor David Lavallee, on [email protected]

Welcome to The Psychologist, the monthly publication of The British Psychological Society. It provides a forum for communication, discussion and controversy among all members of the Society, and aims to fulfil the main object of the Royal Charter, ‘to promote the advancement and diffusion of a knowledge of psychology pure and applied’. It is supported by www.thepsychologist.org.uk, where you can view this month’s issue, search the archive, listen, debate, contribute, subscribe, advertise, and more.

If you need The Psychologist in a different format, contact us with your requirements: tel 0116 252 9523 or e-mail us at [email protected]

Contact
The British Psychological Society
St Andrews House
48 Princess Road East
Leicester LE1 7DR
tel 0116 254 9568
fax 0116 227 1314

Society website
www.bps.org.uk

The Psychologist
[email protected]

General Society
[email protected]

Advertising
Reach 50,000 psychologists at very reasonable rates.
Display Ben Nelmes 020 7880 [email protected]
(in print and online at www.psychapp.co.uk) Giorgio Romano 020 7880 [email protected]

April 2012 issue: 49,089 dispatched

Printed by Warners Midlands plc on 100 per cent recycled paper

Please re-use or recycle. See the online archive at www.thepsychologist.org.uk and digital samples at www.issuu.com/thepsychologist

ISSN 0952-8229

© Copyright for all published material is held by The British Psychological Society, unless specifically stated otherwise. Authors, illustrators and photographers may use their own material elsewhere after publication without permission. The Society asks that the following note be included in any such use: ‘First published in The Psychologist, vol. no. and date. Published by The British Psychological Society – see www.thepsychologist.org.uk.’ As the Society is a party to the Copyright Licensing Agency agreement, articles in The Psychologist may be photocopied by licensed institutional libraries for academic/teaching purposes. No permission is required. Permission is required and a reasonable fee charged for commercial use of articles by a third party: please apply in writing. The publishers have endeavoured to trace the copyright holders of all illustrations. If we have unwittingly infringed copyright, we will be pleased, on being satisfied as to the owner’s title, to pay an appropriate fee.

Managing Editor Jon Sutton
Assistant Editor Peter Dillon-Hooper
Production Mike Thompson
Staff journalist / Research Digest Christian Jarrett
Editorial Assistant Debbie James
Occupational Digest Alex Fradera

The Psychologist and Digest Policy Committee
David Lavallee (Chair), Phil Banyard, Nik Chmiel, Olivia Craig, Helen Galliard, Rowena Hill, Jeremy Horwood, Catherine Loveday, Peter Martin, Victoria Mason, Stephen McGlynn, Tony Wainwright, Peter Wright, and Associate Editors

Associate Editors
Articles Harriet Gross, Marc Jones, Rebecca Knibb, Charlie Lewis, Wendy Morgan, Paul Redford, Miles Thomas, Monica Whitty, Jill Wilkinson, Barry Winter
Conferences Alana James
History of Psychology Nathalie Chernoff
Interviews Nigel Hunt, Lance Workman
Viewpoints Catherine Loveday

International panel
Vaughan Bell, Uta Frith, Alex Haslam, Elizabeth Loftus


THE ISSUE

Squeezing a rubber ball in one's left hand increases creativity. Writing a brief paragraph on important values improves academic achievement over the subsequent year. Chocolate cookies help you to persist with an unsolvable problem.

What have these findings got in common? They are all in a ‘top 20’ of findings that people would most like to see replicated (see www.psychfiledrawer.org/top-20). Replication is described by many as the cornerstone of scientific progress, and the issue has been discussed extensively in the blogosphere of late.

The debate built following failures to replicate studies on priming by John Bargh, and on ‘anomalous retroactive influences’ by Daryl Bem. Some have even suggested that replication is psychology’s ‘house of cards’.

In an attempt to shed some light on this perennial issue, I invited psychologists to share their views on replication, and constructive ways forward. I hope you will find the results to be a timely and important collection, building on The Psychologist’s role as a forum for discussion and debate.

Dr Jon Sutton

At the interface 360
Jon Sutton interviews Trevor Robbins CBE (University of Cambridge)

Evolutionary origins of the human cultural mind 364
Thibaud Gruber and Klaus Zuberbühler look to non-human primates for clues

BIG PICTURE – www.thepsychologist.org.uk

Decaying drawings
What happens when you play ‘Chinese whispers’ with drawings? And what does this tell us about the nature of object recognition and perception? Following the work of Bartlett in the 1930s, psychologist Benjamin Dyson (Ryerson University, Ontario) investigated, in a collaboration with UK artist Rachel Cohen. In the study, published last year in Perception, the researchers got participants to copy images which were presented to them in either canonical viewpoint (their simplest and most familiar form), canonical view with a missing feature, or a non-canonical perspective. By the 19th copy, there was inevitable deterioration in the representation. When new participants were given these sets of images in reverse, they were quicker to correctly name the final image in a sequence derived from a canonical view. ‘We think that the relative lack of “top down” influence for the non-canonical images may result in more accurate copying,’ Dyson and Cohen say. ‘But with the canonical depictions, what is left is more caricature-like drawings, yielding consistent labelling at the expense of accurate copying.’ According to the researchers, this study makes it clear that the interaction between bottom-up and top-down processes can be translated from traditional paradigms to the unique domain of drawing. ‘This is a window onto how differences in perception and cognition are embodied and expressed.’
Images from research by artist Rachel Cohen and psychologist Benjamin Dyson. E-mail ‘Big picture’ ideas to [email protected].

letters 330
impact factors; panic buying; REF; well-being psychology; and more

news and digest 338
the Health and Social Care Bill; dementia challenge; willpower report; funding news; and nuggets from the Society’s Research Digest service

media 344
science journalism, with Mark Sergeant; France’s autism ‘shame’; and more

Replication, replication, replication 346
Stuart J. Ritchie, Richard Wiseman and Christopher C. French with the opening contribution to a special on one of the cornerstones of scientific progress

Replication: where do we go from here?
A variety of perspectives on replication and possible ways forward, from: Daniel Simons; Dave Nussbaum; Henry L. Roediger, III; Gregory Mitchell; Daryl Bem; Claudia Hammond; Daniel Bor; Sam Gilbert; Joshua Hartshorne and Adena Schachner; Alex Holcombe and Hal Pashler; Jelte Wicherts; and Stephen Pilling

book reviews 372
toilet psychology; social neuroscience; the mindful workplace; PTSD; and qualitative research methods

society 376
President’s column; Award for Outstanding Doctoral Research Contributions; public engagement grants; latest from BPS journals; and more

careers and psychologist appointments 382 (pull-out)
we meet Don Rawson, counselling psychologist, and Yvonne Gailey, Chief Executive of the Risk Management Authority

new voices 390
does it pay to be analytical? Stephanie Rhodes with the latest in our series for budding writers

looking back 394
the journal of a mental hospital user in the 1960s, recounted by Richard S. Hallam and Michael P. Bender

one on one 396
…with Richard Freeman


Replication, replication, replication
Stuart J. Ritchie, Richard Wiseman and Christopher C. French with the opening contribution to a special on one of the cornerstones of scientific progress

Last year, Cornell social psychologist Daryl Bem had a paper published in the prestigious Journal of Personality and Social Psychology (JPSP) entitled ‘Feeling the future’ (Bem, 2011b). According to the nine studies described in the paper, participants could reliably – though unconsciously – predict future events using extrasensory perception. The findings proved eye-catching, with many major media outlets covering the story; Bem even discussed his work on the popular American TV show The Colbert Report.

The wide-ranging discussion of Bem’s paper has raised questions regarding the limits of science, our current statistical paradigm, the policies of academic journal publishing, and what exactly a scientist needs to do to convince the world that a surprising finding is true. In this article we outline the ‘Feeling the future’ controversy, our part in it, and highlight these important questions about scientific psychology.

Some recent parapsychological research projects have taken a somewhat idiosyncratic approach to extrasensory perception by examining, for example, whether zebra finches can see into the future (Alvarez, 2010). In contrast, Bem adopted a more back-to-basics approach, taking well-worn psychological phenomena and ‘time-reversing’ them to place the causes after the effects. By far the largest effect size was obtained in Bem’s final experiment, which investigated the ‘retroactive facilitation of recall’. In this procedure, participants were shown a serial list of words, which they then had to type into a computer from memory in a surprise free recall test. After the test, the computer randomly selected half of the words from the list and showed them again to the participants. Bem’s results appeared to show that this post-test practice had worked backwards in time to help his participants to remember the selected words – in the recall test they had remembered more of the words they were about to (randomly) see again.

If these results are true, the implications for psychology – and society – are huge. In principle, experimental results could be confounded by participants obtaining information from the future, and studying for an exam after it has finished could improve your grade!

As several commentators have pointed out, Bem’s (2011b) experiments were far from watertight – for instance, Alcock (2011) and Yarkoni (2011) have outlined numerous experimental flaws in the design. We won’t describe these various issues here as they have been widely discussed in the blogosphere and elsewhere (see, for instance, Bem’s, 2011a, response to Alcock). While many of these methodological problems are worrying, we don’t think any of them completely undermine what appears to be an impressive dataset.


References
Alcock, J.E. (2011, 6 January). Back from the future: Parapsychology and the Bem affair. Skeptical Inquirer. Retrieved 6 March 2012 from http://tinyurl.com/5wtrh9q
Aldous, P. (2011, 5 May). Journal rejects studies contradicting precognition. New Scientist. Retrieved 6 March 2012 from http://tinyurl.com/3rsb8hs
Alvarez, F. (2010). Higher anticipatory response at 13.5 ± 1 h local sidereal time in zebra finches. Journal of Parapsychology, 74(2), 323–334.
Bem, D.J. (2011a, 6 January). Response to Alcock’s ‘Back from the future: Comments on Bem’. Skeptical Inquirer. Retrieved 6 March 2012 from http://tinyurl.com/chhtgpm
Bem, D.J. (2011b). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425. doi: 10.1037/a0021524
Bem, D.J., Utts, J. & Johnson, W.O. (2011). Must psychologists change the way they analyse their data? A response to Wagenmakers, Wetzels, Borsboom & van der Maas (2011). Journal of Personality and Social Psychology, 101(4), 716–719.
Goldacre, B. (2011, 23 April). Backwards step on looking into the future. The Guardian. Retrieved 16 March 2012 from http://tinyurl.com/3d9o65e
LeBel, E.P. & Peters, K.R. (2011). Fearing the future of empirical psychology: Bem’s (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15(4), 371–379. doi: 10.1037/a0025172
Ritchie, S.J., Wiseman, R. & French, C.C. (2012). Failing the future: Three unsuccessful replications of Bem’s ‘retroactive facilitation of recall’ effect. PLoS ONE, 7(3), e33423.


The ‘Feeling the future’ study has become a test case for proponents of Bayesian theory in psychology, with some commentators (e.g. Rouder & Morey, 2011) suggesting that Bem’s seemingly extraordinary results are an inevitable consequence of psychology’s love for null-hypothesis significance testing. Indeed, Wagenmakers et al. (2011a) suggest that had Bayesian analyses been employed, with appropriate priors, most of Bem’s effects would have been reduced to a credibility level no higher than anecdotal evidence. Given that casinos are not going bankrupt across the world, argued the authors, our prior level of scepticism about the existence of precognitive psychic powers should be high.

Bem and colleagues responded (2011), suggesting a selection of priors which were in their view more reasonable, and which were in our view illustrative of the problem with Bayesian analyses, especially in a controversial area like parapsychology: your Bayesian prior will depend on where you stand on the previous evidence. Do you, unlike most scientists, take seriously the positive results that are regularly published in parapsychology journals like the Journal of the Society for Psychical Research, or the Journal of Parapsychology? Or do you only accept those that occasionally appear in orthodox journals, like the recent meta-analysis of ‘ganzfeld’ telepathy studies in Psychological Bulletin (Storm et al., 2010)? Do you consider the real world – full as it is of the aforementioned successful casinos – as automatic evidence against the existence of any psychic powers? Your answers to these questions will inform your priors and, consequently, the results of your Bayesian analyses (see Wagenmakers et al., 2011b, for a response to Bem et al., 2011).
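
To see how much the choice of prior can matter, here is a minimal sketch in Python – an editorial illustration using a hypothetical forced-choice psi task with invented data, not Bem’s actual experiments or analysis. The same hit count yields noticeably different Bayes factors depending on whether the prior on the hit rate is a broad ‘default’ one or one concentrated on the small positive effects a proponent might expect.

```python
# Toy illustration (hypothetical task and data, not Bem's analysis): how the
# Bayes factor for a simple binomial 'psi' task depends on the prior placed
# on the hit rate under H1.
from math import lgamma, log, exp

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor(hits, trials, a, b):
    """BF10 for H1: hit rate ~ Beta(a, b) versus H0: hit rate = 0.5.

    The binomial coefficient is shared by both marginal likelihoods and cancels.
    """
    log_m1 = log_beta_fn(a + hits, b + trials - hits) - log_beta_fn(a, b)
    log_m0 = trials * log(0.5)
    return exp(log_m1 - log_m0)

hits, trials = 60, 100   # invented data: a little above chance

# Broad 'default' prior: large deviations from chance as plausible as small ones.
print(bayes_factor(hits, trials, a=5, b=5))
# A proponent's prior: only small positive effects expected (centred near .55).
print(bayes_factor(hits, trials, a=110, b=90))
```

Neither calculation is wrong; they simply start from different assumptions about what is plausible, which is exactly the difficulty described above.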

We reasoned that the first step towards discovering whether Bem’s alleged effects were genuine was to see if they would replicate. As one of us has pointed out previously (Wiseman, 2010), the only definitive way of doing this is to carry out exact replications of the procedure in the original experiment. Otherwise, any experimental differences muddy the waters and – if the replications fail – allow for alternative interpretations and ‘get-outs’ from the original proponents. Recently, this argument was taken up with direct reference to Bem’s experiment by LeBel and Peters (2011), who strongly argued in favour of more exact replications.

Admittedly, carrying out exact replications of someone else’s work is hardly the most glamorous way to spend your time as a scientist. But we are often reminded – most recently by an excellent article in the APS Observer (Roediger, 2012, and this issue) – that replication is one of the cornerstones of scientific progress. Keeping this in mind, the three of us each repeated the procedure for Bem’s ‘retroactive facilitation of recall’ experiment in our respective psychology departments, using Bem’s instructions, involving the same number of undergraduate participants (50) as he used, and – crucially – using Bem’s computer program (with only some minor modifications, such as anglicising a few of the words). Either surprisingly or unsurprisingly, depending on your priors, all three replication attempts were abject failures. Our participants were no better at remembering the words they were about to see again than the words they would not, and thus none of our three studies yielded evidence for psychic powers.

We duly wrote up our findings and sent them off to the JPSP. The editor’s response came very quickly, and was friendly, but negative. The journal’s policy, the editor wrote, is not to publish replication attempts, either successful or unsuccessful. Add something new to the study (a ‘replication-and-extension’ study), he told us, and we may consider it. We replied, arguing that, since Bem’s precognitive effect would be of such clear importance for psychology, it would surely be critical to check whether it exists in the first place, before going on to look at it in different contexts. The editor politely declined once more, as described by Aldous (2011) and Goldacre (2011).

While exact replications are useful for science, they’re clearly not very interesting for top journals like JPSP, which will only publish findings that make new theoretical or empirical contributions. We are not arguing that our paper should automatically have been published; for all we knew at this point, it may have suffered from some unidentified flaw. We would, however, like to raise the question of whether journals should be so fast to reject without review exact replications of work they have previously published, especially in the age of online publishing, where saving paper is no longer a priority (Srivastava, 2011).


References (continued)
Roediger, H.L., III (2012). Psychology’s woes and a partial cure: The value of replication. Observer. Retrieved 16 March 2012 from http://tinyurl.com/d4lfnwu
Rosenthal, R. (1966). Experimenter effects in behavioural research. New York: Appleton-Century-Crofts.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. doi: 10.1037/0033-2909.86.3.638
Rouder, J.N. & Morey, R.D. (2011). A Bayes-factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin & Review, 18(4), 682–689.
Schlitz, M., Wiseman, R., Watt, C. & Radin, D. (2006). Of two minds: Sceptic-proponent collaboration within parapsychology. British Journal of Psychology, 97, 313–322. doi: 10.1348/000712605X80704
Srivastava, S. (2011, 10 May). How should journals handle replication studies? [Web log post]. Retrieved 6 March 2012 from http://tinyurl.com/crb24a8
Storm, L., Tressoldi, P. & Di Risio, L. (2010). Meta-analysis of free response studies, 1992–2008: Assessing the noise reduction model in parapsychology. Psychological Bulletin, 136(4), 471–485. doi: 10.1037/a001945
Wagenmakers, E.-J., Wetzels, R., Borsboom, D. & van der Maas, H.L.J. (2011a). Why psychologists must change the way they analyse their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100, 426–432. doi: 10.1037/a0022790



After a couple of other submission attempts rapidly failed, we submitted our paper to the British Journal of Psychology (BJP). They have no automatic rejection policy for replication studies, and to our relief, sent our paper out for review. After a long waiting period, we heard back from two reviewers, the first of whom was very positive, and urged the editor to publish our paper. Conversely, the second reviewer – who found no technical faults with our procedures – recommended against publishing our paper as it was, because of one problem: the experimenter effect.

We know, argued the reviewer, that experimenter expectations can influence obtained results (e.g. Rosenthal, 1966), and this problem is worse in parapsychology, where sceptical experimenters might somehow communicate their scepticism to participants, thus affecting task performance. We found this implausible, given both the very brief and simple nature of Bem’s experiment, and the lack of evidence that performance on a psychic memory task is influenced by prior scepticism. But one explanation for parapsychological experimenter effects is that sceptics might inadvertently use their own psychic powers to nullify the powers of their participants (e.g. Wiseman & Schlitz, 1998), rendering them unable to ‘feel the future’ for the duration of the study. Perhaps the editors found this explanation convincing, because they agreed with the reviewer, asking us to run a fourth study where a believer in psychic powers ran the experiments.

This latter explanation seemed to us to beg the question, to say the least. In trying to assess whether or not psychic powers exist, it is surely jumping the gun somewhat to expect that those unverified powers are influencing the assessment itself! Indeed, one of us had previously tested this exact phenomenon in parapsychology, asking whether believers obtain more positive results than sceptics who use the same methodology. The first experiments, on remote detection of staring, seemed to show that this was the case (Wiseman & Schlitz, 1998). However, the most recent study of this phenomenon (Schlitz et al., 2006) – the one with the largest sample size and the tightest experimental set-up, published in the BJP – showed no experimenter effects, with sceptic and believer both finding null results. Not exactly stunning evidence for the existence of unconscious bias, or indeed psychic interference, on the part of sceptics.

Most importantly, however, any experimenter effects should not have mattered – Bem, in his original paper, pointed out that his experimental setup reduced experimenter effects to a minimum (Bem, 2011b, p.16), as the computer program ran and scored the entire procedure itself, and the experimenter only greeted and debriefed participants. In two of our replication attempts we, like Bem, had research assistants do all the required jobs, and had no contact with the participants ourselves. This reviewer, then, seemed to have missed Bem’s point – these were specifically intended to be replicable experiments, which could demonstrate precognitive effects to sceptics everywhere.

Since we didn’t agree with the logic behind the believer-as-experimenter condition (we wonder – should being criticised for not believing in a particular phenomenon be dubbed the ‘Tinkerbell effect’?), we withdrew our paper from the BJP, and decided to have one final try at submitting elsewhere. Happily for us, PLoS ONE accepted our article for publication (Ritchie et al., 2012), and the article is now available on their open-access website.

We would be the first to state that, even though we have had three failed replication attempts published, this does not rule out Bem’s precognitive effects.


References (continued)
Wagenmakers, E.-J., Wetzels, R., Borsboom, D. & van der Maas, H.L.J. (2011b). Yes, psychologists must change the way they analyse their data: Clarifications for Bem, Utts, and Johnson (2011). Unpublished manuscript.
Wiseman, R. (2010). ‘Heads I win, tails you lose’: How parapsychologists nullify null results. Skeptical Inquirer, 34(1), 36–39.
Wiseman, R. & Schlitz, M. (1998). Experimenter effects and the remote detection of staring. Journal of Parapsychology, 61(3), 197–208.
Yarkoni, T. (2011, 10 January). The psychology of parapsychology, or why good researchers publishing good articles in good journals can still get it totally wrong [Web log post]. Retrieved 6 March 2012 from http://tinyurl.com/694ycam

Most obviously, we have only attempted to replicate one of Bem’s nine experiments; much work is yet to be done. We’ve been made aware of a few other replication attempts through a study registry set up by Wiseman and Caroline Watt (tinyurl.com/bemreplic). Like trial registries in clinical medicine, researchers were asked to record their Bem replication attempts here, for their results to be included in a meta-analysis, which is currently in progress. The ideal, we believe, would be a prospective meta-analysis: researchers sit down together and plan, say, 10 studies in different laboratories of one effect, with sample sizes and analyses set in stone before they start. When the studies are complete, the data is pooled and analysed, and conclusions can be drawn that are (hopefully) acceptable to everyone involved.
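
For readers unfamiliar with the mechanics, the pooling step of such a meta-analysis is straightforward; the sketch below is an editorial illustration using invented effect sizes and standard errors for ten hypothetical pre-registered studies, combined by standard inverse-variance (fixed-effect) weighting.

```python
# Sketch of the pooling step in a prospective meta-analysis: fixed-effect,
# inverse-variance weighting of per-study effect sizes. All numbers invented.
from math import sqrt

# (effect size d, standard error) for ten hypothetical pre-registered studies
studies = [(0.12, 0.20), (0.05, 0.18), (-0.03, 0.21), (0.10, 0.19), (0.02, 0.20),
           (0.08, 0.22), (-0.01, 0.18), (0.06, 0.20), (0.00, 0.19), (0.04, 0.21)]

weights = [1 / se ** 2 for _, se in studies]      # more precise studies count more
pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

print(f'pooled d = {pooled_d:.3f}, 95% CI +/- {1.96 * pooled_se:.3f}')
```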

While our experience relates to a rather outlandish area of psychology, the controversies, questions, and lessons we can draw from it apply to all publishing scientists (see Box 1). How many other researchers, we wonder, have tried and failed to replicate more sublunary effects and had their papers rejected, or not even attempted to publish them in the first place? Scientists’ and scientific journals’ well-known aversion to publishing null or negative results – the file-drawer problem (e.g. Rosenthal, 1979) – has been discussed for decades, but still no solid solutions appear to be forthcoming. We have a feeling the future will hold further debate over these vexed and important questions.

Questions arising
- To quote the title of Bem et al.’s (2011) response to Wagenmakers et al. (2011a): ‘Do psychologists need to change the way they analyse their data?’ Do we need to consider becoming Bayesians?
- How do we deal with experimenter effects in psychology laboratories?
- Should journals accept papers reporting replication attempts, successful or failed, when they themselves have published the original effect?
- Where should journals publish replication attempts? Internet-only, with article abstracts in the paper copy?
- Who should carry out replication studies? Should scientists be required to replicate their own findings?
- If a scientist chose to carry out many replications of other people’s work, how would this impact his or her career?
- Should more outstanding and controversial scientific questions be subject to prospective meta-analyses?

Stuart J. Ritchie is at the University of Edinburgh. [email protected]
Richard Wiseman is at the University of Hertfordshire
Christopher C. French is at Goldsmiths, University of London


Replication: Where do we go from here?
A stellar cast of contributors offer their personal take on replication and possible progress

The need for new incentives

Scientists hope other laboratories will replicate their findings – any competent researcher should be able to re-do an experiment and produce the same effect, and independent replication is the benchmark of cumulative science. For good reason, we all dread publishing findings that fail to replicate. But replication failures happen for many reasons.

Statistically, replication failures should happen some of the time, even for direct replications of a real effect. Individual studies are samples of reality, noisy estimates of the truth. But such estimates vary, sometimes overestimating and other times underestimating the true effect. For underpowered studies – those testing small effects using relatively few participants – replication failures are statistically more likely. But so are false positives.

Given the existing bias shared by authors, reviewers and journals to publish only significant results (see Fanelli, 2012, for evidence that the problem is worsening and not restricted to psychology), coupled with investigator degrees of freedom that inflate significance levels artificially (e.g. Simmons et al., 2011), most striking new findings from underpowered studies are likely to overestimate the actual effect size. Some will be false positives, with little or no underlying effect. Consequently, a similarly powered replication attempt is likely to find a smaller, possibly non-significant result.
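
The arithmetic behind this point is easy to check by simulation. The sketch below is an editorial illustration with invented parameters rather than any real study: it draws many underpowered two-group experiments on a small true effect, keeps only the ‘publishable’ significant ones, and shows how inflated their effect sizes are.

```python
# Simulation sketch (invented parameters): a small true effect studied with
# small samples. Among studies reaching p < .05, the observed effect is
# inflated, and an identically designed replication usually fails.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, runs = 0.2, 20, 20_000        # true effect, per-group n, simulations

def one_study():
    treatment = rng.normal(true_d, 1, n)
    control = rng.normal(0.0, 1, n)
    _, p = stats.ttest_ind(treatment, control)
    return p, treatment.mean() - control.mean()   # observed effect (sd = 1)

results = [one_study() for _ in range(runs)]
significant = [d for p, d in results if p < .05 and d > 0]

print('share of studies significant (power):', len(significant) / runs)
print('true effect:', true_d, '| mean effect among significant studies:',
      round(float(np.mean(significant)), 2))

# An exact, same-sized replication succeeds only at the rate of the power
# printed above, so most replications of the 'significant' findings will fail.
```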

As a field, we could reduce the likelihood of false positives and enhance our ability to detect them if journals were more willing to publish both successful and failed replications. Perhaps more importantly, we need to rethink the publishing incentives in psychology to encourage multiple replication attempts for important new findings. Otherwise, science cannot adequately correct its mistakes or lead to a cumulative estimate of the true effect. The present publication model provides strong disincentives to replications. Some journals outright refuse to publish direct replication attempts, and a failed replication often incurs the wrath of the original researcher.

My favoured solution to the incentive problem would be a journal of replication attempts. In my view, the peer-review process for such a journal should occur before a replication is attempted, based on a detailed method section and analysis plan. Once the design and analysis meet the approval of the original authors (or someone they designate), the results would be published regardless of the outcome. A crucial advantage of this approach would be a more constructive review process – because the original researchers want to see their findings replicated, they have an incentive to make sure such replications are conducted competently and accurately. And researchers would be guaranteed a publication should they choose to replicate another’s research. For important findings, the field would benefit because many labs could contribute direct replication attempts, leading to a more accurate estimate of the true effect and to a more cumulative science (the measured effect from each replication attempt could be catalogued on a site like www.psychfiledrawer.org). Finally, this approach would carry an indirect benefit to the public perception of psychology – flashy findings based on underpowered studies garner tremendous coverage in the media. Such effects are less likely to replicate, and if researchers know replication attempts will follow, they will be more cautious about publishing dubious data in the first place.

The goal of scientific psychology should be to obtain an accurate estimate of the actual size of the effects we measure. The more direct replication attempts, the better the estimate of the true effect. In light of recent evidence for publication bias, investigator degrees of freedom in analysis, and the risk of false positives, any individual finding, especially one from an underpowered study, should be viewed with as much skepticism as a single failure to replicate. What the field needs are many direct replication attempts for each important finding. Only then can we be confident that an intriguing finding is more than a false positive or that a replication failure is more than a false negative.

Interested in a new journal like the one I have proposed? Please e-mail me.

Daniel J. Simons is at the University of Illinois. [email protected]

References
Anderson, I., Pilling, S., Barnes, A. et al. (2009). Clinical Practice Guideline No.90: Update: Depression in adults in primary and secondary care. London: Gaskell/British Psychological Society.
Bem, D.J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.
Chalmers, I. (2002). Lessons for research ethics committees. Lancet, 359, 174.
Chan, A.W., Hrobjartsson, A., Haahr, M.T. et al. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. Journal of the American Medical Association, 291, 2457–2465.
Egger, M., Davey Smith, G. & Altman, D.G. (eds.) (2001). Systematic reviews in health care: Meta-analysis in context (2nd edn). London: BMJ Publishing.
Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90, 891–904. doi: 10.1007/s11192-011-0494-7
Fellows, L.K. & Farah, M.J. (2005). Is anterior cingulate cortex necessary for cognitive control? Brain, 128, 788–796.
Hartshorne, J.K. & Schachner, A. (2012). Tracking replicability as a method of post-publication open evaluation. Frontiers in Computational Neuroscience, 6, 8.
Ioannidis, J.P.A. (2005). Why most published research findings are false. PLoS Medicine, 2, e124. doi: 10.1371/journal.pmed.0020124
Isen, A.M. & Levin, P.F. (1972). Effect of feeling good on helping: Cookies and kindness. Journal of Personality and Social Psychology, 21, 384–388.



Twist, bend and hammer your effect

At least in my little corner of the world of psychological science, I see replications all the time. Often, for cognitive psychologists, replications of experiments are required for publication by editors in our most prestigious journals. To those who argue that a robust level of statistical significance is all one needs to assure replicability, I recall the aphorism (attributed to Confucius) that ‘One replication is worth a thousand t-tests’.

Researchers should, whenever possible, replicate a pattern of results before publishing it. The phenomenon of interest should be twisted, bent and hammered to see if it will survive. If the basic effect is replicated under the exact conditions as in the original study, but it disappears when conditions are changed a bit, then the effect is real but brittle; the boundary conditions for obtaining the effect are rather narrow. That is not ideal, but is certainly worth knowing.

In the mid-1990s, Kathleen McDermott and I were collaborating on research, and we tried two rather risky experiments, ones that seemed likely to fail but that were worth trying. To our surprise, we found startling patterns of data in both procedures.

One case involved a technique for studying false memories in a list-learning situation in which the illusory memories seemed to occur nearly immediately and to be remarkably strong. After a first classroom pilot experiment, we conducted a proper second experiment that confirmed and strengthened our initial results. We started to write up the two experiments. However, we were still a bit worried about the robustness of the effects, so we continued experimenting while we wrote. We were able to confirm the results in new experiments (employing various twists), so that by the time the paper was published in the Journal of Experimental Psychology in 1995, we had several more replications and extensions ready to be written. Papers by other researchers, replicating and extending the effect, were also quickly published – no problem in getting replications published in this instance – and thus, within two years of its initial publication, anyone in my field who cared could know that our effect was genuine.

The second experiment we were excited about at that time did not have so happy a fate. After feeling confident enough to submit the research to be presented as a conference talk, we decided we needed to replicate and extend the effect before submitting it for publication. Altogether, we tried several more times over the next few years to replicate the effect. Sometimes we got results that hinted at the effect in our new experiments, but more often the results glared out at us, dull and lifeless, telling us our pet idea was just wrong. We gave up.

McDermott and I might well have published our single initial experiment as a short report. After all, it was well conducted, the result was novel, we could tell a good story, and the initial statistics were convincing. I would bet strongly we could have had the paper accepted. Luckily, we did not pollute the literature with our unreplicable data – but only because we required replication ourselves (even if the editors probably would not have – brief reports do not encourage and sometimes do not permit replication).

The moral of the story is obvious: Replicate your own work prior to publication. Don’t let others find out that you are wrong or that your work is tightly constrained by boundary conditions. If there were a way to retract conference papers, we would have retracted that one. Most people don’t count conference presentations as ‘real’ for the scientific literature, and our case provides another good reason for that attitude. At least we found out that our effect was not replicable before we published it.

The recent critical examination of our field, though painful, may lead us to come out stronger on the other side. Of course, failures to replicate and the other problems (fraud, the rush to publish) are not unique to psychology. Far from it. A recent issue of Science (2 December 2011) contained a section on ‘Data replication and reproducibility’ that covered issues in many different fields. In addition, an article in the Wall Street Journal (‘Scientists’ elusive goal: Reproducing study results’, 2 December 2011) covered failures to replicate in medical research. So, failures to replicate are not only a problem in psychology. Somehow, though, when an issue of fraud or a failure-to-replicate occurs in (say) field biology, journalists do not create headlines attacking field biology or even all of biology. It seems that psychology is special that way.

This contribution is an edited version of a column for the APS Observer: see tinyurl.com/6vju949

Henry L. Roediger, III is at Washington University in St. Louis. [email protected]


The role of conceptual replication

There is no substitute for direct replication – if you cannot reproduce the same result using the same methods then you cannot have a cumulative science. But conceptual replication also has a very important role to play in psychological science. What is conceptual replication? It’s when instead of replicating the exact same experiment in exactly the same way, we test the experiment’s underlying hypothesis using different methods.

One reason conceptual replication is important is that psychologists aren’t always dealing with objectively defined quantities like 2mg of magnesium; instead we have to operationalise conceptual variables in a concrete manner. For instance, if we want to test the effects of good mood on helping behaviour, what counts as a good mood and what counts as helping behaviour? These are not objectively defined quantities, so we have to decide on something reasonable. For example, in the 1970s Alice Isen found that people were far more likely to help someone who had dropped some papers after they had found a dime in a phone booth (Isen & Levin, 1972). But we have even more confidence in this result now that it’s been conceptually replicated: helping has been found to increase not only after finding money, but after reflecting on happy memories, doing well on a test, or receiving cookies. Numerous different ways of measuring helping behaviour have been used as well. In other words, even if nobody had ever tried to directly replicate the original research, the conceptual replications give us confidence that the underlying hypothesis – that a positive mood increases helping behaviour – is correct.

In the recent debate (see tinyurl.com/cfkl2gk and tinyurl.com/7ffztux) about the failure to replicate John Bargh’s finding that priming the elderly stereotype leads people to later walk more slowly as they leave the experiment, there has been a great deal of misunderstanding because of confusion between direct replication and conceptual replication. While it is true that there have been few direct replications of that finding, the underlying hypothesis that ‘activating a stereotype in the perceiver without the perceiver’s knowledge would increase the perceiver’s own behavioural tendencies to act in line with the content of that stereotype’ has been replicated many times. It was even conceptually replicated in the very same series of published studies in which the original ‘slow walking’ study was published, as well as by independent researchers at different universities around the world. Taking these conceptual replications into account, most psychologists are not nearly as troubled by a single failure to replicate the result as it may appear that they should be.

Dave Nussbaum is Adjunct Assistant Professor of Behavioral Science at the Booth School of Business at the University of Chicago. [email protected]

References (continued)
Kriegeskorte, N., Simmons, W.K., Bellgowan, P.S.F. & Baker, C.I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535–540.
Mitchell, G. (2012). Revisiting truth or triviality: The external validity of research in the psychological laboratory. Perspectives on Psychological Science, 7, 109–117.
Moonesinghe, R., Khoury, M.J. & Janssens, C.J.W. (2007). Most published research findings are false – but a little replication goes a long way. PLoS Medicine, 4, e28.
Poldrack, R.A., Fletcher, P.C., Henson, R.N. et al. (2008). Guidelines for reporting an fMRI study. NeuroImage, 40(2), 409–414.
Ritchie, S.J., Wiseman, R. & French, C.C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s ‘retroactive facilitation of recall’ effect. PLoS ONE, 7, e33423.
Schulz, K.F., Altman, D.G. & Moher, D. (2010). CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. BMJ, 340, c332.
Shallice, T. & Cooper, R.P. (2011). The organization of mind. Oxford: Oxford University Press.
Simmons, J., Nelson, L.D. & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Song, F., Parekh-Bhurke, S., Hooper, L. et al. (2009). Extent of publication bias in different categories of research cohorts: A meta-analysis of empirical studies. BMC Medical Research Methodology, 9, 79. doi: 10.1186/1471-2288-9-79



References (continued)
Sterling, T.D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance – or vice versa. Journal of the American Statistical Association, 54(285), 30–34.
Sterling, T.D., Rosenbaum, W.L. & Weinkam, J.J. (1995). Publication decisions revisited – The effect of the outcome of statistical tests on the decision to publish and vice-versa. The American Statistician, 49, 108–112.
Turner, E.H., Matthews, A.M., Linardatos, E. et al. (2008). Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine, 358, 252–260.
Tversky, A. & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Wicherts, J.M., Bakker, M. & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828. doi: 10.1371/journal.pone.0026828

The importance of replication in the field

A common retort to complaints about the infrequency of replication in psychology is to point out that, while ‘strict replications’ may be rare, ‘conceptual replications’ are not (see Nussbaum, this issue). Conceptual replications build on prior studies by employing common methods or similar operationalising of variables in new settings to examine a theory’s limits, but many of these new settings involve artificial relations created in the laboratory or in an online experiment.

What can we learn from a laboratory replication of an earlier laboratory study? We may learn that the first study contained errors, we may learn that a result is more or less strong than previously thought, or we may learn that a result generalises to a new laboratory situation. What we cannot learn is whether the result holds up in the field setting of ultimate interest to the theorist.

In a recent study that compiled meta-analyses comparing effect sizes found in the laboratory to those found in the field for a wide variety of psychological phenomena, I found that psychological subfields differ markedly in the degree to which their laboratory results hold up in the field (Mitchell, 2012).

The subfield traditionally most concerned about the external validity of laboratory studies, industrial-organisational psychology, performed remarkably well: lab and field results were highly correlated (r = .89), and the magnitude of effects was similar in the lab and field. Social psychology, on the other hand, performed much more poorly: over 20 per cent of the results from the laboratory changed directions in the field, the correlation of lab and field results was significantly lower (r = .53), and there were larger disparities in effect sizes between the lab and field. In short, if we consider the record of a theory using only laboratory tests of that theory, theories from I-O psychology will look less impressive than they should, and theories from social psychology will look more impressive than they should.


Another noteworthy result from my study was the relative dearth of meta-analyses comparing laboratory and field effects outside the areas of I-O and social psychology. Although it is possible that field studies are common in other psychological subfields but have escaped quantitative synthesis, the more likely explanation is that field replications are relatively rare for many areas of psychology.

We should not pin our hopes for a mature science on conceptual replications in the lab. As I found for a number of theories in social psychology, successful laboratory replications may be positively misleading about the size and direction of an effect. Replication in the field, not the laboratory, is crucial to the development of reliable theory.

Gregory Mitchell, School of Law, University of Virginia. [email protected]


Longstanding misunderstandings about replication

Ritchie, Wiseman and French’s failed attempt to replicate one of my nine experiments on precognition (see p.356) has been widely interpreted by the popular media and some psychologists as convincingly falsifying my claim that I have produced evidence of psi (ESP). This coverage has revealed many longstanding misunderstandings about replication – held even by those who should know better.

The first misunderstanding is the sheer overestimation of how likely it is that any replication attempt will be successful, even if the claimed effect is genuine. Tversky and Kahneman (1971) posed the following problem to their colleagues at meetings of the Mathematical Psychology Group and the American Psychological Association: ‘Suppose you have run an experiment on 20 subjects and have obtained a significant result which confirms your theory (z = 2.23, p < .05, two-tailed). You now have cause to run an additional group of 10 subjects. What do you think the probability is that the results will be significant, by a one-tailed test, separately for this group?’ The median estimate was .85, with 9 out of 10 respondents providing an estimate greater than .60. The correct answer is less than .5 (approximately .48).
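
For readers who want to check that figure, a minimal calculation along Tversky and Kahneman’s lines – an editorial sketch that takes the observed effect from the first study at face value, which is itself a simplification – runs as follows:

```python
# Take the original result (z = 2.23 from n = 20) at face value and ask how
# likely a new group of 10 subjects is to reach one-tailed significance at .05.
from scipy.stats import norm

z_orig, n_orig, n_new = 2.23, 20, 10

effect_per_subject = z_orig / n_orig ** 0.5         # estimated standardised effect
expected_z_new = effect_per_subject * n_new ** 0.5  # = z_orig / sqrt(2), about 1.58
critical_z = norm.ppf(0.95)                         # one-tailed .05 cut-off, about 1.64

prob_significant = 1 - norm.cdf(critical_z - expected_z_new)
print(round(prob_significant, 2))  # roughly 0.47 under this simplification, i.e. less than .5
```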

Second, it takes a long time for enough replications to accumulate to draw any firm conclusions. Wiseman set up an online registry for those planning to replicate any of my experiments. As he noted: ‘We will carry out a meta-analysis of all registered studies…that have been completed by 1 December 2011.’ The deadline was only a few months after my article appeared, and by then only three experiments other than those by Ritchie et al. had been reported. Two of them had successfully reproduced my original findings at statistically significant levels, a fact known to Ritchie et al., but not mentioned in the literature review section of their report. In any case, firm conclusions are clearly somewhat premature at this point.

In mainstream psychology it takes several years and many experiments to determine which variables influence the success of replications. Consider, for example, the well-known ‘mere exposure effect’, first brought to the attention of psychologists by Robert Zajonc in 1968: across a wide range of contexts, the more frequently humans or other animals are exposed to a particular stimulus, the more they come to like it. Twenty years later, a meta-analysis of over 200 mere exposure experiments was published, confirming the reality of the effect. But that same meta-analysis reveals that the effect fails to replicate on simple stimuli if other, more complex stimuli are presented in the same session. It fails to replicate if too many exposures are used, if the exposure duration is too long, if the interval between exposure and the assessment of liking is too short, or if participants are prone to boredom. As a result, the meta-analysis included many failures to replicate the effect; several of them actually produced results in the direction opposite to prediction. A virtually identical situation has now arisen in the context of currently popular ‘priming’ experiments in cognitive and social psychology.

Finally, I believe that some major variables determining the success or failure of replications are likely to be the experimenters’ expectations about, and attitudes toward, the experimental hypothesis. Psychologists seem to have forgotten Robert Rosenthal’s extensive and convincing demonstrations of this in mainstream psychology during the 1960s. The same effect has been observed in psi experiments as well. Ironically, Wiseman, a psi-skeptic, has himself participated in a test of the experimenter effect in a series of three psi experiments in which he and psi-proponent Marilyn Schlitz used the same subject pool, identical procedures, and were randomly assigned to sessions. Schlitz obtained a significant psi effect in two of the three experiments whereas Wiseman failed to obtain an effect in any of the three. Thus, it may be pertinent that Ritchie, Wiseman and French are well-known as psi sceptics, whereas I and the investigators of the two successful replications are at least neutral with respect to the existence of psi.

The existence of such experimenter effects does not imply that psi results are unverifiable by independent investigators, but that we must begin to systematically include the experimenters’ attributes, expectations and attitudes as variables.

Daryl J. Bem is Professor Emeritus of Psychology at Cornell. [email protected]


A view from the media

When there’s a failure to replicate a study that was initially covered in the press, should that failure to replicate be reported in the media too? There are two difficulties with doing so. One is that a failure to replicate is very different from deliberate fraud. In All in the Mind we covered Stapel’s research on stereotyping in one series and then, in the next, after it had been discovered to be fraudulent, discussed the case and how the discipline of psychology handles fraud. Fraud is an issue that’s interesting even if a listener hadn’t heard of the specific research before. Failure to replicate implies no wrong-doing; so, although it interests me, you could argue that to the general audience at which a programme like mine is aimed, it’s less newsworthy.

The second difficulty is that we could report that a previous finding might be incorrect – and in the recent case of the failure to replicate John Bargh’s work we’d be reporting that it’s possible that reading words associated with age doesn’t make you walk more slowly – but this would be reporting that something hasn’t happened. This is tricky unless the finding was very well known in the first place. Outside the world of psychology most people will never have heard of Bargh’s priming experiment. I can think of very few studies in the history of psychology that are well known enough for a failure to replicate to get a lot of coverage. Perhaps if someone demonstrated that dogs can’t be conditioned to salivate at the sound of a bell then we might see ‘Pavlov got it wrong’ headlines, but it would need to be a study at that level of fame. For less famous studies the space afforded by blogs seems perfect for this kind of discussion.

Then there’s the suggestion that journalists should wait until a study has been replicated before they report it in the first place. But with the scarcity of publications of replications you could wait a very long time for this to happen, by which time the findings wouldn’t be new, which is after all what ‘news’ is supposed to be. This leaves journalists having to trust in the peer review process, but the best journalists do stress that a study is the first of its kind and should examine it critically. To be fair, I have the luxury of working on a specialist programme where there’s time to discuss the limitations of a study, and unlike journalists working on general programmes I don’t have to compete for air-time with every other subject on earth to get an editor’s attention.

It’s true that the more staggering a result is, the more likely it is to get reported. The media know that we all enjoy a discovery that’s counterintuitive. These are the stories that people retweet and tell their friends about. They’re the same studies that journals send out to the media, knowing they’ll get coverage. And of course psychologists are interested in the counterintuitive too. If lay theories about human behaviour were all correct, much psychological research would be redundant.

But could some of the most surprising and therefore newsworthy findings also turn out to be those that are hardest to replicate? Without a full database of replications it’s hard to know. But in recent years psychology has been very successful at increasing the coverage of research in the media, often pushing the stories that are most entertaining and curious. Here it’s not so much a lack of replications that’s the problem, but the tendency that I’ve observed for two views of psychologists to predominate – therapists who know what you’re thinking or people who produce research that’s entertaining yet largely trivial. For psychology to contribute to society in all the ways that it could, the next challenge is to find a way to communicate the whole range of research out there.

Claudia Hammond is presenter of All in the Mind on BBC Radio 4 and author of Time Warped: Unlocking the Mysteries of Time Perception. [email protected]


The case of neuroimaging

The field of neuroimaging deserves to be placed in a special class where questions of replication are concerned.

The most common current form, fMRI, has only been in widespread use for about 15 years, and methods are still in development. Somewhat understandably, therefore, many older papers include what we now acknowledge are clear statistical flaws, and the community is debating how we should deal with such a residue of potentially false results (e.g. see bit.ly/H80N5S and bit.ly/Ha0Bcg). Of more concern, though, is that while on average the field is improving, with nascent guidelines agreed upon (Poldrack et al., 2008), common flaws still regularly arise (Kriegeskorte et al., 2009).

I recently wrote a blog on these matters (bit.ly/HCxDLM), with the comments section hosting a constructive debate between many leading neuroscientists, demonstrating that the neuroimaging community are broadly aware of these problems, and keen to move forward.

There are several issues specific to neuroimaging. Unlike many behavioural psychology experiments, neuroimaging researchers rarely replicate their own study prior to publication. Of course, we all should be doing this, but when a standard fMRI study may total £50,000 (including salaries) and involve 6–12 months’ work, many believe that internal replication is too great a burden to bear.

A second problem concerns the extent of knowledge required, both to generate a sound fMRI design, and particularly to analyse the results. Although an entire year of study of the techniques would make considerable sense, the intense time pressures in academia make this option unpalatable. In addition, only the largest neuroimaging departments have the infrastructure to support regular courses and a decent collective understanding of current methods. Smaller centres are thus more likely to publish flawed neuroimaging papers, because of more limited access to technical knowledge.

A third issue relates again to the complexity of fMRI, though in a more disturbing way. Because of the lack of maturity of such complex techniques applied to massive datasets, as well as many varieties of analysis, it is worryingly easy for neuroimagers, who should know better, to bend the rules, and publish striking results, which aren’t statistically valid – but in a way that could be hidden from the reader, and subtle enough to get past the average reviewer. Although there are many solid neuroimaging papers, even now others get published where, to the trained eye, these tricks are relatively apparent.


psy 05_12 p346_357 replication_Layout 1 16/04/2012 12:32 Page 353

Page 12: The Psychologist May 2012

for neuroimagers, who should knowbetter, to bend the rules, and publishstriking results, which aren’t statisticallyvalid – but in a way that could be hiddenfrom the reader,and subtleenough to getpast the averagereviewer.Although thereare many solidneuroimagingpapers, even nowothers getpublished where,to the trained eye,these tricks arerelativelyapparent.

Consequently, I discount the majority of neuroimaging papers I read. There was a reasonable consensus for this perspective on my blog, with Nancy Kanwisher suggesting that 'way less than 50%' of neuroimaging papers would replicate and Matthew Brett estimating only 30 per cent.

In most cases, such publications are detrimental to the field: they imply invalid techniques are acceptable, prolong wrong theories, and may waste considerable time and money, as other scientists fail to replicate a result that was never real in the first place. If the study is clinically relevant, the damage may be far more critical, for instance leading to wrong medical advice or inappropriate treatment. So how do we improve the situation?

Better education should be a priority: many researchers need to learn more about methods and their common pitfalls, while large neuroimaging departments should do more to educate those within and outside their borders, with courses and online material.

There should be a culture of greater transparency, with neuroimaging papers refraining from any selectivity about methods or results. Ideally, papers should publicly release all raw imaging data, so that another lab can validate the results using different analyses. There are steps towards this (e.g. www.fmridc.org), but far more could be achieved.

Finally, there should be a second cultural shift, so that the community values rigour above dramatic results. The gatekeepers – the manuscript referees, editors and journals themselves – have the most vital role to play here. For certain issues there is a broad consensus about what steps are invalid, so a checklist of minimum standards could be written by a few key neuroimaging experts, adopted by journals, and upheld by their editors and those participating in peer review.

Undoubtedly the neuroimaging field is gradually gaining sophistication and rigour. But as a community, we need to be embarrassed by our slow progress and do all we can to improve matters.


Daniel Bor is at the Sackler Centre for Consciousness Science at the University of Sussex. [email protected]

The journal editor’s view

It is commonplace to observe that the norms of scientific publishing prioritise striking, novel findings. The problem with this emphasis is well known. Given the potential for bias in the interpretation of experimental results, the 'file-drawer problem', and simply the nature of probability, a large number of published scientific findings – perhaps a majority (Ioannidis, 2005) – will be false. A simple remedy for this problem is also well known. If a published scientific finding can be replicated, it is much more likely to be true (Moonesinghe et al., 2007).
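The arithmetic behind both claims can be made concrete with a short worked sketch. The numbers below are purely illustrative and are not taken from Ioannidis (2005) or Moonesinghe et al. (2007); the sketch simply applies the standard logic of prior probability, statistical power and the alpha level, first to a single significant finding and then to a successful independent replication treated as a second independent test.

# Illustrative sketch only: how likely is a significant finding to be true,
# before and after one successful independent replication? All figures are
# assumptions chosen for illustration.

def ppv(prior, power, alpha):
    """Probability that a significant result reflects a real effect."""
    true_pos = prior * power          # real effects that reach significance
    false_pos = (1 - prior) * alpha   # null effects that reach significance anyway
    return true_pos / (true_pos + false_pos)

prior, power, alpha = 0.10, 0.50, 0.05      # assume 10% of tested effects are real

single = ppv(prior, power, alpha)           # credibility of a lone finding
replicated = ppv(single, power, alpha)      # update after one successful,
                                            # independent replication at the
                                            # same power and alpha

print(f"single finding:      {single:.2f}")      # ~0.53
print(f"after 1 replication: {replicated:.2f}")  # ~0.92

On these assumed figures a lone significant finding is little better than a coin toss, whereas one successful independent replication pushes its credibility above 90 per cent, which is the intuition behind the remedy described above.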

To some extent, developments in the field of scientific publishing are addressing this problem. Internet-only journals – without limited printed pages to constrain the number of articles that can be accepted – can play a big role. For instance, PLoS ONE recently published Ritchie et al.'s (2012) failure to replicate one of the studies reported in Bem's (2011) highly surprising paper on precognition. Ritchie et al.'s paper was previously rejected from other journals on the basis of its status as a replication attempt. But the editorial criteria of PLoS ONE are clear. It will publish any paper that appropriately describes an experiment performed to a high technical standard, described in sufficient detail, and drawing appropriate conclusions that are supported by the data.

That is not to say that PLoS ONE will publish anything. All submissions to PLoS ONE are subject to peer review, as with any other journal. Authors are offered a waiver if they are unable to pay the publication fee, but editors and reviewers are blind to the fee status of each submission. However, highly subjective considerations of novelty, impact and importance are not considered in making editorial decisions, and no manuscript would be rejected on the grounds of being a 'mere' replication. Alongside journals such as PLoS ONE, internet innovations such as PsychFileDrawer.org provide another repository for the publication of replication attempts.

These developments in scientific publishing certainly help. But even with greater availability of outlets for the publication of replication attempts, scientists must still be willing and able to conduct the relevant experiments. And with the current reward structure of science, most researchers have little incentive to spend time on replication studies, most of which will generate relatively little benefit in career progression. This is perhaps a more intractable problem.

If we are to conduct replication studies, how precisely should we attempt to repeat the earlier work? While attempts to replicate an earlier study as closely as possible may be relatively rare, perhaps the majority of published studies involve an incremental advance drawing on previous work. Studies of this kind that corroborate as well as extend earlier results might be thought of as 'conceptual replications'.

It could be argued, however, that such conceptual replications have limited value. Insofar as earlier results are not supported, this may be attributed to methodological differences with the earlier study. But insofar as such conceptual replications do support earlier findings, typically they only do so in a rather narrow way, based on a limited set of tasks and methodologies that tend to be repeated from one study to the next. This is particularly problematic in a field such as cognitive neuroscience, which has diverse intellectual origins and is still undergoing rapid methodological development. Shallice and Cooper (2011) describe 'a golden rule of cognitive neuroscience – that all empirical brain-based findings must currently rely on somewhat flaky inference procedures as far as cognitive conclusions are concerned' (p.156). Given this golden rule, one can be much more confident in a conclusion that has been derived from several methodologies, each with their own distinct sets of assumptions. Consider, for example, the enormous functional imaging literature on anterior cingulate cortex and error detection/conflict monitoring, against the striking failure to detect an impairment in such processes in patients with damage to this brain region (e.g. Fellows & Farah, 2005). I would argue that cognitive neuroscience would benefit both from an increase in attempts at direct replication, attempting to replicate a study as precisely as possible, and from an increase in bold conceptual replications involving as diverse a set of tasks and methodologies as possible. Both of these might be more informative than methodologically narrow conceptual replications.

Sam Gilbert is a research fellow at UCL's Institute of Cognitive Neuroscience and an academic editor at PLoS ONE. [email protected]

Where's the data?

Recently, researchers in a number of fields have been raising the alarm, worried that little of what appears in the pages of our journals is actually replicable (i.e. true). This is not the first time such concerns have been raised, and editorials calling for improved methodological and statistical techniques in order to increase the proportion of reported results that are true go back decades. On the flip side, though they are less vocal about it, it appears that many in our field are sceptical about the pervasiveness of the problem, taking the view that certainly some results do not replicate, but probably not very many, and that in the long run science is self-correcting.

What is remarkable about this discussion is how very nearly data-free it is. A handful of intrepid researchers – e.g. John Ioannidis and colleagues, looking at the medical sciences – have managed to demonstrate worryingly low replicability rates in some targeted sub-literatures, and recent surveys (e.g. Hartshorne & Schachner, 2012) hint that this may extend to psychology, but no comprehensive data exist. If a reform of our practices were to be enacted, we would remain ignorant as to whether it improved replicability rates or even harmed them.

The current state of affairs is so familiar that it may be worth reminding ourselves just how strange it is. As a field, we demand precision in our theories and rigour in their empirical validation. The fact that the data needed to test the theory do not exist is not an accepted reason for deciding the matter in their absence. The claim that current methodologies are or are not sufficient for guaranteeing reliability is an empirical claim, and there is nobody in a better position to address the validity of these claims than us: evaluating empirical claims is our business. Nor is there anybody to whom it matters more: low replicability threatens our ability to build on previous findings and make cumulative scientific progress.

We have argued that the first, key step is to systematically track which studies have and have not been replicated, and have discussed how this might be done (see PsychFileDrawer.org for an alternative but related approach). First, this provides the raw material for any systematic study of the factors that lead to more replicable research. Just as important, it changes the incentive structure. Instead of languishing in a file drawer or in an obscure corner of some journal, replication attempts will be noted, highlighted and used, making it more worthwhile to conduct and report them. By making it possible to calculate a quantitative replicability index for use alongside impact factor, journals that successfully ensure the reliability of reported results will get credit for their efforts.

This is not a trivial task: conducting, reporting and indexing replication attempts will require changes in our research practices, not to mention time and money. We hope our proposal leads to discussion and progress towards ensuring the replicability of findings. The difficulty of the problem does not mitigate the necessity of solving it, and we believe that the scientific community can take action to resolve questions of replicability.

Joshua K. Hartshorne and Adena Schachner are at Harvard University. [email protected]
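Hartshorne and Schachner do not spell out how a replicability index would be computed, so the following is only a hypothetical sketch: one simple candidate is the share of a journal's tracked findings that have at least one successful replication among logged attempts. The journals, study labels and outcomes below are invented for illustration.

# Hypothetical sketch of a "replicability index" (the article proposes the idea
# but does not define a formula). Records might come from a registry of
# replication attempts such as the one the authors describe.

from collections import defaultdict

# (journal, finding, attempt_succeeded) - invented example records
attempts = [
    ("Journal A", "smith_2008_exp1", True),
    ("Journal A", "smith_2008_exp1", False),
    ("Journal A", "jones_2010_exp2", False),
    ("Journal B", "lee_2009_exp1",   True),
]

replicated = defaultdict(lambda: defaultdict(bool))
for journal, finding, success in attempts:
    # a finding counts as replicated if any logged attempt succeeded
    replicated[journal][finding] |= success

for journal, findings in replicated.items():
    index = sum(findings.values()) / len(findings)
    print(f"{journal}: replicability index = {index:.2f} "
          f"({sum(findings.values())} of {len(findings)} tracked findings replicated)")

Any real index would also have to weigh the power of the attempts, disputed outcomes and selective logging, which is exactly why the authors argue that systematic tracking has to come first.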

Making it quick and easy to report replications

What proportion of the statistically significant empirical findings reported in psychological journals are not real? That is, for how many is there actually no phenomenon in the world resembling the one reported?

The intention of our basic statistical practices is to place a small cap on the proportion of errors we are willing to tolerate in our literature, namely the alpha level we use in our statistics – typically 5 per cent. The alpha level is the maximum probability, for a published study, that an effect at least as large as the one measured could have occurred if there were actually no effect present. This ought to place an upper bound on the proportion of spurious results appearing in our literature. But it doesn't work.

One of the biggest reasons is publication bias: the fact that scientists are much more likely to publish significant findings than they are to report null effects. There are a host of reasons for this practice, some of them reflecting confusions about statistics, some reflecting editorial tastes that may have some merit. We also know that researchers who find non-significant results tend not to publish them, probably because such results are difficult to publish and there is little reward for doing so.

To see how this publication bias can lead to a literature infested with error, imagine that 1000 investigators each conduct a study looking for a difference, and a difference actually exists in, say, 10 per cent of these studies, or 100 cases. If the investigations have a power of .5, then 50 of these differences will be discovered. Of the 900 studies performed looking for effects that do not exist, 5 per cent, or 45, will succeed. The result, then, will be that 45 out of 95 significant results (47 per cent) will be type 1 errors. As Joe Simmons and colleagues (2011) recently pointed out in Psychological Science, hidden flexibility in data analysis and presentation is likely to inflate the rate of type 1 errors still further.

The harm done by publication bias has been recognised since at least 1959, when Theodore Sterling canvassed 294 psychology papers and found that 286 of them reported a positive result. Thirty-seven years later, Sterling re-evaluated the literature and concluded that little had changed. Indeed, the problem appears to be getting worse, and not just in psychology: the title of a recent paper in Scientometrics by Fanelli (2012) declares 'Negative results are disappearing from most disciplines and countries', based on an analysis of various scientific fields.

The problem is so bad that, after a series of calculations involving some additional considerations, John Ioannidis concluded in a 2005 paper that 'most published research findings are false'. Although the estimate of error depends on several unknowns regarding the proportion of effects being looked for which actually exist, statistical power, and so forth, Ioannidis' conclusion is, unfortunately and disturbingly, quite reasonable.

Given that the incentives for publishing null results are modest, we must make it easy and quick for scientists to report them. The traditional system – writing a cover letter, submitting, being rejected, resubmitting until the paper is finally sent out for review, waiting for reviews, revising and writing rejoinders to reviewers – is far too consuming of researchers' time.

To provide a quick way to report replication studies, we have created, together with Bobbie Spellman and Sean Kang, a new website called PsychFileDrawer.org. The site is designed specifically for replications of previously published studies, as this allows the reporting process to be quick. In the case of exact replications, for instance, researchers can simply indicate that their methodology was identical to the published study. When their method differs somewhat, they can report only the differences. Researchers are invited to report their results in as much detail as they can, but we believe even those who simply report the main effect and associated statistics are making a valuable contribution to the research community.

In addition to allowing researchers to report their method and results, a forum is provided for users to discuss each report. The website further allows users to vote on 'What are important studies that your field of Psychology gives credence to, but which – as far as you know – have not been replicated in any published follow-up work?' Each registered user is allowed up to three votes. As of this writing, the study with the most votes is entitled 'Improving fluid intelligence with training on working memory', which was published in PNAS in 2008. As the list solidifies, we hope it will encourage investigators to conduct replications of these studies and report the results on the site.

The most novel feature of PsychFileDrawer is an 'article-specific networking tool', designed for users who are timid about posting their findings or who simply wish to connect with others interested in a particular published finding about which nothing is yet posted to PsychFileDrawer. With this feature, users register their interest in learning about unpublished replication attempts relating to a particular study; whenever other users express interest in the same study, the website automatically puts them in touch with each other via e-mail so they can discuss their experiences and, we hope, post their results on the site.

The website is still in beta testing and we continue to add new features. We hope readers will visit and provide suggestions for how it might be improved.
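As a back-of-the-envelope check, the 1000-study example given earlier in this piece can be reproduced in a few lines; the figures are the illustrative ones used in the text, not estimates of the actual literature.

# Reproduces the worked example above: 1000 studies, 10% of effects real,
# power 0.5, alpha 0.05 -> almost half of the significant results are false.

n_studies  = 1000
prior_true = 0.10
power      = 0.50
alpha      = 0.05

true_effects    = n_studies * prior_true          # 100 real effects under study
true_positives  = true_effects * power            # 50 of them are detected
null_effects    = n_studies - true_effects        # 900 studies of null effects
false_positives = null_effects * alpha            # 45 spurious 'findings'

significant = true_positives + false_positives    # 95 significant results in total
error_rate  = false_positives / significant

print(f"{false_positives:.0f} of {significant:.0f} significant results "
      f"({error_rate:.0%}) are type 1 errors")    # 45 of 95 (47%)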


Alex O. Holcombe is in the School of Psychology, University of Sydney. [email protected]

Hal Pashler is in the Department of Psychology, University of California, San Diego.

Share your data

Researchers typically hold strong beliefs, can be quite ambitious, and seldom respond dispassionately to p-values from their own analyses. Moreover, statistical analysis of psychological data involves numerous decisions (e.g. on how to deal with outliers, operationalise dependent variables, select covariates, etc.) that are not always carved in stone. This provides researchers with considerable room to manoeuvre when analysing their data. These issues are ignored in most textbooks.

Marjan Bakker and I recently documented an alarmingly high prevalence of errors in the reporting of statistical results in psychology. In nearly half of the papers we scrutinised we encountered at least one inconsistent statistical result. Errors typically aligned with the researcher's hypothesis and so may introduce bias. We employed a superficial test of statistical accuracy, so it would be interesting to conduct a full reanalysis of the raw data. Unfortunately, quite a few researchers are reluctant to share their data for independent replication of their analyses, even when they are ethically obliged to do so. My colleagues and I also found that researchers who were unwilling to share their data reported more inconsistent statistical results and more results that appear contentious. It is tempting to accuse these researchers of a lack of integrity, but we should not forget that data archiving, too, is not part of most textbooks.

As researchers, we are accustomed to dealing with observer biases by blinding procedures during data collection. It is rather naive to pretend that statistical analyses are completely immune to similar biases. Whenever feasible (and ethical), we should start publishing the data to supplement research papers. This enables replication of statistical outcomes (even by sceptics) and offers a superior means of detecting misconduct. It also opens the door to future use of the data and to debates on what science is all about.
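The 'superficial test of statistical accuracy' mentioned above is not described in detail in this piece, so the sketch below only illustrates the kind of consistency check involved, on the assumption that it amounts to recomputing the p-value implied by a reported test statistic and its degrees of freedom and flagging reports that cannot match even after generous rounding. The reported values are invented.

# Sketch of a reporting-consistency check (an assumed illustration, not the
# authors' code): recompute a two-tailed p-value from a reported t statistic
# and degrees of freedom, then compare it with the p-value the paper reports.

from scipy import stats

def check_t_report(t, df, reported_p, tol=0.01):
    """Return the recomputed two-tailed p and whether it matches the report."""
    recomputed = 2 * stats.t.sf(abs(t), df)
    return recomputed, abs(recomputed - reported_p) <= tol

# a paper reporting t(28) = 2.20, p = .04 -> consistent
p, ok = check_t_report(t=2.20, df=28, reported_p=0.04)
print(f"recomputed p = {p:.3f}, consistent: {ok}")

# a paper reporting t(28) = 1.10, p = .05 -> not consistent
p, ok = check_t_report(t=1.10, df=28, reported_p=0.05)
print(f"recomputed p = {p:.3f}, consistent: {ok}")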


Jelte M. Wicherts is an associate professor in the Department of Methods and Statistics at Tilburg University. [email protected]

Evaluating psychological treatments

The past 50 years have seen a transformation in the evaluation of psychological interventions, with the randomised controlled trial establishing psychological interventions on a par with pharmacological treatments in the treatment of mental disorders (e.g. Anderson et al., 2009).

Replication of the results of early clinical trials has been and remains central to the proper evaluation of all healthcare interventions. Good scientific practice demands replication, but there are challenges in achieving this. These include the ethical behaviour of health professionals, publication bias, conflicts of interest, trial design and reporting, and the systematic evaluation of datasets of specific interventions.

Chalmers (e.g. 2002) has argued that health service professionals conducting research have an overriding ethical responsibility to any research participant, in line with those governing professionals when treating patients. This imposes responsibilities not only for the conduct of clinical trials but also for their publishing and reporting. Chalmers argues that a rigorous application and monitoring of these ethical responsibilities is one of the best ways to ensure effective research practice. Readers may want to hold this in mind when considering the issues discussed below.

Publication bias remains a major problem, and the consequences can be significant, leading to inappropriate and harmful treatment. This is perhaps most obvious in pharmacological interventions (e.g. Turner et al., 2008), but it is wrong to assume that the problem does not apply to psychological interventions. Evidence suggests that this problem lies not with the decisions of journals (they publish just as many negative as positive trials – Song et al., 2009) but with researchers who do not seek to publish their work. Lack of transparency about conflicts of interest may be crucial here. Conflicts of interest may also impact on the accurate reporting of outcomes, with evidence that over 50 per cent of trials do not report the initial primary outcome (Chan et al., 2004). Again, this can lead to the use of inappropriate and harmful treatments.

Considerable efforts have been made to address these problems, including the development of procedures for trial reporting (the CONSORT Statement: Schulz et al., 2010) and the pre-registration of trial protocols (including the predetermined primary outcome), without which publication of trials will no longer be possible. The use of systematic reviews and meta-analysis (Egger et al., 2001) to increase precision about the effects of treatment can also help address some of these problems, particularly where careful attention is devoted to the quality of included studies and the comparators used in trials.
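As a minimal illustration of how pooling trials increases precision, the sketch below applies fixed-effect, inverse-variance weighting to three invented trial results. Real syntheses of the kind Egger et al. describe involve far more care over study quality, heterogeneity and bias, as the paragraph above notes.

# Minimal fixed-effect meta-analysis sketch with invented numbers: pooling
# three trial estimates gives a narrower confidence interval than any single
# trial on its own.

import math

# (effect estimate, standard error) for three hypothetical trials
trials = [(-0.40, 0.20), (-0.25, 0.15), (-0.55, 0.30)]

weights   = [1 / se**2 for _, se in trials]                  # inverse-variance weights
pooled    = sum(w * d for (d, _), w in zip(trials, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
# -> pooled effect = -0.34, 95% CI [-0.56, -0.12]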

Clinical trials are costly and make significant demands on participants; badly conducted or unpublished studies not only waste resources, they also do harm. A number of methodological developments have emerged to address this issue, but they do not remove the responsibility of all health professionals to uphold the highest standards not only in their clinical practice but also in their research practice.


Stephen Pilling is the Director of CORE (Centre for Outcomes Research & Effectiveness) at University College London, and the Director of the National Collaborating Centre for Mental Health. [email protected]


Opinion special – Have your say
As ever, we would like to hear your views on this topic. E-mail your letters for publication to [email protected] or post to the Leicester office address.

I would also like to know what you think of the 'opinion special' more generally. This is the first we have tried in this format: a large collection of brief and personal responses to recent events, accompanying a main article. I think this allows us to respond in a timely fashion, building on our role as a forum for communication, discussion and controversy while fulfilling the main object of the Society's Royal Charter: 'to promote the advancement and diffusion of a knowledge of psychology pure and applied'. But do you agree? Let me know on [email protected].

If feedback is positive, we would look to include more of these opinion specials, and would welcome your suggestions for topics likely to engage and inform our large and diverse audience.
Dr Jon Sutton, Managing Editor
