6
REPRODUCIBILITY A datathonmodel to support cross-disciplinary collaboration Jerôme Aboab, 1 * Leo Anthony Celi, 1 Peter Charlton, 1 Mengling Feng, 1 Mohammad Ghassemi, 1 Dominic C. Marshall, 1Louis Mayaud, 1 Tristan Naumann, 1 Ned McCague, 1 Kenneth E. Paik, 1 Tom J. Pollard, 1 Matthieu Resche-Rigon, 1 Justin D. Salciccioli, 1 David J. Stone 2,3 In recent years, there has been a growing focus on the unreliability of published bio- medical and clinical research. To introduce effective new scientific contributors to the culture of health care, we propose a datathonor hackathonmodel in which partici- pants with disparate, but potentially synergistic and complementary, knowledge and skills effectively combine to address questions faced by clinicians. The continuous peer review intrinsically provided by follow-up datathons, which take up prior uncompleted projects, might produce more reliable research, either by providing a different perspec- tive on the study design and methodology or by replication of prior analyses. The research problems of health care extend beyond clinical medicine and cannot be solved by physicians working in isolation. Clinicians are well aware of the uncertainties and information gaps that permeate the practice of medicine, which range from the trivial but annoying to the significant and seemingly in- tractable. However, busy clinicians typically do not have the time, energy, or training to ad- dress questions they encounter in day-to-day practice. Instead, scientists, physician-scientists, and engineers carry out biomedical research designed to decipher clinical problems, but these efforts often are isolated from critical clinical input. In addition, recent accusations against this research community include the production of an overwhelming number of published studies that are irreproducible, underpowered, poorly designed, and other- wise lack statistical rigor (15). The typical clinician solves problems by ap- plying basic medical-science concepts and di- agnostic and treatment protocols mastered during medical training. In academic facilities, there are physicians who see patients but spend most of their time working on basic, translational, or clinical research. In fact, some physician-scientistseven those on medical school facultiesrarely have exposure to clin- ical medicine. The nonresearcher clinicians seem to be getting busier and busier with pa- tient care, but their input is critically important to identify relevant medical questions and prob- lems and to tackle them with a deep under- standing of the clinical context. This must be accomplished in an era in which progressively limited resources are struggling to cope with increasingly expensive health care. Over the past decade or so, data scientists and engineers have become increasingly drawn to and involved with health care (5). This interest has recently been accelerated by the now near-universal digitization of health care, which provides data scientists with a toe- hold and a progressively important role in dai- ly clinical care as well as biomedical research. How can this group of diverse talents, interests, and schedules be combined for productive col- laboration? We propose an adaptation of the computer industrys hackathon modelan event in which diverse professionals (entrepre- neurs, software developers, designers, engineers) work together on software development over a short period of time. In the biomedical arena, a hackathon similarly would bring together par- ticipants with disparate but complementary knowledge and skillsfor example, academic biomedical researchers, industry scientists, bioengineers, physicians, statisticians, and computational biologiststo address the myr- iad questions and unmet medical needs faced by clinicians. It is difficult to establish a platform for the real-time, respectful, and effective exchange of ideas among specialists who are usually sep- arated by time, space, methods, attitudes, and terminology (language). The hackathon pro- vides just such a platform for initial conception and design of a study as well as subsequent anal- ysis and publication (in the literal sense of making the results public). The hackathon also provides a real-time infrastructure for the kind of simultaneous translation of specialty terminologies and jargon that is necessary for effective communication and cooperation. Last, follow-up hackathonsevents that pick up and advance prior uncompleted projectsoffer a form of continuous peer review that might pro- duce more reliable research either through rep- lication of published results or by virtue of their differing perspectives on study conception and design. HACKATHON FOR DATA ANALYSIS Originally the brainchild of Silicon Valley, CA, hackathons have proven to be successful mod- els for innovation in business settings and are typically organized as intense, short-duration, competitions in which teams generate innova- tive solutions (5). The hackathon model inte- grates collaboration, idea generation, and group learning by joining various stakeholders in a mutually supportive setting for a limited peri- od of time. For health data analysis, the goal of the hackathon is to assemble clinical experts, data scientists, statisticians, and those with domain- specific knowledge to create ideas and produce clinically relevant research that reduces or eliminates biases, relies on sound statistical rig- or and adequate data samples, and aims to produce replicable results. For that purpose, we have coined the term datathonas a port- manteau of data + hackathon, accentuating the application of the hackathon model to data analytics. For example, a critical-care datathon is an event in which participants are brought together to form interdisciplinary teams and answer research questions in the field of criti- cal (that is, intensive) care. In addition, using the term datathon avoids the potentially neg- ative connotation of the term hackathon in the minds of some participants and observers. Although this approach might be more dif- ficult for some study designs, the analysis of health data is especially suited to benefit from this model, because the data have already been collected and are readily available to research- ers in a machine-readable format. Whereas the process flow in translational medicine is con- ventionally from the research bench to the bedside, the flow in the datathon model is from the point of care to research, or more spe- cifically, to the database and the analytical team. Approaching health care in this manner 1 Massachusetts Institute of Technology (MIT) Critical Data, MIT, Cambridge, MA 02139, USA. 2 Departments of Anesthesiology and Neurosurgery, University of Vir- ginia School of Medicine, Charlottesville, VA 22908, USA. 3 Center for Wireless Health, University of Virginia School of Engineering and Applied Science, Charlottesville, VA 22904, USA. *All authors contributed equally to this manuscript. Corresponding author. E-mail: dominic.marshall12@ imperial.ac.uk PERSPECTIVE www.ScienceTranslationalMedicine.org 6 April 2016 Vol 8 Issue 333 333ps8 1 by guest on November 23, 2017 http://stm.sciencemag.org/ Downloaded from

A datathon model to support cross-disciplinary collaboration · 2018. 7. 11. · REPRODUCIBILITY A “datathon” model to support cross-disciplinary collaboration Jerôme Aboab,1*

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

  • P ER SP EC T I V E

    REPRODUC IB I L I TY

    Dow

    nloaded

    A “datathon” model to supportcross-disciplinary collaborationJerôme Aboab,1* Leo Anthony Celi,1 Peter Charlton,1 Mengling Feng,1

    Mohammad Ghassemi,1 Dominic C. Marshall,1† Louis Mayaud,1

    Tristan Naumann,1 Ned McCague,1 Kenneth E. Paik,1 Tom J. Pollard,1

    Matthieu Resche-Rigon,1 Justin D. Salciccioli,1 David J. Stone2,3

    In recent years, there has been a growing focus on the unreliability of published bio-medical and clinical research. To introduce effective new scientific contributors to theculture of health care, we propose a “datathon” or “hackathon” model in which partici-pants with disparate, but potentially synergistic and complementary, knowledge andskills effectively combine to address questions faced by clinicians. The continuous peerreview intrinsically provided by follow-up datathons, which take up prior uncompletedprojects, might produce more reliable research, either by providing a different perspec-tive on the study design and methodology or by replication of prior analyses.

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    from

    The research problems of health care extendbeyond clinical medicine and cannot be solvedby physicians working in isolation. Cliniciansare well aware of the uncertainties andinformation gaps that permeate the practiceof medicine, which range from the trivial butannoying to the significant and seemingly in-tractable.However, busy clinicians typically donot have the time, energy, or training to ad-dress questions they encounter in day-to-daypractice. Instead, scientists, physician-scientists,and engineers carry out biomedical researchdesigned to decipher clinical problems, butthese efforts often are isolated from criticalclinical input. In addition, recent accusationsagainst this research community include theproduction of an overwhelming number ofpublished studies that are irreproducible,underpowered, poorly designed, and other-wise lack statistical rigor (1–5).

    The typical clinician solves problemsby ap-plying basic medical-science concepts and di-agnostic and treatment protocols masteredduring medical training. In academic facilities,there are physicians who see patients butspend most of their time working on basic,translational, or clinical research. In fact, somephysician-scientists—even those on medicalschool faculties—rarely have exposure to clin-

    1Massachusetts Institute of Technology (MIT) CriticalData, MIT, Cambridge, MA 02139, USA. 2Departmentsof Anesthesiology and Neurosurgery, University of Vir-ginia School of Medicine, Charlottesville, VA 22908, USA.3Center for Wireless Health, University of Virginia Schoolof Engineering and Applied Science, Charlottesville, VA22904, USA.*All authors contributed equally to this manuscript.†Corresponding author. E-mail: [email protected]

    ical medicine. The nonresearcher cliniciansseem to be getting busier and busier with pa-tient care, but their input is critically importantto identify relevantmedical questions andprob-lems and to tackle them with a deep under-standing of the clinical context. This must beaccomplished in an era in which progressivelylimited resources are struggling to cope withincreasingly expensive health care.

    Over the past decade or so, data scientistsand engineers have become increasinglydrawn to and involved with health care (5).This interest has recently been accelerated bythe now near-universal digitization of healthcare, which provides data scientists with a toe-hold and a progressively important role in dai-ly clinical care as well as biomedical research.Howcan this groupof diverse talents, interests,and schedules be combined for productive col-laboration? We propose an adaptation of thecomputer industry’s hackathon model—anevent in which diverse professionals (entrepre-neurs, software developers, designers, engineers)work together on software development over ashort period of time. In the biomedical arena, ahackathon similarly would bring together par-ticipants with disparate but complementaryknowledge and skills—for example, academicbiomedical researchers, industry scientists,bioengineers, physicians, statisticians, andcomputational biologists—to address themyr-iad questions and unmet medical needs facedby clinicians.

    It is difficult to establish a platform for thereal-time, respectful, and effective exchange ofideas among specialists who are usually sep-arated by time, space, methods, attitudes, andterminology (language). The hackathon pro-

    www.ScienceTranslationalMedicin

    vides just such a platform for initial conceptionanddesignof a study aswell as subsequent anal-ysis and publication (in the literal sense ofmaking the results “public”). The hackathonalso provides a real-time infrastructure forthe kind of simultaneous translationof specialtyterminologies and jargon that is necessary foreffective communication and cooperation. Last,follow-up hackathons—events that pick up andadvance prior uncompleted projects—offer aform of continuous peer review thatmight pro-duce more reliable research either through rep-lication of published results or by virtue of theirdiffering perspectives on study conception anddesign.

    HACKATHON FOR DATA ANALYSISOriginally the brainchild of SiliconValley, CA,hackathons have proven to be successful mod-els for innovation in business settings and aretypically organized as intense, short-duration,competitions in which teams generate innova-tive solutions (5). The hackathon model inte-grates collaboration, idea generation, and grouplearning by joining various stakeholders in amutually supportive setting for a limited peri-od of time.

    For health data analysis, the goal of thehackathon is to assemble clinical experts, datascientists, statisticians, and thosewith domain-specific knowledge to create ideas and produceclinically relevant research that reduces oreliminates biases, relies on sound statistical rig-or and adequate data samples, and aims toproduce replicable results. For that purpose,we have coined the term “datathon” as a port-manteau of data + hackathon, accentuatingthe application of the hackathonmodel to dataanalytics. For example, a critical-care datathonis an event in which participants are broughttogether to form interdisciplinary teams andanswer research questions in the field of criti-cal (that is, intensive) care. In addition, usingthe term datathon avoids the potentially neg-ative connotation of the term hackathon in theminds of some participants and observers.

    Although this approachmight bemore dif-ficult for some study designs, the analysis ofhealth data is especially suited to benefit fromthis model, because the data have already beencollected and are readily available to research-ers in amachine-readable format.Whereas theprocess flow in translational medicine is con-ventionally from the research bench to thebedside, the flow in the datathon model isfrom the point of care to research, ormore spe-cifically, to the database and the analyticalteam. Approaching health care in this manner

    e.org 6 April 2016 Vol 8 Issue 333 333ps8 1

    http://stm.sciencemag.org/

  • P ER SP EC T I V E

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    Dow

    nloaded from

    is a form of reverse engineering, because itexamines the data record of the system inputsand elements that produce the observed out-comes. As databases become larger and moredetailed, they will come to represent more andmore accurate depictions of a kind of virtualclinical reality providing even richer resourcesfor research.

    Health care data admittedly are subject tointrinsic and extrinsic problems of accuracy.Therefore, analytical studies based on thesecondary use of health care data inherentlysuffer this particular fragility. The data canbe wrong for a large variety of reasons, includ-ing human and machine misentry, missingdata, and faulty assessments, among others.However, all scientific research suffers fromsome degree of data unreliability—those whohave worked in wet labs understand that notall reported results reflect repeatable experi-mental perfection. One further advantage ofa team approach is that faulty data potentiallycan be recognized from a variety of viewpointaxes by the various expert types required to doso. For example, there may be technical prob-lems whose recognition requires a data scientistor software engineer, clinical issues requiringa clinician, and other issues best recognized bydomain experts.

    AN EXEMPLAR DATABASEThe MIMIC (Medical Information Mart forIntensive Care) database contains patient-leveldata for intensive care unit (ICU) admissionsat Beth Israel Deaconess Medical Center from2001 to 2012 (6). Developed and maintainedby the Laboratory of Computational Physiolo-gy at MIT, the publically accessible databasecontains anonymized data from more than60,000 ICU admissions, with data storedacross multiple tables and thousands of fields.

    In order to give all teams in the datathonsimultaneous and continuous access to theMIMIC database, the data were stored in theSAP HANA in-memory relational databasemanagement system and hosted on cloudservers. Database instances were deployeddynamically according to the data querydemands of participants to allow for moreefficient queries.

    Note that the current MIMIC databaseserves as a model for the clinical databases ofthe future, which will grow more completeand, therefore, more useful as electronic healthrecords become nearly universally imple-mented and an open, shared data philosophybecomes more widespread. Already, discus-sions are ongoing to build an international

    consortium that leverages data from ICUs intheUnited States, theUnitedKingdom, France,Belgium, Brazil, Japan, Australia, and NewZealand. Other specialties are following suit:in April 2015, the American Heart Associationconvened a 1.5-day forum to discuss criticalissues in the acquisition, analysis, and sharingof data in the field of cardiovascular and strokescience (7).

    CRITICAL CARE DATATHONOur group hosted International Critical Caredatathons in September 2014 and September2015. The 2014 datathon spanned three daysand three countries, with teams located inCambridge,MA; London; andParis.More than200 participants across all locations gathered toconduct secondary data analysis using theMIMIC database. The organizing committeemade a particular effort to prepare participantsin advance of the event. Video lectures and on-line tutorials were available on the eventwebsiteto educate participants about the MIMICdatabase, including guidelines to define clini-cal questions and tips to extract variables.

    The introductory Friday evening sessionbegan with an overview of expectations fortheweekend; a list of dos anddon’tswere high-lighted to make the weekend as productive aspossible. Participants were encouraged to col-laborate, fail fast, and iterate. This session wasfollowed by problem-based pitches, in whichparticipants shared clinical problems andknowledge gaps that could be addressed usingthe MIMIC database. After problem pitching,participants divided into specifically diverseteams during a stand-up dinner. Teams wererequired to have at a minimum one clinician,one data engineer, and one data scientist. Wealso encouraged participants to promote theevent on Twitter (with labels #criticaldataand #criticaldata2014). The session endedwithan overview of the MIMIC database and ahands-on tutorial, which helped minimizethe technical hurdles related to software instal-lation, cloud server connection, and program-ming syntax.

    All of Saturday and part of Sunday werespent “hacking.” During the event, partici-pants were encouraged to share code (for ex-ample, SQL, Python, R, SAS, and STATA)used for data extraction and statistical analysisin a GitHub repository. Use of the GitHub re-pository served multiple purposes: (i) facili-tated code review among members of thesame team; (ii) facilitated interteam code re-view to cross-check methods for similarlythemed projects (8, 9); and (iii) allowed code

    www.ScienceTranslationalMedicin

    to be reused for future related studies, whilepermitting customizations to be made basedonmodifications in a study design. On Sundayafternoon, preliminary findings were presentedby each team and focused on the difficultiesencountered and insights gained during thecourse of the weekend. However, the threesimultaneous events unfolded with subtle dif-ferences and have provided insight for im-provement in future datathons.

    The 2014 Cambridge event saw more than100 registrantswith awide rangeof self-identifiedprofessions, including biomedical/computerengineers (n = 37), physicians (n = 33), datascientists (n = 33), entrepreneurs (n = 22), bio-statisticians (n = 11), patients (n = 4), nurses(n = 3), and pharmacists (n = 2). The presenceof four patients was singular to the Cambridgehackathon and demonstrated a valuable op-portunity for researchers to include a popula-tion usually relegated to study participation assubjects. In addition, the Cambridge event in-cluded some experienced participants whohad previously attended other hackathons, in-cluding our inaugural, single-site MIT CriticalCare datathon in January 2014, fromwhichwegained valuable experience in the planningand execution of an event of this kind.

    In London, more than 40 participantsformed seven interdisciplinary teams. Researchtopics included examination of the relationshipbetween lactic acidosis andmortality, the asso-ciation between supranormal oxygen levels inthe blood and survival after subarachnoidhemorrhage, and visualization of outcomesamong obese patients in the ICU. The eventalso had a number of invited speakers who dis-cussed opportunities arising from the in-creasing availability of data in health care.

    The Paris event was structured to ensurethat the relationships formed at the datathonwould transition into long-term collabora-tions. To accomplish this, five teams werelimited to three participants each, and each teamincluded one intensivist, one biostatistician, andone data engineer. Instead of problem-pitching,the clinical questions were selected frompreviously submitted ideas. Last, in the spiritof continuous collaboration and peer re-view, the Paris-based organizing committeescheduled regular follow-up meetings andprogress presentations with the participat-ing teams.

    TheMIT Critical Care datathon has servedas amodel for and ushered in similar events inother specialties. The following year, in June2015, the Cross Neurodegenerative Diseasesdatathon was held in Boston, MA (10), and

    e.org 6 April 2016 Vol 8 Issue 333 333ps8 2

    http://stm.sciencemag.org/

  • P ER SP EC T I V E

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    Dow

    nloaded from

    focused on developing approaches to under-stand similarities and differences across a vari-ety of neurodegenerative diseases. About 30scientists from leading institutions aroundthe world participated in five teams, whichhad access to public datasets of interest to re-search teams studying Parkinson’s and otherneuromuscular diseases.

    In September 2015, the follow-up secondinternational MIT Critical Care datathon washeld, bringing back participants from the pre-vious year’s event as well as new ones. Asoftware platform (Fig. 1) was provided thatintegrated with Jupyter Notebook and includ-ed a custom Python package to facilitate dataprocessing and analysis (11). The platformconnected directly to the database, alloweddocumentation and information sharingamong the team members as regards the re-search question and study design, and, mostimportantly, facilitated archiving and sharingof queries and codes among the teams for co-hort selection, variable extraction, and data vi-sualization and analysis. For example, teamsassisted each other in identifying vasopressorand mechanical ventilation use, their dosesor settings, and durations of use. Last, we en-couraged the teams to publish their project no-tebooks at the MIMIC website once theypublish their manuscripts, in order to sharetheir patient cohorts, queries, and codes usedfor analysis.

    Whereas the objective of the first Interna-tional Critical Care datathon was to draw datascientists and frontline clinicians to a researchcommunity aroundMIMIC, the second inter-national event focused on attracting partici-pants who are committed to publishing theirfindings. The first datathon produced onepublication (12), and another is currently un-der review. The second datathon is nowpoisedto deliver its goal: Six out of the 10 teamssubmitted abstracts of their projects for pre-sentation at the 2016 American Thoracic Soci-ety meeting in San Francisco, CA. All sixprojects were accepted.

    LESSONS LEARNEDBiomedical research is often undertaken by in-vestigators operating independently andworking in isolation (13). For experimentssuch as double-blind randomized controlledtrials, independence and isolation are criticalmeasures to reduce bias. However, theoveremphasis on isolation and independencein all types of research has come at the costof validity and reproducibility. This problemis only exacerbated by the academic reward

    A

    B

    C

    Fig. 1. Sample snapshots. Shown are screen shots of the Jupyter Notebooks used in the SecondInternational Critical Care datathon to facilitate sharing of queries and codes among the teams. (A) Acustom Python package was developed to assist with transforming and analyzing data. (B) Notebookssimplified data exploration. (C) Data visualization.

    www.ScienceTranslationalMedicine.org 6 April 2016 Vol 8 Issue 333 333ps8 3

    http://stm.sciencemag.org/

  • P ER SP EC T I V E

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    Dow

    nloaded from

    system, which is largely tailored toward thepromotion of individual researchers ormultiple researchers working in the same areaof basic science, rather than work by inter-disciplinary teams. Further, the current aca-demic system effectively hinders potentiallybeneficial idea generation and collaborationacross disparate entities, because data sharingrestraints keep valuable data within hostinstitutions. In contrast, our datathons aimto accelerate the discovery of evidence-basedknowledge, increase collaboration via an open,cross teams−based approach during the designphase of research, and create a system ofiterative peer review that might improve thereliability of research.

    The importance of addressing communi-cation failure and a lack of shared responsibil-ity among collaborators was evident in anevent involving a Duke laboratory that joltedthe researchworld (14). Flawed gene-expressiontests developed in the laboratory were used inthree clinical trials to determinewhich chemo-therapy treatment patients with lung or breastcancer would receive. Throughout this pro-cess, the responsibilities of the co-investigatorson the research team and lines of accountabil-ity were apparently unclear. The U.S. Instituteof Medicine, in response, published a manu-script in 2012 entitled Evolution of TranslationalOmics: Lessons Learned and the Path Forward(15). The report emphasized that given thecomplexity and multidisciplinary nature ofomics research, there should be heightenedattention in promoting a culture of teamworkacross disciplines, in addition to scientific in-tegrity and transparency.

    Last, a potential advantage of upfront andongoing collaborative interdisciplinary reviewis better recognition of which ideas are clearlyworth investigation, which are interesting butnot particularly promising, and which are ex-tremely unlikely to be successfully pursued. Attimes, the latter are just the kinds of ideas thatare new and innovative and should thereforebe pursued. But at other times, wild ideasmight simply lead to dead ends and wasted ef-forts. Team collaboration and discussion, inaddition to the group’s development of ideasin the first place, should help in determiningwhether such ideas are worthy of the effortinvolved to investigate them. With the po-tential for more efficiency in research, theremight also be the opportunity for pilot efforts,in which unlikely but interesting ideas are in-vestigated at a smaller scale by a select smallerteam broken off from the main group for thispurpose. Overall, the goal is to pursue impor-

    tant and interesting ideas that seem likely topay off but also not to neglect unusual ideasthat might ultimately be correct.

    DOING DATATHONS RIGHTWith the increasing complexity of research ina world where the low-hanging fruit has beenpicked and the issues are often at the scale of“big data,” research has already to some extentbecome a team-based process. We proposethat multi-expertise viewpoints inserted open-ly into the processwould aid in the conception,development, processing, and publication ofresearch that is more reliable while hopefullyremaining as interesting, innovative, and im-portant as that produced by the current sys-tem. The datathon approach offers potentialsolutions to research problems such as poorupfront study design with inadequate statisticsand an insufficient number of data samples. Itmight alsoprovide thebenefits ofmore andcon-tinuous transparency, as the input of many andmore objective “eyes” is applied to the entireenterprise; this could lead to more objectiveand valid analyses and contents being providedto those in the next stage of peer review.

    Although our datathons have not yet in-volved basic biomedical scientists, participa-tion in these kinds of events might providethis group with ideas and opportunities. Theearly datathons have already given data-scienceexperts a critical toehold in the space andallowed them to leverage their skills whileexploring their evolving interests in the area.The datathons have also given a variety of cli-nicians normally excluded from the researchprocess the opportunity to lend their practicalexperiential insights and have opened up theresearch arena to entirely new and valuablegroups such as entrepreneurs and patients. Fu-ture datathons are likely to build and expand onthese early innovations in collaborative research.

    If half of reported research is indeed un-reliable (3), those who read the literature arenot only wasting half their time, they are actu-ally poisoning half their time with falsehoods.There is an unacceptable time delay in thefeedback loop between reported researchresults and the subsequent discovery that theresults are not valid. In fact, the results mightnever even be carefully scrutinized and remainas unknown falsehoods in academic-literaturelimbo. It is important to deliver a researchoutput that clinicians and patients can relyon to the extent that everything we read (in-cluding this paper) should be fundamentallyreliable, while still approached with criticalskepticism.

    www.ScienceTranslationalMedicin

    We are not so naïve as to believe that thiskind of paradigm change will be easy or un-challenged. But the current level of researchunreliability is unacceptable and is only likelyto increase with the deluge of digital healthdata unless the fundamental fragilities of theresearch system are addressed.We also under-stand that, while the proposed datathonmodelof collaborative work might yield better sci-ence andbetter use of interdisciplinary person-nel and resources, it is unlikely to have realimpact until the academic reward systemchanges. The current research enterprise isan open control loop in which the output isnot continuously evaluated for its impact onthe plant. Academic pressures, sometimescompoundedby influential funding bynondis-interested commercial parties, tend to movethe research process relentlessly forward witha speed that at times compromises due dili-gence. A datathon model with its interdis-ciplinary team approach has the potential toraise and answer important new questionswhile potentially reducing the wasteful un-reliability of the current system.

    REFERENCES AND NOTES1. F. Godlee, Research misconduct is widespread and harms

    patients. BMJ 344, e14 (2012).

    2. A.-W. Chan, F. Song, A. Vickers, T. Jefferson, K. Dickersin,P. C. Gøtzsche, H. M. Krumholz, D. Ghersi, H. B. van derWorp,Increasing value and reducing waste: Addressing in-accessible research. Lancet 383, 257–266 (2014).

    3. J. P. Ioannidis, How to make more published researchtrue. PLOS Med. 11, e1001747 (2014).

    4. M. R. Macleod, S. Michie, I. Roberts, U. Dirnagl, I. Chalmers,J. P. Ioannidis, R. Al-Shahi Salman, A.-W. Chan, P. Glasziou,Biomedical research: Increasing value, reducing waste.Lancet 383, 101–104 (2014).

    5. E. T. Moseley, D. J. Hsu, D. J. Stone, L. A. Celi, Beyond openbig data: Addressing unreliable research. J. Med. InternetRes. 16, e259 (2014).

    6. L. A. Celi, R. G. Mark, D. J. Stone, R. A. Montgomery, “Bigdata” in the intensive care unit. Closing the data loop.Am. J. Respir. Crit. Care Med. 187, 1157–1160 (2013).

    7. E. M. Antman, E. J. Benjamin, R. A. Harrington, S. R. Houser,E. D. Peterson, M. A. Bauman, N. Brown, V. Bufalino, R. M. Califf,M. A. Creager, A. Daugherty, D. L. Demets, B. P. Dennis,S. Ebadollahi, M. Jessup, M. S. Lauer, B. Lo, C. A. MacRae,M. V. McConnell, A. T. McCray, M. M. Mello, E. Mueller,J. W. Newburger, S. Okun, M. Packer, A. Philippakis, P. Ping,P. Prasoon, V. L. Roger, S. Singer, R. Temple, M. B. Turner,K. Vigilante, J. Warner, P. Wayte; American Heart Associ-ation Data Sharing Summit Attendees, Acquisition, anal-ysis, and sharing of data in 2015 and beyond: A survey ofthe landscape: A conference report from the AmericanHeart Association data summit 2015. J. Am. Heart Assoc.4, e002810 (2015).

    8. D. C. Ince, L. Hatton, J. Graham-Cumming, The case foropen computer programs. Nature 482, 485–488 (2012).

    9. G. Wilson, D. A. Aruliah, C. T. Brown, N. P. Chue Hong,M. Davis, R. T. Guy, S. H. Haddock, K. D. Huff, I. M. Mitchell,M. D. Plumbley, B. Waugh, E. P. White, P. Wilson, Best

    e.org 6 April 2016 Vol 8 Issue 333 333ps8 4

    http://stm.sciencemag.org/

  • P ER SP EC T I V E

    practices for scientific computing. PLOS Biol. 12, e1001745(2014).

    10. C. I. Barash, K. Elliston, R. Potenzone, tranSMART Founda-tion Datathon 1.0: The cross neurodegenerative diseaseschallenge. Applied Transl. Genomics 6, 42–44 (2015).

    11. F. Pérez, B. E. Granger, IPython: A system for interactivescientific computing. Comput. Sci. Eng. 9, 21–29 (2007).

    12. J. D. Salciccioli, D. C. Marshall, M. A. Pimentel, M. D. Santos,T. Pollard, L. A. Celi, J. Shalhoub, The association betweenthe neutrophil-to-lymphocyte ratio and mortality in crit-ical illness: An observational cohort study. Crit. Care 19, 13(2015).

    13. J. P. Ioannidis, S. Greenland, M. A. Hlatky, M. J. Khoury,M. R. Macleod, D. Moher, K. F. Schulz, R. Tibshirani,Increasing value and reducing waste in research design,conduct, and analysis. Lancet 383, 166–175 (2014).

    14. The Economist (2011) An array of errors. 10th September 2011.15. C. M. Micheel, S. J. Nass, G. S. Omenn, Evolution of

    Translational Omics: Lessons Learned and the Path For-ward (National Academies Press, Washington, DC, 2012).

    Acknowledgments: The MIMIC database is funded by the U.S. National Institutes of Health grant R01 EB001659. The MITCritical Care datathons were funded by the MIT International

    www.ScienceTranslationalMedicin

    Science and Technology Initiatives. The London datathonwas supported by The IET: The Institution of Engineeringand Technology. Competing interests: The authors declarethat they have no competing interests.

    10.1126/scitranslmed.aad9072

    Citation: J. Aboab, L. A. Celi , P. Charlton, M. Feng,M. Ghassemi, D. C. Marshall, L. Mayaud, T. Naumann,N. McCague, K. E. Paik, T. J. Pollard, M. Resche-Rigon,J. D. Salciccioli, D. J. Stone, A “datathon” model to supportcross-disciplinary collaboration. Sci. Transl. Med. 8, 333ps8 (2016).

    e.org 6 April 2016 Vol 8 Issue 333 333ps8 5

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    Dow

    nloaded from

    http://stm.sciencemag.org/

  • A ''datathon'' model to support cross-disciplinary collaboration

    Salciccioli and David J. StoneMayaud, Tristan Naumann, Ned McCague, Kenneth E. Paik, Tom J. Pollard, Matthieu Resche-Rigon, Justin D. Jerôme Aboab, Leo Anthony Celi, Peter Charlton, Mengling Feng, Mohammad Ghassemi, Dominic C. Marshall, Louis

    DOI: 10.1126/scitranslmed.aad9072, 333ps8333ps8.8Sci Transl Med

    ARTICLE TOOLS http://stm.sciencemag.org/content/8/333/333ps8

    CONTENTRELATED

    http://stm.sciencemag.org/content/scitransmed/8/338/338ed6.fullhttp://stm.sciencemag.org/content/scitransmed/8/336/336ed5.fullhttp://stm.sciencemag.org/content/scitransmed/8/335/335ps10.full

    REFERENCES

    http://stm.sciencemag.org/content/8/333/333ps8#BIBLThis article cites 13 articles, 2 of which you can access for free

    PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

    Terms of ServiceUse of this article is subject to the

    is a registered trademark of AAAS.Science Translational Medicinetitle licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. TheScience, 1200 New York Avenue NW, Washington, DC 20005. 2017 © The Authors, some rights reserved; exclusive

    (ISSN 1946-6242) is published by the American Association for the Advancement ofScience Translational Medicine

    by guest on Novem

    ber 23, 2017http://stm

    .sciencemag.org/

    Dow

    nloaded from

    http://stm.sciencemag.org/content/8/333/333ps8http://stm.sciencemag.org/content/scitransmed/8/335/335ps10.fullhttp://stm.sciencemag.org/content/scitransmed/8/336/336ed5.fullhttp://stm.sciencemag.org/content/scitransmed/8/338/338ed6.fullhttp://stm.sciencemag.org/content/8/333/333ps8#BIBLhttp://www.sciencemag.org/help/reprints-and-permissionshttp://www.sciencemag.org/about/terms-servicehttp://stm.sciencemag.org/