Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
OriginalPaperIdentifyingAcuteLowBackPainEpisodesinPrimaryCarePracticefromClinicalNotesRiccardoMiotto1,2,3, PhD; Bethany L. Percha2,3, PhD; Benjamin S. Glicksberg1,2,3, PhD;Hao-Chih Lee2,3, PhD;LisanneCruz4,MD,MSc,FAAPMR;JoelT.Dudley1,2,3,PhD;andIsmailNabeel5,MD,MPH
(1)HassoPlattnerInstituteforDigitalHealthatMountSinai,IcahnSchoolofMedicineatMountSinai,NewYork,USA(2)InstituteforNextGenerationHealthcare,IcahnSchoolofMedicineatMountSinai,NewYork,USA(3)DepartmentofGeneticsandGenomicSciences,IcahnSchoolofMedicineatMountSinai,NewYork,USA(4)DepartmentofPhysicalMedicineandRehabilitation,IcahnSchoolofMedicineatMountSinai,NewYork,USA(5)DepartmentofEnvironmentalMedicineandPublicHealth,IcahnSchoolofMedicineatMountSinai,NewYork,USA
Abstract
Background:Acuteandchroniclowbackpain(LBP)aredifferentconditionswithdifferenttreatments.However,theyarecodedinelectronichealthrecordswiththesameICD-10code(M54.5)andcanbedifferentiatedonlybyretrospectivechartreviews.Thispreventsefficientdefinition of data-driven guidelines for billing and therapy recommendations, such asreturn-to-workoptions.Objective: To solve this issue,we evaluate the feasibility of automatically distinguishingacuteLBPepisodesbyanalyzingfreetextclinicalnotes.Methods:Weusedadatasetof17,409clinicalnotesfromdifferentprimarycarepractices;ofthese,891documentsweremanuallyannotatedas“acuteLBP”and2,973weregenerallyassociatedwithLBPviatherecordedICD-10code.Wecompareddifferentsupervisedandunsupervised strategies for automated identification: keyword search; topic modeling;logisticregressionwithbag-of-n-gramsandmanualfeatures;anddeeplearning(ConvNet).We trained the supervised models using either manual annotations or ICD-10 codes aspositivelabels.Results:ConvNettrainedusingmanualannotationsobtainedthebestresultswithanAUC-ROC of 0.97 and F-score of 0.69. ConvNet’s resultswere also robust to reduction of thenumber of manually annotated documents. In the absence of manual annotations, topicmodels performed better than methods trained using ICD-10 codes, which wereunsatisfactoryforidentifyingLBPacuity.Conclusions:Thisstudyusesclinicalnotestodelineateapotentialpathtowardsystematiclearningoftherapeuticstrategies,billingguidelines,andmanagementoptionsforacuteLBPatthepointofcare.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Keywords: electronic health records; clinical notes; low back pain; natural languageprocessing;machinelearning.
IntroductionLowbackpain(LBP)isoneofthemostcommoncausesofdisabilityinUSadultsundertheageof45 [1],with10-20%ofAmericanworkers reportingpersistentbackpain [2]. LBPimpactsone’sabilitytoworkandaffectsthequalityoflife.Forexample,in2015Luckhauptetal.showedthat,fromapoolof19,441people,16.9%ofworkerswithanyLBPand19.0%ofthosewithfrequentandsevereLBPmissedatleastonefulldayofworkoveraperiodofthreemonths[3].LBPeventsalsoleadtosignificantfinancialburdenforbothindividualsand clinical facilities, with combined direct and indirect costs of treatment formusculoskeletal injuries and associated pain estimated to be approximately $213 billionannually[4].LBPeventsfallintotwomajorcategories:acuteandchronic[5].AcuteLBPoccurssuddenly,usuallyassociatedwithtraumaorinjurywithsubsequentpain,whereaschronicLBPisoftenreportedbypatientsinregularcheckupsandhasledtoasignificantincreaseintheuseofhealthcareservicesoverthepasttwodecades.ItisveryimportanttodifferentiatebetweenacuteandchronicLBPintheclinicalsettingastheseconditions-aswellastheirmanagementandbilling-aresubstantivelydifferent.Chronicbackpainisgenerallytreatedwithspinalinjections[6,7],surgery[8,9],and/orpainmedications[10,11],whileanti-inflammatoriesandarapidreturntonormalactivitiesofdailylivingaregenerallythebestrecommendationsforacuteLBP[12].However, acute and chronic LBP are usually not explicitly separated in electronic healthrecords (EHRs) due to a lack of distinguishing codes. The ICD-10-CM (InternationalClassificationofDiseases,TenthRevision,ClinicalModification)standardonlyincludesthecodeM54.5tocharacterize“Lowbackpain”diagnosis,anddoesnotprovidemodifierstodistinguishdifferentLBPacuities[13].Acuityisusuallyreportedinclinicalnotes,requiringretrospective chart review of the free text to characterize LBP events, which is time-consumingandnotscalable[14].Moreover,acuitycanbeexpressedindifferentways.Forexample,thetextcouldmention“acutelowbackpain”or“acutelbp”,butcouldalsosimplyreport “shootingpaindown into the lower extremities”, “limited spine rangeofmotion”,“vertebral tenderness”, “diffuse pain in lumbarmuscles”, and so on [15]. This variabilitymakesitdifficult forclinical facilitiesandresearcherstogroupLBPepisodesbyacuitytoperformkeytasks,suchasdefiningappropriatediagnosticandbillingcodes;evaluatingtheeffectivenessofprescribedtreatments;andderivingtherapeuticguidelinesandimproveddiagnosticmethodsthatcouldreducetime,disabilityandcost.Thispaperisthefirsttoexploretheuseofautomatedapproachesbasedonmachinelearningandinformationretrievaltoanalyzefree-textclinicalnotesandidentifytheacuityofLBPepisodes.Specifically,weuseasetofmanuallyannotatednotestotrainandevaluatevarious
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
machine learningarchitecturesbasedon logistic regression,n-grams, topicmodels,wordembeddings and convolutional neural networks, and to demonstrate that some of thesemodelsareable to identifyacuteLBPepisodeswithpromisingprecision. Inaddition,wedemonstratetheineffectivenessofusingICD-10codesalonetotrainthemodels,reinforcingtheideathattheyarenotsufficienttodifferentiatetheacuityofLBP.Ouroverallobjectiveistodevelopanautomatedframeworkthatcanhelpfrontlineprimarycareprovidersinthedevelopment of targeted strategies and return-to-work (RTW) options for acute LBPepisodesinclinicalpractice.
BackgroundandSignificancePrimary care providers (PCPs) are commonly the first medical practitioners to assesspatient’smusculoskeletalinjuriesandpainassociatedwiththeseinjuriesandarethereforeinauniqueposition tooffer reassurance, treatmentoptions, andRTWrecommendationscatered to the acuity of the injury and pain associated with it. Several studies havedocumented increases in medication prescriptions and visits to physicians, physicaltherapists,andchiropractorsforLBPepisodes[16–18].SinceindividualswithchronicLBPseekcareandusehealthcareservicesmorefrequentlythanthosewithacuteLBP,increasesinhealthcareuseandcostsforbackpainaredrivenmorebychronicthanacutecases[19].Arapidreturntonormalactivitiesofdailyliving,includingwork,isgenerallythebestactivityrecommendationforacuteLBPmanagement[12].ThenumberofworkdaysthatarelostduetoacuteLBPcanbereducedbyimplementingclinicalpracticeguidelinesintheprimarycaresetting [20]. In previous work, Cruz et al. built a RTW protocol tool for PCPs based onguidelinesfromtheLBPliterature[21].Basedonthetypeofwork(e.g.,clerical,manual,orheavy) and the severity of the condition, thedoctorwould recommendRTWoptions (inpartialorfulldutycapacity)withinacertainnumberofdays.Thestudyfoundthatphysicianswerelikelytousethisprotocol,especiallywhenitwasintegratedintotheEHRs.TheprotocolwasnotalwaysusedforpatientssufferingfromacuteLBP,however,astheresearchteamwasunabletoquicklyidentifytheacuityusingonlythestructuredEHRdata(e.g., ICD-10codes). Acuity information was only available in the progress notes and was thus notincorporatedintotheautomatedrecommendations.ThispreventedtheresearchteamfromprovidinganaccuratefeedbacktoPCPsbasedonafullpictureofthepatient’scondition.AsimilartoolthatcouldincorporateacuityinformationfromnotescouldprovidemuchmorespecificrecommendationstoPCPsthatincorporatebestpracticeguidelinesforeachacuitylevel.Besidesleadingtomoreprecisecare,thiswouldstreamlinebillingforLBP[22].Similarneedsariseforothermusculoskeletalconditions,suchasknee,elbow,andshoulderpain,whereICD-10codesdonotdifferentiatebypainlevelandacuity[23,24].MachinelearningmethodsforEHRdataprocessingareenablingimprovedunderstandingofpatientclinicaltrajectories,creatingopportunitiestoderivenewclinicalinsights[25,26].Inrecentyears,theapplicationofdeeplearning,ahierarchicalcomputationaldesignbasedon
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
layersofneuralnetworks[27],tostructuredEHRshasledtopromisingresultsonclinicaltaskslikediseasephenotypingandprediction[28–33].However,awealthofrelevantclinicalinformation remains locked behind clinical narratives in the free text of notes. NaturalLanguage Processing (NLP), a branch of computer science that enables machines tounderstandandprocesshumanlanguage[34]forapplicationslikemachinetranslation[35],text generation [36], and image captioning [37], has beenused toparse clinical notes toextractrelevantinsightsthatcanguideclinicaldecisions[38].RecentapplicationsofdeeplearningtoclinicalNLPhaveclassifiedclinicalnotesaccordingtodiagnosisordiseasecodes[39–41], predicted disease onset [32,42], and extracted primary cancer sites and theirlateralityinpathologyreports[43,44].However,whiledeeplearninghassuccessfullybeenappliedtoanalyzeclinicalnotes,traditionalmethodsarestillpreferablewhentrainingdataarelimited[45,46].Regardlessof the specificmethodology, toolsbasedonNLPapplied to clinicalnarrativeshavenotbeenwidelyusedinclinicalsettings[31,38],despitethefactthatphysiciansarelikely to follow computer-assisted guidelines if recommendations are tied to their ownobservations [47]. In this paper, we present an NLP-based framework that can helpphysiciansadheretobestpracticesandRTWrecommendationsforLBP.Tothebestofourknowledge,therearenostudiestodatethathaveappliedmachinelearningtoclinicalnotestodistinguish theacuityofamusculoskeletal condition incaseswhere it isnotexplicitlycoded.
MethodsThe conceptual steps of this study are summarized in Figure 1, specifically: datasetcomposition; text processing; clinical notes modeling; and experimental evaluation. Theoverall goal was to evaluate the feasibility of automatically identifying clinical notesreporting“acuteLBP”episodes.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Figure1:Conceptualframeworkusedtoevaluatetheuseofautomatedapproachesbasedonmachine learning and information retrieval to analyze free-text clinical notes and identify“acutelowbackpain”episodes.DatasetWeuseda setof free-text clinicalnotesextracted from theMountSinaidatawarehouse,made available foruseunder IRB approval followingHIPAAguidelines. TheMount SinaiHealthSystemisanurbantertiarycarehospitallocatedontheUpperEastSideofManhattanin New York City. It generates a high volume of structured, semi-structured, andunstructureddataaspartof its routinehealthcareandclinicaloperations,which includeinpatient,outpatient,andemergencyroomvisits.TheseclinicalnoteswerecollectedduringapreviouspilotstudyevaluatingaRTWtoolbasedonEHRdatathatincludednearly40,000encountersfor15,715patientsspanningtheyears2016-2018andclinicalnoteswrittenby
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
81differentproviders(Cruzetal.[21]).Inthatstudy,weusedthepublishedliteraturetodevelop a list of guidelines to determine the assessment andmanagement of acute LBPepisodesinclinicalpractice.InparticularweusedICD-10codesaswellasotherparameters,such as “presenting complaint”, “pre-existing conditions”, “management factors”,“imaging/radiology/testordered”,andsoon,todefineandlabeltheacuityofLBPinaclinicalencounter.Followingtheseguidelines,14individuals(physicalmedicineandrehabilitationfellows,residents,andmedicalstudents)manuallyreviewedarandomsetof4,291clinicalnotesassociatedwiththeseencountersandlabeledall“acutelowbackpain”events.Eachnotewasreviewedbyatleasttwoindividualsandwasfurthercheckedbyaleadphysicianresearcherifitwasmarkedasambiguousand/ortherewasdiscordancebetweenreviewers.Thisproject leveraged the entire set of clinicalnotes thatwere collected in thepreviousstudy. Inparticular,we joinedall theprogressnotesof theseencountersunder thesameinitial visit, andwe eliminated duplicate, short (less than 3words), and non-meaningfulreports.Thefinaldatasetwascomposedof17,409distinctclinicalnotes,withlengthrangingfromsevento6,638words.Ofthisset,3,092notesweremanuallyreviewedinthepreviousstudyand891ofthemwereannotatedas“acuteLBP”.Theremaining14,317noteswerenotmanually evaluated and were related to different clinical domains, including variousmusculoskeletaldisordersandpotentiallyLBPevents.Inthisfinaldataset,1,973noteswerealsoassociatedtoanencounterbilledwithanICD10M54.5“Lowbackpain”code.TextProcessingEverynote in thedatasetwas tokenized, divided into sentences, and checked to removepunctuation,numbers,andnon-relevantconceptssuchasURLs,emails,dates,etc.Eachnotewasthenrepresentedasalistofsentences,witheverysentencebeingalistoflemmatizedwordsrepresentedasone-hotencodings.Thevocabularywascomposedofall thewordsappearingatleastfivetimesinthetrainingset.Thediscardedwordswerecorrectedtotheterms in the vocabularyhaving theminimumedit distance, i.e., theminimumnumberofoperations required to transform one string into the other [48]. This step reduced thenumber of misspelled words and prevented the accidental discarding of relevantinformation;atthesametimeitalsolimitedthesizeofthevocabularytoimprovescalability[39].Overall, thevocabularycoveringthewholedatasetwascomprisedof56,142uniquewords.ClinicalNoteModelingWe evaluated different approaches for identifying clinical notes that refer to acute LBPepisodes.Theseincludedbothsupervisedandunsupervisedmethods.Whilewebenefitedfrom theuseofhigh-qualitymanual annotations to train the supervisedmodels,wealsoinvestigated alternatives that did not require manual annotation of notes. All of these
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
methodsprovidedstraightforwardexplanationsoftheirpredictions,enablingustovalidateeachmodelandto identifypartsoftextandpatternsthatarerelevanttothe“acuteLBP”predictions.KeywordSearchWesearchedforasetofrelevantkeywordsinthetext.Inparticular,welookedfor“acutelowback pain”, “acute lbp”, “acute lowbp”, and “acute back pain” andwe counted theiroccurrencesinthetext.WeusedNegEx[49]toremovenegatedoccurrencesofthekeywords.Intheevaluation,werefertothismodelas“WordSearch”.TopicModelingWeusedtopicmodelingonthefullsetofwordscontainedinthenotestocaptureabstracttopicsreferredtointhedataset[50].Topicmodelingisanunsupervisedinferenceprocess,inthiscaseimplementedusinglatentDirichletallocation[51],thatcapturespatternsofwordco-occurrences within documents to define interpretable topics (i.e., multinomialdistributionofwords)andrepresentadocumentasamultinomialoverthesetopics.Everydocument can then be classified as talking about one or (usually) more topics. Topicmodeling is often used in healthcare to generalize clinical notes, improve the automaticprocessingofpatientdata,andexploreclinicaldatasets[52–55].Inthisstudy,weassumedthatoneormoreofthesetopicsmightrefertoacuteLBP.Inordertodiscoverthem,weidentifiedthemostlikelytopicsforasetofkeywords(i.e.,“acute”,“low”,“back”,“pain”,“lbp”,“bp”)andwemanuallyreviewedthemtoretainonlythosethatseemedmorelikelytocharacterizeacuteLBPepisodes(i.e.,thatincludedmostofthekeywordswithhighprobability).WethenconsideredthemaximumlikelihoodamongthesetopicsastheprobabilitythatareportreferredtoacuteLBP(i.e.,“TopicModel”intheexperiments).BagofN-gramsEach clinical note was represented as a bag of n-grams (with n = 1, …, 5), with TermFrequency-InverseDocument Frequency (tf-idf)weights (determined from the corpus ofdocuments).Eachn-gramisacontiguoussequenceofnwordsfromthetext.WeconsideredallthewordsinthevocabularyandfilteredthecommonstopwordsbasedontheEnglishdictionarybeforebuildingallthen-grams.TheclassificationwasimplementedusingLogisticRegressionwithLasso(i.e.,“BoN-LR”).FeatureEngineeringWeusedtheprotocolbuiltbyCruzetal. [21]todefineacuteLBPepisodes intheclinicalnotes.Inparticular,weusedalltheconceptsdescribedinthatguideline,pre-processedthemwiththesamealgorithmusedfortheclinicalnotes,andbuiltasetof5,154distinctn-grams(withn=1,…,5),thatwerefertoas“FeatEng”.Wethenrepresentedeachclinicalnoteasa
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
bag of FeatEng (i.e., we counted the occurrences of only these n-grams in the text),normalizedwithtf-idfweights,andclassifiedthemusingLogisticRegressionwithLasso(i.e.,“FeatEng-LR”).DeepLearningWeimplementedanend-to-enddeepneuralnetworkarchitecture(i.e.,“ConvNet”)thattakesasinputthefullnoteandoutputsitsprobabilityofbeingrelatedto“acuteLBP”.Thefirstlayerofthearchitecturemapsthewordstodensevectorrepresentations(i.e,“embeddings”),which attempt to contextualize the semanticmeaning of eachwordby creating ametricspace where vectors of semantically similar words are close to each other. We appliedword2vecwiththeskip-gramalgorithmtotheparsednotes[56]toinitializetheembeddingofeachwordinthevocabulary.Word2veciscommonlyusedwithEHRstolearnembeddingsofmedicalconceptsfromstructureddataaswellasfromclinicalnotes[46,57–59].TheembeddingswerethenfedtoaConvolutionalNeuralNetwork(CNN)inspiredbythemodel described by Kim [60] and by Liu et al. [42]. This architecture concatenatesrepresentationsofthetextatdifferentlevelsofabstraction,byessentiallychoosingthemostrelevantn-gramsateachlevel.Here,wefirstappliedasetofparallel1Dconvolutionsontheinputsequencewithkernelsizesrangingfrom1to5,thussimulatingn-gramswithn=1,…,5.Theoutputsofeachoftheseconvolutionswerethenmax-pooledoverthewholesequenceandconcatenatedtoa5xddimensionalvector,wheredisthenumberof1Dconvolutionalfilters.Thisrepresentationwasthenfedtosequencesoffullyconnectedlayers,whichlearnthe interactionsbetweenthetext features,andfinally toasigmoid layerthatoutputs thepredictionprobability.Then-grams that aremost relevant to theprediction, in this architecture, are those thatactivatetheneuronsinthemax-poolinglayer.Therefore,weusedthelog-oddsthatthen-gramcontributestothesigmoiddecisionfunction[42]asanindicationofhowmucheachn-graminfluencesthedecision.EvaluationDesignWeevaluatedallthearchitecturesusinga10-foldcross-validationexperiment,witheverynoteappearinginthetestsetonlyonce.Ineachtrainingsetweusedarandom90/10splitto train and validate all themodel configurations.As baselinewe also report the resultsobtainedbyconsideringas“acuteLBP”all thenotesassociatedwiththe“Lowbackpain”M54.5ICD-10code(i.e.,“ICD-10”intheresults).
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
TrainingAnnotationsWeconsideredtwodifferentsetsofannotationsasgoldstandardstotrainthesupervisedmodels.Inthefirstexperiment,weusedthemanuallycuratedannotationsprovidedwiththedatasetfrompreviouswork[21],whereasinthesecondexperimentwetrainedthemodelsusing the ICD-10 codes associated with each note encounter. Both experiments wereevaluated using manual annotations. The rationale was to compare the feasibility ofidentifyingacuteLBPeventswhenmanualannotationsareandarenotavailable.Wetrainedtheclassifiertooutput“acuteLBP”vs.“other”becausethegoaloftheprojectwastoidentifyclinicalnoteswithacuteLBPevents,ratherthandiscriminatedifferentfacetsofLBPevents(e.g.,“chronicLBP”vs.“acuteLBP”).MetricsForallexperiments,wereportareaunderthereceiveroperatingcharacteristiccurve(AUC-ROC),micro-precision,recall,F-score,andareaundertheprecision-recallcurve(AUC-PRC)[61].TheROCcurveisaplotoftruepositiverateversusfalsepositiveratefoundoverthesetof predictions. F-score is the harmonic mean of classification precision and recall perannotation,whereprecisionisthenumberofcorrectpositiveresultsdividedbythenumberof all positive results, and recall is thenumber of correct positive results dividedby thenumberofpositiveresultsthatshouldhavebeenreturned.ThePRCisaplotofprecisionandrecallfordifferentthresholds.TheareasundertheROCandPRCcurvesarecomputedbyintegratingthecorrespondingcurves.ModelHyperparametersThemodelhyperparameterswereempiricallytunedusingthevalidationsetstooptimizetheresultswithbothtrainingannotations.Inthetopicmodelingmethod,weinferredtopicsusingthewholetrainingsetofdocumentsand200topics(derivedusingperplexityanalysis).Whileseeminglymore intuitive,usingonly thenotesassociatedwith theM54.5 “Lowbackpain” ICD10codeactuallyproducedworse results. For each fold, the most relevant topics associated with acute LBP weremanuallyreviewedandusedtoannotatethenotes.Inthedeeplearningarchitecture,weusedembeddingswithsize300,andfull-lengthnotes.We trained word2vec just on the clinical note dataset to initialize embeddings. Pre-initializingtheembeddingswithageneral-purposecorpusdidnotleadtoanyimprovement.EachCNNhad200filtersandusedaReLuactivationfunction.Weaddedtwofullyconnectedlayers of size 600 following the CNNs with ReLu activations and batch normalization.Dropoutvaluesacrossthelayerswereallsetto0.5.Thearchitecturewastrainedusingcross-entropy losswith theAdamoptimizer for five epochs andbatch size32 (learning rate=0.001).Theclassificationthresholdsforprecision,recall,andF-scorewerefoundbyranging
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
thevaluefrom0.1to1,with0.1increments,andretaining,foreachmodel,thevalueleadingtothebestresultsonthevalidationset.
ResultsTable1andFigure2showtheaverageresultsofthe10-foldcross-validationexperimentforallthemodelsconsidered.ThebestresultswereobtainedbyConvNetwhentrainedwiththemanualannotations.Whilethisisnotentirelysurprisinggiventhesuccessofdeeplearningfor NLPwhen high-quality annotations and a large amount of data (i.e., on the order ofmillionsoftrainingexamples)areavailable,thiswasnotcertaininthisdomainwherethetrainingdatasetwasmuchsmaller.Asexpected,theresultsobtainedbythebaselineandbytrainingthemodelsusingtheICD-10codeswerenotasgood,confirmingthattheM54.5ICD-10codeisnotasufficientindicatorofacuteLBP.TopicModelleadstosimilarperformancebut provides a more intuitive and potentially effective way for exploring the collection,extractingmeaningfulpatternsthatarerelatedtoacuteLBPepisodes(seeFigure3).Whilethisapproachmightnotberobustenoughforclinicalapplication,arefinedandmanuallycuratedversionofTopicModelpromisestoallowanefficientpre-filteringofclinicalreportsthat can speed up themanual work required to annotate them.On the contrary but asexpected,WordSearchperformedpoorlyastheconditionismentionedintoomanydifferentwaysacrossthetextandsimplekeywordswerenotsufficient.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Table1:Classificationresultsinidentifyingclinicalnoteswith“acuteLBP”episodesintermsofPrecision, Recall, F-score, Area Under the ROC (AUC-ROC) and Precision-Recall (AUC-PRC)Curves. Results are averaged over the 10-fold cross validation experiment. We compareddifferent supervised and unsupervised strategies: keyword search (“WordSearch”); topicmodeling (“TopicModel”); logistic regression with bag-of-n-grams (“BoN-LR”) and manualfeatures(“FeatEng-LR”);anddeeplearning(“ConvNet”).Thesupervisedmodels(i.e.,BoN-LR,FeatEng-LRandConvNet)weretrainedusingmanualannotationsorM54.5ICD-10codes.The“ICD-10”baselinesimplyconsideredas“acuteLBP”allthenotesassociatedwiththegenericM54.5“Lowbackpain”ICD-10code.
Precision Recall F-score AUC-ROC AUC-PRC
Baseline ICD-10 0.32 0.68 0.41 0.81 0.42
UnsupervisedMethods
WordSearch 0.71 0.03 0.06 0.52 0.40
TopicModel 0.44 0.58 0.50 0.92 0.46
TrainedwiththeM54.5ICD-10Code
BoN-LR 0.50 0.70 0.59 0.83 0.42
FeatEng-LR 0.47 0.59 0.52 0.88 0.41
ConvNet 0.55 0.68 0.61 0.89 0.46
TrainedwithManual
Annotations
BoN-LR 0.53 0.64 0.58 0.93 0.56
FeatEng-LR 0.58 0.66 0.62 0.93 0.58
ConvNet 0.65 0.73 0.70 0.98 0.72
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Figure2:ROCandPrecision-RecallcurvesobtainedwhenusingastrainingdataforBoN-LR,FeatEng-LRandConvNetthemanualannotations(a)andtheM54.5ICD-10codes(b).ConvNettrained using themanual annotations obtained the best results. In the absence ofmanualannotationstousefortraining,TopicModelworkedbetterthanmethodstrainedusingICD-10codes,whichprovednottobeagoodindicatortoidentifyacuityinLBPepisodes.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Figure3:Representative“acuteLBP”-relatedtopicderivedbyaveragingthewordlikelihoodinall the relevant topics (i.e., that weremanually verified) inferred across the 10-fold cross-validation experiment.We report the top30words,with the biggestwords being themostrelevant.Asitcanbeseen,mostofthewordsareindeedrelatedtoacuteLBP,includingseveralmedicationsthatareusuallyprescribedtotreatinflammationandpain(e.g.,Cyclobenzaprine,Flexeril,Advil).AmanuallyrefinedversionofTopicModelcanhelppre-filteringthenotesinanintuitivesemi-automaticway,promisingtospeedupthemanualannotationprocess.Figure4reportstheclassificationresultsintermsofAUC-ROCandAUC-PRCwhenrandomlysubsamplingthe“acuteLBP”manualannotationsinthetrainingset.WefoundthatConvNetalwaysoutperformstheothermethodsbasedonLRaswellasTopicModel.Inaddition,wenoticethatusingjust30%ofthemanualannotations(i.e.,70clinicalnotes)alreadyleadstobetter results than using ICD-10 codes as training data. This is a particularly interestinginsightas it shows thatonlyminimal manualwork is required inorder toachievegoodclassifications;thesecanthenbefurtherimprovedbyaddingautomaticallyannotatednotestothemodel(aftermanualverification)andretraining.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Figure4:AreaundertheROC(AUC-ROC)andPrecision-Recall(AUC-PRC)curvesobtainedwhentraining the supervised models using random sub-samples of the manual annotations.TopicModel is reported as reference baseline. ConvNet obtained satisfactory results whentrainedusinglessmanuallyannotateddocuments,showingrobustnessandscalabilitytothegoldstandard.
Figure5highlightsthedistributionsoftheclassificationscores(predictedprobabilityofthelabel“acuteLBP”)derivedbyseveralsupervisedmodels(trainedwithmanualannotations)andTopicModel.ConvNetshowsclearseparationbetweenacuteLBPnotesandtherestofthedataset.Inparticular,allacuteLBPnoteshadscoresgreaterthan0.2,with82%ofthem(i.e.,727notes)havingscoresgreaterthan0.5.Onthecontrary,only347controlshadscoresgreater than 0.5, meaning that only a few notes were highly likely to be misclassified.Similarly,TopicModelhadnocontrolswithscoresgreaterthan0.7andall“acuteLBP”noteshadscoresgreaterthan0.2.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Figure 5: Representation of the probability distribution of the scores obtained by BoN-LR,FeatEng-LR,ConvNet(trainedwithmanualannotations)andTopicModel.ConvNetledtogoodseparationbetween“acuteLBP”clinicalnotesandalltheotherdocuments.Intheothercases,suchseparation isnotasclear,explainingtheworseclassificationresultsobtainedbythosemodels. Finally, Table 2 summarizes some of the n-grams driving the “acute LBP” predictionsobtainedbyConvNet(trainedwithmanualannotations)acrosstheexperiments.Whilesomeof these are obvious and refer to the disease itself (e.g., “acute lbp”), others refer tomedications(e.g.,“prescribedmusclerelaxant”,“flexeril”),andrecommendations(e.g.,“rtwfulldutyquick”).Giventheirclinicalmeaningandrelevance,allthesepatternscanbefurtheranalyzedandreviewedtopotentiallydrivethedevelopmentofguidelinesfor,e.g.,treatmentandRTWoptions.
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Table2:Examplesofn-gramsthatwererelevantinidentifying“acuteLBP”noteswhenusingConvNet trained with manual annotations. The n-grams relevance was determined byanalyzing theneurons of theCNNsactivating themax-pooling layers and their log-odds tocontributetothefinaloutput.Log-oddswerefilteredpernotesandthenaveragedoverallthenotesandevaluationfolds.
Type AcuteLBP-relatedPredictiveN-grams
Diagnosis
musclespasmlowerbackacutelowbackpainflarebeenhavingacutebackpainacutemidlinelowbackpainsportsacutebilaterallowbackpainacutelowbackpainacutelbp
RelatedConditions
gaitabnormalityshowedsignificantdiskherniationintermittentsciaticaspinalstenosis
Medications
backpainflareprescribedflexerilcyclobenzaprineflexerilnaproxenforacutelowbackprescribedmusclerelaxant
Recommendations
backbraceforbackpainobtainlumbarspineMRIrecommendationrtwvisitrtwfulldutyquick
DiscussionInthisworkweevaluatedtheuseofseveralmachinelearningapproachestoidentifyacuteLBP episodes in free text clinical notes in order to better personalize the treatment andmanagementofthisconditioninprimarycare.Theexperimentalresultsshowedthatit ispossible toextractacuteLBPepisodeswithpromisingprecision,especiallywhenat leastsomemanuallycuratedannotationsareavailable.Inthisscenario,ConvNet,adeeplearningarchitecturebasedonCNNs,significantlyoutperformedothershallowtechniquesbasedon
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
bag-of-n-gramsandlogisticregression,openingthepossibilitytoboostperformancesusingmorecomplexarchitecturesfromcurrentresearchintheNLPcommunity.Theimplementeddeeparchitecturealsoprovidesaneasymechanismtoexplain thepredictions, leading toinformeddecisionsupportbasedonmodel transparency[62,63]andthe identificationofmeaningfulpatternsthatcandriveclinicaldecisionmaking.Ifnoannotationsareavailable,experimentsshowedthattheuseoftopicmodelingispreferredtotrainingaclassifierusingonly the M54.5 ICD-10 codes (i.e., “Low back pain”) associated with the clinical noteencounter,whichprovedtobeapoorindicatortodiscriminateLBPepisodes.Inaddition,thetopicsidentifiedcanserveasanintuitivetooltoinformguidelinesandrecommendationsaswellastopre-filterthedocumentsandreducethemanualworkrequiredtoannotatethenotes. Theproposed framework is inherently domain agnostic anddoes not require anymanual supervision to identify relevant features from the free-text. Therefore, it can beleveragedinothermusculoskeletalconditiondomainswhereacuityisnotexpressedintheICD-10/diagnosticcodes,suchasknee,elbow,andshoulderpain.PotentialApplicationsMedical care decisions are often based on heuristics and manually derived rule-basedmodelsconstructedonpriorknowledgeandexpertise[64].Cognitivebiasesandpersonalitytraits,suchasaversiontoriskorambiguity,overconfidence,andtheanchoringeffect,maylead to diagnostic inaccuracies and medical errors resulting in mismanagement orinadequate utilization of resources [65]. In the LBP domain, thismay lead to: delays infindingtherighttherapyandassistinginthereturnofpatientstonormalactivities;increasein the risk of transitioning the condition from acute to chronic; creating discomfort forpatients;andincreasingtheeconomicburdensonclinicalfacilitiestoadequatelytreatandmanage this patient population. Deriving data-driven guidelines for treatmentrecommendationscanhelpinreducingthesecognitivebiasesandpersonalitytraits,leadingto more consistent and accurate decisions. In this scenario, the proposed frameworksintegrate seamlesslywith theRTWtoolproposedbyCruzetal. [21]by includingacuity-relevantinformationintheclinicalnotesandaddressingoneofthelimitationsofthatstudy(i.e.,recommendingtheRTWtoolatthepointofcarebyaccuratelyidentifyingconditionasacuteLBP).Similarly,anunderstandingofthepatternsdrivingthepredictionscanleadtothedevelopmentofnewand improved treatment strategies forvarious typesof injuries,whichcanbepresentedtothecliniciansatthetimeofpatientencountertohelpthemwithbettermanagementof thecondition.Whilephysicianswillcontinue tohaveautonomy indeterminingoptimalcarepathways fortheirpatients,therecommendationsprovidedbythesupportingframeworkwillbeusefultosystematizeandsupporttheiractivitieswithintherealmofthebusyclinicalpractice.PosterioranalysisoftheclinicalnotestoinferacuteLBP episodes can also help in assigning the proper diagnostic and billing codes for theencounter.Inaforeseeablefuturescenariowhereclinicalobservations areautomaticallytranscribedviavoiceandEHRsareprocessedinreal-time,anautomatedtoolthatidentifies
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
acuityinformationcouldalsoimprovetheaccuracyofdiagnosisandbillinginreal-time,withnoneedtowaitforposteriorevaluations.LimitationsThisworkevaluatedthefeasibilityofusingmachinelearningtoidentifyacuteLBPepisodesinclinicalnotes.Therefore,wecompareddifferenttypesofmodels(shallowvs.deep)andlearning frameworks (unsupervised vs. supervised) to identify the best directions forimplementationanddeploymentinrealclinicalsettings.Whileseveralofthearchitecturesevaluatedinthisworkobtainedpromisingresults,moresophisticatedmodelsarelikelytoimprove these performances, especially in the deep learning domain. For example,algorithms based on attention models [66], BERT [67], or XLNet [68] have shownencouraging results on similarNLP tasks and are likely to obtain better results in thisdomain as well. In this work we only focused on processing clinical notes; however,embeddingstructuredEHRdata,especiallymedications,imagingstudiesand/orlabtests,intothemethodshouldimprovetheresults.ThedatasetofclinicalnotesusedinthisstudyoriginatedfromageographicallydiversesetofprimarycareclinicsservingtheNewYorkpopulationacrosstheNYmetroareaoveralimitedperiod(i.e.,2016-2018).Providerswereenrolledandrandomizedintothestudyonarollingbasis,withthenumberofencountersforLBPvaryingforeachindividualprovider,basedonhis/herownpractice.Themajorityoftheprimarycareproviderswereassistantprofessorsservingonthefrontlines.Nospecialistswereincludedintheinitialstudyasthepilotprojectwasonlygearedtowardstheprimarycareproviders.Consequently,theresultsofthisstudymightnotbeapplicabletospecialtycarepractice.FutureWorkTheclassificationofLBPepisodesasacuteorchronicatthepointofcarelevelwithinprimarycarepracticeisimperativeforaRTWtooltobeeffectivelyusedtorenderevidence-basedguidelines.Atthistime,weplantoclassifya largesetofnotes,derivepatternsrelatedtoacuteLBPandextendthetoolproposedbyCruzetal.[21]accordingtothem.WealsoplantoidentifycaseswheretheRTWtoolcanbeeasilydeployedbasedonEHRintegrationintheclinicaldomain.Second,wewillbegintoaddresssomeofthemethodologicallimitationsofthisstudytooptimizeperformanceandevaluateitsgeneralizabilityoutsideprimarycare.Finally,weaimtoevaluatethefeasibilityofthistypeofapproachforothermusculoskeletalconditions,inparticular,shoulderandkneepain.
ConclusionsThisarticledemonstratesthefeasibilityofusingmachinelearningtoautomaticallyidentifyacuteLBPepisodesfromclinicalreportsusingonlyunstructuredfreetextdata.Inparticular,
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
manuallyannotatingasetofnotes touseasagoldstandardcan leadtoeffectiveresults,especiallywhenusingdeeplearning.Topicmodelingcanhelpinspeedinguptheannotationprocess,initiatinganiterativeprocesswhereinitialpredictionsarevalidatedandthenusedto refineandoptimize themodel.This approachprovidesa generalizable framework forlearning to differentiate disease acuity in primary care, which canmore accurately andspecificallyguidethediagnosisandtreatmentofLBP.ItalsoprovidesaclearpathtowardimprovingtheaccuracyofcodingandbillingofclinicalencountersforLBP.
AcknowledgementsI.N.andL.C.wouldliketothankthePilotProjectsResearchTrainingProgramoftheNYandNJ Education and Research Center (ERC), National Institute for Occupational Safety andHealth,fortheirfunding(grant#T42OH008422).R.M.wouldliketothankthesupportfromtheHassoPlattnerFoundationandacourtesyGPUdonationfromNVIDIA.CompetingInterestsNone.ContributionsR.M.and I.N. initiated the ideaandwrote thearticle; I.N.collected thedataandprovidedclinicalsupport;R.M.conductedtheresearchandtheexperimentalevaluation;B.L.P.advisedonevaluationstrategiesandrefinedthearticle;B.S.G.,H.L.,andL.C.refinedthearticle;J.T.D.supportedtheresearch.Alltheauthorseditedandreviewedthemanuscript. Abbreviations AUC-PRC:AreaUnderthePrecision-RecallCurveAUC-ROC:AreaUndertheReceiverOperatingCharacteristicCurveBoN:BagOfN-gramsCNN:ConvolutionalNeuralNetworkEHR:ElectronicHealthRecordHIPAA:HealthInsurancePortabilityandAccountabilityActICD-CM:InternationalStatisticalofDiseases,ClinicalModificationIRB:InstitutionalReviewBoardLBP:LowBackPainLR:LogisticRegressionNLP:NaturalLanguageProcessingNY:NewYorkPCP:PrimaryCareProviderRTW:ReturnToWorkTF-IDF:TermFrequency-InverseDocumentFrequency
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
References 1 Centers forDisease Control and Prevention (CDC). Prevalence andmost common causes of disability
amongadults--UnitedStates,2005.MMWRMorbMortalWklyRep2009;58:421–6.2 RicciJA,StewartWF,CheeE,etal.BackpainexacerbationsandlostproductivetimecostsinUnitedStates
workers.Spine2006;31:3052–60.3 LuckhauptSE,DahlhamerJM,GonzalesGT,etal.Prevalence,RecognitionofWork-Relatedness,andEffect
on Work of Low Back Pain Among US Workers. Ann Intern Med Published Online First:2019.https://annals.org/aim/article-abstract/2733500/prevalence-recognition-work-relatedness-effect-work-low-back-pain-among?searchresult=1
4 HealthCareUtilizationandEconomicCost.BMUS:TheBurdenofMusculoskeletalDiseasesintheUnitedStates. https://www.boneandjointburden.org/2014-report/if0/health-care-utilization-and-economic-cost(accessed22Apr2019).
5 Fairbank J, Gwilym SE, France JC, et al. The role of classification of chronic low back pain. Spine2011;36:S19–42.
6 WeinerDK,KimY-S,BoninoP,etal.Lowbackpaininolderadults:areweutilizinghealthcareresourceswisely?PainMed2006;7:143–50.
7 FriedlyJ,ChanL,DeyoR.IncreasesinlumbosacralinjectionsintheMedicarepopulation:1994to2001.Spine2007;32:1754–60.
8 DeyoRA,MirzaSK.TrendsandVariationsintheUseofSpineSurgery.ClinOrthopRelatRes2006;443:139.9 DeyoRA,NachemsonA,MirzaSK.Spinal-FusionSurgery—TheCaseforRestraint.SpineJ2004;4:S138–
42.10 BallantyneJC.OpioidsfortheTreatmentofChronicPain:MistakesMade,LessonsLearned,andFuture
Directions.AnesthAnalg2017;125:1769–78.11 LuoX,PietrobonR,HeyL.Patternsand trends inopioiduseamong individualswithbackpain in the
UnitedStates.Spine2004;29:884–90;discussion891.12 MalmivaaraA,HäkkinenU,AroT,etal.TheTreatmentofAcuteLowBackPain—BedRest,Exercises,or
Ordinary Activity? New England Journal of Medicine. 1995;332:351–5.doi:10.1056/nejm199502093320602
13 2019 ICD-10-CM Diagnosis Code M54.5: Low back pain.https://www.icd10data.com/ICD10CM/Codes/M00-M99/M50-M54/M54-/M54.5 (accessed 24 Apr2019).
14 PetersenT,LaslettM,JuhlC.Clinicalclassificationinlowbackpain:best-evidencediagnosticrulesbasedonsystematicreviews.BMCMusculoskeletDisord2017;18:188.
15 CasazzaBA.Diagnosisandtreatmentofacutelowbackpain.AmFamPhysician2012;85:343.16 FeuersteinM,MarcusSC,HuangGD.Nationaltrendsinnonoperativecarefornonspecificbackpain.Spine
J2004;4:56–63.17 KesslerRC,DavisRB,FosterDF,etal. Long-term trends in theuseof complementaryandalternative
medicaltherapiesintheUnitedStates.AnnInternMed2001;135:262–8.18 MartinBI,DeyoRA,MirzaSK,etal.Expendituresandhealthstatusamongadultswithbackandneck
problems.JAMA2008;299:656–64.19 FreburgerJK,HolmesGM,AgansRP,etal.Therisingprevalenceofchroniclowbackpain.ArchInternMed
2009;169:251–8.20 RossignolM,AbenhaimL,SéguinP,etal.Coordinationofprimaryhealthcareforbackpain.Arandomized
controlledtrial.Spine2000;25:251–8;discussion258–9.21 CruzLC,AlamgirHA,ShethP,etal.Developmentofareturntoworktoolforprimarycareprovidersfor
patientswithlowbackpain:Apilotstudy.JFamilyMedPrimCare2018;7:1185–92.22 OwensJD,HegmannKT,ThieseMS,etal.ImpactsofAdherencetoEvidence-BasedMedicineGuidelines
fortheManagementofAcuteLowBackPainonCostsofWorker’sCompensationClaims.JOccupEnviron
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
Med2019;61:445–52.23 Reporting Pain in ICD-10-CM. Coding Strategies. https://www.codingstrategies.com/news/reporting-
pain-icd-10-cm(accessed21Jun2019).24 GrossDP,Armijo-OlivoS,ShawWS,etal.ClinicalDecisionSupportToolsforSelectingInterventionsfor
PatientswithDisablingMusculoskeletalDisorders:AScopingReview.JOccupRehabil2016;26:286–318.25 JensenPB,JensenLJ,BrunakS.Miningelectronichealthrecords:towardsbetterresearchapplicationsand
clinicalcare.NatRevGenet2012;13:395–405.26 GlicksbergBS,JohnsonKW,DudleyJT.Thenextgenerationofprecisionmedicine:observationalstudies,
electronichealthrecords,biobanksandcontinuousmonitoring.HumMolGenet2018;27:R56–62.27 LeCunY,BengioY,HintonG.Deeplearning.Nature2015;521:436–44.28 MiottoR, Li L,KiddBA,etal.DeepPatient:AnUnsupervisedRepresentation toPredict theFutureof
PatientsfromtheElectronicHealthRecords.SciRep2016;6:26094.29 MiottoR,WangF,WangS,etal.Deeplearningforhealthcare:review,opportunitiesandchallenges.Brief
BioinformPublishedOnlineFirst:6May2017.doi:10.1093/bib/bbx04430 ChoiE,BahadoriMT,SchuetzA,etal.DoctorAI:PredictingClinicalEventsviaRecurrentNeuralNetworks.
arXiv[cs.LG].2015.http://arxiv.org/abs/1511.05942v1131 XiaoC,ChoiE,SunJ.Opportunitiesandchallengesindevelopingdeeplearningmodelsusingelectronic
health recordsdata: a systematic review. JAmMed InformAssoc PublishedOnlineFirst:8 June2018.doi:10.1093/jamia/ocy068
32 RajkomarA,OrenE,ChenK,etal.Scalableandaccuratedeeplearningwithelectronichealthrecords.npjDigitalMedicine2018;1:18.
33 MiottoR,LiL,DudleyJT.DeepLearningtoPredictPatientFutureDiseasesfromtheElectronicHealthRecords.In:FerroN,CrestaniF,MoensM-F,etal.,eds.AdvancesinInformationRetrieval.Cham::SpringerInternationalPublishing2016.768–74.
34 GoldbergY.APrimeronNeuralNetworkModelsforNaturalLanguageProcessing. JournalofArtificialIntelligenceResearch.2016;57:345–420.doi:10.1613/jair.4992
35 WuY,SchusterM,ChenZ,etal.Google’sNeuralMachineTranslationSystem:BridgingtheGapbetweenHumanandMachineTranslation.arXiv[cs.CL].2016.http://arxiv.org/abs/1609.08144
36 KannanA,KurachK,RaviS,etal.SmartReply:AutomatedResponseSuggestionforEmail.In:Proceedingsofthe22NdACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining.NewYork,NY,USA::ACM2016.955–64.
37 VinyalsO,ToshevA,BengioS,etal.Showandtell:Aneuralimagecaptiongenerator.In:ProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition.2015.3156–64.
38 SheikhalishahiS,MiottoR,DudleyJT,etal.NaturalLanguageProcessingofclinicalnotes:AsystematicreviewforChronicDiseases.JMIRMedicalInformatics.2018.doi:10.2196/12239
39 BaumelT,Nassour-KassisJ,CohenR,etal.Multi-LabelClassificationofPatientNotesaCaseStudyonICDCodeAssignment.arXiv[cs.CL].2017.http://arxiv.org/abs/1709.09587
40 MullenbachJ,WiegreffeS,DukeJ,etal.ExplainablePredictionofMedicalCodesfromClinicalText.arXiv[cs.CL].2018.http://arxiv.org/abs/1802.05695
41 Shi H, Xie P, Hu Z, et al. Towards Automated ICD Coding Using Deep Learning. arXiv [cs.CL].2017.http://arxiv.org/abs/1711.04075
42 Liu J, Zhang Z, RazavianN.DeepEHR: ChronicDisease PredictionUsingMedicalNotes. arXiv [cs.LG].2018.http://arxiv.org/abs/1808.04928
43 Yoon H-J, Ramanathan A, Tourassi G. Multi-task Deep Neural Networks for Automated Extraction ofPrimarySiteandLateralityInformationfromCancerPathologyReports.In:AdvancesinBigData.SpringerInternationalPublishing2017.195–204.
44 QiuJX,YoonH-J,FearnPA,etal.DeepLearningforAutomatedExtractionofPrimarySitesFromCancerPathologyReports.IEEEJBiomedHealthInform2018;22:244–51.
45 Turner CA, Jacobs AD, Marques CK, et al. Word2Vec inversion and traditional text classifiers for
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint
phenotypinglupus.BMCMedInformDecisMak2017;17:126.46 GehrmannS,DernoncourtF, LiY,et al. ComparingRule-BasedandDeepLearningModels forPatient
Phenotyping.arXiv[cs.CL].2017.http://arxiv.org/abs/1703.0870547 DavisDA,Taylor-VaiseyA.Translatingguidelinesintopractice.Asystematicreviewoftheoreticconcepts,
practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ1997;157:408–16.
48 NavarroG.AGuidedTourtoApproximateStringMatching.ACMComputSurv2001;33:31–88.49 ChapmanWW,BridewellW,HanburyP,etal.Asimplealgorithmfor identifyingnegated findingsand
diseasesindischargesummaries.JBiomedInform2001;34:301–10.50 Blei DM. Probabilistic topic models. Communications of the ACM. 2012;55:77.
doi:10.1145/2133806.213382651 BleiDM,NgAY,JordanMI.LatentDirichletAllocation.JMachLearnRes2003;3:993–1022.52 Miotto R,Weng C. Case-based reasoning using electronic health records efficiently identifies eligible
patientsforclinicaltrials.JAmMedInformAssoc2015;22:e141–50.53 PerotteAJ,WoodF,ElhadadN,etal.HierarchicallySupervisedLatentDirichletAllocation. In: Shawe-
TaylorJ,ZemelRS,BartlettPL,etal.,eds.AdvancesinNeuralInformationProcessingSystems24.CurranAssociates,Inc.2011.2609–17.
54 ChangKR,LouX,KaraletsosT,etal.AnEmpiricalAnalysisofTopicModelingforMiningCancerClinicalNotes.bioRxiv.2016;:062307.doi:10.1101/062307
55 CohenR,AviramI,ElhadadM,etal.Redundancy-awaretopicmodelingforpatientrecordnotes.PLoSOne2014;9:e87555.
56 Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and theirCompositionality.In:BurgesCJC,BottouL,WellingM,etal.,eds.AdvancesinNeuralInformationProcessingSystems26.CurranAssociates,Inc.2013.3111–9.
57 GlicksbergBS,MiottoR,JohnsonKW,etal.AutomateddiseasecohortselectionusingwordembeddingsfromElectronicHealthRecords.PacSympBiocomput2018;23:145–56.
58 Choi Y, ChiuCY-I, SontagD. LearningLow-DimensionalRepresentations ofMedical Concepts.AMIA JtSummitsTranslSciProc2016;2016:41–50.
59 WangY, Liu S, AfzalN, et al. A comparison ofword embeddings for the biomedical natural languageprocessing.JBiomedInform2018;87:12–20.
60 Kim Y. Convolutional Neural Networks for Sentence Classification. arXiv [cs.CL].2014.http://arxiv.org/abs/1408.5882
61 Ricardo BY, Berthier RN. Modern Information Retrieval: the concepts and technology behind searchsecondedition.AddisionWesley2011.
62 HolzingerA,BiemannC,PattichisCS,etal.WhatdoweneedtobuildexplainableAIsystemsforthemedicaldomain?arXiv[cs.AI].2017.http://arxiv.org/abs/1712.09923
63 LiptonZC.TheMythosofModelInterpretability.arXiv[cs.LG].2016.http://arxiv.org/abs/1606.0349064 MarewskiJN,GigerenzerG.Heuristicdecisionmakinginmedicine.DialoguesClinNeurosci2012;14:77–
89.65 SaposnikG,RedelmeierD,RuffCC,etal.Cognitivebiasesassociatedwithmedicaldecisions:asystematic
review.BMCMedInformDecisMak2016;16:138.66 VaswaniA,ShazeerN,ParmarN,etal.AttentionisAllyouNeed.In:GuyonI,LuxburgUV,BengioS,etal.,
eds.AdvancesinNeuralInformationProcessingSystems30.CurranAssociates,Inc.2017.5998–6008.67 DevlinJ,ChangM-W,LeeK,etal.BERT:Pre-trainingofDeepBidirectionalTransformersforLanguage
Understanding.arXiv[cs.CL].2018.http://arxiv.org/abs/1810.0480568 YangZ,DaiZ,YangY,etal.XLNet:GeneralizedAutoregressivePretrainingforLanguageUnderstanding.
arXiv[cs.CL].2019.http://arxiv.org/abs/1906.08237
All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint