Identifying Acute Low Back Pain Episodes in Primary Care ... · logistic regression with bag-of-n-grams and manual features; and deep learning (ConvNet). We trained the supervised

OriginalPaperIdentifyingAcuteLowBackPainEpisodesinPrimaryCarePracticefromClinicalNotesRiccardoMiotto1,2,3, PhD; Bethany L. Percha2,3, PhD; Benjamin S. Glicksberg1,2,3, PhD;Hao-Chih Lee2,3, PhD;LisanneCruz4,MD,MSc,FAAPMR;JoelT.Dudley1,2,3,PhD;andIsmailNabeel5,MD,MPH

(1)HassoPlattnerInstituteforDigitalHealthatMountSinai,IcahnSchoolofMedicineatMountSinai,NewYork,USA(2)InstituteforNextGenerationHealthcare,IcahnSchoolofMedicineatMountSinai,NewYork,USA(3)DepartmentofGeneticsandGenomicSciences,IcahnSchoolofMedicineatMountSinai,NewYork,USA(4)DepartmentofPhysicalMedicineandRehabilitation,IcahnSchoolofMedicineatMountSinai,NewYork,USA(5)DepartmentofEnvironmentalMedicineandPublicHealth,IcahnSchoolofMedicineatMountSinai,NewYork,USA

Abstract

Background:Acuteandchroniclowbackpain(LBP)aredifferentconditionswithdifferenttreatments.However,theyarecodedinelectronichealthrecordswiththesameICD-10code(M54.5)andcanbedifferentiatedonlybyretrospectivechartreviews.Thispreventsefficientdefinition of data-driven guidelines for billing and therapy recommendations, such asreturn-to-workoptions.Objective: To solve this issue,we evaluate the feasibility of automatically distinguishingacuteLBPepisodesbyanalyzingfreetextclinicalnotes.Methods:Weusedadatasetof17,409clinicalnotesfromdifferentprimarycarepractices;ofthese,891documentsweremanuallyannotatedas“acuteLBP”and2,973weregenerallyassociatedwithLBPviatherecordedICD-10code.Wecompareddifferentsupervisedandunsupervised strategies for automated identification: keyword search; topic modeling;logisticregressionwithbag-of-n-gramsandmanualfeatures;anddeeplearning(ConvNet).We trained the supervised models using either manual annotations or ICD-10 codes aspositivelabels.Results:ConvNettrainedusingmanualannotationsobtainedthebestresultswithanAUC-ROC of 0.97 and F-score of 0.69. ConvNet’s resultswere also robust to reduction of thenumber of manually annotated documents. In the absence of manual annotations, topicmodels performed better than methods trained using ICD-10 codes, which wereunsatisfactoryforidentifyingLBPacuity.Conclusions:Thisstudyusesclinicalnotestodelineateapotentialpathtowardsystematiclearningoftherapeuticstrategies,billingguidelines,andmanagementoptionsforacuteLBPatthepointofcare.

All rights reserved. No reuse allowed without permission. not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint (which wasthis version posted November 11, 2019. .https://doi.org/10.1101/19010462doi: medRxiv preprint

https://doi.org/10.1101/19010462

Keywords: electronic health records; clinical notes; low back pain; natural languageprocessing;machinelearning.

IntroductionLowbackpain(LBP)isoneofthemostcommoncausesofdisabilityinUSadultsundertheageof45 [1],with10-20%ofAmericanworkers reportingpersistentbackpain [2]. LBPimpactsone’sabilitytoworkandaffectsthequalityoflife.Forexample,in2015Luckhauptetal.showedthat,fromapoolof19,441people,16.9%ofworkerswithanyLBPand19.0%ofthosewithfrequentandsevereLBPmissedatleastonefulldayofworkoveraperiodofthreemonths[3].LBPeventsalsoleadtosignificantfinancialburdenforbothindividualsand clinical facilities, with combined direct and indirect costs of treatment formusculoskeletal injuries and associated pain estimated to be approximately $213 billionannually[4].LBPeventsfallintotwomajorcategories:acuteandchronic[5].AcuteLBPoccurssuddenly,usuallyassociatedwithtraumaorinjurywithsubsequentpain,whereaschronicLBPisoftenreportedbypatientsinregularcheckupsandhasledtoasignificantincreaseintheuseofhealthcareservicesoverthepasttwodecades.ItisveryimportanttodifferentiatebetweenacuteandchronicLBPintheclinicalsettingastheseconditions-aswellastheirmanagementandbilling-aresubstantivelydifferent.Chronicbackpainisgenerallytreatedwithspinalinjections[6,7],surgery[8,9],and/orpainmedications[10,11],whileanti-inflammatoriesandarapidreturntonormalactivitiesofdailylivingaregenerallythebestrecommendationsforacuteLBP[12].However, acute and chronic LBP are usually not explicitly separated in electronic healthrecords (EHRs) due to a lack of distinguishing codes. The ICD-10-CM (InternationalClassificationofDiseases,TenthRevision,ClinicalModification)standardonlyincludesthecodeM54.5tocharacterize“Lowbackpain”diagnosis,anddoesnotprovidemodifierstodistinguishdifferentLBPacuities[13].Acuityisusuallyreportedinclinicalnotes,requiringretrospective chart review of the free text to characterize LBP events, which is time-consumingandnotscalable[14].Moreover,acuitycanbeexpressedindifferentways.Forexample,thetextcouldmention“acutelowbackpain”or“acutelbp”,butcouldalsosimplyreport “shootingpaindown into the lower extremities”, “limited spine rangeofmotion”,“vertebral tenderness”, “diffuse pain in lumbarmuscles”, and so on [15]. This variabilitymakesitdifficult forclinical facilitiesandresearcherstogroupLBPepisodesbyacuitytoperformkeytasks,suchasdefiningappropriatediagnosticandbillingcodes;evaluatingtheeffectivenessofprescribedtreatments;andderivingtherapeuticguidelinesandimproveddiagnosticmethodsthatcouldreducetime,disabilityandcost.Thispaperisthefirsttoexploretheuseofautomatedapproachesbasedonmachinelearningandinformationretrievaltoanalyzefree-textclinicalnotesandidentifytheacuityofLBPepisodes.Specifically,weuseasetofmanuallyannotatednotestotrainandevaluatevarious



https://doi.org/10.1101/19010462

machine learningarchitecturesbasedon logistic regression,n-grams, topicmodels,wordembeddings and convolutional neural networks, and to demonstrate that some of thesemodelsareable to identifyacuteLBPepisodeswithpromisingprecision. Inaddition,wedemonstratetheineffectivenessofusingICD-10codesalonetotrainthemodels,reinforcingtheideathattheyarenotsufficienttodifferentiatetheacuityofLBP.Ouroverallobjectiveistodevelopanautomatedframeworkthatcanhelpfrontlineprimarycareprovidersinthedevelopment of targeted strategies and return-to-work (RTW) options for acute LBPepisodesinclinicalpractice.

BackgroundandSignificancePrimary care providers (PCPs) are commonly the first medical practitioners to assesspatient’smusculoskeletalinjuriesandpainassociatedwiththeseinjuriesandarethereforeinauniqueposition tooffer reassurance, treatmentoptions, andRTWrecommendationscatered to the acuity of the injury and pain associated with it. Several studies havedocumented increases in medication prescriptions and visits to physicians, physicaltherapists,andchiropractorsforLBPepisodes[16–18].SinceindividualswithchronicLBPseekcareandusehealthcareservicesmorefrequentlythanthosewithacuteLBP,increasesinhealthcareuseandcostsforbackpainaredrivenmorebychronicthanacutecases[19].Arapidreturntonormalactivitiesofdailyliving,includingwork,isgenerallythebestactivityrecommendationforacuteLBPmanagement[12].ThenumberofworkdaysthatarelostduetoacuteLBPcanbereducedbyimplementingclinicalpracticeguidelinesintheprimarycaresetting [20]. In previous work, Cruz et al. built a RTW protocol tool for PCPs based onguidelinesfromtheLBPliterature[21].Basedonthetypeofwork(e.g.,clerical,manual,orheavy) and the severity of the condition, thedoctorwould recommendRTWoptions (inpartialorfulldutycapacity)withinacertainnumberofdays.Thestudyfoundthatphysicianswerelikelytousethisprotocol,especiallywhenitwasintegratedintotheEHRs.TheprotocolwasnotalwaysusedforpatientssufferingfromacuteLBP,however,astheresearchteamwasunabletoquicklyidentifytheacuityusingonlythestructuredEHRdata(e.g., ICD-10codes). Acuity information was only available in the progress notes and was thus notincorporatedintotheautomatedrecommendations.ThispreventedtheresearchteamfromprovidinganaccuratefeedbacktoPCPsbasedonafullpictureofthepatient’scondition.AsimilartoolthatcouldincorporateacuityinformationfromnotescouldprovidemuchmorespecificrecommendationstoPCPsthatincorporatebestpracticeguidelinesforeachacuitylevel.Besidesleadingtomoreprecisecare,thiswouldstreamlinebillingforLBP[22].Similarneedsariseforothermusculoskeletalconditions,suchasknee,elbow,andshoulderpain,whereICD-10codesdonotdifferentiatebypainlevelandacuity[23,24].MachinelearningmethodsforEHRdataprocessingareenablingimprovedunderstandingofpatientclinicaltrajectories,creatingopportunitiestoderivenewclinicalinsights[25,26].Inrecentyears,theapplicationofdeeplearning,ahierarchicalcomputationaldesignbasedon



https://doi.org/10.1101/19010462

layersofneuralnetworks[27],tostructuredEHRshasledtopromisingresultsonclinicaltaskslikediseasephenotypingandprediction[28–33].However,awealthofrelevantclinicalinformation remains locked behind clinical narratives in the free text of notes. NaturalLanguage Processing (NLP), a branch of computer science that enables machines tounderstandandprocesshumanlanguage[34]forapplicationslikemachinetranslation[35],text generation [36], and image captioning [37], has beenused toparse clinical notes toextractrelevantinsightsthatcanguideclinicaldecisions[38].RecentapplicationsofdeeplearningtoclinicalNLPhaveclassifiedclinicalnotesaccordingtodiagnosisordiseasecodes[39–41], predicted disease onset [32,42], and extracted primary cancer sites and theirlateralityinpathologyreports[43,44].However,whiledeeplearninghassuccessfullybeenappliedtoanalyzeclinicalnotes,traditionalmethodsarestillpreferablewhentrainingdataarelimited[45,46].Regardlessof the specificmethodology, toolsbasedonNLPapplied to clinicalnarrativeshavenotbeenwidelyusedinclinicalsettings[31,38],despitethefactthatphysiciansarelikely to follow computer-assisted guidelines if recommendations are tied to their ownobservations [47]. In this paper, we present an NLP-based framework that can helpphysiciansadheretobestpracticesandRTWrecommendationsforLBP.Tothebestofourknowledge,therearenostudiestodatethathaveappliedmachinelearningtoclinicalnotestodistinguish theacuityofamusculoskeletal condition incaseswhere it isnotexplicitlycoded.

MethodsThe conceptual steps of this study are summarized in Figure 1, specifically: datasetcomposition; text processing; clinical notes modeling; and experimental evaluation. Theoverall goal was to evaluate the feasibility of automatically identifying clinical notesreporting“acuteLBP”episodes.



https://doi.org/10.1101/19010462

Figure1:Conceptualframeworkusedtoevaluatetheuseofautomatedapproachesbasedonmachine learning and information retrieval to analyze free-text clinical notes and identify“acutelowbackpain”episodes.DatasetWeuseda setof free-text clinicalnotesextracted from theMountSinaidatawarehouse,made available foruseunder IRB approval followingHIPAAguidelines. TheMount SinaiHealthSystemisanurbantertiarycarehospitallocatedontheUpperEastSideofManhattanin New York City. It generates a high volume of structured, semi-structured, andunstructureddataaspartof its routinehealthcareandclinicaloperations,which includeinpatient,outpatient,andemergencyroomvisits.TheseclinicalnoteswerecollectedduringapreviouspilotstudyevaluatingaRTWtoolbasedonEHRdatathatincludednearly40,000encountersfor15,715patientsspanningtheyears2016-2018andclinicalnoteswrittenby



https://doi.org/10.1101/19010462

81differentproviders(Cruzetal.[21]).Inthatstudy,weusedthepublishedliteraturetodevelop a list of guidelines to determine the assessment andmanagement of acute LBPepisodesinclinicalpractice.InparticularweusedICD-10codesaswellasotherparameters,such as “presenting complaint”, “pre-existing conditions”, “management factors”,“imaging/radiology/testordered”,andsoon,todefineandlabeltheacuityofLBPinaclinicalencounter.Followingtheseguidelines,14individuals(physicalmedicineandrehabilitationfellows,residents,andmedicalstudents)manuallyreviewedarandomsetof4,291clinicalnotesassociatedwiththeseencountersandlabeledall“acutelowbackpain”events.Eachnotewasreviewedbyatleasttwoindividualsandwasfurthercheckedbyaleadphysicianresearcherifitwasmarkedasambiguousand/ortherewasdiscordancebetweenreviewers.Thisproject leveraged the entire set of clinicalnotes thatwere collected in thepreviousstudy. Inparticular,we joinedall theprogressnotesof theseencountersunder thesameinitial visit, andwe eliminated duplicate, short (less than 3words), and non-meaningfulreports.Thefinaldatasetwascomposedof17,409distinctclinicalnotes,withlengthrangingfromsevento6,638words.Ofthisset,3,092notesweremanuallyreviewedinthepreviousstudyand891ofthemwereannotatedas“acuteLBP”.Theremaining14,317noteswerenotmanually evaluated and were related to different clinical domains, including variousmusculoskeletaldisordersandpotentiallyLBPevents.Inthisfinaldataset,1,973noteswerealsoassociatedtoanencounterbilledwithanICD10M54.5“Lowbackpain”code.TextProcessingEverynote in thedatasetwas tokenized, divided into sentences, and checked to removepunctuation,numbers,andnon-relevantconceptssuchasURLs,emails,dates,etc.Eachnotewasthenrepresentedasalistofsentences,witheverysentencebeingalistoflemmatizedwordsrepresentedasone-hotencodings.Thevocabularywascomposedofall thewordsappearingatleastfivetimesinthetrainingset.Thediscardedwordswerecorrectedtotheterms in the vocabularyhaving theminimumedit distance, i.e., theminimumnumberofoperations required to transform one string into the other [48]. This step reduced thenumber of misspelled words and prevented the accidental discarding of relevantinformation;atthesametimeitalsolimitedthesizeofthevocabularytoimprovescalability[39].Overall, thevocabularycoveringthewholedatasetwascomprisedof56,142uniquewords.ClinicalNoteModelingWe evaluated different approaches for identifying clinical notes that refer to acute LBPepisodes.Theseincludedbothsupervisedandunsupervisedmethods.Whilewebenefitedfrom theuseofhigh-qualitymanual annotations to train the supervisedmodels,wealsoinvestigated alternatives that did not require manual annotation of notes. All of these



https://doi.org/10.1101/19010462

methodsprovidedstraightforwardexplanationsoftheirpredictions,enablingustovalidateeachmodelandto identifypartsoftextandpatternsthatarerelevanttothe“acuteLBP”predictions.KeywordSearchWesearchedforasetofrelevantkeywordsinthetext.Inparticular,welookedfor“acutelowback pain”, “acute lbp”, “acute lowbp”, and “acute back pain” andwe counted theiroccurrencesinthetext.WeusedNegEx[49]toremovenegatedoccurrencesofthekeywords.Intheevaluation,werefertothismodelas“WordSearch”.TopicModelingWeusedtopicmodelingonthefullsetofwordscontainedinthenotestocaptureabstracttopicsreferredtointhedataset[50].Topicmodelingisanunsupervisedinferenceprocess,inthiscaseimplementedusinglatentDirichletallocation[51],thatcapturespatternsofwordco-occurrences within documents to define interpretable topics (i.e., multinomialdistributionofwords)andrepresentadocumentasamultinomialoverthesetopics.Everydocument can then be classified as talking about one or (usually) more topics. Topicmodeling is often used in healthcare to generalize clinical notes, improve the automaticprocessingofpatientdata,andexploreclinicaldatasets[52–55].Inthisstudy,weassumedthatoneormoreofthesetopicsmightrefertoacuteLBP.Inordertodiscoverthem,weidentifiedthemostlikelytopicsforasetofkeywords(i.e.,“acute”,“low”,“back”,“pain”,“lbp”,“bp”)andwemanuallyreviewedthemtoretainonlythosethatseemedmorelikelytocharacterizeacuteLBPepisodes(i.e.,thatincludedmostofthekeywordswithhighprobability).WethenconsideredthemaximumlikelihoodamongthesetopicsastheprobabilitythatareportreferredtoacuteLBP(i.e.,“TopicModel”intheexperiments).BagofN-gramsEach clinical note was represented as a bag of n-grams (with n = 1, …, 5), with TermFrequency-InverseDocument Frequency (tf-idf)weights (determined from the corpus ofdocuments).Eachn-gramisacontiguoussequenceofnwordsfromthetext.WeconsideredallthewordsinthevocabularyandfilteredthecommonstopwordsbasedontheEnglishdictionarybeforebuildingallthen-grams.TheclassificationwasimplementedusingLogisticRegressionwithLasso(i.e.,“BoN-LR”).FeatureEngineeringWeusedtheprotocolbuiltbyCruzetal. [21]todefineacuteLBPepisodes intheclinicalnotes.Inparticular,weusedalltheconceptsdescribedinthatguideline,pre-processedthemwiththesamealgorithmusedfortheclinicalnotes,andbuiltasetof5,154distinctn-grams(withn=1,…,5),thatwerefertoas“FeatEng”.Wethenrepresentedeachclinicalnoteasa



https://doi.org/10.1101/19010462

bag of FeatEng (i.e., we counted the occurrences of only these n-grams in the text),normalizedwithtf-idfweights,andclassifiedthemusingLogisticRegressionwithLasso(i.e.,“FeatEng-LR”).DeepLearningWeimplementedanend-to-enddeepneuralnetworkarchitecture(i.e.,“ConvNet”)thattakesasinputthefullnoteandoutputsitsprobabilityofbeingrelatedto“acuteLBP”.Thefirstlayerofthearchitecturemapsthewordstodensevectorrepresentations(i.e,“embeddings”),which attempt to contextualize the semanticmeaning of eachwordby creating ametricspace where vectors of semantically similar words are close to each other. We appliedword2vecwiththeskip-gramalgorithmtotheparsednotes[56]toinitializetheembeddingofeachwordinthevocabulary.Word2veciscommonlyusedwithEHRstolearnembeddingsofmedicalconceptsfromstructureddataaswellasfromclinicalnotes[46,57–59].TheembeddingswerethenfedtoaConvolutionalNeuralNetwork(CNN)inspiredbythemodel described by Kim [60] and by Liu et al. [42]. This architecture concatenatesrepresentationsofthetextatdifferentlevelsofabstraction,byessentiallychoosingthemostrelevantn-gramsateachlevel.Here,wefirstappliedasetofparallel1Dconvolutionsontheinputsequencewithkernelsizesrangingfrom1to5,thussimulatingn-gramswithn=1,…,5.Theoutputsofeachoftheseconvolutionswerethenmax-pooledoverthewholesequenceandconcatenatedtoa5xddimensionalvector,wheredisthenumberof1Dconvolutionalfilters.Thisrepresentationwasthenfedtosequencesoffullyconnectedlayers,whichlearnthe interactionsbetweenthetext features,andfinally toasigmoid layerthatoutputs thepredictionprobability.Then-grams that aremost relevant to theprediction, in this architecture, are those thatactivatetheneuronsinthemax-poolinglayer.Therefore,weusedthelog-oddsthatthen-gramcontributestothesigmoiddecisionfunction[42]asanindicationofhowmucheachn-graminfluencesthedecision.EvaluationDesignWeevaluatedallthearchitecturesusinga10-foldcross-validationexperiment,witheverynoteappearinginthetestsetonlyonce.Ineachtrainingsetweusedarandom90/10splitto train and validate all themodel configurations.As baselinewe also report the resultsobtainedbyconsideringas“acuteLBP”all thenotesassociatedwiththe“Lowbackpain”M54.5ICD-10code(i.e.,“ICD-10”intheresults).



https://doi.org/10.1101/19010462

TrainingAnnotationsWeconsideredtwodifferentsetsofannotationsasgoldstandardstotrainthesupervisedmodels.Inthefirstexperiment,weusedthemanuallycuratedannotationsprovidedwiththedatasetfrompreviouswork[21],whereasinthesecondexperimentwetrainedthemodelsusing the ICD-10 codes associated with each note encounter. Both experiments wereevaluated using manual annotations. The rationale was to compare the feasibility ofidentifyingacuteLBPeventswhenmanualannotationsareandarenotavailable.Wetrainedtheclassifiertooutput“acuteLBP”vs.“other”becausethegoaloftheprojectwastoidentifyclinicalnoteswithacuteLBPevents,ratherthandiscriminatedifferentfacetsofLBPevents(e.g.,“chronicLBP”vs.“acuteLBP”).MetricsForallexperiments,wereportareaunderthereceiveroperatingcharacteristiccurve(AUC-ROC),micro-precision,recall,F-score,andareaundertheprecision-recallcurve(AUC-PRC)[61].TheROCcurveisaplotoftruepositiverateversusfalsepositiveratefoundoverthesetof predictions. F-score is the harmonic mean of classification precision and recall perannotation,whereprecisionisthenumberofcorrectpositiveresultsdividedbythenumberof all positive results, and recall is thenumber of correct positive results dividedby thenumberofpositiveresultsthatshouldhavebeenreturned.ThePRCisaplotofprecisionandrecallfordifferentthresholds.TheareasundertheROCandPRCcurvesarecomputedbyintegratingthecorrespondingcurves.ModelHyperparametersThemodelhyperparameterswereempiricallytunedusingthevalidationsetstooptimizetheresultswithbothtrainingannotations.Inthetopicmodelingmethod,weinferredtopicsusingthewholetrainingsetofdocumentsand200topics(derivedusingperplexityanalysis).Whileseeminglymore intuitive,usingonly thenotesassociatedwith theM54.5 “Lowbackpain” ICD10codeactuallyproducedworse results. For each fold, the most relevant topics associated with acute LBP weremanuallyreviewedandusedtoannotatethenotes.Inthedeeplearningarchitecture,weusedembeddingswithsize300,andfull-lengthnotes.We trained word2vec just on the clinical note dataset to initialize embeddings. Pre-initializingtheembeddingswithageneral-purposecorpusdidnotleadtoanyimprovement.EachCNNhad200filtersandusedaReLuactivationfunction.Weaddedtwofullyconnectedlayers of size 600 following the CNNs with ReLu activations and batch normalization.Dropoutvaluesacrossthelayerswereallsetto0.5.Thearchitecturewastrainedusingcross-entropy losswith theAdamoptimizer for five epochs andbatch size32 (learning rate=0.001).Theclassificationthresholdsforprecision,recall,andF-scorewerefoundbyranging



https://doi.org/10.1101/19010462

thevaluefrom0.1to1,with0.1increments,andretaining,foreachmodel,thevalueleadingtothebestresultsonthevalidationset.

ResultsTable1andFigure2showtheaverageresultsofthe10-foldcross-validationexperimentforallthemodelsconsidered.ThebestresultswereobtainedbyConvNetwhentrainedwiththemanualannotations.Whilethisisnotentirelysurprisinggiventhesuccessofdeeplearningfor NLPwhen high-quality annotations and a large amount of data (i.e., on the order ofmillionsoftrainingexamples)areavailable,thiswasnotcertaininthisdomainwherethetrainingdatasetwasmuchsmaller.Asexpected,theresultsobtainedbythebaselineandbytrainingthemodelsusingtheICD-10codeswerenotasgood,confirmingthattheM54.5ICD-10codeisnotasufficientindicatorofacuteLBP.TopicModelleadstosimilarperformancebut provides a more intuitive and potentially effective way for exploring the collection,extractingmeaningfulpatternsthatarerelatedtoacuteLBPepisodes(seeFigure3).Whilethisapproachmightnotberobustenoughforclinicalapplication,arefinedandmanuallycuratedversionofTopicModelpromisestoallowanefficientpre-filteringofclinicalreportsthat can speed up themanual work required to annotate them.On the contrary but asexpected,WordSearchperformedpoorlyastheconditionismentionedintoomanydifferentwaysacrossthetextandsimplekeywordswerenotsufficient.



https://doi.org/10.1101/19010462

Table1:Classificationresultsinidentifyingclinicalnoteswith“acuteLBP”episodesintermsofPrecision, Recall, F-score, Area Under the ROC (AUC-ROC) and Precision-Recall (AUC-PRC)Curves. Results are averaged over the 10-fold cross validation experiment. We compareddifferent supervised and unsupervised strategies: keyword search (“WordSearch”); topicmodeling (“TopicModel”); logistic regression with bag-of-n-grams (“BoN-LR”) and manualfeatures(“FeatEng-LR”);anddeeplearning(“ConvNet”).Thesupervisedmodels(i.e.,BoN-LR,FeatEng-LRandConvNet)weretrainedusingmanualannotationsorM54.5ICD-10codes.The“ICD-10”baselinesimplyconsideredas“acuteLBP”allthenotesassociatedwiththegenericM54.5“Lowbackpain”ICD-10code.

Precision Recall F-score AUC-ROC AUC-PRC

Baseline ICD-10 0.32 0.68 0.41 0.81 0.42

UnsupervisedMethods

WordSearch 0.71 0.03 0.06 0.52 0.40

TopicModel 0.44 0.58 0.50 0.92 0.46

TrainedwiththeM54.5ICD-10Code

BoN-LR 0.50 0.70 0.59 0.83 0.42

FeatEng-LR 0.47 0.59 0.52 0.88 0.41

ConvNet 0.55 0.68 0.61 0.89 0.46

TrainedwithManual

Annotations

BoN-LR 0.53 0.64 0.58 0.93 0.56

FeatEng-LR 0.58 0.66 0.62 0.93 0.58

ConvNet 0.65 0.73 0.70 0.98 0.72



https://doi.org/10.1101/19010462

Figure2:ROCandPrecision-RecallcurvesobtainedwhenusingastrainingdataforBoN-LR,FeatEng-LRandConvNetthemanualannotations(a)andtheM54.5ICD-10codes(b).ConvNettrained using themanual annotations obtained the best results. In the absence ofmanualannotationstousefortraining,TopicModelworkedbetterthanmethodstrainedusingICD-10codes,whichprovednottobeagoodindicatortoidentifyacuityinLBPepisodes.



https://doi.org/10.1101/19010462

Figure3:Representative“acuteLBP”-relatedtopicderivedbyaveragingthewordlikelihoodinall the relevant topics (i.e., that weremanually verified) inferred across the 10-fold cross-validation experiment.We report the top30words,with the biggestwords being themostrelevant.Asitcanbeseen,mostofthewordsareindeedrelatedtoacuteLBP,includingseveralmedicationsthatareusuallyprescribedtotreatinflammationandpain(e.g.,Cyclobenzaprine,Flexeril,Advil).AmanuallyrefinedversionofTopicModelcanhelppre-filteringthenotesinanintuitivesemi-automaticway,promisingtospeedupthemanualannotationprocess.Figure4reportstheclassificationresultsintermsofAUC-ROCandAUC-PRCwhenrandomlysubsamplingthe“acuteLBP”manualannotationsinthetrainingset.WefoundthatConvNetalwaysoutperformstheothermethodsbasedonLRaswellasTopicModel.Inaddition,wenoticethatusingjust30%ofthemanualannotations(i.e.,70clinicalnotes)alreadyleadstobetter results than using ICD-10 codes as training data. This is a particularly interestinginsightas it shows thatonlyminimal manualwork is required inorder toachievegoodclassifications;thesecanthenbefurtherimprovedbyaddingautomaticallyannotatednotestothemodel(aftermanualverification)andretraining.



https://doi.org/10.1101/19010462

Figure4:AreaundertheROC(AUC-ROC)andPrecision-Recall(AUC-PRC)curvesobtainedwhentraining the supervised models using random sub-samples of the manual annotations.TopicModel is reported as reference baseline. ConvNet obtained satisfactory results whentrainedusinglessmanuallyannotateddocuments,showingrobustnessandscalabilitytothegoldstandard.

Figure5highlightsthedistributionsoftheclassificationscores(predictedprobabilityofthelabel“acuteLBP”)derivedbyseveralsupervisedmodels(trainedwithmanualannotations)andTopicModel.ConvNetshowsclearseparationbetweenacuteLBPnotesandtherestofthedataset.Inparticular,allacuteLBPnoteshadscoresgreaterthan0.2,with82%ofthem(i.e.,727notes)havingscoresgreaterthan0.5.Onthecontrary,only347controlshadscoresgreater than 0.5, meaning that only a few notes were highly likely to be misclassified.Similarly,TopicModelhadnocontrolswithscoresgreaterthan0.7andall“acuteLBP”noteshadscoresgreaterthan0.2.



https://doi.org/10.1101/19010462

Figure 5: Representation of the probability distribution of the scores obtained by BoN-LR,FeatEng-LR,ConvNet(trainedwithmanualannotations)andTopicModel.ConvNetledtogoodseparationbetween“acuteLBP”clinicalnotesandalltheotherdocuments.Intheothercases,suchseparation isnotasclear,explainingtheworseclassificationresultsobtainedbythosemodels. Finally, Table 2 summarizes some of the n-grams driving the “acute LBP” predictionsobtainedbyConvNet(trainedwithmanualannotations)acrosstheexperiments.Whilesomeof these are obvious and refer to the disease itself (e.g., “acute lbp”), others refer tomedications(e.g.,“prescribedmusclerelaxant”,“flexeril”),andrecommendations(e.g.,“rtwfulldutyquick”).Giventheirclinicalmeaningandrelevance,allthesepatternscanbefurtheranalyzedandreviewedtopotentiallydrivethedevelopmentofguidelinesfor,e.g.,treatmentandRTWoptions.



https://doi.org/10.1101/19010462

Table2:Examplesofn-gramsthatwererelevantinidentifying“acuteLBP”noteswhenusingConvNet trained with manual annotations. The n-grams relevance was determined byanalyzing theneurons of theCNNsactivating themax-pooling layers and their log-odds tocontributetothefinaloutput.Log-oddswerefilteredpernotesandthenaveragedoverallthenotesandevaluationfolds.

Type AcuteLBP-relatedPredictiveN-grams

Diagnosis

musclespasmlowerbackacutelowbackpainflarebeenhavingacutebackpainacutemidlinelowbackpainsportsacutebilaterallowbackpainacutelowbackpainacutelbp

RelatedConditions

gaitabnormalityshowedsignificantdiskherniationintermittentsciaticaspinalstenosis

Medications

backpainflareprescribedflexerilcyclobenzaprineflexerilnaproxenforacutelowbackprescribedmusclerelaxant

Recommendations

backbraceforbackpainobtainlumbarspineMRIrecommendationrtwvisitrtwfulldutyquick

DiscussionInthisworkweevaluatedtheuseofseveralmachinelearningapproachestoidentifyacuteLBP episodes in free text clinical notes in order to better personalize the treatment andmanagementofthisconditioninprimarycare.Theexperimentalresultsshowedthatit ispossible toextractacuteLBPepisodeswithpromisingprecision,especiallywhenat leastsomemanuallycuratedannotationsareavailable.Inthisscenario,ConvNet,adeeplearningarchitecturebasedonCNNs,significantlyoutperformedothershallowtechniquesbasedon



https://doi.org/10.1101/19010462

bag-of-n-gramsandlogisticregression,openingthepossibilitytoboostperformancesusingmorecomplexarchitecturesfromcurrentresearchintheNLPcommunity.Theimplementeddeeparchitecturealsoprovidesaneasymechanismtoexplain thepredictions, leading toinformeddecisionsupportbasedonmodel transparency[62,63]andthe identificationofmeaningfulpatternsthatcandriveclinicaldecisionmaking.Ifnoannotationsareavailable,experimentsshowedthattheuseoftopicmodelingispreferredtotrainingaclassifierusingonly the M54.5 ICD-10 codes (i.e., “Low back pain”) associated with the clinical noteencounter,whichprovedtobeapoorindicatortodiscriminateLBPepisodes.Inaddition,thetopicsidentifiedcanserveasanintuitivetooltoinformguidelinesandrecommendationsaswellastopre-filterthedocumentsandreducethemanualworkrequiredtoannotatethenotes. Theproposed framework is inherently domain agnostic anddoes not require anymanual supervision to identify relevant features from the free-text. Therefore, it can beleveragedinothermusculoskeletalconditiondomainswhereacuityisnotexpressedintheICD-10/diagnosticcodes,suchasknee,elbow,andshoulderpain.PotentialApplicationsMedical care decisions are often based on heuristics and manually derived rule-basedmodelsconstructedonpriorknowledgeandexpertise[64].Cognitivebiasesandpersonalitytraits,suchasaversiontoriskorambiguity,overconfidence,andtheanchoringeffect,maylead to diagnostic inaccuracies and medical errors resulting in mismanagement orinadequate utilization of resources [65]. In the LBP domain, thismay lead to: delays infindingtherighttherapyandassistinginthereturnofpatientstonormalactivities;increasein the risk of transitioning the condition from acute to chronic; creating discomfort forpatients;andincreasingtheeconomicburdensonclinicalfacilitiestoadequatelytreatandmanage this patient population. Deriving data-driven guidelines for treatmentrecommendationscanhelpinreducingthesecognitivebiasesandpersonalitytraits,leadingto more consistent and accurate decisions. In this scenario, the proposed frameworksintegrate seamlesslywith theRTWtoolproposedbyCruzetal. [21]by includingacuity-relevantinformationintheclinicalnotesandaddressingoneofthelimitationsofthatstudy(i.e.,recommendingtheRTWtoolatthepointofcarebyaccuratelyidentifyingconditionasacuteLBP).Similarly,anunderstandingofthepatternsdrivingthepredictionscanleadtothedevelopmentofnewand improved treatment strategies forvarious typesof injuries,whichcanbepresentedtothecliniciansatthetimeofpatientencountertohelpthemwithbettermanagementof thecondition.Whilephysicianswillcontinue tohaveautonomy indeterminingoptimalcarepathways fortheirpatients,therecommendationsprovidedbythesupportingframeworkwillbeusefultosystematizeandsupporttheiractivitieswithintherealmofthebusyclinicalpractice.PosterioranalysisoftheclinicalnotestoinferacuteLBP episodes can also help in assigning the proper diagnostic and billing codes for theencounter.Inaforeseeablefuturescenariowhereclinicalobservations areautomaticallytranscribedviavoiceandEHRsareprocessedinreal-time,anautomatedtoolthatidentifies



https://doi.org/10.1101/19010462

acuityinformationcouldalsoimprovetheaccuracyofdiagnosisandbillinginreal-time,withnoneedtowaitforposteriorevaluations.LimitationsThisworkevaluatedthefeasibilityofusingmachinelearningtoidentifyacuteLBPepisodesinclinicalnotes.Therefore,wecompareddifferenttypesofmodels(shallowvs.deep)andlearning frameworks (unsupervised vs. supervised) to identify the best directions forimplementationanddeploymentinrealclinicalsettings.Whileseveralofthearchitecturesevaluatedinthisworkobtainedpromisingresults,moresophisticatedmodelsarelikelytoimprove these performances, especially in the deep learning domain. For example,algorithms based on attention models [66], BERT [67], or XLNet [68] have shownencouraging results on similarNLP tasks and are likely to obtain better results in thisdomain as well. In this work we only focused on processing clinical notes; however,embeddingstructuredEHRdata,especiallymedications,imagingstudiesand/orlabtests,intothemethodshouldimprovetheresults.ThedatasetofclinicalnotesusedinthisstudyoriginatedfromageographicallydiversesetofprimarycareclinicsservingtheNewYorkpopulationacrosstheNYmetroareaoveralimitedperiod(i.e.,2016-2018).Providerswereenrolledandrandomizedintothestudyonarollingbasis,withthenumberofencountersforLBPvaryingforeachindividualprovider,basedonhis/herownpractice.Themajorityoftheprimarycareproviderswereassistantprofessorsservingonthefrontlines.Nospecialistswereincludedintheinitialstudyasthepilotprojectwasonlygearedtowardstheprimarycareproviders.Consequently,theresultsofthisstudymightnotbeapplicabletospecialtycarepractice.FutureWorkTheclassificationofLBPepisodesasacuteorchronicatthepointofcarelevelwithinprimarycarepracticeisimperativeforaRTWtooltobeeffectivelyusedtorenderevidence-basedguidelines.Atthistime,weplantoclassifya largesetofnotes,derivepatternsrelatedtoacuteLBPandextendthetoolproposedbyCruzetal.[21]accordingtothem.WealsoplantoidentifycaseswheretheRTWtoolcanbeeasilydeployedbasedonEHRintegrationintheclinicaldomain.Second,wewillbegintoaddresssomeofthemethodologicallimitationsofthisstudytooptimizeperformanceandevaluateitsgeneralizabilityoutsideprimarycare.Finally,weaimtoevaluatethefeasibilityofthistypeofapproachforothermusculoskeletalconditions,inparticular,shoulderandkneepain.

ConclusionsThisarticledemonstratesthefeasibilityofusingmachinelearningtoautomaticallyidentifyacuteLBPepisodesfromclinicalreportsusingonlyunstructuredfreetextdata.Inparticular,



https://doi.org/10.1101/19010462

manuallyannotatingasetofnotes touseasagoldstandardcan leadtoeffectiveresults,especiallywhenusingdeeplearning.Topicmodelingcanhelpinspeedinguptheannotationprocess,initiatinganiterativeprocesswhereinitialpredictionsarevalidatedandthenusedto refineandoptimize themodel.This approachprovidesa generalizable framework forlearning to differentiate disease acuity in primary care, which canmore accurately andspecificallyguidethediagnosisandtreatmentofLBP.ItalsoprovidesaclearpathtowardimprovingtheaccuracyofcodingandbillingofclinicalencountersforLBP.

AcknowledgementsI.N.andL.C.wouldliketothankthePilotProjectsResearchTrainingProgramoftheNYandNJ Education and Research Center (ERC), National Institute for Occupational Safety andHealth,fortheirfunding(grant#T42OH008422).R.M.wouldliketothankthesupportfromtheHassoPlattnerFoundationandacourtesyGPUdonationfromNVIDIA.CompetingInterestsNone.ContributionsR.M.and I.N. initiated the ideaandwrote thearticle; I.N.collected thedataandprovidedclinicalsupport;R.M.conductedtheresearchandtheexperimentalevaluation;B.L.P.advisedonevaluationstrategiesandrefinedthearticle;B.S.G.,H.L.,andL.C.refinedthearticle;J.T.D.supportedtheresearch.Alltheauthorseditedandreviewedthemanuscript. Abbreviations AUC-PRC:AreaUnderthePrecision-RecallCurveAUC-ROC:AreaUndertheReceiverOperatingCharacteristicCurveBoN:BagOfN-gramsCNN:ConvolutionalNeuralNetworkEHR:ElectronicHealthRecordHIPAA:HealthInsurancePortabilityandAccountabilityActICD-CM:InternationalStatisticalofDiseases,ClinicalModificationIRB:InstitutionalReviewBoardLBP:LowBackPainLR:LogisticRegressionNLP:NaturalLanguageProcessingNY:NewYorkPCP:PrimaryCareProviderRTW:ReturnToWorkTF-IDF:TermFrequency-InverseDocumentFrequency



https://doi.org/10.1101/19010462

References 1 Centers forDisease Control and Prevention (CDC). Prevalence andmost common causes of disability

amongadults--UnitedStates,2005.MMWRMorbMortalWklyRep2009;58:421–6.2 RicciJA,StewartWF,CheeE,etal.BackpainexacerbationsandlostproductivetimecostsinUnitedStates

workers.Spine2006;31:3052–60.3 LuckhauptSE,DahlhamerJM,GonzalesGT,etal.Prevalence,RecognitionofWork-Relatedness,andEffect

on Work of Low Back Pain Among US Workers. Ann Intern Med Published Online First:2019.https://annals.org/aim/article-abstract/2733500/prevalence-recognition-work-relatedness-effect-work-low-back-pain-among?searchresult=1

4 HealthCareUtilizationandEconomicCost.BMUS:TheBurdenofMusculoskeletalDiseasesintheUnitedStates. https://www.boneandjointburden.org/2014-report/if0/health-care-utilization-and-economic-cost(accessed22Apr2019).

5 Fairbank J, Gwilym SE, France JC, et al. The role of classification of chronic low back pain. Spine2011;36:S19–42.

6 WeinerDK,KimY-S,BoninoP,etal.Lowbackpaininolderadults:areweutilizinghealthcareresourceswisely?PainMed2006;7:143–50.

7 FriedlyJ,ChanL,DeyoR.IncreasesinlumbosacralinjectionsintheMedicarepopulation:1994to2001.Spine2007;32:1754–60.

8 DeyoRA,MirzaSK.TrendsandVariationsintheUseofSpineSurgery.ClinOrthopRelatRes2006;443:139.9 DeyoRA,NachemsonA,MirzaSK.Spinal-FusionSurgery—TheCaseforRestraint.SpineJ2004;4:S138–

42.10 BallantyneJC.OpioidsfortheTreatmentofChronicPain:MistakesMade,LessonsLearned,andFuture

Directions.AnesthAnalg2017;125:1769–78.11 LuoX,PietrobonR,HeyL.Patternsand trends inopioiduseamong individualswithbackpain in the

UnitedStates.Spine2004;29:884–90;discussion891.12 MalmivaaraA,HäkkinenU,AroT,etal.TheTreatmentofAcuteLowBackPain—BedRest,Exercises,or

Ordinary Activity? New England Journal of Medicine. 1995;332:351–5.doi:10.1056/nejm199502093320602

13 2019 ICD-10-CM Diagnosis Code M54.5: Low back pain.https://www.icd10data.com/ICD10CM/Codes/M00-M99/M50-M54/M54-/M54.5 (accessed 24 Apr2019).

14 PetersenT,LaslettM,JuhlC.Clinicalclassificationinlowbackpain:best-evidencediagnosticrulesbasedonsystematicreviews.BMCMusculoskeletDisord2017;18:188.

15 CasazzaBA.Diagnosisandtreatmentofacutelowbackpain.AmFamPhysician2012;85:343.16 FeuersteinM,MarcusSC,HuangGD.Nationaltrendsinnonoperativecarefornonspecificbackpain.Spine

J2004;4:56–63.17 KesslerRC,DavisRB,FosterDF,etal. Long-term trends in theuseof complementaryandalternative

medicaltherapiesintheUnitedStates.AnnInternMed2001;135:262–8.18 MartinBI,DeyoRA,MirzaSK,etal.Expendituresandhealthstatusamongadultswithbackandneck

problems.JAMA2008;299:656–64.19 FreburgerJK,HolmesGM,AgansRP,etal.Therisingprevalenceofchroniclowbackpain.ArchInternMed

2009;169:251–8.20 RossignolM,AbenhaimL,SéguinP,etal.Coordinationofprimaryhealthcareforbackpain.Arandomized

controlledtrial.Spine2000;25:251–8;discussion258–9.21 CruzLC,AlamgirHA,ShethP,etal.Developmentofareturntoworktoolforprimarycareprovidersfor

patientswithlowbackpain:Apilotstudy.JFamilyMedPrimCare2018;7:1185–92.22 OwensJD,HegmannKT,ThieseMS,etal.ImpactsofAdherencetoEvidence-BasedMedicineGuidelines

fortheManagementofAcuteLowBackPainonCostsofWorker’sCompensationClaims.JOccupEnviron



https://doi.org/10.1101/19010462

Med2019;61:445–52.23 Reporting Pain in ICD-10-CM. Coding Strategies. https://www.codingstrategies.com/news/reporting-

pain-icd-10-cm(accessed21Jun2019).24 GrossDP,Armijo-OlivoS,ShawWS,etal.ClinicalDecisionSupportToolsforSelectingInterventionsfor

PatientswithDisablingMusculoskeletalDisorders:AScopingReview.JOccupRehabil2016;26:286–318.25 JensenPB,JensenLJ,BrunakS.Miningelectronichealthrecords:towardsbetterresearchapplicationsand

clinicalcare.NatRevGenet2012;13:395–405.26 GlicksbergBS,JohnsonKW,DudleyJT.Thenextgenerationofprecisionmedicine:observationalstudies,

electronichealthrecords,biobanksandcontinuousmonitoring.HumMolGenet2018;27:R56–62.27 LeCunY,BengioY,HintonG.Deeplearning.Nature2015;521:436–44.28 MiottoR, Li L,KiddBA,etal.DeepPatient:AnUnsupervisedRepresentation toPredict theFutureof

PatientsfromtheElectronicHealthRecords.SciRep2016;6:26094.29 MiottoR,WangF,WangS,etal.Deeplearningforhealthcare:review,opportunitiesandchallenges.Brief

BioinformPublishedOnlineFirst:6May2017.doi:10.1093/bib/bbx04430 ChoiE,BahadoriMT,SchuetzA,etal.DoctorAI:PredictingClinicalEventsviaRecurrentNeuralNetworks.

arXiv[cs.LG].2015.http://arxiv.org/abs/1511.05942v1131 XiaoC,ChoiE,SunJ.Opportunitiesandchallengesindevelopingdeeplearningmodelsusingelectronic

health recordsdata: a systematic review. JAmMed InformAssoc PublishedOnlineFirst:8 June2018.doi:10.1093/jamia/ocy068

32 RajkomarA,OrenE,ChenK,etal.Scalableandaccuratedeeplearningwithelectronichealthrecords.npjDigitalMedicine2018;1:18.

33 MiottoR,LiL,DudleyJT.DeepLearningtoPredictPatientFutureDiseasesfromtheElectronicHealthRecords.In:FerroN,CrestaniF,MoensM-F,etal.,eds.AdvancesinInformationRetrieval.Cham::SpringerInternationalPublishing2016.768–74.

34 GoldbergY.APrimeronNeuralNetworkModelsforNaturalLanguageProcessing. JournalofArtificialIntelligenceResearch.2016;57:345–420.doi:10.1613/jair.4992

35 WuY,SchusterM,ChenZ,etal.Google’sNeuralMachineTranslationSystem:BridgingtheGapbetweenHumanandMachineTranslation.arXiv[cs.CL].2016.http://arxiv.org/abs/1609.08144

36 KannanA,KurachK,RaviS,etal.SmartReply:AutomatedResponseSuggestionforEmail.In:Proceedingsofthe22NdACMSIGKDDInternationalConferenceonKnowledgeDiscoveryandDataMining.NewYork,NY,USA::ACM2016.955–64.

37 VinyalsO,ToshevA,BengioS,etal.Showandtell:Aneuralimagecaptiongenerator.In:ProceedingsoftheIEEEconferenceoncomputervisionandpatternrecognition.2015.3156–64.

38 SheikhalishahiS,MiottoR,DudleyJT,etal.NaturalLanguageProcessingofclinicalnotes:AsystematicreviewforChronicDiseases.JMIRMedicalInformatics.2018.doi:10.2196/12239

39 BaumelT,Nassour-KassisJ,CohenR,etal.Multi-LabelClassificationofPatientNotesaCaseStudyonICDCodeAssignment.arXiv[cs.CL].2017.http://arxiv.org/abs/1709.09587

40 MullenbachJ,WiegreffeS,DukeJ,etal.ExplainablePredictionofMedicalCodesfromClinicalText.arXiv[cs.CL].2018.http://arxiv.org/abs/1802.05695

41 Shi H, Xie P, Hu Z, et al. Towards Automated ICD Coding Using Deep Learning. arXiv [cs.CL].2017.http://arxiv.org/abs/1711.04075

42 Liu J, Zhang Z, RazavianN.DeepEHR: ChronicDisease PredictionUsingMedicalNotes. arXiv [cs.LG].2018.http://arxiv.org/abs/1808.04928

43 Yoon H-J, Ramanathan A, Tourassi G. Multi-task Deep Neural Networks for Automated Extraction ofPrimarySiteandLateralityInformationfromCancerPathologyReports.In:AdvancesinBigData.SpringerInternationalPublishing2017.195–204.

44 QiuJX,YoonH-J,FearnPA,etal.DeepLearningforAutomatedExtractionofPrimarySitesFromCancerPathologyReports.IEEEJBiomedHealthInform2018;22:244–51.

45 Turner CA, Jacobs AD, Marques CK, et al. Word2Vec inversion and traditional text classifiers for



https://doi.org/10.1101/19010462

phenotypinglupus.BMCMedInformDecisMak2017;17:126.46 GehrmannS,DernoncourtF, LiY,et al. ComparingRule-BasedandDeepLearningModels forPatient

Phenotyping.arXiv[cs.CL].2017.http://arxiv.org/abs/1703.0870547 DavisDA,Taylor-VaiseyA.Translatingguidelinesintopractice.Asystematicreviewoftheoreticconcepts,

practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ1997;157:408–16.

48 NavarroG.AGuidedTourtoApproximateStringMatching.ACMComputSurv2001;33:31–88.49 ChapmanWW,BridewellW,HanburyP,etal.Asimplealgorithmfor identifyingnegated findingsand

diseasesindischargesummaries.JBiomedInform2001;34:301–10.50 Blei DM. Probabilistic topic models. Communications of the ACM. 2012;55:77.

doi:10.1145/2133806.213382651 BleiDM,NgAY,JordanMI.LatentDirichletAllocation.JMachLearnRes2003;3:993–1022.52 Miotto R,Weng C. Case-based reasoning using electronic health records efficiently identifies eligible

patientsforclinicaltrials.JAmMedInformAssoc2015;22:e141–50.53 PerotteAJ,WoodF,ElhadadN,etal.HierarchicallySupervisedLatentDirichletAllocation. In: Shawe-

TaylorJ,ZemelRS,BartlettPL,etal.,eds.AdvancesinNeuralInformationProcessingSystems24.CurranAssociates,Inc.2011.2609–17.

54 ChangKR,LouX,KaraletsosT,etal.AnEmpiricalAnalysisofTopicModelingforMiningCancerClinicalNotes.bioRxiv.2016;:062307.doi:10.1101/062307

55 CohenR,AviramI,ElhadadM,etal.Redundancy-awaretopicmodelingforpatientrecordnotes.PLoSOne2014;9:e87555.

56 Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and theirCompositionality.In:BurgesCJC,BottouL,WellingM,etal.,eds.AdvancesinNeuralInformationProcessingSystems26.CurranAssociates,Inc.2013.3111–9.

57 GlicksbergBS,MiottoR,JohnsonKW,etal.AutomateddiseasecohortselectionusingwordembeddingsfromElectronicHealthRecords.PacSympBiocomput2018;23:145–56.

58 Choi Y, ChiuCY-I, SontagD. LearningLow-DimensionalRepresentations ofMedical Concepts.AMIA JtSummitsTranslSciProc2016;2016:41–50.

59 WangY, Liu S, AfzalN, et al. A comparison ofword embeddings for the biomedical natural languageprocessing.JBiomedInform2018;87:12–20.

60 Kim Y. Convolutional Neural Networks for Sentence Classification. arXiv [cs.CL].2014.http://arxiv.org/abs/1408.5882

61 Ricardo BY, Berthier RN. Modern Information Retrieval: the concepts and technology behind searchsecondedition.AddisionWesley2011.

62 HolzingerA,BiemannC,PattichisCS,etal.WhatdoweneedtobuildexplainableAIsystemsforthemedicaldomain?arXiv[cs.AI].2017.http://arxiv.org/abs/1712.09923

63 LiptonZC.TheMythosofModelInterpretability.arXiv[cs.LG].2016.http://arxiv.org/abs/1606.0349064 MarewskiJN,GigerenzerG.Heuristicdecisionmakinginmedicine.DialoguesClinNeurosci2012;14:77–

89.65 SaposnikG,RedelmeierD,RuffCC,etal.Cognitivebiasesassociatedwithmedicaldecisions:asystematic

review.BMCMedInformDecisMak2016;16:138.66 VaswaniA,ShazeerN,ParmarN,etal.AttentionisAllyouNeed.In:GuyonI,LuxburgUV,BengioS,etal.,

eds.AdvancesinNeuralInformationProcessingSystems30.CurranAssociates,Inc.2017.5998–6008.67 DevlinJ,ChangM-W,LeeK,etal.BERT:Pre-trainingofDeepBidirectionalTransformersforLanguage

Understanding.arXiv[cs.CL].2018.http://arxiv.org/abs/1810.0480568 YangZ,DaiZ,YangY,etal.XLNet:GeneralizedAutoregressivePretrainingforLanguageUnderstanding.

arXiv[cs.CL].2019.http://arxiv.org/abs/1906.08237



https://doi.org/10.1101/19010462

Documents

Identifying Acute Low Back Pain Episodes in Primary Care ... · logistic regression with bag-of-n-grams and manual features; and deep learning (ConvNet). We trained the supervised