Extracting Insights from Electronic Health Records: …ben/papers/Wang2011Extracting.pdfORIGINAL PAPER Extracting Insights from Electronic Health Records: Case Studies, a Visual Analytics

ORIGINAL PAPER

Extracting Insights from Electronic Health Records: CaseStudies, a Visual Analytics Process Model, and DesignRecommendations

Taowei David Wang & Krist Wongsuphasawat &Catherine Plaisant & Ben Shneiderman

Received: 17 January 2011 /Accepted: 13 April 2011 /Published online: 4 May 2011# Springer Science+Business Media, LLC 2011

Abstract Current electronic health record (EHR) systemsfacilitate the storage, retrieval, persistence, and sharing ofpatient data. However, the way physicians interact with EHRshas not changed much.More specifically, support for temporalanalysis of a large number of EHRs has been lacking. Anumber of information visualization techniques have beenproposed to alleviate this problem. Unfortunately, due to theirlimited application to a single case study, the results are oftendifficult to generalize across medical scenarios.We present theusage data of Lifelines2 (Wang et al. 2008), our informationvisualization system, and user comments, both collected overeight different medical case studies. We generalize ourexperience into a visual analytics process model for multipleEHRs. Based on our analysis, we make seven designrecommendations to information visualization tools toexplore EHR systems.

Keywords Information visualization .Medical records[L01.280.900.968] . User-computer interface[L01.224.900.910] . Computer-assisted decision making[L01.700.508.100] . Computer graphics [L01.224.108]

Introduction

Whether it is to diagnose a single patient or to obtainquality assurance measures of health care by analyzing

multiple patients, physicians and clinical researchers mustincorporate large amounts of multivariate historic data.Current electronic health record (EHR) systems facilitatethe storage, retrieval, persistence, and sharing of patienthealth information; however, the availability of informationdoes not directly translate to adequate support for complextasks physicians and clinical researchers encounter everyday.

Overwhelmingly large amounts of information and alack of support for temporal queries and analyses are but afew problems physicians and clinical researchers face. As aresult, a number of information visualization systems havebeen introduced to address these issues. These systemssupport higher-level decision-making and exploratory anal-ysis tasks in the medical domain. Commendably, thesesystems aim to solve real problems physicians face and toadd value to the EHR systems for the end-users. However,these systems are often designed for one specific medicalscenario, and subsequently evaluated on that scenario. As aresult, it is difficult to make generalizations on physicians’visual analytics process or the process’s user requirementsin EHRs.

In contrast, our information visualization tool, Lifelines2,has been applied to 11 different case studies, eight of whichare in the medical domain. By case studies, we mean along-term, in-depth study on our users’ usage and experi-ence of Lifelines2 on a domain and data sets that they selectand care about. Case studies provide human-computerinteraction (HCI) researchers a valuable perspective onhow their tools are used in the real world, as opposed to inan experimental setting. This differs from medical casestudies. Since Lifelines2’s inception, we have workedclosely with physicians and hospital administrators togather user requirements for the tasks of temporal searchand exploratory analysis of multiple patient records overtime. It has been used by physicians for the purpose of (1)

T. D. Wang :K. Wongsuphasawat : C. Plaisant :B. ShneidermanDepartment of Computer Science & Human-Computer InteractionLab, University of Maryland,College Park, MD 20742, USA

T. D. Wang (*)1 Constitution Center OCC-02-260LBoston, MA 02129, USAe-mail: [email protected]

J Med Syst (2011) 35:1135–1152DOI 10.1007/s10916-011-9718-x

obtaining quality assurance measures, (2) assessing impacton patient care due to hospital protocol changes, (3)replicating published clinical studies using in-hospital data,and (4) simply searching for patients with interestingmedical event patterns.

Over the two-and-half year period in which these casestudies took place, we observed how physicians usedLifelines2, logged the actions performed and features used,and collected physicians’ comments. By analyzing theobservations, logs, and user-feedback, we were able tomake generalizations about searching for temporal infor-mation in EHRs. Related work first presents related work.Lifelines2 introduces Lifelines2 and describes one casestudy in detail. We then present an analysis of Lifelines2log data and a process model, and conclude with a list ofdesign recommendations. Further information and videodemonstrations are available at http://www.cs.umd.edu/hcil/lifelines2.

Related work

As EHR systems become more prevalent, the need foreffective techniques to interact with EHRs also becomesmore pressing. A growing number of recent field researchefforts have studied how end-users interact with EHRs inhospitals. While some studies have focused on how patientscan benefit from a display of their own EHR [1], mostefforts have focused on how medical professionals behaveas end users. These studies follow, for example, physicians’workflow in supplementing, annotating, and reusing EHRs[2–5]. These field studies identify important design chal-lenges, which EHR systems designers must overcome tosupport medical professionals’ tasks. Unfortunately, thefield studies often fall short of recommending possibletechnologies for solving these problems [2, 4, 5].

Many EHR systems lack features that support importantend-user tasks. Exploratory analysis, effective representa-tion, and temporal queries are but a few that are often foundlacking even in state-of-the-art systems such as Amalga [6]or i2b2 [7]. As a result, many information visualizationsystems have been proposed with different techniques tosupport these tasks and supplement the EHR systems.These systems focus on novel ways to integrate andvisualize personal information in a useful way. Theintegration aspect is akin to creating personal histories orlife narratives [8–10] to better contextualize medicalinformation. The visualization techniques vary widely fordifferent use cases. Some approaches are static visual-izations, such as the one proposed by Powsner and Tufte[11], but most modern ones are interactive. Many of thesesupport only a single EHR—Lifelines [12], Midgaard [13],Web-Based Interactive Visualization System [14], VIE-

VISU [15], to name a few. They generally focus onsupporting physicians to quickly absorb a patient’s poten-tially lengthy medical history in order to make bettermedical decisions. On the other hand, a number of systemsexpand the coverage to multiple EHRs, for example,Similan [16], Protempa [17], Gravi++ [18], VISITORS[19], and IPBC [20]. These systems typically focus onnovel search and aggregation strategies for multiple EHRs.

These information visualization systems are all motivat-ed by real issues physicians or clinical researchersencounter when the typical presentation of medical data isnot conducive to their analysis tasks. However, because oflimited availability of physicians and clinical researchers,very few systems have gone through multiple detailed long-term case studies [21]. While these systems demonstrate theusefulness of their features in one or two isolated medicalcase studies, the results are harder to generalize. As aconsequence, these information visualization efforts rarelymake broader generalizations about their techniques. Theyalso rarely make recommendations on the directionsinformation visualization designers for EHRs should pursuefurther. In contrast, we applied Lifelines2 to 11 case studies,eight of which are medical scenarios according to themultidimensional in-depth long-term case studies (MILC)model [22]. By analyzing the multidimensional user andusage data, we believe we can contribute to the field bymaking useful generalizations and recommendations. How-ever, because Lifelines2 aims to support searching andexploring multiple EHRs, the generalizations and recom-mendations presented in this work may not apply to thedesign of single-EHR systems.

In addition to presenting the analysis of user and usagedata of Lifelines2, we also present a process model whichgeneralizes how physicians seek information in EHRs. Ourprocess model is similar in construction to the sense-making loop presented by Stuart Card and others [23–25].However, ours differ in the level of granularity andapplication domain. We focus specifically on multipleEHRs and with a strong emphasis on temporal analysis.Our level of granularity and task-specificity is similar to theproposed process model for social network analysis [26].

Lifelines2

Lifelines2 is designed for visualizing temporal categoricaldata for multiple records. Temporal categorical data aretime-stamped data points that are not numerical in nature.For example, in an EHR, the patient’s past hospital visits,diagnoses, treatments, medication prescribed, medical testsperformed, etc. can be considered as temporal categoricaldata. These data are point data (no durations) with a name,and can be thought of as “events.” This differs from

1136 J Med Syst (2011) 35:1135–1152

http://www.cs.umd.edu/hcil/lifelines2

http://www.cs.umd.edu/hcil/lifelines2

temporal numerical data such as blood pressure readings, orplatelet counts. Lifelines2 visualizes these temporal cate-gorical data and provides a number of visualization andinteraction techniques for exploratory analysis.

Figure 1 shows a screen shot of Lifelines2, in whichregion (a) is Lifelines2’s main display of EHRs. Eachpatient occupies a row, and is identified by its ID on theleft. Under the ID, a list of event types in that EHR is listed.Each event is represented by a color-coded triangle andplaced on the time line. Region (c) is the control panel forLifelines2. Each patient is Aligned by the 1st occurrence ofIMC, Ranked by the number of ICU events, and Filtered bythe sequence of events [Admit, No ICU, IMC, ICU]. EHRsthat match the Filter are highlighted in orange. Of the 318EHRs in Fig. 1, only 39 were found to be matches. Region(b) is called a temporal summary, and it displays thedistribution of Admit, Exit, and ICU events. In (a) and (b),analysts can zoom in, zoom out, pan, and scroll. Tool tipsprovide detailed information for each event when mousedover.

By Aligning every patient by its corresponding events(1st, 2nd, …, and last, 2nd to the last, …), physicians can

better compare the patients as a group. Events that occurcommonly before or after the Alignment can be more easilydetected. When an Alignment is active, the time linebecomes relative to the Alignment (Fig. 2b). Analysts canalso Align by all occurrences of an event type, in whichcase Lifelines2 duplicates each EHR by the number ofevents of that type the EHR contains, and shifts theduplicates by each of the event instances.

Finally, analysts can Rank the EHRs by their ID (defaultbehavior), or by the number of occurrences of the differentevent types, such as the number of ICU visits. They canalso Filter by the number of occurrences of event types, orby a sequence Filter (Fig. 1c). Align, Rank, Filter areaffectionately called the ARF framework, and serves as abasis for user interaction in Lifelines2 [27].

Temporal summaries [28] are histograms of events overtime. Additionally, analysts can change number of events tonumber of EHRs or number of events per EHR over time.Temporal summaries are temporally synchronized with themain visualization, and share the same temporal granularity.Analysts can use direct manipulation to select EHRs thatcontribute to a certain bin in the histogram. By combining

Fig. 1 A screen shot of Lifelines2. a shows the main visualization ofmultiple EHRs. b is a temporal summary, showing the distribution ofthe three event types Admit, Exit, and ICU over time. c is the control

panel for Lifelines2. Each of the 318 patients is Aligned by their 1stoccurrence of IMC, Ranked by the number of ICU events, and Filteredby the sequence of events

J Med Syst (2011) 35:1135–1152 1137

alignment and temporal summaries, analysts can select, forexample, all patients that entered the ICU within 24 h ofentering the IMC.

After Filtering and selection, analysts can optionallysave their results as a separate group. Set operations areavailable to create the union, intersection, and difference onthe groups. Multiple groups can be compared in compar-ison mode, where one temporal summary represents agroup, and arbitrarily many groups can be compared.Figure 3 compares two groups of patients by their hospitalexit events over time. The first contains patients who haveentered the emergency room (ER), and the second containsthose who have not. These patients are all Aligned by theiradmission time (Admit), and the distribution of Exit eventsis plotted. The events are normalized by the number ofpatients in each group, subsequently the bars represent the

percentage of patients who exit in each day following theiradmission. There is a peak for patients who go through theER, while those who do not have a more irregulardistribution. The comparison features allow physicians to,for example, directly compare patient groups that undergodifferent treatment options.

Medical case studies

Overview

We conducted our case studies in two phases. In the firstphase (early-adoption), physicians and hospital administra-tors worked with us to iteratively refine and improveLifelines2’s features and usability. Our collaborators

Fig. 2 Part (a) shows threeEHRs in Lifelines2 that areun-aligned (calendar time).Part (b) shows the same threeEHRs aligned by their 1st IMCevent (relative time)

Fig. 3 Comparison of thedistribution of the Exit eventsfor two different groups ofpatients

1138 J Med Syst (2011) 35:1135–1152

learned features of Lifelines2. This early adoption phaselasted for over a year, during which we conducted threecase studies: (1) finding patients who exhibited contrast-induced nephropathy, (2) finding patients who exhibitedheparin-induced thrombocytopenia, and (3) studying hem-atrocrit levels in trauma patients with respect to length ofstay in the hospital and discharge patterns.

After the early-adoption case studies, we conducted eightadditional mature-adoption case studies, five of which werein the medical domain. In this phase, no new novelinteraction or visualization features were implemented inLifelines2. We only added bug fixes and small features thatfacilitate the analyses. In this phase, Lifelines2 was used fora number of different analysis tasks: (1) Replicating a study[29] that investigates the relationship between day lightsavings time change and heart attack incidents usingclinical data. (2) Performing a follow-up study onheparin-induced thrombocytopenia in ICU patients [28],(3) studying hospital room transfer patterns as a measurefor quality assurance (two case studies, one for the Bounce-Back patterns, and the other for the Step-Up patterns), and(4) studying the impact on patient care due to a change inprotocol that governs when Bi-level Positive AirwayPressure (BiPAP) is applied. Table 1 shows the statisticsof selected datasets. While sequences of temporal eventsoften temptingly suggest causality, our case studies are allbased on some well-described phenomena (contrast-in-duced nephropathy, heparin-induced thrombocytopenia, orthe step-up patterns, etc.). No causality is otherwisesuggested. Instead we emphasize that Lifelines2 is a tooldesigned for discovery and search of temporal eventpatterns. With that said, we believe discovery is animportant step that may eventually lead to establishingcausal links with well-designed controlled experiments.

All of our case studies were selected and initiated by ourcollaborators (regardless of phase). Our collaborators wouldpresent a medically relevant question that they would liketo investigate. These questions are typically difficult toanswer with their current EHR system and supportingsoftware. However, not all questions are good candidates.For example, questions that involve analysis of numericaldata, for which Lifelines2 is ill-suited, are discontinued

(such as the hematocrit study). Most of the unsuitablequestions arise during the first phase, when collaborators’familiarity with Lifelines2 is low. By the end of the early-adopter phase, however, our collaborators became expertswith Lifelines2’s features, and, subsequently, became verygood at identifying interesting medical questions suitablefor Lifelines2. Due to time constraints, however, we wereonly able to perform five mature-adoption case studies.

Each case study follows the same template. Physiciansdescribe a medical scenario interesting to them and askdatabase administrators to obtain the relevant data fromtheir current EHR system. The data is preprocessed andthen converted to Lifelines2 format. The data is later loadedin Lifelines2 and interactively explored together by ourcollaborators and us (University of Maryland (UMD)researchers). This exploration often revealed additionalproblems, which may, for example, require additional dataor prompt additional preprocessing. It usually takes two tothree one-hour meetings with our collaborators to ensuregood data quality, followed by more meetings dedicated toanalysis.

During the analysis meetings, the physicians and UMDresearchers share a large display. In the early-adoptionphase, we encourage physicians to interact with Lifelines2directly to (1) familiarize themselves with the features andoperations of the system, and (2) identify bugs and interfaceissues as end-users. In the mature adoption phase, thephysicians would typically dictate what actions to take, andUMD researchers would interact with Lifelines2 based onthe dictation. Using this methodology, we were able tobetter follow our collaborators’ thought process in a fieldwe were unfamiliar with. This also forced our collaboratorsto explain to us the medical significance and nuances oftheir interpretation. During these meetings, we recorded ourcollaborators’ feedback. The feedback typically includedour collaborators’ impressions of Lifelines2, its comparisonand contrast with their current EHR system, and sugges-tions of features to include in future versions. They alsooften include discussions of the case study and proposals ofadditional related case studies. The recording was originallycollected via note-taking and later via audio recording. Allinteractions performed in Lifelines2—Align, Rank, Filter,

Dataset name #Records #Events # Patterns Average length of patterns

Creatinine 3598 32134 5 3.2

Heparin 841 65728 5 4.6

Heart attack 9361 196581 – –

Transfer 51006 207187 5 3

BiPAP 6583 135951 9 3.89

Step-up 284a 1612a 1 3

Bounce-back 544a 3055a 10 3

Table 1 Basic statistics on theselected datasets. The numbersof event patterns and the aver-age length of such patterns ineach case study are also shown

a average number over manyquarterly datasets

J Med Syst (2011) 35:1135–1152 1139

Zoom, etc.—are logged automatically using Lifelines2’slogging facility.

In the early stages of Lifelines2 we worked with aneurologist, an osteopathic physician, and two nursingprofessors. Over the eight medical case studies, the onlymedical professionals we worked with were physicians,including an emergency room director, two professors ofmedicine, an internal medicine physician, and a resident.Some of them participated in more than one case study.Database administrators and EHR system engineers werealso involved. A case study typically takes one to 6 monthsto complete. Some case studies include repeated analysis ofpatients in different time periods, while some are performedfor a single period. Some case studies include patientsduring a 10 year period, while others include only a fewmonths. The number of patients in the case studies can beas few as a couple hundred (with a few thousand events) oras many as 51,000 (with over 207,000 events), and theevent-per-record ratio is different for each case study. Casestudies can take tens of meetings and hundreds of e-mailexchanges to organize, execute, and finally compile finalresults.

Case study: Identifying step-up patterns

We present a case study on patient room transfers in detailto demonstrate how Lifelines2 is used in a real scenario.Hospital rooms can be roughly classified into: (1) ICU(intensive care units that provide the highest level of care),(2) IMC (intermediate medical care rooms that housepatients who need elevated level of care, but not seriousenough to be in ICU), (3) Floor (normal hospital beds thattypically house patients with no life-threatening condi-tions), and (4) Special (emergency room, operating room,or other rooms). In this study, the dataset also includespatients’ hospital admission (Admit) and hospital discharge(Exit) if they have already exited. Each of these room eventdata comes with a time stamp, indicating when the patientis transferred-in. Transfer-out is implied by subsequenttransfer-ins to another rooms or Exit.

The physicians are interested in the Step-Up pattern.This is a pattern where a patient initially triaged to go to anIMC room escalates to ICU rooms immediately. The patternmay be indicative of mis-triage—that is, sending patientstoo sick to IMC instead of ICU in the first place. The exactcriteria are patients who were sent to IMC and escalated toICU within 24 h. For example, the fifth patient from the topin Fig. 1 exhibits exactly the Step-Up pattern.

There were two hypotheses our physician collaboratorsare interested in. First, the nurses in IMC had noticedanecdotally that Step-Ups have increased. Our collaboratorswanted to verify this claim and decide if protocols forperforming triage need to be changed. Secondly, our

collaborators hypothesized that because newly graduateddoctors enter the hospital in the third quarter (July-September) every year, the percentage of Step-Up casesmight be higher in these months due to their inexperience.

The original query seemed easy to perform at first. Byfirst Aligning by all patients’ IMC events and selecting allICU events that occur within 24 h after the Alignment, weshould be able to identify all patients who exhibit the Step-Up case. However, when the authors and our collaboratorsexamined data together, we realized several issues. Forexample, the pattern must not include any Floor eventsbetween IMC and ICU (transferring from IMC to Floorthen to ICU) because this suggests the escalation fromFloor to ICU is likely not due to an earlier triage. Similarly,there should not be an ICU prior to the IMC in question. Ifthere were, the patient was already in ICU, and this wouldnot be considered a Step-Up. These nuances in data werenot expected initially, but the visualization and theapplication of Alignment made their existence alarminglyobvious; whereas a direct application of, for example, SQLwould have made the discovery and the correction difficult.

We first presented the Step-Up case study in an earlierpublication [30]. However, through continued verificationto our analysis process, we discovered a few mistakes inour conversion file and new information from our collab-orators. For example, our collaborators told us that there isthe addition of the MS room, an overflowing area for ICUs.This means that when we account for transfers into ICU inthe Step-Up pattern, we must include patients going to ICUor MS. We modified our conversion script, reconverted theraw data into Lifelines2 format, and re-performed theanalysis. While the main results have not changed, thespecific numbers have. We then apply the followinginteractions in Lifelines2 to identify the Step-Up cases:

1. Perform a sequence Filter using [IMC, No Floor,ICU}], and save the results as a new group namedIMC-No Floor-ICU.

2. Perform a sequence Filter using [IMC, No Floor, MS}],and save the results as a new group named IMC-NoFloor-MS.

3. Use the Union operation in Lifelines2 to combine thetwo groups into a few one: Potential Step-Ups.

4. Align by the 1st occurrence of IMC.5. Temporally select (in a temporal summary) ICU events

that occur any time prior to the Alignment, and removethe selected EHRs.

6. Temporally select ICU and MS events that occur within24 h after Alignment, and keep the selected EHRs.

7. Save as a new group and export this new group as a file.8. Return to group Potential Step-Ups.9. Repeat steps 4–8 by changing the 1st IMC to the nth

IMC. Stop when there are no records with n IMCs.

1140 J Med Syst (2011) 35:1135–1152

We conducted this study for every quarter from January,2007 to March, 2010. Each quarter took roughly 12–20 min toperform. The data contains all patients who have been admittedto IMC in that period. A screen shot of a quarterly data isshown in Fig. 1. Figure 4 shows the number of patientsadmitted to IMC in that period. The IMC patient count isoverall on a rising trend because of hospital expansion.Figure 5 shows the number of patients who exhibit the Step-Up pattern in the same period, a subset of all patients admittedto IMC. The graph looks more jagged than Fig. 4, but theredoes also seem to have a upward trend. Finally, we plot thepercentage of patients who exhibit Step-Up patterns (out ofthe IMC patient counts) in Fig. 6. The percentage of Step-Uppatients peaks in the second quarter of 2008 (at nearly 8%),but has mostly been holding steady between 4 and 7%. Oneof our physician collaborators explains, “The nurses musthave gotten the impression that mis-triaging occurred moreoften because they have encountered more Step-Up cases.They felt the increased number of cases was due to errors inthe triaging, while the real reason is more likely due to theincrease of IMC patients.” He also added, “The reason for theincrease of IMC patients was not due to the increase ofdiseases or injuries. Instead, it was merely a reflection on theexpansion of IMC care in the hospital.”

The average percentage of Step-Up cases for eachquarter is shown in Fig. 7 to investigate the secondhypothesis. Of the four quarters, the second quarter hasthe highest percentage of Step-Up cases (6.55%), and thefirst quarter has the fewest (5.42%). There is no evidence ofan increase of Step-Up cases in quarter 3. One physiciancollaborator commented that, “The attending physicians(supervisors of the residents) must have been doing a goodjob reviewing the results of the resident triaging process.”He, however, did not offer an explanation for why thenumbers in the first quarter are so much lower than theothers or that the second quarter has much higherpercentage.

Our collaborators have taken the results produced fromour collaboration using Lifelines2 (patient counts in Excelspread sheets, graphs, Lifelines2 screen shots, and annota-tion) to a physicians’ meetings in the hospital. Theconsensus is that these percentages are well-within theboundaries, and changing triage procedures is probably nota necessary course of action at this point. They note that theanalysis is interesting, and would love to keep performingthe same analysis for as long as possible to build anhistorical baseline and so every quarter in the future can beevaluated the same way. Finally, as one collaborators notes,the data can be used to compare to the numbers from otherhospital care systems, for the purpose of hospital metrics.

User experience and feedback

During the early-adoption case studies, several physiciansgave good appraisal to the visual representation in Lifelines2.One collaborator said, “I am a very visual person. To be ableto see the patient records this way allows me to understand itso much quicker and more reliably.” Later when interviewedby Terp Magazine, he commented, “This technology savestime and gives us another important diagnostic tool”, and“[it] will not only make for better care by doctors, but alsohelp patients make healthier choices on their own” [31].Another collaborator chimed in about the Alignment feature,“This is great. To be able to see what occurs before and aftera heart attack in all these patients is great.”

While during the mature adoption case studies, ourphysician collaborators dictated and did not directly interactwith Lifelines2 in the majority of the collaborative analysissessions, they did sometimes drive the application and gavefeedback on using Lifelines2. In the Step-Up case, for example,one collaborator drove the application alone a few times.Through a hospital initiative to explore novel technologies thatcan help improve patient care, he was able to work closely withthe UMD researchers far more frequently than other physicians.

Fig. 4 The quarterly number ofpatients who entered IMC forthe period of January 2007 toMarch 2010

J Med Syst (2011) 35:1135–1152 1141

By the time we started conducting the Step-Up case study, wehad already met a few times, and he became fairly familiar withLifelines2. We observed that he had no problem using Align,Group Selection/Creation, Rank, Temporal Summaries, andSelections on Temporal Summaries. However, he did have aproblem formulating his query into a sequence Filter. Hementioned that he did not see the sequence Filter controlimmediately on screen so it did not remind him how to get tothe Filter. After we showed him how to get to the sequenceFilter, he was able to perform the necessary queries for Step-Up in a matter of minutes by himself.

Over all, he felt that “Lifelines2 would save me so muchtime to deal with all the different scenarios.”, “I wouldnever have to spend hours to write broken Excel scripts thatproduce low-quality data ever again!”. He also commentedthat, “the good thing about Lifelines2 is its visual power[…] I can visually look quickly to see if there is anythingamiss,” and “I can perform the pattern-finding in matter ofminutes, and feel that data is far more reliable [than theExcel spreadsheet he had created] at the same time.”Finally, having spent several months working with Life-lines2 also changed how he views clinical data. In

particular, when I asked him about Lifelines2, he said that,“Yeah, alignment was very different idea, yet so natural! Ithink about clinical problems in terms of alignment now,but none of my coworkers does the same.”

Throughout the interaction with domain experts, weobserved that temporal ordering and aggregation of eventsoften elicit domain knowledge, resulting in a fullerdescription of a patient’s situation than the data wouldconvey to a non-domain expert. For example, seeing apatient with annual attacks of asthma and pneumonia everyautumn led a nursing professor to surmise that the patientmay be an elderly or someone vulnerable to seasonal flu[27]. In the contrast-induced nephropathy case, the emer-gency room director could identify patients who probablyhad chronic kidney problems [28]. The extent andfrequency of contextual domain knowledge being recalledis difficult to quantify. However, it is widely believed in theinformation visualization field that the appropriate repre-sentation of data can lead to better interpretation [12, 13,21], as these two examples illustrate how a physician mayapply domain knowledge to suggest patients’ situation notincluded in plain data.

Fig. 5 The quarterly number ofpatients who exhibited the Step-up pattern for the period ofJanuary 2007 to March 2010

Fig. 6 The percentage of theStep-Up patients over all IMCpatients in the period of January2007 to March 2010

1142 J Med Syst (2011) 35:1135–1152

Despite all the positive feedback, several user interfaceissues were identified in the mature adoption case studies.First, the align-by-all operator can be conceptually confusing.While analysts are presented with instances (duplicates ofrecords), Lifelines2 operates on records. This inconsistencyis a major design problem because the visual representationdiffers from what is happening in the data. Since the firstmeetings on the Bounce-Back and Step-Up studies, thiscritical issue has been fixed. Secondly, some case studiessuch as the Bounce-Back study and the Step-Up studyrequire the merging of a number of groups. Initially weperform merging outside of Lifelines2, but that soonbecomes a burden. New functionalities are then implementedto support typical set operations: union, intersection, anddifference to facilitate analysis. Finally, features that supportmanually creating, adding, or removing records to/from agroup are also implemented to support manual review of therecords in case studies such as BiPAP. Other smaller featuressuch as search-by patient ID, data export, group import,annotation, and screen capture are added to facilitate theoverall analysis process. Some features our collaboratorswould like are left out. For example, the dream feature forour collaborator on the Step-Up case study is automation ofan analysis, as he intends to perform the same study forevery quarter longitudinally. He wants it to run the analysisas a script, and at the end allow him the freedom to visuallyinspect the results so that he can manually change the stepsof analysis if he needs to. For example, if his review of theprocess revealed something wrong, he would like to allowfor different branches of the analysis to take place.

Implementation details and system performance

Lifelines2 is developed entirely in Java. It contains over27,000 lines of code, and uses the Piccolo 2D graphicslibrary [32]. All performance measures in this section areobtained from running Lifelines2 on a 2.4 GHz Intel Core2

Duo laptop running Windows Vista Home Premium Edition64-bit, with 4 GB of RAM. As an inspirational prototype,Lifelines2 is not tightly coupled with an underlyingdatabase system, which allows us to better work withcollaborators who may use a variety of different storagetechnologies. Lifelines2 takes in a simple three-column textfile, where the first column lists the patient IDs; the secondlists the event type; and the third lists the time stamps. Eachrow thus describes one event that belongs to a patient. Thisformat is designed to be easily created by any databasesystem or spreadsheet. Because Lifelines2 is not connectedto a database, the amount of data it can load is limited toamount of RAM available on the machine (and assigned tothe Java Virtual Machine). In our tests, Lifelines2 is able toload up to over 100,000 records with over 1,500,000 eventswhen the Java Virtual Machine is allocated 1 GB of RAM.

The main functionalities in Lifelines2 are its visuali-zation and its interaction techniques. In order to providea sense of fluidity when using the system, much workhas been dedicated to make sure the drawing and theinteraction are optimized. For example, drawing of theEHRs only occurs for visible EHRs and developing anovel pattern search algorithm [33]. The drawing speed isdetermined by the number of events that appear on screen,while the query (performing Align-Rank-Filter) speed isdetermined by the number of records in the set, and thedensity of events. Figure 8 shows a comparison ofLifelines2’s query time and render time over five datasetof different sizes. The data points represent the average often different trials. A linear fitting is performed to showthe slope. Query time outpaces render time quickly as thedata size grows. Analyzing the details of the trials revealsalso that the query time has twice as much standarddeviation as the render time. The bound at which a userexperiences a noticeable delay is if an action or redrawtakes longer than 160 milliseconds. This is only achiev-able for data size of around 10,000 records for querying

Fig. 7 The average percentageof Step-Up cases over all IMCpatients in the period of January2007 to March 2010

J Med Syst (2011) 35:1135–1152 1143

and rendering. When the data size grows past 50,000,rendering has a relative short turn-around of 200 milli-seconds, but querying takes up 400 milliseconds.

Interaction logs

The case studies such as the one presented in Case study:Identifying step-up patterns demonstrate how Lifelines2can be beneficial to analysts in medical scenarios. However,some case studies rely on a set of Lifelines2 features morethan the others. For example, in replicating a study thatlinks heart attack incidents to daylight savings time change[29], analysts found the features in temporal summaries inconjunction with Alignment are sufficient. Alignment, Rank,and Filter were not necessary. In the Step-Up study,however, more features are required.

By September of 2008, most of Lifelines2 features werecomplete. Since then, the logging facilities in Lifelines2had been logging all user actions. The logs keep track ofanalysts’ every action in Lifelines2—Align, Rank, Filter,Zoom, Scroll, etc. The Lifelines2 log output is in the format

of Lifelines2 input, so the logs can be read by Lifelines2 forour analysis. There are a total of 2477 Lifelines2 sessionlogs. However, many of the logs are short, and no casestudy files were opened aside from the default sample file.These are indicative of testing/debugging sessions insteadof analysis/exploration sessions. After removing thesetesting/debugging sessions, 426 real sessions remain. Weloaded these sessions into Lifelines2 for analysis. Thetemporal summary in Fig. 9 shows the number of events ofAlign, Rank, and Filter. The minimal amount of activity inJanuary 2009 and summer of 2009 represents winter breakand summer vacation. The peculiar spike in October of2009 represents frequent meetings and analysis of thehospital transfer data with our collaborators. The amount ofoperations in that period was reflective of the fact that thesecase studies involved over 15 datasets and many analyses.Table 2 summarizes the logs of the usage of Lifelines2 forthe 426 sessions. The operations are broken down into fivemain categories: ARF, Temporal Summary, Comparison,Data Operations, and Navigation. The table includes theraw number of counts, counts per session, and percentageof sessions that logged such operations.

Fig. 8 The time growth com-parison for ARF (query) time,and render time in Lifelines2over 5 datasets of various sizes

Fig. 9 Lifelines2’s temporalsummary shows the distributionof Align, Rank, and Filter usagein 426 case studies sessionsfrom September 2008 toFebruary 2010

1144 J Med Syst (2011) 35:1135–1152

With respect to the ARF Framework, Filter was themost-frequently used operator. Alignment was second, andtrailed by Rank. However, a larger percentage (87%) ofsessions recorded at least one use of Align, while only 68%had any Filter. While Rank was useful to reorder therecords by their event counts, it was ultimately not a vitaloperator in our case studies. When pairs of Align, Rank, andFilter were looked at as sequences,

[Align, Filter] and [Filter, Align] occurred in 250 (59%)and 211 (50%) sessions respectively. [Align, Rank] and[Rank, Align] occurred at 162 (38%) and 123 (29%)sessions respectively. Finally, [Rank, Filter] occurred in148 (37%) sessions, and [Filter, Rank] occurred in only 48(11%) sessions. When looking at sequences of threeoperators, the break down (number of sessions that hadthe contained the sequence) is as follows: ARF ([Align,Rank, Filter]): 123, AFR: 41, FAR: 37, FRA: 27, RFA: 104,and RAF: 87. These numbers indicate that although Rank isthe least popular of the three operations, when it is used,Rank is typically used prior to Align or Filter, or both.

30% of the sessions used temporal summaries, and 24%used selections in temporal summary. However, the average

number of these operations across all sessions was over 1per session. This means that in sessions that theseoperations were used, they were used many times, so muchthat the average count per record is brought up. Theoperations under the Comparison feature only occurred in11–13% of all sessions. However, analysts tended tochange the event types in the comparison and the groupsin the comparison heavily. Changing the type of compar-ison (Between Group/Within Group/Both) or the type ofaggregation (Events/Records/Events Normalized By Re-cord Count) were less frequently used.

In these EHR case studies, analysts tended to use KeepSelected as opposed to Remove Selected in conjunction withfiltering. Save Group occurred in 29% of the sessions whileChange Group occurred only in 23%. This indicates thatfor some datasets, analysts would save a group, but notchange into that group specifically. This situation occursbecause Lifelines2 automatically brings the analysts to thenewly created group without having them perform thegroup change themselves. By raw counts, Change Group,as expected, is used more frequently than Save Group.

The first thing to notice in the navigation operations isthat Scroll (to pan vertically) is a dominant operation. Everysession involved scrolling, and on average, each session hasmore than 16. Changing the Time Range Slider (to zoom orpan horizontally) was a distant second in usage in thiscategory. In contrast, Change Granularity (temporal gran-ularity) was not as popular. This may be attributed to thefact that using the Time Range Slider analyst can control thetemporal range more finely. Even after using the cruderChange Granularity, an adjustment in the Time RangeSlider was often necessary. Zoom In was used more oftenthan Zoom Out. This is attributed to the fact that users canperform zoom out by using Change Granularity or use theTime Range Slider. Collapse Record and Expand Recordwere the least used features. These features collapse thevertical space of each EHRs so that more can fit in onescreen, or expand them to see details more clearly.

A process model for exploring temporal categoricalrecords

Thomas and Cook’s defining book on visual analytics,Illuminating the Path: the Research and DevelopmentAgenda for Visual Analytics [23], presents a canonicalprocess model for analytical reasoning:

1. Gathering information.2. Re-representing the information to aid analysis.3. Developing insight through the manipulation of the

representation.4. Producing results from the insight.

Table 2 Operator usage in Lifelines2 through our case studies

Operation Count Average/session % sessions

ARF

Align 1680 3.94 87%

Rank 260 0.61 42%

Filter 2564 6.02 68%

Temporal summary

Show summary 623 1.46 30%

Temporal selection 531 1.25 24%

Comparison

Event type change 406 0.95 11%

Comparison type change 79 0.18 12%

Group change 406 0.95 12%

Distribution type change 179 0.42 13%

Data operation

Keep selected 400 0.94 30%

Remove selected 96 0.23 18%

Save group 409 0.96 29%

Change group 687 1.61 23%

Navigation

Zoom in 646 1.52 30%

Zoom out 157 0.37 13%

Time range slider 1865 4.38 29%

Change granularity 217 0.51 14%

Scroll 6840 16.10 100%

Collapse 55 0.13 8%

Expand 30 0.07 5%

J Med Syst (2011) 35:1135–1152 1145

This process is repeated as necessary to complete theanalyses. We extend this canonical process model andconstruct one specifically for exploring temporal categori-cal records by filling out the four steps in detail. We use amultidimensional approach advocated by Shneiderman andPlaisant [22]. The observations of, interviews with, andcomments from our collaborators are corroborated withLifelines2’s log data to construct the detailed model. Anexample sequence of visual analytics steps using Lifelines2is shown in Fig. 10.

Gathering information

Since Lifelines2 is not directly linked to the databases ahospital may have, we obtain our data through ourphysician collaborators, facilitated by database administra-tors. After physicians decide on a medically interesting casestudy, they begin scoping of the data that they want toexamine, e.g. the scope of patients, time frame, relevantevents. The physicians then request database administratorsof the hospital to gather the requisite data from the EHRsystem. The de-identified data is then preprocessed intoLifelines2 format for our case studies. At any point of theanalytic process, we sometimes revisit this informationgathering stage because the physicians (1) become unsat-isfied with the data, (2) found systematic errors in the data,or (3) want to incorporate more data for deeper analyses.

In our experience, the information gathering stagetypically takes a long time, because of the complexity ofthe data, underlying data semantics, and infrastructural or

organizational barriers. For example, a case study mayrequire data residing in several potentially isolated, data-bases and medical terms using different IDs and codes ineach database. With the paucity of documentation on themapping of medical terminology to the terms used in thedatabase schema, a specific medical term may be difficultto search, and may require first finding someone whoknows where it is. Even with physicians working closelywith database administrators, this stage can take from daysto weeks, depending on how involved the case study is andthe ease in locating the data.

Re-representing information

After de-identification and preprocessing of the raw datato Lifelines2 format, the data is loaded in Lifelines2, andour physician collaborators would examine the resultvisually in Lifelines2. The visualization enables physi-cians to better see temporal relationships and investigatecommon predecessors or successors to a specific eventacross patients. The physicians would cursorily browseand sometimes examine in detail the data to make surethe data reflects what they know. One of the mostcommon results in seeing data for the first time in a newvisualization is the discovery of interesting artifacts suchas systematic errors, lack of data consistency, etc. Forexample, when the data in the mature heparin-inducedthrombocytopenia case study was first converted, ourphysician collaborators found that some patients weregiven drugs after they had been discharged dead! We were

Fig. 10 An example sequence of visual analytics steps using Lifelines2in the early adoption contrast and creatinine case study. Alignment isapplied to focus on the orange Contrast events. Rank is applied to show

patients with the highest number of red Low Creatinine events, andTemporal Summary and Zoom are used to see details

1146 J Med Syst (2011) 35:1135–1152

able to find 7 such cases and determined that they alloccurred within 1 h of their Discharge Dead events. Byconsulting the original dataset to make sure this was not anerror that occurred in preprocessing, our physiciancollaborators were able to conclude that this occurredbecause of the systematic delays in the drug databasewhile the data such as Patient Discharged events in otherdatabases are unaffected by the delay. For that case study,this incident raised questions on how reliable the timestamp was for drugs, and whether subsequent case studieswould be affected. We eventually found better data tocircumvent this particular systematic problem.

Another scenario included inaccuracies in the datasetsuch as a patient is admitted once, but discharged multipletimes. This situation can occur when, for example, multipledatabases keep track of the patients discharge status, andwhen data from these databases are merged naively.Sometimes, however, the data is not usable or is discoveredto be unsuitable for a particular case study, and we wouldtake a step back to the data acquisition stage.

Manipulating representation to gain insight

Utilizing different search strategies

After our physician collaborators gain confidence in thedata and become familiar with Lifelines2’s visualization,they start seeking answers to their questions or findingevidence for their hypotheses. They would change visualrepresentation of the data in order to see event relationshipsmore clearly. This is where visual and data operators suchas Align, Rank, Filter, and Temporal Summary are used toperform exploratory search. As Table 2 suggests, the orderand frequency of each of the operators’ occurrence differcase-by-case. In addition, different analysts have differentexploratory search style. We have observed that someanalysts would apply Alignment on different sentinel eventsin the same exploratory session to look at the data indifferent views. By using different Alignment whileshowing distribution of certain events they care about intemporal summary, they aimed to find useful or telling“sentinel” events with respect to the events in the temporalsummary.

Some analysts take a more traditional approach. Theyactively manipulate the display by Aligning, Ranking,Filtering iteratively, or changing the temporal summary.When issuing these manipulation operators, analysts wouldsometimes issue a number of them in succession, andobserve the results only at the end of the series ofmanipulations. This occurs when they are very familiarwith the data, perhaps have already gone through a fewrounds of analysis. When they are less familiar with thedata, they tend to act more tentatively and deliberately. We

observed careful examination of the data after each datamanipulation operator is applied. They would makemeticulous observations on the distribution of events, andcomment on whether what they see conform to theirunderstanding of the domain.

Regardless of the strategy they used, Alignment remainedthe strongest indicator on their focus on data. A change inAlignment indicates a change of exploratory focus. When thecollaborators realize that an Alignment would not lead themto the information they seek, they would reformulate thequestion and subsequently apply a new Alignment. This hadbeen observed with different physicians in different casestudies, including the mature heparin-induced thrombocyto-penia study and the BiPAP study.

Watching results of manipulation

Another important observation of the analysts in explorato-ry search is that the analysts paid special attention to thechange of data at each step of the data manipulation. Whilethis is not specifically mentioned in the canonical processmodel, we observed this to be a major part of exploratorysearch in all of our case studies. The physicians would payclose attention to the records on screen. They would alsopay attention to how the overall data changes, often bylooking at the temporal summaries. Lifelines2 updates thevisualization immediately after each of the manipulationoperators, and this facilitates our collaborators in identify-ing the differences between each manipulation step. Table 2confirms how often the physicians used Scroll to viewrecord details. Closer analysis revealed that there are twohot spots where many scroll operations are performed. Thefirst is when the analysts are examining the data for the firsttime, where the goal is to obtain confidence on thecorrectness of the data. The second is after the Align, Rank,and Filter operator have been applied. For example,identifying the nuances of sequence Filters in the Step-Upcase study was accomplished in this manner.

Aside from manual scrolling, our physician collabo-rators would also keep an eye on (1) the distribution oftheir favorite events in the temporal summary and (2) thenumber of records in view out of the total number ofrecords. We observe that when Align, Rank, Filter andGroup operators are applied, the analysts would focus onthese two things. They give the analysts a global feel ofhow the data is changing when the operators are applied.In fact, in cases where heavy exploration is needed, as inthe early adoption heparin-induced thrombocytopenia casestudy, we notice that the physicians keep their eyes fixedon the temporal summary as a variety of Filters areapplied. When we work on the mature heparin-inducedthrombocytopenia study [28], a different set of physiciansalso show the same tendency to focus on the temporal

J Med Syst (2011) 35:1135–1152 1147

summaries as the data is being manipulated. The physi-cians often ask questions out loud on how a particularmanipulation (e.g. Align, Filter), changes the distributionin the temporal summaries, and they correlate what theysee with their experience to better examine if there areunexpected insights. By fixing their attention on thetemporal summaries, they gain a good mental model ofthe data and how the successively applied operators affectthe data. They then can decide if their path of explorationseems to be on the right track.

When manipulating representations of the data iterative-ly, successively, and quickly, the number of representationsgrows very fast. We anticipated this need and built in somerudimentary mechanisms in Lifelines2 to support resettingand reverting Filters and saving subsets of records that canbe referred to later on. For example, if physicians do notlike a previously applied operator, they can backtrack to aprevious state, and rethink their approach. Because ourphysician collaborators want to keep track of the manipu-lations they take and understand how these manipulationschange the data globally, we have actually observed themcombining Lifelines2’s features to perform novel tasks thatwere not initially anticipated. Our physician collaboratorswould use the grouping operators to create successivelysmaller groups of patients via Filters and temporalsummaries. They would then use the comparison featureto show multiple temporal summaries on these previouslycreated groups to examine if the successive Filters seemedto be fruitful, or to determine if a Filter is too aggressive.This technique was used in the case studies reported in[28]. Over all, temporal summaries provide indispensableguidance to the physicians. Although focusing on temporalsummaries was quick, our collaborators would still examinethe records individually when they have the chance, thoughnot exhaustively.

Handling findings

In the process of watching the result of each manipulation,our collaborators are often confronted with a variety offindings. These findings may be a positive one (e.g. onethat helps them answer their question), negative (e.g. onethat tells them the answer they seek cannot be answered), orunexpected (e.g. an unanticipated characteristic of the datais found, prompting new questions). Regardless which typeof finding is reached, our collaborators would, in general,double check their work by revisiting the groups they havepreviously created and comparing their distributions, or re-performing the data manipulation steps. They would thensave their work for dissemination (if it is a positive finding)or for later investigation. Here we present the detailedstrategies our collaborators use when each of these types offindings are encountered.

When our collaborators arrive at a point where theirquestions might be answered, they would use their domainknowledge to comprehend and explain what they see. Theyfirst verify how they get to the point by looking at thegroups they have created before. They then examine thedata in detail to decide whether their questions areanswered satisfactorily. If so, then they would take notesand prepare the results for dissemination. More often thannot, however, they would find additional, new questions topursue. When this occurs, we have observed our physiciancollaborators to utilize their domain knowledge to try toalso explain the scenario (e.g. one physician would narrateand reason about why certain EHRs share similar patternswhile the others do not) and decide if this is a relevant topursue. If they are interested in the new question, theywould save their current search progress and immediatelychange the focus to the new question. Alternatively, theywould write down new questions for later exploration. Wehave observed both of these strategies. Depending on thekind of new questions that arise, sometimes it may involveadditional data—in which case, the process would loopback to the Information Gathering stage—and sometimes itmay only involve using a different set of Filters to branchthe exploration paths. This has occurred in a number ofcase studies, including the hematocrit and trauma patientstudy and the mature heparin-thrombocytopenia study,where discharge status and specific drug prescriptions werelater added to investigate new but related questions.

Aside from saving states or jotting down notes, when ourcollaborators encounter unexpected interesting findings, theywould additionally make screenshots and annotate immedi-ately what they see. They would use the built-in screencapture feature in Lifelines2 so they can easily remember whatis unexpected and also to share with their colleagues. Theywould also use the annotation tool in Lifelines2, althoughannotation was recorded to only occur in 5% of all loggedsessions (not listed in Table 2), to annotate the screenshots. Ifthey discovered a set of interesting patients, they would savethem and export the results so they can examine the result setof patients in detail, including cross-referencing against theirlive medical database system.

Unfortunately, sometimes a dead-end is reached. If it isthe case that more data is required, or that betterpreprocessing of the data can help breakthrough the dead-end, we would return to the Information Gathering stage,and restart the analysis. On the other hand, the dead-endcan be caused by the limitations of Lifelines2. This occurswhen the analysis require features Lifelines2 does notsupport (e.g., temporal search of numerical values ordealing with patient attributes). Sometimes the limitationsof Lifelines2 can be compensated by other systems. Forexample, in the heart attack and daylight savings casestudy, we used Excel as a platform to compute average

1148 J Med Syst (2011) 35:1135–1152

incidents per day. In general, however, unless we find aworkaround, the case study would discontinue, such as thesituation hematocrit in trauma patient cases study encoun-tered. It is worth noting that none of our mature adoptioncase studies ran into dead-ends due to Lifelines2’slimitations. Over the years, our physician collaboratorshave become good at picking medical case studies that areappropriate for Lifelines2. This suggests that the features inLifelines2 facilitate certain ways of thinking, and it takesanalysts some time to comprehend and frame their ques-tions appropriately to suit Lifelines2’s features.

Producing and disseminating results

Finally, when analysts are able to obtain answers to theirquestions, they would prepare their findings. Our collabo-rators routinely keep subsets of EHRs that represent the fruitof their labor, screen shots, annotations, and spreadsheetscreated in our collaborative exploration sessions. They wouldadditionally ask us to package up the raw data and the finalresults so their colleagues can verify and examine the analysisresults. Our collaborators would then stitch up the results in apresentation to present to colleagues. They show theircolleagues or supervisors to argue for or against a proce-dure/policy change as in the Bounce-Back, Step-Up, andBiPAP case studies. These results can help hospitals monitorthe quality of their healthcare, and potentially save operationalcost. For example, the results of the Step-Up case study havebeen presented in at least one physicians’ internal meeting,and have been used as evidence to show the step-up rates arenormal, and there is no need to change the standard operatingprocedure of triage.

Summary of the process model

1. Gathering Information2. Re-representing Information3. Manipulating Representation to Gain Insight

a. Utilizing Different Search Strategiesb. Watching Results of Manipulationc. Handling Findings

4. Producing and Disseminating Results

This visual analytics process model is designed topromote insight discovery, based on systematic, yet flexibleefforts at data understanding, data cleaning, appropriaterepresentations, hypotheses generation, hypotheses valida-tion, and then extraction of evidence to share withcolleagues. This process is iterative, and requires userdirection, as it is not yet built into the user interface. Thisprocess model is meant to guide users in learning to dodiscovery in temporal event data and to help designersimprove future visual analytic tools.

Recommendations

The case studies, Lifelines2 logs, and observations haverevealed some interesting user behaviors when dealing withmultiple EHRs. They have also revealed the strengths andweaknesses of Lifelines2. We generalize these into thefollowing seven design recommendations for future devel-opers of visualization tools for multiple EHRs. While someare closely related to Lifelines2, we do try to make them asgeneral as possible.

1. (Use Alignment) The usefulness of Alignment wasevident in the Lifelines2 logs and from observationsand collaborator comments. The user logs corroboratethe findings of Alignment in our previous controlledexperiment [27]. When dealing with a large number ofEHRs, the ability to use Alignment to impose a strictrelative time frame was important to our collaborators.It allowed for quicker visual scanning of the data alongthe Alignment. The dynamism of Alignment allowed theanalysts to quickly switch perspectives and focus ifthey need to. The idea of “anchoring” the data by datacharacteristics for exploration had been successful invisualization of other complex data such as networkdata [34, 35], and Alignment seems to be one naturalversion of it for temporal data. Developing futurevisualization systems for EHRs should leverage onAlignment for its power, flexibility, and wide range ofapplicability. We would encourage researchers tofurther explore alternative “anchoring” techniques intemporal visualization.

2. (Show details) One surprising finding was that ourcollaborators liked to look at the details of the records.One piece of evidence is that Scroll was the mostfrequently used operation. Seeing and comparing thedetails of records seem to reassure the analysts that nodata are missing, broken, or lost along the analysisprocess. Another piece of evidence was that theCollapse operator, which makes details harder to see,was hardly used, despite the fact that collapsed recordstake far less space, and more can fit on a screen. Oursecond recommendation is that detailed depiction of therecords is important, even for multiple EHR visualiza-tion, and even if the primary view of the data was to bein an overview, where details may be hidden.

3. (Overview Differently) We observed that our collab-orators tended to focus on the overview most of thetime to get a sense of what each Filtering operatordoes. However, Lifelines2 only provides overview inthe form of temporal summaries. Additional concurrentoverviews may be beneficial. For example, in additionto the “horizontal” temporal summaries, “vertical”overviews can simultaneously show a different aggre-

J Med Syst (2011) 35:1135–1152 1149

gation over records, such as [36, 37]. Furthermore, agood vertical overview design may reduce the amountof Scroll necessary, improving overall user performancein information seeking.

4. (Support Richer Exploration Process) The features inLifelines2 that support branching in exploration such asSave Group and Change Group are, by today’s stand-ards, rudimentary. However, they were both usedfrequently, and we have received comments from ourcollaborators that a lot of improvements in this regardare desired. As analysis processes becomes more andmore involved, analysis tools need to better supportbranched search, history keeping, and backtracking. Anadditional requirement for visual analysis systems is toallow users to perform history keeping, backtrackingwith respect to visualization, not just data. For example,to be able to revert quickly from one representation to aprevious one can help users better maintain a consistentmental model.

5. (Support Flexible Data Types) Some of the earliercase studies stopped because Lifelines2 does notsupport numerical values. We discovered that depend-ing on the focus of a medical scenario, sometimes ourcollaborators reasoned at a higher abstraction (catego-ries), and sometimes lower (numerical values), andsometimes the abstractions change within the samescenario. Most visualization systems focus on eithercategorical data or numerical data, but there are a fewsystems that visualize machine-created abstractions [17,19]. Our experiences suggest that a visualizationsystem that supports temporal analysis seamlessly inmultiple, user-driven, and dynamically constructed,abstractions will be valuable.

6. (Increase Information Density) The high amount ofscrolling we recorded indicates that the amount of dataour collaborators want to see is typically much largerthan a screen can hold. It is important to improveinformation density in Lifelines2 and other time-linebased visualizations, e.g [13, 16, 38],.. From ourexperience, the reason our physician collaborators lookat the detail of the records is to better understand eventsequences across multiple EHRs. To be able to increaseinformation density and preserve the event sequencesthey care about is very important. Finally, a good“vertical” overview (Recommendation 3) may alleviatethis problem at the same time.

7. (Integration with Live Databases) Today’s clinicalinformation systems contain invaluable informationthat can be analyzed to monitor and improve health-care. As [39] suggests, newer medical informationsystems should contain analytical modules that allowdirect connection to the data and provide tools foranalysis. We have taken a different route in building

and evaluating Lifelines2. We built a visual analysistool that is agnostic to a underlying storage architec-ture. As a result, our case studies require considerableamount of efforts to collect, de-identify, preprocess,and convert the data just to begin the analysis. Thispresents a significant barrier to physician analysts. Atighter integration with a medical information systemthat has analysis tasks in mind will be better. However,more generalized tools to convert existing data toanalyzable data are what will give physicians thepower to take the analysis from end-to-end. Whendesigning medical systems, this should be one of thepriorities.

While these recommendations are derived from ourthree-year long design-implement-evaluation process,they are certainly not exhaustive. A different systemmay derive a different, perhaps overlapping, set ofrecommendations due to the difference in interactionand visual design, target domain, user, and analyticaltasks. While each system can only capture the userexperience and analytical styles from its perspective andmake recommendations accordingly, we hope our in-depth discussions can foster future dialogues.

Conclusions

We believe the definition of a successful EHR system isnot only the storage, retrieval, and exchange of patientdata. It should support tasks its end-users care about, andit should be usable and useful. Only then will EHRsystems provide value to its end-users and broaden itsbase of end-users. Collaborating with physicians over thepast two and half years, we focused specifically ontemporal categorical data analysis tasks. Using Life-lines2, our collaborators were able to make interestingdiscoveries and help improve patient care. We present ageneralization of our eight case studies visualizing EHRdata using Lifelines2. By analyzing the feature usagedata, user comments, and study observations, we presentan visual analytics process model for multiple EHRs anda list of recommendations for future information visual-ization designers for EHR systems for the tasks oftemporal data analysis. While some of our results arelimited to capabilities of Lifelines2 applied to EHRs, wewere able to draw several other more general recom-mendations. In this era of vast opportunities for EHRsystems, we have made only a small step towardsvisualization and interface design. We encourage theinformation visualization designers to continue building auser-centered, task-based design requirements and pro-cess models for the betterment of EHR end-users.

1150 J Med Syst (2011) 35:1135–1152

Acknowledgements We appreciate the support from NIH-NationalCancer Institute grant RC1CA147489-02: Interactive Exploration ofTemporal Patterns in Electronic Health Records. We would like tothank MedStar Health for their continued support of our work. Wewould like to thank Dr. Mark Smith, Dr. Phuong Ho, Mr. DavidRoseman, Dr. Greg Marchand, and Dr. Vikramjit Mukherjee for theirclose collaboration.

References

1. Wilcox, L., Morris, D., Tan, D., and Gatewood, J., Designingpatient-centric information displays for hospitals. Proc. of the 28thACM Int Conf on Human Factors in Comp Syst. 2123–2132. NewYork, NY, USA. ACM, 2010.

2. Wilcox, L., Lu, J., Lai, J., Feiner, S., and Jordan, D., Physician-driven management of patient progress notes in an intensive careunit. Proc. of the 28th ACM Int Conf on Human Factors in CompSyst. 1879–1888. New York, NY, USA. ACM, 2010.

3. Zhou, X., Ackerman, M. S., and Zhen, K., Doctors andpsychosocial information: Records and reuse in inpatient care.Proc. of the 28th ACM Int Conf on Human Factors in Comp Syst.1767–1776. New York, NY, USA. ACM, 2010.

4. Chen, Y., Documenting transitional information in EMR, Proc. ofthe 28th ACM Int Conf on Human Factors in Comp Sys. 1787–1796. New York, NY, USA. ACM, 2010.

5. Sarcevic, A., “Who’s scribing?”: Documenting patient encounterduring trauma resuscitation, Proc. of the 28th ACM Int Conf onHuman Factors in Comp Syst. 1899–1908. New York, NY, USA.ACM, 2010.

6. Microsoft., http://www.microsoft.com/amalga/7. Murphy, S., Mendis, M., Hackett, K., Kuttan, R., Pan, W.,

Phillips, L., and Gainer, V., et al., Architecture of the open-sourceclinical research chart from informatics for integrating biology andthe bedside. Proc. Am. Med. Inform. Assoc. Ann. Symp. 598–602,2007.

8. Viederman, M., The psychodynamic life narrative: A psychother-apeutic intervention useful in crisis situations. Psychiater. 46(3):236–246, 1983.

9. Greenhalgh, T., Narrative based medicine in an evidence basedworld. Brit. Med. J. 318(7179):323–325.

10. Charon, R., A model for empathy, reflection, profession, and trust.J. Am. Med. Assoc. 286(15):1897–1902, 2001.

11. Powsner, S., and Tufte, E., Graphical summary of patient status.Lancet 344:386–389, 1994.

12. Plaisant, C., Mushlin, R., Snyder, A., Li, J., Heller, D., andShneiderman, B., Using visualization to enhance navigation andanalysis of patient records. Proc. Am. Med. Inform Assoc. Ann.Symp. 76–80, 1998.

13. Bade, R., Schelchweg, S., and Miksch, S., Connecting time-oriented data and information to a coherent interactive visualiza-tion. Proc 22nd ACM Int Conf on Human Factors in Comp Syst.105–112. New York, NY, USA, 2004.

14. Pieczkiewicz, D. S., Finkelstine, S. M., and Hertz, M. I., Designand evaluation of a web-based interactive visualization system forlung transplant home monitoring data. Proc. Am. Med. Inform.Assoc. Ann. Symp. 598–602, 2007.

15. Horn, W., Popow, C., and Unterasinger, L., Support for fastcomprehension of ICU data: Visualization using metaphorgraphics. Method Info. Med. 40(5):421–424, 2001.

16. Wongsuphasawat, K., and Shneiderman, B., Finding comparabletemporal categorical records: A similarity measure with an

interactive visualization. Proc. IEEE Symp. Vis. Ana Sci. andTech. 27–34, 2009.

17. Post, A. R., and Harrison, J. H., Protempa: A method for specifyingand identifying temporal sequences in retrospective data for patientselection. J. Am. Med. Inform. Assoc. 14(5):674–683, 2007.

18. Hinum, K., Miksch, S., Aigner, W., Ohmann, S., Popow, C.,Pohl, M., and Rester, M., Gravi++: Interactive informationvisualization to explore highly structured temporal data. J. UnivComp. Sci. – Spec. Iss. Vis. Data Mining 11(11):1792–1805,2005.

19. Klimov, D., Shahar, Y., and Taieb-Maimon, M., Intelligentselection and retrieval of multiple time-oriented records. J. Intel.Inform Syst. (Published Online), 2009.

20. Chittaro, L., Combi, C., and Trapasso, G., Data mining ontemporal data: A visual approach and its clinical applications tohemodialysis. J. Vis. Lang. Comput. 14:591–620, 2003.

21. Rind, A., Wang, T. D., Aigner, W., Miksh, S., Wongsuphasawat,K., Plaisant, C., and Shneiderman, B., Interactive informationvisualization for exploring and querying electronic health records:A systematic review. Technical Report HCIL-2010-19, Human-Computer Interaction Lab, University of Maryland at CollegePark, 2010.

22. Shneiderman, B., and Plaisant, C., Strategies for evaluatinginformation visualization tools: Multi-dimensional in-depth long-term case studies. Proc. Beyond Time and Errors: Novel Eval.Metho. Inform Vis. Workshop, 38–43, 2006.

23. Card, S. K., Mackinlay, J. D., and Shneiderman, B., Readings ininformation visualization: Using vision to think. Morgan Kaufmannpublishers, San Francisco, 1999.

24. Pirolli, P., and Card, S. K., The Sensemaking process andleverage points for analyst technology as identified throughcognitive task analysis. Proc. Inter. Conf on. Intel. Analy, 2–4,2005.

25. Thomas, J. J., and Cook, K. A., Illuminating the path: Theresearch and development agenda for visual analytics. NationalVisualization and Analytics Center, 2004.

26. Hansen, D. L., Rotman, D., Bonsignore, E., Milić-Frayling, N.,Rodrigues, E. M., Smith, M., and Shneiderman, B., Do youknow your way to NSA?: A process model for analyzing andvisualizing social media data. Technical Report HCIL-2009-17,Human-Computer Interaction Lab, University of Maryland,2009.

27. Wang, T. D., Plaisant, C., Quinn, A. J., Stanchak, R., Murphy, S.,and Shneiderman, B., Aligning temporal data by sentinel events:Discovering patterns in electronic health records. Proc. of the 26thACM Int Conf on Human Factors in Comp Syst. 457–466. NewYork, NY, USA. ACM, 2008.

28. Wang, T. D., Plaisant, C., Shneiderman, B., Spring, N., Roseman,D., Marchand, G., Mukherjee, V., and Smith, M., Temporalsummaries: Supporting temporal categorical searching, aggrega-tion, and comparison. IEEE Trans. Vis. Comput. Graph. 15(6):1049–1056, 2009.

29. Janszky, I., and Ljung, R., Shifts to and from daylight saving timeto incidence of myocardial infarction. New Engl. J. Med 359(18):1966–1968, 2008.

30. Wang, T. D., Wongsuphasawat, K., Plaisant, C., andShneiderman, B., Visual information seeking in multipleelectronic health records: Design recommendations and aprocess model. Proc 1st ACM Int Health Infor Symp, 46–55.NY, USA. ACM, 2010.

31. Ventsias, T., Health IT: An Rx for health care. Terp 7(1):20–23,2009.

32. Bederson, B. B., Grosejean, J., and Meyer, J., Toolkit design forinteractive structured graphics. IEEE Trans. Softw. Eng. 30(8):535–546, 2004.

J Med Syst (2011) 35:1135–1152 1151

http://www.microsoft.com/amalga/

33. Wang, T. D., Deshpande, A., and Shneiderman, B. A.,Temporal pattern search algorithm, IEEE Trans Knowl andData Engin, Pre-print, http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.257, 2010.

34. Lee, B., Parr, C. C., Plaisant, C., Bederson, B., Veksler, V. D.,Gray, W. D., and Kotfila, C., TreePlus: Interactive exploration ofnetworks with enhanced tree layouts. IEEE Trans. Vis. ComputGraph Spec. Issue Vis. Anal. 12(6):1414–1426, 2006.

35. Aris, A., and Shneiderman, B., Designing semantic substrates forvisual network exploration. Info. Vis. 6(4):281–300, 2007.

36. Wongsuphasawat, K., Guerra Gómez, J., Plaisant, C., Wang, T.D., Taieb-Maimon, M., and Shneiderman, B., LifeFlow: Visual-

izing an overview of event sequences. To appear in Proc of the29th ACM Int Conf on Human Factors in Comp Syst. New York,NY, USA. ACM, 2011.

37. Vrotsou, K., Johansson, J., and Cooper, M., ActiviTree: Interactivevisual exploration of sequences in event-based data using graphsimilarity. IEEE Trans. Vis. Comput. Graph. 15(6):945–952, 2009.

38. Stottler Henke. DataMontage., http://www.stottlerhenke.com/datamontage/

39. Patrick, J., and Budd, P., Ockham’s razor of design: An heuristicfor guiding design and development of a clinical informationsystems generator. Proc 1st ACM Int Health Infor Symp, 18–27.NY, USA. ACM, 2010.

1152 J Med Syst (2011) 35:1135–1152

http://dx.doi.org/doi.ieeecomputersociety.org/10.1109/TKDE.2010.257

http://dx.doi.org/doi.ieeecomputersociety.org/10.1109/TKDE.2010.257

http://www.stottlerhenke.com/datamontage/

http://www.stottlerhenke.com/datamontage/

Documents

Extracting Insights from Electronic Health Records: …ben/papers/Wang2011Extracting.pdfORIGINAL PAPER Extracting Insights from Electronic Health Records: Case Studies, a Visual Analytics