Time Series Lecture 6



  • Slide 1
  • Lecture 6
  • Slide 2
  • Outline: 1. Terminology and user tasks for time series 2. Limits of statistical parameters 3. Examples in Tableau 4. Examples of other tools 5. Design principles and the high-dimensionality challenge. Objectives: 1. Examine a number of case studies 2. Learn from some of the different visualization ideas that have been created 3. Can you generalize these techniques into classes or categories?
  • Slide 3
  • 1. Time Series Data. A fundamental chronological component to the data set. A random sample of 4,000 graphics from 15 of the world's newspapers and magazines published from 1974 to 1980 found that 75% of the graphics published were time series (Tufte, Vol. 1).
  • Slide 4
  • Taxonomy. Time series: discrete points vs. interval points; linear time vs. cyclic time; ordinal time vs. continuous time; ordered time vs. branching time vs. time with multiple perspectives. Cross-section data (multiple subjects/levels). Panel/TSCS (time-series cross-sectional) data.
  • Slide 5
  • 1. Terminology. Data sets: each data case is likely an event of some kind, e.g. sunspot activity, baseball games, medicines taken, cities visited, stock prices. One of the variables can be the date and time of the event. Other data attributes: prices, trade volume.
  • Slide 6
  • 1. Meta Level. Consider multiple stocks being examined. Is each stock a data case, or is a price on a particular day a case, with the stock name as one of the other variables? This is a confusion between the data entity and data cases. For time series data covering multiple stocks (e.g. a 401(k) or mutual fund profile), distinguish the data entity, the data cases, and the data attributes.
  • Slide 7
  • 1. Data Exploration vs. Data Mining. The data mining domain has techniques for algorithmically examining time series data, looking for patterns, etc. These are good when the objective is known a priori. But what if it is not? Which questions should I be asking? InfoVis and data exploration are better for that.
  • Slide 8
  • 1. Applications. Autocorrelation analysis to examine serial dependence. Spectral analysis to examine cyclic behavior, which need not be related to seasonality; for example, sunspot activity varies over 11-year cycles. Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity. Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity.
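The autocorrelation analysis mentioned on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the series is a synthetic sine wave with an 11-step period standing in for the sunspot cycle (not real data), and the function name `autocorrelation` is an assumption chosen for the example.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return num / denom

# Synthetic stand-in for a cyclic signal: 10 full cycles of period 11.
PERIOD = 11
series = [math.sin(2 * math.pi * t / PERIOD) for t in range(10 * PERIOD)]

# The lag with the highest autocorrelation suggests the cycle length.
best_lag = max(range(1, 31), key=lambda k: autocorrelation(series, k))
print(best_lag)
```

A strong peak at a non-seasonal lag is the kind of serial dependence the slide refers to; real data would be noisier and usually needs detrending first.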
  • Slide 9
  • 1. User tasks for time series. What kinds of questions do people ask about time series data? Examples: When was something greatest/least? Is there a pattern? Are two series similar? Do any of the series match a pattern? Do some events have causal relationships? Provide simpler, faster access to the series.
  • Slide 10
  • 1. Other Tasks. Does a data element exist at time t? When does a data element exist? How long does a data element exist? How often does a data element occur? How fast are data elements changing? In what order do data elements appear? Do data elements exist together? Exercise: answer all these questions about a stock price.
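As one possible way to approach the stock-price exercise, the sketch below answers a few of these task questions in Python. The prices are hypothetical values invented for illustration, and the variable names are assumptions, not part of the lecture.

```python
from datetime import date

# Hypothetical closing prices for one stock (illustrative values only).
prices = {
    date(2024, 1, 2): 101.0,
    date(2024, 1, 3): 103.5,
    date(2024, 1, 4): 102.0,
    date(2024, 1, 5): 107.0,
    date(2024, 1, 8): 104.5,
}

days = sorted(prices)

# When was the price greatest / least?
peak_day = max(days, key=prices.get)
low_day = min(days, key=prices.get)

# Does a data element exist at time t? (Jan 6, 2024 was a Saturday: no trade.)
exists_on_jan6 = date(2024, 1, 6) in prices

# How long does the series exist? (calendar span in days)
span_days = (days[-1] - days[0]).days

# How fast is it changing? (change between consecutive observations)
changes = [prices[b] - prices[a] for a, b in zip(days, days[1:])]
largest_move = max(changes, key=abs)

print(peak_day, low_day, exists_on_jan6, span_days, largest_move)
```

Each question maps to a one-line query here; a visualization aims to make the same answers visible at a glance rather than computed one at a time.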
  • Slide 11
  • 1. Fundamental Tradeoff. Is the visualization time-dependent, i.e., changing over time (beyond just being interactive)? Static: shows history and multiple perspectives, allows comparison. Dynamic (animation): gives a feel for process and changes over time, has more space to work with.
  • Slide 12
  • Limits of Regression Analysis
  • Slide 13
  • Four Sets of Data. Anscombe's Quartet (American Statistician, 27 [February 1973], 17-21) comprises 4 data sets of 11 points each:
          I            II           III            IV
       x      y     x      y     x      y     x      y
      10   8.04    10   9.14    10   7.46     8   6.58
       8   6.95     8   8.14     8   6.77     8   5.76
      13   7.58    13   8.74    13  12.74     8   7.71
       9   8.81     9   8.77     9   7.11     8   8.84
      11   8.33    11   9.26    11   7.81     8   8.47
      14   9.96    14   8.10    14   8.84     8   7.04
       6   7.24     6   6.13     6   6.08     8   5.25
       4   4.26     4   3.10     4   5.39    19  12.50
      12  10.84    12   9.13    12   8.15     8   5.56
       7   4.82     7   7.26     7   6.42     8   7.91
       5   5.68     5   4.74     5   5.73     8   6.89
  • Slide 14
  • 3.1 How about Linear Regression? All four data sets give the same summary statistics: mean of the x values = 9.0; mean of the y values = 7.5; equation of the least-squares regression line: y = 3 + 0.5x; sum of squares of x about its mean = 110.0; regression sum of squares (variance accounted for by x) = 27.5; residual sum of squares (about the regression line) = 13.75; correlation coefficient = 0.82; coefficient of determination = 0.67. Source: http://astro.swarthmore.edu/astro121/anscombe.html
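The summary statistics on this slide can be checked with a short Python sketch. It uses the published Anscombe values and plain least-squares formulas; the names `QUARTET` and `summarize` are chosen for this example only.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Return means, least-squares line, and correlation for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)   # sum of squares of x about its mean
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return {"mean_x": mean_x, "mean_y": mean_y, "sxx": sxx,
            "slope": slope, "intercept": intercept, "r": r, "r2": r * r}

stats = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The four rows print nearly identical numbers, which is the point of the example: the statistics alone cannot distinguish data sets that look entirely different when plotted.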
  • Slide 15
  • 3.1. Four Data Sets
  • Slide 16
  • 3. Basic Graphs of Time Series. Present time data as a 2D line graph with time on the x-axis and some other variable on the y-axis.
  • Slide 17
  • 3. Classic View
  • Slide 18
  • Fun example: What If Everybody in Canada Flushed At Once?
  • Slide 19
  • 3. A few examples by Tableau
  • Slide 20
  • 3. Interactive Filter
  • Slide 21
  • Time Series segments for comparison
  • Slide 22
  • Comparison with new dimensions
  • Slide 23
  • 3. Time series with distribution
  • Slide 24
  • Multiple time series with distribution
  • Slide 25
  • 3. Monitor Multiple Views of Time series
  • Slide 26
  • 4. Other tools, Example 1: Calendar visualization. Present a series of events in the context of a calendar. Tasks: put together a complete story; garner information for decision-making; notice trends; gain an overview of the events to grasp the big picture.
  • Slide 27
  • One Solution
  • Slide 28
  • 3D View and Projections
  • Slide 29
  • Example 2: Personal histories. Consider a chronological series of events in someone's life. Present an overview of the events. Examples: medical history, educational background, criminal history. Tasks: put together a complete story; gather information for decision-making; notice trends; gain an overview of the events to grasp the big picture.
  • Slide 30
  • LifeLines project: visualize personal history in some domain
  • Slide 31
  • Medical display
  • Slide 32
  • Interaction Features. Different colors for different event types. Line thickness can correspond to another variable. Interaction: clicking on an event produces more details. Certainly could also incorporate some dynamic query capabilities.
  • Slide 33
  • Benefits and Challenges. Benefits: reduce chances of missing information; facilitate spotting trends or anomalies; streamline access to details; remain simple and tailorable to various applications. Challenges: scalability; can multiple records be visualized in parallel (well)?
  • Slide 34
  • New works for Medical Image. Work with query results: need to align, rank, and filter. Medical application: look for temporal coincidence of two events, e.g. first pneumonia and asthma attack. Medical professionals don't want to fool with zooming and panning.
  • Slide 35
  • Focus on alignment of events
  • Slide 36
  • Example 3: Understand patterns of presence/events over time. Focus: people's presence/movements in some space. Situation: workers punch in and punch out of a factory; want to understand the presence patterns over a calendar year. Alternate: power plant electricity usage over a year.
  • Slide 37
  • KW/Punch in/out times for workers
  • Slide 38
  • Approach: cluster analysis. Find the two most similar days and make them into one new composite. Keep repeating until some preset number is left or some condition is met. How can this be visualized? Ideas?
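The merge-the-two-most-similar-days procedure on this slide can be sketched as a simple agglomerative clustering in Python. This is an illustration under stated assumptions, not the original tool's implementation: similarity is taken to be Euclidean distance between daily profiles, composites are member-weighted averages, and the headcount readings are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two daily profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_days(profiles, target_count):
    """Merge the two most similar clusters until `target_count` remain.

    `profiles` maps a day label to a list of readings.
    Returns a list of (composite_profile, member_days) tuples.
    """
    clusters = [(list(p), [day]) for day, p in profiles.items()]
    while len(clusters) > max(target_count, 1):
        # Find the closest pair of cluster composites.
        i, j = min(
            ((a, b) for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        (pi, di), (pj, dj) = clusters[i], clusters[j]
        wi, wj = len(di), len(dj)
        # Composite = average of the two profiles, weighted by member count.
        composite = [(x * wi + y * wj) / (wi + wj) for x, y in zip(pi, pj)]
        del clusters[j]  # delete j first because j > i
        del clusters[i]
        clusters.append((composite, di + dj))
    return clusters

# Hypothetical headcounts at four times of day (illustrative values only).
profiles = {
    "Mon": [5, 40, 42, 10],
    "Tue": [6, 41, 40, 11],
    "Sat": [0, 3, 2, 0],
    "Holiday": [0, 2, 3, 1],
}
result = cluster_days(profiles, 2)
groups = sorted(sorted(days) for _, days in result)
print(groups)
```

The stopping rule here is the preset cluster count; a distance threshold would be the other condition the slide mentions.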
  • Slide 39
  • Display
  • Slide 40
  • Characteristics. Cluster analysis: unique types of days (individual or cluster) get their own color; each is placed in context in the calendar, and its line graph is shown; stop clustering when a threshold is met or at a predetermined number of clusters. Interactive: click on a day to see its graph; select a day to see similar ones; add/remove clusters.
  • Slide 41
  • Insight from the analysis: traditional office hours are followed; most employees are present in late morning; fewer people are present on summer Fridays; just a few people work holidays; when the holidays occurred; school vacations occurred May 3-11, Oct 11-19, and Dec 21-31; many people take off the day after a holiday; many people leave at 4pm on December 5.
  • Slide 42
  • Example 4: Flow of changes across electronic documents. History flow is being used to visualize the evolutionary history of wiki pages on Wikipedia. http://researchweb.watson.ibm.com/history/
  • Slide 43
  • Techniques
  • Slide 44
  • What can be found? Understanding the frequency and timing of vandalism on wiki sites. Analyzing the overall stability in size and content to assess the reliability of group-authored web sites. Gaining a solid understanding of the relationships between various factors, e.g., how does anonymity affect the likelihood of vandalism? Are page sections that survive many edits more likely to be high quality? The visualizations are suggestive and need to be verified through statistical analysis.
  • Slide 45
  • Example 5: Computer system logs. Potentially huge amount of data; tedious to examine the text; looking for unusual circumstances, patterns, etc. MieLog: a system to help computer systems administrators examine log files. Interesting characteristics.
  • Slide 46
  • System View
  • Slide 47
  • Another View
  • Slide 48
  • Interaction Features. Tag area: click on a tag to show only those messages. Time area: click on tiles to show those times; can put a line on the histogram to filter on values above/below. Outline area: can filter based on message length; just highlight messages to show them in text. Message area: can filter on specific words.
  • Slide 49
  • Example 6: Very large scale temporal log data. Show more context of what else was going on at that time; likely have to abstract some then. Allow several different levels of detail at once. Allow drill-down for details. Domain: computer systems management.
  • Slide 50
  • LiveRAC: computer system management data; heavy interaction; semantic zooming
  • Slide 51
  • 5. Design Principles. Show familiar visual representations whenever possible. Provide side-by-side comparisons of small multiple views. Spatial position is the strongest visual cue. Multiple views are more effective when coordinated through explicit linking. Avoid abrupt visual change. Follow Shneiderman's mantra. User actions should receive immediate visual feedback. Assertion: showing several levels of detail simultaneously provides useful high information density in context.
  • Slide 52
  • Periodic Data. Serial, periodic data: data with a chronological aspect that repeats and follows a pattern over time. How might one visualize that? Using spirals: a standard x-y timeline or tabular display is problematic for periodic data because it has endpoints. Use a spiral to help display the data; one loop corresponds to one period.
  • Slide 53
  • Basic Spiral Graph. One year per loop; same month on radial bars; quantity represented by size of blob. Is it as easy to see serial data as periodic data?
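The one-loop-per-period mapping can be sketched in Python as follows. This is a minimal illustration assuming an Archimedean spiral; the function name `spiral_position` and the parameters `base_radius` and `loop_spacing` are hypothetical choices for the example, not taken from the tool shown on the slide.

```python
import math

def spiral_position(t, period, base_radius=1.0, loop_spacing=1.0):
    """Cartesian position of observation `t` on an Archimedean spiral.

    One full loop corresponds to `period` observations, so observations
    a whole number of periods apart share the same angle (radial bar).
    """
    angle = 2 * math.pi * (t % period) / period
    radius = base_radius + loop_spacing * t / period
    return radius * math.cos(angle), radius * math.sin(angle)

# Monthly data, one year per loop: March of year 0 and March of year 1.
march_y0 = spiral_position(2, 12)
march_y1 = spiral_position(14, 12)
print(march_y0, march_y1)
```

Both March points fall on the same radial line, one loop apart, which is what lets periodic patterns line up visually while the serial order follows the curve outward.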
  • Slide 54
  • Advanced Spiral Graph. Same spiral mapping as the previous one; different foods are represented by different colors and drawn at different heights. Can you still see serial and periodic attributes? As with all 3-D, requires navigation.
  • Slide 55
  • Example 1: Geo-temporal data. How about events in time and place? Many applications of this problem. Represent place by a 2D plane (or maybe 3D topography); use the 3rd dimension to encode time. Object types: entities (people or things), locations (geospatial or conceptual), events (occurrences or discovered facts).
  • Slide 56
  • Overview Objective: visualize spatial interconnectedness of information over time and geography with interactive 3-D view
  • Slide 57
  • Design Characteristics
  • Slide 58
  • Solution
  • Slide 59
  • Napoleon's March: size of army, direction, latitude, longitude, temperature, date
  • Slide 60
  • Documents Connections, Jigsaw