Time Series
Karanveer Mohan, Keegan Go, Stephen Boyd
EE103, Stanford University
November 15, 2016
Outline
Introduction
Linear operations
Least-squares
Prediction
Time series data
• represent time series x_1, ..., x_T as a T-vector x
• x_t is the value of some quantity at time (period, epoch) t, t = 1, ..., T
• examples:
  – average temperature at some location on day t
  – closing price of some stock on (trading) day t
  – hourly number of users on a website
  – altitude of an airplane every 10 seconds
  – enrollment in a class every quarter
• vector time series: x_t is an n-vector; can represent as a T × n matrix
Types of time series
time series can be
• smoothly varying or more wiggly and random
• roughly periodic (e.g., hourly temperature)
• growing or shrinking (or both)
• random but roughly continuous
(these are vague labels)
Melbourne temperature
• daily measurements, for 10 years
• you can see seasonal (yearly) periodicity

[plot: daily temperature over roughly 4000 days, values −5 to 30]
Melbourne temperature
• zoomed to one year

[plot: one year of daily temperature, values 0 to 25]
Apple stock price
• log10 of Apple daily share price, over 30 years, 250 trading days/year
• you can see (not steady) growth

[plot: log10 share price over roughly 9000 trading days, values −1 to 2.5]
Log price of Apple
• zoomed to one year

[plot: log10 price over trading days 6000–6250, values 0.35 to 0.85]
Electricity usage in (one region of) Texas
• total in 15 minute intervals, over 1 year
• you can see variation over the year

[plot: usage over roughly 3.5 × 10⁴ fifteen-minute intervals, values 0 to 10]
Electricity usage in (one region of) Texas
• zoomed to 1 month
• you can see daily periodicity and weekend/weekday variation

[plot: one month of usage, values 1 to 9]
Outline
Introduction
Linear operations
Least-squares
Prediction
Down-sampling
• k× down-sampled time series selects every kth entry of x
• can be written as y = Ax
• for 2× down-sampling, T even,

A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0
\end{bmatrix}

• alternative: average consecutive k-long blocks of x
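To make the selection matrix concrete, here is a small NumPy sketch (the helper name downsample_matrix is ours, not from the slides); in practice you would simply slice x[::k] rather than form A.

```python
import numpy as np

def downsample_matrix(T, k):
    """Selection matrix A with y = A @ x the k-times down-sampled series."""
    m = T // k
    A = np.zeros((m, T))
    A[np.arange(m), k * np.arange(m)] = 1.0   # a single 1 per row
    # averaging alternative: A[i, i*k:(i+1)*k] = 1/k for each row i
    return A

x = np.arange(10.0)               # toy series, T = 10
A = downsample_matrix(10, 2)      # 5 x 10 selection matrix
print(A @ x)                      # [0. 2. 4. 6. 8.], same as x[::2]
```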
Up-sampling
• k× (linear) up-sampling interpolates between entries of x
• can be written as y = Ax
• for 2× up-sampling,

A = \begin{bmatrix}
1 & & & \\
1/2 & 1/2 & & \\
& 1 & & \\
& 1/2 & 1/2 & \\
& & \ddots & \\
& & & 1
\end{bmatrix}
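A sketch of the 2× up-sampling matrix under this convention (output length 2T − 1: originals interleaved with midpoint averages); the function name is ours.

```python
import numpy as np

def upsample2_matrix(T):
    """(2T-1) x T matrix: copies of x interleaved with midpoint averages."""
    A = np.zeros((2 * T - 1, T))
    A[0::2, :] = np.eye(T)                # even output rows copy x_i
    A[1::2, :-1] += 0.5 * np.eye(T - 1)   # odd rows average x_i ...
    A[1::2, 1:] += 0.5 * np.eye(T - 1)    # ... and x_{i+1}
    return A

x = np.array([1.0, 3.0, 2.0])
print(upsample2_matrix(3) @ x)            # [1.  2.  3.  2.5 2. ]
```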
Up-sampling on Apple log price
4× up-sample
[plot: Apple log price over trading days 5100–5120, with 4× up-sampling]
Smoothing
• k-long moving average y of x is given by

y_i = \frac{1}{k}\left(x_i + x_{i+1} + \cdots + x_{i+k-1}\right), \quad i = 1, \ldots, T-k+1

• can express as y = Ax, e.g., for k = 3,

A = \begin{bmatrix}
1/3 & 1/3 & 1/3 & & & \\
& 1/3 & 1/3 & 1/3 & & \\
& & \ddots & \ddots & \ddots & \\
& & & 1/3 & 1/3 & 1/3
\end{bmatrix}

• can also have trailing or centered smoothing
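A quick sketch of the k-long moving average, both by forming the banded matrix A and by convolution, which is how you would normally compute it:

```python
import numpy as np

T, k = 5, 3
x = np.array([1.0, 2.0, 6.0, 4.0, 5.0])

# banded (T-k+1) x T matrix: k shifted identities, scaled by 1/k
A = sum(np.eye(T - k + 1, T, j) for j in range(k)) / k
print(A @ x)                                         # [3. 4. 5.]

# equivalent, without forming A explicitly
print(np.convolve(x, np.ones(k) / k, mode="valid"))  # [3. 4. 5.]
```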
Melbourne daily temperature smoothed
• centered smoothing with window size 41

[plot: smoothed daily temperature over 10 years, values −5 to 30]
First-order differences
• (first-order) difference between adjacent entries
• discrete analog of the derivative
• express as y = Dx, where D is the (T − 1) × T difference matrix

D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1
\end{bmatrix}

• ‖Dx‖² (the Laplacian) is a measure of the wiggliness of x:

\|Dx\|^2 = (x_2 - x_1)^2 + \cdots + (x_T - x_{T-1})^2
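The difference matrix and the wiggliness measure in NumPy (np.diff computes Dx directly):

```python
import numpy as np

T = 5
x = np.array([1.0, 2.0, 4.0, 4.0, 3.0])

D = np.eye(T - 1, T, 1) - np.eye(T - 1, T)  # (T-1) x T difference matrix
print(D @ x)                  # [ 1.  2.  0. -1.], equals np.diff(x)
print(np.sum((D @ x) ** 2))   # wiggliness ||Dx||^2 = 6.0
```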
Outline
Introduction
Linear operations
Least-squares
Prediction
De-meaning
• de-meaning a time series means subtracting its mean: x̃ = x − avg(x)
• rms(x̃) = std(x)
• this is the least-squares fit with a constant
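A two-line numerical check of the identity rms(x̃) = std(x):

```python
import numpy as np

x = np.array([1.0, 2.0, 6.0])
x_tilde = x - x.mean()                 # de-meaned series
print(np.sqrt(np.mean(x_tilde ** 2)))  # rms(x_tilde) = 2.1602...
print(np.std(x))                       # std(x), the same value
```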
Straight-line fit and de-trending
• fit data (1, x_1), ..., (T, x_T) with affine model x_t ≈ a + bt (also called a straight-line fit)
• b is called the trend
• a + bt is called the trend line
• de-trending a time series means subtracting its straight-line fit
• the de-trended time series shows variations above and below the straight-line fit
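A sketch of the straight-line fit and de-trending via least squares, on synthetic data (variable names are ours):

```python
import numpy as np

T = 200
t = np.arange(1, T + 1)
x = 0.5 + 0.03 * t + np.random.randn(T)   # synthetic series with a trend

A = np.column_stack([np.ones(T), t])      # columns multiply a and b
(a, b), *_ = np.linalg.lstsq(A, x, rcond=None)

trend_line = a + b * t
x_detrended = x - trend_line              # variations about the trend line
```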
Straight-line fit on Apple log price
[plots: Apple log price with its straight-line fit (Trend), and the de-trended series (Residual)]
Periodic time series
• let the P-vector z be one period of the periodic time series

x^{per} = (z, z, ..., z)

(we assume T is a multiple of P)

• express as x^{per} = Az with

A = \begin{bmatrix} I_P \\ \vdots \\ I_P \end{bmatrix}
Extracting a periodic component
• given a (non-periodic) time series x, choose z to minimize ‖x − Az‖²
• gives the best least-squares fit with a periodic time series
• simple solution: average the periods of the original:

z = (1/k) Aᵀx,  k = T/P

• e.g., to get z for January 9, average all x_i's with date January 9
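Since averaging the periods is all that is needed, the fit is a one-liner when T = kP; a sketch with illustrative names:

```python
import numpy as np

P, k = 7, 52                       # assumed period length and period count
x = np.random.randn(k * P)         # stand-in for the real series

z = x.reshape(k, P).mean(axis=0)   # same as (1/k) * A.T @ x
x_per = np.tile(z, k)              # best periodic fit (z, z, ..., z)
```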
Periodic component of Melbourne temperature
[plot: periodic (yearly) component of Melbourne daily temperature, values −5 to 30]
Extracting a periodic component with smoothing
• can add smoothing to the periodic fit by minimizing

‖x − Az‖² + λ‖Dz‖²

• λ > 0 is the smoothing parameter
• D is the P × P circular difference matrix

D = \begin{bmatrix}
-1 & 1 & & & \\
& -1 & 1 & & \\
& & \ddots & \ddots & \\
& & & -1 & 1 \\
1 & & & & -1
\end{bmatrix}

• λ is chosen visually or by validation
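One standard way to solve this regularized least-squares problem is to stack A on top of √λ·D and call an ordinary least-squares solver; a sketch (the function name is ours):

```python
import numpy as np

def smooth_periodic_fit(x, P, lam):
    """Minimize ||x - A z||^2 + lam * ||D z||^2, D circular difference."""
    k = len(x) // P                                  # assumes len(x) = k * P
    A = np.tile(np.eye(P), (k, 1))                   # k stacked identities
    D = np.roll(np.eye(P), 1, axis=1) - np.eye(P)    # circular difference
    lhs = np.vstack([A, np.sqrt(lam) * D])
    rhs = np.concatenate([x, np.zeros(P)])
    z, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return z
```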
Choosing smoothing via validation
• split the data into train and test sets, e.g., test set is the last period (P entries)
• train the model on the train set, and test on the test set
• choose λ to (approximately) minimize error on the test set
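Continuing the sketch above, a minimal validation loop over λ (the split and the grid are illustrative):

```python
import numpy as np

# x: full series of length k * P; hold out the last period for testing
x_train, x_test = x[:-P], x[-P:]

lambdas = np.logspace(-2, 2, 20)
errs = [np.sqrt(np.mean((x_test - smooth_periodic_fit(x_train, P, lam)) ** 2))
        for lam in lambdas]
best_lam = lambdas[int(np.argmin(errs))]   # lambda with smallest test RMS
```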
Validation of smoothing for Melbourne temperature
trained on first 8 years; tested on last two years
[plot: test RMS error versus λ on a logarithmic scale (10⁻² to 10²), values roughly 2.6 to 3.0]
Periodic component of temperature with smoothing
• zoomed on the test set, using λ = 30

[plot: smoothed periodic component over days 2900–3600, values 0 to 25]
Outline
Introduction
Linear operations
Least-squares
Prediction
Prediction
• goal: predict or guess x_{t+K} given x_1, ..., x_t
• K = 1 is one-step-ahead prediction
• the prediction is often denoted x̂_{t+K}, or more explicitly x̂(t+K|t) (the estimate of x_{t+K} made at time t)
• x_{t+K} − x̂_{t+K} is the prediction error
• applications: predict
  – asset price
  – product demand
  – electricity usage
  – economic activity
  – position of a vehicle
Some simple predictors
• constant: x̂_{t+K} = a
• current value: x̂_{t+K} = x_t
• linear (affine) extrapolation from the last two values: x̂_{t+K} = x_t + K(x_t − x_{t−1})
• average to date: x̂_{t+K} = avg(x_{1:t})
• (M + 1)-period rolling average: x̂_{t+K} = avg(x_{(t−M):t})
• straight-line fit to date (i.e., based on x_{1:t})
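A sketch of these simple predictors in 0-indexed NumPy (helper name and defaults are ours):

```python
import numpy as np

def simple_predictions(x, t, K=1, M=9, a=0.0):
    """Predictions of x[t + K] from x[0], ..., x[t] (0-indexed)."""
    return {
        "constant":        a,                            # fixed guess
        "current value":   x[t],
        "extrapolation":   x[t] + K * (x[t] - x[t - 1]),
        "average to date": np.mean(x[: t + 1]),
        "rolling average": np.mean(x[t - M : t + 1]),    # M + 1 periods
    }
```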
Auto-regressive predictor
• auto-regressive predictor:

x̂_{t+K} = (x_t, x_{t−1}, ..., x_{t−M})ᵀ β

  – M is the memory length
  – the (M + 1)-vector β gives the predictor weights
  – can add an offset v to x̂_{t+K}

• the prediction x̂_{t+K} is a linear function of the past window x_{(t−M):t}
• (which of the simple predictors above have this form?)
Least squares fitting of auto-regressive models
• choose coefficients β via least squares (regression)
• regressors are the (M + 1)-vectors

x_{1:(M+1)}, ..., x_{(N−M):N}

• outcomes are the numbers

x_{M+K+1}, ..., x_{N+K}

• can add regularization on β
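A sketch of assembling the regression and fitting β by least squares; we store each window oldest-to-newest, so β is reversed relative to the slide's ordering (names ours):

```python
import numpy as np

def fit_ar(x, M, K=1):
    """Least-squares AR weights: predict x[i+M+K] from x[i], ..., x[i+M]."""
    N = len(x)
    X = np.array([x[i : i + M + 1] for i in range(N - M - K)])  # regressors
    y = x[M + K :]                                              # outcomes
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ar_predict(x_past, beta):
    """One prediction from the most recent window of length M + 1."""
    return x_past[-len(beta):] @ beta
```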
Evaluating predictions with validation
• for simple methods: evaluate RMS prediction error
• for more sophisticated methods:
  – split the data into a training set and a test set (usually sequential)
  – train the predictor on the training data
  – test on the test data
Example
• predict Texas energy usage one step ahead (K = 1)
• train on the first 10 months, test on the last 2
Coefficients
• using M = 100
• 0 is the coefficient for today

[plot: AR coefficients for lags 0 to 100, values roughly −1 to 2]
Auto-regressive prediction results
[plot: one-step-ahead predictions and actual usage over the test set]
Auto-regressive prediction results
showing the residual
[plot: prediction residual over the test set, values roughly −1 to 0.8]
Auto-regressive prediction results
predictor                      RMS error
average (constant)             1.20
current value                  0.119
auto-regressive (M = 10)       0.073
auto-regressive (M = 100)      0.051
Autoregressive model on residuals
• fit a model to the time series, e.g., linear or periodic
• subtract this model from the original signal to compute the residuals
• apply an auto-regressive model to predict the residuals
• can add the predicted residuals back to the model to obtain predictions
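Putting the pieces together, a sketch that reuses smooth_periodic_fit and fit_ar / ar_predict from the earlier sketches (split and parameters are illustrative):

```python
import numpy as np

# seasonal model on the training part (length a multiple of P)
z = smooth_periodic_fit(x_train, P, lam=30.0)
resid = x_train - np.tile(z, len(x_train) // P)   # subtract the model

beta = fit_ar(resid, M=20)                        # AR model on the residuals

# prediction for the next time step:
# periodic model value + AR prediction of the residual
t_next = len(x_train) % P
x_hat = z[t_next] + ar_predict(resid, beta)
```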
Example
• Melbourne temperature data residuals
• zoomed on 100 days in the test set

[plot: temperature residuals over days 3000–3100, values −8 to 8]
Auto-regressive prediction of residuals
[plot: auto-regressive prediction of the residuals over days 3000–3100]
Prediction results for Melbourne temperature
• tested on the last two years

predictor                                RMS error
average                                  4.12
current value                            2.57
periodic (no smoothing)                  2.71
periodic (smoothing, λ = 30)             2.62
auto-regressive (M = 3)                  2.44
auto-regressive (M = 20)                 2.27
auto-regressive on residual (M = 20)     2.22