An evaluation of observationally based, high resolution gridded data sets over the continental
United States
Ruben Behnke – UMT Missoula, UW Madison
Andrew Allstadt – UW Madison
Jared Oyler – UMT Missoula
Steve Vavrus – UW Madison
                7/13/1995             2/2/1996              10/3/2000
                Tmax (°C)             Tmax (°C)             Precipitation (mm)
                HOT day in Madison    Cold day in Madison   Very wet day in Miami
Data set        Obs      Data         Obs      Data         Obs      Data
DayMet          38.3     35           -25.6    -20.5        319      40
JohnA           38.3     35.48        -25.6    -17.1        319      222.5
Livneh          38.3     37.78        -25.6    -25          319      258.8
Maurer          38.3     37.2         -25.6    -23.7        319      124.9
NLDAS2          38.3     36.74        -25.6    -27.1        319      151.5
TopoWx          38.3     35.38        -25.6    -20          319      57.34 (CPC)
1st Motivation: Different data sets can return very different values for the same query. Here are a couple of specific examples of where and when the data sets differ from each other and from observations…
2nd Motivation: After being asked to provide climate data for a myriad of research topics, I decided that a formal analysis is needed of which data set to recommend or provide to the end user.
3rd Motivation: A formal analysis is needed in order to determine what needs to be improved, possible new research paths, etc.
Project Focus and Path
1) Increasing number of gridded, daily data sets out there, but which one to use?
2) Focus on END USERS’ needs (how well gridded data match station data only)! Not designed to take into account interpolation algorithms, stations included, etc.
3) Consider ‘Observations’ to be actual GHCN/COOP/etc. station data
4) Focus on ‘extreme’ indices (CLIMDEX) as they are the hardest to model, but BIOCLIM indices, as well as several other measures, are being calculated
Large scale; ~7000 precipitation, ~5000 temperature stations
Some results… (from a first study using 119 stations from around the country)
Introductory CLIMDEX figures – these are meant to indicate variation among data sets
All examples are for Madison, WI.
[Figure: # Nights > 20° C per year, 1979–2012. Series: Obs, JohnA, DayMet, Livneh, Maurer, NLDAS2, TopoWx]
[Figure: # Days > .25 mm (0.01") Precipitation per year, 1979–2012. Series: Obs, JohnA, DayMet, Livneh, Maurer, NLDAS2, CPC]
CLIMDEX, continued…
[Figure: # Frost Days/Year, 1979–2012. Series: Obs, JohnA, DayMet, Livneh, Maurer, NLDAS2, TopoWx]
[Figure: # Days/Year > 25° C, 1979–2012. Series: Obs, JohnA, DayMet, Livneh, Maurer, NLDAS2, TopoWx]
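The four annual counts plotted in these figures can be sketched as simple threshold counts on daily series. This is only an illustration, not the code used for the study: it assumes "nights" and "frost days" are defined on daily minimum temperature (frost day = Tmin < 0 °C, as in the CLIMDEX FD index) and "days > 25° C" on daily maximum temperature. All function names are hypothetical, and the sample values are made up.

```python
from collections import defaultdict
from datetime import date


def annual_counts(dates, values, predicate):
    """Count, per calendar year, the days whose value satisfies `predicate`.

    Missing values (None) are skipped rather than counted.
    """
    counts = defaultdict(int)
    for d, v in zip(dates, values):
        if v is None:
            continue
        counts[d.year] += int(predicate(v))
    return dict(counts)


def frost_days(dates, tmin_c):
    # Assumed definition: daily minimum temperature below 0 °C.
    return annual_counts(dates, tmin_c, lambda t: t < 0.0)


def warm_nights(dates, tmin_c):
    # Assumed definition: daily minimum temperature above 20 °C.
    return annual_counts(dates, tmin_c, lambda t: t > 20.0)


def warm_days(dates, tmax_c):
    # Assumed definition: daily maximum temperature above 25 °C.
    return annual_counts(dates, tmax_c, lambda t: t > 25.0)


def wet_days(dates, precip_mm, threshold_mm=0.25):
    # Threshold taken from the figure title: more than 0.25 mm (0.01").
    return annual_counts(dates, precip_mm, lambda p: p > threshold_mm)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    days = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 7, 1), date(2001, 1, 1)]
    tmin = [-5.0, 1.0, 21.0, None]
    print(frost_days(days, tmin), warm_nights(days, tmin))
```

Running the same functions on the station series and on each gridded series for the matching grid cell gives one line per data set, which is what the figures compare.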
How well does each data set model daily Precipitation from 1981 – 2010?
(based on ‘average daily U.S. Precipitation’ derived from 119 stations)
Histograms: BLUE – Station, GREY – Model
Conditional Quantiles: For each bin in the histogram, the median, 25th/75th, and 10th/90th percentiles are calculated.
Why is there a spike in 4 of these data sets for this bin?
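A conditional quantile calculation of this kind can be sketched as follows. The slide does not say which series defines the bins; the sketch assumes the modeled (gridded) values are binned and the percentiles are taken over the station observations falling in each bin. The function names and the linear-interpolation percentile are choices made for this illustration only.

```python
def percentile(sorted_vals, p):
    """Percentile (p in 0..100) of a pre-sorted list, by linear interpolation."""
    if not sorted_vals:
        raise ValueError("empty bin")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def conditional_quantiles(model, obs, bin_width, probs=(10, 25, 50, 75, 90)):
    """Group observations by the bin of the paired model value.

    Returns {bin lower edge: {percentile: value}} for every non-empty bin.
    """
    bins = {}
    for m, o in zip(model, obs):
        idx = int(m // bin_width)  # bin index of the modeled value
        bins.setdefault(idx, []).append(o)

    result = {}
    for idx in sorted(bins):
        vals = sorted(bins[idx])
        result[idx * bin_width] = {p: percentile(vals, p) for p in probs}
    return result


if __name__ == "__main__":
    # Made-up daily precipitation values (mm), for illustration only.
    model = [0.1, 0.2, 0.3, 1.5]
    obs = [0.0, 1.0, 2.0, 5.0]
    for edge, q in conditional_quantiles(model, obs, bin_width=1.0).items():
        print(edge, q)
```

For a perfect data set the median in each bin would fall on the 1:1 line, and the 25th/75th and 10th/90th bands would be narrow around it.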
How well does each data set model daily Precipitation from 1981 – 2010?
(based on ‘average daily U.S. Precipitation’ derived from 119 stations)
Taylor Diagram
Shows the RMSE, correlation, and standard deviation of a modeled data series relative to an observed data series
Example
The observed RMSE is 0, correlation is 1, and standard deviation is 1 (by definition)
Relative to the observed data, DayMet’s correlation is 0.79, RMSE is 0.57, and normalized standard deviation is 0.96
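The three quantities on a Taylor diagram can be computed as in the sketch below. It assumes the diagram is normalized by the observed standard deviation and that the RMSE axis is the centered (mean-removed) RMS difference, which is the usual Taylor diagram convention; the slide does not state which form was used. The function name is hypothetical.

```python
import math


def taylor_stats(model, obs):
    """Correlation, normalized standard deviation, and normalized centered RMSE.

    Both series are centered on their own means; standard deviation and
    centered RMSE are divided by the observed standard deviation.
    """
    n = len(obs)
    if n == 0 or n != len(model):
        raise ValueError("series must be non-empty and of equal length")

    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    dm = [m - mean_m for m in model]
    do = [o - mean_o for o in obs]

    sd_m = math.sqrt(sum(x * x for x in dm) / n)
    sd_o = math.sqrt(sum(x * x for x in do) / n)
    r = sum(a * b for a, b in zip(dm, do)) / (n * sd_m * sd_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dm, do)) / n)

    return {"r": r, "norm_sd": sd_m / sd_o, "norm_crmse": crmse / sd_o}


if __name__ == "__main__":
    # Made-up series, for illustration only.
    print(taylor_stats([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0]))
```

Under these definitions the three values are tied together by norm_crmse² = 1 + norm_sd² − 2·norm_sd·r, which is why a single point on the diagram can show all three at once.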
How well does each data set model daily TMax from 1981 – 2010?
(based on ‘average daily U.S. Tmax’ derived from 119 stations)
So, when averaging across many stations, the data sets do a good job for temperature. Precipitation is more difficult, as the Taylor and quantile diagrams showed.
What happens when we look at individual stations?
Precipitation (observed daily data vs daily data corresponding to grid cell where station is located)
Maximum Temperature (observed daily data vs daily data corresponding to grid cell where station is located)
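Pairing a station with "the grid cell where the station is located" can be sketched as below for a regular latitude/longitude grid. This is an assumption made for illustration: the data sets compared here use different grids and projections, so the real lookup differs per data set. The function name and all grid parameters in the example are hypothetical.

```python
import math


def grid_cell_index(lat, lon, lat0, lon0, dlat, dlon, nlat, nlon):
    """Return (row, col) of the cell containing the point (lat, lon).

    Assumes a regular grid whose southwest corner (cell edge, not center)
    is at (lat0, lon0), with cell sizes dlat x dlon in degrees.
    """
    i = math.floor((lat - lat0) / dlat)
    j = math.floor((lon - lon0) / dlon)
    if not (0 <= i < nlat and 0 <= j < nlon):
        raise ValueError("station lies outside the grid")
    return i, j


if __name__ == "__main__":
    # Hypothetical 1/16-degree grid; station coordinates are approximate.
    row, col = grid_cell_index(
        lat=43.07, lon=-89.40,
        lat0=25.0, lon0=-125.0,
        dlat=0.0625, dlon=0.0625,
        nlat=448, nlon=928,
    )
    print(row, col)
```

The daily series extracted at that (row, col) is then compared with the station's own daily series, with no interpolation between neighboring cells.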
Let’s take a look at Madison, WI in more detail.
Precipitation    Maximum Temperature
Individual grid cells, even those with a first order station used in the interpolation, are much harder to model
There’s that spike in the downscaled data again!
Some First Results
May be stating the obvious, but…
1) Fairly easy to get good temporal and spatial averages, totals, etc. Much more difficult to model daily values at individual grid cells (even those which contain a station).
2) Precipitation is much more difficult to model than temperature.
3) Higher resolution does not necessarily equal better data.
4) Choosing a graphic or statistic to a) analyze data and/or b) communicate results isn’t straightforward, as users’ needs vary (extremes vs. means, station vs. region, etc.)
5) The “best” data set tends to vary by location and variable
6) An “overall best” data set… Ben Livneh (??) – also the newest
7) Much more specific results coming…
Future work…
• Expand analysis to 7000 precipitation, 5000 temperature stations
• Aggregate results spatially by topography, ecoregion, etc.
• Add new/upcoming data sets (daily PRISM, Dan McKenney, etc.)
• Include regional data sets (??)
• Maps, Portrait Diagrams, Time Series, etc.
Comments, questions, ideas are all welcome!
Thank you!
Seasonal Maximum Temperature (119 Station Mean)
Seasonal Precipitation (119 Station Mean)
TMax
   Data set  n      FAC2       MB (°C)      MGE (°C)   NMB           NMGE        RMSE (°C)  r          COE
1  DayMet    10950  0.8431211   0.13986301  2.4817260   0.010183252  0.18069138  3.4678122  0.9592560  0.7643789
2  JohnA     10957  0.7700100  -0.09465878  3.4214302  -0.006894610  0.24920487  4.4298321  0.9335690  0.6751355
3  Livneh    10957  0.9726202  -0.06977334  0.3394516  -0.005082043  0.02472445  0.5841126  0.9988699  0.9677691
4  Maurer    10957  0.9437751  -0.04539199  0.8557361  -0.003306192  0.06232879  1.1704596  0.9953894  0.9187479
5  NLDAS2    10957  0.8381856  -1.07472193  1.9134660  -0.078278941  0.13937009  2.4271872  0.9850997  0.8183166
6  TopoWx    10957  0.8758558   0.23122844  1.9141782   0.016841861  0.13942196  2.5840992  0.9774450  0.818248
A few of the many more statistics that can be used…
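The columns of this table can be reproduced with a few lines of arithmetic. The sketch below assumes the commonly used definitions: FAC2 is the fraction of modeled values within a factor of two of the observation, MB and MGE are the mean bias and mean gross (absolute) error, NMB and NMGE are the same sums normalized by the sum of observations, and COE is one minus the ratio of total absolute error to total absolute deviation of the observations from their mean. Treating pairs with a zero observation as outside the factor-of-two band is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def model_stats(model, obs):
    """Summary statistics of a modeled series against an observed series."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid model/observation pairs")

    err = [m - o for m, o in pairs]
    abs_err = [abs(e) for e in err]
    sum_o = sum(o for _, o in pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum_o / n

    # Pairs with o == 0 are counted as outside the band (assumption).
    fac2 = sum(1 for m, o in pairs if o != 0 and 0.5 <= m / o <= 2.0) / n

    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)

    return {
        "n": n,
        "FAC2": fac2,
        "MB": sum(err) / n,
        "MGE": sum(abs_err) / n,
        "NMB": sum(err) / sum_o,
        "NMGE": sum(abs_err) / sum_o,
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "r": cov / math.sqrt(var_m * var_o),
        "COE": 1.0 - sum(abs_err) / sum(abs(o - mean_o) for _, o in pairs),
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(model_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

Note that ratio-based measures such as FAC2 and the normalized errors are awkward for temperatures in °C, which cross zero, so they are easier to interpret for precipitation than for TMax.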