
Generating High Resolution Climate Change Projections through Single Image Super-Resolution: An Abridged Version

Thomas Vandal1, Evan Kodra2, Sangram Ganguly3,4, Andrew Michaelis5, Ramakrishna Nemani3,6 and Auroop R. Ganguly1,2

1 Northeastern University, Civil and Environmental Engineering
2 risQ Inc.

3 NASA Ames Research Center
4 Bay Area Environmental Research Institute

5 University Corporation, Monterey Bay
6 NASA Advanced Supercomputing Division

[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract

The impacts of climate change are felt by most critical systems, such as infrastructure, ecological systems, and power plants. However, contemporary Earth System Models (ESMs) are run at spatial resolutions too coarse for assessing effects this localized. Local scale projections can be obtained using statistical downscaling, a technique which uses historical climate observations to learn a low-resolution to high-resolution mapping. The spatio-temporal nature of the climate system motivates the adaptation of super-resolution image processing techniques to statistical downscaling. In our work, we present DeepSD, a generalized stacked super-resolution convolutional neural network (SRCNN) framework with multi-scale input channels for statistical downscaling of climate variables. We compare DeepSD to four state-of-the-art methods on downscaling daily precipitation from 1 degree (~100 km) to 1/8 degree (~12.5 km) over the Continental United States. Furthermore, a framework using the NASA Earth Exchange (NEX) platform is discussed for downscaling more than 20 ESM models with multiple emission scenarios.

1 Introduction

Climate change is causing detrimental effects to society's well-being as temperatures increase, extreme events become more intense [Pachauri et al., 2014], and sea levels rise [Nicholls and Cazenave, 2010]. Natural resources that society depends on, such as agriculture, freshwater, and coastal systems, are vulnerable to increasing temperatures and more extreme weather events. Similarly, transportation systems, energy systems, and urban infrastructure allowing society to function efficiently continue to degrade due to the changing climate. Furthermore, the health and security of human beings, particularly those living in poverty, are vulnerable to

extreme weather events with increasing intensity, duration, and frequency [Trenberth, 2012]. Scientists and stakeholders across areas such as ecology, water, and infrastructure require access to credible and relevant climate data for risk assessment and adaptation planning.

Earth System Models (ESMs) are physics-based numerical models which run on massive supercomputers to project the Earth's response to changes in atmospheric greenhouse gas (GHG) emissions scenarios. Archived ESM outputs are some of the principal data products used across many disciplines to characterize the likely impacts and uncertainties of climate change [Taylor et al., 2012]. These models encode physics into dynamical systems coupling atmospheric, land, and ocean effects. ESMs provide a large number of climate variables, such as temperature, precipitation, wind, humidity, and pressure, for scientists to study and evaluate impacts. The computationally demanding nature of ESMs limits spatial resolution to between 1 and 3 degrees. These resolutions are too coarse to resolve critical physical processes, such as the convection which generates heavy rainfall, or to assess the stakeholder-relevant local impacts of significant changes in the attributes of these processes [Schiermeier, 2010].

Downscaling techniques are used to mitigate the low spatial resolution of ESMs through dynamical and statistical modeling. Dynamical downscaling, also referred to as regional climate modeling (RCM), accounts for local physical processes, such as convective and vegetation schemes, with sub-grid parameters within ESM boundary conditions for high-resolution projections. Like ESMs, RCMs are computationally demanding and are not transferable across regions. In contrast, statistical downscaling (SD) techniques learn a functional form mapping ESMs to high-resolution projections by incorporating observational data. A variety of statistical and machine learning models, including linear models [Hessami et al., 2008], neural networks [Cannon, 2011], and support vector machines [Ghosh, 2010], have been applied to SD. Despite the availability of many techniques, we are not aware of any SD method which explicitly captures spatial dependencies in both low-resolution and high-resolution climate data.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18)

5389


Figure 1: (Best viewed in color) DeepSD Framework: This end-to-end framework of DeepSD includes a set of stacked SRCNNs, where the inputs (dashed boxes) include a low-resolution input and high-resolution auxiliary variables. [Panels: Low-Resolution Input (100 km); 50 km, 25 km, and 12.5 km Elevation; Prediction (12.5 km).]

Furthermore, traditional methods require observational data at the high-resolution target, meaning that regions with little observational data, often the poorest regions which are most affected by climate change, are unable to receive the downscaled climate data needed for adaptation.

The lack of explicit spatial models in SD of ESMs motivated us to study the applicability to this problem of computer vision approaches most often applied to images. Advances in single image super-resolution (SR) correspond well to SD, as SR likewise learns a mapping between low- and high-resolution images. Through experimentation, we found that super-resolution convolutional neural networks were able to capture spatial information in climate data and improve beyond existing methods.

1.1 Key Contributions

The key contributions are as follows:

• We present DeepSD, an augmented stacked super-resolution convolutional neural network for statistical downscaling of climate and earth system model simulations based on observational and topographical data.

• DeepSD outperforms a state-of-the-art statistical downscaling method used by the climate and earth science communities, as well as a suite of off-the-shelf data mining and machine learning methods, in terms of both predictive performance and scalability.

• The ability of DeepSD to outperform and generalize beyond grid-by-grid predictions suggests the ability to leverage cross-grid information content in terms of similarity of learning patterns in space, while the ability to model extremes points to the possibility of improved ability beyond matching of quantiles.

• For the first time, DeepSD presents an ability to generate, in a scalable manner, downscaled products from model ensembles, specifically, simulations from different climate modeling groups across the world run with different emissions trajectories and initial conditions.

• DeepSD provides NASA Earth Exchange (NEX) a method of choice to process massive climate and earth system model ensembles to generate downscaled products at high resolutions which can then be disseminated to researchers and stakeholders.

2 Statistical Downscaling

SD is the problem of mapping a low-resolution climate variable to a high-resolution projection. This mapping, which must transform a single grid point to multiple points, is an ill-posed problem, one with many possible solutions. However, we can mitigate the ill-posedness by including static high-resolution topography data in conjunction with other low-resolution climate variables. We learn the SD model using observed climate datasets and then infer downscaled ESM projections. Spatial and temporal non-stationarity of the changing climate system challenges traditional statistical techniques. Downscaling precipitation further challenges these methods with sparse occurrences and skewed distributions. The combination of an ill-posed problem, uncertainty in the climate system, and data sparsity propagates further uncertainty into downscaled climate projections.

3 Methods

3.1 Super-resolution CNN

SR methods, given a low-resolution (LR) image, aim to accurately estimate a high-resolution (HR) image. As presented by Dong et al. [Dong et al., 2014], a CNN architecture can be designed to learn a functional mapping between LR and HR using three operations: patch extraction, non-linear mapping, and reconstruction. The LR input is denoted as X while the HR label is denoted as Y. A three-layer CNN, F, with rectified linear unit activations can be defined such that F(X) attempts to reconstruct Y. A Euclidean loss function over the parameters Θ (weights and biases) of F is used to define


an optimization objective as:

$$\arg\min_{\Theta} \sum_{i=1}^{n} \left\| F(\mathbf{X}_i; \Theta) - \mathbf{Y}_i \right\|_2^2 \qquad (1)$$

where n is the number of training samples (batch size).
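The objective in Eq. (1) can be made concrete with a small sketch. The following is not the authors' TensorFlow implementation; it is a toy NumPy forward pass with made-up filter counts and input sizes, showing the three operations (9x9 patch extraction, 1x1 non-linear mapping, 5x5 reconstruction) and the Euclidean objective.

```python
import numpy as np

def conv2d(x, w, b):
    """Valid 2-D convolution over a (C_in, H, W) input with
    weights of shape (C_out, C_in, kH, kW)."""
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for o in range(c_out):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[o, i, j] = np.sum(x[:, i:i+kh, j:j+kw] * w[o]) + b[o]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def srcnn_forward(x, params):
    """Three-layer SRCNN F(X): patch extraction -> non-linear mapping -> reconstruction."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(conv2d(x, w1, b1))   # 9x9 patch extraction
    h2 = relu(conv2d(h1, w2, b2))  # 1x1 non-linear mapping
    return conv2d(h2, w3, b3)      # 5x5 reconstruction (no activation)

def euclidean_loss(pred, label):
    """Eq. (1) for a single sample: squared L2 error."""
    return np.sum((pred - label) ** 2)

rng = np.random.default_rng(0)
# Assumed toy sizes: 2 input channels (precipitation + elevation), tiny filter counts.
params = [
    (rng.normal(0, 0.1, (8, 2, 9, 9)), np.zeros(8)),
    (rng.normal(0, 0.1, (4, 8, 1, 1)), np.zeros(4)),
    (rng.normal(0, 0.1, (1, 4, 5, 5)), np.zeros(1)),
]
x = rng.normal(size=(2, 21, 21))  # interpolated LR input + HR elevation
pred = srcnn_forward(x, params)
print(pred.shape)                 # valid convolutions shrink 21 -> 13 -> 13 -> 9
```

In practice the convolutions would be vectorized (or delegated to a framework); the loops here only make the operations explicit.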

3.2 Stacked SRCNN

Traditional SR methods are built for resolution enhancements by factors of 2 to 4, while statistical downscaling conservatively requires resolution increases by factors of 8 to 12. Rather than enhancing resolution directly by 8-12x, as SR applications typically do, we take an alternative approach. To achieve such a large resolution improvement, we present stacked SRCNNs such that each SRCNN increases the resolution by a factor of s. This approach allows the model to learn spatial patterns at multiple scales, requiring less complexity in the spatial representations. The approach of stacking networks has been widely used in deep learning architectures, including stacked denoising autoencoders [Vincent et al., 2010] and stacked RBMs for deep belief networks [Hinton and Salakhutdinov, 2006]. However, contrary to the above networks, where stacking is applied in an unsupervised manner, each SRCNN is trained independently using its respective input/output resolutions, and the networks are stacked at test time.

3.3 DeepSD

We now present DeepSD, an augmented and specific architecture of stacked SRCNNs, as a novel SD technique. When applying SR to images we generally have only a LR image from which to estimate a HR image. However, during SD, we may have underlying high-resolution data coinciding with this LR image to aid the estimate. For instance, when downscaling precipitation we have two types of inputs: LR precipitation, and static topographical features such as HR elevation and land/water masks, used to estimate HR precipitation. As topographical features are known beforehand at very high resolutions and generally do not change over the period of interest, they can be leveraged at each scaling factor. As when training stacked SRCNNs, each SRCNN is trained independently with its associated input/output pairs. As presented in Figure 1, inference is executed by starting with the lowest resolution image and its associated HR elevation to predict the first resolution enhancement. The next resolution enhancement is estimated from the previous layer's estimate and its associated HR elevation. This process is repeated for each trained SRCNN. Figure 1 illustrates this process with a precipitation event and its various resolution improvements. We see that this stacked process allows the model to capture both regional and local patterns.
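The inference chain above can be sketched as a loop: each stage upsamples the previous estimate and pairs it with elevation at the target resolution. This is a minimal illustration, not the authors' pipeline: `bicubic_like_upsample` is a hypothetical stand-in for real bicubic interpolation, and the stand-in "SRCNN" simply averages its two input channels to show the plumbing.

```python
import numpy as np

def bicubic_like_upsample(img, factor):
    """Stand-in for bicubic interpolation: nearest-neighbour repeat.
    (A real pipeline would use e.g. a cubic spline resampler.)"""
    return np.repeat(np.repeat(img, factor, axis=0), np.array(factor), axis=1)

def deepsd_infer(lr_precip, elevations, models, factor=2):
    """Run the stacked SRCNNs: each stage upsamples the previous
    estimate and stacks it with HR elevation at that resolution."""
    estimate = lr_precip
    for srcnn, elev in zip(models, elevations):
        upsampled = bicubic_like_upsample(estimate, factor)
        estimate = srcnn(np.stack([upsampled, elev]))  # 2-channel input
    return estimate

# Toy 'SRCNN' that averages its two input channels, to exercise the loop.
identity_srcnn = lambda x: x.mean(axis=0)

lr = np.ones((26, 58))  # 1.0 degree CONUS grid size from Sec. 4.1
elevs = [np.zeros((26 * 2**k, 58 * 2**k)) for k in (1, 2, 3)]  # 1/2, 1/4, 1/8 degree
hr = deepsd_infer(lr, elevs, [identity_srcnn] * 3)
print(hr.shape)  # three 2x stages: 26x58 -> 52x116 -> 104x232 -> 208x464
```

Three stages with s = 2 yield the paper's overall 8x enhancement from 1 degree to 1/8 degree.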

4 Experiments

4.1 Data

As ESM outputs do not provide information at high-resolution scales, we must use observational datasets to learn our SD model. Hence, in this context, a gridded observation dataset acts as a proxy, and the learned model can then be applied to

ESMs after training. In our experiments, we obtain precipitation through the PRISM dataset at a 4 km daily spatial resolution, which aggregates station observations to a grid with physical and topographical information [Daly et al., 2008]. We then upscale the precipitation data to 1/8° (~12.5 km) as our high-resolution observations. Following, we upscale further to 1°, corresponding to a low-resolution precipitation, as applied in [Pierce et al., 2014]. For topography we use the GTOPO30 elevation (30 arcsec spatial resolution) dataset distributed by the Land Processes Distributed Active Archive Center (LP DAAC). The goal is then to learn a mapping between our low-resolution and high-resolution datasets.

Data for a single day at the highest resolution, 1/8°, covering CONUS is an "image" of size 208x464. Precipitation and elevation are used as input channels while precipitation is the sole output. Images are obtained at each resolution through up-sampling using bicubic interpolation. For instance, up-sampling to 1.0° decreases the image size from 208x464 to 26x58. Precipitation features for the first SRCNN, downscaling from 1.0° to 1/2°, are first up-sampled to 1.0° and then interpolated a second time to 1/2° in order to correspond to the output size of 52x116. This process is subsequently applied to each SRCNN according to its corresponding resolution. During the training phase, 51x51 sub-images are extracted at a stride of 20 to provide heterogeneity in the training set. The number of sub-images per year (1095, 9125, and 45,625) increases with resolution. Features and labels are normalized to zero mean and unit variance. Precipitation values are only available over land, so we set each null value to a sufficiently low value of −5, which is then masked out after downscaling accordingly.
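The sub-image extraction step can be sketched as a sliding window. This is an illustrative reconstruction, not the authors' exact preprocessing: the patch count depends on border handling and how partial windows are treated, so the numbers below will not match the per-year counts quoted in the text.

```python
import numpy as np

def extract_subimages(img, size=51, stride=20):
    """Slide a size x size window at the given stride over a 2-D
    field and collect the fully contained patches."""
    h, w = img.shape
    patches = []
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            patches.append(img[i:i + size, j:j + size])
    return np.stack(patches)

day = np.arange(208 * 464, dtype=float).reshape(208, 464)  # one day at 1/8 degree
patches = extract_subimages(day)
print(patches.shape)  # (n_patches, 51, 51)
```

Overlapping windows (stride 20 < size 51) are what provide the heterogeneity mentioned above: each HR pixel appears in several training patches with different spatial context.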

4.2 Training Parameters

All SRCNNs are trained with the same set of parameters, selected using those found to work well by Dong et al. [Dong et al., 2014]. Layer 1 consists of 64 filters with 9x9 kernels, layer 2 consists of 32 filters with 1x1 kernels, and the output layer uses a 5x5 kernel. Higher resolution models, which have a greater number of sub-images, may gain from larger kernel sizes and an increased number of filters. Each network is trained using Adam optimization [Kingma and Ba, 2014] with a learning rate of 10⁻⁴ for the first two layers and 10⁻⁵ for the last layer. Each model was trained for 10⁷ iterations with a batch size of 200. TensorFlow [Abadi et al., 2016] was used to build and train DeepSD. Training harnessed three Titan X GPUs on an NVIDIA DIGITS DevBox by independently training each SRCNN. Inference was then executed sequentially on a single Titan X GPU on the same machine.
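The stated architecture implies a small parameter budget, which helps explain the fast runtimes reported later. The helper below is hypothetical; the count assumes two input channels (precipitation + elevation) and plain convolutions with biases.

```python
def n_params(c_in, layers):
    """Weights + biases for a plain stack of conv layers, given
    (filters, kernel_size) per layer and c_in input channels."""
    total, prev = 0, c_in
    for filters, k in layers:
        total += filters * prev * k * k + filters  # weights + biases
        prev = filters
    return total

# 64 filters of 9x9, 32 filters of 1x1, one 5x5 output filter; 2 input channels
print(n_params(2, [(64, 9), (32, 1), (1, 5)]))  # 13313
```

At roughly 13k parameters per SRCNN, each stage in the stack is tiny by modern standards, so the dominant training cost is the number of iterations rather than model size.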

4.3 Results

DeepSD's ability to provide credible projections is crucial to all stakeholders. While there are many facets to statistical downscaling, we use a few key metrics to show DeepSD's applicability. Root mean square error (RMSE) and Pearson's correlation are used to capture the predictive capabilities of the methods. Figure 2 maps RMSE (mm/day) at each location. Bias, the average error, captures the ability to estimate the mean, while a skill score metric, as presented in [Perkins et


Figure 2: (Best viewed in color) Daily Root Mean Square Error (RMSE, mm/day, color scale 0.0-6.0) computed at each location for years 2006 to 2014 (test set) in CONUS for (Left) DeepSD and (Right) BCSD. Red corresponds to high RMSE while blue corresponds to low RMSE.

Model    Bias (mm/day)   Corr    RMSE (mm/day)   Skill   Runtime (secs)
Lasso     0.053          0.892   2.653           0.925    1297
ANN       0.049          0.862   3.002           0.907    2015
SVM      -1.489          0.886   3.205           0.342    27800
BCSD     -0.037          0.849   4.414           0.955    –
SRCNN    -0.699          0.894   2.949           0.833    24
DeepSD    0.022          0.914   2.529           0.947    71

Table 1: Comparison of predictive ability between all six methods for 1000 randomly selected locations in CONUS. Runtime is computed as the time to downscale one year of data over CONUS.

al., 2007], is used to measure distribution similarity on a scale from 0 to 1, where 1 corresponds to identical distributions.
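These metrics can be sketched in a few lines. The bias and RMSE definitions follow directly from the text; the skill score follows the overlap-of-binned-PDFs idea of Perkins et al. [Perkins et al., 2007], where the shared 50-bin histogram here is an illustrative choice, not the paper's exact binning.

```python
import numpy as np

def bias(pred, obs):
    """Average error."""
    return np.mean(pred - obs)

def rmse(pred, obs):
    """Root mean square error."""
    return np.sqrt(np.mean((pred - obs) ** 2))

def perkins_skill(pred, obs, n_bins=50):
    """Skill score of Perkins et al. (2007): total overlap of the two
    empirical PDFs over shared bins; 1 for identical distributions."""
    lo = min(pred.min(), obs.min())
    hi = max(pred.max(), obs.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(pred, bins=bins)
    q, _ = np.histogram(obs, bins=bins)
    return np.minimum(p / p.sum(), q / q.sum()).sum()

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 2.0, 10000)       # skewed, precipitation-like sample
s_same = perkins_skill(obs, obs)       # identical distributions -> skill 1
s_diff = perkins_skill(obs + 5.0, obs) # shifted distribution -> skill < 1
print(round(s_same, 3), round(s_diff, 3))
```

Note that the skill score only compares marginal distributions; a method can match the climatological PDF well while still having poor day-to-day correlation, which is why RMSE and correlation are reported alongside it.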

Our first experiment compares six approaches, DeepSD, SRCNN (DeepSD without stacking), BCSD, Lasso, SVM, and ANN, on their ability to capture daily predictability, presented in Table 1. The four metrics discussed above are computed and averaged over the 1000 randomly selected locations in CONUS where the automated statistical downscaling (ASD) methods were trained. We find that DeepSD outperforms the other approaches in terms of bias, correlation, and RMSE, and falls closely behind BCSD in terms of skill. We also find that the stacking performed by DeepSD provides a large performance improvement beyond a single SRCNN network with an 8x resolution increase. Furthermore, we find that SVM performs poorly in testing while having the longest runtime. Similarly, the least complex ASD method, Lasso, outperforms the more complex ANN. As expected, BCSD, a method built around estimating the underlying distribution, does well in minimizing bias and estimating the underlying distribution. For these reasons, in conjunction with our previous findings [Vandal et al., 2017a], the remaining experiments limit the methods to DeepSD and BCSD.

In the next experiment, we compare DeepSD and BCSD, the two scalable and top performing methods from the previous experiment, on each metric over CONUS. Each metric is computed per location and season using the daily observations and downscaled estimates, then averaged over CONUS. We find that DeepSD has high predictive capabilities for all seasons, with higher correlation and lower RMSE when compared to BCSD. Similar results are shown in Figure 2, where

DeepSD has a lower RMSE than BCSD for 79% of CONUS. Furthermore, we find that each method estimates the underlying distribution well, with low bias, < 0.5 mm/day, and a high skill score of ∼0.98. As BCSD is built specifically to minimize bias and match the underlying distribution, DeepSD's performance is strong. Overall, DeepSD outperforms BCSD for the chosen set of metrics. We encourage the reader to refer to [Vandal et al., 2017b] for more thorough results and DeepSD's application on the NASA Earth Exchange (NEX) platform.

5 Conclusion

Though DeepSD shows promise for SD, there are still some limitations in our experimentation regarding spatial and temporal generalization. An advantage of DeepSD is that a single trained model is able to downscale spatially heterogeneous regions. However, we do not test predictability in regions where the model was not trained. Future work will examine this hypothesis to understand DeepSD's credibility in regions with few observations. Second, we do not test temporal non-stationarity, a longstanding problem in statistical downscaling. Evaluation under non-stationarity can be performed using approaches presented by Salvi et al. [Salvi et al., 2015], such that training and testing data are split between cold and warm years. As there is a single model for all locations, including cold and warm climates, we hypothesize that DeepSD is capable of capturing non-stationarity.

Furthermore, future work can improve multiple facets of DeepSD. For instance, the inclusion of more variables, such as temperature, wind, and humidity at different pressure levels of the atmosphere, may capture more climate patterns. Also, downscaling multiple climate variables simultaneously could be explored to find shared spatial patterns in the high-resolution datasets, such as high temperatures and increased precipitation. Most importantly, DeepSD fails to capture uncertainty around its downscaled projections, a key factor in adapting to climate change. Recent advances in Bayesian deep learning [Gal, 2016; Vandal et al., 2018] may aid in quantifying uncertainty. Though these limitations exist, DeepSD is a scalable architecture with high predictive capabilities which provides a novel framework for statistical


downscaling.

Acknowledgments

This work was supported by NASA Earth Exchange (NEX), National Science Foundation CISE Expeditions in Computing under grant number 1029711, National Science Foundation CyberSEES under grant number 1442728, and National Science Foundation BIGDATA under grant number 1447587. The GTOPO30 dataset was distributed by the Land Processes Distributed Active Archive Center (LP DAAC), located at USGS/EROS, Sioux Falls, SD, http://lpdaac.usgs.gov. We thank Arindam Banerjee for valuable comments.

References

[Abadi et al., 2016] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.

[Cannon, 2011] Alex J. Cannon. Quantile regression neural networks: Implementation in R and application to precipitation downscaling. Computers & Geosciences, 37(9):1277–1284, 2011.

[Daly et al., 2008] Christopher Daly, Michael Halbleib, Joseph I. Smith, Wayne P. Gibson, Matthew K. Doggett, George H. Taylor, Jan Curtis, and Phillip P. Pasteris. Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology, 28(15):2031–2064, 2008.

[Dong et al., 2014] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pages 184–199. Springer, 2014.

[Gal, 2016] Yarin Gal. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016.

[Ghosh, 2010] Subimal Ghosh. SVM-PGSL coupled approach for statistical downscaling to predict rainfall from GCM output. Journal of Geophysical Research: Atmospheres, 115(D22), 2010.

[Hessami et al., 2008] Masoud Hessami, Philippe Gachon, Taha B. M. J. Ouarda, and André St-Hilaire. Automated regression-based statistical downscaling tool. Environmental Modelling & Software, 23(6):813–834, 2008.

[Hinton and Salakhutdinov, 2006] Geoffrey E. Hinton and Ruslan R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[Kingma and Ba, 2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[Nicholls and Cazenave, 2010] Robert J. Nicholls and Anny Cazenave. Sea-level rise and its impact on coastal zones. Science, 328(5985):1517–1520, 2010.

[Pachauri et al., 2014] Rajendra K. Pachauri, M. R. Allen, V. R. Barros, J. Broome, W. Cramer, R. Christ, J. A. Church, L. Clarke, Q. Dahe, P. Dasgupta, et al. Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. 2014.

[Perkins et al., 2007] S. E. Perkins, A. J. Pitman, N. J. Holbrook, and J. McAneney. Evaluation of the AR4 climate models' simulated daily maximum temperature, minimum temperature, and precipitation over Australia using probability density functions. Journal of Climate, 20(17):4356–4376, 2007.

[Pierce et al., 2014] David W. Pierce, Daniel R. Cayan, and Bridget L. Thrasher. Statistical downscaling using localized constructed analogs (LOCA). Journal of Hydrometeorology, 15(6):2558–2585, 2014.

[Salvi et al., 2015] Kaustubh Salvi, Subimal Ghosh, and Auroop R. Ganguly. Credibility of statistical downscaling under nonstationary climate. Climate Dynamics, 2015.

[Schiermeier, 2010] Quirin Schiermeier. The real holes in climate science. Nature News, 463(7279):284–287, 2010.

[Taylor et al., 2012] Karl E. Taylor, Ronald J. Stouffer, and Gerald A. Meehl. An overview of CMIP5 and the experiment design. Bulletin of the American Meteorological Society, 93(4):485, 2012.

[Trenberth, 2012] Kevin E. Trenberth. Framing the way to relate climate extremes to climate change. Climatic Change, 115:283–290, 2012.

[Vandal et al., 2017a] Thomas Vandal, Evan Kodra, and Auroop R. Ganguly. Intercomparison of machine learning methods for statistical downscaling: The case of daily and extreme precipitation. arXiv preprint arXiv:1702.04018, 2017.

[Vandal et al., 2017b] Thomas Vandal, Evan Kodra, Sangram Ganguly, Andrew Michaelis, Ramakrishna Nemani, and Auroop R. Ganguly. DeepSD: Generating high resolution climate change projections through single image super-resolution. In 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2017.

[Vandal et al., 2018] Thomas Vandal, Evan Kodra, Jennifer Dy, Sangram Ganguly, Ramakrishna Nemani, and Auroop R. Ganguly. Quantifying uncertainty in discrete-continuous and skewed data with Bayesian deep learning. arXiv preprint arXiv:1802.04742, 2018.

[Vincent et al., 2010] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.
