CALIFORNIA HYDROGEN INFRASTRUCTURE TOOL TECHNICAL FORMULATION WEBINAR Primer on GIS and Data Source Concepts
October 09, 2015
For questions or comments, contact: Andrew Martinez (916) 322-8449 [email protected]
Primer Outline
Purpose: The optional primer is intended as a brief overview of GIS-related concepts pertinent to many aspects of the technical formulation of CHIT
• Brief introduction and review of CHIT and AB 8 process
• Introduction to GIS fundamentals
• ArcGIS tools utilized in formulation of CHIT
• Creating custom tools in ArcGIS
• Working with geographies in Census and DMV data sources
• This primer will answer questions like:
• What are some of the major built-in ArcGIS tools used in CHIT and what do they accomplish?
• What are the differences in the spatial structure of the data sources and how are they reconciled?
• How does ArcGIS assess spatial distribution of data? What statistical analyses are applied?
• How is a tool like CHIT made in ArcGIS?
INTRODUCTION AND REVIEW OF CHIT
Introduction CHIT/CHAT Tools and AB 8
[Figure: flow diagram of the CHAT and CHIT tools. Labels: CHAT; Auto Manufacturer Surveys and DMV Records; CHIT; Station Coverage Assessment; Station Status; Market Indicator Assessment; Regional Vehicle Placement; Regional Hydrogen Balance; Localized Analysis of Priority; Localized Targets for Further Funding; Hydrogen Draft Solicitation Concepts (Alternative and Renewable Fuel and Vehicle Technology Program, Subject Area: Hydrogen Refueling Infrastructure)]
Introduction CHIT: A Coverage and Market Assessment Tool
GIS Network Analysis and Station Area Planning
• CHIT is a planning tool intended to provide general direction indicating areas of needed infrastructure
• CHIT evaluates relative need for hydrogen infrastructure based on a gap analysis between a projected market and current infrastructure
[Figure: gap analysis example]
City A: Market: 100, Existing Coverage: 50, Coverage Gap: 50
City B: Market: 60, Existing Coverage: 40, Coverage Gap: 20
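The City A and City B numbers can be reproduced with a few lines of Python. This is only a minimal sketch of the gap idea, not CHIT's actual calculation: the names `areas` and `coverage_gap` are hypothetical, and flooring the gap at zero is an assumption added here for the case where coverage exceeds the market.

```python
# Hypothetical illustration of the gap analysis on the slide.
# The data structure and function name are assumptions, not part of CHIT.
areas = {
    "City A": {"market": 100, "coverage": 50},
    "City B": {"market": 60, "coverage": 40},
}


def coverage_gap(market, coverage):
    """Unmet need: projected market minus existing coverage.

    Flooring at zero is an assumption for areas that are already over-covered.
    """
    return max(market - coverage, 0)


gaps = {
    name: coverage_gap(values["market"], values["coverage"])
    for name, values in areas.items()
}

# Rank areas from the largest gap to the smallest.
ranked = sorted(gaps, key=gaps.get, reverse=True)
```

Ranking by gap rather than by market size alone is what makes City A the higher priority even though both cities already have some coverage.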
FUNDAMENTALS OF GEOGRAPHIC INFORMATION SYSTEMS
GIS Fundamentals Representing Geospatial Data
• Geospatial data can be conceptualized as objects that are stored, analyzed, compared, edited or transformed
• Objects have geometry (geography) and attributes
  • Geometry includes dimension, size (resolution), orientation, location, etc…
• Attributes are data fields, like temperature reading at a weather station, traffic volume along a stretch of road, or population counts in counties
• Not all data sources are equivalently rich and detailed in geometry and attributes
• May not find a ready-made data source with all required aspects for a given project
• When working with geospatial data, it is important to remain aware of geometry and attributes and ensure that the best and most logical choices are used for data analysis, transformation, and representation
• It is also important to always remember that even the most detailed geospatial data is itself a representation or model of the world
  • May closely match the real world, but all data will have inherent errors, even before any analysis is performed
• There are potentially many valid models of the real world
• Data quality needs to be constantly monitored and assessed during analyses
GIS Fundamentals Normalization and Representation of Quantitative Data
[Figure: map divisions containing red dots (Target Population) and blue dots (Additional Population)]
How should we relatively weight the count of red dots in each division?
1) Independent of containing area and blue dots in same area
2) Proportional to total number of dots (red + blue) in area
3) Normalized to size of containing area
4) Proportional to total and normalized by area
• Best practice in GIS to consider area-based normalization
  • Provides consistent basis of intensity measure for objects of varying sizes
• Considering proportion of target market to total population can be influenced by intent of analysis
  • Do the blue dots (not target market) matter in our calculation?
• Normalization by population accounts for population type clustering, but causes issues of representing large absolute differences
• Normalization by both population and area does not eliminate this problem
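The four weighting options can be compared side by side on a small made-up example. The sketch below is illustrative only: the division names, dot counts, and areas are hypothetical, and reading option 4 as "share of total divided by area" is one possible interpretation rather than the formulation CHIT uses.

```python
# Hypothetical divisions: counts of red (target) and blue (other) dots,
# plus the area of each division in arbitrary units.
divisions = [
    {"name": "A", "red": 10, "blue": 0, "area": 1.0},
    {"name": "B", "red": 10, "blue": 30, "area": 1.0},
    {"name": "C", "red": 10, "blue": 0, "area": 4.0},
    {"name": "D", "red": 10, "blue": 30, "area": 4.0},
]


def weights(division):
    """Return the four candidate weightings for one division."""
    total = division["red"] + division["blue"]
    share = division["red"] / total
    return {
        "count": division["red"],                     # 1) raw count
        "share": share,                               # 2) proportion of all dots
        "density": division["red"] / division["area"],  # 3) per unit area
        "share_density": share / division["area"],    # 4) proportion, per unit area
    }


table = {d["name"]: weights(d) for d in divisions}
```

All four divisions have the same raw count of 10, yet options 2 through 4 rank them differently, which is the point of the slide sequence: the choice of normalization is driven by the intent of the analysis.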
GIS Fundamentals Overlay Actions
• Overlay actions are one of the most common analysis processes within GIS
• Used to compare attributes from varying data sets and possibly of different geometries
• Overlaying geometries can lead to creation of undesirable features and rapidly growing numbers of objects
• This can be alleviated by using a structured grid representation of the input data
• Grid resolution must be carefully considered as compared to input data geometry
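A minimal sketch of the structured-grid idea, assuming point data in a shared planar coordinate system: each data set is summarized onto the same grid, so comparing them becomes a lookup by cell rather than a geometric overlay. The function names, coordinates, and values are hypothetical, and real gridding of polygon data would need an apportioning rule that is not shown here.

```python
from collections import defaultdict


def cell_index(x, y, cell_size):
    """Map a coordinate to the (column, row) of the grid cell containing it."""
    return (int(x // cell_size), int(y // cell_size))


def grid_sum(points, cell_size):
    """Sum point values per grid cell. `points` holds (x, y, value) tuples."""
    cells = defaultdict(float)
    for x, y, value in points:
        cells[cell_index(x, y, cell_size)] += value
    return dict(cells)


# Two hypothetical point data sets in the same coordinate system.
population = [(0.2, 0.3, 100), (0.8, 0.1, 50), (1.5, 0.5, 70)]
vehicles = [(0.4, 0.9, 3), (1.2, 0.2, 5), (1.9, 1.7, 1)]

pop_grid = grid_sum(population, cell_size=1.0)
veh_grid = grid_sum(vehicles, cell_size=1.0)

# With a shared grid, the "overlay" is a lookup by cell index, and the number
# of objects is bounded by the number of cells instead of growing with every
# intersection of input geometries.
combined = {
    cell: (pop_grid.get(cell, 0.0), veh_grid.get(cell, 0.0))
    for cell in set(pop_grid) | set(veh_grid)
}
```

The `cell_size` argument is where the resolution caveat shows up: a cell much larger than the input geometry blurs detail, while a cell much smaller implies precision the data do not have.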
ARCGIS TOOLS USED IN DEVELOPMENT OF CHIT
[Figure: ArcGIS Network Analyst toolbar and window. Menu options: New Route, New Service Area, New Closest Facility, New OD Cost Matrix, New Vehicle Routing Problem, New Location-Allocation, Options. The example layer is an OD Cost Matrix with Origins, Destinations, Lines, Point Barriers, Line Barriers, and Polygon Barriers.]
ArcGIS Functions Network Analysis and Service Areas
• If you’ve used a mapping application to find directions to a destination, you have used a network analysis tool
• Network analysis works with data sets having:
  • Geometry representing the system of roadways
  • Attributes representing travel metrics like velocity
• Representation of the roadway network has significant impact on the quality and validity of network analyst solutions
• Network analysis can be used to:
  • Find the fastest route between an origin and destination (as in the fastest route from work to home when there’s heavy highway traffic)
  • Find the closest facility to an origin from a set of available options (as in finding the closest pizza restaurant to your house)
  • Optimize the order of visiting multiple locations on a route (as in optimizing a delivery truck’s daily route)
  • Find the boundary of the area that can be reached from a starting point within a given constraint (as in finding all the wineries you can reach by driving at most 20 minutes from downtown Napa)
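Three of these uses (fastest travel time, closest facility, and service area) can be sketched on a toy network with Dijkstra's shortest-path algorithm. This is an illustration of the concept, not how ArcGIS Network Analyst is implemented: the `ROADS` network, node names, and travel times are hypothetical, and a real service area is a polygon traced along road segments rather than a set of nodes.

```python
import heapq

# Hypothetical road network: node -> list of (neighbor, travel minutes).
ROADS = {
    "A": [("B", 4), ("C", 10)],
    "B": [("A", 4), ("C", 3), ("D", 12)],
    "C": [("A", 10), ("B", 3), ("D", 5)],
    "D": [("B", 12), ("C", 5), ("E", 9)],
    "E": [("D", 9)],
}


def travel_times(network, origin):
    """Dijkstra's algorithm: shortest travel time from origin to each node."""
    best = {origin: 0}
    queue = [(0, origin)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, cost in network[node]:
            candidate = minutes + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


def closest_facility(times, facilities):
    """Pick the reachable facility with the smallest travel time."""
    return min((f for f in facilities if f in times), key=times.get)


def service_area(times, limit):
    """Nodes reachable within `limit` minutes (node-level approximation)."""
    return {node for node, minutes in times.items() if minutes <= limit}


times = travel_times(ROADS, "A")
nearest = closest_facility(times, ["D", "E"])
reachable = service_area(times, 10)
```

Here the direct A to C road takes 10 minutes, but the route through B takes 7, so the quality of the travel-time attributes directly changes which nodes fall inside a service area.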
• Network analyses provide options for highly detailed considerations, if the data are sufficiently known:
  • Elevations of roads, to determine sections that are over/underpasses
  • U-turn capabilities of vehicles and locations of allowable U-turns
  • Curb approaches allowed for vehicles to various locations
  • Delays encountered at traffic lights and in making turns
• Degree of detail to consider depends heavily on data and computing resources, extent of the network analysis, and desired goals of the analysis
ArcGIS Functions Statistical Analysis and Spatial Distributions
• Analysis of spatial data involves an additional dimension(s) beyond non-spatial data
• Patterns that may not be apparent in non-spatial data can be readily evident when spatial arrangement of the data is considered
• Spatial statistics is the study and implementation of statistically analyzing spatial variability and uncertainty
• Can be useful for data interpolation, analysis and identification of trends and patterns, comparison of attributes in data sets, and identification of co-variation between attributes
[Figure: histogram of the 16 data values (1 4 1 3 4 3 3 1 2 3 2 2 1 4 2 4) with bins 1, 2, 3, 4, More]

Location X  Location Y  Value
1           1           1
2           2           4
4           1           1
3           1           3
3           2           4
4           2           3
1           3           3
1           4           1
2           1           2
2           4           3
1           2           2
4           3           2
4           4           1
2           3           4
3           4           2
3           3           4

Average: 2.5
Standard Deviation: 1.15

[Figure: the same values plotted by Location X and Location Y]
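The summary statistics on this slide can be checked directly from the sixteen (Location X, Location Y, Value) rows, and laying the values out by location shows what the histogram hides. A minimal sketch, assuming the standard deviation shown is the sample standard deviation (n - 1 denominator), which is what reproduces 1.15:

```python
import statistics

# (location_x, location_y, value) rows from the slide's table.
rows = [
    (1, 1, 1), (2, 2, 4), (4, 1, 1), (3, 1, 3),
    (3, 2, 4), (4, 2, 3), (1, 3, 3), (1, 4, 1),
    (2, 1, 2), (2, 4, 3), (1, 2, 2), (4, 3, 2),
    (4, 4, 1), (2, 3, 4), (3, 4, 2), (3, 3, 4),
]

values = [value for _, _, value in rows]
mean = statistics.mean(values)
sample_sd = statistics.stdev(values)  # sample standard deviation (n - 1)

# Arrange the same values by location, with Y increasing upward.
grid = {(x, y): value for x, y, value in rows}
layout = [
    " ".join(str(grid[(x, y)]) for x in range(1, 5))
    for y in range(4, 0, -1)
]
```

Each value from 1 to 4 occurs exactly four times, so the histogram is flat, but in `layout` the four 4s occupy the centre of the grid and the four 1s sit at the corners. That spatial structure is only visible once location is taken into account.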
[Figure: ArcGIS menus for spatial statistics.
Geostatistical Analyst, Explore Data: Histogram, Normal QQPlot, Voronoi Map, Trend Analysis, Semivariogram/Covariance Cloud, General QQPlot, Crosscovariance Cloud.
Spatial Statistics toolbox:
Analyzing Patterns: Average Nearest Neighbor, High/Low Clustering (Getis-Ord General G), Incremental Spatial Autocorrelation, Multi-Distance Spatial Cluster Analysis (Ripleys K Function), Spatial Autocorrelation (Morans I).
Mapping Clusters: Cluster and Outlier Analysis (Anselin Local Morans I), Grouping Analysis, Hot Spot Analysis (Getis-Ord Gi*), Optimized Hot Spot Analysis.
Measuring Geographic Distributions: Central Feature, Directional Distribution (Standard Deviational Ellipse), Linear Directional Mean, Mean Center, Median Center, Standard Distance.
Modeling Spatial Relationships: Exploratory Regression, Generate Network Spatial Weights, Generate Spatial Weights Matrix, Geographically Weighted Regression, Ordinary Least Squares.
Rendering; Utilities.]
• ArcGIS provides methods to explore and model spatial relationships and patterns in data attributes
• Key concept is Tobler’s first law of geography:
“Everything is related to everything else, but near things are more related than distant things”
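One common way to quantify Tobler's law is global Moran's I, which is positive when neighboring locations have similar values and negative when they are dissimilar. The sketch below is a simplified, self-contained version for a rectangular grid. It assumes rook adjacency (cells sharing an edge) with a weight of 1 per neighbor pair, whereas the ArcGIS Spatial Autocorrelation tool offers other ways of conceptualizing spatial relationships. The function name and test grids are hypothetical.

```python
def morans_i(grid):
    """Global Moran's I for a rectangular grid with rook (edge-sharing) neighbors.

    `grid` is a list of rows, each a list of numbers. Every neighbor pair
    carries a weight of 1; all other pairs carry a weight of 0.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    values = [value for row in grid for value in row]
    n = len(values)
    mean = sum(values) / n
    variance_sum = sum((value - mean) ** 2 for value in values)

    cross_sum = 0.0   # sum of w_ij * (x_i - mean) * (x_j - mean)
    weight_sum = 0    # sum of all w_ij
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross_sum += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (cross_sum / variance_sum)


# Near things alike: left half ones, right half zeros.
clustered = [[1, 1, 0, 0] for _ in range(4)]
# Near things unlike: alternating values.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
```

Both grids contain eight ones and eight zeros, so their non-spatial summaries are identical; only the arrangement differs, and the statistic separates them.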
• Tobler’s law provides a conceptual framework for finding notable features in the distribution of attributes, like hot spots
• Additionally, it is directly related to predominant methods of interpolation, used to “fill in” data where there are spatial gaps between observed data
• The Inverse Distance Weighted (IDW) interpolation method and its related functions directly assume that interpolated data points are more heavily influenced by nearby data
• As with many statistical modeling methods, care must be taken in defining the parameters of the estimation method(s), since several of them may significantly alter the solution
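The basic IDW estimate is a weighted average in which each observation's weight is the inverse of its distance raised to a power. The sketch below is a minimal version that uses every observation; the function name, the sample points, and the default power of 2 are assumptions for illustration, and the ArcGIS tools add further parameters (such as search neighborhoods) that are not modeled here.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    `samples` holds (sample_x, sample_y, value) tuples. Each weight is
    1 / distance ** power, so nearby observations dominate the estimate.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return float(value)  # exactly on an observation
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical observations on a line.
observed = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]

midpoint = idw(observed, 1.0, 0.0)            # equidistant from both
near_first = idw(observed, 0.5, 0.0)          # closer to the value 1.0
near_first_p1 = idw(observed, 0.5, 0.0, power=1.0)
```

At the same location, changing only the power from 2 to 1 moves the estimate from 1.2 to 1.5, a small demonstration of how a single parameter choice alters the interpolated surface.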
[Figure: worked numerical example of weighted interpolation; the equation layout is not recoverable from the source. Values shown: 0.76, 0.70]
ARCGIS CUSTOM TOOLS
ArcGIS Custom Tools Models
• The term “model” is a broad term used in many disciplines and can mean many different things depending on discipline, context, and individual interpretation
• For example, a model could be
  • A single equation representing the relationship between a dependent and independent variable
  • A set of equations that must be solved simultaneously
  • A statistical regression of the apparent distribution of attributes in a data set
  • An intricate, branching, possibly iterating set of code meant to represent processes at many scales for a complex (or even simple) system
[Figure: ArcGIS toolbox listing, including the custom CHIT_Tools toolbox with the tools: Determine Drive Overlaps; Determine Station Counts 6-Factor Match_Norm; Determine Station Counts 6-Factor Match_Norm_Alt; Filter Priorities by Area; Finalize Priorities; Generate Coverage Factor; Generate Heat Map; Prioritize Heat Map Areas; Reorder Priorities; Statistical Hot Spots]
ArcGIS Custom Tools Custom Tools as a Shareable Analysis Workflow
• Though ArcGIS uses the term “model,” in ARB’s implementation, it may be more appropriate to think of Model Builder as a way to catalog and share analysis in a consistent and repeatable manner
• Model Builder allows an analyst to combine built-in ArcGIS tools (and their own custom tools) to perform tasks that are combinations of these tools
[Figure: dialog window for the custom tool Generate Coverage Factor, showing its input fields alongside the tool help text reproduced below]
Generate Coverage Factor
Generates a map of the calculated coverage factor. This tool is intended to be used prior to Generate Heat Map in order to build one of the key input files.
This tool requires:
1) A feature class containing the results of a service area network analysis with 1, 3, 6, 9, 12, and 15 minute drive times evaluated for all stations of interest. The tool requires this pre-processing as it was found to take much longer in Model Builder than can be achieved interactively in ArcMap.
2) A template feature class within which to store temporary rating data. A file named Test_Join_Temp has been supplied with this tool. It is highly recommended that this file never be altered. Prior to use of this tool, export Test_Join_Temp to a new feature class and specify this as the Test_Join input.
3) This tool should be supplied with a supporting tool called Determine Drive Time Overlaps. This tool must be available for proper execution.
ArcGIS Custom Tools Building Custom Tools from Built-in and Custom Tools
• Creating custom tools also allows for documentation of the process and the opportunity to consistently apply key input values that can affect analyses
Default values for parameters can be stored with custom tools
Values that can be defined by the user at run time are Parameters and indicated by a “P”
ArcGIS Custom Tools Iteration
• At times, an analysis may require the same steps to be performed on every feature in a dataset separately (or every dataset in a database, etc…)
• In these cases, iteration is required; however, ArcGIS does have strict limits on the types of iteration available
• Additionally, each custom tool can only have one iterator, and all operations in the tool will be executed in the iteration
• For this reason, iterative steps are often kept in separate tools and included as a step in other custom tools
In this example, the Sort step must be executed before the field calculation because the expression depends on the index order of features in the data set

    rec = 0
    def autoIncrement():
        global rec
        pStart = 1     # first value assigned
        pInterval = 1  # step between consecutive values
        if rec == 0:
            rec = pStart
        else:
            rec = rec + pInterval
        return rec     # value written to the field for the current feature
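Why the Sort step has to come first can be seen outside ArcGIS with a plain Python stand-in. In this sketch, the feature names, the `score` field, and the helper functions are all hypothetical. The counter plays the role of the auto-increment expression: it hands out 1, 2, 3, … in whatever order the rows are visited.

```python
# Hypothetical features with a priority score; not CHIT's actual schema.
features = [
    {"name": "Area 1", "score": 0.42},
    {"name": "Area 2", "score": 0.91},
    {"name": "Area 3", "score": 0.67},
]


def make_counter(start=1, interval=1):
    """Return a function yielding start, start + interval, ... on each call."""
    current = start - interval

    def next_value():
        nonlocal current
        current += interval
        return current

    return next_value


def assign_rank(rows):
    """Write a sequential rank to each row in the order the rows are visited."""
    counter = make_counter()
    for row in rows:
        row["rank"] = counter()
    return rows


# Without sorting, rank simply mirrors storage order.
unsorted_rows = assign_rank([dict(f) for f in features])

# Sorting by descending score first makes rank 1 the highest score.
sorted_rows = assign_rank(
    sorted((dict(f) for f in features), key=lambda f: f["score"], reverse=True)
)
ranks = {row["name"]: row["rank"] for row in sorted_rows}
```

In the unsorted case "Area 1" receives rank 1 despite having the lowest score, which is the failure the required ordering of the Sort step prevents.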
ArcGIS Custom Tools Controlling Execution Flow
• Unlike other development languages or environments, ArcGIS does not initially assume the visual structure of the model indicates the intended order of execution
• For models that require sequential steps, preconditions must be put in place to indicate which steps must finish before another can begin
Preconditions are indicated with a dashed line in the Model Builder interface
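Conceptually, preconditions turn a set of steps into a dependency graph, and any valid execution order is a topological ordering of that graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to illustrate the idea; the step names are hypothetical, and this is an analogy rather than a description of how Model Builder schedules tools internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model steps. Each step maps to the set of steps that must
# finish before it can begin (its preconditions).
preconditions = {
    "Sort features": set(),
    "Calculate rank field": {"Sort features"},
    "Filter priorities": {"Calculate rank field"},
    "Export map": {"Filter priorities", "Sort features"},
}

# static_order() yields the steps so that every precondition comes first.
order = list(TopologicalSorter(preconditions).static_order())
```

Without the explicit dependencies, nothing in the data structure itself says that sorting must precede the field calculation, just as the visual layout of a model does not by itself fix the execution order.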
CENSUS AND DMV DATA GEOGRAPHIES
Data Source Geographies Census Data
• Census Data are available from American Fact Finder: https://factfinder.census.gov
• American Fact Finder provides data from multiple datasets
  • Selection of dataset determined by geographic scale, attributes desired, and statistical certainty desired
  • CHIT utilizes the American Community Survey (ACS), chosen for being a program with continual, annual updates and the richness of the attribute data available
• ACS data are available on a variety of geographic scales. Choosing the right scale depends on the intent.
• Block: The fundamental unit in census data sources and the only unit for which 100% data are published and collected (not just representative samples). Represented as points in the census data.
• Block Group: The smallest geography for which representative sample data are published. Represented as polygons.
• Tract: Roughly equivalent to a neighborhood; largely constant geography through time but may change. Represented as polygons.
• County, State, etc…
• Others: County Subdivision, Region, Urbanized Area, Metropolitan Statistical Area, Legislative Districts, etc…
• Blocks: 410,559 in California
  • Most limited in attributes available; ARB considered best used for populations only
• Block Groups: 23,212 in California
  • Typically 600 to 3,000 people
  • Income data aggregates limited to:
    • Aggregate (Sum), Median
  • Average area: ~7 sq mi
• Tracts: 8,057 in California
  • Typically 1,200 to 8,000 people; optimum 4,000
  • Income data available in a variety of forms
    • Aggregate (Sum), Median, Mean, Quintile Mean, Quintile Bounds, Quintile Share
    • Quintile data includes top 5%
  • Average area: ~20 sq mi
• Counts in the form of aggregate sums or means are useful as directly usable data points when other estimations do not need to be made
  • Cannot use aggregates alone to develop an understanding of the shape of the population’s distribution of an attribute
    • Ex: The mean of income alone cannot help determine the cutoff for the top 20% without significant additional assumptions
• Descriptive statistic attributes beyond counts allow for statistical modeling, interpolation, and extrapolation
  • As the afternoon session will show, this was important for implementing income data in particular
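The point about the mean and the top 20% can be made concrete with two made-up populations that share the same mean income. The incomes below are hypothetical values in arbitrary units, and `top_share_cutoff` is an illustrative helper, not a Census or CHIT definition.

```python
import math
import statistics


def top_share_cutoff(incomes, share=0.20):
    """Lowest income among the top `share` of the population."""
    ranked = sorted(incomes, reverse=True)
    count = max(1, math.ceil(share * len(ranked)))
    return ranked[count - 1]


# Two hypothetical populations of ten households each.
even = [50] * 10                 # everyone earns the same
skewed = [25] * 8 + [100, 200]   # most earn little, two earn much more

mean_even = statistics.mean(even)
mean_skewed = statistics.mean(skewed)
cutoff_even = top_share_cutoff(even)
cutoff_skewed = top_share_cutoff(skewed)
```

Both populations have a mean of 50, yet the top-20% cutoff is 50 in one and 100 in the other. Additional descriptive statistics, such as the quintile data available at the tract level, are what constrain the shape of the distribution.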
Data Source Geographies Department of Motor Vehicles Data
• DMV data are based on registrations of vehicles
  • For tools or analyses that can remain confidential, data could be analyzed at resolutions down to the street address (point resolution even finer than Census blocks)
  • CHIT is intended to be public and transparent; non-confidential data are available from DMV only at the ZIP code level
• ZIP codes are often conceptualized as polygons
• However, ZIP codes are actually defined by a set of routes
• Defining a bounding polygon is not a trivial matter and can have multiple equivalently valid solutions
• US Census has developed a standardized set of polygons (ZCTA: ZIP Code Tabulation Area) that ARB has adopted for use in CHIT to define ZIP codes
[Figure: set of routes/streets served by a single ZIP code]
• 1,808 ZIP Codes represented in DMV data
• ZCTAs are typically larger than tracts, though may be similar in size in urban areas
Data Source Geographies Summary of Spatial Bases Used
• Population:
  • Block (Highest Resolution Available)
• Income:
  • Tract used for modeling of distribution of income and education
• DMV Data:
  • ZIP codes (as represented by ZCTAs)
• OEM Surveys:
  • Counties or Statewide
Increasing size leads to greater statistical certainty and lower resolution.
Data Source Geographies ACS Multi-Year Estimates
• ACS data are provided in 1-, 3-, and 5-year aggregates
• Aggregating years allows for inclusion of more samples, which increases statistical certainty
• For similar reasons, the Census provides aggregated data on increasingly large geometries as fewer years are included
• CHIT prioritizes certainty over currency
Including more years leads to greater statistical certainty, but less current data.
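The certainty gain from pooling years follows from the way the standard error of a mean shrinks with sample size. The sketch below is a deliberate simplification: it assumes independent simple random samples of equal size each year and a fixed spread, which is not the actual ACS estimation methodology, and the sample size and standard deviation are hypothetical numbers.

```python
import math


def standard_error(sample_sd, sample_size):
    """Standard error of a sample mean under simple random sampling."""
    return sample_sd / math.sqrt(sample_size)


ANNUAL_SAMPLE = 400      # hypothetical responses per year in one geography
SAMPLE_SD = 20_000.0     # hypothetical spread of the attribute (e.g. income)

# Pooling 1, 3, or 5 years multiplies the available sample accordingly.
errors = {
    years: standard_error(SAMPLE_SD, ANNUAL_SAMPLE * years)
    for years in (1, 3, 5)
}
```

Under these assumptions a 5-year pool cuts the standard error by a factor of √5 (a little over 2) relative to a single year, at the cost of describing an average over a longer and older period.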
Wrap-Up
• Covered a variety of topics that will be instrumental in the main technical formulation discussion and further exploration of CHIT
• Brief introduction and review of CHIT and AB 8 process
• Introduction to GIS fundamentals
• ArcGIS tools utilized in formulation of CHIT
• Creating custom tools in ArcGIS
• Working with geographies in Census and DMV data sources
• What are some of the major built-in ArcGIS tools used in CHIT and what do they accomplish?
• What are the differences in the spatial structure of the data sources and how are they reconciled?
• How does ArcGIS assess spatial distribution of data? What statistical analyses are applied?
• How is a tool like CHIT made in ArcGIS?
DISCUSSION
For questions or comments, contact: Andrew Martinez (916) 322-8449 [email protected]