ENVR 468 Temporal GIS

Assignment on Address Geocoding

What is Address Geocoding?

Address Geocoding, Address Matching, or “geocoding” is the process of creating locational points (i.e. Latitude/Longitude Coordinates) from data that only contains addresses. There are different levels of geocoding accuracy such as geocoding to an address vs. geocoding to zip code. Address geocoding compares the given addresses with the Reference File, which contains known addresses and their locations.
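The comparison against a Reference File can be pictured as a table lookup. The following is only a minimal sketch of that idea, not how ArcGIS implements it: the REFERENCE table, its coordinates, and the normalize and geocode helpers are all hypothetical and made up for illustration.

```python
# Hypothetical reference file: known (address, zip code) pairs mapped to
# made-up (latitude, longitude) coordinates.
REFERENCE = {
    ("100 MAIN ST", "27514"): (35.9132, -79.0558),
    ("200 OAK AVE", "27701"): (35.9940, -78.8986),
}


def normalize(address):
    """Uppercase and collapse whitespace so trivially different spellings match."""
    return " ".join(address.upper().split())


def geocode(address, zip_code):
    """Return (lat, lon) for a known address, or None when it is unmatched."""
    return REFERENCE.get((normalize(address), zip_code))


print(geocode("100  main st", "27514"))    # found in the reference file
print(geocode("999 Nowhere Rd", "27514"))  # not in the reference file
```

An address that is missing from the reference table comes back as None, which corresponds to an unmatched address in the geocoding results.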

What are its uses?

It is used when spatial analysis is desired, but the locations are not provided. It is used in medical, epidemiological, geographical, and environmental studies on a regular basis.

In this assignment you will do address geocoding:
1) using a zip code address locator,
2) using a street address locator,
3) using a point address locator, and
4) using a composite address locator.

Geocoding Addresses for a study on Dookitifi: Background: Dookitifi is a condition found in a small fraction of patients seen in Orange and Durham counties.

Address Geocoding Part 1: Using a zip code address locator

1) Download the file ‘13b_GeocodingFiles.zip’ from the class website and then unzip in a local folder named ‘D:\temp\Geocoding’

2) Open ArcMap, click on “Add Data”, navigate to the ‘D:\temp\Geocoding’ folder, and select all in order to add the following disease data table and supporting shapefiles to your map.

a. DISEASE.dbf (Table of patients’ addresses and Dookitifi condition)
b. Orange_Durham_County_Polygon.shp (Boundaries of Orange and Durham counties)
c. Orange_Durham_DOT_Line.shp (Road lines from the NC Department of Transportation (DOT))
d. Orange_Durham_Point.shp (Building points from Orange and Durham counties)
e. Orange_Durham_Zip_Codes.shp (Centroid points of zip codes)
f. UNC_Duke.shp (Location of the UNC and Duke universities)

3) Turn off the Orange_Durham_DOT_Line.shp and Orange_Durham_Point.shp shapefiles, display the zip code points with large dots, and display the UNC and Duke universities with large crosses (you can also label them).

4) Right click on the DISEASE table and select ‘Open’. This table is entirely made up. For (randomly generated) patients the table lists their zip code, street address, and Dookitifi condition. For example patient 1 has the Dookitifi condition while patients 2-4 do not.

5) Go to Customize > Toolbars > Geocoding. Dock the geocoding bar to your ArcMap window.

6) Go to the geocoding bar, click on the first pull down (‘Select Address Locator’), and select ‘<Manage Address Locators…>’. You will see some generic address locators, but they may not work for you so you will have to create your own address locator for NC.

7) Create your own address locator based on zip code centroids. You can do this either in ArcCatalog, by right clicking on your working folder (‘D:\temp\Geocoding’) and selecting New > ‘Address Locator…’, or in ArcToolBox, by selecting Geocoding Tools > Create Address Locator.

8) Once the ‘Create Address Locator’ window opens, fill in the fields as follows:
a. Address Locator Style: Select ‘US Address - ZIP 5-Digit’
b. Reference Data: Select Orange_Durham_Zip_Codes.shp as the Primary Table
c. Field Map: Keep the defaults (ZIP=‘ZIP’, etc.)
d. Output Address Locator: D:\temp\Geocoding\zipcode_Loc

9) Click on OK to create the ‘zipcode_Loc’ address locator. This takes a few moments. Once the address locator is created it will be listed in ArcCatalog and in the Address Locator Manager.

a. In ArcCatalog, refresh the working directory to see zipcode_Loc listed.

b. In ArcMap, go to the geocoding bar, click on ‘Select Address Locator’, and select ‘<Manage Address Locators…>’ to see zipcode_Loc listed as the default address locator.

10) You will now geocode the patients’ addresses using their zip codes. In ArcMap find the DISEASE table, right click on it, and select ‘Geocode Addresses…’. Alternatively you can click on the Geocode Addresses icon on the geocoding bar. First the Address Locator Manager will open (see above). Select the ‘zipcode_Loc’ address locator and click “OK”. This opens the Geocode Addresses dialog box. Fill in the fields as follows:

a. Address Table: Select the DISEASE table
b. Address Input Fields: Single Field, ZIPcode = ‘ZIP_CODE’
c. Output shapefile or feature class: ‘D:\temp\Geocoding\Geocoding_Result_zip.shp’

d. Click on ‘Geocoding Options’, set Minimum candidate score = 10, and check all of the boxes under the Output Fields.

e. Click OK twice. ArcGIS will then begin the geocoding process and show the results.

f. After you review the results, click Close.

g. View the newly geocoded points. All the patients were located to the centroid of their zip code. Note that some zip codes have no patients, while other zip codes may have more than one patient (see exercise 1).

Exercise 1. Find if there is a spatial relationship in the number of patients with the Dookitifi condition after geocoding to the Zip Code level.

1. Summarize the DISEASE table by Zip Code. Under Dookitifi, check Sum. Save the output as DISEASEbyZip.

2. Geocode DISEASEbyZip using the zip code address locator. Save the results as Geocoding_Result_zipSums.shp.

3. Display the properties using Symbology and Proportional symbols to show the sum of the Dookitifi counts.

4. Is there a spatial pattern?
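The Summarize step in Exercise 1 amounts to grouping the table by zip code and computing a patient count and a Dookitifi sum per group. A minimal sketch of that grouping is given below. It assumes the Dookitifi field is coded 0/1, and the rows shown are hypothetical stand-ins for the DISEASE table, not its actual contents.

```python
from collections import defaultdict

# Hypothetical rows standing in for the DISEASE table: (zip code, Dookitifi flag).
# The 0/1 coding of the flag is an assumption made for this sketch.
rows = [("27514", 1), ("27514", 0), ("27701", 1), ("27701", 1), ("27705", 0)]


def summarize_by_zip(rows):
    """Group rows by zip code, counting patients and summing the Dookitifi flag."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for zip_code, flag in rows:
        summary[zip_code]["count"] += 1
        summary[zip_code]["sum"] += flag
    return dict(summary)


by_zip = summarize_by_zip(rows)
for zip_code in sorted(by_zip):
    print(zip_code, by_zip[zip_code])
```

Each zip code ends up with one record, which is why the summarized table can be geocoded with the zip code locator and drawn with one proportional symbol per centroid.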

Address Geocoding Part 2: Using a street address locator

1) Create a street address locator: Click on ArcToolBox > Geocoding Tools > Create Address Locator. Fill in the fields as follows:

a. Address Locator Style: Open the Select Address Locator Style window and select ‘US Address - Street Name’

b. Reference Data: Orange_Durham_DOT_Line.shp (Primary Table)
c. Field Map: Street Name=‘Rd_Name’
d. Output Address Locator: D:\temp\Geocoding\street_Loc

2) Geocode addresses using your new address locator: Right click on the DISEASE table, select ‘Geocode Addresses…’, select your new ‘street_Loc’ address locator, under Address Input Fields set ZIP Code = ZIP_CODE, set your Output shapefile to D:\temp\Geocoding\Geocoding_Result_street.shp. Under ‘Geocoding Options’, set Minimum candidate score = 10 and check all of the boxes under the Output Fields.

3) Click OK and view the results. The unmatched rate has gone up. Why? Click on Close and open the attribute table for the new shapefile (Geocoding_Result_street.shp). The Status field uses the codes M=Matched, T=Tied, U=Unmatched.

4) View the newly geocoded points. The geocoding improved because each address is located based on its street address, as opposed to its zip code. As a result the geocoding is more specific and accurate. The increased specificity of address geocoding is the reason why there are more unmatched addresses. Hence increased accuracy comes at the cost of a higher unmatched rate.

5) Turn on the Orange_Durham_DOT_Line.shp shapefile, and zoom in close to Duke University so you can see Lewis Street and Anderson Street. Click on Add Basemap and select the OpenStreetMap. While street geocoding is better than zip code geocoding, we can see that the geocoded locations are restricted along the streets. This is not where people actually live. Hence the accuracy of street geocoding can still be further improved. We will do this next by using a point address locator.
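The matched, tied, and unmatched rates discussed in this part can be tallied from the status codes in the result attribute table. The sketch below assumes the codes have already been exported to a plain Python list; the list shown is hypothetical, and match_rates is an illustrative helper rather than an ArcGIS function.

```python
from collections import Counter


def match_rates(statuses):
    """Return the fraction of records with each status code (M, T, U)."""
    total = len(statuses)
    if total == 0:
        # No records: report zero rates rather than dividing by zero.
        return {"M": 0.0, "T": 0.0, "U": 0.0}
    counts = Counter(statuses)
    return {code: counts.get(code, 0) / total for code in ("M", "T", "U")}


# Hypothetical status codes copied from a geocoding result table.
statuses = ["M", "M", "U", "T", "M", "U", "M", "M"]
rates = match_rates(statuses)
for code, rate in rates.items():
    print(code, f"{rate:.1%}")
```

Running the same tally on the zip code and street results gives a direct way to compare their unmatched rates.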

Address Geocoding Part 3: Using a point address locator

1) Display the Orange_Durham_Point shapefile. This shapefile contains the actual location of buildings.

2) Create a point address locator using the Orange_Durham_Point shapefile: Click on ArcToolBox > Geocoding Tools > Create Address Locator. Fill in the fields as follows:

a. Address Locator Style: Select ‘US Address - Single House’
b. Reference Data: Orange_Durham_Point.shp. Role: Primary Table
c. Field Map: House Number=‘ADDRESS_NU’, Street Name=‘STR_NAME’, ZIP Code=‘ZIP_CODE’
d. Output Address Locator: D:\temp\Geocoding\point_Loc

3) Geocode addresses using your new address locator: Right click on the DISEASE table, select ‘Geocode Addresses…’, select your new ‘point_Loc’ address locator, under Address Input Fields set ZIP Code = ZIP_CODE, set your Output shapefile to D:\temp\Geocoding\Geocoding_Result_point.shp. Under ‘Geocoding Options’, set Minimum candidate score = 10 and check all of the boxes under the Output Fields.

4) Click OK and view the results. Click on Close and open the attribute table for the new shapefile (Geocoding_Result_point.shp).

5) View the newly geocoded points. The geocoding improved because each address is located based on actual building location.

Address Geocoding Part 4: Using a composite address locator

We have three address locators. The point address locator is the most accurate, but no match is found for several addresses. The street address locator is slightly less accurate, but it may be able to geocode the addresses that are unmatched by the point address locator. Finally, the zip code locator is much less accurate, but it may match the addresses that are not geocoded by the point or street address locators.

What is a good solution? Multistage, or composite, geocoding.

1) Click on ArcToolBox > Geocoding Tools > Create Composite Address Locator. Select the following address locators: point_Loc, street_Loc, zipcode_Loc. Save the locator as composite_Loc.

2) Geocode the DISEASE table using the composite_Loc address locator. The geocoding result is a composite of point and street locations.
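The multistage idea can be sketched as a fallback chain: try the most accurate locator first and move to the next one only when no match is found. The code below is an illustration under stated assumptions, not the ArcGIS implementation. The three lookup tables, their coordinates, and the record layout (number, street, zip) are hypothetical.

```python
# Hypothetical reference data for each locator, with made-up coordinates.
POINTS = {("100", "MAIN ST", "27514"): (35.9101, -79.0502)}
STREETS = {
    ("MAIN ST", "27514"): (35.9110, -79.0510),
    ("OAK AVE", "27701"): (35.9950, -78.9000),
}
ZIPS = {"27514": (35.92, -79.04), "27701": (36.00, -78.90), "27705": (36.02, -78.95)}


def point_loc(rec):
    """Most specific: house number, street name, and zip code must all match."""
    return POINTS.get((rec["number"], rec["street"], rec["zip"]))


def street_loc(rec):
    """Less specific: street name and zip code."""
    return STREETS.get((rec["street"], rec["zip"]))


def zip_loc(rec):
    """Least specific: zip code centroid."""
    return ZIPS.get(rec["zip"])


# Order matters: the composite tries the most accurate locator first.
LOCATORS = [("point_Loc", point_loc), ("street_Loc", street_loc), ("zipcode_Loc", zip_loc)]


def composite_geocode(rec):
    """Return (locator name, location) from the first locator that matches."""
    for name, locate in LOCATORS:
        location = locate(rec)
        if location is not None:
            return name, location
    return None, None


print(composite_geocode({"number": "100", "street": "MAIN ST", "zip": "27514"}))
print(composite_geocode({"number": "7", "street": "OAK AVE", "zip": "27701"}))
print(composite_geocode({"number": "9", "street": "ELM RD", "zip": "27705"}))
```

Returning the name of the locator that produced each match keeps track of how accurate each geocoded location is likely to be.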

Exercise 2. Find if there is a spatial relationship in the patients with the Dookitifi condition after geocoding to the point and street level.

Use the symbology to show the patients geocoded using the composite address locator as either having or not having the Dookitifi condition. Is there a spatial relationship?

Exercise 3: Find if there is a spatial relationship in the Dookitifi prevalence amongst patients after geocoding to the Zip Code level.

Patients with the Dookitifi condition are simply patients who are Duke students. The map created in exercise 1, which displayed the sum of patients who are Duke students, shows that there is a greater number of patients who are Duke students near Duke University. This makes sense since students prefer to live near campus. This spatial relationship was also clear in the map created in exercise 2, which showed that the proportion of patients who are Duke students is greater near Duke University.

Next, in this exercise you are asked to display the Geocoding_Result_zipSums.shp shapefile again (the one you created in Exercise 1), but this time show the spatial distribution of the prevalence of Duke students amongst patients (i.e. Properties > Symbology > Proportional symbols, Value=Sum_DOOKIT, Normalization=Cnt_ZIP_CO). Why is the spatial relationship no longer clear? Research the small number problem that arises when mapping health outcomes with a small denominator (here the number of patients is the denominator).
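The normalization in this exercise divides the number of cases by the number of patients in each zip code. A small sketch of that ratio, using hypothetical counts, shows why small denominators make the map unstable: adding a single patient without the condition barely moves the prevalence in a well-populated zip code, but halves it in a zip code with one patient.

```python
def prevalence(cases, patients):
    """Cases divided by patients, or None when there are no patients."""
    if patients == 0:
        return None
    return cases / patients


# Hypothetical (cases, patients) counts per zip code, for illustration only.
counts = {"27705": (12, 40), "27514": (1, 1), "27278": (0, 2)}

for zip_code, (cases, patients) in counts.items():
    before = prevalence(cases, patients)
    # Add one more patient without the condition and recompute.
    after = prevalence(cases, patients + 1)
    print(zip_code, f"before={before:.3f}", f"after={after:.3f}")
```

Zip codes with only a handful of patients can therefore show extreme prevalence values that reflect chance rather than a real spatial pattern.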

Additional Geocoding Resources

1) The ArcGIS 10.0 streetmap address locator:

ESRI provided a US-wide street address locator up until ArcGIS 10 (this will be discontinued with the transition to ArcGIS Online). The ArcGIS 10 streetmap address locator is available at \\afs\data\gis\esri\dm13\streetmap_na\data\Street_Addresses_US.loc. This address locator is very large (>2GB).

Geocoding assignment: Right click on the DISEASE table, select ‘Geocode Addresses…’, click on Add…, navigate to the \\afs\data\gis\esri\dm13\streetmap_na\data\ folder, select Street_Addresses_US.loc, and click OK. Fill in the Geocode Addresses window and click on OK. The geocoding will take a long time.

2) The ArcGIS online address locator:

This address locator is only available with an ArcGIS Online account. Request one from the UNC GIS librarian (go to GIS & Statistical Software, click on the ‘GIS software’ tab, and read the instructions for “ArcGIS Online”).

Geocoding assignment: Right click on the DISEASE table, select ‘Geocode Addresses…’, select the ‘World Geocode Service (ArcGIS Online)’, and click OK. Fill in the Geocode Addresses window and click on OK. The geocoding is extremely fast.

3) NC Master Addresses

http://www.cgia.state.nc.us/Services/NCMasterAddress.aspx

4) ESRI ArcGIS Help

http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/What_is_geocoding/002500000001000000/