
Custom Geocoding Filled Areas in Tableau

tabgeohack.exe Version 1.0

Caveat

The tabgeohack utility is, as its name suggests, an unsupported hack. The utility allows the creation of custom geocoding roles in Tableau with associated filled areas. It does this by extending the database schema in Tableau’s custom geocoding database to hold geometry and populating the additional columns from spatial data files.

It is unsupported in several senses.

If it doesn't work as you expect, there's no guarantee it will ever get fixed. Best endeavours, if I'm interested and not too busy, that sort of thing.

If you have any problems with a workbook that uses this approach, don’t even think about asking Tableau for help until you remove the custom geocoding. (I have no idea what Tableau’s attitude would be, but I know what mine would be if I were them.)

It is virtually certain that a future release of Tableau will change how geocoding works in some way that will stop this from working altogether – simply because this approach relies on very specific (and unpublished!) details of the internal structure of the geocoding database. That is bound to change at some point. Hopefully any release that changes things in this way will also include adding support for similar extensibility capabilities, but there’s absolutely no telling.

It uses an open source GIS library - and at least one of the features I'm using (simplification of complex shapes) doesn't work as well as I'd like - but there's nothing I can do about it.

The implications of all of this are clear: don’t use it for anything which you care about. In particular don’t use it for anything which needs to keep working beyond the next release of Tableau. Personally, I intend to use it for point-in time, throw-away analysis: blog posts and the like, and also to explore how this sort of capability would be useful if it were a supported part of the product. I strongly suggest you limit your use similarly.

You have been warned.

Overview

The utility takes one or more spatial data files containing polygon data, transforms them to an appropriate geographic coordinate reference system for Tableau to use (i.e. to lat/long coordinates) and generates CSV files in the format needed for creating custom geocoding roles. After the custom geocoding has been imported to Tableau the utility is run again to insert the polygon boundary data into the custom geocoding database.

The source spatial data can (in principle) be in any spatial data format supported by the Geographic Data Abstraction Library (GDAL, an open source GIS library). I say in principle because I’ve only tested with ESRI shape files and a handful of other formats, but I see no reason why it shouldn’t work with anything the GDAL utilities can understand (which is an extensive list).

The utility also supports purging of unneeded geographic roles from the resulting custom geocoding database. Reducing the size of the database in this way can improve performance and also reduces the size of any packaged workbooks (which minimises the use of Tableau Public quota when publishing the workbook).

One of the key factors which determines the viability and usability of the resulting geocoding database is the number and complexity of loaded shapes. Too many shapes, or shapes that are too complex, can lead to very poor performance or even an out of memory error - it can simply take Tableau outside the envelope it is designed for. To help ensure you don't overload it, the utility provides the option to simplify the boundaries of the shapes using the GDAL library and also to display statistics about the complexity, which help in deciding the appropriate simplification settings. However, don’t expect too much. Simplification of spatial data is a notoriously difficult task and can often lead to anomalies and artefacts in the simplified data (such as missing or overlapping “slivers” at the boundaries of adjoining shapes).

For example, the two screenshots below show a sample of New Zealand electoral boundaries simplified with a tolerance of 1,000 metres (left) and 100 metres (right). The original shape file with no simplification results in almost 600,000 boundary points being loaded. Simplifying at 1,000 metre tolerance reduces that to 3,000 boundary points (a factor of 200), which makes the view much more responsive, but introduces a lot of error. At 100 metres tolerance the number of points is around 16,000 (a factor of 40 on the original), which still allows the view to respond quickly whilst also retaining acceptable accuracy.

Finding the best compromise between simplicity (and hence performance) and accuracy can involve a lot of trial and error. Getting satisfactory results may require manual intervention using a GIS tool. It can be particularly difficult if there is a wide range of sizes of shapes in the one file, since the same level of simplification has to apply to the whole file.
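To give a feel for what a simplification tolerance does, here is a minimal sketch of the classic Douglas-Peucker line simplification in plain Python. It is only an illustration of the general technique, not the code the utility or GDAL actually runs; the function names and the sample boundary are made up for the example. The tolerance is in the same units as the coordinates, and a larger tolerance discards more points.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, in planar coordinates."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment (e.g. a closed ring whose ends coincide).
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: keep only points further than `tolerance` from the
    straight line joining the points kept on either side of them."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the shared point once


# Hypothetical boundary line with coordinates in metres.
boundary = [(0, 0), (100, 5), (200, -4), (300, 150), (400, 3), (500, 0)]
coarse = simplify(boundary, 1000)
fine = simplify(boundary, 10)
print(len(boundary), "points originally,", len(fine), "at 10 m,", len(coarse), "at 1,000 m")
```

Because each line is simplified on its own, two adjoining shapes can end up keeping different points along their shared boundary, which is one way the “slivers” mentioned above arise.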

The utility is run from a DOS command line, driven by a configuration file in YAML format (YAML is a structured text format). It has no user interface. This means it’s not easy to use unless you understand what is going on. A sample configuration file and associated shape files are provided with the utility. These illustrate most of the key features and allow the whole process to be run, in order to get familiar with how it works.

Summary of Commands

The tabgeohack utility has the following syntax. The options are described briefly below.

Usage: tabgeohack.pl [options] <config_file>
OPTIONS:
    --info                display shape file metadata
    --roles               generate custom geocoding CSV files
    --shapes              load custom shapes into geocoding D/B
    --assign <twb_file>   assign custom geocoding instance to workbook
    --analyse             display summary statistics for all geometry in D/B
    --activate            activate the processed custom geocoding database
                          for this configuration
    --revert              restore the custom geocoding database to the
                          unprocessed state
    --version             display version number and exit

The configuration file <config_file> is a YAML file.

--info

The --info option runs the GDAL ogrinfo command for each shape file referenced in the configuration file. This displays summary information about the file (number of features, geographic area extent, coordinate reference system used, units of measurement) and also displays a list of metadata attributes contained.

--roles

The --roles option parses the shape files specified for each role, calculates the location of the centroid of each shape and generates a CSV file for each role containing specified identifying fields from the shape file plus the latitude and longitude of the centroid. These files are created in a single directory, in the format needed to import the roles into Tableau as custom geocoding (initially without any associated shapes).

Optionally, an additional file of feature metadata extracted from the shape files can be generated for each geographic role.

The input shape files are transformed to an appropriate geographic (lat/lon) coordinate reference system, if necessary.
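As a rough illustration of the centroid step, the sketch below computes the area-weighted centroid of a single polygon ring with the standard shoelace formula and writes one CSV row for it. This is an assumption-laden sketch, not the utility's own code: the function name, the sample ring and the column layout are invented for the example, and treating lat/long degrees as planar coordinates is only an approximation (adequate for small shapes).

```python
import csv
import io


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of (x, y) pairs."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical meshblock outline as (longitude, latitude) pairs.
ring = [(174.0, -41.0), (174.2, -41.0), (174.2, -41.1), (174.0, -41.1)]
lon, lat = polygon_centroid(ring)

# Hypothetical column layout: an identifying field plus the centroid.
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["Meshblock Code", "Latitude", "Longitude"])
writer.writerow(["1234567", round(lat, 6), round(lon, 6)])
print(output.getvalue())
```

Note that the centroid of a concave shape can fall outside the shape itself, which matters only for where a point mark would be drawn.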

--shapes

The --shapes option modifies the schema for the tables supporting the newly created roles in the custom geocoding database to accommodate the associated shape data. It then inserts the shapes into the database. Optionally, the shapes can be “simplified”, reducing the number of points per boundary line. This reduces the size of the custom geocoding database and improves performance, at the cost of loss of accuracy.

It then optionally purges any unneeded geocoding details (the custom geocoding database includes a full copy of all geocoding data supplied with Tableau). Purging the unneeded data in this way can improve performance (if only a subset of the data for a particular role is needed for the particular analysis) and also allows the geocoding database to be made much smaller, which reduces the size of resulting packaged workbooks and saves quota on Tableau Public used by any workbooks published there.

Finally it compresses the custom geocoding database and saves a copy which can be referenced even after the “current” custom geocoding has been replaced with a different set. This avoids the need to keep re-generating custom geocoding when swapping between workbooks requiring different custom geocoding.

--assign

The --assign option associates a Tableau workbook with the saved instance of the geocoding database specified in the given configuration file. This can be extremely useful for switching to and fro between different custom geocoding instances, without having to keep copying the files back into the standard location in the repository.

Unfortunately, this assignment is not retained when the workbook is saved, so the only way to keep the assignment is to save the workbook as a packaged workbook (which actually embeds a compressed copy of the custom geocoding database in the packaged workbook).
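A packaged workbook (.twbx) is an ordinary zip archive, so it is easy to check what has been embedded and how much space it takes. The sketch below lists the members of an archive with their sizes. So that it runs on its own, it builds a small stand-in archive first; the member names in that stand-in are invented and do not reflect the real layout of a packaged workbook.

```python
import os
import tempfile
import zipfile


def list_packaged_workbook(path):
    """Return (name, compressed_size, uncompressed_size) for each member."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.compress_size, info.file_size)
                for info in archive.infolist()]


def build_stand_in(path):
    """Create a stand-in archive; the member names are made up for the demo."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("Tsunami.twb", "<workbook/>")
        archive.writestr("geocoding/placeholder.bin", b"\0" * 10000)


with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "Tsunami.twbx")
    build_stand_in(demo_path)
    members = list_packaged_workbook(demo_path)

for name, packed, unpacked in members:
    print(f"{name}: {unpacked} bytes ({packed} compressed)")
```

Pointing list_packaged_workbook at a real packaged workbook shows how much of its size is down to the embedded geocoding database, which is a quick way to see whether purging was worthwhile.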

--analyse

The --analyse option displays summary statistics for the numbers of shapes and the numbers of boundary points for all geographic roles with shape data. This can be useful in determining the level of simplification needed.

--activate

The --activate option switches the current custom geocoding to use the saved copy of the processed geocoding database associated with the specified configuration file. This is an alternative way of switching to and fro between different geocoding instances.

--revert

The --revert option switches the current custom geocoding to use the saved copy of the unprocessed geocoding database associated with the specified configuration file (i.e. the version after importing custom geocoding with Tableau, but before inserting shape data with the --shapes option).

This can be extremely useful when experimenting with different levels of simplification. Simply run the --shapes option with one simplification setting and examine the results, then change the simplification setting in the configuration file, use --revert to return to the unprocessed geocoding database and run --shapes again to import shapes with the new simplification level.
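The revert-and-reload cycle can be scripted. The sketch below is one hypothetical way to do it from Python: it builds the three commands (using only options listed in the usage summary) and, by default, just prints them rather than running them. It assumes tabgeohack is on the PATH and that you have already edited the simplification setting in the configuration file by hand; the helper names are invented for the example.

```python
import subprocess


def experiment_commands(config_file, executable="tabgeohack"):
    """Commands for one round: revert, reload shapes, then show statistics."""
    return [
        [executable, "--revert", config_file],
        [executable, "--shapes", config_file],
        [executable, "--analyse", config_file],
    ]


def run_experiment(config_file, dry_run=True):
    """Print the commands, or run them in order if dry_run is False."""
    for command in experiment_commands(config_file):
        if dry_run:
            # list2cmdline quotes arguments containing spaces, Windows-style.
            print(subprocess.list2cmdline(command))
        else:
            subprocess.run(command, check=True)  # stop at the first failure


run_experiment("Porirua Tsunami Warnings.yml")
```

Remember to close Tableau before running the commands for real, as in the worked example later in this document.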

--version

The --version option simply displays the version number.

Installation Instructions

1) Download TabGeoHack.zip from here:

http://dl.dropbox.com/u/59458890/TabGeoHack.zip

and save it on a local drive.

The default location specified in the sample configuration files included is C:\Data\Tableau – putting it there means making fewer changes to the configuration files.

2) Unzip the file, which will create a subdirectory of “TabGeoHack”, containing the utility plus a couple of Firebird utilities (needed for accessing the geocoding database) and some other components.

There are also two sub-directories: “Sample” is exactly what it says, “gdal” is the suggested location to install the required GDAL utilities (step 4).

3) Add the location where you have installed the utility to the PATH (edit system environment variables from control panel), e.g.:

“C:\Data\Tableau\TabGeoHack”

4) Download version 1.9 of GDAL (Geographic Data Abstraction Library) and save it in an appropriate location (such as the gdal directory under TabGeoHack). The current stable release of GDAL and a nightly build of the latest version are available at GISINTERNALS. I have been using release-1600-gdal-1-9-0-mapserver-6-0-1 (choose the zip file containing all components).

As the GISINTERNALS site often seems to be unavailable, I’ve put a copy of the version I’ve been using in my Dropbox account, here:

http://dl.dropbox.com/u/59458890/release-1600-gdal-1-9-0-mapserver-6-0-1.zip

5) Unzip the GDAL package.

Running GDAL components requires various directories to be on the path and other environment variables to be set. This is done automatically by tabgeohack, based on a setting in the configuration file (step 6).

If you want to run the GDAL components standalone as well as from tabgeohack, you’ll need to set these up yourself. The GDAL site has some fairly confusing instructions, reached via the “information” link next to each download. Alternatively, the script SDKShell.bat can be run in a command window to set the environment variables, but this is not persistent, so it is probably better to set them up permanently.

6) Modify the configuration file (tabgeohack.yml), which is located in the installation directory, specifying the location of your Tableau repository and the GDAL installation directory.

For example:

# path of Tableau repository
tableau_repository_path: C:\Users\richard\Documents\My Tableau Repository

# GDAL installation path
GDAL_installation_path: C:\Data\Software\GDAL_1.9

A few other optional and little-used options are available, including the ability to specify a German or French installation of Tableau (although currently this only works for German). Refer to the reference section at the end of this document for details – and to the known issues section for issues and workarounds when using the French or German installations.

Configuration File Format

The configuration file is in YAML format. YAML is a “data serialisation” language which aims to allow complex data structures to be expressed in a simple, human-readable format. It also makes loading and using that data extremely easy, which is the primary reason I chose it. Judge for yourself about the “simple, human-readable” bit.

There are a couple of key things to understand about YAML before attempting to edit the configuration files.

YAML works on textual indentation, so it is vital to keep the indentation level consistent. The best bet is to keep it exactly as you find it in the example files. It is best to edit the YAML files in a text editor, using a fixed-width font. Note that YAML only accepts indentation based on spaces, not tabs, so make sure your editor isn’t “helpfully” converting white space to tabs for you. I have used an indentation level of 4 spaces, but that isn’t required, it can be anything – as long as the indentation level remains consistent.

YAML includes simple lists, in which the list elements are introduced by a dash (“-”) and also hashes (named values), in which the name and value are separated by a colon (“:”).

Comments are indicated by “#”. Hopefully I’ve included enough of them to give you a fighting chance.

Just remember that the positions of spaces, dashes and colons are all crucial, and refer back to the example if it breaks. The utility attempts to give meaningful messages about format errors.
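If your editor may have slipped tabs into a configuration file, a few lines of Python will find them. The sketch below is a stand-alone check, not part of the utility: it reports every non-comment line whose leading white space contains a tab. The function name and the sample text are made up for the illustration.

```python
def find_tab_indents(text):
    """Return (line_number, line) pairs whose indentation contains a tab."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are harmless
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            offenders.append((number, line))
    return offenders


# Hypothetical fragment: the third line is indented with a tab.
sample = (
    "geographic_roles:\n"
    "    - role_name: Porirua_Meshblock\n"
    "\tgenerate_points: true\n"
)
for number, line in find_tab_indents(sample):
    print(f"line {number}: tab in indentation: {line!r}")
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to find_tab_indents.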

The configuration file has four main sections:

Miscellaneous details defining the input and output file locations and formats and such like.

Details of the geographic roles to be added, specifying the source spatial files, lists of attributes and details of any simplification required.

A definition of the hierarchical structure of the geographic roles: both Tableau’s built-in geographic roles and those being added by this configuration.

A definition of geographic roles to be purged from the resulting custom geocoding database (to improve performance and also reduce the size of any packaged workbooks using the custom geocoding). Roles may either be purged completely, or trimmed down to just the members required. The purge processing walks down the hierarchy defined in the preceding section, deleting or retaining children as appropriate.
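The purge rules are easier to follow with a toy model. The sketch below walks a role hierarchy top-down and applies the rules as described above: a member survives only if its parent survived and, when its role is listed in the purge section, only if it is named as an exception. It is a simplified illustration under my own assumptions (members are matched to parents by name alone, and the data structures, the trimmed-down hierarchy and the sample rows are invented); it is not the utility's implementation.

```python
def purge(hierarchy, rows, exceptions, kept_parents=None):
    """Return {role: [names kept]} after walking the hierarchy top-down.

    hierarchy:  list of {"role": name, "children": [...]} nodes
    rows:       {role: [(member_name, parent_name_or_None), ...]}
    exceptions: {role: [names to keep]}; a listed role with no names is
                purged completely, an unlisted role is purged only where
                the parent was purged
    """
    kept = {}
    for node in hierarchy:
        role = node["role"]
        survivors = []
        for name, parent in rows.get(role, []):
            parent_ok = kept_parents is None or parent in kept_parents
            listed_ok = role not in exceptions or name in (exceptions[role] or [])
            if parent_ok and listed_ok:
                survivors.append(name)
        kept[role] = survivors
        kept.update(purge(node.get("children", []), rows, exceptions, set(survivors)))
    return kept


# Hypothetical, trimmed-down hierarchy and data.
hierarchy = [{"role": "Country", "children": [
    {"role": "State", "children": [{"role": "City"}]},
    {"role": "ZipCode"},
]}]
rows = {
    "Country": [("New Zealand", None), ("Australia", None), ("Fiji", None)],
    "State": [("Wellington", "New Zealand"), ("Auckland", "New Zealand"),
              ("Victoria", "Australia")],
    "City": [("Porirua", "Wellington"), ("Melbourne", "Victoria")],
    "ZipCode": [("5022", "New Zealand")],
}
exceptions = {"Country": ["New Zealand", "Australia"],
              "State": ["Wellington"],
              "ZipCode": None}
result = purge(hierarchy, rows, exceptions)
print(result)
```

In this toy data City is not listed, so Porirua survives simply because its parent (Wellington) was kept, while ZipCode is listed with no exceptions and so loses even its New Zealand members.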

Example

An example configuration file and its associated shape files are included in the sample directory supplied with the utility. This is for the Tsunami warning zones for my home town of Porirua in New Zealand, along with “meshblock” (a grouping of land parcels) data. Sample data associating meshblocks with street addresses is also included. A sample workbook allowing meshblocks to be located by street address and displayed overlaying the Tsunami evacuation zones is published on Tableau Public here.

The sample workbook illustrates the impact of simplifying the shapes, by including three versions of the Tsunami zone boundaries: at full detail, simplified to 10 metre tolerance and simplified to 100 metre tolerance (see the various tabs in the workbook). To keep it as simple as possible, the sample YAML file only generates one version of the Tsunami zone boundaries: simplified to 10 metres.

The sample YAML file also illustrates how purging works, by selectively retaining Australia and New Zealand and also retaining just the Wellington Region (which corresponds to a “State” in Tableau). All cities within the Wellington region are also implicitly retained.

The sample file is shown below.

Porirua Tsunami Warnings.yml

# location of input spatial data files
shape_file_dir: C:\Data\Tableau\TabGeoHack\Sample\Shape Files

# location of various generated files
output_dir: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings

# list of geographic roles to process
geographic_roles:
    # list of geographic roles to create
    # maximum length is due to Firebird identifier length limit
    # of 31 and the need for a 'LocalData<role_name>' table
    - role_name: Porirua_Meshblock
      # shape file(s) - note that this is a list of files to allow
      # for layers being split across shape files; normally there
      # will be a single entry
      shape_file_names:
          - porirua_mb_wgs84.shp
      # list of fields from shape file to include in geocoding
      # database
      # Firebird identifier names are limited to 31 characters
      required_geocoding_fields:
          MB11:
              # column name to be used in geocoding database
              alias: Meshblock Code
              # unique ID indicator (default false)
              unique_id: true
              # list of phrases to be used for automatic geocoding
              heuristics:
                  - meshblock
      # list of fields from shape file to include in separate file
      # of feature data which can be joined to datasource – field
      # names from the shape file are listed, with optional aliases
      # to be used as CSV column names (otherwise the original
      # field name is used)
      required_feature_fields:
          MB11: Meshblock Code
          AU11: Area Unit Code
          TA11: Territorial Authority Code
          WARD11: Ward Code
          REGC11: Regional Council Code
          X_GCEN:
          Y_GCEN:
      # whether or not to generate CSV file of points
      generate_points: true
    - role_name: Porirua_Tsunami
      shape_file_names:
          - porirua-tsunami-evacuatio.shp
      required_geocoding_fields:
          OBJECTID:
              alias: Tsunami Object ID
              unique_id: true
          COL_CODE:
              alias: Colour Code
      required_feature_fields:
          OBJECTID:
          ZONE_CLASS:
          COL_CODE:
          EVAC_ZONE:
          LOCATION:
          INFO:
          HEIGHTS:
      simplify_tolerance: 10
      generate_points: true

# Definition of Role Hierarchy to allow purging of unwanted roles
# First the built-in geographic roles
role_hierarchy:
    - role: Country
      children:
          - role: State
            children:
                - role: City
                - role: County
          - role: ZipCode
          - role: AreaCode
          - role: CMSA
# Custom geocoding roles need to be defined at the appropriate
# position in the hierarchy if they are to be (partially) purged.
# This can be a useful way to trim down the volume of imported data
# to just the region of interest. In this case we are not purging the
# custom roles, so there is no need to define them here.

# Whether or not to purge synonyms for any kept roles
purge_synonyms: true

# Definition of geographic roles to purge. Note that children in the
# role hierarchy are automatically purged if their parents are
# purged. Additional children can be purged by specifying the role
# explicitly here.
purge_roles_exceptions:
    # All countries except New Zealand and Australia are purged.
    Country:
        - New Zealand
        - Australia
    # All states (aka regions in NZ) are purged except the Wellington
    # Region. Note that all states of other countries (including
    # Australia) will be purged with this definition.
    State:
        - Wellington
    # City is not listed, so all cities except those within the
    # Wellington Region (aka "State") are purged.
    # City:
    # County, ZipCode, AreaCode and CMSA are listed with no
    # exceptions, so even New Zealand Postcodes (aka ZipCodes) are
    # purged.
    County:
    ZipCode:
    AreaCode:
    CMSA:

Example - Loading the Sample Data

The utility comes with a sample configuration, the associated shape files and a directory structure set up as needed to run it. The sample data contains the boundaries of the Tsunami warning zones for my home town of Porirua in New Zealand, along with local “meshblock” data (meshblocks are a grouping of land parcels) and street address data.

The steps needed to load the data are explained below. A sample workbook using this data (and also demonstrating the impact of various levels of simplification) is published on Tableau Public here.

The sample also illustrates how purging works, by selectively retaining Australia and New Zealand and also retaining just the Wellington Region (aka “State”). All cities within the Wellington region are also implicitly retained.

The sample YAML configuration file is included earlier in this document.

This section works through how to use all of the commands with the sample data.

1. --help - display the list of options for the command

To make sure the utility is installed and available on the PATH, open a DOS window and run the command with the --help option:

tabgeohack --help

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --help

Usage: tabgeohack [options] <config_file>
OPTIONS:
    --info                display shape file metadata
    --roles               generate custom geocoding CSV files
    --shapes              load custom shapes into geocoding D/B
    --assign <twb_file>   assign custom geocoding instance to workbook
    --analyse             display summary statistics for all geometry in D/B
    --activate            activate the processed custom geocoding database
                          for this configuration
    --revert              restore the custom geocoding database to the
                          unprocessed state
    --version             display version number and exit
The configuration file <config_file> is a YAML file.

C:\Data\Tableau\TabGeoHack\Sample>

2. --info - display details of referenced shape files

The --info option runs the GDAL ogrinfo command for each shape file referenced in the configuration file. This displays summary information about the file (number of features, geographic area extent, coordinate reference system used, units of measurement) and also displays a list of metadata attributes contained.

Particularly useful details are highlighted in the listing below.

tabgeohack --info "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --info "Porirua Tsunami Warnings.yml"

Displaying shapefile metadata using ogrinfo...
Porirua_Meshblock
=================
INFO: Open of `C:\Data\Tableau\TabGeoHack\Sample\Shape Files\porirua_mb_wgs84.shp' using driver `ESRI Shapefile' successful.

Layer name: porirua_mb_wgs84
Geometry: Polygon
Feature Count: 602
Extent: (174.770784, -41.164136) - (174.995543, -41.003921)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_84",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
MB11: String (7.0)
TA11: String (3.0)
WARD11: String (5.0)
CB11: String (5.0)
TASUB11: String (5.0)
REGC11: String (2.0)
CON11: String (4.0)
MCON11: String (4.0)
AU11: String (6.0)
UA11: String (3.0)
X_GCEN: Integer (9.0)
Y_GCEN: Integer (9.0)
AREA: Real (19.5)
======================
Porirua_Tsunami
===============
INFO: Open of `C:\Data\Tableau\TabGeoHack\Sample\Shape Files\porirua-tsunami-evacuatio.shp' using driver `ESRI Shapefile' successful.

Layer name: porirua-tsunami-evacuatio
Geometry: Polygon
Feature Count: 3
Extent: (1749174.069200, 5443680.348800) - (1762544.429300, 5459038.196500)
Layer SRS WKT:
PROJCS["NZGD2000 / New Zealand Transverse Mercator 2000",
    GEOGCS["GCS_NZGD_2000",
        DATUM["New_Zealand_Geodetic_Datum_2000",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",173],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",1600000],
    PARAMETER["false_northing",10000000],
    UNIT["Meter",1]]
OBJECTID: Integer (10.0)
ZONE_CLASS: Real (19.11)
COL_CODE: String (15.0)
EVAC_ZONE: String (50.0)
LOCATION: String (50.0)
INFO: String (254.0)
HEIGHTS: String (100.0)
======================
Done in 0 seconds

C:\Data\Tableau\TabGeoHack\Sample>

The highlighted fields are all useful in setting the details in the configuration file.

In particular, note that the map units shown are the units which must be used for the simplify_tolerance: setting, if that is used. So in the case of the example, the simplify_tolerance for the tsunami zone boundaries is specified in metres, since the tsunami shape file uses a projected coordinate reference system with those units. To simplify the sample meshblock data, however, the tolerance would need to be specified in (fractional) degrees, since that shape file is using a geographic (lat/long) coordinate reference system.
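When a shape file uses lat/long coordinates, a tolerance you think of in metres has to be converted to degrees first. The sketch below gives a rough conversion, assuming about 111 km per degree of latitude and scaling longitude by the cosine of the latitude. Both the constant and the function name are my own choices for the illustration, and the result is only good enough for picking a starting value.

```python
import math

# Rough length of one degree of latitude; an approximation, not a constant
# taken from the utility or from GDAL.
METRES_PER_DEGREE_LATITUDE = 111_000.0


def metres_to_degrees(metres, latitude_deg):
    """Return (north-south, east-west) size in degrees of `metres` on the
    ground at the given latitude."""
    lat_degrees = metres / METRES_PER_DEGREE_LATITUDE
    lon_degrees = metres / (
        METRES_PER_DEGREE_LATITUDE * math.cos(math.radians(latitude_deg))
    )
    return lat_degrees, lon_degrees


# The sample meshblock file lies at roughly 41 degrees south.
north_south, east_west = metres_to_degrees(10, -41.0)
print(f"10 m is about {north_south:.6f} degrees N-S and {east_west:.6f} degrees E-W")
```

Since a single tolerance applies in every direction, the smaller of the two values is the cautious choice: it never simplifies by more than the intended distance.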

3. --roles – generate CSV files for custom geocoding

The --roles option parses the shape files specified for each role, calculates the location of the centroid of each shape and generates a CSV file for each role containing specified identifying fields from the shape file plus the latitude and longitude of the centroid. These files are created in the sub-directory “Custom Geocoding Files” under the output files location specified in the configuration file. These files are created in the format needed to import the roles into Tableau as custom geocoding (initially without any associated shapes).

If the option to create additional files of feature data was chosen for any of the roles, these are created in the sub-directory “Feature Files” under the output location.

tabgeohack --roles "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --roles "Porirua Tsunami Warnings.yml"
Generating custom geocoding files...
    Porirua_Meshblock... (602)
    Porirua_Tsunami... (3)
Done in 2 seconds

C:\Data\Tableau\TabGeoHack\Sample>

4. Import custom geocoding into Tableau

The custom geocoding CSV files generated in the previous step should be imported into Tableau in the usual way. (Note that if you are using the French or German interface, you’ll need to switch to English while you import the Geocoding.)

For example:

Create a new workbook by opening the file “Porirua Tsunami Zones.csv” from the sample directory with Tableau 7.0. Choose a live connection. There is no need to add any fields to the view at this stage.

Select “Map->Geocoding->Import Custom Geocoding…” and select the directory location holding the custom geocoding CSV files just created in the previous step – by default this is:

C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Custom Geocoding Files

Save the workbook as “Tsunami.twb”.

Close (all copies of) Tableau.

5. --shapes – add shape boundaries to the custom geocoding

The --shapes option modifies the schema for the tables supporting the newly created roles in the custom geocoding database to accommodate the associated shape data. It then inserts the shapes into the database.

It then optionally purges any unneeded geocoding details (the custom geocoding database includes a full copy of all geocoding data supplied with Tableau). Purging the unneeded data in this way can improve performance (if only a subset of the data for a particular role is needed for the particular analysis) and also allows the geocoding database to be made much smaller, which reduces the size of resulting packaged workbooks and saves quota on Tableau Public used by any workbooks published there.

Finally it compresses the custom geocoding database and optionally saves a copy which can be referenced even after the “current” custom geocoding has been replaced with a different set.

tabgeohack --shapes "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --shapes "Porirua Tsunami Warnings.yml"
Generating shapes...
    Porirua_Meshblock... added 602 rows with a total of 18706 points (min: 5, avg: 31, max: 595)
    Porirua_Tsunami... added 3 rows with a total of 3387 points (min: 729, avg: 1129, max: 1507)
Overall totals: 605 rows, 22093 points
Purging unwanted geocoding data...
Processing role: Country - Keeping: 'New Zealand', 'Australia'
Processing role: State - Keeping: 'Wellington'
Processing role: City
Processing role: County
Processing role: ZipCode
Processing role: AreaCode
Processing role: CMSA
Total rows deleted: 1324
Compressing geocoding D/B...
Saving a copy of the unprocessed custom geocoding data at: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (no geometry)
Saving a copy of the processed custom geocoding data at: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (with geometry)
Done in 20 seconds

C:\Data\Tableau\TabGeoHack\Sample>

6. Check that it has worked

Reopen the workbook “Tsunami.twb” saved in step 4, assign the geographic role “Tsunami Object ID” to the field [OBJECTID], drag [OBJECTID] onto Level of Detail and set the mark type to Filled Map. Drag [COL_CODE] onto the color shelf and you should have something that looks like this. Tableau isn’t quite smart enough to get the colours right automatically.

[Figure: Sheet 1, a filled map of the tsunami zones coloured by COL_CODE (Orange Zone, Red Zone, Yellow Zone). Map based on Longitude (generated) and Latitude (generated). Color shows details about COL_CODE. Details are shown for OBJECTID.]

7. --assign – associate Tableau workbooks with saved geocoding

The --assign option associates a Tableau workbook with the saved instance of the geocoding database specified in the given configuration file. This can be useful for switching to and fro between different custom geocoding instances, without having to keep copying the files back into the standard location in the repository.

Unfortunately, this assignment is not retained when the workbook is saved, so the only way to keep the assignment is to save the workbook as a packaged workbook (which actually embeds a compressed copy of the custom geocoding database in the packaged workbook).

tabgeohack --assign Tsunami.twb "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --assign Tsunami.twb "Porirua Tsunami Warnings.yml"
Assigning workbook 'Tsunami.twb' to custom geocoding instance 'Porirua Tsunami Warnings'...
Done in 0 seconds

C:\Data\Tableau\TabGeoHack\Sample>
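A packaged workbook (.twbx) is an ordinary zip archive, so one way to confirm that the geocoding copy really was embedded is to list what the archive contains. The following is only a minimal sketch: the function name is made up for illustration, and because this document does not say where inside the archive the geocoding files are stored, the code simply returns every entry rather than looking for a particular path.

```python
import zipfile

def list_packaged_workbook(path):
    """Return (entry name, uncompressed size in bytes) for every entry
    in a packaged workbook, which is a plain zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]

# Hypothetical usage, assuming the workbook was saved as Tsunami.twbx:
#   for name, size in list_packaged_workbook("Tsunami.twbx"):
#       print(size, name)
```

Comparing the listing before and after using --assign should show whether extra geocoding files have appeared in the package.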

8. --analyse – display statistics for all geometry objects

The --analyse option displays summary statistics for the numbers of shapes and the numbers of boundary points for all geographic roles with shape data. This can be useful in determining the level of simplification needed.

tabgeohack --analyse "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --analyse "Porirua Tsunami Warnings.yml"
Role: Country, 2 out of 2 rows with geometry. 2565 points (min: 766, avg: 1282, max: 1799 per row).
Role: State, 1 out of 1 rows with geometry. 100 points (min: 100, avg: 100, max: 100 per row).
Role: County, 0 out of 0 rows with geometry.
Role: ZipCode, 0 out of 0 rows with geometry.
Role: Porirua_Meshblock, 602 out of 602 rows with geometry. 18706 points (min: 5, avg: 31, max: 595 per row).
Role: Porirua_Tsunami, 3 out of 3 rows with geometry. 3387 points (min: 729, avg: 1129, max: 1507 per row).
Done in 1 seconds

C:\Data\Tableau\TabGeoHack\Sample>
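If there are many roles, it can be handy to read the --analyse output programmatically and flag the roles whose shapes are most detailed. The sketch below assumes the output has exactly the "Role: ..." layout shown in the sample above; the function names and the 500-point threshold are illustrative choices, not part of tabgeohack.

```python
import re

# Matches the "Role: ..." lines printed by --analyse, as in the sample output.
# The points section is optional because roles with no rows do not print it.
ROLE_LINE = re.compile(
    r"Role: (?P<role>\w+), (?P<with_geom>\d+) out of (?P<rows>\d+) rows with geometry\."
    r"(?:\s+(?P<points>\d+) points \(min: (?P<min>\d+), avg: (?P<avg>\d+),"
    r" max: (?P<max>\d+) per row\)\.)?"
)

def parse_analyse_output(text):
    """Return one dict of integer statistics per role found in the text."""
    roles = []
    for match in ROLE_LINE.finditer(text):
        stats = {"role": match["role"]}
        for key in ("with_geom", "rows", "points", "min", "avg", "max"):
            value = match[key]
            stats[key] = int(value) if value is not None else 0
        roles.append(stats)
    return roles

def roles_needing_simplification(roles, max_points_per_row=500):
    """Names of roles whose most detailed shape exceeds the chosen threshold."""
    return [r["role"] for r in roles if r["max"] > max_points_per_row]
```

Run against the sample output above with the default threshold, this would pick out Country, Porirua_Meshblock and Porirua_Tsunami as candidates for the simplify option.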

9. --activate – switch to previously processed geocoding D/B

The --activate option switches the current custom geocoding to use the saved copy of the processed geocoding database associated with the specified configuration file. This is an alternative way of switching to and fro between different geocoding instances.

tabgeohack --activate "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --activate "Porirua Tsunami Warnings.yml"
Activating the saved copy of the processed custom geocoding data from: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (with geometry)
Done in 0 seconds

C:\Data\Tableau\TabGeoHack\Sample>

10. --revert – switch to saved unprocessed geocoding D/B

The --revert option switches the current custom geocoding to use the saved copy of the unprocessed geocoding database associated with the specified configuration file (i.e. the version after importing custom geocoding with Tableau, but before inserting shape data with the --shapes option).

tabgeohack --revert "Porirua Tsunami Warnings.yml"

C:\Data\Tableau\TabGeoHack\Sample>tabgeohack --revert "Porirua Tsunami Warnings.yml"
Reverting to the saved copy of the unprocessed custom geocoding data from: C:\Data\Tableau\TabGeoHack\Sample\Porirua Tsunami Warnings\Local Data Copy (no geometry)
Done in 0 seconds

C:\Data\Tableau\TabGeoHack\Sample>

Troubleshooting and Known Issues

The utility does some rudimentary validation of the configuration files and attempts to handle any errors that occur during processing as gracefully as possible. A failed run should leave all files in the repository in their initial state, so there should not often be a need to intervene to recover following failures.

However, if the custom geocoding database does ever get into a state that Tableau is not happy with, it is straightforward either to revert to the unmodified custom geocoding database using the --revert option, or to remove custom geocoding from the Tableau menu:

“Map->Geocoding->Remove Custom Geocoding”

If all else fails, simply delete the “Local Data” directory from the repository and start again – Tableau will automatically recreate it when you add custom geocoding.

The most common issues are caused either by inconsistencies in the YAML configuration file, or by invalid data in the shape files being read. I have encountered several examples of invalid shape files – including official government supplied public information files. There are also a few known bugs and limitations.

Known Issues

1. The “tableau_country_code” option doesn’t work for the French installation of Tableau.

This is due to the need to use Unicode characters in the path for the custom geocoding database (for the acute accent on the e in “Données locale”, which is the directory name in the French version).

I have had a brief go at getting this working and not managed to figure it out. For now the workaround is to rename the “Données locale” directory in the repository to “Local Data” before running the utility and back again afterwards.
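The rename-and-restore workaround is easy to get wrong if the utility fails part way through, so it may be worth scripting. The sketch below is one possible way to do it in Python, which handles the accented directory name without trouble. The two directory names come from the text above; the helper name and the repository path in the usage comment are hypothetical.

```python
import os
from contextlib import contextmanager

@contextmanager
def temporarily_renamed(repository, localized="Données locale", standard="Local Data"):
    """Rename the localized directory to the standard name for the duration
    of the 'with' block, then rename it back, even if the block fails."""
    src = os.path.join(repository, localized)
    dst = os.path.join(repository, standard)
    os.rename(src, dst)
    try:
        yield dst
    finally:
        os.rename(dst, src)

# Hypothetical usage (the repository path is an assumption):
#   import subprocess
#   with temporarily_renamed(r"C:\Users\me\Documents\My Tableau Repository"):
#       subprocess.run(["tabgeohack", "--shapes", "Porirua Tsunami Warnings.yml"])
```

The try/finally means the directory gets its French name back even when the wrapped command raises an error.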

2. Generated custom geocoding CSV files cannot be imported if the user interface language is set to French or German. The CSV files are required to use the correct French or German words for Latitude and Longitude, as per the current language setting.

Again, I had a brief go at getting this going in German, but the “a umlaut” in Längengrad defeated me.

3. The “simplify” option can lead to gaps and overlaps in the shapes.

Unfortunately this is just how the GDAL library works. Experiment with different settings to find the best compromise between speed and accuracy. Alternatively, the shape file can be simplified using a GIS system before processing with tabgeohack.

I hope to make this better in a future version, but don’t count on it.

4. Some shape files do not contain a field which uniquely identifies each feature.

Tableau requires a unique identifier in order to import custom geocoding and tabgeohack also needs it to associate the shapes with the right features.

To help identify any potential identifier fields in the shape file, tabgeohack checks all fields mentioned in the “required_geocoding_fields” and “required_feature_fields” lists for uniqueness, and suggests candidates if the specified field is not unique.

Failing that, the option “generate_unique_id” can be set for the role. This will generate a unique id field (simply the sequence in the file).
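To make the idea concrete, here is a rough sketch of what a uniqueness check and a generated sequence id amount to. It is not the utility's actual code: the records are plain dictionaries standing in for shape file attribute rows, the id field name GENERATED_ID is invented, and starting the sequence at 1 is an assumption.

```python
def unique_fields(records, candidate_fields):
    """Return the candidate fields that have a distinct, non-missing value
    in every record, i.e. fields usable as a unique identifier."""
    unique = []
    for field in candidate_fields:
        values = [record.get(field) for record in records]
        if None not in values and len(set(values)) == len(values):
            unique.append(field)
    return unique

def with_generated_id(records, id_field="GENERATED_ID"):
    """Return copies of the records with a sequence-number id added
    (the position of the record in the file, starting at 1)."""
    return [dict(record, **{id_field: i}) for i, record in enumerate(records, start=1)]
```

For two rows that share a COL_CODE but have different OBJECTID values, unique_fields reports only OBJECTID as a candidate.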

5. Views are slow to render or Tableau fails with out of memory errors.

If too much geocoding data is loaded, Tableau performance will degrade. This may be caused either by too many features or by too much detail in the shape boundaries.

The workarounds for reducing the total number of rows are either to limit the number of rows imported, by choosing shape files limited to the region of interest, or to purge unneeded features during the --shapes step.

The workaround for too many boundary points per feature is to use the simplify option, or to simplify the shape file before loading.

6. Errors in the YAML configuration files.

There are numerous errors that can occur due to invalid layout, incorrect field names or invalid characters in the YAML file. Carefully compare the file that is giving problems with the sample file and pay close attention to the exact layout (particularly the number of spaces used for indenting).

For example:

Errors found in configuration file:
YAML::Tiny failed to classify line ' shape_file_names:' at C:\Data\Performance Testing\Tableau\Filled Maps\tabgeohack.pl line 97

This error is caused by too few spaces indenting the “shape_file_names:” entry.

Errors found in configuration file:
[/geographic_roles/0/] 'shape_file_name' is not one of the allowed keys: generate_points, generate_unique_id, no_transform_crs, precision, required_fea... at C:\Data\Performance Testing\Tableau\Filled Maps\tabgeohack.pl line 97

This error is caused by a missing “s” – it should be “shape_file_names:” not “shape_file_name:”.
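Misspelled keys like this one are easy to catch before running the utility. The sketch below checks the keys of a single role against the role-level keys listed in the configuration reference later in this document, and suggests the nearest allowed key. It is an illustration only: it takes a role that has already been loaded into a Python dictionary (reading the YAML file itself would need a third-party parser, which is left out), and the function name and the file name x.shp are made up.

```python
import difflib

# Role-level keys, taken from the configuration reference in this document.
ALLOWED_ROLE_KEYS = [
    "role_name", "shape_file_names", "source_crs", "no_transform_crs",
    "precision", "generate_unique_id", "required_geocoding_fields",
    "required_feature_fields", "generate_points", "simplify_tolerance",
    "simplify_iterations",
]

def check_role_keys(role):
    """Return one message per key in the role that is not an allowed key,
    with a spelling suggestion when a close match exists."""
    problems = []
    for key in role:
        if key not in ALLOWED_ROLE_KEYS:
            close = difflib.get_close_matches(key, ALLOWED_ROLE_KEYS, n=1)
            hint = f" (did you mean '{close[0]}'?)" if close else ""
            problems.append(f"'{key}' is not an allowed key{hint}")
    return problems

# Example: the missing "s" from the error above.
#   check_role_keys({"role_name": "Porirua_Tsunami", "shape_file_name": ["x.shp"]})
```

A check like this catches spelling mistakes, but not indentation mistakes, which still have to be found by comparing against the sample file.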

7. Unknown Coordinate Reference system.

Sometimes shape files do not have the necessary information to allow the GDAL Utilities to identify the coordinate reference system used by the file, which means the utility doesn’t know how to transform the file to the required geographic coordinate reference system (lat/lon) for Tableau to use. There are a couple of options in this case.

If the file is already expressed using latitude and longitude, there is probably no need to transform its coordinates. Any errors introduced like this are likely to be insignificant at the sort of scales normally used for Tableau visualisations. In this case, simply add the “no_transform_crs” option, with a value of true, for the role in question.

If the file is in a projected coordinate reference system, and you know (or can guess) the CRS, the GDAL utilities allow you to specify the source CRS when transforming a file. This can be specified by adding the “source_crs” option to the relevant role.

Refer to the configuration file syntax reference at the end of this document for both of these options.
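If you want to test a guessed CRS outside tabgeohack first, the same transformation can be tried directly with ogr2ogr. The sketch below only builds the command line, using ogr2ogr's standard -s_srs (source CRS) and -t_srs (target CRS) options. How tabgeohack itself invokes the GDAL utilities is not shown in this document, so treat this as an approximation; the file names and the EPSG:2193 value in the usage comment are hypothetical.

```python
def build_ogr2ogr_command(src, dst, source_crs=None, target_crs="EPSG:4326"):
    """Build an ogr2ogr argument list that reprojects src into dst."""
    command = ["ogr2ogr", "-t_srs", target_crs]
    if source_crs:
        # -s_srs supplies (or overrides) the CRS of the input file
        command += ["-s_srs", source_crs]
    # ogr2ogr expects the destination first, then the source
    command += [dst, src]
    return command

# Hypothetical usage, guessing that the source file is in EPSG:2193:
#   import subprocess
#   subprocess.run(build_ogr2ogr_command("in.shp", "out.shp", "EPSG:2193"), check=True)
```

If the transformed file lands in the right place on a map in a GIS tool, the same CRS string is a reasonable value for “source_crs”.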

8. Bad shape files.

Various errors can be caused by invalid shape files. These generally result in failures during calls to the GDAL utilities used by tabgeohack and can be very hard to pin down. The only way to fix these is to edit the shape file with a GIS tool. In some cases the errors are detected by the validation options provided by the GIS tool, which at least locates the troublesome feature. In other cases these errors are not detected by the GIS tool I’m using (QGIS), which makes it even trickier.

Examples I have seen include:

A “divide by zero” error during the --roles step. This was due to a polygon with only 3 boundary points (which is not valid according to the ESRI shape file specification). This caused the area of the polygon to be zero, which broke the calculation of the centre of the shape. Deleting that feature (or that polygon if the feature has multiple polygons) is the best option.

The points in a polygon do not form a closed ring. This causes a failure during processing of the --shapes option. In this case QGIS did not detect the error.

The boundary of a polygon crosses itself. Again this can happen during --shapes processing. This case was detected by QGIS and can be relatively easily fixed.

An “Out of Memory” error during processing of the --shapes option can also be caused by a self-intersecting polygon. How the issue manifests itself depends on options chosen (such as whether or not the simplify option is being used). Once again, locating and fixing the troublesome feature with a GIS tool is the only option.
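The first two problems above (too few points, and rings that are not closed) can be checked with a few lines of code, which also shows why a zero-area polygon leads to a divide by zero. This is a minimal sketch and not the utility's own code: rings are plain lists of (x, y) tuples, the function names are invented, and self-intersection is not checked because that needs a proper geometry library.

```python
def signed_area(ring):
    """Shoelace formula. The ring is a list of (x, y) points with the
    first point repeated as the last point."""
    return 0.5 * sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(ring, ring[1:])
    )

def ring_problems(ring):
    """Return a list of reasons why the ring is likely to break processing."""
    problems = []
    if len(ring) < 4:
        problems.append("fewer than 4 points")
    if ring and ring[0] != ring[-1]:
        problems.append("ring is not closed")
    elif signed_area(ring) == 0:
        problems.append("zero area")
    return problems

def centroid(ring):
    """Area-weighted centre of the ring. The formula divides by the area,
    so a degenerate ring is rejected instead of dividing by zero."""
    area = signed_area(ring)
    if area == 0:
        raise ValueError("degenerate ring: area is zero")
    pairs = list(zip(ring, ring[1:]))
    cx = sum((x0 + x1) * (x0 * y1 - x1 * y0) for (x0, y0), (x1, y1) in pairs)
    cy = sum((y0 + y1) * (x0 * y1 - x1 * y0) for (x0, y0), (x1, y1) in pairs)
    return cx / (6 * area), cy / (6 * area)
```

A closed ring with only 3 points, such as (0, 0), (1, 1), (0, 0), runs out and back along the same line, so its area is zero and the centre calculation has nothing to divide by.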

Structure and Allowed Values of Configuration Files

The structure and allowed values for all options supported by the tabgeohack installation configuration file and the custom geocoding instance configuration files are defined in the YAML “schema” shown below. Hopefully this is just about intelligible. Most of the interesting options are illustrated in the example above, so only refer to this section for reference for the more obscure options.

(Don’t bother Googling “YAML schema”, by the way. The YAML schema language I am using is an invention of a colleague of mine who hasn’t quite got around to donating it to the YAML community yet – but I think the meaning is fairly self-explanatory.)

Installation configuration: tabgeohack.yml

# path of Tableau repository
tableau_repository_path: required text

# GDAL installation path
GDAL_installation_path: required text

# working directory (defaults to TEMP environment variable)
temp_loc: text

# optionally retain temporary files (default false)
keep_temp: boolean

# country code for international Tableau edition (US/DE/FR)
# (default US)
tableau_country_code: values US, DE, FR

# default number of digits of precision for latitude and longitude
# values (default 4)
default_precision: integer

# optional geographic coordinate reference system to be used
# (default WGS84).
# Specified in any format understood by ogr2ogr (eg, WGS84 is
# specified as EPSG:4326)
target_crs: text

Custom Geocoding Instance Configuration

# location of input spatial data files
shape_file_dir: required text

# location of various generated files
output_dir: required text

# whether or not to split polygons which span the dateline (default
# false)
wrapdateline: boolean

# list of geographic roles to process
geographic_roles:
  # list of geographic roles to create
  # maximum length is due to Firebird identifier length limit of 31
  # and the need for a 'LocalData<role_name>' table
  - role_name: required text length 1 to 22
    # names of shape files (or potentially any other type of spatial
    # file supported by GDAL)
    # this is a list of files to allow for layers being split across
    # shape files – normally just a single file
    shape_file_names:
      - required text
    # coordinate reference system used in source spatial files. Only
    # needed if not defined in source files
    # Specified in any format understood by ogr2ogr (eg, WGS84 is
    # specified as EPSG:4326)
    source_crs: text
    # suppress transformation of coordinate reference system, if
    # shape file uses an unknown but geographic (lat/long) CRS
    no_transform_crs: boolean
    # number of digits of precision for latitude and longitude values
    # (default set at tabgeohack config level)
    precision: integer
    # whether to generate a unique feature ID in the generated output
    # files (default false)
    generate_unique_id: boolean
    # list of fields from shape file to include in geocoding database
    # Firebird identifier names are limited to 31 characters
    required_geocoding_fields:
      <field_name>:
        # column name to be used in geocoding database
        alias: required text length 1 to 31
        # unique ID indicator (default false)
        unique_id: boolean
        # list of phrases to be used for automatic geocoding
        heuristics:
          - required text
    # list of fields from shape file to include in separate file of
    # feature data which can be joined to datasource - field names
    # from the shape file are listed, with optional aliases to be
    # used as CSV column names (otherwise the original field name is
    # used)
    required_feature_fields:
      <field_name>: optional text
    # whether or not to generate CSV file of points
    generate_points: boolean
    # optional tolerance (in map units of source CRS) for
    # simplification
    simplify_tolerance: number
    # optional number of iterations to allow coarse and then finer
    # simplification (for example, specifying:
    #   simplify_tolerance: 100
    #   simplify_iterations: 10
    # will simplify at 10, 20, 30 ... 100 metre tolerance)
    simplify_iterations: integer

# Definition of role hierarchy - this is required for both built in
# and custom roles in order to support automatic purging of unneeded
# roles, whilst complying with referential integrity rules.
#
# Note that specifying a hierarchy in this version of the YAML
# validator requires Kwalify notation - so this definition uses a
# mixture of Kwalify and Compact notation - which is very confusing.
role_hierarchy:
  type: seq
  required: yes
  define: hierarchy-node-rule
  sequence:
    - role: required text
      children:
        use: hierarchy-node-rule

# Whether or not to purge synonyms for any kept roles
purge_synonyms: boolean

# definition of geographic roles to purge - named roles are purged,
# except for any listed exceptions
# child roles in the hierarchy are automatically purged, keeping any
# children of the exceptions listed
# additional children may also be purged by specifying the child role
# explicitly
purge_roles_exceptions:
  <role>:
    - required text
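
The comment on simplify_iterations gives one example of how the tolerance steps up (10, 20, 30 ... 100 for a tolerance of 100 and 10 iterations). Reading that as evenly spaced steps ending at simplify_tolerance, the schedule can be sketched as below. The even spacing is inferred from that single example, so the utility's real behaviour may differ, and the function name is invented.

```python
def simplification_tolerances(simplify_tolerance, simplify_iterations=1):
    """Evenly spaced tolerances, finishing at simplify_tolerance.
    With one iteration the only tolerance is simplify_tolerance itself."""
    step = simplify_tolerance / simplify_iterations
    return [step * i for i in range(1, simplify_iterations + 1)]

# simplification_tolerances(100, 10) gives ten steps from 10 up to 100
```

Printing the schedule before a long run is a cheap way to check that the two options combine the way you expect.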