DESCRIPTION
Graduating with a BA from UCD in 1995, Colman emigrated to America to pursue a career that combined creativity, commerce and computers. Heading west to California, Colman worked for 11 years in Hollywood's visual effects (VFX) industry. During this time he worked mainly at The Walt Disney Co. and also as a. In 2006, Colman returned home to Ireland to undertake a . A short time after the conclusion of the course, while starting up his own , Colman was invited back to DIT as a part-time lecturer. In 2011, Colman was offered a PhD Fellowship at modeling and simulating the relationship between innovation and profit. This full-time study is under the direction of Prof. Petra Ahrweiler, Director UCD Innovation Research Unit and Professor of Technology and Innovation Management, Smurfit School of Business. In 2012, Colman designed and delivered the first iteration of a new Visualisation module as part of DIT's . Details of Colman's research activities can be found at .
Dublinked: Drawing from a new module at DIT, Colman's presentation at Dublinked will be an introduction to the domain of visualisation and a demonstration of powerful yet "do-able" data visualisations. The presentation is aimed at people who have little or no visualisation experience but have an aptitude and appetite for using technical tools to surface meaning from data. The tools used will be R, R Studio and Inkscape.
2012-05-24 2
Visualisation
MSc Data Analytics
http://www.dit.ie/postgrad/programmes/dt285dt286mscincomputingdataanalytics/
2012-05-24 3
Agenda
1) Background to Data Visualization*
2) Resources
3) Classification of Visualization
4) The Design Process
5) Demonstration
*Disclaimer (and apologies to some), I use the American spelling “visualization”
2012-05-24 4
Take-away Points
1) Open to all
» new domain with many facets
2) Professional-level output is achievable
» practice a few programming and graphic design techniques
3) It's (only) a means to an end
» should affect behaviour
2012-05-24 5
Background: Data Visualisation
(very briefly)
2012-05-24 6
Charles Joseph Minard (1781 – 1870)
2012-05-24 7
William Playfair (1759 – 1823)
2012-05-24 8
Broad Street cholera outbreak (John Snow, 1854)
2012-05-24 9
Crimean War deaths (Florence Nightingale, 1858)
2012-05-24 10
London Underground Map, Harry Beck (1933)
http://briankerr.wordpress.com/2009/06/08/connections/
2012-05-24 11
John Tukey (1915 – 2000)
2012-05-24 12
Edward Tufte
2012-05-24 13
Hans Rosling
2012-05-24 14
...and many other giants of statistics, mathematics, medicine,
design, computing and related fields
2012-05-24 15
Agenda
1) Background to Data Visualization*
2) Resources
3) Classification of Visualisation
4) The Design Process
5) Demonstration
2012-05-24 16
Resources (ever growing)
2012-05-24 17
Texts (1 of 2)
► R in a Nutshell: A Desktop Quick Reference - Adler, Joseph
► Excel 2007 Dashboards & Reports For Dummies - Alexander, Michael
► Ways of Seeing: Based on the BBC Television Series - Berger, John S.
► Semiology of Graphics: Diagrams, Networks, Maps - Bertin, Jacques
► Statistics in a Nutshell: A Desktop Quick Reference - Boslaugh, Watters
► The Jelly Effect: How to Make Your Communication Stick - Bounds, Andy
► Gamestorming: A Playbook for Innovators, Rulebreakers, and Changemakers - Brown, Sunni
► Sketching User Experiences: Getting the Design Right and the Right Design - Buxton, Bill
► Readings in Information Visualization: Using Vision to Think - Card, Mackinlay and Shneiderman
► The Elements of Graphing Data - Cleveland, William S.
► Visualizing Data - Cleveland, William S.
► Now You See It - Davidson, Cathy N.
► slide:ology: The Art and Science of Creating Great Presentations - Duarte, Nancy
► Art: The Whole Story - Farthing, Stephen
► Information Dashboard Design: The Effective Visual Communication of Data - Few, Stephen
► Now You See It: Simple Visualization Techniques for Quantitative Analysis - Few, Stephen
► Show Me the Numbers: Designing Tables and Graphs to Enlighten - Few, Stephen
► Freelance Design in Practice - Fishel, Cathy
► Art of Plain Talk - Flesch, Rudolf
► The Art of Looking Sideways - Fletcher, Alan
► Graphic Artist's Guild Handbook of Pricing and Ethical Guidelines - Graphic Artists Guild
► Made to Stick: Why Some Ideas Survive and Others Die - Heath, Chip and Dan
► Switch: How to Change Things When Change Is Hard - Heath, Chip and Dan
► Data Analysis with Open Source Tools - Janert, Philipp K.
► We Feel Fine: An Almanac of Human Emotion - Kamvar, Sep
► Turning Numbers into Knowledge: Mastering the Art of Problem Solving - Koomey, Jon
► Elements of Graph Design - Kosslyn, Stephen M.
Andy Kirk, http://www.visualisingdata.com
2012-05-24 18
Texts (2 of 2)
► Graph Design for the Eye and Mind - Kosslyn, Stephen M.
► Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition - Krug, Steve
► Universal Principles of Design, Revised and Updated - Lidwell, Holden, Butler
► Visual Complexity: Mapping Patterns of Information - Lima, Manuel
► The Power of the 2 x 2 Matrix: Using 2 x 2 Thinking to Solve Business Problems and Make Better Decisions - Lowy, Alex
► How Maps Work: Representation, Visualization, and Design - MacEachren, Alan M.
► The Laws of Simplicity (Simplicity: Design, Technology, Business, Life) - Maeda, John
► Visual Language for Designers: Principles for Creating Graphics that People Understand - Malamed, Connie
► Understanding Comics: The Invisible Art - McCloud, Scott
► The Chicago Guide to Writing about Numbers (Chicago Guides to Writing, Editing, and Publishing) - Miller, Jane E.
► How to make an IMPACT - Moon, Jon
► Designing Visual Interfaces: Communication Oriented Techniques - Mullet, Kevin
► The Designful Company: How to build a culture of nonstop innovation - Neumeier, Marty
► Emotional Design: Why We Love (or Hate) Everyday Things - Norman, Donald A.
► The Design of Everyday Things - Norman, Donald A.
► Playfair's Commercial and Political Atlas and Statistical Breviary - Playfair, William
► Presentation Zen Design: Simple Design Principles and Techniques to Enhance Your Presentations - Reynolds, Garr
► Presentation Zen: Simple Ideas on Presentation Design and Delivery - Reynolds, Garr
► The Back of the Napkin (Expanded Edition): Solving Problems and Selling Ideas with Pictures - Roam, Dan
► Unfolding the Napkin: The Hands-On Method for Solving Complex Problems with Simple Pictures - Roam, Dan
► Creating More Effective Graphs - Robbins, Naomi B.
► The Craft of Information Visualization: Readings and Reflections - Shneiderman, Ben
► The Visual Display of Quantitative Information - Tufte, Edward R.
► Envisioning Information - Tufte, Edward R.
► Beautiful Evidence - Tufte, Edward R.
► Graphic Discovery: A Trout in the Milk and Other Visual Adventures - Wainer, Howard
► Visual Thinking: for Design - Ware, Colin
► The Grammar of Graphics - Wilkinson, Leland
► Non-Designer's Design Book (3rd Edition) - Williams, Robin
► Glut: Mastering Information Through the Ages - Wright, Alex
Andy Kirk, http://www.visualisingdata.com
2012-05-24 19
Tools for Analysis, Graphing and Enterprise
► Microsoft Excel: http://office.microsoft.com/en-us/excel/
► Open Office Calc: http://why.openoffice.org/why_great.html
► Tableau Desktop: http://www.tableausoftware.com/products/desktop
► Tableau Public: http://www.tableausoftware.com/public/
► TIBCO Spotfire: http://spotfire.tibco.com/
► QlikView: http://www.qlikview.com/
► Grapheur: http://grapheur.com/
► Gephi: http://gephi.org/
► Visokio Omniscope: http://www.visokio.com/
► Panopticon: http://www.panopticon.com/
► Wolfram Mathematica: http://www.wolfram.com/mathematica/
► Data Graph: http://www.visualdatatools.com/DataGraph/
► OmniGraphSketcher: http://www.omnigroup.com/products/omnigraphsketcher
► PLOT: http://plot.micw.eu/
► MATLAB: http://www.mathworks.com/products/matlab/
► SPSS Visualisation Designer: http://www-01.ibm.com/software/analytics/spss/products/statistics/vizdesigner/
► STATA: http://www.stata.com/
► Visualize Free: http://visualizefree.com/index.jsp
► Dundas: http://www.dundas.com/dashboard/
► Wondergraphs: http://www.wondergraphs.com/
Andy Kirk, http://www.visualisingdata.com
2012-05-24 20
Visual Programming Languages and Environments
► Adobe Flash: http://www.adobe.com/products/flash/
► Processing: http://processing.org/
► Processing.js: http://processingjs.org/
► R: http://www.r-project.org/
► D3: http://mbostock.github.com/d3
► Protovis: http://protovis.org/
► Prefuse: http://prefuse.org/
► Prefuse Flare: http://flare.prefuse.org/
► Impure: http://www.impure.com/
► Mondrian: http://www.theusrus.de/Mondrian/
► HTML5: http://dev.w3.org/html5/spec/
► Python: http://www.python.org/
► Silverlight: http://www.silverlight.net/
► Orange: http://orange.biolab.si
► paper.js: http://paperjs.org/about/
► WebGL: http://www.chromeexperiments.com/webgl
► Dejavis: http://dejavis.org/stacks
► Simile Widgets: http://simile-widgets.org/
► JavaScript InfoVis Toolkit: http://thejit.org/
► Juice Kit: http://www.juicekit.org/
► Treevis: http://treevis.net/
Andy Kirk, http://www.visualisingdata.com
2012-05-24 22
Google's Charting and Visualisation Tools
► Google Docs: https://docs.google.com/?pli=1#home
► Google Fusion Tables: http://www.google.com/fusiontables/Home?pli=1
► Google Chart API: http://code.google.com/apis/chart/
► Google Visualization API: http://code.google.com/apis/visualization/documentation/gallery.html
► Google Motion Chart & Public Data Explorer: http://www.google.com/publicdata/home
► Google Insights for Search: http://www.google.com/insights/search/#
► Google Zeitgeist: http://www.google.com/intl/en/press/zeitgeist2010/
► Google Ngram Viewer: http://ngrams.googlelabs.com/
► Google Analytics: http://www.google.com/intl/en_uk/analytics/
► Google.org Philanthropy: http://www.google.org/#one
► Google Wonder Wheel: http://www.google.com/landing/searchtips/engineers.html
► GraphViz: http://code.google.com/apis/chart/docs/gallery/graphviz.html
► Choosel: http://code.google.com/p/choosel/
► Data Appeal: http://dataappeal.com/
Andy Kirk, http://www.visualisingdata.com
2012-05-24 23
Tools for Mapping
► Google Maps & Google Earth: http://www.google.co.uk/help/maps/tour/
► ArcGIS: http://www.arcgis.com/home/index.html
► GeoCommons: http://geocommons.com/
► OpenHeatMap: http://www.openheatmap.com/
► Indiemapper: http://indiemapper.com/
► InstantAtlas: http://www.instantatlas.com/Choose_your_language.xhtml
► Target Map: http://www.targetmap.com/
► TileMill: http://tilemill.com/index.html
► Polymaps: http://polymaps.org/
► Color Brewer: http://colorbrewer2.org/
► Dotspotting: http://dotspotting.org/
► DataMaps.eu: http://www.datamaps.eu/
► GeoTime: http://geotime.com/
Andy Kirk, http://www.visualisingdata.com
2012-05-24 24
Specialist Tools and Visualisation Communities
► Many Eyes: http://www-958.ibm.com/software/data/cognos/manyeyes/
► Visual.ly: http://visual.ly/
► Visualizing Player: http://www.visualizing.org/
► Number Picture: http://numberpicture.com/
► Parallel Sets: http://eagereyes.org/parallel-sets
► Dipity: http://www.dipity.com/
► Wordle: http://www.wordle.net/
► Tagxedo: http://www.tagxedo.com/
► VisualEyes: http://www.viseyes.org/
► Wordlings: http://wordlin.gs/
► Chartle: http://www.chartle.net/
► ChartsBin: http://chartsbin.com/
► Simple Usability: http://www.simpleusability.com/services/usability/eye-tracking
► Fineo: http://fineo.densitydesign.org/custom/
Andy Kirk, http://www.visualisingdata.com
2011/12
25
Combination of Many Disciplines
Given the complexity of data, insights from diverse fields are required to provide meaningful solutions:
(Ben Fry – “Visualizing Data”)
Statistics
Data Mining
Graphic Design
Computer Science
Data/Info Visualisation
2012-05-24 26
Pick an area of interest/define your requirements, then drill down...
2012-05-24 27
Primary Texts
2012-05-24 28
“Designing Data Visualizations”
Designing Data Visualizations: Intentional Communication from Data to Display
Noah Iliinsky and Julie Steele
Publisher: O'Reilly Media (September 29, 2011)
ISBN-10: 1449312284
2012-05-24 29
“Visualize This”
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
Nathan Yau
Publisher: Wiley (July 20, 2011)
ISBN-10: 0470944889
2011/12 30
“Visualizing Data”
Visualizing Data
Ben Fry
Publisher: O'Reilly Media (January 11, 2008)
ISBN-10: 0596514557
2012-05-24 31
Course tools (all free/open source)
2012-05-24 32
R Project
http://www.r-project.org/
2012-05-24 33
R Studio
http://rstudio.org
2012-05-24 34
R & R Studio stack
[Diagram: R Studio sits on top of R, which sits on top of the computer's OS]
Must have R for R Studio to work
2012-05-24 35
Inkscape
http://inkscape.org/
2012-05-24 36
Python
http://python.org/
► Download & install
» http://wiki.python.org/moin/BeginnersGuide/Download
► Beginners Guide
» http://wiki.python.org/moin/BeginnersGuide/NonProgrammers
2012-05-24 37
Beautiful Soup
http://www.crummy.com/software/BeautifulSoup/
2012-05-24 38
Notepad++
http://notepad-plus-plus.org/
2012-05-24 39
7Zip
http://www.7-zip.org/
2012-05-24 40
Calibre
http://calibre-ebook.com/
2012-05-24 41
Agenda
1) Background to Data Visualization*
2) Resources
3) Classification of Visualisation
4) The Design Process
5) Demonstration
2012-05-24 42
Classification of Visualization
2012-05-24 43
“Designing Data Visualizations”
Designing Data Visualizations: Intentional Communication from Data to Display
Noah Iliinsky and Julie Steele
Publisher: O'Reilly Media (September 29, 2011)
ISBN-10: 1449312284
2012-05-24 44
Classifications of Visualizations
[Diagram: four ways to classify a visualization: Complexity; Infographics vs. Data Viz; Exploration vs. Explanation; Informative vs. Persuasive vs. Visual Art]
2012-05-24 45
Figure 1-2. The difference between infographics and data visualization may be loosely determined by the method of generation, the quantity of data represented, and the degree of aesthetic treatment applied.
(Figure labels: Data Visualisations, Infographics)
2012-05-24 46
Infographics
Infographics is a useful term for referring to any visual representation of data that is:
» manually drawn (and therefore a custom treatment of the information)
» specific to the data at hand (and therefore non-trivial to recreate with different data)
» aesthetically rich (strong visual content meant to draw the eye and hold interest)
» relatively data-poor (because each piece of information must be manually encoded)
2012-05-24 47
2012-05-24 48
2012-05-24 49
Classifications of Visualizations
[Diagram: four ways to classify a visualization: Complexity; Infographics vs. Data Viz; Exploration vs. Explanation; Informative vs. Persuasive vs. Visual Art]
2012-05-24 50
Figure 1-2. The difference between infographics and data visualization may be loosely determined by the method of generation, the quantity of data represented, and the degree of aesthetic treatment applied.
(Figure labels: Data Visualisations, Infographics)
2012-05-24 51
Data Visualization
The terms data visualization and information visualization refer to any visual
representation of data that is:
» algorithmically drawn (may have custom touches but is largely rendered with
the help of computerized methods);
» easy to regenerate with different data (the same form may be re-purposed to
represent different datasets with similar dimensions or characteristics);
» often aesthetically barren (data is not decorated); and
» relatively data-rich (large volumes of data are welcome and viable, in contrast
to infographics)
2012-05-24 52
Figure 4-47: Unemployment rates with fitted LOESS curve
2012-05-24 53
2012-05-24 54
Classifications of Visualizations
[Diagram: four ways to classify a visualization: Complexity; Infographics vs. Data Viz; Exploration vs. Explanation; Informative vs. Persuasive vs. Visual Art]
2012-05-24 55
Exploration vs Explanation
Exploratory visualization:
► The dataset
► The mind of the designer
Explanatory visualization:
► The mind of the designer
► The mind of the reader
2012-05-24 56
"Holy Trinity": Designer-Reader-Data
[Diagram: triangle linking Designer, Reader and Data, with the labels Informative, Persuasive and Visual Art]
Figure 1-4. The nature of the visualization depends on which relationship (between two of the three components) is dominant.
2012-05-24 57
Classifications of Visualizations
[Diagram: four ways to classify a visualization: Complexity; Infographics vs. Data Viz; Exploration vs. Explanation; Informative vs. Persuasive vs. Visual Art]
2012-05-24 58
Informative
http://www.irisheconomy.ie/wp-content/uploads/2009/05/unemployment.gif
2012-05-24 59
Persuasive
2012-05-24 60
http://www.flickr.com/photos/robertpalmer/3743826461/sizes/l/in/photostream/
2012-05-24 61
Visual Art
Nora Ligorano and Marshall Reese designed a project that converts Twitter streams into a woven fiber-optic tapestry (http://ligoranoreese.net/hber-optic-tapestry)
2012-05-24 62
Classifications of Visualizations
[Diagram: four ways to classify a visualization: Complexity; Infographics vs. Data Viz; Exploration vs. Explanation; Informative vs. Persuasive vs. Visual Art]
2012-05-24 63
Agenda
1) Background to Data Visualization*
2) Resources
3) Classification of Visualisation
4) The Design Process
5) Demonstration
2012-05-24 64
The Design Process
2011/12 65
“Visualizing Data”
Visualizing Data
Ben Fry
Publisher: O'Reilly Media (January 11, 2008)
ISBN-10: 0596514557
2011/12
66
Reconcile through single process...
► Must reconcile the various elements
through a single process
► The process begins with:
» a set of numbers
» a question
2011/12
67
Visualization Goals - Technical
1) Highlight data features in order of
their importance
2) Reveal patterns
3) Simultaneously show features across
multiple dimensions
» e.g. time, quantity & geography
2012-05-24 68
Visualization Goals - People
► The goal of your visualization will be informed by:
» Your own goals and motivations
» The needs of your reader
• need for specific information
• to change the reader’s opinions or behaviour
?
2011/12
69
Data Visualization Process (7 Stages)
acquire → parse → filter → mine → represent → refine → interact
► Iteration & combination
» demonstrates how later decisions can affect earlier stages
2011/12
70
Data Process – 7 Stages
1) Acquire: Obtain the data (file, disk, over network)
2) Parse: Provide some structure for the data's meaning, and order it into categories
3) Filter: Remove all but the data of interest
4) Mine: Apply methods from statistics or data mining as a way to discern patterns or place the data in mathematical context
5) Represent: Choose a basic visual model, such as a bar graph, list or tree
6) Refine: Improve the basic representation to make it clearer and more visually engaging
7) Interact: Add methods for manipulating the data or controlling what features are visible
(may not need every step in every project)
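As a rough illustration of the first five stages, the Python sketch below walks a few rows of the county unemployment sample (used later in the demonstration) through acquire, parse, filter, mine and represent. The function names and the text bar chart are assumptions made for illustration, not code from Fry's book; refine and interact are left out.

```python
import csv
import io

# Acquire: a few rows in the same layout as the BLS sample used in the demo.
RAW = '''LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
CN010010,01,001,"Autauga County, AL",Aug10(p),8.1
PA011000,01,003,"Baldwin County, AL",Aug10(p),8.2
CN010050,01,005,"Barbour County, AL",Aug10(p),11.6
CN010110,01,011,"Bullock County, AL",Aug10(p),15.0
'''

def parse(text):
    """Parse: give the raw text structure (one dict per county)."""
    return [
        {"county": row["COUNTY"], "rate": float(row["RATE"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def filter_rows(rows, minimum):
    """Filter: keep only the rows of interest."""
    return [r for r in rows if r["rate"] >= minimum]

def mine(rows):
    """Mine: place the data in context, here just the mean rate."""
    return sum(r["rate"] for r in rows) / len(rows)

def represent(rows):
    """Represent: a basic visual model, here a text bar chart."""
    return [f'{r["county"]:<20} {"#" * round(r["rate"])}' for r in rows]

rows = parse(RAW)
high = filter_rows(rows, minimum=10.0)
print("mean rate:", mine(rows))
print("\n".join(represent(high)))
```

Keeping each stage as its own function mirrors the point about iteration: a later decision (say, a different chart) can send you back to change an earlier stage without rewriting the rest.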
2012-05-24 71
Represent
► Rule #1 - function then form
► The visual design elements should enhance and enable the function
► The key to a successful visualization is making good design choices
» elegance, simplicity, efficiency
2012-05-24 72
Encodings
2012-05-24 73
Agenda
1) Background to Data Visualization*
2) Resources
3) Classification of Visualisation
4) The Design Process
5) Demonstration
2012-05-24 74
Demonstration (walk-through followed by demo)
2012-05-24 75
“Visualize This”
Visualize This: The FlowingData Guide to Design, Visualization, and Statistics
Nathan Yau
Publisher: Wiley (July 20, 2011)
ISBN-10: 0470944889
2012-05-24 76
R Project
http://www.r-project.org/
2012-05-24 77
R Studio
http://rstudio.org
2011/12 78
The R Script
► A file in the R format
► Allows you to save your scripting work
► File (or Ctrl+Shift+N)
» New
• R Script
► Hit “Run” (or Ctrl + Enter) after each
command
2011/12 79
The R Script
2011/12 80
The R Script pane
2011/12 81
Installing packages
► Option 1 (R or R Studio)
» Type the following commands into the console or R script:
» install.packages("packagename")
» library(packagename)
► Option 2 (R Studio)
» Use the GUI as shown on right ->
[Screenshots: Package installation in R Studio; Activate package]
2012-05-24 82
Python
http://python.org/
► Download & install
» http://wiki.python.org/moin/BeginnersGuide/Download
► Beginners Guide
» http://wiki.python.org/moin/BeginnersGuide/NonProgrammers
2012-05-24 83
Beautiful Soup
http://www.crummy.com/software/BeautifulSoup/
2012-05-24 84
Inkscape
http://inkscape.org/
2012-05-24 85
Process (roughly)
[Diagram: run colorize_svg.py from /cmd (or double-click to run); the script uses Beautiful Soup & Python to crunch the data against counties.svg and writes the result to a new file]
2011/12 86
► What to Look For
► Specific Locations
» Just Points
• Map with Dots
• Map with Lines
» Scaled Points
• Map with Bubbles
► Regions
» Color by Data
• Map Counties
• Map Countries
Chapter 8: Visualizing Spatial Relationships
2011/12 87
Map the points
2011/12 88
Map with Dots
► R, although limited in mapping functionality, makes placing dots on a map easy
► The maps package does most of the work
» install via Package Installer or console.
► Next step: Load the data. Use the Costco locations that you just geocoded, or load them directly from the URL
library(maps)
costcos <- read.csv("http://book.flowingdata.com/ch08/geocode/costcosgeocoded.csv", sep=",")
(New R script file)
2011/12 89
Costco
2011/12 90
Mapping – first layer
► When you create your maps, it’s useful to think of them as layers (regardless of the
software in use).
► The bottom layer is usually the base map that shows geographical boundaries, and then
you place data layers on top of that.
► In this case the bottom layer is a map of the United States, and the second layer is
Costco locations
Figure 8-2: Plain map of the United States
map(database="state")
2011/12 91
Mapping – second layer
► The second layer, the Costco locations, is then mapped with the symbols() function.
symbols(costcos$Longitude, costcos$Latitude,
circles=rep(1, length(costcos$Longitude)), inches=0.05, add=TRUE)
Figure 8-3: Map of Costco locations
symbols()
2011/12 92
Change colours
► Change the colors of both the map and the circles so that the locations stand out and
boundary lines sit in the background
Figure 8-4: Using color with mapped locations
map(database="state", col="#cccccc")
symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
lwd=0.5, circles=rep(1, length(costcos$Longitude)),
inches=0.05, add=TRUE)
2011/12 93
Result?
► Not bad for a few lines of code. Costco has clearly focused on opening locations on the
coasts with clusters in southern and northern California, northwest Washington, and in
the northeast of the country.
Figure 8-4: Using color with mapped locations
2011/12 94
Anything missing? (US geography question)
2011/12 95
Alaska & Hawaii
► Alaska and Hawaii are in the “world” database, so you need to map the entire world
Figure 8-5: World map of Costco locations
map(database="world", col="#cccccc")
symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
lwd=0.3, circles=rep(1, length(costcos$Longitude)), inches=0.03, add=TRUE)
2011/12 96
State specific
Figure 8-6: Costco locations in selected states
► Say you want to only map Costco locations
for a few states. You can do that with the
region argument.
map(database="state", region=c("California", "Nevada", "Oregon",
"Washington"), col="#cccccc")
symbols(costcos$Longitude, costcos$Latitude, bg="#e2373f", fg="#ffffff",
lwd=0.5, circles=rep(1, length(costcos$Longitude)), inches=0.05,
add=TRUE)
► Some dots are not in any of those states
» easy to remove in Inkscape
2011/12 97
► What to Look For
► Specific Locations
» Just Points
• Map with Dots
• Map with Lines
» Scaled Points
• Map with Bubbles
► Regions
» Color by Data
• Map Counties
• Map Countries
Chapter 8: Visualizing Spatial Relationships
2011/12 98
Figure 8-7: Drawing a location trace
2011/12 99
Map with Lines
► Draw the lines by simply plugging the two columns into lines(). Also specify color (col) and line width (lwd).
► Now also add dots, exactly as you just did with the Costco locations
Figure 8-7: Drawing a location trace
lines(faketrace$longitude, faketrace$latitude, col="#bb4cd4", lwd=2)
symbols(faketrace$longitude, faketrace$latitude, lwd=1, bg="#bb4cd4", fg="#ffffff",
circles=rep(1, length(faketrace$longitude)), inches=0.05, add=TRUE)
(New R script file)
2011/12 100
Figure 8-8: Drawing worldwide connections
2011/12 101
Drawing Connections
► It could be interesting to draw lines from one location to all the others
Figure 8-8: Drawing worldwide connections
► Isn’t very informative, but maybe
you can find a good use for it
► The point here is that you can draw a
map and then use R’s other graphics
functions to draw whatever you want
using latitude and longitude
coordinates.
map(database="world", col="#cccccc")
# Note: 2:length(x)-1 is evaluated as (2:length(x))-1, i.e. 1 to length(x)-1
for (i in 2:length(faketrace$longitude)-1) {
  lngs <- c(faketrace$longitude[8], faketrace$longitude[i])
  lats <- c(faketrace$latitude[8], faketrace$latitude[i])
  lines(lngs, lats, col="#bb4cd4", lwd=2)
}
(run the loop as a block)
2011/12 102
► What to Look For
► Specific Locations
» Just Points
• Map with Dots
• Map with Lines
» Scaled Points
• Map with Bubbles
► Regions
» Color by Data
• Map Counties
• Map Countries
Chapter 8: Visualizing Spatial Relationships
2011/12 103
Figure 8-10: Rates more clearly explained for a wider audience
2011/12 104
Scaled Points
► Usually, you don’t have just a location
» you also have other values, e.g.
• sales volume
• city population
► Use the principle of the bubble plot and apply it to a map
2011/12 105
► The code is almost the same as when you mapped Costco locations, but remember that there you just passed a vector of ones for circle size in the symbols() function. Here, we use the sqrt() of the rates to indicate size.
fertility <- read.csv("http://book.flowingdata.com/ch08/points/adolfertility.csv")
map("world", fill = FALSE, col = "#cccccc")
symbols(fertility$longitude, fertility$latitude,
circles=sqrt(fertility$ad_fert_rate), add=TRUE,
inches=0.15, bg="#93ceef", fg="#ffffff")
Figure 8-9: Adolescent fertility rate worldwide
(New R script file)
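Why the square root? In symbols() the circles argument gives radii, so passing raw rates would make a circle's area grow with the square of the rate and exaggerate large values. A small Python check of the arithmetic, using two made-up rates (assumptions, not values from the dataset):

```python
import math

def bubble_radius(value):
    """Radius for a bubble whose AREA should be proportional to value."""
    return math.sqrt(value)

def bubble_area(radius):
    return math.pi * radius ** 2

low, high = 20.0, 80.0   # hypothetical rates with a four-fold difference

# Radius taken directly from the value: the area ratio is 16, not 4.
naive_ratio = bubble_area(high) / bubble_area(low)

# Radius taken from sqrt(value): the area ratio matches the data ratio.
scaled_ratio = bubble_area(bubble_radius(high)) / bubble_area(bubble_radius(low))

print(naive_ratio, scaled_ratio)
```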
2011/12 106
Figure 8-10: Rates more clearly explained for a wider audience
2011/12 107
► What to Look For
► Specific Locations
» Just Points
• Map with Dots
• Map with Lines
» Scaled Points
• Map with Bubbles
► Regions
» Color by Data
• Map Counties
• Map Countries
Chapter 8: Visualizing Spatial Relationships
2011/12 108
Regions
► Mapping points can take you only so far
because they represent only single
locations.
► Large scale data is usually aggregated
over whole counties, states, countries,
and continents
► Use Python and SVG to generate map
» Python - to process the data
» SVG - for the map
http://www.nevron.com/Gallery.DiagramFor.NET.Maps.ChoroplethMaps.aspx
2011/12 109
Color By Data
► Choropleth maps are the most common way to map regional data
► Based on some metric, regions are colored following a color scale that you define
Figure 8-11: Choropleth map framework
2011/12 110
Using colours
► When you have your color scheme, you have two more things to do:
» Scale - decide how the colors you picked match up to the data range
» Location - assign colors to each region based on your choice
http://gismapcatalog.blogspot.com/2010/07/standardized-choropleth-map.html
2011/12 111
► What to Look For
► Specific Locations
» Just Points
• Map with Dots
• Map with Lines
» Scaled Points
• Map with Bubbles
► Regions
» Color by Data
• Map Counties
• Map Countries
Chapter 8: Visualizing Spatial Relationships
2011/12 112
Unemployment by county
2011/12 113
Connect data & map
[Diagram: Unemployment rates + Blank map → Beautiful Soup & Python (“colorize_svg.py”) → New map]
2011/12 114
Connect data & map
Beautiful Soup & Python
“colorize_svg.py”
2011/12 115
File structure
2011/12 116
Get data
► U.S. Bureau of Labor Statistics provides county-level unemployment
data every month
► Download the data at
http://book.flowingdata.com/ch08/regions/unemploymentaug2010.txt.
► There are six columns:
1) a code specific to the Bureau of Labor Statistics
2) and 3) together form a unique id specifying the county
4) the county name
5) the month the rate is an estimate of
6) the estimated percentage of people in the county who are unemployed
► For the purposes of this example, we are only interested in the county id (FIPS) and the rate
2011/12 117
US Unemployment figures (BLS)
LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
CN010010,01,001,"Autauga County, AL",Aug10(p),8.1
PA011000,01,003,"Baldwin County, AL",Aug10(p),8.2
CN010050,01,005,"Barbour County, AL",Aug10(p),11.6
CN010070,01,007,"Bibb County, AL",Aug10(p),10.1
CN010090,01,009,"Blount County, AL",Aug10(p),8.3
CN010110,01,011,"Bullock County, AL",Aug10(p),15.0
CN010130,01,013,"Butler County, AL",Aug10(p),12.2
PA010250,01,015,"Calhoun County, AL",Aug10(p),9.1
CN010170,01,017,"Chambers County, AL",Aug10(p),13.6
CN010190,01,019,"Cherokee County, AL",Aug10(p),8.8
CN010210,01,021,"Chilton County, AL",Aug10(p),9.4
CN010230,01,023,"Choctaw County, AL",Aug10(p),11.1
CN010250,01,025,"Clarke County, AL",Aug10(p),15.8
CN010270,01,027,"Clay County, AL",Aug10(p),13.3
CN010290,01,029,"Cleburne County, AL",Aug10(p),8.4
CN010310,01,031,"Coffee County, AL",Aug10(p),7.3
PA010900,01,033,"Colbert County, AL",Aug10(p),9.2
CN010350,01,035,"Conecuh County, AL",Aug10(p),15.4
CN010370,01,037,"Coosa County, AL",Aug10(p),12.2
2011/12 118
Get map
► Blank map from Wikimedia Commons:
http://commons.wikimedia.org/wiki/File:USA_Counties_with_FIPS_and_names.svg
► Download the SVG file and save it as counties.svg, in the same directory where you saved the unemployment data
2011/12 119
Download the SVG file
http://commons.wikimedia.org/wiki/File:USA_Counties_with_FIPS_and_names.svg
2011/12 120
SVG map file
► SVG (scalable vector graphics) is an XML file
► It’s text with tags, and you can edit it in a text editor like you would an HTML file
► The browser or image viewer reads the XML, and the XML tells the browser what to show,
such as the colors to use and shapes to draw.
2011/12 121
Figure 8-15: Blank SVG county map from Wikimedia Commons
2011/12 122
SVG - colour of each county
► Change the fill color of each county to match the corresponding unemployment rate
► There are more than 3,000 counties, so use Beautiful Soup, which makes parsing XML and HTML easy
<path style="font-size:12px;fill:#d0d0d0;fill-rule:nonzero;stroke:#000000;stroke-opacity:1;stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel"
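The fill colour sits inside each path's style attribute, so colouring a county amounts to rewriting that one property. A minimal Python sketch of the idea using plain string handling; the helper name set_fill is hypothetical and this is not the script from the book:

```python
def set_fill(style, color):
    """Return the style string with its fill property set to color."""
    parts = [p for p in style.split(";") if p.strip()]
    updated = []
    replaced = False
    for part in parts:
        name, _, _value = part.partition(":")
        if name.strip() == "fill":          # exact match, so fill-rule is left alone
            updated.append("fill:" + color)
            replaced = True
        else:
            updated.append(part)
    if not replaced:                        # no fill property yet: append one
        updated.append("fill:" + color)
    return ";".join(updated)

style = "font-size:12px;fill:#d0d0d0;fill-rule:nonzero;stroke:#000000;stroke-width:0.1"
print(set_fill(style, "#6a51a3"))
```

Matching the property name exactly matters here: a naive search-and-replace on "fill:" would miss nothing in this sample, but a looser match on "fill" would also hit fill-rule.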
2011/12 123
Load the elements (create a small script/program)
► Open a blank file in the same directory as your SVG map and unemployment data
► Save it as colorize_svg.py
► Follow instructions from book to construct the script
2011/12 124
Connect data & map
FIPS codes
► The challenge is to somehow link the unemployment data to the county map
► The linkage = the FIPS codes (Federal Information Processing Standard)
[Diagram: Unemployment rates, Blank map]
2011/12 126
Connect data & SVG (map)
► Each path in the SVG file has a unique id
» the combined state FIPS and county FIPS codes:
id="01001" inkscape:label="Autauga, AL"
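Joining the two files is then a dictionary lookup: concatenate STATE_FIPS and COUNTY_FIPS from the data and compare the result with each path id. A sketch in Python with two sample rows and a made-up list of ids standing in for the SVG paths (the id "99999" is invented to show a miss):

```python
import csv
import io

SAMPLE = '''LAUS_CODE,STATE_FIPS,COUNTY_FIPS,COUNTY,MONTH,RATE
CN010010,01,001,"Autauga County, AL",Aug10(p),8.1
PA011000,01,003,"Baldwin County, AL",Aug10(p),8.2
'''

def rates_by_fips(text):
    """Map the combined five-digit FIPS code to the unemployment rate."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["STATE_FIPS"] + row["COUNTY_FIPS"]: float(row["RATE"]) for row in reader}

unemployment = rates_by_fips(SAMPLE)

# Stand-ins for the id attributes of <path> elements in counties.svg.
path_ids = ["01001", "01003", "99999"]
for path_id in path_ids:
    rate = unemployment.get(path_id)   # None when the map has no matching data row
    print(path_id, rate)
```

The codes are kept as strings throughout; converting them to integers would drop the leading zero and break the match against ids such as "01001".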
2011/12 127
Run the Python script
$ python colorize_svg.py > colored_map.svg
2011/12 128
Possible code problem...
unemployment = {}
rates_only = [] # To calculate quartiles
min_value = 100; max_value = 0; past_header = False
for row in reader:
    if not past_header:
        past_header = True
        continue
    try:
        full_fips = row[1] + row[2]
        rate = float( row[5].strip() )
        unemployment[full_fips] = rate
        rates_only.append(rate)
    except:
        pass
In book...
Finished script...
http://book.flowingdata.com/ch08/regions/colorize_svg.py.txt
2011/12 129
Figure 8-18: Choropleth map showing unemployment rates
► Open your new choropleth map in a modern browser such as Firefox, Safari, or Chrome
or in Inkscape to see the fruits of your labor
2011/12 130
Next... unemployment rates divided by quartiles
2011/12 131
Define thresholds by quartiles
► Another common way to define thresholds is by quartiles
» This means that a quarter of the counties have rates below 6.9 percent, another quarter between 6.9 and 8.7, one between 8.7 and 10.8, and the last quarter is greater than 10.8 percent
# Quantile scale
if rate > 10.8:
    color_class = 3
elif rate > 8.7:
    color_class = 2
elif rate > 6.9:
    color_class = 1
else:
    color_class = 0
2011/12 132
Define thresholds by quartiles
► Use four colors, each representing a quarter of the regions
» one shade per quarter
colors = ["#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"]
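Putting the thresholds and the palette together: the class produced by the quantile scale is simply an index into the colors list. A small Python sketch (the function name color_for_rate and the thresholds list are assumptions for illustration; the values are the quartile boundaries quoted earlier):

```python
colors = ["#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"]
thresholds = [6.9, 8.7, 10.8]   # quartile boundaries for the August 2010 data

def color_for_rate(rate):
    """Return the shade for a rate: one class per threshold exceeded."""
    color_class = sum(rate > t for t in thresholds)
    return colors[color_class]

for rate in (5.0, 8.1, 9.4, 15.8):
    print(rate, color_for_rate(rate))
```

Counting the thresholds a rate exceeds gives the same class as the if/elif chain, and it keeps working unchanged if the palette and thresholds are later extended together.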
2011/12 133
Quartiles for re-use
► Instead of hard-coding the values 6.9, 8.7, and 10.8 in your code, you can replace those
values with q1, q2, and q3, respectively.
» The advantage of calculating the values programmatically is that you can reuse the
code with a different dataset just by changing the CSV file
# Quartiles
rates_only.sort()
q1_index = int( 0.25 * len(rates_only) )
q1 = rates_only[q1_index]
q2_index = int( 0.5 * len(rates_only) )
q2 = rates_only[q2_index]
q3_index = int( 0.75 * len(rates_only) )
q3 = rates_only[q3_index]
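The same index-based calculation can be wrapped in a function so it runs on any list of rates. A sketch in Python using the first eight rates from the sample data; the function name quartiles is an assumption, and picking the value at position int(p * n) is a simple approximation that may differ slightly from interpolating quantile methods:

```python
def quartiles(values):
    """Return (q1, q2, q3) using the index-based method shown above."""
    ordered = sorted(values)          # sorted() leaves the caller's list untouched
    n = len(ordered)
    return tuple(ordered[int(p * n)] for p in (0.25, 0.5, 0.75))

rates = [8.1, 8.2, 11.6, 10.1, 8.3, 15.0, 12.2, 9.1]
q1, q2, q3 = quartiles(rates)
print(q1, q2, q3)
```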
2011/12 134
Modify the script (or create a new one)
► Follow instructions in book to construct the next script example
► Minor alterations
2011/12 135
Figure 8-19: Unemployment rates divided by quartiles
2011/12 136
Customise and reuse
► You can edit the SVG file in Inkscape,
change border colors and sizes, and add
annotation to make it a complete
graphic for a larger audience (hint: It
still needs a legend) and that fits with
the theme of your project.
► The code is reusable - you can apply it
to other datasets that use the FIPS
code.
2012-05-24 137
In action...
2012-05-24 138
Summary
1) Background to Data Visualization
2) Resources
3) Classification of Visualization
4) The Design Process
5) Demonstration
2012-05-24 139
Take-away Points
1) Open to all
» new domain with many facets
2) Professional-level output is achievable
» practice a few programming and graphic design techniques
3) It's (only) a means to an end
» should affect behaviour
2012-05-24 140
The end.
Thank you :)