Embulk makes Japan visible
Kai Sasaki Treasure Data Inc.
Who am I?
• Kai Sasaki (@Lewuathe)
• Treasure Data Inc
• Maintaining and improving Hadoop infrastructure
• Hadoop, Spark contributor
Topic
• What is Embulk?
• Embulk ☓ GeoJSON
• DATA.GO.JP (http://www.data.go.jp/)
• DEMO
• Conclusion
What is Embulk?
• Parallel bulk data loader
• using plugins
• to make data integration relaxed
http://www.embulk.org/docs/
http://www.slideshare.net/frsyuki/embulk-56197273/4
Plugins
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed/12
Embulk ☓ GeoJSON
• GeoJSON is a format for encoding geographic data structures
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [128.4, 37.0]
      },
      "properties": {
        "name": "Point A"
      }
    }
  ]
}
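The structure above can be built programmatically. The following is a minimal sketch in plain JavaScript, not an API of Embulk or of any GeoJSON library: `pointFeature` and `featureCollection` are hypothetical helper names, and "Point B" with its coordinates is an illustrative value. One detail worth making explicit is that GeoJSON positions are ordered [longitude, latitude], so the helper rejects values that can only be a swapped pair.

```javascript
// Hypothetical helpers for illustration; not part of Embulk or a GeoJSON library.

// Build a GeoJSON Point feature. GeoJSON positions are [longitude, latitude].
function pointFeature(name, longitude, latitude) {
  if (Math.abs(longitude) > 180 || Math.abs(latitude) > 90) {
    throw new RangeError("longitude must be within 180 and latitude within 90 degrees");
  }
  return {
    type: "Feature",
    geometry: { type: "Point", coordinates: [longitude, latitude] },
    properties: { name: name }
  };
}

// Wrap a list of features in a FeatureCollection.
function featureCollection(features) {
  return { type: "FeatureCollection", features: features };
}

// Illustrative point; the name and coordinates are made up for this example.
const collection = featureCollection([pointFeature("Point B", 139.7, 35.7)]);
console.log(JSON.stringify(collection, null, 2));
```

Passing a latitude-first pair such as `pointFeature("Point A", 37.0, 128.4)` throws a `RangeError`, because 128.4 cannot be a latitude.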
Embulk ☓ GeoJSON
https://github.com/benbalter/dc-wifi-social/blob/master/bars.geojson
Embulk ☓ GeoJSON
• embulk-formatter-geojson (https://rubygems.org/gems/embulk-formatter-geojson)
• Converts any type of source data (CSV, TSV, JSON, MessagePack, etc.) supported by an input plugin into GeoJSON format.
$ embulk new ruby-formatter …
Embulk ☓ GeoJSON
id,name,population,…
1,Tokyo,1000,…
2,Osaka,800,…
template.geojson
{
  "id": 1,
  "properties": {
    "name": "Tokyo",
    "population": 1000
  },
  "geometry": <From template.geojson>
}
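The mapping from one input row to one feature can be sketched as follows. This is not the plugin's implementation (the plugin is a Ruby gem and may work differently); it is a JavaScript illustration under two stated assumptions: that template.geojson is a FeatureCollection whose features carry an `id` matching the row's identifier column, and that every other column becomes a property. `indexTemplate`, `rowToFeature`, and the placeholder geometries are hypothetical.

```javascript
// Hypothetical sketch; the actual embulk-formatter-geojson plugin may differ.

// Assumption: the template is a FeatureCollection whose features have an "id".
function indexTemplate(template) {
  const geometryById = new Map();
  for (const feature of template.features) {
    geometryById.set(feature.id, feature.geometry);
  }
  return geometryById;
}

// Turn one row into a Feature: the identifier column becomes "id",
// the remaining columns become "properties", and the geometry is
// looked up in the template (null when the template has no match).
function rowToFeature(row, identifier, geometryById) {
  const { [identifier]: id, ...properties } = row;
  return {
    type: "Feature",
    id: id,
    properties: properties,
    geometry: geometryById.has(id) ? geometryById.get(id) : null
  };
}

// Placeholder template with made-up geometries, for illustration only.
const template = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", id: 1, properties: {}, geometry: { type: "Point", coordinates: [0, 0] } },
    { type: "Feature", id: 2, properties: {}, geometry: { type: "Point", coordinates: [1, 1] } }
  ]
};

const rows = [
  { id: 1, name: "Tokyo", population: 1000 },
  { id: 2, name: "Osaka", population: 800 }
];

const features = rows.map((row) => rowToFeature(row, "id", indexTemplate(template)));
console.log(JSON.stringify(features[0], null, 2));
```

Keeping the geometry in a separate template means the tabular data can change without touching the shapes, which is what makes the CSV-to-GeoJSON conversion a pure formatting step.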
embulk-formatter-geojson
$ embulk gem install embulk-formatter-geojson
$ cat config.yml
…
out:
  type: file
  formatter:
    type: geojson
    template_file: /path/to/template.geojson
    identifier: "id"
…
$ embulk run config.yml
DATA.GO.JP
http://www.data.go.jp/
DEMO
http://www.lewuathe.com/opendata/
d3.json(url, function(error, geoJp) {
  // Draw one path per GeoJSON feature.
  svg.selectAll("path")
    .data(geoJp.features)
    .enter().append("path")
    // Show the feature's name on hover.
    .on("mouseover", function(d) {
      $("#description").text(d.properties["name"]);
    })
    .attr("class", function(d) { return d.id; })
    .attr("d", geopath)
    // Pick the fill color from the embedded population property.
    .attr("fill", function(d) {
      var prop = d.properties["population"];
      return colors[prop];
    });
});
• d3.js (https://d3js.org/)
Embedded Properties: d.properties["name"] and d.properties["population"] in the code above are read from the properties embedded in each GeoJSON feature.
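The fill accessor in the d3.js code reads a property embedded in each feature and turns it into a color. The slides do not show how `colors` is defined, so the sketch below swaps in a different, simple technique: bucketing the population by thresholds. The thresholds, palette, and the name `fillFor` are hypothetical, and the function is plain JavaScript so it can be tried without d3.js or a browser.

```javascript
// Hypothetical thresholds and palette; the deck does not define `colors`.
const thresholds = [500, 1000];
const palette = ["#deebf7", "#9ecae1", "#3182bd"];
const fallbackColor = "#cccccc";

// Choose a fill color from the "population" property embedded in a feature.
function fillFor(feature) {
  const population = feature.properties["population"];
  if (typeof population !== "number" || Number.isNaN(population)) {
    return fallbackColor; // property missing or not numeric
  }
  let bucket = 0;
  while (bucket < thresholds.length && population >= thresholds[bucket]) {
    bucket++;
  }
  return palette[bucket];
}

console.log(fillFor({ properties: { name: "Osaka", population: 800 } }));
```

A function like this could be passed as the accessor in `.attr("fill", fillFor)`, since d3.js calls attribute accessors with the bound feature as the first argument.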
Conclusion
• Embulk can be yet another format converter
• GeoJSON as a container including data and topology
• DATA.GO.JP provides various types of open data
Thank you!