Treasure Data and Heroku
Masahiro Nakagawa
Heroku Meetup #8 TreasureData + Waza Report!! Thu, 04 Apr 2013
Friday, April 5, 13
Who are you?
Masahiro Nakagawa
• @repeatedly / [email protected]
Treasure Data, Inc.
• Senior Software Engineer, since 2012/11
Open Source projects
• D Programming Language
• MessagePack: D, Python, etc...
• Fluentd: Core, mongo, etc...
• etc...
Introduction to Treasure Data
Company Overview
Silicon Valley-based Company
• All Founders are Japanese
• Hironobu Yoshikawa
• Kazuki Ohta
• Sadayuki Furuhashi
OSS Enthusiasts
• MessagePack, Fluentd, etc.
Investors
• Bill Tai
• Naren Gupta - Nexus Ventures, Director of Red Hat, TIBCO
• Othman Laraki - Former VP Growth at Twitter
• James Lindenbaum, Adam Wiggins, Orion Henry - Heroku Founders
• Anand Babu Periasamy, Hitesh Chellani - Gluster Founders
• Yukihiro “Matz” Matsumoto - Creator of Ruby
• Dan Scheinman - Director of Arista Networks
• Jerry Yang - Founder of Yahoo!
• and 10 more people
Treasure Data = Cloud + Big Data
[Figure: market map with data volume on one axis (lightweight RDBMS, enterprise RDBMS such as DB2, traditional data warehouse; boundary at 1 billion entries or 10 TB) and deployment on the other (on-premise vs. cloud: Database-as-a-Service, Big Data-as-a-Service). Market sizes shown: $10B and $34B. © 2012 Forrester Research, Inc. Reproduction Prohibited]
Why Cloud? ‘Time’ is Money
[Figure: customer value plotted against time, starting at sign-up or PO. Curves: ideal expectation; reality on-premise (HW/SW selection, PoC, deploy, maintain, upgrade, obsolete over time); AWS or hosted Hadoop (EC2, EMR, Redshift, S3) with step-by-step manual integrations.]
Full Stack Support for Big Data Reporting
Our best-in-class architecture and operations team ensure the integrity and availability of your data.
Data from almost any source can be securely and reliably uploaded using td-agent in streaming or batch mode.
Our SQL, REST, JDBC, ODBC and command-line interfaces support all major query tools and approaches.
You can store gigabytes to petabytes of data efficiently and securely in our cloud-based columnar datastore.
Product
Data Collection
• Sources: Web Log, App Log, Sensor, RDBMS, CRM, ERP
• Streaming Upload: Open-Source Log Collector, 2,000+ companies (incl. LinkedIn, etc.), >60 billion / month
• Bulk Upload, Parallel Upload: Bulk Loader for CSV / TSV, MySQL, Postgres, Oracle, etc.
Data Warehouse
• Columnar Storage + Hadoop MapReduce
• 250bil+ records, 2mil+ jobs
Data Analysis
• SQL (HiveQL) or Pig
• REST, JDBC / ODBC
• BI Tools: Tableau, QlikView, Excel, etc.
• Result push: Dashboard, Custom App, RDBMS, FTP, etc.
Value Proposition: “Time-to-Answer”
• 20bil+, 2 weeks, UK/Austria
• 3bil+, 3 weeks, Singapore
• 2 weeks, US
• 2 weeks, US
• 3 weeks, Japan
Multi-Tenant: Single Code for Everyone - no code modification, Improving the Platform Faster.
Customer Use Cases
Our Customers – Fortune Global 500 leaders and start-ups including:
Example in AdTech: MobFox
1. Europe’s largest independent mobile ad exchange.
2. 20 billion imps/month (circa Jan. 2013)
3. Serving ads for 15,000+ mobile apps (circa Jan. 2013)
4. Needed Big Data Analytics infrastructure ASAP.
Two Weeks From Start to Finish!
Viki.com: “Global Hulu”
Viki.com Before
• Hard to manage Hadoop
• Complicated data collection
Viki.com After
• No more Hadoop maintenance
• Versatile data collector, td-agent
Our Usage
https://console.treasure-data.com/
http://fluentd.org/
Other usage
• Staging environment
• Internal testing application
• Proxy server for the services we use
Heroku integration
http://blog.treasure-data.com/post/44003014921/treasure-data-is-sponsoring-heroku-waza-2013
Matz
http://www.wired.com/business/2013/03/heroku-waza/
http://instagram.com/p/WTIEwpA_9-/#
Heroku addons
https://addons.heroku.com/provider/resources/technical/how/overview
https://addons.heroku.com/treasure-data
Using Heroku addon
Set up the “td” command
• Install via td-toolbelt or rubygems
Set up the “td” heroku plugin
• heroku plugins:install https://github.com/treasure-data/heroku-td.git
Add the ‘td’ gem to your Gemfile
• or use STDOUT log collecting
“heroku td” is now available for Treasure Data
• “heroku td xxx”: xxx is the same as in the “td” command
https://devcenter.heroku.com/articles/treasure-data
Just STDOUT
Use STDOUT to collect event logs
• No libraries needed
• Logs are forwarded via Heroku syslog drain
Format
• @[db_name.table_name] json_in_one_line
• Ruby:
puts '@[service.users] {"name":"D", "via":"Phobos"}'
http://blog.treasure-data.com/post/41886298790/just-stdout-the-simplest-most-flexible-way-to-collect
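The STDOUT format above can be produced from a Ruby hash rather than a hand-written string. The following is a minimal sketch, not part of any Treasure Data library: the helper name `td_event_line` is hypothetical, and it assumes only the `@[db_name.table_name] json_in_one_line` format shown on the slide. `JSON.generate` emits compact JSON and escapes newlines inside values, so each event stays on one line.

```ruby
require 'json'

# Flush STDOUT immediately so each event reaches the log drain without buffering.
$stdout.sync = true

# Hypothetical helper (not a Treasure Data API): builds one event line in the
# "@[db_name.table_name] json_in_one_line" format described on the slide.
def td_event_line(db_name, table_name, record)
  "@[#{db_name}.#{table_name}] #{JSON.generate(record)}"
end

# Same event as the slide example, built from a hash.
puts td_event_line('service', 'users', { 'name' => 'D', 'via' => 'Phobos' })
```

Building the line from a hash avoids quoting mistakes in hand-written JSON, at the cost of a dependency on Ruby's bundled `json` library.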
Conclusion
Treasure Data
• Cloud-based big data analytics platform
• Provide Machete for big data reporting
Heroku and Treasure Data
• Treasure Data addon
• Easy to integrate with your Heroku app
• STDOUT log collecting with Heroku syslog drain
Big Data for the Rest of Us
www.treasure-data.com | @TreasureData