StudySapuri Data Analytics Platform with Treasure Data
Tetsuo Yamabe, Recruit Marketing Partners Co., Ltd.

Page 1: StudySapuri Data Analytics Platform with Treasure Data

StudySapuri Data Analytics Platform with Treasure Data

Tetsuo Yamabe Recruit Marketing Partners Co., Ltd.

Page 2: StudySapuri Data Analytics Platform with Treasure Data

(C) Recruit Marketing Partners Co.,Ltd. All rights reserved.

About Me

Tetsuo Yamabe

Data Engineer / Ph.D. (Eng)
Communication Design Group
Business Development Department
Online Learning Development Office
Education & Learning Business Division

Page 3: StudySapuri Data Analytics Platform with Treasure Data

About Me

Tetsuo Yamabe

• Joined RMP in Aug. 2015
• 10 months of TD experience
• Data analytics platform development for our online learning service (a.k.a. StudySapuri)

Page 4: StudySapuri Data Analytics Platform with Treasure Data

Page 5: StudySapuri Data Analytics Platform with Treasure Data

• 980 JPY / month and up
• Individual & In class business model

Page 6: StudySapuri Data Analytics Platform with Treasure Data

Individual In class

Page 7: StudySapuri Data Analytics Platform with Treasure Data

Individual In class

Page 8: StudySapuri Data Analytics Platform with Treasure Data

http://www.slideshare.net/Seigen/ss-61816140

Adaptive Learning for personalized LX
Collaborative research with Matsuo Lab. at Tokyo Univ.

Page 9: StudySapuri Data Analytics Platform with Treasure Data

Outline

1. Background
2. Platform Migration and TD
3. Technical Details
4. Challenges and Future Work
5. Conclusion

Page 10: StudySapuri Data Analytics Platform with Treasure Data

1. Background

Page 11: StudySapuri Data Analytics Platform with Treasure Data

Page 12: StudySapuri Data Analytics Platform with Treasure Data

Recruit Technologies

Recruit Marketing Partners

Page 13: StudySapuri Data Analytics Platform with Treasure Data

Recruit Marketing Partners

Recruit Technologies

Quipper

Page 14: StudySapuri Data Analytics Platform with Treasure Data

Quipper

• “Distributors of Wisdom”
  ‒ Japanese EdTech company launched in London
  ‒ Teacher-student communication support system

• Worldwide presence in the global education scene
  ‒ London, Tokyo, Manila, Jakarta, Mexico City
  ‒ Open culture with strong engineering competence
  ‒ Acquired by Recruit Marketing Partners in Apr. 2015

Page 15: StudySapuri Data Analytics Platform with Treasure Data

Page 16: StudySapuri Data Analytics Platform with Treasure Data

Recruit private cloud

AWS

Before → After (2016.2.25)

Page 17: StudySapuri Data Analytics Platform with Treasure Data

2. Platform Migration and TD

Page 18: StudySapuri Data Analytics Platform with Treasure Data

Before “Quipper Migration”

• Main usage
  ‒ KPI monitoring
  ‒ Ad hoc user activity analytics

• Used together with private Hadoop
  ‒ WebHive

Page 19: StudySapuri Data Analytics Platform with Treasure Data

Before “Quipper Migration”

Raw tables/logs Transformed tables

Member attributes

Activity logs

Data

Ops

Page 20: StudySapuri Data Analytics Platform with Treasure Data

Extract, Transform and Load Pattern

Pros
• Easy to use (simple schema, aggregated information)
• Easy to maintain (data team perspective)
• Reduced size information and logs

Cons
• Inflexibility in fixed data source and schema definition
• Bloating tables
• Black-boxed transformation
• Communication cost across divisions/companies

Page 21: StudySapuri Data Analytics Platform with Treasure Data

After “Quipper Migration”

Raw tables/logs

Scooped tables

Member attributes

Activity logs

Transformed tables

Data

Infra

Dev

Page 22: StudySapuri Data Analytics Platform with Treasure Data

Extract, Load and Transform Pattern

Pros
• You have everything you need/want
• Fully aggregated data in TD

Cons
• Duplicate business logic
• Batch process maintenance cost
• Data volume and load time
• Learning cost (app data and internal architecture)

Page 23: StudySapuri Data Analytics Platform with Treasure Data

Contents Performance Monitoring
Customer Support Support
Students Performance Report
Class Status Report
KPI Monitoring
Salesman Support
Developer Support
Prototyping New Feature
Data Science Support

Page 24: StudySapuri Data Analytics Platform with Treasure Data

Fact Sheet

• 50+ tables are imported daily by Embulk
• 30+ Hive queries are invoked by Luigi
• 10+ Presto queries are scheduled in the TD web console
• 20+ reports are delivered to 5 business divisions

Page 25: StudySapuri Data Analytics Platform with Treasure Data

3. Technical Details

Page 26: StudySapuri Data Analytics Platform with Treasure Data

System Overview

[Figure: system overview. Labels: Application (Client side), TD SDK, Application (Server side), Kinesis, Lambda, Streaming Insert, Databases, Bulk import, Payment logs, Video info, PlazmaDB, DataTank, Join w/ FDW]

Page 27: StudySapuri Data Analytics Platform with Treasure Data

Featured Topics

• Client-side events
  ‒ SPA event tracking
  ‒ Customized TD tag

• Server-side events
  ‒ Streaming insert with Kinesis + Lambda

• td-client-python
  ‒ Durability improvement
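The server-side streaming insert can be pictured as a Lambda function that unpacks each Kinesis batch before forwarding it. The following is a minimal sketch, assuming the application writes one JSON object per Kinesis record; the `sink` argument is a hypothetical stand-in for whatever client actually sends rows to TD, and is not taken from the slides.

```python
import base64
import json


def decode_kinesis_records(event):
    """Decode the base64-encoded JSON payloads of a Kinesis-triggered Lambda event."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    return rows


def handler(event, context, sink=None):
    """Lambda entry point. `sink` is a hypothetical callable that forwards rows to TD."""
    rows = decode_kinesis_records(event)
    if sink is not None:
        sink(rows)
    return {"received": len(rows)}
```

Keeping the decoding separate from the forwarding step makes the handler easy to exercise locally with a hand-built event.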

Page 28: StudySapuri Data Analytics Platform with Treasure Data

Featured Topics

• DataTank
  ‒ Isolate sensitive information from Plazma DB
  ‒ Data mart store to connect BI

• Luigi
  ‒ Define data transforming job with table dependency
  ‒ Invoke Embulk command inside Luigi Jobs
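The idea of ordering transform jobs by table dependency can be illustrated without Luigi itself. This is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+), not Luigi's API; the table names and config file name are hypothetical, and `embulk run <config>` is the only Embulk invocation assumed.

```python
from graphlib import TopologicalSorter

# Hypothetical table names; each key depends on the tables in its set.
DEPENDENCIES = {
    "raw_members": set(),
    "raw_activity_logs": set(),
    "daily_activity": {"raw_members", "raw_activity_logs"},
    "kpi_report": {"daily_activity"},
}


def build_order(dependencies):
    """Return an order in which every table is built after the tables it reads from."""
    return list(TopologicalSorter(dependencies).static_order())


def embulk_command(config_path):
    """Command line that a job could pass to subprocess.run to load one table."""
    return ["embulk", "run", config_path]
```

A workflow tool such as Luigi resolves the same kind of graph from the requirements each task declares, and additionally skips tasks whose outputs already exist.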

Page 29: StudySapuri Data Analytics Platform with Treasure Data

Featured Topics

• Bulk import
  ‒ Cross import from MongoDB and PostgreSQL to PlazmaDB and DataTank
    • embulk-input-mongodb
    • embulk-input-postgresql
    • embulk-filter-insert
    • embulk-filter-eval
    • embulk-output-td
    • embulk-output-postgresql

Page 30: StudySapuri Data Analytics Platform with Treasure Data

4. Challenges and Future Work

Page 31: StudySapuri Data Analytics Platform with Treasure Data

Scooped raw tables

Transformed tables

Report tables / marts

Scheduled queries in web console
• Select all without conditions
• Assign column name in Japanese
• Result export to Google spreadsheet

Transform tables in Luigi tasks

Page 32: StudySapuri Data Analytics Platform with Treasure Data

Record Set Versioning at Transforming Phase

Partition-based versioning pattern

[Figure: Tables A, B, and C are each kept as a single table; the record sets for 2016/03/31, 2016/04/01, and 2016/04/02 (user_0001, user_0002, user_0003) are appended to it day by day.]

Page 33: StudySapuri Data Analytics Platform with Treasure Data

Record Set Versioning at Transforming Phase

Table-based versioning pattern

[Figure: a new table (Table A_yyyymmdd, Table B_yyyymmdd, Table C_yyyymmdd) is created for each day's record set (2016/03/31, 2016/04/01, 2016/04/02), each holding user_0001, user_0002, and user_0003.]

Page 34: StudySapuri Data Analytics Platform with Treasure Data

Record Set Versioning at Transforming Phase

• Table-based versioning doesn’t fit TD
  ‒ A growing number of tables degrades query performance
  ‒ A union operator is needed across all the tables
  ‒ Append and remove are not realistic

• Partition-based versioning with “once a day” rule
  ‒ Drop the daily partition first before inserting records
  ‒ ALTER TABLE capability would be helpful to invoke drop partition in a query
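The "drop the daily partition, then insert" rule makes a daily load safe to re-run. The following is a minimal sketch of that rule, assuming an in-memory SQLite table as a stand-in for a TD table (a `DELETE` on the date column plays the role of dropping the partition); the table and column names are hypothetical.

```python
import sqlite3


def load_daily_partition(conn, day, rows):
    """Replace the record set for `day`: remove that day's rows first, then insert."""
    with conn:  # one transaction, so a failed insert does not leave the day empty
        conn.execute("DELETE FROM table_a WHERE dt = ?", (day,))
        conn.executemany(
            "INSERT INTO table_a (dt, user_id, score) VALUES (?, ?, ?)",
            [(day, user_id, score) for user_id, score in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (dt TEXT, user_id TEXT, score INTEGER)")

load_daily_partition(conn, "2016-03-31", [("user_0001", 10), ("user_0002", 20)])
load_daily_partition(conn, "2016-04-01", [("user_0001", 11), ("user_0002", 21)])
# Re-running a day replaces that day's rows instead of duplicating them.
load_daily_partition(conn, "2016-04-01", [("user_0001", 12), ("user_0002", 22)])
```

Because each run owns exactly one day's slice, earlier days stay untouched and no per-day tables or unions are needed.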

Page 35: StudySapuri Data Analytics Platform with Treasure Data

Reuse Application’s Business Logic

• Frequently appearing clauses should be defined as a common UDF or view
  ‒ Incl. schema definitions, constant definitions, etc.
  ‒ TD is missing both UDF and view features

• Pre-transform complicated tables on the application side before loading into TD?
  ‒ Hybrid approach
  ‒ Reuse application code
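Without UDFs or views, one workaround is to keep a frequently used clause in a single place on the client side and render it into each query. This is a minimal sketch of that idea, not something described in the slides; the condition, table, and column names are hypothetical.

```python
# Hypothetical shared fragment: the definition of an "active paying member".
ACTIVE_MEMBER = "status = 'active' AND plan <> 'free'"


def count_by(table, column, condition=ACTIVE_MEMBER):
    """Render a GROUP BY query that reuses the shared condition."""
    return (
        f"SELECT {column}, COUNT(*) AS members "
        f"FROM {table} WHERE {condition} GROUP BY {column}"
    )
```

String interpolation like this is only appropriate for trusted, developer-defined identifiers; it centralizes the definition but does not remove the duplication between the application and the analytics side.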

Page 36: StudySapuri Data Analytics Platform with Treasure Data

Other topics

• Increasing users across divisions
  ‒ Account management (incl. dev/ops/biz)
  ‒ Race condition in Presto resource
  ‒ Large file delivery via web console

• Presto/Hive query testing framework
  ‒ Test against a small dataset with a Presto/Hive SQL interface?
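Testing a query against a small dataset could look like the sketch below. It assumes SQLite as a local stand-in for the Presto/Hive interface, which only works for queries written in SQL that both engines accept; the table, columns, and fixture rows are hypothetical.

```python
import sqlite3

# Query under test, written in plain SQL that SQLite also accepts.
DAILY_ACTIVE_USERS = """
SELECT dt, COUNT(DISTINCT user_id) AS dau
FROM activity_logs
GROUP BY dt
ORDER BY dt
"""


def run_on_fixture(query, rows):
    """Run `query` against a small in-memory activity_logs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_logs (dt TEXT, user_id TEXT)")
    conn.executemany("INSERT INTO activity_logs VALUES (?, ?)", rows)
    return conn.execute(query).fetchall()


FIXTURE = [
    ("2016-04-01", "user_0001"),
    ("2016-04-01", "user_0001"),
    ("2016-04-01", "user_0002"),
    ("2016-04-02", "user_0003"),
]
```

Dialect differences (functions, types, partition handling) are exactly what such a stand-in cannot cover, which is why a test interface on the real engines remains the open question.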

Page 37: StudySapuri Data Analytics Platform with Treasure Data

5. Conclusion

Page 38: StudySapuri Data Analytics Platform with Treasure Data

Success Factors

• TD lets us focus on understanding the application and communicating with Quipper engineers
  ‒ Fully managed Hadoop service
  ‒ Quick response from customer support

• Different DBs, but still in the same TD
  ‒ No extra cost for cross-database JOINs
  ‒ Continuous analytics with JukenSapuri data

Page 39: StudySapuri Data Analytics Platform with Treasure Data

Success Factors

• Quipper’s culture and strong skills were really helpful in setting up a data analytics platform for their application
  ‒ Global market already had a BQ-based platform
  ‒ Open information and communication

• Slack x GitHub x Google Drive
  ‒ Clean code with fine readability
  ‒ HRT: Humility, Respect, and Trust

• Cultural convergence between Quipper and RMP

Page 40: StudySapuri Data Analytics Platform with Treasure Data

Conway’s Law?

Data

Infra

Dev

Casual open communication over chat + PR

Page 41: StudySapuri Data Analytics Platform with Treasure Data

Beyond Monitoring and Reporting

• Sophisticated machine learning with Hivemall
• Real-time data processing and feed to application

Page 42: StudySapuri Data Analytics Platform with Treasure Data

Distributors of Wisdom x

Delivering the best learning to the ends of the earth

Page 43: StudySapuri Data Analytics Platform with Treasure Data
