[email protected] Hortonworks Stockholm Summit 2018-12-06 Journey in country of data access governance 2018-04-18 1


[email protected] Stockholm Summit 2018-12-06

Journey in country of data access governance

2018-04-18 1


Who is talking?

Magnus Runesson

Data Engineer @ Svenska Spel

Developer, Ops, RDBMS, Big Data, High performance


Gaming is for everyone's enjoyment


Agenda

• Why?
• Svenska Spel’s data warehouse
• Atlas & Ranger
• How did we implement it?
• Learnings
• Conclusions


Why?

GDPR requires
• clear purpose for PII data
• privacy by design
• clear consent or legal ground
• not to use/store PII if not needed
• people own their own data
• penalty if not followed

New gaming market requires
• introduce multi-tenancy


Goals

• Our customers' and partners' integrity is protected
• Follow competition regulation
• Users only have access to data intended for the current purpose
• Keep doing our required processing
• Adaptable to new requirements
• Maintainable solution


Svenska Spel’s data warehouse


Svenska Spel’s data warehouse

• Moved from classic Cognos + Oracle
• HDP 2.6 using Hive
• Includes Personally Identifiable Information (PII)
• 300+ event streams in
• 150+ published tables and views


Model based development

Used data are
• Understood
• Documented
• Modelled

• Modelled with Data Vault
• Oracle SQL Developer Data Modeler
• SQL code generated from model


Data Vault

• History tracking
• Uniquely linked
• Pattern based
• Easy to generate code
• Easy to add new sources

[Diagram labels: Link, two Hubs, three Satellites]


[Architecture diagram. Hadoop: Data Lake, Integration, Data Vault, Dimension mart. Presentation: BI mart (Exasol, Tableau), CRM mart (CRM). Other labels: ETL, Anonymization, Role based access, Whitelisting]


Apache Atlas and Ranger


Apache Atlas

• Metadata about resources
• A resource is a
  • Table
  • Column
  • Schema
  • File on HDFS
• Lineage


Atlas tags

• Tags have no meaning in themselves
• Your business vocabulary defines the meaning
• Examples of tags:
  • Business entity owning the data
  • Indication of sensitive data
• The rules in Ranger enforce the policy
• Separate metadata from policy implementation

[Example tag shown on the slide: PII]


Apache Ranger

Is user U allowed to do operation O on resource R?

• Access
• Row based filtering
• Masking
• Audit logging
• Resources can be referred to by tags
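The question "is user U allowed to do operation O on resource R?" can be illustrated with a toy model. The sketch below is not how Ranger evaluates policies internally; the `Policy` class, its fields and the group names are hypothetical, and real Ranger policies also cover deny rules, conditions and auditing. It only shows the idea of referring to resources through tags and denying by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified tag-based allow policy (not Ranger's data model)."""
    tag: str                # tag the policy is attached to
    groups: frozenset       # groups the policy grants access to
    operations: frozenset   # operations the policy allows


def is_allowed(user_groups, operation, resource_tags, policies):
    """Return True if some policy allows `operation` on a resource with a matching tag."""
    for policy in policies:
        if (
            policy.tag in resource_tags
            and operation in policy.operations
            and user_groups & policy.groups
        ):
            return True
    return False  # default deny when no policy matches


policies = [Policy("PII", frozenset({"crm"}), frozenset({"select"}))]
```

A call such as `is_allowed({"crm"}, "select", {"PII"}, policies)` asks the question for a user in the hypothetical `crm` group reading a resource tagged PII.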


customer

Customer_id  Name   Postal_code  Has_phone  Marketing
1            Steve  12345        False      False
2            Bill   54321        True       False
3            Paul   54672        False      True

Table in Hive before we started our work


customer

Customer_id  Name   Postal_code  Has_phone  Marketing
1            Steve  12345        False      False
2            Bill   54321        True       False
3            Paul   54672        False      True

Add PII tags on table and columns in Atlas. No behaviour change.

[Tags shown in the figure: PII_table on the table, PII on columns]


customer (Analyst view)

Customer_id  Name  Postal_code  Has_phone  Marketing
17           ABC   12345        False      False
42           DEF   54321        True       False
13           BDE   54672        False      True

We set a rule in Ranger to mask PII columns

[Tags shown in the figure: PII_table on the table, PII on columns]
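The effect of the masking rule on the analyst view can be imitated in a few lines. This is only a sketch: the slides do not say which masking type was configured in Ranger, so the hash-based `mask_value` below is an assumption, and the choice of `customer_id` and `name` as the tagged columns is inferred from the masked example rows.

```python
import hashlib

# Assumption: these are the columns carrying the PII tag in the example.
PII_COLUMNS = {"customer_id", "name"}


def mask_value(value):
    """Replace a value with a short, deterministic hash (illustrative masking only)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return digest[:8]


def mask_row(row, tagged_columns=PII_COLUMNS):
    """Mask every column that carries the tag and leave the others untouched."""
    return {
        column: mask_value(value) if column in tagged_columns else value
        for column, value in row.items()
    }


customers = [
    {"customer_id": 1, "name": "Steve", "postal_code": "12345", "has_phone": False, "marketing": False},
    {"customer_id": 2, "name": "Bill", "postal_code": "54321", "has_phone": True, "marketing": False},
    {"customer_id": 3, "name": "Paul", "postal_code": "54672", "has_phone": False, "marketing": True},
]

analyst_view = [mask_row(row) for row in customers]
```

Because the mask is deterministic, the same customer gets the same masked value in every table, so joins on masked keys still line up. Whether that property is wanted is a policy decision.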


customer (CRM user view)

Customer_id  Name  Postal_code  Has_phone  Marketing
3            Paul  54672        False      True

Ranger restricts our CRM user to see only rows with Marketing = True

[Tags shown in the figure: PII_table on the table, PII on columns]
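Row based filtering for the CRM user can be pictured as a predicate applied to every row before it is returned. The sketch below mimics the effect on the example table; the function names are hypothetical and the filtering here happens in Python, whereas Ranger enforces it inside Hive.

```python
def crm_row_filter(row):
    """Predicate for the CRM user: keep only customers who accepted marketing."""
    return row["marketing"] is True


def apply_row_filter(rows, predicate):
    """Return only the rows the predicate lets through."""
    return [row for row in rows if predicate(row)]


customers = [
    {"customer_id": 1, "name": "Steve", "postal_code": "12345", "has_phone": False, "marketing": False},
    {"customer_id": 2, "name": "Bill", "postal_code": "54321", "has_phone": True, "marketing": False},
    {"customer_id": 3, "name": "Paul", "postal_code": "54672", "has_phone": False, "marketing": True},
]

crm_view = apply_row_filter(customers, crm_row_filter)
```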


How did we implement this?


Development process

Change model

Store model

Generate code

Deploy

PII

Add rules


HQL generator

• In-house tool
• Template based generation of SQL/HQL
• Generate files with tag-information
  • Tables and columns respectively

[Diagram labels: HQL generator, CSV, SQL, PII tag]


Tag file for columns

schema;table;attribute;tags
dim_mart;customer_d;customer_id;PII,Sensitive
dim_mart;customer_d;has_phone;

There is a corresponding file for tables, without the attribute (column) field.
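The column tag file is semicolon-separated, with a comma-separated tag list in the last field. A minimal sketch of reading it with Python's standard `csv` module follows; the function name `read_column_tags` and the dictionary layout are illustrative assumptions, not the policy tool's own parser.

```python
import csv
import io

# The example rows from the slide, inlined so the sketch is self-contained.
TAG_FILE = """schema;table;attribute;tags
dim_mart;customer_d;customer_id;PII,Sensitive
dim_mart;customer_d;has_phone;
"""


def read_column_tags(text):
    """Map (schema, table, attribute) to its list of tags; an empty field means no tags."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    column_tags = {}
    for row in reader:
        key = (row["schema"], row["table"], row["attribute"])
        column_tags[key] = [tag.strip() for tag in row["tags"].split(",") if tag.strip()]
    return column_tags


column_tags = read_column_tags(TAG_FILE)
```

Columns with an empty tag field, such as `has_phone`, are kept with an empty list, so an untagged column is still recorded explicitly.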


Ranger rules

• Rules are hand-coded per tag
• Policy tool applies the rule to all tables with the tag
• There can be different rules for different users
• The filter is appended to the WHERE condition by Ranger
• Used for
  • Row based filtering (access)
  • Masking (anonymization)
• Catch-all rule to deny access to tables not in our model
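Appending a filter to the WHERE condition amounts to AND-ing the policy's expression with whatever condition the user wrote. The sketch below shows that effect on plain strings. The `append_filter` helper is hypothetical; the actual rewriting is done by Ranger inside Hive and is not string concatenation of this kind.

```python
def append_filter(user_where, row_filter):
    """Combine the user's WHERE condition with a policy row filter using AND."""
    if not row_filter:
        return user_where      # no policy filter applies to this user
    if not user_where:
        return row_filter      # the query had no WHERE condition of its own
    return f"({user_where}) AND ({row_filter})"


combined = append_filter("postal_code = '12345'", "marketing = true")
```

Parenthesising both sides keeps an `OR` in the user's condition from widening the result beyond what the policy filter allows.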


Ranger rule filter example

{ "command": "apply_tag_row_rule",
  "filters": [
    { "groups": [ "tenant_1" ],
      "users": [],
      "tagFilterExprs": [
        { "tags": [ "multitenant" ],
          "filterExpr": "${table}.tenant_id = 1" } ] },
    ...
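One way to read the rule above: for every table carrying the `multitenant` tag, the `${table}` placeholder is replaced by the table name, which gives one concrete row filter per table. The sketch below illustrates that expansion under stated assumptions. The `TABLE_TAGS` mapping and table names are made up, matching on any one of the listed tags is a guess, and substituting the unqualified table name is a guess too; the policy tool's real expansion logic may differ.

```python
import json
from string import Template

# One filter entry from the example, completed into valid JSON for the sketch.
RULE = json.loads("""
{ "groups": ["tenant_1"],
  "users": [],
  "tagFilterExprs": [
    { "tags": ["multitenant"], "filterExpr": "${table}.tenant_id = 1" } ] }
""")

# Hypothetical mapping of tables to their Atlas tags.
TABLE_TAGS = {
    "dim_mart.customer_d": {"multitenant", "PII_table"},
    "dim_mart.calendar_d": set(),
}


def expand_rule(rule, table_tags):
    """Return {table: concrete filter expression} for tables carrying a matching tag."""
    expanded = {}
    for table, tags in table_tags.items():
        for expr in rule["tagFilterExprs"]:
            if tags & set(expr["tags"]):          # assumption: any listed tag matches
                short_name = table.split(".")[-1]  # assumption: unqualified table name
                expanded[table] = Template(expr["filterExpr"]).substitute(table=short_name)
    return expanded


row_filters = expand_rule(RULE, TABLE_TAGS)
```

Untagged tables get no filter from this rule, which is why the catch-all deny rule for tables outside the model matters.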


Deployment process

Files: *.sql, table_tags.csv, column_tags.csv, ranger_policies.json

Apply *.sql DDL

Policy tool - tag files

Policy tool - policy file


Policytool

• Makes it easy to manage
  • Atlas tags
  • Ranger policy rules
• Command line tool
• Consumes tags from CSV files
• Consumes policies from JSON files
• Calls Atlas and Ranger API
• Ensures the same access on Hive as on HDFS (except for filtering and masking)
• Supports tag-based filtering


Put everything together


Development process

Change model

Store model

Generate code

Deploy

PII

Add rules

Files: *.sql, column_tags.csv, table_tags.csv, tag_row_policies.csv


Deployment process

Files: *.sql, column_tags.csv, table_tags.csv, ranger_policies.json

Apply *.sql DDL

Policy tool - tag files

Policy tool - policy file


Change in view of an Analyst

[Figure labels: Before, CRM, Analyst]


Learnings


Clear business rules

• Work closely with the business
• Avoid overly complex rules
• Minimize the number of rules
• Use {user}, public and the other aliases Ranger provides


Systematic model

• People unconsciously do things differently
• Keep HDFS and Hive rules in sync
• Use tags as much as possible


Automate

• Ensure rules are in sync with what is deployed
• Use CI/CD
• Ask Hortonworks (HW) for the latest patches on HDP 2.6.5 (ATLAS-2634, HIVE-20633, ATLAS-2891, ATLAS-2975)


Hey, would it not be nice to have the same rules in the presentation layer?


[Architecture diagram. Hadoop: Data Lake, Integration, Data Vault, Dimension mart. Presentation: BI mart (Exasol, Tableau), CRM mart (CRM). Other labels: ETL, Anonymization, Role based access, Whitelisting]


Atlas & Ranger on Exasol

• Transfer rules and tags to Exasol
• Use virtual schemas to apply them
• Reduce amount of data in Exasol
• Lower license cost
• Single source of truth of access policies


Experiences of Atlas and Ranger

• Simple and easy model
• Limited performance penalty
• Tag on table with masking rule => all columns masked
• Lots of moving pieces
• API documentation is hard to understand
• Restriction on Ranger row based filtering (not on tags)
• Row based filtering and masking do not apply to direct file access


Reached Goals

• Our customers' and partners' integrity is protected
• Users only have access to data intended for the current purpose
• Keep doing our required processing
• Adaptable to new requirements
• Maintainable solution


Conclusions

• Goals reached
• No SQL changes
• Scales when new datasets are added
• Our data model is guaranteed to be in sync
• Better comments in Hive
• Minimal impact on ETL developers' workflow


Takeaways

• Make it as simple as possible
• Automate
• Know your tool
• Be clear on your authorization model
• Know your data


Resources

• cobra-policytool on GitHub: https://github.com/SvenskaSpel/cobra-policytool


[email protected], @MRunesson

Thank you!


karriar.svenskaspel.se


BONUS - How everything is connected
