DATA (SCIENCE) GOVERNANCE.

DATA SCIENCE IN BANKING, 23-5-2015, BRUSSELS DATA SCIENCE COMMUNITY.

Bart Hamers, be.linkedin.com/in/hamersbart


DATA SCIENCE IN BANKING

Marketing • Customer segmentation • LTV • Cross & upselling • Churn

Risk Management • Credit Risk • Market Risk • Operational Risk

Markets • Pricing • Trading • High Frequency Trading

Security & Fraud • Intrusion detection • Anti Money Laundering • Rogue Trading

BANKING: RULES, RULES AND MORE RULES

(Word cloud of the regulatory texts; prominent terms include risk, bank, data, reporting, aggregation, management, principles and supervisors.)

• Basel 3 • CRD IV • Solvency II • BCBS 239 • …

The regulatory texts also influence all aspects of data science modeling.

HOW SHOULD WE DEAL WITH THIS?

All data science initiatives produce new information and data.

With data science, data becomes even more of a company asset.

All ‘traditional’ principles of data quality management and data governance remain applicable.

PRINCIPLES OF DATA (SCIENCE) QUALITY?

(Diagram: the Time dimension covers Volatility, Timeliness and Recency; the Consistency dimension covers Inter-relational and Intra-relational.)

• Time: the time dimension of the data science results.
   - Volatility: the frequency with which the data vary over time and models need to be refreshed.
   - Timeliness: how current the models are for the task at hand.
   - Recency: how promptly the data science results are updated (outdated information).
• Accuracy: the closeness between a real-life phenomenon and its representation.
• Validity: the semantic meaning of the data science results. Do the results follow the business logic?
• Comprehensiveness: the ability of the user to correctly interpret the data science results.
   - Metadata: is there a formal description of the data science results with respect to technical, operational and business information?
   - Can the data science results easily be understood by non-technical users?
• Consistency: captures inconsistencies between similar data attributes.
   - Inter-relational: captures violations or conflicts between data science results on the same data.
   - Intra-relational: captures the risk of too limited a view on the subject (e.g. only cross-selling, with no churn or LTV view).
• Completeness: the degree to which concepts are not missing.
   - Can we, and do we, cover the full client portfolio?
• Operational Risk: are the data secured against human and IT errors?
   - Human aspects: ad hoc human manipulation, regulations not being followed, and hierarchical access levels.
   - IT aspects: unrealistic implementation.
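Several of these dimensions can be turned into simple automated checks on a table of model results. The sketch below is illustrative only: the `ScoredClient` record, the `quality_report` function and the 30-day refresh window are hypothetical, and treating a score outside [0, 1] as invalid is an assumed business rule. It measures completeness against the full client portfolio, flags expired results (recency) and flags scores that break the business logic (validity).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass
class ScoredClient:
    """Hypothetical record: one model score per client."""
    client_id: str
    score: Optional[float]  # None means the model produced no score
    scored_on: date


def quality_report(results: List[ScoredClient], portfolio: List[str],
                   today: date, max_age_days: int = 30) -> Dict[str, object]:
    """Simple completeness, recency and validity indicators.

    max_age_days is an assumed refresh window, not a regulatory value.
    """
    scored_ids = {r.client_id for r in results if r.score is not None}
    # Completeness: share of the full client portfolio that received a score.
    completeness = len(scored_ids & set(portfolio)) / len(portfolio)
    # Recency: results older than the refresh window are treated as expired.
    cutoff = today - timedelta(days=max_age_days)
    expired = sorted(r.client_id for r in results if r.scored_on < cutoff)
    # Validity (assumed business rule): a probability-type score lies in [0, 1].
    invalid = sorted(r.client_id for r in results
                     if r.score is not None and not 0.0 <= r.score <= 1.0)
    return {"completeness": completeness, "expired": expired, "invalid": invalid}


if __name__ == "__main__":
    # Toy data, made up for illustration.
    portfolio = ["c1", "c2", "c3", "c4"]
    results = [
        ScoredClient("c1", 0.12, date(2015, 3, 1)),
        ScoredClient("c2", 1.30, date(2015, 5, 20)),
        ScoredClient("c3", None, date(2015, 5, 20)),
    ]
    # Prints {'completeness': 0.5, 'expired': ['c1'], 'invalid': ['c2']}
    print(quality_report(results, portfolio, today=date(2015, 5, 23)))
```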

MY 6 PRINCIPLES OF DATA (SCIENCE) GOVERNANCE

1.  Data science should focus on the end-user’s needs.

2.  Data science should be well managed: it should be transparent who has the authority to create, modify, delete, use and control the data science initiatives.

3.  The data science results should be trustworthy.

4.  All data science should be easily available for the end-users

5.  Data science should be fit-for-purpose.

6.  Data science initiatives should be globally managed in order to be lean, agile and forward-looking.


1. Data science initiatives should focus on the end-user’s needs.

• What is the business problem we are trying to solve?
• Will the data science solution provide a measurable improvement, and how will this be evaluated?
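One way to make the "measurable improvement" concrete, assumed here rather than prescribed by the slides, is to compare a model-driven campaign with a randomly selected control group. The hypothetical `evaluate_uplift` function below returns the absolute uplift in response rate together with a one-sided two-proportion z-test (normal approximation); the counts in the usage line are made up.

```python
import math
from typing import Tuple


def evaluate_uplift(treated_hits: int, treated_n: int,
                    control_hits: int, control_n: int) -> Tuple[float, float, float]:
    """Compare a model-selected group with a random control group.

    Returns (absolute uplift, z statistic, one-sided p-value) using a
    two-proportion z-test with a pooled standard error.
    """
    p_treated = treated_hits / treated_n
    p_control = control_hits / control_n
    pooled = (treated_hits + control_hits) / (treated_n + control_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / treated_n + 1 / control_n))
    z = (p_treated - p_control) / se
    # One-sided p-value: chance of a z this large if there were no real uplift.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_treated - p_control, z, p_value


# Made-up campaign: 120/1000 responders with the model, 80/1000 in the control group.
uplift, z, p = evaluate_uplift(120, 1000, 80, 1000)  # uplift ≈ 0.04, i.e. 4 percentage points
print(f"uplift={uplift:.3f}, z={z:.2f}, p={p:.4f}")
```

Agreeing on the control group and the test before the project starts is what makes the improvement measurable afterwards.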


2. Data science should be well managed: it should be transparent who has the authority to create, modify, delete, use and control the data science initiatives.

• Apply data governance principles to data science in order to create policies and instill trust.
• Ownership, stewardship, end-users,…
• Ownership is on the business side!

• Write guidelines on who may use the data science results and how, without unduly constraining their use.
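Ownership and authority can be recorded explicitly, for example in a model inventory. The sketch below is a hypothetical minimal inventory entry; the `ModelRecord` class, the role names and the choice of which role may do what are illustrative assumptions, not a prescribed policy. In line with the slide, ownership sits with a business owner, and anything not explicitly granted is denied.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Action(Enum):
    CREATE = "create"
    MODIFY = "modify"
    DELETE = "delete"
    USE = "use"
    CONTROL = "control"


@dataclass
class ModelRecord:
    """Hypothetical inventory entry for one data science initiative."""
    name: str
    business_owner: str  # ownership sits on the business side
    data_steward: str
    permissions: Dict[Action, Set[str]] = field(default_factory=dict)

    def is_allowed(self, role: str, action: Action) -> bool:
        """True only if `role` is explicitly granted `action`."""
        return role in self.permissions.get(action, set())


churn_model = ModelRecord(
    name="retail_churn",
    business_owner="head_of_retail_marketing",
    data_steward="marketing_analytics",
    permissions={
        Action.CREATE: {"data_scientist"},
        Action.MODIFY: {"data_scientist"},
        Action.DELETE: {"head_of_retail_marketing"},
        Action.USE: {"marketing", "sales", "risk"},  # broad use, few constraints
        Action.CONTROL: {"model_validation"},
    },
)

print(churn_model.is_allowed("sales", Action.USE))     # True
print(churn_model.is_allowed("sales", Action.MODIFY))  # False
```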


3. The results of data science should be trustworthy.

• Guarantee the quality of the data used by the models.
• More (big) data is not a solution for bad-quality data.

• Test and backtest the results of your models frequently.
• Test your results on accuracy, precision and stability.
• Evaluate the results both quantitatively and qualitatively.
• Take into account the time dimension and the expiration date of the results.
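Backtesting can be automated per period. The sketch below assumes binary predictions (1 = event predicted) and computes accuracy and precision for each backtest period, plus the Population Stability Index (PSI), a stability measure widely used in credit risk, to compare a recent score distribution with the development sample. The function names, period labels and bin edges are hypothetical; a PSI above roughly 0.25 is often read as a significant shift, but that threshold is a rule of thumb, not a regulatory value.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple


def accuracy_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and precision for binary labels (1 = event, 0 = no event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Precision is undefined when the model predicted no events at all.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return accuracy, precision


def backtest(periods: Dict[str, Tuple[Sequence[int], Sequence[int]]]) -> Dict[str, Tuple[float, float]]:
    """Apply accuracy_precision to every period (label -> (y_true, y_pred))."""
    return {label: accuracy_precision(t, p) for label, (t, p) in periods.items()}


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a development and a recent score sample.

    `edges` are sorted inner bin boundaries; `eps` avoids log(0) for empty bins.
    """
    def proportions(scores: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for s in scores:
            counts[bisect.bisect_right(edges, s)] += 1
        return [max(c / len(scores), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


if __name__ == "__main__":
    # Toy backtest data, made up for illustration.
    print(backtest({"2015Q1": ([1, 0, 1, 0], [1, 1, 1, 0])}))
    dev_scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
    recent_scores = [0.6, 0.7, 0.8, 0.85, 0.9, 0.95]
    print(psi(dev_scores, recent_scores, edges=[0.5]))
```

Storing each period's figures alongside the model also documents when a result started to drift, which supports the expiration-date point above.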


4. All data science results should be easily available for the end-users

• Data science should not be something magical for the happy few.

• A data-driven company is only created by sharing data science results at all levels of the company, for example:
   - Marketing predictions
   - Sales predictions
   - Risk and finance forecasting
   - Business process optimization


5. Data science should be fit-for-purpose.

•  Never forget Occam’s razor!

•  Be aware of the risk of over-fitting!
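A simple way to apply Occam's razor while guarding against over-fitting is to compare training and validation error for models of increasing complexity and keep the simplest model whose validation error is close to the best one. The sketch below uses polynomial degree as the complexity knob and NumPy's `polyfit`/`polyval`; the `select_degree` helper, the synthetic linear data, the interleaved train/validation split and the 5% tolerance are assumptions for illustration.

```python
import numpy as np


def select_degree(x_train, y_train, x_val, y_val, max_degree=8, tolerance=0.05):
    """Return the simplest polynomial degree whose validation MSE is within
    `tolerance` (relative) of the best validation MSE, plus the error table."""
    table = []
    for degree in range(max_degree + 1):
        coefs = np.polyfit(x_train, y_train, degree)
        train_mse = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
        val_mse = float(np.mean((np.polyval(coefs, x_val) - y_val) ** 2))
        table.append((degree, train_mse, val_mse))
    best_val = min(v for _, _, v in table)
    # Occam's razor: among near-best models, keep the least complex one.
    chosen = next(d for d, _, v in table if v <= best_val * (1 + tolerance))
    return chosen, table


# Synthetic data that is truly linear plus noise (an assumption for the demo).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=x.size)
train, val = slice(0, None, 2), slice(1, None, 2)  # interleaved split

degree, table = select_degree(x[train], y[train], x[val], y[val])
for d, train_mse, val_mse in table:
    print(f"degree {d}: train MSE {train_mse:.4f}, validation MSE {val_mse:.4f}")
print("chosen degree:", degree)
```

Training error can only fall as the degree grows, while validation error typically stops improving; judging models on the training error alone is exactly the over-fitting trap the slide warns about.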


6. All data science initiatives should be globally managed in order to be lean, agile and forward-looking.

• Do not create data science silos.
• Share your experience, systems, methodologies and data.
• Create data sandboxes.
• Define a forward-looking data strategy linked to your business plan (data is not collected overnight).