Building Credit Infrastructure with Anaconda

Developing & Deploying Credit Risk Models with Anaconda | AnacondaCON 2017

Page 1

Building Credit Infrastructure with Anaconda

Page 2

Introduction

Hussain Sultan / Data Scientist @ Capital One

Page 3

Most data-driven analysis is simple

1. Consume
• Historical performance
• Relational data
• SQL data sources

2. Analyze
• Predictive models
• Parameterized model scoring
• Iterative process

3. Act
• Decisions/Rules
• Facilitates experimentation
• Recommendations

Page 4

And the trick is to strike the right balance between technology and people

Data Science + Business Analysis +
• Open Source
• Extensible
• Easy to maintain

Page 5

Our challenge is to provide efficient data access and model scoring, along with extensible tooling

Data Retrieval
• Medium size data
• Ad-hoc access patterns
• Fast retrieval

Model Scoring
• Complex hierarchy of models
• Lots of model adjustment knobs

Ad-hoc Analysis
• SQL-like analysis
• Aggregations
• KPIs

Page 6

We decided to build a Python package that business analysts could interact with to consume analytics

Page 7

… and consumed by analysts in Jupyter notebooks

Business Analysts and SMEs

Analysts interact via a custom Python package

Data Scientists

Jupyter Notebook Server/AEN

EC2 Instances (AWS)

Conda Repository

CI/CD

GitHub

Page 8

Navigating to an appropriate segmentation

The API tree is a source of important metadata about segmentations

Pre-defined segmentation

Mechanism to register new segmentations
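
The registration mechanism on this slide could look roughly like the sketch below. The talk does not show the package's code, so every name here (`register_segmentation`, `list_segmentations`, `apply_segmentation`, the dotted paths, and the row fields) is hypothetical; the sketch only illustrates keeping pre-defined segmentations and their metadata in a browsable tree.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a dotted path in the API tree to a segmentation
# (a description plus a row-level filter). Not the package's actual design.
_SEGMENTATIONS: Dict[str, dict] = {}


def register_segmentation(path: str, description: str) -> Callable:
    """Decorator that registers a filter function under a dotted path."""
    def decorator(func: Callable) -> Callable:
        if path in _SEGMENTATIONS:
            raise ValueError(f"segmentation already registered: {path}")
        _SEGMENTATIONS[path] = {"description": description, "filter": func}
        return func
    return decorator


def list_segmentations(prefix: str = "") -> List[str]:
    """Return registered paths under a prefix, so analysts can browse the tree."""
    return sorted(p for p in _SEGMENTATIONS if p.startswith(prefix))


def describe(path: str) -> str:
    """Return the metadata attached to a segmentation."""
    return _SEGMENTATIONS[path]["description"]


def apply_segmentation(path: str, rows: List[dict]) -> List[dict]:
    """Keep only the rows that belong to the named segmentation."""
    keep = _SEGMENTATIONS[path]["filter"]
    return [row for row in rows if keep(row)]


# Two illustrative pre-defined segmentations (field names are assumptions).
@register_segmentation("cards.new_accounts", "Accounts opened in the last 12 months")
def _new_accounts(row: dict) -> bool:
    return row["months_on_book"] <= 12


@register_segmentation("cards.high_utilization", "Utilization above 80%")
def _high_utilization(row: dict) -> bool:
    return row["utilization"] > 0.8
```

An analyst would then call `list_segmentations("cards.")` to discover what exists and `apply_segmentation(...)` to use one, while a data scientist adds a new segmentation by decorating one more function.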

Page 9

We use Blaze as the mechanism to retrieve known segmentations of the data

Blaze

Backends: Postgres, Redshift, S3

To: Pandas
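
The retrieval pattern on this slide, a known segmentation name resolved to a query against a backend, can be sketched without Blaze. The example below is a stand-in that uses Python's built-in `sqlite3` in place of Postgres or Redshift; the table, column names, and `SEGMENT_QUERIES` mapping are assumptions made for illustration, not the talk's actual schema or API.

```python
import sqlite3

# Hypothetical mapping from a known segmentation name to the SQL that selects it.
SEGMENT_QUERIES = {
    "new_accounts": (
        "SELECT account_id, balance FROM accounts "
        "WHERE months_on_book <= 12 ORDER BY account_id"
    ),
    "all_accounts": "SELECT account_id, balance FROM accounts ORDER BY account_id",
}


def fetch_segment(conn: sqlite3.Connection, name: str) -> list:
    """Run the query for a known segmentation and return the rows as dicts."""
    if name not in SEGMENT_QUERIES:
        raise KeyError(f"unknown segmentation: {name}")
    cursor = conn.execute(SEGMENT_QUERIES[name])
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory table that stands in for a real backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (account_id INTEGER, balance REAL, months_on_book INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, 1200.0, 6), (2, 300.0, 30), (3, 50.0, 11)],
    )
    return conn
```

With pandas available, the same query could be handed to `pandas.read_sql_query(sql, conn)` to get a DataFrame back, which matches the "To: Pandas" step in the diagram.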

Page 10

Model scoring

Model scoring API - Configurable parameters
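
A scoring API with configurable parameters might be shaped like the sketch below. The model form (a toy logistic score), the `ScoringParams` fields, and the function name are all hypothetical; the point is only that the adjustment knobs live in one explicit, immutable object that an analyst can override per run.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical adjustment knobs for a toy default-probability model."""
    intercept: float = -2.0
    utilization_weight: float = 3.0
    stress_multiplier: float = 1.0  # scales the output for what-if scenarios


def score_default_probability(utilization: float,
                              params: ScoringParams = ScoringParams()) -> float:
    """Toy logistic score; the result is capped at 1.0 after the stress adjustment."""
    logit = params.intercept + params.utilization_weight * utilization
    probability = 1.0 / (1.0 + math.exp(-logit))
    return min(1.0, probability * params.stress_multiplier)


# Usage: score once with defaults, then again with a single knob changed.
baseline = score_default_probability(0.5)
stressed = score_default_probability(
    0.5, replace(ScoringParams(), stress_multiplier=1.5)
)
```

Using `dataclasses.replace` keeps the defaults untouched, so two analysts experimenting with different knobs do not affect each other's runs.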

Page 11

We use Dask to capture the interdependencies between models and score them in an efficient order

Custom Dask Graph

Compute

Pandas
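
To make the "custom graph" idea concrete, here is a minimal sketch of a Dask-style task graph: a plain dict whose values are either data or tuples of the form `(function, *argument_keys)`. The three models and their formulas are invented for illustration, and the small `get` function below is a stand-in evaluator written with the standard library, not Dask's scheduler.

```python
def _is_key(graph: dict, value) -> bool:
    """A string argument that names an entry in the graph is a dependency."""
    return isinstance(value, str) and value in graph


def get(graph: dict, key: str, cache: dict = None):
    """Evaluate `key`, computing each dependency at most once."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [get(graph, a, cache) if _is_key(graph, a) else a for a in args]
        result = func(*resolved)
    else:
        result = task  # plain data, not a task
    cache[key] = result
    return result


# Hypothetical models: expected loss depends on two upstream model scores.
def pd_model(utilization: float) -> float:
    return min(1.0, 0.05 + 0.4 * utilization)


def lgd_model(balance: float) -> float:
    return 0.6 if balance > 1000 else 0.8


def expected_loss(pd_score: float, lgd_score: float, balance: float) -> float:
    return pd_score * lgd_score * balance


graph = {
    "utilization": 0.5,
    "balance": 2000.0,
    "pd": (pd_model, "utilization"),
    "lgd": (lgd_model, "balance"),
    "expected_loss": (expected_loss, "pd", "lgd", "balance"),
}

loss = get(graph, "expected_loss")
```

If Dask is installed, a dict in this form is the kind of graph its schedulers accept (for example `dask.threaded.get(graph, "expected_loss")`), which can additionally run independent models such as `pd` and `lgd` in parallel.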

Page 12

All of our data is returned as a pandas dataframe

Bring your own tool or create a custom workflow for analysis

Page 13

With Jupyter notebooks, analysts can do most medium-sized data analysis in a remote kernel

Jupyter Notebooks (AEN)

Conda Env

Some code

Conda Repository

Execute

Page 14

Thank you