Building Credit Infrastructure with Anaconda
Introduction
Hussain Sultan / Data Scientist @ Capital One
Most data-driven analysis is simple
1. Consume
   - Relational data
   - SQL data sources
   - Historical performance
2. Analyze
   - Predictive models
   - Parameterized model scoring
   - Facilitates experimentation
   - Iterative process
3. Act
   - Decisions/rules
   - Recommendations
And the trick is to strike the right balance between technology and people
Data Science + Business Analysis
- Open source
- Extensible
- Easy to maintain
Our challenge is to provide efficient data access and model scoring, along with extensible tooling
Data Retrieval
- Medium-size data
- Fast retrieval
- Ad-hoc access patterns

Model Scoring
- Complex hierarchy of models
- Lots of model adjustment knobs

Ad-hoc Analysis
- SQL-like analysis
- Aggregations
- KPIs
We decided to build a Python package that business analysts could interact with to consume analytics
… and consumed by analysts in Jupyter notebooks
Business Analysts and SMEs interact via a custom Python package built by data scientists.

Infrastructure: Jupyter Notebook Server (AEN) on EC2 instances (AWS), backed by a Conda repository, CI/CD, and GitHub.
Navigating to an appropriate segmentation
- The API tree is a source of important metadata around segmentation
- Pre-defined segmentations
- A mechanism to register new segmentations
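A minimal sketch of what such a registration mechanism might look like — all names here are hypothetical, not the actual package API:

```python
import pandas as pd

# Hypothetical segmentation registry; every name below is illustrative.
_SEGMENTATIONS = {}

def register(name, loader):
    """Register a new segmentation under a name in the API tree."""
    _SEGMENTATIONS[name] = loader

def get(name):
    """Retrieve a known segmentation as a pandas DataFrame."""
    return _SEGMENTATIONS[name]()

# A pre-defined segmentation registered at import time
register("prime", lambda: pd.DataFrame({"account": [1, 2],
                                        "balance": [100.0, 250.0]}))

df = get("prime")  # analysts retrieve by name, never by query
```

The registry pattern keeps segmentation definitions in one discoverable place, so analysts browse names rather than write SQL.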
We use Blaze as the mechanism to retrieve known segmentations of the data

Blaze backends (Postgres, Redshift, S3) → pandas
Model scoring
Model scoring API - Configurable parameters
Dask is a mechanism to wrap the inter-dependencies of models and score them in an optimal order

Custom Dask graph → compute → pandas
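A custom Dask graph encodes model inter-dependencies as a plain dict mapping keys to tasks; the scheduler resolves the ordering and reuses shared intermediate results. A toy sketch — the model functions and key names are illustrative:

```python
from dask import get  # synchronous scheduler; threaded is a drop-in swap

# Toy model stages; real ones would be parameterized scoring functions.
def score_base(x):
    return x * 2

def adjust(base):
    return base + 1

def combine(a, b):
    return a + b

# A dask graph maps keys to values or (function, *dependency_keys) tasks.
dsk = {
    "inputs": 10,
    "base": (score_base, "inputs"),
    "adjusted": (adjust, "base"),
    "final": (combine, "base", "adjusted"),
}

# dask walks the dependency graph: "base" is computed once and fed to
# both downstream tasks.
result = get(dsk, "final")
```

Because the graph is just data, a complex hierarchy of models stays declarative, and adjustment knobs become parameters baked into the tasks.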
All of our data is returned as a pandas DataFrame
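Since everything comes back as a DataFrame, the ad-hoc aggregations and KPIs from earlier are plain pandas. Column names below are made up for illustration:

```python
import pandas as pd

# Illustrative KPI aggregation on a returned DataFrame;
# the columns are hypothetical, not the real schema.
df = pd.DataFrame({
    "segment": ["A", "A", "B", "B"],
    "balance": [100.0, 200.0, 50.0, 150.0],
    "defaulted": [0, 1, 0, 0],
})

kpis = df.groupby("segment").agg(
    avg_balance=("balance", "mean"),
    default_rate=("defaulted", "mean"),
)
```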
Bring your own tool or create a custom workflow for analysis
With Jupyter notebooks, analysts can do most medium-sized data analysis in a remote kernel
Jupyter Notebooks (AEN) execute code in a conda env, with packages pulled from the Conda repository
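Provisioning such an environment might look like the following — the channel URL and package list are hypothetical, not the actual internal setup:

```shell
# Illustrative: build an analyst environment from an internal conda
# channel (URL and packages are hypothetical).
conda create -n analytics \
    -c https://conda.internal.example/channel \
    python=3.6 pandas dask blaze jupyter
conda activate analytics
```

Pinning the environment to an internal channel is what lets CI/CD publish vetted package versions for every analyst kernel.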
Thank you