Hassle-Free Data Science Apps with Bokeh

Hassle Free Data Science Apps with Bokeh Webinar




Presenters

Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh.

He has been developing commercial scientific computing and visualization software for over 15 years.

As a creator of the PyData conference, he devotes time and energy to growing the Python data community and to advocating and teaching Python at conferences worldwide.

Bryan Van de Ven is the lead developer on the Bokeh project.

He holds an undergraduate degree in Computer Science & Mathematics from UT Austin and a Master's degree in Physics from UCLA.

Previously Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.


Overview

• What is Bokeh?

• Overview and tour of major features

• Demo 1: Scikit-learn clustering

• Demo 2: Gapminder

• Demo 3: Streaming data

• Really big data: Preview of data shading

• Q&A


Overview of Anaconda


Anaconda is the modern open source analytics platform powered by Python, the fastest growing open data science language:

• Easy to Build, Maintain & Deploy Analytics

• Talks with Everything, Runs Anywhere

• High Performance, Scalable Analytics


Anaconda: Accelerating Adoption of Python for Enterprises

• Collaborative notebooks with publication, authentication, & search: Jupyter/IPython

• Python & package management for Hadoop & the Apache stack: Spark

• Performance with compiled Python for lightning-fast execution: Numba

• Visual apps for interactivity, streaming, & Big Data: Bokeh

• Secure & robust repository of data science libraries, scripts, & notebooks: Conda

• Enterprise data integration with optimized connectors & out-of-core processing: NumPy & Pandas


Anaconda for Data Science: Empowering Everyone on the Team

Data Scientist
• Advanced analytics with Python & R
• Simplified library management
• Easily share data science notebooks & packages

Developer
• Support for common APIs & data formats
• Common language with data scientists
• Python extensibility with C, C++, etc.

Business Analyst
• Collaborative interactive analytics with notebooks
• Rich browser-based visualizations
• Powerful MS Excel integration

Data Engineer
• Powerful & efficient libraries for data transformations
• Robust processing for noisy, dirty data
• Support for common APIs & data formats

Ops
• Validated source of up-to-date packages, including indemnification
• Agile enterprise package management
• Supported across platforms

Computational Scientist
• Rich set of advanced analytics
• Trusted & production-ready libraries for numerics
• Simplified scale-up & scale-out on clusters & GPUs


Modern Analytics Stack


Write Once, Deploy Anywhere

Managed Python
• Explore & Visualize
• Python & R Advanced Analytics
• High Performance & Scalability
• Data Engineering & Analysis
• Collaboration & Integration

Platforms and data sources
• Servers: Linux, Windows, OS X
• GPUs & high-end workstations: Linux & Windows; NVIDIA, AMD, x86/ARM
• Clusters: YARN, Mesos, MPI, Power8, LSF, Sun Grid Engine
• NoSQL: MongoDB, Cassandra/DataStax
• Hadoop: Cloudera, Hortonworks, Apache Hadoop & Spark
• Files: Microsoft Excel, Trifacta, Import.io
• DW & SQL: any SQL DB, any SQL DW, Impala


Bokeh Overview & Tour


Bokeh


http://bokeh.pydata.org

• Interactive visualization

• Novel graphics

• Streaming, dynamic, large data

• For the browser, with or without a server

• No need to write JavaScript
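The sketch below illustrates the "no JavaScript, with or without a server" point. It builds an interactive plot entirely in Python and writes it to a standalone HTML page that loads BokehJS from the CDN. It assumes a recent Bokeh release; the webinar predates today's API, and older releases used `plot_width` instead of `width`. The file name `bokeh_sketch.html` is an arbitrary choice.

```python
import numpy as np
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Sample data: a noisy sine wave
x = np.linspace(0, 4 * np.pi, 200)
y = np.sin(x) + np.random.default_rng(0).normal(0, 0.1, x.size)

# Pan/zoom/reset tools are built in; no JavaScript is written by hand
p = figure(title="Noisy sine", width=600, height=300,
           tools="pan,wheel_zoom,box_zoom,reset")
p.line(x, y, line_width=2)

# Produce a self-contained HTML page (BokehJS is loaded from the CDN)
html = file_html(p, CDN, "Bokeh sketch")
with open("bokeh_sketch.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening the file in any browser gives a fully interactive plot with no server running.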


Versatile Plots



Novel Graphics



Linked Plots (Notebook 2)

• Easy to show multiple plots and link them

• Easy to link data selections between plots

• Can easily customize the kind of linkage straight from Python, without needing to fiddle around with JS
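The two kinds of linkage above can be sketched in Python, assuming a recent Bokeh release:

- **Linked panning and zooming:** the second figure reuses the first figure's `x_range`.
- **Linked selection:** both glyphs draw from one shared `ColumnDataSource`, so a box or lasso selection in either plot highlights the same rows in the other.

The data here is synthetic.

```python
import numpy as np
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)

# One shared source: selections made in either plot apply to both
source = ColumnDataSource(data=dict(x=x, y1=np.sin(x), y2=np.cos(x)))

tools = "pan,wheel_zoom,box_select,lasso_select,reset"
left = figure(width=350, height=350, title="sin(x)", tools=tools)
left.scatter("x", "y1", source=source, size=6)

# Sharing the range object links panning/zooming along x
right = figure(width=350, height=350, title="cos(x)", tools=tools,
               x_range=left.x_range)
right.scatter("x", "y2", source=source, size=6, color="firebrick")

grid = gridplot([[left, right]])
# To view: from bokeh.plotting import show; show(grid)
```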


Flexible Tools (Notebook 3)

• Many useful tools with built-in functionality

• Easy to extend with JavaScript, if so inclined
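The sketch below shows both sides of that point, assuming a recent Bokeh release:

- **Built-in tools:** enabled by name, plus a `HoverTool` whose tooltips read from data columns.
- **JavaScript extension:** a `TapTool` with a small `CustomJS` callback. The callback body is a hypothetical example that just logs the tapped point's label to the browser console.

```python
from bokeh.models import ColumnDataSource, CustomJS, HoverTool, TapTool
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(x=[1, 2, 3, 4], y=[4, 1, 3, 2],
                                    label=["a", "b", "c", "d"]))

# Built-in tools are enabled by name
p = figure(width=400, height=300, tools="pan,wheel_zoom,reset")
p.scatter("x", "y", source=source, size=15)

# Tooltips pull values from the source columns with the @column syntax
hover = HoverTool(tooltips=[("label", "@label"), ("(x, y)", "(@x, @y)")])

# Optional JavaScript: runs in the browser when a point is tapped
tap = TapTool(callback=CustomJS(args=dict(source=source), code="""
    const idx = source.selected.indices;
    if (idx.length > 0) {
        console.log("tapped:", source.data["label"][idx[0]]);
    }
"""))

p.add_tools(hover, tap)
```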


rBokeh

http://hafen.github.io/rbokeh

Plays well with the R ecosystem: htmlwidgets, R Markdown…


rBokeh with RStudio & Shiny


Architecture


Traditional Web Visualization

Diagram: data (CSV, SQL) → server-side data processing (Python, Java, etc.) → HTML, CSS, and JavaScript → a JavaScript plotting library (D3, Highcharts, Flot, nvd3, dc.js) in the browser.

Tech:
• Python/R/Java
• HTML & browser compatibility
• CSS/LESS/Sass
• JS plotting library API
• JavaScript
• jQuery, underscore
• SVG, canvas2D
• WebGL, three.js
• React
• Angular
• node.js, browserify, gulp, grunt, npm, …


Traditional Web Viz: Interaction

Diagram: the user works in a browser running HTML, CSS, and JavaScript; a server (Python, Ruby, Java, .NET) holds the data and sends updated data (Data’) back to the browser.

Simple dashboard: a server-side language generates the HTML, the JavaScript, the CSS styling, and a subset of the data.

Handling user interaction: custom JavaScript calls a server endpoint, which generates updated JSON or JavaScript that is pushed back to the client via a websocket.


Bokeh Conceptual Architecture

Diagram: on the server, Bokeh (driven from Python, R, or Scala) works with the data and sends JSON to the client, where BokehJS renders it (producing the HTML and CSS) for the user.

Simple dashboard: a single language, with no need to write HTML, JavaScript, or CSS.

Handling user interaction: a single language that you already know; interactive data updates feel seamless to the user.
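The JSON hop in this architecture can be made visible with `bokeh.embed.json_item`, which serializes a Python-side plot into the JSON document that BokehJS rebuilds in the browser. This is a sketch assuming a recent Bokeh release. The target element id `plot-div` is a hypothetical name for a `<div>` on the host page, where BokehJS's `Bokeh.embed.embed_item` would render the item.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# The plot is defined purely as Python objects
p = figure(width=300, height=300, title="Serialized plot")
p.scatter([1, 2, 3], [3, 1, 2], size=12)

# Convert the object graph to JSON; BokehJS reconstructs and renders it client-side
item = json_item(p, target="plot-div")
payload = json.dumps(item)
print(payload[:200])
```

No HTML, CSS, or JavaScript is written by hand; a web framework would only need to return `payload` to the page.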


Comparison Chart

Traditional web visualization (server in Python, Ruby, Java, or .NET; HTML, CSS, and JavaScript in the client):
• Skills required: 5-10 skills
• Time to market: weeks to months
• Server code: 100s to 1000s of lines

Bokeh (Python or R with Bokeh; BokehJS in the client):
• Skills required: ~1 skill
• Time to market: minutes
• Server code: 0


Some Bokeh Users


Community & Adoption

• GitHub: 3500+ watchers, 680 forks

• Mailing list: 400+ members, 150+ posts in November

• Downloads: 21,500/month (conda), 10,000/month (pip)


Embeds Well

http://cecp.mit.edu


Demo: Clustering with Scikit-learn


Demo Overview

In this demo, we will build a basic application which lets us visualize different kinds of clustering approaches with Scikit-learn.

• We will use a drop-down to select the algorithm.

• We will write a Python handler function that responds to the user action and pushes an update to the plot in the browser.

• Notebook for basic viz: ~25 LOC

• Example app with 1 drop-down: < 100 LOC

• Multiple drop-downs and sliders: < 200 LOC
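A minimal sketch of the one-drop-down app described above, assuming a recent Bokeh release and scikit-learn. This is not the webinar's actual notebook. The synthetic blob data, the three chosen algorithms, and the helper name `colors_for` are illustrative assumptions. The `Select` widget's `on_change` handler runs in Python on the Bokeh server, and the recolored column is synced to the browser automatically.

```python
# Run with:  bokeh serve --show cluster_app.py
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.palettes import Category10
from bokeh.plotting import figure
from sklearn import cluster, datasets

# Synthetic 2-D data with three blobs
X, _ = datasets.make_blobs(n_samples=500, centers=3, random_state=0)

# Drop-down label -> factory for a fresh (unfitted) estimator
ALGORITHMS = {
    "KMeans": lambda: cluster.KMeans(n_clusters=3, n_init=10, random_state=0),
    "Birch": lambda: cluster.Birch(n_clusters=3),
    "Agglomerative": lambda: cluster.AgglomerativeClustering(n_clusters=3),
}

def colors_for(name):
    """Fit the named algorithm and map each cluster label to a palette color."""
    labels = ALGORITHMS[name]().fit_predict(X)
    palette = Category10[10]
    return [palette[label % len(palette)] for label in labels]

source = ColumnDataSource(data=dict(x=X[:, 0], y=X[:, 1],
                                    colors=colors_for("KMeans")))

plot = figure(width=500, height=500, title="KMeans")
plot.scatter("x", "y", color="colors", source=source, size=6)

select = Select(title="Algorithm", value="KMeans", options=list(ALGORITHMS))

def update(attr, old, new):
    # Python handler: recluster with the chosen algorithm and recolor the points
    source.data["colors"] = colors_for(new)
    plot.title.text = new

select.on_change("value", update)
curdoc().add_root(column(select, plot))
```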


Demo: Gapminder


Demo Overview

This demo shows how embedding a little JavaScript lets us build a very capable interactive visualization that needs no server.

• We will build the visualization from the ground up, showing different kinds of Bokeh plotting primitives

• We will do it inside the Jupyter Notebook, so we can see our changes immediately

• Then we will wire up an interactive slider

The resulting interactive visualization runs entirely in the browser, with no reliance on a server to handle user interactions.
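A server-less slider in the spirit of the Gapminder demo can be sketched as follows, assuming a recent Bokeh release:

- **Data shipped up front:** every year's data goes to the page in one `ColumnDataSource`.
- **Client-side filtering:** a `CustomJS` callback on the `Slider` filters that table in the browser, so moving the slider never contacts a server.

The data is synthetic (random "countries" drifting over time) and is not the real Gapminder dataset.

```python
import numpy as np
from bokeh.embed import file_html
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure
from bokeh.resources import CDN

# Synthetic stand-in data: 20 "countries" over 11 years
rng = np.random.default_rng(0)
years = list(range(2000, 2011))
n = 20
base_x = rng.uniform(1, 10, n)
base_y = rng.uniform(40, 60, n)
rows = {"year": [], "x": [], "y": []}
for i, year in enumerate(years):
    rows["year"].extend([year] * n)
    rows["x"].extend((base_x * (1 + 0.05 * i)).tolist())
    rows["y"].extend((base_y + 1.5 * i).tolist())

full = ColumnDataSource(data=rows)  # all years, embedded in the page
first = [k for k, yr in enumerate(rows["year"]) if yr == years[0]]
shown = ColumnDataSource(data=dict(x=[rows["x"][k] for k in first],
                                   y=[rows["y"][k] for k in first]))

p = figure(width=500, height=400, title="Gapminder-style sketch (synthetic data)")
p.scatter("x", "y", source=shown, size=10, alpha=0.6)

slider = Slider(start=years[0], end=years[-1], value=years[0], step=1, title="Year")

# Runs in the browser: keep only the rows for the selected year
slider.js_on_change("value", CustomJS(args=dict(full=full, shown=shown), code="""
    const year = cb_obj.value;
    const d = full.data;
    const x = [], y = [];
    for (let i = 0; i < d.year.length; i++) {
        if (d.year[i] == year) { x.push(d.x[i]); y.push(d.y[i]); }
    }
    shown.data = {x: x, y: y};
"""))

html = file_html(column(slider, p), CDN, "Gapminder-style sketch")
```

Writing `html` to a file gives a page that stays interactive when opened directly from disk.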


Demo: Animation & Streaming example


Demo Overview

In this demo, we show how the Bokeh server makes it easy to visualize streaming and dynamic data.

• A minimal example with < 50 LOC

• Demonstrates how easy it is to push data from Python code into the browser
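A minimal streaming app in this spirit might look like the sketch below, assuming a recent Bokeh release. It is not the demo's actual code, and the noisy sine wave stands in for a real data feed.

- **Periodic callback:** `add_periodic_callback` runs a plain Python function every 100 ms on the server.
- **Incremental updates:** `ColumnDataSource.stream` appends only the new rows.
- **Bounded history:** `rollover=200` caps the plotted history at 200 points.

```python
# Run with:  bokeh serve --show stream_app.py
import math
import random

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(t=[], value=[]))

p = figure(width=600, height=300, title="Streaming sketch")
p.line("t", "value", source=source)

state = {"t": 0}

def update():
    # Produce one new sample in ordinary Python code
    state["t"] += 1
    t = state["t"]
    new = dict(t=[t], value=[math.sin(t / 10) + random.gauss(0, 0.1)])
    # Append the new row; keep only the most recent 200 points
    source.stream(new, rollover=200)

doc = curdoc()
doc.add_root(p)
doc.add_periodic_callback(update, 100)  # call update() every 100 ms
```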


• Real-time audio sampling via PyAudio, real-time FFT via NumPy

• 30 fps

• ~200 lines of code
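The FFT half of that pipeline can be sketched with NumPy alone. The frame below is a synthetic 440 Hz tone standing in for one buffer read from PyAudio; in the real app, raw bytes would first be converted to a NumPy array. The 44.1 kHz rate and 1024-sample frame size are assumed values, not taken from the demo. A Hann window is applied before `np.fft.rfft` to reduce spectral leakage.

```python
import numpy as np

RATE = 44100   # samples per second (assumed)
FRAME = 1024   # samples per audio buffer (assumed)

def spectrum(frame, rate=RATE):
    """Return (frequencies in Hz, magnitudes) for one frame of real-valued audio."""
    windowed = frame * np.hanning(len(frame))  # taper edges to reduce leakage
    mags = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    return freqs, mags

# Stand-in for one PyAudio buffer: a 440 Hz tone plus a little noise
t = np.arange(FRAME) / RATE
frame = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.default_rng(0).normal(size=FRAME)

freqs, mags = spectrum(frame)
peak_hz = freqs[np.argmax(mags)]
```

Each new frame's `freqs`/`mags` pair could then be pushed into a `ColumnDataSource`, as in a streaming app, to redraw the spectrum.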


Bokeh: Progress and Future


Visualizing Big Data: Preview of “Data Shading”


Billions and billions…


Data Shading Main Points

• When trying to visualize millions of points, whether you render in a browser or a rich client doesn’t really matter

• A raft of common problems tend to be ignored: overdraw, over- and under-saturation, clipping, coarse binning

• Statistical transformations of the data are a first-class aspect of the visualization

• Rapid iteration of visual styles and configurations, interactive selection, and filtering are key concerns in data exploration

When data is large, you don’t know when the viz is lying.
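A small NumPy sketch (not Bokeh or datashader code) makes the overdraw problem concrete. It maps one million synthetic points onto a hypothetical 400×400-pixel canvas and counts how many land on an already-occupied pixel. A marker drawn at the densest pixel looks the same as one covering a single point, so the hidden points never show up in the picture.

```python
import numpy as np

def overdraw_stats(x, y, width=400, height=400):
    """Bin points to a width x height pixel grid.

    Returns (occupied_pixels, max_points_in_one_pixel, hidden_points), where
    hidden_points counts points that fall on an already-occupied pixel.
    """
    # Map data coordinates linearly onto integer pixel indices
    px = ((x - x.min()) / (x.max() - x.min()) * (width - 1)).astype(int)
    py = ((y - y.min()) / (y.max() - y.min()) * (height - 1)).astype(int)
    _, counts = np.unique(py * width + px, return_counts=True)
    return counts.size, counts.max(), len(x) - counts.size

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 1_000_000))  # one million synthetic points
occupied, densest, hidden = overdraw_stats(x, y)
```

With these settings, at most 160,000 pixels can be lit, so the large majority of the points are necessarily drawn on top of others.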


Data Shading Pipeline

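Diagram: data shading pipeline. Data → Project / Synthesize → Scene → Sample / Raster → Aggregates → Transfer → Image. Also shown: a visualization reference model (Source Data, Data Tables, Visual Abstraction, Views, linked by Data Transforms, Visual Mappings, View Transforms) and the operations Selection, Aggregation, Transfer with their outputs (Significant Set, Aggregates).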
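The core stages of the pipeline can be sketched in plain NumPy. This is an illustration of the idea, not datashader's or Bokeh's API; the function names `project`, `aggregate`, and `transfer` are hypothetical.

- **Project:** produce point positions (synthetic data here).
- **Aggregate:** count points per pixel with `np.histogram2d`. Points outside the view range are dropped.
- **Transfer:** map counts to 0-255 intensities on a log scale. Log scaling is one possible choice (histogram equalization is another); it keeps sparse regions visible instead of letting the densest pixel saturate the color scale.

```python
import numpy as np

def project(n, rng):
    """Project / synthesize: x, y positions in data space (two synthetic clusters)."""
    a = rng.normal(loc=(-1.0, 0.0), scale=0.3, size=(n // 2, 2))
    b = rng.normal(loc=(1.0, 0.0), scale=1.0, size=(n - n // 2, 2))
    pts = np.vstack([a, b])
    return pts[:, 0], pts[:, 1]

def aggregate(x, y, width, height, x_range, y_range):
    """Aggregate: count points per pixel; result has shape (height, width)."""
    counts, _, _ = np.histogram2d(y, x, bins=(height, width),
                                  range=(y_range, x_range))
    return counts

def transfer(counts):
    """Transfer: log-scale counts into 0-255 intensities."""
    scaled = np.log1p(counts)
    if scaled.max() > 0:
        scaled = scaled / scaled.max()
    return (scaled * 255).astype(np.uint8)

rng = np.random.default_rng(0)
x, y = project(500_000, rng)
agg = aggregate(x, y, width=300, height=200, x_range=(-4, 4), y_range=(-3, 3))
image = transfer(agg)
```

Because the aggregates are kept separate from the image, restyling only requires rerunning the cheap `transfer` step. That fits the rapid iteration on visual styles emphasized above.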


Anaconda Subscriptions and Resources


Anaconda Subscriptions

• Anaconda: open source modern analytics platform powered by Python. Community support. Free forever.

• Anaconda Pro: Anaconda with support & indemnification. Priority 1 support. Starting at $10,000 per year, + $1,000 per year for additional users.

• Anaconda Workgroup: Anaconda with high performance and team collaboration. Priority 1 support. Starting at $30,000 per year, + $3,000 per year for additional users.

• Anaconda Enterprise: Anaconda with scalable high performance and team collaboration. Priority 1 support with a dedicated customer support rep. Starting at $60,000 per year, + $6,000 per year for additional users.


Contact Information and Additional Details

• Contact [email protected] for more information about Anaconda subscriptions, consulting, or training

• View documentation and examples at bokeh.pydata.org

• View demo notebooks on Anaconda Cloud: notebooks.anaconda.org/pwang/


Thank you

Email: [email protected]

Twitter: @ContinuumIO

Peter Wang, Twitter: @pwang

Bokeh, Twitter: @bokehplots