A tour (of a small part) of the Python visualization landscape
PyHEP Workshop - Abingdon 2019
Philipp Rudiger (@philippjfr)
- PhD in Computational Neuroscience @ University of Edinburgh
- Left academia to switch to software engineering immediately
- Working as consultant for Anaconda Inc. for 4+ years
- Spend most of my time on open-source visualization tools

Anaconda Open Source Tools
- Anaconda employs 17+ open-source developers
- Conda (4), Pandas (2), Dask (2), Numba (4), HoloViz (4), Intake (1), scikit-learn (1)

Packaging
- Used by 18 million users per month
- Serve 1 PB of packages per month

Data intake & processing

- pyviz.org provides an overview of all Python visualization tools
- Actively working with maintainers to expand guidance and materials

HoloViz
- Tools that work well together and with the rest of the ecosystem
- Take you from exploratory workflows smoothly to interactive dashboards
- Focused on iterating within notebooks but not tied to notebooks

hvPlot
- High-level plotting interface like DataFrame.plot
- Renders Bokeh plots using HoloViews
- Supports interactivity and datashading
- Supports Pandas, Dask, Streamz, XArray, GeoPandas, NetworkX
In [2]: import dask.dataframe as dd
        import hvplot.pandas

In [3]: ddf = dd.read_parquet('/Users/philippjfr/development/datashader/examples/data/nyc_taxi_wide.parq').compute()
        print('The parquet table has %d rows.\n' % len(ddf))
        print('The table has the following columns: %s' % list(ddf.columns))

The parquet table has 11842094 rows.

The table has the following columns: ['tpep_pickup_datetime', 'tpep_dropoff_datetime', 'passenger_count', 'trip_distance', 'pickup_x', 'pickup_y', 'dropoff_x', 'dropoff_y', 'fare_amount', 'tip_amount', 'dropoff_hour', 'pickup_hour']

In [5]: ddf[(ddf.trip_distance > 0) & (ddf.trip_distance < 10)].sort_values('pickup_hour').hvplot.violin(
            'trip_distance', by='pickup_hour', c='pickup_hour', cmap='Category20')

Out[5]: [interactive violin plot of trip_distance by pickup_hour]

Datashader
- Fast 2D histogramming/binning using Numba
- Parallelized using Dask
- Support for different glyphs including points, lines, rasters, areas, meshes etc.
- Working on GPU support in collaboration with NVIDIA
In [6]: ddf.hvplot.scatter(
            'pickup_x', 'pickup_y', datashade=True, cmap='viridis',
            xaxis=False, yaxis=False)

Out[6]: [datashaded scatter plot of pickup_x against pickup_y]
In [7]: ddf.hvplot.scatter(
            'pickup_x', 'pickup_y', groupby='pickup_hour', tiles='Wikipedia',
            datashade=True, cmap='viridis', xaxis=False, yaxis=False)

Out[7]: [datashaded scatter plot of pickup locations over map tiles, with a pickup_hour widget; tiles © OpenStreetMap contributors (https://www.openstreetmap.org/copyright)]
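
The "2D histogramming/binning" step behind datashade=True can be pictured with a tiny plain-Python sketch. This is not Datashader's implementation (which is compiled with Numba and parallelized); the function name `bin_points` and the sample coordinates are made up for illustration. It only shows the idea of counting points per pixel, so that the rendered image has a fixed size however many rows the table has.

```python
def bin_points(xs, ys, x_range, y_range, width, height):
    """Count points per cell of a width x height grid (row 0 holds the lowest y)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # points outside the viewport are ignored
        # Map data coordinates to integer cell indices; clamp the upper edge
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Hypothetical sample points; the last one lies outside the viewport
xs = [0.1, 0.2, 0.9, 0.95, 0.5, 2.0]
ys = [0.1, 0.15, 0.9, 0.8, 0.5, 0.5]
counts = bin_points(xs, ys, (0.0, 1.0), (0.0, 1.0), width=2, height=2)
print(counts)
```

A colormap such as 'viridis' is then applied to the counts rather than to the individual points.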

HoloViews
- High-level visualization library focused on annotating your data with semantic metadata
- Allows building dynamic pipelines to process data as events come in
- Supports streaming data and many data formats (pandas, dask, xarray, soon cuDF)
- Renders to different plotting backends including Bokeh, Matplotlib and Plotly
In [8]: import holoviews as hv
        from holoviews.operation.datashader import datashade, dynspread

        ds = hv.Dataset(ddf)

        geo = datashade(hv.Points(ds, kdims=['pickup_x', 'pickup_y'])).opts(
            width=600, xaxis=None, yaxis=None)
        fares = datashade(hv.Points(ds, kdims=['fare_amount', 'tip_amount'])).opts(
            width=600, xlim=(0, 50), ylim=(0, 10))

        link = hv.selection.link_selections.instance()

        link(geo + fares)

Out[8]: [two linked datashaded plots: pickup locations, and tip_amount against fare_amount]
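
What linking the two plots means can be sketched in plain Python: a selection made in one view yields a set of rows, and the other view is restricted to the same rows. The code below is only an illustration of that idea, not how HoloViews implements link_selections; the three rows and the helper `box_select` are invented for the example.

```python
# Hypothetical miniature of the taxi table
rows = [
    {"pickup_x": 1.0, "pickup_y": 1.0, "fare_amount": 8.0, "tip_amount": 1.0},
    {"pickup_x": 2.0, "pickup_y": 3.0, "fare_amount": 12.5, "tip_amount": 2.5},
    {"pickup_x": 5.0, "pickup_y": 5.0, "fare_amount": 30.0, "tip_amount": 0.0},
]


def box_select(rows, xdim, ydim, x_range, y_range):
    """Return the indices of the rows that fall inside a rectangular selection."""
    return {
        i
        for i, r in enumerate(rows)
        if x_range[0] <= r[xdim] <= x_range[1] and y_range[0] <= r[ydim] <= y_range[1]
    }


# A box drawn on the map view ...
selected = box_select(rows, "pickup_x", "pickup_y", (0, 3), (0, 4))
# ... determines which points the fare/tip view highlights
linked_fares = [(rows[i]["fare_amount"], rows[i]["tip_amount"]) for i in sorted(selected)]
print(linked_fares)
```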

Panel
- Easy apps and dashboards for any viz library.
- Panel is a relatively new library, but built on Bokeh's solid 1.x release.
- Panel lets all your notebooks double as apps or dashboards.
- Use just about any plotting library, image type, or other objects.
- Develop dashboards in a notebook, deploy, revise, repeat.
- Fully usable with Jupyter and without Jupyter installed.
To dive in, let's say we have a dataset to explore, such as this UCI ML dataset measuring the environment in a meeting room (http://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+).
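In [9]: import matplotlib.pyplot as plt
        import numpy as np   # used by find_outliers below
        import pandas as pd
        import panel as pn   # used by the pn.* calls below

        pn.extension()  # load Panel's notebook support

        data = pd.read_csv('./occupancy.csv', index_col='date', parse_dates=True)
        data.tail()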
Out[9]:
                     Temperature  Humidity  Light         CO2  HumidityRatio  Occupancy
date
2015-02-10 09:29:00        21.05   36.0975  433.0  787.250000       0.005579          1
2015-02-10 09:29:59        21.05   35.9950  433.0  789.500000       0.005563          1
2015-02-10 09:30:59        21.10   36.0950  433.0  798.500000       0.005596          1
2015-02-10 09:32:00        21.10   36.2600  433.0  820.333333       0.005621          1
2015-02-10 09:33:00        21.10   36.2000  447.0  821.000000       0.005612          1
And we've written some code that smooths a time series and plots it using Matplotlib with outliers highlighted:
In [10]: def mpl_plot(avg, highlight):
             ax = avg.plot()
             if len(highlight):
                 highlight.plot(style='o', ax=ax)
             fig = ax.get_figure()
             plt.close(fig)  # avoid the notebook displaying the figure twice
             return fig

         def find_outliers(variable='Temperature', window=30, sigma=10, view_fn=mpl_plot):
             avg = data[variable].rolling(window=window).mean()
             residual = data[variable] - avg
             std = residual.rolling(window=window).std()
             # flag points whose residual exceeds sigma rolling standard deviations
             outliers = (np.abs(residual) > std * sigma)
             return view_fn(avg, avg[outliers])
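
To make the outlier rule concrete, here is a simplified plain-Python rendering of the same three steps (rolling mean, residual, rolling standard deviation) on a made-up series. It is a sketch under stated assumptions rather than a replacement for the pandas code: the helper names `rolling` and `outlier_indices` and the sample values are hypothetical, windows are trailing, and positions without a full window are skipped.

```python
from statistics import mean, stdev


def rolling(values, window, fn):
    """Apply fn to each trailing window; None where the window is incomplete."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(fn(chunk))
    return out


def outlier_indices(series, window, sigma):
    """Indices whose residual exceeds sigma rolling (sample) standard deviations."""
    avg = rolling(series, window, mean)
    residual = [None if a is None else x - a for x, a in zip(series, avg)]
    std = rolling(residual, window, stdev)
    return [
        i
        for i, (r, s) in enumerate(zip(residual, std))
        if s is not None and abs(r) > s * sigma
    ]


# Hypothetical flat series with a single spike at index 8
series = [10.0] * 8 + [20.0] + [10.0] * 3
flagged = outlier_indices(series, window=4, sigma=1.5)
print(flagged)
```

Note that the spike itself sits inside the window used for the standard deviation, so a large sigma relative to the window size can hide it; this is one reason to explore the parameters interactively.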
We can call the function with parameters and get a plot:
In [11]: find_outliers(variable='Temperature', window=20, sigma=10)
Out[11]: [Matplotlib line plot of the smoothed Temperature series with outliers marked]
It works! But exploring all these parameters by typing Python is slow and tedious. Plus we want someone else to be able to try it out.
Let's make a panel instead:
In [12]: pn.interact(find_outliers)
Out[12]: [sliders for window (30) and sigma (10), followed by the plot]
Let's do a bit more work and capture the full range of parameters that can be varied:
In [13]: kw = dict(window=(1, 60), variable=sorted(list(data.columns)), sigma=(1, 20))
         pn.interact(find_outliers, **kw)
Out[13]: [variable selector (Temperature), sliders for window (30) and sigma (10), followed by the plot]
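
The way the argument specs in kw turn into widgets can be mimicked with a toy function. This is only a sketch of the convention visible in this example (a two-element tuple becomes a slider range, a list becomes a selector); Panel's actual rules cover more cases, and `widget_for` and `find_outliers_stub` are hypothetical names.

```python
import inspect


def find_outliers_stub(variable='Temperature', window=30, sigma=10):
    """Stand-in with the same keyword defaults as find_outliers."""


def widget_for(name, spec, default):
    """Describe a toy widget chosen from the shape of an argument spec."""
    if isinstance(spec, tuple) and len(spec) == 2:
        start, end = spec
        return {"type": "IntSlider", "name": name, "start": start, "end": end, "value": default}
    if isinstance(spec, list):
        return {"type": "Select", "name": name, "options": spec, "value": default}
    raise TypeError(f"no toy widget for {spec!r}")


columns = ['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio', 'Occupancy']
kw = dict(window=(1, 60), variable=sorted(columns), sigma=(1, 20))

# Initial values come from the function's own keyword defaults
defaults = {n: p.default for n, p in inspect.signature(find_outliers_stub).parameters.items()}
widgets = [widget_for(n, kw[n], defaults[n]) for n in defaults]
print([w["type"] for w in widgets])
```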
Now that we've explored it, let's share it with someone:
In [14]: pn.interact(find_outliers, **kw).show()
Out[14]: <bokeh.server.server.Server at 0x32dfaa750>
The person we are sharing it with comes back and says they don't know how to use the dashboard.
Let's look at it and see how it's made, so we can add some explanatory text:
In [15]: i = pn.interact(find_outliers, **kw)
         print(i)

Column
    [0] Column
        [0] Select(name='variable', options=['CO2', 'Humidity', ...], value='Temperature')
        [1] IntSlider(end=60, name='window', start=1, value=30)
        [2] IntSlider(end=20, name='sigma', start=1, value=10)
    [1] Row
        [0] Matplotlib(Figure, name='interactive05169')

Ah, it's just a column of widgets and a plot.
Let's unpack that, rearrange it, add some instructions, and hide one of the widgets that will just be confusing:
In [16]: text = "<br>\n# Room Occupancy\nSelect the variable, and the time window for smoothing"

         p = pn.Row(i[1][0], pn.Column(text, *i[0][:2], width=400))
         p
Out[16]: [plot on the left; on the right the "Room Occupancy" heading, the instructions, the variable selector (Temperature) and the window slider (30)]
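
The indexing used in that cell is ordinary nested indexing into the layout printed earlier. A plain-list stand-in (strings instead of Panel objects, purely for illustration) shows which pieces i[1][0] and i[0][:2] pick out:

```python
# Stand-in for the printed layout: i[0] is the widget column, i[1] the plot row
layout = [
    ["Select(variable)", "IntSlider(window)", "IntSlider(sigma)"],
    ["Matplotlib(Figure)"],
]
text = "# Room Occupancy"

plot = layout[1][0]           # the figure
shown_widgets = layout[0][:2]  # variable and window; sigma is left out

# Same arrangement as pn.Row(plot, pn.Column(text, *widgets))
rearranged = [plot, [text, *shown_widgets]]
print(rearranged)
```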
Once we're happy with that in the notebook, we'll share it again:
In [17]: p.show()
Out[17]: <bokeh.server.server.Server at 0x32388c090>
In [18]: p
Out[18]: [the Room Occupancy panel: plot, instructions, variable selector (Temperature) and window slider (30)]
Note that even widgets in another notebook cell stay linked:
In [19]: i[0][2]
Out[19]: [sigma slider, value 10]
Also note that Panel widgets are reactive, so they will update even if you set the values by hand:
In [20]: p
Out[20]: [the Room Occupancy panel: plot, instructions, variable selector and window slider (30)]
In [21]: i[0][0].value = 'Light'
Of course, you don't need to use the magic of interact; it's also easy to make widgets and link them up by hand:
In [22]: import panel.widgets as pnw

         variable = pnw.RadioButtonGroup(name='variable', value='Temperature',
                                         options=list(data.columns))
         window = pnw.IntSlider(name='window', value=10, start=1, end=60)

         @pn.depends(variable, window)
         def reactive_outliers(variable, window):
             return find_outliers(variable, window, 10)

         widgets = pn.Column("<br>\n# Room occupancy", variable, window)
         occupancy = pn.Row(reactive_outliers, widgets)

In [23]: occupancy
Out[23]: [plot on the left; on the right the "Room occupancy" heading, a button group (Temperature, Humidity, Light, CO2, HumidityRatio, ...) and the window slider (10)]
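
Conceptually, linking widgets by hand is an observer pattern: each widget notifies the functions that depend on it whenever its value changes. The toy classes below illustrate that idea only; they are not Panel's implementation, and `ToyWidget` and this `depends` are hypothetical names.

```python
class ToyWidget:
    """Holds a value and calls registered callbacks whenever it changes."""

    def __init__(self, name, value):
        self.name = name
        self._value = value
        self._watchers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in list(self._watchers):
            callback()

    def watch(self, callback):
        self._watchers.append(callback)


def depends(*widgets):
    """Re-run the decorated function whenever one of the widgets changes."""
    def decorator(fn):
        view = {"result": None}  # stands in for the displayed output

        def refresh():
            view["result"] = fn(*(w.value for w in widgets))

        for w in widgets:
            w.watch(refresh)
        refresh()  # compute the initial output
        return view
    return decorator


variable = ToyWidget("variable", "Temperature")
window = ToyWidget("window", 10)


@depends(variable, window)
def reactive_outliers(variable, window):
    return f"{variable} smoothed over {window} samples"


first = reactive_outliers["result"]
window.value = 20  # setting the value by hand triggers a refresh
second = reactive_outliers["result"]
print(first, "->", second)
```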
So far we've only used Matplotlib.
What about other libraries? You can use almost anything!
E.g. hvPlot (http://hvplot.pyviz.org), a drop-in replacement for Pandas .plot() (and xarray, dask, intake, ...) that gives fully interactive Bokeh (http://bokeh.pydata.org) plots in panels:
In [24]: import hvplot.pandas

         def hvplot(avg, highlight):
             return avg.hvplot(height=300, legend=False) * highlight.hvplot.scatter(
                 color='orange', padding=0.1, legend=False)

         text2 = "## Room Occupancy\nSelect the variable and the smoothing values"
         hvp = pn.interact(find_outliers, view_fn=hvplot, **kw)
         occupancy_app = pn.Column(pn.Row(pn.panel(text2, width=400), hvp[0]), hvp[1])
         occupancy_app.servable()
Out[24]: [the "Room Occupancy" heading and instructions next to the variable selector (Temperature) and the window (30) and sigma (10) sliders, above an interactive Bokeh plot]
We can bring in other datasets, dozens of other plotting libraries, linked plots, drilling down, big data, etc., but let's call that done for now.
Can we keep it running as a server indefinitely now?
Sure, just mark the item to serve with .servable() in the notebook, then run panel serve --show PyHEP2019.ipynb.
As you can see, Panel is designed to support your entire data-analysis workflow: work in a notebook, explore freely, deploy freely, do batch or cron jobs, and never have to rewrite to use your code in a new context.

Other Features

Templating

{% extends base %}

<!-- goes in body -->
{% block postamble %}
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/css/materialize.min.css">
<link href="https://fonts.googleapis.com/icon?family=Material+Icons" rel="stylesheet">
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/1.0.0/js/materialize.min.js"></script>
<style type="text/css">
body {
  background-color: var(--jp-layout-color0);
  overflow-y: scroll;
}
.nav-wrapper {
  background-color: #2f2f2f;
}
.brand-logo {
In [29]: tmpl = pn.Template(template)

         tmpl.add_panel('vega', vega)
         tmpl.add_panel('plotly', plotly)
         tmpl.add_panel('holoviews', holoviews)

         tmpl.servable()

Out[29]: Template
             [vega] Row
                 [0] Vega(dict, height=425, sizing_mode='fixed', width=750)
                 [1] Column
                     [0] RangeSlider(end=125, name='Color limits', step=1, value=(0, 40), width=200)
                     [1] IntSlider(end=125, name='IMDB Ratings Bins', step=25, value=60, width=200)
                     [2] IntSlider(end=125, name='Rotten Tomato R..., step=25, value=40, width=200)
             [plotly] Plotly(Figure)
             [holoviews] Row
                 [0] Column
                     [0] WidgetBox(css_classes=['widget-box'])
                         [0] FloatSlider(end=1, name='Alpha', start=0, value=1)
                         [1] Select(name='Colormap', options={'fire': <matplotlib.color...}, value=<matplotlib.colors.LinearS...)
                         [2] Select(name='Location', options=['dropof

Demos
- http://localhost:5006/
- https://glaciers.pyviz.demo.anaconda.com/glaciers
- https://webcam-classifier.pyviz.demo.anaconda.com/app
Thank you!
See holoviz.org (https://holoviz.org) to see how everything fits together and go through a tutorial!
Come talk to me about any of our open source tools!
Twitter & GitHub: @philippjfr