Big Data Analysis with Crate and Python Matthias Wahl - developer @ crate.io Email: [email protected]

Big Data Analysis with Crate and Python

DESCRIPTION

Analysing huge datasets with the Crate datastore, using either the bare Crate Python client or SQLAlchemy.

Page 1: Big Data Analysis with Crate and Python

Big Data Analysis with Crate and Python

Matthias Wahl - developer @ crate.io

Email: [email protected]

Page 2: Big Data Analysis with Crate and Python

Crate

shared-nothing, massively scalable datastore

standing on the shoulders of giants

Page 3: Big Data Analysis with Crate and Python

Crate

get it at: https://crate.io/download

# bash -c "$(curl -L try.crate.io)"

Page 4: Big Data Analysis with Crate and Python

Crate

automatic sharding and replication

(semi-) structured models

single table only

SQL query language

Page 5: Big Data Analysis with Crate and Python

Crate

all common SQL types (and more)

powerful aggregations (‘GROUP BY’)

linear scalability - data and query execution are distributed

basic arithmetic (next release, 0.39)

Page 6: Big Data Analysis with Crate and Python

Crate

Page 7: Big Data Analysis with Crate and Python

Aggregation Execution

SELECT station_name, max(temp), avg(temp), min(temp), count(distinct date)
FROM weather_de
WHERE temp != -999
GROUP BY station_name
ORDER BY station_name ASC;
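To see what shape of result this query produces without a running Crate cluster, the sketch below runs the same statement against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here, not Crate, and the three-column table and its rows are hypothetical sample data, not the real weather_de dataset.

```python
import sqlite3

# The query from the slide, unchanged apart from layout.
QUERY = """
SELECT station_name, max(temp), avg(temp), min(temp), count(DISTINCT date)
FROM weather_de
WHERE temp != -999
GROUP BY station_name
ORDER BY station_name ASC
"""

# Hypothetical sample rows: (station_name, date, temp).
# -999 marks a missing measurement, which the WHERE clause filters out.
SAMPLE_ROWS = [
    ("Konstanz", "2011-04-21", 12.0),
    ("Berlin", "2011-04-21", 10.0),
    ("Konstanz", "2011-04-22", 16.0),
    ("Berlin", "2011-04-21", -999),
    ("Berlin", "2011-04-22", 14.0),
    ("Konstanz", "2011-04-21", 14.0),
]


def run_demo():
    """Load the sample rows into SQLite (a stand-in for Crate) and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather_de (station_name TEXT, date TEXT, temp REAL)"
    )
    conn.executemany("INSERT INTO weather_de VALUES (?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


demo_rows = run_demo()
for row in demo_rows:
    print(row)
```

Each result row holds one station with its maximum, average and minimum temperature and the number of distinct dates that had a valid reading.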

Page 8: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: node H, three M nodes, three R nodes. Labels: Request, collect]

Page 9: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: node H, three M nodes, three R nodes. Labels: collect, hash based distribution]

Page 10: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: node H, three M nodes, three R nodes. Label: group results]

Page 11: Big Data Analysis with Crate and Python

Aggregation Execution

[Diagram: node H, three M nodes, three R nodes. Labels: final reduce, Response]
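The four steps on these slides (collect, hash based distribution, group results, final reduce) can be sketched as a small single-process simulation. This is not Crate's implementation. It assumes that H is the node handling the request, that the M nodes collect from their local data and that the R nodes merge the grouped results; the slides do not spell this out. All function names and the sample rows are hypothetical.

```python
import zlib

MISSING = -999  # sentinel value the slide's query filters out


def collect(rows):
    """Collect step: one M node pre-aggregates its local rows per station."""
    partial = {}
    for station, date, temp in rows:
        if temp == MISSING:
            continue
        state = partial.setdefault(
            station,
            {"max": temp, "min": temp, "sum": 0.0, "n": 0, "dates": set()},
        )
        state["max"] = max(state["max"], temp)
        state["min"] = min(state["min"], temp)
        state["sum"] += temp
        state["n"] += 1
        state["dates"].add(date)
    return partial


def reducer_for(station, n_reducers):
    """Hash based distribution: a given key always maps to the same R node."""
    return zlib.crc32(station.encode("utf-8")) % n_reducers


def merge(a, b):
    """Group results: combine two partial states for the same station."""
    return {
        "max": max(a["max"], b["max"]),
        "min": min(a["min"], b["min"]),
        "sum": a["sum"] + b["sum"],
        "n": a["n"] + b["n"],
        "dates": a["dates"] | b["dates"],
    }


def run_query(shards, n_reducers=3):
    """Simulate the whole flow; returns rows sorted by station name."""
    partials = [collect(rows) for rows in shards]       # collect on each M
    reducers = [dict() for _ in range(n_reducers)]
    for partial in partials:                            # distribute to the Rs
        for station, state in partial.items():
            bucket = reducers[reducer_for(station, n_reducers)]
            if station in bucket:
                bucket[station] = merge(bucket[station], state)
            else:
                bucket[station] = state
    result = []                                         # final reduce on H
    for bucket in reducers:
        for station, s in bucket.items():
            result.append(
                (station, s["max"], s["sum"] / s["n"], s["min"], len(s["dates"]))
            )
    return sorted(result)


# Hypothetical data, one list per M node: (station_name, date, temp).
SHARDS = [
    [("Konstanz", "2011-04-21", 12.0), ("Berlin", "2011-04-21", 10.0)],
    [("Konstanz", "2011-04-22", 16.0), ("Berlin", "2011-04-21", -999)],
    [("Berlin", "2011-04-22", 14.0), ("Konstanz", "2011-04-21", 14.0)],
]

result = run_query(SHARDS)
for row in result:
    print(row)
```

The average is carried as a (sum, count) pair and the distinct dates as a set, because partial averages and partial distinct counts cannot be merged directly. Hashing on the group key means every station ends up on exactly one R node, so the final step only has to concatenate and sort.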

Page 12: Big Data Analysis with Crate and Python

Aggregation Execution

Page 13: Big Data Analysis with Crate and Python

Using the python client

>>> from crate.client.http import Client
>>> client = Client(["127.0.0.1:4200"])
>>> response = client.sql("select * from weather_de limit 1")
>>> print(response)
{
  u'duration': 659,
  u'rowcount': 1,
  u'rows': [
    [1303365600000, 82.0, None, None, None, 0, u'954', 54.1667, 7.45,
     u'UFS Deutsche Bucht', 60.0, 10.9, 100, 5.2]
  ],
  u'cols': [u'date', ...]
}
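The response above is a plain dict with the column names under 'cols' and one list per row under 'rows'. A minimal sketch of pairing the two, so that values can be read by column name, might look like this. The helper name rows_as_dicts and the two-column sample response are hypothetical; the sample only mimics the shape shown on the slide and does not come from a real server.

```python
def rows_as_dicts(response):
    """Pair each row with the column names from a response shaped like the slide's."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


# Hypothetical response with the same keys as the one printed on the slide.
sample_response = {
    "cols": ["station_name", "temp"],
    "rows": [["Konstanz", 12.5], ["Berlin", None]],
    "rowcount": 2,
    "duration": 3,
}

records = rows_as_dicts(sample_response)
for record in records:
    print(record["station_name"], record["temp"])
```

Missing values arrive as None, as in the slide's output, so code that does arithmetic on the records should check for None first.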

Page 14: Big Data Analysis with Crate and Python

Using SQLAlchemy

>>> import sqlalchemy as sa
>>> from sqlalchemy.ext.declarative import declarative_base
>>> from sqlalchemy.orm import sessionmaker
>>> engine = sa.create_engine("crate://localhost:4200")
>>> Base = declarative_base()

Page 15: Big Data Analysis with Crate and Python

Using SQLAlchemy

>>> from sqlalchemy import Column, DateTime, Float, Integer, String
>>> class Weather(Base):
...     __tablename__ = 'weather_de'
...
...     station_id = Column('station_id', String, primary_key=True)
...     station_name = Column('station_name', String)
...     station_lat = Column('station_lat', Float)
...     station_long = Column('station_lon', Float)
...     station_height = Column('station_height', Integer)
...     date = Column('date', DateTime, primary_key=True)
...     temp = Column('temp', Float)
...     humility = Column(Float)
...     sunshine_hours = Column(Float)
...     wind_speed = Column(Float)
...     wind_direction = Column(Integer)
...     rainfall_fallen = Column(Integer)
...     rainfall_height = Column(Float)
...     rainfall_form = Column(Integer)

Page 16: Big Data Analysis with Crate and Python

Using SQLAlchemy

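>>> from sqlalchemy import func
>>> # DBSession is not defined on the slides; assumed to be a session bound to the engine
>>> DBSession = sessionmaker(bind=engine)()
>>> res = (DBSession.query(
...         Weather.station_name,
...         func.avg(Weather.temp))
...     .group_by(Weather.station_name)
...     .order_by(Weather.station_name)
...     .limit(10)
...     .all())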

SELECT station_name, avg(temp)
FROM weather_de
GROUP BY station_name
ORDER BY station_name
LIMIT 10;

Page 17: Big Data Analysis with Crate and Python

Using SQLAlchemy

from sqlalchemy.sql import func

# Average sunshine hours
DBSession.query(func.avg(Weather.sunshine_hours)).scalar()

# Average sunshine hours in Konstanz
DBSession.query(func.avg(Weather.sunshine_hours)).filter(Weather.station_name == 'Konstanz').scalar()

Page 18: Big Data Analysis with Crate and Python

Feature Requests

I’m no data scientist

Page 19: Big Data Analysis with Crate and Python

Feature Requests

Please tell us what you would like to see in crate.

I’m no data scientist

Page 20: Big Data Analysis with Crate and Python

CRATE

Thank you

web: https://crate.io/

github: https://github.com/crate

twitter: @cratedata

IRC: #crate

stackoverflow tag: cratedata