Data Science for Developers: Quick & Dirty Introduction to the Data Science Tools Ecosystem


@pdelboca @pcelayes

Patricio Del Boca Pablo Celayes

Agenda

Why this talk?

Why Data Science?

Data Science

Tools

Questions

Why this talk?

Goal

A tour of how the discipline is defined and of the main tools available for working in it.

Why Data Science?

2015

Low processing costs, lots of data, and algorithms.

Rise of Data Science

“Data Science is a team sport.” DJ Patil - Chief Data Scientist @ White House

Visualizing Nepal’s Earthquake

The Human Side of Data Science

Diabetic Retinopathy Detection

Identify signs of diabetic retinopathy in eye images.

Click-Through Rate Prediction

Predict whether a mobile ad will be clicked.

Improve Healthcare

Identify patients who will be admitted to a hospital within the next year using historical claims data.

A taxonomy for Data Science (V 2.0)

Methodology | Data Manipulation | Data Modeling | Data Visualization

Define

Obtain

Scrub

Explore

Model

Interpret

Communicate

http://www.dataists.com/2010/09/a-taxonomy-of-data-science/


A taxonomy for Data Science (V 2.0)

Methodology | Data Manipulation | Data Modeling | Data Visualization

Define X X

Obtain X

Scrub X

Explore X X X

Model X X

Interpret X X X

Communicate X X

http://www.dataists.com/2010/09/a-taxonomy-of-data-science/
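To make the methodology steps concrete, here is a minimal sketch of the Define, Obtain, Scrub, Explore, Model, Interpret, Communicate sequence using only the Python standard library. The CSV data, the column names, and the function names are made up for illustration; they are assumptions, not part of the taxonomy or of the talk.

```python
import csv
import io
import statistics

# Define: how many extra clicks does one extra unit of ad spend buy?
# The data below is invented for illustration only.
RAW_CSV = """spend,clicks
10,25
20,45
30,
40,85
"""


def obtain(text):
    """Obtain: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: drop rows with missing values and convert strings to numbers."""
    return [
        (float(row["spend"]), float(row["clicks"]))
        for row in rows
        if row["spend"] and row["clicks"]
    ]


def explore(points):
    """Explore: summarize the cleaned data."""
    return {
        "n": len(points),
        "mean_clicks": statistics.mean(y for _, y in points),
    }


def model(points):
    """Model: fit clicks = slope * spend + intercept by least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


points = scrub(obtain(RAW_CSV))
summary = explore(points)
slope, intercept = model(points)

# Interpret / Communicate: state the result in plain language.
print(f"{summary['n']} clean rows; one extra unit of spend adds about {slope:.2f} clicks")
```

In practice each step would typically be handled by one of the tools listed in the following slides rather than by hand-written functions.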

Data Scientist Main Toolkit

Define what you are trying to solve

Data Manipulation

pandas: http://pandas.pydata.org/

dplyr: https://github.com/hadley/dplyr
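As a rough illustration of what data manipulation looks like with pandas, the sketch below cleans, filters, aggregates, and sorts a small table. The data and column names are invented for the example; the comments name the dplyr verbs that play a similar role.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Cordoba", "Cordoba", "Rosario", "Rosario", "Mendoza"],
        "sales": [100, 150, 80, None, 60],
    }
)

# Drop rows where the sales value is missing.
clean = df.dropna(subset=["sales"])

# Keep only rows with sales of at least 80 (similar to dplyr's filter()).
big = clean[clean["sales"] >= 80]

# Sum sales per city, largest first
# (similar to dplyr's group_by(), summarise(), and arrange()).
totals = (
    big.groupby("city", as_index=False)["sales"]
    .sum()
    .sort_values("sales", ascending=False)
    .reset_index(drop=True)
)

print(totals)
```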

Data Modeling in R

randomForest

lm

nnet

gbm

e1071

Data Modeling in Python

sklearn.tree.DecisionTreeClassifier

sklearn.linear_model.LinearRegression

sklearn.svm.SVC

sklearn.svm.SVR

sklearn.ensemble.RandomForestClassifier

sklearn.ensemble.GradientBoostingClassifier
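The scikit-learn estimators listed above share the same fit / predict / score interface, so swapping one model for another is usually a one-line change. The sketch below trains one of them on the iris dataset bundled with scikit-learn; the dataset choice, split size, and parameters are assumptions made for illustration, not recommendations from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small example dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Any other classifier from the list could be substituted here.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# score() returns mean accuracy for classifiers.
accuracy = clf.score(X_test, y_test)
print(f"accuracy: {accuracy:.2f}")
```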

Data Modeling

CRAN

https://cran.r-project.org/web/views/MachineLearning.html http://scikit-learn.org/

Data Visualization

R Base Graphics

Lattice

Bokeh

seaborn

pandas DataFrame.plot()
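For a quick look at a dataset, the plotting method built into pandas DataFrames is often enough. A minimal sketch, assuming matplotlib is installed (pandas uses it as the default plotting backend) and using made-up numbers:

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"month": [1, 2, 3, 4], "visits": [120, 135, 160, 150]})

# DataFrame.plot() returns a matplotlib Axes that can be customized further.
ax = df.plot(x="month", y="visits", kind="line", title="Visits per month")
ax.set_ylabel("visits")

# Render the figure as PNG into an in-memory buffer
# (pass a file path instead to save it to disk).
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
```

seaborn and Bokeh build richer or interactive charts on top of the same kind of tabular data.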

Data Visualization

+

Power up!

“The best minds of my generation are thinking about how to make people click ads. That sucks.”

Jeff Hammerbacher - Former Data Scientist @ Facebook

Questions?

Thank you very much!

http://www.meetup.com/es/Encuentros-Data-Science-Cordoba/