Introduction to R, Python, and Flow Amy Wang [email protected]

H2O World - Intro to R, Python, and Flow - Amy Wang

  1. Introduction to R, Python, and Flow (Amy Wang, [email protected])
  2. Getting Started with H2O (Objectives): learn how R, Flow, and Python send commands to H2O for computation; FAQ on writing R, Flow, and Python expressions; a hands-on introduction to data science; understanding model outputs; noting the limitations of the basic workflow, to improve upon later.
  3. Checklist: I have H2O installed. I have Python installed. I have R installed. I have the H2O World data sets. Pick up stickers or get install help at the information booth.
  4. Reading Data into H2O with R, Step 1: the R user runs h2o_df <- h2o.importFile("../data/allyears2k.csv").
  5. Reading Data into H2O with R, Step 2: (2.1) R makes the function call h2o.importFile(); (2.2) an HTTP REST API request to H2O carries the path argument; (2.3) the H2O cluster initiates a distributed ingest; (2.4) the H2O nodes request the data in allyears2k.csv.
  6. Reading Data into H2O with R, Step 3: (3.1) allyears2k.csv provides the data; (3.2) the data becomes a distributed H2O Frame in the DKV (H2O's distributed key/value store); (3.3) the H2O cluster returns a pointer to the data in the REST API JSON response; (3.4) the h2o_df object is created in R, holding the cluster IP, the cluster port, and the pointer to the data.
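  7. R Script Starting H2O GLM: in the user process (a standard R process), the R script calls h2o.glm(), which calls .h2o.startModelJob(). That function sends an HTTP REST/JSON request, POST /3/ModelBuilders/glm, over TCP/IP (network layer) to the H2O process. There, the /3/ModelBuilders/glm endpoint (REST layer) starts a Job that runs the GLM algorithm (H2O algos); the GLM tasks execute on the Fork/Join framework and the K/V store framework (H2O core). Legend: user process, H2O process, network layer, REST layer, H2O algos, H2O core.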
  8. R Parity
  9. Hands-on Introduction to Running H2O: import and parse a small 44,000-row airlines dataset; run a logistic regression; build a deep learning model; review the model outputs.
  10. Starting Up H2O and the Preloaded Workbook. Flow users: from a terminal, change to the directory that contains h2o.jar and run java -jar h2o.jar, then open the Flow UI at http://localhost:54321. R users: open the intro-to-r.md.R file in native R or RStudio and run library(h2o) followed by h2o.init(nthreads = -1), where nthreads = -1 uses all available CPU cores. Python users: open either intro-to-python.ipynb or intro-to-python.py and run import h2o followed by h2o.init().
  11. Load the Preinstalled Flow Pack (for Flow users)
  12. Import Airlines Data into H2O (Flow cells): importFiles [ "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv" ], then setupParse paths: [ "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv" ], producing the frame airlines.hex.
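
The slide 3 checklist can be checked from Python. The helper below is a minimal, hypothetical sketch (check_prerequisites is not part of H2O); it assumes the executables are named R and java and are on PATH, since H2O itself runs on the Java virtual machine. It does not look for the H2O World data sets, whose location depends on your setup.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which workshop prerequisites are visible from this Python process."""
    return {
        # True if the h2o Python package can be found on the import path
        "h2o (Python package)": importlib.util.find_spec("h2o") is not None,
        # True if an R executable is on PATH (assumed name: "R")
        "R": shutil.which("R") is not None,
        # True if java is on PATH; H2O runs on the JVM
        "java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for item, found in check_prerequisites().items():
        print(f"{item}: {'found' if found else 'missing'}")
```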
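
For Python users, the counterpart of slide 4's h2o.importFile() is h2o.import_file(). The function name import_allyears and its default path are illustrative; the sketch assumes the h2o package is installed and imports it lazily so the file loads without it. As slides 5 and 6 describe, the path is sent to the cluster and read there, so it must be visible to the H2O process; h2o.upload_file() is the call for pushing a file that only the client machine can see.

```python
def import_allyears(path="../data/allyears2k.csv"):
    """Python counterpart of R's h2o_df <- h2o.importFile(path)."""
    import h2o  # lazy import: needs the h2o package installed

    # Connect to a local H2O cluster, starting one if none is running
    h2o.init(nthreads=-1)
    # The path travels to the cluster in a REST request; the cluster reads the file,
    # and the returned H2OFrame is a client-side pointer to the distributed data
    return h2o.import_file(path)
```

Calling h2o_df = import_allyears() returns the frame, and h2o_df.dim gives its row and column counts.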
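
Slide 5's step 2.2, the REST request that carries the path argument, can be sketched with the standard library. The endpoint GET /3/ImportFiles with a path parameter is an assumption based on H2O's v3 REST API (the same /3/ prefix as in slide 7); in practice the clients follow it with parse-setup and parse calls that are not shown here. Port 54321 is H2O's default.

```python
import json
import urllib.request
from urllib.parse import urlencode, urlunsplit


def import_files_url(path, host="localhost", port=54321):
    """URL of the assumed REST call GET /3/ImportFiles?path=<path>."""
    # urlencode percent-encodes reserved characters such as "/" in the path
    query = urlencode({"path": path})
    return urlunsplit(("http", f"{host}:{port}", "/3/ImportFiles", query, ""))


def send_get(url):
    """Send the GET request and decode H2O's JSON reply (requires a running cluster)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


print(import_files_url("../data/allyears2k.csv"))
# http://localhost:54321/3/ImportFiles?path=..%2Fdata%2Fallyears2k.csv
```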
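
The key point of slide 6 is that h2o_df holds only a pointer (cluster IP, port, and a key), while the rows stay in the cluster's DKV. The FrameHandle class below is a hypothetical model of that idea, not the real R or Python object; the /3/Frames/<key> URL and the key name allyears2k.hex are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class FrameHandle:
    """Hypothetical client-side pointer to a distributed H2O frame.

    Holds what slide 6 lists: cluster IP, cluster port, and a key naming the data.
    No rows live here; they stay in the cluster's distributed key/value store.
    """

    ip: str
    port: int
    key: str

    def base_url(self) -> str:
        # Every later operation on the frame is another REST call to this address
        return f"http://{self.ip}:{self.port}"

    def frame_url(self) -> str:
        # Assumed endpoint for fetching frame metadata: /3/Frames/<key>
        return f"{self.base_url()}/3/Frames/{quote(self.key, safe='')}"


# Illustrative values, not output captured from a real cluster
h2o_df = FrameHandle("127.0.0.1", 54321, "allyears2k.hex")
print(h2o_df.frame_url())  # http://127.0.0.1:54321/3/Frames/allyears2k.hex
```

The class is frozen because the handle identifies data it does not own: changing the data means asking the cluster, not editing the client object.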
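
Slide 7's POST /3/ModelBuilders/glm can be built the same way. The form-encoded body and the parameter names training_frame, response_column, and family are assumptions modeled on H2O's GLM parameters; the frame key airlines.hex comes from slide 12, and IsDepDelayed is an assumed target column of the airlines data. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def glm_job_request(params, host="localhost", port=54321):
    """Build, without sending, the POST that asks H2O to start a GLM model job."""
    body = urlencode(params).encode("ascii")  # form-encoded request body
    return Request(
        f"http://{host}:{port}/3/ModelBuilders/glm",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = glm_job_request({
    "training_frame": "airlines.hex",   # key of a frame already in the cluster
    "response_column": "IsDepDelayed",  # assumed binary target column
    "family": "binomial",               # binomial GLM = logistic regression
})
print(req.get_method(), req.full_url)  # POST http://localhost:54321/3/ModelBuilders/glm
```

Sending it with urllib.request.urlopen(req) needs a running cluster; H2O is expected to reply with a job description while the model builds asynchronously, matching the Job box in slide 7.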
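
Slide 9's logistic regression and deep learning steps look roughly like this in Python. This is a sketch, assuming the h2o package, a connected cluster, and an H2OFrame named airlines loaded as in slide 12. The response IsDepDelayed and the predictor list are assumed column names from the airlines data, and the split ratio, hidden-layer sizes, and epochs are illustrative values, not tuned settings. The predictors are listed explicitly because delay columns such as DepDelay would leak the answer.

```python
# Assumed column names from the airlines data; delay columns such as DepDelay
# are left out because they would reveal the answer
DEFAULT_PREDICTORS = ["Origin", "Dest", "UniqueCarrier", "Month", "DayOfWeek", "Distance"]


def train_models(airlines, response="IsDepDelayed", predictors=None, seed=1234):
    """Fit a logistic regression (binomial GLM) and a deep learning model."""
    # Lazy import: needs the h2o package and a connected cluster
    from h2o.estimators import H2ODeepLearningEstimator, H2OGeneralizedLinearEstimator

    predictors = predictors or DEFAULT_PREDICTORS
    airlines[response] = airlines[response].asfactor()  # treat the target as categorical
    train, valid = airlines.split_frame(ratios=[0.8], seed=seed)

    glm = H2OGeneralizedLinearEstimator(family="binomial")
    glm.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    # hidden and epochs are illustrative, not tuned
    dl = H2ODeepLearningEstimator(hidden=[20, 20], epochs=5, seed=seed)
    dl.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    return glm, dl
```

After training, glm.auc(valid=True) and dl.auc(valid=True) report validation AUC, and glm.coef() lists the logistic regression coefficients, which is one way to begin reviewing the model outputs.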
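
After running java -jar h2o.jar as in slide 10, a quick TCP probe can confirm that something is listening on H2O's default port 54321 before you open Flow or call h2o.init(). h2o_is_listening is a hypothetical helper; a successful connection shows only that some process accepts connections on that port, not that the process is H2O.

```python
import socket


def h2o_is_listening(host="localhost", port=54321, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Open and immediately close a TCP connection
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print("Listening on localhost:54321:", h2o_is_listening())
```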
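
Slide 12's Flow cells map onto three h2o Python calls: h2o.lazy_import() registers the file without parsing it (like importFiles), h2o.parse_setup() guesses the parse settings (like setupParse), and h2o.parse_raw() builds the frame under the id airlines.hex. The sketch assumes a connected cluster, and the helper name import_and_parse is illustrative. For everyday use, h2o.import_file(AIRLINES_URL, destination_frame="airlines.hex") performs all three steps in one call.

```python
AIRLINES_URL = "https://s3.amazonaws.com/h2o-airlines-unpacked/allyears2k.csv"


def import_and_parse(url=AIRLINES_URL, frame_id="airlines.hex"):
    """Python analogue of the Flow cells importFiles, setupParse, and parse."""
    import h2o  # lazy import: needs the h2o package and a connected cluster

    raw_keys = h2o.lazy_import(url)           # importFiles: register the file, no parsing yet
    setup = h2o.parse_setup(raw_keys)         # setupParse: guess separator, header, column types
    return h2o.parse_raw(setup, id=frame_id)  # parse: build the distributed frame airlines.hex
```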