
Apache Spark with Hortonworks Data Platform - Seattle Meetup



Apache Spark with Hortonworks Data Platform

Saptak Sen [@saptak]

Technical Product Manager

© Creative Commons Attribution-ShareAlike 3.0 Unported License


What we will do?

• Setup

• YARN Queues for Spark

• Zeppelin Notebook

• Spark Concepts

• Hands-On

Get the slides from: http://l.saptak.in/sparkseattle


What is Apache Spark?

A fast and general engine for large-scale data processing.


Setup


Download Hortonworks Sandbox

http://hortonworks.com/sandbox

Configure Hortonworks Sandbox


Open network settings for the VM


Open port for Zeppelin


Login to Ambari

Ambari is at port 8080. The username and password are admin/admin.


Open YARN Queue Manager


Create a dedicated queue for Spark

Set name to root.spark, Capacity to 50% and Max Capacity to 95%


Save and Refresh Queues

Adjust capacity of the default queue to 50% if required before saving


Open the Apache Spark configuration


Set the 'spark' queue as the default queue for Spark
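This UI step corresponds to Spark's spark.yarn.queue setting. If you later submit your own application outside Ambari, a minimal sketch of setting the queue in code (the app name below is just a placeholder):

from pyspark import SparkConf, SparkContext

# Route this application's YARN containers to the dedicated 'spark' queue
conf = SparkConf().setAppName("queue-demo").set("spark.yarn.queue", "spark")
sc = SparkContext(conf=conf)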


Restart All Affected Services


SSH into your Sandbox


Install the Zeppelin Service for Ambari

VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
sudo git clone https://github.com/hortonworks-gallery/ambari-zeppelin-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/ZEPPELIN


Restart Ambari to enumerate the new Services

service ambari restart


Go to Ambari to Add the Zeppelin Service


Select Zeppelin Notebook

Click Next and select the default configurations unless you know what you are doing.


Wait for Zeppelin deployment to complete


Launch Zeppelin from your browser

Zeppelin is at port 9995


Concepts


RDD

• Resilient

• Distributed

• Dataset
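A minimal sketch of what those three words mean in code (the data here is illustrative): the dataset is split into partitions that are distributed across the executors, and the recorded lineage lets Spark recompute lost partitions, which is what makes it resilient.

data = sc.parallelize(range(1000), 8)   # a dataset split into 8 partitions
print data.getNumPartitions()           # distributed across the cluster's executors
doubled = data.map(lambda x: x * 2)     # lineage (parallelize -> map) allows lost partitions to be rebuilt
print doubled.count()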


Program Flow


SparkContext

• Created by your driver program

• Responsible for making your RDDs resilient and distributed

• Instantiates RDDs

• The Spark shell creates a "sc" object for you


Creating RDDs

myRDD = sc.parallelize([1,2,3,4])
myRDD = sc.textFile("hdfs:///tmp/shakepeare.txt")  # file://, s3n://

hiveCtx = HiveContext(sc)
rows = hiveCtx.sql("SELECT name, age from users")


Creating RDDs

• Can also create from:

• JDBC

• Cassandra

• HBase

• Elasticsearch

• JSON, CSV, sequence files, object files, ORC, Parquet, Avro


Passing Functions to Spark

Many RDD methods accept a function as a parameter

myRdd.map(lambda x: x*x)

is the same thing as

def sqrN(x): return x*x

myRdd.map(sqrN)


RDD Transformations

• map

• flatMap

• filter

• distinct

• sample

• union, intersection, subtract, cartesian
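A quick sketch of a few of these (the sample data is made up):

words = sc.parallelize(["to", "be", "or", "not", "to", "be"])

pairs = words.map(lambda w: (w, 1))            # map: exactly one output per input
letters = words.flatMap(lambda w: list(w))     # flatMap: zero or more outputs per input
short = words.filter(lambda w: len(w) <= 2)    # filter: keep only matching elements
unique = words.distinct()                      # distinct: drop duplicates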


Map example

myRdd = sc.parallelize([1,2,3,4])
myRdd.map(lambda x: x+x)

• Results: 2, 4, 6, 8

Transformations are lazily evaluated


RDD Actions

• collect

• count

• countByValue

• take

• top

• reduce
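Unlike transformations, actions actually trigger a job and return a result to the driver. A small sketch with illustrative data:

nums = sc.parallelize([5, 3, 1, 4, 1])

print nums.count()                      # 5
print nums.take(2)                      # first 2 elements
print nums.top(2)                       # 2 largest elements
print nums.countByValue()               # value -> number of occurrences
print nums.reduce(lambda x, y: x + y)   # 14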


Reduce

sum = rdd.reduce(lambda x, y: x + y)

Aggregate

sumCount = nums.aggregate(
    (0, 0),
    (lambda acc, value: (acc[0] + value, acc[1] + 1)),
    (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))

return sumCount[0] / float(sumCount[1])


Imports

from pyspark import SparkConf, SparkContext
import collections

Set up the Context

conf = SparkConf().setMaster("local").setAppName("foo")
sc = SparkContext(conf = conf)


First RDD

lines = sc.textFile("hdfs:///tmp/shakepeare.txt")

Extract the Data

ratings = lines.map(lambda x: x.split()[2])


Aggregate

result = ratings.countByValue()

Sort and Print the results

sortedResults = collections.OrderedDict(sorted(result.items()))

for key, value in sortedResults.iteritems():
    print "%s %i" % (key, value)


Run the Program

spark-submit ratings-counter.py


Key, Value RDD

# list of key, value pairs where the value is 1
totalsByAge = rdd.map(lambda x: (x, 1))

With Key, Value RDDs, we can:
- reduceByKey()
- groupByKey()
- sortByKey()
- keys(), values()

rdd.reduceByKey(lambda x, y: x + y)
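A short sketch of the other pair-RDD methods listed above (the sample pairs are illustrative):

pairs = sc.parallelize([("b", 2), ("a", 1), ("a", 3)])

print pairs.reduceByKey(lambda x, y: x + y).collect()   # a -> 4, b -> 2
print pairs.groupByKey().mapValues(list).collect()      # a -> [1, 3], b -> [2]
print pairs.sortByKey().collect()                       # ordered by key
print pairs.keys().collect(), pairs.values().collect()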


Set operations on Key, Value RDDs

With two key, value RDDs, we can:

• join

• rightOuterJoin

• leftOuterJoin

• cogroup

• subtractByKey
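A hedged sketch with two tiny illustrative RDDs:

left = sc.parallelize([("tcp", 1), ("udp", 2)])
right = sc.parallelize([("tcp", "reliable"), ("icmp", "control")])

print left.join(right).collect()            # keys present in both RDDs
print left.leftOuterJoin(right).collect()   # all keys from the left RDD
print left.rightOuterJoin(right).collect()  # all keys from the right RDD
print left.subtractByKey(right).collect()   # keys only in the left RDD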


Mapping just values

When mapping just values in a Key, Value RDD, it is more efficient to use:

• mapValues()

• flatMapValues()

as it maintains the original partitioning.
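A minimal sketch: the keys are untouched, so Spark knows the existing partitioning still holds (the sample data is illustrative):

durations = sc.parallelize([("normal.", 10), ("attack.", 250)])

print durations.mapValues(lambda d: d / 60.0).collect()        # transform only the values
print durations.flatMapValues(lambda d: [d, d * 2]).collect()  # zero or more values per key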


Persistence

result = input.map(lambda x: x * x)
result.persist().is_cached
print(result.count())
print(",".join(str(x) for x in result.collect()))
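If memory is tight, you can also choose an explicit storage level and release the cache when you are done; a small sketch (not from the original deck, squares is just an illustrative name):

from pyspark import StorageLevel

squares = input.map(lambda x: x * x)
squares.persist(StorageLevel.MEMORY_AND_DISK)   # spill partitions to disk instead of recomputing them
print(squares.count())
squares.unpersist()                             # drop the cached partitions when no longer needed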


Hands-On


Download the Data

# remove existing copies of the dataset from HDFS
hadoop fs -rm /tmp/kddcup.data_10_percent.gz

wget http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz \
  -O /tmp/kddcup.data_10_percent.gz


Load data in HDFS

hadoop fs -put /tmp/kddcup.data_10_percent.gz /tmp
hadoop fs -ls -h /tmp/kddcup.data_10_percent.gz

rm /tmp/kddcup.data_10_percent.gz


First RDD

input_file = "hdfs:///tmp/kddcup.data_10_percent.gz"

raw_rdd = sc.textFile(input_file)


Retrieving version and configuration information

sc.version
sc.getConf.get("spark.home")
System.getenv().get("PYTHONPATH")
System.getenv().get("SPARK_HOME")
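The calls above are written for the Scala shell; in a %pyspark paragraph a rough equivalent is this sketch, which uses only the standard sc variable and environment variables:

import os

print sc.version
print os.environ.get("SPARK_HOME")
print os.environ.get("PYTHONPATH")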


Count the number of rows in the dataset

print raw_rdd.count()


Inspect what the data looks like

print raw_rdd.take(5)


Spreading data across the cluster

a = range(100)

data = sc.parallelize(a)


Filtering lines in the data

normal_raw_rdd = raw_rdd.filter(lambda x: 'normal.' in x)


Count the filtered RDD

normal_count = normal_raw_rdd.count()

print normal_count


Importing local libraries

from pprint import pprint

csv_rdd = raw_rdd.map(lambda x: x.split(","))

head_rows = csv_rdd.take(5)

pprint(head_rows[0])


Using non-lambda functions in transformations

def parse_interaction(line):
    elems = line.split(",")
    tag = elems[41]
    return (tag, elems)

key_csv_rdd = raw_rdd.map(parse_interaction)
head_rows = key_csv_rdd.take(5)
pprint(head_rows[0])


Using collect() on the Spark driver

all_raw_rdd = raw_rdd.collect()


Extracting a smaller sample of the RDD

raw_rdd_sample = raw_rdd.sample(False, 0.1, 1234)
print raw_rdd_sample.count()
print raw_rdd.count()


Local sample for local processing

raw_data_local_sample = raw_rdd.takeSample(False, 1000, 1234)

normal_data_sample = [x.split(",") for x in raw_data_local_sample if "normal." in x]

normal_data_sample_size = len(normal_data_sample)

normal_ratio = normal_data_sample_size/1000.0

print normal_ratio


Subtracting one RDD from another

attack_raw_rdd = raw_rdd.subtract(normal_raw_rdd)

print attack_raw_rdd.count()


Listing the distinct protocols

protocols = csv_rdd.map(lambda x: x[1]).distinct()

print protocols.collect()


Listing the distinct services

services = csv_rdd.map(lambda x: x[2]).distinct()
print services.collect()


Cartesian product of two RDDs

product = protocols.cartesian(services).collect()

print len(product)


Filtering out the normal and attack events

normal_csv_data = csv_rdd.filter(lambda x: x[41]=="normal.")
attack_csv_data = csv_rdd.filter(lambda x: x[41]!="normal.")


Calculating total normal event duration and attack event duration

normal_duration_data = normal_csv_data.map(lambda x: int(x[0]))
total_normal_duration = normal_duration_data.reduce(lambda x, y: x + y)
print total_normal_duration

attack_duration_data = attack_csv_data.map(lambda x: int(x[0]))
total_attack_duration = attack_duration_data.reduce(lambda x, y: x + y)
print total_attack_duration


Calculating average duration of normal and attack events

normal_count = normal_duration_data.count()
attack_count = attack_duration_data.count()

print round(total_normal_duration/float(normal_count),3)
print round(total_attack_duration/float(attack_count),3)


Use the aggregate() function to calculate the average in a single scan

normal_sum_count = normal_duration_data.aggregate(
    (0, 0),  # the initial value
    (lambda acc, value: (acc[0] + value, acc[1] + 1)),  # combine value with accumulator
    (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))  # combine accumulators

print round(normal_sum_count[0]/float(normal_sum_count[1]),3)

attack_sum_count = attack_duration_data.aggregate(
    (0, 0),  # the initial value
    (lambda acc, value: (acc[0] + value, acc[1] + 1)),  # combine value with accumulator
    (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))  # combine accumulators

print round(attack_sum_count[0]/float(attack_sum_count[1]),3)


Creating a key, value RDD

key_value_rdd = csv_rdd.map(lambda x: (x[41], x))

print key_value_rdd.take(1)


Reduce by key and aggregate

key_value_duration = csv_rdd.map(lambda x: (x[41], float(x[0])))
durations_by_key = key_value_duration.reduceByKey(lambda x, y: x + y)

print durations_by_key.collect()


Count by key

counts_by_key = key_value_rdd.countByKey()
print counts_by_key


Combine by key to get duration and attempts by type

sum_counts = key_value_duration.combineByKey(
    (lambda x: (x, 1)),  # the initial value, with value x and count 1
    (lambda acc, value: (acc[0] + value, acc[1] + 1)),  # combine a value with the accumulator: sum values, increment count
    (lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])))  # combine accumulators

print sum_counts.collectAsMap()


Sorted average duration by type

duration_means_by_type = sum_counts \
    .map(lambda (key, value): (key, round(value[0]/value[1], 3))) \
    .collectAsMap()

# Print them sorted
for tag in sorted(duration_means_by_type, key=duration_means_by_type.get, reverse=True):
    print tag, duration_means_by_type[tag]


Creating a Schema and creating a DataFrame

from pyspark.sql import Row  # needed when Row is not already in scope

row_data = csv_rdd.map(lambda p: Row(
    duration=int(p[0]),
    protocol_type=p[1],
    service=p[2],
    flag=p[3],
    src_bytes=int(p[4]),
    dst_bytes=int(p[5])
))

interactions_df = sqlContext.createDataFrame(row_data)
interactions_df.registerTempTable("interactions")


Querying with SQL

SELECT duration, dst_bytes FROM interactions
WHERE protocol_type = 'tcp'
AND duration > 1000
AND dst_bytes = 0
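The same query can be run from a %pyspark paragraph against the registered temp table; a minimal sketch:

tcp_interactions = sqlContext.sql("""
  SELECT duration, dst_bytes FROM interactions
  WHERE protocol_type = 'tcp' AND duration > 1000 AND dst_bytes = 0
""")
tcp_interactions.show()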


Printing the Schema

interactions_df.printSchema()


Counting events by protocol

interactions_df.select("protocol_type", "duration", "dst_bytes") \
    .groupBy("protocol_type").count().show()

interactions_df.select("protocol_type", "duration", "dst_bytes") \
    .filter(interactions_df.duration > 1000) \
    .filter(interactions_df.dst_bytes == 0) \
    .groupBy("protocol_type").count().show()


Modifying the labels

def get_label_type(label):
    if label != "normal.":
        return "attack"
    else:
        return "normal"

row_labeled_data = csv_rdd.map(lambda p: Row(
    duration=int(p[0]),
    protocol_type=p[1],
    service=p[2],
    flag=p[3],
    src_bytes=int(p[4]),
    dst_bytes=int(p[5]),
    label=get_label_type(p[41])
))
interactions_labeled_df = sqlContext.createDataFrame(row_labeled_data)


Counting by labels

interactions_labeled_df.select("label", "protocol_type") \
    .groupBy("label", "protocol_type").count().show()


Group by label and protocol

interactions_labeled_df.select("label", "protocol_type") \
    .groupBy("label", "protocol_type").count().show()

interactions_labeled_df.select("label", "protocol_type", "dst_bytes") \
    .groupBy("label", "protocol_type", interactions_labeled_df.dst_bytes == 0).count().show()


Questions


More resources

• More tutorials: http://hortonworks.com/tutorials

• Hortonworks gallery: http://hortonworks-gallery.github.io

My co-ordinates

• Twitter: @saptak

• Website: http://saptak.in

