Spark Cluster with Elasticsearch Inside
Oscar Castañeda-Villagrán, Universidad del Valle de Guatemala

Spark Summit EU talk by Oscar Castaneda

Page 1: Spark Summit EU talk by Oscar Castaneda

Spark Cluster with Elasticsearch Inside

Oscar Castañeda-Villagrán Universidad del Valle de Guatemala

Page 2: Spark Summit EU talk by Oscar Castaneda

About

• Researcher at Universidad del Valle de Guatemala.

• Research Interests:
  • Program Transformation
  • Programming Education Research
  • Online Learning to Rank

Page 3: Spark Summit EU talk by Oscar Castaneda

Spark cluster …

http://bit.ly/2em6RUK

Page 4: Spark Summit EU talk by Oscar Castaneda

Spark cluster with …

http://bit.ly/2em6RUK

Page 5: Spark Summit EU talk by Oscar Castaneda

Spark cluster with Elasticsearch

http://bit.ly/2em6RUK

http://bit.ly/2ebM9HO

Page 6: Spark Summit EU talk by Oscar Castaneda

Spark cluster with Elasticsearch

http://bit.ly/2em6RUK

Page 7: Spark Summit EU talk by Oscar Castaneda

Spark cluster with Elasticsearch Inside!

Page 8: Spark Summit EU talk by Oscar Castaneda

Agenda

• Problem Statement and Motivation.

• Read/Write (internal) ES Server.

• Create ES Server inside Spark Cluster.

• Snapshot/Restore ES indices using S3.

• Demo: IndexTweetsLive on Spark with Elastic inside.

• Q&A

Page 9: Spark Summit EU talk by Oscar Castaneda

Problem Statement

• During development with ES-Hadoop it is cumbersome to have Elasticsearch running outside a Spark cluster.

Page 10: Spark Summit EU talk by Oscar Castaneda

Architecture

Restore ES snapshot

Read CSV files

Take ES snapshot

Restore ES snapshot

http://bit.ly/2e5H1jL

Page 11: Spark Summit EU talk by Oscar Castaneda

Architecture

Restore ES snapshot

Read CSV files

Take ES snapshot

Restore ES snapshot

Dev Ops

http://bit.ly/2e5H1jL

Page 12: Spark Summit EU talk by Oscar Castaneda

Motivation

• Control Elasticsearch instance during development.

• Reduce dependencies between teams during development.

• Use ES snapshots as interface between teams.

• Increase QA efficiency.

Page 13: Spark Summit EU talk by Oscar Castaneda

Native Integration

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

import org.elasticsearch.spark._

...

val conf = ...
val sc = new SparkContext(conf)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("arrival" -> "Otopeni", "SFO" -> "San Fran")

sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")

https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-write

saveToEs("spark/docs")

Write data to Elasticsearch

Page 14: Spark Summit EU talk by Oscar Castaneda

Native Integration

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

import org.elasticsearch.spark._

...

val conf = ...
val sc = new SparkContext(conf)

val RDD = sc.esRDD("radio/artists")

Read data from Elasticsearch

sc.esRDD("radio/artists")

https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-read
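As a rough illustration of what the read call gives back: with ES-Hadoop's native Spark support, `esRDD` yields pairs of document id and field map, and an overload accepts a query string. The app name, the `local[*]` master, the `es.nodes` value and the `?q=me*` query below are assumptions for the sketch, not part of the talk.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object ReadArtistsSketch {
  def main(args: Array[String]): Unit = {
    // Assumed settings: a single Elasticsearch node reachable on localhost:9200.
    val conf = new SparkConf()
      .setAppName("read-artists-sketch")
      .setMaster("local[*]")
      .set("es.nodes", "localhost")
      .set("es.port", "9200")
    val sc = new SparkContext(conf)

    // Each element is (document id, map of field name to value).
    val artists = sc.esRDD("radio/artists", "?q=me*")
    artists.take(5).foreach { case (id, fields) => println(s"$id -> $fields") }

    sc.stop()
  }
}
```

In a notebook where `sc` already exists, only the `esRDD` line and the loop would be needed.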

Page 15: Spark Summit EU talk by Oscar Castaneda

But where do you run Elasticsearch?

Page 16: Spark Summit EU talk by Oscar Castaneda

Why not run Elasticsearch inside Spark Cluster?*

* At least for development purposes.

Page 17: Spark Summit EU talk by Oscar Castaneda

How do you run Elasticsearch inside Spark Cluster?

Page 18: Spark Summit EU talk by Oscar Castaneda

Imports

http://bit.ly/2efaib4

http://bit.ly/2di0cFq

http://bit.ly/2ebM9HO

Page 19: Spark Summit EU talk by Oscar Castaneda

Setup Local ES

server.start()
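The slide only shows `server.start()`; the class behind `server` is not visible in this transcript. One plausible way to get an in-process node, assuming Elasticsearch 2.x is on the classpath (`NodeBuilder` was removed in 5.0), is sketched below. The `LocalEs` object, the cluster name and the temporary home directory are hypothetical.

```scala
import java.nio.file.Files
import org.elasticsearch.common.settings.Settings
import org.elasticsearch.node.{Node, NodeBuilder}

// Hypothetical helper, not the talk's actual `server` class.
object LocalEs {
  // Builds and starts an Elasticsearch 2.x node inside the calling JVM.
  def start(clusterName: String = "spark-dev"): Node = {
    val home = Files.createTempDirectory("es-home").toString
    val settings = Settings.settingsBuilder()
      .put("path.home", home)          // ES 2.x refuses to start without path.home
      .put("cluster.name", clusterName)
      .put("http.enabled", true)       // expose the REST API (port 9200 by default)
      .build()
    // node() both builds and starts the node; call close() on it to shut down.
    NodeBuilder.nodeBuilder().settings(settings).node()
  }
}
```

The node lives in whichever JVM calls `LocalEs.start()`, so its data disappears with that process unless a snapshot is taken first.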

Page 20: Spark Summit EU talk by Oscar Castaneda

Write to Local ES

saveToEs("tweets/hashtags")
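A minimal sketch of the write step against a local node, assuming it listens on `localhost:9200`. The sample hashtag documents, the app name and the `local[*]` master are made up for illustration; `es.nodes`, `es.port` and `es.index.auto.create` are standard ES-Hadoop settings.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object WriteHashtagsSketch {
  def main(args: Array[String]): Unit = {
    // Point ES-Hadoop at the local node (assumed address).
    val conf = new SparkConf()
      .setAppName("write-hashtags-sketch")
      .setMaster("local[*]")
      .set("es.nodes", "localhost")
      .set("es.port", "9200")
      .set("es.index.auto.create", "true")
    val sc = new SparkContext(conf)

    // Illustrative documents only; the talk's demo indexes live tweets.
    val hashtags = Seq(
      Map("tag" -> "spark", "count" -> 3),
      Map("tag" -> "elasticsearch", "count" -> 2)
    )
    sc.makeRDD(hashtags).saveToEs("tweets/hashtags")

    sc.stop()
  }
}
```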

Page 21: Spark Summit EU talk by Oscar Castaneda

Check results on local ES

GET

getUrlAsString("http://10.104.239.70:9200/_cat/indices?v")
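`getUrlAsString` is not defined anywhere in this transcript. A stand-in using only the Scala standard library could look like the sketch below; the `EsHttp` object and `catIndicesUrl` helper are hypothetical names.

```scala
import scala.io.Source

// Hypothetical stand-in for the helper used on the slide.
object EsHttp {
  // Builds the _cat/indices URL; ?v asks Elasticsearch to include column headers.
  def catIndicesUrl(host: String, port: Int = 9200): String =
    s"http://$host:$port/_cat/indices?v"

  // Issues a GET request and returns the response body as a string.
  def getUrlAsString(url: String): String = {
    val src = Source.fromURL(url, "UTF-8")
    try src.mkString finally src.close()
  }
}
```

Usage would be `println(EsHttp.getUrlAsString(EsHttp.catIndicesUrl("10.104.239.70")))`, which should list the `tweets` index once the write has completed.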

Page 22: Spark Summit EU talk by Oscar Castaneda

Snapshot to S3
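The slide's code is not captured in this transcript. The snapshot step can be sketched with the Elasticsearch snapshot REST API: register an `s3` repository, then create a snapshot in it. This assumes the node has an S3 repository plugin installed and AWS credentials configured; the repository, snapshot and bucket names are placeholders, and `EsSnapshot` is a hypothetical helper.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical helper built on the snapshot REST API.
object EsSnapshot {
  // JSON body that registers an S3-backed snapshot repository.
  def repoBody(bucket: String): String =
    s"""{"type":"s3","settings":{"bucket":"$bucket"}}"""

  // Sends a request and returns the response body; throws on HTTP error statuses.
  def send(method: String, url: String, body: String = ""): String = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod(method)
    if (body.nonEmpty) {
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      val out = conn.getOutputStream
      try out.write(body.getBytes(UTF_8)) finally out.close()
    }
    val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
    try src.mkString finally src.close()
  }

  // Registers the repository, then snapshots all indices and waits for completion.
  def snapshot(es: String, repo: String, name: String, bucket: String): String = {
    send("PUT", s"$es/_snapshot/$repo", repoBody(bucket))
    send("PUT", s"$es/_snapshot/$repo/$name?wait_for_completion=true")
  }
}
```

For example, `EsSnapshot.snapshot("http://localhost:9200", "s3_repo", "snap_1", "my-bucket")`, with all four arguments standing in for real values.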

Page 23: Spark Summit EU talk by Oscar Castaneda

Restore from S3
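The restore slide's code is likewise missing from the transcript. A sketch using the `_restore` endpoint of the snapshot REST API follows, assuming the same S3 repository is already registered on the target node. `EsRestore` and its parameter values are hypothetical. Note that Elasticsearch will not restore over an open index with the same name, so such an index has to be closed or deleted first.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical helper built on the snapshot REST API.
object EsRestore {
  // URL of the restore endpoint for a given repository and snapshot.
  def restoreUrl(es: String, repo: String, name: String): String =
    s"$es/_snapshot/$repo/$name/_restore?wait_for_completion=true"

  // JSON body limiting the restore to the given indices.
  def restoreBody(indices: Seq[String]): String = {
    val joined = indices.mkString(",")
    s"""{"indices":"$joined"}"""
  }

  // POSTs the restore request and returns the response body; throws on HTTP error statuses.
  def restore(es: String, repo: String, name: String, indices: Seq[String]): String = {
    val conn = new URL(restoreUrl(es, repo, name)).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/json")
    val out = conn.getOutputStream
    try out.write(restoreBody(indices).getBytes(UTF_8)) finally out.close()
    val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
    try src.mkString finally src.close()
  }
}
```

Because the snapshot lives in S3, a different team's cluster can call `restore` on it, which matches the Motivation slide's idea of using snapshots as the interface between teams.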

Page 24: Spark Summit EU talk by Oscar Castaneda

Demo!

Page 25: Spark Summit EU talk by Oscar Castaneda

What have we seen?

• How to Read/Write (internal) ES Server.

• How to create ES Server inside Spark Cluster.

• How to Snapshot/Restore ES indices using S3.

• Demo: IndexTweetsLive on Spark with Elastic inside.

Page 26: Spark Summit EU talk by Oscar Castaneda

Next Steps

• Spark 2.0

• Continuous Applications

• Elasticsearch 5.0

Page 27: Spark Summit EU talk by Oscar Castaneda

Q&A

Page 28: Spark Summit EU talk by Oscar Castaneda

THANK YOU.

Email: [email protected]
Twitter: @oscar_castaneda