Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) Presented by Ankur Dave CS294-110, Fall 2015


Spark SQL: Relational Data Processing in Spark

(SIGMOD 2015)

Presented by Ankur Dave
CS294-110, Fall 2015

Problem

• Imperative ⇒ Declarative
• Semi-Structured Data & Advanced Analytics ⇒ ?
• No support in existing systems


Spark SQL

1. DataFrame API

2. Catalyst Optimizer


DataFrames

• Language-integrated declarative API
• Schema inference for semi-structured data
• Allows direct access to JVM objects
• Unlike other declarative systems
• Supports complex types like vectors
• Easy to add new types
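To make the schema-inference bullet concrete, here is a minimal sketch in plain Python of inferring a flat schema from JSON records. It is not Spark SQL's implementation: the type names and the helpers `infer_type`, `merge_types`, and `infer_schema` are hypothetical, and the "fall back to string on conflict" policy is an assumption chosen for illustration.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a simple (hypothetical) type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge_types(a, b):
    """Pick a type that can hold values of both a and b."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"long", "double"}:
        return "double"  # widen integers to floating point
    return "string"  # assumed fallback when the types conflict


def infer_schema(json_lines):
    """Scan every record once and merge the types seen for each field."""
    schema = {}
    for line in json_lines:
        for field, value in json.loads(line).items():
            seen = infer_type(value)
            schema[field] = merge_types(schema[field], seen) if field in schema else seen
    return schema


records = [
    '{"name": "a", "age": 30}',
    '{"name": "b", "age": 31.5, "tags": ["x"]}',
    '{"name": "c", "age": null}',
]
schema = infer_schema(records)
print(schema)
```

Fields that are missing or null in some records do not block inference, which is what makes the approach usable on semi-structured input.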


Catalyst Optimizer

• Rules written using Scala pattern matching
• Allows writing complex rules
• Example: join elimination
• Exploits structure of data source
• Example: predicate pushdown for Parquet
• Easy to add new data sources (e.g., Succinct)
• Code generation for expression evaluation
• Cost-based join algorithm selection (shuffle vs. BitTorrent broadcast)


Evaluation

• Gains over Spark and Shark due to code generation
• Optimizer less mature than Impala’s
• More important: ease of use and extensibility
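As a rough illustration of why code generation for expression evaluation can help, the sketch below contrasts interpreting an expression tree for every row with generating source code once and compiling it. It is plain Python under stated assumptions: the node classes and the helpers `interpret`, `gen_source`, and `compile_expr` are hypothetical, and Python's `eval` stands in for whatever compilation mechanism the real system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def interpret(node, row):
    """Walk the tree for each row: one type dispatch per node per row."""
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Attribute):
        return row[node.name]
    if isinstance(node, Add):
        return interpret(node.left, row) + interpret(node.right, row)
    raise TypeError(f"unknown node: {node!r}")


def gen_source(node):
    """Translate the tree into a Python expression string, once per query."""
    if isinstance(node, Literal):
        return repr(node.value)
    if isinstance(node, Attribute):
        return f"row[{node.name!r}]"
    if isinstance(node, Add):
        return f"({gen_source(node.left)} + {gen_source(node.right)})"
    raise TypeError(f"unknown node: {node!r}")


def compile_expr(node):
    """Compile the generated source into a function of one row."""
    source = f"lambda row: {gen_source(node)}"
    return eval(source), source


expr = Add(Add(Attribute("x"), Attribute("y")), Literal(1))
evaluate, source = compile_expr(expr)
print(source)
print(evaluate({"x": 2, "y": 3}), interpret(expr, {"x": 2, "y": 3}))
```

The compiled function does no tree traversal at run time, so the per-row cost is only the arithmetic itself.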


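Why Spark SQL?

[Comparison table; the row labels were images and did not survive extraction. Columns: "Easy to Program?" and "Advanced Analytics?". Cell values in extraction order: No, No, Yes, Yes, Yes, ?, Yes, ?]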


Lasting Impact?

• Extensible optimizer

• Language integration
