Runs on unmodified Hadoop
Aimed at workloads where the data fits a star schema
Draws on existing techniques: columnar storage, tailored join plans, block iteration
Introduction
InputFormats and OutputFormats: an InputFormat implements two methods, getSplits() and getRecordReader()
MapRunners: drive execution of the map function over a split; the scheduler supports JVM reuse
Background
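The InputFormat contract mentioned above can be illustrated with a simplified sketch. This is not Hadoop's actual API (the real interfaces live in org.apache.hadoop.mapred and carry job-configuration and reporter parameters); it only mirrors the two-method shape of getSplits() and getRecordReader():

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for Hadoop's InputFormat/RecordReader contract,
// trimmed to the two methods named in the text (illustrative only).
interface RecordReader {
    String next();          // next record in the split, or null at end
}

interface InputFormat {
    List<int[]> getSplits(int fileLength, int splitSize); // {offset, length} pairs
    RecordReader getRecordReader(int[] split, String[] data);
}

class LineInputFormat implements InputFormat {
    // getSplits(): carve the input into ranges, one per map task.
    public List<int[]> getSplits(int fileLength, int splitSize) {
        List<int[]> splits = new ArrayList<>();
        for (int off = 0; off < fileLength; off += splitSize) {
            splits.add(new int[]{off, Math.min(splitSize, fileLength - off)});
        }
        return splits;
    }

    // getRecordReader(): iterate over the records inside one split.
    public RecordReader getRecordReader(int[] split, String[] data) {
        return new RecordReader() {
            int pos = split[0];
            public String next() {
                return pos < split[0] + split[1] ? data[pos++] : null;
            }
        };
    }
}
```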
Avoid I/O for columns that are not used
Store each column in a separate HDFS file
ColumnInputFormat (CIF) ensures that the different columns of a row are co-located on the same datanode
Columnar Storage
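The column-per-file idea can be sketched with a toy store where each column lives in its own array (standing in for one HDFS file per column), so a scan opens only the columns the query references. The names below are illustrative, not Clydesdale's actual classes:

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of per-column storage: one "file" (here an array) per
// column. Reading a row touches only the requested columns, so the
// I/O for unused columns is avoided entirely.
class ColumnStore {
    private final Map<String, long[]> columnFiles = new HashMap<>();

    void writeColumn(String name, long[] values) {
        columnFiles.put(name, values);
    }

    // Materialize row i from only the needed columns; the other
    // column "files" are never opened.
    long[] readRow(int i, String... neededColumns) {
        long[] row = new long[neededColumns.length];
        for (int c = 0; c < neededColumns.length; c++) {
            row[c] = columnFiles.get(neededColumns[c])[i];
        }
        return row;
    }
}
```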
SQL-like structured data processing
The map phase is responsible for joining the fact table with the dimension tables
The reduce phase is responsible for the grouping and aggregation
Join Strategy
Map phase: build a hash table for each dimension table, applying its predicates; each map task probes the hash tables with its input records and outputs the records that satisfy the join conditions, keyed on the subset of columns needed for grouping
Reduce phase: aggregate the values that share the same key
Sorting is done at the client
Execution process
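The build/probe/aggregate flow above can be sketched with plain HashMaps standing in for the per-node dimension hash tables and the reduce-side grouping (an illustrative sketch, not Clydesdale's code):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the star-join execution: build filtered dimension hash
// tables, probe them with fact rows in the map phase, then group and
// sum in the reduce phase.
class StarJoinSketch {
    // Map phase, step 1: hash each dimension table, keeping only rows
    // that pass that dimension's predicate (here: second column >= min).
    static Map<Integer, String> buildDimTable(int[][] dim, String[] attrs, int predicateMin) {
        Map<Integer, String> h = new HashMap<>();
        for (int i = 0; i < dim.length; i++)
            if (dim[i][1] >= predicateMin) h.put(dim[i][0], attrs[i]);
        return h;
    }

    // Map phase, step 2 + reduce phase: probe with each fact row; rows
    // that find a match are grouped on the dimension attribute and
    // their measure is summed (the reduce-side aggregation).
    static Map<String, Long> joinAndAggregate(int[][] fact, Map<Integer, String> dimTable) {
        Map<String, Long> groups = new HashMap<>();
        for (int[] row : fact) {                 // row = {dimKey, measure}
            String attr = dimTable.get(row[0]);  // null means the row fails the join
            if (attr != null) groups.merge(attr, (long) row[1], Long::sum);
        }
        return groups;
    }
}
```

Fact rows whose key misses every dimension hash table are dropped in the map phase, which is how the predicates on dimension tables prune the fact table early.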
Exploit multi-core parallelism: a single map task per node
Uses a custom MapRunner class to run a multi-threaded map task
MultiCIF packs multiple input splits into a single multi-split
The multi-split is shared across consecutive map tasks that run on the same node
Task scheduling
Block iteration
Optimizing for the Native Implementation
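The multi-threaded map task can be sketched with a thread pool draining a shared queue of splits, so a single task keeps all of a node's cores busy. This is a hypothetical stand-in using java.util.concurrent, not Clydesdale's MapRunner:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-threaded map runner: the splits of one multi-split
// go into a shared queue, and one worker thread per core drains it.
class MultiThreadedMapRunner {
    static long run(List<int[]> multiSplit, int cores) {
        BlockingQueue<int[]> queue = new LinkedBlockingQueue<>(multiSplit);
        AtomicLong total = new AtomicLong();   // stands in for the map output
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int c = 0; c < cores; c++) {
            pool.execute(() -> {
                int[] split;
                while ((split = queue.poll()) != null)
                    for (int v : split) total.addAndGet(v); // "map" each record
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total.get();
    }
}
```

Pulling splits from a shared queue rather than pre-partitioning them gives dynamic load balancing across cores, which matters when splits take uneven time to process.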
Schedule only one map task from the join job on a given node
Schedule subsequent map tasks on the node where the dimension hash table has already been built
Communicate to the map task the number of slots (processor cores) it can use on the node
Task Scheduling
Supports two join plans: repartition join and mapjoin
Repartition join is a robust technique that works with any combination of table sizes
Mapjoin is designed for the case where one table is significantly smaller than the other
Hive Background
Cluster A: 9 nodes, two quad-core processors, 16 GB memory, 8 x 250 GB disks, 1 Gb Ethernet switch
Cluster B: 42 nodes, two quad-core processors, 32 GB memory, 5 x 500 GB disks, 1 Gb Ethernet switch
Clydesdale ran on Hadoop 0.21, Hive on Hadoop 0.20.2
Workload storage format: the Clydesdale fact tables were stored in Multi-CIF, Hive used RCFile
Experimental Setup
Hive joins one dimension table at a time with the fact table
Hive maintains many copies of the hash table
Hive creates the hash tables on a single node and pays the cost of disseminating it to the entire cluster
Each task in Hive has to load and deserialize the hash table when it starts.
Result Analysis