Cloud Computing Other High-level parallel processing languages Keke Chen


Outline
Sawzall
Dryad and DryadLINQ (Microsoft, abandoned)
Hive

Sawzall
Simplifies MapReduce programming
Programs are built from filters + aggregators
Filters play the role of mappers; aggregators play the role of reducers

Example (figure): a Sawzall program with the parts run by the mappers and by the reducers labeled. The program converts each input record to a float.

Input
A Sawzall program works on a single record at a time
It acts as a filter, filtering through the data stream
The input can be parsed into values (e.g., float) or into a data structure
x: float = input;
General form: variable: type = input;

Aggregators
Definition: table agg_name of data_type/variable
Examples:
c: table collection of string;
S: table sample(100) of string;
S: table sum of {count: int, revenue: float};

More aggregators: maximum, quantile, top, unique
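The filter-plus-aggregator model above can be imitated in a few lines of Python. This is only a sketch of the idea, not Sawzall itself: the class names `SumTable` and `CollectionTable` and the function `run_filter` are hypothetical stand-ins, and the `for` loop stands in for the runtime that feeds records to the filter one at a time.

```python
class SumTable:
    """Hypothetical stand-in for a Sawzall `table sum of ...` aggregator."""

    def __init__(self):
        self.value = 0

    def emit(self, x):
        self.value += x


class CollectionTable:
    """Hypothetical stand-in for a Sawzall `table collection of ...` aggregator."""

    def __init__(self):
        self.items = []

    def emit(self, x):
        self.items.append(x)


def run_filter(records):
    """Apply the per-record filter to every record and return the aggregates."""
    count = SumTable()        # count: table sum of int;
    total = SumTable()        # total: table sum of float;
    raw = CollectionTable()   # raw: table collection of string;
    for record in records:    # the runtime hands the filter one record at a time
        x = float(record)     # x: float = input;
        count.emit(1)         # emit count <- 1;
        total.emit(x)         # emit total <- x;
        raw.emit(record)      # emit raw <- input;
    return count.value, total.value, raw.items
```

The filter body never looks at more than one record, which is what lets the real system run it on many machines at once and leave the combining to the aggregators.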

Indexed aggregators
Similar to "group by"; the index is the group id
Example:
t1: table sum[country: string] of int;
country: string = input;
emit t1[country] <- 1;
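To see why an indexed sum table behaves like a "group by", here is a small Python sketch. It assumes each input record is just a country string, as in the example above; `filter_shard` and `aggregate` are hypothetical names that mimic the mapper side and the reducer side respectively.

```python
from collections import Counter


def filter_shard(records):
    """Mapper side: run the filter over one shard and keep partial sums per index."""
    t1 = Counter()                  # t1: table sum[country: string] of int;
    for record in records:
        country = record.strip()    # country: string = input;
        t1[country] += 1            # emit t1[country] <- 1;
    return t1


def aggregate(partials):
    """Reducer side: merge the partial tables produced by all shards."""
    total = Counter()
    for partial in partials:
        total.update(partial)       # sums are associative, so merge order is irrelevant
    return dict(total)
```

For instance, `aggregate([filter_shard(["US", "CN"]), filter_shard(["US"])])` groups the three records by country across two shards.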

More examples
proto "querylog.proto"
queries_per_degree: table sum[lat: int][lon: int] of int;
log_record: QueryLogProto = input;
loc: Location = locationinfo(log_record.ip);
emit queries_per_degree[int(loc.lat)][int(loc.lon)] <- 1;

Performance
Single-CPU speed: about 51 times slower than compiled C++

Dryad and DryadLINQ
Dryad
Provides a low-level parallel data flow processing interface
Acyclic data flow graphs
Data communication methods include pipes, file-based, message, and shared-memory channels
DryadLINQ
A high-level language for application developers
It hides the data flow details

Job = Directed Acyclic Graph
Processing vertices
Channels (file, pipe, shared memory)
Inputs
Outputs
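The job-as-DAG idea can be sketched with Python's standard `graphlib` module. This is an in-process toy, not Dryad's API: `run_job`, the vertex names, and the use of Python return values as "channels" are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def run_job(vertices, channels, inputs):
    """Run a toy data flow job.

    vertices: dict mapping vertex name -> function taking a list of incoming values
    channels: list of (source, destination) edges between vertices
    inputs:   dict mapping each source vertex name -> its external input
    """
    upstream = {name: [] for name in vertices}
    for src, dst in channels:
        upstream[dst].append(src)

    results = {}
    # static_order() yields every vertex after all of its predecessors
    # and raises CycleError if the graph is not acyclic.
    for name in TopologicalSorter(upstream).static_order():
        if upstream[name]:
            args = [results[src] for src in upstream[name]]
        else:
            args = [inputs[name]]
        results[name] = vertices[name](args)
    return results


# Hypothetical three-stage job: two readers feed a merge vertex, which feeds a counter.
job_vertices = {
    "read_a": lambda args: args[0],
    "read_b": lambda args: args[0],
    "merge": lambda args: sorted(args[0] + args[1]),
    "count": lambda args: len(args[0]),
}
job_channels = [("read_a", "merge"), ("read_b", "merge"), ("merge", "count")]
job_results = run_job(job_vertices, job_channels, {"read_a": [3, 1], "read_b": [2]})
```

A real job manager would schedule independent vertices on different machines and pick a channel type per edge; the topological order here only captures the dependency constraint.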

Runtime
Services: name server, daemon
Job Manager (JM)
Centralized coordinating process
User application to construct the graph
Linked with Dryad libraries for scheduling vertices
Vertex executable
Dryad libraries to communicate with the JM
User application sees channels in/out
Arbitrary application code; can use the local FS

Graph operators

Hive
Developed by Facebook (open source)
Mimics the SQL language
Built on Hadoop/MapReduce

Hive data model: tables, partitions, buckets
Table
Similar to a DB table; stored in Hadoop directories
Built-in compression and serialization/deserialization
Partitions
Groups in the table
Each is a subdirectory in the table directory
Buckets
Files in the partition directory
Key (column) based partition
Layout: /table/partition/bucket1
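As a rough illustration of the layout above, the sketch below maps a row to a file path by hashing its bucketing column. The function name `bucket_path`, the CRC32 hash, and the `bucketN` file names are assumptions chosen to mirror the slide; they are not Hive's actual hash function or file naming.

```python
import zlib


def bucket_path(table, partition, key, num_buckets):
    """Return an illustrative /table/partition/bucketN path for a bucketing key."""
    # Assumption: any deterministic hash works for the sketch; CRC32 is used
    # because it gives the same result on every run.
    bucket = zlib.crc32(str(key).encode("utf-8")) % num_buckets
    return f"/{table}/{partition}/bucket{bucket + 1}"
```

Because the hash is deterministic, every row with the same key lands in the same file, so a query that samples or joins on that column only needs to read the matching bucket in each partition.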

Hive data model: column types
Integers, floating point numbers, generic strings, dates, and booleans
Nestable collection types: array and

Architecture
The metastore stores the schemas of databases. It uses a non-HDFS data store.

Query processing
Steps (similar to a DBMS):
Parser
Semantic analyzer
Logical plan generator (algebra tree)
Optimizer
Physical plan generator (to MapReduce jobs)

Operations: DDL and DML
HiveQL: SQL-like, with slightly different syntax
User-defined filtering and aggregation functions: Java only
Map/reduce plugin for streaming processing: can be implemented in any language
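A streaming plugin is simply a program that reads rows on standard input and writes rows on standard output. The sketch below assumes tab-separated columns in the order (userid, status, ds), matching the status_updates table used in the example that follows in this deck; the function name `transform` and the word-count logic are hypothetical.

```python
import sys


def transform(lines):
    """Map each (userid, status, ds) row to (userid, number of words in status)."""
    for line in lines:
        userid, status, ds = line.rstrip("\n").split("\t")
        yield f"{userid}\t{len(status.split())}\n"


def main():
    # When used as a streaming script, rows arrive on stdin and leave on stdout.
    sys.stdout.writelines(transform(sys.stdin))
```

Calling `main()` at the bottom of the script turns it into a standalone streaming step; keeping the logic in `transform` makes it easy to check without a cluster.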

Example: Facebook status updates
Tables:
status_updates(userid int, status string, ds string)
profiles(userid int, school string, gender int)
Operations
Load data:
LOAD DATA LOCAL INPATH '/logs/status_updates' INTO TABLE status_updates PARTITION (ds='2009-03-20')
Count status updates by school and by gender
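The counting operation amounts to a join of the two tables on userid followed by two group-by counts. Since the query itself is not shown here, the following Python sketch only illustrates that logic on in-memory tuples; `count_updates` is a hypothetical name and the inner-join behavior (rows without a profile are dropped) is an assumption.

```python
from collections import Counter


def count_updates(status_updates, profiles, ds):
    """Count one day's status updates grouped by school and by gender.

    status_updates: iterable of (userid, status, ds) tuples
    profiles:       iterable of (userid, school, gender) tuples
    """
    profile_by_user = {userid: (school, gender) for userid, school, gender in profiles}
    by_school = Counter()
    by_gender = Counter()
    for userid, status, row_ds in status_updates:
        # Keep only the requested partition and users that have a profile (inner join).
        if row_ds != ds or userid not in profile_by_user:
            continue
        school, gender = profile_by_user[userid]
        by_school[school] += 1
        by_gender[gender] += 1
    return dict(by_school), dict(by_gender)
```

Both counts are produced in one pass over the joined rows, which is the kind of work sharing a query engine can exploit when two aggregations read the same input.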

More query examples

Query examples

Query examples using Hadoop streaming
