Comp6611 Course LectureBig data applications
Yang PENG Network and System LabCSE, HKUST
Monday, March 11, [email protected]
Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License)
Today's Topics
MapReduce
- Background information/overview
- Map and Reduce -- from a programmer's perspective
- Architecture and workflow -- a global overview
- Virtues and defects
- Improvement
Spark
Background
MapReduce
Spark
MapReduce Background
Before MapReduce, large-scale data processing was difficult:
- Managing parallelization and distribution
- Application development was tedious and hard to debug
- Resource scheduling and load-balancing
- Data storage and distribution
  - Distributed file system: "Moving computation is cheaper than moving data."
- Fault/crash tolerance
- Scalability
Does the "divide and conquer" paradigm still work for big data?
[Figure: a work item is partitioned into parts w1, w2, w3, processed by parallel workers, and the partial results are combined into the final result.]
Programming Model
- Opportunity: design a software abstraction that undertakes the divide and conquer and reduces programmers' workload for
  - resource management
  - task scheduling
  - distributed synchronization and communication
- Functional programming, which has a long history, provides higher-order functions that support divide and conquer:
  - Map: do something to everything in a list
  - Fold: combine the results of a list in some way
[Figure: the abstraction layer sits between the application and the cluster of computers.]
Map
Map is a higher-order function. How map works:
- The function is applied to every element in a list
- The result is a new list
[Figure: f applied to each element of the list.]
Fold
Fold is also a higher-order function. How fold works:
- The accumulator is set to an initial value
- The function is applied to a list element and the accumulator
- The result is stored in the accumulator
- This repeats for every item in the list
- The result is the final value in the accumulator
[Figure: f folds each element into the accumulator, from the initial value to the final value.]
Map/Fold in Action
Simple map example:
  (map (lambda (x) (* x x)) '(1 2 3 4 5))  => '(1 4 9 16 25)
Fold examples:
  (fold + 0 '(1 2 3 4 5))  => 15
  (fold * 1 '(1 2 3 4 5))  => 120
Sum of squares:
  (define (sum-of-squares v)
    (fold + 0 (map (lambda (x) (* x x)) v)))
  (sum-of-squares '(1 2 3 4 5))  => 55
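The same map/fold pattern can be sketched in Python, as a minimal translation of the Scheme examples above (`fold` corresponds to `functools.reduce` with an initial accumulator):

```python
# map corresponds to Python's map(); fold corresponds to functools.reduce()
# with an explicit initial value for the accumulator.
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4, 5]))          # [1, 4, 9, 16, 25]
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)     # 15
product = reduce(lambda acc, x: acc * x, [1, 2, 3, 4, 5], 1)   # 120

def sum_of_squares(v):
    # Compose the two: map each element, then fold the results together.
    return reduce(lambda acc, x: acc + x, map(lambda x: x * x, v), 0)

print(sum_of_squares([1, 2, 3, 4, 5]))  # 55
```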
MapReduce
Programmers specify two functions:
  map (k1, v1) → list(k2, v2)
  reduce (k2, list(v2)) → list(v2)

function map(String name, String document):
  // K1 name: document name
  // V1 document: document contents
  for each word w in document:
    emit(w, 1)

function reduce(String word, Iterator partialCounts):
  // K2 word: a word
  // list(V2) partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit(word, sum)
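The pseudocode above can be simulated in a single Python process; a minimal sketch, where `run_job` and an in-memory dict stand in for the framework's distributed shuffle:

```python
# Single-process sketch of MapReduce WordCount: the "shuffle" that
# aggregates values by key is simulated with a defaultdict.
from collections import defaultdict

def map_fn(name, document):
    # K1 = document name, V1 = contents; emit (word, 1) pairs
    return [(w, 1) for w in document.split()]

def reduce_fn(word, partial_counts):
    return (word, sum(partial_counts))

def run_job(documents):
    # Map phase
    intermediate = []
    for name, contents in documents.items():
        intermediate.extend(map_fn(name, contents))
    # Shuffle phase: aggregate values by key (the barrier in the slides)
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = run_job({"d1": "big data big compute", "d2": "big data"})
print(counts)  # {'big': 3, 'data': 2, 'compute': 1}
```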
An implementation of WordCount
It's just divide and conquer!
[Figure: initial key/value pairs from the data store feed parallel map tasks; each map emits (k1, values…), (k2, values…), (k3, values…); a barrier aggregates values by key; reduce tasks then compute the final values for k1, k2, and k3.]
Behind the scenes…
Programming interface
- input reader
- Map function
- partition function
  - Given a key and the number of reducers, returns the index of the desired reducer.
  - Used for load-balance, e.g. a hash function
- compare function
  - Used to sort the map output; provides the ordering guarantee
- Reduce function
- output writer
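A minimal sketch of such a partition function, hash-based as in Hadoop's default `HashPartitioner` (the function name here is illustrative):

```python
# Hash partition function: given a key and the number of reducers,
# return the index of the reducer that will receive this key.
import zlib

def hash_partition(key, num_reducers):
    # The same key always maps to the same reducer, so all values for a
    # key meet in one place; a good hash spreads keys evenly for load
    # balance. crc32 is used here only to keep the sketch deterministic.
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```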
Output of a Hadoop job
ypeng@vm115:~/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hduser/wordcount/15G-enwiki-input /user/hduser/wordcount/15G-enwiki-output
13/01/16 07:00:48 INFO input.FileInputFormat: Total input paths to process : 1
13/01/16 07:00:49 INFO mapred.JobClient: Running job: job_201301160607_0003
13/01/16 07:00:50 INFO mapred.JobClient: map 0% reduce 0%
.........................
13/01/16 07:01:50 INFO mapred.JobClient: map 18% reduce 0%
13/01/16 07:01:52 INFO mapred.JobClient: map 19% reduce 0%
13/01/16 07:02:06 INFO mapred.JobClient: map 20% reduce 0%
13/01/16 07:02:08 INFO mapred.JobClient: map 20% reduce 1%
13/01/16 07:02:10 INFO mapred.JobClient: map 20% reduce 2%
.........................
13/01/16 07:06:41 INFO mapred.JobClient: map 99% reduce 32%
13/01/16 07:06:47 INFO mapred.JobClient: map 100% reduce 33%
13/01/16 07:06:55 INFO mapred.JobClient: map 100% reduce 39%
.........................
13/01/16 07:07:21 INFO mapred.JobClient: map 100% reduce 99%
13/01/16 07:07:31 INFO mapred.JobClient: map 100% reduce 100%
13/01/16 07:07:43 INFO mapred.JobClient: Job complete: job_201301160607_0003
(Continued on the next slide.)
Progress
Counters in a Hadoop job
13/01/16 07:07:43 INFO mapred.JobClient: Counters: 18
13/01/16 07:07:43 INFO mapred.JobClient: Job Counters
13/01/16 07:07:43 INFO mapred.JobClient: Launched reduce tasks=24
13/01/16 07:07:43 INFO mapred.JobClient: Rack-local map tasks=17
13/01/16 07:07:43 INFO mapred.JobClient: Launched map tasks=249
13/01/16 07:07:43 INFO mapred.JobClient: Data-local map tasks=203
13/01/16 07:07:43 INFO mapred.JobClient: FileSystemCounters
13/01/16 07:07:43 INFO mapred.JobClient: FILE_BYTES_READ=12023025990
13/01/16 07:07:43 INFO mapred.JobClient: HDFS_BYTES_READ=15492905740
13/01/16 07:07:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=14330761040
13/01/16 07:07:43 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=752814339
13/01/16 07:07:43 INFO mapred.JobClient: Map-Reduce Framework
13/01/16 07:07:43 INFO mapred.JobClient: Reduce input groups=39698527
13/01/16 07:07:43 INFO mapred.JobClient: Combine output records=508662829
13/01/16 07:07:43 INFO mapred.JobClient: Map input records=279422018
13/01/16 07:07:43 INFO mapred.JobClient: Reduce shuffle bytes=2647359503
13/01/16 07:07:43 INFO mapred.JobClient: Reduce output records=39698527
13/01/16 07:07:43 INFO mapred.JobClient: Spilled Records=828280813
13/01/16 07:07:43 INFO mapred.JobClient: Map output bytes=24932976267
13/01/16 07:07:43 INFO mapred.JobClient: Combine input records=2813475352
13/01/16 07:07:43 INFO mapred.JobClient: Map output records=2376465967
13/01/16 07:07:43 INFO mapred.JobClient: Reduce input records=71653444
Summary of counters in job
Master in MapReduce
Resource Management
- Maintains the current resource usage of each Worker (CPU, RAM, used & free disk space, etc.)
- Examines workers for failure periodically.
Task Scheduling
- "Moving computation is cheaper than moving data."
- Map and reduce tasks are assigned to idle Workers.
- Tasks on failed workers are re-scheduled.
- When a job is close to the end, it launches backup tasks.
Counters
- provide interactive job progress.
- store the occurrences of various events.
- are helpful for performance tuning.
Data-oriented Map scheduling
[Figure: six workers (Workers 1-3 on Rack 1, Workers 4-6 on Rack 2) behind two switches, each storing replicas of input splits 1-5. The master launches each map on a worker holding its split: map 1 on Worker 3, map 2 on Worker 4, map 3 on Worker 1, map 4 on Worker 2, map 5 on Worker 5.]
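The locality preference sketched above can be written out as a small scheduling routine. This is a sketch only; the function and worker names are illustrative, not from an actual implementation:

```python
# Data-locality-aware map scheduling: prefer an idle worker that stores
# the split (data-local), else one on the same rack (rack-local),
# else any idle worker (non-local).
def schedule_map(split_replicas, racks, idle_workers):
    # split_replicas: set of workers holding a replica of this split
    # racks: mapping worker -> rack id
    for w in idle_workers:
        if w in split_replicas:                 # data-local
            return w
    replica_racks = {racks[w] for w in split_replicas}
    for w in idle_workers:
        if racks[w] in replica_racks:           # rack-local
            return w
    return idle_workers[0] if idle_workers else None  # non-local

racks = {"w1": 1, "w2": 1, "w3": 1, "w4": 2, "w5": 2, "w6": 2}
print(schedule_map({"w3", "w5"}, racks, ["w2", "w3", "w6"]))  # w3 (data-local)
```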
Data flow in MapReduce jobs
[Figure: a Mapper writes output to a circular in-memory buffer, spills it to disk, and merges the spills (optionally through a Combiner) into intermediate files on disk; a Reducer fetches intermediate files from this Mapper and the other Mappers, alongside the other Reducers fetching their own partitions.]
[Figure: a map task's input split read from GFS may be local, rack-local, or non-local.]
Map internal
- The map phase reads the task's input split from GFS, parses it into records (key/value pairs), and applies the map function to each record.
- After the map function has been applied to each record, the commit phase registers the final output with the Master, which will tell the reduce tasks the location of the map output.
Reduce internal
- The shuffle phase fetches the reduce task's input data.
- The sort phase groups records with the same key together.
- The reduce phase applies the user-defined reduce function to each key and its corresponding list of values.
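The three phases above can be sketched in a few lines of Python (illustrative only; `reduce_task` stands in for the framework's reduce side):

```python
# Reduce side of a MapReduce job: sort fetched records so equal keys are
# adjacent, then group and apply the user reduce function per key.
from itertools import groupby
from operator import itemgetter

def reduce_task(fetched, reduce_fn):
    # fetched: (key, value) pairs pulled from all map outputs (shuffle phase)
    ordered = sorted(fetched, key=itemgetter(0))            # sort phase
    return [reduce_fn(k, [v for _, v in group])             # reduce phase
            for k, group in groupby(ordered, key=itemgetter(0))]

pairs = [("b", 1), ("a", 1), ("b", 1), ("a", 1), ("a", 1)]
print(reduce_task(pairs, lambda k, vs: (k, sum(vs))))  # [('a', 3), ('b', 2)]
```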
Backup Tasks
- There are barriers in a MapReduce job:
  - No reduce function executes until all maps finish.
  - The job cannot complete until all reduces finish.
- The execution time of a job is severely lengthened if a task is blocked.
- The Master schedules backup/speculative tasks for unfinished ones when the job is close to the end.
- A job takes 44% longer if backup tasks are disabled.
[Figure: map tasks run in parallel, then reduce tasks; the job completes only after the last task finishes.]
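The backup-task idea can be sketched as a small selection routine. The thresholds and names here are illustrative assumptions, not values from an actual scheduler:

```python
# Speculative execution sketch: near the end of the job, pick the
# slowest still-running tasks as candidates for backup copies.
def pick_backup_tasks(progress, done_fraction_needed=0.9, slow_below=0.5):
    # progress: task id -> fraction complete (1.0 means finished)
    finished = sum(1 for p in progress.values() if p >= 1.0)
    if finished / len(progress) < done_fraction_needed:
        return []                       # job not yet close to the end
    return [t for t, p in progress.items() if p < slow_below]

tasks = {f"m{i}": 1.0 for i in range(9)}
tasks["m9"] = 0.2                       # one straggler
print(pick_backup_tasks(tasks))         # ['m9']
```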
Virtues and defects of MR
Virtues
- Handles large-scale data
- Programming friendly: implicit parallelism
- Data-locality
- Fault/crash tolerance
- Scalability
- Open-source with a good ecosystem[1]
Defects
- Bad for iterative ML algorithms
- Not sure
[1] http://docs.hortonworks.com/CURRENT/index.htm#About_Hortonworks_Data_Platform/Understanding_Hadoop_Ecosystem.htm
Network traffic in MapReduce
1. A Map may read its split from a remote ChunkServer
2. A Reduce copies the output of the Maps
3. Reduce output is written to GFS
Disk R/W in MapReduce
1. A ChunkServer reads a local block for remote split fetching
2. Intermediate results are spilled to disk
3. The copied partition is written to local disk
4. The result output is written to the local ChunkServer
5. The result output is written to a remote ChunkServer
Iterative MapReduce
Performing a graph algorithm using MapReduce.
Motivation of Spark
- Iterative algorithms (machine learning, graphs)
- Interactive data mining tools (R, Excel, Python)
Programming Model
- Fine-grained: the computing outputs of every iteration are distributed and stored to stable storage
- Coarse-grained: only log the transformations used to build a dataset (i.e. its lineage)
Resilient distributed datasets (RDDs)
- Immutable, partitioned collections of objects
- Created through parallel transformations (map, filter, groupBy, join, …) on data in stable storage
- Can be cached for efficient reuse
Actions on RDDs
- count, reduce, collect, save, …
Spark Operations
Transformations (define a new RDD):
  map, filter, sample, groupByKey, reduceByKey, sortByKey, flatMap, union, join, cogroup, cross, mapValues
Actions (return a result to the driver program):
  collect, reduce, count, save, lookupKey
Example: Log Mining
Load error messages from a log into memory, then interactively search for various patterns

lines = spark.textFile("hdfs://...")           // Base RDD
errors = lines.filter(_.startsWith("ERROR"))   // Transformed RDD
messages = errors.map(_.split('\t')(2))
cachedMsgs = messages.cache()

cachedMsgs.filter(_.contains("foo")).count     // Action
cachedMsgs.filter(_.contains("bar")).count
. . .

[Figure: the Driver sends tasks to three Workers, each reading one block of the file (Block 1-3) and holding a cache (Cache 1-3); results flow back to the Driver.]

Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)
Result: scaled to 1 TB of data in 5-7 sec (vs 170 sec for on-disk data)
RDD Fault Tolerance
RDDs maintain lineage information that can be used to reconstruct lost partitions.

Ex:
messages = textFile(...).filter(_.startsWith("ERROR"))
                        .map(_.split('\t')(2))

[Figure: HDFS File → filter (func = _.contains(...)) → Filtered RDD → map (func = _.split(...)) → Mapped RDD]
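Lineage-based recovery can be sketched as replaying the recorded transformations over the base data; a minimal illustration (the helper name and sample data are assumptions):

```python
# Rebuild a lost partition by replaying its recorded lineage, mirroring
# the filter-then-map lineage in the example above.
def rebuild_partition(base_partition, lineage):
    part = base_partition
    for transform in lineage:          # replay transformations in order
        part = transform(part)
    return part

lineage = [
    lambda p: [line for line in p if line.startswith("ERROR")],  # filter
    lambda p: [line.split("\t")[2] for line in p],               # map
]
base = ["INFO\tx\tok", "ERROR\t1\tfoo", "ERROR\t2\tbar"]
print(rebuild_partition(base, lineage))  # ['foo', 'bar']
```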
Example: Logistic Regression
Goal: find the best line separating two sets of points
[Figure: a scatter of + and - points; a random initial line iteratively converges to the target separating line.]
Example: Logistic Regression

// Keep the variable "data" in memory
val data = spark.textFile(...).map(readPoint).cache()

var w = Vector.random(D)

for (i <- 1 to ITERATIONS) {
  val gradient = data.map(p =>
    (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}

println("Final w: " + w)
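The Scala loop above can be sketched in plain Python without Spark: the per-point gradient terms are the "map" step and their sum is the "reduce" step. The data and names here are illustrative:

```python
# Batch gradient descent for logistic regression, mirroring the Scala
# example: gradient = sum over points of (1/(1+exp(-y*(w.x))) - 1)*y*x.
import math
import random

random.seed(0)  # deterministic for the example

def train(points, dims, iterations):
    # points: list of (x_vector, label) with label in {-1, +1}
    w = [random.random() for _ in range(dims)]
    for _ in range(iterations):
        gradient = [0.0] * dims
        for x, y in points:                        # the "map" step
            dot = sum(wi * xi for wi, xi in zip(w, x))
            scale = (1.0 / (1.0 + math.exp(-y * dot)) - 1.0) * y
            for j in range(dims):                  # the "reduce" (+) step
                gradient[j] += scale * x[j]
        w = [wi - gi for wi, gi in zip(w, gradient)]   # w -= gradient
    return w

pts = [([1.0, 2.0], 1), ([-1.0, -2.0], -1), ([2.0, 1.0], 1), ([-2.0, -1.0], -1)]
w = train(pts, 2, 50)
# After training, the sign of w.x should classify every point correctly.
correct = all((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0) for x, y in pts)
print(correct)
```

Spark's advantage here is exactly the `cache()` call in the Scala version: the dataset stays in memory across iterations instead of being re-read from disk each pass, as in MapReduce.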
Logistic Regression Performance
[Figure: running time (s) vs number of iterations (1, 5, 10, 20, 30) for Hadoop and Spark. Hadoop: 127 s per iteration. Spark: 174 s for the first iteration, 6 s for further iterations.]
Spark Programming Interface (e.g. PageRank)

Representing RDDs

Spark Scheduler
- Dryad-like DAGs
- Pipelines functions within a stage
- Cache-aware work reuse & locality
- Partitioning-aware to avoid shuffles
[Figure: DAG of stages; shaded boxes denote cached data partitions.]
Behavior with Not Enough RAM
[Figure: iteration time (s) vs % of working set in memory: cache disabled 68.8, 25% cached 58.1, 50% cached 40.7, 75% cached 29.7, fully cached 11.5.]
Fault Recovery Results
[Figure: iteration time (s) over iterations 1-10, with a failure compared against no failure: 119, 57, 56, 58, 58, 81, 57, 59, 57, 59.]
Conclusion
- Both MapReduce and Spark are excellent big data frameworks: scalable, fault-tolerant, and programming friendly.
- In particular, Spark provides a more effective method for iterative computing jobs.
QUESTIONS? Thanks!