Application-driven Energy-efficient Architecture Explorations for Big Data

Authors: Xiaoyan Gu, Rui Hou, Ke Zhang, Lixin Zhang, Weiping Wang (Institute of Computing Technology, Chinese Academy of Sciences)

Reviewed by: Siddharth Bhave (University of Washington, Tacoma)

Big Data

What is Big Data?

Problems with Big Data

Energy consumption

Velocity (operation latency and throughput)

Volume (storage capacity)

Variety

Managing Big Data Problems

Storage technologies

Partitioning

Multithreading

Parallel processing

Efficient architecture

Hadoop, MapReduce, Mahout

Find the bottleneck

Introduction

Big data management at the architecture level

Two architecture systems: Xeon-based cluster; Atom-based (micro-server) cluster

Comparison based on: energy consumption; execution time

Motivation

Ever-increasing data

Energy and time trade-off in Xeon-based and Atom-based clusters

Bottleneck caused by compression/decompression

Stateless data processing

Mastiff

Mastiff: the target application for performance analysis

Big data processing engine

Columnar store policy

Compression ratio by data size

System         3 GB data    100 GB data    500 GB data
Mastiff        0.54         0.53           0.518
Hadoop HDFS    0.72         0.71           0.7

Working flow of the Mastiff
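To make the compression-ratio table above concrete, here is a small sketch that turns the ratios into estimated stored sizes. It assumes the ratio means compressed size divided by raw size (the slides do not define it), and the function names are made up for illustration.

```python
# Compression ratios copied from the table above (raw data size in GB -> ratio).
RATIOS = {
    "Mastiff": {3: 0.54, 100: 0.53, 500: 0.518},
    "Hadoop HDFS": {3: 0.72, 100: 0.71, 500: 0.70},
}


def stored_size_gb(system: str, raw_gb: int) -> float:
    """Estimated stored size, ASSUMING ratio = compressed size / raw size."""
    return raw_gb * RATIOS[system][raw_gb]


def relative_saving(raw_gb: int) -> float:
    """Fraction of the HDFS stored size that Mastiff avoids at this data size."""
    mastiff = stored_size_gb("Mastiff", raw_gb)
    hdfs = stored_size_gb("Hadoop HDFS", raw_gb)
    return 1.0 - mastiff / hdfs


if __name__ == "__main__":
    for size in (3, 100, 500):
        print(
            f"{size} GB raw: Mastiff {stored_size_gb('Mastiff', size):.2f} GB, "
            f"HDFS {stored_size_gb('Hadoop HDFS', size):.2f} GB, "
            f"saving {relative_saving(size):.1%}"
        )
```

Under that assumption, Mastiff stores roughly a quarter less data than Hadoop HDFS at each of the three sizes.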

Methodology

TPC-H test benchmark of queries and concurrent data

1 TB of verification data

2 cases - data load and data query

Fluke NORMA 4000

Average cases and median results are reported

Power and Performance Evaluation

Workload      Time on Atom Cluster (30 nodes)    Time on Xeon Cluster (30 nodes)    Time on Xeon Cluster (15 nodes)
Data Load     3.435 hours                        1.543 hours                        3.242 hours
Data Query    5.877 hours                        2.724 hours                        5.564 hours

Take 3 cases for time and energy consumption

31 nodes Atom Cluster (1 master node)

31 nodes Xeon Cluster (1 master node)

16 nodes Xeon Cluster (1 master node)
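The time table above can be turned into slowdown factors and an energy break-even point. This is a sketch only: the slides report no power figures here, so cluster power is left as an input, and the function names are hypothetical. It relies on energy = average power x time.

```python
# Execution times in hours, copied from the table above.
TIMES_H = {
    "data_load": {"atom_30": 3.435, "xeon_30": 1.543, "xeon_15": 3.242},
    "data_query": {"atom_30": 5.877, "xeon_30": 2.724, "xeon_15": 5.564},
}


def slowdown(workload: str, slow: str, fast: str) -> float:
    """How many times longer `slow` takes than `fast` for a workload."""
    return TIMES_H[workload][slow] / TIMES_H[workload][fast]


def energy_kwh(avg_cluster_power_w: float, hours: float) -> float:
    """Energy = average power x time. The power value must come from a meter."""
    return avg_cluster_power_w * hours / 1000.0


def atom_break_even_power_fraction(workload: str, xeon: str = "xeon_30") -> float:
    """The Atom cluster uses less energy than the Xeon cluster only if its
    average power is below this fraction of the Xeon cluster's average power."""
    return TIMES_H[workload][xeon] / TIMES_H[workload]["atom_30"]


if __name__ == "__main__":
    for workload in TIMES_H:
        print(
            workload,
            f"Atom/Xeon-30 slowdown: {slowdown(workload, 'atom_30', 'xeon_30'):.2f}x,",
            f"break-even power fraction: {atom_break_even_power_fraction(workload):.3f}",
        )
```

For data load the 30-node Atom cluster is about 2.2 times slower than the 30-node Xeon cluster, so it saves energy only if it draws less than about 45% of the Xeon cluster's average power.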

Energy consumption between 30-node Atom Cluster and 30-node Xeon Cluster

Power and Performance Evaluation (contd)

Energy consumption between 30-node Atom Cluster and 15-node Xeon Cluster

Power and Performance Evaluation (contd)

Time Breakdown in Map Phase

Power and Performance Evaluation (contd)

Time Breakdown in Reduce Phase

Power and Performance Evaluation (contd)

Findings

The Atom platform is more power efficient.

Data compression and decompression occupy a significant percentage of the execution time.

Compression and decompression can be done in a software-pipelined fashion, i.e. with multiple interleaved tasks.

Propositions

Heterogeneous architecture

Accelerators to perform data compression/decompression

Multiple interleaved compression/decompression

Off-chip and On-chip Accelerators
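The software-pipelined compression idea from the findings can be sketched in a few lines. This is an illustration only, not the authors' implementation: it uses Python threads, bounded queues and zlib as stand-ins, and all names (`run_pipeline`, `depth`) are assumptions. While one chunk is being compressed, the next chunk is already being fetched.

```python
import queue
import threading
import zlib

_DONE = object()  # sentinel marking the end of the stream


def produce(chunks, out_q):
    """Stage 1: hand raw chunks to the next stage (stands in for reading data)."""
    for chunk in chunks:
        out_q.put(chunk)
    out_q.put(_DONE)


def compress_stage(in_q, out_q, level=6):
    """Stage 2: compress each chunk while stage 1 fetches the next one."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(zlib.compress(item, level))


def run_pipeline(chunks, depth=4):
    """Run both stages concurrently; bounded queues limit chunks in flight."""
    raw_q = queue.Queue(maxsize=depth)
    packed_q = queue.Queue(maxsize=depth)
    threads = [
        threading.Thread(target=produce, args=(chunks, raw_q)),
        threading.Thread(target=compress_stage, args=(raw_q, packed_q)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:  # Stage 3: collect compressed chunks in their original order
        item = packed_q.get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    data = [b"a" * 1000, b"b" * 2000, b"hello"]
    packed = run_pipeline(data)
    print([len(p) for p in packed])
```

A hardware accelerator would replace the `compress_stage` thread. The queue depth plays the role of the number of interleaved tasks.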

Multiple Interleaved Tasks

Strengths

A much-needed innovative concept

Organized well

Detailed description of energy and time investigation

Already implemented propositions

Weaknesses

Not enough power meters to monitor all nodes

2 assumptions: the power of every network router is counted evenly towards the nodes; the energy consumption of each node is similar

Results obtained with Hadoop are generalized, even though they might not hold for every application.

Vague implementation of the propositions

FAWN: A Fast Array of Wimpy Nodes

Authors: David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan (Carnegie Mellon University)

High-performance, energy-efficient system for storage

Large number of small low-performance (hence wimpy) nodes with moderate amounts of local storage

2 parts: FAWN-DS (data store) and FAWN-KV (key value)

Motivation

Traditional architecture consumes too much power

I/O bottleneck due to the limitations of current storage

Introduction

Features

Pairs low-powered embedded nodes with flash storage

FAWN-DS is the backend that consists of the large number of nodes

Each node has some RAM and flash

FAWN-KV is a consistent, replicated, highly available and high-performance key-value storage system

FAWN Architecture
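The per-node design described above (some RAM plus flash on each node) can be illustrated with a toy data store. This is a minimal sketch under stated assumptions, not FAWN-DS itself: it assumes an index kept in RAM and an append-only log standing in for flash, and the class name `LogStore` and the record layout are invented for the example.

```python
import io
import struct


class LogStore:
    """Toy key-value store: append-only log (flash stand-in) + in-RAM index."""

    _HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self, log=None):
        # An in-memory buffer stands in for the flash device (assumption).
        self._log = log if log is not None else io.BytesIO()
        self._index = {}  # key -> offset of the latest record for that key

    def put(self, key: bytes, value: bytes) -> None:
        """Append a record sequentially and point the index at it."""
        self._log.seek(0, io.SEEK_END)
        offset = self._log.tell()
        self._log.write(self._HEADER.pack(len(key), len(value)))
        self._log.write(key)
        self._log.write(value)
        self._index[key] = offset

    def get(self, key: bytes):
        """One index lookup in RAM, then one read from the log."""
        offset = self._index.get(key)
        if offset is None:
            return None
        self._log.seek(offset)
        key_len, value_len = self._HEADER.unpack(self._log.read(self._HEADER.size))
        self._log.seek(key_len, io.SEEK_CUR)  # skip over the stored key
        return self._log.read(value_len)


if __name__ == "__main__":
    store = LogStore()
    store.put(b"user:1", b"alice")
    store.put(b"user:1", b"alice-v2")  # old record stays in the log, index moves on
    print(store.get(b"user:1"))
```

Writes are sequential appends, and a read needs only the small in-RAM index plus one log access. That matches the slide's point that each node needs only some RAM and flash.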

Efficient Data Streaming with On-chip Accelerators: Opportunities and Challenges

Authors: Rui Hou, Lixin Zhang, Michael C. Huang, Kun Wang, Hubertus Franke, Yi Ge, Xiaotao Chang (University of Rochester)

Motivation

Transistor density is increasing day by day

Many cores are integrated in a single die

Advantages of an on-chip accelerator over one attached via PCI

On-Chip Accelerator Architecture

3 types of accelerators: crypto accelerators; decompression accelerators; network offload accelerator

Some common characteristics of the data streams in the 3 accelerators

Optimize the power and performance of the accelerators.

Features

Thank You
