Data Storage Formats in Hadoop

DATA STORAGE FORMATS in Hadoop

Botond Balázs balazsbotond@gmail.com @botond_balazs

OUR MAIN CONCERNS

• Read performance (improve)

• Disk usage (reduce)

• Splittability (provide)

• Failure behavior

• Write performance (keep reasonable)

Disks are so slow that it is worth sacrificing a lot of CPU cycles to reduce disk I/O.

In a distributed system, reducing network traffic is also important.

3 WAYS OF REPRESENTING THIS TABLE ON DISK

CourseId  Title              Instructor         CategoryId
25        Databases 1        Jennifer Widom     10
27        Databases 2        Jennifer Widom     10
28        Algorithms         Charles Leiserson  12
30        Discrete Math      Donald Knuth       12
35        Operating Systems  A. Tanenbaum       40

ROW-ORIENTED

• Fields of a row are stored contiguously

• Quick and easy:

• Retrieve an entire row

• Insert, update

• Drawbacks:

• Without indexing, filtering is slower

• Entire row has to be read even if we only need a few columns

25 | Databases 1 | Jennifer Widom | 10 | 27 | Databases 2 | Jennifer Widom | 10 | 28 | ...
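The row-oriented layout can be sketched in a few lines of Python. This is a toy in-memory model built from the course table shown earlier, not a Hadoop API; the function names `row_layout`, `read_row`, and `read_column` are made up for illustration.

```python
# Toy model of a row-oriented layout: all fields of a row sit next to each other.
COLUMNS = ["CourseId", "Title", "Instructor", "CategoryId"]
ROWS = [
    (25, "Databases 1", "Jennifer Widom", 10),
    (27, "Databases 2", "Jennifer Widom", 10),
    (28, "Algorithms", "Charles Leiserson", 12),
    (30, "Discrete Math", "Donald Knuth", 12),
    (35, "Operating Systems", "A. Tanenbaum", 40),
]


def row_layout(rows):
    """Flatten the table row by row, the way a row store would write it."""
    return [field for row in rows for field in row]


def read_row(flat, index, width=len(COLUMNS)):
    """Fetching a whole row is a single contiguous slice."""
    return tuple(flat[index * width:(index + 1) * width])


def read_column(flat, name, width=len(COLUMNS)):
    """Fetching one column has to step across every row in the file."""
    offset = COLUMNS.index(name)
    return flat[offset::width]


flat = row_layout(ROWS)
print(flat[:9])
print(read_row(flat, 2))
print(read_column(flat, "CategoryId"))
```

Reading a row touches one contiguous range, while reading a single column strides across the whole sequence, which mirrors the drawback listed above.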

COLUMN-ORIENTED

• Fields of a column are stored contiguously

• Benefits:

• Each column can serve as an index (fast filtering operations on the whole dataset)

• Only selected columns are read

• Drawbacks:

• Whole-row operations require a lot of disk I/O

• Slow and hard inserting and updating

• The same row can be stored on different nodes in a distributed environment

25 | 27 | 28 | 30 | 35 | Databases 1 | Databases 2 | Algorithms | Discrete M. | Operating S. | J. Widom | J. Widom | C. Leiserson:003 | D. Knuth:004 | A. Tanenbaum:005 | 10 | 10 | 12 | ...
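A comparable sketch for the column-oriented layout, again as a toy in-memory model of the same course table. The helper names (`column_layout`, `filter_positions`, `read_row`) are hypothetical and only illustrate the access pattern.

```python
# Toy model of a column-oriented layout: all values of a column sit together.
COLUMNS = ["CourseId", "Title", "Instructor", "CategoryId"]
ROWS = [
    (25, "Databases 1", "Jennifer Widom", 10),
    (27, "Databases 2", "Jennifer Widom", 10),
    (28, "Algorithms", "Charles Leiserson", 12),
    (30, "Discrete Math", "Donald Knuth", 12),
    (35, "Operating Systems", "A. Tanenbaum", 40),
]


def column_layout(rows, columns):
    """Transpose the table so each column's values are stored contiguously."""
    return {name: list(values) for name, values in zip(columns, zip(*rows))}


def filter_positions(store, column, predicate):
    """Filtering scans a single column and returns matching row positions."""
    return [i for i, value in enumerate(store[column]) if predicate(value)]


def read_row(store, index, columns):
    """Rebuilding a whole row needs one lookup in every column."""
    return tuple(store[name][index] for name in columns)


store = column_layout(ROWS, COLUMNS)
matches = filter_positions(store, "CategoryId", lambda c: c == 12)
titles = [store["Title"][i] for i in matches]
print(store["CourseId"])
print(matches, titles)
print(read_row(store, 0, COLUMNS))
```

The filter reads only the `CategoryId` column and the projection reads only `Title`, whereas `read_row` has to visit all four columns.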

RECORD COLUMNAR

[Figure: Horizontal Partitioning. The rows of the table are split into Row Groups.]

RECORD COLUMNAR

Row Groups

Row group 1: 25 27 | Databases 1, Databases 2 | Jennifer Widom, Jennifer Widom | 10 10

Row group 2: 28 30 35 | Algorithms, Discrete Math, Operating Sys. | C. Leiserson, Donald Knuth, A. Tanenbaum | 12 12 40
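The two steps of the record columnar layout (partition horizontally, then store each group column by column) might look like this in Python. The group size of 2 is an arbitrary choice for the sketch and differs from the grouping drawn on the slide; `row_groups` and `columnar` are illustrative names.

```python
# Toy model of a record columnar layout: row groups first, columns inside each group.
ROWS = [
    (25, "Databases 1", "Jennifer Widom", 10),
    (27, "Databases 2", "Jennifer Widom", 10),
    (28, "Algorithms", "Charles Leiserson", 12),
    (30, "Discrete Math", "Donald Knuth", 12),
    (35, "Operating Systems", "A. Tanenbaum", 40),
]


def row_groups(rows, group_size):
    """Horizontal partitioning: cut the table into consecutive row groups."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


def columnar(group):
    """Inside one row group, store the values column by column."""
    return [list(column) for column in zip(*group)]


groups = [columnar(group) for group in row_groups(ROWS, 2)]
for number, group in enumerate(groups, start=1):
    print("row group", number, group)
```

Because every row lives entirely inside one row group, a row group can be kept on a single node while still offering column-wise access within it.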

High redundancy in columns

Compress them!
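One simple way to exploit repeated values in a column is run-length encoding. The slides do not say which encoding is meant, so the sketch below is only an illustration of the idea, using two columns of the course table.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal neighbouring values into (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


category = [10, 10, 12, 12, 40]
instructor = [
    "Jennifer Widom",
    "Jennifer Widom",
    "Charles Leiserson",
    "Donald Knuth",
    "A. Tanenbaum",
]

encoded_category = rle_encode(category)
encoded_instructor = rle_encode(instructor)
print(encoded_category)
print(encoded_instructor)
```

The encoding only helps when equal values are adjacent, which is exactly what a columnar layout tends to produce.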

SERIALIZATION FORMATS

Categories: Row-Oriented, Record Columnar, Neither

Formats shown: RCFile, Thrift, SequenceFile

SEQUENCEFILE

Header

version           3-byte magic number "SEQ" followed by a 1-byte version number, e.g. "SEQ6"
keyClassName      String, Java class name of keys
valueClassName    String, Java class name of values
compression       Bool, true if record compression is on
blockCompression  Bool, true if block compression is on
compressorClass   String, Java class name of compressor
metadata          SequenceFile.Metadata (key-value pairs)
sync              A sync marker to denote end of header

Java-only format!

SEQUENCEFILE

Header | SYNC | Record | Record | Record | SYNC | Record | Record | Record | SYNC | Record | Record | Record

The SYNC markers are the split points.
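How sync markers make a file splittable can be sketched with a toy byte stream. The marker value, the newline-terminated records, and the rule that a block belongs to the split in which its marker starts are all assumptions of this sketch; a real SequenceFile takes its marker from the file header and uses a binary record framing.

```python
# Hypothetical marker for this toy; it must never occur inside record data.
SYNC = b"\xffSYNC\xff"


def build_file(blocks):
    """Write a header, then each block of records preceded by a sync marker."""
    out = b"HEADER"
    for records in blocks:
        out += SYNC + b"".join(records)
    return out


def read_split(data, start, end):
    """Return the records of every block whose sync marker starts in [start, end)."""
    records = []
    pos = data.find(SYNC, start)  # skip forward to the first marker in the split
    while pos != -1 and pos < end:
        nxt = data.find(SYNC, pos + len(SYNC))
        block_end = nxt if nxt != -1 else len(data)
        records.extend(data[pos + len(SYNC):block_end].splitlines())
        pos = nxt
    return records


blocks = [
    [b"r1\n", b"r2\n", b"r3\n"],
    [b"r4\n", b"r5\n", b"r6\n"],
    [b"r7\n", b"r8\n", b"r9\n"],
]
data = build_file(blocks)
middle = len(data) // 2
first_half = read_split(data, 0, middle)
second_half = read_split(data, middle, len(data))
print(first_half)
print(second_half)
```

Two readers given arbitrary byte ranges each start at the next marker, so together they see every record exactly once without coordinating.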

SEQUENCEFILE FAILURE BEHAVIOR

• Readable up to the first failed row

• Not recoverable after that point

AVRO

JSON schema:

{
  "type": "record",
  "name": "LongList",
  "aliases": ["LinkedLongs"],
  "fields": [
    {"name": "value", "type": "long"},
    {"name": "next", "type": ["null", "LongList"]}
  ]
}

• Schema is stored in the header

• Supports writing and reading with a different schema (schema evolution)

• Supports nested types

• Block-based splittable format (SYNC marker)

• Optional block compression (Snappy, Deflate)

• Excellent failure behavior: only the failed block is lost, and reading continues at the next SYNC marker
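The failure behavior described in the last bullet can be illustrated with a toy block reader that resynchronises at the next marker. The framing used here (marker, length, CRC32, payload) is an assumption made for the sketch and is not Avro's actual container layout; the marker value is also hypothetical.

```python
import struct
import zlib

# Hypothetical marker; it must never occur inside a payload in this toy.
SYNC = b"\x00SYNCMARK\x00"


def write_blocks(blocks):
    """Frame each payload as: marker, 4-byte length, 4-byte CRC32, payload."""
    out = bytearray()
    for payload in blocks:
        out += SYNC + struct.pack(">II", len(payload), zlib.crc32(payload)) + payload
    return bytes(out)


def read_blocks(data):
    """Return (intact payloads, number of lost blocks), skipping damaged blocks."""
    good, lost = [], 0
    pos = data.find(SYNC)
    while pos != -1:
        body = pos + len(SYNC)
        header = data[body:body + 8]
        ok = False
        if len(header) == 8:
            length, crc = struct.unpack(">II", header)
            payload = data[body + 8:body + 8 + length]
            ok = len(payload) == length and zlib.crc32(payload) == crc
        if ok:
            good.append(payload)
            pos = data.find(SYNC, body + 8 + length)
        else:
            lost += 1
            pos = data.find(SYNC, body)  # resynchronise at the next marker
    return good, lost


blocks = [b"block-one", b"block-two", b"block-three"]
data = write_blocks(blocks)
damaged = bytearray(data)
damaged[data.index(b"block-two")] ^= 0xFF  # flip one byte inside the second block
print(read_blocks(data))
print(read_blocks(bytes(damaged)))
```

Only the damaged block is dropped; the reader picks up again at the following marker and returns the blocks on either side of it.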

RCFILE

• First widespread record columnar format

• Has much better alternatives today: ORC, Parquet

PARQUET

• ORC is designed specifically for Hive

• Parquet is a general purpose format

• Supports complex nested data structures

• Stores full metadata at the end of files
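Storing metadata at the end of the file means a reader starts from the tail and then jumps to the data it needs. The sketch below imitates that idea with a toy format: the `TOY1` magic bytes, the JSON encoding, and the function names are assumptions of this example. Real Parquet files use their own magic bytes and a Thrift-encoded footer.

```python
import io
import json
import struct

MAGIC = b"TOY1"  # hypothetical magic bytes for this sketch

GROUPS = [
    {"CourseId": [25, 27], "CategoryId": [10, 10]},
    {"CourseId": [28, 30, 35], "CategoryId": [12, 12, 40]},
]


def write_file(row_groups):
    """Write row groups first, then a footer with their offsets, then its length."""
    buf = io.BytesIO()
    buf.write(MAGIC)
    footer = {"row_groups": []}
    for group in row_groups:
        chunk = json.dumps(group).encode("utf-8")
        footer["row_groups"].append({"offset": buf.tell(), "length": len(chunk)})
        buf.write(chunk)
    footer_bytes = json.dumps(footer).encode("utf-8")
    buf.write(footer_bytes)
    buf.write(struct.pack("<I", len(footer_bytes)))
    buf.write(MAGIC)
    return buf.getvalue()


def read_footer(data):
    """Locate the footer from the end of the file; fail if the tail is missing."""
    if data[-4:] != MAGIC:
        raise ValueError("missing trailing magic: file is incomplete")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    return json.loads(data[footer_start:footer_start + footer_len])


def read_row_group(data, footer, index):
    """Jump straight to one row group using the offsets stored in the footer."""
    entry = footer["row_groups"][index]
    return json.loads(data[entry["offset"]:entry["offset"] + entry["length"]])


data = write_file(GROUPS)
footer = read_footer(data)
print(footer)
print(read_row_group(data, footer, 1))
```

In this toy, a file that is cut off before the footer is written cannot be opened at all, which is one reason a footer-based layout is sensitive to write failures.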

FAILURE BEHAVIOR OF RECORD COLUMNAR FORMATS

Failure can lead to incomplete rows

They don’t handle failure well

COMPRESSION

Format  Splittability  Write Speed  Read Speed  Compression
gzip    ✖              ★★           ★★★         ★★★
bzip2   ✔              ★            ★           ★★★
Snappy  ✖              ★★★          ★★★         ★
LZO     ✔              ★★★          ★★★         ★

Each of these is splittable when used inside a container format.
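The trade-offs in the table can be measured on your own data. The sketch below uses only the gzip and bzip2 codecs from the Python standard library; Snappy and LZO need third-party packages and are left out. The synthetic payload and the `measure` helper are assumptions of this example, and the timings will vary by machine.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, payload):
    """Time one compress/decompress round trip and report the size ratio."""
    start = time.perf_counter()
    packed = compress(payload)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    read_seconds = time.perf_counter() - start

    if restored != payload:
        raise ValueError(name + " round trip changed the data")
    return {
        "codec": name,
        "ratio": len(payload) / len(packed),
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
    }


# Synthetic, highly repetitive CSV-like payload (illustrative only).
payload = b"".join(
    b"%d,Databases %d,Jennifer Widom,10\n" % (i, i % 3) for i in range(20000)
)

results = [
    measure("gzip", gzip.compress, gzip.decompress, payload),
    measure("bzip2", bz2.compress, bz2.decompress, payload),
]
for result in results:
    print(result)
```

Running it on a sample of real data gives a rough feel for how compression ratio trades against write and read speed for each codec.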

RECOMMENDATION

             Analytics    Archival
Format       Parquet      Avro
Compression  Snappy/gzip  bzip2

The End.
