Serial-war: Evaluate the performance of serialization formats. Xuechao Wu, Insight Data Engineering Fellowship, SV

Page 1: Xuechao Serial War

Serial-war

Xuechao Wu

Evaluate the performance of serialization formats

Insight Data Engineering Fellowship, SV

Page 2: Xuechao Serial War

Ideas and Motivations

• What format should be used for real-time apps?

• Bandwidth usage

Page 3: Xuechao Serial War

DEMO

www.serialwar.xyz

Page 4: Xuechao Serial War

PIPELINE

Serialization Deserialization Dashboard

Ingestion Processing Cache

Page 5: Xuechao Serial War

PIPELINE

[Diagram: seven pipeline nodes, each labeled m4.x; cost label: $1.673/hr]

Page 6: Xuechao Serial War

Protocol Buffers

33 Bytes

*https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html

Page 7: Xuechao Serial War

Apache Avro

32 Bytes

*https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
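The two byte counts can be reproduced by hand. The sketch below assumes the slides use the example record from the cited post (userName "Martin", favouriteNumber 1337, interests "daydreaming" and "hacking"), with Protobuf field numbers 1 to 3 and an Avro schema of string, union of null and long, and array of strings. The helper names are illustrative, and no Protobuf or Avro library is involved.

```python
def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def zigzag(n: int) -> int:
    """ZigZag mapping Avro applies to int/long values (64-bit range assumed)."""
    return (n << 1) ^ (n >> 63)


def pb_field(number: int, wire_type: int, payload: bytes) -> bytes:
    """Protobuf field: one tag (field number + wire type), then the payload."""
    return varint((number << 3) | wire_type) + payload


def pb_string(number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return pb_field(number, 2, varint(len(data)) + data)  # wire type 2: length-delimited


def avro_long(n: int) -> bytes:
    return varint(zigzag(n))


def avro_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return avro_long(len(data)) + data  # length prefix, no tag


# Example record assumed from the cited post.
name, number, interests = "Martin", 1337, ["daydreaming", "hacking"]

protobuf_bytes = (
    pb_string(1, name)
    + pb_field(2, 0, varint(number))               # wire type 0: varint
    + b"".join(pb_string(3, s) for s in interests)  # repeated field: one tag per item
)

avro_bytes = (
    avro_string(name)
    + avro_long(1) + avro_long(number)   # union branch index 1 (long), then the value
    + avro_long(len(interests))          # array block count
    + b"".join(avro_string(s) for s in interests)
    + avro_long(0)                       # end-of-array marker
)

print(len(protobuf_bytes), len(avro_bytes))  # prints: 33 32
```

Protobuf spends four tag bytes (one per field occurrence), while Avro spends three bytes on the union branch, the array count, and the array terminator, which accounts for the one-byte difference.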

Page 8: Xuechao Serial War

AverageByKey

import json
import math
import time

# message_DStream: DStream of JSON strings, each with a "time" field (send timestamp)
latency_stream = (
    message_DStream
    .map(lambda x: json.loads(x))                                       # x: {json}
    .map(lambda x: (math.ceil(time.time()), time.time() - x["time"]))   # (key_time, latency)
    .combineByKey(
        lambda value: (value, 1),                         # value -> (sum, count)
        lambda acc, value: (acc[0] + value, acc[1] + 1),  # fold a value into (sum, count)
        lambda a, b: (a[0] + b[0], a[1] + b[1]),          # merge two (sum, count) pairs
    )
    .map(lambda kv: (kv[0], kv[1][0] / kv[1][1]))         # (key_time, averaged_latency)
)
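To make the three combineByKey functions concrete, here is a minimal sketch that applies the same create / merge-value / merge-combiners steps to a plain Python list, without Spark. The function `combine_by_key`, the two-way partition split, and the sample pairs are all hypothetical and only mimic how per-partition results get merged.

```python
def combine_by_key(pairs, create, merge_value, merge_combiners, partitions=2):
    """Local stand-in for combineByKey: combine per partition, then merge."""
    partials = []
    for p in range(partitions):
        acc = {}
        for key, value in pairs[p::partitions]:   # simulate one partition
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create(value)
        partials.append(acc)

    merged = {}
    for acc in partials:                          # merge partition results
        for key, combiner in acc.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


# Made-up (key_time, latency) pairs for illustration only.
samples = [(1, 0.5), (1, 1.5), (2, 2.0), (1, 1.0)]

sums = combine_by_key(
    samples,
    lambda value: (value, 1),
    lambda acc, value: (acc[0] + value, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
averages = {key: total / count for key, (total, count) in sums.items()}
print(averages)
```

Carrying (sum, count) rather than a running mean is what lets partial results from different partitions be merged correctly.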

Page 9: Xuechao Serial War

Throughput monitoring

● “peak” pattern

Page 10: Xuechao Serial War

Overall Performance: 2000 events/sec

Avro (baseline): 100 kb/s bandwidth, 38 ms latency

JSON: ~50% more bandwidth, ~34% less latency

Protobuf: 10% more bandwidth, 17% higher latency
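The relative figures can be turned into rough absolute numbers. This is only a sketch: it assumes the percentages for JSON and Protobuf are measured against Avro's 100 kb/s and 38 ms and that "more" refers to bandwidth, neither of which the slide states explicitly. The derived values are arithmetic, not measurements.

```python
AVRO_BANDWIDTH = 100   # slide figure, unit as written on the slide ("kb/s")
AVRO_LATENCY_MS = 38   # slide figure

# (bandwidth change %, latency change %) relative to Avro; baseline is an assumption
relative = {"JSON": (50, -34), "Protobuf": (10, 17)}


def scale(base, percent):
    """Apply a percentage change to a baseline value."""
    return base * (100 + percent) / 100


estimates = {
    fmt: (scale(AVRO_BANDWIDTH, bw), scale(AVRO_LATENCY_MS, lat))
    for fmt, (bw, lat) in relative.items()
}

for fmt, (bandwidth, latency) in estimates.items():
    print(f"{fmt}: {bandwidth:.0f} kb/s, {latency:.1f} ms")
```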

Page 11: Xuechao Serial War

I would recommend…

JSON, if your app is

• Lag-critical

• Working with light-sized data

Avro, if your app is

• Data-heavy

• Real-time critical

Protobuf, if your app is

• Heavily relying on Google services

• In need of perfect documentation

Page 12: Xuechao Serial War

About me

• University of Southern California

• MS Electrical Engineering

Before Insight      At Insight
Basic MapReduce     Spark, Kafka, Redis
Compression         Serialization
Linux C             AWS, Bash, tmux…
Basic front-end     Full Stack Dev
Think Alone         Communication

Page 13: Xuechao Serial War

Avro vs. Protobuf

• Why is Avro serialization slightly smaller than Protobuf?

    • Avro keeps field names and types in the schema, so the encoded data carries no per-field tags.

    • Protobuf prefixes each field with a tag holding its field number and wire type. (1 byte more per record)

• Schema evolution?

    • Avro: the reader must have the schema version the data was written with (field order and field set both matter); otherwise decoding risks runtime errors.

    • Protobuf may decode with a previous schema without a runtime error; overall it is more flexible.

• Optional fields?

    • Protobuf: decoding validates that required fields are present.

    • Avro: null in a union indicates an optional field.
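The schema-evolution point for Protobuf follows from its tagged wire format: a reader can skip any field number it does not recognize. The sketch below is a hand-written, minimal wire-format reader, not the Protobuf library; `decode`, the field-name mapping, and the sample message (the record from the post cited earlier in the deck) are assumptions for illustration, and only wire types 0 and 2 are handled.

```python
def read_varint(buf: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known: dict) -> dict:
    """Decode tagged fields, keeping only field numbers listed in `known`."""
    out = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field in known:                       # unknown fields are skipped
            out.setdefault(known[field], []).append(value)
    return out


# Message written with a newer schema that has fields 1, 2 and 3.
message = (
    bytes([0x0A, 6]) + b"Martin"
    + bytes([0x10, 0xB9, 0x0A])                  # field 2, varint 1337
    + bytes([0x1A, 11]) + b"daydreaming"
    + bytes([0x1A, 7]) + b"hacking"
)

# Reader with an older schema that only knows fields 1 and 2.
old_schema = {1: "userName", 2: "favouriteNumber"}
print(decode(message, old_schema))
```

Avro's untagged encoding has no such per-field markers, which is why its reader needs the writer's schema to know where each value starts and ends.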