Serial-war
Xuechao Wu
Evaluate the performance of serialization formats
Insight Data Engineering Fellowship, SV
Ideas and Motivations
• What format should be used for real-time apps?
• Bandwidth usage
DEMO
www.serialwar.xyz
PIPELINE
Serialization Deserialization Dashboard
Ingestion Processing Cache
PIPELINE
m4.x × 7
$1.673/hr
Protocol Buffers
33 Bytes
*https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
Apache Avro
32 Bytes
*https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
AverageByKey
import json, math, time

latency_stream = (
    message_DStream
    .map(lambda x: json.loads(x))  # x: JSON string -> dict
    .map(lambda x: (math.ceil(time.time()), time.time() - x["time"]))  # (key_time, latency)
    .combineByKey(
        lambda value: (value, 1),                   # first value for a key -> (sum, count)
        lambda x, value: (x[0] + value, x[1] + 1),  # fold another value into (sum, count)
        lambda x, y: (x[0] + y[0], x[1] + y[1]),    # merge two partial (sum, count) pairs
    )
    .map(lambda kv: (kv[0], kv[1][0] / kv[1][1]))  # (key_time, averaged_latency)
)
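The averaging step can be tried without a Spark cluster. The following is a minimal sketch in plain Python that applies the same three combineByKey functions to hand-made partitions; the function name average_by_key, the partition layout, and the sample latencies are illustrative assumptions, not part of the original pipeline.

```python
def average_by_key(partitions):
    """Average values per key using the three combineByKey steps."""
    create = lambda value: (value, 1)                        # first value seen for a key
    merge_value = lambda acc, v: (acc[0] + v, acc[1] + 1)    # add a value within a partition
    merge_combiners = lambda a, b: (a[0] + b[0], a[1] + b[1])  # merge across partitions

    merged = {}
    for partition in partitions:
        local = {}
        for key, value in partition:
            local[key] = merge_value(local[key], value) if key in local else create(value)
        for key, acc in local.items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical (key_time, latency) pairs split over two partitions.
sample = [
    [(100, 0.030), (101, 0.040)],
    [(100, 0.050)],
]
averages = average_by_key(sample)
```

Carrying a (sum, count) pair rather than a running mean is what makes the merge step safe: partial results from different partitions can be combined in any order.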
Throughput monitoring
● “peak” pattern
Overall Performance: 2000 events/sec
JSON
~50% more bandwidth
~34% less latency
Avro
100 kb/s
38 ms
Protobuf
10% more bandwidth
17% higher latency
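To make the comparison concrete, the percentages can be turned into rough absolute figures. This is only a sketch under two assumptions that the slides do not state outright: that "more" refers to bandwidth, and that every percentage is relative to Avro's 100 kb/s and 38 ms. The helper name scaled is hypothetical.

```python
AVRO_BANDWIDTH_KBPS = 100.0  # from the Avro slide
AVRO_LATENCY_MS = 38.0       # from the Avro slide


def scaled(base, percent_change):
    """Apply a signed percentage change to a baseline value."""
    return base * (1 + percent_change / 100)


# Assumption: all percentages are measured against the Avro figures.
estimates = {
    "JSON": (scaled(AVRO_BANDWIDTH_KBPS, 50), scaled(AVRO_LATENCY_MS, -34)),
    "Avro": (AVRO_BANDWIDTH_KBPS, AVRO_LATENCY_MS),
    "Protobuf": (scaled(AVRO_BANDWIDTH_KBPS, 10), scaled(AVRO_LATENCY_MS, 17)),
}

for name, (bandwidth, latency) in estimates.items():
    print(f"{name}: {bandwidth:.0f} kb/s, {latency:.1f} ms")
```

Under those assumptions JSON lands at roughly 150 kb/s and 25 ms, and Protobuf at roughly 110 kb/s and 44 ms.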
I would recommend…
JSON
If your app is
Lag-critical
Light-sized data
Avro
If your app is
Data-heavy
Real-time critical
Protobuf
If your app is
Heavily relying on Google services
In need of perfect documentation
About me
• University of Southern California
• MS Electrical Engineering
Before Insight   | At Insight
Basic MapReduce  | Spark, Kafka, Redis
Compression      | Serialization
Linux C          | AWS, Bash, tmux…
Basic front-end  | Full Stack Dev
Think Alone      | Communication
Avro vs. Protobuf
• Why is Avro serialization slightly smaller than Protobuf?
  • Avro keeps both the field names and the types in the schema, so the encoded data carries no per-field identifiers.
  • Protobuf prefixes each field with a tag holding its field number and wire type (1 byte more per record in this example).
• Schema evolution?
  • Avro must decode with the schema version the data was written with (field order and field set matter); a mismatch risks runtime errors.
  • Protobuf can decode with a previous schema without a runtime error, so it is more flexible overall.
• Optional fields?
  • Protobuf: required fields are validated on decode.
  • Avro: a union with null marks a field as optional.
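The one-byte size difference can be reproduced by hand. The sketch below assumes the example record from the Kleppmann post cited on the size slides (userName "Martin", favouriteNumber 1337, interests "daydreaming" and "hacking"), Protobuf field numbers 1 to 3, and an Avro ["null", "long"] union for the number. The encoders are hand-rolled for illustration and are not the real Protobuf or Avro libraries.

```python
def varint(n):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(n):
    """Map a signed 64-bit integer to an unsigned one, as Avro does for int/long."""
    return (n << 1) ^ (n >> 63)


# Protobuf: every field is prefixed with a tag = (field_number << 3) | wire_type.
def pb_string(field_number, text):
    data = text.encode("utf-8")
    return varint((field_number << 3) | 2) + varint(len(data)) + data


def pb_int(field_number, value):
    # Non-negative values only in this sketch.
    return varint((field_number << 3) | 0) + varint(value)


# Avro: no tags; values appear in schema order.
def avro_long(value):
    return varint(zigzag(value))


def avro_string(text):
    data = text.encode("utf-8")
    return avro_long(len(data)) + data


name, number, interests = "Martin", 1337, ["daydreaming", "hacking"]

protobuf_bytes = (
    pb_string(1, name)
    + pb_int(2, number)
    + b"".join(pb_string(3, item) for item in interests)
)

avro_bytes = (
    avro_string(name)
    + avro_long(1) + avro_long(number)       # union branch 1 (long), then the value
    + avro_long(len(interests))              # array block count
    + b"".join(avro_string(item) for item in interests)
    + avro_long(0)                           # end-of-array marker
)

print(len(protobuf_bytes), len(avro_bytes))
```

Protobuf spends four bytes on tags here (one per encoded field, including each repeated item), while Avro spends three on the union branch, the array count, and the array terminator, which accounts for the single byte of difference.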