Sadayuki Furuhashi

Fluentd

@frsyuki

The Event Collector Service

Treasure Data, Inc.

Structured logging

Pluggable architecture

Reliable forwarding

• Sadayuki Furuhashi
  > twitter: @frsyuki

• Treasure Data, Inc.
  > Software Engineer; founder

• Author of MessagePack

• Author of Fluentd

What’s Fluentd?

It's like syslogd, but uses JSON for log messages

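What’s Fluentd?

[Diagram: Application → Fluentd → Storage]

2012-02-04 01:33:51 myapp.buylog {"user": "me", "path": "/buyItem", "price": 150, "referer": "/landing"}

Each event consists of a time (2012-02-04 01:33:51), a tag (myapp.buylog), and a record (the JSON object).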
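The time / tag / record layout can be sketched in a few lines of Ruby. This is only an illustration of the event shape shown on the slide; the `Event` struct and its `to_line` method are hypothetical names, not part of Fluentd.

```ruby
require "json"

# Hypothetical value object: one event = time + tag + record.
Event = Struct.new(:time, :tag, :record) do
  # Render the event in the "time tag record" layout used on the slide.
  def to_line
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
  end
end

event = Event.new(
  Time.utc(2012, 2, 4, 1, 33, 51),
  "myapp.buylog",
  { "user" => "me", "path" => "/buyItem", "price" => 150, "referer" => "/landing" }
)

puts event.to_line
```

Because the record is a plain hash serialized as JSON, a consumer can read fields such as "price" directly instead of parsing free-form text.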

What’s Fluentd?

[Diagram: applications, files (via tail), Scribe, and syslogd feed Fluentd through input plug-ins. Fluentd handles filter / buffer / routing, and output plug-ins deliver events to another Fluentd, storage, or SaaS.]

What’s Fluentd?

• Client libraries
  > Ruby
  > Perl
  > PHP
  > Python
  > Java
  > ...

Fluent.open("myapp")

Fluent.event("login", {"user"=>38})

#=> 2012-02-04 04:56:01 myapp.login {"user":38}

[Diagram: Application → Fluentd]

Fluentd & Event logs

Before:

[Diagram: on each app server the application writes many log files; the files are gathered on a log server.]

• Burst of traffic
• High latency: must wait for a day
• Hard to analyze: complex text parsers

Fluentd & Event logs

After:

[Diagram: each app server runs the application together with a local Fluentd, which forwards events to aggregating Fluentd servers.]

Realtime!

Fluentd & Event logs

[Diagram: Fluentd nodes forward events to aggregating Fluentd nodes, which write to Hadoop / Hive, MongoDB, and Amazon S3 / EMR.]

Realtime! Ready to analyze!

# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  host 127.0.0.1
</match>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
# (load-balancing + fail-over)
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>
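How the match patterns in this configuration select an output can be sketched as follows. The sketch assumes the usual Fluentd semantics: `*` matches exactly one tag part, `**` matches zero or more tag parts, and the first matching block wins. `tag_match?`, `route`, and `ROUTES` are hypothetical names for illustration, not Fluentd's internal API.

```ruby
# Returns true when the dot-separated tag parts satisfy the pattern parts.
def tag_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # "**" may consume zero or more tag parts.
    (0..tag_parts.length).any? { |n| tag_match?(rest, tag_parts.drop(n)) }
  when "*"
    # "*" consumes exactly one tag part.
    !tag_parts.empty? && tag_match?(rest, tag_parts.drop(1))
  else
    tag_parts.first == head && tag_match?(rest, tag_parts.drop(1))
  end
end

# Patterns in the same order as the <match> blocks in the configuration.
ROUTES = [
  ["apache.access", :mongo],
  ["alert.**",      :file],
  ["**",            :forward],
].freeze

# The first pattern that matches decides the output.
def route(tag)
  ROUTES.each do |pattern, output|
    return output if tag_match?(pattern.split("."), tag.split("."))
  end
  nil
end

puts route("apache.access")    # mongo
puts route("alert.disk.full")  # file
puts route("myapp.login")      # forward
```

Order matters: if the catch-all `**` block came first, it would swallow every event before the more specific patterns were tried.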

Fluentd vs Scribe

• Deals with structured logs

• Easy to install
  > “gem install fluentd”
  > apt-get and yum: http://packages.treasure-data.com/

• Easy to customize

• add/modify plugins without re-compiling
  > “gem search -rd fluent-plugin”

Fluentd vs Flume

• Easy to set up
  > “sudo fluentd --setup && fluentd”

• Very small footprint
  > small engine (3,000 lines) + plugins

• JVM-free

• Easy to configure

Architecture of Fluentd

Input (pluggable): HTTP+JSON, File tail, Syslog, ...

Buffer (pluggable): Memory, File

Output (pluggable): File, Amazon S3, Fluent, ...

Architecture :: Input

✓ Receive logs
✓ Or pull logs from data sources
✓ Non-blocking

Input plugins (pluggable): HTTP+JSON, File tail, Syslog, ...

Architecture :: Buffer

✓ Improve performance
✓ Improve reliability
✓ Provide thread-safety

Buffer plugins (pluggable): Memory, File

Architecture :: Output

✓ Write or send event logs

Output plugins (pluggable): File, Amazon S3, Fluent, ...

Plugins :: out_forward

forward event logs

[Diagram: out_forward on one Fluentd sends event logs to in_forward on two downstream Fluentd nodes and exchanges heartbeats with them.]

✓ load balancing
✓ Heartbeat: φ accrual failure detector
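The combination of heartbeats, a φ accrual failure detector, and weighted load balancing can be sketched as below. This is a simplified illustration, not out_forward's actual code: it assumes heartbeat gaps follow an exponential distribution, uses a weighted random choice as a stand-in for whatever scheduling the plugin really performs, and picks an arbitrary threshold of 8. `PhiDetector`, `Node`, and `pick_node` are hypothetical names.

```ruby
# Simplified phi accrual failure detector (exponential model of heartbeat gaps).
class PhiDetector
  def initialize
    @intervals = []  # observed gaps between heartbeats, in seconds
    @last = nil      # arrival time of the most recent heartbeat
  end

  # Record a heartbeat that arrived at time `now` (seconds).
  def heartbeat(now)
    @intervals << (now - @last) if @last
    @intervals.shift if @intervals.size > 100  # keep a sliding window
    @last = now
  end

  # Suspicion level: -log10 of the probability that a gap this long is normal.
  # With P(gap > t) = exp(-t / mean), that is t / (mean * ln 10).
  def phi(now)
    return 0.0 if @intervals.empty?
    mean = @intervals.sum / @intervals.size
    return 0.0 if mean <= 0
    (now - @last) / (mean * Math.log(10))
  end
end

# Hypothetical downstream node: host, weight, and its own detector.
Node = Struct.new(:host, :weight, :detector)

# Weighted random choice among nodes whose phi is below the threshold.
def pick_node(nodes, now, threshold: 8.0, rng: Random.new)
  alive = nodes.select { |n| n.detector.phi(now) < threshold }
  return nil if alive.empty?
  point = rng.rand(alive.sum(&:weight))  # integer in 0...total weight
  alive.each do |n|
    return n if point < n.weight
    point -= n.weight
  end
end

nodes = [
  Node.new("192.168.0.11", 20, PhiDetector.new),
  Node.new("192.168.0.12", 60, PhiDetector.new),
]
# Both nodes send a heartbeat every second for 10 seconds...
(0..10).each { |t| nodes.each { |n| n.detector.heartbeat(t.to_f) } }
# ...then only the first one keeps sending.
(11..40).each { |t| nodes[0].detector.heartbeat(t.to_f) }

puts nodes[1].detector.phi(40.0).round(2)  # suspicion of the silent node
puts pick_node(nodes, 40.0).host           # traffic fails over to the live node
```

Unlike a fixed timeout, the suspicion level scales with the heartbeat rhythm actually observed, so a node on a slow link is not declared dead as quickly as one that normally answers every second.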

Plugins :: out_copy

duplicate event logs

[Diagram: out_copy duplicates each event to out_mongo (MongoDB), out_forward (another Fluentd), and out_file (File).]

Plugins :: buf_file

reliable buffering

[Diagram: buf_file inside Fluentd queues buffer chunks as files on disk.]

✓ Persistent buffer
✓ Automatic retry
✓ 2^N retry interval
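The automatic retry with a 2^N interval can be sketched as a small helper. The base wait, cap, and retry limit here are illustrative assumptions, and `retry_wait` and `flush_with_retry` are hypothetical names rather than Fluentd's API.

```ruby
# Wait before retry number `attempt` (0-based): base * 2^attempt, capped at max.
def retry_wait(attempt, base: 1.0, max: 60.0)
  [base * (2**attempt), max].min
end

# Try to flush a chunk via the given block. On failure, wait and try again
# with a doubling interval. Returns false once retry_limit is exhausted, in
# which case the caller would keep the chunk in the persistent buffer.
def flush_with_retry(chunk, retry_limit: 5, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield chunk
    true
  rescue StandardError
    return false if attempt >= retry_limit
    sleeper.call(retry_wait(attempt))
    attempt += 1
    retry
  end
end

# Demo: the destination is down for the first three attempts.
waits = []
calls = 0
ok = flush_with_retry("chunk-1", sleeper: ->(s) { waits << s }) do |_chunk|
  calls += 1
  raise IOError, "destination down" if calls < 4
end
p ok     # true
p waits  # [1.0, 2.0, 4.0]
```

Injecting the `sleeper` keeps the demo instant; with the default lambda the helper really sleeps between attempts.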

Plugins :: out_exec

execute external programs

[Diagram: out_exec passes events as TSV to the stdin of an external program.]

✓ Python
✓ Perl
✓ C++
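The "TSV → stdin" hand-off can be sketched as follows. This only illustrates the idea on the slide and is not how the plugin itself is implemented; `to_tsv` and `exec_with_tsv` are hypothetical helpers, and a one-line Ruby script stands in for the external Python, Perl, or C++ program.

```ruby
require "open3"
require "rbconfig"

# Turn one record into a TSV line, taking fields in the order given by `keys`.
def to_tsv(record, keys)
  keys.map { |k| record[k].to_s }.join("\t") + "\n"
end

# Feed records to an external command's stdin as TSV and return its stdout.
def exec_with_tsv(command, records, keys)
  input = records.map { |r| to_tsv(r, keys) }.join
  stdout, status = Open3.capture2(*command, stdin_data: input)
  raise "command failed with status #{status.exitstatus}" unless status.success?
  stdout
end

records = [
  { "user" => "me",  "path" => "/buyItem", "price" => 150 },
  { "user" => "you", "path" => "/landing", "price" => 0 },
]

# Stand-in external program: sums the third TSV column read from stdin.
summer = [RbConfig.ruby, "-e", 'puts STDIN.each_line.sum { |l| l.split("\t")[2].to_i }']
puts exec_with_tsv(summer, records, %w[user path price])
```

Since the contract is just tab-separated text on stdin, the receiving program can be written in any language that can read a line and split it.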

Plugins :: out_exec_filter

execute external programs

[Diagram: out_exec_filter writes events to the stdin of an external program and reads events back from its stdout. out_exec, by contrast, only sends TSV to stdin.]

✓ Python
✓ Perl
✓ C++

Plugins :: in_exec

execute external programs

[Diagram: in_exec reads events from the stdout of an external program, shown next to out_exec (TSV → stdin) and out_exec_filter (stdin and stdout).]

✓ Python
✓ Perl
✓ C++

Plugins :: in_tail

Read event logs from a file

[Diagram: the application writes to the file /var/log/access.log, and in_tail in Fluentd reads it.]

✓ Apache log parser
✓ Syslog parser
✓ Custom parser
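Reading a file the way `tail -f` does can be sketched as follows: remember the byte offset already consumed and return only complete lines appended since the last poll. `SimpleTail` is a hypothetical class for illustration. It is not the in_tail implementation and it ignores log rotation.

```ruby
require "tempfile"

# Follows a file like `tail -f`: tracks the offset already read and returns
# only complete lines appended since the previous call.
class SimpleTail
  def initialize(path)
    @path = path
    @pos = File.size(path)  # start at the current end of the file
  end

  def read_new_lines
    lines = []
    File.open(@path, "r") do |f|
      f.seek(@pos)
      while (line = f.gets)
        break unless line.end_with?("\n")  # partial line: wait for the rest
        lines << line.chomp
        @pos = f.pos
      end
    end
    lines
  end
end

Tempfile.create("access.log") do |log|
  log.write("old line\n")
  log.flush
  tail = SimpleTail.new(log.path)

  log.write("GET /\nGET /buy")
  log.flush
  p tail.read_new_lines  # ["GET /"]

  log.write("Item\n")
  log.flush
  p tail.read_new_lines  # ["GET /buyItem"]
end
```

Holding back the unterminated line matters because an application may be halfway through writing it when the file is polled.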

Plugins :: in_tail
Apache log parser

87.12.1.87 - - [04/Feb/2012:00:20:11 +0900] "GET / HTTP/1.1" 200 98
87.12.1.87 - - [04/Feb/2012:00:20:11 +0900] "GET / HTTP/1.1" 200 98
...

{"host": "87.12.1.87", "method": "GET", "code": 200, "size": 98, "path": "/"}
...
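The transformation from an access-log line to a record can be sketched with a regular expression. This is a minimal stand-alone illustration for the common log format shown above, not the parser bundled with Fluentd. `APACHE_COMMON` is a hypothetical constant, and `parse_line` here is a plain function rather than a plugin method.

```ruby
require "time"

# Common log format: host ident user [time] "method path protocol" code size
APACHE_COMMON = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\d+|-)\z/

# Returns [time, record] for a matching line, or nil when the line does not match.
def parse_line(line)
  m = APACHE_COMMON.match(line.chomp)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z")
  record = {
    "host"   => m[:host],
    "method" => m[:method],
    "path"   => m[:path],
    "code"   => m[:code].to_i,
    "size"   => m[:size] == "-" ? 0 : m[:size].to_i,  # "-" means no body was sent
  }
  [time, record]
end

time, record = parse_line('87.12.1.87 - - [04/Feb/2012:00:20:11 +0900] "GET / HTTP/1.1" 200 98')
p time
p record
```

Converting "code" and "size" to integers at parse time is what lets downstream storage aggregate on them without another text-parsing step.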

Plugins

• Bundled plugins
  > file: writes event logs to files hourly or daily
  > forward: forwards event logs (+fail-over and load balancing)
  > exec: passes event logs to/from external commands
  > tail: reads event logs from a file (like `tail -f`)

Plugins

• 3rd party plugins
  > scribe: integrates Fluentd with Scribe
  > s3: uploads log files to Amazon S3 hourly or daily
  > mongo: writes logs to MongoDB
  > hoop: puts log files on Hadoop HDFS via Hoop

...

Plugin developer API

• Unit test framework (like “MRUnit”)
  > Fluent::Test::InputTestDriver
  > Fluent::Test::OutputTestDriver
  > Fluent::Test::BufferedOutputTestDriver

• Fluent::TailInput (base class of “tail” plugin)
  > text parser is customizable
    def parse_line(line)

Fluentd

• Documents
  > http://fluentd.org

• Source code
  > http://github.com/fluent

• Twitter
  > #fluentd

• Mailing list
  > http://groups.google.com/group/fluentd

“BIG DATA ANALYTICS PLATFORM” as a Service

Fluentd & Treasure Data

[Diagram: Fluentd nodes forward events to aggregating Fluentd nodes, which write to Hadoop / Hive, MongoDB, and Amazon S3 / EMR.]

Realtime! Ready to analyze!

Fluentd & Treasure Data

[Diagram: Fluentd nodes forward events to aggregating Fluentd nodes, which send them to the Treasure Data Cloud Platform.]

Realtime! Ready to analyze!

Fluentd & Treasure Data

Treasure Data Cloud Platform: SQL, Visualization

SELECT users.age, COUNT(1)
FROM logs
LEFT JOIN users ON logs.user_id = users.id
WHERE path = '/buyItem'
GROUP BY users.age

Contacts :

sales@treasure-data.com