Surasak Sanguanpong
Department of Computer Engineering, Faculty of Engineering, Kasetsart University
Tech Talk Session, WUNCA 33rd, Chulalongkorn University, July 14, 2016
Experiences in Traffic Logging and Visualization with ELK and D3.js
U-Bahn Station Candidplatz, Munich, Germany
In This Talk
• About Traffic Log
• Search Platform with ELK
• Real-Time Visualization with D3.js
• Lessons Learnt
Log Monitoring
Collecting
Processing
Analysing
Visualising
Image: https://www.flickr.com/photos/sbeebe/4772418919
At What Scale?
Hmm..Large..
http://www.24hourcampfire.com/ubbthreads/ubbthreads.php/topics/5976731/all/That_s_a_load_of_logs
Traffic Logging Solution
Splunk? Great, but commercial and proprietary
Graylog? Excellent, but too automatic
Elasticsearch, Logstash, Kibana, D3? That's it! A lot of fun to play with
Chapter I Log Architecture and Raw Log Management: A Case Study
Evolution of KU Traffic Logging Design
2008-2015: Raw Log → MySQL → Simple GUI
2015-: Raw Log → Elasticsearch → Kibana/D3
Logging Architecture
[Architecture diagram. Labels: Network, Mirror packets, Logging Engine, Packet Log, Web Log, Login Log, Login/Logout, Login, Search GUI]
Login Log Format
Date Time Action IP UserName LogServer
Jul 1 10:04:57 login 158.108.X.X [email protected] 192.168.1.1
Jul 1 10:04:58 logout 158.108.X.X [email protected] 192.168.1.2
Jul 1 10:04:59 timeout 158.108.X.X [email protected] 192.168.1.2
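A minimal Python sketch of how one login log record could be split into named fields. It assumes whitespace-separated tokens, with the Date column spanning two tokens (month and day). The names `LoginEvent` and `parse_login_line` are hypothetical, and the user name in the sample is a placeholder because the real one is redacted in the slide.

```python
from typing import NamedTuple


class LoginEvent(NamedTuple):
    month: str
    day: int
    time: str
    action: str       # login, logout, or timeout in the slide's samples
    ip: str
    username: str
    log_server: str


def parse_login_line(line: str) -> LoginEvent:
    """Split one login log line into its seven whitespace-separated tokens."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 tokens, got {len(parts)}: {line!r}")
    month, day, time, action, ip, username, log_server = parts
    return LoginEvent(month, int(day), time, action, ip, username, log_server)


# Placeholder user name (assumption); the other values follow the slide's sample.
sample = "Jul 1 10:04:57 login 158.108.X.X user@example.org 192.168.1.1"
event = parse_login_line(sample)
```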
Web Log Format
UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedora-epel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedora-epel/6/i386/
20151103010000 10.X.X.X - 203.104.175.X - 62635 80 sg-nvapis.line.me/ping?&msgpad=1446487199964&md=9LMRXqv1Nb8P07aj0Vo%3D -
20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS
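The nine-column web log layout can be read with a few lines of Python. This is only a sketch under stated assumptions: fields are whitespace-separated, the first column is a YYYYMMDDhhmmss stamp (which is how the samples look, although the header calls it UnixTime), "-" marks an unused address family, and the last column is either a referer, "-", or the literal HTTPS. `WebLogRecord` and `parse_web_log_line` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class WebLogRecord(NamedTuple):
    timestamp: datetime
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    url: str
    referer: Optional[str]
    is_https: bool


def parse_web_log_line(line: str) -> WebLogRecord:
    """Parse one web log line; pick the IPv4 or IPv6 column, whichever is set."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    ts, src4, src6, dst4, dst6, sport, dport, url, last = parts
    return WebLogRecord(
        timestamp=datetime.strptime(ts, "%Y%m%d%H%M%S"),
        src_ip=src4 if src4 != "-" else src6,
        dst_ip=dst4 if dst4 != "-" else dst6,
        src_port=int(sport),
        dst_port=int(dport),
        url=url,
        referer=None if last in ("-", "HTTPS") else last,
        is_https=(last == "HTTPS"),
    )


record = parse_web_log_line(
    "20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX "
    "61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS"
)
```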
Packet Log Format (Header Log)
TimeStamp SrcIP DstIP Size Proto SrcPort DstPort [Flag]
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
TimeStamp SrcIP DstIP Proto Code
2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168
Example of Log Folder
Time-based hierarchical folder
Year / Month / Day / Hour / Minute file
2015
    01
        01
            00
                201501010000.txt
                201501010001.txt
                :
                201501010059.txt
            01
            :
            23
                201501012300.txt
                201501012301.txt
                :
                201501012359.txt
        02
        :
        30
    02
    :
    12
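The folder layout above maps a timestamp to exactly one minute file. A small Python sketch of that mapping follows; the root directory `/logs` and the function name `minute_log_path` are assumptions for illustration, while the year/month/day/hour nesting and the `YYYYMMDDhhmm.txt` file name follow the slide.

```python
from datetime import datetime
from pathlib import PurePosixPath


def minute_log_path(root: str, ts: datetime) -> PurePosixPath:
    """Return root/YYYY/MM/DD/HH/YYYYMMDDhhmm.txt for the given minute."""
    return (
        PurePosixPath(root)
        / ts.strftime("%Y")
        / ts.strftime("%m")
        / ts.strftime("%d")
        / ts.strftime("%H")
        / ts.strftime("%Y%m%d%H%M.txt")
    )


# One file per minute gives 24 * 60 files per day.
FILES_PER_DAY = 24 * 60

path = minute_log_path("/logs", datetime(2015, 1, 1, 23, 59))
```

Because the path is a pure function of the timestamp, a time-range search only needs to enumerate the minutes in the range and open the matching files.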
Minutely HTTP Log
11 days (11 x 24 x 60 = 15,840 data points)
Request Rate and Log Sizing
Accumulated Log Request and Size
[Chart labels: #Files: 120; 20M; 2.04 GB; 14.1B; 2.57 TB; #Files: 172,800; 28.03 TB; 3.27T; #Files: 172,800]
Log Processing and Search Services
• On-the-fly conversion of text-based logs to MySQL
• Slow processing and searching
• Simple search
Chapter II ELK Stack Testbed
What is Elasticsearch?
• Real-time search/analytic engine software
• Document-oriented
• REST API & JSON
• Distributed
• Scalable
• Plugin architecture
• Java/Lucene based
• Open source, Apache 2 License
REST: Representational State Transfer
JSON: JavaScript Object Notation
What does Elasticsearch offer?
• Full-Text Search
• Very Fast
• Fault Tolerance
• High Availability
How is the world using Elasticsearch?
Analytics solution on 40 million documents per day to deliver real-time visibility
Providing search across GitHub's code
Full-text search to find related questions and answers
Full-text search with highlighted search snippets
Elasticsearch and Big Data
ES-Hadoop: Connects Hadoop's big data analytics with the real-time search of Elasticsearch.
https://www.elastic.co/products/hadoop
ELK Stack from Elastic
Elasticsearch: High-performance scalable search engine
Logstash: Log transport and processing daemon
Kibana: Visualisation dashboard
Logstash
Log aggregator and parser
Transfers parsed data to Elasticsearch
Configuration file for specifying input, filtering (parsing) and output
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
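To make the filter stage less abstract, here is a rough Python analogue of what the grok and date filters in the configuration above do to one Apache combined-format line: a pattern extracts named fields, then the timestamp string is converted to a real date. This is a sketch, not Logstash's implementation; the regular expression is a simplified stand-in for COMBINEDAPACHELOG, and the field names only resemble the ones grok emits. Apache timestamps carry a month name and a zone offset, hence the `%d/%b/%Y:%H:%M:%S %z` format.

```python
import re
from datetime import datetime

# Simplified stand-in for the COMBINEDAPACHELOG grok pattern (assumption).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of named fields plus a parsed '@timestamp', or None."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Equivalent of the date filter: turn the text timestamp into a datetime.
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
event = parse_combined(line)
```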
Kibana
General purpose query UI
Includes many widgets
Query Elasticsearch without coding
Alternative Stack
ELK
EFK
Elasticsearch Indexing Performance
[Chart: Daily Performance Indexing. Series: #Records and Records/s; axes in thousands (35-45) and millions (0-250); x-axis: days 1-25]
• Single Dell R220
• Xeon E3-1271v3 3.6 GHz, 4C/8T
• 32 GB RAM
• 2 x 6 TB NL-SAS
• Elasticsearch 2.3.2
• 10 shards / 0 replicas
• Hyper-threading off
• Web log indexing
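The talk does not say how records were fed to Elasticsearch. One common route for high-rate indexing is the Bulk API, which takes newline-delimited JSON: an action line followed by the document itself. The Python sketch below only builds such a payload and does not send it; the index name, the `url` field, and the helper name `build_bulk_payload` are assumptions, and older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json


def build_bulk_payload(records, index, doc_type=None):
    """Build a newline-delimited Bulk API body that indexes each record."""
    lines = []
    for record in records:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # only needed on older versions
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(record))           # document source line
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(
    [{"url": "mirror1.ku.ac.th/"}, {"url": "edge-mqtt.facebook.com"}],
    index="weblog-2015.11.03",  # hypothetical daily index name
)
```

The resulting string would be sent as the body of a POST to the `_bulk` endpoint.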
Search Performance
Search keyword: "face" against each daily log
Not yet optimized
[Chart: Search Performance and Hits]
Daily log   Search time (ms)   Hits
1           2.01               17,551
2           2.33               22,816
3           1.99               16,346
4           2.13               18,218
5           2.67               16,240
6           2.00               7,958
7           1.33               5,622
8           1.02               1,886
9           3.00               23,559
10          2.33               9,127
11          2.00               8,221
12          2.67               12,343
13          3.00               28,259
14          2.67               25,405
15          2.43               22,092
16          3.33               33,528
17          2.67               17,683
18          2.14               12,951
19          3.33               18,054
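A keyword search like the one measured here goes through Elasticsearch's REST API as a JSON query body. The Python sketch below builds such a request without sending it. The `match` query on a field called `url`, the index name, and both helper names are assumptions; the talk does not show the actual query, and whether "face" matches a host such as facebook.com depends on how the field is analyzed.

```python
import json


def build_keyword_search(keyword, field="url", size=10):
    """Return a query DSL body for a full-text match on one field."""
    return {"query": {"match": {field: keyword}}, "size": size}


def search_url(host, index):
    """Return the _search endpoint for one index, e.g. one daily log."""
    return f"http://{host}/{index}/_search"


body = build_keyword_search("face")
url = search_url("localhost:9200", "weblog-2015.11.03")  # hypothetical index
request_json = json.dumps(body)
```

The response to such a request includes a `took` value in milliseconds and the hit count under `hits`, which are the two kinds of numbers plotted on this slide.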
Kibana: Main Dashboard
Kibana: Per-IP Log
Kibana: Login Profile
Kibana: Concurrent Login View
Chapter III Playing with D3.js
Real Time Visualization with D3.js
• Data-Driven Documents (D3)
• JavaScript library for manipulating documents based on data
• Developed by Mike Bostock
https://d3js.org/
D3 Architecture
• Input data to build visualizations (JSON, CSV, …)
• Dynamic manipulation of HTML elements with JavaScript
[Diagram labels: node.js, socket.io]
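Since D3 consumes plain JSON, the server side mostly has to shape log data into the structure a layout expects. As a hedged illustration, the Python sketch below turns a list of requested hosts into the nested name/children/value form that D3's hierarchy layouts (such as a tree map) conventionally read. The function name, the root label, and the sample hosts are assumptions; the talk's own pipeline uses node.js and socket.io, which this sketch does not reproduce.

```python
import json
from collections import Counter


def build_treemap_data(hosts, root_name="web access"):
    """Count requests per host and nest them under a single root node."""
    counts = Counter(hosts)
    children = [
        {"name": host, "value": hits}
        for host, hits in counts.most_common()  # largest first
    ]
    return {"name": root_name, "children": children}


tree = build_treemap_data(
    ["a.example", "b.example", "a.example", "c.example", "a.example"]
)
tree_json = json.dumps(tree)  # what a browser-side D3 script would load
```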
Sample Gallery
Real-time makes an impression
Norse Live Attack Map: http://map.norsecorp.com/#/
Concurrent Login
IP Matrix Occupied
Tree Map Web Access
Traffic Connectivity
Chapter IV New Log Design
New Logging Architecture
[Architecture diagram. Labels: Network, Mirror packets, Logging Engine, Flow Log, Web Log, Login Log, Login/out event, Login, DHCP, RADIUS, Session Tracking & Accounting, Elasticsearch Real-time Indexing, Elasticsearch GUI/Analytics]
Logging Redesign
• User Identification
• Legal Logging
• Real-time Accounting
• User Session Control
• Traffic Analytics
• SIEM Support
• Performance Management
New Login Log Format
• Real-time logging, one file per day
• Fields
login_session_id user login_timestamp logout_timestamp mac_address ipv4 ipv6 agent_ip agent_type via_ip ipv4_byte_in ipv4_byte_out ipv4_pkt_in ipv4_pkt_out ipv6_byte_in ipv6_byte_out ipv6_pkt_in ipv6_pkt_out
• Sample Log
67686345 [email protected] 1467551484.163681 0 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - 0 0 0 0 0 0 0 0
67686346 [email protected] 1467551490.524125 0 - 192.0.5.5 - 203.0.113.1 login - 0 0 0 0 0 0 0 0
67686345 [email protected] 1467551484.163681 1467551833.754636 001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - 234342 423442 5522 6622 233456 22334 445 665
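The eighteen fields of the new login log lend themselves to a dictionary per record. The Python sketch below parses one line and derives the session length. It assumes whitespace-separated fields and that a logout_timestamp of 0 means the session is still open, which is inferred from the samples rather than stated in the talk. The helper names are hypothetical and the user name is a placeholder because the real one is redacted.

```python
FIELDS = [
    "login_session_id", "user", "login_timestamp", "logout_timestamp",
    "mac_address", "ipv4", "ipv6", "agent_ip", "agent_type", "via_ip",
    "ipv4_byte_in", "ipv4_byte_out", "ipv4_pkt_in", "ipv4_pkt_out",
    "ipv6_byte_in", "ipv6_byte_out", "ipv6_pkt_in", "ipv6_pkt_out",
]
COUNTERS = FIELDS[10:]  # the eight byte and packet counters


def parse_login_record(line):
    """Map one log line onto the field names, converting numeric columns."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["login_timestamp"] = float(record["login_timestamp"])
    record["logout_timestamp"] = float(record["logout_timestamp"])
    for name in COUNTERS:
        record[name] = int(record[name])
    return record


def session_seconds(record):
    """Return the session length, or None while logout_timestamp is 0."""
    if record["logout_timestamp"] == 0:
        return None  # assumption: 0 marks a session that is still open
    return record["logout_timestamp"] - record["login_timestamp"]


open_rec = parse_login_record(
    "67686345 user@example.org 1467551484.163681 0 001122334455 192.0.2.1 "
    "2001:db8::1 203.0.113.5 login - 0 0 0 0 0 0 0 0"
)
closed_rec = parse_login_record(
    "67686345 user@example.org 1467551484.163681 1467551833.754636 "
    "001122334455 192.0.2.1 2001:db8::1 203.0.113.5 login - "
    "234342 423442 5522 6622 233456 22334 445 665"
)
```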
New Web Log Format
• Real-time logging, one file per minute
• Fields
request_timestamp {flow link fields} {login link fields} {ip info fields} {tcp info fields} method host path referrer agent
• Sample Log
554455 1467551484.180000 67686345 [email protected] 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 GET www.domain.com /index.html - "Linux"
Traffic Flow Log
• Log committed periodically (configurable interval, 1 minute to 1 hour)
• Fields
flow_id flow_start_timestamp {segment info fields} {login link fields} {ip info fields} {tcp info fields} {tcp additional info fields} {tcp stat fields}
• Sample Log
554455 1467551484.180000 1467551484.180000 1467551492.954258 18 20 1628 25456 223344 f 67686345 [email protected] 1467551484.163681 4 192.0.2.1 198.51.100.1 tcp 5566 80 1 - -1428 1428 864 24522 3 17 2 2 0 30000 0 30000
Chapter V Lessons Learned
Lessons Learned
Elasticsearch offers very fast full-text search services
Index size may be 3x to 5x larger than the source data
Use Elasticsearch for search services, not for data archiving
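The 3x to 5x observation translates directly into a capacity estimate. The short Python sketch below applies that range to a raw log volume; the 1 TB input is a hypothetical figure chosen for illustration, not a measurement from the talk, and `estimate_index_size` is an assumed helper name.

```python
TB = 10**12  # decimal terabyte, in bytes


def estimate_index_size(raw_bytes, low=3.0, high=5.0):
    """Return the (low, high) index size range for a given raw log size."""
    return raw_bytes * low, raw_bytes * high


# Hypothetical example: 1 TB of raw logs.
smallest, largest = estimate_index_size(1 * TB)
```

Under these assumptions 1 TB of raw logs would need roughly 3 to 5 TB of index storage before replicas, which is one reason to keep the raw files as the archive.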
Lessons Learned
Logstash: A powerful tool for manipulating logs
Kibana: Simple and useful for visualizing data
Lessons Learned
D3 pros: Flexible, fascinating visualization
D3 cons: Low-level, steep learning curve, CPU intensive
Lessons Learned
Combination of lawful logging, security information and event management (SIEM), and accounting
Thank you for your attention
Q & A Time
Core Log Design and Development: Kasom Koth-Arsa
Web GUI Development: Jautuporn Chuchuay, Peerapol Boonthaganon
Web/Elasticsearch Development: Sataporn Techaaramwong
Kibana Development: Peerapong Thongpubeth, Jiradech Sirijantadilok
D3 Development: Poomipat Thongudom, Nichapat Nattee
Project Coordinator: Surachai Chitpinijyol
Project Director: Surasak Sanguanpong
Special thanks to Kasetsart Office of Computer Services for supporting traffic data
Sunset at Narita Airport