PCAP Graphs for Cybersecurity and System Tuning

Page 1: PCAP Graphs for Cybersecurity and System Tuning

© Cloudera, Inc. All rights reserved.

Network Traffic Analysis of Hadoop Clusters

Understand the common usage patterns and identify typical / atypical workloads

Mirko Kämpf, Solutions Architect
@semanpix

Marton Balassi, Solutions Architect
@MartonBalassi, mbalassi@cloudera.com

Page 2: PCAP Graphs for Cybersecurity and System Tuning

Outline

• Motivation
• PCAP data capture
• Data Analysis with CDH
• Data Analysis with Gephi
• Summary

Page 3: PCAP Graphs for Cybersecurity and System Tuning

Understand the network load of a Hadoop cluster

• Network communication is often the limiting factor in distributed computing

• Storing files on the DFS, heartbeats, and data processing all leave a network footprint

• The current standard visual tools aggregate data on the host level

• Intrusion detection is critical in enterprise systems (Apache Spot)

Page 4: PCAP Graphs for Cybersecurity and System Tuning

PCAP Data Capture

• Packet capture (PCAP), the standard API for capturing network traffic
• Implementations: libpcap for UNIX, WinPcap for Windows
• Multiple analysis tools: tcpdump, nmap, Wireshark, Snort, amongst others

Our approach:
• Used pcapy, the Python pcap extension, for capturing
• The capturing is initiated on the individual machines
• The captured data is written to the local fs in Avro format while the capturing is active
• Focus on network structure; packet data is ignored

[Figure: Avro schema for PCAP data]
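The schema figure did not survive in this transcript, so the following is only a sketch of what "structure without packet data" could look like. The record name `PacketEvent`, its field names, and the helper `parse_frame` are assumptions made for illustration, not the authors' schema or script. The parser uses only the standard library and expects a raw Ethernet frame such as a capture library would hand to a callback.

```python
import socket
import struct

# Hypothetical Avro-style record schema (names are assumptions).
# It keeps endpoints, time and size, and deliberately has no payload field.
PACKET_SCHEMA = {
    "type": "record",
    "name": "PacketEvent",
    "fields": [
        {"name": "ts", "type": "double"},
        {"name": "src_ip", "type": "string"},
        {"name": "src_port", "type": "int"},
        {"name": "dst_ip", "type": "string"},
        {"name": "dst_port", "type": "int"},
        {"name": "length", "type": "int"},
    ],
}


def parse_frame(ts, frame):
    """Return a PacketEvent dict for an Ethernet/IPv4 TCP or UDP frame, else None."""
    if len(frame) < 34:  # 14 bytes Ethernet + 20 bytes minimal IPv4 header
        return None
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:  # IPv4 only
        return None
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    protocol = frame[23]
    if protocol not in (6, 17):  # 6 = TCP, 17 = UDP
        return None
    l4 = 14 + ihl  # offset of the transport header
    if len(frame) < l4 + 4:
        return None
    src_port, dst_port = struct.unpack("!HH", frame[l4:l4 + 4])
    return {
        "ts": float(ts),
        "src_ip": socket.inet_ntoa(frame[26:30]),
        "src_port": src_port,
        "dst_ip": socket.inet_ntoa(frame[30:34]),
        "dst_port": dst_port,
        "length": len(frame),  # size on the wire; the payload itself is dropped
    }
```

Each returned dict matches the field list in `PACKET_SCHEMA`, so it could be handed to an Avro writer of your choice; that step is left out here.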

Page 5: PCAP Graphs for Cybersecurity and System Tuning

Apache Spot

• Formerly known as ONI (Open Network Insight)
• Initiative of Cloudera, Intel, and partners
• Focus on cybersecurity for the Hadoop domain
• Common data formats for advanced analytics
• Reliable and robust data (ingestion) pipelines
• Repeatable and reliable analysis and modeling procedures
• Apache Spot uses a topic-model (LDA) approach to classify traffic
• We focus on clustering and visualization of typical workloads in this talk instead.

Page 6: PCAP Graphs for Cybersecurity and System Tuning

Our Activities

We implemented multiple ‘typical workloads’ and observed their behavior.

• Create reference data sets (PCAP data):
  • Scenario A: TeraSort (big batch workload)
  • Scenario B: HDFS PUT, GET; HUE (interactive workload)
  • Scenario C: Idle cluster (vacation time)
  • Scenario D: Kafka => Spark => HDFS (realistic production workload)
  • Scenario E: Twitter => Spark => HDFS (realistic production workload)

Page 7: PCAP Graphs for Cybersecurity and System Tuning

How it Works ...

• We collect raw data in Avro format, using the Snaffer (pcapy) script.
• We transform the events to networks, using Hive (SQL API on Hadoop).
• We analyze and visualize the networks using Gephi (open graph viz platform).
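The "events to networks" step is essentially a GROUP BY over source and destination. The talk does this in Hive; the sketch below is a plain-Python stand-in for the same aggregation, assuming each event is a dict with hypothetical `src_ip` and `dst_ip` keys. The output uses the `Source`, `Target`, `Weight` column headers that Gephi's spreadsheet importer recognizes for edge tables.

```python
import csv
import io
from collections import Counter


def events_to_edges(events):
    """Count packets per directed (src_ip, dst_ip) pair."""
    weights = Counter()
    for ev in events:
        weights[(ev["src_ip"], ev["dst_ip"])] += 1
    return weights


def write_edge_csv(weights, out):
    """Write a weighted edge list to a text stream as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (src, dst), weight in sorted(weights.items()):
        writer.writerow([src, dst, weight])


if __name__ == "__main__":
    demo = [
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
        {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"},
        {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1"},
    ]
    buffer = io.StringIO()
    write_edge_csv(events_to_edges(demo), buffer)
    print(buffer.getvalue())
```

Counting packets is only one possible weight; summing a byte-length field instead would weight links by volume.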

Page 8: PCAP Graphs for Cybersecurity and System Tuning

Initial Results: TeraSort

Page 9: PCAP Graphs for Cybersecurity and System Tuning

Initial Results: Twitter Collect

Page 10: PCAP Graphs for Cybersecurity and System Tuning

Let’s have a look inside ...

• Use a higher resolution: include ports, not hosts only
• Use time-dependent analysis: track timestamps per packet
• Combine time series analysis and graph analysis: use Gephi and Apache Spark
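To make the two refinements concrete, here is a minimal sketch that uses `host:port` strings as graph nodes and buckets packets into fixed time bins. The event keys (`ts`, `src_ip`, `src_port`, `dst_ip`, `dst_port`) and the 10-second default bin width are assumptions chosen for illustration.

```python
from collections import defaultdict


def endpoint(ip, port):
    """Port-level node id, e.g. '10.0.0.1:8020'."""
    return f"{ip}:{port}"


def binned_port_edges(events, bin_seconds=10.0):
    """Map time-bin index -> {(src endpoint, dst endpoint): packet count}."""
    bins = defaultdict(lambda: defaultdict(int))
    for ev in events:
        index = int(ev["ts"] // bin_seconds)
        edge = (
            endpoint(ev["src_ip"], ev["src_port"]),
            endpoint(ev["dst_ip"], ev["dst_port"]),
        )
        bins[index][edge] += 1
    # Convert to plain dicts so the result is easy to inspect or serialize.
    return {index: dict(edges) for index, edges in bins.items()}
```

Each bin is a small weighted graph of its own, which is what a time-dependent (temporal network) view needs; summing all bins gives back the static weighted graph.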

Page 11: PCAP Graphs for Cybersecurity and System Tuning

???

All ports on all hosts, used during an experiment …

Page 12: PCAP Graphs for Cybersecurity and System Tuning

Hosts & Ports in a 5 Node Hadoop Cluster

Static network:
• 1,535 nodes
• 2,997 edges

Network clusters represent communication ports on individual hosts (the bigger nodes in the center of each star) forming a Hadoop cluster.

This static view shows all potential communication endpoints; no activity yet.

Page 13: PCAP Graphs for Cybersecurity and System Tuning

Weighted Communication Links during TeraSort

Communication network:
• 1,535 nodes
• 34,351 edges

Communication links represent real communication between ports on individual hosts in a Hadoop cluster.

This dynamic view shows all real communication endpoints and allows a topological analysis.

Page 14: PCAP Graphs for Cybersecurity and System Tuning

PageRank & Eigenvector Centrality

Topological Properties

Node sizes represent PageRank of a node based on Communication links.

Node colors still reflect the host on which the communication endpoints are active.

Node sizes represent Eigenvector Centrality based on Communication links.

Node colors still reflect the host on which the communication endpoints are active.
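In the talk these measures come from Gephi. For readers who want to see what the node sizes encode, the following is a minimal power-iteration sketch of weighted PageRank only (eigenvector centrality is not shown). The function name, the damping factor of 0.85, and the uniform handling of nodes without outgoing links are conventional choices assumed here, not settings taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Weighted PageRank. edges: {(src, dst): weight}. Scores sum to 1."""
    edges = {edge: w for edge, w in edges.items() if w > 0}
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

Feeding it a port-level edge dict with packet counts as weights ranks endpoints that receive traffic from many well-connected endpoints highest, which is why service ports of central daemons tend to stand out.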

Page 15: PCAP Graphs for Cybersecurity and System Tuning

Which are the most central nodes?

NameNode

ResourceManager (internal) t.

Interesting ports: 23000, 23001, 7051, 8022

Most active Server:

172.28.209.73

Page 16: PCAP Graphs for Cybersecurity and System Tuning

Time Evolution of Dynamic Communication Processes

Host centric: Host = Server

Cluster centric: Cluster = Functional Layer

Page 17: PCAP Graphs for Cybersecurity and System Tuning

Re-organization: Segregation by Components

• Communication components are distributed across servers.
• Server-centric analysis doesn’t help.
• Communication layers can be interdependent.
• Dependencies are not visible in the event data set.
• Our approach:
  • (1) Reconstruct the communication structure.
  • (2) Segregate the communication activity by component / subsystem.
  • (3) Finally, reconstruct the functional network of interacting components.
• This allows a dependency analysis for components, and hopefully also system tuning.
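The slides do not say how step (2) is implemented. One simple possibility, shown here purely as an assumption, is to ignore which host an endpoint lives on and group endpoints that talk to each other into connected components, using a small union-find. The function name `components` and the `host:port` endpoint strings are hypothetical.

```python
def components(edges):
    """Group endpoints into connected components (edges treated as undirected)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort for a deterministic order of the resulting groups.
    return sorted(groups.values(), key=lambda group: sorted(group))
```

On real traffic a few busy links can merge everything into one large component, so in practice one would probably drop low-weight edges first or run this per time bin; both are tuning decisions outside what the talk specifies.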

Page 18: PCAP Graphs for Cybersecurity and System Tuning

!!! WARNING !!!

Absolute values can be misleading.

Component Centric View

• port <=> host links removed

• Temporal networks lead to dynamic clusters

Page 19: PCAP Graphs for Cybersecurity and System Tuning

Central vs. External Components

Impact of the Selected Layout Algorithm

Page 20: PCAP Graphs for Cybersecurity and System Tuning

Two Experiments: TeraSort & Twitter Collect

[Figure; y-axis: Number of packets]

Page 21: PCAP Graphs for Cybersecurity and System Tuning

5 Selected Channels during TeraSort

[Figure; y-axis: Number of packets. Channel labels: NameNode, NodeManager, YARN App Containers (three channels). Marked phases: Job A, Job B, Job C]

Replication factor: Job A: 3, Job B: 1, Job C: 5

Page 22: PCAP Graphs for Cybersecurity and System Tuning

5 Selected Channels during Twitter Collect

[Figure; y-axis: Number of packets. Marked phases: Active, Idle]

Page 23: PCAP Graphs for Cybersecurity and System Tuning

Observations

• Both experiments show fundamental differences:
  • Only one active component vs. multiple competing communication channels.
• Common observations:
  • Background activity of an idle cluster shows periodic spikes (no surprise).
  • Different fluctuation levels on different channels.
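The periodic background spikes can be checked numerically rather than by eye. A minimal sketch, assuming a channel is given as a list of packet counts per time bin: compute the sample autocorrelation and pick the lag where it peaks. The function names and the choice of estimator are assumptions; the slides do not state how periodicity was assessed.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric sequence at the given lag."""
    n = len(series)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series carries no periodic signal
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))
```

For a heartbeat-like series with a spike every fifth bin, `dominant_period` returns 5. Comparing the peak value across channels also gives a rough handle on the different fluctuation levels noted above.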

Page 24: PCAP Graphs for Cybersecurity and System Tuning

What’s next?

More experiments & data collection:
• Ideal scenarios
• Realistic workloads

Helpful visualization:
• Provide a real-time view of ongoing network activity using the Gephi streaming plugin (as shown in the Twitter Streaming demo).

Better analysis:
• Classify the components automatically …
  • Requires: studying activity time series, e.g., using neural networks or non-linear statistics.
• Understand the component structure and behavior over time …
  • Allows us to find anomalies in the component structure and behavioral patterns.

Page 25: PCAP Graphs for Cybersecurity and System Tuning

Big Thanks To

Clouderans supporting the project ...

Alexander Bartfeld

Anton Vukovic

Rafael Arana

Zoltan Kiss

Nehme Tohme

Page 26: PCAP Graphs for Cybersecurity and System Tuning

Thank you

@semanpix

@MartonBalassi, mbalassi@cloudera.com