Page 1

+bash-4.3$ echo 'PCAP cant scale'
PCAP cant scale

+bash-4.3$ echo 'PCAP cant scale' | sed 's/cant/does/'
PCAP does scale

FloCon 2017 – Jan 11 2017

+bash-4.3$ whoami
Erik Waher (Security Engineer @ Facebook)

Page 2

Problem: IR needs network visibility

• Netflow tells only part of the story

• IDS only catches what it knows

• Incident Response can often see that network traffic occurred, but will never know what was said without PCAP

Specifically, they need indicators of compromise (IOCs) to track down badness

Page 3

Difficulty to Solve: High

• Physical space limitations in the DC/POP/coat closet

• Retention can’t keep up with incident “time to know”

• High throughput requires multiple hosts and infra ($$) to load balance capture traffic

• Commercial solutions are expensive ($$) and proprietary

Page 4

Solution: build something better

We set out to build:
• Flexible storage size
• Low cost
• High throughput
• Network-accessible storage
• PCAP as a service

We built:
• 1PB – 3.6PB with room to grow (rough retention math below)
• OCP hardware (~$500 per TB)
• Speed: ~5 GB/s (~44 Gb/s) per host
• NFS backed by GlusterFS
• PCAP as a service
• Something anyone could build themselves
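A rough retention check at these numbers (my own arithmetic, not from the slides, assuming capture sustains the full ~5 GB/s write rate): 1.3PB of usable storage holds roughly three days of full-rate capture.

$ echo $(( 1300000 / 5 ))   # 1.3PB expressed in GB, divided by 5 GB/s = seconds
260000                      # ≈ 72 hours ≈ 3 days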

Page 5

Demo

Page 6

How does all that work?

• Network
• PCAP server
• Storage Tier

[Diagram: Corporate Network → TAP or SPAN → Network → PCAP Server → Storage Tier]

Page 7

[Architecture diagram: SPAN → Anue packet broker → capture interfaces on the Pandion PCAP server (Napatech Accelerator → Capture2Disk) → network → Storage Tier of 54 GFS/NFS hosts; each brick runs NFSd and GlusterFSd]

Page 8

Network

[Diagram: SPAN → Anue → four capture ports (Capture eth1–eth4) into the Pandion host's Napatech Accelerator and Capture2Disk; host interfaces eth1–eth6 teamed as eth_team0 up to the rack switch]

pcap-rack-switch1# sh port-channel load-balance
Port Channel Load-Balancing Configuration:
System: source-dest-ip

#ifcfg-team0
DEVICETYPE=Team
...
TEAM_CONFIG='{"runner": { "name": "loadbalance" }, "tx_hash": [ "eth", "l3", "l4" ], "tx_balancer": { "name": "basic" } }'
...
#ifcfg-eno1
DEVICETYPE=TeamPort
...
TEAM_MASTER=team0
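Once the team is up, the runner and per-port state can be verified from the host with the stock teamd tooling (nothing deployment-specific assumed here):

$ teamdctl team0 state              # shows the loadbalance runner and each port's link
$ ip -br link show master team0     # lists the member interfaces (eth1–eth6 above)

Hashing on eth/l3/l4 on the host, and source-dest-ip on the switch, keeps each flow pinned to one link so packets within a stream are not reordered.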

Page 9

PCAP Server: Napatech PandionFlex

• 4 SFP+ capture interfaces to Napatech accelerator
• PCAP writer and reader
• NIC teaming: 6 interfaces (SFP+) for NFS traffic

[Diagram: Capture eth1–eth4 → Napatech Accelerator → Capture2Disk → VFS → /mnt/Capture/Brick_N (NFS to a GlusterFS brick) and /mnt/Index/File_#; eth1–eth6 teamed as eth_team0]
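A sketch of how a capture path like /mnt/Capture/Brick_N could be mounted (the hostname and export name are hypothetical; Gluster's built-in NFS server speaks NFSv3):

$ mount -t nfs -o vers=3,proto=tcp gfs-brick01:/capture /mnt/Capture/Brick_1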

Page 10

Storage Tier: GlusterFS with NFS and Open Compute Project hardware

• Bricks run NFSd and GFSd
• Brick = 30 x 4TB HDD ≈ 100TB usable
• Storage Tier = 54 bricks = 1.3PB usable (assembly sketch below)
• GFS performs file hashing, load balancing and failover

[Diagram: Storage Tier of 54 bricks, each running NFS/GFS]
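For reference, a replica-3 volume of this shape can be assembled with the stock Gluster CLI; the hostnames and brick paths below are hypothetical stand-ins (54 bricks in the real tier, 6 shown):

$ gluster volume create capture replica 3 \
    gfs01:/bricks/b0 gfs02:/bricks/b0 gfs03:/bricks/b0 \
    gfs04:/bricks/b0 gfs05:/bricks/b0 gfs06:/bricks/b0
$ gluster volume start capture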

Page 11

GlusterFS Introduction

High Level

[Diagram: FUSE client and NFS sit on the GF APIs; IO passes through the Distribute Translator, which fans out to paired Replication Translators in front of the bricks]
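That distribute-over-replicate stack is what the CLI reports for such a volume (volume name carried over from the earlier hypothetical sketch; output shape from a stock Gluster install):

$ gluster volume info capture | grep -E 'Type|Number of Bricks'
Type: Distributed-Replicate
Number of Bricks: 2 x 3 = 6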

Page 12

GlusterFS Introduction: Replicate Translator

[Diagram: client IO fans out to Brick 1, Brick 2 … Brick n; each write updates an IO “journal” on the bricks; a brick that fell behind heals from the “wise” bricks]
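The heal state described above is visible through the standard Gluster CLI (volume name again from the earlier sketch):

$ gluster volume heal capture info         # files pending heal, listed per brick
$ gluster volume heal capture statistics   # self-heal activity counts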

Page 13

How it all works

• Packets arrive, get copied in blocks to a RAM buffer, an index is written, then the data is compressed into 1GB chunks and written to NFS (a rough stand-in is sketched below)
• Multiple buffers exist: accelerator, host RAM disk
• PCAP writes to an NFS export
• GFS 3x replication requirement limits throughput & storage
• Multi-stream across many NFS paths for throughput
• Index locally for now
• Moving to Gluster in next iteration
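The real writer is Napatech's Capture2Disk, but the rotate/compress/write-to-NFS shape can be imitated with stock tcpdump to get a feel for the flow (interface and mount path assumed from earlier slides):

$ tcpdump -i eth1 -w '/mnt/Capture/Brick_1/trace_%s.pcap' -G 3600 -C 1024 -z gzip
  # -C 1024 rotates at ~1GB chunks, -G 3600 also rotates hourly,
  # -z gzip compresses each finished chunk after rotation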

Page 14

Things to watch out for

• Long Fat Networks (LFNs) and the Bandwidth Delay Product (tuning sketch below)
• Multi-homing PCAP hosts
• Playing nice with your network
• File system best practices (directory structures, reads)
• NFS host failures
• PCAP buffers
• Microbursts
• Fault tolerance: Gluster servers; PCAP services (reading and writing)
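The bandwidth-delay product determines how much TCP window the NFS streams need on a long fat network. A worked example with assumed numbers (10 Gb/s path, 10 ms RTT) and the standard Linux knobs:

$ echo $(( 10 * 1000**3 / 8 / 100 ))   # 10 Gb/s x 0.01 s RTT, in bytes
12500000
$ sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'   # allow windows up to ~16MB
$ sysctl -w net.ipv4.tcp_wmem='4096 65536 16777216'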

Page 15

Why Napatech, Why Gluster? Why X?

• Didn’t want to create or support a PCAP application
• Existing Napatechs already in fleet; building features on accelerator cards
• Facebook GlusterFS active development branch: https://github.com/gluster/glusterfs/tree/release-3.8-fb
• NFS = FUSE = easy to take existing products to network storage
• Network storage provides flexibility to deploy anywhere

Page 16

Next Generation Architecture

• Probably internal HDFS infra
• Map/Reduce function
• Every time you touch a file, can you improve it?
• Slice packets, run YARA sigs over them? Generate new meta?
• Publisher/subscriber model (sketched below)
• PCAP service writes without needing to read
• Readers can determine if a file is ready to read without querying the writer
• De-couples reader and writer
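One cheap way to get the “readers never query the writer” property on a shared filesystem is publish-by-atomic-rename; a sketch under that assumption (the paths and the script itself are mine, not the talk's):

#!/bin/bash
# publish_chunk.sh FILE — writer side of a filesystem pub/sub convention.
# rename(2) is atomic within one filesystem, so readers polling ready/
# only ever see complete files and never have to ask the writer.
ROOT=/mnt/Capture                                    # NFS mount from earlier slides
mkdir -p "$ROOT/.staging" "$ROOT/ready"
mv "$1" "$ROOT/.staging/$(basename "$1")"            # stage out of readers' view
mv "$ROOT/.staging/$(basename "$1")" "$ROOT/ready/$(basename "$1")"   # publish

# readers simply poll the ready/ directory:
#   ls /mnt/Capture/ready/*.pcap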

Page 17

Thanks

• Questions?
• https://github.com/gluster/glusterfs/tree/release-3.8-fb
• [email protected]