bash-4.3$ echo 'PCAP cant scale'
PCAP cant scale
bash-4.3$ echo 'PCAP cant scale' | sed 's/cant/does/'
PCAP does scale
FloCon 2017 – Jan 11 2017
bash-4.3$ whoami
Erik Waher (Security Engineer @ Facebook)
• Netflow tells only part of the story
• IDS only catches what it knows
• Incident Response can often see that network traffic occurred, but without PCAP it will never know what was said
Problem: IR needs network visibility
Specifically, they need indicators of compromise (IOCs) to track down badness
• Physical space limitations in the DC/POP/Coat closet
• Retention can’t keep up with incident “time to know”
• High throughput requires multiple hosts and infra ($$) to
load balance capture traffic
• Commercial solutions are expensive ($$) and proprietary
Difficulty to Solve: High
Solution: build something better
• Flexible storage size
• Low cost
• High throughput
• Network-accessible storage
• PCAP as a service
We set out to build:
• 1 PB – 3.6 PB with room to grow (rough retention math sketched below)
• OCP hardware (~$500 per TB)
• Speed: ~5 GB/s (44 Gbps) per host
• NFS backed by GlusterFS
• PCAP as a service
• Something anyone could build themselves
We built:
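A back-of-the-envelope retention figure follows from those two numbers; this is not from the slides and treats the quoted per-host peak as a sustained tier-wide write rate, which real (bursty) traffic will not do:

# 44 Gb/s ≈ 5.5 GB/s ≈ 475 TB/day, so 1,300 TB of usable storage holds
# roughly 2.7 days of capture at continuous line rate
echo "scale=1; 1300 / (44 / 8 * 86400 / 1000)" | bc   # prints 2.7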
Demo
• Network
• PCAP server
• Storage Tier
How does all that work?
[Diagram: Corporate Network -> TAP or SPAN -> PCAP Server -> Storage Tier (54 GFS/NFS hosts)]
[Diagram: Storage Tier of 54 bricks, each running NFS/GFS; one brick expanded to show its NFSd and GlusterFSd daemons]
[Diagram: Network -> SPAN -> Anue -> four capture links (eth1 x4) into the Pandion host (Napatech Accelerator, Capture2Disk, capturing eth1-eth4); NFS traffic leaves via eth_team0 (eth1-eth6) to the Rack Switch]
pcap-rack-switch1# sh port-channel load-balance
Port Channel Load-Balancing Configuration:
System: source-dest-ip
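The matching configuration is a single global line; this is a sketch assuming an NX-OS-style rack switch, so check your platform's exact syntax:

pcap-rack-switch1(config)# port-channel load-balance ethernet source-dest-ip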
Network
#ifcfg-team0
DEVICETYPE=Team
...
TEAM_CONFIG='{"runner": { "name": "loadbalance" }, "tx_hash": [ "eth", "l3", "l4" ], "tx_balancer": { "name": "basic" } }'
...
#ifcfg-eno1
DEVICETYPE=TeamPort
...
TEAM_MASTER=team0
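Once the team is up, the runner and hash can be verified with the standard teamd tooling (a quick sketch; assumes the teamd/libteam utilities are installed):

# Show the active runner, tx hash, and per-port state for team0
teamdctl team0 state
# Full JSON state dump if more detail is needed
teamdctl team0 state dump
# List member ports and link status
teamnl team0 ports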
• 4 SFP+ capture interfaces to Napatech accelerator
• PCAP writer and reader
• NIC Teaming: 6 interfaces (SFP+) for NFS traffic
PCAP Server: Napatech PandionFlex
[Diagram: inside the Pandion host, Capture2Disk takes packets from the Napatech Accelerator (capture eth1-eth4) and writes PCAP to /mnt/Capture/Brick_N and its index to /mnt/Index/File_# through the VFS; eth_team0 (eth1-eth6) carries the NFS traffic to a Brick running NFS and GlusterFS]
• Bricks run NFSd and GFSd
• Brick = 30 x 4 TB HDDs, ~100 TB usable
• Storage Tier = 54 bricks = 1.3 PB usable
• GFS performs file hashing, load balancing, and failover (illustrative volume setup sketched below)
Storage Tier: GlusterFS with NFS and Open Compute Project Hardware
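For illustration, a much smaller version of such a volume could be assembled like this; hostnames, brick paths, and counts are placeholders, not the production layout:

# 3-way replicated, distributed GlusterFS volume across six placeholder bricks
gluster volume create pcap replica 3 \
  brick01:/data/brick1/pcap brick02:/data/brick1/pcap brick03:/data/brick1/pcap \
  brick04:/data/brick1/pcap brick05:/data/brick1/pcap brick06:/data/brick1/pcap
gluster volume start pcap

# Mount on the capture host over NFSv3 (served by Gluster's built-in NFS server)
mount -t nfs -o vers=3,proto=tcp brick01:/pcap /mnt/Capture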
Storage Tier
[Diagram: 54 bricks, each running NFS/GFS]
GlusterFS Introduction
[Diagram: GlusterFS high-level stack: FUSE client, NFS, and GF API access pass through the Distribute Translator, which fans out to Replication Translators over the bricks]
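On a running volume this distribute-over-replicate layout is visible in the volume type; illustrative output for a hypothetical 54-brick volume named pcap:

gluster volume info pcap
#   Volume Name: pcap
#   Type: Distributed-Replicate
#   Number of Bricks: 18 x 3 = 54
#   Transport-type: tcp
#   Bricks: Brick1: brick01:/data/brick1/pcap  (and so on, one line per brick)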
[Diagram: Replicate Translator: client IO is written to Brick 1, Brick 2, ... Brick n; an IO journal records updates; stale bricks heal from the "wise" bricks]
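Pending self-heals against those journals can be inspected from the CLI (a sketch, assuming a volume named pcap):

# Entries each brick still needs to heal
gluster volume heal pcap info
# Just the per-brick counts
gluster volume heal pcap statistics heal-count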
• Packets arrive, are copied in blocks to a RAM buffer, an index is written, and the data is compressed into 1 GB chunks written to NFS (approximated with stock tools below)
• Multiple buffers exist: on the accelerator and in a host RAM disk
• PCAP writes to an NFS export
• GFS 3x replication requirement limits throughput and storage
• Multi-stream across many NFS paths for throughput
• Index locally for now; moving to Gluster in the next iteration
How it all works
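The production writer is Napatech's Capture2Disk, but the same rotate-compress-to-NFS flow can be approximated with stock tcpdump for illustration (the interface and mount point are placeholders):

# Capture full packets on eth1, rotate at ~1 GB (-C is in millions of bytes),
# gzip each completed chunk, and write onto the NFS-mounted brick
tcpdump -i eth1 -n -s 0 -C 1000 -z gzip -w /mnt/Capture/Brick_N/chunk.pcap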
Things to watch out for
• Long Fat Networks (LFNs)
• Bandwidth-delay product (sysctl sketch below)
• Multi-homing PCAP hosts
• Playing nice with your network
• File system best practices (directory structures, reads)
• NFS host failures
• PCAP buffers
• Microbursts
• Fault tolerance: Gluster servers, PCAP services (reading and writing)
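The LFN/bandwidth-delay-product issue is that default TCP buffers cap a single NFS stream well below link rate; an illustrative tuning sketch (values are examples, not the production settings):

# Bandwidth-delay product: 10 Gb/s x 2 ms RTT = 2.5 MB must be in flight
# for one stream to fill the pipe, so raise the per-socket buffer ceilings
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
sysctl -w net.ipv4.tcp_wmem="4096 65536 67108864"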
• Didn't want to create or support a PCAP application
• Existing Napatechs already in the fleet, building features on the accelerator cards
• Facebook GlusterFS active development branch: https://github.com/gluster/glusterfs/tree/release-3.8-fb
• NFS = FUSE = easy to take existing products to network storage
• Network storage provides flexibility to deploy anywhere
Why Napatech, Why Gluster? Why X?
• Probably internal HDFS infra
• Map/Reduce function
• Every time you touch a file, can you improve it?
• Slice packets, run YARA sigs over them, generate new metadata?
• Publisher/subscriber model
• PCAP service writes without needing to read
• Readers can determine whether a file is ready to read without querying the writer (rename convention sketched below)
• Decouples reader and writer
Next Generation Architecture
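One common way to get that "ready to read without asking the writer" property is an atomic write-then-rename convention; a sketch of the idea (write_chunk, paths, and names here are hypothetical, not the design from the talk):

# Writer streams into a hidden temp name, then renames atomically once the
# chunk is complete; readers only glob visible *.pcap.gz names, so any file
# they can see is finished and safe to read.
tmp="/mnt/Capture/Brick_N/.chunk_0001.pcap.gz.tmp"
final="/mnt/Capture/Brick_N/chunk_0001.pcap.gz"
write_chunk > "$tmp" && mv "$tmp" "$final"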