Storage security in a critical enterprise OpenStack environment Danny Al-Gaaf (Deutsche Telekom AG), Sage Weil (Red Hat) OpenStack Summit 2015 - Vancouver




Page 1: Storage security in a critical enterprise OpenStack environment

Storage security in a critical enterprise OpenStack environment

Danny Al-Gaaf (Deutsche Telekom AG), Sage Weil (Red Hat)
OpenStack Summit 2015 - Vancouver

Page 2:

Overview

● Secure NFV cloud at DT
● Attack surface
● Proactive countermeasures
  ○ Setup
  ○ Vulnerability prevention
  ○ Breach mitigation
● Reactive countermeasures
  ○ 0-days, CVEs
  ○ Security support SLA and lifecycle
● Conclusions


Page 3:

Secure NFV Cloud @ DT

Page 4:

NFV Cloud @ Deutsche Telekom

● Datacenter design
  ○ BDCs
    ■ few, but classic DCs
    ■ high SLAs for infrastructure and services
    ■ for private/customer data and services
  ○ FDCs
    ■ many, but small
    ■ close to the customer
    ■ lower SLAs, can fail at any time
    ■ services:
      ● spread over many FDCs
      ● failures are handled by the services, not the infrastructure


Page 5:

High Security Requirements

● Multiple security placement zones (PZs)
  ○ e.g. EHD, DMZ, MZ, SEC, Management
  ○ TelcoWG “Security Segregation” use case
● Separation required for:
  ○ compute
  ○ networks
  ○ storage
● Protect against many attack vectors
● Enforced and reviewed by the security department
● Run telco core services on OpenStack/KVM/Ceph

Page 6:

Ceph and OpenStack


Page 7:

Ceph Architecture


Page 8:

Solutions for telco services

● Separation between security zones needed
● Physical separation
  ○ Large number of clusters (>100)
  ○ Large hardware demand (compute and storage)
  ○ High maintenance effort
  ○ Less flexibility
● RADOS pool separation
  ○ Much more flexible
  ○ Efficient use of hardware
● Question:
  ○ Can we get the same security as with physical separation?


Page 9:

Placement Zones

● Separate RADOS pool(s) for each security zone
  ○ Limit access using Ceph capabilities
● OpenStack AZs as PZs
● Cinder
  ○ Configure one backend/volume type per pool (with its own key)
  ○ Need to map between AZs and volume types via policy
● Glance
  ○ Lacks separation between the control and compute/storage layers
  ○ Separate read-only vs. management endpoints
● Manila
  ○ Currently not planned for production use with CephFS
  ○ May use RBD via NFS
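The “one Cinder backend per pool, each with its own key” setup can be sketched as a cinder.conf fragment. The backend names, pools, users, and secret UUIDs below are illustrative assumptions, not the actual DT configuration:

```ini
# Hypothetical cinder.conf: one RBD backend per placement zone,
# each pool accessed with its own CephX user/key
[DEFAULT]
enabled_backends = rbd-pz-dmz, rbd-pz-sec

[rbd-pz-dmz]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-pz-dmz
rbd_pool = volumes-dmz
rbd_user = cinder-dmz
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337

[rbd-pz-sec]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-pz-sec
rbd_pool = volumes-sec
rbd_user = cinder-sec
rbd_secret_uuid = 8a3c1d72-51fe-4b0a-9e2d-0f6c4a1b2c3d
```

A volume type is then tied to each backend via its `volume_backend_name` extra spec, so tenants in one zone cannot schedule volumes onto another zone’s pool.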


Page 10:

Attack Surface

Page 11:

RadosGW attack surface

● S3/Swift
  ○ Network access to the gateway only
  ○ No direct consumer access to other Ceph daemons
● Single API attack surface


Page 12:

RBD attack surface

● Protected by the hypervisor block layer
  ○ No network access or CephX keys needed at the guest level
● Issue:
  ○ the hypervisor is software and therefore not 100% secure…
    ■ e.g., VENOM!


Page 13:

Host attack surface

● If KVM is compromised, the attacker ...
  ○ has access to neighbor VMs
  ○ has access to local Ceph keys
  ○ has access to the Ceph public network and Ceph daemons
● Firewalls, deep packet inspection (DPI), ...
  ○ partly impractical due to the protocols used
  ○ performance and cost implications
● Bottom line: Ceph daemons must resist attack
  ○ C/C++ is harder to secure than e.g. Python
  ○ Homogeneous: if one daemon is vulnerable, all in the cluster are!
  ○ Risk of denial of service


Page 14:

Network attack surface

● Client/cluster sessions are not encrypted
  ○ A sniffer can recover any data read or written
● Sessions are authenticated
  ○ An attacker cannot impersonate clients or servers
  ○ An attacker cannot mount man-in-the-middle attacks


Page 15:

Denial of Service

● Scenarios
  ○ Submit many / large / expensive IOs
    ■ use qemu IO throttling!
  ○ Open many connections
  ○ Use flaws to crash Ceph daemons
  ○ Identify non-obvious but expensive features of the client/OSD interface
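The qemu IO throttling mentioned above is exposed through libvirt’s per-disk `<iotune>` element (typically driven by Nova flavor extra specs). A sketch of a throttled RBD disk; the pool/volume name and the limits are illustrative assumptions:

```xml
<!-- Example libvirt disk with QEMU I/O throttling as a safe upper bound.
     Limits: 500 IOPS and 100 MiB/s total; tune per flavor. -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='volumes-dmz/volume-1234'/>
  <target dev='vda' bus='virtio'/>
  <iotune>
    <total_iops_sec>500</total_iops_sec>
    <total_bytes_sec>104857600</total_bytes_sec>
  </iotune>
</disk>
```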


Page 16:

Proactive Countermeasures

Page 17:

Deployment and Setup

● Network
  ○ Always use separate cluster and public networks
  ○ Always separate your control nodes from other networks
  ○ Don’t expose to the open internet
  ○ Encrypt inter-datacenter traffic
● Avoid hyper-converged infrastructure
  ○ Isolate compute and storage resources
  ○ Scale them independently
  ○ Mitigates risk if daemons are compromised or DoS’d
  ○ Don’t mix
    ■ compute and storage
    ■ control nodes (OpenStack and Ceph)


Page 18:

Deploying RadosGW

● Big and easy target through the HTTP(S) protocol
● Small appliance per tenant with
  ○ Separate network
  ○ SSL-terminating proxy forwarding requests to radosgw
  ○ WAF (mod_security) to filter requests
  ○ Placed in a secure/managed zone
● Don’t share buckets/users between tenants


Page 19:

Ceph security: CephX

● Monitors are trusted key servers
  ○ Store copies of all entity keys
  ○ Each key has an associated “capability”
    ■ Plaintext description of what the key user is allowed to do
● What you get
  ○ Mutual authentication of client and server
  ○ Extensible authorization via “capabilities”
  ○ Protection from man-in-the-middle attacks and TCP session hijacking
● What you don’t get
  ○ Secrecy (encryption over the wire)

Page 20:

Ceph security: CephX take-aways

● Monitors must be secured
  ○ Protect the key database
● Key management is important
  ○ Separate key for each Cinder backend/AZ
  ○ Restrict the capabilities associated with each key
  ○ Limit administrators’ power
    ■ use ‘allow profile admin’ and ‘allow profile readonly’
    ■ restrict role-definer or ‘allow *’ keys
  ○ Careful key distribution (Ceph and OpenStack nodes)
● To do:
  ○ Thorough CephX code review by security experts
  ○ Audit OpenStack deployment tools’ key distribution
  ○ Improve security documentation
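Per-key capability restriction uses the standard `ceph auth` commands; the client names and pool below are hypothetical examples:

```console
# Key for one Cinder backend/AZ: read-only on the monitors,
# read/write restricted to that zone's pool
$ ceph auth get-or-create client.cinder-az1 \
      mon 'allow r' \
      osd 'allow rwx pool=volumes-az1'

# Restricted operator key instead of 'allow *'
$ ceph auth get-or-create client.operator mon 'allow profile readonly'

# Review existing keys and their capabilities
$ ceph auth list
```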

Page 21:

Preventing Breaches - Defects

● Static Code Analysis (SCA)
  ○ Buffer overflows and other code flaws
  ○ Regular Coverity scans
    ■ 996 fixed, 284 dismissed, 420 outstanding
    ■ defect density 0.97
  ○ cppcheck
  ○ LLVM: clang/scan-build
● Runtime analysis
  ○ valgrind memcheck
● Plan
  ○ Reduce backlog of low-priority issues (e.g., issues in test code)
  ○ Automated reporting of new SCA issues on pull requests
  ○ Improve code reviewer awareness of security defects
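The analyzers named above can be run against a source tree with their stock invocations; the flags shown are typical defaults, not the project’s exact CI configuration:

```console
# cppcheck: enable the checker classes beyond plain errors
$ cppcheck --enable=warning,performance,portability --quiet src/

# clang static analyzer: wrap the normal build
$ scan-build make -j8
```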


Page 22:

Preventing Breaches - Hardening

● Pen-testing
  ○ human attempt to subvert security, generally guided by code review
● Fuzz testing
  ○ automated attempt to subvert or crash, by feeding garbage input
● Hardened build
  ○ -fpie -fpic
  ○ -D_FORTIFY_SOURCE=2 -O2 (?)
  ○ -fstack-protector-strong
  ○ -Wl,-z,relro,-z,now
  ○ Check for performance regressions!


Page 23:

Mitigating Breaches

● Run non-root daemons
  ○ Prevent privilege escalation to root
  ○ Run as ‘ceph’ user and group
  ○ Pending for Infernalis
● MAC
  ○ SELinux / AppArmor
  ○ Profiles for daemons and tools planned for Infernalis
● Run (some) daemons in VMs or containers
  ○ Monitor and RGW - less resource intensive
  ○ MDS - maybe
  ○ OSD - prefers direct access to hardware
● Separate mon admin network

Page 24:

Encryption: Data at Rest

● The ceph-disk tool supports dm-crypt
  ○ Encrypt raw block devices (OSD and journal)
  ○ Allows disks to be safely discarded as long as the key remains secret
● Key management is still very simple
  ○ Encryption key stored on disk via LUKS
  ○ LUKS key stored in /etc/ceph/keys
● Plan
  ○ Petera, a new key escrow project from Red Hat
    ■ https://github.com/npmccallum/petera
  ○ Alternative: simple key management via the monitors
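With the ceph-disk of this era, a dm-crypt OSD is prepared as below; flag names follow the ceph-disk man page, and the key directory and device path are examples:

```console
# Prepare an encrypted OSD; data and journal are dm-crypt (LUKS) volumes,
# keys are dropped into the given directory
$ ceph-disk prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys /dev/sdb
```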


Page 25:

Encryption: On Wire

● Goal
  ○ Protect data from someone listening in on the network
  ○ Protect administrator sessions configuring client keys
● Plan
  ○ Generate per-session keys based on existing tickets
  ○ Selectively encrypt monitor administrator sessions


Page 26:

Denial of Service attacks

● Limit load from clients
  ○ Use qemu IO throttling features - set a safe upper bound
● To do:
  ○ Limit max open sockets per OSD
  ○ Limit max open sockets per source IP
    ■ handle in Ceph or in the network layer?
  ○ Throttle operations per session or per client (vs. just globally)?


Page 27:

CephFS

● No standard virtualization layer (unlike block)
  ○ Proxy through a gateway (NFS?)
  ○ Filesystem passthrough (9p/virtfs) to the host
  ○ Allow direct access from the tenant VM
● Granularity of access control is harder
  ○ No simple mapping to RADOS objects
● Work in progress
  ○ root_squash
  ○ Restrict mount to a subtree
  ○ Restrict mount to a user


Page 28:

Reactive Countermeasures

Page 29:

Reactive Security Process

● Community
  ○ Single point of contact: [email protected]
    ■ Core development team
    ■ Red Hat, SUSE, Canonical security teams
  ○ Security-related fixes are prioritized and backported
  ○ Releases may be accelerated on an ad hoc basis
  ○ Security advisories to [email protected]
● Red Hat Ceph
  ○ Strict SLA on issues raised with the Red Hat security team
  ○ Escalation process to Ceph developers
  ○ Red Hat security team drives the CVE process
  ○ Hot fixes distributed via Red Hat’s CDN


Page 30:

Detecting and Preventing Breaches

● Brute-force attacks
  ○ Good logging of any failed authentication
  ○ Monitoring is easy via existing tools, e.g. Nagios
● To do:
  ○ Automatic blacklisting of IPs/clients after n failed attempts, at the Ceph level
● Unauthorized injection of keys
  ○ Monitor the audit log
    ■ trigger alerts for auth events -> monitoring
  ○ Periodic comparison with a signed backup of the auth database?
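The proposed “blacklist after n failed attempts” logic can be sketched as a sliding-window counter. This is a hypothetical illustration of the idea, not actual Ceph code; class and parameter names are invented:

```python
import time
from collections import defaultdict

class AuthBlacklist:
    """Hypothetical sketch: block a client IP after too many failed
    CephX authentications within a sliding time window."""

    def __init__(self, max_failures=5, window=60.0):
        self.max_failures = max_failures
        self.window = window                 # seconds
        self.failures = defaultdict(list)    # ip -> [failure timestamps]
        self.blacklist = set()

    def record_failure(self, ip, now=None):
        now = time.time() if now is None else now
        # keep only failures inside the window, then add this one
        recent = [t for t in self.failures[ip] if now - t < self.window]
        recent.append(now)
        self.failures[ip] = recent
        if len(recent) >= self.max_failures:
            self.blacklist.add(ip)           # would also raise a monitoring alert

    def is_blocked(self, ip):
        return ip in self.blacklist

bl = AuthBlacklist(max_failures=3, window=60.0)
for t in (0.0, 1.0, 2.0):                    # three quick failures
    bl.record_failure("10.0.0.99", now=t)
print(bl.is_blocked("10.0.0.99"))            # True
print(bl.is_blocked("10.0.0.1"))             # False
```

Failures spaced wider than the window do not accumulate, so slow legitimate retries are not penalized.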


Page 31:

Conclusions

Page 32:

Summary

● Reactive processes are in place
  ○ [email protected], CVEs, downstream product updates, etc.
● Proactive measures in progress
  ○ Code quality improving (SCA, etc.)
  ○ Unprivileged daemons
  ○ MAC (SELinux, AppArmor)
  ○ Encryption
● Progress on defining and documenting security best practices
● An ongoing process


Page 33:

Get involved!

● OpenStack
  ○ Telco Working Group
    ■ #openstack-nfv
  ○ Cinder, Glance, Manila, ...
● Ceph
  ○ https://ceph.com/community/contribute/
  ○ [email protected]
  ○ IRC: OFTC
    ■ #ceph
    ■ #ceph-devel
  ○ Ceph Developer Summit

Page 34:

[email protected]@redhat.com

Danny Al-Gaaf, Senior Cloud Technologist
Sage Weil, Ceph Principal Architect

IRC: dalgaaf, sage

THANK YOU!