Securing Big Data in the Cloud: Towards a More Focused and Data Driven Approach
Ragib Hasan, UAB
Anthony Skjellum, Auburn
2014 NSF Big Data Workshop
"[Cloud Computing] is a security nightmare and it can't be handled in traditional ways."
John Chambers, Cisco CEO
The Age of Big Data and Clouds
The global market for clouds is growing at a 30% Compound Annual Growth Rate (CAGR), reaching $270 billion in 2020 (Market Research Media)
– Growth is happening in both the private and government sectors (the US federal government’s spending on the cloud was approx. $792 million in 2013 (INPUT))
Big Data is also becoming ubiquitous, going mainstream from academic and research usage
– A 4300% growth predicted by 2020.
– Clouds are the most suitable platform to make any sense of big data
So, if cloud computing is so effective in dealing with Big Data, why isn’t everyone doing it?
Clouds are still subject to traditional data confidentiality, integrity, availability, and privacy issues, plus some additional attacks
Why are Big Data and cloud security different from traditional security?
Securing a house: Owner and user are often the same entity
Securing a motel: Owner and users are almost invariably distinct entities
Multi-tenancy
– Same hardware/network shared by many users
Trust asymmetry
– Users have to completely trust the cloud provider for everything
Lack of accountability
– There is a lack of accountability on the part of the cloud service provider
Many major challenges remain in securing clouds and Big Data
• Novel attacks
• Trustworthy cloud architectures
• Data integrity and availability
• Computation integrity
• Data and computation privacy
• Data forensics
• Misbehavior detection
• Malicious use of clouds
Co-tenancy in clouds creates new attack vectors
A cloud is shared by multiple users, so malicious users can now legally be in the same infrastructure as their victims
By misusing co-tenancy, attackers can launch side-channel attacks on victims
Research question: How to prevent attackers from exploiting co-tenancy in attacking the infrastructure and/or other clients?
Today’s cloud architectures act like big black boxes
Clients have no idea of or control over what is happening inside the cloud
Clients are forced to trust cloud providers completely
Research Question: How do we design cloud computing architectures that are semi-transparent and provide clients with control over security?
Today’s clouds provide no guarantee about outsourced data
Research Question: How can clients get assurance/proofs that the cloud provider is actually storing data, is not tampering with data, and can make the data available on-demand?
Problem: Dishonest cloud providers can throw data away or lose data. Malicious intruders can delete or tamper with data. Clients need reassurance that the outsourced data is available, has not been tampered with, and remains confidential.
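One simple way to illustrate the kind of assurance asked for here is a spot-check audit: the client keeps a secret key, tags every block before upload, and later challenges the provider for selected blocks. The sketch below is only an illustration under stated assumptions, not a scheme from the talk. `ToyCloud`, `tag_block`, `audit`, and the 4-byte block size are hypothetical names and values chosen for the example, and the provider here returns whole blocks, which a practical proof-of-possession protocol would try to avoid.

```python
import hashlib
import hmac
import secrets

BLOCK_SIZE = 4  # assumption: tiny blocks so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def tag_block(key: bytes, index: int, block: bytes) -> bytes:
    """HMAC over (index, block); binding the index prevents block swapping."""
    return hmac.new(key, index.to_bytes(8, "big") + block, hashlib.sha256).digest()


class ToyCloud:
    """Hypothetical provider that stores each block together with its tag."""

    def __init__(self) -> None:
        self.store = {}

    def upload(self, blocks_with_tags) -> None:
        self.store = dict(enumerate(blocks_with_tags))

    def respond(self, index: int):
        # Returns None if the provider no longer holds the block.
        return self.store.get(index)


def random_challenge(n_blocks: int, samples: int) -> list:
    """Pick block indices the provider cannot predict in advance."""
    return [secrets.randbelow(n_blocks) for _ in range(samples)]


def audit(cloud: ToyCloud, key: bytes, indices) -> bool:
    """Return True only if every challenged block comes back with a valid tag."""
    for index in indices:
        answer = cloud.respond(index)
        if answer is None:
            return False
        block, tag = answer
        if not hmac.compare_digest(tag, tag_block(key, index, block)):
            return False
    return True
```

A client would call `audit(cloud, key, random_challenge(n_blocks, samples))` periodically. Sampling only detects loss or tampering with some probability per audit, and this sketch says nothing about confidentiality, which would need encryption before upload.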
Amazon’s Terms of Service
Ensuring confidentiality of data in outsourced computation is difficult
Most types of computation require decrypting the data first. If the cloud provider is not trusted, this may result in a breach of confidentiality.
Research Question: How can we ensure confidentiality of data and computations in a cloud?
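To make the question concrete, here is a toy example of one computation (a sum) that an untrusted provider can carry out without ever seeing the plaintext values. It uses simple additive masking, which is an assumption made for illustration and not a technique proposed in the talk. The names `mask`, `cloud_sum`, `unmask_sum`, and the modulus are hypothetical, and the client must keep one random pad per value, so this does not scale as written.

```python
import secrets

MODULUS = 2**64  # assumption: every value and the true sum are below this bound


def mask(values):
    """Client side: hide each value behind a fresh random pad."""
    pads = [secrets.randbelow(MODULUS) for _ in values]
    masked = [(v + p) % MODULUS for v, p in zip(values, pads)]
    return masked, pads


def cloud_sum(masked):
    """Provider side: operates only on masked values."""
    return sum(masked) % MODULUS


def unmask_sum(masked_total, pads):
    """Client side: remove the combined pads to recover the true sum."""
    return (masked_total - sum(pads)) % MODULUS
```

The sketch covers only addition; general computations on protected data are exactly what the research question leaves open.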
Clients have no way of verifying computations outsourced to a Cloud
Scenario: A user sends her data processing job to the cloud. Clouds provide dataflow operations as a service (e.g., MapReduce, Hadoop).
Problem: Users have no way of evaluating the correctness of results
Research question: How can we verify the accuracy of outsourced computation?
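One baseline a client could try is spot-checking: recompute a random sample of the outsourced tasks locally and compare the answers with what the cloud returned. The sketch below assumes a word-count job split into independent chunks. The function names and the job itself are hypothetical, and this is an illustration of the idea rather than a method from the talk.

```python
import random
from collections import Counter


def word_count(chunk: str) -> Counter:
    """The per-chunk task that is outsourced (a stand-in for a map task)."""
    return Counter(chunk.split())


def honest_cloud(chunks):
    """Hypothetical provider that runs every task correctly."""
    return [word_count(c) for c in chunks]


def spot_check(chunks, results, sample_size, rng):
    """Recompute a random sample of tasks; return the indices that disagree."""
    indices = rng.sample(range(len(chunks)), min(sample_size, len(chunks)))
    return sorted(i for i in indices if word_count(chunks[i]) != results[i])
```

An empty list from `spot_check` means no sampled task disagreed. Checking every task defeats the purpose of outsourcing, so in practice the sample is small and a wrong result is caught only with some probability.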
Data Forensics in Clouds is difficult
Cloud providers are not willing to open up their entire storage for forensic investigations.
Certain Government regulations mandate the ability to audit and run forensic analysis on critical business or healthcare data
Clouds complicate forensic analysis, since the same storage infrastructure is shared by many clients
Research question: How can we augment cloud infrastructures to allow forensic investigations?
(Largely) Unexplored Areas
Legal/policy issues and regulatory compliance:
• Cloud based storage is still subject to regulatory compliance and legal orders.
• Implementing things such as litigation hold in a cloud is very difficult.
• Proving a cloud is fully compliant with pre-cloud law is challenging
Research question: How does cloud computing fit in with data security laws and regulations such as SOX, HIPAA, or with Litigation holds?
Making Big Data and Clouds Secure – focus on the data!
Our solution: Take a data driven approach
– Focus on data, its location, generation, and transmission – the provenance of data
– Look at the lifecycle of data and
– Ensure trustworthy computation and attribution
Make Provenance a fundamental part of clouds
Why aren’t today’s clouds accountable?
Users do not know
– What happened to their data inside the cloud?
– What applications generated their data?
– How did the state of the cloud change?
Cloud providers act like black boxes
– Clouds do not provide any information about internal operation to users
– Since they are in full control, any evidence/forensic investigation must go through them, which makes investigations less transparent
Cloud Provenance is Key to Solving Cloud Security
Data provenance: The modification and movement history of data objects as they enter/leave the cloud and are modified
Application provenance: The history and activities of applications and users
State provenance: The state history of the cloud computing system itself
Challenges in Cloud Provenance
Provenance collection: How do we efficiently collect it?
Provenance storage: Where do we store it? What structures do we use?
Securing provenance: How do we prevent attacks and forgery?
Access to provenance: How can we give access to provenance while preserving cloud’s/other users’ privacy?
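For the storage and forgery questions above, one common building block is a hash chain: each provenance entry stores a hash that covers the previous entry's hash, so changing an earlier record invalidates everything after it. The sketch below is a minimal illustration under stated assumptions. The record fields (`actor`, `action`, `object`) and the function names are hypothetical and are not taken from the talk.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record encoding."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a provenance record linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": entry_hash(prev_hash, record)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited record breaks the chain from there on."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A hash chain alone does not stop a party who can rewrite the whole chain and recompute every hash. Guarding against that would need signatures or a copy of the latest hash held outside the provider's control, and the sketch does not address the privacy question at all.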
How to provide accountability to users using Provenance-based Proofs?
• Proof of past data possession
• Proof of data possession
• Proof of data deletion
• Proof of capability
• Proof of work/task completion and correctness
Current Results
PPDP: Proof of past data possession (Zawoad and Hasan, CyberSec 2012, journal 2012)
PPDP attests that a user U possessed a file F at a given past time.
An auditor can use PPDP to check past data possession.
The file can be deleted, but PPDP still preserves the proof of data possession.
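The properties listed above can be illustrated with a very small sketch: the provider issues a keyed tag over the user, the file's hash, and a timestamp, and an auditor later checks that tag. This is not the PPDP construction from the cited paper, whose details are not given in these slides. The function names are hypothetical, and HMAC with a key shared between provider and auditor is an assumption used here as a stand-in for whatever attestation mechanism the real scheme uses.

```python
import hashlib
import hmac


def _message(user: str, file_hash: str, timestamp: int) -> bytes:
    """Canonical byte string that the tag covers."""
    return f"{user}|{file_hash}|{timestamp}".encode("utf-8")


def issue_proof(provider_key: bytes, user: str, file_bytes: bytes, timestamp: int) -> dict:
    """Provider side: attest that `user` held this file at `timestamp`."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    tag = hmac.new(provider_key, _message(user, file_hash, timestamp),
                   hashlib.sha256).hexdigest()
    return {"user": user, "file_hash": file_hash, "timestamp": timestamp, "tag": tag}


def audit_proof(provider_key: bytes, proof: dict, claimed_file: bytes) -> bool:
    """Auditor side: check the file matches the proof and the tag is genuine."""
    if hashlib.sha256(claimed_file).hexdigest() != proof["file_hash"]:
        return False
    expected = hmac.new(
        provider_key,
        _message(proof["user"], proof["file_hash"], proof["timestamp"]),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, proof["tag"])
```

Because the proof stores only a hash of the file, it remains checkable after the file itself has been deleted from the cloud, which mirrors the last property above.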
Current Results
SecLaaS: Secure Log Access as a Service (Zawoad and Hasan, ASIACCS 2013)
[Figure: SecLaaS architecture diagram. Labels: Attacker, Attack, Communication, VM (x3), Web Server, Investigator, Ext, API, NC, Log DB, Proof DB]
Ongoing projects
Provenance Aware Cloud (PAC)
We are building a provenance-capable cloud using the OpenStack platform
A small-scale testbed has been developed for cloud security research.
[Cloud Computing] is a security nightmare, but it can be handled using trustworthy provenance.
Adapted from John Chambers, Cisco CEO
Details? Visit http://secret.cis.uab.edu or Email [email protected]