
New England Database Society (NEDS)

Friday, September 26, 2003

Volen 101, Brandeis University

Sponsored by Sun Microsystems

Data and Applications Security Developments and Directions

and XML Security

Bhavani Thuraisingham

The National Science Foundation

September 2003

Outline

Data and Applications Security (DAS)

- Developments and Directions; DAS at NSF

Secure Semantic Web

- XML Security; Other directions

Some Emerging Secure DAS Technologies

- Secure Information Integration; Secure Sensor Information Management; Secure Dependable Information Management

Some Directions for Privacy Research

- Data Mining for handling security problems; Privacy vs. National Security; Privacy Constraint Processing; Foundations of the Privacy Problem

What are the Challenges?

Details of XML Security Research

Developments in Data and Applications Security: 1975 - Present

Access Control for System R and Ingres (mid 1970s)

Multilevel secure database systems (1980 – present)

- Relational database systems: research prototypes and products; Distributed database systems: research prototypes and some operational systems; Object data systems; Inference problem and deductive database systems; Transactions

Recent developments in Secure Data Management (1996 – Present)

- Secure data warehousing, Role-based access control (RBAC); E-commerce; XML security and Secure Semantic Web; Data mining for intrusion detection and national security; Privacy; Dependable data management; Secure knowledge management and collaboration

Developments in Data and Applications Security: Multilevel Secure Databases - I

Air Force Summer Study in 1982

Early systems based on the Integrity Lock approach

Systems in the mid to late 1980s, early 90s

- E.g., Seaview by SRI, Lock Data Views by Honeywell, ASD and ASD Views by TRW

- Prototypes and commercial products

- Trusted Database Interpretation and Evaluation of Commercial Products

Secure Distributed Databases (late 80s to mid 90s)

- Architectures; Algorithms and Prototype for distributed query processing; Simulation of distributed transaction management and concurrency control algorithms; Secure federated data management

Developments in Data and Applications Security: Multilevel Secure Databases - II

Inference Problem (mid 80s to mid 90s)

- Unsolvability of the inference problem; Security constraint processing during query, update and database design operations; Semantic models and conceptual structures

Secure Object Databases and Systems (late 80s to mid 90s)

- Secure object models; Distributed object systems security; Object modeling for designing secure applications; Secure multimedia data management

Secure Transactions (1990s)

- Single-level/multilevel transactions; Secure recovery and commit protocols

Some Directions and Challenges for Data and Applications Security - I

Secure semantic web

- Single/multiple security models?

- Different application domains

Secure Information Integration

- How do you securely integrate numerous and heterogeneous data sources on the web and otherwise?

Secure Sensor Information Management

- Fusing and managing data/information from distributed and autonomous sensors

Secure Dependable Information Management

- Integrating Security, Real-time Processing and Fault Tolerance

Data Sharing vs. Privacy

- Federated database architectures?

Some Directions and Challenges for Data and Applications Security - II

Data mining and knowledge discovery for intrusion detection

- Need realistic models; real-time data mining

Secure knowledge management

- Protect the assets and intellectual rights of an organization

Information assurance, Infrastructure protection, Access Control

- Insider cyber-threat analysis, Protecting national databases, Role-based access control for emerging applications

Security for emerging applications

- Geospatial, Biomedical, E-Commerce, etc.

Other Directions

- Trust and Economics, Trust Management/Negotiation, Secure Peer-to-peer computing

NSF Efforts in Data and Applications Security (DAS)

Security for IIS (Information and Intelligent Systems) Technologies

- DAS focuses on security needs for IIS Division Technologies (e.g. Information and data management, digital libraries, collaboration and e-business, etc.)

- DAS related proposals have also been managed under ITR (Information Technology Research) and other initiatives (e.g., Sensor initiative) during FY2003

DAS is part of the CISE-wide (Computer and Information Science and Engineering) Directorate Theme on Cyber Trust for FY04 and beyond

- Focus areas for Cyber Trust include: Trusted Computing, Network Security, Data and Applications Security, Embedded Systems Security

- Inaugural Cyber Trust PI Meeting in Baltimore, August 13-15, 2003

- Plans for FY2004 will be announced soon

Opportunities possibly also under ITR

Directions and Challenges for Securing the Semantic Web

The Semantic Web by Tim Berners-Lee

- Definition and Layers

Steps for Securing the Semantic Web

XML Security for Securing the Semantic Web

Related research and directions for secure semantic web

- Secure Information Integration

Secure Semantic Web

According to Tim Berners-Lee, the Semantic Web supports

- Machine readable and understandable web pages

Layers for the semantic web: Security cuts across all layers

Challenge: Not only integrating the layers for the semantic web, but also ensuring secure interoperability

Layer 1: TCP/IP, Sockets, HTML, Agents

Layer 2: XML, XML Schemas

Layer 3: RDF

Layer 4: Ontologies, Semantic Interoperability

Layer 5: Logic, Proof, Trust

Steps to Securing the Semantic Web

Flexible Security Policy

- One that can adapt to changing situations and requirements

Security Model

- Access Control, Role-based security, Nonrepudiation, Authentication

Security Architecture and Design

- Examine architectures for semantic web and identify security critical components

Securing the Layers of the Semantic Web

- Secure agents, XML security, RDF security, secure semantic interoperability, security properties for ontologies, security issues for digital rights

Challenge: How do you integrate across the layers of the Semantic Web and preserve security?

Much of the research is focusing on XML security; Next step is securing RDF documents

XML Security

Some ideas have evolved from research in secure multimedia/object data management

Access control and authorization models

- Protecting entire documents, parts of documents, propagations of access control privileges; Protecting DTDs vs Document instances; Secure XML Schemas

Update Policies and Dissemination Policies

Secure publishing of XML documents

- How do you minimize trust for third-party publication?

Use of Encryption

Inference problem for XML documents

- Portions of documents taken together could be sensitive, individually not sensitive

More details at the end


What are the Next Steps and Challenges for Secure Semantic Web? - I

We need to continue with XML security research as well as work with standards

- W3C standards are advancing rapidly; security research, prototypes and products must keep up with the developments

- Researchers, vendors and standards organizations must work together

Secure XML Database Systems (query, transactions, storage, - - -)

RDF Security

- When you bring in semantics, there are many challenges for security

- Need to develop security models for RDF documents

Secure Ontologies

- Two aspects: one is to develop protection models for ontology databases; the other is to use ontologies for ensuring security and privacy


What are the Next Steps and Challenges for Secure Semantic Web? - II

Secure semantic interoperability

- What can we learn from secure database interoperability and federated databases?

Trust and digital rights management

- How do you trust the contents of a document? How do you pass digital rights when documents are disseminated?

Security for domain specific semantic webs

- Do we need multiple security policies and models?

Secure interoperability across the layers of the semantic web

- This will be a major challenge even when security is not being considered

- Security has to be considered from the beginning

Secure Information Integration is a key component of securing the semantic web


Secure Information Integration

Integrate disparate, heterogeneous and autonomous information sources on the web or otherwise

- E.g., structured/unstructured data, data streams, geospatial data

Security must be considered together with the Information Integration technologies

IJCAI workshop on Information Integration

http://www.isi.edu/info-agents/workshops/ijcai03/iiweb.html

- Technologies include Information extraction and gathering; Wrapper learning and automatic wrapper generation; Source descriptions, source meta-data learning and source statistics learning; Web service composition; Record linkage/object consolidation and Ontology matching; Novel integration and Inter-schema mediation architectures; Answering queries using views; Web-based query planning, optimization and execution; Data mining for integration

Secure Information Integration: Directions for Research

Start research on security technologies for information integration

- E.g., Secure web services decomposition; Security architectures for integration; Security issues for ontology matching, Secure information extraction, etc.

Secure sensor information management is one aspect of secure information integration

- Data streams from disparate, autonomous and heterogeneous sensors have to be fused and managed securely

Secure Sensor Information Management

A sensor network consists of a collection of autonomous and interconnected sensors that continuously sense and store information about some local phenomena

- May be employed in battlefields, seismic zones, pavements

Data streams emanate from sensors; for geospatial applications these data streams could contain continuous data of maps, images, etc.

Data has to be fused and aggregated

Continuous queries are posed, responses analyzed possibly in real-time, some streams discarded while the rest may be stored

Recent developments in sensor information management include sensor database systems, sensor data mining, distributed data management, layered architectures for sensor nets, storage methods, data fusion and aggregation

Secure sensor data/information management has received very little attention; need a research agenda

Secure Sensor Information Management: Directions for Research

Individual sensors may be compromised and attacked; need techniques for detecting, managing and recovering from such attacks

Aggregated sensor data may be sensitive; need secure storage sites for aggregated data; variation of the inference and aggregation problem?

Security has to be incorporated into sensor database management

- Policies, models, architectures, queries, etc.

Evaluate costs for incorporating security, especially when the sensor data has to be fused, aggregated and perhaps mined in real-time

Research on secure dependable information management for sensor data

Secure Dependable Information Management

Dependable information management includes

- Secure information management

- Fault-tolerant information management

- High integrity and high assurance computing

- Real-time computing

Conflicts between different features

- Security, Integrity, Fault Tolerance, Real-time Processing

- E.g., a process may miss real-time deadlines when access control checks are made

- Trade-offs between real-time processing and security

- Need flexible security policies; real-time processing may be critical during a mission while security may be critical during non-operational times
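A minimal Python sketch of a mode-dependent ("flexible") policy of the kind described above. The two modes, the assumed 5 ms cost of a full check, and the idea of reusing access decisions computed before the mission are all hypothetical; the sketch only relaxes *when* a check is made under deadline pressure, never *what* the policy grants.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple

class Mode(Enum):
    MISSION = "mission"                  # real-time processing is critical
    NON_OPERATIONAL = "non-operational"  # security is critical

@dataclass
class Request:
    subject: str
    obj: str
    deadline_ms: float  # time left before the task misses its deadline

# Hypothetical cost of a full access-control check.
FULL_CHECK_COST_MS = 5.0

def decide(req: Request, mode: Mode,
           full_check: Callable[[str, str], bool],
           cached: Dict[Tuple[str, str], bool],
           audit_log: List[tuple]) -> bool:
    """Return True if the request is granted."""
    key = (req.subject, req.obj)
    # During a mission, if a full check would miss the deadline, reuse a
    # decision computed earlier and log it so it can be re-checked later.
    if (mode is Mode.MISSION and req.deadline_ms < FULL_CHECK_COST_MS
            and key in cached):
        audit_log.append(("deferred-check", req.subject, req.obj))
        return cached[key]
    # Otherwise (non-operational mode, enough slack, or no cached decision)
    # run the full check, even if that means missing the deadline.
    return full_check(req.subject, req.obj)
```

The design choice here is that security is traded only against latency: a cached decision is one the full policy already produced, and every shortcut leaves an audit record.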

Secure Dependable Information Management Example: Next Generation AWACS

[Figure: Next Generation AWACS architecture. Components shown: hardware; display processor and refresh channels; consoles (14); navigation; sensors; data links; real-time operating system; infrastructure services; data management; data exchange; multi-sensor tracks; sensor detections; MSI application; Data Analysis Programming Group (DAPG); future applications; two blocks labeled "Technology provided by the project"]

• Security is being considered after the system has been designed and prototypes implemented

• Challenge: Integrating real-time processing, security and fault tolerance

Secure Dependable Information Management: Directions for Research

Challenge: How does a system ensure integrity, security, fault tolerant processing, and still meet timing constraints?

- Develop flexible security policies; when is it more important to ensure real-time processing than to ensure security?

- Security models and architectures for the policies; Examine real-time algorithms, e.g., query and transaction processing

- Research for databases as well as for applications; what assumptions do we need to make about operating systems, networks and middleware?

Data may be emanating from sensors and other devices at multiple locations

- Data may pertain to individuals (e.g. video information, images, surveillance information, etc.)

- Data may be mined to extract useful information

- Need to maintain privacy

Research Directions for Privacy

Why this interest now in privacy?

- Data Mining for National Security

- Data Mining is a threat to privacy

- Balance between data sharing/mining and privacy

Is federated data management a solution?

Privacy Preserving Data Mining

Inference Problem as a Privacy Problem

- Handling privacy constraints; Foundations

Web/Semantic Web will have to address privacy

Federated Architectures for Data Sharing?

Data Mining to Handle Security Problems

Data mining tools could be used to examine audit data and flag abnormal behavior

Much recent work in Intrusion detection

- E.g., neural networks to detect abnormal patterns

Tools are being examined to determine abnormal patterns for national security

- Classification techniques, Link analysis

Fraud detection

- Credit cards, calling cards, identity theft, etc.
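To make "flag abnormal behavior in audit data" concrete, here is a minimal Python sketch. It swaps in a simple statistical z-score test for the neural-network and classification techniques named above; the per-user event counts (for example, failed logins in one audit period) and the threshold of 3 standard deviations are assumptions for illustration.

```python
import statistics
from typing import Dict, List

def flag_abnormal(counts: Dict[str, int], threshold: float = 3.0) -> List[str]:
    """Flag users whose count of audited events (e.g., failed logins in one
    audit period) lies more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # everyone behaves identically: nothing stands out
    return sorted(user for user, c in counts.items()
                  if (c - mu) / sigma > threshold)
```

For example, with twenty users at 2 failed logins and one user at 40, only the last user is flagged. A real intrusion-detection tool would model many features at once rather than one count.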

Data Mining as a Threat to Privacy

Data mining gives us “facts” that are not obvious to human analysts of the data

Enables inspection and analysis of huge amounts of data

Possible threats:

- Predict information about classified work from correlation with unclassified work

- Mining “Open Source” data to determine predictive events (e.g., Pizza deliveries to the Pentagon)

It isn’t the data we want to protect, but correlations among data items

Initial ideas presented at the IFIP 11.3 Database Security Conference, July 1996 in Como, Italy

Data Sharing/Mining vs. Privacy: Federated Data Management Architecture for the Department of Homeland Security?

What can we do?: Privacy Preserving Data Mining

Prevent useful results from mining

- Limit data access to ensure low confidence and support

- Extra data ("cover stories") to give "false" results

Providing only samples of data can lower confidence in mining results

Idea: If an adversary is unable to learn a good classifier from the data, then the adversary will be unable to learn good rules and predictive functions

Approach: Only make a sample of data available

- Limits ability to learn a good classifier

Several recent research efforts have been reported
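The "only make a sample of data available" approach can be sketched in Python as below. The function names, the sampling fraction and the seed parameter are hypothetical; the `support` helper only shows how an association-rule miner would measure an itemset on whatever data is released. Sampling by itself does not guarantee low confidence; it just limits what the miner sees.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, TypeVar

T = TypeVar("T")

def release_sample(records: Sequence[T], fraction: float,
                   seed: Optional[int] = None) -> List[T]:
    """Release only a random sample (without replacement) of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(list(records), k)

def support(transactions: Sequence[FrozenSet[str]],
            itemset: FrozenSet[str]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)
```

Releasing, say, 10% of 1,000 transactions gives the miner 100 records; estimates of support and confidence computed on that sample carry more sampling error than those computed on the full data.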

Privacy Problem as a form of the Inference Problem

Privacy constraints

- Content-based constraints; association-based constraints

Privacy controller

- Augment a database system with a privacy controller for constraint processing and examine the releasability of data/information (e.g., release constraints)

Use of conceptual structures to design applications with privacy in mind (e.g., privacy preserving database and application design)

The web makes the problem much more challenging than the inference problem we examined in the 1990s!

Is the General Privacy Problem Unsolvable?

Privacy Constraints

Simple constraints: an attribute of a document is private

Content-based constraints: if a document contains information about XXX, then it is private

Association-based constraints: two or more documents together are private; individually they are public

Dynamic constraints: after some event, the document becomes private or becomes public

Several challenges:

- Specification and consistency of constraints

- How do you take into consideration external knowledge?

- Managing history information
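A minimal Python sketch of a privacy controller enforcing the four constraint types listed above. The class and field names are hypothetical, the content check is a naive case-insensitive substring match standing in for real content analysis, and the release history is kept per user, so it does not address collusion or external knowledge (two of the challenges noted above).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class PrivacyController:
    private_attributes: Set[str]          # simple constraints
    sensitive_terms: Set[str]             # content-based constraints ("XXX")
    private_associations: List[Set[str]]  # doc ids that are private together
    # dynamic constraints: doc id -> (event, private_after_event)
    dynamic: Dict[str, Tuple[str, bool]] = field(default_factory=dict)
    events: Set[str] = field(default_factory=set)                # events so far
    released: Dict[str, Set[str]] = field(default_factory=dict)  # history per user

    def redact(self, record: Dict[str, str]) -> Dict[str, str]:
        """Simple constraints: drop private attributes before release."""
        return {k: v for k, v in record.items() if k not in self.private_attributes}

    def may_release(self, user: str, doc_id: str, text: str) -> bool:
        # Dynamic constraint: the document's status flips once the event occurs.
        if doc_id in self.dynamic:
            event, private_after = self.dynamic[doc_id]
            if (event in self.events) == private_after:
                return False
        # Content-based constraint (naive substring match, illustration only).
        lowered = text.lower()
        if any(term.lower() in lowered for term in self.sensitive_terms):
            return False
        # Association-based constraint: check against what this user already has.
        history = self.released.get(user, set()) | {doc_id}
        if any(group <= history for group in self.private_associations):
            return False
        self.released.setdefault(user, set()).add(doc_id)
        return True
```

Keeping the release history inside the controller is what lets an association-based constraint block the second of two documents that are only private together.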

Architecture for Privacy Constraint Processing

- User Interface Manager

- Constraint Manager, which manages the Privacy Constraints

- Query Processor: constraints during query and release operations

- Update Processor: constraints during update operation

- Database Design Tool: constraints during database design operation

- DBMS and Database

Foundations of the Privacy Problem

Privacy Problem: Given a database and a set of privacy constraints, can you decide ahead of time that privacy will be violated; that is, through querying, can one extract information that is private?

Is the General Privacy problem unsolvable?

- Yes.

- To what extent?

Research result: For every recursively enumerable degree, one can find a privacy problem that is one-one equivalent to the degree (paper in preparation)

What is the Computational Complexity of the Privacy Problem?

- Can one develop varying degrees of privacy classes? What is the space-time complexity?

Privacy for the Web/Semantic Web

Privacy for the web is getting a lot of attention; especially after the publicity with the DARPA program (total information awareness - TIA)

We need to start looking at privacy for the semantic web also; that is what are the additional privacy concerns due to the semantic web?

Is privacy a technical problem? What roles do lawyers, policy makers and sociologists have to play?

- How can scientists and technologists, lawyers, policy makers and sociologists work together?

Should we limit privacy research to the context of national security or extend it beyond, e.g., to the medical community, banking, IRS?

We must follow up with recent IBM workshop on Privacy; Discussions at NSF involving multiple programs

Secure Federated Database Management for Data Sharing: Schema Integration

Adapted from Sheth and Larson, ACM Computing Surveys, September 1990

[Figure: layered schema architecture. Local Schemas 1 and 2; Component Schemas for Components A, B and C; Generic Schemas for Components A, B and C; Export Schemas for Components A and C and Export Schemas I and II for Component B; Federated Schemas for FDS-1 and FDS-2; External Schemas 1.1 and 1.2 over FDS-1 and External Schemas 2.1 and 2.2 over FDS-2]

Secure Federated Database Management for Data Sharing: Policy Integration

- Layer 1: Policies at the component level, e.g., component policies for components A, B, and C

- Layer 2: Generic policies for the components, e.g., generic policies for components A, B, and C

- Layer 3: Export policies for the components, e.g., export policies for components A, B, and C (note: a component may export different policies to different federations)

- Layer 4: Federated policies: integrate export policies of the components of the federation

- Layer 5: External policies: policies for the various classes of users

Adapted from Computers and Security, Thuraisingham, December 1994

What are our challenges?

If semantic web is to become viable, we need to understand how the different layers may interoperate; we cannot ignore security and privacy

Data Mining, National Security and Privacy will dominate research because of the times we are living in

We don’t have a good handle on secure dependable data/information management

- How do we handle conflicting requirements? e.g., integrating security, real-time processing, and fault tolerant computing

- Building dependable semantic webs?

Secure sensor nets, Secure e-commerce systems, Secure knowledge management will continue to have many challenging research problems

We need to build systems based on solid theoretical foundations; composable systems (ensure interfaces are secure)

Interdisciplinary research is the way of the future; within CS as well as between CS and other areas (e.g., secure sensors)

Some Key Directions

Transfer security technology to operational systems; need to develop systems that are flexible, usable and secure

- Bring human computer interaction and people aspects into system design

Security for emerging applications

- E.g., medical informatics, bioinformatics, scientific and engineering informatics, and other areas

Data mining for security (e.g., intrusion detection, insider cyber threat); cannot forget about Privacy

Interdisciplinary research in information security

Emerging areas include Secure semantic web, Secure Information Integration, Secure Sensors, Trust Management/Negotiation, Economics, - - - - -

Other Ideas and Directions?

Please contact

- Dr. Bhavani Thuraisingham The National Science Foundation Suite 1115 4201 Wilson Blvd Arlington, VA 22230 Phone: 703-292-8930 Fax 703-292-9037 email: [email protected]

XML Security

Collaborating with the University of Milan; paper to appear in TKDE

Access Control

- Pull model: User queries XML documents; results are computed by applying the access control rules in the policy base and user credentials

- Push model: Periodically portions of XML documents are pushed to the user depending on the credentials and access control rules

Secure publishing of XML documents

- With a set of digital signatures generated by the owner and no trust required of the publisher, a subject can verify the authenticity of the query response

Example XML Document

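[Figure: tree of an example XML annual asset report for a university. Elements shown include Asset report, Name: U. Of X, Year: 2002, Dept (Name: CS), Funds, Grants, Contracts, Expenses, Assets, Equipment, Other assets, Patents, Patent (title, Author, ID) and news]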

Subject Credentials and Protection Objects

Subjects are given access to XML documents or portions of documents depending on user ID and/or Credentials

Credential specification is based on credential types; credential type is a pair <credential name, credential properties>

- Example of credential types for the XML document are: Professor, Secretary (depending on the roles)

Protection objects are objects to which access is controlled

- Entire XML documents or portions of XML documents

- A protection object is a pair <target, path>

- Target is the file name of the XML document

- Path is an XPath expression on the target
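The credential type and protection object pairs can be modeled directly, as in this Python sketch using the standard-library `xml.etree.ElementTree`. The class names and the in-memory `documents` map (file name to parsed root) are assumptions. ElementTree implements only a subset of XPath and evaluates paths relative to an element, so a leading `//` is rewritten to `.//`; a full XPath engine would be needed for expressions such as `node()`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class CredentialType:
    name: str                    # e.g., "Professor", "Secretary"
    properties: Tuple[str, ...]  # e.g., ("name", "university", "department")

@dataclass(frozen=True)
class ProtectionObject:
    target: str  # file name of the XML document
    path: str    # path expression evaluated on the target

    def select(self, documents: Dict[str, ET.Element]) -> List[ET.Element]:
        """Return the portions of `target` that `path` designates.
        ElementTree supports only a subset of XPath and evaluates paths
        relative to an element, so a leading '//' is rewritten to './/'."""
        root = documents[self.target]
        path = "." + self.path if self.path.startswith("//") else self.path
        return root.findall(path)
```

A path of `"."` designates the entire document, while `"//Patent[@Dept='CS']"` designates only the CS patents.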

Credential Base

<Professor credID="9" subID="16" CIssuer="2"><name> Alice Brown </name><university> University of X </university><department> CS </department><research-group> Security </research-group>

</Professor>

<Secretary credID="12" subID="4" CIssuer="2"><name> John James </name><university> University of X </university><department> CS </department><level> Senior </level>

</Secretary>

Policy Base

Policy base stores security policies for protecting the XML source contents

Policy base is an XML document with a subelement policyspec for each security policy defined for XML source

Policyspec has the following

- Subject consisting of userID and/or credentials

- Object (with target and path)

- Access modes: Read, Navigate, Append, Write

- Propagation option: No propagation, One level, Cascade

The security officer manages the policy base
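The three propagation options determine which nodes a policy-spec actually covers. Below is a minimal Python sketch; the `PolicySpec` fields follow the list above, but the field names, the lowercase mode strings and the use of ElementTree's XPath subset are assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List

class Propagation(Enum):
    NO_PROPAGATION = "no propagation"  # only the selected nodes
    ONE_LEVEL = "one level"            # selected nodes and their direct children
    CASCADE = "cascade"                # selected nodes and all their descendants

@dataclass(frozen=True)
class PolicySpec:
    subject: str           # user ID or credential expression (opaque here)
    target: str            # XML document the policy protects
    path: str              # path expression on the target
    modes: FrozenSet[str]  # subset of {"read", "navigate", "append", "write"}
    propagation: Propagation

def covered_nodes(spec: PolicySpec, root: ET.Element) -> List[ET.Element]:
    """Nodes of the target document protected by `spec`, without duplicates."""
    path = "." + spec.path if spec.path.startswith("//") else spec.path
    seen, covered = set(), []
    for node in root.findall(path):
        if spec.propagation is Propagation.NO_PROPAGATION:
            candidates = [node]
        elif spec.propagation is Propagation.ONE_LEVEL:
            candidates = [node, *node]  # iterating an Element yields its children
        else:  # CASCADE: Element.iter() yields the node and all descendants
            candidates = list(node.iter())
        for n in candidates:
            if id(n) not in seen:
                seen.add(id(n))
                covered.append(n)
    return covered
```

On a `Patent` element with children `title` and `authors` (which itself contains `a`), the three options cover `[Patent]`, `[Patent, title, authors]` and `[Patent, title, authors, a]` respectively.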

Policy Base Example

<?xml version="1.0" encoding="utf-8"?>

<Policy-base>

<policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='CS']//node()" priv="VIEW"/>

<policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='EE']/Short-descr/node() | //Patent[@Dept='EE']/authors" priv="VIEW"/>

<policy-spec cred-expr = - - - -

<policy-spec cred-expr = - - --

</Policy-base>

Explanation: CS professors are entitled to access all the patents of their department, but only the short descriptions and authors of the patents of the EE department
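The example policies can be evaluated against a subject's credential to produce that subject's view of the document. This Python sketch is one possible evaluation using `xml.etree.ElementTree`, not the system's actual algorithm. The example's paths are translated into the XPath subset that ElementTree supports: a `//node()` suffix becomes "include the whole subtree", and the two EE paths become separate entries. Ancestors kept only so the view stays a tree lose their text and attributes.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple

# The example's policy-specs, translated into ElementTree's XPath subset.
POLICIES: List[Tuple[str, str]] = [
    ("//Professor[department='CS']", ".//Patent[@Dept='CS']"),
    ("//Professor[department='CS']", ".//Patent[@Dept='EE']/Short-descr"),
    ("//Professor[department='CS']", ".//Patent[@Dept='EE']/authors"),
]

def matches(cred_expr: str, credential: ET.Element) -> bool:
    """Evaluate a credential expression against one subject's credential."""
    holder = ET.Element("holder")  # './/X' only matches below the context node
    holder.append(credential)
    return bool(holder.findall("." + cred_expr))

def prune(elem: ET.Element, allowed: Set[int]) -> bool:
    """Keep allowed nodes plus the ancestors needed to reach them; strip the
    text and attributes of ancestors kept only for structure."""
    keep = id(elem) in allowed
    for child in list(elem):
        if prune(child, allowed):
            keep = True
        else:
            elem.remove(child)
    if keep and id(elem) not in allowed:
        elem.text = None
        elem.attrib.clear()
    return keep

def compute_view(document: ET.Element, credential: ET.Element,
                 policies: List[Tuple[str, str]] = POLICIES) -> Optional[ET.Element]:
    view = copy.deepcopy(document)  # never modify the source document
    allowed: Set[int] = set()
    for cred_expr, path in policies:
        if matches(cred_expr, credential):
            for node in view.findall(path):
                allowed.update(id(n) for n in node.iter())  # node and its subtree
    return view if prune(view, allowed) else None
```

A CS professor's view keeps every child of the CS patents but only `Short-descr` and `authors` under the EE patents. A subject with no matching credential gets `None`.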

Access Control Strategy

Subjects request access to XML documents under two modes: Browsing and authoring

- With browsing access subject can read/navigate documents

- Authoring access is needed to modify, delete, append documents

The access control module checks the policy base and applies the policy specs

Views of the document are created based on credentials and policy specs

In case of conflict, the least access privilege rule is enforced
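One possible reading of the least-privilege conflict rule, sketched in Python: when several policy-specs apply to the same node for the same subject, the subject receives only the modes that all of them grant, so an explicit denial (an empty set) always wins. The mode names and the grouping into browsing and authoring are assumptions based on the modes listed above.

```python
from functools import reduce
from typing import FrozenSet, Iterable

BROWSING = frozenset({"read", "navigate"})
AUTHORING = frozenset({"append", "write"})  # modify/delete would also go here

def effective_modes(applicable: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Combine the modes granted by all policy-specs that apply to one node
    for one subject. An empty set stands for an explicit denial. Taking the
    intersection means a conflict always resolves to the lesser privilege."""
    grants = list(applicable)
    if not grants:
        return frozenset()  # no applicable policy: closed world, deny
    return reduce(frozenset.intersection, grants)
```

If one spec grants browsing and authoring while another grants only browsing, the subject can only browse.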

System Architecture for Access Control

[Figure: system architecture components: User (pull/query, push/result), XML Documents, X-Access, X-Admin, Admin Tools, Policy base, Credential base]

Secure Publishing of XML Documents

Distinguish between owner, publisher and user (subject)

Owner specifies access control policies based on user credentials; the policies are specified in the policy base

Publisher computes the view of the document and sends the reply document to the subject; signatures remove the need to trust the publisher

[Figure: interactions among Owner, Publisher and Subject. Subject → Owner: Subscribe; Owner → Subject: Policy; Owner → Publisher: Security Enhanced Document, Secure Structure; Subject → Publisher: Query, Policy; Publisher → Subject: Reply document, Secure structure]

Subject Owner Interaction

Subjects register with the owner during the subscription phase; during this phase the owner assigns credentials to the subject, which are stored at the owner site

The owner returns to the subject the Subject Policy Configuration (the identifiers of the policies that apply to the subject), signed with the owner's private key

Example: If policies P1 and P2 apply to John and policy P6 applies to Jane, owner Joe sends P1 and P2 to John and P6 to Jane, signed with Joe's private key

Owner Publisher Interaction

For each document, the owner sends the publisher information on which subjects can access which portions of the document according to the policy base (i.e., the access control policies)

- Also, for each element e, the owner inserts a policy configuration (a binary string based on the policies applied to e, converted to hexadecimal representation); this attribute is called the Policy configuration attribute (PC attribute)

- A policy element that describes the policies for the document is also inserted

The owner also sends the publisher the Merkle Signature of each document

- It is the Merkle hash signed with the owner's private key

The document together with the security information is called the "Security Enhanced Document" (SE-XML)

Information in the security enhanced document enables the subject to verify the authenticity of the document returned by the publisher

Additional information encoded in the document, called the Secure Structure, is used by the subject to verify completeness of the result (for certain queries)
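The PC attribute encoding can be sketched in Python as follows. The bit order is an assumption (the first policy identifier maps to the most significant bit), as are the function names; the policy identifiers reuse the P1 to P6 example above.

```python
from typing import List, Set

def policy_configuration(applicable: Set[str], policy_ids: List[str]) -> str:
    """Encode which policies apply to an element as a binary string (bit i is 1
    if policy_ids[i] applies), then convert it to hexadecimal."""
    bits = "".join("1" if pid in applicable else "0" for pid in policy_ids)
    if not bits:
        return ""
    width = (len(bits) + 3) // 4  # hex digits needed, keeping leading zeros
    return format(int(bits, 2), f"0{width}X")

def applicable_policies(pc_attribute: str, policy_ids: List[str]) -> List[str]:
    """Decode a PC attribute back into the list of applicable policy ids."""
    bits = format(int(pc_attribute, 16), f"0{len(policy_ids)}b")
    return [pid for pid, bit in zip(policy_ids, bits) if bit == "1"]
```

With policies P1 to P6, an element governed by P1 and P2 gets the binary string 110000, i.e., PC attribute "30"; an element governed only by P6 gets 000001, i.e., "01".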

Subject Publisher Interaction

The subject submits queries to publisher; it also sends its subject policy configuration

Publisher computes a view of the requested documents based on access control policies for the subject set by the owner

To verify the authenticity of the answer, subject must recompute the same bottom up hash value signed by owner (i.e. Merkle signature) and compare it with the Merkle signature generated by the owner and inserted by the publisher

Subject may not get the entire document; therefore publisher sends to the subject additional hash values that refer to the missing portions of the document

- Hash value of parent is computed from hash values of children as well as hash values of tag names/values; publisher sends enough information for subject to compute hash value of the document

Subject verifies the authenticity of the document

Merkle Signature: Example

[Figure: example Newspaper document tree; elements shown include Newspaper, Frontpage, Leading, Politic, news, Politic_page, Literary_page, Sport_page, Article, Paragraphs, paragraph, title, Author, topic and date]

$Mh_X(\text{Author}) = h\big(h(\text{Author}) \,\|\, h(\text{Author.value})\big)$

$Mh_X(\text{title}) = h\big(h(\text{title}) \,\|\, h(\text{title.value})\big)$

$Mh_X(\text{paragraph}) = h\big(h(\text{paragraph}) \,\|\, h(\text{paragraph.content}) \,\|\, Mh_X(\text{Author}) \,\|\, Mh_X(\text{title})\big)$

where $h$ is a one-way hash function and $\|$ denotes concatenation
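The bottom-up hash in these formulas can be sketched in Python. SHA-256 is an assumed choice of $h$, leading and trailing whitespace in text content is stripped, and attributes are ignored for brevity. The owner would then sign the root's hash with its private key; that signing step is not shown because the Python standard library has no public-key signatures.

```python
import hashlib
import xml.etree.ElementTree as ET

def h(data: bytes) -> bytes:
    # SHA-256 is an assumed choice of one-way hash function.
    return hashlib.sha256(data).digest()

def mhx(elem: ET.Element) -> bytes:
    """MhX(e) = h(h(tag) || h(content) || MhX(child_1) || ... || MhX(child_n)),
    children taken in document order; a leaf's content is its value."""
    parts = h(elem.tag.encode("utf-8")) + h((elem.text or "").strip().encode("utf-8"))
    for child in elem:
        parts += mhx(child)
    return h(parts)
```

For a `paragraph` element whose children are `Author` and `title`, this reproduces the third formula term by term, and any change to an author's name changes the paragraph's hash.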

Some Results

Theorem 1: Let $g = (V_g, v_g, E_g, FE_g)$ be the SE-XML version of an XML document $d$ and $r = (V_r, v_r, E_r, FE_r)$ be the reply document corresponding to a query submitted on $d$ by subject $s$. Then each node in $V_{r,e}$ is authenticable by $s$, where $V_{r,e}$ is the set of element nodes in the reply document $r$. Here a document $d = (V_d, v_d, E_d, FE_d)$ is defined as follows: $V_d$ is the set of all element nodes and attribute nodes in $d$, $v_d$ is the node representing the document element (the document root), $E_d$ is the set of edges in $d$, and $FE_d$ is the edge labeling function.

Subject Verification Algorithm: Input: reply document $r = (V_r, v_r, E_r, FE_r)$. Output: True if all nodes in $r$ are authentic, False otherwise.

Theorem 2: Let $s$ be a subject, $q$ be a query submitted by $s$, and $r$ be the reply document received by $s$ as an answer to $q$. The subject verification algorithm returns True if and only if each $v \in V_{r,e}$ is authentic, where $V_{r,e}$ is the set of element nodes in the reply document $r$.
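A simplified Python sketch of the subject's check: the reply tree contains the visible nodes plus, for each pruned subtree, the hash value the publisher sends in its place, and the subject recomputes the root hash bottom-up. The node classes are hypothetical, SHA-256 is an assumed hash function, and `owner_root_hash` is assumed to have already been recovered from the owner's Merkle signature with the owner's public key (not shown). This checks the reply as a whole rather than node by node.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import List, Union

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed hash function

@dataclass
class Pruned:
    """A subtree the subject may not see; the publisher sends only its hash."""
    hash_value: bytes

@dataclass
class Node:
    tag: str
    content: str = ""
    children: List[Union["Node", Pruned]] = field(default_factory=list)

def mhx(n: Union[Node, Pruned]) -> bytes:
    """Bottom-up Merkle hash; pruned subtrees contribute the hash sent for them."""
    if isinstance(n, Pruned):
        return n.hash_value
    return h(h(n.tag.encode()) + h(n.content.encode())
             + b"".join(mhx(c) for c in n.children))

def verify(reply_root: Node, owner_root_hash: bytes) -> bool:
    """Recompute the root hash and compare it with the value the owner signed
    (signature verification itself is assumed to have happened already)."""
    return hmac.compare_digest(mhx(reply_root), owner_root_hash)
```

If the publisher alters any visible value, or substitutes a wrong hash for a pruned subtree, the recomputed root no longer matches the owner's value.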

Note on Completeness

The owner sends the structure of the XML document to the publisher; this secure structure contains the names of tags and attributes but not the data content

Publisher sends the secure structure together with reply document to subject

Subject locally executes on the structure all queries whose conditions are against the document structure of the original document; the results are compared with the reply document

Key points

- Secure structure of the document is generated by hashing each tag and attribute name; it has the hashed attribute values of the XML document

- Secure structure also has policy element and policy configuration attributes of elements (not hashed)

- Completeness for queries on structure and equality on attribute values

Challenge: Extensions for more general queries
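A minimal Python sketch of the completeness check for structural queries with equality on attribute values. Tag names, attribute names and attribute values are hashed (SHA-256 assumed), text content is dropped, and the `PC` attribute is kept in clear; the subject's policy bitmask, the dictionary layout and the function names are hypothetical. Elements without a `PC` attribute are treated as covered by no policy.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Optional

def hx(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()  # assumed hash function

def secure_structure(elem: ET.Element) -> Dict:
    """Hash tag names, attribute names and attribute values; drop text content.
    The policy configuration attribute 'PC' is kept in clear."""
    attrs = {hx(k): hx(v) for k, v in elem.attrib.items() if k != "PC"}
    return {"tag": hx(elem.tag), "attrs": attrs, "pc": elem.get("PC", "0"),
            "children": [secure_structure(c) for c in elem]}

def expected_count(struct: Dict, tag: str, policy_mask: int,
                   attr: Optional[str] = None, value: Optional[str] = None) -> int:
    """Number of elements the subject may see that match `tag` (and, optionally,
    attr == value), computed on the secure structure alone."""
    count, stack = 0, [struct]
    while stack:
        n = stack.pop()
        stack.extend(n["children"])
        if n["tag"] != hx(tag):
            continue
        if attr is not None and n["attrs"].get(hx(attr)) != hx(value):
            continue
        if int(n["pc"], 16) & policy_mask:  # some policy of the subject applies
            count += 1
    return count

def is_complete(struct: Dict, reply: ET.Element, tag: str, policy_mask: int,
                attr: Optional[str] = None, value: Optional[str] = None) -> bool:
    """Compare the matches in the reply with the count from the secure structure."""
    got = sum(1 for e in reply.iter(tag) if attr is None or e.get(attr) == value)
    return got == expected_count(struct, tag, policy_mask, attr, value)
```

A reply that silently drops one of the subject's permitted `Patent[@Dept='CS']` elements yields a smaller count than the secure structure predicts, so the omission is detected.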