The Willow System Implementation
Intrusion Tolerance Through Secure System Reconfiguration
OASIS PI Meeting, Santa Rosa, CA
August 2002
The Willow Team
University of Colorado: Alexander Wolf, Dennis Heimbigner, Antonio Carzaniga, Naveed Arshad, Marco Castaldi, John Giacomoni, Nathan Ryan
University of Virginia: John Knight, Jonathan Hill, Phil Varner, Sean Travis, Aaron Crickenberger, Rich Honhart, Serge Egelman, Warren Hall, Mi Peng, Mike Peck, Brian Garback
University of CA, Davis: Prem Devanbu, Michael Gertz, Brian Toone
Aspects of Intrusion Tolerance
•Very Large Networks
•Interdependent Networks
•Heterogeneous Nodes
•Explicit Sense/Analyze/Respond
•Non-Local Faults
•Sequential Faults
[Figure labels: network sensors; actuators; network state & analysis model; tolerate anticipated faults; tolerate unanticipated faults; change to planned posture; update system; deploy system; trust-mediated external input]
Dimensions of Intrusion Tolerance
Recent Progress
Willow system implementation goals:
  Implement all functionality
  Design to scale to expected network sizes
Topics:
  Target system testbed
  Willow system error detection/analysis
  Willow system communication (Siena):
    Site select addressing
    Result harvesting
  Willow system actuation
  Trust-mediated information access
  Evaluation and preliminary tests
Reactive Control Mechanism
[Figure labels: servers (Immunix); clients (Windows); error detection & recovery synthesis; survivability specification; translator; wide-area domain containing local-area domains; network nodes with LIRA/ANDREA agents; sensors; visualization; Administrator’s Workbench]
•New notation & translator
•New language features:
  •Proper treatment of time
  •General class structure
Willow Communications
[Figure labels: servers (Immunix); clients (Windows); error detection & recovery synthesis; sensor data; actuation commands]
Event notification
  Many information sources: trusted and untrusted
  Mediation structure to provide the most trusted information
Pub/sub implementation for control
  Two extensions: site select addressing and result harvesting
[Figure labels: visualization; Administrator’s Workbench; trust-mediated external input]
Willow Control Communications
[Figure labels: publisher; subscribers; Siena; site select addressing (selector, receptor, subscriptions, publication); result harvesting]
Site Select Addressing
Specialized use of publish/subscribe:
  Published messages contain selection parameters
  Received only by sites with matching receptors
  Efficient, property-based addressing of receiver sites
Built a general selection function language on the Siena publish/subscribe system
Receivers are dynamic: they can change their properties
  Messages they receive depend only on their most recent properties
Previous work:
  Control of robot groups in the MURDOCH system
  Distributed query in the query/response paradigm (Colorado)
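To make the matching rule concrete, here is a toy, in-process sketch of site select addressing. It assumes each site's receptor is simply a dictionary of its current properties and that a published message carries a selector written as a conjunction of (attribute, operator, value) constraints. The names `matches`, `deliver` and `OPS` and the example site properties are hypothetical and are not the Siena API.

```python
import operator
from typing import Any, Callable

# Hypothetical operator table for selector constraints (not Siena's API).
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}

Constraint = tuple[str, str, Any]  # (attribute, operator, value)


def matches(properties: dict[str, Any], selector: list[Constraint]) -> bool:
    """A receptor matches when every constraint in the selector holds (AND only)."""
    for attr, op, value in selector:
        if attr not in properties:
            return False
        if not OPS[op](properties[attr], value):
            return False
    return True


def deliver(sites: dict[str, dict[str, Any]], selector: list[Constraint]) -> list[str]:
    """Return the names of sites whose current properties match the selector."""
    return sorted(name for name, props in sites.items() if matches(props, selector))


# Hypothetical site properties; receivers may change these at any time.
sites = {
    "web1": {"os": "immunix", "role": "server", "alarms": 4},
    "web2": {"os": "immunix", "role": "server", "alarms": 0},
    "pc7": {"os": "windows", "role": "client", "alarms": 5},
}

print(deliver(sites, [("role", "=", "server"), ("alarms", ">", 3)]))  # ['web1']

# A receiver updates its properties; later messages are matched against the new value.
sites["web2"]["alarms"] = 6
print(deliver(sites, [("role", "=", "server"), ("alarms", ">", 3)]))  # ['web1', 'web2']
```

The publisher never names `web1` or `web2`; it states properties, and the receiver set follows from them. This sketch updates properties instantly, whereas in a distributed deployment the change must first propagate through the routers' tables.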
Advantages Of Site Select Addressing
Brings all benefits of pub/sub to control: qualitative addressing, with no explicit knowledge of receivers required
Flexible, easy-to-use, one-to-many messaging
Selection functions limited to AND (conjunctions) are efficient, with O(1) router table cost
The selection function language is not efficient for all Boolean expressions: OR causes exponential router table costs
Issues:
  Dynamic changes in receiver properties take time to propagate through distributed routing tables
  Propagation time scales with network size
  A receiver may miss some messages issued after its properties change to make it relevant, due to lag in network routing setup
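Here is one plausible reading of why OR is expensive. Assume, as in Siena, that a routing-table entry (a filter) can only hold a conjunction of constraints. A selector containing ORs must then be expanded into disjunctive normal form, with one conjunctive entry per term, and an AND of n two-way ORs expands into 2^n terms. The clause representation and function names below are hypothetical.

```python
from itertools import product

# A selector in conjunctive form: an AND of clauses, each clause an OR of
# atomic constraints (written as plain strings for brevity).
Selector = list[list[str]]


def to_dnf(selector: Selector) -> list[tuple[str, ...]]:
    """Expand AND-of-ORs into OR-of-ANDs by picking one literal from every clause."""
    return list(product(*selector))


def table_entries(selector: Selector) -> int:
    """Number of conjunction-only routing-table entries the selector needs."""
    return len(to_dnf(selector))


# AND-only selector: each clause has a single literal, so one entry suffices.
and_only = [["os=immunix"], ["role=server"], ["alarms>3"]]
print(table_entries(and_only))  # 1

# n clauses of the form (a_i OR b_i) need 2**n conjunctive entries.
for n in range(1, 6):
    sel = [[f"a{i}", f"b{i}"] for i in range(n)]
    print(n, table_entries(sel))  # prints n and 2**n: (1, 2) ... (5, 32)
```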
Publish/Subscribe Result Harvesting
Reply mechanism for distributed publish/subscribe, implemented for the Siena system
Gathers responses to a published message:
  All receivers can respond
  Responses are collected in a histogram
  The histogram is reported to the publisher
Re-uses the forwarding tree generated during propagation of the publication
Merges histograms at convergence points on the return path to the root of the tree
Comprehensive performance analysis paper in preparation
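The harvesting step can be sketched as a recursive merge over the forwarding tree. Each receiver contributes a one-entry histogram of its response, and each dispatcher adds up its children's histograms before passing the result toward the root. The `Node` class, the tree shape and the responses are hypothetical. The sketch also ignores timeouts and failed dispatchers, which a real deployment would have to handle.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    response: Optional[str] = None  # set only if this node is a receiver that replies
    children: list["Node"] = field(default_factory=list)


def harvest(node: Node) -> Counter:
    """Return the merged response histogram for the subtree rooted at node."""
    hist: Counter = Counter()
    if node.response is not None:
        hist[node.response] += 1
    for child in node.children:
        # Merge at the convergence point on the way back to the root.
        hist.update(harvest(child))
    return hist


# Forwarding tree built while the publication propagated from the publisher.
tree = Node("publisher", children=[
    Node("d1", children=[Node("s1", "ok"), Node("s2", "ok"), Node("s3", "fail")]),
    Node("d2", children=[Node("s4", "ok"), Node("s5", "busy")]),
])

print(harvest(tree))  # histogram: ok=3, fail=1, busy=1
```

Because identical responses collapse into one counter entry, the histogram a dispatcher forwards can be much smaller than the number of receivers below it. The worst case, where every reply is unique, is the one analyzed under "Performance Of Result Harvesting".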
Utility Of Result Harvesting
General uses for very large networks:
  Publish content and learn the number of recipients
  Publish orders and harvest responses
  Publish queries and harvest results
Content-routed RPC (i.e., pub/sub-style)
Efficient implementation of query/response
Site-select command: messaging via site-select addressing followed by result harvesting
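A site-select command composes the two mechanisms: address by properties, then harvest what comes back. The flat, in-process sketch below shows that composition. Here the selector is a Python predicate, whereas in the pub/sub setting it would be data carried in the message. The function `site_select_command`, the `kill_worm` command and the site data are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable

Props = dict[str, Any]


def site_select_command(
    sites: dict[str, Props],
    selector: Callable[[Props], bool],
    command: Callable[[str, Props], str],
) -> Counter:
    """Run a command at every site whose properties satisfy the selector and
    harvest the per-site results into a single histogram."""
    results: Counter = Counter()
    for name, props in sites.items():
        if selector(props):                      # site-select addressing
            results[command(name, props)] += 1   # result harvesting
    return results


# Hypothetical sites and command ("kill process X wherever it is running").
sites = {
    "web1": {"role": "server", "procs": {"httpd", "worm"}},
    "web2": {"role": "server", "procs": {"httpd"}},
    "pc7": {"role": "client", "procs": {"worm"}},
}


def kill_worm(name: str, props: Props) -> str:
    if "worm" in props["procs"]:
        props["procs"].discard("worm")
        return "killed"
    return "not-running"


harvested = site_select_command(sites, lambda p: p["role"] == "server", kill_worm)
print(harvested)  # histogram: killed=1, not-running=1 (pc7 is not selected)
```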
LIRA/ANDREA Architecture Agents
[Figure labels: wide-area domain containing local-area domains; network nodes with LIRA/ANDREA agents; servers (Immunix); clients (Windows); Siena publish/subscribe bus]
LIRA/ANDREA Agent Structure
[Figure: LIRA/ANDREA agent structure. Components: LIRA interface (Set/Ack, Get/Reply, Notify, Call/Return); attribute model; monitor; call handler; intention council; local sense monitor; SSM; SSL. Message labels: set, get, notify, call, ack, reply, return result, sensor event, change event. Notes: ‘displayed’ attributes are the SSM ‘antigen’; one-to-many command, many-to-one result harvesting; peer-to-peer communications; intra-agent communications]
Trust Mediation in Willow
Client systems express trust requirements
Trust rating system assigns trust ratings to information sources
Trust ratings stored in trust broker
Mediator evaluates queries using trust ratings
1. Infer trust ratings for queries
2. Select source(s) and evaluate
[Figure labels: client systems; mediator; information sources; trust broker; trust rating system]
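The two mediator steps listed above (infer ratings for a query, then select sources) can be sketched under several hypothetical assumptions:

- The trust broker is a dictionary mapping (source, relation) to a completeness rating.
- Queries are single selections over one relation.
- A client's trust requirement is the set of ratings it will accept.

The propagation table follows from set containment: selection preserves R = V and R ⊂ V, but may filter out exactly the tuples on which R and V differ. None of these names come from Willow, and evaluating the query at the chosen sources is left out.

```python
# Hypothetical trust-broker contents: (source, relation) -> completeness rating.
BROKER: dict[tuple[str, str], str] = {
    ("src_a", "weather"): "complete",
    ("src_b", "weather"): "incomplete",
    ("src_c", "weather"): "wrong",
    ("src_a", "tasking"): "over-the-top",
    ("src_b", "tasking"): "complete",
}

# Possible ratings of a selection query over R, given the rating of R itself.
SELECTION_RULE: dict[str, set[str]] = {
    "complete": {"complete"},
    "incomplete": {"complete", "incomplete"},
    "over-the-top": {"complete", "over-the-top"},
    "wrong": {"complete", "incomplete", "over-the-top", "wrong"},
}


def infer_rating(source: str, relation: str) -> set[str]:
    """Step 1: infer the possible ratings of a selection query at one source."""
    return SELECTION_RULE[BROKER[(source, relation)]]


def select_sources(relation: str, acceptable: set[str]) -> list[str]:
    """Step 2: keep sources whose every possible rating meets the client's requirement."""
    candidates = sorted(src for (src, rel) in BROKER if rel == relation)
    return [src for src in candidates if infer_rating(src, relation) <= acceptable]


print(select_sources("weather", {"complete"}))                # ['src_a']
print(select_sources("weather", {"complete", "incomplete"}))  # ['src_a', 'src_b']
print(select_sources("tasking", {"complete"}))                # ['src_b']
```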
Modeling Trust Ratings of Sources: Completeness
Trust model based on difference in tuples
V corresponds to a belief about the content of a given relation
R corresponds to the actual content presented by a source
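V ⊂ R means that R is rated as over-the-top with respect to completeness
Ratings by the relationship between R and V (⊂ is proper subset):
  Complete: R = V
  Incomplete: R ⊂ V
  Over-the-top: V ⊂ R
  Wrong: neither contains the other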
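If relations are modeled as sets of tuples, the completeness rating reduces to a few set comparisons. In this minimal sketch, treating "wrong" as the catch-all case where neither set contains the other is our reading of the figure. The relation contents are hypothetical.

```python
def completeness_rating(r: frozenset, v: frozenset) -> str:
    """Rate the actual content R of a source against the believed content V."""
    if r == v:
        return "complete"
    if r < v:      # proper subset: some believed tuples are missing
        return "incomplete"
    if v < r:      # proper superset: extra, unbelieved tuples
        return "over-the-top"
    return "wrong"  # neither set contains the other


# Hypothetical belief about a weather relation and three sources' answers.
V = frozenset({("denver", "clear"), ("boulder", "snow")})
print(completeness_rating(frozenset({("denver", "clear")}), V))  # incomplete
print(completeness_rating(V | {("aspen", "fog")}, V))            # over-the-top
print(completeness_rating(frozenset({("denver", "rain")}), V))   # wrong
```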
Trust Mediation Benefits
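Source selection by rating queries using inference rules, e.g., for a selection σ_S over a relation R rated by bob, where C and I denote the complete and incomplete ratings:
$$C_{bob}(R) \Rightarrow C_{bob}(\sigma_S(R)), \qquad I_{bob}(R) \Rightarrow C_{bob}(\sigma_S(R)) \lor I_{bob}(\sigma_S(R))$$
…and similarly for other relational algebra constructors
Facilitates interaction with trusted or partially trusted sources
Level of abstraction for designing security policies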
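Rules of this kind can be checked exhaustively on a small universe, for example the rules that a selection over a complete source stays complete and a selection over an incomplete source is complete or incomplete. The sketch models a selection σ_S as intersection with the set S of tuples it keeps, which is valid for any predicate over a finite universe. It records every rating that σ_S(R) can take for each rating of R. The abstract integer tuples and helper names are hypothetical.

```python
from itertools import chain, combinations


def completeness_rating(r: frozenset, v: frozenset) -> str:
    if r == v:
        return "complete"
    if r < v:
        return "incomplete"
    if v < r:
        return "over-the-top"
    return "wrong"


def subsets(items: range) -> list[frozenset]:
    """All subsets of a small universe of (abstract) tuples."""
    pool = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(pool, k) for k in range(len(pool) + 1))]


UNIVERSE = range(4)  # tuples abstracted as the integers 0..3
ALL = subsets(UNIVERSE)

# A selection sigma_S keeps exactly the tuples in S, so sigma_S(X) = X & S.
observed: dict[str, set[str]] = {}
for r in ALL:
    for v in ALL:
        for s in ALL:
            before = completeness_rating(r, v)
            after = completeness_rating(r & s, v & s)
            observed.setdefault(before, set()).add(after)

for rating in sorted(observed):
    print(rating, "->", sorted(observed[rating]))
```

The check also shows the over-the-top case (complete or over-the-top after selection) and that nothing can be inferred for a wrong source.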
First Experiment – A Worm (Yes, we wrote a worm…)
Goal: detect and respond to a fast-moving worm
But we only have six machines, so it’s just a feasibility experiment
Fault tolerance:
  Error detection: >3 local alarms in 1 min indicates a growing attack
  Error recovery: kill the worm process on affected nodes and harvest process forensics (follow the worm)
  Error detection: >15 local alarms in 1 min indicates a wide-area attack
  Error recovery: use forensics to kill the application network-wide
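The two detection thresholds can be sketched as a sliding one-minute window over alarm timestamps. We assume, hypothetically, that all local alarms reported in the window are counted together. The thresholds (>3 and >15 per minute) come from the experiment description; the `AlarmMonitor` class is hypothetical.

```python
from collections import deque

WINDOW_S = 60.0
GROWING_THRESHOLD = 3     # > 3 local alarms per minute: growing attack
WIDE_AREA_THRESHOLD = 15  # > 15 local alarms per minute: wide-area attack


class AlarmMonitor:
    """Sliding one-minute window over local-alarm timestamps (in seconds)."""

    def __init__(self) -> None:
        self.times: deque[float] = deque()

    def record(self, t: float) -> str:
        self.times.append(t)
        # Drop alarms that fell out of the one-minute window.
        while self.times and self.times[0] <= t - WINDOW_S:
            self.times.popleft()
        n = len(self.times)
        if n > WIDE_AREA_THRESHOLD:
            return "wide-area attack"  # response: kill the application network-wide
        if n > GROWING_THRESHOLD:
            return "growing attack"    # response: kill worm process, harvest forensics
        return "normal"


monitor = AlarmMonitor()
for t in (0, 10, 20, 30):
    print(t, monitor.record(t))  # the fourth alarm within a minute reports a growing attack
```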
Test Application—A JBI
[Figure labels: Siena routers; air tasking orders database; map database; weather sensors; aircraft observer; observation and command interface]
Future Plans
Wide-area implementation across all Willow sites
Implementation on ~300-node testbed for performance measurement
Multiple asynchronous control loops
Variety of security attack scenarios
Variety of non-malicious damage scenarios
Performance measurement of:
  Site-select commands
  Result harvesting
Expected Major Results
Technology:
  Efficient/scalable/secure control architecture for large networks (specification, synthesis, communication, configuration, coordination, mediation, actuation, etc.)
  Rigorous performance analysis of the components and the composite architecture
Demonstrations:
  Wide-area Willow & JBI implementations
  Survive (recall the definition):
    Worm attack on large network
    Random physical damage to our JBI
    Coordinated security attacks on our JBI
Performance Of Result Harvesting
Pub/sub network with fixed branching factor:
  Hierarchical dispatch network: branching factor b = number of child dispatchers per dispatcher
  Peer-to-peer dispatch network with homogeneous connection topology (nearly the same number of connections per node)
Publish/Subscribe Result Harvesting
a) Branching factor b = maximum number of connections - 1.
b) Then:
  1) a worst-case publication (sent to the entire network of nodes, set N)
  2) with the worst possible replies (each reply unique, so the histogram degenerates to a list)
  3) generates a forwarding tree of height log_b |N|
  4) results in a total histogram of size |N|
  5) average histogram size at dispatch nodes in the tree is O(log |N|)
  6) average merge cost and bandwidth cost at a dispatch node are both O(log |N|)
  7) merge and bandwidth costs at the root node are O(|N|)
  8) merge and bandwidth costs next to the leaves are O(1)
c) In such a branching-factor-compliant network (b fixed, as above), if the network must support P worst-case simultaneous messages originating from different publishers, then:
  1) average bandwidth requirement is O(P log |N|)
  2) peak bandwidth requirement is O(|N|) (no factor of P, assuming the P messages are published from different nodes)
  3) the same bounds hold for merge computation cost
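The worst-case histogram sizes above can be checked numerically for an idealized, complete b-ary forwarding tree, which is a simplifying assumption. When every reply is unique, a dispatcher's merged histogram has one entry per node in its subtree. The root therefore carries |N| entries and nodes next to the leaves carry O(1). The average over all dispatchers equals the average depth plus one, which is O(log |N|). The functions below are hypothetical helpers and require b ≥ 2.

```python
import math


def level_sizes(b: int, height: int) -> list[tuple[int, int]]:
    """For a complete b-ary forwarding tree, return per depth:
    (number of dispatchers, worst-case histogram size per dispatcher).
    Worst case: every reply is unique, so histogram size = subtree size."""
    rows = []
    for depth in range(height + 1):
        below = height - depth
        subtree = (b ** (below + 1) - 1) // (b - 1)  # 1 + b + ... + b**below
        rows.append((b ** depth, subtree))
    return rows


def summary(b: int, height: int) -> dict[str, float]:
    rows = level_sizes(b, height)
    n = sum(count for count, _ in rows)                # |N|
    total = sum(count * size for count, size in rows)  # total merge/bandwidth work
    return {
        "nodes": n,
        "root": rows[0][1],         # O(|N|): the peak, at the root
        "leaf": rows[-1][1],        # O(1): next to the leaves
        "average": total / n,       # O(log |N|)
        "log_b_n": math.log(n, b),
    }


for h in (4, 8, 12):
    print(summary(3, h))
```

With P worst-case publications from different publishers, each loads its own tree, which gives the O(P log |N|) average. The O(|N|) peak occurs only at each publication's own root.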