Orca internals 101
Jeff Chase
Orcafest 5/28/09
Summary of Earlier Talks
• Factor actors/roles along the right boundaries.
– stakeholders, innovation, tussle
• Open contracts with delegation
– resource leases
• Orca as control plane for GENI:
– Aggregates are first-class (“authority”)
– Slice controllers are first-class (“SM”)
– Clearinghouse brokers enable policies under GSC direction
For more…
• For more on all that, see the slides tacked onto the end of this presentation.
Orca Internals 101
• Leasing core: “shirako”
• Plugins and control flow
• Actor concurrency model
• Lease state machines
• Resource representations
Actors: The Big Picture

[Figure: the three actors. A Slice Controller sends a request to a Broker and receives a ticket; it redeems the ticket with an Authority and receives a lease. The Authority delegates resources to the Broker.]
Actors: The Big Picture

[Figure: the same request/ticket/redeem/lease/delegate flow, annotated.] Integrate experiment control tools here (e.g., Gush and DieselNet tools, by XMLRPC to a generic slice controller). Integrate substrate here with authority-side handler plugins. These are inter-actor RPC calls made automatically: you should not have to mess with them.
Terminology
• Slice controller == slice manager == service manager == guest controller
– Blur the distinction between the “actor” and the controller module that it runs.
• Broker == clearinghouse (or a service within a clearinghouse)
– == “agent” (in SHARP, and in some code)
• Authority == aggregate manager
– Controls some portion of substrate for a site or domain under a Management Authority
Slice Controllers
• Separate application/environment demand management from resource arbitration.

[Figure: Experiment Manager → Slice Controller → Site Authority. The slice controller monitors the guest and obtains/renews leases to meet demand. Aggregates/authorities monitor resources, arbitrate access, and perform placement of guest requests onto resources. Experiment control tools (e.g., Gush and DieselNet tools) talk to a generic slice controller, e.g. with XMLRPC.]
ProtoGENI?
Possibility: export ProtoGENI XMLRPC from a generic slice controller. From a poster on protogeni.net (we should be able to support it; one thing to consider is how close it is to ProtoGENI, which is not that complicated, something like):
– GetCredential() from slice authority
– CreateSlice() goes to slice authority
– Register() slice with clearinghouse
– ListComponents() goes to CH and returns list of AMs and CMs
– DiscoverResources() to AM or CM returns rspecs
– RequestTicket goes straight to an AM
– RedeemTicket also goes to the AM
– StartSliver to the AM after redeem: "bring sliver to a running state"
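
A minimal client-side sketch of driving that call sequence over XMLRPC, using Apache XML-RPC. The endpoint URL, parameter shapes, and credential handling are assumptions for illustration, not the actual ProtoGENI bindings:

import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

// Hypothetical sketch of the ProtoGENI-style calls listed above. The endpoint
// URL and parameter shapes are assumptions; see protogeni.net for the real API.
public class ProtoGeniClientSketch {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("https://slice-authority.example.org/xmlrpc")); // assumed endpoint
        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // GetCredential() from the slice authority.
        Object credential = client.execute("GetCredential", new Object[] {});

        // CreateSlice() at the slice authority, then Register() it with the clearinghouse.
        Object slice = client.execute("CreateSlice", new Object[] { credential, "mySlice" });
        client.execute("Register", new Object[] { credential, slice });

        // RequestTicket and RedeemTicket go straight to an aggregate manager (AM).
        Object ticket = client.execute("RequestTicket", new Object[] { credential, slice, "<rspec/>" });
        Object sliver = client.execute("RedeemTicket", new Object[] { credential, ticket });

        // StartSliver to the AM after redeem: "bring sliver to a running state".
        client.execute("StartSliver", new Object[] { credential, sliver });
    }
}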
Brokers and Ticketing

[Figure: the request/ticket/redeem/lease/delegate flow among Slice Controller, Broker, and Authority.]

• Sites delegate control of resources to a broker
– Intermediary/middleman
• Factor allocation policy out of the site
– Broker arbitrates resources under its control
– Sites retain placement policy
• “Federation”
– Site autonomy
– Coordinated provisioning
SHARP [SOSP 2003] w/ Vahdat, Schwab
Actor structure: symmetry
• Actors are RPC clients and servers
• Recoverable: commit state changes
– Pluggable DB layer (e.g., mysql)
• Common structures/classes in all actors (see the sketch after this list)
– A set of slices, each with a set of leases (ReservationSet) in various states
– Different “*Reservation*” classes with different state machine transitions
– Generic resource encapsulator (ResourceSet and IConcreteSet)
– Common kernel: Shirako leasing core
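
A minimal sketch of how those common structures might relate, assuming simplified names and fields; the real Shirako classes are considerably richer:

import java.util.HashMap;
import java.util.Map;

// Illustrative shapes only; field names and the "Slice" class are assumptions.
interface IConcreteSet {             // substrate-specific encapsulator, e.g. a set of COD nodes
    int getUnits();
}

class ResourceSet {                  // generic resource encapsulator
    int units;
    String type;                     // resource type, e.g. "vm"
    IConcreteSet concrete;           // bound on the authority/guest side
}

class Reservation {                  // one lease state machine instance (simplified)
    String state = "Nascent";        // the real code uses distinct *Reservation* classes
    ResourceSet resources;
}

class Slice {
    String name;
    Map<String, Reservation> reservations = new HashMap<>();  // the "ReservationSet"
}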
Actors and containers
• Actors run within containers
– JVM, e.g., Tomcat
– Per-container actor registry and keystore
– Per-container mysql binding
• Actor management interface
– Useful for GMOC and portals
– Not yet remoteable
• Portal attaches to container
– Tabs for different actor “roles”
– Dynamically loadable controllers and views (“Automat”)
“Automat” Portal
Shirako Kernel
Snippet from the “developer setup guide” on the web. The paths changed in the RENCI code base: prefix with core/trunk.
Shirako Kernel Events
The actor kernel (“core”) maintains state and processes events:
– Local initiate: start request from local actor
• E.g., from portal command, or a policy
– Incoming request from remote actor
– Management API
• E.g., from portal or GMOC
– Timer tick
– Other notifications come through tick or protocol APIs
Pluggable Resources and Policies

[Figure: the leasing core (instantiate guests, monitoring, state storage/recovery, negotiate contract terms, event handling, lease groups) surrounded by its plugins: policy modules (“controllers”), and resource handlers and drivers that configure resources.]
Kernel control flow
• All ops come through KernelWrapper and Kernel (.java)
– Wrapper: validate request and access
• Most operations pertain to a single lease state machine (FSM)
• But many access global state, e.g., allocate resources from a shared substrate
– Kernel: execute op with a global “core” lock
– Nonblocking core, at least in principle
Kernel control flow
• Acquire core lock
• Invoke *Reservation* class to transition lease FSM
• Release core lock
• Commit new state to actor DB
• Execute async tasks, e.g. “service*” methods to invoke plugins, handlers
• Ticks probe for completions of pending async tasks (see the code sketch below)
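
A sketch of that sequence as code, assuming illustrative class and method names rather than the actual Kernel.java:

import java.util.List;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch of the control-flow pattern described above, not the real kernel;
// class and method names here are assumptions.
class KernelSketch {
    private final Object coreLock = new Object();
    private final Executor executor = Executors.newCachedThreadPool();

    void processOperation(Reservation r, Operation op) {
        List<Runnable> asyncTasks;
        synchronized (coreLock) {            // acquire the global "core" lock
            asyncTasks = r.transition(op);   // *Reservation* class transitions its lease FSM
        }                                    // release the lock before any blocking work
        commitToDatabase(r);                 // commit new state to the actor DB (e.g., mysql)
        for (Runnable t : asyncTasks)
            executor.execute(t);             // "service*" methods invoke plugins/handlers unlocked
        // Subsequent timer ticks probe for completion of these pending async tasks.
    }

    private void commitToDatabase(Reservation r) { /* pluggable DB layer */ }

    interface Operation {}
    interface Reservation { List<Runnable> transition(Operation op); }
}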
Lease State Machine

[Figure: the lease FSM spanning Service Manager, Broker, and Site Authority over the lease term. The service manager forms a resource request; the reservation moves from Nascent through Ticketed, Priming, and Active, with Extending states on renewal, and finally to Closed. Transitions: request ticket → update ticket (broker policy selects resource types and sites, and sizes unit quantities); redeem ticket → update lease (site policy assigns concrete resources to match the ticket; resources are initialized when the lease begins, e.g., install nodes); resources join the guest application and the guest uses them for the original lease term; request ticket extend / request lease extend (the guest may continue to extend the lease by mutual agreement, and the reservation may change size, “flex”, on extend); close handshake (teardown/reclaim resources after the lease expires, or on guest-initiated close).]
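
A flattened sketch of the diagram's states and transitions; the actual *Reservation* classes track separate ticket and lease sub-states (and composite states like Active/Extending) that this enum collapses:

// Simplified sketch of the lease state machine in the figure above.
enum LeaseState { NASCENT, TICKETED, PRIMING, ACTIVE, EXTENDING, CLOSED }

class LeaseFsmSketch {
    LeaseState state = LeaseState.NASCENT;

    void onTicketUpdate()  { state = LeaseState.TICKETED;  }  // broker grants the ticket
    void onRedeem()        { state = LeaseState.PRIMING;   }  // site assigns concrete resources
    void onLeaseUpdate()   { state = LeaseState.ACTIVE;    }  // join done, guest uses resources
    void onExtendRequest() { state = LeaseState.EXTENDING; }  // may "flex" size on extend
    void onExtendGranted() { state = LeaseState.ACTIVE;    }
    void onCloseOrExpire() { state = LeaseState.CLOSED;    }  // teardown/reclaim resources
}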
Handlers
• Invocation upcalls through ResourceSet/ConcreteSet on relevant lease transitions
– Authority: setup/teardown
– Slice controller: join/leave
– Unlocked “async task” upcalls
• Relevant property sets are available to these handler calls
– For resource type, configuration, local/unit
– Properties are ignored by the core
• ConcreteSet associated with ShirakoPlugin
– e.g., COD manages “nodes” with IP addresses, invokes handlers in Ant
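
An illustrative sketch of that handler upcall surface; the method names and Properties-based signatures are assumptions, not the exact Shirako interfaces:

import java.util.Properties;

// Hypothetical upcall interfaces for the transitions described above.
// Upcalls run as unlocked "async tasks"; the core never interprets the properties.
interface AuthorityHandler {
    void setup(Properties config, Properties unit);     // e.g., install a node, boot a VM
    void teardown(Properties config, Properties unit);  // reclaim the resource
}

interface GuestHandler {
    void join(Properties config, Properties unit);      // resource joins the guest application
    void leave(Properties config, Properties unit);     // resource leaves before teardown
}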
Drivers
• Note that a handler runs within the actor
• So how to run setup/teardown code on the component itself?
• How to run join/leave code on the sliver?
• Option 1: handler invokes management interfaces, e.g., XMLRPC, SNMP, ssh
• Option 2: invoke custom driver in a NodeAgent with secure SOAP
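
A sketch of Option 1, assuming a handler that simply shells out to ssh; the host, key path, and command are placeholders:

import java.io.IOException;

// Illustrative "Option 1": the handler reaches the component through a standard
// management interface, here plain ssh via ProcessBuilder.
class SshSetupStep {
    static void run(String host, String command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "ssh", "-i", "/path/to/key", "root@" + host, command)  // placeholders
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("setup command failed on " + host);
        }
    }
}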
Example: VM instantiation

[Figure: the sequence of handler and driver invocations that instantiate a VM.]
TL1 Driver Framework
• General TL1 (command line) framework
– Substrate component command/response
• What to “expect”?
– XML file
Orca: Actors and Protocols

[Figure: the inter-actor protocol (steps numbered [1]–[8] in the original figure) among Guest, Broker, and Authority, each built on the common core. The guest formulates requests and sends ticket/extendTicket to the broker; the broker's allocate/extend policy draws on its calendar and inventory and replies with updateTicket; the guest then redeems tickets with redeem/extendLease at the authority, whose assign policy places them onto its resource pools and replies with updateLease.]
Policy Plugin Points

[Figure: a Service Manager (guest application), a Broker, and a Site Authority (host site resource pool), linked by the leasing API, the broker service interface, and the leasing service interface. Policy plug-in points: the service manager's application resource request policy, driven by the lease event interface and lease status notifications; broker policies for resource selection and provisioning; and the authority's assignment policy. Handlers: join/leave handlers for the service on the guest side, setup/teardown handlers for resources on the authority side. Policy plugins negotiate over allocation and configuration; properties are used to guide negotiation.]
Property Lists

[Figure: the same service manager / broker / site authority architecture, annotated with the property lists exchanged between actors. Broker examples: FCFS, priority, economic.]

– Request properties: elastic, deferrable
– Resource properties: machine.memory, machine.clockspeed
– Configuration properties: image.id, public.key
– Unit properties: host.ip, host.key
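
A sketch of those four property categories in code. The keys come from the slide; treating each category as a java.util.Properties bag, and all of the values, are illustrative assumptions:

import java.util.Properties;

// Property names below are from the slide; values and API shape are assumed.
class PropertyListExample {
    public static void main(String[] args) {
        Properties request = new Properties();        // guest -> broker
        request.setProperty("elastic", "true");
        request.setProperty("deferrable", "true");

        Properties resource = new Properties();       // broker's view of the pool
        resource.setProperty("machine.memory", "4096");
        resource.setProperty("machine.clockspeed", "2400");

        Properties configuration = new Properties();  // guest -> authority handlers
        configuration.setProperty("image.id", "debian-base");
        configuration.setProperty("public.key", "ssh-rsa AAAA...");

        Properties unit = new Properties();           // authority -> guest, per unit
        unit.setProperty("host.ip", "10.0.0.7");
        unit.setProperty("host.key", "ssh-rsa BBBB...");

        // The core passes these bags through unmodified; only the policy
        // plugins and handlers interpret them.
        System.out.println(request + " " + resource + " " + configuration + " " + unit);
    }
}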
Messaging Model
• Proxies maintained in actor registry
– Asynchronous RPC
– Upcall to kernel for incoming ops
– Downcall from lease FSM for outgoing
• Local, SOAP w/ WS-Security, etc.
– WSDL protocols, but incomplete
• Integration (e.g., XMLRPC)
– Experiment manager calls into slice controller module
– Substrate ops through authority-side handler
The end, for now
• Presentation trails off….
• What follows are other slides from previous presentations, dealing more with the concepts and rationale of Orca and its use for GENI.
NSF GENI Initiative

[Figure: experiments (guests occupying slices), characterized as “wind tunnel”, “petri dish”, and “observatory”, embedded onto the sliverable GENI substrate (contributing domains/aggregates).]
Dreaming of GENI

[Figure: an NSF GENI clearinghouse with a component registry and usage policy engine; an optical switch component with a GID and slivers (channel, band, switch port, fiber ID ρ) under an aggregate Management Authority.]

1. CM self-generates GID: public and private keys.
2. CM sends GID to MA; out-of-band methods are used to validate that the MA is willing to vouch for the component. CM delegates to the MA the ability to create slices.
3. MA (because it has sufficient credentials) registers name, GID, URIs, and some descriptive info.
4. MA delegates rights to NSF GENI so that NSF GENI users can create slices.

Notes:
• Identity and authorization are decoupled in this architecture. GIDs are used for identification only; credentials are used for authorization. I.e., the GID says only who the component is, and nothing about what it can do or who can access it.
• Assuming the aggregate MA already has credentials permitting access to the component registry.

Aaron Falk, GPO BBN
http://groups.geni.net/
Slivers and Slices
Aaron Falk, GPO BBN
GENI as a Programmable Substrate
• Diverse and evolving collection of substrate components.
– Different owners, capabilities, and interfaces
• A programmable substrate is an essential platform for R&D in network architecture at higher layers.
– Secure and accountable routing plane
– Authenticated traffic control (e.g., free of DOS and spam)
– Mobile social networking w/ “volunteer” resources
– Utility networking
– Deep introspection and adaptivity
– Virtual tunnels and bandwidth-provisioned paths
Some Observations
• The Classic Internet is “just an overlay”.
– GENI is underlay architecture (“underware”).
• Incorporate edge resources: “cloud computing” + sliverable network
• Multiple domains (MAD): not a “Grid”, but something like dynamic peering contracts
– Decouple services from substrate; manage the substrate; let the services manage themselves.
• Requires predictable (or at least “discoverable”) allocations for reproducibility
– QoS at the bottom, or not at all?
Breakable Experimental Network (BEN)
• BEN is an experimental fiber facility
• Supports experimentation at metro scale
– Distributed applications researchers
– Networking researchers
• Enabling disruptive technologies
– Not a production network
• Shared by the researchers at the three Triangle Universities
– Coarse-grained time sharing is the primary mode for usage
– Assumes some experiments must be granted exclusive access to the infrastructure
Resource Control Plane

[Figure: a resource control plane spanning hardware nodes hosting VMs, with middleware, cloud apps, services, and other guests above it.]

Open Resource Control Architecture (Orca)
• Contract model for resource peering/sharing/management
• Programmatic interfaces and protocols
• Automated lease-based allocation and assignment
• Share substrate among dynamic “guest” environments
• http://www.cs.duke.edu/nicl/
The GENI Control Plane
• Programmable substrate elements
• Dynamic end-to-end sliver allocation + control
– Delegation of authority etc.
– Instrumentation (feedback)
• Resource representation and exchange
– Defining the capabilities of slivers
– “Network virtual resource”
• Foundation for discovery
– Of resources, paths, topology

[Figure: three resource vectors over 16 CPU shares and bandwidth shares: ra=(8,4), rb=(4,8), rc=(4,4).]
Define: Control Plane
GGF+GLIF: "Infrastructure and distributed intelligence that controls the establishment and maintenance of connections in the network, including protocols and mechanisms to disseminate this information; and algorithms for automatic delivery and on-demand provisioning of an optimal path between end points.”

s/connections/slices/
s/optimal path/embedded slices/
provisioning += and programmed instantiation
Key Questions
• Who are the entities (actors)?
• What are their roles and powers?
• Whom do they represent?
• Who says what to whom?
• What innovation is possible within each entity, or across entities?
Control plane defines “the set of entities that interact to establish, maintain, and release resources and provide…[connection,slice] control functions”.
Design Tensions
• Governance vs. freedom
• Coordination vs. autonomy
• Diversity vs. coherence
• Assurance vs. robustness
• Predictability vs. efficiency
• Quick vs. right
• Inclusion vs. entanglement
• Etc. etc. …
Design Tensions
• What is standardized vs. what is open to innovation?
• How can GENI be open to innovation in components/management/control?
– We want it to last a long time.
– Innovation is what GENI is for.
• Standardization vs. innovation
– Lingua Franca vs. Tower of Babel
Who Are the Actors?
• Principle #1: Entities (actors) in the architecture represent the primary stakeholders.
1. Resource owners/providers (site or domain)
2. Slice owners/controllers (guests)
3. The facility itself, or resource scheduling services acting on its behalf.
Others (e.g., institutions) are primarily endorsing entities in the trust chains.
Control Plane

[Figure: infrastructure providers contribute resources; brokering intermediaries (clearinghouse) sit between them and guests such as a cloud service, a network service, etc. Plug guests, resources, and management policies into the “cloud”.]
Contracts
• Principle #2: provide pathways for contracts among actors.
– Accountability [SHARP, SOSP 2003]
• Be open with respect to what promises an actor is permitted to make.
– Open innovation for contract languages and tools
– Yes, need at least one LCD
• Rspec > HTML 1.0
• Lingua Franca vs. Tower of Babel
• Resource contracts are easier than service/interface contracts.
Rules for Resource Contracts
• Don’t make promises you can’t keep… but don’t hide power. [Lampson]
• There are no guarantees, ever.
– Have a backup plan for what happens if “assurances” are not kept.
• Provide sufficient power to represent what promises the actor is explicitly NOT making.
– E.g., temporary donation of resources
– Best effort, probabilistic overbooking, etc.
• Incorporate time: start/expiration time
– Resource contracts are leases (or tickets).
Leases
• Foundational abstraction: resource leases
• Contract between provider (site) and guest
– Bind a set of resource units from a site to a guest
– Specified term (time interval)
– Automatic extends (“meter feeding”)
– Various attributes

[Figure: the guest sends a request to the provider site, which grants a lease:]

<lease>
  <issuer> Site’s public key </issuer>
  <signed_part>
    <holder> Guest’s public key </holder>
    <rset> resource description </rset>
    <start_time> … </start_time>
    <end_time> … </end_time>
    <sn> unique ID at Site </sn>
  </signed_part>
  <signature> Site’s signature </signature>
</lease>
Network Description Language?

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ndl="http://www.science.uva.nl/research/sne/ndl#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<!-- Description of Netherlight -->
<ndl:Location rdf:about="#Amsterdam1.netherlight.net">
  <ndl:name>Netherlight Optical Exchange</ndl:name>
  <geo:lat>52.3561</geo:lat>
  <geo:long>4.9527</geo:long>
</ndl:Location>
<!-- TDM3.amsterdam1.netherlight.net -->
<ndl:Device rdf:about="#tdm3.amsterdam1.netherlight.net">
  <ndl:name>tdm3.amsterdam1.netherlight.net</ndl:name>
  <ndl:locatedAt rdf:resource="#Amsterdam1.netherlight.net"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/1"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/2"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/3"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/4"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/1"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/2"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/3"/>
</ndl:Device>
<ndl:Interface rdf:about="#tdm3.amsterdam1.netherlight.net:501/3">
  <ndl:name>tdm3.amsterdam1.netherlight.net:501/3</ndl:name>
  <ndl:connectedTo rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1"/>
  <ndl:capacity rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.244E+9</ndl:capacity>
</ndl:Interface>
<ndl:Interface rdf:about="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1">
  <rdfs:seeAlso rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf"/>
</ndl:Interface>
</rdf:RDF>
Delegation
• Principle #3: Contracts enable delegation of powers.
– Delegation is voluntary and provisional.
• It is a building block for creating useful concentrations of power.
– Creates a potential for governance
– Calendar scheduling, reservation
– Double-edged sword?
• Facility can Just Say No
Aggregation
• Principle #4: aggregate the resources for a site or domain.
– Primary interface is domain/site authority
• Abstraction/innovation boundary
– Keep components simple
– Placement/configuration flexibility for owner
– Mask unscheduled outages by substitution
– Leverage investment in technologies for site/domain management
BEN fiberplant
• Combination of NCNI fiber and campus fiber
• Possible fiber topologies:

[Figure: candidate BEN fiber topologies.]
Infinera DTN
• PIC-based solution
• 100 Gbps DLM (digital line module)
– Circuits provisioned at 2.5G granularity
• Automatic optical-layer signal management (gain control etc.)
• GMPLS-based control plane
• Optical express
– All-optical node bypass
Experimentation on BEN
• Extend Orca to enable slivering of network elements:
– Fiber switches
– DWDM equipment
– Routers
• Adapt mechanisms to enable flexible description of network slices
– NDL
• Demonstrate end-to-end slicing on BEN
– Create realistic slices containing compute, storage, and network resources
– Run sample experiments on them
BEN Usage
• Experimental equipment is connected to the BEN fiberplant at BEN points of presence
• MEMS fiber switches are used to switch experimental equipment in and out
– Based on the experiment schedule
• By the nature of the facility, experiments running on it may be disruptive to the network
• BEN points of presence are located at the RENCI engagement sites and the RENCI anchor site
BEN Redux
• Reconfigurable optical plane
– We will be seeking out opportunities to expand the available fiber topology
• Researcher equipment access at all layers
– From dark fiber up
• Coarse-grained scheduling
• Researcher-controlled
• No single-vendor lock-in
• Equipment with exposable APIs
• Connectivity with substantial non-production resources
Elements of Orca Research Agenda
• Automate management inside the cloud.
– Programmable guest setup and provisioning
• Architect a guest-neutral platform.
– Plug in new guests through protocols; don’t hard-wire them into the platform.
• Design flexible security into an open control plane.
• Enforce fair and efficient sharing for elastic guests.
• Incorporate diverse networked resources and virtual networks.
• Mine instrumentation data to pinpoint problems and select repair actions.
• Economic models and sustainability.
Leasing Virtual Infrastructure

The hardware infrastructure consists of pools of typed “raw” resources distributed across sites:
– e.g., CPU shares, memory, etc.: “slivers”
– storage server shares [Jin04]
– measured, metered, independent units
– varying degrees of performance isolation

Policy agents control negotiation/arbitration:
– programmatic, service-oriented leasing interfaces
– lease contracts

[Figure: the resource vectors ra=(8,4), rb=(4,8), rc=(4,4) over CPU and bandwidth shares, and the guest request/grant exchange with the provider site using the signed lease XML shown earlier.]
Summary
• Factor actors/roles along the right boundaries.
– stakeholders, innovation, tussle
• Open contracts with delegation
• Specific recommendations for GENI:
– Aggregates are first-class entities
– Component interface: permit innovation
– Clearinghouse: enable policies under GSC direction
Modularize Innovation
• Control plane design should enable local innovation within each entity.
• Can GENI be a platform for innovation of platforms? Management services?
– How to carry forward the principle that PlanetLab calls “unbundled management”?
• E.g., how to evolve standards for information exchange and contracts.
– Lingua Franca or Tower of Babel?
Slices: Questions
• What “helper” tools/interfaces must we have, and what do they require from the control plane?
• Will GENI enable research on new management services and control planes?
– If software is the “secret weapon”, what parts of the platform are programmable/replaceable?
• Co-allocation/scheduling of an end-to-end slice?
– What does “predictable and repeatable” mean?
– What assurances are components permitted to offer?
• What level of control/stability do we assume over the substrate?
Focus questions
• Specify/design the “core services”:
– Important enough and hard enough to argue about
– Must be part of facilities planning
– Directly motivated by usage scenarios
– Deliver maximum bang for ease-of-use
– User-centric, network-centric
• Enable flowering of extensions/plugins
– Find/integrate technology pieces of value
• What requirements do these services place on other WGs?