1 Scalable, Robust Wide-area Control Architecture for Integrated Communications Helen J. Wang...

1

Scalable, Robust Wide-area Control Architecture for Integrated Communications

Helen J. Wang

Qualifying Examination

March 8, 2000

2

Motivation

Lack of support for:
• Integrated use of heterogeneous devices (old & new)
• Rapid, arbitrary communication service customization

[Figure: PSTN, cellular, pager, and Internet networks]

3

Limitations of Existing Systems

• Telecommunications network:
  – engineered with one application and device in mind

• Existing Internet telephony systems:
  – ease of service creation, but limited
  – scalability, availability, and fault tolerance not fully addressed

4

How good is a communication system? (Dissertation Goals)

• Functionality: communication services it can support, and the ease of creating them

• Viability: scalability, robustness

• Focus on the control aspect:
  – control architecture = system components + signaling protocol (session setup, tear-down, and control)

5

Problem Statement

• Given heterogeneity, how to design a scalable, robust wide-area control architecture that supports easy creation of a wide range of communication services? And how should these services be created?

6

Outline

• Related Work and Research Contribution

• Control Architecture

• Signaling Protocol

• Service Creation Model

• Summary, Methodology, Research Agenda

7

Related Work

8

Overview of Research Contributions

• A scalable control architecture

• A robust signaling protocol

• A user-level, easy service creation model

• Publications:
  – “A Signaling System Using Light Weight Sessions,” accepted to Infocom 2000.
  – Helen J. Wang, et al. “ICEBERG, An Internet-Core Network Architecture for Integrated Communications,” accepted to IEEE Personal Communications, April 2000.

9

Outline

• Related Work and Research Contribution

• Control Architecture

• Signaling Protocol

• Service Creation Model

• Summary, Methodology, Research Agenda

10

Control Architecture: Goals

• Any-to-any communication
  – inter-working, composition of data transformation

• Personal mobility
  – unique ID, name mapping

• Personalized communication services
  – preference storage and management

• Enable user-activity driven services
  – activity tracking

11

Control Architecture: Components and Their Operations

[Figure: two iPOPs, one serving Alice@domain1 and one serving Bob@domain2, each with a Call Agent, PR, PAC, and NMS; an IAP at each end; APCs on the data path. Diagram labels: "dialed 333-2222", "Pick up", "Data Path".]

12

Leverage Cluster Computing Platforms

• iPOP must be scalable and robust: leverage cluster computing platforms such as Ninja, AS1

• Our requirements:
  – highly available service invocation: Ninja Base
  – fault tolerant service session: AS1

• session state maintained on client (IAP)

• iPOP on Ninja Base augmented with client heartbeat support from AS1

13

Control Architecture: Facts

[Figure: two iPOPs, each with a Call Agent, PR, and PAC. An IAP connects the access net to an iPOP over local-area communication; the iPOPs are linked by wide-area communication.]

• One Call Agent per caller per device
• One type of IAP per access network

14

Outline

• Related Work and Research Contribution

• Control Architecture

• Signaling Protocol

• Service Creation Model

• Summary, Research Methodology, Agenda

15

Signaling Protocol

• Basic call service: building blocks for supplementary services
  – Conventional: two party, homogeneous devices
  – ICEBERG communication model:
    • multi-device communication
    • invitation-based participation
    • large number of dynamic small-group communications
    • richer primitives: add/remove an endpoint during a session
    • conference calls and service handoff are first-class services; services that require endpoint changes are trivial to implement

16

Challenges in Signaling: Problems with SIP

[Figure: Call Agents CA1 to CA4 serving Alice, Bob, Carol, and Dale, plus CA5. Concurrent invitations ("Invite (also Bob)", "Invite (also Alice)", "Invite Alice", "Invite Bob") leave the agents with different membership lists: some hold "Alice Bob Carol Dale", others "Alice Bob Dale" or "Alice Bob Carol".]

• no consideration of session dynamics: membership, component failure

• bridged conference: centralized component to maintain state -- single point of failure

17

Problems with H.323

• Centralized approach for conferencing

• Limited fault tolerance measure:
  – process-pair style
  – cannot capture new state during fault recovery

• Complex

18

Lessons Learned

• Correctness and robustness:
  – need to maintain up-to-date membership and session state (call parties, device status, data path info) in the face of transient component failures, network partitions, and any exceptional conditions
  – distributed approach rather than centralized

19

Our Approach

• Maintain membership and session state as soft state in a distributed fashion.
  – Soft state: expires unless refreshed; protocol action upon new state or timeout; error recovery is the same as normal operation

• Questions: can it meet the call setup latency requirement? Does it cause bandwidth scalability problems?
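
The soft-state rule above (state expires unless refreshed; a protocol action is triggered by new state or by a timeout) can be illustrated with a minimal sketch. The class name `SoftStateTable`, its methods, and the 3-second lifetime used in the usage note are assumptions made for illustration; they are not taken from the ICEBERG prototype.

```python
import time


class SoftStateTable:
    """Hypothetical soft-state store: entries vanish unless refreshed in time."""

    def __init__(self, lifetime, clock=time.monotonic):
        self.lifetime = lifetime      # seconds an entry survives without refresh
        self.clock = clock            # injectable clock, handy for testing
        self._entries = {}            # key -> (value, expiry time)

    def refresh(self, key, value):
        """Install or refresh an entry. Returns True if the state is new or changed."""
        old = self._entries.get(key)
        self._entries[key] = (value, self.clock() + self.lifetime)
        return old is None or old[0] != value

    def expire(self):
        """Drop entries whose refresh deadline has passed; return their keys."""
        now = self.clock()
        dead = [k for k, (_, deadline) in self._entries.items() if deadline <= now]
        for k in dead:
            del self._entries[k]
        return dead

    def snapshot(self):
        """Current view of the state, without expiry times."""
        return {k: v for k, (v, _) in self._entries.items()}
```

A receiver would call `refresh` on every announcement or heartbeat and `expire` once per period. Because a lost refresh and a failed sender both end in the same timeout, error recovery follows the same code path as normal operation.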

20

Signaling Protocol: Session Membership

• Session membership
  – membership: CAs
  – IP multicast's group service is overkill for small group communication
    • per-group state in routers, IP address scarcity, deployment issues: access control, accountability
  – Solution: run an application-level group membership protocol among participating IAPs

21

Signaling Protocol: Capture the Complete Session State

[Figure: a communication session spanning three iPOPs. Diagram labels: Call Agent and session state at each iPOP, IAP, APC, HB (heartbeat), iPOP HB, Announce, Listen.]

22

Signaling Protocol: Fault Tolerance

[Figure: the three-iPOP session diagram, with one Call Agent and its session state drawn apart from the other two. Diagram labels: Call Agent, session state, IAP, APC, HB, iPOP HB, Announce, Listen.]

23

Signaling Protocol: Fault Tolerance

[Figure: the three-iPOP session diagram, again with one Call Agent and its session state drawn apart from the other two. Diagram labels: Call Agent, session state, IAP, APC, HB, iPOP HB, Announce, Listen.]

24

Signaling Protocol: Fault Tolerance

[Figure: the session diagram with three iPOPs and three IAPs but only two Call Agents with session state, and fewer Listen, Announce, and HB labels. Diagram labels: Call Agent, session state, IAP, APC, HB, iPOP HB, Announce, Listen.]

25

Invitation Protocol

• Invite a Call Agent to participate in a session

• Also a soft state protocol for robustness:
  – IAP maintains the call state machine and sends stateful keep-alive heartbeats to the iPOP
  – Call Agents advance call state machines on IAPs through periodic install-state messages until they receive a new heartbeat carrying the new state
  – Soft state inter-iPOP communication
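
The install-state loop described above can be sketched as a small simulation: the Call Agent resends the install-state message every period until a heartbeat from the IAP echoes the new state. The function name `advance_state`, the independent per-message loss model, and the `max_periods` cap are assumptions for illustration only.

```python
import random


def advance_state(target_state, iap_state, loss_rate, max_periods=100, rng=random):
    """Resend install-state once per period until a heartbeat echoes target_state.

    Returns (final IAP state, number of periods used).
    """
    for period in range(1, max_periods + 1):
        # Install-state message from the Call Agent to the IAP; it may be lost.
        if rng.random() >= loss_rate:
            iap_state = target_state
        # Stateful keep-alive heartbeat from the IAP back to the iPOP; it may be lost too.
        heartbeat = iap_state if rng.random() >= loss_rate else None
        if heartbeat == target_state:
            return iap_state, period
    return iap_state, max_periods
```

With no loss the state advances in one period; with loss, the same loop simply runs for more periods, so no separate retransmission or recovery logic is needed.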

26

Bandwidth Scalability

• Soft state period selection: call setup latency and fault recovery time vs. bandwidth overhead
  – An optimization problem: minimize bandwidth overhead, subject to the following constraints:
    • expected call setup latency (1.5 seconds)
    • standard deviation (0.5 second)
    • fault recovery time (1 and 4 seconds for local and wide area)
  – parameters: 2% wide-area loss rate, 0.2% local-area loss rate, 2 ms local-area propagation delay, 100 ms wide-area delay
  – local: 1 sec, 800 bps; wide: 3 sec, 233 bps; for a 64 kbps data stream, local-area control traffic is about 1%

27

Processing Scalability

• Compare our single cluster system against a class 4 switch which is a local (end) office: 250 calls/second

• Our current prototype yields 10 calls/second on one PC because of an inefficient RMI implementation (tens of ms); 25+ PCs = a class 4 switch

28

Outline

• Related Work and Research Contribution

• Control Architecture

• Signaling Protocol

• Service Creation Model

• Research Agenda

29

Service Creation Model

• Focus: control, redirection services
• Goal: end users can easily customize the control services in any arbitrary way
• Issues:
  – service creation/customization
  – service invocation
  – service portability
  – system support

30

Intelligent Network

• Separate service logic from basic call processing

[Figure: a switch sends a trigger to separate service logic]

• Service portability: standardizing the basic call state machine was too strict a standard, and it failed

• Limitation: no user-level customization

31

Proposed Approach

• Customization independent of the call processing implementation: use high-level events, e.g., call request received, callee device busy, callee device not answering

• Service creation: condition-action pairs
  – Condition: conjunction of high-level events, user-interested conditions, and boolean expressions
  – Action: composition of system primitives

• Hypothesis: condition-action pairs are sufficient

32

Proposed Approach: Service Invocation & Portability

[Figure: the Call Agent passes events to a Preference Registry holding condition-action pairs; the PAC supplies activity information. Diagram labels: event, check, update, Activity, Condition, Action.]

• Service portability: standardize the events and system primitives, which is much easier than standardizing the call state machine

33

An Example: Completion of Calls to Busy Subscriber

Condition                          Action
callee busy && caller hang up      register with callee PAC;
callee PAC reject                  exit
callee PAC notify                  invite caller; invite callee;
caller busy                        wait 5 minutes; re-register with the callee PAC;
hangup time > 1 hour               de-register with callee PAC; exit
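
One way to read the condition-action pairs above is as a rule table evaluated against the set of high-level events observed so far. The sketch below assumes that reading; the event and primitive names (`callee_busy`, `register_with_callee_pac`, and so on) are hypothetical spellings of the entries in the example, not identifiers from ICEBERG.

```python
# Each rule pairs a condition over the observed event set with a list of
# system primitives to run, in order, when the condition holds.
RULES = [
    (lambda ev: {"callee_busy", "caller_hangup"} <= ev, ["register_with_callee_pac"]),
    (lambda ev: "callee_pac_reject" in ev, ["exit"]),
    (lambda ev: "callee_pac_notify" in ev, ["invite_caller", "invite_callee"]),
    (lambda ev: "caller_busy" in ev, ["wait_5_minutes", "reregister_with_callee_pac"]),
    (lambda ev: "hangup_over_1_hour" in ev, ["deregister_with_callee_pac", "exit"]),
]


def fire(events):
    """Return the actions of every rule whose condition matches the events."""
    events = set(events)
    actions = []
    for condition, rule_actions in RULES:
        if condition(events):
            actions.extend(rule_actions)
    return actions
```

For example, `fire({"callee_busy", "caller_hangup"})` selects only the registration primitive, while `callee_busy` alone matches nothing because the first condition is a conjunction.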

34

An Example, Cont.

• System support issues:
  – extended Call Agent lifetime
  – queue management on the PAC
  – track event sequence: stack of timed events, stack depth depending on user preferences

35

How good is a communication system?

• Functionality: services
  – component identification
  – powerful signaling protocol primitives
  – easy, user-centric service creation model

• Viability: scalability, robustness
  – first application of soft state to a signaling protocol; bandwidth overhead not an issue; can fulfill latency requirements
  – processing scalability, local-area robustness by leveraging cluster computing platforms

36

Outline

• Related Work and Research Contribution

• Control Architecture

• Signaling Protocol

• Service Platform

• Methodology and Research Agenda

37

Methodology: 1st Iteration (Completed)

[Figure: iteration cycle with Design, Prototype, Analysis, and Evaluation stages]

• Control architecture

• Signaling protocol
  – session maintenance protocol

• Control architecture

• Session maintenance protocol

• Measured the current prototype
• Simple soft state period analysis

38

Methodology: 2nd Iteration Overview

[Figure: iteration cycle with Design, Prototype, Analysis, and Evaluation stages]

• Service creation model
  – Possibly revise the design of the control architecture and the signaling protocol

• Completed work:
  – invitation protocol
  – membership protocol

• Wide-area testbed

• Group membership protocol

• Invitation protocol

• Service creation model

• Evaluation: scalability, robustness, service creation, hard/soft state comparison

• Analysis: group membership protocol, service creation

39

Research Agenda

• Phase 1: complete and fine-tune service creation model design (1 month)
  – define events and system primitives
  – preference conflict resolution
  – identify service creation interaction with the control architecture and signaling

Planned paper submission on service creation model design to SmartNet 3/31

40

Research Agenda

• Phase 2: 2nd iteration prototyping (3 - 6 months)
  – invitation protocol, membership protocol
  – employ Ninja vSpace
  – release ICEBERG to Ericsson, TU Berlin, NTT and construct a wide-area test-bed
  – service creation model

Planned paper submission to ICNP (May) or INFOCOM (July) on protocols and analysis

41

Research Agenda, Cont.

• Phase 3: Evaluation (6 months)
  – processing scalability: measure call processing time and number of simultaneous sessions; compare against a class 4 switch
  – bandwidth scalability: group membership protocol analysis; dynamic soft state period selection
  – robustness: emulate failure conditions (losses, long delays, component failures), run system over time
  – hard/soft state comparison: bandwidth usage, latency, fault recovery time

42

Research Agenda, Cont.

  – Service creation evaluation:
    • comparable functionality: implement representative IN services such as “call completion upon busy”
    • new services such as policy-based call waiting
    • system extensibility: number of lines of code and amount of time needed to develop new primitives for new services

Planned paper submission on wide-area testbed experience and evaluation to SIGMETRICS 3/2001

43

Research Agenda, Cont.

• Phase 4: Write thesis (6 months)
  – compile the publications

44

Acronyms Lookup

• APC: Automatic Path Creation
• CA: Call Agent
• IAP: ICEBERG Access Point
• iPOP: ICEBERG Point of Presence
• NMS: Name Mapping Service
• PAC: Personal Activity Coordinator
• PR: Preference Registry

45

Soft and Hard State

• Soft State
  – expires unless refreshed; protocol action upon new state and timeout
  – loss of state will not stop the system -- robust
  – eventual consistency
  – error recovery built into normal operation -- simple, but longer latency and no diagnosis

• Hard State
  – explicit state setup once only (bandwidth and processing efficiency)
  – explicit error detection and recovery, synchronously at involved components -- complex but immediate
  – better consistency guarantees

46

Signaling Protocol: Group Membership Protocol

• Periodic membership exchange among members
  – no bootstrapping needed: every member knows at least one other member (invitation-based)
  – receive superset or disjoint set: immediate synchronization with the rest of the session
  – run among the IAPs for Call Agent fault recovery
  – time-stamped <IAP, CA> list

• Convergence efficiency rather than bandwidth efficiency
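
The periodic exchange of time-stamped <IAP, CA> lists can be sketched as a merge in which the newest timestamp wins for each IAP. This is an illustrative reading of the bullets above, not the protocol's actual message format: the dictionary layout and the function names `merge_membership` and `needs_immediate_sync` are assumptions.

```python
def merge_membership(local, received):
    """Merge two views mapping IAP id -> (Call Agent id, timestamp).

    For each IAP, the entry with the newer timestamp wins.
    """
    merged = dict(local)
    for iap, (call_agent, stamp) in received.items():
        if iap not in merged or stamp > merged[iap][1]:
            merged[iap] = (call_agent, stamp)
    return merged


def needs_immediate_sync(local, received):
    """True when the received view is a strict superset of ours or disjoint from it."""
    mine, theirs = set(local), set(received)
    return mine < theirs or mine.isdisjoint(theirs)
```

A member would merge every list it receives and, when `needs_immediate_sync` is true, announce its merged view right away instead of waiting for the next period, trading bandwidth for faster convergence.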

47

Period Selection

• Soft State Period: dominates fault recovery time, affects bandwidth overhead
  – cannot trade latency for bandwidth scalability

• Problem: what period values to select to fulfill the call setup latency, fault recovery latency requirements and minimize the bandwidth overhead? -- an optimization problem

48

Period Selection: Problem Formulation

• Call setup latency = receiving 8 local-area and 4 wide-area msgs in sequence + msg processing time

• Receive a local-area msg = f (local-area period, local-area loss rate, local-area propagation delay)

• The optimization problem:
  – find local-area and wide-area periods that minimize bandwidth overhead, subject to the following constraints:
    • E(call setup latency) < 1.5 seconds
    • Standard deviation (call setup latency) < 0.5 second
    • local-area fault recovery time < 1 s; wide-area < 4 s
  – with parameters: 2% wide-area loss rate, 0.2% local-area loss rate, 2 ms local-area propagation delay, 100 ms wide-area delay
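
The slides do not give the form of f. As a rough sketch only, suppose each message is lost independently and a lost message is recovered by the next periodic refresh, so the number of lost attempts before delivery is geometric with mean p / (1 - p). Under that assumed model, and ignoring message processing time, the expected call setup latency for the 8 local-area and 4 wide-area messages is:

```python
LOCAL_LOSS, LOCAL_DELAY = 0.002, 0.002   # 0.2% loss, 2 ms propagation delay
WIDE_LOSS, WIDE_DELAY = 0.02, 0.100      # 2% loss, 100 ms delay


def expected_receive_delay(period, loss_rate, propagation_delay):
    """Assumed model: propagation delay plus one period per lost attempt."""
    return propagation_delay + period * loss_rate / (1.0 - loss_rate)


def expected_setup_latency(local_period, wide_period):
    """Expected time to receive 8 local-area and 4 wide-area messages in sequence."""
    local = expected_receive_delay(local_period, LOCAL_LOSS, LOCAL_DELAY)
    wide = expected_receive_delay(wide_period, WIDE_LOSS, WIDE_DELAY)
    return 8 * local + 4 * wide
```

With a 1 s local-area period and a 3 s wide-area period, this assumed model gives an expectation of roughly 0.68 s, well under the 1.5 s bound, which fits the observation that the fault recovery constraints rather than the latency constraint determine the periods.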

49

Results: Period = f (processing)

• fault recovery time constraints dominate the effects on period

• local-area period = 1 s
  – 800 bps overhead

• wide-area period = 3 s
  – 233 bps overhead

• for a 64 kbps data stream, 1% * # of members
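
The overhead figure can be checked with a short calculation. The helper name `control_overhead` is hypothetical; the numbers are the ones on the slide, and the linear scaling with member count follows the "1% * # of members" bullet.

```python
def control_overhead(control_bps, data_bps, members=1):
    """Control traffic as a fraction of the data stream, scaled by member count."""
    return control_bps * members / data_bps


# Local-area soft state at a 1 s period costs 800 bps against a 64 kbps stream.
local_fraction = control_overhead(800, 64_000)
print(f"local-area overhead per member: {local_fraction:.2%}")
```

The local-area figure works out to 1.25% per member, which the slide rounds to about 1%.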

50

Proposed Approach: Service Creation

[Figure: a user GUI stores condition-action pairs in the Preference Registry, which the Call Agent consults]

• Condition: conjunction of high-level events, user-interested conditions, and boolean expressions

• Action: sequence of system primitives
• Advantage: independent of the call processing implementation
• Hypothesis: condition-action pairs are sufficient

51

An Example