WING: A Consistent Computing Platform

Yiwei Ci
22/02/2012
[email protected]


Goals

• Give processes suitable places to compute.
• Take processes to suitable places.
• Transparency
  – OS abstraction (single-system image, SSI)
  – Hardware abstraction
    • Supports heterogeneous systems
• Seamless
  – Dynamically switch a process's workplace

Challenges

• Security
  – How can malicious software be prevented?

• Availability
  – Each individual computer is highly available today. What about the distributed system as a whole?

• Scalability
  – How can just-in-time scalability be provided?

Application scenarios

• Build your own data center
  – Switch between data migration and computation migration
• Build your own virtual cluster
  – PC + PC
  – PC + mobile device
  – Mobile device + mobile device

• Build your own test bed
  – Easy deployment

Related work

• XtreemOS
  – Grid operating system supporting virtual organizations
  – Latest version: 3.0 (2012/02/10)

• OpenSSI
  – Provides a single process space, single I/O space, single IPC space, and single root for the cluster
  – Latest version: 1.9.6 (2010/02/18)

• OpenMosix
  – A Linux kernel extension for single-system image clustering
  – Latest version: 2.4.26 (project closed)

Related work

Feature                                OpenSSI              OpenMosix  Kerrighed
Single Administrative Domain           Yes                  No         Yes
Cluster Membership Guarantees          Yes                  No         No
High-Availability (HA) Clustering      Yes                  No         No
Single Process Management Space        Yes                  No         Yes
Process Migration                      Full                 Partial    Partial
Process Load Balancing                 Yes                  Yes        Yes
Process Migration is HA                Yes                  No         No
Migrate Processes Using Semaphores     Yes                  Yes        Yes
Migrate Processes Using Shared Memory  Yes                  No         Yes
Single Thread Migration                Only thread group    No         No
Process Checkpointing                  3rd party            3rd party  Yes
Single IPC Namespace                   Yes                  No         Yes

Source: http://wiki.openssi.org/go/Features

Related work

Feature                                OpenSSI              OpenMosix  Kerrighed
Distributed Shared Memory              Migrates             No         Yes
Single PTY Namespace                   Yes                  No         No
Single-Site File Naming                Yes                  No         ?
Coherent Cache / File Access           Yes                  ?          ?
Migrate Active Filesystem              Experimental         No         No
HA Cluster Filesystem (CFS)            Yes                  No         No
HA Single Cluster Name/Address         Yes                  No         No
HA Network Load Balancer (LB)          Yes                  No         No
LB Auto Detects TCP/UDP IP Services    Yes (UDP 1.9.1)      N/A        N/A
Transparent Socket Migration           Not yet              No         Yes
Automatic Service Failover             Yes                  No         No
Diskless Nodes via Network Boot        Yes (PXE/EtherBoot)  No         Yes

Source: http://wiki.openssi.org/go/Features

Outline

• Design
• Implementation
• Results
• Conclusions

Design

• Virtualization (1)

[Figure: three physical machines (PM), each running a host OS with two VMs; each VM runs its own OS and an application (App).]

Design

• Virtualization (2)

[Figure: six physical machines (PM) running applications (App), shown with six VMs and their OS instances.]

Design

• Virtualization (3)

Design

• Virtualization (4)

[Figure: three physical machines (PM), each hosting two VMs, with OS instances and applications (App).]

Design

• Virtualization (5)

[Figure: physical machines (PM) with applications (App), VMs, and OS instances, split between the data center ("Center") and the user side ("User").]

Design

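• Basic architecture

[Figure: WING inside the OS. User space sits above kernel space; within kernel space, the kernel layer sits above WING's three layers:
  – Translation layer: WTRANS
  – Link layer: WLN
  – Resource layer: WRES]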

Design

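• Metaphor: everything is a file

[Figure: WTRANS turns system calls (sys ops) into file operations (file ops), WLN links them to resources, and WRES holds the resources (res). Diagram labels: subject, object.]
  – Operation: system call → file operation
  – Object → path of the form …/class_key_entry, with fields op, id, flags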
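As an illustration of the "everything is a file" metaphor, the sketch below shows how a translator in the spirit of WTRANS might rewrite a System V IPC call as an operation on a path of the form …/class_key_entry carrying op, id, and flags. Everything concrete here is hypothetical: the `root` prefix (the slides elide it), the `SYSCALL_CLASS` table, the `Request` and `translate` names, and the treatment of `entry` as an opaque string, since the slides do not say what it denotes.

```python
from dataclasses import dataclass

# Hypothetical mapping from System V IPC calls to resource classes.
SYSCALL_CLASS = {
    "msgget": "msg", "msgsnd": "msg", "msgrcv": "msg", "msgctl": "msg",
    "semget": "sem", "semop": "sem", "semtimedop": "sem", "semctl": "sem",
    "shmget": "shm", "shmat": "shm", "shmctl": "shm",
}


@dataclass(frozen=True)
class Request:
    """A system call rewritten as a file-style operation."""
    path: str    # hypothetical layout: <root>/<class>_<key>_<entry>
    op: str      # the original operation, e.g. "msgsnd"
    res_id: int  # the "id" field: identifier returned by msgget/semget/shmget
    flags: int   # the "flags" field, passed through unchanged


def translate(root: str, syscall: str, key: int, entry: str,
              res_id: int, flags: int = 0) -> Request:
    """Translate an IPC system call into an operation on a per-resource path."""
    cls = SYSCALL_CLASS.get(syscall)
    if cls is None:
        raise ValueError(f"not a System V IPC call: {syscall}")
    return Request(path=f"{root}/{cls}_{key}_{entry}", op=syscall,
                   res_id=res_id, flags=flags)


req = translate("ROOT", "msgsnd", 1234, "queue", res_id=7)
print(req.path)  # ROOT/msg_1234_queue
```

Routing every IPC call through a path keeps the kernel side uniform: whatever resolves the path (WLN, in the slides' terms) only needs to know how to locate a file-like object, not the semantics of each call.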

Design

• Resource
  – Computation resource
    • Process
  – Data resource
    • File
    • Memory
  – Common resource
    • IPC: msg, sem, shm
    • …

Design

• Area
  Basically, the operating system has two areas for different processes:
  – Local area: local computing
  – Global area: distributed computing (/mnt/repo)
• Resources in the local area are invisible to processes in the global area.
• Resources in the global area are consistent.
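A minimal sketch of the visibility rule, assuming every process and resource is tagged with its area. Only the first branch comes from the slide (local-area resources are invisible to global-area processes); letting local-area processes see both areas is an assumption, and `Area`, `Resource`, and `visible_resources` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Area(Enum):
    LOCAL = "local"    # local computing
    GLOBAL = "global"  # distributed computing


@dataclass(frozen=True)
class Resource:
    name: str
    area: Area


def visible_resources(process_area: Area, resources: List[Resource]) -> List[Resource]:
    """Return the resources a process in `process_area` is allowed to see."""
    if process_area is Area.GLOBAL:
        # Stated rule: local-area resources are invisible to global-area processes.
        return [r for r in resources if r.area is Area.GLOBAL]
    # Assumption: a local-area process sees resources in both areas.
    return list(resources)
```

Hiding local resources from global processes matters for migration: a global process that never saw node-local state carries no dependency on it when it moves.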

Design

• Isolation
  – Process isolation
    • Processes in WING are “invisible” to the world outside.
  – Data isolation
    • Data in WING is “invisible” to the world outside.
    • The private data of one user is “invisible” to other users:
      – Private files
      – Private data inside a file

Implementation

• Components

[Figure: WING components above the OS and VM layers. WTRANS and WLN sit above the resource components, grouped as:
  – Common resource: WRES
  – Computation resource: WCR, CFS, WMAN
  – Data resource: IFS, LDFS]

WTRANS: kernel translator
WLN: linker
WRES: common resource
WCR: checkpoint kernel module
CFS: checkpoint file system
WMAN: process manager
LDFS: lightweight distributed file system
IFS: image file system

Implementation

• Checkpoint
  Records the state of a process (transparently to the programmer):
  – Registers
  – Open files
  – Signals
  – Credentials
  – Memory
  – IPC

Implementation

• Checkpoint
  – Incremental saving
  – Dynamic recovery
  – Dedicated file system

[Figure: incremental checkpoints stored in CFS; between checkpoints, memory pages are marked dirty, new, or removed, with a "map memory" step.]
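The incremental-saving idea can be sketched as follows, under the simplifying assumption that a process image is a mapping from page number to page contents: each new checkpoint stores only pages that are new or dirty since the previous one, plus the numbers of removed pages, and recovery replays the increments in order. This illustrates incremental checkpointing in general, not WCR's or CFS's actual format; `diff_checkpoint` and `restore` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

Pages = Dict[int, bytes]            # page number -> page contents
Increment = Tuple[Pages, Set[int]]  # (new or dirty pages, removed page numbers)


def diff_checkpoint(prev: Pages, curr: Pages) -> Increment:
    """Save only what changed since the previous checkpoint.

    A real checkpointer would learn which pages are dirty from page-table
    bits; comparing contents keeps this sketch self-contained.
    """
    changed = {n: data for n, data in curr.items()
               if n not in prev or prev[n] != data}  # new or dirty pages
    removed = set(prev) - set(curr)                  # pages unmapped since then
    return changed, removed


def restore(increments: List[Increment]) -> Pages:
    """Rebuild the latest image by replaying increments oldest-first."""
    image: Pages = {}
    for changed, removed in increments:
        for n in removed:
            image.pop(n, None)
        image.update(changed)
    return image
```

For example, going from pages {0: a, 1: b, 2: c} to {0: a, 1: B, 3: d} stores only pages 1 and 3 and records page 2 as removed.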

Implementation

• Distributed IPC: message queues, semaphores, shared memory
  – System V IPC does not support distributed communication.
  – Dumping and restoring IPC state inside the kernel is complex.

Implementation

• Distributed IPC
  – Conventional System V IPC and distributed IPC coexist (same interface).
  – Ensure the consistency of IPC resources in the distributed environment.
  – Stateless in kernel (for process migration)
    • Re-implement IPC in user space.
    • Provide a RAM-based pseudo file system to store the state of IPC resources.
  – High availability

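Implementation

• Distributed IPC
  – Event driven

[Figure: Event flow between the proposer and the owner. The proposer sends a request; the owner checks "Can process?". If not, it sends back a waiting index; if so, it sends the result and checks "Can trigger another event?" (yes: handle that event; no: stop).]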
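One reading of the event flow, sketched for a single semaphore held by its owner: a request that can be processed is answered with a result, while one that cannot (here, a down on a zero-valued semaphore) is answered with a waiting index and queued; after each processed request, the owner checks whether it can trigger a waiting event. The class and method names are hypothetical, and in WING the messages would travel between nodes rather than as method calls.

```python
from collections import deque
from typing import Deque, List, Tuple

Reply = Tuple[str, int]  # ("result", request id) or ("waiting", waiting index)


class SemOwner:
    """Owner of one semaphore; proposers send it "up" or "down" requests."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.waiting: Deque[int] = deque()  # ids of queued "down" requests
        self.next_index = 0                 # next waiting index to hand out
        self.completed: List[int] = []      # request ids in completion order

    def request(self, req_id: int, op: str) -> Reply:
        if op not in ("up", "down"):
            raise ValueError(f"unknown operation: {op}")
        # "Can process?": an up always can; a down needs a positive value.
        if op == "down" and self.value == 0:
            index = self.next_index
            self.next_index += 1
            self.waiting.append(req_id)
            return ("waiting", index)       # "Send waiting index"
        self._apply(req_id, op)
        self._trigger_waiting()
        return ("result", req_id)           # "Send result"

    def _apply(self, req_id: int, op: str) -> None:
        self.value += 1 if op == "up" else -1
        self.completed.append(req_id)

    def _trigger_waiting(self) -> None:
        # "Can trigger another event?": complete queued downs while possible, else stop.
        while self.waiting and self.value > 0:
            self._apply(self.waiting.popleft(), "down")
```

Because all state lives with the owner and blocked requests are represented by waiting indices, no kernel thread has to sleep on the proposer's side, which fits the "stateless in kernel" goal.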

Implementation

• Msg requirements (msgtyp semantics):
  1) msgtyp = 0: the first message on the queue is received.
  2) msgtyp > 0: the first message of type msgtyp is received.
  3) msgtyp < 0: the first message of the lowest type that is less than or equal to the absolute value of msgtyp is received.
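These three rules are the standard msgrcv(2) semantics, so a distributed implementation has to reproduce them exactly. A sketch of the selection logic over an in-memory queue, ignoring flags such as MSG_EXCEPT; `select_message` is an illustrative name, not part of WING.

```python
from typing import List, Optional, Tuple

Message = Tuple[int, bytes]  # (mtype, mtext); mtype is always > 0


def select_message(queue: List[Message], msgtyp: int) -> Optional[int]:
    """Return the index of the message msgrcv would take, or None if none matches."""
    if msgtyp == 0:
        # 1) The first message on the queue, whatever its type.
        return 0 if queue else None
    if msgtyp > 0:
        # 2) The first message whose type equals msgtyp.
        return next((i for i, (t, _) in enumerate(queue) if t == msgtyp), None)
    # 3) The lowest type <= |msgtyp|; among equal types, the earliest message.
    limit = -msgtyp
    best: Optional[int] = None
    for i, (t, _) in enumerate(queue):
        if t <= limit and (best is None or t < queue[best][0]):
            best = i
    return best
```

With the queue [(5, a), (2, b), (3, c), (2, d)], msgtyp = -4 selects index 1: type 2 is the lowest type not above 4, and the strict comparison keeps the earlier of the two type-2 messages.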

• Sem requirements:
  1) The RPC layer needs a time-out mechanism (for semtimedop).
  2) The RPC layer needs an undo mechanism (for exit_sem).
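A single-node sketch of the two semantics the RPC layer must preserve: semtimedop gives up after a time-out (failing with EAGAIN), and operations flagged SEM_UNDO are reversed when the process exits, which the Linux kernel does in exit_sem. A `threading.Condition` stands in for the remote wait; the class and its methods are hypothetical, and a distributed version would also need the time-outs and undo records to survive across the network.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional


class UndoableSemaphore:
    """One semaphore with semtimedop-style time-outs and SEM_UNDO-style undo."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._cond = threading.Condition()
        # pid -> net adjustment to apply when that process exits
        self._undo: Dict[int, int] = defaultdict(int)

    def op(self, pid: int, delta: int, undo: bool = False,
           timeout: Optional[float] = None) -> bool:
        """Add delta, blocking while the value would go negative; False on time-out."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._value + delta >= 0, timeout):
                return False              # like semtimedop failing with EAGAIN
            self._value += delta
            if undo:
                self._undo[pid] -= delta  # remember how to reverse this operation
            self._cond.notify_all()
            return True

    def exit_process(self, pid: int) -> None:
        """Counterpart of exit_sem: reverse the exiting process's SEM_UNDO operations."""
        with self._cond:
            adj = self._undo.pop(pid, 0)
            self._value = max(0, self._value + adj)  # never let the value go negative
            self._cond.notify_all()

    @property
    def value(self) -> int:
        with self._cond:
            return self._value
```

If a process takes the semaphore with undo=True and then dies without releasing it, exit_process restores the value, so waiters on other nodes are not blocked forever.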

Implementation

• Shared memory
  Problems:
  – Finding (key, value) pairs
  – Frequent updates of (key, value) pairs
  Consistency model:
  – Sequential consistency
  Features:
  – Multi-owner
  – Versioning
  – Write invalidation
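The listed features (multi-owner pages, versioning, write invalidation) can be illustrated with a centralized, single-threaded simulation in which a directory maps (shm key, page number) to the page's current owner, version, and set of read-only copies. This is a generic write-invalidate scheme under those assumptions, not WING's actual protocol: it leaves out the message exchanges of the shm_fault cases, and `ShmDirectory` and `PageEntry` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

PageKey = Tuple[int, int]  # (shm key, page number)


@dataclass
class PageEntry:
    owner: str                                      # node holding the writable copy
    version: int = 0                                # incremented on every write
    copyset: Set[str] = field(default_factory=set)  # nodes holding read-only copies
    data: bytes = b""                               # the owner's copy of the page


class ShmDirectory:
    """Finds each page's owner and keeps every node's copies consistent."""

    def __init__(self) -> None:
        self.pages: Dict[PageKey, PageEntry] = {}
        # node -> {page -> (version, data)}: the copies each node has mapped
        self.local: Dict[str, Dict[PageKey, Tuple[int, bytes]]] = defaultdict(dict)

    def create(self, key: PageKey, owner: str, data: bytes = b"") -> None:
        self.pages[key] = PageEntry(owner=owner, data=data)
        self.local[owner][key] = (0, data)

    def read(self, node: str, key: PageKey) -> bytes:
        copy = self.local[node].get(key)
        if copy is None:  # read fault: fetch a copy from the page owner
            entry = self.pages[key]
            copy = (entry.version, entry.data)
            self.local[node][key] = copy
            entry.copyset.add(node)
        return copy[1]

    def write(self, node: str, key: PageKey, data: bytes) -> int:
        entry = self.pages[key]
        # Write invalidation: drop every other copy before the write completes.
        for holder in entry.copyset | {entry.owner}:
            if holder != node:
                self.local[holder].pop(key, None)
        entry.copyset.clear()
        entry.owner = node  # ownership moves per page, so pages can have different owners
        entry.version += 1
        entry.data = data
        self.local[node][key] = (entry.version, data)
        return entry.version
```

Because a write removes all other copies before it finishes, any later read on another node faults and fetches the newest version; this single-writer, multiple-reader discipline is how write-invalidate protocols keep every node agreeing on one order of writes to a page.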

Implementation

• Shared memory: handling shm_fault

[Figure: shm_fault handling, Cases 1 to 3: numbered message sequences among the proposer, shm owner, page owner, readers, and writer.]

Implementation

• Shared memory: handling shm_fault

[Figure: shm_fault handling, Cases 4 and 5: numbered message sequences among the proposer, page owner, readers, and writer.]

Implementation

• Image File System (IFS)
  Data can be shared, but each user's data needs to be protected.
  – Each user can have a different view of a file.
  – All processes of the same user have the same view of a file.

Implementation

• Image File System

[Figure: IFS with columns Bitmap, State, Image, and Source; per-user (User A, User B) bitmaps over File 1, File 2, and File 3, and the shared source files File 1 to File 3.]
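One way to give each user a private view of a shared file is per-user copy-on-write: the shared source stays untouched, each user has an image holding the blocks that user has modified, and a bitmap records which blocks come from the image. This is our reading of the Bitmap, Image, and Source labels, not a description of IFS's actual layout; `UserView` and `ImageFS` are hypothetical names.

```python
from typing import Dict, List


class UserView:
    """One user's view of a shared file: bitmap + private image over the source."""

    def __init__(self, source: List[bytes]) -> None:
        self.source = source                 # shared blocks, never modified here
        self.bitmap = [False] * len(source)  # True: block i comes from the image
        self.image: Dict[int, bytes] = {}    # this user's modified blocks

    def read(self, i: int) -> bytes:
        return self.image[i] if self.bitmap[i] else self.source[i]

    def write(self, i: int, data: bytes) -> None:
        self.image[i] = data  # copy-on-write: the source block stays intact
        self.bitmap[i] = True


class ImageFS:
    """Hands out one view per user, so all of a user's processes see the same data."""

    def __init__(self, source: List[bytes]) -> None:
        self.source = source
        self.views: Dict[str, UserView] = {}

    def view(self, user: str) -> UserView:
        if user not in self.views:
            self.views[user] = UserView(self.source)
        return self.views[user]
```

Keying views by user rather than by process gives exactly the two properties on the slide: different users can see different contents, while processes of the same user share one view.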

Results

• Environment:
  – VMs (×2): 512 MB RAM, 2 processors, NAT
  – OS: based on Linux kernel 2.6.29.6
  – Host: 2048 MB RAM, Intel Core 2 Duo T6570 CPU

• Experiments:
  – Msg
  – Sem

Results

• Msg

Leader:
1. Use msg(key_0) to collect start requests from members;
2. if all start requests are received then
3.     use msg(key_i) (i > 0) to send start_i;
4.     use msg(key_(N+1)) to collect stop requests;
5.     if all stop requests are received then
6.         return success;
7. return fail;

Member i:
1. Use msg(key_0) to send a start request to the leader;
2. if start_i is successfully received by msg(key_i) then
3.     create_process(msg_snd);
4.     create_process(msg_rcv);
5.     if msg_snd and msg_rcv are finished then
6.         use msg(key_(N+1)) to send a stop request to the leader;

Process msg_snd:
1. for n = 1 to ROUNDS do
2.     use msg(key_0) to receive req;
3.     build mtext (|mtext| = req.msgsz);
4.     i := req.src;
5.     use msg(key_i) to send mtext;

Process msg_rcv at member i:
1. for n = 1 to ROUNDS do
2.     req.msgsz := rand() % MSG_SIZE + 1;
3.     req.src := i;
4.     use msg(key_0) to send req;
5.     use msg(key_i) to receive mtext;

1 leader, 1 member, ROUNDS = 1000, MSG_SIZE = 128: time = 65.36134 s
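To make the request/reply pattern of the msg experiment concrete, here is a single-machine simulation: one `queue.Queue` per key stands in for msg(key) (key 0 for requests, key i for replies to member i), threads stand in for the msg_snd and msg_rcv processes, and joining the threads replaces the leader's start/stop messages. It reproduces the message pattern only, not WING's distributed IPC or the timing reported above; `run_msg_benchmark` is a hypothetical name.

```python
import queue
import random
import threading
from typing import Dict


def run_msg_benchmark(n_members: int = 1, rounds: int = 1000,
                      msg_size: int = 128, seed: int = 0) -> Dict[int, int]:
    """Simulate the msg_snd/msg_rcv exchange; return bytes received per member."""
    # Stand-ins for msg(key): key 0 carries requests, key i carries replies to member i.
    msg = {k: queue.Queue() for k in range(n_members + 1)}
    received = {i: 0 for i in range(1, n_members + 1)}
    rng = random.Random(seed)
    rng_lock = threading.Lock()  # the generator is shared by all msg_rcv threads

    def msg_snd() -> None:
        for _ in range(rounds):
            msgsz, src = msg[0].get()   # receive req on key 0
            msg[src].put(b"x" * msgsz)  # send mtext with |mtext| = req.msgsz on key src

    def msg_rcv(i: int) -> None:
        for _ in range(rounds):
            with rng_lock:
                msgsz = rng.randrange(msg_size) + 1  # rand() % MSG_SIZE + 1
            msg[0].put((msgsz, i))                   # send req on key 0
            received[i] += len(msg[i].get())         # receive mtext on key i

    # Each member runs one msg_snd and one msg_rcv process (threads here).
    threads = []
    for i in range(1, n_members + 1):
        threads.append(threading.Thread(target=msg_snd))
        threads.append(threading.Thread(target=msg_rcv, args=(i,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # stands in for the leader collecting stop requests
    return received
```

Every request on key 0 can be served by any msg_snd, but each reply goes to the requester's own key, so with N members the N senders together serve exactly N × ROUNDS requests and every thread terminates.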

Results

• Sem

Leader:
1. Create sem(key_0) (sem(key_0) has N items);
2. Set each item of sem(key_0) to 0;
3. if every item of sem(key_0) is nonzero then
4.     for i = 1 to N do
5.         remove sem(key_i);

Member i:
1. Create sem(key_i);
2. if all sem(key_j) (j ≠ i) are created then
3.     for n = 1 to ROUNDS do
4.         k := rand() % N + 1;
5.         down(sem(key_k));
6.         up(sem(key_k));
7.     endfor
8.     sem(key_0).item_i := 1;

1 leader, 1 member, ROUNDS = 1000: time = 8.810316 s
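A single-machine simulation of the semaphore experiment, assuming each sem(key) for a member starts at 1 (the pseudocode gives no initial values, and down would otherwise block forever). `threading.Semaphore` stands in for sem(key_k), a `threading.Barrier` for the "all sem(key_j) are created" check, and joining the threads replaces the leader's polling of sem(key_0). Names are hypothetical; this shows the access pattern, not WING's performance.

```python
import random
import threading
from typing import Dict, List


def run_sem_benchmark(n_members: int = 1, rounds: int = 1000,
                      seed: int = 0) -> List[int]:
    """Simulate the sem experiment; return the down/up pairs performed on keys 1..N."""
    sems: Dict[int, threading.Semaphore] = {}   # stand-ins for sem(key_i), i = 1..N
    items = [0] * (n_members + 1)               # sem(key_0).item_i (index 0 unused)
    hits = [0] * (n_members + 1)                # down/up pairs performed on each key
    all_created = threading.Barrier(n_members)  # "all sem(key_j) are created"
    lock = threading.Lock()

    def member(i: int) -> None:
        rng = random.Random(seed + i)           # per-member stream, reproducible
        with lock:
            sems[i] = threading.Semaphore(1)    # assumption: initial value 1
        all_created.wait()
        for _ in range(rounds):
            k = rng.randrange(n_members) + 1    # rand() % N + 1
            sems[k].acquire()                   # down(sem(key_k))
            hits[k] += 1                        # safe: sem(key_k) admits one holder
            sems[k].release()                   # up(sem(key_k))
        items[i] = 1                            # sem(key_0).item_i := 1

    threads = [threading.Thread(target=member, args=(i,))
               for i in range(1, n_members + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Leader: once every item of sem(key_0) is nonzero, remove each sem(key_i).
    if all(items[1:]):
        for i in range(1, n_members + 1):
            del sems[i]
    return hits[1:]
```

Giving each member its own seeded generator makes the per-key counts independent of thread scheduling, which keeps the simulation reproducible.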

Conclusions

• Through operating-system virtualization, WING gives processes on different nodes a consistent view of distributed resources.

• No additional libraries are required.
• Conventional multitasking applications can be used for distributed computing.

Future work

• Components
  – WMAN: process manager
  – LDFS: lightweight distributed file system

• Tools
  – Profiler
  – Test suites

• Stability
• Security

Thanks

Q&A