
CS550

• Advanced Operating Systems (Distributed Operating Systems)
• Instructor: Xian-He Sun
  – Email: [email protected], Phone: (312) 567-5260
  – Office hours: 1:30pm-2:30pm Tuesday, Thursday at SB229C, or by appointment
• TA: TBA
  – Email: [email protected]
  – Office: xxx
  – Office hours: TBA
• Blackboard:
  – http://blackboard.iit.edu
• Class Web site
  – http://www.cs.iit.edu/~sun/cs550.html

X.Sun (IIT) CS550: Advanced OS Lecture 1

Outline

• Course information
• Key issues of distributed operating systems
• Hardware concepts
  – Multiprocessors
  – Multicomputers
  – Distributed systems
• Software concepts
  – Uniprocessor OS
  – Distributed OS
  – Network OS
  – Middleware

What This Course is About

• Understanding the fundamental concepts of distributed operating systems, and distributed systems in general
• Learning distributed programming techniques
  – Multithreading, RPC, RMI, Sockets, MPI, etc.
• Understanding the general principles of distributed paradigms
  – MPI, JINI, NFS, Web Service, Grid, etc.

Prerequisite

• CS450 “Operating Systems”
• Familiar with
  – Programming in C/C++ or Java
  – UNIX tools and development environment
    • Command
    • Editors (vi, emacs), compilers (gcc), makefiles (GNU make)
  – Network programming
    • Sockets
    • Multithreading
    • RPC, Java RMI
  – Basic concepts of computer architecture

Course Materials

• Required:
  – “Distributed Systems: Principles and Paradigms (2nd edition)” by Tanenbaum and Van Steen, Pearson/Prentice Hall 2007
• Recommended:
  – “Distributed Systems: Concepts and Design (4th edition)” by George Coulouris, Jean Dollimore, and Tim Kindberg, Addison-Wesley, 2005
• Supplemental readings
  – “Virtual Machines: Versatile Platforms for Systems and Processes” by Jim Smith and Ravi Nair, Morgan Kaufmann, 2005

Misc. Course Details

• You are expected to attend all of the lectures and presentations
• Grading
  – Written and programming assignments (35%): individual work
  – One exam (35%)
  – Final project (30%): individual or group with 2-3 students
• Use the course blackboard
  – Announcements
  – Lecture notes
  – Assignments
  – Discussion
  – …

Policies

• Collaboration
  – Encouraged for high-level concepts and understanding the course materials
  – but …..
• Cheating
  – Copying all or part of another student's homework
  – Allowing another student to copy all or part of your homework
  – Copying all or part of code found in a book, magazine, the Internet, or other resource

Any Questions?

Personal Introduction

• Research interests
  – Middleware
  – High End Computing
  – Performance Analysis and Modeling
• Research group:
  – Scalable Computing Software Laboratory (SCS)
  – http://www.cs.iit.edu/~scs/
  – Weekly Research seminar

Distributed Computing at SCS

Many workstations are made available for graduate students

Scalable Computing Software (SCS) Lab.

[Figure: computing environments at SCS. Labels: I-WIRE Distributed Optical Testbed (Grid) with sites NU-E, UIC, ANL, NCSA/UIUC, U of C, NU-C, Star Tap, and IIT; Parallel Computers at SCS (OMNI, I-WIRE); Pervasive Computing Environments at SCS]

Scalable Systems

• A computer system is called scalable if it can scale up to accommodate ever-increasing performance and functionality demands
• Software is called scalable if it can maintain its functionality and efficiency while the underlying computer system and problem scale up

Evolution of Computing

Bigger becomes even bigger
Smaller becomes ever smaller, & connected

Japan’s Earth Simulator
• 640 processor nodes (PNs)
• Each PN is a system with 8 vector-type arithmetic processors (APs)
• Peak performance 40 Tflops
• Dimension labels in the figure: 1.4 m x 1 m x 2 m; approx. 50 m x 65 m x 17 m
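As a rough check on the Earth Simulator figures above, the following minimal sketch derives the total number of arithmetic processors and the implied peak rate per processor. The function and variable names are illustrative assumptions, and the per-processor value is simply the slide's rounded 40 Tflops divided evenly, not a quoted hardware specification.

```python
# Back-of-the-envelope arithmetic from the slide's numbers (names are hypothetical).
def total_processors(nodes: int, procs_per_node: int) -> int:
    """Total arithmetic processors in the machine."""
    return nodes * procs_per_node


def peak_per_processor_gflops(peak_tflops: float, processors: int) -> float:
    """Peak performance per processor in Gflops, assuming an even split."""
    return peak_tflops * 1000.0 / processors


aps = total_processors(640, 8)                    # 640 PNs x 8 APs each
per_ap = peak_per_processor_gflops(40.0, aps)     # 40 Tflops peak from the slide
print(f"{aps} APs, about {per_ap:.2f} Gflops peak per AP")
```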

Scalable Computing: The way to high performance

BlueGene/L
• The fastest supercomputer in November 2007
• Scalable ultra-computer targeted for 106,496 compute nodes
• BlueGene/L is now running with 212,992 processors and 478.2 TFlops on LINPACK
• Peak performance: 596 TFlops
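The gap between sustained LINPACK performance and peak performance is often summarized as an efficiency ratio. The sketch below computes that ratio and the processors-per-node count from the BlueGene/L numbers on this slide; the function names are assumptions made for illustration.

```python
# Efficiency arithmetic using only the numbers quoted on the slide.
def linpack_efficiency(sustained_tflops: float, peak_tflops: float) -> float:
    """Fraction of peak performance achieved on LINPACK."""
    return sustained_tflops / peak_tflops


def processors_per_node(processors: int, nodes: int) -> float:
    """Average number of processors per compute node."""
    return processors / nodes


eff = linpack_efficiency(478.2, 596.0)
ppn = processors_per_node(212_992, 106_496)
print(f"LINPACK efficiency: {eff:.1%}, processors per node: {ppn:.0f}")
```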

Multicore Adds Another Dimension

IBM Multicore
• Cell
  – 1 PPE and 8 SPEs
  – Shared L2 cache
  – EIB
• Power6
  – Dual core
  – 5 GHz

Multi-Core

• Motivation for Multi-Core
  – Exploits improved feature-size and density
  – Increases functional units per chip (spatial efficiency)
  – Limits energy consumption per operation
  – Constrains growth in processor complexity
• Challenges resulting from multi-core
  – Aggravates memory wall
    • Memory bandwidth
      – Way to get data out of memory banks
      – Way to get data into multi-core processor array
    • Memory latency
    • Fragments L3 cache
  – Relies on effective exploitation of multiple-thread parallelism
    • Need for parallel computing model and parallel programming model
  – Pins become strangle point
    • Rate of pin growth projected to slow and flatten
    • Rate of bandwidth per pin (pair) projected to grow slowly
  – Requires mechanisms for efficient inter-processor coordination
    • Synchronization
    • Mutual exclusion
    • Context switching
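To make the coordination bullets concrete, here is a minimal sketch of mutual exclusion and synchronization using Python's standard `threading` module. It is an illustration only: the function name `run_counter` and its parameters are assumptions, and Python threads stand in for the hardware-level mechanisms the slide refers to.

```python
import threading


def run_counter(num_threads: int = 4, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Mutual exclusion: only one thread updates the counter at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Synchronization: wait for every worker to finish before reading the result.
    for t in threads:
        t.join()
    return counter


print(run_counter())
```

Without the lock, the read-modify-write on `counter` could interleave between threads and lose updates; the `join` calls are the synchronization point that makes the final read safe.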

Distributed Computing: What Is New?

• Supercomputers become ever more powerful
• Communities of “Virtual organizations” (VOs) are formed
• No VO possesses all required skills and resources
• From “community sharing” to “information grid”

Integrated VOs: the Grid

Mimic the electrical power grid
• Higher Quality of Service
• Increased Efficiency
• Reduced Complexity & Cost
• Increased Productivity
• Improved Resiliency

The Challenge of Grid Computing

Virtualization and Resource Management
• Discovery: many sources of data, services, computation
• Registries organize services of interest to a community
• Access: data integration activities may require access to, & exploration/analysis of, data at many locations
• Exploration & analysis may involve complex, multi-step workflows
• Resource management is needed to ensure progress & arbitrate competing demands
• Security & policy must underlie access & management decisions

[Figure labels: R, RM, Security service, Policy service]

Cloud Computing

Mimic the electrical power grid
• Higher Quality of Service
• Increased Efficiency
• Reduced Complexity & Cost
• Increased Productivity
• Improved Resiliency

What is Cloud Computing?

• A computing paradigm in which tasks are assigned to a combination of connections, software and services accessed over a network
• The network of servers and connections is collectively known as the cloud
• Other terms
  – Mesh Computing
  – Elastic Cloud Computing
  – Network Computing

2009-1-21

What Are the Differences between Cloud & Grid?

• A commercial version of Grid computing
• More likely under single management (VO)
• More likely to provide computing resources than resource sharing
• More likely to serve lots of modest-size jobs
• Billions of dollars being spent by the likes of Amazon, Google, and Microsoft to create real commercial grids
• The prospect of needing only a credit card to get on-demand access to 100,000+ computers in tens of data centers distributed throughout the world

Cloud: Integrated Resource

Provide virtual computing environments on demand


Virtualization and Virtual Machine

• Virtual service and virtual machine: the key for data center and Cloud computing
• Virtual machine (in distributed environment): A hosting platform where each user can create and operate in a private machine(s), based on Grid/distributed infrastructure, achieving:
  – Virtualization
  – Isolation and Protection
  – Privacy
  – Accountability and QoS
  – On-demand creation and provisioning

Distributed (Dynamic) Virtual Machine

[Figure labels: DVM, DVM’, Virtual service node, DVM Host (physical)]

Virtualization: Key Technique

• Two-level OS structure
  – Host OS
  – Guest OS
• Strong isolation
  – Administration isolation
  – Installation isolation
  – Fault / attack isolation
  – Recovery, migration, and reconfiguration
• Virtual service node
  – DVM Service (DVMS)
  – Guest OS
  – Internetworking enabled

[Figure: One DVM host. Labels: Host OS, Guest OS … Guest OS, DVMS1 … DVMSn]

Embedded Systems: What Is New?

• Devices become smaller and more powerful
• Devices are coordinated via network
• From “autonomous computing” to coordinated “human-centered computing”

Pervasive Computing

[Figure: MIT’s view of pervasive computing]

Evolution of Computing

[Figure: evolution from distributed computing to grid, mobile, and pervasive computing. Labels:]
• Distributed Computing: Remote Comm., Coordination, High Availability, Security & Fault Tolerance
• Grid Computing (Cloud): Federated Communities, Virtualization, Standardization
• Mobile Computing: Mobility, Mobile Network, Adaptive & Reflective, Energy Aware system
• Pervasive Computing: Smart Space, Invisibility, Localized Scalability, Context Awareness, Uneven Conditions
• Pervasive Global Smart Space

Service Oriented Computing

Computing as a service
• Internet computing: Web service
• Grid computing: Grid service, which is merging with WS
• Pervasive computing: Human-centered service
• Mobile computing: Phone service

[Figure: technologies that started far apart in applications & technology have been converging. Labels: Web service, WSRF, WS-I Compliant Technology Stack. Convergence of Core Technology Standards allows Common base for Business and Technology Services]

Future Computing: Human-centered Service

A new IT boom is coming
• Devices become smaller and more powerful
• A device is an entry point to the cyber world
• They are connected to form `smart space’
• Grids link `smart spaces’ to support `global smartness’

The Third Wave of Computing Revolutions

• Network, communication, and interconnectivity
• Began in the late 90s and continues today
• Machine/machine, software/software, people/people
• Anytime, anywhere, WWW
• The communications landscape is shifting
• Promising, but still a work in progress

How do we get there?
  – Many challenges ahead!

Resource Management & Task Scheduling

• DVM provider selection:
  – Among a set of DVM providers, which one should be chosen to host a DVM?
• DVMS selection:
  – Among a set of potential tenants (DVMSes), which ones to host? (for QoS, resource utilization, security…)
• The Grid Harvest Service (GHS) System
  – A long-term application-level performance prediction and task scheduling system for non-dedicated distributed (Grid) environments
  – Reservation-based versus shared resources
  – Good, but more issues, such as QoS and FT, remain to be solved
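The provider-selection question can be illustrated with a deliberately simple sketch: choose the provider with the smallest predicted completion time on a shared (non-dedicated) machine. This is not the GHS prediction model; the `Provider` fields, the formula `work / (speed * (1 - utilization))`, and all names below are hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    speed: float        # work units per second when idle (hypothetical metric)
    utilization: float  # fraction of capacity taken by local jobs, 0 <= u < 1


def predicted_time(provider: Provider, work: float) -> float:
    """Naive estimate: only the capacity left over by local jobs is available."""
    return work / (provider.speed * (1.0 - provider.utilization))


def select_provider(providers: list, work: float) -> Provider:
    """Pick the provider with the smallest predicted completion time."""
    return min(providers, key=lambda p: predicted_time(p, work))


candidates = [
    Provider("fast-but-busy", speed=10.0, utilization=0.5),
    Provider("slower-but-idle", speed=8.0, utilization=0.2),
]
best = select_provider(candidates, work=100.0)
print(best.name, predicted_time(best, 100.0))
```

A faster machine can lose to a slower one once local load is taken into account, which is why prediction matters in non-dedicated environments. QoS, security, and fault tolerance would add further terms that this sketch leaves out.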

Processor-memory performance gap

• Processor performance increases rapidly
  – Uni-processor: ~52% per year until 2004, ~25% per year since then
  – New trend: multi-core/many-core architecture
    • Intel TeraFlops chip, 2007
  – Aggregate processor performance much higher
• Memory: ~9% per year
• Processor-memory speed gap keeps increasing

[Figure: performance (log scale, 1 to 100,000) versus year (1980 to 2010) for uni-processor, multi-core/many-core processor, and memory, with annotated growth rates of 52%, 25%, 20%, 60%, and 9%. Sources: Intel, OCZ]

The Memory-wall problem requires a rethinking of the design of OS
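The widening gap follows from compounding two different annual growth rates. The sketch below uses the slide's approximate rates (52% per year for processors, 9% per year for memory) and assumes, for illustration only, that both start from the same baseline and grow at a constant rate; the function names are hypothetical.

```python
def compound_growth(annual_rate: float, years: float) -> float:
    """Relative performance after `years` of constant annual growth."""
    return (1.0 + annual_rate) ** years


def processor_memory_gap(cpu_rate: float, mem_rate: float, years: float) -> float:
    """Ratio of processor to memory performance, both starting from 1.0."""
    return compound_growth(cpu_rate, years) / compound_growth(mem_rate, years)


for years in (5, 10, 20):
    gap = processor_memory_gap(0.52, 0.09, years)
    print(f"after {years:2d} years the gap is about {gap:,.0f}x")
```

Because the ratio itself grows geometrically, even a modest difference in yearly rates produces a gap of several orders of magnitude over two decades.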

Moore’s Law

Gordon Moore:
“the number of transistors that can be fabricated on a single integrated circuit at a reasonable cost doubles every year…”

• How?
  – Material techniques such as extreme ultraviolet lithography (<100 nm)
• Corollary
  – Processor speed doubles at same rate
• Problem
  – Moore’s law has almost reached its limit
  – Then, what is the impact? Who and which technology will lead?
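The doubling claim can be written as N(t) = N0 * 2^(t/T), where T is the doubling period. The sketch below evaluates that formula and also converts an annual growth rate into an equivalent doubling time; the function names and the starting count of 1,000 transistors are illustrative assumptions, not figures from the lecture.

```python
import math


def transistor_count(initial: float, years: float, doubling_period: float = 1.0) -> float:
    """Count after `years`, doubling once every `doubling_period` years."""
    return initial * 2.0 ** (years / doubling_period)


def doubling_time(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


print(transistor_count(1_000, 10))        # ten doublings of a hypothetical 1,000
print(round(doubling_time(0.52), 2))      # doubling time at 52% growth per year
```

Doubling every year corresponds to 100% annual growth, so a 52% annual rate doubles more slowly than the quoted law.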

Any Questions?