
Execution Environments for Distributed Computing

Part 0. Course Introduction

EEDC

34330

Master in Computer Architecture, Networks and Systems - CANS

Content

Part 0. Course Introduction

0.1. Welcome

0.2. Course Organization

0.3. Course Content and Motivation

0.4. Students’ Presentation Warm-up

0.5. Conferences and Journals in Computer Science: How Do They Work?

0.1 Welcome

Professor Background (Who am I?)

▪ Who am I?
 – Current teaching activity:
   • System Performance Evaluation (FIB)
   • Execution Environments for Distributed Computing (CANS)
 – Research interests: IT resource management
   • Cloud Computing, Green Computing, Big Data…
 – Research groups:
   • Professor in the High Performance Computing Group at DAC (UPC)
   • Manager of the Autonomic Systems and eBusiness Platforms research group at BSC
 – Other interests:
   • I enjoy talking about technology and mountain biking
 – For more details: www.JordiTorres.org

Welcome

And you?

0.2 Course Motivation & Organization

Syllabus Stuff (official course description)

▪ Official course guide (“Guía docente oficial”, in Spanish): http://docencia.ac.upc.edu/master//es/course.24.html

▪ 2006 objectives (“obsolete”): The aim of the course is to offer an overview of developments and research in execution environments for parallel and distributed systems. The course places special emphasis on the environments required by new e-business and grid applications. It studies the internal organization and the resource management required to guarantee functionality and quality of service.

Syllabus Stuff (official course description)

▪ Methodology: This is an advanced course. Basic concepts are assumed to be known from previous courses, and the course focuses mainly on introducing the student to more advanced concepts. Lectures will introduce the topics, which will then be worked on through recent papers published in specialized conferences or journals and through proposals for projects under development. Students must read and discuss these papers. Each student will carry out individual work and present a research topic related to the course. Students will have to develop their critical ability to evaluate how well the presented solutions fit the problem or, where appropriate, to propose their own ideas.

Syllabus Stuff (official course description)

▪ The 150 hours of work are distributed as follows:
 – 60 hours of in-class sessions (4 hours per week, 15 weeks), divided into:
   • lectures,
   • assignments,
   • case studies and discussion of research papers,
   • presentation of work.
 – 90 hours of independent work:
   • reading research papers,
   • carrying out assignments,
   • preparing presentations,
   • and study.

Where to find the information

▪ Teacher contact:
 – Office: Campus Nord, Block C6, office 217
 – Phone: +34 93 401 7223
 – Email: [email protected]

▪ Course slides website: http://www.jorditorres.org/teaching/eedc-2011-execution-environments-for-distributed-computing/eedc2012-slides/

▪ Students’ EEDC website: http://www.jorditorres.org/news/

▪ Official website for personal and confidential information: “RACÓ” at http://www.fib.upc.edu

Students EEDC web site

EEDC as a network of contacts (LinkedIn)

This course provides

▪ An overview of the broad scope of this area, introducing past and current research with a focus on both conceptual and practical aspects.

▪ The course also aims to introduce the student to research. In general, the EEDC course focuses on developing skills rather than on content.

▪ For this reason, each year the course focuses on some of the dichotomies that arise when comparing the strategies available for concrete problems in the wider EEDC space, in relation to a problem of particular current relevance.

Content

▪ Paper readings are assigned for some of the classes.
▪ The course has two main reading parts:
 – one for general papers (and homework), and
 – one for more in-depth research in the area (related to the research project).

▪ All students taking the course are required to complete a research project (or “state-of-the-field” review).

▪ Discussions will be led by one or more students and may include brief presentations.

▪ In addition, students will be introduced to research information resources.

▪ We also expect to have seminars given by IT companies.

This year’s special focus

▪ Current execution environments for distributed systems:

The explosion of Cloud Computing, driven by the boom in available information (Big Data), all while remaining sustainable (Green Computing).

Syllabus Stuff

▪ “State-of-the-field” review
 – All students taking the course are required to complete a research project (or “state-of-the-field” review). The project is intended to give students an opportunity to gain experience with research on a topic related to the course content. In addition, students will be introduced to research information resources.

Tentative Grading Policy (*)

▪ Paper readings/presentations (and homework): 35%
 – Delivery: 20%
 – Content: 10%
 – Public presentation: 5%

▪ Research project: 30%
 – Research survey content and writing: 20%
 – Presentation of the workshop paper: 10%

▪ Participation: 35%
 – Class/seminar participation (in paper discussions): 10%
 – Class attendance: 25%

(*) Pending the final number of enrolled students; the default course organization may have to be changed.
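The subcomponent weights listed above add up to 100%, so a final mark can be computed as a weighted average of the individual items. The sketch below assumes each item is scored on a 0 to 10 scale; the dictionary keys, the scale, and the `final_mark` helper are hypothetical illustrations, not the course's official grading procedure.

```python
# Hypothetical sketch: subcomponent weights (in percent) as listed on the slide.
WEIGHTS = {
    "reading_delivery": 20,
    "reading_content": 10,
    "reading_public_presentation": 5,
    "project_survey": 20,
    "project_workshop_presentation": 10,
    "participation_discussion": 10,
    "attendance": 25,
}


def final_mark(scores: dict[str, float]) -> float:
    """Weighted final mark on a 0-10 scale, given per-item scores (each 0-10)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 100 when the weights are consistent
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS) / total_weight


if __name__ == "__main__":
    # Example: perfect attendance, nothing else delivered.
    scores = {item: 0.0 for item in WEIGHTS}
    scores["attendance"] = 10.0
    print(final_mark(scores))
```

Checking that the weights sum to 100 is a cheap way to catch inconsistencies between the per-block totals and their subcomponents.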

GROUPS (*)

▪ Paper reading/presentation: 8 groups × 3 members

▪ Research project group: 1 or 2 students

(*) Pending the final number of enrolled students; the default course organization may have to be changed.

0.3 Course Content

Today’s landscape

▪ Today’s applications:
 – A heterogeneous set of applications with different characteristics and service goals:
   • Characteristics: CPU-, I/O-, network-, memory-intensive…
   • Service goals: response time, throughput, fault tolerance, deadlines…

[Figure: transactional, indexing, streaming, workflow, and Big Data workloads sharing the same hardware]

Today’s landscape

▪ Types of applications
 – This set of applications includes:
   • Interactive web workloads
   • Non-interactive workloads, such as document indexing or data-intensive jobs
   • Scientific applications, ranging from single-threaded CPU-intensive programs to multithreaded applications
   • …
 – Current workloads consist of a heterogeneous set of applications that deliver critical services to their customers under service level agreements (SLAs).

Today’s landscape

▪ Types of applications
 – The SLAs for different applications tend to be based on different characteristics:
   • Performance goals:
     – For interactive workloads: response time, or …
     – For non-interactive workloads: for example, completion time.
   • Management time scale:
     – Non-interactive workloads typically require computation over an extended period of time.
     – Interactive workloads consist of short individual requests and require short control cycles.
   • Client priority: e.g. Gold, Silver, Platinum, Standard.
   • … and others
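The SLA dimensions above (performance goal, time scale, client priority) can be captured in a small data structure. The following is an illustrative sketch only: the `SLA` and `Priority` names, the 95th-percentile rule for interactive requests, and the numeric ordering of the priority classes are assumptions, not part of the course material.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    # Hypothetical client classes; a lower value means a higher priority.
    PLATINUM = 0
    GOLD = 1
    SILVER = 2
    STANDARD = 3


@dataclass
class SLA:
    interactive: bool      # interactive (per-request) vs non-interactive (batch job)
    target_seconds: float  # response-time target, or completion-time target
    priority: Priority


def meets_sla(sla: SLA, observed_seconds: list[float], percentile: float = 0.95) -> bool:
    """Return True if the observed times satisfy the SLA target."""
    if not observed_seconds:
        raise ValueError("no observations")
    if sla.interactive:
        # Interactive: the nearest-rank percentile of the per-request
        # response times must stay within the target.
        ordered = sorted(observed_seconds)
        rank = max(1, math.ceil(percentile * len(ordered)))
        return ordered[rank - 1] <= sla.target_seconds
    # Non-interactive: observed_seconds holds the finishing times of the
    # job's tasks; the job completes when its slowest task does.
    return max(observed_seconds) <= sla.target_seconds


def by_priority(slas: list[SLA]) -> list[SLA]:
    # Serve higher-priority clients first when resources are scarce.
    return sorted(slas, key=lambda s: s.priority.value)


if __name__ == "__main__":
    web = SLA(interactive=True, target_seconds=0.5, priority=Priority.GOLD)
    batch = SLA(interactive=False, target_seconds=3600.0, priority=Priority.STANDARD)
    print(meets_sla(web, [0.12, 0.20, 0.31]), meets_sla(batch, [4000.0]))
```

The two branches mirror the slide's distinction: interactive SLAs are judged per request on short time scales, while batch SLAs are judged once, on total completion time.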

Example (commercial data centers)

▪ Current practice:
 – Resource provisioning: the finest granularity is a whole node
 – Off-peak periods: wasted computing power
 – Sets of applications: time partitioning (if possible!)
   • Transactional workloads during working hours
   • Batch workloads at night

▪ Challenges:
 – Reduce the waste of resources
 – Increase provisioning granularity
 – Manage mixed workloads
 – Guarantee service levels
 – Exploit resources better
 – …

[Figure: example web workload, showing workload intensity over Day 1, Day 2, and Day 3]
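To see why node-level granularity wastes capacity off-peak, the sketch below compares provisioned-but-unused capacity under three policies. The demand profile is hypothetical, and the model is deliberately simple: it does not represent filling idle capacity with batch jobs, only how much capacity sits idle.

```python
import math


def wasted_node_hours(demand: list[float], granularity: float, static: bool) -> float:
    """Provisioned-but-unused capacity, in node-hours.

    demand: hourly demand in (possibly fractional) nodes.
    granularity: smallest allocatable unit; 1.0 means whole nodes only.
    static: True provisions once for the peak hour;
            False re-provisions every hour to the rounded-up demand.
    """
    def round_up(x: float) -> float:
        # Allocate whole multiples of the provisioning unit.
        return math.ceil(x / granularity) * granularity

    if static:
        capacity = [round_up(max(demand))] * len(demand)
    else:
        capacity = [round_up(d) for d in demand]
    return sum(c - d for c, d in zip(capacity, demand))


if __name__ == "__main__":
    # Hypothetical four-hour demand profile with a daytime peak.
    demand = [1.5, 4.0, 2.25, 0.5]
    print(wasted_node_hours(demand, 1.0, static=True))    # peak-sized, whole nodes
    print(wasted_node_hours(demand, 1.0, static=False))   # hourly, whole nodes
    print(wasted_node_hours(demand, 0.25, static=False))  # hourly, quarter nodes
```

With this made-up profile, sizing for the peak wastes 7.75 node-hours, re-provisioning whole nodes every hour wastes 1.75, and quarter-node granularity wastes none, which illustrates the "increase provisioning granularity" challenge.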

Execution Environment

▪ To execute all these kinds of applications efficiently, a new execution environment is needed.

[Figure: transactional, indexing, streaming, grid, and Big Data workloads running in a shared execution environment]

How do these applications access resources?

[Figure: transactional, indexing, streaming, grid, and Big Data workloads accessing shared resources]

▪ Solution: a software layer that abstracts away the details of a heterogeneous distributed computing environment.

Execution Environment or Middleware?

Middleware is a somewhat overloaded term!

▪ For this course:
 – Generally speaking, middleware provides software services for application programs, INCLUDING the basic operating system and networking services above the resources.

[Figure: layered view, from top to bottom: user application; software services; operating system and networking services; resources]
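As a small-scale analogy (not a distributed middleware), Python's standard `concurrent.futures` module shows the idea of a software layer that hides how work is executed: application code written against the abstract `Executor` interface runs unchanged on a thread pool or a process pool. The `cpu_task` and `run_workload` functions are hypothetical placeholders for application code.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def cpu_task(n: int) -> int:
    # Stand-in for an application task: sum of squares of 0..n-1.
    return sum(i * i for i in range(n))


def run_workload(executor: Executor, sizes: list[int]) -> list[int]:
    # The application only sees the Executor interface; whether the tasks
    # run on threads or in separate processes is decided by the layer below.
    return list(executor.map(cpu_task, sizes))


if __name__ == "__main__":
    sizes = [10, 100, 1000]
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(run_workload(pool, sizes))
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(run_workload(pool, sizes))
```

Swapping the executor changes where and how the tasks run without touching `run_workload`, which is the property the slide attributes to middleware, here on a single machine rather than across a distributed environment.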

▪ Who is responsible for managing resources?

[Figure: different layers of the data center infrastructure (Applications, System, Hardware), with the middleware stack: Web Services, Application Server, Web Server, Java Virtual Machine, Operating System, Virtualization layer]

EEDC Content (2012)

▪ Part 0: Introduction
▪ Part 1: Distributed Computing Scenario
▪ Part 2: Current Trends in EEDC: Cloud Computing as a Core Part
▪ Part 3: Scientific EEDC
▪ Part 4: Energy Challenges in Today’s EEDC: Green Computing
▪ Part 5: Big Data Challenges in EEDC
▪ Part 6: Other Challenges in Today’s EEDC
▪ Part 7: EEDC Open Seminars
▪ Part 8: Fourth EEDC Workshop

EEDC Workshop series

EEDC open seminars

▪ A series of seminars presenting the view of the IT industry and IT entrepreneurs (invited speakers).

▪ Aimed at stimulating intellectual conversations about real cases related to the course,

▪ as well as offering an opportunity to learn more about each “case study” presented through a Q&A session.

▪ The seminars are open to the FIB community.

EEDC open seminars

▪ Tentative 2012 EEDC Open Seminar case study areas:
 – “Infrastructure as a Service”
 – “Software as a Service”
 – “Green Computing”
 – “Open Data”
 – “Big Data”

Tentative schedule

0.4 Students’ Presentation Warm-up

Procedure for delivering homework

1. Read the assigned documentation/paper
 – Each of you has to read the document/paper
 – Meet with your team to discuss it and do the homework

2. Build the presentation
 – Follow the specifications for the corresponding homework
 – Be sure to follow the EEDC template
 – Be sure to include your names on the first slide

3. Upload the presentation to SlideShare (or similar)
 – Create an account at http://www.slideshare.net if necessary

4. Link your presentation from the EEDC students website
 – You will have an account at www.JordiTorres.org/news
 – Find the correct post and edit it to link the presentation

5. Check that everything is OK before the deadline

Procedure for presenting homework

1. Prepare the presentation and bring it to class on your «pendrive» (USB stick) or similar
 – Optionally, you can use the uploaded presentation

2. Only two (*) groups will present the paper/homework
 – The groups will be chosen at random

3. There will be a Q&A session after the presentations
 – This counts as class participation towards the final mark

4. After the presentations, the audience will choose the best presentation (*)

(*) In some cases only one group will present.

First homework: Warm-up

▪ Project:
 – Prepare a presentation on an easy topic: «Distributed Systems»

▪ Characteristics:
 – A presentation of approximately 15 minutes

▪ Sources of information?
 – For this homework you can use any information on the Internet (e.g. Wikipedia)
 – Where can we find more? (next day!)

▪ Delivery deadline:
 – Thursday 23rd February at 11:00 am

▪ Presentation:
 – Thursday 23rd February at 12:00 noon (classroom)

0.5 Conferences & Journals

Practical view

▪ Sources of research information
 – Patents
 – Journals
 – Technical magazines
 – Conferences
 – Workshops
 – Others

▪ Why prefer conferences?
▪ Why prefer journals?
▪ Other important issues
▪ Case studies

Patents

Google Scholar

Conferences vs Journals

▪ Conferences have higher status in computer systems
 – Note that, in computer systems, the top conferences are more important than even the top journals
 – (the best researchers prefer to send their papers to conferences rather than to journals)

▪ Conferences have higher quality in computer systems
 – The top conferences use a rigorous review process in which 3-7 program committee members evaluate each submitted paper
 – Furthermore, these conferences often "shepherd" accepted papers, i.e. program committee members supervise the revision of accepted papers according to the reviewers' comments
 – The top conferences in computer systems typically accept 10%-20% of submitted papers (e.g. EuroSys: 27/178 ≈ 15%)

Conferences vs Journals

▪ In most scientific fields, journals have higher standards than conferences; computer science is a rare exception.

▪ Journals may have longer page limits. A journal paper can recap or give an overview of an entire research area.

▪ The journal version of a publication tends to be cited more than the conference version, because it has a later date and thus seems more authoritative.

Journal Citation Report

IEEE Journals & Magazines

ACM Journals and Magazines

Conference CFP

Case Study: Middleware 2011

▪ Call for Papers
▪ Call for Posters
▪ Call for Workshop Proposals
▪ Call for Industrial Track Papers
▪ Program
▪ Accepted Papers
▪ Accepted Poster Papers
▪ Workshops
▪ Tutorials
▪ Doctoral Symposium
▪ Keynotes
▪ Important Dates
▪ Organization
▪ Venue

The other side: E-Energy conference

▪ http://events.networks.imdea.org/content/e-energy-2012/home

Case study: One big conference