1 Alexandru V Staicu 1, Jacek R. Radzikowski 1 Kris Gaj 1, Nikitas Alexandridis 2, Tarek El-Ghazawi...

Preview:

Citation preview

1

Alexandru V Staicu1, Jacek R. Radzikowski1

Kris Gaj1, Nikitas Alexandridis2, Tarek El-Ghazawi2

1 George Mason University2 George Washington University

Effective Use of Networked Reconfigurable Resources

http://ece.gmu.edu/lucite

2

Problem:

• Reconfigurable resources expensive and underutilized

• Many of these resources available over the network

• It is desirable to leverage networked reconfigurable resources to help other users within the same organization

3

Tasks 1, 2, 3

Task 3

Task 1

Execution Host 1

ExecutionHost 2

Execution Host 3

Master HostSubmission Host

Task 2

Approach: Adapt and use a Job Management System

4

Approach:

• Select the most suitable existing Job Management System (JMS)

• Extend this JMS to recognize and utilize reconfigurable resources

- identify and define functional requirements- rank known systems according to these requirements- identify which JMS is the easiest to extend

- add new dynamic resources- configure scheduling to be based on these new resources

5

Tasks 1, 2, 3

Task 3

Task 1

Execution Host 1

ExecutionHost 2

Execution Host 3

Master Host

Submission Host

Task 2

Networked Reconfigurable Resource Management System

FPGAboards

6

Myrinet SAN/LAN

Switch

WILDFORCE

Dell

WILDSTAR

Dell

SLAAC

Dell

WILDSTAR

Dell

WILDFORCE

Dell Sparc 10

SLAAC Research Reference Platform

Ethernet Intelligent Hub 100

Mbps

Heterogeneous network with FPGA-based accelerators

Dell HP

Sparc 20 DellGateway

SLAAC WILDSTAR

WILDFORCE SLAAC

Ethernet Intelligent Hub 100

Mbps

7

Functional units of a typical Job Management System

jobs & their requirements

UserServer

Job SchedulerResourceMonitor

availableresources

resource requirements

scheduling policies

JobDispatcherresource allocation

and job execution

Resource Manager

8

Classification of Investigated Systems (1)

Centralized JMS

DistributedJMS w/o a Central Scheduler

DistributedOperating

System

• LSF• CODINE• PBS• Condor• RES

• Globus• Legion• NetSolve

• MOSIX

9

ParameterStudy

Scheduler

ResourceMonitor andForecaster

DistributedComputingInterface

• Compaq DCE• AppLES • NWS

Classification of Investigated Systems (2)

10

Operating system, flexibility, user interface

LSF Codine PBS CONDOR RES

Distribution

Source code

OS Support

User Interface

SolarisLinuxTru64NT

GUI &CLI

CLI

com pub pub/com pub gov

GUI &CLI

GUI &CLI

GUI &CLI

11

Scheduling and Resource Management

LSF Codine PBS CONDOR RES

Batch jobs

Interactive jobs

Parallel jobs

Accounting

12

Efficiency and Utilization

LSF Codine PBS CONDOR RES

Stage-in andstage-out

Timesharing

Process migration

Dynamic loadbalancing

Scalability

13

Fault Tolerance and Security

LSF Codine PBS CONDOR RES

Checkpointing

Daemon fault recovery

Authentication

Authorization

14

Documentation and Technical Support

LSF Codine PBS CONDOR RES

Documentation

Technicalsupport

15

JMS features supporting extension to reconfigurable hardware

• capability to define new dynamic resources

• strong support for stage-in and stage-out- configuration bitstreams- executable code- input/output data

• support for Windows NT and Linux

16

Ranking of Centralized Job Management Systems (1)

Capability to define new dynamic resources:

Excellent: LSF, PBS, CODINEMore difficult: CONDOR, RES

Stage-in and stage-out:

Excellent: LSF, PBSLimited: CONDORNo: CODINE, RES

17

Ranking of Centralized Job Management Systems (2)

Overall suitability to extend to reconfigurable hardware:

1. LSF2. CODINE3. PBS4. CONDOR5. RES

without changing the JMS source code

requires changes to the JMS source code

18

Submission host

LIM

Batch API

Master host

MLIM

MBD

Execution host

SBD

Child SBD

LIM

RES

User job

Extension of LSF to reconfigurable hardware (1)Operation of LSF

LIM – Load Information ManagerMLIM – Master LIMMBD – Master Batch DaemonSBD – Slave Batch DaemonRES – Remote Execution Server

queue1

2

3

45

6 7

89

10

11

12

13

Loadinformation

otherhosts

otherhosts

bsub app

19

Extension of LSF to reconfigurable hardware(2)

Submission host

LIM

Batch API

Master host

MLIM

MBD

Execution host

SBD

Child SBD

LIM

RES

User job

ELIM – External Load Information ManagerACS API – Adaptive Computing Systems API

queue1

2

3

45

6 7

89

10

11

12

13

Loadinformation

otherhosts

otherhosts

bsub app

ELIM

ACS API

14FPGAboard

Statusof theboard

20

Conclusions (1)

• 12 systems evaluated using 25 functional requirements + the suitability of extension to support reconfigurable hardware

• LSF, CODINE, PBS, and Condor ranked the highest in the functional requirements

• LSF, CODINE, and PBSPro found easy to extend without changes in their source codes

• LSF most suitable to support reconfigurable hardware

21

• General software architecture of the extended system developed

• Experimental developments, verification and performance evaluation of the extended system in progress

Conclusions (2)

Recommended