
Spring 2012
Master of Computer Application (MCA) – Semester V
MC0085 – Advanced Operating Systems (Distributed Systems) – 4 Credits
(Book ID: B 0967)
Assignment Set – 1

1. Describe the following: o Distributed Computing Systems

o Distributed Computing System Models

Answer: Distributed Computing Systems

Over the past two decades, advancements in microelectronic technology have resulted in the availability of fast, inexpensive processors, and advancements in communication technology have resulted in the availability of cost-effective and highly efficient computer networks. The advancements in these two technologies favour the use of interconnected, multiple processors in place of a single, high-speed processor.

Computer architectures consisting of interconnected, multiple processors are basically of two types:

In tightly coupled systems, there is a single system-wide primary memory (address space) that is shared by all the processors (Fig. 1.1). If any processor writes, for example, the value 100 to the memory location x, any other processor subsequently reading from location x will get the value 100. Therefore, in these systems, any communication between the processors usually takes place through the shared memory.

In loosely coupled systems, the processors do not share memory, and each processor has its own local memory (Fig. 1.2). If a processor writes the value 100 to the memory location x, this write operation will only change the contents of its local memory and will not affect the contents of the memory of any other processor. Hence, if another processor reads the memory location x, it will get whatever value was there before in that location of its own local memory. In these systems, all physical communication between the processors is done by passing messages across the network that interconnects the processors.

Usually, tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems. In contrast to the tightly coupled systems, the processors of distributed computing systems can be located far from each other to cover a wider geographical area. Furthermore, in tightly coupled systems, the number of processors that can be usefully deployed is usually small and limited by the bandwidth of the shared memory. This is not the case with distributed computing systems, which are more freely expandable and can have an almost unlimited number of processors.

Hence, a distributed computing system is basically a collection of processors interconnected by a communication network in which each processor has its own local memory and other peripherals, and the communication between any two processors of the system takes place by message passing over the communication network. For a particular processor, its own resources are local, whereas the other processors and their resources are remote. Together, a processor and its resources are usually referred to as a node or site or machine of the distributed computing system.

Distributed Computing System Models

Distributed computing system models can be broadly classified into the following five categories:

Minicomputer model

Workstation model

Workstation – server model

Processor – pool model

Hybrid model

Minicomputer Model:-

The minicomputer model (Fig. 1.3) is a simple extension of the centralized time-sharing system. A distributed computing system based on this model consists of a few minicomputers (they may be large supercomputers as well) interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it. For this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one onto which the user is currently logged.

The minicomputer model may be used when resource sharing (such as sharing of information databases of different types, with each type of database located on a different machine) with remote users is desired.

The early ARPAnet is an example of a distributed computing system based on the minicomputer model.

Workstation Model:-

A distributed computing system based on the workstation model (Fig. 1.4) consists of several workstations interconnected by a communication network. An organization may have several workstations located throughout a building or campus, each workstation equipped with its own disk and serving as a single-user computer. It has often been found that in such an environment, at any one time a significant proportion of the workstations are idle (not being used), resulting in the waste of large amounts of CPU time. Therefore, the idea of the workstation model is to interconnect all these workstations by a high-speed LAN so that idle workstations may be used to process jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstations to get their jobs processed efficiently.

In this model, a user logs onto one of the workstations, called his or her "home" workstation, and submits jobs for execution. When the system finds that the user's workstation does not have sufficient processing power for executing the processes of the submitted jobs efficiently, it transfers one or more of the processes from the user's workstation to some other workstation that is currently idle, gets the processes executed there, and finally returns the result of execution to the user's workstation.

Workstation – Server Model:-

The workstation model is a network of personal workstations, each with its own disk and a local file system. A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the proliferation of high-speed networks, diskless workstations have become more popular in network environments than diskful workstations, making the workstation-server model more popular than the workstation model for building distributed computing systems.

A distributed computing system based on the workstation-server model (Fig. 1.5) consists of a few minicomputers and several workstations (most of which are diskless, but a few of which may be diskful) interconnected by a communication network.

In this model, a user logs onto a workstation called his or her home workstation. Normal computation activities required by the user's processes are performed at the user's home workstation, but requests for services provided by special servers (such as a file server or a database server) are sent to a server providing that type of service, which performs the user's requested activity and returns the result of request processing to the user's workstation.

For better overall system performance, the local disk of a diskful workstation is normally used for such purposes as storage of temporary files, storage of unshared files, storage of shared files that are rarely changed, paging activity in virtual-memory management, and caching of remotely accessed data.

Processor – Pool Model:-

The processor-pool model is based on the observation that most of the time a user does not need any computing power, but once in a while the user may need a very large amount of computing power for a short time (e.g., when recompiling a program consisting of a large number of files after changing a basic shared declaration). Therefore, unlike the workstation-server model, in which a processor is allocated to each user, in the processor-pool model the processors are pooled together to be shared by the users as needed. The pool of processors consists of a large number of microcomputers and minicomputers attached to the network. Each processor in the pool has its own memory to load and run a system program or an application program of the distributed computing system.

In the pure processor-pool model (Fig. 1.6), the processors in the pool have no terminals attached directly to them, and users access the system from terminals that are attached to the network via special devices. These terminals are either small diskless workstations or graphic terminals, such as X terminals. A special server (called a run server) manages and allocates the processors in the pool to different users on a demand basis. When a user submits a job for computation, an appropriate number of processors are temporarily assigned to his or her job by the run server. For example, if the user's computation job is the compilation of a program having n segments, in which each of the segments can be compiled independently to produce separate relocatable object files, n processors from the pool can be allocated to this job to compile all the n segments in parallel. When the computation is completed, the processors are returned to the pool for use by other users.

In the processor-pool model there is no concept of a home machine. That is, a user does not log onto a particular machine but onto the system as a whole. This is in contrast to the other models, in which each user has a home machine (e.g., a workstation or minicomputer) onto which he or she logs on and runs most of his or her programs by default.

Amoeba, proposed by Mullender et al. in 1990, is an example of a distributed computing system based on the processor-pool model.

Hybrid Model:-

Out of the four models described above, the workstation-server model is the most widely used model for building distributed computing systems. This is because a large number of computer users only perform simple interactive tasks such as editing jobs, sending electronic mail, and executing small programs. The workstation-server model is ideal for such simple usage. However, in a working environment that has groups of users who often perform jobs needing massive computation, the processor-pool model is more attractive and suitable.

To combine the advantages of both the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. In addition to efficient execution of computation-intensive jobs, the hybrid model gives guaranteed response to interactive jobs by allowing them to be processed on the local workstations of the users. However, the hybrid model is more expensive to implement than the workstation-server model or the processor-pool model.

2. Describe the following with respect to Remote Procedure Calls: o The RPC Model

o STUB Generation

Answer: The RPC Model

The RPC mechanism is an extension of the normal procedure call mechanism. It enables a call to be made to a procedure that does not reside in the address space of the calling process. The called procedure may be on a remote machine or on the same machine. The caller and the callee have separate address spaces, so the called procedure has no access to the caller's environment.

Implementation of RPC Mechanism

To achieve the goal of semantic transparency, the implementation of RPC is based on the concept of stubs. Stubs provide a perfectly normal local procedure call abstraction and conceal from programs the interface to the underlying RPC system. A separate stub procedure is associated with each of the client side and the server side. To hide the existence and functional details of the underlying network, an RPC communication package (called RPC Runtime) is used on both the client and server sides.

Thus implementation of an RPC mechanism involves the following five elements:

1. The Client

2. The Client stub

3. The RPC Runtime

4. The server stub, and

5. The server

The job of each of these elements is described below:


1. Client: To invoke a remote procedure, the client makes a perfectly normal local call that invokes the corresponding procedure in the client stub.

2. Client Stub: The client stub is responsible for performing the following two tasks:

On receipt of a call request from the client, it packs a specification of the target procedure and the arguments into a message and asks the local runtime system to send it to the server stub.

On receipt of the result of procedure execution, it unpacks the result and passes it to the client.

3. RPC Runtime: The RPC Runtime handles the transmission of messages across the network between the client and server machines. It is responsible for retransmissions, acknowledgements, and encryption.

On the client side, it receives the call request from the client stub and sends it to the server machine. It also receives the reply message (the result of procedure execution) from the server machine and passes it to the client stub.

On the server side, it receives the result of procedure execution from the server stub and sends it to the client machine. It also receives the request message from the client machine and passes it to the server stub.

4. Server Stub: The functions of the server stub are similar to those of the client stub. It performs the following two tasks:

It unpacks the call request message received from the local RPC Runtime and makes a perfectly normal local call to invoke the appropriate procedure in the server.

It packs the result of the procedure execution received from the server and asks the local RPC Runtime to send it to the client stub.

5. Server: On receiving a call request from the server stub, the server executes the appropriate procedure and returns the result to the server stub.

STUB Generation

The stubs can be generated in the following two ways:

Manual Stub Generation: The RPC implementer provides a set of translation functions from which a user can construct his or her own stubs. This approach is simple to implement and can handle complex parameter types.

Automatic Stub Generation: This is the most commonly used technique for stub generation. It uses an Interface Definition Language (IDL) for defining the interface between the client and the server. An interface definition is mainly a list of procedure names supported by the interface, together with the types of their arguments and results, which helps the client and server to perform compile-time type checking and generate appropriate calling sequences. An interface definition also contains information to indicate whether each argument is an input, an output, or both; this helps avoid unnecessary copying, since only input arguments need to be copied from client to server and only output results need to be copied from server to client. It also contains information about type definitions, enumerated types, and defined constants, so the clients do not have to store this information.

A server program that implements the procedures in an interface is said to export the interface, and a client program that calls those procedures is said to import the interface. When writing a distributed application, a programmer first writes the interface definition using the IDL, and can then write a server program that exports the interface and a client program that imports the interface. The interface definition is processed using an IDL compiler (the IDL compiler in Sun RPC is called rpcgen) to generate components that can be combined with both the client and server programs, without making changes to the existing compilers. In particular, an IDL compiler generates a client stub procedure and a server stub procedure for each procedure in the interface, and generates the appropriate marshaling and unmarshaling operations in each stub procedure. It also generates a header file that supports the data types in the interface definition, to be included in the source files of both the client and the server. The client stubs are compiled and linked with the client program, and the server stubs are compiled and linked with the server program.
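To make the five RPC elements concrete, the following is a minimal sketch of hand-written stubs (the "manual stub generation" case, not the rpcgen-generated code described above). A hypothetical "add" procedure is assumed, JSON stands in for the real marshaling format, and a plain TCP socket plays the role of the RPC Runtime transport.

```python
# Illustrative sketch of manually written RPC stubs: a hypothetical "add"
# procedure is exposed by a server stub, and the client stub hides the
# marshaling and socket I/O behind an ordinary local call.
import json
import socket

HOST, PORT = "127.0.0.1", 9000   # assumed endpoint for this example


# ---- client side ----
def add(a, b):
    """Client stub: packs the procedure name and arguments, sends the request,
    waits for the reply, unpacks the result, and returns it to the caller."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode()
    with socket.create_connection((HOST, PORT)) as sock:
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)          # signal end of the request message
        reply = sock.makefile("rb").read()     # wait for the reply message
    return json.loads(reply)["result"]


# ---- server side ----
def _add_impl(a, b):
    """The actual remote procedure implemented by the server."""
    return a + b

PROCEDURES = {"add": _add_impl}                # dispatch table used by the server stub

def serve_one_request():
    """Server stub + dispatcher: unpacks one request, makes a local call to the
    appropriate procedure, packs the result, and sends it back to the client stub."""
    with socket.create_server((HOST, PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            request = json.loads(conn.makefile("rb").read())
            result = PROCEDURES[request["proc"]](*request["args"])
            conn.sendall(json.dumps({"result": result}).encode())
```

Run serve_one_request() in one process (or thread) and then call add(2, 3) from another; the caller never sees the message exchange, which is exactly the transparency the stubs are meant to provide.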

3. Describe the following: o Distributed Shared Memory Systems (DSM)

o DSM – Design & Implementation issues

Answer: Distributed Shared Memory Systems (DSM)

This is also called DSVM (Distributed Shared Virtual Memory). It is a loosely coupled distributed-memory system that implements a software layer on top of the message passing system to provide a shared memory abstraction for the programmers. The software layer can be implemented in the OS kernel or in runtime library routines with proper kernel support. It is an abstraction that integrates the local memory of different machines in a network environment into a single logical entity shared by cooperating processes executing on multiple sites. The shared memory exists only virtually.

DSM Systems: A comparison with message passing and tightly coupled multiprocessor systems

DSM provides a simpler abstraction than the message passing model. It relieves the programmer of the burden of explicitly using communication primitives in programs.

In message passing systems, passing complex data structures between two different processes is difficult. Moreover, passing data structures containing pointers is generally expensive in the message passing model.

Distributed shared memory takes advantage of the locality of reference exhibited by programs and improves efficiency.

Distributed shared memory systems are cheaper to build than tightly coupled multiprocessor systems.

The large physical memory available facilitates running programs requiring large memory efficiently.

DSM can scale well when compared to tightly coupled multiprocessor systems.

Message passing systems allow processes to communicate with each other while being protected from one another by having private address spaces, whereas in DSM one process can cause another to fail by erroneously altering data.

When message passing is used between heterogeneous computers, marshaling of data takes care of differences in data representation; it is not obvious how memory can be shared between computers with different integer representations.

DSM can be made persistent, i.e. processes communicating via DSM may execute with non-overlapping lifetimes: a process can leave information in an agreed location for another process. Processes communicating via message passing must execute at the same time.

Which is better, message passing or distributed shared memory? Distributed shared memory appears to be a promising tool if it can be implemented efficiently.

As shown in the above figure, the DSM provides a virtual address space shared among processes on loosely coupled processors. DSM is basically an abstraction that integrates the local memory of different machines in a network environment into a single logical entity shared by cooperating processes executing on multiple sites. The shared memory itself exists only virtually. The application programs can use it in the same way as traditional virtual memory, except that processes using it can run on different machines in parallel.

Architectural Components:

Each node in a distributed system consists of one or more CPUs and a memory unit. The nodes are connected by a communication network. A simple message-passing system allows processes on different nodes to exchange messages with each other. The DSM abstraction presents a single large shared memory space to the processors of all nodes; the shared memory of DSM exists only virtually. A memory map manager running at each node maps the local memory onto the shared virtual memory. To facilitate this mapping, the shared-memory space is partitioned into blocks. Data caching is used to reduce network latency. When a memory block accessed by a process is not resident in local memory:

a block fault is generated and control goes to the OS;

the OS gets this block from the remote node, maps it into the application's address space, and the faulting instruction is restarted.

Thus data keeps migrating from one node to another, but no communication is visible to the user processes. Network traffic is greatly reduced if applications show a high degree of locality of data accesses. Variations of this general approach are used in different implementations, depending on whether the DSM allows replication and/or migration of shared-memory blocks.
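The block-fault handling just described can be sketched as follows. This is only an illustration under simplifying assumptions: a fixed block size, a local table of blocks currently held at this node, a directory that records which node holds each block, and a placeholder fetch_block_from() standing in for the request/reply messages a real DSM layer would exchange.

```python
# Minimal sketch of block-fault handling in a DSM node (assumptions noted above).
BLOCK_SIZE = 4096                  # assumed granularity: one "page" per block

local_blocks = {}                  # block number -> bytearray currently held at this node
directory = {}                     # block number -> node currently holding it (assumed known)

def fetch_block_from(owner, block_no):
    """Placeholder for the remote fetch; a real DSM layer would send a request
    message to `owner` and wait for the block contents in the reply."""
    raise NotImplementedError

def read_byte(address):
    """Read one byte of the shared virtual address space."""
    block_no, offset = divmod(address, BLOCK_SIZE)
    if block_no not in local_blocks:                 # block fault: control passes to the DSM layer
        owner = directory[block_no]                  # locate the block
        local_blocks[block_no] = fetch_block_from(owner, block_no)   # migrate it here
        directory[block_no] = "this-node"            # record the new location
    return local_blocks[block_no][offset]            # "restart" the faulting access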

DSM – Design and Implementation Issues

The important issues involved in the design and implementation of DSM systems are as follows:

Granularity: It refers to the block size of the DSM system, i.e. to the unit of sharing and the unit of data transfer across the network when a network block fault occurs. Possible units are a few words, a page, or a few pages.

Structure of Shared Memory Space: The structure refers to the layout of the shared data in memory. It is dependent on the type of applications that the DSM system is intended to support.

Memory Coherence and Access Synchronization: Coherence (consistency) refers to the memory coherence problem that deals with the consistency of shared data that lies in the main memory of two or more nodes. Synchronization refers to synchronization of concurrent access to shared data using synchronization primitives such as semaphores.

Data Location and Access: A DSM system must implement mechanisms to locate data blocks in order to service the network data block faults and to meet the requirements of the memory coherence semantics being used.

Block Replacement Policy: If the local memory of a node is full, a cache miss at that node implies not only a fetch of the accessed data block from a remote node but also a replacement, i.e. a data block of the local memory must be replaced by the new data block. Therefore a block replacement policy is also necessary in the design of a DSM system.

Thrashing: In a DSM system, data blocks migrate between nodes on demand. If two nodes compete for write access to a single data item, the corresponding data block may be transferred back and forth at such a high rate that no real work can get done. A DSM system must use a policy to avoid this situation (known as thrashing).

Heterogeneity: DSM systems built for homogeneous systems need not address the heterogeneity issue. However, if the underlying system environment is heterogeneous, the DSM system must be designed to take care of heterogeneity so that it functions properly with machines having different architectures.

4. Discuss the clock synchronization algorithms.

Answer: Clock Synchronization Algorithms

Clock synchronization algorithms may be broadly classified as centralized and distributed.

Centralized Algorithms

In centralized clock synchronization algorithms, one node has a real-time receiver. This node, called the time server node, has a clock time that is regarded as correct and is used as the reference time. The goal of these algorithms is to keep the clocks of all other nodes synchronized with the clock time of the time server node. Depending on the role of the time server node, centralized clock synchronization algorithms are again of two types – passive time server and active time server.

1. Passive Time Server Centralized Algorithm: In this method, each node periodically sends a message ("time = ?") to the time server. When the time server receives the message, it quickly responds with a message ("time = T"), where T is the current time in the clock of the time server node. Assume that when the client node sends the "time = ?" message, its clock time is T0, and when it receives the "time = T" message, its clock time is T1. Since T0 and T1 are measured using the same clock, in the absence of any other information, the best estimate of the time required for the propagation of the message "time = T" from the time server node to the client's node is (T1 - T0)/2. Therefore, when the reply is received at the client's node, its clock is readjusted to T + (T1 - T0)/2.

2. Active Time Server Centralized Algorithm: In this approach, the time server periodically broadcasts its clock time ("time = T"). The other nodes receive the broadcast message and use the clock time in the message for correcting their own clocks. Each node has a priori knowledge of the approximate time (Ta) required for the propagation of the message "time = T" from the time server node to its own node. Therefore, when a broadcast message is received at a node, the node's clock is readjusted to the time T + Ta. A major drawback of this method is that it is not fault tolerant: if the broadcast message reaches a node too late due to some communication fault, the clock of that node will be readjusted to an incorrect value. Another disadvantage of this approach is that it requires a broadcast facility to be supported by the network.

Another active time server algorithm that overcomes the drawbacks of the above algorithm is the Berkeley algorithm, proposed by Gusella and Zatti for internal synchronization of the clocks of a group of computers running Berkeley UNIX. In this algorithm, the time server periodically sends a message ("time = ?") to all the computers in the group. On receiving this message, each computer sends back its clock value to the time server. The time server has a priori knowledge of the approximate time required for the propagation of a message from each node to its own node, and based on this knowledge it first readjusts the clock values of the reply messages. It then takes a fault-tolerant average of the clock values of all the computers (including its own). To take the fault-tolerant average, the time server chooses a subset of all clock values that do not differ from one another by more than a specified amount, and the average is taken only for the clock values in this subset. This approach eliminates readings from unreliable clocks, whose clock values could have a significant adverse effect if an ordinary average were taken. The calculated average is the current time to which all the clocks should be readjusted. The time server readjusts its own clock to this value. Instead of sending the calculated current time back to the other computers, the time server sends the amount by which each individual computer's clock requires adjustment. This can be a positive or a negative value and is calculated based on the knowledge the time server has about the approximate time required for the propagation of a message from each node to its own node.

Centralized clock synchronization algorithms suffer from two major drawbacks:

1. They are subject to single-point failure. If the time server node fails, the clock synchronization operation cannot be performed. This makes the system unreliable. Ideally, a distributed system should be more reliable than its individual nodes; if one node goes down, the rest should continue to function correctly.

2. From a scalability point of view, it is generally not acceptable to get all the time requests serviced by a single time server. In a large system, such a solution puts a heavy burden on that one process.

Distributed algorithms overcome these drawbacks.
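The two centralized adjustments above can be sketched as follows. The message transport is abstracted away: ask_time_server() is a placeholder for the "time = ?" / "time = T" exchange, and the Berkeley-style step assumes the readings have already been corrected for propagation delay. Picking the "good" subset via the median is one simple way to realize the fault-tolerant average described above, not the only one.

```python
# Sketch of the passive-time-server adjustment and the Berkeley-style
# fault-tolerant average (assumptions noted in the text above).
import time

def ask_time_server():
    """Placeholder: send "time = ?" to the time server and return its reply T."""
    raise NotImplementedError

def passive_time_server_adjustment():
    """Estimate the propagation delay as (T1 - T0)/2 and return the value
    T + (T1 - T0)/2 to which the local clock should be readjusted."""
    t0 = time.monotonic()            # local clock when the request is sent
    t_server = ask_time_server()     # T, the time server's clock value
    t1 = time.monotonic()            # local clock when the reply arrives
    return t_server + (t1 - t0) / 2

def berkeley_fault_tolerant_average(readings, max_spread=0.5):
    """Keep only readings within `max_spread` seconds of the median (discarding
    unreliable clocks), average them, and compute the per-node adjustments."""
    ordered = sorted(readings)
    median = ordered[len(ordered) // 2]
    good = [r for r in readings if abs(r - median) <= max_spread]
    average = sum(good) / len(good)
    # The time server would send each node (average - its reading), i.e. the
    # signed amount by which that node must adjust its clock.
    return average, [average - r for r in readings]
```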

5. Discuss the following with respect to Resource Management in Distributed Systems: o Load – Balancing Approach


o Load – Sharing Approach

Answer:

Load-Balancing Approach

The scheduling algorithms that use this approach are known as load-balancing or load-leveling algorithms. These algorithms are based on the intuition that, for better resource utilization, it is desirable for the load in a distributed system to be balanced evenly. Thus a load-balancing algorithm tries to balance the total system load by transparently transferring the workload from heavily loaded nodes to lightly loaded nodes in an attempt to ensure good overall performance relative to some specific metric of system performance. We can have the following categories of load-balancing algorithms:

1. Static: Ignore the current state of the system; e.g. if a node is heavily loaded, it picks up a task randomly and transfers it to a random node. These algorithms are simpler to implement, but performance may not be good.

2. Dynamic: Use the current state information for load balancing. There is an overhead involved in collecting state information periodically; they perform better than static algorithms.

3. Deterministic: Algorithms in this class use the processor and process characteristics to allocate processes to nodes.

4. Probabilistic: Algorithms in this class use information regarding static attributes of the system such as number of nodes, processing capability, etc.

5. Centralized: System state information is collected by a single node. This node makes all scheduling decisions.

6. Distributed: Most desired approach. Each node is equally responsible for making scheduling decisions based on the local state and the state information received from other sites.

7. Cooperative: A distributed dynamic scheduling algorithm. In these algorithms, the distributed entities cooperate with each other to make scheduling decisions. Therefore they are more complex and involve larger overhead than non-cooperative ones. But the stability of a cooperative algorithm is better than that of a non-cooperative one.

8. Non-cooperative: A distributed dynamic scheduling algorithm. In these algorithms, individual entities act as autonomous entities and make scheduling decisions independently of the actions of other entities.

Load Estimation Policy: This policy makes an effort to measure the load at a particular node in a distributed system according to criteria such as the following:

The number of processes running at a node as a measure of the load at the node.

The CPU utilization as a measure of load.

Neither of the above fully captures the load at a node; other parameters, such as the resource demands of these processes, the architecture and speed of the processor, the total remaining execution time of the processes, etc., should be taken into consideration as well.

Process Transfer Policy: The strategy of load-balancing algorithms is based on the idea of transferring some processes from the heavily loaded nodes to lightly loaded nodes. To facilitate this, it is necessary to devise a policy to decide whether a node is lightly or heavily loaded. The threshold value of a node is the limiting value of its workload and is used to decide whether a node is lightly or heavily loaded. The threshold value of a node may be determined by any of the following methods:

1. Static Policy: Each node has a predefined threshold value. If the number of processes exceeds the predefined threshold value, a process is transferred. This can cause process thrashing under heavy load, thus causing instability.

2. Dynamic Policy: In this method, the threshold value is dynamically calculated. It is increased under heavy load and decreased under light load. Thus process thrashing does not occur.

3. High-low Policy: Each node has two threshold values, high and low. The state of a node can then be overloaded, under-loaded, or normal, depending on whether the number of processes is greater than high, less than low, or otherwise. A small sketch of this policy is given below.
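The following is a minimal sketch of the high-low (double-threshold) transfer policy; the threshold values and the load measure (the number of ready processes) are assumptions made for illustration only.

```python
# Small sketch of the high-low transfer policy described above.
LOW_THRESHOLD = 2     # below this the node is under-loaded (a candidate receiver)
HIGH_THRESHOLD = 6    # above this the node is overloaded (a candidate sender)

def node_state(num_processes):
    """Classify a node as 'under-loaded', 'normal', or 'overloaded'."""
    if num_processes < LOW_THRESHOLD:
        return "under-loaded"
    if num_processes > HIGH_THRESHOLD:
        return "overloaded"
    return "normal"

def should_transfer(num_processes):
    """Transfer policy: only overloaded nodes try to move a process elsewhere."""
    return node_state(num_processes) == "overloaded"
```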

Location Policies: Once a decision has been made through the transfer policy to transfer a process from a node, the next step is to select the destination node for that process's execution. This selection is made by the location policy of a scheduling algorithm. The main location policies proposed are as follows:

1. Threshold: A random node is polled to check its state, and the task is transferred if that node will not become overloaded; polling is continued until a suitable node is found or a threshold number of nodes have been polled. Experiments show that polling 3 to 5 nodes performs as well as polling a large number of nodes (say 20), and also gives a substantial performance improvement over no load balancing at all.

2. Shortest: A predetermined number of nodes are polled, and the node with the minimum load among these is picked for the task transfer; if that node is overloaded, the task is executed locally.

3. Bidding: In this method, each node acts both as a manager (one that tries to transfer a task) and as a contractor (one that is able to accept a new task). The manager broadcasts a request-for-bids to all the nodes. A contractor returns a bid (a quoted price based on its processor capability, memory size, resource availability, etc.). The manager chooses the best bidder for transferring the task. Problems that could arise as a result of two or more managers broadcasting concurrently need to be addressed.

4. Pairing: This approach tries to reduce the variance in load between pairs of nodes. Two nodes that differ greatly in load are paired with each other so that they can exchange tasks. Each node asks a randomly picked node if it will pair with it. After a pairing is formed, one or more processes are transferred from the heavily loaded node to the lightly loaded node.

State Information Exchange Policies: The dynamic policies require frequent exchange of state information among the nodes of the system. In fact, a dynamic load-balancing algorithm faces a transmission dilemma because of the two opposing impacts the transmission of a message has on the overall performance of the system. On one hand, transmission improves the ability of the algorithm to balance the load; on the other hand, it raises the expected queuing time of messages because of the increase in the utilization of the communication channel. Thus proper selection of the state information exchange policy is essential. The proposed load-balancing algorithms use one of the following policies for this purpose:

1. Periodic Broadcast: Each node broadcasts its state information periodically, say every t time units. This does not scale well, causes heavy network traffic, and may result in fruitless messages.


2. Broadcast When State Changes: This avoids fruitless messages. A node broadcasts its state only when its state changes. For example, when the state changes from normal to low or normal to high, etc.

3. On-Demand Exchange: Under this approach, a node broadcasts a state information request when its state changes from the normal load region to the high or low load region. Upon receiving this request, other nodes send their current state information to the requesting node. If the requesting node includes its own state information in the request, then only those nodes that can cooperate with the requesting node need to send a reply.

4. Exchange by Polling: In this approach the state information is exchanged with a polled node only. Polling stops after a predetermined number of polls or after a suitable partner is found, whichever happens first.

Priority Assignment Policies: One of the following priority assignment rules may be used to assign priorities to local and remote processes (i.e. processes that have migrated from other nodes):

i) Selfish: Local processes are given higher priority than remote processes. Studies show that this approach yields the worst response time of the three policies. It penalizes processes that arrive at a busy node, because they will be transferred and hence will execute as low-priority processes, and it favors the processes that arrive at lightly loaded nodes.

ii) Altruistic: Remote processes are given higher priority than local processes. Studies show that this approach yields the best response time of the three policies. Under this approach, remote processes incur lower delays than local processes.

iii) Intermediate: If local processes outnumber remote processes, local processes get higher priority; otherwise, remote processes get higher priority. Studies show that the overall response time performance under this policy is much closer to that of the altruistic policy. Under this policy, local processes are treated better than remote processes for a wide range of loads.

Migration-Limiting Policies: This policy is used to decide the total number of times a process should be allowed to migrate.

Uncontrolled: A remote process is treated like a local process, so there is no limit on the number of times it can migrate.

Controlled: Most systems use a controlled policy to overcome the instability problem. Migrating a partially executed process is expensive, so many systems limit the number of migrations to 1. For long-running processes, however, it might be beneficial to migrate more than once.

Load Sharing Approach

Several researchers believe that load balancing, with its implication of attempting to equalize the workload on all the nodes of the system, is not an appropriate objective. This is because the overhead involved in gathering the state information to achieve this objective is normally very large, especially in distributed systems having a large number of nodes. In fact, for the proper utilization of the resources of a distributed system, it is not required to balance the load on all the nodes. It is necessary and sufficient to prevent the nodes from being idle while some other nodes have more than two processes. This approach is therefore called dynamic load sharing instead of dynamic load balancing.

Issues in Load-Sharing Algorithms: The design of a load-sharing algorithm requires that proper decisions be made regarding the load estimation policy, process transfer policy, state information exchange policy, priority assignment policy, and migration-limiting policy. It is simpler to decide about most of these policies in the case of load sharing, because load-sharing algorithms do not attempt to balance the average workload of all the nodes of the system. Rather, they only attempt to ensure that no node is idle while another node is heavily loaded. The priority assignment policies and the migration-limiting policies for load-sharing algorithms are the same as those of load-balancing algorithms.

Load Estimation Policies: Here an attempt is made to ensure that no node is idle while processes wait for service at some other node. In general, the following two approaches are used for estimation:

Use the number of processes at a node as a measure of load.

Use the CPU utilization as a measure of load.

Process Transfer Policies: Load-sharing algorithms are interested in the busy or idle states only, and most of them employ the all-or-nothing strategy given below.

All-or-Nothing Strategy: It uses a single-threshold policy. A node becomes a candidate to accept tasks from remote nodes only when it becomes idle, and a node becomes a candidate for transferring a task as soon as it has more than one task. Under this approach, a node that becomes idle is not able to immediately acquire a task, thus wasting processing power. To avoid this, the threshold value can be set to 2 instead of 1.

Location Policies: The location policy decides the sender node or the receiver node of a process that is to be moved within the system for load sharing. Depending on the type of node that takes the initiative to globally search for a suitable node for the process, the location policies are of the following types:

1. Sender-Initiated Policy: Under this policy, heavily loaded nodes search for lightly loaded nodes to which tasks may be transferred. The search can be done by sending a broadcast message or by probing randomly picked nodes. An advantage of this approach is that the sender can transfer freshly arrived tasks, so no preemptive task transfers occur. A disadvantage is that it can cause system instability under high system load. A small sketch of this policy is given after this list.

2. Receiver-Initiated Location Policy: Under this policy, lightly loaded nodes search for heavily loaded nodes from which tasks may be transferred. The search for a sender can be done by sending a broadcast message or by probing randomly picked nodes. A disadvantage of this approach is that it may result in preemptive task transfers, because the sender may not have any freshly arrived tasks. An advantage is that it does not cause system instability, because under high system loads a receiver will quickly find a sender, and under low system loads it is acceptable for nodes to process some additional control messages.

3. Symmetrically Initiated Location Policy: Under this approach, both senders and receivers search for receivers and senders, respectively.

State Information Exchange Policies: Since it is not necessary to equalize the load at all nodes under load sharing, state information is exchanged only when the state changes.

Broadcast When State Changes: A node broadcasts a state information request message when it becomes under-loaded or overloaded. In the sender-initiated approach, a node broadcasts this message only when it is overloaded; in the receiver-initiated approach, a node broadcasts this message only when it is under-loaded.

Poll When State Changes: When a node's state changes, it randomly polls other nodes one by one and exchanges state information with the polled nodes. Polling stops when a suitable node is found or a threshold number of nodes have been polled. Under the sender-initiated policy, the sender polls to find a suitable receiver; under the receiver-initiated policy, the receiver polls to find a suitable sender.

The above-average algorithm by Krueger and Finkel (a dynamic load-balancing algorithm) tries to maintain the load at each node within an acceptable range of the system average. Its transfer policy is a threshold policy that uses two adaptive thresholds, the upper threshold and the lower threshold: a node with load lower than the lower threshold is considered a receiver, and a node with load higher than the upper threshold is considered a sender. A node's estimated average load is supposed to lie in the middle of the lower and upper thresholds.
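The following is a small sketch of the sender-initiated location policy with threshold-based polling referred to above: an overloaded node polls a few randomly chosen nodes and transfers the task to the first one that will not become overloaded. probe() and transfer() are placeholders for the real probe and migration messages, and the polling limit is an assumption based on the observation that polling 3 to 5 nodes works about as well as polling many.

```python
# Sketch of a sender-initiated location policy with threshold-based polling.
import random

POLL_LIMIT = 5        # polling a handful of nodes is reported to be sufficient

def probe(node):
    """Placeholder: ask `node` for its current load (number of processes)."""
    raise NotImplementedError

def transfer(task, node):
    """Placeholder: migrate `task` to `node`."""
    raise NotImplementedError

def sender_initiated_transfer(task, nodes, threshold):
    """Poll up to POLL_LIMIT random nodes; send the task to the first node whose
    load is below `threshold`, otherwise let the caller run it locally."""
    for node in random.sample(nodes, min(POLL_LIMIT, len(nodes))):
        if probe(node) < threshold:
            transfer(task, node)
            return node
    return None          # no suitable receiver found: execute the task locally
```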

6. Discuss the following with respect to File Systems: o Stateful Vs Stateless Servers

o Caching


Answer:

Stateful Vs Stateless Servers

The file servers that implement a distributed file service can be stateless or stateful. Stateless file servers do not store any session state. This means that every client request is treated independently, and not as part of a new or existing session. Stateful servers, on the other hand, do store session state. They may, therefore, keep track of which clients have opened which files, current read and write pointers for files, which files have been locked by which clients, etc.

The main advantage of stateless servers is that they can easily recover from failure. Because there is no state that must be restored, a failed server can simply restart after a crash and immediately provide services to clients as though nothing happened. Furthermore, if clients crash, the server is not stuck with abandoned open or locked files. Another benefit is that the server implementation remains simple, because it does not have to implement the state accounting associated with opening, closing, and locking of files.

The main advantage of stateful servers, on the other hand, is that they can provide better performance for clients. Because clients do not have to provide full file information every time they perform an operation, the size of messages to and from the server can be significantly decreased. Likewise, the server can make use of knowledge of access patterns to perform read-ahead and do other optimisations. Stateful servers can also offer clients extra services such as file locking, and can remember read and write positions.
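The contrast can be made concrete with a small sketch. The request shapes below are assumptions for illustration, not the protocol of any particular distributed file system: the stateless read carries the full file name, offset, and count in every request, while the stateful server keeps an open-file table and the current position as session state.

```python
# Illustrative contrast between a stateless and a stateful read operation.

# Stateless server: every request carries all the information needed to service
# it, so nothing has to be restored after a server crash.
def stateless_read(path, offset, count):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)

# Stateful server: the server remembers which files a client has opened and the
# current read position, so individual requests can be much smaller.
class StatefulServer:
    def __init__(self):
        self._open_files = {}            # session state: handle -> open file object
        self._next_handle = 0

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self._open_files[handle] = open(path, "rb")
        return handle                    # the client only quotes this handle later

    def read(self, handle, count):
        return self._open_files[handle].read(count)   # position is kept server-side

    def close(self, handle):
        self._open_files.pop(handle).close()
```

If the stateful server crashes, the open-file table is lost and clients must reopen their files, which is exactly the recovery cost the stateless design avoids.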

Caching

Besides replication, caching is often used to improve the performance of a DFS. In a DFS, caching involves storing either a whole file or the results of file service operations. Caching can be performed at two locations: at the server and at the client. Server-side caching makes use of the file caching provided by the host operating system. This is transparent to the server and helps to improve the server's performance by reducing costly disk accesses.

Client-side caching comes in two flavours: on-disk caching and in-memory caching. On-disk caching involves the creation of (temporary) files on the client's disk. These can either be complete files (as in the upload/download model) or they can contain partial file state, attributes, etc. In-memory caching stores the results of requests in the client machine's memory. This can be process-local (in the client process), in the kernel, or in a separate dedicated caching process.

The issue of cache consistency in a DFS has obvious parallels to the consistency issue in shared memory systems, but there are other tradeoffs (for example, disk access delays come into play, the granularity of sharing is different, sizes are different, etc.). Furthermore, because write-through caches are too expensive to be useful, the consistency of caches will be weakened. This makes implementing Unix semantics impossible. Approaches used in DFS caches include delayed writes, where writes are not propagated to the server immediately but in the background later on, and write-on-close, where the server receives updates only after the file is closed. Adding a delay to write-on-close has the benefit of avoiding superfluous writes if a file is deleted shortly after it has been closed.
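A write-on-close client cache of the kind described above can be sketched as follows. This is only an illustration: download() and upload() are placeholders for the real client-server transfers, and whole files are cached as in the upload/download model.

```python
# Sketch of a write-on-close (delayed-write) client cache.
def download(path):
    """Placeholder: fetch the file contents from the file server."""
    raise NotImplementedError

def upload(path, data):
    """Placeholder: send the new file contents back to the file server."""
    raise NotImplementedError

class WriteOnCloseCache:
    def __init__(self):
        self._cache = {}                 # path -> [bytearray contents, dirty flag]

    def open(self, path):
        if path not in self._cache:
            self._cache[path] = [bytearray(download(path)), False]

    def write(self, path, offset, data):
        contents, _ = self._cache[path]
        contents[offset:offset + len(data)] = data
        self._cache[path][1] = True      # mark dirty; nothing is sent yet (delayed write)

    def close(self, path):
        contents, dirty = self._cache[path]
        if dirty:
            upload(path, bytes(contents))   # updates reach the server only on close
            self._cache[path][1] = False
```

Because updates reach the server only on close, other clients may read stale data in the meantime; this is the weakened consistency, relative to Unix semantics, that the paragraph above refers to.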