ICS362 Distributed Systems Dr Ken Cosh Week 5. Review Communication – Fundamentals – Remote Procedure Calls (RPC) – Message Oriented Communication – Stream

ICS362 Distributed Systems

Dr Ken Cosh

Week 5

Review

Communication
– Fundamentals
– Remote Procedure Calls (RPC)
– Message Oriented Communication
– Stream Oriented Communication
– Multicast Communication

This Week

Naming
– Names, Identifiers & Addresses
– Flat Naming
– Structured Naming

Names

A string of bits / characters referring to an entity.
– Entity could be resources, hosts, printers, disks, files, processes, users, mailboxes, newsgroups, webpages, messages…

Entities can be operated on through their interfaces
– But for that we need an access point, or address

Access points

An entity can have more than one access point
– We have more than one telephone
– A host offers multiple ports

An entity can change its access points
– A new IP address in a new network
– A new email address

Entity <-> Access Point

It appears an access point is tightly associated with an entity

But the name of the entity and the name of the access point should be independent
– Making a naming system which is more flexible and easier to use.

Identifiers

Uniquely refer to an entity
– An identifier refers to at most one entity.
– Each entity is referred to by at most one identifier.
– An identifier always refers to the same entity (i.e., it is never reused).

Human Friendly Names

Most names are represented in machine readable form, i.e. a bit string.

Human Friendly Names convert this to a character string.

Name Resolution

The crucial aspect is how to resolve names and identifiers to addresses.
– Closely linked to message routing

Simply a table of name<->address pairs?
– With a large distributed system this becomes a large table which can’t be centralised.

Most of this section will deal with alternative approaches to name resolution
– Flat Naming
– Structured Naming
– Attribute Based Naming

Flat Naming

Generally names are just random bit strings – i.e. nothing about the name gives any indication of where the access point is.
– (In contrast to cis.payap.ac.th for example)

Alternatives here include:
– Broadcast Based
– Home Based
– Distributed Hash Tables
– Hierarchical Based

Broadcasting

Message sent out to all machines in network
– Broadcast a message containing the identifier of the entity that is being looked for
– Each machine checks if it has the entity
– Those with an access point respond accordingly

As the network grows it becomes inefficient
– Wasted bandwidth
– Too many hosts being interrupted with messages they can’t answer
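The broadcast look-up above can be sketched in a few lines. This is only an illustration: the `Machine` class, the 10.0.0.x addresses and the entity name "printer-7" are made up for the example, and a real broadcast would be a network-level message rather than a loop.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    address: str
    # Identifiers of the entities that have an access point on this machine.
    hosted: set = field(default_factory=set)

    def handle_lookup(self, entity_id):
        """Answer with our address if we host the entity, otherwise stay silent."""
        return self.address if entity_id in self.hosted else None


def broadcast_lookup(network, entity_id):
    """Deliver the query to every machine and collect the answers.

    Returns (addresses that replied, number of machines interrupted).
    """
    replies = []
    for machine in network:
        answer = machine.handle_lookup(entity_id)
        if answer is not None:
            replies.append(answer)
    return replies, len(network)


# Hypothetical network of 100 machines; only one of them hosts "printer-7".
network = [Machine(f"10.0.0.{i}") for i in range(1, 101)]
network[41].hosted.add("printer-7")

replies, interrupted = broadcast_lookup(network, "printer-7")
print(replies, interrupted)
```

Every machine is interrupted to produce a single useful reply, which is the inefficiency noted above.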

Multicasting

Multicasting can improve things as only a specified group of machines will receive the ‘broadcast’

Forwarding Pointers

When an entity moves, it leaves a forwarding pointer at its last address
– Once an entity has been found we can find the current address by following forwarding pointers

Drawbacks
– The chain for a mobile entity can become very long!
– What happens if part of the chain is unreliable?
– Scalability?
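Following a chain of forwarding pointers, and two of the drawbacks listed, can be sketched as below. The dictionary representation, the address names "A" to "D" and the use of `None` for an unreachable link are assumptions made for illustration.

```python
def follow_pointers(forward, start, max_hops=64):
    """Follow forwarding pointers from `start` to the entity's current address.

    `forward` maps an old address to the address the entity moved to next.
    A value of None models a link in the chain that has been lost.
    Returns (current address, number of pointers followed).
    """
    address, hops = start, 0
    while address in forward:
        nxt = forward[address]
        if nxt is None:
            raise LookupError(f"chain broken at {address}")
        if hops == max_hops:
            raise LookupError("chain too long")
        address, hops = nxt, hops + 1
    return address, hops


# The entity started at A and has moved three times.
forward = {"A": "B", "B": "C", "C": "D"}
current, hops = follow_pointers(forward, "A")
print(current, hops)
```

Each move adds one hop to every later look-up that starts from an old address, and a single lost pointer makes the entity unreachable from everything before it in the chain.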

Home Based Approaches

A Home Location keeps track of the current location of an entity.
– This is the ‘Care of’ address of the entity

If a request comes it is first routed to the home location, but then forwarded to the current location
– With the client being updated with the new location.
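A minimal sketch of the home-based scheme, assuming a hypothetical `HomeLocation` registry and a `Client` that remembers the care-of address it was given. The entity name and addresses are invented for the example.

```python
class HomeLocation:
    """Fixed, well-known location that records each entity's care-of address."""

    def __init__(self):
        self.care_of = {}

    def update(self, entity, address):
        # Called by the entity whenever it moves to a new network.
        self.care_of[entity] = address

    def lookup(self, entity):
        return self.care_of[entity]


class Client:
    def __init__(self, home):
        self.home = home
        self.known = {}          # care-of addresses learned so far
        self.home_contacts = 0   # how often the home location was asked

    def locate(self, entity):
        """Ask the home location only if no current address is known."""
        if entity not in self.known:
            self.home_contacts += 1
            self.known[entity] = self.home.lookup(entity)
        return self.known[entity]

    def forget(self, entity):
        # Used when the remembered address stops answering.
        self.known.pop(entity, None)


home = HomeLocation()
home.update("laptop", "net-a:10")
client = Client(home)

first = client.locate("laptop")    # goes via the home location
second = client.locate("laptop")   # answered from the client's own record

home.update("laptop", "net-b:22")  # the entity moves
client.forget("laptop")
third = client.locate("laptop")    # home location is contacted again
print(first, second, third, client.home_contacts)
```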

Home Based Approaches

Home Location Drawbacks

Communication latency due to potential distances between locations

What if the Home Location doesn’t exist or is unavailable?

What if the entity decides to move permanently?

Distributed Hash Tables

A hash function is used to allocate random identifiers to nodes and keys to entities

An entity with key k is under the jurisdiction of the node with the smallest id >= k

If a node needs to find an entity that isn’t under its jurisdiction it could simply check with its preceding or succeeding node.
– This is made more efficient by storing a finger table of shortcuts to other nodes.
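The rule "smallest id >= k" and the finger-table shortcut can be sketched as follows. This is a Chord-style sketch under assumptions: a 5-bit identifier space, an illustrative set of node ids, and finger entry i pointing at the node responsible for p + 2^i. For brevity every node's table is computed from a global list, whereas a real node only knows its own table.

```python
from bisect import bisect_left

M = 5                      # assumed identifier length in bits
RING = 2 ** M
NODES = sorted([1, 4, 9, 11, 14, 18, 20, 21, 28])  # illustrative node ids


def succ(k):
    """Node responsible for key k: smallest node id >= k, wrapping past the top."""
    i = bisect_left(NODES, k % RING)
    return NODES[i % len(NODES)]


def finger_table(p):
    """Shortcuts of node p: the nodes responsible for p + 1, p + 2, p + 4, ..."""
    return [succ((p + 2 ** i) % RING) for i in range(M)]


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def lookup(start, k):
    """Route a look-up for key k from node `start`; return the nodes visited."""
    node, path = start, [start]
    while True:
        if node == k:
            return path
        fingers = finger_table(node)
        if in_interval(k, node, fingers[0]):
            path.append(fingers[0])      # the direct successor is responsible
            return path
        # Otherwise jump to the furthest finger that does not overshoot k.
        nxt = fingers[0]
        for f in reversed(fingers):
            if in_interval(f, node, k):
                nxt = f
                break
        node = nxt
        path.append(node)


print(succ(26), lookup(1, 26))
```

With only predecessor and successor links the same look-up would walk the ring node by node; the finger table lets the first hop skip more than half of it.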

Distributed Hash Table

Distributed Hash Tables

With randomly assigned ids the requests could be routed across long distances

Topology based assignment of node identifiers
– Make sure that nearby nodes get nearby ids

Proximity Routing
– By storing multiple successors & predecessors a node can choose to check with a nearby node, assuming it satisfies the conditions (< or >) of the key

Hierarchical Approaches

The network is divided into a collection of domains, each with subdomains until you reach a leaf domain

Each domain has an associated directory node dir(D) which leads to a tree of directory nodes.
– With a root directory node at the top.

Hierarchical Approaches

[Figure: domain hierarchy, from Root Directory through Top Level Domain and Subdomain down to Leaf Domain]

Location Records

Each directory node has a location record for each entity within its directory
– If an entity is within a subdomain then it contains a location record of the subdomain containing the entity.

If an entity has multiple locations (is replicated) a directory may contain more than one reference for the entity

Location Records

Look Up

Look Up is done through ever increasing circles – based on locality.

Consider Worst Case?
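Location records and the widening look-up can be sketched as a small tree. This is a simplified model under assumptions: one location per entity (no replication), invented domain names, and a hop counted for every directory node visited after the starting leaf.

```python
class DirNode:
    """Directory node dir(D) for one domain."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        # entity -> child DirNode (inner nodes) or address string (leaf domain)
        self.records = {}


def insert(leaf, entity, address):
    """Store the address in the leaf and a chain of pointers up to the root."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None and entity not in node.records:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(start_leaf, entity):
    """Climb until some directory knows the entity, then follow pointers down."""
    node, hops = start_leaf, 0
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node, hops = node.parent, hops + 1
    while isinstance(node.records[entity], DirNode):
        node, hops = node.records[entity], hops + 1
    return node.records[entity], hops


# Hypothetical domains: root -> {asia, europe} -> leaf domains.
root = DirNode("root")
asia, europe = DirNode("asia", root), DirNode("europe", root)
chiang_mai = DirNode("chiang-mai", asia)
amsterdam = DirNode("amsterdam", europe)

insert(amsterdam, "E", "host-17")
print(lookup(amsterdam, "E"))    # found in the local leaf domain
print(lookup(chiang_mai, "E"))   # worst case: up to the root and back down
```

The worst case is a client and an entity in different top-level domains: the request climbs to the root and then descends the full depth of the tree.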

Insertion

Structured Naming

Flat names are convenient for machines,
– But not really for humans

File naming & host naming allow convenient human friendly names.

Here we discuss Namespaces & Name Resolution

Namespaces

Names can be represented as a labeled, directed graph.

2 Types of node
– Leaf Nodes: the address of a named entity, or the actual entity. No outgoing edges
– Directory Nodes: named nodes with a number of outgoing edges

Naming Graph with 1 root node

Naming Graphs

Most have a single root

Many are strictly hierarchical
– Making them into a tree where each node (except the root) has exactly 1 incoming edge

Some are directed acyclic graphs (as in previous slide)
– Each node can have multiple incoming edges, but no cycles allowed

Aliases

In the previous example the entity “/keys” has an alias “/home/steen/keys”
– Multiple absolute paths referring to the same node (Hard links)

An alternative is to use symbolic links
– When resolving “/home/steen/keys” the absolute path “/keys” is returned.
– (As in the following slide)

Symbolic Link

Name Resolution

Resolving a name involves following a path through the graph;
– E.g. /home/steen/mbox

Closure Mechanism
– Resolution works on the assumption that we know where to start the path from – i.e. where is the root node?
– Is it a node in a higher graph?
– Have we already resolved that node?
– What would you do with the string 0031204430784?
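Path resolution over a naming graph, including both kinds of alias, can be sketched with nested dictionaries as directory nodes. The `Leaf` and `Symlink` classes and the label "keys-link" are hypothetical; the closure mechanism here is simply "absolute names start at the one known root".

```python
class Leaf:
    """Leaf node: stands for the named entity itself."""

    def __init__(self, content):
        self.content = content


class Symlink:
    """Leaf node that stores an absolute path name instead of an entity."""

    def __init__(self, target):
        self.target = target


keys = Leaf("key material")
mbox = Leaf("mail")

# Directory nodes are dicts: outgoing edge label -> node.
root = {
    "keys": keys,
    "home": {
        "steen": {
            "mbox": mbox,
            "keys": keys,                    # hard link: second edge to the same node
            "keys-link": Symlink("/keys"),   # symbolic link: stores a path
        }
    },
}


def resolve(path, root, max_links=8):
    """Follow the edge labels of an absolute path name from the root node."""
    if not path.startswith("/"):
        raise ValueError("closure: this sketch only knows where absolute names start")
    node = root
    for label in filter(None, path.split("/")):
        if not isinstance(node, dict):
            raise KeyError(f"{label}: parent is not a directory node")
        node = node[label]
        if isinstance(node, Symlink):
            if max_links == 0:
                raise RecursionError("too many symbolic links")
            node = resolve(node.target, root, max_links - 1)
    return node


print(resolve("/home/steen/mbox", root).content)
print(resolve("/home/steen/keys-link", root) is resolve("/keys", root))
```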

Mounting Points

A directory node can store the identifier of a directory node from a different namespace.
– This is the Mounting Point

Consider a collection of distributed namespaces; we can mount a foreign namespace with:
– The name of an access protocol
– The name of the server
– The name of the mounting point in the foreign name space

For Example – ftp://cis.payap.ac.th
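The three pieces of information needed for a mount can be read off a URL-style reference. The sketch below uses Python's standard `urllib.parse.urlsplit`; the `MountRef` type is a name introduced here, and treating an empty path as the root "/" of the foreign name space is an assumption.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class MountRef(NamedTuple):
    protocol: str      # name of the access protocol
    server: str        # name of the server
    mount_point: str   # name of the mounting point in the foreign name space


def parse_mount(ref):
    """Split a URL-style reference into the three parts needed for mounting."""
    parts = urlsplit(ref)
    return MountRef(parts.scheme, parts.hostname or "", parts.path or "/")


print(parse_mount("ftp://cis.payap.ac.th"))
print(parse_mount("ftp://ftp.cs.vu.nl/pub/globe/index.html"))
```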

Foreign Mounting Point

Namespace Implementation

A naming service implemented by name servers
– For large scale DS, it is distributed across multiple servers

This is separated into layers
– Global Layer: high level nodes (root node and neighbours), hence relatively fixed & stable.
– Administrational Layer: nodes from within a single organisation, e.g. groups of entities, perhaps a node for each department in an organisation
– Managerial Layer: frequently changing nodes, e.g. hosts in a local network

DNS example

Global Layer

High Availability is particularly necessary
– If one fails a large part will be unavailable as resolution can not continue past the failed server.

But, as names rarely change, clients can cache the results
– So speedy results are not as important as availability

Normally implemented using replicated servers

Administrational Layer

Availability is important – for clients in the same organisation as the nameserver, but less important for those outside of the organisation.

Responsiveness is much more important at this layer
– Updates need to be processed more quickly, e.g. a new user account needs to be processed quickly.

Managerial Layer

Availability is less demanding
– Can be managed on a single machine

Performance is crucial
– Responses should be immediate

Layer Comparison

Name Resolution Implementation

Choices:
– Iterative or Recursive?

Let’s consider needing to resolve:
– root:<nl, vu, cs, ftp, pub, globe, index.html>

Otherwise known as:
– ftp://ftp.cs.vu.nl/pub/globe/index.html

Iterative Resolution

root:<nl, vu, cs, ftp, pub, globe, index.html>

The root server resolves ‘nl’ and returns that location to the client
– Remaining pathname: nl: <vu, cs, ftp, pub, globe, index.html>

The nl nameserver resolves ‘vu’
– Remaining pathname: vu: <cs, ftp, pub, globe, index.html>

The vu nameserver resolves ‘cs’ and ‘ftp’
– Remaining pathname: ftp: <pub, globe, index.html>

Then the ftp server can return the requested file.

Each time the location of the next server is returned to the client and the client makes a new request.
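The iterative steps above can be sketched with each name server modelled as a small table. The `ZONES` dictionary and the use of plain strings as server "addresses" are assumptions for illustration; the point is that the loop runs in the client.

```python
# Each name server knows how to resolve one prefix of labels to the next server.
ZONES = {
    "root": {("nl",): "nl"},
    "nl": {("vu",): "vu"},
    "vu": {("cs", "ftp"): "ftp"},   # the vu server resolves 'cs' and 'ftp' together
}

NAME = ["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"]


def server_resolve(server, path):
    """One request to one name server: returns (next server, remaining path)."""
    for prefix, nxt in ZONES[server].items():
        if tuple(path[:len(prefix)]) == prefix:
            return nxt, path[len(prefix):]
    raise KeyError(f"{server} cannot resolve {path}")


def iterative_resolve(path):
    """The client contacts each name server in turn, starting at the root."""
    server, remaining, trace = "root", list(path), []
    while server in ZONES:
        server, remaining = server_resolve(server, remaining)
        trace.append((server, remaining))   # what came back to the client
    return server, remaining, trace


server, remaining, trace = iterative_resolve(NAME)
print(server, remaining, len(trace))
```

The client makes one round trip per name server, and the leftover path is handed to the ftp server to fetch the file.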

Iterative Name Resolution

Recursive Resolution

root:<nl, vu, cs, ftp, pub, globe, index.html>

The nameserver passes the request on to the next nameserver it finds;
– i.e. root identifies nl and passes on the request:
– nl: <vu, cs, ftp, pub, globe, index.html>
– nl passes on the request to vu:
– vu: <cs, ftp, pub, globe, index.html>
– vu passes on the request to ftp:
– ftp: <pub, globe, index.html>

Finally the results are returned to the client back through the chain.
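In the recursive variant the forwarding happens inside the servers, which a recursive function models directly. As before, the `ZONES` table and string server names are illustrative assumptions.

```python
ZONES = {
    "root": {("nl",): "nl"},
    "nl": {("vu",): "vu"},
    "vu": {("cs", "ftp"): "ftp"},
}

NAME = ["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"]


def recursive_resolve(server, path, chain):
    """A server resolves its part, then forwards the rest to the next server.

    `chain` records the servers the request passes through; the return value
    travels back along the same chain to the client.
    """
    chain.append(server)
    if server not in ZONES:
        return server, list(path)          # end of the name-server chain
    for prefix, nxt in ZONES[server].items():
        if tuple(path[:len(prefix)]) == prefix:
            return recursive_resolve(nxt, path[len(prefix):], chain)
    raise KeyError(f"{server} cannot resolve {path}")


chain = []
result = recursive_resolve("root", NAME, chain)   # the client sends one request
print(result, chain)
```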

Recursive Resolution

Recursive vs Iterative

Recursive places more demands on the servers
– Which generally makes it prohibitive for global layer servers dealing with many requests

Recursive name resolution enables each server to learn the address of lower level nodes
– And cache these results

This makes subsequent requests much quicker
– The results can be cached both by the root server and every other server in the chain
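Caching along the recursive chain can be sketched by giving each server its own cache. This is a simplification: the hypothetical `NameServer` class caches the answer for the whole remaining path, whereas a real name server would typically cache each intermediate name and expire entries after a while.

```python
class NameServer:
    def __init__(self, name, table):
        self.name = name
        self.table = table    # prefix of labels -> next NameServer, or final host name
        self.cache = {}       # remaining path -> answer learned from servers below
        self.forwards = 0     # requests passed on to the next server

    def resolve(self, path):
        key = tuple(path)
        if key in self.cache:
            return self.cache[key]            # answered without contacting anyone
        for prefix, nxt in self.table.items():
            if key[:len(prefix)] == prefix:
                rest = key[len(prefix):]
                if isinstance(nxt, NameServer):
                    self.forwards += 1
                    result = nxt.resolve(rest)
                else:
                    result = (nxt, rest)
                self.cache[key] = result      # remember what came back
                return result
        raise KeyError(key)


vu = NameServer("vu", {("cs", "ftp"): "ftp"})
nl = NameServer("nl", {("vu",): vu})
root = NameServer("root", {("nl",): nl})

NAME = ("nl", "vu", "cs", "ftp", "pub", "globe", "index.html")
first = root.resolve(NAME)    # travels root -> nl -> vu
second = root.resolve(NAME)   # served from the root's cache
print(first, root.forwards, nl.forwards)
```

After the first request every server in the chain holds the answer, so a later request entering at nl is also served without being forwarded.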

Recursive Caching


Recursive can also be cheaper in terms of communication
– Consider if the request in the example given was made from Chiang Mai…
