ICS362 Distributed Systems
Dr Ken Cosh
Week 5
Review
Communication
– Fundamentals
– Remote Procedure Calls (RPC)
– Message Oriented Communication
– Stream Oriented Communication
– Multicast Communication
This Week
Naming
– Names, Identifiers & Addresses
– Flat Naming
– Structured Naming
Names
A string of bits / characters referring to an entity.
– Entity could be resources, hosts, printers, disks, files, processes, users, mailboxes, newsgroups, webpages, messages…
Entities can be operated on through their interfaces
– But for that we need an access point, or address
Access points
An entity can have more than one access point
– We have more than one telephone
– A host offers multiple ports
An entity can change its access points
– A new IP address in a new network
– A new email address
Entity <-> Access Point
It appears an access point is tightly associated with an entity
But the name of the entity and the name of the access point should be independent
– Making a naming system which is more flexible and easier to use.
Identifiers
Uniquely refer to an entity
– An identifier refers to at most one entity.
– Each entity is referred to by at most one identifier.
– An identifier always refers to the same entity (i.e., it is never reused).
Human Friendly Names
Most names are represented in machine readable form, i.e. a bit string.
Human Friendly Names convert this to a character string.
Name Resolution
The crucial question is how names and identifiers are resolved to addresses.
– Closely linked to message routing
Simply a table of name<->address pairs
– With a large distributed system this becomes a large table which can’t be centralised.
Most of this section deals with alternative approaches to name resolution
– Flat Naming
– Structured Naming
– Attribute Based Naming
Flat Naming
Generally names are just random bit strings – i.e. nothing about the name gives any indication of where the access point is.
– (In contrast to cis.payap.ac.th for example)
Alternatives here include:
– Broadcast Based
– Home Based
– Distributed Hash Tables
– Hierarchical Based
Broadcasting
A message is sent out to all machines in the network
– Broadcast a message containing the identifier of the entity that is being looked for
– Each machine checks whether it has the entity
– Those with an access point respond accordingly
As the network grows it becomes inefficient
– Wasted bandwidth
– Too many hosts being interrupted with messages they can’t answer
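The broadcast lookup above can be sketched as a small simulation. This is only an illustration: the `Host` class, the addresses and the entity name `printer-7` are made up for the example, and a real network would send messages rather than call methods.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    address: str
    entities: set = field(default_factory=set)

    def handle_lookup(self, entity_id):
        # A host answers only if it offers an access point for the entity.
        return self.address if entity_id in self.entities else None


def broadcast_lookup(hosts, entity_id):
    """Ask every host; return (addresses that answered, hosts interrupted for nothing)."""
    replies = [h.handle_lookup(entity_id) for h in hosts]
    found = [r for r in replies if r is not None]
    wasted = len(replies) - len(found)
    return found, wasted


# Hypothetical network of 100 hosts, only one of which has the entity.
hosts = [Host(f"10.0.0.{i}") for i in range(1, 101)]
hosts[41].entities.add("printer-7")
found, wasted = broadcast_lookup(hosts, "printer-7")
```

Every host is interrupted, although only one can answer, which is the inefficiency that grows with the network.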
Multicasting
Multicasting can improve things as only a specified group of machines will receive the ‘broadcast’
Forwarding Pointers
When an entity moves, it leaves a forwarding pointer at its last address
– Once an entity has been found we can find the current address by following forwarding pointers
Drawbacks
– The chain for a mobile entity can become very long!
– What happens if part of the chain is unreliable?
– Scalability?
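A minimal sketch of following a chain of forwarding pointers, assuming the pointers are held in a plain dictionary (old address -> next address). The addresses "A" to "D" and the `max_hops` limit are illustrative assumptions.

```python
def follow_chain(pointers, start, max_hops=100):
    """Follow forwarding pointers from `start`; return (last address reached, hops)."""
    address, hops = start, 0
    while address in pointers:
        if hops >= max_hops:
            raise RuntimeError("chain too long or cyclic")
        address = pointers[address]
        hops += 1
    return address, hops


# The entity moved A -> B -> C -> D, leaving a pointer behind each time.
pointers = {"A": "B", "B": "C", "C": "D"}
current, hops = follow_chain(pointers, "A")

# If the pointer left at B is lost, the lookup stops at a stale address.
broken, _ = follow_chain({"A": "B", "C": "D"}, "A")
```

The number of hops grows with every move, and a single lost pointer leaves the client at an out-of-date address.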
Home Based Approaches
A Home Location keeps track of the current location of an entity.
– The current location is the ‘care-of’ address of the entity
If a request comes in, it is first routed to the home location and then forwarded to the current location
– With the client being updated with the new location.
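The home-based scheme can be sketched as follows. `HomeLocation`, `Client` and the addresses are hypothetical names for this example only; the point is that the first request goes via the home location and the client remembers the care-of address it is told.

```python
class HomeLocation:
    """Hypothetical home agent: maps each entity to its current (care-of) address."""

    def __init__(self):
        self._care_of = {}

    def register(self, entity, address):
        # Called by the entity each time it moves.
        self._care_of[entity] = address

    def lookup(self, entity):
        return self._care_of[entity]


class Client:
    def __init__(self, home):
        self.home = home
        self.known = {}  # entity -> last address learned from the home location

    def send(self, entity, via_home=False):
        """Return the address a request is delivered to."""
        if via_home or entity not in self.known:
            self.known[entity] = self.home.lookup(entity)
        return self.known[entity]


home = HomeLocation()
home.register("laptop", "net-A/17")
client = Client(home)
first = client.send("laptop")                 # routed via the home location
home.register("laptop", "net-B/4")            # the entity moves
stale = client.send("laptop")                 # remembered address is out of date
fresh = client.send("laptop", via_home=True)  # ask the home location again
```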
Home Location Drawbacks
Communication latency due to potential distances between locations
What if the Home Location doesn’t exist or is unavailable?
What if the entity decides to move permanently?
Distributed Hash Tables
A hash function is used to allocate random identifiers to nodes and keys to entities
An entity with key k is under the jurisdiction of the node with the smallest id >= k
If a node needs to find an entity that isn’t under its jurisdiction, it could simply check with its predecessor or successor node.
– This is made more efficient by storing a finger table of nearby nodes.
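The rule "smallest id >= k" can be sketched in a few lines. The node ids and the 5-bit identifier space are assumptions chosen for the example, and the finger table follows the Chord convention (entry i points at the successor of n + 2^i), which the slides do not name explicitly.

```python
from bisect import bisect_left


def successor(node_ids, key):
    """Node responsible for `key`: the smallest id >= key, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key)
    return ids[i % len(ids)]


def finger_table(node_ids, n, m):
    """Chord-style finger table for node n in an m-bit identifier space (assumed layout)."""
    return [successor(node_ids, (n + 2**i) % 2**m) for i in range(m)]


nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]  # example ids in a 5-bit space
owner_12 = successor(nodes, 12)            # 14
owner_29 = successor(nodes, 29)            # wraps around to node 1
fingers_1 = finger_table(nodes, 1, 5)      # [4, 4, 9, 9, 18]
```

With the finger table a node can jump far around the ring in one step instead of asking its immediate successor over and over.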
Distributed Hash Tables
With randomly assigned ids the requests could be routed across long distances
Topology based assignments of node identifiers
– Make sure that nearby nodes get nearby ids
Proximity Routing
– By storing multiple successors & predecessors a node can choose to check with a nearby node assuming it satisfies the conditions (< or >) of the key
Hierarchical Approaches
The network is divided into a collection of domains, each with subdomains until you reach a leaf domain
Each domain has an associated directory node dir(D) which leads to a tree of directory nodes.
– With a root directory node at the top.
Figure: hierarchical organisation of domains (Root Directory, Top Level Domain, Subdomain, Leaf Domain)
Location Records
Each directory node has a location record for each entity within its domain
– If an entity is within a subdomain, the record points to the directory node of the subdomain containing the entity.
If an entity has multiple locations (is replicated) a directory node may contain more than one reference for the entity
Look Up
Look Up is done through ever increasing circles, based on locality.
Consider the worst case?
Insertion
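A minimal sketch of insertion and look up in a tree of directory nodes, under simplifying assumptions: each entity has a single location, insertion always walks up to the root, and `DirNode` and the domain names are hypothetical.

```python
class DirNode:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.records = {}  # entity -> child DirNode (inner node) or address (leaf)


def insert(leaf, entity, address):
    """Store the address at the leaf and a pointer to the child in every ancestor."""
    leaf.records[entity] = address
    child, node = leaf, leaf.parent
    while node is not None:
        node.records[entity] = child
        child, node = node, node.parent


def lookup(start_leaf, entity):
    """Climb until some directory node knows the entity, then descend to the address."""
    node, hops = start_leaf, 0
    while entity not in node.records:
        if node.parent is None:
            raise KeyError(entity)
        node, hops = node.parent, hops + 1
    record = node.records[entity]
    while isinstance(record, DirNode):
        record = record.records[entity]
        hops += 1
    return record, hops


root = DirNode("root")
d1, d2 = DirNode("d1", root), DirNode("d2", root)
leaf_a, leaf_b, leaf_c = DirNode("a", d1), DirNode("b", d1), DirNode("c", d2)
insert(leaf_a, "e", "addr-A")
near = lookup(leaf_b, "e")  # found via the shared parent d1
far = lookup(leaf_c, "e")   # worst case here: up to the root and down again
```

A look up that starts close to the entity stays local; the worst case climbs to the root and descends the full height of the tree.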
Structured Naming
Flat names are convenient for machines,
– But not really for humans
File naming & host naming allow convenient human friendly names.
Here we discuss Namespaces & Name Resolution
Namespaces
Names can be represented as a labeled, directed graph.
2 Types of node
– Leaf Nodes
  The address of a named entity, or the actual entity.
  No outgoing edges
– Directory Nodes
  Named nodes with a number of outgoing edges
Naming Graph with 1 root node
Naming Graphs
Most have a single root
Many are strictly hierarchical
– Making them into a tree where each node has exactly 1 incoming edge
Some are directed acyclic graphs (as in previous slide)
– Each node can have multiple incoming edges, but no cycles allowed
Aliases
In the previous example the entity “/keys” has an alias “/home/steen/keys”
– Multiple absolute paths referring to the same node (Hard links)
An alternative is to use symbolic links
– When resolving “/home/steen/keys” the absolute path “/keys” is returned.
– (As in the following slide)
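The difference between the two kinds of alias can be sketched with a tiny naming graph. The `Node` class is an assumption made for this example; the sketch does not detect symbolic-link loops.

```python
class Node:
    def __init__(self, content=None, symlink=None):
        self.edges = {}         # label -> Node (directory entries)
        self.content = content  # data held by a leaf node
        self.symlink = symlink  # absolute path stored by a symbolic link


def resolve(root, path):
    """Resolve an absolute path, restarting from the root when a symbolic link is met."""
    node = root
    for label in path.strip("/").split("/"):
        node = node.edges[label]
        if node.symlink is not None:
            node = resolve(root, node.symlink)
    return node


# Hard link: two edges lead to the very same node.
root, home, steen = Node(), Node(), Node()
keys = Node(content="key data")
root.edges.update({"home": home, "keys": keys})
home.edges["steen"] = steen
steen.edges["keys"] = keys
hard_same = resolve(root, "/home/steen/keys") is resolve(root, "/keys")

# Symbolic link: the edge leads to a separate node that only stores a path.
steen.edges["keys"] = Node(symlink="/keys")
soft_same = resolve(root, "/home/steen/keys") is keys
```

In both cases the two paths end at the same entity; with a symbolic link an extra resolution of the stored path is needed to get there.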
Symbolic Link
Name Resolution
Resolving a name involves following a path through the graph;
– E.g. /home/steen/mbox
Closure Mechanism
– Resolution works on the assumption that we know where to start the path from, i.e. where is the root node?
  Is it a node in a higher graph?
  Have we already resolved that node?
– What would you do with the string 0031204430784?
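One way to picture a closure mechanism is a rule that picks the starting node before any edge is followed. The sketch below assumes a file-system-like rule (a leading "/" means the root, anything else means the current working directory); the nested-dictionary graph and its contents are made up.

```python
# Hypothetical naming graph as nested dictionaries; leaves hold the entity itself.
graph = {"home": {"steen": {"mbox": "mailbox-data"}}}


def resolve(path, root, cwd):
    """Closure rule (an assumption of this sketch): a leading '/' starts at the
    root, any other path starts at the current working directory."""
    node = root if path.startswith("/") else cwd
    for label in filter(None, path.split("/")):
        node = node[label]  # follow the edge with this label
    return node


cwd = graph["home"]["steen"]
absolute = resolve("/home/steen/mbox", graph, cwd)
relative = resolve("mbox", graph, cwd)
```

Without such a rule a bare string like `mbox` cannot be resolved at all, because nothing says which node to start from.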
Mounting Points
A directory node can store the identifier of a directory node from a different namespace.
– The node storing the identifier is the mount point; the foreign directory node it refers to is the mounting point
In a collection of distributed namespaces, mounting a foreign namespace requires:
– The name of an access protocol
– The name of the server
– The name of the mounting point in the foreign name space
For example: ftp://cis.payap.ac.th
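The three pieces of information needed for mounting map naturally onto the parts of a URL. The sketch below uses Python's standard `urllib.parse.urlsplit`; the `MountInfo` type and the choice to treat an empty path as "/" are assumptions of this example.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class MountInfo(NamedTuple):
    protocol: str     # name of the access protocol
    server: str       # name of the server
    mount_point: str  # name of the mounting point in the foreign name space


def parse_mount(url):
    parts = urlsplit(url)
    # Assumption: a URL without a path refers to the root of the foreign name space.
    return MountInfo(parts.scheme, parts.hostname, parts.path or "/")


info = parse_mount("ftp://cis.payap.ac.th")
deep = parse_mount("ftp://ftp.cs.vu.nl/pub/globe/index.html")
```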
Foreign Mounting Point
Namespace Implementation
A naming service is implemented by name servers
– For large scale DS, it is distributed across multiple servers
This is separated into layers
– Global Layer
  High level nodes (root node and neighbours), hence relatively fixed & stable.
– Administrational Layer
  Nodes from within a single organisation, e.g. groups of entities, perhaps a node for each department in an organisation
– Managerial Layer
  Frequently changing nodes, e.g. hosts in a local network
DNS example
Global Layer
High Availability is particularly necessary
– If one fails a large part will be unavailable as resolution cannot continue past the failed server.
But, as names rarely change, clients can cache the results
– So speedy results are not as important as availability
Normally implemented using replicated servers
Administrational Layer
Availability is important for clients in the same organisation as the nameserver, but less important for those outside of the organisation.
Responsiveness is much more important at this layer
– Updates need to be processed more quickly, e.g. a new user account needs to be processed quickly.
Managerial Layer
Availability is less demanding
– Can be managed on a single machine
Performance is crucial
– Responses should be immediate
Layer Comparison
Name Resolution Implementation
Choices:
– Iterative or Recursive?
Let's consider needing to resolve:
– root:<nl, vu, cs, ftp, pub, globe, index.html>
– Otherwise known as:
– ftp://ftp.cs.vu.nl/pub/globe/index.html
Iterative Resolution
root:<nl, vu, cs, ftp, pub, globe, index.html>
The root server resolves ‘nl’ and returns that location to the client
– Remaining pathname: nl: <vu, cs, ftp, pub, globe, index.html>
The nl nameserver resolves ‘vu’
– Remaining pathname: vu: <cs, ftp, pub, globe, index.html>
The vu nameserver resolves ‘cs’ and ‘ftp’
– Remaining pathname: ftp: <pub, globe, index.html>
Then the ftp server can return the requested file.
Each time the location of the next server is returned to the client and the client makes a new request.
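The iterative steps above can be sketched as a loop run by the client. The `ZONES` table is a made-up stand-in for the three name servers (each maps the labels it can resolve to the next server), not real DNS data.

```python
# Hypothetical zone data: labels a server can resolve -> name of the next server.
ZONES = {
    "root": {("nl",): "nl"},
    "nl": {("vu",): "vu"},
    "vu": {("cs", "ftp"): "ftp"},
}


def server_resolve(server, path):
    """One request to one server: return (next server, remaining pathname)."""
    for prefix, nxt in ZONES[server].items():
        if tuple(path[: len(prefix)]) == prefix:
            return nxt, path[len(prefix):]
    raise KeyError(path)


def iterative_resolve(path):
    """The client contacts each name server itself, one after the other."""
    server, contacted = "root", []
    while server in ZONES:
        contacted.append(server)
        server, path = server_resolve(server, path)
    return server, path, contacted


name = ["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"]
server, remaining, contacted = iterative_resolve(name)
```

The client ends up holding the ftp server and the remaining pathname, after one request to each of root, nl and vu.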
Iterative Name Resolution
Recursive Resolution
root:<nl, vu, cs, ftp, pub, globe, index.html>
The nameserver passes the request on to the next nameserver it finds;
– i.e. root identifies nl and passes on the request:
– nl: <vu, cs, ftp, pub, globe, index.html>
– nl passes on the request to vu:
– vu: <cs, ftp, pub, globe, index.html>
– vu passes on the request to ftp:
– ftp: <pub, globe, index.html>
Finally the results are returned to the client back through the chain.
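A recursive version can be sketched with a function that calls itself once per name server, so the result travels back through the same chain. As before, `ZONES` is a made-up table standing in for the servers.

```python
# Hypothetical zone data: labels a server can resolve -> name of the next server.
ZONES = {
    "root": {("nl",): "nl"},
    "nl": {("vu",): "vu"},
    "vu": {("cs", "ftp"): "ftp"},
}


def recursive_resolve(server, path, chain=None):
    """Each server resolves its part and forwards the rest to the next server itself."""
    chain = [] if chain is None else chain
    if server not in ZONES:
        return server, path, chain  # nothing left for the name servers to resolve
    chain.append(server)
    for prefix, nxt in ZONES[server].items():
        if tuple(path[: len(prefix)]) == prefix:
            # The result of this call is passed back up to whoever called us.
            return recursive_resolve(nxt, path[len(prefix):], chain)
    raise KeyError(path)


name = ["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"]
server, remaining, chain = recursive_resolve("root", name)
```

The client makes a single call to the root; the forwarding between nl and vu happens on the server side.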
Recursive Resolution
Recursive vs Iterative
Recursive places more demands on the servers
– Which generally makes it prohibitive for global layer servers dealing with many requests
Recursive vs Iterative
Recursive name resolution enables each server to learn the address of lower level nodes
– And cache these results
This makes subsequent requests much quicker
– The results can be cached both by the root server and every other server in the chain
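Caching in the recursive scheme can be sketched by giving every server its own dictionary of results. The `ZONES` table and the `forwarded` log are assumptions of the example; cache expiry is ignored.

```python
# Hypothetical zone data: labels a server can resolve -> name of the next server.
ZONES = {
    "root": {("nl",): "nl"},
    "nl": {("vu",): "vu"},
    "vu": {("cs", "ftp"): "ftp"},
}
caches = {name: {} for name in ZONES}  # one cache per name server
forwarded = []                         # servers that had to do real work


def resolve(server, path):
    key = tuple(path)
    if server not in ZONES:
        return server, key
    if key in caches[server]:
        return caches[server][key]  # answered from this server's cache
    forwarded.append(server)
    for prefix, nxt in ZONES[server].items():
        if key[: len(prefix)] == prefix:
            result = resolve(nxt, key[len(prefix):])
            caches[server][key] = result  # every server on the chain learns it
            return result
    raise KeyError(key)


name = ["nl", "vu", "cs", "ftp", "pub", "globe", "index.html"]
first = resolve("root", name)
work_after_first = len(forwarded)   # root, nl and vu each did a lookup
second = resolve("root", name)
work_after_second = len(forwarded)  # unchanged: root answered from its cache
```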
Recursive Caching
Recursive vs Iterative
Recursive can also be cheaper in terms of communication
– Consider if the request in the example given was made from Chiang Mai…
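A back-of-the-envelope comparison, assuming the client is far from the name servers while the servers are close to each other. The latencies below are invented round numbers, not measurements.

```python
def iterative_cost(n_servers, long_ms):
    # The client talks to every server itself: one long round trip each.
    return n_servers * 2 * long_ms


def recursive_cost(n_servers, long_ms, short_ms):
    # One long round trip to the first server, then short hops between servers.
    return 2 * long_ms + (n_servers - 1) * 2 * short_ms


LONG_MS, SHORT_MS = 150, 5  # assumed one-way latencies in milliseconds
iterative_ms = iterative_cost(3, LONG_MS)             # 3 * 2 * 150 = 900
recursive_ms = recursive_cost(3, LONG_MS, SHORT_MS)   # 300 + 2 * 2 * 5 = 320
```

Under these assumptions the recursive scheme crosses the long-distance link only once, so most of the communication cost disappears.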