Operating Systems In Depth XXX–1 Copyright © 2019 Thomas W. Doeppner. All rights reserved.
Distributed File Systems Part 1
Distributed File Systems
NAS vs. SAN
[Figure: four client computers, a file server (providing NAS), a database server, and two storage servers, connected by a LAN and a SAN]
DFS Components
• Data state
  – file contents
• Attribute state
  – size, access-control info, modification time, etc.
• Open-file state
  – which files are in use (open)
  – lock state
Possible Locations
[Figure: two clients, each with a data cache, an attribute cache, and open-file state; a server with a data cache, an attribute cache, open-file state, and the local file system]
Quiz 1
We’d like to design a file server that serves multiple Unix client computers. Assuming no computer ever crashes and the network is always up and working flawlessly, we’d like file-oriented system calls to behave as if all parties were on a single computer.
a) It can’t be done
b) It can be done, but requires disabling all client-side caching
c) It can be done, but sometimes requires disabling client-side caching
d) It can be done, irrespective of client-side caching
Guiding Principle
Principle of least astonishment (PLA)
– people don’t like surprises, particularly when they come from file systems
Single-Thread Consistency
File System
write(fd, buf1, size1);
read(fd, buf2, size2);
// no surprises if single-thread consistent
// Operations are time-ordered
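The write-then-read guarantee can be checked locally with a small sketch. The function name `write_then_read` and the scratch path are illustrative assumptions, and the sketch rewinds the file offset before reading, a step the slide leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a string, then read it back through the same descriptor.
 * Returns 0 if the read observed the preceding write, -1 otherwise. */
int write_then_read(void)
{
    char path[] = "/tmp/stc-XXXXXX";      /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    const char buf1[] = "hello";
    char buf2[sizeof buf1] = {0};
    int ok = -1;

    if (write(fd, buf1, sizeof buf1) == (ssize_t)sizeof buf1 &&
        lseek(fd, 0, SEEK_SET) == 0 &&    /* rewind to the start */
        read(fd, buf2, sizeof buf2) == (ssize_t)sizeof buf2 &&
        memcmp(buf1, buf2, sizeof buf1) == 0)
        ok = 0;                           /* the read saw the write */

    close(fd);
    unlink(path);
    return ok;
}
```

On a single-thread-consistent file system the function should return 0 every time it is called.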
Single-Client Consistency
File System
% cp x y
%
% cmp x y
%
Distributed Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
Strict Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
write(fd1, "A", 2);
write(fd2, "B", 2);
// an instant later …
read(fd1, buf1, 2);
read(fd2, buf2, 2);
// buf1 contains "A"
// buf2 contains "B"
Weak Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
write(fd1, "A", 2);
write(fd2, "B", 2);
// a while later …
read(fd1, buf1, 2);
read(fd2, buf2, 2);
// maybe buf1 contains "A"
// maybe buf2 contains "B"
Sequential Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
write(fd1, "A", 2);
write(fd2, "B", 2);
// an instant later …
read(fd1, buf1, 2);
read(fd2, buf2, 2);
// if buf2 contains "B"
// then buf1 contains "A"
Sequential Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
write(fd1, "A", 2);
write(fd2, "B", 2);
// an instant later …
read(fd1, buf1, 2);
read(fd2, buf2, 2);
// buf1 and buf2 contain "X"
“I just updated the file!”
“No you didn’t!”
Entry Consistency
[Figure: Ted’s computer and Alice’s computer, both using the file system]
writelock(fd);
write(fd, "B", 2);
unlock(fd);
// an instant later …
readlock(fd);
read(fd, buf, 2);
unlock(fd);
// buf now contains "B"
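The slide's `writelock`, `readlock`, and `unlock` are not standard calls. One way to sketch them, assuming POSIX advisory record locks via `fcntl`, is shown below. The helper `set_lock`, the function `entry_demo`, and the scratch path are illustrative assumptions. The demo runs both sections in one process, whereas the slide has them on two computers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Apply a whole-file advisory lock of the given type
 * (F_RDLCK, F_WRLCK, or F_UNLCK), blocking until it is granted. */
static int set_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

int writelock(int fd) { return set_lock(fd, F_WRLCK); }
int readlock(int fd)  { return set_lock(fd, F_RDLCK); }
int unlock(int fd)    { return set_lock(fd, F_UNLCK); }

/* A writer section followed by a reader section on the same file.
 * Returns 0 if the reader sees what the writer wrote. */
int entry_demo(void)
{
    char path[] = "/tmp/entry-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    char buf[2] = {0};
    int ok = -1;

    if (writelock(fd) == 0 &&
        pwrite(fd, "B", 2, 0) == 2 &&    /* "B" plus its terminating NUL */
        unlock(fd) == 0 &&
        readlock(fd) == 0 &&
        pread(fd, buf, 2, 0) == 2 &&
        unlock(fd) == 0 &&
        strcmp(buf, "B") == 0)
        ok = 0;

    close(fd);
    unlink(path);
    return ok;
}
```

Under entry consistency the file system only has to make data current at lock and unlock boundaries, which is why the reader's `readlock` is the point at which it must see "B".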
In Practice …
• Data state
– NFS
- single-client consistent
- weakly consistent
– CIFS
- strictly consistent
• Lock state
– must be strictly consistent
Thursday morning, November 17th At 7:00 a.m.
Maytag, the department’s central file server, will be taken down to kick off a filesystem consistency check.
Linux machines will hang. All Windows users should log off.
Normal operation will resume by 8:30 a.m. if all goes well. All Windows users should log off before this time.
Questions/concerns to [email protected]
Failures in a Local File System
[Figure, two panels: a server with an on-disk file system, a cache, and open-file state (a table with entries 0 … n–1 and an entry reading "1 rw 0"), together with four clients]
Distributed Failure
[Figure, two panels: a server with an on-disk file system, a cache, and open-file state (a table with entries 0 … n–1 and an entry reading "1 rw 0"); the first panel shows four clients, the second shows one]
Quiz 2
We’d like to design a file server that serves multiple Unix client computers, but we now realize we must cope with failures. Which one of the following is not true?
a) If we relax Unix system-call semantics a bit, this is easy
b) If we don’t relax Unix system-call semantics, it’s doable, but we need to introduce some new error messages for certain situations
c) There are failure modes that can’t possibly occur if all parties were on the same computer
d) At least one of the above statements is false
In Practice …
• NFS version 2
  – relaxed approach to consistency
  – handles failures pretty well
• CIFS
  – strictly consistent
  – intolerant of failures
NFS Version 2
• Released in the mid-1980s
• Three protocols in one
  – file protocol
  – mount protocol
  – network lock manager protocol
[Figure labels: Basic NFS; Extended NFS]
Distribution of Components
[Figure: two NFSv2 clients, each with a data cache, an attribute cache, and open-file state; an NFSv2 server with a data cache, an attribute cache, and the local file system]
NFS in Action
char buffer[100];
int fd = open("/home/twd/dir/fileX", O_RDWR);
read(fd, buffer, 100);
…
lseek(fd, 0, SEEK_SET);
write(fd, buffer, 100);
Open-File Data Structures (Client)
[Figure: a file descriptor in the user address space; in the kernel address space, the file-descriptor table (entries 0 … n–1) and an open-file entry holding ref count, access mode, file location, and file handle + comm. handle, which refers to the file on the server]
However …
int fd = creat("/home/twd/dir/tempfile", 0600);
char buf[1024];
unlink("/home/twd/dir/tempfile");
…
write(fd, buf, 1024);
…
lseek(fd, 0, SEEK_SET);
read(fd, buf, 1024);
close(fd);
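On a local Unix file system this pattern works because the kernel's open-file state keeps the file alive after its name is removed. The sketch below checks that local behavior only. The function name and scratch path are illustrative assumptions, and it says nothing about what a server without open-file state would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a file, unlink its name, and keep using the descriptor.
 * Returns 0 if the unlinked file can still be written and read. */
int unlinked_file_still_usable(void)
{
    char path[] = "/tmp/unl-XXXXXX";     /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    char out[4] = {0};
    int ok = -1;

    if (access(path, F_OK) != 0 &&       /* the name is gone */
        write(fd, "abc", 4) == 4 &&      /* but the file is still writable */
        pread(fd, out, 4, 0) == 4 &&     /* and readable */
        strcmp(out, "abc") == 0)
        ok = 0;

    close(fd);                           /* storage is released here */
    return ok;
}
```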
And …
int fd = creat("/home/twd/dir/permfile", 0600);
char buf[1024];
chmod("/home/twd/dir/permfile", 0400);
…
write(fd, buf, 1024);
…
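Locally, Unix checks access permissions when a file is opened, so a descriptor opened for writing keeps working after the mode is changed. A minimal sketch of that local behavior follows. The function name and scratch path are illustrative assumptions. A server that holds no open-file state cannot tell that the file was opened before the `chmod`, which is presumably the difficulty this slide points at.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a file read-write, make it read-only, then write through the
 * already-open descriptor. Returns 0 if the write still succeeds. */
int write_after_chmod(void)
{
    char path[] = "/tmp/perm-XXXXXX";    /* hypothetical scratch file */
    int fd = mkstemp(path);              /* created with mode 0600 */
    if (fd < 0)
        return -1;

    int ok = -1;
    if (chmod(path, 0400) == 0 &&        /* owner may now only read */
        write(fd, "x", 1) == 1)          /* access was checked at open time */
        ok = 0;

    close(fd);
    unlink(path);
    return ok;
}
```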
RPC Semantics
• All requests done with ONC RPC
• Most are idempotent
• A few aren’t
  – e.g. unlink
• Made reasonably reliable with DRC
  – susceptible to Byzantine routers and poorly timed crashes
    - crashes affect ability to handle retransmitted requests correctly
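A DRC (duplicate request cache) can be sketched as a small table keyed by client and RPC transaction id. Everything below is a toy model under stated assumptions: the table size, the hash, the client identifier, and the one-file server state are all hypothetical, and real servers differ in detail.

```c
#include <assert.h>
#include <stdint.h>

#define DRC_SLOTS 64             /* hypothetical cache size */

struct drc_entry {
    int      in_use;
    uint32_t client;             /* hypothetical client identifier */
    uint32_t xid;                /* RPC transaction id */
    int      reply;              /* result of the first execution */
};

static struct drc_entry drc[DRC_SLOTS];
static int file_exists = 1;      /* toy server state: a single file */

/* Non-idempotent operation: succeeds once, fails if repeated. */
static int do_unlink(void)
{
    if (!file_exists)
        return -1;               /* "no such file" */
    file_exists = 0;
    return 0;
}

/* Handle an unlink request, replaying the cached reply when the same
 * (client, xid) pair arrives again as a retransmission. */
int handle_unlink(uint32_t client, uint32_t xid)
{
    struct drc_entry *e = &drc[(client ^ xid) % DRC_SLOTS];

    if (e->in_use && e->client == client && e->xid == xid)
        return e->reply;         /* duplicate: do not execute again */

    int reply = do_unlink();
    e->in_use = 1;
    e->client = client;
    e->xid = xid;
    e->reply = reply;
    return reply;
}
```

Because the table lives in memory, a server crash between the first execution and the retransmission loses the entry, and the retransmitted unlink would then fail. That matches the slide's caveat about poorly timed crashes.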
NFS Mount Protocol
[Figure: a server, a client, and an approved list]
File Handles
• Servers provide opaque file handles to clients to refer to files
  – contents mean nothing to clients
  – identify files on server
• Clients contact server via mount protocol to obtain file handles of roots of exported file systems
File Handle Contents
• File-System ID
  – which server file system
• File ID
  – which file within file system
• Generation #
  – guards against inode reuse
[Figure: handle layout: File-System ID | File ID | Generation #]
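As a sketch of how a server might pack these three fields into the opaque handle it gives to clients: NFSv2 file handles are 32 opaque bytes, but the field widths and their placement below are assumptions made for illustration, since the layout is private to each server.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FHSIZE 32                 /* NFSv2 file handles are 32 opaque bytes */

struct fh_fields {                /* hypothetical server-private layout */
    uint32_t fsid;                /* which server file system */
    uint32_t fileid;              /* which file within it (e.g. inode number) */
    uint32_t generation;          /* guards against inode reuse */
};

/* Server side: build the opaque handle given to clients. */
void fh_pack(unsigned char fh[FHSIZE], const struct fh_fields *f)
{
    memset(fh, 0, FHSIZE);        /* unused bytes are zero padding */
    memcpy(fh, f, sizeof *f);
}

/* Server side: recover the fields from a handle a client sent back. */
void fh_unpack(const unsigned char fh[FHSIZE], struct fh_fields *f)
{
    memcpy(f, fh, sizeof *f);
}
```

Clients never call anything like `fh_unpack`; they store the 32 bytes and return them unchanged with each request.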
File Handle Problem
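[Figure: timeline involving Client 1, Client 2, and the server]
Client 1: fd = open("x", …);
  Server: returns file handle for x
Client 2: unlink("x");
  Server: deletes x
Client 2: creat("y", …);
  Server: creates y, using inode previously used for x
Client 1: write(fd, buf, size);
  Server: modifies y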
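A generation number lets the server reject the stale handle instead of modifying y. The toy model below sketches the check. The inode table, the function names, and the use of `ESTALE` as the error value are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NINODES 4                         /* toy inode table */

struct inode  { int in_use; uint32_t generation; };
struct handle { uint32_t ino; uint32_t generation; };

static struct inode inodes[NINODES];

/* Allocate the first free inode and bump its generation number. */
int server_create(struct handle *h)
{
    for (uint32_t i = 0; i < NINODES; i++) {
        if (!inodes[i].in_use) {
            inodes[i].in_use = 1;
            inodes[i].generation++;       /* new file, new generation */
            h->ino = i;
            h->generation = inodes[i].generation;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Free the inode; its generation number is kept for the next user. */
void server_remove(const struct handle *h)
{
    inodes[h->ino].in_use = 0;
}

/* Accept a write only if the handle's generation matches the inode's. */
int server_write(const struct handle *h)
{
    const struct inode *ip = &inodes[h->ino];
    if (!ip->in_use || ip->generation != h->generation)
        return -ESTALE;                   /* handle names a deleted file */
    return 0;
}
```

Replaying the slide's scenario: create x, remove x, create y (which reuses x's inode), then write with x's handle. The write returns `-ESTALE` rather than touching y.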
Server File Systems
[Figure: the server’s directory tree, rooted at /, with top-level directories A through E and descendants F through Z and 1 through 3]
Client vs. Server Mount Points (1)
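[Figure: the server’s directory tree (root /, directories A through E, descendants F through Z and 1 through 3) and a client tree rooted at / with directories C1 and C2]
mount server:/B /C2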
Client vs. Server Mount Points (2)
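[Figure: the server’s directory tree (root /, directories A through E, descendants F through Z and 1 through 3) and a client tree rooted at / with directories C1 and C2]
mount server:/B /C2
mount server:/B/F/K /C2/F/K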
Sub Mounts
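[Figure: the server’s directory tree (root /, directories A through E, descendants F through Z and 1 through 3) and a client tree rooted at / with directories C1 and C2]
mount server:/B/F/K /C2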
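However …
server% ls -ld /B/F
drwxr-x--- 2 tom friends 1024 …
[Figure: the server’s directory tree (root /, directories A through E, descendants F through Z and 1 through 3) and a client tree rooted at / with directories C1 and C2]
mount server:/B/F/K /C2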
Local vs. Global Namespace
• Local namespace
  – each host configures its own file-system namespace
  – NFS clients each mount the appropriate remote file systems
• Global namespace
  – all hosts share the same namespace
  – not done in early NFS
Mount Protocol Problems
• Local namespaces don’t work
• Achieve global namespace by having each client mount everything consistently
• giving each client a table listing all possible mounts is administratively difficult
• performing all possible mounts is time-consuming
• mounting is a “heavyweight” operation
Rather than this …
[Figure: directories home, etc, and dev, with entries ben, jarod, david, natalie, ankita, sandy, ethan, peter, and tynan]
… this
[Figure: directories home, etc, and dev, with Autofs and an automount database]
Automounting: 2000
• Maintain description of global namespace in global database: NIS
• Do mounts only when needed
• Automount times out after a period of disuse
Automounting: 2019
• Global namespace maintained in LDAP database
  – lightweight directory access protocol
    - vendor neutral
  – everything mounted at boot time
    - fewer, but larger, file systems
  – no timeout
NFS Consistency
[Figure: a server, process A, and process B, each shown with a data cache holding block N]
The Attribute Cache
[Figure: the data cache holds file x blocks 1 and 5 and file y blocks 2 and 17; the attribute cache holds the attributes of file x and file y, each with a validity period]
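The validity period can be sketched as a timestamp check on each attribute-cache entry. The structure, its field names, and the three-second period used in the usage note are assumptions made for illustration.

```c
#include <assert.h>
#include <time.h>

struct attr_entry {
    long   size;        /* cached attributes (only the size, for brevity) */
    time_t fetched;     /* when the attributes were obtained from the server */
    time_t validity;    /* validity period, in seconds */
};

/* Nonzero if the entry may be used at time `now` without asking the
 * server; zero if the attributes must be fetched again first. */
int attrs_valid(const struct attr_entry *e, time_t now)
{
    return now - e->fetched < e->validity;
}
```

With attributes fetched at time 1000 and a three-second validity period, `attrs_valid` is true at time 1002 and false from time 1003 onward. During the valid window the client may use the cached attributes even if another client has changed the file.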
More …
• All write RPC requests must be handled synchronously on the server
• Close-to-Open consistency
  – client writes back all changes on close
  – flushes cached file info on open
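Close-to-open consistency can be modeled with a toy server and two client caches. This is a sketch under assumptions: the version counter stands in for whatever the client compares on open, and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct server { char data[16]; unsigned version; };

struct client {
    char     cache[16];
    unsigned cached_version;
    int      have_cache;
    int      dirty;
};

/* Open: discard cached contents if the server's copy has changed. */
void c_open(struct client *c, const struct server *s)
{
    if (!c->have_cache || c->cached_version != s->version) {
        memcpy(c->cache, s->data, sizeof c->cache);
        c->cached_version = s->version;
        c->have_cache = 1;
    }
}

/* Write: update the local cache only; the server is not contacted. */
void c_write(struct client *c, const char *text)
{
    strncpy(c->cache, text, sizeof c->cache - 1);
    c->cache[sizeof c->cache - 1] = '\0';
    c->dirty = 1;
}

/* Close: write all changes back to the server. */
void c_close(struct client *c, struct server *s)
{
    if (c->dirty) {
        memcpy(s->data, c->cache, sizeof s->data);
        s->version++;
        c->cached_version = s->version;
        c->dirty = 0;
    }
}
```

If clients A and B both open the file and A writes and closes, B keeps seeing the old contents until B opens the file again. Only an open that follows the other client's close is guaranteed to see the new data.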
Client Crash Recovery
[Figure: a server, process A, and process B, each shown with a buffer cache]
Server Crash Recovery
[Figure: a server, process A, and process B, each shown with a buffer cache]
File Locking
• State is required on the server!
  – recovery must take place in the event of client and server crashes
Quiz 3
Can it be determined by a server that one of its clients has crashed?
a) no
b) with high probability
c) yes
Network Lock Manager Protocol
[Figure: a server, process A, and process B, each shown with a buffer cache and with lockd and statd]
Locks
• Coverage
  – locks cover a region of a file: starting at some offset, extending for some length
  – the region may extend beyond the current end of the file
• Types
  – exclusive locks: may not overlap with any type of lock
  – shared locks: may overlap with other shared locks
• Enforcement
  – advisory: no enforcement
  – mandatory: enforced (and not supported in NFS versions 2 and 3)
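The coverage and type rules reduce to a small conflict test. The sketch below uses hypothetical names and half-open byte ranges, and it models only the rules on this slide, not any particular lock manager.

```c
#include <assert.h>

enum lock_type { LOCK_SHARED, LOCK_EXCLUSIVE };

struct region_lock {
    enum lock_type type;
    long start;          /* offset of the first locked byte */
    long len;            /* number of bytes; may run past end of file */
};

/* Nonzero if the byte ranges [start, start+len) intersect. */
static int overlaps(const struct region_lock *a, const struct region_lock *b)
{
    return a->start < b->start + b->len && b->start < a->start + a->len;
}

/* Two locks conflict if they overlap and at least one is exclusive. */
int conflicts(const struct region_lock *a, const struct region_lock *b)
{
    if (!overlaps(a, b))
        return 0;
    return a->type == LOCK_EXCLUSIVE || b->type == LOCK_EXCLUSIVE;
}
```

Two shared locks on overlapping ranges coexist, an exclusive lock conflicts with anything it overlaps, and adjacent ranges such as bytes 0 to 9 and 10 to 14 never conflict.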
Status Monitor
• Maintains list of monitored hosts on stable storage
  – clients maintain list of servers on which locks are held
  – servers maintain list of clients who have locks
• On restart
  – reads list of monitored hosts from stable storage and sends each an SM_NOTIFY RPC
Locking a File
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd. Messages shown: SM_MON, OK, NLM_LOCK, SM_MON, OK, LCK_GRANTED]
Unlocking a File
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd. Messages shown: SM_MON, OK, NLM_UNLOCK, SM_MON, OK, LCK_GRANTED]
Delayed Locking
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd. Messages shown: NLM_LOCK, LCK_BLOCKED, NLM_GRANTED, LCK_GRANTED]
Never Mind …
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd. Messages shown: NLM_LOCK, LCK_BLOCKED, NLM_CANCEL, LCK_GRANTED]
Server Crash Recovery
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd]
1: SM_NOTIFY
2: SM_UNMON_ALL (reply: OK)
3: callback
4: NLM_LOCK, reclaim=true
5: SM_MON (reply: OK)
6: LCK_GRANTED
7: SM_MON (reply: OK)
8: NLM_LOCK, reclaim=false
Client Crash Recovery
[Figure: message exchange among the client’s statd and lockd and the server’s lockd and statd]
1: SM_NOTIFY (reply: STAT_SUCC)
2: callback
3: NLM_LOCK
Also shown: SM_MON (reply: OK), LCK_GRANTED
NFS Version 3
• Still in common use
• Basically the same as NFSv2
  – improved handling of attributes
  – commit operation for writes
  – append operation
  – various other improvements