A4: Layered Block-Structured File System
CS 4410 Operating Systems
Slides originally by Robbert van Renesse.
Introduction
Layered Abstractions to access storage (HIGHLY SIMPLIFIED FIGURE 11.7 from book)
• File System: abstraction that provides persistent, named data
• Block Store: abstraction providing access to a sequence of numbered blocks. (No names.)
• Physical Device (e.g., DISK): sectors identified with logical block addresses, specifying surface, track, and sector to be accessed.
Block Store Abstraction
Provides a disk-like interface:
• a sequence of blocks numbered 0, 1, … (typically a few KB)
• you can read or write 1 block at a time
nblocks() returns size of the block store in #blocks
read(block_num) returns contents of given block number
write(block_num, block) writes block contents at given block num
setsize(size) sets the size of the block store
A4 has you work with multiple versions/instantiations of this abstraction.
Heads up about the code! This entire code base is what happens when you want object-oriented programming, but you only have C.
Put on your C++/Java Goggles!
block_store_t (a block store type) is essentially an abstract class
Contents of block_store.h
#define BLOCK_SIZE 512 // # bytes in a block
typedef unsigned int block_no; // index of a block
typedef struct block {
    char bytes[BLOCK_SIZE];
} block_t;
typedef struct block_store { // ← poor man's class
    void *state;
    int (*nblocks)(struct block_store *this_bs);
    int (*read)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*write)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*setsize)(struct block_store *this_bs, block_no size);
    void (*destroy)(struct block_store *this_bs);
} block_store_t;
None of this is data! All typedefs!
Block Store Instructions
• block_store_t *xxx_init(…) ← "constructor"
  – Name & signature varies, sets up the fn pointers
• int nblocks(…)
• read(…)
• write(…)
• setsize(…)
• destroy() ← "destructor"
  – frees everything associated with this block store
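To make the "constructor sets up the fn pointers" idea concrete, here is a minimal sketch of an in-memory block store behind the block_store_t interface. The names memstore_init and struct memstore_state are hypothetical and are not part of the A4 code base; the return convention (0 on success, -1 on failure) is also an assumption, since the slides do not state one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 512                 // # bytes in a block
typedef unsigned int block_no;         // index of a block
typedef struct block { char bytes[BLOCK_SIZE]; } block_t;

typedef struct block_store {
    void *state;
    int (*nblocks)(struct block_store *this_bs);
    int (*read)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*write)(struct block_store *this_bs, block_no offset, block_t *block);
    int (*setsize)(struct block_store *this_bs, block_no size);
    void (*destroy)(struct block_store *this_bs);
} block_store_t;

// Hypothetical layer-private state: a heap array of blocks.
struct memstore_state {
    block_t *blocks;
    block_no nblocks;
};

static int memstore_nblocks(block_store_t *this_bs){
    struct memstore_state *ms = this_bs->state;
    return (int) ms->nblocks;
}

static int memstore_read(block_store_t *this_bs, block_no offset, block_t *block){
    struct memstore_state *ms = this_bs->state;
    if (offset >= ms->nblocks) return -1;      // assumed error convention
    memcpy(block, &ms->blocks[offset], BLOCK_SIZE);
    return 0;
}

static int memstore_write(block_store_t *this_bs, block_no offset, block_t *block){
    struct memstore_state *ms = this_bs->state;
    if (offset >= ms->nblocks) return -1;
    memcpy(&ms->blocks[offset], block, BLOCK_SIZE);
    return 0;
}

static int memstore_setsize(block_store_t *this_bs, block_no size){
    struct memstore_state *ms = this_bs->state;
    // Allocate at least one block so realloc never sees size 0.
    block_t *grown = realloc(ms->blocks, (size ? size : 1) * sizeof(block_t));
    if (grown == NULL) return -1;
    if (size > ms->nblocks)                    // new blocks start out zeroed
        memset(&grown[ms->nblocks], 0, (size - ms->nblocks) * sizeof(block_t));
    ms->blocks = grown;
    ms->nblocks = size;
    return 0;
}

static void memstore_destroy(block_store_t *this_bs){
    struct memstore_state *ms = this_bs->state;
    free(ms->blocks);
    free(ms);
    free(this_bs);
}

// "Constructor": allocates the state and fills in the function pointers.
block_store_t *memstore_init(block_no nblocks){
    struct memstore_state *ms = calloc(1, sizeof(*ms));
    block_store_t *this_bs = calloc(1, sizeof(*this_bs));
    assert(ms != NULL && this_bs != NULL);
    ms->blocks = calloc(nblocks ? nblocks : 1, sizeof(block_t));
    assert(ms->blocks != NULL);
    ms->nblocks = nblocks;
    this_bs->state = ms;
    this_bs->nblocks = memstore_nblocks;
    this_bs->read = memstore_read;
    this_bs->write = memstore_write;
    this_bs->setsize = memstore_setsize;
    this_bs->destroy = memstore_destroy;
    return this_bs;
}
```

A caller only ever touches the function pointers, e.g. (*bs->write)(bs, 2, &block), so any other block store can be swapped in without changing the calling code.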
sample.c -- just a lone disk
#include ...
#include <string.h> // for strcpy
#include "block_store.h"

int main(){
    block_store_t *disk = disk_init("disk.dev", 1024);
    block_t block;
    strcpy(block.bytes, "Hello World");
    (*disk->write)(disk, 0, &block);
    (*disk->destroy)(disk);
    return 0;
}
RUN IT! IT'S COOL!
> gcc -g block_store.c sample.c
> ./a.out
> less disk.dev
Block Stores can be Layered!
Each layer presents a block store abstraction
block_store layers (top to bottom):
• CACHEDISK: keeps a cache of recently used blocks
• STATDISK: keeps track of #reads and #writes for statistics
• DISK: keeps blocks in a Linux file
A Cache for the Disk? Yes!
All requests for a given block go through the block cache
Layers (top to bottom): File System AKA treedisk, Block Cache AKA cachedisk, Disk
• Benefit #1: Performance
  – Caches recently read blocks
  – Buffers recently written blocks (to be written later)
• Benefit #2: Synchronization: For each entry, OS adds information to:
  – prevent a process from reading block while another writes
  – ensure that a given block is only fetched from storage device once, even if it is simultaneously read by many processes
layer.c -- code with layers
#define CACHE_SIZE 10 // #blocks in cache
block_t cache[CACHE_SIZE];

int main(){
    block_store_t *disk = disk_init("disk2.dev", 1024);
    block_store_t *sdisk = statdisk_init(disk);
    block_store_t *cdisk = cachedisk_init(sdisk, cache, CACHE_SIZE);
    block_t block;
    strcpy(block.bytes, "Farewell World!");
    (*cdisk->write)(cdisk, 0, &block);
    (*cdisk->destroy)(cdisk);
    (*sdisk->destroy)(sdisk);
    (*disk->destroy)(disk);
    return 0;
}
RUN IT! IT'S COOL!
> gcc -g block_store.c statdisk.c cachedisk.c layer.c
> ./a.out
> less disk2.dev
Layers (top to bottom): CACHEDISK, STATDISK, DISK
Example Layers
block_store_t *statdisk_init(block_store_t *below);
// counts all reads and writes
block_store_t *debugdisk_init(block_store_t *below, char *descr);
// prints all reads and writes
block_store_t *checkdisk_init(block_store_t *below);
// checks that what's read is what was written
block_store_t *disk_init(char *filename, int nblocks)
// simulated disk stored on a Linux file
// (could also use real disk using /dev/*disk devices)
block_store_t *ramdisk_init(block_t *blocks, nblocks)
// a simulated disk in memory, fast but volatile
How to write a layer
struct statdisk_state { // layer-specific data
    block_store_t *below; // block store below
    unsigned int nread, nwrite; // stats
};

block_store_t *statdisk_init(block_store_t *below){
    struct statdisk_state *sds = calloc(1, sizeof(*sds));
    sds->below = below;
    block_store_t *this_bs = calloc(1, sizeof(*this_bs));
    this_bs->state = sds;
    this_bs->nblocks = statdisk_nblocks;
    this_bs->setsize = statdisk_setsize;
    this_bs->read = statdisk_read;
    this_bs->write = statdisk_write;
    this_bs->destroy = statdisk_destroy;
    return this_bs;
}
statdisk implementation (cont'd)
Each operation records the stats and passes the request to the layer below.
int statdisk_read(block_store_t *this_bs, block_no offset, block_t *block){
    struct statdisk_state *sds = this_bs->state;
    sds->nread++;
    return (*sds->below->read)(sds->below, offset, block);
}

int statdisk_write(block_store_t *this_bs, block_no offset, block_t *block){
    struct statdisk_state *sds = this_bs->state;
    sds->nwrite++;
    return (*sds->below->write)(sds->below, offset, block);
}

void statdisk_destroy(block_store_t *this_bs){
    free(this_bs->state);
    free(this_bs);
}
Another Possible Layer: Treedisk
• A file system, similar to Unix file systems
• Initialized to support N virtual block stores (AKA files)
• Underlying block store (below) partitioned into 3 sections:
  1. Superblock: block #0
  2. Fixed number of i-node blocks: starts at block #1
     – Function of N (enough to store N i-nodes)
  3. Remaining blocks: starts after i-node blocks
     – data blocks, free blocks, indirect blocks, freelist blocks
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, i-node blocks, remaining blocks
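The "function of N" for the number of i-node blocks can be sketched as a rounding-up division. This is an illustration under stated assumptions, not the A4 code: it assumes i-nodes are packed back to back so that INODES_PER_BLOCK is BLOCK_SIZE / sizeof(struct treedisk_inode), and the helper names inode_blocks_needed and first_remaining_block are hypothetical.

```c
#include <assert.h>

#define BLOCK_SIZE 512
typedef unsigned int block_no;

struct treedisk_inode {
    block_no nblocks;   // # blocks in virtual block store
    block_no root;      // block # of root node of tree (or 0)
};

// Assumption: i-nodes are packed back to back within an i-node block.
#define INODES_PER_BLOCK (BLOCK_SIZE / sizeof(struct treedisk_inode))

// Number of i-node blocks needed to hold n_inodes i-nodes (rounds up).
block_no inode_blocks_needed(unsigned int n_inodes){
    assert(n_inodes > 0);   // a file system with no i-nodes holds no files
    return (block_no) ((n_inodes + INODES_PER_BLOCK - 1) / INODES_PER_BLOCK);
}

// Block 0 is the superblock, i-node blocks start at block 1,
// so the remaining blocks start right after the last i-node block.
block_no first_remaining_block(unsigned int n_inodes){
    return 1 + inode_blocks_needed(n_inodes);
}
```

With 4-byte block numbers an i-node takes 8 bytes, so one 512-byte block would hold 64 i-nodes under this assumption.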
Types of Blocks in Treedisk
• Superblock: the 0th block below
• Freelist block: list of all unused blocks below
• I-node block: list of inodes
• Indir block: list of blocks
• Data block: just data
union treedisk_block {
    block_t datablock;
    struct treedisk_superblock superblock;
    struct treedisk_inodeblock inodeblock;
    struct treedisk_freelistblock freelistblock;
    struct treedisk_indirblock indirblock;
};
treedisk Superblock
// one per underlying block store
struct treedisk_superblock {
    block_no n_inodeblocks;
    block_no free_list; // 1st block on free list
                        // 0 means no free blocks
};
Notice: there are no pointers. Everything is a block number.
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, inode blocks, remaining blocks
superblock in the figure: n_inodeblocks 4, free_list ? (some green box)
treedisk Free List
struct treedisk_freelistblock {
    block_no refs[REFS_PER_BLOCK];
};
refs[0]: # of another freelist block, or 0 if end of list
refs[i]: # of a free block for i ≥ 1, or 0 if slot empty
treedisk free list example (suppose REFS_PER_BLOCK = 4)
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, inode blocks, remaining blocks
superblock: n_inodeblocks 4, free_list 13
freelist blocks (refs): 9 14 15 0; 5 10 11 12; 0 6 7 8
Diagram: superblock free_list → freelist block → … → freelist block (refs[0] = 0); the other refs point to free blocks.
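Walking the free list follows directly from the refs[0] / refs[i] convention. The sketch below counts the free blocks listed in the chain. It is an illustration only: the array `disk` is a hypothetical in-memory stand-in for the block store below, and count_listed_free_blocks is not a function from the A4 code. It deliberately does not count the freelist blocks themselves.

```c
#include <assert.h>

#define REFS_PER_BLOCK 4          // tiny value, for illustration only
typedef unsigned int block_no;

struct treedisk_freelistblock {
    block_no refs[REFS_PER_BLOCK];
};

// Count the free blocks listed on the chain that starts at free_list.
// refs[0] links to the next freelist block (0 ends the chain);
// refs[1..] name free blocks (0 means the slot is empty).
unsigned int count_listed_free_blocks(const struct treedisk_freelistblock *disk,
                                      block_no ndisk, block_no free_list){
    unsigned int count = 0;
    block_no b = free_list;
    while (b != 0) {
        assert(b < ndisk);                    // chain must stay on the disk
        for (int i = 1; i < REFS_PER_BLOCK; i++)
            if (disk[b].refs[i] != 0)
                count++;
        b = disk[b].refs[0];                  // follow link to next freelist block
    }
    return count;
}
```

A free_list value of 0 means the loop never runs, matching the "0 means no free blocks" comment in the superblock.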
treedisk I-node block
struct treedisk_inodeblock {
    struct treedisk_inode inodes[INODES_PER_BLOCK];
};
struct treedisk_inode {
    block_no nblocks; // # blocks in virtual block store
    block_no root; // block # of root node of tree (or 0)
};
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, inode blocks, remaining blocks
[Figure: two inode blocks, each holding inode[0] and inode[1]; suppose REFS_PER_BLOCK = 4. Values shown: "1 15 0 0" and "9 14 0 0".]
What if the file is bigger than 1 block?
treedisk Indirect block
struct treedisk_indirblock {
    block_no refs[REFS_PER_BLOCK];
};
Example (suppose INODES_PER_BLOCK = 2)
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, inode blocks, remaining blocks
inode block: inode[0]: nblocks 1, root 15; inode[1]: nblocks 3, root 14
indirect block (refs): 13 12 11 0
virtual block store: 3 blocks
i-node (nblocks 3, root) → indirect block → data block, data block, data block
What if the file is bigger than 3 blocks?
treedisk virtual block store
i-node (nblocks ####, root) → (double) indirect block → indirect block, indirect block → data block, data block, data block
How do I know if this is data or a block number?
treedisk virtual block store
• all data blocks at bottom level
• #levels: ceil(log_RPB(#blocks)) + 1
  – RPB = REFS_PER_BLOCK
• For example, if rpb = 16:
  #blocks       #levels
  0             0
  1             1
  2 - 16        2
  17 - 256      3
  257 - 4096    4
REFS_PER_BLOCK more commonly at least 128 or so
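The level count can be computed without floating-point logarithms by growing the tree capacity one level at a time. This is a minimal sketch; the function name tree_levels is hypothetical, and the special case of 0 blocks giving 0 levels follows the table rather than the formula.

```c
#include <assert.h>

// Number of tree levels needed for a virtual block store of nblocks blocks
// when each indirect block holds rpb references.
unsigned int tree_levels(unsigned int nblocks, unsigned int rpb){
    assert(rpb >= 2);                   // otherwise the tree cannot fan out
    if (nblocks == 0)
        return 0;                       // empty file: no tree at all
    unsigned int levels = 1;            // a single data block is its own root
    unsigned long long capacity = 1;    // data blocks reachable with `levels` levels
    while (capacity < nblocks) {
        capacity *= rpb;                // one more level of indirect blocks
        levels++;
    }
    return levels;
}
```

For rpb = 16 this reproduces the table: tree_levels(16, 16) is 2 and tree_levels(17, 16) is 3.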
virtual block store: with hole
i-node (nblocks 3, root) → indirect block → data block, data block, and one ref that is 0 (the hole)
• Hole appears as a virtual block filled with null bytes
• pointer to indirect block can be 0 too
• virtual block store can be much larger than the "physical" block store underneath!
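One way to picture how a read treats holes is a walk from the root down to the data level that stops as soon as it meets a 0 reference. The sketch below is an assumption-laden illustration, not the treedisk implementation: `disk` is a hypothetical in-memory array standing in for the block store below, tree_read is a made-up name, and REFS_PER_BLOCK is assumed to be BLOCK_SIZE / sizeof(block_no).

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 512
typedef unsigned int block_no;
typedef struct block { char bytes[BLOCK_SIZE]; } block_t;
#define REFS_PER_BLOCK (BLOCK_SIZE / sizeof(block_no))   // assumed packing

struct treedisk_indirblock { block_no refs[REFS_PER_BLOCK]; };

union treedisk_block {
    block_t datablock;
    struct treedisk_indirblock indirblock;
};

// Read virtual block `offset` of a file whose tree has `nlevels` levels
// and is rooted at block `root`. A 0 reference anywhere on the path is a
// hole, which reads back as a block of null bytes.
int tree_read(const union treedisk_block *disk, block_no root,
              unsigned int nlevels, block_no offset, block_t *out){
    assert(disk != NULL && out != NULL);
    block_no b = root;
    for (unsigned int level = nlevels; b != 0 && level > 1; level--) {
        // Each ref at this level covers RPB^(level-2) data blocks.
        block_no span = 1;
        for (unsigned int i = 2; i < level; i++)
            span *= REFS_PER_BLOCK;
        b = disk[b].indirblock.refs[(offset / span) % REFS_PER_BLOCK];
    }
    if (b == 0)
        memset(out, 0, BLOCK_SIZE);                  // hole: all null bytes
    else
        memcpy(out, &disk[b].datablock, BLOCK_SIZE); // real data block
    return 0;
}
```

Because a hole costs no block underneath, a file can report a large nblocks while only a few data blocks are actually allocated.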
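Putting it all together
block number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
layout: superblock, inode blocks, remaining blocks
superblock: n_inodeblocks 4, free_list 9
freelist blocks (refs): 5 10 0 0; 0 6 7 8
indirect block (refs): 13 12 11 0
inode block: inode[0]: nblocks 1, root 15; inode[1]: nblocks 3, root 14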
A short-lived treedisk file system
#define DISK_SIZE 1024
#define MAX_INODES 128

int main(){
    block_store_t *disk = disk_init("disk.dev", DISK_SIZE);
    treedisk_create(disk, MAX_INODES);
    treedisk_check(disk); // optional: check integrity of file system
    (*disk->destroy)(disk);
    return 0;
}
Example code with treedisk
block_t cache[CACHE_SIZE];

int main(){
    block_store_t *disk = disk_init("disk.dev", 1024);
    block_store_t *cdisk = cachedisk_init(disk, cache, CACHE_SIZE);
    treedisk_create(disk, MAX_INODES);
    block_store_t *file0 = treedisk_init(cdisk, 0);
    block_store_t *file1 = treedisk_init(cdisk, 1);
    block_t block;
    (*file0->read)(file0, 4, &block);
    (*file1->read)(file1, 4, &block);
    (*file0->destroy)(file0);
    (*file1->destroy)(file1);
    (*cdisk->destroy)(cdisk);
    (*disk->destroy)(disk);
    return 0;
}
Layering on top of treedisk
block_store_t *treedisk_init(block_store_t *below, unsigned int inode_no);
// creates a new file associated with inode_no
Layers (top to bottom): one TREEDISK per inode (inode 0, inode 1, inode …), all on a shared CACHEDISK, on DISK
trace utility
[Figure: layer stack used by the trace utility. TRACEDISK at the top; per-inode CHECKDISK and TREEDISK layers; shared CACHEDISK, CHECKDISK, and STATDISK layers; RAMDISK at the bottom.]
tracedisk
• ramdisk is bottom-level block store
• tracedisk is a top-level block store
  – or "application-level" if you will
  – you can't layer on top of it
block_store_t *tracedisk_init(
    block_store_t *below,
    char *trace, // trace file name
    unsigned int n_inodes);
Trace file Commands
W:0:3 // write inode 0, block 3
    If nothing is known about the file associated with inode 0 prior to this line, by writing to block 3, you are implicitly setting the size of the file to 4 blocks
W:0:4 // write to inode 0, block 4
    by the same logic, you now set the size to 5 since you've written to block 4
N:0:2 // checks if inode 0 is of size 2
    this will fail b/c the size should be 5
S:1:0 // set size of inode 1 to 0
R:1:1 // read inode 1, block 1
    this will fail b/c you're reading past the end of the file (there is no block 1 for the file associated with inode 1)
Example trace file
W:0:0 // write inode 0, block 0
N:0:1 // checks if inode 0 is of size 1
W:1:1 // write inode 1, block 1
N:1:2 // checks if inode 1 is of size 2
R:1:1 // read inode 1, block 1
S:1:0 // set size of inode 1 to 0
N:1:0 // checks if inode 1 is of size 0
if N fails, prints "!!CHKSIZE .."
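The size rules behind the trace commands can be captured in a small model that tracks the expected size of each file. This is a sketch for reasoning about traces, not the tracedisk or chktrace.c code: struct trace_model and trace_apply are hypothetical names, and the return convention (1 = consistent, 0 = the command would fail, -1 = malformed line) is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_INODES 128

// Expected file sizes (in blocks), indexed by inode number.
struct trace_model {
    unsigned int size[MAX_INODES];
};

// Apply one trace line of the form Cmd:inode:block to the model.
int trace_apply(struct trace_model *m, const char *line){
    char cmd;
    unsigned int inode, arg;
    assert(m != NULL && line != NULL);
    // Anything after the third field (such as a // comment) is ignored.
    if (sscanf(line, " %c:%u:%u", &cmd, &inode, &arg) != 3 || inode >= MAX_INODES)
        return -1;
    switch (cmd) {
    case 'W':                           // writing block b grows the file to b + 1
        if (arg >= m->size[inode])
            m->size[inode] = arg + 1;
        return 1;
    case 'R':                           // reading past the end fails
        return arg < m->size[inode];
    case 'N':                           // size check
        return m->size[inode] == arg;
    case 'S':                           // set size
        m->size[inode] = arg;
        return 1;
    default:
        return -1;
    }
}
```

Feeding it W:0:3 and then W:0:4 leaves inode 0 at size 5, so a following N:0:2 is reported as a failing check, as described in the trace command notes.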
Compiling and Running
• run "make" in the release directory
  – this generates an executable called "trace"
• run "./trace"
  – this reads trace file "trace.txt"
  – you can pass another trace file as argument
    • ./trace myowntracefile
Output to be expected
$ make
cc -Wall -c -o trace.o trace.c
. . .
cc -Wall -c -o treedisk_chk.o treedisk_chk.c
cc -o trace trace.o block_store.o cachedisk.o checkdisk.o debugdisk.o ramdisk.o statdisk.o tracedisk.o treedisk.o treedisk_chk.o
$ ./trace
blocksize: 512
refs/block: 128
!!TDERR: setsize not yet supported
!!ERROR: tracedisk_run: setsize(1, 0) failed
!!CHKSIZE 10: nblocks 1: 0 != 2
!$STAT: #nnblocks: 0
!$STAT: #nsetsize: 0
!$STAT: #nread: 32
!$STAT: #nwrite: 20
Trace (Cmd:inode:block):
W:0:0
N:0:1
W:0:1
N:0:2
W:1:0
N:1:1
W:1:1
N:1:2
S:1:0
N:1:0
A4: Part 1/3
Implement treedisk_setsize(0)
– currently it generates an error
– what you need to do:
  • iterate through all the blocks in the inode
  • put them on the free list
Useful functions:
• treedisk_get_snapshot
A4: Part 2/3
Implement cachedisk
– currently it doesn't actually do anything
– what you need to do:
  • pick a caching algorithm: LRU, MFU, or design your own
    – go wild!
  • implement it within cachedisk.c
  • write-through cache!!
  • consult the web for caching algorithms!
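As one example of a caching algorithm, LRU replacement can be sketched with a logical clock per cache slot. This only shows the bookkeeping of which block occupies which slot; copying block data and writing through to the layer below are left out. The names struct lru_cache and lru_touch are hypothetical and have nothing to do with the cachedisk.c skeleton, and the cache is assumed to start zero-initialized.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 4              // tiny value, for illustration only
typedef unsigned int block_no;

struct lru_entry {
    int valid;                    // 0 while the slot is still empty
    block_no block;               // which block the slot holds
    unsigned long last_used;      // logical time of the most recent access
};

struct lru_cache {
    struct lru_entry entries[CACHE_SIZE];
    unsigned long clock;          // advances by one on every access
};

// Record an access to `block`. Returns the slot that now holds it and sets
// *hit to 1 on a cache hit, or to 0 when the least recently used slot
// (or an empty one) had to be taken over.
int lru_touch(struct lru_cache *c, block_no block, int *hit){
    assert(c != NULL && hit != NULL);
    int victim = 0;
    c->clock++;
    for (int i = 0; i < CACHE_SIZE; i++) {
        struct lru_entry *e = &c->entries[i];
        if (e->valid && e->block == block) {
            e->last_used = c->clock;          // refresh recency on a hit
            *hit = 1;
            return i;
        }
        // Empty slots have last_used == 0, so they are chosen first.
        if (e->last_used < c->entries[victim].last_used)
            victim = i;
    }
    c->entries[victim] = (struct lru_entry){1, block, c->clock};
    *hit = 0;
    return victim;
}
```

In a write-through design, a write would update the slot returned here and also be passed to the layer below right away, so the cache never holds data the disk lacks.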
A4: Part 3/3
Implement your own trace file that:
• is at least 10 lines long
• uses all 4 commands (RWNS)
• has an edit distance of at least 6 from the trace we gave you
• is well-formed. For example, it should not try to verify that a file has a size X when the previous commands have in fact determined that it should have size Y. You may find the chktrace.c file useful
• At most: 10,000 commands, 128 inodes, 1<<27 block size
Step 1: use it to convince yourself that your cache is working correctly.
Optional Step: make a trace on which it is hard for a caching layer to be effective (random reads/writes) so that it can be used to distinguish good caches from bad ones.
What to submit
• treedisk.c // with treedisk_setsize(0)
• cachedisk.c
• trace.txt
The Big Red Caching Contest!!!
• We will run everybody's trace against everybody's treedisk and cachedisk
• We will run this on top of a statdisk
• We will count the total number of read operations
• The winner is whoever ends up doing the fewest read operations to the underlying disk
• Does not count towards grade of A4, but you may win fame and glory