MSN 2004
Network Memory Servers: An idea whose time has come
Glenford Mapp
David Silcott
Dhawal Thakker
Motivation
• Networks are now much faster than disks
• It should be quicker to get data from the memory of another computer than from a local disk
• Not a new idea - so what’s different?
What’s different?
• Networks are faster and cheaper
– Gigabit NICs are £35.00
– We could also see 10G NICs in the near future
• Memory is also cheaper
– 1GB = £100.00
– Likely to remain stable
• Availability of good “free” OSes
– Linux and FreeBSD
Our approach is also different
• Previous approaches
– Dominated by the Distributed Shared Memory crowd (Apollo System)
– DSM never became mainstream
• lots of fundamental changes to OS platform required
• Exotic hardware (e.g. Scalable Coherent Interface, or SCI)
• Network Memory became a casualty of this failure
Previous Approach cont’d
• Remote paging was also one of the key areas (SAMSON project, NYU)
• Idle machines approach
– Use memory of other machines in the network when no one is logged on but get off when the person returns
– Very complex
• how do you give guarantees to everyone
Our Approach
• Applied Engineering Approach
– what are the real numbers in this area
• Use the power of the Network
– use standard networking approach
– No DSM, no virtual memory plug-ins
• Client-Server approach
– Dedicated servers with loads of memory
Design of the Network Memory Server (NMS)
• NMS has an independent interface
– Can interface with any OS
• not like Network Block Device (NBD) in Linux
• NMS is stateless
– Does not keep track of previous interactions
• Actions of the NMS are regarded as atomic
– Either complete success or total failure
Design of NMS cont’d
• NMS deals with blocks of data
– Has no idea how the blocks are being used
• Not like NFS
• Each block is uniquely identified by a block_id allocated by the NMS
• Each client is uniquely identified by a client_id
Block_ids
• 64-bit entities
– 32-bit minor index
– 16-bit major index
– 16-bit security tag
• generated when the blocks are created
• checked before any read/write operation on a block
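The 64-bit layout above can be sketched as a pack/unpack pair. This is only an illustration: the slides give the field widths but not the bit positions, so the ordering used here (major index in the top bits, then minor index, then tag) is an assumption, as are the function names and the choice of a random tag.

```python
# Sketch of a 64-bit block_id: 32-bit minor index, 16-bit major index,
# 16-bit security tag. Bit positions and names are assumptions; the
# slides only state the field widths.
import secrets

MINOR_BITS, MAJOR_BITS, TAG_BITS = 32, 16, 16


def make_block_id(major: int, minor: int, tag: int) -> int:
    """Pack the three fields into a single 64-bit integer."""
    if not 0 <= major < (1 << MAJOR_BITS):
        raise ValueError("major index out of range")
    if not 0 <= minor < (1 << MINOR_BITS):
        raise ValueError("minor index out of range")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("security tag out of range")
    return (major << (MINOR_BITS + TAG_BITS)) | (minor << TAG_BITS) | tag


def split_block_id(block_id: int) -> tuple[int, int, int]:
    """Return (major, minor, tag) from a packed block_id."""
    tag = block_id & ((1 << TAG_BITS) - 1)
    minor = (block_id >> TAG_BITS) & ((1 << MINOR_BITS) - 1)
    major = block_id >> (MINOR_BITS + TAG_BITS)
    return major, minor, tag


def new_tag() -> int:
    """Pick a tag when a block is created (random choice is an assumption)."""
    return secrets.randbits(TAG_BITS)


def tag_matches(block_id: int, stored_tag: int) -> bool:
    """The kind of check made before a read/write on a block."""
    return split_block_id(block_id)[2] == stored_tag
```

A server built this way would compare the tag carried in the incoming block_id with the tag it recorded at creation time, and reject the operation on a mismatch.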
NMS calls
• GetblockMemory(client_id, size, nblocks, options)
– Creates a number of blocks of a certain size with consecutive block_ids
• returns the starting Block_id
• options - backup
• Release(client_id, block_id, nblocks)
– Releases a number of consecutive block_ids
NMS calls cont’d
• WriteBlockMemory(client_id, block_id, offset, length, *buf)
– writes data in buffer to a block on the server
• ReadBlockMemory(client_id, block_id, offset, length, *buf)
– reads data from a block on the server into a buffer
NMS calls cont’d
• GetClientid(password)
– creates a new client
• GetMasterBlock(password, client_id)
– returns a number of blocks of sector/block_id mappings
• StoreMasterBlock(block_id, client_id, password, nblocks)
– stores a number of sector/block_id mappings
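To make the block calls concrete, here is a toy in-memory model of them. It is a sketch, not the kernel-module implementation: the method names are Python-style stand-ins for the calls above, block_ids are plain counters without a security tag, the options argument is left out, and reporting failure through exceptions is an assumption. Each call validates its arguments before changing anything, which is one simple way to get the "complete success or total failure" behaviour.

```python
# Toy in-memory model of the NMS block calls (not the kernel-module code).
# Names and error handling are assumptions made for illustration.
class ToyNMS:
    def __init__(self):
        self._clients = {}      # client_id -> password
        self._blocks = {}       # block_id -> (owner client_id, bytearray)
        self._next_client = 1
        self._next_block = 1

    def get_client_id(self, password):
        """Create a new client and return its client_id."""
        client_id = self._next_client
        self._next_client += 1
        self._clients[client_id] = password
        return client_id

    def get_block_memory(self, client_id, size, nblocks):
        """Create nblocks blocks of size bytes; return the first block_id."""
        if client_id not in self._clients or size <= 0 or nblocks <= 0:
            raise ValueError("bad allocation request")
        first = self._next_block
        for block_id in range(first, first + nblocks):
            self._blocks[block_id] = (client_id, bytearray(size))
        self._next_block += nblocks
        return first            # the remaining block_ids are consecutive

    def _owned(self, client_id, block_id):
        owner, data = self._blocks.get(block_id, (None, None))
        if data is None or owner != client_id:
            raise PermissionError("unknown block or wrong client")
        return data

    def release(self, client_id, block_id, nblocks):
        """Release consecutive blocks; nothing is freed if any check fails."""
        ids = range(block_id, block_id + nblocks)
        for b in ids:           # validate everything before deleting
            self._owned(client_id, b)
        for b in ids:
            del self._blocks[b]

    def write_block_memory(self, client_id, block_id, offset, length, buf):
        data = self._owned(client_id, block_id)
        if (offset < 0 or length < 0 or offset + length > len(data)
                or len(buf) < length):
            raise ValueError("write outside block")
        data[offset:offset + length] = buf[:length]

    def read_block_memory(self, client_id, block_id, offset, length):
        data = self._owned(client_id, block_id)
        if offset < 0 or length < 0 or offset + length > len(data):
            raise ValueError("read outside block")
        return bytes(data[offset:offset + length])
```

The server holds block contents but no per-session state, so every request carries the client_id and block_id it needs.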
NMS Client
• How does a client use the NMS?
– What interface is presented to the OS
• The interface is the one used to support hard disks. In Linux, we use the block device interface
• So the OS thinks of the NMS service as a fast hard disk
NMS Client cont’d
• So the OS tells the NMS client to read and write sectors.
• NMS client will take sectors and map them onto blocks which it gets from the NMS
• When the block device is unmounted, we must store the sector/block_id mappings on the NMS
NMS Cont’d
• The StoreMasterBlock call stores these mappings on the NMS
• When the device is remounted, it must first get the sector/block_id mappings from the NMS and rebuild the sector table.
• The GetMasterBlock call retrieves the mappings from the NMS
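The unmount/remount cycle amounts to serialising the sector table and rebuilding it later. A minimal sketch, assuming each mapping is stored as a pair of little-endian 64-bit integers; the slides do not describe the on-server format, so the layout and the function names are hypothetical.

```python
# Sketch of saving and rebuilding the sector/block_id table.
# The (sector, block_id) record layout is an assumption.
import struct

ENTRY = struct.Struct("<QQ")    # sector number, block_id: 8 bytes each


def pack_sector_table(table: dict[int, int]) -> bytes:
    """Flatten the table into bytes, as would be sent at unmount."""
    return b"".join(ENTRY.pack(sector, block_id)
                    for sector, block_id in sorted(table.items()))


def unpack_sector_table(raw: bytes) -> dict[int, int]:
    """Rebuild the table from bytes, as would be fetched at remount."""
    if len(raw) % ENTRY.size:
        raise ValueError("truncated mapping data")
    return {sector: block_id for sector, block_id in ENTRY.iter_unpack(raw)}
```

On unmount the client would hand the packed bytes to StoreMasterBlock; on remount it would pass the result of GetMasterBlock to the unpack step before serving any sector requests.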
NMS Client Cache
• Client also has a cache of blocks that are used to store recently used sectors
– this is a secondary cache as the main caching is really done by the Unix Buffer Cache
• Design decision to keep our cache as a simple round-robin cache
– replace the next item pointed to in the cache
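A round-robin cache of this kind can be sketched in a few lines. The class and field names below are hypothetical; the only behaviour taken from the slide is that the entry under a moving pointer is the one replaced.

```python
# Sketch of a round-robin replacement cache. Names are illustrative.
class RoundRobinCache:
    def __init__(self, nslots):
        self.slots = [None] * nslots    # each slot: [sector, data, modified]
        self.index = {}                 # sector -> slot number
        self.next_slot = 0              # the "next item pointed to"

    def lookup(self, sector):
        """Return the cache entry for sector, or None on a miss."""
        slot = self.index.get(sector)
        return None if slot is None else self.slots[slot]

    def insert(self, sector, data, modified=False):
        """Store sector; return the entry that was replaced, if any."""
        if sector in self.index:        # already cached: update in place
            self.slots[self.index[sector]] = [sector, data, modified]
            return None
        replaced = self.slots[self.next_slot]
        if replaced is not None:
            del self.index[replaced[0]]
        self.slots[self.next_slot] = [sector, data, modified]
        self.index[sector] = self.next_slot
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return replaced
```

The caller can inspect the returned entry and, if its modified flag is set, queue it for writing back to the NMS. No access-time bookkeeping is needed, which fits a secondary cache sitting below the Unix Buffer Cache.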
NMS Client Operations
• Since we are not a normal disk, we do not need to rearrange read and write operations
• So we attempt to read and write blocks as the requests come in.
• Also developed a write-out operation: a special thread, called the Write-out thread, writes modified blocks to the NMS
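The write-out thread is a standard producer/consumer arrangement. A user-space sketch, assuming a queue of (block_id, data) pairs and a caller-supplied write function; both are hypothetical, and the real client is a kernel module rather than Python threads.

```python
# Sketch of a write-out thread draining a queue of modified blocks.
# The queue item format and the None sentinel are assumptions.
import queue
import threading


def start_write_out_thread(write_block, write_out_queue):
    """Start a thread that writes queued (block_id, data) pairs."""
    def worker():
        while True:
            item = write_out_queue.get()
            if item is None:            # sentinel: stop the thread
                write_out_queue.task_done()
                break
            block_id, data = item
            write_block(block_id, data)
            write_out_queue.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Request handling then only has to put a modified block on the queue and can return without waiting for the network write.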
NMS Client Implementation
[Diagram. Components shown: Programs; Operating System; Unix Buffer Cache; Block Device Interface; NMS Block Device; Sector / Block_id Hash Table; Cache; Write-Out Queue; annotation "(Two levels)".]
Getting a sector
[Flowchart. Decisions: Is sector in hash table? Is it in the cache? Is it a read? Is the cache full? Has replaced entry been modified? Actions: Return rubbish; Get block_id from NMS and put entry in hash table; Get data from NMS server and put in cache entry; Replace entry; Put it on write-out queue; Get new cache entry; Read from / write to cache entry; Write data to cache entry; OK.]
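The flowchart's edges did not survive extraction, so the sketch below is one plausible reading of its decisions, not a transcription. It assumes: a read of a sector that was never written returns arbitrary data; a first write obtains a block_id; a cache miss takes a slot (queueing a modified victim for write-out), then either fetches from the server on a read or fills the slot on a write. The class, its fields, the whole-sector writes, and draining the queue before a fetch are all simplifications of mine.

```python
# One plausible reading of the "getting a sector" flowchart.
# All names are hypothetical; the server is modelled as a dict.
RUBBISH = b""   # stands in for "return rubbish" on a never-written sector


class SectorClient:
    def __init__(self, cache_slots):
        self.server = {}        # stand-in for the NMS: block_id -> data
        self.table = {}         # sector -> block_id hash table
        self.cache = {}         # sector -> [data, modified]
        self.order = []         # cached sectors, oldest first
        self.cache_slots = cache_slots
        self.write_out = []     # stand-in for the write-out queue
        self.next_block_id = 1

    def drain(self):
        """Write queued blocks to the server (the write-out thread's job)."""
        while self.write_out:
            block_id, data = self.write_out.pop(0)
            self.server[block_id] = data

    def _make_room(self):
        """If the cache is full, replace an entry; queue it if modified."""
        if len(self.cache) >= self.cache_slots:
            victim = self.order.pop(0)
            data, modified = self.cache.pop(victim)
            if modified:
                self.write_out.append((self.table[victim], data))

    def read(self, sector):
        if sector not in self.table:
            return RUBBISH
        if sector not in self.cache:
            self._make_room()
            self.drain()        # simplification: flush before fetching
            self.cache[sector] = [self.server[self.table[sector]], False]
            self.order.append(sector)
        return self.cache[sector][0]

    def write(self, sector, data):
        if sector not in self.table:
            self.table[sector] = self.next_block_id   # "get block_id from NMS"
            self.next_block_id += 1
        if sector not in self.cache:
            self._make_room()
            self.order.append(sector)
        self.cache[sector] = [data, True]
```

With a two-slot cache, writing three sectors pushes the first onto the write-out queue, and reading it back later fetches it from the server.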
Structures on NMS Server
[Diagram. Components shown: Client_id Hash Table; Block_id Hash Table (Two-level); Allocated Memory; Memory for Clients; Memory for Internal Use by the NMS.]
Testing and Evaluation
• What do we really want to know
• What does it take to operate faster than a hard disk?
– Can you use standard hardware (Middlesex)
– Do you need special hardware (Cambridge)
• Level 5 Networks
• What are the key parameters in this space
What do you measure
• What happens if we change the block size of the data transfer
• What happens if we change the number of units transferred in one transfer
– Added multi-write operation
• Is local caching any good
• What is the network traffic like
Using Iozone
• Iozone is quite popular
– Measures the memory hierarchy
• Disk particulars
– 60 GB, 2MB buffer, 7200 RPM, Seek Time 9.0 ms, Average latency 4.16ms
• Network
– using Intel E1000 NICs and Netgear Gigabit Switch (GS 104); using UDP port 6111
• NMS client and server implemented as Linux kernel modules
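The disk figures above can be set against the raw speed of a Gigabit link with some back-of-the-envelope arithmetic. This is a rough sketch only: the wire time ignores packet headers, protocol processing, interrupts and server work, so the real gap is smaller than the ratio computed here. The 4096-byte transfer size is an example value.

```python
# Rough arithmetic: one random disk access versus the wire time of one
# 4 kB transfer on a 1 Gb/s link. Overheads are deliberately ignored.
RPM = 7200
SEEK_MS = 9.0                    # from the disk particulars
QUOTED_LATENCY_MS = 4.16         # from the disk particulars

# Average rotational latency is half a revolution.
avg_rotational_latency_ms = 0.5 * 60_000 / RPM    # about 4.17 ms

disk_access_ms = SEEK_MS + QUOTED_LATENCY_MS       # about 13.16 ms

LINK_BITS_PER_SECOND = 1e9


def wire_time_us(nbytes):
    """Microseconds needed to put nbytes on the link, payload only."""
    return nbytes * 8 / LINK_BITS_PER_SECOND * 1e6


transfer_us = wire_time_us(4096)                   # about 32.8 microseconds
ratio = disk_access_ms * 1000 / transfer_us        # roughly 400 to 1
```

The half-revolution calculation reproduces the quoted average latency to within rounding, and the ratio shows why a memory server across a fast network has room to beat a disk even after software overheads are added back.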
Read Performance
[Figure: throughput (kB/sec, 0 to 1400000) against file size (kB, 0 to 350000). Series: "mw4, 2MB_cache, 1kB_msg"; "disk"; "sw, 2MB_cache, 4kB_msg".]
Record Rewrite Performance
[Figure: throughput (kB/sec, 0 to 1200000) against file size (kB, 0 to 300000). Series: "MW4 2mb cache, 1k"; "disk system".]
Write Performance for Different Transfer sizes
[Figure: throughput (kB/sec, 0 to 300000) against file size (kB, 0 to 350000). Series: "sw, 2MB_cache, 4kB_msg"; "disk"; "sw, 2MB_cache, 1kB_msg"; "sw, 2MB_cache, 2kB_msg".]
Write Performance for Multiples of 1K blocks
[Figure: throughput (kB/sec, 0 to 300000) against file size (kB, 0 to 300000). Series: "mw4, 2MB_cache, 1kB_msg"; "disk"; "mw12, 2MB_cache, 1kB_msg"; "mw8, 2MB_cache, 1kB_msg"; "mw16, 2MB_cache, 1kB_msg".]
Write Performance for extreme configurations
[Figure: throughput (kB/sec, 0 to 300000) against file size (kB, 0 to 350000). Series: "disk"; "mw17k, 2MB_cache, 4kB_msg"; "mw32k, 8MB_cache, 4kB_msg"; "sw, 2MB_cache, 1kB_msg".]
Maximum data transfer rate
[Figure: Rate (Mb/sec, 82 to 88) against Filesize (MB, 50 to 250). Series: "Received"; "Sent".]
Buffer cache Hits
[Figure: % block cache hits (0 to 120) against Filesize (MB, 100 to 250). Series: "BCH MAX"; "BCH MIN".]
Conclusions and Future
• We can beat the disk
• Will compare these results with those using Level 5 hardware (Rip Sohan, LCE)
• Open source release planned
• Developing a Network Storage Server
• Building prototypes – running Linux and Windows using NMS