Disco: Running Commodity Operating Systems on Scalable Multiprocessors


Page 1: Disco

Disco

Running Commodity Operating Systems on Scalable Multiprocessors

Page 2: Disco

FLASH

• cache-coherent non-uniform memory access (NUMA) multiprocessor

• developed at Stanford

• not available at the time the paper was written

Page 3: Disco

Problems with other approaches

• many other experimental systems require large changes to existing uniprocessor OSes

• OS development lags behind delivery of the hardware

• the high cost of these changes means the system will likely introduce instabilities, which may break legacy applications

• often the hardware vendor is not the software vendor (think Intel/Microsoft)

Page 4: Disco

Virtual Machine Monitors (nanokernel)

• a software layer between the hardware and a heavyweight OS

• by running multiple copies of traditional OSes, scalability issues are confined to the much smaller VM monitor

• VMs are natural software fault boundaries, and again the small size of the monitor makes hardware fault tolerance easier to implement

Page 5: Disco

Virtual Machine Monitors (nanokernel)

• the monitor handles all the NUMA-related issues, so UMA OSes do not need to be made aware of non-uniformity

• multiple OSes allow legacy applications to continue running while newer versions are phased in; this could also allow more experimentation with new technologies

Page 6: Disco

Disco architecture

Page 7: Disco

Virtual Machine Challenges

• overhead

  – runtime

    • privileged instructions must be emulated inside the monitor

    • I/O requests must be intercepted and de- and re-virtualized by the monitor

  – memory

    • code/data must be replicated for the different copies of the OS

    • each OS may have its own file system buffer, for instance

Page 8: Disco

Virtual Machine Challenges

• resource management

  – the monitor will not have all the information that the OS does, so it may make poor decisions. Think of spin-locking.

• sound familiar? This is the same as the argument against kernel-level threads.

Page 9: Disco

Virtual Machine Challenges

• communication and sharing

  – if OSes are separated by virtual machine boundaries, how do they share resources, and how does information cross those VM boundaries? VMs aren't aware they are actually on the same machine.

• sound familiar? This issue motivated LRPC and URPC, which specialize communication protocols for the case where server and client reside on the same physical machine.

Page 10: Disco

Disco implementation

• Disco emulates the MMU and the trap architecture, allowing unmodified applications and OSes to run on the VM

• frequently used kernel operations can be optimized; for instance, the OSes disable interrupts by loading and storing to special addresses (sketched below)
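
A minimal sketch of the special-address idea, assuming a hypothetical register page and layout (DISCO_REGS_BASE and the struct are illustrative, not the actual Disco interface): the guest disables interrupts with an ordinary store instead of a privileged instruction that would trap into the monitor.

    #include <stdint.h>

    #define DISCO_REGS_BASE 0xFFFFA000UL  /* hypothetical page the monitor maps into the guest */

    typedef struct {
        volatile uint32_t interrupt_enable;  /* 0 = disabled, 1 = enabled */
        volatile uint32_t pending_irqs;      /* bitmask of pending virtual interrupts */
    } disco_regs_t;

    #define DISCO ((disco_regs_t *)DISCO_REGS_BASE)

    static inline void guest_disable_interrupts(void)
    {
        /* An ordinary store: no trap, no monitor entry.  The monitor
           checks this flag before delivering virtual interrupts. */
        DISCO->interrupt_enable = 0;
    }

    static inline void guest_enable_interrupts(void)
    {
        DISCO->interrupt_enable = 1;
    }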

Page 11: Disco

Disco implementation

• all I/O devices are virtualized, including network connections and disks, and all access to them must pass through Disco to be translated or emulated.

Page 12: Disco

Disco implementation

• at only 13,000 lines of code, Disco is small enough to hand-tune

• the small image, only 72KB, also means a copy of Disco can reside in every local node, so Disco text never has to be fetched at the slower remote-memory rate

• machine-wide data structures are partitioned so the parts currently being used by a processor can reside in local memory (see the sketch below)
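
A sketch of the partitioning idea, with an assumed NUMA-aware allocator (alloc_on_node and cpu_to_node are hypothetical helpers, not Disco's interface): each processor's slice of a machine-wide structure is allocated from its own node so hot accesses stay local.

    #include <stddef.h>
    #include <stdint.h>

    #define NCPUS 64

    extern void *alloc_on_node(int node, size_t size);  /* assumed NUMA-aware allocator */
    extern int cpu_to_node(int cpu);

    typedef struct {
        uint64_t runnable_vcpus;  /* per-processor scheduling state */
        uint64_t local_misses;    /* per-processor statistics */
    } percpu_state_t;

    static percpu_state_t *percpu[NCPUS];

    static void init_percpu_state(void)
    {
        /* Each CPU's slice lives in its own node's memory, so the hot
           path never crosses the NUMA interconnect. */
        for (int cpu = 0; cpu < NCPUS; cpu++)
            percpu[cpu] = alloc_on_node(cpu_to_node(cpu), sizeof(percpu_state_t));
    }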

Page 13: Disco

Disco implementation

• scheduling VMs is similar to a traditional kernel scheduling processes, e.g. quantum-size considerations, saving state in data structures, processor affinity, etc.

Page 14: Disco

Disco implementation: Virtual Physical Memory

• Disco maintains a physical-to-machine address mapping.

• machine addresses are FLASH's 40-bit addresses

Page 15: Disco

Disco implementation: Virtual Physical Memory

• when a heavyweight OS tries to update the TLB, Disco steps in and applies the physical-to-machine translation; subsequent memory accesses can then go straight through the TLB

• each VM has an associated pmap in the monitor

• the pmap also has a back pointer to the virtual address, to help invalidate mappings in the TLB (see the sketch below)
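
A sketch of what the monitor might do on an intercepted TLB write (all structure and function names are illustrative): the guest's virtual-to-physical entry is rewritten as virtual-to-machine before it reaches the hardware TLB, and the back pointer is recorded for later invalidation.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t vpn;  /* guest virtual page number */
        uint64_t ppn;  /* guest "physical" page number */
    } guest_tlb_entry_t;

    typedef struct {
        uint64_t *phys_to_machine;  /* per-VM pmap: physical page -> machine page */
        uint64_t *back_vpn;         /* back pointer: physical page -> guest virtual page */
        size_t    npages;
    } pmap_t;

    extern void hw_tlb_insert(uint64_t vpn, uint64_t mpn);  /* hypothetical TLB primitive */

    /* Runs when the monitor intercepts the guest's privileged TLB update. */
    static void monitor_tlb_write(pmap_t *pmap, const guest_tlb_entry_t *e)
    {
        uint64_t mpn = pmap->phys_to_machine[e->ppn];

        /* Record which virtual page maps this physical page so the entry
           can be invalidated if the underlying machine page moves. */
        pmap->back_vpn[e->ppn] = e->vpn;

        /* Insert the rewritten virtual -> machine entry into the real TLB;
           subsequent accesses hit in hardware with no monitor involvement. */
        hw_tlb_insert(e->vpn, mpn);
    }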

Page 16: Disco

Disco implementation: Virtual Physical Memory

• the MIPS TLB is tagged with an address space identifier (ASID)

• ASIDs are not virtualized, so the TLB must be flushed on VM context switches

• a second-level software TLB offsets the cost of these flushes (sketched below)
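
A sketch of a direct-mapped second-level software TLB (size and indexing are assumptions, not Disco's actual design): translations lost in a flush can be refilled from this cache instead of re-running the slower emulation path.

    #include <stdint.h>

    #define L2_TLB_SIZE 1024  /* assumed; must be a power of two */

    typedef struct {
        uint64_t vpn;    /* guest virtual page number */
        uint64_t mpn;    /* machine page number */
        uint32_t asid;   /* guest address space id */
        uint32_t valid;
    } l2_tlb_entry_t;

    static l2_tlb_entry_t l2_tlb[L2_TLB_SIZE];

    /* On a hardware TLB miss, probe the software TLB first.  Returns 1
       on a hit and fills *mpn; on a miss the slow emulation path runs. */
    static int l2_tlb_lookup(uint64_t vpn, uint32_t asid, uint64_t *mpn)
    {
        l2_tlb_entry_t *e = &l2_tlb[vpn & (L2_TLB_SIZE - 1)];
        if (e->valid && e->vpn == vpn && e->asid == asid) {
            *mpn = e->mpn;
            return 1;
        }
        return 0;
    }

    static void l2_tlb_insert(uint64_t vpn, uint32_t asid, uint64_t mpn)
    {
        l2_tlb_entry_t *e = &l2_tlb[vpn & (L2_TLB_SIZE - 1)];
        e->vpn = vpn; e->asid = asid; e->mpn = mpn; e->valid = 1;
    }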

Page 17: Disco

Disco implementation: Hiding NUMA

• cache misses are served faster from local memory than from remote memory

• pages that are read or read-shared are migrated or replicated to the nodes that frequently access them

• write-shared pages are not, since maintaining consistency requires remote access anyway

• the migration and replication policy is driven by cache-miss counting (see the sketch below)
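
A sketch of a miss-counting policy (thresholds and helper functions are illustrative, not Disco's actual parameters): pages hot on a remote node are migrated if they are otherwise idle, replicated if they are read-shared, and left alone if they are write-shared.

    #include <stdint.h>

    #define MIGRATE_THRESHOLD 64  /* assumed: remote misses before acting */
    #define MAX_NODES 32

    typedef struct {
        uint32_t miss_count[MAX_NODES];  /* per-node cache misses on this page */
        int      home_node;              /* node whose memory holds the page */
        int      write_shared;           /* written from more than one node? */
    } page_stats_t;

    extern void migrate_page(uint64_t mpn, int to_node);    /* hypothetical */
    extern void replicate_page(uint64_t mpn, int to_node);  /* hypothetical */

    /* Called from the cache-miss counting machinery for machine page mpn. */
    static void on_remote_miss(page_stats_t *p, uint64_t mpn, int node)
    {
        if (++p->miss_count[node] < MIGRATE_THRESHOLD)
            return;
        if (p->write_shared)
            return;  /* consistency traffic is unavoidable; leave the page put */

        if (p->miss_count[p->home_node] == 0) {
            migrate_page(mpn, node);    /* used almost only remotely: move it */
            p->home_node = node;
        } else {
            replicate_page(mpn, node);  /* read-shared: give the node a copy */
        }
        p->miss_count[node] = 0;
    }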

Page 18: Disco

Disco implementation: Hiding NUMA

Page 19: Disco

Disco implementation: Hiding NUMA

• the memmap tracks which virtual pages reference each machine page; it is used during TLB shootdown (see the sketch below)
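
A sketch of a memmap entry and the shootdown it enables (field and function names are assumptions): before a machine page is migrated or replicated, every TLB entry still pointing at it is invalidated.

    #include <stdint.h>

    #define MAX_MAPPERS 8  /* assumed cap on recorded mappings per page */

    typedef struct {
        struct { int vm_id; uint64_t vpn; } mappers[MAX_MAPPERS];
        int n_mappers;
    } memmap_entry_t;

    extern void invalidate_vm_tlb_entry(int vm_id, uint64_t vpn);  /* hypothetical */

    /* Before moving machine page mpn, invalidate every TLB entry that
       still points at it so no VM sees the stale location. */
    static void shootdown(memmap_entry_t *m, uint64_t mpn)
    {
        for (int i = 0; i < m->n_mappers; i++)
            invalidate_vm_tlb_entry(m->mappers[i].vm_id, m->mappers[i].vpn);
        m->n_mappers = 0;
    }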

Page 20: Disco

Disco implementation: Virtualizing I/O

• all device accesses are intercepted by the monitor

• disk reads can be serviced by the monitor, and if the request size is a multiple of the machine's page size, the monitor only has to remap machine pages into the VM's physical memory address space

• the pages are mapped read-only and generate a copy-on-write fault if written to (see the sketch below)
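
A sketch of the remap-then-copy-on-write path (all helper names are hypothetical): a page-aligned read is serviced by mapping the cached machine page read-only, and data is copied only if a guest later writes it.

    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    extern uint64_t disk_cache_lookup(uint64_t block);  /* machine page caching this block */
    extern uint64_t alloc_machine_page(void);
    extern void map_into_vm(int vm, uint64_t ppn, uint64_t mpn, int writable);
    extern void *machine_addr(uint64_t mpn);

    /* Page-aligned read: no data copy, just map the cached page read-only. */
    static void disk_read_page(int vm, uint64_t ppn, uint64_t block)
    {
        uint64_t mpn = disk_cache_lookup(block);
        map_into_vm(vm, ppn, mpn, /*writable=*/0);
    }

    /* Write fault on a shared page: copy it, then map the copy writable. */
    static void cow_fault(int vm, uint64_t ppn, uint64_t shared_mpn)
    {
        uint64_t copy = alloc_machine_page();
        memcpy(machine_addr(copy), machine_addr(shared_mpn), PAGE_SIZE);
        map_into_vm(vm, ppn, copy, /*writable=*/1);
    }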

Page 21: Disco

Virtualizing I/O

Page 22: Disco

Virtual Networks

Page 23: Disco

IRIX

• small changes were required to the IRIX kernel, due to a MIPS peculiarity

• new device drivers were not needed

• the hardware abstraction layer is where the trap, zeroed-page, unused-page, and VM de-scheduling optimizations were implemented

Page 24: Disco

SPLASHOS

• thin OS, supported directly by Disco

• used for parallel scientific applications

Page 25: Disco

Experimental Results

• since FLASH was not available, experiments were run on SimOS, a machine simulator

• the simulator was too slow, compared to the actual machine, to allow long workloads to be studied

Page 26: Disco

Workloads

Page 27: Disco

Single VM

• ran the four workloads on plain IRIX inside the simulator and with a single VM running IRIX; slowdowns ranged from 3% to 16%

Page 28: Disco

Memory Overhead

• ran pmake on 8 physical processors with six different configurations: plain IRIX, 1 VM, 2 VMs, 4 VMs, 8 VMs, and 8 VMs communicating over NFS

• demonstrates

  – 8 VMs required less than twice the physical memory of plain IRIX

  – the physical-to-machine mapping is a useful optimization

Page 29: Disco

Scalability tests

• compared the performance of pmake under the previously described configurations

• summary: while 1 VM showed a significant slowdown (36%), using 8 VMs showed a significant speedup (40%)

• also ran a radix sorting algorithm on plain IRIX and on SPLASHOS/Disco; run time was reduced by two-thirds

Page 30: Disco

Page Migration and Replication

• the engineering workload ran on 8 processors, raytrace on 16

• a UMA machine gives the theoretical lower bound

Page 31: Disco

Conclusion

• the nanokernel is several orders of magnitude smaller than a heavyweight OS, yet it can run virtually unmodified OSes inside virtual machines

• the problems of execution overhead and memory footprint were addressed