Modularity and Separation of Concerns

Noah Mendelsohn
Tufts University
Email: [email protected]
Web: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2018)

Copyright 2012, 2013, 2015, 2016, 2017 & 2018

© 2010 Noah Mendelsohn

Goals

Explore the benefits of modularity and separation of concerns

Explore some of the limits and drawbacks of modular systems

Abstracting the Hard Disk

What’s a hard disk?

Now: http://www.youtube.com/watch?v=3owqvmMf6No

Then: http://www.youtube.com/watch?v=CUeXy80zMBg&t=19s

What’s a hard disk?

[Figure: disk drive with the labels “Platter” and “Sector”]

Typical Characteristics

• Fixed-size data blocks (512 bytes -> 4K bytes)
• Seek time: 3 ms – 15 ms (depends on drive and distance)
• Rotational delay: ~5 ms for commodity drives
• Transfer rate from platter: 100 MBytes/sec
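To see why these numbers matter, here is a rough worked estimate in C. It is only a sketch: the 9 ms seek is an assumed midpoint of the 3 ms to 15 ms range, 1 MByte is taken as 10^6 bytes, and the function names are made up for illustration.

```c
#include <assert.h>

/* Illustrative figures based on the characteristics listed above.
 * SEEK_MS is an assumed midpoint of the 3-15 ms range; real drives vary. */
#define SEEK_MS       9.0
#define ROTATION_MS   5.0     /* ~5 ms rotational delay */
#define TRANSFER_BPS  100e6   /* 100 MBytes/sec, assuming 1 MByte = 10^6 bytes */

/* Time in milliseconds to move `bytes` bytes off the platter. */
static double transfer_ms(double bytes)
{
    assert(bytes >= 0.0);
    return bytes / TRANSFER_BPS * 1000.0;
}

/* Estimated time to fetch one block when the access needs a seek and a
 * rotational wait first (random access). */
double random_read_ms(double bytes)
{
    return SEEK_MS + ROTATION_MS + transfer_ms(bytes);
}

/* Estimated time for the same block when the head is already in position
 * (sequential access): only the transfer remains. */
double sequential_read_ms(double bytes)
{
    return transfer_ms(bytes);
}
```

Under these assumptions a random 4K read costs about 14 ms while the transfer itself takes about 0.04 ms, so positioning the head dominates the cost by a factor of several hundred.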

How does our software show us the disk?

Filesystem
– Names: /home/noah/myfile.txt
– Files can grow and shrink dynamically
– Geometry and timing hidden
– Free space managed transparently
– Sharing and security
– Buffering and optimization
– May span multiple drives

Relational database
– Collections of tables: rows + columns
– Access via query language

How is the disk used in Unix / Linux?

[Diagram: an Application on top of a Unix Kernel stack made up of Filesystem, In-memory Block Cache, Block Device Driver and Raw Device Driver, with the disk's sectors at the bottom. Labels in the diagram:]

– Files/Dirs, security, etc.
– Buffered block r/w: hides timing
– Direct read/write of filesystem “blocks” (hides sector size and device geometry)
– Access by cylinder/track/sector

How is the disk used in Unix / Linux (over-simplified)

[Diagram: an Application on top of a Unix Kernel stack made up of Filesystem and Raw Device Driver, with the disk's sectors at the bottom. Labels in the diagram:]

– Files/Dirs, security, etc.
– Direct read/write of filesystem “blocks” (hides sector size and device geometry)
– Access by cylinder/track/sector

Things to note

Each layer provides clean abstraction for next

Code replaceable by layer
– New filesystem on same block driver
– New raw driver supports new device (different manufacturer, SSD, USB key, digital camera, etc.)
– Cached block space supports (nearly) same interface as uncached

Reuse!
– All devices supported by common buffer management and filesystem
– Common APIs at all levels above device
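One way to picture "code replaceable by layer" is an interface of function pointers that every driver fills in. The sketch below is hypothetical and is not the real Unix or Linux driver API: `struct block_dev`, `ram_disk_init` and `copy_block` are names invented for illustration, and the 512-byte block size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512  /* assumed block size for this sketch */

/* Hypothetical interface that each driver exposes to the layer above. */
struct block_dev {
    int (*read_block)(struct block_dev *dev, size_t blockno, void *buf);
    int (*write_block)(struct block_dev *dev, size_t blockno, const void *buf);
    void *state;  /* driver-private data, opaque to upper layers */
};

/* One concrete "driver": blocks held in ordinary memory. */
struct ram_disk {
    unsigned char *storage;
    size_t nblocks;
};

static int ram_read(struct block_dev *dev, size_t blockno, void *buf)
{
    struct ram_disk *rd = dev->state;
    if (blockno >= rd->nblocks)
        return -1;
    memcpy(buf, rd->storage + blockno * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

static int ram_write(struct block_dev *dev, size_t blockno, const void *buf)
{
    struct ram_disk *rd = dev->state;
    if (blockno >= rd->nblocks)
        return -1;
    memcpy(rd->storage + blockno * BLOCK_SIZE, buf, BLOCK_SIZE);
    return 0;
}

/* Wire the RAM-backed driver into the generic interface. */
void ram_disk_init(struct block_dev *dev, struct ram_disk *rd,
                   unsigned char *storage, size_t nblocks)
{
    assert(dev != NULL && rd != NULL && storage != NULL);
    rd->storage = storage;
    rd->nblocks = nblocks;
    dev->read_block = ram_read;
    dev->write_block = ram_write;
    dev->state = rd;
}

/* "Upper layer" code: written only against struct block_dev, so it works
 * unchanged with any driver that fills in the two function pointers. */
int copy_block(struct block_dev *dev, size_t from, size_t to)
{
    unsigned char buf[BLOCK_SIZE];
    assert(dev != NULL);
    if (dev->read_block(dev, from, buf) != 0)
        return -1;
    return dev->write_block(dev, to, buf);
}
```

Supporting a new device in this sketch means writing another pair of read/write functions; `copy_block` and anything else built on `struct block_dev` is reused as is.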

Network Layering Revisited

Architecture of the Internet Protocols

Layer              Purpose                                         Example
User Program       Use the network for some purpose                Firefox, Apache Server, Your program
Application Layer  Protocols with application-specific semantics   HTTP (Web), SMTP (E-mail)
Transport Layer    User-level connection & datagram                TCP/UDP
Internet Layer     Unreliable, multi-hop packet delivery           IP Packet Routing
Link Layer         Send an IP Packet over Hardware                 Ethernet, Wi-Fi, Dial-up

We can replace the link layer and still use the upper layers!

Compare the following RFCs

http://www.ietf.org/rfc/rfc1042.txt

http://www.ietf.org/rfc/rfc1149.txt

Please note that RFC 1149 support has been demonstrated: http://www.blug.linux.no/rfc1149/

Also this one

https://web.archive.org/web/20160323181858/http://www.networkworld.com/article/2188516/applications/researcher-runs-ip-network-over-xylophones.html

As an LED lights up, the human participant strikes the corresponding key on the xylophone. Piezo sensors are attached to each xylophone, so that they are able to sense when a note is played on the other xylophone. The Arduino for the receiving computer senses the note and then converts it back into hexadecimal code. And when the second computer sends a return packet, the order of operations is reversed.

Architecture of the Internet Protocols

Layer              Purpose                                         Example
User Program       Use the network for some purpose                Firefox, Apache Server, Your program
Application Layer  Protocols with application-specific semantics   HTTP (Web), SMTP (E-mail)
Transport Layer    User-level connection & datagram                TCP/UDP
Internet Layer     Unreliable, multi-hop packet delivery           IP Packet Routing
Link Layer         Send an IP Packet over Hardware                 Ethernet, Wi-Fi, Dial-up

Implementations are often layered to match the architecture!

Overview of Layering/Modularity Issues

Some terms

Separation of concerns

Modularity

Layering

Encapsulation

Information hiding

Abstraction

Reuse

Separation of concerns – HTTP

HTTP/1.1 200 OK
Date: Tue, 28 Aug 2007 01:49:33 GMT
Server: Apache
Transfer-Encoding: chunked
Content-Type: text/html

<html>
<head>
<title>Demo #1</title>
</head>
<body>
<h1>A very simple Web page</h1>
</body>
</html>

HTTP status codes evolve orthogonally from the rest of the protocol

Separation of concerns – HTTP

Content-Type: text/html

Media type registrations shared with E-mail (MIME) infrastructure

Separation of concerns – HTTP

<html>
<head>
<title>Demo #1</title>
</head>
<body>
<h1>A very simple Web page</h1>
</body>
</html>

Unicode, HTML and other specifications modular and shareable with other systems

Separation of concerns – HTTP

HTTP GET

demo1/test.html
Host: webarch.noahdemo.com

HTTP RESPONSE

HTTP/1.1 200 OK
Date: Tue, 28 Aug 2007 01:49:33 GMT
Server: Apache
Transfer-Encoding: chunked
Content-Type: text/html

<html>
<head>
<title>Demo #1</title>
</head>
<body>
<h1>A very simple Web page</h1>
</body>
</html>

The HTML for the page.
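The separation shows up in client code too: the status line and the media type can be handled by unrelated pieces of logic. The C sketch below is a simplified illustration, not a full HTTP parser; the function names are hypothetical, and the media-type check ignores case-insensitivity and parameters beyond a simple prefix match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the numeric status code from a status line such as
 * "HTTP/1.1 200 OK". Returns -1 if the line does not have that shape. */
int parse_status_code(const char *status_line)
{
    int major, minor, code;
    assert(status_line != NULL);
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* The first digit gives the class (2 = success, 4 = client error, ...).
 * A client that only looks at the class can cope with codes added later. */
int status_class(int code)
{
    return code / 100;
}

/* Decide how to treat the body purely from the Content-Type value.
 * Simplification: prefix match only, case-sensitive. */
int is_html(const char *content_type)
{
    assert(content_type != NULL);
    return strncmp(content_type, "text/html", 9) == 0;
}
```

Nothing in `is_html` depends on the status code, and nothing in `parse_status_code` depends on the body format, which is the point of the slides above: each concern can change without touching the other.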

Why modularity and encapsulation?

Sharing and re-use

Layers can evolve separately

Synergies:
– Photoshop and GIMP help everyone who uses JPEG
– Including Web use of image/jpeg media type

Reasoning about systems: correctness proofs, etc.

Hiding complexity

Progressive disclosure of complexity

Making complex functions economical

Noah’s Theory of Simplification Choke Points

Nationwide cable & fiber network & ESS Switches

Very complex telephone switching system

Noah’s Theory of Simplification Choke Points

Nationwide cable & fiber network & ESS Switches

Wonderfully simple choke point interface

RJ-11 Jack & Touch Tones = Talk to anyone in the world using simple touch tone pad; hook up devices

Noah’s Theory of Simplification Choke Points

Nationwide cable & fiber network & ESS Switches

RJ-11 Jack & Touch Tones = Talk to anyone in the world using simple touch tone pad; hook up devices

Group 3 Fax Protocols

Very complex modulation and signalling standard

Noah’s theory….

Nationwide cable & fiber network & ESS Switches

RJ-11 Jack & Touch Tones = Talk to anyone in the world using simple touch tone pad; hook up devices

Group 3 Fax Protocols

Drop in paper, dial #, paper delivered

Example: the Web Stack

Deliver stream to Named Destination

Drop in packet, probably gets there

Internet dynamic routing, ARP, etc.

TCP w/flow control, etc.

Distributed DNS resolution

a) name->ip addr b) UDP Packet to named addr

URIs, Hyperlinks, HTTP Get, Media typed streams, HTML

Click to Browse, worldwide

Each layer hides significant complexity behind simple interface

Layering and Performance

Layering can help performance

Wrap highly tuned implementations in easy-to-use interfaces!

Make those implementations easy to reuse

This is a big, big deal!

But…

Layering can hurt performance

Layering can keep you from getting at details that need to be tuned

Examples:
– Disk errors
– TCP/IP performance
– Compiler optimizations
– COMP 40 UM: clean version vs. optimized version

Layering and disk performance

Many disks and device drivers automatically remap data to a spare cylinder when a sector goes bad; the spares are usually at the inside or outside edge of the disk

But the filesystem may put a critical directory there, unaware that access will be amazingly slow

Thanks to Forest Baskett, who gave me this example in about 1980

Layering and TCP/IP Performance

Hard to share buffers and get alignment right across TCP/IP software layers in the OS

Layered implementations can lead to data copying

Studies show that TCP/IP implementations need to share buffers and optimizations across the device, IP, and TCP layers

The highest performing remote file systems share buffers between network and filesystem code

Watson & Mamrak: “a common mistake is to take a layered design as a requirement for a correspondingly layered implementation.” ACM Transactions on Computer Systems (TOCS), Volume 5 Issue 2, May 1987
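One common way to avoid a copy per layer is to reserve space in front of the payload so that each layer can prepend its header in place. The sketch below illustrates that idea under stated assumptions: `struct pkt`, the 64-byte headroom and the function names are invented for this example and do not come from any particular TCP/IP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEADROOM     64    /* assumed space reserved for lower-layer headers */
#define MAX_PAYLOAD  1024  /* assumed maximum payload for this sketch */

/* Hypothetical packet buffer shared by all layers. The payload is written
 * once; each layer prepends its header into the headroom instead of copying
 * the payload into a new, larger buffer. */
struct pkt {
    unsigned char bytes[HEADROOM + MAX_PAYLOAD];
    size_t start;  /* offset of the first valid byte */
    size_t len;    /* number of valid bytes from start */
};

/* Store the application payload, leaving the headroom free in front of it. */
int pkt_init(struct pkt *p, const void *payload, size_t n)
{
    assert(p != NULL && payload != NULL);
    if (n > MAX_PAYLOAD)
        return -1;
    p->start = HEADROOM;
    p->len = n;
    memcpy(p->bytes + p->start, payload, n);  /* the only payload copy */
    return 0;
}

/* Prepend a header in place; the payload does not move. */
int pkt_push_header(struct pkt *p, const void *hdr, size_t n)
{
    assert(p != NULL && hdr != NULL);
    if (n > p->start)
        return -1;  /* headroom exhausted */
    p->start -= n;
    p->len += n;
    memcpy(p->bytes + p->start, hdr, n);
    return 0;
}

/* Pointer to the first valid byte, i.e. the outermost header. */
const unsigned char *pkt_data(const struct pkt *p)
{
    assert(p != NULL);
    return p->bytes + p->start;
}
```

The price is exactly the leak the slide describes: the layer that allocates the buffer has to know roughly how much header space the layers below will need.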

Layering and compiler optimizations

Compiler front ends tend to respect language layering

Compiler code generators need to optimize across layers

This code doesn’t compute anything useful, but it’s interesting to see how it would be optimized:

int myArray[20];
for (i=0; i<19 && (myArray[i]/2 < 50); i++)
    myArray[i] += myArray[i+1]/2;

A good compiler will remember the pointer to myArray[i], or even the value myArray[i]/2, from the previous loop iteration

I checked: gcc -O3 does indeed do these optimizations

Code from previous example

int i;
unsigned int myArray[SIZE] = {0, 2, 5, 7, 13};

for (i=0; i< (SIZE - 1) && (myArray[i]/2 < 50); i++)
    myArray[i] += myArray[i+1]/2;

        leal    44(%esp), %esi
        xorl    %edx, %edx
        movl    $0, 28(%esp)
        leal    32(%esp), %eax
        movl    $2, 32(%esp)
        movl    $5, 36(%esp)
        movl    $7, 40(%esp)
        movl    $13, 44(%esp)
L4:
        movl    (%eax), %ecx
        movl    %ecx, %ebx
        shrl    %ebx
        addl    %ebx, %edx
        cmpl    %esi, %eax
        movl    %edx, -4(%eax)
        je      L5
        addl    $4, %eax
        cmpl    $99, %ecx
        movl    %ecx, %edx
        jbe     L4
        .p2align 4,,7

Array only read once in loop!
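The same idea can be written out by hand in C. This is a sketch of the kind of transformation involved, not a transcription of gcc's output: the function names are hypothetical, and the second version simply carries each element's value over to the next iteration so that every array element is loaded once.

```c
#include <assert.h>
#include <stddef.h>

/* The loop as written in the source: a[i] is read in the loop test and
 * again in the body, and a[i+1] is read in the body. */
void smooth_plain(unsigned int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 0; i + 1 < n && (a[i] / 2 < 50); i++)
        a[i] += a[i + 1] / 2;
}

/* Hand-applied version of the optimization: each element is loaded once
 * and its value is carried into the next iteration in `cur`. This is safe
 * because a[i+1] is not modified until iteration i+1. */
void smooth_carried(unsigned int *a, size_t n)
{
    assert(a != NULL);
    if (n == 0)
        return;
    unsigned int cur = a[0];
    for (size_t i = 0; i + 1 < n && (cur / 2 < 50); i++) {
        unsigned int next = a[i + 1];  /* the only read in this iteration */
        a[i] = cur + next / 2;
        cur = next;
    }
}
```

Both functions leave the array in the same state; the second just makes explicit a fact that crosses statement boundaries, which is what the optimizer has to discover for itself.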

Abstractions Leak!

Leaky abstractions

When you abstract something…you lose something

Sometimes the details you lose show through

These leaky details can cause big trouble!

See “The Law of Leaky Abstractions”, a posting by Joel Spolsky

http://www.joelonsoftware.com/articles/LeakyAbstractions.html

By the way, Joel is the person behind StackOverflow and other “Stack” sites

Leaky example: Filesystem performance

Sequential access faster than random
– Random causes seeks
– Sequential predictable: allows streaming and pre-fetching

Device geometry “shows through”

SSDs perform differently from hard disks

Factors like these can make it very difficult to predict or tune performance

Leaky example: CPU memory

CPU memory reads faster when locality is good

Cache-aligned loads/stores faster

Multi-core: memory access in one core can slow the other.

Subtlety: even adjacent accesses may be slow if referencing separate cache blocks

Etc.

Factors like these can make it very difficult to predict or tune performance
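A small experiment makes the locality leak concrete. The two hypothetical functions below compute the same sum over a matrix stored row by row; only the traversal order differs. How large the timing gap is depends on the machine and the matrix size, so no numbers are claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row by row in one flat array.
 * Consecutive accesses are adjacent in memory, so locality is good. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same sum, but walking down each column: consecutive accesses are
 * cols * sizeof(int) bytes apart, so for large matrices most of them
 * touch a different cache block. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

The language-level abstraction says the two loops are equivalent, and their results are. Timing them on a matrix much larger than the cache is a simple way to watch the memory hierarchy show through.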

2018: Spectre / Meltdown attacks

Severe security problem discovered in most modern CPUs

Leaky abstractions underlying Spectre / Meltdown
– Caching: recently accessed data available much faster
– Speculative execution: processor works ahead without security checks

if (x < len_of_array_1)
    y = array2[256 * array1[x]];   // may run speculatively even if x >= len_of_array_1

– Branch prediction: CPU will guess a > b if it usually is
– Data from speculative execution winds up in cache even if a < b

Putting it all together: convince the CPU that a is probably > b; the dangerous code then accesses a cache line that depends on supposedly hidden data; the supposedly secret data becomes accessible!

Summary

Separation of concerns is one of the key principles of CS

Proper layering and modularization of your designs and code will bring tremendous benefits

But…beware of “leaky” abstractions, performance concerns, security concerns, etc.