Upload
truongkien
View
252
Download
0
Embed Size (px)
Citation preview
1
Acknowledgement: The course slides are adapted from the slides prepared by R.E. Bryant, D.R. O’Hallaron, G. Kesden and Markus Püschel of Carnegie-‐Mellon Univ.
BIL 220 – Introduc7on to Systems Programming Spring 2012
Instructors: Aykut & Erkut Erdem WE’VE GOT
A NEW
COURSE! TOTALLY REDESIGNED
INSIDE AND OUT
2
Overview
¢ Course theme ¢ Five realiQes ¢ How the course fits into the our curriculum ¢ LogisQcs
3
Course Theme: AbstracQon Is Good But Don’t Forget Reality ¢ Most CENG/CS courses emphasize abstracQon § Abstract data types § AsymptoQc analysis
¢ These abstracQons have limits § Especially in the presence of bugs § Need to understand details of underlying implementaQons
¢ Useful outcomes § Become more effecQve programmers
§ Able to find and eliminate bugs efficiently § Able to understand and tune for program performance
§ Prepare for “systems” classes § Computer OrganizaQon, OperaQng Systems, Computer Networks, Computer Architecture, Microprocessors
4
Great Reality #1: Ints are not Integers, Floats are not Reals
¢ Example 1: Is x2 ≥ 0?
§ Float’s: Yes!
§ Int’s: § 40000 * 40000 ➙ 1600000000 § 50000 * 50000 ➙ ??
¢ Example 2: Is (x + y) + z = x + (y + z)? § Unsigned & Signed Int’s: Yes! § Float’s:
§ (1e20 + -‐1e20) + 3.14 -‐-‐> 3.14 § 1e20 + (-‐1e20 + 3.14) -‐-‐> ??
Source: xkcd.com/571
-1794967296
0.00
5
Code Security Example /* Kernel memory region holding user-accessible data */ #define KSIZE 1024 char kbuf[KSIZE]; /* Declaration of library function memcpy */ void *memcpy(void *dest, void *src, size_t n); /* Copy at most maxlen bytes from kernel region to user buffer */ int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len; }
¢ Similar to code found in FreeBSD’s implementaQon of getpeername
¢ There are legions of smart people trying to find vulnerabiliQes in programs
6
Typical Usage /* Kernel memory region holding user-accessible data */ #define KSIZE 1024 char kbuf[KSIZE]; /* Declaration of library function memcpy */ void *memcpy(void *dest, void *src, size_t n); /* Copy at most maxlen bytes from kernel region to user buffer */ int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len; }
#define MSIZE 528 void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, MSIZE); printf(“%s\n”, mybuf); }
7
Malicious Usage
#define MSIZE 528 void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, -MSIZE); . . . }
/* Kernel memory region holding user-accessible data */ #define KSIZE 1024 char kbuf[KSIZE]; /* Declaration of library function memcpy */ void *memcpy(void *dest, void *src, size_t n); /* Copy at most maxlen bytes from kernel region to user buffer */ int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len; }
8
Computer ArithmeQc
¢ Does not generate random values § ArithmeQc operaQons have important mathemaQcal properQes
¢ Cannot assume all “usual” mathemaQcal properQes § Due to finiteness of representaQons § Integer operaQons saQsfy “ring” properQes
§ CommutaQvity, associaQvity, distribuQvity § FloaQng point operaQons saQsfy “ordering” properQes
§ Monotonicity, values of signs
¢ ObservaQon § Need to understand which abstracQons apply in which contexts § Important issues for compiler writers and serious applicaQon programmers
9
Great Reality #2: You’ve Got to Know Assembly ¢ Chances are, you’ll never write programs in assembly § Compilers are much bener & more paQent than you are
¢ But: Understanding assembly is key to machine-‐level execuQon model § Behavior of programs in presence of bugs
§ High-‐level language models break down § Tuning program performance
§ Understand opQmizaQons done / not done by the compiler § Understanding sources of program inefficiency
§ ImplemenQng system sooware § Compiler has machine code as target § OperaQng systems must manage process state
§ CreaQng / fighQng malware § x86 assembly is the language of choice!
10
Assembly Code Example
¢ Time Stamp Counter § Special 64-‐bit register in Intel-‐compaQble machines § Incremented every clock cycle § Read with rdtsc instrucQon
¢ ApplicaQon § Measure Qme (in clock cycles) required by procedure
double t; start_counter(); P(); t = get_counter(); printf("P required %f clock cycles\n", t);
11
Code to Read Counter
¢ Write small amount of assembly code using GCC’s asm facility ¢ Inserts assembly code into machine code generated by compiler
static unsigned cyc_hi = 0; static unsigned cyc_lo = 0; /* Set *hi and *lo to the high and low order bits of the cycle counter. */ void access_counter(unsigned *hi, unsigned *lo) { asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
: "=r" (*hi), "=r" (*lo) : : "%edx", "%eax");
}
12
Great Reality #3: Memory Maners Random Access Memory Is an Unphysical AbstracQon
¢ Memory is not unbounded § It must be allocated and managed § Many applicaQons are memory dominated
¢ Memory referencing bugs especially pernicious § Effects are distant in both Qme and space
¢ Memory performance is not uniform § Cache and virtual memory effects can greatly affect program performance § AdapQng program to characterisQcs of memory system can lead to major speed improvements
13
Memory Referencing Bug Example
¢ Result is architecture specific
double fun(int i) { volatile double d[1] = {3.14}; volatile long int a[2]; a[i] = 1073741824; /* Possibly out of bounds */ return d[0]; }
fun(0) ➙ 3.14 fun(1) ➙ 3.14 fun(2) ➙ 3.1399998664856 fun(3) ➙ 2.00000061035156 fun(4) ➙ 3.14, then segmentation fault
volatile keyword is intended to prevent the compiler from applying any optimizations on
the code that assume values of variables cannot change "on their own."
14
Memory Referencing Bug Example double fun(int i) { volatile double d[1] = {3.14}; volatile long int a[2]; a[i] = 1073741824; /* Possibly out of bounds */ return d[0]; }
fun(0) ➙ 3.14 fun(1) ➙ 3.14 fun(2) ➙ 3.1399998664856 fun(3) ➙ 2.00000061035156 fun(4) ➙ 3.14, then segmentation fault
LocaQon accessed by fun(i)
ExplanaQon: Saved State 4 d7 ... d4 3 d3 ... d0 2 a[1] 1 a[0] 0
15
Memory Referencing Errors
¢ C and C++ do not provide any memory protecQon § Out of bounds array references § Invalid pointer values § Abuses of malloc/free
¢ Can lead to nasty bugs § Whether or not bug has any effect depends on system and compiler § AcQon at a distance
§ Corrupted object logically unrelated to one being accessed § Effect of bug may be first observed long aoer it is generated
¢ How can I deal with this? § Program in Java, Ruby or ML § Understand what possible interacQons may occur § Use or develop tools to detect referencing errors (e.g. Valgrind)
16
Memory System Performance Example
¢ Hierarchical memory organizaQon ¢ Performance depends on access panerns
§ Including how step through mulQ-‐dimensional array
void copyji(int src[2048][2048], int dst[2048][2048]){ int i,j; for (j = 0; j < 2048; j++) for (i = 0; i < 2048; i++) dst[i][j] = src[i][j];}
void copyij(int src[2048][2048], int dst[2048][2048]){ int i,j; for (i = 0; i < 2048; i++) for (j = 0; j < 2048; j++) dst[i][j] = src[i][j];}
21 Qmes slower (PenQum 4)
17
The Memory Mountain
64M
8M 1M
128K
16K
2K 0
1000
2000
3000
4000
5000
6000
7000
s1
s3
s5
s7
s9
s11
s13
s15
s32 Size (bytes)
Rea
d th
roug
hput
(MB
/s)
Stride (x8 bytes)
L1"
L2"
Mem"
L3"
copyij
copyji
Intel Core i7"2.67 GHz"32 KB L1 d-cache"256 KB L2 cache"8 MB L3 cache"
18
Great Reality #4: There’s more to performance than asymptoQc complexity
¢ Constant factors maner too! ¢ And even exact op count does not predict performance
§ Easily see 10:1 performance range depending on how code wrinen § Must opQmize at mulQple levels: algorithm, data representaQons, procedures, and loops
¢ Must understand system to opQmize performance § How programs compiled and executed § How to measure program performance and idenQfy bonlenecks § How to improve performance without destroying code modularity and generality
19
Example Matrix MulQplicaQon
¢ Standard desktop computer, vendor compiler, using opQmizaQon flags ¢ Both implementaQons have exactly the same operaQons count (2n3) ¢ What is going on?
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz (double precision) Gflop/s
160x
Triple loop
Best code (K. Goto)
20
MMM Plot: Analysis Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz Gflop/s
Memory hierarchy and other optimizations: 20x
Vector instructions: 4x
Multiple threads: 4x
¢ Reason for 20x: Blocking or Qling, loop unrolling, array scalarizaQon, instrucQon scheduling, search to find best choice
¢ Effect: fewer register spills, L1/L2 cache misses, and TLB misses
21
Great Reality #5: Computers do more than execute programs
¢ They need to get data in and out § I/O system criQcal to program reliability and performance
¢ They communicate with each other over networks § Many system-‐level issues arise in presence of network
§ Concurrent operaQons by autonomous processes § Coping with unreliable media § Cross plaworm compaQbility § Complex performance issues
22
Role within Hacenepe CENG Curriculum
BIL 212 Computer OrganizaQon
Data Reps. Memory Model
BIL 402 Micro-‐processors
BIL 410 Advanced Computer
Architectures
BIL 426 Computer Networks
BIL 323 OperaQng Systems I
BIL 354 Database Systems
BIL 324 OperaQng Systems II
BIL 220 IntroducQon to
Computer Systems
Processes, Memory Management
Network Protocols
ExecuQon Model Memory System
BIL 131 Bilgisayar
Programlama I
BIL 406 Micro-‐processing
Lab
BIL 428 Computer
Networks Lab
BIL 341 Sooware
Lab
BIL 341 Sooware Lab 2
System ImplementaQon
23
Course PerspecQve
¢ Most Systems Courses are Builder-‐Centric § Computer Architecture
§ Design pipelined processor in Verilog § OperaQng Systems
§ Implement large porQons of operaQng system § Compilers
§ Write compiler for simple language § Networking
§ Implement and simulate network protocols
24
Course PerspecQve (Cont.)
¢ Our Course is Programmer-‐Centric § Purpose is to show how by knowing more about the underlying system, one can be more effecQve as a programmer
§ Enable you to § Write programs that are more reliable and efficient
§ Not just a course for dedicated hackers § We bring out the hidden hacker in everyone
§ Cover material in this course that you won’t see elsewhere
25
Teaching staff
Instructors ¢ Aykut Erdem
[email protected] Office: 111 Tel: 297 7500 / 146
¢ Office Hours: Tuesday, 13:00-‐15:00
¢ Erkut Erdem [email protected] Office: 114 Tel: 297 7500 / 149
Office Hours: Friday, 13:00-‐15:00
Teaching Assistants Ali Caglayan [email protected] Tel: 297 7500 / 165
Office Hours: To be announced..
Oguzhan Guclu [email protected] Tel: 297 7500 / 149
Office Hours: To be announced..
26
Textbook
¢ Randal E. Bryant and David R. O’Hallaron, § “Computer Systems: A Programmer’s PerspecQve, Second EdiQon” (CS:APP2e), PrenQce Hall, 2011
§ hnp://csapp.cs.cmu.edu
¢ This book really maAers for the course! § How to solve labs § PracQce problems typical of exam problems
27
Course Components
¢ Lectures § Higher level concepts
¢ RecitaQons § Applied concepts, important tools and skills for labs, clarificaQon of lectures, exam coverage
¢ Programming Assignments (4) § The heart of the course § 2-‐3 weeks each § Provide in-‐depth understanding of an aspect of systems § Programming and measurement
¢ Exams (2 midterms + final) § Test your understanding of concepts & mathemaQcal principles
28
Geyng Help
¢ Couse web page: § http://web.cs.hacettepe.edu.tr/~bil220 § Complete schedule of lectures, exams, and assignments § Lecture notes and assignments
¢ CommunicaQon § Announcements and course related discussions § Through : https://www.piazza.com/hacettepe.edu.tr/spring2012/bil220
29
Policies: Grading
¢ Midterm 1 (15%) – 29 March 2012 ¢ Midterm 2 (15%) – 26 April 2012 ¢ Final (40%) – To be announced ¢ 4 Programming Assignments (25%) ¢ Class parQcipaQon (5%)
30
Course Overview (1)
¢ Programs and Data § Bits operaQons, arithmeQc, assembly language programs § RepresentaQon of C control and data structures § Includes aspects of architecture and compilers
¢ Performance § Co-‐opQmizaQon (control and data), measuring Qme on a computer § Includes aspects of architecture, compilers, and OS
¢ The Memory Hierarchy § Memory technology, memory hierarchy, caches, disks, locality § Includes aspects of architecture and OS
31
Course Overview (2)
¢ Linking § Ideas from staQc and dynamic link § Includes aspects of architecture and compilers
¢ ExcepQonal Control Flow § Hardware excepQons, processes, process control, Unix signals, non-‐local jumps
§ Includes aspects of compilers, OS, and architecture
¢ Virtual Memory § Virtual memory, address translaQon, dynamic storage allocaQon § Includes aspects of architecture and OS
¢ System Level I/O § Basic concepts of Unix I/O such as files and descriptors § Includes aspects of compilers and OS
32
Assignments
¢ PA1 (datalab): ManipulaQng bits ¢ PA2 (bomblab): Defusing a binary bomb ¢ PA3 (bufferlab): Hacking a buffer bomb ¢ PA4 (shelllab): WriQng your own Unix shell
You will develop all your programming assignments on a machine running Linux OS!
33
Assignment RaQonale
¢ Each assignment has a well-‐defined goal such as solving a puzzle or winning a contest
¢ Doing the lab should result in new skills and concepts
¢ We try to use compeQQon in a fun and healthy way § Set a reasonable threshold for full credit § Post intermediate results (anonymized) on Web page for glory!
34
Policies: Assignments
¢ Work groups § You must work alone on all assignments stated unless otherwise
¢ Submission § Assignments due at 23:59 on Wednesday evening § Electronic submissions (no excepQons!)
¢ Lateness penalQes § Get penalized 20% per day § No late submission later than 3 days aKer due date
35
CheaQng
¢ What is cheaQng? § Sharing code: by copying, retyping, looking at, or supplying a file § Coaching: helping your friend to write a lab, line by line § Copying code from previous course or from elsewhere on WWW
§ Only allowed to use code we supply, or from CS:APP website
¢ What is NOT cheaQng? § Explaining how to use systems or tools § Helping others with high-‐level design issues
¢ Penalty for cheaQng: § Removal from course with failing grade
¢ DetecQon of cheaQng: § We do check § Our tools for doing this are much bener than most cheaters think!
36
Always keep in mind
¢ This is a MUST course. ¢ In order to get the most out of the course, try to stay ahead. § By the weekend, make sure you have reviewed the material covered in the lectures of the preceding week.
¢ If you have any quesQons or just want to chat about the course, do not hesitate to go to office hours.
¢ Nothing is easy, but don’t give up!
37
§ Our goal is to make BIL 220 one of the highly respected and one of the most enjoyable courses in the department!
We aim high!
Photo courtesy of Flickr user richardlamprecht
38
WELCOME to BIL 220!
39
Hello World!
Section 1.2 Programs Are Translated by Other Programs into Different Forms 5
Pre-processor(cpp)
Compiler(cc1)
Assembler(as)
Linker(ld)
hello.c hello.i hello.s hello.o
printf.o
hello
Sourceprogram
(text)
Modifiedsource
program(text)
Assemblyprogram
(text)
Relocatableobject
programs(binary)
Executableobject
program(binary)
Figure 1.3 The compilation system.
Here, the gcc compiler driver reads the source file hello.c and translates it intoan executable object file hello. The translation is performed in the sequenceof four phases shown in Figure 1.3. The programs that perform the four phases(preprocessor, compiler, assembler, and linker) are known collectively as thecompilation system.
. Preprocessing phase.The preprocessor (cpp) modifies the original C programaccording to directives that begin with the # character. For example, the#include <stdio.h> command in line 1 of hello.c tells the preprocessorto read the contents of the system header file stdio.h and insert it directlyinto the program text. The result is another C program, typically with the .isuffix.
. Compilation phase. The compiler (cc1) translates the text file hello.i intothe text file hello.s, which contains an assembly-language program. Eachstatement in an assembly-language program exactly describes one low-levelmachine-language instruction in a standard text form. Assembly language isuseful because it provides a common output language for different compilersfor different high-level languages. For example, C compilers and Fortrancompilers both generate output files in the same assembly language.
. Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable objectprogram, and stores the result in the object file hello.o. The hello.o file isa binary file whose bytes encode machine language instructions rather thancharacters. If we were to view hello.o with a text editor, it would appear tobe gibberish.
. Linking phase.Notice that ourhelloprogram calls theprintf function, whichis part of the standard C library provided by every C compiler. The printffunction resides in a separate precompiled object file called printf.o, whichmust somehow be merged with our hello.o program. The linker (ld) handlesthis merging. The result is the hello file, which is an executable object file (orsimply executable) that is ready to be loaded into memory and executed bythe system.
unix> gcc -o hello hello.c
#include <stdio.h> int main() ���{ printf("hello, world\n"); }
40
Understanding How CompilaQon Systems Work is Really Maners!
¢ OpQmizing program performance § Understanding the basics of machine code and how the compiler translates different C statements into machine code is a must for bener C programming
¢ Understanding link-‐Qme errors § Some of the most of the puzzling programming errors are related to linking
¢ Avoiding security holes § Understanding the consequences of the way data and control informaQon are stored on the program stack is crucial for secure programming
Section 1.2 Programs Are Translated by Other Programs into Different Forms 5
Pre-processor(cpp)
Compiler(cc1)
Assembler(as)
Linker(ld)
hello.c hello.i hello.s hello.o
printf.o
hello
Sourceprogram
(text)
Modifiedsource
program(text)
Assemblyprogram
(text)
Relocatableobject
programs(binary)
Executableobject
program(binary)
Figure 1.3 The compilation system.
Here, the gcc compiler driver reads the source file hello.c and translates it intoan executable object file hello. The translation is performed in the sequenceof four phases shown in Figure 1.3. The programs that perform the four phases(preprocessor, compiler, assembler, and linker) are known collectively as thecompilation system.
. Preprocessing phase.The preprocessor (cpp) modifies the original C programaccording to directives that begin with the # character. For example, the#include <stdio.h> command in line 1 of hello.c tells the preprocessorto read the contents of the system header file stdio.h and insert it directlyinto the program text. The result is another C program, typically with the .isuffix.
. Compilation phase. The compiler (cc1) translates the text file hello.i intothe text file hello.s, which contains an assembly-language program. Eachstatement in an assembly-language program exactly describes one low-levelmachine-language instruction in a standard text form. Assembly language isuseful because it provides a common output language for different compilersfor different high-level languages. For example, C compilers and Fortrancompilers both generate output files in the same assembly language.
. Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable objectprogram, and stores the result in the object file hello.o. The hello.o file isa binary file whose bytes encode machine language instructions rather thancharacters. If we were to view hello.o with a text editor, it would appear tobe gibberish.
. Linking phase.Notice that ourhelloprogram calls theprintf function, whichis part of the standard C library provided by every C compiler. The printffunction resides in a separate precompiled object file called printf.o, whichmust somehow be merged with our hello.o program. The linker (ld) handlesthis merging. The result is the hello file, which is an executable object file (orsimply executable) that is ready to be loaded into memory and executed bythe system.
41
Hardware OrganizaQon of a System 8 Chapter 1 A Tour of Computer Systems
Figure 1.4Hardware organizationof a typical system. CPU:Central Processing Unit,ALU: Arithmetic/LogicUnit, PC: Program counter,USB: Universal Serial Bus.
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
Diskhello executable
stored on disk
systems, but all systems have a similar look and feel. Don’t worry about thecomplexity of this figure just now. We will get to its various details in stagesthroughout the course of the book.
Buses
Running throughout the system is a collection of electrical conduits called busesthat carry bytes of information back and forth between the components. Busesare typically designed to transfer fixed-sized chunks of bytes known as words. Thenumber of bytes in a word (the word size) is a fundamental system parameter thatvaries across systems. Most machines today have word sizes of either 4 bytes (32bits) or 8 bytes (64 bits). For the sake of our discussion here, we will assume a wordsize of 4 bytes, and we will assume that buses transfer only one word at a time.
I/O Devices
Input/output (I/O) devices are the system’s connection to the external world. Ourexample system has four I/O devices: a keyboard and mouse for user input, adisplay for user output, and a disk drive (or simply disk) for long-term storage ofdata and programs. Initially, the executable hello program resides on the disk.
Each I/O device is connected to the I/O bus by either a controller or an adapter.The distinction between the two is mainly one of packaging. Controllers are chipsets in the device itself or on the system’s main printed circuit board (often calledthe motherboard). An adapter is a card that plugs into a slot on the motherboard.Regardless, the purpose of each is to transfer information back and forth betweenthe I/O bus and an I/O device.
8 Chapter 1 A Tour of Computer Systems
Figure 1.4Hardware organizationof a typical system. CPU:Central Processing Unit,ALU: Arithmetic/LogicUnit, PC: Program counter,USB: Universal Serial Bus.
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
Diskhello executable
stored on disk
systems, but all systems have a similar look and feel. Don’t worry about thecomplexity of this figure just now. We will get to its various details in stagesthroughout the course of the book.
Buses
Running throughout the system is a collection of electrical conduits called busesthat carry bytes of information back and forth between the components. Busesare typically designed to transfer fixed-sized chunks of bytes known as words. Thenumber of bytes in a word (the word size) is a fundamental system parameter thatvaries across systems. Most machines today have word sizes of either 4 bytes (32bits) or 8 bytes (64 bits). For the sake of our discussion here, we will assume a wordsize of 4 bytes, and we will assume that buses transfer only one word at a time.
I/O Devices
Input/output (I/O) devices are the system’s connection to the external world. Ourexample system has four I/O devices: a keyboard and mouse for user input, adisplay for user output, and a disk drive (or simply disk) for long-term storage ofdata and programs. Initially, the executable hello program resides on the disk.
Each I/O device is connected to the I/O bus by either a controller or an adapter.The distinction between the two is mainly one of packaging. Controllers are chipsets in the device itself or on the system’s main printed circuit board (often calledthe motherboard). An adapter is a card that plugs into a slot on the motherboard.Regardless, the purpose of each is to transfer information back and forth betweenthe I/O bus and an I/O device.
42
Running the hello Program Section 1.4 Processors Read and Interpret Instructions Stored in Memory 11
Figure 1.5Reading the hellocommand from thekeyboard.
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
Disk
“hello”
Usertypes“hello”
Disk
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
“hello, world\n”
hello code
hello executablestored on disk
Figure 1.6 Loading the executable from disk into main memory.
Reading the hello command from the keyboard.
43
Running the hello Program
Loading the executable from disk into main memory.
Section 1.4 Processors Read and Interpret Instructions Stored in Memory 11
Figure 1.5Reading the hellocommand from thekeyboard.
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
Disk
“hello”
Usertypes“hello”
Disk
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
“hello, world\n”
hello code
hello executablestored on disk
Figure 1.6 Loading the executable from disk into main memory.
44
Running the hello Program 12 Chapter 1 A Tour of Computer Systems
Figure 1.7Writing the output stringfrom memory to thedisplay.
CPU
Register file
PC ALU
Bus interface I/Obridge
System bus Memory bus
Mainmemory
I/O busExpansion slots forother devices suchas network adaptersDisk
controllerGraphicsadapter
DisplayMouse Keyboard
USBcontroller
Disk“hello, world\n”
“hello, world\n”
hello code
hello executablestored on disk
1.5 Caches Matter
An important lesson from this simple example is that a system spends a lot oftime moving information from one place to another. The machine instructions inthe hello program are originally stored on disk. When the program is loaded,they are copied to main memory. As the processor runs the program, instruc-tions are copied from main memory into the processor. Similarly, the data string“hello,world\n”, originally on disk, is copied to main memory, and then copiedfrom main memory to the display device. From a programmer’s perspective, muchof this copying is overhead that slows down the “real work” of the program. Thus,a major goal for system designers is to make these copy operations run as fast aspossible.
Because of physical laws, larger storage devices are slower than smaller stor-age devices. And faster devices are more expensive to build than their slowercounterparts. For example, the disk drive on a typical system might be 1000 timeslarger than the main memory, but it might take the processor 10,000,000 timeslonger to read a word from disk than from memory.
Similarly, a typical register file stores only a few hundred bytes of information,as opposed to billions of bytes in the main memory. However, the processor canread data from the register file almost 100 times faster than from memory. Evenmore troublesome, as semiconductor technology progresses over the years, thisprocessor-memory gap continues to increase. It is easier and cheaper to makeprocessors run faster than it is to make main memory run faster.
To deal with the processor-memory gap, system designers include smallerfaster storage devices called cache memories (or simply caches) that serve astemporary staging areas for information that the processor is likely to need inthe near future. Figure 1.8 shows the cache memories in a typical system. An L1
WriQng the output string from memory to the display.
45
Caches Maner
¢ A system spends a lot of Qme moving informaQon from one place to another.
¢ Larger storage devices are slower than smaller storage devices. § e.g., the processor can read data from the register file almost 100 Qmes faster than from main memory.
¢ Faster devices are more expensive to build than their slower counterparts.
¢ Cache memories as temporary staging areas Section 1.6 Storage Devices Form a Hierarchy 13
Figure 1.8Cache memories.
I/Obridge
CPU chip
Cachememories
Register file
System bus Memory bus
Bus interfaceMain
memory
ALU
cache on the processor chip holds tens of thousands of bytes and can be accessednearly as fast as the register file. A larger L2 cache with hundreds of thousandsto millions of bytes is connected to the processor by a special bus. It might take 5times longer for the process to access the L2 cache than the L1 cache, but this isstill 5 to 10 times faster than accessing the main memory. The L1 and L2 caches areimplemented with a hardware technology known as static random access memory(SRAM). Newer and more powerful systems even have three levels of cache: L1,L2, and L3. The idea behind caching is that a system can get the effect of botha very large memory and a very fast one by exploiting locality, the tendency forprograms to access data and code in localized regions. By setting up caches to holddata that is likely to be accessed often, we can perform most memory operationsusing the fast caches.
One of the most important lessons in this book is that application program-mers who are aware of cache memories can exploit them to improve the perfor-mance of their programs by an order of magnitude. You will learn more aboutthese important devices and how to exploit them in Chapter 6.
1.6 Storage Devices Form a Hierarchy
This notion of inserting a smaller, faster storage device (e.g., cache memory)between the processor and a larger slower device (e.g., main memory) turns outto be a general idea. In fact, the storage devices in every computer system areorganized as a memory hierarchy similar to Figure 1.9. As we move from the topof the hierarchy to the bottom, the devices become slower, larger, and less costlyper byte. The register file occupies the top level in the hierarchy, which is knownas level 0, or L0. We show three levels of caching L1 to L3, occupying memoryhierarchy levels 1 to 3. Main memory occupies level 4, and so on.
The main idea of a memory hierarchy is that storage at one level serves as acache for storage at the next lower level. Thus, the register file is a cache for theL1 cache. Caches L1 and L2 are caches for L2 and L3, respectively. The L3 cacheis a cache for the main memory, which is a cache for the disk. On some networkedsystems with distributed file systems, the local disk serves as a cache for data storedon the disks of other systems.
46
The Memory Hierarchy
¢ Storage devices form a hierarchy ¢ Storage at one level serves as a cache for storage at the next lower level.
14 Chapter 1 A Tour of Computer Systems
CPU registers hold words retrieved from cache memory.
L1 cache holds cache lines retrieved from L2 cache.
L2 cache holds cache linesretrieved from L3 cache.
Main memory holds disk blocks retrieved from local disks.
Local disks hold filesretrieved from disks onremote network server.
Regs
L3 cache(SRAM)
L2 cache(SRAM)
L1 cache(SRAM)
Main memory(DRAM)
Local secondary storage(local disks)
Remote secondary storage(distributed file systems, Web servers)
Smaller,faster,and
costlier(per byte)storagedevices
Larger,slower,
andcheaper
(per byte)storagedevices
L0:
L1:
L2:
L3:
L4:
L5:
L6:
L3 cache holds cache linesretrieved from memory.
Figure 1.9 An example of a memory hierarchy.
Just as programmers can exploit knowledge of the different caches to improveperformance, programmers can exploit their understanding of the entire memoryhierarchy. Chapter 6 will have much more to say about this.
1.7 The Operating System Manages the Hardware
Back to our hello example. When the shell loaded and ran the hello program,and when the hello program printed its message, neither program accessed thekeyboard, display, disk, or main memory directly. Rather, they relied on theservices provided by the operating system. We can think of the operating system asa layer of software interposed between the application program and the hardware,as shown in Figure 1.10. All attempts by an application program to manipulate thehardware must go through the operating system.
The operating system has two primary purposes: (1) to protect the hardwarefrom misuse by runaway applications, and (2) to provide applications with simpleand uniform mechanisms for manipulating complicated and often wildly differentlow-level hardware devices. The operating system achieves both goals via the
Figure 1.10Layered view of acomputer system.
Application programs
Operating system
Main memory I/O devicesProcessor
Software
Hardware
47
The OperaQng System Manages the Hardware ¢ The operaQng system as a layer of sooware interposed between the applicaQon program and the hardware
¢ All anempts by an applicaQon program to manipulate the hardware must go through the operaQng system.
¢ The operaQng system has two primary purposes: 1. To protect the hardware from misuse by runaway applicaQons, and 2. To provide applicaQons with simple and uniform mechanisms for
manipulaQng complicated and ooen wildly different low-‐level hardware devices.
14 Chapter 1 A Tour of Computer Systems
CPU registers hold words retrieved from cache memory.
L1 cache holds cache lines retrieved from L2 cache.
L2 cache holds cache linesretrieved from L3 cache.
Main memory holds disk blocks retrieved from local disks.
Local disks hold filesretrieved from disks onremote network server.
Regs
L3 cache(SRAM)
L2 cache(SRAM)
L1 cache(SRAM)
Main memory(DRAM)
Local secondary storage(local disks)
Remote secondary storage(distributed file systems, Web servers)
Smaller,faster,and
costlier(per byte)storagedevices
Larger,slower,
andcheaper
(per byte)storagedevices
L0:
L1:
L2:
L3:
L4:
L5:
L6:
L3 cache holds cache linesretrieved from memory.
Figure 1.9 An example of a memory hierarchy.
Just as programmers can exploit knowledge of the different caches to improveperformance, programmers can exploit their understanding of the entire memoryhierarchy. Chapter 6 will have much more to say about this.
1.7 The Operating System Manages the Hardware
Back to our hello example. When the shell loaded and ran the hello program,and when the hello program printed its message, neither program accessed thekeyboard, display, disk, or main memory directly. Rather, they relied on theservices provided by the operating system. We can think of the operating system asa layer of software interposed between the application program and the hardware,as shown in Figure 1.10. All attempts by an application program to manipulate thehardware must go through the operating system.
The operating system has two primary purposes: (1) to protect the hardwarefrom misuse by runaway applications, and (2) to provide applications with simpleand uniform mechanisms for manipulating complicated and often wildly differentlow-level hardware devices. The operating system achieves both goals via the
Figure 1.10Layered view of acomputer system.
Application programs
Operating system
Main memory I/O devicesProcessor
Software
Hardware
48
AbstracQons provided by an OS
¢ The operaQng system achieves both goals via the fundamental abstracQons” § Processes: The operaQng system’s abstracQon for running programs. § Virtual Memory: An abstracQon that provides each process with the illusion that it has exclusive use of the main memory.
§ Files: AbstracQons for I/O devices Section 1.7 The Operating System Manages the Hardware 15
Figure 1.11Abstractions provided byan operating system.
Main memory I/O devicesProcessor
Processes
Virtual memory
Files
fundamental abstractions shown in Figure 1.11: processes, virtual memory, andfiles. As this figure suggests, files are abstractions for I/O devices, virtual memoryis an abstraction for both the main memory and disk I/O devices, and processesare abstractions for the processor, main memory, and I/O devices. We will discusseach in turn.
Aside Unix and Posix
The 1960s was an era of huge, complex operating systems, such as IBM’s OS/360 and Honeywell’sMultics systems. While OS/360 was one of the most successful software projects in history, Multicsdragged on for years and never achieved wide-scale use. Bell Laboratories was an original partner in theMultics project, but dropped out in 1969 because of concern over the complexity of the project and thelack of progress. In reaction to their unpleasant Multics experience, a group of Bell Labs researchers—Ken Thompson, Dennis Ritchie, Doug McIlroy, and Joe Ossanna—began work in 1969 on a simpleroperating system for a DEC PDP-7 computer, written entirely in machine language. Many of the ideasin the new system, such as the hierarchical file system and the notion of a shell as a user-level process,were borrowed from Multics but implemented in a smaller, simpler package. In 1970, Brian Kernighandubbed the new system “Unix” as a pun on the complexity of “Multics.” The kernel was rewritten inC in 1973, and Unix was announced to the outside world in 1974 [89].
Because Bell Labs made the source code available to schools with generous terms, Unix developeda large following at universities. The most influential work was done at the University of Californiaat Berkeley in the late 1970s and early 1980s, with Berkeley researchers adding virtual memory andthe Internet protocols in a series of releases called Unix 4.xBSD (Berkeley Software Distribution).Concurrently, Bell Labs was releasing their own versions, which became known as System V Unix.Versions from other vendors, such as the Sun Microsystems Solaris system, were derived from theseoriginal BSD and System V versions.
Trouble arose in the mid 1980s as Unix vendors tried to differentiate themselves by adding newand often incompatible features. To combat this trend, IEEE (Institute for Electrical and ElectronicsEngineers) sponsored an effort to standardize Unix, later dubbed “Posix” by Richard Stallman. Theresult was a family of standards, known as the Posix standards, that cover such issues as the C languageinterface for Unix system calls, shell programs and utilities, threads, and network programming. Asmore systems comply more fully with the Posix standards, the differences between Unix versions aregradually disappearing.
49
Binary RepresentaQons
0.0V"0.5V"
2.8V"3.3V"
0" 1" 0"
50
Encoding Byte Values
¢ Byte = 8 bits § Binary 000000002 to 111111112 § Decimal: 010 to 25510 § Hexadecimal 0016 to FF16
§ Base 16 number representaQon § Use characters ‘0’ to ‘9’ and ‘A’ to ‘F’ § Write FA1D37B16 in C as
– 0xFA1D37B – 0xfa1d37b
0 0 0000 1 1 0001 2 2 0010 3 3 0011 4 4 0100 5 5 0101 6 6 0110 7 7 0111 8 8 1000 9 9 1001 A 10 1010 B 11 1011 C 12 1100 D 13 1101 E 14 1110 F 15 1111
51
Byte-‐Oriented Memory OrganizaQon
¢ Programs Refer to Virtual Addresses § Conceptually very large array of bytes § Actually implemented with hierarchy of different memory types § System provides address space private to parQcular “process”
§ Program being executed § Program can clobber its own data, but not that of others
¢ Compiler + Run-‐Time System Control AllocaQon § Where different program objects should be stored § All allocaQon within single virtual address space
• • •"
52
Machine Words
¢ Machine Has “Word Size” § Nominal size of integer-‐valued data
§ Including addresses § Most current machines use 32 bits (4 bytes) words
§ Limits addresses to 4GB § Becoming too small for memory-‐intensive applicaQons
§ High-‐end systems use 64 bits (8 bytes) words § PotenQal address space ≈ 1.8 X 1019 bytes § x86-‐64 machines support 48-‐bit addresses: 256 Terabytes
§ Machines support mulQple data formats § FracQons or mulQples of word size § Always integral number of bytes
53
Word-‐Oriented Memory OrganizaQon ¢ Addresses Specify Byte LocaQons
§ Address of first byte in word § Addresses of successive words differ by 4 (32-‐bit) or 8 (64-‐bit)
0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011
32-bit"Words" Bytes" Addr."
0012 0013 0014 0015
64-bit"Words"
Addr "="??
Addr "="??
Addr "="??
Addr "="??
Addr "="??
Addr "="??
0000
0004
0008
0012
0000
0008
54
Data RepresentaQons
C Data Type Typical 32-bit Intel IA32 x86-64
char 1 1 1
short 2 2 2
int 4 4 4
long 4 4 8
long long 8 8 8
float 4 4 4
double 8 8 8
long double 8 10/12 10/16
pointer 4 4 8
55
Byte Ordering
¢ How should bytes within a mulQ-‐byte word be ordered in memory?
¢ ConvenQons § Big Endian: Sun, PPC Mac, Internet
§ Least significant byte has highest address § Linle Endian: x86
§ Least significant byte has lowest address
56
Byte Ordering Example
¢ Big Endian § Least significant byte has highest address
¢ Linle Endian § Least significant byte has lowest address
¢ Example § Variable x has 4-‐byte representaQon 0x01234567 § Address given by &x is 0x100
0x100 0x101 0x102 0x103
01 23 45 67
0x100 0x101 0x102 0x103
67 45 23 01
Big Endian"
Little Endian"
01 23 45 67
67 45 23 01
57
The origin of “endian”
“Gulliver finds out that there is a law, proclaimed by the grandfather of the present ruler, requiring all ciQzens of Lilliput to break their eggs only at the linle ends. Of course, all those ciQzens who broke their eggs at the big ends were angered by the proclamaQon. Civil war broke out between the Linle-‐Endians and the Big-‐Endians, resulQng in the Big-‐Endians taking refuge on a nearby island, the kingdom of Blefuscu.”
– Danny Cohen, On Holy Wars and A Plea For Peace, 1980
IllustraQon by Chris Beatrice
58
Address "Instruction Code "Assembly Rendition" 8048365: 5b pop %ebx 8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx 804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)
Reading Byte-‐Reversed LisQngs
¢ Disassembly § Text representaQon of binary machine code § Generated by program that reads the machine code
¢ Example Fragment
¢ Deciphering Numbers § Value: 0x12ab § Pad to 32 bits: 0x000012ab § Split into bytes: 00 00 12 ab § Reverse: ab 12 00 00
59
Examining Data RepresentaQons
¢ Code to Print Byte RepresentaQon of Data § CasQng pointer to unsigned char * creates byte array
Printf directives:"%p: "Print pointer"%x: "Print Hexadecimal"
typedef unsigned char *pointer; void show_bytes(pointer start, int len){ int i; for (i = 0; i < len; i++) printf(”%p\t0x%.2x\n",start+i, start[i]); printf("\n"); }
60
show_bytes ExecuQon Example int a = 15213; printf("int a = 15213;\n"); show_bytes((pointer) &a, sizeof(int));
Result (Linux):"int a = 15213; 0x11ffffcb8 0x6d 0x11ffffcb9 0x3b 0x11ffffcba 0x00 0x11ffffcbb 0x00
61
RepresenQng Integers Decimal: "15213 Binary: 0011 1011 0110 1101
Hex: 3 B 6 D
6D 3B 00 00
IA32, x86-64"
3B 6D
00 00
Sun"
int A = 15213;
93 C4 FF FF
IA32, x86-64"
C4 93
FF FF
Sun"
Two’s complement representation"(Covered later)"
int B = -15213;
long int C = 15213;
00 00 00 00
6D 3B 00 00
x86-64"
3B 6D
00 00
Sun"
6D 3B 00 00
IA32"
62
RepresenQng Pointers
Different compilers & machines assign different locaQons to objects
int B = -15213; int *P = &B;
x86-64"Sun" IA32"EFFFFB2C
D4F8FFBF
0C89ECFFFF7F0000
63
char S[6] = "18243";
RepresenQng Strings
¢ Strings in C § Represented by array of characters § Each character encoded in ASCII format
§ Standard 7-‐bit encoding of character set § Character “0” has code 0x30
– Digit i has code 0x30+i § String should be null-‐terminated
§ Final character = 0
¢ CompaQbility § Byte ordering not an issue
Linux/Alpha" Sun"
313832343300
313832343300
64
Boolean Algebra
¢ Developed by George Boole in 19th Century § Algebraic representaQon of logic
§ Encode “True” as 1 and “False” as 0
And n A&B = 1 when both A=1 and B=1
Or n A|B = 1 when either A=1 or B=1
Not n ~A = 1 when A=0
Exclusive-‐Or (Xor) n A^B = 1 when either A=1 or B=1, but not both
65
ApplicaQon of Boolean Algebra
¢ Applied to Digital Systems by Claude Shannon § 1937 MIT Master’s Thesis § Reason about networks of relay switches
§ Encode closed switch as 1, open switch as 0
A"
~A"
~B"
B"
Connection when" "
A&~B | ~A&B"" "
A&~B"
~A&B"= A^B"
66
General Boolean Algebras
¢ Operate on Bit Vectors § OperaQons applied bitwise
¢ All of the ProperQes of Boolean Algebra Apply
01101001 & 01010101 01000001
01101001 | 01010101 01111101
01101001 ^ 01010101 00111100
~ 01010101 10101010 01000001 01111101 00111100 10101010
67
RepresenQng & ManipulaQng Sets
¢ RepresentaQon § Width w bit vector represents subsets of {0, …, w–1} § aj = 1 if j ∈ A
§ 01101001 { 0, 3, 5, 6 } § 76543210
§ 01010101 { 0, 2, 4, 6 } § 76543210
¢ OperaQons § & IntersecQon 01000001 { 0, 6 } § | Union 01111101 { 0, 2, 3, 4, 5, 6 } § ^ Symmetric difference 00111100 { 2, 3, 4, 5 } § ~ Complement 10101010 { 1, 3, 5, 7 }
68
Bit-‐Level OperaQons in C
¢ OperaQons &, |, ~, ^ Available in C § Apply to any “integral” data type
§ long, int, short, char, unsigned§ View arguments as bit vectors § Arguments applied bit-‐wise
¢ Examples (Char data type) § ~0x41 ➙ 0xBE
§ ~010000012 ➙ 101111102§ ~0x00 ➙ 0xFF
§ ~000000002 ➙ 111111112§ 0x69 & 0x55 ➙ 0x41
§ 011010012 & 010101012 ➙ 010000012§ 0x69 | 0x55 ➙ 0x7D
§ 011010012 | 010101012 ➙ 011111012
69
Contrast: Logic OperaQons in C
¢ Contrast to Logical Operators § &&, ||, !
§ View 0 as “False” § Anything nonzero as “True” § Always return 0 or 1 § Early terminaQon
¢ Examples (char data type) § !0x41 ➙ 0x00§ !0x00 ➙ 0x01§ !!0x41 ➙ 0x01
§ 0x69 && 0x55 ➙ 0x01§ 0x69 || 0x55 ➙ 0x01§ p && *p (avoids null pointer access)
70
Shio OperaQons
¢ Leo Shio: x << y § Shio bit-‐vector x leo y posiQons
– Throw away extra bits on leo § Fill with 0’s on right
¢ Right Shio: x >> y § Shio bit-‐vector x right y posiQons
§ Throw away extra bits on right § Logical shio
§ Fill with 0’s on leo § ArithmeQc shio
§ Replicate most significant bit on right
¢ Undefined Behavior § Shio amount < 0 or ≥ word size
01100010 Argument x
00010000 << 3
00011000 Log. >> 2
00011000 Arith. >> 2
10100010 Argument x
00010000 << 3
00101000 Log. >> 2
11101000 Arith. >> 2
00010000 00010000
00011000 00011000
00011000 00011000
00010000
00101000
11101000
00010000
00101000
11101000
71
The Strange Birth and Long Life of Unix
¢ hnp://spectrum.ieee.org/compuQng/sooware/the-‐strange-‐birth-‐and-‐long-‐life-‐of-‐unix
Photo: Alcatel-‐Lucent