CSE 539S, Spring 2015 Concepts in Multicore Computing
Lecture 1: Introduction
I-Ting Angelina Lee Jan 13, 2015
Technology Scaling
[Figure: log-scale plot, 1970 to 2010, of transistor count (x 1000) and clock frequency (MHz).]
Transistor count is still rising, but clock speed is bounded at ~4 GHz.
Power Density
Source: Patrick Gelsinger, Intel Developer's Forum, Intel Corporation, 2004.
Projected power density, had the scaling of clock frequency continued its trend of a 25%-30% increase per year.
Technology Scaling
[Figure: log-scale plot, 1970 to 2010, of transistor count (x 1000), clock frequency (MHz), power (W), and number of cores.]
Each generation of Moore's Law potentially doubles the number of cores.
Chip Multiprocessor (CMP)
Abstract Multicore Architecture
[Figure: several processors (P), each with its own cache ($), connected by a network to memory and I/O.]
Fibonacci Numbers
The Fibonacci numbers are the sequence 〈0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …〉, where each number is the sum of the previous two.
The sequence is named after Leonardo di Pisa (1170–1250 A.D.), also known as Fibonacci, a contraction of filius Bonaccii ("son of Bonaccio"). Fibonacci's 1202 book Liber Abaci introduced the sequence to Western mathematics, although it had previously been discovered by Indian mathematicians.
Recurrence: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$.
Fibonacci Program

    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    uint64_t fib(uint64_t n) {
      if (n < 2) {
        return n;
      } else {
        uint64_t x = fib(n-1);
        uint64_t y = fib(n-2);
        return (x + y);
      }
    }

    int main(int argc, char *argv[]) {
      // Guard against a missing command-line argument.
      if (argc < 2) { return 1; }
      uint64_t n = atoi(argv[1]);
      uint64_t result = fib(n);
      printf("Fibonacci of %" PRIu64 " is %" PRIu64 ".\n", n, result);
      return 0;
    }
Disclaimer to Algorithms Police
This recursive program is a poor way to compute the nth Fibonacci number, but it provides a good didactic example.
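For contrast with the recursive program, here is a minimal sketch of a linear-time alternative. The name fib_iter is hypothetical and does not appear in the lecture code; the sketch only illustrates why the recursive version is considered a poor algorithm.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the lecture code): computes the nth
 * Fibonacci number with a single loop by carrying the last two values. */
uint64_t fib_iter(uint64_t n) {
  uint64_t prev = 0;  /* F(0) */
  uint64_t curr = 1;  /* F(1) */
  if (n == 0) {
    return 0;
  }
  for (uint64_t i = 2; i <= n; ++i) {
    uint64_t next = prev + curr;  /* F(i) = F(i-1) + F(i-2) */
    prev = curr;
    curr = next;
  }
  return curr;
}
```

The loop performs one addition per step, whereas the recursive fib makes a number of calls that grows exponentially in n. The result wraps around once the value no longer fits in a uint64_t.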
Fibonacci Execution
fib(4)
  fib(3)
    fib(2)
      fib(1)
      fib(0)
    fib(1)
  fib(2)
    fib(1)
    fib(0)
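The size of this call tree can be checked mechanically. The following is a small sketch, assuming a hypothetical instrumented copy of fib (fib_counted) and a hypothetical wrapper (count_fib_calls); neither name comes from the lecture.

```c
#include <stdint.h>

/* Hypothetical instrumentation: incremented once per invocation. */
static uint64_t fib_calls = 0;

/* Same recursion as the lecture's fib, plus a call counter. */
static uint64_t fib_counted(uint64_t n) {
  ++fib_calls;
  if (n < 2) {
    return n;
  }
  return fib_counted(n - 1) + fib_counted(n - 2);
}

/* Returns how many invocations are needed to compute fib(n). */
uint64_t count_fib_calls(uint64_t n) {
  fib_calls = 0;
  (void) fib_counted(n);
  return fib_calls;
}
```

For n = 4 the count is 9, matching the nine nodes in the tree above.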
Key idea for parallelization
The calculations of fib(n-1) and fib(n-2) can be executed simultaneously without mutual interference.

    uint64_t fib(uint64_t n) {
      if (n < 2) {
        return n;
      } else {
        uint64_t x = fib(n-1);
        uint64_t y = fib(n-2);
        return (x + y);
      }
    }
Pthreads*
∙ Standard API for threading specified by ANSI/IEEE POSIX 1003.1-2008.
∙ Do-it-yourself scheduling.
∙ Built as a library of functions with “special” non-C++ semantics.
∙ Each thread implements an abstraction of a processor; the threads are multiplexed onto machine resources.
∙ Threads communicate through shared memory.
∙ Library functions mask the protocols involved in interthread coordination.
*WinAPI threads and Java threads provide similar functionality.
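As an illustration of threads communicating through shared memory, here is a minimal sketch in which two threads update one counter under a mutex. The struct shared_state, the functions increment_worker and run_shared_counter, and the constant INCREMENTS_PER_THREAD are hypothetical names chosen for this example; only the pthread_* calls are part of the Pthreads API.

```c
#include <pthread.h>
#include <stdint.h>

#define INCREMENTS_PER_THREAD 100000  /* hypothetical workload size */

/* Hypothetical shared state: one counter protected by one mutex. */
typedef struct {
  pthread_mutex_t lock;
  uint64_t counter;
} shared_state;

/* Each thread adds to the same counter; the mutex serializes updates. */
static void *increment_worker(void *ptr) {
  shared_state *s = (shared_state *) ptr;
  for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
    pthread_mutex_lock(&s->lock);
    ++s->counter;
    pthread_mutex_unlock(&s->lock);
  }
  return NULL;
}

/* Runs two workers and returns the final counter (0 on any error). */
uint64_t run_shared_counter(void) {
  shared_state s;
  pthread_t a, b;
  s.counter = 0;
  if (pthread_mutex_init(&s.lock, NULL) != 0) {
    return 0;
  }
  if (pthread_create(&a, NULL, increment_worker, &s) != 0) {
    pthread_mutex_destroy(&s.lock);
    return 0;
  }
  if (pthread_create(&b, NULL, increment_worker, &s) != 0) {
    pthread_join(a, NULL);
    pthread_mutex_destroy(&s.lock);
    return 0;
  }
  pthread_join(a, NULL);
  pthread_join(b, NULL);
  pthread_mutex_destroy(&s.lock);
  return s.counter;
}
```

Without the lock and unlock calls, the two increments could interleave and lose updates. The mutex is one of the library-provided protocols that hides the underlying coordination.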
Key Pthread Functions

    int pthread_create(
      pthread_t *thread,           // returned identifier for the new thread
      const pthread_attr_t *attr,  // object to set thread attributes (NULL for default)
      void *(*func)(void *),       // routine executed after creation
      void *arg                    // a single argument passed to func
    )                              // returns error status

    int pthread_join(
      pthread_t thread,            // identifier of thread to wait for
      void **status                // terminating thread's status (NULL to ignore)
    )                              // returns error status
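To make the role of each parameter concrete, the following sketch passes one argument into a thread and collects its result through the status parameter of pthread_join. The names square_func and square_in_thread are hypothetical; returning a heap-allocated value is just one possible convention, not the one used in the lecture's Fibonacci code.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical thread routine: squares its argument and returns the
 * result in heap memory so that it outlives the thread. */
static void *square_func(void *arg) {
  uint64_t in = *(uint64_t *) arg;
  uint64_t *out = malloc(sizeof *out);
  if (out != NULL) {
    *out = in * in;
  }
  return out;  /* becomes the value seen through pthread_join's status */
}

/* Returns 0 on success and stores value * value in *result. */
int square_in_thread(uint64_t value, uint64_t *result) {
  pthread_t thread;
  void *status = NULL;

  /* value lives on this stack frame, which stays valid until the join. */
  int err = pthread_create(&thread, NULL, square_func, &value);
  if (err != 0) {
    return err;
  }
  err = pthread_join(thread, &status);
  if (err != 0) {
    return err;
  }
  if (status == NULL) {
    return -1;  /* allocation failed inside the thread */
  }
  *result = *(uint64_t *) status;
  free(status);
  return 0;
}
```

Both calls report failure through their integer return value, with 0 meaning success, so the error checks compare against 0.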
Pthread Implementation

    #include <inttypes.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    uint64_t fib(uint64_t n) {
      if (n < 2) {
        return n;
      } else {
        uint64_t x = fib(n-1);
        uint64_t y = fib(n-2);
        return (x + y);
      }
    }

    typedef struct {
      uint64_t input;
      uint64_t output;
    } thread_args;

    void *thread_func(void *ptr) {
      uint64_t i = ((thread_args *) ptr)->input;
      ((thread_args *) ptr)->output = fib(i);
      return NULL;
    }

    int main(int argc, char *argv[]) {
      pthread_t thread;
      thread_args args;
      int status;
      uint64_t result;

      if (argc < 2) { return 1; }
      uint64_t n = strtoul(argv[1], NULL, 0);
      if (n < 30) {
        result = fib(n);
      } else {
        args.input = n-1;
        status = pthread_create(&thread, NULL, thread_func, (void*) &args);
        // main can continue executing
        // pthread functions return an int: 0 on success, an error number otherwise
        if (status != 0) { return 1; }
        result = fib(n-2);
        // Wait for the thread to terminate.
        status = pthread_join(thread, NULL);
        if (status != 0) { return 1; }
        result += args.output;
      }
      printf("Fibonacci of %" PRIu64 " is %" PRIu64 ".\n", n, result);
      return 0;
    }
Pthread Implementation (annotated)
∙ fib(): Original code.
∙ thread_args: Structure for thread arguments.
∙ thread_func(): Function called when thread is created.
∙ if (n < 30): No point in creating a thread if there isn't enough to do.
∙ args.input = n-1: Marshal input argument to thread.
∙ pthread_create(...): Create thread to execute fib(n-1).
∙ result = fib(n-2): Main program executes fib(n-2) in parallel.
∙ pthread_join(...): Block until the auxiliary thread finishes.
∙ result += args.output: Add the results together to produce the final output.
Issues with Pthreads
Overhead
The cost of creating a thread is more than 10^4 cycles ⇒ coarse-grained concurrency. (Thread pools can help.)
Scalability
The Fibonacci code gets a speedup of at most about 1.5 on 2 cores. It needs a rewrite for more cores.
Modularity
The Fibonacci logic is no longer neatly encapsulated in the fib() function.
Code Simplicity
Programmers must marshal arguments (shades of 1958!) and engage in error-prone protocols in order to load-balance.
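The scalability limit can be estimated with a short calculation. This is a back-of-the-envelope sketch under idealized assumptions that are not stated in the slides: T(n) denotes the serial running time of fib(n), and the costs of thread creation and joining are ignored.

```latex
\begin{align*}
T(n) &= T(n-1) + T(n-2) + \Theta(1)
  \;\Longrightarrow\; T(n) = \Theta(\varphi^{n}),
  \qquad \varphi = \frac{1+\sqrt{5}}{2} \approx 1.618, \\
\text{speedup on 2 cores}
  &\le \frac{T(n)}{\max\{T(n-1),\, T(n-2)\}}
   = \frac{T(n)}{T(n-1)}
   \;\longrightarrow\; \varphi \approx 1.62 \quad (n \to \infty).
\end{align*}
```

The two threads receive unequal shares of the work, since fib(n-1) is the larger subproblem, so the thread computing fib(n-1) determines the finishing time. Under this model the ideal bound is roughly 1.6, and once thread overhead is included it is consistent with the figure of about 1.5 quoted above.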
Concurrency Platforms
A concurrency platform should provide:
§ an interface for specifying the logical parallelism of the computation;
§ a runtime layer to automate scheduling and synchronization; and
§ guarantees of performance and resource utilization competitive with hand-tuned code.
[Figure: Concurrency Platform. Components shown: linguistic interface, compiler, runtime, user application, operating system, hardware, and tools.]
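To give a feel for what a linguistic interface for logical parallelism looks like, here is a minimal sketch of the Fibonacci computation using OpenMP tasks (OpenMP is one of the platforms covered in this course). The function names fib_task and fib_openmp and the serial cutoff of 20 are assumptions made for this example, not values from the lecture.

```c
#include <stdint.h>

/* Sketch: recursive fib where the two subproblems are logically parallel.
 * The runtime, not the programmer, decides which thread runs each task. */
static uint64_t fib_task(uint64_t n) {
  if (n < 2) {
    return n;
  }
  if (n < 20) {  /* hypothetical cutoff: small subproblems run serially */
    return fib_task(n - 1) + fib_task(n - 2);
  }
  uint64_t x, y;
  /* x must be shared so the child task's write is visible to the parent. */
  #pragma omp task shared(x)
  x = fib_task(n - 1);
  y = fib_task(n - 2);
  /* Wait for the child task before reading x. */
  #pragma omp taskwait
  return x + y;
}

/* Entry point: one thread starts the recursion; the team executes tasks. */
uint64_t fib_openmp(uint64_t n) {
  uint64_t result = 0;
  #pragma omp parallel
  #pragma omp single
  result = fib_task(n);
  return result;
}
```

With GCC or Clang the pragmas take effect when compiling with -fopenmp; without that flag they are ignored and the code runs serially with the same result. Compared with the Pthreads version, the Fibonacci logic stays inside one function and no argument marshaling is needed.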
Modern Concurrency Platforms*
[Figure: timeline, 1994 to 2012, of the platforms listed below.]
∙ OpenMP
∙ Cilk-5, MIT
∙ Java Fork/Join Framework, Doug Lea
∙ StreamIt, MIT
∙ Fortress, Oracle Labs
∙ PPL, Microsoft
∙ JCilk, MIT
∙ X10, IBM
∙ TPL, Microsoft
∙ Cilk++, CilkArts/Intel
∙ Habanero Java, Rice
∙ Cilk-M, MIT
∙ Cilk Plus, Intel
∙ TBB, Intel
∙ Sequoia++, Stanford
∙ pH, MIT
∙ NESL, CMU
∙ Habanero C, Rice
∙ CnC, Intel
∙ Habanero Scala, Rice
* The list focuses on platforms that run on systems with shared memory.
What You Will Learn in This Class
Design and implementation of concurrency platforms
• language model: TBB, OpenMP, and Cilk
• analysis of a multithreaded computation
• scheduling algorithm: work-stealing scheduler
• implementation of work-stealing schedulers
• implementation of "cactus stack"
• thread-safe storage allocator
What You Will Learn in This Class (continued)
Synchronization: writing a safe parallel program
• what races are, and the different kinds of races
• implementation of synchronization mechanisms
• race detection in Cilk
• parallel race detection
• race detection for pthreaded code
• sequential consistency and linearizability
• concurrent data structures
What You Will Learn in This Class (continued)
Performance engineering your parallel programs
• scalability analysis
• understanding the memory subsystem: cache coherence protocols
• hardware memory model
• language-level memory model
Course Logistics
• Course webpage: http://classes.engineering.wustl.edu/cse539/web/
• Piazza page: https://piazza.com/wustl/spring2015/cse539s/home
  – Please accept the Piazza invitation ASAP.
Assignments
• 1 homework for the entire term, but you will want to start early …
• 3 assigned projects
• 1 final (group) project
• There is NO exam.
Grading Policy
• Homework (10%)
• Projects (15% each)
• Final project (20%)
• Class participation (25%)
  – scribe notes
  – in-class discussion
Collaboration Policy
• Please read the policy posted on the webpage.