CSE 539S, Spring 2015: Concepts in Multicore Computing
Lecture 1: Introduction
I-Ting Angelina Lee, Jan 13, 2015



Technology Scaling

[Figure: transistor count (×1000) and clock frequency (MHz) on a log scale, 1970 to 2010.]

Transistor count is still rising, but clock speed is bounded at ~4 GHz.

Power Density

Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.

[Figure: power density, had scaling of clock frequency continued its trend of 25%-30% increase per year.]

Technology Scaling

[Figure: transistor count (×1000), clock frequency (MHz), power (W), and number of cores on a log scale, 1970 to 2010.]

Each generation of Moore’s Law potentially doubles the number of cores.

Chip Multiprocessor (CMP)

[Figure: abstract multicore architecture. Processors (P), each with its own cache ($), connect through a network to memory and I/O.]
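The figure leaves the number of processors open, and it differs from machine to machine. As a hedged illustration that is not part of the lecture, a program can ask the operating system at run time. This sketch assumes a platform such as Linux or macOS where `sysconf` accepts `_SC_NPROCESSORS_ONLN`, a widely supported extension rather than a name POSIX requires; `online_cores` is a hypothetical helper name.

```c
#include <unistd.h>

/* Hypothetical helper: number of processors currently online.
   _SC_NPROCESSORS_ONLN is a common extension (glibc, macOS, BSDs),
   not guaranteed by POSIX; fall back to 1 if the query fails. */
long online_cores(void) {
  long n = sysconf(_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}
```

A thread pool or scheduler could size itself from this value instead of hard-coding a core count.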

Fibonacci Numbers

The Fibonacci numbers are the sequence ⟨0, 1, 1, 2, 3, 5, 8, 13, 21, 34, …⟩, where each number is the sum of the previous two.

The sequence is named after Leonardo di Pisa (1170–1250 A.D.), also known as Fibonacci, a contraction of filius Bonaccii, “son of Bonaccio.” Fibonacci’s 1202 book Liber Abaci introduced the sequence to Western mathematics, although it had previously been discovered by Indian mathematicians.

Recurrence: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$.

Fibonacci Program

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

uint64_t fib(uint64_t n) {
  if (n < 2) {
    return n;
  } else {
    uint64_t x = fib(n-1);
    uint64_t y = fib(n-2);
    return (x + y);
  }
}

int main(int argc, char *argv[]) {
  if (argc < 2) { return 1; }  // n must be given as the first argument
  uint64_t n = strtoull(argv[1], NULL, 0);
  uint64_t result = fib(n);
  printf("Fibonacci of %" PRIu64 " is %" PRIu64 ".\n",
         n, result);
  return 0;
}
```

Disclaimer to Algorithms Police: This recursive program is a poor way to compute the nth Fibonacci number, but it provides a good didactic example.
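For comparison, the usual linear-time alternative keeps only the last two values of the recurrence. This is a sketch, not the lecture's code: `fib_iter` is a hypothetical name, and the assertion encodes the fact that $F_{93}$ is the largest Fibonacci number that fits in a `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes F_n in O(n) time and O(1) space by
   sliding a two-value window along F_n = F_{n-1} + F_{n-2}. */
uint64_t fib_iter(uint64_t n) {
  assert(n <= 93);        /* F_94 does not fit in 64 bits */
  uint64_t prev = 0;      /* F_0 */
  uint64_t curr = 1;      /* F_1 */
  if (n == 0) return prev;
  for (uint64_t i = 2; i <= n; i++) {
    uint64_t next = prev + curr;
    prev = curr;
    curr = next;
  }
  return curr;
}
```

It gives the same answers as the recursive fib() but does no redundant work, which is exactly why the recursive version is chosen here only as a teaching example.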

Fibonacci Execution

[Figure: recursion tree of fib(4). fib(4) calls fib(3) and fib(2); fib(3) calls fib(2) and fib(1); each fib(2) calls fib(1) and fib(0).]

Key idea for parallelization: The calculations of fib(n-1) and fib(n-2) can be executed simultaneously without mutual interference.

```c
uint64_t fib(uint64_t n) {
  if (n < 2) {
    return n;
  } else {
    uint64_t x = fib(n-1);
    uint64_t y = fib(n-2);
    return (x + y);
  }
}
```
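The recursion tree also shows how much independent work the naive program creates: the number of calls grows exponentially. A small sketch, with the hypothetical name `fib_calls`, counts them using $C_0 = C_1 = 1$ and $C_n = 1 + C_{n-1} + C_{n-2}$, which works out to $C_n = 2F_{n+1} - 1$. For fib(4) it gives 9, matching the nine nodes of the fib(4) tree.

```c
#include <stdint.h>

/* Hypothetical helper: number of fib() invocations made by the naive
   recursive fib(n), including the top-level call. Computes
   C_0 = C_1 = 1, C_n = 1 + C_{n-1} + C_{n-2} iteratively. */
uint64_t fib_calls(uint64_t n) {
  uint64_t prev = 1;  /* C_{i-2}; starts as C_0 */
  uint64_t curr = 1;  /* C_{i-1}; starts as C_1, ends as C_n */
  for (uint64_t i = 2; i <= n; i++) {
    uint64_t next = 1 + curr + prev;
    prev = curr;
    curr = next;
  }
  return curr;
}
```

For example, fib(30) makes 2,692,537 calls, and the two subtrees under any node can be evaluated independently.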

Pthreads*

∙ Standard API for threading specified by ANSI/IEEE POSIX 1003.1-2008.
∙ Do-it-yourself scheduling.
∙ Built as a library of functions with “special” non-C++ semantics.
∙ Each thread implements an abstraction of a processor; threads are multiplexed onto machine resources.
∙ Threads communicate through shared memory.
∙ Library functions mask the protocols involved in interthread coordination.

*WinAPI threads and Java threads provide similar functionality.
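To make “threads communicate through shared memory” concrete, here is a hedged sketch, not taken from the lecture, in which several threads update one counter. `pthread_mutex_lock` and `pthread_mutex_unlock` are examples of library functions that hide the coordination protocol; the names `shared_state`, `increment`, and `count_with_threads`, and the 16-thread limit, are hypothetical.

```c
#include <pthread.h>

/* Hypothetical state shared by all worker threads. */
typedef struct {
  long counter;          /* the shared memory the threads communicate through */
  pthread_mutex_t lock;  /* protects counter */
  int iters;             /* increments performed by each thread */
} shared_state;

static void *increment(void *arg) {
  shared_state *s = (shared_state *) arg;
  for (int i = 0; i < s->iters; i++) {
    pthread_mutex_lock(&s->lock);    /* enter critical section */
    s->counter++;
    pthread_mutex_unlock(&s->lock);  /* leave critical section */
  }
  return NULL;
}

/* Runs nthreads (1..16) threads that each add iters to one counter.
   Returns the final count, or -1 if arguments or Pthreads calls fail. */
long count_with_threads(int nthreads, int iters) {
  enum { MAX_THREADS = 16 };
  pthread_t tids[MAX_THREADS];
  shared_state s;
  int created = 0;

  if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
  s.counter = 0;
  s.iters = iters;
  if (pthread_mutex_init(&s.lock, NULL) != 0) return -1;

  while (created < nthreads &&
         pthread_create(&tids[created], NULL, increment, &s) == 0) {
    created++;
  }
  for (int i = 0; i < created; i++) {
    pthread_join(tids[i], NULL);  /* wait for every thread that started */
  }
  pthread_mutex_destroy(&s.lock);
  return created == nthreads ? s.counter : -1;
}
```

Without the mutex, concurrent `s->counter++` operations would be a data race, and the final total could come up short.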

Key Pthread Functions

```c
int pthread_create(
  pthread_t *thread,           // returned identifier for the new thread
  const pthread_attr_t *attr,  // object to set thread attributes (NULL for default)
  void *(*func)(void *),       // routine executed after creation
  void *arg                    // a single argument passed to func
);  // returns error status: 0 on success, an error number otherwise

int pthread_join(
  pthread_t thread,            // identifier of thread to wait for
  void **status                // terminating thread's status (NULL to ignore)
);  // returns error status: 0 on success, an error number otherwise
```
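A minimal sketch of the two calls working together, assuming nothing beyond POSIX threads; `square_job`, `square_worker`, and `square_in_thread` are hypothetical names. It passes a pointer in through `arg`, receives the same pointer back through the `status` out-parameter of `pthread_join`, and treats any nonzero return value as an error.

```c
#include <pthread.h>

/* Hypothetical job record: the worker squares value in place. */
typedef struct {
  long value;
} square_job;

static void *square_worker(void *arg) {
  square_job *job = (square_job *) arg;
  job->value = job->value * job->value;
  return job;  /* delivered to the joiner through pthread_join's status */
}

/* Returns x*x computed on a second thread, or -1 if a Pthreads call fails
   (a square is never negative, so -1 cannot be a real result). */
long square_in_thread(long x) {
  square_job job = { x };
  pthread_t tid;
  void *status = NULL;

  if (pthread_create(&tid, NULL, square_worker, &job) != 0) {
    return -1;
  }
  if (pthread_join(tid, &status) != 0) {
    return -1;
  }
  return ((square_job *) status)->value;
}
```

Keeping `job` on the caller's stack is safe here only because the caller joins before returning; a thread that could outlive its creator's frame would need heap-allocated arguments.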

Pthread Implementation

```c
#include <inttypes.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

uint64_t fib(uint64_t n) {
  if (n < 2) {
    return n;
  } else {
    uint64_t x = fib(n-1);
    uint64_t y = fib(n-2);
    return (x + y);
  }
}

typedef struct {
  uint64_t input;
  uint64_t output;
} thread_args;

void *thread_func(void *ptr) {
  uint64_t i = ((thread_args *) ptr)->input;
  ((thread_args *) ptr)->output = fib(i);
  return NULL;
}

int main(int argc, char *argv[]) {
  pthread_t thread;
  thread_args args;
  int status;
  uint64_t result;

  if (argc < 2) { return 1; }
  uint64_t n = strtoull(argv[1], NULL, 0);
  if (n < 30) {
    result = fib(n);
  } else {
    args.input = n-1;
    status = pthread_create(&thread,
                            NULL,
                            thread_func,
                            (void*) &args);
    // main can continue executing
    if (status != 0) { return 1; }  // Pthreads calls return 0 on success
    result = fib(n-2);
    // Wait for the thread to terminate.
    status = pthread_join(thread, NULL);
    if (status != 0) { return 1; }
    result += args.output;
  }
  printf("Fibonacci of %" PRIu64 " is %" PRIu64 ".\n",
         n, result);
  return 0;
}
```

Code walkthrough:

∙ fib(): the original serial code, unchanged.

∙ thread_args: structure for thread arguments.

∙ thread_func(): function called when the thread is created.

∙ if (n < 30): no point in creating a thread if there isn’t enough to do.

∙ args.input = n-1: marshal the input argument to the thread.

∙ pthread_create(): create a thread to execute fib(n-1).

∙ result = fib(n-2): the main program executes fib(n-2) in parallel.

∙ pthread_join(): block until the auxiliary thread finishes.

∙ result += args.output: add the results together to produce the final output.

Issues with Pthreads

Overhead: The cost of creating a thread is more than $10^4$ cycles ⇒ coarse-grained concurrency. (Thread pools can help.)
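To get a feel for this overhead on a particular machine, a rough microbenchmark can time many create/join pairs. This is an assumption-laden sketch, not the lecture's measurement: it reports wall-clock nanoseconds from `clock_gettime(CLOCK_MONOTONIC, ...)` rather than cycles, it includes the cost of `pthread_join`, and `avg_create_join_ns` is a hypothetical name. Multiplying the result by the clock rate in GHz gives a rough cycle count.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing, so only creation/join cost is timed. */
static void *noop(void *arg) { return arg; }

/* Hypothetical helper: average wall-clock nanoseconds per
   pthread_create + pthread_join pair over `trials` runs; -1.0 on error. */
double avg_create_join_ns(int trials) {
  struct timespec start, end;
  if (trials < 1) return -1.0;
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int i = 0; i < trials; i++) {
    pthread_t t;
    if (pthread_create(&t, NULL, noop, NULL) != 0) return -1.0;
    if (pthread_join(t, NULL) != 0) return -1.0;
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  double elapsed = (double) (end.tv_sec - start.tv_sec) * 1e9
                 + (double) (end.tv_nsec - start.tv_nsec);
  return elapsed / trials;
}
```

Results vary widely with the OS, the C library, and system load, so treat any single number as an order-of-magnitude estimate.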

Scalability: The Fibonacci code gets at most about 1.5 speedup for 2 cores. It needs a rewrite for more cores.

Modularity: The Fibonacci logic is no longer neatly encapsulated in the fib() function.

Code Simplicity: Programmers must marshal arguments (shades of 1958!) and engage in error-prone protocols in order to load-balance.
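One way to attack the scalability issue while staying within Pthreads is to apply the same split recursively, spawning a thread for fib(n-1) at each of the top few levels. A rough reason the one-level version stalls: fib(n-1) does about $\varphi \approx 1.6$ times the work of fib(n-2), so on 2 cores the main thread waits for the larger half and the ideal speedup is only about $1 + 1/\varphi \approx 1.6$. The sketch is hypothetical (`pfib`, `pfib_args`, and the `depth` cutoff are not from the lecture), and it illustrates the modularity and code-simplicity complaints as much as it eases the scalability one.

```c
#include <pthread.h>
#include <stdint.h>

/* Serial fib, as in the lecture. */
uint64_t fib(uint64_t n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

/* Hypothetical argument block for one spawned subproblem. */
typedef struct {
  uint64_t n;
  int depth;        /* levels of spawning still allowed */
  uint64_t result;
} pfib_args;

uint64_t pfib(uint64_t n, int depth);

static void *pfib_thread(void *ptr) {
  pfib_args *a = (pfib_args *) ptr;
  a->result = pfib(a->n, a->depth);
  return NULL;
}

/* Spawns a thread for fib(n-1) at each of the top `depth` levels of the
   recursion, so up to 2^depth threads (counting the caller) run at once.
   Below the cutoff, or if pthread_create fails, it runs serially. */
uint64_t pfib(uint64_t n, int depth) {
  if (depth <= 0 || n < 2) {
    return fib(n);
  }
  pfib_args args = { n - 1, depth - 1, 0 };
  pthread_t thread;
  if (pthread_create(&thread, NULL, pfib_thread, &args) != 0) {
    return fib(n - 1) + fib(n - 2);     /* fall back to serial code */
  }
  uint64_t y = pfib(n - 2, depth - 1);  /* overlaps with the child thread */
  pthread_join(thread, NULL);           /* args.result is valid after this */
  return args.result + y;
}
```

The subproblems still have unequal sizes, so a fixed spawning depth does not guarantee good load balance; choosing `depth` from the core count is left to the caller.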

Concurrency Platforms

A concurrency platform should provide:

§ an interface for specifying the logical parallelism of the computation;
§ a runtime layer to automate scheduling and synchronization; and
§ guarantees of performance and resource utilization competitive with hand-tuned code.

[Figure: layered stack with the user application on top, the concurrency platform (linguistic interface, compiler, runtime, tools) beneath it, then the operating system and the hardware.]
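As a hedged illustration of what a platform's linguistic interface can look like, here is fib written with OpenMP tasks (OpenMP is one of the platforms this course covers). The pragmas express only the logical parallelism; the OpenMP runtime decides which thread runs each task, and no arguments are marshaled by hand. This is a sketch, not code from the lecture: `omp_fib`, `omp_fib_root`, and the `n > 20` cutoff are assumptions. Compiled with `-fopenmp` (GCC or Clang), the tasks may run in parallel; without it, the pragmas are ignored and the same code runs serially with the same result.

```c
#include <stdint.h>

/* Hypothetical task-parallel fib: a spawned task computes x while the
   current task computes y; taskwait waits for the child task. The if
   clause runs small subproblems immediately instead of deferring them. */
uint64_t omp_fib(uint64_t n) {
  uint64_t x, y;
  if (n < 2) return n;
  #pragma omp task shared(x) if(n > 20)
  x = omp_fib(n - 1);
  y = omp_fib(n - 2);
  #pragma omp taskwait
  return x + y;
}

/* Creates a team of threads; one thread starts the recursion and the
   others pick up tasks as they are created. */
uint64_t omp_fib_root(uint64_t n) {
  uint64_t result = 0;
  #pragma omp parallel
  #pragma omp single
  result = omp_fib(n);
  return result;
}
```

Compared with the Pthreads version, the Fibonacci logic stays inside one function, which speaks to the modularity and code-simplicity issues.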

Modern Concurrency Platforms*

[Figure: timeline, 1994 to 2012, of modern concurrency platforms: OpenMP; Cilk-5 (MIT); Java Fork/Join Framework (Doug Lea); StreamIt (MIT); Fortress (Oracle Labs); PPL (Microsoft); JCilk (MIT); X10 (IBM); TPL (Microsoft); Cilk++ (CilkArts/Intel); Habanero Java (Rice); Cilk-M (MIT); Cilk Plus (Intel); TBB (Intel); Sequoia++ (Stanford); pH (MIT); NESL (CMU); Habanero C (Rice); CnC (Intel); Habanero Scala (Rice).]

* The list focuses on platforms that run on systems with shared memory.

What You Will Learn in This Class

Design and implementation of concurrency platforms:
• language model: TBB, OpenMP, and Cilk
• analysis of a multithreaded computation
• scheduling algorithm: work-stealing scheduler
• implementation of work-stealing schedulers
• implementation of "cactus stack"
• thread-safe storage allocator

Synchronization (writing a safe parallel program):
• what races are, and different kinds of races
• implementation of synchronization mechanisms
• race detection in Cilk
• parallel race detection
• race detection for pthreaded code
• sequential consistency and linearizability
• concurrent data structures

Performance engineering of parallel programs:
• scalability analysis
• understanding the memory subsystem: cache coherence protocols
• hardware memory model
• language-level memory model

Course Logistics

• Course webpage: http://classes.engineering.wustl.edu/cse539/web/
• Piazza page: https://piazza.com/wustl/spring2015/cse539s/home
  – Please accept the Piazza invitation ASAP.

Assignments

• 1 homework for the entire term, but you’ll want to start early…
• 3 assigned projects
• 1 final (group) project
• There is NO exam.

Grading Policy

• Homework (10%)
• Projects (15% each)
• Final project (20%)
• Class participation (25%)
  – scribe notes
  – in-class discussion

Collaboration Policy

• Please read the policy posted on the webpage.