Floating Point Numbers & Parallel Computing
Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Fixed-point Numbers
• How to represent rational numbers in binary?
• One way: define a binary “point” between integer and fraction
• Analogous to the point between integer and fraction for decimal numbers:
    6.75
    integer = 6, point, fraction = .75

Fixed-point Numbers
• Point’s position is static (cannot be changed)
• E.g., point goes between the 4th and 5th bits of a byte:
    0110.1100
    4 bits for integer component (0110), 4 bits for fraction component (1100)

Fixed-point Numbers
• Integer component: binary interpreted as before
• LSB is 2^0
    0110.1100
    integer = 2^2 + 2^1
            = 4 + 2
            = 6

Fixed-point Numbers
• Fraction component: binary interpreted slightly differently
• MSB is 2^-1
    0110.1100
    fraction = 2^-1 + 2^-2
             = 0.5 + 0.25
             = 0.75

Fixed-point Numbers
    0110.1100
    integer  = 2^2 + 2^1   = 4 + 2      = 6
    fraction = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
    value    = 6 + 0.75 = 6.75

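The bit weights above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `fixed_to_decimal` and the use of a string with an explicit "." are assumptions made for readability.

```python
def fixed_to_decimal(bits: str) -> float:
    """Interpret an unsigned fixed-point bit string such as '0110.1100'."""
    integer_bits, fraction_bits = bits.split(".")
    value = 0.0
    # Integer part: the bit next to the point has weight 2^0, then 2^1, ...
    for position, bit in enumerate(reversed(integer_bits)):
        value += int(bit) * 2 ** position
    # Fraction part: the bit next to the point has weight 2^-1, then 2^-2, ...
    for position, bit in enumerate(fraction_bits, start=1):
        value += int(bit) * 2 ** -position
    return value


print(fixed_to_decimal("0110.1100"))  # 6.75
```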
Fixed-point Numbers
• How to represent negative numbers?
• 2’s complement notation
    -2.375 = 1101.1010

Fixed-point Numbers
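To read a negative 2’s complement fixed-point number:
1. Invert bits
2. Add 1 (to the least significant bit)
3. Convert to fixed-point decimal
4. Multiply by -1

    1101.1010
    0010.0101    (1. invert bits)
    0010.0110    (2. add 1)
    integer  = 2^1 = 2
    fraction = 2^-2 + 2^-3 = 0.25 + 0.125 = 0.375
    2.375        (3. fixed-point decimal)
    -2.375       (4. multiply by -1)
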
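The four steps can be checked with a short Python sketch. It assumes the same "integer.fraction" string layout as the slides; the function name is hypothetical, and positive inputs are simply passed through unchanged.

```python
def twos_complement_fixed_to_decimal(bits: str) -> float:
    """Decode a 2's complement fixed-point string such as '1101.1010'."""
    integer_bits, fraction_bits = bits.split(".")
    raw = integer_bits + fraction_bits       # drop the point, keep all bits
    width = len(raw)
    value = int(raw, 2)
    if raw[0] == "1":                        # sign bit set: number is negative
        inverted = ~value & ((1 << width) - 1)   # step 1: invert bits
        magnitude = inverted + 1                 # step 2: add 1
        value = -magnitude                       # step 4: multiply by -1
    # Step 3: put the point back by scaling with 2^-(number of fraction bits).
    return value / 2 ** len(fraction_bits)


print(twos_complement_fixed_to_decimal("1101.1010"))  # -2.375
```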
Floating Point Numbers
• Analogous to scientific notation
  • E.g., 4.1 × 10^3 = 4100
• Gets around limitations of constant integer and fraction sizes
• Allows representation of very small and very large numbers

Floating Point Numbers
• Just like scientific notation, floating point numbers have:
  • sign (±)
  • mantissa (M)
  • base (B)
  • exponent (E)

    4.1 × 10^3 = 4100
    M = 4.1
    B = 10
    E = 3

Floating Point Numbers
• Floating point numbers in binary (32 bits)
    sign (1 bit) | exponent (8 bits) | mantissa (23 bits)

Floating Point Numbers
• Example: convert 228 to floating point
    228 (decimal) = 1110 0100 (binary) = 1.1100100 × 2^7
    sign = positive
    exponent = 7
    mantissa = 1.1100100
    base = 2 (implicit)

Floating Point Numbers
    228 (decimal) = 1110 0100 (binary) = 1.1100100 × 2^7
    sign = positive (0)
    exponent = 7
    mantissa = 1.1100100
    base = 2 (implicit)

First attempt, storing the exponent and the full mantissa directly:
    0 0000 0111 11100100000000000000000

Floating Point Numbers
• In binary floating point, the MSB of a normalized mantissa is always 1
• No need to store the MSB of the mantissa (1 is implied)
• Called the “implicit leading 1”
    with leading 1 stored:    0 0000 0111 11100100000000000000000
    with leading 1 implied:   0 0000 0111 11001000000000000000000

Floating Point Numbers
• Exponent must represent both positive and negative numbers
• Floating point uses a biased exponent
  • Original exponent plus a constant bias
  • 32-bit floating point uses bias 127
  • E.g., exponent -4 (2^-4) would be -4 + 127 = 123 = 0111 1011
  • E.g., exponent 7 (2^7) would be 7 + 127 = 134 = 1000 0110
    unbiased exponent:   0 0000 0111 11001000000000000000000
    biased exponent:     0 1000 0110 11001000000000000000000

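The three steps so far (normalize, drop the implicit leading 1, add the bias of 127) can be sketched by hand for small positive integers. This is an illustrative helper with a hypothetical name, not a general encoder: it assumes 0 < n < 2^24 so the value fits in the mantissa without rounding.

```python
def encode_positive_int(n: int) -> str:
    """Return 'sign exponent mantissa' bit fields for a small positive integer."""
    assert 0 < n < 2 ** 24, "sketch only handles integers that need no rounding"
    binary = bin(n)[2:]                  # e.g. 228 -> '11100100'
    exponent = len(binary) - 1           # point moves left 7 places -> 2^7
    fraction = binary[1:]                # drop the implicit leading 1
    mantissa = fraction.ljust(23, "0")   # pad on the right to 23 bits
    biased = exponent + 127              # 32-bit floating point uses bias 127
    return f"0 {biased:08b} {mantissa}"


print(encode_positive_int(228))  # 0 10000110 11001000000000000000000
```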
Floating Point Numbers
• E.g., 228 in floating point binary (IEEE 754 standard)
    0 1000 0110 11001000000000000000000
    sign bit = 0 (positive)
    8-bit biased exponent: E = stored number – bias = 134 – 127 = 7
    23-bit mantissa without implicit leading 1

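One way to check this encoding is to ask Python's standard `struct` module for the raw 32-bit pattern of 228.0 and split it into the three fields. The helper name `float32_fields` is an assumption for this sketch.

```python
import struct


def float32_fields(x: float):
    """Split the IEEE 754 single-precision pattern of x into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # float -> 32-bit integer
    sign = bits >> 31                        # 1 bit
    biased_exponent = (bits >> 23) & 0xFF    # 8 bits
    mantissa = bits & 0x7FFFFF               # 23 bits, implicit leading 1 not stored
    return sign, biased_exponent, mantissa


sign, biased, mantissa = float32_fields(228.0)
print(sign, biased, biased - 127, f"{mantissa:023b}")
# 0 134 7 11001000000000000000000
```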
Floating Point Numbers
• Special cases: 0, ±∞, NaN

    value   sign bit   exponent   mantissa
    0       N/A        00000000   000…000
    +∞      0          11111111   000…000
    -∞      1          11111111   000…000
    NaN     N/A        11111111   non-zero

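The special-case patterns can be built from their fields and handed back to Python to see how they are interpreted. This is a sketch using the standard `struct` and `math` modules; the helper name is hypothetical, and the NaN example picks one arbitrary non-zero mantissa (only the top mantissa bit set).

```python
import math
import struct


def float32_from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Assemble sign (1 bit), exponent (8 bits), mantissa (23 bits) into a float."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


zero = float32_from_fields(0, 0b00000000, 0)
pos_inf = float32_from_fields(0, 0b11111111, 0)
neg_inf = float32_from_fields(1, 0b11111111, 0)
nan = float32_from_fields(0, 0b11111111, 1 << 22)  # any non-zero mantissa is NaN

print(zero, pos_inf, neg_inf, math.isnan(nan))  # 0.0 inf -inf True
```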
Floating Point Numbers
• Single versus double precision
  • Single: 32-bit float
    • Range: ±1.175494 × 10^-38 to ±3.402823 × 10^38
  • Double: 64-bit double
    • Range: ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308

             # bits (total)   # sign bits   # exponent bits   # mantissa bits
    float    32               1             8                 23
    double   64               1             11                52

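The range limits can be reproduced from the bit patterns themselves. The sketch below assumes the quoted lower bounds refer to the smallest normalized magnitudes; it builds the single-precision extremes with `struct` and reads the double-precision ones from `sys.float_info`, since Python's `float` is a 64-bit double on common platforms.

```python
import struct
import sys


def float32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


# Smallest normalized single: exponent field 1, mantissa all zeros (2^-126).
smallest_single = float32_from_bits(0x00800000)
# Largest finite single: exponent field 254, mantissa all ones.
largest_single = float32_from_bits(0x7F7FFFFF)

print(f"{smallest_single:.4e}", f"{largest_single:.4e}")  # 1.1755e-38 3.4028e+38
print(sys.float_info.min, sys.float_info.max)
```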
Superscalar Processors
• Multiple hardwired copies of datapath
• Allows multiple instructions to execute simultaneously
• E.g., 2-way superscalar processor
  • Fetches / executes 2 instructions per cycle
  • 2 ALUs
  • 2-port memory unit
  • 6-port register file (4 source, 2 write back)

Superscalar Processors
• Datapath for 2-way superscalar processor
[Figure: datapath with 2 ALUs, 2-port memory unit, 6-port register file]

Superscalar Processors
• Pipeline for 2-way superscalar processor
• 2 instructions per cycle
[Figure: pipeline diagram]

Superscalar Processors
• Commercial processors can be 3, 4, or even 6-way superscalar
• Very difficult to manage dependencies and hazards
[Figure: Intel Nehalem (6-way superscalar)]

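Why dependencies limit a superscalar processor can be seen in a toy model. The sketch below is a deliberate simplification, not a description of any real design: it assumes an in-order machine that issues up to 2 instructions per cycle, gives every instruction a 1-cycle latency, and ends an issue group at the first instruction that reads or writes a register written earlier in the same group. All names and register labels are hypothetical.

```python
from typing import List, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def issue_cycles(program: List[Instr], width: int = 2) -> int:
    """Count cycles needed to issue `program` on a toy in-order machine."""
    cycles = 0
    i = 0
    while i < len(program):
        written = set()   # registers written by instructions in this cycle's group
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if dest in written or any(src in written for src in sources):
                break     # hazard within the group: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1       # the first instruction of a group always issues
    return cycles


independent = [("r1", ("r2", "r3")), ("r4", ("r5", "r6")),
               ("r7", ("r8", "r9")), ("r10", ("r11", "r12"))]
chained = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")),
           ("r6", ("r4", "r7")), ("r8", ("r6", "r9"))]

print(issue_cycles(independent))  # 2
print(issue_cycles(chained))      # 4
```

With four independent instructions the second datapath is always used; with a chain of dependent instructions it sits idle and the 2-way machine is no faster than a scalar one.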
Multithreading (Terms)
• Process: program running on a computer
  • Can have multiple processes running at the same time
  • E.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run simultaneously
  • E.g., word processor: threads to read input, print, spell check, auto-save

Multithreading (Terms)
• Instruction level parallelism (ILP): number of instructions that can be executed simultaneously for a given program / microarchitecture
  • Practical processors rarely achieve ILP greater than 2 or 3
• Thread level parallelism (TLP): degree to which a process can be split into threads

Multithreading
• Keeps a processor with many execution units busy
  • Even if ILP is low or the program is stalled (waiting for memory)
• For single-core processors, threads give the illusion of simultaneous execution
  • Threads take turns executing (according to OS)
  • OS decides when a thread’s turn begins / ends

Multithreading
• When one thread’s turn ends:
  -- OS saves architectural state
  -- OS loads architectural state of another thread
  -- New thread begins executing
• This is called a context switch
• If context switch is fast enough, user perceives threads as running simultaneously (even on single-core)
[Figure: timeline of threads taking turns, separated by context switches]

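The save / load / resume cycle can be imitated with Python generators, which keep their own local state between turns. This is only an analogy under stated assumptions: here each "thread" gives up the processor voluntarily with `yield`, whereas a real OS decides when a turn ends. The names `thread_body` and `round_robin` are hypothetical.

```python
from collections import deque


def thread_body(name: str, steps: int):
    """A toy thread: does `steps` units of work, pausing after each one."""
    for step in range(steps):
        yield f"{name}:{step}"           # end of this thread's turn


def round_robin(threads):
    """Run toy threads in turns on a single 'core' and record the order."""
    ready = deque(threads)
    trace = []
    while ready:
        current = ready.popleft()        # load the next thread's saved state
        try:
            trace.append(next(current))  # let it run for one turn
            ready.append(current)        # state stays saved inside the generator
        except StopIteration:
            pass                         # thread finished, drop it
    return trace


print(round_robin([thread_body("A", 2), thread_body("B", 3)]))
# ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```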
Multithreading
• Multithreading does NOT improve ILP, but DOES improve processor throughput
  • Threads use resources that are otherwise idle
• Multithreading is relatively inexpensive
  • Only need to save PC and register file
[Figure: execution resources left idle vs. used by the next task]

Homogeneous Multiprocessing
• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with a single shared memory
• Easier to design (than heterogeneous)
• Multiple cores on same (or different) chip(s)
• In 2005, architectures made the shift to SMP

Homogeneous Multiprocessing
• Multiple cores can execute threads concurrently
• True simultaneous execution
• Multi-threaded programming can be tricky
[Figure: threads with single-core vs. multi-core (cores #1 to #4)]

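One reason multi-threaded programming is tricky is that threads share memory, so updates to shared data need coordination. The sketch below uses Python's standard `threading` module with a lock around a shared counter; the function name and default sizes are arbitrary choices for illustration. Note that in standard CPython builds the global interpreter lock limits how much Python code runs truly in parallel, so this shows the shared-memory hazard and its fix rather than a multi-core speedup.

```python
import threading


def count_with_lock(num_threads: int = 4, increments: int = 10_000) -> int:
    """Have several threads increment one shared counter, protected by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:        # only one thread may update the counter at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait for every thread to finish
    return total


print(count_with_lock())  # 40000
```

Without the lock, the read-modify-write in `total += 1` could interleave between threads and lose updates.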
Heterogeneous Multiprocessing
• AKA asymmetric multiprocessing (AMP)
• 2 (or more) different processors
• Specialized processors used for specific tasks
  • E.g., graphics, floating point, FPGAs
• Adds complexity
[Figure: Nvidia GPU]

Heterogeneous Multiprocessing
• Clustered:
  • Each processor has its own memory
  • E.g., PCs connected on a network
• Memory not shared, must pass information between nodes
  • Can be costly
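
Passing information between nodes can be sketched as message passing. The example below is a single-machine stand-in, not a real cluster: each "node" is a thread that only touches data it receives through its own inbox queue, and sends its partial result back as a message. The names `node` and `cluster_sum` are hypothetical.

```python
import queue
import threading


def node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """A toy node: receive a chunk of data, send back its partial sum."""
    local_data = inbox.get()        # message arrives in the node's own memory
    outbox.put(sum(local_data))     # reply with a message, not a shared variable


def cluster_sum(chunks) -> int:
    """Sum several chunks by sending each one to its own node."""
    results = queue.Queue()
    workers = []
    for chunk in chunks:
        inbox = queue.Queue()
        worker = threading.Thread(target=node, args=(inbox, results))
        worker.start()
        inbox.put(list(chunk))      # copy the data: the node gets its own version
        workers.append(worker)
    for worker in workers:
        worker.join()
    return sum(results.get() for _ in workers)


print(cluster_sum([[1, 2, 3], [4, 5], [6]]))  # 21
```

Every value crosses a queue twice (out and back), which hints at why communication between nodes can be costly compared with shared memory.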