Uploaded by mohammad-gulam-ahamad
7/27/2019 32bit power PC.ppt
Comparison instructions
Branch and jump instructions
Simple Code Sequences
Where Are Branches Used?
In C control statements:
If statement
if (n > 0) {
} else {
}
While loop
while (s != NULL) {
}
For loop
for (i = 0; i < N; i++) {
}
Do loop
do {
} while (s != NULL);
Others, e.g. max = (x > y) ? x : y;
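Each of these statements turns into compare-and-branch code. As a rough illustration (not actual compiler output), the sketch below rewrites the `while (s != NULL)` loop with only `if` and `goto`, which is close to the shape a compiler emits: an unconditional jump to the test, then a conditional jump back to the body. The `struct node` type and `count_nodes` function are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node, used only for illustration. */
struct node {
    int value;
    struct node *next;
};

/* Same behavior as:  while (s != NULL) { count++; s = s->next; }
 * but written with explicit unconditional and conditional jumps. */
int count_nodes(const struct node *s)
{
    int count = 0;
    goto test;            /* unconditional branch to the loop test */
body:
    count++;
    s = s->next;
test:
    if (s != NULL)        /* conditional branch back to the body */
        goto body;
    return count;
}
```

Placing the test at the bottom means each iteration needs only one branch; the for-loop code at the end of these slides uses the same layout (`b cmp_` first, then `blt loop`).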
Comparison Instructions
Condition bits in CR or XER are set up:
By arithmetic/logic/shift instructions with the . suffix (e.g. add.)
By comparison instructions
Compare signed word and unsigned word:
cmpw r3, r4 ; set CR0 as for signed r3-r4
cmplw r3, r4 ; set CR0 as for unsigned r3-r4
cmplw: compare logical
Compare using immediate values:
cmpwi r3, 200 ; set CR0 as for signed r3-200
cmplwi r3, 200 ; set CR0 as for unsigned r3-200
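To see why the signed/unsigned distinction matters, here is a small C model (a sketch, not an emulator) of how cmpw and cmplw fill the four bits of a CR field. The function names and the 4-bit encoding LT=8, GT=4, EQ=2, SO=1 (LT most significant, matching the field order LT GT EQ SO) are choices made for this example, and SO is simply copied from a caller-supplied XER[SO] value.

```c
#include <assert.h>
#include <stdint.h>

/* CR field bits in the order LT, GT, EQ, SO (LT is the most significant). */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* Model of cmpw: signed 32-bit comparison; SO is copied from XER[SO]. */
unsigned cmpw(int32_t a, int32_t b, unsigned xer_so)
{
    unsigned c = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return c | (xer_so ? CR_SO : 0u);
}

/* Model of cmplw: the same bit patterns compared as unsigned values. */
unsigned cmplw(uint32_t a, uint32_t b, unsigned xer_so)
{
    unsigned c = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return c | (xer_so ? CR_SO : 0u);
}
```

For example, the register value 0xFFFFFFFF is less than 1 under cmpw (it is -1) but greater than 1 under cmplw, so a following blt would branch in the first case and not in the second.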
Comparison Instructions
Compare and set a specific condition register field
A comparison may specify which CR field to use:
cmpw cr3, r3, r4 ; set CR3 instead of CR0
cmplwi cr2, r3, 200 ; unsigned (logical) compare with an immediate; set CR2
cmpw cr0, r3, r4 ; equivalent to cmpw r3, r4
Condition register (CR) layout: eight 4-bit fields, CR0 through CR7; each field holds the bits LT, GT, EQ, SO.
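The layout above can be expressed as bit arithmetic on a 32-bit value. The sketch below assumes IBM bit numbering (bit 0 is the most significant bit), so CR0 occupies the top four bits and bit LT of field CRn is CR bit 4n. This numbering is what the BI field of the conditional-branch instructions indexes. The helper names `bi_index`, `set_cr_field`, and `cr_bit` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within a 4-bit field, IBM numbering (bit 0 = LT). */
enum { LT = 0, GT = 1, EQ = 2, SO = 3 };

/* CR bit number (0..31) for bit `bit` of field CRn: 4*n + bit. */
unsigned bi_index(unsigned field, unsigned bit)
{
    assert(field < 8 && bit < 4);
    return 4 * field + bit;
}

/* Write a 4-bit result (LT GT EQ SO, LT most significant) into CRn.
 * CR0 is the most significant nibble of the 32-bit register. */
uint32_t set_cr_field(uint32_t cr, unsigned field, unsigned nibble)
{
    assert(field < 8 && nibble < 16);
    unsigned shift = 28 - 4 * field;
    return (cr & ~(UINT32_C(0xF) << shift)) | ((uint32_t)nibble << shift);
}

/* Read CR bit number `bi` (IBM numbering, 0 = most significant). */
unsigned cr_bit(uint32_t cr, unsigned bi)
{
    assert(bi < 32);
    return (cr >> (31 - bi)) & 1u;
}
```

For instance, a cmpw cr3 that finds its operands equal writes 0b0010 into CR3, which sets CR bit 14 (CR3[EQ]).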
Branch Basic Terms
Branch condition, branch target
Unconditional branches: always jump to the target address
Conditional branches: take the branch only if some condition holds
Target address: determines the address of the next instruction when the branch is taken
Unconditional Branches
C:
while (1) {
X = X + 1;
}
Assembly (X kept in r9):
loop: addi r9, r9, 1
b loop ; offset -4
The target loop is specified as an offset from the current instruction (PC-relative); here the offset is -4 bytes, back to the addi.
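The PC-relative offset is what actually ends up in the instruction word. Below is a minimal sketch, assuming the I-form layout of b (opcode 18 in the top 6 bits, the offset in bits 6-29, AA = LK = 0); `encode_b_rel` is a hypothetical helper, not an assembler API. With `b loop` four bytes after `loop`, the offset is -4 and the word comes out as 0x4BFFFFFC.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "b target" (PC-relative, AA = 0, LK = 0) located at address pc.
 * The byte offset must be a multiple of 4 and fit in 26 signed bits
 * (the 24-bit field followed by 0b00). */
uint32_t encode_b_rel(uint32_t pc, uint32_t target)
{
    int32_t offset = (int32_t)(target - pc);
    assert(offset % 4 == 0);
    assert(offset >= -(1 << 25) && offset < (1 << 25));
    return (UINT32_C(18) << 26) | ((uint32_t)offset & UINT32_C(0x03FFFFFC));
}
```

Because the offset is relative, the same word works wherever the loop is placed in memory.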
Conditional Branches
Commonly used branches test condition register field CR0: LT, GT, EQ, SO
Common form: ble target_address
ble: branch if less than or equal (GT=0)
blt: branch if less than (LT=1)
beq: branch if equal (EQ=1)
bne: branch if not equal (EQ=0)
bge: branch if greater than or equal (LT=0)
bgt: branch if greater than (GT=1)
All are encoded in the same instruction format (see next)
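The table above says each mnemonic tests a single CR bit against an expected value. A minimal C sketch of that rule follows; the enum names and the 4-bit field encoding (LT=8, GT=4, EQ=2, SO=1) are assumptions made for the example.

```c
#include <assert.h>

/* 4-bit CR field value: LT=8, GT=4, EQ=2, SO=1 (LT most significant). */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

enum cond { BLT, BLE, BEQ, BGE, BGT, BNE };

/* Each mnemonic tests one bit of the field and branches when that bit
 * has the expected value listed on the slide. */
int branch_taken(enum cond c, unsigned field)
{
    switch (c) {
    case BLT: return (field & CR_LT) != 0;  /* LT = 1 */
    case BLE: return (field & CR_GT) == 0;  /* GT = 0 */
    case BEQ: return (field & CR_EQ) != 0;  /* EQ = 1 */
    case BGE: return (field & CR_LT) == 0;  /* LT = 0 */
    case BGT: return (field & CR_GT) != 0;  /* GT = 1 */
    case BNE: return (field & CR_EQ) == 0;  /* EQ = 0 */
    }
    assert(0 && "unknown condition");
    return 0;
}
```

Note that ble is "not greater than": it only looks at GT, so it branches whether the comparison set LT or EQ.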
Conditional Branches
Using CR fields:
bne cr2, target ; branch if EQ of CR2 is zero
Example: using a branch with a comparison instruction
loop:
addi r3, r3, 1 ; increase r3
cmpw r3, r4 ; compare r3 with r4 (sets CR0)
bne target ; branch if r3 != r4
Example: using a different CR field
loop:
addi r3, r3, 1 ; increase r3
cmpw cr3, r3, r4 ; compare using CR3
bne cr3, target ; branch if r3 != r4
Determining Target Address
1. PC-relative: next PC = PC + EXTS(PC-Offset || 0b00)
2. Absolute: next PC = EXTS(PC-Offset || 0b00)
3. Register: next PC = value of a register; two special registers can be used: LR or CTR
Why sign-extend an address (for absolute addressing)?
Are addresses ever negative?
The upper address space is usually reserved for I/O addresses (say 0xff000000 onwards).
For example, the 16-bit value 0xff00 gets sign-extended to 0xffffff00, so a short absolute branch can reach the top of the address space.
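The three rules and the sign-extension question can be checked with a small C model. It is a sketch under stated assumptions: `exts` and `bc_target` are names made up here, and only the bc case (14-bit BD field, 16 bits effective after appending 0b00) is modeled; the 26-bit case of b works the same way with a wider field.

```c
#include <assert.h>
#include <stdint.h>

/* EXTS: sign-extend the low `bits` bits of `value` to 32 bits. */
uint32_t exts(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t sign = UINT32_C(1) << (bits - 1);
    value &= (UINT32_C(1) << bits) - 1;   /* keep only the field */
    return (value ^ sign) - sign;         /* propagate the sign bit */
}

/* Target of bc: BD || 0b00 is 16 bits, sign-extended; AA chooses
 * absolute (aa = 1) or PC-relative (aa = 0) addressing. */
uint32_t bc_target(uint32_t pc, uint32_t bd14, unsigned aa)
{
    assert(bd14 < 0x4000);
    uint32_t disp = exts(bd14 << 2, 16);
    return aa ? disp : pc + disp;
}
```

With BD = 0x3FC0 and AA = 1, the target is 0xffffff00, an I/O-region address; with AA = 0 the same sign extension is what lets a branch jump backwards.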
Determining Target Address
PC-relative or absolute addressing: the a suffix
Use a PC-relative address: b loop
Use an absolute address: ba loop
Update-LR option: the l suffix; when updating, PC+4 is saved into LR
Do not update LR: b target_addr
Update LR: bl func_addr
Update LR and use an absolute address: bla func_addr
When do we want to save PC+4?
Underlying Details
bx: encodes a 24-bit address (26-bit effective)
bcx: encodes a 14-bit address (16-bit effective)
bclrx: uses the LR register as the target address
bcctrx: uses the CTR register as the target address
x: represents the AA and LK bits, e.g. l, a, la
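Instruction format (field: bits, with bit 0 the most significant bit):
bx:     18 (0-5) | PC-Offset (6-29) | AA (30) | LK (31)
bcx:    16 (0-5) | BO (6-10) | BI (11-15) | BD (16-29) | AA (30) | LK (31)
bclrx:  19 (0-5) | BO (6-10) | BI (11-15) | 00000 (16-20) | 16 (21-30) | LK (31)
bcctrx: 19 (0-5) | BO (6-10) | BI (11-15) | 00000 (16-20) | 528 (21-30) | LK (31)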
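Packing these fields by hand shows how the formats fit into 32 bits. The encoders below are a sketch assuming exactly the layouts listed above; `encode_bc` and `encode_bclr_bcctr` are hypothetical helpers, not part of any assembler. For example, ble *+12 (BO=4, BI=1) encodes as 0x4081000C, blr as 0x4E800020, and bctr as 0x4E800420.

```c
#include <assert.h>
#include <stdint.h>

/* bcx BO, BI, target (opcode 16). `disp` is the byte displacement (or
 * the absolute address if aa = 1); its low 16 bits hold BD || 0b00. */
uint32_t encode_bc(unsigned bo, unsigned bi, int32_t disp,
                   unsigned aa, unsigned lk)
{
    assert(bo < 32 && bi < 32);
    assert(disp % 4 == 0 && disp >= -32768 && disp <= 32764);
    return (UINT32_C(16) << 26) | ((uint32_t)bo << 21) | ((uint32_t)bi << 16)
         | ((uint32_t)disp & UINT32_C(0xFFFC))
         | ((aa & 1u) << 1) | (lk & 1u);
}

/* bclrx (xo = 16) and bcctrx (xo = 528): opcode 19, bits 16-20 zero,
 * extended opcode in bits 21-30. */
uint32_t encode_bclr_bcctr(unsigned bo, unsigned bi, unsigned xo, unsigned lk)
{
    assert(bo < 32 && bi < 32 && (xo == 16 || xo == 528));
    return (UINT32_C(19) << 26) | ((uint32_t)bo << 21) | ((uint32_t)bi << 16)
         | ((uint32_t)xo << 1) | (lk & 1u);
}
```

Note that bclrx and bcctrx share opcode 19; only the extended opcode in bits 21-30 tells them apart.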
Underlying Details: Instruction Fields
BO: branch options
Encodes branching on the CR bit being TRUE or FALSE, or on the CTR value
BI: index of the CR bit to use
Five bits index the 32 CR bits: 3 bits select the CR field, 2 bits select LT, GT, EQ, or SO
BD: branch displacement
14 bits (16 bits effective), sign-extended
AA: absolute address bit; 1 uses absolute addressing, 0 uses PC-relative addressing
LK: link bit; 1 updates LR with PC+4, 0 does not update LR
Underlying Details: BO and BI Fields
Frequently used BO encodings in bc, bclr, and bcctr:
BO=00100 (4): branch if the condition is false
BO=01100 (12): branch if the condition is true
BO=10100 (20): branch always
BO=10000 (16): decrement CTR, then branch if CTR != 0
Examples:
blt target_addr = bc 12, 0, target_addr
blt cr3, target_addr = bc 12, 12, target_addr
blr = bclr 20, 0 : unconditional branch to the address in LR
bnelr = bclr 4, 2 : branch to the address in LR if not equal
Explanation: bc 4, 14, target_addr branches if bit 14 in CR (CR3[EQ]) is false (because BO=4); it is the same as bne cr3, target_addr.
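The four BO values and the BI indexing can be tied together in one function. This is a simplified sketch: `bc_taken` is a hypothetical name, CR uses IBM bit numbering (bit 0 is the most significant bit), only the four BO values listed above are modeled, and the branch-prediction hint bit and other BO encodings are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether bc BO, BI would branch. *ctr is decremented only
 * for BO = 16; other BO values are not modeled in this sketch. */
int bc_taken(unsigned bo, unsigned bi, uint32_t cr, uint32_t *ctr)
{
    assert(bi < 32);
    unsigned bit = (cr >> (31 - bi)) & 1u;   /* CR bit selected by BI */
    switch (bo) {
    case 4:  return bit == 0;                /* branch if condition false */
    case 12: return bit == 1;                /* branch if condition true */
    case 20: return 1;                       /* branch always */
    case 16: return --*ctr != 0;             /* decrement CTR, branch if != 0 */
    default: assert(0 && "BO value not modeled"); return 0;
    }
}
```

For example, with only CR3[LT] set, bc 12, 12 (blt cr3) and bc 4, 14 (bne cr3) both branch, while bc 12, 0 (blt on CR0) does not.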
Underlying Details: AA and LK Fields
Branch examples using the AA and LK bits (both zero by default):
bl target_addr ; branch and save PC+4 in LR
ba target_addr ; branch using absolute addressing
bla target_addr ; branch using absolute addressing
; and save PC+4 in LR
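Saving PC+4 is what makes function calls work: bl records the address of the instruction after the call, and blr later jumps back to it. The sketch below models only PC and LR; `struct cpu`, `exec_b`, and `exec_blr` are hypothetical names, and the branch field is taken as the raw 24-bit value from the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal machine state for this sketch: just PC and LR. */
struct cpu {
    uint32_t pc;
    uint32_t lr;
};

/* Execute b/ba/bl/bla given the 24-bit field and the AA and LK bits. */
void exec_b(struct cpu *c, uint32_t li24, unsigned aa, unsigned lk)
{
    assert(li24 < (UINT32_C(1) << 24));
    uint32_t disp = li24 << 2;                /* field || 0b00: 26 bits */
    if (disp & (UINT32_C(1) << 25))           /* sign-extend to 32 bits */
        disp |= UINT32_C(0xFC000000);
    uint32_t next = aa ? disp : c->pc + disp;
    if (lk)
        c->lr = c->pc + 4;                    /* return address */
    c->pc = next;
}

/* blr: branch unconditionally to the address held in LR. */
void exec_blr(struct cpu *c)
{
    c->pc = c->lr;
}
```

A call made with plain b (LK = 0) would leave LR unchanged, so the callee would have no way to return to its caller.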
Support Procedure Call/Return: Link Register
Supporting function calls:
1. A parent function calls a child function: bl child_func (PC+4 is saved in LR)
Simple Code Sequences
How to translate:
C arithmetic expressions
C if statements
C for loops
Function calls (next week)
C Arithmetic Expressions
Basic operations
static int sum;
static int x1, x2;
static int y1, y2;
sum = (x1+x2)-(y1+y2)+100;
Assembly
lwz r3, 4(r13) ; load x1
lwz r0, 8(r13) ; load x2
add r4, r3, r0 ; x1+x2
lwz r3, 12(r13) ; load y1
lwz r0, 16(r13) ; load y2
add r0, r3, r0 ; y1+y2
subf r3, r0, r4 ; (x1+x2)-(y1+y2), i.e. r4 - r0
addi r0, r3, 100 ; add 100
stw r0, 0(r13) ; store sum
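The operand order of subf is easy to misread: subf rD, rA, rB computes rB - rA ("subtract from"). The sketch below replays the slide's sequence one C statement per instruction, with variables named after the registers they model; `subf` and `sum_expr` are illustrative names, and `uint32_t` is used so that additions wrap around the way 32-bit registers do.

```c
#include <assert.h>
#include <stdint.h>

/* subf rD, rA, rB computes rD = rB - rA. */
uint32_t subf(uint32_t ra, uint32_t rb)
{
    return rb - ra;
}

/* The slide's instruction sequence, one statement per instruction. */
int32_t sum_expr(int32_t x1, int32_t x2, int32_t y1, int32_t y2)
{
    uint32_t r0, r3, r4;
    r3 = (uint32_t)x1;        /* lwz r3, 4(r13)   */
    r0 = (uint32_t)x2;        /* lwz r0, 8(r13)   */
    r4 = r3 + r0;             /* add r4, r3, r0   */
    r3 = (uint32_t)y1;        /* lwz r3, 12(r13)  */
    r0 = (uint32_t)y2;        /* lwz r0, 16(r13)  */
    r0 = r3 + r0;             /* add r0, r3, r0   */
    r3 = subf(r0, r4);        /* subf r3, r0, r4  */
    r0 = r3 + 100;            /* addi r0, r3, 100 */
    return (int32_t)r0;       /* stw r0, 0(r13)   */
}
```

For x1=1, x2=2, y1=3, y2=4 this yields (3) - (7) + 100 = 96.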
Q: What would happen if signed is changed to unsigned?
C Arithmetic Expressions
Sign extension
static short sum;
static short x1, x2;
static short y1, y2;
sum = (x1+x2)-(y1+y2) + 100;
Assembly
lha r3, 2(r13) ; load x1
lha r0, 4(r13) ; load x2
add r4, r3, r0 ; x1+x2
lha r3, 6(r13) ; load y1
lha r0, 8(r13) ; load y2
add r0, r3, r0 ; y1+y2
subf r3, r0, r4 ; minus
addi r0, r3, 100 ; add 100
sth r0, 0(r13) ; store sum
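The point of lha is that a 16-bit short must be sign-extended to 32 bits before the 32-bit add. The sketch below models lha next to lhz (the zero-extending halfword load that an unsigned short would need) and sth, which stores only the low 16 bits; the function names mirror the instructions but are just C helpers written for this example.

```c
#include <assert.h>
#include <stdint.h>

/* lha: load a halfword and sign-extend it to 32 bits. */
uint32_t lha(const uint16_t *p)
{
    return (uint32_t)(int32_t)(int16_t)*p;
}

/* lhz: load a halfword and zero-extend it to 32 bits. */
uint32_t lhz(const uint16_t *p)
{
    return (uint32_t)*p;
}

/* sth: store only the low 16 bits of the register. */
void sth(uint16_t *p, uint32_t r)
{
    *p = (uint16_t)(r & 0xFFFFu);
}
```

Loading the short value -1 (0xFFFF) with lha gives 0xFFFFFFFF, while lhz would give 0x0000FFFF (65535), a different value once it takes part in 32-bit arithmetic.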
If-then-else
C Program
if (x > y)
z = 1;
else z = 0;
Assembly
cmpw r3, r4
ble skip1
li r31, 1
b skip2
skip1: li r31, 0
skip2:
Notes:
Code generated by CodeWarrior and then revised
x in r3; y in r4; z in r31
li r31, 1 is the same as addi r31, 0, 1; li is called a simplified mnemonic
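Because li is only another spelling of addi, both assemble to the same word. The sketch below packs the D-form fields (opcode 14, rD, rA, 16-bit signed immediate); `encode_addi` and `encode_li` are hypothetical helpers. In addi, rA = 0 means the literal value 0 rather than register r0, which is why li rD, v can be written as addi rD, 0, v. The results match the words 3BE00001 and 3BE00000 in the disassembly of this example.

```c
#include <assert.h>
#include <stdint.h>

/* D-form addi rD, rA, SIMM (opcode 14). With rA = 0 the hardware
 * adds the constant 0, not the contents of r0. */
uint32_t encode_addi(unsigned rd, unsigned ra, int16_t simm)
{
    assert(rd < 32 && ra < 32);
    return (UINT32_C(14) << 26) | ((uint32_t)rd << 21) | ((uint32_t)ra << 16)
         | (uint16_t)simm;
}

/* li rD, value: simplified mnemonic for addi rD, 0, value. */
uint32_t encode_li(unsigned rd, int16_t value)
{
    return encode_addi(rd, 0, value);
}
```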
If-then-else
C Program
static int x, y;
static int max;
if (x - y > 0)
max = x;
else
max = y;
Assembly
lwz r4, 4(r13) ; load y
lwz r0, 0(r13) ; load x
subf r0, r4, r0 ; x-y
cmpwi r0, 0x0000 ; x-y>0?
ble skip1 ; no, skip max=x
lwz r0, 0(r13) ; load x
stw r0, 8(r13) ; max=x
b skip2 ; skip max=y
skip1: lwz r0, 4(r13) ; load y
stw r0, 8(r13) ; max=y
skip2:
Notes:
Generated by CodeWarrior and then revised
Can you optimize the code, i.e. reduce the number of instructions while producing the same output?
If-then-else
Disassembled code:
Address    Binary    Assembly
00000048:  7C001800  cmpw r0,r3
0000004C:  4081000C  ble *+12
00000050:  3BE00001  li r31,1
00000054:  48000008  b *+8
00000058:  3BE00000  li r31,0
0000005C:
Assembly Source:
cmpw r0, r3
ble skip1
li r31, 1
b skip2
skip1: li r31, 0
skip2:
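The binary column can be checked by hand: 4081000C has opcode 16, BO = 4 (branch if false) and BI = 1 (CR0[GT]), i.e. "branch if not greater", which is ble. Its displacement 12 added to address 0x4C gives 0x58 (skip1), and 48000008 at 0x54 jumps to 0x5C (skip2). The decoder below is a sketch, not a full disassembler; `branch_target`, `field_bo`, and `field_bi` are hypothetical names, and returning 0 for "not a PC-relative branch" is a convention chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a PC-relative b (opcode 18) or bc (opcode 16) at address pc.
 * Returns 0 for other instructions and for absolute (AA = 1) branches. */
uint32_t branch_target(uint32_t pc, uint32_t insn)
{
    unsigned opcode = insn >> 26;
    unsigned aa = (insn >> 1) & 1u;
    uint32_t disp;
    if (aa)
        return 0;
    if (opcode == 18) {
        disp = insn & UINT32_C(0x03FFFFFC);        /* 24-bit field || 0b00 */
        if (disp & (UINT32_C(1) << 25))
            disp |= UINT32_C(0xFC000000);          /* sign-extend */
    } else if (opcode == 16) {
        disp = insn & UINT32_C(0x0000FFFC);        /* BD || 0b00 */
        if (disp & UINT32_C(0x8000))
            disp |= UINT32_C(0xFFFF0000);          /* sign-extend */
    } else {
        return 0;
    }
    return pc + disp;
}

/* BO (bits 6-10) and BI (bits 11-15) of a bc instruction. */
unsigned field_bo(uint32_t insn) { return (insn >> 21) & 0x1Fu; }
unsigned field_bi(uint32_t insn) { return (insn >> 16) & 0x1Fu; }
```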
For loop
C code
static int sum;
static int X[100];
int i;
sum = 0;
for (i = 0; i < 100; i++)
sum += X[i];
Assembly
li r0, 0 ; sum = 0
stw r0, 0(r13) ; sum = 0
li r31, 0 ; i = 0 (i in r31)
b cmp_ ; jump to the loop test
loop: slwi r4, r31, 2 ; r4 = i*4
lis r3, X@ha ; load X address (high half, adjusted)
addi r3, r3, X@lo ; load X address (low half)
add r3, r3, r4 ; X[i] address
lwz r4, 0(r3) ; load X[i]
lwz r0, 0(r13) ; load sum
add r0, r0, r4 ; sum += X[i]
stw r0, 0(r13) ; store sum
addi r31, r31, 1 ; increase i
cmp_: cmpwi r31, 0x0064 ; 0x64 = 100
blt loop
(generated by CodeWarrior and then revised)
Exercise: (1) How many instructions will be executed? (2) Optimize the code
to reduce the loop body to 4 instructions; (3) further reduce the loop body to 3
instructions. Loop body includes the branch instruction.