Computer Organization and Structure
Homework
#4
Due:
2010/12/21
1. Different
instructions utilize different hardware blocks in the basic single-cycle
implementation. The next three problems refer to the following instruction:
|
Instruction |
Interpretation |
a. |
add
Rd, Rs, Rt |
Reg[Rd]=Reg[Rs]+Reg[Rt] |
b. |
lw Rt, Offs(Rs) |
Reg[Rt]=Mem[Reg[Rs]+Offs] |
Figure
1: The basic implementation of the
MIPS subset, including the necessary multiplexors and control lines.
a.
What are the values of control signals
generated by the control in Figure 1 for this
instruction?
b.
Which resources (blocks) perform a
useful function for this instruction?
c.
Which resources (blocks) produce
outputs, but their outputs are not used for this instruction? Which resources
produce no outputs for this instruction?
Different
execution units and blocks of digital logic have different latencies (time
needed to do their work). In Figure 1 there are seven
kinds of major blocks. Latencies of blocks along the critical (longest-latency)
path for an instruction determine the minimum latency of that instruction. For
the following three problems, assume the following resource latencies:
|
I-Mem |
Add |
Mux |
ALU |
Regs |
D-Mem |
Control |
a. |
400ps |
100ps |
30ps |
120ps |
200ps |
350ps |
100ps |
b. |
500ps |
150ps |
100ps |
180ps |
220ps |
1000ps |
65ps |
d.
What is the critical path for a MIPS AND
instruction?
e.
What is the critical path for a MIPS
load (LD)
instruction?
f.
What is the critical path for a MIPS BEQ
instruction?
2. Assuming
that logic blocks needed to implement a processorfs datapath
have the following latencies:
|
I-Mem |
Add |
Mux |
ALU |
Regs |
D-Mem |
Sign-extend |
Shift-left-2 |
a. |
400ps |
100ps |
30ps |
120ps |
200ps |
350ps |
20ps |
2ps |
b. |
500ps |
150ps |
100ps |
180ps |
220ps |
1000ps |
90ps |
20ps |
Figure
2: A portion of the datapath used for fetching instructions and incrementing
the program counter. The fetched instruction is used by other parts of the datapath.
a.
If the only thing we need to do in a
processor is fetch consecutive instructions (Figure 2), what would
the cycle time be?
b.
Consider a datapath
similar to the one in Figure 3, but for a
processor that only has one type of instruction: unconditional PC-relative
branch. What would the cycle time be for this datapath?
c.
Repeat the above problem, but this time
we need to support only conditional
PC-relative branches.
The
remaining three problems refer to the following logic block (resource) in the datapath:
|
Resource |
a. |
Add
4 (to the PC) |
b. |
Data
Memory |
Figure
3: The simple datapath
for the MIPS architecture combines the elements required by different
instruction classes.This datapath
can execute the basic instructions (load-store word, ALU operations, and branches)
in a single clock cycle.
d.
Which kinds of instructions require this
resource?
e.
For which kinds of instructions (if any)
is this resource on the critical path?
f.
Assuming that we only support beq and add
instructions, discuss how changes in the given latency of this resource affect
the cycle time of the processor. Assume that the latencies of other resources
do not change.
3. The
following problems refer to the following sequence of instructions:
|
Instruction
sequence |
a. |
lw $1, 40($6) add $6, $2, $2 sw $6, 50($1) |
b. |
lw $5, -16($5) sw $5, -16($5) add $5, $5, $5 |
a.
Indicate dependences and their type.
b.
Assume there is no forwarding in this
pipelined processor. Indicate hazards and add nop
instructions to eliminate them.
c.
Assume there is full forwarding.
Indicate hazards and add nop
instructions to eliminate them.
The
remaining problems assume the following clock cycle times:
|
Without
forwarding |
With full
forwarding |
With ALU-ALU
forwarding only |
a. |
300ps |
400ps |
360ps |
b. |
200ps |
250ps |
220ps |
d.
What is the total execution time of this
instruction sequence without forwarding and with full forwarding? What is the
speed-up achieved by adding full forwarding to a pipeline that had no
forwarding?
e.
Add nop
instructions to this code to eliminate hazards if there is ALU-ALU forwarding
only (no forwarding from the MEM to the EX stage)?
f.
What is the total execution time of this
instruction sequence with only ALU-ALU forwarding? What is the speed-up over a
no-forwarding pipeline?
4. Assuming
that the breakdown of dynamic instructions into various instruction categories
is as follows:
|
R-Type |
beq |
jmp |
lw |
sw |
a. |
50% |
15% |
10% |
15% |
10% |
b. |
30% |
10% |
5% |
35% |
20% |
Also,
assume the following branch predictor accurancies:
|
Always-taken |
Always
not-taken |
2-bit |
a. |
40% |
60% |
80% |
b. |
60% |
40% |
95% |
a.
Stall cycles due to mispredicted
branches increase the CPI. What is the extra CPI due to mispredicted
branches with the always-taken predictor? Assume that branch outcomes are
determined in the EX stage, that there are no data hazards, and that no delay
slots are used.
b.
Repeat the above problem for the galways
not-takenh predictor.
c.
Repeat the above problem for the g2-bith
predictor.
d.
With the 2-bit predictor, what speed-up
would be achieved if we could convert half of the branch instructions in a way
that replaces a branch instruction with an ALU instruction? Assume that
correctly and incorrectly predicted instructions have the same chance of being
replaced.
e.
With the 2-bit predictor, what speed-up
would be achieved if we could convert half of the branch instructions in a way
that replaced each branch instruction with two ALU instructions? Assume that
correctly and incorrectly predicted instructions have the same chance of being replaced.
f.
Some branch instructions are much more
predictable than others. If we know that 80% of all executed branch
instructions are easy-to-predict loop-back branches that are always predicted
correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of
the branch instructions?