Computer Organization and Structure

Computer Organization and Structure

Homework #4

Due: 2010/12/21

1. Different instructions utilize different hardware blocks in the basic single-cycle implementation. The next three problems refer to the following instruction:

	Instruction	Interpretation
a.	add Rd, Rs, Rt	Reg[Rd]=Reg[Rs]+Reg[Rt]
b.	lw Rt, Offs(Rs)	Reg[Rt]=Mem[Reg[Rs]+Offs]

説明: f04-02-P374493

Figure 1: The basic implementation of the MIPS subset, including the necessary multiplexors and control lines.

a. What are the values of control signals generated by the control in Figure 1 for this instruction?

b. Which resources (blocks) perform a useful function for this instruction?

c. Which resources (blocks) produce outputs, but their outputs are not used for this instruction? Which resources produce no outputs for this instruction?

Different execution units and blocks of digital logic have different latencies (time needed to do their work). In Figure 1 there are seven kinds of major blocks. Latencies of blocks along the critical (longest-latency) path for an instruction determine the minimum latency of that instruction. For the following three problems, assume the following resource latencies:

	I-Mem	Add	Mux	ALU	Regs	D-Mem	Control
a.	400ps	100ps	30ps	120ps	200ps	350ps	100ps
b.	500ps	150ps	100ps	180ps	220ps	1000ps	65ps

d. What is the critical path for a MIPS AND instruction?

e. What is the critical path for a MIPS load (LD) instruction?

f. What is the critical path for a MIPS BEQ instruction?

2. Assuming that logic blocks needed to implement a processor’s datapath have the following latencies:

	I-Mem	Add	Mux	ALU	Regs	D-Mem	Sign-extend	Shift-left-2
a.	400ps	100ps	30ps	120ps	200ps	350ps	20ps	2ps
b.	500ps	150ps	100ps	180ps	220ps	1000ps	90ps	20ps

説明: 説明: f04-06-P374493

Figure 2: A portion of the datapath used for fetching instructions and incrementing the program counter. The fetched instruction is used by other parts of the datapath.

a. If the only thing we need to do in a processor is fetch consecutive instructions (Figure 2), what would the cycle time be?

b. Consider a datapath similar to the one in Figure 3, but for a processor that only has one type of instruction: unconditional PC-relative branch. What would the cycle time be for this datapath?

c. Repeat the above problem, but this time we need to support only conditional PC-relative branches.

The remaining three problems refer to the following logic block (resource) in the datapath:

	Resource
a.	Add 4 (to the PC)
b.	Data Memory

説明: 説明: f04-11-P374493

Figure 3: The simple datapath for the MIPS architecture combines the elements required by different instruction classes.This datapath can execute the basic instructions (load-store word, ALU operations, and branches) in a single clock cycle.

d. Which kinds of instructions require this resource?

e. For which kinds of instructions (if any) is this resource on the critical path?

f. Assuming that we only support beq and add instructions, discuss how changes in the given latency of this resource affect the cycle time of the processor. Assume that the latencies of other resources do not change.

3. The following problems refer to the following sequence of instructions:

Instruction sequence

lw $1, 40($6)

add $6, $2, $2

sw $6, 50($1)

lw $5, -16($5)

sw $5, -16($5)

add $5, $5, $5

a. Indicate dependences and their type.

b. Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them.

c. Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.

The remaining problems assume the following clock cycle times:

	Without forwarding	With full forwarding	With ALU-ALU forwarding only
a.	300ps	400ps	360ps
b.	200ps	250ps	220ps

d. What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speed-up achieved by adding full forwarding to a pipeline that had no forwarding?

e. Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage)?

f. What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speed-up over a no-forwarding pipeline?

4. Assuming that the breakdown of dynamic instructions into various instruction categories is as follows:

	R-Type	beq	jmp	lw	sw
a.	50%	15%	10%	15%	10%
b.	30%	10%	5%	35%	20%

Also, assume the following branch predictor accurancies:

	Always-taken	Always not-taken	2-bit
a.	40%	60%	80%
b.	60%	40%	95%

a. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due to mispredicted branches with the always-taken predictor? Assume that branch outcomes are determined in the EX stage, that there are no data hazards, and that no delay slots are used.

b. Repeat the above problem for the “always not-taken” predictor.

c. Repeat the above problem for the “2-bit” predictor.

d. With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaces a branch instruction with an ALU instruction? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

e. With the 2-bit predictor, what speed-up would be achieved if we could convert half of the branch instructions in a way that replaced each branch instruction with two ALU instructions? Assume that correctly and incorrectly predicted instructions have the same chance of being replaced.

f. Some branch instructions are much more predictable than others. If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining 20% of the branch instructions?