Ques 1. For the following code fragment, assume that all data references are shown, that all values are defined before use, and that only b and c are used again after this segment. You may ignore any possible exceptions. The individual statements are numbered to provide an easy reference.
1.
if (a > c) {
2.
d = d + 5;
3.
a= b + d +e;}
} else {
4.
e = e + 2;
5.
f = f + 2;
6.
c = c + f;
}
7. b = a + f;
(a) List the control dependencies in the code. For each
control dependence, tell whether the dependent statement can be scheduled
before the if statement based on the data references.
(b) Assume a dynamically scheduled multiple-issue processor
without speculation and with a window (i.e., scope of ILP) that is holding the
entire code fragment. Find the data dependencies and use this information to
make a list of the successive groups of statements that are issued together.
(c) It is given that only the values b and c are live after
the code segment. If it is known that a value is not live at some point in the
code, then the statement that defines that value can be deleted without
changing the program meaning. Find any values that are not live within the code
fragment, and list the statement(s) that a compiler with this information could
delete.
(d) How does the result for part (c) above affect the ILP
achieved by the processor in part (b) ? What does this illustrate about
measuring computer performance factors such as ILP, making hardware design
choices, and compiler technology ?
Ques 2.
Consider the following loop. First, list all the dependencies. Can this loop,
as written, be unrolled ? Explain. Can this loop be rewritten so loop unrolling
can be applied ? Explain.
….
For (i=1; i<100; i=i+1){
A[i] = A[i] + B[i]; /* S1 */
B[i]= 2*B[i]; /* S2 */
C[i] = a[i-1] + b[i-1]; /*S3 */
D[i] = D[i] * A[i-1];
/* S4 */
}
Practice Problems: (will
not be graded so do not turn these in; but you must try to solve these since
they will help you understand the material).
PP1:
Consider the following segment of code, and assume you have to schedule it on a
superscalar processor using Tomasulo's method.
load R0, (R4)
add R1, R0, #10
load R2, (R5)
mul R0,
R2, #10
mul R4,
R2, R1
store R0, (R5)
Assume you have a superscalar processor
with multiple functional units and you have to schedule the above code using
the Tomasulo algorithm. Assume the processor has two
reservation stations for the load/store, two reservation stations for Add, and
two reservation stations for multiplication. However, the processor has only
one load unit with a latency of 2 cycles, one add unit with latency of 1 cycle,
and one multiplication unit with latency of 2 cycles. Further, assume that data
is written to the Common data bus (CDB) after the execution is complete in the
functional units; i.e., the results are available for the next instruction one
cycle after the functional unit completes. For example, if a load is issued and
started in cycle 1 then it completes in cycle 3, releasing the functional unit,
and the instruction using its result can start in cycle 4.
PP2:
Consider the following piece of code, and derive the branch prediction accuracy
for the two branch predictors defined in what follows.
while (i<100) do
{ /* Branch b1 */
i = i+1;
}
while (i<150) do {
/* branch b2 */
c[i] = c[i]*10;
i = i+1;
}
Next, assume you have a 2 bit branch predictor as given in Figure 2.4. Compute
the branch prediction accuracy for the above code assuming both branches
history bits are set to NT.
PP3: Case study 1 from textbook.