CS 211: Fall 2008. Homework 1. Due September 23rd 6pm. Submit using Blackboard only – no hardcopy submissions.

 

Questions 1 and 2 are required. Question 3 will not be graded and you can collaborate on question 3 only.

Ques.1: Consider the process of improving the performance of a program by optimizing some of the code in the program and running it in an enhanced (i.e., optimized) form. Suppose that optimized instructions run 20 times faster than sequential.

  • Derive the speedup equation when x% of the instructions are optimized.
  • Determine the percentage of code that must be optimized to get a speedup of 2,5, and 10 respectively.
  • If 25% of the code cannot be optimized (due to the inherently sequential nature such as I/O etc.), then what is the maximum speedup you can achieve.

Ques.2: This question requires that you start using the simplescalar simulation environment. Refer to link posted on the course webpages for simplescalar information. 

Use the C compiler (gcc is the default for simplescalar) and compile the code for matrix multiplication (for the pseudo code shown below for one the machines supported by simplescalar (PISA is what is recommended and easier to work with). Compile the code in both optimized and unoptimized mode – read the gcc instructions to figure out the various optimization levels. You should use the base configurations of simplescalar – you do not need to examine optimizations of the processor (this will be addressed in the first project).

  • Find the instruction count, instruction cycles, and data accesses for both the optimized and unoptimized versions. You should write down your analysis (including what optimizations you applied etc.) –simply submitting only the output of your runs will result in zero points.

You must use the following algorithm for matrix multiplication

For i=1 to N do

{

            for j=1 to N do

            {

 C[i,j] = 0;

 for k=1 to N do

            C[i,j] = C[i,j] + A[i,k]*B[k,j];

}

            }

 

Ques.3: This question requires you to generalize Amdahl’s law to the case when multiple enhancements are possible. Three enhancements with the following speedups are proposed for a new architecture:

Speedup1=30, Speedup2= 20, Speedup3 = 15.

Only one enhancement is usable at a time (but multiple can be used over the entire application). If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10 for the entire application ?