CS 178, Spring 2009: Solutions to Homework 5 and Homework 6


Homework 5 Solutions:

Ques.1: Refer to examples in notes, and examples (and questions with solutions provided) in textbook.

Ques.2: Each record has 100 bytes, and disk blocks are 512 bytes; therefore
        number of records per block (blocking factor) is 512/100 = 5.  Since 5 records per block and
        10,000 records, we have file size N=  10,000/5=2,000 disk blocks.

Ques.3: Successful lookup times for different file organizations. (refer to textbook and notes).
In the example file, we have n=10,000 records, with file size N=2,000 disk blocks and p=5 records fit on the disk block of size 512 and 4 bytes addressing for pointers to disk records.

      
   
 



  Homework 6 Solutions
 

Ques.1: The right hand side (RHS) is a selection over the set r1 U r2 (r1 union r2) which could  be a large set since union if union performed first. The left hand side (LHS) first performs selection on each relation, therefore only a small set of tuples may satisfy sigma_P(r1) and  sigma_P(r2). Thus the intermediate result is smaller, requiring less disk space and fewer accesses to the disk.
   

Ques.2:
        For r1: 5,000 tuples and 10 per block = 500 blocks
        For r2: 10,000 tuples and 25 per block = 400 blocks

                of r1 ==> 400 X 500 accesses to r1 plus 400 for r2
                Total no. of disk accesses = 200,400
                However, if we have r1 at outer loop then: for each block of r1, read every tuple
                of r2 ==> 500 X 400 accesses to r2 plus 500 for r2 = 200,500

        For sorting the two files, use the external sorting algorithms assuming M buffers (M-way merge sort).
        Time to sort a file with n blocks is  2n log_M disk accesses--
        to sort r1 we need  2*500 log_M 500
        to sort r2 we need 2*400 log_M 400
        Total time = 1800 + 1000 log_M 500 + 800 log_M 400
       Since we are not given any size of the main memory, we can assume that M=2 (i.e., 2 input buffers and 1 out put buffer).  If M=2, log_2 400=9 and log_2 500=9 therefore total time =         1800 + 9000 + 7200. (If on the other hand, M=10 then we get log_10(400)=3 and log_10(500)=3 and the time is 1800+3000 + 2400.)

                total of 400 +500 =900 disk accesses and write back all blocks after
                computing hash value h(k) for another 900 accesses.
                Next compute join using hash table ==> each tuple is read from bucket, and each block
                in bucket is read exactly once ==> 500 accesses into r1 and 400 access into r2.
                Therefore total time for Hash join = 3*900 = 2700 disk accesses. Note that we are assuming enough memory to hold all the buckets.

Ques.3:  We can provide a number of access plans for this. Firstly, note that no statistics are provided for us -- this makes it difficult to estimate the sizes of the intermediate results.