CS225 September 2, 2008
Youssef
Homework 1
Due Date: September 23, 2008


Problem 1: (15 points)

Consider the alphabet {a,b,c,d,e,f} with the following probabilities: P[a]=10/20, P[b]=3/20, P[c]=3/20, P[d]=2/20, and P[e]=P[f]=1/20.

a)      Build the Huffman tree for this alphabet. (Always make the child with the smaller probability the left child. In case of equal probabilities, make the child that comes first in alphabetical order the left child.)

b)     Encode the following string with the resulting Huffman code: aaaaaddbcdcbbaaaebcf.

Problem 2: (20 points)

Let p=15/16 and q=1/16. Consider the following first-order Markov source with alphabet {0,1} and transition probabilities P[0|0]=P[1|1]=p and P[0|1]=P[1|0]=q. Assume that the first bit is 0 or 1 with probability 1/2 each.

a)      Use the arithmetic coding algorithm (presented in class) to code the following binary sequence from the source in question: 0000001111. Indicate what the interval is after the processing of each bit.

b)      Compute P[00], P[01], P[10], P[11]. Afterwards, perform block Huffman coding on 0000001111, where each block is two bits long. What is the bitrate, and how does it compare to the bitrate in part (a)?


Problem 3: (20 points)

Denote by a^n the string of n a's. Let x = 0^13 1^6 0^12 1^14 0^12 1^11.
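
To make the run notation concrete (a symbol together with its repeat count), here is a minimal Python sketch that expands a list of runs into the string it denotes. The helper name `expand_runs` and the (symbol, count) pair format are illustrative assumptions, not part of the assignment.

```python
def expand_runs(runs):
    """Expand (symbol, count) pairs into the string they denote.

    For example, the pair ("a", 3) stands for three a's, i.e. "aaa".
    """
    return "".join(symbol * count for symbol, count in runs)


# Three a's, then two b's, then one a.
print(expand_runs([("a", 3), ("b", 2), ("a", 1)]))  # prints "aaabba"
```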

a)      Apply run-length encoding on x. Allocate 4 bits to represent each length.

b)      Apply Golomb coding of order m on x, where m is the power of 2 nearest to (p*ln(2))/(1-p), and p is the probability of the more probable symbol.

c)      Apply differential Golomb coding on x. Choose the m that is most appropriate.

d)     Which of the three techniques gives the best bitrate?
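
The rule for choosing the Golomb order in part (b) can be sketched as a small Python helper. This is only one possible reading: the function name `golomb_order` is hypothetical, "nearest" is assumed to mean smallest absolute difference (ties go to the smaller power), and the result is clamped to at least 1 because m must be a positive integer. The assignment does not spell out these details.

```python
import math


def golomb_order(p):
    """Return the power of 2 nearest to p*ln(2)/(1-p), for 0.5 <= p < 1.

    Assumptions (not stated in the assignment): nearness is measured by
    absolute difference, ties go to the smaller power, and the result
    is never smaller than 1.
    """
    target = p * math.log(2) / (1 - p)
    if target <= 1:
        return 1
    lower = 2 ** math.floor(math.log2(target))  # largest power of 2 <= target
    upper = 2 * lower                           # smallest power of 2 > target
    return lower if target - lower <= upper - target else upper
```

For instance, with p = 0.9 the target value is about 6.24, which lies closer to 8 than to 4, so the helper returns 8.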


Problem 4: (20 points)

Give an algorithm for arithmetic decoding, corresponding to the arithmetic coding algorithm presented in class.


Problem 5: (10 points)

Give two different proofs to show that NO lossless compression scheme can compress every binary file (or sequence) by at least one bit. (Hint for one proof: use a counting technique to count the number of binary sequences of n bits, and those of at most n-1 bits, and observe that the latter is smaller than the former.)
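
The two counts mentioned in the hint can be written out as follows. Including the empty sequence as the length-0 case is an assumption about the intended convention; the inequality holds either way.

```latex
\[
\#\{\text{binary sequences of exactly } n \text{ bits}\} = 2^{n},
\qquad
\#\{\text{binary sequences of at most } n-1 \text{ bits}\}
  = \sum_{k=0}^{n-1} 2^{k} = 2^{n} - 1 < 2^{n}.
\]
```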


Problem 6: (15 points)

Let x = 0^13 1^6 0^12 1^14 0^12 1^11 and y = aaaabbbbbbabaabbaaaa.

a)     Code x using LZ.

b)     Code y using LZ.

c)     Compare the bitrate of x using LZ with the best bitrate of x in Problem 3.