Codeword: A binary string representing either the whole coded data
or one coded data symbol
Coded Bitstream: The binary string representing the whole coded data.
Lossless Compression: 100% accurate reconstruction of the original data
Lossy Compression: The reconstruction involves errors
which may or may not be tolerable
Bit Rate: Average number of bits per original data element after compression
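As a small illustration of the bit rate definition, the sketch below divides the length of the coded bitstream by the number of original data elements. The function name and the example sizes are hypothetical, chosen only for illustration.

```python
def bit_rate(compressed_bits: int, num_elements: int) -> float:
    """Average number of bits per original data element after compression."""
    if num_elements <= 0:
        raise ValueError("num_elements must be positive")
    return compressed_bits / num_elements


# Hypothetical example: a 512 x 512 image whose coded bitstream is 65,536 bytes.
rate = bit_rate(compressed_bits=65_536 * 8, num_elements=512 * 512)
print(rate)  # 2.0 bits per pixel
```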
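Signal-to-Noise Ratio (SNR): A measure of reconstruction quality in the case of lossy compression.
Let I be an original signal (e.g., an image), and R be its lossily reconstructed counterpart. SNR is defined to be:
$$\mathrm{SNR} = 10 \log_{10} \frac{\|I\|^2}{\|I - R\|^2} \ \text{decibels (dB)}$$
where $\|X\|^2$ denotes the sum of the squares of the elements of X.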
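A minimal sketch of an SNR computation, assuming the common decibel definition: 10 log10 of the ratio of the signal energy (sum of squared values of I) to the error energy (sum of squared differences between I and R). The function name and the sample values are illustrative assumptions, and signals are taken to be flat lists of numbers.

```python
import math


def snr_db(original, reconstructed):
    """SNR in dB, assuming 10*log10(signal energy / error energy)."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if error_energy == 0:
        return math.inf  # perfect (lossless) reconstruction
    if signal_energy == 0:
        return -math.inf  # all-zero original with a nonzero error
    return 10.0 * math.log10(signal_energy / error_energy)


I = [100, 102, 98, 101]  # hypothetical original samples
R = [101, 101, 99, 100]  # hypothetical lossy reconstruction
print(round(snr_db(I, R), 2))  # about 40.02 dB
```

A higher value means the reconstruction error is smaller relative to the signal; a lossless reconstruction gives an infinite SNR under this definition.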
An image is a matrix of numbers, each number called a pixel (short for picture element).
A binary image (or black-and-white, B/W, image) is an image where every pixel can have one of only two values (typically 0 and 1).
A grayscale image is a non-color image whose pixels can take various shades of gray. That is what we typically refer to informally when we talk about the (old) black-and-white TVs/cameras/photos.
A color image is an image where every pixel can have a color.
Theorem (Newton): Every color is a combination of three colors (such as Red, Green and Blue), called the basic colors.
Therefore, in color images, every pixel is represented with three (numerical) components, such as (R,G,B),
where R represents how much red there is in that pixel, G how much green, and B how much blue.
The spatial resolution of an image is expressed in one of three ways:
the total number of pixels in the image (as in a "megapixel image" or a "6-megapixel camera"); or
the number of pixels per row and per column, as when one says an image is 512 x 1000, meaning it has 512 rows and 1000 columns, so every row has 1000 pixels and every column has 512 pixels; or
the number of pixels per inch; for binary images (as in fax machines, basic scanners, and old dot-matrix printers), it is called "dots per inch (dpi)".
Note: For a fixed physical size image, the higher the spatial resolution, the more (and smaller) the pixels are, and thus the better the quality (and detail) of the image.
The density resolution (or bit depth) is the number of bits per pixel. The higher
the density resolution, the more colors (or shades of gray) can be represented, and thus
the crisper or more detailed the image is.
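To make the effect of bit depth concrete, the sketch below counts the representable levels for a given number of bits and shows how lowering the bit depth merges nearby shades. The function names are hypothetical, and dropping the least significant bits is only one simple way to reduce bit depth.

```python
def num_levels(bit_depth: int) -> int:
    """Number of distinct values representable with bit_depth bits per pixel."""
    return 2 ** bit_depth


def reduce_bit_depth(pixel: int, from_bits: int, to_bits: int) -> int:
    """Keep only the to_bits most significant bits of a from_bits-bit pixel."""
    return pixel >> (from_bits - to_bits)


print(num_levels(1))   # 2 levels: a binary image
print(num_levels(8))   # 256 shades of gray
print(num_levels(24))  # 16777216 colors, assuming 8 bits for each of R, G, B

# Two distinct 8-bit shades collapse to the same 4-bit shade.
print(reduce_bit_depth(200, 8, 4), reduce_bit_depth(207, 8, 4))  # 12 12
```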
A video is a sequence of images. Every image in the sequence is called a frame.
A video is captured/displayed at a certain rate, called the frame rate.
A typical rate that does not show jerkiness is 30 frames per second (fps). For higher definition (of motion), the rate can be higher. Lower rates, like 20 fps and even 15 fps, were used before, when communications and computers were slower.
Rates lower than 15 fps would be quite jerky and unacceptable.
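The following sketch estimates the uncompressed data rate of a video from its spatial resolution, bit depth, and frame rate, which gives a sense of why video compression is needed. The frame size (480 rows by 640 columns) and the 24 bits per pixel are assumed example values, not figures from these notes.

```python
def raw_frame_bits(rows: int, cols: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one frame, in bits."""
    return rows * cols * bits_per_pixel


def raw_video_bits_per_second(rows: int, cols: int,
                              bits_per_pixel: int, fps: int) -> int:
    """Uncompressed video data rate: bits per frame times frames per second."""
    return raw_frame_bits(rows, cols, bits_per_pixel) * fps


# Assumed example: 480 x 640 color frames, 24 bits per pixel, 30 fps.
rate = raw_video_bits_per_second(480, 640, 24, 30)
print(rate)        # 221184000 bits per second
print(rate / 1e6)  # 221.184 megabits per second
```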
A sound/audio (digital) signal is a sequence of values, called samples,
where every sample is the intensity of the recorded sound at the corresponding moment in time.
The sampling rate of a sound is the number of samples per second.
The CD quality sampling rate is 44,100 samples per second (or 44.1 kHz),
usually at 16 bits per sample, though 24 bits per sample is now common.
In digital sound used for miniDV, digital TV, and DVD, the sampling rate is 48 kHz.
In DVD-Audio and in Blu-ray audio tracks, the sampling rate is 96 kHz or 192 kHz.
In the limit of an infinite sampling rate, that is, when the signal is defined at every instant of time, the signal is called an "analog signal".
When a signal is captured as an analog signal, it can be converted to a digital signal by
an analog-to-digital converter (also called an A/D converter or digitizer).
If a digital signal is fed into a digital-to-analog converter (also called a D/A converter), the output is an analog signal.
Modulators are D/A converters, and demodulators are A/D converters. So, a modem is both an A/D and D/A converter.
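What an A/D converter does can be sketched in software as two steps: sampling a continuous-time signal at a fixed rate, and quantizing each sample to a fixed number of bits. The sine-wave input, the function name, and the parameter values below are illustrative assumptions; a real digitizer also involves filtering and other details not covered here.

```python
import math


def sample_and_quantize(freq_hz, sampling_rate_hz, num_samples, bits):
    """Sample a unit-amplitude sine wave and quantize to signed integers."""
    max_level = 2 ** (bits - 1) - 1  # e.g., 32767 for 16 bits per sample
    samples = []
    for k in range(num_samples):
        t = k / sampling_rate_hz                       # time of the k-th sample
        value = math.sin(2 * math.pi * freq_hz * t)    # "analog" value at time t
        samples.append(round(max_level * value))       # quantized sample
    return samples


# Assumed example: a 1 kHz tone sampled at 8 kHz with 16 bits per sample.
samples = sample_and_quantize(1000, 8000, 8, 16)
print(samples[0], samples[2], samples[6])  # 0 32767 -32767

# Raw bit rate of one channel at CD quality: samples/second times bits/sample.
print(44_100 * 16)  # 705600 bits per second
```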
Neighboring pixels tend to exhibit high correlations
Techniques: Decorrelation and/or processing in the frequency
domain
Spatial decorrelation converts correlations into symbol- or
block-redundancy
Frequency domain processing addresses visual redundancy
(see Visual Redundancy below)
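A minimal sketch of spatial decorrelation, assuming the simplest possible predictor: each pixel in a row is replaced by its difference from the left neighbor. On a smooth row the differences concentrate on a few values, so the correlation between neighbors turns into symbol redundancy, which shows up as a lower empirical entropy (computed here as minus the sum of p log2 p over the symbol histogram). The function names and the sample row are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Entropy in bits/symbol of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def row_differences(row):
    """Keep the first pixel; replace every other pixel by pixel - left neighbor."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


def undo_differences(residuals):
    """Invert row_differences by a running sum, so the step is lossless."""
    row = [residuals[0]]
    for d in residuals[1:]:
        row.append(row[-1] + d)
    return row


row = [100, 101, 102, 103, 104, 105, 106, 107]  # smooth, highly correlated row
residuals = row_differences(row)                # [100, 1, 1, 1, 1, 1, 1, 1]
print(empirical_entropy(row))                   # 3.0 bits/pixel
print(round(empirical_entropy(residuals), 3))   # about 0.544 bits/pixel
```

The differencing step is invertible, so no information is lost; the gain comes only from the residuals being easier to code symbol by symbol.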
Inter-Pixel Temporal Redundancy (in Video)
Often, the majority of corresponding pixels in successive
video-frames are identical over long spans of frames
Due to motion, blocks of pixels change in position but not
in values between successive frames
Thus, block-oriented motion-compensated redundancy reduction
techniques are used for video compression
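The idea of block-oriented motion compensation can be sketched with an exhaustive block-matching search: for a block in the current frame, look in the previous frame for the displaced block that matches best. The sum of absolute differences (SAD) criterion, the function names, and the tiny 6 x 6 frames are assumptions made for illustration; practical video coders use more elaborate search strategies.

```python
def sad(frame_a, ay, ax, frame_b, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + i][ax + j] - frame_b[by + i][bx + j])
               for i in range(size) for j in range(size))


def best_motion_vector(prev, curr, y, x, size, search):
    """Return (cost, dy, dx) minimizing SAD between the block of curr at (y, x)
    and the block of prev at (y + dy, x + dx), with |dy|, |dx| <= search."""
    rows, cols = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            # Skip candidate blocks that fall outside the previous frame.
            if py < 0 or px < 0 or py + size > rows or px + size > cols:
                continue
            cost = sad(curr, y, x, prev, py, px, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


def make_frame(top, left, rows=6, cols=6, size=2, value=9):
    """Hypothetical test frame: zeros with one bright size x size block."""
    frame = [[0] * cols for _ in range(rows)]
    for i in range(size):
        for j in range(size):
            frame[top + i][left + j] = value
    return frame


prev = make_frame(1, 1)  # bright block at row 1, column 1
curr = make_frame(2, 3)  # same block moved down 1 row and right 2 columns
print(best_motion_vector(prev, curr, 2, 3, size=2, search=2))  # (0, -1, -2)
```

A cost of 0 means the block changed position but not values, so only the motion vector needs to be coded for that block.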
Visual Redundancy
The human visual system (HVS) has certain limitations
that make many image contents invisible.
Those contents, termed visually redundant, are the target
of removal in lossy compression.
In fact, the HVS can see only within a small range of spatial
frequencies: 1-60 cycles/arc-degree.
(Plot by hand the contrast sensitivity function)
Approach for reducing visual redundancy in lossy compression
Transform: Convert the data to the frequency domain
Discrete Memoryless Source S: A data generator where the alphabet
is finite and the symbols generated are independent of
one another. Assume the alphabet is {a1,a2,...,an}
Let pk = Probability that symbol ak is generated
(transmitted) by the source
Entropy of S: $$H(S) = -\sum_{k=1}^{n} p_k \log_2 p_k \ \text{bits/symbol}$$
Theorem (Shannon): H(S) is the minimum average number of bits/symbol possible
That is, no matter which lossless compression is ever invented, its bitrate can never be better (smaller)
than H(S) for any memoryless source S.
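A minimal sketch of the entropy of a discrete memoryless source, assuming the standard definition H(S) = -(sum over k of pk log2 pk). The function name and the example probabilities are hypothetical; symbols with zero probability are skipped since they contribute nothing to the sum.

```python
import math


def entropy(probabilities):
    """Entropy in bits/symbol of a memoryless source with the given pk values."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical 4-symbol source {a1, a2, a3, a4}.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))           # 1.75 bits/symbol
print(entropy([0.25] * 4))  # 2.0 bits/symbol for equally likely symbols
```

For this skewed source, a fixed-length code needs 2 bits/symbol, while the bound from the theorem is 1.75 bits/symbol; no lossless code can do better than that on average.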
Sources with Memory: Sources that exhibit inter-symbol correlation
Their entropy is still the minimum average number of bits/symbol
Adjoint Source of Order N
Treat each possible block A of N symbols as a
macrosymbol, and compute the probability PA
Treat the source as a memoryless source consisting of
the macrosymbols A's and their probabilities PA
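The entropy of the adjoint source, per original symbol, is
$$H_N(S) = -\frac{1}{N} \sum_{A} P_A \log_2 P_A$$
Theorem (Shannon): For any source S with memory,
$$H_N(S) \to H(S) \quad \text{as } N \to \infty$$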
This implies that for any source S with memory (i.e., with inter-symbol correlation/redundancy),
if we divide it into blocks of large enough size and then block-code it without taking advantage of
inter-block correlation, then we can approximate the performance of any other coder for the source.
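The effect of block size can be illustrated numerically. The sketch below assumes the per-symbol entropy of the adjoint source of order N is -(1/N) times the sum over blocks A of PA log2 PA, and uses a hypothetical two-symbol source with memory (a stationary Markov chain that repeats the previous symbol with probability 0.9). The Markov model, the function name, and the numbers are illustrative assumptions, not part of the original notes.

```python
import math
from itertools import product


def block_entropy_per_symbol(stationary, transition, n):
    """Per-symbol entropy of the order-n adjoint source of a Markov source."""
    states = range(len(stationary))
    total = 0.0
    for block in product(states, repeat=n):       # every possible macrosymbol A
        p_block = stationary[block[0]]            # probability of the first symbol
        for prev, nxt in zip(block, block[1:]):   # chain the transition probabilities
            p_block *= transition[prev][nxt]
        if p_block > 0:
            total -= p_block * math.log2(p_block)
    return total / n


stationary = [0.5, 0.5]               # both symbols equally likely overall
transition = [[0.9, 0.1],             # after a1: stay with 0.9, switch with 0.1
              [0.1, 0.9]]             # after a2: switch with 0.1, stay with 0.9

for n in (1, 2, 4, 8):
    print(n, round(block_entropy_per_symbol(stationary, transition, n), 4))

# Limit for this chain: entropy of the 0.9 / 0.1 choice made at each step.
limit = -(0.9 * math.log2(0.9) + 0.1 * math.log2(0.1))
print(round(limit, 4))  # about 0.469 bits/symbol
```

With N = 1 the source looks like a fair coin (1 bit/symbol); as N grows, the per-symbol entropy of the blocks decreases toward the limit, which is the sense in which block coding with large enough blocks captures the inter-symbol redundancy.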