Module 6: Probability and Simulation
A sample application
A queueing simulation:
- Experiments like coin flips are easy to simulate
=> The simulation code is itself simple.
- The subject of simulation is concerned with more complex systems
=> Here, the simulation code itself may be intricate.
- Queueing systems are classic examples in simulation
=> Many simulation principles can be learned from these systems.
- Consider a simple two-queue example:

- There are two service stations (kiosks).
Each station has a separate queue for customers.
- Customers arrive to the system from outside.
=> Their interarrival times are random.
- Each customer picks a queue to join and waits for service.
- The service time for each customer is random.
- Generally, we simulate for a purpose: to estimate something
- The average waiting time for a customer.
- Is it better for customers to select the shortest queue
or flip a coin to decide which queue to join?
Exercise 1:
Download, compile and execute QueueControl.java.
- You can step through and observe the sequence of events.
- Use animate=false and estimate the average wait time
for customers.
- What queue-selection policy are customers using? Change the
policy to improve the waiting time.
Elements of a simulation:
- Methods for generating randomness
e.g., random interarrivals and random service times.
- How to track time
e.g., advance the clock from one key point to the next.
- How to store key aspects and ignore inessential ones
e.g., store waiting customers, but not departed customers.
- How to record statistics
e.g., collect wait times at customer departure moments.
- What data structures are useful
e.g., a priority queue
Exercise 2:
Consider the time spent in the system by each customer.
The simulation computes the average "system time."
- Is this a discrete or continuous quantity?
- Are the system times of two successive customers independent?
Before learning how to write simulations, we will need to
understand a few core concepts in probability:
- Random variables and distributions.
- Expectation and laws of large numbers.
- Dependence, joint and conditional distributions.
- Statistics: how to estimate and know how many samples you need.
Random variables
A random variable is an observable numeric quantity from
an experiment:
- Coin flip example:
- "Heads" and "Tails" are not numeric, but 1 and 0 are.
- For multiple coin flips, one can count the number of heads.
- More coin-flip related random variables:
- How many flips does it take to get two heads in a row?
Random variable = number of such flips
- How many heads in 10 coin flips?
Random variable = number of heads
- Dice example:
- Sum of numbers that show up in two rolls
Random variable = the sum
- Bus-stop example:
- How long before the next bus arrives?
Random variable = actual waiting time
- How many buses arrive in a 10-minute period?
Random variable = number of such buses.
Exercise 3:
Identify some random variables of
interest in the Queueing example.
Range of a random variable: the possible values it can take
- Example: number of heads in 10 flips
=> range = {0,1,2,...,10}
- Example: sum of two dice
=> range = {2,3,...,12}
Exercise 4:
What are the ranges for these random variables:
- The number of buses that arrive in a 10-minute period.
- The waiting time for the next bus to arrive.
Notation
- Sometimes we use the short form rv = random variable.
- Typically, we use capital letters to denote rv's:
- Let X = number of heads in 10 coin flips.
- Let Y = the waiting time for the next bus to arrive.
Let's look at the n-coin flip example in more detail:
- Let X = number of heads in n coin flips.
- Suppose Pr[heads] = 0.6 for a single flip.
- Consider the events when n=3 (3 flips)
| Outcome | Pr[outcome] | value of X |
| HHH | 0.6 * 0.6 * 0.6 | X = 3 |
| HHT | 0.6 * 0.6 * 0.4 | X = 2 |
| HTH | 0.6 * 0.4 * 0.6 | X = 2 |
| HTT | 0.6 * 0.4 * 0.4 | X = 1 |
| THH | 0.4 * 0.6 * 0.6 | X = 2 |
| THT | 0.4 * 0.6 * 0.4 | X = 1 |
| TTH | 0.4 * 0.4 * 0.6 | X = 1 |
| TTT | 0.4 * 0.4 * 0.4 | X = 0 |
- First, note:
- We've assumed independence between coin flips
e.g., Pr[HHT] = Pr[1st is heads] * Pr[2nd is heads] * Pr[3rd is tails]
- Each individual outcomes results in an observable numeric value for the rv of interest (X).
- Notice the following
- The range of X is X ε {0,1,2,3}.
- Pr[X = 3] = Pr[HHH] = 0.6 * 0.6 * 0.6 = 0.216
- Pr[X = 0] = Pr[TTT] = 0.4 * 0.4 * 0.4 = 0.064
- The two remaining probabilities of interest are:
Pr[X=1] and Pr[X=2]
- Pr[X=1] = Pr[HTT] + Pr[THT] + Pr[TTH] = 3 * 0.4 * 0.6 *0.4 = 0.288
- Pr[X=2] = Pr[HHT] + Pr[HTH] + Pr[THH] = 3 * 0.6 * 0.6 *0.4 = 0.432
- We've made use of Axiom 3 (disjoint events).
- Thus, this list is complete for X:
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
- Clearly,
Pr[X=0] + Pr[X=1] + Pr[X=2] + Pr[X=3] = 1
Exercise 5:
Suppose Pr[heads] = p in the above example. Write
down Pr[X=i], iε{0,1,2,3}, in terms of p.
Discrete vs. continuous random variables:
- Discrete: when the range is finite or countably infinite.
- Continuous: when the range is real-valued.
- To specify probabilities for a discrete rv X, we need to
specify Pr[X=k] for every possible value k.
- For continuous rv's it's a little more complicated
=> We will get to this later.
- This specification of probabilities is called a distribution.
Discrete distributions
Consider the following rv and distribution:
- Random variable X has range {0,1}.
- Thus, one needs to specify
Pr[X=0] = ?
Pr[X=1] = ?
- For example,
Pr[X=0] = 0.4
Pr[X=1] = 0.6
- One application is coin tossing: X=1 (heads) or
X=0 (tails).
- This type of rv can be used for any success-failure type of
experiment where 0=failure and 1=success.
- In general, we write this type of distribution as
Pr[X=0] = 1-p
Pr[X=1] = p
where p is the success probability.
- This type of rv is called a Bernoulli rv.
- The associated distribution is called a Bernoulli distribution.
- Here p is called a parameter of the distribution.
- Note: Pr[X=0] + Pr[X=1] = 1, as we would expect.
To summarize:
- A discrete rv X takes on values in some set A
where A is finite or countably infinite.
- Probabilities are specified by giving Pr[X=k] for
each k ε A.
- This list of probabilities is called the distribution of X.
- Because some distributions are used for multiple purposes,
we often parameterize them to study general properties.
It is now possible to work with distributions in the abstract
without specifying an experiment.
- For the distribution to be valid, those probabilities should
sum to 1.
&Sigmak Pr[X=k] = 1
Exercise 6:
What would be an example of a discrete rv
in the Queueing application?
An example of a distribution (in the abstract):
- Consider a rv X with range {1,2,...}.
- Suppose the distribution of X is given by
Pr[X=k] = (1-p)k-1 p
where p is a parameter such that 0<p<1.
- Here are a few sample values:
Pr[X=1] = p
Pr[X=2] = (1-p) p
Pr[X=3] = (1-p)2 p
. . .
- This distribution is called the Geometric distribution.
- Note: this distribution has one parameter (p).
- We use the short form
X ~ Geometric(p)
to mean
"rv X has the Geometric distribution with parameter p"
- One application: coin flips
Let X = the number of flips needed to get the first heads.
=> Pr[X=k] = Pr[k-1 tails in a row, then heads]
=> Pr[X=k] = Pr[tails] Pr[tails] ... (k-1) times ...
Pr[tails] Pr[heads]
=>
Pr[X=k] = (1-p)k-1 p
Exercise 7:
Verify by hand that
&Sigmak Pr[X=k] = 1
for the above distribution.
Exercise 8:
If p=0.6, what is the probability that the first heads appears
on the 3rd flip? Verify your answer using Coin.java
and CoinExample.java.
Exercise 9:
Suppose I compare two parameter values for the Geometric distribution:
p=0.6 and p=0.8.
For which of the two values of p is Pr[X=3] higher?
Exercise 10:
Compute (by hand) Pr[X>k] when X~Geometric(p).
The discrete uniform distribution:
- A rv X has the discrete uniform distribution if:
- It has a finite range, i.e., X ε A where
|A| is finite.
- Pr[X=k] = 1/|A|.
- Example: die roll
- Range is finite: X = 1,2,...,6.
- Pr[X=k] = 1/6.
- Example: card drawing.
- Range is finite: X = 0,1,2,...,51.
- Pr[X=k] = 1/52.
- Quite often, useful discrete-uniform rv's have a range
of {0,1,...,m-1} or {1,2,...,m}.
In the latter case, we write X~Discrete-uniform(1,m).
The Binomial distribution:
- A rv X has the Binomial distribution with parameters
n and p (two parameters) if:
- X has range {0,1,...,n}.
- Pr[X=k] = nCk pk (1-p)n-k.
- Short form: X~Binomial(n,p).
- Example: coin flips
- A coin with Pr[H] = p is flipped n times.
- X counts the number of heads.
- To see why this is appropriate for coin flips:
- Suppose we want to compute Pr[X=k]
=> Probability of observing exactly k heads.
- Think of the n flips as a string over the alphabet {H,T}.
e.g. HHTTH = 3 heads in 5 flips
- Now consider k heads in n flips:

- The probability of getting exactly k heads and
exactly (n-k) tails is:
pk (1-p)n-k.
- But there are nCk ("n choose k") different
ways of arranging the k heads.
=> Each is a possible outcome
=>
Pr[X=k] = nCk pk (1-p)n-k
Exercise 11:
Suppose we flip a coin n times and count the number of heads using
a coin where Pr[H] = p.
- Write code to compute Pr[X=k] using the formula
Pr[X=k] = nCk pk (1-p)n-k.
Write your code in Binomial.java.
- Plot a graph of Pr[X=k] vs. k for the case
n=10, p=0.6 and for the case n=10, p=0.2.
- Write a simulation to estimate Pr[X=3] when n=10, p=0.6.
You can use Coin.java
and CoinExample2.java for
this purpose. Verify the estimate using the earlier formula.
Exercise 12:
For the above experiment, verify that the sum of probabilities
is 1. How would you prove this analytically for the Binomial distribution?
The Poisson distribution:
- X~Poisson(γ) if:
- X has range {0,1,2,...}
- Pr[X=k] = e-γ γk / k!
- Thus, the Poisson rv counts zero or more occurences when the
number of possible occurences can be arbitrarily large.
=>Unlike the number of heads in a fixed number of flips.
- Note: this distribution has only one parameter (γ).
- An application? It turns out that "the number of buses that
arrive in a fixed interval" is a good application.
=> Why this is so is not trivial to show.
Exercise 13:
Add code to Poisson.java to
compute Pr[X=k] and plot a graph of Pr[X=k] vs. k
when γ=2.
Exercise 14:
Download BusStop.java and
BusStopExample3.java,
and modify the latter to estimate the probability that exactly three
buses arrive during the interval [0,2].
Compare this with Pr[X=3] when X~Poisson(2).
The cumulative distribution function (CDF)
Consider X~Binomial(n,p):
- We know that X's range is {0,1,...,n} and that
the distribution specifies Pr[X=k].
- Now consider the probability Pr[X ≤ y] for some
number y.
- Example: Pr[X ≤ 5]
Pr[X ≤ 5] = Pr[X=0] + Pr[X=1] + ... + Pr[X=5]
Exercise 15:
Plot Pr[X ≤ k] for X~Binomial(10,0.6).
- Next, observe that
Pr[X ≤ 5] = (Pr[X=0] + Pr[X=1] + Pr[X=3]) + (Pr[X=4] + Pr[X=5])
= Pr[X ≤ 3] + Pr[3 < X ≤ 5]
- In general, for m<k
Pr[X ≤ k] = Pr[X ≤ m] + Pr[m < X ≤ k]
- Now, consider Pr[X ≤ 5.32] when X~Binomial(n,p).
Does this make sense?
- It can if we write Pr[X ≤ 5.32] = Pr[X ≤5] + Pr[5 < X ≤5.32]
- Obviously, Pr[5 <X ≤ 5.32] = 0
=> Pr[X ≤ 5.32] = Pr[X ≤ 5].
- In this way, Pr[X ≤ y] is defined for any
value of y.
- Since it is defined for every y, it is a
function of y:
F(y) = Pr[X ≤ y]
=> called the Cumulative Distribution Function (CDF)
of rv X
- CDFs are usually defined for all possible y: (-infty, +infty)
Exercise 16:
Plot (by hand, on paper) the CDF F(y) for X~Bernoulli(0.6).
Next, consider the sequence of real numbers
yn = (1 - 1/n)
and the sequence
zn = (1 + 1/n)
Compute by hand or by programming the following limits:
- limn→infty yn
- limn→infty zn
- limn→infty F(yn)
- limn→infty F(zn)
What do you notice?
About the CDF for discrete rv's:
- The CDF is right-continuous
You can approach F(y) from the right, but not always
from the left.
- The CDF F(y) never decreases with increasing y.
Exercise 17:
Why is this true?
Exercise 18:
Suppose rv X denotes the outcome ({1,2,...,6}) of
a die roll. Draw by hand the CDF.
Exercise 19:
Consider the distribution for the 3-coin-flip example:
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
Sketch the CDF on paper.
Exercise 20:
For the two CDFs we've seen so far (Bernoulli, Discrete-Uniform), is
it possible to differentiate the function F(y)? If so, what
is the derivative for the Bernoulli case?
Consider a CDF F(y) of a rv X:
Exercise 21:
Suppose X~Geometric(p).
- Argue that Pr[X>k] = (1-p)k.
- Then write down the CDF.
- Write down an expression for Pr[3.11 < X ≤ 5.43] in
terms of the CDF.
Continuous random variables
Exercise 22:
Recall the Bus-stop example. Suppose rv X denotes
the first interarrival time. Download
BusStop.java
and
BusStopExample.java.
Then modify the latter to estimate the following:
- Pr[X < 1]
- Pr[X < 1.1]
- Pr[X=1]
Exercise 23:
What is an example of a continuous rv associated
with the QueueControl.java
application?
The difficulty with continuous rv's:
- First, Pr[X=x] = 0 for most continuous rv's X.
=> This means we can't specify the distribution using a list.
- Second, it's not clear how we can get probabilities to sum
to 1.
e.g., for a rv that takes any real value in [0,1], how
do we sum probabilities?
- Fortunately, the CDF offers a way out.
Let's write some code to estimate a CDF:
- First, identify the range [a,b] of interest.
- Then, break this up into M small intervals:
[a,a+δ], [a+δ,a+2δ], ..., [a+(M-1)δ,a+Mδ]
- Maintain a counter intervalCount[i] for each interval.
- For any random value x, identify the interval
j = floor ( (x-a) / δ )
and increment the count of every interval j ≥ i.
- Pseudocode:
Algorithm: cdfEstimation (M, a, b)
Input: number of intervals M, range [a,b]
1. δ = (b-a) / M
2. for n=1 to numTrials
3. x = generateRV ()
4. j = floor ( (x-a) / δ )
5. for i=j to M-1
6. intervalCount[i] = intervalCount[i] + 1
7. endfor
8. endfor
// Now build and print the CDF
9. for i=0 to M-1
10. midPoint = a + iδ + δ/2 // Interval: [a+iδ,a+(i+1)δ]
11. print (midPoint, intervalCount[i] / numTrials)
12. endfor
- Sample code:
(source file)
static Function makeUniformCDF ()
{
double a = 0, b = 1;
int M = 50; // Number of intervals.
double delta = (b-a) / M; // Interval size.
double[] intervalCounts = new double [M];
double numTrials = 1000000;
for (int t=0; t<numTrials; t++) {
// Random sample:
double y = RandTool.uniform ();
// Find the right interval:
int k = (int) Math.floor ((y-a) / delta);
// Increment the count for every interval above and including k.
for (int i=k; i<M; i++) {
intervalCounts[i] ++;
}
}
// Now compute probabilities for each interval.
double[] cdf = new double [M];
for (int k=0; k<M; k++) {
cdf[k] = intervalCounts[k] / numTrials;
}
// Build the CDF. Use mid-point of each interval.
Function F = new Function ("Uniform cdf");
for (int k=0; k<M; k++) {
double midPoint = a + k * delta + delta/2;
F.add (midPoint, cdf[k]);
}
return F;
}
Exercise 24:
Download and execute UniformCDF.java.
You will also need Function.java.
Modify the code to print F(1)-F(0.75).
Getting probabilities out of a CDF:
- Recall that for any rv X with CDF F:
Pr[y1 < X ≤ y2]
= F(y2) - F(y1)
- Thus, for example, if X has the uniform CDF
Pr[0.75 < X ≤ 1.0]
= F(1) - F(0.75)
- Important: the CDF gets you the probability that
X's outcome lands in any particular interval of interest:
Pr[X falls in any interval (a,b]] = F(b) - F(a)
Exercise 25:
The program GaussianCDF.java
estimates the CDF of a Gaussian rv. Execute the program
to plot the CDF. Then, modify use this CDF to compute the following probabilities:
- Pr[0 < X < 2].
- Pr[X > 0].
A small technicality that goes to the heart of a continuous rv:
Exercise 26:
Modify UniformCDF.java
and GaussianCDF.java
to compute the derivative of each. What is the shape of F'(y)
in each case?
Exercise 27:
If X denotes the first interarrival time in the bus-stop
problem, estimate the CDF of X as follows:
- Assume that values fall in the range [0,3] (i.e.,
disregard values outside this range).
- Use ExponentialCDF.java
as a template, and add modified code from
UniformCDF.java
- Next, compute the derivative of this function and
display it.
About CDFs and derivatives:
- The concept of CDF applies to both discrete and continuous rv's.
- The CDF of a discrete rv is not differentiable.
- For continuous rv's, the derivative is called a
probability density function (pdf).
- The pdf has two uses:
- It's useful in some calculations as we'll see below.
- Most books use the pdf
=> You need to be comfortable with the pdf to read them.
- For completeness, note that the list of probabilities for
a discrete rv is called a probability mass function
(pmf), e.g.
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
- Customarily, a discrete rv's distribution is specified by
giving its pmf.
=> The CDF is not really useful for discrete rv's
- Customarily, a continuous rv's distribution is specified by
giving its pdf.
=> The CDF is equivalent (because you can take its derivative to get the pdf)
- To summarize:
- Discrete rv's use a pmf.
- Continuous rv's use a pdf.
- Both use a CDF.
Expectation - the discrete case
The expected value of a rv is a calculation
performed on its distribution:
- If X is a discrete rv with range A, then
Expected value of X = ΣkεA k Pr[X=k]
- We write this as
E[X] = ΣkεA k Pr[X=k]
- Note: this is merely a calculation, and (so far) has nothing
to do with the English word "expect".
- E[X] is a number (not a function).
- Example: recall the 3-coin-flip example where X =
number-of-heads (we'll use Pr[H]=0.6).
- The range: A = {0,1,2,3}.
- The pmf:
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
- Thus,
E[X] = ΣkεA k Pr[X=k]
= 0 Pr[X=0] + 1 Pr[X=1] + 2 Pr[X=2] + 3 Pr[X=3]
= 0*0.064 + 1*0.288 + 2*0.432 + 3*0.216
Exercise 28:
Complete the calculation above.
What would you get if Pr[H]=0.5?
A few more examples:
- X~Bernoulli(p)
- Recall the pmf:
Pr[X=0] = 1-p
Pr[X=1] = p
- Thus,
E[X] = 0*(1-p) + 1*p = p
- In particular, for a fair coin p=0.5 and so E[X]=0.5.
- X~Geometric(p)
- The range: {1,2,...}
- The pmf:
Pr[X=k] = (1-p)k-1 p
- Thus,
E[X] = Σk k Pr[X=k]
= Σk (1-p)k-1 p
= 1/p
- X~Binomial(n,p)
- The range: {0,1,2,...,n}
- The pmf:
Pr[X=k] = nCk pk (1-p)n-k.
- Thus,
E[X] = Σk k Pr[X=k]
= Σk nCk pk (1-p)n-k.
= np
Exercise 29:
How does this relate to the 3-coin-flip example?
- X~Discrete-uniform(1,k)
- The range: {1,2,...,m}
- The pmf:
Pr[X=k] = 1/m
- Thus,
E[X] = Σk k Pr[X=k]
= (m+1)/2
- X~Poisson(γ)
- The range: {0,1,2,...}
- The pmf:
Pr[X=k] = e-γ γk / k!
- Thus,
E[X] = Σk k e-γ γk / k!
= γ
Exercise 30:
Fill in the steps for the last two expectations (discrete-uniform, Poisson).
What is the meaning of this expected value?
- First, let's return to the 3-coin-flip example where
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
- Now suppose we perform n trials
Let n0 = number of occurences of 0
in the trials.
Similarly, let n1 = number of occurences of 1
in the trials.
... etc
- Next, let
Sn = the sum of values obtained.
- Notice that
(1/n) Sn = average of values obtained.
- For example, suppose we ran some trials and obtained
3, 1, 1, 0, 2, 0, 1, 2, 2, 2
In this case
n0 = 2
n1 = 3
n2 = 4
n3 = 1
Sn = 14
This is evident, if we group the values:
0, 0, 1, 1, 1, 2, 2, 2, 2, 3
- Next, notice that
Sn = 0 + 1*3 + 2*4 + 3*1
= 0*n0 + 1*n1 + 2*n0 + 3*n0
= Σk k nk
- Now consider the limit
limn→infty (1/n) Sn
= limn→infty (1/n) Σk k nk
= limn→infty Σk k nk/n
Exercise 31:
What does nk/n become in the limit?
Unfold the sum for the 3-coin-flip example to see why this is true.
- Thus,
limn→infty (1/n) Sn
= Σk k Pr[X=k]
= E[X]
- Thus, even though E[X] is a calculation done with the
distribution, it turns out to be the average (mean) outcome in the
long run (limit).
(provided of course we believe nk/n → Pr[X=k]).
Exercise 32:
Download Coin.java and
CoinExample3.java
and let X= the number of heads in 3 coin flips.
- Compute the average value of X using
(1/n) Sn
- Estimate Pr[X=k] using nk/n.
- Compute Σk k nk/n.
Compare with the E[X] calculation you made earlier.
Exercise 33:
Use Coin.java and
CoinExample4.java
and let X= the number of flips needed to get
the first heads when Pr[Heads]=0.1.
Compute the average value of X using
(1/n) Sn.
Compare with the E[X] calculation from earlier.
Expectation - the continuous case
There are two technical difficulties with the continuous case:
- Before explaining, recall the calculation for the
discrete case:
E[X] = ΣkεA k Pr[X=k]
- One problem with the continuous case: Pr[X=x] = 0, for
any outcome x.
- Another problem: we don't know how to do uncountable sums.
Let's first try an approximation:
- We will try to discretize and apply the discrete definition.
- Suppose we have a continuous rv with range [a,b].
- We'll divide [a,b] into M intervals
[a,a+δ], [a+δ,a+2δ], ..., [a+(M-1)δ,a+Mδ]
- For convenience, we'll write
xi = a+iδ
- Then, the intervals become
[x0,x1],
[x1,x2], ...,
[xM-1,xM]
- We can compute Pr[X in i-th interval] as follows:
Pr[X in [xi,xi+1]] = F(xi+1) - F(xi)
- A representative point in the interval would be the
mid-point mi = (xi+xi+1)/2
- Thus, we could define
E[X] = Σk<M
mi (F(xi+1) - F(xi))
Exercise 34:
Try this computation with the uniform, Gaussian and exponential
distributions
using
UniformCDF2.java,
GaussianCDF2.java, and
ExponentialCDF2.java.
Explore what happens when more intervals are used in the
expectation computation than in the CDF estimation.
Now for an important insight:
- Let's return to the definition of E[X] for a continuous
rv:
E[X] = Σk<M
mi (F(xi+1) - F(xi))
- Multiply and divide by δ
E[X] = Σk<M
mi (F(xi+1) -
F(xi))/δ δ
- As M grows (and δ shrinks),
(F(xi+1) - F(xi))/δ
→
F'(xi)
- Similarly,
mi
→
xi
- Thus, in the limit, we could argue that
- We use the term probability density function
(pdf) to name f(x) = F'(x), the derivative of the CDF F.
- In this case,
=> This is (almost) the usual textbook definition of E[X].
- By making f(x)=0 outside the rv's range, this is
equivalent to:
=> Textbook definition of E[X].
- As an aside, since we've formally introduced pdfs, calculus
tells us that:
- If f(x) is some density function, then there is
some F such that
(First fundamental theorem of calculus)
- But because F(b) - F(a) = Pr[a ≤ X ≤ b],
- As a corollary,
- Finally,
F' = f
(Second fundamental theorem of calculus).
Exercise 35:
Find a description and picture of the standard Gaussian pdf.
What is the maximum value (height) of the pdf? Use a calculator
or write a program to compute this value as a decimal.
Histograms, pmfs and pdfs
What is the connection?
- We are used to the intuitive idea of a histogram,
which shows the spread of frequency-of-occurence.
- We sense that pmfs and pdfs show something similar
=> This is more or less true, with some modifications.
Types of histograms:
- Let's start with the most basic histogram:
Suppose we are given 7 data: 1.6, 0.7, 2.4, 0.3, 1.8, 1.7, 0.4
- Step 1: find an appropriate range
=> We'll use [0, 2.5].
- Step 2: decide the granularity (number of bins)
=> We'll use 5 bins.
- Step 3: Identify, for each bin, how many data values
fall into that bin.

- Note: the height of each bin is the number
of data falling into each bin.
- An alternative type of histogram is a relative
or proportional histogram:
- Here, we compute the proportion of data in each bin:

Exercise 36:
Consider the 3-coin flip example (with Pr[Heads]=0.6]).
Use a bin width of 1.0
and build a proportional histogram when the data is 3, 1, 1, 0, 2, 0, 1, 2, 2, 2.
Let's write some code for the 3-coin flip example:
- First, observe that the data is integers.
We will build bins to differentiate each possible outcome

- The program:
(source file)
double numTrials = 100000;
Coin coin = new Coin (0.6);
// Build intervals to surround the values of interest.
PropHistogram hist = new PropHistogram (-0.5, 3.5, 4);
for (int t=0; t<numTrials; t++) {
// #heads in 3 flips.
int numHeads = coin.flip() + coin.flip() + coin.flip();
hist.add (numHeads);
}
// View the histogram, text and graph.
System.out.println (hist);
hist.display ();
Exercise 37:
Run CoinExample5.java
to see what you get for a large number of samples.
You will also need Coin.java
and PropHistogram.java.
The relationship between the proportional histogram and pmf:
- Recall: pmf is for discrete rv's
=> the list Pr[X=k] for each k
- We have selected each bin so that the height of bin k
is the proportion of times k occurs in the data.
- Clearly, in the limit,
Height of bin k → Pr[X=k]
- Thus, the proportional histogram becomes the pmf in the limit.
Does this work for continuous rv's?
Exercise 38:
Use PropHistogram.java
and execute GaussianExample.java.
Does it work? Change the number of intervals from 10 to
20. What do you notice? Is the height correct?
Histograms for continuous rv's:
Exercise 39:
Modify computeHeights() in
PropHistogram.java
to compute heights as above for continuous rv's.
Then, execute GaussianExample.java.
Change the number of intervals from 10 to
20. Does it work?
Summary:
- In a regular histogram,
bin height = number of data in bin
- For discrete rv's, proportional histogram → pmf in the limit.
bin height = proportion of data in bin
- Another way of saying it: a proportional histogram
estimates the true pmf.
- For continuous rv's, density histogram → pmf in the limit.
bin height = proportion-of-data-in-bin / bin-width
- A density histogram estimates the true pdf.
Exercise 40:
Estimate the density of the time spent in the system
by a random customer in the QueueControl example.
To do this, you need to build a density histogram of
values of the variable
timeInSystem
in QueueControl.java.
Some well-known continuous rv's
Three important distributions (pdf shapes):
- Uniform:

- Exponential (γ = 4):

- Gaussian (also called Normal):

The Uniform distribution: if X~Uniform(a,b)
- a < b are two real numbers.
=> Most common case: a=0, b=1
- X takes values in [a,b] (a,b inclusive).
- X has pdf (defined, for generality, on the whole real line):

- X has CDF

- X has expected value

Exercise 41:
Suppose X~Uniform(a,b) and [c,d] is a
sub-interval of [a,b]. Use either the pdf or
cdf to compute Pr[Xε[c,d]].
Exercise 42:
Suppose X~Uniform(0,1) and let
rv Y=1-X. What is the distribution of Y?
- What is the range of Y?
- Use PropHistogram
to estimate the density (remember to scale by interval-width) of
Y. Recall that RandTool.uniform() generates
values from the Uniform(0,1) distribution.
- How would you derive the pdf or CDF of Y?
The Exponential distribution: if X~Exponential(γ)
- The parameter γ>0 is a real number.
- X takes values in the positive half of the real-line.
- X has pdf (defined, for generality, on the whole real line):

- X has CDF

- X has expected value

Exercise 43:
Suppose X~Exponential(γ).
- Write down an expression for Pr[X>a].
- Compute Pr[X>b+a|X>b].
Exercise 44:
Suppose X~Exponential(γ) with CDF F(x).
Write down an expression for F-1(y), the inverse
of F.
The Normal distribution: if X~N(μ,σ2)
- The parameter μ is any real number.
- The parameter σ is any non-negative real number.
- X takes any real value.
- X has pdf

- There is no closed-form CDF.
- X has expected value E[X] = μ.
Generating random values from distributions
We have used random values from various distributions,
but how does one write a program to generate them?
Key ideas:
- Since we are writing a (hopefully small) program,
the output cannot really be truly random.
=> the goal is to make pseudo-random generators.
- Use the strange properties of numbers to make
a (pseudo-random) generator for Uniform(0,1) values.
- To generate values for some other distribution:
- Generate Uniform(0,1) values.
- Transform them so that the transformed values have
the desired distribution.
Note: Random-number generation is a topic of research by itself:
- Much work on high-quality Uniform(0,1) (or bit-stream)
generators.
- Similarly, there is much work on efficient transformations for
particular distributions.
Desirable properties for a pseudo-random generator:
- Should be repeatable:
=> It should be possible to reproduce the sequence given some kind of initial "seed".
- Should be portable:
=> It shouldn't be dependent on language, OS or hardware.
- Should be efficient:
=> The calculations shouldn't take too long (simple
arithmetic, preferably).
- Should have some theoretical basis:
=> Ideally, some number-theory results about its properties.
- (Contradiction) Should have good statistical properties:
=> Should satisfy statistical tests for randomness.
Exercise 45:
What kinds of statistical tests? Is it enough that the
histogram resemble the U[0,1] pdf?
A simple algorithm: the Lehmer generator:
- For some large number M, consider the set
S = {1,2,...,M-1}.
- We will generate a number K in S and return U=K/M.
=> Output is between 0 and 1.
- If K is uniformly distributed between 1 and M-1
=> U is approximately Uniform(0,1).
- Note: this means we'll never generate some values
like (1+2)/2M.
- Only remaining question: how do we pick uniformly among
the discrete elements in S={1,2,...,M-1}?
- Key idea:
- Consider two large numbers a and b (but both
a,b < M).
- Suppose we compute x = ab mod M.
- Then x could be anywhere between {1,2,...,M-1}.
- Then, pick another two values for a and b and repeat.
- But how do we pick a and b?
The logic seems circular at this point.
- The key insight: use the previous value x.
- Fix the values of a and b.
- Compute x1 = ab mod M.
- Then, compute x2 = ax1 mod M.
- Then, compute x3 = ax2 mod M.
- ... etc
- We will call b = x0, the seed.
- It can be shown (number theory) that if M is prime,
the sequence of values x1,x2, ...,
xp is periodic (and never repeats in between).
Furthermore, for some values of a, p=M-1.
- Example: M=231-1 and a=48271.
- Pseudocode:
Algorithm: uniform ()
// Assumes M and a are initialized.
// Assumes x (global) has been initialized to seed value.
1. x = (a*x) mod M
2. return x
What about its properties?
- Some properties are obviously satisfied: portable, repeatable,
efficient (provided mod is efficient).
- The Lehmer generator has reasonably good statistical
properties for many applications.
- However, there are (stranger) generators with better
statistical properties.
Using a U[0,1] generator for discrete rv's:
- First, consider X~Bernoulli(0.6) as an example:

- Recall: X takes values 0 or 1 and Pr[X=1]=0.6.
- Key idea: generate a uniform value U.
- If 0<U<0.6, return X=1, else return X=0.
- Pseudocode:
Algorithm: generateBernoulli (p)
Input: Bernoulli parameter p=Pr[success]
1. u = uniform ()
2. if u < p
3. return 1
4. else
5. return 0
6. endif
- This idea can be generalized: e.g., consider rv X
with pmf
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216

Generate uniform value U.
if U < 0.064
return 0
else if U < 0.064 + 0.288
return 1
else if U < 0.064 + 0.288 + 0.432
return 2
else // if U < 1
return 3
Exercise 46:
Add code to DiscreteGenExample.java
to implement the above generator, and to test it by building a histogram.
Let's generalize this to any discrete pmf of the type Pr[X=k] = ....
- We'll use the notation pk = Pr[X=k].
- Define Qk =
Σi ≤k Pr[X=i].
- Then, the algorithm is:
Given uniform value U, return the smallest
k such that U < Qk.
- Pseudocode:
Algorithm: generateFromDiscretePmf ()
// Assumes Q[k] have been computed.
1. u = uniform ()
2. k = 0
3. while Q[k] < u
4. k = k + 1
5. endwhile
6. return k
- This presents a problem when X~Geometric(p).
Here, X ε {1,2,...}
=> Cannot pre-compute Qk's.
- Alternatively, we can compute the Qk's as
we go:
Algorithm: generateFromDiscretePmf ()
1. u = uniform ()
2. k = 0
3. Q = Pr[X=0]
4. while Q < u
5. k = k + 1
6. Q = Q + Pr[X=k]
7. endwhile
8. return k
Exercise 47:
Add code to GeometricGenerator.java
to implement the above generator. The test code is already written.
Let's return to the 3-coin-flip example for a key insight:
- Recall the pmf:
Pr[X=0] = 0.064
Pr[X=1] = 0.288
Pr[X=2] = 0.432
Pr[X=3] = 0.216
- From which we can infer the CDF:
Pr[X ≤ 0] = 0.064
Pr[X ≤ 1] = 0.352 (= 0.064 + 0.288)
Pr[X=2] = 0.784 (= 0.064 + 0.288 + 0.432)
Pr[X=3] = 1.000 (= 0.064 + 0.288 + 0.432 + 0.216)
- Let's plot the CDF:

- Recall our generation algorithm:
Given uniform value u, return the smallest k
such that U < Qk.

- Thus, it looks like we are using the CDF "in reverse".
We will use this idea for continuous rv's.
Using the inverse-CDF for continuous rv's:
- Suppose X is a continuous rv with CDF F(x).
- Suppose, further, that the inverse F-1(y)
can be computed.
- Consider F-1(U) where U is a uniform value.
- When U is a rv, so is g(U) for any function g.
=> In particular, F-1(U) is a rv.
- Let H(x) be the CDF of the rv F-1(U).
=> Thus, H(x) = Pr[F-1(U) ≤ x].
- Let's work this through a few steps:
H(x)
= Pr[F-1(U) ≤ x]
= Pr[U ≤ F(x)]
= F(x)
=> F-1(U) has the same CDF as X.
=> F-1(U) has the same distribution as X.
- Thus, we can use F-1(U) to produce random
values of X:
=> Given uniform value U, return F-1(U).
- Pseudocode:
Algorithm: generateFromContinuousCDF ()
// Assumes inverse CDF F-1 is available.
1. u = uniform ()
2. return F-1(u)
Exercise 48:
Add code to ExponentialGenerator.java
to implement the above idea. Use the inverse-CDF you computed
earlier. The test code is written to produce a histogram.
Use your modified version of
PropHistogram.java to make
a density histogram. Compare the result with the actual density
(using γ=4). How do you know your code worked?
About random generation:
- The inverse-CDF idea does not work easily for distributions like
the Normal:
=> One needs to numerically approximate some inverse-CDFs.
- There are a host of techniques for specific distributions.
=> random generation is a specialty in itself (several books).
- For simple simulations, the straightforward generators will suffice.
- For very large simulations (large numbers of samples)
=> You'd want to use a more industrial-strength generator.
- Example: the Mersenne-Twister algorithm for uniform values.
- Another issue: many simulations require generating multiple
streams of random values
=> Need to be careful that you don't introduce correlations
through careless use of generators.
Summary
What have we learned in this module?
- Prior to this, we learned how to reason about events:
- Events, sample spaces, probability measure.
- Combining events (intersection, union).
- Conditional probability, Bayes' rule, total-probability.
- A random variable (rv) measure numeric output from an experiment.
(As opposed to the outcome "King of Spades")
- Why is this useful?
You can perform arithmetic and other types of calculations
with numeric output, e.g., expected value.
- Two types: discrete and continuous.
- Discrete rv's have pmfs. Continuous rv's have pdfs.
- Both have CDFs defined the same way: F(x) = Pr[X ≤ x].
- A proportional histogram becomes the pmf (in the limit).
- A density histogram becomes the pdf (in the limit).
(Minor technical issue: there are two limits here - the
number of samples, the size of the interval).
- The important discrete distributions: Bernoulli,
Discrete-Uniform, Binomial, Geometric, Poisson.
- The important continuous ones: Uniform, Exponential, Normal.
- The expected value is a calculation
performed on a pmf or pdf.
- For discrete rv's:
E[X] = ΣkεA k Pr[X=k]
- For continuous rv's: