Sampling with Geometric Distribution

Sampling with Geometric Distribution#

Recall that, in general, the geometric distribution has the following pdf:

\(n\) is the index of the first success,
\(p\) is the probability of success, and
\(q=1-p\) indicates the probability of failure.

The probability of the first success occuring on the \(n^{th}\) trial is given by:

\[P(n) = (q)^{n-1}p\]

The expected value of the geometric distribution is given by:

\[E(X) = \frac{1}{p}\]

Example: How Many Dice Rolls to Get 10 or More#

Suppose we are rolling two, 6-sided dice and summing the values shown in a standard Monopoly dice roll. How many trials \(n\) on average are required before a success occurs?

Probability of Success#

Let’s note that, of the 36 possible outcomes, 3 are 10’s, 2 are 11’s and 1 is 12. Thus, the probability of success on one dice roll is given by:

\[p = \frac{6}{36} = \frac{1}{6}\]

The rflip() Function#

We need to be able to use rflip() function, so we load that code below.

Relationship to Negative Binomial#

The fact the geometric distribution is a sub-case of the negative binomial distribution where \(k=1\) can help us here. We can use the while loop created in the previous section.

Expected Value#

For the geometric distribution, the expected value is as follows:

\[E(X) = \frac{1}{p}\]

Thus, for this example:

\[E(X) = \frac{1}{\frac{1}{4}} = 4\]

num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 100              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 100', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 

The grand mean of the number of trials needed to achieve success is equal to
    3.35

_images/e3465360367a826910a457596736a7f8cb71024964dcb72d8dd6f138fb50efe6.png

This table summarizes the investigation to include five examples from the sampling distribution of Drawing a Spades hand:

num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 200              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 200', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 

The grand mean of the number of trials needed to achieve success is equal to
    4.085

_images/2366d03d080f68a35e1deed7e3f22dea16191c853b3a55255a8d9870e23902cc.png

num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 500              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 500', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 

The grand mean of the number of trials needed to achieve success is equal to
    3.89

_images/3b74c7794e8e755bdc44eb355af30c39ef7184b7c05da8d5b8a75b0aa39b6771.png

num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 1000              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 1,000', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 

The grand mean of the number of trials needed to achieve success is equal to
    3.878

_images/b8a5c1c4856325980bb552dd42abe4852831e2785a469836f91bfcf71fb50fae.png

Max Number of Wells Drilled	p = 1/4	E(X)	Simulation Grand Mean
100	0.25	4	4.05
200	0.25	4	3.92
500	0.25	4	4.28
1,000	0.25	4	3.857