Sampling with Geometric Distribution#

Recall that, in general, the geometric distribution has the following pdf:

  • \(n\) is the index of the first success,

  • \(p\) is the probability of success, and

  • \(q=1-p\) indicates the probability of failure.

The probability of the first success occuring on the \(n^{th}\) trial is given by:

\[P(n) = (q)^{n-1}p\]

The expected value of the geometric distribution is given by:

\[E(X) = \frac{1}{p}\]

Example: How Many Dice Rolls to Get 10 or More#

Suppose we are rolling two, 6-sided dice and summing the values shown in a standard Monopoly dice roll. How many trials \(n\) on average are required before a success occurs?

Probability of Success#

Let’s note that, of the 36 possible outcomes, 3 are 10’s, 2 are 11’s and 1 is 12. Thus, the probability of success on one dice roll is given by:

\[p = \frac{6}{36} = \frac{1}{6}\]

The rflip() Function#

We need to be able to use rflip() function, so we load that code below.

Hide code cell source
rflip <- function(n=1, prob=.5, quiet=FALSE, verbose = !quiet, summarize = FALSE, 
                  summarise = summarize) {
	if ( ( prob > 1 && is.integer(prob) ) ) {  
		# swap n and prob
		temp <- prob
		prob <- n
		n <- temp
	}
	if (summarise) {
	  heads <- rbinom(1, n, prob)
	  return(data.frame(n = n, heads = heads, tails = n - heads, prob = prob))
	} else {
	  r <- rbinom(n,1,prob)
	  result <- c('T','H')[ 1 + r ]
	  heads <- sum(r)
	  attr(heads,"n") <- n
	  attr(heads,"prob") <- prob 
	  attr(heads,"sequence") <- result
	  attr(heads,"verbose") <- verbose
	  class(heads) <- 'cointoss'
	  return(heads)
	}
}

Relationship to Negative Binomial#

The fact the geometric distribution is a sub-case of the negative binomial distribution where \(k=1\) can help us here. We can use the while loop created in the previous section.

Expected Value#

For the geometric distribution, the expected value is as follows:

\[E(X) = \frac{1}{p}\]

Thus, for this example:

\[E(X) = \frac{1}{\frac{1}{4}} = 4\]
num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 100              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 100', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 
The grand mean of the number of trials needed to achieve success is equal to
    3.35
_images/e3465360367a826910a457596736a7f8cb71024964dcb72d8dd6f138fb50efe6.png

This table summarizes the investigation to include five examples from the sampling distribution of Drawing a Spades hand:

num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 200              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 200', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 
The grand mean of the number of trials needed to achieve success is equal to
    4.085
_images/2366d03d080f68a35e1deed7e3f22dea16191c853b3a55255a8d9870e23902cc.png
num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 500              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 500', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 
The grand mean of the number of trials needed to achieve success is equal to
    3.89
_images/3b74c7794e8e755bdc44eb355af30c39ef7184b7c05da8d5b8a75b0aa39b6771.png
num_trials_needed <- c()     # create a vector to store the number of trials needed in while loop
num_samps = 1000              # set the number of times to run the simulation

for (i in 1:num_samps){
   k <- 0     # Initialize successes to zero
   n <- 0     # Initialize number of total trials needed for success to occur
   while (k < 1 && n < 20) {
      k <- k + rflip(1, prob = 1/4, summarize = TRUE)[1,2]
      n <- n + 1
    }
   num_trials_needed[i] <- n     # store the number of trials need in this simulation
}

lower <- quantile(num_trials_needed, prob = 0.05)     # Calcuate the 5th percentile.
upper <- quantile(num_trials_needed, prob = 0.95)     # Calcuate the 95th percentile.
cat('The grand mean of the number of trials needed to achieve success is equal to\n   ',mean(num_trials_needed) )
hist(num_trials_needed, breaks = 8, main = 'Histogram of Oil Wells Drilled: Repetitions = 1,000', xlab = 'Number of Wells Drilled')
abline( v = lower, col="blue")     # Add vertical line at 5th percentile
abline(v = upper, col="blue")      # Add vertical line at 95th percentile 
The grand mean of the number of trials needed to achieve success is equal to
    3.878
_images/b8a5c1c4856325980bb552dd42abe4852831e2785a469836f91bfcf71fb50fae.png
Max Number of Wells Drilled p = 1/4 E(X) Simulation Grand Mean
100 0.25 4 4.05
200 0.25 4 3.92
500 0.25 4 4.28
1,000 0.25 4 3.857