20. A/B Testing Tools

from datascience import *
import numpy as np

%matplotlib inline

import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

from scipy import stats

I keep some data frames in CSV format accessible from my website. One of them, personality.csv, contains, as you might imagine, personality variables. Here we use a subset of that data (perfnarc.csv) with numeric variables like perfectionism (Perf) and narcissism (Narc), grouping variables like biological sex, and the AccDate variable, which records Yes/No responses to the following question:

“At a time in your life when you are not involved with anyone, a person asks you out. This person has a great personality, but you do not find this person physically attractive. Do you accept the date?”

The Stress1 and Stress2 variables are pre/post measurements collected in the 2nd and 7th weeks of the semester, respectively, to see whether college students experience more stress during midterms.

pers = Table.read_table('http://faculty.ung.edu/rsinn/perfnarc.csv')
pers

Sex	G21	Greek	AccDate	Stress1	Stress2	Perf	Narc
F	N	N	N	9	7	99	3
F	Y	N	Y	11	13	86	2
F	N	Y	N	15	14	118	4
F	N	N	Y	16	15	113	2
F	Y	N	Y	17	17	107	8
F	N	N	N	10	7	123	1
F	N	N	N	16	18	93	4
F	N	Y	Y	12	12	126	7
F	N	N	Y	11	16	91	5
F	Y	N	Y	18	16	111	1

... (138 rows omitted)

Tools for A/B Testing

As we walk through an example with narcissism, we will build three functions that will help us conduct A/B tests.

ab_shuffle
ab_diff
ab_hist

The first two expect as input a 2-column table with the grouping variable in the first column and a numeric variable in the second. The third, ab_hist, instead takes an array of simulated statistics and an observed value.

Creating a 2-column table for A/B Testing

We will use the grouping variable of biological sex and numeric variable of narcissism scores.

narc = pers.select('Sex','Narc')

narc.group('Sex')

Sex	count
F	85
M	63

narc.group('Sex', np.average)

Sex	Narc average
F	3.81176
M	5.57143

Calculating observed difference in means for A/B groups

a_mean = narc.group(0,np.average).column(1).item(0)
a_mean

3.8117647058823527

b_mean = narc.group(0,np.average).column(1).item(1)
b_mean

5.571428571428571

observed_difference = a_mean - b_mean
observed_difference

-1.7596638655462185

integer_bins = np.arange(15)
narc.hist('Narc', group = "Sex", bins = integer_bins)
_= plots.title('Narcissism by Sex')

The A/B hypothesis test for differences in narcissism based on biological sex

The null hypothesis is that the male and female groups are drawn from the same distribution. If so, then randomly shuffling the grouping labels should not matter. The observed difference in A/B means should fall well within the distribution of shuffled differences in A/B means which we can simulate.
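For reference, here is one way to write down what the simulation in this section estimates. The symbols are our own shorthand, not notation from the original: \(\bar{x}_F\) and \(\bar{x}_M\) are the group means, \(D^{*}_{i}\) is the difference in means after the \(i\)-th shuffle, and \(R\) is the number of repetitions (reps).

\[
D = \bar{x}_{F} - \bar{x}_{M}, \qquad D_{\text{obs}} \approx 3.81176 - 5.57143 \approx -1.7597
\]

\[
\hat{p} = \frac{\#\{\, i : D^{*}_{i} \le D_{\text{obs}} \,\}}{R}
\]

This is a lower-tail (one-sided) proportion, matching the comparison diffs <= observed_difference used later in the section.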

Creating `ab_shuffle`: a function for shuffling the grouping labels

Let’s first demonstrate step by step what we need the function to do, then create the function. The first code block below demonstrates our “shuffle” command, which uses the sample method to draw all of the rows without replacement.

shuffle_sex = narc.sample(with_replacement = False)
shuffle_sex.show(5)

Sex	Narc
F	3
F	2
F	0
M	5
F	4

... (143 rows omitted)

shuffle_sex = narc.sample(with_replacement = False).column(0)
shuffle_sex

array(['M', 'F', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'F', 'F', 'F', 'F',
       'F', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F',
       'F', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'F', 'F', 'M', 'F', 'F',
       'M', 'M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'M', 'M', 'F',
       'F', 'M', 'F', 'M', 'M', 'F', 'F', 'F', 'F', 'F', 'F', 'F', 'M',
       'M', 'M', 'M', 'F', 'F', 'M', 'M', 'M', 'M', 'F', 'F', 'M', 'M',
       'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F',
       'F', 'M', 'M', 'M', 'F', 'M', 'F', 'F', 'M', 'M', 'F', 'M', 'M',
       'M', 'M', 'M', 'F', 'F', 'F', 'M', 'M', 'F', 'M', 'M', 'M', 'M',
       'F', 'F', 'M', 'F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'F', 'F',
       'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'F', 'F', 'M', 'F', 'F',
       'F', 'M', 'M', 'M', 'M'], dtype='<U1')
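Drawing every row without replacement is just a random permutation, so the shuffled label column keeps the same number of F and M labels as the original (85 and 63). The sketch below checks this with plain Python's random.sample on a hypothetical list of labels built from those counts; it does not use the datascience library.

```python
import random
from collections import Counter

# Hypothetical labels standing in for the Sex column (85 F, 63 M as in the data)
labels = ['F'] * 85 + ['M'] * 63

# Sampling k = len(labels) items without replacement uses each label exactly once,
# so the result is a permutation of the original column
rng = random.Random(2024)
shuffled = rng.sample(labels, k=len(labels))

print(Counter(shuffled))  # still 85 F and 63 M, only the order changes
```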

After creating an array of shuffled labels, we need to include that array as a column in our table. We add the shuffled labels as a third column, then use the select method to create a two-column table with the columns in the correct order.

shuffled_narc = narc.with_column("Shuffled Grouping",shuffle_sex)
shuffled_narc.show(5)

Sex	Narc	Shuffled Grouping
F	3	M
F	2	F
F	4	F
F	2	M
F	8	F

... (143 rows omitted)

shuffled_narc = narc.with_column("Shuffled Grouping",shuffle_sex).select(2,1)
shuffled_narc.show(5)

Shuffled Grouping	Narc
M	3
F	2
F	4
M	2
F	8

... (143 rows omitted)

shuffled_narc.group('Shuffled Grouping',np.average)

Shuffled Grouping	Narc average
F	4.6
M	4.50794

The `ab_shuffle` function

Our function just combines the previous several code blocks. Notice that the expected input is a two-column table with the grouping variable in the first column.

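def ab_shuffle(tab):
    # Shuffle the grouping labels: sample every row without replacement, keep column 0
    shuffle_group = tab.sample(with_replacement = False).column(0)
    # Attach the shuffled labels, then keep (shuffled labels, original values)
    shuffled_tab = tab.with_column("Shuffled Grouping",shuffle_group).select(2,1)
    return shuffled_tab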
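To make the idea behind ab_shuffle concrete without the datascience Table, here is a minimal plain-Python sketch. It assumes the data are a list of (label, value) pairs with the grouping variable first; ab_shuffle_rows is a hypothetical name, not part of any library. Only the labels are permuted, so each value stays in its original position.

```python
import random

def ab_shuffle_rows(rows, rng=random):
    """Hypothetical plain-Python analogue of ab_shuffle.

    rows: list of (group_label, value) pairs, grouping variable first.
    Returns new rows whose labels are a random permutation of the
    original labels, while the values keep their original order.
    """
    labels = [label for label, _ in rows]
    values = [value for _, value in rows]
    # Sampling all labels without replacement = shuffling them
    shuffled_labels = rng.sample(labels, k=len(labels))
    return list(zip(shuffled_labels, values))

# Usage with a few hypothetical rows
rows = [('F', 3), ('F', 2), ('M', 5), ('M', 7), ('F', 4)]
print(ab_shuffle_rows(rows, rng=random.Random(0)))
```

Passing a seeded random.Random instance makes a run reproducible, which can help when checking the function by hand.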

ab_shuffle(narc)

Shuffled Grouping	Narc
M	3
M	2
M	4
M	2
F	8
M	1
F	4
F	7
M	5
F	1

... (138 rows omitted)

Creating `ab_diff`: a function that calculates the difference in A/B group means

We can pass a function (here np.average) to the group method to find the A/B group means.

shuffled_narc.group('Shuffled Grouping',np.average)

Shuffled Grouping	Narc average
F	4.6
M	4.50794

a_mean = shuffled_narc.group('Shuffled Grouping',np.average).column(1).item(0)
a_mean

4.6

b_mean = shuffled_narc.group('Shuffled Grouping',np.average).column(1).item(1)
b_mean

4.507936507936508

diff = a_mean - b_mean
diff

0.09206349206349174

The `ab_diff` function

Using the above code blocks as a template, we can write a function that grabs the means from the grouping table. Again, the expected input is a two-column table with the grouping variable first.

def ab_diff(tab):
    # Group by the first column and average the second (computed once, not three times)
    means = tab.group(0,np.average).column(1)
    a_mean = means.item(0)
    b_mean = means.item(1)
    return a_mean - b_mean
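A plain-Python sketch of the same calculation may help show which group is "A". The group output above lists F before M, so here we assume the labels are ordered alphabetically and treat the first one as group A. The helper name ab_diff_rows and the (label, value) row format are hypothetical; the sketch also assumes exactly two groups.

```python
from statistics import mean

def ab_diff_rows(rows):
    """Hypothetical plain-Python analogue of ab_diff.

    rows: list of (group_label, value) pairs with exactly two distinct labels.
    Group A is the label that sorts first ('F' before 'M').
    Returns mean(A) - mean(B).
    """
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    # Unpacking raises ValueError if there are not exactly two groups
    a_label, b_label = sorted(groups)
    return mean(groups[a_label]) - mean(groups[b_label])

rows = [('F', 3), ('F', 2), ('M', 5), ('M', 7), ('F', 4)]
print(ab_diff_rows(rows))  # mean F = 3, mean M = 6, so -3
```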

ab_diff(shuffled_narc)

0.09206349206349174

Simulating the statistic

The statistic we need is the difference in shuffled A/B group means. Our plan is to use a for loop to repeatedly reshuffle the labels and calculate this statistic. The output will be an array representing a random sampling of this statistic.

The engine in the for loop is quite simple. We shuffle the data table and calculate the difference in A/B means in one line using the two functions we created above.

diffs = make_array()

# Reduce reps to 1,000 or less, especially if running in cloud.
reps = 5000

for i in range(reps):
    new_diff = ab_diff(ab_shuffle(narc))
    diffs = np.append(diffs, new_diff)

# Remove the hashtag/comment symbol to see the array output
# diffs
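The same loop can be sketched in plain Python with a list, which may be useful if you want to experiment without the datascience library. The helper simulate_null_diffs and the (label, value) row format are assumptions for illustration. Appending to a list is cheap, whereas np.append builds a new array on every call; for a few thousand repetitions either approach is fine.

```python
import random
from statistics import mean

def simulate_null_diffs(rows, reps, seed=None):
    """Hypothetical sketch of the permutation loop using plain lists.

    rows: list of (label, value) pairs with exactly two distinct labels.
    Returns `reps` differences in group means (alphabetically first label
    minus second), each computed after shuffling the labels.
    """
    rng = random.Random(seed)
    labels = [label for label, _ in rows]
    values = [value for _, value in rows]
    a_label, b_label = sorted(set(labels))
    diffs = []
    for _ in range(reps):
        # Shuffle the labels; group sizes are unchanged, so neither group is empty
        shuffled = rng.sample(labels, k=len(labels))
        a_vals = [v for lab, v in zip(shuffled, values) if lab == a_label]
        b_vals = [v for lab, v in zip(shuffled, values) if lab == b_label]
        diffs.append(mean(a_vals) - mean(b_vals))
    return diffs

rows = [('F', 3), ('F', 2), ('M', 5), ('M', 7), ('F', 4)]
diffs = simulate_null_diffs(rows, reps=1000, seed=1)
print(len(diffs))  # 1000
```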

Displaying the distribution of the null hypothesis statistic

Let’s create a third function, one that will take an array of simulated statistics along with an observed value and plot a histogram showing both.

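def ab_hist(myArray, observed_value):
    # Histogram of the simulated A/B differences
    tab = Table().with_column('A/B Differences',myArray)
    tab.hist(0)
    # Red vertical segment at the observed statistic (the 0.1 height is a fixed choice)
    _ = plots.plot([observed_value, observed_value], [0, 0.1], color='red', lw=2)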

ab_hist(diffs,observed_difference)

Consider what the above visualization means:

The blue histogram shows the distribution of the statistic under the null hypothesis
The red line marks the observed value of the statistic

To calculate a \(p\)-value, we first create a truth array marking which simulated A/B differences in means are less than or equal to observed_difference. Summing the truth array counts the simulated values at least as extreme as the observed value in the lower tail (the observed difference is negative), and dividing that count by reps gives the proportion.

sum(diffs <= observed_difference)

p_val = sum(diffs <= observed_difference) / reps
p_val

0.0002
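The value 0.0002 corresponds to 1 of the 5,000 shuffles (0.0002 × 5000 = 1). The text computes a lower-tail proportion; a common alternative, when the direction of the difference is not fixed in advance, is a two-sided proportion based on absolute values. The sketch below computes both on a small hypothetical array of simulated differences; p_values is an illustrative name, not a library function.

```python
def p_values(diffs, observed):
    """Return (lower_tail, two_sided) empirical p-values.

    lower_tail: share of simulated diffs <= observed (what the text computes).
    two_sided: share of simulated diffs with |diff| >= |observed|.
    """
    n = len(diffs)
    lower_tail = sum(d <= observed for d in diffs) / n
    two_sided = sum(abs(d) >= abs(observed) for d in diffs) / n
    return lower_tail, two_sided

# Hypothetical simulated differences, for illustration only
diffs = [-0.4, 0.1, 0.3, -2.0, 0.05, -0.2, 1.9, 0.0]
print(p_values(diffs, observed=-1.76))  # (0.125, 0.25)
```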

Results of Example A/B Test

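Because p_val is far less than \(0.05\), we reject the null hypothesis. In real-world terms, there is strong evidence that narcissism levels differ by biological sex: in this sample, the female group's mean narcissism score is lower than the male group's, and a difference this large rarely arises from randomly shuffled labels.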

Intro to Applied Statistics
