Statistical Formulas#

In 9th grade algebra, we learn about equations such as

\[y = x\]

We find that:

  1. \(x\) is the independent variable, and

  2. \(y\) is the dependent variable.

In statistics, we use similar equations to indicate which analysis should be performed, such as:

\[y \sim x\]

where \(y\) is the dependent variable and \(x\) is the independent variable. The meanings are quite similar. However, the operators used in statistical formulas are important and somewhat different from those of algebra.
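In R, a formula such as this is a first-class object: it can be created and inspected before any data or model exists. A minimal sketch:

```r
# The formula is stored unevaluated, so y and x need not exist yet
f <- y ~ x
class(f)     # "formula"
all.vars(f)  # "y" "x"
```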

ANOVA and \(t\)-Tests#

Suppose that we have a numeric variable \(\textbf{y}\) and a grouping variable \(\textbf{A}\). The statistical formula indicating that we should perform an ANOVA or a \(t\)-test, as appropriate, is the following:

\[y \sim A\]

If the grouping variable \(A\) has exactly two groups, we hand this formula to `t.test()`. If \(A\) has three or more groups, the same formula goes to `aov()` for an ANOVA, which, admittedly, takes a couple more steps to complete.
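As a runnable sketch with made-up data (the group labels and values are illustrative):

```r
set.seed(42)
y <- c(rnorm(10, mean = 5), rnorm(10, mean = 7))        # numeric response
A <- factor(rep(c("control", "treatment"), each = 10))  # two groups

# Two groups: the formula y ~ A drives a t-test
t.test(y ~ A)

# With three or more groups, the same formula would go to aov():
# summary(aov(y ~ A))
```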

Linear Regression#

If we have two numeric variables \(x\) and \(y\), the formula

\[y \sim x\]

indicates simple linear bivariate regression.
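A quick sketch with simulated data (the true coefficients 3 and 2 are made up for illustration):

```r
set.seed(1)
x <- runif(30)
y <- 3 + 2 * x + rnorm(30, sd = 0.1)

fit <- lm(y ~ x)   # simple linear regression via the formula
coef(fit)          # intercept near 3, slope near 2
```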

Forcing an Intercept#

In R's formula language, writing a constant, as in \(y \sim 2 + x\), does not fix the \(y\)-intercept at 2. To force a known intercept of 2, remove the estimated intercept and supply the constant through an offset term:

\[ y \sim 0 + x + \text{offset}(2)\]
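In R, the standard way to hold the intercept at a known value is to drop the free intercept and add the constant as an offset; note that `offset()` expects a vector the same length as the response. A sketch with made-up data:

```r
set.seed(7)
x <- runif(25)
y <- 2 + 1.5 * x + rnorm(25, sd = 0.1)

# Intercept fixed at 2: no free intercept (the 0), constant enters via offset()
fit <- lm(y ~ 0 + x + offset(rep(2, length(y))))
coef(fit)  # only a slope is estimated, near 1.5
```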

Polynomial Regression#

The following formula requests quadratic regression:

\[ y \sim \text{poly}(x,2)\]

or cubic regression:

\[ y \sim \text{poly}(x,3)\]

Note that R interprets these formulas as a request for a model using orthogonal polynomials. We can specify a model that uses the raw (and traditional) powers as follows:

\[ y \sim 1 + x + I(x^2)\]
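The two parameterizations describe the same model space, so their fitted values agree even though the individual coefficients differ. A sketch with simulated data (the coefficients are made up):

```r
set.seed(3)
x <- runif(40)
y <- 1 + x - 2 * x^2 + rnorm(40, sd = 0.1)

ortho <- lm(y ~ poly(x, 2))      # orthogonal polynomial basis
raw   <- lm(y ~ 1 + x + I(x^2))  # traditional powers of x

all.equal(fitted(ortho), fitted(raw))  # TRUE: identical predictions
```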