{ "cells": [ { "cell_type": "markdown", "id": "e3ed112d-7cf4-4b16-8250-f82c1ea06256", "metadata": {}, "source": [ "# Statistical Formulas\n", "\n", "This collection of formulas resembles the reference sections found in many statistics textbooks and provides guidance on how to perform basic statistical hypothesis testing and other basic statistical tasks." ] }, { "cell_type": "markdown", "id": "5ce020ed-7f61-4891-8df1-5319cfc3bcb7", "metadata": {}, "source": [ "## Probability\n", "\n", "The following rules of probability are used throughout the course:\n", "\n", "- For any event $A$, $0\\leq P(A) \\leq 1$\n", "- The sample space $S$ has probability $P(S) =1$\n", "- For disjoint events $A$ and $B$, \n", "\n", "$$P(A \\text{ or } B) = P(A) + P(B)$$\n", "\n", "- In general, for events $A$ and $B$,\n", "\n", "$$P(A \\cup B) = P(A) + P(B)-P(A\\cap B)$$\n", "\n", "- $P(A \\text{ does not occur}) =1-P(A)$\n", "- For any discrete probability mass function $p(x)$:\n", " \n", " - $0\\leq p(x_i) \\leq 1 \\hspace{.25cm}\\text{for all}\\hspace{.25cm} 1\\leq i \\leq n$\n", " \n", " - $\\sum p(x_i)=1$\n", "\n", "- For any continuous probability density function $f(x)$:\n", "\n", " - $f(x) \\geq 0 \\hspace{.25cm}\\text{for all}\\hspace{.25cm} x\\in(-\\infty,+\\infty)$\n", "\n", " - $\\displaystyle\\int\\limits_{-\\infty}^{+\\infty} f(x)dx=1$" ] }, { "cell_type": "markdown", "id": "d4b67a5c-c7e8-4126-a430-89c8fe0c4e31", "metadata": {}, "source": [ "## Exploring Data: Distributions and Descriptives\n", "\n", "Look for overall pattern (shape, center, spread) and deviations (outliers).\n", "\n", "- Mean:\n", "\n", "$$\\bar{x}=\\frac{x_1+x_2+\\cdots+x_n}{n}=\\frac{1}{n}\\sum{x_i}$$\n", "\n", "- Standard deviation:\n", "\n", "$$s=\\sqrt{\\frac{1}{n-1}\\sum{(x_i-\\bar{x})^2}}$$\n", "\n", "- Median: Arrange all observations from smallest to largest. The median $M$ is located at position $\\frac{n + 1}{2}$, counting from the beginning of this list. When $n$ is even, this position falls between the two middle observations, and $M$ is their mean. 
We also use the notation $\\tilde{x}$ for the median.\n", "- Quartiles: The first quartile $Q1$ is the median of the observations whose position in the ordered list is to the left of the location of the overall median. The third quartile $Q3$ is the median of the observations to the right of the location of the overall median.\n", "- Standardized value of $x$:\n", "$$z=\\frac{x-\\mu}{\\sigma}$$" ] }, { "cell_type": "markdown", "id": "421170e8-9a5b-4f8e-8be6-359a739e2f6c", "metadata": {}, "source": [ "### Five Number Summary\n", "\n", "
| Statistic | Symbol |\n",
"|---|---|\n",
"| Minimum | min |\n",
"| 1st Quartile | Q1 |\n",
"| Median | med |\n",
"| 3rd Quartile | Q3 |\n",
"| Maximum | max |\n", "