AP Statistics Cheat Sheet

This page is a high-yield AP Statistics reference for last-minute review. It pulls together vocabulary, conditions, and formulas across Units 1–9 in one scan-friendly layout. On exam day you also have the official formula sheet and tables—use this page to remember when to apply each idea and how to phrase conclusions in context.

VERY Important Terms

Parameter (population): \(\mu\), \(p\), \(\sigma\), \(\beta\): fixed, usually unknown. Parameters are always GREEK letters!
Statistic (sample): \(\bar{x}\), \(\hat{p}\), \(s\), \(b\): computed from data. Parameters are always ROMAN (basically English) letters!

Unit 1: Exploring Univariate Data

Graphs and shape

Histogram / stemplot: shape (skew left/right, symmetric), gaps, unusual values.
Skew: tail direction = skew direction (long tail right → skewed right).
Boxplot: median, IQR, outliers (often \(Q_1 - 1.5\,\text{IQR}\) and \(Q_3 + 1.5\,\text{IQR}\) rule).

Center and spread

Center: Mean \(\bar{x}\): pulled by outliers (average of all data points); median resistant (the “middle” data point).
Spread: Standard deviation \(s\): typical distance from mean (same units as data); IQR \(= Q_3 - Q_1\): spread of middle half; resistant to outliers
When comparing two distributions, alwasy use CUSS: Center, Unusual features, Shape, and Spread in context.

Transforming Distributions

Mean: Suppose you have \(y=f(x)\). Then \(\mu_y = \mu_f(x)\). Do any operation to combine two variables, the mean will be the same operation done on the two respective means.
Standard Deviation: Suppose you have \(y=f(x) + a\), where \(f(x) does not contain any addition of constants\). Then, \(\sigma_y=\sigma_f(x)\). Also, if you add/subtract two variables, the new standard deviation becomes: \(\sigma_a,b = \sqrt{\sigma_a^2 + \sigma_b^2}\)

Unit 2: Exploring Bivariate Data

Scatterplots

Describe form (linear/curved), direction (positive/negative), strength (weak/moderate/strong), and unusual points (outlier). Always describe in context!

Correlation \(r\)

Measures linear association for quantitative variables; \(-1 \le r \le 1\).
Not resistant to outliers; NEVER cite causation from correlation alone.

Least-squares regression line

\[\hat{y} = a + bx\]

Slope \(b\): predicted change in \(\hat{y}\) when \(x\) increases by 1 (in \(y\) units per \(x\) unit).
Intercept \(a\): predicted \(\hat{y}\) when \(x=0\) (only meaningful if \(x=0\) is plausible).
Residual \(e = y - \hat{y}\); residual plot checks linearity and equal spread. If residuals are reasonable and distributed randomly, then a linear trend is a good fit.

Influential vs outlier

Outlier in \(y\): large residual.
Influential point: pulling the regression line; often extreme in \(x\).

Unit 3: Collecting Data

Scope of inference

Random sample → generalize to population that was sampled from
Random assignment → supports cause (for differences in treatment)

Study types

Observational: treatments not imposed; confounding possible, and thus causation CANNOT be established.
Experiment: researcher assigns treatments; can show causation if well designed.
Tip: Always say “This study is an [experiment/observation] because the researchers [did/did not] impose treatments.

Sampling designs

Simple Random Sample: Take a population, assign each a number, and use a random number generator to select your desired amount of samples.
Stratified Random Sample: Take a population, split into groups based on a characteristic, and employ a simple random sampling method to randomly choose the desired amount of samples from EACH group.
Clustered Random Sample: Take a population, split into groups based on a characteristic, and employ a simple random sampling method to randomly choose the desired amount of groups and sample ALL members of the selected groups.
Bias sources: wording, nonresponse, voluntary response, convenience samples. Usually, random sampling reduces bias one way or another

Experiments

Blocking: separate similar subjects first, then randomize within blocks.
Control: The control group is the group that receives no treatment or a placebo (basically something that looks like the treatment but does nothing)
Replication: Replication occurs if you can rerun the experiment again. It does NOT include cases where you run a treatment many times in one experiment.
Random assignment: To get the best results, always try to use some sort of random assignment to assign patients to treatments
Blinding: Blinding is when a party (either the patient or the deliverer of the treatment) does not know which treatment a patient is assigned to. Single blinding is when the patient does not know which treatment is given (usually done through placebos), and double blinding is when both parties don’t know, usually done where the treatments are assigned and then given to the unknowing deliverer.

Unit 4: Probability and Random Variables

Probability rules

\[0 \le P(A) \le 1\]
\[P(A^c) = 1 - P(A)\]
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
Independent: \(P(A \mid B) = P(A)\) (if \(P(B)>0\)); useful form \(P(A \cap B) = P(A)P(B)\).
Conditional: \(P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}\)

Discrete random variables

Mean (expected value): \(\mu_X = \sum x\, P(X=x)\)
Variance: \(\sigma_X^2 = \sum (x-\mu_X)^2 P(X=x)\) ; SD \(\sigma_X\).

Combining random variables

\[\mu_{X \pm Y} = \mu_X \pm \mu_Y\]
If independent: \(\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y\) (note plus for both sum and difference).

Binomial

Fixed \(n\) independent trials, same \(p\), success/failure outcomes.

\[P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\]

\(\mu_X = np\), \(\sigma_X = \sqrt{np(1-p)}\)

Calculator Trick: Use binompdf(n, p, x) for \(P(X=x)\) and binomcdf(n, p, x) for \(P(X≤x)\)

Geometric \(G(n,p)\)

Number of trials \(k\) to get one success.

\[P(X=k) = p^k (1-p)^{n-k}\]

\(\mu_X = \frac{1}{p}\), \(\sigma_X = \frac{\sqrt{1-p}}{p}\)

Calculator Trick: Use geometpdf(n, p, x) for \(P(X=x)\) and geometcdf(n, p, x) for \(P(X≤x)\)

Normal distribution

Standardize: \(z = \dfrac{x-\mu}{\sigma}\)
Use normalcdf / invNorm with proper \(\mu\), \(\sigma\) and boundaries.

Unit 5: Sampling Distributions

Core ideas

A statistic varies from sample to sample; its distribution is the sampling distribution.
Central Limit Theorem: for large \(n\), sampling distribution of \(\bar{x}\) is approximately Normal even if the population isn’t (Independence / Random + large \(n\)).

Means

\[\mu_{\bar{x}} = \mu\]
\(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\) (if \(\sigma\) known); use \(\dfrac{s}{\sqrt{n}}\) as standard error when estimating.

Proportions

\[\mu_{\hat{p}} = p\]
\[\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}\]

Conditions

\(np \ge 10\) and \(n(1-p) \ge 10\) for Normal approximation to \(\hat{p}\).
\(n \ge 30\) often cited for \(\bar{x}\), but prefer context + shape of population when \(n\) is small.

Inference: Shared Structure

Confidence interval

\[\text{statistic} \pm (\text{critical value}) \times (\text{standard error})\]

Interpret: “We are C% confident that the interval (interval) captures the true population parameter [in context].”
Not: “C% chance the parameter is in this interval” for one computed interval.

Hypothesis test

Parameter in words + symbols (define \(p\); \(\mu\); \(p_1-p_2\); \(\beta\)).
Hypotheses: \(H_a\) matches the question; equality always in \(H_0\).
Conditions / PLANTS-style checks (see each procedure).
Test statistic + \(P\)-value (probability of a statistic at least as extreme as observed if \(H_0\) true).
Conclusion linked to \(P\)-value + context.

Errors

Type I: reject true \(H_0\) (false alarm); controlled by \(\alpha\).
Type II: fail to reject false \(H_0\) (miss); related to power (which is \(1-\beta\)).

Unit 6: Inference for Proportions

One-sample \(z\) interval and test for \(p\)

Standard error (estimate): \(\text{SE}_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)
\(z\) statistic: \(z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}\) under \(H_0\).

Two-sample difference \(\hat{p}_1 - \hat{p}_2\)

CI: \((\hat{p}_1-\hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
Hypothesis test \(p_1-p_2\): pooled proportion \(\hat{p}_c = \dfrac{x_1+x_2}{n_1+n_2}\) for standard error under \(H_0: p_1=p_2\):

\[z = \dfrac{(\hat{p}_1-\hat{p}_2)-0}{\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}\]

Conditions (short checklist)

Random assignment or random sampling (as appropriate).
10%: \(n \le 0.10\,N\) when sampling without replacement (each group if two-sample).
Large counts: successes & failures \(\ge 10\) each (for each group or for one-sample use \(np_0\) and \(n(1-p_0)\) when checking Normal approximation).

Unit 7: Inference for Means

One-sample \(t\) for \(\mu\)

\[t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \quad \text{df} = n-1\]

Paired data

Difference \(d_i = \text{before}_i - \text{after}_i\) (be consistent); analyze one sample of differences.

Two-sample \(t\) for \(\mu_1 - \mu_2\)

Typically unpooled: \(\text{SE} = \sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}\) ; df use calculator/wizard or conservative approximate methods per course.

Conditions

Random + 10% if sampling without replacement.
Normality / \(n\): if \(n\) small, need roughly symmetric data without strong skew/outliers; CLT helps when \(n\) is large.

Unit 8: Chi-Square Tests

Expected counts (tables)

\[\text{expected} = \frac{(\text{row total})(\text{column total})}{\text{table total}}\]

Test statistic

\[\chi^2 = \sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}}\]

Types

Goodness-of-fit: one categorical variable; \(\text{df} = (\#\text{cells}) - 1\) (minus estimated params if any).
Homogeneity: same distribution across populations / treatment groups.
Independence / association: one sample classified two ways.

Conditions

Random counts / random assignment as appropriate.
Large expected counts: commonly all expected \(\ge 5\) (some contexts use a stricter rule).

Unit 9: Inference for Slopes (Linear Regression)

Model (thinking)

\[y = \beta_0 + \beta_1 x + \varepsilon\]

Slope test: \(H_0: \beta_1 = 0\) vs \(H_a: \beta_1 \ne 0\) (or one-sided).

\(t\) statistic for slope

\[t = \frac{b - \beta_1}{\text{SE}_b}, \quad \text{df} = n-2\]

Conditions (LINE / PLUS-style memory hook)

Linear relationship (check scatterplot + residual plot).
Independent observations (random / 10%).
Normal residuals roughly (especially for small \(n\)).
Equal spread of residuals across \(x\).

Fast “Which Procedure?” Map

Question type	Usually
One proportion	One-sample \(z\)
Compare two proportions	Two-sample \(z\)
One mean	One-sample \(t\)
Matched pairs	Paired \(t\) on differences
Two independent means	Two-sample \(t\)
Distribution fits a model	\(\chi^2\) goodness-of-fit
Same distribution across groups	\(\chi^2\) homogeneity
Association of two categorical vars	\(\chi^2\) independence
Linear relationship between \(x\) and quantitative \(y\)	\(t\)-test for slope \(\beta_1\)

Most Common AP Statistics Mistakes

Parameter vs statistic: mixing \(\mu\) and \(\bar{x}\), or \(p\) and \(\hat{p}\), in conclusions.
Correlation ≠ causation; confounding in observational studies.
Wrong SE: pooled vs unpooled proportions; forgetting \(\sqrt{n}\) in denominator.
Hypotheses about \(\hat{p}\) or \(\bar{x}\) instead of \(p\) or \(\mu\).
“Accept \(H_0\)” language — say fail to reject \(H_0\).
\(P\)-value definition: mis-stating as \(P(H_0 \text{ true})\).
Conditions skipped or checked as “met” without linking to the study’s randomness and sample size.
Chi-square: wrong \(\text{df}\) or wrong “expected count” formula.

Exam-Day FRQ Mini-Checklist PLEASE CHANGE

Name the procedure (e.g., two-sample \(t\) interval for \(\mu_1-\mu_2\)).
Define parameters in plain language with context.
Show conditions with numbers where needed (not just “\(n\) large”).
Give formula with numbers plugged when asked to calculate.
Conclusion: evidence sentence referencing \(P\)-value or interval in context, and whether it generalizes / suggests causality based on study design.