Unit 6: Inference for Categorical Data: Proportions

This unit uses sample proportions to estimate and test claims about population proportions. The main statistic is \(\hat{p}\) for one population, or \(\hat{p}_1-\hat{p}_2\) for two populations.

Estimation And Hypothesis Testing

Statistical inference uses sample data to make conclusions about a population parameter. For proportions, the parameter is usually:

\(p\): one population proportion.
\(p_1-p_2\): difference between two population proportions.

A confidence interval estimates a plausible range of values for a parameter. A hypothesis test evaluates whether sample data provide convincing evidence against a null hypothesis.

Confidence Intervals

A confidence interval has the form

\[\text{statistic} \pm \text{critical value}\cdot \text{standard error}.\]

The confidence level describes the long-run capture rate of the method. A 95% confidence interval does not mean there is a 95% probability that the fixed parameter is in this particular interval. It means that if we repeatedly sampled and built intervals the same way, about 95% of those intervals would contain the true parameter.

Confidence interval repeated sampling placeholder

One-Proportion z-Interval

Use a one-proportion z-interval to estimate one population proportion \(p\):

\[\hat{p} \pm z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.\]

Conditions:

Random sample or random assignment.
Independence: if sampling without replacement, \(n \le 0.10N\).
Large counts: \(n\hat{p} \ge 10\) and \(n(1-\hat{p}) \ge 10\).

Common critical values:

Confidence level	\(z^*\)
90%	1.645
95%	1.960
99%	2.576

Margin Of Error

The margin of error for a one-proportion interval is

\[ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.\]

For planning sample size, use

\[n = \frac{(z^*)^2p^*(1-p^*)}{ME^2},\]

where \(p^*\) is a planning estimate. If no estimate is given, use \(p^*=0.5\) because it gives the most conservative, largest required sample size.

Always round required sample size up.

Hypothesis Tests

A hypothesis test begins with:

Null hypothesis \(H_0\): the default claim, usually “no difference” or “equals a stated value.”
Alternative hypothesis \(H_a\): the claim we seek evidence for.

For one proportion:

\[H_0: p=p_0.\]

The alternative may be

\[H_a:p>p_0,\qquad H_a:p<p_0,\qquad \text{or}\qquad H_a:p\ne p_0.\]

The p-value is the probability, assuming \(H_0\) is true, of getting a test statistic as extreme as or more extreme than the observed result in the direction of \(H_a\).

Decision rule:

If p-value \(< \alpha\), reject \(H_0\).
If p-value \(\ge \alpha\), fail to reject \(H_0\).

Never say “accept \(H_0\)”; the data may simply not be strong enough to reject it.

One-Proportion z-Test

Use a one-proportion z-test for a claim about one population proportion:

\[z = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}.\]

Use \(p_0\) in the standard error because the test assumes the null hypothesis is true.

Conditions:

Random sample or random assignment.
Independence: if sampling without replacement, \(n \le 0.10N\).
Large counts using the null value: \(np_0 \ge 10\) and \(n(1-p_0) \ge 10\).

p-value tail area placeholder

Two-Proportion z-Interval

Use a two-proportion z-interval to estimate \(p_1-p_2\):

\[(\hat{p}_1-\hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.\]

Conditions:

Two random samples or random assignment to two groups.
Independence within and between groups.
If sampling without replacement, \(n_1 \le 0.10N_1\) and \(n_2 \le 0.10N_2\).
Large counts in both groups: successes and failures are each at least 10.

Interpret the interval in context: “We are __% confident that the true difference in population proportions \(p_1-p_2\) is between __ and ___.”

Two-Proportion z-Test

For a test of

\[H_0:p_1-p_2=0,\]

we pool the sample proportions because the null says the two population proportions are equal:

\[\hat{p}_c = \frac{x_1+x_2}{n_1+n_2}.\]

The test statistic is

\[z = \frac{(\hat{p}_1-\hat{p}_2)-0} {\sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}.\]

Use the pooled proportion only for the hypothesis test, not for the confidence interval.

Errors And Power

A Type I error occurs when we reject a true null hypothesis. Its probability is \(\alpha\), the significance level.

A Type II error occurs when we fail to reject a false null hypothesis. Its probability is \(\beta\).

Power is the probability of correctly rejecting a false null hypothesis:

\[\text{Power} = 1-\beta.\]

Power increases when:

The true parameter is farther from the null value.
Sample size increases.
Significance level \(\alpha\) increases.
Variability decreases.

Calculator Notes

Common calculator tools:

1-PropZInt: one-proportion confidence interval.
1-PropZTest: one-proportion hypothesis test.
2-PropZInt: two-proportion confidence interval.
2-PropZTest: two-proportion hypothesis test.

Calculator output does not replace communication. You still need hypotheses, conditions, p-value or interval, and a conclusion in context.

Working Checklist

Identify the parameter: \(p\) or \(p_1-p_2\).
Choose interval or test.
Check random, independence, and large-count conditions.
Use the correct standard error: null value for tests, sample value for intervals.
Compute the interval or p-value.
Write a conclusion in context.

Key Equations

Idea	Equation
One-proportion interval	\(\hat{p}\pm z^*\sqrt{\hat{p}(1-\hat{p})/n}\)
One-proportion test	\(z=(\hat{p}-p_0)/\sqrt{p_0(1-p_0)/n}\)
Two-proportion interval	\((\hat{p}_1-\hat{p}_2)\pm z^*\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\)
Pooled proportion	\(\hat{p}_c=(x_1+x_2)/(n_1+n_2)\)
Two-proportion test	\(z=\frac{(\hat{p}_1-\hat{p}_2)}{\sqrt{\hat{p}_c(1-\hat{p}_c)(1/n_1+1/n_2)}}\)