Hypothesis Testing: Making Data-Driven Decisions

Definition

Hypothesis testing is a statistical method for making decisions using experimental data. It provides a formal framework to determine whether observed patterns in data reflect true effects or could reasonably occur by random chance. The process begins with two competing hypotheses: the null hypothesis (H0), which typically states there is no effect or no difference, and the alternative hypothesis (H1), which proposes a specific effect or difference exists.

Intuition

Imagine you are a judge at a trial. The null hypothesis is 'the defendant is innocent' (the status quo), and you need strong evidence to overturn this presumption. The p-value measures how surprising the prosecution's evidence would be if the defendant were truly innocent: it is the probability of seeing evidence at least this strong under innocence, so a smaller p-value means stronger evidence against the presumption.
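To make "how likely would this evidence be if H0 were true" concrete, here is a small sketch using a coin-flip example that is not from the original text: H0 says a coin is fair, and we observe 60 heads in 100 flips (hypothetical data). The one-sided p-value is the probability, under H0, of seeing 60 or more heads, computed exactly from the binomial distribution. The function name `binomial_upper_tail` is illustrative.

```python
import math

def binomial_upper_tail(k_obs: int, n: int, p: float = 0.5) -> float:
    """P(X >= k_obs) for X ~ Binomial(n, p): a one-sided p-value for 'too many successes'."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(k_obs, n + 1)
    )

# Hypothetical data: 60 heads out of 100 flips; H0: the coin is fair (p = 0.5)
p_value = binomial_upper_tail(60, 100)
print(f"one-sided p-value: {p_value:.4f}")

# Compare against a significance level chosen in advance
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here the p-value comes out at roughly 0.03: a fair coin would produce 60 or more heads in about 3% of 100-flip experiments.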

Mathematical Formula

One-Sample t-test:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
Two-Sample t-test:
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
Chi-Square Test:
\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

Step-by-Step Explanation:

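  1. One-sample t: Measures how many standard errors (s/√n) the sample mean x̄ lies from the hypothesized mean μ0. Under H0, and for roughly normal data, it follows a t distribution with n − 1 degrees of freedom.
  2. Two-sample t: Compares the means of two independent groups using the pooled standard deviation s_p, which assumes both groups share the same variance; it has n1 + n2 − 2 degrees of freedom.
  3. Chi-square: Sums, over all categories or cells, the squared difference between the observed count O_i and the expected count E_i, each scaled by E_i. Large values mean the observed counts depart from what H0 predicts.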
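The two-sample formula uses the pooled standard deviation s_p without defining it. Its standard definition, with s_1^2 and s_2^2 the two sample variances, is

\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]

A minimal pure-Python sketch of the pooled two-sample t-statistic follows; the function name `pooled_t_test` and the toy data are illustrative, not from the original.

```python
import math
import statistics

def pooled_t_test(x1, x2):
    """Student's two-sample t-statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 denominator (sample variance)
    s1_sq = statistics.variance(x1)
    s2_sq = statistics.variance(x2)
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t = (statistics.mean(x1) - statistics.mean(x2)) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Toy data chosen so the arithmetic is easy to follow by hand:
# both variances are 2.5, so s_p^2 = 2.5 and the standard error is sqrt(2.5 * 0.4) = 1
t, df = pooled_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.4f}, df = {df}")  # t = -2.0000, df = 8
```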

Real-World Use Cases

Clinical Trials

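Pharmaceutical companies use t-tests to compare drug efficacy against placebo. A p-value below the significance level fixed in advance (commonly 0.05) indicates a statistically significant difference from placebo, though not by itself a clinically meaningful one.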
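A sketch of that decision rule, using simulated (hypothetical) outcome scores rather than real trial data. It uses Welch's t-test (`scipy.stats.ttest_ind` with `equal_var=False`), a variant of the two-sample test that does not assume the drug and placebo arms have equal variances; the choice of Welch here is an assumption, not something the original specifies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the simulation is reproducible

# Simulated (hypothetical) outcome scores; the drug arm has a higher true mean
placebo = rng.normal(loc=50, scale=10, size=40)
drug = rng.normal(loc=58, scale=12, size=40)

alpha = 0.05  # significance level fixed before looking at the data
# Welch's t-test: does not assume equal variances in the two arms
t_stat, p_value = stats.ttest_ind(drug, placebo, equal_var=False)

print(f"t={t_stat:.3f}, p={p_value:.4f}")
if p_value < alpha:
    print("Reject H0: mean outcome differs between drug and placebo")
else:
    print("Fail to reject H0: no significant difference detected")
```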

A/B Testing

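Tech companies use hypothesis testing to evaluate website changes, comparing conversion rates between control and treatment groups. Because each visitor either converts or does not, a two-proportion z-test (or a chi-square test) is common; a two-sample t-test on the 0/1 outcomes gives nearly the same answer at typical A/B sample sizes.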
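A minimal sketch of the two-proportion z-test for conversion rates, using only the standard library. The counts are hypothetical and the function name is illustrative; the test relies on a large-sample normal approximation, with the standard error computed from the pooled conversion rate under H0.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled proportion under H0)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 both groups share one conversion rate, estimated from the combined data
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical A/B results: control 200/5000 conversions, treatment 250/5000
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these counts (4.0% vs 5.0%), z is about 2.41 and the two-sided p-value about 0.016.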

Quality Control

Engineers use one-sample t-tests to check whether a production process's mean (for example, a fill weight or part dimension) departs from its target specification.
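A sketch of such a check with hypothetical fill weights and a 500 g target. It uses a one-sided test because the concern here is under-filling; the `alternative` argument of `scipy.stats.ttest_1samp` requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

# Hypothetical fill weights (grams) from a production line; the label claims 500 g
weights = np.array([498.2, 501.1, 497.6, 499.0, 500.4,
                    496.8, 498.9, 499.5, 497.3, 498.7])

# One-sided test: H0: mean >= 500 vs H1: mean < 500 (under-filling)
t_stat, p_value = stats.ttest_1samp(weights, popmean=500, alternative="less")
print(f"mean={weights.mean():.2f} g, t={t_stat:.3f}, one-sided p={p_value:.4f}")
```

The sample mean is 498.75 g and t is about −2.93 with 9 degrees of freedom, giving a one-sided p-value below 0.01.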

Implementation

Manual Implementation (No Libraries)

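This version uses only Python's standard math module. The t-statistic measures how many standard errors the observed mean is from the hypothesized mean.
import math

def one_sample_t_test(data, mu_0):
    """Return the one-sample t-statistic and its degrees of freedom (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation with Bessel's correction (divide by n - 1)
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Distance of the sample mean from mu_0, in units of the standard error
    t_stat = (mean - mu_0) / (std / math.sqrt(n))
    return t_stat, n - 1

data = [120, 122, 118, 125, 119, 121, 124, 117, 123, 120]
t_stat, df = one_sample_t_test(data, 120)
print(f't-statistic: {t_stat:.4f}, df: {df}')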
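The manual function stops at the t-statistic. For the data above, x̄ = 120.9 and s ≈ 2.601, so t ≈ 0.9 / (2.601 / √10) ≈ 1.094 with 9 degrees of freedom. Turning that into a two-sided p-value needs the t distribution; the sketch below assumes SciPy is available, uses `stats.t.sf` (the survival function, 1 − CDF), and cross-checks the result against `stats.ttest_1samp`.

```python
import math
from scipy import stats

data = [120, 122, 118, 125, 119, 121, 124, 117, 123, 120]
mu_0 = 120
n = len(data)

# Same t-statistic as the manual function, computed inline so this block runs on its own
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
t_stat = (mean - mu_0) / (s / math.sqrt(n))
df = n - 1

# Two-sided p-value: probability of |T| >= |t_stat| under H0
p_manual = 2 * stats.t.sf(abs(t_stat), df)

# Cross-check with SciPy's built-in test
t_lib, p_lib = stats.ttest_1samp(data, mu_0)
print(f"manual: t={t_stat:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_lib:.4f}, p={p_lib:.4f}")
```

A p-value this large (about 0.3) means the data give no real evidence that the mean differs from 120.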

Using Libraries (numpy, scipy)

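import numpy as np
from scipy import stats

# Seeded generator so the simulated samples are reproducible
rng = np.random.default_rng(42)

# One-sample t-test: does the population mean differ from 120?
data = rng.normal(122, 8, 30)
t_stat, p_value = stats.ttest_1samp(data, 120)
print(f'One-sample t: t={t_stat:.4f}, p={p_value:.4f}')

# Two-sample t-test with pooled variance (equal_var=True is the default);
# pass equal_var=False for Welch's t-test when the variances may differ
group_a = rng.normal(75, 10, 25)
group_b = rng.normal(82, 10, 25)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
print(f'Two-sample t: t={t_stat:.4f}, p={p_value:.4f}')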
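The library examples cover only the t-tests. For a chi-square test of independence, `scipy.stats.chi2_contingency` takes a table of observed counts; the sketch below uses a made-up 2×2 table and also computes the statistic by hand from the chi-square formula. `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so `correction=False` is passed to match the plain formula.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: rows = group (A, B), columns = outcome (yes, no)
observed = np.array([[30, 70],
                     [45, 55]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2={chi2:.4f}, p={p_value:.4f}, dof={dof}")

# Manual check: E_ij = (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
print(f"manual chi2={chi2_manual:.4f}")
```

Here every expected count is 37.5 or 62.5, each cell is off by 7.5, and the statistic works out to 4.8 with 1 degree of freedom.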

When to Use

✅ Appropriate Use Cases:

  • One-sample t-test: Comparing a sample mean to a known/hypothesized value
  • Two-sample t-test: Comparing means of two independent groups
  • Chi-square: Testing independence between categorical variables

❌ Avoid When:

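  • Do not use t-tests on ordinal data (e.g., 1 to 5 ratings); prefer a rank-based test such as Mann-Whitney U
  • Do not ignore assumption violations: t-tests assume independent observations and roughly normal data (or large samples), and the pooled two-sample test also assumes equal variances
  • Do not run multiple pairwise t-tests to compare three or more groups; use ANOVA instead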
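A sketch of the alternatives named in that list, all from `scipy.stats`, with hypothetical data: `mannwhitneyu` for ordinal ratings, `f_oneway` (one-way ANOVA) for three groups, and `shapiro` / `levene` as common checks of the normality and equal-variance assumptions. These specific assumption checks are one reasonable choice, not the only one.

```python
from scipy import stats

# Hypothetical satisfaction ratings on a 1-5 ordinal scale for two groups
ratings_a = [3, 4, 2, 5, 4, 3, 4, 2, 3, 4]
ratings_b = [4, 5, 4, 5, 3, 5, 4, 4, 5, 4]
# Rank-based alternative to the two-sample t-test for ordinal data
u_stat, p_mw = stats.mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")
print(f"Mann-Whitney U={u_stat:.1f}, p={p_mw:.4f}")

# Three groups: one omnibus ANOVA instead of three pairwise t-tests
g1 = [23.1, 24.5, 22.8, 25.0, 23.9]
g2 = [26.2, 25.8, 27.1, 26.5, 25.9]
g3 = [23.5, 24.1, 24.8, 23.2, 24.6]
f_stat, p_anova = stats.f_oneway(g1, g2, g3)
print(f"ANOVA F={f_stat:.3f}, p={p_anova:.4f}")

# Assumption checks before a pooled t-test or ANOVA
_, p_normal = stats.shapiro(g1)             # H0: g1 comes from a normal distribution
_, p_equal_var = stats.levene(g1, g2, g3)   # H0: the groups have equal variances
print(f"Shapiro-Wilk p={p_normal:.3f}, Levene p={p_equal_var:.3f}")
```

A significant ANOVA only says that at least one group mean differs; identifying which groups differ calls for a follow-up comparison with a multiple-comparison adjustment.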

Common Pitfalls

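  • Misunderstanding p-values: p = 0.03 does not mean a 97% probability that the effect is real; it means that if H0 were true, data at least this extreme would occur about 3% of the time
  • Ignoring effect size: with large samples, even a negligible difference can be statistically significant, so report an effect size (such as Cohen's d) alongside the p-value
  • Multiple comparisons problem: running many tests at α = 0.05 raises the chance of at least one false positive, so adjust the significance level (e.g., with a Bonferroni correction)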
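A pure-Python sketch of two of these remedies: Cohen's d as an effect size (mean difference divided by the pooled standard deviation) and the Bonferroni correction (test each of m hypotheses at α / m). The function names, toy data, and p-values are hypothetical. The last lines show why correction matters: with 20 independent tests at α = 0.05, the chance of at least one false positive is 1 − 0.95^20.

```python
import math
import statistics

def cohens_d(x1, x2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(pooled_var)

def bonferroni(p_values, alpha=0.05):
    """Reject H0_i only if p_i <= alpha / m; keeps the family-wise error rate at or below alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

d = cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"Cohen's d = {d:.3f}")

# Hypothetical p-values from five separate tests; the per-test threshold becomes 0.05 / 5 = 0.01
p_vals = [0.003, 0.020, 0.041, 0.260, 0.011]
print(bonferroni(p_vals))  # [True, False, False, False, False]

# Probability of at least one false positive across 20 independent tests at alpha = 0.05
fwer_20 = 1 - (1 - 0.05) ** 20
print(f"family-wise error rate for 20 tests: {fwer_20:.2f}")
```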