Frequently, and especially recently, misunderstandings of common statistical terms and concepts have caused confusion and even anger. I would like to (attempt to) clear up a big player in the world of commonly used (and commonly misunderstood) statistical concepts: the p-value.


Stealing Lucy D’Agostino McGowan’s XKCD embedding strategy.

TL;DR

A p-value is not the probability that the true parameter is some value; it is the proportion of the time that the data you saw, or more extreme data, would occur under some “null” model. These are subtly, but importantly, different concepts.
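To make that definition concrete, here is a minimal simulation sketch (the null model and the numbers below are placeholders chosen purely for illustration): generate many datasets from a null model and record how often a result at least as extreme as the observed one appears.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

# Placeholder null model: a fair coin, P(tails) = 0.5.
# Suppose we observed 4 tails in 6 flips, and "more extreme" means more tails.
n_flips, observed_tails = 6, 4

# Simulate many datasets from the null model.
n_sims = 100_000
simulated_tails = rng.binomial(n=n_flips, p=0.5, size=n_sims)

# The p-value is the proportion of null-model datasets at least as extreme
# as the observed one -- NOT the probability that the coin is fair.
p_value = np.mean(simulated_tails >= observed_tails)
print(f"approximate p-value: {p_value:.3f}")
```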

Setup

We will illustrate this concept with a story.

Say you are a cheating-detection analyst at a casino. One day, one of the casino’s employees comes up to you and tells you that unfair coins may be in use in the casino (they seem to land on tails more frequently than they should). It’s your job to figure out whether they are fair. The employee hands you a piece of paper with something written on it and then runs away to attend to more important things than statistics. The paper says the following:

Heads = \(h\), Tails = \(t\) | \(t,t,h,t,t,h\)

After staring at this paper for a few minutes, you decide that what you have is data on which face of a coin landed upright on each flip, for a total of 6 flips. A fair coin, in your opinion, is one that has the same chance of landing on heads as on tails: 50-50. This is your null hypothesis: \(P(\text{tails}) = 0.5\). The employee thought the coins were biased towards tails, and that is what you want to test; this is your alternative hypothesis: \(P(\text{tails}) > 0.5\). Your job as a statistician is to take this incredibly complex data and distill it to a single decision: the coin is fair (null), or the coin is biased towards tails (alternative).
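As a preview of where this is headed, here is a sketch of how that one-sided test could be run as an exact binomial test; SciPy’s `scipy.stats.binomtest` is one standard tool for this setup (the choice of library here is my own, not necessarily what the rest of this post uses).

```python
from scipy.stats import binomtest

# The employee's note: t,t,h,t,t,h -> 4 tails out of 6 flips.
n_flips = 6
n_tails = 4

# Null hypothesis:        P(tails) = 0.5
# Alternative hypothesis: P(tails) > 0.5 (biased towards tails)
result = binomtest(k=n_tails, n=n_flips, p=0.5, alternative="greater")

# Exact p-value: P(4 or more tails in 6 fair flips) = 22/64 = 0.34375
print(f"p-value: {result.pvalue:.5f}")
```

Note that this number is the probability of seeing data this extreme under the fair-coin model, not the probability that the coin is fair.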