Earlier today on Hacker News, someone posted a link to Tom Moertel's blog post "On the evidence of a single coin toss", in which he poses a question about probabilities: if he claims to have a perfectly-biased always-heads coin, and you toss it once and it comes up heads, should that sway your beliefs about the claim?
This was an interesting enough question that I spent far too much time working out an answer, so to sort of justify the time I wrote up the answer and posted it on HN. I figured I should clean it up a little and post it here. The tl;dr is that "it depends": first on your formalism (and whether you buy into Bayesian analysis), and second on how much you trust Tom in the first place.
There are at least three different lines of inquiry here: a classical hypothesis-test approach, a simpler Bayesian update that considers only the claim and the single toss, and a fuller Bayesian analysis that also accounts for the fact that Tom chose to state the claim at all. It's the last of these where things get interesting.
There's a hidden probability in the simple case, because p(C) encompasses both my belief in coins generally and also my belief about Tom's truthtelling. So really I have p(C) "p that the claim is true" but also p(S) "p that Tom stated the claim to me". Thus also p(S|C) "p that if the claim were true, Tom would state this claim to me" and p(C|S) "p of the claim being true given that Tom stated it to me"; but also the highly relevant p(S|¬C) "p that if the claim were NOT true, Tom would state this claim to me ANYWAY" and a few other variants. When you start doing Bayesian analysis with more than two variables you nearly always need to account for both p(A|B) and p(A|¬B) for at least some of the cases, even where you could sometimes fudge this in the simpler problems.
So this brings us to a formulation of the original question: what is the relationship between p(C|S,H) and p(C|S)? The former can be rewritten as

p(H|C,S)p(C,S) / (p(C,S,H) + p(¬C,S,H))

and then as

p(H|C,S)p(C,S) / (p(H|C,S)p(C,S) + p(H|¬C,S)p(¬C,S))

and if I take p(H|C,S) as 1 (given) and p(H|¬C,S) as 1/2 (approximate), I'm left with

p(C,S) / (p(C,S) + 0.5·p(¬C,S))

For the prior quantity p(C|S), a similar set of rewrites gives me

p(C,S) / (p(C,S) + p(¬C,S))

Now I'm in the home stretch, but I'm not done.
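(Before going on, here's a quick Python sketch of those two expressions; it isn't part of Tom's post, and the numbers plugged in at the bottom are entirely made up, just to show the shape of the comparison.)

    # Compare belief in the claim before and after seeing one head, using the
    # two expressions above. p_CS = p(C,S) and p_notCS = p(notC,S) are whatever
    # joint probabilities you assign; p(H|C,S) is taken as 1 and p(H|notC,S)
    # as 1/2, as in the text.

    def prior_belief(p_CS, p_notCS):
        """p(C|S): belief in the claim given only that Tom stated it."""
        return p_CS / (p_CS + p_notCS)

    def posterior_belief(p_CS, p_notCS):
        """p(C|S,H): belief after Tom states the claim and one toss comes up heads."""
        return p_CS / (p_CS + 0.5 * p_notCS)

    # Made-up numbers: a one-in-a-million-ish claim that Tom might state anyway.
    print(prior_belief(1e-6, 1e-3))      # ~0.000999
    print(posterior_belief(1e-6, 1e-3))  # ~0.001996: larger, but still tiny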
Here we have to break down p(C,S) and p(¬C,S). For p(C,S) we can use p(C)p(S|C), which is "very small" times "near 1", assuming Tom would be really likely to state that claim if it were true (wouldn't you want to show off your magic coin?). The other one's more interesting. We rewrite p(¬C,S) as p(¬C)p(S|¬C), which is "near 1" times "is Tom just messing with me?".
This is where a crucial part of the analysis comes in, one that is missing in the hypothesis-test version and in the simpler Bayesian model, but "obvious" to anyone who approaches it from a more intuitive standpoint: it matters a lot whether you think Tom might be lying in the first place, and whether he's the sort who would state a claim like this just to get a reaction or whatever.

In the case where you basically trust Tom ("he wouldn't say that unless he at least thought it to be true"), the terms of p(C,S) + p(¬C,S) might be of comparable magnitude, and multiplying the second of them by 1/2 will have a noticeable effect. (Specifically, if p(C,S) and p(¬C,S) turned out to be exactly equal, then flipping once would make us about 4/3 as likely to believe the claim as before.) But if you think Tom is likely to state a claim like this even if false, just for effect (or any other reason), then p(C,S) + p(¬C,S) is hugely dominated by the second term, which would be many orders of magnitude larger than the first; multiplying that term by 1/2 still leaves it orders of magnitude larger, so the overall probability, even with the extra evidence, remains negligible, with only a very slight increase in belief in Tom's claim.
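(Here's the same sketch with p(C,S) and p(¬C,S) factored as above into p(C)p(S|C) and p(¬C)p(S|¬C); all the specific numbers are invented, and only the structure comes from the derivation.)

    # The two trust scenarios, with invented numbers. p_C is the prior belief in
    # the claim, p_S_given_C is how likely Tom is to state it if true, and
    # p_S_given_notC is how likely he is to state it anyway if false.

    def beliefs(p_C, p_S_given_C, p_S_given_notC):
        p_CS = p_C * p_S_given_C
        p_notCS = (1 - p_C) * p_S_given_notC
        prior = p_CS / (p_CS + p_notCS)             # p(C|S)
        posterior = p_CS / (p_CS + 0.5 * p_notCS)   # p(C|S,H)
        return prior, posterior

    # Scenario 1: you basically trust Tom, so p(S|notC) is about as tiny as p(C)
    # itself, and the two joint terms come out comparable in size. The observed
    # head boosts belief by a factor of roughly 4/3.
    print(beliefs(1e-6, 0.9, 1e-6))   # roughly (0.47, 0.64)

    # Scenario 2: Tom might well say this just to mess with you, so p(notC,S)
    # dominates by orders of magnitude. Belief roughly doubles but stays negligible.
    print(beliefs(1e-6, 0.9, 0.05))   # roughly (1.8e-05, 3.6e-05)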
[0] This clearly breaks if p(C) is higher than 1/2, because twice that is more than 1. If we take the prior over coins to be a distribution centred on the fair ones, with long tails going out to near-certainty at both ends, then the claim "this coin is an always-heads coin"[1] removes a chunk of that distribution at the H end, meaning that p(H|¬C) is actually slightly, very slightly, less than 1/2. This is the "fudge" I refer to above that lets me put p(H|¬C) at 1/2. Clearly, if my prior p(C) were higher than "very small", that would be inconsistent with the prior over coins I've just described.
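(A quick numerical check of that direction, using an invented, discretized prior over the coin's heads-bias; the shape of the prior is entirely made up.)

    # A made-up, symmetric, discretized prior over a coin's heads-bias, with a
    # bump at fair coins plus small tails toward 0 and 1. Conditioning on the
    # claim being false (removing the always-heads chunk) nudges the expected
    # heads-probability slightly below 1/2.
    import numpy as np

    biases = np.linspace(0.001, 0.999, 999)                  # possible values of p(heads)
    weights = np.exp(-((biases - 0.5) ** 2) / 0.02) + 0.001  # fair-coin bump plus tails
    prior = weights / weights.sum()

    p_H = np.sum(biases * prior)                             # 0.5 by symmetry

    not_always_heads = biases <= 0.99                        # everything outside the claimed chunk
    cond = prior * not_always_heads
    cond /= cond.sum()
    p_H_given_notC = np.sum(biases * cond)

    print(p_H, p_H_given_notC)    # the second comes out just under 0.5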
[1] I'm further assuming that "always" means "reallllllly close to always", because otherwise the claim is trivially false and the problem isn't very interesting.
[2] Note that this is not actually a "naive Bayesian" approach—that's a technical term that means something more complicated.
[3] This is what I meant about buying into the Bayesian approach. I'm going to continue the post under the assumption that Bayesian reasoning is valid (even if not what is traditionally called a "probability"), and I'm going to use the language and notation of probability to do it. If that doesn't sit well, imagine that I am quantifying something like belief or confidence rather than probability per se.
"How would Apple like it if when they discovered a serious bug in OS X, instead of releasing a software update immediately, they had to submit their code to an intermediary who sat on it for a month and then rejected it because it contained an icon they didn't like?" --Paul Graham
Posted by blahedo at 5:05pm on 7 Dec 2010