mmcirvin | Oct. 4th, 2003

One of the things that impressed me the most about blogger Bruce Rolston was that, back during the hunt for the DC shooters a year ago, he made the correct call about the vehicle driven by serial killers John Allen Muhammad and Lee Boyd Malvo, at a time when the investigation was mostly on the wrong track.

Rolston's reasoning was simple. Two vehicles had been mentioned in connection with the various shooting incidents: a burgundy Chevy Caprice, and a white van. But white vans are common vehicles that you could see anywhere by chance, whereas Chevy Caprices are not as common, so reports of such a car are more likely to be significant. (It turned out to be a dark blue Caprice, but the people who named it had pretty clearly seen the actual car.)

This is a simple application of Bayes' Theorem. I've gotten into arguments before about the usefulness of the flavor of statistics built around this theorem, but the theorem itself is a pretty uncontroversial statement about evidence and conditional probability that is useful to know as an antidote to errors of reasoning.

It has to do with how the acquisition of new evidence should affect your estimate of the probability that something is true. Suppose that you've figured that the probability of statement b being true is P(b). Then you discover some new fact, a, which could happen as a consequence of b being true. Then a is evidence for b, but it doesn't necessarily imply b. The probability should be affected in the following way:

P(b|a) = P(b) P(a|b) / P(a)

where P(b|a) means "the probability of b, given a"; P(a|b) means "the probability of a, given b"; P(b) is your prior calculation of b's probability, and P(a) is the prior probability of a, whether b is true or not. P(a) is in the denominator, because stuff that is likely to have happened anyway is not very strong evidence one way or the other.
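The formula is short enough to sketch in a few lines of Python. The function name `posterior` and all of the numbers below are made up for illustration. They only show how the same prior and the same P(a|b) give very different answers depending on how likely the evidence was to turn up anyway.

```python
def posterior(prior_b, likelihood_a_given_b, prob_a):
    """Return P(b|a) = P(b) * P(a|b) / P(a)."""
    if not 0 < prob_a <= 1:
        raise ValueError("P(a) must be in (0, 1]")
    return prior_b * likelihood_a_given_b / prob_a

# Hypothetical inputs: identical prior and likelihood in both calls,
# differing only in P(a), the chance of seeing the evidence at all.
rare_evidence = posterior(prior_b=0.01, likelihood_a_given_b=0.5, prob_a=0.02)
common_evidence = posterior(prior_b=0.01, likelihood_a_given_b=0.5, prob_a=0.5)

print(rare_evidence, common_evidence)
```

With these inputs the rarely seen evidence raises the probability of b far more than the commonly seen evidence does, which is the whole point of having P(a) in the denominator.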

So in this case, take b to mean "the shooters are driving this kind of vehicle" and a to mean "some people saw this kind of vehicle driving around near a shooting". P(a|b) is a fairly high number, maybe even above 0.5, for both the Caprice and the white van (as the number of shootings mounted, so did the probability that some witness would see the car in question and remember what it looked like). P(a), however, is pretty high for the white van, since there are white vans driving around all over the place for a variety of reasons, and not so high for the Chevy Caprice. Since P(a) is in the denominator, a sighting counts for more in the Caprice's case.

Wait a minute... what about P(b)? Presuming that we know nothing about the shooters, and knowing that white vans are more common than Caprices, shouldn't P(b) be higher for the white van as well? It's a tricky business, choosing these prior probabilities. Since P(a|b) is about the same for both vehicles, the real question is which kind of vehicle has the higher ratio P(b)/P(a). But since white vans are so common as delivery and utility vehicles that false sightings of them near a shooting are significantly likely, P(b)/P(a) ought to be much lower for them than for Caprices.
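To see how the prior and the false-sighting rate pull against each other, here is a rough sketch that expands P(a) as P(a|b)P(b) + P(a|not b)P(not b), which is the standard law of total probability. The function name `posterior_from_rates` and every number are hypothetical placeholders, not estimates from the actual case. The white van gets a higher prior but a much higher chance of being reported near a shooting by coincidence.

```python
def posterior_from_rates(prior_b, p_seen_if_true, p_seen_if_false):
    """P(b|a), with P(a) = P(a|b)P(b) + P(a|not b)P(not b)."""
    prob_a = p_seen_if_true * prior_b + p_seen_if_false * (1 - prior_b)
    return prior_b * p_seen_if_true / prob_a

# Placeholder numbers only, chosen to show the shape of the argument.
# White van: larger prior (more of them on the road), but a coincidental
# sighting near a shooting is very likely.
white_van = posterior_from_rates(prior_b=0.05, p_seen_if_true=0.6, p_seen_if_false=0.5)
# Caprice: smaller prior, but a coincidental sighting is much less likely.
caprice = posterior_from_rates(prior_b=0.01, p_seen_if_true=0.6, p_seen_if_false=0.02)

print(f"white van: {white_van:.3f}, Caprice: {caprice:.3f}")
```

Under these assumed inputs the Caprice ends up with the higher posterior even though it started with the lower prior. Different placeholder values could reverse that, so the sketch illustrates the reasoning rather than settling the question.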

I suppose that in this case, since we don't have a good way of quantitatively writing down reasonable prior probabilities, writing out the formula really doesn't tell us much we didn't already know. But it's a good way of organizing your thoughts about questions of probability and evidence.


Matt McIrvin's Freaky Alternate Reality


Bayes' Theorem and the DC shooting spree
