Sample sizes (draft)

Currently TrueDelta publicly posts Vehicle Reliability Survey results when there have been responses for at least 20 vehicles owned for at least 80 total months. In comparison, Consumer Reports’ minimum sample is 100 vehicles, with no minimum number of months of ownership. One hundred sounds a lot more impressive than 20. And they do love to brag about the size of their sample. But 20 serves our purposes far better than 100 serves theirs.

People often wonder how a relatively small sample can be used to make a factual statement about an entire, large population. When a few hundred thousand of a certain model are sold, how can a sample of 20, or even 2,000, be sufficient to infer that, say, this model will average 0.8 repair trips per year?

Statisticians have found that, when the population is large, its size has nothing to do with it. Instead, only two things matter: how much the attribute being measured varies from vehicle to vehicle, and how many vehicles are sampled. By entering the measured “variance” and the sample size into an equation, it is possible to calculate confidence intervals, the range within which the results of repeated tests should fall. TrueDelta posts 90 percent confidence intervals. If the entire population were surveyed, the result would have a nine-out-of-ten shot of falling within the reported confidence interval.
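As a concrete sketch of that equation, here is the standard normal-approximation formula for a 90 percent confidence interval. The numbers in the example are illustrative, not actual TrueDelta data; I picked them so the interval comes out near the plus-or-minus 0.3 range discussed below.

```python
import math

def ci90(mean, variance, n):
    """90% confidence interval for a sample mean, using the normal
    approximation: mean +/- 1.645 * sqrt(variance / n).
    (1.645 is the standard normal z-value for 90% coverage.)"""
    half_width = 1.645 * math.sqrt(variance / n)
    return (mean - half_width, mean + half_width)

# Illustrative numbers only: a hypothetical model averaging 0.8 repair
# trips per year, with a sample variance of 0.9, from 25 responses.
low, high = ci90(0.8, 0.9, 25)
print(round(low, 2), round(high, 2))  # 0.49 1.11
```

Note how the sample size `n` appears in the formula but the population size does not — which is the whole point of the paragraph above.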

Clearly, with confidence intervals, narrower is better. Confidence intervals can be narrowed in two ways. You can select a statistic with relatively low variation. (How TrueDelta has done this will be the subject of a later blog entry.) Or you can increase the sample size.

Then there is the matter of how narrow the confidence intervals need to be. We wouldn’t suggest putting too much emphasis on differences of just one or two tenths of a repair trip per year. So while the confidence intervals of roughly plus-or-minus 0.3 repair trips per year that we’ve been finding with 20 to 30 responses aren’t ideal, they’re not so large as to make the results worthless. Especially not when some models have been averaging about 0.3 repair trips per year, while others have been averaging around 1.0. We wouldn’t make much of the difference between 0.3 and 0.5, or that between 0.8 and 1.0. But between 0.3 and 1.0? Certainly.

The way Consumer Reports reports results requires far narrower confidence intervals. They ask people to report only problems they “considered serious.” This yields an average of only about 0.16 serious problems per vehicle for the latest well-represented model year (2006 currently). This is roughly one quarter the average number of repair trips per year that TrueDelta has been reporting. They then label a model “worse than average” when it scores about 20 percent above this number, or around 0.19 serious problems per vehicle. And the dreaded “much worse than average” kicks in at about 45 percent above, around 0.23 serious problems per vehicle. Since a few hundredths of a serious problem per vehicle can make all the difference between a good dot and a bad one, Consumer Reports should have confidence intervals of, at most, a couple hundredths of a serious problem per vehicle.
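The threshold figures above are just percentages applied to the reported average; a quick sanity check (the 0.16 average is the figure quoted above, and the multipliers follow from the percentages given):

```python
avg = 0.16  # approximate CR average, serious problems per vehicle

worse = avg * 1.20       # "worse than average": ~20 percent above average
much_worse = avg * 1.45  # "much worse than average": ~45 percent above

print(round(worse, 3), round(much_worse, 3))  # 0.192 0.232
```

The gap between a middling dot and a bad one is only a few hundredths of a problem per vehicle, which is why the confidence intervals need to be so tight.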

Have they achieved this worthy goal? It’s not easy to say, since they’ve never reported confidence intervals. But an inference is possible based on TrueDelta’s data. The confidence intervals they need are less than one tenth as wide as those we’ve been observing with a sample size in the 20s. Confidence interval breadth varies inversely with the square root of the sample size. So a sample size of 100 (with a square root of ten) would yield confidence intervals half as wide as those yielded by a sample size of 25 (with a square root of five). Let’s even assume that for many models they have a sample size of 400. This would make their confidence intervals about one-fourth as wide as TrueDelta’s. For confidence intervals one-tenth as wide as ours, they would need a minimum sample size around 2,000…which they don’t have for more than a few models.
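The square-root relationship above is easy to verify directly. A small sketch (the function name is mine; the sample sizes are the ones used in the paragraph):

```python
import math

def relative_width(n_base, n_new):
    """How wide a confidence interval from a sample of n_new is,
    relative to one from a sample of n_base, given that interval
    width scales with 1 / sqrt(sample size)."""
    return math.sqrt(n_base / n_new)

print(relative_width(25, 100))   # 0.5  -> half as wide
print(relative_width(25, 400))   # 0.25 -> one-fourth as wide
print(relative_width(20, 2000))  # ~0.1 -> one-tenth as wide
```

The painful implication of the square root: to cut interval width by a factor of ten, the sample size has to grow by a factor of a hundred.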

I’d like to get TrueDelta’s confidence intervals down to about 0.1 repair trips per year. The current sample sizes are not large enough for this, but TrueDelta is much closer to achieving this goal than Consumer Reports is to obtaining the sample sizes it needs given the way it reports its statistics. Put another way, anyone who trusts the results published by Consumer Reports should have no sample-size-related issues with TrueDelta’s results.

Note: This is an initial draft. I’ll have this entry reviewed by others with more statistical expertise than I possess, and will be revising it based on their suggestions.