Change, but not for the better: Consumer Reports' new ratings

Or, what's old is new again.

The old rating system

In my first piece on Consumer Reports, I criticized their use of relative ratings. From such ratings it's not possible to tell how many more problems a "bad" car will have than a "good" one. As a result, many readers likely assume this difference is larger than it actually is.

Consumer Reports grappled with this issue back in 1993. For at least twenty years they had been using a relative scale to rate the reliability of each model's systems (engine, transmission, etc.). But the reliability of the average car had improved so much that "looking at the average for each trouble spot, as we had been doing, was not as meaningful as it once was" (April 1993, p. 234). With the average rate under two percent for most systems on nearly new cars, it wasn't possible for any car to earn the prized full-red dot. This just didn't seem right. Also, the clear dot that represented "average" could mean a problem rate from under one percent to over 17 percent.

The solution: an absolute scale where a full-red dot meant a problem rate under two percent, a half-red a rate from two to five, a clear dot a rate from five to 9.3, a half-black a rate from 9.3 to 14.8, and a full-black a rate over 14.8. This made it obvious how problem rates increased as a car aged, and somewhat obvious what the actual rates were.
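To make the 1993 scale concrete, here is a minimal sketch that maps a system-level problem rate to a dot using the cut-offs listed above. The function name and the handling of a rate that lands exactly on a cut-off are assumptions for illustration; Consumer Reports did not publish code or boundary rules.

```python
# Upper cut-offs (in percent) from the 1993 absolute scale described above.
# Treating each cut-off as an exclusive upper bound is an assumption.
OLD_SCALE = [
    (2.0, "full-red"),
    (5.0, "half-red"),
    (9.3, "clear"),
    (14.8, "half-black"),
]


def old_scale_dot(rate_percent: float) -> str:
    """Return the 1993-style dot for a system-level problem rate (hypothetical helper)."""
    for upper_bound, dot in OLD_SCALE:
        if rate_percent < upper_bound:
            return dot
    # Anything at or above the last cut-off earns the worst rating.
    return "full-black"


if __name__ == "__main__":
    for rate in (1.5, 4.0, 7.0, 12.0, 20.0):
        print(f"{rate:5.1f}% -> {old_scale_dot(rate)}")
```

Because the cut-offs are fixed, the same rate always earns the same dot, which is what let readers see problem rates climb as a car aged.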

Problems with the old rating system

Unfortunately, overall ratings remained relative to the average, and juxtaposing these with absolute system-level ratings created new issues. A model could earn a full set of red dots yet still receive an average overall rating. Shouldn't the dots add up? Well, no. A car might have low problem rates in all areas, yet those rates could still be higher than the even lower averages in many of these areas. Or fairly low rates in all areas could add up to a fairly high overall rate. Also, it was not possible to tell from the new system-level ratings how a car compared to the average. So Consumer Reports provided a set of ratings for the average car, to which the data for a specific model could be compared. Many people were confused by this so-simple-it's-complicated process.

The new rating system

Last fall they addressed this confusion, but not with an absolute scale for the overall ratings like the one TrueDelta uses. Instead, they've returned to a relative scale for the system-level ratings.

They tossed a relative system back in 1993 because it no longer made sense. How, then, does it make sense in 2006, when cars have become even more reliable? It doesn't. A purely relative system simply isn't viable. So they've actually adopted a hybrid rating system. Once problem rates dip under three percent, an absolute scale takes over. But not the absolute scale they adopted in 1993. Instead, they've adopted a much narrower scale.

With the new scale, a full- or half-black dot can be earned with any problem rate over three percent. A clear, "average" dot means the problem rate is under three percent and a half-red dot means the rate is under two percent. Finally, if the problem rate is below one percent, the system automatically gets a full-red dot.
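Read literally, the cut-offs above can be sketched as follows. This is only an illustration: the function name is hypothetical, the treatment of rates exactly on a cut-off is an assumption, and Consumer Reports has not said how the relative comparison interacts with these cut-offs or where half-black ends and full-black begins.

```python
def new_scale_dot(rate_percent: float) -> str:
    """Return the dot implied by the new cut-offs, read literally (hypothetical helper)."""
    if rate_percent < 1.0:
        return "full-red"   # below one percent: automatic full-red
    if rate_percent < 2.0:
        return "half-red"
    if rate_percent < 3.0:
        return "clear"
    # Over three percent the rating depends on an unpublished comparison
    # with the average, so the sketch cannot pick between the two black dots.
    return "half-black or full-black"


if __name__ == "__main__":
    for rate in (0.5, 1.5, 2.5, 4.0):
        print(f"{rate:4.1f}% -> {new_scale_dot(rate)}")
```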

Absolute rates lost

What's wrong with this new system? First, any sense of absolute problem rates has been lost. A half-black dot indicates that a problem rate is both below average and over three percent. But how far over three percent? In last fall's New Car Preview 2006 (but not in the Annual Auto Issue) they provided a chart of average problem rates. For 2005 models, this average was at or below one percent for 11 of the 15 systems, two percent for three systems, and three percent for just one. How much worse does a rate have to be to be "worse than average"? They don't say. End result: the question of how bad "bad" is cannot be answered.

Similarly, it's not possible to tell how good "good" is. Does a full-red dot mean that the problem rate is below one percent? Or just much better than the average, and possibly much higher than one percent?

Splitting hairs

Second, Consumer Reports is splitting some very fine hairs. Can they reliably predict system-level problem rates at the one-percent level? Maybe for a few models where they have a huge sample, but for many cars their sample size is under 200, and it can be as small as 100. With 100 responses, each respondent accounts for a full percentage point, so a rating can easily be determined by whether or not a single respondent reports an issue. Add in the wiggle room introduced by a roughly one-in-six response rate, by letting respondents decide whether a problem is "serious" enough to report, and by people's shaky memories when asked to report things that happened a year earlier, and the new system implies a far higher level of precision than their research design can deliver.
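The arithmetic behind this point is easy to check. The sketch below shows how far a single report moves the measured rate at the sample sizes mentioned above, along with a textbook binomial standard error. The helper names are hypothetical, and the standard error assumes a simple random sample, which (given the response rate and recall issues just described) is a generous assumption.

```python
import math


def rate_percent(reports: int, sample_size: int) -> float:
    """Measured problem rate, in percent (hypothetical helper)."""
    return 100.0 * reports / sample_size


def standard_error_percent(reports: int, sample_size: int) -> float:
    """Binomial standard error of the rate, in percentage points.

    Assumes a simple random sample; real survey error is likely larger.
    """
    p = reports / sample_size
    return 100.0 * math.sqrt(p * (1.0 - p) / sample_size)


if __name__ == "__main__":
    for n in (100, 200):
        print(f"Sample of {n}: one report moves the rate by "
              f"{rate_percent(1, n):.1f} points")
        for reports in range(1, 4):
            print(f"  {reports} report(s): {rate_percent(reports, n):.1f}% "
                  f"+/- {standard_error_percent(reports, n):.1f} (one std. error)")
```

With cut-offs only one percentage point apart, a sampling error of roughly the same size means neighboring ratings cannot be told apart with much confidence at these sample sizes.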

As pointed out in my second piece on Consumer Reports, they have unexplained variances of as much as 80 points for the overall ratings. This doesn't lend confidence that they can measure system-level problem rates at the one-percent level.

Reliability appears to have declined

Third, the change makes it appear that fairly new vehicles have suddenly become less reliable even though the opposite is true. With the cut-offs now one, two, and three percent rather than two, five, and 9.3 percent, it is now two to three times harder to earn a given rating when the average problem rate is low.
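The "two to three times" figure comes straight from dividing each old cut-off by the corresponding new one. A quick sketch, with names chosen only for illustration:

```python
# Upper cut-offs (percent) for the three best dots under each scale,
# as given in the text above.
DOTS = ("full-red", "half-red", "clear")
OLD_CUTOFFS = (2.0, 5.0, 9.3)
NEW_CUTOFFS = (1.0, 2.0, 3.0)


def tightening_factors() -> list[float]:
    """How many times lower each new cut-off is than the old one (hypothetical helper)."""
    return [round(old / new, 1) for old, new in zip(OLD_CUTOFFS, NEW_CUTOFFS)]


if __name__ == "__main__":
    for dot, old, new, factor in zip(DOTS, OLD_CUTOFFS, NEW_CUTOFFS,
                                     tightening_factors()):
        print(f"{dot}: rate must be under {new}% instead of {old}% "
              f"({factor}x tighter)")
```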

The consequences are predictable. Last year a 2004 CTS earned top ratings for every system. This year the 2005 received one half-red, two clear, and one half-black dot. Should people now avoid the car because it has problem rates in the two-to-three percent range for a couple of systems? (The overall rating remains the same at average.)

So why do it?

Why adopt a new system with so many weaknesses? The stated reason is to make the cars' relative reliability more apparent. But there might be another reason. If you're Consumer Reports, you want to sell memberships and magazines. People are more likely to buy these if they're worried about reliability. And if most cars earn high ratings, which has been increasingly the case, they're less likely to be worried. Solution: change the rating system to produce more "bad" dots.

Consumer Reports' new ratings system will boost revenues. But it will also further distort perceptions. Reporting dots rather than actual rates has led many people to believe that the differences among cars are larger than they actually are. The new system intensifies this distortion by shrinking the absolute difference between ratings to as little as a single percentage point. But do those who avoid products with black dots care if a problem rate is three percent instead of one or two percent? I doubt it. Even if some people did care about such small differences, can Consumer Reports' methods and sample size deliver this level of precision? I doubt that, too.

When using any scale, it is important to ensure that different ratings are far enough apart that differences between them matter and can reliably be measured. Consumer Reports' new system fails both tests.

Thanks for reading.

Michael Karesh, TrueDelta

First posted: April 10, 2006
Last updated: November 16, 2006
