Or, what's old is new again.
The old rating system
In my first piece on Consumer Reports, I criticized their use of relative ratings. From such ratings
it's not possible to tell how many more problems a "bad" car will have than a "good" one. As a result, many readers likely assume this
difference is larger than it actually is.
Consumer Reports grappled with this issue back in 1993. For at least twenty years they had been using a
relative scale to rate the reliability of each model's systems (engine, transmission, etc.). But the reliability of the average car had improved so
much that "looking at the average for each trouble spot, as we had been doing, was not as meaningful as it once was" (April 1993, p. 234). With the
average rate under two percent for most systems on nearly new cars, it wasn't possible for any car to earn the prized full-red dot. This just didn't
seem right. Also, the clear dot that represented "average" could mean a problem rate from under one percent to over 17 percent.
The solution: an absolute scale on which a full-red dot meant a problem rate under two percent, a half-red dot a rate from two to five percent, a clear dot a rate from five to 9.3 percent, a half-black dot a rate from 9.3 to 14.8 percent, and a full-black dot a rate over 14.8 percent. This made it obvious how problem rates increased as a car aged, and somewhat obvious what the actual rates were.
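As a minimal sketch, the 1993 scale can be written as a lookup from a system's problem rate (in percent) to its dot, using the cut-offs above. The names `CUTOFFS_1993` and `dot_1993` are hypothetical. The source gives only ranges, so putting a rate that falls exactly on a cut-off into the worse bin is an assumption.

```python
from bisect import bisect_right

# Cut-offs (percent) for the absolute scale adopted in 1993, per the text above.
# Assumption: a rate exactly at a cut-off falls in the worse bin.
CUTOFFS_1993 = [2.0, 5.0, 9.3, 14.8]
DOTS = ["full-red", "half-red", "clear", "half-black", "full-black"]


def dot_1993(problem_rate: float) -> str:
    """Return the 1993-scale dot for a problem rate given in percent (hypothetical helper)."""
    # bisect_right counts how many cut-offs are <= the rate, which indexes the dot.
    return DOTS[bisect_right(CUTOFFS_1993, problem_rate)]


if __name__ == "__main__":
    for rate in (1.5, 3.0, 7.0, 12.0, 20.0):
        print(f"{rate:5.1f}% -> {dot_1993(rate)}")
```

Because the bins are fixed in percentage terms, the same dot always means the same range of problem rates, regardless of how the average car performs.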
Problems with the old rating system
Unfortunately, the overall ratings remained relative to the average, and juxtaposing them with absolute system-level ratings created new problems. A model could earn a full set of red dots yet still receive an average overall rating. Shouldn't the dots add up? Well, no. A car might have low problem rates in all areas yet still be above the even lower average in many of them. Or fairly low rates in all areas could add up to a fairly high overall rate.
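A quick calculation shows how low per-system rates can add up. This sketch assumes, purely for illustration, that problems in different systems occur independently, and uses 15 systems (the number in the 2005 chart of average rates) each at a hypothetical 1.5 percent, a rate that earned a full-red dot on the 1993 scale. `overall_rate` is a hypothetical helper, not Consumer Reports' method.

```python
from typing import Iterable


def overall_rate(system_rates: Iterable[float]) -> float:
    """Percent of cars with at least one problem, given per-system rates in percent.

    Assumes problems in different systems occur independently (an illustrative
    assumption, not something Consumer Reports states).
    """
    p_no_problem = 1.0
    for rate in system_rates:
        p_no_problem *= 1.0 - rate / 100.0
    return 100.0 * (1.0 - p_no_problem)


if __name__ == "__main__":
    # Hypothetical car: 15 systems, each at 1.5 percent (full-red on the 1993 scale).
    rates = [1.5] * 15
    print(f"Overall rate: {overall_rate(rates):.1f}%")
```

Under these assumptions, a car with fifteen full-red systems has roughly a one-in-five chance of at least one problem, which is how a car with all red dots could still end up with only an average overall rating.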
Also, the new system-level ratings didn't show how a car compared to the average. So Consumer Reports provided a set of ratings for the average car, against which a specific model's ratings could be compared. Many people were confused by this so-simple-it's-complicated process.
The new rating system
Last fall they addressed this confusion, but not with an absolute scale for the overall ratings like the one TrueDelta uses. Instead, they've
returned to a relative scale for the system-level ratings.
They abandoned a relative scale in 1993 because it no longer made sense. How, then, does it make sense in 2006, when cars have become even more reliable? It doesn't. A purely relative system simply isn't viable, so they've actually adopted a hybrid. Once problem rates dip under three percent, an absolute scale takes over. It isn't the absolute scale they adopted in 1993, though, but a much narrower one.
On the new scale, a full- or half-black dot can be earned with any problem rate over three percent. A clear, "average" dot means the problem rate is under three percent, and a half-red dot means it is under two percent. Finally, a system with a problem rate below one percent automatically gets a full-red dot.
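One reading of the hybrid scale is that a relative rating is assigned first and the absolute thresholds then act as floors: no black dot under three percent, at least a half-red dot under two percent, and always a full-red dot under one percent. That reading is an interpretation, not a documented rule. Because Consumer Reports doesn't publish how far from average a rate must be to earn each relative dot, this sketch takes the relative dot as an input; `hybrid_dot` and its parameters are hypothetical.

```python
# Dots ordered from worst to best.
DOTS = ["full-black", "half-black", "clear", "half-red", "full-red"]

# Absolute thresholds (percent) paired with the minimum dot each guarantees.
# Treating them as floors on top of a relative rating is an interpretation.
FLOORS = [(1.0, "full-red"), (2.0, "half-red"), (3.0, "clear")]


def hybrid_dot(problem_rate: float, relative_dot: str) -> str:
    """Combine a (hypothetical) relative rating with the absolute floors."""
    best = DOTS.index(relative_dot)
    # Thresholds are ascending, so the first one the rate falls under is the highest floor.
    for threshold, floor_dot in FLOORS:
        if problem_rate < threshold:
            best = max(best, DOTS.index(floor_dot))
            break
    return DOTS[best]


if __name__ == "__main__":
    print(hybrid_dot(0.8, "half-black"))  # under 1%: forced up to full-red
    print(hybrid_dot(2.5, "half-black"))  # under 3%: lifted to clear
    print(hybrid_dot(4.0, "half-black"))  # over 3%: the relative rating stands
```

Under this reading, a system well above one percent can still earn a full-red dot on relative grounds alone, so the dot by itself doesn't pin down the actual rate.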
Absolute rates lost
What's wrong with this new system? First, any sense of absolute problem rates has been lost. A half-black dot indicates that a problem rate is both worse than average and over three percent. But how far over three percent? In last fall's New Car Preview 2006 (but not in the Annual Auto Issue), they provided a chart of average problem rates. For 2005 models, this average was at or below one percent for 11 of the 15 systems, two percent for three systems, and three percent for just one. How far above the average must a rate be to earn a "worse than average" dot? They don't say.
End result: there's no way to tell how bad "bad" is.
Nor is it possible to tell how good "good" is. Does a full-red dot mean that the problem rate is below one percent? Or just that it is much better than average, and possibly well above one percent?
Splitting hairs
Second, Consumer Reports is splitting some very fine hairs. Can they reliably predict system-level problem rates at the one-percent level? Maybe for a few models with huge samples, but for many cars their sample size is under 200, and it can be as small as 100. With 100 respondents, a single report moves a problem rate by a full percentage point, so a rating can easily hinge on whether one respondent reports an issue. Add in the wiggle room created by a response rate of roughly one in six, by letting respondents decide whether a problem is "serious" enough to report, and by people's shaky memories of things that happened a year earlier, and the new system implies far more precision than their research design can deliver.
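To put numbers on the sample-size point, consider a model with 100 respondents. The sketch below computes a standard Wilson score interval (z = 1.96, roughly 95 percent coverage) for a few hypothetical report counts. It treats respondents as a simple random sample and ignores the response-rate, "serious problem," and recall issues just mentioned, so if anything it understates the uncertainty. `wilson_interval` is a hypothetical helper name.

```python
import math
from typing import Tuple


def wilson_interval(reports: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval (in percent) for a problem rate of reports/n.

    The default z = 1.96 gives roughly 95 percent coverage.
    """
    p = reports / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * (center - half), 100 * (center + half)


if __name__ == "__main__":
    n = 100  # hypothetical sample at the small end of the range cited above
    print(f"One respondent shifts the rate by {100 / n:.1f} points")
    for reports in (1, 2, 3):
        lo, hi = wilson_interval(reports, n)
        print(f"{reports}/{n}: measured {100 * reports / n:.0f}%, interval {lo:.1f}% to {hi:.1f}%")
```

With 2 reports out of 100, the interval runs from roughly half a percent to about seven percent, straddling all three of the new one-, two-, and three-percent cut-offs.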
As pointed out in my second piece on Consumer Reports, their overall ratings show unexplained variances of as much as 80 points. This doesn't inspire confidence that they can measure system-level problem rates at the one-percent level.
Reliability appears to have declined
Third, the change makes it appear that fairly new vehicles have suddenly become less reliable, even though the opposite is true. With the cut-offs now at one, two, and three percent rather than two, five, and 9.3 percent, a given rating is now two to three times harder to earn when the average problem rate is low.
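The "two to three times harder" figure follows from dividing the old cut-offs by the new ones. This sketch does that and then rates a few hypothetical problem rates under both scales. It simplifies the new scale to its absolute thresholds and assumes they act as floors: below three percent it reports only the minimum dot those thresholds guarantee, and above three percent it just notes that the result depends on an undisclosed relative rating. All names are hypothetical.

```python
from bisect import bisect_right

# 1993 absolute scale: boundaries (percent) between successive dots.
OLD_CUTOFFS = [2.0, 5.0, 9.3, 14.8]
OLD_DOTS = ["full-red", "half-red", "clear", "half-black", "full-black"]

# New scale's absolute thresholds (percent). Treating them as floors is an
# assumption; above 3 percent the dot depends on an undisclosed relative rating.
NEW_CUTOFFS = [1.0, 2.0, 3.0]
NEW_DOTS = ["full-red", "at least half-red", "at least clear", "relative (black possible)"]


def rate_old(rate: float) -> str:
    """Dot on the 1993 scale; a rate exactly at a cut-off falls in the worse bin."""
    return OLD_DOTS[bisect_right(OLD_CUTOFFS, rate)]


def rate_new(rate: float) -> str:
    """Minimum dot guaranteed by the new absolute thresholds (hypothetical helper)."""
    return NEW_DOTS[bisect_right(NEW_CUTOFFS, rate)]


if __name__ == "__main__":
    # zip stops at the shorter list, pairing 2/1, 5/2, and 9.3/3.
    for old, new in zip(OLD_CUTOFFS, NEW_CUTOFFS):
        print(f"{old:g}% -> {new:g}%: cut-off lowered by a factor of {old / new:.1f}")
    for rate in (1.5, 2.5, 4.0):  # hypothetical problem rates
        print(f"{rate}%: 1993 scale {rate_old(rate)}; new scale {rate_new(rate)}")
```

The ratios come out to 2.0, 2.5, and 3.1. A 2.5 percent rate, for example, earned a half-red dot on the 1993 scale but is guaranteed only a clear dot under the new thresholds.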
The consequences are predictable. Last year the 2004 CTS earned top ratings for every system. This year the 2005 model received one half-red, two clear, and one half-black dot. Should people now avoid the car because it has problem rates in the two-to-three percent range for a couple of systems? (Its overall rating remains average.)
So why do it?
Why adopt a new system with so many weaknesses? The stated reason is to make the cars' relative reliability more apparent. But there might be
another reason. If you're Consumer Reports, you want to sell memberships and magazines. People are more likely to buy these if they're
worried about reliability. And if most cars earn high ratings, which has been increasingly the case, they're less likely to be worried.
Solution: change the rating system to produce more "bad" dots.
Consumer Reports' new rating system will boost revenues. But it will also further distort perceptions. Reporting dots rather than actual rates has led many people to believe that the differences among cars are larger than they actually are.
The new system intensifies this distortion by shrinking the absolute difference between ratings to as little as a single percentage point. But do
those who avoid products with black dots care if a problem rate is three percent instead of one or two percent? I doubt it. Even if
some people did care about such small differences, can Consumer Reports'
methods and sample size deliver this level of precision? I doubt that, too.
When designing any scale, it is important to ensure that adjacent ratings are far enough apart that the differences between them both matter and can be reliably measured. Consumer Reports' new system fails both tests.
Thanks for reading.
Michael Karesh, TrueDelta
First posted: April 10, 2006
Last updated: November 16, 2006