Time to clean the data!

Well, except for a few stragglers, data collection is done for the quarter. There’s more of it than ever before, which also means there’s more of it to clean than ever before.

First, a bit on growth. Last month 5,064 cars were surveyed, and a survey was completed for 2,183 of them. That works out to a 43 percent response rate, the same as last quarter. I was hoping for higher, but this is still a far higher rate than Consumer Reports or J.D. Power achieves.

In the four previous quarters, 363, 618, 811, and 1,324 surveys were returned, so growth has been steady and rapid. Compared to a year ago, the latest data includes six times as many cars.
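For anyone who wants to check the arithmetic, here is a quick sketch in Python. The numbers are the ones quoted above; the variable names are just labels I made up for illustration.

```python
# Figures quoted in the post.
cars_surveyed = 5064
surveys_completed = 2183
previous_quarters = [363, 618, 811, 1324]  # oldest first, so [0] is a year ago

# Share of surveyed cars with a completed survey.
response_rate = surveys_completed / cars_surveyed

# Latest quarter compared with the quarter one year earlier.
growth_vs_year_ago = surveys_completed / previous_quarters[0]

print(f"Response rate: {response_rate:.0%}")
print(f"Growth vs. a year ago: {growth_vs_year_ago:.1f}x")
```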

I “clean” the data by checking it for errors and correcting those I find. In some cases, you’ll get an email. In others, I can correct the error without additional information.

First, I check for gaps in responses. If a member responded in an earlier quarter and responded in April, but missed a quarterly check-in or two in between, I contact them to fill in the gap. Luckily, most participants are consistent. This only affects a few dozen people, about two percent of the total.
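The gap check could be sketched roughly like this. It is a minimal illustration, not the actual process: the function name and the idea of numbering quarters as consecutive integers are assumptions made for the example.

```python
def find_gaps(responded_quarters, latest_quarter):
    """Return the quarters a member skipped between their first and latest response.

    responded_quarters: quarter indexes the member answered (hypothetical
    encoding: 0 = first quarter of the survey, 1 = the next, and so on).
    latest_quarter: index of the quarter currently being cleaned.
    """
    responded = set(responded_quarters)
    # Only members who answered both an earlier quarter and the latest one qualify.
    if latest_quarter not in responded or len(responded) < 2:
        return []
    first = min(responded)
    return [q for q in range(first, latest_quarter + 1) if q not in responded]


# A member who answered quarters 0, 1, and 4 skipped quarters 2 and 3.
print(find_gaps([0, 1, 4], latest_quarter=4))
```

Any member with a non-empty result would get an email asking them to fill in the missing quarters.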

Next, I check for odometer readings that are either lower than previously submitted readings or that seem too high given the age of the car. In some cases the member obviously swapped numbers or forgot to only include the thousands, and instead typed in the entire number. I can correct these myself. In other cases, I send an email. This affected about sixty vehicles this time, three percent of the total.
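Here is a rough sketch of how the odometer check might look in code. Several things in it are assumptions for illustration only: that readings are in thousands of miles, the 30,000-miles-per-year ceiling, and the function names.

```python
MAX_MILES_PER_YEAR = 30_000  # assumed ceiling, not a figure from the survey


def check_odometer(reading, previous, car_age_years):
    """Classify a reading (in thousands of miles) as 'ok', 'too_low', or 'too_high'."""
    if previous is not None and reading < previous:
        return "too_low"
    # Treat cars under a year old as one year old to avoid a zero ceiling.
    ceiling = MAX_MILES_PER_YEAR * max(car_age_years, 1) / 1000
    if reading > ceiling:
        return "too_high"
    return "ok"


def suggest_fix(reading, previous, car_age_years):
    """If a too-high reading looks like the full mileage, return it in thousands."""
    if check_odometer(reading, previous, car_age_years) != "too_high":
        return None
    scaled = reading / 1000
    if check_odometer(scaled, previous, car_age_years) == "ok":
        return scaled
    return None  # no obvious fix, so this one needs an email


# A three-year-old car previously at 42 (thousand) now reports 45200.
print(suggest_fix(45200, previous=42, car_age_years=3))
```

Readings flagged "too_low", or "too_high" with no obvious fix, are the ones that would need a follow-up email.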

Finally, I go through the responses one by one, looking for inconsistencies. The detailed descriptions help a great deal with this process; I don’t have to email members for clarification nearly as much as I used to before asking for these descriptions. I’m just about to start on this part of the cleaning process now.

Two things I’m especially concerned about:

  1. Reporting a repair trip when a part was ordered, but not reporting the later trip to have the new part installed.
  2. Reporting all issues for a trip on the same page, rather than reporting them one at a time. Each time you submit a response you’ll be asked whether you have another issue or repair trip to submit. I can usually fix this on my own, but it’s tedious.

If you receive an email asking for clarification, please respond quickly. I intend to send clean data to the analyst by the end of this week, and post the next set of results in two weeks.

The thresholds will be going up again. For a result to be “official,” I’m requiring data on at least 25 cars, up from 20 last time and 18 six months ago. Members will also be able to see results for cars with at least 15 responses, up from 10 last time. Even so, the number of models will be increasing, to 30 in the “official” results and another 29 on the members-only page.
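In code, the new cutoffs amount to something like the sketch below. The thresholds come from the paragraph above; the function name, the tier labels, and the sample counts are made up for the example.

```python
OFFICIAL_MIN_CARS = 25      # up from 20 last time
MEMBERS_ONLY_MIN_CARS = 15  # up from 10 last time


def classify_model(n_cars):
    """Decide where a model's result appears, given how many cars reported."""
    if n_cars >= OFFICIAL_MIN_CARS:
        return "official"
    if n_cars >= MEMBERS_ONLY_MIN_CARS:
        return "members_only"
    return "not_reported"


# Hypothetical sample sizes, purely for illustration.
for n in (31, 25, 18, 9):
    print(n, classify_model(n))
```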