A quick, fun example of data quality “garbage in, garbage out”…
I once helped organize a user conference where a suspiciously large percentage of the audience came from Afghanistan, because of people lazily clicking on the first country on the registration page.
It turns out that even web giants like Facebook and MySpace have the same problem. In the June 8th episode of the NPR show On The Media, during a segment on children’s use of Facebook despite its age restrictions, researcher Danah Boyd said:
“One of the funny things that you will find on these sites is that a huge number of kids actually say that they are from Afghanistan or Zimbabwe, which are the countries alphabetically at the top and the bottom of the possible countries you can be from… so, based on the stats of Facebook and MySpace, there are more people online in Afghanistan and Zimbabwe than there are living there” (@39’00’’)
What’s NOT funny is that most of the time a data quality problem is nowhere near this easy to spot. What if the figures had been off by only 10% or 15%? Would anybody have noticed? Or would they have happily accepted the figures and used them for (bad) decision making?
Poor data quality is the number one technical problem preventing the successful deployment of analytic solutions. Every business should invest in good data quality solutions that help detect and fix bad data.
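As a rough illustration of what "detecting bad data" could look like for the country example above, here is a minimal Python sketch. It compares each country's share of registrations against a share you would reasonably expect and flags anything wildly over-represented. The function name `flag_overrepresented`, the `max_ratio` threshold, and all of the figures are made up for illustration; a real check would need a trustworthy baseline and a threshold tuned to the data.

```python
from collections import Counter


def flag_overrepresented(values, expected_share, max_ratio=3.0):
    """Flag values whose observed share exceeds max_ratio times the expected share.

    values: list of category labels (here, countries picked on a form).
    expected_share: dict mapping label -> share you would expect (0.0 to 1.0).
    Returns a dict mapping label -> (observed_share, expected_share).
    """
    counts = Counter(values)
    total = sum(counts.values())
    flagged = {}
    for value, count in counts.items():
        expected = expected_share.get(value)
        if expected is None:
            # No baseline for this value, so nothing to compare against.
            continue
        observed = count / total
        if observed > max_ratio * expected:
            flagged[value] = (observed, expected)
    return flagged


# Hypothetical registration data: the first and last options in the
# country dropdown are picked far more often than expected.
registrations = (
    ["Afghanistan"] * 12
    + ["United States"] * 60
    + ["Germany"] * 20
    + ["Zimbabwe"] * 8
)

# Hypothetical baseline shares (assumed, not real statistics).
expected = {
    "Afghanistan": 0.01,
    "United States": 0.55,
    "Germany": 0.25,
    "Zimbabwe": 0.01,
}

suspicious = flag_overrepresented(registrations, expected)
for country, (observed, baseline) in sorted(suspicious.items()):
    print(f"{country}: observed {observed:.0%}, expected about {baseline:.0%}")
```

With these invented numbers, only Afghanistan and Zimbabwe are flagged. Note the limits of such a check: it catches the glaring cases, but a bias of 10% or 15% would slip under a 3x threshold, which is exactly the harder problem described above.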