In January 2013, at an MRS conference in London, Alex Johnson of Kantar’s Innovation Group asked the audience who had heard of a “mezobyte”. A few nervous hands were raised. Alex allayed the fears of those of us who did not know by simply admitting he had made up the word on the spot!
It neatly embodied all that is wrong with our thinking about Big Data, then and now. Too often we are amazed, startled or even made slightly nervous by one or other of the 3Vs of Big Data – the velocity, volume and variety of data. But it is not the “Big” word that is important; it is the “Data” word. Alex went on to compare what 2000 gigabytes of storage might get you: 24 hours of video footage of an empty room, or everything 1500 people did on their phones for a year. So it is not just about data either; it is about meaningful data.
None of this should frighten any market researcher. Our entire industry was founded on the precept that a small, well-designed sample is more accurate than a large convenience sample. In fact, as we all know, randomly sampling from a big data set, and analyzing that sample, would give the same answer, within the margin of error, as any analytics company would obtain using the complete data set.
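That sampling claim is easy to demonstrate with a quick simulation. This is a minimal sketch on synthetic data – the figures are made up for illustration, not drawn from any real study:

```python
import random
import statistics

random.seed(42)

# Hypothetical "big data" set: one million simulated purchase amounts,
# normally distributed around 50 with standard deviation 15.
population = [random.gauss(50, 15) for _ in range(1_000_000)]

# A modest random sample of 1,000 records.
sample = random.sample(population, 1_000)

full_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# Standard error of the sample mean: s / sqrt(n).
se = statistics.stdev(sample) / (len(sample) ** 0.5)

print(f"full-data mean:      {full_mean:.2f}")
print(f"sample mean:         {sample_mean:.2f}")
print(f"95% margin of error: +/- {1.96 * se:.2f}")
```

The sample of a thousand lands within its stated margin of error of the full-data answer, at a thousandth of the processing cost – which is the whole point.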
What else frightens us about big data?
Firstly, we have to recognize that we really don’t like looking at the results of questions we didn’t write; we think we could do it better. We also don’t like people not answering the questions we ask; it makes our tables messy. Our tools are designed to deal with structured data, preferably tools that can produce and print batch tables for us to read. These are irrational objections, and ones that Big Data analytics companies, and their clients (our clients?), would find risible.
Our paradigm can be summed up as asking the right people, the right questions, and understanding their answers. In our jargon this is Sampling, Questionnaire Design and Analysis. On all these fronts we are attacked by proponents of Big Data and it is time to fight back.
The sampling side of market research is, to be frank, very forgiving. Given a sample that is large enough, diverse enough and random enough, you will get pretty much the right answer. We often fuss and preen too much over quotas and issues of representativeness when the right answer could have been obtained with much less effort. So sampling purity is not a good enough objection. Instead, on a case-by-case basis, we need to assess the big data source critically. Is it random enough? What sampling frame does it really represent? Does it represent the population of interest? We must not be purist about this; we must reflect reality and ask: does it really matter? For when it does, our clients will be grateful we were there asking these questions.
On questionnaire design we are good – so good, in fact, that we think it is easy and therefore that anyone can do it! But we are not perfect; we do things we shouldn’t. We ask questions that are unaskable (or unanswerable); we forget that respondents know they are in an experiment and are desperately trying to please; we forget about all sorts of biases. This does not make Big Data ‘better’. After all, not all of the questions we want to ask are being discussed in the blogosphere or on social media; the truth is not always told to the data collector; and the data provider is not always who you think it is.
Take a simple example: loyalty club membership. In the supermarket I sometimes give my Nectar points to the nice old lady behind me, because I have forgotten my Nectar card. So now she has a data blip – I’m sure our shopping baskets are very different – and she is probably wondering why she is starting to see advertising that ought to be aimed at me! I also have a Clubcard for the odd occasions when I use Tesco. They must have a very odd picture of me as a consumer.
At the gross level, of course, all this data inconsistency averages out and models well. But the human foibles and inconsistencies at the individual level – like quantum theory to Newtonian mechanics – explain why our dream of predictive one-to-one marketing is bound to fail.
And yet, little by little, big data ideas are creeping into our world. We have always accepted the idea of appending external data to our samples, and now we also accept passive data collection as less obtrusive to the individual and more accurate than recall. Now we are extending this into the realm of the survey itself. It has rarely occurred to any of us to capture paradata about the survey beyond, perhaps, the order in which items were selected in a multi-coded question. At the same 2013 conference I presented an analysis of the order in which respondents “attacked” a 73-item grid. I had only hoped to capture a slight pause or hesitation in clicking item 39. Little did I expect to find that there was no guarantee that item 39 was the 39th click, nor that it was necessarily preceded by a click on item 38! The data was messy, unstructured and extremely hard to process using our standard tools. But once I overcame these minor problems, a wonderful world of consumer insight opened up in front of me. Fast forward two years to the General Online Research conference, and ‘paradata’ was the word du jour in most of the methodology track. Fabulous insights are being obtained about the process of survey taking without asking respondents anything about what they did or why they did it.
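To make the grid-paradata idea concrete, here is a minimal sketch of the sort of processing involved. The click log below is entirely hypothetical and hand-made; real survey platforms record such events differently, but the analysis pattern – spotting out-of-order clicks and long pauses – is the same:

```python
from datetime import datetime, timedelta

# Hypothetical paradata: (grid item number, click timestamp),
# listed in the order the clicks actually arrived.
t0 = datetime(2015, 3, 1, 10, 0, 0)
clicks = [
    (1, t0),
    (2, t0 + timedelta(seconds=2)),
    (4, t0 + timedelta(seconds=5)),   # item 3 skipped for now
    (3, t0 + timedelta(seconds=18)),  # respondent backtracks after a pause
    (5, t0 + timedelta(seconds=20)),
]

# Which clicks broke the expected top-to-bottom order?
out_of_order = [
    item for pos, (item, _) in enumerate(clicks, start=1) if item != pos
]

# Pauses longer than 10 seconds between consecutive clicks.
pauses = [
    (prev_item, item, (ts - prev_ts).total_seconds())
    for (prev_item, prev_ts), (item, ts) in zip(clicks, clicks[1:])
    if (ts - prev_ts).total_seconds() > 10
]

print("out-of-order clicks:", out_of_order)
print("long pauses:", pauses)
```

Note that nothing here asks the respondent anything; the insight – a skip, a backtrack, a hesitation – comes entirely from how the answers were given.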
Market researchers would make good Big Data analysts. We know how to ask questions, we understand the absolute necessity to establish causality over mere correlation and we recognize a biased sample frame when we see one. These are all the bricks in our house and no big bad wolf can ever blow them away.