Brian Heath
Your data is not enough
Updated: Jan 10, 2022
Trying to solve a problem with only data is like trying to understand a book by only reading the count of keywords. You may get the gist of the topic, but you’ll never know the story - or if you are even reading the right book.

A pervasive thought in many organizations is that if we could only get all of the data together, then we could figure out how to solve a problem, make a better decision, and generate exciting new revenue streams. This is a completely reasonable thought, as having all of the data could result in all of those things. But blindly collecting all of the data will lead to a lot of frustration:
Why aren’t we getting any insights from this data?
Why is it taking so long to put everything together?
We just invested a ton of money into this project. Why aren’t my analysts able to answer my questions?
Why all of these problems? The short answer is that getting the data together is only one of many steps toward solving your problem. Think of data as just words in a book. Certainly, each word has some meaning by itself, but the real value comes when all of the words are put together to convey an idea. There is a reason why libraries consist of books and not just the words that comprise the books. Having a strategy to collect all of the data is equivalent to collecting all of the words ever written. In the end, you’ll have a dictionary that has some uses, but no one ever goes to a dictionary to figure out how to solve a problem. Most data strategies today are only building dictionaries and getting exactly the kind of boring results you’d expect to get from a dictionary.
Now, dictionaries still have their place in the world, so we shouldn’t completely throw out the concept of collecting data. The key is figuring out when to collect the data and, more importantly, what data we should be collecting. Would you go to a library before you knew what you were looking for? Most likely not.
Similarly, I wouldn’t recommend that you set your goal to collect all of the books ever written just because you could. Defining exactly what data is interesting should be the hardest and most rewarding part of any analytics project. By the time you are done, you should be able to define the problem so well that you may not even need to collect the data at all. The answer or insight may be obvious. If the answer isn’t obvious, then it is only a matter of time and number crunching to collect and analyze the data to get a solution to your problem. Skipping this step will most often result in you realizing one day that you’ve been reading a summary of words from a poorly written children’s book on cats instead of the insights of Socrates.
At the start of this post is a picture that is the word cloud of this blog post. Word clouds visually display the most frequently used words in a written document. Can you tell from just the word frequencies the core ideas of this post? The process of extracting the data from this blog post, cleaning it up, and putting it into a graphical form isn’t terribly difficult. But it doesn’t tell you the full story - it’s just data. The data from this document is not enough, and neither is data alone enough to solve your problem.
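To see how little work the counting step takes, here is a minimal sketch in Python of the word-frequency tally that sits behind a word cloud. It is not the tool used to make the picture for this post; the function name, the tiny stop-word list, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); real word-cloud tools typically use much larger lists.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "is", "in", "it", "that",
    "you", "for", "on", "or", "be", "as", "not", "but", "by", "with",
}

def word_frequencies(text, top_n=10):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    # Lowercase the text and keep runs of letters, allowing one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Sample text adapted from this post, used only to exercise the function.
sample = (
    "Think of data as just words in a book. Each word has some meaning, "
    "but the real value comes when the words are put together."
)
print(word_frequencies(sample, top_n=5))
```

The output is a short list of words and counts. That is all the information a word cloud has to work with, which is exactly why it cannot tell you the story.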