Big data as a concept is defined around four aspects: data volume, data velocity, data veracity and data value.
Two patterns emerge when these characteristics are looked at closely. While the volume and velocity aspects refer to the data generation process and how to capture and store the data, the veracity and value aspects deal with the quality and usefulness of the data.
Data management is a major challenge for most enterprises – even small data is plagued by quality and management issues.
In addition, the digital world is generating new sets of data coming in from different sources (mostly from the web) in both structured and unstructured formats.
If businesses simply go by the volume and velocity aspects, this qualifies as a big data problem. However, in reality, a lot of this data comprises ‘noise’ (information or metadata having low or no real value for the enterprise).
The purpose of smart data (veracity and value) is to filter out the noise and retain the valuable data, which the enterprise can then use effectively to solve business problems.
If businesses take the smart data approach, they can argue that bigger isn't always better. For a predictive model, will a simple random sample suffice?
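What's the marginal impact on a predictive model's accuracy if it runs on five million rows versus 10 billion rows? Statistically speaking, the marginal impact is usually negligible: sampling error shrinks roughly with the square root of the sample size, so a well-drawn sample of a few million rows already leaves very little error to remove.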
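To make the sampling argument concrete, here is a minimal sketch. It assumes a simple random sample of independent rows and uses the standard error of a sample mean as a rough stand-in for model accuracy. The function name `standard_error` and the value `SIGMA = 1.0` are illustrative assumptions, not figures from any real dataset.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean over n independent rows."""
    return sigma / math.sqrt(n)


# Hypothetical spread of the quantity being estimated (an assumption).
SIGMA = 1.0

se_sample = standard_error(SIGMA, 5_000_000)       # five million rows
se_full = standard_error(SIGMA, 10_000_000_000)    # 10 billion rows

print(f"5 million rows : {se_sample:.6f}")
print(f"10 billion rows: {se_full:.6f}")
print(f"absolute gain  : {se_sample - se_full:.6f}")
```

Under these assumptions the standard error is already below 0.0005 at five million rows, so the extra rows can reduce it only by a tiny absolute amount. Real predictive models can behave differently, especially with rare events or many features, so this is an illustration rather than a rule.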
So, how does big data become smart data?
There are no formulas, but one has to better understand the clues in the questions around the data. Analysing data qualitatively not only enables one to become data-driven but also creates opportunities to become creatively driven. And this is where big data can become smart data.
Instead of just looking at the numbers and making wild guesses about why something works or doesn’t, people who work with data have to humanise it and essentially become ‘data whisperers’.
This is the skill of analysing the quantitative and qualitative aspects of data together. Businesses have to let the data tell its story, removing as much of their own bias as possible.