There was a time when Big Data was being talked about by everybody. It still is, but perhaps not with the same vim and vigour. At the recent NASSCOM Conference on Big Data Analytics, there was less talk of Big Data. In fact, one of the speakers remarked, and I quote, "Big Data is nothing but one byte more than what your IT systems can handle".
It is generally accepted that there are three characteristics of Big Data – viz., Volume, Variety and Velocity. A few other practitioners add a couple more – we will touch upon those a bit later.
Volume – much has been talked about the sheer size of the data being generated these days. In fact, thanks to advances in technology, storing these data is no longer rocket science. Except for a few examples on the fringes of business, ways to store data – and pretty large volumes at that – have been addressed.
Increasing research on processing images, sound and unstructured text has also resulted in Variety, the second characteristic, being a relatively solved problem.
It is Velocity that is now attracting attention, for two reasons. First, industries that sell directly to consumers, such as retail, have started seeing more online activity, and customer activity online takes many forms. From purchasing goods, services and travel to posting reviews and comments, there seems to be no end to the customer behaviour data trails left online.
While it was always believed that data from, say, the last six months to a year would yield valuable insights, such analysis is still a postmortem. Organizations want to act on data as recent as possible. With data being generated at increasing velocity, organizations want to take decisions that will have an effect in the immediate future. Decisions such as changing prices, making relevant offers and recommendations on the go, and understanding customer sentiment are now wanted on a here-and-now basis.
The second reason, and another important aspect of Velocity, is the phenomenon of the Internet of Things. With sensors – literally huge numbers of them – producing data by the milli- and the microsecond, this characteristic has suddenly assumed mammoth importance.
The need for complex event processing has given impetus to the advent of real-time analytics. The focus is now on how to digest this new-found velocity and take meaningful action.
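To make the idea of complex event processing concrete, here is a minimal sketch of windowed stream processing in Python: sensor readings outside a sliding time window are evicted, and an alert fires when the window average crosses a threshold. The class name, window size and threshold are all illustrative assumptions, not a reference to any real CEP product.

```python
from collections import deque

class SlidingWindow:
    """Toy sliding-window processor: keep only readings from the last
    `window_seconds` and flag when the window average exceeds a threshold.
    All names and numbers here are illustrative."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict readings older than the window.
        while timestamp - self.events[0][0] > self.window_seconds:
            self.events.popleft()
        avg = sum(v for _, v in self.events) / len(self.events)
        return avg > self.threshold  # True means "act now"

window = SlidingWindow(window_seconds=10, threshold=50.0)
print(window.add(0, 20.0))   # → False: low reading, no alert
print(window.add(5, 90.0))   # → True: window average is 55.0
print(window.add(16, 30.0))  # → False: older readings evicted
```

The point of the sketch is that the decision is taken as each event arrives, not in a batch job run months later.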
I had mentioned additional characteristics. They are Veracity and Value. Veracity is the truthfulness of data, and it takes on two aspects. One relates to the inherent goodness of the data that come into the analytics framework. The other relates to the true interpretation of what the data say about the reality at hand. This second aspect of veracity usually spins off a debate about correlation and causation.
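The correlation-versus-causation point can be shown in a few lines of Python: two quantities driven by a hidden common cause will correlate strongly even though neither causes the other. The scenario and every number below are made up purely for illustration.

```python
import random

random.seed(42)

# A hidden common driver (say, summer temperature) pushes up both
# ice-cream sales and air-conditioner sales. Neither causes the other,
# yet the two series correlate strongly.
temperature = [random.gauss(25, 5) for _ in range(1000)]
ice_cream = [t * 2 + random.gauss(0, 1) for t in temperature]
air_con = [t * 3 + random.gauss(0, 1) for t in temperature]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(pearson(ice_cream, air_con))  # close to 1.0, yet no causal link
```

A high coefficient here says nothing about one product driving sales of the other; interpreting it that way is exactly the veracity failure described above.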
Value, on the other hand, is the ultimate benefit of doing the analytics. It raises questions such as: does the business really want this? Does this complex analysis lead to an actionable business outcome?
In between all these are the debates pertaining to data scientists – their existence and their need – the hype of analytics, whether big is good, and whether the number of columns is more important than the number of rows.
To me, the path to analytics seems to be:
- Start small – focus on columns or attributes, use Excel or perhaps R
- Test whatever hypotheses are plaguing the business community
- Develop meaningful algorithms if required
- Increase rows – go BIG
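The second step above – testing a business hypothesis on a small slice of data – can be sketched in a few lines. The post suggests Excel or R; here is the same idea in Python, using a simple permutation test on the difference of means. The business question and all the numbers are hypothetical.

```python
import random

random.seed(7)

# Hypothetical question a business team might pose: do customers who saw
# the new promotion spend more per visit? The spend figures are simulated.
control = [random.gauss(40, 8) for _ in range(200)]    # did not see promotion
treatment = [random.gauss(44, 8) for _ in range(200)]  # saw promotion

def permutation_test(a, b, n_iter=2000):
    """One-sided permutation test: how often does a random relabelling of
    the pooled data produce a mean difference at least as large as observed?"""
    observed = sum(b) / len(b) - sum(a) / len(a)
    pooled = a + b
    count = 0
    for _ in range(n_iter):
        random.shuffle(pooled)
        diff = sum(pooled[len(a):]) / len(b) - sum(pooled[:len(a)]) / len(a)
        if diff >= observed:
            count += 1
    return count / n_iter  # a small p-value suggests the promotion matters

p = permutation_test(control, treatment)
print(p < 0.05)  # → True: the simulated effect is real by construction
```

Only once a hypothesis survives this kind of small-scale test is it worth developing proper algorithms and scaling the rows up.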