What is Big Data?
If you search the web, you will find plenty of definitions of 'big data', so it seems fair to assume it is a subjective term. Most sources consider big data to be at least several terabytes, but you can also see people applying the term to smaller datasets. So what is it? "One reasonable definition is that it's data which can't comfortably be processed on a single machine" (Ian Wrigley). I would say that Big Data is both a concept (and a hype) and a set of methods for handling large volumes of data, volatile or not, at high processing speed.
What is a Big Data challenge?
A major challenge is that data is created very fast and comes from many different sources, in very different formats. Much of it looks worthless at first glance but can actually hold a lot of value. How do we handle it, and how do we turn raw, seemingly worthless data into valuable information?
The 3 V’s of Big Data:
If you look into big data concepts, you'll often hear about three defining properties known as the 3 V's.
Volume: the size of the data you're dealing with;
Variety: the data often comes from many different sources and in many different formats (txt, mp3, …);
Velocity: the speed at which data is generated, arrives, gets processed, and becomes available.
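To make Variety a bit more concrete, here is a minimal Python sketch, assuming two hypothetical sources that describe the same kind of event in different formats (CSV text and JSON lines). The field names user, action and ts, the sample values, and the helper functions are made up for illustration only. Real inputs are far messier, but the basic idea of mapping everything into one common shape before analysing it is the same.

```python
import csv
import io
import json

# Hypothetical raw inputs: the same kind of event arriving from two sources
# in two different formats (CSV text and newline-delimited JSON).
CSV_SOURCE = "user,action,ts\nalice,click,1700000000\nbob,view,1700000005\n"
JSON_SOURCE = '{"user": "carol", "action": "click", "ts": 1700000010}\n'


def from_csv(text):
    """Parse CSV text into dicts that follow the common schema."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": r["user"], "action": r["action"], "ts": int(r["ts"])} for r in reader]


def from_json_lines(text):
    """Parse newline-delimited JSON into the same common schema."""
    records = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            records.append({"user": obj["user"], "action": obj["action"], "ts": int(obj["ts"])})
    return records


def normalize(sources):
    """Merge records from every (parser, raw_text) pair, ordered by timestamp."""
    merged = []
    for parser, raw in sources:
        merged.extend(parser(raw))
    return sorted(merged, key=lambda r: r["ts"])


events = normalize([(from_csv, CSV_SOURCE), (from_json_lines, JSON_SOURCE)])
clicks = sum(1 for e in events if e["action"] == "click")
print(len(events), clicks)  # prints "3 2"
```

Once both sources share one structure, a simple question such as "how many clicks happened?" can be answered across all of them at once.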
There are plenty of things we can do with big data. It can benefit individuals (good recommendations from Amazon, Netflix, …), communities (hospital monitoring), organizations (creating innovative products) and society as a whole (energy consumption, disaster recovery, urban planning). But basically, for me, it is about turning worthless data into valuable information.
My aim here was just to highlight Big Data's key concepts based on my studies. If you want more detail, I'd recommend the book Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier.
Next step: choose a small set of technologies to put into practice.
Just a mental note: what is the relationship between text mining and big data?