Big data
Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that is important; it is what organisations do with the data that matters. Big data can be analysed for insights that lead to better decisions and strategic business moves. "Big data" is similar to "small data", but it is much larger in size. Big data is the collection of both structured and unstructured data from different sources, such as social data, machine-generated data, and traditional enterprise data. There is no single standard definition. According to McKinsey, big data refers to datasets whose sizes are beyond the ability of typical database software tools to capture, store, manage, and analyse.
Evolution of data
Humans have been generating data for thousands of years. More recently, we have seen an amazing progression in the amount of data produced: from the advent of mainframes, to client-server systems, to ERP, and now to everything digital. For years, most of the data produced was deemed useless. But data has always been an integral part of every enterprise, big or small. As the importance and value of data to an enterprise became evident, so did the proliferation of data silos within the enterprise. This data was primarily structured, standardized, and heavily governed. Typical data volumes were in the range of a few terabytes, and in some cases, due to compliance and regulatory requirements, volumes went several notches higher.
Measuring data
1024 Bytes = 1 Kilobyte (KB)
1024 Kilobytes = 1 Megabyte (MB)
1024 Megabytes = 1 Gigabyte (GB)
1024 Gigabytes = 1 Terabyte (TB)
1024 Terabytes = 1 Petabyte (PB)
1024 Petabytes = 1 Exabyte (EB)
1024 Exabytes = 1 Zettabyte (ZB)
1024 Zettabytes = 1 Yottabyte (YB)
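The table above can be applied mechanically: each step up the scale divides by 1024. The following is a minimal Python sketch, assuming the binary (1024-based) convention used in this table; the helper names `human_readable` and `to_bytes` are hypothetical, not part of any library.

```python
# Units in the order of the table above; each step is a factor of 1024.
UNITS = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: int) -> str:
    """Scale a raw byte count to the largest unit it fills, e.g. 1536 -> '1.50 KB'."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    size = float(num_bytes)
    unit_index = 0
    # Keep dividing by 1024 while the value is still at least one of the next unit.
    while size >= 1024 and unit_index < len(UNITS) - 1:
        size /= 1024
        unit_index += 1
    return f"{size:.2f} {UNITS[unit_index]}"


def to_bytes(value: float, unit: str) -> int:
    """Convert a value expressed in one of UNITS back to a byte count."""
    exponent = UNITS.index(unit)  # Bytes=0, KB=1, MB=2, ...
    return int(value * 1024 ** exponent)


if __name__ == "__main__":
    print(human_readable(1536))           # 1.50 KB
    print(human_readable(5 * 1024 ** 4))  # 5.00 TB
    print(to_bytes(1, "PB"))              # 1125899906842624
```

Note that storage vendors often use powers of 1000 instead of 1024, so the same disk can appear with slightly different sizes depending on the convention.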
Data models
There are three types of data models:
- Structured data
This type describes data that is organised into a relational schema. Its fixed configuration and consistency allow it to answer simple queries and yield usable information, based on an organisation's parameters and operational needs.
- Semi-Structured data
This is a form of structured data that does not conform to an explicit, fixed schema. The data is inherently self-describing and contains tags or other markers to enforce hierarchies of records and fields within the data. Examples include weblogs and social media feeds.
- Unstructured data
This type of data consists of formats that cannot easily be indexed into relational tables for analysis or querying. Examples include images, audio, and video files.
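To make the three types concrete, here is a minimal Python sketch using only the standard library. It assumes an in-memory SQLite database (`sqlite3`) as a stand-in for a relational system and JSON as a stand-in for the tag-based format of weblogs or social media feeds; the customer rows, log lines, and byte payload are hypothetical examples.

```python
import json
import sqlite3

# Structured: rows that follow a fixed relational schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Chennai"), (2, "Ravi", "Pune")],
)
# Because the schema is fixed, a simple query yields usable information directly.
chennai = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("Chennai",)
).fetchall()

# Semi-structured: self-describing records whose fields vary from record to record.
log_lines = [
    '{"user": "asha", "action": "login", "device": {"os": "android"}}',
    '{"user": "ravi", "action": "post", "text": "hello", "tags": ["news"]}',
]
records = [json.loads(line) for line in log_lines]
# The keys act as tags that describe the data; not every record has every field.
all_fields = sorted({key for record in records for key in record})

# Unstructured: raw bytes with no schema; meaning must be extracted by other means.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47])  # the first four bytes of a PNG file signature

print(chennai)           # [('Asha',)]
print(all_fields)        # ['action', 'device', 'tags', 'text', 'user']
print(len(image_bytes))  # 4
```

The contrast is the point: the SQL query relies on a schema declared up front, the JSON records carry their own field names, and the byte string offers nothing to query until some other process (for example image recognition) interprets it.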
Characteristics of big data
According to Gartner, "big data is high volume, high velocity, and high variety information assets that demand cost effective, innovative forms of information processing for enhanced insight and decision making". The three main V's are volume, velocity, and variety.
- Volume - Volume refers to the amount of data, ranging from terabytes to petabytes.
- Velocity - Velocity represents speed: the rate at which the data changes and how fast it must be processed to gain business value.
- Variety - Big data means much more than traditional RDBMS data. It includes unstructured text, sound and movie files, images, documents, geolocation data, web logs, etc.
Big data also has two further characteristics:
- Veracity - The uncertainty of data: data in doubt or unpredictable data.
- Value - Big data is about supporting decisions; organisations need the ability to act on the data and derive value from it.
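The first three V's can be given rough numbers for a batch of incoming data. The sketch below is illustrative only: the `Record` type, the `profile` function, the format labels, and the choice to measure velocity as records per second over a fixed window are assumptions made for this example, not standard metrics.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    kind: str       # hypothetical format label, e.g. "text", "log", "image"
    payload: bytes  # raw content of the record


def profile(records: list[Record], window_seconds: float) -> dict:
    """Illustrative measures of volume, velocity and variety for one batch."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Volume: total size of all payloads, in bytes.
    volume = sum(len(r.payload) for r in records)
    # Velocity: how many records arrived per second during the window.
    velocity = len(records) / window_seconds
    # Variety: how many records of each format were seen.
    variety = dict(Counter(r.kind for r in records))
    return {"volume_bytes": volume, "velocity_per_sec": velocity, "variety": variety}


batch = [
    Record("text", b"hello"),
    Record("log", b"GET /index.html"),
    Record("image", bytes(1024)),  # 1024 zero bytes standing in for an image
    Record("text", b"hi"),
]
result = profile(batch, window_seconds=2.0)
print(result)
```

Veracity and value are harder to reduce to a single number, which is why the sketch stops at the first three V's.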
Handling the three V's helps organisations extract value from big data. That value comes from turning the three V's into the three I's:
- Informed intuition - Predicting likely future occurrences and which course of action is most likely to be successful.
- Intelligence - Looking at what is happening now, in real time or close to real time, and deciding what action to take.
- Insight - Reviewing what has happened and determining the action to take.
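The three I's roughly correspond to looking back, looking at the present, and looking ahead. The following Python sketch illustrates this on a hypothetical series of daily order counts; the function names, the threshold of 150, and the naive average-change forecast used for informed intuition are assumptions chosen for simplicity, not a recommended forecasting method.

```python
from statistics import mean

# Hypothetical daily order counts, oldest first.
daily_orders = [120, 125, 130, 128, 140, 150, 155]


def insight(history: list[int]) -> dict:
    """Insight: review what has happened with a simple historical summary."""
    return {"average": mean(history), "best_day": max(history)}


def intelligence(latest: int, threshold: int) -> str:
    """Intelligence: look at the current value and decide whether to act now."""
    return "act" if latest > threshold else "wait"


def informed_intuition(history: list[int]) -> float:
    """Informed intuition: predict the next value from the average day-to-day change."""
    if len(history) < 2:
        raise ValueError("need at least two observations to estimate a trend")
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    return history[-1] + mean(changes)


print(insight(daily_orders))
print(intelligence(daily_orders[-1], threshold=150))  # act
print(informed_intuition(daily_orders))
```

In practice each step would draw on far larger and more varied data, but the division of labour is the same: summarise the past, react to the present, and estimate the future.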
Why is big data necessary?
The convergence across business domains has led to a new economic system that is redefining relationships among producers, distributors, and consumers of goods and services. Within an organisation, this complexity makes it difficult for business leaders to rely solely on experience (or pure intuition) to make decisions. They need to rely on good data services for their decisions. By placing data at the heart of business operations to provide access to new insights, organisations will be able to compete more effectively.
Three things have come together to drive attention to big data:
- Increased storage capacity.
- Increased processing power.
- Availability of new types of data.