Introduction
Hey fellas, recently I got an opportunity to work on SSRS (SQL Server Reporting Services), which led me to Big Data Analytics. And because of Big Data trends, I became interested in exploring the Tableau software. There are now many tools available related to Big Data; I chose Tableau.
I am not an expert in Tableau. I am still learning this technology, and I am writing this article because I want to share my experience and the things that I learn. So if there are any mistakes in the content, please let me know in the comments.
In this article, I will write about a few basic things that I have learned, and I will continue writing a few more articles as I proceed. First, let us get started by talking about why we need Big Data systems.
Why Big Data?
Colin White says that global digital content created will increase some 30 times over the next ten years, to 35 zettabytes. One way of looking at big data is that it represents the large and rapidly growing volume of information that is mostly untapped by existing analytical applications and data warehousing systems. Examples of this data include high-volume sensor data, weblogs, videos, images, and social networking information from websites such as Facebook and Twitter. Organizations are interested in capturing and analyzing this data because it can add significant value to the decision-making process. Such processing, however, may involve complex workloads that push the boundaries of what is possible using traditional data warehousing and data management techniques and technologies.
Facebook grew to 500 million monthly active users in Q3 of 2010, and by Q3 of 2012 it had eclipsed a billion. So in two years, they doubled their user base. Think about the insight behind even one small change they made to the sign-up process, something they learned from analyzing all the weblog data and web traffic data (their big data), and how it helped them double. This is a primary reason, in my mind, why many big companies are making huge investments in the Big Data space.
In classifying big data, Tata Consultancy Services Limited (TCS) looked at how much of the companies' data was structured versus unstructured, as well as how much was generated internally versus externally.
- 51% of data is structured
- 27% of data is unstructured
- 21% of data is semi-structured
A much higher than anticipated percentage of data was not structured (either unstructured or semi-structured), and a little less than a quarter of the data was external. (Source: Tata Consultancy Services Limited; The Emerging Big Returns on Big Data).
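To make the "not structured" share concrete, here is a small sketch that simply adds up the percentages quoted above. The variable names are my own, and the note about rounding is only my assumption for why the three figures total 99 rather than 100.

```python
# Percentages as quoted in the TCS breakdown above.
shares = {"structured": 51, "unstructured": 27, "semi-structured": 21}

# Everything that is not plainly structured.
not_structured = shares["unstructured"] + shares["semi-structured"]

# The three quoted figures add up to 99; I assume this is a rounding effect.
total = sum(shares.values())

print(f"Not structured: {not_structured}%")
print(f"Total of quoted figures: {total}%")
```

So going by these numbers, almost half of the data does not fit the structured category.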
What is Big Data?
- Unstructured: Much of Big Data is unstructured. This includes videos, images, documents, and many other kinds of data whose structure doesn't fit nicely into rows and columns like a relational database schema.
- Petabytes+: Big Data systems are petabytes in scale. Companies like Facebook and Google store massive amounts of information. They need a way to analyze the data, and these systems are the way they have chosen.
- Evolution of RDBMS: Big data systems are the evolution of RDBMS. These are new ways of storing data that allow much greater flexibility and much more scalability.
- Many Platforms: Big Data systems are not a single thing. There are many types and a variety of platforms, and even within those there are various tools and technologies.
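To picture what "doesn't fit nicely into rows and columns" means in the first point above, here is a rough sketch. All the field names and sample values are hypothetical and made up for illustration. Nested records like these are usually called semi-structured; raw video or image content has even less shape than this.

```python
import json

# A structured record: every row has the same fixed columns.
columns = ("user_id", "name", "signup_date")
row = (101, "Asha", "2012-09-30")
structured = dict(zip(columns, row))

# Hypothetical items that do not share one schema: each has its own shape.
documents = [
    {"type": "photo", "file": "beach.jpg", "tags": ["holiday", "sea"]},
    {"type": "post", "text": "Hello world",
     "comments": [{"by": "Ravi", "text": "Hi"}]},
]


def field_names(items):
    """Return the set of top-level field names used by each item."""
    return [set(item) for item in items]


shapes = field_names(documents)
# True only if every item uses exactly the same fields as the first one.
same_shape = all(shape == shapes[0] for shape in shapes)

print(json.dumps(structured))
print("All documents share one shape:", same_shape)
```

The first record maps straight onto a table with three columns, while the two documents would each need different columns (and nested ones at that), which is the kind of data these systems are meant to handle.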
Managing and Analyzing Big Data
For the past two decades, most business analytics have been created using structured data extracted from operational systems and consolidated into a data warehouse. Big data dramatically increases both the number of data sources and the variety and volume of data that is useful for analysis. A high percentage of this data is often described as multi-structured to distinguish it from the structured operational data used to populate a data warehouse. In most organizations, multi-structured data is growing at a considerably faster rate than structured data. Two important data management trends for processing big data are relational DBMS products optimized for analytical workloads (often called analytic RDBMSs, or ADBMSs) and non-relational systems (sometimes called NoSQL systems) for processing multi-structured data. A non-relational system can be used to produce analytics from big data or to preprocess big data before it is consolidated into a data warehouse, says Colin White.
Summary
In this article, we have learned about Big Data and its importance.