What is Big Data?
Big Data refers to data sources that are large, ever-growing and, in many cases, complex. The data usually comes from cheap, widely available sources: daily transactions in small retail stores, sensors that monitor a machine's health, user behaviour on websites, and so on. While managing such data might sound easy, the sheer number of data points is massive. Volume, Variety and Velocity are the determining factors of Big Data; together they are termed the 3 Vs. How could an ordinary computer run complex analysis on such data? This is where the concept of Big Data comes in.
Tools Used in Big Data
Firstly, Big Data analytics requires a lot of computation to get right. To set up these processes, we need tools that can guide us along the way. Here is a list of a few common tools used in Big Data.
- Apache Hadoop: Hadoop is an open-source framework, written in Java but cross-platform, that lets you store Big Data in a distributed environment so that you can process it in parallel.
- Apache Spark: Spark is best seen as an alternative to, or successor of, Hadoop's MapReduce engine. Essentially, Spark was built to overcome MapReduce's drawbacks, chiefly by keeping intermediate results in memory instead of on disk (see the sketch after this list).
- Apache Cassandra: Cassandra is an open-source NoSQL DBMS whose main function is to manage large volumes of data across many servers. It employs CQL (Cassandra Query Language) to interact with the database.
- Apache Storm: Storm is another Apache project: a free, open-source framework for processing data streams in real time.
- Apache Hive: Hive is a Java-based, cross-platform data-warehouse tool that facilitates data summarization, querying and analysis.
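To make the idea of distributed, parallel processing more concrete, here is a minimal PySpark sketch. Treat it as an illustration under assumptions rather than a canonical recipe: it assumes pyspark is installed, and the file events.csv and its category column are invented for the example.

```python
# Minimal PySpark sketch: count records per category in a (hypothetical) file.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("BigDataIntro").getOrCreate()

# Spark splits the input into partitions and processes them in parallel,
# across a cluster or across local CPU cores when run standalone.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
counts = events.groupBy("category").count()
counts.show()

spark.stop()
```

The same code runs unchanged on a laptop or on a cluster; only the Spark deployment changes, which is exactly the appeal of such frameworks.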
How Does Big Data Work?
There are three main actions behind any Big Data workflow: Integration, Management and Analysis. In other words, every step of a Big Data process can be classified under one of these names.
- Integration
As mentioned earlier, the sheer number of data points is what makes it "Big Data". These large streams of data from ubiquitous sources bring their own problems: the volumes we create are hard to imagine, and quite often we deal with petabytes of data or even more. Hence, to make Big Data useful, we first have to collect the data, process it and format it to suit the needs of the analysts, as the sketch below illustrates.
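As a toy illustration of integration, the Python sketch below pulls records from two hypothetical sources, a retail CSV file and a sensor log, and normalizes them into one common format. All file names and field names here are assumptions, not a standard schema.

```python
# Toy integration step: merge two differently formatted sources into one schema.
import csv
import json

def from_store_csv(path):
    # Retail transactions arrive as CSV rows with 'amount' and 'time' columns.
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"source": "store", "value": float(row["amount"]), "timestamp": row["time"]}

def from_sensor_json(path):
    # Machine-health sensors emit one JSON object per line.
    with open(path) as f:
        for line in f:
            reading = json.loads(line)
            yield {"source": "sensor", "value": reading["value"], "timestamp": reading["ts"]}

# One uniformly formatted dataset, ready for storage and analysis.
records = list(from_store_csv("sales.csv")) + list(from_sensor_json("sensors.jsonl"))
```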
- Management
After getting your data from various sources, you need to store it. Storage is required if we want to access the data later and, most importantly, it is required by the computers that analyse the data. We can store the data locally if the resources are available. However, most people and companies prefer cloud services, which are easier to manage and can be scaled up simply by requesting new capacity when required.
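Here is a small sketch of the storage step, again with PySpark: the same write call targets local disk or a cloud bucket, and only the path changes. Both paths are placeholders, and writing to an s3a:// bucket would additionally require the appropriate cloud connector to be configured.

```python
# Store the integrated dataset in a columnar format for later analysis.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Storage").getOrCreate()
records = spark.read.json("records.jsonl")  # the integrated dataset from earlier

records.write.mode("overwrite").parquet("/data/records")              # local storage
# records.write.mode("overwrite").parquet("s3a://my-bucket/records")  # cloud storage

spark.stop()
```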
- Analysis
Now that we have all the required data stored, we can get to analysing it. Analysis is one of the most important parts of the process, as it allows people to make sense of all the collected data. A business can use it, for example, to conduct market research and thereby improve sales.
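Continuing the sketch, a simple analysis pass might aggregate the stored records into figures an analyst can act on. The column names region and amount are assumed purely for illustration.

```python
# Summarize stored data: revenue and transaction count per region.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("Analysis").getOrCreate()
sales = spark.read.parquet("/data/records")

summary = (sales.groupBy("region")
                .agg(F.sum("amount").alias("revenue"),
                     F.count("*").alias("transactions"))
                .orderBy(F.desc("revenue")))  # highest revenue first
summary.show()

spark.stop()
```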
Uses of Big Data
The term Big Data was coined in 2005, although the idea itself has been around for longer. Most commonly, Big Data has been used to make important business decisions based on several market factors, but there are many more use cases. Let us look at some of them.
- Big Data extracted from media, entertainment and social media
The data generated by millions of users on these sites is sent to databases for analysis. This analysis has plenty of benefits, such as:
- Optimizing content recommendations (a toy sketch follows this list)
- Inferring the general interests of a user
- Displaying more relevant advertisements
However, these benefits come with drawbacks as well, including privacy concerns.
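As a toy example of the first benefit, recommendations can be driven by co-occurrence: suggest content that users with overlapping interests also consumed. Real systems are far more sophisticated; the data and logic below are invented purely to show the idea.

```python
# Toy co-occurrence recommender over invented user histories.
from collections import Counter

histories = [
    {"cooking", "travel", "music"},
    {"cooking", "travel"},
    {"music", "gaming"},
]

def recommend(seen, histories, top_n=2):
    scores = Counter()
    for history in histories:
        if history & seen:                 # this user shares an interest with us
            scores.update(history - seen)  # so their other content is a candidate
    return [item for item, _ in scores.most_common(top_n)]

print(recommend({"cooking"}, histories))  # ['travel', 'music']
```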
- Big Data generated by weather stations
Weather stations (both public and private) generate huge amounts of data on the local weather of every town in every country. This data can be used to monitor and even predict weather conditions. Other uses include studying global warming, predicting natural disasters and forecasting the availability of water in many parts of the world. One such product, launched in 1996, was IBM's Deep Thunder project, whose main aim was to improve local weather forecasting using high-performance computing.
- Big Data in banking security
With every passing day, the number of transactions increases at an unimaginable rate, and an increase in online transactions inevitably brings an increase in fraudulent ones. The data generated by each transaction is extremely valuable for stopping fraud. The transactions a customer makes on a normal day can be classified and stored, and machine learning algorithms can then be trained to recognize these patterns in the real world. If an algorithm detects an anomalous transaction, it flags it to be checked by a person. With this principle, banks can prevent many different types of fraudulent activity.
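As a hedged sketch of that idea, the example below uses scikit-learn's IsolationForest, a standard anomaly detector; banks do not publish their exact methods, and the transaction data here is invented.

```python
# Flag anomalous transactions with an Isolation Forest (invented data).
import numpy as np
from sklearn.ensemble import IsolationForest

# One row per transaction: [amount, hour of day]. Four routine purchases,
# then one suspicious large transaction in the middle of the night.
X = np.array([[25, 10], [40, 12], [30, 14], [35, 11], [5000, 3]])

model = IsolationForest(contamination=0.2, random_state=0).fit(X)
flags = model.predict(X)  # -1 marks an anomaly, 1 marks normal behaviour

for tx, flag in zip(X, flags):
    if flag == -1:
        print(f"Flag for human review: amount={tx[0]}, hour={tx[1]}")
```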
Conclusion
I hope you all learned something new about the concept of Big Data. Although it is a vast topic with lots to discuss, I have tried my best to cram in as much information as possible. Finally, if you have any questions, comments or suggestions, you can leave them in the comments section below.
Happy Learning!! 😃