In this article, I will go through 6 advantages and 6 disadvantages of Apache Spark. By the end of this post, you will know the pros and cons of Apache Spark.
Let's get started.
Advantages of Apache Spark
1. Speed
Unlike frameworks such as Hadoop MapReduce, Apache Spark does not write intermediate results to local disk between processing steps. It keeps data in RAM wherever possible, so its processing speed is much higher, especially for big data. The Spark project has advertised speeds of up to 100x faster than Hadoop MapReduce for in-memory workloads. That is why Spark is a preferred option for large-scale data processing involving petabytes of data.
2. User Friendliness
Apache Spark lets you process large datasets through its APIs. These APIs come with more than 80 high-level operators for transforming data, including semi-structured data. As a result, creating parallel applications is largely hassle-free.
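To give a feel for this operator style, here is a minimal word-count sketch in plain Python. It is not Spark's API: it only mimics, on a single machine, the shape of a pipeline built from Spark's `flatMap`, `map` and `reduceByKey` operators. The sample `lines` and the helper `add_pair` are made up for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical input; in Spark this would be a distributed dataset.
lines = ["spark is fast", "spark is easy to use"]

# Step similar to flatMap: split every line into words.
words = [word for line in lines for word in line.split()]

# Step similar to map: pair each word with a count of 1.
pairs = [(word, 1) for word in words]

# Step similar to reduceByKey: add up the counts for each word.
def add_pair(totals, pair):
    word, count = pair
    totals[word] += count
    return totals

counts = reduce(add_pair, pairs, Counter())
```

In Spark, each of these steps would run in parallel across the cluster, while the code would stay about this short.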
3. Big Data Access
Apache Spark makes big data widely available by offering many ways to access it. More and more data scientists and engineers are being trained on Spark so that they can make use of this data.
4. Machine learning & Data Analysis
Apache Spark facilitates both machine learning and data analysis through its libraries. For example, Spark comes with a framework that can be used to extract and transform information, including structured data.
5. Standard Libraries
Spark comes with high-level standard libraries. These libraries provide support for machine learning (MLlib), SQL queries (Spark SQL) and graph processing (GraphX). Developers using these libraries can be more productive, and even tasks that require complex workflows can be accomplished easily with Spark.
6. Career Demand
Apache Spark is a great option for those who want to pursue a career in big data. Spark engineers can enjoy significant benefits in terms of both remuneration and work. Once they have enough experience, their skills are in high demand, and companies are willing to hire them with attractive salary packages.
Disadvantages of Apache Spark
1. Cost
Cost effectiveness is a factor that needs to be considered with Apache Spark. Keeping data in memory is not cost efficient when it comes to processing big data. In-memory processing generally requires a tremendous amount of RAM, and the higher the memory consumption, the higher the expenses.
2. Small File Issue
Issues with small files are common when Apache Spark is combined with Hadoop. Hadoop uses its own file system, known as the Hadoop Distributed File System (HDFS). HDFS is designed to handle a small number of large files rather than a large number of small files.
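A rough calculation shows why file count matters. HDFS keeps metadata for every file and block in the memory of its NameNode, so the same amount of data costs far more metadata when it is split into small files. The sketch below is only an estimate under stated assumptions: the figure of 150 bytes per metadata object is a commonly quoted rule of thumb, the 128 MiB block size is an assumed setting, and the function `namenode_bytes` is a hypothetical helper, not part of Hadoop or Spark.

```python
import math

BYTES_PER_OBJECT = 150         # assumption: rule-of-thumb metadata cost per file or block
BLOCK_SIZE = 128 * 1024 ** 2   # assumption: HDFS block size of 128 MiB

def namenode_bytes(total_bytes, file_size):
    """Estimate metadata memory for total_bytes stored as files of file_size bytes."""
    n_files = math.ceil(total_bytes / file_size)
    blocks_per_file = math.ceil(file_size / BLOCK_SIZE)
    # One metadata object per file, plus one per block of that file.
    return n_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

ONE_TIB = 1024 ** 4
small_files = namenode_bytes(ONE_TIB, 1024 ** 2)   # 1 TiB stored as 1 MiB files
large_files = namenode_bytes(ONE_TIB, 1024 ** 3)   # 1 TiB stored as 1 GiB files

print(small_files, large_files, small_files / large_files)
```

Under these assumptions, storing 1 TiB as 1 MiB files needs more than 200 times the metadata memory of storing it as 1 GiB files.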
3. Lack of Real Time Processing
The incoming live stream of data is divided into batches. Each of these batches is represented as a Resilient Distributed Dataset (RDD). Once a batch has arrived, it is processed with the usual operations, and the results are again emitted in batches. This process is known as micro-batch processing. As a result, Spark supports near real-time processing, but not true real-time data processing.
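The idea of micro-batching can be sketched in a few lines of plain Python. This is a simplified illustration, not Spark's streaming API: the function `micro_batches`, the sample `events` and the 1-second interval are all assumptions made for the example.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width time windows."""
    batches = {}
    for timestamp, value in events:
        index = int(timestamp // batch_interval)   # which window the event falls into
        batches.setdefault(index, []).append(value)
    return [batches[i] for i in sorted(batches)]

# Hypothetical stream: (arrival time in seconds, payload).
events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.7, "d"), (2.9, "e")]
batches = micro_batches(events, batch_interval=1.0)

# The event that arrived at t=0.1 is only handled once its window closes at t=1.0,
# so it waits about 0.9 seconds before any processing starts.
print(batches)
```

This waiting time is the reason micro-batching cannot match systems that process each record the moment it arrives.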
4. No File Management System
Apache Spark does not have a file management system of its own. It relies on third-party systems: it either needs to be combined with the Hadoop Distributed File System (HDFS) or used along with a cloud-based data platform. This dependency can make Spark less efficient than platforms that come with their own storage.
5. Manual Optimization
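Automation is a major trend in the technological world, and most popular platforms today favor it. Apache Spark, however, does not optimize everything automatically. Although SQL and DataFrame queries go through Spark's Catalyst optimizer, code written against the RDD API is not optimized automatically, and settings such as partitioning and caching have to be tuned manually.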
6. Pressure Control
Apache Spark is prone to a condition known as back pressure. It occurs when a data buffer fills up completely and can no longer accept incoming data. When this happens, all other data has to queue up, and this backlog cannot be transferred until the buffer is cleared. By default, Spark does not control this back pressure from the data buffer, so it has to be handled manually.
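The effect of a full buffer can be reproduced with a bounded queue from Python's standard library. This is a toy illustration of the condition described above, not Spark code: the buffer size of 3 and the five incoming records are arbitrary assumptions.

```python
import queue

buffer = queue.Queue(maxsize=3)   # bounded buffer; the size is arbitrary

accepted, waiting = [], []
for record in range(5):           # the producer sends faster than the consumer reads
    try:
        buffer.put_nowait(record)
        accepted.append(record)
    except queue.Full:
        waiting.append(record)    # buffer is full, so the record has to wait

# The consumer processes one record, which frees exactly one slot.
processed = buffer.get_nowait()
buffer.put_nowait(waiting.pop(0))

print(accepted, waiting, processed)
```

Only after the consumer frees a slot can the next waiting record move into the buffer, which is the kind of backlog that builds up when the input rate exceeds the processing rate.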