Spark RDD Operations
- Transformation
- Action
Transformation in Spark
A Spark transformation is a function that produces a new RDD from existing RDDs (the same idea applies to DataFrames and Datasets). It takes an RDD as input and returns one or more RDDs as output.
Applying a transformation always creates a new RDD. The input RDDs are never changed, because RDDs are immutable.
Examples of transformations:
- Narrow transformation - map(), mapPartitions(), flatMap(), filter(), union()
- Wide transformation - groupByKey(), aggregateByKey(), join(), repartition(), etc.
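The narrow/wide split can be illustrated without a cluster. The sketch below is a toy model in plain Python, not Spark code: an "RDD" is just a list of partitions, and the helper names (narrow_map, wide_group_by_key) and the byte-sum partitioner are assumptions made for illustration. Spark's default HashPartitioner follows the same idea of assigning each key to a partition by hash modulo the partition count.

```python
from collections import defaultdict

# Toy stand-in for an RDD: a list of partitions, each holding (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]

def narrow_map(parts, fn):
    """Narrow: each output partition is built from exactly one input partition."""
    return [[fn(record) for record in part] for part in parts]

def wide_group_by_key(parts, num_partitions):
    """Wide: records move between partitions so equal keys end up together."""
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            # Hypothetical deterministic partitioner (sum of key bytes).
            target = sum(key.encode()) % num_partitions
            buckets[target][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]

# New structures are returned each time; the input list is left untouched.
doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
grouped = wide_group_by_key(doubled, num_partitions=2)

print(doubled)  # [[('a', 2), ('b', 4)], [('a', 6), ('c', 8)]]
print(grouped)  # [[('b', [4])], [('a', [2, 6]), ('c', [8])]]
```

In the narrow case the partition boundaries are preserved. In the wide case the two values for key "a" started in different partitions and had to be brought together, which is the data movement (shuffle) that makes wide transformations more expensive.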
Action in Spark
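Transformations are lazy: they only describe how one RDD is derived from another. When we want to work with the actual data, we call an action, which triggers the computation and returns a result to the driver program or writes it to storage.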
Examples of actions:
collect(), count(), first(), top(), aggregate(), etc.
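How transformations and actions interact can be mimicked in plain Python. The ToyRDD class below is a hypothetical, heavily simplified model and not part of any Spark API. It assumes a single in-memory list as the source and supports only map and filter. Transformations just record a step and return a new object, while actions replay the recorded steps over the source data.

```python
class ToyRDD:
    """Hypothetical, simplified model of an RDD (not a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = source  # original data, never modified
        self._steps = steps    # recorded transformations, in order

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _compute(self):
        # Replay the recorded steps lazily over the source data.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: run the recorded steps and return a plain value.
    def collect(self):
        return list(self._compute())

    def count(self):
        return sum(1 for _ in self._compute())

    def first(self):
        return next(self._compute())


calls = []  # records every element that square() actually processes

def square(x):
    calls.append(x)
    return x * x

numbers = ToyRDD([1, 2, 3, 4, 5])
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)

calls_before_action = list(calls)  # still empty: no action has run yet
result = evens_squared.collect()   # the action triggers the computation
total = evens_squared.count()

print(calls_before_action)  # []
print(result)               # [4, 16]
print(total)                # 2
```

As in Spark without caching, each action in this toy model recomputes the result from the source, and the original numbers object is unaffected by the transformations derived from it.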