Apache Spark Tuning Configurations Tutorial

Below are some PySpark configuration properties commonly used to tune Spark jobs. The values shown are starting points; the right numbers depend on your cluster size and workload.

1. Memory Management

spark.executor.memory = 4g
spark.driver.memory = 2g
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = 1g

Note: spark.executor.memory sets the heap size of each executor process, and spark.driver.memory sets the driver's heap size. spark.memory.offHeap.enabled has no effect unless spark.memory.offHeap.size is also set to a positive value, so the two should always be configured together.
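One way to apply the settings above is through spark-submit --conf flags. A minimal sketch of building those flags from a dict (the helper name and job.py path are illustrative, not from the original); it includes spark.memory.offHeap.size, which Spark requires whenever off-heap memory is enabled:

```python
# Memory settings from the section above, as a plain dict.
# spark.memory.offHeap.size must be > 0 once off-heap is enabled.
MEMORY_CONF = {
    "spark.executor.memory": "4g",
    "spark.driver.memory": "2g",
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "1g",
}

def to_submit_args(conf):
    """Render a config dict as a list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

# Print the full spark-submit command line for a hypothetical job.py.
print(" ".join(["spark-submit", *to_submit_args(MEMORY_CONF), "job.py"]))
```

The same keys can equally be set in spark-defaults.conf or on SparkSession.builder.config(); keeping them in one dict makes it easy to switch between those mechanisms.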

2. Parallelism

spark.executor.cores = 4
spark.default.parallelism = 8

Note: spark.executor.cores sets the number of cores each executor may use, and spark.default.parallelism sets the default number of partitions for RDDs returned by shuffle transformations such as join and reduceByKey.
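The Spark tuning guide recommends roughly 2-3 tasks per CPU core in the cluster, so spark.default.parallelism is usually derived from the executor count and cores per executor rather than picked in isolation. A small sketch of that rule of thumb (the function name and its default factor are my own, hypothetical):

```python
def suggested_parallelism(num_executors, cores_per_executor, tasks_per_core=2):
    """Rule of thumb from the Spark tuning guide: 2-3 tasks per core."""
    return num_executors * cores_per_executor * tasks_per_core

# Example: 2 executors with spark.executor.cores = 4 each
# gives 2 * 4 * 2 = 16 as a starting value for spark.default.parallelism.
print(suggested_parallelism(2, 4))
```

If stages show many idle cores or a few very large tasks, raise the factor toward 3 and re-measure.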
