Spark Context

Balasubramaniyan Sellamuthu
Feb 19, 2020

In Spark 1.x you initialize a SparkContext (and a separate SQLContext) directly; from version 2.0 onward, SparkContext and SQLContext are unified under SparkSession.

Creating a SparkContext — first create a SparkConf object with the syntax below:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("SparkTestApp")

Then initialize the SparkContext by passing the conf object:

val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // deprecated as of 2.0
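With the context in hand, a quick sanity check (a minimal sketch, not from the original post):

// Build a small RDD and confirm the context is working
val rdd = sc.parallelize(1 to 10)
println(rdd.sum()) // 55.0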

In version 2.0 and above, we can use SparkSession instead:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Spark_Test").master("local[*]").enableHiveSupport().getOrCreate()
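A quick way to confirm the session is alive (an illustrative sketch):

// Run a trivial DataFrame and a trivial SQL query through the session
spark.range(5).toDF("id").show()
spark.sql("SELECT 1 AS one").show()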

Getting All Configuration Values

spark.sparkContext.getConf.getAll.foreach(println) or spark.conf.getAll.foreach(println)

(spark.sql.catalogImplementation,hive)
(spark.driver.host,192.168.0.13)
(spark.app.name,SparkDataFrames)
(spark.master,local[*])
(spark.executor.id,driver)
(spark.driver.port,58959)
(spark.app.id,local-1582091700372)
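To read a single property instead of the whole list, spark.conf.get works like this (a small sketch; the key names are just examples):

// get(key) throws if the key is unset; the two-argument form takes a default
val master = spark.conf.get("spark.master")
val cores = spark.conf.get("spark.driver.cores", "1")
println(s"master=$master, driver cores=$cores")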

Changing Spark Configurations

1. While defining the SparkSession

val spark = SparkSession.builder.config("spark.driver.cores", 16).appName("Spark_Test").master("local[*]").enableHiveSupport().getOrCreate()

2. Adding a configuration after initializing the SparkSession

spark.conf.set("spark.logConf", false)
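Note that only runtime-modifiable properties actually take effect this way; many core properties are fixed once the session starts. Since Spark 2.4 you can check first (a small sketch; the key below is just an example):

// isModifiable reports whether a key can still be changed at runtime
if (spark.conf.isModifiable("spark.sql.shuffle.partitions"))
  spark.conf.set("spark.sql.shuffle.partitions", "64")
else
  println("Static property - set it when building the session instead")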

3. Supplying it on the command line via spark-submit

./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
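When the same property is set in more than one place, values set directly on SparkConf in the application take the highest precedence, followed by flags passed to spark-submit, then entries in spark-defaults.conf.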

Commonly Used Spark Configurations

spark.app.name
spark.driver.cores
spark.driver.memory
spark.executor.memory
spark.extraListeners
spark.driver.maxResultSize
spark.local.dir
spark.logConf
spark.master
spark.submit.deployMode
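Putting a few of these together when building a session (illustrative values only, not tuning advice):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Assemble several properties on one SparkConf and hand it to the builder
val conf = new SparkConf()
  .setAppName("Spark_Test")
  .setMaster("local[*]")
  .set("spark.driver.maxResultSize", "1g")
  .set("spark.local.dir", "/tmp/spark-scratch")
  .set("spark.logConf", "true")

val spark = SparkSession.builder.config(conf).getOrCreate()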
