If we need to connect to Hive using JDBC, we need the server name and the port number where HiveServer2 is hosted. If we don't have them handy, we can run the commands below in the Hive CLI to retrieve them; a minimal connection sketch follows the output.

hive> set hive.server2.thrift.port;

hive.server2.thrift.port=10000

hive> set hive.metastore.uris;

hive.metastore.uris=thrift://hiveip.abc.com:9083,thrift://hiveip2.abc.com:9083
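
With the host and port in hand, connecting over JDBC looks roughly like the sketch below. It assumes HiveServer2 is reachable at hiveip.abc.com on the default thrift port 10000 from the output above, that the hive-jdbc driver is on the classpath, and that the user/password values are placeholders for your own credentials.

import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")   // register the HiveServer2 JDBC driver
val conn = DriverManager.getConnection("jdbc:hive2://hiveip.abc.com:10000/default", "user", "password")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SHOW TABLES")          // list tables in the default database
while (rs.next()) println(rs.getString(1))
conn.close()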


In Spark 1.x we initialize a SparkContext directly; from version 2.0 onwards, SparkContext and SQLContext are unified under SparkSession.

Creating Spark Context: create a SparkConf() object with the syntax below

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[*]").setAppName("SparkTestApp")

Initialize the SparkContext by passing the conf object:

val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // deprecated since 2.0; use SparkSession instead

In version 2.0 and above we can use SparkSession:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Spark_Test").master("local[*]").enableHiveSupport().getOrCreate()

Getting All Configuration Values

spark.sparkContext.getConf.getAll.foreach(println) or spark.conf.getAll.foreach(println)

(spark.sql.catalogImplementation,hive)
(spark.driver.host,192.168.0.13)
(spark.app.name,SparkDataFrames)
(spark.master,local[*])
(spark.executor.id,driver)
(spark.driver.port,58959)
(spark.app.id,local-1582091700372)

Changing Spark Configurations

1. While defining the SparkSession

val spark = SparkSession.builder.config("spark.driver.cores", 16).appName("Spark_Test").master("local[*]").enableHiveSupport().getOrCreate()

2. Adding a configuration after initializing the SparkSession (a read-back check is sketched after this list)

spark.conf.set("spark.logConf", false)

3. Supplying it on the command line

./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar

Different Spark Configurations

spark.app.name
spark.driver.cores
spark.driver.memory
spark.executor.memory
spark.extraListeners
spark.driver.maxResultSize
spark.local.dir
spark.logConf
spark.master
spark.submit.deployMode
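
A sketch wiring a few of the properties above into a SparkSession builder; the values are illustrative only, not recommendations. Note that driver-side properties such as spark.driver.memory only take effect when set before the driver JVM starts (for example via spark-submit), not from inside a running application.

val spark = SparkSession.builder
  .appName("Spark_Test")                      // spark.app.name
  .master("local[*]")                         // spark.master
  .config("spark.executor.memory", "2g")
  .config("spark.driver.maxResultSize", "1g")
  .config("spark.logConf", "true")            // log the effective config at startup
  .getOrCreate()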


Step 1: If you are building a standalone application, initialize the Spark context through SparkSession as shown above. If you are using spark-shell, a SparkSession is created for you automatically (available as spark, with the SparkContext as sc).

Step 2: Using the SparkContext (sc), read the file (here I have used the sherlock-holmes.txt file) and create an RDD…
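
A minimal sketch of Step 2, assuming sherlock-holmes.txt sits in the current working directory (any path sc.textFile accepts would work):

val lines = sc.textFile("sherlock-holmes.txt")  // RDD[String], one element per line
println(lines.count())                          // total number of lines
lines.take(5).foreach(println)                  // peek at the first few lines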
