The project also contains a “pom.xml” file, the main configuration file of every Maven project. It lists all of the external dependency information for our project. Since we created a Spark project, this file contains the “spark-core” and “spark-sql” libraries. Maven automatically downloads these dependencies from the central Maven repository and saves them in a local folder; future projects simply reuse those local copies.
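As a sketch, the dependency section of such a “pom.xml” might look like the following. The version numbers and the “_2.12” Scala suffix are illustrative assumptions; match them to the Spark and Scala versions installed on your system.

```xml
<!-- Illustrative dependency section; versions are assumptions, not prescriptions -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.3.0</version>
  </dependency>
</dependencies>
```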
For our first project, it takes some extra time to download all the dependencies to our local system. After a while, the new project will be ready. Now we can create a new Scala object under the “sample” package. Java is a package-based language, and Scala follows the same tradition.
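The package-and-object structure we are about to create can be sketched as follows. This is a minimal, standalone example; the object name “HelloSample” is hypothetical and stands in for whatever name you choose.

```scala
// A Scala object inside a package; like a Java class,
// the JVM entry point is the main method.
package sample

object HelloSample {
  // Logic kept separate from I/O so it is easy to inspect
  def greeting: String = "Hello from the sample package"

  def main(args: Array[String]): Unit =
    println(greeting)
}
```

An `object` in Scala is a singleton, so no instance needs to be created before `main` runs.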
Give a valid name to the new Scala object and choose “Object” as the “Kind”.
I have downloaded the Pincode data from the official website of the Indian government for evaluation purposes and saved it to our default data folder. Save your data using the same folder structure so that the relative file path used in the code resolves correctly.
We can add the source code below to our “indianpincode” object file.
indianpincode.scala
package sample

import org.apache.spark.sql.SparkSession

object indianpincode {
  def main(arg: Array[String]): Unit = {
    // Create (or reuse) a local Spark session for this application
    val sparkSession: SparkSession = SparkSession.builder.master("local").appName("Scala Spark Example").getOrCreate()
    // Read the CSV file, inferring column types and treating the first row as a header
    val csvPO = sparkSession.read.option("inferSchema", true).option("header", true).csv("all_india_PO.csv")
    // Register the DataFrame as a temporary view so it can be queried with SQL
    csvPO.createOrReplaceTempView("tabPO")
    // Count the rows via a SQL query and print the result
    val count = sparkSession.sql("select * from tabPO").count()
    println(count)
  }
}
In this code, we have imported the “org.apache.spark.sql.SparkSession” class. SparkSession lets us create a Spark session at runtime, and through it we can easily execute all Spark-related queries in our project.