Read and Query a Parquet File in a Spark Shell

Basic Read and Query

// Read the file into a DataFrame.
> val df = spark.read.parquet("filename.parquet")

// Print the DataFrame schema.
> df.printSchema()

// Show the data (first 20 rows by default).
> df.show()

// Count the number of records.
> df.count()
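The commands above assume a Parquet file already exists. As a self-contained sketch of the same read-and-count flow (assuming Spark is on the classpath; the path, data, and column names here are made up for illustration), a small DataFrame can be written out as Parquet and read back:

```scala
import org.apache.spark.sql.SparkSession

object ParquetRoundTrip {
  def main(args: Array[String]): Unit = {
    // Standalone session for demonstration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("parquet-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: write a tiny DataFrame out as Parquet ...
    val path = "/tmp/demo.parquet"
    Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
      .write.mode("overwrite").parquet(path)

    // ... then read it back, as in the commands above.
    val df = spark.read.parquet(path)
    df.printSchema()
    df.show()
    println(s"count = ${df.count()}")  // prints "count = 3"

    spark.stop()
  }
}
```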

Spark SQL

// Read the file into a DataFrame.
> val df = spark.read.parquet("filename.parquet")

// Create a temporary view over the DataFrame.
> df.createOrReplaceTempView("dfv") // dfv = "data frame view"

// Run a SQL SELECT query against the view.
> val res = spark.sql("SELECT * FROM dfv")
> res.show()
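A temp view supports ordinary SQL clauses such as WHERE and ORDER BY, not just SELECT *. A minimal self-contained sketch (the `id` and `label` columns are hypothetical; a real DataFrame would come from spark.read.parquet as above):

```scala
import org.apache.spark.sql.SparkSession

object TempViewQuery {
  def main(args: Array[String]): Unit = {
    // Standalone session for demonstration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical columns standing in for the Parquet file's schema.
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
    df.createOrReplaceTempView("dfv")

    // A filtered query against the view.
    val res = spark.sql("SELECT id, label FROM dfv WHERE id > 1")
    res.show()
    println(s"matching rows = ${res.count()}")  // prints "matching rows = 2"

    spark.stop()
  }
}
```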

Show More Data

> df.show()             // Show 20 rows, truncating column values to 20 characters (the defaults)
> df.show(50)           // Show 50 rows
> df.show(false)        // Show 20 rows with full (untruncated) column values
> df.show(50, false)    // Show 50 rows with full column values
> df.show(20, 20, true) // Show 20 rows, truncate at 20 characters, display records vertically