Shell access in SparkSQL

Another cool thing about SparkSQL is that it can expose a server that you connect to from a SQL shell. So if you cache a table, you can start that server and issue SQL queries against the table, just like you would with any other database. Here are the key features of the process:

  • SparkSQL exposes a JDBC/ODBC server (if you build Spark with Hive support)
  • You start the server with sbin/start-thriftserver.sh
  • It listens on port 10000 by default
  • You connect to it using bin/beeline -u jdbc:hive2://localhost:10000
  • Voila, you have a SQL shell for SparkSQL
  • You can create new tables or query existing ones that were cached using hiveCtx.cacheTable("tableName")
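
As a rough illustration of the connection details listed above, here is a minimal Python sketch that assembles the JDBC URL and the beeline command line, and checks whether anything is listening on the server port. The helper names (jdbc_url, beeline_command, server_is_listening) are hypothetical and not part of Spark. The sketch assumes the default port of 10000 and that you run beeline from the Spark installation directory; -e is beeline's option for running a single query and exiting.

```python
import shlex
import socket


def jdbc_url(host="localhost", port=10000):
    """Build the hive2 JDBC URL that beeline connects to."""
    return f"jdbc:hive2://{host}:{port}"


def beeline_command(host="localhost", port=10000, query=None):
    """Return the beeline invocation as an argument list.

    If query is given, it is passed with -e so beeline runs it and exits
    instead of opening an interactive shell.
    """
    cmd = ["bin/beeline", "-u", jdbc_url(host, port)]
    if query is not None:
        cmd += ["-e", query]
    return cmd


def server_is_listening(host="localhost", port=10000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that something accepts connections on the port;
    it does not prove that the listener is the SparkSQL server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if server_is_listening():
        # "tableName" stands in for whatever table you cached.
        cmd = beeline_command(query="SELECT * FROM tableName LIMIT 10")
        print(shlex.join(cmd))
    else:
        print("Nothing is listening on port 10000; "
              "start the server with sbin/start-thriftserver.sh first.")
```

Running the script prints the beeline command to use once the server is up, or a reminder to start the server if the port is not open yet.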

Think about how powerful this is: You can have ...
