Another cool thing about SparkSQL is that it can expose a server that you connect to from a SQL shell. So if you cache a table, you can start that server and issue SQL queries against the table, just like you would with any other database. Here are the key points of the process:
- SparkSQL exposes a JDBC/ODBC server (if you build Spark with Hive support)
- You start the server with sbin/start-thriftserver.sh
- It listens on port 10000 by default
- You connect to it using bin/beeline -u jdbc:hive2://localhost:10000
- Voila, you have a SQL shell for SparkSQL
- You can create new tables or query existing ones that were cached using hiveCtx.cacheTable("tableName")
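The steps above can be sketched as a small Python helper that assembles the two commands. This is only an illustration, not part of Spark: the function names (jdbc_url, start_server_command, beeline_command, run_demo) and the /opt/spark fallback path are made up for this example. The script names, the default port 10000, and the -u flag come from the list above; the -e flag is beeline's option for running a single query and exiting.

```python
import os
import subprocess


def jdbc_url(host="localhost", port=10000):
    """Build the jdbc:hive2:// URL that beeline connects to."""
    return f"jdbc:hive2://{host}:{port}"


def start_server_command(spark_home):
    """Command that launches the JDBC/ODBC (Thrift) server."""
    return [os.path.join(spark_home, "sbin", "start-thriftserver.sh")]


def beeline_command(spark_home, query=None, host="localhost", port=10000):
    """Command that opens a beeline shell, or runs one query if given."""
    cmd = [os.path.join(spark_home, "bin", "beeline"), "-u", jdbc_url(host, port)]
    if query is not None:
        # -e makes beeline execute the query and exit instead of prompting.
        cmd += ["-e", query]
    return cmd


def run_demo():
    """Start the server, then list tables. Needs a real Spark install."""
    # Assumption: SPARK_HOME is set; /opt/spark is only a placeholder.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    subprocess.run(start_server_command(spark_home), check=True)
    # The server may need a few seconds before it accepts connections,
    # so a real script would wait or retry here.
    subprocess.run(beeline_command(spark_home, "SHOW TABLES;"), check=True)
```

Calling run_demo() only makes sense on a machine where Spark was built with Hive support. Leaving out the query argument gives you the interactive shell instead of a one-off query.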
Think about how powerful this is: You can have ...