Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.
Run Hive queries up to 100x faster in memory, or 10x on disk.
Shark uses the powerful Apache Spark engine to speed up computations.
Run unmodified Hive queries on existing warehouses.
Shark reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs. Simply install it alongside Hive.
Unlock your data with machine learning and statistics.
By running on Spark, Shark can call complex analytics functions like machine learning right from SQL. Or call Shark inside your Spark jobs to load Hive data.
Use the same engine for both short and long queries.
Unlike other interactive SQL engines, Shark supports mid-query fault tolerance, letting it scale to large jobs too. Don't worry about using a different engine for historical data.