Shark makes Hive faster and more powerful.

What is Shark?

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.

Speed

Run Hive queries up to 100x faster in memory, or up to 10x faster on disk.

Shark uses the powerful Apache Spark engine to speed up computations.

Hive Compatibility

Run unmodified Hive queries on existing warehouses.

Shark reuses the Hive frontend and metastore, giving you full compatibility with existing Hive data, queries, and UDFs. Simply install it alongside Hive.

Shark uses the existing Hive client and metastore

Spark Integration

Unlock your data with machine learning and statistics.

Because Shark runs on Spark, you can call complex analytics functions, such as machine learning algorithms, directly from SQL. You can also call Shark from inside your Spark jobs to load Hive data.

GENERATE KMeans(tweet_locations)
SAVE AS TABLE tweet_clusters;
Calling machine learning functions from SQL
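
To give a feel for what a k-means call over a table of locations computes, here is a minimal sketch in plain Python. It is not Shark's or Spark's implementation: the kmeans function, its parameters, and the sample coordinates are all made up for illustration, and it uses the textbook Lloyd's algorithm on a small in-memory list rather than a distributed table.

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Lloyd's algorithm on 2-D points. Returns (centroids, labels).

    Hypothetical helper for illustration only; not part of Shark or Spark.
    """
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Made-up coordinates standing in for rows of a tweet_locations table.
tweet_locations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(tweet_locations, k=2)
```

Each entry in labels says which cluster the matching input row belongs to, which is roughly the kind of result a table such as tweet_clusters would hold.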

Scalability

Use the same engine for both short and long queries.

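Unlike other interactive SQL engines, Shark supports mid-query fault tolerance: if a node fails partway through a query, the lost work is recomputed instead of restarting the whole query. This lets Shark scale to large, long-running jobs, so you don't need a separate engine for historical data.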
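
The idea behind mid-query fault tolerance can be sketched in a few lines of plain Python. This is a toy model, not Shark's code: run_query, flaky_sum, and the sample partitions are hypothetical names, and a simple per-task retry loop stands in for the recovery of lost partitions that Spark performs. The point is only that a failure partway through costs one task, not the whole query.

```python
def run_query(partitions, task, max_attempts=3):
    """Apply `task` to every partition, re-running only the tasks that fail.

    Hypothetical helper for illustration only; not part of Shark or Spark.
    """
    results = []
    for partition in partitions:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(task(partition))
                break  # this partition is done; move on to the next one
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up after repeated failures


    return results


failed_once = set()
calls = []  # records every task invocation, including failed ones


def flaky_sum(partition):
    """Sum a partition, failing the first time it sees (3, 4)."""
    calls.append(partition)
    if partition == (3, 4) and partition not in failed_once:
        failed_once.add(partition)
        raise RuntimeError("simulated worker failure")
    return sum(partition)


partitions = [(1, 2), (3, 4), (5, 6)]
partial_sums = run_query(partitions, flaky_sum)
total = sum(partial_sums)
```

In this toy run the task for the middle partition fails once and is retried, so flaky_sum is invoked four times for three partitions. An engine without mid-query recovery would have to start all three again.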