Hadoop Hive is an open source data warehouse built on top of the Apache Hadoop framework. It supports large-scale data storage and analysis with Hadoop through an intuitive, easy-to-use SQL-like interface. Hive was originally developed internally at Facebook, was later released as open source software, and is currently licensed under the Apache License 2.0.

Hive Requirements

Hive requires a working Hadoop installation, because Hive relies on Hadoop for its data storage and processing.

Uses for Hive

Once configured, Hive provides an SQL-like interface that is easy to use and understand. Hive abstracts your data into a familiar table format, so you can extract and analyze it with the aggregate functions Hive provides. Hadoop does the heavy lifting of data storage and processing using the MapReduce framework, which splits a job into tasks that run in parallel across the cluster. This allows complex analysis of massive data sets in much less time than traditional RDBMS data warehouses need.

Hive is currently used in applications such as log analysis, where large amounts of raw data must be processed daily. Hive offers an easy way to explore massive datasets in a timely manner, which can be an integral part of any web-based business trying to gain insight into its products and users. At Facebook, Hive has been used to analyze approximately 15 TB of compressed web server logs per day, a task that would take much longer with conventional systems.

For a more thorough guide to using Hive, see the Apache Hive Getting Started Guide.
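Hive turns an SQL-like query into MapReduce work that Hadoop then executes. The sketch below shows this in pure Python for a log-analysis query. It assumes a hypothetical table `access_logs` of web server log lines and a hypothetical query such as `SELECT status, COUNT(*) FROM access_logs GROUP BY status`. The code is a conceptual, single-process model of the map, shuffle, and reduce phases. It is not Hive's actual query planner or Hadoop's API, and the log format (`<ip> <path> <status>`) is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw web server log lines in an assumed "<ip> <path> <status>" format.
SAMPLE_LOGS = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /about.html 404",
    "10.0.0.1 /index.html 200",
    "10.0.0.3 /login 500",
    "10.0.0.2 /index.html 200",
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (status, 1) pair for each well-formed log line, as a mapper would."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed lines instead of failing the whole job
        _ip, _path, status = fields
        yield status, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum each key's values; this plays the role of COUNT(*) ... GROUP BY status."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_status(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for status, hits in sorted(count_by_status(SAMPLE_LOGS).items()):
        print(status, hits)
```

On a real cluster, many mappers would each process a different block of the log data in parallel, and the reducers would each handle a subset of the status codes. That parallelism is where the speedup over a single-machine approach comes from.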