Understanding Apache Hive
Introduction BigData and Hive Apache Hive is a software application created to facilitate data analyses on Apache Hadoop. It is a Java framework that helps extracting knowledge from data placed on a HDFS cluster by providing a SQL-like interface to it. The Apache Hadoop platform is a major project on distributed computing and it is commonly assumed to be the best approach when dealing with BigData challenges. It is now very well established that great volume of data is produced everyday. Whether it is by system logs or by users purchases, the amount of information generated is such that previous existing Databases and Datawarehouses solutions don’t seem to scale well enough. The MapReduce programming paradigm was uncovered in 2004 as a new approach on processing large datasets. In 2005 its OpenSource version, Hadoop, was created by Doug Cutting. Although Hadoop is not set for substituting relational databases, it is a good solution for big...