TrustRadius Insights for Apache Hive are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Apache Hive is versatile software that has been widely used across departments and organizations for different use cases. It has proven particularly helpful in handling large datasets, migrating data between different operating systems, synchronizing programs, and fetching and generating product metrics. Users have found value in using Hive for data analytics, engineering, data science, product management, and IT-related tasks such as improving analysis of big datasets stored in Hadoop HDFS.
Furthermore, Apache Hive has simplified the process of filtering and cleaning data using SQL, reducing the learning curve for handling big data. It allows users to run SQL queries against data in Hadoop, enabling efficient analysis of large datasets without the need to learn a new language. Additionally, Hive has been used for building reports, analyzing data stored in the Hadoop file system, and processing events gathered in HDFS and converting them into Parquet files for fast querying.
Overall, users have praised Apache Hive for its scalability, accessibility, and cost-effectiveness in storing and retrieving analytics data. It has provided an intuitive solution for storing large datasets, querying big sets of data using SQL, aggregating massive datasets into distilled information for data-driven decision making, and creating external and internal tables in Hadoop/BigData projects. With its ability to process both unstructured and structured data efficiently, Hive has become an essential tool for data analysts, engineers, and business analysts across organizations.
We run Apache Hive on Cloudera servers to handle on-premises processing of large data. To use Apache Hive, you need a distributed system that handles queries efficiently and speeds them up through parallel execution. Metrics like user information and purchase history are stored in HDFS and then accessed through queries written in Hive.
Pros
Simple query language built on MapReduce.
Parallelism across a distributed system is provided.
A tabular format and interfaces are available on all cloud platforms.
Cons
Due to the shuffled data, complex joins may take a long time to complete.
Execution is dependent on external storage and memory.
Likelihood to Recommend
Apache Hive can query data warehouses that update and append records in batches or in real time. Tableau and other reporting tools, as well as Python, can query Hive data sets directly. Structured data and tables can be accessed using SQL-like syntax. Using Hive, you can build tables at various levels of the data lake. It is not the best fit for transactional databases.
We use Apache Hive to store a large set of data: huge documents such as problem statements and their answers, submitted not only by the site owners but also by the site's users.
Pros
It is easy to store unstructured data
Easy to retrieve data using SQL queries instead of more complicated methods
Large sets of data can be stored efficiently
Cons
Apache Hive could provide more flexibility for integration.
Likelihood to Recommend
Apache Hive isn't really useful when we just store small data sets, so sometimes our usage isn't suitable for Hive. We are planning to move to SQL databases if this continues.
We build our data lake and perform queries on large amounts of data. We group data from multiple sources into a common structure, making it easy for our developers to perform complex queries without leaving the simple framework provided by SQL. Although the deployment is not easy, once we have the infrastructure, the work is greatly simplified.
Pros
Simplifies queries for devs
Organize data
Batch process
Cons
Deploy
Maintenance
Support
Likelihood to Recommend
It is great for laboratory environments and for starting to work with unstructured data when we are not yet sure how we want to treat it. It also allows queries to be improved very quickly by letting developers work with SQL instead of MapReduce. As a point for improvement, troubleshooting in production environments is complicated and requires expert personnel.
Verified User
C-Level Executive in Information Technology (11-50 employees)
To manage and view Apache Hadoop data in a SQL-like format
To be able to query databases across the organization quickly
To query data for use in Spark projects
To save queries
Pros
Easy-to-use, interactive modern layout
Easy to organize data and view tables and views from across the organization
Fast speed for most queries
Cons
Some queries, particularly complex joins, are still quite slow and can take hours
Previous jobs and queries are sometimes not stored
Switching to Impala can sometimes be time-consuming (i.e. the system hangs, or is slow to respond).
Sometimes, directories and tables don't load properly, which causes confusion
Likelihood to Recommend
Apache Hive is well-suited for querying Hadoop. If you use Hadoop, you should consider Hive. It is well-suited for large organizations where there is lots of data that needs to be queried. However, there is significant overhead in setting up and maintaining Hive (and Hadoop in general). Small companies and individuals should consider other means of storing data, such as a SQL database.
We have used the system to migrate data, either for new versions or because we are moving to another operating program. The software helps us synchronize programs between different operating systems, keep a consistent history of information, and send already-transformed information to third parties.
Pros
Migration to the cloud is modern and very secure.
Cons
Extraction works best when it is scheduled at set times and in set quantities.
For normal daily use, the system needs ongoing maintenance management to work effectively.
Likelihood to Recommend
The software executes work on a large scale, and it is good to use for new projects or organizational changes. Data lineage mapping has always been dubious, but this tool has had good results. You can store and synchronize data from different departments; the storage process can be manual, but it is best automated.
I used Apache Hive on top of Hadoop for filtering and cleaning data using SQL. It was part of the project I was working on. Apache Hive provides a SQL-like platform where we can run SQL queries. Apache Hive was a perfect choice for cleaning data as we were using Apache Hadoop and both are Apache products.
Pros
Filtering data
Cleaning data
SQL-like interface
Integrates with Hadoop
Cons
Uses a lot of memory
Not compatible with other databases like Postgres and MySQL
Limited support
Slow compared to other interfaces
Likelihood to Recommend
Apache Hive is best for ETL (Extract, Transform, Load) purposes. It gives its best performance when integrated with the Hadoop Distributed File System. It's also very good for performing mathematical operations and when the data is organized and structured. It can handle large volumes of data (petabytes) but requires a lot of memory in the system. It supports both unstructured and structured data but works best with structured data.
The software is intuitive from the first steps. One of the first features we took into account is that the software does not allow duplicate files to be stored. It is advanced software that constantly learns and develops from data. The first phase is very effective: the information is analyzed and checked in detail.
Pros
The unification of the data will help to establish the commercial criteria.
We are sure that the data is protected
Cons
If you try to extract an excessive amount of data, the system will become slow
There is a risk that the system collapses under the amount of data
Likelihood to Recommend
The information is quickly accessible through the established security protocols, but it has not helped us as users to maintain a fairly comfortable data processing flow. It is more profitable to process the data in batches. We have been able to unify data from different sources.
Our main purpose for using Apache Hive was to get insights from data: analyzing the data and using it to make informed business decisions. The interface also works much like SQL, so it is easy for a new person to understand.
Pros
It can be used to retrieve data much like a SQL database.
We can partition the data and distribute it among the clustered machines
Easily scalable, which gives the capability of running analytics at a larger scale
Cons
No support for working with unstructured data.
ACID properties are not followed as they are in a database, which often creates confusion
Supports OLAP environments only; OLTP is not supported
Likelihood to Recommend
If you have a workforce that knows SQL and you need to explore large-scale data and get insights from it, then Apache Hive is perfect for you. If you have experienced people who have worked on big data before, then using Splunk is better. For starting the journey in data-driven decisions and data analytics, it is better to use Apache Hive first.
Verified User
C-Level Executive in Product Management (51-200 employees)
Apache Hive is an open-source data warehouse solution built on top of Hadoop that helps to analyze very large amounts of data. Our use case is a large data analytics project where the data frequency and velocity are very high. Apache Hive is very useful for processing both unstructured and structured data in a seamless way. It helps us avoid writing complex queries because it is oriented toward SQL, and we have an engineering team that is very proficient in writing SQL queries and uses Apache Hive to process the big data. We have identified no business issues using the solution.
Pros
Apache Hive supports external data tables.
Supports data partitioning to improve overall performance.
Apache Hive is a reliable and scalable solution.
Apache Hive supports writing ad-hoc queries as well.
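To make the partitioning point in the list above concrete, here is a minimal Python sketch of why partitioning can help: a query that filters on the partition column only has to read the matching bucket. This is a toy model, not Hive's implementation; the row fields (`day`, `user`, `amount`) and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical event rows; the field names are made up for illustration.
ROWS = [
    {"day": "2024-01-01", "user": "a", "amount": 10},
    {"day": "2024-01-01", "user": "b", "amount": 5},
    {"day": "2024-01-02", "user": "a", "amount": 7},
    {"day": "2024-01-03", "user": "c", "amount": 3},
]


def partition_by(rows, key):
    """Group rows into buckets keyed by the partition column's value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def total_for_day(partitions, day):
    """Sum amounts for one day, reading only that day's partition.

    Returns (total, rows_scanned) so the pruning effect is visible.
    """
    rows = partitions.get(day, [])
    return sum(r["amount"] for r in rows), len(rows)


def total_for_day_full_scan(rows, day):
    """Same answer without partitions: every row must be inspected."""
    total = sum(r["amount"] for r in rows if r["day"] == day)
    return total, len(rows)


PARTITIONS = partition_by(ROWS, "day")
```

Both functions return the same total for a given day; the second element of each result shows how many rows had to be inspected, which is the cost that partitioning is meant to reduce.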
Cons
Apache Hive is not best suited for OLTP-based jobs.
We sometimes observed high latency while querying data.
Limited support for row-level data updates.
Training materials need improvement.
Likelihood to Recommend
Apache Hive is a data warehouse/ETL solution that is used for processing big data for analytics and visualizations. Apache Hive has a great architecture that makes it very well suited for organizations. The Metastore is used for storing metadata for each table and its schema. The Driver operates as a controller for the execution of statements. Other components, such as the Optimizer, the CLI, and the Thrift Server, enable the processing and transformation of big data.
Verified User
Program Manager in Information Technology (201-500 employees)
We use Apache Hive to process large data and get the output with less processing time. The framework is very useful for data processing and analytics.
Pros
Used in data warehousing, similar to ETL tools.
SQL-like interface gives access to data stored in various databases.
Enables analytics at massive scale.
Cons
The way the framework is developed can be improved.
OLTP is not supported.
Does not offer real-time queries.
Likelihood to Recommend
Hive keeps queries running very fast, and writing Hive queries takes very little time compared with writing MapReduce code. It is very easy to write queries, including joins, in Hive.
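As a rough illustration of the comparison with MapReduce code, the Python sketch below computes the same per-buyer totals twice: once with hand-written map, shuffle, and reduce steps, and once with a single declarative SQL query. It rests on loud assumptions: SQLite (from the Python standard library) stands in for a SQL engine and is not Hive, the three phase functions are a toy model of MapReduce rather than Hadoop's API, and the `purchases` table and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical purchase records: (buyer, amount).
PURCHASES = [("ann", 10), ("bob", 5), ("ann", 7), ("cy", 3), ("bob", 1)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for buyer, amount in records:
        yield buyer, amount


def shuffle_phase(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def totals_mapreduce(records):
    """Per-buyer totals via explicit map, shuffle, and reduce steps."""
    return reduce_phase(shuffle_phase(map_phase(records)))


def totals_sql(records):
    """Same aggregation as one declarative query (SQLite stands in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (buyer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT buyer, SUM(amount) FROM purchases GROUP BY buyer"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Calling `totals_mapreduce(PURCHASES)` and `totals_sql(PURCHASES)` yields the same dictionary of totals; the difference is that the SQL version states what to compute in one `GROUP BY` statement, while the other spells out how to compute it.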
Verified User
Administrator in Information Technology (1001-5000 employees)