Apache Hadoop vs. Azure HDInsight

Overview
Product: Hadoop
Rating: Score 7.9 out of 10
Most Used By: N/A
Product Summary: Hadoop is open source software from Apache that supports distributed processing and data storage. Hadoop is popular for its scalability, reliability, and functionality available across commodity hardware.
Starting Price: N/A

Product: Azure HDInsight
Rating: Score 4.0 out of 10
Most Used By: N/A
Product Summary: HDInsight is an implementation of the Apache Hadoop technology stack on the Microsoft Azure cloud platform. It is based on the Hortonworks Hadoop distribution. Microsoft Azure HDInsight includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, etc. It also integrates with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services.
Starting Price: N/A
Pricing
Editions & Modules
Apache Hadoop: No answers on this topic
Azure HDInsight: No answers on this topic
Offerings
Free Trial
Apache Hadoop: No
Azure HDInsight: No
Free/Freemium Version
Apache Hadoop: Yes
Azure HDInsight: No
Premium Consulting/Integration Services
Apache Hadoop: No
Azure HDInsight: No
Entry-level Setup Fee
Apache Hadoop: No setup fee
Azure HDInsight: No setup fee
Community Pulse
User Ratings
Likelihood to Recommend
Apache Hadoop: 8.0 (0 ratings)
Azure HDInsight: 4.0 (0 ratings)
Likelihood to Renew
Apache Hadoop: 9.6 (0 ratings)
Azure HDInsight: - (0 ratings)
Usability
Apache Hadoop: 8.0 (0 ratings)
Azure HDInsight: 8.9 (0 ratings)
Performance
Apache Hadoop: 8.0 (0 ratings)
Azure HDInsight: - (0 ratings)
Support Rating
Apache Hadoop: 7.5 (0 ratings)
Azure HDInsight: 1.0 (0 ratings)
Online Training
Apache Hadoop: 6.1 (0 ratings)
Azure HDInsight: - (0 ratings)
User Testimonials
Apache Hadoop / Azure HDInsight
Likelihood to Recommend
Apache Hadoop: Apache Hadoop (and its subsequent add-ons) is well suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs are well suited to this kind of aggregation: structuring that data is challenging, and leaving it unstructured and performing queries as needed is a better fit for most business models. With the advent of data science, I would expect Hadoop to fit a LOT of their initial outputs quite well.
Azure HDInsight: If you want to save costs and just pay for what you use, I highly recommend it. It will also help you work with data for your reports and analytics. On the other hand, and it may be down to the subscription you have, high volumes of data make it slow, though not by much. Anyway, I think it's really good because it's from Microsoft, whose products are always friendly to use, like the rest of their suite.
Pros
Apache Hadoop:
  • HDFS is reliable and solid, and in my experience with it, there are very few problems using it
  • Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
  • It provides high scalability and redundancy
  • Horizontal scaling and distributed architecture
Azure HDInsight:
  • Shows live changes in analytics. Shows you how social media is working for us. Since we promote weekly events, this is something that we really need to pay attention to.
  • Azure in itself is very user-friendly, and HDInsight is a great addition. For our purposes, we definitely also utilized Power Query to translate data to Excel.
Cons
Apache Hadoop:
  • Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing.
  • HDFS is not a POSIX-compliant file system and does not work well with small files, especially files smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.
Azure HDInsight:
  • Had issues while using Azure ARM Templates for deployments.
  • The cost structure is suited to high-end, temporary data analytics rather than a traditional, frequently queried data warehouse.
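The small-files con listed for Hadoop above can be made concrete with a rough back-of-the-envelope sketch. It assumes a 128 MiB default block size and the commonly quoted rule of thumb of about 150 bytes of NameNode memory per file or block object. Both figures are assumptions for illustration, not measurements from any reviewer's cluster, and the function names are hypothetical.

```python
import math

# Assumption: default HDFS block size of 128 MiB (configurable per cluster).
BLOCK_SIZE = 128 * 1024 * 1024
# Assumption: rule-of-thumb NameNode memory cost per file or block object.
BYTES_PER_OBJECT = 150


def namenode_objects(file_sizes):
    """Count the file and block objects the NameNode would have to track."""
    objects = 0
    for size in file_sizes:
        blocks = math.ceil(size / BLOCK_SIZE)
        objects += 1 + blocks  # one file object plus one object per block
    return objects


def namenode_bytes(file_sizes):
    """Estimate NameNode memory in bytes for the given file sizes."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


MIB = 1024 * 1024
total = 10 * 1024 * MIB                   # the same 10 GiB of data, stored two ways
small = [MIB] * (total // MIB)            # many 1 MiB files
large = [BLOCK_SIZE] * (total // BLOCK_SIZE)  # a few block-sized files

if __name__ == "__main__":
    print("small files:", len(small), "files,", namenode_bytes(small), "bytes of metadata")
    print("large files:", len(large), "files,", namenode_bytes(large), "bytes of metadata")
```

Under these assumptions, the same 10 GiB of data costs 128 times more NameNode metadata when stored as 1 MiB files than as block-sized files, which is why small files are usually merged or archived before being loaded into HDFS.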
Likelihood to Renew
Apache Hadoop: Hadoop is organization-independent, can be used for purposes ranging from archiving to reporting, and can make use of economical commodity hardware. There are also substantial savings in licensing costs, since most of the Hadoop ecosystem is open source and free.
Azure HDInsight: No answers on this topic
Usability
Apache Hadoop: Great! Hadoop has an easy-to-use interface that mimics most other data warehouses. You can access your data via SQL and have it display in a terminal before exporting it to your business intelligence platform of choice. Of course, for smaller data sets, you can also export it to Microsoft Excel.
Azure HDInsight: Azure HDInsight is usable on top of Azure Data Lake and gives us the benefit of analyzing large-scale data workloads in Hadoop. Usability and support from Microsoft are outstanding.
Support Rating
Apache Hadoop: We went with a third party for support, i.e., a consultant. Had we gone with Azure or Cloudera, we would have obtained support directly from the vendor. My rating is more about the third party we selected and doesn't reflect the overall support available for Hadoop. I think we could have done better in our selection process; however, we were trying to use an already approved vendor within our organization. There is plenty of self-help available for Hadoop online.
Azure HDInsight: Inexpert, isolated teams... not good for supporting an excessively complex platform. Troubleshooting a complex problem takes weeks or months. A lot of time was lost stuck with MindTree before the case was finally escalated to Microsoft!
Online Training
Apache Hadoop: Hadoop is a complex topic and best suited to classroom training. Online training is a waste of time and money.
Azure HDInsight: No answers on this topic
Alternatives Considered
Apache Hadoop: I feel that this is a highly reliable and scalable computing technology that is highly capable of processing large data sets across multiple servers and thousands of machines in a well-defined and distributed manner. Apache Hadoop can automatically scale up the number of servers and machines that are needed to process, store, and analyze data sets. It also handles explosions in data with big data technology. Apache Hadoop is good at handling node failures as well.
Azure HDInsight: Many times you just need Spark performing fast and cheap. Azure HDInsight includes lots of features and software that is not required. Also, its libraries and runtime versions are pretty old. But what is great is that you don't need to have an expert on your team, and things, when they work, always perform in exactly the same way. Also, as I mentioned, for a starter that's a great ROI.
Return on Investment
Apache Hadoop:
  • Being open source makes it a popular choice for handling large chunks of data
  • It was free earlier and is now licensed, but the enterprise edition is a fine-tuned version that makes it easier for new users and administrators to use
  • Our investment is worth every single penny.
  • Initial cost is higher, as you might need to hire administrators to set up the cluster and make it scalable. But once that is done, it's pretty easy
Azure HDInsight:
  • ROI is of course there, as no legacy software is needed for data presentation.
  • No manual intervention for data retrieval.
  • Data is available anywhere as requested.