TrustRadius: an HG Insights company

Azure HDInsight

Score4 out of 10

36 Reviews and Ratings

What is Azure HDInsight?

HDInsight is an implementation of the Apache Hadoop technology stack on the Microsoft Azure cloud platform: It is based on the Hortonworks Hadoop distribution.

Microsoft Azure HDInsight includes implementations of Apache Spark, HBase, Storm, Pig, Hive, Sqoop, Oozie, Ambari, etc. It also integrates with with business intelligence (BI) tools such as Power BI, Excel, SQL Server Analysis Services, and SQL Server Reporting Services.

Categories & Use Cases

A great tool for mid-size organizations, that would become a great tool for mid-big ones

Use Cases and Deployment Scope

Azure HDInsight that's the main ETL and Data Processing platform of our company. It hosts all processes that generated the data we provide for our costumers, managed by ADF pipelines.

Pros

  • Integration with Azure Datafactory
  • Integration with Azure Management API
  • Easy to deploy from powershell, API, ADFv2 and other Azure Resources
  • Cheap when creating and deleting dynamically on demand clusters
  • A high level of documentation
  • Some diagnostic tools

Cons

  • Instable for months, with freaky problems
  • By default generates huge logs that finally break the cluster
  • Difficult to control its resources escalation
  • Difficult to plan, monitor and limit its costs
  • On-Demand clusters are very unstable and untrustful
  • Cluster creation process fails often
  • Cost is huge and difficult to control / limit
  • Dead weight (and cost) of some products you will never use into the cluster (we just needed Spark and Hadoop)
  • Poor support team
  • Outdated software in the clusters (almost out of support Python versions)
  • Almost impossible to customize and adapt for our specific needs

Return on Investment

  • When our business started, it was a great value for a company with no tech experts
  • When business grow and data volumes also, problems started to arise: lots of on-demand clusters creations failed, cluster internal cluster crashed due to logs size (no way to limit them)
  • When business got really big, after having months of outage, we have to start looking for a new platform. It became ineffective in ROI.

Alternatives Considered

Apache Spark and Apache Hadoop

Other Software Used

Azure Data Factory, Azure Blob Storage, Apache Spark, Azure App Service, Azure Functions, Azure Logic Apps, Azure API Management, Azure Cosmos DB

Best tool for advertising and marketing

Pros

  • Promote things that needed to be on top.
  • Current analytics changes.
  • As it's from Microsoft, it has high availability and support from Microsoft.
  • Easy transformation of high volume data.

Cons

  • Performance issue with large volume data.
  • Cost

Return on Investment

  • Meet business objectives.
  • Advertising events so save cost.
  • High availibility

Alternatives Considered

Microsoft Power BI and Azure Blob Storage

Usability

totally friendly!

Pros

  • It's scalable.
  • It's from Microsoft so it's friendly to use.
  • Data is available when you need it.

Cons

  • Performance with high volume of data.

Return on Investment

  • The impact is so far positive, the organization pays for the whole company in 3 countries.

Usability

Other Software Used

SAP Business Warehouse (SAP BW), formerly SAP NetWeaver Business Warehouse, Microsoft Power BI

Azure HDInsight provide good toll for data management

Pros

  • Data is presented without interfering others (IT or other dept).
  • Data is managed properly and is available for retrievable any time.
  • Legacy use of CD/DVD and Pendrive are not required.

Cons

  • Sometimes, lower bandwidth faces some issues (Slow speed)
  • Something can be done about lower bandwidth capability as higher bandwidth may not be available at all the places.
  • But improvement will be there in the above capability I am sure as the technology evolves.

Return on Investment

  • ROI is of course there, as no legacy software for data presentation.
  • No manual intervention for data retrieval.
  • Data is available anywhere as requested.

Other Software Used

Microsoft 365 (formerly Office 365), Microsoft Teams, Honeywell Connected Plant, Honeywell Production Control Center for Oil & Gas (PCC), ABB Ability Solution Suite

Usability

My Experience with Azure HDInsight

Pros

  • The Hadoop cluster was built within minutes.
  • It is cost-effective to collect and store structured or unstructured data.
  • The flat network storage system technology offers a high-speed connection between nodes and blob storage system.

Cons

  • Had issues while using Azure ARM Templates for deployments.
  • The cost structure suited for high end temporary data analytics, rather than a traditional frequently queried data warehouse.

Return on Investment

  • Less implementation cost.
  • No hosting services cost.
  • Faster data retrieval.

Usability