TrustRadius: an HG Insights company

Azure Data Factory

Score: 9 out of 10

60 Reviews and Ratings

What is Azure Data Factory?

Microsoft's Azure Data Factory is a service built for all data integration needs and skill levels. It is designed to allow the user to easily construct ETL and ELT processes code-free within the intuitive visual environment, or write one's own code. Visually integrate data sources using more than 80 natively built and maintenance-free connectors at no added cost. Focus on data—the serverless integration service does the rest.

Categories & Use Cases

Top Performing Features

  • Connect to traditional data sources

    Ability to connect to traditional data sources like relational databases, flat files, XML files and packaged applications

    Category average: 8.7

  • Connect to Big Data and NoSQL

    Ability to connect to non-traditional data sources like Hadoop and other big data technologies, and NoSQL databases

    Category average: 7.7

  • Simple transformations

    Simple data transformations are calculations, data type conversions, aggregations, and search-and-replace operations

    Category average: 8.1

Areas for Improvement

  • Testing and debugging

    Tool to debug and tune for optimal performance

    Category average: 6.9

  • Integration with data quality tools

    Integration with tools for cleansing, parsing and normalizing data according to business rules

    Category average: 7.9

  • Collaboration

    Collaboration is enabled by a shared repository of project information and metadata

    Category average: 7.1

Overall helpful product that works as advertised.

Use Cases and Deployment Scope

Using SHIR (the self-hosted integration runtime) to pull records from on-premises databases and store them in ADLS storage. From ADLS, data is brought into Databricks for analytics use. Roughly 50 different pipelines in each environment, with 3 separate environments. Code is stored and deployed from Azure DevOps. Alerting is handled via LogicMonitor and Azure Functions.
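The on-premises-to-ADLS copy described here boils down to a Copy-activity pipeline whose source dataset runs on the SHIR. A minimal sketch of that pipeline's JSON definition, built as a Python dict; the dataset names (`OnPremSqlViaShir`, `AdlsSink`) and query are hypothetical, not this reviewer's actual configuration.

```python
# Hedged sketch: the JSON shape of an ADF Copy-activity pipeline that pulls
# one on-premises SQL table (via a SHIR-backed dataset) into ADLS as Parquet.
# Dataset/pipeline names are placeholders.

def copy_pipeline(table_name: str) -> dict:
    """Build a pipeline definition that copies one table to ADLS."""
    return {
        "name": f"copy_{table_name}_to_adls",
        "properties": {
            "activities": [
                {
                    "name": f"Copy_{table_name}",
                    "type": "Copy",
                    "inputs": [{"referenceName": "OnPremSqlViaShir",
                                "type": "DatasetReference"}],
                    "outputs": [{"referenceName": "AdlsSink",
                                 "type": "DatasetReference"}],
                    "typeProperties": {
                        "source": {"type": "SqlServerSource",
                                   "sqlReaderQuery": f"SELECT * FROM {table_name}"},
                        "sink": {"type": "ParquetSink"},
                    },
                }
            ]
        },
    }

pipeline = copy_pipeline("dbo_customers")
print(pipeline["name"])  # copy_dbo_customers_to_adls
```

Because the definition is plain JSON, it checks into Git and deploys through Azure DevOps exactly as the review describes.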

Pros

  • Step-by-step processes.
  • Storing infrastructure as code.
  • Alerting on job failures.
  • SHIR.

Cons

  • Learning curve for the pipeline creation interface.
  • Alerting isn't really built in; we had to work around this to meet team needs.
  • With Git enabled, some actions can only be performed through Git, while others must be done in the portal.
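The alerting workaround this reviewer mentions (LogicMonitor plus Azure Functions) typically means a small function that inspects pipeline-run records and notifies on failures. A hedged sketch of that logic in plain Python; the record shape is an assumption, not ADF's API.

```python
# Hedged sketch of a failure-alerting workaround: scan pipeline-run records
# (e.g. fetched by a scheduled Azure Function) and collect the failed ones.
# The {"pipelineName", "status"} record shape is assumed for illustration.

def failed_runs(runs: list) -> list:
    """Return the pipeline names of runs whose status is Failed."""
    return [r["pipelineName"] for r in runs if r.get("status") == "Failed"]

runs = [
    {"pipelineName": "copy_customers", "status": "Succeeded"},
    {"pipelineName": "copy_orders", "status": "Failed"},
]
print(failed_runs(runs))  # ['copy_orders']
```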

Return on Investment

  • Still working on ROI. Development is ongoing after some changes unrelated to Azure Data Factory.

Other Software Used

Microsoft Exchange, Microsoft Azure, Microsoft Azure Key Vault, Nerdio

One of the best and most reliable ETL & ELT platforms for pulling data from multiple sources

Use Cases and Deployment Scope

One of the best data integration tools for both ETL and ELT. I have been using ADF for the last 6+ years, and it has helped me extract several data feeds within our organization that meet our specific business needs. The tool provides many features such as Move and Transform, Data Explorer, Azure Functions, Databricks, Data Lake Analytics, Blob Storage, linked services, Machine Learning, and Power Query.

Pros

  • It allows copying data from various types of data sources, such as on-premises files, Azure Database, Excel, JSON, Azure Synapse, APIs, etc., to the desired destination.
  • We can reuse a linked service across multiple pipelines/data loads.
  • It also supports running SSIS packages, which makes it an easy-to-use ETL & ELT tool.
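The linked-service reuse called out above works because the connection is defined once and every dataset merely references it by name. A minimal sketch of that indirection as ADF-style JSON built in Python; the service name, connection string, and table names are placeholders.

```python
# Hedged sketch: one linked service (the connection), referenced by name from
# any number of datasets/pipelines. All names here are hypothetical.

linked_service = {
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        # Real deployments would resolve this from Azure Key Vault.
        "typeProperties": {"connectionString": "<stored-in-key-vault>"},
    },
}

def dataset(name: str, table: str) -> dict:
    """Each dataset points at the shared linked service by reference."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": {"referenceName": linked_service["name"],
                                  "type": "LinkedServiceReference"},
            "typeProperties": {"tableName": table},
        },
    }

sales = dataset("SalesDs", "dbo.Sales")
print(sales["properties"]["linkedServiceName"]["referenceName"])  # AzureSqlLinkedService
```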

Cons

  • For complex JSON, it's not easy to flatten out nested attributes when it comes to mapping.
  • Data Factory V1 does not offer as good an implementation experience as V2.
  • Working with on-premises solutions is sometimes not very friendly, because you will need to set up a VPN.
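The nested-JSON pain point above is easy to illustrate outside of ADF: flattening nested attributes into dotted column names is roughly what a Flatten/mapping step has to do. A plain-Python sketch, with no ADF specifics assumed.

```python
# Hedged illustration of flattening nested JSON attributes into
# dot-separated column names, the kind of mapping the reviewer finds
# awkward in ADF's data flows.

def flatten(obj: dict, prefix: str = "") -> dict:
    """Recursively flatten nested dicts into dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

record = {"id": 1, "customer": {"name": "Acme", "address": {"city": "Oslo"}}}
print(flatten(record))
# {'id': 1, 'customer.name': 'Acme', 'customer.address.city': 'Oslo'}
```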

Return on Investment

  • ADF makes the whole ETL process very simple and manageable.
  • It saves a lot of cost and time.
  • It solves our data ingestion needs with an ELT approach.
  • Storage compaction formats help us a lot when dealing with big data problems.

Alternatives Considered

AWS Glue

Other Software Used

Fivetran, Talend Data Integration, Informatica PowerCenter

Azure Databricks

Use Cases and Deployment Scope

Orchestration platform for the Databricks notebooks. Have also used it as an ETL tool for loading CSV files into a SQL Server-based database.
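Orchestrating Databricks from ADF comes down to a `DatabricksNotebook` activity in the pipeline definition. A minimal sketch of that activity's JSON, built as a Python dict; the notebook path, linked-service name, and parameter are placeholders.

```python
# Hedged sketch: an ADF pipeline activity that runs a Databricks notebook.
# Path, linked-service name, and parameters are hypothetical.

notebook_activity = {
    "name": "RunDailyEtlNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {"referenceName": "DatabricksLs",
                          "type": "LinkedServiceReference"},
    "typeProperties": {
        "notebookPath": "/etl/daily_load",
        # Pipeline parameters flow into the notebook as base parameters.
        "baseParameters": {"run_date": "@pipeline().parameters.runDate"},
    },
}
print(notebook_activity["type"])  # DatabricksNotebook
```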

Pros

  • Orchestration engine
  • Low-code data pipelines
  • Logic Apps integration

Cons

  • Error flagging: details of the error codes are not specific; we especially faced this during Azure Table loads.
  • Missing data exploration functionality similar to Synapse Data Explorer.
  • Missing the ability to orchestrate/create Stream Analytics jobs.

Return on Investment

  • No-code/low-code development is easier.
  • Easier orchestration platform.
  • Lots of different services available to plug in and connect.

Alternatives Considered

Azure Synapse Analytics (Azure SQL Data Warehouse) and Oracle Data Integrator

Other Software Used

Azure Synapse Analytics (Azure SQL Data Warehouse), Databricks Lakehouse Platform (Unified Analytics Platform), Azure Blob Storage

Database management and ETL tool for big data that is smart and reliable

Pros

  • Creating ETL and ELT workflows, as well as orchestrating and monitoring pipelines, without writing any code.
  • Hybrid data integration is easy and agile with this software.
  • It has a lot of useful components.

Cons

  • It should integrate more ETL and audit functionality.
  • Pipelines lack flexibility: moving Data Factory pipelines between different environments, such as development or testing, requires extra work around security and configuration.
  • The number of pre-defined templates is small; they should offer more variety.

Return on Investment

  • Facilitate better decision-making and improve business processes.
  • Optimize business process outcomes by increasing internal efficiency and operational effectiveness.
  • Boosts revenue growth while improving business process agility.

Alternatives Considered

IBM InfoSphere DataStage, SnapLogic and Pentaho

Other Software Used

Azure Backup, Microsoft Azure, Azure Cosmos DB

The go-to ETL tool for most situations

Use Cases and Deployment Scope

Data Integration: We harness Azure Data Factory's capabilities to move data from various sources – both on-premises databases and cloud storage – into our Azure data storage solutions like Azure SQL Database, Azure Blob Storage, and Azure Data Lake Store. This ensures all our data, regardless of its origin, is consolidated in one place.

Transformations: Azure Data Factory's data flow transformations help us clean, transform, and enrich our data before loading it to the destination. This is crucial for maintaining data quality, especially when dealing with diverse datasets.
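The clean-transform-enrich steps described above can be illustrated as plain Python over row dicts; in ADF this would be a Mapping Data Flow, and the column names (`customer_id`, `region`) here are made up for the sketch.

```python
# Hedged illustration of clean -> enrich over rows, roughly what the
# data flow transformations described above do. Column names are assumed.

def clean(rows):
    """Drop rows missing the key and trim whitespace in text fields."""
    return [dict(r, region=r["region"].strip())
            for r in rows if r.get("customer_id") is not None]

def enrich(rows, region_lookup):
    """Join in a region display name, like a lookup transformation."""
    return [dict(r, region_name=region_lookup.get(r["region"], "Unknown"))
            for r in rows]

rows = [{"customer_id": 1, "region": " EU "},
        {"customer_id": None, "region": "NA"}]
out = enrich(clean(rows), {"EU": "Europe"})
print(out)  # [{'customer_id': 1, 'region': 'EU', 'region_name': 'Europe'}]
```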

Pros

  • Azure Data Factory supports a vast array of source and destination connectors, both from within the Microsoft ecosystem (like Azure Blob Storage, Azure SQL Database, Azure Cosmos DB) and external platforms (like Amazon S3, Google Cloud Storage, SAP, Salesforce, and many more).
  • Azure Data Factory's Mapping Data Flows provides a code-free environment to design data transformations visually. Users can drag and drop elements to create complex ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes without needing to write any code.
  • Azure Data Factory provides a unified monitoring dashboard that offers a holistic view of all pipeline activities. I think this makes it easier for users to track the status of various jobs, identify failures, and pinpoint bottlenecks.

Cons

  • Granularity of Errors: Sometimes, Azure Data Factory provides error messages that are too generic or vague for us, making it challenging to pinpoint the exact cause of a pipeline failure. Enhanced error messages with more actionable details would greatly assist us as users in debugging their pipelines.
  • Pipeline Design UI: In my experience, the visual interface for designing pipelines, especially when dealing with complex workflows or numerous activities, can become cluttered. I think a more intuitive and scalable design interface would improve usability. In my opinion, features like zoom, better alignment tools, or grouping capabilities could make managing intricate designs more manageable.
  • Native Support: While Azure Data Factory does support incremental data loads, in my experience, the setup can be somewhat manual and complex. I think native and more straightforward support for Change Data Capture, especially from popular databases, would simplify the process of capturing and processing only the changed data, making regular data updates more efficient.
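The "somewhat manual" incremental-load setup this reviewer describes is usually the high-watermark pattern: remember the largest modified timestamp seen so far, and on each run pull only rows newer than it. A hedged sketch of that pattern; the `modified_at` field name is an assumption.

```python
# Hedged sketch of the high-watermark incremental-load pattern that ADF
# users typically wire up by hand. Field names are placeholders.

def incremental_load(rows: list, watermark: str):
    """Return rows modified after the watermark, plus the new watermark."""
    new_rows = [r for r in rows if r["modified_at"] > watermark]
    new_watermark = max((r["modified_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark

rows = [{"id": 1, "modified_at": "2024-01-01"},
        {"id": 2, "modified_at": "2024-02-01"}]
loaded, wm = incremental_load(rows, "2024-01-15")
print(len(loaded), wm)  # 1 2024-02-01
```

In ADF this state is typically kept in a control table read by a Lookup activity, which is the manual wiring the review is asking to have replaced by native Change Data Capture support.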

Return on Investment

  • Cost Savings: By automating our ETL processes with Azure Data Factory, we've reduced manual data handling by approximately 60%. This translates to savings from reduced man-hours and the overhead of maintaining legacy systems.
  • Timeliness: Our report generation time has reduced by 70% with Azure Data Factory's scheduled pipelines. Faster insights mean quicker decisions for us, enabling our teams to capitalize on time-sensitive opportunities. We can easily share the data visualizations to all stakeholders.

Alternatives Considered

Informatica Cloud Data Integration and AWS Glue