One-stop solution for all orchestration needs.
Use Cases and Deployment Scope
Pros
- Apache Airflow is one of the best orchestration platforms and a go-to scheduler for teams building a data platform or pipelines.
- Apache Airflow supports multiple operators, such as the Databricks, Spark, and Python operators. All of these provide us with functionality to implement any business logic.
- Apache Airflow is highly scalable, and we can run a large number of DAGs with ease. It provides HA and replication for workers. Maintaining Airflow deployments is very easy, even for smaller teams, and we also get lots of metrics for observability.
Cons
- A production-ready deployment of Apache Airflow requires some level of expertise. A repository of officially maintained sample Helm chart configurations would be handy for a new team.
- As Airflow is used to build many data pipelines, a feature that builds lineage from the queries run on different compute engines would help in developing the data catalog. Typically, multiple tools are required for this use case.
- When building a data pipeline from upstream to downstream tables, it would be helpful if Airflow could use lineage to trigger the downstream DAGs after a recovery. Easier ways to create dependencies between DAGs would also be beneficial.
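
To make the lineage point concrete, here is a minimal sketch of the behavior we have in mind: given table-level lineage, work out which downstream tables need to be re-run after an upstream table is recovered, and in what order. This is plain Python using only the standard library, not an Airflow API. The table names, the `LINEAGE` mapping, and the `downstream_rerun_order` helper are all hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each table maps to the set of upstream tables it reads from.
LINEAGE = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fct_sales": {"stg_orders", "stg_customers"},
    "rpt_daily_sales": {"fct_sales"},
}


def downstream_rerun_order(lineage, recovered):
    """Return the tables downstream of `recovered`, upstream-first."""
    # Invert the lineage so we can walk from a table to its direct consumers.
    children = {table: set() for table in lineage}
    for table, parents in lineage.items():
        for parent in parents:
            children.setdefault(parent, set()).add(table)

    # Collect every table reachable from the recovered one.
    affected = set()
    stack = [recovered]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in affected:
                affected.add(child)
                stack.append(child)

    # Order the affected tables so each one comes after its affected parents.
    subgraph = {table: lineage[table] & affected for table in affected}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(downstream_rerun_order(LINEAGE, "raw_orders"))
```

In a real deployment, each table in the returned list would have to be mapped to the DAG that produces it, and that mapping is exactly the part that currently needs extra tooling.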
Return on Investment
- By using Apache Airflow, we were able to build the data platform and migrate our workloads out of Hevo Data.
- Airflow currently powers the datasets for the entire company, supporting analytics backends, data science, and data engineering use cases.
- We were able to scale from fewer than 1,000 to currently more than 8,000 DAG runs per day using HA and worker scaling.

