Azure Data Lake: A Wonderful, Scalable Cloud Storage Solution for All Your Big Data Needs
Use Cases and Deployment Scope
We stored terabytes of healthcare data in a cost-optimized cloud solution using Azure Data Lake Storage Gen2, organized into containers. We used Azure Data Lake Storage containers as a destination in our StreamSets data engineering pipelines. Azure Data Factory then made the loaded data available to multiple downstream applications automatically and more quickly. It also turned out to be a better, faster, and more cost-effective solution than HDFS for business use cases such as migrating large volumes of data from RDBMS sources to the data lake.
Pros
- Setting up an Azure Data Lake Storage account and container is quite easy
- Accessible from anywhere and easy to maintain
- Integration with the Azure Data Factory service for end-to-end pipelines is pretty easy
- Can quickly store any form of data (structured, semi-structured, unstructured)
Cons
- The UI search feature could certainly be improved, e.g. by supporting wildcards when searching for a particular file in a container
- The UI sometimes hangs or lags while monitoring
- The new UI feature can probably address the issues above
Most Important Features
- Smooth integration with other Azure services, e.g. Azure Databricks, Azure Data Factory, and Azure Synapse
- Easy to access and manage; less maintenance required than traditional storage solutions
- Hadoop File System compatibility
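
To illustrate the Hadoop File System compatibility mentioned above: Hadoop-based tools such as Spark and Hive address ADLS Gen2 paths through the ABFS driver, using URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The snippet below is a minimal sketch that only builds such a URI. The helper name and the account, container, and path values are hypothetical and not taken from our setup.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build a secure ABFS URI for a path inside an ADLS Gen2 container.

    `account` and `container` are placeholders (assumptions); substitute
    your own storage account and container names.
    """
    # Strip leading slashes so the URI never contains "//" after the host.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical example values, for illustration only.
uri = abfss_uri("examplelake", "healthcare", "/raw/claims/2023/")
print(uri)
```

A URI like this can typically be passed wherever an hdfs:// path would be used (for example, as the input path of a Spark read), provided the cluster is already configured with credentials for the storage account.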
Return on Investment
- Data migration projects from relational sources to Azure Data Lake Storage have delivered a great ROI, thanks to lower running costs and high availability
- Pretty easy to work with when managing and accessing data in containers
- Additional features, such as archiving data that is accessed less frequently, can significantly reduce cost
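
As a rough illustration of the archival point above, Azure Storage lets you define lifecycle management rules that move older blobs to cooler tiers. The sketch below only builds a policy document as JSON; it does not apply it. The rule name, prefix, and day thresholds are hypothetical, and the field names reflect the lifecycle policy format as I understand it, so check them against the current Azure documentation before use. It also uses days since last modification as a stand-in for "accessed less frequently", which is an assumption.

```python
import json


def archive_policy(prefix: str, cool_after_days: int, archive_after_days: int) -> dict:
    """Build a lifecycle policy that tiers blobs under `prefix` to cool, then archive.

    All values passed in are illustrative assumptions, not recommendations.
    """
    return {
        "rules": [
            {
                "enabled": True,
                "name": "archive-cold-data",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only block blobs whose path starts with the given prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


# Hypothetical container/prefix and thresholds, for illustration only.
policy = archive_policy("healthcare/raw/", cool_after_days=30, archive_after_days=180)
print(json.dumps(policy, indent=2))
```

The resulting JSON could then be supplied to the storage account's lifecycle management settings through the portal or CLI. Keep in mind that reading archived data requires rehydration first, so this suits data that is rarely needed.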
Alternatives Considered
Apache Hadoop and Google Cloud Storage
Other Software Used
Apache Kafka, Apache Hive, Apache Spark





