Azure Data Lake Storage

Score 9.5 out of 10

33 Reviews and Ratings

What is Azure Data Lake Storage?

Azure Data Lake Storage Gen2 is a highly scalable and cost-effective data lake solution for big data analytics. It combines the power of a high-performance file system with massive scale and economy to help you speed your time to insight. Data Lake Storage Gen2 extends Azure Blob Storage capabilities and is optimized for analytics workloads.

Abhishek Katara

Technical Lead in Information Technology at Tata Consultancy Services (10,001+ employees)

Use Cases and Deployment Scope

We stored terabytes of healthcare data in a cost-optimized cloud solution using Azure Data Lake Storage Gen2 containers. We used Azure Data Lake Storage containers as a destination in our StreamSets data engineering pipelines. Loaded data then became available to multiple downstream applications faster and automatically through Azure Data Factory. It also turned out to be a better, faster, and more cost-effective solution than HDFS for business use cases such as migrating large volumes of data from RDBMS to the data lake.

Pros

Setting up an Azure Data Lake Storage account and container is quite easy
Access from anywhere and easy maintenance
Integration with Azure Data Factory for end-to-end pipelines is straightforward
Can store any form of data (structured, semi-structured, unstructured) quickly

Cons

The UI search feature could be improved, e.g. by supporting wildcards to find a particular file in a container
Sometimes hangs or lags while monitoring
The new UI may address the issues above.
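
Until the portal search supports wildcards, one possible workaround is to filter a path listing on the client side. The following is only a minimal sketch, assuming the paths have already been listed (for example with the Python SDK's `FileSystemClient.get_paths`). The `match_paths` helper and the sample paths are hypothetical and not part of any Azure tool.

```python
from fnmatch import fnmatchcase


def match_paths(paths, pattern):
    """Return the paths whose full name matches a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform; `*` also matches `/`.
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical listing; in practice these names would come from the storage account.
sample_paths = [
    "raw/claims/2023-01-01.parquet",
    "raw/claims/2023-01-02.parquet",
    "raw/patients/2023-01-01.csv",
]

claims_files = match_paths(sample_paths, "raw/claims/*.parquet")
```

Because the filtering happens after listing, this does not reduce the cost of the listing itself; it only makes a specific file easier to find.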

Most Important Features

Smooth integration with other Azure services, e.g. Azure Databricks, Data Factory, Synapse
Easy to access and manage; less maintenance required than traditional storage solutions
Hadoop File System compatibility

Return on Investment

Data migration projects from relational sources to Azure Data Lake Storage have given a great ROI, thanks to lower running costs and high availability
Easy to work with when managing and accessing data in containers
Features such as archiving less frequently accessed data can significantly reduce cost

Alternatives Considered

Apache Hadoop and Google Cloud Storage

Other Software Used

Apache Kafka, Apache Hive, Apache Spark

David Vaughn

Network Engineer in Information Technology at Willis-Knighton Health System (5001-10,000 employees)

Use Cases and Deployment Scope

We had all of our storage located within a single datacenter, which caused an issue should it go down. Azure Data Lake Storage allowed us to move some of the storage there, keeping that piece online and active if we lost communication to the main datacenter. It's nice, but not the most reliable.

Pros

PowerShell integration
Azure AD integration
AdlCopy

Cons

Price is a bit steep
CLI could be better
Permissions are difficult to use compared to their competition

Most Important Features

Azure AD integration
Hierarchical file system
Use of Azure Key Vault for encryption keys

Return on Investment

Initial cost was reasonable compared to the previous solution we had in place
An unforeseen growth rate is leading to a much higher cost than expected
Transaction fees are higher than the previous solution we used

Alternatives Considered

Amazon EMR (Elastic MapReduce)

Other Software Used

Aruba ClearPass, Avaya Cloud Office, Zebra Desktop Printers

Daniel Ortiz

Marketing Manager in Marketing at Numana SEO (51-200 employees)

Use Cases and Deployment Scope

Overall it is easy to learn and would be useful for any home care service. Another thing that I like about it is that there is a phone support system where they help you with any questions you may have. Audio and video calls are possible, along with PC screen sharing.

Allows saving documents for some members or sharing them in general channels

Pros

Provides an overview of any device you will eventually work with in the future.
Having short videos allows me to go back and study precisely the topics I need without sifting through 30-minute videos to find the vignettes I need.

Cons

Useful for studying for the certifications and as a reference at work when you have questions about applying a configuration to the equipment.
The Internet interface is simple and easy to use. Capacity is good, and it's good that HP continues to innovate with this technology.

Most Important Features

An excellent tool that replaces email as a means of communication
Different channels can be created for each area or management level of the company, as well as leisure channels.

Return on Investment

The cost can be high for more advanced work. In some cases, for instance, time limits and lab runtimes may be too short if you are too slow to learn what is explained as you go along.
Promotes flexible team communication. You can create different spaces for different teams, and share files and tasks.

Other Software Used

AbacusLaw, Bautomate, Avonbrook Fortuna DMS

Davide De Pretto

Software Engineer in Engineering at Witailer (51-200 employees)

Use Cases and Deployment Scope

We need to store large amounts of data that flow in daily from our own processes as well as from external APIs, and we need to keep them for long periods of time to perform historical analysis for our clients. Azure Data Lake Storage helps us achieve this goal by providing a secure, fast, and large data store for our needs.

Pros

Store large amounts of data
Access this data quickly using Synapse Analytics or Spark/Databricks
Ingest data quickly so our ingestion APIs are never throttled

Cons

I'd like to see a better cross-platform native client. Azure Data Explorer is fine, but it's far from the "SSMS" kind of experience SQL Server users are used to.
Listing a large number of files is somewhat problematic and slow. Using the native C# library, running directly on an Azure VM, it can take several hours to list just a couple million files.
Switching from V1 to V2 requires the creation of a new Storage Account and that's pretty inconvenient.
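
One common way to shorten long listings is to split the namespace by directory prefix and list the prefixes concurrently. The sketch below illustrates that idea in Python rather than the C# library mentioned above, and it is not the reviewer's code. The `list_in_parallel` helper and the in-memory `FAKE_STORE` are hypothetical; in real use the `list_prefix` callable might wrap the Python SDK's `FileSystemClient.get_paths(path=prefix)`. Whether this helps depends on how the data is laid out and on service-side throttling.

```python
from concurrent.futures import ThreadPoolExecutor


def list_in_parallel(list_prefix, prefixes, max_workers=8):
    """List several directory prefixes concurrently and merge the results.

    list_prefix: callable taking one prefix and returning an iterable of paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `prefixes` in its results.
        per_prefix = pool.map(lambda prefix: list(list_prefix(prefix)), prefixes)
        return [path for paths in per_prefix for path in paths]


# Stand-in for a real listing call, used here so the sketch runs offline.
FAKE_STORE = {
    "2023/01": ["2023/01/a.json", "2023/01/b.json"],
    "2023/02": ["2023/02/c.json"],
}

all_paths = list_in_parallel(lambda prefix: FAKE_STORE[prefix], ["2023/01", "2023/02"])
```

Threads suit this case because each listing call spends most of its time waiting on the network rather than on the CPU.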

Most Important Features

Deep integration within the Azure ecosystem, including C# libraries and Azure Synapse Analytics
Solid internal knowledge
It's a scalable platform that, unlike a relational database, supports rapid growth in data size without ever needing extensive maintenance

Return on Investment

As we are a data company, Azure Data Lake Storage and our "data mart" are really the core of our business
Migrating from a relational approach to Azure Data Lake Storage pays for itself very quickly thanks to the significant reduction in running costs
The ability to move data to the Archive tier helps us further reduce costs when needed
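
Deciding which files to move to the Archive tier usually comes down to how long they have gone untouched. The following is a rough sketch of that selection step only, assuming an inventory of file names and last-modified times is already available. The `archive_candidates` helper, the 180-day threshold, and the sample inventory are all hypothetical. The actual tier change would be made separately, for example through a lifecycle management policy or the `azure-storage-blob` client's `set_standard_blob_tier("Archive")`.

```python
from datetime import datetime, timedelta, timezone


def archive_candidates(files, min_idle_days, now=None):
    """Return names of files not modified for at least `min_idle_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_idle_days)
    return [name for name, last_modified in files if last_modified <= cutoff]


# Hypothetical inventory of (name, last-modified time) pairs.
reference_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
inventory = [
    ("exports/2022-q1.parquet", datetime(2022, 4, 1, tzinfo=timezone.utc)),
    ("exports/2023-q4.parquet", datetime(2023, 12, 20, tzinfo=timezone.utc)),
]

to_archive = archive_candidates(inventory, min_idle_days=180, now=reference_time)
```

Archived data has to be rehydrated before it can be read again, so a threshold like this is best chosen with expected access patterns in mind.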

Alternatives Considered

Azure SQL Database

Other Software Used

Azure SQL Database, Azure Synapse Analytics (Azure SQL Data Warehouse), Azure Service Bus, Azure App Service

Steve Lollar

IT Support Specialist II in Information Technology at Nantahala Outdoor Center (51-200 employees)

Use Cases and Deployment Scope

Azure Data Lake is being utilized in a number of ways for our company, most of all tracking employee meal plans, and other analytical sales data. This is the best solution for our use case, and has worked extremely well. We love that it also integrates with Power BI, which our sales team and marketing folks use heavily.

Pros

Affordable and cost-effective for small and medium-sized businesses.
Regulatory Compliance Metrics
Deployment that's not complicated

Cons

U-SQL is somewhat complex to understand
You cannot use blob APIs, NFS 3.0, and Data Lake Storage APIs to write to the same instance of a file.
The WASB driver experiences issues all the time

Most Important Features

Unlimited Data Size
Fault-Tolerant and Available
Optimized for High-Speed Throughput
True HDFS Compatibility

Return on Investment

Better sales metrics and data for accounting to review
Improved storage capacity
Security and compliance features are unmatched by similar solutions

Alternatives Considered

AWS Data Pipeline