TrustRadius: an HG Insights company

Databricks Data Intelligence Platform

Score 8.6 out of 10

90 Reviews and Ratings

What is Databricks Data Intelligence Platform?

Databricks in San Francisco offers the Databricks Lakehouse Platform (formerly the Unified Analytics Platform), a data science platform and Apache Spark cluster manager. The Databricks Unified Data Service aims to provide a reliable and scalable platform for data pipelines, data lakes, and data platforms. Users can manage the full data journey: ingesting, processing, storing, and exposing data throughout an organization. Its Data Science Workspace is a collaborative environment for practitioners to run all analytic processes in one place and manage ML models across the full lifecycle. The Machine Learning Runtime (MLR) provides data scientists and ML practitioners with scalable clusters that include popular frameworks, built-in AutoML, and optimizations.

One Stop Shop for Data Professionals.

Use Cases and Deployment Scope

Databricks is the primary data platform where we land, standardize, clean, and transform our data sources. We use the Workflows feature to automate recurring tasks and have built internal applications around the reusable workflows. We use the dashboard feature internally to allow customer success teams and business analysts to keep tabs on the performance and outputs of our products. The workloads are orchestrated in Databricks but executed within our own AWS accounts, allowing us to stay compliant with our stringent security requirements.
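As a rough illustration of the land, standardize, clean, and transform sequence described above, here is a minimal sketch in plain Python. It does not use any Databricks or Spark API. The function names, the column names (`quantity`, `unit_price`, `total`), and the sample rows are all hypothetical and chosen only for illustration. The reviewer's actual pipeline is not shown in the review.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def standardize(records: List[Record]) -> List[Record]:
    """Normalize column names: trim whitespace, lowercase, spaces to underscores."""
    return [
        {str(key).strip().lower().replace(" ", "_"): value for key, value in row.items()}
        for row in records
    ]


def clean(records: List[Record]) -> List[Record]:
    """Drop rows that contain a missing (None) or empty-string value."""
    return [
        row for row in records
        if all(value is not None and value != "" for value in row.values())
    ]


def transform(records: List[Record]) -> List[Record]:
    """Derive a hypothetical `total` column from `quantity` and `unit_price`."""
    return [
        {**row, "total": float(row["quantity"]) * float(row["unit_price"])}
        for row in records
    ]


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order, passing the output of one step to the next."""
    for step in steps:
        records = step(records)
    return records


# Hypothetical raw rows as they might look right after landing.
raw = [
    {" Order ID": 1, "Quantity": "2", "Unit Price": "9.50"},
    {" Order ID": 2, "Quantity": None, "Unit Price": "4.00"},
]

result = run_pipeline(raw, [standardize, clean, transform])
print(result)
```

Keeping each stage as a small function that takes and returns a list of rows is one way to make the steps reusable across recurring jobs; on a real platform each stage would more likely operate on a distributed DataFrame than on Python dictionaries.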

Pros

  • Thoughtful application of AI assistants during the coding and analysis steps.
  • Intuitive UI for users of varying skill sets.
  • Frequently updated documentation.

Cons

  • Greater support for non-Spark workloads.
  • Ability to host JAR files on serverless endpoints.

Return on Investment

  • Greater democratization of access to data sources.
  • Migration took a while, as we were largely a Pandas shop.

Alternatives Considered

Snowflake

Other Software Used

Notion, Datadog

Most collaborative Data Science & AI workspace!

Use Cases and Deployment Scope

I use Databricks Lakehouse Platform in my Data Science & AI consulting company to help various business entities with data-driven solutions. The platform can handle large and complex data sets and enables us to build and deploy applications using the latest technologies. The openness of Databricks allows us to seamlessly integrate with and adapt to our clients' requirements:

* Creating dashboards with Tableau, Redash, or Qlik

* Feeding their CRM tools, such as Salesforce or SAP

* Developing chatbots for knowledge management

* Serving ML models behind API endpoints

Databricks Lakehouse Platform is a versatile and open product that saves us a lot of time and helps us control cloud costs and staff effort!

Pros

  • Enhanced Data Science & Data Engineering collaboration
  • Complete Infrastructure-as-code Terraform provider
  • Very easy streaming capabilities
  • Integration with multiple Git providers, including a merge assistant

Cons

  • VS Code IDE support for local development
  • Python SDK for Workflows
  • Poetry support

Most Important Features

  • Unity Catalog
  • Collaborative Spark notebooks supporting Python, SQL, Scala, and R
  • Serverless Endpoints
  • MLflow integration

Return on Investment

  • Data Science environment is ready in a matter of minutes, not days.
  • Much better cost control
  • Easy onboarding for all clouds

Alternatives Considered

Azure Synapse Analytics and Snowflake

Other Software Used

Bitbucket, Azure Data Factory, Tableau Desktop

Data for insights

Pros

  • SQL
  • User friendly
  • Great development environment

Cons

  • Errors are not explained
  • No data backup feature
  • Interface can be more intuitive

Most Important Features

  • Data Warehouse
  • Spark computations
  • Allows SQL, Scala, and R users to collaborate in notebooks

Return on Investment

  • A comprehensive data warehouse for transactions and calculations.
  • Cost-effective, since we use just one tool that does most of what we ask for.
  • Fast business insights with data availability

Alternatives Considered

Snowflake

Other Software Used

Microsoft Azure, Sift Digital Trust & Safety Suite, Automation Anywhere

Best in the industry

Use Cases and Deployment Scope

This product is used for data science project development, from data analysis and wrangling to feature creation, training, fine-tuning, model testing and validation, and finally deployment. While Databricks is used by many users, we also use GitHub and code QA to promote code to production. One of the advantages of Databricks is its integration: not only with Git, but also, whether you use it on Azure or AWS, with the integrated machine learning offerings on those platforms, such as AutoML or Azure ML.

Pros

  • Data science language agnostic (SQL, R, Python, PySpark, Scala)
  • Customer service with REAL support from data engineers and data scientists
  • Integration with many technologies: Tableau, Azure, AWS, Spark, etc.

Cons

  • Visualization
  • Collaboration

Return on Investment

  • Makes data scientists more efficient

Great for both ad hoc analyses and scheduled jobs

Pros

  • Ready-to-use Spark environment with zero configuration required
  • Interactive analysis with notebook-style coding
  • Variety of language options (R, Scala, Python, SQL, Java)
  • Scheduled jobs

Cons

  • Random task failures
  • Hard to debug code
  • Hard to profile code

Most Important Features

  • Scalable processing
  • Notebook-based codebase
  • Scheduled jobs

Return on Investment

  • Supports end-customer dashboards
  • Provides a fast analysis platform
  • Supports BI dashboards for engineering teams