TrustRadius Insights for IBM StreamSets are summaries of user sentiment data from TrustRadius reviews and, when necessary, third party data sources.
Business Problems Solved
Users have found StreamSets to be a versatile and user-friendly platform that solves a variety of data integration challenges. One key use case is the ability to easily develop on-premises and deploy to the cloud, helping users control their cloud budget efficiently. The platform has also been praised for its seamless integration with Apache Kafka and Apache NiFi, simplifying the process of connecting these tools with a data lake.
StreamSets has proven valuable in handling real-time data consumption, filtering, tagging, and monitoring of systems, as well as anomaly detection based on traffic patterns. Users have utilized the platform for data movement, migration, and ingestion, reducing downtime and simplifying the process. Additionally, StreamSets has been widely used for data extraction from various source systems, including IoT devices, enabling users to gain insights from previously inaccessible data sources.
The tool's ability to handle different data formats elegantly and save time compared to hand-coded ETL tools has been appreciated by users. It has been effectively used for solving big data ETL problems, offering fast transfer, support for various sources and destinations, and prompt support. StreamSets has also been utilized in AI/ML tasks such as building transformations for knowledge graphs.
Overall, StreamSets has proven reliable and efficient in handling data ingestion from various sources, meeting the needs of users across industries and providing flexibility in designing pipelines with minimal coding.
I use IBM StreamSets to continuously train AI models with real-time data streams, ensuring my models are always up-to-date with fresh, high-quality data. It helps me to handle schema changes easily and scale data pipelines efficiently. This allows me to automate data ingestion and transformation across diverse sources.
Pros
Real time fraud detection
Helps organisations address personalised customer demands
Risk management makes it easy for organisations to detect potential fraud risks
Cons
Can't handle large data; it lags
Error logs aren't easy to understand
Support system takes time to respond
Likelihood to Recommend
Our development team utilized IBM StreamSets to design Data Pipelines to hydrate/load On-Prem data (from various RDBMS sources) to Cloud, i.e. Azure and GCP. It needs improvement in logging, as the logging mechanism could be simplified (logs can be filtered with "ERROR", "DEBUG", "ALL" etc., but it still takes some time to get familiar with and understand them).
As a Federal Govt Contractor we use IBM StreamSets as a central component of our data integration and pipeline management. We use it to collect, transform, and deliver data securely across the different systems that support our government contracts.
Pros
ability to click & drag to quickly build pipeline
one system platform
central monitoring
Cons
interface can lag with larger data pulls
version control isn't great for teamwork
some complex transformations still require custom work
Likelihood to Recommend
IBM StreamSets is well suited for real-time or batch data integration across different systems, especially where data quality, monitoring, and governance are priorities. It’s less appropriate where a full platform deployment adds unnecessary overhead. It can also be challenging in highly restricted network environments, where configuring collectors and securing communication adds complexity.
So in my organisation we mainly use IBM StreamSets to automate data flows between our CRM and analytics tools. Before it, we did this manually or with another, less effective tool and spent hours moving and cleaning data, which was quite frustrating to be honest. Now we can set up pipelines that run quite smoothly and keep the reports accurate.
Pros
It makes building data pipelines super intuitive, even for non-coders.
It also handles real-time data ingestion effortlessly, so I always have up-to-date information for my reports.
It's great at monitoring data quality as well.
Cons
The error messages aren't always very descriptive, I feel, so troubleshooting can take longer.
More customisation options for scheduling would help; otherwise it works pretty well.
Likelihood to Recommend
I think it is really well suited for scenarios where I need to move and transform data between cloud applications quickly. It's also good for automating routine data cleaning tasks, which saves me a lot of manual effort. Also, it ensures that I have up-to-date information for the reports that are important to me, so that's a big advantage.
I used IBM StreamSets for data analysis. It is a brilliant tool for monitoring data for analysis, and it provides pie charts and graphs in an easily readable format, which lets even a person who is not well trained but knows enough read them efficiently and accurately. The charts and graphs give thorough information about the data without missing any key points.
Pros
Graphs and charts are designed well
Data summation is amazing
Easy to read and understand the summed up information
Cons
Where the person's skill set in data analysis is not that of an expert.
Data monitoring and analysis.
Customer data for better customer acquisition
Likelihood to Recommend
I've used this tool personally to look at data on how many videos a user watches, the categories and subcategories the user is interested in, the professions of the people most interested in a particular kind of video, which institutes and researchers are more inclined towards which categories and subcategories, etc.
P.S. I work in a company which has a catalogue of scientific research and educational videos.
I mainly use IBM StreamSets to stream data from our on-prem systems to cloud applications, where it feeds real-time user applications with the latest information from the business reports that users create on different systems, such as client onboarding applications. That data is then streamed to advisor applications, where advisors build reports from it and use them in their day-to-day work.
Pros
It helps stream the huge volume of data in our Teradata database seamlessly to various reporting applications that run in the cloud.
We also use IBM StreamSets to power a few BI dashboards that our product managers use on a regular basis to showcase various data to clients.
I think the data quality is way better compared to the Informatica tool.
Cons
IBM should make things easy for beginners to get started with IBM StreamSets tool. Most new joinees in my team always find it difficult to do debugging in existing pipelines that we have.
There are integration limitations. It integrates well with Java, but support for other frameworks such as Python and .NET is not as good.
The UI/UX, while intuitive for simple pipelines, sometimes becomes cluttered and hard to navigate when managing complex pipelines involving more data streams.
Likelihood to Recommend
When you are dealing with a data warehouse and want to find an easy way to integrate applications and expose data in real-time, then IBM StreamSets is the best tool to go for. I'm using it for the same purpose in my applications.
This tool will be well-suited for someone with a proper technical background. Though IBM StreamSets UI is mostly drag and drop, advanced configurations require technical expertise or support to do the initial setup.
Verified User
Engineer in Information Technology (10,001+ employees)
The solution allows live ingestion of data from different sources and lets us create customized pipelines, monitor them, and use them for many different purposes. Data is real time and can come in various formats; it is then integrated and collected into one unified layer, from which it is available for control and execution.
Pros
science and analysis of real time data
create reaction chains to specific events
unification of data into one unique layer
Cons
design of the pipeline is complicated
integration of AI to support the user
reporting and monitor dashboards
Likelihood to Recommend
The IBM solution is well suited for environments where there are multiple cloud sources of data that are hard to unify and standardize and that are critical for live monitoring and for taking action when specific parameters are reached. Live monitoring and automatic reaction to data is a great feature of the tool.
Verified User
Project Manager in Information Technology (10,001+ employees)
We use IBM StreamSets for batch loading of data sets between disparate applications into a Data estate so we can query the data to find patterns. We also use IBM StreamSets to handle our continuously streaming data requirements. We went with IBM StreamSets over the competition because of their unique (patented) architecture.
Pros
Real-Time Data Ingestion.
Streaming Pipelines at Scale.
Handling Data Drift and Schema Changes.
Flexibility Across Hybrid/Multi-Cloud/On-Prem Environments.
Cons
Performance handling Large Data Volumes.
Debugging, Error Logging, and Observability.
Connector/Integration Coverage.
Likelihood to Recommend
Because real-world sources often change (new fields get added, formats get tweaked, etc.), StreamSets helps detect and adapt to those "schema drifts" or changes automatically, or with minimal manual intervention. That makes pipelines more resilient and significantly reduces the maintenance burden. Therefore, data sets with constantly changing sources/formats are great for StreamSets.
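To make the "schema drift" idea above concrete, here is a minimal Python sketch of the general technique: compare each incoming record's fields with the fields seen so far, report differences, and widen the known schema instead of failing. This is only an illustration under stated assumptions and not how StreamSets implements drift handling; the function names `detect_drift` and `process` are hypothetical.

```python
from typing import Any, Dict, List, Set


def detect_drift(known_fields: Set[str], record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Compare one incoming record against the fields seen so far."""
    incoming = set(record)
    return {
        "added": sorted(incoming - known_fields),    # fields new in this record
        "missing": sorted(known_fields - incoming),  # fields this record lacks
    }


def process(records: List[Dict[str, Any]]) -> Set[str]:
    """Widen the known schema as new fields appear instead of failing."""
    known: Set[str] = set()
    for i, record in enumerate(records):
        drift = detect_drift(known, record)
        # Skip the report for the very first record, which defines the baseline.
        if known and drift["added"]:
            print(f"record {i}: new fields {drift['added']}")
        known |= set(record)
    return known
```

A pipeline built this way keeps running when a source adds a column, which is the resilience the reviewer describes.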
I use IBM StreamSets for AI-related tasks including continuously training models, as well as for real-time data streaming, handling schema changes, and simply scaling pipelines.
Pros
it connects to many data sources and helps catch issues early with built-in alerts and monitoring tools
it supports real-time and batch processing, handles data drift well, and makes pipeline debugging easier with the updated UI
Cons
it lags when handling large amounts of data
the error logs were sometimes difficult to interpret
support took time to respond when I needed urgent help
Likelihood to Recommend
StreamSets is for teams looking for a fast, low-code way to build ETL pipelines. It’s especially useful if you’re dealing with real-time data or complex source systems.
We use IBM StreamSets as one of our ETL products. We use it for a lot of transformations and for sourcing data from various heterogeneous systems. We mainly moved to IBM StreamSets to shift from our traditional batch process of data aggregation to real-time data aggregation. We also use it for streaming data. IBM StreamSets comes with a lot of inbuilt connectors that make developing integrations more seamless.
Pros
Stream Data
Transformation
Realtime data aggregations
Cons
Error reporting
Java
Upgrade issues
Likelihood to Recommend
IBM StreamSets is well suited for streaming data and near-real-time data aggregation use cases. It comes with multiple built-in connectors and transformations that reduce development effort and time. There are connectors for the majority of cloud providers. They also have connectors for destinations, which include many of the newer databases.
Verified User
Director in Information Technology (5001-10,000 employees)
Being part of one of the Healthcare Service provider accounts, we as a data engineering Team utilized StreamSets to design Data Pipelines to hydrate/load On-Prem data (from various RDBMS sources) to Cloud i.e. Azure, GCP. These Datasets are further utilized by Data scientists and analysts to generate patterns and insights for the healthcare benefits of customers.
We use StreamSets heavily not only for our Batch use cases but for real-time use cases too, like consuming from a Kafka topic and streaming data to Azure Event Hub.
Pros
An easy-to-use canvas to create Data Engineering Pipelines.
A wide range of available Stages, i.e. Sources, Processors, Executors, and Destinations.
Supports both Batch and Streaming Pipelines.
Scheduling is way easier than cron.
Integration with Key-Vaults for Secrets Fetching.
Cons
Monitoring/Visualization can be improved and enhanced a lot (e.g. to monitor a Job to see what happened 7 days back with data transfer).
The logging mechanism can be simplified (logs can be filtered with "ERROR", "DEBUG", "ALL" etc., but it still takes some time to get familiar with and understand them).
Auto Scalability for heavy load transfer (it takes a long time to transfer >5 million records from JDBC to an ADLS destination as Avro files).
There should be a concept of creating Global variables which is missing.
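For readers unfamiliar with the severity filtering mentioned in the logging point above, the following Python sketch shows the general idea of keeping only log lines at a chosen level, with "ALL" keeping everything. The line layout `<date> <time> <LEVEL> <message>` and the function name `filter_logs` are assumptions made for illustration; they are not taken from StreamSets' actual log format or tooling.

```python
from typing import Iterable, List


def filter_logs(lines: Iterable[str], level: str = "ALL") -> List[str]:
    """Keep lines whose third whitespace-separated token equals `level`.

    Assumes a hypothetical layout: "<date> <time> <LEVEL> <message>".
    Passing "ALL" returns every line unchanged.
    """
    if level == "ALL":
        return list(lines)
    kept = []
    for line in lines:
        parts = line.split(maxsplit=3)
        # Lines that do not follow the assumed layout are skipped.
        if len(parts) >= 3 and parts[2] == level:
            kept.append(line)
    return kept
```

Matching on the level token rather than searching the whole line avoids false hits when a message body happens to contain the word "ERROR".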
Likelihood to Recommend
Majorly for all Batch and Streaming Scenarios we are designing StreamSets pipelines; a few best-suited and tried-out use cases are below:
1. JDBC to ADLS data transfer based on source refresh frequency.
2. Kafka to GCS.
3. Kafka to Azure Event Hub.
4. HDFS to ADLS data transfer.
5. Schema generation to generate Avro.
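As a rough illustration of the last use case, schema generation for Avro, the sketch below infers an Avro record schema from a single flat Python dictionary. It is a simplified stand-in, not the StreamSets schema generator: the function names `avro_type` and `infer_avro_schema` and the record name in the usage line are hypothetical, and nested or nullable fields are deliberately left out.

```python
import json
from typing import Any, Dict


def avro_type(value: Any) -> str:
    """Map a Python value to an Avro primitive type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def infer_avro_schema(name: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Avro record schema from one flat sample record."""
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": k, "type": avro_type(v)} for k, v in record.items()],
    }


if __name__ == "__main__":
    # Hypothetical sample row; prints the schema as JSON.
    print(json.dumps(infer_avro_schema("Row", {"id": 1, "name": "a"}), indent=2))
```

Inferring from one sample is the simplest possible design; a real generator would look at many records and emit unions such as `["null", "string"]` for optional fields.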
The easy-to-design Canvas, Job scheduling, Fragment creation and utilization, and a wide range of inbuilt Stages make it an even more favorable tool for me to design data engineering pipelines.