StreamSets

  • 1.  Data Volume Validation

    Posted 15 days ago

    Hello IBM Team,

    We would like to confirm whether the following scenarios can be implemented using IBM StreamSets:

    1. Data volume validation
      Is it possible to validate the amount of data processed by StreamSets? We have a Kafka topic that is loaded through a batch process.

    2. Pipeline execution alerts
      Is there a way to configure automatic alerts (e.g., email notifications) when a pipeline has not been executed or fails to run as expected?

    We would appreciate your guidance or best practices related to these scenarios.

    Best regards,
    Rahul



    ------------------------------
    Rahul Dharmawat
    ------------------------------


  • 2.  RE: Data Volume Validation

    Posted 14 days ago
    Edited by Eric Greisdorf 14 days ago

    Hi Rahul,

    Yes, absolutely, these are common scenarios for IBM StreamSets.

    1. Data volume validation - This can be accomplished several ways, depending on the requirements:
      a. A pipeline uses standard stages (e.g., Control Hub API or JDBC Query) to retrieve and compare record counts and data, and a Stream Selector stage to route/alert accordingly.
      b. REST API to retrieve Job metrics.
      c. Python SDK to retrieve Job metrics.
      Would you like more information on a specific approach?
    2. Pipeline execution alerts - We've posted this video training to the community library for setting up rules, alerts and subscriptions. A 'Pipeline Commit' subscription is used for this GitHub repository integration example.
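
    For the "pipeline has not been executed" half of the alerting question, the decision logic can be sketched independently of StreamSets. This is only an illustration under stated assumptions: the function name, its inputs, and the idea of feeding it the last run time and outcome are hypothetical, and it is not the rules/subscriptions feature itself.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def alert_reason(
    last_run: Optional[datetime],
    last_run_succeeded: bool,
    now: datetime,
    max_gap: timedelta,
) -> Optional[str]:
    """Return a short reason string if an alert should fire, else None.

    Hypothetical helper: how last_run and last_run_succeeded are obtained
    (job history, audit table, ...) is outside this sketch.
    """
    if last_run is None:
        return "pipeline has never run"
    if not last_run_succeeded:
        return "last run failed"
    if now - last_run > max_gap:
        return "no run within the expected interval"
    return None


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
yesterday = now - timedelta(hours=30)
print(alert_reason(yesterday, True, now, timedelta(hours=24)))
```

    The returned reason could then become the body of an email or other notification; how it is delivered depends on your environment.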

    Let us know if these get you started, and anything we can help with.

    Best Regards,



    ------------------------------
    Eric Greisdorf
    ------------------------------



  • 3.  RE: Data Volume Validation

    Posted 14 days ago

    Thanks a lot, Eric, for your reply.

    I would really appreciate more detail on the options for data volume validation. We load the data through a Kafka topic (via batch), and there is no specific window for the Kafka topic.

    The Windowing Aggregator is not the right option for us, and the Groovy Evaluator is not working due to our environment setup.

    Do you have more material on the first approach: a pipeline that uses standard stages (e.g., Control Hub API or JDBC Query) to retrieve and compare record counts and data, and a Stream Selector stage to route/alert accordingly?

    I really appreciate your help.



    ------------------------------
    Rahul Dharmawat
    ------------------------------



  • 4.  RE: Data Volume Validation

    Posted 13 days ago

    Hi Rahul,

    An excellent option is to use the Control Hub API Processor to make the JobRunner REST calls, e.g. /jobrunner/rest/v1/metrics/job/{jobId}

    This will return the record counts and timings for each Pipeline stage.

    Then use a Lookup Processor / Executor to count the expected records, or inspect an audit trail.
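
    The comparison step described above can be sketched roughly as follows. Only the endpoint path comes from this thread; the base URL, the tolerance parameter, and both function names are assumptions for illustration, and the shape of the metrics response is not shown here, so extracting the actual count from it is left out.

```python
from urllib.parse import quote

# Path as quoted in the post above; everything else here is illustrative.
METRICS_PATH = "/jobrunner/rest/v1/metrics/job/{jobId}"


def metrics_url(base_url: str, job_id: str) -> str:
    """Build the job-metrics URL for a Control Hub base URL (assumed value)."""
    path = METRICS_PATH.replace("{jobId}", quote(job_id, safe=""))
    return base_url.rstrip("/") + path


def validate_volume(actual: int, expected: int, tolerance: float = 0.0) -> dict:
    """Compare processed vs. expected record counts.

    tolerance is a fraction of expected (0.01 allows a 1% deviation).
    """
    if actual < 0 or expected < 0 or tolerance < 0:
        raise ValueError("counts and tolerance must be non-negative")
    difference = actual - expected
    return {
        "actual": actual,
        "expected": expected,
        "difference": difference,
        "ok": abs(difference) <= expected * tolerance,
    }


# Hypothetical values: 995 records seen by the job, 1000 expected from an audit count.
print(metrics_url("https://controlhub.example.invalid", "job-123"))
print(validate_volume(995, 1000, tolerance=0.01))
```

    In a pipeline, the "ok" flag plays the role of the condition a Stream Selector would route on; outside a pipeline, the same check could run in a scheduled script.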

    For more details specific to your environment and constraints, I recommend reaching out to your IBM account team.

    Regards,





    ------------------------------
    Eric Greisdorf
    ------------------------------



  • 5.  RE: Data Volume Validation

    Posted 13 days ago

    Thanks, Eric, for your reply. Do you have any documentation for this? I will also try to connect with the IBM team.



    ------------------------------
    Rahul Dharmawat
    ------------------------------