Cloud Pak for Data

  • 1.  Data Refinery long run issue

    Posted Tue October 11, 2022 10:37 AM
    Hi Teams,

    We are helping our customer use Cloud Pak for Data. When we use the Data Refinery feature, we run into a long-running job (it started on 10/7 and is still running as of 10/11, with no error).
    Our flow processes a table with three operations and then exports the result to CSV:
    1. filter data by date
    2. split date field (like YYYYMMDD -> YYYYMM and DD)
    3. group by some fields
    4. export to csv
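
    For anyone who wants to reproduce the shape of this flow outside Data Refinery (for example, to check expected output on a small sample), the four steps above can be sketched roughly as follows. This is only an illustration in pandas, not what Data Refinery runs internally; the column names `date`, `region`, and `amount`, the `refine` helper, and the sample rows are all hypothetical.

```python
import pandas as pd


def refine(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Filter, split, and aggregate a table with a YYYYMMDD string date column.

    Column names ("date", "region", "amount") are assumptions for this sketch.
    """
    # 1. Filter by date. YYYYMMDD strings sort in calendar order,
    #    so an inclusive string comparison is enough here.
    out = df.loc[df["date"].between(start, end)].copy()

    # 2. Split the date field by fixed-width slicing: YYYYMMDD -> YYYYMM and DD.
    out["year_month"] = out["date"].str[:6]
    out["day"] = out["date"].str[6:8]

    # 3. Group by some fields and aggregate.
    return out.groupby(["year_month", "region"], as_index=False)["amount"].sum()


# Hypothetical sample data, for illustration only.
sample = pd.DataFrame(
    {
        "date": ["20221007", "20221008", "20221107", "20210101"],
        "region": ["north", "north", "south", "north"],
        "amount": [10, 5, 7, 99],
    }
)

result = refine(sample, "20220101", "20221231")

# 4. Export to CSV (returned as text here; pass a path to write a file).
csv_text = result.to_csv(index=False)
print(csv_text)
```

    Because the date format is fixed-width, the split in step 2 is done with plain substring slicing rather than a pattern-based split. Whether an equivalent substring operation avoids the slowdown inside Data Refinery is not something this sketch can confirm.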

    Is there any way to troubleshoot this? Thanks!




    ------------------------------
    Chun Hsiang Wu
    ------------------------------

    #CloudPakforDataGroup


  • 2.  RE: Data Refinery long run issue

    Posted Thu October 13, 2022 11:44 AM

    Thank you for your comment. This is a performance issue for the Split operation. We are actively working on a solution for an upcoming release.



    ------------------------------
    Nancy Weir
    ------------------------------



  • 3.  RE: Data Refinery long run issue

    Posted Sun October 16, 2022 11:13 PM

    Hi @Nancy Weir,

    Thanks for your reply!

    Is there an alternative we can offer if we advise our customer to remove this Split operation?
    Also, besides the PRD environment, we ran the same flow on the QAS environment, which has less data, and it works there.
    Are there any test results showing at what data volume the performance issue starts to appear?

    Thanks for all your assistance!



    ------------------------------
    Chun Hsiang Wu
    ------------------------------



  • 4.  RE: Data Refinery long run issue

    Posted Fri October 14, 2022 08:59 AM
    Have you tried using Modeler flows? The Modeler engine may give you better performance.

    ------------------------------
    Sergio Gutierrez
    ------------------------------



  • 5.  RE: Data Refinery long run issue

    Posted Mon October 17, 2022 12:04 AM

    Hi Sergio,

    Thanks for your reply!
    However, our DataOps project is steering the customer toward Data Refinery to speed up communication with their users.
    Therefore, we cannot switch tools at this stage.
    Is there any other alternative within Data Refinery?



    ------------------------------
    Chun Hsiang Wu
    ------------------------------



  • 6.  RE: Data Refinery long run issue

    Posted Wed October 26, 2022 10:29 PM
    It seems that our runtime can't handle the data size.
    We will use a Spark runtime instead of Default Data Refinery XS to process it.
    • Spark & R 3.6 environments
    • Default Data Refinery XS
    • Hadoop

    Reference:
    https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=environments-data-refinery

    Thanks 



    ------------------------------
    Chun Hsiang Wu
    ------------------------------