Open Source Databases

  • 1.  Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Thu January 13, 2022 05:07 PM

    Greetings. I am looking into alternatives for loading data from Streams 4.1.1 into GCP BigQuery and was wondering if anyone has recommendations, given that there are multiple options. The two I am considering are the Storage Write API and the REST insertAll API. Google recommends the Storage Write API, but please let me know if you have had positive or negative experiences with either one in terms of implementation, support, or security challenges. Thanks.






    #OpenSourceOfferings
    #Streams
    #Support
    #SupportMigration


  • 2.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Thu January 13, 2022 05:34 PM

    Hi v.cintron,

    I'm not aware of any Streams customers directly using the Storage Write API. A couple of our IBM Streams 4.x customers do read from and write to GCP BigQuery tables from their real-time Streams applications. They use the REST APIs published by GCP. It has been working very well for them for the past three years. I helped them with that in 2019. You can use the versatile HttpPost operator from this Streams toolkit, which is available free of charge.

    https://github.com/IBMStreams/streamsx.websocket

    Good Luck,

    Senthil.








  • 3.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Fri January 14, 2022 03:44 PM

    Thanks Senthil. The Storage Write API is newer and would require a Java operator, which should be doable, but REST should be easier to implement. Any thoughts on how to handle OAuth with the HTTP POST operator? I could not find anything in the docs, since the version we have is old (4.1.1).








  • 4.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Fri January 14, 2022 04:31 PM

    Hi v.cintron,

    Please note that I pointed you to a new Streams toolkit (https://github.com/IBMStreams/streamsx.websocket) that includes an improved version of the HttpPost operator, which provides better throughput, text and binary data sending, persistent HTTP connections, etc. This particular HttpPost operator is different from the HTTPPost operator in the streamsx.inet toolkit that is shipped in the Streams 4.x product. In the new HttpPost operator, you can form your own custom request headers and then let the operator send them to the remote HTTP server.


    The Google BigQuery documentation says: "Set the token in the Authorization request header with the value Bearer ACCESS_TOKEN." You can create a custom header inside your Streams application as an SPL map with the key "Authorization" and the value "Bearer YOUR_ACCESS_TOKEN", and then pass this SPL map via the requestHeaders attribute in your input tuple to the HttpPost operator. You can refer to the HttpPostTester example available at the toolkit URL shown above and search for requestHeaders in that SPL file. Please try it and tell me if this works for you.
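
    To check the header and payload outside of Streams before wiring them into the HttpPost operator, a minimal Python sketch of the same REST call may help. It only builds the request and does not send it. The function name build_insert_all_request and the project, dataset, table, and token values are hypothetical placeholders, and the sketch assumes a valid OAuth access token has already been obtained by some other means.

```python
import json
import urllib.request


def build_insert_all_request(project, dataset, table, access_token, rows):
    """Build (but do not send) a BigQuery tabledata.insertAll REST request.

    All argument values are placeholders supplied by the caller; obtaining
    and refreshing the OAuth access token is outside this sketch.
    """
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project}/datasets/{dataset}/tables/{table}/insertAll"
    )
    # Same header the SPL map would carry in the requestHeaders attribute.
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    # insertAll expects each row wrapped in an object with a "json" field.
    body = json.dumps({"rows": [{"json": row} for row in rows]}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_insert_all_request(
        "my-project", "my_dataset", "my_table", "YOUR_ACCESS_TOKEN",
        [{"id": 1, "name": "example"}],
    )
    print(req.get_method(), req.full_url)
    print(req.get_header("Authorization"))
```

    Passing the returned object to urllib.request.urlopen would perform the actual POST. The URL, headers, and body shown here are what the Streams application would need to reproduce in the tuple it sends to the operator.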


    Regards,

    Senthil.









  • 5.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Fri January 21, 2022 10:26 PM

    Thank you very much for the input. I will give that a shot and also see if Storage Write works as well. I'll keep you posted and share results.








  • 6.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Tue January 25, 2022 03:37 AM

    Senthil, I was finally able to build, but it looks like Streams 4.2 is required for the websocket toolkit. Do you know if it's possible to make the same REST POST call with the streamsx.inet toolkit?








  • 7.  RE: Recommendation for Streams 4.1.1 Data Load to Google BigQuery

    Posted Tue January 25, 2022 12:03 PM

    Hi v.cintron,

    You will need a minimum Streams version of 4.2 to use the streamsx.websocket toolkit. If you really want to, you can make it work on Streams 4.1. Edit the streamsx.websocket/com.ibm.streamsx.websocket/info.xml file, change 4.2.1.0 to 4.1.0.0, save it, and rebuild the toolkit using the same ant commands that you used before. You can quickly try this and see if it works for you.
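
    If you would rather script that edit than do it by hand, a minimal Python sketch could look like the following. The function name lower_required_version is hypothetical, and the sketch assumes the version string 4.2.1.0 appears literally in info.xml. It does a plain text replacement rather than parsing the XML, and it keeps a backup copy next to the original file.

```python
from pathlib import Path


def lower_required_version(info_xml, old="4.2.1.0", new="4.1.0.0"):
    """Replace the required Streams version string in a toolkit info.xml.

    Plain text replacement; writes a .bak copy of the original first.
    Returns the number of occurrences that were replaced.
    """
    path = Path(info_xml)
    text = path.read_text(encoding="utf-8")
    count = text.count(old)
    if count == 0:
        raise ValueError(f"{old} not found in {path}")
    # Keep the untouched original so the change is easy to revert.
    Path(str(path) + ".bak").write_text(text, encoding="utf-8")
    path.write_text(text.replace(old, new), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Path taken from the post above, relative to where the toolkit was cloned.
    n = lower_required_version(
        "streamsx.websocket/com.ibm.streamsx.websocket/info.xml"
    )
    print(f"replaced {n} occurrence(s)")
```

    After running it, rebuild the toolkit with the same ant commands as before.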


    If that doesn't work, please tell me. We can then discuss how you can use the streamsx.inet toolkit to accomplish what you are trying to do. As I already said, the streamsx.websocket toolkit's HttpPost operator will give you much better throughput than the operator available in the streamsx.inet toolkit.


    In any case, we will get it working for you via one of those two toolkits.


    All the very best.





