API Connect

  • 1.  Streaming support in API Connect

    Posted Mon November 04, 2024 10:55 AM

    I am wondering whether IBM API Connect supports API requests that stream the data contents of the request.

    The use case is uploading very large files (200 MB+): the call should stream through API Connect to the backend without building up too much runtime memory while doing so.

    For example:

     curl -vvv -H "Content-Type: application/octet-stream" -X POST -T data.bin <APIC API ENDPOINT>

    With the default behavior, even in a simple PoC, the curl upload hangs while the file is being sent, DataPower (DP) memory grows, and the backend is invoked with the file only once the whole file has been uploaded to the APIC side.

    As far as I understand, the underlying DataPower can support this, but how can the API/APIC configuration be set to leverage it?
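    For scripted tests, the curl upload above can be reproduced with a minimal Python sketch (standard library only; the function name stream_upload is hypothetical). It hands an open file object to http.client together with an explicit Content-Length, so the client sends the file in fixed-size blocks instead of reading it into memory first. HTTPS certificate checks are left at Python's defaults, unlike the -k flag used in some of the curl calls in this thread.

```python
import http.client
import os
from urllib.parse import urlsplit


def stream_upload(url, path, content_type="application/octet-stream", timeout=300):
    """POST the file at `path` to `url` without loading it into memory.

    Hypothetical helper: http.client reads a file-object body block by block
    (8 KiB by default) and writes each block to the socket as it goes.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    headers = {
        "Content-Type": content_type,
        # An explicit length avoids chunked transfer encoding for a regular file.
        "Content-Length": str(os.path.getsize(path)),
    }
    try:
        with open(path, "rb") as body:
            conn.request("POST", target, body=body, headers=headers)
            resp = conn.getresponse()
            return resp.status, resp.read()
    finally:
        conn.close()
```

    Usage, with the same placeholder endpoint as the curl call: status, reply = stream_upload("https://<APIC API ENDPOINT>", "data.bin").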



    ------------------------------
    Vid Romac
    ------------------------------


  • 2.  RE: Streaming support in API Connect

    Posted Mon November 04, 2024 03:35 PM

    Hi Vid,
    Yes, streaming is enabled by default in an API Gateway API. You can explicitly request buffering by setting x-ibm-configuration.buffering to true. When the request is streamed, the only things in the API assembly that read data off the stream are a parse policy or a GatewayScript that calls a context.message.body.readAsXXXX function. Without those, an invoke policy should read from the stream and write the data to your backend server. So the first question is: what does your API look like from a policy perspective? That's quite a large file to post to the gateway, and eventually the API will need to read from the stream to write to your backend. What you perceive as buffering might simply be API policies executing very quickly before the data needs to be consumed, and the memory growth could be explained by DataPower reading from the stream faster than the data can be pushed out onto the network. That's definitely an educated guess on my part, so it might be worthwhile to open a PMR to get a detailed explanation.
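    To make the streaming-versus-buffering distinction concrete, here is a toy Python model (not DataPower code; both function names are hypothetical). A relay that forwards each chunk as soon as it is read only ever holds one chunk, while a relay that reads the whole body first, which is roughly what a parse policy or a readAsXXXX call forces, holds the entire payload at once.

```python
def relay_streaming(src, dst, chunk_size=64 * 1024):
    """Forward data chunk by chunk; return the most bytes held at any one time."""
    peak = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return peak
        peak = max(peak, len(chunk))
        dst.write(chunk)  # pushed downstream before the next read


def relay_buffered(src, dst):
    """Read the whole body before forwarding; return the bytes held at once."""
    body = src.read()  # the complete payload sits in memory here
    dst.write(body)
    return len(body)
```

    With a 400 MB body, the first model holds at most one chunk and the second holds all 400 MB, which is why it matters whether anything in the assembly consumes the body before the invoke.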
    Best Regards,
    Steve Linn



    ------------------------------
    Steve Linn
    Senior Consulting I/T Specialist
    IBM
    ------------------------------



  • 3.  RE: Streaming support in API Connect

    Posted Tue November 05, 2024 03:44 AM

    My API assembly is bare-bones for the PoC and only contains the invoke step:

      assembly:
        execute:
          - invoke:
              target-url: $(service-url)
              verb: POST

    I disagree that this is only perceived buffering. I've tested it with various file sizes (50, 500 and 5000 MB): each larger file takes longer to process and leaves a larger memory footprint on DataPower before the request is finally declined, because the backend is currently a mock service. So APIC waits for the whole file before calling the backend, which then rejects it. If the request were streamed, I would expect the rejection to come as soon as possible, no matter whether the file is 1 MB or 1000 MB.



    ------------------------------
    Vid Romac
    ------------------------------



  • 4.  RE: Streaming support in API Connect

    Posted Tue November 05, 2024 09:52 AM

    Hi Vid,
    I'm not sure I follow you on the denial you mention. Are you saying your mock backend consumes this request and then rejects it? Does your mock backend need to consume/buffer the request before it responds? Even though the payload is being streamed to the backend, I would expect DataPower memory to grow, because that payload is placed into context in request.body, so memory is consumed. The invoke policy also must wait for the backend to respond before it continues. The question is whether the request payload is also being streamed to your backend.

    I think a DataPower packet capture would be useful here. Since your API call uses HTTPS, don't forget to enable the SSL session keys on the packet capture so that part of the traffic can be decrypted in Wireshark. You should then be able to see the timestamp of each packet being sent. If your huge-file tests show that the inbound request starts to arrive at some time and the SYN to the mock service happens many seconds (minutes?) later, then the request is indeed being buffered. If you almost immediately see interleaved traffic between the inbound request to DataPower and the outbound connection to your mock service, then the request is being streamed, and I'd expect the DataPower latency to be due to the invoke policy waiting on an HTTP 400 from your mock backend.
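    The capture reading described above can be written down as a small heuristic. This Python sketch is hypothetical: it assumes you have already exported two lists of packet timestamps (in seconds) from the capture, one for the client-to-DataPower packets that carry request body data and one for all packets on the DataPower-to-backend connection (starting with its SYN), and it only checks whether backend traffic begins before the inbound upload ends.

```python
def classify_capture(inbound_ts, outbound_ts):
    """Classify an upload as streamed or buffered from packet timestamps.

    inbound_ts:  timestamps of client -> DataPower packets carrying request body data
    outbound_ts: timestamps of packets on the DataPower -> backend connection
    Returns (verdict, seconds from first inbound packet to first outbound packet).
    """
    if not inbound_ts or not outbound_ts:
        raise ValueError("both connections need at least one packet")
    first_in, last_in = min(inbound_ts), max(inbound_ts)
    first_out = min(outbound_ts)
    delay = first_out - first_in
    # Backend traffic that starts while the client is still uploading means the
    # gateway forwards data as it arrives (interleaving); otherwise it waited.
    verdict = "streamed" if first_out < last_in else "buffered"
    return verdict, delay
```

    A large delay together with a "buffered" verdict matches the buffering case described above; an early first outbound packet matches the streaming case.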

    If the packet capture doesn't answer some questions, I'd recommend you open a PMR.

    Best Regards,
    Steve Linn



    ------------------------------
    Steve Linn
    Senior Consulting I/T Specialist
    IBM
    ------------------------------



  • 5.  RE: Streaming support in API Connect

    Posted Tue November 05, 2024 10:17 AM

    That was the first PoC version, which had no proper backend that would accept such an upload; it was just meant to show the behavior on the first hop.

    The new version has a proper backend that accepts the upload and streams the file it receives into the file system. Here is the Python Flask app that listens on the backend (if I made an error on this side, please tell me, as this is my first attempt at a Flask app). The file being uploaded is 400 MB.

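    import os
    import time

    from flask import Flask, request

    app = Flask(__name__)
    # Assumed value: the original snippet sets UPLOAD_FOLDER elsewhere
    app.config['UPLOAD_FOLDER'] = 'uploads'
    os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)

    @app.route('/upload', methods=['POST'])
    def upload_file():
        # Timestamp the request so its arrival can be compared with the client upload
        timestr = time.strftime("%Y%m%d-%H%M%S")
        print("Request received at " + timestr)
        filename = "output_" + timestr
        file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)

        # Copy the body to disk in 4 KiB chunks instead of loading it into memory
        with open(file_path, "wb") as f:
            chunk_size = 4096
            while True:
                chunk = request.stream.read(chunk_size)
                if not chunk:  # an empty read means the body is exhausted
                    break
                f.write(chunk)
        return 'File successfully uploaded', 200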
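    One way to tell from the backend side whether the gateway streams is to time the copy loop. If APIC buffers, the backend should receive the whole body in one fast burst after the client has finished; if it streams, the copy should span roughly the client's upload time. Below is a small, hypothetical helper (standard library only) that could stand in for the inner loop above, e.g. copy_with_timing(request.stream, f). It returns the byte count, the delay until the first chunk and the total copy time.

```python
import time


def copy_with_timing(src, dst, chunk_size=4096, clock=time.monotonic):
    """Copy src to dst chunk by chunk and report when the data arrived.

    Returns (total_bytes, seconds_to_first_chunk, seconds_total); the first-chunk
    value is None for an empty body. `clock` is injectable so it can be tested.
    """
    start = clock()
    first_chunk_at = None
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if first_chunk_at is None:
            first_chunk_at = clock() - start
        dst.write(chunk)
        total += len(chunk)
    return total, first_chunk_at, clock() - start
```

    Logged next to the client's own start and end times, a copy that starts only after the client has finished and then completes very quickly points to buffering in front of the backend.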

    The APIC API definition is:

    swagger: '2.0'
    info:
      title: OpenAPI definition
      version: 0.0.2
      x-ibm-name: streaming-demo
    host: $(catalog.host)
    basePath: /streaming-demo
    schemes:
    - https
    x-ibm-configuration:
      testable: true
      enforced: true
      cors:
        enabled: true
      assembly:
        execute:
          - invoke:
              target-url: $(service-url)
              verb: POST
      gateway: datapower-gateway
      properties:
        service-url:
          value: ''
          description: ''
          encoded: false
      catalogs:
        mycatalog:
          properties:
            service-url: <python backend>/upload
    
    consumes:
      - application/octet-stream
    paths:
      /server/receive:
        post:
          parameters: []
          responses:
            '200':
              description: OK
          tags:
          - server-file-controller
          operationId: receive

    The upload from the client is done via

    curl -k -vvv -H "Content-Type: application/octet-stream" -X POST -T data_small.bin  <APIC>/streaming-demo/server/receive

    However, APIC still behaves the same way and hogs memory during the upload: usage grows from 2.5 GB to 3.6 GB of RAM. With multiple parallel uploads, all the memory could easily be used up.

    Is there no way to tell APIC to discard/release the payload once it has been sent on?



    ------------------------------
    Vid Romac
    ------------------------------



  • 6.  RE: Streaming support in API Connect

    Posted Mon November 25, 2024 04:02 PM

    Vid,

    Check your API definition, as that's most likely where your issue is. If you have something defined as required in your YAML, such as the body, APIC will buffer the request to ensure that the required field is present.

    A really easy way to test this is to create a shell API that has no requirements, such as the sample below. Make sure you are streaming from your client application, and the request should then stream to the backend.

    Matt

    swagger: '2.0'
    info:
      title: stream
      x-ibm-name: stream
      version: 1.0.0
    x-ibm-configuration:
      cors:
        enabled: false
      gateway: datapower-api-gateway
      type: rest
      phase: realized
      enforced: true
      testable: true
      assembly:
        execute:
          - invoke:
              title: invoke
              version: 2.0.0
              verb: keep
              target-url: $(target-url)
              follow-redirects: false
              timeout: 60
              persistent-connection: true
              chunked-uploads: true
      properties:
        target-url:
          value: http://example.com/operation-name
          encoded: false
      activity-log:
        enabled: true
        success-content: activity
        error-content: payload
    basePath: /stream
    paths:
      /:
        post:
          responses:
            '200':
              description: success
              schema:
                type: string
          consumes: []
          produces: []
    schemes:
      - https
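    To look for the kind of "required" declarations Matt describes, a short script can walk a Swagger 2.0 definition and list every parameter marked required: true. This is a sketch with a hypothetical function name: it assumes the YAML has already been loaded into a Python dict (with a YAML parser of your choice, not shown), and it does not resolve shared parameters referenced through $ref.

```python
# Operation keys allowed under a Swagger 2.0 path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def required_parameters(api):
    """Return (path, method, name, location) for each parameter marked required.

    `api` is a Swagger 2.0 definition already parsed into a dict.
    Path-level parameters are applied to every operation on that path;
    $ref'd shared parameters are not resolved in this sketch.
    """
    found = []
    for path, item in (api.get("paths") or {}).items():
        path_params = item.get("parameters") or []
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue
            for p in path_params + (op.get("parameters") or []):
                if p.get("required"):
                    found.append((path, method, p.get("name"), p.get("in")))
    return found
```

    Matt's sample above declares no parameters at all, so this check reports nothing for it.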
    


    ------------------------------
    Matt
    ------------------------------



  • 7.  RE: Streaming support in API Connect

    Posted Mon November 25, 2024 05:06 PM

    Hi Matt and Vid,
    There are only two things in an API yaml (which will translate to the DataPower API Definition object) that would enable buffering.
    1. The presence of a map policy.
    2. x-ibm-configuration.buffering set to true.
    In Matt's example there is no map policy and no buffering setting, so the sample API should be streamed.
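    The two triggers listed above can be checked mechanically. The Python sketch below (hypothetical function name) assumes the API YAML has already been parsed into a dict; it walks the whole assembly recursively so that policies nested inside constructs such as switch cases are also found. It simply looks for any key named map, so treat a hit as a prompt to inspect the assembly rather than as proof.

```python
def buffering_triggers(api):
    """List the two buffering triggers named above in a parsed API definition."""
    cfg = api.get("x-ibm-configuration") or {}
    reasons = []
    if cfg.get("buffering") is True:
        reasons.append("x-ibm-configuration.buffering is true")

    def walk(node):
        # Recurse through dicts and lists, flagging every policy keyed "map".
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "map":
                    reasons.append("map policy present")
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(cfg.get("assembly") or {})
    return reasons
```

    For both API definitions posted in this thread, neither trigger is present, so the function returns an empty list.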
    Best Regards,
    Steve Linn



    ------------------------------
    Steve Linn
    Senior Consulting I/T Specialist
    IBM
    ------------------------------



  • 8.  RE: Streaming support in API Connect

    Posted Tue November 26, 2024 04:49 AM

    I gave my entire APIC API specification YAML in my message above, as well as the client code, and as you can see, nothing in it would trigger buffering: nothing is marked as required, there is no map policy, and there is no buffering flag. And yet the behavior is not as expected.



    ------------------------------
    Vid Romac
    ------------------------------



  • 9.  RE: Streaming support in API Connect

    Posted Tue November 26, 2024 05:09 AM

    Correction: that was the backend code.

    The client code is given in my first message (a curl call).



    ------------------------------
    Vid Romac
    ------------------------------