Original Message:
Sent: Tue November 26, 2024 04:48 AM
From: Vid Romac
Subject: Streaming support in API Connect
I gave my entire APIC API specification yaml content in my message above, as well as the client code, and as you can see, there is nothing in it that would trigger buffering: nothing marked as required, no map policy, no buffering flag. And yet the behavior is not as expected.
------------------------------
Vid Romac
Original Message:
Sent: Mon November 25, 2024 05:06 PM
From: Steve Linn
Subject: Streaming support in API Connect
Hi Matt and Vid,
There are only two things in an API yaml (which will translate to the DataPower API Definition object) that would enable buffering.
1. The presence of a map policy.
2. If x-ibm-configuration.buffering has a value of true
In Matt's example, no map, no buffering, so the sample API should be streamed.
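The two conditions listed above can be checked mechanically. What follows is a minimal Python sketch, not an IBM tool. It assumes the API yaml has already been parsed into a dict (with any YAML library), and the function names contains_policy and buffering_triggers are made up for illustration. It encodes only the two rules stated in this message.

```python
def contains_policy(node, policy_name):
    """Return True if any dict nested inside `node` has `policy_name` as a key."""
    if isinstance(node, dict):
        return policy_name in node or any(
            contains_policy(value, policy_name) for value in node.values()
        )
    if isinstance(node, list):
        return any(contains_policy(item, policy_name) for item in node)
    return False


def buffering_triggers(api):
    """List which of the two buffering conditions apply to a parsed API definition."""
    config = api.get("x-ibm-configuration") or {}
    triggers = []
    # Rule 1: a map policy anywhere in the assembly (assumed to appear as a "map" key).
    if contains_policy(config.get("assembly"), "map"):
        triggers.append("map policy present")
    # Rule 2: the explicit buffering flag.
    if config.get("buffering") is True:
        triggers.append("x-ibm-configuration.buffering is true")
    return triggers


# Matt's sample API reduced to the parts that matter here.
sample = {
    "x-ibm-configuration": {
        "gateway": "datapower-api-gateway",
        "assembly": {"execute": [{"invoke": {"target-url": "$(target-url)"}}]},
    }
}
print(buffering_triggers(sample))  # prints []
```

An empty list means neither condition was found, which matches the expectation that the sample API should be streamed.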
Best Regards,
Steve Linn
------------------------------
Steve Linn
Senior Consulting I/T Specialist
IBM
Original Message:
Sent: Mon November 25, 2024 04:01 PM
From: Matt E
Subject: Streaming support in API Connect
Vid,
Check your API definition as that's most likely where your issue will be. If you have something defined as required in your yaml, such as body, APIC will buffer the request to ensure that required field is present.
A really easy way to test this is to create a shell API that has no requirements, such as the sample below. Make sure you are streaming from your client application, and this should stream to the backend.
Matt
swagger: '2.0'
info:
  title: stream
  x-ibm-name: stream
  version: 1.0.0
x-ibm-configuration:
  cors:
    enabled: false
  gateway: datapower-api-gateway
  type: rest
  phase: realized
  enforced: true
  testable: true
  assembly:
    execute:
      - invoke:
          title: invoke
          version: 2.0.0
          verb: keep
          target-url: $(target-url)
          follow-redirects: false
          timeout: 60
          persistent-connection: true
          chunked-uploads: true
  properties:
    target-url:
      value: http://example.com/operation-name
      encoded: false
  activity-log:
    enabled: true
    success-content: activity
    error-content: payload
basePath: /stream
paths:
  /:
    post:
      responses:
        '200':
          description: success
          schema:
            type: string
      consumes: []
      produces: []
schemes:
  - https
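On the client side, "streaming" means sending the body piece by piece instead of loading the whole file first. Here is a minimal Python sketch of such a client, using only the standard library. The names iter_chunks and stream_upload, and the 64 KB chunk size, are illustrative assumptions. Note that it uses default certificate verification, unlike a curl call with -k.

```python
import http.client
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield the file's content piece by piece so it never sits in memory whole."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return
        yield chunk


def stream_upload(host, path, file_path):
    """POST a file with chunked transfer encoding; returns the HTTP status code."""
    conn = http.client.HTTPSConnection(host)
    try:
        with open(file_path, "rb") as f:
            # An iterable body without a Content-Length header makes
            # http.client send the request with Transfer-Encoding: chunked.
            conn.request(
                "POST",
                path,
                body=iter_chunks(f),
                headers={"Content-Type": "application/octet-stream"},
            )
            return conn.getresponse().status
    finally:
        conn.close()


# Local check of the chunking only; no network involved.
print([len(c) for c in iter_chunks(io.BytesIO(b"x" * 10), chunk_size=4)])  # prints [4, 4, 2]
```

Calling stream_upload with your gateway host, the API path and a local file would perform the actual upload; those values are left to you.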
------------------------------
Matt
Original Message:
Sent: Tue November 05, 2024 10:17 AM
From: Vid Romac
Subject: Streaming support in API Connect
That was the first PoC version, where there wasn't a proper back-end that would allow such an upload, just to see the behavior on the first hop.
The new version has a proper one, which accepts the uploaded file and streams it into the file system. Here is the Python Flask app that listens on the back-end. [If I made some error on this side do tell, as this is my first dabble in a Flask app.] The file being uploaded is 400MB.
@app.route('/upload', methods=['POST'])
def upload_file():
    #config
    timestr = time.strftime("%Y%m%d-%H%M%S")
    print("Request received at " + timestr)
    filename = "output_" + timestr
    file_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
    with open(file_path, "wb") as f:
        chunk_size = 4096
        while True:
            chunk = request.stream.read(chunk_size)
            if len(chunk) == 0:
                return 'Done', 200
            f.write(chunk)
    return 'File successfully uploaded', 200
The apic api side is described as
swagger: '2.0'
info:
  title: OpenAPI definition
  version: 0.0.2
  x-ibm-name: streaming-demo
host: $(catalog.host)
basePath: /streaming-demo
schemes:
- https
x-ibm-configuration:
  testable: true
  enforced: true
  cors:
    enabled: true
  assembly:
    execute:
    - invoke:
        target-url: $(service-url)
        verb: POST
  gateway: datapower-gateway
  properties:
    service-url:
      value: ''
      description: ''
      encoded: false
  catalogs:
    mycatalog:
      properties:
        service-url: <python backend>/upload
consumes:
- application/octet-stream
paths:
  /server/receive:
    post:
      parameters: []
      responses:
        '200':
          description: OK
      tags:
      - server-file-controller
      operationId: receive
The upload from the client is done via
curl -k -vvv -H "Content-Type: application/octet-stream" -X POST -T data_small.bin <APIC>/streaming-demo/server/receive
However, APIC still behaves the same and hogs memory during the upload: from 2.5GB up to 3.6GB RAM. If there were multiple parallel uploads, all the memory would easily be used.
Is there no way to tell APIC to discard/release the payload that was sent over?
------------------------------
Vid Romac
Original Message:
Sent: Tue November 05, 2024 09:51 AM
From: Steve Linn
Subject: Streaming support in API Connect
Hi Vid,
I'm not sure I follow you on the denial that you mention. Are you saying your mock backend will consume this request and then reject it? Does your mock backend need to consume/buffer the request before it responds? Even though the payload is being streamed to the backend, I would anticipate DataPower memory to grow, as that payload will be placed into context in request.body, so memory will be consumed. The invoke policy also must wait for that backend to respond before it will continue. The question is whether the request payload is also being streamed to your backend.
I'd think a DataPower packet capture would be useful in this case. Since your API call is https, don't forget to enable the SSL session keys on the packet capture so that part of the traffic can be decrypted in Wireshark. You should be able to see timestamps of each packet being sent. If your huge file tests show the inbound request starting to arrive at some time and the SYN to the mock service happening many seconds (minutes?) after that, then the request is indeed being buffered. But if you almost immediately see interleaving traffic between the request inbound to DataPower and outbound to your mock service, then the request is being streamed, and I'd anticipate the DataPower latency is due to the invoke policy waiting on an HTTP 400 from your mock backend.
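The timing comparison described above can be written down as a small Python sketch. It assumes the packet timestamps have already been exported from the capture by hand, as seconds, into two lists. The function name classify_capture and the one-second threshold are hypothetical and would need tuning for a real network.

```python
def classify_capture(inbound_times, outbound_times, max_start_gap=1.0):
    """Classify a capture as 'streamed' or 'buffered' from packet timestamps (seconds).

    inbound_times:  timestamps of client -> gateway request packets
    outbound_times: timestamps of gateway -> backend packets (SYN first)
    max_start_gap:  hypothetical threshold, tune to your network
    """
    if not inbound_times or not outbound_times:
        raise ValueError("both directions need at least one packet")
    # Delay between the first inbound packet and the first packet to the backend.
    start_gap = min(outbound_times) - min(inbound_times)
    # Outbound packets sent while the request was still arriving indicate interleaving.
    last_inbound = max(inbound_times)
    interleaved = sum(1 for t in outbound_times if t < last_inbound)
    if start_gap <= max_start_gap and interleaved > 0:
        return "streamed"
    return "buffered"


print(classify_capture([0.0, 1.0, 2.0, 3.0], [0.1, 1.1, 2.1, 3.1]))  # prints streamed
print(classify_capture([0.0, 1.0, 2.0, 3.0], [3.2, 3.3, 3.4]))       # prints buffered
```

This only automates the eyeball check; it says nothing about why memory grows.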
If the packet capture doesn't answer some questions, I'd recommend you open a PMR.
Best Regards,
Steve Linn
------------------------------
Steve Linn
Senior Consulting I/T Specialist
IBM
Original Message:
Sent: Tue November 05, 2024 03:43 AM
From: Vid Romac
Subject: Streaming support in API Connect
My API assembly is bare-bone for the PoC, only contains the invoke step:
assembly:
  execute:
    - invoke:
        target-url: $(service-url)
        verb: POST
I disagree that this is only perceived as buffering. I've tested it with various file sizes [50, 500, 5000MB]; each larger file takes longer to process [and leaves a larger memory footprint on DP], and only then is the request rejected, since the back-end is currently a mock service. So APIC waits for the whole file before calling the back-end, which then denies it. If it were streaming, I would expect the denial to come just as quickly whether the file is 1MB or 1000MB.
------------------------------
Vid Romac
Original Message:
Sent: Mon November 04, 2024 03:35 PM
From: Steve Linn
Subject: Streaming support in API Connect
Hi Vid,
Yes, streaming is enabled in an API Gateway API by default. You can explicitly request buffering by setting x-ibm-configuration.buffering to true. If streamed, the things that read data off of the stream within the API assembly execution are a parse policy or a GatewayScript that calls a context.message.body.readAsXXXX function. Without those, an invoke policy should be reading from the stream and writing that data to your backend server. So a first question is: what does your API look like from a policy perspective? That's quite a large file to be posting to the gateway, and eventually the API will need to read from the stream to write to your backend. What you perceive as buffering might simply be a very fast execution of API policies before the data needs to be consumed, and memory growth could be explained by DataPower reading from the stream faster than the data can be pushed out onto the network. That's definitely an educated guess on my part, so it might be worthwhile to open a PMR to get a detailed explanation.
Best Regards,
Steve Linn
------------------------------
Steve Linn
Senior Consulting I/T Specialist
IBM
Original Message:
Sent: Mon November 04, 2024 10:55 AM
From: Vid Romac
Subject: Streaming support in API Connect
I am wondering whether IBM API Connect supports API requests that stream the data contents of the request.
The use case is a file upload, for very large files [200MB+], to enable the call to stream thru API Connect to the backend, without building up its runtime memory too much while doing so.
e.g:
curl -vvv -H "Content-Type: application/octet-stream" -X POST -T data.bin <APIC API ENDPOINT>
The default behavior, for a simple PoC example, shows that the curl upload hangs during the upload, the DP memory grows, and only once the whole file is uploaded to the APIC side is the backend invoked and file passed.
As far as I understand, the underlying DP can support this, but how can the API/APIC configuration be set to leverage it?
------------------------------
Vid Romac
------------------------------