Cloud Platform as a Service


Using COS bucket mounts as persistent storage in Code Engine

By Enrico Regge posted Fri January 09, 2026 04:40 AM


Written by

Simon Daniel Moser (smoser@de.ibm.com) - Distinguished Engineer for Containers @ IBM Cloud
Enrico Regge (reggeenr@de.ibm.com) - Software Architect @ IBM Cloud

----------------------------------

IBM Cloud Code Engine is IBM Cloud’s all‑purpose serverless platform, capable of running 12-factor web apps, functions, batch jobs, and general containers (collectively referred to as compute variants in this article). However, until recently, Code Engine had one shortcoming: all its compute variants could only save data by writing to a database or using Cloud Object Storage (COS) through an API; mounting a persistent filesystem wasn’t possible.

Just recently, we wrote a blog using JupyterLab as an example where this limitation got in our way. JupyterLab is a popular tool for data scientists, and Code Engine makes it easy and cost‑efficient to run. But any data stored inside the Jupyter notebook environment lived only on the container’s ephemeral disk. When the app scaled down (a very common and intended use case for serverless computing), that data disappeared—far from ideal.

Now, as mentioned above, you could, for example, call a Cloud Object Storage API to persist that data, but that doesn’t work for software like JupyterLab, where you can’t modify the source code to integrate such calls. For these “as‑is” applications, containers, or jobs, filesystem mounts are essential. They let you expose persistent storage directly in the filesystem, and COS‑backed mounts align well with serverless patterns because COS itself is serverless: storage costs scale with usage.

There are, however, scenarios where COS mounts aren’t ideal—for example, low‑latency, highly concurrent workloads like relational databases. Object Storage has higher latency and offers no coordination for concurrent writes. The Code Engine documentation lists more things to watch out for and covers these considerations in more detail. But for our Jupyter use case, none of these limitations matter, so let’s look at what the new persistent storage capabilities offer and how to use them.

Before diving in, let’s briefly review the system architecture of the sample. Figure 1 shows all required resources:

[Figure 1: System architecture of the sample solution]

  • An IBM Code Engine project (“jupyter‑labs”) in a region of your choice
  • A Code Engine application (“jupyter‑lab”) deployed in that project and exposed to the public internet
  • A Cloud Object Storage (COS) instance (“jupyter‑labs‑cos”) with a regional bucket in the same region as the Code Engine project
  • An HMAC‑based service credential for the COS instance to allow bucket access

To connect the Code Engine application with COS, we configure a persistent data store:

  • A COS credential derived from the HMAC key is created and referenced by the persistent data store
  • The data store is configured with the bucket name
  • At runtime, Code Engine mounts the bucket into the application’s filesystem

Setup and Configuration

With the architecture in place, we’ll now walk through the example: first deploying a plain JupyterLab instance (using a fixed password), then adding and connecting the persistent storage, and finally confirming that data in the bucket persists even when the application scales down.

  • Install jq. On macOS, you can install it with Homebrew: brew install jq
  • Install and configure the IBM Cloud CLI
    • Install the IBM Cloud CLI including the Code Engine plugin as described in the Code Engine documentation
    • Perform all necessary login steps to IBM Cloud
  • Create or select a Code Engine project
    • Create or select the Code Engine project of your choice, e.g. ibmcloud ce project create --name jupyter-labs or ibmcloud ce project select --name <yourProjectName>
    • Once the project has been created or selected, run the following commands to initialize environment variables in your current terminal session that are required later on:
      CE_PROJECT=$(ibmcloud ce project current --output json)
      REGION=$(echo "$CE_PROJECT" | jq -r '.region_id')
      CE_INSTANCE_GUID=$(echo "$CE_PROJECT" | jq -r '.guid')
      echo "CE project guid: ${CE_INSTANCE_GUID}, region: ${REGION}"
  • Create the Jupyter Labs app
    ibmcloud ce app create --name jupyter-lab \
        --image "quay.io/jupyter/minimal-notebook:latest" \
        --command "jupyter" \
        --command "lab" \
        --command "--ip" \
        --command "*" \
        --command "--NotebookApp.allow_origin" \
        --command "*" \
        --max-scale 1 \
        --service-account "none" \
        --port 8888
  • Open the browser using the URL that is printed in the command output to load the Jupyter Lab.
    • On your first attempt to open the Jupyter Lab, you’ll be greeted with the page shown below.
[Screenshot: JupyterLab login page asking for a token or password]

Once you have passed the authentication, you’ll be greeted with the JupyterLab Launcher. That’s it …

[Screenshot: JupyterLab Launcher]

… not quite yet. While it was pretty easy to get JupyterLab running on Code Engine, the capabilities of that setup do not meet our requirements yet. While you can explore and use Jupyter in the current setup, you’ll notice that all your data (notebooks, your configured password) is gone once Code Engine scales down your app. In the second part of the solution setup, we’ll add a persistence layer.

  • Create the COS instance and the bucket
    • COS_INSTANCE_NAME=jupyter-labs-cos to define an environment variable in your terminal for the COS instance name
    • ibmcloud resource service-instance-create ${COS_INSTANCE_NAME} cloud-object-storage standard global -d premium-global-deployment-iam to create an actual COS instance by that name
    • COS_INSTANCE_ID=$(ibmcloud resource service-instance ${COS_INSTANCE_NAME} --output json | jq -r '.[0] | .id') to define an environment variable in your terminal for the COS instance ID
    • ibmcloud cos config crn --crn ${COS_INSTANCE_ID} --force
    • ibmcloud cos config auth --method IAM to set the authentication mode to IBM Cloud IAM
    • ibmcloud cos config region --region ${REGION} to set the region to your preferred region
    • ibmcloud cos config endpoint-url --url s3.${REGION}.cloud-object-storage.appdomain.cloud to set the endpoint URL to the regional public endpoint
    • COS_BUCKET_NAME=${CE_INSTANCE_GUID}-jupyter-labs to create an environment variable in your terminal for the bucket name
    • ibmcloud cos bucket-create \
          --class smart \
          --bucket $COS_BUCKET_NAME to create an actual COS bucket with that name
  • Create the COS Service credentials and store them as a Code Engine secret
    • COS_HMAC_CREDENTIALS=$(ibmcloud resource service-key-create ${COS_INSTANCE_NAME}-codeengine-mounts Writer --instance-id $COS_INSTANCE_ID --parameters '{"HMAC":true}' --output JSON)
    • COS_HMAC_CREDENTIALS_ACCESS_KEY_ID=$(echo "$COS_HMAC_CREDENTIALS"|jq -r '.credentials.cos_hmac_keys.access_key_id')
    • COS_HMAC_CREDENTIALS_SECRET_ACCESS_ID=$(echo "$COS_HMAC_CREDENTIALS"|jq -r '.credentials.cos_hmac_keys.secret_access_key')
  • Create a Code Engine secret
    • ibmcloud ce secret create --name cos-secret \
          --format hmac \
          --access-key-id ${COS_HMAC_CREDENTIALS_ACCESS_KEY_ID} \
          --secret-access-key ${COS_HMAC_CREDENTIALS_SECRET_ACCESS_ID}
  • Create the Code Engine persistent data store. Please see our documentation to learn more about “Working with persistent data stores” in Code Engine
    • ibmcloud ce persistentdatastore create \
          --name jupyter-labs-data-store \
          --cos-bucket-name $COS_BUCKET_NAME \
          --cos-access-secret cos-secret
  • Update the Jupyter Lab app and add persistent storage
    • ibmcloud ce app update --name jupyter-lab \
          --image "quay.io/jupyter/minimal-notebook:latest" \
          --scale-down-delay 600 \
          --cpu 1 \
          --memory 4G \
          --mount-data-store /mnt/jupyter=jupyter-labs-data-store:work/ \
          --mount-data-store /home/jovyan/.jupyter=jupyter-labs-data-store:config/ \
          --command "jupyter" \
          --command "lab" \
          --command "--ip" \
          --command "*" \
          --command "--NotebookApp.allow_origin" \
          --command "*" \
          --command "--notebook-dir" \
          --command "/mnt/jupyter" \
          --max-scale 1 \
          --service-account "none" \
          --port 8888

After the update is applied, you’ll notice that your JupyterLab forgot your password. Repeat the authentication steps shown above:

This will obtain the temporary token from the logs to your clipboard and allow you to log in. To avoid this in the future, you can set your own, new password in that same screen. Once you are logged in, click e.g. on the tile “Python 3 (ipykernel)” in the “Notebook” section. This will create your own, new notebook (which will be stored in the COS bucket).
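The temporary token shows up in the application’s startup logs. Below is a minimal sketch of how one might fish it out of a log line; the sample log line and the Code Engine CLI invocation in the comment are assumptions (verify the exact flags with ibmcloud ce app logs --help):

```shell
#!/bin/sh
# Sketch: extract the one-time Jupyter token from the app's startup logs.
# In practice you would pipe the real logs instead of the sample line, e.g.:
#   ibmcloud ce app logs --application jupyter-lab
# (flag names are an assumption - check `ibmcloud ce app logs --help`).
sample_log='[I ServerApp] http://127.0.0.1:8888/lab?token=4f6a1b2c3d4e5f60718293a4b5c6d7e8'

# grep the token=<hex> fragment, take the first match, strip the "token=" prefix
token=$(printf '%s\n' "$sample_log" | grep -o 'token=[0-9a-f]*' | head -n 1 | cut -d= -f2)
echo "login token: ${token}"
```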

[Screenshot: A newly created notebook in JupyterLab]

To prove that your data is stored in the IBM Cloud Object Storage bucket that we created as part of this blog post, let us take a look at the bucket and its objects. In the IBM Cloud console, navigate to https://cloud.ibm.com/objectstorage/instances and select your bucket. In the Objects tab, you’ll see that two base folders, “config“ and “work“, have been created. After navigating into the “work“ folder, you should see something similar to this:

[Screenshot: Objects under the “work“ prefix in the COS bucket]
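The same check can be done from the terminal. The following is a sketch assuming the COS CLI plugin is still configured as in the setup steps above, guarded so it degrades to a no-op where the CLI or login is unavailable:

```shell
#!/bin/sh
# Hypothetical CLI alternative to the console view: list the objects under
# the work/ prefix (requires the ibmcloud session configured earlier).
COS_BUCKET_NAME="${COS_BUCKET_NAME:-my-example-bucket}"  # fallback for illustration

if command -v ibmcloud >/dev/null 2>&1; then
  ibmcloud cos objects --bucket "${COS_BUCKET_NAME}" --prefix "work/" || true
else
  echo "ibmcloud CLI not available; skipping bucket listing"
fi
```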

Now force a restart by applying an update: “ibmcloud ce app update --name jupyter-lab“. Even after various restarts, you’ll notice that your notebook in the COS bucket is still around. 🎉

It’s time to recap what we have done so far. We deployed a JupyterLab environment, which is straightforward because Jupyter is largely agnostic to the underlying infrastructure as long as compute and a filesystem are available. On Code Engine, the filesystem is ephemeral by default, meaning it exists only for the lifetime of an instance and is not shared across instances. In Unix systems, the standard way to introduce persistent or remote storage is through mounting, which lets an external filesystem appear as part of the local one. This is exactly what Code Engine provides when you define a mount‑data‑store option such as “/home/jovyan/.jupyter=jupyter-labs-data-store:config/“. In the next section, we break down this configuration to illustrate its capabilities.

  • /home/jovyan/.jupyter - The first part describes the mount point within the container’s filesystem and defines where remote folders and files should appear
  • jupyter-labs-data-store - The second part refers to the name of the Code Engine persistent data store entity. As part of this blog post, we specified a COS bucket and the credentials (read and write) to access it.
  • config/ - The last part defines the prefix / location in the target COS bucket that should be mounted. If not set, Code Engine mounts the entire bucket.
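The three parts can be illustrated with plain shell string splitting; this is an illustrative helper only, not part of the Code Engine CLI:

```shell
#!/bin/sh
# Illustrative only: split a --mount-data-store value of the form
#   <mount-path>=<data-store-name>:<bucket-prefix>
# into its three parts, using the example from above.
spec="/home/jovyan/.jupyter=jupyter-labs-data-store:config/"

mount_path="${spec%%=*}"         # before '=': mount point in the container
remainder="${spec#*=}"           # after '='
store_name="${remainder%%:*}"    # before ':': persistent data store name
bucket_prefix="${remainder#*:}"  # after ':': prefix within the COS bucket

echo "mount point:   ${mount_path}"
echo "data store:    ${store_name}"
echo "bucket prefix: ${bucket_prefix}"
```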

For COS mounts, Code Engine uses the open‑source tool s3fs, which can mount S3‑compatible storage buckets as directories while preserving the underlying object format. This lets you interact with Object Storage through familiar shell commands—such as ls for listing or cp for copying—and also enables legacy applications that expect local filesystem access. However, s3fs does not provide the same performance or semantics as a native filesystem. For many workloads, such as JupyterLab, this is sufficient, but others may require stronger guarantees. To determine whether s3fs fits your needs, we recommend reviewing the limitations documented both in the Code Engine documentation and in the s3fs project.
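For debugging, the very same bucket can be mounted on a local machine with s3fs. The sketch below assumes s3fs is installed and that the HMAC variables from the setup steps are still set (placeholder fallbacks are used otherwise); the actual mount command is shown as a comment, since it needs a real bucket:

```shell
#!/bin/sh
# s3fs authenticates via a passwd file containing "ACCESS_KEY:SECRET_KEY".
# The fallback values below are illustrative placeholders; in a real session,
# the COS_HMAC_CREDENTIALS_* variables created earlier are used.
PASSWD_FILE="$(mktemp)"
echo "${COS_HMAC_CREDENTIALS_ACCESS_KEY_ID:-exampleAccessKey}:${COS_HMAC_CREDENTIALS_SECRET_ACCESS_ID:-exampleSecretKey}" > "$PASSWD_FILE"
chmod 600 "$PASSWD_FILE"  # s3fs refuses world-readable credential files
echo "passwd file prepared at ${PASSWD_FILE}"

# The actual mount (requires s3fs and an existing bucket):
#   s3fs "$COS_BUCKET_NAME" /mnt/cos \
#     -o url=https://s3.${REGION}.cloud-object-storage.appdomain.cloud \
#     -o passwd_file="$PASSWD_FILE" \
#     -o use_path_request_style
```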

This blog focused on using COS as a simple and effective persistence layer. Still, some scenarios require other storage types, such as file or block storage, which may better meet low‑latency or high‑concurrency needs. If your workload falls into that category, we invite you to get in touch with us so we can discuss the alternatives available.

Conclusion and Outlook

One limitation of this example is that the configuration file, including hashed user credentials, is stored in the COS bucket—something that would not pass a security review. To address this, we recommend integrating an external OIDC authentication mechanism. Details on how to do this are available in our blog post “Advanced OIDC integrations with Code Engine“.

Looking ahead, this scenario can be extended by including a pre‑packaged Code Engine SDK in the container image. Together with IBM Cloud trusted profiles, this would allow the notebook to call Code Engine APIs directly. As a result, you could run your data science workflows in JupyterLab while off‑loading compute‑intensive tasks—such as large‑scale data processing or PDF text extraction with tools like docling—to Code Engine fleets. Wouldn’t that be nice? But that’s a topic for another blog article someday …
