Ok, I figured out the issue with the permission problem. Ultimately, in the Security Context Constraint, the seLinuxContext must be set to MustRunAs and NOT RunAsAny.
The Knowledge Center was updated in v10 to note this, but I had been using an older copy of the SCC that I likely obtained from v9. You can do your own research on it, but the bottom line is that if that value is set to RunAsAny, the filesystem seems to get the SELinux permissions of the last container that started, and the other containers lose access. From what I understand, MustRunAs makes sure the SELinux permissions stay consistent among the pods assigned to the SCC.
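For anyone who lands here later, the relevant part of the SCC looks roughly like the fragment below. This is only a sketch: the SCC name isva-scc is made up for illustration, and a real SCC needs its other required fields (runAsUser, fsGroup, supplementalGroups, volumes, and so on), which are left out here.

```yaml
# Illustrative fragment only; "isva-scc" is a hypothetical name.
# A complete SCC also needs runAsUser, fsGroup, supplementalGroups, etc.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: isva-scc
seLinuxContext:
  # MustRunAs, not RunAsAny: pods admitted under this SCC get a consistent
  # SELinux context instead of each container ending up with its own.
  type: MustRunAs
```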
Thanks for all the input earlier. Based on it, I am rethinking even using these PVCs, at least for the non-config containers. If we are using lightweight containers and shipping 100% of the logs to the console, I am not sure, as Scott pointed out, that we even need the PVCs. I am thinking it would be best to stop using them for anything but the configuration container and, as Scott suggested, put a local snapshot manager in each network tier (which I hope to do later in the year, after we get to lightweight containers). This would make the containers and the entire architecture even more ephemeral.
The only question in my mind that remains is if I should have a shared log directory still for the WRPs, DSC, and runtime. I can't see an advantage. In fact, the knowledge center documentation doesn't even show a shared log directory now in the v10 manifest files that are provided for K8s. I know trace data is stored there, but if we were doing a trace we'd trace, stop, collect without restarting the containers so we could get the trace data. The only thing I was worried about is maybe core dumps or something if the containers actually crashed hard.
@Scott Exton do you have any thoughts on if a shared log directory is still needed on production with the lightweight containers?
Thanks again,
Matt
------------------------------
Matt Jenkins
------------------------------
Original Message:
Sent: Fri April 22, 2022 01:59 PM
From: Matt Jenkins
Subject: ISVA container OpenShift v4 file permission issues on shared directories
Scott, yes I always wondered what would happen if multiple containers downloaded at the same time.
I have looked at CONFIG_SERVICE_URL on v10.0.3, and it would be ideal to use a snapshot manager at each datacenter / network tier. However, that means we have to be able to push there from our core network, where the config container lives today. I have a bit of networking to get done before we can do that, so right now I am concentrating on just getting the lightweight containers working with the config container as we have used it in the past.
Step 2 is getting to OpenShift 4 which is what I am currently working on and getting the permissions errors. I think Jon is onto something with selinux. Unfortunately I am not a cluster / OS admin on the OCP4 hosts so I am waiting on our OpenShift team to provide me some more insight after I relayed what Jon suggested. Even if I spin up two copies of the config container (normally would not do, but should be no harm for a test) the first one loses connectivity to the shared volume. So I don't see how it could be a container image permission issue. This points more towards the containerization software. This is especially true after you said the bootstrapping does not do anything with the permissions on the shared volumes.
Thanks again for both your help. I'll let you know what I end up finding out on this side.
------------------------------
Matt Jenkins
Original Message:
Sent: Fri April 22, 2022 01:13 AM
From: Scott Exton
Subject: ISVA container OpenShift v4 file permission issues on shared directories
Matt,
I just want to ensure that I understood what you are saying.... The shared volume is shared by all of the lightweight containers, but the snapshot is populated by downloading the snapshot from the configuration container (or another Web service). If my understanding is correct, this does sound a little bit dangerous because if multiple containers are started at the same time and attempt to download the snapshot at the same time, there could be corruption occurring in the shared volume. I understand the issues with pulling the snapshot from different data centres – have you looked at the CONFIG_SERVICE_URL functionality in the configuration container (new in v10.0.3) which allows you to automatically push out the snapshot to multiple endpoints?
Anyway, this is not related to the permission problems which you are seeing. I suspect that OpenShift is doing something with the permissions when a pod is created – the bootstrapping of the ISVA containers doesn't do anything to the permissions on the shared volumes.
I hope that this helps.
Scott A. Exton
Senior Software Engineer
Chief Programmer - IBM Security Verify Access
IBM Master Inventor
Original Message:
Sent: 4/21/2022 10:42:00 PM
From: Matt Jenkins
Subject: RE: ISVA container OpenShift v4 file permission issues on shared directories
Scott, I am setting CONFIG_SERVICE_URL on all the lightweight containers. In the past, this is how we did it. Even though they all share /var/shared, they still get their config from CONFIG_SERVICE_URL. This even seems to work on v10.0.3.1 on OpenShift v3.11.
If we do not share /var/shared, then everything else such as extensions, fixpacks, snapshots, and support files won't be shared. I don't know if that is an issue, other than that on clusters with multiple pods, every pod will have to download the config instead of just one downloading it and the others using the copy on /var/shared. We have 6 OpenShift clusters that make up the entire environment, each with multiple containers running. I also store some custom scripts in a directory I called /var/shared/config_scripts, but technically these only need to exist on the config container (they are scripts I add to the shared storage for general manual operational activities, although I suppose I could rewrite these processes to just upload the script to /tmp each time it is needed and run it from there).
So are you thinking that because CONFIG_SERVICE_URL is set, that the bootstrap on the other lightweight containers is changing the permissions where then the config container can't read them? Why then when the config container is restarted can the config container get to the files again? Does it reset all the permissions when it comes up? It doesn't look like the permissions are changing when I ran an ls on the /var/shared directory, they looked the same when I tried to do the directory listing from the config container.
Thanks for your help and insight.
------------------------------
Matt Jenkins
Original Message:
Sent: Thu April 21, 2022 04:58 PM
From: Scott Exton
Subject: ISVA container OpenShift v4 file permission issues on shared directories
Matt,
I suspect that this is infrastructure related and that OpenShift is changing the permissions on the directory based on where the volumes are being mounted. Maybe OpenShift is changing the permissions so that the UID that the container is running as has permissions to access the volume? You should be able to validate this by looking at the ownership/permissions on the volume in the lightweight container, and then compare this to the UID that the container is running as.
I personally haven't encountered this problem, although in an OpenShift environment I generally use the CONFIG_SERVICE_URL rather than shared volumes.
The only other thing which I can think of is the container itself is updating the shared volume which hosts the configuration snapshot – although this wouldn't explain why the permissions on the directory are being modified. The only time that these containers should write to the snapshot directory is when the CONFIG_SERVICE_URL environment variable is set (this environment variable is used to indicate that a shared volume is not in use and that the container should instead retrieve the configuration snapshot from an external Web service). It would be worth checking to ensure that this environment variable is not set in the worker containers.
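To make the check above concrete, this is roughly what the setting would look like in a worker container spec if it were present. It is a sketch only: the container name and the URL value are placeholders and do not come from this thread.

```yaml
# Hypothetical fragment of a worker (lightweight) container spec.
# "isam-runtime" and the URL below are placeholders for illustration.
containers:
  - name: isam-runtime
    env:
      # When this variable is set, the container retrieves the configuration
      # snapshot from an external Web service instead of a shared volume.
      - name: CONFIG_SERVICE_URL
        value: "https://<config-service-host>:<port>/<path>"
```

If an entry like this appears in the env list of the worker containers, they will write to their snapshot directory when they bootstrap.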
I hope that this helps.
Scott A. Exton
Senior Software Engineer
Chief Programmer - IBM Security Verify Access
IBM Master Inventor
Original Message:
Sent: 4/21/2022 1:30:00 PM
From: Matt Jenkins
Subject: RE: ISVA container OpenShift v4 file permission issues on shared directories
A quick update after some additional discovery. If I only have the config container running, things are fine. If I start up another pod, say the runtime, the config container can no longer access /var/shared, but the runtime pod can. Then if I start the DSC pod, the config and runtime pods cannot access it, but the DSC pod can. It feels like something is wrong with the shared filesystem, so I am going to ask our OpenShift team to see if they can figure anything out. If anyone has any input it would be appreciated. Thanks.
------------------------------
Matt Jenkins
Original Message:
Sent: Thu April 21, 2022 11:56 AM
From: Matt Jenkins
Subject: ISVA container OpenShift v4 file permission issues on shared directories
Using the 10.0.3.1 lightweight images (and the regular image for the config obviously), I am having an issue with permissions on the shared directory which is causing the LMI and other containers to be unable to access /var/shared/fixpacks and /var/application.logs.
It is really weird. When the config container starts, in the terminal I can do a directory listing of /var/shared. However, after the config container is running fully, I can no longer do a directory listing of /var/shared.
Whatever is causing this also prevents the config container from reaching these directories, so once it is online it cannot publish the config, generate a snapshot, or even read the snapshot file that the other containers retrieve through the API from their bootstrap script. What is weird is that the files are accessible for some time: the lightweight containers come up while the config container can still access the snapshot files in /var/shared/snapshots, and then the files become inaccessible again. So I am wondering if the lightweight containers (wrp, dsc, and/or runtime) are having some effect on the permissions? However, the permissions don't seem to change. You can see in my output below that one moment I am able to do a directory listing, then a minute or so later I am unable to do the same directory listing from the config container.
Has anyone else seen anything like this with OpenShift v4.9 and ISVA v10.0.3.1 lightweight images? I have not tried the full image yet for WRP, DSC, and runtime functions. I am using a PVC with the ocs-storagecluster-cephfs as ReadWriteMany for the storage volumes (manifests below).
Thanks for any input.
sh-4.4$ ls /var/shared -ld
drwxrwxr-x. 7 root root 5 Apr 20 16:26 /var/shared
sh-4.4$ ls /var/shared/snapshots -ld
drwxrwsrwx. 2 isam www-data 11 Apr 20 20:00 /var/shared/snapshots
sh-4.4$ ls /var/shared/snapshots -l
total 130237
-rw-rw-r--. 1 root www-data 9389812 Apr 20 15:41 isva_10.0.3.1_20220420-154133.154991_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 9389787 Apr 20 15:44 isva_10.0.3.1_20220420-154406.325462_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 9390307 Apr 20 15:44 isva_10.0.3.1_20220420-154430.800829_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 9928610 Apr 20 15:45 isva_10.0.3.1_20220420-154504.159574_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 10040088 Apr 20 15:45 isva_10.0.3.1_20220420-154550.963214_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 10113651 Apr 20 15:50 isva_10.0.3.1_20220420-155052.874035_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 10175593 Apr 20 15:54 isva_10.0.3.1_20220420-155431.537537_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 10186323 Apr 20 15:55 isva_10.0.3.1_20220420-155505.922208_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 17017455 Apr 20 16:07 isva_10.0.3.1_20220420-160737.632667_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 18863170 Apr 20 16:15 isva_10.0.3.1_20220420-161533.515782_isam-config-0.snapshot
-rw-rw-r--. 1 root www-data 18863656 Apr 20 16:15 isva_10.0.3.1_published.snapshot
sh-4.4$ ls /var/shared/snapshots -l
ls: cannot access '/var/shared/snapshots': Permission denied
sh-4.4$ ls /var/shared/snapshots -ld
ls: cannot access '/var/shared/snapshots': Permission denied
sh-4.4$ ls /var/shared -ld
drwxrwxr-x. 7 root root 5 Apr 20 16:26 /var/shared
---
# Define persistent volume claim (storage) we will use for isam shared volume (configuration)
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "isam-shared-pvc"
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - "ReadWriteMany"
  resources:
    requests:
      storage: "10Gi"
...
---
# Define persistent volume claim (storage) we will use for isam logs
apiVersion: "v1"
kind: "PersistentVolumeClaim"
metadata:
  name: "isam-logs-pvc"
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - "ReadWriteMany"
  resources:
    requests:
      storage: "10Gi"
...
------------------------------
Matt Jenkins
------------------------------