If the workload is new and we don’t know its size, the best approach is to start with a safe baseline and then scale as we learn. Typically, we begin with:
- 3 master nodes (small CPU allocations, control plane only)
- A couple of infra nodes for logging/monitoring
- A few worker nodes (4–6 CPUs each) to host applications
- Storage (ODF), which can start co-located and move to 3 dedicated nodes if data or I/O grows
From this baseline, we add up the vCPUs across all nodes, add headroom (10–15% for z/VM overhead and 20% for growth), and then divide by the overcommit ratio (e.g., 6:1, meaning 6 vCPUs per IFL) to get the IFL count.
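As a rough illustration, the calculation above can be sketched in Python. This is only a sketch under stated assumptions: the function name `estimate_ifls`, the per-node vCPU sizes for master and infra nodes, and the choice to add the two margins together (rather than compound them) are all assumptions made for the example, not fixed rules.

```python
import math


def estimate_ifls(total_vcpus, zvm_overhead=0.15, growth=0.20, overcommit=6.0):
    """Estimate the IFL count from a total vCPU figure.

    Assumption: the overhead and growth margins are added together
    (15% + 20% = 35% headroom) and the result is rounded up to a whole IFL.
    """
    if total_vcpus < 0 or overcommit <= 0:
        raise ValueError("total_vcpus must be >= 0 and overcommit must be > 0")
    padded_vcpus = total_vcpus * (1 + zvm_overhead + growth)
    return math.ceil(padded_vcpus / overcommit)


# Hypothetical baseline as (node count, vCPUs per node).
# The master and infra sizes are illustrative assumptions.
baseline = {
    "master": (3, 4),
    "infra": (2, 4),
    "worker": (3, 6),
}

total = sum(count * vcpus for count, vcpus in baseline.values())
print(total, estimate_ifls(total))  # 38 9
```

Changing `overcommit` or the margins shows how sensitive the IFL count is to those choices, which is why the numbers should be revisited once real usage data is available.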
In practice, this usually means starting with 4–6 IFLs for a pilot, 12–16 IFLs for small production, and 20+ IFLs for larger environments.
Once it’s running, we monitor CPU, memory, and storage usage for a couple of weeks, and then adjust up or down as needed.