FlashCore Modules (FCMs) provide a compression capability for primary storage on z/OS that is independent of dataset type and is applied to all data on the DS8000. They do not provide the same performance and other ancillary benefits as zEDC compression or coprocessor compression for Db2 tables, so they should not be seen as a replacement for those technologies. However, they can provide data reduction for datasets that are not supported by dataset-level compression capabilities.
Given that many clients use both zEDC for sequential data and coprocessor compression for their Db2 tables, we expect to see highly variable compression across different volumes in a z/OS environment. However, there is likely to be a significant amount of non-compressed data, such as Db2 indexes or VSAM databases, that will benefit from FCM compression.
If dataset encryption is used, these datasets will not be compressed at all on the storage system; if encryption is pervasive, the overall compressibility at the storage level may be very limited.
Another factor to consider is CKD track formatting: the user data held on a nominal 56KB track is less than the full track size. With 4KB records there is 48KB of data on a track, and compression can recover much of this formatting overhead even if the data itself is not compressible. Because of these various factors, it is important to understand the range of compression ratios that would be experienced in a z/OS environment in order to determine the average compression ratio.
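As a quick illustration of this arithmetic, the short sketch below uses only the nominal figures quoted above to show the floor on savings that track formatting alone provides:

```python
# Illustrative arithmetic from the text: with 4KB records, a nominal
# 56KB track holds only 48KB of user data, so compressing tracks
# recovers the formatting overhead even for incompressible data.
TRACK_CAPACITY_KB = 56   # nominal track size from the text
DATA_PER_TRACK_KB = 48   # 12 x 4KB records, as stated in the text

overhead_fraction = 1 - DATA_PER_TRACK_KB / TRACK_CAPACITY_KB
print(f"Space recoverable from track formatting alone: {overhead_fraction:.1%}")
# -> roughly 14%, a floor on savings before any data compressibility
```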
DFSMS Compression Estimation Tool
The DFSMS compression estimation tool enables the user to obtain an estimate of the compressibility of data on a z/OS volume by using the zEDC compression functionality with a set of parameters that emulate how FlashCore Module compression is performed. It is provided as a new utility within the DFSMSdss product. The tool performs the same processing as a dump of a volume but, rather than backing the volume up, compresses the allocated tracks and calculates an average compression ratio for them. In addition to the compression statistics, the tool also provides free space statistics for the volume, so both the thin provisioning and compression benefits can be calculated from its output.
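As an illustration of how those two outputs combine, the sketch below estimates the physical capacity a volume would consume. The function and field names are hypothetical, not the tool's actual report format:

```python
# Hypothetical sketch combining the tool's two outputs (free space and
# compression statistics) into the physical capacity a volume would
# consume on FlashCore Modules. Names here are illustrative only.

def physical_capacity_gb(volume_size_gb: float,
                         free_space_pct: float,
                         compression_ratio: float) -> float:
    """Estimate FCM capacity consumed by one volume.

    free_space_pct    -- percentage of the volume that is unallocated
                         (the thin provisioning saving)
    compression_ratio -- e.g. 2.0 if allocated tracks compress 2:1
    """
    allocated_gb = volume_size_gb * (1 - free_space_pct / 100)
    return allocated_gb / compression_ratio

# A 100GB volume that is 20% free and compresses 2:1 consumes ~40GB.
print(physical_capacity_gb(100, 20, 2.0))  # 40.0
```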
Comparing Estimations and FlashCore Module compression
The compression estimation tool is not a precise match for what happens inside the DS8000, so there will always be some difference between the tool's results and what is seen on the storage system. The following graph shows a comparison of the Compression Estimation Tool results and the actual FlashCore Module compression for a real-world environment.
The results show that the estimation tool was within 10% of the actual compression ratio for the majority of samples; for volumes with a low compression ratio in particular, the FlashCore Modules tended to achieve a slightly better compression ratio than the tool predicted. This provides a high degree of confidence that the tool can be used for planning purposes.
Using the Compression Estimation Tool
To obtain a good overall estimate of the compressibility of data in a z/OS environment it is important to run the Compression Estimation Tool against a statistically significant and representative set of data.
The chart above shows the variation of compression in a real-world environment, with the bars showing the amount of capacity at different compression ratios. Different DFSMS storage groups and groupings of NONSMS volumes are shown in different colours; there is variation both between and within storage groups, even though definite groupings can be seen.
Where possible, the best results will be obtained by running the tool against all volumes in an environment. However, this is unlikely to be practical in many cases, so the guidelines below help identify a representative subset of volumes to run the tool against.
The following should be taken into consideration:
- Perform the estimation on volumes that together comprise at least 10% of the capacity of the environment. A larger sample would be better, but this should be considered a minimum.
- Include volumes from all DFSMS storage groups and a sample of significant NONSMS volumes. Aim for 10% of the volumes within each storage group, with a minimum of 10 volumes and a maximum of 100 if the storage group is very large (see the sketch after this list).
- When selecting volumes, prefer those with the least free space, as free space is not considered in the compression analysis.
- Select both large and small volumes within the same storage group; DFSMS tends to skew new allocations towards smaller volumes, so they may hold different data than larger volumes.
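A minimal sketch of the sampling guideline, assuming only the 10%, minimum-10, maximum-100 rule above:

```python
# For each storage group, sample 10% of its volumes, bounded below by
# 10 and above by 100 (and never more volumes than the group contains).

def sample_size(volumes_in_group: int) -> int:
    target = -(-volumes_in_group // 10)          # ceil(10% of volumes)
    return min(max(target, 10), 100, volumes_in_group)

for n in (8, 50, 400, 2000):
    print(n, "->", sample_size(n))
# 8 -> 8, 50 -> 10, 400 -> 40, 2000 -> 100
```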
For each storage group or volume grouping, you can assume that its data compresses by the weighted average of the compression ratios of the volumes evaluated, as sketched below. If there is very significant variation within a particular grouping, consider increasing the number of volumes evaluated to make sure that the sample is representative.
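One reasonable way to compute that weighted average is to weight each sampled volume by its allocated capacity, which is equivalent to dividing total allocated capacity by total estimated physical capacity. The sample data below is illustrative:

```python
# Capacity-weighted average compression ratio for one storage group.
samples = [  # (allocated_gb, compression_ratio) per sampled volume
    (800, 2.1),
    (500, 1.4),
    (250, 3.0),
]

total_allocated = sum(gb for gb, _ in samples)
# Weight by capacity consumed: sum(allocated) / sum(allocated / ratio)
weighted_ratio = total_allocated / sum(gb / r for gb, r in samples)
print(f"Weighted average compression ratio: {weighted_ratio:.2f}:1")  # 1.89:1
```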
Using results for sizing a new storage system
With FlashCore Modules it is mandatory to use thin-provisioned volumes, and compression is always performed on the data on the FCMs. Therefore, in the majority of environments the system will be overprovisioned: the total capacity of the volumes will be greater than the physical capacity of the drives. When sizing the system, you need to consider both the savings provided by thin provisioning and those provided by compression, as together these determine the required physical capacity of the system.
For example, assume that 15% of the space in the environment is unused due to thin provisioning and that the used space compresses by 50%. If the existing storage capacity is 500TB, the data would then consume 500TB × 0.85 × 0.5 = 212.5TB of FlashCore Module capacity.
Typically, users doing overprovisioning will plan on the utilisation of the system staying below 80% and would plan to purchase additional capacity if this threshold were reached. So 212.5TB / 0.8, just over 265TB, would meet the 500TB required today. If the user expects 10% year-on-year growth and wants to provide for 4 years of growth, then the capacity required would be 265.6TB × 1.1^4 ≈ 389TB.
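The same calculation expressed as a short script, using only the figures from the example above:

```python
# Worked sizing example from the text; adjust the inputs for your
# own environment.
existing_tb      = 500    # current storage capacity
thin_savings     = 0.15   # 15% of space unused (thin provisioning)
compression_rate = 0.50   # used space compresses by 50%
max_utilisation  = 0.80   # plan to stay below 80% utilisation
growth_per_year  = 0.10   # 10% year-on-year growth
years_of_growth  = 4

physical_tb = existing_tb * (1 - thin_savings) * compression_rate
print(f"FCM capacity consumed today: {physical_tb}TB")              # 212.5TB

required_today = physical_tb / max_utilisation
print(f"Capacity to stay under 80%: {required_today:.1f}TB")        # 265.6TB

required_future = required_today * (1 + growth_per_year) ** years_of_growth
print(f"Capacity with 4 years of growth: {required_future:.1f}TB")  # 388.9TB
```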
It is highly likely that the granularity of flash drive capacities available today will not exactly match the capacity requirements of a particular environment, so the capacity should be rounded up to the nearest possible configuration.
In this case a sensible configuration might be 4 drive sets of 9.6TB FCMs, although in many cases the performance requirements could also be met by 2 drive sets of 19.2TB FCMs, as shown below. Both of these configurations would provide 408TB of capacity.
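A simple way to perform that rounding is to divide the requirement by the usable capacity of a drive set and round up. The per-drive-set usable capacities below are inferred from this example (408TB across 4 and 2 sets respectively) and are assumptions; confirm the actual figures for your model and RAID configuration:

```python
import math

# Round a capacity requirement up to whole drive sets. Usable capacity
# per drive set is assumed, inferred from the 408TB example above.
required_tb = 389
usable_per_set_tb = {"9.6TB FCM": 102, "19.2TB FCM": 204}  # assumed values

for drive, per_set in usable_per_set_tb.items():
    sets = math.ceil(required_tb / per_set)
    print(f"{drive}: {sets} drive set(s) -> {sets * per_set}TB usable")
# 9.6TB FCM: 4 drive set(s) -> 408TB usable
# 19.2TB FCM: 2 drive set(s) -> 408TB usable
```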