As SKLM grows in popularity, we are receiving an increasing number of requests for a guide to SKLM best practices. This wiki is a working document for a potential white paper on the subject.
A good implementation starts with a good design. SKLM is very flexible and supports either a Master-Clone (M/C) or a Multi-Master (M/M) architecture. In M/C mode, a single master can support up to 20 clones. Alternatively, in Multi-Master mode, up to 21 masters can be deployed.
Whether they choose M/C or M/M, most customers implement a total of four hosts across two data centers, with one pair located in each data center.
Note that many different storage components can communicate with SKLM concurrently across the network using KMIP or IPP, and they need not be located in the same data center as any of the SKLM servers.
SKLM can be deployed globally, as well.
SKLM implements a model in which keys are pre-generated for each of the storage solutions. Although not all of the keys may be used immediately, pre-generating keys on the master allows them to be retrieved from the master or any of the clones. IBM recommends that keys be pre-generated whenever possible by using key groups within SKLM, and that at least 10% more keys be pre-generated than are anticipated to be used within a year. Because SKLM can manage 8M+ keys, generating more keys than are required is unlikely to exhaust SKLM's capacity.
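The 10% guidance above is simple arithmetic, and a short sketch can make the sizing explicit. The function name and the example workload are hypothetical; only the 10% headroom and the 8M+ capacity figure come from the text. Integer arithmetic is used so the result always rounds up to a whole key.

```python
SKLM_KEY_CAPACITY = 8_000_000  # "8M+" keys, per the text; treated here as a lower bound


def keys_to_pregenerate(expected_yearly_use: int, headroom_percent: int = 10) -> int:
    """Return expected yearly key use plus headroom, rounded up to a whole key."""
    if expected_yearly_use < 0 or headroom_percent < 0:
        raise ValueError("inputs must be non-negative")
    # Ceiling division without floating point: -(-a // b) == ceil(a / b)
    extra = -(-expected_yearly_use * headroom_percent // 100)
    return expected_yearly_use + extra


if __name__ == "__main__":
    planned = keys_to_pregenerate(50_000)  # hypothetical workload
    share = planned / SKLM_KEY_CAPACITY
    print(f"pre-generate {planned} keys ({share:.2%} of stated capacity)")
```

Even a generous estimate typically uses a small fraction of the stated capacity, which is why over-provisioning keys is a low-risk choice.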
Whenever keys are generated, SKLM prompts administrators to make a backup. IBM recommends creating a backup whenever prompted. In addition, IBM recommends that the SKLM environment (application and data) be backed up monthly. If SKLM is running as a VM, IBM recommends making a "snapshot" of the VM every year or whenever keys have been added or configurations have been changed on the SKLM server. A VM snapshot preserves not only the SKLM application and database of keys, but also the operating system and network configuration.
Another important consideration is keeping a copy of your backups at a safe location. Because the SKLM backup contains all the keys and device/client information, it is a complete snapshot of your encryption key environment, and without the keys the encrypted data is lost. With the constant threat of attacks, both external and internal, IBM recommends that you create a copy of your SKLM backups and store it in a safe place. This reduces the exposure to an attack in which someone with authority (legitimate or obtained through compromise) maliciously deletes both your keys and your SKLM backups. Because the SKLM backup is encrypted, it is relatively safe to copy the backup jar file to another protected system, or even to a jump drive that is kept in a safe place.
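When copying a backup jar to a second location, it is worth confirming that the copy is byte-for-byte identical to the original. The following is a minimal sketch using only the Python standard library; the function names and paths are hypothetical and are not part of SKLM.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_backup_offsite(backup_jar: Path, safe_dir: Path) -> Path:
    """Copy an (already encrypted) backup jar to safe_dir and verify the copy."""
    safe_dir.mkdir(parents=True, exist_ok=True)
    destination = safe_dir / backup_jar.name
    shutil.copy2(backup_jar, destination)  # copy2 also preserves timestamps
    if sha256_of(backup_jar) != sha256_of(destination):
        destination.unlink()  # do not leave a corrupt copy behind
        raise OSError(f"checksum mismatch copying {backup_jar}")
    return destination
```

A call such as `copy_backup_offsite(Path("sklm_backup.jar"), Path("/mnt/safe"))` (both paths are placeholders) returns the verified destination path or raises an error.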
Manual vs Automated Replication of the master to clones
SKLM makes it easy to create a backup of the master and, as stated above, IBM recommends creating a backup whenever prompted. The same backup capability can also be used to replicate SKLM from a master to as many clones as desired (IBM recommends at least one master and one clone). To do so, create the backup jar (which is encrypted as part of the backup process), move that jar to each of the clones, and perform a restore on each clone using that jar file. This is manual replication of the SKLM master to one or more SKLM clones.
There is also a built-in facility that allows SKLM to replicate automatically from the master to up to 20 SKLM clones. It is set up through configuration on the master SKLM. When automated replication is in place, the master checks for newly created keys on the configured schedule and, if new keys are found, performs an automated replication to up to 20 SKLM clone instances. This process is very similar to manual backup/restore replication but is performed automatically: the master checks for new keys based on the configured schedule; if new keys are found, a backup is initiated and the resulting backup jar is encrypted; a Secure Sockets connection is then established with each of the clones and the encrypted backup jar is transmitted to the clone; the clone receives the backup jar and performs a restore; finally, the clone transmits a success return code back to the master. Note that TCP/IP connectivity from the master to all of the clones is required for automated replication to be enabled.
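The automated replication flow described above can be modeled in a few lines. This is a toy simulation, not SKLM code: the `Master` and `Clone` classes, their methods, and the use of `0` as the success return code are all assumptions made for illustration, and backup encryption and the secure transport are deliberately omitted.

```python
MAX_CLONES = 20  # limit stated in the text


class Clone:
    def __init__(self, name: str) -> None:
        self.name = name
        self.keys: set[str] = set()

    def restore(self, backup: frozenset[str]) -> int:
        """Restore from a backup; return 0 as a (hypothetical) success code."""
        self.keys = set(backup)
        return 0


class Master:
    def __init__(self, clones: list[Clone]) -> None:
        if len(clones) > MAX_CLONES:
            raise ValueError(f"at most {MAX_CLONES} clones are supported")
        self.clones = list(clones)
        self.keys: set[str] = set()
        self._last_replicated: frozenset[str] = frozenset()

    def add_key(self, key_id: str) -> None:
        self.keys.add(key_id)

    def scheduled_check(self) -> dict[str, int]:
        """One scheduled run: replicate only if keys were added since the last run."""
        if not (self.keys - self._last_replicated):
            return {}  # no new keys, nothing to do
        backup = frozenset(self.keys)  # stands in for the encrypted backup jar
        results = {clone.name: clone.restore(backup) for clone in self.clones}
        if all(code == 0 for code in results.values()):
            self._last_replicated = backup
        return results
```

The point of the model is that replication is triggered by new keys rather than by the schedule alone: a scheduled run with no new keys does nothing.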
Encryption deadlock occurs when your SKLM servers, in order to start, need to access disk storage that is encrypted with SKLM-managed keys. The result is a catch-22: the storage environment cannot be read without the encryption keys, but the encryption keys are located on that same storage system.
Deadlock is almost impossible to recover from.
Organizations are encouraged to plan for worst-case scenarios, such as an entire data center going cold during a mega-event similar to Hurricane Sandy, in which extended power outages might make diesel fuel for backup generators unavailable. SKLM servers in secondary data centers can help if network connectivity is present, and certain disk systems allow use of a recovery key. IBM recommends that organizations store SKLM backups in a "logical safe" that they can access even if their main storage environment is down, and that installations using self-encrypting disk or flash storage be able to retrieve keys from either a backup copy of SKLM or an SKLM replica that is not on the same storage environment as the master (which would be encrypted in a deadlock situation).
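A planned topology can be checked for deadlock on paper before it is deployed. The sketch below is a simplified model under stated assumptions: each SKLM server boots from exactly one storage system, and an encrypted storage system can be unlocked once any one of its key servers is running. All server and storage names are hypothetical.

```python
def startable_servers(
    boot_storage: dict[str, str],
    key_servers: dict[str, set[str]],
) -> set[str]:
    """Return the SKLM servers that can come up after a cold start.

    boot_storage: SKLM server -> storage system it boots from.
    key_servers:  encrypted storage system -> SKLM servers that can supply its keys.
    Storage systems absent from key_servers are treated as unencrypted.
    """
    started: set[str] = set()
    changed = True
    while changed:  # repeat until no further server can be started
        changed = False
        for server, storage in boot_storage.items():
            if server in started:
                continue
            providers = key_servers.get(storage)
            if providers is None or providers & started:
                started.add(server)
                changed = True
    return started


if __name__ == "__main__":
    # Both servers boot from the array they hold the keys for: nothing starts.
    print(startable_servers(
        {"sklm1": "arrayA", "sklm2": "arrayA"},
        {"arrayA": {"sklm1", "sklm2"}},
    ))
```

An empty result means the design is deadlocked. Moving even one server onto storage that does not depend on SKLM keys breaks the cycle in this model.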
Use of HSMs
SKLM uses an optional software-based FIPS 140-2 Level 1 certified crypto module to generate keys and implement encryption algorithms. Most customers find this level of FIPS certification sufficient to meet their needs. However, some customers also need the master key stored in hardware security modules (HSMs) separate from SKLM.
When HSMs are deployed with SKLM, they must be integrated before the initial use of SKLM (so that the SKLM master key can be generated by the HSM). Furthermore, if HSMs are deployed, IBM recommends deploying two HSMs for redundancy and backing up the contents of the HSMs on a regular basis (for example, monthly or whenever a change to the HSM is made). A separate HSM utility is required to perform this backup.
Keep in mind that if an HSM is deployed and it is corrupted, destroyed, or inaccessible, the master key may not be recoverable, and therefore none of the keys within SKLM may be recoverable, since they are encrypted under the master key stored in the HSM. Taking regular backups of the HSM and documenting an HSM recovery process (outside the scope of SKLM) will minimize the risk of losing the HSM contents (the SKLM master key).
Because SKLM resides on the network, firewalls should be configured so that only authorized or administrative hosts can send traffic to the SKLM server. IBM recommends restricting network traffic to the SKLM server to only hosts that are managed by administrators (for administrative use), SYSLOG servers, SIEM servers, and the storage and application solutions that receive keys from SKLM. SKLM should never be placed on an internet-facing network.
To monitor the health of SKLM, administrators should review their operating system and application logs on a monthly basis. Furthermore, SKLM should be integrated with an external SYSLOG server and a SIEM to allow for separation and automated monitoring of log data for forensic analysis, to retain log information for post-event review, and to allow for automatic detection of suspicious activities.
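The allow-list idea above can be expressed as a small check, which is useful for reviewing a proposed rule set before it goes into a firewall. This is a sketch only: the network ranges are made-up placeholders, and real enforcement belongs in the firewall, not in application code.

```python
import ipaddress

# Hypothetical allow-list: categories follow the text, addresses are placeholders.
ALLOWED_SOURCES = {
    "admin workstations": "10.10.1.0/28",
    "syslog/SIEM servers": "10.10.2.10/32",
    "storage and application clients": "10.20.0.0/24",
}


def is_allowed(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allow-listed network."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr)
        for cidr in ALLOWED_SOURCES.values()
    )


if __name__ == "__main__":
    for ip in ("10.10.1.5", "10.20.0.200", "8.8.8.8"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

Anything not explicitly listed is denied, which matches the default-deny posture the recommendation implies.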
HW or VMs for SKLM
Customers can deploy SKLM on stand-alone servers or as virtual machines. Although SKLM can be deployed either way, IBM recommends deploying SKLM on virtual machines because of the flexibility that a virtualized environment provides, including the ease of producing a snapshot of the VM for disaster recovery.
IBM regularly releases updates and fix packs for SKLM. Customers are always encouraged to keep their SKLM instances aligned with the most recent releases and patched with the latest fix packs to optimize performance, minimize downtime, and avoid exposure to known security vulnerabilities.
Risks of Key Loss
Although the risk of key loss can never be eliminated, customers increase that risk if these recommendations are not followed. The loss of encryption keys means that any encrypted data protected by those keys is effectively cryptographically erased and is not recoverable.
Key rotation is important to ensure that a single key is not used to encrypt many data objects and to minimize the blast radius in the unlikely event that a key is compromised. To that end, SKLM offers the ability to implement device groups with timed key usage. The keys can always be used to read previously encrypted data, but once their usage period ends they cannot be used to encrypt new data. This approach allows for easy implementation of cryptographic erasure policies.
Although standards do not mandate a specific key rotation period, organizations and industry generally adopt the practice of using a unique key per tape, disk, or object that is encrypted. Those keys are generally wrapped under a master, or key-exchange, key. For most organizations, the rotation period (a.k.a. cryptoperiod) is one year, but it can be as short as three months. Regardless of your organization's policies regarding key rotation, SKLM can integrate with self-encrypting storage to support them.
NIST SP800-57 provides comprehensive guidance on key management practices. Although not all organizations may be able to follow this guidance precisely, it is always recommended to establish a roadmap for aligning your organization's key management practices with those described in SP800-57.
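The timed-usage rule (encrypt only within the cryptoperiod, decrypt afterwards as well) can be illustrated with a short model. The `ManagedKey` class and its fields are hypothetical and do not reflect how SKLM represents keys internally; the one-year default mirrors the typical cryptoperiod mentioned above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ManagedKey:
    key_id: str
    activated: date
    cryptoperiod_days: int = 365  # typical one-year cryptoperiod, per the text

    @property
    def expires(self) -> date:
        """First day on which the key may no longer encrypt new data."""
        return self.activated + timedelta(days=self.cryptoperiod_days)

    def can_encrypt(self, today: date) -> bool:
        """New data may be encrypted only inside the cryptoperiod."""
        return self.activated <= today < self.expires

    def can_decrypt(self, today: date) -> bool:
        """Previously encrypted data stays readable after the key expires."""
        return today >= self.activated


if __name__ == "__main__":
    key = ManagedKey("tape-0001", date(2024, 1, 1), cryptoperiod_days=90)
    print(key.expires, key.can_encrypt(date(2024, 6, 1)), key.can_decrypt(date(2024, 6, 1)))
```

Under this model, cryptographic erasure amounts to deleting an expired key once the data it protects is no longer needed.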