
Elasticsearch Index Lifecycle Management policy and OMEGAMON Data Provider

  

Have you run into a problem where OMEGAMON Data Provider (ODP) data flowing into an Elastic environment has filled up Elastic's disk space, or has pushed Elastic into “read only mode” so that it stops accepting new records?
If you have, then read on for some ways to correct this situation. 

When ODP was introduced, additional documentation on using ODP with a variety of analytics platforms was published in the z-open-data repository on GitHub. Our team wants to provide guidance on how ODP can be integrated into an existing analytics platform and, as is the case with Elastic, on setting up new instances as well.

In short, we had some competing requirements for this guidance. In Create an Elasticsearch Index template, we provide a sample policy that customers can deploy. Any production deployment should have at least three Elastic nodes in a cluster to handle scale and availability.

Later in the documentation, we note that a test environment might have a single Elastic node. For that case, the Number of replicas section adds guidance that “number_of_replicas” should be set to 0.

Unfortunately, we know of at least one test environment where that last step was missed. As a result, the ILM policy did not take effect, older records were not pruned, and the customer ran out of disk space. This effectively leaves ODP unable to store new data in Elastic.

We’ve updated that documentation to explain this better and to add a warning about single-node Elastic instances used in test environments. While the Elastic default for number_of_replicas is 1, single-node Elastic environments must set it to 0 for Index Lifecycle Management to work properly.

As mentioned, the documentation has been updated and can be found here and here. There are two steps to a successful setup.

Here is what’s happening. Within Elastic, if you go to ‘Stack Management -> Index Management -> Data Streams’, you are likely to see that the health of the indices is yellow, along with an index lifecycle error. (Note: the accompanying screenshot shows all-green health.) The detailed error may read "Waiting for all shard copies to be active". This is because number_of_replicas defaults to 1, and Elastic never places a replica on the same node as its primary shard, so on a single-node instance the replicas are never created. The ILM policy therefore gets stuck at this step and cannot move the data to the ‘delete’ phase.
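The allocation rule behind the yellow status can be sketched in a few lines of Python. This is a simplified model, not Elastic's actual allocator: it assumes only the rule that each copy of a shard (primary or replica) needs its own node, so at most (number of nodes - 1) replicas per shard can ever be assigned. The function name `expected_health` is hypothetical and is not part of any Elastic API.

```python
def expected_health(number_of_nodes: int, number_of_replicas: int) -> str:
    """Simplified model of index health (hypothetical helper, not an Elastic API).

    Assumes primaries are always assigned and that no two copies of the
    same shard may share a node.
    """
    if number_of_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    # Each shard copy needs its own node, so at most nodes - 1 replicas fit.
    assignable_replicas = min(number_of_replicas, number_of_nodes - 1)
    if assignable_replicas < number_of_replicas:
        # Primaries are active but some replicas stay unassigned; this is the
        # "Waiting for all shard copies to be active" state where ILM stalls.
        return "yellow"
    return "green"


# Single-node test environment with the Elastic default of 1 replica.
print(expected_health(number_of_nodes=1, number_of_replicas=1))
# Same environment after setting number_of_replicas to 0.
print(expected_health(number_of_nodes=1, number_of_replicas=0))
# Recommended production cluster of three nodes with 1 replica.
print(expected_health(number_of_nodes=3, number_of_replicas=1))
```

In this model the first call reports yellow and the other two report green, which matches why the fix below is to set number_of_replicas to 0 on a single node rather than to wait for the replicas to appear.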

Note: If you open Stack Management and cannot find the Index Management function, you do not have the authority within Elastic to run these commands. For example, some of our users are "read only": they can use the dashboards, but they are not allowed to change the environment, so they will not see these functions.

To fix this error, please perform the following steps:

1) Go to ‘Stack Management -> Index Management -> Index templates’ and edit the index template ‘omegamon’ to add "number_of_replicas": "0" to the ‘Index Settings’ section. This ensures that all new indices are created with number_of_replicas = 0:

{
    "index": {
        "lifecycle": {
            "name": "omegamon-ds-ilm-policy"
        },
        "number_of_replicas": "0"
    }
}

2) This step may not be required, but if your indices are already in ‘read_only’ status (due to an out-of-space condition), go to ‘Dev Tools’, paste the following into the console, and run it to reset the status:


PUT /_all/_settings
{
    "index.blocks.read_only_allow_delete": null
}

3) In ‘Dev Tools’, paste the following into the console and run it. This updates all existing indices to number_of_replicas = 0:

PUT /_settings
{
    "index": {
        "number_of_replicas": 0
    }
}
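If you would rather script steps 2 and 3 than paste them into Dev Tools, the same two requests can be sent to the Elasticsearch REST API. The sketch below only uses the Python standard library and builds the requests without sending them. It assumes an unauthenticated cluster at `http://localhost:9200`, which is an assumption for illustration; real deployments usually need HTTPS and credentials, which this sketch does not handle. The helper name `settings_request` is hypothetical.

```python
import json
import urllib.request

# Assumption: an unauthenticated cluster at this URL. Adjust the URL, TLS,
# and credentials for your own environment.
ES_URL = "http://localhost:9200"


def settings_request(path: str, body: dict, es_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request like the Dev Tools commands above.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url=f"{es_url}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Step 2: clear the read-only block (Python None is serialized as JSON null).
clear_block = settings_request("/_all/_settings", {"index.blocks.read_only_allow_delete": None})
# Step 3: set number_of_replicas to 0 on all existing indices.
no_replicas = settings_request("/_settings", {"index": {"number_of_replicas": 0}})

for request in (clear_block, no_replicas):
    print(request.get_method(), request.full_url, request.data.decode())
    # To send it against a reachable cluster, uncomment:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status, response.read().decode())
```

Building the requests separately from sending them makes it easy to review exactly what will be changed before running anything against a live cluster.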

Once these steps are done, the status of your indices (data streams) should turn back to green, and the ILM policy should now delete expired data properly. You can also delete old data manually by deleting the indices (data streams). Don’t worry about the warning shown when you delete them; each one is recreated automatically when new data arrives.
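One way to confirm the fix is to look at the response body of Elasticsearch's `GET _cluster/health` request, which includes `status` and `unassigned_shards` fields. The sketch below interprets such a body. The helper name `all_shard_copies_active` is hypothetical, and the sample responses are made-up values trimmed to the fields used here. A green result only shows that the replica problem is gone; to check ILM itself, `GET <index>/_ilm/explain` reports the current lifecycle step of each index.

```python
def all_shard_copies_active(cluster_health: dict) -> bool:
    """Interpret a GET _cluster/health response body (hypothetical helper).

    Green already implies there are no unassigned shards; the second check
    is defensive in case the body was trimmed or edited.
    """
    return (
        cluster_health.get("status") == "green"
        and cluster_health.get("unassigned_shards", 0) == 0
    )


# Illustrative response bodies with made-up numbers.
before_fix = {"status": "yellow", "number_of_nodes": 1, "unassigned_shards": 12}
after_fix = {"status": "green", "number_of_nodes": 1, "unassigned_shards": 0}

print(all_shard_copies_active(before_fix))
print(all_shard_copies_active(after_fix))
```

With these sample bodies, the first call returns False and the second returns True.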

Summary

We hope you haven't experienced this problem, but if you have, we hope you've found this helpful. As always, if you'd like to learn more about OMEGAMON Data Provider or OMEGAMON AI Insights, please go to our Master Blog for additional information.

#CICS
#db2z/os
#IBMMQ
#IBMZ
#IBMZOS
#IBMAI
#IMS
#Instana
#jvm
#OMEGAMON
#ODP
#OMEGAMONAIInsights