Using Instana Synthetic for Availability Checks in Production

By Srikanth S M posted Thu September 26, 2024 06:19 AM

  

IBM site reliability engineers (SREs) have enabled cross-region availability checks on Instana by applying Synthetic Monitoring best practices and the solutions provided by Instana Synthetic. These checks help monitor and analyze the stability and availability of the Instana infrastructure hosted in different regions and of the many Tenant Units in production. They also deepen our understanding of the challenges and solutions involved in running Synthetic Monitoring in production.

In this article, we introduce real scenarios and best practices for Synthetic Monitoring in a production environment, including developing Synthetic scripts, tuning, and alerting. It can serve as a reference for SRE teams or customers who want to use Instana Synthetic Monitoring in their production environment, and it addresses many frequently asked questions.

Cross-region availability tests

The goal of the availability tests is to monitor Instana operation SaaS Tenant Units across all regions, which we refer to as blue-region, orange-region, and so on. The test scenario covers end-user monitoring through a browser as an end-to-end test:

  1. Log in on the main page with username, password, and 2FA (two-factor authentication)

  2. Validate the main page

  3. Navigate through pages using the left navigation bar

  4. Validate that pages are loaded and rendered as expected

  5. Click All Services in the table and drill down to validate the All Services page

To achieve this, we use an Instana Synthetic browser script test, which proactively monitors user journeys by playing back Node.js-based test scripts. The following sections cover the most important steps and frequently asked questions.

 

Log in to websites with Synthetic user credentials

The most frequently asked question for browser testing is how to log on to websites and how to protect user credentials. Instana Synthetic provides a solution to support login activities: 

  1. Create user credentials on Instana with the Synthetic Open APIs

  2. Use the global variable $secure to access secrets in your test script

  3. Use the browser testing API $browser.generateTOTPToken to generate a Time-based One-Time Password (TOTP) from the TOTP secret key to pass two-factor authentication

In your Synthetic browser test script, you can use Selenium and JavaScript code to log in to websites flexibly.

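// type, click, findElementByIdAndSendKeys and findButtonByClassAndClick are
// assumed to be helper functions defined elsewhere in the script (not shown).

// Enter the credentials stored as Synthetic secrets and submit the login form.
await type("email", $driver.By.css("#user-name-input"), $secure.username);
await type("password", $driver.By.css("#password-input"), $secure.password);
await click("Login button", $driver.By.css("button[type='submit']"));

// Generate the one-time password from the TOTP secret key and submit it.
let totp_token = $browser.generateTOTPToken($secure.totp_key);
await findElementByIdAndSendKeys("otp-input", totp_token);
await findButtonByClassAndClick("ds-button");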
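
For context on what the token generation step computes, here is a sketch of the standard TOTP algorithm (RFC 6238 over HMAC-SHA1, 30-second time step, 6 digits) written with Node.js built-ins. Whether $browser.generateTOTPToken uses exactly these defaults is an assumption, and the names base32Decode and totp are illustrative only. Inside a Synthetic script you would call the Instana API rather than this code.

```javascript
const crypto = require("crypto");

// Decode an RFC 4648 base32 string, the usual encoding of TOTP secret keys.
function base32Decode(input) {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const clean = input.toUpperCase().replace(/\s+/g, "").replace(/=+$/, "");
  const bytes = [];
  let bits = 0;
  let value = 0;
  for (const ch of clean) {
    const idx = alphabet.indexOf(ch);
    if (idx === -1) throw new Error(`Invalid base32 character: ${ch}`);
    value = (value << 5) | idx;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// Illustrative TOTP: HMAC-SHA1 over the time-step counter, then dynamic truncation.
function totp(secretBase32, timeMs = Date.now(), stepSeconds = 30, digits = 6) {
  const counter = Math.floor(timeMs / 1000 / stepSeconds);
  const counterBuf = Buffer.alloc(8);
  counterBuf.writeBigUInt64BE(BigInt(counter));
  const hmac = crypto
    .createHmac("sha1", base32Decode(secretBase32))
    .update(counterBuf)
    .digest();
  const offset = hmac[hmac.length - 1] & 0x0f;
  const code = (hmac.readUInt32BE(offset) & 0x7fffffff) % 10 ** digits;
  return String(code).padStart(digits, "0");
}

// Example with a made-up secret; the token changes every 30 seconds.
console.log(totp("JBSWY3DPEHPK3PXP"));
```

Because the token depends only on the secret key and the current time, the script can pass two-factor authentication without human interaction, which is why the TOTP secret key should be stored as a Synthetic credential like the password.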

 

Reuse shared code 

Reusing shared code reduces maintenance effort and makes your code easier to understand and manage in GitHub repositories. In our test scenarios, we need to monitor all 5 regions, and the differences among regions are small.

 

We can use global variables such as $synthetic to access variables for a specific test, and use test configuration settings to pass different values to the script. In this way, we only need to develop and maintain one script to test all 5 regions.

First, configure custom properties through the Instana Synthetic configuration UI.

  

 

Second, use $synthetic.labels to access the values in the script.

console.log(">>>>>>>>>>>>>>>>>>>", "Accessing Instana Login Page",
    $synthetic.labels.regionName, $synthetic.labels.regionURL);
await $browser.get($synthetic.labels.regionURL);
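
Because one script serves every region, a missing or misspelled custom property is easy to overlook until navigation fails. A minimal sketch of a guard follows, assuming the custom properties are named regionName and regionURL as in the snippet above and that $synthetic.labels behaves like a plain object of strings. readRegionLabels is a hypothetical helper, not an Instana API.

```javascript
// Hypothetical helper: validate the custom properties before the test uses them.
// `labels` is assumed to look like $synthetic.labels (a plain object of strings).
function readRegionLabels(labels) {
  const required = ["regionName", "regionURL"];
  const missing = required.filter((key) => !labels || !labels[key]);
  if (missing.length > 0) {
    throw new Error(`Missing custom properties: ${missing.join(", ")}`);
  }
  // Fail early on a malformed URL instead of at navigation time.
  const url = new URL(labels.regionURL);
  return { regionName: labels.regionName, regionURL: url.href };
}

// In a Synthetic script this could be used as:
//   const { regionName, regionURL } = readRegionLabels($synthetic.labels);
//   await $browser.get(regionURL);
```

Failing with an explicit message at the start of the script makes a configuration mistake easier to tell apart from a real availability problem in the region.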

 

The custom properties can also be used as Synthetic tags to filter metrics, and they can be reused in Synthetic smart alerts, as described later in this article.

 

Validate test results 

We use many validations to check page rendering results, and we want the test to fail if any validation fails so that alerting can notify us. Instana Synthetic provides many APIs to validate UI pages. When a test fails, Instana automatically captures a screenshot to help you troubleshoot, and you can also use the timeline chart, console logs, and browser logs to diagnose failures. In our scenarios, we use the following validation techniques, which are common in browser testing:

  1. Use Explicit Wait to verify that specified web elements are present on the UI

In this test logic, if the All Services link does not appear on the page within 30 seconds, the test fails with an error.

await $browser.waitForAndFindElement($driver.By.xpath(`//a[text()='All Services']`), 30000); 

 

  2. Verify page titles

In this test logic, we wait up to the specified timeout for the page title to contain the keyword Instana. If the timeout is reached, the test fails with an error.

await $browser.wait($driver.until.titleContains("Instana"), timeout); 

 

  3. Verify that specified text values are present on the UI

In this logic, we wait at most 30 seconds for the page to contain the text Summary. If the timeout is reached, the test fails with an error message.

await $browser.waitForAndFindElement($driver.By.xpath(`//div[contains(text(),'Summary')]`), 30000); 

 

  4. Take screenshots

You can use the browser testing APIs to take a screenshot at important steps to help you verify or troubleshoot. The screenshots can be found on the test result details page.

await $browser.takeScreenshot(); 
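
When a script contains many validations, it helps to know which one failed. Below is a sketch of a small wrapper that names each validation step, logs its outcome to the console log, and rethrows so that the test still fails. The step function is a hypothetical helper, not part of the Instana API. The commented usage assumes the $browser and $driver calls shown above.

```javascript
// Hypothetical helper: run one named validation and make failures easy to locate.
async function step(name, action) {
  const startedAt = Date.now();
  try {
    const result = await action();
    console.log(`[PASS] ${name} (${Date.now() - startedAt} ms)`);
    return result;
  } catch (err) {
    console.log(`[FAIL] ${name}: ${err.message}`);
    // Rethrow so the Synthetic test is still marked as failed.
    throw new Error(`Step "${name}" failed: ${err.message}`);
  }
}

// Possible usage inside a Synthetic browser script:
//   await step("All Services link is visible", () =>
//     $browser.waitForAndFindElement($driver.By.xpath(`//a[text()='All Services']`), 30000));
//   await step("Title contains Instana", () =>
//     $browser.wait($driver.until.titleContains("Instana"), 30000));
```

The step name then appears both in the console log and in the error message of the failed test, which shortens the time needed to relate an alert to a specific page.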

 

 

Alert test failures 

Use reasonable test frequency 

After you create a Synthetic test, Instana plays it back at a fixed frequency. We recommend a frequency of 15 minutes for browser tests, since they are usually end-to-end user journey tests.

 

Tune Synthetic tests in production 

All Synthetic users want to be alerted about production issues, but most do not want to be alerted about a momentary hiccup or occasional network slowness. You can address this by tuning Synthetic settings without changing any code.

  1. Use the retry strategy

To use the Instana retry strategy, set the number of retry attempts on the Instana test configuration page to 1 or 2. With 2 retries, your test runs at most three times within one test interval (15 minutes in our scenario), stopping as soon as a run succeeds, before the result is sent to the Instana backend.

  2. Use smart alert consecutive test failures

You can also use the consecutive test failures setting of a Synthetic smart alert to define when the alert is triggered. If you set it to 2, an alert is triggered only when the test fails in two consecutive test intervals, which is 30 minutes in our scenario.

  3. Use a reasonable timeout value

The timeout value in the test configuration is the maximum duration of the overall test. When it is exceeded, the test fails with a Script Timeout error. We recommend using the default value, which is 5 minutes if no value is entered. To validate that elements are present on a page, we recommend using Explicit Wait in code.

In our availability test scenarios, we want to be alerted within 15 minutes, so we use 2 retries with a retry interval of 10 seconds, and no extra configuration for consecutive test failures.
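
The arithmetic behind these settings can be sketched as follows. This is a simplified model based only on the numbers in this section. The function names are illustrative, and the real alert timing also depends on when a failure occurs within the interval.

```javascript
// Upper bound on script executions within one test interval:
// the first attempt plus the configured number of retries.
function maxRunsPerInterval(retries) {
  return 1 + retries;
}

// Minutes of consecutive failing intervals needed before the
// smart alert condition can be met.
function minutesUntilAlert(frequencyMinutes, consecutiveFailures) {
  return frequencyMinutes * consecutiveFailures;
}

console.log(maxRunsPerInterval(2));    // 3
console.log(minutesUntilAlert(15, 1)); // 15
console.log(minutesUntilAlert(15, 2)); // 30
```

With 2 retries and no consecutive failure threshold, a transient failure is absorbed by the retries within the same interval, while a persistent failure is still reported after a single 15-minute interval.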

 

Use smart alert 

You can use a Synthetic smart alert to monitor test failures and send alerts to specified alert channels such as Slack channels, email groups, PagerDuty, and so on.

You can use global variables such as testName in your smart alert titles.

 

 

You can also add custom payloads to the alert content to help you isolate issues. As mentioned in the previous sections, we defined custom properties for region name and region URL in the test configuration and accessed them through global variables in the test script. In the smart alert configuration, we can also use them as Synthetic tags in Additional Custom Payloads. The predefined custom properties are suggested when you choose Synthetic test > tags.

 

 

Analyze availability and success rate 

After the tests have run smoothly for weeks or months, we want to analyze the success rate and availability. The availability formula is 1 - Tdown / (Tdown + Tmaintain + Tup).
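
As a worked example of the availability formula, the sketch below computes availability from downtime, maintenance time, and uptime expressed in the same unit. The function name and the sample numbers are made up for illustration.

```javascript
// availability = 1 - Tdown / (Tdown + Tmaintain + Tup)
// All three durations must use the same unit (for example, hours).
function availability({ down, maintain, up }) {
  const total = down + maintain + up;
  if (total <= 0) {
    throw new Error("Total observed time must be positive");
  }
  return 1 - down / total;
}

// Hypothetical 30-day month (720 hours): 0.5 h down, 2 h maintenance.
const ratio = availability({ down: 0.5, maintain: 2, up: 717.5 });
console.log((ratio * 100).toFixed(2) + "%"); // about 99.93%
```

Note that in this formula maintenance time counts toward the total observed time but not toward downtime.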

Instana Synthetic provides Open APIs to query your test result metrics. 

 

Query test result success rate 

In the following query, we get the success rate and average response time over the last hour for the availability test that has the tag value orange-region.

curl -X POST -i 'https://xxxx.io/api/synthetics/results' \
    -H 'authorization: apiToken xxxx' -H 'Content-Type: application/json' \
    --data '{ "tagFilters":[{"value":"orange-region","name":"synthetic.tags","key":"regionName","operator":"EQUALS"}], "metrics": [ { "aggregation": "MEAN", "metric": "response_time" }, { "aggregation": "MEAN", "metric": "status" } ], "timeFrame": { "to": 0, "windowSize": 3600000 } }'

 

The result is shown below. A metricsStatus value of 1.0 represents a success rate of 100%.

{
  "testResult": [
    {
      "testId": "xxx",
      "testName": "Availability-script-orange-region",
      "locationId": [
        "xxxx"
      ],
      "metrics": [
        {
          "synthetic.metricsResponseTime": 42964.25
        },
        {
          "synthetic.metricsStatus": 1.0
        }
      ]
    }
  ]
}
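
If you process the response in Node.js rather than in the shell, the request body and the result handling can be sketched as below. The payload and response shapes are taken from the example above. The function names buildSuccessRateQuery and summarize are illustrative, and sending the request (for example with an HTTP client of your choice) is left out.

```javascript
// Build the request body used in the curl example for a given region tag.
function buildSuccessRateQuery(regionName, windowSizeMs) {
  return {
    tagFilters: [
      { name: "synthetic.tags", key: "regionName", operator: "EQUALS", value: regionName },
    ],
    metrics: [
      { aggregation: "MEAN", metric: "response_time" },
      { aggregation: "MEAN", metric: "status" },
    ],
    timeFrame: { to: 0, windowSize: windowSizeMs },
  };
}

// Turn a response shaped like the example result into one summary per test.
function summarize(response) {
  return response.testResult.map((test) => {
    // Each metric arrives as its own single-key object; merge them into one.
    const merged = Object.assign({}, ...test.metrics);
    return {
      testName: test.testName,
      successRatePercent: merged["synthetic.metricsStatus"] * 100,
      meanResponseTimeMs: merged["synthetic.metricsResponseTime"],
    };
  });
}

// One hour window for the orange-region tests.
console.log(JSON.stringify(buildSuccessRateQuery("orange-region", 60 * 60 * 1000)));
```

Applied to the example result, summarize would report a success rate of 100 percent and a mean response time of 42964.25 ms for Availability-script-orange-region.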

 

Query status metrics and send to AI analysis dashboard 

In the following query, we get the status metric of individual test results from the last 15 minutes for the availability test whose regionName tag equals the value of the shell variable ${region}.

 

result=$(curl -s -X POST https://xxxx.instana.io/api/synthetics/results/list \
    -H "Content-Type: application/json" -H 'authorization: apiToken xxxx' \
    -d '{"pagination":{"page":1,"pageSize":3},"syntheticMetrics":["status"],"order":{"by":"start_time","direction":"DESC"},"tagFilters":[{"value":"'"${region}"'","name":"synthetic.tags","key":"regionName","operator":"EQUALS"}],"timeFrame":{"to": 0,"windowSize": 900000}}')

 

The result is shown below. The status metric contains a timestamp and a status value, and it can be sent to any analytics dashboard. A metric value of 1 represents success, and a value of 0 represents failure.

{
  "items": [
    {
      "metrics": {
        "status": [
          [
            1724377840874,
            1
          ]
        ]
      }
    }
  ]
}
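
Before sending the data to a dashboard, the nested arrays usually need to be flattened into individual data points. Here is a sketch of that transformation, assuming the response shape shown above. The function names and the output field names are illustrative and not tied to any specific dashboard.

```javascript
// Flatten the [timestamp, status] pairs of every result item into data points.
function toDataPoints(response) {
  const points = [];
  for (const item of response.items) {
    for (const [timestamp, status] of item.metrics.status) {
      points.push({ timestamp, success: status === 1 });
    }
  }
  return points;
}

// Fraction of successful results, or null when there is no data in the window.
function successRate(points) {
  if (points.length === 0) return null;
  return points.filter((p) => p.success).length / points.length;
}

const example = { items: [{ metrics: { status: [[1724377840874, 1]] } }] };
console.log(toDataPoints(example)); // [ { timestamp: 1724377840874, success: true } ]
```

Returning null for an empty window keeps "no test ran" distinguishable from "every test failed", which matters when the data feeds an availability calculation.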

 

Conclusion 

This article described the benefits of using Instana Synthetic Monitoring and best practices for monitoring a production environment, and addressed frequently asked questions that you might have. More best practices will be shared in the upcoming series of Synthetic PoP related blogs.
