Service Modeling using Topology Manager

By JULIUS WAHIDIN posted Wed August 03, 2022 01:53 AM

  
The IBM Topology Manager application, called Agile Service Manager (ASM), comes with plenty of ready-to-use observers to discover the customer's environment. For example, there are observers to discover the topology of Kubernetes, OpenStack, cloud infrastructure, application performance management (APM), and many others. However, most of the customers I have worked with wanted more than the discovery of a single instance. Some wanted an end-to-end topology, while others wanted to see the Service or the Business view. This blog discusses configurations you can perform to meet these requirements.

End-to-end topology
A typical customer requirement is to show an end-to-end topology across the customer's application, infrastructure and network domains. ASM may have out-of-the-box observers for each domain, and to satisfy the customer requirement, we need to add the merge rule that joins the topologies.

Let us work through building an end-to-end topology example using ASM's sample files so you can repeat this in your lab. You can find the source file used in this blog at the following Git location: File Observer Sample File. For clarity, I have made minor modifications to this sample file for this blog. We start with the observers for VMware vCenter and Dynatrace. Even though we use sample files read by file observer jobs for this exercise, the data comes from a real vCenter and a real Dynatrace, so I will call them the vCenter and Dynatrace observers.

Observer job
As the first step, we define the observer jobs for vCenter and Dynatrace and then run the discovery. I assume you know how to define an observer job. To keep this blog short, I will not go through the detailed steps of performing each operation. If you need a refresher, the following articles might help you:
-   Topology Modeling using Agile Service Manager: A tutorial.
-   Building a topology using Agile Service Manager's REST Interface
-   ASM tutorial: Creating Topology from Events
-   ASM tutorial: Learning to troubleshoot a REST API connection
-   REST Observers in Watson AIOps v3.4
-   Working with File Observers in WAIOps 3.1.1
-   Getting started with Watson for AIOps Event Manager (4/7) - Create sample topology

VMware vCenter
Once we run the observer job, we can verify the number of nodes discovered by looking at the view history pull-down menu of the observer job list. For the sample vCenter, we can see that the observer found 318 resources. We will interchangeably use the terms node and resource in the context of ASM topology.


The number of nodes can be quite large, as we can see from the discovered topology below:



To understand it better, let us focus on a small section of the topology. Let us pick one of the discovered virtual machines, VM-50, and show its neighbours.

We can observe the following:
  • The hypervisor host machine is esx8.noi.ibm.com.
  • VM-50 is one of the virtual machines hosted by esx8.noi.ibm.com.
  • VM-50 has a CPU, a hard disk and an operating system. There is more information on each node if you right-click on the node and choose Resource detail.
  • VM-50 has three network adapters, and each network adapter is assigned a network address (MAC address).
This is all good if all you need is to discover your vCenter infrastructure. ASM has found the host and virtual machines, including their resources (CPU, disk, operating system, and network). You can also monitor the history of infrastructure changes and map alerts to the status of the topology nodes. For some customers, this may be all they need.

Dynatrace


For application performance discovery, we will look at the Dynatrace observer. The Dynatrace observer discovers some of Dynatrace's monitoring structures for you: Host, Process, Service and Application. Here is an example of the Dynatrace topology built using the sample files. Right-clicking on the Kafka.Kafka node shows the pop-up menu:



Again, we have zoomed in on a few of the many automatically discovered resources. The above shows that the host server1.noi.ibm.com runs multiple processes, three of which are Elasticsearch, search-service.jar and Kafka. Each process contains services such as Elastic:0:9000 (abbreviated). Each process is also connected to other processes. The pop-up menu shows that Kafka has connectivity with 40 other processes on the same host that are not shown in the diagram above.

While the vCenter observer shows you the infrastructure, the Dynatrace observer provides you with the application topology. Each discovered topology is useful to its own team: the infrastructure support team or the application support team. Unfortunately, most customers still have a siloed approach to supporting their environment.

It will be more useful if the customer can see the vCenter topology connected to the Dynatrace topology. The support team can then see the effect of the vCenter infrastructure on the applications as discovered by Dynatrace.

Fortunately, ASM has a merge feature that allows us to join observer job results. You configure the merge rule by specifying a pattern that the observer job results have in common.

Merge

Let us assume that the VM-50 from the vCenter observer is the same as host server1.noi.ibm.com from the Dynatrace observer. We want to tell ASM to merge them. There are at least two ways to join them. We can combine the VM-50 and server1.noi.ibm.com into a single node or create a relationship between VM-50 and server1.noi.ibm.com. Which one to choose depends on the purpose of the model. Merging them into a single node will produce a more compact topology, whereas creating the relationship will keep the topology of each observer intact. For our purpose, let us choose the latter one. We will create a relationship, and in doing so, the merge rule can be more straightforward.

Diagrammatically, here is the process of joining the two topologies. We start with two topologies, and I have moved the nodes we want to merge to the edge of the diagram so that the merge rule is easier to see:

Step 1: We use the file observer to introduce two additional nodes. We give each new node the same name as the node from the Dynatrace or vCenter observer that we want to join. This name assignment makes the merge definition in step 2 easier.

Step 2: We define a merge rule. Because of what we did in step 1, the merge rule is simple: join the nodes with the same name.

Step 3: Re-run the observer jobs. The merge is performed during an observer job run, so we need to re-run the observer jobs for Dynatrace, vCenter and the File Observer.

Result: This is the joined topology. Note that we now have two composite resources. A composite resource is a resource created by merging the two original nodes, and it has the combined information from both. We know that a resource is a composite resource because it now has a CompositeId, or because its Vertex type is composite.



Implementation

So let us detail the implementation step. 

Step 1
We create a file and then define the file observer job. The following file content generates the joiner nodes.

V:{"uniqueId":"vjoiner-server1.noi.ibm.com","entityTypes":["host"],"matchTokens":["server1.noi.ibm.com"],"name":"server1.noi.ibm.com","_references":[{"_edgeType":"runsOn","_toUniqueId":"vjoiner-vm-50"}]}
V:{"uniqueId":"vjoiner-vm-50","entityTypes":["vm"],"matchTokens":["vm-50"],"name":"VM-50"}
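
If you have more than one VM and host pair to join, typing these lines by hand becomes tedious. Here is a minimal Python sketch that generates the same two lines from a mapping of VM name to host name. The function name `joiner_lines` and the shape of the input mapping are my own assumptions for illustration; only the property names and the `vjoiner-` prefix come from the file content above.

```python
import json


def joiner_lines(vm_to_host):
    """Build file observer lines that pair each VM name with a host name.

    vm_to_host is a hypothetical mapping such as {"VM-50": "server1.noi.ibm.com"}.
    """
    lines = []
    for vm_name, host_name in vm_to_host.items():
        vm_id = "vjoiner-" + vm_name.lower()
        host_id = "vjoiner-" + host_name
        # Joiner node named like the Dynatrace host, with a runsOn edge to the VM joiner.
        host = {
            "uniqueId": host_id,
            "entityTypes": ["host"],
            "matchTokens": [host_name],
            "name": host_name,
            "_references": [{"_edgeType": "runsOn", "_toUniqueId": vm_id}],
        }
        # Joiner node named like the vCenter virtual machine.
        vm = {
            "uniqueId": vm_id,
            "entityTypes": ["vm"],
            "matchTokens": [vm_name.lower()],
            "name": vm_name,
        }
        # Compact separators keep each record on one line, as in the sample file.
        lines.append("V:" + json.dumps(host, separators=(",", ":")))
        lines.append("V:" + json.dumps(vm, separators=(",", ":")))
    return lines


if __name__ == "__main__":
    for line in joiner_lines({"VM-50": "server1.noi.ibm.com"}):
        print(line)
```

Running it with the single pair used in this blog prints the two lines shown above, which you can save to the file read by the file observer job.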

Step 2
We define the merge rule.

Editing the merge rule gives the following:

We are joining by name, so we specify the ${name} token.

Then we specify that we are only interested in merging host, server or vm entityTypes.

As a recommended practice, we should always make the rule as strict as possible. So we should also specify which observer jobs the rule applies to.
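
To make the effect of the three settings (token, entity types, observer jobs) more concrete, here is a toy Python model of the matching logic. It is not ASM's implementation and it does not call any ASM API; it only groups in-scope resources that share the same token value. The resource dictionaries, the `observer` key, the unique IDs of the vCenter and Dynatrace nodes and the `cpu_count` value are made up for illustration.

```python
from collections import defaultdict


def merge_by_token(resources, token, entity_types, observers):
    """Group resources that share the same value of `token` (toy model only)."""
    groups = defaultdict(list)
    for res in resources:
        # A resource is in scope only if it passes all three filters of the rule.
        if (res["observer"] in observers
                and set(res["entityTypes"]) & set(entity_types)
                and token in res):
            groups[res[token]].append(res)

    composites = []
    for value, members in groups.items():
        if len(members) < 2:
            continue  # a single resource has nothing to merge with
        properties = {}
        for member in members:
            for key, val in member.items():
                if key not in ("uniqueId", "observer", "entityTypes"):
                    properties.setdefault(key, val)  # first value wins in this sketch
        composites.append({
            "mergeToken": value,
            "members": [m["uniqueId"] for m in members],
            "entityTypes": sorted({t for m in members for t in m["entityTypes"]}),
            "properties": properties,
        })
    return composites


# Hypothetical resources; IDs and cpu_count are invented for this example.
resources = [
    {"uniqueId": "vcenter-vm-50", "observer": "vcenter",
     "entityTypes": ["vm"], "name": "VM-50", "cpu_count": 2},
    {"uniqueId": "vjoiner-vm-50", "observer": "file",
     "entityTypes": ["vm"], "name": "VM-50"},
    {"uniqueId": "dynatrace-host-1", "observer": "dynatrace",
     "entityTypes": ["host"], "name": "server1.noi.ibm.com"},
    {"uniqueId": "vjoiner-server1.noi.ibm.com", "observer": "file",
     "entityTypes": ["host"], "name": "server1.noi.ibm.com"},
    {"uniqueId": "dynatrace-process-kafka", "observer": "dynatrace",
     "entityTypes": ["process"], "name": "Kafka"},
]

composites = merge_by_token(
    resources, "name", {"host", "server", "vm"}, {"vcenter", "dynatrace", "file"}
)
for composite in composites:
    print(composite["mergeToken"], composite["members"])
```

In this model the Kafka process is left alone because its entity type is outside the rule, which is the point of keeping the rule strict: the narrower the filters, the smaller the chance of merging nodes that merely happen to share a name.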

Step 3
We then need to re-run the observer jobs.
After that, we should be able to see the result:


As you can see, VM-50 is now connected to server1.noi.ibm.com.
If we right-click on VM-50 and select Resource detail, we get the following.


A few things to note:

  • Rather than Id, we now have CompositeId
  • The merge token is the property used to merge the Composite Resource.
  • The property of the previous resource is carried over; examples are the entityTypes and cpu_count property.
There we go; we have created a joined topology from two out-of-the-box observers.

Connecting the network
Once we know how to join discovered topologies, we can extend the idea further. For example, I have added a network topology: I ran the observer job for the network topology and modified our merge rule slightly to include the MAC address from vCenter.


After re-running the vCenter and the network observer, we get the following end-to-end topology across the application, infrastructure and network.



The topology lets the support engineer know how the application connects to the infrastructure, network interface, and load balancer before going to the Cloud, which the Imperva Firewall protects.

If you want to recreate the network component shown above in your lab, I have provided the file observer content at the end of this blog.

Once we connect the source of the alerts, we can see how an issue in the load balancer affects my Elasticsearch application. We can then also turn on the Cloud Native Event Analytics feature of Netcool Operations Insight to perform topology-based grouping.

Service Modeling
So far, we have been merging the results of different ASM observers to create an end-to-end view. When you talk to customer management, they will very often want to see the business view of the topology. The business view comes in different flavours depending on the customer's organisation: a manufacturer might want to see the product view of the topology, a service operator might wish to have a service view, and a distribution company might want a geographical perspective.

As a common theme, they want to build a logical hierarchy from the discovered topology. Usually, the hierarchy is available in their industry-specific application, which means an out-of-the-box ASM observer is most likely unavailable. Therefore, a common approach is to export the structure from their industry-specific application into an intermediary file, such as a CSV or a JSON file, and then use the File or REST Observer to bring the model into ASM.

In this case, let us go through another sample observer, the Azure observer. The Azure observer discovers the Azure infrastructure, and it includes:
-  Subscription
-  Resource Group
-  Virtual Machine (VM)
-  VM's disk
-  VM's OS
-  VM's network interface
-  VM's network interface Security Group
-  Network interface's IP address
-  Virtual Network
-  Virtual Subnet

The sample data for Azure produces the following topology:

The topology starts with an Azure subscription (Resource named GTS-IMI-Demo-EA-Resources-Dev/Ace). Under this subscription are many resource groups. The model has so many resource groups that it becomes hard to see even from a one-hop view. 

Looking at the resource detail, we can see that the Azure resource groups contain location information. We can use this location information to create a logical grouping view. So let us do that.

Let us create three top-level geographical groupings: AsiaPac, America, and Europe. Under these three geographical groups, we introduce the available Azure data centres, such as the eastus seen in the screenshot above. In this exercise, we are using location; however, the same method can create a hierarchy based on services or products. We need to pick a property, or a combination of properties, that can drive the grouping automatically.

We will generate the top-level groups using the file observer content shown at the end of this blog. This content is generated by extracting the locations from the Azure sample file.

And here is the result of one of the top-level groups.

Now we need to define the merge rule. First, we will merge the name property from our top-level geo with the location property of the Azure observer topology.

Here is part of the definition of the rule:

We must run the observer jobs for Azure and the Geo File Observer. To help view the topology, I defined a dynamic group using the top-level geographical groups as seeds. Here is the resource grouping summary.

The following shows the resulting topology for one of the top-level geographical regions.


The topology is now organised based on the geography that we defined. For example, the above screenshot depicts the European Azure data centres.

The example shows how to build the logical topology using the discovered resources. In a real-life scenario, the challenge is finding data sources and transforming them to provide the dynamic structure to our service model.

Summary
We have discussed two ASM modelling patterns that I came across quite often during ASM customer implementations: the end-to-end view and service modelling. Both models make use of ASM's merge functionality. Creating a topology is only the start of the AIOps journey. Once the topology model is created, we can use it to enable topology-based grouping and configure Cloud Native Event Analytics to calculate the Probable Cause scoring.


Additional sample files
Network sample file.
The following can be used to generate the network component.
V:{"_operation":"InsertReplace","uniqueId":"vjoiner-fc:50:56:89:F5:40","entityTypes":["networkaddress"],"_status":[],"mac_address":"fc:50:56:89:F5:40","matchToken":["fc:50:56:89:f5:40"],"name":"fc:50:56:89:F5:40","tags":["VCenter65","VM-50"]}
V:{"_operation":"InsertReplace","uniqueId":"10.10.10.10","entityTypes":["ipaddress"],"_status":[],"matchToken":["10.10.10.10"],"name":"10.10.10.10","tags":["IpAddress"]}
V:{"_operation":"InsertReplace","uniqueId":"LoadBalancer1","entityTypes":["loadbalancer"],"_status":[],"matchToken":["LoadBalancer1"],"name":"F5 LoadBalancer1","tags":["LoadBalancer"]}
V:{"_operation":"InsertReplace","uniqueId":"Imperva","entityTypes":["firewall"],"_status":[],"matchToken":["Imperva"],"name":"Imperva Firewall","tags":["Firewall"]}
V:{"_operation":"InsertReplace","uniqueId":"Cloud","entityTypes":["cloud"],"_status":[],"matchToken":["cloud"],"name":"Cloud","tags":["Cloud"]}
E:{"_fromUniqueId":"LoadBalancer1","_edgeType":"loadBalances","_toUniqueId":"10.10.10.10"}
E:{"_fromUniqueId":"Imperva","_edgeType":"routes","_toUniqueId":"LoadBalancer1"}
E:{"_fromUniqueId":"10.10.10.10","_edgeType":"bindsTo","_toUniqueId":"vjoiner-fc:50:56:89:F5:40"}
E:{"_fromUniqueId":"Imperva","_edgeType":"connectedTo","_toUniqueId":"Cloud"}
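
Before loading a hand-written file like the one above, it can help to check it for simple mistakes. The following Python sketch parses the V: and E: lines and reports invalid JSON and edge endpoints that are not defined in the same file. The function name is hypothetical, and the check assumes that every edge endpoint should be a vertex in the same file; if your edges deliberately point elsewhere, treat those messages as warnings only.

```python
import json


def check_file_observer_lines(lines):
    """Return a list of problems found in file observer V:/E: lines."""
    vertex_ids = set()
    edges = []      # (line number, endpoint id) pairs to verify at the end
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only; MAC addresses in the JSON contain more colons.
        prefix, _, payload = line.partition(":")
        if prefix not in ("V", "E"):
            problems.append(f"line {number}: expected a V: or E: prefix")
            continue
        try:
            record = json.loads(payload)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if prefix == "V":
            if "uniqueId" not in record:
                problems.append(f"line {number}: vertex has no uniqueId")
                continue
            vertex_ids.add(record["uniqueId"])
            # Edges can also be embedded in a vertex through _references.
            for ref in record.get("_references", []):
                edges.append((number, ref.get("_toUniqueId") or ref.get("_fromUniqueId")))
        else:
            edges.append((number, record.get("_fromUniqueId")))
            edges.append((number, record.get("_toUniqueId")))
    for number, endpoint in edges:
        if endpoint not in vertex_ids:
            problems.append(
                f"line {number}: edge endpoint {endpoint!r} is not defined in this file"
            )
    return problems


sample = [
    'V:{"uniqueId":"LoadBalancer1","entityTypes":["loadbalancer"],"name":"F5 LoadBalancer1"}',
    'V:{"uniqueId":"10.10.10.10","entityTypes":["ipaddress"],"name":"10.10.10.10"}',
    'E:{"_fromUniqueId":"LoadBalancer1","_edgeType":"loadBalances","_toUniqueId":"10.10.10.10"}',
]
print(check_file_observer_lines(sample))
```

To check a real file, pass the result of `open(path).read().splitlines()` to the function and review any messages it returns.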

Azure Geography Sample Files
V:{"uniqueId":"Geo-AsiaPac","entityTypes":["geocenter"],"name":"AsiaPac","tags":["geo"]}
V:{"uniqueId":"Geo-Europe","entityTypes":["geocenter"],"name":"Europe","tags":["geo"]}
V:{"uniqueId":"Geo-America","entityTypes":["geocenter"],"name":"America","tags":["geo"]}
V:{"uniqueId":"DataCenter-eastus","entityTypes":["datacenter"],"name":"eastus","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-eastus"}
V:{"uniqueId":"DataCenter-eastus2","entityTypes":["datacenter"],"name":"eastus2","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-eastus2"}
V:{"uniqueId":"DataCenter-westus","entityTypes":["datacenter"],"name":"westus","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-westus"}
V:{"uniqueId":"DataCenter-westus2","entityTypes":["datacenter"],"name":"westus2","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-westus2"}
V:{"uniqueId":"DataCenter-centralus","entityTypes":["datacenter"],"name":"centralus","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-centralus"}
V:{"uniqueId":"DataCenter-southcentralus","entityTypes":["datacenter"],"name":"southcentralus","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-America","_edgeType":"connectedTo","_toUniqueId":"DataCenter-southcentralus"}
V:{"uniqueId":"DataCenter-southeastasia","entityTypes":["datacenter"],"name":"southeastasia","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-AsiaPac","_edgeType":"connectedTo","_toUniqueId":"DataCenter-southeastasia"}
V:{"uniqueId":"DataCenter-centralindia","entityTypes":["datacenter"],"name":"centralindia","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-AsiaPac","_edgeType":"connectedTo","_toUniqueId":"DataCenter-centralindia"}
V:{"uniqueId":"DataCenter-southindia","entityTypes":["datacenter"],"name":"southindia","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-AsiaPac","_edgeType":"connectedTo","_toUniqueId":"DataCenter-southindia"}
V:{"uniqueId":"DataCenter-eastasia","entityTypes":["datacenter"],"name":"eastasia","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-AsiaPac","_edgeType":"connectedTo","_toUniqueId":"DataCenter-eastasia"}
V:{"uniqueId":"DataCenter-australiasoutheast","entityTypes":["datacenter"],"name":"australiasoutheast","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-AsiaPac","_edgeType":"connectedTo","_toUniqueId":"DataCenter-australiasoutheast"}
V:{"uniqueId":"DataCenter-ukwest","entityTypes":["datacenter"],"name":"ukwest","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-Europe","_edgeType":"connectedTo","_toUniqueId":"DataCenter-ukwest"}
V:{"uniqueId":"DataCenter-uksouth","entityTypes":["datacenter"],"name":"uksouth","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-Europe","_edgeType":"connectedTo","_toUniqueId":"DataCenter-uksouth"}
V:{"uniqueId":"DataCenter-westeurope","entityTypes":["datacenter"],"name":"westeurope","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-Europe","_edgeType":"connectedTo","_toUniqueId":"DataCenter-westeurope"}
V:{"uniqueId":"DataCenter-northeurope","entityTypes":["datacenter"],"name":"northeurope","tags":["datacenter"]}
E:{"_fromUniqueId":"Geo-Europe","_edgeType":"connectedTo","_toUniqueId":"DataCenter-northeurope"}
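
The geography file above follows a regular pattern, so it can be generated rather than typed. Here is a minimal Python sketch that builds the same kind of lines from a mapping of Azure location to top-level geography. The mapping itself is an assumption: I do not show how to extract the locations from the Azure sample file here, and the function names are made up for this example.

```python
import json


def to_line(prefix, record):
    """Serialise one record as a compact single-line file observer entry."""
    return prefix + ":" + json.dumps(record, separators=(",", ":"))


def geo_lines(location_to_geo):
    """Build geocenter vertices, datacenter vertices and connectedTo edges."""
    lines = []
    # One top-level vertex per distinct geography, in first-seen order.
    for geo in dict.fromkeys(location_to_geo.values()):
        lines.append(to_line("V", {
            "uniqueId": "Geo-" + geo,
            "entityTypes": ["geocenter"],
            "name": geo,
            "tags": ["geo"],
        }))
    # One data centre vertex per location, linked to its geography.
    for location, geo in location_to_geo.items():
        lines.append(to_line("V", {
            "uniqueId": "DataCenter-" + location,
            "entityTypes": ["datacenter"],
            "name": location,
            "tags": ["datacenter"],
        }))
        lines.append(to_line("E", {
            "_fromUniqueId": "Geo-" + geo,
            "_edgeType": "connectedTo",
            "_toUniqueId": "DataCenter-" + location,
        }))
    return lines


if __name__ == "__main__":
    # Hypothetical subset of the locations used in this blog.
    for line in geo_lines({"eastus": "America", "ukwest": "Europe"}):
        print(line)
```

The same approach works for a service or product hierarchy: replace the location-to-geography mapping with whatever property drives your grouping.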


Footnote:
This blog will be presented at this event

Comments

Wed September 28, 2022 01:17 PM

Very interesting!
G.