AIOps from the source: Topology Manager's Resource Groups and Templates Demystified

By Matthew Duggan posted Tue November 19, 2024 05:26 PM

Introduction

Managing networks, applications, and infrastructure effectively is more than just a technical challenge - it’s essential for business continuity. One of the most important strategies used by operations teams is organising topology, alerts, metrics and logs into logical groups that provide contextual advantages. These groups might pertain to physical resources like servers and their connected network switches, more abstract resources such as Kubernetes deployments and their pods, geographical locations or simply data that shares common characteristics such as tags - whatever matters to the business.

In this article, we explore IBM Cloud Pak for AIOps' Topology Manager resource groups and templates and how to use them.

Why Resource Groups Matter

Resource grouping is important for several reasons. First, it helps create a clear, structured and more navigable model of the managed environment that aligns with its technical characteristics and the organisation's business goals and structure. Second, resources and resource groups provide the foundations on which better incident management, event correlation, maintenance planning and impact analysis are all built. Resource groups also lend themselves well to use-cases outside of incident and event management because they exist whether or not there are alerts about problems with the managed environment. Given the right resource groups, operations teams can very quickly determine that an alert relates to infrastructure in a specific service, who owns the service, and how the problem should be prioritised.

The following diagram depicts "progressive value discovery" where, typically, users start with alert and event management, expand with logs, metrics or topology, and then look to build on that with resource grouping and service models - each capability builds on the previous one to deliver more value.

Simplifying Troubleshooting and Response

In the face of an incident, resource groups provide operations teams with the ability to contextually frame management data. For example, rather than addressing each alert and resource in isolation, resource groups give operators a broader view of related dependencies and services, allowing them to understand the possible ripple effects of an incident. Furthermore, resources can be members of multiple resource groups, allowing the operations team to examine resources, and therefore correlated alerts, from multiple perspectives to help understand impact, priority, location, owner and a myriad of other concepts.

The following diagrams hint at the value of resource groups. The first gives the user rapid visibility of problems with services through a simple experience that hides complexity. The second shows how the user can easily understand which resource groups a server is a member of, along with a summary of their state, quickly giving the user the context needed to make more informed decisions.

The Importance of Regular Updates

To keep these advantages, it’s crucial to regularly review and update resource groups. IT environments are constantly evolving, with new deployments, scaling, decommissioning, or shifts in organizational structure. As resources change, outdated group definitions can lead to ineffective monitoring, incorrect alerting or even worse, critical resources being left unmonitored and a loss of trust in the management tool.

Frequent updates to group membership and presence ensure that the system reflects the current landscape, which is key for efficient operations. This process should be as dynamic as the infrastructure itself, incorporating automated discovery tools or regular audits to maintain accuracy.

What can I do with resource groups in AIOps and Topology Manager?

AIOps makes extensive use of Topology Manager's grouping capabilities and here are 20 cool things you can do with them:-

You can group resources in ways to fit your business needs, across data-sources and management domains, and automatically adapt to changes in your environment.
You can search for and filter resource groups and save the filters for recollection later.
You can assemble resource groups into more complex applications and services. You can also favourite these and view them on the login page.
You can easily understand which resource groups a given resource is a member of.
You can favourite groups for an at-a-glance view of their health.
You can view the member resources of a resource group as a topology or table view.
You can view the properties of a group and easily see if it's supporting more complex applications and services.
You can tag groups to provide additional context and search and filtering capabilities.
You can get a URL to a resource group for easy sharing, e.g. in a Slack channel or ticket.
You can assign business criticality to groups to help differentiate them and to help prioritise working on incidents related to them - think Pets vs. Cattle.
You can easily understand how the resources within a group have changed over time by using the timeline and delta capabilities.
You can easily see how a resource group's resources have behaved over time regarding their state and relationship and property changes by using the topology activity panel.
You can use resource groups as the basis of AIOps incidents, where related event, metric, topology and runbook data is collated in terms of a resource group and alerts are prioritised in terms of the topology within the resource group.
You can view resource group and business criticality data in correlated alerts, giving further context to the alert manager.
You can view resource group context in alerts and incidents to help reveal business and service context and improve the decision making process.
You can manually position the resources in a specific type of resource group to meet user visual preferences and have it remember their positions.
You can create an exact template and group directly from a standard topology view.
You can easily query for group data from Topology Manager's APIs.
You can do some really powerful data processing using rules for Token and Tag Templates, check out the documentation at https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=rules-reference.
You can assign icons to groups, including your own icons :)

O.K., so what is a resource in Topology Manager?

In Cloud Pak for AIOps Topology Manager, a resource is 'anything' under management. This typically includes servers, network ports, Kubernetes pods, disks and physical components such as cards and power supplies - whatever makes sense to an organisation's operations team to manage. It's these resources that we want to group together in ways that are meaningful to the operations team.

Topology Manager uses a graph and relational database in parallel and resources are graph vertices (nodes) and 'inventory' records at the same time. Resource vertices are related to other resource vertices via relationships (edges) that represent a known relationship. It's these resources and relationships that Topology Manager's time-series graph is composed of.

Both resources and edges have properties that help provide context to the user and help with data processing, visualisation and filtering. Check out the documentation for more information at https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=apis-topology-api-reference

Let's say you were building a topology model of the classic sci-fi film "The Matrix" and you want to model that Neo knows Morpheus and Trinity, that they're the crew of the Nebuchadnezzar and that Morpheus battles Agent Smith and has state. Each character and concept is a vertex in Topology Manager and the following diagram depicts the key concepts used - the main takeaways being:-

Resources have properties, some of which, such as 'uniqueId', 'name' and 'tags', are reserved. Otherwise, Topology Manager is very flexible regarding what it can model - resources can have array/list and map/dictionary properties without a problem, just bear in mind how usable that data is in the UIs and via APIs.
Other than reserved properties, resources don't need to conform to the same model, e.g. it's fine for a resource instance of the same type to have an age property but not another.
Edges have an 'edgeLabel' (family) value from a fixed set, such as 'association' and 'dataFlow'. The 'edgeType' property represents the type of relationship within its family, e.g. 'knows' is an association relationship. Edges can also have properties but not as flexibly as resources - stick to standard scalar-like key/value pairs.
Resources and Edges have timestamps for historical tracking purposes (not shown in the diagram, but they're reserved properties). By default, deleted resources have a 30 day TTL applied to them; after that, plus Cassandra's gc_grace_seconds of 10 days, they'll be erased on the next Cassandra compaction run.
Resources can optionally have a state model attached to them. This state model is automatically driven by the topology integrations and/or by unambiguously correlating events to resources.
The data exposed by the APIs can transparently expose records to the user that are formed from multiple graph elements.
Resources can be arbitrarily related to each other but there are prescriptive models for devices, resource groups and application/service models. See the examples at this link:- https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=reference-topology-examples.
Resources can be members of 0..n resource groups.
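To make the points above a little more concrete, here is a minimal Python sketch of the Matrix example as plain in-memory data. It is not Topology Manager's storage format or API: the Resource and Edge classes, the 'crewOf' and 'battles' edge types (and their 'association' labels) and the age value are assumptions made purely for illustration. Only 'uniqueId', 'name', 'tags', 'edgeLabel', 'edgeType' and the 'knows' association come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    unique_id: str                                   # reserved 'uniqueId'
    name: str                                        # reserved 'name'
    tags: list = field(default_factory=list)         # reserved 'tags'
    properties: dict = field(default_factory=dict)   # free-form, per resource

@dataclass(frozen=True)
class Edge:
    from_id: str
    to_id: str
    edge_label: str   # the family, from a fixed set
    edge_type: str    # the relationship type within that family

resources = {r.unique_id: r for r in [
    # Only Neo has an 'age' property; resources need not share a model.
    Resource("neo", "Neo", properties={"age": 37}),  # illustrative value
    Resource("morpheus", "Morpheus"),
    Resource("trinity", "Trinity"),
    Resource("nebuchadnezzar", "Nebuchadnezzar"),
    Resource("agentSmith", "Agent Smith"),
]}

edges = [
    Edge("neo", "morpheus", "association", "knows"),
    Edge("neo", "trinity", "association", "knows"),
    # Hypothetical edge types below, assumed to be associations.
    Edge("neo", "nebuchadnezzar", "association", "crewOf"),
    Edge("morpheus", "nebuchadnezzar", "association", "crewOf"),
    Edge("trinity", "nebuchadnezzar", "association", "crewOf"),
    Edge("morpheus", "agentSmith", "association", "battles"),
]

def related(unique_id):
    """Return the ids of resources one edge away, ignoring edge direction."""
    found = set()
    for edge in edges:
        if edge.from_id == unique_id:
            found.add(edge.to_id)
        elif edge.to_id == unique_id:
            found.add(edge.from_id)
    return found

print(sorted(related("morpheus")))
```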

Where do resources come from and how are they managed?

Resources in Topology Manager come from Observer integration types, commonly used ones being ITNM, File, ServiceNow, Dynatrace and VMware vCenter - the full list is here:- https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=integrations-observer-jobs. Each Observer integration type can provide a number of Observer job types that serve as the unit of work for getting data from a given source. Some Observers, such as Kubernetes, can also automatically provide resource groups.

Observer jobs either 'load' data from the source or 'listen' to data from the source. The difference is that load jobs can be considered as a batch process that can be scheduled to run frequently and/or run on an ad hoc basis, whereas listen jobs are intended to consume a stream of data from the source to reflect changes as rapidly as possible. Load jobs typically get a snapshot of data from the source with little or no lifecycle data about the resources and relationships being retrieved. Think of it in terms of running an SQL SELECT query against a database schema. Listen jobs, on the other hand, typically see CRUD lifecycle data revealed from the source.

The following diagram summarises the data flow. Note that Topology Manager consumes data 'on-the-fly' and as rapidly as possible regardless of whether listen or load jobs are used, i.e. load jobs are not transactional and fresh data is available as soon as possible.

Observer jobs provide data to the Topology Service in terms of a 'provider', which serves as a namespace for the data. Everything provided by job ABC in the diagram is, in this case, provided in terms of the provider ABC, which is typically the case - an observer job instance is typically 1:1 with its provider and that is the basis of Topology Manager's lifecycle, processing and metadata management for incoming data.

Where do providers come from? Providers are typically built by the observer when creating the job but they can often be manually specified, with caution. You might be asking why providers are decoupled from observer jobs - can't the data just be provided in terms of the job ABC? The reason they're decoupled is that some more complex integrations require two different types of job to act on each other's data, in which case a shared provider is used.

It is possible for load and listen jobs to coexist and to be used interchangeably. This is necessary because listening to a source typically does not reveal data to 'bootstrap' the topology, i.e. if you start listening to a stream of resource updates now, you don't know what else is in the source and so you'd have partial topology and likely updates to records you've never seen. Integrations such as the NFVD Observer can switch between load and listen jobs for the same provider so the load job bootstraps or mops-up gaps in listening and the listener drives frequent updates into the graph.

You might now be thinking "how does Topology Manager manage lifecycle for load jobs if CRUD data is not revealed by the source?". It automatically keeps metadata about what it sees for each job instance run and that is used to help decide whether or not a resource should be (soft) deleted. For example, if job ABC sees resources [alpha, beta, charlie] at time 1 and only [alpha, charlie] at time 2, then beta will automatically get deleted.
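The idea behind that example can be sketched as a simple set difference between two runs of the same job. This is only an illustration of the principle, assuming each run yields the full set of uniqueIds it saw; the function name is hypothetical and Topology Manager's real bookkeeping is more involved.

```python
def soft_deleted(previous_run, current_run):
    """Resources seen by the previous run of a load job but not by the current one."""
    return set(previous_run) - set(current_run)

# Job ABC at time 1 and time 2, as in the example above.
time_1 = {"alpha", "beta", "charlie"}
time_2 = {"alpha", "charlie"}

print(soft_deleted(time_1, time_2))
```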

It's important to note that data from a given job's provider is only considered to be unique in terms of that provider. That is, if you provide a resource with a uniqueId value of serverA in jobs ABC and XYZ as depicted in the diagram and with no common mergeToken values present, the user will see two serverA resources in the UIs. If there's truly only one serverA and it's seen from different perspectives, such as from VMware vCenter and a custom file, then you can merge them together using merge rules into a 'composite' - see the documentation at this link:- https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=topologies-managing. When thinking about the groups you want to manage, consider whether you have resources from different sources that need to be merged.
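As a rough mental model, you can think of a record's identity as the pair (provider, uniqueId), with records folded into one composite only when they share a merge token. The Python sketch below illustrates that idea under those assumptions; the record layout, the 'mergeTokens' key as used here and the token value are hypothetical, and real merge rules are configured in Topology Manager rather than coded like this.

```python
def composites(records):
    """Fold records that share at least one merge token into a single composite."""
    groups = []  # each entry: {"tokens": set, "members": set of (provider, uniqueId)}
    for record in records:
        tokens = set(record["mergeTokens"])
        members = {(record["provider"], record["uniqueId"])}
        remaining = []
        for group in groups:
            if tokens & group["tokens"]:
                # Shares a token with this record, so absorb the existing composite.
                tokens |= group["tokens"]
                members |= group["members"]
            else:
                remaining.append(group)
        remaining.append({"tokens": tokens, "members": members})
        groups = remaining
    return [group["members"] for group in groups]

# No common merge tokens: the user would see two serverA resources.
unmerged = composites([
    {"provider": "ABC", "uniqueId": "serverA", "mergeTokens": []},
    {"provider": "XYZ", "uniqueId": "serverA", "mergeTokens": []},
])

# A shared (hypothetical) token: the two records form one composite.
merged = composites([
    {"provider": "ABC", "uniqueId": "serverA", "mergeTokens": ["serverA.example.com"]},
    {"provider": "XYZ", "uniqueId": "serverA", "mergeTokens": ["serverA.example.com"]},
])

print(len(unmerged), len(merged))
```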

How are resource groups created and managed by Topology Manager?

Topology Manager provides a number of ways of creating and managing resource groups to fit with business requirements of the organisation. At the most fundamental level, a resource group is a set of resources that should be grouped together according to some criteria to meet a specific business goal. For example, one might want to create a group for each physical server and the VMs running on them, a building and the devices in it, or group resources known to support every Kubernetes deployment.

Group size is constrained (see 'How many groups can I have and how big can a group be?' below) to maintain a good user experience and for performance reasons, especially given that a topology with 2,500 resources may have many more relationships.

The challenge is how to create and maintain the resource groups given the diverse grouping criteria we see and that different techniques need to be used depending on the scenario and available data. Grouping of resources appears to be a simple problem to solve but there are some significant challenges in keeping resource groups up-to-date given:-

The topology graph may be very large and comprised of composite resources formed of data from many sources.
The topology graph may be a 'moving target' subject to rapid and unpredictable queries and updates from many Observers, event-to-topology correlation and UI and API users in parallel.
The resource grouping criteria may be complex and need computationally expensive queries and updates to maintain resource group accuracy.
In response to any update of the topology graph, there may need to be many resource groups created and/or updated and/or deleted.

Topology Manager provides the concept of Templates as the means of creating and managing resource groups for the majority of use-cases via intuitive administration UIs. For advanced use-cases, it is possible to define templates via APIs and to define and provide resource groups via File Observer files but they're out-of-scope of this article.

Topology Manager provides the following template types and we'll investigate each in-turn: Dynamic, Exact, Tag and Token.

How many groups can I have and how big can a group be?

We currently support 30,000 groups of any origin. Consideration should also be given to the number of resources within each group: the Topology Manager UI will not display groups with more than 2,500 member resources, and large groups are slower for downstream systems such as probable cause to process and can significantly increase load on Topology Manager.
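If you keep an inventory of your planned groups and their sizes, a quick sanity check against these numbers is easy to script. The sketch below is a hypothetical helper, not a Topology Manager tool: it takes a mapping of group name to member count and flags anything over the limits quoted above, plus a softer warning based on the 'three or four hundred resources' rule of thumb mentioned later in this article.

```python
MAX_GROUPS = 30_000               # supported number of groups
MAX_DISPLAYABLE_MEMBERS = 2_500   # larger groups are not displayed in the UI
RULE_OF_THUMB_MEMBERS = 400       # 'three or four hundred' guideline

def review_groups(group_sizes):
    """Return a list of warnings for a {group name: member count} mapping."""
    warnings = []
    if len(group_sizes) > MAX_GROUPS:
        warnings.append(f"{len(group_sizes)} groups exceeds the supported {MAX_GROUPS}")
    for name, size in sorted(group_sizes.items()):
        if size > MAX_DISPLAYABLE_MEMBERS:
            warnings.append(f"{name}: {size} members, will not be displayed in the UI")
        elif size > RULE_OF_THUMB_MEMBERS:
            warnings.append(f"{name}: {size} members, larger than the rule of thumb")
    return warnings

for warning in review_groups({"core-network": 3000, "site-a": 500, "h1": 4}):
    print(warning)
```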

What about event correlation?

Resource Groups in Topology Manager summarise the status severity of their resources, any incidents related to the resource group and any assigned business criticality as depicted by the following diagram and screenshot. This model is intended to give users rapid visibility of the overall status of a resource group, whether it has an associated incident and whether it should be prioritised in terms of its business criticality. User defined business criticality terminology and associated weights help serve as a differentiator among resource groups and resources in the event they have the same criticality and are intended to help prioritise independently of state.
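One way to picture this is a roll-up where a group takes the worst severity among its members, with the business criticality weight used as a tie-breaker between groups. The sketch below is purely illustrative: the severity ordering, the 'worst severity wins' rule and the weights are assumptions rather than a description of Topology Manager's actual calculation.

```python
# Assumed ordering, from least to most severe.
SEVERITY_RANK = {"clear": 0, "indeterminate": 1, "information": 2,
                 "warning": 3, "minor": 4, "major": 5, "critical": 6}

def summarise(member_severities):
    """Roll member severities up to the worst one; an empty group is 'clear'."""
    return max(member_severities, key=SEVERITY_RANK.__getitem__, default="clear")

def prioritise(groups):
    """Order groups worst severity first, using the criticality weight to break ties."""
    return sorted(
        groups,
        key=lambda g: (SEVERITY_RANK[summarise(g["severities"])], g["weight"]),
        reverse=True,
    )

groups = [
    {"name": "payments", "severities": ["major", "clear"], "weight": 5},   # hypothetical
    {"name": "test-lab", "severities": ["major"], "weight": 1},            # hypothetical
    {"name": "intranet", "severities": ["clear", "clear"], "weight": 3},   # hypothetical
]

for group in prioritise(groups):
    print(group["name"], summarise(group["severities"]), group["weight"])
```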

Which Topology Manager Template type should I use?

The following diagram suggests a workflow to adopt when deploying Topology Manager that accommodates both simple and more complex use-cases from initial use-case gathering through to a finished deployment, part of which is determining which grouping strategy to use.

As you'll see below, each Template type has its pros and cons and use-case applicability. The following table will also help you decide on which Template type to use.

TOP TIP: best-practice is to make use of a development environment to test your group creation before using them in production. Start with your smallest and most tightly scoped grouping requirements and iterate from there.

TOP TIP: best-practice is to try and aim for not too many small groups and not too many large groups - aim for a middle-ground re: the number of groups and their size. If in doubt, a rule of thumb is that resource groups would ideally not be larger than three or four hundred resources.

TOP TIP: resource groups can overlap if their members are in multiple groups and you can use this to your advantage. For example, if modelling a network and you want to correlate problems from the edge of the network into the core via aggregation devices, use two or more overlapping groups.

Why can't I just use resource tags?

You can but groups and tags deliver very different value. Think of tags as a means of adding extra visual context to resources and groups and providing good search and filtering controls. Resource groups deliver far more value than tags and can be thought of as a major extension of them.

Our Sample Topology

The discussion that follows shall use the following topology as an example to demonstrate the various grouping mechanisms.

Each resource has the following characteristics:-

This topology was loaded from a File Observer file which contained the following entries.

V:{"_operation":"InsertReplace", "uniqueId":"h1", "entityTypes":["hexagon"], "name":"h1", "tags":["alpha","bravo"], "myProperty":"xray", "_status":[{"state":"open","severity":"critical","status":"shapeStatus","description":"Hexagon critical"}], "_references":[{"_edgeType":"connectedTo","_toUniqueId":"c1"}]}
V:{"_operation":"InsertReplace", "uniqueId":"h2", "entityTypes":["hexagon"], "name":"h2", "tags":["alpha","bravo"], "myProperty":"xray", "_references":[{"_edgeType":"connectedTo","_toUniqueId":"c2"}]}
V:{"_operation":"InsertReplace", "uniqueId":"c1", "entityTypes":["circle"], "name":"c1", "tags":["bravo","charlie"], "myProperty":["xray","zulu"], "_status":[{"state":"open","severity":"major","status":"shapeStatus","description":"Circle major"}], "_references":[]}
V:{"_operation":"InsertReplace", "uniqueId":"c2", "entityTypes":["circle"], "name":"c2", "tags":["bravo","charlie"], "myProperty":"zulu", "_references":[]}
V:{"_operation":"InsertReplace", "uniqueId":"s1", "entityTypes":["square"], "name":"s1", "tags":["charlie","delta"], "myProperty":"tango", "_references":[{"_edgeType":"runsOn","_toUniqueId":"c1"}]}
V:{"_operation":"InsertReplace", "uniqueId":"s2", "entityTypes":["square"], "name":"s2", "tags":["charlie","delta"], "myProperty":"tango", "_references":[{"_edgeType":"runsOn","_toUniqueId":"c2"}]}
V:{"_operation":"InsertReplace", "uniqueId":"d1", "entityTypes":["diamond"], "name":"d1", "tags":["delta","echo"], "myProperty":["tango","zulu"], "_references":[{"_edgeType":"uses","_toUniqueId":"c1"},{"_edgeType":"uses","_toUniqueId":"c2"}]}
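If you'd like to inspect these entries outside of Topology Manager, the short Python sketch below parses the lines shown above and prints each resource's characteristics and relationships. It only handles what appears in this sample (lines starting with 'V:' followed by JSON, with '_references' and '_status' keys) and is not a complete parser for the File Observer format.

```python
import json

FILE_OBSERVER_LINES = r'''
V:{"_operation":"InsertReplace", "uniqueId":"h1", "entityTypes":["hexagon"], "name":"h1", "tags":["alpha","bravo"], "myProperty":"xray", "_status":[{"state":"open","severity":"critical","status":"shapeStatus","description":"Hexagon critical"}], "_references":[{"_edgeType":"connectedTo","_toUniqueId":"c1"}]}
V:{"_operation":"InsertReplace", "uniqueId":"h2", "entityTypes":["hexagon"], "name":"h2", "tags":["alpha","bravo"], "myProperty":"xray", "_references":[{"_edgeType":"connectedTo","_toUniqueId":"c2"}]}
V:{"_operation":"InsertReplace", "uniqueId":"c1", "entityTypes":["circle"], "name":"c1", "tags":["bravo","charlie"], "myProperty":["xray","zulu"], "_status":[{"state":"open","severity":"major","status":"shapeStatus","description":"Circle major"}], "_references":[]}
V:{"_operation":"InsertReplace", "uniqueId":"c2", "entityTypes":["circle"], "name":"c2", "tags":["bravo","charlie"], "myProperty":"zulu", "_references":[]}
V:{"_operation":"InsertReplace", "uniqueId":"s1", "entityTypes":["square"], "name":"s1", "tags":["charlie","delta"], "myProperty":"tango", "_references":[{"_edgeType":"runsOn","_toUniqueId":"c1"}]}
V:{"_operation":"InsertReplace", "uniqueId":"s2", "entityTypes":["square"], "name":"s2", "tags":["charlie","delta"], "myProperty":"tango", "_references":[{"_edgeType":"runsOn","_toUniqueId":"c2"}]}
V:{"_operation":"InsertReplace", "uniqueId":"d1", "entityTypes":["diamond"], "name":"d1", "tags":["delta","echo"], "myProperty":["tango","zulu"], "_references":[{"_edgeType":"uses","_toUniqueId":"c1"},{"_edgeType":"uses","_toUniqueId":"c2"}]}
'''.strip().splitlines()

def parse(lines):
    """Return ({uniqueId: record}, [(from, edgeType, to), ...]) for 'V:' lines."""
    vertices, edges = {}, []
    for line in lines:
        if not line.startswith("V:"):
            continue  # anything else is outside the scope of this sketch
        record = json.loads(line[2:])
        vertices[record["uniqueId"]] = record
        for ref in record.get("_references", []):
            edges.append((record["uniqueId"], ref["_edgeType"], ref["_toUniqueId"]))
    return vertices, edges

vertices, edges = parse(FILE_OBSERVER_LINES)

for unique_id, record in vertices.items():
    severity = ",".join(s["severity"] for s in record.get("_status", [])) or "-"
    print(unique_id, record["entityTypes"][0], record["tags"], record["myProperty"], severity)

for from_id, edge_type, to_id in edges:
    print(f"{from_id} -{edge_type}-> {to_id}")
```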

Common Template Characteristics

Each Template type has a common set of properties:-

Template Name - the name of the template being defined.
Description - an optional description of the template being defined.
Resource group type - the type of resource group being created, such as compute, network or a custom type.
Correlation Switch - whether the resulting resource group(s) should be used in alert and incident management.
Icon - the icon to assign to the resulting resource group(s).
Tag - the tag(s) to apply to the resulting resource group(s). These tags are then visible in the UI and available for search and filtering. It is best practice to tag groups with something independent of the resources that'll be members of resource groups.

What can I edit after a template has been created?

Once a template has been saved, Topology Manager immediately starts the processing necessary to meet the goals of the template for all but token templates that rely on Observers to provide the data used to group by. Each template allows a set of characteristics to be edited as follows.

TOP TIP: Significant changes to a template can result in large amounts of processing, so consider what the groups from a template need to mean to the business as part of any edits you make.

Dynamic Template

Metadata: description, generated group type, name prefix and/or suffix, correlate true/false, icon and tags.

Grouping Criteria: seed resource filter tags and per-hop A & Z-resource filter tags. Use the ‘Undo’ button to backtrack and remove steps taken when building the sample topology. Then, use the navigation tools to select alternative next hops to update the sample topology.

Exact Template

Metadata: description, generated group type, name prefix and/or suffix, correlate true/false, maintain manually specified node positions, icon and tags.

Tag Template

Metadata: description, generated group type, name, correlate true/false, icon and tags.

Grouping Criteria: add/remove to the set of tags that resources must have to be in the group.

Token Template

Metadata: description, generated group type, correlate true/false, icon and tags.

Grouping Criteria: add/edit/delete the rules used to drive property values that in-turn generate groups from the template.

How can I remove a resource from a group?

The answer to this question depends on how the group is built and there's currently no way of explicitly excluding a resource from a group from the UI. Also, does removing a specific node change the meaning of the group to the business? It may be better to create a new template taking into account the different criteria.

For tag and token templates, the best way is to ensure the resource in question no longer has the tag or token value that made it a member of the group. A less appealing alternative is to change the criteria of the group.

For exact and dynamic templates, this is a trickier problem to solve because of their concept of a complete group, i.e. the criteria needs to be fully satisfied and there's currently no means of post-processing to remove a specific node. At the moment, one could look at using tag filters and if necessary tagging a resource so that it does not appear in the group in question.

Dynamic Templates

Dynamic templates provide a high degree of automation through a WYSIWYG-based approach to managing potentially large numbers of resource groups. Dynamic Templates are useful when you want to find resource groups based on repeating patterns in the topology, such as grouping by each Kubernetes deployment and its related pods and worker nodes. The set of resource groups available and/or resource group membership is updated in response to changes in the managed environment and, as a result, the topology from which the resource groups are derived.

The business logic for resource grouping is expressed as an example topology and graph traversal that Topology Manager uses to find resource groups and their resource membership. This example graph traversal starts with a 'seed' resource followed by a sequence of traversal steps to walk through the graph. The template processor finds all resources that match the type and optionally the OR'd tags of the example seed resource and iterates through them. For each seed resource found, the graph traversal is run and if it can be fully satisfied, the set of resources found, including the seed resource, is added to a resource group named based on the seed resource. The resource group type and other characteristics are derived from the Dynamic Template's common characteristics.

TOP TIP: Resource groups from Dynamic templates include resources that are related to each other. If your business needs don't care about relationships, Token or Tag Templates are worth considering.

TOP TIP: Seed resource selection is important for two reasons. First, the number of resource groups generated is based on the seed resources. Second, Dynamic Template resource groups are named after their seed resources, which is why we expect to see two groups in the example that follows - h1 and h2.

TOP TIP: Choose seed resource characteristics with the lowest cardinality whilst meeting the business goals as this will generate fewer groups and so will be easier and more performant to manage and update, provide a better user experience, and put less pressure on Topology Manager and downstream systems. Avoid grouping by resource types that typically have very high cardinalities such as network interfaces and containers unless they can be more tightly scoped through tag and traversal steps. Instead, prefer resources with relatively low cardinalities such as devices or Kubernetes deployments and include related higher cardinality resources in their resource groups.

Given the sample topology, assume that the business requirement is:- "create a resource group for each hexagon that consists of any related circles, anything that runs on those circles and any related diamonds.". We want the groups to be as shown by the following diagram, noting that diamond d1 will be a member of two groups. Note that resource state is not taken into account for grouping purposes.
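Before building the template in the UI, it can help to see the logic spelled out. The Python sketch below is a simplified simulation of a seed-plus-traversal template over the sample topology; it is not Topology Manager's implementation. It assumes that a group is 'complete' when every traversal step finds at least one resource, and that edges are walked regardless of direction. The step names 'neighbour_type' and 'edge_type' are made up for this sketch.

```python
TYPES = {"h1": "hexagon", "h2": "hexagon", "c1": "circle", "c2": "circle",
         "s1": "square", "s2": "square", "d1": "diamond"}
EDGES = [("h1", "connectedTo", "c1"), ("h2", "connectedTo", "c2"),
         ("s1", "runsOn", "c1"), ("s2", "runsOn", "c2"),
         ("d1", "uses", "c1"), ("d1", "uses", "c2")]

def adjacent(node):
    """Yield (edge_type, other_node) pairs, ignoring edge direction."""
    for a, edge_type, b in EDGES:
        if a == node:
            yield edge_type, b
        elif b == node:
            yield edge_type, a

def build_groups(seed_type, steps):
    """steps: list of (from_type, kind, value); kind is 'neighbour_type' or 'edge_type'."""
    groups = {}
    for seed, resource_type in TYPES.items():
        if resource_type != seed_type:
            continue
        members, complete = {seed}, True
        for from_type, kind, value in steps:
            found = set()
            for node in members:
                if TYPES[node] != from_type:
                    continue
                for edge_type, other in adjacent(node):
                    if kind == "neighbour_type" and TYPES[other] == value:
                        found.add(other)
                    elif kind == "edge_type" and edge_type == value:
                        found.add(other)
            if not found:          # step could not be satisfied: no group for this seed
                complete = False
                break
            members |= found
        if complete:
            groups[seed] = members  # the group is named after its seed
    return groups

groups = build_groups("hexagon", [
    ("hexagon", "neighbour_type", "circle"),   # any related circles
    ("circle", "edge_type", "runsOn"),         # anything that runs on those circles
    ("circle", "neighbour_type", "diamond"),   # any related diamonds
])

for name, members in groups.items():
    print(name, sorted(members))
```

Under these assumptions the sketch yields two groups, h1 and h2, with diamond d1 a member of both.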

Let's walk through how to create the Dynamic Template in Topology Manager.

1. First, create a Dynamic Template and complete the common template characteristics. In this case, we're going to create 'infrastructure' resource groups with the infrastructure icon, enable correlation for them and tag any resource groups created with 'dynamic shape groups'.

2. Search for a seed resource that matches the business criteria and that you know is related to the other resources of interest. In this case, we're choosing hexagon h1 as the seed resource because it is a suitable entry point into the topology and it's a suitable resource to traverse from.

Resource groups created by this template will be named based on the seed resources found for complete groups that satisfy the traversal criteria. In this case, we will end up with two resource groups - h1 and h2. You can optionally suffix or prefix the name of the resource groups generated using the 'Resource group naming pattern' capability.

TOP TIP: You can right-click on a resource in the standard topology viewer and access the template builder UI in-context of that.

3. Once you've selected the seed resource, you can see an indication of how many seed resources (and potentially the number of resource groups) Topology Manager has found. Seed scope control lets you fine-tune the seed resources that the template will use to iterate over - try specifying tags and watch the number of seeds found change. A good use of this is to create a template that, for example, focuses on specific Kubernetes services by using label tags. Note that multiple tags can be specified but that they are OR'd, e.g. you could say 'seed the template using hexagons tagged with alpha or bravo'.
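The effect of OR'd seed tags can be sketched as a simple filter: a resource qualifies as a seed if it has the right type and, when tags are given, at least one of them. In the Python below, h1 and h2 come from the sample topology, while h3 and its 'foxtrot' tag are hypothetical additions so that the filter has something to exclude; the function name is also made up for illustration.

```python
RESOURCES = {
    "h1": {"type": "hexagon", "tags": {"alpha", "bravo"}},
    "h2": {"type": "hexagon", "tags": {"alpha", "bravo"}},
    "c1": {"type": "circle", "tags": {"bravo", "charlie"}},
    "h3": {"type": "hexagon", "tags": {"foxtrot"}},  # hypothetical, not in the sample
}

def find_seeds(seed_type, any_of_tags=()):
    """Resources of seed_type that carry at least one of the given tags (if any)."""
    wanted = set(any_of_tags)
    return sorted(
        unique_id
        for unique_id, resource in RESOURCES.items()
        if resource["type"] == seed_type and (not wanted or wanted & resource["tags"])
    )

print(find_seeds("hexagon"))                      # no tag scoping
print(find_seeds("hexagon", ["alpha", "bravo"]))  # hexagons tagged alpha OR bravo
```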

4. You now have to provide an example traversal by using the navigation capabilities of the right-hand screen. The simplest of these is the hop selection tool, but exercise caution as some topologies can become very large within a small number of hops from a seed resource.

In our case, we'll use a more targeted approach to creating the traversal and this should be considered best-practice. Use the right-click navigation tools from resources to walk through the graph one step at a time, the options are:-

Get neighbors walks through the graph focussing on the type of adjacent resources, regardless of how they're related to the selected resource.
Follow relationship walks through the graph focussing on the relationship type to/from the selected resource and regardless of the type of adjacent resources.

TOP TIP: Dynamic and Exact templates do not currently differentiate between from/to or in/out edges when traversing the topology.

TOP TIP: The right-click navigation tools give a preview of what's related to the selected resource but not yet displayed.
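The difference between the two navigation options, and the preview counts shown in the right-click menu, can be illustrated with a few lines of Python over the sample topology. The function names are invented for this sketch and edges are walked regardless of direction, in line with the tip above.

```python
from collections import Counter

TYPES = {"h1": "hexagon", "h2": "hexagon", "c1": "circle", "c2": "circle",
         "s1": "square", "s2": "square", "d1": "diamond"}
EDGES = [("h1", "connectedTo", "c1"), ("h2", "connectedTo", "c2"),
         ("s1", "runsOn", "c1"), ("s2", "runsOn", "c2"),
         ("d1", "uses", "c1"), ("d1", "uses", "c2")]

def adjacent(node):
    """Return (edge_type, other_node) pairs, ignoring edge direction."""
    pairs = []
    for a, edge_type, b in EDGES:
        if a == node:
            pairs.append((edge_type, b))
        elif b == node:
            pairs.append((edge_type, a))
    return pairs

def get_neighbours(node, resource_type):
    """Focus on the type of adjacent resources, however they are related."""
    return {other for _, other in adjacent(node) if TYPES[other] == resource_type}

def follow_relationship(node, edge_type):
    """Focus on the relationship type, whatever the adjacent resource types are."""
    return {other for found_type, other in adjacent(node) if found_type == edge_type}

def preview(node):
    """Counts per neighbour type and per relationship type, as a menu might show."""
    by_type = Counter(TYPES[other] for _, other in adjacent(node))
    by_edge = Counter(edge_type for edge_type, _ in adjacent(node))
    return by_type, by_edge

print(preview("h1"))
print(get_neighbours("h1", "circle"))
print(follow_relationship("c1", "runsOn"))
```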

In our scenario, we want "for each hexagon that consists of any related circles" and so we don't care about how they're related to hexagons, just that they are - so choose Get neighbours and choose circle. The (1) in this case indicates that there's one circle related to the selected hexagon. As you walk through the topology, the left-hand panel will summarise each step you've performed and the right-hand panel will expand to show the resulting topology inclusive of the most recent step performed. As with seed tag scoping, you can specify OR'd tags for each traversal step to fine-tune how the template processor will walk through the graph.

TOP TIP: If you choose a layout algorithm, the template processor will ensure that any resource groups generated from the template use that layout when viewing them.

TOP TIP: Your choice of traversal step type determines what happens to the generated resource groups if the topology changes. If we saved the template now, we would generate two resource groups - h1 = {h1,c1} and h2 = {h2,c2}. Consider what would happen to the h1 resource group in the following update scenarios:-

If a new circle, c3, becomes related to h1: our traversal step wants to include any circles related to hexagons, regardless of how they're related. Group h1 would therefore be updated to be h1 = {h1,c1,c3}.
If a new square becomes related to h1 via a connectedTo edge: our traversal step didn't want squares related to hexagons in any way and also didn't follow connectedTo edges, so this scenario would not change h1's resource membership, i.e. h1 = {h1,c1}.
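These two outcomes can be reproduced with a small simulation. The sketch below assumes the scenarios are (a) a new circle c3 becoming related to h1 and (b) a new square, here called s3, being connected to h1 via a connectedTo edge. The names c3's 'dependsOn' edge type and s3 are hypothetical, as are the step names. For contrast, it also shows what a 'follow connectedTo' step would have produced.

```python
def group_for(seed, types, edges, kind, value):
    """One traversal step from the seed; edges are walked in either direction."""
    members = {seed}
    for a, edge_type, b in edges:
        if seed not in (a, b):
            continue
        other = b if a == seed else a
        if kind == "neighbour_type" and types[other] == value:
            members.add(other)
        elif kind == "edge_type" and edge_type == value:
            members.add(other)
    return members

base_types = {"h1": "hexagon", "c1": "circle"}
base_edges = [("h1", "connectedTo", "c1")]

# Scenario (a): a new circle is related to h1 by some other relationship.
types_a = {**base_types, "c3": "circle"}
edges_a = base_edges + [("c3", "dependsOn", "h1")]   # hypothetical edge type

# Scenario (b): a new square is connected to h1.
types_b = {**base_types, "s3": "square"}
edges_b = base_edges + [("h1", "connectedTo", "s3")]

for label, types, edges in [("a", types_a, edges_a), ("b", types_b, edges_b)]:
    by_type = group_for("h1", types, edges, "neighbour_type", "circle")
    by_edge = group_for("h1", types, edges, "edge_type", "connectedTo")
    print(label, "get neighbours (circle):", sorted(by_type),
          "| follow connectedTo:", sorted(by_edge))
```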

5. Our next business requirement was to include anything that runs on circles, and so in this case we follow the runsOn relationship.

6. Our business requirement is to also include "and any related diamonds" and so a TOP TIP is to right-click on resources, click Get neighbours and preview to see if there are diamonds the traversal should visit. In our case, the only diamond to visit is related to circle c1, so click it.

7. Now we're happy with our sample traversal, click the blue Save button and the template processor will get to work, showing previews of each resource group as they become available. As expected, we now see two resource group previews in the template builder.

TOP TIP: If you want to edit or delete the template and the groups it has generated, you can go to the template admin page and refine your traversal or delete the template. TOP TIP: it can take some time to update a template's groups or delete them.

8. Assuming you're happy with your new groups, that's it! They'll automatically update and if we introduced a new hexagon that matched the template criteria, we'd have an additional group created.
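To make the behaviour of the walk-through more concrete, here's a minimal Python sketch of the idea, not the template processor's actual implementation. It assumes a tiny in-memory topology where the 'uses' and 'dependsOn' edge labels and the 'triangle' resources are hypothetical stand-ins, and it simplifies each traversal step to "expand from every resource collected so far". It shows one group per seed hexagon, the two update scenarios from the TOP TIP, and a new hexagon producing a new group.

```python
def related(edges, node):
    """Yield (edge_type, other_end) for every edge touching node, in either direction."""
    for source, edge_type, target in edges:
        if source == node:
            yield edge_type, target
        elif target == node:
            yield edge_type, source


def generate_groups(types, edges, seed_type, steps):
    """Build one group per seed resource by applying each traversal step in order.

    A step is ("type", entity_type) to collect neighbours of that type over any
    edge, or ("edge", edge_type) to follow only edges carrying that label.
    """
    groups = {}
    for seed, entity_type in types.items():
        if entity_type != seed_type:
            continue
        members = {seed}
        for kind, wanted in steps:
            found = set()
            for node in members:
                for edge_type, other in related(edges, node):
                    if kind == "type" and types[other] == wanted:
                        found.add(other)
                    elif kind == "edge" and edge_type == wanted:
                        found.add(other)
            members |= found
        groups[seed] = members
    return groups


# Hypothetical sample data loosely modelled on the article's shapes.
types = {"h1": "hexagon", "c1": "circle", "t1": "triangle", "d1": "diamond",
         "h2": "hexagon", "c2": "circle", "t2": "triangle"}
edges = [("h1", "uses", "c1"), ("t1", "runsOn", "c1"), ("c1", "connectedTo", "d1"),
         ("h2", "uses", "c2"), ("t2", "runsOn", "c2")]
steps = [("type", "circle"), ("edge", "runsOn"), ("type", "diamond")]

groups = generate_groups(types, edges, "hexagon", steps)

# Topology changes: a new circle and a new square attach to h1, and a new
# hexagon h3 appears with its own circle.
types.update({"c3": "circle", "s1": "square", "h3": "hexagon", "c4": "circle"})
edges += [("h1", "dependsOn", "c3"), ("h1", "connectedTo", "s1"), ("h3", "uses", "c4")]

updated = generate_groups(types, edges, "hexagon", steps)
```

In this sketch c3 joins the h1 group because the first step matches on resource type rather than edge type, s1 is ignored because no step asks for squares or connectedTo edges, and h3 gets a group of its own simply by matching the seed criteria.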

Here's what one of our groups looks like in the UI. Note that the state of the h1 group is summarised based on its member resources. There's also an example of how the group data is visible in the alert and incident viewers. Note that the alert portrays the correlated resource topology and that there are resource groups associated with it.

Exact Templates

Despite the name, Exact Templates are the same as Dynamic Templates with respect to their ability to update group membership based on topology changes. The difference is that only a single group is created for the specified seed resource and example traversal but with a user-specified name. Here's an example where we want a group seeded on diamond d1 + one hop away, and we'll create it from the standard topology viewer.

You might be wondering why you'd use an Exact Template instead of a Dynamic Template. The answer is that sometimes it's necessary to create a single group with very specific characteristics to meet a more specialised business goal. It may not be possible to scope a Dynamic Template's seed resources tightly enough to create only one group, whereas an Exact Template makes that simple. An example is if you've got "Pets vs. Cattle" use-cases and want to treat your pets in special ways!

TOP TIP: Exact templates can remember the positions of the member resources if you move them around.

1. In the topology view, use the navigation tools to display the topology you want treated as a group, then click the cog icon and choose Create exact template.

2. Complete the Exact Template common criteria: choose its type and icon, whether to correlate or not, and its tags, and name the group. Once happy, save the template.

3. You can now view the new diamond group in the resource management groups page.

TOP TIP: You can view resource groups as a table of resources instead of a topology view.

Tag Templates

Tag Templates are the most intuitive way of grouping resources and they're intended to provide a single group of resources that share the specified tag, regardless of how and whether they're related to each other. For example, you may want to group all resources together that share a tag denoting their administrative contact or a Kubernetes label.

Like an Exact Template, a Tag Template creates only one group, but it's still able to adapt to changes in the environment, adding or removing resources (and the group itself if necessary) based on their tag use.
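As a rough illustration of that behaviour, and not how Topology Manager implements it, here's a small Python sketch. The tag assignments are hypothetical; the point is simply that membership is recomputed from whichever resources currently carry the tag, and that the group goes away when nothing carries it.

```python
def tag_group(resource_tags, tag):
    """Return the members of the single group for `tag`, or None if no resource carries it."""
    members = {name for name, tags in resource_tags.items() if tag in tags}
    return members or None


# Hypothetical tag assignments for three resources.
resource_tags = {
    "c1": {"bravo", "alpha"},
    "h1": {"bravo"},
    "d1": {"alpha"},
}

initial = tag_group(resource_tags, "bravo")

# The environment changes: d1 gains the tag and h1 loses it.
resource_tags["d1"].add("bravo")
resource_tags["h1"].discard("bravo")
changed = tag_group(resource_tags, "bravo")

# Nothing carries the tag any more, so there is no group to keep.
for tags in resource_tags.values():
    tags.discard("bravo")
removed = tag_group(resource_tags, "bravo")
```

Relationships between resources never enter into it, which is what separates this from the traversal-based templates.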

Where do tags come from? Some Observer integrations provide tags out of the box but a best-practice is to configure tag rules that allow resources to be tagged based on criteria including resource origin, type, property value and so forth. Resources can also be tagged with literal strings and/or a mix of literal strings and property values if needed. For now, we've already got the tags we'll use but look for a Rules Demystified article and check out the documentation at https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=elements-configuring-rules.

TOP TIP: When creating tag rules, it's best practice to qualify tag values with something that indicates what they mean. For example, if you've got a tag of 'aiops' it's better to make it more meaningful with something like 'app:aiops'.
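To illustrate the idea of mixing a literal qualifier with a property value, here's a minimal Python sketch. It is not Topology Manager's rule syntax: the function name and the 'application' and 'owners' properties are hypothetical, and it only shows the shape of the resulting tags.

```python
def qualified_tags(resource, qualifier, property_name):
    """Build 'qualifier:value' tags from a scalar or list-valued property."""
    value = resource.get(property_name)
    if value is None:
        return set()
    values = value if isinstance(value, list) else [value]
    return {f"{qualifier}:{item}" for item in values}


# Hypothetical resource with one scalar and one list-valued property.
resource = {"name": "c1", "application": "aiops", "owners": ["ops", "netops"]}

app_tags = qualified_tags(resource, "app", "application")
owner_tags = qualified_tags(resource, "owner", "owners")
missing = qualified_tags(resource, "site", "location")
```

A qualified tag such as 'app:aiops' stays unambiguous even when another rule produces the same bare value for a different purpose.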

Given the sample topology, assume that the business requirement is: "create a single resource group that contains resources tagged with bravo". We want the resource group to be as shown by the following diagram.

Let's walk through how to create the Tag Template in Topology Manager.

1. First, create a Tag Template and complete the common template characteristics. In this case, we're going to create a 'cluster' resource group with the cluster icon, make it correlatable, tag it with 'my tag group', call the template 'Bravo Template' and name the resource group 'bravo'.

2. To specify the tags, you've got two options on the right-hand panel of the screen. One is to find the tags if you know them but I prefer to switch to Resources as that lets you search for resources using typical search properties including name, entityTypes and tags. It also lets you preview the resources that'll be added to the group by the specified tags.

Let's search for bravo and see which resources match. In this case, we can see c1, c2, h1 and c2 are tagged with bravo. If you click on the blue resource name link, you'll expand the resource details and also see the set of tags available. In this case, I chose c1, which is tagged with bravo, and selected that tag.

TOP TIP: if you click the down arrow on a row in the Resources table, you can expand and see properties to provide better context.

3. Now save the template and you'll see a preview of the generated resource group.

4. You can now view the new group in the group management page in the same way as groups from Dynamic and Exact Templates.

Token Templates

What if you want a Dynamic Tag Template? Token Templates provide the means to use a nominated property with unknown values or a token expression to drive the creation and management of resource groups.

Recall that our sample topology resources include a 'myProperty' property with a mixture of scalar and list values. We want Topology Manager to create groups of resources based on those values and so we can expect to see groups for tango, xray and zulu, i.e. the unique set of values of myProperty.

TOP TIP: you can also use array/list values of a nominated property if you want a resource to be added to multiple groups.
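Here's a minimal Python sketch of that grouping idea, assuming a hypothetical assignment of myProperty values to resources. It is only a model of the outcome (one group per unique value, with list values placing a resource in several groups), not of how Observers and rules actually populate the groupToken.

```python
from collections import defaultdict


def token_groups(resources, property_name):
    """Group resource names by each value of the nominated property."""
    groups = defaultdict(set)
    for name, properties in resources.items():
        value = properties.get(property_name)
        if value is None:
            continue  # resources without the property join no group
        tokens = value if isinstance(value, list) else [value]
        for token in tokens:
            groups[token].add(name)
    return dict(groups)


# Hypothetical property values; c2 carries a list and so lands in two groups.
resources = {
    "c1": {"myProperty": "tango"},
    "c2": {"myProperty": ["tango", "xray"]},
    "h1": {"myProperty": "zulu"},
    "s1": {},
}

groups = token_groups(resources, "myProperty")
```

Nobody has to know the values in advance: a new value of myProperty simply results in a new key, which is the sense in which a Token Template behaves like a dynamic Tag Template.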

Let's walk through how to create the Token Template in Topology Manager.

1. First, create a Token Template and complete the common template characteristics. In this case, we're going to create 'network' resource groups with the network icon, make them correlatable and tag them with 'my token template'. Resource group names are automatically based on the value of the nominated property.

2. Token Templates need rules to determine which property values should be used to drive resource group creation and management. You can add multiple rules to a token template for more complex scenarios but we'll use a simple one here. All we want to do is 'promote' whatever the value of myProperty is to a groupToken and drive the resource groups. Check out the documentation at https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=elements-configuring-rules.

TOP TIP: Any Topology Manager rule you create should generally be tightly scoped, at least to the Observer type and Provider and ideally to specific resource types. This helps ensure that the result of processing that acts on properties managed by rules is tightly focussed and improves ease of administration given rules can be more atomically updated.

Create a new rule as follows - give it a name, nominate the property and specify the conditions for the Observer and provider to act on. This ensures that when the in-scope Observer evaluates the rule on data-in-motion, the relevant resources get updated if the rule applies to them. In our case, the rule will apply to each resource in our sample data because they have a myProperty property.

TOP TIP: You can use regular expressions, concatenate property values and use literals as part of the token expressions nominated in a rule - check out a forthcoming AIOps from the Source blog about that.

Once the rule has been added, you'll see it reflected in the list of rule definitions that apply to the template.

3. Save the template and, unlike other templates, you'll notice that no groups have been created - why is that? This is because other template types use graph traversals that can be run against the data stored in the graph database, whereas Token Templates rely on a groupToken property that's populated by Observers acting on rules against data-in-motion. Re-run your Observer job(s) to see the result of using the template and, as expected, we've got three resource groups created for tango, xray and zulu.

Here's the set of resource group templates we've now built (My Services existed before writing the blog :-) ).

Here are examples of the groups created by the Token Template - note some have connected resources, some don't.

UI Search and Filtering

To find resource groups you can use the Google-like search bar and more structured and capable filters.

Google-like searches are well suited to ad hoc queries and are intended to avoid the user needing a lot of prior knowledge about the resources and resource groups under management.

In the following example, we searched for d1 and got a hit based on the resource group name.

In the second example, we searched for token and got hits for resources where their tags matched the search term.

TOP TIP: searches don't have to be exact to find what you're looking for.

Topology Manager's filtering capabilities allow more elaborate and structured filters to be defined for applications/services, resource groups and resources (topology elements). Key capabilities of filters include:-

The ability to find topology elements by their status severity, business criticality and properties.
The ability to express a rich set of operators for nominated properties including equality, substring matches, set matches, emptiness checks, starts with, ends with, greater than, less than and regular expression matches.
The ability to AND multiple conditions together.
The user experience for inventory filtering is consistent with the alert and incident filtering user experience.
The ability to save and name filters, either as filters assigned to user groups or users, or as restriction filters assigned to user groups.

Here's an example of a filter definition based on the following business requirement: "create a filter for everyone that easily allows them to find any resource groups with a critical or major severity and that's of the most important 'Tier 1' business criticality".

1. Define the Filter conditions to meet the business goal - here we want to find anything assigned with Tier 1 business criticality that has either critical or major severity status associated with it.

2. Save the filter and assign it to everyone.

3. Activate the saved filter from the drop-down filter menu.
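For readers who think better in code, here's a minimal Python sketch of how ANDed filter conditions with the kinds of operators listed above could be evaluated, applied to the Tier 1 requirement. The operator names and the 'businessCriticality' and 'severity' property names are assumptions made for illustration; the product's own filter definitions are built in the UI and may be represented quite differently.

```python
import re

# Each operator takes the resource's actual value and the expected value.
OPERATORS = {
    "equals": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
    "in_set": lambda actual, expected: actual in expected,
    "is_empty": lambda actual, _expected: actual is None or actual == "" or actual == [],
    "starts_with": lambda actual, expected: isinstance(actual, str) and actual.startswith(expected),
    "ends_with": lambda actual, expected: isinstance(actual, str) and actual.endswith(expected),
    "greater_than": lambda actual, expected: actual is not None and actual > expected,
    "less_than": lambda actual, expected: actual is not None and actual < expected,
    "matches": lambda actual, expected: isinstance(actual, str) and re.search(expected, actual) is not None,
}


def passes(item, conditions):
    """True when every (property, operator, expected) condition holds, i.e. they are ANDed."""
    return all(
        OPERATORS[operator](item.get(prop), expected)
        for prop, operator, expected in conditions
    )


# "Tier 1 business criticality AND (critical OR major severity)" as a set match.
tier1_unhealthy = [
    ("businessCriticality", "equals", "Tier 1"),
    ("severity", "in_set", {"critical", "major"}),
]

# Hypothetical resource groups to filter.
resource_groups = [
    {"name": "h1", "businessCriticality": "Tier 1", "severity": "critical"},
    {"name": "h2", "businessCriticality": "Tier 1", "severity": "clear"},
    {"name": "bravo", "businessCriticality": "Tier 3", "severity": "major"},
]

hits = [group["name"] for group in resource_groups if passes(group, tier1_unhealthy)]
```

Expressing "critical or major" as a set match keeps the filter a plain AND of conditions, which is consistent with the capabilities listed above.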

API fun

Topology Manager has many APIs available and this article wouldn't be complete without providing an example of how to obtain group data from one. In this case, we'll use the Topology Service API to get some data.

TOP TIP: It's well worth enabling routes to your Topology Manager services to simplify accessing their APIs if you have use-cases that need them. Check out the documentation at the following link:- https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=apis-enabling-access-url-routes.

1. Follow the instructions in the TOP TIP above and get access to your routes.
2. Log in to your OCP cluster and get an access token to your cluster, install the oc command on your PC/Mac and get your topology routes.
3. Using the HOST and PATH from step 2, point your browser to the Topology Service Swagger page, e.g. https://aiops-topology-topology-cp4aiops.challenger/1.0/topology/swagger
4. You should see a webpage like this:-
5. Authorise using API credentials. You can retrieve these using commands such as the following, but refer to the documentation at:- https://www.ibm.com/docs/en/cloud-paks/cloud-pak-aiops/4.7.1?topic=apis.
   1. Username:
      oc get secret aiops-topology-asm-credentials -o jsonpath='{.data.username}' | base64 -d
   2. Password:
      oc get secret aiops-topology-asm-credentials -o jsonpath='{.data.password}' | base64 -d
6. Open the Groups API category and plug the parameters into the /groups API. Use the default X-TenantID parameter of cfd95b7e-3bc7-4006-a4a8-a73a79c71255 and the following _field parameters.
7. Run the API query to retrieve a list of groups in the system.
   1. Here's a sample cURL command:-
      curl -X 'GET' \
        'https://aiops-topology-topology-cp4aiops.challenger/1.0/topology/groups?_field=name&_field=entityTypes&_field=tags&_include_count=false&_return_composites=false&_include_status_severity=false&_include_story_priority=false' \
        -H 'accept: application/json' \
        -H 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' \
        -H 'authorization: Basic YcHMtdXNlcjo3ZEk0dlpwbjZnUVFKQ2U5NdlBjbVNBSGd1MndZRFJFPQ=='
   2. Here's a sample record from the above query:
8. To get the members of a group, you can use the /groups/{id}/members API, using the _id from the previous record.
   1. Here's a sample cURL command:-
      curl -X 'GET' \
        'https://aiops-topology-topology-cp4aiops.challenger/1.0/topology/groups/lcyaFue9T7S5XSSZ958byw/members?_field=name&_include_count=false&_return=nodes&_unpack_groups=false&_include_bystanders=false' \
        -H 'accept: application/json' \
        -H 'X-TenantID: cfd95b7e-3bc7-4006-a4a8-a73a79c71255' \
        -H 'authorization: Basic YcHMtdXNlcjo3ZEk0dlpwbjZnUVFKQ2U5NdlBjbVNBSGd1MndZRFJFPQ=='
   2. Here's a sample response from the above query.
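If you'd rather script these calls than use cURL, the same two requests can be assembled with Python's standard library. This is a sketch under assumptions: the helper names are hypothetical, the host, tenant ID, query parameters and group ID are simply copied from the cURL examples above, and your own route and credentials will differ.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Values copied from the cURL examples; replace them with your own route.
BASE_URL = "https://aiops-topology-topology-cp4aiops.challenger/1.0/topology"
TENANT_ID = "cfd95b7e-3bc7-4006-a4a8-a73a79c71255"


def basic_auth(username, password):
    """Encode credentials the way HTTP Basic authentication expects."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def groups_url(fields):
    """URL for the /groups query, repeating _field once per requested property."""
    query = [("_field", field) for field in fields]
    query += [
        ("_include_count", "false"),
        ("_return_composites", "false"),
        ("_include_status_severity", "false"),
        ("_include_story_priority", "false"),
    ]
    return f"{BASE_URL}/groups?{urlencode(query)}"


def members_url(group_id, fields):
    """URL for the /groups/{id}/members query for one group."""
    query = [("_field", field) for field in fields]
    query += [
        ("_include_count", "false"),
        ("_return", "nodes"),
        ("_unpack_groups", "false"),
        ("_include_bystanders", "false"),
    ]
    return f"{BASE_URL}/groups/{quote(group_id, safe='')}/members?{urlencode(query)}"


def get_json(url, username, password):
    """Send the GET request with the same headers as the cURL examples and parse the JSON body."""
    request = Request(url, headers={
        "accept": "application/json",
        "X-TenantID": TENANT_ID,
        "authorization": basic_auth(username, password),
    })
    with urlopen(request) as response:
        return json.load(response)


list_url = groups_url(["name", "entityTypes", "tags"])
detail_url = members_url("lcyaFue9T7S5XSSZ958byw", ["name"])
```

Calling get_json(list_url, username, password) with the credentials retrieved from the aiops-topology-asm-credentials secret should then return the same data as the first cURL command, assuming the route is reachable from where the script runs.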

Final Thoughts

Managing resource groups in IT is not just a matter of convenience but a fundamental best practice that enhances clarity, accelerates incident response, and improves overall operational efficiency. In this article, we've discussed the benefits of resource groups in AIOps, how they work and how to use them.

Permalink

https://community.ibm.com/community/user/blogs/matthew-duggan/2024/11/17/aiops-from-the-source-groups
