Cognos Analytics

 View Only

Multi-modeler, multi-module techniques

By IAN HENDERSON posted Fri August 23, 2024 10:43 AM

  

Multi-modeler, multi-module techniques


Modelling projects can be complex, with many objects, which no one modeler could model in a reasonable time frame. This paper documents one technique for modelling complex projects using multiple data modules to create a reporting application.

A data module can be used in other data modules. The objects of the data module will be displayed as shortcuts or references. This allows the modeler to reuse objects and allows multiple modelers to work on a project simultaneously.

If you can divide an envisaged model into discrete segments then you can also divide the work of modelling into discrete segments, with each segment in its own data module. Each data module could be brought into other data modules. The objects in a data module could then have relationships made to objects from other data modules.

A project following Cognos modelling best practices would lend itself to division into discrete, modular segments. This is because any object in the model would have one and only one possible relationship path between it and any other object. If an object is involved in role-playing you would use aliases. Thus, you could plug a data module into another data module.

Dimensions are natural discrete segments of a model. Each dimension consists of the tables of that dimension. If you need to create a view to incorporate several tables into a dimension-table form structure or if you already had a dimension table created in your data base then only that object would be exposed in other modules. They, in turn can be brought into a set of single fact table data modules and plugged in with whatever objects that are relevant. A single fact table data module is also a segment, the objects of which can be used in other data modules.

A data module could be used in many contexts. In the technique described in this paper, data modules, each representing an individual dimension, will be reused in a series of single fact table-based data modules. These fact table data modules will consist of a fact table and its associated dimensions. The fact table data modules would, in turn, be imported into a single data module which would be used by report and dashboard authors and power users. 


Technique

As part of the Cognos proven practices, it is recommended that you follow Kimball’s Four-Step Dimensional Design Process. It is:

1.    Select the business process.
2.    Declare the grain.
3.    Identify the dimensions.
4.    Identify the facts.

In terms of modelling actions this translates into the defining of dimensions, defining their grains of detail through specifying column dependency, defining facts, and laying out of the relationship paths between the dimensions and facts.

As part of that process, you should be able to identify discrete segments of the data warehouse which you are going to model. Each of those segments could be modelled in separate data modules.

An example of sub-optimal processes will often see people dumping the entire data warehouse into a data module and then trying to sort things out. That approach can mean you will have a lot of stuff you need to wade through to do your work and, because it is one big module, only one person can work on it at a time.

The diagram below illustrates an example of the process of multiple data module modelling. On the left you will see a set of dimension data modules, each modelling a dimension. In the centre are three star-schema data modules. Each contain a fact table which contains facts of a particular subject area and those dimensions which are related to that fact. These dimensions have been imported from the pool of dimensions. On the right is a reporting application data module. It contains the three star-schema data modules. 
 

Diagram showing how data modules can be plugged into other data modules so you can have a reporting application created by many modellers rather than one person working on a single data module



Work relevant to a particular area can be thus confined to that area, in a data module which is specific to the area. The work performed there would not prevent parallel work in other areas. With shortcuts, and, to a lesser extent, references, modifications would be automatically propagated from the data module where the modification was performed to the data modules which consume it and then to the dashboards and reports which consume the reporting application data module.

A data module which models a dimension would look something like this:
 

A data module containing all the tables of a dimension



A data module which models a star schema, which incorporates all the dimensions which belong to a single fact table would look like this:
 

Dimension plugged into a single fact table data module, showing it related to the fact table and also showing the other dimensions in the data module.  The sources pane is open, showing all the dimension-based data modules used in the data module as well as the data source object where the fact table comes from.


A data module which contains several star schemas and which would be used as the data module which would be used in reports and dashboards would look like this:

The diagram of a data module which contains the metadata of several star schema based data modules.  This data module would be used by report and dashboard authors and consumers.  The metadata tree is showing additional modelling such as stand alone calculations and filters and navigation paths.


  
I will show each of these in individual sections below.

Modelling dimensions

One logical method to divide the work is to model each dimension in its own data module. You would only need to expose the dimension in the data module as shown in this illustration below.
 

A dimension based data module



In this module are the five tables which make up the Employee dimension in the sample Great outdoors warehouse data base. They are Emp Employee Dim, which is the main table of the dimension, Go Branch dim and Go Region dim, and two lookup tables, Emp Termination lookup and Emp Position lookup.

Branch contains the information for branches, such as addresses and city. Go region dim has the country and regions (collections of countries) information. 
In the diagram you will see relationships between the tables. The cardinality of those relationships models the hierarchy of the geographic entities. Go Region dim has a 1.N relationship to Go Branch dim. Go Branch dim has a 1.N relationship to Emp Employee Dim. The two lookup tables have 1.N relationships to Emp Employee Dim.

In addition, there is a view called Employee. It incorporates all 5 of the other tables.

The nature of the tables, where there are lookup tables and tables with multiple grains of detail, requires that they be incorporated into a view, which would then act somewhat in the manner that a properly designed dimension table would. At that point the view would need column dependency to be defined to normalize the view.

More information on this is available here:

https://www.ibm.com/docs/en/cognos-analytics/12.0.0?topic=metadata-column-dependencies

https://www.ibm.com/support/pages/multi-fact-multi-grain-and-relative-dates-defining-column-dependencies-prevent-double-counting-cognos-analytics-proven-practice-capp-document

Because the dimension needs to be normalized, the objects from the five tables were brought into the Employee view and column dependency specified to normalize the grains of dimensional detail.
 

Column dependency diagram of the employee dimension, showing the employee by manager and employee by region hierarchies converging to the employee column dependency group.



The picture above gives you an idea of it, although you may need to squint a bit. The details are not particularly important in themselves for the purposes of this example, as it is just to give you an idea about how normalization through column dependency looks like.

At the top of the picture, you can see the chain of six column dependency groups, which represents the grains of the employee by manager hierarchy. Below that are the country and branch levels of the employee by region hierarchy. Both converge at the leaf level column dependency group of individual employees, which is that big thing on the right of the picture.

If you are familiar with the Framework Manager (FM) great outdoors warehouse sample model you will remember that there are two model query subjects for Employee, one modelling the employee by manager hierarchy and the other modelling the employee by region hierarchy. With column dependency you can model both in one query subject.

You will notice that all the tables except for Employee are hidden. This is because those tables are used in the view and subsequently do not need to be exposed. The hide from users setting will hide those query subjects in reporting applications. In a data module which uses another data module where some objects are hidden they will be visible, albeit with the hide from user indication. The presence of these objects does not affect the query generation.

If you are using shortcuts, only the shortcuts need to be in the consuming data module, as shown in the picture below. In it you will see the employee data module in the sources pane on the left. You will see the Employee view and below it, the hidden Employee sources tables folder. In the middle is the metadata tree of the Sales data module. I have only included Employee and, because it is a view, it is included in Sales as a shortcut.
 



If you use a data module in another data module then the objects which you want to use are represented either as shortcuts or references. I will talk about the shortcut versus reference question latter on.

The picture below shows the Employee data module in report studio. Only Employee is visible.
 

The employee dimension shown in report studio, illustrating that only the dimension itself is exposed to the reporting application and that the dimension's source tables are not visible.



An example of the use of a data module in another data module is shown in the picture in the next section, where a star schema model based on the sales fact table is shown.

If you know that a table will be used in more than one dimension as part of role playing, then it will be a good idea to edit the identifier of each instance of that table so that the role is clearly identified. Because the identifier is used in the generated Cognos SQL, this will enable you to have a clearer tracing of each instance of that table in the SQL. This is one of the Cognos proven practices.

I have performed some tests where I did not rename the identifiers of objects which were used in multiple dimensions. I did not encounter problems, but I believe that it is probably a good idea to follow that practice.

More information about identifiers is available here:

https://www.ibm.com/docs/en/cognos-analytics/12.0.0?topic=metadata-object-properties

https://community.ibm.com/community/user/businessanalytics/blogs/ian-henderson1/2020/12/21/identifiers-and-their-use
 

Illustration showing the identifier property of an alias



Part of the identification of fact grains will entail the definition of column dependency in the dimensions. Another part of the identification is the definition of relationships between the dimensions and facts, where the keys of the relationship will identify in which dimension grain the fact exists. This means that you would need to have some idea about fact grains before embarking on the modelling exercise but that’s a good idea in general. You can revise column dependency as you better understand what levels of detail and what hierarchies exist in the dimensions and what keys are used for relationships to the facts.

Modelling star schemas

Once the dimensions have been modelled you will be able to start modelling star schema data modules. Each star schema consists of a fact table and its associated dimensions.

A star schema could be used as the metadata for a report on its own or as part of a combined reporting application of many star schemas. A description of such an application follows in the next section.

Changes to the dimension will be automatically propagated to consuming data modules, especially for shortcuts, in most cases. Actions such as deleting tables in a dimension or renaming identifiers will result in you needing to edit data modules which consume that data module.

The process of modelling the star schema is to import the dimensions which are associated to a fact along with that fact table into a data module. The source tables need not be added.

In the star schema you would model relative time. You can also model stand-alone calculations and filters. You will be able to import them into the uber module. 
 

employee in sales



There is a known defect in CA 12.0.2 when you use the same data source in a data module as that used in a data module which is also used in the data module.

The errors will be similar to this:

Error: XQE-MSR-0017 The column "SLS_SALES_FACT.SALE_TOTAL" refers to "SALE_TOTAL", which does not exist. Edit the column so it refers to an existing item.

There are two workarounds.

One work around is to import the fact table into the data module before importing the dimension data modules.

The other workaround is the following.
1. Save the data module.
2. Re-open the data module.
3. Select the table which is generating the errors and delete it from the data module.
4. Open the sources panel and drag the table back into the data module.

This defect is fixed in 12.0.3.

Many star-schema module reporting application module aka the uber module

In most cases, the reporting / dashboard data module requirements will require multiple facts. For example, some actual fact is helpful on its own but there is much more value if you can compare the actuals versus planned values.

To accomplish this, you would need to import the star schemas which you would want to have in the reporting data module. An example of one is shown in the picture below. In it, three star schemas have been imported into a data module. They are Sales, Product forecast, and Returns. The first has sales facts. The second has planned values. The third has values related to returns of products. 
 

uber module


The facts have been arranged by me in the centre of the relationship diagram. The dimensions radiate around them.

You will notice in the diagram that there are dashed lines between the selected shortcut to the fact table, product forecast, and the shortcuts to Time order date, Employee, and Products.

They are relationship shortcuts. Relationship shortcuts are, like all other shortcuts, pointers to an object. In this case they indicate that the targets of the shortcuts have a relationship in an underlying data module.

A relationship shortcut is read only. You can see the properties of the target relationship. Break link creates a new relationship, separate from the original target relationship.

The introduction of relationship shortcuts in CA 12.0.2 was more or less the impetus to conduct the exercises which are documented here, as it, and its related query engine changes, was basically the last element missing from successfully modelling with multiple data modules.

You will notice that each star schema will have instances of dimensions which also exist in some of the other star schemas.

For example, you will notice that the product forecast folder contains, along with the product forecast fact table, shortcuts to the Employee, Products, and Time order date dimensions. You will notice that they are also in the sales folder.

This is basically the same pattern as star schema groupings in FM. This method allows report authors and consumers to know what dimensions have relationships to a particular fact table.

You may be tempted to delete duplicate dimension shortcuts rather than having star schema folders. This would prevent you from indicating to users which dimensions are associated with any particular fact. Also, you can end up with a mess as, although each shortcut points to the same object, each shortcut has its own relationship to the fact table of the data module which it belongs to.

Design considerations

Navigation paths are not carried forward into data modules which consume other data modules. It is necessary to create the navigation paths in the data modules which you wish to expose to use by your report authors and consumers.

You can import stand-alone calculations and filters from one data module to another. The stand-alone calculations and filters thus created in the consuming data modules are copies of the original objects. This means that there is no connection between the two objects. Changes to the source object will not propagate to the instance of the object in the consuming data module. This will entail maintenance overhead because changes to a calculation or filter in the star schema would need to also be done in the consuming reporting application data modules. Similarly, you can edit the expression of the consuming data module object.

Because they are independent of their source calculation or filter, calculations and filters can be modified themselves, which could result in what is seemingly the same calculation object but different expressions.

As a consequence, you need to think through where you will create stand-alone calculations and filters.

Shortcuts versus references

When a query subject from a data module is brought into another data module the query subject will be represented as either a reference or as a shortcut. 
If the object is a table, then, when it is brought into another data module, the query subject will be represented as a reference. You can convert the reference to a shortcut or alias. If the object is a view, then the query subject will always be represented as a shortcut.

I think using shortcuts is to be preferable to using references. References sometimes make queries difficult, and they require baggage such as relationships. Shortcuts point back to the original object so the original object’s relationships will be picked up automatically. They are also extensible as you can add calculations to them. Consequently, I think that converting references into shortcuts would be a good idea.

Another thing you need to keep in mind is that a reference to a reference will not pick up the relationships of the reference. If you converted the reference to a reference into a shortcut to the reference, then the relationship would be picked up.

I have found that if you want to convert references into shortcuts it is better to work your way up from the bottom. For example, assume you have a data module which has a reference in it and the data module, in turn, is used in another data module. It would be better to convert the first reference and then convert the second reference.

If you use shortcuts, then much of the baggage of related tables are no longer necessary to deal with. Just the shortcut would need to be used in the consuming data module. I tend to envision this as plugging one thing into another thing. For example, if you plug a mouse or some other external device into a computer you do not need to worry (now; not true a long time ago, before plug-and-play) about making the mouse work; it just does. Similarly, if you plug a dimension-based data module into another module the thing will work with the other objects in that data module, provided that you do the modelling exercise of defining relationships between those objects and the exposed-from-another-data- module object. The fact that the dimension exists in another data module is not relevant. It is as if the dimension was in the data module.

Thus, generally you will probably be better off converting your reference to shortcuts as they make data modules more modular.

Troubleshooting

When you convert two references to shortcuts the relationship shortcut will not appear. You need to save and reopen the data module. This is a known issue.

https://jsw.ibm.com/browse/MUI-3684

I have observed that if you use query items from multiple shortcuts to the same dimension you can encounter an error complaining about something similar to the following: XQE-PLN-0108 The data item "Product Type" cannot be pushed to any fact streams of the query. I have not established what exactly causes this to happen; it usually does not happen.

It is possible to observe repetition of the summary value of a measure in multi-fact queries, as shown in the picture below. Assuming that you have defined column dependency you probably will not observe double counting in the summary itself. This has been observed to happen if the dimensions and facts do not come from the same data source connection. If you create 1.1 relationships between the instances of a dimension the queries should present the correct values.
 

How a 1.1 relationship can fix a projection issue.



An illustration of such a thing is here.
 

diagram showing 1.1 relationships



Converting a reference to a shortcut will not propagate to the target modules. You need to convert them there as well. As mentioned in the shortcuts versus references section I have found that a conversion process going from the bottom level data modules up works better.

If you have a filter join optimization setting to unique or range of values for a relationship where both or either of the query subjects are shortcuts, then you can encounter a query engine XQE-GEN-0018 Query Service internal error. This problem is fixed in 12.0.4.

8 comments
69 views

Permalink

Comments

2 days ago

@Marc Reed

sorry, I guess wasn't precise enough in describing what I was doing.

I rebuilt exactly what @IAN HENDERSON described.

I created a datamodule for the date dimension and created a shortcut table there.

Then I created two distinct star schema models both using that date dimension shortcut. Then I created the uber model combining those two star schemas.

That uber model is showing two distinct elements for the date dimension, both using the shortcut table.

When I add two distinct KPIs from the two different facts to a cancas and add one of the two date dimensions to the filter bar, both KPIs are filtered.

CA 12.0.3 gave you that cross join error?

6 days ago

@J E
In your example it sounds like you have a single dimension, joined to two facts. A multi-fact query works fine in this scenario.
I was commenting on @IAN HENDERSON uber DM, where I believe he has two fact tables each joined to their own shortcut instance of the same dimension. 
In my DMs the query engine has seen these shortcuts as distinct dimensions, and not the same dimension - hence not producing a multi-fact query.

This also impacts something like dashboards. Add two distinct  KPIs from the two different facts to a canvas, and add the employee dimension to the filter bar. Only one of the KPIs is filtered by employee, the other KPI gives a cross join error as it isn't joined to that instance of the employee dimension. This isn't a multi-fact query, but still suffers the same issue.

I was wondering if this had changed in 12.0.4.

6 days ago

@Marc Reed, in my simplyfied example a multi-fact query seems to work correctly.

A line chart with date and two lines, one from fact a, one from fact b shows the corret values.

Again, the engine can't create a query in native SQL.

The cognos SQL is something like

select date, meassure from fact_A inner join dim_date

select date, meassure from fact_B inner join dim_date 

Those two results are joined with an outer join on date A is not distict from date B.

Sounds plausible to me

6 days ago

@IAN HENDERSON @J E

Ian, following on from JE's comment below.

Another issue I have had when trying this approach in the past is that multi-fact queries do not work correctly.

I thing your diagram of the uber module had me fooled. I though that maybe 12.0.4 had somehow realised there were common dimensions.  Hence multi-fact queries will work.

Can you confirm if in 12.0.4 a query that uses a common dimension such as employees , and multiple facts, that the query engine recognises this is a stitch query and produces the usual multi-fact query.

<<Edited - whilst the comment box allows you to paste in graphics, it doesn't show them>>

9 days ago

I've not been ignoring you.  

11 days ago

Hi @IAN HENDERSON,

thanks a lot for your great post.

Right now, our approach is very different. We are using a huge datamodule with all the tables relevant for our customers and create smaller datamodules for reporting or self-service from our huge SourceModule.

As we are testing CA 12.0.3 right now, we are trying to apply your approach to see how it impacts development and users.

We do have lots of fact tables sharing common dimensions, like employees in your example.

For our test, it’s a date dimension. So, we created a module for that dimension, then a module for star schema A with a fact table A and that dimension and a module for star schema B using fact table B and the same date dimension from the date dimension datamodule.

In an uber module we combine datamodule star schema A and datamodule star schema B.

So now, in that new module we have our date dimension twice, one from each star schema, both with distinct identifiers.

In my opinion, those dimension identifiers should be renamed in each star schema module to prevent problems in report authoring.

When now using one of those dimensions in a dashboard to display a line chart with date and a KPI from the fact table for line position, it works, no matter which of the two date dimensions are used. The generated query of that line chart gives me headaches though. 

Cognos won't generate native SQLs anymore, only Cognos SQLs. They look very similar to the expected queries in native SQL but from my experience, those Cognos queries can be hard to understand, to verify against the database and to analyze when having issues. Most of the times, only having Cognos SQL was an indicator for us that there is a problem in our design.

Is there a problem in our testcase or is that behavior expected, and should we be worried about it?

12 days ago

Hi @IAN HENDERSON

This is something we have been trying to do since day 1 with data modules and failing.

It sounds like the new CA12 relationship shortcuts is  a step in the right direction.


A couple of points.

You mention that standalone calculations and filters can be imported into the uber data module. In earlier versions of CA these become copies, meaning that the original and copy standalone objects would need to be maintained.

In CA12 can you now have shortcuts to standalone objects?

In your uber data module you have a couple of shortcuts to the Products dimension. Have IBM now sorted the issue that I would have to define a navigation paths for each instance of the product dimension?

Previously navigation paths referred to explicit instances of the shortcut, rather than being applicable to all instances of the shortcut.

It's the same product dimension - it seemed crazy that you had to define multiple navigation paths. 

If IBM have fixed it that's great. If they haven't then it kind of makes this 'trying to replicate FMs namespace' structure unusable. You have to define and maintain multiple versions of the same nav path, users get confused when using the umpteen nav paths in a dashboard and so on.

You may have posted about this in the blog... When you create each fact data module, for example your standalone Product DM and standalone Forecast Sales DM, each of those has a shortcut to the Product dimension.
In each of those standalone DMs the Id of the shortcut will be the same Id as the original Product dimension.
For simplicity, lets say the id of the original dimension is PRODUCT, in the Sales DM you will now have shortcut with an id of PRODUCT, and in the Forecast DM you will have a shortcut with an id of product. All of this is fine as they are all standalone dimensions.
But when you bring the fact DMs into the uber DM, you cant have two shortcuts with the ID of product, so one will get renamed PRODUCT_1.
Which has an impact for authors. You would expect anything written against the Sales DM to be able to run against the uber DM. But it may not given the dimension could now be product_1.
A way around this is to re-id the shortcuts in the fact DMs.

Its all a bit messy and something namespaces in FM alleviate. 

Mon August 26, 2024 12:24 PM

Hi @IAN HENDERSON

Thank you very much for sharing your deep insights into data module modelling techniques! 

We are using two layers of data modules (a first one for each dimension or fact table and a second one for the star schemas and their "enrichment"). And we are now facing modelling challenges when we have the need for something you call an uber data module. 

First a question about your section "shortcuts vs. references": Due to limitations of versions 11.1.7 and now 11.2.4 we are using both references and table views in star schema data modules: For each read-only reference we create an additional table view. This will show all changes of the underlying data module in the reference. And it also gives us the opportunity to change the selection/visibility of columns or to rename them. Do you expect issues with this, especially when we will introduce uber data modules and/or version 12? 

And a wish for an option regarding what you described in your section "design considerations": You have a lot of modelling options when you create uber data modules as you described. But you also have a lot of maintenance work with chances from bottom-up. Sometimes I simply want to have the star schema data modules available 1:1 in a uber data module (including references to stand-alone calculations & filters and possibly even navigation paths) with absolutely no need to break, convert or edit any of its read-only references. But with modelling options on top of the referenced objects (like new joins, customs tables and stand-alone calculations & filters). So the same result that I have when I import a package into a data module. So when I import a (star schema) data module I want to have the option to either import the way it currently is or alterntatevely the same as packages are imported. Does this seem feasible to you and do you see any chances that this option will be introduced one day? 

Best regards 

Philipp