Multi-modeler, multi-module techniques
Modelling projects can be complex, with many objects, which no one modeler could model in a reasonable time frame. This paper documents one technique for modelling complex projects using multiple data modules to create a reporting application.
A data module can be used in other data modules. The objects of the data module will be displayed as shortcuts or references. This allows the modeler to reuse objects and allows multiple modelers to work on a project simultaneously.
If you can divide an envisaged model into discrete segments then you can also divide the work of modelling into discrete segments, with each segment in its own data module. Each data module could be brought into other data modules. The objects in a data module could then have relationships made to objects from other data modules.
A project following Cognos modelling best practices would lend itself to division into discrete, modular segments. This is because any object in the model would have one and only one possible relationship path between it and any other object. If an object is involved in role-playing you would use aliases. Thus, you could plug a data module into another data module.
Dimensions are natural discrete segments of a model. Each dimension consists of the tables of that dimension. If you need to create a view to incorporate several tables into a dimension-table form structure or if you already had a dimension table created in your data base then only that object would be exposed in other modules. They, in turn can be brought into a set of single fact table data modules and plugged in with whatever objects that are relevant. A single fact table data module is also a segment, the objects of which can be used in other data modules.
A data module could be used in many contexts. In the technique described in this paper, data modules, each representing an individual dimension, will be reused in a series of single fact table-based data modules. These fact table data modules will consist of a fact table and its associated dimensions. The fact table data modules would, in turn, be imported into a single data module which would be used by report and dashboard authors and power users.
Technique
As part of the Cognos proven practices, it is recommended that you follow Kimball’s Four-Step Dimensional Design Process. It is:
1. Select the business process.
2. Declare the grain.
3. Identify the dimensions.
4. Identify the facts.
In terms of modelling actions this translates into the defining of dimensions, defining their grains of detail through specifying column dependency, defining facts, and laying out of the relationship paths between the dimensions and facts.
As part of that process, you should be able to identify discrete segments of the data warehouse which you are going to model. Each of those segments could be modelled in separate data modules.
An example of sub-optimal processes will often see people dumping the entire data warehouse into a data module and then trying to sort things out. That approach can mean you will have a lot of stuff you need to wade through to do your work and, because it is one big module, only one person can work on it at a time.
The diagram below illustrates an example of the process of multiple data module modelling. On the left you will see a set of dimension data modules, each modelling a dimension. In the centre are three star-schema data modules. Each contain a fact table which contains facts of a particular subject area and those dimensions which are related to that fact. These dimensions have been imported from the pool of dimensions. On the right is a reporting application data module. It contains the three star-schema data modules.
Work relevant to a particular area can be thus confined to that area, in a data module which is specific to the area. The work performed there would not prevent parallel work in other areas. With shortcuts, and, to a lesser extent, references, modifications would be automatically propagated from the data module where the modification was performed to the data modules which consume it and then to the dashboards and reports which consume the reporting application data module.
A data module which models a dimension would look something like this:
A data module which models a star schema, which incorporates all the dimensions which belong to a single fact table would look like this:
A data module which contains several star schemas and which would be used as the data module which would be used in reports and dashboards would look like this:
I will show each of these in individual sections below.
Modelling dimensions
One logical method to divide the work is to model each dimension in its own data module. You would only need to expose the dimension in the data module as shown in this illustration below.
In this module are the five tables which make up the Employee dimension in the sample Great outdoors warehouse data base. They are Emp Employee Dim, which is the main table of the dimension, Go Branch dim and Go Region dim, and two lookup tables, Emp Termination lookup and Emp Position lookup.
Branch contains the information for branches, such as addresses and city. Go region dim has the country and regions (collections of countries) information.
In the diagram you will see relationships between the tables. The cardinality of those relationships models the hierarchy of the geographic entities. Go Region dim has a 1.N relationship to Go Branch dim. Go Branch dim has a 1.N relationship to Emp Employee Dim. The two lookup tables have 1.N relationships to Emp Employee Dim.
In addition, there is a view called Employee. It incorporates all 5 of the other tables.
The nature of the tables, where there are lookup tables and tables with multiple grains of detail, requires that they be incorporated into a view, which would then act somewhat in the manner that a properly designed dimension table would. At that point the view would need column dependency to be defined to normalize the view.
More information on this is available here:
https://www.ibm.com/docs/en/cognos-analytics/12.0.0?topic=metadata-column-dependencies
https://www.ibm.com/support/pages/multi-fact-multi-grain-and-relative-dates-defining-column-dependencies-prevent-double-counting-cognos-analytics-proven-practice-capp-document
Because the dimension needs to be normalized, the objects from the five tables were brought into the Employee view and column dependency specified to normalize the grains of dimensional detail.
The picture above gives you an idea of it, although you may need to squint a bit. The details are not particularly important in themselves for the purposes of this example, as it is just to give you an idea about how normalization through column dependency looks like.
At the top of the picture, you can see the chain of six column dependency groups, which represents the grains of the employee by manager hierarchy. Below that are the country and branch levels of the employee by region hierarchy. Both converge at the leaf level column dependency group of individual employees, which is that big thing on the right of the picture.
If you are familiar with the Framework Manager (FM) great outdoors warehouse sample model you will remember that there are two model query subjects for Employee, one modelling the employee by manager hierarchy and the other modelling the employee by region hierarchy. With column dependency you can model both in one query subject.
You will notice that all the tables except for Employee are hidden. This is because those tables are used in the view and subsequently do not need to be exposed. The hide from users setting will hide those query subjects in reporting applications. In a data module which uses another data module where some objects are hidden they will be visible, albeit with the hide from user indication. The presence of these objects does not affect the query generation.
If you are using shortcuts, only the shortcuts need to be in the consuming data module, as shown in the picture below. In it you will see the employee data module in the sources pane on the left. You will see the Employee view and below it, the hidden Employee sources tables folder. In the middle is the metadata tree of the Sales data module. I have only included Employee and, because it is a view, it is included in Sales as a shortcut.
If you use a data module in another data module then the objects which you want to use are represented either as shortcuts or references. I will talk about the shortcut versus reference question latter on.
The picture below shows the Employee data module in report studio. Only Employee is visible.
An example of the use of a data module in another data module is shown in the picture in the next section, where a star schema model based on the sales fact table is shown.
If you know that a table will be used in more than one dimension as part of role playing, then it will be a good idea to edit the identifier of each instance of that table so that the role is clearly identified. Because the identifier is used in the generated Cognos SQL, this will enable you to have a clearer tracing of each instance of that table in the SQL. This is one of the Cognos proven practices.
I have performed some tests where I did not rename the identifiers of objects which were used in multiple dimensions. I did not encounter problems, but I believe that it is probably a good idea to follow that practice.
More information about identifiers is available here:
https://www.ibm.com/docs/en/cognos-analytics/12.0.0?topic=metadata-object-properties
https://community.ibm.com/community/user/businessanalytics/blogs/ian-henderson1/2020/12/21/identifiers-and-their-use
Part of the identification of fact grains will entail the definition of column dependency in the dimensions. Another part of the identification is the definition of relationships between the dimensions and facts, where the keys of the relationship will identify in which dimension grain the fact exists. This means that you would need to have some idea about fact grains before embarking on the modelling exercise but that’s a good idea in general. You can revise column dependency as you better understand what levels of detail and what hierarchies exist in the dimensions and what keys are used for relationships to the facts.
Modelling star schemas
Once the dimensions have been modelled you will be able to start modelling star schema data modules. Each star schema consists of a fact table and its associated dimensions.
A star schema could be used as the metadata for a report on its own or as part of a combined reporting application of many star schemas. A description of such an application follows in the next section.
Changes to the dimension will be automatically propagated to consuming data modules, especially for shortcuts, in most cases. Actions such as deleting tables in a dimension or renaming identifiers will result in you needing to edit data modules which consume that data module.
The process of modelling the star schema is to import the dimensions which are associated to a fact along with that fact table into a data module. The source tables need not be added.
In the star schema you would model relative time. You can also model stand-alone calculations and filters. You will be able to import them into the uber module.
There is a known defect in CA 12.0.2 when you use the same data source in a data module as that used in a data module which is also used in the data module.
The errors will be similar to this:
Error: XQE-MSR-0017 The column "SLS_SALES_FACT.SALE_TOTAL" refers to "SALE_TOTAL", which does not exist. Edit the column so it refers to an existing item.
There are two workarounds.
One work around is to import the fact table into the data module before importing the dimension data modules.
The other workaround is the following.
1. Save the data module.
2. Re-open the data module.
3. Select the table which is generating the errors and delete it from the data module.
4. Open the sources panel and drag the table back into the data module.
This defect is fixed in 12.0.3.
Many star-schema module reporting application module aka the uber module
In most cases, the reporting / dashboard data module requirements will require multiple facts. For example, some actual fact is helpful on its own but there is much more value if you can compare the actuals versus planned values.
To accomplish this, you would need to import the star schemas which you would want to have in the reporting data module. An example of one is shown in the picture below. In it, three star schemas have been imported into a data module. They are Sales, Product forecast, and Returns. The first has sales facts. The second has planned values. The third has values related to returns of products.
The facts have been arranged by me in the centre of the relationship diagram. The dimensions radiate around them.
You will notice in the diagram that there are dashed lines between the selected shortcut to the fact table, product forecast, and the shortcuts to Time order date, Employee, and Products.
They are relationship shortcuts. Relationship shortcuts are, like all other shortcuts, pointers to an object. In this case they indicate that the targets of the shortcuts have a relationship in an underlying data module.
A relationship shortcut is read only. You can see the properties of the target relationship. Break link creates a new relationship, separate from the original target relationship.
The introduction of relationship shortcuts in CA 12.0.2 was more or less the impetus to conduct the exercises which are documented here, as it, and its related query engine changes, was basically the last element missing from successfully modelling with multiple data modules.
You will notice that each star schema will have instances of dimensions which also exist in some of the other star schemas.
For example, you will notice that the product forecast folder contains, along with the product forecast fact table, shortcuts to the Employee, Products, and Time order date dimensions. You will notice that they are also in the sales folder.
This is basically the same pattern as star schema groupings in FM. This method allows report authors and consumers to know what dimensions have relationships to a particular fact table.
You may be tempted to delete duplicate dimension shortcuts rather than having star schema folders. This would prevent you from indicating to users which dimensions are associated with any particular fact. Also, you can end up with a mess as, although each shortcut points to the same object, each shortcut has its own relationship to the fact table of the data module which it belongs to.
Design considerations
Navigation paths are not carried forward into data modules which consume other data modules. It is necessary to create the navigation paths in the data modules which you wish to expose to use by your report authors and consumers.
You can import stand-alone calculations and filters from one data module to another. The stand-alone calculations and filters thus created in the consuming data modules are copies of the original objects. This means that there is no connection between the two objects. Changes to the source object will not propagate to the instance of the object in the consuming data module. This will entail maintenance overhead because changes to a calculation or filter in the star schema would need to also be done in the consuming reporting application data modules. Similarly, you can edit the expression of the consuming data module object.
Because they are independent of their source calculation or filter, calculations and filters can be modified themselves, which could result in what is seemingly the same calculation object but different expressions.
As a consequence, you need to think through where you will create stand-alone calculations and filters.
Shortcuts versus references
When a query subject from a data module is brought into another data module the query subject will be represented as either a reference or as a shortcut.
If the object is a table, then, when it is brought into another data module, the query subject will be represented as a reference. You can convert the reference to a shortcut or alias. If the object is a view, then the query subject will always be represented as a shortcut.
I think using shortcuts is to be preferable to using references. References sometimes make queries difficult, and they require baggage such as relationships. Shortcuts point back to the original object so the original object’s relationships will be picked up automatically. They are also extensible as you can add calculations to them. Consequently, I think that converting references into shortcuts would be a good idea.
Another thing you need to keep in mind is that a reference to a reference will not pick up the relationships of the reference. If you converted the reference to a reference into a shortcut to the reference, then the relationship would be picked up.
I have found that if you want to convert references into shortcuts it is better to work your way up from the bottom. For example, assume you have a data module which has a reference in it and the data module, in turn, is used in another data module. It would be better to convert the first reference and then convert the second reference.
If you use shortcuts, then much of the baggage of related tables are no longer necessary to deal with. Just the shortcut would need to be used in the consuming data module. I tend to envision this as plugging one thing into another thing. For example, if you plug a mouse or some other external device into a computer you do not need to worry (now; not true a long time ago, before plug-and-play) about making the mouse work; it just does. Similarly, if you plug a dimension-based data module into another module the thing will work with the other objects in that data module, provided that you do the modelling exercise of defining relationships between those objects and the exposed-from-another-data- module object. The fact that the dimension exists in another data module is not relevant. It is as if the dimension was in the data module.
Thus, generally you will probably be better off converting your reference to shortcuts as they make data modules more modular.
Troubleshooting
When you convert two references to shortcuts the relationship shortcut will not appear. You need to save and reopen the data module. This is a known issue.
https://jsw.ibm.com/browse/MUI-3684
I have observed that if you use query items from multiple shortcuts to the same dimension you can encounter an error complaining about something similar to the following: XQE-PLN-0108 The data item "Product Type" cannot be pushed to any fact streams of the query. I have not established what exactly causes this to happen; it usually does not happen.
It is possible to observe repetition of the summary value of a measure in multi-fact queries, as shown in the picture below. Assuming that you have defined column dependency you probably will not observe double counting in the summary itself. This has been observed to happen if the dimensions and facts do not come from the same data source connection. If you create 1.1 relationships between the instances of a dimension the queries should present the correct values.
An illustration of such a thing is here.
Converting a reference to a shortcut will not propagate to the target modules. You need to convert them there as well. As mentioned in the shortcuts versus references section I have found that a conversion process going from the bottom level data modules up works better.
If you have a filter join optimization setting to unique or range of values for a relationship where both or either of the query subjects are shortcuts, then you can encounter a query engine XQE-GEN-0018 Query Service internal error. This problem is fixed in 12.0.4.