Migration of Cognos query engine primary key query generation handling to primary key-based column dependency
Note on terminology.
Brief overview of Column dependency.
Primary key column dependency group chain creation process.
Example of a single column primary key mapping to column dependency.
Example of a multi-column primary key mapping to column dependency.
Review and confirmation of primary key column dependency.
Query items with calculate after aggregation set on.
In Cognos Analytics 11.1.1, functionality called column dependency was implemented. This functionality guides the Cognos Analytics query engine in query planning.
In Cognos Analytics 11.1.6, a.k.a. R6, in addition to this existing column dependency functionality, new automatically generated column dependency groups will be created in your modules, supplementing your existing manually-modelled column dependency.
These automatically generated column dependency groups are based on the columns of the primary key of a data base table. If the source table contains a primary key and the query subject has not had column dependency defined in it, then they are generated in the query subject.
This document attempts to describe the change in behaviour. It also provides a brief, non-definitive, general overview of the functionality of column dependency.
Note on terminology
When a data source is used by a module, it is possible to choose what tables, views, and other similar data base objects of the data source to include in the module. Each of those objects are created in the module as a query subject.
For clarity, an attempt is made in this document to use the term table when referring to the data base object and query subject when referring to the object in Cognos Analytics modelling.
In Cognos Analytics 11.1.6, a.k.a. R6, column dependency groups will be automatically created based on each data base table’s primary key. This functionality supplements the existing manually-modelled column dependency functionality, which was introduced in 11.1.1.
They will be created in each query subject which is directly based on a table in data base schema.
The generation of primary key column dependency will take place if the source data base table has had a primary key defined for it in the data base.
If you have manually modelled column dependency on a query subject, then that modelling will take precedence. The manually created column dependency will be preserved and the primary key-based column dependency groups will not be created.
The column dependency chain will consist of a set of linked column dependency groups, each using one of the columns in the primary key in the table as its key.
All other columns of the table will usually be added to the leaf-level column dependency group as attributes of that group.
The primary key column dependency chain will be created even in query subjects which have had column dependency set, but then removed.
The primary key-based column dependency will be available to both existing and newly created models.
The new primary key-based column dependency is similar to behaviour during metadata import in Framework Manager (FM). There, if a source table had a primary key defined, a determinant would be created in the data base query subject created during metadata import. The columns of the primary key would be used as the key of a uniquely identified determinant. The other columns of the table which were not in the primary key would be added as attributes of the determinant.
Prior to R6, code existed in the Cognos query engine which would use the primary key information, which is loaded into the Cognos content store when a data base schema is loaded, as part of the query planning process.
This was not accessible to users, however.
It was decided to make it clearer that the primary key of a table was being used during query planning and SQL generation and surface it to you in the form of column dependency. This also now allows you to customize the primary key column dependency, which was previously hidden in the query engine.
Brief overview of Column dependency
Column dependency exists to specify the grains of detail for query subjects so that the items in a query are correctly normalized. This allows the Cognos query engine to correctly generate SQL for situations such as multi-fact grain.
Each column dependency group consists of a key and those attributes and measures which are associated with that key at the grain of detail defined by the key.
A key of a column dependency group has two properties. They are key constraint and key composition.
Key composition has two settings. They are independent and dependent.
If the key composition setting is set as independent, then the key will not need to have the keys from those column dependency groups which precedes it included in the generated SQL.
If the key composition setting is set as dependent, then the key will need to have the keys from those column dependency groups which precedes it included in the generated SQL.
Key constraint has the settings of unique and repeating. If the values in the data will appear only once in the query, then the key is unique.
As we will see in one of our examples below, if there are multiple columns in a table’s primary key then multiple column dependency groups will be created. Each column of the primary key will be placed in a column dependency group as its key.
Attributes have a property which determines SQL generation. The SQL generation property has two values, Minimum and Group by. This is useful in cases such as where the data is inconsistent, for example where the case of an attribute is not consistent, such as ‘United States’, ‘United states’, ‘united states’, ‘US’, and ‘USA’.
In future releases, two additional values, Average and Maximum have been added.
Attributes which are not measures will have Minimum, Maximum, and Group by.
Attributes which are measures will have Minimum, Maximum, and Average. This is not the aggregation of the measures. Attributes which are measures will be aggregated by the regular aggregate property value of the measure.
A primary key can consist of either one or many columns. Dimension tables and lookup tables commonly have a primary key. Fact tables commonly do not have primary keys.
Because a column dependency group only has one key, a multiple column primary key, which is also known as a segmented key, will need to be split into a set of column dependency groups. Each column dependency group will consist of one of the columns of the primary key and, in the case of the leaf level column dependency group, those columns of the table which are not part of the key and any calculation which has been created in the query subject.
It is possible to have multiple chains of column dependency groups, reflecting discrete hierarchies of attributes. For example, your employee dimension could have a column dependency group defining the managerial reporting hierarchy and a column dependency group defining the geographic locations of the employees.
When you do further modelling in your module or if you create reports or dashboards, you can use query items from a mix of query subjects which have and have not had column dependency defined.
Primary key column dependency group chain creation process
The primary key column dependency chain is created with the following mechanism.
When a module is opened, if a query subject is based on a data base table and if that table has primary keys then the query subject is examined to determine if column dependency has been created.
If the query subject has had column dependency manually defined, then the query subject will be passed over and the primary key-based column dependency will not be created for the query subject.
If it has not had column dependency manually defined, then the list of the columns in the table’s primary key is gathered.
If there is only one column in the primary key, then one column dependency group is created. It will use the column of the primary key as the key. The key will have a key constraint property of unique and a key composition property of independent. All other columns in the table will be deemed to be attributes. Attributes which are not measures will have their SQL operator property set to minimum.
If there are more than one column in the primary key, then multiple column dependency groups are created. They will be created based on the order in the data base in which the primary keys exist in the primary key definition for the table. The key constraint and key composition properties will be set to reflect each key’s position in the column dependency chain.
The examples below include an illustration of the creation of a single-column column dependency group and of the creation of a multiple-column column dependency group.
Each column dependency group will consist of only a key except for the leaf column dependency group, which is generated from the last key in the primary key. All other columns in the table will be added to it as attributes or measures.
If you have put calculations in your query subject, then an attempt will be made to decide where to put those calculations. It is possible that it will not get assigned to a column dependency group.
When a query item is not part of a column dependency group, a validation warning can be displayed. The warning will state that the calculations are not in any column dependency group and that if any are used then the query will not be able to use the column dependency definition in the SQL generation.
You will need to decide what to do. In most cases, this could entail manually adding the calculations to the leaf-level column dependency group. The section Query items with calculate after aggregation set on, which is found later in this document, discusses this further.
Example of a single column primary key mapping to column dependency
Illustration 1 shows a key of a dependency group created from a data base table where there is a primary key with only one column in the primary key. In this case, it is the primary key of the table GO_ORG_DIM in the GOSLDW sample data base.
The key has a key constraint setting of unique and a key composition setting of independent. This is because no other column value is necessary to identify a record in the table.
Example of a multi-column primary key mapping to column dependency
This example will show a primary key defined in a source data base table, how its analogue FM determinant is defined, and how the primary key’s metadata maps to the elements of a column dependency group chain.
This example uses the SLS_SALES_FACT table in the sample GOSLDW data base.
Usually a fact table does not have a primary key. The GOSLDW data base has primary keys defined on its fact tables as well as the dimension tables.
In this case, it allows for the demonstration of the mapping of a multiple-column primary key to a chain of column dependency groups.
The SLS_SALES_FACT table in the sample GOSLDW data base has a primary key with 8 keys. They are ORDER_DAY_KEY, ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY and ORDER_METHOD_KEY.
There are 10 other columns in the table. They are SALES_ORDER_KEY, SHIP_DAY_KEY, CLOSE_DAY_KEY, QUANTITY, UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, and GROSS_PROFIT.
Illustration 2 has an example of how the SLS_SALES_FACT table can appear in a data base management tool. On the left, we see that there is a primary key. On the right, we see a list of columns in the table. There is a key icon beside each of the columns which are in the primary key.
Illustration 3 shows the SLS_SALES_FACT table after it has been imported into Framework Manager.
As you can see, the 8 columns which form the primary key are created as the keys of this determinant. It has a setting of uniquely identified. The non-key columns have been added to the determinant as attributes.
Illustration 4 shows the column dependency chain which is created for SLS_SALES_FACT when GOSLDW is used in a data module.
As you can see, each column in the primary key has been added as the key to a column dependency group. The order of the dependency groups is the same as the order of the keys in the primary key. The last key in the primary key is ORDER_METHOD_KEY. This is because it happens to be the last one in the set of keys in the primary key and, as such, it is the key to the leaf-level column dependency group.
Each column dependency group is linked to the next using an arrow, indicating the flow of the column dependency chain.
The measures and attributes of the table have been added to the leaf-level column dependency group.
As you can see, in the leaf-level column dependency group, there are 3 attributes. They are SALES_ORDER_KEY, SHIP_DAY_KEY, and CLOSE_DAY_KEY. They are in the column dependency group as attributes as they are not included in the primary key. They have a SQL operator setting of minimum, which is the default.
The measures have been put here as well. In a future release, greater control over the SQL operator of measures will be provided.
The key composition setting for the leaf-level column dependency group is dependent. The key constraint setting for the leaf-level column dependency group is unique.
The key composition setting for the column dependency groups other than the leaf-level is dependent. The key constraint setting for the non-leaf-level column dependency groups is repeating.
The reason why the key composition settings are dependent is that all of the keys are necessary and thus each is dependent on the other.
The reason why the key constraint setting for the non-leaf-level column dependency groups is repeating is that the key values for these column dependency groups repeats.
Review and confirmation of primary key column dependency
When you create a module which has, as one of its data sources, a data base schema in which primary keys have been defined, you ought to examine, review, and confirm the column dependency groups which have been created. You can augment them and also create column dependency in any query subject as is appropriate.
If you have a module which was created prior to 11.1.6 and which uses as one of its data sources a data base schema in which primary keys have been defined, you ought to examine, review, and confirm the column dependency groups which have been created.
In both the cases of a newly created module and of an existing module, this serves two purposes. One is to confirm that the mapping between the primary key in the source table in the data base and the column dependency in the query subject is correct. The other is to allow you to gain some insight into the functionality of column dependency by the mapping of a data base concept which you know – the concept of a primary key – and the concept of column dependency.
These are the basic things to verify:
- Verify that the keys and their settings are correct.
- Verify that the SQL operators of your attributes are correct.
- Verify that your measures have been correctly identified.
- Verify that all query items are in a column dependency group at the appropriate grain of detail.
Validation of a module using a set of rules which attempt to detect undesirable states can be performed manually or, by default, with every action which modifies a module. Validation is also done when a module is opened. When a module is validated, several states regarding the column dependency groups and their query subjects are checked.
One test is to verify that the keys in the column dependency groups have an appropriate usage. The usage of a key should be identifier.
If that state exists, a validation error will be displayed for the query subjects where the state exists. The message will specify two possible actions to perform. The action you choose will depend on the nature of the column dependency you have specified, or, if the column dependency was auto-generated from the data base table's primary key, the nature of the primary key.
One action would be determined if the query item in question is, in fact, serving as an identifier. For example, if a column in your table is used in the primary key of the table, then it is a key. As such it ought to have a usage of identifier.
If that is the case, then you need to set the usage of the query item to be identifier. You do this by selecting the query item, viewing the properties, and changing the usage to identifier.
The other action would be determined if the query item was not actually a key. Some of the cases in which this can happen are if the query item which was the key to a column dependency group was removed from the group or the order of objects in the column dependency group was changed.
The action to be performed in that case would be to re-order the query items in the column dependency group so that the query item which is the key for the group is the first in the group.
If you select the query subject and invoke the context menu, there is an item for specify column dependency. Select it and examine the column dependency and verify that it correctly maps to the primary key.
Another test is to verify that all query items in a query subject are included in a column dependency group. This will produce a warning. If the query item which is not in a column dependency group is used in a query, then the Cognos query engine cannot know how to deal with it. The column dependency defined for the query subject is ignored. There is an exception for calculations with the aggregation setting of calculate after aggregation. See the section Query items with calculate after aggregation set on for more information. In R6, some validation of missing query items will not correctly identify that the query item is missing from column dependency if the column dependency was automatically generated. This is corrected for future releases.
When an existing data module is opened, Cognos will determine if the data sources which are used in the module use primary keys.
If a data source uses primary keys, then the relevant query subjects of the module will be upgraded with the primary key-based column dependency groups. This process is documented in the section Primary key column dependency group chain creation process, which appears earlier in this document.
The module upgrade will only be saved if you subsequently save the module.
When a report or dashboard uses a module which has at least one data source which contains primary keys, the module does not need to be manually upgraded and saved in order for the report or dashboard to use the column dependency.
Query items with calculate after aggregation set on
If a query item in a query subject has the calculate after aggregation checkbox set on, this tells the Cognos query engine to aggregate the values for any query item in the expression before performing the operations of the query item’s expression.
It would be a good practice that such expressions be created as stand alone calculations rather than as calculations in the query subject.
These query items will not be included in an automatically-generated column dependency group.
In R6, if a query item whose aggregate has been set to calculate after aggregation is included in a column dependency group, no validation error will be displayed. In future releases, an error stating that it should not be in a column dependency group will be displayed.
Thanks to Henk Cazemeir, Xiaowen He, Nigel Campbell, and Omar Khan for their helpful reviews of earlier drafts of this document.#LearnCognosAnalytics#administration
This document attempts to be accurate but errors or omissions could exist and it should not be understood to be Ex cathedra, nor should the reviewers be deemed to have given an imprimatur.