Graphical data maps, known as message maps in IBM Integration Bus or WebSphere Message Broker, let you transform a message without writing code. They provide a visual image of the transformation, which simplifies its implementation and ongoing maintenance. While you may not need a plan for small and simple transformations, having a design strategy for building visual transformations will result in a more productive, better performing, and more maintainable outcome.
Data model considerations
A message map uses data models that you may or may not control. If you can influence the data model, these are some of the key points to consider:
Currently, Graphical Data Mapping needs a complete logical model for every field of the data, so that the Graphical Data Map editor can present the fields as the input and output logical message assembly and understand the structure, type, and other key properties of the data. The logical model can be defined by an XML schema (XSD) model for XML data, or by a DFDL schema model for text and binary data. For JSON data, or for auxiliary data in the extensible sections of the local environment, global environment, transport headers, or “xsd:any” extension points in XML models, the Graphical Mapper allows you to use the User Defined function or to reference an equivalent XSD model. Consider the following points within these message models for your maps:
- The logical model represents the hierarchical structure of the data that is used internally by IBM Integration Bus. It defines the structured layout of the data and provides data type information for the named fields. It also defines the cardinality of the elements, that is, whether a data element must always exist ("minOccurs=1"), is optional ("minOccurs=0"), or can occur once or more (as set by "maxOccurs"). Additionally, the model can specify whether an element can be set to an out-of-range value, such as “nilled”.
- Parsers, such as XMLNSC and DFDL, deal with the detail of the physical format when the data stream is passed between systems in XML or text-oriented formats respectively.
- For XML messages, the data is self-describing, meaning it carries the element names, and optionally the type, in an instance of a message.
- The XMLNSC parser can run without a model if you are not enforcing validation. However, mapping requires an XML schema model, because the schema defines the full set of possible data fields together with information about data values: their type, any value constraints, and their structure in terms of elements and attributes.
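To make the cardinality and nillable settings concrete, here is a minimal sketch that reads them from a small schema fragment. The schema and its element names (Order, OrderId, Note, Item) are hypothetical, invented only for illustration, and the Python code is not part of IBM Integration Bus; it simply applies the XML Schema defaults (minOccurs and maxOccurs default to 1, nillable defaults to false) to show what the logical model tells the mapper about each field.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment, for illustration only.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="OrderId" type="xs:string"/>
        <xs:element name="Note" type="xs:string" minOccurs="0" nillable="true"/>
        <xs:element name="Item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def describe_elements(xsd_text):
    """Return cardinality and nillable settings for each local element."""
    root = ET.fromstring(xsd_text)
    result = {}
    for sequence in root.iter(XS + "sequence"):
        for element in sequence.findall(XS + "element"):
            result[element.get("name")] = {
                # XML Schema defaults apply when an attribute is omitted.
                "minOccurs": element.get("minOccurs", "1"),
                "maxOccurs": element.get("maxOccurs", "1"),
                "nillable": element.get("nillable", "false"),
            }
    return result


for name, settings in describe_elements(SCHEMA).items():
    print(name, settings)
```

In this fragment only Note is optional and nillable, and only Item repeats, so those are the only places where a map would need to cope with absence, nil values, or repetition.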
When you define a logical model, it can often be tempting to define it with the maximum flexibility.
- You can make an element optional by setting its cardinality to "minOccurs=0", and let it repeat an unspecified number of times by setting "maxOccurs=unbounded". However, I suggest that you set the cardinality to specific values whenever possible.
- You can configure an element as nillable. However, I suggest doing this only when you know that the application will need to handle this state of an element.
When you add unused flexibility into a data model, you must consider the following:
- The graphical mapping editor will provide more advanced transforms to support those points of flexibility during development.
- The mapping node runtime must account for all possible scenarios when executing the transformation. For example, it must add logic to detect whether an optional element is actually present.
For this reason, when you can influence the data model, I encourage you to specify it according to the actual requirements rather than “just-in-case” requirements.
Designing a graphical data map
Identify the input and output to a map
Start by considering the input and output of the Mapping node(s) from which the map will be invoked.
For the message data, this means knowing the root element in the relevant message model. In IBM Integration Bus, you also need to consider what the Message Assembly will contain on input and output, and which parts you need to access in the transformation logic.
For the message data, you may also need to know which parser domain the message is under. In most cases this will be clear from the upstream and downstream nodes. You only need to specify the domain for the output message that the map will build. The output domain defaults to an appropriate setting when you select an element that defines your output data:
- Selecting from an XML schema defaults to the XMLNSC domain.
- Selecting from a DFDL schema sets the DFDL domain.
- Selecting from a Message Set implies the MRM domain.
When working with SOAP nodes, remember to determine if the Mapping node will be dealing with the whole SOAP domain message, or just the XML message from the body for a particular operation.
- An integration service operation, represented by a subflow, will provide just the relevant operation XML message, so you would select that from the XML schema model.
- However, a Mapping node wired directly from a SOAPInput node with no SOAPExtract requires the input to be set to the SOAP domain message. Then, inside the map, you would provide a mapping cast at the extension point of the SOAP domain message body, and maybe the header, to the relevant message type from your message model.
Identify additional Message Assembly data required in the map
For other parts of the Message Assembly, such as the Properties folder, Local Environment, and any transport headers, include them in the map only if you need to read from or write to them.
Note: Any part of the Message Assembly that the Mapping node does not need to update is passed on to the output unmodified, simply and efficiently, as long as it is not included in the output side of the map.
If you only need to read data from, say, the Local Environment in the map, add it only on the left-hand input side.
If you add any part of the Message Assembly to the map on the right hand output side, the mapping node will create a new instance of that header or folder and only populate it with the outcome of the transforms you place in the map targeting it. That is, you need to move all existing data that you want to preserve, in addition to providing the updated data.
If the only transform you have targeting an output section of the Message Assembly is a Move directly from its corresponding input, consider removing that section from the map and letting the Mapping node simply propagate it.
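The behaviour described in this section can be sketched with a toy model. This is not the IBM Integration Bus API: `run_map`, the dictionary layout, and the field values are all hypothetical, chosen only to show why a section placed on the output side loses anything you do not explicitly map into it.

```python
def run_map(assembly, output_builders):
    """Toy model of a Mapping node (hypothetical, not a product API).

    Sections named in output_builders are rebuilt from scratch from the
    builder's result; every other section is propagated as-is.
    """
    out = {name: content for name, content in assembly.items()
           if name not in output_builders}
    for name, build in output_builders.items():
        out[name] = build(assembly)  # new instance, only what the map sets
    return out


assembly = {
    "Properties": {"CodedCharSetId": 1208},
    "LocalEnvironment": {"Destination": "QUEUE.A", "Variables": {"retry": 2}},
    "Message": {"OrderId": "A1"},
}


def only_status(asm):
    # Sets one new value and maps nothing else across.
    return {"Variables": {"status": "mapped"}}


def preserve_and_update(asm):
    # Moves the existing content first, then adds the new value.
    env = dict(asm["LocalEnvironment"])
    env["Variables"] = {**env["Variables"], "status": "mapped"}
    return env


lossy = run_map(assembly, {"LocalEnvironment": only_status})
kept = run_map(assembly, {"LocalEnvironment": preserve_and_update})
print(lossy["LocalEnvironment"])
print(kept["LocalEnvironment"])
```

In the first call the existing Destination and retry values are gone from the output, because nothing in the map moved them. In both calls the Properties folder, which is not on the output side, is handed on untouched.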
Design for reuse
Next, consider whether you will want to reuse all or part of the transformation in other Mapping nodes, or even in other IBM products. The Mapping node must always invoke a map that deals with a message assembly, but you can put the mapping for common parts of the message data into a submap to enable reuse.
If you are looking to achieve reuse across IBM products, your submap must be a base graphical data map and not use any IIB specific functions.
Design for simplicity and improve performance by doing so
As you would with any task, break your transformation down into sections, each dealing with a complex type structure in your logical schema model, and use structural transforms such as Local map and If to provide a nested structure. This hierarchy of ‘nested’ maps can avoid clutter in large maps. It also enables you to place a condition once, at the highest applicable level. For example, if a set of child fields should be mapped only when some attribute of the parent folder has a particular value, place all of those mappings inside a single If transform rather than repeating the conditional test on each mapping. This improves performance.
Avoid mapping directly between child elements when their containing parent folder(s) are defined as optional, "minOccurs=0". Using, for example, a Local map between these parent folders gives the mapping a cleaner structure and can also improve performance, because the nested mapping is considered only if the parent is present at runtime.
Another advantage of using Local map containers is that if the need for reuse comes at a later stage, you can use an action in the map editor to refactor the Local map into a Submap.
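The effect of mapping at the parent level can be sketched outside the product. The function and field names below (`map_customer`, Address, Street, and so on) are hypothetical, and the code only mirrors the idea of a Local map on an optional parent: the presence test happens once, and none of the child mappings are evaluated when the parent is absent.

```python
def map_customer(source):
    """Map a customer record whose Address folder is optional."""
    target = {"Name": source["Name"]}

    address = source.get("Address")  # optional parent (minOccurs=0)
    if address is not None:          # tested once, at the parent level
        # Child mappings run only when the parent is present.
        target["Addr"] = {
            "Line1": address["Street"],
            "City": address["City"],
            "Zip": address["PostCode"],
        }
    return target


with_address = map_customer({
    "Name": "Ann",
    "Address": {"Street": "1 High St", "City": "Leeds", "PostCode": "LS1 1AA"},
})
without_address = map_customer({"Name": "Bob"})
print(with_address)
print(without_address)
```

Mapping Street, City, and PostCode directly instead would mean each of the three mappings has to allow for a missing parent on its own.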
When constructing the map, remember that the Graphical Data Mapper provides an Auto map facility, which can automatically create simple Move or Convert mappings between input and output elements that have some level of correlation in their names. For example, when mapping fields such as “ProductCode” to “PRODID”, the Auto map facility could build the mapping based on the common “prod” segment of the source and target element names. Structuring your mapping into nested Local maps can greatly improve the effectiveness of Auto map, since applying it at a high level might produce many false input to output match candidates. If you have organised the mappings, you can instead invoke Auto map within the focused scope of a nested mapping, which lets you set a much lower match criterion and still achieve the desired name correlation.
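To see why scope matters for name matching, here is a rough sketch. It is explicitly not the algorithm that Auto map uses, which this post does not describe; it assumes a simple stand-in rule (a case-insensitive common segment of at least a given length), and all field names other than ProductCode and PRODID are hypothetical.

```python
def common_segment(a, b):
    """Longest common substring of two names, ignoring case."""
    a, b = a.lower(), b.lower()
    best = ""
    for i in range(len(a)):
        # Only try substrings longer than the best one found so far.
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] in b:
                best = a[i:j]
            else:
                break
    return best


def candidates(sources, targets, min_len):
    """All source/target pairs sharing a segment of at least min_len."""
    return [(s, t) for s in sources for t in targets
            if len(common_segment(s, t)) >= min_len]


# Applied across a whole (hypothetical) message: several false candidates.
wide = candidates(["ProductCode", "ProducerName", "ProcessDate"],
                  ["PRODID", "PRODUCER", "PROCDT"], min_len=4)

# Applied inside a nested mapping that only sees the product fields.
narrow = candidates(["ProductCode"], ["PRODID"], min_len=4)

print(common_segment("ProductCode", "PRODID"))
print(wide)
print(narrow)
```

With the same low threshold, the wide scope pairs ProductCode with both PRODID and PRODUCER, while the narrow scope yields just the intended pair.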
Design transform connections
Remember that you can connect additional inputs to a transform, either to make an element available through its unique variable name for use in expressions, or to provide it as another input to a nested mapping. For example, if the mapping of an element A to output B depends on a condition on element C, you would first wire A to B, and then drag C and connect it to the input of the transform. Depending on the type of transform, you may be prompted to connect C as either a primary or a supplementary input. Choosing primary means that the type of transform is updated to make use of the newly added primary input; for example, a Move might become a Concat. Choosing supplementary does not change the transform, but the newly added input becomes available for use in any expression on the transform, and appears as an additional input if the transform has a nested mapping. For example, you might have a For each on a set of repeating Item elements and need to access a higher level, non-repeating “CurrencyType” element within the mapping of each “Item”. You would achieve this by connecting “CurrencyType” as a supplementary input to the For each.
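The For each case can be pictured as a loop that has access to one extra, non-repeating value. The sketch below is an analogy only, with hypothetical field names (Code, Amount, Sku, Price) alongside the Item and CurrencyType elements mentioned above; it is not how the map is stored or executed.

```python
def map_items(order):
    """Map each repeating Item, using CurrencyType as an extra input."""
    currency = order["CurrencyType"]  # supplementary input: read, not iterated
    return [
        {"Sku": item["Code"], "Price": f'{item["Amount"]} {currency}'}
        for item in order["Item"]     # primary input: drives the iteration
    ]


order = {
    "CurrencyType": "GBP",
    "Item": [
        {"Code": "A1", "Amount": "9.99"},
        {"Code": "B2", "Amount": "4.50"},
    ],
}
lines = map_items(order)
print(lines)
```

The repeating Item list still decides how many outputs are produced; CurrencyType only contributes a value to each of them.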
Some transforms, including the If and For each transforms, allow multiple output connections. For example, you might use a For each with multiple connections to the required elements within a choice group.
Also note that the choice of transforms offered by the Graphical Data Map editor depends on the connections. For example, if you wish to populate a repeating output from some non-repeating inputs, you must connect at least two inputs to a transform targeting the repeating output before the required Append transform is offered in the transform type pick list.
Design considerations for processing data
When the transformation of a data value from input to output becomes more than just a simple Move or type conversion, you can call on the full set of standard XPath 2.0 operators and functions to manipulate the data as required. There are many great quick start guides and tutorials for XPath on the web, for example the w3schools XPath functions reference. The Graphical Data Map editor offers the XPath functions as transform types in the pick list, as well as in content assist (“Ctrl-space”) when editing expressions and conditions.
When the transformation of a data value requires an arithmetic operation such as addition, subtraction, or multiplication, use a Custom XPath transform and build the required expression in the transform’s General properties tab using the XPath operators (see w3schools). Again, you can add additional input connections to the Custom XPath transform when more elements are required in the expression.
When the transformation requires it, you can also use a Custom XPath transform to apply a combination of XPath functions. For example, if output B must be set from the value of A with leading and trailing white space removed, internal runs of white space collapsed to a single space, and underscores replaced with dashes, you could use a Custom XPath transform with its XPath expression as:
fn:replace( fn:normalize-space( $A ), '_', '-' )
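To check what this expression produces for a given input, the following sketch emulates the two functions in Python. It is an approximation written for this example, not the mapper's XPath engine: `normalize_space` follows the XPath definition (strip leading and trailing white space, collapse each internal run of space, tab, carriage return, or line feed to one space), and the function names are my own.

```python
import re


def normalize_space(value):
    """Emulate fn:normalize-space: collapse white space runs, then trim."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", value)
    return collapsed.strip(" ")


def underscores_to_dashes(value):
    """Emulate fn:replace( fn:normalize-space( $A ), '_', '-' )."""
    return re.sub("_", "-", normalize_space(value))


result = underscores_to_dashes("  order_id \t 42_A\n")
print(repr(result))  # 'order-id 42-A'
```

Note that the single space between the two words survives. If every white space character has to go, a second fn:replace with a white space pattern would be needed instead of fn:normalize-space.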
In some cases you might already have utility functions that perform the required manipulation of data values. If these are in Java or ESQL, you could consider using the Custom Java or Custom ESQL transforms to call out from the map to these utility functions. However, note that using ESQL ties the map to running in IIB, and there are overheads in marshaling the data value and invoking the external ESQL function every time the function is called during the execution of the map. The overheads of calling an external Java method are lower, because the Mapping node’s dedicated execution engine itself runs the map as a compiled Java class, which benefits from JIT optimisation.
Summary
In this post I’ve covered some key aspects to consider during message map design. Look out for future posts in which I’ll cover aspects like implementing message routing using the mapping node, including producing alternate output messages and shredding an input into multiple output messages. Also, please let me know what topics you’d like to see.