What is Direct Dispatch?
Direct Dispatch is a feature of DFDL that allows the DFDL parser to resolve choice groups within a message more quickly. A choice group is a part of a message model where the content can be one of many different alternatives, known as choice branches.
The Direct Dispatch capability of DFDL is now available in IBM Integration Bus V10.
Previously, resolving a choice required one of the following approaches:
- Speculative Parsing. The parser tries each choice branch in turn, backing out and trying the next branch if the parser can’t make the data match that branch. This algorithm of trying and failing each branch in turn means that it takes longer to parse a message that matches a later branch in the choice.
- Asserts or Discriminators. The parser tries each choice branch in turn, but can evaluate an expression somewhere along each branch that indicates whether this branch is the correct one or not. This improves performance over ‘speculative parsing’ as the parser can usually fail quicker and reject a branch sooner, but it still requires the parser to try each branch in turn, so it still takes longer to parse a message that matches a later branch in the choice.
- Initiated Content. Sometimes each choice branch has a unique string at the start of its data. If so, the string can be modeled as a DFDL initiator. The parser tries each choice branch in turn, but only has to look at the initiator to discover the correct branch. Further, if all initiators have the same length and encoding, the parser only needs to read the initiator once and can jump straight to the choice branch.
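To make the cost of trying branches in turn concrete, here is a minimal Python sketch. It is only a conceptual model and not how the DFDL parser is implemented: the `make_branch` helper, the `BranchMismatch` exception and the two-character prefixes are all hypothetical stand-ins for "the data does or does not match this branch".

```python
# Conceptual sketch only: models resolving a choice by trying branches in turn.
# It is not the DFDL parser; all names here are hypothetical.

class BranchMismatch(Exception):
    """Raised by a branch parser when the data does not fit that branch."""


def make_branch(name, prefix):
    """Build a hypothetical branch parser that accepts data starting with `prefix`."""
    def parse(data):
        if not data.startswith(prefix):
            raise BranchMismatch(name)
        return {name: data[len(prefix):]}
    return parse


def resolve_speculatively(branches, data):
    """Try each branch in schema order; return (result, number_of_attempts)."""
    attempts = 0
    for parse in branches:
        attempts += 1
        try:
            return parse(data), attempts
        except BranchMismatch:
            continue  # back out and try the next branch
    raise BranchMismatch("no branch matched")


branches = [
    make_branch("Query", "Q:"),
    make_branch("Quote", "T:"),
    make_branch("Order", "O:"),
    make_branch("Return", "R:"),
]

first, first_attempts = resolve_speculatively(branches, "Q:status")
last, last_attempts = resolve_speculatively(branches, "R:banana")
print(first_attempts, last_attempts)  # 1 4
```

Data matching the first branch is resolved in one attempt, while data matching the fourth branch needs four, which is the "later branches take longer" effect described above.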
But what do you do if your data doesn’t have unique initiators? Is it still possible to jump straight to the right branch? What if the key to knowing which branch to choose is somewhere else earlier in the input message? What if you can work out which branch to choose based on looking at multiple parts of the input message?
This is where Direct Dispatch can be used. Direct Dispatch lets you provide a single expression which is evaluated as soon as the choice group is encountered and indicates which branch of the choice is present in the data. There is no speculative parsing, because the parser knows which branch is there. There is no need for initiators in the branches, as the expression lets you look at anything in the parts of the message that have already been parsed. You just need there to be something earlier in the message that indicates which choice branch to take.
So how does it work?
Each branch within the choice group defines a branch key using the dfdl:choiceBranchKey property. This is a string that uniquely identifies that branch.
The choice group defines a dispatch key using the dfdl:choiceDispatchKey property. This is an expression that returns a string, and that string is the key to resolving the choice group. As it is an expression, it has all of the power of XPath to look at one or more parts of the message that have already been processed, and calculate an answer based on that data. This can be as simple as just reading the value of a string in a message header, or a more complex expression that combines multiple pieces of information or makes decisions based on the presence or absence of other elements in the message.
When the DFDL parser is parsing the input message and reaches the choice group, it resolves the dfdl:choiceDispatchKey expression to get the key, then jumps straight to the choice branch with the matching dfdl:choiceBranchKey, knowing that it must be the right branch.
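The mechanism can be pictured as a keyed lookup. The Python sketch below is a conceptual model only, not the DFDL implementation: the `BRANCHES` table stands in for the branch keys, `dispatch_key` stands in for a dispatch key expression, and the combined type-and-version key is a made-up example of an expression that uses more than one piece of already-parsed data. Raising an error for an unmatched key is an assumption of this sketch.

```python
# Conceptual sketch only: a dictionary lookup models how a dispatch key
# selects a branch. All names and keys here are hypothetical.

def make_branch(name):
    """Build a hypothetical branch parser that just wraps the body text."""
    def parse(body):
        return {name: body}
    return parse


# One entry per choice branch, keyed by that branch's unique branch key.
BRANCHES = {
    "ORDER_V3": make_branch("OrderV3"),
    "ORDER_V2": make_branch("OrderV2"),
    "QUOTE_V3": make_branch("QuoteV3"),
}


def dispatch_key(parsed):
    """Model of a dispatch key expression: combines two already-parsed values."""
    major_version = parsed["Version"].split(".")[0]
    return f"{parsed['Type'].upper()}_V{major_version}"


def resolve_choice(parsed, body):
    """Evaluate the key once, then go straight to the matching branch."""
    key = dispatch_key(parsed)
    if key not in BRANCHES:
        # Assumption of this sketch: an unmatched key is treated as an error.
        raise ValueError(f"no branch key matches {key!r}")
    return key, BRANCHES[key](body)


key, result = resolve_choice({"Version": "3.0", "Type": "order"}, "ITEM:banana")
print(key, result)  # ORDER_V3 {'OrderV3': 'ITEM:banana'}
```

However many entries the table holds, only one branch parser is ever called, because the key is computed before any branch is attempted.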
Why is it useful?
Several industry standard message formats contain choice groups where the branch to take is indicated earlier in the message. EDIFACT and SWIFT, for example, are ‘envelope’ formats where the payload of the message can be one of many alternatives, and a header earlier in the message indicates which alternative is present. COBOL copybooks using the REDEFINES feature of the language often carry an indicator earlier in the message that says which REDEFINES branch to take. Direct Dispatch can be used to improve performance in all of these cases.
How do I use it?
Let’s imagine you have a text-based message format that contains a header, and that header contains comma-separated information such as the message version, a date, and meta-data about the message such as its type:
version,date,type
The message may then have some sections of generic data that are in every message:
{generic information}
The message then contains the main payload or body of the data. The type of that body is controlled by the information in the header:
{content that matches the type in the header}
An example message of type ‘ORDER’ might be something like this:
3.0,2012-07-27T21:00:00,ORDER
Customer etc information provided here
ITEM:banana,PRICE:1.44,QTY:6
The scenario is that the body of the message can be one of many choices, but we know which choice it is as we are told in the message header. The example above could have a DFDL schema like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat"
xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions">
<xsd:import namespace="http://www.ibm.com/dfdl/GeneralPurposeFormat" schemaLocation="IBMdefined/GeneralPurposeFormat.xsd"/>
<xsd:element ibmSchExtn:docRoot="true" name="Message">
<xsd:complexType>
<xsd:sequence dfdl:separator="%LF; %CR;%LF;">
<!-- The header section of the message, containing the version, timestamp and message type -->
<xsd:element name="Header">
<xsd:complexType>
<xsd:sequence dfdl:separator=",">
<xsd:element name="Version" type="xsd:string" />
<xsd:element name="Date" type="xsd:dateTime"/>
<xsd:element name="Type" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!-- A generic section of the message containing customer and other information -->
<xsd:element name="Common" type="xsd:string"/>
<!-- A choice of different bodies for different message types -->
<!-- The choice dispatch key returns the 'Type' from the header, upper-cased -->
<xsd:choice dfdl:choiceDispatchKey="{fn:upper-case(/Message/Header/Type)}">
<!-- Choice branch 1: QUERY section -->
<xsd:element dfdl:choiceBranchKey="QUERY" name="Query">
<xsd:complexType>
<xsd:sequence dfdl:separator=",">
<xsd:element name="Field" type="xsd:string" maxOccurs="5"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!-- Choice branch 2: QUOTE section -->
<xsd:element dfdl:choiceBranchKey="QUOTE" name="Quote">
<xsd:complexType>
<xsd:sequence dfdl:separator=",">
<xsd:element name="Field" type="xsd:string" maxOccurs="5"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!-- Choice branch 3: ORDER section -->
<xsd:element dfdl:choiceBranchKey="ORDER" name="Order">
<xsd:complexType>
<xsd:sequence dfdl:separator=",">
<xsd:element name="Item" dfdl:initiator="ITEM:" type="xsd:string"/>
<xsd:element name="Price" dfdl:initiator="PRICE:" type="xsd:decimal"
dfdl:textNumberPattern="#0.###" />
<xsd:element name="Qty" dfdl:initiator="QTY:" type="xsd:int"
dfdl:textNumberPattern="#0" />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<!-- Choice branch 4: RETURN section -->
<xsd:element dfdl:choiceBranchKey="RETURN" name="Return">
<xsd:complexType>
<xsd:sequence dfdl:separator=",">
<xsd:element name="Field" type="xsd:string" maxOccurs="5"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format encoding="ASCII" escapeSchemeRef="" occursCountKind="implicit" ref="fmt:GeneralPurposeFormat"/>
</xsd:appinfo>
</xsd:annotation>
</xsd:schema>
You can see how the choice uses the message ‘Type’ from the ‘Header’ to get the dispatch key. The dfdl:choiceDispatchKey property on the xsd:choice element holds the expression below:
{fn:upper-case(/Message/Header/Type)}
Each of the branches in the choice then defines a branch key, using the dfdl:choiceBranchKey property on the ‘Query’, ‘Quote’, ‘Order’ and ‘Return’ elements. The sample message has a ‘Type’ of ‘ORDER’, so it matches the ‘Order’ branch.
When DFDL parses the message and reaches the choice, the dispatch key will be calculated and the parser will go straight to the ‘Order’ branch. For our simple example, this might not seem like much of a performance gain, but now imagine a message model with a choice that contains hundreds of branches and you can see that there are advantages in knowing exactly which branch is in the message!
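As a rough illustration of that flow on the sample message, the Python sketch below mirrors the structure of the schema by hand: it splits the message into header, common section and body, upper-cases the header type to obtain the key, and calls only the matching branch parser. It is a hand-written approximation for illustration, not output from or a substitute for the DFDL parser, and the function names are hypothetical.

```python
from decimal import Decimal

# The sample ORDER message from the article, with LF line separators.
MESSAGE = (
    "3.0,2012-07-27T21:00:00,ORDER\n"
    "Customer etc information provided here\n"
    "ITEM:banana,PRICE:1.44,QTY:6"
)


def strip_initiator(text, initiator):
    """Remove a leading initiator such as 'ITEM:', failing if it is absent."""
    if not text.startswith(initiator):
        raise ValueError(f"expected initiator {initiator!r} in {text!r}")
    return text[len(initiator):]


def parse_fields(body):
    """Stand-in for the Query, Quote and Return branches: comma-separated fields."""
    return body.split(",")


def parse_order(body):
    """Stand-in for the Order branch: Item, Price and Qty with initiators."""
    item, price, qty = body.split(",")
    return {
        "Item": strip_initiator(item, "ITEM:"),
        "Price": Decimal(strip_initiator(price, "PRICE:")),
        "Qty": int(strip_initiator(qty, "QTY:")),
    }


# Branch key -> (element name, branch parser), mirroring the four branches.
BRANCHES = {
    "QUERY": ("Query", parse_fields),
    "QUOTE": ("Quote", parse_fields),
    "ORDER": ("Order", parse_order),
    "RETURN": ("Return", parse_fields),
}


def parse_message(text):
    header_line, common, body = text.splitlines()
    version, date, msg_type = header_line.split(",")
    key = msg_type.upper()        # plays the role of the dispatch key
    name, parse = BRANCHES[key]   # go straight to the matching branch
    return {
        "Header": {"Version": version, "Date": date, "Type": msg_type},
        "Common": common,
        name: parse(body),
    }


parsed = parse_message(MESSAGE)
print(parsed["Order"])  # {'Item': 'banana', 'Price': Decimal('1.44'), 'Qty': 6}
```

Only `parse_order` runs for this message; the other three branch parsers are never attempted.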