COBOL

COBOL is responsible for the efficient, reliable, secure, and unseen day-to-day operations of the world's economy.


XML Parsing with XMLSS

1. XML Parsing with XMLSS

Posted Fri February 27, 2015 03:36 AM

Hi,

In http://www-01.ibm.com/support/knowledgecenter/SS6SG3_4.1.0/com.ibm.entcobol.doc_4.1/MG/igymch2030.htm it is described that when you parse an XML document with XML PARSE and the XMLSS compiler option, "...XML-TEXT and XML-NTEXT could have a substring of the value for the CONTENT-CHARACTERS event". I know that with "COMPAT" you get this effect when you have entity references in your content. My question is whether these multiple CONTENT-CHARACTERS events with XMLSS happen only when you parse XML files "one segment at a time", as described in the COBOL Programming Guide, or whether they can also occur when you parse an XML file from one "string".

Thank you very much!

Sabine

lauterbachS
2. Re: XML Parsing with XMLSS

Posted Mon March 16, 2015 04:51 PM

Hi Sabine.

The short answer to your question is "yes." Splitting of attribute or element character content can occur when using XMLSS to parse a single unsegmented XML string, that is, without providing additional segments in response to the END-OF-INPUT XML event. That said, such splitting is quite rare in practice, because the output buffer allocated for the XMLSS parser is normally large enough compared with the XML string to be parsed.

There are two ways in which the buffer size might not be sufficient to avoid splitting. First, because there is a 64KB limit on the size of the initial buffer, splitting could occur with an XML string of several megabytes. Second, it's possible to use entity references to create a large resultant XML document from a very small input string, and thus a small allocated output buffer. The attached program splitxmp.cbl illustrates this second possibility. You can see that the entity references &b; and &c; resolve to multiple copies of the alphabet in entity a. The program is quite complicated because it also illustrates some techniques for reassembling the split content, but I think you will find it worth your time to study.

Meanwhile, Enterprise COBOL for z/OS Version 5 has introduced a facility that should make detecting splitting and then performing reassembly much easier. Quoting from the V5 summary of changes, "A new special register, XML-INFORMATION, provides a mechanism to easily determine whether the XML content delivered for an XML event is complete, or will be continued on the next event."
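As a rough illustration of that facility, here is a minimal sketch of a processing procedure that joins split element content. It assumes Enterprise COBOL V5 or later, and it assumes XML-INFORMATION is 1 when the content for the event is complete and 2 when it continues on the next event; please verify those values against the Language Reference for your compiler level. The program name, the sample document, and the names VALUE-BUF and VALUE-PTR are all made up for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLJOIN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample document; a real one would be far larger.
       01  XML-DOC          PIC X(36) VALUE
           '<msg><text>Hello, world</text></msg>'.
      * Accumulates the pieces of one logical content value.
       01  VALUE-BUF        PIC X(4096).
      * Next free position in VALUE-BUF, maintained by STRING.
       01  VALUE-PTR        PIC 9(9) BINARY VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY 'XML PARSE failed, XML-CODE=' XML-CODE
           END-XML
           GOBACK.

       HANDLE-EVENT.
           IF XML-EVENT = 'CONTENT-CHARACTERS'
      *        Append this piece after whatever arrived earlier.
               STRING XML-TEXT DELIMITED BY SIZE
                   INTO VALUE-BUF
                   WITH POINTER VALUE-PTR
                   ON OVERFLOW
                       DISPLAY 'VALUE-BUF too small for content'
               END-STRING
      *        Assumed: 1 = complete, 2 = continued on next event.
               IF XML-INFORMATION = 1 AND VALUE-PTR > 1
                   DISPLAY 'Content: ' VALUE-BUF(1:VALUE-PTR - 1)
                   MOVE 1 TO VALUE-PTR
               END-IF
           END-IF.
```

The same pattern should apply to ATTRIBUTE-CHARACTERS events. Because the buffer is only consumed when the register reports completion, unsplit content takes the same path as split content, so no separate "was it split?" branch is needed.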

Further out, the COBOL team has requested that XML System Services provide the capability to reduce or avoid splitting altogether. Stay tuned!

I hope this helps. Feel free to reply if you want further information.

Nick Tindall

NickTindall
Attachments
splitxmp.cbl
3. Re: XML Parsing with XMLSS

Posted Thu May 10, 2018 04:12 PM

Let me start by saying I'm new to these forums so I may not be using this space correctly. I couldn't see anywhere where I could post a question. I am looking for information regarding this "splitting" of attribute or content characters. In an attempt to use the zIIP processor to presumably improve our CPU usage and performance (I'm a programmer, not a systems person), we added PROCESS XMLPARSE(XMLSS) to a program that gets a huge amount of use in our system and consumes very large XML messages. It appears from poking around these forums that this is unnecessary, since XMLSS would be the default when compiling with 6.2. Is this correct?

We have come across this "splitting" which of course is giving us undesirable results. We have upgraded to COBOL 6.2 so we have the XML-INFORMATION special register available for use. I've added a check of this for each and every attribute or content character event.

Your explanation above implies that this splitting should be rare. I found 5 occurrences in 2500 messages in one test. However, we have to check always since IBM documentation uses words like "arbitrarily" and "randomly". I'm concerned that this could offset any gain using the zIIP processor. It could possibly kick us in the rear depending upon how often we are concatenating data from these splits. We can have messages that are up to 3 million bytes.

Coding can handle the splits, but are there any steps to be taken to help reduce these splits? Setting a buffer size differently? Any tweaks to the parser? Having the newest parser? You also mention that there may be a fix in the works to avoid this completely. Your post is 3 years old so I don't know if you'll see this, but I would think 3 years should have resulted in a fix, although this is still happening. I have been unable to find online any detailed explanation of how, when, and why this happens, in order to make an educated guess as to how often we'll run into it. Your explanation above is the most useful I've found. Any pointers to documentation or forum posts that would enlighten me would be much appreciated.
Retired but back
4. Re: XML Parsing with XMLSS

Posted Mon May 14, 2018 01:18 PM

@Retired but back Hi Carol,

Yes, XMLSS is the default when compiling with V6.2, as documented here.

As for whether there are any steps to be taken to help reduce splits: a record is split when there is not enough room left in the user-provided output buffer to fit the entire record. Increasing the size of the output buffer is currently the only way to avoid splits. At present, however, the runtime controls the size of this buffer, and there is no way to influence it unless you use the XML API directly. See page 43 (page 57 in the PDF) of the XML System Services User's Guide for a description of this and for a list of the record types XMLSS will split.

Also, in case you are looking for a way to join back together when a split occurs, please see here.

In the future, you can post a new question to the forum by going to the main COBOL Cafe forum page and clicking the "Start a Topic" button, which is right above the list of topics and page numbers. New replies to old topics like this one are monitored as well, so if you find that your question is related to a question already asked, grouping them together helps future readers find all the related information at once.

Regards,

Nicole
Nicole Trudeau
