Programming Languages on Power


 Using XML-INTO to obtain XML sub structure

Colin Grierson posted Thu November 07, 2024 02:44 PM

I'm working on an application which receives a variety of request types, all wrapped in a standard way:

<client>...</client><authentication>....</authentication> <request><requestType>....</requestType><requestXML>.....</requestXML></request>

I need to get the contents of the requestXML tag - which of course is an XML string. I thought this would be simple, but XML-INTO seems to return only the values of the final data elements. I want the entire XML substring within <requestXML>.....</requestXML>. Is there a simple way of doing this?

Thanks for your help

Colin Grierson 


#RPG
Paul Nicolay  Best Answer

Often you don't have control of what is passed to you (and it isn't escaped as Barbara suggests).

In that case a simple %Scan/%SubSt logic will do the job... but via the XML functions it isn't an option.
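As a rough illustration of the %Scan/%SubSt approach, here is a minimal sketch. It assumes the tag is spelled exactly <requestXML> (no attributes, same case, since %Scan is case-sensitive) and appears only once. All variable names and the sample document, including the ORDER value, are made up for the example.

```rpgle
**free
// Sketch only: extract everything between <requestXML> and </requestXML>.
// Variable names and the sample data are hypothetical.
dcl-s openTag    varchar(20) inz('<requestXML>');
dcl-s closeTag   varchar(20) inz('</requestXML>');
dcl-s xmlDoc     varchar(2000);
dcl-s requestXml varchar(2000);
dcl-s startPos   int(10);
dcl-s endPos     int(10);

xmlDoc = '<request><requestType>ORDER</requestType>'
       + '<requestXML><a>hello</a></requestXML></request>';

requestXml = '';
startPos = %scan(openTag : xmlDoc);
if startPos > 0;
   // Step past the opening tag to the first character of the content
   startPos += %len(openTag);
   if startPos <= %len(xmlDoc);
      endPos = %scan(closeTag : xmlDoc : startPos);
      if endPos > startPos;
         requestXml = %subst(xmlDoc : startPos : endPos - startPos);
      endif;
   endif;
endif;

// For the sample above, requestXml now holds '<a>hello</a>'
*inlr = *on;
return;
```

If the closing tag is missing or the element is empty, requestXml is simply left blank, so the caller can test for that.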


#RPG
Barbara Morris

The XML string inside the <requestXML> tags can't have any of the XML-syntax characters like < > & etc. They have to be encoded as &lt; &gt; &amp; etc.

If I code my program like this, where the "embedded" tag is an XML document, I get RNX0353 saying that the XML document doesn't match the RPG variable. The <a> tag is seen as a child of my <embedded> tag. 

dcl-s xmlstring varchar(100); 
dcl-s embedded varchar(20); 

xmlstring = '<embedded><a>hello</a></embedded>'; 
xml-into embedded %xml(xmlstring);
return;  


This version of my program works, because it doesn't see "<a>", it sees "&lt;a&gt;"

dcl-s xmlstring varchar(100); 
dcl-s embedded varchar(20); 

xmlstring = '<embedded>&lt;a&gt;hello&lt;/a&gt;</embedded>'; 
xml-into embedded %xml(xmlstring);
return;  

#RPG
Colin Grierson

Hi Paul
Scan and substring is where I ended up too. But it felt wrong that something so basic could not be done with the standard tools, and I wondered if I had missed something obvious.
Thanks for your help, Colin


#RPG
Jon Paris

But in the context of XML-INTO's design, it is not "something so basic".  The op-code is intended to break XML into its individual elements - it does it very well. You are effectively expecting a toaster to fry an egg for you.

There are tools available that could do this - although I doubt they would be any simpler than a couple of scans and a substring.

Are you passing this portion of the document to another process which will decompose it? I'm trying to understand why you want to extract this piece.


#RPG
Daniel Gross IBM Champion

Hi Colin,

What you plan to do is not the purpose of an XML parser. You would be able to do this with SAX, but I think it's a PITA.

I would use embedded SQL and a regular expression to extract the part of the XML that I need:

exec sql values regexp_substr(
    '<client>...</client><authentication>....</authentication> <request><requestType>....</requestType><requestXML>.....</requestXML></request>',
    '<requestXML>(.+?)<\/requestXML>',
    1,
    1,
    'ni',
    1
) into :varCharVariable;

This should do what you want. Of course, you can also replace the source string with a :HostVariable.
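For illustration, the host-variable form might look something like the sketch below. It assumes an SQLRPGLE source member and targets the <requestXML> element from the original question. The variable names and sample data are hypothetical. Since REGEXP_SUBSTR returns NULL when nothing matches, the sketch adds an indicator variable.

```rpgle
**free
// Sketch only: same regular-expression idea, with host variables.
// Variable names and the sample data are hypothetical.
dcl-s xmlDoc     varchar(2000);
dcl-s requestXml varchar(2000);
dcl-s nullInd    int(5);

xmlDoc = '<request><requestType>ORDER</requestType>'
       + '<requestXML><a>hello</a></requestXML></request>';

exec sql
   values regexp_substr(:xmlDoc,
                        '<requestXML>(.+?)</requestXML>',
                        1, 1, 'ni', 1)
     into :requestXml :nullInd;

// A negative indicator means the pattern did not match
if sqlcode <> 0 or nullInd < 0;
   requestXml = '';
endif;

*inlr = *on;
return;
```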

HTH and kind regards,

Daniel


#RPG
Jon Paris

Daniel - I don't think SAX would work, for the same reason that XML-INTO won't: it is designed to handle individual elements.

Your SQL is basically doing substring operations, and given that those are built into RPG, I'm sure it would be slower and, to me at least, it is less obvious.

If this is a process (with differing data sources) then I might be tempted to write a general-purpose parser that could be used with DATA-INTO. But straightforward substring extraction is about as easy as it gets.


#RPG
Daniel Gross IBM Champion

Jon,

Well - the last time I programmed with SAX was in Java, and that was "many moons" ago - so I'm not sure, but I thought we were able to access the raw XML stream in the SAX "event handler". As I said, it's long ago, and my memory might not serve me right.

Pattern matching with regular expressions is quite normal on any other platform - and it's also quite handy, because you're even able to check for contents inside the "substring" - and all that without having to code long %scan and %subst expressions.

You can also test and develop your regular expressions on several websites (e.g. regex101.com) and your SQL interactively - not so easy with a complex %scan and %subst expression.

Regular expressions might be somewhat unfamiliar for us (the seasoned RPG programmers) but they're mainstream for nearly everyone else. So I think of it as some sort of "future proofing" an application.

And finally to speed/performance - we all know the saying usually credited to Knuth (and sometimes to Hoare): premature optimization is the root of all evil. So I didn't make any assumptions about performance, as I don't know if it's in fact a factor.

But if performance might be a problem, it's possible to compile the regular expression and apply it multiple times using the highly optimized C library - from an RPG programmer's POV this is a huge PITA, but it's very fast at runtime.

Regards,

Daniel


#RPG