Db2

 View Only
  • 1.  Getting a list of XML Fields in an XML column

    Posted Wed October 11, 2023 04:42 PM

    Hi,

    I started a new job a couple months back. The company has a number of tables (<10) that have an XML column in them that people have been using for a variety of reasons. What they want to do is identify all of the fields in each row (per table) and count how many times those fields appear in the XML column in the table - basically doing an inventory and count of the XML fields in each table. Each row can have different fields in the column.

    The end goal is to get a list of all of the fields (and number of times they exist) per table, and determine which ones are valid. Then add those fields as columns to the table, populate the new columns and remove the XML column completely.

    The part I need help on is the Inventory and Counting of the individual fields. Does anyone have a process or method that does this?



    ------------------------------
    Douglas Kostelnik
    ------------------------------


  • 2.  RE: Getting a list of XML Fields in an XML column

    IBM Champion
    Posted Thu October 12, 2023 02:52 AM

    Hi Douglas,

    looks like you have new management not knowing their data nor the history of it. ;-)

    Theoretically you should have an XML DTD and have this assigned to the XML column to be able to verify the XML structure while inserting data. See https://www.ibm.com/docs/en/db2/11.5?topic=data-xml-parsing

    Now you seem to have a mess of structured text in the XML column and the database admin shall generate a report for management. What info from the XML element does your management want? Let them have a look at https://www.w3schools.com/xml/xml_elements.asp and let them define their goal. A customer of mine used a CLOB to store a free text form in rougly XML syntax in a table for additional agreements in an insurance policy. This is not solvable by Db2 alone nor would the output of such a report be helpful.

    The closest solution to your problem could export the XML into text and do some text extraction/search to find tag names in angle brackets < > and dump them, then do a sort and uniq -c to do the calculation.

    I'm not aware, that Db2 is storing the tag names of the deconstructed XML document in the XML storage object https://www.ibm.com/docs/en/db2/11.5?topic=storage-xml-object in a form accessible by SQL. 
    if you would have indexes on XML data, you could find some paths in syscat.indexes.

    Also annotated schema decomposition https://www.ibm.com/docs/en/db2/11.5?topic=purexml-annotated-xml-schema-decomposition could only help you, if you have a DTD, which probably management doesn't have.

    Good luck!



    ------------------------------
    Roland Schock
    IBM Champion and IBM Gold Consultant
    ------------------------------



  • 3.  RE: Getting a list of XML Fields in an XML column

    IBM Champion
    Posted Thu October 12, 2023 08:18 AM

    Douglas,

    The key to doing this are a set of XML functions which return the names of nodes in an XML document.

    Here's an example of what you can do, based on an example in the pureXML cookbook -

    SELECT T.*
    FROM mytable
    XMLTABLE('$XMLCOL//(*, @*)'
    COLUMNS 

    seq FOR ORDINALITY,
    node VARCHAR(20) PATH 'local-name(.)', 

    type VARCHAR(15) PATH
    'if (self::attribute())
    then "Attribute"
    else (if (./*)
    then "Element"
    else "Leaf-Element")'
    ) AS T
    ;

    The local-name() function is the secret sauce here.   There are a bunch of functions that give you details of the various pieces of the hierarchy.    I will try to find some time to write a more comprehensive example later.

    Incidentally the Db2 pureXML Cookbook is the go-to resource for all questions pureXML.



    ------------------------------
    Philip Nelson
    ------------------------------