Db2

Db2

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

 View Only
  • 1.  Getting a list of XML Fields in an XML column

    Posted Wed October 11, 2023 04:42 PM

    Hi,

    I started a new job a couple months back. The company has a number of tables (<10) that have an XML column in them that people have been using for a variety of reasons. What they want to do is identify all of the fields in each row (per table) and count how many times those fields appear in the XML column in the table - basically doing an inventory and count of the XML fields in each table. Each row can have different fields in the column.

    The end goal is to get a list of all of the fields (and number of times they exist) per table, and determine which ones are valid. Then add those fields as columns to the table, populate the new columns and remove the XML column completely.

    The part I need help on is the Inventory and Counting of the individual fields. Does anyone have a process or method that does this?



    ------------------------------
    Douglas Kostelnik
    ------------------------------


  • 2.  RE: Getting a list of XML Fields in an XML column

    Posted Thu October 12, 2023 02:52 AM

    Hi Douglas,

    looks like you have new management not knowing their data nor the history of it. ;-)

    Theoretically you should have an XML DTD and have this assigned to the XML column to be able to verify the XML structure while inserting data. See https://www.ibm.com/docs/en/db2/11.5?topic=data-xml-parsing

    Now you seem to have a mess of structured text in the XML column and the database admin shall generate a report for management. What info from the XML element does your management want? Let them have a look at https://www.w3schools.com/xml/xml_elements.asp and let them define their goal. A customer of mine used a CLOB to store a free text form in rougly XML syntax in a table for additional agreements in an insurance policy. This is not solvable by Db2 alone nor would the output of such a report be helpful.

    The closest solution to your problem could export the XML into text and do some text extraction/search to find tag names in angle brackets < > and dump them, then do a sort and uniq -c to do the calculation.

    I'm not aware, that Db2 is storing the tag names of the deconstructed XML document in the XML storage object https://www.ibm.com/docs/en/db2/11.5?topic=storage-xml-object in a form accessible by SQL. 
    if you would have indexes on XML data, you could find some paths in syscat.indexes.

    Also annotated schema decomposition https://www.ibm.com/docs/en/db2/11.5?topic=purexml-annotated-xml-schema-decomposition could only help you, if you have a DTD, which probably management doesn't have.

    Good luck!



    ------------------------------
    Roland Schock
    IBM Champion and IBM Gold Consultant
    ------------------------------



  • 3.  RE: Getting a list of XML Fields in an XML column

    Posted 26 days ago

    I'm Back!
    Originally, I was looking at this issue, then the other DBA took on the project. He's gone now, so it is back on me.

    The gist is...
    People used the XML columns as kind of a data dump. The columns were unregulated, so people added (and maybe subtracted) XML fields over time. I'm migrating the Db2 database to Aurora PostgreSQL, and this issue is coming up again. The management and development staff want to get rid of the xml columns and add columns to the relational tables (or create new tables) and populate it with the relevant XML values. There are about 15 tables that will need the XML to be "undone". The questions are:
    What fields are in these XML columns?
    Which fields have data in them?
    Is the data relevant?
    After determining the answers to those questions, then we have to figure out where to put the data.



    ------------------------------
    Douglas Kostelnik
    ------------------------------



  • 4.  RE: Getting a list of XML Fields in an XML column

    Posted Thu October 12, 2023 08:18 AM

    Douglas,

    The key to doing this are a set of XML functions which return the names of nodes in an XML document.

    Here's an example of what you can do, based on an example in the pureXML cookbook -

    SELECT T.*
    FROM mytable
    XMLTABLE('$XMLCOL//(*, @*)'
    COLUMNS 

    seq FOR ORDINALITY,
    node VARCHAR(20) PATH 'local-name(.)', 

    type VARCHAR(15) PATH
    'if (self::attribute())
    then "Attribute"
    else (if (./*)
    then "Element"
    else "Leaf-Element")'
    ) AS T
    ;

    The local-name() function is the secret sauce here.   There are a bunch of functions that give you details of the various pieces of the hierarchy.    I will try to find some time to write a more comprehensive example later.

    Incidentally the Db2 pureXML Cookbook is the go-to resource for all questions pureXML.



    ------------------------------
    Philip Nelson
    ------------------------------



  • 5.  RE: Getting a list of XML Fields in an XML column

    Posted 25 days ago

    Good example Philip, but there's a comma missing after the table name and before the XMLTABLE. Otherwise looks good to go

    Regards

    Mark



    ------------------------------
    Mark Gillis
    Principal Consultant
    Triton Consulting
    ------------------------------



  • 6.  RE: Getting a list of XML Fields in an XML column

    Posted 25 days ago
    Thanks for that hint. I should have seen it ��
    It's working now! Thank you!

    Doug Kostelnik
    Data Design/Admin Associate
    Email: Douglas.Kostelnik.vafh6j@statefarm.com
    Cell: 813-695-6604






  • 7.  RE: Getting a list of XML Fields in an XML column

    Posted 24 days ago

    This shouldn't work - `XMLCOL` is not passed from `mytable`.
    Below is working as-is example with `name`, `parent_name` for each table element allowing you understand the document structure.

    select v.*
    from 
      table(MON_GET_DATABASE_DETAILS (-2)) p
    --table (values  '<doc docattr="f"><a1 elattr="2019-01-01">1</a1></doc>' ) p(details)
    , XMLTABLE (
    --XMLNAMESPACES(DEFAULT 'http://www.ibm.com/xmlns/prod/db2/mon'), 
     '$D//(*, @*)' 
     PASSING XMLPARSE(DOCUMENT p.DETAILS) AS "D" COLUMNS 
      ename   VARCHAR(50) PATH 'name(.)'
    , p_ename VARCHAR(50) PATH 'name(..)'
    , type    VARCHAR(10) PATH 'if (self::element()) then (if (./*) then (''NON-LEAF'') else (''LEAF'')) else (''ATTRIBUTE'')'
    , val_str VARCHAR(50) PATH 'if (./*) then () else .'
    , val_int BIGINT      PATH 'if (. castable as xs:long) then xs:long(.) else ()'
    , val_ts  TIMESTAMP   PATH 'if (. castable as xs:dateTime) then xs:dateTime(.) else ()'
    , val_dt  DATE        PATH 'if (. castable as xs:date) then xs:date(.) else ()'
    ) v

    You get something like this:

    |ENAME              |P_ENAME     |TYPE     |VAL_STR                   |VAL_INT   |VAL_TS                    |VAL_DT|
    |-------------------|------------|---------|--------------------------|----------|--------------------------|------|
    |db2_database       |            |NON-LEAF |                          |          |                          |      |
    |release            |db2_database|ATTRIBUTE|11050700                  |11,050,700|                          |      |
    |member             |db2_database|LEAF     |0                         |0         |                          |      |
    |db_status          |db2_database|LEAF     |ACTIVE                    |          |                          |      |
    |db_activation_state|db2_database|LEAF     |EXPLICIT                  |          |                          |      |
    |db_conn_time       |db2_database|LEAF     |2025-05-29T14:33:13.443404|          |2025-05-29-14.33.13.443404|      |
    |catalog_partition  |db2_database|LEAF     |0                         |0         |                          |      |
    |last_backup        |db2_database|LEAF     |2025-01-29T13:26:06.000000|          |2025-01-29-13.26.06.000000|      |
    ...




    ------------------------------
    Mark Barinstein
    ------------------------------



  • 8.  RE: Getting a list of XML Fields in an XML column

    Posted 23 days ago

    I tweaked this for my environment, and it worked perfectly!! This is exactly what I was looking for,



    ------------------------------
    Douglas Kostelnik
    ------------------------------