Db2

 View Only
  • 1.  Java UDFs to compress and uncompress BLOBs:

    Posted Tue February 21, 2023 08:58 AM
    Hi,

    I'm developing some open source Java UDFs to compress and uncompress non-inlined BLOBs: https://github.com/easydataservices/db2-compress/discussions/2

    (Obviously performance may not be fantastic, but I'm targeting an archiving solution -  write once, read little.)

    A concern is this - the UDFs appear to work fine, but the CREATE FUNCTION documentation (https://www.ibm.com/docs/en/db2/11.5?topic=statements-create-function-external-scalar) says that PARAMETER STYLE JAVA "can only be specified when LANGUAGE JAVA is used, no structured data types are specified as parameters, and no CLOB, BLOB, or DBCLOB data types are specified as return types (SQLSTATE 429B8)".

    Java development pages seem to omit mentioning this, the code seems to work well, and no SQLSTATE 429B8 is returned. So I am inclined to think this documented restriction is out-of-date?

    Any insider comments? Also interested in any concerns about using sqlj.runtime.ref.DefaultContext.getDefaultContext().getConnection() inside a JDBC Java UDF to create the output Blob. Again this seems to work fine, and using a Db2 Java Blob instead of SerialBlob is more efficient.

    Jeremy Rickard



  • 2.  RE: Java UDFs to compress and uncompress BLOBs:

    Posted Sat February 25, 2023 03:43 AM

    I've raised a case with IBM to get clarification on this. Will update once I find out more.



    ------------------------------
    Jeremy Rickard
    ------------------------------



  • 3.  RE: Java UDFs to compress and uncompress BLOBs:

    Posted Sun February 26, 2023 05:56 PM

    I am impressed by the way you detailed everything. It is really going to help me a lot.



    ------------------------------
    Sophia Ava
    ------------------------------



  • 4.  RE: Java UDFs to compress and uncompress BLOBs:

    Posted Mon February 27, 2023 07:17 AM
    Edited by Jeremy Rickard Mon February 27, 2023 07:20 AM

    Thanks Sophia,

    Before using in earnest, I strongly suggest waiting until IBM answers my Case about the page that says a Java UDF returning a LOB that is not supported (though it seems to work and does not throw the SQLSTATE that page says it should). So it looks to me like this is out-of-date documentation, but best to get confirmation. Also, I would then like to do some final testing.

    Assuming all is good, there are some interesting boundary considerations when it comes to large LOBs and compression. In the case of my own immediate work requirement, it is an archiving requirement with little apparent need for speed. So I will probably use a COMPACT LOB (not generally recommended by IBM) for maximum spaces savings, and I won't having to worry about the folliowing...

    In the case of a NOT COMPACT LOB, it may be worth considering the LOB buddy space mechanism, where Db2 divides large blocks of storage in two until the smallest size that can store the LOB is reached, Say that the compression reduces a LOB's size from 31MB to 17MB... this is of no benefit. You still need a 32MB segment to store that LOB in either the original or compressed form. So the effective compression ratio is zero, and it would be better to store the uncompressed LOB. By contrast, a compression that only reduces a LOB from 17MB to 15MB will actually save 16MB, because the data can now fit into a smaller 16MB segment.

    Insert code could take these interesting properties into account when assessing whether to store data compressed or not. However, so long as your LOB data is generally getting compression ratios of at least 2x, there will nearly always be a real space saving and there is no need to worry about this relatively complicated calculation.

    I would not use the Gzip function to compress LOB data that already fits in-line uncompressed. If you want to save space for these smaller LOBs, it is best to use Db2 compression.


    ------------------------------
    Jeremy Rickard
    ------------------------------



  • 5.  RE: Java UDFs to compress and uncompress BLOBs:

    Posted Sun February 26, 2023 05:56 PM

    Very interesting and informative post thanks for sharing .



    ------------------------------
    Evelyn Luna
    ------------------------------



  • 6.  RE: Java UDFs to compress and uncompress BLOBs:

    Posted Fri March 31, 2023 11:43 PM

    OK, I promised to post an update. It took longer than I imagined but here at last...

    Firstly, the case with IBM confirmed that Java UDFs can return LOB data types. The documentation that says otherwise is out-of-date, and will be fixed later on. Thank you to Jian in the support team and the developers who looked into this.

    Secondly, I've done a fair bit of testing, and tuned buffer sizes with dramatic improvements. A couple of conclusions:
    1) If you want to work with BLOBs larger than 64MB you will need to increase JAVA_HEAP_SZ.
    2) On my laptop running Db2 on Ubuntu, an uncompressed 25MB JSON document compresses 6.8x in about 2 seconds, and uncompresses in about 1 second. Performance is now perhaps adequate for archive databases of an appropriate design (my requirement likely to be that), but I would not suggest using this in any database where response matters.
    3) Smaller LOBs containing similar data run faster, more-or-less proportional to size.

    Lastly, a first "official" release! Release 0.1, see: https://github.com/easydataservices/db2-compress

    Please read the notes in the README, and be sure to test this works well for you before deploying.



    ------------------------------
    Jeremy Rickard
    ------------------------------