IBM webMethods Hybrid Integration

IBM webMethods Hybrid Integration

Join this online group to communicate across IBM product users and experts by sharing advice and best practices with peers and staying up to date regarding product enhancements.

 View Only
  • 1.  Badly performing query

    Posted Fri November 12, 2010 10:30 AM

    Hi.

    I have a query where I - among other things - try to detect duplicates.

    The dublicates have identical “AnmeldelseIdentifikator” but different tracsaction timestamps in “TransaktionsTid”.

    In each group of documents with same “AnmeldelseIdentifikator” i want to process the first and move other duplicates to another collection.

    The query works with a few documents, but with 1000 documents in the input collection fails because it takes too long to complete.

    It uses a function “tafe:getAnmeldelseIdentifikator()” for which I have created a function-index.
    It doesn’t seem to use that index, though.

    The “Kategoriser” query is somewhat complex, so here’s a small part of it, which performs just as bad:

    
    declare namespace wsa      = "http://www.w3.org/2005/08/addressing";
    declare namespace xsi      = "http://www.w3.org/2001/XMLSchema-instance";
    
    declare namespace cvr      = "http://rep.oio.dk/cvr.dk/xml/schemas/2005/03/22/";
    declare namespace kms       = "http://rep.oio.dk/kms.dk/xml/schemas/2005/03/11/";
    declare namespace cpr      = "http://rep.oio.dk/cpr.dk/xml/schemas/core/2005/03/18/";
    declare namespace xkom     = "http://rep.oio.dk/xkom.dk/xml/schemas/2005/03/15/";
    
    declare namespace etl      = "http://rep.oio.dk/tinglysning.dk/svarservice/message/integration/1/";
    declare namespace model    = "http://rep.oio.dk/tinglysning.dk/schema/model/1/";
    declare namespace snapshot = "http://rep.oio.dk/tinglysning.dk/schema/snapshot/1/";
    declare namespace msgext   = "http://rep.oio.dk/tinglysning.dk/svarservice/message/anmeldelse/1/";
    declare namespace akt      = "http://rep.oio.dk/tinglysning.dk/schema/elektroniskakt/1/";
    declare namespace msg      = "http://rep.oio.dk/tinglysning.dk/service/message/tingbog/1/";
    declare namespace msgadm   = "http://rep.oio.dk/tinglysning.dk/service/message/administration/1/";
    declare namespace anm      = "http://rep.oio.dk/tinglysning.dk/schema/anmeldelse/1/";
    
    declare namespace fase2_1  = "http://skat.dk/begrebsmodel/2009/01/15/";
    declare namespace ho       = "http://skat.dk/begrebsmodel/xml/schemas/kontekst/2007/05/31/";
    
    declare namespace aid="http://skat.dk/tafe/anmeldelseid/xml/schemas/2009/06/01/"
    declare namespace cfg = "http://skat.dk/tafe/config/xml/schemas/2009/10/13/";
    declare namespace log = "http://skat.dk/tafe/log/xml/schemas/2009/06/01/";
    declare namespace tf = "http://namespaces.softwareag.com/tamino/TaminoFunction";
    
    import module namespace tafe = "http://skat.dk/tafe";
    
    declare function local:isDublet($doc as node()) as xs:boolean {
    exists(collection('Staging2')/fase2_1:TinglysningAnmeldelseKopiSvarModtag_I
    [tafe:getAnmeldelseIdentifikator(.) = tafe:getAnmeldelseIdentifikator($doc)
    and .//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid
    < $doc//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid]
    )
    }
    
    for $doc in input()/*
    where local:isDublet($doc)
    return $doc

    I’d very much appreciate ideas on how too get this to use the index or otherwise improve performance.

    Best regards
    Claus Nielsen
    [/code]
    queries.zip (6.7 KB)
    xsd.zip (19.9 KB)


    #API-Management
    #Tamino
    #webMethods


  • 2.  RE: Badly performing query

    Posted Fri November 19, 2010 02:16 PM

    Hi Claus,
    It looks as if something in your XQUERY triggers not only one but TWO full “tablescans”.

    • and I guess only one is expected :wink:

    I tried to move the “isDublet” functionality into the main loop and it appears that the “exists()” function is the culprit!

    So perhaps somekind of “if” statement could be used instead ?

    • it is pretty hard to test without example-data …
      Finn

    #webMethods
    #Tamino
    #API-Management


  • 3.  RE: Badly performing query

    Posted Fri November 19, 2010 05:29 PM

    Hi Claus,
    as Finn already said it is difficult to assess without test data. It might be the "tafe:getAnmeldelseIdentifikator(.) = tafe:getAnmeldelseIdentifikator($doc) " predicate cannot be evaluated index-based and this leads to many more doctype scans at runtime.

    May be you can try the following in the getDublet() function:

    let $x := tafe:getAnmeldelseIdentifikator($doc)
    let $y := $doc//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid
    return (
    exists(
    collection(‘Staging2’)/fase2_1:TinglysningAnmeldelseKopiSvarModtag_I
    [tafe:getAnmeldelseIdentifikator(.) = $x
    and .//fase2_1:Kontekst/ho:HovedOplysninger/ho:TransaktionsTid < $y]
    ) )

    and avoid as many // operators as possible.

    Best Regards,

    Michael
    

    #webMethods
    #API-Management
    #Tamino