Hi,
Thanks for the reply. I would be grateful for further clarifications.
Forgive my ignorance, but I think there is more to it than just the number of operations. Algorithm 1 requires 10K scans through the DB. OK… maybe 10K scans through the index data (plus, probably, accessing the document that corresponds to a selected index entry is a constant-time operation?)
There is also another thing: flexibility. How do I check whether “newelement” already exists in method 1? That is, if I want to insert this element only if an identical element doesn’t already exist, how do I incorporate that in method 1? In method 2, I see how I can do that. In fact, in method 2, I can modify the document in any number of ways with Java, because I have the complete DOM tree in hand, and once I am done with everything, I can put it back.
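To make the method 2 idea concrete, here is a minimal sketch of the "insert only if an identical element doesn't already exist" check on a DOM tree. It uses only the standard `org.w3c.dom` API and does not touch Tamino at all; fetching the document and putting it back are left out. The root element name `root2`, the sample content, and the helper names `insertIfAbsent` and `parse` are assumptions made up for illustration. "Identical" here means whatever `Node.isEqualNode` considers equal (same name, attributes, and content), which may or may not be the notion of identity I actually need.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InsertIfAbsent {

    /**
     * Appends candidate to parent unless parent already has an equal child.
     * Returns true if the element was inserted, false if a duplicate was found.
     */
    static boolean insertIfAbsent(Element parent, Element candidate) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.isEqualNode(candidate)) {
                return false; // an identical element is already there
            }
        }
        parent.appendChild(candidate);
        return true;
    }

    /** Parses an XML string into a DOM document (stand-in for fetching it from the DB). */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; "root2" is an invented name.
        Document doc = parse("<root2><key>k1</key><newelement>a</newelement></root2>");
        Element root = doc.getDocumentElement();

        Element duplicate = doc.createElement("newelement");
        duplicate.setTextContent("a");
        Element fresh = doc.createElement("newelement");
        fresh.setTextContent("b");

        System.out.println("duplicate inserted: " + insertIfAbsent(root, duplicate));
        System.out.println("fresh inserted: " + insertIfAbsent(root, fresh));
    }
}
```

The candidate must be created by (or imported into) the same `Document` as the parent, otherwise `appendChild` throws a `DOMException`.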
OK … now I will tell you about a small test that I did. I implemented a subproblem of the one I described above and checked the performance. In this test, I avoided update operations, and I avoided accessing the big DB with 4.5M documents. The “keys” in my question come from a database that contains 10K documents. I simply obtained these keys and the corresponding “candidate-new-elements-to-be-inserted-into-4.5M-db” (I call this the new-element-set from now on) from the 10K database in two ways: (0.1) by accessing the DB sequentially, and (0.2) by accessing it via an indexed field.
(OK, now I am running into a stupid notation. I want to reserve the names methods 1 and 2 for the two that I mentioned in my first message. Sorry. Here the ‘0’ in 0.x stands for the fact that this is a preliminary test!)
In each method, I saved the results in a hash map, where each “key” and its corresponding “new-element-set” become the key and value of one hash map entry. That is all I did. I elaborate on methods (0.1) and (0.2), sequential access and index-based access respectively, below:
(0.1) sequential access:
In this method, I executed the following query
“for $a in input()/root1 return {$a/key}{$a/value}”.
I then iterated through the results and saved the “key” and “value-set” pairs to the hash map.
(NOTE: each document in input()/root1 contains exactly one “key” element and zero or more “value” elements)
(0.2) index-based access:
In this method, I first executed the query:
“input()/root1/key”,
and then for each “key”, I executed the following query
“for $a in input()/root1 where $a/key = ‘’ return $a/value”,
and finally iterated through these values and saved the “key” and “value-set” pairs to the hash map.
And the result is: method (0.1) takes much less time than method (0.2). In an example run (that I did a moment ago to check again), method (0.1) took 47 ms and method (0.2) took 829 ms.
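One possible explanation for this gap (a guess, not something I have measured) is simply the number of requests: (0.1) sends one query, while (0.2) sends one query for the keys plus one query per key. The sketch below only counts requests against an in-memory stand-in. `FakeDb`, `Doc`, `sequential`, and `perKey` are invented names for illustration; no real database or Tamino API call is involved, and the sketch says nothing about how the server actually evaluates either query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AccessPatterns {

    /** Stand-in for one root1 document: exactly one key, zero or more values. */
    static final class Doc {
        final String key;
        final List<String> values;
        Doc(String key, List<String> values) { this.key = key; this.values = values; }
    }

    /** Hypothetical in-memory stand-in for the 10K database; counts requests sent to it. */
    static final class FakeDb {
        private final List<Doc> docs;
        private final Map<String, List<String>> index = new HashMap<>();
        int requests = 0;

        FakeDb(List<Doc> docs) {
            this.docs = docs;
            for (Doc d : docs) index.put(d.key, d.values); // plays the role of the key index
        }

        /** One request that returns every document (as in 0.1). */
        List<Doc> allDocs() { requests++; return docs; }

        /** One request that returns every key (first step of 0.2). */
        List<String> allKeys() {
            requests++;
            List<String> keys = new ArrayList<>();
            for (Doc d : docs) keys.add(d.key);
            return keys;
        }

        /** One indexed lookup per call (second step of 0.2). */
        List<String> valuesFor(String key) {
            requests++;
            return index.getOrDefault(key, List.of());
        }
    }

    /** (0.1) one query, then build the hash map while iterating over the result. */
    static Map<String, List<String>> sequential(FakeDb db) {
        Map<String, List<String>> result = new HashMap<>();
        for (Doc d : db.allDocs()) result.put(d.key, d.values);
        return result;
    }

    /** (0.2) one query for the keys, then one query per key. */
    static Map<String, List<String>> perKey(FakeDb db) {
        Map<String, List<String>> result = new HashMap<>();
        for (String key : db.allKeys()) result.put(key, db.valuesFor(key));
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) docs.add(new Doc("k" + i, List.of("v" + i)));

        FakeDb a = new FakeDb(docs);
        FakeDb b = new FakeDb(docs);
        Map<String, List<String>> m1 = sequential(a);
        Map<String, List<String>> m2 = perKey(b);

        System.out.println("same result: " + m1.equals(m2));
        System.out.println("requests (0.1): " + a.requests);
        System.out.println("requests (0.2): " + b.requests);
    }
}
```

Both variants end up with the same hash map; only the number of round trips differs. If each round trip carries a fixed cost, that alone could account for a large part of the difference, although the real cause may well be something else.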
But of course, this test didn’t include any update operations and didn’t access the real big DB. So I don’t deny that things may be different when I simulate my original question. I will try to do that soon, but your kind feedback and suggestions would already help greatly now.
Best regards,
Gopal.