Informix

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

View Only

Back to discussions

Expand all | Collapse all

Repeat or Kill a stalled operation

1. Repeat or Kill a stalled operation

Like
Jacob Salomon
Posted Sun February 23, 2020 12:33 AM

Reply
Hi Family.

My current client seems to be doing something strange in their web-based application servers. First the intuitive way I would do it at first instinct:

If DEADLCK_TIMOUT (spelling?) is set to 60 seconds, it seems perfectly legit for my own database access to be set with set lock mode to wait 5. that way, if a SELECT or DELETE is taking more than 5 seconds I would get the "unable to position" error.

A long discussion with one of the programmers indicates the app servers seem to be doing something else. Disclaimer on his behalf: It is a commercial product and AFAIK he has no access to the source code. The suspicion is that the app server is setting an internal timer (likely with setjump() and longjump() calls) as a timeout mechanism. Thus, if a DELETE takes more than that 5 seconds (or whatever they set it to) and it is not complete by then, it repeats the DELETE.

My reaction to this theory is horror: "But the lock(s) set in the process of the stalled DELETE are still in place!"

The symptom that led to said long discussion was that there are deadlock errors happening a lot. Unless some internal engine behavior has changed over the past few years, any lock lasting longer than DEADLCK_TIMOUT gets flagged as a deadlock, even when it's just a long-lasting lock.

Is it possible for a session to be locked out of a row when that the session had itself set that very lock?

My next question regards a way to avoid that possibility: If I come out of the timer and the operation is still pending, is there a way to abort that operation? I vaguely recall seeing such an option but I'm not sure if that was in an Informix manual or another. And how to address which operation to abort in an asynchronous environment that may have many pending operations? That seems to be another conundrum.

Ideas, anyone?

------------------------------
Jacob Salomon (Concise is my middle name - NOT :-)
---
Nobody goes there anymore, it's too crowded. --Attr: Yogi Berra
------------------------------

#Informix
2. RE: Repeat or Kill a stalled operation

Like
Art Kagel

IBM Champion
Posted Sun February 23, 2020 01:49 AM

Reply
Jacob:

Yes, those sessions could definitely be locking themselves out of rows they are deleting or updating by interrupting themselves on a timeout and then reissuing the operation. The solution is for the timeout handler to call sqlbreak() ESQL/C API function to stop the outstanding operation. They should also be using hte sqlbreakcallback() API function to set and capture the timeout instead of language specific or OS timers and interrupts. These are database safe ways to handle this kind of operation. (Though I agree with you that they probably should not be doing what they are doing altogether.) So, you set a timeout using sqlbreakcallback() which registers a callback function at the same time then perform the DELETE. If the timeout expires before the delete returns then the callback function is executed with a status argument value of 2 indicating a timeout. The callback function should call sqldone() to determine if the SQL operation si still active and if so then call sqlbreak() to interrupt the DELETE in the engine instead of letting it keep running. Only then can they safely issue the DELETE again. If the callback function is called with a status argument of 0 then the DELETE (or whatever) has completed without a timeout.

They absolutely should not be calling a longjump() while an SQL operation is still active (ie unless they sqlbreak() it).

Art

Art S. Kagel, President and Principal Consultant
ASK Database Management
www.askdbmgt.com

Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.

Original Message
3. RE: Repeat or Kill a stalled operation

Like
Jacob Salomon
Posted Sun February 23, 2020 12:49 PM

Reply
Hi Art.

Remembering your setup at the client where we both worked, I figured that question would spark your interest. :-)

I've looked over the pages in the ESQL/C manual on sqlbreak() and more important, sqlbreakcallback(). (I'd need to practice what I see there to really grok it!) They do not eschew (OoooH Fancy word! :-) the setjump() and longjump() the way you seem to but that's a smaller question. This follow-up may or may not be relevant but if these are multi-threaded applications, the application may have several SQL operations pending at the same time by calling sqlbreakcallback() with a 0 timeout. Like this here fellow...

Kidding aside, even if there is a separate callback function specified before each operation, when the callback function is called, how does the engine know which operation to query for the sqldone() call? Or, for that matter, which one to abort with sqlbreak()? (Never mind the difficulty of setting a separate callback function for multiple SELECTs or UPDATEs on the same table!)

(Whew! This is BIG LEAGUE stuff!)

Thanks for starting pointing me in this direction. If I do get the chance to practice it, you will have created a monster! :-)

------------------------------
Jacob Salomon
---
Nobody goes there anymore, it's too crowded. --Attr: Yogi Berra
------------------------------

Original Message
4. RE: Repeat or Kill a stalled operation

Like
Art Kagel

IBM Champion
Posted Sun February 23, 2020 02:30 PM

Reply
Multithreaded apps MUST have separate connection for each thread! Otherwise chaos!

Art

Original Message
5. RE: Repeat or Kill a stalled operation

Like
Jacob Salomon
Posted Sun February 23, 2020 05:26 PM

Reply
Super logical, Art. But that wasn't the big follow-up question. Let's forget the multi-threaded aspect of this.

Disclaimer: You know a heckuvalot more than I do at using sqlbreak(), sqlbreakcallback() and asynchronous SQL.

My point was that if one calls sqlbreakcallback() with a 0 for the timeout parameter, it calls the callback function immediately, well before the SQL is done. And that callback can jump/call all over the place. Now mad with asynchronous power, I could be doing other stuff while, say, one DELETE, one UPDATE and one SELECT are all pending.

But when I call sqldone(), and get a YES result (as opposed to -493) how would I know which of those SQL operations is indeed done?
Or would I get the -493 if ANY one is still pending?

HMMmm. The more I delve into this, I think it less likely this is the scheme in the app server. But I'd like to follow through anyway. It may make it less likely to look foolish to the developers. (As apposed to looking foolish among family here. It wouldn't be the last time but the knowledge is worth the risk.)

Essentially, I am trying to see how the authors of that app server might have programmed it and, with 20/20 hindsight, address the pitfalls of the scheme. Because something is indeed going wrong with locks under high loads.

Thanks again!

------------------------------
Jacob Salomon
---
Nobody goes there anymore, it's too crowded. --Attr: Yogi Berra
------------------------------

Original Message
6. RE: Repeat or Kill a stalled operation

Like
Art Kagel

IBM Champion
Posted Sun February 23, 2020 05:57 PM

Reply
One can always find a gun short enough to be used to shoot one's foot. Issuing another SQL in the same session is moot what the callback is intended for. Rather to allow building a service that does other stuff while it waits for an SQL to complete.

The original problem stands: that they are running the delete again without killing thed first run which is why they are getting lock puts and deadlocks.

Original Message
7. RE: Repeat or Kill a stalled operation

Like
Jacob Salomon
Posted Mon February 24, 2020 12:01 PM

Reply
Thanks much, Art.

Now I can sound more intelligent when I start probing just what's going on in the application servers.

Wish me luck! :-)

------------------------------
Jacob Salomon
---
Nobody goes there anymore, it's too crowded. --Attr: Yogi Berra
------------------------------

Original Message

Informix

Informix

Repeat or Kill a stalled operation

Jacob SalomonSun February 23, 2020 12:33 AM

Art KagelSun February 23, 2020 01:49 AM

Jacob SalomonSun February 23, 2020 12:49 PM

Art KagelSun February 23, 2020 02:30 PM

Jacob SalomonSun February 23, 2020 05:26 PM

Art KagelSun February 23, 2020 05:57 PM

Jacob SalomonMon February 24, 2020 12:01 PM

1. Repeat or Kill a stalled operation

2. RE: Repeat or Kill a stalled operation

3. RE: Repeat or Kill a stalled operation

4. RE: Repeat or Kill a stalled operation

5. RE: Repeat or Kill a stalled operation

6. RE: Repeat or Kill a stalled operation

7. RE: Repeat or Kill a stalled operation

Additional
Resources

Office

Quick Links

Informix

Informix

Repeat or Kill a stalled operation

Jacob SalomonSun February 23, 2020 12:33 AM

Art KagelSun February 23, 2020 01:49 AM

Jacob SalomonSun February 23, 2020 12:49 PM

Art KagelSun February 23, 2020 02:30 PM

Jacob SalomonSun February 23, 2020 05:26 PM

Art KagelSun February 23, 2020 05:57 PM

Jacob SalomonMon February 24, 2020 12:01 PM

1. Repeat or Kill a stalled operation

2. RE: Repeat or Kill a stalled operation

3. RE: Repeat or Kill a stalled operation

4. RE: Repeat or Kill a stalled operation

5. RE: Repeat or Kill a stalled operation

6. RE: Repeat or Kill a stalled operation

7. RE: Repeat or Kill a stalled operation

Additional Resources

Office

Quick Links

Additional
Resources