It looks like you hit the bull's eye. I found that the error code 746 with the value 139058 originates in the update trigger on the evuknor table. This hint helped me to identify the problematic record.
Děkuji moc za pomoc. :-)
Thank you for your help.
Original Message:
Sent: Wed August 07, 2024 04:01 AM
From: Tomas Zahradnik
Subject: "Bad file number" for an UPDATE command
Hi Ondrej,
are there any update triggers on 'evuknor table' which might call some other procedure(s)? Maybe the '746' (if it's an error) comes from there ...
------------------------------
-tz-
https://www.linkedin.com/in/tomzahr/
Original Message:
Sent: Tue August 06, 2024 12:58 PM
From: Ondřej Žižka
Subject: "Bad file number" for an UPDATE command
To be honest. This is a system I was assigned to recently so I don't know it in detail, but to give you a bigger picture.
The failing query is a part of cron job executed once a day.
The cron job executes command: ./dbaccess.cron <dbname> <script_name>
The command was created to bu universal, so you can change the database and the script you execute just by parameters. The content of the script is following:
. /etc/profile.d/informix.sh
export DBACCNOIGN=1
{ dbaccess -e $1 $2 2>&1;}|mail -s$1:$2 $LOGNAME
The first and second lines of the script are just the setting of the environment, the third line executes the <script_name>
against the <dbname>
and sends the result to a mailbox.
The problematic script looks like this. It is truncated to 1 command after the problematic update.
EXECUTE PROCEDURE cenimport();
EXECUTE PROCEDURE mezn_org();
EXECUTE PROCEDURE mezd_org();
-- !!! Here is the problematic update!!!
UPDATE evuknor SET preveden=1
WHERE evuknor.preveden=0 AND EXISTS (SELECT 1
FROM vestnik WHERE evuknor.vestnik=vestnik.vestnik
AND vestnik.uzaverka BETWEEN TODAY-31 AND TODAY);
-- This query works OK if I execute it separately
UPDATE evuknor SET evuknor.datzv=(CASE
WHEN evuknor.datvyd>'1991-05-01'::DATE THEN evuknor.datvyd
WHEN evuknor.datsch<'1991-12-31'::DATE THEN evuknor.datsch
WHEN evuknor.datuc IS NOT NULL THEN evuknor.datuc
WHEN evuknor.vestnik IS NOT NULL THEN (SELECT vestnik.datvyd FROM vestnik WHERE vestnik.vestnik=evuknor.vestnik)
END)
WHERE NVL(evuknor.datzv,0::DATE)!=(CASE
WHEN evuknor.datvyd>'1991-05-01'::DATE THEN evuknor.datvyd
WHEN evuknor.datsch<'1991-12-31'::DATE THEN evuknor.datsch
WHEN evuknor.datuc IS NOT NULL THEN evuknor.datuc
WHEN evuknor.vestnik IS NOT NULL THEN (SELECT vestnik.datvyd FROM vestnik WHERE vestnik.vestnik=evuknor.vestnik)
ELSE 0::DATE END);
So after all I started to see the following output in emails sending the output from this execution:
Database selected.
-----------------------
EXECUTE PROCEDURE cenimport();
(expression) 0
(expression) 0
(expression) 0
(expression)
(expression)
(expression)
(expression)
1 row(s) retrieved.
EXECUTE PROCEDURE mezn_org();
(expression) 2024-08-06 13:07:31
(expression) 2024-08-06 13:07:31
(expression) 0
(expression) 0
(expression) 168379
(expression)
(expression)
(expression)
1 row(s) retrieved.
EXECUTE PROCEDURE mezd_org();
(expression) 2024-08-06 13:07:32
(expression) 2024-08-06 13:07:32
(expression) 0
(expression) 0
(expression) 287170
(expression)
(expression)
(expression)
1 row(s) retrieved.
UPDATE evuknor SET preveden=1
WHERE evuknor.preveden=0 AND EXISTS (SELECT 1
FROM vestnik WHERE evuknor.vestnik=vestnik.vestnik
AND vestnik.uzaverka BETWEEN TODAY-31 AND TODAY);
746: 139058;
Bad file number
Error in line 9
Near character position 1
Database closed.
-----------------------
As you can see, there are procedures, but these procedures finished successfully. The 746: 139058
string comes from the UPDATE command (I think).
Apart from the things I wrote, I am still thinking about the insufficient space.
Here is an output from onstat -d
:
-------------------------
Dbspaces
address number flags fchunk nchunks pgsize flags owner name
457a2028 1 0x70001 1 1 2048 N BA informix rootdbs
47a54a58 2 0x70001 2 1 2048 N BA informix physdbs
47a54c00 3 0x60001 3 1 2048 N BA informix logsdbs
47a54da8 4 0x42001 4 1 2048 N TBA informix tmp1dbs
47a53df8 5 0x42001 5 1 2048 N TBA informix tmp2dbs
47a56028 6 0x60001 6 1 2048 N BA informix datadbs
47a561d0 7 0x60001 7 1 2048 N BA informix didxdbs
47a56378 8 0x60001 8 1 2048 N BA informix testdbs
47a56520 9 0x60001 9 1 2048 N BA informix tidxdbs
47a566c8 10 0x60011 10 1 2048 N BBA informix blobdbs
47a56870 11 0x60001 11 1 2048 N BA informix archdbs
47a56a18 12 0x42001 12 1 2048 N TBA informix tmp3dbsf
12 active, 2047 maximum
Note: For BLOB chunks, the number of free pages shown is out of date.
Run 'onstat -d update' for current stats.
Chunks
address chunk/dbs offset size free bpages flags pathname
457a21d0 1 1 0 150000 113322 PO-B-D /IDS/rootdbsf
47a57028 2 2 0 32821 0 PO-B-D /IDS/physdbsf
47a58028 3 3 0 1310773 0 PO-B-D /IDS/logsdbsf
47a59028 4 4 0 262197 262144 PO-B-- /IDS/tmp1dbsf
47a5a028 5 5 0 262197 262144 PO-B-- /IDS/tmp2dbsf
47a5b028 6 6 0 5767221 920879 PO-B-D /IDS/datadbsf
47a5c028 7 7 0 1572917 1016091 PO-B-D /IDS/didxdbsf
47a5d028 8 8 0 5767221 3502248 PO-B-D /IDS/testdbsf
47a5e028 9 9 0 1572917 1215641 PO-B-D /IDS/tidxdbsf
47a5f028 10 10 0 262197 ~198565 262197 POBB-D /IDS/blobdbsf
47a60028 11 11 0 3043986 484880 PO-B-D /IDS/archdbsf
47a61028 12 12 8 1048576 1048523 PO-B-- /IDS/tmp3dbsf
12 active, 32766 maximum
NOTE: The values in the "size" and "free" columns for DBspace chunks are
displayed in terms of "pgsize" of the DBspace to which they belong.
Expanded chunk capacity mode: always
-----------------------
Here is the related part of the instance config file.
PHYSFILE 65430
PLOG_OVERFLOW_PATH /work/logoverflow
PHYSBUFF 512
LOGFILES 320
LOGSIZE 4096
DYNAMIC_LOGS 2
LOGBUFF 256
But the update command where I see the issue is the only "data changing command" where I see that. Other updates or inserts work OK even if work with more rows than the problematic UPDATE.
I hope this explanation made it clearer. Thank you again for any help.
------------------------------
Ondřej
Original Message:
Sent: Tue August 06, 2024 07:57 AM
From: Andreas Legner
Subject: "Bad file number" for an UPDATE command
Hmm, was worth a try ...
Next: what does that "746: 139058;" line mean in your outputs?
If this is errno 746, then maybe trapping this too? But: would this mean we're in a stored procedure? And what's that second number there??
------------------------------
Andreas Legner
Original Message:
Sent: Tue August 06, 2024 07:11 AM
From: Ondřej Žižka
Subject: "Bad file number" for an UPDATE command
Hello Andreas,
Thank you for a quick comment. I run the following:
informix@norm:~/auto> onmode -I 105
informix@norm:~/auto> ./dbaccess.ecron binor daily_binor # script executing the problematic UPDATE
informix@norm:~/auto> onmode -I
In the Informix log file, I found this:
13:06:52 Verbose error trapping set, errno = 105, session_id = -1
13:08:06 Verbose error trapping stopped.
Nothing more.
Thank you.
------------------------------
Ondřej
Original Message:
Sent: Tue August 06, 2024 06:52 AM
From: Andreas Legner
Subject: "Bad file number" for an UPDATE command
Try trapping errno 105: onmode -I 105
------------------------------
Andreas Legner
Original Message:
Sent: Tue August 06, 2024 06:32 AM
From: Ondřej Žižka
Subject: "Bad file number" for an UPDATE command
Hello,
Suddenly, without changing the config, OS upgrade or so on (as far as I know) I started receiving an error for one specific SQL. The command is:
UPDATE evuknor SET preveden=1
WHERE evuknor.preveden=0 AND EXISTS (SELECT 1
FROM vestnik WHERE evuknor.vestnik=vestnik.vestnik
AND vestnik.uzaverka BETWEEN TODAY-31 AND TODAY);
746: 139058;
Bad file number
Error in line 9
Near character position 1
There are some procedure calls before this query, but they process normally.
I executed multiple infrastructure checks (https://www.ibm.com/docs/fi/informix-servers/12.10?topic=checking-verify-consistency), but did not find any issue.
I tried to run a similar command but with SELECT.
SELECT * FROM evuknor
WHERE evuknor.preveden=0 AND EXISTS (SELECT 1
FROM vestnik WHERE evuknor.vestnik=vestnik.vestnik
AND vestnik.uzaverka BETWEEN TODAY-31 AND TODAY);
There was no issue. All records were returned correctly (about 150 records).
The database info is following:
informix@norm:~> onstat -V
IBM Informix Dynamic Server Version 12.10.FC2WE Software Serial Number AAA#B000000
The documentation advises it is an OS error and I should contact Informix Technical Support, but I would like to try my best to solve it without Technical Support.
Do you please have any advice on how to troubleshoot the issue? I have not found any obvious error in OS logs.
One thing I see in the Informix log is the following:
00:54:00 Performance Advisory: Based on the current workload, the physical log might be too small to
accommodate the time it takes to flush the buffer pool.
00:54:00 Results: The server might block transactions during checkpoints.
00:54:00 Action: If transactions are blocked during the checkpoint, increase the size of the
physical log to at least 352152 KB.
May this be the issue? A small physical log? Would it produce this kind of error?
Thank you for any help or advice.
------------------------------
Ondřej
------------------------------