Here's a cute riddle posed to me by two onstat commands. No downed server, not
problems, just this curiosity nagger:
Owing to the messages that motivated my questions on IPL complaints I was
curious to see some of this with my own investigation. To start with, I wanted
to see a comparison between the lowest log sequence number in the server. I
also wanted to see the lowest log seq number occupied by my active transactions.
To get the lowest log seq numbers, I ran this pipeline:
$ onstat -l | tail +18 | sort +3 -4 -n | head -20
(The tail +18 is to filter out physlog information.)
Yields (redacting a few rows & columns in interest of confidentiality and space):
181455778 1071 U-B---- 3077433 330:7782403
1814557e0 1072 U-B---- 3077434 318:7833603
181455848 1073 U-B---- 3077435 320:7833603
1814558b0 1074 U-B---- 3077436 330:7833603
181455918 1075 U-B---- 3077437 318:7884803
181455980 1076 U-B---- 3077438 320:7884803
1814559e8 1077 U-B---- 3077439 330:7884803
181455a50 1078 U-B---- 3077440 318:7936003
So the lowest log sequence number - before the next wraparound overwrites it -
is 3077433. OK, now lets find the lowest log sequence number that started an
active transaction. This requires splitting up the log_begin and current_logpos
by the : and then sorting:
onstat -x | awk '$5 != "-" {print}' | sed 's/:/ /g' | sort +4 -5 -n +6 -7 -n | head -20
address flags userthread locks begin_logpos current_logpos
5bff75238 A-B-- de8e085c8 3 3079258 0x4536018 3079258 0x4536050
67eb08788 A-B-- 8862d3178 3 3079258 0x7fa3018 3079258 0x7fa3050
7bc9180e8 A-B-- 6577faa68 3 3079258 0x6c940c 3079258 0x6c9444
e0c07ae48 A-B-- 7e4eeb2e8 4 3079258 0xb992018 3079258 0xb992050
38cab1878 A-B-- 5fde478b8 20 3079259 0x529e2fc 3079259 0x529e4e8
3ea67ec48 A-B-- cf713a6f8 4 3079259 0x519a738 3079259 0x519a770
So, the lowest log seq number with a BEGIN record of an open transaction is
3079258. Gee, how far is that from the lowest sequence number of the logs?
Well, 3079258 - 3077433 == 1825
Huh? Now we happen to have over 1600 log files. But that 1825 sequence number
difference is quite over the number of log files we have. This should not be
possible.
Interesting: I did it again the next morning. Different log numbers, of course,
but the difference is still 1825. That is a clue of some sort but I seem to
have left my pipe and deerstalker hat at the Diogenes Club. (For Sherlock Holms
fans. :-)
Now, let's see what I get with the SMI table a few weeks after I tried to
post the above:
select uniqid, size, is_backed_up
from syslogs
order by uniqid
uniqid size is_backed_up
3089523 51200 1
3089524 51200 1
3089525 51200 1
3089526 51200 1
3089527 51200 1
3089528 51200 1
3089529 51200 1
OK, so the oldest current log is # 3089523. Now what about the logs with the
BEGIN WORK at the beginning of oldest current transaction?
select tx_id, tx_logbeg, tx_loguniq, tx_owner
from systrans
where tx_logbeg != 0
order by tx_logbeg
tx_id tx_logbeg tx_loguniq tx_owner
1516 3091333 3091333 24643575112
4955 3091345 3091345 25797131000
2153 3091346 3091346 22306093672
1360 3091348 3091348 6528360712
5214 3091349 3091349 8450420008
2065 3091351 3091351 7499513832
1492 3091352 3091352 45083792920
3942 3091352 3091352 18111540920
3766 3091352 3091352 18111801752
754 3091353 3091353 45083717416
Now the difference is 1810, a different number than the first example but still
more than the number of logs in our server.
Have I piqued the collective curiosity of people who grok the internals of all this?
------------------------------
+-----------------------------------------------------------+
| I am pleased to report that I had no problems today. |
| I had only issues, opportunities, challenges and valuable |
| learning experiences. |
+------------------------------------------ Jacob S --------+
------------------------------