Original Message:
Sent: Tue March 12, 2024 10:41 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
The logs:
Mar 12 06:01:18 2024 227725:227732 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 59509 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 06:01:18 2024 227725:227732 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:59509
Mar 12 09:36:34 2024 259758:259762 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 60447 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 09:36:34 2024 259758:259762 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:60447
Mar 12 09:38:51 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 49835 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 09:38:51 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:49835
Mar 12 10:03:10 2024 264359:264365 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 48571 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 10:03:10 2024 264359:264365 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:48571
Mar 12 10:03:18 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 46045 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 10:03:18 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:46045
Mar 12 10:04:57 2024 18193:18193 3 10.1 mbdReConf: start
Mar 12 10:04:57 2024 18193:18193 3 10.1 mbdReConf: done
Mar 12 10:06:00 2024 264831:264836 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55477 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 10:06:00 2024 264831:264836 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55477
Mar 12 10:14:15 2024 18193:18193 3 10.1 userok: Client 10.240.112.57:44083 is not using <0/eauth> authentication
Mar 12 10:14:24 2024 18193:18193 3 10.1 mbdReConf: start
Mar 12 10:14:24 2024 18193:18193 3 10.1 mbdReConf: done
Mar 12 10:24:21 2024 267648:267651 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 54125 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 10:24:21 2024 267648:267651 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:54125
Mar 12 10:32:19 2024 18193:18193 3 10.1 mbdReConf: start
Mar 12 10:32:19 2024 18193:18193 3 10.1 mbdReConf: done
Mar 12 10:32:34 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 39891 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 12 10:32:34 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:39891
My lsf.conf:
[lsfadmin@host57 conf]$ cat lsf.conf
# This file is produced automatically by lsfconfig according to
# installation setup. Refer to "Administering IBM Spectrum LSF"
# before changing any parameters in this file.
# Any changes to the path names of LSF files must be reflected
# in this file. Make these changes with caution.
# After editing this file, run "lsadmin reconfig" and
# "badmin mbdrestart" to apply your changes.
LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib
LSB_SHAREDIR=/nfsdata/work
# Configuration directories
LSF_CONFDIR=/nfsdata/conf
LSB_CONFDIR=/nfsdata/conf/lsbatch
# Daemon log messages
LSF_LOGDIR=/nfsdata/log
LSF_LOG_MASK=LOG_WARNING
# Batch mail message handling
LSB_MAILTO=!U
# Miscellaneous
LSF_AUTH=eauth
LSB_NCPU_ENFORCE=1
# General lsfinstall variables
LSF_MANDIR=/nfsdata/10.1/man
LSF_INCLUDEDIR=/nfsdata/10.1/include
LSF_MISC=/nfsdata/10.1/misc
XLSF_APPDIR=/nfsdata/10.1/misc
LSF_ENVDIR=/nfsdata/conf
# Internal variable to distinguish Default Install
LSF_DEFAULT_INSTALL=y
# Internal variable indicating operation mode
LSB_MODE=batch
# Other variables
LSF_LIM_PORT=7869
LSF_RES_PORT=6878
LSB_MBD_PORT=6881
LSB_SBD_PORT=6882
# Enable mbd query child
LSB_QUERY_PORT=6891
LSF_DYNAMIC_HOST_WAIT_TIME=60
# WARNING: Please do not delete/modify next line!!
LSF_LINK_PATH=n
# LSF_MACHDEP and LSF_INDEP are reserved to maintain
# backward compatibility with legacy lsfsetup.
# They are not used in the new lsfinstall.
LSF_INDEP=/nfsdata
LSF_MACHDEP=/nfsdata/10.1
LSF_TOP=/nfsdata
LSF_VERSION=10.1
LSF_ENABLE_EGO=N
# LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
EGO_WORKDIR=/nfsdata/work/cluster1/ego
LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir
# Default tuning parameters
# Enable strict resource requirement syntax to select section
LSF_STRICT_RESREQ=Y
# Automatically shuts down any daemons running on hosts that attempted to
# join the cluster, but failed to communicate within the
# LSF_DYNAMIC_HOST_WAIT_TIME period.
EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
# Enable bmod to modify resource limits and location of job output files for running jobs
LSB_MOD_ALL_JOBS=Y
# Reduce pim update frequency
LSF_PIM_SLEEPTIME_UPDATE=Y
LSF_PIM_LINUX_ENHANCE=Y
LSF_UNIT_FOR_LIMITS=MB
# Do not lock lim when running exclusive jobs
LSB_DISABLE_LIMLOCK_EXCL=Y
# Display the execution host in the output of the command bsub -K
LSB_SUBK_SHOW_EXEC_HOST=Y
# Do not allow lsrun by default to encourage use of bsub
LSF_DISABLE_LSRUN=Y
# Turn off RES syncup to reduce traffic to master
LSF_RES_SYNCUP_INTERVAL=0
# Add slots information to the bjobs output
LSB_BJOBS_DISPLAY_ENH=Y
LSB_QUERY_ENH=Y
#LSF_LIC_SCHED_HOST= # License scheduler host
DAEMON_SHUTDOWN_DELAY=180
LSF_PROCESS_TRACKING=Y
LSF_LINUX_CGROUP_ACCT=Y
LSB_ENABLE_HPC_ALLOCATION=Y
LSB_BJOBS_PENDREASON_LEVEL=1
LSF_MASTER_LIST="host57.secure-ic.adds"
LSF_EGO_DAEMON_CONTROL=N
[lsfadmin@host57 conf]$
Can you help me solve this problem? Thank you!
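For anyone comparing this lsf.conf with the earlier one posted in this thread, here is a minimal Python sketch that parses two versions and reports which parameters differ. The helper names `parse_lsf_conf` and `diff_params` are hypothetical, and the parser assumes plain NAME=value lines with no inline comments after the value. The sample text contains only the three relevant lines copied from the two pasted files.

```python
def parse_lsf_conf(text):
    """Return {NAME: value} for every uncommented NAME=value line."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment
        # (for example a pasted shell prompt).
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip().strip('"')
    return params


def diff_params(old, new):
    """Return {NAME: (old_value, new_value)} for parameters that differ."""
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


# Lines taken from the lsf.conf posted on March 08 (before) and March 12 (after).
before = parse_lsf_conf("""
LSF_AUTH=eauth
LSF_ENABLE_EGO=Y
LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
""")
after = parse_lsf_conf("""
LSF_AUTH=eauth
LSF_ENABLE_EGO=N
# LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
""")

for name, (old, new) in diff_params(before, after).items():
    print(f"{name}: {old!r} -> {new!r}")
```

A value of None on the right-hand side means the parameter is no longer set (removed or commented out).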
------------------------------
roy al nabbout
Original Message:
Sent: Tue March 12, 2024 03:54 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
No, the problem occurs only on host58, and I'm sure the lsfadmin account is the same on both hosts:
[lsfadmin@host57 ~]$ id lsfadmin
uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)
[lsfadmin@host57 ~]$
[lsfadmin@host58 ~]$ id lsfadmin
uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)
[lsfadmin@host58 ~]$
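The comparison above can also be done mechanically. The following is a small Python sketch, assuming the usual Linux `id` output format shown above; `parse_id` is a hypothetical helper name, and the two strings are the outputs pasted from host57 and host58.

```python
import re


def leading_number(field):
    """Return the numeric id at the start of a field such as '1005(lsfadmin)'."""
    return int(re.match(r"\d+", field).group())


def parse_id(output):
    """Parse one line of `id` output into uid, gid and a set of group ids."""
    fields = dict(part.split("=", 1) for part in output.split())
    return {
        "uid": leading_number(fields["uid"]),
        "gid": leading_number(fields["gid"]),
        "groups": {leading_number(g) for g in fields["groups"].split(",")},
    }


host57 = parse_id("uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)")
host58 = parse_id("uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)")

print("accounts match" if host57 == host58 else "accounts differ")
```

This only confirms that the numeric ids agree; it says nothing about why eauth rejects the request.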
------------------------------
roy al nabbout
Original Message:
Sent: Mon March 11, 2024 08:45 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Do you have the same problem on host57? It seems that authentication of the lsfadmin account is failing. Are you sure the lsfadmin account settings on host58 and host57 are the same?
------------------------------
YI SUN
Original Message:
Sent: Mon March 11, 2024 04:47 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Thank you for these steps; they solved my original problem. However, bhosts, bsub, and badmin reconfig still fail with 'User permission denied'. I'm using lsfadmin on both servers, with the same uid, gid, and groups, and I have verified that it has full permissions on the directory.
[lsfadmin@host58 conf]$ lsid
IBM Spectrum LSF Community Edition 10.1.0.12, Jun 10 2021
Copyright IBM Corp. 1992, 2016. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is cluster1
My master name is host57.secure-ic.adds
[lsfadmin@host58 conf]$ lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
host57.secu X86_64 Intel_E5 12.5 16 30.6G 15.5G Yes (mg)
host58.secu X86_64 Intel_E5 12.5 16 30.6G 15.5G Yes ()
[lsfadmin@host58 conf]$ bhosts
User permission denied
[lsfadmin@host58 conf]$ badmin mbdrestart
Checking configuration files ...
No errors found.
Failed: User permission denied
[lsfadmin@host58 conf]$ badmin reconfig
Checking configuration files ...
No errors found.
Failed: User permission denied
[lsfadmin@host58 conf]$ cd ..
[lsfadmin@host58 nfsdata]$ bsub < simple.sh
User permission denied. Job not submitted.
[lsfadmin@host58 nfsdata]$ ll
total 1701784
drwxr-xr-x. 12 lsfadmin lsfadmin 4096 Mar 8 14:50 10.1
drwxrwxr-x. 3 lsfadmin lsfadmin 23 Mar 7 14:29 builds
drwxr-xr-x. 5 lsfadmin lsfadmin 4096 Mar 8 14:50 conf
-rwxr-xr-x. 1 lsfadmin lsfadmin 173 Feb 19 10:40 gedit.sh
-rwxr-xr-x. 1 lsfadmin lsfadmin 27 Feb 19 10:16 hello.sh
drwxr-xr-x. 2 lsfadmin lsfadmin 4096 Mar 11 09:21 log
-rw-r--r--. 1 lsfadmin lsfadmin 417 May 27 2016 LSF_redist.txt
drwxr-xr-x. 4 lsfadmin lsfadmin 28 Jul 19 2021 lsfsce10.2.0.12-x86_64
-rwxr-xr-x. 1 lsfadmin lsfadmin 1742579740 Jan 30 13:51 lsfsce10.2.0.12-x86_64.tar.gz
drwxr-xr-x. 5 lsfadmin lsfadmin 68 Mar 8 14:48 patch
-rw-r--r--. 1 lsfadmin lsfadmin 753 Mar 8 14:49 patch.conf
drwxr-xr-x. 3 lsfadmin lsfadmin 21 Mar 8 14:50 properties
-rw-rw-r--. 1 lsfadmin lsfadmin 0 Mar 7 14:52 simple_job_628.err
-rw-rw-r--. 1 lsfadmin lsfadmin 1564 Mar 7 14:52 simple_job_628.out
-rw-rw-r--. 1 lsfadmin lsfadmin 0 Mar 11 09:21 simple_job_836.err
-rw-rw-r--. 1 lsfadmin lsfadmin 1622 Mar 11 09:21 simple_job_836.out
-rwxr-xr-x. 1 lsfadmin lsfadmin 115 Feb 19 09:14 simple.sh
-rwxr-xr-x. 1 lsfadmin lsfadmin 230 Feb 20 14:14 testp_job.sh
-rwxr-xr-x. 1 lsfadmin lsfadmin 0 Feb 13 14:22 testroy
drwxr-xr-x. 3 lsfadmin lsfadmin 22 Feb 29 15:06 work
[lsfadmin@host58 nfsdata]$
[lsfadmin@host58 nfsdata]$ tail -n 100 /nfsdata/log/mbatchd.log.host57.secure-ic.adds
Mar 8 08:51:58 2024 17990:17990 3 10.1 ncb_openLogFile: The file </nfsdata/work/cluster1/logdir/lsb.ncb.events> must be owned by <lsfadmin>, and the file permission mode must be 644 (-rw-r--r--).
Mar 8 08:51:58 2024 17990:17990 3 10.1 ncb_initLogFile: ncb_openLogFile(/nfsdata/work/cluster1/logdir/lsb.ncb.events) failed.
Mar 8 08:51:58 2024 17990:17990 3 10.1 ncb_check: ncb_initLogFile() failed.
Mar 8 08:52:22 2024 17990:17990 3 10.1 mbdReConf: start
Mar 8 08:52:22 2024 17990:17990 3 10.1 mbdReConf: done
Mar 8 10:07:54 2024 17990:17990 3 10.1 mbdReConf: start
Mar 8 10:07:55 2024 17990:17990 3 10.1 mbdReConf: done
Mar 8 10:43:47 2024 38092:38095 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47667 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 8 10:43:47 2024 38092:38095 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47667
Mar 8 10:47:02 2024 38600:38603 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 45305 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 8 10:47:02 2024 38600:38603 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:45305
Mar 8 10:47:34 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 52871 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 8 10:47:34 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:52871
Mar 8 11:11:57 2024 42541:42545 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 58247 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 8 11:11:57 2024 42541:42545 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:58247
Mar 8 11:16:26 2024 43240:43256 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55819 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 8 11:16:26 2024 43240:43256 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55819
Mar 8 11:53:39 2024 17990:17990 3 10.1 mbdReConf: start
Mar 8 11:53:40 2024 17990:17990 3 10.1 mbdReConf: done
Mar 9 06:30:30 2024 224659:224677 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 58239 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 9 06:30:30 2024 224659:224677 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:58239
Mar 11 03:55:24 2024 643598:643604 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 51361 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 03:55:24 2024 643598:643604 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:51361
Mar 11 04:18:08 2024 648872:648877 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 57725 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:18:08 2024 648872:648877 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:57725
Mar 11 04:18:55 2024 648983:648998 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 43415 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:18:55 2024 648983:648998 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:43415
Mar 11 04:19:06 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 59477 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:19:06 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:59477
Mar 11 04:22:55 2024 651370:651375 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 42539 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:22:55 2024 651370:651375 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:42539
Mar 11 04:26:01 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47393 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:26:01 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47393
Mar 11 04:30:21 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 50605 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:30:21 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:50605
Mar 11 04:37:04 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:47177 is not using <0/eauth> authentication
Mar 11 04:37:34 2024 17990:17990 3 10.1 mbdReConf: start
Mar 11 04:37:34 2024 17990:17990 3 10.1 mbdReConf: done
Mar 11 04:37:56 2024 17990:17990 3 10.1 mbdReConf: start
Mar 11 04:37:56 2024 17990:17990 3 10.1 mbdReConf: done
Mar 11 04:38:26 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 56079 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:38:26 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:56079
Mar 11 04:38:43 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:52967 is not using <0/eauth> authentication
Mar 11 04:38:53 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:35959 is not using <0/eauth> authentication
Mar 11 04:39:07 2024 17990:17990 3 10.1 mbdReConf: start
Mar 11 04:39:07 2024 17990:17990 3 10.1 mbdReConf: done
Mar 11 04:39:25 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47915 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:39:25 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47915
Mar 11 04:43:30 2024 654742:654746 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 37575 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:43:30 2024 654742:654746 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:37575
Mar 11 04:43:44 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 38255 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:43:44 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:38255
Mar 11 04:43:52 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55477 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:43:52 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55477
Mar 11 04:44:05 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 37593 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 04:44:05 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:37593
Mar 11 05:54:09 2024 665495:665499 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 42147 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 05:54:09 2024 665495:665499 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:42147
Mar 11 05:55:32 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 56371 64 user mbatchd@cluster1 NULL NULL
> len=64 failed, rc=0
Mar 11 05:55:32 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:56371
[lsfadmin@host58 nfsdata]$
The logs show an eauth authentication error.
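To see at a glance which user and client address the failures come from, the mbatchd log can be summarised. This is a rough Python sketch, assuming the log lines keep the "userok: eauth authentication failed for user/ip:port" shape shown above; `count_eauth_failures` is a hypothetical helper, and the sample lines are copied from the log in this post.

```python
import re
from collections import Counter

# Matches e.g. "userok: eauth authentication failed for lsfadmin/10.240.112.58:47667"
FAILED = re.compile(
    r"userok: eauth authentication failed for (\S+)/(\d+\.\d+\.\d+\.\d+):\d+"
)


def count_eauth_failures(log_text):
    """Count eauth failures per (user, client IP) pair."""
    counts = Counter()
    for match in FAILED.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


sample = """\
Mar 8 10:43:47 2024 38092:38095 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47667
Mar 8 10:47:02 2024 38600:38603 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:45305
Mar 11 04:37:04 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:47177 is not using <0/eauth> authentication
"""

for (user, ip), n in count_eauth_failures(sample).items():
    print(f"{user} from {ip}: {n} failed eauth request(s)")
```

The "is not using <0/eauth> authentication" lines are a different message and are deliberately not counted here.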
-----------------------------
roy al nabbout
Original Message:
Sent: Sun March 10, 2024 09:37 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
It seems you enabled EGO during installation. Let's try the following on host58:
1) kill the lim process
2) as root, run ". profile.lsf"
3) env | grep EGO
4) unset all EGO_* environment variables listed in 3)
5) set LSF_ENABLE_EGO=N and comment out LSF_EGO_ENVDIR in lsf.conf
6) run lsadmin limstartup
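The lsf.conf change and the EGO_* lookup in the steps above can be sketched in Python as pure text operations. This is only an illustration under stated assumptions: `disable_ego` and `ego_variables` are hypothetical helper names, the sketch does not touch any file or the real environment, and the actual unsetting still has to happen in the shell that starts LIM.

```python
def disable_ego(conf_text):
    """Return lsf.conf text with LSF_ENABLE_EGO=N and LSF_EGO_ENVDIR commented out."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("LSF_ENABLE_EGO="):
            out.append("LSF_ENABLE_EGO=N")
        elif stripped.startswith("LSF_EGO_ENVDIR="):
            # Already-commented lines start with "#" and are left untouched.
            out.append("# " + stripped)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def ego_variables(environ):
    """List the names of EGO_* variables in a mapping such as os.environ."""
    return sorted(name for name in environ if name.startswith("EGO_"))


conf = (
    "LSF_ENABLE_EGO=Y\n"
    "LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel\n"
    "LSF_TOP=/nfsdata\n"
)
print(disable_ego(conf), end="")
print(ego_variables({"EGO_TOP": "/nfsdata", "LSF_ENVDIR": "/nfsdata/conf"}))
```

Running `disable_ego` on its own output leaves the text unchanged, so it is safe to apply more than once.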
------------------------------
YI SUN
Original Message:
Sent: Fri March 08, 2024 04:00 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
For information, host57 is the master host and host58 is a compute host. Here are the lsf.conf and ego.conf files:
lsf.conf:
cat lsf.conf
# This file is produced automatically by lsfconfig according to
# installation setup. Refer to "Administering IBM Spectrum LSF"
# before changing any parameters in this file.
# Any changes to the path names of LSF files must be reflected
# in this file. Make these changes with caution.
# After editing this file, run "lsadmin reconfig" and
# "badmin mbdrestart" to apply your changes.
LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib
LSB_SHAREDIR=/nfsdata/work
# Configuration directories
LSF_CONFDIR=/nfsdata/conf
LSB_CONFDIR=/nfsdata/conf/lsbatch
# Daemon log messages
LSF_LOGDIR=/nfsdata/log
LSF_LOG_MASK=LOG_WARNING
# Batch mail message handling
LSB_MAILTO=!U
# Miscellaneous
LSF_AUTH=eauth
LSB_NCPU_ENFORCE=1
# General lsfinstall variables
LSF_MANDIR=/nfsdata/10.1/man
LSF_INCLUDEDIR=/nfsdata/10.1/include
LSF_MISC=/nfsdata/10.1/misc
XLSF_APPDIR=/nfsdata/10.1/misc
LSF_ENVDIR=/nfsdata/conf
# Internal variable to distinguish Default Install
LSF_DEFAULT_INSTALL=y
# Internal variable indicating operation mode
LSB_MODE=batch
# Other variables
LSF_LIM_PORT=7869
LSF_RES_PORT=6878
LSB_MBD_PORT=6881
LSB_SBD_PORT=6882
# Enable mbd query child
LSB_QUERY_PORT=6891
LSF_DYNAMIC_HOST_WAIT_TIME=60
# WARNING: Please do not delete/modify next line!!
LSF_LINK_PATH=n
# LSF_MACHDEP and LSF_INDEP are reserved to maintain
# backward compatibility with legacy lsfsetup.
# They are not used in the new lsfinstall.
LSF_INDEP=/nfsdata
LSF_MACHDEP=/nfsdata/10.1
LSF_TOP=/nfsdata
LSF_VERSION=10.1
LSF_ENABLE_EGO=Y
LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
EGO_WORKDIR=/nfsdata/work/cluster1/ego
LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir
# Default tuning parameters
# Enable strict resource requirement syntax to select section
LSF_STRICT_RESREQ=Y
# Automatically shuts down any daemons running on hosts that attempted to
# join the cluster, but failed to communicate within the
# LSF_DYNAMIC_HOST_WAIT_TIME period.
EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
# Enable bmod to modify resource limits and location of job output files for running jobs
LSB_MOD_ALL_JOBS=Y
# Reduce pim update frequency
LSF_PIM_SLEEPTIME_UPDATE=Y
LSF_PIM_LINUX_ENHANCE=Y
LSF_UNIT_FOR_LIMITS=MB
# Do not lock lim when running exclusive jobs
LSB_DISABLE_LIMLOCK_EXCL=Y
# Display the execution host in the output of the command bsub -K
LSB_SUBK_SHOW_EXEC_HOST=Y
# Do not allow lsrun by default to encourage use of bsub
LSF_DISABLE_LSRUN=Y
# Turn off RES syncup to reduce traffic to master
LSF_RES_SYNCUP_INTERVAL=0
# Add slots information to the bjobs output
LSB_BJOBS_DISPLAY_ENH=Y
LSB_QUERY_ENH=Y
#LSF_LIC_SCHED_HOST= # License scheduler host
DAEMON_SHUTDOWN_DELAY=180
LSF_PROCESS_TRACKING=Y
LSF_LINUX_CGROUP_ACCT=Y
LSB_ENABLE_HPC_ALLOCATION=Y
LSB_BJOBS_PENDREASON_LEVEL=1
LSF_MASTER_LIST="host57.secure-ic.adds"
LSF_EGO_DAEMON_CONTROL=N
[root@host58 conf]#
ego.conf:
[root@host58 conf]# cd ego
[root@host58 ego]# ls
cluster1
[root@host58 ego]# cd cluster1/
[root@host58 cluster1]# cd kernel/
[root@host58 kernel]# cat ego.conf
# $RCSfile$Revision$Date$
# EGO kernel parameters configuration file
#
# EGO master candidate host
EGO_MASTER_LIST="host57.secure-ic.adds"
# EGO daemon port number
EGO_KD_PORT=7870
EGO_PEM_PORT=7871
# EGO service directory
EGO_ESRVDIR=/nfsdata/conf/ego/cluster1/eservice
# EGO security configuration
EGO_SEC_PLUGIN=sec_ego_default
EGO_SEC_CONF=/nfsdata/conf/ego/cluster1/kernel
# EGO event configuration
#EGO_EVENT_MASK=LOG_INFO
#EGO_EVENT_PLUGIN=eventplugin_snmp[SINK=host,MIBDIRS=/nfsdata/conf/ego/cluster1/kernel/mibs]
# Parameters related to dynamic adding/removing host
# EGO_GET_CONF=LIM
EGO_CONFDIR=/nfsdata/conf/ego/cluster1/kernel
EGO_TOP=/nfsdata
[root@host58 kernel]#
Log file:
Mar 8 08:59:01 host58 systemd-logind[1231]: New session 13 of user lsfadmin.
Mar 8 08:59:01 host58 systemd[1]: Started User runtime directory /run/user/1005.
Mar 8 08:59:01 host58 systemd[1]: Starting User Manager for UID 1005...
Mar 8 08:59:01 host58 nfsrahead[18546]: setting /home/lsfadmin readahead to 128
Mar 8 08:59:02 host58 systemd[18520]: Listening on Sound System.
Mar 8 08:59:02 host58 systemd[18520]: Reached target Paths.
Mar 8 08:59:02 host58 systemd[18520]: Started Mark boot as successful after the user session has run 2 minutes.
Mar 8 08:59:02 host58 systemd[18520]: Reached target Timers.
Mar 8 08:59:02 host58 systemd[18520]: Starting D-Bus User Message Bus Socket.
Mar 8 08:59:02 host58 systemd[18520]: Listening on Multimedia System.
Mar 8 08:59:02 host58 systemd[18520]: Listening on D-Bus User Message Bus Socket.
Mar 8 08:59:02 host58 systemd[18520]: Reached target Sockets.
Mar 8 08:59:02 host58 systemd[18520]: Reached target Basic System.
Mar 8 08:59:02 host58 systemd[1]: Started User Manager for UID 1005.
Mar 8 08:59:02 host58 systemd[18520]: Starting Sound Service...
Mar 8 08:59:02 host58 systemd[1]: Started Session 13 of user lsfadmin.
Mar 8 08:59:03 host58 systemd[18520]: Started D-Bus User Message Bus.
Mar 8 08:59:03 host58 systemd[18520]: Started Sound Service.
Mar 8 08:59:03 host58 systemd[18520]: Reached target Default.
Mar 8 08:59:03 host58 systemd[18520]: Startup finished in 1.247s.
Mar 8 09:01:05 host58 systemd[18520]: Starting Mark boot as successful...
Mar 8 09:01:05 host58 systemd[18520]: Started Mark boot as successful.
Mar 8 09:58:53 host58 lim[20111]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
Mar 8 09:58:53 host58 lim[20111]: main: LIM has exited due to a fatal error.
[root@host58 kernel]#
hosts file:
[root@host58 conf]# sudo cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.240.112.58 host58
10.240.112.57 host57
[root@host58 conf]#
------------------------------
roy al nabbout
Original Message:
Sent: Thu March 07, 2024 08:55 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Log on to host58 as root and make sure the lim process is not running.
1) source LSF_TOP/conf/profile.lsf
2) env | egrep "LSF|EGO" to list the related parameters
3) lsadmin ckconfig -v to see if it reports any errors
4) if there is no error in 3), run lsadmin limstartup
5) if there is an error in 3), append the updated lsf.conf and ego.conf here so we can take a look again
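Step 2 is just a filter over the environment. Here is a minimal Python equivalent, with a hypothetical helper `filter_env` and a made-up sample environment. The default pattern is written without spaces around the `|`, because in a pattern such as "LSF | EGO" the spaces are part of the alternatives, and a line like LSF_ENVDIR=/nfsdata/conf contains neither "LSF " nor " EGO". For a simple alternation like this, Python's `re` is assumed to behave the same way as egrep.

```python
import re


def filter_env(environ, pattern=r"LSF|EGO"):
    """Return sorted NAME=value lines whose text matches the pattern."""
    rx = re.compile(pattern)
    lines = (f"{name}={value}" for name, value in environ.items())
    return sorted(line for line in lines if rx.search(line))


# Illustrative environment; on a real host pass os.environ instead.
env = {"LSF_ENVDIR": "/nfsdata/conf", "EGO_TOP": "/nfsdata", "HOME": "/root"}

print(filter_env(env))
print(filter_env(env, r"LSF | EGO"))  # spaces become part of the pattern
```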
------------------------------
YI SUN
Original Message:
Sent: Thu March 07, 2024 09:30 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I did as you told me, but I still encounter the same problem:
[lsfadmin@host57 log]$ bhosts -l
HOST host57
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
ok 12.50 - 16 0 0 0 0 0 -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots ngpus
Total 0.0 0.0 0.0 1% 0.0 16 1 1 56G 15.5G 28.9G 16 0.0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M - 0.0
ngpus_shared ngpus_excl_t ngpus_excl_p ngpus_prohibited
Total 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0
gpu_shared_avg_ut gpu_shared_avg_mut gpu_mode0 gpu_mode1 gpu_mode2
Total 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0
gpu_mode3 gpu_mode4 gpu_mode5 gpu_mode6 gpu_mode7 gpu_temp0
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0
gpu_temp1 gpu_temp2 gpu_temp3 gpu_temp4 gpu_temp5 gpu_temp6
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0
gpu_temp7 gpu_ecc0 gpu_ecc1 gpu_ecc2 gpu_ecc3 gpu_ecc4 gpu_ecc5
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gpu_ecc6 gpu_ecc7 gpu_ut0 gpu_ut1 gpu_ut2 gpu_ut3 gpu_ut4 gpu_ut5
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gpu_ut6 gpu_ut7 gpu_mut0 gpu_mut1 gpu_mut2 gpu_mut3 gpu_mut4
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gpu_mut5 gpu_mut6 gpu_mut7 gpu_mtotal0 gpu_mtotal1 gpu_mtotal2
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0
gpu_mtotal3 gpu_mtotal4 gpu_mtotal5 gpu_mtotal6 gpu_mtotal7
Total 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0
gpu_mused0 gpu_mused1 gpu_mused2 gpu_mused3 gpu_mused4 gpu_mused5
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved 0.0 0.0 0.0 0.0 0.0 0.0
gpu_mused6 gpu_mused7 gpu_maxfactor
Total 0.0 0.0 0.0
Reserved 0.0 0.0 0.0
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
CONFIGURED AFFINITY CPU LIST: all
HOST host58
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
closed_LIM 1.00 - 1 0 0 0 0 0 -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots ngpus
Total 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 0M 1 0.0
Reserved - - - - - - - - - - - - -
ngpus_shared ngpus_excl_t ngpus_excl_p ngpus_prohibited
Total 0.0 0.0 0.0 0.0
Reserved - - - -
gpu_shared_avg_ut gpu_shared_avg_mut gpu_mode0 gpu_mode1 gpu_mode2
Total 0.0 0.0 0.0 0.0 0.0
Reserved - - - - -
gpu_mode3 gpu_mode4 gpu_mode5 gpu_mode6 gpu_mode7 gpu_temp0
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - -
gpu_temp1 gpu_temp2 gpu_temp3 gpu_temp4 gpu_temp5 gpu_temp6
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - -
gpu_temp7 gpu_ecc0 gpu_ecc1 gpu_ecc2 gpu_ecc3 gpu_ecc4 gpu_ecc5
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - - -
gpu_ecc6 gpu_ecc7 gpu_ut0 gpu_ut1 gpu_ut2 gpu_ut3 gpu_ut4 gpu_ut5
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - - - -
gpu_ut6 gpu_ut7 gpu_mut0 gpu_mut1 gpu_mut2 gpu_mut3 gpu_mut4
Total 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - - -
gpu_mut5 gpu_mut6 gpu_mut7 gpu_mtotal0 gpu_mtotal1 gpu_mtotal2
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - -
gpu_mtotal3 gpu_mtotal4 gpu_mtotal5 gpu_mtotal6 gpu_mtotal7
Total 0.0 0.0 0.0 0.0 0.0
Reserved - - - - -
gpu_mused0 gpu_mused1 gpu_mused2 gpu_mused3 gpu_mused4 gpu_mused5
Total 0.0 0.0 0.0 0.0 0.0 0.0
Reserved - - - - - -
gpu_mused6 gpu_mused7 gpu_maxfactor
Total 0.0 0.0 0.0
Reserved - - -
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
CONFIGURED AFFINITY CPU LIST: all
The LIM on host58 is always down and I cannot start it:
[lsfadmin@host57 log]$ bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host57 ok - 16 0 0 0 0 0
host58 closed - 1 0 0 0 0 0
[lsfadmin@host58 conf]$ lsadmin limstartup
Starting up LIM on <host58> ......
[lsfadmin@host58 conf]$
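When checking repeatedly whether the startup had any effect, the short bhosts table can be reduced to the hosts that are not in the ok state. This is a small Python sketch, assuming the whitespace-separated column layout shown above; `hosts_not_ok` is a hypothetical helper, and the sample is the bhosts output from this post.

```python
def hosts_not_ok(bhosts_output):
    """Map host name -> status for every host whose STATUS is not 'ok'."""
    rows = [line.split() for line in bhosts_output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    name_col = header.index("HOST_NAME")
    status_col = header.index("STATUS")
    return {row[name_col]: row[status_col] for row in data if row[status_col] != "ok"}


sample = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host57 ok - 16 0 0 0 0 0
host58 closed - 1 0 0 0 0 0
"""

print(hosts_not_ok(sample))
```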
logs:
Mar 7 15:22:49 host58 systemd-logind[1231]: Session c8 logged out. Waiting for processes to exit.
Mar 7 15:22:49 host58 systemd-logind[1231]: Removed session c8.
Mar 7 15:22:54 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:22:54 host58 systemd-logind[1231]: New session c9 of user root.
Mar 7 15:22:54 host58 systemd[1]: Started Session c9 of user root.
Mar 7 15:22:54 host58 systemd-logind[1231]: Session c9 logged out. Waiting for processes to exit.
Mar 7 15:22:54 host58 systemd[1]: session-c9.scope: Succeeded.
Mar 7 15:22:54 host58 systemd-logind[1231]: Removed session c9.
Mar 7 15:23:04 host58 systemd[1]: Stopping User Manager for UID 0...
Mar 7 15:23:04 host58 systemd[4143]: Stopped target Default.
Mar 7 15:23:04 host58 systemd[4143]: Stopped target Basic System.
Mar 7 15:23:04 host58 systemd[4143]: Stopped target Timers.
Mar 7 15:23:04 host58 systemd[4143]: Stopped target Paths.
Mar 7 15:23:04 host58 systemd[4143]: Stopped target Sockets.
Mar 7 15:23:04 host58 systemd[4143]: Closed Multimedia System.
Mar 7 15:23:04 host58 systemd[4143]: Closed D-Bus User Message Bus Socket.
Mar 7 15:23:04 host58 systemd[4143]: Reached target Shutdown.
Mar 7 15:23:04 host58 systemd[4143]: Started Exit the Session.
Mar 7 15:23:04 host58 systemd[4143]: Reached target Exit the Session.
Mar 7 15:23:04 host58 systemd[1]: user@0.service: Succeeded.
Mar 7 15:23:04 host58 systemd[1]: Stopped User Manager for UID 0.
Mar 7 15:23:04 host58 systemd[1]: Stopping User runtime directory /run/user/0...
Mar 7 15:23:04 host58 systemd[1]: run-user-0.mount: Succeeded.
Mar 7 15:23:04 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
Mar 7 15:23:04 host58 systemd[1]: Stopped User runtime directory /run/user/0.
Mar 7 15:23:04 host58 systemd[1]: Removed slice User Slice of UID 0.
Mar 7 15:23:40 host58 nfsrahead[4300]: setting /home/lsfadmin readahead to 128
Mar 7 15:24:05 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:24:31 host58 nfsrahead[5097]: setting /home/lsfadmin readahead to 128
Mar 7 15:24:42 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:24:42 host58 systemd[1]: Created slice User Slice of UID 0.
Mar 7 15:24:42 host58 systemd[1]: Starting User runtime directory /run/user/0...
Mar 7 15:24:42 host58 systemd-logind[1231]: New session c10 of user root.
Mar 7 15:24:42 host58 systemd[1]: Started User runtime directory /run/user/0.
Mar 7 15:24:42 host58 systemd[1]: Starting User Manager for UID 0...
Mar 7 15:24:42 host58 systemd[5122]: Listening on Multimedia System.
Mar 7 15:24:42 host58 systemd[5122]: Starting D-Bus User Message Bus Socket.
Mar 7 15:24:42 host58 systemd[5122]: Reached target Timers.
Mar 7 15:24:42 host58 systemd[5122]: Reached target Paths.
Mar 7 15:24:42 host58 systemd[5122]: Listening on D-Bus User Message Bus Socket.
Mar 7 15:24:42 host58 systemd[5122]: Reached target Sockets.
Mar 7 15:24:42 host58 systemd[5122]: Reached target Basic System.
Mar 7 15:24:42 host58 systemd[5122]: Reached target Default.
Mar 7 15:24:42 host58 systemd[5122]: Startup finished in 137ms.
Mar 7 15:24:42 host58 systemd[1]: Started User Manager for UID 0.
Mar 7 15:24:42 host58 systemd[1]: Started Session c10 of user root.
Mar 7 15:24:42 host58 nfsrahead[5161]: setting /nfsdata readahead to 128
Mar 7 15:24:42 host58 systemd-logind[1231]: Session c10 logged out. Waiting for processes to exit.
Mar 7 15:24:42 host58 systemd[1]: session-c10.scope: Succeeded.
Mar 7 15:24:42 host58 systemd-logind[1231]: Removed session c10.
Mar 7 15:24:51 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:24:52 host58 systemd[1]: Stopping User Manager for UID 0...
Mar 7 15:24:52 host58 systemd[5122]: Stopped target Default.
Mar 7 15:24:52 host58 systemd[5122]: Stopped target Basic System.
Mar 7 15:24:52 host58 systemd[5122]: Stopped target Timers.
Mar 7 15:24:52 host58 systemd[5122]: Stopped target Sockets.
Mar 7 15:24:52 host58 systemd[5122]: Closed Multimedia System.
Mar 7 15:24:52 host58 systemd[5122]: Closed D-Bus User Message Bus Socket.
Mar 7 15:24:52 host58 systemd[5122]: Stopped target Paths.
Mar 7 15:24:52 host58 systemd[5122]: Reached target Shutdown.
Mar 7 15:24:52 host58 systemd[5122]: Started Exit the Session.
Mar 7 15:24:52 host58 systemd[5122]: Reached target Exit the Session.
Mar 7 15:24:52 host58 systemd[1]: user@0.service: Succeeded.
Mar 7 15:24:52 host58 systemd[1]: Stopped User Manager for UID 0.
Mar 7 15:24:52 host58 systemd[1]: Stopping User runtime directory /run/user/0...
Mar 7 15:24:52 host58 systemd[1]: run-user-0.mount: Succeeded.
Mar 7 15:24:52 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
Mar 7 15:24:52 host58 systemd[1]: Stopped User runtime directory /run/user/0.
Mar 7 15:24:52 host58 systemd[1]: Removed slice User Slice of UID 0.
Mar 7 15:25:15 host58 nfsrahead[5216]: setting /home/lsfadmin readahead to 128
Mar 7 15:25:35 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:26:30 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:26:30 host58 systemd[1]: Created slice User Slice of UID 0.
Mar 7 15:26:30 host58 systemd[1]: Starting User runtime directory /run/user/0...
Mar 7 15:26:30 host58 systemd-logind[1231]: New session c11 of user root.
Mar 7 15:26:30 host58 systemd[1]: Started User runtime directory /run/user/0.
Mar 7 15:26:30 host58 systemd[1]: Starting User Manager for UID 0...
Mar 7 15:26:30 host58 systemd[5369]: Reached target Paths.
Mar 7 15:26:30 host58 systemd[5369]: Listening on Multimedia System.
Mar 7 15:26:30 host58 systemd[5369]: Reached target Timers.
Mar 7 15:26:30 host58 systemd[5369]: Starting D-Bus User Message Bus Socket.
Mar 7 15:26:30 host58 systemd[5369]: Listening on D-Bus User Message Bus Socket.
Mar 7 15:26:30 host58 systemd[5369]: Reached target Sockets.
Mar 7 15:26:30 host58 systemd[5369]: Reached target Basic System.
Mar 7 15:26:30 host58 systemd[5369]: Reached target Default.
Mar 7 15:26:30 host58 systemd[5369]: Startup finished in 144ms.
Mar 7 15:26:30 host58 systemd[1]: Started User Manager for UID 0.
Mar 7 15:26:30 host58 systemd[1]: Started Session c11 of user root.
Mar 7 15:26:31 host58 systemd[1]: session-c11.scope: Succeeded.
Mar 7 15:26:31 host58 systemd-logind[1231]: Session c11 logged out. Waiting for processes to exit.
Mar 7 15:26:31 host58 systemd-logind[1231]: Removed session c11.
Mar 7 15:26:41 host58 systemd[1]: Stopping User Manager for UID 0...
Mar 7 15:26:41 host58 systemd[5369]: Stopped target Default.
Mar 7 15:26:41 host58 systemd[5369]: Stopped target Basic System.
Mar 7 15:26:41 host58 systemd[5369]: Stopped target Sockets.
Mar 7 15:26:41 host58 systemd[5369]: Stopped target Paths.
Mar 7 15:26:41 host58 systemd[5369]: Stopped target Timers.
Mar 7 15:26:41 host58 systemd[5369]: Closed D-Bus User Message Bus Socket.
Mar 7 15:26:41 host58 systemd[5369]: Closed Multimedia System.
Mar 7 15:26:41 host58 systemd[5369]: Reached target Shutdown.
Mar 7 15:26:41 host58 systemd[5369]: Started Exit the Session.
Mar 7 15:26:41 host58 systemd[5369]: Reached target Exit the Session.
Mar 7 15:26:41 host58 systemd[1]: user@0.service: Succeeded.
Mar 7 15:26:41 host58 systemd[1]: Stopped User Manager for UID 0.
Mar 7 15:26:41 host58 systemd[1]: Stopping User runtime directory /run/user/0...
Mar 7 15:26:41 host58 systemd[1]: run-user-0.mount: Succeeded.
Mar 7 15:26:41 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
Mar 7 15:26:41 host58 systemd[1]: Stopped User runtime directory /run/user/0.
Mar 7 15:26:41 host58 systemd[1]: Removed slice User Slice of UID 0.
Mar 7 15:26:42 host58 lim[6332]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
Mar 7 15:26:42 host58 lim[6332]: main: LIM has exited due to a fatal error.
Mar 7 15:27:05 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:27:05 host58 systemd[1]: Created slice User Slice of UID 0.
Mar 7 15:27:05 host58 systemd[1]: Starting User runtime directory /run/user/0...
Mar 7 15:27:05 host58 systemd-logind[1231]: New session c12 of user root.
Mar 7 15:27:05 host58 systemd[1]: Started User runtime directory /run/user/0.
Mar 7 15:27:05 host58 systemd[1]: Starting User Manager for UID 0...
Mar 7 15:27:05 host58 systemd[6348]: Starting D-Bus User Message Bus Socket.
Mar 7 15:27:05 host58 systemd[6348]: Listening on Multimedia System.
Mar 7 15:27:05 host58 systemd[6348]: Reached target Timers.
Mar 7 15:27:05 host58 systemd[6348]: Reached target Paths.
Mar 7 15:27:05 host58 systemd[6348]: Listening on D-Bus User Message Bus Socket.
Mar 7 15:27:05 host58 systemd[6348]: Reached target Sockets.
Mar 7 15:27:05 host58 systemd[6348]: Reached target Basic System.
Mar 7 15:27:05 host58 systemd[6348]: Reached target Default.
Mar 7 15:27:05 host58 systemd[6348]: Startup finished in 133ms.
Mar 7 15:27:05 host58 systemd[1]: Started User Manager for UID 0.
Mar 7 15:27:05 host58 systemd[1]: Started Session c12 of user root.
Mar 7 15:27:05 host58 lim[6408]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
Mar 7 15:27:05 host58 lim[6408]: main: LIM has exited due to a fatal error.
Mar 7 15:27:06 host58 systemd-logind[1231]: Session c12 logged out. Waiting for processes to exit.
Mar 7 15:27:41 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:27:41 host58 systemd-logind[1231]: New session c13 of user root.
Mar 7 15:27:41 host58 systemd[1]: Started Session c13 of user root.
Mar 7 15:27:53 host58 systemd-logind[1231]: Session c13 logged out. Waiting for processes to exit.
Mar 7 15:27:53 host58 systemd[1]: session-c13.scope: Succeeded.
Mar 7 15:27:53 host58 systemd-logind[1231]: Removed session c13.
Mar 7 15:29:05 host58 systemd[1]: Starting Cleanup of Temporary Directories...
Mar 7 15:29:05 host58 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Mar 7 15:29:05 host58 systemd[1]: Started Cleanup of Temporary Directories.
Mar 7 15:33:05 host58 systemd[1]: Starting dnf makecache...
Mar 7 15:33:06 host58 dnf[6549]: Updating Subscription Management repositories.
Mar 7 15:33:07 host58 nfsrahead[6565]: setting /home/lsfadmin readahead to 128
Mar 7 15:33:10 host58 dnf[6549]: Metadata cache refreshed recently.
Mar 7 15:33:10 host58 systemd[1]: dnf-makecache.service: Succeeded.
Mar 7 15:33:10 host58 systemd[1]: Started dnf makecache.
Mar 7 15:33:28 host58 lim[6586]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
Mar 7 15:33:28 host58 lim[6586]: main: LIM has exited due to a fatal error.
Mar 7 15:33:32 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:44:16 host58 dbus-daemon[1225]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.1577' (uid=0 pid=6729 comm="sudo rm test_egosc_ " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
Mar 7 15:44:16 host58 systemd[1]: Starting Fingerprint Authentication Daemon...
Mar 7 15:44:17 host58 dbus-daemon[1225]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar 7 15:44:17 host58 systemd[1]: Started Fingerprint Authentication Daemon.
Mar 7 15:44:19 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
Mar 7 15:44:19 host58 systemd-logind[1231]: New session c14 of user root.
Mar 7 15:44:19 host58 systemd[1]: Started Session c14 of user root.
Mar 7 15:44:19 host58 systemd-logind[1231]: Session c14 logged out. Waiting for processes to exit.
Mar 7 15:44:19 host58 systemd[1]: session-c14.scope: Succeeded.
Mar 7 15:44:19 host58 systemd-logind[1231]: Removed session c14.
Mar 7 15:44:47 host58 systemd[1]: fprintd.service: Succeeded.
Mar 7 15:46:10 host58 nfsrahead[7093]: setting /home/lsfadmin readahead to 128
Mar 7 15:46:36 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:47:38 host58 lim[7153]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
Mar 7 15:47:38 host58 lim[7153]: main: LIM has exited due to a fatal error.
Mar 7 15:50:17 host58 nfsrahead[7193]: setting /home/lsfadmin readahead to 128
Mar 7 15:50:37 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:50:46 host58 nfsrahead[7237]: setting /home/lsfadmin readahead to 128
Mar 7 15:51:31 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:53:56 host58 nfsrahead[7304]: setting /home/lsfadmin readahead to 128
Mar 7 15:54:21 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
Mar 7 15:55:19 host58 dbus-daemon[1225]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.1646' (uid=0 pid=7333 comm="sudo cat /var/log/messages " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
Mar 7 15:55:19 host58 systemd[1]: Starting Fingerprint Authentication Daemon...
Mar 7 15:55:19 host58 dbus-daemon[1225]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar 7 15:55:19 host58 systemd[1]: Started Fingerprint Authentication Daemon.
[lsfadmin@host58 conf]$
------------------------------
roy al nabbout
Original Message:
Sent: Wed March 06, 2024 02:06 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Could you disable LSF_EGO_DAEMON_CONTROL=Y, add LSF_SERVERDIR, LSF_LIBDIR, and LSF_BINDIR to the lsf.conf file, and then stop and start the LSF service?
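A minimal sketch of a check for this, assuming lsf.conf uses plain NAME=value lines as in the file quoted in this thread. The helper name missing_lsf_dirs is hypothetical and not an LSF tool; it only reports which of the three directory parameters have no uncommented entry in the file.

```shell
# Hypothetical helper (not part of LSF): print each of the three directory
# parameters that has no uncommented NAME=value line in the given lsf.conf.
missing_lsf_dirs() {
    conf_file="$1"
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR; do
        # Only lines that start with NAME= count; commented lines start with '#'.
        if ! grep -q "^${name}=" "$conf_file"; then
            echo "$name"
        fi
    done
}
```

Run as missing_lsf_dirs /nfsdata/conf/lsf.conf, it would list all three names for the lsf.conf quoted in this thread, since none of them appears there. The values to add are presumably the ones already present in the quoted ego.conf (for example LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc).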
------------------------------
YI SUN
Original Message:
Sent: Wed March 06, 2024 05:29 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I still encounter the same problem. Can you help me, please?
ego.conf file:
[serviceit@host58 kernel]$ cat ego.conf
# $RCSfile$Revision$Date$
# EGO kernel parameters configuration file
#
# EGO master candidate host
EGO_MASTER_LIST="host57.secure-ic.adds"
# EGO daemon port number
EGO_KD_PORT=7870
EGO_PEM_PORT=7871
# EGO service directory
EGO_ESRVDIR=/nfsdata/conf/ego/cluster1/eservice
# EGO security configuration
EGO_SEC_PLUGIN=sec_ego_default
EGO_SEC_CONF=/nfsdata/conf/ego/cluster1/kernel
# EGO event configuration
#EGO_EVENT_MASK=LOG_INFO
#EGO_EVENT_PLUGIN=eventplugin_snmp[SINK=host,MIBDIRS=/nfsdata/conf/ego/cluster1/kernel/mibs]
# Parameters related to dynamic adding/removing host
# EGO_GET_CONF=LIM
EGO_CONFDIR=/nfsdata/conf/ego/cluster1/kernel
EGO_TOP=/nfsdata
LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib
LSF_ENVDIR=/nfsdata/conf
the error message:
------------------------------
roy al nabbout
Original Message:
Sent: Fri March 01, 2024 03:03 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
After sourcing the LSF profile, run env | grep LSF to check whether the LSF_SERVERDIR, LSF_BINDIR, and LSF_LIBDIR environment variables are set. The error usually indicates that those three variables are not set. You can also add them manually to the lsf.conf file.
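The same check can be scripted. This is only a sketch: check_lsf_env is a hypothetical function name, and it tests nothing beyond whether the three variables are non-empty in the current shell.

```shell
# Hypothetical helper: report which of the three LSF directory variables
# are unset or empty in the current shell environment.
check_lsf_env() {
    missing=""
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Calling check_lsf_env right after sourcing the profile prints "ok" when all three are set, and otherwise names the ones that are missing.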
------------------------------
YI SUN
Original Message:
Sent: Fri March 01, 2024 04:08 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I did as you told me, but I still encounter the same problem.
the configuration of lsf.conf:
[serviceit@host58 conf]$ cat lsf.conf
# This file is produced automatically by lsfconfig according to
# installation setup. Refer to "Administering IBM Spectrum LSF"
# before changing any parameters in this file.
# Any changes to the path names of LSF files must be reflected
# in this file. Make these changes with caution.
# After editing this file, run "lsadmin reconfig" and
# "badmin mbdrestart" to apply your changes.
LSB_SHAREDIR=/nfsdata/work
# Configuration directories
LSF_CONFDIR=/nfsdata/conf
LSB_CONFDIR=/nfsdata/conf/lsbatch
# Daemon log messages
LSF_LOGDIR=/nfsdata/log
LSF_LOG_MASK=LOG_WARNING
# Batch mail message handling
LSB_MAILTO=!U
# Miscellaneous
LSF_AUTH=eauth
LSB_NCPU_ENFORCE=1
# General lsfinstall variables
LSF_MANDIR=/nfsdata/10.1/man
LSF_INCLUDEDIR=/nfsdata/10.1/include
LSF_MISC=/nfsdata/10.1/misc
XLSF_APPDIR=/nfsdata/10.1/misc
LSF_ENVDIR=/nfsdata/conf
# Internal variable to distinguish Default Install
LSF_DEFAULT_INSTALL=y
# Internal variable indicating operation mode
LSB_MODE=batch
# Other variables
LSF_LIM_PORT=7873
LSF_RES_PORT=6878
LSB_MBD_PORT=6881
LSB_SBD_PORT=6882
# Enable mbd query child
LSB_QUERY_PORT=6891
LSF_DYNAMIC_HOST_WAIT_TIME=60
# WARNING: Please do not delete/modify next line!!
LSF_LINK_PATH=n
# LSF_MACHDEP and LSF_INDEP are reserved to maintain
# backward compatibility with legacy lsfsetup.
# They are not used in the new lsfinstall.
LSF_INDEP=/nfsdata
LSF_MACHDEP=/nfsdata/10.1
LSF_TOP=/nfsdata
LSF_VERSION=10.1
LSF_ENABLE_EGO=Y
LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
EGO_WORKDIR=/nfsdata/work/cluster1/ego
LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir
# Default tuning parameters
# Enable strict resource requirement syntax to select section
LSF_STRICT_RESREQ=Y
# Automatically shuts down any daemons running on hosts that attempted to
# join the cluster, but failed to communicate within the
# LSF_DYNAMIC_HOST_WAIT_TIME period.
EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
# Enable bmod to modify resource limits and location of job output files for running jobs
LSB_MOD_ALL_JOBS=Y
# Reduce pim update frequency
LSF_PIM_SLEEPTIME_UPDATE=Y
LSF_PIM_LINUX_ENHANCE=Y
LSF_UNIT_FOR_LIMITS=MB
# Do not lock lim when running exclusive jobs
LSB_DISABLE_LIMLOCK_EXCL=Y
# Display the execution host in the output of the command bsub -K
LSB_SUBK_SHOW_EXEC_HOST=Y
# Do not allow lsrun by default to encourage use of bsub
LSF_DISABLE_LSRUN=Y
# Turn off RES syncup to reduce traffic to master
LSF_RES_SYNCUP_INTERVAL=0
# Add slots information to the bjobs output
LSB_BJOBS_DISPLAY_ENH=Y
LSB_QUERY_ENH=Y
#LSF_LIC_SCHED_HOST= # License scheduler host
DAEMON_SHUTDOWN_DELAY=180
LSF_PROCESS_TRACKING=Y
LSF_LINUX_CGROUP_ACCT=Y
LSB_ENABLE_HPC_ALLOCATION=Y
LSB_BJOBS_PENDREASON_LEVEL=1
LSF_MASTER_LIST="host57"
LSF_EGO_DAEMON_CONTROL=Y
Can you help me, please?
------------------------------
roy al nabbout
Original Message:
Sent: Thu February 29, 2024 02:03 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
This looks like an LSF environment issue. Before running lsid, try "source $LSF_TOP/conf/cshrc.lsf" or ". $LSF_TOP/conf/profile.lsf" to set up the LSF environment in the shell session. Here LSF_TOP is the top directory of your LSF installation.
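For sh or bash, the step can be wrapped so that a missing profile is reported instead of silently skipped. A minimal sketch; load_lsf_profile is a hypothetical name, and the only thing taken from this thread is the conf/profile.lsf location under the installation top directory.

```shell
# Hypothetical helper: source conf/profile.lsf from the given LSF top
# directory into the current shell, or fail with a message if unreadable.
load_lsf_profile() {
    lsf_top="$1"
    profile="$lsf_top/conf/profile.lsf"
    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi
    # Source (not execute) so the variables it sets persist in this shell.
    . "$profile"
}
```

With the installation discussed here that would be load_lsf_profile /nfsdata, followed by lsid in the same shell session.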
------------------------------
YI SUN
Original Message:
Sent: Thu February 29, 2024 04:03 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I have this problem on host58, but everything works normally on host57. Can you help me, please?
------------------------------
roy al nabbout
Original Message:
Sent: Wed February 28, 2024 01:06 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
You can use chmod u+s on $LSF_BINDIR/bctrld (owned by the root account) as a workaround.
------------------------------
YI SUN
Original Message:
Sent: Wed February 28, 2024 05:48 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I have this problem: I cannot start up the LIM on my LSF cluster.
Can you help me, please?
------------------------------
roy al nabbout
Original Message:
Sent: Fri February 16, 2024 11:07 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
It doesn't seem that you have resolved the host name resolution issue on host58 that was mentioned previously. As the same user, I guess there should be no problem for you to submit jobs and run bhosts.
------------------------------
YI SUN
Original Message:
Sent: Fri February 16, 2024 11:00 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
When I try to submit a job, I receive this error message, and I always get the same error when I run bhosts on my compute node.
------------------------------
roy al nabbout
Original Message:
Sent: Fri February 16, 2024 10:56 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
You can do both on the compute node.
------------------------------
YI SUN
Original Message:
Sent: Fri February 16, 2024 09:29 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
If I've configured a host as a compute host, am I able to submit jobs to this host, or can I only execute jobs on it?
------------------------------
roy al nabbout
Original Message:
Sent: Thu February 15, 2024 10:58 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
This looks more like a host name resolution issue. Try the following:
- change host58.secure-ic.adds to host58 in the lsf.cluster file
- create a file "hosts" under LSF_TOP/conf with the following entries (or add them to /etc/hosts on host57 and host58)
<ip> host57 host57.secure-ic.adds
<ip> host58 host58.secure-ic.adds
- restart LSF services on host57 and host58
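To confirm that both names really ended up in the file, a small check along these lines may help. It is a sketch only: host_in_hosts_file is a hypothetical helper, it assumes the usual hosts layout (address first, then names and aliases), and it does not handle trailing comments on an entry line.

```shell
# Hypothetical helper: succeed if NAME appears as a host name or alias
# (any field after the address) on an uncommented line of a hosts-format file.
host_in_hosts_file() {
    hosts_file="$1"
    wanted="$2"
    awk -v name="$wanted" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i == name) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    ' "$hosts_file"
}
```

For example, host_in_hosts_file /etc/hosts host58 && echo found, run on both host57 and host58. On the live system, getent hosts host58 shows what the resolver actually returns.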
------------------------------
YI SUN
Original Message:
Sent: Thu February 15, 2024 05:38 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Everything works on host57, but on host58, I have a problem where when I run `bhosts`, it tells me user permission denied.
------------------------------
roy al nabbout
Original Message:
Sent: Wed February 14, 2024 12:19 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Do you use host name pattern in lsf.cluster file? Maybe attach your cluster file here to take a look.
------------------------------
YI SUN
Original Message:
Sent: Wed February 14, 2024 09:01 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I did as you instructed, but now it's telling me that "host 57 not defined". However, when I run "lsid", it shows that it recognizes it, and it's also correctly defined in the file lsf.cluster.cluster1.
------------------------------
roy al nabbout
Original Message:
Sent: Tue February 13, 2024 07:37 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I had a bit of time today to try LSF CE. It works on RHEL 8.8. My suggestion is to comment out the following lines in the hostsetup script, then run hostsetup again.
#get_lsf_edition "$LSF_ENTITLEMENT_FILE"
#if [ "$?" != "0" -a "$LSF_OFFERING" != "COMMUNITY" -a "$LSF_OFFERING" != "WORKGROUP" -a "$LSF_OFFERING" != "HPS" -a "$LSF_OFFERING" != "VARIANT" ]; then
#E LSF_EDITION="Unknown"
#fi
[root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log
------------------------------------------------------------
L S F H O S T S E T U P U T I L I T Y
------------------------------------------------------------
This script sets up local host (LSF server, client or slave) environment.
Setting up LSF server host "syirhel881" ...
Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
Created symlink /etc/systemd/system/multi-user.target.wants/lsfd.service → /usr/lib/systemd/system/lsfd.service.
Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done
[Tue Feb 13 16:25:48 PST 2024:get_lsf_edition:ERROR_1021]
"/sratch/support/syi1/lsfcefp12/conf/lsf.entitlement" does not exist or is not readable.
... Setting up LSF server host "syirhel881" is done
... LSF host setup is done.
[root@syirhel881 install]# vi hostsetup
[root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log
------------------------------------------------------------
L S F H O S T S E T U P U T I L I T Y
------------------------------------------------------------
This script sets up local host (LSF server, client or slave) environment.
Setting up LSF server host "syirhel881" ...
Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done
[Tue Feb 13 16:27:40 PST 2024:get_lsf_edition:ERROR_1021]
"/sratch/support/syi1/lsfcefp12/conf/lsf.entitlement" does not exist or is not readable.
... Setting up LSF server host "syirhel881" is done
... LSF host setup is done.
[root@syirhel881 install]# vi hostsetup
[root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log
------------------------------------------------------------
L S F H O S T S E T U P U T I L I T Y
------------------------------------------------------------
This script sets up local host (LSF server, client or slave) environment.
Setting up LSF server host "syirhel881" ...
Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done
... Setting up LSF server host "syirhel881" is done
... LSF host setup is done.
[root@syirhel881 install]# systemctl status lsfd
● lsfd.service - IBM Spectrum LSF
Loaded: loaded (/usr/lib/systemd/system/lsfd.service; enabled; vendor preset: disabled)
Active: inactive (dead)
[root@syirhel881 install]# systemctl start lsfd
[root@syirhel881 install]# lsid
^C
[root@syirhel881 install]# systemctl status lsfd
● lsfd.service - IBM Spectrum LSF
Loaded: loaded (/usr/lib/systemd/system/lsfd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2024-02-13 16:31:08 PST; 11s ago
Process: 57564 ExecStart=/sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/lsf_dae>
Process: 57561 ExecStartPre=/bin/bash -c (timer=12; while (( $timer )); do if [ ! -d "/sratch/sup>
Tasks: 14 (limit: 49023)
Memory: 163.2M
CGroup: /system.slice/lsfd.service
├─57635 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/lim
├─57638 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/res
├─57640 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/sbatchd
├─57649 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/pim
├─57655 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d /sra>
├─57667 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbschd
├─57691 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/melim
├─57693 /bin/sh /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/elim.hpc
├─57696 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d /sra>
├─57730 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
└─57768 sleep 8
Feb 13 16:31:08 syirhel881.fyre.ibm.com systemd[1]: Starting IBM Spectrum LSF...
Feb 13 16:31:08 syirhel881.fyre.ibm.com lsf_daemons[57564]: Starting the LSF subsystem
Feb 13 16:31:08 syirhel881.fyre.ibm.com systemd[1]: Started IBM Spectrum LSF.
[root@syirhel881 install]# lsid
IBM Spectrum LSF Community Edition 10.1.0.12, Jun 10 2021
Copyright IBM Corp. 1992, 2016. All rights reserved.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is lsfcefp12
My master name is syirhel881.fyre.ibm.com
------------------------------
YI SUN
Original Message:
Sent: Tue February 13, 2024 03:44 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
In fact, I'm starting LSF services on the master host instead of remotely, but I still face the same issue and am unable to resolve it. Could you please help me solve this problem?
------------------------------
roy al nabbout
Original Message:
Sent: Mon February 12, 2024 12:01 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
For the rsh error: if you start LSF services on remote hosts, LSF performs a remote login first. This requires passwordless login for the root user through rsh, ssh, pdsh, or similar, which you should set up properly for your system.
https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsfconf-lsf-rsh
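As a rough sketch of what the linked LSF_RSH page covers, lsf.conf can name the remote shell that LSF uses when it starts daemons on other hosts. The value below is an assumption for a site that already has passwordless ssh for root; check the linked page before using it.

```
# lsf.conf (sketch): use ssh instead of rsh for remote daemon startup.
# Assumes passwordless ssh for root is already configured between the hosts.
LSF_RSH=ssh
```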
------------------------------
YI SUN
Original Message:
Sent: Mon February 12, 2024 11:52 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I set `LSF_EGO_DAEMON_CONTROL=N` in `lsf.conf`, and now my LSF daemons are starting, but I continue to face the same error when executing the `lsfd.services`.
------------------------------
roy al nabbout
Original Message:
Sent: Mon February 12, 2024 11:43 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
For now you can set LSF_EGO_DAEMON_CONTROL=N in lsf.conf, just to avoid making the environment too complicated. From your previous message, I can see that the cluster works when the LSF services are started manually.
------------------------------
YI SUN
Original Message:
Sent: Mon February 12, 2024 11:34 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
When I try to start up LSF, I get the same problem every time:
------------------------------
roy al nabbout
Original Message:
Sent: Mon February 12, 2024 10:36 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Here is info for default LSF installation directory structure, https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=linux-example-installation-directory-structure.
After you source the LSF profile, LSF_ENVDIR is set to LSF_TOP/conf, LSF_SERVERDIR is set to LSF_TOP/10.1/<os_type>/etc, LSF_BINDIR is set to LSF_TOP/10.1/<os_type>/bin, and LSF_LIBDIR is set to LSF_TOP/10.1/<os_type>/lib. In your case <os_type> is linux2.6-glibc2.3-x86_64.
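The layout described above can be written out as a small function, which is handy for comparing against what env | grep LSF reports. A sketch only: lsf_dirs_for is a hypothetical name, and it simply applies the default directory pattern rather than reading anything from the installation.

```shell
# Hypothetical helper: print the default LSF directory variables for a given
# top directory, version, and OS type, following the default layout.
lsf_dirs_for() {
    lsf_top="$1"
    lsf_version="$2"
    os_type="$3"
    base="$lsf_top/$lsf_version/$os_type"
    echo "LSF_ENVDIR=$lsf_top/conf"
    echo "LSF_SERVERDIR=$base/etc"
    echo "LSF_BINDIR=$base/bin"
    echo "LSF_LIBDIR=$base/lib"
}
```

For the installation in this thread, lsf_dirs_for /nfsdata 10.1 linux2.6-glibc2.3-x86_64 prints LSF_ENVDIR=/nfsdata/conf and the three /nfsdata/10.1/linux2.6-glibc2.3-x86_64/{etc,bin,lib} paths.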
------------------------------
YI SUN
Original Message:
Sent: Mon February 12, 2024 10:28 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
The PATH setting is not correct. You must source the LSF profile (. profile.lsf or source cshrc.lsf) to get the LSF_SERVERDIR environment variable set correctly. The lsf_daemons script is located in the LSF_SERVERDIR directory (make sure the lsf_daemons file has execute permission set correctly).
------------------------------
YI SUN
Original Message:
Sent: Mon February 12, 2024 10:08 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
When I execute sh -x $LSF_SERVERDIR/lsf_daemons start , I receive this:
------------------------------
roy al nabbout
Original Message:
Sent: Mon February 12, 2024 10:00 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Try running sh -x $LSF_SERVERDIR/lsf_daemons start directly to see if you get any error. "203/EXEC" is a status set by systemd; it can have various causes, so it is hard to pinpoint what is going wrong for now.
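As far as I know, systemd reports 203/EXEC when it could not execute the command named in ExecStart, most often because the file is missing or not executable. A minimal sketch of that first check follows; check_startable is a hypothetical helper, and it does not cover other causes such as a broken interpreter line.

```shell
# Hypothetical helper: classify a start script as missing, not executable,
# or ok, which covers the most common reasons for a 203/EXEC status.
check_startable() {
    script="$1"
    if [ ! -e "$script" ]; then
        echo "missing"
    elif [ ! -x "$script" ]; then
        echo "not executable"
    else
        echo "ok"
    fi
}
```

For example, check_startable "$LSF_SERVERDIR/lsf_daemons" after sourcing the LSF profile. The path that systemd actually uses can be read from the unit file with systemctl cat lsfd.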
------------------------------
YI SUN
Original Message:
Sent: Mon February 12, 2024 09:40 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Can you explain what the error code 203/EXEC means and how to resolve it please?
Thank you!!
------------------------------
roy al nabbout
Original Message:
Sent: Fri February 09, 2024 03:52 PM
From: Gábor Samu
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Hello,
The directory where the LSF daemon binaries (e.g. res) are located is usually not in the path. So you'll need to run $LSF_SERVERDIR/res -V after sourcing profile.lsf / cshrc.lsf.
------------------------------
Gábor Samu
Original Message:
Sent: Fri February 09, 2024 05:28 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I have confirmed that the libnsl.so file is installed on all my nodes as Robert instructed, but I am still facing the same issue. Additionally, when I attempt to run the command "res -V", I receive an error that reads "bash: res: command not found".
------------------------------
roy al nabbout
Original Message:
Sent: Thu February 08, 2024 05:50 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
For the glibc issue: if you run "res -V", do you get any error? After resolving the libnsl.so concern as indicated by Robert, do you still have a problem with glibc?
------------------------------
YI SUN
Original Message:
Sent: Thu February 08, 2024 05:41 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
My problem is:
The IBM Spectrum LSF service fails to start. The systemctl status and logs for the lsfd service indicate a failure due to a timeout, accompanied by errors related to the GLIBC version compatibility for /lib64/libnsl.so.1.
Error Messages:
The service fails with a timeout error during startup attempts. Specific error logs mention that the GLIBC version GLIBC_2.2.5, required by /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/res and /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/sbatchd, is not found.
Could you give me advice on how to safely update GLIBC to a version that supports GLIBC_2.2.5 without affecting other system components or services? Alternatively, could you advise on installing and configuring compatibility libraries for libnsl to meet the requirements of the LSF components?
Info: My servers are running on Red Hat 8.
------------------------------
roy al nabbout
Original Message:
Sent: Wed February 07, 2024 12:58 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
You will need to source the LSF profile first, e.g. ". LSF_TOP_DIRECTORY/conf/profile.lsf".
------------------------------
YI SUN
Original Message:
Sent: Wed February 07, 2024 11:55 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
I always get this error message when I try to run this command.
------------------------------
roy al nabbout
Original Message:
Sent: Wed February 07, 2024 11:13 AM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
You can ignore this error. Using the lsadmin/badmin commands, you should be able to bring up the LSF services and then run LSF commands successfully.
------------------------------
YI SUN
Original Message:
Sent: Wed February 07, 2024 03:44 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Hello,
Enclosed, you will find the command that was executed, along with the error message shown in the screenshot below. Furthermore, the lsf.conf file is attached to this message.
Thank you in advance for your reply and your assistance.
------------------------------
roy al nabbout
Original Message:
Sent: Tue February 06, 2024 08:16 PM
From: YI SUN
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
According to the quick start guide, LSF CE is fully licensed with limitations on cluster size and certain features, so I guess it doesn't need an entitlement file.
Which command did you run to get the error "lsf.entitlement not found"? Also check the lsf.conf file to make sure no parameter is set to "lsf.entitlement".
------------------------------
YI SUN
Original Message:
Sent: Mon February 05, 2024 03:47 AM
From: roy al nabbout
Subject: Ibm spectrum lsf community edition (lsfsce10.2.0.12)
Hello,
I've recently installed the lsfsce10.2.0.12 package, but I'm unable to locate the license file. Whenever I attempt to run lsf on my cluster, I encounter an error message stating, "lsf.entitlement not found."
Could you please assist me in understanding where to install this file or advise on the necessary steps to take?
Best regards,
Roy AL NABBOUT
------------------------------
roy al nabbout
------------------------------