High Performance Computing Group

Ibm spectrum lsf community edition (lsfsce10.2.0.12)

  • 1.  Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 05, 2024 01:33 PM

    Hello,

    I've recently installed the lsfsce10.2.0.12 package, but I'm unable to locate the license file. Whenever I attempt to run LSF on my cluster, I encounter an error message stating, "lsf.entitlement not found."

    Could you please assist me in understanding where to install this file or advise on the necessary steps to take?

    Best regards,

    Roy AL NABBOUT



    ------------------------------
    roy al nabbout
    ------------------------------


  • 2.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue February 06, 2024 08:17 PM

    According to the quick start guide, LSF CE is fully licensed, with limitations on cluster size and certain features, so I believe it does not need an entitlement file.

    Which command did you run to get the error "lsf.entitlement not found"? Also check the lsf.conf file to make sure no parameter is set to "lsf.entitlement".



    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 07, 2024 03:44 AM

    Hello,

    Enclosed, you will find the command that was executed, along with the error message shown in the screenshot below. Furthermore, the lsf.conf file is attached to this message.

    Thank you in advance for your reply and your assistance.



    ------------------------------
    roy al nabbout
    ------------------------------

    Attachment(s)

    conf
    lsf.conf   2 KB 1 version


  • 4.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 07, 2024 11:13 AM

    You can ignore this error. You should be able to bring up the LSF services with the lsadmin/badmin commands and then run LSF commands successfully.



    ------------------------------
    YI SUN
    ------------------------------



  • 5.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 07, 2024 11:55 AM

    I always get this error message when I try to run this command.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 6.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 07, 2024 12:59 PM

    You need to source the LSF profile first, e.g. ". LSF_TOP_DIRECTORY/conf/profile.lsf".
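
    A minimal sketch of that step for a POSIX shell, assuming the profile lives at <top>/conf/profile.lsf as in the example above. The helper name load_lsf_profile is hypothetical and not part of LSF; substitute your own installation directory when calling it.

```shell
# load_lsf_profile TOP: source TOP/conf/profile.lsf into the current shell.
# Returns non-zero (and sources nothing) if the profile is missing or unreadable.
# The function name is illustrative only; it is not an LSF command.
load_lsf_profile() {
    top=$1
    profile="$top/conf/profile.lsf"
    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi
    # Source (do not execute) so the exported variables persist in this shell.
    . "$profile"
}
```

    Example use: load_lsf_profile /path/to/lsf_top && echo "$LSF_ENVDIR". The profile has to be sourced again in every new shell session unless it is added to the login scripts.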



    ------------------------------
    YI SUN
    ------------------------------



  • 7.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 08, 2024 05:42 AM

    My problem is:

    The IBM Spectrum LSF service fails to start. The systemctl status and logs for the lsfd service indicate a failure due to a timeout, accompanied by errors related to the GLIBC version compatibility for /lib64/libnsl.so.1.

    Error Messages:

    The service fails with a timeout error during startup attempts. Specific error logs mention that the GLIBC version GLIBC_2.2.5, required by /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/res and /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/sbatchd, is not found.

    Could you give me advice on how to safely update GLIBC to a version that supports GLIBC_2.2.5 without affecting other system components or services? Alternatively, advice on installing and configuring compatibility libraries for libnsl to meet the requirements of the LSF components.

    Info: My servers are running on Red Hat 8.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 8.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 08, 2024 10:02 AM

    For libnsl.so.1, make sure the package is installed on all your nodes. libnsl is not installed by default: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=notes-known-issues

    The second issue: I'm fairly sure the linux2.6-glibc2.3 build is intended for EL7, or maybe EL6; I can't remember off the top of my head which of those releases required the transition. It is odd that it is the only build included in the Community Edition. For EL8 and EL9 the LSF package was definitely linux3.10-glibc2.17-x86_64. Hopefully Yi will be able to steer you in the right direction there.
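
    One way to see which shared libraries a daemon binary cannot resolve is to filter the output of ldd for "not found" entries. This is only a sketch: the helper name missing_libs is hypothetical, and it only catches libraries that are absent altogether, not symbol-version errors such as the GLIBC_2.2.5 message.

```shell
# missing_libs: read `ldd` output on stdin and print the name of every
# shared library that the dynamic linker reported as "not found".
# The function name is illustrative only.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

    Example use, after sourcing the LSF profile: ldd "$LSF_SERVERDIR/res" | missing_libs. If libnsl.so.1 is listed, installing the distribution's libnsl package (assumed here to be named libnsl on Red Hat 8) on every node should resolve it.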



    ------------------------------
    Robert Lines
    ------------------------------



  • 9.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 08, 2024 05:51 PM

    For the glibc issue: if you run "res -V", do you get any error? After resolving the libnsl.so concern raised by Robert, do you still have a problem with glibc?



    ------------------------------
    YI SUN
    ------------------------------



  • 10.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 09, 2024 05:29 AM

    I have confirmed that the libnsl.so file is installed on all my nodes as Robert instructed, but I am still facing the same issue. Additionally, when I attempt to run the command "res -V", I receive an error that reads "bash: res: command not found".



    ------------------------------
    roy al nabbout
    ------------------------------



  • 11.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 09, 2024 03:52 PM

    Hello, 

    The directory where the LSF daemon binaries (e.g. res) are located is usually not in the PATH, so you'll need to run $LSF_SERVERDIR/res -V after sourcing profile.lsf / cshrc.lsf.



    ------------------------------
    Gábor Samu
    ------------------------------



  • 12.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 05:43 AM

    Hello,

    When I execute `$LSF_SERVERDIR/res -V`, the output reports that the binary type is Linux2.6-glibc2.3-x86_64. However, attempting to start LSF results in an error message:

    The service encounters a timeout error during startup attempts. Detailed error logs indicate the absence of the required GLIBC version GLIBC_2.2.5 for /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/res and /nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc/sbatchd.

    Could you offer guidance on updating GLIBC to a version compatible with GLIBC_2.2.5 in a manner that doesn't disrupt other system components or services? Alternatively, could you provide recommendations on installing and setting up compatibility libraries for libnsl to fulfill the LSF components' requirements?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 13.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 09:27 AM

    Check whether you can start sbatchd and res manually, e.g. as the root user run $LSF_SERVERDIR/res -d $LSF_ENVDIR and $LSF_SERVERDIR/sbatchd -d $LSF_ENVDIR. If this works, you may be able to use it as a workaround.



    ------------------------------
    YI SUN
    ------------------------------



  • 14.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 09:40 AM

    Can you explain what the error code 203/EXEC means and how to resolve it please?


    Thank you!!



    ------------------------------
    roy al nabbout
    ------------------------------



  • 15.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 10:01 AM

    Try running sh -x $LSF_SERVERDIR/lsf_daemons start directly and see if you get any errors. "203/EXEC" is a status set by systemd; it can have various causes, so it is hard to pinpoint what is going wrong right now.
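
    As general systemd background, 203/EXEC is reported when the command named in the unit's ExecStart line could not be executed at all, most often because the file is missing or not executable. A quick check along those lines can be sketched as follows; the helper name check_exec is hypothetical, and the result reflects the user who runs it, so run it as the user the service runs as.

```shell
# check_exec PATH: report whether PATH exists and is executable by the
# current user. Prints one of: ok, missing, not-executable.
# The function name is illustrative only.
check_exec() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -x "$1" ]; then
        echo "not-executable"
    else
        echo "ok"
    fi
}
```

    Example use, after sourcing the LSF profile: check_exec "$LSF_SERVERDIR/lsf_daemons". An empty LSF_SERVERDIR makes the path resolve to /lsf_daemons, which would show up here as "missing".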



    ------------------------------
    YI SUN
    ------------------------------



  • 16.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 10:09 AM

    When I execute sh -x $LSF_SERVERDIR/lsf_daemons start , I receive this: 



    ------------------------------
    roy al nabbout
    ------------------------------



  • 17.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 10:29 AM

    The PATH setting is not correct. You must source the LSF profile (. profile.lsf or source cshrc.lsf) so that the LSF_SERVERDIR environment variable is set correctly. The lsf_daemons script is located in the LSF_SERVERDIR directory (make sure the lsf_daemons file has execute permission set).



    ------------------------------
    YI SUN
    ------------------------------



  • 18.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 10:37 AM

    Here is the default LSF installation directory structure: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=linux-example-installation-directory-structure

    After you source the LSF profile, LSF_ENVDIR is set to LSF_TOP/conf, LSF_SERVERDIR is set to LSF_TOP/10.1/<os_type>/etc, LSF_BINDIR is set to LSF_TOP/10.1/<os_type>/bin, and LSF_LIBDIR is set to LSF_TOP/10.1/<os_type>/lib. In your case, <os_type> is linux2.6-glibc2.3-x86_64.
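
    The layout described above can be written out as a small helper that prints the values to expect for a given top directory and OS type. This is a sketch for comparison purposes only: the function name lsf_dirs is hypothetical, and the 10.1 version directory is taken from the paths in this thread.

```shell
# lsf_dirs TOP OS_TYPE: print the directory values that sourcing the LSF
# profile is expected to produce for a 10.1 installation, following the
# layout described above. The function name is illustrative only.
lsf_dirs() {
    top=$1
    os_type=$2
    echo "LSF_ENVDIR=$top/conf"
    echo "LSF_SERVERDIR=$top/10.1/$os_type/etc"
    echo "LSF_BINDIR=$top/10.1/$os_type/bin"
    echo "LSF_LIBDIR=$top/10.1/$os_type/lib"
}
```

    Example use: lsf_dirs /nfsdata linux2.6-glibc2.3-x86_64, then compare the result with the output of env | grep LSF in a shell where the profile has been sourced.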



    ------------------------------
    YI SUN
    ------------------------------



  • 19.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 11:34 AM

    When I try to start LSF, I get the same problem every time:



    ------------------------------
    roy al nabbout
    ------------------------------



  • 20.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 11:43 AM

    For now, you can set LSF_EGO_DAEMON_CONTROL=N in lsf.conf, just to keep the environment from getting too complicated. From your previous message, I can see the cluster works when the LSF services are started manually.
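
    Editing the file by hand works just as well, but the change can also be scripted. The sketch below assumes lsf.conf consists of simple NAME=VALUE lines and that the value contains no "|" or "&" characters; the helper name set_conf_param is hypothetical.

```shell
# set_conf_param FILE NAME VALUE: set NAME=VALUE in a NAME=VALUE style file.
# An existing uncommented NAME= line is rewritten in place; otherwise the
# setting is appended. A .bak copy of the original file is kept.
# The function name is illustrative only.
set_conf_param() {
    file=$1 name=$2 value=$3
    cp "$file" "$file.bak" || return 1
    if grep -q "^$name=" "$file"; then
        sed "s|^$name=.*|$name=$value|" "$file.bak" > "$file"
    else
        echo "$name=$value" >> "$file"
    fi
}
```

    Example use: set_conf_param "$LSF_ENVDIR/lsf.conf" LSF_EGO_DAEMON_CONTROL N, followed by a restart of the LSF daemons so the new value is read.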



    ------------------------------
    YI SUN
    ------------------------------



  • 21.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 11:52 AM

    I set `LSF_EGO_DAEMON_CONTROL=N` in `lsf.conf`, and now my LSF daemons are starting, but I still get the same error when starting `lsfd.service`.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 22.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 12:02 PM

    For the rsh error: if you start LSF services on remote hosts, LSF performs a remote login first. This requires rsh, ssh, pdsh, or similar with passwordless login for the root user, which you should set up properly for your system.

    https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=lsfconf-lsf-rsh



    ------------------------------
    YI SUN
    ------------------------------



  • 23.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue February 13, 2024 03:45 AM

    In fact, I'm starting LSF services on the master host instead of remotely, but I still face the same issue and am unable to resolve it. Could you please help me solve this problem?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 24.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue February 13, 2024 07:38 PM

    I had a bit of time today to try LSF CE. It works on RHEL 8.8. My suggestion is to comment out the following lines in the hostsetup script, then run hostsetup again.

    #get_lsf_edition "$LSF_ENTITLEMENT_FILE"
    #if [ "$?" != "0" -a "$LSF_OFFERING" != "COMMUNITY" -a "$LSF_OFFERING" != "WORKGROUP" -a "$LSF_OFFERING" != "HPS" -a "$LSF_OFFERING" != "VARIANT" ]; then
    #E   LSF_EDITION="Unknown"
    #fi

    [root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
    Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log

    ------------------------------------------------------------
        L S F    H O S T S E T U P    U T I L I T Y
    ------------------------------------------------------------
    This script sets up local host (LSF server, client or slave) environment.

    Setting up LSF server host "syirhel881" ...
    Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
    Created symlink /etc/systemd/system/multi-user.target.wants/lsfd.service → /usr/lib/systemd/system/lsfd.service.
    Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
    LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
    Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done

    [Tue Feb 13 16:25:48 PST 2024:get_lsf_edition:ERROR_1021]
        "/sratch/support/syi1/lsfcefp12/conf/lsf.entitlement" does not exist or is not readable.


    ... Setting up LSF server host "syirhel881" is done
    ... LSF host setup is done.
    [root@syirhel881 install]# vi hostsetup
    [root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
    Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log

    ------------------------------------------------------------
        L S F    H O S T S E T U P    U T I L I T Y
    ------------------------------------------------------------
    This script sets up local host (LSF server, client or slave) environment.

    Setting up LSF server host "syirhel881" ...
    Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
    Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
    LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
    Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done

    [Tue Feb 13 16:27:40 PST 2024:get_lsf_edition:ERROR_1021]
        "/sratch/support/syi1/lsfcefp12/conf/lsf.entitlement" does not exist or is not readable.


    ... Setting up LSF server host "syirhel881" is done
    ... LSF host setup is done.
    [root@syirhel881 install]# vi hostsetup
    [root@syirhel881 install]# ./hostsetup --top="/sratch/support/syi1/lsfcefp12" --boot="y"
    Logging installation sequence in /sratch/support/syi1/lsfcefp12/log/Install.log

    ------------------------------------------------------------
        L S F    H O S T S E T U P    U T I L I T Y
    ------------------------------------------------------------
    This script sets up local host (LSF server, client or slave) environment.

    Setting up LSF server host "syirhel881" ...
    Checking LSF installation for host "syirhel881.fyre.ibm.com" ... Done
    Installing LSF service scripts on host "syirhel881.fyre.ibm.com" ... Done
    LSF service ports are defined in /sratch/support/syi1/lsfcefp12/conf/lsf.conf.
    Checking LSF service ports definition on host "syirhel881.fyre.ibm.com" ... Done

    ... Setting up LSF server host "syirhel881" is done
    ... LSF host setup is done.
    [root@syirhel881 install]# systemctl status lsfd
    ● lsfd.service - IBM Spectrum LSF
       Loaded: loaded (/usr/lib/systemd/system/lsfd.service; enabled; vendor preset: disabled)
       Active: inactive (dead)
    [root@syirhel881 install]# systemctl start lsfd
    [root@syirhel881 install]# lsid
    ^C
    [root@syirhel881 install]# systemctl status lsfd
    ● lsfd.service - IBM Spectrum LSF
       Loaded: loaded (/usr/lib/systemd/system/lsfd.service; enabled; vendor preset: disabled)
       Active: active (running) since Tue 2024-02-13 16:31:08 PST; 11s ago
      Process: 57564 ExecStart=/sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/lsf_dae>
      Process: 57561 ExecStartPre=/bin/bash -c (timer=12; while (( $timer )); do if [ ! -d "/sratch/sup>
        Tasks: 14 (limit: 49023)
       Memory: 163.2M
       CGroup: /system.slice/lsfd.service
               ├─57635 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/lim
               ├─57638 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/res
               ├─57640 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/sbatchd
               ├─57649 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/pim
               ├─57655 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d /sra>
               ├─57667 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbschd
               ├─57691 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/melim
               ├─57693 /bin/sh /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/elim.hpc
               ├─57696 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/mbatchd -d /sra>
               ├─57730 /sratch/support/syi1/lsfcefp12/10.1/linux2.6-glibc2.3-x86_64/etc/eauth -s
               └─57768 sleep 8

    Feb 13 16:31:08 syirhel881.fyre.ibm.com systemd[1]: Starting IBM Spectrum LSF...
    Feb 13 16:31:08 syirhel881.fyre.ibm.com lsf_daemons[57564]: Starting the LSF subsystem
    Feb 13 16:31:08 syirhel881.fyre.ibm.com systemd[1]: Started IBM Spectrum LSF.
    [root@syirhel881 install]# lsid
    IBM Spectrum LSF Community Edition 10.1.0.12, Jun 10 2021
    Copyright IBM Corp. 1992, 2016. All rights reserved.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

    My cluster name is lsfcefp12
    My master name is syirhel881.fyre.ibm.com



    ------------------------------
    YI SUN
    ------------------------------



  • 25.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 14, 2024 09:01 AM

    I did as you instructed, but now it's telling me that "host 57 not defined". However, when I run "lsid", it shows that it recognizes it, and it's also correctly defined in the file lsf.cluster.cluster1.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 26.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 14, 2024 12:19 PM

    Do you use a host name pattern in the lsf.cluster file? Please attach your cluster file here so we can take a look.



    ------------------------------
    YI SUN
    ------------------------------



  • 27.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 15, 2024 02:59 AM

    I configured host57 as the master host and host58 as a compute host.


    [serviceit@host57 conf]$ cat lsf.cluster.cluster1
    # $Revision$Date$
    #-----------------------------------------------------------------------
    # Copyright IBM Corp. 1992, 2016. All rights reserved.
    #
    # After editing this file, run "lsadmin reconfig" and
    # "badmin mbdrestart" to apply your changes.
    #
    # T H I S   I S   A    O N E   P E R   C L U S T E R    F I L E
    #
    # This is a sample cluster definition file.  There is a cluster
    # definition file for each cluster.  This file's name should be
    # lsf.cluster.<cluster-name>.
    # See lsf.cluster(5) and the "Administering IBM Spectrum LSF".
    #

    Begin   ClusterAdmins
    Administrators = serviceit
    End    ClusterAdmins

    Begin   Host
    HOSTNAME  model    type        server  RESOURCES    #Keywords
    #apple    Sparc5S  SUNSOL       1     (sparc bsd)   #Example
    #peach    DEC3100  DigitalUNIX  1     (alpha osf1)
    #banana   HP9K778  HPPA         1     (hp68k hpux)
    #mango    HP735    HPPA         1     (hpux cs)
    #grape    SGI4D35  SGI5         1     (irix)
    #lemon    PC200    LINUX        1     (linux)
    #pear     IBM350   IBMAIX4      1     (aix cs)
    #plum     PENT_100 NTX86        1     (nt)
    #berry    DEC3100  !            1     (ultrix fs bsd mips dec)
    #orange   !        SUNSOL       1     (sparc bsd)   #Example
    #prune    !        !            1     (convex)
    host57.secure-ic.adds   !   !   1   (mg)
    host58.secure-ic.adds   !       !       1       ()
    End     Host

    Begin Parameters
    LSF_HOST_ADDR_RANGE=*.*.*.*
    # FLOAT_CLIENTS_ADDR_RANGE=*.*.*.*
    # FLOAT_CLIENTS=10
    End Parameters

    # Begin ResourceMap
    # RESOURCENAME  LOCATION
    # tmp2          [default]
    # nio           [all]
    # console       [default]
    # osname        [default]
    # osver         [default]
    # cpuarch       [default]
    # cpuspeed      [default]
    # bandwidth     [default]
    # availcpufreqs [default]
    # currcpufreqs  [default]
    # End ResourceMap



    ------------------------------
    roy al nabbout
    ------------------------------



  • 28.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 15, 2024 05:39 AM

    Everything works on host57, but on host58, running `bhosts` fails with a "user permission denied" error.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 29.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 15, 2024 10:58 AM

    This looks more like a host name resolution issue. Try the following:

    • change host58.secure-ic.adds to host58 in the lsf.cluster file
    • create a file named "hosts" under LSF_TOP/conf with the following entries (or add them to /etc/hosts on host57 and host58)

              <ip> host57 host57.secure-ic.adds

              <ip> host58 host58.secure-ic.adds

    • restart the LSF services on host57 and host58
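
    The entries in the second step can be generated with a small helper so that both hosts get the same format, with the short name first and the fully qualified name second. This is a sketch: the function name hosts_entry is hypothetical, and the 192.0.2.x addresses in the usage note are documentation placeholders, not the real addresses of these hosts.

```shell
# hosts_entry IP SHORT DOMAIN: print one hosts-file line that maps IP to
# both the short host name and the fully qualified name, in that order.
# The function name is illustrative only.
hosts_entry() {
    printf '%s %s %s.%s\n' "$1" "$2" "$2" "$3"
}
```

    Example use: hosts_entry 192.0.2.57 host57 secure-ic.adds and hosts_entry 192.0.2.58 host58 secure-ic.adds, with the output appended to the hosts file. If the entries go into /etc/hosts instead, running getent hosts host57 and getent hosts host58 on both machines is a simple way to confirm that each name resolves to the same address everywhere.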


    ------------------------------
    YI SUN
    ------------------------------



  • 30.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 05:51 AM

    I did as you told me, but I get this error message on host57 and host58:



    ------------------------------
    roy al nabbout
    ------------------------------



  • 31.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 11:01 AM

    On host57, use the root account to stop and start the LSF services.

    On host58, you need to source the LSF profile before running any LSF command.



    ------------------------------
    YI SUN
    ------------------------------



  • 32.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 08:26 AM

    I did as you told me and fixed the last problem, but now I have a problem on host58: I can't run bhosts, and I always receive this error message.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 33.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 09:28 AM

    If I've configured a host as a compute host, am I able to submit jobs to this host, or can I only execute jobs on it?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 35.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 10:57 AM

    You can do both on the compute node.



    ------------------------------
    YI SUN
    ------------------------------



  • 36.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 11:01 AM

    When I try to submit a job, I receive this error message, and I always get the same error when I run bhosts on my compute node.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 37.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 11:08 AM

    It does not look like you have resolved the host name resolution issue on host58 mentioned previously. As the same user, I assume you can submit jobs and run bhosts on host57 without problems.



    ------------------------------
    YI SUN
    ------------------------------



  • 38.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri February 16, 2024 11:33 AM
    Edited by roy al nabbout Fri February 16, 2024 12:02 PM

    I did as you told me, but I still encounter the same problem.

    ------------------------------
    roy al nabbout
    ------------------------------



  • 39.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 28, 2024 05:49 AM

    I have this problem: I cannot start the LIM on my LSF cluster.

    Can you help me, please?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 40.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed February 28, 2024 01:06 PM

    As a workaround, you can run chmod u+s on $LSF_BINDIR/bctrld (which is owned by the root account).



    ------------------------------
    YI SUN
    ------------------------------



  • 41.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 29, 2024 04:04 AM

    I have this problem on host58, but everything works normally on host57. Can you help me, please?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 42.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu February 29, 2024 02:04 PM

    This looks like an LSF environment issue. Before running lsid, try "source $LSF_TOP/conf/cshrc.lsf" or ". $LSF_TOP/conf/profile.lsf" to set up the LSF environment in the shell session. Here LSF_TOP is the top directory of your LSF installation.



    ------------------------------
    YI SUN
    ------------------------------



  • 43.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri March 01, 2024 04:08 AM

    I did as you told me, but I still encounter the same problem.


    The contents of lsf.conf:


    [serviceit@host58 conf]$ cat lsf.conf
    # This file is produced automatically by lsfconfig according to
    # installation setup. Refer to "Administering IBM Spectrum LSF"
    # before changing any parameters in this file.
    # Any changes to the path names of LSF files must be reflected
    # in this file. Make these changes with caution.

    # After editing this file, run "lsadmin reconfig" and
    # "badmin mbdrestart" to apply your changes.


    LSB_SHAREDIR=/nfsdata/work

    # Configuration directories
    LSF_CONFDIR=/nfsdata/conf
    LSB_CONFDIR=/nfsdata/conf/lsbatch

    # Daemon log messages
    LSF_LOGDIR=/nfsdata/log
    LSF_LOG_MASK=LOG_WARNING

    # Batch mail message handling
    LSB_MAILTO=!U

    # Miscellaneous
    LSF_AUTH=eauth
    LSB_NCPU_ENFORCE=1

    # General lsfinstall variables
    LSF_MANDIR=/nfsdata/10.1/man
    LSF_INCLUDEDIR=/nfsdata/10.1/include
    LSF_MISC=/nfsdata/10.1/misc
    XLSF_APPDIR=/nfsdata/10.1/misc
    LSF_ENVDIR=/nfsdata/conf

    # Internal variable to distinguish Default Install
    LSF_DEFAULT_INSTALL=y

    # Internal variable indicating operation mode
    LSB_MODE=batch

    # Other variables
    LSF_LIM_PORT=7873
    LSF_RES_PORT=6878
    LSB_MBD_PORT=6881
    LSB_SBD_PORT=6882

    # Enable mbd query child
    LSB_QUERY_PORT=6891
    LSF_DYNAMIC_HOST_WAIT_TIME=60

    # WARNING: Please do not delete/modify next line!!
    LSF_LINK_PATH=n

    # LSF_MACHDEP and LSF_INDEP are reserved to maintain
    # backward compatibility with legacy lsfsetup.
    # They are not used in the new lsfinstall.
    LSF_INDEP=/nfsdata
    LSF_MACHDEP=/nfsdata/10.1

    LSF_TOP=/nfsdata
    LSF_VERSION=10.1
    LSF_ENABLE_EGO=Y
    LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_WORKDIR=/nfsdata/work/cluster1/ego
    LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir

    # Default tuning parameters

    # Enable strict resource requirement syntax to select section
    LSF_STRICT_RESREQ=Y
    # Automatically shuts down any daemons running on hosts that attempted to
    # join the cluster, but failed to communicate within the
    # LSF_DYNAMIC_HOST_WAIT_TIME period.
    EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
    # Enable bmod to modify resource limits and location of job output files for running jobs
    LSB_MOD_ALL_JOBS=Y
    # Reduce pim update frequency
    LSF_PIM_SLEEPTIME_UPDATE=Y
    LSF_PIM_LINUX_ENHANCE=Y
    LSF_UNIT_FOR_LIMITS=MB
    # Do not lock lim when running exclusive jobs
    LSB_DISABLE_LIMLOCK_EXCL=Y
    # Display the execution host in the output of the command bsub -K
    LSB_SUBK_SHOW_EXEC_HOST=Y

    # Do not allow lsrun by default to encourage use of bsub
    LSF_DISABLE_LSRUN=Y

    # Turn off RES syncup to reduce traffic to master
    LSF_RES_SYNCUP_INTERVAL=0

    # Add slots information to the bjobs output
    LSB_BJOBS_DISPLAY_ENH=Y
    LSB_QUERY_ENH=Y
    #LSF_LIC_SCHED_HOST= # License scheduler host
    DAEMON_SHUTDOWN_DELAY=180

    LSF_PROCESS_TRACKING=Y
    LSF_LINUX_CGROUP_ACCT=Y

    LSB_ENABLE_HPC_ALLOCATION=Y
    LSB_BJOBS_PENDREASON_LEVEL=1
    LSF_MASTER_LIST="host57"
    LSF_EGO_DAEMON_CONTROL=Y



    Can you help me, please?



    ------------------------------
    roy al nabbout
    ------------------------------



  • 44.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri March 01, 2024 03:03 PM

    After sourcing the LSF profile, run env | grep LSF to check whether the LSF_SERVERDIR, LSF_BINDIR, and LSF_LIBDIR environment variables are set. The error usually indicates that those three variables are not set. You can also add them manually to the lsf.conf file.
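
    The same check can be scripted so that it names exactly which of the three variables is missing. A minimal sketch for a POSIX shell; the helper name lsf_unset_vars is hypothetical.

```shell
# lsf_unset_vars: print the name of each expected LSF directory variable
# that is unset or empty in the current shell; print nothing if all are set.
# The function name is illustrative only.
lsf_unset_vars() {
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

    Run lsf_unset_vars in the same shell session in which the profile was sourced. Any name it prints points at a variable the profile did not set on that host.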



    ------------------------------
    YI SUN
    ------------------------------



  • 45.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon March 04, 2024 03:03 AM
    Edited by roy al nabbout Mon March 04, 2024 09:34 AM

    I did as you told me, but I still encounter the same problem.

    The contents of ego.conf:

    [serviceit@host58 kernel]$ cat ego.conf
    # $RCSfile$Revision$Date$
    # EGO kernel parameters configuration file
    #

    # EGO master candidate host
    EGO_MASTER_LIST="host57"

    # EGO daemon port number
    EGO_KD_PORT=7870
    EGO_PEM_PORT=7871

    # EGO service directory
    EGO_ESRVDIR=/nfsdata/conf/ego/cluster1/eservice

    # EGO security configuration
    EGO_SEC_PLUGIN=sec_ego_default
    EGO_SEC_CONF=/nfsdata/conf/ego/cluster1/kernel

    # EGO event configuration
    #EGO_EVENT_MASK=LOG_INFO
    #EGO_EVENT_PLUGIN=eventplugin_snmp[SINK=host,MIBDIRS=/nfsdata/conf/ego/cluster1/kernel/mibs]

    # Parameters related to dynamic adding/removing host
    # EGO_GET_CONF=LIM

    EGO_CONFDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_TOP=/nfsdata
    [serviceit@host58 kernel]$


    ------------------------------
    roy al nabbout
    ------------------------------



  • 46.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed March 06, 2024 05:29 AM

    I still encounter the same problem. Can you help me, please?

    ego.conf file: 

    [serviceit@host58 kernel]$ cat ego.conf
    # $RCSfile$Revision$Date$
    # EGO kernel parameters configuration file
    #

    # EGO master candidate host
    EGO_MASTER_LIST="host57.secure-ic.adds"

    # EGO daemon port number
    EGO_KD_PORT=7870
    EGO_PEM_PORT=7871

    # EGO service directory
    EGO_ESRVDIR=/nfsdata/conf/ego/cluster1/eservice

    # EGO security configuration
    EGO_SEC_PLUGIN=sec_ego_default
    EGO_SEC_CONF=/nfsdata/conf/ego/cluster1/kernel

    # EGO event configuration
    #EGO_EVENT_MASK=LOG_INFO
    #EGO_EVENT_PLUGIN=eventplugin_snmp[SINK=host,MIBDIRS=/nfsdata/conf/ego/cluster1/kernel/mibs]

    # Parameters related to dynamic adding/removing host
    # EGO_GET_CONF=LIM

    EGO_CONFDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_TOP=/nfsdata
    LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
    LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
    LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib
    LSF_ENVDIR=/nfsdata/conf

    the error message:



    ------------------------------
    roy al nabbout
    ------------------------------



  • 47.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Wed March 06, 2024 02:07 PM

    Could you disable LSF_EGO_DAEMON_CONTROL=Y (set it to N or comment it out), add LSF_SERVERDIR, LSF_LIBDIR, and LSF_BINDIR to the lsf.conf file, and then stop and start the LSF services?
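
    A minimal sketch of a check for this change, assuming lsf.conf uses plain KEY=VALUE lines like the files posted in this thread. The function name check_lsf_conf is hypothetical and not an LSF tool; it only reads the file and reports what it finds.

```shell
# check_lsf_conf FILE
# Hypothetical helper (not part of LSF): reports whether the three directory
# parameters are set and whether LSF_EGO_DAEMON_CONTROL is still Y.
check_lsf_conf() {
    conf="$1"
    status=0
    for key in LSF_SERVERDIR LSF_LIBDIR LSF_BINDIR; do
        if grep -Eq "^${key}=." "$conf"; then
            echo "ok: $key is set"
        else
            echo "missing: $key"
            status=1
        fi
    done
    # A commented-out line (leading '#') does not match, so it counts as disabled.
    if grep -Eiq '^LSF_EGO_DAEMON_CONTROL="?y"?[[:space:]]*$' "$conf"; then
        echo "still enabled: LSF_EGO_DAEMON_CONTROL=Y"
        status=1
    else
        echo "ok: LSF_EGO_DAEMON_CONTROL is not Y"
    fi
    return "$status"
}
```

    For example, check_lsf_conf /nfsdata/conf/lsf.conf would check the file in the configuration directory used in this thread. A non-zero return status means at least one item still needs attention.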



    ------------------------------
    YI SUN
    ------------------------------



  • 48.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu March 07, 2024 09:31 AM
    Edited by roy al nabbout Thu March 07, 2024 09:51 AM

    I did as you told me, but I still encounter the same problem:

    [lsfadmin@host57 log]$ bhosts -l
    HOST  host57
    STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
    ok              12.50     -     16      0      0      0      0      0      -

     CURRENT LOAD USED FOR SCHEDULING:
                    r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots  ngpus
     Total           0.0   0.0   0.0    1%   0.0    16    1     1   56G 15.5G 28.9G     16    0.0
     Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      -    0.0

                   ngpus_shared ngpus_excl_t ngpus_excl_p ngpus_prohibited
     Total                  0.0          0.0          0.0              0.0
     Reserved               0.0          0.0          0.0              0.0

                   gpu_shared_avg_ut gpu_shared_avg_mut gpu_mode0 gpu_mode1 gpu_mode2
     Total                       0.0                0.0       0.0       0.0       0.0
     Reserved                    0.0                0.0       0.0       0.0       0.0

                   gpu_mode3 gpu_mode4 gpu_mode5 gpu_mode6 gpu_mode7 gpu_temp0
     Total               0.0       0.0       0.0       0.0       0.0       0.0
     Reserved            0.0       0.0       0.0       0.0       0.0       0.0

                   gpu_temp1 gpu_temp2 gpu_temp3 gpu_temp4 gpu_temp5 gpu_temp6
     Total               0.0       0.0       0.0       0.0       0.0       0.0
     Reserved            0.0       0.0       0.0       0.0       0.0       0.0

                   gpu_temp7 gpu_ecc0 gpu_ecc1 gpu_ecc2 gpu_ecc3 gpu_ecc4 gpu_ecc5
     Total               0.0      0.0      0.0      0.0      0.0      0.0      0.0
     Reserved            0.0      0.0      0.0      0.0      0.0      0.0      0.0

                   gpu_ecc6 gpu_ecc7 gpu_ut0 gpu_ut1 gpu_ut2 gpu_ut3 gpu_ut4 gpu_ut5
     Total              0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0
     Reserved           0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0

                   gpu_ut6 gpu_ut7 gpu_mut0 gpu_mut1 gpu_mut2 gpu_mut3 gpu_mut4
     Total             0.0     0.0      0.0      0.0      0.0      0.0      0.0
     Reserved          0.0     0.0      0.0      0.0      0.0      0.0      0.0

                   gpu_mut5 gpu_mut6 gpu_mut7 gpu_mtotal0 gpu_mtotal1 gpu_mtotal2
     Total              0.0      0.0      0.0         0.0         0.0         0.0
     Reserved           0.0      0.0      0.0         0.0         0.0         0.0

                   gpu_mtotal3 gpu_mtotal4 gpu_mtotal5 gpu_mtotal6 gpu_mtotal7
     Total                 0.0         0.0         0.0         0.0         0.0
     Reserved              0.0         0.0         0.0         0.0         0.0

                   gpu_mused0 gpu_mused1 gpu_mused2 gpu_mused3 gpu_mused4 gpu_mused5
     Total                0.0        0.0        0.0        0.0        0.0        0.0
     Reserved             0.0        0.0        0.0        0.0        0.0        0.0

                   gpu_mused6 gpu_mused7 gpu_maxfactor
     Total                0.0        0.0           0.0
     Reserved             0.0        0.0           0.0


     LOAD THRESHOLD USED FOR SCHEDULING:
               r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
     loadSched   -     -     -     -       -     -    -     -     -      -      -
     loadStop    -     -     -     -       -     -    -     -     -      -      -


     CONFIGURED AFFINITY CPU LIST: all


    HOST  host58
    STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
    closed_LIM       1.00     -      1      0      0      0      0      0      -

     CURRENT LOAD USED FOR SCHEDULING:
                    r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots  ngpus
     Total           0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      1    0.0
     Reserved         -     -     -     -     -     -    -     -     -     -     -       -     -

                   ngpus_shared ngpus_excl_t ngpus_excl_p ngpus_prohibited
     Total                  0.0          0.0          0.0              0.0
     Reserved                -            -            -                -

                   gpu_shared_avg_ut gpu_shared_avg_mut gpu_mode0 gpu_mode1 gpu_mode2
     Total                       0.0                0.0       0.0       0.0       0.0
     Reserved                     -                  -         -         -         -

                   gpu_mode3 gpu_mode4 gpu_mode5 gpu_mode6 gpu_mode7 gpu_temp0
     Total               0.0       0.0       0.0       0.0       0.0       0.0
     Reserved             -         -         -         -         -         -

                   gpu_temp1 gpu_temp2 gpu_temp3 gpu_temp4 gpu_temp5 gpu_temp6
     Total               0.0       0.0       0.0       0.0       0.0       0.0
     Reserved             -         -         -         -         -         -

                   gpu_temp7 gpu_ecc0 gpu_ecc1 gpu_ecc2 gpu_ecc3 gpu_ecc4 gpu_ecc5
     Total               0.0      0.0      0.0      0.0      0.0      0.0      0.0
     Reserved             -        -        -        -        -        -        -

                   gpu_ecc6 gpu_ecc7 gpu_ut0 gpu_ut1 gpu_ut2 gpu_ut3 gpu_ut4 gpu_ut5
     Total              0.0      0.0     0.0     0.0     0.0     0.0     0.0     0.0
     Reserved            -        -       -       -       -       -       -       -

                   gpu_ut6 gpu_ut7 gpu_mut0 gpu_mut1 gpu_mut2 gpu_mut3 gpu_mut4
     Total             0.0     0.0      0.0      0.0      0.0      0.0      0.0
     Reserved           -       -        -        -        -        -        -

                   gpu_mut5 gpu_mut6 gpu_mut7 gpu_mtotal0 gpu_mtotal1 gpu_mtotal2
     Total              0.0      0.0      0.0         0.0         0.0         0.0
     Reserved            -        -        -           -           -           -

                   gpu_mtotal3 gpu_mtotal4 gpu_mtotal5 gpu_mtotal6 gpu_mtotal7
     Total                 0.0         0.0         0.0         0.0         0.0
     Reserved               -           -           -           -           -

                   gpu_mused0 gpu_mused1 gpu_mused2 gpu_mused3 gpu_mused4 gpu_mused5
     Total                0.0        0.0        0.0        0.0        0.0        0.0
     Reserved              -          -          -          -          -          -

                   gpu_mused6 gpu_mused7 gpu_maxfactor
     Total                0.0        0.0           0.0
     Reserved              -          -             -


     LOAD THRESHOLD USED FOR SCHEDULING:
               r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
     loadSched   -     -     -     -       -     -    -     -     -      -      -
     loadStop    -     -     -     -       -     -    -     -     -      -      -


     CONFIGURED AFFINITY CPU LIST: all



    The LIM on host58 is always down and I cannot start it:

    [lsfadmin@host57 log]$ bhosts
    HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
    host57             ok              -     16      0      0      0      0      0
    host58             closed          -      1      0      0      0      0      0

    [lsfadmin@host58 conf]$ lsadmin limstartup
    Starting up LIM on <host58> ...... [lsfadmin@host58 conf]$

    logs:
    Mar  7 15:22:49 host58 systemd-logind[1231]: Session c8 logged out. Waiting for processes to exit.
    Mar  7 15:22:49 host58 systemd-logind[1231]: Removed session c8.
    Mar  7 15:22:54 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:22:54 host58 systemd-logind[1231]: New session c9 of user root.
    Mar  7 15:22:54 host58 systemd[1]: Started Session c9 of user root.
    Mar  7 15:22:54 host58 systemd-logind[1231]: Session c9 logged out. Waiting for processes to exit.
    Mar  7 15:22:54 host58 systemd[1]: session-c9.scope: Succeeded.
    Mar  7 15:22:54 host58 systemd-logind[1231]: Removed session c9.
    Mar  7 15:23:04 host58 systemd[1]: Stopping User Manager for UID 0...
    Mar  7 15:23:04 host58 systemd[4143]: Stopped target Default.
    Mar  7 15:23:04 host58 systemd[4143]: Stopped target Basic System.
    Mar  7 15:23:04 host58 systemd[4143]: Stopped target Timers.
    Mar  7 15:23:04 host58 systemd[4143]: Stopped target Paths.
    Mar  7 15:23:04 host58 systemd[4143]: Stopped target Sockets.
    Mar  7 15:23:04 host58 systemd[4143]: Closed Multimedia System.
    Mar  7 15:23:04 host58 systemd[4143]: Closed D-Bus User Message Bus Socket.
    Mar  7 15:23:04 host58 systemd[4143]: Reached target Shutdown.
    Mar  7 15:23:04 host58 systemd[4143]: Started Exit the Session.
    Mar  7 15:23:04 host58 systemd[4143]: Reached target Exit the Session.
    Mar  7 15:23:04 host58 systemd[1]: user@0.service: Succeeded.
    Mar  7 15:23:04 host58 systemd[1]: Stopped User Manager for UID 0.
    Mar  7 15:23:04 host58 systemd[1]: Stopping User runtime directory /run/user/0...
    Mar  7 15:23:04 host58 systemd[1]: run-user-0.mount: Succeeded.
    Mar  7 15:23:04 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
    Mar  7 15:23:04 host58 systemd[1]: Stopped User runtime directory /run/user/0.
    Mar  7 15:23:04 host58 systemd[1]: Removed slice User Slice of UID 0.
    Mar  7 15:23:40 host58 nfsrahead[4300]: setting /home/lsfadmin readahead to 128
    Mar  7 15:24:05 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:24:31 host58 nfsrahead[5097]: setting /home/lsfadmin readahead to 128
    Mar  7 15:24:42 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:24:42 host58 systemd[1]: Created slice User Slice of UID 0.
    Mar  7 15:24:42 host58 systemd[1]: Starting User runtime directory /run/user/0...
    Mar  7 15:24:42 host58 systemd-logind[1231]: New session c10 of user root.
    Mar  7 15:24:42 host58 systemd[1]: Started User runtime directory /run/user/0.
    Mar  7 15:24:42 host58 systemd[1]: Starting User Manager for UID 0...
    Mar  7 15:24:42 host58 systemd[5122]: Listening on Multimedia System.
    Mar  7 15:24:42 host58 systemd[5122]: Starting D-Bus User Message Bus Socket.
    Mar  7 15:24:42 host58 systemd[5122]: Reached target Timers.
    Mar  7 15:24:42 host58 systemd[5122]: Reached target Paths.
    Mar  7 15:24:42 host58 systemd[5122]: Listening on D-Bus User Message Bus Socket.
    Mar  7 15:24:42 host58 systemd[5122]: Reached target Sockets.
    Mar  7 15:24:42 host58 systemd[5122]: Reached target Basic System.
    Mar  7 15:24:42 host58 systemd[5122]: Reached target Default.
    Mar  7 15:24:42 host58 systemd[5122]: Startup finished in 137ms.
    Mar  7 15:24:42 host58 systemd[1]: Started User Manager for UID 0.
    Mar  7 15:24:42 host58 systemd[1]: Started Session c10 of user root.
    Mar  7 15:24:42 host58 nfsrahead[5161]: setting /nfsdata readahead to 128
    Mar  7 15:24:42 host58 systemd-logind[1231]: Session c10 logged out. Waiting for processes to exit.
    Mar  7 15:24:42 host58 systemd[1]: session-c10.scope: Succeeded.
    Mar  7 15:24:42 host58 systemd-logind[1231]: Removed session c10.
    Mar  7 15:24:51 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:24:52 host58 systemd[1]: Stopping User Manager for UID 0...
    Mar  7 15:24:52 host58 systemd[5122]: Stopped target Default.
    Mar  7 15:24:52 host58 systemd[5122]: Stopped target Basic System.
    Mar  7 15:24:52 host58 systemd[5122]: Stopped target Timers.
    Mar  7 15:24:52 host58 systemd[5122]: Stopped target Sockets.
    Mar  7 15:24:52 host58 systemd[5122]: Closed Multimedia System.
    Mar  7 15:24:52 host58 systemd[5122]: Closed D-Bus User Message Bus Socket.
    Mar  7 15:24:52 host58 systemd[5122]: Stopped target Paths.
    Mar  7 15:24:52 host58 systemd[5122]: Reached target Shutdown.
    Mar  7 15:24:52 host58 systemd[5122]: Started Exit the Session.
    Mar  7 15:24:52 host58 systemd[5122]: Reached target Exit the Session.
    Mar  7 15:24:52 host58 systemd[1]: user@0.service: Succeeded.
    Mar  7 15:24:52 host58 systemd[1]: Stopped User Manager for UID 0.
    Mar  7 15:24:52 host58 systemd[1]: Stopping User runtime directory /run/user/0...
    Mar  7 15:24:52 host58 systemd[1]: run-user-0.mount: Succeeded.
    Mar  7 15:24:52 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
    Mar  7 15:24:52 host58 systemd[1]: Stopped User runtime directory /run/user/0.
    Mar  7 15:24:52 host58 systemd[1]: Removed slice User Slice of UID 0.
    Mar  7 15:25:15 host58 nfsrahead[5216]: setting /home/lsfadmin readahead to 128
    Mar  7 15:25:35 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:26:30 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:26:30 host58 systemd[1]: Created slice User Slice of UID 0.
    Mar  7 15:26:30 host58 systemd[1]: Starting User runtime directory /run/user/0...
    Mar  7 15:26:30 host58 systemd-logind[1231]: New session c11 of user root.
    Mar  7 15:26:30 host58 systemd[1]: Started User runtime directory /run/user/0.
    Mar  7 15:26:30 host58 systemd[1]: Starting User Manager for UID 0...
    Mar  7 15:26:30 host58 systemd[5369]: Reached target Paths.
    Mar  7 15:26:30 host58 systemd[5369]: Listening on Multimedia System.
    Mar  7 15:26:30 host58 systemd[5369]: Reached target Timers.
    Mar  7 15:26:30 host58 systemd[5369]: Starting D-Bus User Message Bus Socket.
    Mar  7 15:26:30 host58 systemd[5369]: Listening on D-Bus User Message Bus Socket.
    Mar  7 15:26:30 host58 systemd[5369]: Reached target Sockets.
    Mar  7 15:26:30 host58 systemd[5369]: Reached target Basic System.
    Mar  7 15:26:30 host58 systemd[5369]: Reached target Default.
    Mar  7 15:26:30 host58 systemd[5369]: Startup finished in 144ms.
    Mar  7 15:26:30 host58 systemd[1]: Started User Manager for UID 0.
    Mar  7 15:26:30 host58 systemd[1]: Started Session c11 of user root.
    Mar  7 15:26:31 host58 systemd[1]: session-c11.scope: Succeeded.
    Mar  7 15:26:31 host58 systemd-logind[1231]: Session c11 logged out. Waiting for processes to exit.
    Mar  7 15:26:31 host58 systemd-logind[1231]: Removed session c11.
    Mar  7 15:26:41 host58 systemd[1]: Stopping User Manager for UID 0...
    Mar  7 15:26:41 host58 systemd[5369]: Stopped target Default.
    Mar  7 15:26:41 host58 systemd[5369]: Stopped target Basic System.
    Mar  7 15:26:41 host58 systemd[5369]: Stopped target Sockets.
    Mar  7 15:26:41 host58 systemd[5369]: Stopped target Paths.
    Mar  7 15:26:41 host58 systemd[5369]: Stopped target Timers.
    Mar  7 15:26:41 host58 systemd[5369]: Closed D-Bus User Message Bus Socket.
    Mar  7 15:26:41 host58 systemd[5369]: Closed Multimedia System.
    Mar  7 15:26:41 host58 systemd[5369]: Reached target Shutdown.
    Mar  7 15:26:41 host58 systemd[5369]: Started Exit the Session.
    Mar  7 15:26:41 host58 systemd[5369]: Reached target Exit the Session.
    Mar  7 15:26:41 host58 systemd[1]: user@0.service: Succeeded.
    Mar  7 15:26:41 host58 systemd[1]: Stopped User Manager for UID 0.
    Mar  7 15:26:41 host58 systemd[1]: Stopping User runtime directory /run/user/0...
    Mar  7 15:26:41 host58 systemd[1]: run-user-0.mount: Succeeded.
    Mar  7 15:26:41 host58 systemd[1]: user-runtime-dir@0.service: Succeeded.
    Mar  7 15:26:41 host58 systemd[1]: Stopped User runtime directory /run/user/0.
    Mar  7 15:26:41 host58 systemd[1]: Removed slice User Slice of UID 0.
    Mar  7 15:26:42 host58 lim[6332]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
    Mar  7 15:26:42 host58 lim[6332]: main: LIM has exited due to a fatal error.
    Mar  7 15:27:05 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:27:05 host58 systemd[1]: Created slice User Slice of UID 0.
    Mar  7 15:27:05 host58 systemd[1]: Starting User runtime directory /run/user/0...
    Mar  7 15:27:05 host58 systemd-logind[1231]: New session c12 of user root.
    Mar  7 15:27:05 host58 systemd[1]: Started User runtime directory /run/user/0.
    Mar  7 15:27:05 host58 systemd[1]: Starting User Manager for UID 0...
    Mar  7 15:27:05 host58 systemd[6348]: Starting D-Bus User Message Bus Socket.
    Mar  7 15:27:05 host58 systemd[6348]: Listening on Multimedia System.
    Mar  7 15:27:05 host58 systemd[6348]: Reached target Timers.
    Mar  7 15:27:05 host58 systemd[6348]: Reached target Paths.
    Mar  7 15:27:05 host58 systemd[6348]: Listening on D-Bus User Message Bus Socket.
    Mar  7 15:27:05 host58 systemd[6348]: Reached target Sockets.
    Mar  7 15:27:05 host58 systemd[6348]: Reached target Basic System.
    Mar  7 15:27:05 host58 systemd[6348]: Reached target Default.
    Mar  7 15:27:05 host58 systemd[6348]: Startup finished in 133ms.
    Mar  7 15:27:05 host58 systemd[1]: Started User Manager for UID 0.
    Mar  7 15:27:05 host58 systemd[1]: Started Session c12 of user root.
    Mar  7 15:27:05 host58 lim[6408]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
    Mar  7 15:27:05 host58 lim[6408]: main: LIM has exited due to a fatal error.
    Mar  7 15:27:06 host58 systemd-logind[1231]: Session c12 logged out. Waiting for processes to exit.
    Mar  7 15:27:41 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:27:41 host58 systemd-logind[1231]: New session c13 of user root.
    Mar  7 15:27:41 host58 systemd[1]: Started Session c13 of user root.
    Mar  7 15:27:53 host58 systemd-logind[1231]: Session c13 logged out. Waiting for processes to exit.
    Mar  7 15:27:53 host58 systemd[1]: session-c13.scope: Succeeded.
    Mar  7 15:27:53 host58 systemd-logind[1231]: Removed session c13.
    Mar  7 15:29:05 host58 systemd[1]: Starting Cleanup of Temporary Directories...
    Mar  7 15:29:05 host58 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
    Mar  7 15:29:05 host58 systemd[1]: Started Cleanup of Temporary Directories.
    Mar  7 15:33:05 host58 systemd[1]: Starting dnf makecache...
    Mar  7 15:33:06 host58 dnf[6549]: Updating Subscription Management repositories.
    Mar  7 15:33:07 host58 nfsrahead[6565]: setting /home/lsfadmin readahead to 128
    Mar  7 15:33:10 host58 dnf[6549]: Metadata cache refreshed recently.
    Mar  7 15:33:10 host58 systemd[1]: dnf-makecache.service: Succeeded.
    Mar  7 15:33:10 host58 systemd[1]: Started dnf makecache.
    Mar  7 15:33:28 host58 lim[6586]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
    Mar  7 15:33:28 host58 lim[6586]: main: LIM has exited due to a fatal error.
    Mar  7 15:33:32 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:44:16 host58 dbus-daemon[1225]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.1577' (uid=0 pid=6729 comm="sudo rm test_egosc_ " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
    Mar  7 15:44:16 host58 systemd[1]: Starting Fingerprint Authentication Daemon...
    Mar  7 15:44:17 host58 dbus-daemon[1225]: [system] Successfully activated service 'net.reactivated.Fprint'
    Mar  7 15:44:17 host58 systemd[1]: Started Fingerprint Authentication Daemon.
    Mar  7 15:44:19 host58 systemd-logind[1231]: Existing logind session ID 2 used by new audit session, ignoring.
    Mar  7 15:44:19 host58 systemd-logind[1231]: New session c14 of user root.
    Mar  7 15:44:19 host58 systemd[1]: Started Session c14 of user root.
    Mar  7 15:44:19 host58 systemd-logind[1231]: Session c14 logged out. Waiting for processes to exit.
    Mar  7 15:44:19 host58 systemd[1]: session-c14.scope: Succeeded.
    Mar  7 15:44:19 host58 systemd-logind[1231]: Removed session c14.
    Mar  7 15:44:47 host58 systemd[1]: fprintd.service: Succeeded.
    Mar  7 15:46:10 host58 nfsrahead[7093]: setting /home/lsfadmin readahead to 128
    Mar  7 15:46:36 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:47:38 host58 lim[7153]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
    Mar  7 15:47:38 host58 lim[7153]: main: LIM has exited due to a fatal error.
    Mar  7 15:50:17 host58 nfsrahead[7193]: setting /home/lsfadmin readahead to 128
    Mar  7 15:50:37 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:50:46 host58 nfsrahead[7237]: setting /home/lsfadmin readahead to 128
    Mar  7 15:51:31 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:53:56 host58 nfsrahead[7304]: setting /home/lsfadmin readahead to 128
    Mar  7 15:54:21 host58 systemd[1]: home-lsfadmin.mount: Succeeded.
    Mar  7 15:55:19 host58 dbus-daemon[1225]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.1646' (uid=0 pid=7333 comm="sudo cat /var/log/messages " label="unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023")
    Mar  7 15:55:19 host58 systemd[1]: Starting Fingerprint Authentication Daemon...
    Mar  7 15:55:19 host58 dbus-daemon[1225]: [system] Successfully activated service 'net.reactivated.Fprint'
    Mar  7 15:55:19 host58 systemd[1]: Started Fingerprint Authentication Daemon.
    [lsfadmin@host58 conf]$
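
    Most of the log above is systemd session activity. A minimal sketch for pulling out only the LIM entries, assuming the syslog format shown above (the process name followed by a PID in brackets). The function name lim_log_lines is hypothetical.

```shell
# lim_log_lines FILE
# Hypothetical helper: print only the syslog lines written by the lim daemon.
lim_log_lines() {
    grep -E ' lim\[[0-9]+\]: ' "$1"
}
```

    For example, lim_log_lines /var/log/messages, run as a user that is allowed to read that file.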




    ------------------------------
    roy al nabbout
    ------------------------------



  • 49.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Thu March 07, 2024 08:56 PM

    Log on to host58 as root and make sure the lim process is not running. Then:

    1) source LSF_TOP/conf/profile.lsf

    2) env | egrep "LSF|EGO" to list the related parameters

    3) lsadmin ckconfig -v to see whether it reports any errors

    4) if there is no error in 3), run lsadmin limstartup

    5) if there is an error in 3), post the updated lsf.conf and ego.conf here so we can take another look
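
    Steps 2 to 4 above can be sketched as two small shell functions, assuming profile.lsf has already been sourced in the current shell. The function names list_lsf_env and try_lim_startup are hypothetical; lsadmin ckconfig -v and lsadmin limstartup are the commands named in the steps. The sketch also assumes that lsadmin returns a non-zero status when ckconfig finds errors.

```shell
# list_lsf_env: show variables whose names contain LSF_ or EGO_ (step 2).
list_lsf_env() {
    env | grep -E '^[A-Z_]*(LSF|EGO)_[A-Z0-9_]*=' | sort
}

# try_lim_startup: run the configuration check and start LIM only if it
# passes (steps 3 and 4).
# Assumption: lsadmin exits non-zero when ckconfig reports errors.
try_lim_startup() {
    if ! command -v lsadmin >/dev/null 2>&1; then
        echo "lsadmin not found on PATH; source profile.lsf first" >&2
        return 127
    fi
    if lsadmin ckconfig -v; then
        lsadmin limstartup
    else
        echo "ckconfig reported errors; review lsf.conf and ego.conf" >&2
        return 1
    fi
}
```

    Running list_lsf_env first shows what profile.lsf actually set, and try_lim_startup then attempts the start only after the check passes.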



    ------------------------------
    YI SUN
    ------------------------------



  • 50.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Fri March 08, 2024 04:00 AM
    Edited by roy al nabbout Fri March 08, 2024 04:12 AM

    For your information, host57 is the master host and host58 is a compute host. Here are the lsf.conf and ego.conf files:


    lsf.conf : 

    cat  lsf.conf
    # This file is produced automatically by lsfconfig according to
    # installation setup. Refer to "Administering IBM Spectrum LSF"
    # before changing any parameters in this file.
    # Any changes to the path names of LSF files must be reflected
    # in this file. Make these changes with caution.

    # After editing this file, run "lsadmin reconfig" and
    # "badmin mbdrestart" to apply your changes.
    LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
    LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
    LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib


    LSB_SHAREDIR=/nfsdata/work

    # Configuration directories
    LSF_CONFDIR=/nfsdata/conf
    LSB_CONFDIR=/nfsdata/conf/lsbatch

    # Daemon log messages
    LSF_LOGDIR=/nfsdata/log
    LSF_LOG_MASK=LOG_WARNING

    # Batch mail message handling
    LSB_MAILTO=!U

    # Miscellaneous
    LSF_AUTH=eauth
    LSB_NCPU_ENFORCE=1

    # General lsfinstall variables
    LSF_MANDIR=/nfsdata/10.1/man
    LSF_INCLUDEDIR=/nfsdata/10.1/include
    LSF_MISC=/nfsdata/10.1/misc
    XLSF_APPDIR=/nfsdata/10.1/misc
    LSF_ENVDIR=/nfsdata/conf

    # Internal variable to distinguish Default Install
    LSF_DEFAULT_INSTALL=y

    # Internal variable indicating operation mode
    LSB_MODE=batch

    # Other variables
    LSF_LIM_PORT=7869
    LSF_RES_PORT=6878
    LSB_MBD_PORT=6881
    LSB_SBD_PORT=6882

    # Enable mbd query child
    LSB_QUERY_PORT=6891
    LSF_DYNAMIC_HOST_WAIT_TIME=60

    # WARNING: Please do not delete/modify next line!!
    LSF_LINK_PATH=n

    # LSF_MACHDEP and LSF_INDEP are reserved to maintain
    # backward compatibility with legacy lsfsetup.
    # They are not used in the new lsfinstall.
    LSF_INDEP=/nfsdata
    LSF_MACHDEP=/nfsdata/10.1

    LSF_TOP=/nfsdata
    LSF_VERSION=10.1
    LSF_ENABLE_EGO=Y
    LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_WORKDIR=/nfsdata/work/cluster1/ego
    LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir

    # Default tuning parameters

    # Enable strict resource requirement syntax to select section
    LSF_STRICT_RESREQ=Y
    # Automatically shuts down any daemons running on hosts that attempted to
    # join the cluster, but failed to communicate within the
    # LSF_DYNAMIC_HOST_WAIT_TIME period.
    EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
    # Enable bmod to modify resource limits and location of job output files for running jobs
    LSB_MOD_ALL_JOBS=Y
    # Reduce pim update frequency
    LSF_PIM_SLEEPTIME_UPDATE=Y
    LSF_PIM_LINUX_ENHANCE=Y
    LSF_UNIT_FOR_LIMITS=MB
    # Do not lock lim when running exclusive jobs
    LSB_DISABLE_LIMLOCK_EXCL=Y
    # Display the execution host in the output of the command bsub -K
    LSB_SUBK_SHOW_EXEC_HOST=Y

    # Do not allow lsrun by default to encourage use of bsub
    LSF_DISABLE_LSRUN=Y

    # Turn off RES syncup to reduce traffic to master
    LSF_RES_SYNCUP_INTERVAL=0

    # Add slots information to the bjobs output
    LSB_BJOBS_DISPLAY_ENH=Y
    LSB_QUERY_ENH=Y
    #LSF_LIC_SCHED_HOST= # License scheduler host
    DAEMON_SHUTDOWN_DELAY=180

    LSF_PROCESS_TRACKING=Y
    LSF_LINUX_CGROUP_ACCT=Y

    LSB_ENABLE_HPC_ALLOCATION=Y
    LSB_BJOBS_PENDREASON_LEVEL=1
    LSF_MASTER_LIST="host57.secure-ic.adds"
    LSF_EGO_DAEMON_CONTROL=N
    [root@host58 conf]#

    ego.conf:

    [root@host58 conf]# cd ego
    [root@host58 ego]# ls
    cluster1
    [root@host58 ego]# cd cluster1/
    [root@host58 cluster1]# cd kernel/
    [root@host58 kernel]# cat ego.conf
    # $RCSfile$Revision$Date$
    # EGO kernel parameters configuration file
    #

    # EGO master candidate host
    EGO_MASTER_LIST="host57.secure-ic.adds"

    # EGO daemon port number
    EGO_KD_PORT=7870
    EGO_PEM_PORT=7871

    # EGO service directory
    EGO_ESRVDIR=/nfsdata/conf/ego/cluster1/eservice

    # EGO security configuration
    EGO_SEC_PLUGIN=sec_ego_default
    EGO_SEC_CONF=/nfsdata/conf/ego/cluster1/kernel

    # EGO event configuration
    #EGO_EVENT_MASK=LOG_INFO
    #EGO_EVENT_PLUGIN=eventplugin_snmp[SINK=host,MIBDIRS=/nfsdata/conf/ego/cluster1/kernel/mibs]

    # Parameters related to dynamic adding/removing host
    # EGO_GET_CONF=LIM

    EGO_CONFDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_TOP=/nfsdata
    [root@host58 kernel]#



    logs file:

    Mar  8 08:59:01 host58 systemd-logind[1231]: New session 13 of user lsfadmin.
    Mar  8 08:59:01 host58 systemd[1]: Started User runtime directory /run/user/1005.
    Mar  8 08:59:01 host58 systemd[1]: Starting User Manager for UID 1005...
    Mar  8 08:59:01 host58 nfsrahead[18546]: setting /home/lsfadmin readahead to 128
    Mar  8 08:59:02 host58 systemd[18520]: Listening on Sound System.
    Mar  8 08:59:02 host58 systemd[18520]: Reached target Paths.
    Mar  8 08:59:02 host58 systemd[18520]: Started Mark boot as successful after the user session has run 2 minutes.
    Mar  8 08:59:02 host58 systemd[18520]: Reached target Timers.
    Mar  8 08:59:02 host58 systemd[18520]: Starting D-Bus User Message Bus Socket.
    Mar  8 08:59:02 host58 systemd[18520]: Listening on Multimedia System.
    Mar  8 08:59:02 host58 systemd[18520]: Listening on D-Bus User Message Bus Socket.
    Mar  8 08:59:02 host58 systemd[18520]: Reached target Sockets.
    Mar  8 08:59:02 host58 systemd[18520]: Reached target Basic System.
    Mar  8 08:59:02 host58 systemd[1]: Started User Manager for UID 1005.
    Mar  8 08:59:02 host58 systemd[18520]: Starting Sound Service...
    Mar  8 08:59:02 host58 systemd[1]: Started Session 13 of user lsfadmin.
    Mar  8 08:59:03 host58 systemd[18520]: Started D-Bus User Message Bus.
    Mar  8 08:59:03 host58 systemd[18520]: Started Sound Service.
    Mar  8 08:59:03 host58 systemd[18520]: Reached target Default.
    Mar  8 08:59:03 host58 systemd[18520]: Startup finished in 1.247s.
    Mar  8 09:01:05 host58 systemd[18520]: Starting Mark boot as successful...
    Mar  8 09:01:05 host58 systemd[18520]: Started Mark boot as successful.
    Mar  8 09:58:53 host58 lim[20111]: main: initenv_((null)) failed. There is either a configuration error in ego.conf or some mandatory parameters are missing in ego.conf or in one or more environment variables.
    Mar  8 09:58:53 host58 lim[20111]: main: LIM has exited due to a fatal error.
    [root@host58 kernel]#

    hosts file:

    [root@host58 conf]# sudo cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.240.112.58 host58
    10.240.112.57 host57
    [root@host58 conf]#



    ------------------------------
    roy al nabbout
    ------------------------------



  • 51.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Sun March 10, 2024 09:37 PM

    It seems you enabled EGO during installation. Let's try the following on host58.

    1. kill the lim process
    2. as root, run ". profile.lsf"
    3. env | grep EGO
    4. unset all EGO_* environment variables listed in 3)
    5. set LSF_ENABLE_EGO=N and comment out LSF_EGO_ENVDIR in lsf.conf
    6. run lsadmin limstartup
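
    Steps 3 to 5 can be sketched as follows, assuming lsf.conf uses plain KEY=VALUE lines like the file posted earlier in this thread. The function names unset_ego_vars and disable_ego_in_conf are hypothetical and not LSF tools. The edit keeps a backup copy of the file with a .bak suffix.

```shell
# unset_ego_vars: unset every exported EGO_* variable (steps 3 and 4).
# Call it as a function in the current shell; a subshell would not affect
# the environment that lsadmin later inherits.
unset_ego_vars() {
    for name in $(env | sed -n 's/^\(EGO_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$name"
    done
}

# disable_ego_in_conf FILE: set LSF_ENABLE_EGO=N and comment out
# LSF_EGO_ENVDIR (step 5). The original file is saved as FILE.bak.
# Lines that are absent from FILE are not added.
disable_ego_in_conf() {
    sed -i.bak \
        -e 's/^LSF_ENABLE_EGO=.*/LSF_ENABLE_EGO=N/' \
        -e 's/^LSF_EGO_ENVDIR=/#&/' \
        "$1"
}
```

    After calling both functions, step 6 (lsadmin limstartup) can be run from the same shell.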


    ------------------------------
    YI SUN
    ------------------------------



  • 52.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon March 11, 2024 04:47 AM
    Edited by roy al nabbout Mon March 11, 2024 06:00 AM

    Thank you for these steps; they solved that problem. However, bhosts, bsub, and badmin reconfig still fail with 'User permission denied'. I'm using lsfadmin on both servers, and the account has the same uid, gid, and groups on each. I have also verified that it has full permissions on the directory.

    [lsfadmin@host58 conf]$ lsid
    IBM Spectrum LSF Community Edition 10.1.0.12, Jun 10 2021
    Copyright IBM Corp. 1992, 2016. All rights reserved.
    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

    My cluster name is cluster1
    My master name is host57.secure-ic.adds
    [lsfadmin@host58 conf]$ lshosts
    HOST_NAME      type    model  cpuf ncpus maxmem maxswp server RESOURCES
    host57.secu  X86_64 Intel_E5  12.5    16  30.6G  15.5G    Yes (mg)
    host58.secu  X86_64 Intel_E5  12.5    16  30.6G  15.5G    Yes ()
    [lsfadmin@host58 conf]$ bhosts
    User permission denied
    [lsfadmin@host58 conf]$ badmin mbdrestart

    Checking configuration files ...

    No errors found.

    Failed: User permission denied
    [lsfadmin@host58 conf]$ badmin reconfig

    Checking configuration files ...

    No errors found.

    Failed: User permission denied
    [lsfadmin@host58 conf]$ cd ..
    [lsfadmin@host58 nfsdata]$ bsub < simple.sh
    User permission denied. Job not submitted.
    [lsfadmin@host58 nfsdata]$ ll
    total 1701784
    drwxr-xr-x. 12 lsfadmin lsfadmin       4096 Mar  8 14:50 10.1
    drwxrwxr-x.  3 lsfadmin lsfadmin         23 Mar  7 14:29 builds
    drwxr-xr-x.  5 lsfadmin lsfadmin       4096 Mar  8 14:50 conf
    -rwxr-xr-x.  1 lsfadmin lsfadmin        173 Feb 19 10:40 gedit.sh
    -rwxr-xr-x.  1 lsfadmin lsfadmin         27 Feb 19 10:16 hello.sh
    drwxr-xr-x.  2 lsfadmin lsfadmin       4096 Mar 11 09:21 log
    -rw-r--r--.  1 lsfadmin lsfadmin        417 May 27  2016 LSF_redist.txt
    drwxr-xr-x.  4 lsfadmin lsfadmin         28 Jul 19  2021 lsfsce10.2.0.12-x86_64
    -rwxr-xr-x.  1 lsfadmin lsfadmin 1742579740 Jan 30 13:51 lsfsce10.2.0.12-x86_64.tar.gz
    drwxr-xr-x.  5 lsfadmin lsfadmin         68 Mar  8 14:48 patch
    -rw-r--r--.  1 lsfadmin lsfadmin        753 Mar  8 14:49 patch.conf
    drwxr-xr-x.  3 lsfadmin lsfadmin         21 Mar  8 14:50 properties
    -rw-rw-r--.  1 lsfadmin lsfadmin          0 Mar  7 14:52 simple_job_628.err
    -rw-rw-r--.  1 lsfadmin lsfadmin       1564 Mar  7 14:52 simple_job_628.out
    -rw-rw-r--.  1 lsfadmin lsfadmin          0 Mar 11 09:21 simple_job_836.err
    -rw-rw-r--.  1 lsfadmin lsfadmin       1622 Mar 11 09:21 simple_job_836.out
    -rwxr-xr-x.  1 lsfadmin lsfadmin        115 Feb 19 09:14 simple.sh
    -rwxr-xr-x.  1 lsfadmin lsfadmin        230 Feb 20 14:14 testp_job.sh
    -rwxr-xr-x.  1 lsfadmin lsfadmin          0 Feb 13 14:22 testroy
    drwxr-xr-x.  3 lsfadmin lsfadmin         22 Feb 29 15:06 work
    [lsfadmin@host58 nfsdata]$



    [lsfadmin@host58 nfsdata]$ tail -n 100 /nfsdata/log/mbatchd.log.host57.secure-ic.adds
    Mar  8 08:51:58 2024 17990:17990 3 10.1                                 ncb_openLogFile: The file </nfsdata/work/cluster1/logdir/lsb.ncb.events> must be owned by <lsfadmin>, and the file permission mode must be 644 (-rw-r--r--).
    Mar  8 08:51:58 2024 17990:17990 3 10.1 ncb_initLogFile: ncb_openLogFile(/nfsdata/work/cluster1/logdir/lsb.ncb.events) failed.
    Mar  8 08:51:58 2024 17990:17990 3 10.1 ncb_check: ncb_initLogFile() failed.
    Mar  8 08:52:22 2024 17990:17990 3 10.1 mbdReConf: start
    Mar  8 08:52:22 2024 17990:17990 3 10.1 mbdReConf: done
    Mar  8 10:07:54 2024 17990:17990 3 10.1 mbdReConf: start
    Mar  8 10:07:55 2024 17990:17990 3 10.1 mbdReConf: done
    Mar  8 10:43:47 2024 38092:38095 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47667 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  8 10:43:47 2024 38092:38095 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47667
    Mar  8 10:47:02 2024 38600:38603 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 45305 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  8 10:47:02 2024 38600:38603 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:45305
    Mar  8 10:47:34 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 52871 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  8 10:47:34 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:52871
    Mar  8 11:11:57 2024 42541:42545 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 58247 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  8 11:11:57 2024 42541:42545 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:58247
    Mar  8 11:16:26 2024 43240:43256 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55819 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  8 11:16:26 2024 43240:43256 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55819
    Mar  8 11:53:39 2024 17990:17990 3 10.1 mbdReConf: start
    Mar  8 11:53:40 2024 17990:17990 3 10.1 mbdReConf: done
    Mar  9 06:30:30 2024 224659:224677 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 58239 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar  9 06:30:30 2024 224659:224677 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:58239
    Mar 11 03:55:24 2024 643598:643604 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 51361 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 03:55:24 2024 643598:643604 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:51361
    Mar 11 04:18:08 2024 648872:648877 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 57725 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:18:08 2024 648872:648877 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:57725
    Mar 11 04:18:55 2024 648983:648998 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 43415 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:18:55 2024 648983:648998 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:43415
    Mar 11 04:19:06 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 59477 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:19:06 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:59477
    Mar 11 04:22:55 2024 651370:651375 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 42539 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:22:55 2024 651370:651375 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:42539
    Mar 11 04:26:01 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47393 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:26:01 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47393
    Mar 11 04:30:21 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 50605 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:30:21 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:50605
    Mar 11 04:37:04 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:47177 is not using <0/eauth> authentication
    Mar 11 04:37:34 2024 17990:17990 3 10.1 mbdReConf: start
    Mar 11 04:37:34 2024 17990:17990 3 10.1 mbdReConf: done
    Mar 11 04:37:56 2024 17990:17990 3 10.1 mbdReConf: start
    Mar 11 04:37:56 2024 17990:17990 3 10.1 mbdReConf: done
    Mar 11 04:38:26 2024 17990:17990 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 56079 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:38:26 2024 17990:17990 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:56079
    Mar 11 04:38:43 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:52967 is not using <0/eauth> authentication
    Mar 11 04:38:53 2024 17990:17990 3 10.1 userok: Client 10.240.112.57:35959 is not using <0/eauth> authentication
    Mar 11 04:39:07 2024 17990:17990 3 10.1 mbdReConf: start
    Mar 11 04:39:07 2024 17990:17990 3 10.1 mbdReConf: done
    Mar 11 04:39:25 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 47915 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:39:25 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:47915
    Mar 11 04:43:30 2024 654742:654746 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 37575 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:43:30 2024 654742:654746 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:37575
    Mar 11 04:43:44 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 38255 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:43:44 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:38255
    Mar 11 04:43:52 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55477 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:43:52 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55477
    Mar 11 04:44:05 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 37593 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 04:44:05 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:37593
    Mar 11 05:54:09 2024 665495:665499 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 42147 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 05:54:09 2024 665495:665499 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:42147
    Mar 11 05:55:32 2024 654090:654090 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 56371 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 11 05:55:32 2024 654090:654090 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:56371
    [lsfadmin@host58 nfsdata]$


    The logs show an eauth authentication error.
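
    To see at a glance which clients are being rejected, the "eauth authentication failed" lines can be counted per user and client address. A minimal sketch, assuming the log format shown above; the function name summarize_eauth_failures is hypothetical:

```shell
#!/bin/sh
# Print "<count> <user>/<client-ip>" for each source of failed eauth checks
# in the mbatchd log given as $1. The function name is illustrative only.
summarize_eauth_failures() {
    sed -n 's/.*userok: eauth authentication failed for \([^:]*\):[0-9][0-9]*.*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $1, $2 }'
}
```

    For example: summarize_eauth_failures /nfsdata/log/mbatchd.log.host57.secure-ic.adds. In the excerpt above, every failure comes from lsfadmin at 10.240.112.58 (host58).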
    ------------------------------
    roy al nabbout
    ------------------------------



  • 53.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon March 11, 2024 08:46 PM

    Do you have the same problem on host57? It seems that authentication of the lsfadmin account fails. Are you sure the lsfadmin account settings on host58 and host57 are the same?



    ------------------------------
    YI SUN
    ------------------------------



  • 54.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue March 12, 2024 03:55 AM
    Edited by roy al nabbout Tue March 12, 2024 03:57 AM

    No, the problem occurs only on host58, and I'm sure that lsfadmin is the same on both hosts:

    [lsfadmin@host57 ~]$ id lsfadmin
    uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)
    [lsfadmin@host57 ~]$

    [lsfadmin@host58 ~]$ id lsfadmin
    uid=1005(lsfadmin) gid=1006(lsfadmin) groups=1006(lsfadmin),10(wheel)
    [lsfadmin@host58 ~]$



    ------------------------------
    roy al nabbout
    ------------------------------



  • 55.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue March 12, 2024 10:41 AM

    The logs:

    Mar 12 06:01:18 2024 227725:227732 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 59509 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 06:01:18 2024 227725:227732 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:59509
    Mar 12 09:36:34 2024 259758:259762 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 60447 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 09:36:34 2024 259758:259762 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:60447
    Mar 12 09:38:51 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 49835 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 09:38:51 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:49835
    Mar 12 10:03:10 2024 264359:264365 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 48571 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 10:03:10 2024 264359:264365 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:48571
    Mar 12 10:03:18 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 46045 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 10:03:18 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:46045
    Mar 12 10:04:57 2024 18193:18193 3 10.1 mbdReConf: start
    Mar 12 10:04:57 2024 18193:18193 3 10.1 mbdReConf: done
    Mar 12 10:06:00 2024 264831:264836 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 55477 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 10:06:00 2024 264831:264836 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:55477
    Mar 12 10:14:15 2024 18193:18193 3 10.1 userok: Client 10.240.112.57:44083 is not using <0/eauth> authentication
    Mar 12 10:14:24 2024 18193:18193 3 10.1 mbdReConf: start
    Mar 12 10:14:24 2024 18193:18193 3 10.1 mbdReConf: done
    Mar 12 10:24:21 2024 267648:267651 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 54125 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 10:24:21 2024 267648:267651 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:54125
    Mar 12 10:32:19 2024 18193:18193 3 10.1 mbdReConf: start
    Mar 12 10:32:19 2024 18193:18193 3 10.1 mbdReConf: done
    Mar 12 10:32:34 2024 18193:18193 3 10.1 verifyEAuth/lib.eauth.c: eauth <1005 1006 lsfadmin 10.240.112.58 39891 64 user mbatchd@cluster1 NULL NULL
    > len=64 failed, rc=0
    Mar 12 10:32:34 2024 18193:18193 3 10.1 userok: eauth authentication failed for lsfadmin/10.240.112.58:39891

    My lsf.conf:


    [lsfadmin@host57 conf]$ cat lsf.conf
    # This file is produced automatically by lsfconfig according to
    # installation setup. Refer to "Administering IBM Spectrum LSF"
    # before changing any parameters in this file.
    # Any changes to the path names of LSF files must be reflected
    # in this file. Make these changes with caution.

    # After editing this file, run "lsadmin reconfig" and
    LSF_SERVERDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/etc
    LSF_BINDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/bin
    LSF_LIBDIR=/nfsdata/10.1/linux2.6-glibc2.3-x86_64/lib


    LSB_SHAREDIR=/nfsdata/work

    # Configuration directories
    LSF_CONFDIR=/nfsdata/conf
    LSB_CONFDIR=/nfsdata/conf/lsbatch

    # Daemon log messages
    LSF_LOGDIR=/nfsdata/log
    LSF_LOG_MASK=LOG_WARNING

    # Batch mail message handling
    LSB_MAILTO=!U

    # Miscellaneous
    LSF_AUTH=eauth
    LSB_NCPU_ENFORCE=1

    # General lsfinstall variables
    LSF_MANDIR=/nfsdata/10.1/man
    LSF_INCLUDEDIR=/nfsdata/10.1/include
    LSF_MISC=/nfsdata/10.1/misc
    XLSF_APPDIR=/nfsdata/10.1/misc
    LSF_ENVDIR=/nfsdata/conf

    # Internal variable to distinguish Default Install
    LSF_DEFAULT_INSTALL=y

    # Internal variable indicating operation mode
    LSB_MODE=batch

    # Other variables
    LSF_LIM_PORT=7869
    LSF_RES_PORT=6878
    LSB_MBD_PORT=6881
    LSB_SBD_PORT=6882

    # Enable mbd query child
    LSB_QUERY_PORT=6891
    LSF_DYNAMIC_HOST_WAIT_TIME=60

    # WARNING: Please do not delete/modify next line!!
    LSF_LINK_PATH=n

    # LSF_MACHDEP and LSF_INDEP are reserved to maintain
    # backward compatibility with legacy lsfsetup.
    # They are not used in the new lsfinstall.
    LSF_INDEP=/nfsdata
    LSF_MACHDEP=/nfsdata/10.1

    LSF_TOP=/nfsdata
    LSF_VERSION=10.1
    LSF_ENABLE_EGO=N
    # LSF_EGO_ENVDIR=/nfsdata/conf/ego/cluster1/kernel
    EGO_WORKDIR=/nfsdata/work/cluster1/ego
    LSF_LIVE_CONFDIR=/nfsdata/work/cluster1/live_confdir

    # Default tuning parameters

    # Enable strict resource requirement syntax to select section
    LSF_STRICT_RESREQ=Y
    # Automatically shuts down any daemons running on hosts that attempted to
    # join the cluster, but failed to communicate within the
    # LSF_DYNAMIC_HOST_WAIT_TIME period.
    EGO_ENABLE_AUTO_DAEMON_SHUTDOWN=Y
    # Enable bmod to modify resource limits and location of job output files for running jobs
    LSB_MOD_ALL_JOBS=Y
    # Reduce pim update frequency
    LSF_PIM_SLEEPTIME_UPDATE=Y
    LSF_PIM_LINUX_ENHANCE=Y
    LSF_UNIT_FOR_LIMITS=MB
    # Do not lock lim when running exclusive jobs
    LSB_DISABLE_LIMLOCK_EXCL=Y
    # Display the execution host in the output of the command bsub -K
    LSB_SUBK_SHOW_EXEC_HOST=Y

    # Do not allow lsrun by default to encourage use of bsub
    LSF_DISABLE_LSRUN=Y

    # Turn off RES syncup to reduce traffic to master
    LSF_RES_SYNCUP_INTERVAL=0

    # Add slots information to the bjobs output
    LSB_BJOBS_DISPLAY_ENH=Y
    LSB_QUERY_ENH=Y
    #LSF_LIC_SCHED_HOST= # License scheduler host
    DAEMON_SHUTDOWN_DELAY=180

    LSF_PROCESS_TRACKING=Y
    LSF_LINUX_CGROUP_ACCT=Y

    LSB_ENABLE_HPC_ALLOCATION=Y
    LSB_BJOBS_PENDREASON_LEVEL=1
    LSF_MASTER_LIST="host57.secure-ic.adds"
    LSF_EGO_DAEMON_CONTROL=N
    [lsfadmin@host57 conf]$

    Can you help me solve this problem? Thank you!



    ------------------------------
    roy al nabbout
    ------------------------------



  • 56.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Tue March 12, 2024 12:13 PM

    Maybe try the following to skip authentication:

    1. Shut down the LSF daemons on both hosts.
    2. Set LSF_AUTH=none.
    3. Set LSF_STRICT_CHECKING=N.
    4. Set LSF_AUTH_QUERY_COMMANDS=N.
    5. Start up the LSF daemons on both hosts.
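
    The three lsf.conf changes in steps 2 to 4 could be applied with a small helper like the one below. This is only a sketch: set_lsf_param is a hypothetical function, not an LSF command, and it assumes each parameter appears at most once, at the start of a line, in KEY=VALUE form.

```shell
#!/bin/sh
# Set KEY=VALUE in an lsf.conf-style file: replace an existing
# assignment, or append one if the key is absent.
# Usage: set_lsf_param <conf-file> <key> <value>
set_lsf_param() {
    conf="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$conf"; then
        # Write to a temporary file, then overwrite the original in place
        # so that its owner and permission mode are preserved.
        sed "s|^${key}=.*|${key}=${value}|" "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" &&
            rm -f "$conf.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}
```

    With the daemons stopped on both hosts, this would be called three times, e.g. set_lsf_param /nfsdata/conf/lsf.conf LSF_AUTH none, and likewise for LSF_STRICT_CHECKING and LSF_AUTH_QUERY_COMMANDS with the value N.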


    ------------------------------
    YI SUN
    ------------------------------



  • 57.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted Mon February 12, 2024 10:43 AM

    I did as you told me, but the status of lsf.service is still failed. However, lsid and lshosts work.



    ------------------------------
    roy al nabbout
    ------------------------------



  • 58.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 20 days ago

    Hello!

    I am new to installing LSF. I have tried to follow the steps in the guide, but when I run lsid it gives me the following error:

    [screenshot: error lsid]

    Here is my lsf.conf file:


    ------------------------------
    Jesus Diego Martínez
    ------------------------------



  • 59.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 19 days ago

    Hi Jesus,

    Make sure you source the LSF profile before you access the LSF cluster (e.g. ". /usr/share/lsf/conf/profile.lsf"). Before you run lsid, check that the following variables are set (for example with "env | grep LSF"):

    LSF_ENVDIR

    LSF_SERVERDIR

    LSF_BINDIR

    LSF_LIBDIR
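
    The check described above can be wrapped in a few lines of shell. A minimal sketch; check_lsf_env is a hypothetical helper name, and the variable list is the one given above:

```shell
#!/bin/sh
# Report which of the four LSF variables are unset or empty in the
# current shell. Returns 0 if all are set, 1 otherwise.
check_lsf_env() {
    missing=0
    for name in LSF_ENVDIR LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        fi
    done
    return "$missing"
}
```

    Run it after sourcing profile.lsf; a non-zero return status means at least one variable is missing.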



    ------------------------------
    YI SUN
    ------------------------------



  • 60.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 7 days ago

    Hello, good afternoon. Thank you for your earlier response. The lsid command is now found, but it gives me this problem:

    [screenshot: lsid]

    Here is my playbook:

    [screenshot: playbook]

    Do you know how to solve it?



    ------------------------------
    Jesus Diego Martínez
    ------------------------------



  • 61.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 6 days ago

    There is some confusion here. In your previous lsf.conf, LSF_MASTER_LIST=node1, but your install_lsf.yml shows LSF_MASTER_LIST=node2. Also, you may want to change /etc/hosts as follows:

    1. Remove the line "127.0.0.1 jesus-VirtualBox".
    2. Change the node2 entry to "192.168.0.104 node2 jesus-VirtualBox".

    Then stop the LSF services and start them again.
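
    To find entries like the one in step 1, the hosts file can be scanned for loopback addresses that carry the machine's hostname. A minimal sketch; find_loopback_entries is a hypothetical helper, not a standard tool:

```shell
#!/bin/sh
# Print every line of a hosts file that maps a loopback address
# (127.x.x.x or ::1) to the given host name.
# Usage: find_loopback_entries <hosts-file> <hostname>
find_loopback_entries() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # rest of the line is a comment
                if ($i == name) { print; next }
            }
        }' "$1"
}
```

    For example, find_loopback_entries /etc/hosts jesus-VirtualBox prints nothing once the loopback mapping has been removed.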



    ------------------------------
    YI SUN
    ------------------------------



  • 62.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 6 days ago
    Edited by Jesus Diego Martínez 6 days ago

    Hello Yi Sun,
    I have modified the /etc/hosts file as you suggested, but now it tells me that the lsf.conf file is wrong. Here is the lsf.conf file from node2:

    [screenshot: lsid]
    [screenshots: node2 on virtual Box]



    ------------------------------
    Jesus Diego Martínez
    ------------------------------



  • 63.  RE: Ibm spectrum lsf community edition (lsfsce10.2.0.12)

    Posted 3 days ago

    I'm not sure what is really going wrong here. I suggest the following:

    1. Set LSF_SERVERHOSTS="node1 node2".
    2. Add LSF_SERVERDIR, LSF_LIBDIR, and LSF_BINDIR to lsf.conf as well.
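
    Whether these parameters are already present in lsf.conf can be checked with a short loop. A minimal sketch; missing_conf_keys is a hypothetical helper, and it only looks for uncommented KEY= assignments at the start of a line:

```shell
#!/bin/sh
# Print the name of each suggested parameter that has no uncommented
# assignment in the lsf.conf given as $1.
missing_conf_keys() {
    conf="$1"
    for key in LSF_SERVERHOSTS LSF_SERVERDIR LSF_LIBDIR LSF_BINDIR; do
        grep -q "^${key}=" "$conf" || echo "$key"
    done
}
```

    An empty result means all four parameters are defined; any name printed still needs to be added.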


    ------------------------------
    YI SUN
    ------------------------------