[LSF RTM]Collecting data from 'Kerberos enabled LSF cluster' in RTM 10.2.0.11

    Posted Mon January 11, 2021 02:25 AM

    - Add a new poller to monitor the Kerberos enabled LSF cluster

    1. Log in to the RTM poller host and copy the folder $RTM_TOP/rtm/lsf1017 to a new folder, or install the RTM poller binaries into a new folder, e.g. $RTM_TOP/rtm/lsf1017krb.

    2. Add the new poller via the RTM console. Select "LSF 10.1.0.7" in the LSF Version section.

    3. Enable Kerberos eauth
    # cd $RTM_TOP/rtm/lsf1017krb/etc
    # mv eauth.krb5 eauth
    # chmod u+s eauth
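
    The result of step 3 can be sanity-checked before moving on. The following is a minimal sketch, assuming a POSIX shell; check_eauth is a hypothetical helper name and is not part of RTM or LSF. It only tests that the file exists and carries the setuid bit.

```shell
#!/bin/sh
# Sketch: confirm that an eauth binary exists and has the setuid bit set
# (the expected result of "chmod u+s eauth").
# check_eauth is a hypothetical helper name, not an RTM or LSF command.
check_eauth() {
    eauth_path="$1"
    if [ ! -f "$eauth_path" ]; then
        echo "missing: $eauth_path"
        return 1
    fi
    if [ ! -u "$eauth_path" ]; then
        echo "setuid bit not set: $eauth_path"
        return 1
    fi
    echo "ok: $eauth_path"
}
```

    For example, check_eauth "$RTM_TOP/rtm/lsf1017krb/etc/eauth" should report "ok" once the chmod in step 3 has been applied.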

    - Prepare Kerberos on RTM poller host

    1. On the RTM poller host, install the Kerberos packages below
    • RHEL/CentOS 7/8
    krb5-workstation (provides kinit)
    krb5-devel (provides libkrb5.so, libkrb5support.so, libk5crypto.so for eauth.krb5)
    libcom_err-devel (provides libcom_err.so for eauth.krb5)

    • Ubuntu 18.04
    krb5-user (provides kinit)
    libkrb5-dev (provides libkrb5.so, libkrb5support.so, libk5crypto.so, libcom_err.so for eauth.krb5)
    comerr-dev (provides libcom_err.so for eauth.krb5)

    • SLES 15
    krb5-client (provides kinit)
    krb5-devel (provides libkrb5.so, libkrb5support.so, libk5crypto.so for eauth.krb5)
    libcom_err-devel (provides libcom_err.so for eauth.krb5)

    2. Ensure that the RTM $Daemon_User is a valid user on every member of the LSF master list
    • Under RTM for RHEL/CentOS, the default is "apache"
    • Under RTM for SLES, the default is "wwwrun"
    • Under RTM for Ubuntu, the default is "www-data"

    3. Create a dedicated principal for the RTM poller host. The name format is "RTM_$Daemon_User"
    • On RHEL/CentOS, the default is "RTM_apache"
    • On SLES, the default is "RTM_wwwrun"
    • On Ubuntu, the default is "RTM_www-data"

    4. Add the following principals to the Kerberos keytab file /etc/krb5.keytab on the RTM poller host
    1. The RTM_$Daemon_User principal created in the step above
    2. The Kerberos enabled cluster name and each host in the master list, in the format
    lsf/<cluster name>
    lsf/<primary master>
    lsf/<master candidate> ...

    The Kerberos command is as follows
    # kadmin ktadd -norandkey -k /etc/krb5.keytab <principal>

    5. Add the RTM poller host to lsf.conf
    LSF_ADDON_HOSTS="<RTM poller host name>"
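
    The principal names from steps 3 and 4 can be generated rather than typed by hand. The following is a minimal sketch, assuming a POSIX shell; default_daemon_user and list_poller_principals are hypothetical helper names, and the defaults simply mirror the list in step 2.

```shell
#!/bin/sh
# Sketch: print the principal names that steps 3 and 4 call for.
# Both function names are hypothetical; they are not RTM or LSF commands.

# Map a distribution name to the default RTM daemon user listed in step 2.
default_daemon_user() {
    case "$1" in
        rhel|centos) echo "apache" ;;
        sles)        echo "wwwrun" ;;
        ubuntu)      echo "www-data" ;;
        *)           echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Usage: list_poller_principals <daemon user> <cluster name> <master host>...
list_poller_principals() {
    daemon_user="$1"
    cluster_name="$2"
    shift 2
    echo "RTM_${daemon_user}"
    echo "lsf/${cluster_name}"
    # One lsf/<host> principal per member of the master list.
    for master_host in "$@"; do
        echo "lsf/${master_host}"
    done
}
```

    With the made-up names cluster1, hostA and hostB, list_poller_principals "$(default_daemon_user rhel)" cluster1 hostA hostB prints RTM_apache, lsf/cluster1, lsf/hostA and lsf/hostB, one per line. Each line can then be passed as <principal> to the ktadd command in step 4.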

    - Add the Kerberos enabled LSF cluster on the RTM console

    Note: Errors will occur until the following steps have been completed.

    - Add RTM required users to Kerberos keytab

    1. Log in to the RTM server host and add all principals that RTM requires to the krb5.keytab file, in one of two ways.
    If the remote 'kadmin' privilege 'extract-keys' is granted, run the RTM utility script
    # $RTM_TOP/rtm/bin/createkeytab.sh [-t /<path to>/krb5.keytab]
    Or
    On the RTM server, generate the list of principals that RTM requires
    # $RTM_TOP/rtm/bin/createkeytab.sh -g
    On the KDC server, run the Kerberos keytab command (with either kadmin or kadmin.local)
    # kadmin ktadd -norandkey -k /<path to>/krb5.keytab <all principals above>
    Then copy /<path to>/krb5.keytab back to the RTM server host

    2. Copy /<path to>/krb5.keytab from the RTM server host to the RTM poller host
    On both the RTM server host and the poller host, edit /etc/krb5.conf and add the option below under the 'libdefaults' section
    default_client_keytab_name=/<path to>/krb5.keytab

    3. Run $RTM_TOP/rtm/bin/updateccache.sh immediately to initialize the Kerberos credentials cache

    4. Add the Kerberos credentials cache initialization utility as a cron task on every RTM poller host. The schedule depends on your Kerberos ticket lifetime (defined by "ticket_lifetime" under "libdefaults" in /etc/krb5.conf). The recommended interval is half of the defined lifetime.
    For example, assuming "ticket_lifetime=24h",
    Run #crontab -e, append "0 */12 * * * /opt/IBM/rtm/bin/updateccache.sh > /dev/null 2>&1" in the crontab editor, and save. The leading 0 runs the job once at the start of every 12th hour; a * in the minute field would run it every minute of that hour. List the crontab with
    # crontab -l
    0 */12 * * * /opt/IBM/rtm/bin/updateccache.sh > /dev/null 2>&1
    Or
    Create /etc/cron.d/rtmkrb5 as below
    # cat /etc/cron.d/rtmkrb5
    0 */12 * * * root /opt/IBM/rtm/bin/updateccache.sh > /dev/null 2>&1
    Note:
    If you manually create a new benchmark job with a new user for the Kerberos enabled cluster, repeat steps 1 to 3 above after the new user is created.
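
    The half-lifetime recommendation in step 4 can also be computed rather than worked out by hand. The following is a minimal sketch, assuming a POSIX shell and a lifetime given in whole hours; refresh_cron_line is a hypothetical helper name, and it does not parse /etc/krb5.conf itself.

```shell
#!/bin/sh
# Sketch: build a crontab entry that runs a refresh script at roughly half
# of the Kerberos ticket lifetime. refresh_cron_line is a hypothetical
# helper; it accepts the lifetime in whole hours only (24 for "24h").
refresh_cron_line() {
    lifetime_hours="$1"
    script_path="$2"
    case "$lifetime_hours" in
        ''|*[!0-9]*|0*)
            echo "lifetime must be a positive whole number of hours" >&2
            return 1 ;;
    esac
    interval=$((lifetime_hours / 2))
    if [ "$interval" -lt 1 ]; then
        echo "lifetime too short for an hourly cron schedule" >&2
        return 1
    fi
    if [ "$interval" -ge 24 ]; then
        # Cap at once per day; the hour field cannot express longer gaps.
        echo "0 0 * * * $script_path > /dev/null 2>&1"
    else
        # Minute 0 of every Nth hour. A "*" in the minute field would
        # instead fire every minute during each matching hour.
        echo "0 */$interval * * * $script_path > /dev/null 2>&1"
    fi
}
```

    For the 24h example, refresh_cron_line 24 /opt/IBM/rtm/bin/updateccache.sh yields an entry that starts with "0 */12". For /etc/cron.d files, the user field (root in the example above) still has to be inserted before the script path.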

    ------------------------------
    Edward Deng
    ------------------------------
