List of Contributions

YI SUN

Posted By YI SUN Mon April 15, 2024 12:06 PM
Found In Egroup: High Performance Computing Group
You may check this link to see how LSF supports X forwarding jobs. There are also some other job submission (bsub) options that work with SSH tunneling.
Posted By YI SUN Tue April 09, 2024 12:52 PM
Found In Egroup: High Performance Computing Group
It is now working on other browsers in addition to Chrome.
Posted By YI SUN Wed March 27, 2024 05:35 PM
Found In Egroup: High Performance Computing Group
Yes. It seems Chrome works, but Firefox and Edge fail.
Posted By YI SUN Wed March 20, 2024 01:22 PM
Found In Egroup: High Performance Computing Group
Hi Mitch, could you share the link you use to access LSF Community Edition?
Posted By YI SUN Tue March 12, 2024 12:13 PM
Found In Egroup: High Performance Computing Group
Maybe try the following to skip authentication:
1) Shut down the LSF daemons on both hosts.
2) Set LSF_AUTH=none.
3) Set LSF_STRICT_CHECKING=N.
4) Set LSF_AUTH_QUERY_COMMANDS=N.
5) Start up the LSF daemons on both hosts.
Posted By YI SUN Mon March 11, 2024 08:45 PM
Found In Egroup: High Performance Computing Group
Do you have the same problem on host57? It seems the lsfadmin account authentication fails. Are you sure the lsfadmin account settings on host58 and host57 are the same?
Posted By YI SUN Sun March 10, 2024 09:37 PM
Found In Egroup: High Performance Computing Group
It seems you enabled EGO during installation. Let's try the following on host58:
1) Kill the lim process as root.
2) Run ". profile.lsf".
3) env | grep EGO
4) Unset all EGO_* environment variables listed in 3).
5) Set LSF_ENABLE_EGO=N and comment out LSF_EGO_ENVDIR in lsf.conf.
6) Run lsadmin limstartup.
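The "unset all EGO_* environment variables" step could be scripted roughly as below. This is only a sketch: the function name unset_ego_vars is made up for illustration and is not part of LSF, and it assumes a POSIX-style shell (sh/bash). It has to run in the same shell session that will later start lim, because unsetting variables in a child process does not affect the parent.

```shell
# Hypothetical helper (not shipped with LSF): unset every EGO_* variable
# in the current shell session.
unset_ego_vars() {
    # Keep only the variable names that start with EGO_ (text before "=")
    for name in $(env | sed -n 's/^\(EGO_[A-Za-z0-9_]*\)=.*/\1/p'); do
        unset "$name"
    done
}
```

After calling unset_ego_vars, running env | grep EGO again should print nothing.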
Posted By YI SUN Thu March 07, 2024 08:55 PM
Found In Egroup: High Performance Computing Group
Log on to host58 as root and make sure the lim process is not running.
1) source LSF_TOP/conf/profile.lsf
2) env | egrep "LSF|EGO" to list the related parameters
3) lsadmin ckconfig -v to see if it reports any error
4) If there is no error in 3), run lsadmin limstartup
5) If there is an error in 3), append updated lsf.conf ...
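A small sketch of the environment listing in step 2). Note that the pattern should have no spaces around the "|": with spaces, only lines containing "LSF " or " EGO" would match. grep -E is used here as the equivalent of egrep, and the function name list_lsf_env is an assumption for illustration only.

```shell
# Hypothetical helper: list environment entries that mention LSF or EGO.
list_lsf_env() {
    # No spaces inside the alternation, so LSF_* and EGO_* names both match
    env | grep -E 'LSF|EGO' | sort
}
```

Run it after sourcing the LSF profile in the same session, otherwise the LSF variables will not be there to list.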
Posted By YI SUN Wed March 06, 2024 02:06 PM
Found In Egroup: High Performance Computing Group
Could you disable LSF_EGO_DAEMON_CONTROL=Y, add LSF_SERVERDIR, LSF_LIBDIR, and LSF_BINDIR to the lsf.conf file, and then stop and start the LSF service?
Posted By YI SUN Mon March 04, 2024 04:24 PM
Found In Egroup: High Performance Computing Group
You may check this link. https://community.ibm.com/community/user/cloud/discussion/how-to-run-pytorch-ddp-job-on-multi-nodes#bmcfa6563e-098c-4ad0-88f7-a0615a97de40
Posted By YI SUN Mon March 04, 2024 04:19 PM
Found In Egroup: High Performance Computing Group
How did you launch the 2nd task on host1?
Posted By YI SUN Sun March 03, 2024 03:59 PM
Found In Egroup: High Performance Computing Group
It seems you should submit the test job to queue "gq" rather than "cq".
Posted By YI SUN Fri March 01, 2024 03:09 PM
Found In Egroup: High Performance Computing Group
LSF 10.1.0.14 adds lsfd-lim, lsfd-res, and lsfd-sbatchd service unit files to help automatically restart lim/res/sbatchd if they exit unexpectedly. But this implementation does not seem to be compatible with LSF daemon management through the lsadmin/badmin/bctrld commands. You may consider installing the following patch. The ...
Posted By YI SUN Fri March 01, 2024 03:03 PM
Found In Egroup: High Performance Computing Group
After sourcing the LSF profile, run env | grep LSF to check whether the LSF_SERVERDIR, LSF_BINDIR, and LSF_LIBDIR environment variables are set. The error usually indicates that those three variables are not set. You can also add them manually in the lsf.conf file.
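The check for the three variables could look something like this. It is just a sketch for sh/bash; the function name check_lsf_dirs is hypothetical and not an LSF command.

```shell
# Hypothetical helper: report which of the three LSF directory variables are set.
# Returns 0 if all three are set, 1 if any is missing or empty.
check_lsf_dirs() {
    missing=0
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR; do
        # Indirect variable lookup that also works in plain POSIX sh
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}
```

If any of the three is reported as not set, sourcing the profile again or adding the variable to lsf.conf, as described above, would be the next thing to try.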
Posted By YI SUN Thu February 29, 2024 02:03 PM
Found In Egroup: High Performance Computing Group
It seems to be an LSF environment issue. Before running lsid, you can try "source $LSF_TOP/conf/cshrc.lsf" (csh/tcsh) or ". $LSF_TOP/conf/profile.lsf" (sh/bash) to set up the LSF environment in the shell session. Here LSF_TOP is the top directory of your LSF installation.
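For sh/bash users, sourcing the profile can be wrapped in a small function that first checks the file is readable. This is a sketch under assumptions: load_lsf_profile is a made-up name, and the argument is whatever your LSF_TOP directory is.

```shell
# Hypothetical helper: source profile.lsf from the given LSF_TOP directory
# into the current sh/bash session.
load_lsf_profile() {
    profile="$1/conf/profile.lsf"
    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi
    # Source (not execute) so the variables stay in this shell session
    . "$profile"
}
```

For example, with a hypothetical installation under /opt/lsf, you would run load_lsf_profile /opt/lsf and then lsid in the same session.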
Posted By YI SUN Wed February 28, 2024 01:06 PM
Found In Egroup: High Performance Computing Group
You can use chmod u+s on $LSF_BINDIR/bctrld (owned by the root account) as a workaround.
Posted By YI SUN Thu February 22, 2024 11:53 AM
Found In Egroup: High Performance Computing Group
Try the following to see if a one-node job can get a GPU allocation on host2. On host1:
bsub -I -gpu "num=4" -m host2 nvidia-smi
If the above test is positive, try the following to see if it works on two nodes. On host1:
bsub -n 2 -gpu "num=4/host" -R "type==any span[ptile=1]" blaunch nvidia-smi ...
Posted By YI SUN Fri February 16, 2024 11:08 AM
Found In Egroup: High Performance Computing Group
It seems you have not yet resolved the host name resolution issue on host58 that was mentioned previously. As the same user, I guess there is no problem for you to submit a job and run hosts.
Posted By YI SUN Fri February 16, 2024 11:01 AM
Found In Egroup: High Performance Computing Group
On host57, use the root account to stop and start the LSF services. On host58, you need to source the LSF profile before running any LSF command.
Posted By YI SUN Fri February 16, 2024 10:57 AM
Found In Egroup: High Performance Computing Group
You can do both on the compute node.