High Performance Computing Group

Connect with HPC subject matter experts and discuss how hybrid cloud HPC Solutions from IBM meet today's business needs.

  • 1.  BJOBS reports more memory

    Posted Mon April 14, 2025 05:51 AM

    Hi,

    I am finding regular discrepancies between the memory consumption reported through /proc/meminfo, and the memory consumption reported by bjobs.

    It does not seem to occur in small test cases, but we see it regularly with a major tool.

    Is this something that has previously been reported? Are there any known causes and solutions?

    --

    Thanks & Regards,

    Vijay.



    ------------------------------
    Vijay Pasupuleti
    ------------------------------


  • 2.  RE: BJOBS reports more memory

    Posted Mon April 14, 2025 05:30 PM

    Do you mean max mem usage for job or current mem usage for job?



    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: BJOBS reports more memory

    Posted Tue April 15, 2025 01:16 AM

    Hi Yi,

    It is the Max. Mem.

    Vijay.



    ------------------------------
    Vijay Pasupuleti
    ------------------------------



  • 4.  RE: BJOBS reports more memory

    Posted Thu April 17, 2025 01:20 AM

    Hi Yi,

    Are there any known issues with Max. Mem?



    ------------------------------
    Vijay Pasupuleti
    ------------------------------



  • 5.  RE: BJOBS reports more memory

    Posted Sun April 20, 2025 07:33 PM

    I assume your job runs under LSF cgroup integration, right? Currently, LSF calculates max memory by periodically (roughly every SBD_SLEEP_TIME) reading the job cgroup's memory.usage_in_bytes (rather than the cgroup's memory.max_usage_in_bytes) and updating the job's max mem whenever a sample exceeds the previously recorded value. LSF may therefore miss the actual max mem reached by the job, so it is strange that LSF reports a larger max mem for the job. Do you have LSB_CGROUP_MEM_INCLUDE_CACHE=N set in lsf.conf?
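
    To illustrate why periodic sampling can under-report a peak, here is a minimal Python sketch. It is not LSF code: the function name sampled_max, the trace values, and the sampling interval are all hypothetical, and the trace stands in for successive readings of memory.usage_in_bytes.

```python
def sampled_max(usage_trace, sample_every):
    """Running maximum over every `sample_every`-th reading.

    Mimics a poller that only looks at current usage at fixed
    intervals and keeps the largest value it has seen so far.
    """
    peak = 0
    for i in range(0, len(usage_trace), sample_every):
        peak = max(peak, usage_trace[i])
    return peak


# Hypothetical usage readings in MB, one per second; the spike to 900
# lasts for a single reading.
trace = [100, 120, 900, 130, 110, 140, 150, 125]

true_peak = max(trace)                         # what a max-usage counter would record
observed = sampled_max(trace, sample_every=4)  # what a poller every 4 s would record

print(f"true peak: {true_peak} MB, sampled peak: {observed} MB")
```

    In this sketch the poller only sees the readings at seconds 0 and 4, so the short spike falls between samples. A sampled maximum can be lower than the true peak, but it cannot be higher than it.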

    If you are not using LSF cgroup integration, LSF collects job memory usage through the LSF PIM process. In that case, consider setting the following parameters in lsf.conf:

    LSF_PIM_LINUX_ENHANCE=Y

    LSF_PIM_SMAPS_UPDATE=Y

    EGO_PIM_SWAP_REPORT=Y

    I suggest you create a support case for further investigation of the root cause, sharing how you verify the ps output and check /proc/meminfo.
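
    When describing how /proc/meminfo was checked, it may help to state exactly which fields were compared. Below is a minimal Python sketch, assuming the usual "Key:   value kB" layout of /proc/meminfo; the helper names parse_meminfo and used_kb and the sample numbers are hypothetical, and "used" here is simply MemTotal minus MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are in kB; lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_kb(values):
    """One possible definition of 'used' memory: MemTotal - MemAvailable."""
    return values["MemTotal"] - values["MemAvailable"]


# Hypothetical sample, not taken from a real host.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
MemAvailable:   10000000 kB
Cached:          7000000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"used: {used_kb(info)} kB")
```

    On a real Linux host the same function could be fed the contents of /proc/meminfo. Note that /proc/meminfo describes the whole host rather than a single job, so its figures are not directly comparable to a per-job value from bjobs.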



    ------------------------------
    YI SUN
    ------------------------------