High Performance Computing Group

Connect with HPC subject matter experts and discuss how hybrid cloud HPC Solutions from IBM meet today's business needs.

  • 1.  BJOBS reports more memory

    Posted Mon April 14, 2025 05:51 AM

    Hi,

    I am finding regular discrepancies between the memory consumption reported through /proc/meminfo, and the memory consumption reported by bjobs.

    The discrepancy does not seem to appear in small test cases, but we see it regularly with a major tool.

    Is this something that has previously been reported? Are there any known causes and solutions?

    --

    Thanks & Regards,

    Vijay.



    ------------------------------
    Vijay Pasupuleti
    ------------------------------


  • 2.  RE: BJOBS reports more memory

    Posted Mon April 14, 2025 05:30 PM

    Do you mean the max mem usage for the job or the current mem usage for the job?



    ------------------------------
    YI SUN
    ------------------------------



  • 3.  RE: BJOBS reports more memory

    Posted Tue April 15, 2025 01:16 AM

    Hi Yi,

    It is the Max. Mem.

    Vijay.



    ------------------------------
    Vijay Pasupuleti
    ------------------------------



  • 4.  RE: BJOBS reports more memory

    Posted 30 days ago

    Hi Yi,

    Are there any known issues if it were Max. Mem?



    ------------------------------
    Vijay Pasupuleti
    ------------------------------



  • 5.  RE: BJOBS reports more memory

    Posted 27 days ago

    I assume your job runs under LSF cgroup integration, right? Currently LSF calculates max memory by periodically (roughly every SBD_SLEEP_TIME) reading the job cgroup's memory.usage_in_bytes (rather than the cgroup's memory.max_usage_in_bytes) and updating the job's max mem whenever a new sample exceeds the previous maximum. Because of this sampling, LSF may miss the actual max mem reached by the job. It is strange that LSF reports a larger max mem for the job. Do you have LSB_CGROUP_MEM_INCLUDE_CACHE=N set in lsf.conf?
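
To illustrate why a periodically sampled running maximum can fall below the true peak, here is a minimal Python sketch. It is not LSF code: the function name `sampled_max`, the usage trace, and the sampling interval are all made up for illustration, and the only thing it models is "read current usage every N ticks and keep the largest value seen".

```python
def sampled_max(usage_trace, interval):
    """Return the running max over every `interval`-th sample of the trace.

    usage_trace: hypothetical per-tick memory usage values (e.g. in MB).
    interval:    hypothetical sampling period in ticks, standing in for
                 something like SBD_SLEEP_TIME.
    """
    max_seen = 0
    for tick in range(0, len(usage_trace), interval):
        # Compare the current sample with the maximum seen so far.
        max_seen = max(max_seen, usage_trace[tick])
    return max_seen


# Made-up trace with a short spike at tick 2.
trace = [100, 120, 900, 130, 140, 150, 160, 170]

true_peak = max(trace)                    # what a max-usage counter would hold
reported = sampled_max(trace, interval=3)  # samples ticks 0, 3 and 6 only

print(f"true peak: {true_peak}, sampled max: {reported}")
```

With this trace the spike falls between two samples, so the sampled maximum is lower than the true peak. This mechanism can only under-report, which is why a larger value from bjobs points to a different cause.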

    If you are not using LSF cgroup integration, LSF collects job memory usage through the LSF PIM process. In that case you should consider setting the following parameters in lsf.conf.

    LSF_PIM_LINUX_ENHANCE=Y

    LSF_PIM_SMAPS_UPDATE=Y

    EGO_PIM_SWAP_REPORT=Y
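
A quick way to check whether these three parameters are already set is sketched below in Python. It assumes lsf.conf consists of plain KEY=VALUE lines with `#` comments; the helper names `parse_lsf_conf` and `missing_pim_settings` and the sample text are hypothetical, and you would need to read the real file contents from wherever lsf.conf lives on your cluster.

```python
# Parameters suggested in this thread for PIM-based memory collection.
WANTED = ("LSF_PIM_LINUX_ENHANCE", "LSF_PIM_SMAPS_UPDATE", "EGO_PIM_SWAP_REPORT")


def parse_lsf_conf(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_pim_settings(text):
    """Return the wanted parameters that are absent or not set to Y."""
    settings = parse_lsf_conf(text)
    return [key for key in WANTED if settings.get(key, "").upper() != "Y"]


# Made-up sample content; replace with the text of your own lsf.conf.
sample = """
# PIM settings
LSF_PIM_LINUX_ENHANCE=Y
LSF_PIM_SMAPS_UPDATE=N
"""

print(missing_pim_settings(sample))
```

For the sample text this lists LSF_PIM_SMAPS_UPDATE (set to N) and EGO_PIM_SWAP_REPORT (not present) as still needing attention.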

    I suggest you create a support case for further investigation of the root cause, sharing how you verify the ps output and check /proc/meminfo.



    ------------------------------
    YI SUN
    ------------------------------