Preparing Your On-Premise IBM Sterling Order Management for Holiday Peak Season 2025

By Shoeb Bihari posted 2 days ago

  
As the holiday shopping season approaches, retailers with on-premise IBM Sterling Order Management deployments need to ensure their systems are optimized to handle the increased demand. This critical business period requires careful planning and preparation to maintain system stability and performance. Here's your comprehensive guide to holiday readiness for on-premise OMS environments.

Understanding the Challenge

The holiday season brings unprecedented traffic to your order management system. Without proper preparation, this can lead to:
- Performance degradation
- System instability
- Order processing delays
- Poor customer experience

By following the recommendations in this guide, you can minimize these risks and ensure your on-premise OMS environment performs optimally during peak season.

IBM Sterling Order Management - Performance Guide

This document, along with the ongoing webinar series, provides insight into our proven best practices around customization patterns, configurations, testing, and ongoing housekeeping. As with all recommendations, test these in a non-production environment first to validate and tune them for your specific environments and use cases.

Essential Diagnostics to Have Ready: Platform Diagnostics & Critical Metrics

Database (DB2):

- Lock waits (MON_LOCKWAITS)
- Current SQL (MON_CURRENT_SQL)
- Connections (MON_GET_CONNECTION)
     - Set yfs.app.identifyconnection=Y so that OMS identifies its DB connections
- Workload (MON_GET_WORKLOAD)
- Transaction logs (MON_GET_TRANSACTION_LOG)
- Package cache delta (MON_GET_PKG_CACHE_STMT)
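
The counters returned by MON_GET_PKG_CACHE_STMT are cumulative, so a "delta" means comparing two snapshots taken some minutes apart. The following is a minimal sketch of that comparison, assuming the rows have already been fetched with whatever DB2 client you use and are available as dictionaries keyed by column name. The function name `package_cache_delta` is hypothetical, and the sketch assumes STMT_EXEC_TIME is reported in milliseconds; verify the units against the Db2 documentation for your version.

```python
from typing import Dict, Iterable, List

# Cumulative counters to diff between the two snapshots.
COUNTERS = ("NUM_EXECUTIONS", "STMT_EXEC_TIME")


def package_cache_delta(before: Iterable[dict], after: Iterable[dict]) -> List[dict]:
    """Return per-statement activity between two snapshots, most expensive first."""
    baseline: Dict[object, dict] = {row["EXECUTABLE_ID"]: row for row in before}
    deltas = []
    for row in after:
        old = baseline.get(row["EXECUTABLE_ID"], {})
        # Statements missing from the first snapshot are diffed against zero.
        delta = {name: row[name] - old.get(name, 0) for name in COUNTERS}
        if delta["NUM_EXECUTIONS"] <= 0:
            continue  # statement did not run in the interval, or its counters were reset
        delta["EXECUTABLE_ID"] = row["EXECUTABLE_ID"]
        delta["STMT_TEXT"] = row.get("STMT_TEXT", "")
        # Assumption: STMT_EXEC_TIME is in milliseconds.
        delta["AVG_EXEC_TIME_MS"] = delta["STMT_EXEC_TIME"] / delta["NUM_EXECUTIONS"]
        deltas.append(delta)
    return sorted(deltas, key=lambda d: d["STMT_EXEC_TIME"], reverse=True)
```

Sorting by total execution time in the interval surfaces the statements that consumed the most database time during that window, which is usually more useful during an incident than lifetime totals.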

DB2 Performance and Lock Analysis using Instana Observability

JVM Profiling Tools:

- IBM Health Center for Java
- Oracle Java Flight Recorder (JFR)
- Thread dump scripts (kill -3 PID or jstack -l PID)
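
A single thread dump rarely shows whether a thread is stuck or just busy, which is why the mitigation tips later in this post ask for three dumps, 20 seconds apart. A thread dump script along those lines might look like the sketch below. The function name and parameters are hypothetical, and it assumes a Unix-like host where the script runs as a user allowed to signal the JVM process.

```python
import subprocess
import time
from typing import Callable, List


def collect_thread_dumps(
    pid: int,
    count: int = 3,
    interval_seconds: float = 20.0,
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Signal the JVM `count` times, pausing `interval_seconds` between signals."""
    commands = []
    for attempt in range(count):
        # kill -3 sends SIGQUIT: the JVM writes a thread dump and keeps running.
        command = ["kill", "-3", str(pid)]
        run(command, check=True)
        commands.append(command)
        if attempt < count - 1:
            sleep(interval_seconds)  # no pause needed after the last dump
    return commands
```

The `run` and `sleep` parameters exist only so the timing logic can be exercised without signalling a real process. Where the dump lands depends on the JVM: typically a javacore file for IBM JVMs, or the process's standard output for HotSpot.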

Application Diagnostics:

- TIMER & SQLDEBUG Trace capabilities
- YFS_STATISTICS_DETAIL table monitoring
- UI component diagnostics (console & network logs)

Performance Optimization Strategies

Database Optimization

1. Review and Apply Indices:
- Ensure proper indexing on frequently queried tables
- Pay special attention to tables used for customer searches, such as YFS_PERSON_INFO and YFS_CUSTOMER

2. Query Optimization:

- Avoid using UPPER function in queries
- Implement shadow columns for case-insensitive searches
- Clean up stale data from configuration tables

3. Connection Management:
- Monitor connection pools and adjust as needed
- Identify and terminate long-running transactions
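
To illustrate why the query optimization advice above favors shadow columns over UPPER: wrapping a column in a function generally prevents the database from using an ordinary index on it, whereas a pre-normalized copy of the value can be indexed and searched directly. The sketch below demonstrates the idea with SQLite from the Python standard library purely so that it is self-contained. The table, column, and index names are hypothetical, and it does not show how shadow columns are configured in Sterling OMS or how DB2 plans these queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person_info (
        person_info_key INTEGER PRIMARY KEY,
        last_name       TEXT NOT NULL,
        last_name_lc    TEXT NOT NULL  -- shadow column: lower-cased copy of last_name
    );
    CREATE INDEX person_info_lc_idx ON person_info (last_name_lc);
    """
)


def add_person(last_name: str) -> None:
    """Insert a row, maintaining the shadow column alongside the original value."""
    conn.execute(
        "INSERT INTO person_info (last_name, last_name_lc) VALUES (?, ?)",
        (last_name, last_name.lower()),
    )


def plan(sql: str, *params: str) -> str:
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


for name in ("Smith", "SMITH", "Jones"):
    add_person(name)

# The function call on the column forces a scan of the table.
function_plan = plan(
    "SELECT person_info_key FROM person_info WHERE UPPER(last_name) = ?", "SMITH"
)
# The shadow column is compared directly, so the index can be used.
shadow_plan = plan(
    "SELECT person_info_key FROM person_info WHERE last_name_lc = ?", "smith"
)
matches = conn.execute(
    "SELECT last_name FROM person_info WHERE last_name_lc = ? ORDER BY person_info_key",
    ("smith",),
).fetchall()

print(function_plan)
print(shadow_plan)
print(matches)
```

The cost of the approach is that every write must keep the shadow column in sync, and the search term must be normalized the same way before it is bound to the query.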

Application Tuning

1. API Usage:

- Avoid open-ended list API calls
- Use adequate filters in all API requests
- Optimize output templates to retrieve only necessary data

2. Tracing and Logging:
- Enable traces only at the minimum required level
- Do not enable any tracing during peak hours, as it can severely degrade performance
- Use traces for very short periods and disable immediately
- Implement TraceTTL to automatically disable verbose logging

3. Admin Utilities:

- Restrict access to admin utilities
- Avoid using DB Query Tool for large result sets
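
One way to enforce the API usage guidance above is to lint list API inputs in custom code or integration layers before they reach OMS. The sketch below is one possible check, not a product feature: the function name, the 500-record ceiling, and the sample values are assumptions for illustration, and it treats any non-empty attribute as a filter, which is a deliberately crude definition of "adequate".

```python
import xml.etree.ElementTree as ET
from typing import List


def _has_filter(root: ET.Element) -> bool:
    """True if any element carries a non-empty attribute other than the record limit."""
    for element in root.iter():
        for name, value in element.attrib.items():
            if element is root and name == "MaximumRecords":
                continue
            if value.strip():
                return True
    return False


def check_list_api_input(input_xml: str, max_allowed: int = 500) -> List[str]:
    """Return a list of problems found in a list API input document."""
    root = ET.fromstring(input_xml)
    problems = []
    if not _has_filter(root):
        problems.append("no filter attributes: the call is open-ended")
    limit = root.get("MaximumRecords")
    if limit is None:
        problems.append("MaximumRecords is not set")
    elif not limit.isdigit():
        problems.append("MaximumRecords=%s is not a number" % limit)
    elif int(limit) > max_allowed:
        problems.append("MaximumRecords=%s exceeds the allowed %d" % (limit, max_allowed))
    return problems


# Hypothetical inputs: the first is open-ended, the second is filtered and bounded.
print(check_list_api_input("<Order/>"))
print(check_list_api_input('<Order EnterpriseCode="ACME" OrderNo="Y100001" MaximumRecords="50"/>'))
```

A real check would likely require selective filters for each API (an order number or a bounded date range rather than just an enterprise code) and would also review the output template for attributes the caller never reads.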

Critical Workload Adjustments

1. Agent Management:
- Halt complex order reallocation agents, such as Item-Based Allocation (IBA)
- Disable non-essential purges
- Keep essential purge agents running at all times (e.g., Inventory Purge)

2. Capacity Management:
- Do not effectively disable capacity checks by setting infinite thresholds
- Configure appropriate capacity thresholds before peak season

3. Manual Activities:
- Pause manual activity through API tester or DB query client tools
- Suspend manual reporting query execution
- Use Data Extract agent or scheduled reports instead

Backlog Management

1. Identify Contention Points:

- Monitor database queries
- Check JVM resources and GC overhead
- Review container CPU utilization

2. Throttle Workloads:

- Reduce threads and JVM instances to avoid contention
- Understand backlog by querying YFS_TASK_Q table
- Monitor queue depth regularly
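
A single queue-depth reading says little on its own; what matters is whether the backlog is growing or draining after you throttle. The sketch below pairs an example backlog query with a small trend calculation. The SQL is an assumption about how you might count due tasks per transaction in YFS_TASK_Q (adjust the schema qualifier and filters to your environment), and `backlog_trend` is a hypothetical helper that works on (timestamp in seconds, backlog count) samples collected from repeated runs of such a query.

```python
from typing import List, Optional, Tuple

# Assumed DB2-style query: tasks that are already due, grouped by transaction.
BACKLOG_SQL = """
SELECT TRANSACTION_KEY, COUNT(*) AS BACKLOG
FROM YFS_TASK_Q
WHERE AVAILABLE_DATE <= CURRENT TIMESTAMP
GROUP BY TRANSACTION_KEY
ORDER BY BACKLOG DESC
"""


def backlog_trend(samples: List[Tuple[float, int]]) -> Tuple[float, Optional[float]]:
    """Return (net change in tasks per minute, minutes until empty or None if not draining)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (first_time, first_depth), (last_time, last_depth) = samples[0], samples[-1]
    minutes = (last_time - first_time) / 60.0
    if minutes <= 0:
        raise ValueError("samples must be in increasing time order")
    rate = (last_depth - first_depth) / minutes
    # Only a negative rate (a draining backlog) yields a time-to-empty estimate.
    minutes_to_empty = last_depth / -rate if rate < 0 else None
    return rate, minutes_to_empty


# Three samples taken five minutes apart: the backlog drains by 300 tasks per minute.
print(backlog_trend([(0, 12000), (300, 11000), (600, 9000)]))
```

The estimate uses only the first and last samples, so it smooths over short bursts; a rate that stays positive after throttling suggests the contention point is elsewhere, such as the database.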

Sterling Intelligent Promising (SIP) Considerations


1. Availability Operations:

- Run availability snapshots only during off-peak hours
- Use reduced snapshots (delivery-method based)
- Apply minimum window of 30 days for zero-availability cleanup

2. Distribution Group Management:

- Schedule DG updates/sync during off-peak hours
- Use Node + Item level DG sync for targeted updates

3. Network Configurations:

- Avoid recomputing network availability during peak periods
- Perform Node on/off activities only during low-traffic windows

Pre-Peak Preparation Checklist

- [ ] Run Close Order agent to make orders eligible for purge
- [ ] Aggressively purge tables to keep them lightweight
- [ ] Review and disable non-critical Order Monitor rules
- [ ] Perform end-to-end performance testing with peak workloads
- [ ] Update to latest fixpack or patch level
- [ ] Document escalation procedures and support contacts
- [ ] Prepare runbooks with precise actions for common issues

Mitigation Tips

Must-gather reference by issue type: symptoms, diagnostics to capture, and mitigation.

Server unresponsive / JVM Crash

Symptoms:
- High queue depth alert
- Real-time calls result in 500, 502, etc. errors
- Server down alerts
- High thread utilization (WebContainer, DefaultExecutor, etc.)
- High GC/heap utilization
- OOM, stack overflow exceptions

Diagnostics:
- 3x javacore/thread dump, 20 seconds apart
- Heap dump and GC logs for OOM
- linperf.sh for high CPU
- Server logs
- YFS_STATISTICS_DETAIL export

Mitigate:
- Restart the servers to mitigate the issue
- Increase heap (Xmx) if the server is going OOM

Database Slowness (Latches/Contention/Locks, Slow Queries)

Symptoms:
- High DB transaction log utilization
- Excessive latches/contention
- Excessive YFC0006, YFC0003 errors
- Excessive DB connections, high wait time
- High DB resource (CPU, memory, I/O) utilization

Diagnostics:
- oms-db2collect.sh
- db2support or AWR report
- 3x javacore/thread dump, 20 seconds apart, from the application JVM
- YFS_STATISTICS_DETAIL export

Mitigate:
- For locking, terminate the connection holding the lock from the DB side
- For slow queries, capture EXPLAIN and ADVICE output and apply indices

Application Slowness (API, Agent, Integration)

Symptoms:
- High queue depth alert
- Real-time calls result in 500, 502, etc. errors
- Inventory/order lookup calls failing
- High transaction backlog for schedule, release, etc.

Diagnostics:
- 3x javacore/thread dump, 20 seconds apart
- db2support or AWR report
- TIMER or SQLDEBUG trace for 5 minutes or a single transaction
- Application logs
- YFS_STATISTICS_DETAIL export

Mitigate:
- Throttle workload
- Stop unnecessary workload/servers to reduce load on the DB, JMS, or external system
- Scale up if response time doesn’t degrade

UI Slowness (Call Centre, Web Store, OrderHub, etc.)

Symptoms:
- CSR unable to look up orders
- Store associates unable to perform shipment actions
- Browser hanging, or generic error message

Diagnostics:
- Screen recording/capture
- HAR file (browser debug)
- Application and LB access logs

Mitigate:
- Delete browser cache, cookies, and browser temporary files
- Verify network connectivity

OMS Container Issues

Symptoms:
- Container getting restarted frequently
- Health check failures

Diagnostics:
- Deployment details (YAML) and describe pod output
- CPU/memory request/limit
- Container CPU/memory utilization metrics

Mitigate:
- Restart the servers to mitigate the issue
- Increase CPU/memory requests

Engaging with Support


1. Team Preparation:

- Confirm support contacts for all system components
- Define a 24x7 support schedule for peak hours
- Establish clear communication channels and escalation procedures

2. Immediate Response Plan:

- Be ready to restart JVMs if necessary
- Prepare throttling strategies for overloaded servers
- Know how to reduce JVMs or threads quickly
- Have procedures to capture necessary logs and diagnostics

Proper preparation is the key to a successful holiday season for your on-premise IBM Sterling Order Management system. By implementing these recommendations, you'll be well-positioned to handle the increased demand and provide a seamless experience for your customers during this critical business period.

Remember that proactive monitoring and quick response to emerging issues are essential. Use the diagnostic tools and monitoring capabilities built into the system to identify and address potential problems before they impact your business operations.

For more detailed technical best practices, refer to the IBM Sterling OMS Performance Guide and Technical Best Practices documentation.