WebSphere Application Server & Liberty

 View Only
  • 1.  Can we limit the time a tWAS v9 server stops?

    IBM Champion
    Posted Thu July 15, 2021 07:38 AM

    We are running a nightly backup script which stops WAS, takes a backup of the the database and then starts WAS again. This usually works nicely but intermittently fails after 20 - 25 minutes.

    We are seeing the following messages in the SystemOut.log:

    --- snip ---

    [7/15/21 2:17:44:344 CEST] 0000009a filter I com.ibm.ws.webcontainer.filter.FilterInstanceWrapper destroy CWSRV0248I: Requests are currently being processed. Waiting up to 60 seconds before forcing filter destroy.
    --- snip ---

    and in the stopServer.log we see after ~20 minutes:
    --- snip ---
    [7/15/21 2:20:13:136 CEST] 00000001 AdminTool A ADMU3111E: Server stop requested but failed to complete.
    [7/15/21 2:20:13:139 CEST] 00000001 WsServerStop E ADMU3002E: Exception attempting to process server WebSphere_Portal
    [7/15/21 2:20:13:141 CEST] 00000001 WsServerStop E ADMU3007E: Exception com.ibm.websphere.management.exception.AdminException: ADMU3060E:
    Timed out waiting for server shutdown.
    [7/15/21 2:20:13:142 CEST] 00000001 WsServerStop A ADMU3007E: Exception com.ibm.websphere.management.exception.AdminException: ADMU3060E:
    Timed out waiting for server shutdown.
    :
    --- snip ---

    I found that we can reduce the time waiting by setting `com.ibm.ws.webcontainer.ServletDestroyWaitTime=30` but this might just delay the problem.

    Hence I'm wondering if there is a way to configure the wait-time for the shutdown and a hard time limit to force a shutdown

    Thanks in advance for the advise and have a good day - Hermann




    ------------------------------
    Hermann Huebler
    2innovate IT Consulting GmbH
    Vienna
    Austria

    #IBMChampion
    ------------------------------


  • 2.  RE: Can we limit the time a tWAS v9 server stops?
    Best Answer

    Posted Fri July 16, 2021 09:09 AM
    Edited by Hermann Huebler Fri July 16, 2021 11:05 AM
    Hermann

    As you know the custom property you cited com.ibm.ws.webcontainer.ServletDestroyWaitTime defaults to 60 seconds, so while reducing it to 30 seconds may help, but the fact that shutdown is still running after 20 minutes and the property defaults to 60 seconds, makes me wonder what's going on for the subsequent 19 minutes.

    Another customer property that comes into play is com.ibm.ejs.sm.server.quiesceTimeout
    https://www.ibm.com/docs/en/was/9.0.5?topic=jvm-java-virtual-machine-custom-properties#com.ibm.ejs.sm.server.quiescetimeout which defaults to 180 seconds

    You could certainly set it lower to force a shutdown, though again doing so might not help, if there's some long running requests that the application server started processing prior to the stopServer command which

    a) stops all incoming requests
    b) waits until the value of com.ibm.ejs.sm.server.quiesceTimeout, for existing requests to complete

    I know that in using wsadmin to stop a server using AdminControl.stopServer('serverName', 'mynode') there are a couple of arguments you can add to force an earlier shutdown

    1) AdminControl.stopServer('serverName', 'nodeName', immediate)

    and 

    2) AdminControl.stopServer('serverName', 'nodeName', terminate)   

    Note that "terminate" is part of the wsadmin AdminControl.stopServer help command, but not documented in the KC

    These two options function as follows
    - "immediate" performs an immediate shutdown, a value of 0 (zero) for com.ibm.ejs.sm.server.quiesceTimeout
    - "terminate" is equivalent to a command line kill -9 <pid>

    The problem is I suspect you desire that the existing requests, those received prior to the stopServer command, complete normally, and  neither of these will do so.



    ------------------------------
    Tom Alcott
    Senior Technical Staff Member
    IBM
    ------------------------------



  • 3.  RE: Can we limit the time a tWAS v9 server stops?

    Posted Fri July 16, 2021 12:45 PM
    Edited by Stephanie Wilkerson Mon July 19, 2021 10:13 AM
    Hermann

    Just wanted re-enforce my caution in using either "immediate" or "terminate" as part of AdminControl.stopServer() . When using those options, existing requests, meaning those received prior to the stopServer command, that are still in process when the command is executed, will not complete normally, you can see this below in excerpts from the SystemOut.log for 
    AdmiinControl.stopServer(), AdmiinControl.stopServer()  with "immediate" and AdmiinControl.stopServer() with terminate. So before i
    implementing this make sure you test it in your environment  (in the case of terminate as I mentioned before it's equivalent to "kill -9 <pid>" as shown by the lack of any output) 


    < AdmiinControl.stopServer() > 

    [7/15/21 17:01:43:846 CDT] 00000054 NGUtil$Server I ASND0002I: Detected server dmgr started on node fedora33CellManager01
    [7/15/21 17:01:44:148 CDT] 00000054 NGUtil$Server I ASND0002I: Detected server nodeagent started on node fedora33Node01
    [7/15/21 17:02:36:338 CDT] 0000007f authz I CWWIM2000I Initialization of the authorization component completed successfully.
    [7/15/21 17:02:36:547 CDT] 0000007f AdminHelper A ADMN1020I: An attempt is made to stop the server1 server. (User ID = defaultWIMFileBasedRealm/wasadmin)
    [7/15/21 17:02:36:599 CDT] 0000004c ServerCollabo A WSVR0023I: Server server1 is stopping
    [7/15/21 17:02:36:609 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCP_1 has stopped listening on host * (IPv6) port 9061.
    [7/15/21 17:02:36:612 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCP_3 has stopped listening on host * (IPv6) port 9044.
    [7/15/21 17:02:36:614 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCP_2 has stopped listening on host * (IPv6) port 9080.
    [7/15/21 17:02:36:615 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCP_4 has stopped listening on host * (IPv6) port 9443.
    [7/15/21 17:02:41:630 CDT] 0000004c JMSProxyRunti I XJMS0005I: XD JMS message pacing controller successfully stopped.
    [7/15/21 17:02:41:634 CDT] 0000004c ApplicationMg A WSVR0217I: Stopping application: plantsbywebsphereee6
    [7/15/21 17:02:41:671 CDT] 0000004c WebContainerL I WebContainerLifecycle stopApplication OpenWebBeans Container was stopped for context path, [/PlantsByWebSphere]
    [7/15/21 17:02:41:675 CDT] 0000004c SharedEJBRunt I WSVR0041I: Stopping EJB jar: PlantsByWebSphere.war
    [7/15/21 17:02:41:683 CDT] 0000004c SharedEJBRunt I WSVR0059I: EJB jar stopped: PlantsByWebSphere.war
    [7/15/21 17:02:41:717 CDT] 0000004c ApplicationMg A WSVR0220I: Application stopped: plantsbywebsphereee6
    [7/15/21 17:02:41:719 CDT] 0000004c SchedulerServ I SCHD0040I: The Scheduler Service is stopping.
    [7/15/21 17:02:41:720 CDT] 0000004c SchedulerServ I SCHD0002I: The Scheduler Service has stopped.
    [7/15/21 17:02:41:722 CDT] 0000004c AppProfileCom I ACIN0009I: The application profiling service is stopping.
    [7/15/21 17:02:41:722 CDT] 0000004c ActivitySessi I WACS0049I: The ActivitySession service is stopping.
    [7/15/21 17:02:41:730 CDT] 0000004c ObjectPoolSer I OBPL0011I: The Object Pool service is stopping.
    [7/15/21 17:02:41:732 CDT] 0000004c distSecurityC I securityServiceStarted is false
    [7/15/21 17:02:41:735 CDT] 0000004c CGBridgeServi I CWRCB0103I: The core group bridge service has stopped.
    [7/15/21 17:02:41:736 CDT] 0000004c DragDropDeplo I CWLDD0004I: Stopping monitored directory application deployment service...
    [7/15/21 17:02:41:737 CDT] 0000004c DragDropDeplo I CWLDD0005I: Monitored directory application deployment service is stopped.
    [7/15/21 17:02:41:738 CDT] 0000004c DCSStackImpl I DCSV8011I: DCS Stack DefaultCoreGroup at Member fedora33Cell01\fedora33Node01\server1: Member received a request for shutdown.
    [7/15/21 17:02:41:754 CDT] 0000005c Peer I ODCF8533I: Removed neighbor ip=192.168.142.148 udp=11008 tcp=11007 ID=732a18ebedcb5d74f5265764c83c54c9abcb03b8 version=0;cellName=fedora33Cell01;bridgedCells=[];structuredGateway=true;properties={inOdc=1, memberName=fedora33Cell01\fedora33Node01\nodeagent, MEMBER_STARTUP_TIME=1626379503640, epoch=1626379505977, MEMBER_VERSION=4}, neighbor set size is now 1.
    [7/15/21 17:02:41:769 CDT] 0000004c CfwTCPListene I ODCF8525I: The unstructured overlay is stopping its inbound TCP channel.
    [7/15/21 17:02:41:770 CDT] 00000051 Peer I ODCF8534I: Removed neighbor ip=192.168.142.148 udp=11005 tcp=11006 ID=9953e59bb9efc734d27a1fedd05010331e21984a version=0;cellName=fedora33Cell01;bridgedCells=[];structuredGateway=true;properties={MEMBER_STARTUP_TIME=1626379446389, MEMBER_VERSION=4, memberName=fedora33Cell01\fedora33CellManager01\dmgr, epoch=1626379448398, inOdc=1, ODC_PUBLISHER_ONLY=false}, neighbor set is now 0 nodes
    .
    [7/15/21 17:02:41:775 CDT] 0000004c Peer I ODCF8526I: The unstructured overlay is stopping its outbound TCP channel.
    [7/15/21 17:02:41:793 CDT] 0000004c Peer I ODCF8550I: The current TCP protocol statistics for the unstructured overlay member {192.168.142.148 udp_port=11009 tcp_port=11010} are {connOpened=9, connClosed=5, read={buffAloc=26, byteAloc=133192, buffReld=22, byteReld=133160}, write={buffAloc=28, byteAloc=78848, buffReld=28, byteReld=78848} }.
    [7/15/21 17:02:41:811 CDT] 0000004f Peer I ODCF8527I: The unstructured overlay UDP Channel UDP_0 has stopped listening on host 192.168.142.148 port 11009.
    [7/15/21 17:02:41:842 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCPInboundChannel_ipcc.Default_IPC_Connector_Name has stopped listening on host localhost (IPv4: 127.0.0.1) port 9633.
    [7/15/21 17:02:41:888 CDT] 0000004c FailureScopeC A WTRN0105I: The transaction service has shutdown successfully with no transactions requiring recovery.
    [7/15/21 17:02:41:897 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel sonIncomingTcpConnectorInboundChannel_SonIncomingTCPSubsystem_0_11010 has stopped listening on host fedora33.demo.com (IPv4: 192.168.142.148) port 11010.
    [7/15/21 17:02:41:907 CDT] 0000004c TCPChannel I TCPC0002I: TCP Channel TCP_5 has stopped listening on host * (IPv6) port 9353.
    [7/15/21 17:02:41:910 CDT] 0000004c ServerCollabo A WSVR0024I: Server server1 stopped

    < /AdmiinControl.stopServer() > 



    < AdmiinControl.stopServer() with immediate  > 

    [7/15/21 17:05:03:901 CDT] 00000055 NGUtil$Server I ASND0002I: Detected server nodeagent started on node fedora33Node01
    [7/15/21 17:05:04:203 CDT] 00000054 NGUtil$Server I ASND0002I: Detected server dmgr started on node fedora33CellManager01

    [7/15/21 17:05:46:515 CDT] 0000007d AdminHelper A ADMN1022I: An attempt is made to immediately stop the server1 server. (User ID = defaultWIMFileBasedRealm/wasadmin)
    [7/15/21 17:05:46:566 CDT] 00000049 ServerCollabo A WSVR0023I: Server server1 is stopping
    [7/15/21 17:05:46:579 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCP_1 has stopped listening on host * (IPv6) port 9061.
    [7/15/21 17:05:46:584 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCP_3 has stopped listening on host * (IPv6) port 9044.
    [7/15/21 17:05:46:592 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCP_2 has stopped listening on host * (IPv6) port 9080.
    [7/15/21 17:05:46:600 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCP_4 has stopped listening on host * (IPv6) port 9443.
    [7/15/21 17:05:46:607 CDT] 00000049 JMSProxyRunti I XJMS0005I: XD JMS message pacing controller successfully stopped.
    [7/15/21 17:05:46:616 CDT] 00000049 ApplicationMg A WSVR0217I: Stopping application: plantsbywebsphereee6
    [7/15/21 17:05:46:666 CDT] 00000049 WebContainerL I WebContainerLifecycle stopApplication OpenWebBeans Container was stopped for context path, [/PlantsByWebSphere]
    [7/15/21 17:05:46:671 CDT] 00000049 SharedEJBRunt I WSVR0041I: Stopping EJB jar: PlantsByWebSphere.war
    [7/15/21 17:05:46:686 CDT] 00000049 SharedEJBRunt I WSVR0059I: EJB jar stopped: PlantsByWebSphere.war
    [7/15/21 17:05:46:786 CDT] 00000049 ApplicationMg A WSVR0220I: Application stopped: plantsbywebsphereee6
    [7/15/21 17:05:46:793 CDT] 00000049 SchedulerServ I SCHD0040I: The Scheduler Service is stopping.
    [7/15/21 17:05:46:794 CDT] 00000049 SchedulerServ I SCHD0002I: The Scheduler Service has stopped.
    [7/15/21 17:05:46:796 CDT] 00000049 AppProfileCom I ACIN0009I: The application profiling service is stopping.
    [7/15/21 17:05:46:797 CDT] 00000049 ActivitySessi I WACS0049I: The ActivitySession service is stopping.
    [7/15/21 17:05:46:811 CDT] 00000049 ObjectPoolSer I OBPL0011I: The Object Pool service is stopping.
    [7/15/21 17:05:46:815 CDT] 00000049 distSecurityC I securityServiceStarted is false
    [7/15/21 17:05:46:838 CDT] 00000049 CGBridgeServi I CWRCB0103I: The core group bridge service has stopped.
    [7/15/21 17:05:46:841 CDT] 00000049 DragDropDeplo I CWLDD0004I: Stopping monitored directory application deployment service...
    [7/15/21 17:05:46:844 CDT] 00000049 DragDropDeplo I CWLDD0005I: Monitored directory application deployment service is stopped.
    [7/15/21 17:05:46:848 CDT] 00000049 DCSStackImpl I DCSV8011I: DCS Stack DefaultCoreGroup at Member fedora33Cell01\fedora33Node01\server1: Member received a request for shutdown.
    [7/15/21 17:05:46:881 CDT] 00000049 CfwTCPListene I ODCF8525I: The unstructured overlay is stopping its inbound TCP channel.
    [7/15/21 17:05:46:887 CDT] 00000049 Peer I ODCF8526I: The unstructured overlay is stopping its outbound TCP channel.
    [7/15/21 17:05:46:890 CDT] 00000049 Peer I ODCF8550I: The current TCP protocol statistics for the unstructured overlay member {192.168.142.148 udp_port=11009 tcp_port=11010} are {connOpened=7, connClosed=3, read={buffAloc=26, byteAloc=156728, buffReld=22, byteReld=156696}, write={buffAloc=29, byteAloc=79872, buffReld=29, byteReld=79872} }.
    [7/15/21 17:05:46:894 CDT] 0000004d Peer I ODCF8527I: The unstructured overlay UDP Channel UDP_0 has stopped listening on host 192.168.142.148 port 11009.
    [7/15/21 17:05:46:935 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCPInboundChannel_ipcc.Default_IPC_Connector_Name has stopped listening on host localhost (IPv4: 127.0.0.1) port 9633.
    [7/15/21 17:05:47:052 CDT] 00000049 FailureScopeC A WTRN0105I: The transaction service has shutdown successfully with no transactions requiring recovery.
    [7/15/21 17:05:47:062 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel sonIncomingTcpConnectorInboundChannel_SonIncomingTCPSubsystem_0_11010 has stopped listening on host fedora33.demo.com (IPv4: 192.168.142.148) port 11010.
    [7/15/21 17:05:47:079 CDT] 00000049 TCPChannel I TCPC0002I: TCP Channel TCP_5 has stopped listening on host * (IPv6) port 9353.
    [7/15/21 17:05:47:083 CDT] 00000049 ServerCollabo A WSVR0024I: Server server1 stopped
     
    </ AdmiinControl.stopServer() with immediate > 

    < AdmiinControl.stopServer() with terminate  > 

    7/15/21 17:08:11:679 CDT] 00000051 NGUtil$Server I ASND0002I: Detected server nodeagent started on node fedora33Node01
    [7/15/21 17:08:11:982 CDT] 00000052 NGUtil$Server I ASND0002I: Detected server dmgr started on node fedora33CellManager01

    < /AdmiinControl.stopServer() with terminate  > 

    ------------------------------
    Tom Alcott
    ------------------------------



  • 4.  RE: Can we limit the time a tWAS v9 server stops?

    Posted Thu August 19, 2021 09:36 AM
    I hit this in a lab system of mine I've had for years.  I grabbed a few javacores with a kill -3 command, and, realized I had some hung threads due to (poorly written) custom code I was doing for a quick proof of concept.  I cleaned up the custom code and issue was resolved.  FYI.

    ------------------------------
    Travis Cornwell
    ------------------------------