IBM Storage Defender

Early threat detection and secure data recovery

Failed/missed backups

  • 1.  Failed/missed backups

    Posted Thu July 28, 2016 07:12 AM

    Hello,

    Another freshman question.

    I see on the TSM console that some backups were Failed or Missed in recent days.
    I asked the host owners to restart the TSM agent on the Wintel hosts, and when I checked the next day it had helped for the Missed backups and for some of the Failed ones.
    Today I checked again, and some hosts that failed two days ago were successful yesterday, while some new hosts are now failing.
    What could be the reason for such random failed/missed backups?

    I have checked logs and I see errors like those below:


    07/28/2016 01:55:17      ANR2579E Schedule NIGHTLY_11PM_INCR in domain DOMAIN01 for
                              node FRRV138 failed (return code 12). (SESSION: 1345318)

    Or for missed backups:

    07/26/2016 20:00:01      ANR2578W Schedule NIGHTLY_INCR in domain DOMAIN01 for node
                              PSRRV017 has missed its scheduled start up window.

    What is the reason that a backup is missed? The log only says that it was missed, and that's it.


    When I started reading the logs, I thought that if a host has an entry about failed objects like the one below:


    07/27/2016 20:14:07      ANE4959I (Session: 1343028, Node: PSRRV017)  Total number
                              of objects failed:    29  (SESSION: 1343028)

    the backup will fail, but that seems not to be correct. Some hosts have failed objects, and yet in the end the log shows that the schedule completed successfully:

    07/27/2016 20:14:07      ANR2507I Schedule NIGHTLY_INCR for domain DOMAIN01 started
                              at 07/27/2016 19:00:00 for node PSRRV017 completed
                              successfully at 07/27/2016 20:14:07. (SESSION: 1343028)

    So now I am confused: what determines whether a backup is missed or failed?

    Kind regards
    BoguslawB
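
The three activity-log messages quoted above already separate the outcomes: ANR2578W means the schedule never started inside its startup window, ANR2579E means the client ran and reported a return code, and ANR2507I means the schedule completed. A minimal Python sketch that sorts `q act` output into those three buckets follows. The function names and the `RETURN_CODES` table are illustrative assumptions; the return-code meanings are my understanding of the backup-archive client documentation, so verify them for your client level.

```python
import re

# Assumption: client return-code meanings as I recall them from the
# backup-archive client documentation; check them for your version.
RETURN_CODES = {
    0: "all operations completed successfully",
    4: "completed, but some files were not processed (skipped)",
    8: "completed with at least one warning",
    12: "completed with at least one error (other than skipped files)",
}

# One pattern per message ID, matching the formats shown in this thread.
PATTERNS = {
    "Missed": re.compile(
        r"ANR2578W Schedule (\S+) in domain (\S+) for node (\S+) has missed"),
    "Failed": re.compile(
        r"ANR2579E Schedule (\S+) in domain (\S+) for node (\S+) "
        r"failed \(return code (\d+)\)"),
    "Completed": re.compile(
        r"ANR2507I Schedule (\S+) for domain (\S+) started at .*? "
        r"for node (\S+) completed successfully"),
}

TIMESTAMP = re.compile(r"\s*\d\d/\d\d/\d{4} \d\d:\d\d:\d\d")


def unwrap(text):
    """Join wrapped activity-log lines into one string per message.

    Limitation: a continuation line that happens to begin with a
    timestamp would be treated as the start of a new message.
    """
    messages = []
    for line in text.splitlines():
        if TIMESTAMP.match(line):
            messages.append(line.strip())
        elif line.strip() and messages:
            messages[-1] += " " + line.strip()
    return [re.sub(r"\s+", " ", m) for m in messages]


def classify(text):
    """Return (node, schedule, status, return_code) for each schedule message."""
    results = []
    for msg in unwrap(text):
        for status, pattern in PATTERNS.items():
            m = pattern.search(msg)
            if m:
                rc = int(m.group(4)) if status == "Failed" else None
                results.append((m.group(3), m.group(1), status, rc))
    return results
```

If the return-code table is right, it would also explain why PSRRV017 completed despite 29 failed objects: skipped files alone give return code 4, which is not reported as a failure.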

     

     



  • 2.  RE: Failed/missed backups

    Posted Thu July 28, 2016 07:21 AM

    For some hosts the return code is 1 instead of 12:

    07/27/2016 19:01:35     ANR2579E Schedule SQL_DAILY_DIF in domain DOMAIN01 for
                              node FFSRV141_SQL failed (return code 1). (SESSION:
                              1342062)



  • 3.  RE: Failed/missed backups

    Posted Thu July 28, 2016 07:56 AM

    You have to look for details in the log files on the node's local file system, e.g. 'C:\Program Files\Tivoli\TSM\baclient' for file backups and 'C:\Program Files\Tivoli\TSM\TDPSql' for SQL backups.



  • 4.  RE: Failed/missed backups

    Posted Thu July 28, 2016 08:47 AM

    In the C:\Program Files\Tivoli\TSM\baclient\dsmerror.log file there is not much information, just:

    07/27/2016 07:24:24 cuGetAdmCmdResp: Received a communication read error: rc: -50
    07/27/2016 07:24:24 ANS1017E Session rejected: TCP/IP connection failure
    07/27/2016 07:24:24 ANS8064E Communication timeout.  Reissue the command.
    07/27/2016 13:26:46 cuGetAdmCmdResp: Received a communication read error: rc: -50
    07/27/2016 13:26:46 ANS1017E Session rejected: TCP/IP connection failure
    07/27/2016 13:26:46 ANS8064E Communication timeout.  Reissue the command.
    07/27/2016 16:19:48 ANS8023E Unable to establish session with server.
    07/27/2016 16:20:01 ANS8034E Your administrator ID is not recognized by this server.
    07/27/2016 16:20:01 ANS8023E Unable to establish session with server.
    07/27/2016 16:20:52 ANS8023E Unable to establish session with server.
    07/27/2016 16:21:55 ANS8023E Unable to establish session with server.
    07/27/2016 16:32:53 ANS8023E Unable to establish session with server.
    07/27/2016 16:40:36 ANS8023E Unable to establish session with server.
    07/28/2016 12:43:43 cuGetAdmCmdResp: Received a communication read error: rc: -50
    07/28/2016 12:43:43 ANS1017E Session rejected: TCP/IP connection failure
    07/28/2016 12:43:43 ANS8064E Communication timeout.  Reissue the command.
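
When dsmerror.log is longer than this excerpt, a quick tally by message ID and by hour of day shows whether the errors line up with the backup window at all. The Python sketch below does that; `summarize` and the command-line handling are illustrative names of mine, not part of any TSM tool. In the excerpt above, every entry is logged between 07:00 and 17:00, while the schedules in this thread start at 19:00 and 23:00.

```python
import re
import sys
from collections import Counter

# Matches lines such as
# "07/27/2016 16:19:48 ANS8023E Unable to establish session with server."
# Lines without a message ID (for example the cuGetAdmCmdResp lines) are skipped.
ENTRY = re.compile(r"^\s*\d\d/\d\d/\d{4} (\d\d):\d\d:\d\d (ANS\d{4}[A-Z])\b")


def summarize(lines):
    """Count dsmerror.log entries per message ID and per hour of day."""
    per_code, per_hour = Counter(), Counter()
    for line in lines:
        m = ENTRY.match(line)
        if m:
            per_hour[int(m.group(1))] += 1
            per_code[m.group(2)] += 1
    return per_code, per_hour


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_dsmerror.py <path to dsmerror.log>
    with open(sys.argv[1], errors="replace") as handle:
        codes, hours = summarize(handle)
    for code, count in codes.most_common():
        print(code, count)
    for hour in sorted(hours):
        print(f"{hour:02d}:00  {hours[hour]} entries")
```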



  • 5.  RE: Failed/missed backups

    Posted Thu July 28, 2016 08:49 AM

    I collected the errors listed in my initial post using the q act command.



  • 6.  RE: Failed/missed backups

    Posted Thu July 28, 2016 09:07 AM

    Which node/server is this log from? The content of dsmsched.log is also needed.



  • 7.  RE: Failed/missed backups

    Posted Mon August 01, 2016 05:36 AM

    It's on Server 2. I asked the Wintel/Linux admins to send me the logs, and I have attached extracts of dsmerror.log and dsmsched.log from two example hosts.

    What puzzles me is that, going through the logs, I see that the session is established and lots of files have the status SENT, but some are FAILED/SKIPPED as well. The total number of bytes transferred is 14.51 GB, yet the final status is: ANS1512E Scheduled event 'NIGHTLY_11PM_INCR' failed.  Return code = 12. For all those hosts I see events similar to the ones in the attachments. Is this an issue completely on the host side, or can I do anything to fix it from the TSM side? For now, every day I have a couple of servers with the status below (for some the issue happens every day, for others every couple of days, which means that sometimes everything is fine):

    tsm: SERVER2>q ev * * begind=-3 exception=yes

    Scheduled Start          Actual Start             Schedule Name     Node Name         Status
    --------------------     --------------------     -------------     -------------     ---------
    07/29/2016 19:00:00      07/29/2016 19:26:44      NIGHTLY_INCR      FRPR034           Failed 12
    07/29/2016 19:00:00      07/29/2016 19:21:08      NIGHTLY_INCR      FRPR056           Failed 12
    07/29/2016 19:00:00      07/29/2016 19:19:22      NIGHTLY_INCR      FRPR060           Failed 12
    07/29/2016 19:00:00      07/29/2016 19:13:49      NIGHTLY_INCR      FRPR215           Failed 12
    07/29/2016 19:00:00      07/29/2016 19:11:26      NIGHTLY_INCR      FRPR25            Failed 12
    07/29/2016 19:00:00      07/29/2016 19:22:00      SQL_DAILY_DIF     FRPR_SQL          Failed  1
    07/29/2016 23:00:00      07/29/2016 23:02:47      NIGHTLY_11PM-     FRPR138           Failed 12
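
To track which hosts recur from day to day, the `q ev ... exception=yes` output above can be turned into records and grouped by status and return code. The following is a minimal Python sketch that assumes the column layout shown in this post; `parse_events` and `nodes_by_result` are hypothetical helper names. Schedule names truncated by the column width (such as `NIGHTLY_11PM-`) are kept exactly as printed.

```python
import re
from collections import defaultdict

TS = r"\d\d/\d\d/\d{4} \d\d:\d\d:\d\d"

# Scheduled start, optional actual start (assumed to be blank for events
# that never started), schedule name, node name, status, optional return code.
ROW = re.compile(
    rf"^\s*(?P<scheduled>{TS})\s+(?:(?P<actual>{TS})\s+)?"
    r"(?P<schedule>\S+)\s+(?P<node>\S+)\s+"
    r"(?P<status>[A-Za-z]+)\s*(?P<rc>\d+)?\s*$"
)


def parse_events(text):
    """Parse 'q ev' output rows into dicts; header and rule lines are ignored."""
    events = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            event = m.groupdict()
            event["rc"] = int(event["rc"]) if event["rc"] else None
            events.append(event)
    return events


def nodes_by_result(events):
    """Group node names by (status, return code)."""
    groups = defaultdict(list)
    for event in events:
        groups[(event["status"], event["rc"])].append(event["node"])
    return dict(groups)
```

Running this on the output of several days and comparing the groups shows which nodes fail every day and which only fail now and then.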



  • 8.  RE: Failed/missed backups

    Posted Mon August 01, 2016 07:31 AM

    It looks like both servers have problems with the VSS system state snapshot. That means it must be fixed on the host side, not on the TSM server.



  • 9.  RE: Failed/missed backups

    Posted Tue August 02, 2016 05:15 AM

    That could be the reason; those hosts are quite old Win2003 machines.

    I have a different story for the SQL host, where the Windows OS schedule NIGHTLY_INCR completes successfully but SQL_DAILY_DIF fails immediately with return code 1. Looking in dsmerror_sql.log, I see that at 07/28/2016 17:08:35 the SQL files are in use, and for some reason the lock is not released until 19:00, when the SQL job SQL_DAILY_DIF is scheduled:

    07/28/2016 17:08:30 The file is being used by another process
    07/28/2016 17:08:35 ANS4987E Error processing '\\FRPR_SQL\c$\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\tempdb.mdf': the object is in use by another process
    07/28/2016 17:08:35 The file is being used by another process
    07/28/2016 17:08:35 ANS4987E Error processing '\\FRPR_SQL\c$\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\templog.ldf': the object is in use by another process
    07/28/2016 17:08:35 The file is being used by another process
    07/28/2016 17:09:34 ANS1577I The Windows console event handler received a 'Ctrl-C' console event.
    07/28/2016 19:05:57 ANS1909E The scheduled command failed.
    07/28/2016 19:05:57 ANS1512E Scheduled event 'SQL_DAILY_DIF' failed.  Return code = 1.

    In dsmsched.log I see:

    07/25/2016 19:17:25
    Executing Operating System command or script:
       c:\backup\dbdiff.cmd
    07/25/2016 19:17:25 Finished command.  Return code is: 1
    07/25/2016 19:17:25 ANS1909E The scheduled command failed.
    07/25/2016 19:17:25 ANS1512E Scheduled event 'SQL_DAILY_DIF' failed.  Return code = 1.
    07/25/2016 19:17:25 Sending results for scheduled event 'SQL_DAILY_DIF'.
    07/25/2016 19:17:25 Results sent to server for scheduled event 'SQL_DAILY_DIF'.

    07/25/2016 19:17:25 TSM Backup-Archive Client Version 5, Release 5, Level 4.0  
    07/25/2016 19:17:25 Querying server for next scheduled event.
    07/25/2016 19:17:25 Node Name: FRPR_SQL
    07/25/2016 19:17:25 Session established with server SERVER2: Windows
    07/25/2016 19:17:25   Server Version 6, Release 3, Level 3.0
    07/25/2016 19:17:25   Server date/time: 07/25/2016 19:17:25  Last access: 07/25/2016 19:17:25

     

    Is there anything I can do about this issue from the TSM server side, or is it again completely on the host side? What could the host owners actually do to make sure that both the Windows OS and the SQL backup jobs run successfully (other than stopping SQL services etc.)?

     

    Regards

    BB



  • 10.  RE: Failed/missed backups

    Posted Wed August 03, 2016 01:04 PM

    ^ Check the Event Viewer for more details.

    ^ Try executing the script manually.
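
In the dsmsched.log extract above, "Finished command.  Return code is: 1" is immediately followed by ANS1512E with the same return code, which suggests the schedule simply reports the exit code of c:\backup\dbdiff.cmd. Running the script by hand and capturing its exit code and output should therefore show the real error. A minimal Python sketch follows; `run_and_report` is a hypothetical helper, and the stand-in command in the demo only imitates a script that exits with 1.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit code, stdout, stderr)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # On the SQL node, the call would be something like:
    #   run_and_report(["cmd", "/c", r"c:\backup\dbdiff.cmd"])
    # Stand-in used here so the sketch runs anywhere: a child process
    # that prints a message to stderr and exits with code 1.
    stand_in = [sys.executable, "-c",
                "import sys; print('backup step failed', file=sys.stderr); sys.exit(1)"]
    rc, out, err = run_and_report(stand_in)
    print("Return code:", rc)
    print("stderr:", err.strip())
```

Run it under the same account the scheduler service uses if possible, since a script can succeed interactively and still fail for the service account.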



  • 11.  RE: Failed/missed backups

    Posted Thu August 04, 2016 02:12 AM

    Hi,

    Can you share the details of the failing schedule, SQL_DAILY_DIF?

    What is the result when you run it manually? Please also share the content of the script.

     

    Regards

    Prem