Netezza Performance Server

  • 1.  DISK_FPGA_ERROR on specific Table

    Posted Thu December 14, 2023 10:04 AM

    Hi All,

    We ran into this issue over the last day: every time we run a statement on a specific table, the following error is reported. In addition, after restarting the database, NPS sometimes remained in DOWN status.

    [nz@openpsese-npshost postgres]$ grep ERROR: pg.log | grep DISK
    2023-12-14 09:33:35.269920 CET [102969] ERROR:  DISK_FPGA_ERROR : Status=0,0x80000000 [ABORTED] SPU=1020 Dev=8 Eng=19 LBA=291400448
    [nz@openpsese-npshost postgres]$
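
    When many such lines accumulate in pg.log, it can help to split them into fields and see whether the errors cluster on one SPU and device. The following is only a sketch: the regular expression is modeled on the single line quoted above, the field names are taken from that line as-is, and the helper names (parse_disk_fpga_errors, count_by_location) are hypothetical. Other NPS releases may format the message differently.

```python
import re
from collections import Counter

# Pattern modeled on the single DISK_FPGA_ERROR line quoted in the post;
# other NPS releases may format the message differently (assumption).
DISK_FPGA_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<tz>\S+) "
    r"\[(?P<pid>\d+)\] ERROR:\s+DISK_FPGA_ERROR : "
    r"Status=(?P<status>\S+) \[(?P<state>\w+)\] "
    r"SPU=(?P<spu>\d+) Dev=(?P<dev>\d+) Eng=(?P<eng>\d+) LBA=(?P<lba>\d+)"
)


def parse_disk_fpga_errors(lines):
    """Return one dict per DISK_FPGA_ERROR line, with numeric fields as ints."""
    records = []
    for line in lines:
        match = DISK_FPGA_RE.search(line.strip())
        if match is None:
            continue  # not a DISK_FPGA_ERROR line in the expected layout
        record = match.groupdict()
        for key in ("pid", "spu", "dev", "eng", "lba"):
            record[key] = int(record[key])
        records.append(record)
    return records


def count_by_location(records):
    """Count errors per (SPU, Dev) pair to show whether they cluster."""
    return Counter((r["spu"], r["dev"]) for r in records)


# The sample is the log line from the post, split across two string literals.
sample = [
    "2023-12-14 09:33:35.269920 CET [102969] ERROR:  DISK_FPGA_ERROR : "
    "Status=0,0x80000000 [ABORTED] SPU=1020 Dev=8 Eng=19 LBA=291400448",
]
records = parse_disk_fpga_errors(sample)
print(records[0]["spu"], records[0]["dev"], records[0]["lba"])
print(count_by_location(records))
```

    In practice the input would be the lines of pg.log, for example `parse_disk_fpga_errors(open("pg.log"))`. The counts only show where the errors were reported; they do not by themselves identify which rows of the table are affected.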

    The only way we found to fix this was to restore the table from a previous backup and drop the one with the problem.

    We had to drop it because the BACKUP also failed with the same error: ERROR:  DISK_FPGA_ERROR : Status=0,0x80000000 [ABORTED] SPU=1020 .... .

    A migration to move the table into a DATABASE that is not under backup also failed (same error).

    Do you know how to find out what is corrupted in the table (for example, with a query on the data) and, if possible, how to fix it? Or is the approach described above the only option?

    Thanks a lot in advance.
    By Andrea



    ------------------------------
    Andrea Ceccotti
    ------------------------------



  • 2.  RE: DISK_FPGA_ERROR on specific Table

    Posted Thu December 14, 2023 10:11 AM

    The best course of action is to open a support ticket with IBM.   



    ------------------------------
    Rajshekar (Shekar) Iyer
    ------------------------------



  • 3.  RE: DISK_FPGA_ERROR on specific Table

    Posted Fri December 15, 2023 05:02 AM

    Hi,

    thanks for your feedback. I have done this, and IBM Support gave me the following feedback:

    It looked like there was some data corruption in the table.

    OK, but what I would like to know is whether there is a way to identify the corrupt portion of the data, or an alternative to restoring the table from the backup via nzrestore and then DROPping the damaged one.
    I ask because some tables are loaded hourly, and the difference between the snapshot in the last differential backup and the current state could be large, especially for very large tables.

    Thanks.

    Bye
    Andrea



    ------------------------------
    Andrea Ceccotti
    ------------------------------



  • 4.  RE: DISK_FPGA_ERROR on specific Table

    Posted Fri December 15, 2023 09:16 AM
    Edited by Adam Matusewicz Fri December 15, 2023 09:16 AM

    Hi Andrea
    Which version of NPS is running on the system that is reporting the corruption (NPS 7.X or 11.X)?
    Have you experienced any blade/DAC failures recently, or unexpected power outages or database crashes? It's important to establish the root cause of the corruption, in order to prevent it from happening again, before attempting any kind of database restore operation.

    If you're no longer being supported by IBM, this is something we could potentially help with. See here for details:
    https://smart-associates.biz/solutions/netezza-support.php

    If you've already fixed any possible underlying hardware/software/stability issues that caused the corruption, then unfortunately the only option to proceed after this is to restore the table from a backup.

    One way to potentially avoid this in future would be a separate DR system that is continuously replicated from the primary system (on detecting corruption you could instantly switch to the DR environment, and then replicate from that back to the old primary over time). We have some software that could automate the entire process if you think that would be useful. See here for more details: Smart Database Replication for Netezza | Smart Associates




    If we can be of any further assistance, please reach out to info@smart.associates and someone will get back to you.

    Thanks and Best Regards,
    Adam



    ------------------------------
    Adam Matusewicz
    ------------------------------



  • 5.  RE: DISK_FPGA_ERROR on specific Table

    Posted Fri December 15, 2023 11:46 AM

    Hi,

    this is the NPS release

    [nz@openpsese-npshost ~]$ nzrev
    Release 11.0.7.0 [Build 159]
    [nz@openpsese-npshost ~]$

    We are scheduling a version upgrade with IBM Support; that work is currently ongoing.

    We are still under Support. Yes, we have faced some issues on the NPS system in the last months due to the "old" version, so Support suggested upgrading.

    I joined this community and also asked about this error here, in parallel, to find a way to discover any corrupted data in the table, in case it is not possible to restore it from backup or the difference is too large. Thanks a lot for your feedback.

    Andrea



    ------------------------------
    Andrea Ceccotti
    ------------------------------



  • 6.  RE: DISK_FPGA_ERROR on specific Table

    Posted Fri January 12, 2024 11:48 AM

    Hi Andrea,

    I am following up to ask if you have been able to resolve this now.

    Regards,
    Sonia



    ------------------------------
    Sonia Singh
    ------------------------------



  • 7.  RE: DISK_FPGA_ERROR on specific Table

    Posted Sat January 13, 2024 04:45 AM

    Hi Sonia,

    yes, we solved this issue by restoring the table from a previous backup and dropping the one with the problem.

    We had to drop it because any action on the table (including the BACKUP) failed with the same error: ERROR:  DISK_FPGA_ERROR : Status=0,0x80000000 [ABORTED] SPU=1020 .... .

    My question was indeed if there was an alternative way to fix it.

    Thanks in advance,

    Bye
    Andrea



    ------------------------------
    Andrea Ceccotti
    ------------------------------



  • 8.  RE: DISK_FPGA_ERROR on specific Table

    Posted Sat January 13, 2024 07:26 PM
    This error may not indicate any physical data segment corruption, and it could go away by itself. However, if you wish, and you agree that the approach is correct, the following can be done:

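    begin transaction;
    -- copy the current rows into a temporary table
    create temp table my_temp as select * from the_concerned_table;
    -- empty the original table, then reload it from the copy
    truncate table the_concerned_table;
    insert into the_concerned_table select * from my_temp;
    -- refresh optimizer statistics for the reloaded table
    generate statistics on the_concerned_table;
    commit;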
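
    If the same sequence has to be prepared for several tables, the statements can be generated from a table name. This is a sketch only: build_rebuild_script and the identifier check are hypothetical helpers, the statements are reproduced as posted above, and whether each of them is permitted inside a transaction block on a given NPS release has not been verified here. Note also that the thread reports every statement against the affected table failing with the same error, so the initial copy step may fail in the same way.

```python
import re

# Conservative check for unquoted identifiers (letters, digits, underscore,
# optional single-dot qualifiers). This rule is an assumption of the sketch,
# not the full NPS identifier grammar.
_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_rebuild_script(table, temp_table="my_temp"):
    """Return the copy / truncate / reload sequence as one SQL script string."""
    for name in (table, temp_table):
        if not _IDENT_RE.match(name):
            raise ValueError(f"refusing unsafe identifier: {name!r}")
    statements = [
        "begin transaction",
        f"create temp table {temp_table} as select * from {table}",
        f"truncate table {table}",
        f"insert into {table} select * from {temp_table}",
        f"generate statistics on {table}",
        "commit",
    ]
    # Terminate every statement with a semicolon, one statement per line.
    return ";\n".join(statements) + ";"


print(build_rebuild_script("the_concerned_table"))
```

    The script is only printed, not executed, so it can be reviewed (ideally together with IBM Support) before anyone runs it against a production table.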

    Thanks,
    Daniel






  • 9.  RE: DISK_FPGA_ERROR on specific Table

    Posted Sun January 14, 2024 05:12 AM

    Hi Daniel,

    thanks a lot for your feedback. The next time I face this issue, I will try to fix it as you have suggested.

    Thanks again

    Bye
    Andrea



    ------------------------------
    Andrea Ceccotti
    ------------------------------