Data Protection Software

  • 1.  Deduplication : Spectrum Protect LAN-Free using Spectrum Scale

    Posted Wed December 23, 2020 09:26 AM


    I would like your advice regarding an architecture we plan to propose for one of our customers. We have acquired Spectrum Protect and Spectrum Scale for LAN-Free backups.

    As far as I have checked, LAN-Free is not supported with container storage pools, so we will not benefit from inline deduplication and will have to use legacy FILE-pool deduplication instead.

    I would like to know the tradeoffs of this topology, and the differences between the two types of deduplication:

    • Deduplication ratio: do container and FILE storage pools achieve the same dedup ratio?
    • Performance: inline dedup has some performance impact; is post-process dedup more efficient?
    • Any other considerations

    Thank you

    Largou walid

  • 2.  RE: Deduplication : Spectrum Protect LAN-Free using Spectrum Scale

    Posted Tue June 08, 2021 05:28 PM
    If you are using LAN-Free, I guess you are ingesting many TB per day.
    I foresee the following weakness: with legacy FILE dedup and a large amount of backed-up data, your TSM DB will grow to an uncomfortable size and you will start struggling with it (space allocation, offline maintenance, etc.).
    I've been there. Not a place I would like to go back to.

    I used legacy FILE dedup for a while and then converted all those FILE disk-based storage pools to container pools, relieving the TSM DB of IOPS and reducing its size significantly.
    I would not use legacy FILE dedup with storage pools bigger than 30 TB (a personal opinion based on my own experience).
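
    For reference, a conversion like the one described can be sketched with dsmadmc administrative commands. Pool names, credentials, and the directory path below are placeholders; check the exact parameters against your server level:

    ```shell
    # Define a directory-container pool as the conversion target,
    # then convert the legacy FILE pool into it.
    dsmadmc -id=admin -password=secret \
      "define stgpool contpool stgtype=directory"
    dsmadmc -id=admin -password=secret \
      "define stgpooldirectory contpool /tsm/cont01"
    # CONVERT STGPOOL can be run in bounded chunks (duration in minutes)
    dsmadmc -id=admin -password=secret \
      "convert stgpool filepool contpool maxprocess=4 duration=60"
    ```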
    Hope it helps.

    Nicolás Pérez de Arenaza

  • 3.  RE: Deduplication : Spectrum Protect LAN-Free using Spectrum Scale

    Posted Wed June 09, 2021 02:49 AM
    LAN-Free data transfer is a way to move large amounts of data faster than over a traditional LAN, for example by storing data directly to tape media or, as you mentioned, to Spectrum Scale storage.

    Deduplication combined with compression reduces the amount of data that needs to be transferred.
    But it still requires reading all the data to detect which parts can be eliminated.
    If the data contains many similarities, the deduplication reduction will be high, and the remaining data to be sent correspondingly low.
    The drawback is that deduplication consumes extra resources to read and fingerprint the data:
    the faster the backup runs, the more resources are needed to read all the data, and the more impact it has on the overall performance of the applications.
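
    As a back-of-envelope illustration of that tradeoff (the numbers below are assumptions, not measurements): the higher the dedup ratio, the less data crosses the wire, at the cost of reading and fingerprinting everything first.

    ```shell
    # Illustrative arithmetic only: how a dedup ratio translates into data sent.
    ingested_gb=1000      # assumed daily ingest
    dedup_ratio=4         # assumed 4:1 reduction
    stored_gb=$((ingested_gb / dedup_ratio))
    saved_gb=$((ingested_gb - stored_gb))
    echo "transferred/stored: ${stored_gb} GB, avoided: ${saved_gb} GB"
    ```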

    There are other ways to protect huge amounts of data.

    Traditionally, applications require periodic full backups, which can take a long time, especially if the volume is big.
    What if one could get the benefits of always-incremental backups for applications too?
    Is that possible?
    - Yes: with application-consistent snapshots one can send incremental block changes using progressive, always-incremental backups.
    - The first backup is a full copy of all data.
    - Every subsequent backup sends only the blocks changed since the last backup, using a built-in OS journal to detect the block changes.
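
    The block-change detection described above can be sketched as follows. This is a conceptual illustration only; real products track changed blocks with an OS change journal or bitmap instead of scanning, and the file names here are made up:

    ```shell
    # Build two tiny "volume images" that differ in exactly one 4 KiB block,
    # then find which blocks an incremental backup would need to send.
    bs=4096
    printf 'A%.0s' $(seq 1 $((bs * 3))) > vol_old.img   # 3 identical blocks
    cp vol_old.img vol_new.img
    printf 'B%.0s' $(seq 1 $bs) > blk.tmp               # new content for block 1
    dd if=blk.tmp of=vol_new.img bs=$bs seek=1 count=1 conv=notrunc 2>/dev/null
    changed=""
    for i in 0 1 2; do
      dd if=vol_old.img of=o.tmp bs=$bs skip=$i count=1 2>/dev/null
      dd if=vol_new.img of=n.tmp bs=$bs skip=$i count=1 2>/dev/null
      cmp -s o.tmp n.tmp || changed="$changed $i"
    done
    echo "blocks to send:$changed"                      # only block 1 changed
    rm -f vol_old.img vol_new.img blk.tmp o.tmp n.tmp
    ```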

    How about restore?
    - Restore is nearly instant.
    - The volume is provisioned directly from the Spectrum Protect server storage to the operating system as a snapshot volume.
    - The OS immediately starts restoring the volume in the background while the volume is fully operational.
    - The application can use the OS volume, start up its databases, and perform normal operations while the restoration runs in the background.
    - When all data has been restored, the provisioned volume is no longer needed and disappears from the OS.

    Take a look at Spictera; they have interesting solutions that improve data protection for Spectrum Protect.

    Using SPFS might also improve protection of transactional data, if the transactional copy process can be integrated with Spectrum Protect, for example with Oracle, PostgreSQL, DB2, or Progress OpenEdge 4GL.

    Let me know if you need some more advice

    Regards Tomas

    Tomas Dalebjörk

  • 4.  RE: Deduplication : Spectrum Protect LAN-Free using Spectrum Scale

    Posted Thu June 10, 2021 08:42 PM
    Deduplication ratio is similar; however, it is post-processing: you ingest, then the data is read, deduplicated, and written out to new volumes, and FILE-volume RECLAIM STGPOOL processing follows.  Compression is an issue, though.  Either you use client-side compression and give up some deduplication efficiency, or you don't use compression at all.  For things like database log backups and database incremental backups, deduplication sometimes has no real effect and only compression helps.
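
    For completeness, the legacy FILE-pool setup being discussed looks roughly like this in dsmadmc (names, paths, and sizes are placeholders; duplicate identification is the post-process step, after which reclamation frees the space):

    ```shell
    # Device class and FILE pool with legacy (post-process) deduplication
    dsmadmc -id=admin -password=secret \
      "define devclass filedev devtype=file directory=/tsm/file maxcapacity=50G mountlimit=32"
    dsmadmc -id=admin -password=secret \
      "define stgpool filepool filedev maxscratch=200 deduplicate=yes identifyprocess=4"
    # Post-process step: find duplicate chunks; reclamation then removes them
    dsmadmc -id=admin -password=secret \
      "identify duplicates filepool numprocess=4 duration=60"
    ```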

    Another option might be to mount your GPFS from Spectrum Scale directly on the TSM/SP server.  As always, you have communication bandwidth to deal with, but perhaps you could use 10 GbE, InfiniBand, or some other dedicated connectivity to the same fabric/network used by Spectrum Scale.  Locking latency and data bandwidth matter.  But then the TSM server could back up your GPFS.  Whether this helps depends on where your backing devices live and on your overall architecture.

    Alternatively, add a 10 Gb NIC, a 25 Gb NIC, IP over 32 Gbit FC, or any other fun option, and just back up over IP from your normal client and let it do client-side dedup/compression.  The CPU impact is not as big as you'd think.
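
    Client-side dedup and compression as suggested here are enabled in the client options file (dsm.sys on UNIX clients; the server name and address below are placeholders):

    ```
    SErvername        tsmserver1
      COMMMethod        TCPip
      TCPServeraddress  tsm.example.com
      TCPPort           1500
      DEDUPLICATION     YES
      ENABLEDEDUPCACHE  YES
      COMPRESSION       YES
    ```

    On the server side, the node must also permit it, e.g. "update node node1 deduplication=clientorserver".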

    Josh-Daniel Davis
    Highland Village TX

  • 5.  RE: Deduplication : Spectrum Protect LAN-Free using Spectrum Scale

    Posted Fri June 11, 2021 06:21 AM
    There are other ways to protect data that improve data protection strategies.
    Traditional data protection, where clients send data on normal schedules:
    • weekly full backups
    • daily incremental backups
    • hourly transactional backups
      • with an integrated backup such as the Spictera SPFS solution, transactional data is protected immediately when it is created, rather than when a schedule fires
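
    The traditional schedule tiers above map to server-defined schedules along these lines (domain, schedule, node names, and times are placeholders):

    ```shell
    # Weekly full and daily incremental; names and times are illustrative
    dsmadmc -id=admin -password=secret \
      "define schedule standard weekly_full action=selective objects='/data/*' starttime=22:00 dayofweek=saturday"
    dsmadmc -id=admin -password=secret \
      "define schedule standard daily_incr action=incremental starttime=20:00 period=1 perunits=days"
    dsmadmc -id=admin -password=secret \
      "define association standard daily_incr node1"
    ```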

    Methods of transferring data:
    • over LAN
    • over WAN
    • over SAN
    Data reduction on client or server
    • de-duplication
    • compression
    For faster data protection, one can use application-consistent snapshots,
    such as FlashCopy Manager (now IBM Spectrum Protect Snapshot)
    or vendor-specific snapshots.

    Note that a snapshot is not equal to a backup: the changes are written to a "journal" on other physical disks but remain dependent on the original data disks. If the original data disk breaks or the data is lost, you may end up in an unrecoverable scenario.

    So how can this be improved?

    One interesting technique could be the Spictera brick solution ("SPIR").
    With it, one can take application-consistent snapshots that are stored on the Spectrum Protect backup server using progressive block-level incremental-forever (always-incremental) backups.
    The daily ingest is then reduced to what has actually changed since the last backup.

    Restoration is fast: the snapshot is provisioned from the Spectrum Protect backup server to the operating system as a snapshot device. Users can start using the volumes holding the application data immediately, while the restoration runs in the background.

    If you are familiar with how FastBack (FilesX) works, this is similar.

    RTO can be shrunk to only a few minutes using the Spictera brick solution, FastBack, or snapshots.
    RPO can be shrunk to the last written transaction log using the Spictera SPFS solution or integrated transactional backups (e.g., DB2).

    Tomas Dalebjörk