IBM Systems Technical University - Day 4 morning

By Tony Pearson posted Fri May 26, 2017 01:47 PM

  

Originally posted by: TonyPearson


This week, I am presenting at the IBM Systems Technical University in Orlando, Florida, May 22-26, 2017. Here is my recap of the sessions on the morning of Day 4.

Configurable IBM Spectrum Scale

Kent Koeninger presented IBM Spectrum Scale software, which he refers to as "Configurable Spectrum Scale" (or CSS for short), as opposed to the pre-built system known as Elastic Storage Server (ESS).

Why choose CSS versus ESS? Lower entry price. You can start with just two single-socket servers and a drawer of disk.

IBM Spectrum Scale was formerly called IBM General Parallel File System (GPFS). Many who tried earlier versions of GPFS found it difficult to configure, because it only had a command line interface. Now, Spectrum Scale has a fully-functional GUI, and clients have been able to install and configure Spectrum Scale in just 30 minutes!

How big can Spectrum Scale grow? As big as your budget allows! With an architecture that can support yottabytes of data and 900 quintillion files, you won't hit any limits anytime soon.

Some capabilities of ESS are not available in CSS. For example, ESS offers Spectrum Scale Native RAID (erasure coding) with fast rebuild times, and ESS is certified for SAP HANA. You can mix CSS and ESS in the same Spectrum Scale cluster to create a "data lake" for mixed workloads.

A good use case for Spectrum Scale, either CSS or ESS, is backup. Kent explained why it is an excellent option to store backups with enterprise backup software such as IBM Spectrum Protect or Commvault.

VersaStack - Hybrid Cloud like no other

This session was jointly presented by Chris Vollmar, IBM Storage Architect, and Brent Anderson, Cisco Global Consulting Systems Engineer. IBM and Cisco have been partners for more than 25 years.

VersaStack combines Cisco UCS x86 servers, Cisco Nexus and MDS switches, and IBM FlashSystem or Spectrum Virtualize storage.

What if you have a SAN infrastructure built entirely from IBM b-type or Brocade-based switches? Cisco supports their SAN switches for this, but nobody has tested VersaStack in this combination, and UCS Director does not manage it, so IBM does not support it. For this situation, IBM instead recommends connecting externally via Ethernet or using direct-attach configurations.

The Cisco Validated Design goes through four months of testing, and gives you a bulletproof process to deploy the solution.

There is a difference between Cisco UCS Manager and UCS Director. UCS Manager is available at no additional charge, but only manages the Cisco x86 servers. UCS Director is an optional, separately priced product that manages Cisco servers, Cisco networking, and IBM Spectrum Virtualize storage.

Brent explained the benefits of UCS Management through policies and profiles.

Chris covered Cisco CloudCenter, which the Cisco team shortens to just "C3". IBM Spectrum Copy Data Management can be used to move snapshots of data between on-premises and off-premises Cloud to help in Hybrid Cloud configurations.

How to Design an IBM Spectrum Scale solution

Tomer Perry, IBM Spectrum Scale I/O Development, presented this session.

For those who want to bring up a quick IBM Spectrum Scale environment to play around with, you can do this in as little as 30 minutes. But to design a mission-critical deployment, additional requirements may need to be addressed. You may need to consult not just storage admins, but also application owners, network admins and security personnel.

Large companies have hundreds or thousands of applications, so Tomer recommends grouping these into "workload families", based on data set types, access patterns and performance requirements. For NAS take-out, 80 percent of NAS I/O is "get attribute" requests that can easily be served directly from cache memory.
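As a rough illustration of the grouping idea, here is a minimal Python sketch that buckets applications by data set type, access pattern and performance requirement. The application names, attribute values and the `group_into_families` helper are all hypothetical, made up for this example, and not anything Tomer presented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    # All field values used below are hypothetical labels for illustration.
    name: str
    data_type: str       # e.g. "small-files", "large-sequential"
    access_pattern: str  # e.g. "random-read", "streaming-write"
    perf_tier: str       # e.g. "low-latency", "throughput"


def group_into_families(apps):
    """Group apps that share data type, access pattern and performance tier."""
    families = defaultdict(list)
    for app in apps:
        key = (app.data_type, app.access_pattern, app.perf_tier)
        families[key].append(app.name)
    return dict(families)


apps = [
    App("home-dirs", "small-files", "random-read", "low-latency"),
    App("web-content", "small-files", "random-read", "low-latency"),
    App("video-ingest", "large-sequential", "streaming-write", "throughput"),
]

families = group_into_families(apps)
for key, members in families.items():
    print(key, members)
```

With an inventory like this, design decisions (snapshots, quotas, replication and so on) can be made once per family rather than once per application.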

For each workload family, you may need to decide on snapshots, quotas, namespace (bind mounts, symlinks, etc.), security (ACL, encryption), estimated capacity, replication BC/DR, backup and ILM requirements.

Unless this is a completely greenfield deployment, the existing infrastructure needs to be evaluated. This includes the LAN and WAN network topology, name resolution (DNS), time services (NTP), authentication (AD, LDAP, NIS, Keystone), key server (IBM SKLM), monitoring and migration requirements.

Tomer suggests designing the environment in this order: Cluster, File System, Storage Pools, Fileset, Replication, and finally Monitoring.

Generally, you need three NSD servers per cluster. For those licensing Spectrum Scale Standard Edition by the socket, you may be tempted to put everything into one big cluster. The new capacity-based Spectrum Scale Data Management Edition eliminates that concern, so Tomer recommends having separate compute clusters and storage clusters, connected by cross-cluster mount. All nodes in a cluster are considered a single "ssh" administration domain.

A single Spectrum Scale namespace can support up to 256 file systems. There are various reasons to have multiple file systems: block size, backup/recovery, snapshot, quotas, and cross-cluster isolation. If a file system gets corrupted, it will not affect other file systems. In an internal test, an "fsck" on a file system with 1 billion files and 1 PB of data took only 30 minutes to repair.

Storage Pool design can separate metadata from content, and workloads can be separated to different storage media. With ILM, HSM and TCT, you can move colder data to Cloud, Object Storage, Spectrum Protect or Spectrum Archive.
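To make the tiering idea concrete, here is a minimal Python sketch of the kind of age-based decision an ILM policy encodes. This is not Spectrum Scale policy syntax. The tier names, the 30-day and 180-day thresholds and the `choose_tier` function are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical tiers and age thresholds, ordered from hottest to coldest.
TIERS = [
    (timedelta(days=30), "flash"),
    (timedelta(days=180), "nearline-disk"),
]
COLD_TIER = "cloud-or-tape"  # stands in for Cloud, Object Storage or tape


def choose_tier(last_access, now):
    """Return the tier for a file based on the time since its last access."""
    age = now - last_access
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return COLD_TIER


now = datetime(2017, 5, 26)
print(choose_tier(datetime(2017, 5, 20), now))  # accessed 6 days ago
print(choose_tier(datetime(2017, 1, 15), now))  # accessed 131 days ago
print(choose_tier(datetime(2016, 1, 1), now))   # accessed over a year ago
```

In a real deployment the thresholds would come from the backup and ILM requirements gathered for each workload family.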

Filesets are tree branches within each file system. IBM Spectrum Scale supports both dependent and independent filesets. Filesets can be used for Non-Erasable, Non-Rewriteable (NENR) immutability, policies, quotas, and snapshots. Consider using a fileset instead of carving off a new file system.

Spectrum Scale offers both synchronous and asynchronous replication. For Synchronous, the ReadReplicaPolicy can be set to default, local or fastest. For Asynchronous, there are a variety of AFM modes (Read-only, Local-Update, Single-Writer, Independent-Writer, and Disaster Recovery). You may need to decide if your AFM gateways are dedicated or collocated. You will need to tune your TCP buffers for WAN performance to get the RPO you desire.
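A common starting point for sizing TCP buffers on a WAN link is the bandwidth-delay product: the amount of data that can be "in flight" on the link at once. The Python sketch below works through that arithmetic. The 1 Gbit/s link speed and 80 ms round-trip time are assumed example values, not figures from the session, and `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bandwidth x round-trip time.

    Integer arithmetic is used to avoid floating-point rounding:
    bits/s * ms / 1000 gives bits in flight, and / 8 converts to bytes.
    """
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)


link_bps = 1_000_000_000  # assumed 1 Gbit/s WAN link
rtt_ms = 80               # assumed 80 ms round-trip time

buffer_bytes = bdp_bytes(link_bps, rtt_ms)
print(f"Buffer needed to fill the link: {buffer_bytes} bytes")
```

If the TCP buffers are smaller than this figure, a single connection cannot keep the link full, which in turn limits how quickly asynchronous replication can catch up.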

The nice thing about IBM solutions is that you can start small and grow big. In all of the examples above, IBM offers sizes to match nearly any IT budget.
