
Leveraging IBM Storage Ceph for Scalable, Secure, and Efficient Splunk SmartStore

By Daniel Alexander Parkes posted Tue October 01, 2024 04:36 PM

  

IBM Storage Ceph for Splunk SmartStore

by Kyle Bader & Daniel Parkes

 

Introduction to Splunk and IBM Storage Ceph Object Storage

Splunk is a powerful platform designed to collect, analyze, and visualize machine-generated data. It is an indispensable tool for operational intelligence, security monitoring, and data analytics. It allows organizations to search, monitor, and analyze large volumes of log data in real-time, giving businesses critical insights.

On the storage side, IBM Storage Ceph Object Storage is an open-source, scalable solution for unstructured data. It is designed for enterprise workloads and offers flexibility, high durability, and scalability, which makes it an ideal choice for modern data architectures. IBM Ceph integrates with various applications, including Splunk SmartStore, optimizing data storage, cost efficiency, and retrieval processes.

What is Splunk SmartStore?

Splunk SmartStore is a feature that decouples Splunk's compute and storage layers, allowing external object stores, such as IBM Storage Ceph, to store warm and cold data. Hot data remains cached on local storage for faster access, while IBM Storage Ceph manages the bulk of data effectively and at a lower cost.

IBM Storage Ceph and Splunk SmartStore Integration

IBM Storage Ceph integrates with Splunk SmartStore to optimize data storage for large-scale deployments. Ceph’s architecture, designed for massive scalability, allows it to handle the increasing data demands generated by Splunk’s indexing processes. When combined, Splunk and Ceph provide:

  • Cost-efficient scalability. Splunk SmartStore's design enables compute and storage resources to scale independently.

  • High availability and resiliency. IBM Storage Ceph's built-in replication and erasure coding features ensure that data remains available and resilient in the face of hardware failures.

  • Dynamic cache management. SmartStore intelligently manages cached data on local storage for high-performance searches, even when most data is stored on Ceph.

IBM Storage Ceph has been validated with Splunk SmartStore, and SmartStore has been listed in the Compatibility Matrix since IBM Storage Ceph 7.1.

SmartStore Advantages for Splunk Deployments

SmartStore offers several advantages to the deployment's indexing tier: 

  • Reduced storage cost. Your deployment can take advantage of the economy of remote object stores instead of relying on costly local storage. 

  • Access to high availability and data resiliency features available through remote object stores. 

  • The ability to scale compute and storage resources separately, thus ensuring that you use resources efficiently. 

  • Simple and flexible configuration with per-index settings. 

  • A bootstrapping capability that allows a new cluster or standalone indexer to inherit data from an old cluster or standalone indexer. 

SmartStore offers additional advantages specific to deployments of indexer clusters: 

  • Fast recovery from peer failure and fast data rebalancing, requiring only metadata fixups for warm data. 

  • Lower overall storage requirements, as the system maintains only a single permanent copy of each warm bucket. 

  • Full recovery of warm buckets, even when the number of peer nodes that go down is greater than or equal to the replication factor. 

  • Global size-based data retention. 

  • Simplified upgrades. 

An intelligent cache manager ensures that SmartStore performs similarly to local storage configurations for most search use cases. 

Storage Sizing for Splunk SmartStore

Proper sizing of the object store is crucial for successful deployment of Splunk SmartStore. It ensures efficient utilization of compute and storage resources, leading to optimal performance and scalability as data grows. Let's delve deeper into specifications and recommendations for sizing your storage with IBM Storage Ceph.

Sizing storage for Splunk SmartStore becomes a straightforward task when using IBM Storage Ready Nodes with IBM Storage Ceph. IBM Storage Ceph scales out in single-node increments, and the number of Splunk indexers correlates to the number of nodes in the Ceph cluster:

  • 1 Splunk Indexer per 4616-X2D all-flash ready node.

  • 8 Splunk Indexers per 4616-X5D all-flash ready node.

Refer to the Splunk Capacity Planning Manual for more information on how to size a Splunk Enterprise environment.
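As a rough illustration of the ratios above, the following sketch rounds an indexer count up to a whole number of ready nodes. The function name is hypothetical, and the result ignores any minimum cluster size imposed by the chosen data protection scheme, so treat it as a starting point rather than a sizing tool.

```shell
#!/bin/sh
# ready_nodes_needed INDEXERS INDEXERS_PER_NODE
# Hypothetical helper: rounds up, because a partially used node is still a whole node.
ready_nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Ratios from the list above: 1 indexer per 4616-X2D node, 8 per 4616-X5D node.
ready_nodes_needed 20 1   # prints 20
ready_nodes_needed 20 8   # prints 3
```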

Object store throughput per Splunk indexer

Specification           Minimum specifications    Performance specifications
Download throughput     100 MB/s or higher        800 MB/s or higher
Upload throughput       30 MB/s or higher         500 MB/s or higher
Network connectivity    1 Gbps or higher          10 Gbps or higher
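To compare a measured result against the thresholds above, a small check such as the following can help. It is a sketch: the function names are hypothetical, and it assumes you have already measured per-indexer download and upload throughput in whole MB/s with a benchmark of your choice.

```shell
#!/bin/sh
# classify_throughput DOWNLOAD_MB_S UPLOAD_MB_S
# Hypothetical helper: maps measured per-indexer throughput to a tier.
classify_throughput() {
    down=$1
    up=$2
    if [ "$down" -ge 800 ] && [ "$up" -ge 500 ]; then
        echo "performance"
    elif [ "$down" -ge 100 ] && [ "$up" -ge 30 ]; then
        echo "minimum"
    else
        echo "below-minimum"
    fi
}

# mb_to_mbit MB_PER_S: megabytes per second to megabits per second (1 byte = 8 bits).
mb_to_mbit() {
    echo $(( $1 * 8 ))
}

classify_throughput 850 520   # prints "performance"
classify_throughput 400 120   # prints "minimum"
mb_to_mbit 800                # prints 6400, that is 6.4 Gbps
```

The conversion shows why the network figures pair with the throughput figures: 800 MB/s is 6.4 Gbps, which fits a 10 Gbps link, and 100 MB/s is 0.8 Gbps, which fits a 1 Gbps link.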

SmartStore Efficiency: Data Reduction and Erasure Coding

Erasure coding

IBM Storage Ceph supports space-efficient erasure coding across failure domains, which requires dramatically less raw storage capacity than storing data with replication on Splunk indexers. As a result, even all-flash IBM Storage Ceph clusters can lead to significant cost savings for Splunk environments. 
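The capacity difference can be estimated with simple arithmetic. The sketch below compares keeping full copies with erasure coding at k data chunks plus m coding chunks. The three-copy figure and the 4+2 and 8+3 profiles are illustrative assumptions, not recommendations from this article, and the numbers ignore free-space headroom and other cluster overhead.

```shell
#!/bin/sh
# raw_capacity_copies USABLE COPIES
# Raw capacity needed when every byte is stored COPIES times.
raw_capacity_copies() {
    echo $(( $1 * $2 ))
}

# raw_capacity_ec USABLE K M
# Raw capacity needed with K data chunks and M coding chunks, rounded up.
raw_capacity_ec() {
    echo $(( ($1 * ($2 + $3) + $2 - 1) / $2 ))
}

# 100 TB of usable data under each scheme (assumed values):
raw_capacity_copies 100 3   # prints 300
raw_capacity_ec 100 4 2     # prints 150
raw_capacity_ec 100 8 3     # prints 138
```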

Compression

“To reduce S3 usage and improve network performance, SmartStore can compress tsidx files before uploading them to the remote store. This capability uses zstd compression. When the files are downloaded to indexers, SmartStore automatically decompresses them before placing them in the cache.” 

Source: Splunk documentation, "Compress tsidx files upon upload to S3".

Additionally, IBM Storage Ceph Object Storage can compress the data that Splunk stores, at the Object Gateway level. Since version 7.1, IBM Storage Ceph has supported hardware-accelerated compression via Intel QuickAssist Technology (QAT). QAT offloads compression and decompression from software to hardware, which reduces CPU load and can significantly improve overall system performance. This is useful in high-throughput applications like Splunk, where low latency is crucial.
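Gateway-level compression is enabled per placement target. The following is a minimal sketch that assumes the default zone and placement names (default and default-placement) and the zlib plugin; all three are assumptions to adjust for your cluster. By default it only prints the command, so it can be reviewed before it is run on a Ceph admin node.

```shell
#!/bin/sh
# run CMD...: prints the command unless DRY_RUN=0 is set, in which case it executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# enable_rgw_compression ZONE PLACEMENT_ID ALGORITHM
# Hypothetical wrapper around radosgw-admin zone placement modify.
enable_rgw_compression() {
    run radosgw-admin zone placement modify \
        --rgw-zone "$1" \
        --placement-id "$2" \
        --compression "$3"
}

# Zone, placement target, and algorithm are assumed values.
enable_rgw_compression default default-placement zlib
```

Only objects written after the change are compressed, and the gateways may need a restart before the setting takes effect; check the IBM Storage Ceph documentation for your release.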

SmartStore Security: Data Encryption at Rest

In any data-intensive environment, security is paramount. Protecting sensitive information, whether it's in transit or at rest, is essential for meeting regulatory requirements and managing potential risks. Splunk SmartStore offers robust encryption options that, when combined with IBM Storage Ceph, help keep your data secured at rest. Here’s an overview of the encryption options available.

Splunk SmartStore supports three types of S3 server-side encryption: SSE-C, SSE-S3, and SSE-KMS. With IBM Storage Ceph, SmartStore can be configured to use either SSE-S3 or SSE-KMS. In this configuration, Ceph interacts with an external KMS such as HashiCorp Vault to securely encrypt objects. For details on configuring an external KMS for SSE-S3 or SSE-KMS with IBM Storage Ceph, see "S3 server-side encryption" in the IBM Documentation.
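On the Splunk side, the encryption mode is selected in the remote volume stanza of indexes.conf. The sketch below prints the lines for the two modes discussed here, using the remote.s3.encryption and remote.s3.kms.key_id settings as I understand them from the indexes.conf specification; verify them against your Splunk version. The function name and the key name splunk-key are hypothetical.

```shell
#!/bin/sh
# smartstore_encryption_settings MODE [KMS_KEY_ID]
# Hypothetical helper: prints indexes.conf lines meant for the remote volume stanza.
smartstore_encryption_settings() {
    mode=$1
    key_id=${2:-}
    case "$mode" in
        sse-s3)
            printf 'remote.s3.encryption = sse-s3\n'
            ;;
        sse-kms)
            if [ -z "$key_id" ]; then
                echo "sse-kms needs a key id" >&2
                return 1
            fi
            printf 'remote.s3.encryption = sse-kms\n'
            printf 'remote.s3.kms.key_id = %s\n' "$key_id"
            ;;
        *)
            # Only the two modes named above for IBM Storage Ceph are handled.
            echo "unsupported mode: $mode" >&2
            return 1
            ;;
    esac
}

smartstore_encryption_settings sse-s3
smartstore_encryption_settings sse-kms splunk-key   # splunk-key is an assumed key name
```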

Configuration of IBM Storage Ceph for Splunk SmartStore

Here is a high-level example of how to configure IBM Storage Ceph with Splunk SmartStore:

IBM Storage Ceph configuration

Create an RGW S3 user for Splunk with the radosgw-admin CLI. The command output includes the user's access key and secret key, which the following steps need:

 $ radosgw-admin user create --uid splunk --display-name splunk 

In this example, we use the AWS CLI to create the required bucket, but any S3 client or the Ceph Dashboard can be used.

Download and install the AWS CLI.  

 $ dnf install awscli -y 

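Run aws configure to create the ~/.aws/credentials file, supplying the access key and secret key of the splunk user (shown here as the placeholders ABC and XYZ):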

$ aws configure 
AWS Access Key ID: ABC
AWS Secret Access Key: XYZ 
Default region name [default]: default
Default output format [json]: json

Splunk SmartStore Configuration

For each index, configure a remote store. Here’s an example to configure a SmartStore remote volume with IBM Storage Ceph:

Create a Bucket for Remote Store:

 $ aws s3api create-bucket --endpoint-url https://s3.example.com --bucket splunk

Configure the Remote Store in Splunk:

 $ cat ${SPLUNK_HOME}/etc/system/local/indexes.conf
[volume:remote_store]
storageType = remote
path = s3://splunk/remote_store
remote.s3.endpoint = https://s3.example.com
remote.s3.access_key = ABC
remote.s3.secret_key = XYZ
remote.s3.tsidx_compression = true 

Configure an Index to Use the Remote Store:


[cs_index]
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
remotePath = volume:remote_store/$_index_name

This is the most basic example. For a comprehensive list of indexer configuration options, please refer to the Splunk documentation.
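A misspelled setting name in indexes.conf is easy to miss, and an index without remotePath is not SmartStore-enabled. The following sketch lists index stanzas that have no remotePath line. It is a simple text check with a hypothetical function name, not a Splunk tool: it skips volume stanzas and [default], and it does not account for a remotePath inherited from [default] or from another configuration layer.

```shell
#!/bin/sh
# check_remote_path FILE
# Hypothetical helper: prints every index stanza in FILE that lacks a remotePath setting.
check_remote_path() {
    awk '
        function report() {
            if (stanza != "" && !found) print stanza
        }
        /^\[[^]]+\]/ {
            report()
            stanza = $0
            sub(/^\[/, "", stanza)
            sub(/\].*$/, "", stanza)
            found = 0
            # Volumes and [default] are not indexes, so skip them.
            if (stanza ~ /^volume:/ || stanza == "default") stanza = ""
            next
        }
        /^[ \t]*remotePath[ \t]*=/ { found = 1 }
        END { report() }
    ' "$1"
}

# Usage (path as used in this article):
#   check_remote_path "${SPLUNK_HOME}/etc/system/local/indexes.conf"
```

An empty result means every index stanza in that file sets remotePath explicitly.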

Maximizing Value with IBM Storage Ceph and Splunk SmartStore

IBM Storage Ceph and Splunk SmartStore deliver outstanding scalability, storage efficiency, security, and cost savings. This integration reduces storage costs through erasure coding and compression, improves security with Object Versioning and Server-Side Encryption (SSE), and ensures optimal performance with intelligent cache management. 

By combining cost-efficient storage with high-performance indexing and searching capabilities, IBM Storage Ceph and Splunk SmartStore allow enterprises to future-proof their infrastructure, providing the agility needed to keep pace with data growth and the evolving digital landscape.
