IBM Storage Ceph for Splunk SmartStore
by Kyle Bader & Daniel Parkes
Introduction to Splunk and IBM Storage Ceph Object Storage
Splunk is a powerful platform designed to collect, analyze, and visualize machine-generated data. It is an indispensable tool for operational intelligence, security monitoring, and data analytics. It allows organizations to search, monitor, and analyze large volumes of log data in real-time, giving businesses critical insights.
On the storage side, IBM Storage Ceph Object Storage is an open-source, scalable solution for unstructured data. It is designed for enterprise workloads and offers flexibility, high durability, and scalability, which makes it an ideal choice for modern data architectures. IBM Ceph integrates with various applications, including Splunk SmartStore, optimizing data storage, cost efficiency, and retrieval processes.
What is Splunk SmartStore?
Splunk SmartStore is a feature that decouples Splunk's compute and storage layers, allowing external object stores, such as IBM Storage Ceph, to store warm and cold data. Hot data remains cached on local storage for faster access, while IBM Storage Ceph manages the bulk of data effectively and at a lower cost.
IBM Storage Ceph and Splunk SmartStore Integration
IBM Storage Ceph integrates with Splunk SmartStore to optimize data storage for large-scale deployments. Ceph's architecture is designed for massive scalability, which allows it to absorb the growing volume of data generated by Splunk's indexing processes. Combined, Splunk and Ceph provide:
Cost-efficient scalability: Splunk SmartStore's design enables compute and storage resources to scale independently.
High availability and resiliency: IBM Storage Ceph's built-in replication and erasure coding features keep data available and resilient in the face of hardware failures.
Dynamic cache management: SmartStore intelligently manages cached data on local storage for high-performance searches, even when most data is stored on Ceph.
IBM Storage Ceph has been validated with Splunk SmartStore and, as of IBM Storage Ceph 7.1, is listed in the Compatibility Matrix.
SmartStore Advantages for Splunk Deployments
SmartStore offers several advantages to the deployment's indexing tier:
SmartStore offers additional advantages specific to deployments of indexer clusters:
An intelligent cache manager ensures that SmartStore performs similarly to local storage configurations for most search use cases.
Storage Sizing for Splunk SmartStore
Proper sizing of the object store is crucial for successful deployment of Splunk SmartStore. It ensures efficient utilization of compute and storage resources, leading to optimal performance and scalability as data grows. Let's delve deeper into specifications and recommendations for sizing your storage with IBM Storage Ceph.
Sizing storage for Splunk SmartStore becomes a straightforward task when using IBM Storage Ready Nodes with IBM Storage Ceph. IBM Storage Ceph allows for scale-out in single-node increments, where the number of indexers correlates to the number of nodes in the Ceph cluster:
Refer to the Splunk Capacity Planning Manual for more information on how to size a Splunk Enterprise environment.
| Object store to per-Splunk-indexer throughput | Minimum specifications | Performance specifications |
|---|---|---|
| Download throughput | 100 MB/s or higher | 800 MB/s or higher |
| Upload throughput | 30 MB/s or higher | 500 MB/s or higher |
| Network connectivity | 1 Gbps or higher | 10 Gbps or higher |
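As a rough sanity check on the table above, the following sketch converts a link speed to its theoretical ceiling in MB/s and compares it with a required throughput. It assumes decimal units (1 Gbps = 1000 Mbit/s) and ignores protocol overhead, so real-world figures will be lower. The function names `gbps_to_mbs` and `link_covers` are hypothetical helpers for illustration only.

```shell
#!/bin/sh
# Convert a link speed in Gbps to its theoretical ceiling in MB/s
# (decimal units: 1 Gbps = 1000 Mbit/s, 8 bits per byte, no overhead).
gbps_to_mbs() {
    echo $(( $1 * 1000 / 8 ))
}

# Print "yes" if a link of $1 Gbps can theoretically carry $2 MB/s, else "no".
link_covers() {
    if [ "$(gbps_to_mbs "$1")" -ge "$2" ]; then
        echo yes
    else
        echo no
    fi
}

echo "1 Gbps ceiling:  $(gbps_to_mbs 1) MB/s"
echo "10 Gbps ceiling: $(gbps_to_mbs 10) MB/s"
echo "1 Gbps covers 100 MB/s download:  $(link_covers 1 100)"
echo "1 Gbps covers 800 MB/s download:  $(link_covers 1 800)"
echo "10 Gbps covers 800 MB/s download: $(link_covers 10 800)"
```

Under these assumptions, a 1 Gbps link leaves little headroom above the minimum download figure, which is consistent with the table pairing 10 Gbps connectivity with the performance specifications.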
SmartStore Efficiency: Data Reduction and Erasure Coding
Erasure coding
IBM Storage Ceph supports space-efficient erasure coding across failure domains, which requires dramatically less raw storage capacity than storing data with replication on Splunk indexers. As a result, even all-flash IBM Storage Ceph clusters can lead to significant cost savings for Splunk environments.
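To make the capacity difference concrete, here is a small worked example. The numbers are assumptions chosen for illustration, not recommendations from this article: 100 TB of usable data, a 4+2 erasure coding profile (4 data chunks, 2 coding chunks), and three full copies for the replicated case. The helper names are hypothetical.

```shell
#!/bin/sh
# Raw capacity (TB) for erasure coding: usable * (k + m) / k.
# $1 = usable TB, $2 = k (data chunks), $3 = m (coding chunks).
# Integer arithmetic, so results are truncated to whole TB.
ec_raw_tb() {
    echo $(( $1 * ($2 + $3) / $2 ))
}

# Raw capacity (TB) when keeping N full copies: usable * copies.
# $1 = usable TB, $2 = number of copies.
replica_raw_tb() {
    echo $(( $1 * $2 ))
}

USABLE_TB=100   # assumed usable capacity for the example

echo "EC 4+2 raw capacity:        $(ec_raw_tb "$USABLE_TB" 4 2) TB"
echo "3 full copies raw capacity: $(replica_raw_tb "$USABLE_TB" 3) TB"
```

With these assumed values, erasure coding needs 1.5 times the usable capacity in raw storage, compared with 3 times for three full copies.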
Compression
“To reduce S3 usage and improve network performance, SmartStore can compress tsidx files before uploading them to the remote store. This capability uses zstd compression. When the files are downloaded to indexers, SmartStore automatically decompresses them before placing them in the cache.”
Source: Splunk documentation, "Compress tsidx files upon upload to S3".
Additionally, IBM Storage Ceph Object Storage can compress the data that Splunk writes at the Object Gateway level. Since version 7.1, IBM Storage Ceph has supported hardware-accelerated compression via Intel QuickAssist Technology (QAT). QAT offloads compression and decompression tasks from software to hardware, reducing CPU load and improving overall system performance. This feature is useful in high-throughput applications like Splunk, where low latency is crucial.
SmartStore Security: Data Encryption at Rest
In any data-intensive environment, security is paramount. Protecting sensitive information, whether it's in transit or at rest, is essential for meeting regulatory requirements and managing potential risks. Splunk SmartStore offers robust encryption options that, when combined with IBM Storage Ceph, help keep your data secure at rest. Here's an overview of the encryption options available.
Splunk SmartStore supports three types of S3 server-side encryption: SSE-C, SSE-S3, and SSE-KMS. With IBM Storage Ceph, Splunk SmartStore can be configured to use either SSE-S3 or SSE-KMS. In this configuration, Ceph interacts with an external KMS, such as HashiCorp Vault, to securely encrypt objects. For more detailed information about how to configure an external KMS for SSE-S3 or SSE-KMS with IBM Storage Ceph, see S3 server-side encryption - IBM Documentation.
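On the Splunk side, the encryption mode is selected per remote volume in indexes.conf. The fragment below is a minimal sketch, assuming the `[volume:remote_store]` volume used in the configuration example in this article and that the external KMS has already been set up on the Ceph side. The key name `splunk-key` is a hypothetical placeholder; check the indexes.conf specification for your Splunk version before relying on these settings.

```
[volume:remote_store]
# Ask the object store to encrypt objects with SSE-S3
remote.s3.encryption = sse-s3

# Alternative sketch for SSE-KMS (key name is a hypothetical placeholder):
# remote.s3.encryption = sse-kms
# remote.s3.kms.key_id = splunk-key
```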
Configuration of IBM Storage Ceph for Splunk SmartStore
Here is a high-level example of how to configure IBM Storage Ceph with Splunk SmartStore:
IBM Storage Ceph configuration
Create an RGW S3 user for Splunk with the radosgw-admin CLI:
$ radosgw-admin user create --uid splunk --display-name splunk
In this example, we will use the AWS CLI to create the required bucket. Any S3 client or the Ceph Dashboard can be used instead.
Download and install the AWS CLI.
$ dnf install awscli -y
Create a .aws/credentials file:
$ aws configure
AWS Access Key ID: ABC
AWS Secret Access Key: XYZ
Default region name [default]: default
Default output format [json]: json
Splunk SmartStore Configuration
For each index, configure a remote store. Here’s an example to configure a SmartStore remote volume with IBM Storage Ceph:
Create a Bucket for Remote Store:
$ aws s3api create-bucket --endpoint-url https://s3.example.com --bucket splunk
Configure the Remote Store in Splunk:
$ cat ${SPLUNK_HOME}/etc/system/local/indexes.conf
[volume:remote_store]
storageType = remote
path = s3://splunk/remote_store
remote.s3.endpoint = https://s3.example.com
remote.s3.access_key = ABC
remote.s3.secret_key = XYZ
remote.s3.tsidx_compression = true
Configure an Index to Use the Remote Store:
[cs_index]
homePath = $SPLUNK_DB/$_index_name/db
coldPath = $SPLUNK_DB/$_index_name/colddb
remotePath = volume:remote_store/$_index_name
This is the most basic example. For a comprehensive list of indexer configuration options, please refer to the Splunk documentation.
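Because each SmartStore index needs its own stanza, the per-index configuration can be generated rather than typed by hand. The following is a minimal sketch that prints a stanza mirroring the basic example above for each index name. The function name `smartstore_index_stanza` and the index names `web_logs` and `firewall` are hypothetical, and the output is meant to be reviewed before it is added to indexes.conf.

```shell
#!/bin/sh
# Print a SmartStore index stanza for the index name given in $1.
# Single quotes keep $SPLUNK_DB and $_index_name literal, because Splunk
# (not the shell) expands them when it reads indexes.conf.
smartstore_index_stanza() {
    printf '[%s]\n' "$1"
    printf '%s\n' 'homePath = $SPLUNK_DB/$_index_name/db'
    printf '%s\n' 'coldPath = $SPLUNK_DB/$_index_name/colddb'
    printf '%s\n' 'remotePath = volume:remote_store/$_index_name'
}

# Hypothetical index names, separated by blank lines in the output.
for idx in web_logs firewall; do
    smartstore_index_stanza "$idx"
    echo
done
```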
Maximizing Value with IBM Storage Ceph and Splunk SmartStore
IBM Storage Ceph and Splunk SmartStore deliver outstanding scalability, storage efficiency, security, and cost savings. This integration reduces storage costs through erasure coding and compression, improves security with Object Versioning and Server-Side Encryption (SSE), and ensures optimal performance with intelligent cache management.
By combining cost-efficient storage with high-performance indexing and searching capabilities, IBM Storage Ceph and Splunk SmartStore allow enterprises to future-proof their infrastructure, providing the agility needed to keep pace with data growth and the evolving digital landscape.