Monitoring Spectrum Scale Object metrics

By Archive User posted Thu September 14, 2017 10:54 AM

  
As you may know, the "mmperfmon" command (with the query option) in IBM Spectrum Scale is used to view metrics for GPFS and the protocols. In this post, I will share how to monitor object protocol (account, container, object) metrics for GET/PUT/POST/DELETE requests using the mmperfmon command.

To view object metrics, complete the following steps.
(Note: These steps are needed only when object is installed manually; they need not be performed when object is installed via the installation toolkit.)

1. Enable the object protocol (an obvious check, but added for completeness :-))
(For more details on installing and configuring object, please refer to https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_quickrefobjectstorage.htm)

2. Configure the performance monitoring tool
(For more details on terminology and configuration, please refer to https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1adv_configuringthePMT.htm)

3. Install the pmswift rpm
(This rpm should be available at /usr/lpp/mmfs/4.x.x.x/zimon_rpms/rhel7/pmswift-*.noarch.rpm)

4. Add the object sensor configuration (SwiftAccount, SwiftContainer, SwiftProxy, SwiftObject) to ZIMon.
This can be done with the following command:
# /usr/local/pmswift/bin/pmswift-config-zimon set


5. Add the "statsD" configuration to the object configuration files (account-server.conf, container-server.conf, container-reconciler.conf, container-sync-realms.conf, object-expirer.conf, object-server.conf, object-server-sof.conf, proxy-server.conf).
This can be done with the following command:
# /usr/local/pmswift/bin/pmswift-config-swift set

(After this step, the following entries should appear in the configuration files listed above.)

log_statsd_host = localhost
log_statsd_port = 8125
log_statsd_default_sample_rate = 1.0
log_statsd_sample_rate_factor = 1.0
log_statsd_metric_prefix =

To learn more about the statsd parameters, refer to:
http://docs.openstack.org/developer/swift/admin_guide.html#reporting-metrics-to-statsd
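
As a quick sanity check after step 5, a small script can report which config files actually contain a log_statsd_host entry. This is only a sketch: the check_statsd function name, the default directory /etc/swift, and the four file names checked are assumptions chosen for illustration, so adjust them to match your installation.

```shell
#!/bin/bash
# Report which Swift config files in a directory contain a log_statsd_host entry.
# The default directory /etc/swift is an assumption; pass another directory as $1.
check_statsd() {
    local conf_dir="${1:-/etc/swift}"
    local f
    for f in account-server.conf container-server.conf object-server.conf proxy-server.conf; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "$f: not found"
        elif grep -Eq '^[[:space:]]*log_statsd_host[[:space:]]*=' "$conf_dir/$f"; then
            echo "$f: statsd configured"
        else
            echo "$f: statsd missing"
        fi
    done
}

check_statsd "$@"
```

Any file reported as "statsd missing" is a hint that the pmswift-config-swift step may need to be re-run or checked.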

6. Start the pmswift service
# systemctl start pmswiftd.service


Run object operations (such as container create/delete and object create/delete) for a specific interval, making sure to record the start and end times, and then execute:

# mmperfmon query <query name> <start time> <end time>

Example:

# mmperfmon query objObj 2016-09-28-09:56:39 2016-09-28-09:56:43

1: cluster1.ibm.com|SwiftObject|object_auditor_time
2: cluster1.ibm.com|SwiftObject|object_expirer_time
3: cluster1.ibm.com|SwiftObject|object_replication_partition_delete_time
4: cluster1.ibm.com|SwiftObject|object_replication_partition_update_time
5: cluster1.ibm.com|SwiftObject|object_DEL_time
6: cluster1.ibm.com|SwiftObject|object_DEL_err_time
7: cluster1.ibm.com|SwiftObject|object_GET_time
8: cluster1.ibm.com|SwiftObject|object_GET_err_time
9: cluster1.ibm.com|SwiftObject|object_HEAD_time
10: cluster1.ibm.com|SwiftObject|object_HEAD_err_time
11: cluster1.ibm.com|SwiftObject|object_POST_time
12: cluster1.ibm.com|SwiftObject|object_POST_err_time
13: cluster1.ibm.com|SwiftObject|object_PUT_time
14: cluster1.ibm.com|SwiftObject|object_PUT_err_time
15: cluster1.ibm.com|SwiftObject|object_REPLICATE_time
16: cluster1.ibm.com|SwiftObject|object_REPLICATE_err_time
17: cluster1.ibm.com|SwiftObject|object_updater_time
Row object_auditor_time object_expirer_time object_replication_partition_delete_time
object_replication_partition_update_time object_DEL_time object_DEL_err_time
object_GET_time object_GET_err_time object_HEAD_time object_HEAD_err_time object_POST_time
object_POST_err_time object_PUT_time object_PUT_err_time object_REPLICATE_time
object_REPLICATE_err_time object_updater_time
1 2016-09-28 09:56:39 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.855923 0.000000 0.000000 0.000000 45.337915 0.000000 0.000000 0.000000 0.000000
2 2016-09-28 09:56:40 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
3 2016-09-28 09:56:41 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.931925 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
4 2016-09-28 09:56:42 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.855923 0.000000 0.000000 0.000000 516.280890 0.000000 0.000000 0.000000 0.000000
object_DEL_total_time = 0.0                      object_PUT_total_time = 561.618805
object_GET_total_time = 0.0                      object_POST_total_time = 0.0
object_HEAD_total_time = 1.786948      object_PUT_max_time = 516.28089
object_POST_max_time = 0.0                    object_GET_max_time = 0.0
object_HEAD_max_time = 0.931025       object_DEL_max_time = 0.0
object_GET_avg_time = 0.0                         object_DEL_avg_time = 0.0
object_PUT_avg_time = 280.809402       object_POST_avg_time = 0.0
object_HEAD_avg_time = 0.893474         object_DEL_time_count = 0.0
object_POST_time_count = 0                      object_PUT_time_count = 2
object_HEAD_time_count = 2                     object_GET_time_count = 0
object_DEL_min_time = 0.0                       object_PUT_min_time = 45.337915
object_GET_min_time = 0.0                       object_POST_min_time = 0.0
object_HEAD_min_time = 0.855923
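
Recording the start and end times by hand is easy to get wrong, so the interval can be captured in a small script instead. The sketch below assumes GNU date and uses two hypothetical helper names, fmt_ts and build_query. It only prints the query command in the timestamp form used in the example above rather than running it, and it assumes the node's local time zone is the one mmperfmon expects.

```shell
#!/bin/bash
# Format an epoch time (seconds) as YYYY-MM-DD-HH:MM:SS, the timestamp form
# used in the example query. Relies on GNU date (-d @EPOCH).
fmt_ts() {
    date -d "@$1" '+%Y-%m-%d-%H:%M:%S'
}

# Build the query command line for a query name and an epoch start/end pair.
build_query() {
    echo "mmperfmon query $1 $(fmt_ts "$2") $(fmt_ts "$3")"
}

start=$(date +%s)
# ... run the object workload here (container/object create, delete, and so on) ...
end=$(date +%s)

# Print the command rather than running it, so it can be reviewed first.
build_query objObj "$start" "$end"
```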


The "mmperfmon" command supports various object queries, and each query fetches the related object statsd metrics.

You can find the mapping between the various mmperfmon queries and object metrics here.

In the example above, the "objObj" query fetched the statsd metrics for the object component only (it does not display container or account statistics): object_auditor_time, object_expirer_time, object_replication_partition_delete_time, object_replication_partition_update_time, object_DEL_time, object_DEL_err_time, object_GET_time, object_GET_err_time, object_HEAD_time, object_HEAD_err_time, object_POST_time, object_POST_err_time, object_PUT_time, object_PUT_err_time, object_REPLICATE_time, object_REPLICATE_err_time, object_updater_time.

For each metric in the query, count, total_time, max_time, min_time, and avg_time are calculated.
"[metric]_time_count" represents the number of requests that occurred during the specified query interval, and "[metric]_max_time" represents the maximum time taken by a single CRUD (GET/PUT/DELETE/POST) operation.
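
When only one of these calculated values is of interest, it can be picked out of the summary section of the query output. The following is a minimal sketch, assuming the summary keeps the "name = value" layout shown in the example output; get_metric is a hypothetical helper name, not part of mmperfmon.

```shell
#!/bin/bash
# Print the value of one "name = value" pair from mmperfmon summary text on stdin.
# Summary lines can hold several pairs, so scan every field on every line.
get_metric() {
    awk -v name="$1" '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == name && $(i + 1) == "=") { print $(i + 2); exit }
    }'
}

# Example with one summary line copied from the objObj output; prints 516.28089
get_metric object_PUT_max_time <<'EOF'
object_HEAD_total_time = 1.786948      object_PUT_max_time = 516.28089
EOF
```

In practice the function would be fed from a pipe, for example by piping the output of the mmperfmon query into get_metric object_PUT_avg_time.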

To make this concrete, in the objObj example output:
"object_PUT_time_count = 2" means that a total of 2 object PUT requests occurred during the specified interval (2016-09-28-09:56:39 to 2016-09-28-09:56:43)
"object_PUT_total_time = 561.618805" is the total time spent serving those 2 object PUT requests during the specified interval
"object_PUT_max_time = 516.28089" is the maximum time spent serving a single object PUT request during the specified interval
"object_PUT_min_time = 45.337915" is the minimum time spent serving a single object PUT request during the specified interval
"object_PUT_avg_time = 280.809402" is the average time spent serving the 2 object PUT requests during the specified interval
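
The PUT figures can be cross-checked by recomputing them from the per-row values. The sketch below assumes that the two non-zero entries in the object_PUT_time column (45.337915 and 516.280890) are the individual request timings, which agrees with the reported count of 2. The summarize function is a hypothetical helper written for this check.

```shell
#!/bin/bash
# Recompute count, total, max, min and average from a list of per-request times
# given as whitespace-separated numbers on stdin.
summarize() {
    awk 'BEGIN { n = 0 }
    {
        for (i = 1; i <= NF; i++) {
            v = $i + 0
            if (n == 0 || v > max) max = v
            if (n == 0 || v < min) min = v
            total += v
            n++
        }
    }
    END {
        if (n == 0) { print "count=0"; exit }
        printf "count=%d total=%.6f max=%.6f min=%.6f avg=%.3f\n", n, total, max, min, total / n
    }'
}

# The two non-zero object_PUT_time samples from the example output.
# Prints: count=2 total=561.618805 max=516.280890 min=45.337915 avg=280.809
echo "45.337915 516.280890" | summarize
```

The recomputed total, maximum, and minimum match the summary values reported by mmperfmon, and the average agrees to the three decimal places printed here.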

(Views presented here are my own and not of my employer’s)