IBM Spectrum Computing Group

[LSF Explorer] Merging data from different sources into one index via Logstash


    Posted Tue December 29, 2020 01:38 AM

    Logstash is an open source data collection engine with real-time pipelining capabilities. Explorer and LSF Suite use it to collect monitoring and accounting data into Elasticsearch. This document demonstrates how to update content in existing indices with Logstash, so that data from different sources can be merged into one index for later, more complex use.

    Refer to https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id for more details.

    Update by partial document

    An existing document in an index can be updated by passing a partial document through Logstash with the same document id. The partial document is merged into the existing document (a simple recursive merge: inner objects are merged, while core key/value pairs and arrays are replaced).
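
    The merge rule described above can be illustrated with a small Python sketch. This is only a model of the behaviour, not Elasticsearch code; the function name merge_partial and the sample fields (job_id, usage, tags) are hypothetical and chosen for illustration.

```python
from copy import deepcopy


def merge_partial(existing: dict, partial: dict) -> dict:
    """Return a copy of `existing` with `partial` merged in recursively."""
    merged = deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Inner objects are merged key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Core values and arrays are replaced as a whole.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical document already stored in the index.
existing = {
    "job_id": 101,
    "user": "alice",
    "usage": {"cpu": 4, "mem_mb": 2048},
    "tags": ["batch"],
}
# Hypothetical partial document arriving from another source.
partial = {"usage": {"mem_mb": 4096}, "tags": ["gpu"], "new_field": "value"}

result = merge_partial(existing, partial)
print(result)
```

    In this sketch "usage.cpu" survives because objects are merged, while "tags" is overwritten because arrays are replaced rather than appended.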

    The general form of the Elasticsearch output plugin configuration for Logstash is as follows.

    elasticsearch {
      hosts => ["your_elasticsearch_url"]
      index => "your_existing_index"
      document_id => "target_document_id"
      action => "update"
      doc_as_upsert => true
      upsert => "partial_content_json_string"
    }

    Notes:

    • "index" should be set to the index whose documents will be updated.
    • "document_id" should be set to the id of the document to update. This id is typically composed of several key fields. For example, if the ids of the documents in the existing index have the form {field1}_{field2}_{field3}, "document_id" should be "%{field1}_%{field2}_%{field3}". These 3 fields can be regarded as the primary key of the index: their combination is unique within the index.
    • "action" should be set to "update" to indicate the update operation.
    • "doc_as_upsert" should be set to true so that a new document is created from the event if the document_id doesn't exist in Elasticsearch.
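
    To make the composition of "document_id" concrete, here is a minimal Python sketch of how a "%{field}" template could be expanded from event fields. It is an illustration only and not Logstash's implementation; render_document_id and the sample values are hypothetical. In this sketch a reference that matches no field is left unchanged, so a stray space inside the braces prevents the lookup.

```python
import re


def render_document_id(template: str, event: dict) -> str:
    """Expand %{name} references in `template` using the fields of `event`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are left as-is so the problem stays visible in the id.
        return str(event[name]) if name in event else match.group(0)

    return re.sub(r"%\{([^}]*)\}", substitute, template)


# Hypothetical event after the filter stage.
event = {"field1": "cluster1", "field2": 4711, "field3": "2020-12-29"}

doc_id = render_document_id("%{field1}_%{field2}_%{field3}", event)
print(doc_id)

# A space inside the braces no longer matches the field name in this sketch.
bad_id = render_document_id("%{field1}_%{ field2}", event)
print(bad_id)
```

    Two events with the same values in the key fields produce the same id, which is what lets the later one update the document created by the earlier one.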

    Example: Merge the "new_field" field into a document whose id is composed of field1 and field2, assuming that the event has 3 fields (field1, field2 and new_field) after the filter stage.

    output {
      elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "test_idx"
        document_id => "%{field1}_%{field2}"
        action => "update"
        doc_as_upsert => true
      }
    }
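
    The effect of this output on two data sources can be modelled in a few lines of Python. This is a sketch under stated assumptions, not how Elasticsearch works internally: the index is a plain dict, update_with_doc_as_upsert is a hypothetical helper, and the events are flat, so a shallow update is enough to show the merge.

```python
def update_with_doc_as_upsert(index: dict, doc_id: str, partial: dict) -> dict:
    """Model of action => "update" combined with doc_as_upsert => true."""
    if doc_id not in index:
        # No document with this id yet: the partial document becomes the document.
        index[doc_id] = dict(partial)
    else:
        # Document exists: merge the new fields into it (flat fields only here).
        index[doc_id].update(partial)
    return index[doc_id]


index = {}

# Hypothetical events from two different sources that share the key fields.
source_a = {"field1": "a", "field2": "b", "status": "DONE"}
source_b = {"field1": "a", "field2": "b", "new_field": 42}

for event in (source_a, source_b):
    doc_id = f"{event['field1']}_{event['field2']}"
    update_with_doc_as_upsert(index, doc_id, event)

print(index)
```

    Both events end up in the single document "a_b", regardless of which source arrives first.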

    Update by script

    An existing document in an index can also be updated by a script passed through Logstash with the same document id. The operation gets the document from the index (on the shard where it is stored), runs the script (with an optional script language and parameters), and indexes the result back. The script can also delete the document or ignore the operation.
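
    The get, run, index-back cycle can be sketched in Python as follows. This is a simplified model rather than Elasticsearch code: the index is a plain dict, the script is an ordinary Python function, and scripted_update, add_new_field and the "op" values used here ("index", "none", "delete") are assumptions made for illustration.

```python
def scripted_update(index, doc_id, script, params, scripted_upsert=True):
    """Model of a scripted update: fetch the document, run the script, store the result."""
    if doc_id not in index and not scripted_upsert:
        raise KeyError(f"document {doc_id!r} does not exist")

    # The script works on a copy of the source (empty when the document is new).
    ctx = {"_source": dict(index.get(doc_id, {})), "op": "index"}
    script(ctx, params)

    if ctx["op"] == "delete":
        index.pop(doc_id, None)       # script asked to delete the document
    elif ctx["op"] == "none":
        pass                          # script asked to ignore the operation
    else:
        index[doc_id] = ctx["_source"]  # index the modified source back


def add_new_field(ctx, params):
    """Python counterpart of: ctx._source.new_field = params.event.get('new_field');"""
    ctx["_source"]["new_field"] = params["event"].get("new_field")


# Hypothetical index content and incoming events.
index = {"a_b": {"status": "DONE"}}
scripted_update(index, "a_b", add_new_field, {"event": {"new_field": 42}})
scripted_update(index, "c_d", add_new_field, {"event": {"new_field": 7}})
print(index)
```

    The second call shows the upsert case: no document "c_d" exists, so the script runs against an empty source and the result is stored as a new document.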

    The general form of the Elasticsearch output plugin configuration for Logstash is as follows.

    elasticsearch {
      hosts => ["your_elasticsearch_url"]
      index => "your_existing_index"
      document_id => "target_document_id"
      action => "update"
      script_lang => "painless"
      script_type => "inline"
      scripted_upsert => true
      script => "update_script"
    }

    Notes:

    • "index" should be set to the index whose documents will be updated.
    • "document_id" should be set to the id of the document to update. This id is typically composed of several key fields. For example, if the ids of the documents in the existing index have the form {field1}_{field2}_{field3}, "document_id" should be "%{field1}_%{field2}_%{field3}". These 3 fields can be regarded as the primary key of the index: their combination is unique within the index.
    • "action" should be set to "update" to indicate the update operation.
    • "script_lang" should be set to "painless" to use Painless as the script language.
    • "script_type" can be set to "inline" to pass the script inline. It can also be set to "indexed" or "file" to reference the script in another way.
    • "scripted_upsert" should be set to true so that a new document is created if the document_id doesn't exist in Elasticsearch.
    • "script" should be set to the script that updates the document.

    Example: Merge the "new_field" field into a document whose id is composed of field1 and field2, assuming that the event has 3 fields (field1, field2 and new_field) after the filter stage.

    output {
      elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "test_idx"
        document_id => "%{field1}_%{field2}"
        action => "update"
        script_lang => "painless"
        script_type => "inline"
        scripted_upsert => true
        script => "ctx._source.new_field = params.event.get('new_field');"
      }
    }


    ------------------------------
    Edward Deng
    ------------------------------