Log Record Field Types in IBM Cloud Logs Priority Insights

By Sven Lange-Last posted Mon May 26, 2025 11:43 AM

The Priority insights feature in IBM Cloud Logs parses log records into typed fields. By and large, the field types are not declared but dynamically inferred from field names and values when a field is processed for the first time. The advantage of typed fields is that specialized search operations based on the type can be supported.

Unfortunately, typed fields cause mapping exceptions when a log record field has a value that does not match the previously inferred type. Such field mapping exceptions, and how to deal with them, are a complex topic of their own. A separate article explains them in more detail.

This article explains:

  • how Priority insights detects new fields and determines their type.
  • why Priority insights has different field types.
  • what multifields are.
  • which field types ignore malformed values.
  • how arrays, null values, and objects are processed.
  • how nested objects and dots (.) in field names work and what the resulting problems are.
  • how the daily index works.
  • what field mapping exceptions are.

Another article provides a set of log records with all field types introduced in this article, along with instructions on how to send them to your IBM Cloud Logs service instance and how to query the current field mapping.

The Priority insights feature is based on the OpenSearch Project. If you already know OpenSearch, the concepts will sound familiar to you.

What is important to know: The Store and search feature in IBM Cloud Logs works differently than Priority insights. Store and search uses dynamic typing of fields and converts field value types as needed. The Store and search feature requires that you connect an IBM Cloud Object Storage (COS) bucket to your IBM Cloud Logs service instance.

Parsing log records into fields

Log records sent to IBM Cloud Logs can be text in many different formats: JSON, logfmt, Extended Log File Format, Syslog, Common Log Format, HAProxy log formats, ... or any text that does not follow a known format.

If a log record is in JSON format, its JSON members are parsed into named fields together with their contents. Example: { "key": "value" } is parsed into a field named key with content value.

If a log record is NOT in JSON format, or if JSON is mixed with non-JSON data like a timestamp at the beginning of the log record, parsing rules in IBM Cloud Logs can transform the log record into JSON format so that it can be parsed. Parsing rules can also extract portions of the log record into named fields.
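
As an illustration, here is a minimal sketch of what such an extraction could do, assuming a hypothetical log line format. This is plain Python, not the actual parsing-rule engine in IBM Cloud Logs:

```python
# A minimal sketch, NOT the actual IBM Cloud Logs parsing-rule engine:
# turn a plain-text log line into JSON so it can be parsed into named fields.
import json
import re

# Hypothetical log line format: "<timestamp> <level> <message>"
line = "2024-09-17T16:39:30 INFO request served"

match = re.match(r"(?P<timestamp>\S+) (?P<level>\w+) (?P<message>.*)", line)
record = match.groupdict() if match else {"message": line}

print(json.dumps(record))
# {"timestamp": "2024-09-17T16:39:30", "level": "INFO", "message": "request served"}
```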

The result of applying parsing rules and parsing: the log record is indexed (stored) in Priority insights as a set of fields and their contents.

(For completeness: When a log record is stored in Priority insights, it is also stored in Store and search if a COS bucket is connected, i.e. it is stored in two places in parallel.)

What happens when a new field is detected?

Priority insights keeps track of all known fields in its index. When a log record is processed, Priority insights checks whether the field names are already known. For an unknown field, a field mapping is added to the index that determines the field type plus other instructions on how to index the field.

After unknown fields have been added to the index, field values from the log record are indexed for fast field-based and full-text search. In addition, the full log record is stored.

How is the field type inferred?

The field mapping depends on the field value and the field name. You may wonder why it depends on the field name. IBM Cloud Logs allows users to select some special field types based on the field name suffix because the field value would not be sufficient to safely detect the desired type. Example: If a field name ends with _ipaddr, its type is IP address. Without this name suffix, IP address values are classified as type text.

The table below shows the different field mappings in Priority insights at the time of writing this article. When a new field is detected, Priority insights evaluates field mapping matching conditions from top to bottom. The first field mapping that matches the new field is added to the index.

| Rank | Field value has type | Field name ends with | Resulting field type | Multifields | Example values |
|------|----------------------|----------------------|----------------------|-------------|----------------|
| 1 | Any | _geopoint | geo_point | keyword with type keyword | "location_geopoint": { "lat": 48.666234, "lon": 9.0366981 } |
| 2 | object | Any | object | None | "new": { "key": "value" }; "new": { } |
| 3 | boolean | Any | boolean | keyword with type keyword | "new": true; "new": false |
| 4 | Any | _ipaddr | ip | keyword with type keyword | "server_ipaddr": "203.0.113.42" |
| 5 | Any | _custom_timestamp | date | numeric with type scaled_float; keyword with type keyword | "start_custom_timestamp": "2024-09-17T16:39:30"; "start_custom_timestamp": "1726591170" |
| 6 | Any | Any | text | numeric with type scaled_float; keyword with type keyword | "new": "This is a text."; "new": "true"; "new": 42; "new": "48.666234,9.0366981"; "new": "203.0.113.42"; "new": "2024-09-17T16:39:30" |

What are multifields?

Multifields are used to index the same field in several different ways. Example: Field "json_number_integer": 42 is stored as ...

  • text field named json_number_integer.
    It can be queried with Lucene query json_number_integer:"42" and DataPrime query source logs | filter $d.json_number_integer ~ '42'.
  • scaled_float field named json_number_integer.numeric.
    It can be queried with Lucene query json_number_integer.numeric:[42 TO 42]. DataPrime does not fully recognize the json_number_integer.numeric field. For this reason, DataPrime queries use type conversion instead of the subfield: source logs | filter $d.json_number_integer:number == 42.
  • keyword field named json_number_integer.keyword.
    It can be queried with Lucene query json_number_integer.keyword:"42" and DataPrime query source logs | filter $d.json_number_integer.keyword ~ '42'.
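
Under the covers, a multifield corresponds to an OpenSearch mapping with a fields section. The following is a sketch of what the mapping for json_number_integer could look like; the exact mapping used by Priority insights, including the scaling_factor value, is an assumption here:

```python
# A sketch of an OpenSearch-style multifield mapping for json_number_integer.
# Assumption: Priority insights does not publish its exact mapping; a
# scaled_float field always requires a scaling_factor, whose value is assumed.
mapping = {
    "json_number_integer": {
        "type": "text",
        "fields": {
            "numeric": {"type": "scaled_float", "scaling_factor": 100},
            "keyword": {"type": "keyword"},
        },
    }
}
print(mapping)
```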

Why different types?

Different field types in Priority insights support different Lucene query types. The following table shows which Lucene query types work well for which field types.

| Field type | Lucene query type | Example field | Matching example query |
|------------|-------------------|---------------|------------------------|
| ip | CIDR query | "json_ipaddr": "203.0.113.42" | json_ipaddr:"203.0.113.0/24" |
| date | Range query | "json_custom_timestamp": "1726591170" | json_custom_timestamp:[1726591170\|\|-1m TO 1726591170\|\|+1h] |
| text | Term query | "json_string": "This is a text." | json_string:(a text this is) |
| text | Fuzzy term query | "json_string": "This is a text." | json_string:ex~2 |
| text | Wildcard query | "json_string": "This is a text." | json_string:(te* th??) |
| text | Range query | "json_string": "This is a text." | json_string:(>=tha <=thz) |
| keyword | Phrase query | "json_string": "This is a text." | json_string.keyword:"This is a text." |
| keyword | Regular expression query | "json_string": "This is a text." | json_string.keyword:/This [si]+ a.*/ |
| keyword | Range query | "json_string": "This is a text." | json_string.keyword:["This in" TO "This will"] |
| scaled_float | Term query | "json_number_integer": 42 | json_number_integer.numeric:42 |
| scaled_float | Range query | "json_number_integer": 42 | json_number_integer.numeric:[42 TO *] |

The main difference between text and keyword fields in Priority insights:

  • text field values are analyzed and stored as tokens.
    When searching text fields, you can search for tokens in any order.
    Example: Lucene query json_string:(a text this is) matches a log record that contains "json_string": "This is a text.".
  • keyword field values are not analyzed; they are only normalized to lower case, with leading and trailing whitespace removed.
    When searching keyword fields, you can only search for exact matches.
    Examples: Lucene queries json_string.keyword:"this is a text." and json_string.keyword:(this IS a TEXT\.) (the . needs to be escaped) match a log record that contains "json_string": "This is a text.". Lucene queries json_string.keyword:"this is a text" (. missing at the end) and json_string.keyword:(a text\. this is) do not match the log record.
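
The difference can be approximated in plain Python as shown below. OpenSearch's actual analyzers are more sophisticated, so this is only a rough model:

```python
# A rough approximation of text analysis vs. keyword normalization.
import re

value = "This is a text."

# text: analyzed into lower-cased tokens; tokens match in any order
tokens = re.findall(r"\w+", value.lower())
print(tokens)   # ['this', 'is', 'a', 'text']

# keyword: the whole value is normalized (lower case, trimmed); exact match only
keyword = value.strip().lower()
print(keyword)  # this is a text.
```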

Ignoring malformed values

The field mappings for the types geo_point, ip, and date ignore malformed values. Example: The value This is a text. is not a valid geo_point, ip, or date value, i.e. it is malformed for those types. When a log record contains a geo_point, ip, or date field with a malformed value, that field value won't be indexed in Priority insights. Other fields and their values from the same log record will be indexed, and the whole log record, including the malformed field value, will be stored. Field values that are not indexed cannot be found in any search.

Example: A log record containing "server_ipaddr": "This is a text." won't be returned by Lucene query server_ipaddr:"This is a text." or DataPrime query source logs | filter $d.server_ipaddr == 'This is a text.' in Priority insights. (With Store and search, i.e. when selecting All logs on the Logs page, the queries will return the log record.)

The following queries return all log records that contain ignored fields:

  • Lucene: _exists_:_ignored
  • DataPrime: source logs | filter $d._ignored != null

See Elasticsearch documentation for details: ignore_malformed and The antidote for index mapping exceptions: ignore_malformed.

What about arrays?

In Priority insights, a log record field with a JSON array value is stored as a field with multiple values. What is important: all values in the array must have the same type. Example: Field "json_number_integer": [23, 42] is stored as a text field with two values, 23 and 42.
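
Because such a field is multi-valued, a query matches the log record if any of the values matches. For example, both Lucene queries json_number_integer:"23" and json_number_integer:"42" should return this log record.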

There is no explicit array type. When a new field is detected, a field mapping is added to the index based on the value, and it makes no difference whether the field has a single value or multiple values in a JSON array. Example: Both "json_number_integer": 42 and "json_number_integer": [23, 42] create exactly the same field mapping.

What about null values?

JSON has the null value, which represents an empty value. When Priority insights detects a new field with value null, an empty array ([]), or an array that only contains null values ([ null, ... ]), it does not add a field mapping to the index, i.e. the field stays unknown.
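
For example, a log record containing only "json_null": null (a hypothetical field name) creates no field mapping for json_null. If a later log record contains "json_null": "This is a text.", the field type would then be inferred from that value as usual.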

It is not possible to search for null values with Lucene queries.

See OpenSearch documentation: Null value.

What about nested objects?

In Priority insights, a log record field with a JSON object value is stored in the index as an object field. Similar to a JSON object, an object field contains an unordered set of sub-fields with values. Each JSON member in a JSON object creates a sub-field based on the member's value. In queries, a sub-field is referenced by using a path expression with dots (.).

Example: When a log record contains "json_object": { "json_boolean": true, "json_string": "This is a text." }, Priority insights will create three fields:

  1. An object field named json_object.
  2. A boolean field named json_boolean as sub-field of json_object. In queries, this field can be referenced as json_object.json_boolean.
  3. A text field named json_string as sub-field of json_object. In queries, this field can be referenced as json_object.json_string.

In the example, field json_object has two sub-fields. Future log records with the json_object field need NOT contain these two sub-fields. Future log records may contain none, either, or both of these sub-fields as well as different sub-fields.

In Priority insights, sub-fields of an object field can also have type object. As a result, object fields can be nested to (almost) any depth. Path expressions with dots (.) are used to reference nested fields in queries. Example: "an": {"object": {"nested": {"in": {"an": "object"} } } }. In this example, the innermost field named an is a text field and can be referenced as an.object.nested.in.an. All other fields are object fields.
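
For example, the Lucene query an.object.nested.in.an:"object" and the DataPrime query source logs | filter $d.an.object.nested.in.an ~ 'object' should both match this log record.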

What about dots in field names?

As mentioned above, Priority insights supports nested object fields and path expressions with dots (.) are used to reference nested fields in queries. But what if log records contain fields with dots? Example: "an.object.nested.in.an": "object".

Imagine Priority insights received a log record with the following content, i.e. the same nested fields in a different notation (spoiler: it wouldn't work): { "an.object.nested.in.an": "object", "an": {"object": {"nested": {"in": {"an": "object"} } } } }. In queries, both fields would be referenced as an.object.nested.in.an. This is obviously a problem.

In order to avoid this problem, Priority insights interprets field names containing dots in received log records as path expressions. Such field names are expanded into nested objects when ingesting log records. Example: "an.object.nested.in.an": "object" is interpreted as "an": {"object": {"nested": {"in": {"an": "object"} } } }. With this approach, path expressions in field names have the same meaning when ingesting and when querying.
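
The expansion itself is mechanical. Here is a minimal sketch in Python, as an illustration rather than the actual ingestion code:

```python
# A minimal sketch: expand dotted field names into nested objects, mirroring
# how Priority insights interprets dots in field names as path expressions.
def expand_dots(record):
    expanded = {}
    for name, value in record.items():
        parts = name.split(".")
        node = expanded
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return expanded

print(expand_dots({"an.object.nested.in.an": "object"}))
# {'an': {'object': {'nested': {'in': {'an': 'object'}}}}}
```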

See Allowing dots in field names #15951 for details.

(For completeness: The Store and search feature in IBM Cloud Logs works differently than Priority insights. It will NOT expand field names containing dots as nested objects.)

Challenges with dots in field names

Log records may contain field names with dots that are not path expressions describing nested objects. Examples: Kubernetes Recommended Labels and Kubernetes Well-Known Labels, Annotations and Taints contain dots (.). Widely used labels are app.kubernetes.io/name, app.kubernetes.io/version, and kubernetes.io/os. Log collectors may add these labels and their values to log records when tailing container logs in a Kubernetes cluster.

When Priority insights processes a log record with field "app.kubernetes.io/name": "test", it is added to the index as "app": { "kubernetes": { "io/name": "test" } }, i.e. object fields app and kubernetes as well as a text field io/name. Most users would instead expect a single text field named app.kubernetes.io/name.

Daily indices

Priority insights processes log records and stores them in the index as a set of fields and their contents. All fields have a field mapping in the index that determines the field type plus other instructions on how to index the field.

Each IBM Cloud Logs service instance has a separate set of Priority insights indices. The first index in the set is created when the service instance is created. Once a day, a new index is created and added to the set. At the same time, the oldest index is removed if it has reached its expiration date.

A new daily index only contains field mappings for known label (examples: Application and Subsystem) and metadata (examples: Timestamp and Severity) fields. Every day, all other fields in the index are learned anew from the processed log records. As a result, field mappings in daily indices can differ if the processed log records differ.

Example:

  • On day 1, log record { "a": "text", "b": {} } is processed. New fields named a and b are detected. Field a is added to the index as text field. Field b is added to the index as object field.
  • On day 2, a new index is started and log record { "a": "text", "b": "text" } is processed. Fields a and b do not exist yet in the new daily index. They are detected as new fields. Fields a and b are added to the index as text fields.

In this example, field b was an object field on day 1 and a text field on day 2.

Queries in Priority insights work on the full set of indices that belong to the service instance - not only the current daily index.

The advantage of the daily index lifecycle: Priority insights does not keep inferred field types forever. If you change field types in your log records, the system will learn the new type with the next daily index.

What are mapping exceptions?

As mentioned before, Priority insights parses log records into named fields. All these fields have a field mapping in the index that determines the field type plus other instructions on how to index the field.

When Priority insights processes a log record, some of the fields in the log record will already have a field mapping in the current daily index and others may be unknown. For known fields, the existing field mapping defines the field type. It can happen that the value of a field in a new log record is not an allowed value for the type defined by the field mapping.

Example:

  • The current daily index contains a field named json_object with type object.
  • Priority insights processes a new log record with field "json_object": "This is a text.".
  • Priority insights determines that the provided value "This is a text." is not an allowed value for an object field.
  • Priority insights cannot index the log record with the given json_object field and value.

There are other conflicts that lead to field mapping exceptions.

When a field mapping exception occurs while ingesting a log record, Priority insights will not process and index the log record as usual. Instead, it will transform the whole log record into a single field of type text and index it. As a result, full-text search will still work for this log record - but field searches will not.

Field mapping exceptions and how to deal with them is a complex topic of its own. Check article Mapping Exceptions in IBM Cloud Logs Priority Insights.
