Cloud Pak for Data Group

Optional Data Types in the SPL Programming Language 

Thu September 10, 2020 11:01 AM

Streams 4.3 introduced a new type to the SPL programming language to help Streams applications interoperate with external data sources such as databases and JSON data. Previously, there was no straightforward way for SPL developers to handle tuple attributes that had no value. For example, handling a NULL or missing attribute from an external data source required either reserving a special value (say, -1 for a numeric attribute) or pairing the attribute with a separate boolean attribute that indicated whether the real attribute was set. Streams 4.3 resolves this problem.

This new type is called the optional type, and it wraps another type, which we will refer to as T. An optional<T> may or may not hold a value of type T; T is called the underlying type of the optional. If the optional holds no value of type T, the optional is said to be null.

This article will cover how to use this type as well as how it integrates with other toolkits.

Using the Optional Type

Attributes with optional type can be included in type definitions, e.g.:
type Person = int32 id, rstring name, optional<rstring> emailAddress;

An optional<T> can be initialized with, or assigned, a value of type T:

myOptionalInt32 = 42;

They can also be initialized explicitly using the new null literal:

mutable optional<int32> myOptionalInt32 = null;
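For readers coming from C++, the declarations above map loosely onto C++17's std::optional. The sketch below is only an analogy under that assumption (std::string standing in for rstring, std::int32_t for int32, std::nullopt for null); it is not how the SPL compiler or runtime represents optionals.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Loose analogue of: type Person = int32 id, rstring name, optional<rstring> emailAddress;
struct Person {
    std::int32_t id = 0;
    std::string name;                         // stands in for rstring
    std::optional<std::string> emailAddress;  // empty ("null") unless assigned
};

// Analogue of initializing with null and then assigning a plain value.
std::optional<std::int32_t> makeOptionalInt32() {
    std::optional<std::int32_t> myOptionalInt32 = std::nullopt;  // like "= null"
    myOptionalInt32 = 42;  // assigning a T wraps it; the optional now holds 42
    return myOptionalInt32;
}
```

As in SPL, a default-constructed std::optional is empty, so a Person whose emailAddress is never set has no email value.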

The null Literal

The null literal indicates that a value of optional type is not “present”, and it is the default value of optionals that are not explicitly initialized.

null in SPL has no numeric value; it is not like a C++ pointer whose value happens to be zero. It cannot be compared to any numeric value, and you cannot do numeric comparisons or operations on it (less than, greater than, XOR, etc.). null can, however, be checked for equality with other expressions of optional type, or assigned to optionals. E.g.:

if (var == null) { /* do something */ }

Accessing the Value of an Optional Attribute

By definition, an attribute with an optional type may not be “present”, that is, it may not have a value. The isPresent operator, represented in SPL as postfix ??, is used to determine null-ness, e.g.,

if (var??) { /* do something */ }

is equivalent to:

if (var != null) { /* do something */ }
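The same check can be sketched with C++17's std::optional, assuming has_value() as a stand-in for postfix ?? and std::nullopt as a stand-in for null. One difference worth noting: C++ also defines ordering comparisons such as < between an optional and a value or std::nullopt, which SPL deliberately does not allow for null.

```cpp
#include <optional>

// Stand-in for SPL's postfix ?? (isPresent).
template <typename T>
bool isPresent(const std::optional<T>& x) {
    return x.has_value();
}

// Stand-in for "x != null"; std::nullopt plays the role of null.
template <typename T>
bool isNotNull(const std::optional<T>& x) {
    return x != std::nullopt;
}
```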

Optionals must be “unwrapped” to access their value. Use the unwrap operator, written in SPL as postfix ! (also called “bang”), to do this:

mutable int32 myInt = 0;
mutable optional<int32> myOptionalInt = 42;
myInt = myOptionalInt!; // myInt is now 42

Unwrapping is unsafe by design. If the value is not present, an exception is thrown.
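In C++17 terms, SPL's unwrap behaves most like std::optional::value(), which throws std::bad_optional_access on an empty optional (C++'s operator* would instead be undefined behavior). The sketch below is an analogy only; it does not show which exception SPL itself throws.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for SPL's postfix ! (unwrap): value() does not check on your behalf,
// it throws std::bad_optional_access when the optional is empty.
std::int32_t unwrapInt32(const std::optional<std::int32_t>& x) {
    return x.value();
}

// Reports whether unwrapping x threw, i.e. whether x was empty ("null").
bool unwrapThrows(const std::optional<std::int32_t>& x) {
    try {
        (void)unwrapInt32(x);
        return false;
    } catch (const std::bad_optional_access&) {
        return true;
    }
}
```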

For convenience, SPL also provides an unwrapOrElse operator, represented by the binary ?: operator and used like this:

x ?: y

If x is present, the result is the unwrapped value of x; otherwise, the result is y, which has type T, the underlying type of x. This is effectively shorthand for:

x?? ? x! : y
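A C++17 analogue, assuming std::optional::value_or as a stand-in for ?:, shows both the short form and the long form above. One C++-specific caveat: value_or always evaluates its argument, whereas the conditional long form evaluates the fallback only when the value is absent.

```cpp
#include <optional>
#include <string>

// Short form, in the spirit of email ?: "<none>" in SPL.
std::string emailOrPlaceholder(const std::optional<std::string>& email) {
    return email.value_or("<none>");
}

// Long form, in the spirit of email?? ? email! : "<none>" in SPL.
std::string emailOrPlaceholderLong(const std::optional<std::string>& email) {
    return email.has_value() ? email.value() : std::string("<none>");
}
```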

Identity Semantics

All of the above operators have identity semantics; that is, they also accept operands that are not optionals. A non-optional value is always considered present, so applying the isPresent operator to it always returns true, unwrapping it returns the value unchanged (like an identity function), and unwrapOrElse never uses the default value. These semantics let you write generic SPL composites that work on both optional<T> and T with the same code.
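The identity semantics can be imitated in C++17 with overloaded function templates: a general overload treats any plain value as present, and a more specialized overload handles std::optional. The names isPresent, unwrap, unwrapOrElse and lengthOrZero below are hypothetical helpers written for this sketch, not part of any SPL or C++ API.

```cpp
#include <optional>
#include <string>

// Plain values: always present, unwrap is the identity, the fallback is never used.
template <typename T>
bool isPresent(const T&) { return true; }

template <typename T>
const T& unwrap(const T& x) { return x; }

template <typename T, typename U>
T unwrapOrElse(const T& x, const U&) { return x; }

// Optionals: these overloads are more specialized, so overload resolution
// prefers them whenever the argument is a std::optional.
template <typename T>
bool isPresent(const std::optional<T>& x) { return x.has_value(); }

template <typename T>
const T& unwrap(const std::optional<T>& x) { return x.value(); }

template <typename T, typename U>
T unwrapOrElse(const std::optional<T>& x, const U& fallback) {
    return x.value_or(fallback);
}

// Generic code written only against the helpers above: it accepts both a
// string and an optional string, in the spirit of a generic SPL composite.
template <typename V>
int lengthOrZero(const V& v) {
    return isPresent(v) ? static_cast<int>(unwrap(v).size()) : 0;
}
```

Because lengthOrZero never mentions std::optional directly, the same body works for std::string and std::optional<std::string>.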

Additional Notes

  • Because null is unique in SPL in that it must represent null-ness for every optional type, null must always be promoted to a specific optional type before code is generated. In the overwhelming majority of cases this happens automatically, but in some cases the promotion is ambiguous and you must supply a cast. For example,


void f(optional<int32> x) { }
void f(optional<rstring> s) { }
// somewhere in the logic block
f(null); // Compile error: this is ambiguous. The compiler cannot tell which f() you mean to call.
f((optional<int32>)null); // this compiles

  • The only restriction on the underlying type T is that it must not itself be an optional. For instance, you cannot have an optional<optional<T>>. Multiple direct levels of optionality are redundant, and they would also make initialization syntax ambiguous, as in the following (illegal) example:

mutable optional<optional<int32>> myOptionalOptional = null; // illegal

Does the statement above set the outer optional to null, or the inner one? It would be ambiguous, which is why directly nested optionals are not allowed. If T is a composite type (for example, a tuple, set, list, or map), it may contain optional types within it, but those optional types are subject to the same restriction: no optional<optional<T>>.
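Both notes above have close parallels in C++17, sketched here with std::optional as an assumed stand-in for SPL optionals. The overloads named f are hypothetical and mirror the SPL example. Unlike SPL, C++ does permit nested optionals, and the sketch shows the two distinct "null" states this allows, which is exactly the ambiguity SPL rules out.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// 1) Overload ambiguity. Both overloads accept std::nullopt, so a call such as
//    f(std::nullopt) does not compile; naming the optional type resolves it.
std::string f(std::optional<std::int32_t>) { return "int32 overload"; }
std::string f(std::optional<std::string>) { return "string overload"; }

std::string callWithNullInt32() {
    // Analogue of SPL's f((optional<int32>)null).
    return f(static_cast<std::optional<std::int32_t>>(std::nullopt));
}

// 2) Nested optionals. C++ allows them, and "null" then has two meanings.
using Nested = std::optional<std::optional<int>>;

Nested outerNull() { return std::nullopt; }                  // outer optional is empty
Nested innerNull() { return Nested(std::optional<int>{}); }  // outer set, inner empty
```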

How Optionals Are Handled By Existing Streams Operators

Existing operators that pass through or auto-assign attributes handle optional attributes without any changes. Operators that accept expressions (e.g. Filter) should generally work correctly when given expressions that use the new syntax. It is up to the user of such an operator to provide a valid, safe expression, for example by checking for presence before unwrapping.
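The check-before-unwrap pattern for a filter predicate can be sketched in C++17, assuming a hypothetical Reading tuple with an optional temperature attribute and a hypothetical threshold of 30.0. The && operator short-circuits, so value() is reached only when the temperature is present.

```cpp
#include <optional>
#include <vector>

// Hypothetical tuple type with one optional attribute.
struct Reading {
    int sensorId;
    std::optional<double> temperature;
};

// Presence is tested first; value() is never called on an empty optional.
bool isHot(const Reading& r) {
    return r.temperature.has_value() && r.temperature.value() > 30.0;
}

// Keeps only the readings for which the predicate holds, like a Filter.
std::vector<Reading> filterHot(const std::vector<Reading>& in) {
    std::vector<Reading> out;
    for (const Reading& r : in) {
        if (isHot(r)) out.push_back(r);
    }
    return out;
}
```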

Existing operators that expect certain attributes on their input streams need those attributes to be present in order to process tuples, and should already type-check any attributes they are given so that using the wrong type causes a compilation error.

For example, suppose I have a Geofence operator that takes a latitude and longitude and outputs an alert when something enters the fence. A latitude and longitude must be present for that operator to do any work on the tuple: it cannot determine whether an object has entered a fence if the position is unknown. Someone writing such a Geofence operator ought to already confirm that, if the user specifies some attribute (say, float64 lat) to hold the latitude, that attribute is of the expected type. So if the user attempts to use an optional type (such as optional<float64>), the operator should reject it at compile time.
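The compile-time rejection described above can be illustrated in plain C++17 with static_assert. This is only an analogy: insideBox and is_std_optional are hypothetical names, and a real Streams operator would presumably perform this check in its own way (for example, during code generation) rather than in a C++ template like this one.

```cpp
#include <optional>
#include <type_traits>

// Trait that detects std::optional<T>.
template <typename T>
struct is_std_optional : std::false_type {};

template <typename T>
struct is_std_optional<std::optional<T>> : std::true_type {};

// Hypothetical bounding-box check standing in for the Geofence example.
// The static_asserts fire when the template is instantiated, so passing
// std::optional<double> coordinates fails to compile instead of failing at run time.
template <typename Coord>
bool insideBox(Coord lat, Coord lon,
               double minLat, double maxLat, double minLon, double maxLon) {
    static_assert(!is_std_optional<Coord>::value,
                  "optional coordinates are not accepted: a position is required");
    static_assert(std::is_same_v<Coord, double>,
                  "latitude/longitude must be double (float64 in SPL)");
    return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
}
```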

Many of the operators in the Standard Toolkit included with Streams likewise reject optional attributes at compile time where they need a value. The behavior of third-party toolkits cannot be vouched for, however, so if you want to use optionals with pre-existing toolkits, investigate how they behave first. Streams does not build a default null-handling behavior into the operators themselves, because in almost all cases the right default is application dependent. For example, if our Geofence operator encountered a null latitude, should it ignore the object, send a different alert saying the object has been lost, or do something else? It is impossible to know what every downstream consumer of the operator would want, so the decision is left to the application developer.

Toolkit Support Statement for Streams 4.3

The following summarizes support for optional types in the toolkits included with Streams:

  • Standard Toolkit
    • For all operators in the Standard Toolkit, an attempt to use an optional attribute in a place where calculation is required will result in a compile error. Where optional attributes are passed from input to output, the nullness will flow through.
    • Aggregate has been modified to ignore nulls.
  • JSON Toolkit
    • Java operators
      • TupleToJSON:
        • Writes tuple attributes whose SPL value is null to the JSON string as null, e.g. {…, “attribute”: null, …}
      • JSONToTuple:
        • Assigns the SPL value null to attributes of type optional<T> when the JSON string does not contain the key or the key has the JSON value null
    • C++ native functions
      • Conversion functions support reading/writing null values from/to JSON string (restrictions on supported SPL types remain as before)
      • parseJSON and queryJSON functions are not extended for optional<T>, as they already have an interface to handle absent keys and null values in JSON strings
  • JDBC Toolkit
    • The JDBCRun operator supports reading and writing optional type data from and into databases.
    • It inserts tuple attributes whose SPL value is null into database tables as NULL.
    • It also returns the SPL value null to the application when a database column contains no value.
  • Object Storage Toolkit
    • ObjectStorageSink supports writing objects in Parquet format
    • Attributes with optional types are supported on the input stream
    • Null attributes result in empty values in the Parquet object
    • Restriction: Map, List and Set types must not be defined as optional
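The Aggregate entry above says that nulls are ignored. A minimal C++17 sketch of what "ignore nulls" means for an average over a window, assuming std::optional<double> as a stand-in for optional<float64>: null entries are skipped rather than counted as zero. Returning std::nullopt when every entry is null is a choice made for this sketch, not documented behavior of the Aggregate operator.

```cpp
#include <optional>
#include <vector>

// Averages the present values in a window, skipping nulls entirely.
std::optional<double> averageIgnoringNulls(const std::vector<std::optional<double>>& window) {
    double sum = 0.0;
    int count = 0;
    for (const auto& v : window) {
        if (!v) continue;  // null: neither added to the sum nor counted
        sum += *v;
        ++count;
    }
    if (count == 0) return std::nullopt;  // sketch's choice when nothing is present
    return sum / count;
}
```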

API Changes

The C++, Java, Python, Perl code generation, and REST APIs have been extended to allow interrogation and manipulation of tuple attributes and expressions of optional type.


