Written by Kris Hildrum.
“Syntactic sugar” is syntax that helps make code easier to read or to write. In this post, I’ll show you some syntax that may make stream type management a little easier.
Consider the case where the type of an operator’s output stream is closely related to the type of an input stream: perhaps it is exactly the same as the type of the input stream, or maybe the operator adds an attribute. In this case, you might end up having the same complex type listed in multiple places in your code. This is hard to read and hard to maintain, but with a little change in how the app is written, we can tidy it up.
Our example application reads from a file into a stream, then for each tuple, produces a JSON string representing the data (JSON is a human-readable way of encoding data into a string), and then adds the sizes of the two lists as new attributes. The basic version of this application is below:
composite Basic
{
    graph
        stream<list<tuple<rstring person, int32 age>> people, list<rstring> cities>
            entitiesStream = FileSource()
        {
            param
                file : "cities.txt" ;
                format : csv ;
        }

        stream<rstring jsonString, list<tuple<rstring person, int32 age>> people,
            list<rstring> cities> plusJson = TupleToJSON(entitiesStream)
        {
        }

        // By default, Functors carry through attributes that have the same
        // name and type in the input and output streams
        stream<int32 personCount, int32 cityCount, rstring jsonString,
            list<tuple<rstring person, int32 age>> people, list<rstring> cities>
            plusCounts = Functor(plusJson)
        {
            output
                plusCounts : personCount = size(people), cityCount = size(cities) ;
        }
}
It is tedious to retype list<tuple<rstring person, int32 age>> people, list<rstring> cities over and over again. And if you decide that, in a new version of your app, the cities.txt file is going to have some additional fields (say, a list of landmarks), you have to make multiple edits to your SPL code.
Using named types
One improvement is to name the type of the tuples in the file. We’ll call it EntityType. This saves you from having to retype the type, and means it’s easy to change should you need to later.
This gives us the type of the output stream for the first operator. For the rest, we’ll define the types relative to EntityType.
Remember that if you have TypeOne and TypeTwo, and you want a type with all the attributes of those two types, that type would be TypeOne,TypeTwo. For example:
type TypeOne = rstring foo, rstring bar;
type TypeTwo = rstring baz;
type TypeThree = TypeOne,TypeTwo;
// this is the same as: type TypeThree = rstring foo, rstring bar, rstring baz;
That means that to prepend an rstring attribute named jsonString to EntityType, you’d write tuple<rstring jsonString>, EntityType.
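As a small sketch of that last point, the combined type can itself be given a name. The name JsonEntityType is made up for this illustration and is not used in the rest of this post; EntityType is written out here in full, as it is defined in the composite that follows:

type EntityType = list<tuple<rstring person, int32 age>> people, list<rstring> cities;
// JsonEntityType is a hypothetical name: jsonString first, then everything in EntityType
type JsonEntityType = tuple<rstring jsonString>, EntityType;
// this should be the same as:
// type JsonEntityType = rstring jsonString, list<tuple<rstring person, int32 age>> people, list<rstring> cities;

Whether to name the combined type or to write tuple<rstring jsonString>, EntityType inline is a matter of taste; the inline form keeps the added attribute visible right where the stream is declared.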
Here’s the composite using the named type EntityType to represent the type of the tuples in the file and defining the other types relative to that:
composite NamedType
{
    type
        EntityType = list<tuple<rstring person, int32 age>> people,
            list<rstring> cities ;
    graph
        // This one is much shorter, now.
        stream<EntityType> entitiesStream = FileSource()
        {
            param
                file : "cities.txt" ;
                format : csv ;
        }

        // This gets a lot shorter
        //V1: stream<rstring jsonString, list<tuple<rstring person, int32 age>> people, list<rstring> cities>
        stream<tuple<rstring jsonString>, EntityType> plusJson =
            TupleToJSON(entitiesStream)
        {
        }

        // This one is shorter, but still messy.
        //V1: stream<int32 personCount, int32 cityCount, rstring jsonString, list<tuple<rstring person, int32 age>> people, list<rstring> cities>
        stream<tuple<int32 personCount, int32 cityCount, rstring jsonString>,
            EntityType> plusCounts = Functor(plusJson)
        {
            output
                plusCounts : personCount = size(people), cityCount = size(cities) ;
        }
}
That’s better, but there’s another trick we can use to tidy it up a bit more.
Using the input stream type in the output stream
Notice that two of the operators just add fields to the input tuple. Conceptually, the output type is the input type plus something else. Streams allows you to use the name of the input stream as the type of the output stream, for example:
stream<inStream> outStream = Functor (inStream)
(This is particularly useful when you haven’t named the type.) We’ll re-write the example using this technique. For the counts operator, we’ll use an alias of the stream name, which can be useful when there are many input streams or if their names are long.
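Before the full rewrite, here is a minimal sketch of the pass-through case, where the output type is exactly the input type. The composite name, the file people.txt, the stream names, and the age threshold are all made up for illustration; Filter is the standard toolkit operator that forwards only the tuples matching its filter expression.

composite PassThroughSketch
{
    graph
        // The type is spelled out once, on the source stream (hypothetical file)
        stream<rstring person, int32 age> peopleStream = FileSource()
        {
            param
                file : "people.txt" ;
                format : csv ;
        }

        // adultsStream takes whatever type peopleStream has
        stream<peopleStream> adultsStream = Filter(peopleStream)
        {
            param
                filter : age >= 18 ;
        }
}

If an attribute is later added to peopleStream, the declaration of adultsStream should not need to change.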
Here’s the resulting composite:
composite ShortVersion
{
    type
        EntityType = list<tuple<rstring person, int32 age>> people,
            list<rstring> cities ;
    graph
        stream<EntityType> entitiesStream = FileSource()
        {
            param
                file : "cities.txt" ;
                format : csv ;
        }

        // uses entitiesStream to represent the type of entitiesStream
        stream<tuple<rstring jsonString>, entitiesStream> plusJson =
            TupleToJSON(entitiesStream)
        {
        }

        // Alias plusJson to I, and use I to represent the type of plusJson
        stream<tuple<int32 personCount, int32 cityCount>, I> plusCounts =
            Functor(plusJson as I)
        {
            output
                plusCounts : personCount = size(people), cityCount = size(cities) ;
        }
}
In this formulation, it’s clear from the code which attributes are added by each operator. In addition, a change to the type of what is in the file only requires a change in one line.
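For instance, if cities.txt later gained the list of landmarks mentioned earlier, the only edit ShortVersion should need is to its type clause. This is a sketch: the attribute name landmarks and its type list<rstring> are assumptions made for illustration.

    type
        // landmarks is a hypothetical new attribute read from the file
        EntityType = list<tuple<rstring person, int32 age>> people,
            list<rstring> cities, list<rstring> landmarks ;

The three stream declarations stay as written, because each is defined in terms of EntityType or of its input stream, so plusJson and plusCounts should pick up the new attribute without further edits.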