Cloud Pak for Data

Common patterns for tracking moving objects in Streams, part 1 

Mon August 17, 2020 04:15 PM

There are several use cases that involve tracking moving objects or people in real time, such as wildlife tracking and monitoring, fleet management, traffic control, or even law enforcement and surveillance. The Geospatial toolkit in Streams has several operators designed to tackle common problems in these scenarios. This article discusses some of these scenarios and shows how to use operators in the Geospatial toolkit to solve them.

Overview

A moving object is any device that periodically reports its location to Streams: a bus, taxi, ship, cell phone, etc.
In addition to knowing when an object has stopped completely, it is sometimes useful to detect that an object is moving but hasn’t left the same general area for some time.

For example, a freight company can detect when there is heavy traffic that is moving slowly to provide up-to-the-minute delivery times.

This article will cover the problem of detecting where and when moving objects are completely or relatively stationary. This is called hangout detection.
You'll learn how to use the Hangout operator to solve this problem and answer the following questions:

a. Which of the moving objects are hanging out together at the same time (co-hangout)?
b. Where are the most popular areas for hangouts right now?
c. How do I determine when an object starts/stops hanging out?

Assumptions

To follow along with this article, you need some familiarity with Streams concepts such as operators, toolkits, and tuples. See the Quick Start Guide for an introduction to these topics.

In general, the operators in the Geospatial toolkit require an incoming stream of data about objects and their location.  In this article, this stream is referred to as the location input stream. At a minimum, each tuple in this stream must include the following attributes describing the current location of a moving object: latitude, longitude, timestamp and a unique string that identifies the object.
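The article's own code is SPL. As a language-neutral illustration, here is a Python sketch of what one tuple of the location input stream must minimally carry. The class name LocationTuple and the use of milliseconds for the timestamp are assumptions for this sketch. This is not the toolkit's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationTuple:
    """Hypothetical minimal record for the location input stream."""
    id: str           # unique string identifying the moving object
    latitude: float   # degrees, -90 to 90
    longitude: float  # degrees, -180 to 180
    timestamp: int    # assumption: milliseconds since the epoch

    def __post_init__(self) -> None:
        # Reject coordinates that cannot describe a point on earth.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


bus = LocationTuple("8629:sf-muni:nextbus", 37.7749, -122.4194, 1_521_121_344_000)
```

Every operator discussed below relies on these four pieces of information, whatever the stream's exact attribute names are.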

Detect where and when a moving object is completely or relatively stationary

What is the difference between relatively and completely stationary?
The following animation illustrates both scenarios:

As you can see, although one moving object is completely stationary (yellow) and the other is moving about (purple), both are remaining at Central Park.

The Hangout operator in the Geospatial toolkit can be used to detect when either scenario occurs. In the example above, it would report that both objects are in a hangout. A hangout means that an object has stayed within a specific geographic area for some time.

Reporting Hangouts – how?

Continuing the illustration above, reporting a hangout also involves reporting where it is happening. Since the yellow object is stationary, something like “the yellow object has been hanging out at coordinates (x, y) for 10 minutes” would be correct.

In the case of the purple object, it is more difficult to report where it is hanging out because it reports a different location each time.
A single pair of latitude and longitude coordinates cannot describe where such an object is hanging out, so the hangout must be reported as an area rather than a point.

The Hangout operator solves this problem by reporting the location of a hangout as a geohash instead of latitude and longitude coordinates. If you are unfamiliar with geohashes, read the appendix at the end of this article to learn more.

Configuring the Hangout operator’s parameters

To configure the Hangout operator to detect a hangout, you must specify:

a) the maximum distance the object can travel while still being considered relatively stationary. This value is set in the cellSize parameter of the operator, which is the size of each geohash cell.
b) the minimum length of time the object must remain within the area specified above to be considered idle, or hanging out. This value is set using the minimumDwellTime parameter.

A detected hangout means that the object has been in the same geohash G1 for at least M seconds, where geohash G1 has dimensions P x P meters. P and M are the cellSize and minimumDwellTime parameters, respectively.
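To make this definition concrete, here is a simplified Python model of the rule. An object is in a hangout once it has reported positions in the same geohash for at least M seconds. This is only a sketch of the idea. It ignores the precision tolerance and is not how the Hangout operator is implemented. The class and method names are hypothetical.

```python
class HangoutTracker:
    """Simplified model: hangout = same geohash cell for >= min_dwell_s seconds."""

    def __init__(self, min_dwell_s: float) -> None:
        self.min_dwell_s = min_dwell_s
        # object id -> (geohash it currently occupies, time it entered that cell)
        self._entered: dict = {}

    def update(self, obj_id: str, geohash: str, t_s: float) -> tuple:
        """Return (idle, duration) for a new report from obj_id at time t_s (seconds)."""
        previous = self._entered.get(obj_id)
        if previous is None or previous[0] != geohash:
            # First report, or the object moved to another cell: restart the clock.
            self._entered[obj_id] = (geohash, t_s)
            entered_at = t_s
        else:
            entered_at = previous[1]
        dwell = t_s - entered_at
        idle = dwell >= self.min_dwell_s
        return idle, (dwell if idle else 0)


tracker = HangoutTracker(min_dwell_s=180)
first = tracker.update("bus-1", "9q8yt0m", 0)     # just arrived in the cell
later = tracker.update("bus-1", "9q8yt0m", 200)   # 200 s in the same cell
moved = tracker.update("bus-1", "9q8yt0n", 210)   # moved to a neighbouring cell
```

Resetting the clock whenever the geohash changes is what lets an object that is moving slowly inside one cell still count as hanging out.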

The values you choose for these parameters depend on your use case and the speed of the objects being tracked.

Some examples:

  • If you have a freight company and want to determine when a truck has been completely stopped for 2 minutes or more, you could use a geohash cellSize of 75 meters and a minimum dwell time of 120 seconds.
  • To detect heavy traffic that is moving at 10 km/h (about 2.78 m/s) or less, you could increase the geohash cellSize to 150 m and reduce the dwell time to 60 seconds.
  • If you are tracking wildlife and want to know where animals stop to rest, you might use a dwell time of 5 minutes and a cellSize of 100 meters.
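One rough way to sanity-check a cellSize/minimumDwellTime pair is to compare two distances: how far an object travels during the dwell time, and how wide the cell is. The Python helper below is a back-of-the-envelope sketch. It assumes straight-line travel across the cell, parallel to one side, and treats precision as extra tolerance on both borders. This is not a formula from the toolkit. The sketch also shows the unit conversions: 10 mph is about 4.47 m/s, while 10 km/h is about 2.78 m/s.

```python
MPH_TO_MPS = 0.44704          # 1 mile per hour in meters per second
KMH_TO_MPS = 1000.0 / 3600.0  # 1 km/h in meters per second


def max_hangout_speed(cell_size_m: float, dwell_s: float, precision_m: float = 0.0) -> float:
    """Rough upper bound (m/s) on the speed of an object that can still stay in
    one cell for dwell_s seconds. Assumes the object crosses the cell in a straight
    line parallel to a side and may stray precision_m past either border."""
    return (cell_size_m + 2.0 * precision_m) / dwell_s


for cell, dwell in [(75.0, 120.0), (150.0, 60.0), (100.0, 300.0)]:
    speed = max_hangout_speed(cell, dwell)
    print(f"cellSize={cell} m, dwell={dwell} s -> at most about {speed:.2f} m/s")
```

For example, 150 m and 60 s give roughly 2.5 m/s, or 3.0 m/s with a 15 m precision. Comparing such a bound with the speeds you expect is a quick way to catch parameter choices that can never trigger.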

A simple example

Our location data stream in this example is NextBus data. The NextBus service allows public transit buses to report their location periodically. The Hangout operator will use NextBus data as its location input stream and will detect when a bus is idle.

A bus is idle if it has remained within the same 75m x 75m geohash cell for 3 minutes or more.
Here is an example invocation of the operator:

stream<HangoutOutput> BusLocation_WithHangouts = Hangout(BusLocationStream)
{
    param
        minimumDwellTime : 180u ;   // 3 minutes, in seconds
        cellSize : 75.0 ;           // geohash cells of 75 m x 75 m
        sampleLatitude : 37.0 ;     // use 37 for the San Francisco area
        precision : 15.0 ;          // tolerance, in meters, around the cell border
        timeStamp : reportTime ;

    output
        BusLocation_WithHangouts :
            idle = IsInHangout(),
            duration = HangoutDuration(),
            geohash = HangoutGeohashBase32() ;
}

The remaining parameters are as follows:

  • sampleLatitude: Use this parameter to indicate roughly where in the world the objects are, so that the Hangout operator can adjust the geohash cells to match the target cell size. Since the data is from San Francisco, sampleLatitude is set to 37.
  • precision: This is used to adjust for inaccuracy in the location reported by the buses, and to account for scenarios where a bus moves between neighbouring geohashes. By setting the precision to 15m, a bus will be treated as still within geohash cell G1 even if it is up to 15m away from the border of cell G1.
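The role of both parameters can be illustrated with a small Python sketch. The first function approximates how many degrees of latitude and longitude a distance in meters covers at a given latitude. A degree of longitude shrinks with the cosine of the latitude, which is why a sample latitude matters. The second function checks whether a point lies within a cell that has been expanded by the precision margin. The cell bounds, the 111,320 m-per-degree approximation, and the function names are assumptions for illustration. This is not the toolkit's algorithm.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def meters_to_degrees(meters: float, latitude: float) -> tuple:
    """Approximate (delta_lat, delta_lon) in degrees for a distance at a latitude."""
    delta_lat = meters / METERS_PER_DEGREE
    delta_lon = meters / (METERS_PER_DEGREE * math.cos(math.radians(latitude)))
    return delta_lat, delta_lon


def in_cell_with_precision(lat: float, lon: float, cell: tuple,
                           precision_m: float, sample_latitude: float) -> bool:
    """cell = (min_lat, min_lon, max_lat, max_lon); grown by precision_m on every side."""
    min_lat, min_lon, max_lat, max_lon = cell
    d_lat, d_lon = meters_to_degrees(precision_m, sample_latitude)
    return (min_lat - d_lat <= lat <= max_lat + d_lat
            and min_lon - d_lon <= lon <= max_lon + d_lon)


cell = (37.0, -122.001, 37.001, -122.0)        # hypothetical small cell near San Francisco
just_north = 37.001 + 10 / METERS_PER_DEGREE   # about 10 m north of the cell border
inside_with_margin = in_cell_with_precision(just_north, -122.0005, cell, 15.0, 37.0)
inside_without_margin = in_cell_with_precision(just_north, -122.0005, cell, 0.0, 37.0)
```

With a 15 m margin, the point 10 m past the border still counts as inside the cell. Without a margin, it does not.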

Operator output

Let’s look again at the operator’s output functions:

output
    BusLocation_WithHangouts :
        idle = IsInHangout(),
        duration = HangoutDuration(),
        geohash = HangoutGeohashBase32() ;

For each incoming tuple, the following attributes are added to the output:

  • idle, a boolean indicating whether or not the object is hanging out,
  • geohash, base-32/ASCII location of the hangout,
  • duration, how long the object has been hanging out, in seconds.

Downstream, a Custom operator prints out whether or not a hangout was detected.

This is the SimpleNextBusHangout application in the toolkit’s samples.

Running it for a few minutes will print out when a bus is idle and the geohash:

Bus 8629:sf-muni:nextbus has been idle at geohash 9q8yt0m for 351 seconds.
Bus 5529:sf-muni:nextbus has been idle at geohash 9q8znb5 for 410 seconds.
Bus 5505:sf-muni:nextbus has been idle at geohash 9q8yvzy for 422 seconds.
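The downstream printing step can be approximated in Python as follows. Each tuple is modeled as a dictionary holding the attributes listed above plus an id. The function name is hypothetical; the actual sample uses an SPL Custom operator.

```python
from typing import Optional


def describe_hangout(tup: dict) -> Optional[str]:
    """Return a message for an idle bus, or None when no hangout is detected."""
    if not tup["idle"]:
        return None
    return (f"Bus {tup['id']} has been idle at geohash {tup['geohash']} "
            f"for {tup['duration']} seconds.")


sample = {"id": "8629:sf-muni:nextbus", "idle": True, "geohash": "9q8yt0m", "duration": 351}
message = describe_hangout(sample)
```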

Using detected hangouts

The rest of this section will show code examples of using the output of the Hangout operator.
All the examples discussed in this article are included in the streamsx.transportation toolkit, which already has operators to connect to NextBus. See the section on the sample applications below for how to run them.

How can I determine when an object stops hanging out?

Perhaps you are interested in knowing when the hangout stops. If the hangout is the expected behaviour and departing a specific region is unusual, you could create an alert for this.

You can use the MatchRegex operator to detect when a hangout stops. It allows you to define a pattern in your data as a regular expression, and then it detects that pattern.

In our case, the pattern you want to detect is that the idle attribute changed from true to false between two consecutive tuples.

Looking again at the Hangout operator in the example above, notice that the operator’s output includes a boolean function IsInHangout. This function’s value is assigned to the idle attribute.  You can use this output to detect when an object that was previously hanging out (idle = true) has stopped (idle = false).

Below is the invocation of the MatchRegex operator:

stream<HangoutSummary> HangoutSummaries = MatchRegex(HangoutDetectionOutput)
{
    param
        partitionBy : id ;
        // a tuple with idle == true immediately followed by one with idle == false
        pattern : "hangout stop" ;
        predicates : { hangout = idle == true, stop = idle == false } ;
    output HangoutSummaries :
        totalHangoutDuration = Max(duration),  // how long the bus was at the geohash
        id = Any(id),                          // id of the bus
        geohash = First(geohash),              // location of the hangout
        lastSeenLatitude = First(latitude),    // last reported latitude within the geohash
        lastSeenLongitude = First(longitude),  // last reported longitude within the geohash
        lastSeenTime = First(reportTime) ;     // last reported timestamp within the geohash
}

After detecting the pattern, the operator produces output indicating the last time the object was in the geohash, how long it spent there in total, and so on.
See the DetectHangoutEnd composite in the sample for the full application.

This problem could also be solved by using a Custom operator and multiple data structures to keep track of objects and their state, but MatchRegex is a much simpler approach.

MatchRegex operator SPLDoc
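For comparison, the Custom-operator approach mentioned above, which keeps state per object, can be sketched in Python. The sketch emits a summary whenever an object's previous tuple had idle = true and its current tuple has idle = false. This mirrors the "hangout stop" pattern partitioned by id. The geohash, position, and time are copied from the last idle tuple, as the First(...) aggregates do. The dictionary keys follow the attribute names used above. The functions and the sample data are hypothetical.

```python
def detect_hangout_ends(tuples):
    """Yield one summary each time an object's idle flag changes from True to False.

    tuples: dicts with id, idle, duration, geohash, latitude, longitude and
    reportTime keys, in arrival order.
    """
    previous_by_id = {}  # last tuple seen for each object (like partitionBy : id)
    for current in tuples:
        previous = previous_by_id.get(current["id"])
        if previous is not None and previous["idle"] and not current["idle"]:
            yield {
                "id": current["id"],
                "totalHangoutDuration": max(previous["duration"], current["duration"]),
                "geohash": previous["geohash"],
                "lastSeenLatitude": previous["latitude"],
                "lastSeenLongitude": previous["longitude"],
                "lastSeenTime": previous["reportTime"],
            }
        previous_by_id[current["id"]] = current


def report(bus, idle, duration, t, geohash="9q8yt0m"):
    """Build a hypothetical Hangout output tuple for the example."""
    return {"id": bus, "idle": idle, "duration": duration, "geohash": geohash,
            "latitude": 37.77, "longitude": -122.42, "reportTime": t}


stream = [report("A", False, 0, 0), report("A", True, 200, 200),
          report("B", True, 190, 210), report("A", True, 260, 260),
          report("A", False, 0, 300, geohash="9q8yt0q")]
ends = list(detect_hangout_ends(stream))
```

The bookkeeping in previous_by_id is exactly what MatchRegex handles for you.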

Group hangouts/Rendezvous: Which objects are hanging out in the same geohash at the same time?

If there are many buses in the same geohash, this could indicate an unexpected obstruction and the municipality could respond by diverting buses heading to that area.

If 2 or more buses are reported to be hanging out at the same geohash G1, we can say that they are in a group, or co-hangout.

I will use the Aggregate operator with the following parameters:
– Use groupBy to group all the tuples in the input by geohash so that all buses hanging out in the same geohash will be in the same group.

To be certain that multiple buses are hanging out at the same geohash at the same time, you must also compare the timestamps on each tuple.

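– Use a tumbling window with a 75-second delta on reportTime (the 75000l in the code below, assuming reportTime is in milliseconds). What does this mean? The window collects tuples until a tuple arrives whose reportTime is more than 75 seconds after that of the first tuple in the window. The operator then produces one output tuple for each geohash group and empties the window. Two buses therefore end up in the same group for geohash G1 only if both reported from G1 within the same 75-second interval. This value can change depending on the use case and how frequently the moving objects report their location.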

Every time the Aggregate operator produces output, it will include the following information for each geohash group:

  • busCountPerHash : Number of buses in the geohash
  • buses_in_hash: List of the ids of all the buses in the geohash
  • firstReportTime: the timestamp of the earliest tuple.

Here is the operator’s invocation:

stream<HangoutsByGeohash> HangoutsByGeohash_ = Aggregate(IdleBusStream as in0)
{
    window
        in0 : tumbling, delta(reportTime, 75000l) ;
    param
        groupBy : geohash ;
    output
        HangoutsByGeohash_ :
            total_geohashes = CountGroups(),
            buses_in_hash = CollectDistinct(id),
            busCountPerHash = CountDistinct(id),
            geohash = Any(geohash),
            firstReportTime = Min(reportTime) ;
}

Add a Functor with a filter parameter to keep only the geohashes that contain 2 or more buses:

stream<GroupHangoutsByGeohash> MultipleBusesInGroup = Functor(HangoutsByGeohash_ as inputStream)
{
     param
          filter : busCountPerHash >= 2 ;
}
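The windowing, grouping, and filtering steps can be modeled in Python to show what ends up in each group. The sketch assumes that reportTime is in milliseconds. It models the tumbling delta window as follows: a window closes when a tuple arrives whose reportTime is more than delta_ms after the first tuple in the window. Some details are simplifications. The function names and sample data are hypothetical. Ids are sorted only to make the result deterministic. The last open window is flushed at the end of the input.

```python
from collections import defaultdict


def tumbling_delta_windows(tuples, delta_ms):
    """Split time-ordered tuples into tumbling windows spanning at most delta_ms."""
    window = []
    for t in tuples:
        if window and t["reportTime"] - window[0]["reportTime"] > delta_ms:
            yield window  # close the window; this tuple starts the next one
            window = []
        window.append(t)
    if window:
        yield window  # flush whatever is left at the end of the input


def group_hangouts(window, min_buses=2):
    """Per geohash: distinct bus ids, their count and the earliest report time,
    keeping only geohashes with at least min_buses buses (the Functor's filter)."""
    by_geohash = defaultdict(list)
    for t in window:
        by_geohash[t["geohash"]].append(t)
    groups = []
    for geohash, members in by_geohash.items():
        buses = sorted({m["id"] for m in members})
        if len(buses) >= min_buses:
            groups.append({"geohash": geohash,
                           "buses_in_hash": buses,
                           "busCountPerHash": len(buses),
                           "firstReportTime": min(m["reportTime"] for m in members)})
    return groups


idle_reports = [
    {"id": "A", "geohash": "9q8znb5", "reportTime": 0},
    {"id": "B", "geohash": "9q8znb5", "reportTime": 10_000},
    {"id": "C", "geohash": "9q8yute", "reportTime": 20_000},
    {"id": "A", "geohash": "9q8znb5", "reportTime": 80_000},  # > 75 s after 0: new window
    {"id": "D", "geohash": "9q8znb5", "reportTime": 90_000},
]
windows = list(tumbling_delta_windows(idle_reports, 75_000))
```

Bus C is alone in its geohash, so it is filtered out of the first window's result.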

This application is in GroupHangouts.spl. Here’s some sample output:

There are 5 buses hanging out at geohash 9q8yte2, the buses are ["6710:sf-muni:nextbus","6718:sf-muni:nextbus","6623:sf-muni:nextbus","6709:sf-muni:nextbus","6659:sf-muni:nextbus"].
There are 2 buses hanging out at geohash 9q8yute, the buses are ["5613:sf-muni:nextbus","5555:sf-muni:nextbus"].
There are 6 buses hanging out at geohash 9q8znb5, the buses are ["5433:sf-muni:nextbus","5416:sf-muni:nextbus","8441:sf-muni:nextbus","6647:sf-muni:nextbus","7241:sf-muni:nextbus","5600:sf-muni:nextbus"].

Where are the most popular areas for hangouts right now?

Extending the previous scenario, after identifying which regions have the largest number of idle buses, the transit authority can prioritize the areas that need the most attention.

A second Aggregate operator uses the output of the first Aggregate operator (HangoutsByGeohash_), which reports the number of buses in each geohash. From it, the second operator builds a list of geohashes and a corresponding list of the number of buses in each geohash.

stream<list<int32> busesPerGeohash, list<rstring> geohashList,
     int64 first_arrival_timeStamp> HangoutStats =  Aggregate(HangoutsByGeohash_)
{
    window
        HangoutsByGeohash_ : tumbling, punct() ; //use the punct from the preceding Aggregate operator
    output
        HangoutStats : 
                busesPerGeohash = Collect(busCountPerHash),
                geohashList = Collect(geohash), 
                first_arrival_timeStamp = Min(firstReportTime) ;
}

This second Aggregate operator produces output tuples like this:

busesPerGeohash=[2,1,2,1,2,1,1,1,1,2],
geohashList =["9q8znb6","9q8yvtb","9q8zn6w","9q8zn6x","9q8yuhr","9q8yywy","9q8yvfk","9q8ywrc","9q8yvfm","9q8yyfn"]

These two lists are in the same order. Index 0 of geohashList is geohash 9q8znb6, and index 0 of busesPerGeohash is 2, which means that 2 buses are hanging out at geohash 9q8znb6. Likewise, 9q8yvtb has 1 bus, and so on.

Now, you can extract the top 5 geohashes by popularity from these lists.
See the TopKHangouts Functor in the sample for the full code.
Sample output tuple:

List of geohashes: ["9q8yyv4","9q8ytdu","9q8zn4x","9q8yywd","9q8ywyr"]
Number of buses in each geohash:[6,3,3,3,3]
Time:  Thu Mar 15 13:42:24 2018

This is interpreted as: “In the 75-second interval starting Thu Mar 15 13:42:24 2018, there were 6 buses hanging out at geohash 9q8yyv4.”
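Extracting the most popular geohashes from the two parallel lists amounts to pairing the lists up and sorting by bus count. The Python sketch below shows one way to do this, using the sample lists from earlier in this section. It is not the TopKHangouts Functor from the sample, and the function name is hypothetical.

```python
def top_k_geohashes(geohash_list, buses_per_geohash, k=5):
    """Return the k geohashes with the most buses, with their counts.

    The inputs are parallel lists: buses_per_geohash[i] belongs to geohash_list[i].
    sorted() is stable, so geohashes with equal counts keep their original order.
    """
    pairs = sorted(zip(geohash_list, buses_per_geohash),
                   key=lambda pair: pair[1], reverse=True)[:k]
    return [g for g, _ in pairs], [n for _, n in pairs]


geohash_list = ["9q8znb6", "9q8yvtb", "9q8zn6w", "9q8zn6x", "9q8yuhr",
                "9q8yywy", "9q8yvfk", "9q8ywrc", "9q8yvfm", "9q8yyfn"]
buses_per_geohash = [2, 1, 2, 1, 2, 1, 1, 1, 1, 2]
top_geohashes, top_counts = top_k_geohashes(geohash_list, buses_per_geohash)
```

Sorting in ascending order instead (reverse=False) gives the least popular hangouts.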

Something similar can be done to get the least popular hangouts.

The sample applications

The samples are in the streamsx.transportation toolkit. This toolkit already has operators to connect to NextBus. These applications are in the com.ibm.streamsx.transportation.hangout.demo namespace.

The samples are organized using a microservice architecture, such that the input data from NextBus is retrieved in one application, and the other applications that perform the analysis connect to that application to process the data.

The SimpleNextBusHangout composite connects to NextBus, adds the hangout information to the output stream, and publishes that stream. The remaining applications subscribe to the published stream and compute each of the sub-scenarios discussed above.

So, you must run the SimpleNextBusHangout application first before running the remaining applications.
This has the advantage that only one application connects to NextBus, so you do not exceed your connection quota.
Because the samples use live data, you might run the applications for some time and still see no hangouts if no buses are idle. To see some hangouts, you could experiment by reducing the minimumDwellTime in SimpleNextBusHangout to a few seconds.

Running the samples

Clone or download the streamsx.transportation toolkit, and import the NextBusSamples project into Streams Studio to run the samples.

Summary

This article has discussed the Hangout operator and how it can be used to monitor moving objects in real time and identify where the objects are spending time. It also covered ways that you can use detected hangouts.

The next article will show how to use the toolkit to solve problems involving geofencing, that is, tracking moving objects around known areas of interest.

Appendix: Geohashes

A geohash is a unique identifier for a specific region on earth. The geohash algorithm encodes a latitude and longitude as a binary string called the geohash. The binary string can also be represented alphanumerically in base 32, e.g. “dr5”, where each character stands for 5 bits. I’m going to use alphanumeric geohashes for simplicity. The number of bits used to compute the geohash determines the length of the geohash string. In turn, the length of the geohash string also indicates the size of the geohash region it identifies. These regions are sometimes referred to as geohash cells.
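The encoding itself is short enough to sketch in Python. The function below follows the standard geohash scheme:

- It repeatedly halves the longitude and latitude ranges, alternating between them and starting with longitude.
- For each halving, it records a 1 when the point is in the upper half and a 0 otherwise.
- It packs every 5 bits into one base-32 character.

The function is for illustration only; in Streams you would use the Geospatial toolkit's own geohash functions. Encoding the Central Park location used in the examples that follow (40.766604, -73.975038) with 6 characters gives dr5rut.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(latitude: float, longitude: float, length: int) -> str:
    """Encode a point as a base-32 geohash with `length` characters (5 bits each)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_longitude = True  # bits alternate, starting with longitude
    while len(bits) < 5 * length:
        rng, value = (lon_range, longitude) if use_longitude else (lat_range, latitude)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)  # point is in the upper half: keep [mid, high]
            rng[0] = mid
        else:
            bits.append(0)  # point is in the lower half: keep [low, mid]
            rng[1] = mid
        use_longitude = not use_longitude
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


freda = geohash_encode(40.766604, -73.975038, 6)
```

Because each extra bit only narrows the previous range, a shorter geohash of the same point is always a prefix of a longer one.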

Understanding Geohashes

Let’s look at some examples. Imagine that Freda is at Central Park, with a cell phone that reports the current location as 40.766604, -73.975038. (Freda is the purple star below).


Using 5 bits to encode Freda’s location results in a 1-character geohash, d. Since each geohash corresponds to a specific region on earth, let’s have a look at the region covered by “d”:

The cells are very large; for example, most of the eastern United States is in geohash d. At this resolution, saying that Freda and Carlos are both at geohash d is not very useful for determining exactly where they are, or whether they are close to each other.

Using 10 bits results in a geohash of 2 characters, dr:

The regions in the map above are much smaller, but dr still covers almost all of New York state. This is a little better, but still not useful for most scenarios.

Using 30 bits gives a 6-character geohash, dr5rut:

Geohash cells can get even smaller, with the smallest geohash being 22 m² in area. But hopefully by now you can see the pattern.
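Each character adds 5 bits, and the bits alternate between longitude and latitude, with longitude first. The size of a cell in degrees therefore follows directly from the geohash length. The Python sketch below computes it. The conversion to meters uses a rough 111,320 m-per-degree approximation, which is an assumption. The function names are hypothetical.

```python
import math

METERS_PER_DEGREE = 111_320.0  # rough length of one degree of latitude


def cell_size_degrees(length: int) -> tuple:
    """(height in degrees of latitude, width in degrees of longitude) of a geohash cell."""
    bits = 5 * length
    lon_bits = (bits + 1) // 2  # longitude gets the 1st, 3rd, 5th... bit
    lat_bits = bits // 2
    return 180.0 / 2 ** lat_bits, 360.0 / 2 ** lon_bits


def cell_size_meters(length: int, latitude: float) -> tuple:
    """Approximate (height, width) in meters of a geohash cell at the given latitude."""
    height_deg, width_deg = cell_size_degrees(length)
    return (height_deg * METERS_PER_DEGREE,
            width_deg * METERS_PER_DEGREE * math.cos(math.radians(latitude)))


for n in (1, 2, 6):
    h, w = cell_size_meters(n, 40.766604)
    print(f"{n} character(s): about {h / 1000:.1f} km high x {w / 1000:.1f} km wide")
```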

To summarize:

  • In all 3 examples, the same latitude and longitude were used to compute the geohash. However, the geohashes got longer and the cells got smaller from one example to the next. This is because the number of bits used in the geohash algorithm determines both the cell size and the length of the resulting geohash.
  • Each geohash cell is rectangular, and roughly the same size as the next cell, but not always. As the regions got smaller, the reported geohashes got longer while keeping the same prefix: d, then dr, then dr5rut. When comparing two geohashes of the same length, the more prefix characters they have in common, the closer the two cells generally are. The reverse does not always hold: two nearby points on opposite sides of a cell border can have geohashes with few or no characters in common.
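The prefix property makes geohashes handy for coarse proximity grouping: truncating geohashes to a shorter length puts objects in the same cell into the same bucket. The Python sketch below shows a shared-prefix count and prefix bucketing. The function names, object ids, and positions are hypothetical.

```python
from collections import defaultdict


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading characters two geohashes have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def bucket_by_prefix(positions: dict, length: int) -> dict:
    """Group object ids by the first `length` characters of their geohash.

    Only a coarse grouping: nearby points on opposite sides of a cell border
    can land in different buckets.
    """
    buckets = defaultdict(list)
    for obj_id, geohash in positions.items():
        buckets[geohash[:length]].append(obj_id)
    return {prefix: sorted(ids) for prefix, ids in buckets.items()}


positions = {"freda": "dr5rut", "carlos": "dr5ruv", "bus-1": "9q8yyv"}
buckets = bucket_by_prefix(positions, 5)
```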

Using Geohashes

The Hangout operator in the Geospatial toolkit reports hangouts as geohashes. In addition, the toolkit also includes functions and operators for working with geohashes, including functions to generate a geohash from a given point or polygon and functions to convert from a binary to alphanumeric (base32) geohash.

Useful links

Functions for geohashes and more in the Geospatial toolkit.

Geohash explorer: a great tool to understand and visualize geohashes

geohash.org: Generate a geohash from coordinates. Also, if you have a geohash, geohash.org/ displays the geohash’s center on the map.

