Global AI and Data Science


User behavior analysis based on transaction data using the IBM SPSS event-based time series algorithm

By A PENG ZHANG posted Thu November 08, 2018 12:38 AM

Advances in data collection and data storage technologies have led to the increasing availability of complex temporal data sets. In these data sets, each data instance is a trace of an entity's behavior, characterized by a time series of events with one or more variables. This kind of data is called an event-based time series (EBTS). Analyzing such temporal data is one of the most challenging topics in data mining research.

About event-based time series

An EBTS consists of one or more sequences of events that occurred at different time points. Each event is optionally linked to a numeric value, and the time points are unevenly spaced: the time between consecutive events can be of arbitrary length.
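To make the definition concrete, the following Scala sketch models an EBTS with plain Scala collections, independent of Spark and of the IBM library. The names `Event`, `toSeries`, and `gapsInDays` are hypothetical illustrations, not part of any IBM SPSS API. The sketch groups raw events into one time-ordered series per entity and shows that the gaps between consecutive events can differ.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object EbtsModel {
  // One event: which entity, when it happened, what kind of event, and an optional numeric value.
  final case class Event(entityId: String, time: LocalDate, eventType: String, value: Option[Double])

  // Groups raw events into one series per entity, ordered by time.
  def toSeries(events: Seq[Event]): Map[String, Seq[Event]] =
    events.groupBy(_.entityId).map { case (id, evs) => id -> evs.sortBy(_.time.toEpochDay) }

  // Gaps in days between consecutive events of one time-ordered series; they need not be equal.
  def gapsInDays(series: Seq[Event]): Seq[Long] =
    series.zip(series.drop(1)).map { case (a, b) => ChronoUnit.DAYS.between(a.time, b.time) }
}
```

For a customer with events on January 1, January 11, and February 20, `gapsInDays` returns 10 and 40. This uneven spacing is what distinguishes an EBTS from a regularly sampled time series.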

EBTS data can be collected in many industrial and scientific domains. Online travel agency data is one example: each transaction record includes a customer ID, the type of product booked online, a time stamp that shows when the booking happened, and a numeric value that shows how much the booking cost. Data with the same characteristics appears in other settings as well. A bank customer, for example, conducts different activities at different time points: withdrawals, deposits, transfers, and so on. At a gas station, customer activities might include topping off, refilling, shopping, and so on. All of these customer activities, or events, can be represented as EBTS data. An enterprise can therefore benefit from EBTS pattern analysis, because it yields insight into customer behavior that supports behavior prediction, demand shaping, and personalized promotion.

Compared with traditional time series or sequence data, EBTS data poses the following challenges for pattern analysis:

Unlike sequence analysis, which mines only the order of events, EBTS pattern analysis must also mine the time intervals between consecutive events.
EBTS pattern analysis is concerned with the values linked to events and the relationships between adjacent events, rather than with the events themselves.
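The difference from plain sequence mining can be sketched as follows. A sequence miner would reduce a customer's history to an ordered list such as Hotel, Ticket. The hypothetical `transitions` helper below instead keeps, for each pair of adjacent events, both linked values and the time interval between them. That is the kind of information EBTS pattern analysis works with. The sketch illustrates the data the algorithm considers; it is not IBM's implementation.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object AdjacentPairs {
  final case class Event(time: LocalDate, eventType: String, value: Double)

  // One adjacent relationship: event A with its value, the gap in days, then event B with its value.
  final case class Transition(fromType: String, fromValue: Double, gapDays: Long, toType: String, toValue: Double)

  // Takes one entity's events, sorts them by time, and emits one Transition per adjacent pair.
  def transitions(series: Seq[Event]): Seq[Transition] = {
    val sorted = series.sortBy(_.time.toEpochDay)
    sorted.zip(sorted.drop(1)).map { case (a, b) =>
      Transition(a.eventType, a.value, ChronoUnit.DAYS.between(a.time, b.time), b.eventType, b.value)
    }
  }
}
```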

To address these challenges, IBM provides an EBTS analysis algorithm that discovers temporal patterns in EBTS data.

IBM SPSS EBTS Analysis

The IBM SPSS EBTS Analysis algorithm determines temporal patterns in EBTS data by taking two elements of each event into account: the time interval and the event value. Together, these two elements reveal the sequential relationship between adjacent events. Temporal patterns are determined across all entities and can be used as features for customer segmentation or behavior prediction.

EBTS analysis can handle data with the following properties:

The data consists of one or more series of events that occurred at different time points.
Each series is an unequally spaced time series.
Each event can be linked to a numeric value.

EBTS analysis provides the following outputs:

Discretization rules for the linked values and the time intervals.
Temporal patterns that occur frequently across all entities.
Temporal features that characterize each entity.

Use Case

Jane runs an online travel agency and wants to use her historical data to understand the buying behavior of her online customers. She also wants to learn how to maximize the value of the offers she provides to customers and to identify the best time to make each offer.

The data from the online travel agency includes the following information:

There are millions of customers to be analyzed.
Each customer has a sequence of transactional events.
Each event includes a customer ID, an event time, an event type, and a numeric event value, which is the amount of money spent on the event.

These characteristics match the properties of EBTS data.

Jane heard from a friend that the EBTS analysis in IBM Watson Studio could help analyze her travel agency data. She opened IBM Watson Studio and performed the analysis in the following steps:

Step 1. Load Data

Jane specified the data type of each field and loaded the file ebts_data.csv.

import org.apache.spark.sql.types._

// Schema of ebts_data.csv: one row per event
val schema = StructType(
  StructField("CustomerID", StringType, true) ::
  StructField("EventTime", DateType, true) ::
  StructField("EventType", StringType, true) ::
  StructField("EventValue", DoubleType, true) :: Nil)

// sparkSession is the SparkSession available in the notebook
val df = sparkSession.read.
  format("csv").
  option("header", "true").
  schema(schema).
  load("ebts_data.csv")

Step 2. Set Parameters and Run EBTS Analysis

Jane set the entity ID, event time, event type, and event value fields. She set the maximum pattern level to 3 and the vertical support threshold to 0.6.

import com.ibm.spss.ml.frequentpatternmining.EventBasedTimeSeriesPatternFinding

// Map the DataFrame columns to the EBTS roles and fit the model
val ebts = EventBasedTimeSeriesPatternFinding().
  setEntityIDField("CustomerID").
  setEventTimeField("EventTime").
  setEventTypeField("EventType").
  setEventValueField("EventValue").
  setPatternLevelMethod("CUSTOMIZE").
  setMaxPatternLevel(3).              // maximal pattern level
  setVerticalSupportThreshold(0.6).   // vertical support threshold
  fit(df)

// Discretization rules and patterns, returned as XML
val patternXML = ebts.patternXML()
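The effect of the vertical support threshold can be illustrated with a small sketch. It assumes that vertical support is the fraction of entities (here, customers) whose series contains a pattern at least once, which is the usual meaning of the term in sequential pattern mining. The IBM implementation may differ in detail, and the object and method names below are hypothetical.

```scala
object VerticalSupport {
  // Hypothetical input: each entity reduced to the set of pattern IDs it exhibits at least once.
  def verticalSupport(patternsByEntity: Map[String, Set[Int]], patternId: Int): Double =
    if (patternsByEntity.isEmpty) 0.0
    else patternsByEntity.values.count(_.contains(patternId)).toDouble / patternsByEntity.size

  // Keeps only the patterns whose vertical support reaches the threshold (0.6 in Step 2).
  def frequentPatterns(patternsByEntity: Map[String, Set[Int]], threshold: Double): Set[Int] =
    patternsByEntity.values.flatten.toSet.filter(p => verticalSupport(patternsByEntity, p) >= threshold)
}
```

With five customers, a pattern seen in three of them has vertical support 3/5 = 0.6 and just passes a 0.6 threshold. A pattern seen in only two customers (0.4) is dropped.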

Step 3. Check Result

Jane got the result as a pattern XML document. In it, she found the discretization rules, the frequent patterns, and the patterns of each customer.

Discretization rules of event value, by event type:

In the output, the money spent on hotels was split into the following categories:

Category 1: 0 to 158
Category 2: 158 to 170
Category 3: 170 to 174
Category 4: 174 to 183
Category 5: greater than 183

Discretization rule of time interval:

The time interval was split into the following categories:

Category 1: 0 to 18 days
Category 2: 18 to 28 days
Category 3: greater than 28 days
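The discretization rules can be applied to new values with a simple lookup over the reported cut points. This sketch hard-codes the hotel-value and time-interval boundaries from the output above. Because the output does not say which category a boundary value falls into, the sketch assumes that each boundary belongs to the higher category (for example, exactly 158 falls in category 2). `Discretize` and `category` are hypothetical names.

```scala
object Discretize {
  // Cut points read from the example output; assigning a boundary value to the higher category is an assumption.
  val hotelValueCuts: Vector[Double] = Vector(158.0, 170.0, 174.0, 183.0)
  val intervalDayCuts: Vector[Double] = Vector(18.0, 28.0)

  // Returns a 1-based category: 1 if x < cuts(0), ..., cuts.size + 1 if x >= the last cut.
  def category(x: Double, cuts: Vector[Double]): Int =
    cuts.indexWhere(x < _) match {
      case -1 => cuts.size + 1
      case i  => i + 1
    }
}
```

For example, a hotel booking of 172.5 maps to category 3 (170 to 174), and a gap of 30 days maps to interval category 3 (greater than 28 days).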

Patterns with vertical support and confidence:

Pattern 3, for example, describes a customer who spends 0 to 158 on a hotel booking and then, within 0 to 18 days, spends 0 to 110 on a ticket.
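One way to read pattern 3 is as a pair of adjacent events, with a range on each event value and on the gap between them. The sketch below encodes that reading with a hypothetical `Pattern` type. It treats every range as half-open, measures time in day numbers, and counts how often the pattern occurs in one customer's time-sorted series. It illustrates the pattern semantics only; this post does not describe the IBM algorithm's actual matching rules.

```scala
object PatternMatch {
  // A half-open range [lo, hi); the half-open choice is an assumption.
  final case class Bin(lo: Double, hi: Double) {
    def contains(x: Double): Boolean = x >= lo && x < hi
  }

  // Events carry a day number so that gaps are simple differences.
  final case class Event(day: Long, eventType: String, value: Double)

  // Hypothetical two-event pattern: A with a value in aValue, then B after a gap in `gap`, with a value in bValue.
  final case class Pattern(aType: String, aValue: Bin, gap: Bin, bType: String, bValue: Bin)

  // Pattern 3 from the example output: hotel for 0 to 158, then within 0 to 18 days a ticket for 0 to 110.
  val pattern3: Pattern = Pattern("Hotel", Bin(0, 158), Bin(0, 18), "Ticket", Bin(0, 110))

  // Counts adjacent event pairs in a time-sorted series that match the pattern.
  def occurrences(series: Seq[Event], p: Pattern): Int =
    series.zip(series.drop(1)).count { case (a, b) =>
      a.eventType == p.aType && p.aValue.contains(a.value) &&
        p.gap.contains((b.day - a.day).toDouble) &&
        b.eventType == p.bType && p.bValue.contains(b.value)
    }
}
```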

Patterns of each customer:

For the customer with ID 0, the first two entries are:

(0, 18): pattern 0 with frequency 18, meaning that this customer followed booking pattern 0 a total of 18 times.
(1, 2): pattern 1 with frequency 2, meaning that this customer followed booking pattern 1 twice.
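The per-customer (pattern, frequency) pairs are occurrence counts. The hypothetical helper below takes a list with one pattern ID per matched occurrence for a single customer, which is an assumed input format. It produces pairs in the same (pattern, frequency) form, sorted by pattern ID.

```scala
object PatternFrequency {
  // Hypothetical input: one pattern ID per matched occurrence for a single customer.
  // Output: (patternID, frequency) pairs sorted by pattern ID, mirroring the "(0, 18)" notation.
  def frequencies(matchedPatternIds: Seq[Int]): Seq[(Int, Int)] =
    matchedPatternIds.groupBy(identity).map { case (id, hits) => (id, hits.size) }.toSeq.sortBy(_._1)
}
```

For customer 0, 18 occurrences of pattern 0 and two occurrences of pattern 1 give the pairs (0, 18) and (1, 2). Vectors of such counts are one way to use the patterns as per-customer features for segmentation.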

Step 4. Prediction

Jane wanted to predict the next pattern each customer will follow, so she ran the following code to get the predictions:

val prediction = ebts.transform(df)
prediction.show()

The result is:

+----------+---------+----------+
|CustomerID|patternID|confidence|
+----------+---------+----------+
|         2|        3|      0.75|
|         4|       13|       1.0|
......

In the output, the customer with ID 2 is predicted to follow pattern 3. This means that within the next 0 to 18 days, this customer will buy a ticket costing 0 to 110. Based on the buying behavior pattern analysis, the confidence of this prediction is 0.75.
The customer with ID 4 is predicted to follow pattern 13, and the confidence of this prediction is 1.0.
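To turn the predictions into an offer schedule, Jane could combine each prediction with the time-interval category of the predicted pattern. The sketch below makes two assumptions: that the interval is counted from the customer's most recent event, and that each pattern's interval bin has been read from the pattern XML. The `Prediction` type, the `gapDaysByPattern` map, and `offerWindows` are hypothetical. Only pattern 3's bin (0 to 18 days) comes from this example.

```scala
import java.time.LocalDate

object OfferPlanner {
  // One row of the prediction output.
  final case class Prediction(customerId: String, patternId: Int, confidence: Double)

  // Hypothetical: the interval bin in days for each predicted pattern (pattern 3: 0 to 18 days).
  val gapDaysByPattern: Map[Int, (Long, Long)] = Map(3 -> (0L, 18L))

  // Offer window (customer, from, to), counted from the customer's most recent event. Only predictions
  // at or above minConfidence whose pattern has a known interval bin are included.
  def offerWindows(preds: Seq[Prediction], lastEvent: Map[String, LocalDate],
                   minConfidence: Double): Seq[(String, LocalDate, LocalDate)] =
    for {
      p <- preds if p.confidence >= minConfidence
      (lo, hi) <- gapDaysByPattern.get(p.patternId).toSeq
      last <- lastEvent.get(p.customerId).toSeq
    } yield (p.customerId, last.plusDays(lo), last.plusDays(hi))
}
```

With a minimum confidence of 0.7, customer 2 gets an offer window of 0 to 18 days after their most recent event. Customer 4 is skipped because the interval bin of pattern 13 is not given here.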

With all this information, Jane has a better understanding of her customers' behavior. She can provide more suitable offers for her customers based on these patterns.

Locating the IBM SPSS EBTS Algorithm


Permalink: https://community.ibm.com/community/user/blogs/a-peng-zhang/2018/11/08/ebts-algorithm