• Sun, May 2024

POINT PROCESSES AT A GLANCE

Throughout the writing of my dissertation, I devoted considerable time and effort to formulating models based on the theory of point processes. These models aim to predict the number of reposts (retweets) stemming from original posts (tweets) on the microblogging platform Twitter.

As a researcher, I have long wanted to explain what point processes are to general readers with little or no background in mathematics or statistics. This article introduces point processes in an approachable manner and outlines their main forms along the way. Complicated formulations and mathematical jargon are therefore kept to a minimum.

A point process is a stochastic (random) process whose realizations consist of point events in time or space, known respectively as a temporal point process and a spatial point process. When the data involve both space and time, a more complex point process, known as the spatial-temporal point process, is used. Our discussion focuses on the more widely used temporal point process, which can be described by a counting process that records the number of events up to a given time point.
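To make the idea of a counting process concrete, here is a minimal sketch in Python. The function name `count_events` and the event times are hypothetical and chosen only for illustration; the function simply returns how many events have occurred up to a given time.

```python
from bisect import bisect_right


def count_events(event_times, t):
    """Return N(t): the number of events with time <= t.

    event_times must be sorted in ascending order.
    """
    return bisect_right(event_times, t)


# Hypothetical event times, e.g. minutes after an original post.
times = [0.5, 1.2, 3.0, 3.1, 7.4]
print(count_events(times, 3.0))  # 3
```

The count can only stay the same or increase as time moves forward, which is what makes it a counting process.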

The Poisson process is a subclass of point processes that underpins more complex formulations such as the Hawkes process. A Poisson process can be homogeneous or inhomogeneous: the former has a constant rate of events, whilst the latter has a time-varying rate. The rate here is the expected number of events per unit of time.
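For readers who prefer to see the difference in action, the sketch below simulates both kinds of Poisson process using only the Python standard library. The function names, the rates and the decaying rate function are assumptions made for illustration. The inhomogeneous case uses thinning, a standard technique that proposes events at a constant upper-bound rate and keeps each one with a probability proportional to the actual rate at that time.

```python
import math
import random


def simulate_homogeneous(rate, horizon, rng):
    """Event times on [0, horizon] for a constant rate.

    Gaps between consecutive events are exponentially distributed.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def simulate_inhomogeneous(rate_fn, rate_max, horizon, rng):
    """Event times on [0, horizon] for a time-varying rate, by thinning.

    Assumes rate_fn(t) <= rate_max everywhere on [0, horizon].
    """
    proposals = simulate_homogeneous(rate_max, horizon, rng)
    # Keep each proposed event with probability rate_fn(t) / rate_max.
    return [t for t in proposals if rng.random() < rate_fn(t) / rate_max]


rng = random.Random(0)  # fixed seed so the run is repeatable
constant = simulate_homogeneous(2.0, 10.0, rng)
decaying = simulate_inhomogeneous(
    lambda t: 4.0 * math.exp(-0.5 * t), 4.0, 10.0, rng
)
print(len(constant), len(decaying))
```

With the decaying rate, most events cluster near the start of the window, whereas the constant-rate events are spread evenly on average.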

Going a step further brings us to the self-exciting point process, in which the event arrival rate depends on past events. Under such a process, the cumulative effects of all previous events are accounted for, with more recent events exerting greater influence. Through this excitation effect, the arrival of each event makes future arrivals more probable.

Naturally, self-excitation is an especially useful feature for modeling retweet activity on Twitter, as posts that rapidly go viral tend to be retweeted more by interconnected users on the network. Shown below is a graphical representation of how the occurrence of events intensifies a temporal point process, with time on the horizontal axis and the associated intensity on the vertical axis (the labels have been suppressed to keep the picture intuitive).

[Figure: intensity of a self-exciting temporal point process over time, with a spike at each event]

In the context of Twitter, the illustration can be thought of as the intensity of a retweet process, where successive retweets separated by small time gaps substantially increase the intensity of the process, as indicated by each spike. The resulting intensity levels are essential for determining the number of future events.

A Hawkes process is a type of self-exciting point process, with a baseline intensity and a component that accounts for the self-excitation of events. For simplicity, the component governing such self-excitation effects shall be referred to as the excitation function. The Hawkes process serves as the basis for a multitude of models proposed in the literature to capture the dynamics of various real-life phenomena and to make predictions accordingly.
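As a rough illustration of how a baseline and an excitation function combine, the sketch below computes the intensity of a Hawkes process at a given time. It assumes an exponentially decaying excitation function, a common textbook choice and not necessarily the one used in any particular model; the parameter names `baseline`, `alpha` (jump size per event) and `beta` (decay speed) and the event times are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, baseline, alpha, beta):
    """Intensity at time t: baseline plus decaying contributions of past events.

    Assumed excitation function: alpha * exp(-beta * elapsed_time).
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return baseline + excitation


# Hypothetical events: three in quick succession, then one much later.
events = [1.0, 1.2, 1.3, 6.0]
for t in (0.5, 1.4, 5.0, 6.1):
    value = hawkes_intensity(t, events, baseline=0.2, alpha=0.8, beta=1.5)
    print(t, round(value, 3))
```

Before any event the intensity equals the baseline. It is highest just after the burst of three closely spaced events, and it drifts back toward the baseline as time passes without new arrivals.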

The self-exciting model proposed in my dissertation to capture the dynamics of retweets was motivated by the Hawkes process and incorporates similar components. The main components include the numbers of followers of both tweeters and retweeters (incorporated into the baseline and excitation functions) as well as the cumulative effects of previous events. Once the model is formulated, its parameters can be estimated from the data and then used to predict future occurrences of events. In layman’s terms, a model parameter is a configuration variable internal to the model, whose value can be estimated from the available data.
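To give a flavor of what estimating parameters from data can look like, the sketch below scores candidate parameter values by their log-likelihood and picks the best pair from a small grid. This is not the dissertation's model: it assumes a plain Hawkes process with an exponentially decaying excitation function and no follower counts, it fixes the decay speed `beta` in advance, and all names and numbers are hypothetical. In practice a numerical optimizer would usually replace the grid search.

```python
import math


def hawkes_log_likelihood(event_times, horizon, baseline, alpha, beta):
    """Log-likelihood of sorted event_times observed on [0, horizon].

    Assumes intensity = baseline + sum of alpha * exp(-beta * elapsed)
    over past events. baseline must be positive.
    """
    log_sum = 0.0
    decayed = 0.0  # sum of exp(-beta * (t - t_j)) over earlier events t_j
    previous = None
    for t in event_times:
        if previous is not None:
            decayed = (decayed + 1.0) * math.exp(-beta * (t - previous))
        log_sum += math.log(baseline + alpha * decayed)
        previous = t
    # Expected number of events over the whole window under the model.
    expected = baseline * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t)) for t in event_times
    )
    return log_sum - expected


def fit_by_grid(event_times, horizon, beta, baselines, alphas):
    """Return the (baseline, alpha) pair with the highest log-likelihood."""
    candidates = [(b, a) for b in baselines for a in alphas]
    return max(
        candidates,
        key=lambda p: hawkes_log_likelihood(event_times, horizon, p[0], p[1], beta),
    )


# Hypothetical event times showing two bursts of activity.
events = [1.0, 1.2, 1.3, 6.0, 6.1, 6.4, 6.5]
baselines = [0.1 * k for k in range(1, 21)]
alphas = [0.1 * k for k in range(0, 15)]
print(fit_by_grid(events, 10.0, beta=1.5, baselines=baselines, alphas=alphas))
```

Setting `alpha` to zero switches off self-excitation, in which case the score reduces to that of a homogeneous Poisson process with rate `baseline`.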

At this point, it is worth noting that the modeling and prediction procedures are carried out at a microscopic level, that is, for individual retweet cascades. Specifically, a retweet cascade contains the time records of the original tweet and its subsequent retweets, up to a certain observation time point. Features such as the number of followers may also be captured for modeling and prediction purposes.
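One possible way to hold a single retweet cascade in code is sketched below. The class and field names are hypothetical and do not come from any real dataset or library; the sketch only mirrors the description above, namely the time of the original tweet, the retweet times with follower counts, and the end of the observation window.

```python
from dataclasses import dataclass, field


@dataclass
class RetweetCascade:
    original_time: float      # time of the original tweet
    original_followers: int   # follower count of the original tweeter
    observation_end: float    # end of the observation window
    retweets: list = field(default_factory=list)  # (time, followers) pairs

    def add_retweet(self, time, followers):
        """Record a retweet; anything outside the window is ignored."""
        if self.original_time <= time <= self.observation_end:
            self.retweets.append((time, followers))
            self.retweets.sort()  # keep retweets in time order

    def retweet_times(self):
        return [time for time, _ in self.retweets]

    def observed_count(self):
        return len(self.retweets)


cascade = RetweetCascade(original_time=0.0, original_followers=1200, observation_end=60.0)
cascade.add_retweet(5.0, 300)
cascade.add_retweet(2.0, 50)
cascade.add_retweet(75.0, 10)  # after the observation window, so not stored
print(cascade.observed_count(), cascade.retweet_times())  # 2 [2.0, 5.0]
```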

The applications of point processes, in particular self-exciting point processes, are abundant. As an indicative example of their potential applicability in Malaysia, consider popular online shopping platforms such as Shopee and Lazada. Marketers on these platforms can benefit from using (self-exciting) point processes to model the behavior of their customers and then predict customers' future numbers of purchases based on their past transactions. Features such as the amount spent per transaction may also be used to predict future transactions more accurately. Such data, however, are not readily available to the public and can be arduous to collect, since the exact time points of purchases (and features such as the amounts spent, if available) are required before the model can be formulated and predictions can be made.