Skip to content

Real-time Financial Technical Indicator Generation in Apache Flink

Notifications You must be signed in to change notification settings

terrierteam/FinFlink

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FinFlink

Real-time Financial Technical Indicator Generation in Apache Flink

What is FinFlink?

Most financial analytics and financial machine learned models do not process trading data raw, but instead convert that data into a series of more meaningful ‘technical indicators’ that capture price movements and trends. FinFlink is a Java libruary which was developed during the Infinitech project and implements real-time technical indicators within the Apache Flink distributed processing platform.

Terminology

Technical indicators are computed over distinct periods of time given some historical context. To understand how these metrics are calculated in FinFlink we need to first define some terminology to represent the different time periods involved. To describe how technical indicators are calculated, we need to define three core concepts, the Window, the Compute Period and the Compute Time:

  • Window: The Window represents the maximum amount of historical information that will be used when computing the technical indicator, defined as a period of time. For example, if the window is set to 5 minutes, the last 5 minutes of price history will be considered during indicator calculation. FinFlink maintains a data structure containing all trades per financial asset within this sliding window, which is available at technical indicator calculation time.
  • Block Size: If you examine the financial indicators above, many of these indicators compare prices for a time period t with earlier time periods (typically t-1, but sometimes earlier). The practical application of this is that we need to divide our Window into a number of time periods of equal length, with the current period ending at the current time. We refer to the size of these periods as the Block Size, which is defined as a period of time and must be equal to or less than the Window. For instance, in Figure 2, the Block Size is 3 minutes. The Window is then divided into n distinct periods of the provided Block Size, where the most recent is denoted t, preceded by t-1, t-2, etc, until the remaining time in the Window is smaller than the Block Size. In the case of Figure 2, only a single period is produced, as only a single 3-minute block can fit inside the 5-minute window.
  • Compute Time/Slide: As time progresses, the Window will accumulate new trades as they appear in the stream and discard old trades when they exit the window period. However, we also need to define when to trigger the computation of the technical indicators, known as the Compute Time(s). Typically, we don’t simply want to calculate the technical indicators once, but instead repeatedly and frequently re-calculate these indicators over time, such that they always reflect the current data about each financial asset. Hence, we define the ‘Slide’, which represents the time difference between re-calculation of the technical indicators. In the case of Figure 2, the slide is set to 2 minutes, meaning that we will refresh the technical indicators every two minutes.

It is worth noting that it is common for the Block Size and Slide to be set to the same value, however this does not need to be the case.

Architecture

FinFlink is designed to enable distributed parallelized computation of different the different financial indicators. Within FinFlink, parallel computation is enabled at two levels, asset level and asset stream level. By default, FinFlink will consider each financial asset its own stream and can transparently distribute the ingestion of trades for each asset to different task managers within Flink. If collecting multiple sources of trade data for a single asset (e.g. when that asset is being traded on multiple markets), then the trades from each source can also be accumulated in parallel (although this comes at an additional merging cost).

Trade data can be streamed in real-time into FinFlink, and the trades for each asset will be accumulated into windows within Flink TaskManagers. The state for each window is stored within a Trade Accumulator. TaskManager states are periodically check-pointed, and any lost state (due to host failure for example) can be regenerated by replaying the data for that TaskManager from the last checkpoint. Internally, we can break down the technical indicator generation process within FinFlink into three distinct phases once a Compute Time is reached:

  • Trade Accumulator Merging: In most scenarios, we will have only a single trade stream for a financial asset. However, in the event that we have multiple streams for an asset and that asset are being processed by different TaskManagers, resulting in multiple distributed trade accumulators. In this scenario, the trade data from each TaskManager for an asset will be transferred and merged into a single (complete) window representation (trade accumulator) before progressing to the next stage.
  • Trade Period Generation: As noted earlier, most technical indicators compare price data across different time periods. As such, the second phase is to divide the trade data for the current window into these periods based on the defined Block Size. Each period is represented by a Trade Period object that holds the trades and calculates useful metadata such as high and low prices. Note that the data scientist can define multiple block sizes. By doing so, multiple trade period sets will be created.
  • Technical Indicator Calculation: Finally, each trade period set is passed to a Technical Indicator Pipeline which defines what technical indicators to compute for that block size, which emit Technical Indicators objects containing the resultant features as an output stream.

About

Real-time Financial Technical Indicator Generation in Apache Flink

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages