H.P.C. Chapter 2 - Design & Architecture


The New Paradigm

As mentioned in the previous article, our Processing Engine’s design is heavily based on SEDA and the data flow programming paradigm. Our Processing Engine paradigm mainly consists of two parts:

  • Data Model Extension

  • Flow Model Extension

Data Model Extension

Data Model Extension focuses on data definition within the business application. Data defined in the Data Model Extension essentially serves as the heart of the business application, carrying all necessary information ranging from business domain data to all the function requests and responses provided by the business application. In our Processing Engine, we define data in the following categories:

  • Persistent Data

  • Event Data

  • Application State Data

  • Enumeration


Persistent Data is the type of data that the business application accesses with eventually consistent read semantics (it is acceptable to read a snapshot of the Persistent Data that may or may not represent the latest information) and that requires permanent persistence in the data store with querying capability (e.g. a user’s information such as age, address, etc.). The Data Model Extension automatically generates Persistent Data accessor classes for CRUD operations, along with classes that handle data querying.

Event Data is an immutable data type that represents something that has occurred within the business application. It is essential that Event Data be immutable: once something has occurred, no one should be able to change it. Event Data can represent a user’s request within the business application (e.g. someone places an order), a response to a request (e.g. an order has been placed), or an outcome triggered by an incoming event (e.g. the market price changed when a new order was placed). Public Request-based Event Data naturally serves as the API of the business application.
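
As a minimal sketch, immutable Event Data can be modeled in Java as a final class with final fields and no setters. The `PlaceOrderEvent` name and its fields are illustrative assumptions, not part of the actual engine:

```java
// Hypothetical immutable Event Data: all fields are final and there are no
// setters, so once the event has been created its contents can never change.
final class PlaceOrderEvent {
    final String orderId;
    final String symbol;    // e.g. "USD/CAD"
    final long quantity;

    PlaceOrderEvent(String orderId, String symbol, long quantity) {
        this.orderId = orderId;
        this.symbol = symbol;
        this.quantity = quantity;
    }
}
```

Any "change" to such an event can only produce a new event, which is exactly the behavior we want for something that has already occurred.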

Application State Data is a mutable data type that represents the current state of the business application. Application State Data can only be referenced or updated in an atomic fashion to guarantee correctness (e.g. a bank account during withdrawal or deposit). In our Processing Engine, a specific piece of Application State Data can only be referenced or updated in one thread and the same thread only during the application lifecycle.


Enumeration represents a valid range of items with specific business meaning in the business application (e.g. types of User such as Administrative User, Guest User, Normal User).

All data types except Enumeration must provide Context information for the underlying data. Context is defined by the business application designer to link strongly related data together. Data of the same Context will be processed in one thread and the same thread only to guarantee determinism and atomicity. For example, in financial foreign exchange trading, one such Context would be USD/CAD: the Market Data Update Event of USD/CAD (Event Data), the Order Book of USD/CAD (Application State Data), and Place USD/CAD Order (Event Data) all belong to the same Context and will all be processed in one thread and the same thread only.
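
One way to picture Context tagging in Java, assuming a hypothetical `Contextual` interface (the engine's real mechanism is not shown here): every non-Enumeration data type exposes a context key, and data sharing a key are routed to the same thread.

```java
// Hypothetical Context tagging: every non-Enumeration data type exposes a
// context key (e.g. the currency pair "USD/CAD"). The engine processes all
// data sharing a key in one thread and the same thread only.
interface Contextual {
    String contextKey();
}

// Event Data: a market data update for one currency pair.
final class MarketDataUpdateEvent implements Contextual {
    final String pair;
    final double price;

    MarketDataUpdateEvent(String pair, double price) {
        this.pair = pair;
        this.price = price;
    }

    public String contextKey() { return pair; }
}

// Application State Data: the order book for the same currency pair.
final class OrderBook implements Contextual {
    final String pair;

    OrderBook(String pair) { this.pair = pair; }

    public String contextKey() { return pair; }
}
```

Because both types report the same key for USD/CAD, the engine can deterministically route the event and the state it touches to one thread.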

The Data Model Extension also defines a binary encoding of the data called the Data Model Extension Elastic Encoding. Our Processing Engine uses Data Model Extension Elastic Encoding for in-memory data access within the business application, replacing POJOs (for the Java implementation), and also as a wire protocol. Data Model Extension Elastic Encoding removes the need to serialize and deserialize Event Data and also allows our Processing Engine to use more shared memory-based infrastructure to improve efficiency and performance. We call the encoding elastic because it is a fixed-length binary encoding for fast data access that can also be compacted for network transfer.
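
The flyweight idea behind a fixed-length encoding can be sketched over a plain `ByteBuffer`: each field lives at a fixed offset, so reads and writes are direct buffer accesses with no serialization or deserialization. This is only an illustration of the technique, not the actual Elastic Encoding layout:

```java
import java.nio.ByteBuffer;

// Illustrative flyweight over a fixed-length binary layout:
// offset 0: long quantity, offset 8: double price.
// The flyweight reads fields directly from the buffer - no POJO is ever built.
final class OrderFlyweight {
    private static final int QUANTITY_OFFSET = 0;
    private static final int PRICE_OFFSET = 8;
    static final int ENCODED_LENGTH = 16;

    private ByteBuffer buffer;
    private int offset;

    // Re-point this flyweight at an encoded record; no allocation per record.
    OrderFlyweight wrap(ByteBuffer buffer, int offset) {
        this.buffer = buffer;
        this.offset = offset;
        return this;
    }

    long quantity()           { return buffer.getLong(offset + QUANTITY_OFFSET); }
    void quantity(long value) { buffer.putLong(offset + QUANTITY_OFFSET, value); }
    double price()            { return buffer.getDouble(offset + PRICE_OFFSET); }
    void price(double value)  { buffer.putDouble(offset + PRICE_OFFSET, value); }
}
```

Because the layout is fixed-length, the same buffer can sit in shared memory and be wrapped by a flyweight in any process, which is what makes the zero-copy access described later possible.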

Flow Model Extension

Flow Model Extension focuses on the logic flow definition within the business application. It provides a very natural view of what the business application’s logic looks like and how the application reacts to Incoming Events, which makes it easy to present a visual representation of the business logic to both technical and non-technical designers. Flow Model Extension consists of the following components:

  • Task

  • Flow

Task contains the actual business logic for handling/reacting to the Incoming Event Data. Each Task definition consists of the following parameters:

  • Trigger Event Data (required) is the Event Data that triggers the execution of the Task

  • Application State Data (optional) is the current state of the business application that the Task needs to modify or access

  • Query (optional) allows Task to perform Query on Persistent Data in a nonblocking fashion

  • Output Event Data is the result of the Task’s processing, either a response back to an Incoming Event request or an Output Event that triggers further Processing


Flow contains Task routing information showing how each Event Data travels through different Tasks for Processing.
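
A rough Java sketch of the Task and Flow concepts, using hypothetical `Event`, `State`, `Task`, and `Flow` types (the engine's real interfaces are not shown): a Task maps a trigger Event plus optional Application State to output Events, and a Flow routes each Event type to its Task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker types for the sketch.
interface Event {}
interface State {}

// A Task maps one trigger Event (plus optional Application State) to zero or
// more output Events. It must contain only CPU-bound logic, never I/O.
interface Task {
    List<Event> onEvent(Event trigger, State state);
}

// A Flow routes each incoming Event type to the Task that handles it.
final class Flow {
    private final Map<Class<? extends Event>, Task> routes = new HashMap<>();

    Flow route(Class<? extends Event> eventType, Task task) {
        routes.put(eventType, task);
        return this;
    }

    List<Event> dispatch(Event event, State state) {
        Task task = routes.get(event.getClass());
        return task == null ? List.of() : task.onEvent(event, state);
    }
}

// Tiny concrete events used in the usage example below.
final class Ping implements Event {}
final class Pong implements Event {}
```

A usage example: `new Flow().route(Ping.class, (trigger, state) -> List.of(new Pong()))` wires a Task that answers every `Ping` with a `Pong`; the output events then either leave the engine as responses or are looped back for further Processing.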

Processing Engine Overview

Our Processing Engine is based on a very simple concept - Input, Processing, and Output:

  • Input represents Incoming Data Model Extension Event Data from various input sources including network media and Processing loopback Data Model Extension Output Event Data

  • Processing represents corresponding CPU bound logic handling the Incoming Data Model Extension Event Data in each Flow Model Extension Task

  • Output represents the resulting Output Data Model Extension Event Data generated from Processing the Incoming Data Model Extension Event Data in a Flow Model Extension Task; it will either require further Processing or be broadcast back to the original sender of the Incoming Data Model Extension Event Data as a response

Our Processing Engine has a very clear and precise separation of Input, Processing, and Output for efficiency purposes to avoid any I/O operation within the CPU-bound Processing.

Input

Our Processing Engine’s Input is responsible for the following:

  • Receive all Incoming Input Data Model Extension Event Data from different Input Sources, including network media and loopback Output Data Model Extension Events. The current Processing Engine implementation uses Aeron to handle the Incoming Data Model Extension Event Data via network media and Chronicle Queue for the loopback Output Data Model Extension Event Data

  • Arrange all Input Data Model Extension Event Data into sequential input queue based on arrival time in FIFO style to provide processing fairness based on time of arrival of each event

  • Filter out all Input Data Model Extension Event Data that our Processing Engine instance is not configured to handle, in order to control load distribution across the Processing Engine Cluster

  • Based on the Context of the Incoming Data Model Extension Event Data, send the Incoming Input Data Model Extension Event Data to the queue that handles that Context. The current Processing Engine implementation uses Chronicle Queue for the Input module to send the Incoming Input Data Model Extension Event Data to the Flow Model Extension Flow Processor, giving us a highly efficient queue while keeping a full journal of all Incoming Input Data Model Extension Event Data for restart/debug replay purposes

The last step is important. It is what we call Context-Based Multi Threading. Our main focus for the Processing Engine is to provide Deterministic Processing. We achieve that by ensuring all Incoming Input Data Model Extension Event Data related to the same Context are processed by one thread and the same thread only throughout the entire lifecycle of the application. Since it is single-threaded execution, development is much simpler, without the complicated synchronization required for concurrent access. Removing concurrency also guarantees determinism. And because each Context is confined to its own thread, this approach still provides concurrent Processing for Incoming Data Model Extension Event Data that have different Contexts, since they are not related to each other.
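
Context-Based Multi Threading can be approximated with plain JDK executors: hash the context key to one of N single-threaded lanes, so every event of a given Context always lands on the same thread. This is a deliberate simplification of the engine's queue-per-context design, and the class and method names are assumptions:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: pin each Context to exactly one single-threaded executor by hashing
// its key. Events sharing a Context are therefore processed sequentially by
// one thread and the same thread only; different Contexts may run in parallel.
final class ContextRouter {
    private final ExecutorService[] lanes;

    ContextRouter(int threads) {
        lanes = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Deterministic: the same context key always maps to the same lane.
    int laneFor(String contextKey) {
        return Math.floorMod(contextKey.hashCode(), lanes.length);
    }

    void submit(String contextKey, Runnable work) {
        lanes[laneFor(contextKey)].submit(work);
    }

    void shutdown() {
        for (ExecutorService lane : lanes) lane.shutdown();
    }
}
```

Within one lane, submission order is processing order, which is what removes the need for synchronization inside Tasks.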


Processing

Flow Processor provides Task execution on the Incoming Data Model Extension Event Data. Each Flow Processor is bound to a specific CPU processing unit (core) through thread affinity. Each Flow Processor reads from its corresponding Data Model Extension Event Data Queue to handle all the Incoming Data Model Extension Event Data from the Processing Engine Input in a busy-spin fashion. When a Task generates Output Data Model Extension Event Data, the Flow Processor writes to a shared memory-based output queue to either send the Output Event Data back to the sender of the request or loop it back to the Flow Processor for further Processing. Each Task can only contain CPU-bound logic and cannot contain any I/O-bound logic, to ensure the efficiency of Processing.
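
The shape of the busy-spin consumption loop can be sketched with a JDK queue (the actual implementation reads from Chronicle Queue; this only illustrates the polling style, and the class name is an assumption):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a Flow Processor's busy-spin loop: poll the input queue in a tight
// loop instead of blocking, trading CPU for minimal wake-up latency.
final class BusySpinProcessor implements Runnable {
    final ConcurrentLinkedQueue<String> input = new ConcurrentLinkedQueue<>();
    final ConcurrentLinkedQueue<String> output = new ConcurrentLinkedQueue<>();
    final AtomicBoolean running = new AtomicBoolean(true);

    // One iteration of the spin loop, factored out so it can be tested directly.
    // Returns false when the input queue was empty.
    boolean drainOne() {
        String event = input.poll();
        if (event == null) return false;
        output.add("processed:" + event);   // CPU-bound Task logic would go here
        return true;
    }

    public void run() {
        while (running.get()) {
            if (!drainOne()) {
                Thread.onSpinWait();   // hint to the CPU that we are spinning
            }
        }
    }
}
```

The loop never parks the thread, which is why each Flow Processor is pinned to its own core: the spin would otherwise starve neighbors.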


Output

The Processing Engine’s Output is responsible for the following:

  • Collect all the Output Data Model Extension Event Data from all the Flow Processors’ output event queues

  • Sort all the Output Data Model Extension Event Data into a sequential output queue in FIFO style to ensure processing fairness. The current Processing Engine implementation uses Chronicle Queue to provide the sequential output queue

  • Send the Output Data Model Extension Event Data to the network media so that other Processing Engine Platform instances in the cluster can process it further, or so that the Processing Engine Gateway can send it back to the end user. The current Processing Engine implementation uses Aeron to provide the network communication


Data Cache

Other than the main Processing Engine I/O and Processing modules, the Processing Engine Platform also provides a zero-copy, shared memory-based Processing Engine Data Cache for the Processing Engine Flow Processor to access Data Model Extension Persistent Data in a highly efficient fashion. The Processing Engine Data Cache is a single-writer, concurrent-reader data cache. It stores Data Model Extension Persistent Data in the native Data Model Extension Elastic Encoding format such that the Processing Engine Flow Processor can access the Data Model Extension Persistent Data via shared memory through Data Model Extension Flyweight Objects.
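
A much-simplified sketch of the single-writer, concurrent-reader discipline using a `ConcurrentHashMap` (the real Data Cache is shared memory-based and zero-copy; this only illustrates the access pattern, and the class name is an assumption):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a single-writer / concurrent-reader cache: by convention only one
// thread ever calls put(), so writes never contend with each other, while any
// number of reader threads call get() concurrently and see safely published
// entries. In the real engine the values would be Elastic-Encoded records
// accessed through flyweights rather than Java objects.
final class SingleWriterCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

    void put(K key, V value) { entries.put(key, value); }  // writer thread only
    V get(K key) { return entries.get(key); }              // any reader thread
}
```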


Data Store Service

Along with the Processing Engine Data Cache, the Processing Engine Platform also provides the Processing Engine Data Store Service to interface with the underlying Data Model Extension Persistent Data Store (such as a relational database engine, a NoSQL database engine, or flat-file storage) to provide Data Model Extension Persistent Data querying capability. Particularly for JDBC, the Processing Engine Data Store Service can convert synchronous I/O-based query operations into asynchronous operations for the Processing Engine Flow Processor to consume.
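
Converting a blocking query into an asynchronous one can be sketched with `CompletableFuture`: the blocking call runs on a dedicated I/O pool, so the CPU-bound Flow Processor thread never blocks. The `AsyncDataStore` name and the `Supplier`-based stand-in for a real JDBC call are assumptions, not the engine's actual API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch: offload a blocking query (e.g. a JDBC call) to an I/O thread pool
// and hand back a CompletableFuture, so the single-threaded Flow Processor
// consumes the result asynchronously instead of blocking on I/O.
final class AsyncDataStore {
    private final ExecutorService ioPool = Executors.newFixedThreadPool(4);

    <T> CompletableFuture<T> query(Supplier<T> blockingQuery) {
        return CompletableFuture.supplyAsync(blockingQuery, ioPool);
    }

    void shutdown() { ioPool.shutdown(); }
}
```

In practice the Flow Processor would attach a callback (`thenAccept`) whose result re-enters the engine as loopback Event Data, rather than calling `join()` as the test below does for simplicity.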


In the upcoming articles in this series, we will go into detail on how we implement the Processing Engine, including the design of the Data Model Extension encoding and the Flow engine, and how these design choices affect the performance of the Processing Engine.

