Demystifying System Design of Electronic Stock Exchange Applications

Ritresh Girdhar
11 min read · Apr 1, 2023
Photo by Damiano Fiore on Unsplash

I am sharing this as part of my learning and the experience I gained during SDE-level interviews. Recently, I was asked which trading app I use most and, if I had to build it from scratch, what steps I would take.

Both their expectations and my response centered on data streaming and processing. Electronic stock exchange systems are real-time systems, whereas data stream processing is a good fit for near-real-time systems. My understanding of NSE and Nasdaq is that they do use stream processing, but the actual matching engine is real-time and far more complex: it supports partial execution, many instrument types, and timing rules that play a key role in trading. Data stream processing, by contrast, is not real-time but “near real-time”.

Understand “Real-time” vs “Near Real Time”

  • Streaming data processing means that data is analyzed, and action is taken on it, within a short period, i.e. near real-time on a best-effort basis.
  • Real-time data processing guarantees that data will be acted on within a bounded period, typically milliseconds. Examples are an application that purchases a stock within 20 ms of receiving a desired price, or collision detection.

High-Level Architecture

Figure 1

I intentionally excluded monitoring, tracing, alerts, CDN, network proxy, and a few other data governance/DLP/compliance components.

Preferred architecture — Lambda architecture

What is Lambda architecture?

Lambda architecture is a data processing architecture in which one takes advantage of the batch as well as the stream processing method.

Another relevant architecture is Kappa, but I felt Lambda makes more sense in this scenario. Let's talk about each component mentioned in Figure 1 above.

API Gateway

An API gateway can play a crucial role in an electronic stock exchange application by providing a single entry point for all incoming requests from various clients, such as trading bots, mobile applications, and web applications. The API gateway can perform a variety of functions to enhance the security, scalability, and reliability of the application, such as:

  • Authenticating and authorizing incoming requests
  • Verifying API keys and credentials
  • Managing traffic flow and load balancing
  • Caching, logging, and monitoring

Why Kong?

Kong is one of the leading API gateways and supports all the roles mentioned above, which is why I chose it here.

Authentication Server

One of the primary roles of an Authentication Server in an electronic stock exchange application is to verify the identity of users. The server can authenticate users/brokers using a variety of mechanisms, such as username and password, two-factor authentication, biometric authentication, or digital certificates. The Authentication Server can also perform authorization, ensuring that users have the necessary permissions to access specific resources or perform certain actions.

Why Okta?

Beyond being an identity and access management platform, Okta provides all of the capabilities mentioned above.

Message Broker (Incoming orders Or Trade Orders)

A Message Broker can play a crucial role in a stock exchange application by providing a scalable and reliable messaging infrastructure that enables efficient communication between different components of the system.

Why Kafka?

Kafka is a distributed messaging system that provides several benefits for businesses and organizations that need to process and analyze large amounts of data in real time. Some of the reasons why Kafka is used include:

  1. Scalability
  2. Durability
  3. Real-time processing
  4. Flexibility
  5. Integration
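Assuming the order schema used later for the incoming_orders stream, here is a minimal, hedged sketch of how a broker-facing component could publish an order to Kafka with the kafka-python client (the client library choice is my assumption, not something this article prescribes):

import json
from kafka import KafkaProducer

# Serialize orders as JSON, matching the VALUE_FORMAT='JSON' used by ksqlDB later.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Field names follow the incoming_orders stream schema defined further below.
order = {
    "order_id": "ord-1001",
    "stock_symbol": "INFY",
    "quantity": 10,
    "side": "BUY",
    "order_type": "limit",
    "price": 1450.25,
    "time_in_force": "2023-04-01 10:15:00",
}

producer.send("incoming_orders", value=order)
producer.flush()  # block until the broker acknowledges the message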

Instrument Bid Matching Engine

A matching engine is a critical component in a stock exchange application that facilitates the matching of buy and sell orders for different financial instruments, such as stocks, bonds, and derivatives. The role of a matching engine in a stock exchange application includes the following:

  1. Order matching
  2. Price discovery
  3. Trade execution
  4. Order management
  5. Scalability

Why Spark Streaming?

Spark Streaming is a real-time processing framework that is part of the Apache Spark project.

  1. Real-time processing
  2. Scalability
  3. Fault tolerance

In this demo, for ease of setup, I have used the ksqlDB SQL engine, as it is quick to set up and takes less time to explain the concept. In practice, though, I consider Spark Streaming the better choice.

ksqlDB is a distributed streaming SQL engine built on top of Apache Kafka. It allows businesses to process, analyze, and query real-time streaming data using a familiar SQL-like syntax. Here it acts as a scalable, reliable, and decoupled component that consumes all the incoming orders from the various brokers and persists them in sequence. Some of the benefits of using ksqlDB include:

Refer — https://www.confluent.io

  1. Real-time data processing
  2. SQL-like syntax
  3. Scalability
  4. Stream processing
  5. Easy Integration with Kafka

Order Book

An Order Book system plays a crucial role in a stock exchange application by maintaining a record of all open and executed orders, which supports both order management and transparency.
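In this design the Order Book consumes matched trades from the streaming engine; purely as an illustration of the underlying data structure, here is a hedged, in-memory sketch of a per-instrument book with price-priority heaps for resting orders and a trade log for transparency (all names below are mine, not from the article's code):

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class RestingOrder:
    sort_key: float                      # heap priority (negated price for bids)
    order_id: str = field(compare=False)
    quantity: int = field(compare=False)
    price: float = field(compare=False)

class OrderBook:
    """Per-instrument book: best bid = highest price, best ask = lowest price."""

    def __init__(self):
        self.bids = []    # max-heap simulated by pushing negated prices
        self.asks = []    # min-heap on price
        self.trades = []  # executed trades, kept for order management and audit

    def add_order(self, order_id, side, price, quantity):
        book = self.bids if side == "BUY" else self.asks
        key = -price if side == "BUY" else price
        heapq.heappush(book, RestingOrder(key, order_id, quantity, price))

    def best_bid(self):
        return self.bids[0].price if self.bids else None

    def best_ask(self):
        return self.asks[0].price if self.asks else None

    def record_trade(self, buy_order_id, sell_order_id, price, quantity):
        # Called when a matched_trades event arrives from the streaming engine.
        self.trades.append(
            {"buy": buy_order_id, "sell": sell_order_id, "price": price, "qty": quantity}
        )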

Notification Engine

A Notification Engine can play a critical role in a stock exchange application by providing real-time notifications to market participants about important events, such as the execution of trades, price changes, and other market conditions. The role of a Notification Engine in a stock exchange application includes the following:

  1. Real-time notifications
  2. Customization
  3. Integration
  4. Security
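As a sketch of the integration point, the notification engine can be thought of as a Kafka consumer that fans executed-trade events out to broker endpoints. The webhook URLs and topic wiring below are assumptions for illustration, not the article's actual configuration:

import json
import requests
from kafka import KafkaConsumer

# Hypothetical per-broker callback URLs; a real system would manage and secure these.
BROKER_WEBHOOKS = {
    "broker-a": "https://broker-a.example.com/trade-events",
    "broker-b": "https://broker-b.example.com/trade-events",
}

consumer = KafkaConsumer(
    "MATCHED_TRADES",                      # backing topic of the matched_trades stream
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    trade = message.value
    # Fan each execution event out to every registered broker endpoint.
    for broker, url in BROKER_WEBHOOKS.items():
        requests.post(url, json=trade, timeout=5)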

Historical Data layer

A Historical Data layer in a stock exchange application is responsible for collecting and storing historical market data, such as prices, volumes, and trades, over time. The role of a Historical Data layer in a stock exchange application includes the following:

  • Market analysis
  • Regulatory compliance
  • Backtesting
  • Risk management
  • Business intelligence

Why Hadoop Distributed File System?

HDFS is a distributed file system designed to store and manage large amounts of data across multiple nodes in a cluster. It is often used to store historical data for several reasons:

  • Cost-effectiveness
  • Scalability
  • Fault Tolerance
  • Easy access
  • Data retention
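In production, matched trades would typically land in HDFS through something like a Kafka Connect HDFS sink; as a simplified, hedged sketch of the idea, the consumer below archives trades into date-partitioned files (local paths stand in for HDFS directories here):

import json
import datetime
from pathlib import Path
from kafka import KafkaConsumer

# Stand-in for an HDFS path; the layout mimics Hive-style date partitioning.
ARCHIVE_ROOT = Path("./historical_data")

consumer = KafkaConsumer(
    "MATCHED_TRADES",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    trade = message.value
    day = datetime.date.today().isoformat()
    partition = ARCHIVE_ROOT / f"dt={day}"
    partition.mkdir(parents=True, exist_ok=True)
    # Append each trade as one JSON line; downstream jobs can compact these later.
    with open(partition / "trades.jsonl", "a") as f:
        f.write(json.dumps(trade) + "\n")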

Why Spark Machine learning?

Spark Machine Learning (ML) is a powerful tool for businesses looking to develop and deploy machine learning models at scale. Some of the benefits of using Spark ML include the following:

  • Performance
  • Flexibility
  • Integration
  • Ease of use
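To make this concrete, here is a hedged PySpark sketch that fits a simple regression on historical trades; the HDFS path and the sma/quantity/next_price columns are assumptions about what the Historical Data layer would expose, not fields defined earlier in this article:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("sma-price-model").getOrCreate()

# Hypothetical historical dataset produced by the Historical Data layer.
history = spark.read.parquet("hdfs:///exchange/historical/matched_trades")

# Assemble the assumed feature columns into a single vector column for Spark ML.
assembler = VectorAssembler(inputCols=["sma", "quantity"], outputCol="features")
train = assembler.transform(history).select("features", "next_price")

# Fit a linear regression of the next traded price on the features.
model = LinearRegression(featuresCol="features", labelCol="next_price").fit(train)
print(model.coefficients, model.intercept)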

CQRS

CQRS stands for Command and Query Responsibility Segregation, a pattern that separates read and update operations for a data store. Implementing CQRS in your application can maximize its performance, scalability, and security.

Here, I am presenting the API architecture as CQRS: brokers push the place-order command to one application (the match stream), but read updates from another application (the stock meta/Order Book shown in Figure 1), and are additionally notified of events by the notification engine.

Assumptions:

  1. As I am building a solution for electronic exchanges like Nasdaq or NSE, we would be receiving orders from various brokers such as Zerodha, ICICI Direct, or Motilal Oswal, on behalf of the traders who have chosen those brokers for their own reasons.
  2. Our system should be generic and scalable enough to handle any number of brokers without extra code.
  3. Our stock exchange only fulfills full orders; partial order execution is not supported.
  4. The API model follows a CQRS-style pattern: the Mock Exchange API consumes an order and pushes it to the streaming engine; for queries, the broker invokes a separate system, the Order Book. Alternatively, the Order Book's notification engine invokes the broker's backend system asynchronously, in soft real-time mode.
  5. I have implemented a dummy Python client and a mockExchangeServerAPI client (the broker's Jupyter notebook) that behaves as the client through which a trader places orders and monitors their status. In the real world, this would not be a command-line tool but a feature-rich web UI and mobile app (like Zerodha Kite).

Refer — Broker/Client code.

Mock Exchange Server API

The Mock Stock Exchange API receives the order POST request and pushes the message to the Kafka message broker.

The framework used for the backend API is Flask (install it first). It hosts the mock exchange REST API on localhost, port 5000.

Refer — Mock Exchange Server Code here
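For orientation only, here is a minimal Flask sketch of what the command side could look like: accept an order POST and hand it straight to Kafka. The /orders route and the kafka-python producer are my assumptions; the linked server code above is the authoritative implementation:

import json
from flask import Flask, request, jsonify
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

@app.route("/orders", methods=["POST"])
def place_order():
    # Command side of the CQRS split: accept the order and push it to the stream.
    order = request.get_json()
    producer.send("incoming_orders", value=order)
    producer.flush()
    return jsonify({"status": "accepted", "order_id": order.get("order_id")}), 202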

Start Mock Server

FLASK_APP=spaexchange.py flask run --debugger

Stock data analysis is used by investors and traders to make critical decisions related to stocks. Investors and traders study and evaluate past and current stock data and attempt to gain an edge in the market by making decisions based on the insights obtained through the analyses.

As explained in Figure 1 (architecture of the electronic exchange), the order matchmaker will be a stream-based system.

Layers implemented for this demo/assignment/discussion

  1. Incoming orders — for all active orders received from the various brokers.
  2. Buy Orders — This will contain all the orders with the type BUY.
  3. Sell Orders — Will contain all the orders with type SELL.
  4. Match Making Join — Will join the buy stream and the sell stream on the instrument, where sell.price <= buy.price. Matched records are pushed to the next immediate stream, the Matched Stream.
  5. Matched Stream — Will contain all the orders that got executed, along with the buy and sell order IDs, the quantity, and the price. This stream is consumed by:
  6. the Order Book system, to update the orders and trade information, and
  7. the Simple Moving Average stream, to calculate the average price of instruments.

Install: Tools/Software

  • kafka_2.11-2.4.0
  • confluent-7.3.2
  • Open JDK 1.8
  • Python 3.1.2
  • Anaconda/Jupyter Notebook for development

Start the message broker and the streaming engine using the commands below

Start Zookeeper

bin/zookeeper-server-start.sh config/zookeeper.properties

Start Kafka

bin/kafka-server-start.sh config/server.properties

Start Confluent KsqlDb

bin/ksql-server-start ./etc/ksqldb/ksql-server.properties

Create Topic first

./kafka-console-producer.sh --broker-list localhost:9092 --topic incoming_orders

KSQLDB SQL Engine

bin/ksql http://localhost:8088

What are Streams and Tables in ksqlDB

  • A stream in KSQLDB is an unbounded sequence of immutable events or records, with each event containing a key and a value. Streams are used to represent continuous, real-time data and are best suited for situations where data is constantly changing, such as event processing or real-time analytics.
  • A table in KSQLDB is a structured, mutable collection of data, with each record containing a key and a value; it represents the current state (the latest value per key), i.e. a snapshot of the data at a point in time. Tables are best suited for situations where you need that latest snapshot, such as reference data or materialized views. Tables can be joined with streams, used to aggregate data, filter data, and perform lookups.

Incoming Orders Stream

  1. It will receive the list of orders placed by the trader’s broker system for the BUY or SELL side.
-- Create a stream for incoming orders
CREATE STREAM incoming_orders (
    order_id VARCHAR,
    stock_symbol VARCHAR,
    quantity INT,
    side VARCHAR,
    order_type VARCHAR,
    price DOUBLE,
    time_in_force VARCHAR
) WITH (
    KAFKA_TOPIC='incoming_orders',
    VALUE_FORMAT='JSON',
    PARTITIONS=1
);

Buy Orders Stream

  • It filters the data from the incoming_orders stream and consumes only BUY orders.
  • In the practical world, we could add a separate stream per instrument type, or divide further down to the level of an individual stock. But each extra layer increases the complexity of handling traffic, processing orders promptly, and keeping data consistent.
-- Create a stream for buy orders 
CREATE STREAM buy_orders AS SELECT *
FROM incoming_orders
WHERE side = 'BUY';

Output

Sell Orders Stream

It filters the data from the incoming_orders stream and consumes only SELL orders.

-- Create a stream for sell orders
CREATE STREAM sell_orders AS SELECT *
FROM incoming_orders WHERE side = 'SELL';

Output

Matched trades Stream

Here I am joining the buy_orders and sell_orders streams on the stock symbol, the sell price, and the quantity.
Criteria:

  • The buy and sell stock symbols must be the same.
  • The quantities must match, since only full execution is supported; the projected LEAST of the two quantities is therefore just that shared quantity.
  • The sell price must be less than or equal to the buy price.
-- Join the buy and sell orders streams on the stock symbol and price

CREATE STREAM matched_trades AS
SELECT
    buy.order_id AS buy_order_id,
    sell.order_id AS sell_order_id,
    buy.stock_symbol,
    buy.price AS buy_price,
    sell.price AS sell_price,
    LEAST(buy.quantity, sell.quantity) AS quantity
FROM buy_orders buy
INNER JOIN sell_orders sell
    WITHIN 8 HOURS
    ON buy.stock_symbol = sell.stock_symbol
WHERE buy.price >= sell.price AND sell.quantity = buy.quantity;

The order_book system subscribes to this matched_trades topic, and execution then happens. To subscribe to the topic yourself, use the Kafka command below (by default, ksqlDB names the backing topic after the stream in upper case):

./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic MATCHED_TRADES

Implementing SMA using the concept of Hopping Windows

What is Hopping Windows in Data Streaming?

Hopping windows group events for aggregation. Hopping windows are equal in duration but overlap at regular intervals. They can help with regular calculations where the time basis for aggregation is different from the frequency at which the calculation should be performed.

The advantage of using a hopping window is that it can provide a continuous stream of results without waiting for the entire data set to be collected. This makes it well-suited for Simple Moving Average Processing.
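Before the ksqlDB version, here is a small plain-Python illustration of the hopping-window idea (it is not part of the pipeline, just a worked example): windows of a fixed size advance by a smaller hop, so consecutive windows overlap and each one reports its own average price.

from datetime import datetime, timedelta

def hopping_sma(trades, size=timedelta(minutes=10), advance=timedelta(minutes=5)):
    # Average trade price per hopping window; windows overlap when advance < size.
    if not trades:
        return []
    trades = sorted(trades, key=lambda t: t["ts"])
    results = []
    window_start = trades[0]["ts"]
    last_ts = trades[-1]["ts"]
    while window_start <= last_ts:
        window_end = window_start + size
        prices = [t["price"] for t in trades if window_start <= t["ts"] < window_end]
        if prices:
            results.append((window_start, window_end, sum(prices) / len(prices)))
        window_start += advance  # hop forward by 'advance', not by the full window size
    return results

trades = [
    {"ts": datetime(2023, 4, 1, 10, 0), "price": 100.0},
    {"ts": datetime(2023, 4, 1, 10, 4), "price": 102.0},
    {"ts": datetime(2023, 4, 1, 10, 12), "price": 101.0},
]
print(hopping_sma(trades))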

Create a table that computes the Simple Moving Average over the last 10 minutes, with the window advancing every 5 minutes:

CREATE TABLE sma_hopping_table AS
SELECT
    BUY_STOCK_SYMBOL,
    AS_VALUE(BUY_STOCK_SYMBOL) AS STOCK_SYMBOL,
    AVG(SELL_PRICE) AS sma
FROM matched_trades
WINDOW HOPPING (SIZE 10 MINUTES, ADVANCE BY 5 MINUTES)
GROUP BY BUY_STOCK_SYMBOL
EMIT CHANGES;

Output

In a real-world application, the SMA is not just a streaming window: other components consume the messages from Kafka, process and persist them in a database, and broadcast the changes to brokers and to other parts of the architecture, such as the historical database.

Let's test by generating random orders from the broker's client and subscribing to the SMA and matched-trade topics above.

# Imports needed to run this snippet; stock_prices, quantities, sides and the
# Order class are defined earlier in the broker's notebook and are assumed here.
import random
import datetime

# Define a function to generate a random order
def generate_order():
    # Generate a random stock symbol from the list
    stock = random.choice(list(stock_prices.keys()))

    # Get the last price of the stock
    last_price = stock_prices[stock][-1]

    # Generate a random quantity between 1 and 10
    quantity = random.choice(quantities)

    # Generate a random buy or sell side
    side = random.choice(sides)

    # Generate a realistic price based on the last price of the stock
    if side == 'BUY':
        # Add a random percentage to the last price
        price = round(last_price * (1 + random.uniform(0, 0.05)), 2)
    else:
        # Subtract a random percentage from the last price
        price = round(last_price * (1 - random.uniform(0, 0.05)), 2)

    # Get the current time
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')

    # Create the order object using the randomly generated values
    order = Order(symbol=f'{stock}', qty=quantity, side=f'{side}', type='limit',
                  price=price, time_in_force=timestamp)
    return order
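To drive the demo end to end, the broker client can post a stream of these generated orders to the mock exchange API. The /orders URL below matches the hedged Flask sketch earlier, not necessarily the route in the linked server code:

import time
import requests

EXCHANGE_URL = "http://localhost:5000/orders"  # assumed route; check the linked server code

for _ in range(20):
    order = generate_order()
    # Order is assumed to be a plain object whose fields are exposed via vars().
    requests.post(EXCHANGE_URL, json=vars(order), timeout=5)
    time.sleep(1)  # trickle orders in so the streams have something to match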

Kafka topic from which market-maker and other downstream systems can consume the SMA events:

> ./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic SMA_HOPPING_TABLE

Output

There is another component I have not covered here: profit & loss, which is the basis on which the stock exchange trading platform earns.

Thanks for reading! Happy Learning.


Ritresh Girdhar

Father || Coder || Engineer || Learner || Reader || Writer || Silent Observer