Liquidity, the ability to buy and sell assets quickly at the expected price without sharp slippage, is a key parameter that directly affects a crypto exchange's income and reputation.
High liquidity can be achieved through a custom crypto exchange architecture that prioritizes matching engine configuration, microservices, and high-throughput aggregation.
Below we look in detail at the key parameters of a centralized exchange (CEX) that should be taken into account during development.
Why is liquidity a key indicator for a successful crypto exchange?
Liquidity is the criterion that market makers, arbitrageurs and algorithmic traders look at first. Why it is important to attract this target audience:
● They often generate up to 50-70% of all trading volume and account for the largest share of the exchange's income (the more large trades, the higher the total commission revenue).
● Prices stabilize because a dense order book forms: professional traders continuously place new buy/sell orders. The advantage for the exchange: the order book becomes harder to manipulate, which lowers reputational risk.
● Order book depth increases (traders can execute large trades without slippage). This attracts institutional clients and raises the exchange's status and trustworthiness.
In turn, a good reputation and large trading volumes attract even more new clients, and liquidity grows further.
At the same time, exchanges with low liquidity are considered unstable and less competitive.
Fault tolerance and liquidity: how are they related?
Fault tolerance is the ability of a crypto exchange to keep working correctly during failures, peak loads, network problems, updates or local problems on individual servers.
How does this correlate with liquidity:
● Unstable operation of the exchange (downtime or delays in order execution) pushes users to alternative platforms, which reduces liquidity.
● Reliable and fast order processing builds traders' confidence, encourages large trades, narrows spreads, raises the exchange's commission income and increases liquidity.
● Errors caused by low fault tolerance (lost or partially executed orders, duplicated transactions, balance failures) lead to reputational and financial losses (complaints and compensation payments), customer outflow and a drop in liquidity.
High liquidity is achieved only by exchanges that show their reliability, withstand peak loads and continue to function correctly even with local failures.
Matching engine: why it is important and how to increase fault tolerance
The matching engine (ME) is the core software of a crypto exchange: it receives, matches, executes and records orders to buy and sell crypto assets.
Additional functions:
● updates the order book;
● manages liquidity;
● controls leverage and limits;
● rolls back the operation in case of failure.
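The core matching step can be illustrated with a minimal sketch. It assumes limit orders only and price-time priority, which is a common matching rule but is not stated in this article; the names `Order` and `OrderBook` are hypothetical and not taken from any real exchange.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    order_id: str
    side: str   # "buy" or "sell"
    price: int  # integer minor units, to avoid float rounding
    qty: int


class OrderBook:
    """Hypothetical single-pair book with price-time priority."""

    def __init__(self):
        self._bids = []      # heap of (-price, seq, order): best bid first
        self._asks = []      # heap of (price, seq, order): best ask first
        self._seq = count()  # arrival counter, breaks ties at equal price

    def submit(self, order):
        """Match an incoming limit order.

        Returns a list of trades (maker_id, taker_id, price, qty).
        Any unfilled remainder rests in the book.
        """
        trades = []
        if order.side == "buy":
            own, opposite = self._bids, self._asks
            crosses = lambda maker: maker.price <= order.price
        else:
            own, opposite = self._asks, self._bids
            crosses = lambda maker: maker.price >= order.price

        while order.qty > 0 and opposite and crosses(opposite[0][2]):
            maker = opposite[0][2]
            fill = min(order.qty, maker.qty)
            # The trade executes at the resting (maker) order's price.
            trades.append((maker.order_id, order.order_id, maker.price, fill))
            order.qty -= fill
            maker.qty -= fill
            if maker.qty == 0:
                heapq.heappop(opposite)

        if order.qty > 0:
            key = -order.price if order.side == "buy" else order.price
            heapq.heappush(own, (key, next(self._seq), order))
        return trades
```

A buy for 6 units at 101 against resting sells of 3 at 100 and 5 at 101 would fill the cheaper order first, then take 3 units from the second. A production engine would also handle cancels, stop orders, leverage checks and persistence, all of which this sketch leaves out.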
The greater the trading volume, the variety of order types (limit, stop, etc.) and the types of trading (spot, futures), the higher the required throughput and the lower the required latency (the time between an order being sent and executed).
For example, on large exchanges such as Binance, Bybit and OKX, the matching engine must process millions of operations per second with latency measured in milliseconds.
At the same time, fault tolerance also plays a key role: the matching engine must work correctly even with network problems, server failures, and updates or failures of dependent servers.
Important: The larger the exchange and the larger the load (which is especially relevant in peak periods with high market volatility), the higher the risk of failure.
How the fault tolerance of the matching engine is increased from a technical point of view
Hot replicas
This means running two or more instances of the ME in parallel.
There are two models: Active-Passive (one instance works while the second is held in reserve and constantly synchronized with the first; on failure, traffic switches to the second) and Active-Active (both instances work simultaneously and stay synchronized; if one fails, the other continues to work without delay).
Active-Active is 2-10 times more expensive to implement (depending on the complexity of the synchronization mechanism, testing, infrastructure capacity and DevOps rates, since the work is more complex). It makes sense if the exchange has a very high load and is focused on institutional clients (funds and market makers), since its main advantage is the complete absence of delay for users. For startups with average trading volumes, Active-Passive is often enough.
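The Active-Passive model can be sketched as follows. This is a simplified, in-process illustration that assumes synchronous replication to the standby and a failure that is detected on the next request. The class names `EngineInstance` and `ActivePassivePair` are hypothetical, and a real deployment would add health checks, a network transport and protection against split-brain.

```python
class EngineInstance:
    """Hypothetical stand-in for one matching engine process."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.applied = []  # order ids applied so far, in order

    def apply(self, order_id):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.applied.append(order_id)


class ActivePassivePair:
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def submit(self, order_id):
        """Send the order to the active instance; fail over if it is down.

        Returns the name of the instance that served the request.
        """
        try:
            self.active.apply(order_id)
        except RuntimeError:
            # Failover: promote the standby, which already holds the
            # replicated history, and retry the order on it.
            self.active, self.standby = self.standby, self.active
            self.active.apply(order_id)
        # Keep the standby in sync so it can take over at any time.
        if self.standby.healthy:
            self.standby.apply(order_id)
        return self.active.name
```

Because the standby receives every order the active instance applied, it can continue from the same state after promotion. The price of this simplicity is the switch-over itself, which is where Active-Passive adds delay compared with Active-Active.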
WAL (Write-Ahead Logging)
With this approach, each operation is written to a dedicated log before the ME executes it.
Without WAL, a matching engine failure can cause the loss of some orders, desynchronization of the order book, and duplicated or only partially executed transactions. For the exchange this means financial losses and a decrease in trust.
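A minimal sketch of the write-ahead idea, assuming a JSON-lines log file and a simple balance update as the operation. The names `WalEngine` and `demo_recovery` are hypothetical. The sketch does not handle a torn final record or log compaction, both of which a real implementation would need.

```python
import json
import os
import tempfile


class WalEngine:
    """Hypothetical engine that logs each operation before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.balances = {}
        self._replay()  # rebuild in-memory state from the log on startup

    def _apply(self, op):
        account = op["account"]
        self.balances[account] = self.balances.get(account, 0) + op["delta"]

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def execute(self, op):
        # 1. Append the operation to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only then change the in-memory state.
        self._apply(op)


def demo_recovery():
    """Simulate a crash by discarding the engine and restarting from the log."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "me.wal")
        engine = WalEngine(path)
        engine.execute({"account": "alice", "delta": 100})
        engine.execute({"account": "alice", "delta": -30})
        del engine  # in-memory state is lost
        return WalEngine(path).balances
```

The restarted engine rebuilds the same balances from the log alone, which is the property that protects against lost orders after a failure.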
When developing or choosing an ME for a crypto exchange, it is also important to look at internal properties such as atomicity and idempotency.
● Atomicity is a guarantee that an operation on an order is either applied in full or not applied at all. There will be no situation where funds are debited but the order is not added to the order book, or vice versa.
● Idempotency is the part of the order processing logic that prevents the same order from being duplicated. For example, a trader creates an order, a failure occurs and the trader sends it again; the ME does not create a duplicate but recognizes that the order is already being processed.
In practice, when commissioning an ME, it is important to ask: are transactions atomic, and is there logic that prevents duplicates when an order is resubmitted with the same UUID?
This is usually stated in the ME/API documentation. It can also be checked by looking at crash test results.
Without idempotency, the order book can "break" and errors appear in client balances, leading to complaints and loss of trust (and sometimes compensation payments).
Without atomicity, state becomes desynchronized, the risk of financial errors and exchange losses grows, and so does customer dissatisfaction.
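Both properties can be shown in one small sketch. It assumes the client supplies a UUID with each order and that an order placement consists of two steps, debiting the balance and adding the order to the book. `OrderGateway` and the `simulate_book_failure` flag are hypothetical names introduced only for illustration.

```python
class OrderGateway:
    """Hypothetical order intake with idempotency and atomic placement."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.book = []                      # resting orders
        self._seen = {}                     # client UUID -> status
        self.simulate_book_failure = False  # test hook, not a real feature

    def _add_to_book(self, client_uuid, account, cost):
        if self.simulate_book_failure:
            raise RuntimeError("order book write failed")
        self.book.append((client_uuid, account, cost))

    def place(self, client_uuid, account, cost):
        # Idempotency: a resubmitted UUID returns the stored status
        # instead of creating a second order.
        if client_uuid in self._seen:
            return self._seen[client_uuid]

        # Atomicity: snapshot the state, run both steps, restore on error.
        snapshot = (dict(self.balances), list(self.book))
        try:
            if self.balances.get(account, 0) < cost:
                raise ValueError("insufficient funds")
            self.balances[account] -= cost
            self._add_to_book(client_uuid, account, cost)
        except Exception:
            self.balances, self.book = snapshot
            raise

        self._seen[client_uuid] = "accepted"
        return "accepted"
```

Submitting the same UUID twice debits the account once, and a failure in the book write leaves the balance exactly as it was before the call. Real engines usually get the same guarantees from database transactions or a single-threaded event log rather than in-memory snapshots.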
Microservice architecture: how does it increase fault tolerance?
There are two types of crypto exchange architecture: monolithic (all functions are combined in one application/ME; all processes work in a single memory space/database) and microservice.
With a microservice architecture, the system is divided into separate independent services that “communicate” through an API. Each of them has its own function, for example:
● Wallet and deposit service;
● Service for the ETH-USD pair;
● Reporting service, and so on.
Such an architecture increases fault tolerance, as a failure in one service does not affect the work of others.
Additional advantages: it is easier and cheaper to restart, change or scale a single service than the entire monolith; failures are easier to localize; and the load is distributed better.
An example of a mechanism for increasing fault tolerance within the microservice architecture is sharding for the order book.
The book is divided into separate shards, each of which is processed by a separate service.
What this gives: if one shard goes down, the failure stays local, and all other shards and the exchange as a whole continue to work normally. The ME load is also distributed better, so more transactions can be processed at the same time (a plus for traders that improves the reputation of the exchange).
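As a rough sketch of order book sharding, the example below assumes one shard per trading pair, which is only one possible partitioning scheme. `Shard` and `ShardedOrderBook` are hypothetical names, and each shard stands in for a separate service.

```python
class Shard:
    """Hypothetical stand-in for a service that owns one pair's book."""

    def __init__(self, pair):
        self.pair = pair
        self.healthy = True
        self.orders = []

    def submit(self, order_id):
        if not self.healthy:
            raise RuntimeError(f"shard for {self.pair} is down")
        self.orders.append(order_id)


class ShardedOrderBook:
    def __init__(self, pairs):
        # One independent shard per trading pair (an assumption of this sketch).
        self.shards = {pair: Shard(pair) for pair in pairs}

    def submit(self, pair, order_id):
        """Route the order to its shard; report failure without raising."""
        try:
            self.shards[pair].submit(order_id)
        except RuntimeError:
            return False  # only this pair is affected
        return True
```

If the ETH-USD shard is marked unhealthy, orders for ETH-USD are rejected while BTC-USD orders continue to be accepted, which is the local-failure behavior described above.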
General principles and patterns of microservice architecture, which are fundamentally important for the owner of a crypto exchange:
● Asynchronous queues. Services do not call each other directly and synchronously; they exchange messages. If one service fails, the message stays in the queue and is not lost, while other services continue to work. This is especially relevant during times of high volatility. Why this matters for the owner of the exchange: orders and transactions are not lost, downtime is minimized and, as a result, income is preserved even during isolated failures, and traders' confidence increases.
● Horizontal scaling. If necessary, extra instances are added only for the service under increased load, while the others keep running unchanged. This approach makes it possible to scale the project at lower cost, increase throughput and, as a result, show a lower percentage of errors and delays.
● Localization of failures. An error in one service does not affect the others; the failed service can be restarted quickly while the rest of the exchange keeps functioning. This reduces the probability of downtime (so stable income is preserved) and lowers total financial losses when failures do occur.
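The asynchronous queue pattern from the list above can be sketched with an in-process `collections.deque` standing in for a durable message broker. That substitution is an assumption made to keep the example self-contained. `SettlementService` and `drain` are hypothetical names.

```python
from collections import deque


class SettlementService:
    """Hypothetical consumer that may be temporarily unavailable."""

    def __init__(self):
        self.up = True
        self.settled = []

    def handle(self, trade_id):
        if not self.up:
            raise RuntimeError("settlement service is down")
        self.settled.append(trade_id)


def drain(q, service):
    """Deliver queued messages in order.

    A message is removed from the queue only after the consumer has
    handled it, so nothing is lost if the consumer fails mid-way.
    Returns the number of messages delivered.
    """
    delivered = 0
    while q:
        try:
            service.handle(q[0])  # peek, do not remove yet
        except RuntimeError:
            break                 # consumer is down: leave messages queued
        q.popleft()               # acknowledge only after success
        delivered += 1
    return delivered
```

While the consumer is down, the producer keeps appending trades to the queue and `drain` delivers nothing. Once the consumer is back, the same messages are delivered in their original order.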
A microservice architecture becomes critically important when trading volume is large, the number of users is growing, and scaling or new features are planned for the future.
A monolithic structure is cheaper, as it is easier to develop, but it is justified only if the exchange is small or at the very first stages with a limited budget.
Additional elements: what else increases fault tolerance?
Database replication reduces the risk of data loss and financial errors, and load balancing prevents individual servers from being overloaded.
A monitoring and alerting system lets you detect failures quickly, react and minimize downtime.
There should also be a disaster recovery plan.
The post The architecture of modern liquidity: How tech stacks define crypto exchange success appeared first on Invezz