Welcome to The Engineer Banker, a weekly newsletter dedicated to organizing and delivering insightful technical content on the payments domain, making it easy for you to follow and learn at your own pace.
Welcome to a new installment of Payment Bites. Today, we're building on our prior discussions about SICT settlement by focusing on the less-than-ideal scenarios, which we like to call the 'unhappy cases.' For a look at the happy scenario, you may refer to our previous article.
In a distributed system like RT1, errors can occur at multiple layers. These could range from network latencies and timeouts to issues related to data consistency. For example, a transaction might get delayed due to network congestion or fail altogether due to an unresponsive node. Such anomalies are much harder to predict, isolate, and resolve in a distributed environment compared to a monolithic one.
In distributed systems, the concept of partial errors presents a unique set of challenges. Unlike in a monolithic system, where a failure is usually total and clear-cut, distributed systems may experience partial errors where some nodes or components fail while others continue to operate. This ambiguous state complicates error detection and recovery strategies, as the system has to decide how to proceed when only a part of it is malfunctioning.
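To make the ambiguity concrete, here is a minimal Python sketch of how a sender might classify the result of a single request. The `Outcome` enum, the `classify` function, and the `{"status": ...}` response shape are hypothetical names chosen for illustration; they are not part of RT1 or of any scheme specification. The point is only that a missing answer is a third state, distinct from success and from failure.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    UNKNOWN = "unknown"  # no answer arrived: the request may or may not have been processed


def classify(response: Optional[dict]) -> Outcome:
    """Map a raw response to an outcome; None models a timeout or a lost reply.

    The {"status": "accepted"} shape is an assumption made for this sketch.
    """
    if response is None:
        return Outcome.UNKNOWN
    if response.get("status") == "accepted":
        return Outcome.CONFIRMED
    return Outcome.REJECTED


def needs_investigation(outcome: Outcome) -> bool:
    # Only the UNKNOWN state is ambiguous; the other two are final answers.
    return outcome is Outcome.UNKNOWN
```

In a monolith the `UNKNOWN` branch rarely exists, because the call either returns or the whole process fails. In a distributed setting it has to be handled explicitly, since guessing either way risks a discrepancy between the parties.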
The architecture must also account for eventual consistency. In a distributed system, especially one that deals with financial transactions, ensuring that all parties have the same view of the transaction is crucial. This is even more vital in the context of RT1, where even minor discrepancies can have a large impact, given the high volume of transactions and the real-time nature of the platform.
Before diving into the diagram to explore possible business and technical pitfalls, let's first revisit the fallacies of distributed systems, a list of common misconceptions about how such systems behave:
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
This list is intended as a guide for developers and architects working on distributed systems, encouraging them to account for these pitfalls during the design and programming stages.
In the diagram above, we provide a comprehensive overview of potential error cases, which are categorized into business and technical failures. Starting with the first set of errors, you'll find an assortment of validation issues that could arise within RT1. These range from structural inconsistencies in pacs.008 messages to routing errors, and even liquidity checks that could potentially halt transaction processing.
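The first set of checks can be pictured as an ordered pipeline in which the first failure stops processing. The sketch below is an illustration only: the `Payment` fields, the `validate` function, and the failure labels are hypothetical simplifications and do not reflect the real pacs.008 structure or the actual RT1 validation order.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Payment:
    # Hypothetical, heavily simplified stand-in for a pacs.008 message.
    structure_ok: bool      # result of structural (schema) validation
    beneficiary_bic: str    # identifier of the beneficiary bank
    amount_cents: int


def validate(payment: Payment,
             reachable_bics: Set[str],
             available_liquidity_cents: int) -> Optional[str]:
    """Return a label for the first failed check, or None if all checks pass."""
    if not payment.structure_ok:
        return "STRUCTURE"   # malformed message
    if payment.beneficiary_bic not in reachable_bics:
        return "ROUTING"     # beneficiary bank cannot be reached
    if payment.amount_cents > available_liquidity_cents:
        return "LIQUIDITY"   # not enough funds to settle
    return None


payment = Payment(structure_ok=True, beneficiary_bic="BANKBB", amount_cents=5_000)
print(validate(payment, reachable_bics={"BANKAA"}, available_liquidity_cents=10_000))
```

Here the payment is structurally valid but addressed to a bank outside the reachable set, so the routing check is the one that fails and the liquidity check is never evaluated.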
The second set focuses on business-related errors that may occur on the beneficiary side. These include a wide array of challenges such as accounts being closed or seized, the triggering of sanctions screenings, and the activation of fraud detection mechanisms. It's important to note that the SICT rulebook does not accommodate scenarios where transactions may be accepted without posting, even in cases where a manual review could be deemed necessary.
For an explanation of the accept-without-posting functionality, you can read our article about instant credit transfers in ISO 20022 infrastructures.
Finally, the diagram covers technical errors: scenarios where different types of messages are lost during transmission, along with the ramifications of each loss. In particular, it illustrates what happens when a settlement confirmation is lost on the originating side: the loss activates an automatic transaction status request via a built-in recovery mechanism. This feature is critical for ensuring the robustness of the system.
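The recovery idea can be sketched from the originator's point of view as follows. This is a minimal illustration under stated assumptions: `poll_confirmation` and `request_status` are hypothetical callables standing in for the real messaging layer, and the number of polls is arbitrary rather than taken from any rulebook timing.

```python
from typing import Callable, Optional


def resolve_outcome(poll_confirmation: Callable[[], Optional[str]],
                    request_status: Callable[[], str],
                    max_polls: int = 3) -> str:
    """Wait for a settlement confirmation; fall back to a status request.

    poll_confirmation returns the confirmation, or None if nothing has arrived yet.
    request_status asks the infrastructure what happened to the transaction.
    """
    for _ in range(max_polls):
        confirmation = poll_confirmation()
        if confirmation is not None:
            return confirmation
    # Nothing arrived in time: ask for the transaction status
    # instead of guessing whether the payment settled.
    return request_status()


# Simulate a lost confirmation: polling never yields anything.
print(resolve_outcome(lambda: None, lambda: "SETTLED"))
```

The design choice worth noting is that the fallback queries the state rather than resending the payment, so a confirmation that was merely lost cannot turn into a duplicate transaction.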