Building bi-directional sync between two systems

Published on 24 February 2021, in #data-engineering, #system-design

I've been facing the question of how to build bidirectional, near-real-time synchronization between two systems. Over the past years, I've seen multiple projects where data was synced bidirectionally between two systems. In these projects, the bidirectional sync was a cause of data loss, unresolved conflicts, and frustration. In one project, the company ended up with two employees tasked with manually copying data from one system to the other.

So when tasked with building a bidirectional sync, I'm concerned about these issues. To mitigate the concerns, here are the strategies I use:

Before building, I've found it worth investing time in understanding the underlying needs and organizational constraints, and in communicating back concerns from the IT perspective. The next step is to investigate different business solutions that address those needs and concerns without a bidirectional sync.
If the sync is needed, we try to turn the bidirectional sync into two one-directional syncs. In a one-directional sync, each table is assigned to one system, and that system is the source of truth for the table. Any change in the source of truth overwrites the table in the other system. We can do the same at a more granular level, that is, for individual columns or rows.
If that's not possible, we look at the bidirectional sync as a problem of reaching eventual consistency in a distributed system. Eventual consistency can be achieved using CRDTs (conflict-free replicated data types), which are nicely explained in this video. As far as I understand, CRDTs require tight control over the actors of the distributed system. We hit a dead end if we cannot change the actors, for example if we cannot implement a last-writer-wins strategy within an actor. We encountered this dead end when we needed to sync two SaaS products: we couldn't access their codebases, and their APIs didn't support the CRDT use case.
If none of these are possible, the last resort I've found is to build the bidirectional sync as a standalone sync program. The program uses APIs, webhooks, or databases to fetch updates from the systems, resolves conflicts with business rules, and pushes updates back to the systems. However, this solution comes with a risk of data loss if the data is updated during the time window when the sync runs. For such a bidirectional sync to be delivered, the business needs to accept the risk of possible data loss.
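
To make the "two one-directional syncs" strategy more concrete, here is a minimal sketch of column-level ownership. The system names (`crm`, `erp`), the table, and the columns are hypothetical and only serve as illustration; the idea is that every column has exactly one owning system, so there is never a conflict to resolve.

```python
# Hypothetical ownership map: which system is the source of truth
# for each (table, column) pair.
OWNER = {
    ("customers", "email"): "crm",
    ("customers", "credit_limit"): "erp",
}


def merge_row(table, crm_row, erp_row):
    """Build the agreed row by taking each column from its owning system."""
    rows = {"crm": crm_row, "erp": erp_row}
    merged = {}
    for column in crm_row.keys() | erp_row.keys():
        # A KeyError here means the column has no assigned owner yet.
        owner = OWNER[("customers" if table is None else table, column)]
        merged[column] = rows[owner].get(column)
    return merged


crm_row = {"email": "new@example.com", "credit_limit": 100}
erp_row = {"email": "old@example.com", "credit_limit": 500}
merged = merge_row("customers", crm_row, erp_row)
```

In this sketch, `merged` takes `email` from the CRM and `credit_limit` from the ERP, regardless of what the non-owning system holds for those columns.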
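
For the CRDT strategy, a last-writer-wins register is one of the simplest examples. The sketch below is my own illustration, not taken from the video, and the class and field names are assumptions. It shows why control over the actors matters: each actor has to store a timestamp and an actor id next to every value and apply the same merge function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int  # logical or wall-clock time of the write
    actor: str      # tie-breaker, so that merging is deterministic

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the write with the larger (timestamp, actor) pair.
        return max(self, other, key=lambda r: (r.timestamp, r.actor))


a = LWWRegister("alice@old.example", timestamp=1, actor="crm")
b = LWWRegister("alice@new.example", timestamp=2, actor="erp")

# The merge result does not depend on the order in which updates arrive.
assert a.merge(b) == b.merge(a) == b
```

If a SaaS product does not expose a reliable per-field write timestamp, or does not let us run such a merge on its side, this approach is not available.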
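
The standalone sync program and its data-loss window can be sketched as follows. This is a simplified illustration under several assumptions: both systems are modeled as in-memory dictionaries, the program remembers the last synced state, and the business rule "system A wins" is hypothetical. The `between_fetch_and_push` hook only exists to simulate an update arriving while the sync runs.

```python
def resolve(field, a_value, b_value):
    """Hypothetical business rule: system A wins every conflict."""
    return a_value


def sync_once(system_a, system_b, last_synced, between_fetch_and_push=lambda: None):
    # 1. Fetch a snapshot from both systems.
    snap_a, snap_b = dict(system_a), dict(system_b)

    # 2. Decide the target value of each field.
    target = {}
    for field in snap_a.keys() | snap_b.keys():
        a, b = snap_a.get(field), snap_b.get(field)
        base = last_synced.get(field)
        if a == b or b == base:
            target[field] = a  # no difference, or only A changed
        elif a == base:
            target[field] = b  # only B changed
        else:
            target[field] = resolve(field, a, b)  # both changed

    between_fetch_and_push()  # stands in for the time the sync needs to run

    # 3. Push the result back to both systems and remember it.
    system_a.update(target)
    system_b.update(target)
    last_synced.clear()
    last_synced.update(target)


system_a = {"phone": "222"}   # changed in A since the last sync
system_b = {"phone": "111"}
last_synced = {"phone": "111"}

# A user writes "333" into B after the fetch but before the push.
sync_once(system_a, system_b, last_synced,
          between_fetch_and_push=lambda: system_b.update(phone="333"))

assert system_b["phone"] == "222"  # the "333" update was silently overwritten
```

The write of `"333"` never reaches the conflict rules because it happened after the snapshot was taken, which is the data-loss risk the business has to accept.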

This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering.