Happy Business Starts Here

Real-Time Cloud Data Synchronization

Zuora Staff

The Issue


Zuora’s core billing, payment and finance system creates millions of transactions on a daily basis. Customers usually also deploy other SaaS applications to manage other aspects of their business operations. A big part of Zuora’s vision, is to serve as a “hub” between multiple systems. We see a growing need of synchronizing Zuora’s financial transactions data incrementally into multiple target systems in real-time, because customers’ business operations are largely dependent on the real-time availability of this data.


We thus need to effectively capture and process the data changes to achieve data synchronization in real-time. Zuora’s unique challenge is that our Billing and Payment system is mission-critical: It has little tolerance for instability, performance degradation or downtime, yet it is the data source for all downstream systems. Therefore, we need a non-intrusive, high-performing and highly scalable solution to capture, transform and deliver data changes to target systems on the Cloud.


We also need to be able to dynamically transform the captured data into a schema and data format that can be inserted into the target systems using their APIs. Each target system may require different schema and data format. Each target system may enforce different API limits. So our solution needs to be adaptive to these variations.


Our Approach


There are multiple approaches to capture changed data incrementally.


The most common one is Audit Fields based, a.k.a., querying the changed data using Updated Date or Created Date fields. The problem for this approach is that it is highly inefficient and non-scalable - large database tables are usually scanned for only a few record changes. Updated Date or Created Date are rarely indexed. It also relies on the applications to reliably update the Audit Fields.


Database triggers are also widely used to capture the data changes. This, however, is also problematic; Database triggers do degrade the tr ansaction performance because they are synchronously executed as part of the transaction. Also, data capture and transformation failure would end up rolling back the entire transaction.


Here at Zuora, we choose to use Database Binary Logs. MySQL databases currently serve as our transactional databases, and we replicate the transactional DB to various replicas to ensure availability. If we turn on the Row-Based logging of the replica DB, we can get row-level data changes for every transaction. We can filter down these row-level data changes to the sub-set of data changes we need for data synchronization, and then further transform the data to the required schema and data format, before sending the data to the target systems.


This approach is the least intrusive - it is asynchronous, thus has zero performance impact to our mission critical DB transactions; it is very efficient because we know exactly what’s changed in a transaction; it also ensures transaction integrity because each record change in the DB Binary Log is associated with the DB transaction ID, so it allows us to synchronize the data with the transaction boundary, without leaving data at partial state.




The following diagrams illustrate the high-level architecture and data flow of Real-time Cloud Data Synchronization:


Diagram 1 : Real-time change data capture (CDC) via Binary Log and downstream data filtering and transformation




In this diagram, the changed data is captured through the MySQL Binary Log, and a copy of Raw Data Set is persisted. The Raw Data Set is published to one or more downstream processors. Each downstream processor filters down the data according to business needs, and further transforms the data into the specifics required by the corresponding target system.

In our current i mplementation, we use Tungsten replicator as the CDC processor that produces the Transaction History Log, which serves as the Raw Data Set in the diagram.


Diagram 2: Real-time data synchronization engine



In this diagram, upon completing the data processing, the Transformation Processor raises an event to our internal asynchronous messaging system. The event then gets consumed by one or more Event Consumers. The event consumer caches the event into a dataset called Sync Record Set, which is further processed - Filtered, Merged and Converted - by a component called Sync Rule Engine, before it is ready to be synchronized over the network to the target system on the Cloud.


The Sync Rule Engine is not only necessary, but is critical in this solution in that, it reduces the number of data events, ensures the transaction boundary, optimizes the API usage of the target system and therefore increases the overall efficiency of the data synchronization.


The Sync Engine is the last component in the diagram, and performs the actual real-time sync operations.