@Jamie @ehavens We may add a new permission that lets customers opt out of syncing "null" values for custom fields, in case some customers want the old behavior for any reason. But by default, they will be able to sync over "NULL" values. We will get back to you regarding the progress soon.
This does look like a defect to me. That said, we need to consider that some customers may have gotten used to the current behavior; changing it could have side effects for them, even if it is the right thing to do. We will evaluate this enhancement soon within the engineering team and will keep you updated.
Zuora’s core billing, payment and finance system creates millions of transactions on a daily basis. Customers usually also deploy other SaaS applications to manage other aspects of their business operations. A big part of Zuora’s vision is to serve as a “hub” between multiple systems. We see a growing need to synchronize Zuora’s financial transaction data incrementally into multiple target systems in real time, because customers’ business operations are largely dependent on the real-time availability of this data.
We thus need to effectively capture and process the data changes to achieve data synchronization in real-time. Zuora’s unique challenge is that our Billing and Payment system is mission-critical: It has little tolerance for instability, performance degradation or downtime, yet it is the data source for all downstream systems. Therefore, we need a non-intrusive, high-performing and highly scalable solution to capture, transform and deliver data changes to target systems on the Cloud.
We also need to be able to dynamically transform the captured data into a schema and data format that can be inserted into the target systems using their APIs. Each target system may require a different schema and data format, and may enforce different API limits, so our solution needs to adapt to these variations.
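One concrete example of adapting to per-target API limits is chunking outbound records to each target's maximum batch size. The sketch below is illustrative only; the target names and limits are hypothetical, not actual Zuora configuration:

```python
# Hypothetical per-target API batch limits; real targets would each
# publish their own maximum records-per-request.
API_BATCH_LIMIT = {"crm": 200, "warehouse": 1000}

def batches(records, target):
    """Yield chunks of records sized to the target system's API limit."""
    size = API_BATCH_LIMIT[target]
    for i in range(0, len(records), size):
        yield records[i:i + size]

recs = list(range(450))
print([len(b) for b in batches(recs, "crm")])  # [200, 200, 50]
```

The same record set would be split into 450/1000 = a single batch for the "warehouse" target, showing how one pipeline can serve targets with very different limits.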
There are multiple approaches to capture changed data incrementally.
The most common one is Audit Fields based, i.e., querying for changed data using Updated Date or Created Date fields. The problem with this approach is that it is highly inefficient and non-scalable - large database tables are usually scanned for only a few record changes, and Updated Date or Created Date are rarely indexed. It also relies on the applications to reliably update the Audit Fields.
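To make the inefficiency concrete, here is a minimal sketch of audit-field polling, using an in-memory SQLite table with hypothetical names. Without an index on the audit column, every poll scans the full table even when only one row has changed:

```python
import sqlite3

def poll_changes(conn, last_seen):
    # Full table scan on every poll if updated_date is unindexed,
    # regardless of how few rows actually changed.
    cur = conn.execute(
        "SELECT id, amount, updated_date FROM invoice WHERE updated_date > ?",
        (last_seen,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, amount REAL, updated_date TEXT)")
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01T00:00:00"),
    (2, 20.0, "2024-01-02T00:00:00"),
])
changed = poll_changes(conn, "2024-01-01T12:00:00")
print(changed)  # [(2, 20.0, '2024-01-02T00:00:00')]
```

Note this also silently misses changes if the application ever fails to bump `updated_date`, which is the reliability concern mentioned above.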
Database triggers are also widely used to capture data changes. This, however, is also problematic: database triggers degrade transaction performance because they are synchronously executed as part of the transaction. Also, a failure in data capture or transformation would end up rolling back the entire transaction.
Here at Zuora, we choose to use Database Binary Logs. MySQL databases currently serve as our transactional databases, and we replicate the transactional DB to various replicas to ensure availability. If we turn on Row-Based logging on the replica DB, we can get row-level data changes for every transaction. We can filter these row-level data changes down to the subset we need for data synchronization, and then further transform the data to the required schema and data format, before sending the data to the target systems.
This approach is the least intrusive - it is asynchronous, thus has zero performance impact on our mission-critical DB transactions; it is very efficient because we know exactly what changed in a transaction; it also ensures transaction integrity because each record change in the DB Binary Log is associated with the DB transaction ID, which allows us to synchronize the data along transaction boundaries, without leaving data in a partial state.
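The transaction-boundary point can be sketched in a few lines. The event shape below is hypothetical (real binlog events carry more metadata), but it shows the idea: each row-level change carries the ID of the transaction that produced it, so downstream sync can apply complete transactions and never a partial one:

```python
from collections import defaultdict

# Simplified row-change events as they might be decoded from the binary
# log; field names here are illustrative, not the actual binlog format.
events = [
    {"txn_id": 101, "table": "invoice",      "op": "INSERT", "row": {"id": 1}},
    {"txn_id": 101, "table": "invoice_item", "op": "INSERT", "row": {"id": 7}},
    {"txn_id": 102, "table": "payment",      "op": "UPDATE", "row": {"id": 3}},
]

def group_by_transaction(events):
    """Bucket row changes by transaction ID so each bucket is applied atomically."""
    txns = defaultdict(list)
    for e in events:
        txns[e["txn_id"]].append(e)
    return dict(txns)

txns = group_by_transaction(events)
print(sorted(txns))  # [101, 102]
```

Transaction 101 touched two tables; syncing its two events together (or not at all) is what keeps the target system consistent with the source.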
The following diagrams illustrate the high-level architecture and data flow of Real-time Cloud Data Synchronization:
Diagram 1 : Real-time change data capture (CDC) via Binary Log and downstream data filtering and transformation
In this diagram, the changed data is captured through the MySQL Binary Log, and a copy of Raw Data Set is persisted. The Raw Data Set is published to one or more downstream processors. Each downstream processor filters down the data according to business needs, and further transforms the data into the specifics required by the corresponding target system.
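A minimal sketch of such a downstream processor follows: filter the Raw Data Set to the tables a given target consumes, then map each row to that target's schema. The table names and field mappings are hypothetical, used only to illustrate the filter-then-transform shape:

```python
# Hypothetical raw change set as published by the CDC processor.
RAW_CHANGES = [
    {"table": "invoice",   "row": {"id": 1, "amt": 99.0, "cur": "USD"}},
    {"table": "audit_log", "row": {"id": 55}},  # not consumed by this target
]

# Which tables this particular target cares about, and how source
# columns map onto the target's field names (illustrative only).
TARGET_TABLES = {"invoice"}
FIELD_MAP = {"invoice": {"id": "InvoiceId", "amt": "Amount", "cur": "Currency"}}

def transform(changes):
    out = []
    for c in changes:
        if c["table"] not in TARGET_TABLES:
            continue  # filter step: drop tables this target does not consume
        mapping = FIELD_MAP[c["table"]]
        out.append({mapping[k]: v for k, v in c["row"].items()})  # transform step
    return out

print(transform(RAW_CHANGES))
```

Each target system would get its own `TARGET_TABLES` and `FIELD_MAP`, which is what lets one Raw Data Set fan out to many differently-shaped consumers.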
In our current implementation, we use Tungsten Replicator as the CDC processor that produces the Transaction History Log, which serves as the Raw Data Set in the diagram.
Diagram 2: Real-time data synchronization engine
In this diagram, upon completing the data processing, the Transformation Processor raises an event to our internal asynchronous messaging system. The event then gets consumed by one or more Event Consumers. The event consumer caches the event into a dataset called Sync Record Set, which is further processed - Filtered, Merged and Converted - by a component called Sync Rule Engine, before it is ready to be synchronized over the network to the target system on the Cloud.
The Sync Rule Engine is not just necessary but critical to this solution: it reduces the number of data events, ensures the transaction boundary, optimizes the target system’s API usage, and thereby increases the overall efficiency of the data synchronization.
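One of the Sync Rule Engine's optimizations, merging, can be sketched as follows (the event shape is hypothetical): several changes to the same record within a batch collapse into a single outbound API call carrying only the latest state:

```python
def merge_events(events):
    """Collapse multiple changes to the same record into one outbound event.

    Events are assumed to arrive in commit order, so later fields win.
    """
    latest = {}
    for e in events:
        key = (e["table"], e["row_id"])
        prev = latest.get(key, {})
        latest[key] = {**prev, **e}
    return list(latest.values())

events = [
    {"table": "invoice", "row_id": 1, "status": "Draft"},
    {"table": "invoice", "row_id": 1, "status": "Posted"},
    {"table": "invoice", "row_id": 2, "status": "Draft"},
]
merged = merge_events(events)
print(len(merged))  # 2: three row changes become two API calls
```

When target systems charge for or throttle API calls, this kind of merge is often the difference between keeping up with the change stream and falling behind it.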
The Sync Engine is the last component in the diagram, and performs the actual real-time sync operations.