Happy Business Starts Here

AQUA: An Introduction to Join Processing

Zuora Staff

Zuora Query Language (also known as ZOQL) allows Zuora tenants to query their data using a SQL-like language.  One feature of ZOQL is that you don't have to specify joins. Because the data model is known to us, you simply tell us what data you need, and we retrieve the data.


This makes it simple for Zuora users, but leaves the intricacies of figuring out how to retrieve the data to us.  Zuora reporting data warehouse stores the data in various tables that generally correspond to entities in the Zuora data model.  Sometimes several joins need to be made in order to produce the requested data.


On the high level, we use the entity-relationship graph of the Zuora data model to construct the join tree.


Let's look at a simple example using Invoice datasource and ZOQL query:

select BillToContact.City from Invoice

This query translates into something like:

select billtocontact.city from invoice join account ON invoice.account__id = account.account__id join contact as billtocontact ON account.billtocontact__id = billtocontact.contact__id

The graph below illustrates the ER graph for the Invoice datasource and the join tree that's built for the SQL query:




The join tree gives us the following path from Invoice to Contact: Invoice --> Account --> Contact, so we first join Invoice to Account, and then to Contact:


Here is a slightly more complex example of a ER graph and join tree for this query:

select BillToContact.City from InvoiceItem



In this example, there are multiple paths that lead to Contact from InvoiceItem.  E.g.:

  • InvoiceItem --> RatePlanCharge --> Account --> Contact
  • InvoiceItem --> Invoice --> Account --> Contact
  • InvoiceItem --> Account --> Contact

Which one do we pick? The shortest one, which is InvoiceItem --> Account --> Contact.


The algorithm for finding the shortest path in a graph is BFS (breadth first search).  In our graph, vertices are entities such as Invoice, Account.  Graph edges are relation names, such as BillToContact, SoldToContact, Account, ParentAccount.  We perform a BFS from Invoice looking for edge BillToContact.  The algorithm is a based on the classic BFS algorithm using a queue with some modification to support cycles (e.g. ParentAccount is a loop on Account).


While the above examples were straightforward where it's easy to see the shortest path, it quickly gets more complicated as we get deeper into the data.  E.g. here is a fragment of a graph for RevenueScheduleItemInvoiceItem.  


See if you can quickly find the shortest path from  RevenueScheduleItemInvoiceItem to Contact. This is when O(N) graph algorithms like BFS come handy!