by Marcelo, 02-10-2017 10:00 AM (edited 02-10-2017 05:01 PM)
Hi. I'm Marcelo Rinesi, Data Scientist for the Insights team. Insights is a fantastic visualization and analysis tool that combines Zuora data with data from external sources, such as Salesforce and customers' own applications. Unfortunately, you cannot really see the power of Insights until you've plugged data into it, and customers want to see what it does before doing that. This post is about how we solved this chicken-and-egg problem by creating our own customer simulator.
We had two constraints from the beginning:
The data had to be sufficiently complex and show realistic-looking patterns. The whole point of Insights is to extract meaning from data; if the data you're looking at is obviously fake and too simple, then you won't get a proper feeling for what the tool can do.
We were going to have to make a range of demos. Zuora's current and future customers belong to many different industries, each with different patterns of usage, billing, user behavior, etc. A realistic data set is better than an obviously fake one, but customers care about what Insights can do for _them_, which means using data that shows some similarity to what they're used to.
The first constraint implied that we couldn't just generate raw data points directly. The only way to make data that looks like it comes from accounts and users with internal state and characteristic behaviors is to _have_ accounts and users with internal state and characteristic behaviors, and generate the data from them. In other words, we'd program and run a realistic-enough simulation of an idealized customer's accounts, and use the logs from that simulation as data for the demos.
An immediate advantage of this approach was that it separated very cleanly the configuration and management of demos from Insights development and devops. One of the strengths of Insights is that it can take data from many different sources, and it has very simple APIs to let you connect your own systems to it. So from the point of view of Insights, a demo isn't at all different from any other customer: it's just somebody pushing data to a normal instance through the standard API. This also gives people using the demos confidence that everything they see is an actual, working feature of the system.
Writing a simulation is an interesting exercise in any language, but since we were going to build many slightly different simulations in a very specific domain, we chose to design our own domain-specific language. This isn't as much overkill as it might look at first. The Insights object model is well-adapted to what it does (objects include Accounts, Users, Attributes, Metrics, Events, etc.), so we decided that a language designed from scratch would let us separate (literal) business logic from any implementation details. (As a bonus, it would be easy to generate simulation configurations programmatically, something that came in handy later.) After some internal debate on aesthetics versus practicality, we eschewed a LISP-like syntax in favor of JSON as the serialization format.
Here's a fragment of a simulation configuration in this domain-specific language:
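A minimal sketch of what such a fragment might look like, consistent with the description that follows; the field names ("attributes", "metrics", "uniform", and so on) are illustrative assumptions, not the DSL's actual vocabulary:

```json
{
  "attributes": {
    "account_name": {"generate": "company_name"}
  },
  "metrics": {
    "daily_usage": {
      "period": "daily",
      "initial": {"uniform": [0, 10]},
      "next": {"add": ["previous_value", {"uniform": [0, 1]}]}
    }
  }
}
```

Each nested object is an expression: an operator name mapped to its list of arguments, evaluated recursively — essentially a LISP expression tree in JSON clothing.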
The serialization format is a bit verbose, but it can be parsed everywhere, and it's almost self-explanatory. This bit of code describes an "account_name" attribute that's a (fixed) random company name, and a daily metric that starts as a random number between zero and ten and then increases monotonically.
The language has a variety of control structures and mathematical functions, as well as domain-specific functions to generate random names, addresses, and so on. It's rich enough to define pretty much any behavior you might want, so we can generate data that highlights the ever-growing analytical capabilities of Insights while still looking realistic. For example, here's code that defines a metric that's one during the workweek and zero during the weekends, except for accounts with a specific attribute value:
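An illustrative sketch of such a definition, again with hypothetical operator names ("if", "equals", "is_weekday") rather than the DSL's real ones:

```json
{
  "metrics": {
    "weekday_activity": {
      "period": "daily",
      "value": {
        "if": [
          {"equals": [{"attribute": "account_type"}, "24x7"]},
          1,
          {"if": [{"is_weekday": "current_date"}, 1, 0]}
        ]
      }
    }
  }
}
```

The outer conditional checks the account's attribute; accounts matching it report the metric every day, while all others report it only on weekdays.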
This isn't the kind of specific pattern you want to write support for in a simulator, but it's quite easy to write in our simulation configuration language.
Of course, even the most realistic data begins to look underwhelming once it goes stale. To avoid this, the simulator saves the entire state of each simulation to disk after every run, and a nightly cron job advances each simulation up to the current date. It then connects to the Insights API to push the newly generated data, just as any customer would.
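The nightly job boils down to a load-advance-save loop. Here's a sketch of the advancing step in Python; the state schema (a `last_run` date plus a toy `counter`) is purely illustrative, and the real engine would push each day's batch to the Insights API rather than return it:

```python
import datetime

def advance_to_today(state: dict) -> list:
    """Advance a saved simulation day by day up to the current date.

    `state` is the deserialized on-disk simulation state. Returns the
    generated daily records; the real engine would push each batch to
    the Insights API instead of returning them.
    """
    today = datetime.date.today()
    day = datetime.date.fromisoformat(state["last_run"])
    generated = []
    while day < today:
        day += datetime.timedelta(days=1)
        # Stand-in for one simulation step: the real engine evaluates
        # every attribute and metric expression against each account's
        # internal state for this date.
        state["counter"] += 1
        generated.append({"date": day.isoformat(), "value": state["counter"]})
    state["last_run"] = today.isoformat()
    return generated
```

Because the state round-trips through disk, the cron job is idempotent for the day: running it twice generates nothing new the second time.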
A short epilogue: getting realer
A few weeks ago we faced a new iteration of this challenge. We were going to demo Insights to existing Zuora customers, which ideally meant connecting their core Zuora account, and their Salesforce account if they had one, to Insights. This presented two new challenges for adding simulated data: it had to be consistent with the data from their Zuora account, and any generated events or metrics had to show the names and patterns the customer would expect. Writing a simulation config from scratch wouldn't work anymore; we'd have to run simulations at least partially driven by customers' existing data and expectations.
We exploited the separation between the simulation configuration language and the simulation engine to simplify this problem. We settled on a spreadsheet format where CSM specialists could enter the parameters they thought would make the demo most useful for each customer (including existing Zuora account names, the names and behavior patterns of metrics, etc.), and wrote a tool that parses this spreadsheet and generates a simulation configuration specific to that customer. The simulation configuration still describes the metrics and information that Insights expects, but it no longer specifies any of the data that now comes from Zuora and Salesforce. Because the simulation configuration language is built around the Insights object model but not any specific semantics, it could adapt to this new role without changes to the simulation engine itself.
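The spreadsheet-to-configuration tool is conceptually a small transformation. A sketch in Python, assuming a hypothetical CSV export with one metric per row; the column names and the output schema here are illustrative, not the actual template:

```python
import csv
import io

def config_from_spreadsheet(csv_text: str) -> dict:
    """Build a simulation configuration from a CSM-filled spreadsheet.

    One metric per row; the columns (metric_name, period, min, max)
    and the output schema are illustrative, not the real template.
    """
    config = {"metrics": {}}
    for row in csv.DictReader(io.StringIO(csv_text)):
        config["metrics"][row["metric_name"]] = {
            "period": row["period"],
            "initial": {"uniform": [float(row["min"]), float(row["max"])]},
        }
    return config

# Example: a two-row sheet exported as CSV.
sheet = """metric_name,period,min,max
api_calls,daily,0,1000
active_users,daily,5,50
"""
config = config_from_spreadsheet(sheet)
```

The output feeds straight into the same simulation engine as a hand-written configuration, which is exactly what the separation between language and engine buys us.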
No part of the Zuora platform keeps still, and Insights is no exception. Some of the changes (and not the least interesting ones!) take place behind the scenes, enhancing the reliability, scalability, and flexibility of the data pipeline. There are many improvements planned for Insights' data analysis and exploration capabilities, and as the tool advances, we'll keep improving our demos so there's always good fake data to show off the great real stuff.