This one goes out to all the data geeks: If you could create just ONE statistic to measure the coolest most epic most important thing in the world, what would it be?
Hold on - I didn't ask you what the coolest most epic important thing in the world is to you - I don't care, to be honest. It goes without saying around Zuora that the coolest most epic thing these days is the transformation going on in the economy, from products to services and commodities to relationships. But I know you've heard plenty about that, so I promise to shut up about it for the rest of this post!
I'm asking, how would you go about measuring the most important thing (whatever it is) with a single measurement or statistic? Because that's the decision I had to make when I was handed the task of designing the Subscription Economy Index. In this age of data deluge we've all become used to looking at everything from so many different angles (yes data geeks, I mean high-dimensional feature sets). And it gets hard when you have to pick just one number. Really hard. I think in the end we did a pretty good job with the Subscription Economy Index, but not without a few missteps along the way. With hindsight I'd say that the process went through four steps, and I think this is a good way to work any time you have to summarize a complicated situation with just one number:
- Ask what's most important
- Make it dynamic
- Make it represent
- Make it robust
1. Ask what's most important
I started out with a lot of choices. We're all awash in numbers these days - so many metrics, measuring so many things. It's a really good exercise to stop sometimes and try to remind ourselves what's most important. For the Subscription Economy the natural choice might be subscriptions and subscribers (duh!) but then everyone likes to talk about churn. And how about the products and services themselves, or transactions? I felt a need to step back from specific numbers and ask: what is it that people really want to know about the Subscription Economy? My first idea was: how big is it? Or another way to put it is how much: how much business is really going on? So the idea was to make some kind of metric of how big the Subscription Economy is, because that seems like the most important thing. But what?
2. Make it dynamic
I went through some ideas for statistics to measure how big the Subscription Economy is, like volumes of currency and numbers of transactions, but I found myself bored! I realized that just taking the grand total of something is boring. It's a snapshot, but it doesn't capture the dynamism. That was when I realized the Subscription Economy Index should measure growth! Not how big, but a different way of looking at how much: how fast is the Subscription Economy growing? Because one thing we see again and again in the modern world is that exponential growth in a new model can swamp the old order of things in no time. So the best metric to capture a new emerging space like the Subscription Economy is the growth rate.
3. Make it represent
The next step was deciding how to actually measure that growth. The growth in the number of subscribers? The number of services? Going back to step 1, the key was to measure the growth that people really want to know about. The people interested in this index are most likely going to be companies in the Subscription Economy or investors in those companies, so I decided that the growth we measure should be the growth of the Subscription Economy companies themselves. And to make it really representative, the statistic had to represent the growth experienced by a typical Subscription Economy company, and not the aggregate growth of all the Subscription Economy companies together (that might be more interesting to an economist). That led me to the idea of calculating each company's growth separately and then making the statistic represent the average or typical value, and that's the Subscription Economy Index, in a nutshell.
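For the data geeks who like to see the difference in code, here is a minimal sketch of "typical company growth" versus "aggregate growth". The company names and revenue figures are made up for illustration, and `growth_rate` is a hypothetical helper, not the actual index code.

```python
from statistics import mean

def growth_rate(previous, current):
    """Fractional growth from previous to current; None when previous is zero."""
    if previous == 0:
        return None  # undefined, so no measurement is taken
    return (current - previous) / previous

# Hypothetical (previous period, current period) revenue per company.
revenues = {
    "small_co": (100.0, 150.0),
    "mid_co": (1_000.0, 1_200.0),
    "large_co": (10_000.0, 10_500.0),
}

# Growth of each company on its own, then the plain average of those rates.
per_company = {name: growth_rate(prev, cur) for name, (prev, cur) in revenues.items()}
typical_growth = mean(g for g in per_company.values() if g is not None)

# Aggregate growth: add everything up first, then take one growth rate.
total_prev = sum(prev for prev, _ in revenues.values())
total_cur = sum(cur for _, cur in revenues.values())
aggregate_growth = growth_rate(total_prev, total_cur)

print(f"typical company growth: {typical_growth:.1%}")
print(f"aggregate growth:       {aggregate_growth:.1%}")
```

With these made-up numbers the two answers are far apart, because the aggregate is dominated by the largest company while the per-company average treats every company equally.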
But there was a catch: Remember I said the people interested in this are not only Subscription Economy companies, but also investors? We wanted the statistic to represent typical customers, but at the same time we wanted the statistic to represent the size of the opportunity for investors. So if a "typical" company is a small opportunity for investors, we should over-represent the large companies that hold a bigger share of the opportunity, and make the statistic more meaningful for investors. Fortunately, before I was a data scientist in Silicon Valley I was a quant for a Wall Street firm, and this kind of thinking was very familiar to me. The tried and true solution here is to use a weighted average with constraints, just like in a stock market index. That way the Subscription Economy Index would represent a balance between the "typical" mid-size companies (most interesting to the companies) and the uncommon large companies that represent more of an opportunity (interesting to investors).
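One common form of "weighted average with constraints" is to weight each company by its size but cap the weight any single company can have. The sketch below assumes that form; the 50% cap, the sizes, and the `capped_weights` helper are all illustrative assumptions, not the constraints actually used in the index.

```python
def capped_weights(sizes, cap):
    """Weights proportional to size, with no single weight above `cap`.

    Excess weight from capped companies is redistributed to the others
    in proportion to their size. Assumes all sizes are positive.
    """
    if cap * len(sizes) < 1:
        raise ValueError("cap is too small for this many constituents")
    total = sum(sizes)
    weights = [s / total for s in sizes]
    capped = [False] * len(sizes)
    while True:
        over = [i for i, w in enumerate(weights) if not capped[i] and w > cap + 1e-12]
        if not over:
            return weights
        for i in over:
            capped[i] = True
        free_size = sum(s for s, c in zip(sizes, capped) if not c)
        remaining = 1.0 - cap * sum(capped)  # weight left for uncapped companies
        weights = [cap if c else s / free_size * remaining
                   for s, c in zip(sizes, capped)]

# Hypothetical company sizes and growth rates, smallest to largest.
sizes = [100.0, 1_000.0, 10_000.0]
growth = [0.50, 0.20, 0.05]

weights = capped_weights(sizes, cap=0.5)
index_value = sum(w * g for w, g in zip(weights, growth))
print([round(w, 4) for w in weights], f"{index_value:.2%}")
```

Without the cap the largest company would carry about 90% of the weight. With the cap it still counts the most, but the small and mid-size companies are not drowned out.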
4. Make it robust
At this point I was really excited about the index. But when I first ran some calculations it looked a bit odd: the growth was just too high to believe, and there were also odd quarters where some of the sub-indices were wildly inconsistent with the overall average. At that point I realized I had a serious outlier problem. And that brings us to the last step in making one great statistic: make sure it is robust!
If you've ever worked with growth rates you know there's only one rule: The minimum value is minus 100%. Other than that, all bets are off - because a growth rate is a ratio, it explodes when the denominator is small (and of course it's undefined when the denominator is zero, but that's not really a problem of robustness here since we just don't take a measurement in that case). Think about a startup that experiences 1500% growth from a starting point of $100 - hurray, they're making $1,600! If only they could keep that growth rate going to $1M, but of course they can't. And even a weighted average can become badly deranged in the presence of extreme outliers. So I knew there had to be some kind of outlier removal as part of the process.
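Here is a small sketch of how that plays out, using made-up numbers: nine established companies growing at 10% and one tiny startup growing at 1500%. The `growth_rate` helper is hypothetical and repeated here so the snippet runs on its own.

```python
from statistics import mean, median

def growth_rate(previous, current):
    """Fractional growth; None when the denominator is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous

# The floor: losing everything is -100%, and it cannot get worse than that.
print(growth_rate(100.0, 0.0))      # -1.0

# The ceiling: there isn't one. $100 -> $1,600 is 1500% growth.
print(growth_rate(100.0, 1_600.0))  # 15.0

# Nine companies at 10% growth plus one tiny startup at 1500%.
rates = [0.10] * 9 + [growth_rate(100.0, 1_600.0)]
print(f"mean:   {mean(rates):.0%}")    # pulled far above what anyone typical sees
print(f"median: {median(rates):.0%}")  # still what the typical company sees
```

One small-denominator outlier is enough to push the average far away from what nine out of ten companies actually experienced.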
After some experimentation with historical data, I settled on removing the top and bottom five percentiles. Another thing that helped was using an extended burn-in period on each constituent before it became part of the average. That is partly a matter of robustness, since the most volatile period (most quarter to quarter change in the growth rate) was typically early in the life of each constituent. But it's also a matter of representativeness: We don't want the index to represent what happens when a constituent is still going live in the service or just starting out, we want it to show what they look like when they have hit their stride. One last thing about robustness: In order to make sure a statistic is robust you'd better play around with some historical data!
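As a rough sketch of those two robustness steps together: drop constituents that are still inside a burn-in period, then cut the top and bottom five percent of what is left before averaging. The four-quarter burn-in, the sample data, and the plain unweighted mean are assumptions for illustration; the post does not specify the burn-in length or how trimming is combined with the weights.

```python
from statistics import mean

BURN_IN_QUARTERS = 4  # hypothetical burn-in length

def trim_outliers(values, percent=5):
    """Drop the lowest and highest `percent` of values (count rounded down)."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100
    return ordered[k:len(ordered) - k]

# Hypothetical constituents as (quarters active, growth rate).
constituents = (
    [(8, -0.90)]                 # one collapsing company
    + [(8, 0.10)] * 18           # eighteen steady growers
    + [(8, 15.0)]                # one small-denominator explosion
    + [(1, 4.0), (1, 7.0)]       # two brand-new constituents, still ramping up
)

# Step 1: only constituents past the burn-in period count.
eligible = [g for quarters, g in constituents if quarters >= BURN_IN_QUARTERS]

# Step 2: remove the top and bottom five percentiles.
trimmed = trim_outliers(eligible, percent=5)

print(f"eligible: {len(eligible)}, after trimming: {len(trimmed)}")
print(f"mean before trimming: {mean(eligible):.1%}")
print(f"mean after trimming:  {mean(trimmed):.1%}")
```

Note that with fewer than 20 eligible values, a five percent trim rounded down removes nothing, so a real implementation would need to decide how to handle small groups.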
Wait a second, didn't I say there were four steps? Well if you're a practicing data geek (and not the armchair variety) you would know that for anything important you almost never get it right the first time. So the actual process looks about like this plate of spaghetti:
Don't be afraid to go through a few versions and collect feedback from everyone you can.
The Beginning, Not the End
So there you have it: How to design a great statistic in four not-so-easy steps! All you have to do is figure out how to capture the dynamics of the most important thing, while making sure it is representative to multiple audiences and has a robust implementation. (You thought it was going to be easy? Never said that.) But of course that's just the beginning - after that you get to watch your metric live and help you understand the real world! After all that hard work, I'm looking forward to watching the Subscription Economy Index for years to come...