@andrewchen


The first 6 steps to homegrowing basic startup analytics

Quick intro to getting set up on analytics
I’ve been asked a few times recently, “Wow, these analytics you write about are great, but how does a startup begin to bite off the relevant parts?” This post is my attempt to answer that question.

First, let me recommend reading a previous post, called omg I’m just a startup, I can’t do those fancy metrics. In it, I cover some more general philosophical ideas about how to approach what to measure and what not to measure. It’s worth a look before you go on.

Now let’s move on to the first couple topics:

Step 0: Pre-product
Initially, the product development process should likely be focused on big-picture qualitative information, like whether or not your business is addressing the right audience, as well as the preferences of that audience. So don’t measure anything yet :)

Instead, spend your time gathering qualitative data, interviewing users, understanding the problem-behind-the-problem you’re trying to solve, and prototyping concepts.

Do this for a couple weeks!

Step 1: Prototypes
As you create prototypes of your product, you should throw up some free, simple analytics to get you some rough ideas of what’s happening inside the functionality. This likely means something like Google Analytics, although there is a very large universe of equivalent tools out there as well.

Google Analytics can’t really tell you much at this stage – it’s not very actionable. The main things I like to look at are new versus return visitors, top content pages, what pages are causing bounces, etc. Again, at this stage you are still primarily driven by qualitative research and ideas, and it’s hard for analytics to drive much of your thinking.

This prototype phase might last a month or a couple of months.

Step 2: Traffic comes in, so data must be collected
As your product begins to mature, and you get a better sense for what you are trying to do with it, the next thing I might do is to figure out what the important pieces of data are, and confirm that they’re being measured. Nothing is worse than throwing away data that you might want to use later.

Generally, I prefer a single table or log that can be queried later that stores events. The right granularity of events is at the “business” event level, like “someone updated their profile” or “someone downloaded a video” rather than at the URL level. This ensures that you are getting a good amount of information from the logs but it’s not so overwhelming that you’re blowing up your database.

You might, for example, hold events in the rough key/value form:

user_id, event_name, value, datetime

Where it might look something like:

1000, profile.photo.update, 1, 9:30AM 3/14/2008

Make sense?

I prefer to start out via SQL so that the manipulations of the data are easy, although many large-scale systems eventually move to flat-files of some format.
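
To make that concrete, here is a minimal sketch of a single-table event log using Python's built-in sqlite3 module. The table name `events`, the column `created_at`, and the helper names `open_event_log` and `log_event` are my own assumptions for illustration, not a prescribed schema; any SQL database would do.

```python
import sqlite3
from datetime import datetime, timezone


def open_event_log(path=":memory:"):
    """Open (or create) a single-table event log. Names are illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            user_id    INTEGER NOT NULL,
            event_name TEXT    NOT NULL,   -- business-level name, e.g. profile.photo.update
            value      REAL    NOT NULL DEFAULT 1,
            created_at TEXT    NOT NULL    -- ISO 8601 timestamp, sorts correctly as text
        )
        """
    )
    # Most later queries filter by event name and time, so index those.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_name_time "
        "ON events (event_name, created_at)"
    )
    return conn


def log_event(conn, user_id, event_name, value=1, when=None):
    """Append one business event; defaults to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO events (user_id, event_name, value, created_at) "
        "VALUES (?, ?, ?, ?)",
        (user_id, event_name, value, when.isoformat(timespec="seconds")),
    )
    conn.commit()


conn = open_event_log()
# The example row from above: user 1000 updated their profile photo.
log_event(conn, 1000, "profile.photo.update", 1, datetime(2008, 3, 14, 9, 30))
rows = conn.execute(
    "SELECT user_id, event_name, value, created_at FROM events"
).fetchall()
print(rows)  # [(1000, 'profile.photo.update', 1.0, '2008-03-14T09:30:00')]
```

Storing the timestamp as ISO 8601 text keeps the rows sortable and easy to slice by day or hour with plain string functions.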

Design-wise, here are some things to consider:

  • What’s your “event” hierarchy and what level of granularity do you want?
  • Do you want your analytics DB to be the same as your webapp DB?
  • How should you join data between your webapp stats and your analytics stats?
  • Where does it make sense to throw data away versus trying to store it forever?
  • How do you pass data into the analytics DB? Via a JS interface called by the client (like Google Analytics) or server-side within your methods?

There are really no wrong answers to the above – I’ve seen it done in many ways.
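
On the event hierarchy question, one option (an assumption on my part, not the only way) is to use dotted names like the profile.photo.update example and roll them up by prefix. The sketch below uses an in-memory SQLite table, and the helper name `count_events` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_name TEXT, value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1000, "profile.photo.update", 1, "2008-03-14T09:30:00"),
        (1000, "profile.bio.update", 1, "2008-03-14T09:35:00"),
        (1001, "profile.photo.update", 1, "2008-03-14T10:00:00"),
        (1001, "video.download", 1, "2008-03-14T10:05:00"),
    ],
)


def count_events(conn, prefix):
    """Count events named exactly `prefix` or nested anywhere below it."""
    row = conn.execute(
        """
        SELECT COUNT(*) FROM events
        WHERE event_name = ?
           OR substr(event_name, 1, length(?) + 1) = ? || '.'
        """,
        (prefix, prefix, prefix),
    ).fetchone()
    return row[0]


print(count_events(conn, "profile"))        # 3
print(count_events(conn, "profile.photo"))  # 2
print(count_events(conn, "video"))          # 1
```

Comparing on the prefix plus a trailing dot means a name like "profiler.start" would not be counted under "profile".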

Step 3: Identifying your user flows
Every web product ultimately has a bunch of user flows contained within it. For example, there might be a series of flows in how users come into the site, starting with ads, SEO, or otherwise. Similarly, once they get on the site, you might be trying to optimize their usage of the site. Identifying these flows is key since you are trying to find the “critical path” that is then optimized. Figure these flows out, and make sure you’re collecting the right data to optimize.
A good place to learn about these user flows is to read about ecommerce “funnels” and how folks go about breaking those down and optimizing them.
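
As a rough illustration of a funnel, the sketch below counts how many users reached each step of a flow in order. The event names (landing.view, signup.start, signup.complete) and the function `funnel_counts` are made up for the example; substitute whatever your own critical path looks like.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who reached each step, in order.

    events: iterable of (user_id, event_name, timestamp) tuples
    steps:  ordered list of event names that define the funnel
    """
    # Group each user's event names in chronological order.
    by_user = defaultdict(list)
    for user_id, name, _ts in sorted(events, key=lambda e: e[2]):
        by_user[user_id].append(name)

    reached = [0] * len(steps)
    for names in by_user.values():
        next_step = 0  # index of the step this user still has to hit
        for name in names:
            if next_step < len(steps) and name == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached


events = [
    (1, "landing.view", "2008-03-14T09:00:00"),
    (1, "signup.start", "2008-03-14T09:01:00"),
    (1, "signup.complete", "2008-03-14T09:02:00"),
    (2, "landing.view", "2008-03-14T09:10:00"),
    (2, "signup.start", "2008-03-14T09:12:00"),
    (3, "landing.view", "2008-03-14T09:20:00"),
    (4, "signup.start", "2008-03-14T09:30:00"),  # skipped step one, not counted
]
steps = ["landing.view", "signup.start", "signup.complete"]
print(funnel_counts(events, steps))  # [3, 2, 1]
```

The drop between adjacent numbers is where you would look first when deciding which part of the flow to optimize.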

Step 4: Trying ad hoc queries
As users are coming into the system, it can then become a good idea to start gathering data into a standard format. This means creating a small set of queries that you might try to run to learn more about the critical paths that users are taking, and where you can adjust their flow. At this point, it’s important to have the vision of the product become fairly stable so that you are starting to optimize the edges rather than reinventing the core constantly.

The kinds of ad hoc queries worth doing revolve around whatever the tactical goals of your business are. If you are trying to come up with a monetization strategy, you should try to figure out your average order size and what percentage of users that start a buying process finish it. Once you create a small list of these queries, then you can start to formalize the ideas into specific metrics that you track daily.
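
Here is a sketch of those two monetization queries against an event table. I am assuming hypothetical event names checkout.start and checkout.complete, and that the `value` column of a checkout.complete row holds the order total; your own events will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_name TEXT, value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "checkout.start", 1, "2008-03-14T09:00:00"),
        (1, "checkout.complete", 40.0, "2008-03-14T09:05:00"),
        (2, "checkout.start", 1, "2008-03-14T09:10:00"),
        (3, "checkout.start", 1, "2008-03-14T09:20:00"),
        (3, "checkout.complete", 20.0, "2008-03-14T09:25:00"),
        (4, "checkout.start", 1, "2008-03-14T09:30:00"),
    ],
)

# Average order size: mean order total over completed checkouts.
avg_order = conn.execute(
    "SELECT AVG(value) FROM events WHERE event_name = 'checkout.complete'"
).fetchone()[0]

# Users who started a checkout.
started = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_name = 'checkout.start'"
).fetchone()[0]

# Of those users, how many also completed one.
finished = conn.execute(
    """
    SELECT COUNT(DISTINCT user_id) FROM events
    WHERE event_name = 'checkout.complete'
      AND user_id IN (SELECT user_id FROM events WHERE event_name = 'checkout.start')
    """
).fetchone()[0]

completion_rate = finished / started if started else 0.0
print(avg_order, started, finished, completion_rate)  # 30.0 4 2 0.5
```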

If any ad hoc queries return data that is similar to what you could get out of Google Analytics (for example, aggregate numbers like pageviews and uniques), it’s probably a dumb idea to try to do those in-house. Don’t do more work than you have to! Instead, the only homegrown stuff should be so specific to your business that it’s easier to do in-house than to shoehorn it into a 3rd-party analytics tool. Don’t waste your effort on numbers an off-the-shelf analytics package would get you.

Assuming that your product is stable, most startups will want to tackle this within the first few weeks (but obviously not until you have data).

Step 5: Formal in-house reporting
Once the product features (and thus the user flows) are sufficiently mature to invest in this area, then it makes sense to formalize the reports. Typically I would start out with a series of pretty plain HTML pages using tables that just print out the results of SQL queries. You can add finishing touches like percentages, key ratios, etc. as you go. I generally invest zero time into cute visualizations and graphs, and prefer to read the key numbers.
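
A plain report page of that kind can be very little code. The sketch below runs one SQL query and prints the result as an HTML table; the function name `render_report`, the sample rows, and the daily-activity query are assumptions for illustration.

```python
import sqlite3
from html import escape


def render_report(conn, title, sql):
    """Run `sql` and return a bare-bones HTML table of the result."""
    cur = conn.execute(sql)
    headers = [col[0] for col in cur.description]
    lines = [f"<h1>{escape(title)}</h1>", '<table border="1">']
    lines.append(
        "<tr>" + "".join(f"<th>{escape(h)}</th>" for h in headers) + "</tr>"
    )
    for row in cur:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id INTEGER, event_name TEXT, value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1000, "profile.photo.update", 1, "2008-03-14T09:30:00"),
        (1001, "video.download", 1, "2008-03-14T10:00:00"),
        (1000, "video.download", 1, "2008-03-15T08:00:00"),
    ],
)

# Daily activity: the first 10 characters of an ISO timestamp are the date.
report = render_report(
    conn,
    "Daily activity",
    """
    SELECT substr(created_at, 1, 10) AS day,
           COUNT(*) AS events,
           COUNT(DISTINCT user_id) AS users
    FROM events
    GROUP BY day
    ORDER BY day
    """,
)
print(report)
```

Escaping every cell keeps user-generated values from breaking the page, which matters once event names or values come from real traffic.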

How many reports should you generate? I find that it’s pretty addictive to build reports and get a clear understanding of what’s actually happening in your product. So create enough that you can make key decisions, but don’t go too far either – you’ll hit diminishing returns quickly. Generally, 2-3 reports are good enough to start, but ultimately you’ll probably track dozens of dashboards, each focusing on a specific aspect of your business, like:

  • System performance and uptime
  • User acquisition via each method you use
  • Aggregate metrics
  • Retention
  • Engagement
  • Content creation?
  • Ads and monetization?
  • Pricing and revenue?
  • etc.

Anyway, get enough data but not too much – it’s a fine balance. For timing, it probably only makes sense to do this once the product is quite stable and the key user flows are stable as well. This is likely at least a month or two out from the prototype stage.

Step 6: Too much data! Reports are too slow!
If you’re lucky, eventually your reports will be too slow. At Revenue Science, we were gathering something like 1 billion pixel hits per day, and that had to be translated into reporting. Ouch. So you will likely go through a couple of specific steps:

  • Reports will initially query the production server – eventually this doesn’t work and slows down the site
  • Reports and data are then moved off to a slave machine, where the queries still happen in real-time – but eventually this doesn’t work either because it’s too slow and there’s too much data
  • Reports and data are then pre-processed every hour, and then served up – which is fine, until your queries take too long, and you have to keep moving
  • Data is then replicated across a number of slave machines, where the pre-processing happens
  • etc.
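
The hourly pre-processing step in the list above can be sketched as a small rollup job that summarizes raw events into a table the reports read instead. The table `hourly_rollup` and the function `rollup_hour` are hypothetical names, and a real job would run from cron or a scheduler rather than inline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        user_id INTEGER, event_name TEXT, value REAL, created_at TEXT
    );
    CREATE TABLE hourly_rollup (
        hour         TEXT    NOT NULL,   -- e.g. 2008-03-14T09
        event_name   TEXT    NOT NULL,
        event_count  INTEGER NOT NULL,
        unique_users INTEGER NOT NULL,
        PRIMARY KEY (hour, event_name)
    );
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1000, "profile.photo.update", 1, "2008-03-14T09:30:00"),
        (1001, "profile.photo.update", 1, "2008-03-14T09:45:00"),
        (1000, "profile.photo.update", 1, "2008-03-14T09:50:00"),
        (1001, "video.download", 1, "2008-03-14T10:05:00"),
    ],
)


def rollup_hour(conn, hour):
    """Recompute the summary rows for one hour, given as 'YYYY-MM-DDTHH'.

    Deleting first makes the job safe to re-run for the same hour.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM hourly_rollup WHERE hour = ?", (hour,))
        conn.execute(
            """
            INSERT INTO hourly_rollup (hour, event_name, event_count, unique_users)
            SELECT ?, event_name, COUNT(*), COUNT(DISTINCT user_id)
            FROM events
            WHERE substr(created_at, 1, 13) = ?
            GROUP BY event_name
            """,
            (hour, hour),
        )


rollup_hour(conn, "2008-03-14T09")
rollup_hour(conn, "2008-03-14T09")  # re-running does not duplicate rows
summary = conn.execute("SELECT * FROM hourly_rollup").fetchall()
print(summary)  # [('2008-03-14T09', 'profile.photo.update', 3, 2)]
```

Reports then query the small rollup table, so their cost no longer grows with the raw event volume. At larger scale you would likely filter on a timestamp range rather than substr so an index on created_at can be used.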

There are many, many layers of incremental improvements you can make here – but the toughest nut to crack, in the case where your web product is HUGE, is that you will be inserting more data into the system than the system can process within a reasonable time.

Then the more exotic technologies like Hadoop, HBase, Hypertable, etc. start to make a difference. Most sites don’t have to deal with this so I’ll stop here!

Conclusion
Eventually, most serious analytics-driven businesses have to build their own internal analytics. It’s not pretty, but it has to be done. Hopefully the above article gives some background on the key issues you might want to look at as you scale up your product.

