Andrew Chen Archives


Is your website a leaky bucket? 4 scenarios for user retention

Do you have happy, smiling users?
I’ve previously written a lot about metrics and user acquisition but have not written much about metrics and user retention. By retention, I mean the process in which you convert new users who don’t care about your site into recurring users that are loyal and continually drive pageviews.

In general, I would say that more people care about this than pure user acquisition, which is great, but they are often using aggregate numbers to measure this retention. By aggregate data, I mean looking at an overall Google Analytics number, or looking at an Alexa rank, or some other rolled-up metric which doesn’t differentiate between new users that are discovering your site for the first time versus loyal users that are returning to your site.

In general, I think of websites as “leaky buckets” where users are constantly getting poured into the top, and the site is constantly leaking users out. Imagine that you pour 1,000 users into any website and then stop additional new users from joining: that 1,000 can only decrease. Over time, some users become loyal and throw off pageviews, but eventually even they disappear. The rate at which this happens can be turned into a metric just like any other number.
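As a sanity check on the bucket analogy, here's a minimal simulation. The 50% weekly retention rate is purely an illustrative assumption, not a real benchmark:

```python
# Toy "leaky bucket": pour 1,000 users in once, then stop all new signups.
# The 50% weekly retention rate is an assumption for illustration only.

def simulate_bucket(initial_users, weekly_retention, weeks):
    """Return the number of surviving users at the start of each week."""
    survivors = [initial_users]
    for _ in range(weeks):
        survivors.append(int(survivors[-1] * weekly_retention))
    return survivors

# With no new users flowing in, the count can only go down.
print(simulate_bucket(1000, 0.5, 4))  # [1000, 500, 250, 125, 62]
```

The point of the sketch is simply that without new users pouring in, the curve is monotonically decreasing; everything interesting about a real site is in how fast.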

The growth disambiguation problem
When you look at a graph that’s going up and to the right, it’s not possible to know if it’ll keep going. It’s basically impossible from purely outside data to disambiguate the following scenarios:

  1. Pageviews are coming ONLY from new users
  2. Pageviews are coming ONLY from one generation of users (like early adopters)
  3. Pageviews are coming ONLY from retained users
  4. Pageviews are coming from new users and retained users

This should be totally obvious to people, but instead I see people pointing at Alexa graphs and saying that site A or site B is doing well, when in fact they could have a deep systemic problem.

In fact, let me argue the following in this post:

From aggregate data (like Alexa), you can figure out what sites are doing poorly at retention, but not what sites are doing well

Let’s start with the first scenario:

1. Pageviews are coming ONLY from new users
In this first scenario, the retention on your site totally sucks, meaning you lose everyone after their first session. Pour 1,000 users in, and by the next week that 1,000 has dropped to 0. Your retention rate from week 1 to week 2 is 0% :)

That said, how could you still get pageviews? First off, you obviously get any pageviews a user might create in the first session, even if they never come back. I think the most common scenarios are the following:

  • Users create text content which is SEO’d and placed in the Google index
  • Users send invites via e-mail which are then accepted

Either way, there's some form of “viral loop” that attracts new users even if the original user is never retained. In fact, I bet that a lot of sites out there are buoyed by their search engine traffic, even when they have really terrible retention rates. All that matters is that users do enough work to generate a couple of pageviews, and then bring in the next generation.

Using the bucket analogy, this is a bucket that has a firehose filling it, but all the water leaks out almost immediately. With a big enough firehose, the aggregate stats could look good when they are in fact rather shitty.

2. Pageviews are coming ONLY from one generation of users (like early adopters)
3. Pageviews are coming ONLY from retained users
Similar to the first scenario, you might have a situation where the numbers look great because the bucket filled up nicely with the first group of users, but after that, the site sucks at retention. Or the inverse, where there's no growth at all, but retention is great.

Either case might hint at a deep systemic problem within the site, but the aggregate numbers hide it. Not being able to both acquire and retain brand new users is a problem, and without measuring the groups separately, it's impossible to assess the true situation.

Back to Twitter for a second
So in fact, looking at the Twitter chart, the right answer is “we don't know.” A plateaued chart like that could mean that Twitter is doing fine at retaining some set of users but has stalled on new users, or that it's acquiring new users like crazy but not retaining them, or anything in between.

That said, given that Twitter pages show up in Google, which provides them with a steady stream of new users, and that the average time on site looks closer to a heavily SEO'd site like Yelp than a social site like MySpace (5 min instead of 30 min), I'd guess that they are actually bleeding users pretty rapidly. Again, it's hard to do an analysis like this without a lot more data to back it up, but that'd be my high-level read.

How do you figure out the health of the site then? Measuring “cohorts”
In general, the solution to the retention measurement problem lies in separating out NEW users and RETURNING users within the analytics. So at the minimum, you’d have to be able to talk about the following:

  • 1 million uniques to the site
  • 100,000 new uniques
  • 900,000 returning uniques from the month before

That’d give you a sense that the site was actually retaining users well. But to take this further, what you really care about is to carve up your userbase into “cohorts,” and measure drop-off rates from time period to time period. Here’s the definition of a time-based cohort:

A cohort is all the users that joined during a particular time period

Only then can you track the retention rate of a SPECIFIC set of users, and measure each group of users independently of the others. In the “cohort model” you'd end up with a group like:

Users that joined in Week 1
week 1 uniques: 100,000
week 2 uniques: 50,000
week 3 uniques: 25,000

In this model, you’d see that 100k users joined in week 1, and if you follow that “cohort” through, you end up with a 50% drop-off rate from week to week.

But then, in week 2, new users joined as well, which creates a week 2 cohort. Of course, in your aggregate metrics, the site would have 100k uniques in week 1, then 125k (new) + 50k (returning) = 175k uniques in week 2.

Users that joined in Week 2
week 2 uniques: 125,000
week 3 uniques: 50,000

Note that this cohort only goes through 2 weeks because it starts at week 2 and ends at week 3, whereas the week 1 cohort is able to run 3 weeks.

When you compare the week 2 cohort to the week 1 cohort, you can tell that 1) there was a 25% increase in new users (100k to 125k), and 2) the retention rate DECREASED from 50% to 40% (50k/100k versus 50k/125k). This would be a red flag that your site was sucking, even if your aggregate stats looked good:

Total site stats
week 1 uniques: 100,000
week 2 uniques: 175,000
week 3 uniques: N/A*
(*the week 3 total would be 25k + 50k + the new week 3 cohort, which isn't defined above)
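The cohort bookkeeping above can be sketched in a few lines of code. The numbers are taken straight from the tables above; the data layout and function names are my own:

```python
# Cohort-based retention accounting, using the post's illustrative numbers.
# Each cohort maps calendar week -> uniques from that cohort in that week.
cohorts = {
    "week1": {1: 100_000, 2: 50_000, 3: 25_000},
    "week2": {2: 125_000, 3: 50_000},
}

def retention(cohort):
    """Week-over-week retention rates for a single cohort."""
    weeks = sorted(cohort)
    return [cohort[b] / cohort[a] for a, b in zip(weeks, weeks[1:])]

def site_uniques(cohorts, week):
    """Aggregate uniques for a calendar week, summed across all cohorts."""
    return sum(c.get(week, 0) for c in cohorts.values())

print(retention(cohorts["week1"]))  # [0.5, 0.5] -> steady 50% retention
print(retention(cohorts["week2"]))  # [0.4]      -> retention got worse
print(site_uniques(cohorts, 2))     # 175000: the aggregate hides the decline
```

Note how the aggregate number (175k, up from 100k) looks healthy even though the per-cohort retention rate fell from 50% to 40%, which is exactly the red flag described above.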

It’s not clear what your time period should be – perhaps weeks, perhaps days, perhaps months. Probably it depends on the average time between your users logging in, or something similar.

Is there a retention coefficient?
In fact, one might argue that, in analyzing these cohorts, in addition to the “viral coefficient” measured in viral marketing, there's a “retention coefficient” that measures how well you are able to hold on to users.

This would be true if the cohorts you chose typically lose a constant % from week to week. That would mean that every cohort decays exponentially, which would give you a coefficient (i.e., f(x) = e^(-ax), where a is the retention coefficient).
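As a sketch of how you might estimate a, assuming a cohort with the constant 50% weekly drop-off from the week 1 example above, you can fit a line to the log of the survivor counts; the data and helper function here are hypothetical:

```python
import math

# Estimate a "retention coefficient" a by fitting survivors ~ e^(-a*t).
# These cohort sizes assume the post's 50% week-over-week decay.
weeks = [0, 1, 2]
survivors = [100_000, 50_000, 25_000]

def retention_coefficient(weeks, survivors):
    """Least-squares slope of ln(survivors) vs. week; a is the negated slope."""
    n = len(weeks)
    logs = [math.log(s) for s in survivors]
    mean_t = sum(weeks) / n
    mean_y = sum(logs) / n
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(weeks, logs)) \
            / sum((t - mean_t) ** 2 for t in weeks)
    return -slope

a = retention_coefficient(weeks, survivors)
print(round(a, 3))  # ~0.693, i.e. ln(2) for a constant 50% weekly leak
```

A constant 50% weekly leak corresponds to a = ln 2; a smaller a means a slower leak. In practice real cohorts rarely decay perfectly exponentially (early weeks usually leak faster), so a single coefficient is at best a rough summary.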

Please measure and e-mail me your findings ;-)

PS. Get new updates/analysis on tech and startups

I write a high-quality, weekly newsletter covering what's happening in Silicon Valley, focused on startups, marketing, and mobile.
