Andrew Chen Archives


Data portability: Is the social network data you’re hoarding treasure or trash?

I wrote a guest blog post yesterday, and wanted to republish it here as well.

Data portability: What's the value of your social network data?
This blog post focuses on the business perspective: how a company operating one of these social networks might think about its data, particularly with regard to advertising monetization.

There's been a lot of discussion of the data portability issue in one form or another. The consumer side of the issue has been well covered, and is well represented by Robert Scoble, Marc Canter, Gillmor Gang, and others. This is a big topic, especially when you've added as many friends to Facebook as our Doonesbury friends have in the above comic.

Business reasons to resist portability

When a company has aggregated a critical mass of audience and data, it's clear that data is worth something, but unclear how much. In particular, one might be resistant to data portability for a number of reasons, including:

  • If the data can be monetized through advertising, then a company might want to have proprietary access to that data
  • If a competitor can easily import user data, it becomes easier for users to switch services
  • If a user can too-easily share their data with external services, it may create privacy and security issues
  • … and others

There are many other reasons why businesses are reluctant to jump full bore into releasing control of the data, some great for consumers, some neutral, and some completely unaligned with their users. It's my opinion that, like the way Windows has evolved, you want to provide access but need to make it very clear to users what they are getting themselves into.

The reason spyware turned into such a huge industry is that for many years it was far too easy to install any executable off the internet, and the operating system gave poor warning about what users were about to do. There are things you want to do to make sure you're not destroying an entire ecosystem, while still supporting the goals of your users.

My particular interest in this question mostly has to do with the value
of the data, particularly from an advertising standpoint – the first
bullet above.

The monetization of user data

The question is, if companies are busy hoarding all this user data – what is it really worth? How do you evaluate its value? And how does it fit into the context of the overall advertising market?

To outline the answers to this question, I'll cover a few specific topics:

  • Ad network business models
  • Interest versus intent
  • Data to traffic overlap

Then I'll conclude with a short discussion on the future of social network data.

Ad network business models
The market for user data is at a very early stage. Only in the last few years have companies like Revenue Science, Tacoda, Blue Lithium, and the others you see on this list emerged. Note, of course, that I was previously employed by Revenue Science and worked on their direct response ad network (in addition to other roles).

But to step back: For newbies to the advertising world, it's important to note that there are many, many ad networks out there besides Google AdSense. For example, Blue Lithium, Valueclick, ContextWeb, etc. are all ad networks that fundamentally do the same thing:

Buy ad space at a lower price, then resell it for a higher price

Quite simply, it's arbitrage. So they sign up publishers, get them to stick ad code on their pages, and then fill the space with banner ads, Punch-the-monkey Flash games, etc. The bigger the delta between what they buy it for and what they sell it for, the better their profit margins.
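To make the arbitrage concrete, here is a minimal sketch of the resale math. The function name and the CPM figures (price per thousand impressions) are made up for illustration and are not taken from any real network.

```python
def resale_economics(buy_cpm, sell_cpm, impressions):
    """Cost, revenue, profit and margin for reselling ad impressions.

    buy_cpm / sell_cpm are prices per 1,000 impressions (hypothetical inputs).
    """
    cost = buy_cpm * impressions / 1000
    revenue = sell_cpm * impressions / 1000
    profit = revenue - cost
    # Margin is profit as a share of revenue; guard against zero revenue.
    margin = profit / revenue if revenue else 0.0
    return {"cost": cost, "revenue": revenue, "profit": profit, "margin": margin}


# Made-up example: buy a million impressions at $0.50 CPM, resell at $2.00 CPM.
example = resale_economics(buy_cpm=0.50, sell_cpm=2.00, impressions=1_000_000)
print(example)
```

With these made-up prices the network pays $500, collects $2,000, and keeps 75% of revenue as margin. Shrink the gap between the two CPMs and the margin shrinks with it, which is why the targeting data discussed next matters so much.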

The problem is, there are something like 300 ad networks out there, and it's getting more competitive every day. So like a Wall Street bank, these ad networks have to get smarter (and bigger!). They allow advertisers to target on context, geography, time, demographics, and many other factors. They support Flash ads, text ads, video ads, banner ads, all in many different sizes.

With all this targeting, they become huge consumers of data. To competitively identify low-value ad inventory and buy it on the cheap, you need to have more data than:

  • the publisher you're buying it from
  • the 299 ad networks who are also looking for the low-value inventory

If you have less data than either, then the ad inventory price will get bid up, and all of a sudden it'll be hard to get the volume of traffic you want. And thus, it makes sense to voraciously gather and utilize all the data you can, across many different areas. In particular, "user data" is interesting: if you can tell when someone is in the market for a car, ad impressions against that user are suddenly very valuable.

Question is: What kind of data is valuable? And what kind is not?

Interest versus Intent
The first step to understanding the value of data is to look at the marketing funnel below:

You can consider the top part as consumer interest whereas the bottom part is consumer intent.

A user moves through a long funnel before coming into market and exhibiting buying signals (aka Intent). And there are a lot more people at the top, who are sorta kinda maybe in the market for a car (but maybe don't even know that they are) versus the folks at the bottom of the funnel who are ready to get their car loan processed and drive out to dealerships the next weekend.

When you are at the bottom of the funnel, you are part of a select group, and because you are very close to taking action, it's easy to value you as a user. Here's how a car dealership might figure that out:

  • The dealership closes roughly 1% of anyone coming through as a "lead"
  • They make on average $2000 per car they sell
  • So a lead is worth at most $20 to them (1% × $2000), which is their break-even price
  • Then build in some margin, and they're willing to spend $10 on a lead

(Note: these are made up numbers)
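The dealership arithmetic above can be sketched in a few lines. The function and parameter names are hypothetical, and the 50% target margin is my assumption, chosen only because it turns the $20 break-even figure into the $10 bid used in the example.

```python
def lead_bid(close_rate, profit_per_sale, target_margin):
    """Return (break_even_price, bid_price) for a single sales lead.

    close_rate:       fraction of leads that turn into a sale (e.g. 0.01)
    profit_per_sale:  average profit on each sale, in dollars
    target_margin:    share of the lead's expected value the buyer keeps
                      (an assumed input, not something stated by the dealer)
    """
    break_even = close_rate * profit_per_sale   # expected profit per lead
    bid = break_even * (1 - target_margin)      # what is left to spend
    return break_even, bid


# The made-up numbers from the bullets: 1% close rate, $2000 per car.
break_even, bid = lead_bid(close_rate=0.01, profit_per_sale=2000, target_margin=0.5)
print(f"break-even: ${break_even:.2f}, bid: ${bid:.2f}")
```

The same function shows why the bottom of the funnel is easy to value: every input is something the dealership can measure from its own sales records.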

The problem here, however, is that there are only so many people ready to buy at one time. So typically, all this inventory gets sold out, and then you have to move upstream to buy more users. In particular, there's often a big concern that a product can get left out of the "consideration set" if it's not branded well. That is, even if you're buying as many Ford dealership leads as you can, if you can't position your gas-guzzling SUV for the eco-conscious set, those buyers never get the chance to filter down to the bottom.

When you're at the top of the funnel, it's hard to value the ROI from advertising to those users. The focus there is to just be in the game and inside the "consideration set," as I mentioned. So the targeting there isn't typically focused on in-market status, but rather on more qualitative things like:

  • demographics
  • psychographics
  • desirable editorial areas (to complement the values of your brand)
  • really cool ad creative
  • etc.

Anyway, it's not as quantitative, and the distance between the brand side and revenue is often larger than folks want it to be. But it works, even though the following classic advertising quote applies:

"Half the money I spend on advertising is wasted; the trouble is I don't know which half"
— John Wanamaker

As a result, it may not surprise you that data around in-market behaviors (aka Intent) are worth a LOT more than the more watered down stuff (aka Interest), particularly because you can prove to advertisers that the former will make them money.

For search engines, the ultimate collectors of consumer intent, you can get 1000x or more the monetization levels that you'd get from social networks. Social networks, being communication-oriented, have very little intent relative to other sites on the internet. This doesn't mean the social network data is worthless, but it's definitely hard to monetize.

Where are other places you can find intent?

  • Comparison shopping sites
  • Product reviews
  • Loan calculators
  • Shopping sites
  • Search engine marketing landing pages
  • etc.

Think of any service you might go to in order to make a transaction, or prior to making a transaction. The more of that data you have, the better off you are.

On the flip side, this data is scarce. The bottom of the funnel doesn't have many people, and because people typically aren't shopping for a product forever, the data is also perishable.

Data to traffic overlap
Once you have the data, you have to figure out how to use it to buy low and sell high. One of the big questions revolves around when and where your data is applicable, and this problem is sometimes referred to as "overlap."

Let's say that you collect data about a bunch of unique users on your site, and all those users are very valuable. Then you want to find those same users on some other site, which has cheapo inventory. The plan is that if you can buy that inventory on the cheap, and you can pick out the good stuff in there, then you can buy just the good stuff. Sounds great, right?

Problem is, what's the overlap of users between your site and this publisher's? If it's small, then you might not be able to write a check big enough to justify the expense of doing the transaction. If you have 100k users and you're only finding some % of them on some other site, then that's not so exciting. So you really need to aggregate a ton of data to make this transaction work. And ideally, you are able to use your own data, but also use the data of other similar companies. This allows for more opportunities to bring in new users, rather than just recycling the current set of users you already have.
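As a rough illustration of the overlap question, here is a small sketch that compares two sets of user identifiers. The function name and the toy IDs are assumptions for the example; in practice the identifiers would be something like cookie IDs, and the sets would be far larger.

```python
def overlap_stats(my_users, publisher_users):
    """Measure how many of my users also show up on a publisher's site.

    Both arguments are sets of user identifiers (hypothetical cookie IDs).
    """
    shared = my_users & publisher_users  # set intersection
    return {
        "shared": len(shared),
        # Fraction of my data that is usable on this publisher.
        "share_of_my_users": len(shared) / len(my_users) if my_users else 0.0,
        # Fraction of the publisher's audience I can actually target.
        "share_of_publisher": (
            len(shared) / len(publisher_users) if publisher_users else 0.0
        ),
    }


# Toy example: 4 known users, 8 users seen on the publisher, 2 in common.
mine = {"a", "b", "c", "d"}
theirs = {"c", "d", "e", "f", "g", "h", "i", "j"}
print(overlap_stats(mine, theirs))
```

Here only a quarter of the publisher's audience is targetable. If that shared count is tiny in absolute terms, the deal is probably not worth the transaction cost, no matter how valuable each individual user is.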

This issue of insufficient overlap has been alleviated somewhat recently. Since the mid-2000s, a number of ad networks have let you buy advertising by the cookie. Right Media, in particular, leads in these types of transactions, but Valueclick and others can provide similar arrangements as well. Since these ad networks have already pre-aggregated a huge amount of inventory (several hundred billion pageviews per month), you can get reasonable scale even if you don't have much data of your own. The downside, of course, is that it introduces yet another middleman into the mix, and since they know you are buying by the cookie, it's easy for them to charge you a little extra. Doh.

So to summarize the article above:

  • One potential issue that makes social networks resist data portability is the monetizability of the data
  • Not all user data is created equal: there's interest versus intent
  • Social networks generally produce lots of low-value interest data, which has weak ROI attached to it
  • Search engines, review sites, comparison shopping, etc all produce high-value intent data
  • Even if you have the data, you have to worry about whether or not you have enough of it to matter – although ad networks and exchanges have started to alleviate that

Questions and comments welcome!
