Own Your Data

A partial reconstruction of a discussion between Jeffrey Zeldman, Tantek Çelik, and a few others on the merits of self-hosting social content and publishing to various sites rather than aggregating locally from external sources.

I’ve been following discussions like this with some interest lately. Jeremy Keith posted a piece on his decision to self-host his bookmarks and cross-post to Delicious, rather than enter them in Delicious first and rely on their API to get his data out. Stephen Hay wrote a similar post on a shift in the way we post and consume content given the plethora of social content-sharing sites in existence today:

For a while we’ve posted our data all over the internet on all types of services. These services provide APIs so we can access the data we put into them, so that we can do things with that data. Read that again.

The prime example of all this lately is the social bookmarking site Delicious. In December 2010, a slide leaked from a Yahoo meeting indicated that Delicious was to be shut down. People on Twitter and elsewhere on the web collectively freaked out. Some people had thousands of bookmarks that would seemingly be gone, lost forever at the whim of Yahoo. People started to wonder if someone should instead build an open-source version of Delicious, and others pointed out how extraordinarily hard that would be. The end result is that people are starting to realize just how frail our data is. We post photos, articles, tweets, and whatever else we want to lots of different sites, but we don’t actually have control over that data once we hit “post.”

So is it better to self-host your content and push that data out to separate services, or post directly to those services and pull your content into local backups after the fact? I don’t have that answer. But I did find this exchange on Twitter to be quite fascinating, and wanted to have some sort of linear record of it for posterity.
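
For what it’s worth, the two approaches can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not anyone’s actual setup: `Post`, `Store`, `publish_then_push`, and `post_then_pull` are hypothetical names, and in-memory lists stand in for real sites and service APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    text: str
    origin: str  # name of the place where the post was published first


@dataclass
class Store:
    """Hypothetical stand-in for a self-hosted site or a third-party service."""
    name: str
    posts: list = field(default_factory=list)


def publish_then_push(text, own_site, services):
    """Publish on your own site first, then syndicate copies outward."""
    post = Post(text, origin=own_site.name)
    own_site.posts.append(post)
    for service in services:
        service.posts.append(post)
    return post


def post_then_pull(text, service, own_site):
    """Post on the service first, then back the post up locally."""
    post = Post(text, origin=service.name)
    service.posts.append(post)
    own_site.posts.append(post)  # in practice this would be a later API fetch
    return post


site = Store("my-site")
bookmarks = Store("bookmark-service")

publish_then_push("A link worth saving", site, [bookmarks])
post_then_pull("Another link", bookmarks, site)

# Either way, the local archive ends up holding both posts...
local_texts = [p.text for p in site.posts]
# ...but only the first one originated on a site you control.
origins = [p.origin for p in site.posts]
```

Both functions leave a full copy on your own site. The only thing that differs is which copy came first, and that is the point the discussion keeps circling.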

Update (Jan 10): Jeffrey Zeldman expounded on his thoughts from yesterday in a post on his own site:

We can’t preserve social relationships connected to our data. I can save my photos but not nice things you said about them.

Own Your Data on zeldman.com

Update 2 (Jan 10): Tantek has posted his own follow-up as well:

I’d rather host my data and live with such awkwardness in the open than be a sharecropper on so many beautiful social content farms.

On Owning Your Data on tantek.com


14 thoughts on “Own Your Data”

  1. There are already open source alternatives to Delicious (and have been for many years) but the issue is cloning the *network* that is Delicious, not the code/application itself. The same way there are clones of Digg, and Twitter, and every other big “social” site. The issue isn’t the functionality, the issue is getting a huge amount of users, and then being able to deal with those users. I run Scuttle, a Delicious-clone, on my own server. It supports me fine. It would probably support a dozen users, maybe 100 users, but not 1,000+ users. I don’t have the bandwidth/infrastructure/money to do that. Now if we all had our own Delicious clones running, and they were syndicated/federated with each other…

  2. Does anyone else find it somewhat ironic that this series of tweets is being preserved here in this post, never to perish? That being said, I agree with @zeldman and @andyrutledge.

  3. Zak – it’s preserved here in this post for as long as Tumblr and Disqus exist. That’s one of the issues at hand; by relying on third-party services rather than purely hosting my own (like I used to a long while ago), this “backup” is still at the whim of external content providers.

    At some point I’d like to port all this back to a home-grown solution, or at least back to a self-hosted install of something like WordPress.

  4. For five years, I’ve been using services’ APIs (Flickr, Twitter, Facebook, Delicious, etc.) to pull my social content into a local database, and publishing most (but not all) of it at jeffcroft.com. This is the opposite of what Tantek does (publishing locally, and then using the services’ APIs to push content to them). I think it’s really smart to have a local copy of your social data (Tantek’s right, it’s awesome for searching), but I’m not really sure it matters which way you go about it.

    But I can tell you why I chose to do it the way I do it: tooling.

    Let’s just take Twitter, for example. If I want to post a tweet, and I do it Tantek’s way, I have to build some local interface from which to tweet. That takes time and effort. And it’s likely to be feature-incomplete (for example, when Tantek posts a reply, it doesn’t seem to use reply-to-id when it gets posted to Twitter, and therefore doesn’t preserve the conversation thread). On the other hand, doing it my way, I don’t have to create any interface; I just download one of the bazillion great Twitter clients already available for any platform, and bam, I’m tweeting.

    Same goes for delicious (or pinboard, or whatever), Facebook, and Flickr — these services already have a ton of great tools built on top of them. For another example, when I’m reading RSS feeds on my iPad using the Reeder app, there’s a built-in function to “Post this to Pinboard.” If I went Tantek’s way, I wouldn’t have a function built into Reeder.

    I am glad that people are finally coming around to the idea that it’s good to have a local backup of your social content, but I think the way Jeremy and Tantek are suggesting we do it is the hard way.

  5. Jeff, one thing to note about the way Tantek does it is this: He is in complete ownership of his Tweets. He publishes them, and then syndicates them out to Twitter. This may have ramifications in the future because while most of us “create” our Tweets using Twitter (meaning, we use some Twitter client which publishes it *first* on Twitter), Tantek has a method where he publishes it *first* on his own site. This is an important distinction which may give you rights you don’t have if you publish on a 3rd party site and pull the data back into your own site. I wrote about a similar thing back in 2004: http://rasterweb.net/raster/2004/09/16/20040916130706/

  6. Tantek is a special case (i.e. I’m not aware of anyone doing similar) and I unfollowed him specifically because of his implementation. It drove me absolutely nuts that when I was interested by what he said on Twitter and clicked the link, I got… exactly the same comment. I think this adds zero value and is web pollution.

    Flipside: it’s irrelevant whether you publish locally and push or publish remotely and pull, as long as you end up with a full copy locally. I agree with the comment about missing functionality; we are mining our photos from Flickr to a local copy (brisbanephotos.com.au) rather than pushing for that reason.

  7. “it’s irrelevant whether you publish locally and push or publish remotely and pull” I disagree with this… If you read my post about the Waypoint License Agreement, you’ll see that where you publish first can have an effect on things; it all depends on the terms of service and what you agree to when you sign up for a service.

  8. It’s interesting. This discussion puts the lie to the entire idea of the “cloud” as a separate entity. Everything online is clouded. In the end, it’s about how much control one has over one’s published content – paper, local machine, cloud, or indexed.

  9. RS – I know, Tumblr’s been consistently unavailable lately. It’s one of the reasons I’m working on moving this site to WordPress instead, so that I have more complete control over it and an easier way of backing everything up.

  10. Andrew, I think when the word “cloud” first started to appear with regards to online services, it was more focused on organizations “outsourcing” their own needs. So instead of running your own mail servers, you’d just let some other company handle it “in the cloud” and not worry about managing your own servers, which would probably take more time + money than just paying someone else to maintain it all “in the cloud.” So for instance, at my previous job all mail servers were in-house. If the Internet connection went down, our email didn’t.
