Ma.gnolia Data Corruption: How And What To Back Up?
Larry Halff had to make a terribly difficult announcement about data corruption and loss at the social bookmarking site Ma.gnolia.

Ma.gnolia, originally uploaded by Stowe Boyd.
Leaving aside the specifics, the general issue raised is really important in the Web world we live in: what recourse do you have for backing up your data when you use these services?
I use Tumblr for something like ‘bookmarking’ these days, and I haven’t even researched how to archive my posts off the service yet.
Back in 2000, I lost my first blog, Message From Edge City, and all its content when Convey.com went out of business. But I guess I haven’t really learned my lesson. I occasionally make a backup of /Message, but not on any regular schedule. And if it were Typepad announcing data loss and corruption today, I would be howling at the moon.
Just as with the econolypse, we have to take some measure of personal responsibility for our digital selves.
I am now using the Gmail offline capability, but I am not sure that leads to a service-independent archive of email, and I am pretty sure that Gmail is only caching a few months of my email on my hard drive. I also doubt that any other app can read those archives, at least at this point.
All of my other online data — Flickr, GetHarvest, Basecamp, Twitter, etc. — relies on the procedures and policies of those companies to protect my data. I am not sure what recourse they offer. I am sure in general their service agreements absolve them of any responsibility, but I am not sure what practical alternatives exist to trusting them.
Consider the use case of posting screenshots to my blog. I shifted over to relying on Flickr for this a few years ago (capturing screenshots with Skitch, uploading to Flickr, and then cross-posting to Typepad) because I wanted to store my screenshots somewhere independent of any blog platform. Now, I am in the process of moving from Typepad to Tumblr, and that investment will pay off… so long as Flickr doesn’t lose the pictures.
Is there really a way to protect yourself, totally? Do you really need to retain a complete copy of everything you create online?
Alternatively, is there a service that can do this for me? Imagine a tool (perhaps related to Google’s long-awaited Gdrive, or a new offering from Amazon) that would let me register all my accounts with all manner of online services and would periodically cache a backup of all their contents. This service would have to figure out how to do this on a case-by-case basis, using the xpc interface for blogs and other approaches for other services, like the Basecamp XML export, but at least it could do so more regularly and efficiently than I could. And since it would be a cloud-based service, I wouldn’t have to allocate an increasingly large proportion of my hard drive to it.
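
To make the idea a little more concrete, here is a rough Python sketch of how such a tool might be organized. Nothing here is a real product or a real API: the names `Account`, `BackupRegistry`, and `run_due` are hypothetical, and each "exporter" is just a placeholder function standing in for whatever export mechanism a given service actually offers. The sketch only shows the shape of the thing: register accounts, attach one export routine per service, and periodically write out whatever is due.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical: an exporter takes a username and returns that account's
# exported content as raw bytes. Real services would each need their own.
Exporter = Callable[[str], bytes]


@dataclass
class Account:
    service: str                      # e.g. "basecamp" (label only)
    username: str
    interval: timedelta               # how often this account should be backed up
    last_backup: Optional[datetime] = None


class BackupRegistry:
    """Hypothetical registry: accounts plus one exporter per service."""

    def __init__(self, archive_dir: Path) -> None:
        self.archive_dir = archive_dir
        self.exporters: Dict[str, Exporter] = {}
        self.accounts: List[Account] = []

    def register_exporter(self, service: str, exporter: Exporter) -> None:
        self.exporters[service] = exporter

    def register_account(self, account: Account) -> None:
        self.accounts.append(account)

    def run_due(self, now: datetime) -> List[Path]:
        """Back up every account whose interval has elapsed; return files written."""
        written: List[Path] = []
        for account in self.accounts:
            due = (account.last_backup is None
                   or now - account.last_backup >= account.interval)
            if not due:
                continue
            exporter = self.exporters.get(account.service)
            if exporter is None:
                # No known way to export this service yet; skip it.
                continue
            data = exporter(account.username)
            target = (self.archive_dir / account.service
                      / f"{account.username}-{now:%Y%m%dT%H%M%S}.bak")
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            account.last_backup = now
            written.append(target)
        return written
```

In use, something like a nightly scheduled job would call `run_due(datetime.now())`. The case-by-case work lives entirely in the exporters, which is exactly the part I would want someone else to figure out for me.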