The jinx on building a unified Personal Content Store. And why Google will break it. (Part 2)

August 19th, 2008

This is the second part of a post on building a unified personal content store. You can find the context for this post in the first part.

It’s well established that successful software development depends on three core principles: focusing on delivering concrete value to users, having a coherent technical architecture, and building incrementally. These principles have been offered up both as part of rigorous processes and in the form of simple practical advice.

What is striking about all the attempts to build the unified personal store has been the almost exclusive focus on technical architecture. A classic case of architecture astronauts running wild! The other two pillars were given short shrift!

1. Delivering concrete value - The focus was on satisfying an abstract and generic requirement instead of meeting specific (and lucrative) requirements to start with. Even after many years, PiCorp and Chandler were only able to demonstrate trivial usage scenarios.

2. Building incrementally - The focus was on creating a mature and generic technical architecture up front, as opposed to incrementally refining the architecture over multiple releases. This also ensured that there was very little user feedback and business validation.

In a lot of ways, Microsoft was uniquely positioned to deliver on this vision of a unified store; I did write about this a couple of years back. However, looking at it today, it’s clear that it is in fact a company like Google that is ideally poised to overcome the issues identified above.

1. Delivering concrete value is easier when there is a large portfolio of apps. Google can focus on offering up small integrations among a few of its applications. Notice how seamlessly Google Docs comes together with Google Mail and Calendar.
2. Building incrementally is easier when you operate out of the cloud. Google is not bound by traditional software release cycles. It can push features, gather feedback and get validation within a matter of weeks!
3. And building a generic architecture is easier when you have access to top talent! Google has the technical capability to incrementally transform the underlying architecture into something more generic and allow others to plug into its unified store. Having a high profile ensures quick buy-in from external developers.

A look at Google’s Data API list instantly gives you an idea of how far Google has already traveled in the direction of a unified store. What’s more, some of the folks, like Mark Lucovsky, who were driving WinFS at Microsoft are now at Google! It’s only a matter of time before we see this thing coming together.

The jinx on building a unified Personal Content Store. And why Google will break it. (Part 1)

August 17th, 2008

One of the ideas that has fascinated me for a long time is the possibility of having a highly structured, unified Personal Content Store. So what is this beast? Let me explain.

Consider the two most common types of content that sit on your PC: the stuff in your Outlook folder and the stuff in your Documents folder. Now imagine you want to pull together all content involving a particular group of people over a particular period of time about a particular topic. Bizarrely, there is no efficient way for you to do this on the desktop today.

This is precisely the problem that a unified store is expected to solve. The theory is that if all apps share a generic data store, it will be easy to connect entities created in one app to those created by another. So the author of a spreadsheet can be the same entity as the author of an email. Remember that this is a generic concept: for example, you could navigate from events recorded in your calendar to the associated expenses in your personal finance app!
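To make the idea a little more concrete, here is a minimal Python sketch, assuming a hypothetical in-memory store in which every app writes typed entities and links into one shared graph. The names `UnifiedStore`, `Entity`, `items_involving` and the relation labels (`author`, `attendee`, `expense`) are illustrative assumptions only, not the design of WinFS, Piworx, Chandler or any other real system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entity:
    """One item in the shared store: a person, email, spreadsheet, event, expense..."""
    id: str
    kind: str
    props: dict = field(default_factory=dict)


class UnifiedStore:
    """Hypothetical store shared by every app: entities plus typed links between them."""

    def __init__(self):
        self.entities = {}
        self.links = []  # (source_id, relation, target_id) triples

    def add(self, entity):
        self.entities[entity.id] = entity
        return entity

    def link(self, source, relation, target):
        self.links.append((source.id, relation, target.id))

    def related(self, entity, relation):
        """Follow one kind of outgoing link, e.g. from an event to its expenses."""
        return [self.entities[t] for s, r, t in self.links
                if s == entity.id and r == relation]

    def items_involving(self, person, start, end, topic):
        """Items from any app linked to `person`, dated in [start, end], whose title mentions `topic`."""
        hits = {}
        for s, r, t in self.links:
            if t != person.id:
                continue
            item = self.entities[s]
            when = item.props.get("date")
            title = item.props.get("title", "").lower()
            if when and start <= when <= end and topic.lower() in title:
                hits[item.id] = item  # dict de-duplicates items linked more than once
        return list(hits.values())


# Usage: the mail app and the spreadsheet app point at the *same* person entity.
store = UnifiedStore()
alice = store.add(Entity("p1", "person", {"name": "Alice"}))
mail = store.add(Entity("m1", "email", {"title": "Budget review", "date": date(2008, 8, 1)}))
sheet = store.add(Entity("s1", "spreadsheet", {"title": "Q3 budget", "date": date(2008, 8, 3)}))
store.link(mail, "author", alice)
store.link(sheet, "author", alice)

# The calendar app and the finance app write into the same graph.
trip = store.add(Entity("e1", "event", {"title": "Client visit", "date": date(2008, 8, 5)}))
taxi = store.add(Entity("x1", "expense", {"title": "Taxi", "amount": 40}))
store.link(trip, "attendee", alice)
store.link(trip, "expense", taxi)

print([e.id for e in store.items_involving(alice, date(2008, 8, 1), date(2008, 8, 31), "budget")])
# → ['m1', 's1']
print([e.props["title"] for e in store.related(trip, "expense")])
# → ['Taxi']
```

Note that neither query cares which app created an item; once the person and the event are shared entities, the "people + period + topic" question becomes a simple graph walk. A real store would of course need persistence, schemas and access control, none of which this sketch attempts.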

Now, this idea has been around for years. But somehow, all attempts to build this thing have ended up as spectacular failures. Here are a few.

1. Microsoft/WinFS. These folks have been at it the longest; Bill Gates himself had been pushing for this over many years! WinFS, which was originally expected to ship along with Vista, was finally formally abandoned in June 2006.
2. PiCorp/Piworx. Another ambitious attempt spearheaded by Paul Maritz, who incidentally was a successful VP at Microsoft! After languishing for a couple of years, the company was recently acquired by EMC.

3. OSAF/Chandler. This was an open-source effort driven by Mitch Kapor, the guy who founded Lotus and created the Lotus 1-2-3 spreadsheet. Even after 6 years, the product has not made much of a dent. Its painful story was the subject of the book Dreaming in Code.

Even with all these supermen pushing for it, why have things not panned out? Why this jinx when it comes to building something that there is an obvious need for?

In the next post, I will try to answer this very question, and explain why Google appears to be well positioned to finally break this jinx.

Semantic Web != NLP + Agents. At least, not entirely!

August 12th, 2008

There is an interesting article on CNet by Stefanie Olsen exploring what’s next on the VC horizon. The starting premise of the article (“it’s all about the data”) is obviously something that I enthusiastically embrace. But then, what followed did make me sigh.

“The first wave of Internet investing dealt with commercializing the Web, helping companies like and eBay get on their way. The second wave has been about helping people socialize and connect through sites like Flickr, YouTube, and Facebook. The third, venture capitalists say, will be about making sense of all the data people create around the Web, and then searching for patterns in the data to improve the delivery of personalized content, search results, or advertising.”

When you declare to a bunch of VCs that you are a semantic web entrepreneur, the majority of them assume that you are building technology to understand one of two things:
1. Human language: processing natural language (NLP) to provide better web search (e.g. Powerset) or better advertising (e.g. Peer39).
2. Human intent: analysing user activities such as searching and browsing, and having “agents” go out and identify relevant information that can be pushed to the user (e.g. Twine).

The pitch that these VCs have heard a zillion times goes something like this: “Users are lazy. They are lazy producers of information and therefore we should understand what they are really saying. They are lazy consumers of information and therefore we should figure out what they really want and give it to them”.

It makes a great pitch. But unfortunately, it has been rather hard to deliver on that pitch. Powerset never seemed to take off. And Radar Networks’ Twine, which hopes to be the poster boy of personalized content discovery, has yet to deliver on its promise.

Both of these products attempt to use semantic web technologies to deliver value by exploiting the implicit semantics in information. However, there is an alternative application of semweb technologies, based on delivering value by exploiting the explicit semantics in information: value that could take the form of improved content reuse, navigation and end-user processability.

I for one believe that it’s the latter that holds the greater promise for the near future. And that’s the belief upon which we are building our product here at OneBigWeb!

PS: By explicit semantics, I am referring to content annotations that have been explicitly added by the user. Mechanisms for doing so include microformats, RDFa and GRDDL.
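To illustrate what explicit semantics buys a consumer of content, here is a minimal sketch that pulls contact details marked up with the hCard microformat out of a page, using only Python's standard `html.parser`. It assumes the simplest flat case (a `vcard` container holding elements with class names like `fn`, `org`, `tel`, `email`); the helper names `HCardParser` and `extract_hcards` are hypothetical, a real consumer would need the full microformats parsing rules, and RDFa or GRDDL content would need different tooling altogether.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names picked up inside a vcard.
HCARD_PROPS = {"fn", "org", "tel", "email"}
# Void elements never get an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HCardParser(HTMLParser):
    """Collects flat hCard records as dicts of property name -> text."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self.depth = 0             # nesting depth inside the current vcard (0 = outside)
        self.current_prop = None   # (property name, depth of the element carrying it)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0:
            if "vcard" in classes:     # entering a new card
                self.depth = 1
                self.cards.append({})
            return
        self.depth += 1
        props = HCARD_PROPS.intersection(classes)
        if props:
            self.current_prop = (sorted(props)[0], self.depth)

    def handle_data(self, data):
        if self.depth and self.current_prop:
            name = self.current_prop[0]
            self.cards[-1][name] = self.cards[-1].get(name, "") + data

    def handle_endtag(self, tag):
        if tag in VOID or self.depth == 0:
            return
        if self.current_prop and self.current_prop[1] == self.depth:
            name = self.current_prop[0]
            self.cards[-1][name] = self.cards[-1][name].strip()
            self.current_prop = None
        self.depth -= 1


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


page = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Corp</span>
</div>
<p>Unrelated text</p>
<div class="vcard"><span class="fn">Alice <b>Example</b></span><br>
  <span class="tel">+1-555-0100</span></div>
"""
print(extract_hcards(page))
# → [{'fn': 'Jane Doe', 'org': 'Example Corp'}, {'fn': 'Alice Example', 'tel': '+1-555-0100'}]
```

The unannotated paragraph is simply ignored: the tool does not have to guess what the text means, because the author has already said so in the markup. That is the contrast with the NLP-driven approach described above.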

Incentivizing semantic content publication

August 11th, 2008

When the semantic web is pitched to developers, one of the first questions that inevitably comes up concerns the motivation for creating semantic content. “Why would anyone bother making the effort to publish content in this funny-looking format in the first place?” they ask.

I was excited to discover (through Valentin’s blog) that there’s going to be an entire workshop dedicated to answering this very question at the upcoming International Semantic Web Conference (ISWC2008).

As I see it, there are likely to be two major drivers that will motivate publication of semantic content in the near future.

Increasing audience: If a page is annotated with semantic content, search engines like Yahoo now support displaying this metadata as part of the search results (through SearchMonkey; check out the sites exploiting this capability). This has the potential to improve the visibility of a piece of content and thus increase its audience. Other aggregators are likely to follow suit.

Increasing utility: A whole bunch of small utilities are being created to increase the navigability, reusability and processability of semantic content found on a page. For example, a Firefox extension, Tails, allows users to clip and save contact or event info found on a page. Another example is the extension from AdaptiveBlue, which exploits semantic content to allow users to quickly navigate to related sources (go from a product review to a product listing page!).

How do you explain the semantic web in 5 mins?

July 7th, 2008

When introducing the semantic web idea to an audience that has never heard of it, it’s best to focus on a couple of core concepts and then incrementally communicate the idea through dialogue and discussion.

This is the approach that I have adopted at events like barcamps and devcamps. Here is the quick ten-slide presentation which I have used to kick off the sessions.

The Third Transformation

July 7th, 2008

In the early 90’s, when I was still in college, I remember reading an interview with Louis Gerstner, then chief of IBM. When asked how he saw the opportunity landscape for IBM, he spoke of the sheer extent of content and data in the world that was still in analog form, and how each of these instances represented a huge opportunity for a computing company.

What was interesting was that everything analog was being seen as a potential opportunity. It’s not often that you can take one single transformative principle, apply it across the landscape of human activity and then go on to effectively reinvent how it’s all done. An entrepreneur just has to draw up a huge matrix and start ticking off the boxes which represent the most value!

Looking back, one can see three such universal transformative principles that are having a big impact on the world that we live in.

1. Analog to Digital data: The one that Gerstner spoke about. Note that it is not just about transferring data from ledgers into financial accounting apps but includes everything: communications, media distribution, paper money, legal documents … the whole lot.
2. Offline to Online data: This is the one that has occupied us over the last decade, making all content available to anyone, anytime, anywhere, on any device. Every single domain that made the leap from analog to digital was now ready to make this leap.

3. Disjoint to Linked data: This is the one that we are just getting started on, turning all content in the world into one unified database. Personal, enterprise and web data are simply differentiated by a set of privacy attributes!

As you might have guessed from the name of my company, our focus is on the third transformation – going from disjoint pools of data to a linked ocean of data.

Hello World (again!)

July 4th, 2008

After a very long blogging hiatus, I thought it was time to make a fresh start. Henceforth, this blog will focus primarily on the work of my company, OneBigWeb. You could check out the new website!

Check back for more over the next few days!

PS: I have removed most of the past posts from this blog. I have, however, kept a small selection; you can access them through the sidebar.