Distributed (micro-) blogging / how many markets does your protocol support?

I have been idly thinking about a distributed Twitter. The blogging technology we already have does a lot of what we want: your stream of updates ought to be just an Atom feed, and you can subscribe to and aggregate the atom feeds of the people you follow. What does this lack compared to Twitter?

A nice user interface. Surely just a small matter of programming.
Quick updates. For this use pubsubhubbub.
Protected feeds. I'm going to ignore this problem and hope the crypto fairy comes and waves her magic wand to make it go away.
Notifications that someone you don't know has mentioned you or replied to you.

The last one is crucial, because open communication between strangers is a key feature of Twitter. But if you allow strangers to contact you then you are vulnerable to spam. A centralized system has the advantage of concentrating both information about how spammers behave and the expertise to analyse and counteract them. In a distributed system spam becomes everyone's problem, and gives everyone an awkward dilemma between preserving privacy and collecting data for spam fighting.

An alternative approach, since feeds are generally public, is to view this as a search problem. That is, you rely on a third party to collect together all the feeds it can, cull the spammers, and inform you of items of interest to you - mentions, replies, tags, etc. This is a slightly centralized system, but you a search provider is not involved in communication between people who know each other, and search is open to competition in a way that most social networking services are not.

The system as a whole then has (roughly) three layers: clients that can update, collect, and display Atom feeds; servers that host Atom feeds; and search services that index the servers. All this tied together with pubsubhubbub and HTTP. In a successful system each of these layers should be a competitive market with multiple implementations and service providers.

This three tier structure is reminiscent of the web. But a lot of Internet applications have only a two tier structure. This led me to think about what kinds of systems have different numbers of markets.

Zero markets

These are proprietary systems entirely controlled by a single vendor. For instance, Skype, AOL (in the mid-1990s), many end-user applications. A lot of web applications fall into this category, where the client software is downloaded on demand to run in the browser - treating the web (servers and browsers) as a substrate for the application rather than a part of it.

One market

A proprietary system with a published API. Operating systems typically fall into this category, and programmable application software. A characteristic of web 2.0 is to provide APIs for web services.

Social networking typically falls somewhere between zero and one on this scale, depending on how complete and well supported their API is. Twitter was a one market system for years, but is now restricting its developers to less than that. Google's services are usually somewhere between zero and one, often closer to zero.

Two markets

An open system, with multiple implementations of the provider side of the interface as well as the consumer side. Many Internet applications are in this category: telnet, mail, usenet, IRC, the web, etc. etc.

Unix / POSIX is another good example. Perhaps some operating systems are slightly more open than a pure one-market system: NextSTEP has a clone, OpenSTEP, but it was never popular enough to become an open system and then Mac OS X raced away leaving it behind. Wine is playing permanent catch-up with Microsoft Windows.

Many programming languages are two-market systems: Ada, C, Fortran, Haskell, JavaScript, Pascal. Lua, Python, Ruby, and Tcl to some extent - they have reference implementations and clones rather than an independent open specification. Java has multiple implementations, even though the spec is proprietary. Some successful languages are still only one market systems, such as Perl and Visual Basic.

Three markets

The key feature here seems to be that a two market system doesn't provide enough connectivity; the third tier links the second tier providers together. This is not layering in the sense of the OSI model: for example, many two-tier Internet applications rely on the DNS for server-to-server connections, but this is a sublayer, not implemented as part of the application itself. In the web and in my distributed (micro-) blogging examples, the search tier operates as both client and server of the application protocol.

Linux can be viewed as a three-tier POSIX implementation. Linux's second tier is much more fragmented than traditional Unix; so Linux distributions provide a third tier that ties it together into a coherent system.

Perhaps the Internet itself is in this category. The IP stack in user PCs and application servers are consumers of Internet connectivity; Internet access providers and server hosting providers are the second tier; and the third tier is the backbone network providers. This is obviously a massive oversimplification - many ISPs can't be easily classified in this manner. (But the same is also true of two-market systems: for example, webmail providers act in both the client and server tiers of Internet mail.)

More?

The DNS has an elaborate multi-tiered structure: stub resolvers, recursive resolvers, authoritative servers, registrars, registries, and the root. This is partly due to its hierarchial stricture, but the registrar / registry split is somewhat artificial (though it is similar to manufacturer / dealer and franchising arrangements). Though perhaps it can also be viewed as a three tier system: The user tier includes resolvers, DNS update clients, zone master file editors, and whois clients for querying registries. (Grumpy aside: Unfortunately editing the DNS is usually done with proprietary web user interfaces and non-standard web APIs.) The middle tier comprises authoritative servers and registrars, not just because these services are often bundled, but also because you can't publish a zone without getting it registered. The third tier comprises the registries and the root zone, which provide an index of the authoritative servers. (The interface between the second and third tiers is EPP rather than DNS.)

Conclusion

I think I have found this classification an interesting exercise because a lot of discussion about protocols that I have seen has been about client / server versus peer-to-peer, so I was interested to spot a three-tier pattern that seems to be quite successful. (I haven't actually mentioned peer-to-peer systems much in this article; they seem to have a similar classification except with one less tier.) This is almost in the opposite direction to the flattened anarchism of cypherpunk designs; even so it seems three-tier systems often give users a lot of control over how much they have to trust powerful third parties. Unfortunately the top tier is often a tough market to crack…