May 22, 2007

Understanding hypermedia as the engine of application state

REST's uniform interface is defined by four constraints:

  1. separation of resource from representation,
  2. manipulation of resources by representations,
  3. self-descriptive messages, and
  4. hypermedia as the engine of application state.

The constraint that attracts the most mystical reverence is the fourth one. But, really, it's not that hard to understand: it's just an extra level of abstraction above a traditional message-passing architecture. Here's an attempt to explain, based on my current understanding.

In a tightly coupled message-passing system, consumers normally depend directly on providers.

When we introduce an interface, we want to separate the concerns of the consumer from those of the provider. This is good software engineering, in that it enables interfaces to be oriented towards broad classes of consumers, and enables substitution of the provider's location, implementation, or even the organization that provides the service.

But what is the content of the interface? How should it be constrained, and what is the granularity?

Perhaps the best way to understand this is to look at the framing questions one asks when organizing requirements into an architecture. I like Zachman's approach, which (paraphrasing) asks six questions of any system: what (data), how (function), where (network), who (people), when (time), and why (motivation).

Technology-focused architecture tends to concentrate on the "what" and the "how": data schemas and the operations that act on them.

This is not to say that architects don't think about the other areas, but most runtime architectures have few intrinsic constraints that explicitly support them.

Hypermedia as the engine of application state is about making sure that your interfaces constrain the "when": logical timing & ordering expectations. Since interfaces are hypermedia types, they flexibly describe the "what", "how", "who", and "where" through the uniformity of resource identifiers & data transfer semantics. The "when" is driven by the context of the link within the media.

For example, web browsers have at least two well-understood and related state machines for different hypermedia types: one for HTML, and another for CSS. In HTML, tags like IMG, OBJECT, and SCRIPT represent resources that enrich the current context; anchors (A HREF) are side-effect-free state transitions; and FORM tags & their children describe side-effect-inducing state transitions. In CSS, by contrast, links are only enriching - providing URIs to background images, for example.
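To make the HTML half of that concrete, here's a rough Python sketch (using only the standard library's HTMLParser, and in no way a real browser engine) that buckets hypermedia controls by the kind of state transition each one implies:

    # A minimal sketch of the HTML "state machine" described above:
    # classify each hypermedia control by the transition it implies.
    from html.parser import HTMLParser

    class HypermediaClassifier(HTMLParser):
        def __init__(self):
            super().__init__()
            self.enriching = []  # fetched automatically to enrich the current context
            self.safe = []       # user-triggered, side-effect-free transitions
            self.unsafe = []     # user-triggered, side-effect-inducing transitions

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag in ("img", "object", "script"):
                self.enriching.append(attrs.get("src") or attrs.get("data"))
            elif tag == "a" and "href" in attrs:
                self.safe.append(attrs["href"])
            elif tag == "form":
                # Strictly, GET forms are safe too; POST implies side effects.
                method = attrs.get("method", "get").lower()
                (self.safe if method == "get" else self.unsafe).append(attrs.get("action"))

    c = HypermediaClassifier()
    c.feed("""
      <img src="/logo.png">
      <a href="/orders">View orders</a>
      <form method="post" action="/orders"><input name="item"></form>
    """)
    print(c.enriching)  # ['/logo.png']
    print(c.safe)       # ['/orders']
    print(c.unsafe)     # ['/orders']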

Now consider the typical web services composition, where a consumer orchestrates calls across several provider-defined interfaces.

Governed service composition usually adds canonicalization of the "what" and "how" through standard orchestrations and schemas, but the burden is still on the consumer to address timing considerations. This is the case where several services share schemas, but still define their own operations & service definitions.

If the providers evolve capabilities in ways that affect the timing and order of operations across the composition, the client breaks. There's no way for an agent to predict which operations are side-effect-inducing and which are side-effect-free, so it can't understand the impact. Furthermore, this approach doesn't loosely couple the authority and location of information from the service providing it, since data identifiers are still hidden behind the facade of the service interface. Once again, the burden is on the consumer to maintain the context associated with an identifier so that it can be used at a later time.
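As an illustration, a consumer in such a composition might look like this sketch (the service and operation names are hypothetical). Note that the ordering of calls and the meaning of the returned identifier live entirely in the client:

    # Hypothetical tightly coupled consumer: the call ordering, the meaning
    # of `order_id`, and which endpoints accept it are all baked in here.
    def place_and_track_order(order_svc, shipping_svc, item):
        order_id = order_svc.createOrder(item)   # must be called first
        order_svc.submitOrder(order_id)          # must follow createOrder
        # The client must "just know" that the shipping service accepts this
        # same identifier, and that tracking is only valid after submission.
        return shipping_svc.trackShipment(order_id)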

Most well-thought-out SOA approaches, or even "naive" REST approaches, begin to use many of REST's constraints: they adopt URIs for most interesting things in the system, and take advantage of a uniform transfer protocol underlying the representations. But they sometimes choose to ignore the hypermedia constraint.

With this approach there is still a big benefit: the separation between the semantics, representation, and location or authority of information is made explicit. But the end result is still somewhat tightly coupled: the temporal assumptions are defined & controlled completely by the provider interfaces, and the consumer is subject to their whim.
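In practice that looks something like this sketch (the base URI and path patterns are hypothetical): the consumer manufactures URIs and call sequences from out-of-band documentation, so any change to the provider's structure or sequencing breaks it:

    import urllib.request

    BASE = "https://api.example.com"   # hypothetical provider

    def track_order(order_id):
        # URI structure and call ordering are copied from the provider's
        # documentation, not discovered from the messages themselves.
        urllib.request.urlopen(f"{BASE}/orders/{order_id}/submit", data=b"")
        return urllib.request.urlopen(f"{BASE}/orders/{order_id}/shipment").read()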

With hypermedia, the ordering of interactions, the discovery of capabilities, and the independence of location and authority boundaries become an intrinsic function of the media type and its embedded URIs. All a consumer requires is a single URI to bootstrap the interaction process. The composability of information is defined by the logic behind the media type itself, instead of being tightly coupled into a client's consumption of today's available & discoverable capabilities. The consumer agent, whether human or automated, only has to specify a high-level plan, or goal, and have a set of general state machines which are dynamically selected based on message metadata.
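Here's a rough sketch of such a consumer, assuming a hypothetical JSON-based hypermedia type whose links carry "rel" and "href" members. The client is handed one bootstrap URI and a goal, and discovers everything else from the representations themselves:

    import json
    import urllib.request

    def fetch(uri):
        # Dereference a URI and parse the (hypothetical) hypermedia JSON type.
        with urllib.request.urlopen(uri) as resp:
            return json.load(resp)

    def pursue(bootstrap_uri, goal_rel):
        # Follow links until one with the goal relation is found.
        doc = fetch(bootstrap_uri)               # a single URI bootstraps everything
        while True:
            links = {l["rel"]: l["href"] for l in doc.get("links", [])}
            if goal_rel in links:
                return fetch(links[goal_rel])    # goal reached
            if "next" not in links:
                raise LookupError(f"no path to {goal_rel!r} from here")
            doc = fetch(links["next"])           # let the media type drive the "when"

The design point is that pursue() encodes no URI structure and no fixed operation ordering; if the provider reorders its workflow, the links in the representations change, and the same client still works.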

This isn't BPM-land, where analysts merrily draw their processes and redraw them when capabilities change, with a deploy-time/run-time separation. It is, rather, an online, agent-oriented approach. It suggests that composition of unrelated services should occur through introducing a media type that fits the motivation for the composition. It is not a typical way to think about interface design.

So far, the imperfect way I think about it, given my OO background, is as passing an object graph to an agent, where the pointers are either information/value objects that describe the media type, or identifiers of information resources. The agent can choose to dereference an identifier, and receives a new graph, of a new type: a state change in a set of composable state machines.
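In code, the analogy might look like this sketch (the graph shape is invented for illustration): edges are either inline values or opaque resource identifiers, and dereferencing an identifier yields a fresh graph of a possibly different type:

    from dataclasses import dataclass

    @dataclass
    class Value:
        # Inline information: part of this representation's media type.
        data: object

    @dataclass
    class ResourceRef:
        # A pointer to another information resource, by URI.
        uri: str

    # A representation is a graph mixing values and references.
    order_graph = {
        "status": Value("submitted"),
        "shipment": ResourceRef("https://api.example.com/shipments/42"),
    }

    # Dereferencing the reference is a state change: the agent receives a new
    # graph, of a new type, and its composable state machines advance, e.g.
    # (using the fetch() helper from the earlier sketch):
    #   shipment_graph = fetch(order_graph["shipment"].uri)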

When we think about WS-* style services, there's little notion of graphs of information resources. One exchanges documents with embedded, "managed" data identifiers, like primary keys. The client has to maintain the context of what each identifier signifies, and has to know the provider's assumptions about how, when, and where the identifier may be used. All of these assumptions are tacit, and hence, tightly coupled.
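The contrast shows up directly in the payloads. With a managed identifier the client must carry all the context itself; with a URI, the authority, location, and access mechanism travel with the reference. Both payloads below are invented for illustration:

    # WS-* style: an opaque, "managed" key. Which service owns it, which
    # operations accept it, and when it may be used are tacit, out-of-band
    # knowledge the client must maintain.
    ws_response = {"orderId": 12345}

    # Hypermedia style: the reference is self-describing. Any agent that
    # speaks the transfer protocol and the media type can dereference it,
    # now or later, without knowing who minted it.
    rest_response = {"order": "https://orders.example.com/orders/12345"}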

Posted by stu at May 22, 2007 12:45 AM