ACM Middleware Conference: Pub-Sub and Related Topics


Patterns vs. Research Papers

After I finished teaching my tutorial someone came up and essentially said that the patterns are not that useful for their work because they are mostly old stuff. For example, topic-based publish-subscribe is soo 1999 and not really interesting anymore. This discussion reminded me of one important difference between pattern authors and research paper / thesis authors. The goal of pattern authors is to find common themes in existing usage and to make them easily understood. The goal of research paper authors is curiously opposite. They need to find something that no one has done yet and make it appear very different from everything else. So I guess you can expect a little bit of natural friction between the two groups. Cynics might say that pattern books only contain stuff you already knew while research papers are never interested in something that is actually useful in practice.

Still, I like to attend more academic conferences from time to time. First, JavaOne et al. often degrade into a series of advertorials, so it is refreshing to see less snazzy but more information-rich presentations. It is also interesting to see what the researchers are working on because in a year or two some of that work might find its way into commercial products. It's a little bit like looking into the future. So what were some of the current research topics that were discussed?

Publish-Subscribe

One interesting track focused on publish-subscribe messaging. Most of the talks focused on efficient implementations of content-based publish-subscribe. A Publish-Subscribe Channel sends a copy of a message to multiple recipients, based on the subscription preferences of the subscribers. It can be useful to distinguish levels of sophistication for subscribers to express their subscription preferences:

Simple Topics. The simplest way to subscribe to messages is to specify a topic. Any message published to this specific topic will be sent to the subscriber. The topic is often expressed as a simple string. Within the context of our book we would actually consider each topic a logical Channel.
Topics with Wildcards. Wildcards are a simple enhancement to simple topics. Most of these systems use the topic string to represent a hierarchy of individual "topic nodes". For example, a message might be sent to the topic "prod.customer.new" to indicate that the message is part of a production system, relates to customer data, and indicates that this message is the result of a new customer being added. Subscribers can subscribe to "prod.customer.*" to receive all messages published to topics that start with "prod.customer", such as "prod.customer.new" or "prod.customer.delete". This type of publish-subscribe messaging is quite common, for example in TIBCO Rendezvous. Interestingly, the JMS specification does not define such a topic hierarchy but virtually all JMS implementations support it with slightly varying syntax notations.
Content-based Subscription. Not all ontologies are easily represented in a hierarchy tree as the "Topics with Wildcards" approach requires. A topic tree forces all messages to be attached to a hierarchy. If a message is identified by a number of independent fields and values, this mapping can be unnatural. For example, if messages are meant to be routed by business divisions (operations, finance, HR, etc) and environment (production, testing, etc), should business division sit above environment in the tree or below? They are really two independent dimensions of the messaging space. Content-based subscriptions support this model by allowing subscribers to express their subscription preferences by boolean expressions on fields rather than a topic hierarchy. For example, a subscriber could subscribe to " Division='HR' and Environment='Production' ". Any message whose fields match this expression will be routed to the subscriber. The JMS API supports this concept via Message Selectors (the expressions are limited to header and property fields). Message Selectors allow subscribers to specify an expression to apply against incoming messages. Applying Message Selectors against a publish-subscribe channel (a JMS topic) results in the same semantics as content-based publish-subscribe. However, most implementations implement this functionality in the receiver using a reactive filtering approach as opposed to in the messaging infrastructure. This implementation is naturally inefficient because a message has to be delivered to the end point just to find out that the message does not match the expression. Most of the talks at ACM Middleware were concerned with how to make this implementation more efficient.
Content-based Subscription with Aggregation. When using content-based subscriptions it is likely that a message fulfills part of a condition but does not contain the fields related to another part of the condition. Advanced experimental middleware can store this partial message and then wait for another message that might fulfill another part of the expression. For example, an endpoint can subscribe to " A='X' and B='Y' ", with both A and B being optional fields. If one message contains A='X' and another B='Y', the messaging infrastructure can wait for both messages and merge their content. Essentially, the messaging infrastructure implements an Aggregator. The completeness condition of the Aggregator is to fulfill all pieces of the expression. Naturally, there are a number of issues to be resolved. For example, what if a message fulfills one part of the expression but violates another? How should we merge messages if each message could have a different value for the same field? What should we do if we receive multiple messages that fulfill the same partial expression while waiting for another part of the expression? Should we keep all messages or just the first or the last? These are exactly the issues the research papers deal with.
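
To make the four levels a bit more concrete, here is a minimal Python sketch. It is not how any particular product or research prototype works. All names (topic_matches, content_matches, Broker, AggregatingSubscription) are hypothetical and made up for illustration. The sketch assumes that "*" stands for exactly one topic node (real products differ in their wildcard syntax), that a content-based subscription is a plain AND of field equality tests, and that the aggregating subscription simply ignores fields that do not match and resets itself once complete. Those last choices are arbitrary answers to the open questions listed above, not recommendations.

```python
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, str]
Predicate = Callable[[str, Message], bool]


def topic_matches(pattern: str, topic: str) -> bool:
    """Compare dot-separated topic nodes; '*' stands for exactly one node (assumed syntax)."""
    pattern_nodes = pattern.split(".")
    topic_nodes = topic.split(".")
    if len(pattern_nodes) != len(topic_nodes):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_nodes, topic_nodes))


def content_matches(required: Message, message: Message) -> bool:
    """True if every required field is present with the required value (AND of equality tests)."""
    return all(message.get(field) == value for field, value in required.items())


class Broker:
    """Hypothetical broker that evaluates subscriptions itself, before delivery."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[str, Predicate]] = []

    def subscribe(self, subscriber: str, predicate: Predicate) -> None:
        self._subscriptions.append((subscriber, predicate))

    def publish(self, topic: str, message: Message) -> List[str]:
        """Return the subscribers that would receive a copy of the message."""
        return [name for name, matches in self._subscriptions if matches(topic, message)]


class AggregatingSubscription:
    """Collects partial matches across messages until every required field has been seen."""

    def __init__(self, required: Message) -> None:
        self._required = dict(required)
        self._partial: Message = {}

    def offer(self, message: Message) -> Optional[Message]:
        """Store the matching fields; return the merged result once the expression is complete."""
        for field, wanted in self._required.items():
            # Assumed policy: a field with a different value is ignored, not treated as a violation.
            if message.get(field) == wanted:
                self._partial[field] = wanted
        if len(self._partial) == len(self._required):
            merged, self._partial = self._partial, {}  # reset for the next round
            return merged
        return None
```

A subscriber would register a predicate such as lambda topic, msg: topic_matches("prod.customer.*", topic) or lambda topic, msg: content_matches({"Division": "HR", "Environment": "Production"}, msg). Because Broker.publish evaluates the predicate before anything is delivered, a non-matching message never reaches the endpoint, which is the opposite of the receiver-side filtering described above.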

Languages

Another discussion we had at the conference was one of languages. Academics often like to create languages and new language formalisms. New languages can be powerful as they can cleanly represent a new concept. However, new language syntax is often a big deterrent to commercial developers. Language syntax is an inconvenience in the first place (who needs semicolons??) and learning a new one only makes it more frustrating. Also, development tools for mainstream all-purpose languages like Java or C# have become so powerful (refactoring, syntax highlighting, auto-correct, auto-format) that shifting back to vi or Notepad is only worth it if the new language offers a huge improvement in productivity. As a result I think that the bar to adoption of new languages in commercial environments is quite high. Another big factor is skill set portability. SeeBeyond got so much crap from the analysts for its Monk language, not because Monk is bad but because of the cost and risk of training developers in a new language.

Conclusion

It was fun to attend a more academic conference for a change. I am at OOPSLA this week, which manages to maintain a good balance between academics (published papers) and practitioners (tutorials, practitioner reports, panels, etc.).