Patterns vs. Research Papers
After I finished teaching my tutorial someone came up and essentially said that the
patterns are not that useful for their work because they are mostly old stuff. For
example, topic-based publish-subscribe is soo 1999 and not really interesting anymore.
This discussion reminded me of one important difference between pattern authors
and research paper / thesis authors. The goal of pattern authors is to find common
themes in existing usage and to make them easily understood. The goal of research
paper authors is curiously opposite. They need to find something that no one has done
yet and make it appear very different from everything else. So I guess you can expect
a little bit of natural friction between the two groups. Cynics might say that pattern
books only contain stuff you already knew while research papers are never interested
in something that is actually useful in practice.
Still, I like to attend more academic conferences from time to time. First, JavaOne
et al. often degrade into a series of advertorials, so it is refreshing to see less
snazzy but more information-rich presentations. It is also interesting to see what
the researchers are working on because in a year or two some of that work might find
its way into commercial products. So it's a little bit like looking into the future.
So what were some of the current research topics that were discussed?
One interesting track focused on publish-subscribe messaging. Most of the talks dealt with
on efficient implementations of content-based publish-subscribe. A Publish-Subscribe Channel sends a copy of a message to multiple recipients, based on the subscription preferences
of the subscribers. It can be useful to distinguish levels of sophistication for subscribers
to express their subscription preferences:
- Simple Topics. The simplest way to subscribe to messages is to specify a topic. Any message published
to this specific topic will be sent to the subscriber. The topic is often expressed
as a simple string. Within the context of our book we would actually consider each
topic a logical Channel.
- Topics with Wildcards. Wildcards are a simple enhancement to simple topics. Most of these systems use the
topic string to represent a hierarchy of individual "topic nodes". For example, a
message might be sent to the topic "prod.customer.new" to indicate that the message
is part of a production system, relates to customer data, and indicates that this
message is the result of a new customer being added. Subscribers can subscribe to
"prod.customer.*" to receive all messages published to topics that start with "prod.customer",
such as "prod.customer.new" or "prod.customer.delete". This type of publish-subscribe
messaging is quite common, for example in TIBCO Rendezvous. Interestingly, the JMS
specification does not define such a topic hierarchy but virtually all JMS implementations
support it with slightly varying syntax notations.
- Content-based Subscription. Not all ontologies are easily represented in a hierarchy tree as the "Topics with
Wildcards" approach requires. A topic tree forces all messages to be attached to a
hierarchy. If a message is identified by a number of independent fields and values,
this mapping can be unnatural. For example, if messages are meant to be routed by
business divisions (operations, finance, HR, etc) and environment (production, testing,
etc), should business division sit above environment in the tree or below? They are
really two independent dimensions of the messaging space. Content-based subscriptions
support this model by allowing subscribers to express their subscription preferences
by boolean expressions on fields rather than a topic hierarchy. For example, a subscriber
could subscribe to " Division='HR' and Environment='Production' ". Any message whose
fields match this expression will be routed to the subscriber. The JMS API supports
this concept via Message Selectors (the expressions are limited to header and property fields). Message Selectors allow
subscribers to specify an expression to apply against incoming messages. Applying
Message Selectors against a publish-subscribe channel (a JMS topic) results in the
same semantics as content-based publish-subscribe. However, most implementations perform
this filtering in the receiver, using a reactive filtering approach, rather than
in the messaging infrastructure. This implementation is naturally inefficient because
a message has to be delivered to the end point just to find out that the message does
not match the expression. Most of the talks at ACM Middleware were concerned with
how to make this implementation more efficient.
- Content-based Subscription with Aggregation. When using content-based subscriptions it is likely that a message fulfills part
of a condition but does not contain the fields related to another part of the condition.
Advanced experimental middleware can store this partial message and then wait for
another message that might fulfill another part of the expression. For example, an
endpoint can subscribe to " A='X' and B='Y' ", with both A and B being optional fields.
If one message contains A='X' and another B='Y', the messaging infrastructure can
wait for both messages and merge their content. Essentially, the messaging infrastructure
implements an Aggregator. The completeness condition of the Aggregator is to fulfill all pieces of the expression.
Naturally, there are a number of issues to be resolved. For example, what if a message
fulfills one part of the expression but violates another? How should we merge messages
if each message could have a different value for the same field? What to do if we
receive multiple messages that fulfill the same partial expression while waiting for
another part of the expression? Should we keep all messages or just the first or the
last? These are exactly the issues the research papers deal with.
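To make the four levels a bit more concrete, here is a minimal Python sketch. It is not taken from any product or from the conference papers, and the names (topic_matches, content_matches, ConjunctionAggregator) are made up for illustration. It assumes that a message is just a flat dictionary of string fields, that "*" stands for exactly one topic node (real products differ in their wildcard syntax), and that a subscription expression is a plain conjunction of field='value' tests. For aggregation it arbitrarily picks one answer to each of the open questions above: a message that contradicts any condition is dropped, and the first message to satisfy a condition wins.

```python
from typing import Dict, Optional

# Assumption: a message is a flat mapping of field name -> string value.
Message = Dict[str, str]


def topic_matches(pattern: str, topic: str) -> bool:
    """Simple topics and topics with wildcards.

    Assumption: '*' matches exactly one topic node; without a '*'
    this degrades to a plain string comparison of the two topics.
    """
    pattern_nodes = pattern.split(".")
    topic_nodes = topic.split(".")
    if len(pattern_nodes) != len(topic_nodes):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_nodes, topic_nodes))


def content_matches(criteria: Message, message: Message) -> bool:
    """Content-based subscription, limited to 'field = value AND ...'."""
    return all(message.get(name) == value for name, value in criteria.items())


class ConjunctionAggregator:
    """Content-based subscription with aggregation (hypothetical policy)."""

    def __init__(self, criteria: Message) -> None:
        self.criteria = dict(criteria)
        # Condition field -> first message that satisfied that condition.
        self.pending: Dict[str, Message] = {}

    def offer(self, message: Message) -> Optional[Message]:
        """Return the merged message once all conditions are met, else None."""
        # Policy assumption: drop a message that violates any condition.
        for name, value in self.criteria.items():
            if name in message and message[name] != value:
                return None
        # Policy assumption: keep only the first message per condition.
        for name in self.criteria:
            if name in message:
                self.pending.setdefault(name, message)
        # Completeness condition: every part of the expression is fulfilled.
        if len(self.pending) < len(self.criteria):
            return None
        merged: Message = {}
        for part in self.pending.values():
            for key, value in part.items():
                merged.setdefault(key, value)  # first value seen wins
        self.pending.clear()
        return merged
```

With these definitions, topic_matches("prod.customer.*", "prod.customer.new") holds, and a ConjunctionAggregator built from {"A": "X", "B": "Y"} returns nothing for a message carrying only A='X' and hands back the merged message once one with B='Y' arrives. Swapping any of the policy choices (keep the last message instead of the first, say) changes only a line or two, which is probably why deciding on the right semantics is the harder problem.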
Another discussion we had at the conference was one of languages. Academics often
like to create languages and new language formalisms. New languages can be powerful
as they can cleanly represent a new concept. However, new language syntax is often
a big deterrent to commercial developers. Language syntax is an inconvenience in the
first place (who needs semicolons??) and learning a new one only makes it more frustrating.
Also, development tools for mainstream all-purpose languages like Java or C# have
become so powerful (refactoring, syntax highlighting, auto-correct, auto-format) that
shifting back to vi or Notepad is only worth it if the new language offers a huge
improvement in productivity. As a result I think that the bar to adoption of new languages
in commercial environments is quite high. Another big factor is skill set portability.
SeeBeyond got so much crap from the analysts for its Monk language. Not because Monk is bad
but because of the cost and risk of training developers in a new language.
It was fun to attend a more academic conference for a change. I am at OOPSLA this week,
which manages to maintain a good balance between academics (published papers) and
practitioners (tutorials, practitioner reports, panels, etc.).