The Dark Side of Encapsulation?


Interface vs. Implementation

It has been stated many times that computer science is the field where every problem can be solved by just one more level of indirection. One instance of such indirection is the separation between interface and implementation. This construct shows up in many different technologies: in object-oriented programming, as a language construct in virtually all contemporary programming languages (even Visual Basic!), and as a core concept of service-oriented architectures, which feature standards-based WSDL interface descriptions.

The core driver behind the separation of public interface and internal implementation is twofold:

The interface allows the implementation to vary without users of the interface having to know about it.
The interface can hide complex implementation details.

Hiding complex implementation details is likely the more challenging (and more abused) aspect of encapsulation. Too many times we see developers who purportedly abstracted the internals of their class by wrapping all class variables in getter/setter accessor methods. While this approach does provide a level of indirection, it hardly provides any abstraction or hiding of implementation internals. If each variable is accessible through a setter and a getter method, a developer using the class knows no less about its internals than if the class simply exposed public properties. True encapsulation, on the other hand, happens when a complex implementation is wrapped with a much simpler public interface. In some cases, the interface can even present a programming model that is quite different from the internal implementation.
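To make the difference concrete, here is a minimal sketch in Python. Both classes and all of their method names (ExposedAverage, MovingAverage, add, average) are made up for illustration. The first class offers indirection only: its accessors mirror the fields one to one. The second hides how samples are stored behind two operations.

```python
from collections import deque


class ExposedAverage:
    """Indirection without abstraction: one accessor pair per field."""

    def __init__(self):
        self._samples = []
        self._window = 3

    def get_samples(self):
        return self._samples

    def set_samples(self, samples):
        self._samples = samples

    def get_window(self):
        return self._window

    def set_window(self, window):
        self._window = window


class MovingAverage:
    """Encapsulation: a small interface over a hidden storage strategy."""

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._samples = deque(maxlen=window)  # oldest sample drops off automatically
        self._total = 0.0                     # running sum, so average() is O(1)

    def add(self, value: float) -> None:
        # Subtract the sample that is about to be evicted from the running sum.
        if len(self._samples) == self._samples.maxlen:
            self._total -= self._samples[0]
        self._samples.append(value)
        self._total += value

    def average(self) -> float:
        if not self._samples:
            raise ValueError("no samples yet")
        return self._total / len(self._samples)


# With the exposed class, the caller has to know about the list and the window.
exposed = ExposedAverage()
for value in (1, 2, 3, 4):
    samples = exposed.get_samples() + [value]
    exposed.set_samples(samples[-exposed.get_window():])
exposed_result = sum(exposed.get_samples()) / len(exposed.get_samples())

# With the encapsulated class, the caller only adds values and asks for the result.
hidden = MovingAverage(window=3)
for value in (1, 2, 3, 4):
    hidden.add(value)
hidden_result = hidden.average()
```

Both variants arrive at the same number, but only the second one could switch from a deque to, say, a fixed-size array without any caller noticing.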

How Does TCP/IP Really Work?

When most application developers think about TCP/IP these days, they think about streams of data that are about as easy to use as writing data to or reading data from a file. The abstraction that the TCP/IP stack provides to higher-level application programs is that of a connection-oriented data stream. The internal implementation is in fact quite different. TCP is layered on top of the basic IP layer, which provides little more than the unreliable routing of individual data packets. In order to achieve the illusion of a reliable data stream, TCP has to deal with issues such as the retransmission of lost packets, the resequencing of out-of-order packets, and the elimination of duplicate packets. Much of the design of TCP is concerned with window size, the frequency of acknowledgement messages, and issues such as the Silly Window Syndrome. However, none of these complexities surface in the public interface.
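A toy sketch can illustrate the kind of bookkeeping that stays hidden behind the stream interface. This is not how a real TCP implementation is written: the class name StreamReceiver and its methods are hypothetical, segments are assumed never to overlap partially, and there are no windows or timers. It only shows how sequence numbers let a receiver resequence segments, drop duplicates, and report a cumulative acknowledgement that a sender could use to decide what to retransmit.

```python
class StreamReceiver:
    """Toy receiver that rebuilds an ordered byte stream from numbered segments."""

    def __init__(self):
        self._next_seq = 0            # byte offset of the next in-order segment
        self._pending = {}            # out-of-order segments, keyed by byte offset
        self._delivered = bytearray() # bytes already handed to the application

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative ACK (next expected offset)."""
        if seq >= self._next_seq and seq not in self._pending:
            self._pending[seq] = payload
        # Anything below _next_seq, or already buffered, is a duplicate and is ignored.

        # Deliver every buffered segment that is now contiguous with the stream.
        while self._next_seq in self._pending:
            chunk = self._pending.pop(self._next_seq)
            self._delivered += chunk
            self._next_seq += len(chunk)
        return self._next_seq

    def read(self) -> bytes:
        """Return everything delivered in order so far."""
        return bytes(self._delivered)


# Segments arrive out of order, and one arrives twice.
receiver = StreamReceiver()
arrivals = [(5, b" wor"), (0, b"hello"), (0, b"hello"), (9, b"ld")]
acks = [receiver.on_segment(seq, payload) for seq, payload in arrivals]
stream = receiver.read()
```

An acknowledgement that does not advance (the first one here stays at 0) is what tells the sender that an earlier segment is still missing. The application reading the stream sees none of this, only the bytes in order.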

The network stack is a great example of a Layered Architecture, where each layer encapsulates the implementation details of the layers below. Which application designer really needs to know how TCP works, let alone the "lower" layers such as the physical network layer?
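Seen from the application layer, the whole stack shrinks to something that looks like a file. The following sketch uses Python's standard socket module over the loopback interface; the function names echo_upper_once and round_trip are made up for this example, and the address and OS-assigned port are just convenient assumptions. Note that nothing in the code mentions packets, acknowledgements, or windows.

```python
import socket
import threading


def echo_upper_once(server: socket.socket) -> None:
    """Accept one connection and answer every line in upper case."""
    conn, _addr = server.accept()
    with conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        writer = conn.makefile("w", encoding="utf-8", newline="\n")
        with reader, writer:
            for line in reader:      # reads like a text file, ends at EOF
                writer.write(line.upper())
                writer.flush()


def round_trip(lines):
    """Send each line over a TCP connection and collect the replies."""
    replies = []
    # Port 0 asks the operating system for any free port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_upper_once, args=(server,), daemon=True)
        worker.start()
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            reader = client.makefile("r", encoding="utf-8", newline="\n")
            writer = client.makefile("w", encoding="utf-8", newline="\n")
            with reader, writer:
                for line in lines:
                    writer.write(line + "\n")
                    writer.flush()
                    replies.append(reader.readline().rstrip("\n"))
        # Closing the client socket signals EOF to the server loop.
        worker.join(timeout=5)
    return replies
```

Calling round_trip(["hello", "tcp"]) sends two lines and reads two lines back, exactly as one would with a file, while every layer underneath stays out of sight.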

Reinventing the Wheel?

The odd thing is that the successful encapsulation of the internal complexities of the lower layers can have an undesired side effect: users of the abstracted interface tend to forget that the lower layers already solve some of the problems that recur at a higher level. This phenomenon has caused more than one integration developer to "invent" message resequencing and/or retransmission schemes closely analogous to those already implemented inside any TCP/IP stack.

This reinvention has a number of obvious disadvantages:

Developing a resequencing or resending algorithm means spending unnecessary design effort because the problem has already been solved.
A recreated mechanism is likely to be less robust than the implementation incorporated in the TCP/IP stack. TCP networking has evolved over many years, has been reviewed by many experts, and is very reliable. The few remaining weaknesses are well understood. It is unlikely that a quick-and-dirty solution developed at the application level is going to be that reliable.
Solving a problem multiple times at different layers in a layered architecture can lead to inefficiencies. If a lower layer already resequences packets, do we really have to do it again at a higher level? Or can we get away without performing it at the lower level? Especially in asynchronous messaging architectures, it seems odd that people use a reliable, connection-oriented protocol (TCP/IP), layer a connectionless, synchronous protocol on top (HTTP), then make it asynchronous (e.g., for asynchronous Web Services), only to have to make it reliable again (e.g., using WS-ReliableMessaging). Some vendors have recognized these inefficiencies. For example, this realization was one of the drivers for TIBCO to create the Rendezvous protocol directly on top of IP instead of TCP/IP with its associated overhead.

So should every developer go study IP stack implementations? Well, knowledge never hurts, but that may be asking a bit much. One would have to wade through 1000-page tomes just to harvest a few design ideas. This is where design patterns come in handy. Patterns allow us to pass knowledge from the lower layers up the stack in the form of design guidance without getting stuck in low-level technical details such as whether the IP address is transmitted in big-endian or little-endian format. In fact, many of the design patterns documented in Enterprise Integration Patterns make references to TCP/IP implementation considerations. In case you do want to know all the dirty details of IP networking, I can highly recommend the books by Stevens or Tanenbaum.

Hide Complexity, Share Knowledge

In summary, encapsulation is a great thing. It dramatically reduces the complexity of developing software by allowing us to solve problems at a higher level of abstraction without getting bogged down in the details. However, while it is useful to hide the implementation complexities of lower layers, we should make a concerted effort to share the knowledge embodied in them. I hope patterns can help us do this.