How to Scale an Organization? The same way you scale a system!

Find my posts on IT strategy, enterprise architecture, and digital transformation at ArchitectElevator.com.

The digital world is all about scalability: millions of web sites, billions of hits per month, more data, more tweets, more images uploaded. To make this work, architects have learned a ton about scaling systems: make services stateless for horizontal scalability, avoid synchronization points to maximize throughput, keep transactions local, reduce synchronous remote communication, use clever caching strategies, shorten your variable names (just kidding!).

With everything around us scaling to never-before-seen throughput, the limiting element in all this is bound to be us, the human users. And the organizations we work in. One may wonder, then, why scaling and optimizing throughput in organizations is considered a very different field usually completely ignored by the IT architects who know so much about scalability. I may have become an Architect Astronaut suffering from Oxygen deprivation due to exceedingly high levels of abstraction, but I nevertheless feel that many of the scalability and performance enhancement approaches experienced IT architects know can be just as well used to scale organizations. If a Coffee Shop can teach us about maximizing a system’s throughput, maybe our knowledge of IT systems design can help improve the performance of organizations!

Component Design - Personal Productivity

Increasing throughput starts with the individual. Some folks are simply ten times more productive than others. For me it’s hit or miss: I can be incredibly productive if I am “in the zone”, or can lose track when I am annoyed by something or frequently interrupted. So I won’t bestow any great advice on you, but refer you to the many resources like GTD – Getting Things Done. The key point of these techniques tends to be to minimize the inventory of open tasks (making the “lean” folks happy) and to break larger tasks down into small, actionable ones. For example, “I really need to replace that old clunker” turns into “visit 3 dealerships this weekend”. Incoming “stuff” is categorized and, if needed, parked until it is actionable, which reduces the number of concurrent threads. The suggestions are very sound, but as always it takes a bit of trust and lots of discipline to succeed at implementing them.

Avoid Sync Points - Meetings Don’t Scale

Let’s assume people do their best individually to be productive and have high throughput, meaning we have efficient and effective system components. Now we need to look at the integration architecture, which includes interaction between the components, er people. One of the most common interaction points (short of e-mail, more on that later) surely is the meeting. The name alone gives some of us goose bumps because it suggests that people get together to “meet" each other, but does not define any specific objective or outcome.

From a system design perspective meetings have another troublesome property: meetings require multiple humans to be (mostly) in the same place at the same time. In software architecture we call this a “synchronization point”, known as one of the biggest throughput killers. A synchronization point almost invariably means that some parties or components wait for others (that’s how they get “synchronized”), which kills throughput.
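To make the cost of a synchronization point concrete, here is a minimal sketch. It assumes each participant becomes free at a known time (in minutes) and that the meeting can only start once the last one is available; the function name and the numbers are made up for illustration.

```python
def meeting_wait(ready_at):
    """Return the meeting start time and how long each participant idles.

    ready_at: minutes after which each participant is free (hypothetical data).
    A synchronization point can only proceed once the slowest party arrives.
    """
    start = max(ready_at)
    waits = [start - t for t in ready_at]
    return start, waits


start, waits = meeting_wait([0, 5, 30])
print(start, waits, sum(waits))  # 30 [30, 25, 0] 55
```

Two of the three participants sit idle for a combined 55 minutes before any work happens, which is exactly the waiting a synchronization point imposes.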

In many traditional organizations, setting up a meeting can take a month or longer, depending on whose attendance is required, which significantly slows down decision-making and project progress. Meetings also lead to contention for a scarce resource: people’s time. When I look at my calendar I all too often see 3 or 4 meetings stacked in the same time slot despite the significant effort put into managing the calendar. The fact that most managers in large organizations have someone manage their calendar underlines the overhead involved in employing meetings as the primary interaction strategy. To top things off, full schedules trigger a positive feedback loop: people plan further ahead and block time “just in case”, which makes it even more difficult to find a slot. In system design we would call this pessimistic resource allocation, which is also known to burden the system.

While meetings can be useful for critical discussions and decisions (see below), the worst kind of meetings must be status meetings. If I want to know where a project “stands" I need to wait for the next status meeting in a week or two. Worse yet, many status meetings I attended consisted of someone reading off slides, which were unavailable to be distributed ahead of the meeting lest someone could read them by themselves and skip the meeting.

Interrupts Interrupt - Phone Calls

When you can’t wait for the next meeting, you tend to call the person. I know well as I log half a dozen incoming calls a day, which I routinely don’t answer (they typically lead to an e-mail starting with the phrase “I was unable to reach you by phone", whose purpose I never quite understood). Phone calls are also synchronous, but don’t require wait time, so they should be better than setting up a meeting?

Yes, phone calls are low-latency because the requestor doesn’t wait, but they also require two or more resources to be available at the same time. How many times have you played “phone tag", i.e. you were busy or on the phone when someone called just to experience the reverse when you call them back? I am not sure there’s an analog to this in system communication (I should know; after all I am documenting conversation patterns…), but it’s difficult to imagine this as effective communication.

Phone calls are “interrupts” (they are blockable as long as you mute your ringer), and in an open environment they interrupt not only you, but also your coworkers. That's one reason that the Google Japan desks were by default not equipped with phones - you had to specifically request one, which was looked upon as a little old-fashioned. The damage ringing phones can do in open office spaces was already illustrated in Tom DeMarco and Tim Lister's classic Peopleware. The "tissue trick" won't work anymore with digital phones, but luckily almost all of them have a volume setting. My pet peeve related to phones is that people bust into my office while I am on the speaker phone (conference calls), so I'd like to build a pet project to illuminate an "on air" sign when I am on the phone. IP telephony should make this fairly easy, barring any self-inflicted access restrictions.

Phone calls also tend to lead to very uneven resource usage. It seems that suddenly everyone calls you, but other times it’s awfully quiet. This is a classic utilization problem that queuing solves by performing “traffic shaping” – spikes are absorbed by the queue to allow the service to process requests at the optimal rate without becoming overloaded.
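The traffic-shaping effect of a queue can be sketched in a few lines. This is a simplified, tick-based simulation under assumed numbers: `arrivals` lists how many requests show up per tick, and `rate` is the most the service handles per tick. Both names are illustrative.

```python
from collections import deque


def shape_traffic(arrivals, rate):
    """Feed bursty arrivals through a queue that is drained at a fixed rate.

    Returns the number of requests processed per tick and the deepest
    the queue ever got. Leftover requests are drained in extra ticks.
    """
    queue = deque()
    processed = []
    max_depth = 0

    def drain_one_tick():
        done = 0
        while queue and done < rate:
            queue.popleft()
            done += 1
        processed.append(done)

    for tick, count in enumerate(arrivals):
        queue.extend((tick, i) for i in range(count))
        max_depth = max(max_depth, len(queue))
        drain_one_tick()

    while queue:  # keep working until the backlog is gone
        drain_one_tick()

    return processed, max_depth


processed, max_depth = shape_traffic([6, 0, 0, 1, 0, 0], rate=2)
print(processed, max_depth)  # [2, 2, 2, 1, 0, 0] 6
```

The spike of six requests never reaches the service all at once: the queue holds it, and the service works through it at no more than two per tick.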

Piling on instead of backing off

Retrying an unsuccessful operation is a typical remote conversation pattern. It is also a dangerous operation because it can escalate a small disturbance in a system into an onslaught of retries that brings everything to a grinding halt. That's why Exponential Backoff is a well-known pattern and forms the basis of many low-level networking protocols, such as CSMA/CD (Carrier Sense Multiple Access with Collision Detection), which is a core element of the Ethernet protocol.

Ironically, humans tend to not back off if a phone call fails, but have a tendency to "pile on": if you don't pick up they tend to call you at ever shorter intervals to signal you that "it's urgent". Ultimately, they will back off, but only after burdening the system with overly aggressive retries.
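The difference between the two retry styles is easy to see side by side. The sketch below is deliberately simplified and deterministic: real protocols typically add randomness to the wait times, which is omitted here, and the function names, units (minutes), and limits are assumptions for illustration.

```python
def backoff_delays(base, attempts, cap):
    """Exponential backoff: the wait doubles after every failed attempt, up to a cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def pile_on_delays(first, attempts, floor):
    """The human 'pile on' pattern: the wait halves after every failed attempt."""
    return [max(floor, first // 2 ** n) for n in range(attempts)]


print(backoff_delays(base=1, attempts=5, cap=60))     # [1, 2, 4, 8, 16]
print(pile_on_delays(first=16, attempts=5, floor=1))  # [16, 8, 4, 2, 1]
```

Both callers make five attempts, but the backing-off caller gives the system more room to recover each time, while the piling-on caller concentrates retries exactly when the receiver is least able to answer.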

Asynchronous Communication – Email, Chat, and More

In corporate environments e-mail tends to draw almost as much hatred as meetings. It has one big advantage, though: it is asynchronous. Instead of being interrupted, you can process your e-mail whenever you have a few minutes to spare. Getting a response may take slightly longer, but it’s a classic “throughput over latency” architecture, best described by Clemens Vasters’ analogy of needing wider bridges, not faster cars (referring to the perennially congested highway 520 bridge that connects Seattle to Redmond).

Of course e-mail has significant drawbacks, the main one being that people tend to flood everyone with it because it is perceived as having no cost. You must have a good inbox filter if you want to survive – those of us who use Outlook without working search (not the product’s fault, but a "feature" of our setup) are doomed. Also, mail isn't collectively searchable – each person has their own record of history. I guess you could call that an eventually-consistent architecture of sorts and just live with it, but it still seems horribly inefficient. I wonder how many copies of that same 10 MB PowerPoint presentation plus all its prior versions are stored on our Exchange server.

One of the best ways I have seen to overcome the limitations of e-mail is to integrate chat with mail: if you don’t get a reply or the reply indicates that a real-time discussion is needed, you select the “reply by chat” button, which turns the conversation into a quasi-synchronous mode: it still allows the receiver to answer at will (so it’s asynchronous), but allows for much quicker iterations than mail. The other effective way to conduct asynchronous communication without e-mail is to use products like Slack, which favor a chat / channel paradigm (I have not had a chance to use it in real life). In systems architecture you could liken this to tuple spaces, which employ a blackboard architectural style. Such a style is well suited for scalable, distributed systems as it is loosely coupled and avoids duplication.

Speaking of blackboard, the most transformative change in corporate collaboration I have seen was the advent of Google Docs and it’s not due to my 7 years of drinking Google Koolaid. In fact, when Docs first became available internally at Google I complained a lot because it had a level of feature maturity below that of Microsoft Word 5.0. However, being able to collaborate in real-time on a document completely changed the way people work together towards a shared outcome. Having had to go back to mailing Word documents back-and-forth has been an extremely frustrating experience.

Asking Doesn’t Scale – Build a Cache!

Much of corporate communication consists of asking questions, often using synchronous communication. This does not scale because the same questions get asked again and again. Architects would surely introduce a cache into their system to offload the source component.

I was reminded of this when I received repeated requests for simple information, such as a photo of our new team member. I simply typed his name into Google, and replied with the link to the picture: I asked Google, not another person. Search scales, but only if the answers are available in a globally (as in across your organization) searchable medium. So if you receive a question, answer so that everyone can see (and search) it, e.g. on an internal forum. That’s how you load the cache. Taking the time to explain something in a short document or forum post scales: 1000 people can search for and read what you have to share. 1000 1:1 meetings to explain the same story would take half of your annual work time.
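As a rough illustration of loading the cache, the sketch below puts a shared answer store in front of an "expert". The class, the `ask_expert` callable, and the sample question are all hypothetical; the point is only that the expert is consulted once per distinct question, no matter how many people ask.

```python
class AnswerCache:
    """A shared, searchable store of answers that sits in front of an expert."""

    def __init__(self, ask_expert):
        self._ask_expert = ask_expert  # the expensive call: interrupting a person
        self._answers = {}
        self.expert_calls = 0

    def ask(self, question):
        # Cache miss: bother the expert once and publish the answer for everyone.
        if question not in self._answers:
            self.expert_calls += 1
            self._answers[question] = self._ask_expert(question)
        # Cache hit: served without touching the expert at all.
        return self._answers[question]


forum = AnswerCache(lambda question: f"Answer to: {question}")
for _ in range(1000):
    forum.ask("Where is the photo of our new team member?")
print(forum.expert_calls)  # 1
```

A thousand people get their answer, and the source component is bothered exactly once.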

Poorly Set Domain Boundaries - Excessive Alignment

When I joined corporate IT I had to learn a whole new language. Not just because I moved to Bavaria (ja mei!), but because corporate IT in large organizations has a distinct vocabulary. My coworkers can attest that my most unwelcome word at work is “to align”. I often jest that “aligning” is what I do when my car doesn’t run straight or wears the tires unevenly. Why I need to do it at work all the time puzzled me. Especially as “alignment” invariably triggered a meeting with no clear objective (see above).

In corp speak, “to align" means to coordinate on an issue and come to some sort of common understanding or agreement. A common understanding is an integral part of productive team work, but what worries me is that the act of “aligning" takes on a life of its own. My explanation is that it’s a sign of mis-alignment (pun intended) between the project and organizational structure: the people that are critical to a project’s success or are vital decision makers are not part of the project, requiring frequent “steering" and “alignment" meetings.

The system design analog for this problem is poorly set domain boundaries, drawing on Eric Evans’ Domain-Driven Design concept of a Bounded Context. As a side note, Eric’s website layout will quickly reveal that his book was published at the same time as ours.

Self-Service is Better Service

Self-service generally has poor connotations – would you rather eat at McDonald’s or in a white table-cloth restaurant with waiter service if the price was the same? However, if you are a food chain looking to optimize throughput would you rather be McDonald’s or the quaint Italian place with 5 tables? Self-service scales. Like a large supermarket. Requesting a service or ordering a product by making a phone call or e-mailing spreadsheet attachments for someone to manually enter data does not scale, even if you lower the labor cost with near- or offshoring. In order to scale, automate everything: make all functions and processes available on-line on the intranet, ideally both as Web UIs and as (access protected) service APIs so users can layer services or custom user interfaces on top, e.g. to combine popular functions.

Scaling for Size or for Speed

So far we discussed scaling organizations for speed and throughput. Typically, “Economies of scale" are associated with scale in terms of size, though. Our organization often prides itself in “economies of skill, scale, and scope". Since we are a large organization, this should be a good thing, but I often don’t see the economies of scale in software. A software vendor once visited us and stated that “obviously the license cost per unit goes down if we buy more licenses". To me this is not obvious at all as there is no distribution cost per unit of software, aside from the sales person sitting across the table from me. Whether 10000 customers download one license or 1 customer buys 10000 licenses should be the same, assuming the software vendor automated everything. I guess enterprise software sales still has some transformations to make.

Size is often seen as an advantage, but it also brings disadvantages, as many large organizations that wish to compete with start-ups and digital native companies can attest. Large organizations tend to be slower as they are bogged down in internal processes and control structures that are required or perceived to be required to keep a large organization in check. The same holds true for cities: density and scale have definite advantages (e.g. short transportation and communication paths, diverse labor supply), but also bring pollution and congestion problems, which ultimately limit the size of cities. For organizations one big factor is speed of change – in times of little change being big is almost always an advantage. In times of change economies of speed are likely to outweigh economies of scale.

Staying Human

Does scaling organizations like computer systems mean that the digital world shuns personal interaction, turning us into faceless e-mail and workflow drones that must maximize throughput? I don't think so. I very much value personal interaction for brainstorming, negotiation, solution finding, bonding, or just having a good time. That's what we should maximize face-to-face time for. Having someone read slides aloud or calling me the third time to ask the same question could be achieved many times faster by optimizing communication patterns. Am I being impatient? Possibly, but in a world where everything moves faster and faster, patience may not be the best strategy. High-throughput systems do not reward patience.