Serverless Integration Patterns on Google Cloud Functions


Serverless

After examining which patterns are embedded in Google Cloud Pub/Sub in an earlier post, I set out to implement a few common messaging patterns on top of Google Cloud Functions, Google's serverless implementation.

Serverless is one of the latest buzzwords in the cloud world and a name that's easily misunderstood. Of course, serverless applications still run on servers. The key point is that the application owner doesn't have to worry about which server their application runs on. Technically, that has been true for most PaaS (Platform-as-a-Service) offerings, but serverless takes the concept of deployment automation to a new level. Mike Roberts wrote an excellent article explaining serverless in detail. I generally summarize the evolution towards serverless as follows:

Physical servers: There was a time when running software meant ordering a physical server. This would take months and involve a fair bit of manual effort, and usually you'd end up with a server that's hopelessly oversized because ordering another one would take another three months. Your application wouldn't scale out across both anyway, so you'd have to toss the first one you ordered. The easy way out was to run the software on the PC under your desk.
Virtualization: Server virtualization was a major step ahead: instead of ordering and mounting an actual physical machine, existing machines would be partitioned into multiple virtual machines, drastically reducing provisioning times, increasing hardware utilization, and making server sizing more flexible. However, deploying applications onto these servers remained a separate, often manual task. Also, many applications did not utilize the virtual resources very well.
Containers & PaaS: Docker-style containerization addresses two issues: it packages applications in self-contained images that can be automatically deployed and it allows multiple application instances to share a single server securely and with sufficient isolation. Combined, containers vastly improve application deployment and resource utilization. Platform-as-a-Service and Container Orchestration tools deal with managing container instances, such as automated restarts, and inter-container communication.
Serverless: While PaaS and containers are a huge step forward from manual deployment on physical servers, they still work based on the concept of applications that are deployed once or in a fixed number of instances. As demand on the application increases, new instances have to be deployed. The serverless approach improves both aspects: instead of complete applications, serverless deploys individual functions, which are dynamically instantiated as requests come in and thus scale automatically based on current demand.

The dramatic progress can be summarized in a table as follows. The figures are meant to be qualitative in nature and can vary depending on many factors:

Stage	Deployment unit	Deployment method	Deployment time
Physical server	Server	Manual	Months
Virtual server	Operating System	OS automated, application usually manual	Days or weeks
Container / PaaS	Application	Automated	Minutes
Serverless	Function	Automated	"Real-time"

Google Cloud Functions

Google Cloud Functions is Google Cloud Platform's implementation of a serverless architecture. Cloud functions can be written in JavaScript in a Node.js run-time environment. As the run-time environment manages the execution of the function, a function has to be bound to an endpoint or an event in order to be invokable. Google functions come in two flavors of binding:

HTTP functions are triggered via an HTTP request, e.g. from a browser application.
Background functions are triggered by a Google Pub/Sub message or storage bucket change.

As we are interested in asynchronous messaging, we implement a background function that is triggered by a message event on a pub/sub channel. Writing and deploying cloud functions is quite easy - you can deploy them from the command line if you have the Google Cloud SDK installed. You manage the function bindings via command line arguments. Once deployed, functions are invoked directly by the run-time - that's the power of serverless!
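To make the two flavors concrete, here is a minimal sketch of what each kind of function might look like. The names helloHttp and helloPubSub and the greeting text are invented for illustration, and the event shape follows the one used later in this post; other runtime versions may pass different arguments.

```javascript
// HTTP function: invoked with Express-style request and response objects.
function helloHttp(req, res) {
  const name = (req.query && req.query.name) || 'world';
  res.status(200).send('Hello, ' + name);
}

// Background function: invoked with an event. For a Pub/Sub trigger,
// event.data holds the message and its data field is base64 encoded.
function helloPubSub(event) {
  const text = Buffer.from(event.data.data, 'base64').toString();
  console.log('Received: ' + text);
  // Returning a promise signals completion to the run-time.
  return Promise.resolve(text);
}

// The run-time looks up deployed functions by their exported names.
exports.helloHttp = helloHttp;
exports.helloPubSub = helloPubSub;
```

Which flavor a function becomes is decided at deployment time through the trigger argument, for example --trigger-http or --trigger-topic, not by anything in the code itself.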

The big advantage of using cloud functions is that the amount of wrapper code needed to bind and invoke code is dramatically reduced. This makes writing messaging pattern examples actually easier, as so much is already taken care of.

Implementing a Content-based Router

Disclaimer: I am not a JavaScript developer, so some of this code is likely non-idiomatic and not production quality. My intention was to focus on the direct translation of the pattern into an expressive implementation. Feel free to send me a pull request with suggestions for improvement.

Let's start with a very simple pattern implementation, a Content-based Router. This pattern inspects an incoming message and routes it to different destination channels based on its content. To avoid exercising unneeded creativity, we stick with the Widgets and Gadgets example from the book, which routes incoming orders to a widget or gadget channel depending on the order type.

Setting up a basic cloud function is quite easy. The function is called with an event parameter that holds all needed data and optionally a callback parameter that the function must call when it is done. Alternatively, the function can return a promise. So pretty much all we need to do is unpack the incoming data, look at the order type, determine the correct channel, and forward the message to that channel. All this can be done with a few lines of JavaScript:

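const Pubsub = require('@google-cloud/pubsub');
const pubsub = Pubsub({projectId: "eaipubsub"});

exports.contentBasedRouter = function contentBasedRouter(event) {
  // The Pub/Sub message arrives in event.data; its payload is base64 encoded.
  const pubsubMessage = event.data;
  const payload = Buffer.from(pubsubMessage.data, 'base64').toString();
  console.log("Payload: " + payload);
  const order = JSON.parse(payload);

  // Pick the destination channel based on the order type.
  const outChannel = getOutChannel(order.type);
  console.log("Publishing to: " + outChannel);

  // Fetch the topic (creating it if needed), then publish. Returning the
  // promise chain tells Cloud Functions when the work is done.
  return pubsub.topic(outChannel).get({autoCreate: true}).then(function(data) {
    const topic = data[0];
    return topic.publish(order);
  });
};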

function getOutChannel(type) {
  switch(type) {
    case "widget":
      return "widgets";
    case "gadget":
      return "gadgets";
    default:
      return "unknown";
  }
}

Most code examples in the book are extracts that rely on quite a bit of wrapper code in order to function. In this case, the code above is all the code there is! There's a fairly good JavaScript library for Google Cloud Platform, which we require to publish messages. Luckily I found a function to automatically create a topic if it doesn't yet exist, which eliminates a lot of conditional code that plagued my first version. The pub/sub message arrives in the data field of the event parameter. JSON data is base64 encoded, so we unpack it first and do some unnecessary logging for our own entertainment. After we parse the data into JSON, we look at the type field to determine the channel to relay the message to. That's almost it - we still have to get a reference to the topic and off we go. The get and publish methods return JavaScript promises. For get, we chain a then handler that publishes the message, and we return the resulting promise to Google Cloud Functions so the run-time knows when publishing has completed.
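The decode-and-route steps can also be tried locally without any cloud resources. The following is a minimal sketch: toEvent and decodeOrder are helper names made up for this example, the event shape is assumed to match the one the function above receives, and getOutChannel is repeated so the snippet runs on its own.

```javascript
// Hypothetical helper: wrap an order the way it arrives in the function,
// i.e. as base64-encoded JSON in event.data.data.
function toEvent(order) {
  const json = JSON.stringify(order);
  return { data: { data: Buffer.from(json).toString('base64') } };
}

// The unpacking step performed at the top of contentBasedRouter.
function decodeOrder(event) {
  const payload = Buffer.from(event.data.data, 'base64').toString();
  return JSON.parse(payload);
}

// Same routing rule as in the listing above.
function getOutChannel(type) {
  switch (type) {
    case 'widget':
      return 'widgets';
    case 'gadget':
      return 'gadgets';
    default:
      return 'unknown';
  }
}

const event = toEvent({ type: 'gadget', quantity: 1, ID: 456 });
const order = decodeOrder(event);
console.log(getOutChannel(order.type)); // prints "gadgets"
```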

All that's missing is the dependency on Google Cloud Pub/Sub in the package.json file:

{
  "dependencies": {
    "@google-cloud/pubsub": "~0.10.0"
  }
}

Deploying & Running the Patterns

Deploying a function from the Cloud SDK is quite straightforward:

gcloud beta functions deploy contentBasedRouter --stage-bucket eaipubsub-functions \
            --trigger-topic orders

This command deploys the function and binds it to the topic orders within our project, resulting in the full topic name projects/eaipubsub/topics/orders. We are now ready to feed some messages into this topic and see our pattern function in action:

gcloud beta functions logs read contentBasedRouter
D      contentBasedRouter  118580632422832  2017-04-23 17:53:34.912  Function execution started
I      contentBasedRouter  118580632422832  2017-04-23 17:53:38.788  Payload: { "type":"widget", "quantity":3, "ID":123 }
I      contentBasedRouter  118580632422832  2017-04-23 17:53:38.799  Publishing to: widgets
D      contentBasedRouter  118580632422832  2017-04-23 17:53:38.843  Function execution took 3933 ms, finished with status: 'ok'
D      contentBasedRouter  118580674297419  2017-04-23 17:54:13.921  Function execution started
I      contentBasedRouter  118580674297419  2017-04-23 17:54:15.873  Payload: { "type":"widget", "quantity":3, "ID":123 }
I      contentBasedRouter  118580674297419  2017-04-23 17:54:15.879  Publishing to: widgets
D      contentBasedRouter  118580674297419  2017-04-23 17:54:15.914  Function execution took 1994 ms, finished with status: 'ok'
D      contentBasedRouter  118580598336288  2017-04-23 17:54:15.921  Function execution started
I      contentBasedRouter  118580598336288  2017-04-23 17:54:15.936  Payload: { "type":"widget", "quantity":3, "ID":123 }
I      contentBasedRouter  118580598336288  2017-04-23 17:54:15.937  Publishing to: widgets
D      contentBasedRouter  118580598336288  2017-04-23 17:54:15.999  Function execution took 78 ms, finished with status: 'ok'

We can see that execution times vary quite a bit, which hints at some cache loading / warm-up being at work. After I converted the function to return a promise instead of explicitly calling back, the times appeared to get a bit more consistent. I didn't do any performance tests, though, and we have to keep in mind that a setup using pub/sub channels is not intended to minimize latency but to maximize throughput.
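The two ways of signaling completion can be sketched side by side. This is only an illustration: fakePublish is a hypothetical stand-in for an asynchronous publish call, and the function names are invented. It says nothing about the timing difference itself.

```javascript
// Hypothetical stand-in for an asynchronous publish call.
function fakePublish(message) {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve('published ' + message); }, 10);
  });
}

// Style 1: tell the run-time we are done by invoking the callback,
// passing an error if the publish failed.
function withCallback(event, callback) {
  fakePublish(event.data).then(
    function () { callback(); },
    function (err) { callback(err); }
  );
}

// Style 2: return the promise and let the run-time wait for it.
function withPromise(event) {
  return fakePublish(event.data);
}

exports.withCallback = withCallback;
exports.withPromise = withPromise;
```

The promise style has less to get wrong, since there is no callback that could be forgotten on an error path.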

Stateful Patterns

Cloud functions are stateless - that's how they can be instantiated and discarded by the framework at will. Stateful patterns like an Aggregator therefore require use of a database. I'll tackle those next.
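As a rough sketch of why external state is needed, the aggregation step below reads and writes all of its state through a store object instead of keeping it in the function. Everything here is an assumption made for illustration: createMemoryStore is an in-memory stand-in for a real database, and the orderId field and the expectedCount completeness rule are hypothetical.

```javascript
// In-memory stand-in for a database. A deployed function would need an
// external store, because instances can be discarded at any time.
function createMemoryStore() {
  const rows = new Map();
  return {
    get: function (key) { return Promise.resolve(rows.get(key) || []); },
    set: function (key, value) { rows.set(key, value); return Promise.resolve(); }
  };
}

// One aggregation step: add the message to its group and report whether
// the group is complete (hypothetical rule: expectedCount messages seen).
function aggregate(store, message, expectedCount) {
  return store.get(message.orderId).then(function (items) {
    const updated = items.concat([message]);
    return store.set(message.orderId, updated).then(function () {
      return { complete: updated.length >= expectedCount, items: updated };
    });
  });
}

exports.createMemoryStore = createMemoryStore;
exports.aggregate = aggregate;
```

Note that the read-then-write sequence is not safe when several function instances handle messages for the same group concurrently; a real store would have to provide transactions or atomic updates.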

The source code is available on github.com.