Archive for the ‘architecture’ Category

Digesting Microservices at muCon

On Friday, I had the privilege of presenting at the very first Microservices conference – muCon. In my talk, Engineering Sanity into Microservices, I spoke about the technical issues surrounding state in distributed systems as a whole, how these become a bigger problem as the number of deployed services goes up, and a few suggested patterns that will help you stay sane. The video is now available on the Skillsmatter site (registration required).

MuCon was a really enjoyable single-topic conference; the talks ranged from high-level CTO-type overviews all the way down to the gory details and war stories. It will be interesting to turn up next year and hear more of the latter.

My biggest takeaway was from Greg Young’s presentation The Future of Microservices, where he spoke about Conway’s Law. As a reminder:

Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations
— M. Conway

The topical corollary to which he explained as (I paraphrase):

Siloed organizations will never be able to get the benefits of a microservice architecture, as it does not correspond to their communication structures.

Read that again, and really let it sink in.

I will put a layer of interpretation on this: SOA is absolutely not dead. It is a useful tool for highly compartmentalized organizations. Microservices is not its replacement. They are two different tools for different organizational types.

That insight alone was worth turning up for.

Understanding ActiveMQ Broker Networks

Networks of message brokers in ActiveMQ work quite differently to more familiar models such as that of physical networks. They are not any harder to understand or reason about, but we need an appreciation of exactly what each piece of the puzzle does by itself in order to understand them in the large. I will try to explain each component piece, moving progressively up in complexity from a single broker through to a full-blown network. At the end you should have a feel for how these networks behave and be able to reason about the interactions across different topologies.

One of the key things in understanding broker-based messaging is that the production, or sending of a message, is disconnected from the consumption of that message. The broker acts as an intermediary, serving to make the manner in which a message is consumed, as well as the route that the message has travelled, orthogonal to its production. You shouldn’t need to understand the entire postal system to know that you post a letter into the big red box and eventually it will arrive in the little box at the front of the recipient’s house. The same idea applies here.

Producer and consumer are unaware of each other; each knows only the broker it is connected to

Connections are shown in the direction of where they were established from (i.e. Consumer connects to Broker).

Out of the box, when a standard message is sent to a queue from a producer, it is sent to the broker, which persists it in its message store. By default this is KahaDB, but it can be configured to store messages in memory, which buys performance at the cost of reliability. Once the broker has confirmation that the message has been persisted in the journal (the terms journal and message store are often used interchangeably), it responds with an acknowledgement back to the producer. The thread sending the message from the producer is blocked during this time.
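In configuration terms, the choice of message store looks something like the following sketch of an activemq.xml (the broker name and directory are placeholders):

```xml
<!-- The default: a KahaDB message store persisted to disk -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker1">
  <persistenceAdapter>
    <kahaDB directory="${activemq.data}/kahadb"/>
  </persistenceAdapter>
</broker>

<!-- Or trade reliability for speed by not persisting messages at all -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="broker1"
        persistent="false"/>
```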

On the consumption side, when a message listener is registered or a call to receive() is made, the broker creates a subscription to that queue. Messages are fetched from the message store and passed to the consumer; it’s usually done in batches, and the fetching is a lot more complex than simply reading from disk, but that’s the general idea. With the default behaviour, assuming Session.AUTO_ACKNOWLEDGE is being used, the consumer will acknowledge that the message has been received before processing it. On receiving the acknowledgement, the broker updates the message store marking that message as consumed, or just deletes it (this depends on the persistence mechanism).

Consuming messages

If you want the consumer to acknowledge the message only after it has been successfully consumed, you need to set up transaction management, or handle it manually using Session.CLIENT_ACKNOWLEDGE.

So what happens when there is more than one consumer on a queue? All things being equal, and ignoring consumer priorities, the broker will in this case hand out incoming messages in a round-robin manner to each subscriber.
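As a toy illustration of that dispatch behaviour (this is a model of the idea, not ActiveMQ’s actual dispatch code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a broker handing out queue messages round-robin
// across its active subscriptions (consumer priorities ignored).
public class RoundRobinDispatch {

    // Returns, for each message in arrival order, the index of the
    // consumer it was handed to.
    public static List<Integer> dispatch(int messages, int consumers) {
        List<Integer> assignment = new ArrayList<>();
        int next = 0;
        for (int i = 0; i < messages; i++) {
            assignment.add(next);
            next = (next + 1) % consumers; // advance to the next subscriber
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 6 messages across 2 consumers: each consumer receives 3
        System.out.println(dispatch(6, 2));
    }
}
```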


Now let’s scale this up to two brokers, Broker1 and Broker2. In ActiveMQ a network of brokers is set up by connecting a networkConnector to a transportConnector (think of the latter as a socket listening on a port). A networkConnector is an outbound connection from one broker to another.
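In configuration terms this looks roughly like the following (hostnames, ports and connector names are illustrative):

```xml
<!-- Broker1's activemq.xml: listen for incoming connections -->
<transportConnectors>
  <transportConnector name="openwire" uri="tcp://0.0.0.0:61616"/>
</transportConnectors>

<!-- Broker2's activemq.xml: establish an outbound connection to Broker1 -->
<networkConnectors>
  <networkConnector name="to-broker1" uri="static:(tcp://broker1:61616)"/>
</networkConnectors>
```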

When a subscription is made to a queue on Broker2, that broker tells the other brokers that it knows about (in our case, just Broker1) that it is interested in that queue; another subscription is now made on Broker1 with Broker2 as the consumer. As far as an ActiveMQ broker is concerned there is no difference between a standard client consuming messages, or another broker acting on behalf of a client. They are treated in the exact same manner.

So now that Broker1 sees a subscription from Broker2, what happens? The result is a hybrid of the two producer and consumer behaviours. Broker1 is the producer, and Broker2 the consumer. Messages are fetched from Broker1’s message store and passed to Broker2. Broker2 processes the message by storing it in its journal, and acknowledges consumption of that message. Broker1 then marks the message as consumed.

The simple consume case then applies as Broker2 forwards the message to its consumers, as though the message had been produced directly into it. Neither the producer nor the consumer is aware that any network of brokers exists; it is orthogonal to their functionality – a key driver of this style of messaging.

Local and remote consumers

It has already been noted that as far as a broker is concerned, all subscriptions are equal. To it there is no difference between a local “real” consumer and another broker that is going to forward those messages on. Hence incoming messages will be handed out round-robin as usual. If we have two consumers – Consumer1 on Broker1, and Consumer2 on Broker2 – and messages are produced to Broker1, the two consumers will each receive the same number of messages.

A networkConnector is unidirectional by default, which means that the broker initiating the connection acts as a client, forwarding its subscriptions. Broker2 in this case subscribes to Broker1 on behalf of its consumers, but will not be made aware of subscriptions on Broker1. A networkConnector can however be made duplex, such that subscriptions are passed in both directions.
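Making the connector duplex is a single attribute on the configuration sketched earlier (again, hostname and name are placeholders):

```xml
<networkConnectors>
  <networkConnector name="to-broker1"
                    uri="static:(tcp://broker1:61616)"
                    duplex="true"/>
</networkConnectors>
```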

So let’s take it one step further with a network that demonstrates why it is a bad idea to connect brokers to each other in an ad-hoc manner. Let’s add Broker3 into the mix such that it connects into Broker1, and Broker2 sets up a second networkConnector into Broker3. All networkConnectors are set up as duplex.

This is a common approach people take when they first encounter broker networks and want to connect a number of brokers to each other, as they are naturally used to the internet model of network behaviour where traffic is routed down the shortest path. If we think about it from first principles, it quickly becomes apparent that this is not the best approach. Let’s examine what happens when a consumer connects to Broker2.

  1. Broker2 echoes the subscription to the brokers it knows about – Broker1 and Broker3.
  2. Broker3 echoes the subscription down all networkConnectors other than the one from which the request came; it subscribes to Broker1.
  3. A producer sends messages into Broker1.
  4. Broker1 stores and forwards messages to the active subscriptions on its transportConnector; half to Broker2, and half to Broker3.
  5. Broker2 stores and forwards to its consumer.
  6. Broker3 stores and forwards to Broker2.

Eventually everything ends up at the consumer, but some messages ended up needlessly travelling Broker1->Broker3->Broker2, while the others went by the more direct route Broker1->Broker2. Add more brokers into the mix, and the store-and-forward traffic increases exponentially as messages flow through any number of weird and wonderful routes.

Very bad! Lots of unnecessary store-and-forward.

Fortunately, it is possible to avoid this by employing other topologies, such as hub and spoke.

Better. A message can flow between any of the numbered brokers via the hub and a maximum of 3 hops (though it puts a lot of load onto the hub broker).

You can also use a more nuanced approach that includes considerations such as unidirectional networkConnectors that pass only certain subscriptions, or reducing network consumer priority such that consumers further away have a lower priority than closer ones.
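Both of those nuances are expressible on the networkConnector itself; a sketch, with the hub hostname and queue name as placeholders:

```xml
<networkConnectors>
  <networkConnector name="to-hub"
                    uri="static:(tcp://hub:61616)"
                    decreaseNetworkConsumerPriority="true">
    <!-- forward subscriptions only for this subset of destinations -->
    <dynamicallyIncludedDestinations>
      <queue physicalName="orders.&gt;"/>
    </dynamicallyIncludedDestinations>
  </networkConnector>
</networkConnectors>
```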

Each network design needs to be considered separately and trades off considerations such as message load, the amount of hardware at your disposal, latency (number of hops) and reliability. When you understand how all the parts fit together and think about the overall topology from first principles, it’s much easier to work through.

Update (July 2014): To get a better understanding of what’s going on under the covers, check out the details of how network connectors work.

Desktop Apps made easier, JavaFX and that ESB thing…

Isn’t it always the way, when you want to blog other stuff comes up? I had intended to write up a final post about the last day of Tech Days, but the weather has been great to get the kite out and the holiday is winding down so…

Day 3 was pretty cool, as I went to a few tech sessions related to stuff that I don’t normally work with, as I do web apps most of the time. The Netbeans sessions were pretty good, with a great demo of the Matisse GUI Builder. I think that with Netbeans 6, Java has finally got its answer to the VB/Delphi mode of development. The introduction of the Swing Application Framework (JSR 296) and Beans Binding (JSR 295) really takes away a lot of the grunt work in building small to mid-sized desktop apps, and Netbeans does a great job in hiding a lot of the initial application setup code. It’s really nice stuff, and to be honest it really drops the barrier to entry. At some stage you will inevitably need to get into the bowels of Swing, but Matisse gives you a great leg up and means that the learning curve can be that little bit easier. The fact that basic CRUD type applications are pretty well automatically generated is a huge help and lets you get down to doing the interesting bits.

I had the pleasure afterwards to turn up to Jim Weaver’s presentation on JavaFX, which gave a great overview of how the technology works from an architectural perspective. The user interface is defined using FX Script, which has a weird nested CSS-ish feel to it and is used to define your interface, event handlers and UI transitions. This is then compiled down to a Java app. The apps themselves are distributed either as applets (remember those?) or via Web Start/JNLP, and talk to the home server via JSON invocations, which means that anything can support the interface on the server side. It would be cool to have a play with sticking a Grails app on the back. Nifty stuff.

The last session was no less interesting, as I am finally getting my head around this ESB stuff! I’ve always found the concept a bit esoteric, not having worked in an environment that uses a bus, and it’s not something that lends itself easily to kicking the tires. SOA initiatives that I have worked on in the past involved point-to-point hooks, but I can really see why the ESB concept might come in handy. It’s very easy to get bamboozled by talk of federation, mediation and orchestration. Essentially the idea is pretty simple – hook up everything to a massive pipe, define standard messages and worry only about communication with the pipe itself. The devil, as in any such thing, is in the details – but essentially the pipe handles things like transactionality, message delivery, data transformation, enrichment, routing and the like through underlying mechanisms. You need to understand how to use the specific pipe in question, as with any such piece of infrastructure, but the payoff looks really good. I have not yet come across a decent guide in layman’s English (not a marketecture white paper) as to how to get everything humming, but I feel like the pieces are falling into place.

Winding down the Australia trip this week for my migration to London. Back to reality – CVs, agents, company setups, finding apartments and poms 😉 All I have to deal with is a flood of contractors on the market because of the sub-prime debacle (wasn’t Basel II supposed to make sure this nonsense wouldn’t happen?) and the April budget rounds. Bring it on!

Unit Testing the Database Tier

Unit testing database code is a bit of a funny problem. Most developers can pretty easily get their heads around unit testing a piece of Java code using interfaces and mock objects. When it comes to database code or DAOs, it suddenly becomes particularly difficult. But why – what is so difficult about testing stuff against the database? Surprisingly enough, the answer is that it has nothing to do with coding or a particular framework, although these do play their parts. It comes down to a complex web of human interaction, version control and managing environments. Let me explain.

The standard unit test has three basic phases:

  • Setup (@Before)
  • Test (@Test)
  • Tear down (@After)

The first sets the test environment into an expected state, the second runs the test and checks that the outcome is as expected, and the final one cleans up any test resources.
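The three-phase shape can be sketched in plain Java with the framework machinery stripped away (the names and the in-memory stand-in for the database are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal illustration of setup / test / tear down: the test runs
// against state prepared in setUp() and cleaned out in tearDown().
public class LifecycleSketch {

    private List<String> widgets; // stands in for the database state

    void setUp()    { widgets = new ArrayList<>(List.of("a", "b", "c")); }
    void tearDown() { widgets = null; }

    boolean testSelectReturnsThree() { return widgets.size() == 3; }

    public boolean run() {
        setUp();                                // known starting state
        try { return testSelectReturnsThree(); } // the actual check
        finally { tearDown(); }                  // always clean up
    }

    public static void main(String[] args) {
        System.out.println(new LifecycleSketch().run());
    }
}
```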

How does this relate to database testing? Let’s say that we have a DAO that performs a particular select statement. Our test should be to retrieve a particular number of records from a known set. Easy enough. The precondition of course, is that you have a known set to begin with.

It’s ALL about the environment.

Most large development projects go like this: The database guys update the schema. The developers write the code. The developers need a particular data set to exercise the various use cases so they add it to the schema. It all becomes a bit messy.

Eventually, very complex data sets are set up by everyone concerned in a primary schema that keeps getting updated. The database schema is generally not version controlled, as it is constantly being redefined using DDL statements run by the DBAs. Most of the time you will be lucky to get a backup of a schema, with all of the data truncated, as the schema and supporting code (i.e. the application) move between environments.

Getting back to the test. You set up your data by hand in the master schema so that there are three items in the widgets table where some condition is true. You write your test, it runs against the schema, pulls out the expected three widgets and everything is great. You check in the tests. A week later your colleague, Bob, adds another widget to satisfy his test condition. Your test suddenly returns four items and breaks.

Of course, Bob didn’t actually run your test because he was too busy with his own and the test suite isn’t clean anyway because everyone is falling over each other.

Sound familiar?

What about inserts? The precondition: no sprockets were in the table, the test: insert a sprocket, the postcondition: a sprocket is in your table. Kind of hard to test under the above conditions isn’t it? For one thing, the exact data of the test sprocket may be in the table, so checking by value may give you false positives, while deleting it may get rid of more records than you wanted. What about concurrent tests? With a group of developers running the same tests, they start tripping over each other very quickly and the whole effort becomes an exercise in frustration. At this point the development manager throws up his hands, says that this automated testing thing is a load of bollocks and to get back to your work because they didn’t deal with all this when he was doing VB. Somewhere else, Kent Beck sheds a tear…

Let’s examine what goes on in Ruby on Rails. One of the best ideas that was popularized by this framework was its method for database unit testing. A developer’s workspace has multiple environments by default – development, test and production. You develop against the development schema, designing table structures, and playing with the user interface to your heart’s content. When you run unit tests, the following happens – the schema from development is copied into the test database with no data in it. The framework imports version controlled sets of test data (saved as YAML files) into this new schema. Whenever a test is run, it is guaranteed that the database will be in this state. Any changes a test makes are visible only within the scope of this one test. The tear down step cleans out your changes. This makes life so much simpler, especially if you have been working in the nightmare scenario above.
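A Rails fixture is just a version-controlled YAML file of named records; something like the following (the file name and contents are illustrative):

```yaml
# test/fixtures/widgets.yml – a known data set, loaded fresh for each test
red_widget:
  id: 1
  name: Red Widget
  active: true

blue_widget:
  id: 2
  name: Blue Widget
  active: false
```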

So how do we get the same sort of effect in a corporate development environment?

You need multiple database schemas in order to unit test your db code.

Pause and re-read that line. It’s not negotiable. You probably need two per developer: one with sample data to use while you work on the user interface, the other a temporary one for unit testing. A whole development team using the one schema does not work. Most projects do it, but that doesn’t mean it’s a good idea.

Some suggestions for how to manage this: the DBAs have their own schemas. The full DDL for the database is kept in version control. After each change, the full database DDL is dumped and checked in. No ad-hoc ALTER TABLE statements. Ever. This way you are guaranteed that if you ever want to get a baseline of your system, you can also rebuild the database as it existed at that time. I worked on a very large telecoms project with a huge development team, and this worked. Well.

The test data for your environments is stored in version control – at the very least, as dumps of insert statements. For unit testing purposes, a dedicated unit test framework is beneficial. DBUnit performs the same task in Java as described above for Rails – it loads test data from dumps (a number of formats are supported), and guarantees that the test database is in the expected state when each test is run.
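One of the formats DBUnit supports is a flat XML data set, where each element names a table and its attributes are column values; a small hypothetical example:

```xml
<!-- widgets-dataset.xml: loaded by DBUnit before each test -->
<dataset>
  <widgets id="1" name="Red Widget" active="true"/>
  <widgets id="2" name="Blue Widget" active="false"/>
  <widgets id="3" name="Green Widget" active="true"/>
  <!-- an empty element declares the table, so it gets cleaned out -->
  <sprockets/>
</dataset>
```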

To test your database code, refresh your test schema from version control – typically using your chosen build system; Ant tasks are generally pretty good for this. Now run your test cases. Gorgeous! No tripping over other people, and your tests are guaranteed to behave the same each time. No excuses for a red bar.
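The refresh step can be as simple as Ant’s built-in sql task running your version-controlled scripts (the driver, URL, credentials and file names here are placeholders):

```xml
<!-- build.xml: rebuild the unit test schema from version-controlled scripts -->
<target name="refresh-test-schema">
  <sql driver="org.postgresql.Driver"
       url="jdbc:postgresql://localhost/myapp_test"
       userid="test" password="test">
    <transaction src="db/schema.ddl"/>
    <transaction src="db/test-data.sql"/>
  </sql>
</target>
```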

So why is unit testing databases so difficult if it doesn’t have to be? Most of the time it involves process change and getting out of bad habits, not just a tool. And change means convincing people. Generally, managers do not understand what benefit there is in multiple database schemas, as it is seen to increase complexity and therefore risk, and DBAs like to have full control over what is going on on their servers. The topic of databases and processes is also a great one for religious zeal.

The process outlined above should explain the hows and whys to the individuals involved. The changes above mean a little bit more setup initially, but a saner development process.

A nice side effect is how easy upgrading databases through your environments can become. Run the latest DDL against a fresh schema, get the differences between it and your target environment using a database compare tool, and fire it off. Beautiful.

Home Cooked vs Open Source. Or, Don’t Build Your Own Workflow.

First things first. I love open source. I think it’s the best thing since sliced bread. That thing we were always told about since computer science – an open marketplace of components to be shared and reused – HAS happened. Just not in the “buy this billing component” kind of way. It’s even better! It’s free (ish)! You download what you want and plug it into your application. The quality varies, but if you keep your ear to the ground and do your homework, it will save you a lot of time. And time is money.

But just how much money?

I recently discovered Ohloh. It’s like professional networking, but not exactly, and it’s for open source. It has some cool features, but the one that got me instantly was that it trawls through open source repositories and gives you some very cool stats – like just how much effort it would take to build an equivalent version of something, and how much that would cost given a yearly salary for a programmer. It uses lines of code, which is not a good metric, but it’s an OK litmus test.

Another of my favourite sites lets you pick a particular category of software that you need, and lists a suite of open source options in Java – ready for you to do with as you will.

So let’s pick a category. Here’s one that I prepared earlier – workflow engines. Pretty much every place that I have ever worked has rolled their own in this regard. For some reason it’s considered low-hanging fruit (even though people write postgraduate theses on the subject). In some cases, off-the-shelf offerings are seen as overkill, too restrictive or just too complex. Most of the time the analysis is little more than gut feel, a kind of “Hmm… looks too hard, must have been over-engineered.” So a couple of guys get together and do a “bake at home” version. Pretty soon, reality takes over and it doesn’t do what it’s supposed to, the use case was misunderstood, you need a management console, version control of process flows, different flows in different environments… uh oh! Suddenly, you spend a lot of time maintaining this beast.

So let’s do the stats and take the first handful of products listed on both sites (there are many others). These vary in the amount of activity going on, and have quite varied features and uses – some plug into applications, others are standalone orchestration engines. It’s not exactly scientific, but it’s interesting for illustration purposes. Let’s take the average programmer salary as $55,000 (dollars, euros, pounds – it’s all roughly the same worldwide, so it doesn’t really matter):

LOC = Lines Of Code

  1. Apache ODE. 108,547 LOC, 55 Person Years, $1,498,221.
  2. Taverna. 134,334 LOC, 33 Person Years, $1,832,157.
  3. jBPM. 286,618 LOC, 74 Person Years, $4,081,422.
  4. Enhydra Shark. 255,101 LOC, 65 Person Years, $3,576,525.
  5. OpenSymphony OSWorkflow. 48,303 LOC, 11 Person Years, $627,203.
  6. ObjectWeb Bonita. 67,916 LOC, 16 Person Years, $894,118.
  7. OpenWFE. 187,176 LOC, 47 Person Years, $2,608,592.
  8. WfMOpen. 152,557 LOC, 38 Person Years, $2,084,413.

The average cost of building a workflow engine?

155,069 LOC. 42 Person Years, $2,150,333.

Once again, this is completely unscientific. I have no idea whether the cost is over the lifetime of the product or initial development cost, whether management costs are included, we aren’t comparing apples with apples, and these are general purpose engines rather than a thing that does only the thing you want. But it does make for a very interesting question. Have you got the time and money to do this, or would you rather get on with the business problem at hand?
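For the curious, most of the person-year figures above are in the ballpark of what the classic basic COCOMO formula predicts from lines of code. A rough sketch, assuming the standard organic-mode coefficients (2.4 and 1.05) and the $55,000 salary used above:

```java
// Rough reproduction of an Ohloh-style effort estimate using the basic
// COCOMO organic-mode formula: effort (person-months) = 2.4 * KLOC^1.05.
public class CocomoEstimate {

    public static double personYears(int linesOfCode) {
        double kloc = linesOfCode / 1000.0;
        double personMonths = 2.4 * Math.pow(kloc, 1.05);
        return personMonths / 12.0;
    }

    public static double cost(int linesOfCode, double yearlySalary) {
        return personYears(linesOfCode) * yearlySalary;
    }

    public static void main(String[] args) {
        // OSWorkflow at 48,303 LOC comes out at roughly 11-12 person years
        System.out.printf("%.1f person years, $%.0f%n",
                personYears(48303), cost(48303, 55000));
    }
}
```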

Enterprise software is a complex business. There are no shortcuts. There are no easy decisions. The landscape changes all the time. You have to weigh up support costs, training, extensibility, maintenance and skills.

My take on it? Someone did the heavy lifting already. Do yourself a favour and take advantage of it. If it doesn’t do exactly what you want, then the code is right there to change. You can always contribute it back, and if it’s good enough then it becomes the property of the community. The numbers are compelling.

Thoughts on Mixing EJB3 and Spring

A few weeks ago I blogged about Spring and EJB integration, but something just did not sit well with me. Why would you want to do this at all?

I like Spring. Its myriad components make my life simpler. I can write the business code I need to faster without worrying as much about the plumbing aspects. The Spring contributors have put a lot of thought into reducing the lines of code that I as an application developer need to write. Brilliant! My stuff can be easily tested by applying DI principles and this makes me happy as it takes less effort to make the end client happy.

I have access to all of the services on the JEE platform, including transactions, naming and management, and can use them as appropriate to the task at hand. EJB is also a way of doing the same thing. If two things do the same thing, why would you want to use both?

I have read a lot of opinions on this question (actually, most have been on which is better, which is meaningless in this context), and have come to the conclusion that I don’t. The problem space is such that you get no additional benefits in tying the two together.

Spring is a platform, an over-arching container, a way of structuring applications and a tool set. EJB is a component model for developing the business tier that ties in directly into your JEE server. Where they cross over is that they allow your application to access the underlying services in such a way that you don’t deal with them when you write your business code.

You would use Spring to manage your business tier when you want to write a Spring application – ORM templates, Webflow, Web services, Remoting API. You would use EJB when you want to write something like a Seam application – JSF, tying in directly to a business tier that manages your persistence, jBPM for process management etc.

The question is what style of application you are building – is it multi-tier with a clear delineation of function (web tier, business tier, data access tier, anemic model), which is what Spring excels at, or do you want a tighter relationship between your tiers, with a framework that manages everything in a much more integrated way (not that you can’t achieve this with Spring)?
This is what I would term a soft decision – it’s less about technical merit and more about the vibe. Where software design meets gut feel.

Sticking EJB into a Spring app is clunky. It does not sit well. Even the things you may have considered EJB for in the past have equivalents in this new world: remoting (largely out of favour in preference to web services) and listening on the end of JMS queues (which I liked MDBs for) both have corresponding elements that sit better in Spring.

The application style is up to you at the end of the day. Whether you like the full prescribed JEE stack using EJB, or mix and matching the frameworks that you are productive in with Spring, run with it. Just keep in mind the problems that each piece of your application is trying to solve, and don’t double up. It will only make your applications unnecessarily complex.

As a side note, my current project has dropped EJB3 from the rest of the (Spring) stack.