Archive for the ‘open source’ Category


Beyond the Hype Cycle; Co-operative Open Source

At some point in the lifetime of an open-source project, having survived rewrites, deployments in hundreds of organisations, battle-testing… it becomes boring. The sheen goes off it, and people move on to the newer and shinier. When this happens, the number of maintainers drops off, releases slow to a trickle, and issues or improvements that users would like addressed gather dust in long-forgotten JIRAs. Eventually the level of inactivity causes companies to move their platforms onto other, better-supported tech – not because the older technology is no longer useful (the ultimate measure of worth in software), but because the lack of support becomes a risk.

Over time, even companies that support open source commercially are likely to feel this pressure from within, as the teams that work on these stalwart pieces of often highly complex infrastructure drift apart.

This is not specific to any one piece of software, but applies to entire categories of products. What happens off to the far right-hand side of the Gartner Hype Cycle? Where does the slope of enlightenment lead? What does the long end game of a successful product look like?

Perhaps what is needed is a rethink of the relationship between the users and the maintainers of a piece of OSS. Companies often will not go into production without some sort of commercial support, which forms a type of insurance. A subscription or a license is the equivalent of an insurance premium. Over time, the lack of a major production incident causes them to question the value of that policy. This is a bit like thinking “I have been paying for maintenance of this bridge for 5 years, and it hasn’t once fallen down. What’s the point?”

From a commercial standpoint, once the custom starts to drop off and demand rises in other sectors, it makes sense to ramp down support on the older products and divert resources elsewhere. But this is only a rational conclusion if the purpose of your company is to maximise profit rather than support its clients in the long term. Given that this is, by definition, how commercial companies work, perhaps the problem is the nature of the company itself.

I have been thinking about this area for some time, having recently read Wired Differently – the 50-year story of NISC, a company providing software to electricity providers. NISC is set up as a co-operative, a model typically applied in areas such as agriculture, but one that has much wider applicability; there are nearly 7,000 co-ops in the UK – banks, retailers, funeral homes, accountancies…

Under this scheme, the co-operative behaves like any other commercial company, acquiring clients, selling services and so on. The big difference is structural: the shareholders are the staff and customers of the co-operative, with clients taking a stake in the company through membership. The purpose of the co-op is principally to serve its members as well as possible, while operating at a lower cost than would be possible for a commercial organisation. At the end of the year, the treatment of profits is voted upon democratically by members: either reinvest, or distribute back in proportion to each member’s usage of its services. The model works particularly well in industries where specialization and stability are the most highly sought-after attributes.

Co-operatives fill a niche in the market by providing long-term stability, both to the specialists working in those areas and to the members who rely on the co-op’s services as a key component of their infrastructure. The service would be the ongoing development of these pieces of infrastructure and support for them, as well as elements that would be impossible for a commercial entity to justify, such as components that proactively monitor the infrastructure and prevent service callouts.

Perhaps it is time to apply the co-operative model to open source; it seems a natural fit. A co-operative would be a way of sharing the costs of maintenance; half-way between an external vendor and a dedicated in-house team.

You can read more about the mechanics of co-operatives at Co-operatives UK.

Understanding ActiveMQ Broker Networks

Networks of message brokers in ActiveMQ work quite differently to more familiar models such as that of physical networks. They are not any harder to understand or reason about, but we need an appreciation of what exactly each piece of the puzzle does by itself in order to understand them in the large. I will try to explain each component piece, progressively moving up in complexity from a single broker through to a full-blown network. At the end you should have a feel for how these networks behave and be able to reason about the interactions across different topologies.

One of the key things in understanding broker-based messaging is that the production, or sending, of a message is disconnected from the consumption of that message. The broker acts as an intermediary, serving to make the method by which a message is consumed, as well as the route that the message has travelled, orthogonal to its production. You shouldn’t need to understand the entire postal system to know that you post a letter in the big red box and eventually it will arrive in the little box at the front of the recipient’s house. The same idea applies here.

Producer and consumer are unaware of each other; each knows only the broker it is connected to

Connections are shown in the direction in which they were established (i.e. the consumer connects to the broker).

Out of the box, when a standard message is sent to a queue from a producer, it is sent to the broker, which persists it in its message store. By default this is KahaDB, but it can be configured to store messages in memory, which buys performance at the cost of reliability. Once the broker has confirmation that the message has been persisted in the journal (the terms journal and message store are often used interchangeably), it responds with an acknowledgement back to the producer. The thread sending the message from the producer is blocked during this time.
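To make the producer side concrete, here is a minimal sketch in plain JMS – the broker URL and queue name are invented for illustration:

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ProducerSketch {
        public static void main(String[] args) throws JMSException {
            // Hypothetical broker address and queue name
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();

            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("orders"));
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // the default; the message goes to the journal

            // send() blocks until the broker acknowledges that the message has been persisted
            producer.send(session.createTextMessage("hello"));

            connection.close();
        }
    }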

On the consumption side, when a message listener is registered or a call to receive() is made, the broker creates a subscription to that queue. Messages are fetched from the message store and passed to the consumer; this is usually done in batches, and the fetching is a lot more complex than a simple read from disk, but that’s the general idea. With the default behaviour, assuming Session.AUTO_ACKNOWLEDGE is being used, the consumer will acknowledge that the message has been received before processing it. On receiving the acknowledgement, the broker updates the message store, marking that message as consumed, or just deletes it (this depends on the persistence mechanism).

Consuming messages
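The consuming side, again as a rough sketch under the same assumed broker URL and queue name:

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ConsumerSketch {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
            Connection connection = factory.createConnection();
            connection.start();

            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("orders"));

            // Registering the listener creates the subscription; the broker then
            // dispatches messages (in batches) as they arrive.
            // AUTO_ACKNOWLEDGE: receipt is acknowledged automatically,
            // independently of our processing logic.
            consumer.setMessageListener(message -> System.out.println("received: " + message));

            Thread.sleep(60_000); // keep the JVM alive while messages arrive
            connection.close();
        }
    }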

If you want the consumer to acknowledge the message only after it has been successfully consumed, you need to set up transaction management, or handle it manually using Session.CLIENT_ACKNOWLEDGE.
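If you go the manual route, the shape is roughly this (a fragment, reusing the connection from the sketch above; note that in JMS, acknowledge() acknowledges all messages the session has consumed so far, not just the one it is called on):

    // A CLIENT_ACKNOWLEDGE session: nothing is acked until we say so
    Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
    MessageConsumer consumer = session.createConsumer(session.createQueue("orders"));

    Message message = consumer.receive(); // blocks until a message arrives
    process(message);                     // process(), a stand-in for your logic, runs first...
    message.acknowledge();                // ...and only then is the broker told to discard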

So what happens when there is more than one consumer on a queue? All things being equal, and ignoring consumer priorities, the broker will hand out incoming messages in a round-robin manner to each subscriber.

Store-and-forward

Now let’s scale this up to two brokers, Broker1 and Broker2. In ActiveMQ a network of brokers is set up by connecting a networkConnector on one broker to a transportConnector on another (think of the transportConnector as a socket listening on a port). A networkConnector is an outbound connection from one broker to another.
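For illustration, here is roughly how that looks using the embedded BrokerService API – the names, ports and the static: URI are invented, and the same wiring is more commonly done in each broker’s XML configuration:

    import org.apache.activemq.broker.BrokerService;

    public class TwoBrokerSketch {
        public static void main(String[] args) throws Exception {
            // Broker1 exposes a transportConnector - a socket listening on a port
            BrokerService broker1 = new BrokerService();
            broker1.setBrokerName("broker1");
            broker1.addConnector("tcp://localhost:61616");
            broker1.start();

            // Broker2 opens an outbound networkConnector into Broker1
            BrokerService broker2 = new BrokerService();
            broker2.setBrokerName("broker2");
            broker2.addConnector("tcp://localhost:61617");
            broker2.addNetworkConnector("static:(tcp://localhost:61616)");
            broker2.start();
        }
    }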

When a subscription is made to a queue on Broker2, that broker tells the other brokers it knows about (in our case, just Broker1) that it is interested in that queue; another subscription is now made on Broker1, with Broker2 as the consumer. As far as an ActiveMQ broker is concerned, there is no difference between a standard client consuming messages and another broker acting on behalf of a client; they are treated in exactly the same manner.

So now that Broker1 sees a subscription from Broker2, what happens? The result is a hybrid of the two producer and consumer behaviours. Broker1 acts as the producer, and Broker2 as the consumer. Messages are fetched from Broker1’s message store and passed to Broker2. Broker2 processes each message by storing it in its journal, and acknowledges consumption of that message. Broker1 then marks the message as consumed.

The simple consume case then applies as Broker2 forwards the message to its consumers, as though the message was produced directly into it. Neither the producer nor the consumer is aware that any network of brokers exists; it is orthogonal to their functionality – a key driver of this style of messaging.

Local and remote consumers

It has already been noted that as far as a broker is concerned, all subscriptions are equal. To it there is no difference between a local “real” consumer and another broker that is going to forward those messages on. Hence incoming messages will be handed out round-robin as usual. If we have two consumers – Consumer1 on Broker1 and Consumer2 on Broker2 – and messages are produced to Broker1, both consumers will receive the same number of messages.

A networkConnector is unidirectional by default, which means that the broker initiating the connection acts as a client, forwarding its subscriptions. Broker2 in this case subscribes to Broker1 on behalf of its consumers, but will not itself be made aware of subscriptions on Broker1. networkConnectors can, however, be made duplex, such that subscriptions are passed in both directions.
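Continuing the embedded-API sketch above, duplex is a single flag (in XML configuration it’s the duplex attribute on the networkConnector element):

    import org.apache.activemq.network.NetworkConnector;

    // As before, but keep a handle on the connector and mark it duplex
    NetworkConnector nc = broker2.addNetworkConnector("static:(tcp://localhost:61616)");
    nc.setDuplex(true); // subscriptions now propagate in both directions over this one connection
    broker2.start();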

So let’s take it one step further with a network that demonstrates why it is a bad idea to connect brokers to each other in an ad-hoc manner. Let’s add Broker3 into the mix such that it connects into Broker1, and Broker2 sets up a second networkConnector into Broker3. All networkConnectors are set up as duplex.

This is a common approach people take when they first encounter broker networks and want to connect a number of brokers to each other, as they are naturally used to the internet model of network behaviour, where traffic is routed down the shortest path. If we think about it from first principles, it quickly becomes apparent that this is not the best approach. Let’s examine what happens when a consumer connects to Broker2.

  1. Broker2 echoes the subscription to the brokers it knows about – Broker1 and Broker3.
  2. Broker3 echoes the subscription down all networkConnectors other than the one from which the request came; it subscribes to Broker1.
  3. A producer sends messages into Broker1.
  4. Broker1 stores and forwards messages to the active subscriptions on its transportConnector; half to Broker2, and half to Broker3.
  5. Broker2 stores and forwards to its consumer.
  6. Broker3 stores and forwards to Broker2.

Eventually everything ends up at the consumer, but some messages travelled needlessly via Broker1->Broker3->Broker2, while the others went by the more direct route Broker1->Broker2. Add more brokers into the mix, and the store-and-forward traffic increases exponentially as messages flow through any number of weird and wonderful routes.

Very bad! Lots of unnecessary store-and-forward.

Fortunately, it is possible to avoid this by employing other topologies, such as hub and spoke.

Better. A message can flow between any of the numbered brokers via the hub in a maximum of 3 hops (though this puts a lot of load onto the hub broker).
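A sketch of wiring up a hub and spoke topology with the embedded API (all names, ports and the spoke count are invented):

    import org.apache.activemq.broker.BrokerService;

    public class HubAndSpokeSketch {
        public static void main(String[] args) throws Exception {
            BrokerService hub = new BrokerService();
            hub.setBrokerName("hub");
            hub.addConnector("tcp://localhost:61616");
            hub.start();

            // Each spoke connects only to the hub, never to another spoke,
            // so any message travels at most spoke -> hub -> spoke
            for (int i = 1; i <= 4; i++) {
                BrokerService spoke = new BrokerService();
                spoke.setBrokerName("spoke" + i);
                spoke.addConnector("tcp://localhost:" + (61616 + i));
                spoke.addNetworkConnector("static:(tcp://localhost:61616)")
                     .setDuplex(true); // duplex, so subscriptions flow through the hub both ways
                spoke.start();
            }
        }
    }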

You can also use a more nuanced approach that includes considerations such as unidirectional networkConnectors that pass on only certain subscriptions, or decreasing network consumer priority such that more distant consumers have a lower priority than closer ones.
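Both of those knobs exist on the networkConnector. A sketch of how they might be set through the embedded API – the destination name is made up, and it’s worth verifying the property names against your ActiveMQ version:

    import java.util.Arrays;
    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.command.ActiveMQDestination;
    import org.apache.activemq.command.ActiveMQQueue;
    import org.apache.activemq.network.NetworkConnector;

    public class NuancedNetworkSketch {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();
            broker.setBrokerName("edge");

            NetworkConnector nc = broker.addNetworkConnector("static:(tcp://hub-host:61616)");
            // Forward subscriptions only for destinations we explicitly allow
            nc.setDynamicallyIncludedDestinations(
                    Arrays.<ActiveMQDestination>asList(new ActiveMQQueue("orders.>")));
            // Networked consumers get a lower priority than local ones,
            // so the broker prefers the closer consumer when both are subscribed
            nc.setDecreaseNetworkConsumerPriority(true);

            broker.start();
        }
    }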

Each network design needs to be considered separately, trading off considerations such as message load, the amount of hardware at your disposal, latency (number of hops) and reliability. When you understand how all the parts fit together and think about the overall topology from first principles, it’s much easier to work through.

Update (July 2014): To get a better understanding of what’s going on under the covers, check out the details of how network connectors work.

Why wrong licenses kill good products

Being on the ODBMS train of thought, I checked out a few products that are out there, and something struck me – all of them would be really difficult to bring into a project. The reason is not technical, but rather one of hassle. All of the products are either proprietary closed-source or GPL dual-licensed. These licensing models actively discourage people from bringing them into the enterprise.

Commercial software is notoriously hard to get approved, and there is generally a long-winded process that has to be followed so that the bean-counters can say “can’t you use an open-source alternative?”. GPL dual-licensed stuff is almost as bad! Most of the places I have worked have had a blanket ban on GPL software (excluding GNU-type tools, which generally sneak under the radar in a server installation). Financial companies in particular are almost paranoid about this. Anything that’s not Apache or LGPL has to go to the legal department, who will take ages only to eventually say “we’re not comfortable with this”.

Licensing is one of the main reasons I consciously jumped ship from the Microsoft world to Java. I found it really painful to get the most basic library approved for use. Who cares that tool X would save me a week of development – it costs $50! Then I jumped into Java and it was like a blessing. You need a rules engine? Workflow? XML plugins for your IDE? No approval process – no worries! My life became much, much easier. I could just download what I required and get on with doing what I needed to do.

There are always exceptions. MySQL springs to mind. The fact that it is standalone made people a bit more comfortable using it, without the specter of viral licensing rearing its ugly head. ODBMSes are not like a normal database: they integrate directly into your code, and distribution to your server means that they are a core part of your application. I suspect that the GPL may be triggered at this point because of the “distribution clause”. I have read the exact opposite on the DB4O forums, which say that server stuff is OK to use under the GPL, but distributing client (desktop) software that uses the ODBMS forces the GPL onto your application.

Hmm. This is a pretty murky line, and one that I’m not comfortable with. What if you “distribute” your server software around a cluster? I’ll be happy to put my hands up and just throw this in the “too hard” basket. If I can’t state with confidence that a piece of software has no legal repercussions for the ultra-proprietary system that I’m writing, I will lose no sleep whatsoever looking somewhere else. I suspect that I’m not the only one. It seems that there is a fine line between an external dependency and a core library that is part of the application.

I appreciate why dual-licensing is there, and acknowledge that it’s a valid business model – for certain types of applications. However, I just can’t see it working everywhere. How far would Spring have gone if it was GPL dual-licensed? You only have to look at ExtJS to see how well-received frameworks or libraries are when it becomes too hard to justify their use.

Most large organizations in my experience are more than happy to buy support for LGPL stuff. Once it’s in there, someone wants the reassurance that there will be someone there to bug in case of difficult issues, or simply to point the finger at (depending on the company culture). Dual-licensed stuff won’t even get its foot in the door – not even a sniff at success. I would really like to use an OO DB in a real-world setting, but it seems far easier to avoid the hassle of getting it approved by using something that’s not quite a 100% fit for the use case. That’s a real shame, and the software engineer in me just sighs deflatedly as the realist wins again in the face of pending deadlines.

Maybe that’s why so many people put up with the object-relational mismatch instead of trying a different approach. In the world I live in, an administrator’s “no” outweighs a technical “yes” every time.

This post is not intended to create FUD, just to outline why uncertainty may be enough to stop someone working with what may otherwise be brilliant stuff.

Home Cooked vs Open Source. Or, Don’t Build Your Own Workflow.

First things first: I love open source. I think it’s the best thing since sliced bread. That thing we were always told about in computer science – the open marketplace for components to be shared and reused – HAS happened. Just not in the “buy this billing component” kind of way. It’s even better! It’s free (ish)! You download what you want and plug it into your application. The quality varies, but if you keep your ear to the ground and do your homework, it will save you a lot of time. And time is money.

But just how much money?

I recently discovered Ohloh. It’s like professional networking, but not exactly, and it’s for open source. It has some cool features, but the one that got me instantly was that it trawls through open-source repositories and gives you some very cool stats – like just how much effort it would take to build an equivalent version of something, and how much that would cost given a yearly salary for a programmer. It uses lines of code, which are not a good metric, but an OK litmus test.

Another one of my favourite sites is Java-Source.net. Pick a particular category of software that you need, and it lists you a suite of open source options in Java – ready for you to do with as you will.

So let’s pick a category. Here’s one that I prepared earlier – workflow engines. Pretty much every place I have ever worked has rolled its own in this regard. For some reason it’s considered low-hanging fruit (even though people write postgraduate theses on them). In some cases, off-the-shelf offerings are seen as overkill, too restrictive or just too complex. Most of the time the analysis is little more than gut feel, a kind of “Hmm… looks too hard, must have been over-engineered.” So a couple of guys get together and do a “bake at home” version. Pretty soon reality takes over and it doesn’t do what it’s supposed to, the use case was misunderstood, you need a management console, version control of process flows, different flows in different environments… uh oh! Suddenly, you spend a lot of time maintaining this beast.

So let’s do the stats and take the first handful of products listed on both sites (there are many others). These vary in the amount of activity going on, and have quite varied features and uses – some plug into applications, others are standalone orchestration engines. It’s not exactly scientific, but it’s interesting for illustration purposes. Let’s take the average programmer salary as $55,000 (dollars, euros, pounds – it’s all about the same-ish worldwide, so it doesn’t really matter):

LOC = Lines Of Code

  1. Apache ODE. 108,547 LOC, 55 Person Years, $1,498,221.
  2. Taverna. 134,334 LOC, 33 Person Years, $1,832,157.
  3. jBPM. 286,618 LOC, 74 Person Years, $4,081,422.
  4. Enhydra Shark. 255,101 LOC, 65 Person Years, $3,576,525.
  5. OpenSymphony OSWorkflow. 48,303 LOC, 11 Person Years, $627,203.
  6. ObjectWeb Bonita. 67,916 LOC, 16 Person Years, $894,118.
  7. OpenWFE. 187,176 LOC, 47 Person Years, $2,608,592.
  8. WfMOpen. 152,557 LOC, 38 Person Years, $2,084,413.

The average cost of building a workflow engine?

155,069 LOC, 42 Person Years, $2,150,333.

Once again, this is completely unscientific. I have no idea whether the cost is over the lifetime of the product or just initial development, or whether management costs are included; we aren’t comparing apples with apples, and these are general-purpose engines rather than something that does only the thing you want. But it does raise a very interesting question: have you got the time and money to do this, or would you rather get on with the business problem at hand?

Enterprise software is a complex business. There are no shortcuts. There are no easy decisions. The landscape changes all the time. You have to weigh up support costs, training, extensibility, maintenance and skills.

My take on it? Someone did the heavy lifting already. Do yourself a favour and take advantage of it. If it doesn’t do exactly what you want, then the code is right there to change. You can always contribute it back, and if it’s good enough then it becomes the property of the community. The numbers are compelling.

Client-side web development evolved

It’s finally time to say goodbye to my trusty Venkman debugger for Firefox. My old friend has served me well for Javascript development, but I have found a new, better tool: Firebug. Javascript, DOM and XHR debugging, profiling, viewing, a command-line interface. Mmm…

Java PC Emulator/Open Source Goodness

After the recent announcement by Dell that they were going to be selling Linux-based desktops and laptops, I decided to check out their site to see what’s on offer. For some strange reason the open-source laptops they are selling don’t come with Linux at all (?) but with FreeDOS. I had never heard of it, so I googled it, and a few clicks later came across a nifty Java-based PC emulator that runs FreeDOS.

So what?

I present Lemmings, Commander Keen and Prince of Persia in your browser! They’re on this virtual PC’s C: drive.

http://www.physics.ox.ac.uk/jpc/Demo.html