Jakub Korab
Tech, Opinion, and Doing Stuff

Bored with software?

July 23rd, 2010

What’s interesting right now in software isn’t the new shiny thing. We already have the tools to do most of what we want. What’s interesting is scale and change.

You build a system. Then you realize you need to break out and share functionality via modules. Then you want to manage them independently in live environments. And not take the system down. And have the old transactions finish on the old code while the new work hits the new code.

You build logic. It grows to the point where your original hand-crafted solution is too unwieldy. You need a rules engine, or workflow. Your code needs to keep running. A rewrite is not an option. Rework, refactor, augment, migrate. But don’t break what’s there.

You just wanted to integrate with that one external system. Web services behind a facade. Now another, this time via messaging. All of a sudden it’s 12. Integration framework? ESB? You’re in a cluster, shared network memory, processes that can only run in one place at a time. What’s the last straw, the tipping point to your next upgrade? Where to from here?

That’s what’s interesting.


Filed under: software engineering
July 23rd, 2010 16:27:19

Get Functional

November 21st, 2009

That was the message coming through the Devoxx conference presentations this year. The idea that it will help your code run in the brave new world of multi everything (multi-core, multi-thread etc.) is one that’s widely touted, but rarely the primary driver for its use. Instead, it’s about less code that’s more easily understood. When you do get to scaling it, it won’t do any harm either.

As Guillaume Laforge tweeted, of 800 Java developers in his session, only 10 knew/used Scala, 3 Clojure, 20 Ruby, and 50 were on Groovy – which gives a nice gentle introduction to some of the constructs for those looking to wade in. Good stats to cut through the hype. So what of the roughly 90% slogging on without closures? Does this mean that they have to miss out on the fun?

Quite simply, no. There’s a heap of drop-in libraries that you can add to a Java project for all manner of functional goodness, and which don’t change the syntax of the language. LambdaJ, for example, gives a nice functional way of dealing with collections. To steal an example directly from the website, the following typical Java code:

List<Person> sortedByAgePersons = new ArrayList<Person>(persons);
Collections.sort(sortedByAgePersons, new Comparator<Person>() {
    public int compare(Person p1, Person p2) {
        return Integer.valueOf(p1.getAge()).compareTo(p2.getAge());
    }
});

is replaced with:

List<Person> sortedByAgePersons = sort(persons, on(Person.class).getAge());

Fancy a bit of map-reduce without a grid? Well, it comes stock-standard with the Fork Join (JSR166y) framework that will be added to the concurrency utilities in JDK 7. If you don’t fancy waiting until September 2010 (the latest expected date for the GA release), it’s downloadable here. As an aside, Doug Lea has written a really good paper on the FJ framework.
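
For a flavour of the divide-and-conquer style, here is a minimal sketch of summing an array with Fork/Join. It assumes the classes as they ended up in java.util.concurrent (the JSR166y preview jar used its own package), and the SumTask name and the 1,000-element threshold are made up for illustration rather than taken from any recommendation.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch only: sums a slice of an array by splitting it in half
// until the slice is small enough to add up in a plain loop.
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // hypothetical cut-off

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(values, from, mid);
        SumTask right = new SumTask(values, mid, to);
        left.fork();                     // "map": run the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // "reduce": combine the partial sums
    }

    static long sum(long[] values) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(values, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] values = new long[100_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values));
    }
}
```

The point is that the splitting and combining logic is all you write; the pool decides which core runs which half.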

Don’t fancy loops in loops in loops to filter, aggregate, do set operations with all the null checking that Java programming typically entails? Well, the Google Collections library (soon to be integrated into Guava, a set of Google’s core libs) contains predicates and transform functions that make all of this a lot easier to write and reason about. Dick Wall had a great presentation about this showing just how much code can be reduced (heaps).
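
To show the shape of the idea without pulling in a library, here is a rough, hand-rolled sketch of the predicate and transform style. The Predicate and Function interfaces and the filter/transform helpers below are my own stand-ins, not the Google Collections API; the real library's types and method names may well differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: hypothetical stand-ins for library-provided predicates and functions.
final class FunctionalSketch {
    interface Predicate<T> {
        boolean apply(T input);
    }

    interface Function<F, T> {
        T apply(F input);
    }

    // Keeps only the items the predicate accepts.
    static <T> List<T> filter(Iterable<T> items, Predicate<? super T> keep) {
        List<T> result = new ArrayList<T>();
        for (T item : items) {
            if (keep.apply(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Maps every item through the function, preserving order.
    static <F, T> List<T> transform(Iterable<F> items, Function<? super F, ? extends T> fn) {
        List<T> result = new ArrayList<T>();
        for (F item : items) {
            result.add(fn.apply(item));
        }
        return result;
    }

    // The null and empty checks live in one place instead of in every loop.
    static List<Integer> lengthsOfNonEmpty(List<String> names) {
        List<String> present = filter(names, new Predicate<String>() {
            public boolean apply(String s) {
                return s != null && !s.isEmpty();
            }
        });
        return transform(present, new Function<String, Integer>() {
            public Integer apply(String s) {
                return s.length();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(lengthsOfNonEmpty(Arrays.asList("ab", null, "", "cde")));
    }
}
```

The anonymous classes are noisy, but the calling code reads as "filter, then transform", and each piece can be named, reused and tested on its own.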

A thing I heard a number of times outside the sessions was, “I don’t know about all this stuff, surely as we get further from the metal, performance suffers”. Sure, it gets harder to reason about timings as the abstractions get weirder, but the environment gets better all the time, and the productivity gains more than outweigh performance in all but the most perf-intensive environments. Brian Goetz spoke about how the JVM supports this new multi-language world. Not something that I had ever really given much thought to, but the primary optimizations aren’t at the language compiler level (javac, scalac, groovyc etc.); they’re all done at runtime, when the JVM compiles the bytecode. The number of optimizations in HotSpot is massive (there was a striking slide showing 4 columns of individual techniques in a tiny font). Multiple man-centuries of effort have gone into it, and each new release tightens it up. If you’re not sure, then profile it and make up your own mind. JDK 7 will also see the VM with some goodness that will make dynamic languages really fly.

One thing that still sticks out like a sore thumb is closures support in Java. It’s not a candidate for inclusion in JDK 7, and the proposed syntax shown at the conf by Mark Reinhold looks pretty ugly when compared to other langs (see the proposal by Neal Gafter). Either way, not a sniff of actual implementation. I understand there’s some serious work on the VM to make any of this possible regardless of the syntax. Not holding my breath. [Closures will actually be in JDK7 - thanks Neal.]

All up, I’m pretty excited by all this, and can’t wait to get my hot little hands on some of these tools. The functional style yields code that’s much easier to read and reason about, and the fact that it’s essentially all Java syntax means that there’s no reason not to apply it. If you’re already comfortable with using EasyMock on your team, you won’t find it a huge mind shift.


Filed under: conference, java, software engineering, tools
November 21st, 2009 16:08:35

The Church of the One True Language

January 16th, 2009

I stumbled upon an interview from JAOO 2007 with Joe Armstrong and Mads Torgersen discussing Erlang, concurrency and program structure (objects versus interrelated processes). It was really interesting to see how similar yet different their points of view were. I’m not going to paraphrase, as it’s worth listening in on it.

Two points came out of the conversation that are worth talking about – the fallacy of the silver bullet language, and the right tools for the future.

The premise of The Church of the One True Language goes something like this: “you can do anything in my language”. Write servers, build databases, write accounting apps etc. But if you think of languages having their sweet spots and use cases, much like libraries, that pretty much falls apart. I, for one, wouldn’t want to be writing hugely parallel software in Java, just like I wouldn’t be writing web services in Erlang. Sure it’s doable, but probably not the best way to go about it. So it makes sense that, unless you want to be working in the one problem domain for the duration of your career, it pays to diversify. Right tool for the job and all that.

This leads me to something that I have been thinking about for a while. The multi-core era is upon us, and we don’t have the right tools for the job.

OO programming makes it easy to design by component, and organise and compose the pieces to desired effect. The problem is that those same concepts break down when you think of system-level services like threads, and the interaction of your pieces with the platform. Should a thread really be an object that can be controlled by the programmer? I think probably not.

Writing multi-threaded software is really hard. After reading Java Concurrency In Practice, I realised the nuances of just how hard it really is, and how easy it is to do the wrong thing. Even really smart people get it wrong. The core of the problem is shared mutable state, and any language that does not sufficiently separate the effects among threads can, and probably will, end up doing the wrong thing. Erlang’s message passing model is quite cool in that it separates processes, yet it falls over on the front of modelling entities and the relationships between them. Not surprising given its design philosophy.
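
To make the shared-state point a little more concrete, here is a rough sketch of the message passing idea in plain Java: one thread owns the state, and everyone else only drops messages into its mailbox. This is my own illustration using a BlockingQueue, not how Erlang does it, and the MailboxCounter name and the STOP sentinel are invented for the example. Nothing in the language stops another thread from touching the total directly, which is exactly the weakness of bolting this on as a convention.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: a counter whose state is touched by exactly one thread.
final class MailboxCounter {
    private static final int STOP = -1; // hypothetical sentinel message

    static long countViaMailbox(int senders, final int messagesPerSender) {
        final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<Integer>();
        final long[] total = new long[1]; // written only by the owner thread

        final Thread owner = new Thread(new Runnable() {
            public void run() {
                try {
                    // Process messages one at a time until told to stop.
                    for (int msg = mailbox.take(); msg != STOP; msg = mailbox.take()) {
                        total[0] += msg;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        owner.start();

        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int n = 0; n < messagesPerSender; n++) {
                        mailbox.add(1); // senders never see the total
                    }
                }
            });
            threads[i].start();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
            mailbox.add(STOP);
            owner.join(); // join() makes the owner's writes visible here
        } catch (InterruptedException e) {
            owner.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(countViaMailbox(4, 1000));
    }
}
```

No locks and no lost updates, but also no help at all with modelling "this account belongs to that guy".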

This seems to be the crux of the problem – the next generation of apps will have to break problems up in an easily concurrent manner, but at the same time model the world in the “this object is a bank account that belongs to that guy over there” abstractions that we have become used to thinking in. Those abstractions seem to be at odds with each other under current development paradigms. You can stick Actor libraries on existing languages, but they still don’t make it impossible to mess things up. This is ultimately what the next-gen programming environment needs to address. It should be really difficult to mess things up. Threading should be thought of like garbage collection in a modern VM (i.e. you should know how it works in case things go wrong, but can pretty much depend on it to work correctly the rest of the time).

Maybe the correct approach isn’t to duct tape these concepts together in one syntax, but rather to have different abstractions in a language that model each world-view separately. This would be a bit like using floor plans in combination with elevations in building design. Or, for that matter, class and sequence diagrams in UML. Both represent a facet of the whole, but neither is fully complete on its own.

Either way, it pays to diversify. Languages have their own particular sweet spots, and problems that they address well. Even if we do manage to marry an object view of the world with transparent multi-threading, that will just highlight a different class of problems that cannot be easily solved using that approach.


Filed under: software engineering
January 16th, 2009 21:32:27

A fire-side chat about programming

December 10th, 2008

Every once in a while I go through a period of introspection where I pose questions like “why am I solving the same stuff all the time?”, “is there a better way to be doing this?” and “what’s around the corner?”. I think it’s pretty healthy, and I prefer to give it a good two weeks of thought straight rather than to constantly be going through that process (which I find pretty distracting at the 10k foot level).

As part of that I have been reading an awesome book in the last week called “Secrets of the Rock Star Programmers”. It’s a collection of interviews with some of the biggest/loudest names in programming, and contains the sorts of conversations that you would have down at the pub with these guys. I think that it’s quite an introspective, passing-on-wisdom type of book in the vein of “The Pragmatic Programmer” (TPP), but for the Java/.Net generation. Unlike TPP, it covers subjects around the meta-level stuff like keeping up to date versus trend chasing, and work-life balance amongst the day-to-day grind of pending deadlines. The really interesting thing is the common threads coming out despite the personalities and differences in approach. The book’s style is very different to TPP’s in that it is not prescriptive, but rather lets you draw your own conclusions. It has been an interesting read that I think I will keep coming back to, and one that I think I would not have gotten as much out of at the beginning of my career. I strongly recommend it, especially if you happen to be going through a “so, what’s it all about, then?” stage and don’t happen to have your favourite rock star around to chat to.


Filed under: books, software engineering
December 10th, 2008 17:20:26

Be a Better Developer

November 18th, 2008

I came across 91 Surefire Ways to Become an Even Better Developer while looking for programming resources similar to Project Euler (the best way to learn a new language). Dozens of links and ideas when you feel that work is not stretching the brain as much as it could. My favourite? Get your boss to get you a massage.


Filed under: software engineering
November 18th, 2008 21:52:55

What can you learn from the guys at Google?

March 30th, 2008

Anyone whose coding work tends to lean towards the more advanced or low-level should check out Google Code University. Topics covered in this series of presentations include language corner cases, web security, distributed systems and AJAX. Good stuff, worth taking a look at.


Filed under: software engineering
March 30th, 2008 21:54:01

Poorly Formatted Code Costs You Money

January 29th, 2008

After nearly 10 years of working on complex systems I think I have nailed down why poorly formatted code annoys me so much. It wastes time. Complex logic requires whitespace in order for the reader to make sense of it in the same way that punctuation is used in sentences. If the whole thing looks like a dog’s breakfast, it makes it more difficult to understand.

When a person approaches poorly laid out code, they have two choices:

  • battle through it
  • clean it up and make sense of it

The first one results in an exercise in frustration, the second… well, that’s a beast unto itself.

A long time ago, in my first job, I was working for a consultancy at a major telecoms company on a very large system. The system was used to activate telecoms products on individual lines and talked to telephone exchanges across the network. The project was in its tenth year and had fallen into a steady routine of releases. Regression tests had been written years earlier, but had long since fallen by the wayside. A new project manager came in with an agenda of improvement, and the process to get the tests running again began in earnest.

I was given the task of redeveloping telephone exchange simulators that the tests made use of. These Perl daemon servers would listen on a pipe, take some text in, interpret it and spit out what was expected of an actual telephone exchange of the appropriate manufacturer and version.

I had never worked on anything like this before and asked whether there was any documentation. The response as I remember it was “Bwahahaha! Documentation?”. OK, maybe not quite that dramatic, more along the lines of… “Nothing concrete but it has a lot of comments”.

Understatement of the millennium. Just a sample:

$i++; # add 1 to the value of i

Apparently, the project had taken on a contractor years previously who wasn’t particularly good. Rather than getting rid of him (they didn’t care as they were being paid by the hour for the bum on the seat), they got him to comment the code. Obviously annoyed that he was being sidelined, he commented every single line out of pure spite.

The code wasn’t great to begin with, but in this state it was unreadable! The first thing I did was strip out the redundant comments (some 20,000 lines’ worth) and check in a clean copy. The next day one of the senior programmers and the version control manager gave me a very stern talking to!

It seems that even though no one could argue with my intentions and everyone agreed that it was the right thing to do, it played havoc with the merge tracking. Everything had changed and it would now be impossible to see what my actual code changes were!

The same issue arises with non-standardized code. On a large project there are a lot of people working against the same code base. Some will be good, others not so much. Everyone iterates through each other’s classes, making changes as warranted. Now imagine the scenario above, but with numerous people working on various branches of code that all have to be merged back together.

Your programmers now have the same choice:

  • do they slowly battle with illegible code in dealing with the task at hand, or
  • do they reformat and take up someone else’s time as they struggle to work out what of the multiple versions needs to go into the final release?

Not a pretty choice. But there is hope!

Actually apply a coding standard.

Give anyone who does not apply it a good talking to. You could establish one using current naming structures, layouts, consensus etc. But you will probably end up making life more difficult for yourself. Getting code formatters to behave just the way you want to and then getting those changes out to everyone on the team takes time, and in a project situation, that’s a rare commodity.

Using Java? Use the Sun standard. ALT-SHIFT-F will automatically format Java code to it by default in NetBeans, and CTRL-SHIFT-F does the same in Eclipse. Weird naming conventions are great for your pet project, but just use the defaults in real life. Personal preference has little relevance in reality. The curly brackets debate happened a long time ago, and no one won. I have used Jalopy in the past as a custom formatter where some weird conventions were dictated. Even though it was supposedly a standard, I realized that few others on the project team did the same, because it took too much time to set up and they didn’t know what the big deal was anyway… *sigh*

Use standard coding conventions. Keep a close eye on anyone who checks in nonsense because poor formatting is often an indicator of poor quality code in other ways, and it will take time to clean up their mess. Time that could be better spent bringing your project in on budget.


Filed under: software engineering
January 29th, 2008 13:24:48

Unit Testing the Database Tier

January 25th, 2008

Unit testing database code is a bit of a funny problem. Most developers can pretty easily get their heads around unit testing a piece of Java code using interfaces and mock objects. When it comes to database code or DAOs, it suddenly becomes particularly difficult. But why? What is so difficult about testing stuff against the database? Surprisingly enough, the answer is that it has nothing to do with coding or a particular framework, although these do play their parts. It comes down to a complex web of human interaction, version control and managing environments. Let me explain.

The standard unit test has three basic phases:

  • Setup (@Before)
  • Test (@Test)
  • Tear down (@After)

The first sets the test environment into an expected state, the second runs the test and checks that the outcome is as expected, while the final one clears up any test resources.

How does this relate to database testing? Let’s say that we have a DAO that performs a particular select statement. Our test should be to retrieve a particular number of records from a known set. Easy enough. The precondition of course, is that you have a known set to begin with.

It’s ALL about the environment.

Most large development projects go like this: The database guys update the schema. The developers write the code. The developers need a particular data set to exercise the various use cases so they add it to the schema. It all becomes a bit messy.

Eventually, very complex data sets are set up by everyone concerned in a primary schema that keeps getting updated. The database schema generally is not version controlled, as it is constantly being redefined using DDL statements run by the DBAs. Most of the time you will be lucky to get a backup of a schema, with all of the data truncated, as the schema and supporting code (i.e. the application) move between environments.

Getting back to the test. You set up your data by hand in the master schema so that there were three items in the widgets table where some condition was true. You write your test, it runs against the schema, pulls out the expected three widgets and everything is great. You check in the tests. A week later your colleague, Bob, adds another widget to satisfy his test condition. Your test all of a sudden returns four items and the test breaks.

Of course, Bob didn’t actually run your test because he was too busy with his own and the test suite isn’t clean anyway because everyone is falling over each other.

Sound familiar?

What about inserts? The precondition: no sprockets were in the table, the test: insert a sprocket, the postcondition: a sprocket is in your table. Kind of hard to test under the above conditions isn’t it? For one thing, the exact data of the test sprocket may be in the table, so checking by value may give you false positives, while deleting it may get rid of more records than you wanted. What about concurrent tests? With a group of developers running the same tests, they start tripping over each other very quickly and the whole effort becomes an exercise in frustration. At this point the development manager throws up his hands, says that this automated testing thing is a load of bollocks and to get back to your work because they didn’t deal with all this when he was doing VB. Somewhere else, Kent Beck sheds a tear…

Let’s examine what goes on in Ruby on Rails. One of the best ideas that was popularized by this framework was its method for database unit testing. A developer’s workspace has multiple environments by default – development, test and production. You develop against the development schema, designing table structures, and playing with the user interface to your heart’s content. When you run unit tests, the following happens – the schema from development is copied into the test database with no data in it. The framework imports version controlled sets of test data (saved as YAML files) into this new schema. Whenever a test is run, it is guaranteed that the database will be in this state. Any changes a test makes are visible only within the scope of this one test. The tear down step cleans out your changes. This makes life so much simpler, especially if you have been working in the nightmare scenario above.
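
The essence of that approach can be sketched in a few lines of Java with no database at all. Everything here is a stand-in: the in-memory list plays the part of the widgets table in a private test schema, the FIXTURE constant plays the part of the version-controlled test data, and the class and method names are hypothetical. Rails itself does this against a real test database, so treat this purely as an illustration of the known-state-per-test idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch only: every test starts from the same known data set.
final class WidgetFixtureSketch {
    // Stand-in for test data kept in version control (hypothetical rows).
    private static final List<String> FIXTURE = Collections.unmodifiableList(
            Arrays.asList("widget-a", "widget-b", "widget-c"));

    // Stand-in for the widgets table in a private test schema.
    private List<String> widgetsTable;

    // Setup (@Before): load the fixture into a fresh table.
    void setUp() {
        widgetsTable = new ArrayList<String>(FIXTURE);
    }

    // Tear down (@After): throw away whatever the test changed.
    void tearDown() {
        widgetsTable = null;
    }

    void insertWidget(String name) {
        widgetsTable.add(name);
    }

    int countWidgets() {
        return widgetsTable.size();
    }

    // Runs Bob's insert test, then the select test, against the same "schema".
    static int[] runBothTests() {
        WidgetFixtureSketch db = new WidgetFixtureSketch();

        db.setUp();
        db.insertWidget("bobs-widget");
        int seenByInsertTest = db.countWidgets();
        db.tearDown();

        db.setUp();
        int seenBySelectTest = db.countWidgets(); // Bob's row is gone again
        db.tearDown();

        return new int[] { seenByInsertTest, seenBySelectTest };
    }

    public static void main(String[] args) {
        int[] counts = runBothTests();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Bob's extra widget exists only for the duration of Bob's test, so the test expecting three widgets keeps passing no matter what order things run in.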

So how do we get the same sort of effect in a corporate development environment?

You need multiple database schemas in order to unit test your db code.

Pause and re-read that line. It’s not negotiable. Probably two per developer. One with sample data to use while you work on the user interface. The other, a temporary one for unit testing. A whole development team using the one schema does not work. Most projects do it, but that doesn’t mean that it’s a good idea.

Some suggestions for how to manage this. The DBAs have their own schemas. The full DDL for the database is kept in version control. After each change, the full database DDL is dumped and checked in. No ALTER TABLE statements. Ever. This way you are guaranteed that if you ever want to get a baseline of your system, you can also rebuild the database as it existed at this time. I worked on a very large telecoms project with a huge development team, and this worked. Well.

The test data for your environments is stored in version control – at the very least, as dumps of insert statements. For unit testing purposes, a dedicated unit test framework is beneficial. DBUnit performs the same task in Java as described above for Rails – it loads test data from dumps (a number of formats are supported), and guarantees that the test database exists in the expected state when each test is run.

To test your database code, refresh your test schema with the one from version control – typically using your chosen build system. Ant tasks are generally pretty good for this. Now run your test cases. Gorgeous! No tripping over other people, and your tests are guaranteed to work the same each time. No excuses for a red bar.

So why is unit testing databases so difficult if it doesn’t have to be? Most of the time it involves process change and getting out of bad habits, not just a tool. And change means convincing people. Generally, managers do not understand what benefit there is in multiple database schemas, as it is seen to increase complexity and therefore risk, and DBAs like to have full control over what is happening on their servers. The topic of databases and processes is also a great one for religious zeal.

The process outlined above should explain the hows and whys to the individuals involved. The changes above mean a little bit more setup initially, but a saner development process.

A nice side effect is how easy upgrading databases through your environments can become. Run the latest DDL against a fresh schema, get the differences between it and your target environment using a database compare tool, and fire it off. Beautiful.


Filed under: architecture, java, software engineering, testing, tools
January 25th, 2008 10:14:38

Home Cooked vs Open Source. Or, Don’t Build Your Own Workflow.

January 21st, 2008

First things first. I love open source. I think that it’s the best thing since sliced bread. That thing that we were always told about since computer science, that of the open marketplace for components to be shared and reused HAS happened. Just not in the “buy this billing component” kind of way. It’s even better! It’s free (ish)! You download what you want and plug it into your application. The quality varies, but if you keep your ear to the ground and do your homework, it will save you a lot of time. And time is money.

But just how much money?

I recently discovered Ohloh. It’s like professional networking, but not exactly, and it’s for open source. It has some cool features, but the one that got me instantly was that it trawls through open source repositories and gives you some very cool stats. Like just how much effort it would take to build an equivalent version of something, and how much it would cost given a yearly salary for a programmer. It uses lines of code, which are not a good metric, but rather an OK litmus test.

Another one of my favourite sites is Java-Source.net. Pick a particular category of software that you need, and it lists you a suite of open source options in Java – ready for you to do with as you will.

So let’s pick a category. Here’s one that I prepared earlier – workflow engines. Pretty much every place that I have ever worked has rolled their own in this regard. For some reason it’s considered low-hanging fruit (even though people write postgraduate theses on them). In some cases, off the shelf offerings are seen as overkill, too restrictive or just too complex. Most of the time the analysis is little more than gut feel, a kind of “Hmm… looks too hard, must have been over-engineered.” So a couple of guys get together and do a “bake at home” version. Pretty soon, the reality takes over and it doesn’t do what it’s supposed to, the use case was misunderstood, you need a management console, version control of process flows, different flows in different environments… uh oh! Suddenly, you spend a lot of time maintaining this beast.

So let’s do the stats and take the first handful of products listed on both sites (there are many others). These vary in the amount of activity going on, have quite varied features and uses – some plug in to applications, others are standalone orchestration engines. It’s not exactly scientific, but it’s interesting for illustration purposes. Let’s take the average programmer salary as $55,000 (dollars, euro, pounds – it’s all about the sameish worldwide, so doesn’t really matter):

LOC = Lines Of Code

  1. Apache ODE. 108,547 LOC, 55 Person Years, $1,498,221.
  2. Taverna. 134,334 LOC, 33 Person Years, $1,832,157.
  3. jBPM. 286,618 LOC, 74 Person Years, $4,081,422.
  4. Enhydra Shark. 255,101 LOC, 65 Person Years, $3,576,525.
  5. OpenSymphony OSWorkflow. 48,303 LOC, 11 Person Years, $627,203.
  6. ObjectWeb Bonita. 67,916 LOC, 16 Person Years, $894,118.
  7. OpenWFE. 187,176 LOC, 47 Person Years, $2,608,592.
  8. WfMOpen. 152,557 LOC, 38 Person Years, $2,084,413.

The average cost of building a workflow engine?

155,069 LOC. 42 Person Years, $2,150,333.
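
For anyone who wants to check the sums, the averages can be reproduced from the eight rows above with a few lines of Java. The figures are copied straight from the list; the class name is made up, and the averaging uses truncating integer division, which is my choice rather than anything Ohloh does. With these inputs the LOC and person-year averages match the quoted ones, while the cost average comes out a couple of dollars below the quoted $2,150,333, which may just be rounding in the source figures.

```java
// Sketch only: recomputes the averages from the figures listed above.
final class WorkflowCostAverages {
    static final long[] LOC = {
        108_547, 134_334, 286_618, 255_101, 48_303, 67_916, 187_176, 152_557
    };
    static final long[] PERSON_YEARS = { 55, 33, 74, 65, 11, 16, 47, 38 };
    static final long[] COST = {
        1_498_221, 1_832_157, 4_081_422, 3_576_525, 627_203, 894_118, 2_608_592, 2_084_413
    };

    // Arithmetic mean, truncated to a whole number.
    static long average(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        System.out.println("Average LOC:          " + average(LOC));
        System.out.println("Average person years: " + average(PERSON_YEARS));
        System.out.println("Average cost:         " + average(COST));
    }
}
```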

Once again, this is completely unscientific. I have no idea whether the cost is over the lifetime of the product or initial development cost, whether management costs are included, we aren’t comparing apples with apples, and these are general purpose engines rather than a thing that does only the thing you want. But it does make for a very interesting question. Have you got the time and money to do this, or would you rather get on with the business problem at hand?

Enterprise software is a complex business. There are no shortcuts. There are no easy decisions. The landscape changes all the time. You have to weigh up support costs, training, extensibility, maintenance and skills.

My take on it? Someone did the heavy lifting already. Do yourself a favour and take advantage of it. If it doesn’t do exactly what you want, then the code is right there to change. You can always contribute it back, and if it’s good enough then it becomes the property of the community. The numbers are compelling.


Filed under: architecture, java, open source, software engineering
January 21st, 2008 22:57:01

Visio 2003 UML is The Bomb

August 23rd, 2007

I have worked with a number of different employer-provided UML tools in the past and have often been left underwhelmed. Rational Rose is a complex memory-hogging beast, ArgoUML seems clunky (although I’m happy to work with it at home since it’s free), and older versions of Visio have needed Pavel Hruby’s stencil to provide good, if fairly basic UML support.

Which is why I have been so pleasantly surprised using Visio 2003’s native UML Model Diagram – it has the CASE-like features of the others (Java code generation excluded – surprise, surprise) and it “Just Works”. Within half an hour I was happily churning out structure and collaboration diagrams. The defaults are pretty intuitive (4/5) and standard actions such as moving methods between classes/interfaces, and repackaging classes are as simple as drag and drop. Change the structural details of your classes in the Model Explorer and all your diagrams update just the way you expected them to.

Java support is non-existent out of the box (it supports VB, IDL, C# and C++), but the C# native types are close enough that I’m not that fussed.

Credit where credit is due – any tool that makes me this productive also makes me very happy.


Filed under: software engineering, tools
August 23rd, 2007 16:22:00