May 8

From the Pimp My Build session by the Atlassian guys.

  • Use Ant imports. The imported stuff can check for preconditions and fail cleanly using the <fail unless="…"> tag.
  • Use macros.
  • Don’t build stuff you don’t need: use the <uptodate> task to skip work that is already done. <outofdate> from ant-contrib is even better.
  • You can use audio snippets to tell you when you screw it all up :)
  • You can filter messages in builds using the Unix shell to notify you of actually important stuff rather than the standard boilerplate.
  • Don’t be afraid to write tasks - everyone should know how the build works. Don’t be precious about it. If you have repetitive tasks, why not script it?
  • Use scripts. You can embed JavaScript directly into your Ant build with the <script> task, wrapping the code in a <![CDATA[…]]> block.
  • Use conditional tasks (ant-contrib) <if> <then> <else>
  • Don’t do one-off analysis. PMD, Checkstyle and Findbugs can be scripted! I found this to be particularly useful. Much easier to find issues, especially if coupled with continuous integration.
  • Document your build! Ant targets have descriptions. You do it with your code, why not your build artefacts? Use the -target_name convention for private targets.
  • Use continuous integration. This has been an absolute life changing thing for me as a developer.
  • Test in your builds!!! JUnit, TestNG et al.
  • Maven tips:
    • Use a remote repository proxy - caches are good (Apache Archiva). Helps performance and stability - make sure you can run when the net goes down.
    • Create a local repository for private artifacts
    • Local repository for public artifacts - third party Jars or commercial stuff not available in public repositories

Oh yeah, Ivy is good when you aren’t using Maven.

Apr 24

Everything is finally booked and I am looking forward to hitting the shores of San Francisco next weekend. The lineup looks really good and I’m still having difficulty choosing between the sessions. I will also be at CommunityOne, which looks outstanding for a free event. 14 tracks!? Amazing! Hats must go off to the organizers. The only session that I have firmly fixed is on Java User Groups, but I have no doubt that the rest of the schedule will work itself out with ease.

I hope to be blogging live (i.e. unedited notes and opinion) while there, but that all depends on how the laptop batteries manage to hold out. Fingers crossed!

All that and I get to sample Delta Airlines’ world famous hospitality too ;)

Apr 16

I finally got around to booking in onto some of the tech sessions for JavaOne in San Francisco next month. Gasp! The amount of stuff going on is incredible. From new languages on the JVM (Fortress, Scala, JRuby) to SOA, mobility and techniques in app development, it’s pretty easy to book up 12 hours a day. My approach: lock in a full programme of stuff that looks good, and turn up if the brain is still functioning. It has taken the better part of an hour to read through the sessions for the Tuesday, so it’s no easy task. I can’t wait. All I have to do is get around to sorting out the minor detail of a flight from London ;)

Mar 20

Isn’t it always the way, when you want to blog other stuff comes up? I had intended to write up a final post about the last day of Tech Days, but the weather has been great to get the kite out and the holiday is winding down so…

Day 3 was pretty cool, as I went to a few tech sessions related to stuff that I don’t normally work with, as I do web apps most of the time. The Netbeans sessions were pretty good, with a great demo of the Matisse GUI Builder. I think that with Netbeans 6, Java has finally got its answer to the VB/Delphi mode of development. The introduction of the Swing Application Framework (JSR 296) and Beans Binding (JSR 295) really takes away a lot of the grunt work in building small to mid-sized desktop apps, and Netbeans does a great job in hiding a lot of the initial application setup code. It’s really nice stuff, and to be honest it really drops the barrier to entry. At some stage you will inevitably need to get into the bowels of Swing, but Matisse gives you a great leg up and means that the learning curve can be that little bit easier. The fact that basic CRUD type applications are pretty well automatically generated is a huge help and lets you get down to doing the interesting bits.

I had the pleasure afterwards to turn up to Jim Weaver’s presentation on Java FX, which gave a great overview of how the technology works from an architectural perspective. The user interface is defined using FX Script, which has a weird nested CSS-ish feel to it and is used to define your interface, event handlers and UI transitions. This is then compiled down to a Java app. The apps themselves are distributed either as applets (remember those?) or via Web Start/JNLP and talk to the home server via JSON invocations, which means that anything can support the interface on the server side. It would be cool to have a play with sticking a Grails app on the back. Nifty stuff.

The last session was no less interesting, as I am finally getting my head around this ESB stuff! I’ve always found the concept a bit esoteric, not having worked in an environment that uses a bus, and it’s not something that lends itself easily to kicking the tires. SOA initiatives that I have worked on in the past involved point to point hooks, but I can really see why the ESB concept might come in handy. It’s very easy to get bamboozled by talk of federation, mediation and orchestration. Essentially the idea is pretty simple - hook up everything to a massive pipe, define standard messages and worry only about communication with the pipe itself. The devil, as in any such thing, is in the details - but essentially the pipe handles things like transactionality, message delivery, data transformation, enrichment, routing and the like through underlying mechanisms. You need to understand how to use the specific pipe in question, as with any such piece of infrastructure, but the payoff looks really good. I have not yet come across a decent guide in layman’s English (not a marketecture white paper) as to how to get everything humming, but I feel like the pieces are falling into place.

Winding down the Australia trip this week for my migration to London. Back to reality - CVs, agents, company setups, finding apartments and poms ;) All I have to deal with is a flood of contractors on the market because of the sub-prime debacle (wasn’t Basel2 supposed to make sure this nonsense wouldn’t happen?) and the April budget rounds. Bring it on!

Mar 5

Beautiful day in Sydney.

I came out of this morning’s Tech Days session on Java ME applications with a whole bunch of questions - they’re much more fun than answers.

The latest version of JME now contains heaps of APIs for everything from geolocation to bluetooth and is supported by millions of mobiles, and will continue to be so. The implementations are open sourced through phoneME Feature (CLDC) and phoneME Advanced (CDC). Anyone who has played with Java ME will soon realise that building apps is a real pain in the ass as every device supports different versions.

Now the Open Handset Alliance, led by Google, comes along with Android, which is derived from Java but is a different platform altogether, even though some of the java.lang libraries are supported. It lauds a bunch of features like geolocation and bluetooth (which are already part of JavaME). There’s a lot of feel-good talk about openness and freedom.

To me it seems that it only compounds the platform/version fragmentation issue and will become a problem for application developers who try to reach the largest possible market. This was an obvious concern at question time. People don’t know which horse to back here when starting out in mobile development. Chances are that app developers are having a bigger problem. Is this another HD DVD/Blu-ray scenario?

What features does Android provide that JavaME does not? On the surface to the casual observer they are almost the same.

But the big question here is why is OHA/Google going against the grain and building their own mobile platform?

Java ME use is still growing and expanding to new embedded devices like Sun Spots. Companies don’t invest millions on technology just for the fun of it. Technology uptake is painfully slow in the wild and even if every vendor dropped JME today in favour of Android, it would take years for it to get a majority of market share. Having said that, it may not be that Android’s end goal is mobile product or technology specific at all. Are Google et al attempting to force licensing change from Sun? Maybe Android set top boxes? It looks like a stepping stone that is part of a larger strategy.

Food for thought - the best thing to come out of conferences.

Mar 4

After half an hour of walking around trying to determine where exactly the convention centre IS in the Olympic complex (signage would be really nice), I finally managed to make it to the Melbourne satellite event of the Australian leg of Sun Tech Days. I love events like this; the interchange of ideas and pointers in new unexplored directions really get the mental juices flowing. I was a bit late, but still managed to catch most of James Gosling’s keynote.

The highlights for me were many, and they’re going to take some digesting.

James destroyed the “Java is slow” myth (anyone who still believes this hasn’t fired up a new JVM lately). Java runtimes are incredibly optimized with test results showing performance equalling or beating C/C++ equivalents.

  • Linpack -2%

  • Scimark + 4%

with GC being a lot faster than malloc/free.

The reason why dynamic compilation beats its static equivalent is that the JVM is able to tweak performance depending on the processor type being used, even within the same type of architecture. This enables the JVM to take advantage of the strengths of AMD chips over Intel and vice versa.

There are lots of good things coming up in Java 7, both in the core and on the mobile. I will be detailing them a bit more once I get around to working out what all of the JSR numbers I scribbled down meant :P. Too fast with the old Powerpoint.

The question of RAD tools came up. James quantified it with a question – what exactly do you mean by rapid? Is it time to demo or time to production deployment? I had never really thought about this, but it does make sense. Java is being focussed on time to production. The reason for this blew my mind. Venture capital provides funding in 3 month lots only. In that time you need to turn an idea into a production grade system.

3 months from idea to production.

As developers we need to scout out the enabling technologies behind this kind of turnaround and work it into the toolbox. And enterprises need to have a good hard think about why they are not achieving similar results (and no, the answer is not to kill your programmers with 120 hour weeks to do it). This only highlights the discrepancy between small startups and large organizations. I have yet to see anything get put in production in less than a year in a large institution.

The other highlight for me was not what was presented, but rather what could be gleaned from the feedback questionnaire. This was one of those basic “who are you, what do you do and what are you using to do it?” numbers. It listed a whole bunch of technologies that Sun are presumably keeping their eyes on. The stand-outs? RoR, Groovy, Grails and Wicket. If you want to skill up on what’s going to be big on the job boards within the near future, these would be a very good start.

The things I’m looking forward to playing with as a result of this morning? JMaki – a super-framework and Netbeans plugin that glues all of the best AJAX frameworks together (mash-ups faster than you can say Dojo), and of course JavaFX – super sweet user interfaces done as simply as a web page.

Oh yeah, and watch out for an announcement in the very, very near future about PHP and its relationship to Java.

Tomorrow in Sydney!

Jan 25

Unit testing database code is a bit of a funny problem. Most developers can pretty easily get their heads around unit testing a piece of Java code using interfaces and mock objects. When it comes to database code or DAOs, it suddenly becomes particularly difficult. But why, what is so difficult about testing stuff against the database? Surprisingly enough, the answer is that it has nothing to do with coding or a particular framework, although these do play their parts. It comes down to a complex web of human interaction, version control and managing environments. Let me explain.

The standard unit test has three basic phases:

  • Setup (@Before)
  • Test (@Test)
  • Tear down (@After)

The first sets the test environment into an expected state, the second runs the test and checks that the outcome is as expected, while the final one clears up any test resources.
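
As a rough sketch of those three phases in plain Java (no JUnit on the classpath, so the methods are called by hand; with JUnit 4 they would carry the @Before, @Test and @After annotations instead). The class name and the in-memory list standing in for a widgets table are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled illustration of setup / test / tear down.
class WidgetLifecycleSketch {
    // Hypothetical stand-in for a widgets table.
    private List<String> widgets;

    // Setup: put the environment into a known state.
    void setUp() {
        widgets = new ArrayList<>();
        widgets.add("sprocket");
        widgets.add("cog");
        widgets.add("flange");
    }

    // Test: exercise the code and check the outcome.
    boolean testFindsThreeWidgets() {
        return widgets.size() == 3;
    }

    // Tear down: release whatever the test used.
    void tearDown() {
        widgets = null;
    }

    // Runs the three phases in order; tear down happens even if the test throws.
    static boolean runOnce() {
        WidgetLifecycleSketch test = new WidgetLifecycleSketch();
        test.setUp();
        try {
            return test.testFindsThreeWidgets();
        } finally {
            test.tearDown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce() ? "green bar" : "red bar");
    }
}
```

The point to notice is that the test only passes because setUp() controls exactly what is in the "table" beforehand.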

How does this relate to database testing? Let’s say that we have a DAO that performs a particular select statement. Our test should be to retrieve a particular number of records from a known set. Easy enough. The precondition of course, is that you have a known set to begin with.

It’s ALL about the environment.

Most large development projects go like this: The database guys update the schema. The developers write the code. The developers need a particular data set to exercise the various use cases so they add it to the schema. It all becomes a bit messy.

Eventually, very complex data sets are set up by everyone concerned in a primary schema that keeps getting updated. The database schema generally is not version controlled, as it is constantly being redefined using DDL statements run by the DBAs. Most of the time you will be lucky to get a backup of a schema, with all of the data truncated, as the schema and supporting code (i.e. the application) moves between environments.

Getting back to the test. You set up your data by hand in the master schema so that there were three items in the widgets table where some condition was true. You write your test, it runs against the schema, pulls out the expected three widgets and everything is great. You check in the tests. A week later your colleague, Bob, adds another widget to satisfy his test condition. Your test all of a sudden returns 4 items and the test breaks.

Of course, Bob didn’t actually run your test because he was too busy with his own and the test suite isn’t clean anyway because everyone is falling over each other.

Sound familiar?

What about inserts? The precondition: no sprockets were in the table, the test: insert a sprocket, the postcondition: a sprocket is in your table. Kind of hard to test under the above conditions isn’t it? For one thing, the exact data of the test sprocket may be in the table, so checking by value may give you false positives, while deleting it may get rid of more records than you wanted. What about concurrent tests? With a group of developers running the same tests, they start tripping over each other very quickly and the whole effort becomes an exercise in frustration. At this point the development manager throws up his hands, says that this automated testing thing is a load of bollocks and to get back to your work because they didn’t deal with all this when he was doing VB. Somewhere else, Kent Beck sheds a tear…

Let’s examine what goes on in Ruby on Rails. One of the best ideas that was popularized by this framework was its method for database unit testing. A developer’s workspace has multiple environments by default - development, test and production. You develop against the development schema, designing table structures, and playing with the user interface to your heart’s content. When you run unit tests, the following happens - the schema from development is copied into the test database with no data in it. The framework imports version controlled sets of test data (saved as YAML files) into this new schema. Whenever a test is run, it is guaranteed that the database will be in this state. Any changes a test makes are visible only within the scope of this one test. The tear down step cleans out your changes. This makes life so much simpler, especially if you have been working in the nightmare scenario above.

So how do we get the same sort of effect in a corporate development environment?

You need multiple database schemas in order to unit test your db code.

Pause and re-read that line. It’s not negotiable. Probably two per developer. One with sample data to use while you work on the user interface. The other, a temporary one for unit testing. A whole development team using the one schema does not work. Most projects do it, but that doesn’t mean that it’s a good idea.

Some suggestions for how to manage this. The DBAs have their own schemas. The full DDL for the database is kept in version control. After each change, the full database DDL is dumped and checked in. No ALTER TABLE statements. Ever. This way you are guaranteed that if you ever want to get a baseline of your system, you can also rebuild the database as it existed at that time. I worked on a very large telecoms project with a huge development team, and this worked. Well.

The test data for your environments is stored in version control - at the very least, as dumps of insert statements. For unit testing purposes, a dedicated unit test framework is beneficial. DBUnit performs the same task in Java as described above for Rails - it loads test data from dumps (a number of formats are supported), and guarantees that the test database exists in the expected state when each test is run.

To test your database code, refresh your test schema with the one from version control - typically using your chosen build system. Ant tasks are generally pretty good for this. Now run your test cases. Gorgeous! No tripping over other people, and your tests are guaranteed to work the same each time. No excuses for a red bar.
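
Here is a toy version of the idea, with an in-memory list playing the part of the schema so that it runs without a database. Everything in it (the class, the fixture rows, refreshSchema) is invented for illustration; in real life the refresh step would be your build script loading the DDL and test data from version control, or DBUnit doing the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FreshFixtureSketch {
    // Stand-in for test data kept in version control (hypothetical rows).
    static final List<String> FIXTURE =
            Collections.unmodifiableList(Arrays.asList("widget-1", "widget-2", "widget-3"));

    // Stand-in for rebuilding a schema: a new copy of the known data set.
    static List<String> refreshSchema() {
        return new ArrayList<>(FIXTURE);
    }

    // The "DAO" under test: counts the rows in the table it is handed.
    static int countWidgets(List<String> widgetsTable) {
        return widgetsTable.size();
    }

    public static void main(String[] args) {
        // Shared schema: Bob's extra row leaks into your test.
        List<String> shared = refreshSchema();
        shared.add("bobs-widget");
        System.out.println("shared schema count: " + countWidgets(shared));

        // Private schema, refreshed before the test: always the known state.
        List<String> mine = refreshSchema();
        System.out.println("fresh schema count: " + countWidgets(mine));
    }
}
```

The test that expects three widgets can only be trusted in the second case, because nobody else can touch that copy.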

So why is unit testing databases so difficult if it doesn’t have to be? Most of the time it involves process change and getting out of bad habits, not just a tool. And change means convincing people. Generally, managers do not understand what benefit there is in multiple database schemas, as it is seen to increase complexity and therefore risk, and DBAs like to have full control over what is going on on their servers. The topic of databases and processes is also a great one for religious zeal.

The process outlined above should explain the hows and whys to the individuals involved. The changes above mean a little bit more setup initially, but a saner development process.

A nice side effect is how easy upgrading databases through your environments can become. Run the latest DDL against a fresh schema, get the differences between it and your target environment using a database compare tool, and fire it off. Beautiful.

Jan 23

Bob Lee posted yesterday about one of his sessions at Javapolis (viewable at Parley’s), that covered heaps of good stuff about dependency injection and API design. At the end he mentions weak and soft references. What? Who? Not exactly a language feature that I have come across in the day to day, so I did a quick search and came up with this:

A weak reference is a reference that isn’t strong enough to force an object to remain in memory.

Hopefully, I can give a better explanation than that.

A strong reference is one that while it exists, the object won’t get garbage collected:

Goat goat = new Goat();

As long as this reference exists, the goat stays in memory, even if every other reference to it elsewhere has been dropped. This is a problem if you want to keep some other data around about the goat:

Map<Goat, GoatInfo> goatMetaData = new HashMap<Goat, GoatInfo>();
goatMetaData.put(goat, new GoatInfo("Billy"));

To ensure that you don’t keep goats around any longer than you have to, you would have to do some weird coding to ensure that the code that deals with goats cleans up the store. Something with listener classes or similar.

Enter the weak reference. You wrap your goat in it as such:

WeakReference weakGoat = new WeakReference(goat);
Goat goat = (Goat) weakGoat.get();

Alternatively:

WeakReference<Goat> weakGoat = new WeakReference<Goat>(goat);
Goat goat = weakGoat.get();

weakGoat.get() will return null once all the other references to goat have been dropped and the goat has been garbage collected.

So that’s cool for a single instance. What about the metadata example above? Well, there is a Map implementation that uses this feature.

Map<Goat, GoatInfo> goatMetaData = new WeakHashMap<Goat, GoatInfo>();
goatMetaData.put(goat, new GoatInfo("Billy"));

If you try to get the additional info about the goat once it has been garbage collected, the map will return null. Nifty.
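
A small runnable sketch of the above, assuming made-up Goat and GoatInfo classes. One caveat: System.gc() is only a hint to the JVM, so the second half asks a few times and then simply reports what happened rather than promising a result.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakGoatSketch {
    // Hypothetical classes for illustration only.
    static class Goat {}

    static class GoatInfo {
        final String name;
        GoatInfo(String name) { this.name = name; }
    }

    // Deterministic part: while a strong reference is held, the entry is there.
    static String nameWhileReferenced() {
        Goat goat = new Goat();
        Map<Goat, GoatInfo> goatMetaData = new WeakHashMap<>();
        goatMetaData.put(goat, new GoatInfo("Billy"));
        return goatMetaData.get(goat).name;
    }

    public static void main(String[] args) throws InterruptedException {
        Goat goat = new Goat();
        Map<Goat, GoatInfo> goatMetaData = new WeakHashMap<>();
        goatMetaData.put(goat, new GoatInfo("Billy"));
        WeakReference<Goat> probe = new WeakReference<>(goat);

        goat = null; // drop the last strong reference

        // System.gc() is only a request, so ask a few times and then give up.
        for (int i = 0; i < 50 && probe.get() != null; i++) {
            System.gc();
            Thread.sleep(10);
        }
        System.out.println("goat collected: " + (probe.get() == null));
        System.out.println("entries left in map: " + goatMetaData.size());
    }
}
```

Note that GoatInfo must not hold a strong reference back to its Goat, otherwise the value keeps the key alive and the entry never goes away.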

Soft reference objects, on the other hand are cleared at the discretion of the garbage collector in response to memory demand. So while weak references will be cleaned when the garbage collector deems that the underlying object has gone, use of soft references means that they will hang around a bit longer until the memory space is needed for something else.
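
To make that concrete, here is a minimal sketch of a soft-reference cache. The class name, the loads counter and the pretend load() method are all assumptions for the example, not anything from a real library:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class SoftCacheSketch {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();
    int loads = 0; // how many times we had to do the expensive load

    byte[] get(String key) {
        SoftReference<byte[]> ref = cache.get(key);
        // ref.get() returns null if the collector has cleared the value.
        byte[] value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = load(key);
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Stand-in for something expensive, such as reading a file.
    private byte[] load(String key) {
        loads++;
        return new byte[1024];
    }

    public static void main(String[] args) {
        SoftCacheSketch cache = new SoftCacheSketch();
        byte[] first = cache.get("report");
        byte[] second = cache.get("report");
        System.out.println("same object: " + (first == second) + ", loads: " + cache.loads);
    }
}
```

If memory gets tight the collector may clear the cached value, and the next get() quietly reloads it. This sketch never removes the cleared SoftReference wrappers from the map, which a real cache would want to do.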

An object is phantomly referenced after it has been finalized, but before its allocated memory has been reclaimed. According to the API, they are “most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism”.
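
A hedged sketch of how that looks in code: a phantom reference never hands the object back (get() always returns null), so the only way to learn anything is to watch the ReferenceQueue it was registered with. As with the other examples, whether the collector actually runs within the loop is up to the JVM.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomGoatSketch {
    // Deterministic part: get() on a phantom reference is always null.
    static boolean getIsAlwaysNull() {
        Object goat = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(goat, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object goat = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(goat, queue);

        goat = null; // drop the last strong reference

        // Ask for a collection a few times, waiting briefly on the queue each time.
        Reference<?> ready = null;
        for (int i = 0; i < 50 && ready == null; i++) {
            System.gc();
            ready = queue.remove(20); // wait up to 20 ms for an enqueued reference
        }
        System.out.println(ready == phantom
                ? "goat is gone, clean up its resources now"
                : "not collected yet");
    }
}
```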

The API has much more detail on how to use this, although it’s kind of tough to get your head around. Too many uses of the word “reference”.

Useful when you are writing a container or cache. But you don’t want to do that because you know that you can get open source ones off the shelf, right? ;)

Nice one to know for interviews. Real bastard to ask :)

—- Update 25/04/08

I got asked :)

Jan 21

First things first. I love open source. I think that it’s the best thing since sliced bread. That thing that we were always told about since computer science, that of the open marketplace for components to be shared and reused HAS happened. Just not in the “buy this billing component” kind of way. It’s even better! It’s free (ish)! You download what you want and plug it in to your application. The quality varies, but if you keep your ear to the ground and do your homework, it will save you a lot of time. And time is money.

But just how much money?

I recently discovered Ohloh. It’s like professional networking, but not exactly, and it’s for open source. It has some cool features, but the one that got me instantly was that it trawls through open source repositories and gives you some very cool stats. Like just how much effort it would take to build an equivalent version of something, and how much it would cost given a yearly salary for a programmer. It uses lines of code, which are not a good metric, but are an OK litmus test.

Another one of my favourite sites is Java-Source.net. Pick a particular category of software that you need, and it lists you a suite of open source options in Java - ready for you to do with as you will.

So let’s pick a category. Here’s one that I prepared earlier - workflow engines. Pretty much every place that I have ever worked has rolled their own in this regard. For some reason it’s considered a low hanging fruit (even though people write postgraduate theses on them). In some cases, off the shelf offerings are seen as overkill, too restrictive or just too complex. Most of the time the analysis is little more than gut feel, a kind of “Hmm… looks too hard, must have been over-engineered.” So a couple of guys get together and do a “bake at home” version. Pretty soon, the reality takes over and it doesn’t do what it’s supposed to, the use case was misunderstood, you need a management console, version control of process flows, different flows in different environments… uh oh! Suddenly, you spend a lot of time maintaining this beast.

So let’s do the stats and take the first handful of products listed on both sites (there are many others). These vary in the amount of activity going on, have quite varied features and uses - some plug in to applications, others are standalone orchestration engines. It’s not exactly scientific, but it’s interesting for illustration purposes. Let’s take the average programmer salary as $55000 (dollars, euro, pounds - it’s all about the sameish worldwide, so doesn’t really matter):

LOC = Lines Of Code

  1. Apache ODE. 108,547 LOC, 27 Person Years, $1,498,221.
  2. Taverna. 134,334 LOC, 33 Person Years, $1,832,157.
  3. jBPM. 286,618 LOC, 74 Person Years, $4,081,422.
  4. Enhydra Shark. 255,101 LOC, 65 Person Years, $3,576,525.
  5. OpenSymphony OSWorkflow. 48,303 LOC, 11 Person Years, $627,203.
  6. ObjectWeb Bonita. 67,916 LOC, 16 Person Years, $894,118.
  7. OpenWFE. 187,176 LOC, 47 Person Years, $2,608,592.
  8. WfMOpen. 152,557 LOC, 38 Person Years, $2,084,413.

The average cost of building a workflow engine?

155,069 LOC. 39 Person Years, $2,150,333.

Once again, this is completely unscientific. I have no idea whether the cost is over the lifetime of the product or initial development cost, whether management costs are included, we aren’t comparing apples with apples, and these are general purpose engines rather than a thing that does only the thing you want. But it does make for a very interesting question. Have you got the time and money to do this, or would you rather get on with the business problem at hand?
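
For what it’s worth, the dollar figures look like they are simply person years multiplied by the yearly salary. That is my assumption from the numbers, not something I have checked against how Ohloh calculates them. A quick sketch to sanity-check a few rows from the list (the class and method names are made up):

```java
class BuildCostSketch {
    static final long SALARY = 55_000; // the flat yearly salary assumed above

    // Estimated cost for a given effort, assuming cost = person years x salary.
    static long costFor(double personYears) {
        return Math.round(personYears * SALARY);
    }

    // The same relationship the other way around.
    static double personYearsFor(long cost) {
        return (double) cost / SALARY;
    }

    public static void main(String[] args) {
        // Cost figures quoted in the list above.
        String[] names = {"OSWorkflow", "Bonita", "jBPM"};
        long[] costs = {627_203, 894_118, 4_081_422};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%s: %.1f person years%n", names[i], personYearsFor(costs[i]));
        }
    }
}
```

Swap in your own salary figure and the effort estimate for whatever you are thinking of building, and you get a rough price tag for the bake at home version.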

Enterprise software is a complex business. There are no shortcuts. There are no easy decisions. The landscape changes all the time. You have to weigh up support costs, training, extensibility, maintenance and skills.

My take on it? Someone did the heavy lifting already. Do yourself a favour and take advantage of it. If it doesn’t do exactly what you want, then the code is right there to change. You can always contribute it back, and if it’s good enough then it becomes the property of the community. The numbers are compelling.

Dec 18

Now you can find out with Crap4J. Can’t wait to run some projects through it. I’m curious as to what the code quality looks like on some of the larger Open Source projects (Dom4J looks pretty good though).

I love stats. I don’t know what it is. They’re like horoscopes, or something. Pinch of salt and all that, but you can’t not read them. Even better with diagrams :)
