Archive for the ‘technology’ Category


Better living through parallelism

Judging by the interest to my last actors post, I thought I’d throw up a piece of code that uses actors anonymously to parallelise a long running operation. Not every operation can be parallelised, most things we work on tend to be fairly sequential. However, sometimes if you can split up the work to perform an operation, you can get some serious bang for your buck.

Let’s take a pretty interesting little problem as an example, working out pi. I first encountered this problem in a job interview a while back (yeah, I thought it was out there too). It’s an example of what’s known as a Monte Carlo Simulation. The general way to work this out is described in detail in this Google Code University tutorial on parallel programming (worth taking a look).

The genaral approach works on the basis of this diagram.

The steps are as follows:

  1. generate points randomly in the square
  2. work out whether each point falls into the circle (distance from centre is less than the radius, use Pythagoras here)
  3. the proportion of points in the circle to those in the square is approximately the same as that of the areas of the two shapes; transpose the area formulas (I’ll show it in the code) to work out an approximate value of pi

It’s a nice little problem, because the more points you generate, the better the estimate gets. It’s also easily parallelisable as the first two steps can be performed repeatedly by anyone. Let’s do a first pass in a single thread:

package net.jakubkorab.pi_evaluator

import scala.actors._
import scala.actors.Actor._
import scala.math._

object PiEvaluator {
	val sideLength : Double = 1
	val radius : Double = sideLength / 2
	val totalPointsToSample : Long = 10000

	def isRandomPointWithinCircleRadius() : Boolean = {
		val x = (random * sideLength) - radius
		val y = (random * sideLength) - radius
		val hypotenuse = sqrt(pow(abs(x), 2) + pow(abs(y), 2)) // pythagoras
		hypotenuse <= radius
	}

	def pointsInCircle(pointsToSample : Long) : Long = {
		(1L to pointsToSample)
			.map((i : Long) => isRandomPointWithinCircleRadius())
			.foldLeft(0L) { (pointCount : Long, pointInCircle : Boolean) =>
				if (pointInCircle) pointCount + 1 else pointCount
			}
	}

	def approximatePi(pointsEvaluated : Long, pointsInCircle : Long) = {
		println("After " + pointsEvaluated
			+ " samples, the number of points in a circle was " + pointsInCircle)

		val areaOfSquare = sideLength * sideLength
		// areaOfCircle/areaOfSquare =~ pointsInCircle/pointsEvaluated, so
		val areaOfCircle = pointsInCircle * areaOfSquare / pointsEvaluated
		// areaOfCircle = pi * r^2, so
		val approximatePi = areaOfCircle / pow(radius, 2)
		println("Pi is approx: " + approximatePi)
	}

	def main(args : Array[String]) = {
		approximatePi(totalPointsToSample, pointsInCircle(totalPointsToSample))
	}
}

At line 20, there’s a nice little bit of map-reduce action going on (fold is an equivalent name). What’s going on is that for each number, we’re generating a random number an determining whether it’s in a circle. Then we fold/reduce the list of booleans returned by the map function to give a count of all the points that were generated within the radius of the circle. It’s a super useful little programming construct.

We then call approximatePi() to do our maths for us.

Running this (I’m doing it through SBT, that’s why the [Info] outputs) we get:

After 10000 samples, the number of points in a circle was 7883
Pi is approx: 3.1532
[info] == run ==
[success] Successful.
[info]
[info] Total time: 5 s, completed 30-Jan-2011 13:45:46


Not bad, but we need more points to get a better result. We know that pi should be around 3.14159, but we’re not quite there yet. Scaling up our numbers:

Samples Time (s) Pi
10,000 5 3.1532
100,000 5 3.13676
1,000,000 7 3.141188
10,000,000 20 3.1424864
100,000,000 189 3.14139356


Ok, we’re approaching our known value, but this approach has shown some stress. It’s not scaling too well. It’s slowing right down, but my processor (Intel Core2 Duo T9300 @ 2.5GHz) is showing only ~60% usage. If we could break down the point generation among a number of threads, and then tally up the result at the end, we could make better use of our hardware. Enter fork-join.

Fork-join is probably one of the most common things that you’d want to do with concurrency. It’s a bit of a pain in the butt in Java (task executor framework, futures etc.). It really shouldn’t be. There’s some excellent work going on around a framework for this, and map-reduce too, for future versions of the language. In the meantime, you need to jump through hoops – cue fishing through Concurrency in Practice. In Scala though, it’s really easy using actors. So we rewrite main:

	val workers = 10
	def main(args : Array[String]) = {
		// break down the job between worker - there's a rounding issue here, but never mind
		val samplesPerWorker = totalPointsToSample / workers
		1 to workers foreach { (workerNumber : Int) =>
			val worker = actor {
				receive {
					case pointsToEvaluate : Long => {
						sender ! (pointsToEvaluate, pointsInCircle(pointsToEvaluate)) // reply with a tuple
					}
				}
			}
			worker ! samplesPerWorker
		}

		var workersReplied = 0
		var totalPointsEvaluated = 0L
		var totalPointsInCircle = 0L
		while (workersReplied < workers) {
			receive {
				case (pointsEvaluated : Long, pointsInCircle : Long) => {
					workersReplied += 1
					totalPointsEvaluated += pointsEvaluated
					totalPointsInCircle += pointsInCircle
					if (workersReplied % 100 == 0) {
						println(workersReplied + " workers replied")
					}
				}
			}
		}
		approximatePi(totalPointsEvaluated, totalPointsInCircle)
	}
Samples Workers Time (s) Pi
10,000,000 10 18 3.1427624
100,000,000 10 64 3.1418042


64s is a lot better than our original 189s! And this effect just gets more pronounced the more processors you have (where the optimal number of actors matches the number of processors; at least it should be – feel free to play around with this).

In the second version of main, the actor keyword/block is actually a method on the Actor object that creates an anonymous actor instance and instantly starts it. We ping some messages to it, and wait for the results using receive().

Easy, and you can apply it for a ton of different tasks.

Why I dig Scala: Concurrency and the Dining Philosophers

I am occasionally asked what the big deal is about Scala. For me, to decide whether a programming language is worthwhile is dependent on two practical questions: does it aid comprehension, and does it reduce code. The two are not necessarily interchangeable. Terseness, after all, does nothing to aid comprehension. Scala scores points on both counts. It also has a sweet spot that I haven’t encountered elsewhere, that is that it lends itself to concurrent programming in a way which is easy to reason about, and therefore get right. It does this through a set of supporting language features that combined allows us to code at a higher level of abstraction: a leaning towards immutability, functional constructs (such as closures), as well as a familiar way to model a domain via object orientation.

Put together, it’s massively powerful. As it runs on the JVM you could code Scala concurrency the same way as Java, with the bog-standard tools such as wait/notify/notifyAll or the JDK 5+ concurrency libraries. You might get some shorter code, but it misses the point. Scala comes with an implementation of the Erlang-inspired actors model out of the box, which lets you deal with the problems of concurrency in a manner that is much easier to reason about. Actors aren’t a language construct, but a library that makes use the underlying platform (the JVM) and Scala’s language features to provide a much simpler mental model for us to deal with. Actors are “like” threads (not really, but close enough for a starting point) that send and receive messages to and from other actors. How does this aid comprehension? Synchronous and asynchronous messages are very simple to reason about, and IMHO much more straightforward than the Java concurrency libraries (compare the actors documentation to the Trains Book).

Consider the classic computer science concurrency problem, the Dining Philosophers. A number of philosophers sit down at a round table to do some eating and thinking. Each philosopher brings with him a single chopstick that he places on his right hand side. So you have X philosophers and X chopsticks. To eat, a philosopher must pick up the chopsticks on his left and right sides. Leaving aside hygiene issues, it’s a cool toy problem around resource contention. So, how would you do this in Scala? The mental leap to be made is that “everything is an actor”. Given a number of philosophers dining at a table, it’s quite nicely modelled if you think about both the philosophers and the table as actors that pass messages beteen each other. If you want to see the whole file (~150 lines), it’s available here.

Firstly the messages that we’re going to be passing around:

package net.jakubkorab.philosophers

import messages._
import scala.actors._
import scala.actors.Actor._
import scala.math._

package messages {
	class Chopstick(val position : Int)

	object Side extends Enumeration {
		type Side = Value
		val Left, Right = Value

		def randomSide() = { Side(floor(Side.values.size * random).intValue) }
		def otherSide(side : Side.Value) = { Side.values.find{_ != side}.get }
	}

	sealed abstract class Message

	abstract class TableMessage() extends Message
	case class AllFinished() extends TableMessage

	abstract class ChopstickResponse() extends TableMessage
	case class ChopstickAvailable(val chopstick : Chopstick) extends ChopstickResponse
	case class ChopstickUnavailable() extends ChopstickResponse 

	abstract class DinerMessage() extends Message
	case class RequestChopstick(val philosopher : Philosopher, val side : Side.Value) extends DinerMessage
	case class ReplaceChopstick(val chopstick : Chopstick) extends DinerMessage
	case class CouldNotEatAnotherBite(val guest : String) extends DinerMessage
}

Messages don’t actually need to be of any particular type, I just like thinking of that sort of thing in a hierarchy. All of the messages that I’ll pass around are subclasses of Message.

Now for our philosophers:

class Philosopher(val name : String, val wordsOfWisdom : String) extends Actor {
	var table : Actor = null
	var seatedAt : Int = -1

	private var timesLeftToEat = 3
	override def act() = {
		while (timesLeftToEat > 0) {
			think()
			val side = Side.randomSide // pick a chopstick to use first
			say("Requesting chopstick 1 on " + side)
			table !? (1000, RequestChopstick(this, side)) match {
				case Some(ChopstickAvailable(chopstick1 : Chopstick)) => {
					pause() // put in a delay so we can see actors switching
					val otherSide = Side.otherSide(side) // request the other
					say("Requesting chopstick 2 on " + otherSide)
					table !? (100, RequestChopstick(this, otherSide)) match {
						case Some(ChopstickAvailable(chopstick2 : Chopstick)) => {
								eat()
								pause()
								table ! ReplaceChopstick(chopstick1) // return chopsticks
								pause()
								table ! ReplaceChopstick(chopstick2)
							}
						case Some(ChopstickUnavailable()) => {
							say("No " + otherSide + " chopstick");
							table ! ReplaceChopstick(chopstick1);
						}
						case None => { say("None"); table ! ReplaceChopstick(chopstick1) }
					}
				}
				case Some(ChopstickUnavailable()) => { say("No " + side + " chopstick") } // no luck getting a chopstick
				case None => say("None")
			}
		}
		react {
			case AllFinished => { say(wordsOfWisdom); exit }
		}
	}

	private def think() = { say("Hmm"); pause() }
	private def eat() = {
		say("Nom nom");
		timesLeftToEat -= 1
		if (timesLeftToEat == 0) {
			table ! CouldNotEatAnotherBite(name)
		}
	}
	private def say(s : String) = { println(name + ": " + s) }
	private def pause() = { Thread.sleep(ceil(random * 1000).intValue) }
}
object Philosopher {
	def apply(name : String, wordsOfWisdom : String) = new Philosopher(name, wordsOfWisdom)
}

As I said, it’s pretty straightforward if you think of an actor as a Thread. Think of act() as the equivalent of Runnable#run(). Philosophers will be instantiated with a name and some words of wisdom they’ll come up with. Once they’re sat at a table, they’ll receive an instance of table for them to communicate with and a place where they’re sitting. Messages are sent either asynchronously to the table using the ! method, or synchronously using !? (in which case the number that follows is a timeout). The syntax may be unfamiliar, but I think it reads pretty easily even to those unfamiliar with Scala. I won’t go through it in detail. A philosopher sends a chopstick request to the table and gets a response, either that a chopstick is available, or that it’s unavailable. Pretty straightforward.

So, now the table.

class Table(val philosophers : Set[Philosopher]) extends Actor {
	if (philosophers.size < 2) throw new IllegalArgumentException("At least 2 philosophers must dine together")
	var chopsticks = new Array[Chopstick](philosophers.size)
	var location = 0
	philosophers.foreach { philosopher =>
		chopsticks(location) = new Chopstick(location)  // lay the cutlery
		philosopher.seatedAt = location
		location += 1
	}
	var guestsEating = philosophers.size

	override def act() = {
		println("Starting the meal")
		philosophers.foreach { philosopher => philosopher.table = self; philosopher.start  } // let's go
		while (true) {
			receive {
				case RequestChopstick(philosopher : Philosopher, side : Side.Value) => giveChopstickIfAvailable(philosopher, side)
				case ReplaceChopstick(chopstick : Chopstick) => replaceChopstick(chopstick)
				case CouldNotEatAnotherBite(guest : String) => guestFinished(guest)
			}
		}
	}

	private def giveChopstickIfAvailable(philosopher : Philosopher, side : Side.Value) = {
		var index = if (side == Side.Right) philosopher.seatedAt else philosopher.seatedAt - 1
		if (index < 0) { index = philosophers.size - 1 } // get the one on the end of the array

		val chopstick = chopsticks(index)
		if (chopstick == null) {
			println("No chopstick available at " + index)
			sender ! ChopstickUnavailable() // sender, not philosopher!
		} else {
			chopsticks(index) = null
			sender ! ChopstickAvailable(chopstick)
		}
	}

	private def replaceChopstick(chopstick : Chopstick) = {
		chopsticks(chopstick.position) = chopstick
	}

	private def guestFinished(guest : String ) = {
		println(guest + " is done")
		guestsEating -= 1
		if (guestsEating == 0) {
			philosophers.foreach {_ ! AllFinished}
			println("All done")
			exit
		}
	}
}

The role of the table is to manage the resources, in this case the chopsticks. Calling start() on an actor is analogous to Thread#start().

And now, to kick it all off, let’s stick some philosophers on a table. I have chosen the Greco-Roman Stoics for their easy going approach to life, but any school of thought will do. Chinese philosophers may have been more appropriate to the cutlery. My example, my choice.

object PhilosophersLauncher {
	def main(args : Array[String]) = {
		val table = new Table(
			Set(Philosopher("Seneca the Younger", "The point is, not how long you live, but how nobly you live."),
				Philosopher("Epictetus", "Freedom is secured not by the fulfilling of men's desires, but by the removal of desire." ),
				Philosopher("Marcus Aurelius", "Everything is right for me, which is right for you, O Universe."),
				Philosopher("Zeno of Citium", "Shit happens.")) // one of his lesser known ones
			).start
	}
}

So, does it work?

Starting the meal
Seneca the Younger: Hmm
Epictetus: Hmm
Marcus Aurelius: Hmm
Seneca the Younger: Requesting chopstick 1 on Right
Marcus Aurelius: Requesting chopstick 1 on Right
Epictetus: Requesting chopstick 1 on Right
Marcus Aurelius: Requesting chopstick 2 on Left
No chopstick available at 1
Marcus Aurelius: No Left chopstick
Marcus Aurelius: Hmm
Epictetus: Requesting chopstick 2 on Left
No chopstick available at 0
Epictetus: No Left chopstick
Epictetus: Hmm
Marcus Aurelius: Requesting chopstick 1 on Left
Seneca the Younger: Requesting chopstick 2 on Left
Seneca the Younger: Nom nom
Marcus Aurelius: Requesting chopstick 2 on Right
Marcus Aurelius: Nom nom
Epictetus: Requesting chopstick 1 on Left
No chopstick available at 0
Epictetus: No Left chopstick
Epictetus: Hmm
...
Epictetus: Requesting chopstick 1 on Left
Zeno of Citium: Requesting chopstick 1 on Left
Epictetus: Requesting chopstick 2 on Right
Epictetus: Nom nom
Epictetus is done
Zeno of Citium: Requesting chopstick 2 on Right
Zeno of Citium: Nom nom
Zeno of Citium: Hmm
Zeno of Citium: Requesting chopstick 1 on Right
Zeno of Citium: Requesting chopstick 2 on Left
Zeno of Citium: Nom nom
Zeno of Citium is done
All done
Marcus Aurelius: Everything is right for me, which is right for you, O Universe.
Epictetus: Freedom is secured not by the fulfilling of men's desires, but by the removal of desire.
Seneca the Younger: The point is, not how long you live, but how nobly you live.
Zeno of Citium: Shit happens.


Yup. I think it’s pretty easy to make sense of all this. You can easily reason about what happens when, just by drawing a sequence diagram. Consider the backdown strategy when a philosopher can’t get hold of a chopstick:

Zeto->Table: RequestChopstick(Left)
activate Table
Table-->Zeto: ChopstickAvailable(C0)
deactivate Table

Epictetus->Table: RequestChopstick(Left)
activate Table
Table-->Epictetus: ChopstickAvailable(C1)
deactivate Table

Zeto->Table: RequestChopstick(Right)
activate Table
Table-->Zeto: ChopstickUnavailable()
deactivate Table
Zeto->Table: ReplaceChopstick(C0)

note over Zeto: Sleeps for a bit before trying again

Epictetus->Table: RequestChopstick(Right)
activate Table
Table-->Epictetus: ChopstickAvailable(C0)
deactivate Table
Epictetus->Epictetus: Eat
Epictetus->Table: ReplaceChopstick(C1)
Epictetus->Table: ReplaceChopstick(C0)


So, actors are cool, and in Scala they are easy to reason about due to the syntax and ability to mix in OO concepts. For a comparison, check out the equivalent in Java using semaphores. The Scala version is far less code and much easier to comprehend. And if you think that’s cool, check out Akka.

Bored with software?

What’s interesting right now in software isn’t the new shiny thing. We already have the tools to do most of what we want. What’s interesting is scale and change.

You build a system. Then you realize you need to break out and share functionality via modules. Then you want to manage them independently in live environments. And not take the system down. And have the old transactions finish on the old code while the new work hits the new code.

You build logic. It grows to the point where your original hand crafted solution is too unweildy. You need a rules engine, or workflow. Your code needs to keep running. A rewrite is not an option. Rework, refactor, augment, migrate. But don’t break what’s there.

You just wanted to integrate to that one external system. Web services behind a facade. Now another, this time via messaging. All of a sudden it’s 12. Integration framework? ESB? You’re in a cluster, shared network memory, processes that can only run in one place at a time. What’s the last straw, the tipping point to your next upgrade? Where to from here?

That’s what’s interesting.

Expressive or just terse?

This Scala code populates a list with objects:

// fairly standard
var people = List[Person]()
for (i <- 1 to 10) {
	people = new Person(i) :: people
}

So does this.

// slightly more functional
val people = (List[Person]() /: (1 to 10)) {(people, i) =>
	new Person(i) :: people
}

If you’re a Java programmer and just balked, you’re probably not alone. I know what it does and I have to read it from left to right to left to right again. In this case the shortcut’s a Scala mental snag equivalent to Java code like:

boolean empty = (!list.empty()) ? false : true;

There’s always going to be a pause when you read stuff like this. Remove the operator weirdness and you get:

val people = (1 to 10).foldLeft(List[Person]()) {(people, i) =>
	new Person(i) :: people
}

At least here, the reader doesn’t have to catch on to the flipping of the object and the method argument done by the /: method. In time, you probably get used to reading shortcut operators, but more than likely you’re going to be snagging for a while, and so will the next guy reading your code.

Just because you can do something, doesn’t mean you should :)

Update:
Thanks to @jstrachan for this:

val people = (1 to 10).map(new Person(_))

I have much to learn :)

NoSuchMethodException launching Scala App

This is a pretty standard thing to do in Java:

public class App {
	public static void main(String[] args) {
		App app = new App();
		app.start();
	}

	public void start() {
		// do something
	}
}

So you reckon you’d be able to do this in Scala:

object App {
	def main(args: Array[String]) = {
		val app = new App()
		app.start()
	}
}

class App() {
	def start() = {
		// do something
	}
}

You go to run it and:
Exception in thread "main" java.lang.NoSuchMethodException: id.jakubkorab.App.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1605)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:107)

Turns out the Scala compiler doesn't treat application launching in the same way that it treats other classes with companion objects. If you rename the object App to another name, it works.

object AppLauncher {
	def main(args: Array[String]) = {
		val app = new App()
		app.start()
	}
}

Go figure.

Running a Scala App in IDEA 9

My first experience with IDEA hasn’t been a good one. In all fairness, not IDEA itself, but with a plugin (isn’t that always the way). I use Eclipse on a day to day basis, but since I heard that the IDEA Scala support is much better, I decided to download the Community Edition and give it a bash.

After following these instructions to hook up a Maven Scala project in IDEA (why not just go all in) I went to run the auto generated app:

package net.jakubkorab

/**
 * Hello world!
 */
object App extends Application {
	println("Hello World!")
}

The instructions were straightforward “just do CTRL SHIFT F10 and it’ll run”. Tried that. Nothing. Huh? To the net…

“Add a new Run configuration – Scala Application”. Nothing.
Maybe it’s a plugin version thing? Google… *muttering*

“you’ll need a different plugin called Scala Application”. Search for the plugin… Doesn’t exist. *more muttering*

Finally I found this post from 2008 – seems the plugin won’t run classes that extend scala.Application – you have to give it a main() method!

NP. Given that not many people have actually met sailors, I’d like to propose “swears like a programmer”.

package net.jakubkorab

/**
 * Hello world!
 */
object App {
	def main(args:Array[String]) {
		println("Hello World!")
	}
}

CTRL SHIFT F10 and it works. If you look in Run configurations, it’s been set up as an Application, not a Scala Application like some of the other sites suggest.

Not a good start with IDEA, but like most systems with plugins, don’t throw the baby out with the bath water. I look forward to happier times from here on.

Get Functional

That was the message that was coming through the Devoxx conference presentations this year. The idea that it will help your code run in the brave new world of multi everything (multi-core, multi-thread etc.) is one that’s widely touted, but rarely the primary driver for its use. Instead, it’s about less code, that’s more easily understood. When you do get to scaling it, it won’t do any harm either.

As Guillaume Laforge tweeted, from 800 Java developers in his session, only 10 knew/used Scala, 3 Clojure, 20 Ruby, and 50 were on Groovy – which gives a nice gentle introduction to some of the constructs for those looking to wade in. Good stats to cut through they hype. So what of the roughly 90% slogging on without closures, does this mean that they have to miss out on this fun?

Quite simply, no. There’s heap of drop in libraries that you can add into a Java project for all manner of functional goodness, and which don’t change the syntax of the language. LambdaJ for example gives a nice functional way of dealing with collections. To steal an example directly from the website, the following typical Java code:

List<Person> sortedByAgePersons = new ArrayList<Person>(persons);
Collections.sort(sortedByAgePersons, new Comparator<Person>() {
        public int compare(Person p1, Person p2) {
           return Integer.valueOf(p1.getAge()).compareTo(p2.getAge());
        }
});

is replaced with:

List<Person> sortedByAgePersons = sort(persons, on(Person.class).getAge());

Fancy a bit of map-reduce without a grid? Well, it comes stock-standard with the Fork Join (JSR166y) framework that will be added to the concurrency utilities in JDK 7. If you don’t fancy waiting until September 2010 (the latest expected date for the GA release), it’s downloadable here. As an aside, Doug Lea has written a really good paper on the FJ framework.

Don’t fancy loops in loops in loops to filter, aggregate, do set operations with all the null checking that Java programming typically entails? Well, the Google Collections library (soon to be integrated into Guava, a set of Google’s core libs), contains predicates and transform functions that make all of this a lot easier to write and reason about. Dick Wall had a great presentation about this showing just how much code can be reduced (heaps).

A thing I heard a number of times outside the sessions was, “I don’t know about all this stuff, surely as we get further from the metal, performance suffers”. Sure, it gets harder to reason about timings as the abstractions get weirder, but the environment gets better all the time, and the productivity gains more than outweigh performance in all but the most perf-intensive environments. Brian Goetz spoke about how the JVM supports this new multi-language world. Not something that I had ever really given much thought to, but the primary optimizations aren’t at the language compiler level (javac, scalac, groovyc etc.)- they’re are all done at runtime, when the JVM compiles the bytecode. The number of optimizations in HotSpot are massive (there was a striking slide showing 4 columns of individual techniques in a tiny font). Multiple man-centuries of effort have gone into it, and each new release tightens it up. If you’re not sure, then profile it and make up your own mind. JDK 7 will also see the VM with some goodness that will make dynamic languages really fly.

One thing that still sticks out like a sore thumb is Closures support in Java. It’s not a candidate for inclusion in JDK 7, and the proposed syntax shown at the conf by Mark Reinhold looks pretty ugly when compared to other langs (see the proposal by Neal Garter). Either way, not a sniff of actual implementation. I understand there’s some serious work on the VM to make any of this possible regardless of the syntax. Not holding my breath. [Closures will actually be in JDK7 - thanks Neal.]

All up, I’m pretty excited by all this, and can’t wait to get my hot little hands on some of these tools. The functional style yields code that’s much easier to read and reason about, and the fact that it’s essentially all Java syntax, means that there’s no reason not to apply it. If you’re already comfortable with using EasyMock on your team, you won’t find it a huge mind shift.

Deep Diff Pizza

There is nothing I love more than a proprietary, undocumented API. Call it an unfortunate fact of life, but weird object models that hang together by the skin of their teeth are out there. Most of the time there’s no validation logic to check that they’re semantically or syntatically correct until you send this tangle of objects to a system you’re integrating with. Having been burned badly in the past on this sort of work, I’ve been looking for a way to work effectively in these types of scenarios.

In my most recent case, I had some example code that built up these object graphs for various use cases. The code wasn’t transportable, but it did generate correct inputs. After reverse engineering the graph construction logic into a builder by working out of XStream dumps to console, I was then able to build up exploratory integration tests that triggered the various use case scenarios. So far, so good. Now to just remove the need to have the back-end system up.

What I needed was a way to compare a test object graph with a constructed one. I already had an XML representation thanks to XStream of the expected output, so I could reload it into memory as needed – so there’s the test data. The equals() and hashCode() methods on the model were unreliable, so that’s no good. I toyed with the idea of writing a general purpose deep diff mechanism for object graphs, but came to the conclusion that it wasn’t a good idea. Aside from not wanting to get into writing gnarly reflection code I realised that the problem was more difficult than it seemed. You get into questions like “What is equality?”, “How much change is acceptable?”, “Where do I stop? Primitives, java.util.*?”, and “How can I mark things as being acceptably different?”. It’s probably a good indication that noone had written this sort of thing already.

Then I realised that the answer was staring me in the face. Just diff the XML representation programatically! By loading up an expected input from a file next to the unit test, I could then dump the model under test out to another String and… Rock and Roll.

XMLUnit has a quite good diffing utility, which was about 80% of the way there. Some of the data in my model changed over time, so I needed a way to say “ignore these bits of the tree”. There’s a neat little interface in XMLUnit called DifferenceListener, that gets notified whenever the mechanism finds a diff. You can implement a method that decides whether to report the difference as different, the same, or similar (why you’d want to do that is beyond me – “this is different-ish”). I hacked up my own implementation that took some XPath expressions, mapping the nodes in the control graph that ought to be ignored and voila.

The more I code the more uses I find for XStream. In combination with XMLUnit, it’s like hazelnuts to chocolate. It’s an excellent option next time you need to diff object graphs.

A Better Builder

The builder pattern should be familiar to anyone who has needed to change data from one format to another. Often called assemblers, translators or similar, they’re found peppered throughout almost every project. The idea is pretty simple:

public class HouseBuilder {
    public House buildHouse(OtherHouseType otherHouse) {
        House house = new House();
        // copy a whole bunch of properties from the otherHouse....
        for (OtherWallType otherWall : otherHouse.getWalls()) {
            house.addWall(buildWall(otherWall));
        }
        return house;
    }

    private Wall buildWall(OtherWallType otherWall) {
        Wall wall = new Wall();
        // copy another whole bunch of other properties....
        // do something fancy every third Tuesday of the month...
        for (OtherWindowType otherWindow : otherWall.getWindows()) {
            wall.addWindow(buildWindow(otherWindow));
        }
        return wall;
    }

    private Window buildWindow(OtherWindowType otherWindow) {
        Window window = new Window();
        // you get the idea....
        return window;
    }
}

It’s OK for translating small object graphs, but for non-trivial work, these things can turn into huge heaving masses of code. Because they’re pretty quick to knock out, they’re generally untested as the above code doesn’t really lend itself to it. Take the buildWall() method above – you can’t really exercise the “every third Tuesday” case easily without running through buildHouse(), and then calling buildWindow(). It becomes a pain to construct the data model to pump into test cases, and as we know, anything that’s not easy or highly visible doesn’t get tested. Did I just say that? Oops. Sorry, industry secret.

One strategy that’s applied is to write builders for each object type, i.e. HouseBuilder, WallBuilder, WindowBuilder. These are then either wired together via dependency injection (ugly as it exposes internal mechanics), or hard-coded in with a setter to substitute a dependency at test time. There’s also the drawback of a morass of tiny classes proliferating in a package somewhere. Not necessarily a bad thing, but when doing this sort of work there’s typically a good amount of refactoring, as well as use of utility classes to enrich data when moving between formats. Personally, not a big fan of this approach.

So, here’s a cool little trick:

public class MuchImprovedHouseBuilder {
    private MuchImprovedHouseBuilder delegate;

    public MuchImprovedHouseBuilder() {
        delegate = this;
    }

    void substituteInternalBuilder(MuchImprovedHouseBuilder otherBuilder) {
        delegate = otherBuilder;
    }

    public House buildHouse(OtherHouseType otherHouse) {
        House house = new House();
        // copy a whole bunch of properties from the otherHouse....
        for (OtherWallType otherWall : otherHouse.getWalls()) {
            house.addWall(delegate.buildWall(otherWall));
        }
        return house;
    }

    Wall buildWall(OtherWallType otherWall) {
        Wall wall = new Wall();
        // copy another whole bunch of other properties....
        // do something fancy every third Tuesday of the month...
        for (OtherWindowType otherWindow : otherWall.getWindows()) {
            wall.addWindow(delegate.buildWindow(otherWindow));
        }
        return wall;
    }

    Window buildWindow(OtherWindowType otherWindow) {
        Window window = new Window();
        // you get the idea....
        return window;
    }
}

All of the translation is done in one builder class. On construction, a private field defines a “delegate”. Think of it as a placeholder that will do the “other bits” of the translation and set it to “this”. A package scoped setter allows a test to overwrite the builder with a mock. So with a bit of reference fiddling, we can then test one method at a time, without having to worry about the implementation details of the other methods in the class. All of the non-public methods are given package scope, so they can be tested straight out of your favourite test framework. So, you can now do this in live code:

HouseBuilder builder = new HouseBuilder();
House house = builder.build(otherHouseType);

And at the same time write tests like this:

HouseBuilder builder = new HouseBuilder();
HouseBuilder mockBuilder = EasyMock.createMock(HouseBuilder.class); // the classextension version of EasyMock
EasyMock.expect(mockBuilder.buildWindow(isA(OtherWindowType.class)).andReturn(new Window());
EasyMock.replay(mockBuilder);
builder.substituteInternalBuilder(mockBuilder);
Wall wall = builder.buildWall(otherWallType);

Simple, and you don’t have to stick in any new tools to achieve it.

Bob's girlfriend misunderstood when he said his diet needed more iron.

Testing the Untestable

Once upon a time you got sick of writing code and having it fail when you ran, or someone else did, and you learned to unit test your code.

You read the JUnit docs, and wrote tests for those classes which contained discrete, self-contained code. And quickly realised that the world doesn’t work like that.

Code relies on other code. Often very complex code. And you discovered dependency injection. You abstracted out the collaborating class, and injected it in to the code. Great!

Now you could use stubs – code which always behaves in a certain way – as part of your test, and discretely use them to test the top level code. Awesome! For about two weeks.

The stubs became too inflexible, and every time you changed the interface of the class being stubbed, you had to change the stub too. Not only that, having the stubbed code behave the same way every time was too inflexible. What if you need to test how the top-level code reacts when its collaborators throw errors?

So you looked on the net and you learned about mock objects. It was like getting into a hot bath. Now you could dynamically define the behaviour of collaborating classes in every test. You could test exception handling! Excellent! Until once again, it wasn’t.

Because the world doesn’t work like that. Your code style develops. You want to use other patterns, and behaviours, that don’t fit well with the idea of injection, and which are a code-level implementation detail. Sometimes you just want to instantiate a class and use it. And not have to test it along with the code that uses it. Maybe it’s builders. Maybe it’s a stream that you want to write to. Conversely, you don’t want to litter your Spring config with objects of a tiny ganularity just for the purposes of testing. And you find that your testing tools aren’t enough. Not only that, they are now driving the design of your code.

What if you could write code at the level of abstraction that makes sense to you? Static helper methods? Builders? Package private classes that get instantiated from inside your code? What if you want to wire the objects in your application at a granularity that makes sense?

“Surely, you must be smoking!” I hear you say. “That is a pipe dream reserved for dynamic language programmers, not us grunting around at the coalface in Java.” Well, actually, no. The latest generation of unit testing tools allow you to do just that! With Java!

JMockit is one of the latest toolkits in the evolutionary chain of mocking tools. It depends on the instrumentation APIs in Java 6 to enable behaviour that was previously unattainable. One of the prerequisites is that you are using the latest 1.6 JDK for your code, but if you are – rock and roll. Let me demonstrate:

Let’s say that you are writing an application that blatantly violates the trademarks of MTV’s favourite show and tricks out rubbish cars. Here’s your main code:

package demo;

public class PimpingService {

    private CarDao carDao;

    public void setCarDao(CarDao tradeDao) {
        this.carDao = tradeDao;
    }

    /**
     * Loads a car and pimps it, saving it and returning.
     * representative of a lot of code out in the wild.
     *
     * @param carId The primary key of the car to load.
     * @return The pimped ride.
     */
    public Car pimpRide(Long carId) {
        Car unpimpedCar = carDao.findCar(carId);
        Car pimpedCar = PaintShop.resprayWithFlames(unpimpedCar);

        if (pimpedCar.getValue() > 100000) {
            ElectronicsWizard wizard = new ElectronicsWizard(24);
            pimpedCar = wizard.makeFullySick(pimpedCar);
        }

        try {
            carDao.save(pimpedCar);
        } catch (IllegalStateException isx) {
            // roll back our changes
            pimpedCar = unpimpedCar;
        }
        return pimpedCar;
    }
}

You’ve got some pretty interesting usage patterns here:

  • a dependency injected DAO
  • use of static methods
  • object instantiation

The first is pretty straightforward to mock up, but the others, not so much.

Let’s take a look at the other classes:

package demo;

public final class Car {
    private final Double value;

    public Car(Double value) {
        this.value = value;
    }

    public Double getValue() {
        return value;
    }
}
package demo;

public class CarDao {

    /**
     * Finds a car by its primary key.
     * @param carId The primary key.
     * @return A car if found, or null.
     */
    public Car findCar(Long carId) {
        // normally there'd  be some kind of jdbc thing here
        return new Car(100000d);
    }

    /**
     * Saves a car.
     * @param car The car.
     */
    public void save(Car car) {
        // ...
    }
}
package demo;

public class PaintShop {

    /**
     * Resprays a car, increasing its value.
     * @param car The car.
     * @return A resprayed version of the car.
     */
    public final static Car resprayWithFlames(Car car) {
        Car resprayedCar = new Car(car.getValue() * 2);
        return resprayedCar;
    }

}
package demo;

public class ElectronicsWizard {

    private int multiplier;

    public ElectronicsWizard(int multiplier) {
        this.multiplier = multiplier;
    }

    /**
     * Makes a car fully sick.
     * @param car The car to make fully sick.
     * @return A fully sick car.
     */
    public Car makeFullySick(Car car) {
        return new Car(car.getValue() * multiplier);
    }

}

So, as we run the code, things happen. We first load the car, and then we keep changing its value depending on what operations we run on it.

Let’s exercise it using JUnit and JMockit. This code uses JUnit 4 and JMockit 0.98.

package demo;

import mockit.Expectations;
import mockit.Mocked;
import mockit.integration.junit4.JMockit;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import static org.junit.Assert.assertSame;

@RunWith(JMockit.class)
public class TradeServiceTest {

    @Mocked
    private CarDao mockCarDao;
    @Mocked
    private ElectronicsWizard mockWizard;
    @Mocked
    private final PaintShop unusedMockPaintShop = null;
    private PimpingService pimpingService;

    @Before
    public void setUp() {
        pimpingService = new PimpingService();
        pimpingService.setCarDao(mockCarDao);
    }

    /**
     * Checks that happy path to pimping cars.
     */
    @Test
    public void testPimpRide() {
        final Long carId = 1L; // this is the id that we'll look up
        // the ride we'll expect to get back from the process
        final Car fullySickCar = new Car(300000.0);

        new Expectations() {
            { // static block

                // set an expectation on this mock that it will be invoked with
                // a car id
                mockCarDao.findCar(withEqual(carId));
                Car foundCar = new Car(100000.0);
                returns(foundCar); // return value from last mock call

                // this static method will be called next
                PaintShop.resprayWithFlames(withSameInstance(foundCar));
                Car resprayedCar = new Car(200000.0);
                returns(resprayedCar); // set its return value

                new ElectronicsWizard(24);
                returns(mockWizard);

                mockWizard.makeFullySick(withSameInstance(resprayedCar));
                returns(fullySickCar);

                // we're done
                mockCarDao.save(withSameInstance(fullySickCar));
            }
        };

        // all that's left is to invoke our service and see what happens
        Car resprayedCar = pimpingService.pimpRide(carId);
        assertSame(fullySickCar, resprayedCar);
    }

}

So what’s going on here?

  • We import a few JMockit classes (line 11) to help us with the testing. Because of the way that JMockit operates, there are bits and pieces that are required to integrate the toolkit with the testing framework being used. Here, we import the JUnit version.
  • You need to create placeholders (lines 14-19) for each of the classes that you will be mocking, regardless of whether you want to instantiate them, such as in the case of our unusedMockPaintShop, on which we will just be calling static methods.
  • You define mock objects behaviour in an Expectations block. This makes use of a syntax similar to JMock, where all your code is defined in a static block (line 38). If you are used to EasyMock, you’ll immediately notice the change in style. You use static methods of the Expectations class immediately after the expected method calls to define how the mock object will behave. Line 44′s returns(foundCar) is an instruction for the mockCarDao used on line 42.

The rest pretty much looks like a standard unit test. What about some different behaviour from the instantiated class?

    /**
     * Checks that the electronics wizard won't get called in if the paintshop gives us a
     * car with a small value amount.
     */
    @Test
    public void testPimpRideElectronicsWizardNotCalled() {
        final Long carId = 1L;
        // the car we'll expect to get back - look at the new value
        final Car resprayedCar = new Car(50000.0);

        new Expectations() {
            {
                mockCarDao.findCar(withEqual(carId));
                Car foundCar = new Car(100000.0);
                returns(foundCar);

                PaintShop.resprayWithFlames(withSameInstance(foundCar));
                returns(resprayedCar);

                // its return value is < 100,000 so no wizard here
                // if the wizard was invoked, you'd get an error:
                // Unexpected invocation of: demo.ElectronicsWizard#ElectronicsWizard(int)
                // it's @Mocked, remember?

                mockCarDao.save(withSameInstance(resprayedCar));
            }
        };

        Car tradeReturnedFromService = pimpingService.pimpRide(carId);
        assertSame(resprayedCar, tradeReturnedFromService);
    }

That’s all well and good. But what about testing how our code will handle exceptions?

    /**
     * Checks that no wizard work will be done if respraying gives us a
     * car with a small value amount.
     */
    @Test
    public void testPimpRideElectronicsWizardNotCalledSavingBreaks() {
        final Long carId = 1L;
        final Car foundCar = new Car(100000.0);

        new Expectations() {
            {
                mockCarDao.findCar(withEqual(carId));
                returns(foundCar);

                PaintShop.resprayWithFlames(withSameInstance(foundCar));
                Car resprayedCar = new Car(50000.0);
                returns(resprayedCar);

                mockCarDao.save(withSameInstance(resprayedCar));
                // dao throws a humorous antipodean complaint here
                IllegalStateException isx = new IllegalStateException("Ooh bugger");
                throwsException(isx);
            }
        };

        Car tradeReturnedFromService = pimpingService.pimpRide(carId);
        // the service should have just returned the car it found
        assertSame(foundCar, tradeReturnedFromService);
    }

That’s nowhere near what you can do with JMockit, but it should give you a good taster. Code that was previously untestable, can now be dealt with as easily as a straightforward class with injected dependencies. Go forth and code in whatever way makes sense to you, DI or not, content in the knowledge that it’s all good and testable.

Man, am I happy I can test my code.