Archive for the ‘scala’ Category

Goldilocks actors: not too many, not too few

After my last post on parallelism with Scala actors, I had a thought: when doing a calculation like this, am I actually making the most of my resources? If as in the Pi example, more cycles will generally lead to a better result, surely if I have a limited amount of time to get the best value I want to squeeze every last bit of juice from my hardware. It may seem a trivial question for a small problem, but this has real-world implications in domains like banking where a few more cycles in a pricing or risk calculation means a difference in income.

There’s a bit of lore that says that the optimal number of threads to perform an opertation is equal to the number of processors. Likewise, there’s a generally accepted idea that you will get a progressive deterioration in performance as more threads are run concurrently due to context-switching.

In other words, you can have too many actors. Gratuitous use of images to make tech blog seem more humorous.

I wasn’t convinced, and decided to try it out for myself. My thinking was that the one-thread per processor idea was a bit too clean. With processor hyperthreading, I/O overhead, JVM magic and other factors, it was pretty likely that at least (number of processors + 1) * actors would be able to run before performance degraded.

Stand back! I’m about to try Science!

So, I rewote the example to test throughput (check it out if you’re interested, but it’s more or less the same as the first, but with more jiggery-pokery), and ran it on my dual-core laptop:

1 actor (output cleaned up):

PiEvaluator:Shutting down generators
PointGenerator[1]:exiting having generated 47040000 points
PiEvaluator:Shutting down aggregator
Aggregator:Considered 47040000 points
Aggregator:Pi is approx: 3.1415372448979593

2 actors:

PiEvaluator:Shutting down generators
Aggregator:Pi is approx: 3.141657176679491
PointGenerator[2]:exiting having generated 34000000 points
PointGenerator[1]:exiting having generated 33690000 points
PiEvaluator:Shutting down aggregator
Aggregator:Considered 67690000 points
Aggregator:Pi is approx: 3.1416520608657112

3 actors:

PiEvaluator:Shutting down generators
PointGenerator[1]:exiting having generated 20470000 points
PointGenerator[2]:exiting having generated 27680000 points
PointGenerator[3]:exiting having generated 20410000 points
PiEvaluator:Shutting down aggregator
Aggregator:Considered 68560000 points
Aggregator:Pi is approx: 3.141365577596266

4 actors:

PiEvaluator:Shutting down generators
Aggregator:Pi is approx: 3.141612733060482
PointGenerator[2]:exiting having generated 27660000 points
PointGenerator[1]:exiting having generated 19670000 points
PointGenerator[3]:exiting having generated 18790000 points
PointGenerator[4]:exiting having generated 10000 points
PiEvaluator:Shutting down aggregator
Aggregator:Considered 66130000 points
Aggregator:Pi is approx: 3.1416107061847875

5 actors:

PiEvaluator:Shutting down generators
Aggregator:Pi is approx: 3.141523427529626
PointGenerator[2]:exiting having generated 28220000 points
PointGenerator[4]:exiting having generated 10000 points
PointGenerator[5]:exiting having generated 10000 points
PointGenerator[1]:exiting having generated 18810000 points
PointGenerator[3]:exiting having generated 18850000 points
PiEvaluator:Shutting down aggregator
Aggregator:Considered 65900000 points
Aggregator:Pi is approx: 3.141531047040971

I ran this quite a few times to verify that I wasn’t getting inconsistent results due to GC or other one-off factors. The results were pretty consistent:

  • The jump from 1 to 2 actors gave a huge boost in performance as the second processor joined the fray. Interesting to know that there wasn’t exactly a 100% performance boost, more like ~50%. There were programs, notably Spotify, running on my PC at the same time as the test, so that may account for it.
  • When a third actor was introduced, there was a proportionately small, but repeatable increase in the number of calculations that could be run at the same time, equivalent to 2-3%
  • beyond 3 actors, the Scala actors library itself looks to have kicked in (I may be wrong here, so please correct me if you know otherwise). The 4th and 5th actors were starved of work as opposed to being allowed to degrade the performance of the first three as was expected. It wasn’t until the first three actors started shutting down that the ones created later got to finish their first batch of work.

So, don’t take any known lore for granted. Much like anywhere else on the JVM, tuning performance is down to experimentation.

As an aside, it was interesting to note that beyond a point, my evaluation of Pi wasn’t getting any better. I put it down to the imprecision of floating-point arithmetic. Key take-away point: when programming a moon launch, don’t use doubles.

Better living through parallelism

Judging by the interest to my last actors post, I thought I’d throw up a piece of code that uses actors anonymously to parallelise a long running operation. Not every operation can be parallelised, most things we work on tend to be fairly sequential. However, sometimes if you can split up the work to perform an operation, you can get some serious bang for your buck.

Let’s take a pretty interesting little problem as an example, working out pi. I first encountered this problem in a job interview a while back (yeah, I thought it was out there too). It’s an example of what’s known as a Monte Carlo Simulation. The general way to work this out is described in detail in this Google Code University tutorial on parallel programming (worth taking a look).

The genaral approach works on the basis of this diagram.

The steps are as follows:

  1. generate points randomly in the square
  2. work out whether each point falls into the circle (distance from centre is less than the radius, use Pythagoras here)
  3. the proportion of points in the circle to those in the square is approximately the same as that of the areas of the two shapes; transpose the area formulas (I’ll show it in the code) to work out an approximate value of pi

It’s a nice little problem, because the more points you generate, the better the estimate gets. It’s also easily parallelisable as the first two steps can be performed repeatedly by anyone. Let’s do a first pass in a single thread:

package net.jakubkorab.pi_evaluator

import scala.actors._
import scala.actors.Actor._
import scala.math._

object PiEvaluator {
	val sideLength : Double = 1
	val radius : Double = sideLength / 2 
	val totalPointsToSample : Long = 10000

	def isRandomPointWithinCircleRadius() : Boolean = {
		val x = (random * sideLength) - radius
		val y = (random * sideLength) - radius
		val hypotenuse = sqrt(pow(abs(x), 2) + pow(abs(y), 2)) // pythagoras
		hypotenuse <= radius
	def pointsInCircle(pointsToSample : Long) : Long = { 
		(1L to pointsToSample)
			.map((i : Long) => isRandomPointWithinCircleRadius())
			.foldLeft(0L) { (pointCount : Long, pointInCircle : Boolean) => 
				if (pointInCircle) pointCount + 1 else pointCount

	def approximatePi(pointsEvaluated : Long, pointsInCircle : Long) = {
		println("After " + pointsEvaluated 
			+ " samples, the number of points in a circle was " + pointsInCircle)
		val areaOfSquare = sideLength * sideLength
		// areaOfCircle/areaOfSquare =~ pointsInCircle/pointsEvaluated, so
		val areaOfCircle = pointsInCircle * areaOfSquare / pointsEvaluated
		// areaOfCircle = pi * r^2, so
		val approximatePi = areaOfCircle / pow(radius, 2)
		println("Pi is approx: " + approximatePi) 

	def main(args : Array[String]) = {
		approximatePi(totalPointsToSample, pointsInCircle(totalPointsToSample))

At line 20, there’s a nice little bit of map-reduce action going on (fold is an equivalent name). What’s going on is that for each number, we’re generating a random number an determining whether it’s in a circle. Then we fold/reduce the list of booleans returned by the map function to give a count of all the points that were generated within the radius of the circle. It’s a super useful little programming construct.

We then call approximatePi() to do our maths for us.

Running this (I’m doing it through SBT, that’s why the [Info] outputs) we get:

After 10000 samples, the number of points in a circle was 7883
Pi is approx: 3.1532
[info] == run ==
[success] Successful.
[info] Total time: 5 s, completed 30-Jan-2011 13:45:46

Not bad, but we need more points to get a better result. We know that pi should be around 3.14159, but we’re not quite there yet. Scaling up our numbers:

Samples Time (s) Pi
10,000 5 3.1532
100,000 5 3.13676
1,000,000 7 3.141188
10,000,000 20 3.1424864
100,000,000 189 3.14139356

Ok, we’re approaching our known value, but this approach has shown some stress. It’s not scaling too well. It’s slowing right down, but my processor (Intel Core2 Duo T9300 @ 2.5GHz) is showing only ~60% usage. If we could break down the point generation among a number of threads, and then tally up the result at the end, we could make better use of our hardware. Enter fork-join.

Fork-join is probably one of the most common things that you’d want to do with concurrency. It’s a bit of a pain in the butt in Java (task executor framework, futures etc.). It really shouldn’t be. There’s some excellent work going on around a framework for this, and map-reduce too, for future versions of the language. In the meantime, you need to jump through hoops – cue fishing through Concurrency in Practice. In Scala though, it’s really easy using actors. So we rewrite main:

	val workers = 10
	def main(args : Array[String]) = {
		// break down the job between worker - there's a rounding issue here, but never mind
		val samplesPerWorker = totalPointsToSample / workers 
		1 to workers foreach { (workerNumber : Int) => 
			val worker = actor {
				receive {
					case pointsToEvaluate : Long => { 
						sender ! (pointsToEvaluate, pointsInCircle(pointsToEvaluate)) // reply with a tuple
			worker ! samplesPerWorker 
		var workersReplied = 0
		var totalPointsEvaluated = 0L
		var totalPointsInCircle = 0L
		while (workersReplied < workers) {
			receive {
				case (pointsEvaluated : Long, pointsInCircle : Long) => {
					workersReplied += 1
					totalPointsEvaluated += pointsEvaluated
					totalPointsInCircle += pointsInCircle
					if (workersReplied % 100 == 0) {
						println(workersReplied + " workers replied")
		approximatePi(totalPointsEvaluated, totalPointsInCircle)
Samples Workers Time (s) Pi
10,000,000 10 18 3.1427624
100,000,000 10 64 3.1418042

64s is a lot better than our original 189s! And this effect just gets more pronounced the more processors you have (where the optimal number of actors matches the number of processors; at least it should be – feel free to play around with this).

In the second version of main, the actor keyword/block is actually a method on the Actor object that creates an anonymous actor instance and instantly starts it. We ping some messages to it, and wait for the results using receive().

Easy, and you can apply it for a ton of different tasks.

Why I dig Scala: Concurrency and the Dining Philosophers

I am occasionally asked what the big deal is about Scala. For me, to decide whether a programming language is worthwhile is dependent on two practical questions: does it aid comprehension, and does it reduce code. The two are not necessarily interchangeable. Terseness, after all, does nothing to aid comprehension. Scala scores points on both counts. It also has a sweet spot that I haven’t encountered elsewhere, that is that it lends itself to concurrent programming in a way which is easy to reason about, and therefore get right. It does this through a set of supporting language features that combined allows us to code at a higher level of abstraction: a leaning towards immutability, functional constructs (such as closures), as well as a familiar way to model a domain via object orientation.

Put together, it’s massively powerful. As it runs on the JVM you could code Scala concurrency the same way as Java, with the bog-standard tools such as wait/notify/notifyAll or the JDK 5+ concurrency libraries. You might get some shorter code, but it misses the point. Scala comes with an implementation of the Erlang-inspired actors model out of the box, which lets you deal with the problems of concurrency in a manner that is much easier to reason about. Actors aren’t a language construct, but a library that makes use the underlying platform (the JVM) and Scala’s language features to provide a much simpler mental model for us to deal with. Actors are “like” threads (not really, but close enough for a starting point) that send and receive messages to and from other actors. How does this aid comprehension? Synchronous and asynchronous messages are very simple to reason about, and IMHO much more straightforward than the Java concurrency libraries (compare the actors documentation to the Trains Book).

Consider the classic computer science concurrency problem, the Dining Philosophers. A number of philosophers sit down at a round table to do some eating and thinking. Each philosopher brings with him a single chopstick that he places on his right hand side. So you have X philosophers and X chopsticks. To eat, a philosopher must pick up the chopsticks on his left and right sides. Leaving aside hygiene issues, it’s a cool toy problem around resource contention. So, how would you do this in Scala? The mental leap to be made is that “everything is an actor”. Given a number of philosophers dining at a table, it’s quite nicely modelled if you think about both the philosophers and the table as actors that pass messages beteen each other. If you want to see the whole file (~150 lines), it’s available here.

Firstly the messages that we’re going to be passing around:

package net.jakubkorab.philosophers

import messages._
import scala.actors._
import scala.actors.Actor._
import scala.math._

package messages {
	class Chopstick(val position : Int)
	object Side extends Enumeration {
		type Side = Value
		val Left, Right = Value
		def randomSide() = { Side(floor(Side.values.size * random).intValue) }
		def otherSide(side : Side.Value) = { Side.values.find{_ != side}.get }

	sealed abstract class Message

	abstract class TableMessage() extends Message
	case class AllFinished() extends TableMessage
	abstract class ChopstickResponse() extends TableMessage
	case class ChopstickAvailable(val chopstick : Chopstick) extends ChopstickResponse
	case class ChopstickUnavailable() extends ChopstickResponse 

	abstract class DinerMessage() extends Message
	case class RequestChopstick(val philosopher : Philosopher, val side : Side.Value) extends DinerMessage 
	case class ReplaceChopstick(val chopstick : Chopstick) extends DinerMessage 
	case class CouldNotEatAnotherBite(val guest : String) extends DinerMessage

Messages don’t actually need to be of any particular type, I just like thinking of that sort of thing in a hierarchy. All of the messages that I’ll pass around are subclasses of Message.

Now for our philosophers:

class Philosopher(val name : String, val wordsOfWisdom : String) extends Actor {
	var table : Actor = null
	var seatedAt : Int = -1
	private var timesLeftToEat = 3
	override def act() = {
		while (timesLeftToEat > 0) {
			val side = Side.randomSide // pick a chopstick to use first 
			say("Requesting chopstick 1 on " + side)
			table !? (1000, RequestChopstick(this, side)) match { 
				case Some(ChopstickAvailable(chopstick1 : Chopstick)) => {
					pause() // put in a delay so we can see actors switching
					val otherSide = Side.otherSide(side) // request the other
					say("Requesting chopstick 2 on " + otherSide)
					table !? (100, RequestChopstick(this, otherSide)) match {
						case Some(ChopstickAvailable(chopstick2 : Chopstick)) => {
								table ! ReplaceChopstick(chopstick1) // return chopsticks
								table ! ReplaceChopstick(chopstick2)
						case Some(ChopstickUnavailable()) => { 
							say("No " + otherSide + " chopstick"); 
							table ! ReplaceChopstick(chopstick1); 
						case None => { say("None"); table ! ReplaceChopstick(chopstick1) }
				case Some(ChopstickUnavailable()) => { say("No " + side + " chopstick") } // no luck getting a chopstick
				case None => say("None")
		react {
			case AllFinished => { say(wordsOfWisdom); exit }
	private def think() = { say("Hmm"); pause() }
	private def eat() = { 
		say("Nom nom"); 
		timesLeftToEat -= 1
		if (timesLeftToEat == 0) {
			table ! CouldNotEatAnotherBite(name)
	private def say(s : String) = { println(name + ": " + s) }
	private def pause() = { Thread.sleep(ceil(random * 1000).intValue) }
object Philosopher {
	def apply(name : String, wordsOfWisdom : String) = new Philosopher(name, wordsOfWisdom)

As I said, it’s pretty straightforward if you think of an actor as a Thread. Think of act() as the equivalent of Runnable#run(). Philosophers will be instantiated with a name and some words of wisdom they’ll come up with. Once they’re sat at a table, they’ll receive an instance of table for them to communicate with and a place where they’re sitting. Messages are sent either asynchronously to the table using the ! method, or synchronously using !? (in which case the number that follows is a timeout). The syntax may be unfamiliar, but I think it reads pretty easily even to those unfamiliar with Scala. I won’t go through it in detail. A philosopher sends a chopstick request to the table and gets a response, either that a chopstick is available, or that it’s unavailable. Pretty straightforward.

So, now the table.

class Table(val philosophers : Set[Philosopher]) extends Actor {
	if (philosophers.size < 2) throw new IllegalArgumentException("At least 2 philosophers must dine together")
	var chopsticks = new Array[Chopstick](philosophers.size)
	var location = 0
	philosophers.foreach { philosopher => 
		chopsticks(location) = new Chopstick(location)  // lay the cutlery
		philosopher.seatedAt = location
		location += 1
	var guestsEating = philosophers.size
	override def act() = {
		println("Starting the meal")
		philosophers.foreach { philosopher => philosopher.table = self; philosopher.start  } // let's go
		while (true) {
			receive {
				case RequestChopstick(philosopher : Philosopher, side : Side.Value) => giveChopstickIfAvailable(philosopher, side)
				case ReplaceChopstick(chopstick : Chopstick) => replaceChopstick(chopstick)
				case CouldNotEatAnotherBite(guest : String) => guestFinished(guest)
	private def giveChopstickIfAvailable(philosopher : Philosopher, side : Side.Value) = {
		var index = if (side == Side.Right) philosopher.seatedAt else philosopher.seatedAt - 1 
		if (index < 0) { index = philosophers.size - 1 } // get the one on the end of the array
		val chopstick = chopsticks(index)
		if (chopstick == null) {
			println("No chopstick available at " + index)
			sender ! ChopstickUnavailable() // sender, not philosopher!
		} else {
			chopsticks(index) = null
			sender ! ChopstickAvailable(chopstick)
	private def replaceChopstick(chopstick : Chopstick) = {
		chopsticks(chopstick.position) = chopstick
	private def guestFinished(guest : String ) = {
		println(guest + " is done")
		guestsEating -= 1
		if (guestsEating == 0) { 
			philosophers.foreach {_ ! AllFinished}
			println("All done")

The role of the table is to manage the resources, in this case the chopsticks. Calling start() on an actor is analogous to Thread#start().

And now, to kick it all off, let’s stick some philosophers on a table. I have chosen the Greco-Roman Stoics for their easy going approach to life, but any school of thought will do. Chinese philosophers may have been more appropriate to the cutlery. My example, my choice.

object PhilosophersLauncher {
	def main(args : Array[String]) = {
		val table = new Table(
			Set(Philosopher("Seneca the Younger", "The point is, not how long you live, but how nobly you live."),
				Philosopher("Epictetus", "Freedom is secured not by the fulfilling of men's desires, but by the removal of desire." ),
				Philosopher("Marcus Aurelius", "Everything is right for me, which is right for you, O Universe."),
				Philosopher("Zeno of Citium", "Shit happens.")) // one of his lesser known ones

So, does it work?

Starting the meal
Seneca the Younger: Hmm
Epictetus: Hmm
Marcus Aurelius: Hmm
Seneca the Younger: Requesting chopstick 1 on Right
Marcus Aurelius: Requesting chopstick 1 on Right
Epictetus: Requesting chopstick 1 on Right
Marcus Aurelius: Requesting chopstick 2 on Left
No chopstick available at 1
Marcus Aurelius: No Left chopstick
Marcus Aurelius: Hmm
Epictetus: Requesting chopstick 2 on Left
No chopstick available at 0
Epictetus: No Left chopstick
Epictetus: Hmm
Marcus Aurelius: Requesting chopstick 1 on Left
Seneca the Younger: Requesting chopstick 2 on Left
Seneca the Younger: Nom nom
Marcus Aurelius: Requesting chopstick 2 on Right
Marcus Aurelius: Nom nom
Epictetus: Requesting chopstick 1 on Left
No chopstick available at 0
Epictetus: No Left chopstick
Epictetus: Hmm
Epictetus: Requesting chopstick 1 on Left
Zeno of Citium: Requesting chopstick 1 on Left
Epictetus: Requesting chopstick 2 on Right
Epictetus: Nom nom
Epictetus is done
Zeno of Citium: Requesting chopstick 2 on Right
Zeno of Citium: Nom nom
Zeno of Citium: Hmm
Zeno of Citium: Requesting chopstick 1 on Right
Zeno of Citium: Requesting chopstick 2 on Left
Zeno of Citium: Nom nom
Zeno of Citium is done
All done
Marcus Aurelius: Everything is right for me, which is right for you, O Universe.
Epictetus: Freedom is secured not by the fulfilling of men's desires, but by the removal of desire.
Seneca the Younger: The point is, not how long you live, but how nobly you live.
Zeno of Citium: Shit happens.

Yup. I think it’s pretty easy to make sense of all this. You can easily reason about what happens when, just by drawing a sequence diagram. Consider the backdown strategy when a philosopher can’t get hold of a chopstick:

Zeto->Table: RequestChopstick(Left)
activate Table
Table-->Zeto: ChopstickAvailable(C0)
deactivate Table

Epictetus->Table: RequestChopstick(Left)
activate Table
Table-->Epictetus: ChopstickAvailable(C1)
deactivate Table

Zeto->Table: RequestChopstick(Right)
activate Table
Table-->Zeto: ChopstickUnavailable()
deactivate Table
Zeto->Table: ReplaceChopstick(C0)

note over Zeto: Sleeps for a bit before trying again

Epictetus->Table: RequestChopstick(Right)
activate Table
Table-->Epictetus: ChopstickAvailable(C0)
deactivate Table
Epictetus->Epictetus: Eat
Epictetus->Table: ReplaceChopstick(C1)
Epictetus->Table: ReplaceChopstick(C0)

So, actors are cool, and in Scala they are easy to reason about due to the syntax and ability to mix in OO concepts. For a comparison, check out the equivalent in Java using semaphores. The Scala version is far less code and much easier to comprehend. And if you think that’s cool, check out Akka.

Expressive or just terse?

This Scala code populates a list with objects:

// fairly standard
var people = List[Person]()
for (i <- 1 to 10) {
	people = new Person(i) :: people

So does this.

// slightly more functional
val people = (List[Person]() /: (1 to 10)) {(people, i) =>
	new Person(i) :: people

If you’re a Java programmer and just balked, you’re probably not alone. I know what it does and I have to read it from left to right to left to right again. In this case the shortcut’s a Scala mental snag equivalent to Java code like:

boolean empty = (!list.empty()) ? false : true;

There’s always going to be a pause when you read stuff like this. Remove the operator weirdness and you get:

val people = (1 to 10).foldLeft(List[Person]()) {(people, i) =>
	new Person(i) :: people

At least here, the reader doesn’t have to catch on to the flipping of the object and the method argument done by the /: method. In time, you probably get used to reading shortcut operators, but more than likely you’re going to be snagging for a while, and so will the next guy reading your code.

Just because you can do something, doesn’t mean you should :)

Thanks to @jstrachan for this:

val people = (1 to 10).map(new Person(_))

I have much to learn :)

NoSuchMethodException launching Scala App

This is a pretty standard thing to do in Java:

public class App {
	public static void main(String[] args) {
		App app = new App();

	public void start() {
		// do something

So you reckon you’d be able to do this in Scala:

object App {
	def main(args: Array[String]) = {
		val app = new App()

class App() {
	def start() = {
		// do something

You go to run it and:
Exception in thread "main" java.lang.NoSuchMethodException: id.jakubkorab.App.main([Ljava.lang.String;)
at java.lang.Class.getMethod(
at com.intellij.rt.execution.application.AppMain.main(

Turns out the Scala compiler doesn’t treat application launching in the same way that it treats other classes with companion objects. If you rename the object App to another name, it works.

object AppLauncher {
	def main(args: Array[String]) = {
		val app = new App()

Go figure.

Running a Scala App in IDEA 9

My first experience with IDEA hasn’t been a good one. In all fairness, not IDEA itself, but with a plugin (isn’t that always the way). I use Eclipse on a day to day basis, but since I heard that the IDEA Scala support is much better, I decided to download the Community Edition and give it a bash.

After following these instructions to hook up a Maven Scala project in IDEA (why not just go all in) I went to run the auto generated app:

package net.jakubkorab

 * Hello world!
object App extends Application {
	println("Hello World!")

The instructions were straightforward “just do CTRL SHIFT F10 and it’ll run”. Tried that. Nothing. Huh? To the net…

“Add a new Run configuration – Scala Application”. Nothing.
Maybe it’s a plugin version thing? Google… *muttering*

“you’ll need a different plugin called Scala Application”. Search for the plugin… Doesn’t exist. *more muttering*

Finally I found this post from 2008 – seems the plugin won’t run classes that extend scala.Application – you have to give it a main() method!

NP. Given that not many people have actually met sailors, I’d like to propose “swears like a programmer”.

package net.jakubkorab

 * Hello world!
object App {
	def main(args:Array[String]) {
		println("Hello World!")

CTRL SHIFT F10 and it works. If you look in Run configurations, it’s been set up as an Application, not a Scala Application like some of the other sites suggest.

Not a good start with IDEA, but like most systems with plugins, don’t throw the baby out with the bath water. I look forward to happier times from here on.