An alternative model of computation for concurrency

I recently found this document that I had written for my company newsletter, copied verbatim below.

Concurrency is one of the hot subjects in computer science today. This is partly because processors (1) are reaching their physical limits, so we need to start looking at new avenues for achieving performance. Herb Sutter, the renowned author, has written an excellent article (2) titled “The Free Lunch Is Over – A Fundamental Turn Toward Concurrency in Software” which captures beautifully why concurrency is going to be important in the years ahead.

Let us look at a few issues surrounding concurrency today, and then at an alternative model of computation that addresses them.

The most prevalent model of computation in both hardware and software today is the Von Neumann architecture (3), an implementation of the Turing machine (4), which in turn is based on the work of Alan Turing (5). Programs written in this model have a shared memory area, also known as the store, which can be used to store data. The store is divided into chunks called cells, and named cells are called variables.

A function written in the Neumann model can make use of the store to aid in its computation; thus any time a function uses data other than the parameters passed to it, it is in fact using the store.

The alternative model of computation we’ll look at is called the functional model. It is based on Lambda Calculus (7), the work of Alonzo Church (6), who came up with his model at approximately the same time as Alan Turing. The good thing, though, is that both models of computation are equivalent, i.e. any computation that can be expressed in one model can also be expressed in the other.

The functional model is based on mathematics: in this model there is no store, and all computation is done by evaluating mathematical functions. A mathematical function is one whose value depends entirely on the values of its parameters and not on any external state. Also, the term variable in this model refers to a named value, not to a named cell.

Consider a simple function:
f(x) = g(x) + h(x)

The function f takes x as a parameter and returns the result of evaluating the function g with parameter x added to the result of evaluating the function h with parameter x.

In Neumann style programs it is possible that f will return a different output for the same value of x, whereas in functional style programs it is guaranteed that no matter how many times f is called with a particular value of x, the output will always be the same. This is because functional programs do their computations without the use of any external state, whereas the computations in Neumann style programs are affected by the state of the store.

Let us understand this with a concrete example:

int y = 2;

int foo(int x)
{
    return x * y;
}

int bar(int x)
{
    int y = 2;
    return x * y;
}

The function foo uses the variable y from the store to do its computation; thus if some other function were to alter the value of the cell y, foo would start returning different results.

On the other hand, bar does not use any external state, and for any value of x it will always return twice x.

It is possible to write functional programs in Neumann languages by totally avoiding the use of the store and modeling the entire computation using only functions that use local variables. On the other hand, functional languages do not have the concept of a shared store, so writing Neumann style programs in them is not possible.

Now let’s take a look at how these computation models interact with concurrency. There are two ways to parallelize programs: implicitly and explicitly.

In implicit parallelization, the compiler does the hard work of looking at the program and deciding which instruction sequences can be parallelized. This is the most commonly used method of parallelization, and in fact most programmers are not even aware that the compiler does this under the hood.

Neumann style programs are hard to optimize for parallelization since functions can be implicitly dependent on each other via the store. For example:

int y = 0;

void foo(int x)
{
    y = 2 * x;
}

int bar(int x)
{
    return y * 6;
}

int baz(int x)
{
    foo(x);
    return bar(x);
}
In this case foo and bar need to execute sequentially because they both make use of the cell y in the store. This is an extremely simple example, but hidden dependencies like this can get incredibly complicated to figure out. Thus the compiler has to be conservative and parallelize only the code that it absolutely knows is safe to parallelize, which unfortunately is not a lot.

On the other hand, since functions written in a functional language are not dependent on external state, they can be very easily parallelized by evaluating different functions in parallel. For example:

baz(x)
{
    return combo(foo(x), bar(x));
}

Since foo, bar and combo are purely mathematical functions, the evaluation of baz can be parallelized by evaluating foo and bar in parallel and then evaluating combo, which depends on their results.

Now let’s turn our attention to explicit parallelism, where the programmer explicitly programs the parallelism. This is done by creating parallel execution sequences called threads, which are scheduled by the operating system to execute on the available processors.

In the Neumann style, the problem that arises with explicit parallelism is that since all threads share the same common store, they need to be careful not to overwrite each other’s data. The canonical way to handle this has been to put checks around code that accesses shared data, to ensure that only a single thread has access to it at any point in time. This blocking behavior in turn leads to issues like deadlock, where all threads keep waiting for access to a shared location in the store. Further, the complexity of this solution grows rapidly with the number of threads, the number of shared data areas, and the number of places where the shared data is accessed.

A better solution to this problem is to do the locking on the memory itself rather than on the code that accesses the memory. Transactional memory systems (8) have started making their way into mainstream computing, but they’re still a long way off from being an ideal solution.

In 1978 C.A.R. Hoare (9) (best known for inventing Quicksort) wrote a paper titled Communicating Sequential Processes (10). This classic paper introduces a new way to tackle the problem of explicit parallelism, which is in a way similar to the model used by functional programming languages. In this model, concurrent processes share state information by passing messages to each other rather than sharing a common store. These messages can be passed asynchronously or synchronously. Thus the whole problem of protecting access to shared data is made non-existent.

There is one language today that combines aspects of functional programming and message passing concurrency to come up as an ideal distributed computing platform. That language is Erlang (11) from Ericsson laboratories. The language and platform came about as the result of research in the 1980s to find out which features of computer languages were suited to programming telecommunication systems. In addition to concurrency, Erlang has some other cool features like error recovery and hot code updates that enable systems built with it to keep on running. Joe Armstrong, the brains behind Erlang, has written a very good article (12) on all the fuss about Erlang, and I would encourage anyone who is interested to take a look at it.

I had originally intended for this article to be an introduction to Erlang but ran out of my 1000 word limit even before I could get past the introduction, so I decided to pick the one big feature of Erlang, concurrency, and focus on that. In terms of support for reliability and concurrency, Erlang is unrivaled, and I believe that in the years ahead some aspects of Erlang, or Erlang itself, will become mainstream the way Ruby has.

References
1. www.cise.ufl.edu/research/revcomp/physlim/PhysLim-CiSE/PhysLim-CiSE-5.doc – Physical limits of computing.
2. http://www.gotw.ca/publications/concurrency-ddj.htm – The free lunch is over.
3. http://en.wikipedia.org/wiki/Von_Neumann_architecture – The Von Neumann architecture.
4. http://en.wikipedia.org/wiki/Turing_machine – Turing machine.
5. http://en.wikipedia.org/wiki/Alan_Turing – Alan Turing
6. http://en.wikipedia.org/wiki/Alonzo_Church – Alonzo Church
7. http://en.wikipedia.org/wiki/Lambda_calculus – Lambda calculus
8. http://en.wikipedia.org/wiki/Transactional_memory – Transactional memory.
9. http://en.wikipedia.org/wiki/C._A._R._Hoare – C.A.R. Hoare.
10. http://en.wikipedia.org/wiki/Communicating_sequential_processes – Communicating Sequential Processes
11. http://www.erlang.org/ – Erlang
12. http://www.pragmaticprogrammer.com/articles/erlang.html – More about Erlang
13. http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10142 – Concepts, Techniques, and Models of Computer Programming – This is an amazing book that talks about the various models of computation.