Tuesday, March 13, 2012

What future for futures?

I recently read twitter's scala style recommendations and could not help being somewhat unhappy about their recommendation to use futures. They basically say: "Use Futures to manage concurrency.

Fifteen plus years ago I wrote a futures library which I used in a derivative trading system for a long long time. All the basic functionalities were there (trigger on future, future timeouts,  merge futures) and some more advanced like boolean trigger conditions (e.g. trigger if futureA or futureB), as well as futures across processes and networks.  It was a nice library!

Yet ten years later, we removed the use of futures!
Here is the reasoning...

When a future is used, its state can be seen as part of a higher order concurrent logic. The comfort of using futures, is that we do not need to model nor design this higher logic, it is implicitely defined and managed for us by the future library. There are situation where this lack of "bigger" picture is a good thing, one of these is to use futures at the periphery of your system's design. This makes sense because your design stops at the border of your system, so it makes less economic sense to invest in building a model of how the rest of the world will interact with you. Yet as much as it make sense to use futures in boundary interfaces, the deeper you go into your system the less it makes sense to use futures. What happens is that the implicit higher order model created by the futures has no relations with your system's higher order design. And this leads to bad growth.

Developers typically will start to notice this higher order mismatch in two ways: the first is shared resource management, the second is when using futures in streams.

When a future value is set, the associated triggers may be executed synchronously by having the "setter" call directly the trigger, or asychronously at a later time or within another thread.If you want the triggers to have a "DB" like transactional property, then you want to stay within the synchronous trigger model. The tricky part with the sychronous trigger model is that it interferes with your shared resource model: if you use locks, you can easily have unexpected deadlocks, if you use a transactional memory model, your transactions are of unknown size because you do not always know who has been set as triggers, cause large transactions to retry at a performance cost. Granted enough detective work can work around these issues, but these type of problems can happen at the worse of time, such as in the error handling of the future, possibly in a branch of code which is rarely executed and often in difficult to reproduce scenarios. The solution to go "asynchronous" is often not a solution because the asynchronous triggered code is severly handicaped as it happens out of context later.

Another area where futures meet their limits is with streams. Imagine you use a future to hold the return of a computation; so you create the future, set up a trigger, launch the computation. Now if you need to do that again (create, set up, launch), and again, and you try to do that a few million times per second, you find that you meet a performance limit with futures. Futures do not scale well within real time streams. You could push the future concept to achieve high performance streaming, but this could go against creating a true design identity for the stream and would limit the growth of your design.

I am not sure that my examples here are convincing. The reality is that getting your higher order designs right is hard (and expensive in time). So even if futures meet there limit at some level of abstraction, they are definitely mighty confortable to use before you meet those limits.  So maybe my recommendation is "use futures" and then "take them out" when you understand how to do better!