Friday, April 13, 2012

Distributed Algorithms and Convexity

I like this paper from CIDR 2011, which shows how to use the convexity properties of functions to implement distributed algorithms with provable properties. The idea is that logically monotonic functions have nice properties. The paper fills in the details of what we already knew, spelling out exactly what these good properties are. FYI: logical monotonicity implies that new inputs only add to the outputs. Said differently, incremental implementations of functions based on selections, joins, and projections all have the logical monotonicity property. And this brings us to convexity, because that is exactly why these properties exist: selections, joins, and projections are all about preserving the convexity property of the function. And finally, this is why I like monadic and stream-oriented programming: it is about implementing incremental algorithms, and with enough care you can preserve nice provable properties!
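To make this concrete, here is a minimal sketch in Scala (the names are my own illustration, not taken from the paper) of an incremental join whose output can only grow as inputs arrive: adding a fact can add result tuples but never retract one.

```scala
// A sketch of logical monotonicity: an incremental join built from
// selection (key match), join (pairing), and projection (payloads only).
object MonotonicJoin {
  // State: facts seen so far on each side of the join.
  private var left  = Set.empty[(Int, String)]  // (key, payload)
  private var right = Set.empty[(Int, String)]
  private var out   = Set.empty[(String, String)]

  // Adding an input can only add to the output, never remove from it.
  def addLeft(fact: (Int, String)): Set[(String, String)] = {
    left += fact
    val delta = for (r <- right if r._1 == fact._1) yield (fact._2, r._2)
    out ++= delta
    delta // the incremental contribution of this one fact
  }

  def addRight(fact: (Int, String)): Set[(String, String)] = {
    right += fact
    val delta = for (l <- left if l._1 == fact._1) yield (l._2, fact._2)
    out ++= delta
    delta
  }
}
```

Because every step only adds to `out`, the facts can arrive in any order and the final output is the same: that order-insensitivity is exactly the property the paper exploits for distribution.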

On a side note: type inference and non-linear optimization both thrive on convexity. Many concepts are built on a basis of convex properties, so you really want to pay attention to this!

Tuesday, March 13, 2012

What future for futures?

I recently read Twitter's Scala style recommendations and could not help being somewhat unhappy about their recommendation to use futures. They basically say: "Use Futures to manage concurrency."

Fifteen-plus years ago I wrote a futures library which I used in a derivative trading system for a long, long time. All the basic functionality was there (trigger on future, future timeouts, merge futures) and some more advanced features, like boolean trigger conditions (e.g. trigger if futureA or futureB), as well as futures across processes and networks. It was a nice library!
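That library is long gone, but a rough modern equivalent of those features, sketched with scala.concurrent, might look like this (the timeout helper is my own illustration: the standard library has no built-in future timeout).

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureFeatures extends App {
  val futureA = Future { Thread.sleep(50); "A" }
  val futureB = Future { Thread.sleep(100); "B" }

  // Trigger on future: run a callback when the value is set.
  futureA.onComplete(result => println(s"triggered with $result"))

  // Merge futures: wait for both values.
  val merged: Future[(String, String)] = futureA.zip(futureB)

  // Boolean trigger condition: fire when futureA OR futureB completes.
  val either: Future[String] = Future.firstCompletedOf(Seq(futureA, futureB))

  // Future timeout: complete with a fallback if nothing arrives in time.
  // Crude sketch that races a sleeping task against the real future.
  def withTimeout[T](f: Future[T], d: FiniteDuration, fallback: T): Future[T] = {
    val p = Promise[T]()
    f.onComplete(p.tryComplete)
    Future { Thread.sleep(d.toMillis); p.trySuccess(fallback) }
    p.future
  }

  println(Await.result(withTimeout(merged, 1.second, ("none", "none")), 2.seconds))
}
```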

Yet ten years later, we removed the use of futures!
Here is the reasoning...

When a future is used, its state can be seen as part of a higher-order concurrent logic. The comfort of using futures is that we do not need to model or design this higher logic; it is implicitly defined and managed for us by the future library. There are situations where this lack of a "bigger picture" is a good thing. One of them is using futures at the periphery of your system's design. This makes sense because your design stops at the border of your system, so it makes less economic sense to invest in building a model of how the rest of the world will interact with you. Yet as much as it makes sense to use futures in boundary interfaces, the deeper you go into your system the less sense they make. What happens is that the implicit higher-order model created by the futures has no relation to your system's higher-order design. And this leads to bad growth.

Developers will typically start to notice this higher-order mismatch in two ways: the first is shared resource management, the second is using futures in streams.

When a future's value is set, the associated triggers may be executed synchronously, by having the "setter" call the trigger directly, or asynchronously, at a later time or within another thread. If you want the triggers to have a "DB-like" transactional property, then you want to stay within the synchronous trigger model. The tricky part with the synchronous trigger model is that it interferes with your shared resource model: if you use locks, you can easily have unexpected deadlocks; if you use a transactional memory model, your transactions are of unknown size, because you do not always know who has been set up as a trigger, causing large transactions to retry at a performance cost. Granted, enough detective work can work around these issues, but these types of problems happen at the worst of times, such as in the error handling of the future, possibly in a branch of code which is rarely executed, and often in difficult-to-reproduce scenarios. Going "asynchronous" is often not a solution either, because the asynchronously triggered code is severely handicapped: it runs out of context, later.
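Here is a sketch of the deadlock hazard (my own illustration: the tiny Cell class and the two locks are hypothetical, not any real library). The setter fires the trigger synchronously in its own thread, with whatever locks it already holds, and the trigger in turn takes locks the setter knows nothing about.

```scala
object SyncTriggerDeadlock {
  val lockA = new Object
  val lockB = new Object

  // A tiny future-like cell whose triggers run synchronously in the
  // setter's thread (the "DB-like transactional" model above).
  class Cell[T] {
    private var value: Option[T] = None
    private var triggers: List[T => Unit] = Nil
    def onSet(f: T => Unit): Unit = synchronized {
      value match { case Some(v) => f(v); case None => triggers ::= f }
    }
    def set(v: T): Unit = {
      val fs = synchronized { value = Some(v); triggers }
      fs.foreach(_(v)) // runs in the setter's thread, with its locks held
    }
  }

  def main(args: Array[String]): Unit = {
    val cell = new Cell[Int]
    // The trigger needs lockA then lockB...
    cell.onSet { v => lockA.synchronized { lockB.synchronized { println(v) } } }
    // ...but the setter fires it while already holding lockB. This
    // single-threaded run happens to succeed, yet add a second thread
    // that takes lockA and then waits for lockB, and the hidden
    // lockB -> lockA ordering created by the trigger deadlocks.
    lockB.synchronized { cell.set(42) }
  }
}
```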

Another area where futures meet their limits is streams. Imagine you use a future to hold the result of a computation: you create the future, set up a trigger, launch the computation. Now if you need to do that again (create, set up, launch), and again, and you try to do it a few million times per second, you find that you hit a performance limit with futures. Futures do not scale well within real-time streams. You could push the future concept to achieve high-performance streaming, but that could go against creating a true design identity for the stream and would limit the growth of your design.
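The per-event overhead is easy to see in a toy comparison (a sketch, not a proper benchmark; the numbers will vary wildly with the executor): each event pays for a Future allocation, a closure, and a scheduling hop, where a stream can just fold.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object StreamCost extends App {
  val n = 1000000
  def time[T](label: String)(body: => T): T = {
    val t0 = System.nanoTime(); val r = body
    println(f"$label: ${(System.nanoTime() - t0) / 1e6}%.1f ms"); r
  }

  // Future per event: create, set up a trigger (map), launch.
  time("future per event") {
    val fs = (1 to n).map(i => Future(i).map(_ + 1))
    Await.result(Future.sequence(fs), 5.minutes)
  }

  // The same events processed directly, stream-style.
  time("direct fold") { (1 to n).foldLeft(0L)((acc, i) => acc + i + 1) }
}
```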

I am not sure that my examples here are convincing. The reality is that getting your higher-order designs right is hard (and expensive in time). So even if futures meet their limits at some level of abstraction, they are definitely mighty comfortable to use before you reach those limits. So maybe my recommendation is: "use futures", and then "take them out" when you understand how to do better!

Thursday, February 16, 2012

Double-checked locks to support lockless collaboration

Double-checked locks are normally seen as a form of optimization: they reduce the overhead of lock acquisition by first testing the locking criterion without the lock. I recently used double-checked locking to support non-locking collaboration between real-time processes. Here is how it works: all processes agree upon a future collaboration point; each then sets up a lock to protect itself from going past that point without the right information; they then proceed to exchange enough information, in a finite number of transactions, to complete their knowledge so that when they reach the collaboration point the lock is not activated. Well tuned and repeated, the processes exchange information in a semi-synchronous manner without ever waiting on the locks!
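A minimal sketch of the idea (my own illustration; the gate, the sequence numbers, and the method names are all hypothetical): the collaboration point is a sequence number, peer data arrives ahead of time through normal transactions, and the double-checked fast path means a well-tuned process never touches the lock.

```scala
import java.util.concurrent.locks.ReentrantLock

class CollaborationGate {
  @volatile private var received: Long = 0L  // highest point covered by peer data
  private val lock = new ReentrantLock()
  private val dataArrived = lock.newCondition()

  // Peer information arrives ahead of time in finite transactions.
  def deliver(upTo: Long): Unit = {
    lock.lock()
    try { received = math.max(received, upTo); dataArrived.signalAll() }
    finally lock.unlock()
  }

  // Double-checked: first test the criterion without the lock. When the
  // exchange is well tuned, `received >= point` is already true here and
  // the lock is never touched on this fast path.
  def passPoint(point: Long): Unit = {
    if (received >= point) return        // lock-free fast path
    lock.lock()                          // slow path: re-check under the lock
    try { while (received < point) dataArrived.await() }
    finally lock.unlock()
  }
}
```

The lock is the safety net protecting each process from running past the collaboration point without the right information; in the steady state it exists only to never be taken.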