Saturday, December 28, 2019

Beware of the language trap in software development

Consistent design of multiple views of distributed state

The notion of "state" is key to software development. For example, we talk about stateful and stateless designs. Still, there are multiple notions of state. For example, there is the stored state, the communicated state, the received state, the processed state. Therefor when we make a statement like "stateful", we are always just referring to one "notion" of state and not the others. This "abuse of stateful view" was historically not an major issue. Yet with the rise of processing speeds and truly distributed systems, these multiple views of states started more and more to exist "simultaneously" and this caused problems. For example, we might model a trading system as four different states: the state of trading action messages received by the exchange, the current state of the exchange, the state of confirmations received from the exchange, and the current state of our trading system. A question is then: in our concrete code, from which state views should we build our design, and how to do it consistently? We now know this is the wrong question. A trading system is a distributed system, all views of states are real and have business significance. Therefore the question is not which state, but how to design all state views/models as one coded design in a cohesive manner?

We now know we can use higher order types to do this, to capture the multiple views of states "as one model" within a distributed computation. It took me a long time to understand this. One reason being that only C++14 had the explicit template specialization needed to do this purely in C++. And while I had experimented with Prolog to generate complementary C++ code models in the mid-1990s (and was not convinced). And in early 2000, I tried to use generative technics with dynamically typed functional programming (I wrote about a generative programming conference in Dr. Dobb's at the time). It was only when I picked up statically typed FP (initially F# in 2006), that I understood how it could be done (e.g. with monad-comonad adjunctive joins as hinted here pre-Elevence).

Business success

The reason I bring "managing distributed states" in this posting, is that I was retrospecting on the challenge of success in my many software developments. This made me think back to my experience in developing a market making system. (I wrote about Actant in a recent posting).

Early 2000 market and regulatory pressures meant that many new derivative exchanges where being created. This resulted in many "easy" new business opportunities to sell an algo trading and market making system, as one "just" needed to connect a product to these new exchanges. However, what we did not foresee, was that this "lively" exchange business development would also have the exchanges competing "lively" among themselves and be updating their trading and quoting APIs at an unprecedented rate (e.g. every quarter). In parallel, we consistently had mismatches in our different view of distributed state, for the reason mentioned above, but also because we were exposing different state views that were easy on our end-users, but which broke distributed consistency. The result was, we spent most of our development resources maintaining high-performance exchange connectivity and managing a hard to manage "mismatch" of state models, with little resources left over to be strategic.

The language trap

One of my Actant partners once said: C++! It was C++ that hurt us the most. (Again Actant mentioned here). By that he meant "stick to C", but C is a subset of C++, right? So how can C++ be an issue?

Here is a timeline that will help us:
  1. Pre-C++98 (templates): we used macros, generative and model driven programming to write consistent models that ran distributed programs as multiple and consistent views of state.
  2. Post-C++98: we used templates, type unsafe models, and redundant code to write distributed programs with often inconsistent views of state.
  3. Post-C++14 (explicit template specialization): we used templates and explicit template specialization to write consistent models that run distributed programs as multiple and consistent views of state.
Because we chose to adopt the rules of C++, because we did not understand that by doing so we could not code a single consistent model of multiple views of state that we needed for distributed computing, we got "caught" in the 16 year gap of formal inconsistency that C++98 introduced! 

I write "formal inconsistency" because nothing in C++98 says that you couldn't continue to use macros, generative and model driven programming to get around the limitations of templates. The thing is "we do not know what we do not know", so we did not know that it would have been best to ignore the "formal" template solution and stick with our old technics. And that is example of a language trap.

A language trap is when developers choose to adhere to limiting coding language semantics without understanding that they are "shooting themselves in the foot" because they are no longer able to "get things right" when adhering to these limiting language rules. In some sense, a language trap is a technical form of semantic barrier.

Unfortunately, again "we do not know what we do not know". So we may or not be in a language trap, we do not usually know if this is the case. In early 2000, we did not realise how much the choice of purist C++ approach had been a bad choice.

My learning from that period was: never let a language nor language purism restrict you. Because you do not know how that it harming you until it is too late. The safer and future resistant approach is to deconstruct "a single language", and be purist in how you formally compose multiple language constructions.  An advantage of this approach is that it may also be applied consistently across multiple languages.

All original content copyright James Litsios, 2019.