Equational: higher abstractions

Sunday, October 06, 2024

A software mind's eye

I have been writing higher-order functional programs in Python for the last few weekends:

hofppy (https://github.com/equational/hofppy) will be a Python library. For the moment it is a collection of Jupyter notebooks.
My initial goal was to have a handy FP toolkit which supports applied math with JAX's JIT. Yet I realise that in fact what I am really doing is reimplementing the generic part of a compiler for a reactive (trading) language I wrote in 2010 in F#, while including a few tricks I picked up since then, the primary one being to think of code as implementing a synchronous execution model.
There is really very little code like this on the web, which is why I am doing this as open source.

This blog is in part written to mention the above, as already the first JAX JIT supporting monad and comonad models are "nice". Yet this blog is also to bring up the subject of the process of creating new technology.

My recipe for doing something new, such as leading a team on a new subject or writing software that did not exist before, is the following:

1. Create a "mind's eye" imaginary world of "usage" of the end result.
2. Imagine how it should work.
3. Start very quickly to implement features.
4. Review 1, 2, and 3, as you regularly cycle through them, again, again, and again!

I use the word imagine twice above. For a few reasons...

People are constantly asking me "how do you know?", with regards to client's requirements, technology, design, approach, etc. The reality is that I do not completely know: I know much, I have done much, written hundreds and hundreds of thousands of lines of software. Yet when something is new, no one knows fully how things will look in the end. The reality is the newer it is, the less you know. And... the more you think you know, the more you are fooling yourself. However, just like with poker, the optimal approach is to imagine much and often, and to work as hard as possible to build and maintain a mindful view of "new usage" and "new functioning technology".

The more experience, the better, as your imagination is only as good as the real details it contains. I have tons of real-life learning that I use for this higher-order FP in Python. I mention a few of them here:

Lazy software designs are easier to express in code.

The most flexible design is code that does nothing but lazily accumulate what it could do until, because output is finally expected, it works backwards from the output expectations to pull out the desired output.
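As a minimal sketch of this idea (not the hofppy implementation; `Node`, `const`, `add`, `mul` and `pull` are hypothetical names chosen for illustration), the code below only accumulates a description of what could be computed, and evaluates nothing until an output is requested. Evaluation then works backwards from that output, so nodes the output does not depend on never run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A deferred computation: building a Node runs nothing."""
    name: str  # assumed unique, used as the cache key
    fn: Callable[..., float]
    inputs: Tuple["Node", ...] = ()


def const(name: str, value: float) -> Node:
    return Node(name, lambda: value)


def add(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: x + y, (a, b))


def mul(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: x * y, (a, b))


def pull(node: Node, log: List[str],
         cache: Optional[Dict[str, float]] = None) -> float:
    """Work backwards from the requested output, evaluating only its inputs."""
    if cache is None:
        cache = {}
    if node.name not in cache:
        args = [pull(i, log, cache) for i in node.inputs]
        cache[node.name] = node.fn(*args)
        log.append(node.name)  # record what was actually evaluated
    return cache[node.name]


x = const("x", 3.0)
y = const("y", 4.0)
total = add("total", x, y)
unused = mul("unused", total, const("big", 1e6))  # described, never requested

log: List[str] = []
result = pull(total, log)
```

Here `result` is `7.0` and `log` contains only `x`, `y` and `total`: the `unused` node was accumulated but never evaluated, because no output asked for it.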

Multi-stack semantics is perfectly normal.

For example, in the monadic code of the project above, the free monads have "their" variables, "normal" Python has its variables, and these are carefully kept separate.
Multiple flows of causality exist in the real world, and stacks of various semantics are the only cheap way to capture these cleanly in software.

For every Yin, there is a Yang.

If there is more, there is also less; if there are variables, there are constraints; etc.

Structure is more important than types.

I started in functional programming without types. Then I embraced types. And yet I have come to understand that the sweet spot is somewhere in between. The reality is that it is much easier to use code to constrain software to "just the right level" than to use types.

Math is the most powerful structure.

Think geometry, topology, and dualities!

All original content copyright James Litsios, 2024.

Sunday, February 21, 2021

Stream oriented processing

Here is a presentation on designing stream processing systems which I uploaded to YouTube today. Recorded in ~2014, and given in one form or another a few times since about 2005, it is how I "model with streams", something I learned designing market making and algo systems. What I like most about this old deck is how it stays aligned with modern abstraction expressed in higher-order FP semantics.

I wrote a bit on stream processing here in 2011. I see also that I have a draft for publishing this presentation in 2013. Yet I had wanted to add Scala or Haskell code that captured the "nice figures" of this presentation, and that is why I held back from sharing it earlier.

All original content copyright James Litsios, 2021.

Saturday, January 02, 2021

What works in higher order Python (Functional Programming, early 2021)

Here is what works for me to write higher order typed Python:

Use class with __call__ instead of lambdas or functions (aka explicitly specify each closure).
Favor constraining types with Callable, ParamSpec and Concatenate (not with class constructions).
Replace *args by an immutable list (Tuple[a, Tuple[b, ...]]) when ParamSpec fails to scale.
Replace **kwargs with an object using @overload with Literal typed arguments (sadly TypedDict seems not to support generic typing).
Use Generic to “store” multiple polymorphic types.
Use Protocol to separate semantic tiers of types (e.g. ownership in a smart contract).
No nested functions, lambdas or classes.
Use cast to type "unmanaged" types (e.g. eval)
Use phantom typed "dummy" arguments (and type casts) to get around "wrong direction" type dependencies
Visual Studio Code's Pyright works. Early 2021, PyCharm fails.

The untyped application scope is a broader Python 3 stack. Note this is Python 3.10 typing (e.g. ParamSpec).

Not all is easy. Deeply typed Python seems to be orders of magnitude more expensive to write than "classical" Python. Also, Union types may be needed to "break out" from "tight" type relations. My current feeling is that one is sometimes forced to introduce Unions to allow complex trace joins and bifurcation, and then needs to add additional layers of type constraints to control the generality of these Unions. All of this needs to be done with careful "locking" of types. Not the best of situations!

Python is known to be a dynamically typed language, which is ok, as static typing is more of a luxury than a necessity. I learned the above this 2020-2021 Xmas holiday writing lens/optic-like Python code. Initially with no types, fully lambda centric. Then I thought: let’s try to make it typed in Python!

All original content copyright James Litsios, 2021.

Sunday, April 12, 2020

Thinking versus brute force "doing" in software design in 2020

About thinking versus doing

(I put a subset of this on YouTube as: https://www.youtube.com/watch?v=_kXpyctbmrQ)

Is it ok to program by "brute force" and put aside your brain?

A few years ago, I wrote on the pros and cons of programming with explicit and implicit type declarations. In 2020, I feel that this question is almost irrelevant. Instead, the question to ask is:

Should we think about software design when we program?

There is a trend in software development which states "developing software products is not about thinking but about doing". I know many venture people who believe very much in this statement. Yet the reality is that software is a process of thinking (in part at least). Still, from the point of view of a pure seed-venture-money game, the rule of "it is not about thinking" is true! Because the number one rule is "be fast", and thinking too much will very much slow you down.

Is it then both true and untrue, that we should both use and not use our brains when we write software? Yes, and yes! And it is exactly this subtle challenge that makes software management in innovation a tricky game.

Thinking can be done different ways, different approaches can be taken. Some are efficient, some are a waste of time. Some will help your software grow, others will block you. Some will help you meet your deadline, others will invariably lead to failure. The game is then only to think in the ways that help you!

Some of my colleagues have heard me condense these concepts into one statement:

It is about thinking backwards!

There are many ways to do this. I will cover just a few here; there are in fact many more.

Play to win above all else

In recent blitz chess games, FM Lefong Hua repeats that "it is not about calculating the end's win game" (because that is not a guaranteed win); instead it is about "flagging your opponent". By which he means, winning is not about thinking within the classical rules of chess, it is about meeting the "win conditions", which in blitz chess games is much about not running out of time (being flagged).

Transposed into software development, that means it is not about meeting the classical rules of software design, it is about thinking through how to always meet your goals and deliver results. Or to be more specific, it is about thinking backward, to find a path back from your "runnable software" goals to your current status that guarantees your development success.

The heart of your software comes first

I have mentioned that "One of the little understood properties of programming is that at the lowest level things tend to be single dimensional". A good software design builds a "break" or "separation" in that unique dimension "just about where your main product abstraction is", to achieve a sandwich like design where new product abstractions are added "by reassembly" of the two separated parts.

There are a few ways to do this, domain specific languages (DSL) and functional programming (FP) being my favoured approaches. While all my major software developments had DSLs, it was only much later in my career that I understood that what was working for me was not the DSLs but the effort to maintain separation between interpreter and interpreted abstractions. This separation is naturally achieved with a DSL, yet can also be achieved with adjunctive higher order FP. (The "ultimate tool" is then to combine the two concepts as adjunctive FP based DSLs, which was the base of Elevence's contract language).
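To make the separation between interpreter and interpreted abstractions concrete, here is a small sketch of an embedded DSL. It is an illustration under assumptions, not Elevence's contract language: the expression types `Price`, `Const` and `Spread` and the two interpreters `evaluate` and `symbols` are hypothetical names. The product description is pure data, and each interpreter gives it a different meaning without the description changing.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


# Interpreted side: a product is described as data, with no behaviour.
@dataclass(frozen=True)
class Price:
    symbol: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Spread:
    left: "Expr"
    right: "Expr"


Expr = Union[Price, Const, Spread]


# Interpreter 1: compute a value from market prices.
def evaluate(expr: Expr, prices: Dict[str, float]) -> float:
    if isinstance(expr, Price):
        return prices[expr.symbol]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Spread):
        return evaluate(expr.left, prices) - evaluate(expr.right, prices)
    raise TypeError(f"unknown expression: {expr!r}")


# Interpreter 2: list the market data a product depends on.
def symbols(expr: Expr) -> Set[str]:
    if isinstance(expr, Price):
        return {expr.symbol}
    if isinstance(expr, Spread):
        return symbols(expr.left) | symbols(expr.right)
    return set()


product = Spread(Price("A"), Spread(Price("B"), Const(1.0)))
value = evaluate(product, {"A": 10.0, "B": 4.0})
needed = symbols(product)
```

Here `value` is `7.0` and `needed` is `{"A", "B"}`. New products are added by reassembling expressions, and new behaviour by adding interpreters, which is the "sandwich" the paragraph above describes.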

Be functional and agile

Agile development is a lot about prioritizing your requirements and implementing them in work iterations. That means that the first level of your design, what might be seen as the first draft of primary keys of your database, is directly tied to the most important business concept (taken from OO vs FP vs agile). The trick is then not to see this as a database of data, where keys are data, but as a database of code, where keys are functional properties such as APIs with specific invariants (e.g. somewhat like type classes in Haskell). The extended trick is then also to make this a normalized design (just like a normalized classical DB schema, but in the "functional space"), for example by using linear types.

Brute force yes, but with the right pieces

Let us put this all together. Our first conclusion, when building new software, is:

Use brute force and minimal brains to assemble data and call external APIs

Importantly, keep your program structure as flat as possible when you do this. Which pretty much means to shun OO patterns, and to use as many array and container like constructions as you want.

Then, and this is the tricky part:

Incrementally invest effort to separate interpreter and interpreted abstractions in your code.

Always make sure that these abstractions fit within the plan for your deliverables. That means you are able to think backwards from your deliverable goal, and work out the path to your current development state, the path which you will follow to build your code. An agile process and a good team will make this an easier job. Also, all those arrays and containers will need to fit into a single unified concept (e.g. indexed tensors).

It's complicated

Sometimes things are just too complicated. Many developers do not know how to "separate interpreter and interpreted abstractions". Here I will be honest: it is not easy. And just to make this clear, the last DSL I wrote took me multiple tries until "I got it right". Also, embedding product abstractions in higher order FP is even harder than writing DSLs.

Worse, the process is to "hack brute force" to assemble data and use external APIs, and only then to think about how you can slice your interpretive abstractions. This means that initially, software development is about doing, not thinking. Tricky...

Welcome to 2020

I have been writing this blog for almost 15 years. Interestingly, a major change over these years is that there is so much well integrated open source software (OSS) out there that it is cheaper to "just try" than to "think" (when using OSS and third party APIs). And in part it was this reality that led me to write this blog today. Enjoy, and stay safe!

All original content copyright James Litsios, 2020.

Sunday, January 11, 2015

Software designs that grow with monads, comonads, and type compatibility

This post is about designing software with internal and external APIs that are robust to future changes. It is therefore about API compatibility, but more importantly it is about the compatibility of a full software design to changes. Not surprisingly, monads and comonads are part of the presented solution, as is a careful approach to use of types.

I had an "aha" moment last year when I watched a video (by Timothy Baldridge) that showed how an AST for a Clojure compiler was fully based on key-value pairs (nested hash maps), therefore without typed structures or classes, and was doing very well. The thing is, I have suffered enough times trying to get the types of the different phases of a compiler to fit together. Therefore the idea of giving up on types and just using key-value pairs, which can easily be added for each compiler phase, seemed a really attractive way to write a compiler.

Key-value pairs, duck typing, and many "traditional" functional patterns (think Lisp, not Haskell) all have in common their reliance on generic, almost typeless, structures. So while each "atomic element" of these patterns has a type (e.g. int, string, ...), the structures (e.g. struct, list, array, ...) are all generically typed.
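As a rough illustration of the key-value approach (a sketch in Python rather than Clojure, and not the compiler from the video; the phases `parse_phase`, `type_phase` and `fold_phase` and the keys they use are hypothetical), each compiler phase below simply adds its own keys to nested dictionaries, so no phase needs its own typed AST.

```python
from typing import Any, Dict

Ast = Dict[str, Any]


def parse_phase(source: str) -> Ast:
    """Parse 'a + b' into a nested key-value AST (toy grammar only)."""
    left, right = source.split("+")
    return {
        "op": "add",
        "args": [
            {"op": "lit", "value": int(left)},
            {"op": "lit", "value": int(right)},
        ],
    }


def type_phase(node: Ast) -> Ast:
    """Add a 'type' key to every node, leaving all other keys untouched."""
    out = {**node, "type": "int"}
    if "args" in node:
        out["args"] = [type_phase(arg) for arg in node["args"]]
    return out


def fold_phase(node: Ast) -> Ast:
    """Add a 'folded' key where an addition of literals can be precomputed."""
    out = dict(node)
    if "args" in node:
        out["args"] = [fold_phase(arg) for arg in node["args"]]
        if node["op"] == "add" and all(a["op"] == "lit" for a in out["args"]):
            out["folded"] = sum(a["value"] for a in out["args"])
    return out


ast = fold_phase(type_phase(parse_phase("1 + 2")))
```

After the three phases, `ast` still carries the parser's `op` and `args` keys, plus the `type` key from the typing phase and the `folded` key (here `3`) from the folding phase. A later phase ignores the keys it does not know, which is what makes adding phases cheap.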

Types are what capture the invariants of a design. Giving up on types is in effect giving up on capturing those invariants in a robust manner. Using types normally leads to higher quality software, yet with enough complexity, types no longer capture all the design invariants of a program. Worse, sometimes types actually hinder the design by holding it back, because they are not powerful enough to capture the desired invariant. This is the case with the difficulty of designing a typed AST that "fits" all phases of a compiler. This rigid nature of types is also the hidden problem of API compatibility.

The invariants of APIs are captured with the types of their signatures. When new features or corrections are added to an API, these invariants and associated types evolve. When APIs link different software projects, changing API types is where API compatibility becomes an issue: APIs remain compatible when the changed types remain compatible with the older types, and become incompatible when they do not. An example of type compatibility in OO is to derive a new class from an existing class, and to add a constructor in this new class from the old class. Unfortunately, that last example is at the source level. At the binary level, compatibility to type change is usually nonexistent, especially when focusing on forward compatibility. Note that API compatibility is not only about types: an API will also become incompatible when the interpretation given to values transferred over the API changes. Therefore, to remain compatible, an API can add new value interpretations but must also ensure that past value interpretations never change.
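The "growth only" rule for value interpretations can be sketched as below. The order-side codes, the tables `SIDE_V1` and `SIDE_V2`, and the helpers `is_growth` and `decode` are all hypothetical, chosen only to illustrate the rule: a newer version may add interpretations, but must never change an existing one, and an older reader must tolerate values it does not yet know.

```python
from typing import Any, Dict

# Version 1 of the interpretation of the "side" value.
SIDE_V1: Dict[int, str] = {0: "buy", 1: "sell"}

# Version 2 grows the table; codes 0 and 1 keep their meaning.
SIDE_V2: Dict[int, str] = {**SIDE_V1, 2: "short_sell"}


def is_growth(old: Dict[int, str], new: Dict[int, str]) -> bool:
    """True when every past interpretation is unchanged in the new table."""
    return all(new.get(code) == meaning for code, meaning in old.items())


def decode(message: Dict[str, Any], table: Dict[int, str]) -> Dict[str, Any]:
    """Interpret 'side', keeping all other fields and tolerating unknown codes."""
    return {**message, "side_name": table.get(message["side"], "unknown")}


compatible = is_growth(SIDE_V1, SIDE_V2)
incompatible = is_growth(SIDE_V1, {0: "sell", 1: "buy"})

# Backward compatibility: a new reader still understands an old message.
old_message_new_reader = decode({"side": 1, "qty": 5}, SIDE_V2)

# Forward compatibility: an old reader does not break on a new value.
new_message_old_reader = decode({"side": 2, "qty": 5}, SIDE_V1)
```

`compatible` is `True` and `incompatible` is `False`, since swapping the meaning of existing codes silently breaks every older peer even though the types are unchanged.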

Serializing data for transport and for storage is much about breaking program defined types into language standard basic types, and keeping a strict discipline of interpretation to ensure compatibility. Therefore, to ensure both backward and forward compatibility of an API, maintain a strict growth of value interpretation and use serialization packages like Protocol Buffers or Thrift. The question is then: how do we ensure the future growth of a complete software design, not just a single API, but a family of APIs? Just like with the single API, the answer also lies in the notion of serialization. The goal is to stitch together the typed lower level design with a higher level untyped design that can change over time.

Data is serialized by walking through it in a certain order and breaking it down into its primitive types. Programs can be broken down the same way. Yet to do that, you first need to adopt a functional programming style, because it is hard to serialize procedural constructions. In a functional style, only functions need to be "broken down".

In the good old days, scalable software design was about using constructions such as data schemas, object patterns, futures, continuation style, etc. Data schemas are still good, but all these other program construction elements must be revisited in light of the fact that they can all be implemented with monads and comonads. More importantly, they must be revisited because the bind and cobind operators (and other monadic and comonadic operators) are what serialize functions! Therefore, just like you must serialize your data schema to ensure future data compatibility, you must "serialize" your functions with monads and comonads to ensure future design compatibility.
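For readers who have not met these operators, here is a minimal sketch of one bind and one cobind in Python. The choice of a trace-accumulating monad and an environment-carrying comonad is an assumption made for illustration, and `Traced`, `InEnv`, `halve`, `offset` and `scaled` are hypothetical names. The point is that bind and cobind fix the order in which functions are chained, much as a serializer fixes the order in which data is walked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Traced(Generic[A]):
    """Monad-like value: a result plus the trace of steps that produced it."""
    value: A
    trace: Tuple[str, ...] = ()


def unit(value: A) -> Traced[A]:
    return Traced(value)


def bind(m: Traced[A], f: Callable[[A], Traced[B]]) -> Traced[B]:
    """Run f on the value and append its trace after the existing trace."""
    out = f(m.value)
    return Traced(out.value, m.trace + out.trace)


@dataclass(frozen=True)
class InEnv(Generic[A]):
    """Comonad-like value: a result together with the environment it lives in."""
    env: Dict[str, float]
    value: A


def extract(w: InEnv[A]) -> A:
    return w.value


def cobind(w: InEnv[A], f: Callable[[InEnv[A]], B]) -> InEnv[B]:
    """Run f on the whole context and keep the same environment."""
    return InEnv(w.env, f(w))


def halve(x: float) -> Traced[float]:
    return Traced(x / 2, (f"halve {x}",))


def offset(x: float) -> Traced[float]:
    return Traced(x + 1, (f"offset {x}",))


def scaled(w: InEnv[float]) -> float:
    return w.value * w.env["scale"]


traced = bind(bind(unit(8.0), halve), offset)
in_env = cobind(InEnv({"scale": 10.0}, traced.value), scaled)
```

Here `traced` holds the value `5.0` with the trace `("halve 8.0", "offset 4.0")`, and `extract(in_env)` is `50.0`. The functions `halve`, `offset` and `scaled` know nothing about how trace or environment are threaded; that knowledge lives only in bind and cobind.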

Here are a few examples of existing designs that do this:

Injection frameworks are comonadic constructions.
Transactional frameworks are monadic constructions.
Parallel tasks are monadic constructions.
Embedded queries are monadic constructions.
Reactive streams are monadic constructions.
Lenses are comonadic constructions.
Automatic differentiation (computing analytical derivatives of numerical code) is both monadic (forward differentiation) and comonadic (backward differentiation).

Just like data compatibility is tied to the order in which data is traversed, future design compatibility is tied to the "order" of function serialization. That is to say that each software design is strongly defined by the order in which functions are broken into monadic and comonadic constructions. While monads and comonads have a duality relationship, they fundamentally differ in "character": monads are "trace" centric, while comonads are environment centric. Said differently, conditional expressions can be monadic, while comonadic expressions can abstract away their environment. A design is then a chosen hierarchy of monads and comonads with a chosen set of API extension points.

Just like with data compatibility, future design compatibility is tied to the amount by which types can be changed and remain compatible. And again, one must differentiate between the need for source compatibility (e.g. for component designs) and for binary compatibility (e.g. to design distributed systems). Use strong types to build your design when your programming language offers a type system that ensures that forward design compatibility is supported by forward type compatibility. Limit your reliance on types when these do not provide this forward compatibility. If this limited use of types, or the limits of the type system, do not allow monadic and comonadic constructions to be expressed with types, then use typeless/generic bind, cobind, and other monadic and comonadic like operators (e.g. duck typing on key-value pairs).

Finally, use only the language features that allow you to break down your functions into monadic and comonadic constructions. For example, only use destructive assignment if you can model it within your monadic and comonadic design. Also, do not forget you need also to enforce a "growth" only rule for your interpretation of values.

Now having said all this, you cannot build a design if it costs too much or if you cannot explain it to others. For example, I once wanted to upgrade an algorithmic trading API around a monadic construction (within a C++ framework). I communicated around prototypes and presentations (I was CTO), yet failed to get across to my key team members (who were top-notch developers), and ended up canning the idea. And this brings me back to a few important issues:

Monadic and comonadic constructions are higher order. Even if you do not implement them with higher order types, you still need to think of them as higher order invariants hidden behind your design. This is how a mathematician would "see them", and it is not obvious from a classical procedural or OO background.
The cost to implement typeless/generic monadic and comonadic constructions within your language may simply be too high.

For example, concerning this last point, this week I wanted to implement a duck typing pattern in Java on top of Protocol Buffers objects (for all the reasons mentioned above), but Protocol Buffers is only efficient if you access fields in a typed manner. So even while I could see how to build a design with a robust algorithmic binary compatibility model, the cost on performance would simply be too high to do it on top of Protocol Buffers. (Luckily using Protocol Buffers is not a requirement for this project so I can move to another data serialization protocol that supports fast generic field access).

As an end note: I understand that I am not telling you in this post "how" to write monadic and comonadic style constructions, especially without a functional language. What I am telling you is that you want to know how to do this if you want your designs to grow well.

All original content copyright James Litsios, 2015.

Tuesday, March 13, 2012

What future for futures?

I recently read Twitter's Scala style recommendations and could not help being somewhat unhappy about their recommendation to use futures. They basically say: "Use Futures to manage concurrency."

Fifteen plus years ago I wrote a futures library which I used in a derivative trading system for a long, long time. All the basic functionalities were there (trigger on future, future timeouts, merge futures) and some more advanced ones, like boolean trigger conditions (e.g. trigger if futureA or futureB), as well as futures across processes and networks. It was a nice library!

Yet ten years later, we removed the use of futures!
Here is the reasoning...

When a future is used, its state can be seen as part of a higher order concurrent logic. The comfort of using futures is that we do not need to model or design this higher logic: it is implicitly defined and managed for us by the future library. There are situations where this lack of a "bigger" picture is a good thing; one of these is to use futures at the periphery of your system's design. This makes sense because your design stops at the border of your system, so it makes less economic sense to invest in building a model of how the rest of the world will interact with you. Yet as much as it makes sense to use futures in boundary interfaces, the deeper you go into your system the less it makes sense to use futures. What happens is that the implicit higher order model created by the futures has no relation to your system's higher order design. And this leads to bad growth.

Developers typically will start to notice this higher order mismatch in two ways: the first is shared resource management, the second is when using futures in streams.

When a future value is set, the associated triggers may be executed synchronously, by having the "setter" directly call the trigger, or asynchronously, at a later time or within another thread. If you want the triggers to have a "DB"-like transactional property, then you want to stay within the synchronous trigger model. The tricky part with the synchronous trigger model is that it interferes with your shared resource model: if you use locks, you can easily have unexpected deadlocks; if you use a transactional memory model, your transactions are of unknown size because you do not always know what has been set as triggers, causing large transactions to retry at a performance cost. Granted, enough detective work can work around these issues, but these types of problems can happen at the worst of times, such as in the error handling of the future, possibly in a branch of code which is rarely executed, and often in difficult-to-reproduce scenarios. Going "asynchronous" is often not a solution because the asynchronously triggered code is severely handicapped, as it happens out of context later.
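The lock interference can be reproduced with a toy future. This is a deliberately minimal sketch, not the library described above: the `Future` class, `book_lock` and `update_book` are hypothetical, and a lock timeout is used so that the example reports the problem instead of actually hanging.

```python
import threading
from typing import Any, Callable, List


class Future:
    """Toy future whose triggers run synchronously inside set()."""

    def __init__(self) -> None:
        self._triggers: List[Callable[[Any], None]] = []
        self.value: Any = None
        self.is_set = False

    def on_set(self, trigger: Callable[[Any], None]) -> None:
        self._triggers.append(trigger)

    def set(self, value: Any) -> None:
        self.value = value
        self.is_set = True
        for trigger in self._triggers:
            trigger(value)  # runs in the setter's thread, under the setter's locks


book_lock = threading.Lock()  # a non-reentrant lock guarding a shared resource
outcome: List[str] = []


def update_book(value: Any) -> None:
    """A trigger that needs the shared resource."""
    acquired = book_lock.acquire(timeout=0.1)  # without a timeout: a deadlock
    outcome.append("updated" if acquired else "would deadlock")
    if acquired:
        book_lock.release()


# The setter already holds the lock when it sets the future.
first = Future()
first.on_set(update_book)
with book_lock:
    first.set(42)

# The same trigger is harmless when the setter holds no lock.
second = Future()
second.on_set(update_book)
second.set(43)
```

After running this, `outcome` is `["would deadlock", "updated"]`. Nothing in `update_book` or in the setter looks wrong in isolation; the problem only exists in the combination, which the future library hides from the design.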

Another area where futures meet their limits is with streams. Imagine you use a future to hold the return of a computation; so you create the future, set up a trigger, launch the computation. Now if you need to do that again (create, set up, launch), and again, and you try to do that a few million times per second, you find that you meet a performance limit with futures. Futures do not scale well within real time streams. You could push the future concept to achieve high performance streaming, but this could go against creating a true design identity for the stream and would limit the growth of your design.

I am not sure that my examples here are convincing. The reality is that getting your higher order designs right is hard (and expensive in time). So even if futures meet their limits at some level of abstraction, they are definitely mighty comfortable to use before you meet those limits. So maybe my recommendation is "use futures" and then "take them out" when you understand how to do better!