
Sunday, February 07, 2021

Single class refactoring exercise (in Python)

This week I needed to extract a little piece of computational geometry embedded in a single Python class of application-specific logic. I therefore created a second class and migrated the computational geometry code over to it. One of the pleasures of Python is the ability to do this type of refactoring at low cost, pretty much following a recipe, which I include here (a sketch of the first steps follows the recipe):
  1. Start with class A
  2. Create new class B, derive A from B
  3. Move part of A.__init__ to B.__init__
  4. Move an initial subset of member functions from A to B, implicitly creating a first API between A and B (API A-B)
  5. Iterate and apply one or more of:
    1. Split / merge methods within API A-B
    2. For parts of A still accessed directly (through self) by B, extend API A-B to provide the additional A content.
    3. Find flow/bundle/fiber data of A and B, and refactor these to be within new classes C0, C1, .... Alternatively, refactor them to be within additional Numpy axes (e.g. for ML applications).
    4. Move selected Cs from A to B or from B to A. If appropriate, adapt the API.
    5. Move selected Cs into API A-B, or take them out of the API.
    6. Find variants within the Cs. Bring these under new protocol classes P0, P1, .... Adapt API A-B to depend on the Ps. Alternatively, find Numpy axes with a shared "basis", refactor towards abstract shape/axes properties, then adapt the API.
  6. Stop when all traces of A have been removed from B, and the API A-B is generic enough for the desired future usage of B.
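For concreteness, here is a minimal sketch of steps 1 through 4 (the class names A and B match the recipe; the geometry state and methods are hypothetical placeholders):

    class B:  # step 2: the new class that will own the computational geometry
        def __init__(self, points):
            # step 3: geometry-only state moved out of A.__init__
            self.points = points

        def bounding_box(self):
            # step 4: a geometry member function moved from A to B
            return self.points.min(axis=0), self.points.max(axis=0)

    class A(B):  # step 2: A now derives from B
        def __init__(self, points, config):
            super().__init__(points)
            # application-specific state stays in A
            self.config = config
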
Notes:
  • "Flow data" are typically found in arguments to functions.
  • Writing generic axes logic in Numpy is super hard, as Numpy gives little help for pure pointfree-style functional code. However, it can be done; you can, for example, write 1d, 2d, and 3d logic as just one common piece of Numpy code (see the sketch after these notes).
  • I was going to say: always retest as you refactor. Yet in the work above I did not, and regretted it when my first test failed after a few hours of work. Given that I had "changed much", I quickly reapplied my changes to the original code, this time running the tests after each change.
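
As an example of the dimension-generic Numpy style mentioned above, here is a minimal sketch (the function name and the pairwise-distance task are hypothetical, chosen only to show one piece of code covering 1d, 2d, and 3d):

    import numpy as np

    def pairwise_distances(points):
        # points has shape (n, d); the same code handles d = 1, 2, 3, ...
        diff = points[:, None, :] - points[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
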
All original content copyright James Litsios, 2021.

Sunday, April 12, 2020

Thinking versus brute force "doing" in software design in 2020

About thinking versus doing

(I put a subset of this on YouTube as: https://www.youtube.com/watch?v=_kXpyctbmrQ)

Is it ok to program by "brute force" and put aside your brain?

A few years ago, I wrote on the pros and cons of programming with explicit and implicit type declarations. In 2020, I feel that this question is almost irrelevant. Therefore I ask the question:
Should we think about software design when we program? 
There is a trend in software development which states that "developing software products is not about thinking but about doing". I know many venture people who believe very much in this statement. Yet the reality is that software is a process of thinking (in part at least). Still, from a pure seed-venture-money perspective, the rule of "it is not about thinking" is true! Because the number one rule is "be fast", and thinking too much will very much slow you down.

Is it then both true and untrue, that we should both use and not use our brains when we write software? Yes, and yes! And it is exactly this subtle challenge that makes software management in innovation a tricky game.

Thinking can be done in different ways; different approaches can be taken. Some are efficient, some are a waste of time. Some will help your software grow, others will block you. Some will help you meet your deadline, others will invariably lead to failure. The game is then to think only in the ways that help you!

Some of my colleagues have heard me condense these concepts into one statement:
It is about thinking backwards!
There are many ways to do this. I will cover just a few here; there are in fact many more.

Play to win above all else

In recent blitz chess games, FM Lefong Hua repeats that "it is not about calculating the endgame win" (because that is not a guaranteed win), it is about "flagging your opponent". By which he means: winning is not about thinking within the classical rules of chess, it is about meeting the "win conditions", which in blitz chess is much about not running out of time (being flagged).

Transposed into software development, that means it is not about meeting the classical rules of software design, it is about thinking through how to always meet your goals and deliver results. Or to be more specific, it is about thinking backwards: finding a path back from your "runnable software" goals to your current status, a path that guarantees your development success.

The heart of your software comes first

I have mentioned that "One of the little understood properties of programming is that at the lowest level things tend to be single dimensional". A good software design builds a "break" or "separation" in that unique dimension "just about where your main product abstraction is", to achieve a sandwich-like design where new product abstractions are added "by reassembly" of the two separated parts.

There are a few ways to do this; domain specific languages (DSLs) and functional programming (FP) are my favoured approaches. While all my major software developments had DSLs, it was only much later in my career that I understood that what was working for me was not the DSLs themselves but the effort to maintain separation between interpreter and interpreted abstractions. This separation is naturally achieved with a DSL, yet can also be achieved with adjunctive higher order FP. (The "ultimate tool" is then to combine the two concepts as adjunctive-FP-based DSLs, which was the base of Elevence's contract language.)
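
To make the interpreter/interpreted separation concrete, here is a minimal sketch (the tiny expression language and function names are hypothetical illustrations, not the Elevence language):

    # Interpreted abstraction: programs are plain data (a tiny DSL)
    expr = ("add", ("lit", 1), ("mul", ("lit", 2), ("lit", 3)))

    # Interpreter: the only code that knows how to run such programs
    def interpret(e):
        tag, *args = e
        if tag == "lit":
            return args[0]
        if tag == "add":
            return interpret(args[0]) + interpret(args[1])
        if tag == "mul":
            return interpret(args[0]) * interpret(args[1])
        raise ValueError(tag)

    print(interpret(expr))  # 7

New product abstractions are then added on the interpreted side (new expression tags) or on the interpreter side (new evaluation strategies), without entangling the two.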

Be functional and agile

Agile development is a lot about prioritizing your requirements and implementing them in work iterations. That means that the first level of your design, what might be seen as the first draft of the primary keys of your database, is directly tied to the most important business concepts (taken from OO vs FP vs agile). The trick is then not to see this as a database of data, where keys are data, but as a database of code, where keys are functional properties such as APIs with specific invariants (e.g. somewhat like type classes in Haskell). The extended trick is then also to make this a normalized design (just like a normalized classical DB schema, but in the "functional space"), for example by using linear types.
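
A minimal sketch of a "key as a functional property" in Python (the PriceSource name and its invariant are hypothetical):

    from typing import Protocol

    class PriceSource(Protocol):
        # A design "key" expressed as an API with an invariant
        # (returned prices are non-negative), rather than as data.
        def price(self, symbol: str) -> float: ...

Each iteration can then add implementations under such keys without reshaping the design, much as rows are added under a fixed DB schema.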


Brute force yes, but with the right pieces

Let us put this all together. Our first conclusion, when building new software, is:
Use brute force and minimal brains to assemble data and call external APIs 
Importantly, keep your program structure as flat as possible when you do this, which pretty much means shunning OO patterns and using as many array- and container-like constructions as you want.

Then, and this is the tricky part:
Incrementally invest effort to separate interpreter and interpreted abstractions in your code.
Always make sure that these abstractions fit within the plan for your deliverables. That means you are able to think backwards from your deliverable goal and work out the path to your current development state, the path which you will then follow to build your code. An agile process and a good team will make this an easier job. Also, all those arrays and containers will need to fit into a single unified concept (e.g. indexed tensors).

It's complicated

Sometimes things are just too complicated. Many developers do not know how to "separate interpreter and interpreted abstractions". Here I will be honest: it is not easy. And just to make this clear, the last DSL I wrote took me multiple tries until "I got it right". Also, embedding product abstractions in higher order FP is even harder than writing DSLs.

Worse, the process is to "hack brute force" to assemble data and use external APIs, and only then to think about how you can slice your interpretive abstractions. This means that initially, software development is about doing, not thinking. Tricky...

Welcome to 2020

I have been writing this blog for almost 15 years. Interestingly, a major change over these years is that there is so much well-integrated open source software (OSS) out there that it is cheaper to "just try" than to "think" (when using OSS and third party APIs). And in part it was this reality that led me to write this post today. Enjoy, and stay safe!

All original content copyright James Litsios, 2020.

Sunday, January 11, 2015

Software designs that grow with monads, comonads, and type compatibility

This post is about designing software with internal and external APIs that are robust to future changes. It is therefore about API compatibility, but more importantly it is about the compatibility of a full software design to changes. Not surprisingly, monads and comonads are part of the presented solution, as is a careful approach to the use of types.

I had a "aha" moment last year when I watched a video (by Timothy Baldridge)
that showed how an AST for a Clojure compiler was fully based on key-value pairs (nested hash maps), therefore without typed structures nor classes, and was doing very well. The thing is, I have suffered enough times to get the types of the different phases of a compiler to fit together. Therefor the idea of giving up on types and just using key-value pairs, that can easily be added for each compiler phase, seemed really to be an attractive way to write a compiler.

Key-value pairs, duck typing, and many "traditional" functional patterns (think Lisp, not Haskell) all have in common their reliance on generic, almost typeless, structures. So while each "atomic element" of these patterns has a type (e.g. int, string, ...), the structures (e.g. struct, list, array, ...) are all generically typed.

Types are what capture the invariants of a design. Giving up on types is in effect giving up on capturing those invariants in a robust manner. Using types normally leads to higher quality software, yet with enough complexity, types no longer capture all the design invariants of a program. Worse, sometimes types actually hinder the design by holding it back, because they are not powerful enough to capture the desired invariant. This is the case with the difficulty of designing a typed AST that "fits" all phases of a compiler. This rigid nature of types is also the hidden problem of API compatibility.

The invariants of APIs are captured by the types of their signatures. When new features or corrections are added to an API, these invariants and associated types evolve. When APIs link different software projects, changing API types is where API compatibility becomes an issue: APIs remain compatible when their types change yet stay compatible with the older types; APIs become incompatible when the types are no longer compatible. An example of type compatibility in OO is to derive a new class from an existing class, and to add to this new class a constructor from the old class. Unfortunately, that last example works at the source level; at the binary level, compatibility under type change is usually nonexistent, especially when focusing on forward compatibility. Note that API compatibility is not only about types: an API will also become incompatible when the interpretation given to values transferred over the API changes. Therefore, to remain compatible, an API can add new value interpretations but must ensure that past value interpretations never change.
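
As a minimal sketch of the OO example above (the Order classes and their fields are hypothetical):

    class OrderV1:
        def __init__(self, qty, price):
            self.qty, self.price = qty, price

    class OrderV2(OrderV1):
        # The new class derives from the old one and adds a
        # constructor-from-old-value, keeping source-level compatibility.
        def __init__(self, qty, price, venue="unknown"):
            super().__init__(qty, price)
            self.venue = venue

        @classmethod
        def from_v1(cls, old):
            return cls(old.qty, old.price)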

Serializing data for transport and for storage is much about breaking program-defined types into language standard basic types, and keeping a strict discipline of interpretation to ensure compatibility. Therefore, ensuring both backward and forward compatibility of an API means maintaining a strict growth of value interpretation and using serializing packages like Protocol Buffers or Thrift. The question is then: how do we ensure the future growth of a complete software design, not just a single API but a family of APIs? Just like with the single API, the answer lies in the notion of serialization. The goal is to stitch together the typed lower level design with a higher level untyped design that can change over time.

Data is serialized by walking through it in a certain order and breaking it down into its primitive types. Programs can be broken down the same way. Yet to do that, you first need to adopt a functional programming style, because it is hard to serialize procedural constructions. In a functional style, only functions need to be "broken down".

In the good old days, scalable software design was about using constructions such as data schemas, object patterns, futures, continuation style, etc. Data schemas are still good, but all these other program construction elements must be revisited given that they can all be implemented with monads and comonads. More importantly, they must be revisited because the bind and cobind operators (and other monadic and comonadic operators) are what serialize functions! Therefore, just like you must serialize your data schema to ensure future data compatibility, you must "serialize" your functions with monads and comonads to ensure future design compatibility.
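
As a minimal illustration of bind as "function serialization" (a hypothetical Maybe-style bind in Python, not a full monad library):

    def bind(value, fn):
        # Each bind is one explicit link in a chain of functions,
        # much as serialization writes one field at a time.
        if value is None:  # short-circuit on failure
            return None
        return fn(value)

    result = bind(bind(10, lambda x: x + 1), lambda x: x * 2)  # 22

The chain makes the order of the computation explicit and first-class, which is exactly what future design changes can hook into.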

Here are a few examples of existing designs that do this:
  • Injection frameworks are comonadic constructions. 
  • Transactional frameworks are monadic constructions.
  • Parallel tasks are monadic constructions.
  • Embedded queries are monadic constructions.
  • Reactive streams are monadic constructions.
  • Lenses are comonadic constructions.
  • Automatic differentiation (computing analytical derivatives of numerical code) is both monadic (forward differentiation) and comonadic (backward differentiation).
Just like data compatibility is tied to the order in which data is traversed, future design compatibility is tied to the "order" of function serialization. That is to say, each software design is strongly defined by the order in which functions are broken into monadic and comonadic constructions. While monads and comonads have a duality relationship, they fundamentally differ in "character": monads are "trace" centric, while comonads are environment centric. Said differently, conditional expressions can be monadic, while comonadic expressions can abstract away their environment. A design is then a chosen hierarchy of monads and comonads with a chosen set of API extension points.

Just like with data compatibility, future design compatibility is tied to the degree to which types can be changed and remain compatible. And again, one must differentiate between the need for source compatibility (e.g. for component designs) and binary compatibility (e.g. to design distributed systems). Use strong types to build your design when your programming language offers a type system that ensures that forward design compatibility is supported by forward type compatibility. Limit your reliance on types when they do not provide this forward compatibility. If this limited use of types, or the limits of the type system, does not allow monadic and comonadic constructions to be expressed with types, then use typeless/generic bind, cobind, and other monadic and comonadic like operators (e.g. duck typing on key-value pairs).
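
A minimal sketch of such a typeless/generic bind over key-value pairs (the field names are hypothetical; compare the Clojure compiler AST mentioned above):

    def kv_bind(record, fn):
        # Generic bind over an untyped key-value record: fn computes
        # new fields, which are merged in without a typed schema.
        return {**record, **fn(record)}

    node = {"op": "add", "lhs": 1, "rhs": 2}
    typed_node = kv_bind(node, lambda n: {"type": "int"})  # a phase adds a field

Each compiler phase (or design extension) can add fields without breaking what earlier phases see, which is the forward compatibility being sought.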

Finally, use only the language features that allow you to break down your functions into monadic and comonadic constructions. For example, only use destructive assignment if you can model it within your monadic and comonadic design. Also, do not forget that you also need to enforce a "growth only" rule for your interpretation of values.

Now, having said all this, you cannot build a design if it costs too much or if you cannot explain it to others. For example, I once wanted to upgrade an algorithmic trading API around a monadic construction (within a C++ framework). I communicated around prototypes and presentations (I was CTO), yet failed to get the idea across to my key team members (who were top notch developers), and ended up canning it. And this brings me back to a few important issues:
  • Monadic and comonadic constructions are higher order; even if you do not implement them with higher order types, you still need to think of them as higher order invariants hidden in the back of your design. This is how a mathematician would "see" them, and it is not obvious from a classical procedural or OO background.
  • The cost to implement typeless/generic monadic and comonadic constructions within your language may simply be too high.
For example, concerning this last point, this week I wanted to implement a duck typing pattern in Java on top of Protocol Buffers objects (for all the reasons mentioned above), but Protocol Buffers is only efficient if you access fields in a typed manner. So even though I could see how to build a design with a robust algorithmic binary compatibility model, the cost in performance would simply be too high on top of Protocol Buffers. (Luckily, using Protocol Buffers is not a requirement for this project, so I can move to another data serialization protocol that supports fast generic field access.)

As an end note: I understand that I am not telling you in this post "how" to write monadic and comonadic style constructions, especially without a functional language. What I am telling you is that you want to know how to do this if you want your designs to grow well.

All original content copyright James Litsios, 2015.