Saturday, March 12, 2022

Thinking about parallel and distributed computation

A classical way to look at parallel / distributed systems is in terms of:

  1. Work-Efficiency
  2. Parallelism
  3. Locality 
This is one of those inconsistent triads (or trilemma), where only two among three properties can be chosen. For example, a system can have both work-efficiency and parallelism, or parallelism and locality but cannot be fully work-efficient, parallel and local.

This is not the only inconsistent triads to think about when designing scalable systems.
Here are more:
  1. Velocity
  2. Resources
  3. Work
  1. Momentum
  2. Quality
  3. Traffic
And of course the CAP theorem's:
  1. Consistency
  2. Availability
  3. Partition tolerance
And the important:
  1. Fast
  2. Good
  3. Cheap
All of these are ultimately important, yet some of these are more practically useful. Especially when you realise that you can weaken the inconsistent of these triads by introducing additional complementary properties. 

To illustrate this with examples:
  • Two people must work together vs an engineer and a sales person must work together
  • Two time constraints must be considered vs a measurement must last one microsecond and be done in the next two weeks.
The idea here is that by clarifying dependencies and scales we are 'loosening' our constraints. The second example above is about distinguishing what is short versus long (in duration), and also that a measurement is typically done before things are said to done. Therefore if we partition what is of 'short duration' vs what is of 'long duration', or what is 'done before' and what is 'done after', we soften the inconsistent of the triads above. For example, we could have:
  • Work-Efficiency of all durations
  • Parallelism of long durations
  • Locality of short durations
As a result, we 'never have all three' for short durations, or for long durations.

In their most primitive form, these are binary properties. For example, things can be:
  1. Small
  2. Large
  1. Slow
  2. Fast
  1. Short
  2. Long
  1. Light
  2. Heavy
  1. Precise
  2. Approximate
  1. Predictable
  2. Unpredictable
  1. Efficient
  2. Inefficient
  1. Up-stream
  2. Down-stream
  1. Present
  2. Future
And so on...

These properties make sense when they are properties that build the inconsistent triad. Using arbitrary properties would not work. For example, tall vs short does not help us. And to note the triads above are different. For example, efficiency is not part of the CAP theorem, but does effect the other triads (and this depends also on how efficiency is measured).

Finally, all of these inconsistent triads are interdependent. You want to understand how. 
For example, my previous post was in fact based on two tightly depend triads, which I illustrate as follows with their 'limits' within a distributed system view:


The cost of a cloud architecture is strongly tied how these complementary inconsistent triads are approached.

All original content copyright James Litsios, 2022.

Monday, March 07, 2022

Six ways to improve your organisational agility



Failing smart 

 My agile mantra is much:
"Failures must bring you closer to your vision"

It is a mantra I have often used. And very much how I approach agile innovation (e.g. see Search and Vision for Systematic Innovation)

However...

It is only partially true, as failure may happen simply because you are disorganised, with no relations to your vision. Fixing organisation issues helps avoid drifting further away from your vision, it does not bring you closer to your vision! Still, unmanaged organisational issues will eventually consume you 'from within', therefore they too must be addressed.

Can we focus on fixing organisation issues? Can we identify a subset of failures as 'organisational failures', and others as 'non-organisational failures on the way to our vision'? The simple answer is yes! Yet we most know what we are looking for.

Six types of failures

When I act as agile manager I try to distinguish between six types of failures. These are:

  1. Fail Goal: Fail to achieve previously promised goals
  2. Waiting: Fail deadline because of waiting on others
  3. Underutilisation: Fail to use all your team members
  4. Exhaustion: Fail to stay productive because of exhaustion
  5. Inconsistency: Fail because of misalignments
  6. Queuing: Fail because of past work no longer relevant, or past work never finished.
Failure 2 to 6 are all organisation failures!
Failure 1, not achieving goals, might be an organisational issue, yet may also be tied to the difficulty of the the task at hand.

Six ways to improve your organisation

We can express specific actions for each type of failure observed: 

# Failure type Observation Corrective action
1 Fail Goal Fail to achieve previously promised goals Review / pivot how resources approach goals
2 Waiting Fail deadline because of waiting on others Review how resources approach work
3 Under-utilisation Fail to use all your team members Review how goals are broken down into work
4 Exhaustion Fail to stay productive because of exhaustion Review how goals are picked up by resources
5 Inconsistency Fail because of misalignments Review how work is shared across resources
6 Queuing Fail because of past work no longer relevant, or past work never finished. Review how common work impacts different goals

You may note the common patterns both in the observations and in the corrective measures above. This is because they all refer to the same system! This is important, and maybe the most important learning from this post. When managing fast greenfield innovation projects:
Comparing failure types is as important as to address specific failures!
This is because when a failure type happen more often than others, we can take organisational actions even before we understand the specific details of each failure! 

A final note: the approach is scalable.

All original content copyright James Litsios, 2022.