Saturday, November 14, 2020

Rare intersecting with rare is pretty much empty

Having good intuition about rare events is actually pretty difficult.

Thinking in terms of the plots below helps me with the subject.

This is all in this Jupyter notebook.

(Note: formatting below is iffy as original is in Jupyter notebook)


#!/usr/bin/env python3
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt


An illustration of how some things are rare, and others common

Our "test" population is a thousand


n=1000

Rare numbers are numbers that are special one way or another. Here, rarity means a uniformly randomly generated number between 0 and 1 that is close to one. We can find them by raising the numbers to a high power (e.g. 100): numbers that are not close enough to one get "crushed" towards 0.


def mk_rare():
    # raise uniform draws to a high power: only values very close to 1 survive
    return np.power(np.random.uniform(0.0, 1.0, n), 100)

rare = mk_rare()

 

The plot of the rare numbers:


plt.plot(rare)

plt.show()

The common numbers are all random numbers that are not close to zero. To complement how we found the rare numbers, common numbers can be found by taking high-order "roots": the closer the result is to one, the more common the number.

def mk_common():
    # take a high-order root: almost all values end up near 1
    return np.power(np.random.uniform(0.0, 1.0, n), 1.0 / 100.0)

common = mk_common()

 

Plotting the common numbers


plt.plot(common)

plt.show()

 



Let's find out how many rare numbers we have above a given threshold (between 0 and 1).


def count_above(s, skew_selection_for_better_plot):
    # thresholds between 0 and 1, skewed so the interesting region is sampled more densely
    x = np.power(np.arange(0.0, 1.0, 0.01), skew_selection_for_better_plot)
    # for each threshold, count how many values of s lie above it
    y = (s > x[:, np.newaxis]).sum(axis=1)
    plt.plot(x, y)
    plt.show()

count_above(rare, 10)

 


We can do the same for our common numbers:


count_above(common, 1/10)


 

The plots above are what you would expect: there are very few rare numbers above most thresholds, and most common numbers are above most thresholds (between 0 and 1).

Let's do one more thing: multiply the rare and common sets of numbers. First, let's generate a few rare and common sets.


a_rare = mk_rare()

b_rare = mk_rare()

c_common = mk_common()

d_common = mk_common()

 

We multiply a few


ab = a_rare*b_rare

ac = a_rare*c_common

cd = c_common*d_common

 

and plot the results


count_above(ab, 10)




count_above(ac, 10)

 



count_above(cd, 1/10)

 


The plots above confirm the following:

  • rare and rare is pretty much empty (when uncorrelated!)

  • rare and common is still rare

  • common and common is common
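The three bullets above can also be checked numerically rather than visually. Here is a minimal sketch of mine (not from the original notebook), assuming the same n = 1000 and the same exponents as above, counting how many products clear a threshold of 0.5; the seed is only there to make the sketch repeatable:

```python
import numpy as np

n = 1000
rng = np.random.default_rng(0)  # seeded only for repeatability

def mk_rare():
    return rng.uniform(0.0, 1.0, n) ** 100

def mk_common():
    return rng.uniform(0.0, 1.0, n) ** (1.0 / 100.0)

a_rare, b_rare = mk_rare(), mk_rare()
c_common, d_common = mk_common(), mk_common()

def count_above_half(s):
    # number of values in s that exceed 0.5
    return int((s > 0.5).sum())

rr = count_above_half(a_rare * b_rare)      # rare * rare
rc = count_above_half(a_rare * c_common)    # rare * common
cc = count_above_half(c_common * d_common)  # common * common
print(rr, rc, cc)
```

With uncorrelated draws, rr comes out essentially zero, rc stays small, and cc is close to n.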


All original content copyright James Litsios, 2020.

Sunday, October 18, 2020

Healthy, noisy, productive Python

Asked about my feelings about Python, “speckle noise” came to mind. The bigger subject here is Python's limited types and their inherent noisiness in capturing deeper invariant properties (hence my speckle remark).


Speckle noise is the noise that arises from the effect of environmental conditions on the imaging sensor during image acquisition. The analogy is that there is inherent noise in how one captures ideas in Python, and that this noise has "good Pythonic character". By noise, I mean something that cannot be precisely mastered. Note that "real" speckle noise is not "good" or "bad"; it just is.


All programming languages are "noisy". Yet to the developer, the way that noise affects you varies greatly. The "messiness" of computer languages may hurt you, but it may also help you (e.g. by improving your team's productivity). Said differently, sometimes "cleanliness" is unproductive. The main idea is the following:


People naturally care about uncertainty. Therefore, sometimes, we naturally focus on things that are not certain. As a bonus, we are naturally social around uncertain topics (think of the weather!), in part because we are happy to share when no one has an absolute truth, but also because sharing helps us deal with these uncertainties. Finally, there are many situations where an "external" nudge is needed to move out of a local minimum. (I mention here the suggestion that financial markets need a bit of uncertainty to be healthy, and here how I was once stuck in a bad local C++ design.)


People naturally build on certainties. And when we do so, we in part lock ourselves in, because it would cost us to change what was certain and rebuild things.


This game of "certainty", "uncertainty", "building and locking ourselves in", and "not finding a solid base to build on" is what happens when we program. Our choice of software language strongly affects how this happens. I have programmed and managed teams using Python, Haskell, F#, C++, Pascal, C, and Fortran (plus exotic languages like OPS5). Each of these languages is "robust" at a different level, some with more impact than others.


Python, for example, is a language where expressions and functions come first, and types (e.g. lists, objects, classes, ...) are mostly a thin way to group functions and data together. Contrast this with Haskell, where types are more important than expressions. The result is that new concepts are quickly captured in Python, and are considerably harder to capture in Haskell. However, it is quite difficult to capture deeper invariant properties of new concepts in Python, something that is easy to do in Haskell, with its strong types.
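To make the contrast concrete, here is a hypothetical sketch of mine (not the author's code): in Python a new concept is often just a function over loosely structured data, quick to write, but with its invariants living only in our heads.

```python
# A "concept" captured as a plain function over a dict. Nothing enforces
# that the keys exist, that the values are numbers, or that bid <= ask --
# those invariants are not expressed in any type.
def mid_price(quote):
    return (quote["bid"] + quote["ask"]) / 2.0

print(mid_price({"bid": 99.0, "ask": 101.0}))  # 100.0
```

In a strongly typed language, the shape of a quote and some of its invariants could be pushed into the type itself; here they remain conventions.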


We might summarize by stating that Python has noisy types. At least that is often the way I feel when "dragging" concepts from one expression to another using "glue" lists, dictionaries, objects, or tuple structures, just to make it work. There is also Python's limited dispatch logic, which forces yet more ad hoc constructions into your expressions. Yet the magic of the real world is that such noise-creating properties are not necessarily bad!
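As a small illustration of the dispatch point (my own sketch, not from the original post): the standard library offers only single dispatch on the type of the first argument, via functools.singledispatch; anything richer tends to end up as ad hoc isinstance checks inside the function body.

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # fallback for any type we have not registered
    return "something"

@describe.register
def _(x: list):
    return "a list"

@describe.register
def _(x: dict):
    return "a dict"

print(describe([1, 2, 3]))  # a list
print(describe({"a": 1}))   # a dict
print(describe(42))         # something
```

Dispatching on two argument types at once, or on the structure of the data, has no such built-in support, which is one source of the "glue" constructions mentioned above.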


A few years ago, I hired Haskell developers to build a system with "crystalline design properties". This had been one of my goals since being responsible for a "pretty messy design" in the late 90's. Therefore I co-founded a company with, in part, the goal of building a "single coherent distributed system". It is not easy to create a system where every complementary concern fits precisely together, and everything exists within coherent contexts. In fact, it only makes sense if you need it, for example to ensure trust and security. Now imagine the developers in the team working on such a "single coherent design". In such a development, no engineer can make independent decisions. In such a design, no code can be written that does not fit exactly with the rest of the code. How then to create common goals that map into personal team member tasks so as to avoid a design deadlock? The simple answer might be to make sure you have failed before in many ways, so as to avoid repeating those failures. Yet still, that does not avoid design deadlock. The hint of the approach is this: for every dimension of freedom that you need to remove to guarantee the strength of your design, make sure to add an additional free, non-critical dimension to help individuals retain a form of personal independence. In addition, I will add that it took me a lot of micro-management of vision and team dynamics to make that development a great success.


With “speckle noise”, especially at the type level, no such problems! There is no single crystalline unified software design. There is no coherency that is assured across your system. Python naturally accumulates imperfections which are just too expensive to keep precisely tamed with each addition of new code. This means that developers can agree on similar and yet different Python designs. In some sense, one agrees to compare naturally fuzzy design views. And by doing so, one naturally protects one’s ego, as there is always a bit of room to express individual choices.


This may sound like Python bashing. It is not. This expensive "to design right" property is common to most programming languages. This post is in fact a “praise Python” post. If Python only had “a certain fuzziness”, it would be not much better than Visual Basic (to be slightly nasty). Python is not “just another language”; it is a language where design logic is cheap to change because its messy types are in fact naturally "spread apart". That is, the "noisy" Python type property results in "not too dense" type relations, allowing changes to be made in one (implied) type without affecting the core of the other (implied) types.


ps: I mentioned Fortran above really only because I like numpy's stride_tricks.as_strided, which reminds me of the large equivalence array structures I used in Fortran when I was a teenager.
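For the curious, a small sketch of mine showing what as_strided does: it reinterprets an array's memory with a new shape and new strides, here producing overlapping windows without copying, much like Fortran's EQUIVALENCE lets several names share the same storage.

```python
import numpy as np

x = np.arange(6)
step = x.strides[0]  # bytes between consecutive elements of x

# four overlapping windows of length 3 over the same underlying buffer
windows = np.lib.stride_tricks.as_strided(x, shape=(4, 3), strides=(step, step))
print(windows)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]]
```

Note that as_strided does no bounds checking; for sliding windows specifically, newer NumPy versions provide the safer sliding_window_view.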


All original content copyright James Litsios, 2020.





Sunday, August 16, 2020

25'000 views; Some topics I am passionate about

This week I took a day off to walk in the Alps with my son, and while doing so I noticed that my blog view count was 24999. So I quickly asked my son to be the next view!

My blog has been around for more than ten years, and a tenth of a second is all a site like Google would need to achieve the same hit count. Still, I write this infrequent blog because it helps me find what is relevant to me and to others, and helps me communicate better. It also encourages me to do my best, because I do care about my audience.

I write infrequently mostly because I do not have the time. Also, my topics tend to be complex, and sometimes even sensitive, and therefore take time. Still, I often jot down a title, sometimes with a page or two of content. These then become public if I happen to have a free evening or weekend. For example, these are my unpublished blog titles going back to early 2019:

  • Balancing productivity, tempo, and success
  • Program = complementary
  • Some retrospectives in business of innovation
  • Type centric vs ...
  • The instantaneous view of the real world has no math
  • Control and flow in innovation
  • Lessons learned: writing "coherent software"
  • Careful Software Architecture (in Python)
  • Absent minded at work
  • Choose your inner truth: run your organization like a hedge fund
  • Dealing with time
  • Computer languages as data for machine learning
  • My reality? Or your reality?
  • The Peter principle in software development
  • Innovate, but don't waste your time
  • ...
Out of these, you might notice topics that I am passionate about:
  • Innovation
  • Productivity
  • Formal software and architecture
  • Teamwork and organization processes
  • Modelling the real (and unreal!) world
The thing is: You cannot innovate if you are not productive. You cannot be productive if your teams and organizations do not work well together. You cannot work well together if you do not understand how to express the real world in your work, and to be precise and correct when needed. These are topics that I care about, and what this blog has mostly been about.

Cheers to all, I am looking forward to sharing some more writing!