Saturday, November 14, 2020

Rare intersecting with rare is pretty much empty

Having good intuition about rare events is actually pretty difficult.

Thinking in terms of the plots below helps me with the subject.

This is all in this Jupyter notebook.

(Note: formatting below is iffy as original is in Jupyter notebook)


#!/usr/bin/env python3

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt


An illustration of how some things are rare, and others common

Our "test" population is a thousand numbers

In [2]:

n=1000

Rare numbers are numbers that are special one way or another. Here "rare" means a number drawn uniformly at random between 0 and 1 that lands close to one. We can find them by raising the numbers to a high power (e.g. 100): anything not close enough to one gets "crushed" to 0.

In [3]:

def mk_rare():

    return np.power(np.random.uniform(0.0, 1.0, n), 100)

rare = mk_rare()

 

The plot of the rare numbers:

In [4]:

plt.plot(rare)

plt.show()

The common numbers are all the random numbers that are not close to zero. To complement how we found rare numbers, common numbers can be found by taking high-order "roots" (e.g. the 100th root). The closer these are to one, the more common the number.

In [5]:

def mk_common():

    return np.power(np.random.uniform(0.0, 1.0, n), 1.0/100.0)


common = mk_common()

 

Plotting the common numbers

In [6]:

plt.plot(common)

plt.show()

 



Let's find out how many rare numbers we have that are above a given number (between 0 and 1).

In [7]:

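def count_above(s, skew_selection_for_better_plot):

    # Thresholds in [0, 1), skewed by the given power so the plot shows the interesting region.

    x = np.power(np.arange(0.0, 1.0, 0.01), skew_selection_for_better_plot)

    # Compare every threshold against every sample (broadcasting), then count per threshold.

    y = (s > x[:, np.newaxis]).sum(axis=1)

    plt.plot(x, y)

    plt.show()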

count_above(rare, 10)
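
The counting line in count_above relies on NumPy broadcasting, which is easier to follow on a tiny example. Here is a minimal sketch with made-up samples and thresholds (the names `samples`, `thresholds`, `above` and `counts` are only for illustration, they are not in the notebook):

```python
import numpy as np

# Hypothetical values, chosen only so the counts are easy to verify by hand.
samples = np.array([0.1, 0.5, 0.9])
thresholds = np.array([0.0, 0.4, 0.8])

# thresholds[:, np.newaxis] has shape (3, 1). Comparing it with samples (shape (3,))
# broadcasts to a (3, 3) boolean matrix: one row per threshold, one column per sample.
above = samples > thresholds[:, np.newaxis]

# Summing each row counts the samples strictly above that threshold.
counts = above.sum(axis=1)
print(counts)  # [3 2 1]
```

Each row of `above` answers "which samples beat this threshold?", so the row sums are exactly the y values that count_above plots.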

 


We can do the same for our common numbers:

In [8]:

count_above(common, 1/10)


 

The plots above are what you would expect: very few rare numbers are above most thresholds, and most common numbers are above most thresholds (between 0 and 1).
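
These counts can also be checked against a closed form. For U uniform on (0, 1) and p > 0, U^p > t holds exactly when U > t^(1/p), so the expected fraction above a threshold t is 1 - t^(1/p). The sketch below evaluates this for the two exponents used in this post (the function name `expected_fraction_above` and the threshold 0.5 are assumptions picked for illustration):

```python
def expected_fraction_above(t, p):
    """Probability that U**p > t for U uniform on (0, 1), with 0 <= t <= 1 and p > 0."""
    return 1.0 - t ** (1.0 / p)

# At threshold 0.5: rare numbers use p = 100, common numbers use p = 1/100.
rare_fraction = expected_fraction_above(0.5, 100)
common_fraction = expected_fraction_above(0.5, 1.0 / 100.0)

print(f"rare above 0.5:   {rare_fraction:.4f}")    # 0.0069
print(f"common above 0.5: {common_fraction:.4f}")  # 1.0000
```

With a population of 1000, that is roughly 7 rare numbers above 0.5, against essentially all of the common numbers.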

Let's do one more thing: multiply sets of rare and common numbers elementwise. First, let's generate a few rare and common sets.

In [9]:

a_rare = mk_rare()

b_rare = mk_rare()

c_common = mk_common()

d_common = mk_common()

 

We multiply a few

In [10]:

ab = a_rare*b_rare

ac = a_rare*c_common

cd = c_common*d_common

 

and plot the results

In [11]:

count_above(ab, 10)



In [12]:

count_above(ac, 10)

 


In [13]:

count_above(cd, 1/10)

 


The plots above confirm the following:

  • rare and rare is pretty much empty (when uncorrelated!)

  • rare and common is still rare

  • common and common is common
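
As a rough numerical check of these three statements, and of the "uncorrelated" caveat, the sketch below estimates the fraction of each product that exceeds 0.5. The threshold, the sample size, the seed and the helper `fraction_above` are assumptions chosen for illustration and are not part of the original notebook. mk_rare and mk_common are redefined with a seeded generator so the block stands alone.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an arbitrary choice) for repeatability
n = 1_000_000                   # larger than the notebook's 1000 so rare counts are not all zero

def mk_rare():
    return np.power(rng.uniform(0.0, 1.0, n), 100)

def mk_common():
    return np.power(rng.uniform(0.0, 1.0, n), 1.0 / 100.0)

def fraction_above(s, threshold=0.5):
    """Fraction of samples strictly above the threshold."""
    return float((s > threshold).mean())

a_rare, b_rare = mk_rare(), mk_rare()
c_common, d_common = mk_common(), mk_common()

results = {
    "rare * rare (independent)": fraction_above(a_rare * b_rare),
    "rare * itself (fully correlated)": fraction_above(a_rare * a_rare),
    "rare * common": fraction_above(a_rare * c_common),
    "common * common": fraction_above(c_common * d_common),
}
for name, value in results.items():
    print(f"{name:34s} {value:.6f}")
```

On such a run the independent rare * rare fraction should come out orders of magnitude below rare * common, while a rare set multiplied by itself stays within a small factor of a single rare set. That gap is why the "uncorrelated" qualifier matters.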


All original content copyright James Litsios, 2020.