Rare intersecting with rare is pretty much empty
Having good intuition about rare events is actually pretty difficult.
Thinking in terms of the plots below helps me with the subject.
This is all in this Jupyter notebook.
(Note: the formatting below is iffy because the original is a Jupyter notebook.)
#!/usr/bin/env python3
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
An illustration of how some things are rare, and others common
Our "test" population is a thousand
In [2]:
n=1000
Rare numbers are numbers that are special one way or another. Here, "rare" means a number drawn uniformly at random between 0 and 1 that happens to be close to one. We can single these out by raising every number to a high power (e.g. 100): anything that is not close enough to one gets "crushed" towards 0.
In [3]:
def mk_rare():
    return np.power(np.random.uniform(0.0, 1.0, n), 100)
rare = mk_rare()
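To put a number on "crushed", here is a minimal sketch that compares the expected and observed count of rare numbers above a cut-off. The cut-off of 0.5, the seeded generator, and the variable names are assumptions made for this illustration; they are not part of the original notebook.

```python
import numpy as np

n = 1000          # population size, as in the post
exponent = 100    # the power used by mk_rare
threshold = 0.5   # arbitrary cut-off, chosen only for this illustration

# u**exponent > threshold  <=>  u > threshold**(1/exponent).
# For u uniform on [0, 1) this has probability 1 - threshold**(1/exponent).
expected_fraction = 1.0 - threshold ** (1.0 / exponent)
expected_count = n * expected_fraction

# Seeded generator (an assumption, used here only for reproducibility).
rng = np.random.default_rng(0)
sample = np.power(rng.uniform(0.0, 1.0, n), exponent)
observed_count = int((sample > threshold).sum())

print(f"expected about {expected_count:.1f} of {n}, observed {observed_count}")
```

The expected count works out to roughly 7 in 1000, so even a modest cut-off like 0.5 is cleared by under one percent of the population.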
The plot of the rare numbers:
In [4]:
plt.plot(rare)
plt.show()
Common numbers are the opposite: taking the 100th root pushes almost every number close to one.
In [5]:
def mk_common():
    return np.power(np.random.uniform(0.0, 1.0, n), 1.0/100.0)
common = mk_common()
The plot of the common numbers:
In [6]:
plt.plot(common)
plt.show()
Let's find out how many rare numbers we have that are above a given number (between 0 and 1).
In [7]:
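def count_above(s, skew_selection_for_better_plot):
    # thresholds in [0, 1), skewed toward 0 (power > 1) or toward 1 (power < 1)
    x = np.power(np.arange(0.0, 1.0, 0.01), skew_selection_for_better_plot)
    # for each threshold, count the elements of s that exceed it
    y = (s > x[:, np.newaxis]).sum(axis=1)
    plt.plot(x, y)
    plt.show()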
count_above(rare, 10)
We can do the same for our common numbers:
In [8]:
count_above(common, 1/10)
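The counting line inside count_above relies on NumPy broadcasting, which is easier to see on a toy input. The sketch below uses made-up arrays (the values of `s` and `x` are illustrative only) and skips the plotting.

```python
import numpy as np

s = np.array([0.1, 0.5, 0.9, 0.95])  # toy "population" (illustrative values)
x = np.array([0.0, 0.4, 0.8])        # toy thresholds (illustrative values)

# x[:, np.newaxis] has shape (3, 1); comparing it with s (shape (4,))
# broadcasts to a (3, 4) boolean table: one row per threshold,
# one column per element of s.
above = s > x[:, np.newaxis]

# Summing along axis 1 counts the True entries in each row,
# i.e. how many elements of s exceed each threshold.
counts = above.sum(axis=1)

print(above.shape)  # (3, 4)
print(counts)       # [4 3 2]
```

The same expression handles the 100 thresholds and 1000 numbers of the notebook in a single vectorised comparison, with no explicit loop.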
The plots above are what you would expect: very few rare numbers are above most thresholds, while most common numbers are above most thresholds (between 0 and 1).
Let's do one more thing: multiply sets of rare and common numbers. First, let's generate a few sets of each.
In [9]:
a_rare = mk_rare()
b_rare = mk_rare()
c_common = mk_common()
d_common = mk_common()
We multiply a few
In [10]:
ab = a_rare*b_rare
ac = a_rare*c_common
cd = c_common*d_common
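One way to read the element-wise product (an interpretation, not something the notebook states explicitly) is as a soft version of set intersection: the product is close to one only where both factors are close to one. The arrays below are made-up values for illustration.

```python
import numpy as np

# With hard 0/1 membership indicators, multiplication is exactly logical AND.
in_a = np.array([1.0, 1.0, 0.0, 0.0])
in_b = np.array([1.0, 0.0, 1.0, 0.0])
hard = in_a * in_b
print(hard)  # [1. 0. 0. 0.]

# With values in [0, 1] the same holds approximately: only the first
# position, where both inputs are near one, keeps a product near one.
a = np.array([0.99, 0.99, 0.01, 0.01])
b = np.array([0.98, 0.02, 0.97, 0.03])
soft = a * b
print(soft)
```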
and plot the results
In [11]:
count_above(ab, 10)
In [12]:
count_above(ac, 10)
In [13]:
count_above(cd, 1/10)
The plots above confirm the following:
rare and rare is pretty much empty (when uncorrelated!)
rare and common is still rare
common and common is common
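The three statements above can also be checked numerically without plots. The following is a rough sketch under stated assumptions: the cut-off of 0.5, the sample size `m`, the seed, and the helper names `rare` and `common` are all chosen for this illustration and do not come from the notebook. It also includes a fully correlated case (a rare set multiplied by itself) to show why the "when uncorrelated" caveat matters.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice
m = 1_000_000                   # larger than the post's n so tiny fractions are visible
threshold = 0.5                 # arbitrary cut-off for this illustration


def rare(size):
    """Uniform numbers raised to the power 100, as in mk_rare."""
    return np.power(rng.uniform(0.0, 1.0, size), 100)


def common(size):
    """Uniform numbers raised to the power 1/100, as in mk_common."""
    return np.power(rng.uniform(0.0, 1.0, size), 1.0 / 100.0)


a, b = rare(m), rare(m)
c, d = common(m), common(m)

# Fraction of each product that lands above the cut-off.
fractions = {
    "rare * rare (independent)": float(np.mean(a * b > threshold)),
    "rare * rare (same draw)": float(np.mean(a * a > threshold)),
    "rare * common": float(np.mean(a * c > threshold)),
    "common * common": float(np.mean(c * d > threshold)),
}

# Closed forms for the two rare * rare cases.
# Independent: (u*v)**100 > t  <=>  u*v > s with s = t**(1/100), and for
# independent uniforms P(u*v > s) = 1 - s + s*ln(s).
s = threshold ** (1.0 / 100.0)
exact_independent = 1.0 - s + s * np.log(s)
# Same draw: a*a = u**200, so P = 1 - t**(1/200).
exact_same_draw = 1.0 - threshold ** (1.0 / 200.0)

for name, value in fractions.items():
    print(f"{name:28s} {value:.6f}")
print(f"closed form, independent:    {exact_independent:.6f}")
print(f"closed form, same draw:      {exact_same_draw:.6f}")
```

The closed forms give about 2.4e-5 for the independent case and about 3.5e-3 for the same-draw case. In a population of a thousand, that is an expected 0.02 members versus about 3.5, which is the difference between "pretty much empty" and merely rare.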