Having good intuition about rare events is actually pretty difficult.
Thinking in terms of the plots below helps me with the subject.
This is all in this Jupyter notebook.
(Note: formatting below is iffy as original is in Jupyter notebook)
#!/usr/bin/env python3
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
An illustration of how some things are rare, and others common
Our "test" population is a thousand
In [2]:
n=1000
Rare numbers are numbers that are special one way or another. Here, "rare" means a uniformly random number between 0 and 1 that happens to be close to one. We can pick these out by raising the numbers to a high power (e.g. 100): anything not close enough to one gets "crushed" towards 0.
In [3]:
def mk_rare():
    return np.power(np.random.uniform(0.0, 1.0, n), 100)
rare = mk_rare()
The plot of the rare numbers:
In [4]:
plt.plot(rare)
plt.show()
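One way to sanity-check the "crushing" effect is to compare the observed fraction of values above a threshold with a closed-form value. For u uniform on [0, 1], u^100 > t exactly when u > t^(1/100), so the fraction should be close to 1 - t^(1/100). The sketch below is not part of the original notebook; the fixed seed, the threshold `t`, and the names `observed` and `expected` are assumptions chosen for illustration.

```python
import numpy as np

# Assumption: a seeded generator, used here only to make the check repeatable.
rng = np.random.default_rng(0)
n = 1000
t = 0.5  # hypothetical threshold, any value in (0, 1) works

# Same construction as mk_rare(): uniform samples raised to the power 100.
sample = rng.uniform(0.0, 1.0, n) ** 100

# Fraction of samples strictly above the threshold.
observed = (sample > t).mean()

# Closed form: P(u**100 > t) = P(u > t**(1/100)) = 1 - t**(1/100).
expected = 1.0 - t ** (1.0 / 100.0)  # about 0.0069 for t = 0.5

print(observed, expected)
```

With a population of a thousand, that works out to only a handful of values above 0.5, which matches the mostly flat plot with a few spikes.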
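Common numbers are the opposite: raising to the power 1/100 (taking the 100th root) pushes almost every number up close to one.
In [5]:
def mk_common():
    return np.power(np.random.uniform(0.0, 1.0, n), 1.0/100.0)
common = mk_common()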
Plotting the common numbers
In [6]:
plt.plot(common)
plt.show()
Let's find out how many rare numbers we have that are above a given number (between 0 and 1).
In [7]:
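def count_above(s, skew_selection_for_better_plot):
    # Thresholds in [0, 1); a power > 1 clusters them near 0, a power < 1 near 1
    x = np.power(np.arange(0.0, 1.0, 0.01), skew_selection_for_better_plot)
    # For each threshold, count how many samples lie strictly above it
    y = (s > x[:, np.newaxis]).sum(axis=1)
    plt.plot(x, y)
    plt.show()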
count_above(rare, 10)
We can do the same for our common numbers:
In [8]:
count_above(common, 1/10)
The plots above are what you would expect: very few rare numbers are above most thresholds, while most common numbers are above most thresholds (between 0 and 1).
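The same counts can also be read off as numbers rather than from a plot. The sketch below is a non-plotting variant of the counting step, assuming a hypothetical helper `counts_above` and a hand-picked list of thresholds; neither is in the original notebook.

```python
import numpy as np

def counts_above(s, thresholds):
    """For each threshold, return how many entries of s are strictly above it."""
    thresholds = np.asarray(thresholds, dtype=float)
    # Broadcasting: (len(thresholds), 1) against (len(s),) gives a 2-D boolean grid.
    return (s > thresholds[:, np.newaxis]).sum(axis=1)

# Assumption: a seeded generator so the numbers are repeatable.
rng = np.random.default_rng(1)
n = 1000
rare_sample = rng.uniform(0.0, 1.0, n) ** 100
common_sample = rng.uniform(0.0, 1.0, n) ** (1.0 / 100.0)

thresholds = [0.1, 0.5, 0.9]  # hypothetical thresholds for illustration
rare_counts = counts_above(rare_sample, thresholds)
common_counts = counts_above(common_sample, thresholds)

print("rare:  ", rare_counts)
print("common:", common_counts)
```

Out of a thousand, the rare counts should be a few dozen at most, and the common counts should sit at or very near the full thousand for all three thresholds.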
Let's do one more thing: multiply rare and common sets of numbers. First, let's have a few rare and common numbers.
In [9]:
a_rare = mk_rare()
b_rare = mk_rare()
c_common = mk_common()
d_common = mk_common()
We multiply a few
In [10]:
ab = a_rare*b_rare
ac = a_rare*c_common
cd = c_common*d_common
and plot the results
In [11]:
count_above(ab, 10)
In [12]:
count_above(ac, 10)
In [13]:
count_above(cd, 1/10)
The plots above confirm the following:
rare and rare is pretty much empty (when uncorrelated!)
rare and common is still rare
common and common is common
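The three conclusions, and the "(when uncorrelated!)" caveat, can be checked numerically. The sketch below is an illustration under stated assumptions, not the notebook's own code: it uses a seeded generator, a single threshold `t`, and it models the fully correlated case by multiplying a rare set with itself, which is only one reading of what "correlated" means here.

```python
import numpy as np

# Assumption: seeded generator and a single illustrative threshold.
rng = np.random.default_rng(2)
n = 1000
t = 0.1

def mk_rare():
    return rng.uniform(0.0, 1.0, n) ** 100

def mk_common():
    return rng.uniform(0.0, 1.0, n) ** (1.0 / 100.0)

a, b = mk_rare(), mk_rare()
c, d = mk_common(), mk_common()

# Fraction of each element-wise product that lies above the threshold.
frac = {
    "rare*rare (independent)": ((a * b) > t).mean(),
    "rare*rare (same draw)": ((a * a) > t).mean(),  # fully correlated case
    "rare*common": ((a * c) > t).mean(),
    "common*common": ((c * d) > t).mean(),
}

for name, value in frac.items():
    print(f"{name:26s} {value:.3f}")
```

One would expect the independent rare*rare product to be nearly empty, the same-draw product to be noticeably less empty, rare*common to stay about as rare as a single rare set, and common*common to remain close to one.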