2019-May-21
Random generation of events is important for modeling many statistical processes. In this post we will discuss how to simulate the tossing of an unbiased coin in Clojure. We will do this using pure functional techniques. We will also write functions to derive some interesting statistics from coin-toss experiments.
A coin toss can result in either a head or a tail. We shall assign the integer 1
to head and 0
to tail,
so that we can use the rand-int
function to simulate a single toss.
> (defn toss [] (rand-int 2))
> (toss)
;;=> will output 1 or 0 with equal chance. probability=0.5
How can we simulate N
tosses? With the help
of lazy sequences.
The following function will give us as many tosses as we want!
> (defn tosses []
(lazy-seq (cons (toss) (tosses))))
> (take 10 (tosses))
;;=> (1 1 0 1 0 1 0 0 0 1)
Now we would want to answer some questions about the data gathered from all those tosses, like how many heads were generated
by the experiment? The higher-order function reduce
can help us here:
> (reduce + 0 (take 10 (tosses)))
;;=> 5
Note You may get a different result, as the simulation includes random values.
It is well-known that the ratio of the number of heads (or tails) to the total number of tosses will approach 0.5
as
the number of tosses increases. To test this out, we first need a version of reduce
that returns a sequence of cumulative
results. The definition of this function follows:
(defn cumulate [f init xs]
(when (seq xs)
(let [r (f init (first xs))]
(lazy-seq (cons r (cumulate f r (rest xs)))))))
Cumulate
applies the function f
to init
and the first element of xs
.
The result (r
) becomes the head of a lazy sequence. The tail of this sequence is constructed by
a recursive call to cumulate
with r
being the value of init
. This process continues
until xs
runs out of values. The returned sequence will contain all intermediate results of the reduction of xs
by f
.
Let's call reduce
and cumulate
on the same inputs and see how the outputs differ:
> (reduce + 0 [1 2 3 4 5])
;;=> 15
> (cumulate + 0 [1 2 3 4 5])
;;=> (1 3 6 10 15)
The value returned by reduce
and the final value in the sequence returned by cumulate
are the same.
In addition, cumulate
also captures the intermediate values of the reduction.
Cumulative summing of sequences can be expressed by the following partial application of cumulate
:
> (def sums (partial cumulate + 0))
We have all the tools necessary to experiment and find out if heads appear with a probability of 0.5
if a coin is
tossed for a sufficiently large number of times. The following program will conduct this experiment for us:
> (def xs (take 3000 (tosses)))
> (def ss (sums xs))
> (def rs (map #(/ (float %1) %2) ss (range 1 3001)))
> (drop 2500 rs)
;;=> (0.5041983206717313 0.504396482813749 0.5041949660407511 ...)
First, 3000
tosses are made. The number of heads seen until each toss is found by calling sums
.
The higher-order function map
walks down this sequence, dividing each entry by the corresponding toss number
to produce a sequence of ratios.
As we see in the final output, the ratio of heads indeed approach 0.5
.
There are other interesting information that we can derive from the data returned by tosses
.
For instance, can you design a function to report the excess of heads over tails for a particular toss?
This problem can be solved by finding the difference between the cumulative number of heads and the cumulative number of tails
for the specified toss. We can find the cumulative of tails by first "inverting" the tosses and then applying sums
to
the result. Here is the invert function:
> (defn inv [xs]
(map #(if (zero? %) 1 0) xs))
> (def xs (take 10 (tosses)))
> xs
;;=> (0 0 0 1 1 1 0 0 1 0)
> (def ys (inv xs))
> ys
;;=> (1 1 1 0 0 0 1 1 0 1)
The function excess
defined below, finds the difference between a sequence and its inverse:
> (defn excess [xs n]
(let [hs (sums xs)
ts (sums (inv xs))]
(nth (map #(- %1 %2) hs ts) n)))
> (excess xs 4)
;;=> -1
> (excess xs 5)
;;=> 0
> (excess xs 6)
;;=> -1
Exercise 1 Write a function to report the first n
excesses of tails over heads.
Exercise 2 Use Incanter to visualize the various information you gather about coin tosses. To start, you may plot the ratios obtained from the first 3000 tosses.
Note The inspiration for this post came from this lecture by Prof. Keith Smillie.