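BootStrapping is a form of ensemble testing: you resample your measurements with replacement many times and look at how an estimator varies over the resulting ensemble of pseudo-datasets.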

Here I collect some notes, code snippets, papers, etc. on the BootStrap.

Reading material

Brief introductions:

The R boot package lives on CRAN and has a PDF as documentation.

The R News article suggests looking at the plots which jack.after.boot produces. The interpretation of these plots is described in "Jackknife-after-bootstrap standard errors and influence functions" (Journal of the Royal Statistical Society, Series B (Methodological), Vol. 54, No. 1 (1992), pp. 83-127).

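You can use the BootStrap to estimate the bias and resolution of your estimator. The resolution is the spread of the N values of your estimator, one per resample, where N is the number of resamples you made. The bias is estimated as the mean of those N values minus the value of the estimator on the original sample, and the error on this bias estimate is the resolution divided by sqrt(N).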
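
The bias and resolution estimate described above could be sketched as follows. This is only an illustration under stated assumptions: the estimator (the uncorrected spread, which is known to come out slightly low for small samples), the 25 Gaussian toy measurements, the 2000 resamples and the helper name bootstrap_bias are all choices made here, not something taken from the boot package.

```python
import math
import random


def spread(values):
    """Uncorrected standard deviation (divides by n, so it is biased low)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def bootstrap_bias(pool, estimator, n_resamples, rng):
    """Hypothetical helper: return (bias, error on bias, resolution)."""
    original = estimator(pool)
    estimates = []
    for _ in range(n_resamples):
        # One pseudo experiment: draw len(pool) values with replacement
        resample = [rng.choice(pool) for _ in pool]
        estimates.append(estimator(resample))
    mean_estimate = sum(estimates) / n_resamples
    # Resolution: spread of the N bootstrap estimates
    resolution = spread(estimates)
    # Bias: mean bootstrap estimate minus estimate on the original sample
    bias = mean_estimate - original
    # Error on the bias: resolution / sqrt(N)
    bias_error = resolution / math.sqrt(n_resamples)
    return bias, bias_error, resolution


rng = random.Random(1234)
data = [rng.gauss(3.141, 0.83) for _ in range(25)]  # toy measurements
bias, bias_error, resolution = bootstrap_bias(data, spread, 2000, rng)
print("bias: %.3f +- %.3f" % (bias, bias_error))
print("resolution: %.3f" % resolution)
```

For this estimator the bias should come out negative, because a resample contains repeated values and so tends to be narrower than the pool it was drawn from.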

Code

As BootStrapping sounds like magic that should not work, here is some code so you can convince yourself otherwise.

As a first example we show how to estimate the uncertainty on the mean and on the spread of 25 measurements of some quantity. Imagine you measured the distance between your desk and the door 25 times; this is where your 25 numbers come from. Here we use a Gaussian random number generator instead.

    import math
    import random

    random.seed(1234)

    # 25 simulated measurements, drawn from a Gaussian
    # with mean 3.141 and width 0.83
    measurements = []
    for n in range(25):
        measurements.append(random.gauss(3.141, 0.83))

    # Some helper functions
    def mean(values):
        return sum(values) / float(len(values))

    def spread(values):
        mean_ = mean(values)
        variance = sum((v - mean_)**2 for v in values) * (1. / len(values))
        return math.sqrt(variance)

    def perform_pse(pool, estimator=mean):
        """Pick values with replacement from the pool of measurements
        and apply the estimator to them.
        """
        pse = []
        for n in range(len(pool)):
            pse.append(random.choice(pool))
        return estimator(pse)

    # The results of our pseudo experiments
    means = []
    spreads = []
    # Perform 500 pseudo experiments
    for n in range(500):
        means.append(perform_pse(measurements))
        spreads.append(perform_pse(measurements, spread))

    # The uncertainty on the mean of the 25 measurements is
    # from theory 0.83 / sqrt(25), and from bootstrapping
    print("Uncertainty on the mean:", spread(means))
    # Similarly the uncertainty on our estimate of the
    # width of the Gaussian is
    print("Uncertainty on the width:", spread(spreads))

A shorter, but somewhat less transparent, version of BootStrapping. This Python code (using NumPy) uses BootStrapping to estimate the error on the mean:

    import numpy as np

    # We want to find the mean of these ten numbers
    x = np.array([0.4106846, 1.0231651, -2.8861417, 1.3732520,
                  -0.1353872, 1.8619070, -1.2744800, 1.0050735,
                  0.9914339, -1.3136960])

    X = []  # collect the estimates here
    # Select 10 elements from x with replacement, calculate the
    # mean and repeat this 1000 times.
    for xx in range(1000):
        X.append(np.mean(x[np.random.randint(10, size=10)]))

    # The spread of X gives an estimate of the error on the mean
    # you should expect for your dataset
    print("standard error on mean:", np.std(X))
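
As a cross-check on the example above, the BootStrap estimate can be compared with the textbook standard error on the mean, s / sqrt(n). The sketch below uses only the standard library and the same ten numbers. The 5000 resamples, the seed and the variable names are assumptions made for illustration.

```python
import math
import random
import statistics

# The same ten numbers as in the example above
x = [0.4106846, 1.0231651, -2.8861417, 1.3732520, -0.1353872,
     1.8619070, -1.2744800, 1.0050735, 0.9914339, -1.3136960]

rng = random.Random(42)  # arbitrary seed, only for reproducibility

# BootStrap: resample with replacement and record the mean each time
boot_means = []
for _ in range(5000):
    resample = rng.choices(x, k=len(x))
    boot_means.append(statistics.mean(resample))
bootstrap_error = statistics.pstdev(boot_means)

# Textbook formula: sample standard deviation (n - 1 in the
# denominator) divided by sqrt(n)
analytic_error = statistics.stdev(x) / math.sqrt(len(x))

print("bootstrap error on mean:", bootstrap_error)
print("analytic error on mean: ", analytic_error)
```

The two numbers should agree to within a few percent. The BootStrap value is expected to be slightly smaller, by a factor of roughly sqrt((n - 1) / n), because resampling from the data corresponds to dividing by n rather than n - 1 when computing the variance.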

BootStrap (last edited 2010-02-05 02:54:46 by TimHead)