scikits-bootstrap

Version: 1.1.0

Summaries

ci()

Given a set of data data, and a statistics function statfunction that applies to that data, computes the bootstrap confidence interval for statfunction on that data.

pval(data[, statfunction, compfunction, ...])

Given a set of data data, a statistics function statfunction that applies to that data, and the criteria function compfunction, computes the bootstrap probability that the statistics function statfunction on that data satisfies the criteria function compfunction.

Functions

scikits.bootstrap.bootstrap_indices(data: ndarray[Any, dtype[Any]], n_samples: int = 10000, seed: None | int | ndarray[Any, dtype[Any]] | SeedSequence | BitGenerator | Generator = None) -> Iterator[ndarray[Any, dtype[Any]]]

Given data points data, where axis 0 is considered to delineate points, return a generator for sets of bootstrap indices. This can be used as a list of bootstrap indices (with list(bootstrap_indices(data))) as well.
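The idea of a generator of bootstrap index sets can be illustrated with a dependency-free sketch in plain Python. This is not the library's implementation (which works on NumPy arrays); the function name bootstrap_index_sets and its integer seed handling are assumptions made for the illustration.

```python
import random
from typing import Iterator, List


def bootstrap_index_sets(n_points: int, n_samples: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield n_samples index lists, each drawn with replacement from range(n_points).

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        # Each resample has the same length as the data and may repeat indices.
        yield [rng.randrange(n_points) for _ in range(n_points)]


data = [2.0, 4.0, 6.0, 8.0]

# The generator can be consumed lazily or materialised as a list.
index_sets = list(bootstrap_index_sets(len(data), n_samples=3))

# Applying an index set to the data gives one bootstrap resample.
resamples = [[data[i] for i in idx] for idx in index_sets]
```

Yielding indices rather than resampled values lets the same index set be applied to several arrays at once.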

scikits.bootstrap.bootstrap_indices_moving_block(data: ndarray[Any, dtype[Any]], n_samples: int = 10000, block_length: int = 3, wrap: bool = False, seed: None | int | ndarray[Any, dtype[Any]] | SeedSequence | BitGenerator | Generator = None) -> Iterator[ndarray[Any, dtype[Any]]]

Generate moving-block bootstrap samples.

Given data points data, where axis 0 is considered to delineate points, return a generator for sets of bootstrap indices. This can be used as a list of bootstrap indices (with list(bootstrap_indices_moving_block(data))) as well.

Parameters:
  • n_samples (int) – [default 10000]

  • block_length (int) – [default 3]

  • wrap (bool) – [default False] If False, the last block for data of length L starts at L-block_length. If True, choose blocks starting anywhere, and if they extend past the end of the data, wrap around to the beginning of the data again.
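A minimal plain-Python sketch of moving-block index generation follows. It assumes that each resample is built by concatenating randomly placed blocks of consecutive indices and truncating to the data length. The function name moving_block_index_sets and that truncation rule are assumptions for illustration, not the library's code.

```python
import math
import random
from typing import Iterator, List


def moving_block_index_sets(
    n_points: int,
    n_samples: int,
    block_length: int = 3,
    wrap: bool = False,
    seed: int = 0,
) -> Iterator[List[int]]:
    """Yield index lists built from randomly placed blocks of consecutive indices.

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    if not 1 <= block_length <= n_points:
        raise ValueError("block_length must be between 1 and n_points")
    rng = random.Random(seed)
    n_blocks = math.ceil(n_points / block_length)
    # Without wrapping, the last admissible block starts at n_points - block_length.
    n_starts = n_points if wrap else n_points - block_length + 1
    for _ in range(n_samples):
        indices: List[int] = []
        for _ in range(n_blocks):
            start = rng.randrange(n_starts)
            # With wrap=True a block may run past the end and continue at index 0.
            indices.extend((start + k) % n_points for k in range(block_length))
        # Assumption: trim the concatenated blocks to the original data length.
        yield indices[:n_points]


no_wrap = list(moving_block_index_sets(10, n_samples=5))
wrapped = list(moving_block_index_sets(10, n_samples=5, wrap=True))
```

Keeping neighbouring points together inside each block is what preserves short-range dependence in the resampled series.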

scikits.bootstrap.ci(data: DataType, statfunction: StatFunctionWithWeights | StatFunction | None = None, alpha: float | Iterable[float] = 0.05, n_samples: int = 10000, method: Literal['pi', 'bca', 'abc'] = 'bca', output: Literal['lowhigh', 'errorbar'] = 'lowhigh', epsilon: float = 0.001, multi: None | bool | Literal['independent'] | Literal['paired'] = None, return_dist: Literal[False, True] = False, seed: SeedType = None, use_numba: bool = False) -> Union[NDArrayAny, Tuple[NDArrayAny, NDArrayAny]]

Given a set of data data, and a statistics function statfunction that applies to that data, computes the bootstrap confidence interval for statfunction on that data. Data points are assumed to be delineated by axis 0.

Parameters:
  • data (array_like, shape (N, ...) OR tuple of array_like all with shape (N, ...)) – Input data. Data points are assumed to be delineated by axis 0. Beyond this, the shape doesn’t matter, so long as statfunction can be applied to the array. If a tuple of array_likes is passed, then samples from each array (along axis 0) are passed in order as separate parameters to the statfunction. The type of data (single array or tuple of arrays) can be explicitly specified by the multi parameter.

  • statfunction (function (data, weights=(weights, optional)) -> value) –

    This function should accept samples of data from data. It is applied to these samples individually.

    If using the ABC method, the function _must_ accept a named weights parameter which will be an array_like with weights for each sample, and must return a _weighted_ result. Otherwise this parameter is not used or required. Note that numpy’s np.average accepts this. (default=np.average)

  • alpha (float or iterable, optional) – The percentiles to use for the confidence interval (default=0.05). If this is a float, the returned values are (alpha/2, 1-alpha/2) percentile confidence intervals. If it is an iterable, alpha is assumed to be an iterable of each desired percentile.

  • n_samples (int, optional) – The number of bootstrap samples to use (default=10_000)

  • method (string, optional) – The method to use: one of ‘pi’, ‘bca’, or ‘abc’ (default=‘bca’)

  • output (string, optional) – The format of the output. ‘lowhigh’ gives low and high confidence interval values. ‘errorbar’ gives transposed abs(value-confidence interval value) values that are suitable for use with matplotlib’s errorbar function. (default=‘lowhigh’)

  • epsilon (float, optional (only for ABC method)) – The step size for finite difference calculations in the ABC method. Ignored for all other methods. (default=0.001)

  • multi (boolean or string, optional) –

    If False, assume data is a single array. If True or “paired”, assume data is a tuple/other iterable of arrays of the same length that should be sampled together (eg, values in each array at a particular index are linked in some way). If None, “paired” is used if data is an actual tuple, and False otherwise. If “independent”, sample the tuple of arrays separately. For True/“paired”, each array must be the same length. (default=None)

    An example of a situation where True/“paired” might be useful is if you have an array of x points and an array of y points, and want confidence intervals on a linear fit, eg boot.ci((x,y), lambda a,b: np.polyfit(a,b,1), multi="paired"). In this case, the statistic function needs to have samples that preserve the links between values in x and y in order for the fit to make sense. This is equivalent to running boot.ci on an Nx2 array.

    An example of where “independent” might be useful is if you have an array of values x and an array of values y, and you want a confidence interval for the difference of the averages of the values in each, eg boot.ci((x,y), lambda a,b: np.average(a)-np.average(b), multi="independent"). Here, you don’t care about maintaining the link between each value in x and y, and treat them separately. This is equivalent to taking bootstrap samples of x and y separately, and then running the statistic function on those bootstrap samples.

  • return_dist (boolean, optional) – Whether to return the bootstrap distribution along with the confidence intervals. Defaults to False. Note that, as the ‘abc’ method does not actually calculate the bootstrap distribution, method=‘abc’ conflicts with return_dist=True.

Returns:

  • confidences (tuple of floats) – The confidence percentiles specified by alpha

  • stat (array) – Bootstrap distribution. Returned only if return_dist=True.

Calculation Methods

  • ‘pi’ (Percentile Interval (Efron 13.3)) – The percentile interval method simply returns the values of the statistic at the 100*alpha-th percentiles of the bootstrap distribution. This is an extremely simple method of confidence interval calculation. However, it has several disadvantages compared to the bias-corrected accelerated method, which is the default.

  • ‘bca’ (Bias-Corrected Accelerated (BCa) Non-Parametric (Efron 14.3) (default)) – This method is much more complex to explain. However, it gives considerably better results, and is generally recommended for normal situations. Note that in cases where the statistic is smooth, and can be expressed with weights, the ABC method will give approximated results much, much faster. Note that in a case where the statfunction results in equal output for every bootstrap sample, the BCa confidence interval is technically undefined, as the acceleration value is undefined. To match the percentile interval method and give reasonable output, the implementation of this method returns a confidence interval of zero width using the 0th bootstrap sample in this case, and warns the user.

  • ‘abc’ (Approximate Bootstrap Confidence (Efron 14.4, 22.6)) – This method provides approximated bootstrap confidence intervals without actually taking bootstrap samples. This requires that the statistic be smooth, and allow for weighting of individual points with a weights= parameter (note that np.average allows this). This is _much_ faster than all other methods for situations where it can be used.
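The percentile interval is simple enough to sketch in a few lines of plain Python. The version below is an illustration under stated assumptions, not the library's code: the name percentile_interval, the nearest-rank percentile lookup and the integer seed are all choices made for this sketch. The last lines show one way the ‘errorbar’ style of output could be derived from low and high values.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def percentile_interval(
    data: Sequence[float],
    statfunction: Callable[[Sequence[float]], float] = statistics.fmean,
    alpha: float = 0.05,
    n_samples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the (alpha/2, 1 - alpha/2) percentile interval of a statistic.

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    rng = random.Random(seed)
    n = len(data)
    # Evaluate the statistic on each bootstrap resample, then sort the results.
    boot_stats = sorted(
        statfunction([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_samples)
    )
    # Assumption: nearest-rank lookup of the two percentiles.
    low_idx = round((n_samples - 1) * (alpha / 2))
    high_idx = round((n_samples - 1) * (1 - alpha / 2))
    return boot_stats[low_idx], boot_stats[high_idx]


data_rng = random.Random(1)
data = [data_rng.gauss(10.0, 2.0) for _ in range(50)]

low, high = percentile_interval(data)

# Distances from the point estimate to each bound, as used for error bars.
estimate = statistics.fmean(data)
errorbar = (abs(estimate - low), abs(high - estimate))
```

The BCa method starts from the same sorted bootstrap statistics but adjusts which percentiles are read off, which is why it costs more to compute.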

Examples

To calculate the confidence intervals for the mean of some numbers:

>>> boot.ci(np.random.randn(100), np.average)

Given some data points in arrays x and y calculate the confidence intervals for all linear regression coefficients simultaneously:

>>> boot.ci((x, y), scipy.stats.linregress)
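The difference between “paired” and “independent” resampling of a tuple of arrays can be sketched in plain Python. The helper names resample_paired and resample_independent are hypothetical, and the sketch assumes that independent arrays may differ in length; it illustrates the idea rather than the library's internals.

```python
import random
from typing import List, Sequence, Tuple


def resample_paired(
    x: Sequence[float], y: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Resample x and y with one shared index set, keeping (x[i], y[i]) together."""
    if len(x) != len(y):
        raise ValueError("paired arrays must have the same length")
    idx = [rng.randrange(len(x)) for _ in range(len(x))]
    return [x[i] for i in idx], [y[i] for i in idx]


def resample_independent(
    x: Sequence[float], y: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Resample x and y separately, each with its own index set."""
    xs = [x[rng.randrange(len(x))] for _ in range(len(x))]
    ys = [y[rng.randrange(len(y))] for _ in range(len(y))]
    return xs, ys


rng = random.Random(0)
x = [float(v) for v in range(20)]
y = [10.0 * v for v in x]  # y is linked to x point by point

# Paired: every resampled y value still belongs to its x value.
paired_x, paired_y = resample_paired(x, y, rng)

# Independent: the two arrays are drawn separately (here with different lengths).
indep_x, indep_y = resample_independent(x, y[:8], rng)
```

A fit such as a regression needs the paired form, while a difference of two group averages only needs the independent form.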

References

Efron, An Introduction to the Bootstrap. Chapman & Hall 1993

scikits.bootstrap.jackknife_indices(data: ndarray[Any, dtype[Any]]) -> Iterator[ndarray[Any, dtype[Any]]]

Given data points data, where axis 0 is considered to delineate points, return a list of arrays where each array is a set of jackknife indices.

For a given set of data Y, the jackknife sample J[i] is defined as the data set Y with the ith data point deleted.
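The definition of J[i] above translates directly into code. The following plain-Python sketch uses a hypothetical helper name, jackknife_index_sets, and is meant only to illustrate the leave-one-out pattern.

```python
from typing import Iterator, List


def jackknife_index_sets(n_points: int) -> Iterator[List[int]]:
    """Yield, for each i, the indices of the data set with point i deleted.

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    for i in range(n_points):
        yield [j for j in range(n_points) if j != i]


data = [2.0, 4.0, 6.0, 8.0]

# Leave-one-out means: one value of the statistic per deleted point.
jackknife_means = [
    sum(data[j] for j in idx) / len(idx)
    for idx in jackknife_index_sets(len(data))
]
```

For N data points this gives exactly N index sets of length N-1, so the jackknife is deterministic and needs no random seed.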

scikits.bootstrap.pval(data: DataType, statfunction: StatFunction = <function average>, compfunction: Callable[[Any], Any] = <function <lambda>>, n_samples: int = 10000, multi: Optional[bool] = None, seed: SeedType = None) -> np.number[Any] | NDArrayAny

Given a set of data data, a statistics function statfunction that applies to that data, and the criteria function compfunction, computes the bootstrap probability that the statistics function statfunction on that data satisfies the criteria function compfunction. Data points are assumed to be delineated by axis 0.

Parameters:
  • data (array_like, shape (N, ...) OR tuple of array_like all with shape (N, ...)) – Input data. Data points are assumed to be delineated by axis 0. Beyond this, the shape doesn’t matter, so long as statfunction can be applied to the array. If a tuple of array_likes is passed, then samples from each array (along axis 0) are passed in order as separate parameters to the statfunction. The type of data (single array or tuple of arrays) can be explicitly specified by the multi parameter.

  • statfunction (function (data, weights=(weights, optional)) -> value) – This function should accept samples of data from data. It is applied to these samples individually.

  • compfunction (function (stat) -> True or False) – This function should accept the result of the statfunction computed on the samples of data from data. It is applied to these results individually. The default tests for each element of statfunction output being > 0.

  • n_samples (int, optional) – The number of bootstrap samples to use (default=10_000).

  • multi (boolean, optional) – If False, assume data is a single array. If True, assume data is a tuple/other iterable of arrays of the same length that should be sampled together. If None, decide based on whether the data is an actual tuple. (default=None)

Returns:

probability – The probability that the statistics defined by the statfunction satisfies the criteria defined by the compfunction.

Return type:

a float
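The bootstrap probability described here amounts to counting how often the criterion holds across resamples. The plain-Python sketch below assumes a scalar statistic and uses a hypothetical function name, bootstrap_probability; it is an illustration of the idea, not the library's implementation.

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_probability(
    data: Sequence[float],
    statfunction: Callable[[Sequence[float]], float] = statistics.fmean,
    compfunction: Callable[[float], bool] = lambda stat: stat > 0,
    n_samples: int = 2000,
    seed: int = 0,
) -> float:
    """Return the fraction of bootstrap resamples whose statistic meets the criterion.

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    rng = random.Random(seed)
    n = len(data)
    hits = 0
    for _ in range(n_samples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if compfunction(statfunction(resample)):
            hits += 1
    return hits / n_samples


# With mixed-sign data the probability lies strictly between the two extremes.
p_mixed = bootstrap_probability([-1.0, 0.5, 2.0, -0.3, 1.2])
```

With the default criterion (statistic > 0), data that are all positive give a probability of 1.0 and data that are all negative give 0.0.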

References

Efron, An Introduction to the Bootstrap. Chapman & Hall 1993

scikits.bootstrap.subsample_indices(data: ndarray[Any, dtype[Any]], n_samples: int = 1000, size: float = 0.5, seed: None | int | ndarray[Any, dtype[Any]] | SeedSequence | BitGenerator | Generator = None) -> ndarray[Any, dtype[Any]]

Given data points data, where axis 0 is considered to delineate points, return a list of arrays where each array holds the indices of a subsample of the data of size size. If size is >= 1, then it will be taken to be an absolute size. If size < 1, it will be taken to be a fraction of the data size. If size == -1, it will be taken to mean subsamples the same size as the sample (i.e., permuted samples).
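The three interpretations of size can be made concrete with a short plain-Python sketch. It assumes that subsamples are drawn without replacement and that a fractional size is rounded to the nearest integer; both the rounding rule and the name subsample_index_sets are assumptions made for illustration.

```python
import random
from typing import List


def subsample_index_sets(
    n_points: int, n_samples: int = 1000, size: float = 0.5, seed: int = 0
) -> List[List[int]]:
    """Return n_samples index lists, each a subsample drawn without replacement.

    Hypothetical helper for illustration only, not part of scikits.bootstrap.
    """
    if size == -1:
        k = n_points  # same size as the data: a permutation
    elif 0 < size < 1:
        k = round(n_points * size)  # fraction of the data (rounding is an assumption)
    elif size >= 1:
        k = int(size)  # absolute number of points
    else:
        raise ValueError("size must be positive or exactly -1")
    if k > n_points:
        raise ValueError("subsample size cannot exceed the number of data points")
    rng = random.Random(seed)
    return [rng.sample(range(n_points), k) for _ in range(n_samples)]


halves = subsample_index_sets(10, n_samples=4)            # size=0.5 -> 5 indices each
fixed = subsample_index_sets(10, n_samples=4, size=3)     # absolute size 3
perms = subsample_index_sets(10, n_samples=4, size=-1)    # permutations of all 10
```

Unlike bootstrap index sets, no index appears twice within a single subsample.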
