pingouin.compute_bootci

pingouin.compute_bootci(x, y=None, func=None, method='bca', paired=False, confidence=0.95, n_boot=2000, decimals=2, seed=None, return_dist=False)

Bootstrapped confidence intervals of univariate and bivariate functions.

Parameters:
x : 1D-array or list

First sample. Required for both bivariate and univariate functions.

y : 1D-array, list, or None

Second sample. Required only for bivariate functions.

func : str or custom function

Function to compute the bootstrapped statistic. Accepted string values are:

  • 'pearson': Pearson correlation (bivariate, paired x and y)

  • 'spearman': Spearman correlation (bivariate, paired x and y)

  • 'cohen': Cohen's d effect size (bivariate, paired or unpaired x and y)

  • 'hedges': Hedges' g effect size (bivariate, paired or unpaired x and y)

  • 'mean': Mean (univariate, only x)

  • 'std': Standard deviation (univariate)

  • 'var': Variance (univariate)

method : str

Method to compute the confidence intervals (see Notes):

  • 'bca': Bias-corrected and accelerated (BCa, default)

  • 'per': Simple percentile

  • 'basic': Basic (pivotal) method

  • 'norm': Normal approximation with bootstrapped bias and standard error

paired : boolean

Indicates whether x and y are paired. For example, for correlation functions or a paired t-test, x and y are assumed to be paired. Pingouin resamples the pairs (x_i, y_i) together when paired=True, and resamples x and y separately when paired=False. If paired=True, x and y must have the same number of elements.
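To make the difference between the two resampling schemes concrete, here is a minimal NumPy sketch of how one set of bootstrap indices could be drawn in each case. The helper `resample_indices` is hypothetical and is not part of pingouin; it only illustrates the idea described above, not Pingouin's own code.

```python
import numpy as np


def resample_indices(n_x, n_y, paired, rng):
    """Draw one set of bootstrap indices for x and y (hypothetical helper).

    paired=True: one index vector is shared by both samples, so every pair
    (x_i, y_i) stays together; this requires n_x == n_y.
    paired=False: x and y are resampled independently, each with its own size.
    """
    if paired:
        if n_x != n_y:
            raise ValueError("paired resampling requires len(x) == len(y)")
        idx = rng.integers(0, n_x, size=n_x)
        return idx, idx
    return rng.integers(0, n_x, size=n_x), rng.integers(0, n_y, size=n_y)


rng = np.random.default_rng(0)
x = np.arange(5.0)
y = 10 * np.arange(5.0)

# Paired: the same indices are applied to x and y, so pairs survive resampling
ix, iy = resample_indices(len(x), len(y), paired=True, rng=rng)
print(np.array_equal(y[iy], 10 * x[ix]))  # True

# Unpaired: each sample keeps its own size and is resampled on its own
ix, iy = resample_indices(len(x), 8, paired=False, rng=rng)
print(ix.size, iy.size)  # 5 8
```

Sharing one index vector is what keeps statistics such as a correlation meaningful: resampling x and y independently would break the pairing and push the bootstrapped correlation toward zero.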

confidence : float

Confidence level (0.95 = 95%)

n_boot : int

Number of bootstrap iterations. Higher values give more stable interval estimates but take longer to compute.

decimals : int

Number of decimals to which the confidence interval is rounded.

seed : int or None

Random seed for generating bootstrap samples.

return_dist : boolean

If True, return both the confidence interval and the bootstrapped distribution (e.g., for plotting purposes).

Returns:
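ci : array

Bootstrapped confidence interval (lower and upper bound).

bt : array

Bootstrapped distribution of the statistic, with n_boot values. Returned only if return_dist=True.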

Notes

This function uses scipy.stats.bootstrap() under the hood. Requires SciPy >= 1.10.

The bias-corrected and accelerated method (bca, default) corrects for both the bias and the skewness of the bootstrap distribution; the skewness (acceleration) term is estimated with jackknife resampling.
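For illustration, the textbook BCa recipe can be sketched in plain NumPy: a bias-correction term \(z_0\) taken from the share of bootstrap estimates below \(\hat{\theta}\), an acceleration term \(a\) from leave-one-out jackknife estimates, and then shifted percentile levels. This is a minimal sketch, not Pingouin's implementation (which, per the note above, delegates to scipy.stats.bootstrap, whose bias-correction details may differ slightly); the function name `bca_ci` is hypothetical and only univariate statistics are handled.

```python
from statistics import NormalDist

import numpy as np


def bca_ci(x, func, confidence=0.95, n_boot=2000, seed=None):
    """Bias-corrected and accelerated (BCa) bootstrap CI (hypothetical sketch)."""
    nd = NormalDist()  # standard normal: cdf and inverse cdf
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    alpha = 1 - confidence
    theta_hat = func(x)

    # Bootstrap distribution of the statistic
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.array([func(x[i]) for i in idx])

    # Bias correction z0 from the share of bootstrap estimates below theta_hat
    # (clipped so the inverse normal cdf stays finite)
    prop_below = np.clip(np.mean(boot < theta_hat), 1 / n_boot, 1 - 1 / n_boot)
    z0 = nd.inv_cdf(prop_below)

    # Acceleration a from leave-one-out jackknife estimates (captures skewness)
    jack = np.array([func(np.delete(x, i)) for i in range(n)])
    d = jack.mean() - jack
    a = np.sum(d**3) / (6.0 * np.sum(d**2) ** 1.5)

    # Shift the nominal alpha/2 and 1 - alpha/2 levels, then read off percentiles
    levels = []
    for z_alpha in (nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2)):
        levels.append(nd.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))))
    return np.percentile(boot, [100 * p for p in levels])


x = np.random.default_rng(42).normal(loc=4, scale=2, size=100)
print(bca_ci(x, lambda s: np.std(s, ddof=1), seed=123))
```

When \(z_0 = 0\) and \(a = 0\), the adjusted levels reduce to \(\alpha/2\) and \(1 - \alpha/2\), i.e. the plain percentile interval. The sketch assumes the jackknife estimates are not all identical (otherwise \(a\) is undefined).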

The percentile bootstrap method (per) is defined as the \(100 \times \frac{\alpha}{2}\) and \(100 \times \left(1 - \frac{\alpha}{2}\right)\) percentiles of the distribution of \(\theta\) estimates obtained from resampling, where \(\alpha\) is the level of significance (1 - confidence, default = 0.05 for 95% CIs). For example, a 95% CI uses the 2.5th and 97.5th percentiles.
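A minimal NumPy sketch of the percentile method for a univariate statistic follows. The function `percentile_ci` is hypothetical and meant only to show how the two percentiles are taken from the bootstrap distribution; it is not Pingouin's code.

```python
import numpy as np


def percentile_ci(x, func, confidence=0.95, n_boot=2000, seed=None):
    """Percentile bootstrap CI of a univariate statistic (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    alpha = 1 - confidence
    # Bootstrap distribution: recompute the statistic on n_boot resamples of x
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = np.array([func(x[i]) for i in idx])
    # Bounds are the 100*alpha/2 and 100*(1 - alpha/2) percentiles
    return np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])


x = np.random.default_rng(42).normal(loc=4, scale=2, size=100)
print(percentile_ci(x, lambda s: np.std(s, ddof=1), seed=123))
```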

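The basic (pivotal) method (basic) reflects the bootstrap distribution around the observed statistic \(\hat{\theta}\): the interval is \(\left[2\hat{\theta} - \theta^*_{1 - \alpha/2},\; 2\hat{\theta} - \theta^*_{\alpha/2}\right]\), where \(\theta^*_{q}\) denotes the \(100 \times q\) percentile of the bootstrap distribution.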
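The reflection can be sketched in a few lines of NumPy: compute the same two percentiles as the percentile method, then mirror them around \(\hat{\theta}\), so the upper percentile sets the lower bound and vice versa. The name `basic_ci` is hypothetical; this illustrates the formula rather than reproducing Pingouin's code.

```python
import numpy as np


def basic_ci(x, func, confidence=0.95, n_boot=2000, seed=None):
    """Basic (pivotal) bootstrap CI of a univariate statistic (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    alpha = 1 - confidence
    theta_hat = func(x)
    # Bootstrap distribution of the statistic
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = np.array([func(x[i]) for i in idx])
    q_lo, q_hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Reflect the percentiles around the observed statistic
    return np.array([2 * theta_hat - q_hi, 2 * theta_hat - q_lo])


x = np.random.default_rng(7).exponential(size=60)
print(basic_ci(x, np.mean, seed=0))
```

For a skewed bootstrap distribution, the basic interval leans the opposite way from the percentile interval, which is the intended correction when the estimator's error distribution is approximately pivotal.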

The normal approximation method (norm) calculates the confidence intervals with the standard normal distribution using bootstrapped bias and standard error.
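A standard form of this interval is \(\hat{\theta} - b \pm z_{1-\alpha/2}\,\widehat{\mathrm{se}}\), where the bias \(b\) is the mean of the bootstrap distribution minus \(\hat{\theta}\) and \(\widehat{\mathrm{se}}\) is the standard deviation of the bootstrap distribution. The sketch below implements that form with NumPy and the standard library's `statistics.NormalDist`; the name `normal_ci` is hypothetical, and Pingouin's exact choices (e.g. the ddof used for the standard error) may differ.

```python
from statistics import NormalDist

import numpy as np


def normal_ci(x, func, confidence=0.95, n_boot=2000, seed=None):
    """Normal-approximation bootstrap CI (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    theta_hat = func(x)
    # Bootstrap distribution of the statistic
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = np.array([func(x[i]) for i in idx])
    bias = boot.mean() - theta_hat  # bootstrapped bias
    se = boot.std(ddof=1)  # bootstrapped standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. ~1.96 for 95%
    center = theta_hat - bias  # bias-corrected point estimate
    return np.array([center - z * se, center + z * se])


x = np.random.default_rng(3).normal(size=80)
print(normal_ci(x, np.mean, seed=1))
```

Because the interval is symmetric around the bias-corrected estimate, it ignores any skewness in the bootstrap distribution; the BCa method is designed to handle that case.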

References

  • DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 189-212.

  • Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application (Vol. 1). Cambridge University Press.

  • Jung, Lee, Gupta, & Cho (2019). Comparison of bootstrap confidence interval methods for GSCA using a Monte Carlo simulation. Frontiers in Psychology, 10, 2215.

Examples

  1. Bootstrapped 95% BCa confidence interval of a Pearson correlation

>>> import pingouin as pg
>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> x = rng.normal(loc=4, scale=2, size=100)
>>> y = rng.normal(loc=3, scale=1, size=100)
>>> stat = np.corrcoef(x, y)[0][1]
>>> ci = pg.compute_bootci(x, y, func="pearson", paired=True, seed=42, decimals=4)
>>> print(round(stat, 4), ci)
0.0945 [-0.0986  0.2868]

  2. Bootstrapped 95% BCa confidence interval of a Cohen's d

>>> stat = pg.compute_effsize(x, y, eftype="cohen")
>>> ci = pg.compute_bootci(x, y, func="cohen", seed=42, decimals=3)
>>> print(round(stat, 4), ci)
0.7009 [0.413 1.01 ]

  3. Bootstrapped 95% BCa confidence interval of a standard deviation (univariate)

>>> stat = np.std(x, ddof=1)
>>> ci = pg.compute_bootci(x, func="std", seed=123)
>>> print(round(stat, 4), ci)
1.5534 [1.39 1.81]

Changing the confidence interval method

>>> pg.compute_bootci(x, func="std", seed=123, method="norm")
array([1.37, 1.76])
>>> pg.compute_bootci(x, func="std", seed=123, method="percentile")
array([1.36, 1.75])

  4. Bootstrapped confidence interval using a custom univariate function

>>> from scipy.stats import skew
>>> round(skew(x), 4), pg.compute_bootci(x, func=skew, n_boot=10000, seed=123)
(-0.137, array([-0.51,  0.38]))

  5. Bootstrapped confidence interval using a custom bivariate function. Here, x and y are not paired and can therefore have different sizes.

>>> def mean_diff(x, y):
...     return np.mean(x) - np.mean(y)
>>> y2 = rng.normal(loc=3, scale=1, size=200)  # y2 has 200 samples, x has 100
>>> ci = pg.compute_bootci(x, y2, func=mean_diff, n_boot=10000, seed=123)
>>> print(round(mean_diff(x, y2), 2), ci)
0.88 [0.54 1.21]

We can also get the bootstrapped distribution by setting return_dist=True

>>> ci, bt = pg.compute_bootci(x, y2, func=mean_diff, n_boot=10000, return_dist=True, seed=9)
>>> print(
...     f"The bootstrap distribution has {bt.size} samples. The mean and standard "
...     f"deviation are {bt.mean():.4f} ± {bt.std():.4f}"
... )
The bootstrap distribution has 10000 samples. The mean and standard deviation are 0.8792 ± 0.1707