pingouin.compute_effsize

pingouin.compute_effsize(x, y, paired=False, eftype='cohen')

Calculate the effect size between two sets of observations.

Parameters:
x : np.array or list

First set of observations.

y : np.array, list, or float

Second set of observations, or a scalar representing the population mean for a one-sample test (e.g. y=0 to compare x against zero).

paired : boolean

If True, uses the Cohen d-avg formula to correct for repeated measurements (see Notes).

eftype : string

Desired output effect size. Available methods are:

  • 'none': no effect size

  • 'cohen': Unbiased Cohen d

  • 'cohen_dz': Cohen \(d_z\) (paired samples only, see Notes)

  • 'hedges': Hedges g

  • 'r': Pearson correlation coefficient

  • 'pointbiserialr': Point-biserial correlation

  • 'eta_square': Eta-square

  • 'odds_ratio': Odds ratio

  • 'AUC': Area Under the Curve

  • 'CLES': Common Language Effect Size

Returns:
ef : float

Effect size

See also

convert_effsize

Conversion between effect sizes.

compute_effsize_from_t

Convert a T-statistic to an effect size.

Notes

Missing values are automatically removed from the data. If x and y are paired, the entire row is removed.
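The removal rule can be sketched in plain Python. This is only an illustration, assuming missing values are encoded as NaN; `drop_missing` is a hypothetical helper, not part of Pingouin, and the library's internal handling may differ.

```python
from math import isnan, nan


def drop_missing(x, y, paired=False):
    """Hypothetical helper: drop NaN values from two lists of floats."""
    if paired:
        # Paired data: drop the whole pair if either value is missing.
        kept = [(a, b) for a, b in zip(x, y) if not (isnan(a) or isnan(b))]
        return [a for a, _ in kept], [b for _, b in kept]
    # Independent data: each sample is cleaned separately.
    return [a for a in x if not isnan(a)], [b for b in y if not isnan(b)]


x = [1.0, nan, 3.0, 4.0]
y = [2.0, 5.0, nan, 8.0]
x_paired, y_paired = drop_missing(x, y, paired=True)
x_indep, y_indep = drop_missing(x, y, paired=False)
print(x_paired, y_paired)  # [1.0, 4.0] [2.0, 8.0]
print(x_indep, y_indep)    # [1.0, 3.0, 4.0] [2.0, 5.0, 8.0]
```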

If x and y are independent, the Cohen \(d\) is:

\[d = \frac{\overline{X} - \overline{Y}}{\sqrt{\frac{(n_{1} - 1)\sigma_{1}^{2} + (n_{2} - 1)\sigma_{2}^{2}}{n_{1} + n_{2} - 2}}}\]
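As a minimal sketch, the pooled-variance formula can be reproduced with the standard library alone. `cohen_d_independent` is a hypothetical name used for illustration, and the sample variances are assumed to use the n - 1 denominator.

```python
from math import sqrt
from statistics import mean, variance


def cohen_d_independent(x, y):
    """Cohen d with a pooled standard deviation (illustrative helper)."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


d = cohen_d_independent([1, 2, 3, 4], [3, 4, 5, 6, 7])
print(round(d, 4))  # -1.7078
```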

If x and y are paired, the Cohen \(d_{avg}\) is computed:

\[d_{avg} = \frac{\overline{X} - \overline{Y}} {\sqrt{\frac{(\sigma_1^2 + \sigma_2^2)}{2}}}\]
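The \(d_{avg}\) formula differs from the independent case only in the denominator, which averages the two variances. A small sketch, with `cohen_d_avg` as a hypothetical helper name and sample variances assumed to use the n - 1 denominator:

```python
from math import sqrt
from statistics import mean, variance


def cohen_d_avg(x, y):
    """Cohen d_avg: mean difference over the root of the averaged variances."""
    return (mean(x) - mean(y)) / sqrt((variance(x) + variance(y)) / 2)


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
d_avg = cohen_d_avg(x, y)
print(round(d_avg, 4))  # -0.8783
```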

The Cohen \(d_z\) (eftype='cohen_dz') uses the standard deviation of the difference scores (Lakens, 2013):

\[d_z = \frac{\overline{X - Y}}{\sigma_{X-Y}}\]

Note that \(d_z = t / \sqrt{n}\), where \(t\) is the paired t-statistic and \(n\) is the number of pairs. cohen_dz requires paired=True; if paired=False, a warning is raised and the standard Cohen's d is returned instead.
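The identity \(d_z = t / \sqrt{n}\) can be checked by hand. The sketch below computes the paired t-statistic directly from the difference scores rather than through a library call; `cohen_dz` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def cohen_dz(x, y):
    """Cohen d_z: mean of the difference scores over their standard deviation."""
    diff = [a - b for a, b in zip(x, y)]
    return mean(diff) / stdev(diff)


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
dz = cohen_dz(x, y)

# Paired t-statistic computed by hand: t = mean(diff) / (sd(diff) / sqrt(n)).
diff = [a - b for a, b in zip(x, y)]
n = len(diff)
t = mean(diff) / (stdev(diff) / sqrt(n))

print(round(dz, 4))           # -1.3887
print(round(t / sqrt(n), 4))  # -1.3887
```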

One-sample test: if y is a scalar (e.g. a known population mean \(\mu\)), the effect size is computed as:

\[d = \frac{\overline{X} - \mu}{\sigma_X}\]
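A brief sketch of the one-sample case, assuming the sample standard deviation (n - 1 denominator) is used for \(\sigma_X\); `cohen_d_one_sample` is a hypothetical helper name.

```python
from statistics import mean, stdev


def cohen_d_one_sample(x, mu):
    """One-sample Cohen d: distance of the sample mean from mu in SD units."""
    return (mean(x) - mu) / stdev(x)


d_one = cohen_d_one_sample([1, 2, 3, 4, 5, 6, 7], mu=0)
print(round(d_one, 4))  # 1.8516
```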

Cohen's d is a biased estimate of the population effect size, especially for small samples (n < 20). It is often preferable to use the corrected Hedges \(g\) instead:

\[g = d \times \left(1 - \frac{3}{4(n_{1} + n_{2}) - 9}\right)\]
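To make the size of the correction concrete, the sketch below applies it to a \(d_{avg}\) computed from two paired samples of seven observations each. `hedges_g` is a hypothetical helper name, not Pingouin's API.

```python
from math import sqrt
from statistics import mean, variance


def hedges_g(d, n1, n2):
    """Apply the small-sample correction factor to a Cohen d."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
d_avg = (mean(x) - mean(y)) / sqrt((variance(x) + variance(y)) / 2)
g = hedges_g(d_avg, len(x), len(y))
print(round(d_avg, 4), round(g, 4))  # -0.8783 -0.8222
```

With n1 + n2 = 14 the correction factor is 1 - 3/47, so g is about 6% smaller in magnitude than d.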

The common language effect size is the proportion of pairs where x is higher than y (calculated with a brute-force approach where each observation of x is paired to each observation of y, see pingouin.wilcoxon() for more details):

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
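The brute-force pairing can be sketched in a few lines: every observation of x is compared with every observation of y, and ties count for one half. `cles` is a hypothetical helper written for illustration and is not necessarily how Pingouin implements it.

```python
from itertools import product


def cles(x, y):
    """Common language effect size: P(X > Y) + 0.5 * P(X = Y) over all pairs."""
    pairs = list(product(x, y))
    greater = sum(a > b for a, b in pairs)
    ties = sum(a == b for a, b in pairs)
    return (greater + 0.5 * ties) / len(pairs)


x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 3, 5, 7, 9, 11, 13]
cl_xy = cles(x, y)
cl_yx = cles(y, x)
print(round(cl_xy, 4), round(cl_yx, 4))  # 0.2857 0.7143
```

Because each pair is either greater, lower, or tied, the two directions always sum to 1.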

For all other effect sizes, Pingouin first calculates a Cohen \(d\) and then uses pingouin.convert_effsize() to convert it to the desired effect size.

References

  • Lakens, D., 2013. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front. Psychol. 4, 863. https://doi.org/10.3389/fpsyg.2013.00863

  • Cumming, G., 2013. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.

  • https://osf.io/vbdah/

Examples

  1. Cohen d from two independent samples.

>>> import numpy as np
>>> import pingouin as pg
>>> x = [1, 2, 3, 4]
>>> y = [3, 4, 5, 6, 7]
>>> pg.compute_effsize(x, y, paired=False, eftype="cohen")
-1.707825127659933

The sign of the Cohen d will be opposite if we reverse the order of x and y:

>>> pg.compute_effsize(y, x, paired=False, eftype="cohen")
1.707825127659933

  2. Hedges g from two paired samples.

>>> x = [1, 2, 3, 4, 5, 6, 7]
>>> y = [1, 3, 5, 7, 9, 11, 13]
>>> pg.compute_effsize(x, y, paired=True, eftype="hedges")
-0.8222477210374874

  3. Common Language Effect Size.

>>> pg.compute_effsize(x, y, eftype="cles")
0.2857142857142857

In other words, x is higher than y in ~29% of pairs (with ties counted as half), which means that x is lower than y in ~71% of pairs. This can be easily verified by swapping x and y:

>>> pg.compute_effsize(y, x, eftype="cles")
0.7142857142857143

  4. One-sample Cohen d: compare x against a known population mean (e.g. 0).

>>> x = [1, 2, 3, 4, 5, 6, 7]
>>> pg.compute_effsize(x, y=0, eftype="cohen")
1.8516401995451028

This is equivalent to (mean(x) - 0) / std(x, ddof=1).

  5. Cohen \(d_z\) for paired samples.

>>> x = [1, 2, 3, 4, 5, 6, 7]
>>> y = [1, 3, 5, 7, 9, 11, 13]
>>> pg.compute_effsize(x, y, paired=True, eftype="cohen_dz")
-1.3887301496588271