pingouin.wilcoxon
pingouin.wilcoxon(x, y, tail='two-sided')
Wilcoxon signed-rank test. It is the non-parametric version of the paired T-test.
- Parameters
- x, y : array_like
First and second set of observations. x and y must be related (e.g. repeated measures) and, therefore, have the same number of samples. Note that a listwise deletion of missing values is automatically applied.
- tail : string
Specify whether to return a ‘one-sided’ or ‘two-sided’ p-value. Can also be ‘greater’ or ‘less’ to specify the direction of the test. If tail='one-sided', the alternative of the test will be automatically detected by looking at the sign of the median of the differences between x and y. For instance, if np.median(x - y) > 0 and tail='one-sided', Pingouin will automatically set tail='greater', and vice versa.
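The input handling described above can be sketched as follows. This is only an illustration under stated assumptions, not Pingouin's actual code: `prepare_inputs` is a hypothetical helper name, missing values are assumed to be encoded as NaN, and a median difference of exactly zero is assumed to resolve to 'less' (the text above does not say what happens in that case).

```python
import numpy as np


def prepare_inputs(x, y, tail="two-sided"):
    """Hypothetical helper: drop incomplete pairs, then resolve 'one-sided'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same number of samples")

    # Listwise deletion: keep only the pairs where both values are present.
    # Assumption: missing values are encoded as NaN.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]

    # Resolve 'one-sided' from the sign of the median of the differences.
    # Assumption: a median of exactly 0 falls through to 'less'.
    if tail == "one-sided":
        tail = "greater" if np.median(x - y) > 0 else "less"
    return x, y, tail


x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13]
y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16]
_, _, resolved = prepare_inputs(x, y, tail="one-sided")
print(resolved)  # less, because the median of x - y is -1.5
```

Dropping a pair as soon as either value is missing keeps x and y aligned, which the test requires because it works on the pairwise differences.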
- Returns
- stats : pandas.DataFrame
'W-val' : W-value
'p-val' : p-value
'RBC' : matched pairs rank-biserial correlation (effect size)
'CLES' : common language effect size
Notes
The Wilcoxon signed-rank test [1] tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. A continuity correction is applied by default (see scipy.stats.wilcoxon() for details).
The matched pairs rank biserial correlation [2] is the simple difference between the proportion of favorable evidence (f) and the proportion of unfavorable evidence (u); in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums (Kerby 2014):
\[r = f - u\]
The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
The advantages of this method are twofold. First, the brute-force approach pairs each observation of x to its y counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
When tail is 'less', the CLES is then set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.
Warning
Versions of Pingouin below 0.2.6 gave wrong two-sided p-values for the Wilcoxon test. P-values were accidentally squared, and therefore smaller. This issue has been resolved in Pingouin>=0.2.6. Make sure to always use the latest release.
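As a rough illustration of the simple difference formula r = f - u from the Notes, the sketch below computes the rank sums by hand with scipy.stats.rankdata. It assumes that zero differences are dropped before ranking and that the positive rank sum counts as the favorable evidence; both choices are assumptions made for this example, although they do reproduce the W-val and RBC shown in the Examples section.

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13])
y = np.array([38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16])

d = x - y
d = d[d != 0]                  # assumption: zero differences are discarded
ranks = rankdata(np.abs(d))    # tied absolute differences get average ranks

r_plus = ranks[d > 0].sum()    # rank sum of the positive differences (x > y)
r_minus = ranks[d < 0].sum()   # rank sum of the negative differences (x < y)
total = ranks.sum()

f = r_plus / total             # proportion of favorable evidence (assumed: x > y)
u = r_minus / total            # proportion of unfavorable evidence
rbc = f - u

print(min(r_plus, r_minus))    # 20.5
print(round(float(rbc), 6))    # -0.378788
```

The smaller of the two rank sums is the W statistic, so the same intermediate quantities give both the test statistic and the effect size.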
References
- [1] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.
- [2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
- [3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
- [4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
Wilcoxon test on two related samples.
>>> import numpy as np
>>> import pingouin as pg
>>> x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13]
>>> y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16]
>>> pg.wilcoxon(x, y, tail='two-sided')
          W-val       tail     p-val       RBC      CLES
Wilcoxon   20.5  two-sided  0.285765 -0.378788  0.395833
Compare with SciPy
>>> import scipy
>>> scipy.stats.wilcoxon(x, y, correction=True)
WilcoxonResult(statistic=20.5, pvalue=0.2857652190231508)
One-sided tail: one can either manually specify the alternative hypothesis
>>> pg.wilcoxon(x, y, tail='greater')
          W-val     tail     p-val       RBC      CLES
Wilcoxon   20.5  greater  0.876244 -0.378788  0.395833
>>> pg.wilcoxon(x, y, tail='less')
          W-val  tail     p-val       RBC      CLES
Wilcoxon   20.5  less  0.142883 -0.378788  0.604167
Or simply leave it to Pingouin, using the ‘one-sided’ argument, in which case Pingouin will look at the sign of the median of the differences between x and y and adjust the tail based on that:
>>> np.median(np.array(x) - np.array(y))
-1.5
The median is negative, so Pingouin will test for the alternative hypothesis that the median of the differences is negative (= less than 0).
>>> pg.wilcoxon(x, y, tail='one-sided')  # Equivalent to tail = 'less'
          W-val  tail     p-val       RBC      CLES
Wilcoxon   20.5  less  0.142883 -0.378788  0.604167
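The CLES values in the outputs above can be checked by hand with the Vargha and Delaney formula from the Notes, CL = P(X > Y) + .5 × P(X = Y). The sketch below is one possible reading of the "brute-force" approach, not a copy of Pingouin's code: it assumes that every value of x is compared with every value of y using NumPy broadcasting, which is the reading that reproduces the numbers shown above.

```python
import numpy as np

x = np.array([20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13])
y = np.array([38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16])

# Assumption: compare every x value with every y value (a 12 x 12 grid).
diff = x[:, None] - y[None, :]

# CL = P(X > Y) + 0.5 * P(X = Y), with ties counted as half a win.
cl = (diff > 0).mean() + 0.5 * (diff == 0).mean()
cl_less = 1 - cl  # value reported when tail='less'

print(round(float(cl), 6))       # 0.395833
print(round(float(cl_less), 6))  # 0.604167
```

Counting ties as half a win is what lets the statistic handle ordinal data, where equal values are common.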