pingouin.wilcoxon

pingouin.wilcoxon(x, y, tail='two-sided')
Wilcoxon signed-rank test. It is the non-parametric version of the paired T-test.
 Parameters
 x, y : array_like
First and second set of observations. x and y must be related (e.g. repeated measures) and, therefore, have the same number of samples. Note that a listwise deletion of missing values is automatically applied.
 tail : string
Specify whether to return a 'one-sided' or 'two-sided' p-value. Can also be 'greater' or 'less' to specify the direction of the test. If tail='one-sided', the alternative of the test will be automatically detected by looking at the sign of the median of the differences between x and y. For instance, if np.median(x - y) > 0 and tail='one-sided', Pingouin will automatically set tail='greater', and vice versa.
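The listwise deletion mentioned for x and y can be pictured with the short sketch below. It is only an illustration of the idea (drop every pair in which either value is missing), not Pingouin's internal code; the helper name `listwise_delete` is hypothetical.

```python
import numpy as np


def listwise_delete(x, y):
    """Hypothetical helper: drop every pair in which x or y is NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same number of samples")
    # Keep a pair only if both of its members are present.
    keep = ~(np.isnan(x) | np.isnan(y))
    return x[keep], y[keep]


# The second and third pairs each contain a missing value and are removed.
x_clean, y_clean = listwise_delete([20, 22, np.nan, 18], [38, np.nan, 33, 12])
print(x_clean, y_clean)
```

Because whole pairs are removed, the two arrays stay aligned and keep the same length, which is what a paired test requires.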
 Returns
 stats : pandas.DataFrame
'W-val': W-value
'p-val': p-value
'RBC': matched pairs rank-biserial correlation (effect size)
'CLES': common language effect size
Notes
The Wilcoxon signed-rank test [1] tests the null hypothesis that two related paired samples come from the same distribution. In particular, it tests whether the distribution of the differences x - y is symmetric about zero. A continuity correction is applied by default (see scipy.stats.wilcoxon() for details).

The matched pairs rank-biserial correlation [2] is the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums (Kerby 2014):

\[r = f - u\]

The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]

The advantages of this method are twofold. First, the brute-force approach pairs each observation of x to its y counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.

When tail is 'less', the CLES is then set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.

Warning
Versions of Pingouin below 0.2.6 gave wrong two-sided p-values for the Wilcoxon test. P-values were accidentally squared, and therefore smaller. This issue has been resolved in Pingouin>=0.2.6. Make sure to always use the latest release.
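The two effect sizes described in these notes can be recomputed by hand with NumPy and SciPy. The sketch below is not Pingouin's implementation. It assumes that zero differences are dropped before ranking and that the brute-force CLES compares every value of x with every value of y; under those assumptions it reproduces the RBC and CLES values reported in the Examples section.

```python
import numpy as np
from scipy.stats import rankdata

x = np.array([20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13])
y = np.array([38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16])

# Matched pairs rank-biserial correlation: r = f - u
d = x - y
d = d[d != 0]                    # assumption: zero differences are discarded
ranks = rankdata(np.abs(d))      # ranks of |d|, ties receive the average rank
total = ranks.sum()
f = ranks[d > 0].sum() / total   # favorable evidence: rank sum where x > y
u = ranks[d < 0].sum() / total   # unfavorable evidence: rank sum where x < y
rbc = f - u

# Common language effect size: CL = P(X > Y) + .5 * P(X = Y)
# assumption: brute force over all len(x) * len(y) combinations
diff = x[:, None] - y[None, :]
cles = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

print(round(rbc, 6), round(cles, 6), round(1 - cles, 6))
```

The last printed value, 1 - CL, is the quantity reported as CLES when tail is 'less'.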
References
[1] Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the "CL" Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics: A Quarterly Publication Sponsored by the American Educational Research Association and the American Statistical Association, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
Wilcoxon test on two related samples.
>>> import numpy as np
>>> import pingouin as pg
>>> x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13]
>>> y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16]
>>> pg.wilcoxon(x, y, tail='two-sided')
          W-val       tail     p-val       RBC      CLES
Wilcoxon   20.5  two-sided  0.285765 -0.378788  0.395833
Compare with SciPy:
>>> import scipy.stats
>>> scipy.stats.wilcoxon(x, y, correction=True)
WilcoxonResult(statistic=20.5, pvalue=0.2857652190231508)
One-sided tail: one can either manually specify the alternative hypothesis:
>>> pg.wilcoxon(x, y, tail='greater')
          W-val     tail     p-val       RBC      CLES
Wilcoxon   20.5  greater  0.876244 -0.378788  0.395833
>>> pg.wilcoxon(x, y, tail='less')
          W-val  tail     p-val       RBC      CLES
Wilcoxon   20.5  less  0.142883 -0.378788  0.604167
Or simply leave it to Pingouin, using the 'one-sided' argument, in which case Pingouin will look at the sign of the median of the differences between x and y and adjust the tail based on that:
>>> np.median(np.array(x) - np.array(y))
-1.5
The median is negative, so Pingouin will test for the alternative hypothesis that the median of the differences is negative (= less than 0).
>>> pg.wilcoxon(x, y, tail='one-sided')  # Equivalent to tail='less'
          W-val  tail     p-val       RBC      CLES
Wilcoxon   20.5  less  0.142883 -0.378788  0.604167
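The automatic tail detection described above can be sketched as a small standalone function. This is an illustration of the documented rule (the sign of the median of x - y), not Pingouin's source code; the name `resolve_tail` is hypothetical, and the behaviour when the median is exactly zero is an assumption, since the documentation only covers the positive and negative cases.

```python
import numpy as np


def resolve_tail(x, y, tail="one-sided"):
    """Hypothetical helper: map 'one-sided' to 'greater' or 'less'."""
    if tail != "one-sided":
        # 'two-sided', 'greater' and 'less' are passed through unchanged.
        return tail
    median_diff = np.median(np.asarray(x) - np.asarray(y))
    # Assumption: a median of exactly zero falls into the 'less' branch.
    return "greater" if median_diff > 0 else "less"


x = [20, 22, 19, 20, 22, 18, 24, 20, 19, 24, 26, 13]
y = [38, 37, 33, 29, 14, 12, 20, 22, 17, 25, 26, 16]
print(resolve_tail(x, y))  # median of x - y is -1.5
print(resolve_tail(y, x))  # swapping the samples flips the sign
```

Swapping x and y flips the sign of the median, so the same data resolve to the opposite direction.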