pingouin.mwu

pingouin.mwu(x, y, tail='two-sided')
Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.
Parameters
x, y : array_like
First and second set of observations. x and y must be independent.
tail : string
Specify whether to return 'one-sided' or 'two-sided' p-value. Can also be 'greater' or 'less' to specify the direction of the test. If tail='one-sided', the alternative of the test will be automatically detected by comparing the medians of x and y. For instance, if median(x) < median(y) and tail='one-sided', Pingouin will automatically set tail='less', and vice versa.
Returns
stats : pandas.DataFrame
'U-val': U-value
'p-val': p-value
'RBC': rank-biserial correlation
'CLES': common language effect size
Notes
The Mann–Whitney U test [1] (also called Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see scipy.stats.mannwhitneyu() for details).
The rank-biserial correlation [2] is the proportion of favorable evidence minus the proportion of unfavorable evidence.
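The simple-difference definition of the rank-biserial correlation can be sketched in a few lines of plain Python. This is only an illustration, not Pingouin's implementation: the helper name `rank_biserial` is hypothetical, and treating pairs with x > y as "favorable" is an assumption, so the sign reported by Pingouin may follow a different convention.

```python
from itertools import product


def rank_biserial(x, y):
    """Simple-difference rank-biserial correlation (illustrative helper).

    Assumption: a pair is "favorable" when x > y and "unfavorable" when
    x < y; tied pairs count toward neither proportion.
    """
    pairs = list(product(x, y))  # every (x_i, y_j) combination
    favorable = sum(xi > yi for xi, yi in pairs) / len(pairs)
    unfavorable = sum(xi < yi for xi, yi in pairs) / len(pairs)
    return favorable - unfavorable


x = [1, 2, 3]
y = [2, 3, 4]
# 1 of the 9 pairs is favorable, 6 are unfavorable, 2 are ties,
# so the result is 1/9 - 6/9 = -5/9.
print(rank_biserial(x, y))
```

The value ranges from -1 (every x below every y) to 1 (every x above every y).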
The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
The advantages of this method are twofold. First, the brute-force approach pairs each observation of x with each observation of y, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
When tail is 'less', the CLES is then set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.
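As a rough sketch of the brute-force formula above, the common language effect size can be computed by enumerating all (x, y) pairs. The function name `cles` and its `tail` argument are hypothetical stand-ins used for illustration; this is not Pingouin's actual code.

```python
from itertools import product


def cles(x, y, tail="two-sided"):
    """Brute-force CL = P(X > Y) + 0.5 * P(X = Y) (illustrative helper)."""
    pairs = list(product(x, y))  # every (x_i, y_j) combination
    p_greater = sum(xi > yi for xi, yi in pairs) / len(pairs)
    p_equal = sum(xi == yi for xi, yi in pairs) / len(pairs)
    cl = p_greater + 0.5 * p_equal
    # For tail='less', report 1 - CL, as described in the text above.
    return 1 - cl if tail == "less" else cl


x = [1, 2, 3]
y = [2, 3, 4]
print(cles(x, y))               # 1/9 + 0.5 * 2/9 = 2/9
print(cles(x, y, tail="less"))  # 1 - 2/9 = 7/9
```

Because only order comparisons are used, the same sketch works for ordinal data coded as integers.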
References
[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 50-60.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the "CL" Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> pg.mwu(x, y, tail='two-sided')
     U-val       tail    p-val    RBC    CLES
MWU   97.0  two-sided  0.00556  0.515  0.2425
Compare with SciPy
>>> import scipy.stats
>>> scipy.stats.mannwhitneyu(x, y, use_continuity=True,
...                          alternative='two-sided')
MannwhitneyuResult(statistic=97.0, pvalue=0.0055604599321374135)
One-sided tail: one can either manually specify the alternative hypothesis
>>> pg.mwu(x, y, tail='greater')
     U-val     tail     p-val    RBC    CLES
MWU   97.0  greater  0.997442  0.515  0.2425
>>> pg.mwu(x, y, tail='less')
     U-val  tail    p-val    RBC    CLES
MWU   97.0  less  0.00278  0.515  0.7575
Or simply leave it to Pingouin, using the 'one-sided' argument, in which case Pingouin will compare the medians of x and y and select the most appropriate tail based on that:
>>> # Since np.median(x) < np.median(y), this is equivalent to tail='less'
>>> pg.mwu(x, y, tail='one-sided')
     U-val  tail    p-val    RBC    CLES
MWU   97.0  less  0.00278  0.515  0.7575
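The median-based tail selection described for tail='one-sided' can be sketched roughly as follows. The helper `resolve_tail` is hypothetical and only mirrors the documented rule; in particular, how equal medians are handled is not stated in the documentation, so falling back to 'greater' in that case is an assumption of this sketch.

```python
from statistics import median


def resolve_tail(x, y, tail):
    """Map tail='one-sided' to 'less' or 'greater' (illustrative helper)."""
    if tail != "one-sided":
        # 'two-sided', 'greater' and 'less' are passed through unchanged.
        return tail
    # Documented rule: median(x) < median(y) selects 'less', and vice versa.
    # Assumption: equal medians fall through to 'greater'.
    return "less" if median(x) < median(y) else "greater"


print(resolve_tail([1, 2, 3], [4, 5, 6], "one-sided"))  # less
print(resolve_tail([4, 5, 6], [1, 2, 3], "one-sided"))  # greater
print(resolve_tail([1, 2, 3], [4, 5, 6], "two-sided"))  # two-sided
```

Specifying 'greater' or 'less' explicitly avoids relying on this data-driven choice of alternative.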