pingouin.mwu

pingouin.mwu(x, y, tail='two-sided')

Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.
Parameters

- x, y : array_like
  First and second set of observations. x and y must be independent.
- tail : string
  Specify whether to return a 'one-sided' or 'two-sided' p-value. Can also be 'greater' or 'less' to specify the direction of the test. If tail='one-sided', the alternative of the test will be automatically detected by comparing the medians of x and y. For instance, if median(x) < median(y) and tail='one-sided', Pingouin will automatically set tail='less', and vice versa.
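The auto-detection rule described above can be sketched as follows. This is an illustration of the documented behavior only, not Pingouin's actual source; `resolve_tail` is a hypothetical helper name:

```python
import numpy as np

def resolve_tail(x, y, tail):
    """Sketch of the documented 'one-sided' auto-detection rule:
    with tail='one-sided', pick 'less' when median(x) < median(y),
    and 'greater' otherwise. Any other tail value passes through."""
    if tail == 'one-sided':
        return 'less' if np.median(x) < np.median(y) else 'greater'
    return tail

print(resolve_tail([1, 2, 3], [4, 5, 6], 'one-sided'))  # 'less'
```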
Returns

- stats : pandas.DataFrame
  - 'U-val' : U-value
  - 'p-val' : p-value
  - 'RBC' : rank-biserial correlation
  - 'CLES' : common language effect size
Notes

The Mann–Whitney U test [1] (also called the Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see scipy.stats.mannwhitneyu() for details).

The rank-biserial correlation [2] is the proportion of favorable evidence minus the proportion of unfavorable evidence.
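The favorable-minus-unfavorable definition can be written as a short brute-force sketch. This illustrates the definition only, not Pingouin's internal code, and sign conventions can differ between implementations:

```python
import numpy as np

def rank_biserial(x, y):
    """Brute-force rank-biserial correlation: proportion of pairs where
    x > y (favorable evidence) minus proportion of pairs where x < y
    (unfavorable evidence), over all n1 * n2 pairs."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diffs = x[:, None] - y[None, :]   # all n1 * n2 pairwise differences
    favorable = np.mean(diffs > 0)
    unfavorable = np.mean(diffs < 0)
    return favorable - unfavorable
```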
The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]

The advantages of this method are twofold. First, the brute-force approach pairs each observation of x with its y counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.

When tail is 'less', the CLES is then set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.
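The brute-force CL formula above can be sketched in a few lines of NumPy. This is an illustration of the formula, not Pingouin's internal code:

```python
import numpy as np

def cles_brute_force(x, y):
    """Brute-force common language effect size,
    CL = P(X > Y) + 0.5 * P(X = Y), computed over all n1 * n2 pairs.
    Works with ties and makes no distributional assumptions."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    diffs = x[:, None] - y[None, :]   # pair every x with every y
    return np.mean(diffs > 0) + 0.5 * np.mean(diffs == 0)
```

For tail='less', the documented convention corresponds to `1 - cles_brute_force(x, y)`, the proportion of pairs where x is lower than y.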
References

[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 50-60.

[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.

[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.

[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the "CL" Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> pg.mwu(x, y, tail='two-sided')
     U-val       tail    p-val    RBC    CLES
MWU   97.0  two-sided  0.00556  0.515  0.2425
Compare with SciPy
>>> import scipy
>>> scipy.stats.mannwhitneyu(x, y, use_continuity=True,
...                          alternative='two-sided')
MannwhitneyuResult(statistic=97.0, pvalue=0.0055604599321374135)
One-sided tail: one can either manually specify the alternative hypothesis:
>>> pg.mwu(x, y, tail='greater')
     U-val     tail     p-val    RBC    CLES
MWU   97.0  greater  0.997442  0.515  0.2425
>>> pg.mwu(x, y, tail='less')
     U-val  tail    p-val    RBC    CLES
MWU   97.0  less  0.00278  0.515  0.7575
Or simply leave it to Pingouin, using the 'one-sided' argument, in which case Pingouin will compare the medians of x and y and select the most appropriate tail based on that:

>>> # Since np.median(x) < np.median(y), this is equivalent to tail='less'
>>> pg.mwu(x, y, tail='one-sided')
     U-val  tail    p-val    RBC    CLES
MWU   97.0  less  0.00278  0.515  0.7575