pingouin.mwu
- pingouin.mwu(x, y, alternative='two-sided', **kwargs)
Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.
- Parameters:
- x, y : array_like
First and second set of observations. x and y must be independent.
- alternative : string
Defines the alternative hypothesis, or tail of the test. Must be one of “two-sided” (default), “greater” or “less”. See scipy.stats.mannwhitneyu() for more details.
- **kwargs : dict
Additional keyword arguments that are passed to scipy.stats.mannwhitneyu().
- Returns:
- stats : pandas.DataFrame
'U-val': U-value corresponding with sample x
'alternative': tail of the test
'p-val': p-value
'RBC': rank-biserial correlation
'CLES': common language effect size
See also
Notes
The Mann–Whitney U test [1] (also called Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see scipy.stats.mannwhitneyu() for details).
The rank-biserial correlation [2] is the proportion of favorable evidence minus the proportion of unfavorable evidence. Values range from -1 to 1, with negative values indicating that y > x, and positive values indicating x > y.
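As a minimal sketch of this definition, the rank-biserial correlation can be computed by brute force over all (x, y) pairs. The helper name `rank_biserial` and the toy data are hypothetical and only illustrate the formula; this is not necessarily how Pingouin computes the value internally.

```python
from itertools import product


def rank_biserial(x, y):
    """Favorable minus unfavorable proportion of (x, y) pairs.

    Hypothetical helper for illustration only. A pair counts as
    favorable when x > y and as unfavorable when x < y; ties count
    toward neither proportion.
    """
    pairs = list(product(x, y))
    favorable = sum(xi > yi for xi, yi in pairs) / len(pairs)
    unfavorable = sum(xi < yi for xi, yi in pairs) / len(pairs)
    return favorable - unfavorable


# Toy data (assumed for illustration): 9 pairs, 1 favorable, 7 unfavorable, 1 tie.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 5.0]

print(round(rank_biserial(x, y), 4))  # -0.6667, y tends to exceed x
print(round(rank_biserial(y, x), 4))  # 0.6667, swapping the samples flips the sign
```

Swapping the two samples exchanges the favorable and unfavorable pairs, which is why the sign flips while the magnitude stays the same.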
The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
The advantages of this method are twofold. First, the brute-force approach pairs each observation of x with every observation of y, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.
When alternative is 'less', the CLES is then set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.
References
[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 50-60.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A Critique and Improvement of the “CL” Common Language Effect Size Statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> pg.mwu(x, y, alternative='two-sided')
     U-val alternative    p-val    RBC    CLES
MWU   97.0   two-sided  0.00556 -0.515  0.2425
Compare with SciPy
>>> import scipy
>>> scipy.stats.mannwhitneyu(x, y, use_continuity=True, alternative='two-sided')
MannwhitneyuResult(statistic=97.0, pvalue=0.0055604599321374135)
One-sided test
>>> pg.mwu(x, y, alternative='greater')
     U-val alternative     p-val    RBC    CLES
MWU   97.0     greater  0.997442 -0.515  0.2425
>>> pg.mwu(x, y, alternative='less')
     U-val alternative    p-val    RBC    CLES
MWU   97.0        less  0.00278 -0.515  0.7575
Passing keyword arguments to scipy.stats.mannwhitneyu():
>>> pg.mwu(x, y, alternative='two-sided', method='exact')
     U-val alternative     p-val    RBC    CLES
MWU   97.0   two-sided  0.004681 -0.515  0.2425
Reversing the order of x and y.
>>> pg.mwu(y, x)
     U-val alternative    p-val    RBC    CLES
MWU  303.0   two-sided  0.00556  0.515  0.7575
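The CLES values reported above follow the formula from the Notes, CL = P(X > Y) + .5 × P(X = Y). The following is a rough pure-Python sketch of that brute-force computation, including the switch to 1 - CL when alternative is 'less'. The function name `cles_brute_force` and the toy data are hypothetical; the sketch mirrors the documented formula and is not taken from Pingouin's source.

```python
from itertools import product


def cles_brute_force(x, y, alternative="two-sided"):
    """Common language effect size over all (x, y) pairs.

    Hypothetical helper for illustration only. Implements
    CL = P(X > Y) + 0.5 * P(X = Y), and returns 1 - CL when
    alternative is 'less', as described in the Notes.
    """
    pairs = list(product(x, y))
    greater = sum(xi > yi for xi, yi in pairs)
    ties = sum(xi == yi for xi, yi in pairs)
    cl = (greater + 0.5 * ties) / len(pairs)
    return 1 - cl if alternative == "less" else cl


# Toy data (assumed for illustration): 9 pairs, 1 with x > y and 1 tie.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 5.0]

print(round(cles_brute_force(x, y), 4))                      # 0.1667
print(round(cles_brute_force(x, y, alternative="less"), 4))  # 0.8333
print(round(cles_brute_force(y, x), 4))                      # 0.8333
```

Under this definition, CL equals the U statistic for x divided by the number of pairs, so the two-sided example above gives 97 / (20 × 20) = 0.2425, in line with the reported CLES.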