pingouin.normality

pingouin.normality(*args, alpha=0.05)

Shapiro-Wilk univariate normality test.

Parameters
sample1, sample2, … : array_like

Arrays of sample data, one per sample. The arrays may be of different lengths.

alpha : float

Significance level of the test. Default is 0.05.

Returns
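normal : boolean

True if the null hypothesis of normality is not rejected at the chosen alpha level, False otherwise. When several samples are passed, an array with one boolean per sample.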

p : float

P-value of the test. When several samples are passed, an array with one p-value per sample.

See also

homoscedasticity

Test equality of variance.

sphericity

Mauchly’s test for sphericity.

Notes

The Shapiro-Wilk test calculates a \(W\) statistic that tests whether a random sample \(x_1, x_2, ..., x_n\) comes from a normal distribution.

The \(W\) statistic is calculated as follows:

\[W = \frac{(\sum_{i=1}^n a_i x_{i})^2} {\sum_{i=1}^n (x_i - \overline{x})^2}\]

where the \(x_i\) are the ordered sample values (in ascending order) and the \(a_i\) are constants generated from the means, variances and covariances of the order statistics of a sample of size \(n\) from a standard normal distribution. Specifically:

\[(a_1, ..., a_n) = \frac{m^TV^{-1}}{(m^TV^{-1}V^{-1}m)^{1/2}}\]

where \(m = (m_1, ..., m_n)^T\) is the vector of expected values of the order statistics of \(n\) independent and identically distributed random variables sampled from the standard normal distribution, and \(V\) is the covariance matrix of those order statistics.
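For illustration, the statistic can be approximated in a few lines of NumPy/SciPy. This is a minimal sketch under stated assumptions, not pingouin's implementation: it swaps in the Shapiro-Francia simplification (\(V\) replaced by the identity, so \(a = m / \lVert m \rVert\)) and approximates \(m\) with Blom's formula \(m_i \approx \Phi^{-1}((i - 3/8)/(n + 1/4))\). The helper name `shapiro_francia_w` is hypothetical; `scipy.stats.shapiro` is used only as a reference value.

```python
import numpy as np
from scipy import stats


def shapiro_francia_w(x):
    """Hypothetical helper: Shapiro-Francia approximation W' of the W statistic.

    Assumptions: V is replaced by the identity matrix, so a = m / ||m||,
    and m is approximated with Blom's formula. This is NOT the exact
    Shapiro-Wilk computation.
    """
    x = np.sort(np.asarray(x, dtype=float))  # ordered sample x_(1) <= ... <= x_(n)
    n = x.size
    i = np.arange(1, n + 1)
    # Blom's approximation of the expected standard normal order statistics m_i
    m = stats.norm.ppf((i - 0.375) / (n + 0.25))
    a = m / np.sqrt(m @ m)  # coefficients scaled to unit norm
    numerator = (a @ x) ** 2
    denominator = np.sum((x - x.mean()) ** 2)
    return numerator / denominator


rng = np.random.default_rng(123)
sample = rng.normal(size=100)
w_approx = shapiro_francia_w(sample)
w_exact, _ = stats.shapiro(sample)  # SciPy's Shapiro-Wilk W, for comparison
print(round(w_approx, 4), round(w_exact, 4))
```

Dropping \(V\) avoids computing the covariance matrix of the order statistics, at the cost of slightly different coefficients in the tails; for moderate \(n\) the two statistics are usually close.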

The null hypothesis of this test is that the population is normally distributed. Thus, if the p-value is less than the chosen alpha level (typically set at 0.05), the null hypothesis is rejected and there is evidence that the data tested are not normally distributed.
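A minimal sketch of this decision rule applied to several samples at once. It assumes `scipy.stats.shapiro` as the underlying Shapiro-Wilk implementation and treats \(p > \alpha\) as "not rejected"; the helper `normality_decision` is hypothetical and only mirrors the behaviour documented above, not pingouin's source.

```python
import numpy as np
from scipy import stats


def normality_decision(*samples, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk p-value and decision for each sample.

    Assumption: scipy.stats.shapiro is used as the underlying test.
    """
    pvals = np.array([stats.shapiro(s).pvalue for s in samples])
    # Keep H0 (normality) when p > alpha; reject it otherwise
    normal = pvals > alpha
    return normal, pvals


rng = np.random.default_rng(0)
x = rng.normal(size=100)       # drawn from a normal distribution
y = rng.exponential(size=100)  # strongly right-skewed, clearly non-normal
normal, p = normality_decision(x, y, alpha=0.05)
print(normal, p.round(3))
```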

The result of the Shapiro-Wilk test should be interpreted with caution for large sample sizes. As Wikipedia [3] puts it:

“Like most statistical significance tests, if the sample size is sufficiently large this test may detect even trivial departures from the null hypothesis (i.e., although there may be some statistically significant effect, it may be too small to be of any practical significance); thus, additional investigation of the effect size is typically advisable, e.g., a Q–Q plot in this case.”
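To complement the p-value, one can look at how far the sample actually departs from normality, for example through the correlation \(r\) of a normal Q-Q plot. The sketch below assumes SciPy (`scipy.stats.shapiro`, `scipy.stats.probplot`); the Student t sample with 10 degrees of freedom and the use of \(r\) as an effect-size proxy are illustrative choices, not part of pingouin.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Illustrative assumption: Student t with 10 degrees of freedom,
# close to normal but with slightly heavier tails
x = rng.standard_t(df=10, size=5000)

w, p = stats.shapiro(x)
# Q-Q correlation: r close to 1 means the points lie near a straight line
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(f"W = {w:.4f}, p = {p:.2g}, Q-Q r = {r:.4f}")
```

With a few thousand observations the p-value can be very small even when \(r\) stays close to 1, which is the situation the quote warns about.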

References

[1] Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3/4), 591-611.

[2] https://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm

[3] https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test

Examples

  1. Test the normality of one array.

>>> import numpy as np
>>> from pingouin import normality
>>> np.random.seed(123)
>>> x = np.random.normal(size=100)
>>> normal, p = normality(x, alpha=.05)
>>> print(normal, p)
True 0.275

  2. Test the normality of two arrays.

>>> import numpy as np
>>> from pingouin import normality
>>> np.random.seed(123)
>>> x = np.random.normal(size=100)
>>> y = np.random.rand(100)
>>> normal, p = normality(x, y, alpha=.05)
>>> print(normal, p)
[ True False] [0.275 0.001]