pingouin.sphericity

pingouin.sphericity(data, dv=None, within=None, subject=None, method='mauchly', alpha=0.05)

Mauchly and JNS tests for sphericity.

Parameters
data : pd.DataFrame

DataFrame containing the repeated measurements. Both wide-format and long-format dataframes are supported. To test for an interaction term between two repeated-measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.

dv : string

Name of the column containing the dependent variable (only required if data is in long format).

within : string

Name of the column containing the within-subject factor (only required if data is in long format). If within is a list with two strings, this function tests sphericity for the interaction between the two within-subject factors.

subject : string

Name of column containing the subject identifier (only required if data is in long format).

method : str

Method used to test sphericity:

'jns' : John, Nagao and Sugiura test.
'mauchly' : Mauchly test (default).
alpha : float

Significance level.

Returns
spher : boolean

True if data have the sphericity property.

W : float

Test statistic.

chi_sq : float

Chi-square statistic.

ddof : int

Degrees of freedom.

p : float

P-value.

Raises
ValueError

When testing for an interaction, if both within-subject factors have more than 2 levels (not yet supported in Pingouin).

See also

epsilon

Epsilon adjustment factor for repeated measures.

homoscedasticity

Test equality of variance.

normality

Univariate normality test.

Notes

The Mauchly \(W\) statistic is defined by:

\[W = \frac{\prod \lambda_j}{(\frac{1}{k-1} \sum \lambda_j)^{k-1}}\]

where \(\lambda_j\) are the eigenvalues of the population covariance matrix (= double-centered sample covariance matrix) and \(k\) is the number of conditions.

The \(W\) statistic is then transformed into a chi-square score using the number of observations per condition \(n\):

\[f = \frac{2(k-1)^2+k+1}{6(k-1)(n-1)}\]
\[\chi_w^2 = (f-1)(n-1) \text{log}(W)\]

The p-value is then approximated using a chi-square distribution:

\[\chi_w^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
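The formulas above can be sketched with NumPy and SciPy. This is a minimal illustration that follows these Notes step by step, not Pingouin's actual source code. The helper name `mauchly_sketch` is hypothetical, and it assumes a complete (no missing values) array with one row per subject and one column per condition. The data are the wide-format values used in the Examples section.

```python
import numpy as np
from scipy import stats


def mauchly_sketch(x, alpha=0.05):
    """Mauchly's test on an (n subjects, k conditions) array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    d = k - 1
    # Sample covariance between conditions (columns).
    S = np.cov(x, rowvar=False)
    # Double-centering: remove row and column means, add back the grand mean.
    S_pop = S - S.mean(axis=0)[:, None] - S.mean(axis=1)[None, :] + S.mean()
    # eigvalsh sorts in ascending order; the first eigenvalue is ~0
    # after double-centering, which leaves k - 1 eigenvalues.
    lam = np.linalg.eigvalsh(S_pop)[1:]
    W = np.prod(lam) / (lam.sum() / d) ** d
    # Chi-square transformation, as in the formulas above.
    f = (2 * d ** 2 + k + 1) / (6 * d * (n - 1))
    chi_sq = (f - 1) * (n - 1) * np.log(W)
    ddof = k * (k - 1) // 2 - 1
    p = stats.chi2.sf(chi_sq, ddof)
    # Assumption: sphericity is retained when p exceeds alpha.
    return p > alpha, W, chi_sq, ddof, p


# Columns are conditions A, B and C; rows are the five subjects.
x = np.array([[2.2, 1.1, 8.2],
              [3.1, 2.5, 4.5],
              [4.3, 4.1, 3.4],
              [4.1, 5.2, 6.2],
              [7.2, 6.4, 7.2]])
spher, W, chi_sq, ddof, p = mauchly_sketch(x)
print(spher, round(W, 3), round(chi_sq, 3), ddof, round(p, 4))
```

On these data the sketch should give \(W \approx 0.21\) and \(\chi_w^2 \approx 4.68\) with 2 degrees of freedom, in line with the output shown in the Examples section.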

The JNS \(V\) statistic is defined by:

\[V = \frac{(\sum_j^{k-1} \lambda_j)^2}{\sum_j^{k-1} \lambda_j^2}\]
\[\chi_v^2 = \frac{n}{2} (k-1)^2 (V - \frac{1}{k-1})\]

and the p-value is approximated using a chi-square distribution:

\[\chi_v^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
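The JNS computation can be sketched the same way. Again, this is only an illustration of the formulas in these Notes under the same assumptions (complete data, one row per subject, one column per condition); `jns_sketch` is a hypothetical helper name, not part of Pingouin.

```python
import numpy as np
from scipy import stats


def jns_sketch(x, alpha=0.05):
    """JNS test on an (n subjects, k conditions) array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    d = k - 1
    S = np.cov(x, rowvar=False)
    # Double-centered covariance matrix; drop the ~0 eigenvalue.
    S_pop = S - S.mean(axis=0)[:, None] - S.mean(axis=1)[None, :] + S.mean()
    lam = np.linalg.eigvalsh(S_pop)[1:]
    # V statistic and its chi-square transformation.
    V = lam.sum() ** 2 / np.sum(lam ** 2)
    chi_sq = n / 2 * d ** 2 * (V - 1 / d)
    ddof = k * (k - 1) // 2 - 1
    p = stats.chi2.sf(chi_sq, ddof)
    # Assumption: sphericity is retained when p exceeds alpha.
    return p > alpha, V, chi_sq, ddof, p


# Same wide-format values as in the Examples section (columns A, B, C).
x = np.array([[2.2, 1.1, 8.2],
              [3.1, 2.5, 4.5],
              [4.3, 4.1, 3.4],
              [4.1, 5.2, 6.2],
              [7.2, 6.4, 7.2]])
spher_v, V, chi_sq_v, ddof_v, p_v = jns_sketch(x)
print(spher_v, round(V, 3), round(chi_sq_v, 3), ddof_v, round(p_v, 4))
```

On these data the sketch should give \(V \approx 1.118\) and \(\chi_v^2 \approx 6.18\), so the JNS test rejects sphericity at the 0.05 level while Mauchly's test does not.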

Missing values are automatically removed from data (listwise deletion).
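For a wide-format dataframe, listwise deletion means that any subject (row) with at least one missing measurement is dropped entirely. A minimal pandas sketch of that idea follows; the data are made up for illustration, and the exact way Pingouin performs this step internally is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per condition.
wide = pd.DataFrame({'A': [2.2, 3.1, np.nan, 4.1],
                     'B': [1.1, 2.5, 4.1, 5.2],
                     'C': [8.2, np.nan, 3.4, 6.2]})

# Keep only subjects with a value in every condition.
complete = wide.dropna(axis=0, how='any')
print(complete.shape)  # (2, 3): subjects 1 and 2 are dropped
```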

References

[1] Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204-209.

[2] Nagao, H. (1973). On some test criteria for covariance matrix. The Annals of Statistics, 700-709.

[3] Sugiura, N. (1972). Locally best invariant test for sphericity and the limiting distributions. The Annals of Mathematical Statistics, 1312-1316.

[4] John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173.

[5] http://www.real-statistics.com/anova-repeated-measures/sphericity/

Examples

Mauchly test for sphericity using a wide-format dataframe

>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> pg.sphericity(data)
(True, 0.21, 4.677, 2, 0.09649016283209666)

John, Nagao and Sugiura (JNS) test

>>> pg.sphericity(data, method='jns')
(False, 1.118, 6.176, 2, 0.0456042403075203)

Now using a long-format dataframe

>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19

Let’s first test sphericity for the Time within-subject factor

>>> pg.sphericity(data, dv='Performance', subject='Subject',
...            within='Time')
(True, nan, nan, 1, 1.0)

Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met.
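This also follows from the definition of \(W\) in the Notes: with \(k = 2\) there is a single eigenvalue \(\lambda_1\), so the ratio reduces to

\[W = \frac{\lambda_1}{(\frac{1}{1} \lambda_1)^{1}} = 1\]

whatever the data. As the output above shows, the statistics are reported as nan in this case.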

The Metric factor, however, has three levels:

>>> pg.sphericity(data, dv='Performance', subject='Subject',
...            within=['Metric'])
(True, 0.968, 0.259, 2, 0.8784417991645136)

The p-value is very large, and the test therefore indicates that there is no violation of sphericity.

Now, let’s test sphericity for the interaction between the two repeated measures factors. The current implementation in Pingouin only works if at least one of the two within-subject factors has no more than two levels.

>>> pg.sphericity(data, dv='Performance', subject='Subject',
...            within=['Time', 'Metric'])
(True, 0.625, 3.763, 2, 0.15239168046050933)

Here again, there is no violation of sphericity according to Mauchly’s test.

Alternatively, we could use a wide-format dataframe with two column levels:

>>> # Pivot from long-format to wide-format
>>> piv = data.pivot_table(index='Subject', columns=['Time', 'Metric'],
...                        values='Performance')
>>> piv.head()
Time      Post                   Pre
Metric  Action Client Product Action Client Product
Subject
1           34     30      18     17     12      13
2           30     18       6     18     19      12
3           32     31      21     24     19      17
4           40     39      18     25     25      12
5           27     28      18     19     27      19
>>> pg.sphericity(piv)
(True, 0.625, 3.763, 2, 0.15239168046050933)

which gives the same output as the long-format dataframe.