pingouin.sphericity

pingouin.sphericity(data, dv=None, within=None, subject=None, method='mauchly', alpha=0.05)
Mauchly and JNS test for sphericity.
 Parameters
data : pandas.DataFrame
DataFrame containing the repeated measurements. Both wide-format and long-format dataframes are supported. To test for an interaction term between two repeated-measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.
dv : string
Name of the column containing the dependent variable (only required if data is in long format).
within : string
Name of the column containing the within-subject factor (only required if data is in long format). If within is a list with two strings, this function tests sphericity for the interaction between the two within-subject factors.
subject : string
Name of the column containing the subject identifier (only required if data is in long format).
method : str
Method to compute sphericity:
'jns': John, Nagao and Sugiura test.
'mauchly': Mauchly test (default).
alpha : float
Significance level.
 Returns
spher : boolean
True if data have the sphericity property.
W : float
Test statistic.
chi2 : float
Chi-square statistic.
dof : int
Degrees of freedom.
pval : float
P-value.
 Raises
 ValueError
When testing for an interaction, if both within-subject factors have more than 2 levels (not yet supported in Pingouin).
See also
epsilon
Epsilon adjustment factor for repeated measures.
homoscedasticity
Test equality of variance.
normality
Univariate normality test.
Notes
The Mauchly \(W\) statistic [1] is defined by:
\[W = \frac{\prod \lambda_j}{(\frac{1}{k-1} \sum \lambda_j)^{k-1}}\]
where \(\lambda_j\) are the eigenvalues of the population covariance matrix (= double-centered sample covariance matrix) and \(k\) is the number of conditions.
The \(W\) statistic is then transformed into a chi-square score using the number of observations per condition \(n\):
\[f = \frac{2(k-1)^2+k+1}{6(k-1)(n-1)}\]
\[\chi_w^2 = (f-1)(n-1) \text{log}(W)\]
The p-value is then approximated using a chi-square distribution:
\[\chi_w^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
The JNS \(V\) statistic ([2], [3], [4]) is defined by:
\[V = \frac{(\sum_j^{k-1} \lambda_j)^2}{\sum_j^{k-1} \lambda_j^2}\]
\[\chi_v^2 = \frac{n}{2} (k-1)^2 (V - \frac{1}{k-1})\]
and the p-value is approximated using a chi-square distribution:
\[\chi_v^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
Missing values are automatically removed from data (listwise deletion).
References
[1] Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204-209.
[2] Nagao, H. (1973). On some test criteria for covariance matrix. The Annals of Statistics, 700-709.
[3] Sugiura, N. (1972). Locally best invariant test for sphericity and the limiting distributions. The Annals of Mathematical Statistics, 1312-1316.
[4] John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173.
See also http://www.real-statistics.com/anova-repeated-measures/sphericity/
Examples
Mauchly test for sphericity using a wide-format dataframe
>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> spher, W, chisq, dof, pval = pg.sphericity(data)
>>> print(spher, round(W, 3), round(chisq, 3), dof, round(pval, 3))
True 0.21 4.677 2 0.096
John, Nagao and Sugiura (JNS) test
>>> round(pg.sphericity(data, method='jns')[-1], 3)  # P-value only
0.046
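The formulas from the Notes can be checked by hand on the same three-condition data. The following is a minimal sketch using NumPy and SciPy only; it follows the equations as written in the Notes and is not Pingouin's actual implementation, which may differ in details. The variable names (`S_dc`, `chi2_w`, `chi2_v`, and so on) are chosen here for illustration.

```python
import numpy as np
from scipy import stats

# Same wide-format data as in the first example:
# rows are subjects, columns are conditions A, B, C.
X = np.array([[2.2, 1.1, 8.2],
              [3.1, 2.5, 4.5],
              [4.3, 4.1, 3.4],
              [4.1, 5.2, 6.2],
              [7.2, 6.4, 7.2]])
n, k = X.shape
d = k - 1

# Double-center the sample covariance matrix.
S = np.cov(X, rowvar=False)
S_dc = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()

# eigvalsh returns eigenvalues in ascending order. Double-centering forces
# one eigenvalue to be (numerically) zero, so keep the k - 1 largest.
eig = np.linalg.eigvalsh(S_dc)[1:]

# Degrees of freedom shared by both tests: k(k-1)/2 - 1.
dof = k * (k - 1) // 2 - 1

# Mauchly W and its chi-square approximation.
W = float(eig.prod() / (eig.sum() / d) ** d)
f = (2 * d**2 + k + 1) / (6 * d * (n - 1))
chi2_w = float((f - 1) * (n - 1) * np.log(W))
p_w = float(stats.chi2.sf(chi2_w, dof))

# JNS V and its chi-square approximation.
V = float(eig.sum() ** 2 / (eig**2).sum())
chi2_v = n / 2 * d**2 * (V - 1 / d)
p_v = float(stats.chi2.sf(chi2_v, dof))

print(round(W, 3), round(chi2_w, 3), dof, round(p_w, 3), round(p_v, 3))
```

Under these assumptions the sketch should reproduce the values reported above: W ≈ 0.21, chi-square ≈ 4.677 with 2 degrees of freedom and p ≈ 0.096 for the Mauchly test, and p ≈ 0.046 for the JNS test.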
Now using a long-format dataframe
>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19
Let’s first test sphericity for the Time within-subject factor
>>> pg.sphericity(data, dv='Performance', subject='Subject',
...               within='Time')
(True, nan, nan, 1, 1.0)
Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met.
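To see why two levels always satisfy sphericity, note that with \(k = 2\) the double-centered covariance matrix has a single non-zero eigenvalue, so the ratio defining \(W\) in the Notes equals 1 whatever the data. The sketch below illustrates this on arbitrary synthetic data (the random values and variable names are assumptions made for the illustration, not part of Pingouin).

```python
import numpy as np

# Arbitrary two-condition data: 10 hypothetical subjects, 2 conditions.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
k = X.shape[1]
d = k - 1

# Double-center the 2 x 2 sample covariance matrix.
S = np.cov(X, rowvar=False)
S_dc = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()

# Only one non-zero eigenvalue remains after dropping the (near-)zero one.
eig = np.linalg.eigvalsh(S_dc)[1:]

# With a single eigenvalue, product and mean coincide, so W is 1.
W = float(eig.prod() / (eig.sum() / d) ** d)

# That eigenvalue is half the variance of the difference between conditions.
half_var_diff = np.var(X[:, 0] - X[:, 1], ddof=1) / 2

# The chi-square degrees of freedom k(k-1)/2 - 1 collapse to zero.
dof_formula = k * (k - 1) // 2 - 1

print(W, dof_formula)
```

Because the formula leaves zero degrees of freedom, there is nothing to test in this case, which is consistent with the nan statistics and the p-value of 1.0 returned above.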
The Metric factor, however, has three levels:
>>> round(pg.sphericity(data, dv='Performance', subject='Subject',
...                     within=['Metric'])[-1], 3)
0.878
The p-value is very large, so the test indicates no violation of sphericity.
Now, let’s test sphericity for the interaction between the two repeated-measures factors. The current implementation in Pingouin only works if at least one of the two within-subject factors has no more than two levels.
>>> spher, _, chisq, dof, pval = pg.sphericity(data, dv='Performance',
...                                            subject='Subject',
...                                            within=['Time', 'Metric'])
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152
Here again, there is no violation of sphericity according to Mauchly’s test.
Alternatively, we could use a wide-format dataframe with two column levels:
>>> # Pivot from long-format to wide-format
>>> piv = data.pivot_table(index='Subject', columns=['Time', 'Metric'],
...                        values='Performance')
>>> piv.head()
Time      Post                   Pre
Metric  Action Client Product Action Client Product
Subject
1           34     30      18     17     12      13
2           30     18       6     18     19      12
3           32     31      21     24     19      17
4           40     39      18     25     25      12
5           27     28      18     19     27      19
>>> spher, _, chisq, dof, pval = pg.sphericity(piv)
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152
which gives the same output as the long-format dataframe.
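When the measurements are not already available in long format, a wide-format dataframe with two column levels can also be built directly. The sketch below uses pandas only and fills the frame with synthetic values; the factor names mirror the example above, while the data, the number of subjects and the variable name `wide` are hypothetical.

```python
import numpy as np
import pandas as pd

# Two-level column index: first level is Time, second level is Metric.
columns = pd.MultiIndex.from_product(
    [["Pre", "Post"], ["Product", "Client", "Action"]],
    names=["Time", "Metric"],
)

# Synthetic scores for 8 hypothetical subjects (one row per subject).
rng = np.random.default_rng(42)
wide = pd.DataFrame(
    rng.normal(loc=20, scale=5, size=(8, len(columns))).round(1),
    index=pd.RangeIndex(1, 9, name="Subject"),
    columns=columns,
)

print(wide.columns.nlevels)  # number of column levels
print(wide.shape)
```

A frame shaped like this has the two-level pandas.MultiIndex columns described under Parameters, so it could be passed to the function in the same way as piv above.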