pingouin.epsilon

pingouin.epsilon(data, dv=None, within=None, subject=None, correction='gg')

Epsilon adjustment factor for repeated measures.

Parameters
data : pd.DataFrame

DataFrame containing the repeated measurements. Both wide-format and long-format dataframes are supported. To test for an interaction term between two repeated measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.

dv : string

Name of column containing the dependent variable (only required if data is in long format).

within : string

Name of column containing the within factor (only required if data is in long format). If within is a list with two strings, this function computes the epsilon factor for the interaction between the two within-subject factors.

subject : string

Name of column containing the subject identifier (only required if data is in long format).

correction : string

Specify the epsilon version:

'gg' : Greenhouse-Geisser
'hf' : Huynh-Feldt
'lb' : Lower bound

Returns
eps : float

Epsilon adjustment factor.

See also

sphericity

Mauchly and JNS test for sphericity.

homoscedasticity

Test equality of variance.

Notes

The lower bound epsilon is:

\[lb = \frac{1}{\text{dof}},\]

where the degrees of freedom \(\text{dof}\) is the number of groups \(k\) minus 1 for a one-way design and \((k_1 - 1)(k_2 - 1)\) for a two-way design.
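As a quick illustration of the lower bound, here is a minimal sketch of the formula above. The helper name `lower_bound_epsilon` and its arguments are hypothetical and not part of the pingouin API.

```python
def lower_bound_epsilon(k1, k2=None):
    """Lower-bound epsilon, 1 / dof (hypothetical helper, not pingouin API).

    k1 : number of levels of the first within-subject factor.
    k2 : number of levels of the second factor (two-way design), or None.
    """
    dof = (k1 - 1) if k2 is None else (k1 - 1) * (k2 - 1)
    return 1.0 / dof


print(lower_bound_epsilon(3))     # one-way design with 3 levels
print(lower_bound_epsilon(2, 3))  # two-way design with 2 x 3 levels
```

Both calls have two degrees of freedom, so both print 0.5, which matches the `lb` value in the wide-format example in the Examples section.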

The Greenhouse-Geisser epsilon is given by:

\[\epsilon_{GG} = \frac{k^2(\overline{\text{diag}(S)} - \overline{S})^2}{(k-1)(\sum_{i=1}^{k}\sum_{j=1}^{k}s_{ij}^2 - 2k\sum_{i=1}^{k}\overline{s_i}^2 + k^2\overline{S}^2)}\]

where \(S\) is the covariance matrix, \(\overline{S}\) the grand mean of S, \(\overline{s_i}\) the mean of the i-th row of S and \(\overline{\text{diag}(S)}\) the mean of all the elements on the diagonal of S (i.e. mean of the variances).
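The Greenhouse-Geisser formula can be sketched directly with NumPy for a one-way design. This is an illustrative transcription of the equation above, not necessarily how pingouin computes it internally; the function name `gg_epsilon` is hypothetical, and capping the result at 1 is an assumption made to guard against floating-point overshoot.

```python
import numpy as np


def gg_epsilon(S):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix.

    Hypothetical helper that transcribes the formula term by term.
    """
    S = np.asarray(S, dtype=float)
    k = S.shape[0]
    mean_var = np.diag(S).mean()           # mean of the variances
    grand_mean = S.mean()                  # mean of all elements of S
    ss_all = (S ** 2).sum()                # sum of squared elements
    ss_rows = (S.mean(axis=1) ** 2).sum()  # sum of squared row means
    num = k ** 2 * (mean_var - grand_mean) ** 2
    den = (k - 1) * (ss_all - 2 * k * ss_rows + k ** 2 * grand_mean ** 2)
    # Assumption: epsilon cannot exceed 1, so cap it for numerical safety
    return min(num / den, 1.0)


# Same values as the wide-format example (rows = subjects, columns = A, B, C)
X = np.array([[2.2, 1.1, 8.2],
              [3.1, 2.5, 4.5],
              [4.3, 4.1, 3.4],
              [4.1, 5.2, 6.2],
              [7.2, 6.4, 7.2]])
S = np.cov(X, rowvar=False)
eps_gg = gg_epsilon(S)
print(eps_gg)
```

With these data the result should agree with the `gg` value reported in the Examples section (about 0.5588).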

The Huynh-Feldt epsilon is given by:

\[\epsilon_{HF} = \frac{n(k-1)\epsilon_{GG}-2}{(k-1) (n-1-(k-1)\epsilon_{GG})}\]

where \(n\) is the number of subjects (i.e. the number of rows of the wide-format data).
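The Huynh-Feldt correction is a simple function of the Greenhouse-Geisser epsilon, as the following sketch shows. The helper name `hf_epsilon` is hypothetical, and truncating the result at 1 is a common convention that is assumed here rather than taken from the text above.

```python
def hf_epsilon(eps_gg, n, k):
    """Huynh-Feldt epsilon from the Greenhouse-Geisser epsilon.

    Hypothetical helper. n is the number of subjects and k the number
    of repeated measurements (one-way design).
    """
    num = n * (k - 1) * eps_gg - 2
    den = (k - 1) * (n - 1 - (k - 1) * eps_gg)
    # Assumption: the raw value can exceed 1, so truncate it at 1
    return min(num / den, 1.0)


# Greenhouse-Geisser value from the wide-format example (5 subjects, 3 columns)
eps_hf = hf_epsilon(0.5587754577585018, n=5, k=3)
print(eps_hf)
```

Plugging in the `gg` value from the Examples section should reproduce the reported `hf` value (about 0.6223).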

Missing values are automatically removed from data (listwise deletion).
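Listwise deletion means that a subject with a missing value in any condition is dropped entirely. The sketch below illustrates the idea on a wide-format dataframe with `pandas.DataFrame.dropna`; whether pingouin performs the deletion in exactly this way is an assumption, and the data are the wide-format example values with two entries replaced by NaN.

```python
import numpy as np
import pandas as pd

# Wide-format data: one row per subject, one column per condition
wide = pd.DataFrame({'A': [2.2, 3.1, np.nan, 4.1, 7.2],
                     'B': [1.1, 2.5, 4.1, 5.2, 6.4],
                     'C': [8.2, 4.5, 3.4, np.nan, 7.2]})

# Listwise deletion: drop every subject (row) with at least one missing value
complete = wide.dropna()
print(complete)
```

Only the three subjects with complete data remain, so any epsilon computed afterwards is based on n = 3 rather than n = 5.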

References

[1] http://www.real-statistics.com/anova-repeated-measures/sphericity/

Examples

Using a wide-format dataframe

>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> gg = pg.epsilon(data, correction='gg')
>>> hf = pg.epsilon(data, correction='hf')
>>> lb = pg.epsilon(data, correction='lb')
>>> print(lb, gg, hf)
0.5 0.5587754577585018 0.6223448311539781

Now using a long-format dataframe

>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19

Let’s first calculate the epsilon of the Time within-subject factor

>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within='Time')
1.0

Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met, and therefore the epsilon adjustment factor is 1.

The Metric factor, however, has three levels:

>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within=['Metric'])
0.9691029584899856

The epsilon value is very close to 1, meaning that there is no major violation of sphericity.

Now, let’s calculate the epsilon for the interaction between the two repeated measures factors:

>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within=['Time', 'Metric'])
0.727166420214127

Alternatively, we could use a wide-format dataframe with two column levels:

>>> # Pivot from long-format to wide-format
>>> piv = data.pivot_table(index='Subject', columns=['Time', 'Metric'],
...                        values='Performance')
>>> piv.head()
Time      Post                   Pre
Metric  Action Client Product Action Client Product
Subject
1           34     30      18     17     12      13
2           30     18       6     18     19      12
3           32     31      21     24     19      17
4           40     39      18     25     25      12
5           27     28      18     19     27      19
>>> pg.epsilon(piv)
0.727166420214127

which gives the same epsilon value as the long-format dataframe.