pingouin.epsilon

pingouin.epsilon(data, dv=None, within=None, subject=None, correction='gg')

Epsilon adjustment factor for repeated measures.
 Parameters
 data : pd.DataFrame
    DataFrame containing the repeated measurements. Both wide-format and long-format dataframes are supported by this function. To test for an interaction term between two repeated measures factors with a wide-format dataframe, data must have a two-level pandas.MultiIndex as columns.
 dv : string
    Name of the column containing the dependent variable (only required if data is in long format).
 within : string
    Name of the column containing the within factor (only required if data is in long format). If within is a list with two strings, this function computes the epsilon factor for the interaction between the two within-subject factors.
 subject : string
    Name of the column containing the subject identifier (only required if data is in long format).
 correction : string
    Specify the epsilon version:
    'gg' : Greenhouse-Geisser
    'hf' : Huynh-Feldt
    'lb' : Lower bound
 Returns
 eps : float
    Epsilon adjustment factor.
See also
sphericity
Mauchly and JNS test for sphericity.
homoscedasticity
Test equality of variance.
Notes
The lower bound epsilon is:
\[lb = \frac{1}{\text{dof}},\]
where the degrees of freedom \(\text{dof}\) is the number of groups \(k\) minus 1 for a one-way design and \((k_1 - 1)(k_2 - 1)\) for a two-way design.
The Greenhouse-Geisser epsilon is given by:
\[\epsilon_{GG} = \frac{k^2(\overline{\text{diag}(S)} - \overline{S})^2}{(k-1)(\sum_{i=1}^{k}\sum_{j=1}^{k}s_{ij}^2 - 2k\sum_{i=1}^{k}\overline{s_i}^2 + k^2\overline{S}^2)}\]
where \(S\) is the covariance matrix, \(\overline{S}\) the grand mean of \(S\), \(\overline{s_i}\) the mean of the \(i\)-th row of \(S\), and \(\overline{\text{diag}(S)}\) the mean of all the elements on the diagonal of \(S\) (i.e. the mean of the variances).
The Huynh-Feldt epsilon is given by:
\[\epsilon_{HF} = \frac{n(k-1)\epsilon_{GG} - 2}{(k-1)(n - 1 - (k-1)\epsilon_{GG})}\]
where \(n\) is the number of observations.
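The three formulas can be sketched directly in NumPy for a one-way, wide-format design. This is only an illustration under stated assumptions, not pingouin's actual implementation: the function name `epsilon_sketch` is hypothetical, \(n\) is taken to be the number of rows (subjects), and capping the Huynh-Feldt value at 1 is a common convention assumed here rather than something the text above states.

```python
import numpy as np


def epsilon_sketch(X, correction="gg"):
    """Hypothetical helper: epsilon for a one-way design.

    X is an (n subjects, k levels) array with no missing values.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape

    # Lower bound: 1 / dof, with dof = k - 1 for a one-way design.
    if correction == "lb":
        return 1.0 / (k - 1)

    S = np.cov(X, rowvar=False)      # k x k sample covariance matrix
    mean_diag = np.diag(S).mean()    # mean of the variances
    grand_mean = S.mean()            # mean of all entries of S
    row_means = S.mean(axis=1)       # mean of each row of S

    num = k**2 * (mean_diag - grand_mean) ** 2
    den = (k - 1) * (
        (S**2).sum() - 2 * k * (row_means**2).sum() + k**2 * grand_mean**2
    )
    eps_gg = num / den
    if correction == "gg":
        return eps_gg

    # Huynh-Feldt, built on the Greenhouse-Geisser value.
    eps_hf = (n * (k - 1) * eps_gg - 2) / ((k - 1) * (n - 1 - (k - 1) * eps_gg))
    return min(eps_hf, 1.0)  # assumption: cap at 1


# Same values as the wide-format example in the Examples section.
X = np.array([
    [2.2, 1.1, 8.2],
    [3.1, 2.5, 4.5],
    [4.3, 4.1, 3.4],
    [4.1, 5.2, 6.2],
    [7.2, 6.4, 7.2],
])
for c in ("lb", "gg", "hf"):
    print(c, round(epsilon_sketch(X, c), 4))
# lb 0.5, gg 0.5588, hf 0.6223
```

The sketch covers only the single-factor case; the two-factor interaction requires a different construction that is not shown here.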
Missing values are automatically removed from data (listwise deletion).
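Listwise deletion means that a subject with any missing measurement is dropped entirely. On wide-format data this corresponds to the following pandas operation; the toy values are made up for illustration, and whether pingouin performs exactly this call internally is an assumption.

```python
import numpy as np
import pandas as pd

# Toy wide-format data: one row per subject, one column per level.
wide = pd.DataFrame({
    "A": [2.2, 3.1, np.nan, 4.1],
    "B": [1.1, 2.5, 4.1, 5.2],
    "C": [8.2, np.nan, 3.4, 6.2],
})

# Drop every subject (row) that has at least one missing value.
complete = wide.dropna(axis=0, how="any")
print(complete.shape)  # (2, 3): only subjects 0 and 3 remain
```

Subjects 1 and 2 each lose all of their measurements even though only one value is missing, which reduces the number of observations \(n\) entering the Huynh-Feldt formula.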
Examples
Using a wide-format dataframe:
>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> gg = pg.epsilon(data, correction='gg')
>>> hf = pg.epsilon(data, correction='hf')
>>> lb = pg.epsilon(data, correction='lb')
>>> print(lb, gg, hf)
0.5 0.5587754577585018 0.6223448311539781
Now using a long-format dataframe:
>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19
Let’s first calculate the epsilon of the Time within-subject factor:
>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within='Time')
1.0
Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met, and therefore the epsilon adjustment factor is 1.
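A short worked check of this claim, using the Greenhouse-Geisser formula from the Notes with \(k = 2\). The symbols \(a\), \(b\) (the two variances) and \(c\) (the covariance) are introduced here for illustration only:

\[S = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \qquad \overline{\text{diag}(S)} - \overline{S} = \frac{a + b}{2} - \frac{a + b + 2c}{4} = \frac{a + b - 2c}{4}\]

The numerator is \(k^2 (a + b - 2c)^2 / 16 = (a + b - 2c)^2 / 4\). In the denominator, \(k - 1 = 1\) and

\[(a^2 + b^2 + 2c^2) - \left((a + c)^2 + (b + c)^2\right) + \frac{(a + b + 2c)^2}{4} = \frac{(a + b - 2c)^2}{4},\]

so that

\[\epsilon_{GG} = \frac{(a + b - 2c)^2 / 4}{(a + b - 2c)^2 / 4} = 1\]

whenever \(a + b - 2c \neq 0\), i.e. whenever the difference between the two levels has nonzero variance. The same conclusion follows from the lower bound, which is \(1 / (k - 1) = 1\) for two levels.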
The Metric factor, however, has three levels:
>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within=['Metric'])
0.9691029584899856
The epsilon value is very close to 1, meaning that there is no major violation of sphericity.
Now, let’s calculate the epsilon for the interaction between the two repeated measures factors:
>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within=['Time', 'Metric'])
0.727166420214127
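As a sanity check on such values, the lower bound from the Notes can be computed from the number of levels alone. The helper below is a hypothetical sketch, not part of pingouin; it only encodes the one-way and two-way degrees of freedom stated in the Notes.

```python
def lower_bound_epsilon(*n_levels):
    """Hypothetical helper: lower-bound epsilon from the level counts.

    Pass one count for a one-way design, or two for a two-way interaction.
    """
    if len(n_levels) not in (1, 2):
        raise ValueError("expected one or two within-subject factors")
    dof = 1
    for k in n_levels:
        dof *= k - 1  # (k - 1) or (k1 - 1)(k2 - 1)
    return 1 / dof


print(lower_bound_epsilon(2))     # Time alone: 1.0
print(lower_bound_epsilon(3))     # Metric alone: 0.5
print(lower_bound_epsilon(2, 3))  # Time x Metric interaction: 0.5
```

The interaction epsilon of about 0.727 reported above lies between this lower bound of 0.5 and the upper limit of 1.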
Alternatively, we could use a wide-format dataframe with two column levels:
>>> # Pivot from long-format to wide-format
>>> piv = data.pivot_table(index='Subject', columns=['Time', 'Metric'],
...                        values='Performance')
>>> piv.head()
Time      Post                   Pre
Metric  Action Client Product Action Client Product
Subject
1           34     30      18     17     12      13
2           30     18       6     18     19      12
3           32     31      21     24     19      17
4           40     39      18     25     25      12
5           27     28      18     19     27      19
>>> pg.epsilon(piv)
0.727166420214127
which gives the same epsilon value as the long-format dataframe.