pingouin.pairwise_tukey

pingouin.pairwise_tukey(dv=None, between=None, data=None, alpha=0.05, tail='two-sided', effsize='hedges')

Pairwise Tukey-HSD post-hoc test.

Parameters
dv : string

Name of the column containing the dependent variable.

between : string

Name of column containing the between factor.

data : pandas DataFrame

DataFrame containing the columns named by dv and between.

alpha : float

Significance level.

tail : string

Indicates whether to return the ‘two-sided’ or ‘one-sided’ p-values.

effsize : string or None

Effect size type. Available methods are:

'none' : no effect size
'cohen' : Unbiased Cohen d
'hedges' : Hedges g
'glass': Glass delta
'eta-square' : Eta-square
'odds-ratio' : Odds ratio
'AUC' : Area Under the Curve
Returns
stats : DataFrame

Stats summary:

'A' : Name of first measurement
'B' : Name of second measurement
'mean(A)' : Mean of first measurement
'mean(B)' : Mean of second measurement
'diff' : Mean difference
'SE' : Standard error
'tail' : indicates whether the p-values are one-sided or two-sided
'T' : T-values
'p-tukey' : Tukey-HSD corrected p-values
'efsize' : effect sizes
'eftype' : type of effect size

Notes

The Tukey HSD post-hoc test is best suited to balanced one-way ANOVA. It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated measures ANOVA.

Note that when the sample sizes are unequal, this function actually performs the Tukey-Kramer test (which allows for unequal sample sizes).

The T-values are defined as:

\[t = \frac{\overline{x}_i - \overline{x}_j} {\sqrt{2 \cdot MS_w / n}}\]

where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(MS_w\) is the mean squares of the error (computed using ANOVA), and \(n\) is the sample size of each group.

If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:

\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{MS_w}{n_i} + \frac{MS_w}{n_j}}}\]

where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
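The two formulas above can be checked by hand with a minimal sketch. This is not pingouin's implementation: the helper name `tukey_t_values` and the dict-of-lists input are assumptions made for illustration, and only the Python standard library is used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_t_values(groups):
    """Return (A, B, diff, SE, T) for every pair of groups.

    `groups` is a hypothetical input format: a dict mapping each group
    label to a list of observations of the dependent variable.
    """
    n_total = sum(len(values) for values in groups.values())
    r = len(groups)  # number of groups
    means = {label: mean(values) for label, values in groups.items()}

    # MS_w: within-group sum of squares divided by its degrees of freedom
    ss_w = sum(
        (x - means[label]) ** 2
        for label, values in groups.items()
        for x in values
    )
    ms_w = ss_w / (n_total - r)

    results = []
    for a, b in combinations(groups, 2):
        n_a, n_b = len(groups[a]), len(groups[b])
        # Tukey-Kramer standard error; equals sqrt(2 * MS_w / n) when n_a == n_b
        se = sqrt(ms_w / n_a + ms_w / n_b)
        diff = means[a] - means[b]
        results.append((a, b, diff, se, diff / se))
    return results


if __name__ == "__main__":
    example = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [5, 6, 7, 8]}
    for row in tukey_t_values(example):
        print(row)
```

Because the Tukey-Kramer standard error reduces to \(\sqrt{2 \cdot MS_w / n}\) when both groups have the same size, a single code path covers the balanced and unbalanced cases.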

The p-values are then approximated using the Studentized range distribution \(Q(\sqrt{2} \cdot |t_i|, r, N - r)\) where \(r\) is the total number of groups and \(N\) is the total sample size.
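As a rough cross-check of this step, the sketch below converts a T-value into a two-sided p-value using `scipy.stats.studentized_range` (available in SciPy 1.7 and later). It is an assumption-laden illustration rather than what this function does internally: SciPy evaluates the distribution by numerical integration instead of the Gleason (1999) approximation, so the values may differ slightly from the 'p-tukey' column. The helper name `tukey_p_value` is hypothetical.

```python
from math import sqrt

from scipy.stats import studentized_range


def tukey_p_value(t, r, n_total):
    """Two-sided Tukey-HSD p-value for a single T-value.

    t       : T-value of the pairwise comparison
    r       : total number of groups
    n_total : total sample size (N)
    """
    q = sqrt(2) * abs(t)  # studentized range statistic
    # Survival function of the studentized range with r groups, N - r df
    return float(studentized_range.sf(q, r, n_total - r))


if __name__ == "__main__":
    print(tukey_p_value(t=2.5, r=3, n_total=30))
```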

Note that the p-values might differ slightly from those obtained using R or Matlab, since the studentized range approximation is done using the Gleason (1999) algorithm, which is more efficient and accurate than the algorithms used in Matlab or R.

References

[1] Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.

[2] Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational statistics & data analysis 31.2 (1999): 147-158.

Examples

Pairwise Tukey post-hocs on the pain threshold dataset.

>>> from pingouin import pairwise_tukey, read_dataset
>>> df = read_dataset('anova')
>>> pt = pairwise_tukey(dv='Pain threshold', between='Hair color', data=df)