# pingouin.pairwise_tukey

pingouin.pairwise_tukey(data=None, dv=None, between=None, alpha=0.05, tail='two-sided', effsize='hedges')

Pairwise Tukey-HSD post-hoc test.

Parameters
data : pandas DataFrame

DataFrame. Note that this function can also be used directly as a pandas DataFrame method, in which case this argument is not needed.

dv : string

Name of the column containing the dependent variable.

between : string

Name of the column containing the between-subject factor.

alpha : float

Significance level.

tail : string

Indicates whether to return the ‘two-sided’ or ‘one-sided’ p-values.

effsize : string or None

Effect size type. Available methods are:

- 'none' : no effect size
- 'cohen' : unbiased Cohen d
- 'hedges' : Hedges g
- 'glass' : Glass delta
- 'r' : Pearson correlation coefficient
- 'eta-square' : Eta-square
- 'odds-ratio' : Odds ratio
- 'AUC' : Area Under the Curve
- 'CLES' : Common Language Effect Size
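The relationship between the Tukey t-value (defined in the Notes) and two of these effect sizes can be illustrated with a minimal stdlib sketch. It assumes, as one plausible route, that a standardized mean difference is recovered by multiplying $t$ by $\sqrt{1/n_i + 1/n_j}$, which leaves the mean difference divided by $\sqrt{MS_w}$, and that Hedges g applies the common small-sample correction $J = 1 - 3/(4\,df - 1)$ with $df = n_i + n_j - 2$. The helpers `cohen_d_from_t` and `hedges_g_from_d` are hypothetical and not part of pingouin.

```python
import math


def cohen_d_from_t(t, n_i, n_j):
    """Standardized mean difference recovered from a Tukey t-value (assumed route).

    Since t = diff / sqrt(MS_w * (1/n_i + 1/n_j)), multiplying by
    sqrt(1/n_i + 1/n_j) yields diff / sqrt(MS_w).
    """
    return t * math.sqrt(1 / n_i + 1 / n_j)


def hedges_g_from_d(d, n_i, n_j):
    """Apply the small-sample correction J = 1 - 3 / (4 * df - 1).

    Assumption: df = n_i + n_j - 2, the usual two-sample choice.
    """
    df = n_i + n_j - 2
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical values: t = 3 for two groups of 5 observations each.
d = cohen_d_from_t(3.0, n_i=5, n_j=5)
print(round(d, 3), round(hedges_g_from_d(d, 5, 5), 3))
```

The correction factor is always below 1, so Hedges g shrinks Cohen d toward zero, most noticeably for small groups.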

Returns
stats : DataFrame

Summary statistics:

- 'A' : Name of the first group (level of the between factor)
- 'B' : Name of the second group
- 'mean(A)' : Mean of the first group
- 'mean(B)' : Mean of the second group
- 'diff' : Mean difference (= mean(A) - mean(B))
- 'se' : Standard error of the mean difference
- 'tail' : Indicates whether the p-values are one-sided or two-sided
- 'T' : T-values
- 'p-tukey' : Tukey-HSD corrected p-values
- 'hedges' : Effect size (or whichever effect size was requested through effsize)
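How the first five columns relate to the raw groups can be sketched in plain Python. This assumes the group labels are sorted before pairing, so each of the $r(r-1)/2$ unordered pairs appears exactly once with A before B; the row order pingouin actually produces may differ. The function `pairwise_mean_table` and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_mean_table(groups):
    """Build the A / B / mean(A) / mean(B) / diff columns for every pair.

    `groups` maps each level of the between factor to its observations.
    Labels are sorted first (an assumption about row order).
    """
    rows = []
    for a, b in combinations(sorted(groups), 2):
        mean_a, mean_b = mean(groups[a]), mean(groups[b])
        rows.append({
            "A": a,
            "B": b,
            "mean(A)": mean_a,
            "mean(B)": mean_b,
            "diff": mean_a - mean_b,  # diff = mean(A) - mean(B)
        })
    return rows


# Hypothetical data: three groups with made-up values.
example = {"g3": [60, 62, 64], "g1": [40, 44, 48], "g2": [50, 52, 54]}
for row in pairwise_mean_table(example):
    print(row)
```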


Notes

Tukey HSD post-hoc is best for balanced one-way ANOVA.

It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust when the groups have unequal variances; in that case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated-measures ANOVA.

Note that when the sample sizes are unequal, this function actually performs the Tukey-Kramer test, the extension of Tukey HSD to unequal sample sizes.

The T-values are defined as:

$$t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot MS_w / n}}$$

where $\overline{x}_i$ and $\overline{x}_j$ are the means of the first and second group, respectively, $MS_w$ is the within-group (error) mean square computed by the one-way ANOVA, and $n$ is the common sample size of each group.
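For a balanced design, the equal-n formula can be evaluated directly from raw observations. The sketch below is an illustration rather than pingouin's code: it obtains $MS_w$ as the pooled within-group sum of squares divided by $N - r$ (the one-way ANOVA error mean square), then computes $t$ for every pair. The helpers `within_mean_square` and `tukey_t_equal_n` and the sample data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def within_mean_square(groups):
    """MS_w: pooled within-group sum of squares divided by N - r."""
    ss_within = 0.0
    for values in groups.values():
        m = mean(values)
        ss_within += sum((x - m) ** 2 for x in values)
    n_total = sum(len(values) for values in groups.values())
    return ss_within / (n_total - len(groups))


def tukey_t_equal_n(groups):
    """t = (mean_i - mean_j) / sqrt(2 * MS_w / n) for every pair of groups.

    Requires a balanced design: every group must have the same size n.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("balanced design required (all groups of equal size)")
    n = sizes.pop()
    denom = math.sqrt(2 * within_mean_square(groups) / n)
    return {
        (a, b): (mean(groups[a]) - mean(groups[b])) / denom
        for a, b in combinations(sorted(groups), 2)
    }


# Hypothetical balanced data: three groups of three observations.
groups = {"g1": [40, 44, 48], "g2": [50, 52, 54], "g3": [60, 62, 64]}
print(within_mean_square(groups))
for pair, t in tukey_t_equal_n(groups).items():
    print(pair, round(t, 3))
```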

If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:

$$t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{MS_w}{n_i} + \frac{MS_w}{n_j}}}$$

where $n_i$ and $n_j$ are the sample sizes of the first and second group, respectively.
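The Tukey-Kramer denominator reduces to the equal-n one: when $n_i = n_j = n$, $\sqrt{MS_w/n_i + MS_w/n_j} = \sqrt{2 \cdot MS_w / n}$. A minimal sketch that takes $MS_w$ as an already-computed input; `tukey_kramer_t` is a hypothetical helper, not a pingouin function.

```python
import math


def tukey_kramer_t(mean_i, mean_j, n_i, n_j, ms_w):
    """Tukey-Kramer t-value for one pair of groups of possibly unequal size.

    ms_w is the within-group mean square from the one-way ANOVA.
    """
    se = math.sqrt(ms_w / n_i + ms_w / n_j)
    return (mean_i - mean_j) / se


# Equal sizes: identical to (mean_i - mean_j) / sqrt(2 * MS_w / n).
print(tukey_kramer_t(52.0, 44.0, 3, 3, 8.0), 8.0 / math.sqrt(2 * 8.0 / 3))
# Unequal sizes (hypothetical values): se = sqrt(3/4 + 3/12) = 1.
print(tukey_kramer_t(10.0, 4.0, 4, 12, 3.0))
```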

The p-values are then approximated from the upper tail of the studentized range distribution, $Q(\sqrt{2} \cdot |t|, r, N - r)$, where $r$ is the total number of groups and $N$ is the total sample size.
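The step from $t$ to a p-value can be sketched with SciPy. Note that this swaps in `scipy.stats.studentized_range` (SciPy 1.7 and later) for the Gleason (1999) approximation described here, so the last digits may differ from pingouin's output. The helper `tukey_pvalue` is hypothetical.

```python
import math

from scipy import stats


def tukey_pvalue(t, n_groups, n_total):
    """Upper-tail probability P(Q > sqrt(2) * |t|) for one pair of groups.

    Q follows the studentized range distribution with k = n_groups means
    and df = n_total - n_groups degrees of freedom (the ANOVA error df).
    """
    q = math.sqrt(2) * abs(t)
    return float(stats.studentized_range.sf(q, n_groups, n_total - n_groups))


# Hypothetical example: t = -2 * sqrt(3) with r = 3 groups and N = 9 observations.
print(tukey_pvalue(-2 * math.sqrt(3), 3, 9))
```

As a cross-check, `scipy.stats.tukey_hsd` (SciPy 1.8 and later) computes the same quantity for all pairs at once from the raw group samples.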

Note that the p-values might differ slightly from those obtained with R or Matlab, because the studentized range approximation uses the Gleason (1999) algorithm, which is more efficient and accurate than the algorithms used in Matlab or R.

References

1. Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.

2. Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.

Examples

Pairwise Tukey post-hoc tests on the pain threshold dataset.

```python
>>> from pingouin import pairwise_tukey, read_dataset
>>> df = read_dataset('anova')
>>> pt = pairwise_tukey(data=df, dv='Pain threshold', between='Hair color')
```