pingouin.pairwise_tukey(data=None, dv=None, between=None, alpha=0.05, tail='two-sided', effsize='hedges')

Pairwise Tukey-HSD post-hoc test.

data: pandas DataFrame

DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.


dv: string

Name of column containing the dependent variable.

between: string

Name of column containing the between factor.


alpha: float

Significance level.


tail: string

Indicates whether to return the ‘two-sided’ or ‘one-sided’ p-values.

effsize: string or None

Effect size type. Available methods are:

'none' : no effect size
'cohen' : Unbiased Cohen d
'hedges' : Hedges g
'glass' : Glass delta
'r' : Pearson correlation coefficient
'eta-square' : Eta-square
'odds-ratio' : Odds ratio
'AUC' : Area Under the Curve
'CLES' : Common Language Effect Size

stats: pandas DataFrame

Stats summary, with one row per pairwise comparison and the following columns:

'A' : Name of first measurement
'B' : Name of second measurement
'mean(A)' : Mean of first measurement
'mean(B)' : Mean of second measurement
'diff' : Mean difference (= mean(A) - mean(B))
'se' : Standard error
'tail' : Indicates whether the p-values are one-sided or two-sided
'T' : T-values
'p-tukey' : Tukey-HSD corrected p-values
'hedges' : effect size (or any effect size defined in ``effsize``)


Tukey HSD post-hoc is best for balanced one-way ANOVA.

It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more adequate. Tukey HSD is not valid for repeated measures ANOVA.

Note that when the sample sizes are unequal, this function actually performs the Tukey-Kramer test (which allows for unequal sample sizes).

The T-values are defined as:

\[t = \frac{\overline{x}_i - \overline{x}_j} {\sqrt{2 \cdot MS_w / n}}\]

where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(MS_w\) is the mean square of the error (computed using ANOVA), and \(n\) is the sample size of each group.

If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:

\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{MS_w}{n_i} + \frac{MS_w}{n_j}}}\]

where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
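
As an illustration of the two formulas above, here is a minimal sketch in plain Python. It is not the implementation used by this function: the helper name `tukey_t_values` and the toy data are hypothetical, and \(MS_w\) is computed directly as the within-group sum of squares divided by \(N - r\), where \(r\) is the number of groups and \(N\) the total sample size. Only the Tukey-Kramer form is coded, since it reduces to the balanced formula when \(n_i = n_j = n\).

```python
from math import sqrt
from statistics import mean


def tukey_t_values(groups):
    """Return {(name_i, name_j): t} for every pair of groups.

    `groups` maps a group name to a list of observations
    (hypothetical helper, for illustration only).
    """
    names = list(groups)
    r = len(names)                                    # number of groups
    n_total = sum(len(v) for v in groups.values())    # total sample size N
    means = {k: mean(v) for k, v in groups.items()}

    # MS_w: within-group sum of squares divided by the error df (N - r)
    ss_within = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    ms_w = ss_within / (n_total - r)

    t_values = {}
    for a in range(r):
        for b in range(a + 1, r):
            i, j = names[a], names[b]
            n_i, n_j = len(groups[i]), len(groups[j])
            # Tukey-Kramer standard error; equals sqrt(2 * MS_w / n) if n_i == n_j
            se = sqrt(ms_w / n_i + ms_w / n_j)
            t_values[(i, j)] = (means[i] - means[j]) / se
    return t_values


# Toy data (made up): groups "a" and "b" are balanced, "c" is larger
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [4.0, 5.0, 6.0, 7.0, 8.0],
}
t_values = tukey_t_values(groups)
```

For the pair ("a", "b") both groups have the same size, so the standard error is the same whichever of the two formulas is used.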

The p-values are then approximated using the Studentized range distribution \(Q(\sqrt{2} \cdot |t|, r, N - r)\), where \(t\) is the T-value of the comparison, \(r\) is the total number of groups and \(N\) is the total sample size.
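
The conversion from a T-value to a p-value can be sketched as follows. This is an assumption-laden illustration, not this function's code: it swaps in SciPy's `scipy.stats.studentized_range` distribution (available in SciPy 1.7 and later), so results may differ slightly from the values returned here. The helper name `tukey_p_value` is hypothetical. The factor \(\sqrt{2}\) appears because the Studentized range statistic is scaled by \(\sqrt{MS_w / n}\) rather than \(\sqrt{2 \cdot MS_w / n}\).

```python
from math import sqrt

from scipy.stats import studentized_range


def tukey_p_value(t, r, n_total):
    """P-value for one pairwise T-value (hypothetical helper).

    t       : T-value of the comparison
    r       : total number of groups
    n_total : total sample size N
    """
    q = sqrt(2) * abs(t)  # rescale |t| to a Studentized range statistic
    # Survival function = P(Q > q) with r groups and N - r error df
    return float(studentized_range.sf(q, r, n_total - r))


# Larger |t| gives a smaller p-value for the same design
p_weak = tukey_p_value(1.0, 3, 30)
p_strong = tukey_p_value(3.0, 3, 30)
```

Because the statistic depends on \(|t|\) only, swapping the order of the two groups in a pair leaves the p-value unchanged.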

Note that the p-values might differ slightly from those obtained using R or Matlab, since the Studentized range approximation is done using the Gleason (1999) algorithm, which is more efficient and accurate than the algorithms used in Matlab or R.



Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.


Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational statistics & data analysis 31.2 (1999): 147-158.


Pairwise Tukey post-hocs on the pain threshold dataset.

>>> from pingouin import pairwise_tukey, read_dataset
>>> df = read_dataset('anova')
>>> pt = pairwise_tukey(data=df, dv='Pain threshold', between='Hair color')