pingouin.pairwise_tukey

pingouin.pairwise_tukey(data=None, dv=None, between=None, effsize='hedges')

Pairwise Tukey-HSD post-hoc test.

Parameters
data : pandas.DataFrame

Input DataFrame. This function can also be called directly as a pandas DataFrame method, in which case this argument is not needed.

dv : string

Name of column containing the dependent variable.

between : string

Name of column containing the between factor.

effsize : string or None

Effect size type. Available methods are:

• 'none': no effect size

• 'cohen': Unbiased Cohen d

• 'hedges': Hedges g

• 'r': Pearson correlation coefficient

• 'eta-square': Eta-square

• 'odds-ratio': Odds ratio

• 'AUC': Area Under the Curve

• 'CLES': Common Language Effect Size

Returns
stats : pandas.DataFrame
• 'A': Name of first measurement

• 'B': Name of second measurement

• 'mean(A)': Mean of first measurement

• 'mean(B)': Mean of second measurement

• 'diff': Mean difference (= mean(A) - mean(B))

• 'se': Standard error

• 'T': T-values

• 'p-tukey': Tukey-HSD corrected p-values

• 'hedges': Hedges effect size (or any effect size defined in effsize)

Notes

Tukey HSD post-hoc [1] is best for balanced one-way ANOVA.

It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust when the groups have unequal variances, in which case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated-measures ANOVA. Only one-way ANOVA designs are supported.

The T-values are defined as:

$t = \frac{\overline{x}_i - \overline{x}_j} {\sqrt{2 \cdot \text{MS}_w / n}}$

where $$\overline{x}_i$$ and $$\overline{x}_j$$ are the means of the first and second group, respectively, $$\text{MS}_w$$ is the mean square of the error (computed using ANOVA), and $$n$$ is the sample size of each group.

If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:

$t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{\text{MS}_w}{n_i} + \frac{\text{MS}_w}{n_j}}}$

where $$n_i$$ and $$n_j$$ are the sample sizes of the first and second group, respectively.
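The two formulas above can be illustrated with a minimal pure-Python sketch. It assumes each group is given as a plain list of observations; the function name `tukey_t_values` and the input format are hypothetical and are not part of Pingouin's API. The sketch only reproduces the arithmetic of the T-values, not the p-values.

```python
import math
from itertools import combinations
from statistics import mean


def tukey_t_values(groups):
    """Return {(name_i, name_j): t} for every pair of groups.

    `groups` maps a group name to a list of observations
    (a hypothetical input format chosen for this illustration).
    """
    n_total = sum(len(values) for values in groups.values())
    r = len(groups)
    means = {name: mean(values) for name, values in groups.items()}

    # Within-group (error) sum of squares, pooled over all groups.
    ss_within = sum(
        (x - means[name]) ** 2 for name, values in groups.items() for x in values
    )
    # Mean square of the error, with N - r degrees of freedom.
    ms_w = ss_within / (n_total - r)

    t_values = {}
    for i, j in combinations(groups, 2):
        n_i, n_j = len(groups[i]), len(groups[j])
        # Tukey-Kramer standard error; equals sqrt(2 * MS_w / n) when n_i == n_j.
        se = math.sqrt(ms_w / n_i + ms_w / n_j)
        t_values[(i, j)] = (means[i] - means[j]) / se
    return t_values


# Made-up data with unequal group sizes.
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [4.0, 5.0, 6.0, 7.0],
}
t_values = tukey_t_values(groups)
```

Because the Tukey-Kramer standard error reduces to $$\sqrt{2 \cdot \text{MS}_w / n}$$ when $$n_i = n_j = n$$, a single code path covers both the balanced and the unbalanced case.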

The p-values are then approximated using the Studentized range distribution $$Q(\sqrt{2}|t_i|, r, N - r)$$ where $$r$$ is the total number of groups and $$N$$ is the total sample size.
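
The factor $$\sqrt{2}$$ can be understood from the usual definition of the Studentized range statistic, which divides the mean difference by $$\sqrt{\text{MS}_w / n}$$ rather than by the standard error of the difference. As a short worked step, assuming equal group sizes $$n$$ (the symbol $$q$$ is introduced here for illustration only):

$q = \frac{|\overline{x}_i - \overline{x}_j|}{\sqrt{\text{MS}_w / n}} = \sqrt{2} \cdot \frac{|\overline{x}_i - \overline{x}_j|}{\sqrt{2 \cdot \text{MS}_w / n}} = \sqrt{2}\,|t|$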

Warning

Versions of Pingouin below 0.3.10 used an incorrect algorithm for the Studentized range approximation [2], which resulted in (slightly) incorrect p-values. Please make sure you’re using the LATEST VERSION of Pingouin, and always DOUBLE CHECK your results with another statistical software package.

References

[1] Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.

[2] Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.

Examples

Pairwise Tukey post-hocs on the Penguins dataset.

>>> import pingouin as pg