pingouin.pairwise_tukey

pingouin.pairwise_tukey(dv=None, between=None, data=None, alpha=0.05, tail='two-sided', effsize='hedges')
Pairwise Tukey-HSD post-hoc test.
 Parameters
 dv : string
Name of column containing the dependent variable.
 between : string
Name of column containing the between factor.
 data : pandas DataFrame
DataFrame containing the dv and between columns.
 alpha : float
Significance level.
 tail : string
Indicates whether to return the 'two-sided' or 'one-sided' p-values.
 effsize : string or None
Effect size type. Available methods are:
'none' : no effect size
'cohen' : Unbiased Cohen d
'hedges' : Hedges g
'glass' : Glass delta
'eta-square' : Eta-square
'odds-ratio' : Odds ratio
'AUC' : Area Under the Curve
 Returns
 stats : DataFrame
Stats summary:
'A' : Name of first measurement
'B' : Name of second measurement
'mean(A)' : Mean of first measurement
'mean(B)' : Mean of second measurement
'diff' : Mean difference
'SE' : Standard error
'tail' : Indicates whether the p-values are one-sided or two-sided
'T' : T-values
'p-tukey' : Tukey-HSD corrected p-values
'efsize' : Effect sizes
'eftype' : Type of effect size
Notes
Tukey HSD post-hoc is best for balanced one-way ANOVA. It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more adequate. Tukey HSD is not valid for repeated measures ANOVA.
Note that when the sample sizes are unequal, this function actually performs the Tukey-Kramer test (which allows for unequal sample sizes).
The T-values are defined as:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot MS_w / n}}\]
where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(MS_w\) is the mean squares of the error (computed using ANOVA) and \(n\) is the sample size of each group.
If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{MS_w}{n_i} + \frac{MS_w}{n_j}}}\]
where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
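As a rough illustration of the two T-value formulas, the sketch below computes a T-value for every pair of groups using only the Python standard library. It is not pingouin's implementation: the function name tukey_t_values and the toy data are made up for this example. Because the Tukey-Kramer standard error reduces to \(\sqrt{2 \cdot MS_w / n}\) when \(n_i = n_j = n\), a single code path covers both the balanced and the unbalanced case.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def tukey_t_values(groups):
    """Return {(a, b): t} for every pair of groups.

    `groups` maps a group name to a list of observations.
    Hypothetical helper for illustration, not part of pingouin.
    """
    n_total = sum(len(values) for values in groups.values())
    n_groups = len(groups)
    means = {name: mean(values) for name, values in groups.items()}

    # MS_w: within-group sum of squares divided by the error
    # degrees of freedom (N - r), as in a one-way ANOVA.
    ss_within = sum(
        (x - means[name]) ** 2
        for name, values in groups.items()
        for x in values
    )
    ms_within = ss_within / (n_total - n_groups)

    t_values = {}
    for a, b in combinations(groups, 2):
        # Tukey-Kramer standard error; equals sqrt(2 * MS_w / n)
        # when both groups have the same size n.
        se = sqrt(ms_within / len(groups[a]) + ms_within / len(groups[b]))
        t_values[(a, b)] = (means[a] - means[b]) / se
    return t_values


# Toy, made-up data with three balanced groups.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [5.0, 6.0, 7.0],
}
t_values = tukey_t_values(groups)
for pair, t in t_values.items():
    print(pair, round(t, 3))
```

Passing groups of different lengths exercises the Tukey-Kramer form without any change to the code.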
The p-values are then approximated using the Studentized range distribution \(Q(\sqrt{2} \cdot t_i, r, N - r)\) where \(r\) is the total number of groups and \(N\) is the total sample size.
Note that the p-values might be slightly different than those obtained using R or Matlab since the studentized range approximation is done using the Gleason (1999) algorithm, which is more efficient, but slightly less accurate, than the algorithms used in Matlab or R.
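The conversion from a T-value to a Tukey-corrected p-value can be sketched as follows. This sketch swaps in SciPy's scipy.stats.studentized_range (available in SciPy 1.7 and later), which evaluates the distribution numerically, rather than the Gleason (1999) approximation described above, so results may differ slightly from this function's output. The helper name tukey_pvalue is hypothetical, and taking the absolute value of the T-value is an assumption made here to obtain a two-sided p-value.

```python
import math

from scipy.stats import studentized_range, t as student_t


def tukey_pvalue(t_value, n_groups, n_total):
    """Two-sided Tukey p-value for one pairwise T-value.

    Hypothetical helper: uses SciPy's studentized range distribution,
    not the Gleason (1999) approximation.
    """
    # The studentized range statistic is sqrt(2) times the absolute T-value.
    q = math.sqrt(2) * abs(t_value)
    # Upper-tail probability with r groups and N - r error degrees of freedom.
    return float(studentized_range.sf(q, n_groups, n_total - n_groups))


# With only two groups, the Tukey p-value should match the ordinary
# two-sided t-test p-value, which is where the sqrt(2) factor comes from.
p_two_groups = tukey_pvalue(2.0, n_groups=2, n_total=12)
p_t_test = 2 * float(student_t.sf(2.0, 10))
print(p_two_groups, p_t_test)

# Same T-value and error degrees of freedom, but three groups:
# the correction makes the p-value larger.
p_three_groups = tukey_pvalue(2.0, n_groups=3, n_total=13)
print(p_three_groups)
```

The two-group comparison is a convenient sanity check, since no multiple-comparison correction is needed when there is a single pair.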
References
[1] Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.
[2] Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.
Examples
Pairwise Tukey post-hocs on the pain threshold dataset.
>>> from pingouin import pairwise_tukey, read_dataset
>>> df = read_dataset('anova')
>>> pt = pairwise_tukey(dv='Pain threshold', between='Hair color', data=df)