pingouin.pairwise_tukey

pingouin.pairwise_tukey(data=None, dv=None, between=None, alpha=0.05, tail='two-sided', effsize='hedges')
Pairwise Tukey-HSD post-hoc test.
 Parameters
 data : pandas DataFrame
DataFrame. Note that this function can also be used directly as a Pandas method, in which case this argument is not needed.
 dv : string
Name of the column containing the dependent variable.
 between : string
Name of the column containing the between factor.
 alpha : float
Significance level.
 tail : string
Indicates whether to return the 'two-sided' or 'one-sided' p-values.
 effsize : string or None
Effect size type. Available methods are:
'none' : no effect size
'cohen' : Unbiased Cohen d
'hedges' : Hedges g
'glass' : Glass delta
'r' : Pearson correlation coefficient
'eta-square' : Eta-square
'odds-ratio' : Odds ratio
'AUC' : Area Under the Curve
'CLES' : Common Language Effect Size
 Returns
 stats : DataFrame
Stats summary:
'A' : Name of first measurement
'B' : Name of second measurement
'mean(A)' : Mean of first measurement
'mean(B)' : Mean of second measurement
'diff' : Mean difference (= mean(A) - mean(B))
'se' : Standard error
'tail' : indicates whether the p-values are one-sided or two-sided
'T' : T-values
'p-tukey' : Tukey-HSD corrected p-values
'hedges' : effect size (or any effect size defined in ``effsize``)
Notes
Tukey HSD post-hoc is best for balanced one-way ANOVA.
It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated measures ANOVA.
Note that when the sample sizes are unequal, this function actually performs the Tukey-Kramer test (which allows for unequal sample sizes).
The T-values are defined as:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot MS_w / n}}\]
where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(MS_w\) is the mean square of the error (computed using ANOVA) and \(n\) is the sample size of each group.
If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{MS_w}{n_i} + \frac{MS_w}{n_j}}}\]
where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
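The two formulas above can be sketched in plain Python. This is a minimal illustration and not pingouin's implementation: the data are made up, and the helper names `within_mean_square` and `tukey_kramer_t` are hypothetical. Only the Tukey-Kramer form is coded, since it reduces to the balanced formula when all groups have the same size.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Illustrative data only (not one of the pingouin datasets).
groups = {
    "a": [4.0, 5.0, 6.0, 5.0],
    "b": [6.0, 7.0, 8.0, 7.0, 7.0],
    "c": [9.0, 8.0, 10.0],
}


def within_mean_square(groups):
    """Mean square of the error (MS_w) of a one-way ANOVA."""
    n_total = sum(len(values) for values in groups.values())
    ss_within = sum(
        sum((x - mean(values)) ** 2 for x in values)
        for values in groups.values()
    )
    # Error degrees of freedom: N - r (total sample size minus number of groups).
    return ss_within / (n_total - len(groups))


def tukey_kramer_t(groups):
    """T-value for every pair of groups, using the Tukey-Kramer standard error."""
    ms_w = within_mean_square(groups)
    t_values = {}
    for (name_i, x_i), (name_j, x_j) in combinations(groups.items(), 2):
        se = sqrt(ms_w / len(x_i) + ms_w / len(x_j))
        t_values[(name_i, name_j)] = (mean(x_i) - mean(x_j)) / se
    return t_values


t_values = tukey_kramer_t(groups)
for pair, t in t_values.items():
    print(pair, round(t, 3))
```

With equal group sizes, `sqrt(ms_w / n + ms_w / n)` equals `sqrt(2 * ms_w / n)`, so the same function covers the balanced case.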
The p-values are then approximated using the studentized range distribution \(Q(\sqrt{2} \cdot t_i, r, N - r)\) where \(r\) is the total number of groups and \(N\) is the total sample size.
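As a rough sketch of this step, the p-value for a given T-value can be obtained from the survival function of the studentized range distribution. The example below swaps in `scipy.stats.studentized_range` (available in SciPy 1.7 and later) rather than the approximation used by this function, so results may differ slightly. The helper name `tukey_pvalue` is hypothetical, and taking the absolute value of the T-value is an assumption that corresponds to the two-sided case.

```python
from math import sqrt

from scipy.stats import studentized_range


def tukey_pvalue(t, n_groups, n_total):
    """Two-sided Tukey p-value for one T-value (illustrative helper).

    n_groups is r (number of groups) and n_total is N (total sample size).
    """
    q = sqrt(2) * abs(t)       # convert the T-value to a studentized range statistic
    df = n_total - n_groups    # error degrees of freedom, N - r
    return float(studentized_range.sf(q, n_groups, df))


# Example values chosen for illustration: 3 groups, 12 observations in total.
p = tukey_pvalue(t=-3.65, n_groups=3, n_total=12)
print(p)
```

Larger absolute T-values give smaller p-values, and the sign of the T-value does not affect the result in this two-sided sketch.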
Note that the p-values might be slightly different from those obtained using R or Matlab since the studentized range approximation is done using the Gleason (1999) algorithm, which is more efficient and accurate than the algorithms used in Matlab or R.
References
 1
Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.
 2
Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.
Examples
Pairwise Tukey post-hocs on the pain threshold dataset.
>>> from pingouin import pairwise_tukey, read_dataset
>>> df = read_dataset('anova')
>>> pt = pairwise_tukey(data=df, dv='Pain threshold', between='Hair color')