pingouin.pairwise_ttests(data=None, dv=None, between=None, within=None, subject=None, parametric=True, marginal=True, alpha=0.05, tail='two-sided', padjust='none', effsize='hedges', correction='auto', nan_policy='listwise', return_desc=False, interaction=True, within_first=True)

Pairwise T-tests.


Parameters

data : pandas.DataFrame

DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.


dv : string

Name of column containing the dependent variable.

between : string or list with 2 elements

Name of column(s) containing the between-subject factor(s).


Note that Pingouin gives slightly different T and p-values than JASP’s posthoc tests for two-way factorial designs, because Pingouin does not pool the standard error for each factor but rather calculates each pairwise T-test completely independently of the others.

within : string or list with 2 elements

Name of column(s) containing the within-subject factor(s), i.e. the repeated measurements.


subject : string

Name of column containing the subject identifier. This is mandatory when within is specified.


parametric : boolean

If True (default), use the parametric ttest() function. If False, use pingouin.wilcoxon() or pingouin.mwu() for paired or unpaired samples, respectively.


marginal : boolean

If True, average over the repeated measures factor when working with a mixed or two-way repeated measures design. For instance, in a mixed design, the between-subject pairwise T-test(s) will be calculated after averaging across all levels of the within-subject repeated measures factor (the so-called “marginal means”).

Similarly, in a two-way repeated measures design, the pairwise T-test(s) will be calculated after averaging across all levels of the other repeated measures factor.

Setting marginal=True is recommended when doing posthoc testing with multiple factors in order to avoid violating the assumption of independence and inflating the degrees of freedom by the number of repeated measurements. This is the default behavior of JASP.


Warning: the default behavior of Pingouin <0.3.2 was marginal=False, which may have led to incorrect p-values for mixed or two-way repeated measures designs. Make sure to always use the latest version of Pingouin.

New in version 0.3.2.
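As an illustration, the marginal-means averaging described above can be sketched with plain pandas on hypothetical data (this is a sketch of the idea, not Pingouin's internal code; the column names mirror the examples below):

```python
import pandas as pd

# Hypothetical mixed-design data: Group is between-subject,
# Time is within-subject.
df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "Group":   ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Time":    ["pre", "post"] * 4,
    "Scores":  [5.0, 6.0, 4.0, 5.0, 7.0, 8.0, 6.0, 7.5],
})

# Average across the within-subject factor (Time) before running the
# between-subject (Group) pairwise test, so that each subject
# contributes exactly one value (its marginal mean).
marginal_means = (df.groupby(["Subject", "Group"], as_index=False)["Scores"]
                    .mean())
print(marginal_means)
```

The resulting frame has one row per subject, which is what preserves the independence assumption for the between-subject contrast.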


alpha : float

Significance level.


tail : string

Specify whether the alternative hypothesis is ‘two-sided’ or ‘one-sided’. Can also be ‘greater’ or ‘less’ to specify the direction of the test. ‘greater’ tests the alternative that x has a larger mean than y. If tail is ‘one-sided’, Pingouin will automatically infer the one-sided alternative hypothesis of the test based on the sign of the test statistic.
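To make the tail convention concrete, here is a sketch (not Pingouin's internal code) of how a directional p-value relates to a two-sided one for a t-test, given the sign of the T statistic:

```python
# Sketch: converting a two-sided t-test p-value into a one-sided one.
# For tail='greater', the one-sided p-value is p/2 when T > 0, and
# 1 - p/2 otherwise; 'less' is the mirror image.
def one_sided_p(t_stat, p_two_sided, tail="greater"):
    """Directional p-value derived from a two-sided t-test result."""
    if tail == "greater":
        return p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    elif tail == "less":
        return p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
    raise ValueError("tail must be 'greater' or 'less'")

print(one_sided_p(2.5, 0.02, tail="greater"))   # 0.01
print(one_sided_p(2.5, 0.02, tail="less"))      # 0.99
```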


padjust : string

Method used for testing and adjustment of p-values.

  • 'none': no correction

  • 'bonf': one-step Bonferroni correction

  • 'sidak': one-step Sidak correction

  • 'holm': step-down method using Bonferroni adjustments

  • 'fdr_bh': Benjamini/Hochberg FDR correction

  • 'fdr_by': Benjamini/Yekutieli FDR correction
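As an illustration of one of these methods, Holm's step-down adjustment ('holm') can be sketched in a few lines of plain Python (a sketch of the standard procedure, not Pingouin's internal code):

```python
# Holm step-down: sort the p-values, multiply the i-th smallest by
# (m - i), then enforce monotonicity so adjusted p-values never
# decrease as the raw p-values increase.
def holm_adjust(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03]))
```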

effsize : string or None

Effect size type. Available methods are:

  • 'none': no effect size

  • 'cohen': Unbiased Cohen d

  • 'hedges': Hedges g

  • 'glass': Glass delta

  • 'r': Pearson correlation coefficient

  • 'eta-square': Eta-square

  • 'odds-ratio': Odds ratio

  • 'AUC': Area Under the Curve

  • 'CLES': Common Language Effect Size
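For reference, the default 'hedges' effect size is Cohen's d scaled by a small-sample bias-correction factor. A minimal sketch of the textbook formula (not Pingouin's internal code) for two independent samples:

```python
import math

# Hedges' g: Cohen's d (pooled-SD standardized mean difference)
# multiplied by the correction factor J = 1 - 3 / (4 * (n1 + n2) - 9).
def hedges_g(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    # Pooled standard deviation
    sd_pooled = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    d = (mx - my) / sd_pooled
    # Small-sample bias correction
    j = 1 - 3 / (4 * (nx + ny) - 9)
    return d * j

x = [5.5, 6.1, 4.9, 5.8]
y = [4.2, 4.8, 5.0, 4.5]
print(hedges_g(x, y))
```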

correction : string or boolean

For unpaired two-sample T-tests, specify whether or not to correct for unequal variances using the Welch separate-variances T-test. If ‘auto’, Pingouin will automatically use the Welch T-test when the sample sizes are unequal, as recommended by Zimmerman (2004).

New in version 0.3.2.
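The 'auto' rule can be sketched with scipy's ttest_ind, whose equal_var flag switches between the Student and Welch variants (a sketch of the heuristic on simulated data, not Pingouin's internal code):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.5, 2.0, size=45)

# The 'auto' heuristic: use Welch whenever the group sizes differ.
use_welch = len(a) != len(b)
t, p = ttest_ind(a, b, equal_var=not use_welch)
print(use_welch, t, p)
```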


nan_policy : string

Can be ‘listwise’ for listwise deletion of missing values in repeated measures designs (= complete-case analysis) or ‘pairwise’ for the more liberal pairwise deletion (= available-case analysis).

New in version 0.2.9.
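The difference between the two strategies can be sketched with pandas on hypothetical repeated-measures data (this illustrates the concept, not Pingouin's internal code):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: subject 2 is missing one measurement.
df = pd.DataFrame({
    "Subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Time":    ["t1", "t2", "t3"] * 3,
    "Scores":  [5.0, 6.0, 7.0, 4.0, np.nan, 6.5, 5.5, 6.2, 7.1],
})

wide = df.pivot(index="Subject", columns="Time", values="Scores")

# Listwise deletion (complete-case): subject 2 is dropped everywhere.
listwise = wide.dropna()
print(len(listwise))  # 2

# Pairwise deletion (available-case): each pair of levels keeps every
# subject with valid data for that specific pair.
pair_t1_t3 = wide[["t1", "t3"]].dropna()
print(len(pair_t1_t3))  # 3
```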


return_desc : boolean

If True, append group means and standard deviations to the output dataframe.


interaction : boolean

If there are multiple factors and interaction is True (default), Pingouin will also calculate T-tests for the interaction term (see Notes).

New in version 0.2.9.


within_first : boolean

Determines the order of the interaction in mixed design. Pingouin will return within * between when this parameter is set to True (default), and between * within otherwise.

New in version 0.3.6.

Returns

stats : pandas.DataFrame

Stats summary with the following columns:

  • 'Contrast': Contrast (= independent variable or interaction)

  • 'A': Name of first measurement

  • 'B': Name of second measurement

  • 'Paired': indicates whether the two measurements are paired or independent

  • 'Parametric': indicates whether parametric or non-parametric tests were used

  • 'Tail': indicates whether the p-values are one-sided or two-sided

  • 'T': T statistic (only if parametric=True)

  • 'U-val': Mann-Whitney U stat (if parametric=False and unpaired data)

  • 'W-val': Wilcoxon W stat (if parametric=False and paired data)

  • 'dof': degrees of freedom (only if parametric=True)

  • 'p-unc': Uncorrected p-values

  • 'p-corr': Corrected p-values

  • 'p-adjust': p-value correction method

  • 'BF10': Bayes Factor

  • 'hedges': effect size (or any effect size defined in effsize)


Notes

Data are expected to be in long format. If your data are in wide format, you can use the pandas.melt() function to convert from wide to long format.
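For instance, hypothetical wide-format data with one column per measurement occasion can be reshaped into the long format expected by this function as follows:

```python
import pandas as pd

# Hypothetical wide-format data: one column per time point.
wide = pd.DataFrame({
    "Subject": [1, 2, 3],
    "August":  [5.5, 4.2, 6.1],
    "January": [6.0, 4.8, 6.5],
    "June":    [6.3, 5.1, 6.9],
})

# Melt to long format: one row per (Subject, Time) observation.
long_df = wide.melt(id_vars="Subject", var_name="Time",
                    value_name="Scores")
print(long_df.shape)             # (9, 3)
print(long_df.columns.tolist())  # ['Subject', 'Time', 'Scores']
```

The resulting dv, within and subject columns ('Scores', 'Time', 'Subject') can then be passed directly to pairwise_ttests().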

If between or within is a list (e.g. [‘col1’, ‘col2’]), the function returns 1) the pairwise T-tests between each value of the first column, 2) the pairwise T-tests between each value of the second column, and 3) the interaction between col1 and col2. The interaction depends on the order of the list, so [‘col1’, ‘col2’] will not yield the same results as [‘col2’, ‘col1’], and will only be calculated if interaction=True.

In other words, if between is a list with two elements, the output model is between1 + between2 + between1 * between2.

Similarly, if within is a list with two elements, the output model is within1 + within2 + within1 * within2.

If both between and within are specified, the output model is within + between + within * between (= mixed design), unless within_first=False in which case the model becomes between + within + between * within.

Missing values in repeated measurements are automatically removed using a listwise (default) or pairwise deletion strategy. However, you should be very careful since this can result in undesired removal of values (especially for the interaction effect). We strongly recommend that you preprocess your data and remove the missing values before using this function.

This function has been tested against the pairwise.t.test R function.


Warning: versions of Pingouin below 0.3.2 gave incorrect results for mixed and two-way repeated measures designs (see the above warning for the marginal argument).


Warning: Pingouin gives slightly different results than JASP’s posthoc module when working with multiple factors (e.g. mixed, factorial or two-way repeated measures designs). This is mostly caused by the fact that Pingouin does not pool the standard error for between-subject and interaction contrasts. You should always double-check your results with JASP or another statistical software.


Examples

For more examples, please refer to the Jupyter notebooks.

  1. One between-subject factor

>>> from pingouin import pairwise_ttests, read_dataset
>>> df = read_dataset('mixed_anova.csv')
>>> pairwise_ttests(dv='Scores', between='Group', data=df) 
  2. One within-subject factor

>>> post_hocs = pairwise_ttests(dv='Scores', within='Time',
...                             subject='Subject', data=df)
>>> print(post_hocs)  
  3. Non-parametric pairwise paired test (Wilcoxon)

>>> pairwise_ttests(dv='Scores', within='Time', subject='Subject',
...                 data=df, parametric=False)  
  4. Mixed design (within and between) with Bonferroni-corrected p-values

>>> posthocs = pairwise_ttests(dv='Scores', within='Time',
...                            subject='Subject', between='Group',
...                            padjust='bonf', data=df)
  5. Two between-subject factors. The order of the list matters!

>>> posthocs = pairwise_ttests(dv='Scores', between=['Group', 'Time'],
...                            data=df)
  6. Same but without the interaction

>>> posthocs = df.pairwise_ttests(dv='Scores', between=['Group', 'Time'],
...                               interaction=False)