To install Python on your computer, you should use Anaconda, a Python distribution that ships with the most common scientific packages. Once it is installed, open the Anaconda prompt and type:

conda install pip

This will install pip, the most widely used package manager in Python. Once pip is installed, you should be able to install Pingouin. Still in the Anaconda prompt, run the following command:

pip install pingouin

You are almost ready to use Pingouin. First, you need to open an interactive Python console (either IPython or Jupyter). To start IPython, for example, type the following command:

ipython

Now, let’s do a simple paired T-test using Pingouin:

import pingouin as pg
# Create two variables
x = [4, 6, 5, 7, 6]
y = [2, 2, 3, 1, 2]
# Run a T-test
pg.ttest(x, y, paired=True)
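
For readers curious about what the paired test computes, the sketch below reproduces the T-value by hand with NumPy and compares it with scipy.stats.ttest_rel. It illustrates the textbook formula (a one-sample T-test on the pairwise differences) and is not meant to describe Pingouin's own implementation.

```python
import numpy as np
from scipy.stats import ttest_rel

x = np.array([4, 6, 5, 7, 6])
y = np.array([2, 2, 3, 1, 2])

# A paired T-test is a one-sample T-test on the pairwise differences
diff = x - y
n = diff.size
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
dof = n - 1

# Reference values from SciPy
t_scipy, p_scipy = ttest_rel(x, y)

print(f"manual T = {t_manual:.3f} (dof = {dof})")
print(f"SciPy  T = {t_scipy:.3f}, p = {p_scipy:.4f}")
```

Both T-values should agree, which is a quick sanity check that the data were paired in the intended order.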

Pingouin functions can be imported in two ways:

# 1) Import the full package
# --> Best if you are planning to use several Pingouin functions.
import pingouin as pg
pg.ttest(x, y)

# 2) Import specific functions
# --> Best if you are planning to use only this specific function.
from pingouin import ttest
ttest(x, y)

Statsmodels is a great statistical Python package that provides several advanced functions (regression, GLM, time-series analysis) as well as an R-like syntax for fitting models. However, statsmodels can be quite hard to grasp and use for Python beginners or for users who just want to perform simple statistical tests. The goal of Pingouin is not to replace statsmodels but rather to provide some easy-to-use functions to perform the most widely used statistical tests. In addition, Pingouin also provides some novel functions (to cite but a few: effect sizes, pairwise T-tests and correlations, ICC, repeated measures correlation, circular statistics…).

The scipy.stats module provides several low-level statistical functions. However, most of these functions do not return a very detailed output (e.g. only the T- and p-values for a T-test). Most Pingouin functions use the low-level SciPy functions to provide a richer, more exhaustive output. See for yourself:

import pingouin as pg
from scipy.stats import ttest_ind

x = [4, 6, 5, 7, 6]
y = [2, 2, 3, 1, 2]

print(pg.ttest(x, y))   # Pingouin: returns a DataFrame with T-value, p-value, degrees of freedom, tail, Cohen d, power and Bayes Factor
print(ttest_ind(x, y))  # SciPy: returns only the T- and p-values
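
To make the idea of a "richer output" concrete, here is a minimal sketch that starts from the low-level SciPy result and adds the degrees of freedom and Cohen's d by hand. The column names and the equal-variance assumption are choices made for this illustration only; this is not Pingouin's actual code.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

x = np.array([4, 6, 5, 7, 6])
y = np.array([2, 2, 3, 1, 2])

# Low-level SciPy call: only the T- and p-values
t_val, p_val = ttest_ind(x, y)

# Extra quantities computed by hand (equal-variance case)
nx, ny = x.size, y.size
dof = nx + ny - 2
pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof)
cohen_d = (x.mean() - y.mean()) / pooled_sd

# Collect everything in a one-row DataFrame (hypothetical column names)
summary = pd.DataFrame(
    {"T": [t_val], "dof": [dof], "p-val": [p_val], "cohen-d": [cohen_d]},
    index=["T-test"],
)
print(summary.round(4))
```

Returning a DataFrame rather than a bare tuple keeps every statistic labelled, which is the main convenience the comparison above is pointing at.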


Most Pingouin functions assume that your data is in tidy or long format, that is, each variable should be in one column and each observation should be in a different row. This is true for all the ANOVA and post-hoc functions as well as the linear/logistic regression, pairwise correlations, partial correlation, mediation analysis, etc.

An example of data in long format is shown below. Note that Scores is the dependent variable, Subject is the subject identifier, Time is a within-subject factor (two time points per subject), and Age and Gender are metadata:

Subject  Gender  Age  Time  Scores
1        M       24   Pre   2.5
1        M       24   Post  3.1
2        F       32   Pre   4.2
2        F       32   Post  4.8
3        F       38   Pre   2.5
3        F       38   Post  2.9

To convert your data from a wide format (typical in Excel) to a long format, you can use the pandas.melt() function.
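
As a sketch of that conversion, the example below rebuilds the table above in wide format (one Pre and one Post column per subject, an assumed layout) and reshapes it with pandas.melt(). The variable names wide_df and long_df are illustrative.

```python
import pandas as pd

# Wide format: one row per subject, one column per time point
wide_df = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Gender": ["M", "F", "F"],
    "Age": [24, 32, 38],
    "Pre": [2.5, 4.2, 2.5],
    "Post": [3.1, 4.8, 2.9],
})

# Long format: one row per subject and time point
long_df = pd.melt(
    wide_df,
    id_vars=["Subject", "Gender", "Age"],  # columns repeated on every row
    value_vars=["Pre", "Post"],            # columns stacked into rows
    var_name="Time",
    value_name="Scores",
)

# Group the rows of each subject together (stable sort keeps Pre before Post)
long_df = long_df.sort_values("Subject", kind="stable").reset_index(drop=True)
print(long_df)
```

The id_vars columns are carried along unchanged, while each column listed in value_vars becomes one level of the new Time factor.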

To load a .csv or Excel file, you need to use the pandas package:

import pandas as pd
df = pd.read_csv('myfile.csv')     # Load a .csv file into a DataFrame
df = pd.read_excel('myfile.xlsx')  # Load an Excel file into a DataFrame

No, the central idea behind Pingouin is that all data manipulations and descriptive statistics should first be performed in pandas (or NumPy). For example, to compute the mean, standard deviation, and quartiles of all the numeric columns of a pandas DataFrame, you can use the pandas.DataFrame.describe() method.
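
A short sketch of describe() in use, assuming a DataFrame that holds the example long-format data from this page (the variable names df and summary are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3],
    "Gender": ["M", "M", "F", "F", "F", "F"],
    "Age": [24, 24, 32, 32, 38, 38],
    "Time": ["Pre", "Post", "Pre", "Post", "Pre", "Post"],
    "Scores": [2.5, 3.1, 4.2, 4.8, 2.5, 2.9],
})

# By default, describe() summarizes the numeric columns only:
# count, mean, std, min, quartiles (25%, 50%, 75%) and max
summary = df.describe()
print(summary)

# Descriptive statistics per level of a factor
print(df.groupby("Time")["Scores"].describe())
```

Non-numeric columns such as Gender and Time are left out of the default summary; passing include="all" to describe() adds them.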



To cite Pingouin, please use the publication in JOSS:

Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open Source Software, 3(31), 1026, https://doi.org/10.21105/joss.01026


@article{Vallat2018,
  title    = "Pingouin: statistics in Python",
  author   = "Vallat, Raphael",
  journal  = "The Journal of Open Source Software",
  volume   =  3,
  number   =  31,
  pages    = "1026",
  month    =  nov,
  year     =  2018
}