**Pingouin** is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.

- ANOVAs: one- and two-way, repeated measures, mixed, ANCOVA
- Post-hoc tests and pairwise comparisons
- Robust correlations
- Partial correlation, repeated measures correlation and intraclass correlation
- Linear/logistic regression and mediation analysis
- Bayesian T-test and Pearson correlation
- Tests for sphericity, normality and homoscedasticity
- Effect sizes and power analysis
- Parametric/bootstrapped confidence intervals around an effect size or a correlation coefficient
- Circular statistics
- Plotting: Bland-Altman plot, Q-Q plot, etc…

Pingouin is designed for users who want **simple yet exhaustive statistical functions**.

For example, the `scipy.stats.ttest_ind()` function returns only the T-value and the p-value. By contrast, the `pingouin.ttest()` function returns the T-value, p-value, degrees of freedom, effect size (Cohen’s d), statistical power and Bayes Factor (BF10) of the test.
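To make the contrast concrete, the sketch below runs `scipy.stats.ttest_ind()` on simulated data and then computes by hand two of the extra quantities listed above: the degrees of freedom and Cohen's d. The simulated samples, the seed and the pooled-standard-deviation formula for Cohen's d are assumptions made for illustration and are not necessarily Pingouin's exact implementation.

```python
import numpy as np
from scipy import stats

# Simulated data (assumption: two independent samples of size 30)
rng = np.random.default_rng(42)
x = rng.normal(loc=4.0, scale=1.0, size=30)
y = rng.normal(loc=5.0, scale=1.0, size=30)

# SciPy returns only the T-value and the p-value
res = stats.ttest_ind(x, y)
t_value, p_value = res.statistic, res.pvalue

# Everything else has to be computed separately
dof = x.size + y.size - 2
pooled_sd = np.sqrt(((x.size - 1) * x.var(ddof=1)
                     + (y.size - 1) * y.var(ddof=1)) / dof)
cohen_d = (x.mean() - y.mean()) / pooled_sd

print(f"T = {t_value:.3f}, p = {p_value:.3f}, dof = {dof}, d = {cohen_d:.3f}")
```

With equal group sizes and a pooled variance, the T-value and Cohen's d are linked by `T = d * sqrt(n / 2)`, which is a quick way to sanity-check the manual computation.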

The main dependencies of Pingouin are:

- NumPy (>= 1.15)
- SciPy (>= 1.1.0)
- Pandas (>= 0.23)
- Matplotlib (>= 3.0.2)
- Seaborn (>= 0.9.0)

In addition, some functions require:

- Statsmodels
- Scikit-learn

Pingouin is a Python 3 package. While most of the functions should work with Python 2.7, we strongly recommend using Python >= 3.6.

Pingouin can be easily installed using pip

```
pip install pingouin
```

or conda

```
conda install -c conda-forge pingouin
```

New releases are frequent so always make sure that you have the latest version:

```
pip install --upgrade pingouin
```
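To see which version is currently installed before upgrading, one option is to query the package metadata from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, available in Python >= 3.8); the helper name `installed_version` is hypothetical and not part of Pingouin.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


pingouin_version = installed_version("pingouin")
if pingouin_version is None:
    print("Pingouin is not installed")
else:
    print(f"Pingouin {pingouin_version} is installed")
```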

- Link to the GitHub repository.

Try before you buy! Click on the link below and navigate to the notebooks folder to load a collection of interactive Jupyter notebooks demonstrating the main functionalities of Pingouin. No need to install Pingouin beforehand as the notebooks run in a Binder environment.

```
import numpy as np
import pingouin as pg
np.random.seed(123)
mean, cov, n = [4, 5], [(1, .6), (.6, 1)], 30
x, y = np.random.multivariate_normal(mean, cov, n).T
# T-test
pg.ttest(x, y)
```

T | p-val | dof | tail | cohen-d | power | BF10 |
---|---|---|---|---|---|---|
-3.401 | 0.001 | 58 | two-sided | 0.878 | 0.917 | 26.155 |

```
pg.corr(x, y)
```

n | r | CI95% | r2 | adj_r2 | p-val | BF10 | power |
---|---|---|---|---|---|---|---|
30 | 0.595 | [0.3 0.79] | 0.354 | 0.306 | 0.001 | 54.222 | 0.95 |
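As a worked example of a parametric confidence interval around a correlation coefficient, the sketch below applies the classical Fisher z-transformation to the `r` and `n` values reported above. We assume here, without checking the source code, that this is close to what Pingouin does internally; the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Fisher z-based confidence interval for a Pearson correlation."""
    z = math.atanh(r)                    # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)          # Standard error of z
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Back-transform the interval bounds to the correlation scale
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = pearson_ci(r=0.595, n=30)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")  # 95% CI: [0.30, 0.79]
```

The result agrees with the `CI95%` column of the table above.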

```
# Introduce an outlier
x[5] = 18
# Use the robust Shepherd's pi correlation
pg.corr(x, y, method="shepherd")
```

n | r | CI95% | r2 | adj_r2 | p-val | power |
---|---|---|---|---|---|---|
30 | 0.561 | [0.25 0.77] | 0.315 | 0.264 | 0.002 | 0.917 |

```
# Return a boolean (true if normal) and the associated p-value
print(pg.normality(x, y)) # Univariate normality
print(pg.multivariate_normality(np.column_stack((x, y)))) # Multivariate normality
```

```
(array([False, True]), array([0., 0.552]))
(False, 0.00018)
```

```
import numpy as np
import pingouin as pg
np.random.seed(123)
x = np.random.normal(size=50)
ax = pg.qqplot(x, dist='norm')
```

```
# Read an example dataset
from pingouin.datasets import read_dataset
df = read_dataset('mixed_anova')
# Run the ANOVA
aov = pg.anova(data=df, dv='Scores', between='Group', detailed=True)
print(aov)
```

Source | SS | DF | MS | F | p-unc | np2 |
---|---|---|---|---|---|---|
Group | 5.460 | 1 | 5.460 | 5.244 | 0.02320 | 0.029 |
Within | 185.343 | 178 | 1.041 | | | |
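The columns of this table are related by simple arithmetic, which the short sketch below reproduces from the sums of squares and degrees of freedom printed above. The variable names are ours, and the formulas are the textbook definitions for a one-way ANOVA rather than an excerpt of Pingouin's code.

```python
# Values copied from the ANOVA table above
ss_group, df_group = 5.460, 1
ss_within, df_within = 185.343, 178

# Mean squares: sum of squares divided by degrees of freedom
ms_group = ss_group / df_group
ms_within = ss_within / df_within

# F statistic: ratio of the two mean squares
f_value = ms_group / ms_within

# Partial eta-squared: effect SS over effect SS plus error SS
np2 = ss_group / (ss_group + ss_within)

print(f"MS within = {ms_within:.3f}, F = {f_value:.3f}, np2 = {np2:.3f}")
# MS within = 1.041, F = 5.244, np2 = 0.029
```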

```
pg.rm_anova(data=df, dv='Scores', within='Time', subject='Subject', detailed=True)
```

Source | SS | DF | MS | F | p-unc | np2 | eps |
---|---|---|---|---|---|---|---|
Time | 7.628 | 2 | 3.814 | 3.913 | 0.022629 | 0.062 | 0.999 |
Error | 115.027 | 118 | 0.975 | | | | |

```
# FDR-corrected post hocs with Hedges' g effect size
posthoc = pg.pairwise_ttests(data=df, dv='Scores', within='Time', subject='Subject',
                             padjust='fdr_bh', effsize='hedges')
# Pretty printing of table
pg.print_table(posthoc, floatfmt='.3f')
```

Contrast | A | B | Paired | T | tail | p-unc | p-corr | p-adjust | BF10 | efsize | eftype |
---|---|---|---|---|---|---|---|---|---|---|---|
Time | August | January | True | -1.740 | two-sided | 0.087 | 0.131 | fdr_bh | 0.582 | -0.328 | hedges |
Time | August | June | True | -2.743 | two-sided | 0.008 | 0.024 | fdr_bh | 4.232 | -0.485 | hedges |
Time | January | June | True | -1.024 | two-sided | 0.310 | 0.310 | fdr_bh | 0.232 | -0.170 | hedges |
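The `p-corr` column can be reproduced by hand with the Benjamini-Hochberg procedure, which is what the `fdr_bh` label conventionally refers to. The sketch below is a plain-Python illustration under that assumption; the function name `fdr_bh` is ours and this is not Pingouin's implementation. Because the inputs are the rounded `p-unc` values from the table, the output matches `p-corr` only to within rounding.

```python
def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_unc = [0.087, 0.008, 0.310]  # Rounded values from the table above
p_corr = fdr_bh(p_unc)
print([f"{p:.4f}" for p in p_corr])
```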

```
# Compute the two-way mixed ANOVA and export to a .csv file
aov = pg.mixed_anova(data=df, dv='Scores', between='Group', within='Time',
                     subject='Subject', correction=False,
                     export_filename='mixed_anova.csv')
pg.print_table(aov)
```

Source | SS | DF1 | DF2 | MS | F | p-unc | np2 | eps |
---|---|---|---|---|---|---|---|---|
Group | 5.460 | 1 | 58 | 5.460 | 5.052 | 0.028 | 0.080 | |
Time | 7.628 | 2 | 116 | 3.814 | 4.027 | 0.020 | 0.065 | 0.999 |
Interaction | 5.168 | 2 | 116 | 2.584 | 2.728 | 0.070 | 0.045 | |

```
import pandas as pd  # Needed for pd.DataFrame below
np.random.seed(123)
z = np.random.normal(5, 1, 30)
data = pd.DataFrame({'X': x, 'Y': y, 'Z': z})
pg.pairwise_corr(data, columns=['X', 'Y', 'Z'])
```

X | Y | method | tail | n | r | CI95% | r2 | adj_r2 | z | p-unc | BF10 | power |
---|---|---|---|---|---|---|---|---|---|---|---|---|
X | Y | pearson | two-sided | 30 | 0.366 | [0.01 0.64] | 0.134 | 0.070 | 0.384 | 0.047 | 1.006 | 0.525 |
X | Z | pearson | two-sided | 30 | 0.251 | [-0.12 0.56] | 0.063 | -0.006 | 0.256 | 0.181 | 0.344 | 0.272 |
Y | Z | pearson | two-sided | 30 | 0.020 | [-0.34 0.38] | 0.000 | -0.074 | 0.020 | 0.916 | 0.142 | 0.051 |

```
# Convert from Cohen's d to Hedges' g
pg.convert_effsize(0.4, 'cohen', 'hedges', nx=10, ny=12)
```

```
0.384
```

```
pg.linear_regression(data[['X', 'Z']], data['Y'])
```

names | coef | se | T | pval | r2 | adj_r2 | CI[2.5%] | CI[97.5%] |
---|---|---|---|---|---|---|---|---|
Intercept | 4.650 | 0.841 | 5.530 | 0.000 | 0.139 | 0.076 | 2.925 | 6.376 |
X | 0.143 | 0.068 | 2.089 | 0.046 | 0.139 | 0.076 | 0.003 | 0.283 |
Z | -0.069 | 0.167 | -0.416 | 0.681 | 0.139 | 0.076 | -0.412 | 0.273 |

```
pg.mediation_analysis(data=data, x='X', m='Z', y='Y', n_boot=500)
```

Path | Beta | CI[2.5%] | CI[97.5%] | Sig |
---|---|---|---|---|
X -> M | 0.103 | -0.051 | 0.256 | No |
M -> Y | 0.018 | -0.332 | 0.369 | No |
X -> Y | 0.136 | 0.002 | 0.269 | Yes |
Direct | 0.143 | 0.003 | 0.283 | Yes |
Indirect | -0.007 | -0.050 | 0.027 | No |
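The rows of this table can be cross-checked against each other. In a linear mediation model fitted by ordinary least squares, the indirect effect equals the total effect minus the direct effect, and also the product of the X -> M path with the coefficient of M in the regression of Y on X and M. The sketch below checks both identities using the rounded values printed above, taking the M coefficient (-0.069) from the `Z` row of the linear regression output; that correspondence is our assumption based on the two examples using the same data.

```python
# Rounded values copied from the tables above
total_effect = 0.136    # X -> Y
direct_effect = 0.143   # Direct (X -> Y, controlling for M)
a_path = 0.103          # X -> M
b_path = -0.069         # Coefficient of Z when regressing Y on X and Z

# Two equivalent ways to obtain the indirect effect in a linear model
indirect_difference = total_effect - direct_effect
indirect_product = a_path * b_path

print(f"total - direct = {indirect_difference:.3f}")
print(f"a * b          = {indirect_product:.3f}")
```

Both values round to the -0.007 reported in the `Indirect` row; small discrepancies in further decimals come from using rounded inputs.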

```
import numpy as np
import pingouin as pg
np.random.seed(123)
mean, cov = [10, 11], [[1, 0.8], [0.8, 1]]
x, y = np.random.multivariate_normal(mean, cov, 30).T
ax = pg.plot_blandaltman(x, y)
```

Plot the curve of achieved power given the effect size (Cohen's d) and the sample size of a paired T-test.

```
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg
import numpy as np
sns.set(style='ticks', context='notebook', font_scale=1.2)
d = 0.5 # Fixed effect size
n = np.arange(5, 80, 5) # Incrementing sample size
# Compute the achieved power
pwr = pg.power_ttest(d=d, n=n, contrast='paired', tail='two-sided')
# Start the plot
plt.plot(n, pwr, 'ko-.')
plt.axhline(0.8, color='r', ls=':')
plt.xlabel('Sample size')
plt.ylabel('Power (1 - type II error)')
plt.title('Achieved power of a paired T-test')
sns.despine()
```

Pingouin was created and is maintained by Raphael Vallat. Contributions are more than welcome so feel free to contact me, open an issue or submit a pull request!

To see the code or report a bug, please visit the GitHub repository.

Note that this program is provided with NO WARRANTY OF ANY KIND. If you can, always double-check the results with another statistical package.

- Nicolas Legrand
- Richard Höchenberger

If you want to cite Pingouin, please use the publication in JOSS:

Vallat, R. (2018). Pingouin: statistics in Python. *Journal of Open Source Software*, 3(31), 1026, https://doi.org/10.21105/joss.01026

```
@ARTICLE{Vallat2018,
title = "Pingouin: statistics in Python",
author = "Vallat, Raphael",
journal = "The Journal of Open Source Software",
volume = 3,
number = 31,
pages = "1026",
month = nov,
year = 2018
}
```

Several functions of Pingouin were inspired by R or Matlab toolboxes, including:

- effsize package (R)
- ezANOVA package (R)
- pwr package (R)
- circular statistics (Matlab) (Berens 2009)
- robust correlations (Matlab) (Pernet, Wilcox & Rousselet, 2012)
- repeated measures correlation (R) (Bakdash & Marusich, 2017)

I am also grateful to Charles Zaiontz and his website www.real-statistics.com, which has been useful for understanding the practical implementation of several functions.