# FAQ

## Python

##### I am new to Python, how can I install Python and Pingouin on my computer?

To install Python on your computer, you should use Anaconda, a Python distribution which natively includes all the most important packages. Then, open the newly installed Anaconda prompt and type:

    conda install pip


This will install pip, the most widely used package manager in Python. Once pip is installed, you should be able to install Pingouin. Still in the Anaconda prompt, run the following command:

    pip install pingouin


You are almost ready to use Pingouin. First, you need to open an interactive Python console (either IPython or Jupyter). To do so, type the following command:

    ipython


Now, let’s do a simple paired T-test using Pingouin:

    import pingouin as pg
    # Create two variables
    x = [4, 6, 5, 7, 6]
    y = [2, 2, 3, 1, 2]
    # Run a T-test
    pg.ttest(x, y, paired=True)
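For readers curious about what a paired T-test computes, here is a minimal sketch using NumPy and SciPy rather than Pingouin itself. It reuses the `x` and `y` values from above. The names `diff`, `t_manual` and `res` are illustrative, and the formula is the textbook one, not necessarily how Pingouin implements the test internally.

```python
import numpy as np
from scipy.stats import ttest_rel

x = np.array([4, 6, 5, 7, 6])
y = np.array([2, 2, 3, 1, 2])

# A paired T-test is a one-sample T-test on the pairwise differences.
diff = x - y
n = diff.size
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))

# SciPy's paired T-test should give the same statistic.
res = ttest_rel(x, y)
print(f"manual T = {t_manual:.3f}, SciPy T = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Both values of the T statistic should agree, which is a quick sanity check that the data were passed in the intended order.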

##### How to import and use Pingouin?

    # 1) Import the full package
    # --> Best if you are planning to use several Pingouin functions.
    import pingouin as pg
    pg.ttest(x, y)

    # 2) Import specific functions
    # --> Best if you are planning to use only this specific function.
    from pingouin import ttest
    ttest(x, y)

##### What are the differences between statsmodels and Pingouin?

Statsmodels is a great statistical Python package that provides several advanced functions (regression, GLM, time-series analysis) as well as an R-like syntax for fitting models. However, statsmodels can be quite hard to grasp and use for Python beginners and/or users who just want to perform simple statistical tests. The goal of Pingouin is not to replace statsmodels but rather to provide some easy-to-use functions to perform the most widely-used statistical tests. In addition, Pingouin also provides some novel functions (to cite but a few: effect sizes, pairwise T-tests and correlations, ICC, repeated measures correlation, circular statistics…).

##### What are the differences between scipy.stats and Pingouin?

The scipy.stats module provides several low-level statistical functions. However, most of these functions do not return a very detailed output (e.g. only the T- and p-values for a T-test). Most Pingouin functions use the low-level SciPy functions to provide a richer, more exhaustive output. See for yourself:

    import pingouin as pg
    from scipy.stats import ttest_ind

    x = [4, 6, 5, 7, 6]
    y = [2, 2, 3, 1, 2]

    # Pingouin: returns a DataFrame with T-value, p-value, degrees of freedom,
    # tail, Cohen d, power and Bayes Factor
    print(pg.ttest(x, y))
    # SciPy: returns only the T- and p-values
    print(ttest_ind(x, y))
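To give a feel for what "richer output" means, the sketch below wraps SciPy's `ttest_ind` and adds the degrees of freedom and Cohen's d by hand. The helper `ttest_summary` is hypothetical and written only for this illustration. It assumes equal variances and a pooled standard deviation, and it is not Pingouin's actual implementation.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_summary(x, y):
    """Independent T-test plus degrees of freedom and Cohen's d (illustrative helper)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    res = ttest_ind(x, y)  # Student's T-test, equal variances assumed
    dof = x.size + y.size - 2
    # Pooled variance, weighted by each group's degrees of freedom
    pooled_var = ((x.size - 1) * x.var(ddof=1) + (y.size - 1) * y.var(ddof=1)) / dof
    cohen_d = (x.mean() - y.mean()) / np.sqrt(pooled_var)
    return {"T": res.statistic, "p": res.pvalue, "dof": dof, "cohen_d": cohen_d}


summary = ttest_summary([4, 6, 5, 7, 6], [2, 2, 3, 1, 2])
print(summary)
```

Pingouin saves you from writing this kind of boilerplate and returns the values in a single DataFrame.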


## Data

##### How can I load a .csv or .xlsx file in Python?

You need to use the pandas.read_csv() or pandas.read_excel() functions:

    import pandas as pd
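As a minimal, self-contained sketch, the snippet below first writes a small .csv file to a temporary folder and then loads it with `pandas.read_csv()`. The file name `scores.csv` and its contents are made up for the example. With your own data you would simply pass the path to your file.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "scores.csv"  # hypothetical file name
    csv_path.write_text("Subject,Pre,Post\n1,2.5,3.1\n2,4.2,4.8\n3,2.5,2.9\n")
    # Load the .csv file into a pandas DataFrame
    df = pd.read_csv(csv_path)

print(df)
# An Excel file is loaded the same way with pd.read_excel(path),
# which needs an Excel engine such as openpyxl to be installed.
```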

##### How does Pingouin deal with missing values?

Pingouin hates missing values almost as much as you do!

Most functions of Pingouin will automatically remove the missing values. In the case of paired measurements (e.g. paired T-test, correlation, or repeated measures ANOVA), a listwise deletion of missing values is performed, meaning that the entire row is removed. This is generally the best strategy if you have a large sample size and only a few missing values. However, this can be quite drastic if there are a lot of missing values in your data. In that case, it might be useful to look at imputation methods (see Pandas documentation).
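To illustrate what listwise deletion means for paired measurements, here is a small NumPy sketch. The arrays and the `keep` mask are invented for the example, and the code only mimics the idea. It is not Pingouin's internal implementation.

```python
import numpy as np

x = np.array([4.0, 6.0, np.nan, 7.0, 6.0])
y = np.array([2.0, np.nan, 3.0, 1.0, 2.0])

# Keep only the pairs where both measurements are present
keep = ~(np.isnan(x) | np.isnan(y))
x_clean, y_clean = x[keep], y[keep]
print(x_clean, y_clean)
```

Here two of the five pairs are dropped because one of their two values is missing, which shows how quickly the sample can shrink when missing values are frequent.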

If you prefer to know what’s going on under the hood, you can also remove the missing values a priori using the pingouin.remove_na() and pingouin.remove_rm_na() functions. The first one is a convenient and flexible function to remove rows or columns with missing values in 1D or 2D array(s). The second one is specifically geared toward long-format repeated measures dataframes, such as the ones required by the pingouin.rm_anova() function.

##### What's the difference between wide format and long format data and how can I convert my data from one to the other?

In wide format, each row represents a subject, and each column a measurement (e.g. “Pre”, “Post”). This is the most convenient way for humans to look at repeated measurements. It typically results in a spreadsheet with a larger number of columns than rows. An example of a wide-format dataframe is shown below:

| Subject | Pre | Post | Gender | Age |
|---------|-----|------|--------|-----|
| 1       | 2.5 | 3.1  | M      | 24  |
| 2       | 4.2 | 4.8  | F      | 32  |
| 3       | 2.5 | 2.9  | F      | 38  |

In long format, each row is one time point per subject and each column is a variable (e.g. one column with the “Subject” identifier, another with the “Scores” and another with the “Time” grouping factor). In long format, there are usually many more rows than columns. While this is harder to read for humans, it is much easier to read for computers. For this reason, all the repeated measures functions in Pingouin work only with long-format dataframes. In the example below, the wide-format dataframe from above was converted into a long-format dataframe:

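| Subject | Gender | Age | Time | Scores |
|---------|--------|-----|------|--------|
| 1       | M      | 24  | Pre  | 2.5    |
| 1       | M      | 24  | Post | 3.1    |
| 2       | F      | 32  | Pre  | 4.2    |
| 2       | F      | 32  | Post | 4.8    |
| 3       | F      | 38  | Pre  | 2.5    |
| 3       | F      | 38  | Post | 2.9    |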

The Pandas package provides some convenient functions to convert from one format to the other, for example pandas.melt() (wide to long) and pandas.pivot_table() (long to wide).
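As a sketch of such a conversion, the snippet below rebuilds the wide-format example table, converts it to long format with `pandas.melt()`, and converts it back with `DataFrame.pivot_table()`. The variable names `wide`, `long` and `wide_again` are arbitrary, and this is only one of several ways to reshape data in pandas.

```python
import pandas as pd

wide = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Pre": [2.5, 4.2, 2.5],
    "Post": [3.1, 4.8, 2.9],
    "Gender": ["M", "F", "F"],
    "Age": [24, 32, 38],
})

# Wide -> long: one row per subject and time point
long = pd.melt(
    wide,
    id_vars=["Subject", "Gender", "Age"],
    value_vars=["Pre", "Post"],
    var_name="Time",
    value_name="Scores",
)
# Stable sort keeps "Pre" before "Post" within each subject
long = long.sort_values("Subject", kind="stable").reset_index(drop=True)
print(long)

# Long -> wide: one row per subject, one column per time point
wide_again = long.pivot_table(
    index=["Subject", "Gender", "Age"], columns="Time", values="Scores"
).reset_index()
print(wide_again)
```

Note that `pivot_table()` aggregates duplicated entries (by default with the mean), so it is worth checking that each subject has a single value per time point before converting back.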

##### Can I compute descriptive statistics with Pingouin?

No, the central idea behind Pingouin is that all data manipulations and descriptive statistics should be first performed in Pandas (or NumPy). For example, to compute the mean, standard deviation, and quartiles of all the numeric columns of a pandas DataFrame, one can easily use the pandas.DataFrame.describe() method:

    data.describe()
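For a runnable illustration, the sketch below builds a small DataFrame named `data` (the values are taken from the wide-format example table in this FAQ) and computes descriptive statistics with pandas only. The per-group means are an extra example of what pandas can do and are not something the Pingouin documentation prescribes.

```python
import pandas as pd

data = pd.DataFrame({
    "Subject": [1, 2, 3],
    "Pre": [2.5, 4.2, 2.5],
    "Post": [3.1, 4.8, 2.9],
    "Gender": ["M", "F", "F"],
})

# Count, mean, standard deviation, min, quartiles and max of the two score columns
stats = data[["Pre", "Post"]].describe()
print(stats)

# Mean score per group
print(data.groupby("Gender")[["Pre", "Post"]].mean())
```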


## Others

##### How can I be notified of new releases?

To be notified whenever a new release of Pingouin is available, you can click on “Watch releases” on the GitHub page of Pingouin.

Whenever a new release is out there, you can simply upgrade your version by typing the following line in a terminal window:

    pip install --upgrade pingouin
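To check which version is installed before or after upgrading, one option is the standard-library `importlib.metadata` module (Python 3.8 or later). The helper name `installed_version` is hypothetical and only used for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("pingouin:", installed_version("pingouin"))
```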

##### I am not a programmer, how can I contribute to Pingouin?

There are many ways to contribute to Pingouin, even if you are not a programmer: for example, reporting bugs or results that are inconsistent with other statistical software, improving the documentation and examples, or even buying the developers a coffee!

##### How can I cite Pingouin?

To cite Pingouin, please use the publication in JOSS:

Vallat, R. (2018). Pingouin: statistics in Python. Journal of Open Source Software, 3(31), 1026, https://doi.org/10.21105/joss.01026

BibTeX:

    @ARTICLE{Vallat2018,
      title    = "Pingouin: statistics in Python",
      author   = "Vallat, Raphael",
      journal  = "The Journal of Open Source Software",
      volume   =  3,
      number   =  31,
      pages    = "1026",
      month    =  nov,
      year     =  2018
    }