pingouin.logistic_regression(X, y, coef_only=False, alpha=0.05, as_dataframe=True, remove_na=False, **kwargs)

(Multiple) Binary logistic regression.

X : np.array or list

Predictor(s). Shape = (n_samples, n_features) or (n_samples,).

y : np.array or list

Dependent variable. Shape = (n_samples,). Must be binary.

coef_only : bool

If True, return only the regression coefficients.

alpha : float

Alpha value used for the confidence intervals. \(\text{CI} = [\alpha / 2 ; 1 - \alpha / 2]\)

as_dataframe : bool

If True, returns a pandas DataFrame. If False, returns a dictionary.

remove_na : bool

If True, apply a listwise deletion of missing values (i.e. the entire row is removed). Default is False, which will raise an error if missing values are present in either the predictor(s) or dependent variable.

**kwargs : optional

Optional arguments passed to sklearn.linear_model.LogisticRegression.

stats : dataframe or dict

Logistic regression summary:

'names' : name of variable(s) in the model (e.g. x1, x2...)
'coef' : regression coefficients
'se' : standard error
'z' : z-scores
'pval' : two-tailed p-values
'CI[2.5%]' : lower confidence interval
'CI[97.5%]' : upper confidence interval


This is a wrapper around the sklearn.linear_model.LogisticRegression class. Note that Pingouin automatically disables the l2 regularization applied by scikit-learn. This can be modified by changing the penalty argument.

The calculation of the p-values and confidence intervals is adapted from code found at

Note that the first coefficient is always the constant term (intercept) of the model. Scikit-learn automatically adds the intercept to your predictor(s) matrix; therefore, \(X\) should not include a constant term. Pingouin will remove any constant term (e.g. a column with only one unique value) or duplicate columns from \(X\).

Results have been compared against statsmodels, R, and JASP.
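The Wald-test machinery behind the summary table can be sketched as follows (an illustration of the standard calculation, not Pingouin's exact code; the numbers are the rounded intercept and `x1` values from the first example below):

```python
import numpy as np
from scipy.stats import norm

coef = np.array([-0.27, 0.07])  # fitted coefficients (intercept, x1)
se = np.array([0.37, 0.32])     # standard errors, i.e. the square roots of
                                # the diagonal of the inverse Fisher
                                # information matrix
alpha = 0.05

z = coef / se                    # Wald z-scores
pval = 2 * norm.sf(np.abs(z))    # two-tailed p-values
crit = norm.ppf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
ci_lower = coef - crit * se      # CI[2.5%]
ci_upper = coef + crit * se      # CI[97.5%]
```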


  1. Simple binary logistic regression

>>> import numpy as np
>>> from pingouin import logistic_regression
>>> np.random.seed(123)
>>> x = np.random.normal(size=30)
>>> y = np.random.randint(0, 2, size=30)
>>> lom = logistic_regression(x, y)
>>> lom.round(2)
       names  coef    se     z  pval  CI[2.5%]  CI[97.5%]
0  Intercept -0.27  0.37 -0.74  0.46     -1.00       0.45
1         x1  0.07  0.32  0.21  0.84     -0.55       0.68
  2. Multiple binary logistic regression

>>> np.random.seed(42)
>>> z = np.random.normal(size=30)
>>> X = np.column_stack((x, z))
>>> lom = logistic_regression(X, y)
>>> print(lom['coef'].values)
[-0.36736745 -0.04374684 -0.47829392]
  3. Using a Pandas DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame({'x': x, 'y': y, 'z': z})
>>> lom = logistic_regression(df[['x', 'z']], df['y'])
>>> print(lom['coef'].values)
[-0.36736745 -0.04374684 -0.47829392]
  4. Return only the coefficients

>>> logistic_regression(X, y, coef_only=True)
array([-0.36736745, -0.04374684, -0.47829392])
  5. Passing custom parameters to sklearn

>>> lom = logistic_regression(X, y, solver='sag', max_iter=10000,
...                           random_state=42)
>>> print(lom['coef'].values)
[-0.36751796 -0.04367056 -0.47841908]
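A common follow-up (not part of Pingouin's output) is to exponentiate the coefficients to obtain odds ratios, i.e. the multiplicative change in the odds of \(y = 1\) per unit increase of each predictor:

```python
import numpy as np

# Coefficients from the multiple regression example above.
coef = np.array([-0.36736745, -0.04374684, -0.47829392])
odds_ratios = np.exp(coef)
# Values below 1 indicate that the odds of y = 1 decrease as the
# corresponding predictor increases.
```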