pingouin.logistic_regression

pingouin.logistic_regression(X, y, coef_only=False, alpha=0.05, as_dataframe=True, remove_na=False, **kwargs)

(Multiple) Binary logistic regression.

Parameters
X : np.array or list

Predictor(s). Shape = (n_samples, n_features) or (n_samples,).

y : np.array or list

Dependent variable. Shape = (n_samples,). Must be binary.

coef_only : bool

If True, return only the regression coefficients.

alpha : float

Alpha level for the confidence intervals. The reported interval spans the [alpha / 2, 1 - alpha / 2] quantiles.

as_dataframe : bool

If True, returns a pandas DataFrame. If False, returns a dictionary.

remove_na : bool

If True, apply a listwise deletion of missing values (i.e. the entire row is removed).

**kwargs : optional

Optional arguments passed to sklearn.linear_model.LogisticRegression.

Returns
stats : dataframe or dict

Logistic regression summary:

'names' : name of variable(s) in the model (e.g. x1, x2...)
'coef' : regression coefficients
'se' : standard error
'z' : z-scores
'pval' : two-tailed p-values
'CI[2.5%]' : lower confidence interval
'CI[97.5%]' : upper confidence interval
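The interval columns follow the standard Wald construction: coefficient ± normal critical value × standard error. A minimal sketch of that arithmetic (an illustration of the normal approximation, not pingouin's actual code), using only the standard library:

```python
from statistics import NormalDist

def wald_ci(coef, se, alpha=0.05):
    """Two-sided Wald confidence interval for a single coefficient."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    return coef - crit * se, coef + crit * se

# Rounded intercept values from the first example below
lo, hi = wald_ci(-0.27, 0.37)
```

With alpha=0.05 the bounds land in the CI[2.5%] and CI[97.5%] columns of the summary.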

Notes

This is a wrapper around the sklearn.linear_model.LogisticRegression class.

Results have been compared against statsmodels and JASP.

Note that the first coefficient is always the constant term (intercept) of the model.

This function will not run if NaN values are present in either the target or the predictor variables. Remove them beforehand, or set remove_na=True to drop them automatically.

Adapted from code found at https://gist.github.com/rspeare/77061e6e317896be29c6de9a85db301d
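In that approach, the standard errors and z-scores come from the inverse of the Hessian (observed Fisher information) of the log-likelihood at the fitted coefficients. A minimal NumPy sketch of the idea, fitting by Newton-Raphson rather than calling sklearn (an illustration, not pingouin's actual implementation):

```python
import numpy as np

def logit_fit_with_se(X, y, n_iter=50):
    """Fit intercept + coefficients by Newton-Raphson; return (coef, se, z)."""
    X = np.column_stack((np.ones(len(X)), X))    # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        W = p * (1.0 - p)                        # weights of the information matrix
        hessian = X.T @ (W[:, None] * X)         # observed Fisher information
        grad = X.T @ (y - p)                     # score (gradient of log-likelihood)
        beta += np.linalg.solve(hessian, grad)   # Newton-Raphson step
    se = np.sqrt(np.diag(np.linalg.inv(hessian)))  # SEs from inverse Hessian
    return beta, se, beta / se

# Same simulated data as the first example below
np.random.seed(123)
x = np.random.normal(size=30)
y = np.random.randint(0, 2, size=30)
coef, se, z = logit_fit_with_se(x, y)
```

Since this computes the unpenalized maximum-likelihood solution, the coefficients should agree with the first example's summary table.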

Examples

  1. Simple binary logistic regression

>>> import numpy as np
>>> from pingouin import logistic_regression
>>> np.random.seed(123)
>>> x = np.random.normal(size=30)
>>> y = np.random.randint(0, 2, size=30)
>>> lom = logistic_regression(x, y)
>>> lom.round(2)
       names  coef    se     z  pval  CI[2.5%]  CI[97.5%]
0  Intercept -0.27  0.37 -0.73  0.46     -0.99       0.45
1         x1  0.06  0.32  0.19  0.85     -0.56       0.68
  2. Multiple binary logistic regression

>>> np.random.seed(42)
>>> z = np.random.normal(size=30)
>>> X = np.column_stack((x, z))
>>> lom = logistic_regression(X, y)
>>> print(lom['coef'].values)
[-0.34933805 -0.0226106  -0.39453532]
  3. Using a Pandas DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame({'x': x, 'y': y, 'z': z})
>>> lom = logistic_regression(df[['x', 'z']], df['y'])
>>> print(lom['coef'].values)
[-0.34933805 -0.0226106  -0.39453532]
  4. Return only the coefficients

>>> logistic_regression(X, y, coef_only=True)
array([-0.34933805, -0.0226106 , -0.39453532])
  5. Passing custom parameters to sklearn

>>> lom = logistic_regression(X, y, solver='sag', max_iter=10000)
>>> print(lom['coef'].values)
[-0.34941889 -0.02261911 -0.39451064]