pingouin.mediation_analysis

pingouin.mediation_analysis(data=None, x=None, m=None, y=None, covar=None, alpha=0.05, n_boot=500, seed=None, return_dist=False)
Mediation analysis using a bias-corrected non-parametric bootstrap method.
 Parameters
 data : pandas.DataFrame
Dataframe.
 x : str
Column name in data containing the predictor variable. The predictor variable must be continuous.
 m : str or list of str
Column name(s) in data containing the mediator variable(s). The mediator(s) can be continuous or binary (e.g. 0 or 1). This function supports multiple parallel mediators.
 y : str
Column name in data containing the outcome variable. The outcome variable must be continuous.
 covar : None, str, or list
Covariate(s). If not None, the specified covariate(s) will be included in all regressions.
 alpha : float
Significance threshold. Used to determine the confidence interval, \(\text{CI} = [\alpha / 2 ; 1 - \alpha / 2]\).
 n_boot : int
Number of bootstrap iterations for confidence interval and p-value estimation. The greater, the slower.
 seed : int or None
Random state seed.
 return_dist : bool
If True, the function also returns the indirect bootstrapped beta samples (size = n_boot). These can be plotted, for instance, with the seaborn.distplot() or seaborn.kdeplot() functions.
 Returns
 stats : pandas.DataFrame
Mediation summary:
'path': regression model
'coef': regression estimates
'se': standard error
'CI[2.5%]': lower confidence interval
'CI[97.5%]': upper confidence interval
'pval': two-sided p-values
'sig': statistical significance
Notes
Mediation analysis [1] is a “statistical procedure to test whether the effect of an independent variable X on a dependent variable Y (i.e., X → Y) is at least partly explained by a chain of effects of the independent variable on an intervening mediator variable M and of the intervening variable on the dependent variable (i.e., X → M → Y)” [2].
The indirect effect (also referred to as the average causal mediation effect, or ACME) of X on Y through mediator M quantifies the estimated difference in Y resulting from a one-unit change in X through a sequence of causal steps in which X affects M, which in turn affects Y. It is considered significant if the specified confidence interval does not include 0. The path 'X → Y' is the sum of the indirect and direct effects. It is sometimes referred to as the total effect.
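The decomposition of the total effect into direct and indirect parts can be checked numerically. The block below is a minimal sketch, assuming a single continuous mediator and ordinary least squares throughout; it uses the textbook product-of-coefficients form (indirect = a × b) on simulated data and is not taken from the function's source. The helper `centered_cross` and the simulated coefficients are illustrative assumptions.

```python
import random


def centered_cross(u, v):
    """Sum of cross-products of two equal-length lists after mean-centering."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


# Simulated data (hypothetical effect sizes, chosen only for illustration)
rng = random.Random(0)
n = 200
X = [rng.gauss(0, 1) for _ in range(n)]
M = [0.5 * x + rng.gauss(0, 1) for x in X]
Y = [0.6 * m + 0.1 * x + rng.gauss(0, 1) for x, m in zip(X, M)]

Sxx, Smm = centered_cross(X, X), centered_cross(M, M)
Sxm, Sxy, Smy = centered_cross(X, M), centered_cross(X, Y), centered_cross(M, Y)

a = Sxm / Sxx        # slope of M ~ X
total = Sxy / Sxx    # slope of Y ~ X (total effect)

# Two-predictor OLS for Y ~ X + M, solved from the 2x2 normal equations
det = Sxx * Smm - Sxm ** 2
direct = (Sxy * Smm - Sxm * Smy) / det  # coefficient of X (direct effect)
b = (Sxx * Smy - Sxm * Sxy) / det       # coefficient of M, adjusted for X

indirect = a * b
print(f"total={total:.4f}  direct={direct:.4f}  indirect={indirect:.4f}")
print(f"direct + indirect = {direct + indirect:.4f}")
```

With OLS and one continuous mediator the identity total = direct + indirect holds exactly, up to floating-point error. The first example in the Examples section shows the same relationship: Total (0.396126) minus Direct (0.039604) equals Indirect (0.356522).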
A linear regression is used if the mediator variable is continuous and a logistic regression if the mediator variable is dichotomous (binary). Multiple parallel mediators are also supported.
This function will only work well if the outcome variable is continuous. It does not support binary or ordinal outcome variables. For more advanced mediation models, please refer to the lavaan or mediation R packages, or the PROCESS macro for SPSS.
The two-sided p-value of the indirect effect is computed using the bootstrap distribution, as in the mediation R package. However, the p-value should be interpreted with caution since it is not constructed conditioned on a true null hypothesis [3] and varies depending on the number of bootstrap samples and the random seed.
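To make the caveat concrete, here is a minimal sketch of one common way to derive a two-sided p-value from a bootstrap distribution: twice the smaller of the proportions of samples above and below zero, capped at 1. The function name `bootstrap_pvalue` is hypothetical, and this construction is an assumption for illustration rather than a copy of the library's code.

```python
import random


def bootstrap_pvalue(boot):
    """Two-sided p-value from the share of bootstrap samples on each side of 0."""
    n = len(boot)
    n_pos = sum(b > 0 for b in boot)
    n_neg = sum(b < 0 for b in boot)
    return min(1.0, 2 * min(n_pos, n_neg) / n)


# Every sample on the same side of zero gives exactly 0
print(bootstrap_pvalue([0.1, 0.2, 0.3]))  # 0.0

# The result changes with the random seed (simulated stand-ins for real
# bootstrap samples; the mean and spread are arbitrary)
for seed in (1, 2, 3):
    rng = random.Random(seed)
    boot = [rng.gauss(0.1, 0.08) for _ in range(500)]
    print(seed, bootstrap_pvalue(boot))
```

Under this construction the p-value can only take multiples of 2 / n_boot, so with n_boot=500 the smallest nonzero value is 0.004. A reported p-value of exactly 0 then means only that no bootstrap sample fell on the other side of zero.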
Note that rows with missing values are automatically removed.
Results have been tested against the R mediation package and this tutorial: https://data.library.virginia.edu/introduction-to-mediation-analysis/
References
[1] Baron, R. M. & Kenny, D. A. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).
[2] Fiedler, K., Schott, M. & Meiser, T. What mediation analysis can (not) do. J. Exp. Soc. Psychol. 47, 1231–1236 (2011).
[3] Hayes, A. F. & Rockwood, N. J. Regression-based statistical mediation and moderation analysis in clinical research: Observations, recommendations, and implementation. Behav. Res. Ther. 98, 39–57 (2017).
Code originally adapted from https://github.com/rmill040/pymediation.
Examples
Simple mediation analysis
>>> from pingouin import mediation_analysis, read_dataset
>>> df = read_dataset('mediation')
>>> mediation_analysis(data=df, x='X', m='M', y='Y', alpha=0.05,
...                    seed=42)
       path      coef        se          pval  CI[2.5%]  CI[97.5%]  sig
0     M ~ X  0.561015  0.094480  4.391362e-08  0.373522   0.748509  Yes
1     Y ~ M  0.654173  0.085831  1.612674e-11  0.483844   0.824501  Yes
2     Total  0.396126  0.111160  5.671128e-04  0.175533   0.616719  Yes
3    Direct  0.039604  0.109648  7.187429e-01 -0.178018   0.257226   No
4  Indirect  0.356522  0.083313  0.000000e+00  0.219818   0.537654  Yes
Return the indirect bootstrapped beta coefficients
>>> stats, dist = mediation_analysis(data=df, x='X', m='M', y='Y',
...                                  return_dist=True)
>>> print(dist.shape)
(500,)
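Once the bootstrapped samples are available, a bias-corrected confidence interval can be derived from them. The following is a minimal sketch of the standard bias-corrected percentile construction, assuming only the Python standard library; `bias_corrected_ci` and `quantile` are hypothetical helpers, the clamping of the proportion is my own safeguard, and the simulated `boot` list merely stands in for the returned distribution.

```python
import random
from statistics import NormalDist


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted values, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bias_corrected_ci(boot, estimate, alpha=0.05):
    """Bias-corrected percentile interval around a point estimate."""
    nd = NormalDist()
    n = len(boot)
    # Share of bootstrap samples below the point estimate, clamped away
    # from 0 and 1 so the inverse CDF stays finite (an assumption here)
    prop = sum(b < estimate for b in boot) / n
    prop = min(max(prop, 0.5 / n), 1 - 0.5 / n)
    z0 = nd.inv_cdf(prop)  # bias-correction term
    # Shift the nominal alpha/2 and 1 - alpha/2 levels by 2 * z0
    q_lo = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    q_hi = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    ordered = sorted(boot)
    return quantile(ordered, q_lo), quantile(ordered, q_hi)


# Simulated stand-in for the bootstrapped indirect effects
rng = random.Random(42)
boot = [rng.gauss(0.35, 0.08) for _ in range(500)]
estimate = 0.35  # hypothetical point estimate of the indirect effect

lower, upper = bias_corrected_ci(boot, estimate)
print(f"95% bias-corrected CI: [{lower:.3f}, {upper:.3f}]")
```

When exactly half of the bootstrap samples fall below the point estimate, the correction term is zero and the interval reduces to the plain percentile interval.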
Mediation analysis with a binary mediator variable
>>> mediation_analysis(data=df, x='X', m='Mbin', y='Y', seed=42).round(3)
       path   coef     se   pval  CI[2.5%]  CI[97.5%]  sig
0  Mbin ~ X -0.021  0.116  0.857    -0.248      0.206   No
1  Y ~ Mbin -0.135  0.412  0.743    -0.952      0.682   No
2     Total  0.396  0.111  0.001     0.176      0.617  Yes
3    Direct  0.396  0.112  0.001     0.174      0.617  Yes
4  Indirect  0.002  0.050  0.960    -0.072      0.146   No
Mediation analysis with covariates
>>> mediation_analysis(data=df, x='X', m='M', y='Y',
...                    covar=['Mbin', 'Ybin'], seed=42).round(3)
       path   coef     se   pval  CI[2.5%]  CI[97.5%]  sig
0     M ~ X  0.559  0.097  0.000     0.367      0.752  Yes
1     Y ~ M  0.666  0.086  0.000     0.495      0.837  Yes
2     Total  0.420  0.113  0.000     0.196      0.645  Yes
3    Direct  0.064  0.110  0.561    -0.155      0.284   No
4  Indirect  0.356  0.086  0.000     0.209      0.553  Yes
Mediation analysis with multiple parallel mediators
>>> mediation_analysis(data=df, x='X', m=['M', 'Mbin'], y='Y',
...                    seed=42).round(3)
            path   coef     se   pval  CI[2.5%]  CI[97.5%]  sig
0          M ~ X  0.561  0.094  0.000     0.374      0.749  Yes
1       Mbin ~ X -0.005  0.029  0.859    -0.063      0.052   No
2          Y ~ M  0.654  0.086  0.000     0.482      0.825  Yes
3       Y ~ Mbin -0.064  0.328  0.846    -0.715      0.587   No
4          Total  0.396  0.111  0.001     0.176      0.617  Yes
5         Direct  0.040  0.110  0.721    -0.179      0.258   No
6     Indirect M  0.356  0.085  0.000     0.215      0.538  Yes
7  Indirect Mbin  0.000  0.010  0.952    -0.017      0.025   No