pingouin.mediation_analysis

pingouin.mediation_analysis(data=None, x=None, m=None, y=None, covar=None, alpha=0.05, n_boot=500, seed=None, return_dist=False)

Mediation analysis using a bias-corrected non-parametric bootstrap method.

Parameters
data : pd.DataFrame

Dataframe containing the variables.

x : str

Column name in data containing the predictor variable. The predictor variable must be continuous.

m : str or list of str

Column name(s) in data containing the mediator variable(s). The mediator(s) can be continuous or binary (e.g. 0 or 1). This function supports multiple parallel mediators.

y : str

Column name in data containing the outcome variable. The outcome variable must be continuous.

covar : None, str, or list

Covariate(s). If not None, the specified covariate(s) will be included in all regressions.

alpha : float

Significance threshold. Used to determine the confidence interval, \(\text{CI} = [\alpha / 2 ; 1 - \alpha / 2]\).

n_boot : int

Number of bootstrap iterations used to estimate the confidence intervals and p-values. Larger values take longer to run.

seed : int or None

Random state seed.

return_dist : bool

If True, the function also returns the bootstrapped beta samples of the indirect effect (size = n_boot). These can be plotted, for instance, with seaborn.histplot() or seaborn.kdeplot().

Returns
stats : pd.DataFrame

Mediation summary:

'path' : regression model
'coef' : regression estimates
'se' : standard error
'CI[2.5%]' : lower bound of the confidence interval
'CI[97.5%]' : upper bound of the confidence interval
'pval' : two-sided p-values
'sig' : statistical significance

Notes

Mediation analysis is a “statistical procedure to test whether the effect of an independent variable X on a dependent variable Y (i.e., X → Y) is at least partly explained by a chain of effects of the independent variable on an intervening mediator variable M and of the intervening variable on the dependent variable (i.e., X → M → Y)” (from Fiedler et al. 2011).

The indirect effect (also referred to as the average causal mediation effect, or ACME) of X on Y through mediator M quantifies the estimated difference in Y resulting from a one-unit change in X through a sequence of causal steps in which X affects M, which in turn affects Y. It is considered significant if the specified confidence interval does not include 0. The path 'X → Y' is the sum of the indirect and direct effects and is sometimes referred to as the total effect. For more details, please refer to Fiedler et al. 2011 or Hayes and Rockwood 2017.
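The relation between total, direct and indirect effects can be checked numerically. The following is a minimal pure-Python sketch on simulated data, not pingouin's implementation: it assumes a single continuous mediator and no covariates, and the helper names (ols_slope, ols_two_predictors) and the simulated coefficients are hypothetical. Here a is the slope of M ~ X, while b and the direct effect are the coefficients of M and X in the regression Y ~ X + M.

```python
import random


def ols_slope(x, y):
    """Slope of y ~ x (with intercept) by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ols_two_predictors(x, m, y):
    """Slopes (b_x, b_m) of y ~ x + m (with intercept), via the 2x2 normal equations."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(u * u for u in xc)
    smm = sum(u * u for u in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    det = sxx * smm - sxm ** 2
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m


# Simulated data (assumed coefficients, for illustration only).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.7 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

a = ols_slope(x, m)                    # path X -> M
direct, b = ols_two_predictors(x, m, y)  # direct effect of X, and path M -> Y given X
total = ols_slope(x, y)                # total effect of X on Y
indirect = a * b                       # indirect effect through M

# With linear models throughout, total = direct + indirect holds exactly
# (up to floating-point error).
print(total, direct + indirect)
```

The same relation is visible in the first example of the Examples section: Total minus Direct is 0.3961 - 0.0396 = 0.3565, which is the value reported in the Indirect row.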

A linear regression is used if the mediator variable is continuous and a logistic regression if the mediator variable is dichotomous (binary). Note that this function also supports parallel multiple mediators: “in such models, mediators may be and often are correlated, but nothing in the model allows one mediator to causally influence another.” (Hayes and Rockwood 2017)
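As an illustration of this rule, the sketch below picks a model type from the observed mediator values. It is an assumption about how such a check could be written, not pingouin's actual detection logic; is_binary and pick_model are hypothetical names.

```python
def is_binary(values):
    """True if the values take exactly two distinct levels (e.g. 0 and 1)."""
    return len(set(values)) == 2


def pick_model(values):
    """Return the regression type suggested by the mediator's values."""
    return "logistic" if is_binary(values) else "linear"


print(pick_model([0, 1, 1, 0, 1]))        # dichotomous mediator
print(pick_model([0.3, 1.2, -0.5, 2.1]))  # continuous mediator
```

A check of this kind would also flag a continuous variable that happens to take only two distinct values, so the mediator's coding is worth verifying before running the analysis.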

This function will only work well if the outcome variable is continuous. It does not support binary or ordinal outcome variables. For more advanced mediation models, please refer to the lavaan or mediation R packages, or the PROCESS macro for SPSS.

The two-sided p-value of the indirect effect is computed using the bootstrap distribution, as in the mediation R package. However, the p-value should be interpreted with caution since it a) is not constructed conditional on a true null hypothesis (see Hayes and Rockwood 2017) and b) varies with the number of bootstrap samples and the random seed.
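To make these caveats concrete, here is a minimal pure-Python sketch of a bootstrap interval and p-value for the indirect effect. It is not pingouin's code: it uses a plain percentile interval rather than a bias-corrected one, it assumes a single continuous mediator without covariates, and the p-value formula (twice the smaller proportion of bootstrap samples on either side of zero) is an assumed construction. The names indirect_effect and bootstrap_indirect are hypothetical.

```python
import random


def indirect_effect(x, m, y):
    """a * b, with a the slope of M ~ X and b the M coefficient in Y ~ X + M."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_indirect(x, m, y, n_boot=500, alpha=0.05, seed=None):
    """Return (sorted bootstrap samples, percentile CI, two-sided p-value)."""
    rng = random.Random(seed)
    n = len(x)
    dist = []
    for _ in range(n_boot):
        # Resample rows with replacement, keeping (x, m, y) triplets together.
        idx = [rng.randrange(n) for _ in range(n)]
        dist.append(indirect_effect([x[i] for i in idx],
                                    [m[i] for i in idx],
                                    [y[i] for i in idx]))
    dist.sort()
    # Plain percentile interval [alpha / 2, 1 - alpha / 2]; no bias correction.
    lower = dist[int((alpha / 2) * n_boot)]
    upper = dist[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    # Two-sided p-value: twice the smaller tail proportion around zero.
    frac_pos = sum(d > 0 for d in dist) / n_boot
    frac_neg = sum(d < 0 for d in dist) / n_boot
    pval = min(1.0, 2 * min(frac_pos, frac_neg))
    return dist, (lower, upper), pval


# Simulated data (assumed coefficients, for illustration only).
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]
y = [0.7 * mi + data_rng.gauss(0, 1) for mi in m]

dist, ci, pval = bootstrap_indirect(x, m, y, n_boot=500, seed=42)
print(ci, pval)
```

Under this construction the p-value can only take multiples of 2 / n_boot, so it is exactly 0 whenever every bootstrap sample falls on the same side of zero, and the smallest non-zero value shrinks as n_boot grows. Changing the seed changes the resampled rows and therefore the interval and the p-value.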

Note that rows with NaN are automatically removed.

Results have been tested against the R mediation package and this tutorial: https://data.library.virginia.edu/introduction-to-mediation-analysis/

References

[1] Baron, R. M. & Kenny, D. A. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51, 1173–1182 (1986).

[2] Fiedler, K., Schott, M. & Meiser, T. What mediation analysis can (not) do. J. Exp. Soc. Psychol. 47, 1231–1236 (2011).

[3] Hayes, A. F. & Rockwood, N. J. Regression-based statistical mediation and moderation analysis in clinical research: Observations, recommendations, and implementation. Behav. Res. Ther. 98, 39–57 (2017).

[4] https://cran.r-project.org/web/packages/mediation/mediation.pdf

[5] http://lavaan.ugent.be/tutorial/mediation.html

[6] https://github.com/rmill040/pymediation

Examples

  1. Simple mediation analysis

>>> from pingouin import mediation_analysis, read_dataset
>>> df = read_dataset('mediation')
>>> mediation_analysis(data=df, x='X', m='M', y='Y', alpha=0.05, seed=42)
       path    coef      se          pval  CI[2.5%]  CI[97.5%]  sig
0     M ~ X  0.5610  0.0945  4.391362e-08    0.3735     0.7485  Yes
1     Y ~ M  0.6542  0.0858  1.612674e-11    0.4838     0.8245  Yes
2     Total  0.3961  0.1112  5.671128e-04    0.1755     0.6167  Yes
3    Direct  0.0396  0.1096  7.187429e-01   -0.1780     0.2572   No
4  Indirect  0.3565  0.0833  0.000000e+00    0.2198     0.5377  Yes

  2. Return the indirect bootstrapped beta coefficients

>>> stats, dist = mediation_analysis(data=df, x='X', m='M', y='Y',
...                                  return_dist=True)
>>> print(dist.shape)
(500,)

  3. Mediation analysis with a binary mediator variable

>>> mediation_analysis(data=df, x='X', m='Mbin', y='Y', seed=42)
       path    coef      se      pval  CI[2.5%]  CI[97.5%]  sig
0  Mbin ~ X -0.0205  0.1159  0.859392   -0.2476     0.2066   No
1  Y ~ Mbin -0.1354  0.4118  0.743076   -0.9525     0.6818   No
2     Total  0.3961  0.1112  0.000567    0.1755     0.6167  Yes
3    Direct  0.3956  0.1117  0.000614    0.1739     0.6173  Yes
4  Indirect  0.0023  0.0495  0.960000   -0.0715     0.1441   No

  4. Mediation analysis with covariates

>>> mediation_analysis(data=df, x='X', m='M', y='Y',
...                    covar=['Mbin', 'Ybin'], seed=42)
       path    coef      se          pval  CI[2.5%]  CI[97.5%]  sig
0     M ~ X  0.5594  0.0968  9.394635e-08    0.3672     0.7516  Yes
1     Y ~ M  0.6660  0.0861  1.017261e-11    0.4951     0.8368  Yes
2     Total  0.4204  0.1129  3.324252e-04    0.1962     0.6446  Yes
3    Direct  0.0645  0.1104  5.608583e-01   -0.1548     0.2837   No
4  Indirect  0.3559  0.0865  0.000000e+00    0.2093     0.5530  Yes

  5. Mediation analysis with multiple parallel mediators

>>> mediation_analysis(data=df, x='X', m=['M', 'Mbin'], y='Y', seed=42)
            path    coef      se          pval  CI[2.5%]  CI[97.5%]  sig
0          M ~ X  0.5610  0.0945  4.391362e-08    0.3735     0.7485  Yes
1       Mbin ~ X -0.0051  0.0290  8.592408e-01   -0.0626     0.0523   No
2          Y ~ M  0.6537  0.0863  2.118163e-11    0.4824     0.8250  Yes
3       Y ~ Mbin -0.0640  0.3282  8.456998e-01   -0.7154     0.5873   No
4          Total  0.3961  0.1112  5.671128e-04    0.1755     0.6167  Yes
5         Direct  0.0395  0.1102  7.206301e-01   -0.1792     0.2583   No
6     Indirect M  0.3563  0.0845  0.000000e+00    0.2148     0.5385  Yes
7  Indirect Mbin  0.0003  0.0097  9.520000e-01   -0.0172     0.0252   No