pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, alternative='two-sided', method='pearson')

Partial and semi-partial correlation.

Parameters:

data : pandas.DataFrame

Input data. Note that this function can also be used directly as a pandas.DataFrame method, in which case this argument is not needed.

x, y : string

Names of the two columns in data to correlate.

covar : string or list

Covariate(s). Must be the name(s) of one or more columns in data. Use a list if there are two or more covariates.

x_covar : string or list

Covariate(s) for the x variable. This is used to compute semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Only one of covar, x_covar and y_covar can be specified.

y_covar : string or list

Covariate(s) for the y variable. This is used to compute semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Only one of covar, x_covar and y_covar can be specified.

alternative : string

Defines the alternative hypothesis, i.e. the tail of the test. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests against the alternative hypothesis that the partial correlation is positive (greater than zero); “less” tests against the alternative hypothesis that the partial correlation is negative (less than zero).

method : string

Correlation type:

'pearson': Pearson \(r\) product-moment correlation
'spearman': Spearman \(\rho\) rank-order correlation

Returns:

stats : pandas.DataFrame

'n': Sample size (after removal of missing values)
'r': Partial correlation coefficient
'CI95': 95% parametric confidence intervals around \(r\)
'p_val': p-value
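The parametric interval appears to be reproducible with the Fisher z-transformation, treating the partial correlation as an ordinary correlation computed on n - k observations (k = number of covariates), so that the standard error of atanh(r) is 1 / sqrt(n - k - 3). This is an assumption about the computation rather than a description of Pingouin's code, although it does give [0.25, 0.77] for the first example below (r = 0.568, n = 30, k = 1). The helper name is hypothetical; for one-sided alternatives the sketch uses the one-sided normal quantile and leaves the other end of the interval at +1 or -1.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, k, alternative="two-sided", confidence=0.95):
    """Parametric confidence interval for a (partial) correlation coefficient.

    Assumes atanh(r) is approximately normal with standard error
    1 / sqrt(n - k - 3), where k is the number of covariates.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - k - 3)
    alpha = 1 - confidence
    if alternative == "two-sided":
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        return math.tanh(z - crit * se), math.tanh(z + crit * se)
    crit = NormalDist().inv_cdf(1 - alpha)
    if alternative == "greater":
        # Only a lower bound is informative; the upper bound is +1
        return math.tanh(z - crit * se), 1.0
    if alternative == "less":
        # Only an upper bound is informative; the lower bound is -1
        return -1.0, math.tanh(z + crit * se)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
```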

See also

corr, pcorr, pairwise_corr, rm_corr

Notes

Partial correlation [1] measures the degree of association between x and y after removing the effect of one or more controlling variables (covar, or \(Z\)). In practice, this amounts to computing the correlation coefficient between the residuals of two linear regressions:

\[x \sim Z, y \sim Z\]

Like the correlation coefficient, the partial correlation coefficient takes values between -1 and 1, where 1 indicates a perfect positive association and -1 a perfect negative association.

The semi-partial correlation is similar to the partial correlation, except that the effect of the controlling variables is removed from only one of x or y, not from both.

Pingouin uses the method described in [2] to calculate the (semi-)partial correlation coefficients and associated p-values. This method is based on the inverse covariance matrix and is much faster than the traditional regression-based approach. Results have been validated against the ppcor R package.
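The identity behind the inverse-covariance approach is that, for the precision matrix P (the inverse of the covariance matrix of all variables), the partial correlation of variables i and j given all remaining variables is -P_ij / sqrt(P_ii P_jj). The plain-Python sketch below applies this identity to build a full matrix, similar in spirit to pcorr; it illustrates the identity under that assumption and is not Pingouin's own code. All function names are hypothetical, and the extra rescaling needed for semi-partial coefficients is not shown.

```python
import math


def covariance_matrix(columns):
    """Sample covariance matrix (denominator n - 1) of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [sum((a[t] - ma) * (b[t] - mb) for t in range(n)) / (n - 1) for b, mb in zip(columns, means)]
        for a, ma in zip(columns, means)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix assumed non-singular)."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(matrix)]
    for i in range(size):
        piv = max(range(i, size), key=lambda r: abs(aug[r][i]))
        aug[i], aug[piv] = aug[piv], aug[i]
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]
        for r in range(size):
            if r != i:
                factor = aug[r][i]
                aug[r] = [vr - factor * vi for vr, vi in zip(aug[r], aug[i])]
    return [row[size:] for row in aug]


def partial_corr_matrix(columns):
    """Partial correlation of every pair given all other columns: -P_ij / sqrt(P_ii * P_jj)."""
    prec = invert(covariance_matrix(columns))
    size = len(prec)
    return [
        [1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j]) for j in range(size)]
        for i in range(size)
    ]
```

A single matrix inversion yields every pairwise coefficient at once, instead of one pair of regressions per coefficient, which is the source of the speed advantage.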

Important

Rows with missing values are automatically removed from data.
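The removal is listwise: a row is discarded if any variable in the analysis is missing, so n in the output counts complete rows only. A minimal plain-Python sketch, assuming rows are dicts keyed by column name and that only the columns used in the call (x, y and the covariates) are checked; both the data layout and that scoping are assumptions of this illustration, and the function name is hypothetical.

```python
import math


def drop_incomplete_rows(rows, columns):
    """Listwise deletion: keep only rows whose values in `columns` are all present.

    A value counts as missing if it is None or a float NaN.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [row for row in rows if not any(is_missing(row[c]) for c in columns)]
```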

References

[1]

https://en.wikipedia.org/wiki/Partial_correlation

[2]

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681537/

Examples

Partial correlation with one covariate

>>> import pingouin as pg
>>> df = pg.read_dataset("partial_corr")
>>> pg.partial_corr(data=df, x="x", y="y", covar="cv1").round(3)
          n      r          CI95  p_val
pearson  30  0.568  [0.25, 0.77]  0.001

Spearman partial correlation with several covariates

>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(
...     data=df, x="x", y="y", covar=["cv1", "cv2", "cv3"], method="spearman"
... ).round(3)
           n      r          CI95  p_val
spearman  30  0.521  [0.18, 0.75]  0.005

Same, but with one-sided tests

>>> pg.partial_corr(
...     data=df,
...     x="x",
...     y="y",
...     covar=["cv1", "cv2", "cv3"],
...     alternative="greater",
...     method="spearman",
... ).round(3)
           n      r         CI95  p_val
spearman  30  0.521  [0.24, 1.0]  0.003

>>> pg.partial_corr(
...     data=df,
...     x="x",
...     y="y",
...     covar=["cv1", "cv2", "cv3"],
...     alternative="less",
...     method="spearman",
... ).round(3)
           n      r          CI95  p_val
spearman  30  0.521  [-1.0, 0.72]  0.997
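The one-sided p-values follow from the two-sided one. The sketch below assumes the usual t statistic for a partial correlation, t = r * sqrt(dof / (1 - r^2)) with dof = n - k - 2 for k covariates, and a null distribution symmetric around zero. The helper names are hypothetical, and the two-sided p-value itself (which requires a Student t survival function, e.g. from SciPy) is taken as an input rather than computed.

```python
import math


def partial_corr_t(r, n, k):
    """t statistic and degrees of freedom for a partial correlation r
    computed on n observations while controlling for k covariates (assumed formula)."""
    dof = n - k - 2
    t = r * math.sqrt(dof / (1 - r**2))
    return t, dof


def one_sided_p(p_two_sided, t, alternative):
    """Convert a two-sided p-value into a one-sided one, assuming a null
    distribution symmetric around zero (true for Student's t)."""
    half = p_two_sided / 2  # probability mass beyond |t| in a single tail
    if alternative == "greater":
        # H1: correlation > 0, so only a large positive t counts as evidence
        return half if t > 0 else 1 - half
    if alternative == "less":
        # H1: correlation < 0, so only a large negative t counts as evidence
        return half if t < 0 else 1 - half
    raise ValueError('alternative must be "greater" or "less"')
```

With a positive coefficient (r = 0.521 here), "greater" halves the two-sided p-value and "less" returns one minus that half, which is the pattern seen in the two outputs above (0.003 and 0.997).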

As a pandas method

>>> df.partial_corr(x="x", y="y", covar=["cv1"], method="spearman").round(3)
           n      r          CI95  p_val
spearman  30  0.578  [0.27, 0.78]  0.001

Partial correlation matrix (returns only the correlation coefficients)

>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000

Semi-partial correlation on x

>>> pg.partial_corr(data=df, x="x", y="y", x_covar=["cv1", "cv2", "cv3"]).round(3)
          n      r         CI95  p_val
pearson  30  0.463  [0.1, 0.72]  0.015