pingouin.partial_corr

pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, alternative='two-sided', method='pearson')
Partial and semi-partial correlation.
Parameters
data : pandas.DataFrame
    Pandas DataFrame. Note that this function can also be used directly as a pandas.DataFrame method, in which case this argument is no longer needed.
x, y : string
    Names of the two columns in data to correlate.
covar : string or list
    Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.
x_covar : string or list
    Covariate(s) for the x variable. This is used to compute the semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Only one of covar, x_covar and y_covar can be specified.
y_covar : string or list
    Covariate(s) for the y variable. This is used to compute the semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Only one of covar, x_covar and y_covar can be specified.
alternative : string
    Defines the alternative hypothesis, or tail of the partial correlation. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests against the alternative hypothesis that the partial correlation is positive (greater than zero), “less” tests against the hypothesis that the partial correlation is negative.
method : string
    Correlation type:
    'pearson': Pearson \(r\) product-moment correlation
    'spearman': Spearman \(\rho\) rank-order correlation
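The method parameter switches between Pearson and Spearman partial correlation. A common way to obtain the Spearman version is to rank each variable and compute the Pearson partial correlation of the ranks. The sketch below assumes that approach; spearman_partial_corr is a hypothetical helper, not part of pingouin, and it reads the partial correlation off the inverse covariance matrix of the ranks.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_partial_corr(x, y, Z):
    """Spearman partial correlation of x and y controlling for Z.

    Hypothetical helper: ranks every variable (ties get average ranks) and
    returns the Pearson partial correlation of the ranks.
    """
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # a single covariate becomes one column
    data = np.column_stack([np.asarray(x, dtype=float), np.asarray(y, dtype=float), Z])
    ranks = rankdata(data, axis=0)  # rank each column separately
    P = np.linalg.inv(np.cov(ranks, rowvar=False))  # precision matrix of the ranks
    # Partial correlation of column 0 (x) and column 1 (y) given all other columns
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


rng = np.random.default_rng(0)
z = rng.normal(size=100)
x = np.exp(z + rng.normal(size=100))  # monotone transform: ranks are unaffected
y = z + rng.normal(size=100)
print(spearman_partial_corr(x, y, z))
```

Because only ranks enter the computation, any monotone transform of x, y or the covariates leaves the result unchanged.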
Returns
stats : pandas.DataFrame
    'n': Sample size (after removal of missing values)
    'r': Partial correlation coefficient
    'CI95%': 95% parametric confidence intervals around \(r\)
    'p-val': p-value
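The p-value and confidence interval can be reproduced with standard formulas. The sketch below is an assumption, not pingouin's internal code: it uses a t test with \(n - k - 2\) degrees of freedom for the p-value and a Fisher z interval with standard error \(1/\sqrt{n - k - 3}\) for the CI, where \(k\) is the number of covariates. partial_corr_pval_ci is a hypothetical helper. With \(r = 0.568\), \(n = 30\) and \(k = 1\) it gives a p-value between 0.001 and 0.002 and an interval that rounds to [0.25, 0.77], matching the first example in the Examples section.

```python
import numpy as np
from scipy import stats


def partial_corr_pval_ci(r, n, k, confidence=0.95):
    """Two-sided p-value and confidence interval for a partial correlation.

    Hypothetical helper. r is the partial correlation, n the sample size and
    k the number of covariates. Assumes a t test with n - k - 2 degrees of
    freedom and a Fisher z interval with standard error 1 / sqrt(n - k - 3).
    """
    dof = n - k - 2
    tval = r * np.sqrt(dof / (1 - r ** 2))
    pval = 2 * stats.t.sf(abs(tval), dof)
    z = np.arctanh(r)  # Fisher z transform
    se = 1 / np.sqrt(n - k - 3)
    crit = stats.norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    ci = (np.tanh(z - crit * se), np.tanh(z + crit * se))
    return pval, ci


pval, (lower, upper) = partial_corr_pval_ci(0.568, n=30, k=1)
print(round(pval, 3), round(lower, 2), round(upper, 2))
```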
See also
Notes
Partial correlation [1] measures the degree of association between x and y after removing the effect of one or more controlling variables (covar, or \(Z\)). In practice, this is achieved by calculating the correlation coefficient between the residuals of two linear regressions:
\[x \sim Z, \quad y \sim Z\]
Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from -1 to 1, where 1 indicates a perfect positive association and -1 a perfect negative association.
The semi-partial correlation is similar to the partial correlation, with the exception that the set of controlling variables is only removed from either x or y, but not both.

Pingouin uses the method described in [2] to calculate the (semi-)partial correlation coefficients and associated p-values. This method is based on the inverse covariance matrix and is significantly faster than the traditional regression-based method. Results have been tested against the ppcor R package.
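The inverse covariance approach mentioned above can be sketched in a few lines: if \(P = \Sigma^{-1}\) is the precision matrix, the partial correlation of variables \(i\) and \(j\) given all the others is \(-P_{ij} / \sqrt{P_{ii} P_{jj}}\). The helper pcorr_matrix is hypothetical and only illustrates this formula; it uses a pseudo-inverse (an assumption) so that a singular covariance matrix does not raise the error np.linalg.inv would.

```python
import numpy as np


def pcorr_matrix(X):
    """Partial correlation matrix of the columns of X (n_samples x n_vars).

    Hypothetical helper. Entry (i, j) is the correlation between columns i
    and j after controlling for all other columns, read off the precision
    (inverse covariance) matrix P as -P_ij / sqrt(P_ii * P_jj).
    """
    V = np.cov(np.asarray(X, dtype=float), rowvar=False)  # sample covariance
    P = np.linalg.pinv(V)  # pseudo-inverse of the covariance matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)  # a variable is perfectly correlated with itself
    return pcor
```

A single matrix inversion yields every pairwise partial correlation at once, whereas the regression route needs two fits per pair.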
Important
Rows with missing values are automatically removed from data.
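Removing rows with missing values amounts to listwise deletion. This pandas sketch assumes that only the columns passed as x, y and covariates are checked, so missing values in unrelated columns do not drop a row; complete_cases is a hypothetical helper. The reported n would then be the number of rows that remain.

```python
import pandas as pd


def complete_cases(data, x, y, covar):
    """Listwise deletion restricted to the analysed columns.

    Hypothetical helper: keeps only the rows with no missing value in x, y
    or any covariate. Missing values in other columns are ignored.
    """
    covar = [covar] if isinstance(covar, str) else list(covar)
    return data[[x, y] + covar].dropna()


df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 1.0, None, 3.0],
    "cv1": [0.5, 0.1, 0.3, 0.9],
    "notes": [None, None, None, None],  # unrelated column, all missing
})
print(len(complete_cases(df, "x", "y", "cv1")))  # 3 rows remain
```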
References
Examples
Partial correlation with one covariate
>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
          n      r         CI95%  p-val
pearson  30  0.568  [0.25, 0.77]  0.001
Spearman partial correlation with several covariates
>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [0.18, 0.75]  0.005
Same, but with one-sided tests
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="greater", method='spearman').round(3)
           n      r        CI95%  p-val
spearman  30  0.521  [0.24, 1.0]  0.003
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="less", method='spearman').round(3)
           n      r          CI95%  p-val
spearman  30  0.521  [-1.0, 0.72]  0.997
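The one-sided results differ from the two-sided ones only in the tail used. The sketch below assumes a t test with \(n - k - 2\) degrees of freedom and a Fisher z interval with standard error \(1/\sqrt{n - k - 3}\), applied to the Spearman coefficient as well, with the interval extending to 1 or -1 on the side the alternative does not test; one_sided_pval_ci is a hypothetical name. With \(r = 0.521\), \(n = 30\) and \(k = 3\) it gives intervals that round to [0.24, 1.0] and [-1.0, 0.72], and p-values that round to 0.003 and 0.997.

```python
import numpy as np
from scipy import stats


def one_sided_pval_ci(r, n, k, alternative, confidence=0.95):
    """One-sided p-value and confidence interval for a partial correlation.

    Hypothetical helper. Assumes a t test with n - k - 2 degrees of freedom
    and a Fisher z interval with standard error 1 / sqrt(n - k - 3).
    """
    dof = n - k - 2
    tval = r * np.sqrt(dof / (1 - r ** 2))
    z = np.arctanh(r)  # Fisher z transform
    se = 1 / np.sqrt(n - k - 3)
    crit = stats.norm.ppf(confidence)  # one-sided critical value, about 1.645
    if alternative == "greater":
        # H1: partial correlation > 0, so the interval extends to +1
        return stats.t.sf(tval, dof), (np.tanh(z - crit * se), 1.0)
    if alternative == "less":
        # H1: partial correlation < 0, so the interval extends to -1
        return stats.t.cdf(tval, dof), (-1.0, np.tanh(z + crit * se))
    raise ValueError("alternative must be 'greater' or 'less'")
```

The two one-sided p-values always sum to 1, which is why the “less” test on a positive coefficient returns a p-value close to 1.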
As a pandas method
>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.578  [0.27, 0.78]  0.001
Partial correlation matrix (returns only the correlation coefficients)
>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493  0.095  0.130  0.385
y    0.493  1.000  0.007  0.104  0.002
cv1  0.095  0.007  1.000  0.241  0.470
cv2  0.130  0.104  0.241  1.000  0.118
cv3  0.385  0.002  0.470  0.118  1.000
Semi-partial correlation on x
>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3']).round(3)
          n      r        CI95%  p-val
pearson  30  0.463  [0.1, 0.72]  0.015
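For the semi-partial case, only x is residualized. This sketch regresses x on the covariates by ordinary least squares with an intercept (an assumption about the model) and correlates those residuals with the untouched y; semipartial_corr_x is a hypothetical helper. With a single covariate \(z\) it equals \((r_{xy} - r_{xz} r_{yz}) / \sqrt{1 - r_{xz}^2}\).

```python
import numpy as np


def semipartial_corr_x(x, y, Z):
    """Semi-partial correlation with the effect of Z removed from x only.

    Hypothetical helper: correlates the residuals of x ~ Z (OLS with an
    intercept) with the raw, unadjusted y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # a single covariate becomes one column
    A = np.column_stack([np.ones(len(x)), Z])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, x, rcond=None)
    res_x = x - A @ beta  # part of x not explained by Z
    return np.corrcoef(res_x, y)[0, 1]  # y is left unadjusted
```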