pingouin.cronbach_alpha
- pingouin.cronbach_alpha(data=None, items=None, scores=None, subject=None, nan_policy='pairwise', ci=0.95)
Cronbach’s alpha reliability measure.
- Parameters:
- data : pandas.DataFrame
Wide or long-format dataframe.
- items : str
Column in data with the item names (long-format only).
- scores : str
Column in data with the scores (long-format only).
- subject : str
Column in data with the subject identifier (long-format only).
- nan_policy : str
If ‘listwise’, rows that contain any missing value are removed entirely (listwise deletion). If ‘pairwise’ (default), missing values are excluded pair by pair when computing the covariance matrix. For more details, please refer to the pandas.DataFrame.cov() method.
- ci : float
Confidence level of the interval (.95 = 95%).
- Returns:
- alpha : float
Cronbach’s alpha.
- ci : numpy.ndarray
Lower and upper bounds of the confidence interval for alpha.
Notes
This function works with both wide and long-format dataframes. If you pass a long-format dataframe, you must also pass the items, scores and subject columns (in which case the data will be converted into wide format using the pandas.DataFrame.pivot() method).
Internal consistency is usually measured with Cronbach’s alpha [1], a statistic calculated from the pairwise correlations between items. Cronbach’s alpha ranges between negative infinity and one. It is negative whenever there is greater within-subject variability than between-subject variability.
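The long-to-wide conversion mentioned in these notes can be pictured with plain pandas. This is only a sketch of the reshaping step, not the library's internal code; the toy values and the column names Subj, Items and Scores are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, item) pair.
long_df = pd.DataFrame({
    "Subj": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Items": ["A", "B", "C"] * 3,
    "Scores": [4, 5, 4, 2, 3, 3, 5, 5, 4],
})

# Reshape to wide format: one row per subject, one column per item.
wide_df = long_df.pivot(index="Subj", columns="Items", values="Scores")
print(wide_df)
```

Each row of wide_df then holds one subject's scores across all items, which is the layout the formulas in these notes operate on.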
Cronbach’s \(\alpha\) is defined as
\[\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{y_i}^{2}}{\sigma_{x}^{2}}\right)\]
where \(k\) refers to the number of items, \(\sigma_{x}^{2}\) is the variance of the observed total scores, and \(\sigma_{y_i}^{2}\) the variance of component \(i\) for the current sample of subjects.
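As a rough illustration of this definition, the following NumPy sketch computes alpha directly from the item variances and the variance of the total scores. The function name alpha_from_variances and the toy data are hypothetical, and the sketch assumes a complete (no missing values) subjects-by-items array.

```python
import numpy as np

def alpha_from_variances(scores):
    """Cronbach's alpha from item variances and total-score variance.

    `scores` is an (n_subjects, k_items) array-like without missing values.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores
    return k / (k - 1) * (1 - item_var.sum() / total_var)

# Hypothetical toy data: 5 subjects x 3 items.
toy = [[1, 2, 3], [2, 3, 3], [3, 3, 4], [4, 5, 5], [5, 5, 4]]
print(alpha_from_variances(toy))
```

Two items that are identical for every subject give an alpha of exactly 1, which is a quick sanity check on the formula.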
Another formula for Cronbach’s \(\alpha\) is
\[\alpha = \frac{k \times \bar c}{\bar v + (k - 1) \times \bar c}\]
where \(\bar c\) refers to the average of all covariances between items and \(\bar v\) to the average variance of each item.
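The covariance-based form can be sketched in a few lines of NumPy. On complete data it should agree with the variance-based definition, because the variance of the total score equals the sum of all entries of the item covariance matrix, that is \(k \bar v + k (k - 1) \bar c\). The function name alpha_from_covariances and the toy data are hypothetical.

```python
import numpy as np

def alpha_from_covariances(scores):
    """Cronbach's alpha from the average item variance and covariance.

    `scores` is an (n_subjects, k_items) array-like without missing values.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    cov = np.cov(scores, rowvar=False)            # k x k item covariance matrix
    v_bar = np.diag(cov).mean()                   # average item variance
    off_diag = cov[~np.eye(k, dtype=bool)]        # all between-item covariances
    c_bar = off_diag.mean()                       # average covariance
    return k * c_bar / (v_bar + (k - 1) * c_bar)

# Hypothetical toy data: 3 subjects x 2 items.
print(alpha_from_covariances([[1, 2], [2, 1], [3, 3]]))
```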
95% confidence intervals are calculated using Feldt’s method [2]:
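\[\begin{aligned}
c_L &= 1 - (1 - \alpha) \cdot F_{(0.025,\, n-1,\, (n-1)(k-1))} \\
c_U &= 1 - (1 - \alpha) \cdot F_{(0.975,\, n-1,\, (n-1)(k-1))}
\end{aligned}\]
where \(n\) is the number of subjects and \(k\) the number of items. Here \(F_{(p, d_1, d_2)}\) is the value of the F distribution with \(d_1\) and \(d_2\) degrees of freedom that is exceeded with probability \(p\) (an upper-tail critical value), so that \(c_L \le c_U\).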
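The interval can be sketched with SciPy's F distribution. This is an illustration under stated assumptions rather than the library's code: the helper name feldt_ci is hypothetical, and the subscripts 0.025 and 0.975 are read as upper-tail probabilities, which is why the sketch uses the inverse survival function scipy.stats.f.isf instead of the ordinary quantile function.

```python
from scipy.stats import f

def feldt_ci(alpha, n, k, ci=0.95):
    """Feldt-type confidence interval for Cronbach's alpha.

    alpha : point estimate of Cronbach's alpha
    n     : number of subjects
    k     : number of items
    ci    : confidence level (0.95 gives tail probabilities 0.025 and 0.975)
    """
    tail = (1 - ci) / 2
    df1 = n - 1
    df2 = (n - 1) * (k - 1)
    # isf(p, ...) returns the value exceeded with probability p.
    lower = 1 - (1 - alpha) * f.isf(tail, df1, df2)
    upper = 1 - (1 - alpha) * f.isf(1 - tail, df1, df2)
    return lower, upper

# Hypothetical inputs: alpha = 0.7 estimated from 30 subjects and 5 items.
print(feldt_ci(0.7, n=30, k=5))
```

Raising the confidence level widens the interval, and the interval is not symmetric around the point estimate because the F distribution is skewed.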
Results have been tested against the psych R package.
References
[2] Feldt, Leonard S., Woodruff, David J., & Salih, Fathi A. (1987). Statistical inference for coefficient alpha. Applied Psychological Measurement, 11(1):93-103.
Examples
Binary wide-format dataframe (with missing values)
>>> import pingouin as pg
>>> data = pg.read_dataset('cronbach_wide_missing')
>>> # In R: psych::alpha(data, use="pairwise")
>>> pg.cronbach_alpha(data=data)
(0.732660835214447, array([0.435, 0.909]))
After listwise deletion of missing values (remove the entire rows)
>>> # In R: psych::alpha(data, use="complete.obs")
>>> pg.cronbach_alpha(data=data, nan_policy='listwise')
(0.8016949152542373, array([0.581, 0.933]))
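The difference between the two nan_policy settings can be seen at the covariance level with plain pandas. The sketch below uses hypothetical toy data and only illustrates the two deletion strategies; it does not reproduce the function's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data with two missing scores.
toy = pd.DataFrame({
    "Q1": [1.0, 2.0, 3.0, 4.0, np.nan],
    "Q2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "Q3": [1.0, np.nan, 3.0, 5.0, 4.0],
})

# Pairwise: each covariance uses every row where both columns are present.
pairwise_cov = toy.cov()

# Listwise: rows with any missing value are dropped before computing anything.
listwise_cov = toy.dropna().cov()

print(pairwise_cov)
print(listwise_cov)
```

With pairwise deletion the variance of Q2 uses all five subjects, while listwise deletion keeps only the three complete rows, so the two covariance matrices (and hence the resulting alpha) can differ noticeably on small samples.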
After imputing the missing values with the median of each column
>>> pg.cronbach_alpha(data=data.fillna(data.median()))
(0.7380191693290734, array([0.447, 0.911]))
Likert-type long-format dataframe
>>> data = pg.read_dataset('cronbach_alpha')
>>> pg.cronbach_alpha(data=data, items='Items', scores='Scores',
...                   subject='Subj')
(0.5917188485995826, array([0.195, 0.84 ]))