pingouin.homoscedasticity

pingouin.homoscedasticity(data, dv=None, group=None, method='levene', alpha=0.05)

Test equality of variance.

Parameters
data : dataframe, list or dict

Iterable. Can be either a list or dictionary of iterables, or a wide- or long-format pandas DataFrame.

dv : str

Dependent variable (only when data is a long-format dataframe).

group : str

Grouping variable (only when data is a long-format dataframe).

method : str

Statistical test. ‘levene’ (default) performs the Levene test using scipy.stats.levene(), and ‘bartlett’ performs the Bartlett test using scipy.stats.bartlett(). The Levene test is more robust to departures from normality.

alpha : float

Significance level.

Returns
stats : dataframe

Pandas DataFrame with columns:

  • 'W/T': test statistic (‘W’ for Levene, ‘T’ for Bartlett)

  • 'pval': p-value

  • 'equal_var': True if the p-value is greater than alpha, i.e. the hypothesis of equal variances is not rejected

See also

normality

Univariate normality test.

sphericity

Mauchly’s test for sphericity.

Notes

The Bartlett \(T\) statistic is defined as:

\[T = \frac{(N-k) \ln{s^{2}_{p}} - \sum_{i=1}^{k}(N_{i} - 1) \ln{s^{2}_{i}}}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_{i} - 1} - \frac{1}{N-k}\right)}\]

where \(s_i^2\) is the variance of the \(i^{th}\) group, \(N\) is the total sample size, \(N_i\) is the sample size of the \(i^{th}\) group, \(k\) is the number of groups, and \(s_p^2\) is the pooled variance.

The pooled variance is a weighted average of the group variances and is defined as:

\[s^{2}_{p} = \sum_{i=1}^{k}(N_{i} - 1)s^{2}_{i}/(N-k)\]

The p-value is then computed using a chi-square distribution:

\[T \sim \chi^2(k-1)\]
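The Bartlett formulas above can be checked numerically. The following is a minimal sketch, not pingouin's implementation: the helper name `bartlett_statistic` is hypothetical, and the result is compared against scipy.stats.bartlett(), which the function is documented to use.

```python
import numpy as np
from scipy import stats


def bartlett_statistic(*groups):
    """Hypothetical helper: Bartlett's T and p-value from the formulas above."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                     # number of groups
    n_i = np.array([g.size for g in groups])            # group sample sizes N_i
    n_total = n_i.sum()                                 # total sample size N
    var_i = np.array([g.var(ddof=1) for g in groups])   # group variances s_i^2

    # Pooled variance: weighted average of the group variances
    pooled_var = np.sum((n_i - 1) * var_i) / (n_total - k)

    numerator = (n_total - k) * np.log(pooled_var) - np.sum((n_i - 1) * np.log(var_i))
    denominator = 1 + (np.sum(1 / (n_i - 1)) - 1 / (n_total - k)) / (3 * (k - 1))
    t_stat = numerator / denominator

    # Upper tail of a chi-square distribution with k - 1 degrees of freedom
    pval = stats.chi2.sf(t_stat, df=k - 1)
    return t_stat, pval


x = [4, 8, 9, 20, 14]
y = [5, 8, 15, 45, 12]
t_stat, pval = bartlett_statistic(x, y)
ref = stats.bartlett(x, y)  # reference values from SciPy
print(t_stat, pval)
print(ref.statistic, ref.pvalue)
```

Both lines should print the same statistic and p-value up to floating-point error. The two samples are the ones used in the Bartlett example of the Examples section.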

The Levene \(W\) statistic is defined as:

\[W = \frac{(N-k)} {(k-1)} \frac{\sum_{i=1}^{k}N_{i}(\overline{Z}_{i.}-\overline{Z})^{2} } {\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\overline{Z}_{i.})^{2} }\]

where \(Z_{ij} = |Y_{ij} - median({Y}_{i.})|\), \(\overline{Z}_{i.}\) are the group means of \(Z_{ij}\) and \(\overline{Z}\) is the grand mean of \(Z_{ij}\).

The p-value is then computed using an F-distribution:

\[W \sim F(k-1, N-k)\]
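The Levene formula above can be verified in the same way. This is a sketch under the assumption that the median-centered variant is wanted (the default `center='median'` of scipy.stats.levene()); the helper name `levene_statistic` is hypothetical and is not part of pingouin.

```python
import numpy as np
from scipy import stats


def levene_statistic(*groups):
    """Hypothetical helper: median-centered Levene W and p-value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                   # number of groups
    n_i = np.array([g.size for g in groups])          # group sample sizes N_i
    n_total = n_i.sum()                               # total sample size N

    # Z_ij: absolute deviation of each observation from its group median
    z = [np.abs(g - np.median(g)) for g in groups]
    z_bar_i = np.array([zi.mean() for zi in z])       # group means of Z
    z_bar = np.concatenate(z).mean()                  # grand mean of Z

    between = np.sum(n_i * (z_bar_i - z_bar) ** 2)
    within = sum(np.sum((zi - zbi) ** 2) for zi, zbi in zip(z, z_bar_i))
    w_stat = (n_total - k) / (k - 1) * between / within

    # Upper tail of an F-distribution with (k - 1, N - k) degrees of freedom
    pval = stats.f.sf(w_stat, k - 1, n_total - k)
    return w_stat, pval


x = [4, 8, 9, 20, 14]
y = [5, 8, 15, 45, 12]
w_stat, pval = levene_statistic(x, y)
ref = stats.levene(x, y, center="median")  # reference values from SciPy
print(w_stat, pval)
print(ref.statistic, ref.pvalue)
```

The hand-computed values should agree with SciPy's up to floating-point error.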

Warning

Missing values are not supported by this function. Remove them beforehand, for example with pandas.DataFrame.dropna() or pingouin.remove_na().
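As an illustration of this cleaning step, the sketch below drops missing values from a long-format dataframe before testing. The toy data and the column names `group` and `score` are assumptions made for the example, and scipy.stats.levene() is called directly as a stand-in for the test.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical long-format data with two missing scores
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "score": [4, 8, np.nan, 20, 14, 5, 8, 15, 45, np.nan],
})

# Drop rows where the dependent variable is missing
clean = df.dropna(subset=["score"])

# One array of scores per group, then run the Levene test on the clean data.
# With pingouin installed, the equivalent call would be:
#   pg.homoscedasticity(clean, dv="score", group="group")
samples = [g["score"].to_numpy() for _, g in clean.groupby("group")]
result = stats.levene(*samples)
print(len(df), len(clean))
print(result.statistic, result.pvalue)
```

Passing `subset` restricts the check to the dependent variable, so rows are not discarded because of missing values in unrelated columns.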

References

[1] Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A, 160(901), 268-282.

[2] Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367.

[3] NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/

Examples

  1. Levene test on a wide-format dataframe

>>> import numpy as np
>>> import pingouin as pg
>>> data = pg.read_dataset('mediation')
>>> pg.homoscedasticity(data[['X', 'Y', 'M']])
            W      pval  equal_var
levene  0.435  0.999997       True

  2. Bartlett test using a list of iterables

>>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
>>> pg.homoscedasticity(data, method="bartlett", alpha=.05)
              T      pval  equal_var
bartlett  2.874  0.090045       True

  3. Long-format dataframe

>>> data = pg.read_dataset('rm_anova2')
>>> pg.homoscedasticity(data, dv='Performance', group='Time')
            W      pval  equal_var
levene  3.192  0.079217       True