pingouin.homoscedasticity

pingouin.homoscedasticity(data, dv=None, group=None, method='levene', alpha=0.05, **kwargs)

Test equality of variance.

Parameters:

data (pandas.DataFrame, list or dict): Iterable. Either a list or dictionary of iterables (one per group), or a wide- or long-format pandas DataFrame.
dv (str): Dependent variable (only when data is a long-format DataFrame).
group (str): Grouping variable (only when data is a long-format DataFrame).
method (str): Statistical test. ‘levene’ (default) performs the Levene test using scipy.stats.levene(), and ‘bartlett’ performs the Bartlett test using scipy.stats.bartlett(). The Levene test is more robust to departures from normality.
alpha (float): Significance level against which the p-value is compared.
**kwargs (optional): Additional keyword argument(s) passed to the lower-level scipy.stats.levene() function (for example, center).
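All of the accepted data layouts describe the same thing: one sample per group. The sketch below illustrates that mapping with a hypothetical helper, split_groups (not part of pingouin), assuming that a wide-format DataFrame holds one group per numeric column and that a long-format one stores the values in dv and the labels in group. It then passes the groups to scipy.stats.levene() directly.

```python
import numpy as np
import pandas as pd
from scipy import stats


def split_groups(data, dv=None, group=None):
    """Hypothetical helper (not pingouin's code): return one array per group."""
    if isinstance(data, pd.DataFrame):
        if dv is None and group is None:
            # Wide format: each numeric column is one group
            numeric = data.select_dtypes(include="number")
            return [numeric[col].to_numpy() for col in numeric.columns]
        # Long format: `dv` holds the values, `group` holds the labels
        return [g.to_numpy() for _, g in data.groupby(group, sort=False)[dv]]
    if isinstance(data, dict):
        # Dictionary: each value is one group
        return [np.asarray(v) for v in data.values()]
    # List: each element is one group
    return [np.asarray(v) for v in data]


a, b = [4, 8, 9, 20, 14], [5, 8, 15, 45, 12]
wide = pd.DataFrame({"a": a, "b": b})
long = wide.melt()  # columns "variable" (group label) and "value" (dv)

layouts = [
    split_groups(wide),
    split_groups(long, dv="value", group="variable"),
    split_groups({"a": a, "b": b}),
    split_groups([a, b]),
]
results = [stats.levene(*groups).statistic for groups in layouts]
print(results)  # the same W for every layout
```

Because each layout reduces to the same two samples, the four statistics agree, which is also why the wide- and long-format examples at the end of this page print identical results.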

Returns:

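stats (pandas.DataFrame): One-row table, indexed by the name of the test, with the following columns:

'W' or 'T': Test statistic (‘W’ for Levene, ‘T’ for Bartlett)
'pval': p-value
'equal_var': True if pval > alpha, i.e. the null hypothesis of equal variances is not rejected at the chosen significance level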
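As a rough illustration of how the output columns relate to each other, the hypothetical function variance_table below (not pingouin's implementation) calls SciPy directly and assumes that equal_var is simply the comparison pval > alpha.

```python
import pandas as pd
from scipy import stats


def variance_table(groups, method="levene", alpha=0.05, **kwargs):
    """Hypothetical helper mirroring the documented output columns.

    Assumption: `equal_var` is computed as `pval > alpha`.
    """
    if method == "levene":
        stat, pval = stats.levene(*groups, **kwargs)
        name = "W"
    elif method == "bartlett":
        stat, pval = stats.bartlett(*groups)
        name = "T"
    else:
        raise ValueError("method must be 'levene' or 'bartlett'")
    # One row, indexed by the test name, as in the examples on this page
    return pd.DataFrame(
        {name: [stat], "pval": [pval], "equal_var": [bool(pval > alpha)]},
        index=[method],
    )


groups = [[4, 8, 9, 20, 14], [5, 8, 15, 45, 12]]
print(variance_table(groups, method="bartlett"))
```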

See also

normality: Univariate normality test.
sphericity: Mauchly’s test for sphericity.

Notes

The Bartlett \(T\) statistic [1] is defined as:

\[T = \frac{(N-k) \ln{s^{2}_{p}} - \sum_{i=1}^{k}(N_{i} - 1) \ln{s^{2}_{i}}}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_{i} - 1} - \frac{1}{N-k}\right)}\]

where \(s_i^2\) is the unbiased sample variance of the \(i^{th}\) group, \(N\) is the total sample size, \(N_i\) is the sample size of the \(i^{th}\) group, \(k\) is the number of groups, and \(s_p^2\) is the pooled variance.

The pooled variance is a weighted average of the group variances and is defined as:

\[s^{2}_{p} = \sum_{i=1}^{k}(N_{i} - 1)s^{2}_{i}/(N-k)\]

The p-value is then computed using a chi-square distribution:

\[T \sim \chi^2(k-1)\]
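The Bartlett formulas can be checked numerically. The following sketch, built around a hypothetical bartlett_t function, computes the pooled variance, \(T\) and its chi-square p-value with NumPy and SciPy, assuming unbiased (ddof=1) group variances, and compares the result with scipy.stats.bartlett(). It reuses the data of the Bartlett example at the end of this page.

```python
import numpy as np
from scipy import stats


def bartlett_t(*groups):
    """Bartlett's T and p-value computed directly from the formulas (sketch)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([g.size for g in groups])
    N = n_i.sum()
    s2_i = np.array([g.var(ddof=1) for g in groups])  # unbiased group variances
    s2_p = np.sum((n_i - 1) * s2_i) / (N - k)  # pooled variance
    num = (N - k) * np.log(s2_p) - np.sum((n_i - 1) * np.log(s2_i))
    den = 1 + (np.sum(1 / (n_i - 1)) - 1 / (N - k)) / (3 * (k - 1))
    T = num / den
    pval = stats.chi2.sf(T, k - 1)  # upper tail of chi2(k - 1)
    return T, pval


groups = ([4, 8, 9, 20, 14], [5, 8, 15, 45, 12])
print(bartlett_t(*groups))  # T ≈ 2.8736, pval ≈ 0.0900
print(stats.bartlett(*groups))  # should agree with the line above
```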

The Levene \(W\) statistic [2] is defined as:

\[W = \frac{(N-k)} {(k-1)} \frac{\sum_{i=1}^{k}N_{i}(\overline{Z}_{i.}-\overline{Z})^{2} } {\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\overline{Z}_{i.})^{2} }\]

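where \(Y_{ij}\) is the \(j^{th}\) observation of the \(i^{th}\) group, \(Z_{ij} = |Y_{ij} - \text{median}({Y}_{i.})|\), \(\overline{Z}_{i.}\) are the group means of \(Z_{ij}\) and \(\overline{Z}\) is the grand mean of \(Z_{ij}\). Centering on the median, which is the default of scipy.stats.levene() (center='median'), gives the Brown-Forsythe variant of the test [2]; passing center='mean' gives Levene's original mean-centered version.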

The p-value is then computed using an F-distribution:

\[W \sim F(k-1, N-k)\]
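Likewise, a minimal sketch of the Levene statistic, assuming only median or mean centering (scipy.stats.levene() also offers a trimmed-mean option that is not reproduced here). levene_w is a hypothetical name; its results should agree with scipy.stats.levene() for both center values.

```python
import numpy as np
from scipy import stats


def levene_w(*groups, center="median"):
    """Levene's W and p-value from the formula (sketch, not scipy's code)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    N = sum(g.size for g in groups)
    # Only 'median' and 'mean' are handled; anything else raises KeyError
    locate = {"median": np.median, "mean": np.mean}[center]
    Z = [np.abs(g - locate(g)) for g in groups]  # absolute deviations Z_ij
    n_i = np.array([z.size for z in Z])
    Z_bar_i = np.array([z.mean() for z in Z])  # group means of Z_ij
    Z_bar = np.concatenate(Z).mean()  # grand mean of Z_ij
    between = np.sum(n_i * (Z_bar_i - Z_bar) ** 2)
    within = sum(np.sum((z - zm) ** 2) for z, zm in zip(Z, Z_bar_i))
    W = (N - k) / (k - 1) * between / within
    pval = stats.f.sf(W, k - 1, N - k)  # upper tail of F(k - 1, N - k)
    return W, pval


groups = ([4, 8, 9, 20, 14], [5, 8, 15, 45, 12])
for center in ("median", "mean"):
    # Compare against scipy for both centering choices
    print(center, levene_w(*groups, center=center), stats.levene(*groups, center=center))
```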

Warning

Missing values are not supported by this function. Make sure to remove them beforehand, for example with pandas.DataFrame.dropna() or pingouin.remove_na().
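One way to handle missing values in long-format data, sketched here with pandas and SciPy only (the column names group and value are illustrative), is to drop incomplete rows before splitting the values by group:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Long-format data with one missing value in the dependent variable
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "value": [4.0, 8.0, np.nan, 20.0, 5.0, 8.0, 15.0, 45.0],
    }
)

# Drop rows where either the dependent or the grouping variable is missing
clean = df.dropna(subset=["value", "group"])
groups = [g.to_numpy() for _, g in clean.groupby("group")["value"]]
print(stats.levene(*groups))
```

For wide-format data, DataFrame.dropna() removes whole rows, which also discards valid values in the other columns; dropping missing values column by column (e.g. [df[c].dropna() for c in df.columns]) keeps them as separate samples of unequal size.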

References

[1]

Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A, 160(901), 268-282.

[2]

Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367.

Examples

Levene test on a wide-format dataframe

>>> import numpy as np
>>> import pingouin as pg
>>> data = pg.read_dataset("mediation")
>>> pg.homoscedasticity(data[["X", "Y", "M"]])
               W      pval  equal_var
levene  1.173518  0.310707       True

Same data but using a long-format dataframe

>>> data_long = data[["X", "Y", "M"]].melt()
>>> pg.homoscedasticity(data_long, dv="value", group="variable")
               W      pval  equal_var
levene  1.173518  0.310707       True

Same data, but centering on the group means instead of the medians (center is passed to scipy.stats.levene())

>>> pg.homoscedasticity(data_long, dv="value", group="variable", center="mean")
               W      pval  equal_var
levene  1.572239  0.209303       True

Bartlett test using a list of iterables

>>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
>>> pg.homoscedasticity(data, method="bartlett", alpha=0.05)
                 T      pval  equal_var
bartlett  2.873569  0.090045       True