pingouin.welch_anova

pingouin.welch_anova(data=None, dv=None, between=None, export_filename=None)
One-way Welch ANOVA.
Parameters
dv : string
Name of the column containing the dependent variable.
between : string
Name of the column containing the between factor.
data : pandas.DataFrame
Input DataFrame. Note that this function can also be used directly as a pandas method, in which case this argument is not needed.
export_filename : string
Filename (without extension) for the output file. If None, the table is not exported. By default, the file is created in the current working directory. To save it elsewhere, specify the filename with its full path.
Returns
aov : DataFrame
ANOVA summary:
'Source' : Factor names
'ddof1' : Numerator degrees of freedom
'ddof2' : Denominator degrees of freedom
'F' : F-values
'p-unc' : Uncorrected p-values
See also
anova
One-way and N-way ANOVA
rm_anova
One-way and two-way repeated measures ANOVA
mixed_anova
Two-way mixed ANOVA
kruskal
Non-parametric one-way ANOVA
Notes
From Wikipedia:
[Welch's t-test] is named for its creator, Bernard Lewis Welch, is an adaptation of Student's t-test, and is more reliable when the two samples have unequal variances and/or unequal sample sizes.
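The link to Welch's t-test can be checked numerically. The sketch below codes the Welch F from the formulas in these Notes and compares it with `scipy.stats.ttest_ind(..., equal_var=False)`. The function name `welch_anova_arrays` and the two samples are made up for illustration and are not part of pingouin. With two groups, the Welch F should equal the squared Welch t statistic, the denominator degrees of freedom 1/Λ should equal the Welch-Satterthwaite degrees of freedom, and the two p-values should agree.

```python
import numpy as np
from scipy import stats


def welch_anova_arrays(groups):
    """Hypothetical helper: (F, df1, df2, p) of Welch's ANOVA from the Notes' formulas."""
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])  # s_i^2
    r = len(groups)
    w = n / variances                                  # w_i = n_i / s_i^2
    grand = np.sum(w * means) / np.sum(w)              # adjusted grand mean
    ss_treatment = np.sum(w * (means - grand) ** 2)
    lam = 3 * np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (r ** 2 - 1)
    f_val = (ss_treatment / (r - 1)) / (1 + 2 * lam * (r - 2) / 3)
    df1, df2 = r - 1, 1 / lam
    return f_val, df1, df2, stats.f.sf(f_val, df1, df2)


# Two made-up samples with unequal sizes and unequal spreads.
a = np.array([4.1, 5.3, 6.0, 5.5, 4.8, 7.2])
b = np.array([6.9, 9.8, 4.2, 11.5, 8.1, 12.9, 3.7, 10.4, 7.5])

f_val, df1, df2, p_welch_anova = welch_anova_arrays([a, b])
t_res = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test (two-sided)

# With r = 2 groups, F should equal t squared and the p-values should agree.
print(f_val, t_res.statistic ** 2)
print(p_welch_anova, t_res.pvalue)
```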
The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA, which better controls the type I error rate (Liu 2015). Homogeneity of variances can be assessed with the homoscedasticity function. The other two assumptions, normality and independence, still apply.
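The type I error claim can be illustrated with a small Monte Carlo simulation. This is a sketch under stated assumptions: every group has the same population mean of 0, so any rejection is a false positive, and the group sizes (5, 10, 30) and standard deviations (4, 2, 1) are arbitrary choices in which the smallest group is the most variable, a setting in which the classic F-test is known to reject too often. `scipy.stats.f_oneway` provides the classic ANOVA; `welch_pvalue` is a hypothetical helper coded from the formulas in these Notes, not a pingouin function.

```python
import numpy as np
from scipy import stats


def welch_pvalue(groups):
    """Hypothetical helper: p-value of Welch's one-way ANOVA (formulas from the Notes)."""
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    w = n / np.array([np.var(g, ddof=1) for g in groups])  # w_i = n_i / s_i^2
    r = len(groups)
    grand = np.sum(w * means) / np.sum(w)
    ss_treatment = np.sum(w * (means - grand) ** 2)
    lam = 3 * np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (r ** 2 - 1)
    f_welch = (ss_treatment / (r - 1)) / (1 + 2 * lam * (r - 2) / 3)
    return stats.f.sf(f_welch, r - 1, 1 / lam)


rng = np.random.default_rng(0)
sizes = [5, 10, 30]
sds = [4.0, 2.0, 1.0]          # the smallest group is the most variable
n_sims, alpha = 2000, 0.05
rejections_classic = rejections_welch = 0
for _ in range(n_sims):
    # Every population mean is 0, so each rejection is a type I error.
    groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
    rejections_classic += int(stats.f_oneway(*groups).pvalue < alpha)
    rejections_welch += int(welch_pvalue(groups) < alpha)

rate_classic = rejections_classic / n_sims
rate_welch = rejections_welch / n_sims
print(f"Type I error rate, classic ANOVA: {rate_classic:.3f}")
print(f"Type I error rate, Welch ANOVA:   {rate_welch:.3f}")
```

The exact rates depend on the seed and the chosen sizes and spreads; the point of the sketch is the gap between the two rates, not their precise values.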
The main idea of Welch ANOVA is to use a weight \(w_i\) to reduce the effect of unequal variances. This weight is calculated using the sample size \(n_i\) and variance \(s_i^2\) of each group \(i=1,...,r\):
\[w_i = \frac{n_i}{s_i^2}\]
Using these weights, the adjusted grand mean of the data is:
\[\overline{Y}_{welch} = \frac{\sum_{i=1}^r w_i\overline{Y}_i}{\sum w}\]
where \(\overline{Y}_i\) is the mean of group \(i\).
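As a minimal numeric sketch of these two definitions (made-up data, plain NumPy rather than pingouin), the weights and the adjusted grand mean can be computed as follows. The group with the largest variance receives the smallest weight, so it pulls the Welch grand mean less than it would pull an unweighted mean.

```python
import numpy as np

# Three hypothetical groups; the second one is far more variable than the others.
groups = [np.array([2.0, 3.0, 4.0]),
          np.array([1.0, 5.0, 9.0, 13.0]),
          np.array([6.0, 6.5, 7.0, 7.5, 8.0])]

n = np.array([len(g) for g in groups])
means = np.array([g.mean() for g in groups])
variances = np.array([g.var(ddof=1) for g in groups])  # s_i^2 (unbiased)

w = n / variances                               # w_i = n_i / s_i^2
grand_welch = np.sum(w * means) / np.sum(w)     # adjusted grand mean
print(w, grand_welch)
```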
The treatment sum of squares is defined as:
\[SS_{treatment} = \sum_{i=1}^r w_i (\overline{Y}_i - \overline{Y}_{welch})^2\]
We then need to calculate a term \(\Lambda\):
\[\Lambda = \frac{3\sum_{i=1}^r \frac{1}{n_i - 1} \left(1 - \frac{w_i}{\sum w}\right)^2}{r^2 - 1}\]
from which the F-value can be calculated:
\[F_{welch} = \frac{SS_{treatment} / (r - 1)}{1 + \frac{2\Lambda(r - 2)}{3}}\]
and the p-value is approximated using an F-distribution with \((r - 1, 1 / \Lambda)\) degrees of freedom.
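Putting the formulas together, here is a sketch of the whole computation in NumPy/SciPy, with the p-value taken from the upper tail of `scipy.stats.f`. It follows the equations above and is not pingouin's source code; the function name `welch_anova_arrays` and the three groups are hypothetical.

```python
import numpy as np
from scipy import stats


def welch_anova_arrays(groups):
    """Hypothetical helper: (F, df1, df2, p-unc) for Welch's one-way ANOVA."""
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])  # s_i^2
    r = len(groups)

    w = n / variances                                    # w_i = n_i / s_i^2
    grand_welch = np.sum(w * means) / np.sum(w)          # adjusted grand mean
    ss_treatment = np.sum(w * (means - grand_welch) ** 2)

    # Lambda = 3 * sum((1 / (n_i - 1)) * (1 - w_i / sum(w))^2) / (r^2 - 1)
    lam = 3 * np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (r ** 2 - 1)

    f_welch = (ss_treatment / (r - 1)) / (1 + 2 * lam * (r - 2) / 3)
    df1, df2 = r - 1, 1 / lam
    p_unc = stats.f.sf(f_welch, df1, df2)                # upper-tail probability
    return f_welch, df1, df2, p_unc


groups = [np.array([2.0, 3.0, 4.0]),
          np.array([1.0, 5.0, 9.0, 13.0]),
          np.array([6.0, 6.5, 7.0, 7.5, 8.0])]
f_welch, df1, df2, p_unc = welch_anova_arrays(groups)
print(f"F = {f_welch:.3f}, df = ({df1}, {df2:.2f}), p-unc = {p_unc:.4f}")
```

Note that the denominator degrees of freedom 1/Λ are generally not an integer, which is why the example output below reports a fractional `ddof2`.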
When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more appropriate (pingouin.pairwise_gameshowell()).
Results have been tested against R.
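The post-hoc recommendation can be written as a small decision rule. This is a sketch with loudly stated assumptions: `suggest_posthoc` is a hypothetical helper that only returns the name of the suggested pingouin function, it uses SciPy's Levene test (`scipy.stats.levene`) as a stand-in for pingouin's homoscedasticity function, and it falls back to Games-Howell for unbalanced designs, a case the text above does not cover.

```python
import numpy as np
from scipy import stats


def suggest_posthoc(groups, alpha=0.05):
    """Hypothetical helper: name the pingouin post-hoc test suggested by the Notes.

    Assumptions: SciPy's Levene test decides whether variances are equal, and
    any design that is not both balanced and homoscedastic falls back to
    Games-Howell, which does not assume equal variances.
    """
    balanced = len({len(g) for g in groups}) == 1
    equal_var = stats.levene(*groups).pvalue >= alpha
    if balanced and equal_var:
        return "pingouin.pairwise_tukey"
    return "pingouin.pairwise_gameshowell"


base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
same_spread = [base, base + 10, base + 20]     # shifted copies: equal variances
diff_spread = [base, base * 10, base * 100]    # spreads grow by a factor of 10
print(suggest_posthoc(same_spread))   # pingouin.pairwise_tukey
print(suggest_posthoc(diff_spread))   # pingouin.pairwise_gameshowell
```

Choosing a test based on a preliminary variance test is itself a judgment call; the helper only encodes the rule of thumb stated above.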
References
[1] Liu, Hangcheng. “Comparing Welch’s ANOVA, a Kruskal-Wallis test and traditional ANOVA in case of Heterogeneity of Variance.” (2015).
[2] Welch, Bernard Lewis. “On the comparison of several mean values: an alternative approach.” Biometrika 38.3/4 (1951): 330-336.
Examples
One-way Welch ANOVA on the pain threshold dataset.
>>> from pingouin import welch_anova, read_dataset
>>> df = read_dataset('anova')
>>> aov = welch_anova(dv='Pain threshold', between='Hair color',
...                   data=df, export_filename='pain_anova.csv')
>>> aov
       Source  ddof1  ddof2     F     p-unc
0  Hair color      3   8.33  5.89  0.018813
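As a quick consistency check on the example output, the p-unc column can be recomputed from the displayed F and degrees of freedom, assuming (as the Notes state) that the p-value comes from the upper tail of an F-distribution with (r - 1, 1 / Λ) degrees of freedom. Because F and ddof2 are rounded in the printed table, the result should match 0.018813 only approximately.

```python
from scipy import stats

# F and degrees of freedom as displayed in the example output (rounded).
f_stat, ddof1, ddof2 = 5.89, 3, 8.33

# Upper-tail probability of an F(ddof1, ddof2) distribution.
p_unc = stats.f.sf(f_stat, ddof1, ddof2)
print(f"{p_unc:.4f}")
```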