pingouin.anova

pingouin.anova(dv=None, between=None, data=None, detailed=False, export_filename=None)
One-way and two-way ANOVA.
Parameters
    dv : string
        Name of column in data containing the dependent variable.
    between : string or list with two elements
        Name of column(s) in data containing the between-subject factor(s). If between is a single string, a one-way ANOVA is computed. If between is a list with two elements (e.g. ['Factor1', 'Factor2']), a two-way ANOVA is computed.
    data : pandas DataFrame
        DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.
    detailed : boolean
        If True, return a detailed ANOVA table (default True for two-way ANOVA).
    export_filename : string
        Filename (without extension) for the output file. If None, do not export the table. By default, the file will be created in the current Python console directory. To change that, specify the filename with full path.
Returns
    aov : DataFrame
        ANOVA summary:

        'Source' : Factor names
        'SS' : Sums of squares
        'DF' : Degrees of freedom
        'MS' : Mean squares
        'F' : F-values
        'p-unc' : uncorrected p-values
        'np2' : Partial eta-square effect sizes
See also
    rm_anova : One-way and two-way repeated measures ANOVA
    mixed_anova : Two-way mixed ANOVA
    welch_anova : One-way Welch ANOVA
    kruskal : Non-parametric one-way ANOVA
Notes
The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA (welch_anova), which better controls the type I error rate (Liu 2015). The homogeneity of variances can be tested with the homoscedasticity function.
The main idea of ANOVA is to partition the variance (sums of squares) into several components. For example, in one-way ANOVA:
\[SS_{total} = SS_{treatment} + SS_{error}\]
\[SS_{total} = \sum_i \sum_j (Y_{ij} - \overline{Y})^2\]
\[SS_{treatment} = \sum_i n_i (\overline{Y}_i - \overline{Y})^2\]
\[SS_{error} = \sum_i \sum_j (Y_{ij} - \overline{Y}_i)^2\]
where \(i=1,...,r; j=1,...,n_i\), \(r\) is the number of groups, and \(n_i\) the number of observations for the \(i\) th group.
The F-statistic is then defined as:
\[F^* = \frac{MS_{treatment}}{MS_{error}} = \frac{SS_{treatment} / (r - 1)}{SS_{error} / (n_t - r)}\]
and the p-value can be calculated using an F-distribution with \(r - 1, n_t - r\) degrees of freedom, where \(n_t = \sum_i n_i\) is the total number of observations.
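The partition above can be checked numerically. The following is a minimal pure-Python sketch, not Pingouin's implementation; the function name oneway_anova_by_hand and the data are hypothetical, chosen only for illustration. It computes the three sums of squares and the F-statistic for groups of unequal size.

```python
def oneway_anova_by_hand(groups):
    """Return (ss_treatment, ss_error, ss_total, f_stat) for a list of groups.

    Illustrative only: a direct transcription of the one-way ANOVA
    formulas, not the code path used by pingouin.anova.
    """
    all_values = [y for g in groups for y in g]
    n_t = len(all_values)               # total number of observations
    r = len(groups)                     # number of groups
    grand_mean = sum(all_values) / n_t

    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variability: squared distance of each group mean
    # from the grand mean, weighted by group size
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, group_means))
    # Within-group variability: squared distance of each observation
    # from its own group mean
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)
    # Total variability around the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    ms_treatment = ss_treatment / (r - 1)
    ms_error = ss_error / (n_t - r)
    return ss_treatment, ss_error, ss_total, ms_treatment / ms_error


# Hypothetical data: three groups with 3, 4 and 2 observations
groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 7.0]]
ss_treatment, ss_error, ss_total, f_stat = oneway_anova_by_hand(groups)

# The two components add up to the total sum of squares
assert abs(ss_total - (ss_treatment + ss_error)) < 1e-9
```

The p-value would then come from the upper tail of an F-distribution with the same degrees of freedom used for the two mean squares, for instance via scipy.stats.f.sf, which is left out here to keep the sketch dependency-free.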
When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more appropriate (pingouin.pairwise_gameshowell()).
The effect size reported in Pingouin is the partial eta-square. However, one should keep in mind that for one-way ANOVA partial eta-square is the same as eta-square and generalized eta-square. For more details, see Bakeman 2005; Richardson 2011.
\[\eta_p^2 = \frac{SS_{treatment}}{SS_{treatment} + SS_{error}}\]
Note that missing values are automatically removed. Results have been tested against R, Matlab and JASP.
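As a quick sanity check of the partial eta-square formula, the short sketch below recomputes the effect size and the F-value from the sums of squares and degrees of freedom reported in the one-way ANOVA table of the Examples section. The variable names are illustrative and no Pingouin call is involved.

```python
# Values copied from the one-way ANOVA table in the Examples section
ss_treatment = 1360.726    # SS for the 'Hair color' row
ss_error = 1001.800        # SS for the 'Within' row
df_treatment, df_error = 3, 15

# Partial eta-square: share of (treatment + error) variability
# attributable to the treatment
np2 = ss_treatment / (ss_treatment + ss_error)

# F-value: ratio of the treatment and error mean squares
f_stat = (ss_treatment / df_treatment) / (ss_error / df_error)

print(round(np2, 3), round(f_stat, 3))  # prints: 0.576 6.791
```

Both numbers agree with the 'np2' and 'F' columns of that table.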
Important
Versions of Pingouin below 0.2.5 gave wrong results for unbalanced two-way ANOVA. This issue has been resolved in Pingouin>=0.2.5. For unbalanced two-way designs, a type II ANOVA is calculated via an internal call to the statsmodels package. This latter package is therefore required for two-way ANOVA with unequal sample sizes.
References
[1] Liu, Hangcheng. “Comparing Welch’s ANOVA, a Kruskal-Wallis test and traditional ANOVA in case of Heterogeneity of Variance.” (2015).
[2] Bakeman, Roger. “Recommended effect size statistics for repeated measures designs.” Behavior research methods 37.3 (2005): 379-384.
[3] Richardson, John TE. “Eta squared and partial eta squared as measures of effect size in educational research.” Educational Research Review 6.2 (2011): 135-147.
Examples
One-way ANOVA
>>> import pingouin as pg
>>> df = pg.read_dataset('anova')
>>> aov = pg.anova(dv='Pain threshold', between='Hair color', data=df,
...                detailed=True)
>>> aov
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -
Note that this function can also directly be used as a Pandas method:
>>> df.anova(dv='Pain threshold', between='Hair color', detailed=True)
       Source        SS  DF       MS      F       p-unc    np2
0  Hair color  1360.726   3  453.575  6.791  0.00411423  0.576
1      Within  1001.800  15   66.787      -           -      -
Two-way ANOVA with balanced design
>>> data = pg.read_dataset('anova2')
>>> data.anova(dv="Yield", between=["Blend", "Crop"]).round(3)
         Source        SS  DF        MS      F  p-unc    np2
0         Blend     2.042   1     2.042  0.004  0.952  0.000
1          Crop  2736.583   2  1368.292  2.525  0.108  0.219
2  Blend * Crop  2360.083   2  1180.042  2.178  0.142  0.195
3      residual  9753.250  18   541.847    NaN    NaN    NaN
Two-way ANOVA with unbalanced design (requires statsmodels)
>>> data = pg.read_dataset('anova2_unbalanced')
>>> data.anova(dv="Scores", between=["Diet", "Exercise"]).round(3)
            Source       SS   DF       MS      F  p-unc    np2
0             Diet  390.625  1.0  390.625  7.423  0.034  0.553
1         Exercise  180.625  1.0  180.625  3.432  0.113  0.364
2  Diet * Exercise   15.625  1.0   15.625  0.297  0.605  0.047
3         residual  315.750  6.0   52.625    NaN    NaN    NaN