What’s new

v0.2.6 (June 2019)

Bugfixes

  1. Fixed a major error in the two-sided p-value of the Wilcoxon test (pingouin.wilcoxon()): the p-values were accidentally squared and were therefore too small. Make sure to always use the latest release of Pingouin.

  2. pingouin.wilcoxon() now uses the continuity correction by default (the documentation stated that the correction was applied, but the code did not apply it).

  3. The show_median argument of the pingouin.plot_shift() function was not working properly when the percentiles were different from the default parameters.
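
To illustrate what the continuity correction changes, here is a minimal sketch of the normal approximation to the Wilcoxon signed-rank test in plain Python. It is not Pingouin's implementation: the helper names are hypothetical, no tie correction is applied to the variance, and at least one non-zero difference is assumed.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_normal_approx(x, y, correction=True):
    """Two-sided signed-rank p-value via the normal approximation.

    Assumptions: no tie correction of the variance, and at least one
    non-zero paired difference.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)  # always <= its expected value
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # Continuity correction: move the statistic 0.5 towards its mean.
    shift = 0.5 if (correction and t_stat != mean) else 0.0
    z = (t_stat - mean + shift) / sd
    p_value = 2 * NormalDist().cdf(z)
    return t_stat, min(p_value, 1.0)


# Hypothetical paired measurements.
x = [1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30]
y = [0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29]
t_raw, p_raw = wilcoxon_normal_approx(x, y, correction=False)
t_corr, p_corr = wilcoxon_normal_approx(x, y, correction=True)
print(t_raw, p_raw, p_corr)
```

The corrected p-value is slightly larger than the uncorrected one, so omitting the correction makes the test a little less conservative.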

Dependencies

  1. The current release of statsmodels (0.9.0) is not compatible with the newest release of SciPy (1.3.0). In order to avoid compatibility issues in the pingouin.ancova() and pingouin.anova() functions (which rely on statsmodels for certain cases), Pingouin will require SciPy < 1.3.0 until a new stable version of statsmodels is released.

New functions

  1. Added pingouin.chi2_independence() tests.

  2. Added pingouin.chi2_mcnemar() tests.

  3. Added pingouin.power_chi2() function.

  4. Added pingouin.bayesfactor_binom() function.

Enhancements

  1. pingouin.linear_regression() now returns the residuals.

  2. Completely rewrote the pingouin.normality() function, which now supports pandas DataFrames (wide & long format), multiple normality tests (scipy.stats.shapiro(), scipy.stats.normaltest()), and an automatic casewise removal of missing values.

  3. Completely rewrote the pingouin.homoscedasticity() function, which now supports pandas DataFrames (wide & long format).

  4. Faster and more accurate algorithm in pingouin.bayesfactor_pearson() (same algorithm as JASP).

  5. Support for one-sided Bayes Factors in pingouin.bayesfactor_pearson().

  6. Better handling of required parameters in pingouin.qqplot().

  7. The epsilon value for the interaction term in pingouin.rm_anova() is now computed using the Greenhouse-Geisser method instead of the lower bound. A warning message has been added to the documentation to alert the user that the value might differ slightly from R or JASP.

Note that items 4 and 5 also affect the behavior of the pingouin.corr() and pingouin.pairwise_corr() functions.

Contributors

v0.2.5 (May 2019)

MAJOR BUG FIXES

  1. Fixed an error in the p-values of the one-sample one-sided T-test (pingouin.ttest()): the two-sided p-value was divided by 4 and not by 2, resulting in inaccurate (smaller) one-sided p-values.

  2. Fixed a global error for unbalanced two-way ANOVA (pingouin.anova()): the sums of squares were wrong, and as a consequence so were the F and p-values. In case of an unbalanced design, Pingouin now computes type II sums of squares via a call to the statsmodels package.

  3. The epsilon factor for the interaction term in two-way repeated measures ANOVA (pingouin.rm_anova()) is now computed using the lower bound approach. This is more conservative than the Greenhouse-Geisser approach and therefore gives (slightly) higher p-values. The reason for this choice is that the Greenhouse-Geisser values for the interaction term differ from the ones returned by R and JASP. This will hopefully be fixed in future releases.
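
The relationship behind the first fix can be sketched as follows, for a test with a symmetric null distribution such as the T-test. The function name and signature are hypothetical and this is not Pingouin's code: the two-sided p-value is halved when the statistic points in the direction of the alternative, and the complement is returned otherwise.

```python
def one_sided_p(p_two_sided, statistic, alternative="greater"):
    """Convert a two-sided p-value of a symmetric test into a one-sided one."""
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Does the observed statistic point in the direction of the alternative?
    in_direction = statistic > 0 if alternative == "greater" else statistic < 0
    half = p_two_sided / 2
    return half if in_direction else 1 - half


# Hypothetical values: two-sided p = 0.04 with a positive T statistic.
print(one_sided_p(0.04, 2.3, "greater"))
print(one_sided_p(0.04, 2.3, "less"))
```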

New functions

  1. Added pingouin.multivariate_ttest() (Hotelling T-squared) test.

  2. Added pingouin.cronbach_alpha() function.

  3. Added pingouin.plot_shift() function.

  4. Several Pingouin functions can now be directly used as pandas.DataFrame methods.

  5. Added pingouin.pcorr() method to compute the partial Pearson correlation matrix of a pandas.DataFrame (similar to the pcor function in the ppcor package).

  6. pingouin.partial_corr() now supports semi-partial correlation.
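
For readers unfamiliar with Cronbach's alpha, the textbook formula can be sketched in plain Python as below. This is an illustration under stated assumptions, not the pingouin.cronbach_alpha() implementation: the input layout (one list of per-subject scores per item) and the example data are hypothetical, and missing values are not handled.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of per-subject scores."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance (ddof=1) of each item across subjects.
    item_variances = [variance(scores) for scores in items]
    # Total score of each subject, summed over items.
    totals = [sum(subject_scores) for subject_scores in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / variance(totals))


# Hypothetical questionnaire: 3 items answered by 5 subjects.
items = [
    [3, 4, 5, 2, 4],
    [2, 4, 5, 1, 3],
    [3, 5, 4, 2, 4],
]
print(cronbach_alpha(items))
```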

Enhancements

  1. The pingouin.rm_corr() function now returns a pandas.DataFrame with the r-value, degrees of freedom, p-value, confidence intervals and power.

  2. pingouin.compute_esci() now works for paired and one-sample Cohen’s d.

  3. pingouin.bayesfactor_ttest() and pingouin.bayesfactor_pearson() now return a formatted str and not a float.

  4. pingouin.pairwise_ttests() now returns the degrees of freedom (dof).

  5. Better rounding of float in pingouin.pairwise_ttests().

  6. Support for wide-format data in pingouin.rm_anova()

  7. pingouin.ttest() now returns the confidence intervals around the T-values.

Missing values

  1. pingouin.remove_na() and pingouin.remove_rm_na() are now public functions documented in the API.

  2. pingouin.remove_rm_na() now works with multiple within-factors.

  3. pingouin.remove_na() now works with 2D arrays.

  4. Removed the remove_na argument in pingouin.rm_anova() and pingouin.mixed_anova(). An automatic listwise deletion of missing values is now always applied (same behavior as JASP). Note that this was already the default behavior of Pingouin, but the user could previously choose not to remove the missing values, which most likely returned inaccurate results.

  5. The pingouin.ancova() function now applies an automatic listwise deletion of missing values.

  6. Added remove_na argument (default = False) in pingouin.linear_regression() and pingouin.logistic_regression() functions

  7. Missing values are automatically removed in the pingouin.anova() function.
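
As a rough illustration of listwise deletion in a repeated measures design, the sketch below removes every row of a subject as soon as one of its values is missing. It uses a hypothetical long-format layout of (subject, condition, value) tuples and is not the code of pingouin.remove_rm_na(); subjects that lack a row entirely for some condition are not detected.

```python
import math


def drop_incomplete_subjects(rows):
    """Listwise deletion for long-format repeated measures data.

    rows: list of (subject, condition, value) tuples.
    Every row of a subject is removed if any of its values is missing.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    incomplete = {subject for subject, _, value in rows if is_missing(value)}
    return [row for row in rows if row[0] not in incomplete]


# Hypothetical data: subject "s2" has a missing value in the "post" condition.
data = [
    ("s1", "pre", 4.1), ("s1", "post", 5.0),
    ("s2", "pre", 3.8), ("s2", "post", float("nan")),
    ("s3", "pre", 4.4), ("s3", "post", 4.9),
]
print(drop_incomplete_subjects(data))
```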

Contributors

  • Raphael Vallat

  • Nicolas Legrand

v0.2.4 (April 2019)

Correlation

  1. Added pingouin.distance_corr() (distance correlation) function.

  2. pingouin.rm_corr() now requires at least 3 unique subjects (same behavior as the original R package).

  3. pingouin.pairwise_corr() is faster and returns the number of outliers if a robust correlation is used.

  4. Added support for 2D level in pingouin.pairwise_corr(). See Jupyter notebooks for examples.

  5. Added support for partial correlation in the pingouin.pairwise_corr() function.

  6. Greatly improved execution speed of pingouin.correlation.skipped() function.

  7. Added a default random state to compute the Minimum Covariance Determinant in the pingouin.correlation.skipped() function.

  8. The default number of bootstrap samples for the pingouin.correlation.shepherd() function is now set to 200 (previously 2000) to increase computation speed.

  9. pingouin.partial_corr() now automatically drops rows with missing values.

Datasets

  1. Renamed the dataset functions to pingouin.read_dataset() and pingouin.list_dataset() (previously these functions had to be called through pingouin.datasets).

Pairwise T-tests and multi-comparisons

  1. Added support for non-parametric pairwise tests in pingouin.pairwise_ttests() function.

  2. Common language effect size (CLES) is now reported by default in pingouin.pairwise_ttests() function.

  3. CLES is now implemented in the pingouin.compute_effsize() function.

  4. Better code, doc and testing for the functions in multicomp.py.

  5. P-values adjustment methods now do not take into account NaN values (same behavior as the R function p.adjust)
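
A minimal sketch of what ignoring NaN means for a p-value adjustment, using the Bonferroni method as an example. The function name is hypothetical and this is not the code in multicomp.py: the number of tests is counted over the non-NaN p-values only, and NaN entries are returned unchanged.

```python
import math


def bonferroni_ignore_nan(pvals):
    """Bonferroni-adjust p-values, leaving NaN entries untouched."""
    # Only valid (non-NaN) p-values count as tests.
    n_tests = sum(1 for p in pvals if not math.isnan(p))
    return [p if math.isnan(p) else min(p * n_tests, 1.0) for p in pvals]


# Two valid p-values and one NaN: the multiplier is 2, not 3.
print(bonferroni_ignore_nan([0.01, float("nan"), 0.04]))
```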

Plotting

  1. Added pingouin.plot_paired() function.

Regression

  1. NaN are now automatically removed in pingouin.mediation_analysis().

  2. pingouin.linear_regression() and pingouin.logistic_regression() now fail if NaN / Inf are present in the target or predictor variables. The user must remove them before running these functions.

  3. Added support for multiple parallel mediators in pingouin.mediation_analysis().

  4. Added support for covariates in pingouin.mediation_analysis().

  5. Added seed argument to pingouin.mediation_analysis() for reproducible results.

  6. pingouin.mediation_analysis() now returns two-sided p-values computed with a permutation test.

  7. Added pingouin.utils._perm_pval() to compute p-value from a permutation test.
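
The general idea of a two-sided permutation p-value with a fixed seed can be sketched as follows. This is a generic two-sample example on a difference in means, not the mediation model and not the code of pingouin.utils._perm_pval(); all names and default values are hypothetical.

```python
import random


def perm_pvalue(observed, null_stats):
    """Two-sided p-value: share of null statistics at least as extreme."""
    extreme = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return extreme / len(null_stats)


def mean_diff_permutation_test(x, y, n_perm=2000, seed=42):
    """Permutation test on the difference in means of two samples."""
    rng = random.Random(seed)  # fixed seed for reproducible results
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = list(x) + list(y)
    null_stats = []
    for _ in range(n_perm):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        null_stats.append(sum(px) / len(px) - sum(py) / len(py))
    return observed, perm_pvalue(observed, null_stats)


# Hypothetical samples.
a = [5.1, 4.8, 6.0, 5.7, 5.3]
b = [4.2, 4.9, 4.4, 4.0, 4.6]
print(mean_diff_permutation_test(a, b))
```

Calling the function twice with the same seed returns the same p-value, which is the purpose of a seed argument.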

Bugs and tests

  1. Travis and AppVeyor test for Python 3.5, 3.6 and 3.7.

  2. Better doctest & improved examples for many functions.

  3. Fixed bug with pingouin.mad() when axis was not 0.

v0.2.3 (February 2019)

Correlation

  1. shepherd now also returns the outlier vector (same behavior as skipped).

  2. The corr function returns the number of outliers for shepherd and skipped.

  3. Removed mahal function.

Licensing

  1. Pingouin is now released under the GNU General Public License 3.

  2. Added license files of external modules (qsturng and tabulate).

Plotting

  1. NaN are automatically removed in qqplot function

v0.2.2 (December 2018)

Plotting

  1. Started working on Pingouin’s plotting module

  2. Added Seaborn and Matplotlib to dependencies

  3. Added plot_skipped_corr function (PR from Nicolas Legrand)

  4. Added qqplot function (Quantile-Quantile plot)

  5. Added plot_blandaltman function (Bland-Altman plot)

Power

  1. Added power_corr, based on the R pwr package.

  2. Renamed anova_power and ttest_power to power_anova and power_ttest.

  3. Added power column to corr() and pairwise_corr()

  4. power_ttest function can now solve for sample size, alpha and d

  5. power_ttest2n for two-sample T-test with unequal n.

  6. power_anova can now solve for sample size, number of groups, alpha and eta

v0.2.1 (November 2018)

Effect size

  1. Separated compute_esci and compute_bootci

  2. Added corrected percentile method and normal approximation to bootstrap

  3. Fixed bootstrapping method

v0.2.0 (November 2018)

ANOVA

  1. Added Welch ANOVA

  2. Added Games-Howell post-hoc test for one-way ANOVA with unequal variances

  3. Pairwise T-tests now accept two within or two between factors

  4. Fixed error in padjust correction in the pairwise_ttests function: correction was applied on all p-values at the same time.

Correlation/Regression

  1. Added linear_regression function.

  2. Added logistic_regression function.

  3. Added mediation_analysis function.

  4. Support for advanced indexing (product / combination) in pairwise_corr function.

Documentation

  1. Added Guidelines section with flow charts

  2. Renamed API section to Functions

  3. Major improvements to the documentation of several functions

  4. Added Gitter channel

v0.1.10 (October 2018)

Bug

  1. Fixed dataset names in MANIFEST.in (.csv files were not copied when installing with pip)

Circular

  1. Added circ_vtest function

Distribution

  1. Added multivariate_normality function (Henze-Zirkler’s Multivariate Normality Test)

  2. Renamed functions test_normality, test_sphericity and test_homoscedasticity to normality, sphericity and homoscedasticity to avoid bugs with pytest.

  3. Moved distribution tests from parametric.py to distribution.py

v0.1.9 (October 2018)

Correlation

  1. Added partial_corr function (partial correlation)

Doc

  1. Minor improvements in docs and binder notebooks

v0.1.8 (October 2018)

ANOVA

  1. Added support for multiple covariates in ANCOVA function (requires statsmodels).

Documentation

  1. Major re-organization in API category

  2. Added equations and references for effect sizes and Bayesian functions.

Non-parametric

  1. Added cochran function (Cochran Q test)

v0.1.7 (September 2018)

ANOVA

  1. Added rm_anova2 function (two-way repeated measures ANOVA).

  2. Added ancova function (Analysis of covariance)

Correlations

  1. Added intraclass_corr function (intraclass correlation).

  2. The rm_corr function uses the new ancova function instead of statsmodels.

Datasets

  1. Added ancova and icc datasets

Effect size

  1. Fixed bug in Cohen’s d: the unbiased standard deviation (np.std(ddof=1)) is now used for paired and one-sample Cohen’s d. Please make sure to use pingouin >= 0.1.7 to avoid any mistakes on the paired effect sizes.
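
The effect of the ddof choice can be seen in a small one-sample sketch. The function below is hypothetical and only mirrors the textbook definition (mean minus reference value, divided by the standard deviation); statistics.stdev corresponds to ddof=1 and statistics.pstdev to ddof=0.

```python
from statistics import mean, pstdev, stdev


def cohen_d_one_sample(x, mu=0.0, unbiased=True):
    """One-sample Cohen's d: (mean - mu) / standard deviation."""
    sd = stdev(x) if unbiased else pstdev(x)  # ddof=1 vs ddof=0
    return (mean(x) - mu) / sd


# Hypothetical sample compared against a reference value of 3.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(cohen_d_one_sample(sample, mu=3, unbiased=True))
print(cohen_d_one_sample(sample, mu=3, unbiased=False))
```

The ddof=0 version divides by a smaller standard deviation and therefore inflates the effect size, most noticeably in small samples.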

v0.1.6 (September 2018)

ANOVA

  1. Added JNS method to compute sphericity.

Bug

  1. Added .csv datasets files to python site-packages folder

  2. Fixed error in test_sphericity when ddof == 0.

v0.1.5 (August 2018)

ANOVA

  1. rm_anova, friedman and mixed_anova now require a subject identifier. This avoids improper collapsing when multiple repeated measures factors are present in the dataset.

  2. rm_anova, friedman and mixed_anova now support the presence of other repeated measures factors in the dataset.

  3. Fixed error in test_sphericity

  4. Better output of ANOVA summary

  5. Added epsilon function

Code

  1. Added AppVeyor CI (Windows)

  2. Cleaned some old functions

Correlation

  1. Added repeated measures correlation (Bakdash and Marusich 2017).

  2. Added robust skipped correlation (Rousselet and Pernet 2012).

  3. Pairwise_corr function now automatically deletes non-numeric columns.

Dataset

  1. Added pingouin.datasets module (read_dataset & list_dataset functions)

  2. Added datasets: bland1995, berens2009, dolan2009, mcclave1991

Doc

  1. Examples are now Jupyter Notebooks.

  2. Binder integration

Misc

  1. Added median absolute deviation (mad)

  2. Added mad median rule (Wilcox 2012)

  3. Added mahal function (equivalent of Matlab mahal function)
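
A plain-Python sketch of the median absolute deviation and of the MAD-median rule for flagging outliers. The function names mirror the items above but the code is only illustrative: the scale constant 1.4826 (consistency with the standard deviation under normality) and the threshold 2.24 (roughly the square root of the 0.975 quantile of a chi-square distribution with one degree of freedom) are assumptions of this sketch, and a non-zero MAD is assumed.

```python
from statistics import median


def mad(values, scale=1.4826):
    """Median absolute deviation, rescaled to estimate a normal SD."""
    center = median(values)
    return scale * median([abs(v - center) for v in values])


def mad_median_rule(values, threshold=2.24):
    """Flag values whose scaled distance to the median exceeds the threshold."""
    center = median(values)
    spread = mad(values)  # assumed to be non-zero
    return [abs(v - center) / spread > threshold for v in values]


# Hypothetical data with one obvious outlier.
data = [1, 2, 3, 4, 100]
print(mad(data))
print(mad_median_rule(data))
```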

Parametric

  1. Added two-way ANOVA.

  2. Added pairwise_tukey function

v0.1.4 (July 2018)

Installation

  1. Fix bug with pip install caused by pingouin.external

Circular statistics

  1. Added circ_corrcc, circ_corrcl, circ_r, circ_rayleigh

v0.1.3 (June 2018)

Documentation

  1. Added several tutorials

  2. Improved doc of several functions

Bayesian

  1. T-test now reports the Bayes factor of the alternative hypothesis (BF10)

  2. Pearson correlation now reports the Bayes factor of the alternative hypothesis (BF10)

Non-parametric

  1. Kruskal-Wallis test

  2. Friedman test

Correlations

  1. Added Shepherd’s pi correlation (Schwarzkopf et al. 2012)

  2. Fixed bug in confidence intervals of correlation coefficients

  3. Parametric 95% CI are returned by default when calling corr

v0.1.2 (June 2018)

Correlation

  1. Pearson

  2. Spearman

  3. Kendall

  4. Percentage bend (robust)

  5. Pairwise correlations between all columns of a pandas dataframe

Non-parametric

  1. Mann-Whitney U

  2. Wilcoxon signed-rank

  3. Rank-biserial correlation effect size

  4. Common language effect size

v0.1.1 (April 2018)

ANOVA

  1. One-way

  2. One-way repeated measures

  3. Two-way split-plot (one between factor and one within factor)

Miscellaneous statistical functions

  1. T-tests

  2. Power of T-tests and one-way ANOVA

v0.1.0 (April 2018)

Initial release.

Pairwise comparisons

  1. FDR correction (BH / BY)

  2. Bonferroni

  3. Holm

Effect sizes

  1. Cohen’s d (independent and repeated measures)

  2. Hedges g

  3. Glass delta

  4. Eta-square

  5. Odds-ratio

  6. Area Under the Curve

Miscellaneous statistical functions

  1. Geometric Z-score

  2. Normality, sphericity, homoscedasticity and distribution tests

Code

  1. PEP8 and Flake8

  2. Tests and code coverage