pingouin.chi2_independence(data, x, y, correction=True)

Chi-squared independence tests between two categorical variables.

The test is computed for different values of \(\lambda\): 1, 2/3, 0, -1/2, -1 and -2 (Cressie and Read, 1984).
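For reference, these values index the Cressie–Read power divergence family of statistics which, for observed frequencies \(O_i\) and expected frequencies \(E_i\), can be written as:

```latex
2nI^{\lambda} = \frac{2}{\lambda(\lambda + 1)} \sum_{i} O_i \left[\left(\frac{O_i}{E_i}\right)^{\lambda} - 1\right]
```

The cases \(\lambda = 0\) (log-likelihood ratio, or G-test) and \(\lambda = -1\) (modified log-likelihood) are defined by taking the limit; \(\lambda = 1\) recovers Pearson's \(\chi^2\) statistic.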


Parameters

data : pandas.DataFrame
The dataframe containing the occurrences for the test.

x, y : string
The variable names for the Chi-squared test. Must be names of columns in data.

correction : bool
Whether to apply Yates’ correction when the degree of freedom of the observed contingency table is 1 (Yates 1934).


Returns

expected : pandas.DataFrame
The expected contingency table of frequencies.

observed : pandas.DataFrame
The (corrected or not) observed contingency table of frequencies.

stats : pandas.DataFrame
The test summary, containing the following columns:

  • 'test': The statistic name

  • 'lambda': The \(\lambda\) value used for the power divergence statistic

  • 'chi2': The test statistic

  • 'dof': The degrees of freedom of the test

  • 'pval': The p-value of the test

  • 'cramer': The Cramer’s V effect size

  • 'power': The statistical power of the test


Notes

From Wikipedia:

The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

For example, this test can be used to (i) evaluate the quality of a categorical predictor in a classification problem or (ii) check the similarity between two categorical variables. In the first case, a good categorical predictor and the class column should show a high \(\chi^2\) and a low p-value. In the second case, two similar categorical variables should show a low \(\chi^2\) and a high p-value.

This function is a wrapper around the scipy.stats.power_divergence() function.
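Each row of the test summary corresponds to one call of that function with a different lambda. A minimal sketch of the relationship (assuming SciPy is installed, and reusing the corrected observed and expected frequencies from the example at the end of this page):

```python
from scipy.stats import power_divergence

# Corrected observed and expected frequencies, flattened from the
# 2 x 2 example tables at the end of this page
f_obs = [24.5, 71.5, 113.5, 93.5]
f_exp = [43.722772, 52.277228, 94.277228, 112.722772]

# ddof=2 lowers the degrees of freedom from k - 1 = 3 down to 1,
# as appropriate for a 2 x 2 contingency table
stat, pval = power_divergence(f_obs, f_exp=f_exp, ddof=2, lambda_="pearson")
print(round(stat, 3))  # 22.717, matching the 'pearson' row of the summary
```

The lambda_ argument also accepts the other names used in the summary table ('cressie-read', 'log-likelihood', 'freeman-tukey', 'mod-log-likelihood', 'neyman') as string aliases for the numeric \(\lambda\) values.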


As a general guideline for the validity of this test, the observed and the expected contingency tables should not have cells with frequencies lower than 5.
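This guideline is easy to verify before trusting the result. A hypothetical helper (not part of pingouin's API) could look like:

```python
import numpy as np

def all_cells_frequent(table, threshold=5):
    """Return True when every cell of a contingency table reaches `threshold`."""
    return bool((np.asarray(table) >= threshold).all())

# The expected table from the example below easily passes the check
print(all_cells_frequent([[43.72, 52.28], [94.28, 112.72]]))  # True
```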


References

  • Cressie, N., & Read, T. R. (1984). Multinomial goodness‐of‐fit tests. Journal of the Royal Statistical Society: Series B (Methodological), 46(3), 440-464.

  • Yates, F. (1934). Contingency Tables Involving Small Numbers and the \(\chi^2\) Test. Supplement to the Journal of the Royal Statistical Society, 1, 217-235.


Examples

Let’s see if gender is a good categorical predictor for the presence of heart disease.

>>> import pingouin as pg
>>> data = pg.read_dataset('chi2_independence')
>>> data['sex'].value_counts(ascending=True)
0     96
1    207
Name: sex, dtype: int64

If gender is not a good predictor for heart disease, we should expect the same 96:207 ratio across the target classes.

>>> expected, observed, stats = pg.chi2_independence(data, x='sex',
...                                                  y='target')
>>> expected
target          0           1
0       43.722772   52.277228
1       94.277228  112.722772

Let’s see what the data tells us.

>>> observed
target      0     1
0        24.5  71.5
1       113.5  93.5

For sex = 0, the observed frequency is lower than expected in class 0 and higher than expected in class 1. The tests should be sensitive to this difference.
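Before looking at the test statistics, the corrected observed table above can be reproduced by hand. A minimal NumPy sketch (not pingouin's internal code), using the raw counts underlying this example:

```python
import numpy as np

# Raw 2 x 2 contingency table (sex x target) from the example dataset
observed = np.array([[24., 72.],
                     [114., 93.]])

# Expected frequencies under independence: product of the margins
# divided by the grand total
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / observed.sum()

# Yates' correction: move each observed count 0.5 toward its expected value
corrected = observed + 0.5 * np.sign(expected - observed)
print(corrected)
```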

>>> stats.round(3)
                 test  lambda    chi2  dof  pval  cramer  power
0             pearson   1.000  22.717  1.0   0.0   0.274  0.997
1        cressie-read   0.667  22.931  1.0   0.0   0.275  0.998
2      log-likelihood   0.000  23.557  1.0   0.0   0.279  0.998
3       freeman-tukey  -0.500  24.220  1.0   0.0   0.283  0.998
4  mod-log-likelihood  -1.000  25.071  1.0   0.0   0.288  0.999
5              neyman  -2.000  27.458  1.0   0.0   0.301  0.999

Very low p-values indeed. Gender therefore qualifies as a good predictor for the presence of heart disease in this dataset.
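As a sanity check, the 'cramer' column can be recomputed from the Pearson statistic with the standard Cramer's V formula:

```python
import math

# V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r x c contingency table
chi2, n, r, c = 22.717, 303, 2, 2   # n = 96 + 207 observations
cramer_v = math.sqrt(chi2 / (n * (min(r, c) - 1)))
print(round(cramer_v, 3))  # 0.274, matching the 'pearson' row above
```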