pingouin.chi2_mcnemar(data, x, y, correction=True)

Performs the exact and approximate versions of McNemar’s test.


data : pandas.DataFrame

The dataframe containing the occurrences for the test. Each row must represent either a subject or a pair of subjects.

x, y : string

The variable names for McNemar’s test. Must be names of columns in data.

If each row of data represents a subject, then x and y must be columns containing dichotomous measurements in two different contexts. For instance: the presence of pain before and after a certain treatment.

If each row of data represents a pair of subjects, then x and y must be columns containing dichotomous measurements for each of the subjects. For instance: a positive response to a certain drug in the control group and in the test group, supposing that each pair contains a subject in each group.

The 2x2 crosstab is created using the pingouin.dichotomous_crosstab() function.


Missing values are not allowed.


correction : bool

Whether to apply the continuity correction (Edwards, A. 1948).


observed : pandas.DataFrame

The observed contingency table of frequencies.


stats : pandas.DataFrame

The test summary:

  • 'chi2': The test statistic

  • 'dof': The degrees of freedom

  • 'p-approx': The approximate p-value

  • 'p-exact': The exact p-value


McNemar’s test applies to dichotomous paired data and is generally used to assess the effectiveness of a certain procedure, such as a treatment or the use of a drug. “Dichotomous” means that the values of the measurements are binary. “Paired data” means that each measurement is taken twice, either on the same subject at two different moments or on two similar (paired) subjects from different groups (e.g., control/test). To better understand the idea behind McNemar’s test, consider the following example.

Suppose that we want to compare the effectiveness of two different treatments (X and Y) for athlete’s foot in a group of n people. To achieve this, we measured each person’s response to both treatments, one on each foot. The observed data are summarized as:

  • Number of people with good responses to X and Y: a

  • Number of people with good response to X and bad response to Y: b

  • Number of people with bad response to X and good response to Y: c

  • Number of people with bad responses to X and Y: d

Now consider the two groups:

  1. The group of people who had good response to X (a + b subjects)

  2. The group of people who had good response to Y (a + c subjects)

If the treatments are equally effective, we should expect the probability of a good response to be the same regardless of the treatment. Mathematically, this statement translates into the following equation:

\[\frac{a+b}{n} = \frac{a+c}{n} \Rightarrow b = c\]
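To make the notation concrete, the following minimal sketch tallies a, b, c and d from paired binary responses and checks that the two marginal proportions differ by exactly (b - c) / n. The response lists are hypothetical values invented for illustration (1 = good response, 0 = bad response); they do not come from any dataset shipped with the library.

```python
from collections import Counter

# Hypothetical paired responses (1 = good, 0 = bad); made up for illustration.
x = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]  # response to treatment X
y = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1]  # response to treatment Y

# Count each (X, Y) outcome pair
counts = Counter(zip(x, y))
a = counts[(1, 1)]  # good response to both X and Y
b = counts[(1, 0)]  # good response to X, bad response to Y
c = counts[(0, 1)]  # bad response to X, good response to Y
d = counts[(0, 0)]  # bad response to both X and Y
n = a + b + c + d

# Marginal proportions of good responses for each treatment
p_x = (a + b) / n
p_y = (a + c) / n

# The shared term a cancels, so the marginals are equal only when b == c
assert abs((p_x - p_y) - (b - c) / n) < 1e-12
print(a, b, c, d, p_x, p_y)
```

Only the discordant pairs (b and c) carry information about a difference between the treatments; the concordant counts a and d cancel out of the comparison.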

Since equal effectiveness implies b = c, the test should indicate greater statistical significance for larger differences between b and c (McNemar, Q. 1947):

\[\chi^2 = \frac{(b - c)^2}{b + c}\]
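When the continuity correction is applied (Edwards, A. 1948), the statistic is commonly written with 1 subtracted from the absolute difference before squaring. Assuming this is the form used when correction=True, it reads:

\[\chi^2 = \frac{(|b - c| - 1)^2}{b + c}\]

With b = 8 and c = 40, for example, this gives 961/48 ≈ 20.02, compared with 1024/48 ≈ 21.33 without the correction.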


  • Edwards, A. L. (1948). Note on the “correction for continuity” in testing the significance of the difference between correlated proportions. Psychometrika, 13(3), 185-187.

  • McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153-157.


>>> import pingouin as pg
>>> data = pg.read_dataset('chi2_mcnemar')
>>> observed, stats = pg.chi2_mcnemar(data, 'treatment_X', 'treatment_Y')
>>> observed
treatment_Y   0   1
treatment_X
0            20  40
1             8  12

In this case, c (40) appears to be considerably greater than b (8). McNemar’s test should be sensitive to this.

>>> stats
            chi2  dof  p-approx   p-exact
mcnemar  20.020833    1  0.000008  0.000003
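As a sanity check, the values in this summary can be approximated by hand from the two discordant counts using only the standard library. This is a sketch under stated assumptions, not the library's implementation: it assumes the statistic uses the continuity-corrected form (|b - c| - 1)^2 / (b + c), that the approximate p-value is the chi-squared tail with 1 degree of freedom, and that the exact p-value is a two-sided binomial test on the discordant pairs with success probability 0.5.

```python
import math

# Discordant counts read from the observed table above
b, c = 8, 40
n_discordant = b + c

# Assumed continuity-corrected statistic (matching correction=True)
chi2 = (abs(b - c) - 1) ** 2 / n_discordant

# Chi-squared survival function with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2))
p_approx = math.erfc(math.sqrt(chi2 / 2))

# Assumed exact test: two-sided binomial test on the discordant pairs,
# i.e. twice the lower tail up to min(b, c) under p = 0.5
tail = sum(math.comb(n_discordant, k) for k in range(min(b, c) + 1)) / 2 ** n_discordant
p_exact = min(1.0, 2 * tail)

print(f"chi2={chi2:.6f}  p-approx={p_approx:.6f}  p-exact={p_exact:.6f}")
```

If these assumptions hold, the printed values should agree with the summary row above to the displayed precision.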