Talk:McNemar's test

I don't think the given table fits McNemar's test.

The test should be used for frequency counts on a dichotomous scale, with two measurements on the same units (so the observations are dependent). The article does say "dichotomous trait with matched pairs of subjects", which amounts to the same thing, but matched pairs means two measurements on the same subjects: a subject's sex cannot change between the two measurements (unless we want to know how many subjects change sex after having flown!).

The misunderstanding comes from the fact that the article talks about a contingency table. McNemar's test is not for a 2x2 contingency table of independent observations; in that case Fisher's exact test or the chi-squared test should be applied instead.

See the table in the Italian wiki.

--Francesco R. 09:09, 9 December 2005 (UTC)

I agree. The table didn't show counts of matched pairs. I found the following statement to be confusing:

"Individuals without the disease are controls and individuals with the disease are cases. Within the cases and controls, individuals with the hypothesized disease gene are marked as positive for the presence of the gene and individuals without the gene are marked as negative."


The previous version (boys/girls, flown/not flown) before Cannin's last edit may have been better. I have replaced the example with a typical drug-testing scenario with before (control) and after conditions. I hope this is better.

--David Reitter 10 March 2006

I think that the drug testing example is incorrect. Consider an extreme example: 1,000,000 before/present+after/present, 10 before/present+after/absent, 0 before/absent+after/present, and 1,000,000 before/absent+after/absent. Now people without the disease staying disease-free (hence the 0 count in the before/absent+after/present cell) is usually to be expected whatever the effectiveness of the treatment. Only 10 out of a total of 2,000,010 patients recovered after being treated. Yet applying McNemar's test here leads to the conclusion that the two-sided P value is twice the probability of flipping a fair coin ten times and getting no heads, i.e. (0.5)^9 = 0.002. I think that McNemar's test can only be applied when the off-diagonal entries in the table can be treated as independent coin flips (so that the binomial distribution can be used).

--John Sirhc (talk) 06:07, 19 September 2011 (UTC)
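
As a quick check of the arithmetic above, here is a minimal Python sketch (the table layout and the use of statsmodels' mcnemar helper are my own choices for illustration, not anything taken from the article):

```python
# Sketch only: reproduces the extreme before/after table described above,
# assuming the statsmodels implementation of McNemar's test.
from statsmodels.stats.contingency_tables import mcnemar

table = [[1_000_000, 10],   # before present: after present / after absent
         [0, 1_000_000]]    # before absent:  after present / after absent

result = mcnemar(table, exact=True)   # exact (binomial) form of the test
print(result.pvalue)                  # 0.001953125 == 0.5 ** 9, the value quoted above
```

Only the 10 + 0 = 10 discordant pairs enter the calculation; the two concordant cells of 1,000,000 are ignored, which is the crux of the objection above.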


I don't think that the example is interpreted properly. Surely marginal homogeneity in this setting would simply mean that there are equal proportions of diseased individuals before and after treatment. This would mean that the rate at which the drug cures people is the same as the rate at which new individuals get sick, not that "there was no effect of the treatment". This does not strike me as an appropriate example for using McNemar's test. If we considered two different treatments on matched pairs, then we could make conclusions such as those in example 10.24 in Rosner's Fundamentals of Biostatistics (7th edition)[1]: "We conclude that if the treatments give different results from each other for the members of a matched pair, then the treatment A member of the pair is significantly more likely to survive for 5 years than the treatment B member. Thus, all other things being equal (such as toxicity, cost, etc.), treatment A would be the treatment of choice." With the current example, McNemar's test does not seem to have any meaningful interpretation. Chizey (talk) 18:18, 2 November 2015 (UTC)


Can someone make some redirects that point to this page? Searching for this page was difficult, I think because of spelling.

SAS Output

I've not edited statistics articles widely yet, so I'm hesitant to make the edit, but I don't think having software-specific code is really appropriate for an encyclopedia article. Should it be removed? —mako 17:03, 20 November 2008 (UTC)

Right. Wikipedia is not a manual. This kind of content should go to Wikibooks. Calimo (talk) 13:30, 13 January 2009 (UTC)


Liddell's exact test

I added this to the "Related tests" section, but I think it needs more checking and cleaning of the citation. —Preceding unsigned comment added by Talgalili (talk · contribs) 12:39, 17 November 2009 (UTC)

Remove {{Refimprove}}?

I think the {{Refimprove}} tag is a bit of overkill. Would anyone object if I removed it? Which particular statements need references? Tayste (edits) 09:52, 8 February 2010 (UTC)

I agree. Talgalili (talk) 10:42, 8 February 2010 (UTC)
I have added fact tags. But a major improvement would be to rearrange so that the definition of the test, and general discussion, do not appear in the "example" section. Melcombe (talk) 11:05, 8 February 2010 (UTC)
Agree about separating the definition of the test from the example.
If the reader read the first reference on the list (if they could actually get a copy of it), would they still question all those statements you've tagged with {{fact}}?
Of all the pages about statistical tests, which would you recommend as a good example to follow for improvements? Tayste (edits) 19:35, 8 February 2010 (UTC)
Well, we should be moving towards what is said in WP:CITE, but very few stats articles come anywhere near that, let alone just those on tests. There is also a requirement not to rely on a single source. The article isn't structured so as to suggest something like "a lot of the following material can be found in this source", which is sometimes done. In any case, the things marked look to be exceptions or separate opinions that lead one to expect some detail about who/where has made this particular point, even if it is contained in some main source... there is no problem in putting in several reference tags for the same source.
As to improvements... the major problem at present is the statement of the null hypothesis. After all, in the example one knows that 59 is not the same as 121... Presumably there are some underlying probabilities that are being tested as being equal. Melcombe (talk) 11:34, 9 February 2010 (UTC)
I hope I have clarified the null hypothesis (and the alternative hypothesis) in my recent edits, including the test statistic and its distribution under the null. Yes, there is still a lot of work to do though. Tayste (edits) 05:12, 10 February 2010 (UTC)

Definition

The definition is not stated correctly or clearly. "The null hypothesis of marginal homogeneity states that the two marginal probabilities for each outcome are the same, i.e. p_a + p_b = p_a + p_c and p_c + p_d = p_b + p_d." a, b, c, and d are counts, not outcomes, so there is no such thing as p_a, etc. This should simply read "a + b = a + c and c + d = b + d." Furthermore, the phrase "the two marginal probabilities for each outcome are the same" is unclear, as it does not refer to any column or row of probabilities on the margins of the table. It means the count of Test 2 negative outcomes is equal to the count of Test 1 negative outcomes, and the count of Test 2 positive outcomes equals the count of Test 1 positive outcomes. These are not marginal probabilities; "outcome positive/negative" is not a variable in this data, so there is no one variable being isolated from the rest of the table. Philgoetz (talk) 21:57, 5 March 2017 (UTC)
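
For what it is worth, under either reading the null constrains only the two discordant cells. Writing p_a, p_b, p_c, p_d for the four cell probabilities (the notation quoted above), a short sketch of the algebra:

```latex
% Marginal homogeneity as quoted above, reduced:
p_a + p_b = p_a + p_c \iff p_b = p_c, \qquad
p_c + p_d = p_b + p_d \iff p_c = p_b.
```

Either equality reduces to p_b = p_c (or, in terms of observed counts, to comparing b with c), which is why McNemar's statistic involves only the two off-diagonal cells.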

P-value greater than 1

The formula for `exact-P-value` yields a probability greater than 1 for b = c. For example, with b = c = 2, we get summands 0.375 (i = 2), 0.25 (i = 3) and 0.0625 (i = 4), totalling 0.6875, and times 2 this is 1.375. The implementation https://github.com/jowagner/mtb-tri-training/blob/master/scripts/mcnemar-exact-test.py fixes this by adding only half of the summand for i = n/2. The problem would go away if the formula were restricted to b > c. Jojo (talk) 13:53, 9 September 2021 (UTC)
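
For reference, a minimal Python sketch of the calculation (function names are mine; I am assuming the formula in question is 2 * sum_{i=b}^{n} C(n, i) / 2^n with n = b + c, which reproduces the numbers above):

```python
from math import comb

def exact_p_as_given(b, c):
    """The two-sided exact-P-value formula under discussion (no safeguard)."""
    n = b + c
    return 2 * sum(comb(n, i) for i in range(b, n + 1)) / 2 ** n

def exact_p_capped(b, c):
    """Sum over the larger tail and truncate at 1 (one simple remedy;
    the linked script instead adds only half of the i = n/2 summand)."""
    n = b + c
    k = max(b, c)
    return min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)

print(exact_p_as_given(2, 2))  # 1.375, the value > 1 noted above
print(exact_p_capped(2, 2))    # 1.0
```

When b = c the two-sided p-value should arguably just be 1 (the observed split is the most probable one under the null), which is what the capped version returns.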

Footnotes for this talk page