Talk:Matthews correlation coefficient

Merger proposal

The following is a closed discussion of a proposed merger. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this discussion.

The result of this proposed merger was merged to phi coefficient. More than twelve (!) years later, consensus is that the two coefficients are the same and that including alternative names in the lede will help those looking for the Matthews correlation coefficient. (non-admin closure) Rotideypoc41352 (talk · contribs) 02:44, 21 November 2021 (UTC)


Agree with the proposal to merge with phi coefficient, as the definition is clearly identical. I'd say this should be merged into phi coefficient, which is a much older name (Yule 1912) and gets a lot more ghits. Qwfp (talk) 14:19, 17 July 2009 (UTC)

I agree. I have been using phi correlation in several papers and it is standard nomenclature in the statistics literature (see for instance Applied multiple regression/correlation analysis for the behavioral sciences, J Cohen, P Cohen, SG West, LS Aiken - 1983, with more than 16,000 citations in Google Scholar). A merger under the title phi-correlation or phi-coefficient will be fine, with an added section on other names that includes Matthews. - Preceding unsigned comment added by 166.217.201.162 (talk) 13:20, 5 May 2010 (UTC)

There doesn't seem to be much action on this topic, but I wouldn't merge - I would delete Matthews, or if necessary, put it in as a footnote under Phi. Edstat (talk) 03:41, 10 June 2010 (UTC)

Although I acknowledge that the Matthews Correlation Coefficient and the phi coefficient are the same thing, the name Matthews Correlation Coefficient is widely used in the fields of bioinformatics and chemoinformatics, while the terminology "phi coefficient" is not used or recognised. Thus, I don't think deleting Matthews is a good idea. Jbom1 (talk) 14:49, 11 June 2010 (UTC)

Two pages for the same concept do not really make a lot of sense. I came across the phi coefficient in a statistics book years ago, but I have seen it referred to as Matthews correlation coefficient in the literature. It is confusing to come across the differences in nomenclature, as papers tend to refer to one or the other and never mention that they are the same. I would be in favour of a merger of these pages. DrMicro (talk) 16:59, 31 December 2010 (UTC)

Agreed, merge with phi coefficient and make sure to mention the naming differences between disciplines. JKW (talk) 10:53, 13 March 2011 (UTC)

👍 Like Tal Galili (talk) 21:17, 17 November 2011 (UTC)

If they are merged, they should be kept under the title "phi coefficient", which as others have said is a much older name. --Presearch (talk) 19:38, 17 January 2012 (UTC)

Disagree. As observed in the current arguments, most have taken the point of view of a statistician rather than that of someone else, such as a bioinformatician. I am a bioinformatician who has not taken a proper statistics course and thus did not learn about the phi coefficient until now. To me, the Matthews correlation coefficient (MCC) is the familiar term, not the phi coefficient. If the article on MCC were deleted as Edstat suggested, I and future readers would not be able to find the information on MCC, which would be a loss of knowledge and information and is, I believe, against the purpose of Wikipedia. If MCC is related to the phi coefficient, or even a derivative of it, I would imagine statistics books would mention that. However, as pointed out by DrMicro, the two terms are often used in different fields with no mention of their being equivalent to each other. This could mean that the two terms were derived independently although they arrived at the same endpoint. I propose more research on the history/deduction of the phi coefficient (which, I can see, has already been done by many) and of MCC before further action is taken. Wpliao (talk) 18:34, 23 January 2012 (UTC)

Agree. It is imperative to merge these, as they are strictly equivalent. This split causes members of both fields to miss key research on this coefficient. A simple solution is to merge these, under the older name ("phi coefficient") as suggested by others, and cite the original works that led to this. "MCC" / "Matthews Correlation Coefficient" should simply redirect to "phi coefficient", which should have a prominent note at the top of the article clarifying this matter.

The following should be noted and incorporated:

The Matthews correlation coefficient (coined in 1975 by the eponymous biochemist[1]) is equivalent to the (Pearson/Boas–Yule) phi coefficient, which was first published by Yule in 1912[2]. The phi coefficient originates from Pearson correlations and is exactly the same as the MCC (when applied to a 2x2 confusion matrix for binary classification), just phrased differently. The phi coefficient went on to be widely explored and discussed in statistics[3], before achieving wide use and analysis by psychology researchers[4][5][6]. There has also been some study of how best to normalize it[7] and some additional statistical analyses[8].


References

  1. ^ Matthews, B. W. (1975). Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2), 442–451.
  2. ^ Yule, G. U. (1912). On the Methods of Measuring Association Between Two Attributes. Journal of the Royal Statistical Society, 75(6), 579.
  3. ^ Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press. Page 282.
  4. ^ Guilford, J. P., & Perry, N. C. (1951). Estimation of other coefficients of correlation from the phi coefficient. Psychometrika, 16(3), 335–346.
  5. ^ Scott, W. A. (1960). Measures of test homogeneity. Educational and Psychological Measurement, 20(4), 751–757.
  6. ^ Carroll, J. B. (1961). The nature of the data, or how to choose a correlation coefficient. Psychometrika, 26(4), 347–372.
  7. ^ Davenport, E. C., & El-Sanhurry, N. A. (1991). Phi/Phimax: Review and Synthesis. Educational and Psychological Measurement, 51(4), 821–828.
  8. ^ Cox, D. R., & Wermuth, N. (1992). A comment on the coefficient of determination for binary responses. American Statistician, 46(1), 1–4.

Coby.Viner (talk) 20:27, 18 August 2018 (UTC)
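The equivalence claimed above can be checked numerically. The following is a minimal sketch, not taken from either article: the labels are made up for illustration, and the helper names (`confusion_counts`, `mcc`, `pearson`) are hypothetical. It computes the MCC from the four cells of a 2x2 confusion matrix and, separately, the Pearson correlation of the same 0/1 vectors (which is the phi coefficient for binary data), and the two agree up to floating-point rounding.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """MCC from confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption for this sketch: return 0 when a margin is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def pearson(xs, ys):
    """Plain Pearson correlation; on 0/1 data this is the phi coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up example labels (not from any cited paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
mcc_value = mcc(tp, fp, fn, tn)
phi_value = pearson(y_true, y_pred)
print(mcc_value, phi_value)
```

For these labels the cells are TP=3, FP=1, FN=1, TN=5, so both quantities equal 14/24 (about 0.583).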

Agree. These are equivalent and there is no reason a Wiki user should have to read both pages to get the information on the topic. Someone please move ahead with this. MurrayScience (talk) 09:42, 12 August 2021 (UTC)


Let's decide?

I agree the two articles should be merged. I also prefer to bring MCC into Phi correlation (and distinctly mention the two names). The reasons I think MCC should be folded into Phi are: 1. history - phi was clearly invented much earlier; 2. Google searches (via Google Scholar, which I think should be the deciding factor here):

"phi coefficient" - 17,200 results
"phi correlation coefficient" - 1,200 results
"matthews correlation coefficient" - 12,400 results
"matthews coefficient" - 7,650 results

Given the historical perspective, and that the result counts for the two names are of similar magnitude: I think Phi should be the preferred term for the article, and that MCC should redirect to it, with a clear section on the two names.

If no one else objects - shall we start the merger?

Tal Galili (talk) 12:41, 12 March 2021 (UTC)

Disagree. Hi, I personally disagree with the merge. The Matthews correlation coefficient is a special case of the phi coefficient, and therefore, in my view, it needs its own page. Moreover, many scientific articles about binary classification refer to the MCC and not to the phi coefficient. I suggest we keep the MCC page distinct from the phi coefficient. Thank you --Larry.europe (talk) 13:53, 5 April 2021 (UTC)
Hey Larry, could you please explain what you mean by a "special case"? It's a specific formula, it's used for different purposes sure, but it's the same formula. What are we gaining from not merging the two? (Given that we'll have one term redirect to the other?) Could you please be as specific as possible? Tal Galili (talk) 12:52, 15 April 2021 (UTC)
And more generally, it's clear that many ML papers use one term vs the other. But that's not unusual. There are many cases in which one entity is referred to by different names. I would appreciate more details on your objection. Tal Galili (talk) 12:55, 15 April 2021 (UTC)
Hi, sorry for the late reply. The Matthews correlation coefficient is a special case of the phi coefficient because it is the phi coefficient applied to a 2 × 2 table. All the Matthews correlation coefficients are also phi coefficients, but not all the phi coefficients are Matthews correlation coefficients. To me, it is the same as the difference between dog and animal: each dog is an animal but you would never request to delete the dog Wikipedia page because there's another Wikipedia page for animal, right? ;-) Same here. I honestly see only drawbacks from merging the two articles. Thanks for considering my point of view --Larry.europe (talk) 21:33, 13 May 2021 (UTC)
Hey Larry.europe, thanks for the input.
Your description doesn't seem aligned with what's written in the article. The article doesn't say that MCC is defined as the phi coefficient for binary outcomes (i.e., it's not a special case of phi), but rather that it's defined exactly as phi and is used (not defined, but used) for binary outcomes. Both MCC and Phi are the same thing, used in different contexts (and given different names). Hence, I think the entity should be the same - and they should be in the same article. I think MCC should point to Phi (since Phi predates MCC and is more common). I think it's fair to have a section on the naming conventions and the history. The current situation is this weird duplication in which two articles are talking about the same thing. I'm also fine with having both articles use the terms interchangeably (i.e., that existing sections talking about MCC will keep using that term), as this fairly reflects the duality of the term usage. But I don't think having two different articles, each repeating the same observations about the entity, makes much sense. WDYT? Tal Galili (talk) 11:59, 14 May 2021 (UTC)

If no further arguments are made, I will proceed with the merger in the coming weeks/months. Tal Galili (talk) 01:04, 13 August 2021 (UTC)

Thank you -- I think the merger is long overdue, e.g. by turning the MCC page into a redirect to the more established term. Mebden (talk) 01:55, 16 November 2021 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Standard Error

Can anybody confirm the standard error of the sample phi coefficient/Matthews coefficient? I have a reference stating that the formula is in Kendall & Stuart's book from 1961 (!), 'The Advanced Theory of Statistics', but I cannot find that text. Shabbychef (talk) 05:08, 26 June 2010 (UTC)

Relation to Chi-Square

The formula seems to be wrong - if MCC = sqrt(χ²/n), then no negative values are possible. The formula should either be written as |MCC| = sqrt(χ²/n) or use the same formula as in phi coefficient. - Preceding unsigned comment added by 85.53.133.219 (talk) 15:26, 13 November 2010 (UTC)

Changed the article. - Preceding unsigned comment added by 85.53.133.219 (talk) 15:36, 13 November 2010 (UTC)
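The sign issue raised above can be illustrated with a small numeric check. This is a sketch under stated assumptions: the 2x2 table is made up, the function name `phi_and_chi2` is hypothetical, and the chi-square statistic is the ordinary Pearson one without continuity correction. Because χ²/n equals phi squared, the square root only recovers the magnitude; the sign has to come from ad - bc.

```python
import math


def phi_and_chi2(a, b, c, d):
    """Return (phi, chi-square, n) for the 2x2 table [[a, b], [c, d]].

    Chi-square here is the uncorrected Pearson statistic (an assumption of
    this sketch). All four margins must be non-zero.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    det = a * d - b * c
    phi = det / math.sqrt(margins)
    chi2 = n * det ** 2 / margins
    return phi, chi2, n


# Made-up table with a negative association (ad - bc < 0).
a, b, c, d = 1, 4, 5, 2
phi, chi2, n = phi_and_chi2(a, b, c, d)

magnitude = math.sqrt(chi2 / n)                   # always >= 0
signed = math.copysign(magnitude, a * d - b * c)  # restore the sign

print(phi, magnitude, signed)
```

Here phi is negative while sqrt(χ²/n) is positive with the same absolute value, which is why the relation needs |MCC| on the left-hand side (or an explicit sign term).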

How to edit references? One of them has an invalid link.

In regards to Citation [4].

Powers, David M W (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation" (PDF). Journal of Machine Learning Technologies. 2 (1): 37–63.

The link to the article says "Forbidden" when you click on it. An updated reference is at https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.9232

I could not figure out how to edit this reference, as it does not appear when I click "edit sources" in the references section. Can someone please advise or help me? Thanks! - Preceding unsigned comment added by Adgaudio (talk · contribs) 04:13, 4 February 2020 (UTC)

You need to edit the reference where it first appears in the text, not in the references/sources section. I've done this for that paper, linking now to a more accessible source. Klbrain (talk) 09:32, 26 July 2021 (UTC)
Resolved