Talk:Deviance (statistics)


2008

It isn't defined as the log-likelihood; it is the log of the likelihood ratio of the submodel to the full model!

Reformulated to meet the above point. Melcombe (talk) 15:52, 12 September 2008 (UTC)

Definition

The first edition of the McCullagh & Nelder reference certainly defines the deviance using the log-likelihood ratio (p. 17)... I can't access the second edition. In addition, the two external links also agree with this. Melcombe (talk) 10:20, 15 September 2008 (UTC)

You are right! Got confused about the definition of "full model." I tried to clear this up to prevent others from having this confusion. Pdbailey (talk) 01:00, 16 September 2008 (UTC)

Possible error - Discussion needed?

The text reads: "...two models follows an approximate chi-squared distribution with k degrees of freedom." However, my statistics professor today pointed out to the class that this is wrong, and that it is a common mistake even among statisticians (Dobson and Barnett's book, for example); according to him it only holds for normal linear models or very special cases. I googled it and found another professor's slides which also agree that this is wrong. Could anyone confirm or deny this, please? —Preceding unsigned comment added by Thedreamshaper (talk • contribs) 17:19, 26 October 2010 (UTC)

I deny it! Qwfp (talk) 17:34, 26 October 2010 (UTC)
Just what is thought to be wrong? For example, the article clearly states "for nested models" and is right for this case. But the "approximate chi-squared distribution" doesn't hold for non-nested models. Or you might be missing the "approximate" part of the result, and the result is certainly not exact other than for "normal linear models or very special cases"... but the article does not say that it is. The result is a "large sample" one, and perhaps some people have found poor results for sample sizes that are "too small". Melcombe (talk) 10:41, 27 October 2010 (UTC)
Melcombe, McCullagh and Nelder actually give huge numbers of warnings about the assumptions required for the chi-square distribution to hold. This, they say, is not true for the deviance drop. 018 (talk) 14:30, 27 October 2010 (UTC)
As above, I only have access to the First Edition. I only looked in the index under chi-squared and this didn't lead to "huge numbers" of warnings. It did say that the approximation may not be very good, even for what are notionally large samples, but with no specific information. This is really no different from the result for log-likelihoods, since the result here is just a transposition of the result there. (We know that McCullagh went on to do work on higher order approximations in likelihood theory, but that would presumably be impractical for general types of models in real applications. There are of course other possibilities and there may be something practical in the stats literature.) Maybe someone can expand the article to give more specific warnings, but the present version does correctly say "approximate" in a portion of the text that is effectively the lead section for the article, where no more intricate information is really appropriate. Melcombe (talk) 09:14, 28 October 2010 (UTC)
Interesting side note: today in class we (our professor) simulated sampling the deviance from a generalized linear model and showed that, at least for 100-5000 simulations, the approximation by a chi-squared distribution actually gets worse(!). Thedreamshaper (talk)
That's not exactly precise information. Is "100-5000 simulations" actually referring to the sample size within a single simulation, or to the number of simulations of a fixed sample size? Since something is apparently being changed, as referred to by "gets worse", what is being changed and in which direction... and, of course, what is being kept fixed? Clearly some ways of implementing an "increased sample size" scenario could well produce misleading results... the standard scenario would be one where the statistical information in the sample (about the parameters being fitted) grows proportionally to the sample size (this wouldn't be the case, for example, with a simple model for a linear trend in time, where an increased sample size might be modelled as an increase in the time-range of the observations, with equal fixed time-spacings). Melcombe (talk) 09:00, 29 October 2010 (UTC)
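For anyone who wants to probe this numerically, here is a minimal simulation sketch in Python using statsmodels (my own illustration, not the class exercise described above; the sample size, coefficient and replication count are arbitrary choices). It repeatedly fits two nested Poisson GLMs to data generated under the null and compares the deviance drops to a chi-squared distribution with k degrees of freedom:

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, k = 200, 2000, 2   # k = number of extra parameters in the larger model

diffs = np.empty(n_sims)
for i in range(n_sims):
    x = rng.normal(size=(n, 1 + k))
    y = rng.poisson(np.exp(0.5 * x[:, 0]))   # null: the k extra covariates have no effect
    d_small = sm.GLM(y, sm.add_constant(x[:, :1]), family=sm.families.Poisson()).fit().deviance
    d_big = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit().deviance
    diffs[i] = d_small - d_big   # deviance drop for the k extra terms

# Under the null the drops should be approximately chi-squared(k);
# a Kolmogorov-Smirnov test gives a rough check of the approximation.
print(stats.kstest(diffs, stats.chi2(df=k).cdf))

Increasing n in this sketch should show the approximation improving, which is the "standard scenario" described above, where the information in the sample grows with the sample size.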


I have come across some additional texts on deviance. As I now understand it, the SCALED deviance is distributed as chi-squared, while the unscaled deviance need not be (unless the dispersion parameter = 1). Could someone with a better understanding add this to the article? Supposing I am correct, I think a lot of students are confused because some textbooks refer to the scaled deviance simply as "deviance", and some other people then rightly claim that the deviance is not chi-squared distributed, causing confusion. —Preceding unsigned comment added by 80.163.18.152 (talk) 01:44, 19 December 2010 (UTC)


Deviance vs scaled deviance

I believe the definition given here is scaled deviance, not deviance. Deviance would be scaled deviance times the scale parameter in the GLM. Schomerus (talk) 18:27, 23 November 2011 (UTC)

This seems to follow on from the comment immediately above in that different sources may be using different definitions for deviance and/or scaled deviance. The definition presently in the article agrees with the citations given. Melcombe (talk) 11:25, 24 November 2011 (UTC)
Hmmm, McCullagh and Nelder 1989, the source cited, define scaled deviance as deviance divided by the dispersion parameter on pp. 33-34, and it's clear there that the quantity defined in this article is the scaled deviance. I haven't come across any sources which define it differently. But if the term is used inconsistently, I would think that should be noted. Also of relevance to the discussion above, McCullagh and Nelder on p. 119 note, "The chi-squared approximation is usually quite accurate for differences of [scaled] deviance [for nested models] even though it is inaccurate for the deviances themselves." Schomerus (talk) 00:02, 25 November 2011 (UTC)
. . . which is another way of saying that the likelihood ratio test becomes unreliable when the number of parameters approaches the number of observations (i.e. saturation). Schomerus (talk) 15:43, 25 November 2011 (UTC)
The 1983 edition of McCullagh and Nelder definitely defines "deviance" in the way presently stated, and has no "scaled deviance". I am unable to see the specific pages in the Google citations you give, and am still unable to access the 1989 edition. I have found [[1]], which uses the "1983" version on page 15, and then does something very strange on page 17. However, I have also found webpages that seem to be using the "1989" form. Melcombe (talk) 10:56, 28 November 2011 (UTC)
I've had a chance to compare the 1983 and 1989 editions of McCullagh and Nelder. You're right that in the 1983 edition there appears to be no mention of scaled deviance. However, I believe that was an error, because they give two definitions of deviance which are not consistent with each other. On p. 17 (1983 ed.) they suggest that deviance is equal to 2 times the difference of log-likelihoods and there is no mention of the scale parameter, but on p. 24 (1983 ed.) the deviance is divided by the scale parameter. This error was evidently corrected in the 1989 edition, as 2 times the difference of log-likelihoods is defined explicitly as "scaled deviance" (D*) on p. 24 (1989 ed.). I've also checked a few other references from Alan Agresti and Yudi Pawitan and they agree with the latter definition. At any rate, since the 1989 edition is what's cited, I think the definition should agree with what's given there. Schomerus (talk) 20:11, 28 November 2011 (UTC)
I have the impression that M&N tend to be sloppy with the terminology, which is leading to some of the confusion. For models that assume unit dispersion (e.g. Bernoulli, Poisson, etc.), deviance and scaled deviance are of course equal. So in that context one can use them interchangeably, which McCullagh and Nelder seem to do (cf. p. 119, 1989 ed.). When speaking of GLMs in general, though, it's important to make the distinction. Schomerus (talk) 20:48, 28 November 2011 (UTC)
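To make the distinction concrete, here is a short Python/statsmodels sketch under the 1989 terminology (simulated Gamma data whose true dispersion is 1/2; the link-class spelling links.Log assumes a recent statsmodels). The fitted results expose the unscaled deviance D as results.deviance and an estimate of the dispersion phi as results.scale, so the scaled deviance is D* = D/phi:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
mu = np.exp(0.2 + 0.3 * x)                 # true means under a log link
y = rng.gamma(shape=2.0, scale=mu / 2.0)   # Gamma with mean mu and dispersion phi = 1/2

res = sm.GLM(y, sm.add_constant(x), family=sm.families.Gamma(link=sm.families.links.Log())).fit()
D = res.deviance   # unscaled deviance
phi = res.scale    # estimated dispersion (Pearson-based by default)
print(D, phi, D / phi)   # the scaled deviance D/phi equals D only when phi = 1

For unit-dispersion families (Bernoulli, Poisson) phi = 1, so the two quantities coincide, as noted above.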
Are there any other "text book" sources that might be consulted? I agree that M&N are a good basic source for GLMs, but there may be doubt about other contexts, where there is no obvious scale factor. Indeed, starting from scratch, it seems that the adjective "scaled" has been applied to the wrong version of "deviance". But your thought that the usage in the 1983 edition was an error is incorrect, as that is exactly the terminology being used at the time (I attended seminars given by Nelder): of course that terminology may well have changed later. However, as to other uses, I note that the few papers I have seen on "Bayesian deviance" use the twice log-likelihood basis: see for example Deviance information criterion and its references. I will post a note on the Stats project talk page to see if others can contribute here. Melcombe (talk) 10:07, 1 December 2011 (UTC)

Other textbooks I've checked are Alan Agresti, "Categorical Data Analysis" (2002), and Yudi Pawitan, "In All Likelihood" (2001). In their seminal paper, Nelder and Wedderburn (1972, J. R. Stat. Soc. A) do call -2 times the maximised log-likelihood the "deviance" (without subtracting the saturated log-likelihood), so it seems that other authors have defined it differently since then. I'd suggest pointing out this inconsistency in the article, since it can lead to significant confusion. Schomerus (talk) 23:20, 2 December 2011 (UTC)

Is this how Deviance is defined?

According to p. 375 of Nelder and Wedderburn's paper (http://www.jstor.org/stable/10.2307/2344614?origin=crossref), the deviance was proposed and defined as: deviance = -2 * (maximised log-likelihood).

Furthermore, what is the meaning of the "{}"? What does log{p(y|theta)} mean? Does it mean "max likelihood" or simply "product"? Wouldn't it be better to clarify the meaning of "{}"? 147.8.182.107 (talk) 04:38, 8 January 2012 (UTC)

It's true that Nelder and Wedderburn do say "... the quantity which we propose to call the deviance." But their next para starts, "Note that the deviance is measured from that of the complete model, so that terms involving constants, the data alone, or the scale factor alone are omitted." (The wikipedia article currently uses the term 'full model' rather than 'complete model'.)
The "{}" are just enclosing the argument of the logarithm function. If it causes confusion, maybe it should be changed to log(p(y|theta)). Qwfp (talk) 15:47, 8 January 2012 (UTC)[reply]


Thanks. I think "{}" should be changed to "()". It is quite confusing, making me think it refers to either the expected value or the maximised likelihood. 147.8.182.48 (talk) —Preceding undated comment added 01:00, 9 January 2012 (UTC).

It is common in mathematics/statistics to use sequences of different types of brackets, each at a different level of nesting, to make an equation easier to follow. Thus log{p(y|theta)} is easier to follow than log(p(y|theta)). In mathematics/statistics, brackets are typically not used to denote expectation or maximisation: such things are always explicit in the notation. Melcombe (talk) 19:43, 9 January 2012 (UTC)

Lacking examples

This article is very important for diagnostics of GLM models. I've mentioned Wilks' theorem, but the article should also have examples showing it. If someone else gets to adding them, that would be great. Tal Galili (talk) 20:02, 19 August 2018 (UTC)
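In case it helps a future editor, here is a minimal sketch of the kind of example the article could show (Python/statsmodels, entirely simulated data, not taken from any source): a deviance (likelihood-ratio) test between nested logistic regressions, which by Wilks' theorem is approximately chi-squared with one degree of freedom when the extra term is truly absent:

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x1))))   # x2 is truly irrelevant

small = sm.GLM(y, sm.add_constant(x1), family=sm.families.Binomial()).fit()
big = sm.GLM(y, sm.add_constant(np.column_stack([x1, x2])), family=sm.families.Binomial()).fit()

lr = small.deviance - big.deviance   # deviance drop for adding x2
print(lr, stats.chi2.sf(lr, df=1))   # approximate p-value via Wilks' theorem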