Talk:Power transform

Some mention should be made of John Tukey, because Box and Cox got the idea from Tukey's "re-expression" through the "ladder of powers." Tukey was one of the discussants of the original Box-Cox article. Box and Cox traded generality for the continuity condition at zero (where Tukey had his log transformation). As a consequence, their procedure requires an assumption of normality (which Tukey's doesn't). Preceding unsigned comment added by 98.215.225.109 (talk) 01:16, 1 March 2009 (UTC)

Log

This page could use an explanation of why the log is used when the parameter value is zero. PDBailey (talk) 14:30, 9 April 2009 (UTC)

Done. DavidMCEddy (talk) 05:04, 27 March 2016 (UTC)

Box-Cox with shift

In the “more general form of the transformation that incorporates a shift parameter”, shouldn't the geometric mean include the same shift (to ensure positivity)? I.e.

\tau (y_{i};\lambda ,\alpha )={\begin{cases}{\dfrac {(y_{i}+\alpha )^{\lambda }-1}{\lambda (\operatorname {GM} (y+\alpha ))^{\lambda -1}}}&\mathrm {if} \ \lambda \neq 0,\\\\\operatorname {GM} (y+\alpha )\ln(y_{i}+\alpha )&\mathrm {if} \ \lambda =0,\end{cases}}

Preceding unsigned comment added by Ged.R (talk • contribs) 16:18, 13 April 2012 (UTC)
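The proposal above can be sketched in a few lines of Python. This is only an illustration of the formula as written in the comment, with the geometric mean taken over the shifted values y + alpha; the function names `geometric_mean` and `shifted_box_cox` and the sample data are assumptions made for this sketch, not anything from the article.

```python
import math

def geometric_mean(values):
    # Geometric mean via the mean of logs; requires strictly positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def shifted_box_cox(y, lam, alpha):
    # Shift first, so that both the power/log and the geometric mean
    # are evaluated on the same positive quantities y_i + alpha.
    shifted = [v + alpha for v in y]
    if min(shifted) <= 0:
        raise ValueError("alpha must make every y_i + alpha positive")
    gm = geometric_mean(shifted)
    if lam == 0:
        return [gm * math.log(v) for v in shifted]
    return [(v ** lam - 1.0) / (lam * gm ** (lam - 1.0)) for v in shifted]

# Hypothetical data containing non-positive values; alpha = 2 makes them positive.
data = [-1.0, 0.0, 2.0]
transformed = shifted_box_cox(data, 1.0, 2.0)
```

With lambda = 1 the geometric-mean factor is 1 and the transform reduces to y + alpha - 1, which makes the sketch easy to check by hand. Taking the geometric mean of the unshifted data instead would fail here, because the logarithm of a non-positive value is undefined.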

Does the Box-Cox math (as shown in Wikipedia) work?

A colleague of mine is a PhD physicist. He offered the following critique of Wikipedia’s article on the Box-Cox distribution:

Wikipedia defines the Box-Cox distribution as “the distribution of a random variable X for which the Box–Cox transformation on X follows a truncated normal distribution.” The truncated-normal variate is defined as Y, which is the Box-Cox transformation of X: Y = (X^a – 1)/a if a is not equal to zero, else Y = ln(X). I denote the probability density functions (pdfs) of X and Y as fx(X) and fy(Y), and I use the symbol a instead of Wikipedia’s notation lambda.

This is a textbook-level problem. Upon solving it, I find there are 5 errors in the Wikipedia equation that ostensibly represents fx(X). The derivation goes as follows:

After equating the CDFs Fy(Y(X)) = Fx(X), differentiating both sides with respect to X gives

fx(X) = fy(Y) |dY/dX|.                                                                                                                      (1)

Substituting Y(X) into Eq. (1), and noting that dY/dX = X^(a-1) for every a (including a = 0), gives an expression for fx(X) directly:

fx(X) = fy(Y(X)) |X^(a-1)|.                                                                                                                (2)
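For readers who want the intermediate step, the derivative behind Eq. (2) can be written out as follows. This is a short worked sketch that uses only the definitions of Y(X) given above:

```latex
\[
\frac{dY}{dX} = \frac{d}{dX}\,\frac{X^{a}-1}{a} = X^{a-1} \quad (a \neq 0),
\qquad
\frac{dY}{dX} = \frac{d}{dX}\,\ln X = X^{-1} = X^{a-1}\Big|_{a=0},
\]
\[
\text{so in both cases}\quad f_X(X) = f_Y\bigl(Y(X)\bigr)\,\bigl|X^{a-1}\bigr| .
\]
```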

Recall that Y(X) = (X^a – 1)/a if a is nonzero, else Y(X) = ln(X). Also, Y is undefined when X < 0, and hence Y must also be non-negative by dint of the Box-Cox transformation. The non-negativity enforced on Y forces fy(Y) to be not a Gaussian, as one would have hoped, but a truncated Gaussian, truncated at Y = 0 and keeping only the part of the Y domain that is either greater than or less than zero. With this information, one can rewrite Eq. 2, understanding that Y is an implicit function of X. Also, because X^(a-1) is positive for X > 0, we can drop the absolute-value sign, so

   fx(X) = fy(Y) X^(a-1).                                                                                                                      (3)

In Eq. 3, we can substitute for fy(Y) the following:

       fy(Y) = (1/K) fyo(Y) = (1/K) [1/sqrt(2 pi s^2)] exp[-(Y – m)^2/(2 s^2)].                                                                      (4)

Here, fyo is the pdf of the Gaussian that has not yet been truncated, m is its mean, s is its standard deviation, and K is a constant that normalizes the post-truncated Gaussian so its integral is 1 over the truncated variate Y.

In Eq. 4, the constant K has two possible values depending on whether we select the negative or positive half-line of the variate Y. (We consider only these two options.) Denote by Phi(z) the standard normal CDF in z, in which case Phi((Y – m)/s) is the CDF of the un-truncated Gaussian fyo(Y) with mean m and standard deviation s. Then Phi(-m/s) is the CDF of fyo at the cut point 0, which includes all of fyo(Y) to the left of 0. Accordingly, 1 – Phi(-m/s) is the mass of the complementary part, which includes all of fyo(Y) to the right of 0. Whichever decision is made, we make the truncation at 0. Note that we have freedom to choose the values of m and s. Accordingly, the choice to use the left side of zero receives the normalization K = Phi(-m/s), and the choice to use the right side of zero receives the normalization K = [1 – Phi(-m/s)]. This choice can be effected by the sign of a parameter f and the following expression for K:

K = 1 – I(f<0) – sgn(f) Phi(-m/s), (5)

where I is the indicator function (in this case evaluating to 1 when f < 0, else 0) and sgn(f) is the algebraic sign of f if f is nonzero, else sgn(f) = 0.
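Equations 3 to 5 can be checked numerically with a short Python sketch. It assumes the truncation at Y = 0 described above and picks the right half-line (f > 0) with a > 0, in which case Y >= 0 corresponds to X >= 1. The function names and the parameter values a = 0.5, m = 1, s = 0.5 are illustrative assumptions, not taken from the article.

```python
import math

def std_normal_cdf(z):
    # Phi(z), the standard normal CDF, via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def box_cox(x, a):
    # Y(X) = (X^a - 1)/a for a != 0, else ln(X).
    return math.log(x) if a == 0 else (x ** a - 1.0) / a

def normalizer(m, s, f):
    # Eq. 5: K = 1 - I(f<0) - sgn(f) * Phi(-m/s).
    indicator = 1.0 if f < 0 else 0.0
    sign = (f > 0) - (f < 0)
    return 1.0 - indicator - sign * std_normal_cdf(-m / s)

def fx(x, a, m, s, f):
    # Eq. 3 with Eq. 4 substituted: fx(X) = (1/K) * fyo(Y(X)) * X^(a-1).
    y = box_cox(x, a)
    # Outside the kept half-line the truncated density is zero.
    if (f > 0 and y < 0) or (f < 0 and y >= 0):
        return 0.0
    gauss = math.exp(-(y - m) ** 2 / (2.0 * s * s)) / math.sqrt(2.0 * math.pi * s * s)
    return gauss / normalizer(m, s, f) * x ** (a - 1.0)

def integrate(func, lo, hi, n=100_000):
    # Plain midpoint rule; adequate for this smooth integrand.
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

# Right half-line (f > 0), a = 0.5: support is X >= 1; the upper limit 60
# maps to Y of about 13.5, far into the Gaussian tail for m = 1, s = 0.5.
total = integrate(lambda x: fx(x, 0.5, 1.0, 0.5, 1.0), 1.0, 60.0)
print(total)
```

If the factor X^(a-1) is removed from `fx`, the integral no longer comes out close to 1 for a different from 1, which is one way to see why the Jacobian term matters.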

Our final expression for fx(X) is Eq. 3 with substitution of fy(Y) from Eq. 4 and substitution of K into Eq. 4 from Eq. 5. It differs in five ways from the Wikipedia equation:

1. Wikipedia omits the factor |dY/dX|, but should include it.

2. Where one expects Y in the exponential, Wikipedia takes (Y ^ f)/f, where f is called the “family parameter.” I think this is a mistake.

3. The argument sqrt(s) in the CDF Phi should be plain s. By writing the “dispersion factor” s with a square root over it, as Wikipedia does, it gives the impression that s is a variance and not a standard deviation as it plainly is elsewhere.

4. Where the Wikipedia article introduces the indicator function, there is a hyperlink to the correct definition but the subset of text available to the floating cursor (prior to actual click) renames this term as “characteristic function.” The Wikipedia article on Characteristic function shows abundant definitions of the term, including the topic-relevant definition as the Fourier transform of a pdf. To avoid confusion, I recommend that the words “characteristic function” not be included in the floating text excerpted in the present article.

5. As used in the article, the standard normal CDF is not Phi, but Phi(z, 0, 1); with general m and s, Phi(Y, m, s) is the general normal CDF in Y. A preferable notation might cite Phi(z) as the standard normal CDF in z, and hence the single-argument function Phi((Y-m)/s) is the general normal CDF in Y.

I also suggest a bit more specificity in the terminology. Given the specificity of the distribution fyo, there is no reason to call m a location parameter and not a mean; and no reason to call s a dispersion and not a standard deviation.

Still to be resolved: What can a truncated Gaussian offer in conceptual simplification of the problem of non-normal statistics? A truncated Gaussian is already VERY non-normal!

Dr. Michael H. Brill, (609) 375-6368, mhbrill2001@gmail.com

The Wikipedia editorial interlocutor for Dr. Brill is Research Psychologist (talk) 16:24, 9 July 2022 (UTC)