Talk:Latent Dirichlet allocation

Messy[edit]

If there is beauty in mathematics, it's at least not in these equations... It would already be much easier if it were reworked a bit. For example, the Dirichlet distribution just "flies in" halfway through, suddenly setting an entire term to one. In the original paper, even in the appendix, there are intelligible sections like, I quote: "we use the general fact that the derivative of the log normalization factor with respect to the natural parameter is equal to the expectation of the sufficient statistics, ...". This Wikipedia article, however, seems to be written by a mathematical robot that is very adept at copying LaTeX equations around. Keep in mind, there are people who will read it! :-) Divide the algorithm into logical steps and name each of them. Start with some assumptions you will use later on. Don't repeat the same complicated terms over and over. Etc. Anne van Rossum (talk) 14:23, 10 March 2014 (UTC)[reply]

Initial comments[edit]

Is this related to Latent semantic analysis? Thadk 06:26, 3 November 2006 (UTC)[reply]

Yes and no. Yes in the sense that it stems from PLSI, which is the probabilistic successor to Latent Semantic Analysis. No, because LDA is a hierarchical Bayesian model, whereas LSA is based on singular value decomposition. Artod 19:08, 17 May 2007 (UTC)[reply]
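
To make the contrast concrete, here is a minimal sketch of fitting both to the same term-document counts, assuming scikit-learn's TruncatedSVD and LatentDirichletAllocation estimators and a toy corpus of my own invention; it only illustrates the linear-algebra-versus-probabilistic-model distinction, not the article's content:

    # Illustrative only: LSA factorises the count matrix with a truncated SVD,
    # while LDA fits a hierarchical Bayesian model to the same counts.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation

    docs = ["apples and oranges", "oranges and bananas", "cats chase mice"]  # toy corpus (assumed)
    X = CountVectorizer().fit_transform(docs)                # document-term count matrix

    lsa = TruncatedSVD(n_components=2, random_state=0).fit(X)               # SVD-based
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)  # Bayesian

    print(lsa.components_)  # directions in word space; entries may be negative
    print(lda.components_)  # per-topic word pseudo-counts; always non-negative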

A recent edit asserts that function words can be "filtered out"; given that these are high-probability words, if their probability is low for every topic multinomial, this will lead to a lower overall likelihood for the data. Can someone explain what is meant by "filtered out"? I didn't remove it, as it would probably be best to keep the point, though perhaps expressed more clearly. Ezubaric 01:56, 18 May 2007 (UTC)[reply]

I guess that filtering out can be done by examining the words whose variational parameters phi have a flat distribution instead of a spiked one. Not an automated task, though. Please refer to Blei et al.'s paper for what the variational parameters gamma and phi are. Artod 14:25, 4 July 2007 (UTC)[reply]
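
To illustrate the suggestion rather than prescribe it: once one has, for each vocabulary word, a distribution over topics, a nearly flat (high-entropy) row is the "function word" signature described above. A hedged NumPy sketch, with placeholder values standing in for real variational parameters:

    import numpy as np

    # Hypothetical input: phi[v] is a distribution over K topics for vocabulary
    # word v (each row sums to 1). A flat, high-entropy row means the word does
    # not prefer any topic, which is typical of function words.
    K, V = 10, 50
    phi = np.random.default_rng(0).dirichlet(np.ones(K), size=V)  # placeholder values

    entropy = -(phi * np.log(phi + 1e-12)).sum(axis=1)   # per-word entropy over topics
    flat_words = np.where(entropy > 0.9 * np.log(K))[0]  # near-uniform rows
    print(flat_words)                                     # candidate words to filter out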

The new inference section is too messy. I think it would be best just to have the conditional distributions and not do the full derivation. Ezubaric (talk) 14:46, 22 October 2009 (UTC)[reply]

Is it even needed at all? If someone really wants to know, can't they look in the paper? –15:02, 22 October 2009 (UTC)

MULTINOMIAL distribution?[edit]

I am reading this article (together with the paper) and I am not sure about one thing.

Both the paper and the article say "choose a topic z_n ∼ Multinomial(θ)". I am wondering: is it really a Multinomial distribution? Don't we actually need just a Categorical distribution?

--78.128.179.22 (talk) 15:35, 22 April 2011 (UTC)[reply]
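
For what it is worth, the two coincide when there is a single trial: a categorical draw is a multinomial draw with n = 1, which is presumably why the paper writes Multinomial(θ) for a single topic assignment. A small NumPy sketch with illustrative numbers:

    import numpy as np

    theta = np.array([0.2, 0.5, 0.3])     # per-document topic proportions (illustrative)
    rng = np.random.default_rng(0)

    z_onehot = rng.multinomial(1, theta)  # "Multinomial" with a single trial: a one-hot vector
    z = rng.choice(len(theta), p=theta)   # the same draw phrased as a categorical: an integer
    print(z_onehot, z)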


UNIFORM Dirichlet?[edit]

I have removed the word "uniform" twice, because a Dirichlet distribution whose vector of parameters alpha is NOT made up of all ones is NOT uniform. — Preceding unsigned comment added by 91.17.207.163 (talk) 12:40, 9 July 2011 (UTC)[reply]
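
For reference, the Dirichlet density makes the point explicit: it is constant on the probability simplex only when every concentration parameter equals one.

    p(\theta \mid \alpha)
      = \frac{\Gamma\!\left(\sum_{i=1}^{K} \alpha_i\right)}{\prod_{i=1}^{K} \Gamma(\alpha_i)}
        \prod_{i=1}^{K} \theta_i^{\alpha_i - 1},
    \qquad
    p(\theta \mid \alpha) \text{ constant on the simplex}
      \iff \alpha_1 = \cdots = \alpha_K = 1 .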

Where do the Betas come from?[edit]

Isn't there one Beta for each topic? How many topics are there? It's also drawn from the Dirichlet I suppose? As of now, the 'Model' section doesn't explain this clearly. I'm going to make a change, I think/hope I'm correct! Aaron McDaid (talk - contribs) 14:21, 25 October 2011 (UTC)[reply]
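
For what it is worth, in the smoothed model of Blei et al. there is one topic-word distribution per topic, each drawn from its own Dirichlet prior, and the number of topics K is fixed in advance. A minimal sketch of the generative process, where K, V, the document sizes, and the hyperparameters alpha and eta are all values assumed for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    K, V, M, N = 3, 20, 5, 30   # topics, vocabulary size, documents, words per document (assumed)
    alpha, eta = 0.5, 0.1       # symmetric Dirichlet hyperparameters (assumed)

    beta = rng.dirichlet(np.full(V, eta), size=K)   # one word distribution per topic
    corpus = []
    for d in range(M):
        theta = rng.dirichlet(np.full(K, alpha))    # per-document topic proportions
        doc = []
        for n in range(N):
            z = rng.choice(K, p=theta)              # pick a topic for this word slot
            doc.append(rng.choice(V, p=beta[z]))    # emit a word id from that topic
        corpus.append(doc)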

Inference Questions[edit]

1. How can the gamma function be ignored in the second-to-last step of the final inference?

2. Right before the "Finally..." line, the joint probability ignores the selection of k. Does that make sense?

Thanks for your attention; I look forward to your reply. Best regards, Liu.Ming.PRC@gmail.com — Preceding unsigned comment added by 129.132.39.207 (talk) 14:44, 14 February 2012 (UTC)[reply]
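
Regarding question 1, if the step in question is the collapsed Gibbs derivation, the gamma functions are not so much ignored as cancelled: ratios of adjacent gamma values reduce via the recurrence Γ(x + 1) = x Γ(x), and any factor that does not depend on the topic being resampled is dropped as a constant of proportionality. A hedged sketch of the kind of step involved, with n denoting a count that excludes the word currently being resampled:

    \frac{\Gamma(n + 1)}{\Gamma(n)} = n,
    \qquad\text{so, for example,}\qquad
    \frac{\Gamma\!\left(n_{k,w} + \beta + 1\right)}{\Gamma\!\left(n_{k,w} + \beta\right)}
      = n_{k,w} + \beta .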

Clarity[edit]

The article does not distinguish between the prior values for the alphas and betas and the posterior values. This makes it extremely hard to understand. — Preceding unsigned comment added by JoDu987 (talkcontribs) 20:12, 9 April 2012 (UTC)[reply]
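
A one-line illustration of the distinction being asked for, assuming the standard Dirichlet-multinomial conjugacy used in the model: the prior parameter alpha is a fixed hyperparameter, while the posterior over a document's topic proportions shifts it by the observed topic counts.

    \theta_d \sim \operatorname{Dir}(\alpha) \quad\text{(prior)},
    \qquad
    \theta_d \mid z_d \sim \operatorname{Dir}\!\left(\alpha + n_d\right) \quad\text{(posterior)},
    \qquad
    n_{d,k} = \#\{\, n : z_{d,n} = k \,\} .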

is this correct?[edit]

K-dimensional vector of probabilities, which must sum to 1, and K-dimensional vector of probabilities, which must sum to 1.

However, the math notation uses M instead of K. — Preceding unsigned comment added by 193.206.170.181 (talk) 21:53, 13 February 2013 (UTC)[reply]
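
If it helps later readers, in the usual notation both letters legitimately appear: each per-document topic vector is K-dimensional, and there are M such vectors, one per document, so the stacked matrix is M by K.

    \theta_d \in \Delta^{K-1}, \quad \sum_{k=1}^{K} \theta_{d,k} = 1, \quad d = 1, \dots, M,
    \qquad
    \Theta = (\theta_1, \dots, \theta_M)^{\top} \in \mathbb{R}^{M \times K} .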

Inconsistent phi?[edit]

One of the phis is undefined. Is φ meant to be the same as the one that looks similar to Ø (not sure how to type the exact symbol here)? — Preceding unsigned comment added by 24.130.129.142 (talk) 19:25, 19 January 2014 (UTC)[reply]

I also don't see any difference in meaning and so changed all ϕ's to φ's. Alæxis¿question? 10:29, 26 August 2015 (UTC)[reply]

Inconsistent indexing[edit]

The readability of the article would be greatly improved if consistent indexing of the variables were used in each section. The variable indexing in the first part of the Model section differs from the Mathematical definition portion: one symbol is used to index documents at first and a different one later. Similarly, two different symbols are used to index the position of a word in a document, and the variable for the value of a word reuses a symbol as its second index denoting position in the document. For clarity, and to be consistent with the earlier notation, a single index symbol should be used throughout. The indexing notation changes twice again in the Inference section. Rich2718 (talk) 21:55, 23 January 2016 (UTC)[reply]
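
For reference, one consistent scheme (only an illustration of what such a convention could look like, not a claim about the article's current symbols) would be:

    d \in \{1, \dots, M\} \ \text{indexes documents},
    \qquad
    n \in \{1, \dots, N_d\} \ \text{indexes word positions in document } d,
    \qquad
    w_{d,n},\ z_{d,n} \ \text{are the word and its topic at position } n \text{ of document } d .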

Definition of φ[edit]

In the definition of φ in the Model section, it is called a "Markov matrix (transition matrix)". However, Markov matrices contain the probabilities of transitioning from one state to another in a Markov chain and are square. In this case, it would be better to call φ an emission or output matrix: it contains the probabilities of a word being emitted (output) by a topic (latent state). This nomenclature would then be consistent with the related Hidden Markov models. Rich2718 (talk) 22:39, 23 January 2016 (UTC)[reply]
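
Spelling out the suggested reading (a sketch of the convention, not a claim about the article's exact notation): φ would be a K by V row-stochastic matrix whose rows are the per-topic word distributions, which is rectangular in general and therefore not a transition matrix.

    \varphi \in \mathbb{R}^{K \times V},
    \qquad
    \varphi_{k,w} = P(w \mid z = k),
    \qquad
    \sum_{w=1}^{V} \varphi_{k,w} = 1 \quad \text{for each topic } k .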

Number of unique words vs number of word tokens[edit]

These two things are easily confused when the article just says "number of words". For example, I thought Nd was the number of unique words in document d, but later I saw that the Nd sum to N, so it must be the number of word tokens (otherwise the summation makes no sense). I think the article should make this clearer where necessary. — Preceding unsigned comment added by 2601:19C:4682:FA40:5D0:8C19:566B:9931 (talk) 07:37, 7 May 2017 (UTC)[reply]
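
Concretely, with N_d counting word tokens (occurrences rather than distinct vocabulary entries), the summation in question reads as below; it would not hold if N_d counted unique words, whose number is bounded by the vocabulary size V.

    N = \sum_{d=1}^{M} N_d .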

Assigning proper credit[edit]

The article as written primarily credits Blei et al. for this technique and notes that it was also proposed by Pritchard et al. However, the article notes that the Pritchard et al. paper is chronologically prior and has more citations, which makes this choice seem odd. Unless there is a specific reason to do this, I propose we primarily credit the original, more highly cited article. Stellaathena (talk)

Broken References[edit]

The DOI for the Blei et al. article is broken/defunct (currently set to "10.1162/jmlr.2003.3.4-5.993"). Does anyone know the correct DOI? If not, it should be removed. --Showeropera (talk) 13:19, 25 August 2018 (UTC)[reply]

Should the article be renamed so that Allocation is capitalized?[edit]

It seems like it should be, since the A is part of the initials in LDA. --LogicBloke (talk) 20:50, 18 April 2021 (UTC)[reply]

That doesn't seem like a good reason to me. I think LDA is just the abbreviation and the full name is "latent Dirichlet allocation". For example, this paper cited by the article begins: "We describe latent Dirichlet allocation (LDA), ...". Falsifian (talk) 22:43, 18 April 2021 (UTC)[reply]

Anyone considered adding a sentence about the etymology of the term?[edit]

As of today, the word "Dirichlet" appears 28 times on the article page, but there isn't any reference to Peter Gustav Lejeune Dirichlet himself. As I was reading the article, the question of who this Dirichlet actually was kept burning in my mind. 2601:2C6:4A80:68F0:F8F3:6837:B220:C190 (talk) 18:47, 25 August 2021 (UTC)[reply]

I'm guessing it's named after the Dirichlet distribution rather than directly named after the person. Either way, a brief mention of the etymology could be nice, but I don't have time to verify what that is right now. Falsifian (talk) 03:19, 26 August 2021 (UTC)[reply]