Talk:Deterministic acyclic finite state automaton

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

Trade-off between DAWG and trie[edit]

It seems to me that there is a complication in replacing a trie with a directed acyclic word graph, but this is not mentioned in the article. Specifically, allowing nodes to have multiple incoming edges results in nodes that inherently represent (or store) multiple keys (e.g. both desertion and desertification, as in the article's example). This forces these nodes to be buckets, in the same way that a hash table has buckets for hash collisions. These collisions introduce a bit of extra complexity when growing or shrinking this data structure. When, where, and how should paths in the graph be split (to reduce the number of collisions) or joined (to preserve the storage space benefit)? In short, what are the algorithms associated with this data structure? Can someone with access to the article's reference elaborate on this? JCarlos 04:39, 22 October 2007 (UTC)[reply]

Contradiction in opening paragraph[edit]

Am I missing something here? The opening paragraph seems to have an inherent contradiction (emphasis mine):

"... It is used to represent a set of strings and supports a constant time search operation. The lookup time is proportional to the length of the search string and is the same as an equivalent trie."

Not to mention the fact that a DAWG doesn't look like it would support a constant-time search operation. --ian (talk) 18:00, 4 December 2007 (UTC)[reply]

It's constant w.r.t. the number of words in the DAWG. --196.210.102.113 (talk) 11:22, 26 January 2008 (UTC)[reply]

Infixes?[edit]

If I'm reading the definition correctly, paths in DAWGs may also share infixes. Isn't that mentioned in the literature? Regards, Paradoctor (talk) 13:10, 21 November 2009 (UTC)[reply]

Diagram incorrect[edit]

In the DAWG on the right (in the diagram), I am pretty sure the O and the A would need to have separate nodes, and not simply be differently labeled links to the same node. Both O and A would link to the same P node and S node though. —Preceding unsigned comment added by 67.169.75.188 (talk) 04:11, 25 March 2011 (UTC)[reply]

No, I'm pretty sure the diagram is correct.   –Justin Force 23:59, 25 April 2012 (UTC)[reply]

Badly Named?[edit]

Technically its a letter or character graph not a "word graph" ? Because the adjacent is expressed between nodes composed of letters of characters? — Preceding unsigned comment added by 216.239.45.4 (talk) 19:52, 7 July 2011 (UTC)[reply]

Conflation of two distinct data structures[edit]

This article seems to conflate two related but distinct data structures: according to DADS, a DAWG is either a DFA that represents suffixes of a string, or a minimal DFA that represents a set of strings (called a "dictionary" or DAFSA by Daciuk et al.). The article describes the latter, but the references mostly pertain to the former. Qwertyus (talk) 12:00, 25 December 2012 (UTC)[reply]

Fixed by moving and editing directed acyclic word graph to the current article, then recreating the other article by moving some content there. Qwertyus (talk) 12:35, 25 December 2012 (UTC)[reply]

Disputed external link[edit]

There is a dispute over the addition of the external links http://www.pathcom.com/~vadco/dawg.html, http://www.pathcom.com/~vadco/cwg.html and related material [1]. These appear to be undue, and to have been added by the author of the material in question. So far two editors have suggested that the material is given undue weight. Deltahedron (talk) 17:04, 2 February 2013 (UTC)[reply]

This is JohnPaul Adamovsky, and I am not going to pretend to be somebody else. It is absurd to claim that something appears "undue", when you have not read it first. I have taken a look at the pictures of David Eppstein, which he posted on his web page, and I could easily speak at length about how he "appears", but that kind of bogus speculation does not belong here. Keep your idle speculations to yourself. I have conducted several years of research in the realm of lexicon data-structure optimization. The link I wish to add to this article opens up a detailed understanding of how to write computer code in C to actually create a compressed suffix graph using basic integer arrays. I have published thorough documentation about the objectives of using DAWG's, the creation process, the data storage options, and the benefits / drawbacks of using this type of data-structure. For example, many readers seem to think that there may exist an easy way to make the DAWG into a dynamic structure which can have words added and removed on the fly. As of now, DAWG's are constructed once, and then remain static; modification of any kind, especially with integer array encoding would demand a complete reconstruction of the entire graph. Further, I have meticulously documented the code and framework which uses a novel hash function encoding of DAWG to solve the popular 5x5 Boggle Board optimization NP-Complete problem. There is no other way to analyze 70,000 5x5 Boggle boards per second using a PC. In the year 2007, I was searching the internet for the kind of documentation which would show me how to build a DAWG. I did not find this documentation in the academic articles available, so I conducted years of research, wrote the documentation myself, and made it publicly available to anybody with a computer and knowledge of Google and Wikipedia. If you are smart enough, and you want it bad enough, then you can use my documentation to construct your very own DAWG. Perhaps an analogy will assist David Eppstein in understanding why the link to my DAWG web page must be included in the Wikipedia articles: Enzo Ferrari was the kind of man who could tell you about the exotic high-performance supercars which he designed, built, and sold to his millionaire automotive enthusiast customers, whereas I am the kind of man who invites you to take a tour on my Batcave, where I will show you how to build a bad-ass ride, and within a reasonable amount of time you'll be driving your very own Batmobile. I have a deep connection with high-performance data-structures, and I wish to share this enthusiasm with people outside of career academia, who do not have three years of their lives to dedicate to researching this field on their own. I invite you to set aside the elitist academic snobbery, read my work, and present an informed case as to why links to my work do not belong on Wikipedia. Please do not refer to "default position"... I associate the default position of computer people to be playing video games, punctuated by jacking off to internet porn. The right thing to do is to do your research and figure out that the links I posted BELONG in the Wikipedia articles. I apologize for the "edit war", as this is the first time I have used "talk" pages, so it will not happen again. I shall restore the links until you have taken enough time to come to an informed conclusion about their legitimacy, then we shall resume this discussion. I am of the opinion that plagiarism is more likely to happen at UC Irvine than it is at Stanford, so a professor who is getting sick of seeing my code submitted over and over again in his graph theory course should work a little bit harder, and then apply for a job at a better college. Burying information is an illegitimate solution for plagiarism NoblerSavager (talk) 03:12, 3 February 2013 (UTC)[reply]

See also Talk:Directed acyclic word graph, where the same user left the same long comment. I think we should discuss the issue here, not there, for reasons I stated there. Your "I shall restore the links until we come to a decision" stance is the opposite of our standard practice; see WP:BRD. As for your references to masturbation, and the weak attempt at an insult to my employer, please see WP:CIVIL (although you are improving in this respect: this was far more well-reasoned and less full of empty rhetoric than the emails you sent me previously). I'll just note that the assumption that this is about preventing cheating in my current class is far off base: I don't teach string data structures in a graph algorithm class, it's a theory class that doesn't require coding, my homework policy allows the students to get the answers wherever they can regardless of whether they're just copying and pasting, and I find the thought of preventing cheating by keeping relevant information out of the students' hands to be highly repugnant; it's the opposite of what teaching should be. (I'm not yet responding to the actual content of your comment, but likely will later.) —David Eppstein (talk) 03:26, 3 February 2013 (UTC)[reply]

As I posted on the DAWG "talk" page, I believe it is perfectly acceptable for two different things to have the same name. DAFSA is a weak acronym, and the DAWG article seems to be stuck as a stub, so I recommend that we move this article back to where people are going to look for it. I'm sure that John von Neumann is cool about how my family calls me John. As for the links, I replaced them with a much more modest wording, because this is not my domain here, but I appreciate the work of contributing members like David, and I look forward to reading his analysis of what he finds on the web pages I published. I can use my own aggressive-provocative-outrageous writing style in personal emails and on my own web pages. Thanks for hearing me out, and remember that every war has at least two armies, that and playing the blame game doesn't bring the innocent villagers back to life. NoblerSavager (talk) 04:47, 3 February 2013 (UTC)[reply]

Cyclic link in first paragraph[edit]

The link to 'related data structure' in the first sentence links to itself (this article) via a redirect. I'm not sure it there used to be a separate page for DAWG data structures, but it now redirects here. — Preceding unsigned comment added by Knu0 (talkcontribs) 22:12, 23 May 2015 (UTC)[reply]

Thanks for spotting that; I restored the article at Directed acyclic word graph. Somebody redirecting it here, thinking the two structures are the same. QVVERTYVS (hm?) 08:27, 24 May 2015 (UTC)[reply]

References[edit]

Blumer, A.; Blumer, J.; Haussler, D.; Ehrenfeucht, A.; Chen, M.T.; Seiferas, J. (1985) is about the other DAWG. I'd be inclined to refer to How to squeeze a lexicon instead, were I not its co-author. MCiura (talk) 04:49, 18 October 2016 (UTC)[reply]