Talk:Persistent data structure

Wiki Education Foundation-supported course assignment

This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Ethan McCue.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 06:26, 17 January 2022 (UTC)

I think this article needs to be merged from or to Purely functional. Persistent data structure is a better name for the page (functional is an adjective which therefore violates Wikipedia naming conventions). On the other hand, purely functional has some nice diagrams which I drew to explain the concept :-) What does everyone think? Richard W.M. Jones 18:52, 17 January 2006 (UTC)

I've never seen "persistent data structure" used in the way this article describes. It may be used that way in some obscure book or two, but in my experience, a persistent data structure is one that is automatically saved when the program exits, as is described in the link to persistence. Dave Gudeman 04:40, 29 August 2006 (UTC)
I agree; the first time I encountered this term was in the late 1980s, when a friend from school did some work on the subject, and it was in the sense mentioned by Dave Gudeman. It might be more correct to say "persistent data structures in functional languages" or "persistent data structures in purely declarative languages", but that may also be wrong, because those refer to a different notion of persistence. I don't know which book was used to define the name, but it is quite possible that the authors only used that name in the context of the books mentioned by Deco below, not as a general term in computer science.
It's a matter of your field of study. The sense in which you use it is encountered in some papers on OODBMSs. But the functional data structure meaning is widely known in research (everyone I've talked to about them immediately understood) and used in an entire subfield of widely cited papers. To list a few:
  • J. Driscoll, N. Sarnak, D. D. Sleator, and R. Tarjan. Making Data Structures Persistent. Journal of Computer and System Sciences, 38:86–124, 1989. [1] (123 citations)
  • N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. Communications of the ACM, 29(7):669–679, 1986. [2] (115 citations)
  • P. F. Dietz. Fully persistent arrays. In Workshop on Algorithms and Data Structures, volume 382 of Lecture Notes in Computer Science, pages 67–74. Springer-Verlag, August 1989. [3] (28 citations)
  • J. Driscoll, D. Sleator, and R. Tarjan. Fully persistent lists with catenation. Journal of the ACM, 41(5):943–959, 1994. [4] (22 citations)
  • Okasaki, C. (1998). Purely Functional Data Structures. Cambridge University Press. [5][6] (104 citations) Used throughout this seminal book, occurring dozens of times. Section 9.2 is entitled "Persistent Data Structures".
The name collision with the idea of disk-persisted data structures is unfortunate, and I wouldn't object to adding a note to clarify this. Deco 05:36, 29 August 2006 (UTC)

Persistent data structure and Purely Functional should not be merged

Purely Functional describes how 'purely functional' computer languages deal with variables. So 1) the title Purely Functional is perfectly valid; it is the name of a category of computer programming languages, in the same way as Object Oriented, Imperative, Declarative, Duck-typed, etc. 2) This article should not be merged with any other discussion of data structures because this article is not about the data structures themselves, but rather the way that purely functional computer languages deal with them - whether or not they are persistent. —Preceding unsigned comment added by 202.63.49.61 (talk) 10:45, 21 January 2010 (UTC)

Not all purely functional data structures are persistent

For example, consider a queue. A standard representation uses two lists, one reversed. When adding an element, we add it to the first list; when removing an element, we remove it from the second. When the second list is empty and we want to remove an element from the queue, we have to reverse the first list (reversal is O(N)), and this reversed list becomes the new second list. We can then remove the element from the second list. Because we don't have to reverse most of the time, this queue is amortized O(1), but occasionally O(N)!

This queue is *not* persistent: suppose we have a queue with an empty second list and a nonempty first list of length N. If we use this version of the queue several times, removing an element from it each time, each removal takes O(N) time. So this is no longer an O(1) queue! —Preceding unsigned comment added by 84.25.34.87 (talk) 18:31, 12 October 2007 (UTC)
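
For concreteness, a minimal Haskell sketch of the two-list queue being described (illustrative names, not from any particular library):

    -- The front list serves removals; the back list collects
    -- newly added elements in reverse order.
    data Queue a = Queue [a] [a]

    enqueue :: a -> Queue a -> Queue a
    enqueue x (Queue front back) = Queue front (x : back)

    -- When the front list is empty, reverse the back list (O(N))
    -- and use the result as the new front list.
    dequeue :: Queue a -> Maybe (a, Queue a)
    dequeue (Queue [] [])      = Nothing
    dequeue (Queue [] back)    = dequeue (Queue (reverse back) [])
    dequeue (Queue (x:f) back) = Just (x, Queue f back)

The complaint above is that if q is a version whose front list is empty and whose back list holds N elements, each separate evaluation of dequeue q repeats the O(N) reversal, so the amortized O(1) bound only survives single-threaded (non-persistent) use. (Okasaki's book shows how laziness with memoization can restore the amortized bound under persistent use.)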

If you want to claim non-persistence, then why do you argue inefficiency of implementation? Nothing prevents this data structure from being persistent AFAICT. --MarSch (talk) 12:06, 13 February 2008 (UTC)
Persistence has nothing to do with the amortized runtime of the operations. Additionally, you're wrong - the remove takes amortized O(1) time - this means that a sequence of n insert and remove operations takes O(n) time. See amortized analysis. Dcoetzee 20:10, 13 February 2008 (UTC)

Not persistent at all

I would argue there is no implicit persistence. Although there is internal reference sharing, as is the case with list types in ML, there is no way to implicitly refer to these elements in the list. The programmer would have to declare variables for every sublist they wanted a reference to, and changing that reference would create a new list: specifically, a list which references the values of the old list up to the location where they differ and adds its own unique elements, and which is therefore considered a new list. At this point the programmer is at a loss without language support for such a feature. There are similarities between the notion of persistence and functional data structures, but nothing tangibly useful. Every change made to a data structure in a functional language returns a new reference to the updated structure and provides no way to access the version previously modified. Now consider garbage collection on unreachable references, such as the removed element of the list.

In a functional language you don't "change" data structures, you create new ones. If you create a new data structure from an old one, there is no reason why your reference to the old one should be lost unless you choose to lose it. --MarSch (talk) 12:03, 13 February 2008 (UTC)
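
A two-line Haskell illustration of that point (illustrative names):

    xs = [1, 2, 3]   -- the "old" list
    ys = 0 : xs      -- a "new" list; the cells of xs are shared, not copied
    -- Both versions remain usable: xs is still [1,2,3] and ys is [0,1,2,3].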

VLists

I have removed the reference to the VList data structure. VLists are not really a "persistent counterpart to the array". They are not persistent at all if you update them as an array. They do allow O(1) random read access if you update them as a list, but only if you update them in a partially persistent way. If you update them in a fully persistent way, then random read access is not efficient. —Preceding unsigned comment added by Ken Hirsch (talkcontribs) 07:40, 21 May 2008 (UTC)

How (much more or less) efficient in practice?

"While persistence can be achieved by simple copying, this is inefficient in CPU and RAM usage" While true, I came here from Rope page and in general I would think as for Rope, some operations are slower (there e.g. indexing). Some data structures might be accessed a lot such as strings (ropes) in an editor say, but updated/edited at a human speed (to be fair data structute would also be read infrequently, when e.g. scrolling page?).

It seems to me that a structure based on pointers, such as ropes and presumably all persistent structures(?), would have less locality of reference than simpler ones such as arrays. In general a hybrid might reduce that drawback arbitrarily (and possibly still be persistent).

I could look into all of this (does anyone have answers, and do they belong here?); in general it seems to me that lazy languages such as Haskell have not taken the world by storm, while the newer, eager Julia has, and it is fast. I wonder if the eager functional variant with side effects is the way to go (I want to find out if Julia has a fatal flaw..). comp.arch (talk) 18:38, 1 November 2014 (UTC)


Section on modern usage?

I was thinking that this article could be more useful with a description of Persistent Hash Array Mapped Tries and how they and other persistent data structures have been adopted by languages like Clojure, Scala, Java, Javascript, etc. I know at least that the "Persistent Map" based on that technique has a good deal of adoption in the React community. Curious as to anyone else's thoughts on that. — Preceding unsigned comment added by Ethan McCue (talkcontribs) 21:28, 9 October 2018 (UTC)

Intro too long?

The intro blurb is 4 paragraphs and mentions efficiency issues. Not that these aren't worth noting, but it seems dubious that this information would be useful at a glance. Maybe it should be broken down into a new or existing section? — Preceding unsigned comment added by Ethan McCue (talkcontribs) 19:07, 26 November 2018 (UTC)


Persistency vs Immutability

The page contains multiple statements like this one:

> Haskell is a pure functional language and therefore does not allow for mutation. Therefore all data structures in the language are persistent, as it is impossible to not preserve the previous state of a data structure with functional semantics.[15] This is because any change to a data structure that would render previous versions of a data structure invalid would violate referential transparency.

I have the feeling that the terms 'persistency' and 'immutability' are conflated here: it would be more correct to say that a pure language requires _immutable_ data structures (so as not to violate referential transparency), and that persistent data structures are the only (known) _efficient_ implementations of these.

Qqwy (talk) 07:02, 13 May 2019 (UTC)[reply]
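
To make the distinction concrete, a small Haskell example using the standard Data.Map (a balanced binary tree updated by path copying):

    import qualified Data.Map as Map

    -- Immutability guarantees v1 can never be destroyed by the update;
    -- the path-copying implementation is what makes keeping both
    -- versions cheap, i.e. what makes the structure persistent.
    demo :: (Int, Int)
    demo =
      let v1 = Map.fromList [(1, 'a'), (2, 'b')]
          v2 = Map.insert 3 'c' v1   -- new version; v1 is unchanged
      in (Map.size v1, Map.size v2)  -- evaluates to (2, 3)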

Copy on Write has exponential performance characteristics?!?

There is a claim about the performance characteristics of copy-on-write algorithms. It is already marked "citation needed". I think it is factually false, as the total CPU cost is proportional to the number of writes times the length of the data structure. As the length of the data structure is itself proportional to the number of writes, we can say it is O(n^2). The space complexity is also O(n^2), for similar reasons.
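
To spell out the arithmetic (with n standing for the number of writes): if the i-th write copies a structure that already holds about i elements, the total work over n writes is

    1 + 2 + ... + n = n(n + 1) / 2, which is O(n^2)

and the same sum bounds the total space if every copy is retained. Quadratic, not exponential.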

I am not an editor, and I have very little experience on how such issues should be handled. — Preceding unsigned comment added by 2001:4C4C:20AD:E900:19A0:D648:E263:AD66 (talk) 11:40, 17 November 2019 (UTC)[reply]

You're right; I cannot see how that could be true. I changed it to O(m·n). bungalo (talk) 19:08, 26 November 2019 (UTC)

Fat nodes and A combination are only partially persistent...?

To my understanding, based on the description, Fat nodes and A combination are only partially persistent, i.e. one can modify only the latest version of the data structure. That is because they assume the time dimension is linear rather than a DAG. The article doesn't mention this, though. In principle one could adapt Fat nodes to hold an entire revision DAG in each node to make them fully persistent, but I cannot think of a similar modification for A combination. Please correct me if I'm wrong. bungalo (talk) 19:02, 26 November 2019 (UTC)
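
For what it's worth, a minimal functional sketch of the fat-node idea as described (hypothetical names; the method in Driscoll et al. actually mutates nodes in place): each node keeps its whole value history stamped with version numbers, and linear time means the history is totally ordered, so only the latest version can be written. A fully persistent variant would need a version tree or DAG in place of the list.

    type Version = Int

    -- A fat node stores every value it has ever held, newest first.
    data FatNode a = FatNode [(Version, a)]

    -- With linear time, only the newest version can be extended.
    writeLatest :: Version -> a -> FatNode a -> FatNode a
    writeLatest v x (FatNode hist) = FatNode ((v, x) : hist)

    -- Read the value as of version v: the newest entry not newer than v.
    readAt :: Version -> FatNode a -> Maybe a
    readAt v (FatNode hist) = case [x | (v', x) <- hist, v' <= v] of
        (y:_) -> Just y
        []    -> Nothing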

Combine COW and Path copying?

Path copying is a form of copy-on-write that copies a node when it is modified. Should the two be combined, or at least the relationship mentioned in the article? bungalo (talk) 19:36, 26 November 2019 (UTC)