User:DavidMcCabe/ThreadHistory

From Wikipedia, the free encyclopedia

This is a draft and nothing more.

Goals for thread history[edit]

We need to keep track of threads being inserted, moved, and deleted from thread trees. Edits are already handled by the ordinary revision systems in MW. Inserting new replies is handled somewhat too, but it would be nice to provide extra information about threading in the history lists instead of a generic article-created message.

A thread is a root post (the article containing the text itself), some metadata, and zero or more child threads.

When a modification is made, we want the 'watch' message to propogate up to the root, so that a user watching a superthread will see any changes below it. However, we only want to show one entry in the history per move.

Also, when a user looks at a thread at an older revision, they should see not only the older article content, but the older tree structure, with all replies in the places they were and having the content they had at the time of the revision. Every change within a thread should create a new revision of the root thread so that every state the thread was in can be viewed.

Therefore, since every change within a thread should create a new revision on the root, but not all of the new revisions from the place of modification up the tree to the root should be shown in history, we need to mark some revisions as being "spurious" and not noteworthy.

History is stored in two places in the existing MW: the revision table, and the recentchanges table. Everything except page history looks at recentchanges. It would be preferable if we could fit the data we need to present to the user into this table so that existing code can simply read it off. We might have a separate lossless history mechanism accessible by API if needed. There's also the logging table...

Versioned trees[edit]

Fields in thread_revision:

thread_id
rev_number
timestamp
superthread
superthread_rev_start not null
superthread_rev_end null
rootpost
rootpost_rev_start not null
rootpost_rev_end null

So then, if you have a tree like this, with threads identified as id@revision,

1@1
    2@1        sup_rev = 1 - 
        3@1    sup_rev = 1 -
    4@1        sup_rev = 1 -

And suppose we want to move 3 to be a child of 4:

1@1
    2@1        sup_rev = 1 - 1 
        3@1    sup_rev = 1 - 1
    4@1        sup_rev = 1 - 1
1@2
    2@2        sup_rev = 2 -
    4@2        sup_rev = 2 -
        3@2    sup_rev = 2 -

What we did is bump the ancestor threads of each post we want to modify, and limit the range of revisions of the old revisions to stop at the unbumped parents. Then we create a new revision whose range starts at the new parents.

The advantage of storing revision ranges instead of single revisions is that, if we modify a thread that has children, the children remain untouched. However, the ancestors up to the root change, but that's what we want anyways.

Starting from our original tree, let's move 2 to be a child of 4, and see how 3 remains unbumped. For good measure, we'll add an additional thread which is unaffected by the whole procedure. Before:

1@1
    2@1          sup_rev = 1 - 
        3@1      sup_rev = 1 -
    4@1          sup_rev = 1 -
    5@1          sup_rev = 1 -

After:

1@1
    2@1          sup_rev = 1 - 1
        3@1      sup_rev = 1 -
    4@1          sup_rev = 1 - 1
    5@1          sup_rev = 1 -
1@2
    4@2          sup_rev = 2 -
        2@2      sup_rev = 2 -
            3@1  sup_rev = 1 -
    5@1          sup_rev = 1 -

3's revision range is still set to '1 - ' because it is the child of 2@1 just as much as it is the child of 2@2. Only when it becomes the child of some other node will its revision range have to be bumped. Similarly, 5 is a child of 1 in both revisions 1@1 and 1@2, so its old range of '1 - ' still is true and it doesn't need to be bumped.

Now, on to the queries.

To select the immediate children of a thread with id T at revision R:

SELECT * FROM thread_revision WHERE superthread = T
         AND R IS BETWEEN superthread_rev_start AND superthread_rev_end;

To get the latest revision of a thread with id T:

SELECT rev_number FROM thread_revision WHERE thread_id = T
         ORDER BY rev_number DESC LIMIT 1;

To find the thread_revision of a newly-created reply, whose article record has id P, so we can show what it was a reply to in history:

SELECT superthread FROM thread_revision WHERE rootpost = P
         AND revision_number = 1;

TODO: put up actually interesting queries here.

The recentchanges table[edit]

Possible mapping for the recentchanges table:

rc_namespace => the talkpage namespace
rc_title => the talk page
rc_comment => user's comment plus maybe other info we add
rc_new => always false?
rc_cur_id, rc_this_oldid, rc_last_oldid => unknown
rc_type => 3 / log action?
rc_log_type => reply, threadmove, similar custom values?

It looks like the logging may be useful.