User:Terra Novus/Pending Changes Compromise

From Wikipedia, the free encyclopedia

Hi! This is a brilliant resolution to the Pending Changes (PC) debate that has rocked Wikipedia for months. It was proposed by User:UncleDouggie.

Original PC Compromise Proposal

I propose not designating any articles exclusively for Pending Changes. Instead, start with the process as it was before the PC trial began and merely add an option to each filter in the Edit Filter to invoke PC for an edit instead of warning or disallowing it. In place of blanket PC protection, we would be free to construct elaborate regular expressions in an attempt to flag only high-risk changes for PC. It will never be perfect, but that doesn't mean that we shouldn't try to improve the current system. We need to continually adapt to vandal countermeasures without impacting good edits. Safeguards could be put in place, such as firing limits and community review of PC filters. Semi-protection would still remain an option when needed.
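A minimal sketch of how a "pending" action could sit alongside the existing actions, assuming a hypothetical precedence in which disallow outranks pending, which outranks warn and tag. The `Filter` class, the action names, and the regexes are illustrative only; they are not the Edit Filter extension's actual data model.

```python
import re
from dataclasses import dataclass

# Hypothetical precedence: "disallow" beats "pending", which beats "warn" and "tag".
SEVERITY = {"none": 0, "tag": 1, "warn": 2, "pending": 3, "disallow": 4}


@dataclass
class Filter:
    filter_id: int
    pattern: str            # regex tested against the text the edit adds
    action: str             # one of the SEVERITY keys
    anon_only: bool = True  # only run on IP edits


def resolve_action(filters, added_text, is_anon):
    """Run every filter on one edit; return (strongest action, matched filter ids)."""
    strongest = "none"
    matched = []
    for f in filters:
        if f.anon_only and not is_anon:
            continue
        if re.search(f.pattern, added_text, re.IGNORECASE):
            matched.append(f.filter_id)
            if SEVERITY[f.action] > SEVERITY[strongest]:
                strongest = f.action
    return strongest, matched


if __name__ == "__main__":
    filters = [
        Filter(1, r"\bpoop\b", "disallow"),
        Filter(2, r"!{3,}", "warn"),
        Filter(3, r"\bsucks\b", "pending"),
    ]
    # Two filters fire; the edit is saved as a pending change instead of merely warned.
    print(resolve_action(filters, "This band sucks!!!", is_anon=True))  # ('pending', [2, 3])
```

Taking the strongest matched action, rather than the first, means an edit caught by both a "warn" filter and a "pending" filter ends up in the review queue.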

It would of course be possible to construct a rule invoking PC on all anonymous BLP edits, but this would not be the intent of the system. Note that all BLP IP edits could be disallowed right now, subject to the edit filter firing limits.

I believe that adding the PC option to the edit filter would reduce the number of cases where semi-protection is needed, reduce the reviewing workload, and most importantly disrupt the vandal workflow on all articles in a way that no other current or proposed method can achieve. The trial feedback indicated that vandals tended to migrate from PC-protected articles to those not protected by PC. This shouldn't have been a surprise to anyone, but it bodes potential doom for an expanded PC trial. It simply is not practical to use full PC on all articles, as other language editions have done. The English Wikipedia is the largest, and its visibility will always attract a high degree of vandalism. We need to improve our automated tools, and using the edit filter to flag high-risk changes for mandatory manual review is a step in the right direction.

The current IP vandalism workflow seen frequently in the Edit Filter log is:

  1. Make obnoxious edit
  2. Receive disallow response from the Edit Filter
  3. Slightly tone down the edit
  4. Return to step 2 until the edit is just under the threshold for triggering any filter with an action of "disallow"
  5. Receive warning from the Edit Filter
  6. Click Save
  7. Get reverted (hopefully)

It would be much more valuable if, at step 5, the Edit Filter simply accepted the edit as a pending change. How would this be different from blindly applying PC to every article? Because many useful edits never trigger the Edit Filter at all and would not be subjected to PC overhead/bureaucracy/POV pushing!

Example: The Edit Filter should be analyzed as a complete system, not a list of individual filters. Many edits trigger multiple filters as shown here. Note that the edit at :48 was saved and subsequently rejected by a PC reviewer. This is not an endorsement of blanket PC. Quite the contrary: If filter 61 had an action of PC, the bad edit would have been caught without blindly applying PC to all IP edits. While that might not be the perfect use of filter 61 as it now stands, hopefully you get the idea of what is possible. Looking only at the article edit history in this case gives a distorted view of what vandals are attempting and the impact of PC.

It would be nice to have some stats on how many users ignore the edit filter warnings entirely and just click on save anyway. Also, what percentage of edits that triggered warnings are subsequently reverted? Having these stats will not change my proposal, but they will give us an idea of how much work we need to do on the filter set to make effective use of the new capability. --UncleDouggie (talk) 08:47, 2 September 2010 (UTC)
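The two statistics asked for above are simple ratios once the warning events are collected. A sketch, assuming a hypothetical per-warning record with two booleans; the real Edit Filter log is structured differently and would first need to be joined against revert data.

```python
from dataclasses import dataclass


@dataclass
class WarnLogEntry:
    # Hypothetical fields, not the actual Edit Filter log schema.
    saved_anyway: bool  # the user clicked Save after seeing the warning
    reverted: bool      # the saved edit was later reverted (False if never saved)


def warning_stats(entries):
    """Return (share of warnings ignored, share of those saved edits later reverted)."""
    if not entries:
        return 0.0, 0.0
    saved = [e for e in entries if e.saved_anyway]
    ignore_rate = len(saved) / len(entries)
    revert_rate = sum(e.reverted for e in saved) / len(saved) if saved else 0.0
    return ignore_rate, revert_rate
```

A high revert rate among ignored warnings would suggest those filters are good candidates for a PC action rather than a warning.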

Prior feedback

This feedback was provided in Wikipedia_talk:Pending_changes/Straw_poll/Archive_2#Change_PC_to_an_Edit_Filter_Option:
  1. I Agree: I oppose PC as is, but I support your idea and encourage Option 1 voters to consider this compromise if it is proposed in the future. --Gniniv (talk) 08:57, 2 September 2010 (UTC)
  2. This makes a pretty good suggestion, if it were maybe a new Level-0 Pending Changes level or something. I would like to see the current Level-1 and Level-2 systems available as options just the same though. BigK HeX (talk) 09:19, 2 September 2010 (UTC)
  3. I like this. I don't know enough about the technical back end of the edit filter to know how tricky this would be, but on paper (erm, screen?) it looks good. Millahnna (talk) 11:20, 2 September 2010 (UTC)
  4. I Agree: I think this is a great idea, and it is something we should work towards whether or not PC is accepted. It would probably take some time to set up and customize. Information we gather from a longer trial of pending changes would be useful in its implementation. It would allow the entire encyclopedia to be covered. Doc James (talk · contribs · email) 18:52, 2 September 2010 (UTC)
  5. Like others above me, I think this is an excellent idea, regardless of whether PC is implemented in the fashion carried out during the trial. ialsoagree (talk) 20:54, 2 September 2010 (UTC)
  6. An interesting idea. Possibly the next proposal in this field. Septentrionalis PMAnderson 20:56, 2 September 2010 (UTC)

I had suggested something like that in 2009 at Wikipedia:Deferred revisions, but there are technical limitations to overcome, and to work well it may require some sort of patrolled revisions to give the filter reference revisions. Cenarium (talk) 00:59, 6 September 2010 (UTC)

Ah, thanks for blazing this trail already. I see that it's not so simple, as evidenced by this statement in your proposal: "For reviewers, when the right revision in a diff is deferred, there is in this case the additional 'defer' level (before 'unreviewed'), and the revision can be brought back to unreviewed state if it is a false positive (if the page is not reviewable, there is only defer and unreviewed, if semi flag protected, there is also review)." Please don't try to explain it to me; I've read it four times and I can't take any more. :-)
Let's step outside of this box, in which there are no acceptable solutions, for just a moment. There are many automatic and manual ways that edits currently get reverted. The problem is that bad versions can be visible for a time. Often this time is short. However, with the overload of things to look at, the time to revert can also be very long. Immediate visibility to new users is already an illusion. Their shiny new edit may have been reverted by ClueBot or DASHBot before their browser refresh completes.
It would be nice if all mainspace edits were not publicly visible for two minutes by default. In that time, a bot or trusted user could flag the change as "deferred" pending detailed manual review. This would allow us to run bots of arbitrary complexity to identify suspect changes. Bots could also remove the defer flag on a change when a later revert restores a reviewed version of the page, which would prevent unneeded manual review of changes. It would be best if the software served the most current page to any IP user who has edited the page in the last two minutes, to retain the illusion of visibility.
Having this capability would enable ClueBot to defer highly suspect changes that it could not revert without violating its 1RR policy. When there are fast edits to a page by untrusted users, the visibility of all unapproved revisions should be delayed until two minutes after the latest change. This will prevent bad edits from being displayed just because a revert hasn't timed out yet or been approved. There have been several times that I wished there was a delay on my own edits. It's rather unnerving to refresh a Google search and see the edit I made 30 seconds earlier, which I now realize was horribly botched somehow. —UncleDouggie (talk) 15:12, 6 September 2010 (UTC)
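The visibility rules described above (a two-minute delay, deferred revisions held for review, the clock restarting on back-to-back edits, and the editor seeing their own change) can be sketched as one selection function. The `Revision` fields and the function name are hypothetical; this is not how the FlaggedRevs/PC extension actually stores or serves revisions.

```python
from dataclasses import dataclass

DELAY_SECONDS = 120  # the two-minute window proposed above


@dataclass
class Revision:
    rev_id: int
    timestamp: float        # seconds since the epoch
    editor: str             # username or IP address
    approved: bool          # autoconfirmed or already-reviewed edit
    deferred: bool = False  # flagged by a bot or trusted user for manual review


def revision_to_show(history, viewer, now):
    """Return the rev_id to serve to `viewer` at time `now`, or None if nothing is visible.

    `history` is ordered oldest to newest.
    """
    latest = history[-1]
    # Anyone who edited the page within the window sees the current text.
    if any(r.editor == viewer and now - r.timestamp < DELAY_SECONDS for r in history):
        return latest.rev_id
    # Back-to-back edits restart the clock: the window is measured from the latest change.
    window_open = now - latest.timestamp < DELAY_SECONDS
    for rev in reversed(history):
        if rev.approved:
            return rev.rev_id
        if rev.deferred or window_open:
            continue  # still hidden from the public
        return rev.rev_id  # unapproved, not deferred, and the delay has expired
    return None
```

Measuring the window from the latest change, rather than per revision, is what keeps a quick vandal/revert pair from ever being shown publicly.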
I spent a couple of hours late last night with Huggle when there weren't many other patrollers online, so I could easily see the majority of changes in real time. I'm more convinced than ever that a 2-minute delay would save us an enormous amount of grief. I was reverting repeated vandalism 5 seconds after it happened. Of course, the backlog can be bigger at other times. But if we had a delay in non-autoconfirmed edits being publicly visible, we probably would have had almost no vandalism publicly visible last night. Furthermore, there was a substantial amount of chat-style editing between kids. Perhaps regular chat programs are blocked on their computers, or they don't know how to use anything else. Introducing a 2-minute delay would put a damper on all the back and forth, which would mean fewer suspect changes for us to review. —UncleDouggie (talk) 05:25, 8 September 2010 (UTC)

Further discussion

The edit filter is obviously a powerful tool, but it has quite a few limitations. The most important are its performance limitations. Very few people know this (pretty much only the people who actually manage the edit filter), but there is actually a limit to how many decisions we can make per edit. We get a maximum of 1000 "conditions", which is quite low. And then, of course, there's the fact that we have to make these filters run in milliseconds or they will slow down the entire encyclopedia. Most often, the reason we don't have an edit filter for a given kind of edit is not that we can't write one, but that it simply is not a good idea performance-wise. We would reach a point where we simply can't add any more pages to pending changes because it costs too much. Naturally there is also a limit on the pending changes implementation, but that limit is much, much higher, because the edit filter is much more general. Don't get me wrong, I think this is a great idea, but I also think it would be impossible to implement. Not because it's functionally impossible, but because the performance requirements are going to be too strong. --Shirik (Questions or Comments?) 16:45, 19 September 2010 (UTC)
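To make the 1000-condition budget concrete, here is a sketch of per-edit condition accounting. It assumes each filter is a list of predicates, that every evaluated predicate costs one condition, and that evaluation short-circuits; the real extension's counting rules differ. It also shows the fail-open behaviour Soap describes further down: once the budget runs out, the remaining filters never run.

```python
CONDITION_LIMIT = 1000  # per-edit budget quoted above


def run_filters(filters, edit, limit=CONDITION_LIMIT):
    """Evaluate filters in order until the condition budget runs out.

    `filters` is a list of (filter_id, [predicate, ...]); each predicate takes the
    edit and returns a bool. A filter matches when all of its predicates are true.
    Returns (matched_ids, conditions_used, exhausted).
    """
    used = 0
    matched = []
    for filter_id, predicates in filters:
        hit = True
        for predicate in predicates:
            if used >= limit:
                # Budget gone: later filters are skipped, so the edit is judged only
                # on what was already checked (it "fails open").
                return matched, used, True
            used += 1
            if not predicate(edit):
                hit = False
                break  # short-circuit: the rest of this filter costs nothing
        if hit:
            matched.append(filter_id)
    return matched, used, False
```

Putting cheap, selective checks (such as "is this an IP edit?") first in each filter is what keeps the total cost per edit low.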

Increasing the number of filters or their complexity decreases performance, but the kind of action a filter takes (disallow, tag, delay, ...) doesn't affect it significantly. So I think performance is no more of an issue here than it is for the edit filter in general, and we have several tag-only filters, or variants of them, which could use this action. Cenarium (talk) 21:10, 19 September 2010 (UTC)
True, Cenarium, but the real power would come from enhanced rules, and Shirik has provided some great feedback from previous efforts. I think that a combined edit filter/bot implementation with delayed visibility will give us the best tool set. The edit filter will have the capability of marking an edit as pending, but we won't count on it to implement the more advanced filters. We will leave that to the bots, just as they operate today, with the added capability that they can also mark a change as pending if something is suspicious but they aren't really sure. As others have pointed out previously, the bots can actually do the best job because they can access whatever history they may need to improve the decision process. The key to all of this is to delay public visibility of all non-autoconfirmed edits for a short time to give the bots a crack at them first. An extension of this concept is to delay visibility long enough to give Huggle users a shot as well for edits that don't get marked as pending. The idea I mentioned on Jimbo's talk page about having the edit filter silently !accept changes it currently rejects outright is separate, though the two could be integrated. The developers may scream, but I'll help write and optimize the SQL if that's what it takes to get it done. —UncleDouggie (talk) 01:05, 20 September 2010 (UTC)
My point was that we really can't discard the current PC implementation and replace it with the edit filter. That would force us to make new filters. Having both options would be great, but the edit filter can't replace the current implementation. --Shirik (Questions or Comments?) 02:37, 20 September 2010 (UTC)
I'm a bit confused. Having the edit filter and/or bots mark changes as pending will obviously require the current reviewer process to be in place, as updated in v2. While it would still be possible to mark an article so that all non-autoconfirmed edits go to pending automatically, my hope is that with the new tools there will be very limited cases where this is needed. The idea is to let the edit filter and/or bots act as an improved front end so that the reviewers can focus on the truly suspicious changes. This will enable more articles to be protected without an explosion in the reviewing backlog. I think you might be trying to say something else, but I'm not sure exactly what it is. —UncleDouggie (talk) 02:50, 20 September 2010 (UTC)
I see nothing wrong with this idea in principle, and have no reason to vote against it, but I should make it clear that it is logistically impossible right now for the software to do this, and it may or may not be easy for the developers to make changes that will enable it. A proposal to enact pending changes only via the edit filter would, from the standpoint of the editors editing today, be the same as a proposal to turn off pending changes entirely for the indefinite future. Thus I oppose.
It also leaves a few open questions. For one, if this proposal is enacted, it would have the potential to flag any edit by any user as "needing review", and the rules of who can edit through the protection and who cannot could be different for every filter that's running, so who would be the class of reviewers that approve them? You could say that the affected groups could be restricted to "people who are not reviewers" or some subset thereof, but that would seem to be an argument against using the edit filter in the first place, since it would involve many repetitions of redundant code. It would probably be easier (and no more distant from our current implementation) to bypass the edit filter and code the relevant rules directly into the Pending Changes extension. This would also help us solve some of the other bugs in the filter, such as the fact that if an edit "exhausts" the condition limit, it is automatically allowed no matter what it is. Soap 10:32, 24 September 2010 (UTC)
Proper system development practice is to define and agree on the requirements first and then implement them. Just because the software doesn't support it today doesn't mean that we shouldn't head in this direction. "Reviewer" is already a trusted privilege, and anyone with this right could perform a review even if it was their own edit that had been flagged by the edit filter. There is a separate discussion as to whether we should have a higher level of reviewer, currently admins, for some edits. This has been proposed for all BLPs. Even if we did that, I would expect that the Edit Filter would not flag any edit made by a standard Reviewer. The purpose is to focus on high-risk changes, and trusted users are very unlikely to go creating a mess. I think it would be better to place functionality that we can't fit into the edit filter into a set of bots rather than into the PC extension, so that changes are easier and faster. —UncleDouggie (talk) 09:22, 25 September 2010 (UTC)
That doesn't really answer my objections. I'm not saying "we can't do this"; I'm just saying it would cause more problems than it's worth. To the problems above with redundant code and the resulting time-out of edits that exhaust too many conditions, you could add the frustration of reviewers having to approve their own edits. This already happens every once in a while with the edit filter (the false positives page is where they go to report it), but whereas currently we try to minimize false positives as much as possible, your suggestion seems to assume that people would be willing to tolerate problems like this and not be frustrated or want to complain about the nuisance of having to approve their own edits. Soap 00:34, 27 September 2010 (UTC)
I don't think we need to have redundant code. I was trying to show that we don't need near-duplicate filters for reviewers, because they are already trusted to approve their own changes. I propose that we don't actually make reviewers go through this step; their changes should be automatically approved just as they are currently. Perhaps it would help if you gave an example of what you consider a redundant filter resulting from implementing this proposal. —UncleDouggie (talk) 04:28, 27 September 2010 (UTC)
Well, you'd still have to create a separate line of code within each filter to exempt people with the reviewer permission. You could say that this is a bug in the edit filter, because it should be possible to simply ignore reviewers and not even have those lines of code run, but the way the extension has been written means that every filter runs on every edit by everyone, even if the first line of code is a command that tells the filter to skip the rest of the code. Even those skip commands take up time, though, so it would be ideal if all PC-related filters were compressed into one filter or at most a few (but generally speaking, combining filters is difficult in itself because "if you exempt one, you exempt everybody" ... I can explain in more detail later, but my main objection to your ideas isn't really related to the edit filter issue, so it isn't important to go over the small details). Soap 09:43, 27 September 2010 (UTC)
Now I get it. I agree that adding a check to every edit filter isn't a good idea. The edit filter will need to be modified so it is capable of marking a change for review. I suggest that we let the regular filters run, but at the end of the process, if a "flag for PC" filter has been matched, the edit filter should then check whether the user is a reviewer before flagging the change. I believe this will run the fastest because the reviewer status will only be checked once, when necessary, and this is a logical extension of the reviewing workflow. —UncleDouggie (talk) 06:26, 28 September 2010 (UTC)
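A sketch of the "check reviewer status once, at the end" idea: filters carry no reviewer exemption, and the rights lookup only happens if some filter asked for PC review. The action names and the `is_reviewer` callable are hypothetical stand-ins, and letting "disallow" win outright is an assumption not stated in the discussion.

```python
def final_outcome(matched_actions, username, is_reviewer):
    """Combine the actions of all matched filters for one edit.

    matched_actions: actions of the filters that matched, e.g. ["tag", "flag_pc"].
    is_reviewer: callable username -> bool standing in for a user-rights lookup.
    Returns (outcome, number_of_rights_lookups_made).
    """
    if "disallow" in matched_actions:
        return "disallow", 0
    if "flag_pc" in matched_actions:
        # Looked up once, and only because some filter asked for review.
        if is_reviewer(username):
            return "accept", 1  # reviewers' edits stay auto-accepted, as today
        return "pending", 1
    return "accept", 0
```

However many PC filters fire, the reviewer check costs a single lookup, and edits that match no PC filter never pay for it.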
Also, with regard to your list of 7 steps above, are you assuming that a vandal who receives a message stating their edit was accepted as a pending change would assume that they "got caught" and stop trying to get through? I really wouldn't expect that, since presumably the PC acceptance message would be worded "positively". If it did make them think they got caught, they could just go on to vandalize other articles anyway, as you've said they're doing currently. So I don't think this solution can be used as a tool to deter vandals completely. Soap 00:40, 27 September 2010 (UTC)
I propose that public visibility of all non-autoconfirmed edits be delayed for a short time to allow bots to evaluate whether they require PC review, and potentially to allow Huggle reverts. Because of this, it will not be possible to inform the user that their edit has been accepted as a PC, nor do I think it would be desirable to do so. They should merely be informed that their edit is undergoing review, which normally takes a few minutes, and that they can check the status of their change on the article history page. In the meantime, we should show the latest revision to the user who performed the edit, just as we would for an autoconfirmed user. This will allow them to make any needed corrections or related edits immediately. Back-to-back edits should restart the delay clock so that we don't have bad edits become visible, followed some time later by a correction. Pointing them to the history page may not be such a hot idea in situations where a bot rolls them back almost immediately, and I'm open to other ideas to disrupt their workflow as much as possible. Delayed visibility of history comes to mind, but I know that's fraught with peril. —UncleDouggie (talk) 04:28, 27 September 2010 (UTC)
On all articles, or just the ones that are defined by the code in the edit filters? I think almost nobody would want pending-changes-style protection on every page. Soap 09:43, 27 September 2010 (UTC)
I favor a flexible implementation in which some filters with a low false-positive rate execute on changes to all articles (or to a subset of articles, such as adding "death" to any BLP), while the filters more likely to return a false positive are limited to articles at higher risk for vandalism or where vandalism has large ramifications. Note that the implementation will be a bit different if we rely on tools like STiki (described below) more than on traditional edit filters. —UncleDouggie (talk) 06:26, 28 September 2010 (UTC)
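A sketch of the scoping rule just described: low-false-positive filters run wherever their scope applies, while noisier filters run only on pages marked high-risk. The `high_risk` page flag, the filter definitions, and the "Living people" category test for BLPs are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Page:
    title: str
    categories: FrozenSet[str]
    high_risk: bool = False  # hypothetical per-article flag set by the community


@dataclass
class ScopedFilter:
    name: str
    low_false_positive: bool
    applies_to: Callable[[Page], bool]  # which pages the filter is scoped to
    test: Callable[[str], bool]         # receives the text added by the edit


def filters_to_run(filters, page):
    """Low-false-positive filters run wherever they apply; noisier ones only on high-risk pages."""
    return [f for f in filters if f.applies_to(page) and (f.low_false_positive or page.high_risk)]


def flag_for_pc(filters, page, added_text):
    """Names of the filters that would send this edit to PC review."""
    return [f.name for f in filters_to_run(filters, page) if f.test(added_text)]


if __name__ == "__main__":
    blp_death = ScopedFilter("BLP death", True,
                             lambda p: "Living people" in p.categories,
                             lambda t: "death" in t.lower())
    shouting = ScopedFilter("all caps", False, lambda p: True,
                            lambda t: len(t) > 10 and t.isupper())
    singer = Page("Some Singer", frozenset({"Living people"}))
    hamburger = Page("Hamburger", frozenset({"Foods"}), high_risk=True)
    print(flag_for_pc([blp_death, shouting], singer, "Cause of death: overdose"))  # ['BLP death']
    print(flag_for_pc([blp_death, shouting], hamburger, "HAMBURGERS ARE GROSS"))   # ['all caps']
```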

Competition on Wikipedia Vandalism Detection

A competition was held in Padua, Italy on 22–23 September 2010 to evaluate automated tools for detecting vandalism on Wikipedia. The competition website states that it was the 4th International Competition, while the paper analyzing the results (PDF) states that it was the 1st International Competition. Either way, the results are very relevant to the issue of enhancing the edit filter or bots in support of automatically flagging high-risk edits for pending changes. The results have been analyzed for accuracy of vandalism detection (the true positive rate, TP in the PDF file) vs. false positives (the false positive rate, FP in the PDF file). The conclusion was that a combination of all 9 submitted detectors can perform better than any single detector. The combined detector achieves FP=20% at TP=95% (right-hand chart on page 11). A data point for higher detection levels is FP=35% at TP=98%. The tools were trained, which could lead to an arms race as vandals adapt their strategies. However, we would likely be able to maintain good performance against typical unsophisticated vandals. We would still need other ways to detect habitual abusers through analyzing editing patterns and issuing blocks. —UncleDouggie (talk) 00:12, 27 September 2010 (UTC)
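To show what a figure like "FP=20% at TP=95%" means operationally, here is a sketch that computes true- and false-positive rates at a score threshold and finds the highest threshold reaching a target detection rate. Combining detectors by averaging their scores is an assumption made for illustration; the paper's combination method may differ, and the toy scores below are made up.

```python
def rates(scores, labels, threshold):
    """(TP rate, FP rate) when every edit scoring >= threshold is flagged.

    labels: True for vandalism, False for regular edits; both classes must be present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg


def combine(detector_scores):
    """Average per-edit scores from several detectors (an assumed, simple combination rule)."""
    return [sum(col) / len(col) for col in zip(*detector_scores)]


def threshold_for_tp(scores, labels, target_tp):
    """Highest threshold whose TP rate reaches target_tp, returned with its FP rate."""
    for t in sorted(set(scores), reverse=True):
        tp, fp = rates(scores, labels, t)
        if tp >= target_tp:
            return t, fp
    return None


if __name__ == "__main__":
    labels = [True, True, False, False, False, True]
    d1 = [9, 6, 4, 1, 9, 8]  # made-up scores on a 0 to 10 scale
    d2 = [7, 8, 2, 3, 3, 4]
    combined = combine([d1, d2])
    # Catching every vandal edit costs one false positive in three on this toy data.
    print(threshold_for_tp(combined, labels, 1.0))
```

Every point on the published curve is one such threshold: pushing TP from 95% to 98% means lowering the threshold, which is why FP rises from 20% to 35%. For PC, each false positive is an extra item in the review queue rather than a rejected edit.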

The winning individual tool is described here (PDF), and the second place tool is here (PDF). The tools use different detection mechanisms. —UncleDouggie (talk) 00:29, 27 September 2010 (UTC)

For some reason, none of the links are functional for me? Ronk01 talk 00:48, 27 September 2010 (UTC)
Three of them are PDFs, which I have now annotated. All work fine for me. What error do you get? —UncleDouggie (talk) 01:02, 27 September 2010 (UTC)
Copied from Jimbo's talk page: Fascinating. I strongly support work like this!--Jimbo Wales (talk) 05:34, 27 September 2010 (UTC)
User:HaeB has pointed out that we already have an implementation of a sophisticated vandalism detector named STiki that operates similarly to Huggle. I have asked the author of this tool (User:West.andrew.g) if he would please run it against the competition dataset. If it works well, we could very quickly set up a server running this tool and have it flag changes as requiring PC review. All it would take is to replace the GUI with the code needed to set the appropriate flag in the database. In the future, we could also incorporate some of the higher-performing algorithms from the competition. —UncleDouggie (talk) 06:40, 27 September 2010 (UTC)
Definitely a neat idea. Having a metadata filter like STiki, regex filters like Lupin's badword list, and possible integration with edit-filter-type tests could create a semi-permeable fortress around the wiki. All editors would be able to work as normal, but certain edits would be automatically flagged for exclusion, review, or delay. Interesting stuff... Ocaasi 09:43, 27 September 2010 (UTC)
Am I correct in thinking that the tools developed in the competition would not be able to work inside the existing edit filter system? Is it technically possible to have some future edit filter system work with the tools? Does PC have the option to have a user right that allows marking edits as requiring review, and would it be possible to have a bot with the tools run through new edits marking certain edits as requiring review? If PC doesn't have the option for marking specific edits as requiring review, would it be possible to have it added in time for the version to be launched on November 9? --Yair rand (talk) 06:25, 28 September 2010 (UTC)
Indeed, the sophisticated tools developed for the competition are not able to work inside the existing edit filter system. Some of the tools make use of regex matching, but they all require greater capabilities. I don't believe that the existing PC extension permits users to flag a change for review. The developers have already had to reject some requested features because they can't be completed by November 9. Technically, this isn't a hard thing to do. I could take the back end of STiki today, slap it on a server somewhere in the WMF data center, and call a database stored procedure (that I could write myself today from the published Wikipedia schema) to mark a change for review, if given the needed database permission. I realize this isn't a great solution from a security standpoint; I just mention it to show that we're not talking about a massive overhaul of the database here. This is why I'm against rushing into the next trial until we're really ready, including having done testing on the new reviewing user interface. —UncleDouggie (talk) 06:55, 28 September 2010 (UTC)
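To illustrate how small the "mark a change for review" write is, here is a sketch using Python's built-in sqlite3 against a made-up `review_flags` table. This table is hypothetical and is not MediaWiki's schema, nor a stored procedure; a real deployment would go through proper permissions rather than direct database writes. The bot name is also a placeholder.

```python
import sqlite3

# Hypothetical table, invented for this sketch (not part of the MediaWiki schema).
SCHEMA = """
CREATE TABLE review_flags (
    rev_id     INTEGER PRIMARY KEY,      -- revision held for review
    flagged_by TEXT NOT NULL,            -- bot or filter that raised the flag
    score      REAL,                     -- detector score, if any
    status     TEXT NOT NULL DEFAULT 'pending'
);
"""


def flag_for_review(conn, rev_id, flagged_by, score=None):
    """Mark one revision as pending review; re-flagging the same revision is a no-op."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO review_flags (rev_id, flagged_by, score) VALUES (?, ?, ?)",
            (rev_id, flagged_by, score),
        )


def clear_flag(conn, rev_id):
    """Record that a reviewer has dealt with the revision."""
    with conn:
        conn.execute("UPDATE review_flags SET status = 'reviewed' WHERE rev_id = ?", (rev_id,))


def pending_ids(conn):
    return [row[0] for row in conn.execute(
        "SELECT rev_id FROM review_flags WHERE status = 'pending' ORDER BY rev_id")]


def process(conn, scored_revisions, threshold, bot_name="hypothetical-bot"):
    """Flag every (rev_id, score) pair at or above threshold; return the flagged ids."""
    flagged = []
    for rev_id, score in scored_revisions:
        if score >= threshold:
            flag_for_review(conn, rev_id, bot_name, score)
            flagged.append(rev_id)
    return flagged


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(process(conn, [(101, 0.9), (102, 0.2), (103, 0.75)], threshold=0.7))  # [101, 103]
```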
For those interested in STiki, its author and I have been engaging in a discussion on his talk page.
  • Support Brilliant compromise The Resident Anthropologist (talk) 04:25, 29 September 2010 (UTC)
  • Support Keep up the good work!--Novus Orator 05:35, 29 September 2010 (UTC)
  • Oppose absent an edit filter that can be designed to adapt. As I have pointed out several times before, an edit filter is utterly useless once details are known, so there would need to be a competent EF manager on at all times or else it is doomed to fail. I also oppose any CRASH initiative. —Jeremy (v^_^v PC/SP is a show-trial!) 19:39, 29 September 2010 (UTC)
    It's impossible to make an automated system to deal with vandals who actually work at it. PC, semi-protection, smart edit filters, all can be broken by the kind of vandals you're thinking of. This isn't about that. This is about the 99.5% of vandals who don't think. --Yair rand (talk) 19:46, 29 September 2010 (UTC)
    Even the vandals who do not think are more than willing to follow instructions. Write a guide to vandalizing Wikipedia, post it on Encyclopedia Dramatica, and every yahoo with an internet connection and a single-digit IQ will come by, read it, and put the article to the test. —Jeremy (v^_^v PC/SP is a show-trial!) 20:03, 29 September 2010 (UTC)
    Those are not the kind of people that these systems are made to protect from. --Yair rand (talk) 20:22, 29 September 2010 (UTC)
    Then who are they meant to protect from? Bored schoolkids whose disruption can be curbed with a polite email to their school district? —Jeremy (v^_^v PC/SP is a show-trial!) 20:29, 29 September 2010 (UTC)
    Take a look at Hamburger for examples of what this system could do. School kids aren't going to be stopped by an email. There are multiple kids at the same school involved, they do it from home, etc. We don't want the vandalism to be seen at all. But if you're not going to bother detecting it at all, how will you know to send the email? Constructive criticism is appreciated, but if you are just going to bash every idea that doesn't include full page protection, then please go somewhere else. —UncleDouggie (talk) 01:22, 30 September 2010 (UTC)
    Douggie, chummer, I am willing to defend the edit filter. I was a manager long enough to know the filter's strengths and weaknesses, and I know that even the "simple" vandals Yair rand talks about frequently dodge the filters; I have seen this even after my deop. And several school districts view Wikipedia access as negotiable at best; reporting a misbehaving student to the school district's IT man ensures that the vandalism ceases. And at home, they have access to other forms of entertainment: TV, video games, sodomizing bullfrogs with lit firecrackers, etc. They're less likely to vandalize WP at home than at school, which is why vandalism always drops from June to August and picks back up again in September. —Jeremy (v^_^v PC/SP is a show-trial!) 02:12, 30 September 2010 (UTC)

Current status

I have been very busy experimenting with STiki and working with its author on what can be accomplished. I'm busy on the edit filter front as well. I don't have much time this week, so it may be a little while before the next big step. —UncleDouggie (talk) 04:32, 4 October 2010 (UTC)

Could we get another status update? --Yair rand (talk) 00:16, 2 November 2010 (UTC)