User:PNG recompression

From Wikipedia, the free encyclopedia
PNG recompression
This user is a bot
(talk · contribs)
Operator: A proofreader (t · c)
Author: idem
Approved? no
Flagged? no
Task(s): Losslessly recompress PNG images
Edit rate: As fast as possible (will be changed to 1 per 10 seconds)
Edit period(s): Once every month
Automatic or manual? Automated
Programming language(s): Java
Exclusion compliant? Yes
Source code published? Here (will be moved to Wikipedia after adjustments)
Emergency shutoff-compliant? Not yet

PNG recompression is a bot that is awaiting the bot approval process. Its sole purpose will be to losslessly recompress all PNG images on the English Wikipedia using the open-source tools OptiPNG, advdef and advpng.

Results expected

On average, a PNG image recompressed by this bot is expected to shrink by about 15% of its original size. An image that has already been recompressed will not be re-uploaded.

Caveats

As currently written, the bot uses OptiPNG, which strips all ancillary chunks from PNG images. This may destroy the meaning of an image when it is used on certain pages. For example,

OptiPNG additionally discards the color data of fully transparent pixels, which may destroy the meaning of certain images, for example

In all cases, recompression may destroy the meaning of other images, for example:

  • images whose goal is to show PNG images that are not recompressed, if any exist.
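One way to judge how much an image stands to lose is to list its ancillary chunks before recompressing it. The sketch below is a hypothetical helper, not part of the bot's published source; the class and method names are assumptions. It relies only on the PNG container layout: an 8-byte signature followed by chunks made of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC, where a lowercase first letter in the type marks the chunk as ancillary. CRCs are not verified.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the ancillary chunks found in a PNG file. */
final class PngChunkScanner {
    /** The 8-byte PNG signature, read as one big-endian long. */
    private static final long PNG_SIGNATURE = 0x89504E470D0A1A0AL;

    /** Returns the types of all ancillary chunks, in file order. */
    static List<String> ancillaryChunks(byte[] png) {
        ByteBuffer buf = ByteBuffer.wrap(png); // big-endian by default, as PNG requires
        if (buf.remaining() < 8 || buf.getLong() != PNG_SIGNATURE) {
            throw new IllegalArgumentException("not a PNG file");
        }
        List<String> found = new ArrayList<>();
        // Every chunk needs at least 12 bytes: length, type and CRC.
        while (buf.remaining() >= 12) {
            long length = buf.getInt() & 0xFFFFFFFFL;
            byte[] type = new byte[4];
            buf.get(type);
            String name = new String(type, StandardCharsets.US_ASCII);
            // Bit 5 of the first type byte (a lowercase letter) marks an ancillary chunk.
            if ((type[0] & 0x20) != 0) {
                found.add(name);
            }
            if (name.equals("IEND")) {
                break;
            }
            // Skip the chunk data and its 4-byte CRC.
            long next = buf.position() + length + 4;
            if (next > buf.limit()) {
                throw new IllegalArgumentException("truncated chunk " + name);
            }
            buf.position((int) next);
        }
        return found;
    }
}
```

An image for which this list is empty has no ancillary chunks to lose, so only images with a non-empty list would need a chunk-preserving tool or manual review.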

Server load expected

The initial run will read every image on the wiki through the MediaWiki API. To limit the load, however:

  • it will close a connection as soon as the first 8 bytes (the length of the PNG signature) show that the file is not a PNG;
  • it will skip files smaller than 8 KB;
  • because PNG image data is already compressed, it will not request gzip compression;
  • it will re-upload an image only if the upload would save more than 10% of the original image's size.
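The three numeric filters above can be sketched as small predicates. This is an illustration only, assuming hypothetical names (`TransferFilters`, `looksLikePng`, `worthDownloading`, `worthUploading`) that do not come from the bot's source; the thresholds are the ones stated in the list.

```java
import java.util.Arrays;

/** Hypothetical helper: the download and upload filters described above. */
final class TransferFilters {
    /** The 8-byte signature that starts every PNG file. */
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    /** Files under 8 KB are not read. */
    static final long MIN_SIZE_BYTES = 8 * 1024;

    /** True if the first 8 bytes of a download match the PNG signature. */
    static boolean looksLikePng(byte[] firstBytes) {
        return firstBytes.length >= PNG_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(firstBytes, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    /** True if a file of the reported size is large enough to be read. */
    static boolean worthDownloading(long reportedSizeBytes) {
        return reportedSizeBytes >= MIN_SIZE_BYTES;
    }

    /** True if the recompressed file saves strictly more than 10% of the original size. */
    static boolean worthUploading(long originalSize, long recompressedSize) {
        long saved = originalSize - recompressedSize;
        // Integer arithmetic avoids floating-point rounding at the 10% boundary.
        return saved * 10 > originalSize;
    }
}
```

For a 1000-byte original, a 900-byte result saves exactly 10% and is not uploaded, while an 899-byte result is.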

Future runs will be able to skip many downloads, recompression passes and uploads:

  • it can avoid reading a file if its last revision, as indicated by Special:Allpages, was made by this user;
  • it can avoid reading a file if the SHA-1 hash of its last revision, as indicated by the MediaWiki API, matches the SHA-1 hash of the last revision it has seen, even if it was not re-uploaded to the wiki because it did not save enough bytes;
  • it can avoid reading a file if the timestamp of its last revision, as indicated by Special:Allpages, matches the timestamp of the last revision it has seen, even if it was not re-uploaded to the wiki because it did not save enough bytes.
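The three skip rules above amount to one decision per file. The sketch below assumes the bot remembers the SHA-1 hash and timestamp of the last revision it saw for each file; the class name, method names and parameter layout are hypothetical. It also assumes hashes are compared as hexadecimal strings, which is how the MediaWiki API reports SHA-1 values for files.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: decides whether a file can be skipped on a later run. */
final class RevisionCheck {
    /** Returns the SHA-1 hash of the content as 40 lowercase hexadecimal digits. */
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * True if the file need not be downloaded again. seenSha1 and seenTimestamp
     * are the values recorded on the previous run, or null if the file is new.
     */
    static boolean canSkip(String botName, String lastUploader,
                           String remoteSha1, String seenSha1,
                           String remoteTimestamp, String seenTimestamp) {
        if (botName.equals(lastUploader)) {
            return true; // the latest revision is the bot's own upload
        }
        if (seenSha1 != null && seenSha1.equalsIgnoreCase(remoteSha1)) {
            return true; // content unchanged since the last run
        }
        return seenTimestamp != null && seenTimestamp.equals(remoteTimestamp);
    }
}
```

Recording the hash and timestamp even when nothing was re-uploaded is what lets later runs skip files that did not save enough bytes.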

As this bot is expected to create an additional revision for about half of the PNG images on this wiki, disk usage on the Wikimedia server farm may become a concern.

During the uploads, SHA-1 hashes will be recalculated and some database operations will take place, which may place load on the CPU and disk.

As this bot invalidates caches by forcing browsers to download cached images again, and a viewer may be expected to download a few full-sized images per visit, bandwidth on the Wikimedia server farm may become a concern for a short while. This bandwidth spike will be spread out more or less evenly, because not all PNG images are re-uploaded at once.

Source code

For the time being, the source code for PNG recompression is hosted on an external wiki, on which it is currently running. Please see here for the initial code. Also see PNGOptimisationBot (t · c).

Adjustments to be made

  • Limit the upload rate to 1 upload per 10 seconds.
  • Use the maxlag parameter with a value of 3, so that requests are refused while database replication lag exceeds 3 seconds.
  • Possibly adjust which ancillary chunks are removed, by replacing OptiPNG with a tool that can preserve them.
  • Add the ability to disable the bot by posting a message to its talk page.
  • Post the modified source code on the English Wikipedia.
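The first two adjustments could look roughly like the sketch below. It is not the bot's code: the class and method names are hypothetical, and the parameter map is only an example. The `maxlag` name itself is a real MediaWiki API parameter; a request sent with `maxlag=3` is rejected with a `maxlag` error while replication lag is above 3 seconds, and the client is expected to wait and retry.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper: appends maxlag=3 to an API query string. */
final class MaxlagQuery {
    static final int MAX_LAG_SECONDS = 3;

    /** Builds a URL-encoded query string from the parameters, ending with maxlag. */
    static String build(Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.append(encode(entry.getKey())).append('=')
                 .append(encode(entry.getValue())).append('&');
        }
        return query.append("maxlag=").append(MAX_LAG_SECONDS).toString();
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}

/** Hypothetical helper: enforces at most 1 upload per 10 seconds. */
final class UploadThrottle {
    static final long UPLOAD_INTERVAL_MS = 10_000;

    private Long lastUploadMs; // null until the first upload

    /** Milliseconds the caller should still wait before the next upload. */
    long millisUntilNextUpload(long nowMs) {
        if (lastUploadMs == null) {
            return 0;
        }
        return Math.max(0, lastUploadMs + UPLOAD_INTERVAL_MS - nowMs);
    }

    /** Records that an upload was started at the given time. */
    void recordUpload(long nowMs) {
        lastUploadMs = nowMs;
    }
}
```

Passing the clock value in as an argument keeps the throttle easy to test; a caller would typically supply `System.currentTimeMillis()` and sleep for the returned duration before uploading.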