[Wikipedia-l] Web spiders vs Wikipedia

Brion VIBBER brion at pobox.com
Fri Nov 1 00:24:03 UTC 2002


I noticed this afternoon that something at IP 144.167.21.15 was 
spidering the site, loading up thousands of page diffs, user 
contributions pages, and other slow things at a rate of several per 
second -- apparently as fast as it could get them in. I blocked its IP 
from access to the /w/ directory (so it can only access regular pages 
and default-view special pages via the / and /wiki/Foo paths; I put a 
general prohibition into robots.txt as well), and the server load has 
gone *dramatically* down.
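
For anyone wanting to check what such a robots.txt prohibition covers, here is a rough Python sketch. The rule text, the "ExampleBot" agent name, and the sample paths are my assumptions for illustration; the actual rules put on the server are not quoted in this message. It only shows that a "Disallow: /w/" rule leaves /wiki/Foo-style paths open.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption, not the site's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def allowed(path, agent="ExampleBot"):
    """Return True if a well-behaved crawler may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Sample paths are illustrative only.
    for path in ("/wiki/Foo", "/w/example?diff=1"):
        print(path, allowed(path))
```

Note that robots.txt only helps with crawlers that choose to honor it, which is why the IP block was needed as well.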

It appears to be someone running 'WebStripper' to copy the whole 
site; either it doesn't have sane throttling controls or they've 
disabled them.
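
By "sane throttling" I mean something along these lines: a minimal Python sketch of a per-client delay between requests. The Throttle class and its parameters are hypothetical names made up for this example and say nothing about how WebStripper works internally.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None        # time the previous request was released

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call wait() before each fetch, e.g. Throttle(2.0) for at most one request every two seconds.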

The IP is an unnamed host belonging to University of Arkansas at Little 
Rock; probably some college kid enjoying the wonders of uni network 
bandwidth.

-- brion vibber (brion @ pobox.com)



