Occasionally someone asks us how to download the entire IF Archive at once.
Currently the only supported answer is to use a web-scraping tool based on the public index pages or Master-Index.xml. I know from our bandwidth usage that people sometimes do this.
As an experiment, I have created a single-file download source. For information about how to get it, read this page.
I plan to update that download weekly.
(The experiment is to find out (a) whether this reduces scraping of the main Archive server, and (b) how much it winds up costing us in AWS hosting fees. I’ll report back in a couple of months.)
I appreciate what you did, truly, but I’m not about to download 30GB in one sitting. In fact, I find anything over 500MB rather stiff. I’ve downloaded 6GB zip files before and the experience was always unpleasant, regardless of my available bandwidth.
Any chance of splitting the zip file by top-level directory? There are directories I want to download wholesale, but not all of them!
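To make the per-directory idea concrete, here is a minimal sketch of how one zip per top-level directory could be built from a local copy of the Archive. The function name `zip_per_top_level_dir` and the local paths are hypothetical; this is not how the Archive’s own download is produced.

```python
import zipfile
from pathlib import Path


def zip_per_top_level_dir(archive_root: str, out_dir: str) -> list[str]:
    """Write one zip per top-level directory of archive_root (hypothetical helper).

    Loose files sitting directly in archive_root are skipped.
    Returns the paths of the zip files created, sorted by directory name.
    """
    root = Path(archive_root)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for top in sorted(p for p in root.iterdir() if p.is_dir()):
        zip_path = out / f"{top.name}.zip"
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(top.rglob("*")):
                if path.is_file():
                    # Store paths relative to the archive root, e.g. "games/zcode/foo.z5".
                    zf.write(path, arcname=path.relative_to(root).as_posix())
        created.append(str(zip_path))
    return created
```

Keep `out_dir` outside `archive_root`, or the zips would end up inside the tree being walked. As the reply below points out, this alone doesn’t help much when a single directory like /games is most of the Archive.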
Actually, depending on the directory, it may even make sense to split some subdirectories further, and to provide a “new files” addition so people can download it monthly and stay up to date.
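A “new files” bundle could be as simple as zipping everything modified since the last bundle. The sketch below assumes file modification times reflect when files were added to the Archive, which may not hold if uploads keep their original timestamps; `zip_recent_files` is a hypothetical name.

```python
import zipfile
from datetime import datetime
from pathlib import Path


def zip_recent_files(archive_root: str, zip_path: str, since: datetime) -> list[str]:
    """Bundle files modified at or after `since` (hypothetical helper).

    Assumes mtime is a usable proxy for "added to the Archive".
    Returns the archive-relative paths written to the zip.
    """
    root = Path(archive_root)
    cutoff = since.timestamp()  # pass a timezone-aware datetime to avoid local-time surprises
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.stat().st_mtime >= cutoff:
                rel = path.relative_to(root).as_posix()
                zf.write(path, arcname=rel)
                added.append(rel)
    return added
```

Write the zip somewhere outside `archive_root`, since the freshly created zip would otherwise match the cutoff itself. Note that a bundle like this only carries additions and changes; it says nothing about files that were removed.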
It’s safe to say that if you’ve never tried to figure out how to download every Archive file with a script, you’re not the target audience for this announcement.
That wouldn’t change much, as /games would be on the high side of 25 gigabytes by itself.
I don’t want to get into trying to guess what subsets people want. That really is a job for a scraping script. “Recent files” is possibly a workable idea, though – thanks.
Rsync may be overkill if done regularly by a lot of people. Maybe those who download the whole site can be persuaded to open up their site as mirrors?
Maybe monthly patches (diffs) would be desirable? I don’t know how non-Unix users would cope, though. Diff and patch may not be standard commands on Windows or iOS.
macOS and Linux come with rsync installed. Windows can install it via WSL or the Chocolatey package manager.
The nice thing about rsync is that it’s incremental. For example, this command will download all the files in games/zcode:
rsync -a rsync://rsync.ifarchive.org/if-archive/games/zcode destdir
It takes about 70 seconds the first time you run it. After that, destdir has the files, so if you re-run the command, it only downloads changed or new files. If nothing has changed, the command determines this and exits immediately.
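By default, rsync decides which files to send with a “quick check”: a file is transferred if it is missing from the destination or differs in size or modification time. Since `-a` preserves modification times, an unchanged file passes the check on the next run. The sketch below imitates only that decision step on two local directories (it does not implement rsync’s delta-transfer algorithm or talk to rsync.ifarchive.org), and `files_needing_transfer` is a hypothetical name.

```python
from pathlib import Path


def files_needing_transfer(src_dir: str, dest_dir: str) -> list[str]:
    """List files in src_dir that are missing from dest_dir or differ in size or mtime.

    Illustrates rsync's default quick check only; no data is copied.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    pending = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dest / rel
        if not target.is_file():
            pending.append(rel.as_posix())  # new file
            continue
        s, d = path.stat(), target.stat()
        # Compare whole seconds here for simplicity.
        if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
            pending.append(rel.as_posix())  # changed file
    return pending
```

On a first run everything is pending; after a faithful copy (same sizes and mtimes) the list is empty, which is why re-running the rsync command exits almost immediately when nothing has changed.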