Updated Usenet post archives

In the olden days, various people kept archives of the rec.*.int-fiction newsgroups and uploaded them to the Archive. Around 2002 that fell out of practice, and we’ve been relying on Google’s not-always-perfect search facility for Usenet history.

Thanks to the generous efforts of various people inside Google, we now have a dump of all of Google’s archives for those two newsgroups, from the beginning of time up through this past weekend.

See:
ifarchive.org/indexes/if-archive … ction.html
ifarchive.org/indexes/if-archive … ction.html

The files are r.a.i-f-1996-2012.mbox.bz2 and r.g.i-f-1992-2012.mbox.bz2.

(The RAIF file goes back to 1996; the RGIF file goes back to 1992. I’m not sure why the discrepancy.)
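For anyone who wants to poke at the dumps, here is a minimal sketch in Python of one way to unpack an .mbox.bz2 file and count posts per year. It assumes the files are ordinary mbox format with a parseable Date header on most posts; the function names are made up for this example, and nothing here has been run against the real dumps.

```python
import bz2
import mailbox
import shutil
from collections import Counter
from email.utils import parsedate_to_datetime


def decompress_mbox(bz2_path, out_path):
    """Stream-decompress a .mbox.bz2 file to a plain mbox file on disk."""
    # mailbox.mbox wants a real file path, so unpack to disk first.
    with bz2.open(bz2_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def posts_per_year(mbox_path):
    """Count messages by the year in their Date header.

    Posts with a missing or unparseable Date are counted under None.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            try:
                year = parsedate_to_datetime(str(msg.get("Date"))).year
            except (TypeError, ValueError, IndexError):
                year = None
            counts[year] += 1
    finally:
        box.close()
    return counts
```

Usage would be something like decompress_mbox("r.a.i-f-1996-2012.mbox.bz2", "raif.mbox") followed by posts_per_year("raif.mbox"). The earliest year that comes back is a quick way to check how far a given file really goes.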

The posts are slightly spam-filtered, but there’s still quite a bit of spam, particularly in the last couple of years.

A longer-term project – not necessarily for me – is to take all of the newsgroup history on the Archive, merge it, cut out the spam, and create a single downloadable package. In the meantime, at least we have the files.
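If someone does pick up the merge, a rough starting point might look like the Python sketch below. It only handles the merging half: it combines several already-decompressed mbox files and drops posts whose Message-ID has been seen before. It assumes Message-ID is a good enough duplicate key, which may not hold for every old post, and it makes no attempt at spam removal. The function name is hypothetical.

```python
import mailbox


def merge_mboxes(source_paths, dest_path):
    """Copy messages from several mbox files into one, skipping duplicates.

    Duplicates are detected by Message-ID only. Messages without a
    Message-ID are always kept, since there is nothing to compare.
    Returns the number of messages written.
    """
    seen = set()
    kept = 0
    dest = mailbox.mbox(dest_path)  # created if it does not exist
    try:
        for path in source_paths:
            src = mailbox.mbox(path, create=False)
            try:
                for msg in src:
                    msg_id = str(msg.get("Message-ID", "")).strip()
                    if msg_id:
                        if msg_id in seen:
                            continue  # already copied from an earlier file
                        seen.add(msg_id)
                    dest.add(msg)
                    kept += 1
            finally:
                src.close()
        dest.flush()
    finally:
        dest.close()
    return kept
```

Earlier files in source_paths win when the same post shows up twice, so the order of the list decides which copy survives.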

There are no current plans to archive future Usenet posts, because there aren’t a whole lot of them any more. Feel free to jump in, collect, and upload them.

Great!

But at Google Groups, there are RAIF topics from 1990! (There’s also one RGIF topic from 1991.) Apparently, the dump wasn’t quite complete…

It’s what we got.

Doing that manually, while also reading through the whole archive, taking notes, and indexing the highlights, would be an interesting project. It would also be a very time-consuming one.

And on the topic of indexing old newsgroup discussions: I seem to recall someone doing quite a bit of that a few years ago. Anyone remember who it was and where that can be found? (A quick attempt at searching the web only finds me a collection of broken links on a page belonging to Stephen Granade, dated 1999, and the ifwiki. I’m pretty sure that what I was thinking of was something else.)

And on the subject of archiving: Has that ever been done for this forum? I assume it’s being backed up, but maybe separate archiving of the posts would also be useful?

I’m talking to Mike Snyder about getting a forum post dump. He’s still working out some issues with character encoding.

I was able to add a link to the forum post at the top of each one, but I’m having issues with the encoding problem. And not for lack of trying. I spent several hours yesterday, and the best I was able to do is get UTF-8 text to appear as garbage characters rather than question marks. There is something strange going on with MySQL, where it can store UTF-8 text in a Latin1 table, and PHP (or at least the forum code) is saving and loading it correctly. My Perl code, which is using DBD-MySQL, is just giving me fits. If it’s a Perl problem, I could rewrite my export in PHP, although I’m dreading that. I ran through several of the easier suggestions I found online, but the more involved ones call for actually changing the table structure and updating values in the fields. So I’m still in research trial-and-error mode on this.
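One guess at what the garbage characters are: UTF-8 bytes that got decoded as Latin1 somewhere on the way out, which would turn a curly apostrophe into something like "â€™". If that is what is happening, the text can sometimes be repaired after the fact instead of touching the tables. Below is a minimal Python sketch of that repair. It assumes the damage is a single round of misdecoding, and it tries Windows-1252 first on the assumption that the Latin1 in question behaves like that code page. The function name is invented for this example, and none of this has been tested against the actual forum data.

```python
def unmangle(text):
    """Try to undo UTF-8 text that was misread as cp1252 or Latin-1.

    Re-encodes the string with the codec it was wrongly decoded with,
    then decodes the resulting bytes as UTF-8. If neither codec gives
    valid UTF-8, the input is returned unchanged.
    """
    for codec in ("cp1252", "latin-1"):
        try:
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


if __name__ == "__main__":
    # Build a mangled sample the same way the damage would happen.
    mangled = "I’m".encode("utf-8").decode("cp1252")
    print(mangled)
    print(unmangle(mangled))
```

Plain ASCII passes through untouched, and text that was genuinely Latin1 to begin with fails the UTF-8 decode and is left alone, so it should be reasonably safe to run over a whole dump. Text that has been mangled twice would need the function applied twice.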

I think that was David Fisher’s work, but from what I remember, those links were merged into ifwiki anyway? I think his personal web pages are gone now, though.

Could you send me a sample of this form? I may be able to figure out how to ungarbage them.

Do you mean the Perl code? Or the resulting mangled output?

The mangled output, I meant. I don’t think I’ll be able to do anything with the Perl that you can’t.