auto-indexer for TADS3 HTML docs

jford · April 28, 2013, 11:31pm

I understand Eric’s difficulty in generating PDFs without the aid of an industrial-strength tech writer’s tool.

But I still find the documentation extremely difficult to use. No fault of Eric’s, it’s just an enormous amount of complexity to absorb.

Once I’ve read something and moved on, there is no easy way to go back and review it later. I need a map.

I think I have solved the problem as best I can without resorting to something like FrameMaker, my tool of choice for this kind of task. It would do the job quite nicely, but that’s pretty serious $$$. Surely, there must be a better way. And so…

In a previous life, I had a need to update a large number of files littered through a deeply nested filesystem, to add a boilerplate prologue to every source file in a collection of demonstration Java utilities I had written for an API Programmer’s Guide.

If you know Java, you know the file system for a Java project can be quite intricate.

Automation was my friend.

With a small expenditure of effort over the past day and a half, I have updated that automated file-updater utility so that it reads each of the files in a TADS doc directory and creates index references from specially constructed anchor tags that I have added to the files.

I have attached that utility to this post; inside the ZIP is the .jar that does the heavy lifting, plus a copy of the Adv3Lite Library Manual that Eric includes in the Adv3Lite library distribution.

I have added anchor tags and run the indexer; if you unzip the bundle, you can point a browser at the manual’s index.htm file to see what the index looks like. It’s simple, not ideal, but does the job.

If you want to add your own anchor tags, go ahead, then run the .bat file. If you want to point the utility at your own collection of files, without my anchor tags, you can do that, too.

Read the readme to find out how to do it, using the properties file in the resource directory.

The utility creates a file of index references, in the form of a list of links to the places in the doc that discuss the various topics I indexed. The utility then adds a link to this new file to the manual’s existing index.htm…
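For anyone curious what that step amounts to, here is a rough sketch in Python. It is not the code inside the attached .jar. It assumes the index entries have already been collected as (label, file name, anchor name) triples, and the function names and the output file name `topic_index.htm` are made up for illustration.

```python
import re
from html import escape


def render_index(entries):
    """Build a simple HTML page listing one link per index entry.

    entries: iterable of (label, file_name, anchor_name) triples (assumed shape).
    """
    items = []
    # Sort case-insensitively by label so the index reads alphabetically.
    for label, file_name, anchor in sorted(entries, key=lambda e: e[0].lower()):
        href = escape(f"{file_name}#{anchor}", quote=True)
        items.append(f'<li><a href="{href}">{escape(label)}</a></li>')
    body = "\n".join(items)
    return f"<html><body><h1>Index</h1>\n<ul>\n{body}\n</ul>\n</body></html>\n"


def add_index_link(index_htm, target="topic_index.htm"):
    """Return index_htm with a link to the generated index page added once."""
    if f'href="{target}"' in index_htm:
        return index_htm  # already linked, so leave the page alone
    link = f'<p><a href="{target}">Topic Index</a></p>'
    closers = list(re.finditer(r"</body\s*>", index_htm, re.IGNORECASE))
    if not closers:
        # No closing body tag found: just append the link at the end.
        return index_htm + "\n" + link + "\n"
    pos = closers[-1].start()
    return index_htm[:pos] + link + "\n" + index_htm[pos:]


if __name__ == "__main__":
    sample = [("Doors", "door.htm", "doors_idx"), ("Actors", "actor.htm", "actors_idx")]
    print(render_index(sample))
    print(add_index_link("<html><body><p>Contents</p></body></html>"))
```

Checking for an existing link first means the step can be rerun after every tagging session without piling up duplicate links in index.htm.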

The good news is, it is quite easy to add an anchor tag to the HTML file whenever I encounter some topic that I think I may want to look up later. I can just edit the HTML file like this…

<a name="some_topic_idx">Some Topic</a>

…and run my index generator. As long as the anchor name value ends in _idx, the item will automatically appear in the index…
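To make the _idx convention concrete, here is a minimal sketch of how such anchors could be picked out of a page, using Python's standard `html.parser` module. This is an illustration only, not the code in the attached utility; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class IdxAnchorParser(HTMLParser):
    """Collect (anchor_name, label) pairs for <a name="..._idx"> tags."""

    def __init__(self):
        super().__init__()
        self.entries = []   # finished (name, label) pairs
        self._name = None   # name of the _idx anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            # Only anchors whose name ends in _idx count as index entries.
            if name and name.endswith("_idx"):
                self._name = name
                self._text = []

    def handle_data(self, data):
        if self._name is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._name is not None:
            # Collapse runs of whitespace so multi-line labels come out tidy.
            label = " ".join("".join(self._text).split())
            self.entries.append((self._name, label))
            self._name = None


def find_idx_anchors(html):
    """Return the (name, label) pairs for every _idx anchor in an HTML string."""
    parser = IdxAnchorParser()
    parser.feed(html)
    parser.close()
    return parser.entries


if __name__ == "__main__":
    page = '<p><a name="some_topic_idx">Some Topic</a> and <a name="other">x</a></p>'
    print(find_idx_anchors(page))
```

Anchors without the _idx suffix are ignored, so ordinary named anchors already in the docs stay out of the index.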

The bad news is, these tags won’t persist through a library update from Eric (unless he wants to add these anchor tags to his doc source files). But it’s better than nothing, especially now when I’ve just started absorbing the huge amount of technical detail contained in the docs.

Maybe by the time he does a doc update, I won’t be so desperate for a map.

Jerry
tads3lite-indexer.zip (366 KB)

Eric_Eve · April 29, 2013, 9:57am

Quite so. I might have been able to get round the difficulty by upgrading to the latest version of Help and Manual, instead of using the old one I got for free, but that would also have been quite expensive!

Anyway, I’ve not yet had a chance to download this and try it out, but it looks like it could be useful, so thanks for doing it.

This sounds a little like the DocGen program that generates the Library Reference Manual from the source code (I appreciate it’s not the same, since what it’s indexing is the manual, not the source, but there may be an analogy). The reason I mention it is that I’ve very recently used a tweaked version of DocGen to produce an indexed version of the source files (and the comments therein) like the Library Reference Manual that comes with TADS 3, as you may have seen in another thread. I don’t know if you’ve had a chance to try it out yet, but you could find it usefully indexes quite a lot of the information that you need.

I appreciate, though, that it’s not the same thing as having an index into the Manual, which may be what fits your particular needs.

And this may be the main problem. I’ve already updated quite a few of the docs as I’m gradually fixing and tweaking things for the next release, so I can’t simply take over your tagged files; I would have to locate and copy your tags, which might be quite labour-intensive. I may take a look at doing it, but it would be helpful to have some indication of demand first, since I wouldn’t want to spend hours fiddling around for something only one person wants.

I can’t say exactly when the next release will be. At this stage I’m not following any particular plan of work; I’m simply fixing bugs as they arise and making tweaks as the need suggests, with a view to releasing version 0.8 when the number of tweaks and bug-fixes reaches some ill-defined notion of being sufficiently big to justify a release. That might change if a new version was needed to fix a really critical bug or the lack of some feature tweak was holding people up.

In other words, I can’t guarantee when the next documentation update would be (that would overwrite all your tags in the normal course of things). Some time in June might be my best guess right now, but it could be either earlier or later than that.

jford · April 29, 2013, 4:00pm

Eric:

I have, and it does. It’s a valuable resource.

But it’s not an index into the library manual. Even with the reference, I still will want to go back and reread the more narrative discussions of topics that are not contained in the reference (they are after all a reference and a manual, different docs for different needs).

Understood, though another course would be to just start adding new _idx tags during the normal course of updating the docs without trying to recapture the tags I added. At some point, the collection of tags will reach critical mass and you can run the indexer.

Jerry

Eric_Eve · May 1, 2013, 8:14am

Yes, I can quite see that. It may have some bearing on what is worth indexing in the manual, though. For example, is it worth indexing every object property that’s mentioned in the manual, or is it actually more helpful to look up that sort of information in the Library Reference Manual? See further below.

Yes, this may be a way forward. I’m thinking, though, that if I were to release an index with the manual it would have to be done properly (that is, I would have to decide on a consistent indexing scheme and then go through every section of the manual), which could be quite a lot of (potentially quite tedious) work. Perhaps at some point I’ll just have to bite the bullet and get on with it, since if it ends up making adv3Lite easier to use then it probably should be done. The danger of just adding the odd _idx tag here and there in the normal course of updating the manual is that one could end up with a somewhat random selection of items indexed, which users might find more frustrating than helpful.

I think I’ll have to come back and take a look at this when I’ve got a bit more time on my hands!

Another possibility I was vaguely thinking about was building some tool that could merge the _idx tags from an older set of docs into a newer set. This might be one way to go if your indexing tool were to be used primarily as a way for individual users to create their own sets of bookmarks into the documentation for topics rather than to create a universal index for general use. That said, I’m inclined to think the universal index for general use is probably the proper way to go.
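A merge tool along those lines might look something like the following Python sketch. It is only one possible approach under a strong assumption: a tag is carried over only when the exact text it wrapped in the old file occurs exactly once in the new file, and everything else is reported for manual placement. The function name and the matching rule are hypothetical, and no such tool exists yet.

```python
import re

# Matches anchors of the form <a name="something_idx">label</a>.
IDX_ANCHOR = re.compile(
    r'<a\s+name="([^"]*_idx)"\s*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)


def merge_idx_tags(old_html, new_html):
    """Copy _idx anchors from old_html into new_html where placement is unambiguous.

    Returns (merged_html, unplaced), where unplaced lists the anchor names
    that could not be placed automatically.
    """
    merged = new_html
    unplaced = []
    for match in IDX_ANCHOR.finditer(old_html):
        name, label = match.group(1), match.group(2)
        if f'name="{name}"' in merged:
            continue  # the newer file already carries this tag
        # Assumption: a label that occurs exactly once marks the same passage.
        if label and merged.count(label) == 1:
            merged = merged.replace(label, f'<a name="{name}">{label}</a>', 1)
        else:
            unplaced.append(name)
    return merged, unplaced


if __name__ == "__main__":
    old = '<p>The <a name="doors_idx">Door class</a> is simple.</p>'
    new = "<p>The Door class is simple and flexible.</p>"
    print(merge_idx_tags(old, new))
```

Plain string matching like this can be fooled by a label that also appears inside a tag or attribute, so the list of unplaced names, and ideally the merged output too, would still want a human look.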

That then brings us back to the question of what’s worth indexing. A full index could have quite a lot of entries; is there a danger of the wood becoming invisible for the trees? I suspect not, in that people are used to using quite extensive indexes to look up information in books, and the same principle would apply here, but if anyone has any thoughts on how extensive an index would be useful and/or what sort of things it would be helpful to index, feel free to post them in this thread, since it might help to crystallize my own thinking.

Jim_Aikin · May 1, 2013, 4:50pm

A few books have several indices – an index of people, an index of battles, whatever. Seems to me you might be able to set up a system with three or four different tags, each of which would be keyed to one index. I’m not following the technical discussion here, so I don’t know what would be practical. But it seems a separate index for code keywords would be more helpful than dumping them all into one big index.
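One way this could work is sketched below in Python: keep the _idx ending, so every tag still counts as an index entry, and put a longer suffix in front of it to say which index the entry belongs to. The suffixes and index titles here are invented for illustration; nothing in the existing indexer defines them.

```python
from collections import defaultdict

# Hypothetical suffix scheme: each suffix still ends in _idx.
INDEX_SUFFIXES = {
    "_code_idx": "Code keywords",
    "_action_idx": "Actions",
    "_idx": "General topics",
}


def group_by_index(entries):
    """Sort (anchor_name, label) pairs into separate indexes by name suffix."""
    groups = defaultdict(list)
    # Try longer suffixes first so "_code_idx" wins over plain "_idx".
    suffixes = sorted(INDEX_SUFFIXES, key=len, reverse=True)
    for name, label in entries:
        for suffix in suffixes:
            if name.endswith(suffix):
                groups[INDEX_SUFFIXES[suffix]].append((name, label))
                break  # an entry belongs to exactly one index
    return dict(groups)


if __name__ == "__main__":
    sample = [
        ("someprop_code_idx", "someProp"),
        ("doors_idx", "Doors"),
        ("take_action_idx", "Take"),
    ]
    for title, items in group_by_index(sample).items():
        print(title, items)
```

Tags that already end in plain _idx fall into the general index, so existing tags would not need to be renamed.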