How to write good room descriptions?

Nice!

Does anyone have good tips for writing code that browses pages like that to extract the text? I mean, on a basic level, what code can you use to say “Go to a random page here and find the text in the description code”? It seems like it’d be a great resource for a procedural photograph description generator.

I haven’t used either of these, but:
crummy.com/software/BeautifulSoup/ seems to be the standard answer for scraping text from webpages in a structured way. (Fun fact: the author of this library also writes IF)

kimonolabs.com/ has a clicky interface for structuring the data and a video with perky music

If you are using Linux or Mac OS X and are not afraid of the terminal, it could be “easy” (well, for a certain definition of “easy” that requires you to be knowledgeable about some arcane geekery :slight_smile:). For example, the following command returns all the photo descriptions on page one of the first link:

wget -q -O - http://digitallibrary.usc.edu/cdm/search/collection/p15799coll65/searchterm/interior/field/all/mode/all/conn/and/ | grep "img class" | grep "alt=" | perl -pe 's/^.*alt="//' | cut -d '"' -f 1

Of course, this requires you to learn about the command line and a few useful tools to parse the page (grep, perl, cut, awk…). It is probably easier than learning a programming language though. Nothing is easy :stuck_out_tongue:

Beautiful Soup is what I am using for this: Metasite for submitting to all three major community sites? - #10 by bg

My main difficulty so far is that I’m not sure of the best way to get feedback on Python code, so I am going solely by what seems to work, and learning via search engine.

Thanks for the suggestions! I figured if I want to do this kind of stuff I’d have to learn some Python or, um, that other stuff sometime, so it might be time to start. Gulp. Maybe I can ask my friends in the computer science department for help.

Morlock, I just cut and pasted that into my terminal window (not on my home directory, for whatever that’s worth) and it said “-bash: wget: command not found”. Do I need to install some other stuff? I have a Mac running OS 10.7 for whatever that’s worth.

I’m finding it helpful to mentally translate some things into Inform 7. For instance, it seems like

for x in y

in Python is equivalent to

repeat with x running through y

in Inform.
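
To make the comparison concrete, here is a tiny sketch of that loop in Python. The list of room names is made up purely for illustration.

```python
# Hypothetical list of rooms, standing in for "y".
rooms = ["kitchen", "cellar", "attic"]

visited = []
# Roughly "repeat with room running through rooms" in Inform 7.
for room in rooms:
    visited.append(room.upper())

print(visited)
```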

Oh, this is a nice cheat sheet: cogsci.rpi.edu/~destem/gamedev/python.pdf

You need to install wget. I’m not big on Macs (even less on Windows) so I don’t know what the best course is for you to get wget but also all the other basic tools that come pre-packaged with most Linux distributions but apparently not with Macs. I think however what is recommended now is to use XCode:

itunes.apple.com/ca/app/xcode/i … mpt=uo%3D4

If you install this, it brings in most of the useful development tools (compilers, UNIX command line tools…) but I can’t guarantee this will be enough.

Beautiful Soup is tailor-made for your task, but you also need to learn a lot (Python programming, the Beautiful Soup module, HTML…) to use it properly. If you have friends in CS, AND they have a lot of free time to give you technical support, then MAYBE you will get to do something useful with Beautiful Soup. I program in Python every week, can create modules, and can invent and implement efficient algorithms (but I don’t do a lot of web stuff), and I find it difficult to use… Your mileage may vary and your friends may be your greatest assets.

Try installing XCode and tell us if it worked.

Looks like I’d have to register as a developer to get a version of xcode that’s compatible with my current OS. Apparently I can get wget straight from its page if I want to install it using command-line tools.

Registering as a developer is free.

There’s a “command-line development tools” package on the Apple download site somewhere. I don’t remember if it requires registering, but it might be less hassle than the full Xcode download (which is large).

This may be better advice and from someone who obviously knows more about macs than I do :slight_smile:

Do perl and awk not count as programming languages?

Sure, but adding perl -pe 's/replace this/by that/' hardly counts as programming IMHO. It is basic search and replace. Same with awk '{print $2}', which prints column 2. Compare this to the Python script you need to write to get the same result as my one-liner. I consider the one-liner more versatile and faster to hack together than writing a program. It may be the opposite for some people, so in the end it is a question of finding an approach that fits you. I always try to fix problems in bash (the terminal language) first and then proceed to Python or some other specialized program. I LOVE Python, but bash is more time-efficient for some problems.
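
For comparison, a rough Python counterpart of the wget one-liner earlier in the thread might look like the sketch below. It only uses the standard library's html.parser, it is an approximation rather than an exact translation (it keeps any img tag that has both a class and a non-empty alt attribute), and the sample markup and the names AltTextParser and extract_alt_texts are made up for illustration.

```python
from html.parser import HTMLParser


class AltTextParser(HTMLParser):
    """Collect the alt text of every <img> tag that has a class attribute."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Roughly mirror the two greps: keep images with both class and alt.
        if "class" in attributes and attributes.get("alt"):
            self.alts.append(attributes["alt"])


def extract_alt_texts(html):
    """Return the alt texts found in an HTML string, in document order."""
    parser = AltTextParser()
    parser.feed(html)
    parser.close()
    return parser.alts


# Made-up sample markup. A real page would first be downloaded, e.g. with
# urllib.request.urlopen(url).read().decode("utf-8", errors="replace").
sample = (
    '<a href="/item/1"><img class="thumb" src="a.jpg" alt="Living room, 1935"></a>'
    '<img src="logo.png" alt="Site logo">'
    '<img class="thumb" src="b.jpg" alt="Hotel lobby interior">'
)
print(extract_alt_texts(sample))
```

It is clearly longer than the shell version, which is the trade-off being discussed: more typing, but no reliance on the page keeping one tag per line.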

Personally, I have always found “you can see” best used sparingly, because the phrase takes me out of the moment. I think straight descriptions make for smoother transitions, since most IF already describes what you can see: 5/10 times, “you” are the one whose eyes we see through in some form, even in stories like Urkel the Black Dragon (I hope I got the name right).

I was inspired by your post and wrote a Python program to do this:
github.com/blackredbuttonbox/picture-descriptor

It uses a pretty “dumb” approach of just generating a random item ID, then visiting that page and scraping the text, so it can’t filter by keyword or anything. But it works okay and was fun to write.
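
For anyone curious what the “random item ID” idea could look like, here is a minimal sketch. It is not the code from the repository above: the URL pattern, the ID range, and all function names are assumptions chosen for illustration, and the download function is passed in so the logic can be tried without network access.

```python
import random
import urllib.error
import urllib.request

# Hypothetical URL pattern and ID range; the real site's layout may differ.
ITEM_URL = "http://example.org/collection/item/{item_id}"
MAX_ITEM_ID = 5000


def random_item_url(rng=random):
    """Pick a random item ID and build the page URL for it."""
    item_id = rng.randint(1, MAX_ITEM_ID)
    return ITEM_URL.format(item_id=item_id)


def fetch_random_page(fetch, attempts=5, rng=random):
    """Try random IDs until one page loads, since some IDs may not exist."""
    for _ in range(attempts):
        url = random_item_url(rng)
        try:
            return url, fetch(url)
        except (urllib.error.URLError, ValueError):
            continue  # Missing item or bad response: roll again.
    return None, None


def fetch_with_urllib(url):
    """Download a page as text (network access required)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a real URL pattern filled in, a call such as fetch_random_page(fetch_with_urllib) would return the chosen URL and its HTML, ready to be handed to whatever does the scraping.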

This is a very cool (and educational) one-liner, but the descriptions in the alt text are much shorter than the long descriptions you get when you mouseover the images or visit the item page. You could probably still do it by grepping and filtering the right attribute, but the tricky part is that some of the long descriptions have a ton of nested links, so I was quite happy for Beautiful Soup’s ability to strip all that HTML out for me.
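
To show what “stripping all that HTML out” amounts to, here is a dependency-free sketch using Python's standard html.parser as a stand-in. Beautiful Soup offers this through its get_text() method; the code below is not Beautiful Soup, and the sample description and the names TextOnly and strip_tags are invented for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Keep the text between tags and drop the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html):
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.chunks).split())


# Made-up description with nested links, for illustration only.
long_description = (
    '<td>Interior of a <a href="/s/hotel"><b>hotel</b> lobby</a>, '
    'with <a href="/s/chairs">chairs</a> and a piano.</td>'
)
print(strip_tags(long_description))
```

Doing the same with grep and perl is possible, but nested tags make the regular expressions fragile, which is where a real HTML parser earns its keep.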

Thanks!

Excellent work! I have never gotten into Python, so thanks for also supplying an .exe version.