How do I find the word that made parsing fail?

Draconis · January 10, 2025, 1:40am

I’m trying to do something complicated with parsing, and running into an odd issue. Suppose I want to parse a list of dictionary words that correspond to different colors.

(convert @red into @red)
(convert @scarlet into @red)
(convert @crimson into @red)
(convert @blue into @blue)
(convert @green into @green)

In some cases, a word could be parsed as multiple colors.

(convert @turquoise into @green)
(convert @turquoise into @blue)

I want to take my list of words, and convert each one in turn. And I want to backtrack over all solutions, if there are multiple.

(convert list [] into [])
(convert list [$First|$Rest] into [$NewFirst|$NewRest])
	*(convert $First into $NewFirst)
	*(convert list $Rest into $NewRest)

So far so good. My problem is this: if one of these words doesn’t parse as a color, the overall (convert list $ into $) predicate will fail. I want to get the word that caused this to happen, in order to print a meaningful error message.

How can I make this happen specifically when (convert $ into $) fails, without backtracking into it after (convert $ into $) succeeds?

andrewj · January 10, 2025, 2:30am

The following might work (I haven’t tested it). It displays the message on the first unknown word, then uses (just) to prevent further backtracking and (fail) to make the whole query fail.

(convert list [] into [])
(convert list [$First|$Rest] into [$NewFirst|$NewRest])
    {   *(convert $First into $NewFirst)
        (or)
        Bad color name: $First (line)
        (just) (fail)
    }
    *(convert list $Rest into $NewRest)

Draconis · January 10, 2025, 2:41am

That was my first thought, but my concern is that if I call (convert list $ into $) in a multiquery (e.g. inside a (collect $) block), it’ll explore into that part of the search tree even if (convert $ into $) succeeds, right?

andrewj · January 10, 2025, 2:47am

My feeling was that the *(convert $First into $NewFirst) can only fail when the word is unknown.

But I gotta say Prolog and Dialog do still confuse me sometimes, even after being quite deep down the rabbit-hole of how their execution works (choice-points, back-tracking, etc) recently.

P.S. you are right, it will back-track into the (or) there. Hmmm…

andrewj · January 10, 2025, 3:33am

A not-very-elegant way would be to use (collect $) to collect the possible conversions of each word, and check for an empty list afterwards.

You’ll also need a way to pass back the failed word from (convert list $ into $), perhaps by binding the output variable to the word itself, and checking the type of the result in the outer code.
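
A minimal, untested sketch of the (collect $) idea, assuming the (convert $ into $) rules from the first post. The variable names $Color and $Options are made up for illustration, and this version prints the message directly rather than passing the failed word back:

```dialog
%% Sketch only: gather every reading of $First before committing to one.
(convert list [] into [])
(convert list [$First|$Rest] into [$NewFirst|$NewRest])
	(collect $Color)
		*(convert $First into $Color)
	(into $Options)
	%% An empty list means the word has no reading as a color at all.
	(if) ($Options = []) (then)
		Bad color name: $First (line)
		(fail)
	(endif)
	%% Otherwise, backtrack over the collected readings.
	*($NewFirst is one of $Options)
	*(convert list $Rest into $NewRest)
```

The error branch is only reached when a word has no conversions, so a successful word never backtracks into it. If an earlier word has several readings, though, backtracking can still reach the error branch once per reading, so the message may print more than once.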

Draconis · January 10, 2025, 3:56am

The best I can think of currently is to have one predicate that’s multi-queried to make the list, and a separate predicate that’s normal-queried to find the error if it fails. But that feels distinctly inelegant.
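
A rough, untested sketch of what that second predicate could look like. The predicate name (first bad word in $ is $) is hypothetical; (convert list $ into $) stays exactly as in the first post:

```dialog
%% Hypothetical helper: query this normally, and only after
%% *(convert list $Words into $) has already failed.
(first bad word in [$First|$] is $First)
	%% The head word has no reading as a color: report it.
	~(convert $First into $)
(first bad word in [$|$Rest] is $Bad)
	%% Otherwise keep looking further down the list.
	(first bad word in $Rest is $Bad)
```

A normal query such as (first bad word in $Words is $Bad) stops at the first solution, so $Bad is bound to the earliest word with no conversion. The list is walked a second time, but only on the failure path.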

nephar · January 10, 2025, 8:43am

Not very elegant but, modifying @andrewj’s code slightly:

(convert list [] into [])
(convert list [$First|$Rest] into [$NewFirst|$NewRest])
    {   *(convert $First into $NewFirst)
        (or)
        ~($First is color)
        Bad color name: $First (line)
        (just) (fail)
    }
    *(convert list $Rest into $NewRest)

($Color is color)
	(convert $Color into $)

might do what you want. One small problem remains, though: if the list contains a word that parses as multiple colors, the error message is printed once per parse:

> *(convert list [turquoise orange] into $)
Bad color name: orange
Bad color name: orange

Edit: Obviously just using ~(convert $First into $) instead of ~($First is color) achieves the same result too.

bkirwi · January 12, 2025, 5:19am

If you don’t mind testing the predicate twice you could:

(convert list [] into [])
(convert list [$First|$Rest] into [$NewFirst|$NewRest])
    (if) (convert $First into $) (then)
        *(convert $First into $NewFirst)
    (else)
        Bad color name: $First (line)
        (fail)
    (endif)
    *(convert list $Rest into $NewRest)

Or collect into a list and check if the list is empty. Each is inefficient in a different way, so pick your poison, I guess. (A global variable is also an option… obviously not worth it for your example, but it might be if the convert predicate was expensive and was likely to generate a large number of options.)

Error reporting is also an interesting question… just printing the problem colour has issues as folks mentioned above. Should convert list have an extra output variable for the problem word? Honestly I don’t hate the idea of doing the search only when the call fails… it iterates twice, but the “happy path” keeps its nice signature and simple implementation.

Draconis · January 12, 2025, 5:45am

My solution to the multiple error messages was to (stop) after printing it, so that part’s not an issue, thankfully.