Unfortunately I am now in need of such a thing – my interpreter is struggling with parts of The Impossible Bottle. (Of the “long computation gives incorrect result” sort, which makes debugging rather nasty.) If anyone else has ideas for a good baseline comparison, I’m all ears! (Or if there’s a good boring easy-to-build interpreter I could modify, that’s great too.)
Edit: fixed the bug in question, but I’m still interested, in a less viscerally urgent way.