An Investigation of Dashes and Screen Readers

Premise

There was a conversation in the Neo-Interactives Discord server regarding different styles of pausing and pacing in a sentence, and the use of dashes came up.

This got me curious: Is there a recommended practice for adding dashes for pauses, similar to how there are recommendations for different HTML tags, when writing with accessibility in mind?

Results Summary

Assuming default settings, the case with the best handling was the EN-dash, with spaces added on either side.

Testing Environment

So I did a number of tests; 49, to be exact.

I had a sentence:

The flower, a favorite of Natalie’s, was ready to bloom this year.

The control case (or case one) was written with commas, as above. For the next 6 cases, I swapped commas out with the following:

  1. “Common” dash (or -), with no spaces on either side.
  2. Common dash, with spaces added on either side.
  3. EN-dash (or –), with no spaces on either side.
  4. EN-dash, with spaces added on either side.
  5. EM-dash (or —), with no spaces on either side.
  6. EM-dash, with spaces added on either side.

For each of these 7 cases, I tested the following screen reader configurations:

  1. Orca on Linux Mint with Firefox
  2. Orca on Linux Mint with Chrome
  3. Talkback on Android with Firefox
  4. Talkback on Android with Chrome
  5. NVDA on Windows with Firefox
  6. NVDA on Windows with Chrome
  7. VoiceOver on iPhone with Safari

Unlike my previous screen reader experiments, the choice of browser did not seem to change the results, so the test cases can be simplified to the following:

  1. Orca on Linux Mint
  2. Talkback on Android
  3. NVDA on Windows
  4. VoiceOver on iPhone

Disclaimers

First Disclaimer: I am aware that a user has the ability to change which punctuation elements are announced by a screen reader, but I was mostly interested in how default settings handled these cases.

Second Disclaimer: I am aware that most screen readers are configured to read at high rates, and my own is quite fast (though maybe half the speed that I’ve heard from most YouTubers in the blind community). There is an argument to be made that pauses are not audible, but I believe I noticed the spoken tones following different sequences when pauses were handled and not skipped, so there is still some information getting across to the user.

Details of Results

Orca

Orca is rather verbose. Commas create pauses, but all dashes are announced. However, only EN-dashes and EM-dashes (with or without spaces) create a pause before the announcement. The common dash had no such pause before its announcement.

Talkback and VoiceOver

Talkback and VoiceOver behave identically in this regard.

Pauses are created with commas, common dashes (with spaces), EN-dashes (with spaces), and EM-dashes (with or without spaces). A common dash or EN-dash is completely ignored if the spaces on either side are missing.

NVDA

With NVDA, pauses are created with commas and EN-dashes. A common dash can create a pause, too, but only when spaces are placed on either side. If these spaces are missing, then a common dash is ignored.

Something that took me by surprise was hearing NVDA completely ignore all EM-dashes, regardless of spacing.
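As a cross-check on the summary, the written results above can be transcribed into a small lookup table. This is only a sketch in Python: the labels (`pause`, `announce`, `ignored`) and the function name are my own shorthand, not output from any screen reader. It also assumes that "EN-dashes" in the NVDA paragraph covers both spacings, and that Orca treats the common dash the same way with or without spaces.

```python
# Outcomes transcribed from the written results (default settings only).
# "pause"          = silent pause
# "announce"       = the dash is spoken, with no pause before it
# "pause+announce" = a pause, then the dash is spoken
# "ignored"        = no pause and nothing spoken
RESULTS = {
    "Orca": {
        "hyphen": "announce", "hyphen spaced": "announce",
        "en": "pause+announce", "en spaced": "pause+announce",
        "em": "pause+announce", "em spaced": "pause+announce",
    },
    "Talkback": {
        "hyphen": "ignored", "hyphen spaced": "pause",
        "en": "ignored", "en spaced": "pause",
        "em": "pause", "em spaced": "pause",
    },
    "VoiceOver": {
        "hyphen": "ignored", "hyphen spaced": "pause",
        "en": "ignored", "en spaced": "pause",
        "em": "pause", "em spaced": "pause",
    },
    "NVDA": {
        "hyphen": "ignored", "hyphen spaced": "pause",
        "en": "pause", "en spaced": "pause",  # assumption: both spacings pause
        "em": "ignored", "em spaced": "ignored",
    },
}


def pauses_everywhere(results):
    """Return the dash cases that produce a pause in every screen reader."""
    cases = next(iter(results.values())).keys()
    return [
        case for case in cases
        if all("pause" in reader[case] for reader in results.values())
    ]


if __name__ == "__main__":
    print(pauses_everywhere(RESULTS))
```

Under those assumptions, the only dash case that pauses in all four screen readers is the EN-dash with spaces, which matches the summary at the top.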

Test for Yourself

If you would like to test all 7 cases with your own screen reader configuration, I have included them below.

Test begins.

  1. Comma case. The flower, a favorite of Natalie’s, was ready to bloom this year.
  2. Common dash no space case. The flower-a favorite of Natalie’s-was ready to bloom this year.
  3. Common dash with space case. The flower - a favorite of Natalie’s - was ready to bloom this year.
  4. EN dash no space case. The flower–a favorite of Natalie’s–was ready to bloom this year.
  5. EN dash with space case. The flower – a favorite of Natalie’s – was ready to bloom this year.
  6. EM dash no space case. The flower—a favorite of Natalie’s—was ready to bloom this year.
  7. EM dash with space case. The flower — a favorite of Natalie’s — was ready to bloom this year.

End of test.
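If you want to regenerate the test text (for example, to paste it into a local HTML page), here is a minimal Python sketch that rebuilds the seven sentences from one template. The function and constant names are made up for illustration; only the sentence and the dash characters come from the test above.

```python
# Hypothetical helper: rebuild the seven test sentences from one template.
HYPHEN = "-"        # U+002D, the "common dash"
EN_DASH = "\u2013"  # EN-dash
EM_DASH = "\u2014"  # EM-dash

PARTS = (
    "The flower",
    "a favorite of Natalie\u2019s",
    "was ready to bloom this year.",
)


def build_cases():
    """Return (label, sentence) pairs in the same order as the test list."""
    cases = [("Comma", f"{PARTS[0]}, {PARTS[1]}, {PARTS[2]}")]
    marks = (("Common dash", HYPHEN), ("EN dash", EN_DASH), ("EM dash", EM_DASH))
    for name, mark in marks:
        # First the unspaced variant, then the spaced one.
        for spacing, separator in (("no space", mark), ("with space", f" {mark} ")):
            cases.append((f"{name} {spacing}", separator.join(PARTS)))
    return cases


if __name__ == "__main__":
    print("Test begins.")
    for number, (label, sentence) in enumerate(build_cases(), start=1):
        print(f"{number}. {label} case. {sentence}")
    print("End of test.")
```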

17 Likes

The thing you’re calling a “common dash” is actually a hyphen. This is normally used to separate compound adjectives and compound nouns and should never be announced by a screen reader.

The spaced hyphen can be used in place of a spaced en-dash in situations where the en-dash is not available, such as typewriters. (Does anyone remember those things?)

The en-dash is typically used to indicate a range between numbers and is pronounced as “to”. For example, “see pages 10–12” is pronounced as “see pages 10 to 12”. Did you test this aspect?

The spaced en-dash is typically used to indicate a range between anything other than pure numbers. For example, “25 March – 3 April” is pronounced as “25th of March to 3rd of April”. It can also be used for a pause.
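To make the numeric-range rule concrete, here is a minimal Python sketch of the reading I would expect. It is only an illustration: `speak_ranges` is a hypothetical helper, not how any screen reader actually works, and it handles only the unspaced digits-to-digits case.

```python
import re

EN_DASH = "\u2013"

# Digits, an unspaced en-dash, then digits: the "pages 10 to 12" pattern.
NUMERIC_RANGE = re.compile(r"(\d+)" + EN_DASH + r"(\d+)")


def speak_ranges(text):
    """Replace an unspaced en-dash between two numbers with the word 'to'."""
    return NUMERIC_RANGE.sub(r"\1 to \2", text)


if __name__ == "__main__":
    print(speak_ranges("see pages 10\u201312"))
```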

Personally, I don’t like em-dashes and never use them, but I think they are always used unspaced.

These are just the most common cases. A screen reader would have to be pretty clever to get all the subtle variations right. Even so, you should not abuse the normal use of punctuation just to satisfy a screen reader, or you’ll get all the non-visually-impaired off side. If you use punctuation correctly, a screen reader will usually get it right. Your tests generally confirm this.

4 Likes

I believe Orca announced it as “dash”, and VoiceOver announced it as “hyphen” while I was spelling out the URL for the test I had hosted on my local network. The other screen readers also called it “dash” while I was typing it out. I’m not making a point here; it’s just interesting info, in case you wanted it.

Wild. Orca does by default.

Nope! I can add some more test cases and run another series, but that will have to wait until tomorrow, because the iPhone user in the house is asleep right now.

I was reading about this earlier, and remember seeing something about how British English tends to use the en-dash, while American English tends to use the em-dash. I’ve always used the em-dash, just because it’s visually distinct from the en-dash, but I’m probably going to retrain myself to switch to an en-dash with spaces, after doing these experiments.

I was just testing for cases where a dash is used to add a mid-sentence sub-clause, which is the correct usage, and not an abuse of punctuation.

5 Likes

I tested this on Debian Testing with Orca in Firefox, Orca in a terminal window via lxterminal and the nano text editor, and espeakup in the console, by saving the text I copied into a file and opening it in nano in the console. I am a full-time screen reader user of ~12 years, but I am not a speed listener, so both screen readers are at a normal conversational speed. Also, I’m using espeak as my synth with the default British English voice.

In Orca, I got the same results in both Firefox and nano. None produce what I’d call a pause (to get an actual pause took a period followed by a capital letter), but commas produced a natural cadence, as did the various dashes with spaces. Dashes without spaces resulted in a run-on effect, as if the words either side of the dash were one word and not two.

In the console with espeakup, commas and common dashes got the same result as with Orca, but en and em dashes are completely broken, rendered as double carets when reading by character and treated like the letters s and t respectively when reading by word.

I also tested with a period and the ASCII ellipsis (i.e. three separate periods rather than a single Unicode character), but only with Orca in Firefox. As best I can tell, the ASCII ellipsis acts like a comma whether there are no spaces, only a trailing space, or both trailing and leading spaces. A period followed by a space and a capital letter results in a pause in speech, while a period, space, and lowercase letter is treated like a space, and a period without spaces is spoken as dot, the only punctuation mark Orca directly vocalized in my tests.

Of course, there might be minute changes in cadence my ears aren’t picking up.

5 Likes

Oh, wow! That’s… a pretty interesting way for that to break! :open_mouth:

Ohhh, this is really useful to know!

This makes sense, since Orca is likely to be used on Linux machines, and this would be the clearest way to read out the names of dot-files.

Thank you for sharing your test results! :grin:

2 Likes

I’ve been using hyphens in place of en-dashes simply because there is no key for an en-dash on a standard keyboard. Given that screen readers treat them differently, I will rectify this in future.

A digression on the subject of em-dashes

I love em-dashes because two of my favourite writers—Laurence Sterne and Emily Dickinson—were big fans. Here’s an article discussing Dickinson’s use of punctuation which suggests a link between the two authors. The fact that Dickinson was once described as a “grammatical reprobate” makes me love her all the more.

5 Likes

I confess to using hyphens instead of en-dashes (or em-dashes) simply because I normally use PunyInform as this allows me to target the largest number of platforms, including retro 8-bit and 16-bit computers. Those old computers didn’t have an en-dash in their character set. It was basically only ASCII (or pretty close to it) and things like Unicode hadn’t even been invented back then.

2 Likes

Most operating systems have ways of inserting special symbols. On Windows, the Win + . shortcut brings up a popup that has punctuation symbols like the en/em dashes, plus accented letters, emojis, even kaomoji. Not sure how it works on Mac, but I’m pretty sure there’s a keyboard shortcut for en dash.

2 Likes

Yes, it’s option+dash. Em dash is shift+option+dash.

3 Likes

Oh there is a way on a PC:

"To insert an em dash (—) on a PC, you can either hold down the Alt key and type 0151 on the numeric keypad, or use the Insert > Symbol > More Symbols menu in programs like Microsoft Word."

But can I ever remember those numbers? I cannot.
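For anyone who wants to look the numbers up rather than memorize them, here is a small Python sketch. As far as I know, the Alt+0nnn codes follow the Windows-1252 code page on Western-language systems, so treat that part as an assumption about your setup; `describe` is just a made-up helper name.

```python
import unicodedata


def describe(character):
    """Return (Unicode name, code point, Windows-1252 byte value)."""
    return (
        unicodedata.name(character),
        ord(character),
        character.encode("cp1252")[0],
    )


if __name__ == "__main__":
    for character in "-\u2013\u2014":
        name, code_point, byte_value = describe(character)
        print(f"{name}: U+{code_point:04X} (decimal {code_point}), Alt+{byte_value:04d}")
```

The decimal code point is also the number that Inform 7 style `[unicode N]` substitutions take, so the em dash is 8212 there and 0151 on the numeric keypad.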

1 Like

This is a good discussion!

In my two latest games (available now!), I ask the player if they are using screenreader technology. Usually, this is to replace characters that are specifically for visual effect. For instance, something displayed as

|  C  |  C  |  T  |  B  |  C  |  A  |

might instead be output as

,  C  ,  C  ,  T  ,  B  ,  C  ,  A  ,

It sounds like I ought to be doing things like this, too:

To say em:
	say "[if screenreader is false][unicode 8212][otherwise] - [end if]"

For em-dashes (I love Emily Dickinson, too).
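The same idea outside Inform 7, as a rough Python sketch. Nothing here is from my games: `for_screen_reader` and its `screenreader` flag are hypothetical names, and the replacements are just the two discussed in this thread (vertical bars to commas, em dashes to spaced hyphens).

```python
import re

EM_DASH = "\u2014"

# An em dash plus any whitespace already around it.
EM_DASH_WITH_SPACING = re.compile(r"\s*" + EM_DASH + r"\s*")


def for_screen_reader(text, screenreader):
    """Swap purely visual characters for ones that tend to be read as pauses."""
    if not screenreader:
        return text
    # Vertical bars used as visual separators become commas.
    text = text.replace("|", ",")
    # Em dashes, spaced or not, become a single spaced hyphen.
    return EM_DASH_WITH_SPACING.sub(" - ", text)


if __name__ == "__main__":
    print(for_screen_reader("|  C  |  C  |  T  |  B  |  C  |  A  |", True))
    print(for_screen_reader("The flower\u2014a favorite\u2014was ready.", True))
```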

6 Likes

I change the words a lot in my WIP depending on screenreader use too, so it is certainly something I should test in the future!

This thread is really useful, I’m glad it was brought up. I’ll be bookmarking this for future reference.

2 Likes

I’m curious how two hyphens work for screenreaders. Do these two hyphens -- here -- result in a pause or something odder? Those have spaces around them. How about here--without spaces?

3 Likes

For the record, en and em dashes being broken in the console on my system might be the default Debian console not handling Unicode well, and not an issue with the screen reader or speech synth. Remember, I’m using the same synth with both screen readers. For anyone confused, the accessibility stack, at least in Linux, is quite modular. To simplify: the screen reader provides user control and tells the synth what to say, while the speech synth is the software that actually produces what is said. The screen reader can control what the synth says, but the synth ultimately decides how something is said. Also for the record, when I fed Joey’s test text through my ascii.sh script, which uses transcode to convert every .txt and .html file in the working directory from UTF-8 to ASCII and save the output as inputfile.ext.ascii, the en and em dashes became common dashes.
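For illustration only, here is roughly what that kind of conversion looks like in Python. This is not my ascii.sh script and says nothing about how transcode behaves internally; the mapping table and the `to_ascii` name are assumptions that cover just the characters in the test text.

```python
# Hypothetical mapping: only the non-ASCII characters seen in the test text.
ASCII_FALLBACKS = str.maketrans({
    "\u2013": "-",  # en dash -> common dash
    "\u2014": "-",  # em dash -> common dash
    "\u2019": "'",  # right single quote (the apostrophe in "Natalie's")
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def to_ascii(text):
    """Map known punctuation to ASCII; anything else non-ASCII becomes '?'."""
    mapped = text.translate(ASCII_FALLBACKS)
    return mapped.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("The flower\u2013a favorite of Natalie\u2019s\u2014was ready."))
```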

Also, there was no pause with either version of the double dash with Orca in Firefox.

| C | C | T | B | C | A |

Is read with no pauses between letters with Orca in Firefox.

, C , C , T , B , C , A ,

Is read with a brief pause between letters. It’s the difference between spelling a word at normal conversational speed (vertical bars) versus spelling it so a competent typist can type it.

Also, a little testing suggests it matters if the characters either side of a dash are numeric or alphabetic.

a-2

is read A 2.

a - 2

is read uh two.

2-a

is read 2 A.

2 - a

is read 2 brief pause A.

A-A

is read A A.

A - A

is read uh A.

2-2

is read 2 dash 2.

2 - 2

is read 2 brief pause 2.

2 -2

is read 2 minus 2.

Funnily enough, while dash is silent aside from a brief pause in the situation that most clearly suggests subtraction, + is read plus, / is read slash, * is read star, and = is read equals regardless of the spacing, at least as best I can tell. I'm not being super systematic and might have missed corners of the combinatorial explosion. Granted, it might be that those punctuation marks are usually verbalized the way a human would normally read text, while dashes have as much unspoken as spoken usage, so the screen reader and/or speech synth has to make assumptions about intent when processing text.

Note, I only did these tests in Firefox with Orca, since constantly switching between console and GUI and keeping track of what was said where is a bit exhausting. I'm using the same synth in both places, and based on my testing, the biggest difference between the two environments might be how the console renders text versus how Firefox provides text to Orca. I'm also not sure what my Orca punctuation settings are set to, though it's clearly one of the less verbose settings, and I'd probably want to crank them up if I was going to try coding with Orca (and I do crank up espeakup's punctuation verbosity when coding).

4 Likes

Oh no! Bad news for comics transcription or typewriter imitation, haha.

3 Likes

I tested by highlighting this text, right clicking and asking OSX to speak the text.

TL;DR: Em-dashes work for pauses with or without spaces. Hyphens work with spaces, but unspaced hyphens or en-dashes are ignored. In no case did the reader narrate “dash” or “hyphen” or “minus”.

10-2=8 says “tentwo equals eight”.
10 - 2 = 8 does the same thing, so if you’re narrating math problems you might need to spell out “minus”.
10+2=12 - this works correctly “ten plus two equals twelve”
10/2=5 - “ten slash two equals 5”
10*2=20 - “tentwo equals twenty”
10 * 2 = 20 - “tentwo equals twenty”
10x2=20 - this works! “ten times two equals twenty”
10 x 2 = 20 - also works “ten times two equals twenty”
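If you do need to spell out the operators that got dropped, here is a minimal Python sketch. The helper name and the word choices are mine, and it only touches `-`, `*`, and `/` when they sit between two digits, so it would also mangle things like hyphenated dates. Treat it as a starting point, not a general solution.

```python
import re

# Words for the operators that were skipped or read oddly in the tests above.
SPOKEN = {"-": "minus", "*": "times", "/": "divided by"}

# An operator with a digit on each side, plus any spaces around it.
OPERATOR_BETWEEN_DIGITS = re.compile(r"(?<=\d)\s*([-*/])\s*(?=\d)")


def spell_out(expression):
    """Replace -, * and / between digits with spoken words."""
    return OPERATOR_BETWEEN_DIGITS.sub(
        lambda match: f" {SPOKEN[match.group(1)]} ", expression
    )


if __name__ == "__main__":
    for example in ("10-2=8", "10 * 2 = 20", "10/2=5"):
        print(spell_out(example))
```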

3 Likes

Okay, I’m not sure if my ears or mind are playing tricks on me, and I can’t recall if I did my original test before or after fixing this. Orca 48 has been available on Debian Testing for a week or two now, but I was experiencing a glitch that would render my X session silent following a reboot with Orca 48 installed. I managed to solve it over the weekend by replacing light-locker with suckless-tools (as a bonus, no more light-locker means I no longer have to do a killall light-locker following a reboot to prevent my X session from locking up if I’m away from the keyboard for too long). Either way, dashes with spaces seem to have slightly longer pauses now, though again, I’m not sure whether the change is actually there or I’m Mandela-effecting myself somehow.

Also, I just now noticed the first replies in this thread before my initial reply. For what it’s worth, plenty of punctuation marks have multiple common names, and screen readers often prefer shorter names, since speech is inherently lower bandwidth than visual reading and brevity counts for a lot when getting things done, especially for those with slow ears. Not only does Orca call hyphens dashes, it calls exclamation marks bang, ampersands just and, and asterisks stars. It shortens parenthesis to paren, leaves the square off of brackets, omits the curly from braces, just says less and greater for the angle brackets, leaves the mark off the question mark, calls a period a dot at all times, uses tick for single quote and quote for double quote, leaves the sign off at, dollar, percent, and number (which it doesn’t call hash, though it used to), shortens semi-colon to semi, just calls vertical bar bar, and calls an underscore line. I think that covers all the punctuation on the keyboard I’m using. As best I can tell, the only inconsistency between Orca and espeakup is that the character that shares the tilde key is called grave with a short a by Orca and accent by espeakup.

And yeah, Orca as I have it configured often doesn’t handle math expressions well at all. There’s a reason I crank espeakup’s punctuation verbosity to max when coding: so many punctuation marks would be silent in prose, or even in the way one would normally read code aloud, that the normal verbosity renders many lines of code incomprehensible.

Also, I just thought to test with the orca+v toggle set to verbose (I usually have it in brief, and it isn’t a command I use regularly), but it makes no difference in the original test text. Granted, I have no idea what that toggle actually does, and the Orca settings have a list a mile long of options related to what Orca does or doesn’t say.

2 Likes

This topic has been covered very well, but if I were going to add anything, it would’ve been this. Thanks for such a good discussion and good questions! :smiley:

1 Like

It should be noted that this uses a different text-to-speech engine than VoiceOver (the built-in macOS screen reader). Reading the first two lines with VoiceOver will say “ten dash two equals eight”.

It will also correctly read “—” as “em-dash” and “–” as “en-dash”.

EDIT: It seems to make a big difference whether you let VoiceOver read an entire page or paragraph at once, or whether you step through the text line-by-line. The latter will spell out all the punctuation, where the former will not.

1 Like

I don’t want to derail from the important work into screen readers and accessibility, but I was amused by this. In my mind, hyphens don’t separate, they connect. I vaguely recall some rule like “dashes interrupt; hyphens connect.”

As to spacing around dashes: it’s complicated. Long ago, I was a typesetter for a newspaper. With narrow newspaper columns, there are few options for line breaks, and sometimes the spaces would grow pretty wide to keep the text justified. As a result, a full em-dash could visually look like it was joining the adjacent words the way that a hyphen would, instead of representing an interruption in the thought. So we typically set them with thin spaces on either side. (And, when I got my way, I made sure the preceding thin space was a no-break space because I hate it when the line breaks just before a dash.)
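In Unicode terms, my best guess at the equivalent today would be U+202F (narrow no-break space) before the dash and U+2009 (thin space) after it. Here is a hedged Python sketch of that; the function name is made up, the character choice is my assumption rather than what our typesetting system used, and nobody in this thread has tested how screen readers handle these spaces.

```python
import re

EM_DASH = "\u2014"
NARROW_NO_BREAK_SPACE = "\u202f"  # keeps the dash attached to the preceding word
THIN_SPACE = "\u2009"

# An em dash plus whatever whitespace currently surrounds it.
EM_DASH_WITH_SPACING = re.compile(r"\s*" + EM_DASH + r"\s*")


def thin_space_dashes(text):
    """Set each em dash with a narrow no-break space before and a thin space after."""
    return EM_DASH_WITH_SPACING.sub(
        NARROW_NO_BREAK_SPACE + EM_DASH + THIN_SPACE, text
    )


if __name__ == "__main__":
    # repr() makes the otherwise invisible space characters visible.
    print(repr(thin_space_dashes("The flower\u2014a favorite")))
```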

3 Likes