We have just run our first user test of an accessible user-interface prototype with blind and visually impaired testers. Despite my efforts to follow the various WCAG recommendations and specifications, there were major problems.
By far the most important of them was the confusion that arose when the screen reader announced changes to the aria-live regions, and those spoken reports ran straight into the reading of the UI's accessible names in response to keyboard navigation.
I want to point out that our product is a simulator for first aid training. (Spoiler: Act fast or the patient dies).
It's a training product for a very large audience, but it's much more like a game than a web page, although we use the browser as the engine.
A live region might say "A paramedic has entered the room" or "The patient has opened his eyes". These announcements occur as indirect (and delayed) responses to the user's actions.
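For concreteness, here is roughly the pattern we are using today. This is a simplified sketch only; the element, the class name and the helper function are invented for this post, not our actual code:

```typescript
// Simplified sketch of our current approach (names invented for this post).
// A single polite live region receives the in-fiction event text.
const diegeticRegion = document.createElement("div");
diegeticRegion.setAttribute("role", "status");       // role=status implies a polite region
diegeticRegion.setAttribute("aria-live", "polite");  // explicit, for older assistive tech
diegeticRegion.className = "visually-hidden";        // hypothetical off-screen-only class
document.body.appendChild(diegeticRegion);

// Called by the simulation whenever something happens in the fictional world.
function announceDiegetic(text: string): void {
  // Replacing the text content is what triggers the screen reader announcement.
  diegeticRegion.textContent = text;
}

// Indirect, delayed responses to the user's actions
// (in reality these arrive seconds apart, driven by the simulation loop):
announceDiegetic("A paramedic has entered the room");
announceDiegetic("The patient has opened his eyes");
```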
Our testers kept looking for a relationship between these kinds of announcements and their own keystrokes. I'm sure the real-life situation is confusing and rushed, but at least there you can tell the difference between your own hands and everyone else's.
The main problem (in my opinion) is that these two semantically distinct sets of content were read in exactly the same synthesized voice, with no gaps between them. The result was a cacophony: button labels were read in one continuous flow with reports of what was happening in the simulated world.
Such confusion does not arise for sighted users of our product, because the fictional / diegetic / simulated world simply looks "different" from the graphical interface used to interact with it.
I am convinced that we can achieve understandable UI behavior, but I am quite puzzled as to how we could use aria-live regions for content that updates more than about once per second without the whole experience collapsing into cacophony.
I used "polite" aria-live regions, a setting that promises (according to the spec) to allow some kind of orderly relationship between the different kinds of content, rather than a competing babble of word salad.
Most of the discussion of aria-live seems to assume "page"- or "document"-style content. I followed its recommendations and the result was disappointing enough that I am now looking for alternatives. There is a bit of a scene developing around "accessible games", but it seems to consist mostly of players rather than developers, and discussions of techniques and implementations are almost as rare as rocking-horse dung.
I know that there is a (contentious) effort to get screen readers to support CSS3 Speech, so that different semantics could be "styled" with different voices.
This would be a very good (and standard!) solution to our problem, but the screen reader "community" (developers, engineers and users) seems to consider it a low-priority feature or to actively oppose it (for reasons that generally do not apply in our case). There is certainly no implementation we can reasonably rely on.
So the question I'm asking is this: how do you design the UX of a relatively fast-paced, "game"-like application so that live-region announcements from the in-fiction (diegetic) world sound different from the user interface?
I have some ideas.
Handle in-fiction / diegetic content with our own accessible audio (e.g., prerecorded MP3s) rather than being at the mercy of however the screen reader handles aria-live. (More audio is more?)
Prefix in-fiction / diegetic content with a distinct "beep" or other short sound effect. (There is a rough sketch of these first two ideas just after this list.)
Try to choreograph the changes made to the aria-live regions so that they interfere much less with the reading of UI labels ("polite" on steroids; a second sketch below shows the sort of batching I have in mind).
Offer a special "training level" so that screen reader users can explore the user interface without the simultaneous urgency of saving an imaginary patient's life.
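For the first two ideas, I'm imagining something along these lines. Again, only a rough sketch: the earcon parameters, the file paths and the fallback to the existing live region are assumptions on my part, not anything we've tested.

```typescript
// Rough sketch of the first two ideas: give diegetic events their own audio identity.
// File paths, frequencies and timings are invented; none of this has been user-tested.
declare function announceDiegetic(text: string): void; // from the live-region sketch above

const audioCtx = new AudioContext(); // note: browsers usually require a user gesture first

// Second idea: a short, distinctive earcon played before any diegetic announcement.
function playEarcon(): void {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.frequency.value = 880; // a brief high "ping"
  gain.gain.setValueAtTime(0.2, audioCtx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, audioCtx.currentTime + 0.15);
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + 0.15);
}

// First idea: bypass the screen reader entirely and play our own recorded narration.
function playDiegeticClip(src: string): Promise<void> {
  const clip = new Audio(src); // e.g. a prerecorded MP3 of the event line
  return clip.play();          // resolves once playback has started
}

// A diegetic event either plays its own clip, or plays the earcon and then
// falls back to the polite live region from the earlier sketch.
async function diegeticEvent(text: string, clipSrc?: string): Promise<void> {
  if (clipSrc) {
    await playDiegeticClip(clipSrc);
  } else {
    playEarcon();
    announceDiegetic(text);
  }
}
```

The earcon would have to be short enough not to delay the spoken text, and the prerecorded route obviously multiplies our authoring and localisation costs.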
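And for the choreography idea, what I have in mind is essentially a queue that batches diegetic announcements and only releases them after a quiet gap in keyboard navigation. The interval values here are guesses:

```typescript
// Rough sketch of the choreography idea: batch diegetic announcements and only
// release them when the user has paused their keyboard navigation for a moment.
// The interval values below are guesses, not tested numbers.
declare function announceDiegetic(text: string): void; // from the live-region sketch above

const pending: string[] = [];
let lastKeyTime = 0;

document.addEventListener("keydown", () => {
  lastKeyTime = Date.now(); // the user is busy exploring the UI
});

// The simulation queues world events instead of writing them straight to the region.
function queueDiegetic(text: string): void {
  pending.push(text);
}

// Flush at most once every few seconds, and never right after a keystroke,
// so that world reports don't run straight into button-label reads.
setInterval(() => {
  const quietForMs = Date.now() - lastKeyTime;
  if (pending.length === 0 || quietForMs < 1500) {
    return;
  }
  // Join the backlog into one announcement instead of a rapid-fire series.
  announceDiegetic(pending.join(". "));
  pending.length = 0;
}, 3000);
```

The obvious risk is that batching blunts the urgency the simulation depends on, which is partly why the training-level idea still appeals to me as a complement.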
Can anyone tell me whether any of these are obvious non-starters, and perhaps suggest other areas to explore?