Cognitive Accessibility Research Modules - Voice Systems and Conversational Interfaces

Voice systems and conversational interfaces can create a number of cognitive accessibility issues for people with disabilities. These technologies include artificial intelligence (AI) voice assistants, natural language processing (NLP) systems, voice menu systems, and interactive voice response (IVR) systems.

This module explores:

cognitive accessibility issues that may arise when using voice systems and conversational interfaces,
key user needs and ways to address these barriers, and
areas for further research.

Research modules are primarily intended for groups in The World Wide Web Consortium (W3C). For example, they are used as source material for Making Content Usable for People with Cognitive and Learning Disabilities.

They may also contain useful information for:

researchers,
inclusion and research-related policy makers,
designers, content creators, and developers who wish to have a more in-depth understanding of inclusion issues.

Note that these modules focus on explaining the issues that lead to diverse user needs impacting web accessibility. For this reason, they may sometimes address a broader scope than web accessibility alone.

This document is part of a set of modules that describe accessibility issues for users with various disabilities that impact cognitive accessibility. See cognitive or learning disabilities research modules for other modules.

This document is part of a set of related informative publications from the Cognitive and Learning Disabilities Accessibility Task Force (COGA TF), a joint task force of:

the Accessible Platform Architectures Working Group (APA WG), and
the Accessibility Guidelines Working Group (AG WG) of the Web Accessibility Initiative.

Introduction

This module looks at cognitive accessibility issues that may arise when using voice systems and conversational interfaces. These technologies involve bi-directional communication:

The user speaks and the system talks back, or
The system speaks and the user answers by talking or pressing a button.

Disabilities that may require cognitive accessibility support

Disabilities that may require cognitive accessibility support include:

cognitive, developmental, intellectual, learning, and specific learning disabilities,
language and communication disorders,
neurodivergence,
traumatic brain injury,
age-related cognitive decline,
mental health disabilities, and
temporary impairments that affect cognitive function, such as anxiety, illness, effects of medication, treatments such as chemotherapy, and others.

Examples of specific disabilities that may require cognitive accessibility support include attention deficit hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyscalculia, mild cognitive impairment (MCI), Down syndrome, aphasia, and others.

Cognitive accessibility also supports benefit a broad range of users, including:

people who may not be fluent in a language, and
people who may have different educational or cultural backgrounds.

Voice systems and conversational interfaces

Voice systems and conversational interfaces may include:

Voice Menu Systems in telephone self-service applications that require the user to respond by pressing keys or saying words. These systems are sometimes called interactive voice response (IVR).

Voice User Interfaces (VUIs) or conversational interfaces such as Amazon Alexa, Cortana, Google Assistant, or Siri. These interfaces tend to use voice recognition, natural language processing (NLP), and artificial intelligence (AI) to understand and respond to users.

It is worth noting that many crucial systems integrate voice systems and conversational interfaces, including emergency notifications, healthcare scheduling, prescription refilling, and more. With this in mind, full accessibility needs to be supported.

An example use case of a voice system used in telephone self-service may be as follows:

Upon dialing the number, the user may be asked: "Press 1 for hours of operation, press 2 to refill your prescription, press 3 to speak to the pharmacist.” The system then waits for a response.

An example use case for a conversational interface may be as follows:

The user says the activating phrase to begin an interaction with the VUI such as “Hello, [Name of System].” The user then asks, “What is the weather forecast for today?” The VUI provides a verbal response.

Voice systems are often implemented with the W3C VoiceXML 2.0 [[voicexml20]] standard and supporting standards from the Voice Browser Working Group . [[VBWG]]

VoiceXML 2.0 has an appendix regarding accessibility that briefly discusses users with hearing and speech disabilities as well as WCAG and WAI specifications. Cognitive disabilities are mentioned once in a bullet about allowing users to control the length of time before a timeout. See VoiceXML 2.0 [[voicexml20]] Appendix H - Accessibility.

We are planning to expand this paper to include more information about challenges with internationalization and localization that users may encounter with voice systems and conversational interfaces.

This module focuses on explaining the issues that lead to diverse user needs impacting web accessibility. For this reason, it may sometimes address a broader scope than web accessibility alone.

Challenges for People with Disabilities that Impact Cognitive Accessibility

Voice systems and conversational interfaces can create a number of cognitive accessibility barriers. These technologies can create challenges due to heavy demands on memory and on the ability to understand and produce speech in real time. In particular, voice systems and VUIs can be inaccessible to people with disabilities that affect:

memory,
executive function (the brain’s ability to plan, focus, remember, and manage tasks and emotions),
reasoning,
knowledge,
processing speed.
attention,
language and auditory perception,
speech and language production and comprehension, and
mental health.

General Concerns

Voice systems and conversational interfaces depend on the users’ knowledge and abilities. Many groups fall outside the norm in these categories. In these cases, the system often fails.

Training and artificial intelligence (AI): User groups in this module's scope often have different speech patterns, vocabulary, impaired memory, executive function, and cognitive function.

For the system to work well, it must be trained with a set of training content where different abilities and impairments are well represented. This includes long-term impairments such as mild cognitive impairment (MCI), learning disabilities, intellectual disabilities, and mood disorders, as well as temporary disabilities such as stress.

Requirements, user needs, and functional needs: Similarly, when teams use user needs and functional needs in the product lifecycle, they often focus on peers or groups with biases in terms of cognitive abilities. For example, university students are often used for focus groups, but often do not represent cognitive and speech issues associated with aging. It is essential to include user needs from a diverse perspective beyond neurotypical audiences (See User Story and User Needs .)

Testing: It is also essential to test with a wide group of users with diverse cognitive abilities to determine whether the system actually works as intended. Watch for increased levels of frustration, errors, and worsening of the users’ mood.

Memory: Recalling and Responding to Prompts

General: The user needs to recall information to successfully interact with the system, such as activating phrases, as well as information presented by the system during the interaction.

Voice Menu Systems: Menus that present several choices at once may pose challenges for users with disabilities related to working memory. Such systems require users to hold multiple pieces of information, such as the number associated with an option, while processing the terms that follow. This is true of systems that require either a voice response or a key press.

Many designers assume that users can remember lists of about seven items. This assumes a typical working memory. People with impaired working memory can hold significantly fewer items simultaneously. As a result, they may not be able to use a system that requires them to compare items or remember numbers while processing words or directions.

However, with a supportive design and prompts that reduce cognitive load, voice menus can be used even by people who can only hold two or three items in their working memories.

For example, the instruction “to speak to a nurse, press 2” stands by itself and does not require remembering anything before or after. Pausing between “to speak to a nurse” and “press 2” gives users time to decide if they want to speak to a nurse before they are given the rest of the instructions. The order of the instruction is important, and so is the pacing. The goal is to give users time to process the prompt and reduce the need for memory.

Voice User Interfaces: For VUIs, users may be required to remember key phrases (such as activating phrases like “Hey, Google”) in order to operate successfully. Users who have difficulty recalling such phrases due to long-term memory impairments may not be able to operate the system.

Executive Function: Deciding When to Respond

General: If a system response is too slow, a user with disabilities related to executive functioning may not know if their input was received and may press the key or speak again.

Voice Menu Systems: The user needs to be able to decide when to act on a menu choice. If the user does not know how many options will be presented or if the system presents them too slowly, the user may make an incorrect choice based on partial information.

Reasoning: Making the Correct Selection

Voice Menu Systems: The user may need to compare similar options such as "billing," "accounts," and "sales," and decide which one is best suited to accomplish their goal. Without additional context or prior knowledge, the user is likely to select the wrong menu option.

Knowledge: Interpreting Responses

General: If responses produced by a system are not provided in clear and accessible language, the user may have difficulty interpreting them. The user may not understand the response or know if they are using the system correctly.

Processing Speed

General: Systems that time out may not give users with disabilities that affect processing speed sufficient time to interpret information and formulate a response. Advertisements and additional, unrequested information also increase the amount of processing required.

Attention: Making Correct Selections

Voice Menu Systems: The user needs to focus on the different options and select the correct one. It can be challenging to focus on long or multi-level spoken menus without written counterparts. Advertisements or other unrequested information may also make it harder for users with attention-related disabilities to stay focused on the task they are trying to complete.

Language and Auditory Perception: Correctly Interpreting Spoken Information

General: The user needs to interpret the terms and choose the one that most closely matches the user’s goal. This involves speech perception, language understanding, and time limits. The sounds of language need to be heard, interpreted, and understood within a given time.

Users with disabilities related to language and auditory perception may make mistakes in interpretation due to auditory-only input.

Speech and Language Production: Responding to Voice and Speech Recognition Systems

We identified three issues to consider that apply to all kinds of voice systems or conversational interfaces:

Timeouts: The user needs to formulate a spoken response to the prompt before the system "times out" and generates another prompt. For example: Users of assistive and augmentative communication devices (AAC) or speech-to-speech technologies may require significant additional time to respond before the system times out.

Spontaneous speech: In directed dialog systems that guide the user through a series of predefined questions and responses, the user only needs to be able to speak a word or short phrase. However, the increasing use of natural language systems means the user may need to describe their issue in their own words. This feature is an advantage for some users because it does not require them to remember menu options. But it can be problematic for users with disabilities that impact their ability to produce spontaneous speech, such as people with aphasia or autistic people for whom stress may impact spoken communication.

Speech recognition: Speech recognition systems might not work well for people whose speech sounds different. Users may not be able to interact with a system that requires verbal input but does not recognize their speech. This affects many groups such as users with MCI or Down syndrome.

Mental Health: Interacting with Voice Systems and Conversational Interfaces

General: Mental health, such as anxiety, may also impact a user’s ability to interact with voice systems or conversational interfaces. High demands on cognitive load, negative experiences with technology, and interruptions can exacerbate anxiety or frustration, and decrease a user’s ability to interact with a system. Note that many people requiring cognitive accessibility supports find their skills are reduced as anxiety and cognitive load increases.

User Story and User Needs

This list of user needs is not complete. We are also seeking feedback on the format for presenting user needs.

In the next version, we will cross-reference this section with the user needs in Making Content Usable for People with Cognitive and Learning Disabilities. New user needs that are identified in this research module will be included in the next version of Making Content Usable for People with Cognitive and Learning Disabilities.

As a user who has cognitive accessibility needs, I need to get human help, without going through a complex menu system (VoiceXML [[voicexml30]]) or a complex voice recognition menu system that relies on memory and executive function , so that I can set an appointment or find out some information.

As a user who has cognitive accessibility needs, I need to use the system without relying on a good memory, learning new information, or dealing with distraction or cognitive overload.

Related user needs

I need to complete my task without learning new terms or phrases.
I need to complete my task without remembering option values such as “press 2” for something.
When I need to choose an option, I need clear understandable information so I can hear the option before the number to select, so I do not have to remember the number while processing the words.
When I need to choose an option, I need the choices presented in a way that does not expect me to remember multiple options, criteria, or prompts at the same time.
I need all the information to make the right choice before I forget options.
I need to easily find a process to select simple help, and not multi-step help.
I need to easily find a human by pressing a reserved digit that I know (typically the number 0).
I need simple-to-navigate voice-menu systems with limited options that make sense to me, so I can identify options quickly and don’t struggle with multiple steps.
I need to know if my response was okay so I know what is going on and don’t get too frustrated.
As a user with impaired processing speed, I need sufficient time and pauses between each option so I can process what was said.
I need time to look up any information requested such as my phone number, as I do not always remember it.
I need to know where I am in the process and if the step is completed.
I need the system to recognize my speech even if it is not typical.
As a slow speaker, I need the system to wait for my response.
I need to easily go back when I make a mistake, without having to start at the beginning.
As a user who often finds menus unusable, I need usability best practices to help me navigate voice systems.
I need to spend my energy completing my task. I do not want to waste my energy while I struggle to understand other material, such as special offers or promotions.
I need to focus on my task to remember what I am doing. Please do not give me distracting information such as special offers, in the middle of my task.
I need the steps to be as short as possible so I do not get cognitive overload.
I need help identifying the right words to say in a voice menu, and the words should be the ones I would use.
I need a process that I can use that does not rely on a lot of words.
I need processes that do not rely on memory or access to information that I entered during previous steps in a process.
I need to be able to use a site without remembering or transcribing passwords, codes, and usernames.
I need to know what the next steps are and when I am finished.

Related Personas

Gopal
Maria

Possible Solutions for Meeting User Needs

Below are some ideas that might help meet the user needs above. Note that digital solutions should also meet [[WCAG21]] and [[coga-usable]]. Also, this module focuses on explaining the issues that lead to diverse user needs impacting web accessibility. For this reason, it may sometimes address a broader scope than web accessibility alone.

Ensure Human Backup is Available and Reachable

For users who are unable to use the automated voice system, it must be possible to reach a human through an easy transfer process rather than being directed to call another phone number.

For telephone self-service systems, there should be a reserved digit for requesting a human operator. The most common digit used for this purpose is "0." However, if another digit is already in widespread use in a particular country, then that digit should always be available to get to a human agent.

Systems should avoid making users reach an agent through the use of complex digit combinations. This could be enforced by requiring implementations to

use the 0 digit consistently to reach an operator, and
not allow the reserved digit (0) to mean anything other than going to an operator.

Other numbers can be used for special actions too, but there shouldn’t be too many—too many rules can be confusing and hard to remember. Also, repeating these options too often can be distracting.

Human help should be trained to support users with disabilities. Too often, human help places users back into the system they did not manage.

Allow Users To Change Settings

Make it easy to set user preferences when available, such as adjusting the settings by using natural language ("Slow down!"). Examples of customization include:

Extra time should be a user setting for both the speed of speech and for more input time.
Timed text should be adjustable (as with all accessible media).
The user should be able to extend or disable timeout as a system default on their device.
Advertisements and other extraneous information should not be read aloud as they can confuse the user, increase cognitive load, and make it harder to retain attention.
Terms used should be as simple as possible. Uncommon and unfamiliar terms should be explained.
Consider customizable activation phrases for VUIs.
Consider customizable voices, including regional and standard accents, to create familiarity for the user.

Simplify Error Recovery

Error recovery should be simple and take the user to a human operator. Error response should not end the interaction or lead to a more complex menu. Keypad or telephone-based systems should use a reserved digit to help with error recovery. Example: Setting a default for a human operator as the number 0.

Design Prompts to Reduce Cognitive Load

Give prompts in ways that reduce the cognitive load. For example, the prompt "press 1 for the help desk," requires the user to remember the digit 1 while interpreting the term “help desk.” This wording is harder to process than the prompt "for the help desk (pause): press 1" or "for the help desk (pause) or for more help (pause): press 1."

Remove Unnecessary Words

Avoid additional information such as special offers or extra text that does not support the task directly. For example, the following sentences can cause anxiety or add to the users’ cognitive load:

"We are here to help you, and we have been for the last 25 years. We enjoy helping our customers."
"Avoid unnecessary delays by answering the following questions quickly and correctly."

Ensure Voice and Speech Recognition for All Users

Natural language understanding (NLU) or natural language interpretation (NLI) and AI systems allow users to state their requests in their own words, which can help users who have difficulty remembering menu options or mapping the menu options to their goals. However, the following should be taken into account:

Many groups requiring cognitive accessibility supports have different speech patterns. Therefore, the system should be trained and tested with diverse user groups who require cognitive accessibility supports. Train and test speech recognition for users with non-typical speech patterns or speech impairments related to disability. Include contingencies for pauses, use of incorrect terms, and mistakes.
Ensure alternative access for users whose speech is not recognized. Natural language interfaces can be challenging for users who have difficulty producing speech or language. Alternative access should be available. This should include a menu dialog that can work without speech (such as via text), directed dialog (menu-based) fallback, and/or transfer to a human agent.
Make sure fallbacks are tested extensively with a diverse user group to ensure that they do reach human help. Ensure fallbacks do not result in frustrating loops.
Use existing standards for speech recognition if they’re available for the languages your system supports. For example, the European Telecommunications Standards Institute (ETSI) [[ETSI-ES202-076V211]] has a standard for voice commands in many European languages. When using existing standards, keep in mind that asking users to remember too many commands can be overwhelming.

Follow Best Practices in General Voice Interaction Design

Standard best practices in VUI apply to users with disabilities that impact cognitive accessibility, and should be followed, such as those provided by the Association for Conversational Interaction Design Wiki [[ ACIxD ]] or ETSI ETR 096 [[ETSI-ETR-096]].

Some examples of generally accepted best practices in VUI design include:

Pauses between phrases to allow processing time.
Long timeouts.
Simple error recovery, taking the user to a human operator if the error persists.
No advertisements or extraneous information.
Jargon-free and simple terms.
Ability to review the conversation as it is happening, such as through a visual log.

See the ACIxD wiki [[ACIxD]] for additional recommendations and details.

Follow Best Practices in Cognitively Accessible VUI Design

Some specific best practices for users with disabilities that impact cognitive accessibility include:

Manageable lists: Repeat and/or chunk lists into manageable groups. Each list item should be a single idea.
Reminders: Provide cues as to what the user needs to do and when.
Guided instructions: Provide instructions to guide the user through the steps in the process.
Task history: Include methods that inform the user of what they have done recently to help prevent repeating a task and to inform the user where they are in a process.
Timers: Track the user’s time on a step and provide cues or reminders if thresholds of time are crossed.
Task overview: Provide a way to attain a list of all the steps required and information needed to complete the task or process.

Follow Appropriate Legislative Requirements

For example, in Section 255 [[Title42-section255]] of the U.S. Telecommunications Act, paragraph 1193.41 Input, Controls, and Mechanical Functions clauses (g), (h) and (i) apply to cognitive disabilities and require that equipment should be operable without time-dependent controls, the ability to speak, and should be operable by persons with limited cognitive skills.

Integrate New Technology-Based Solutions

Recent technological developments may be helpful for users with disabilities that require cognitive accessibility supports.

Visual VUI: When a call comes in on a smartphone, the system can ask the user if they want to switch to a visual interface that mirrors the voice interface. This allows a user to see the prompts instead of having to remember what they heard. Where possible, supplement text with images, symbols, or icons to increase understandability.
Adaptive voice interface: This technology is sensitive to the user's behavior and changes the voice interface dynamically. For example, it can slow down or speed up to match the user's speech rate.
Tapered prompts: Best practices in VUI design include adjusting the prompts based on the user's behavior. For example, if the user takes a long time to respond to a prompt, a simpler or more explanatory version of prompts may be used instead of the default. This approach may give longer instructions with simpler words to users who need more guidance, while allowing experienced users to move quickly through the system.

Status of these solutions

Note: The proposed solutions in this document have been tested for users in the general population and have been shown to improve the usability of voice systems. Some of these solutions have been tested with users with disabilities that require cognitive accessibility supports, primarily in academic research contexts.

Currently, VoiceXML does not directly enforce accessibility for people with disabilities that impact cognitive accessibility. However, considerable literature on VUI design exists and in many cases applies to cognitive accessibility for voice systems. Developers must become aware of these resources and of the need to design systems with these users in mind.