This document outlines accessibility-related user needs, requirements and scenarios for natural language interfaces. These user needs should influence accessibility requirements in related specifications and in the design of applications that include natural language interfaces. The concept of a natural language interface is first clarified. User needs and associated requirements are then described.
This document is not a collection of baseline requirements. Some requirements may be implemented at a system or platform level and others at the application level.
Introduction
What is a Natural Language Interface?
A natural language interface is a user interface in which the user and the system communicate via a natural (human) language. The user provides input via speech or some other method, and the system generates responses in the form of utterances delivered by speech, text or some other method.
Systems that provide natural language interfaces often support spoken interaction. In this case, speech recognition processes the user's input, and speech synthesis generates spoken responses. However, the use of speech is not essential to a natural language interface.
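As a concrete illustration, the following minimal sketch wires browser speech recognition to an application and speaks the system's reply. It assumes a browser exposing the Web Speech API (which is not universally supported), and the `handleUtterance` function, which maps a user utterance to a reply, is hypothetical.

```typescript
// A minimal sketch of a spoken dialogue loop, assuming Web Speech API
// support. `handleUtterance`, which maps a user utterance to a reply,
// is a hypothetical application function.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function startSpokenDialogue(handleUtterance: (text: string) => string): void {
  const recognizer = new SpeechRecognitionImpl();
  recognizer.lang = "en-US";
  recognizer.onresult = (event: any) => {
    // Take the top hypothesis for the most recent recognition result.
    const transcript: string =
      event.results[event.results.length - 1][0].transcript;
    const reply = handleUtterance(transcript);
    // Speak the reply; a text rendering could be shown at the same time.
    speechSynthesis.speak(new SpeechSynthesisUtterance(reply));
  };
  recognizer.start();
}
```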
Typical examples of natural language interfaces include:
Voice agents that communicate via speech. These agents may run on a computing device such as a mobile phone, tablet, laptop or desktop computer. They can be embedded in specialized hardware such as consumer appliances, or the automation system of a vehicle.
Chat bots in Web applications that process natural language requests from the user. For example, a customer service application available on an organization's web site could offer a natural language user interface to handle customers' inquiries. Not all chat bots are speech-based; some text-based chat bots let people use speech via a keyboard dictation function.
Interactive Voice Response (IVR) systems that interact with the user via a telephone call. IVR systems accept speech or keypad input and generate speech output.
These examples are not definitive. Variations of the examples and applications that do not fit these patterns are possible.
Natural Language Interfaces and accessibility
Natural language interfaces can be made accessible to users with disabilities at the platform and application levels via multiple modes of input and output. For example, some users with physical disabilities may need speech input, while others may need a keyboard, switch input, an eye tracking system, or some combination.
Similarly, natural language output may be spoken or visually displayed as text. These and other requirements are detailed below. These requirements may best be satisfied by an assistive technology. For example, a chat bot that lacks a spoken interface may satisfy a user's need for speech input via a browser or operating system dictation function.
Cross disability support
For some disability types, the requirements for authors and designers are straightforward. At the heart of current accessibility testing are technical code specifications that map to accessibility requirements and can be tested and verified by checking whether certain statements are true or false. For other disability types, support may be better described as a continuum than as a binary model, and in some of these areas the criteria for such models are not yet clear. A user interface that is responsive, and can be personalized to support shifting user needs, is a good example.
Current work on accessibility guidelines and standards is moving toward accommodating these new ways of measuring more subjective accessibility requirements: requirements that support the needs of people with disabilities but are not easily measured in a binary fashion.
With this in mind, in the context of natural language interfaces it is especially important that application design support the cognitive needs of users, particularly if the interface includes speech input, because speech input is cognitively taxing in several ways.
Speech input commands must be quickly called to mind, which requires cognitive effort for experienced users and still more effort for new users. Speech input also taxes attention. As with type-ahead results, speech input results must be watched to make sure that the computer has not made a wording mistake that may be difficult to untangle later. At the same time, the language center of the brain, used for speech input, is also used for many of the thought processes involved in tasks people perform on computers, such as writing or coding. Good design can mitigate this extra cognitive effort, and is doubly important for people who have learning or cognitive disabilities. Good practices such as discoverability, ease of use and simple affordances are important considerations in making natural language interaction viable for all users, and may require particular understanding when designing these interfaces.
For example, interfaces that rely on memory pose particular challenges for people with cognitive disabilities. The design should accommodate this need by providing step-by-step instructions. Reminding users how many steps they have completed and how many more remain supports the user's memory rather than relying on it.
When speech input is used, it is important to provide command prompts as needed, so users do not have to rely on their memories to come up with commands at the same time as they are completing steps. Other user needs patterns relating to the needs of people with cognitive disabilities are described in 'Making Content Usable for People with Cognitive and Learning Disabilities'. [[content-usable]]
Voice user interfaces
Voice user interfaces (VUIs) that communicate via speech, such as those found on a range of commercially available devices for home and mobile use, form part of the stack that makes up natural language interfaces. This document aims to identify accessibility-related user needs and requirements for VUIs, and to indicate further areas of work and research concerning how they relate to new standards such as WCAG 3 and other emerging technologies.
Scope
Natural language interfaces frequently occur as components of larger user interfaces and systems. For example, a chat bot may be included in a web application. A natural language interface may be an essential part of a multi-modal application that uses a combination of language and gestural inputs. An example would be an interactive navigation tool that allows the user to issue spoken commands and to interact with a graphical map with a pointing device [[multimodal-interactive-maps]]. A natural language interface may be used to operate an underlying service that is also available via alternative user interfaces. For example, a weather forecast reporting application may offer a Web-based API that functions as the basis of a natural language interface and an independent, graphical interface reachable via a Web page or via a desktop or mobile application.
This document addresses the accessibility of the natural language aspect of the over-all user interface. It is concerned with the accessibility of natural language interactions to users with disabilities. If any other (non-linguistic) user interface is offered for the same underlying application or service, its accessibility to users with disabilities is independent of the user needs and requirements collected here. To make such alternative interfaces accessible, Web Content Accessibility Guidelines [[wcag21]] should be consulted, in the first instance.
This document does not address the policy question of whether a natural language interface should satisfy the accessibility requirements identified here if there are alternative, accessible user interfaces available that offer exactly the same functionality (i.e., the same application or service) to the user. For example, whether a natural language interface must offer non-speech input and output methods when its entire functionality is available via an alternative, Web-based user interface or mobile application depends on policy and regulatory issues that lie outside the scope of this document. Rather, the assumption here is that the natural language interface is to be made accessible to people with disabilities. The user needs and system requirements provide insight into how this can be achieved.
Services and agents
Behind these interfaces are services that provide core processing, evaluation and content. This document aims to examine these services and determine to what degree they can and should support the needs of people with disabilities, what the system requirements are, and where further research is needed.
Ideally, by satisfying system requirements, developers of platforms and applications offering natural language interfaces can meet corresponding user needs. Currently, this document takes no stance regarding which needs are best satisfied at the platform level, by an assistive technology, or in the development of applications, but this may change as the document develops. These architectural considerations are left to system designers, who should therefore be aware of the requirements of accessible system design. Often, these considerations also depend on the services provided by the underlying operating system or by the web platform.
If natural language interaction is provided as part of a system that also offers other styles of interaction, this document should be read in combination with guidance provided elsewhere which is relevant to the other interface and service aspects. Notably,
RTC Accessibility User Requirements (RAUR) [[raur]] identifies user needs and corresponding requirements for the accessibility of real-time communication applications, such as video conference tools and web-based telephony systems.
XR Accessibility User Requirements (XAUR) [[xaur]] identifies user needs and corresponding requirements for the accessibility of virtual reality and augmented reality.
User Agent Accessibility Guidelines (UAAG) [[uaag]] raises an open question: if we consider the service behind the interface, which parts of UAAG are relevant to these particular services?
As a general principle, the entire interface of a system or application needs to be accessible to users with disabilities. If only the natural language interaction component is accessible, some users will be unable to complete tasks successfully. For example, a smart agent that answers a user's questions by searching the web for information and then displaying it on screen is only accessible as a whole if both the interaction and the presentation of the information satisfy the user's access needs. If the on-screen information is not accessible, then the user cannot complete the task of acquiring and understanding the information requested.
User need definition
The term 'user needs' in this document relates to what people with various disabilities need to successfully use natural language interfaces. User needs are dependent on the context in which an application is used, including the user's capabilities and the environmental conditions in which interaction with the interface takes place. For example, a spoken interaction would be inaccessible to a person who is deaf, or to a hearing person situated in a noisy environment. Although disability-related needs are the focus of this document, the user needs described here are not limited to people with specific types of disability. The capabilities of users vary greatly. They include a variety of physical, sensory, learning and cognitive abilities that should be taken into account in the design of platforms and applications.
User needs and requirements
This section outlines a variety of user needs and system requirements that can satisfy them.
User identification and authentication
User Need 1: A user with a physical disability needs to use speech as the only means of communicating with a system that can be shared with other users. Due to security and privacy requirements, each user must be authenticated individually.
REQ 1: Support voice identification as a means of biometric authentication.
To achieve adequate security, voice identification may need to be combined with other factors of authentication.
User Need 2: A user who is deaf or who has a speech disability needs to interact with a system that can be shared with other users. Due to security and privacy requirements, each user must be authenticated individually.
REQ 2a: Support a means of biometric authentication other than voice identification.
REQ 2b: Support a non-biometric means of authentication, such as a hardware security token.
In some cases, this requirement can be met simply by using authentication mechanisms provided by the underlying operating system or browser environment.
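For instance, in a browser environment the Web Authentication API (WebAuthn) can satisfy REQ 2b without any voice interaction. The sketch below is illustrative only: it assumes the challenge bytes are issued by a server, and credential registration and server-side verification are omitted.

```typescript
// A sketch of non-voice authentication (REQ 2b) using the Web
// Authentication API (WebAuthn). The challenge would normally be issued
// by a server; registration and verification are omitted here.
async function authenticateWithoutVoice(
  challenge: Uint8Array
): Promise<Credential | null> {
  return navigator.credentials.get({
    publicKey: {
      challenge,                    // server-issued anti-replay nonce
      userVerification: "required", // e.g., fingerprint, PIN, security key
      timeout: 60_000,
    },
  });
}
```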
Means of input and output
User Need 3: Different users have a need for different input devices or mechanisms. For example, a person with a physical disability may need speech input, single switch input, eye tracking input or a combination of these. A person who is deaf or who has a speech disability may need to use keyboard input.
REQ 3: Support multiple input devices and methods, either within the natural language interface(s) or via alternatives. Make sure these methods do not interfere with each other, so they can be used in combination.
This requirement can often be met by supporting the input methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple input mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker may support only speech input, whereas the same smart agent running on a mobile system such as a phone or tablet may support text input via a keyboard or any device capable of emulating a keyboard.
See the requirement to support a keyboard interface specified in WCAG 2.1 [[WCAG21]], success criterion 2.1.1.
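One way to meet REQ 3 in a web-based chat bot is to route keyboard and speech input through the same code path so that neither blocks the other, as in the sketch below; `submitUtterance` is a hypothetical application entry point.

```typescript
// A sketch routing keyboard and speech input through one entry point
// (REQ 3), so the two methods can be used in combination.
// `submitUtterance` is a hypothetical application function.
function wireInputs(
  textField: HTMLInputElement,
  recognizer: any, // a SpeechRecognition instance, where supported
  submitUtterance: (text: string) => void
): void {
  // Keyboard path (see WCAG 2.1 SC 2.1.1 on keyboard operability).
  textField.addEventListener("keydown", (event: KeyboardEvent) => {
    if (event.key === "Enter" && textField.value.trim() !== "") {
      submitUtterance(textField.value);
      textField.value = "";
    }
  });
  // Speech path: recognized utterances reach the same handler.
  recognizer.onresult = (event: any) => {
    submitUtterance(event.results[event.results.length - 1][0].transcript);
  };
}
```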
User Need 4: Users need different output devices or mechanisms. For example, a user who is blind may require speech output. A user who is deaf, or who has a speech disability, may require visually displayed text output. A user who is deaf-blind may require braille output.
REQ 4: Support multiple output devices and methods, either within the natural language interface(s) or via alternatives.
This requirement can often be met by supporting the output methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple output mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker supports only audio/sound output, whereas the same smart agent running on a mobile system, such as a phone or tablet, may support a visual display as well, and be compatible with braille devices.
User Need 5: A user needs to use the same input and output mechanisms to complete an entire task involving an interaction with the system.
REQ 5: Provide a mode of operation in which the user does not need to switch from one input or output mechanism to another partway through completing an interactive task.
User Need 6: A user who is deaf or hard of hearing needs to provide speech input to an application, while having the output presented as text.
REQ 6: Support a mode of operation in which the user can speak to the system, and system natural language output is presented visually or conveyed via assistive technology.
User Need 7: A user needs linguistic information presented both as speech and as text to be adequately perceived or understood.
REQ 7: Provide a mode of operation in which the spoken output of the system is accompanied by a synchronized text transcript.
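A simple approach to REQ 7 is to append each system utterance to a visible transcript at the moment it begins to be spoken, as sketched below; `transcriptLog` is a hypothetical container element.

```typescript
// A sketch of REQ 7: spoken output accompanied by a synchronized text
// transcript. `transcriptLog` is a hypothetical container element.
function speakWithTranscript(text: string, transcriptLog: HTMLElement): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onstart = () => {
    const entry = document.createElement("p");
    entry.textContent = text;
    transcriptLog.appendChild(entry); // transcript appears as speech begins
  };
  speechSynthesis.speak(utterance);
}
```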
User Need 8: A user with a speech disability needs to provide textual input to the system, but can effectively perceive and understand spoken information.
REQ 8: Provide a mode of operation in which keyboard or other forms of textual input can be given, in combination with speech output.
User Need 9: A user who is deaf-blind needs to communicate with the system via a refreshable braille device.
REQ 9: Support a mode of operation in which input and output are both provided as text.
Support for braille displays is assumed to be provided by a screen reader running under the device's operating system. Therefore, support for keyboard input and textual output is the stated requirement for the natural language interface itself, leaving interaction with the braille hardware to the operating system on which the user interface is run.
Communicating in a language that the user needs
User Need 10: A user who is deaf or hard of hearing needs to communicate with the system in a sign language.
REQ 10a: Provide a mode of operation in which sign language presented by the user is recognized and processed by the system.
REQ 10b: Provide a mode of operation in which the system's output is presented visually in a sign language.
REQ 10c: As an alternative to requirements 10a and 10b, provide a mode of operation in which a human sign language interpreter relays communication between the user and the system.
At present, it is generally infeasible to implement REQ 10a and REQ 10b with sufficient reliability and accuracy to be useful. Sign language processing (including automatic recognition, translation, and production of sign languages) involves challenging research problems. See [[Bragg-et-al]] for details. These two requirements are nevertheless stated here to encourage further research and development efforts.
Sign languages vary by country and region. Therefore, multiple sign languages may need to be supported, depending on the intended audience of the system.
User Need 11: A user with a learning or cognitive disability needs to communicate with the system in a symbol set supported by a particular augmentative and alternative communication (AAC) assistive technology.
REQ 11: Provide a mode of operation in which symbols presented by the user are recognized and processed by the system, and in which the system's output is expressed in the user's AAC symbol set.
Speech recognition and speech production
User Need 12: A user with atypical speech characteristics needs to provide spoken input to the system.
REQ 12a: Ensure that the system can recognize atypical varieties of speech with adequate accuracy to enable the application to be successfully used.
REQ 12b: Provide a mode of operation in which the system is trained to recognize a particular user's speech more accurately than it can without training.
User Need 13: A user with atypical speech characteristics needs opportunities to correct the system's speech recognition errors.
REQ 13a: Enable the system to estimate the probability that a user's utterance has been recognized correctly.
REQ 13b: If the system's confidence in its recognition of the user's utterance is below a reasonable threshold, prompt the user to repeat or confirm the request made or the information spoken.
REQ 13c: Allow the user to decide at any time to switch input methods, even if a spoken dialogue is already in progress.
REQ 13d: Allow the user to correct speech input using speech, such as a simple spoken command.
REQ 13b is an appropriate strategy only if the system's confidence measure is strongly correlated with its actual recognition accuracy for people with speech-related disabilities. This correlation should be established empirically, in a variety of real use contexts, before relying on this approach. Otherwise, the system's prompts to repeat or confirm input will be insufficiently associated with cases of genuine recognition error.
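The sketch below illustrates REQ 13a and REQ 13b using the confidence score exposed by the Web Speech API. The 0.7 threshold is an arbitrary example, not a validated value; as noted above, any real threshold would need empirical validation for speakers with atypical speech. `askUserToConfirm` and `accept` are hypothetical callbacks.

```typescript
// A sketch of REQ 13a/13b: prompt for confirmation when recognition
// confidence is low. The 0.7 threshold is an illustrative assumption;
// `askUserToConfirm` and `accept` are hypothetical callbacks.
const CONFIDENCE_THRESHOLD = 0.7;

function handleRecognitionResult(
  event: any, // a SpeechRecognition result event
  askUserToConfirm: (heard: string) => void,
  accept: (heard: string) => void
): void {
  const alternative = event.results[event.results.length - 1][0];
  if (alternative.confidence < CONFIDENCE_THRESHOLD) {
    // Low confidence: ask the user to repeat or confirm (REQ 13b).
    askUserToConfirm(alternative.transcript);
  } else {
    accept(alternative.transcript);
  }
}
```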
User Need 14: A user needs to adjust the speaking rate, volume or pitch of speech generated by the system in order to understand it well or to interact more efficiently.
REQ 14: Provide an accessible user interface with which the speaking rate, pitch and volume of speech generated by the system can be configured.
To ensure this user interface is accessible, it should satisfy relevant accessibility requirements drawn from this document or elsewhere. For example, a system could provide spoken commands, and a settings dialogue in a graphical user interface, as alternative mechanisms for configuring speech properties.
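In a browser, REQ 14 can be approached through the Web Speech API's utterance properties, as in the sketch below; the `SpeechSettings` shape is an assumption that a settings dialogue or spoken command could populate.

```typescript
// A sketch of user-configurable speech properties (REQ 14). The
// `SpeechSettings` shape is an assumption; the value ranges follow the
// Web Speech API: rate 0.1-10, pitch 0-2, volume 0-1.
interface SpeechSettings {
  rate: number;   // e.g., 0.8 for slower speech
  pitch: number;
  volume: number;
}

function speakWithSettings(text: string, settings: SpeechSettings): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  utterance.volume = settings.volume;
  speechSynthesis.speak(utterance);
}
```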
Visually displayed text
User Need 15: A user who has low vision or a learning disability needs to adjust the font style or spacing of text displayed by the system.
REQ 15: Ensure that font properties and text spacing are configurable by the user, including font size, font style, character, word, line and paragraph spacing.
In some cases, this requirement can be met by capabilities of the operating system or browsing environment.
See the text spacing requirement specified in WCAG 2.1 [[WCAG21]], success criterion 1.4.12.
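A common implementation pattern for REQ 15, sketched below, exposes font and spacing values as CSS custom properties that user preferences override; the property names are illustrative, not a standard.

```typescript
// A sketch of user-adjustable text presentation (REQ 15) using CSS
// custom properties. The property names are illustrative, and the
// application's stylesheet is assumed to reference them, e.g.:
//   .transcript { font-size: var(--chat-font-size); }
interface TextSettings {
  fontSizePx: number;
  lineHeight: number;      // unitless multiplier, e.g., 1.5
  letterSpacingEm: number; // e.g., 0.12
}

function applyTextSettings(root: HTMLElement, s: TextSettings): void {
  root.style.setProperty("--chat-font-size", `${s.fontSizePx}px`);
  root.style.setProperty("--chat-line-height", String(s.lineHeight));
  root.style.setProperty("--chat-letter-spacing", `${s.letterSpacingEm}em`);
}
```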
Designing for understanding and effective use
Understanding how to interact with the interface
User Need 16: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to know what the system can do and how to ask the system to do it.
REQ 16a: Provide commands so the user can request help or instructions.
REQ 16b: Provide commands that give an overview of what the system can do.
REQ 16c: Provide documentation, in a form that satisfies accessibility guidelines, that explains and gives examples of how to use the system.
This need is particularly applicable to systems that can serve a wide range of requests, such as personal assistants. All users need to know how to interact with a system before they can start using it. It is especially important for people with cognitive disabilities that designs make two things obvious: what the system can do and how to go about doing it.
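A dialogue system might reserve help and overview intents (REQ 16a and REQ 16b) and check them before any other interpretation, as in this sketch; the capability list and matching phrases are assumptions for illustration.

```typescript
// A sketch of always-available help and overview commands (REQ 16a/16b).
// The capability list and matching phrases are illustrative assumptions.
const CAPABILITIES = ["check the weather", "set a reminder", "answer questions"];

function handleBuiltInCommands(utterance: string): string | null {
  const text = utterance.toLowerCase();
  if (/\b(help|instructions)\b/.test(text)) {
    return "You can ask me to " + CAPABILITIES.join(", ") +
      ". Say 'what can you do' for an overview.";
  }
  if (text.includes("what can you do")) {
    return "Here is an overview: I can " + CAPABILITIES.join("; ") + ".";
  }
  return null; // not a built-in command; continue normal interpretation
}
```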
User Need 17: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to know how to interact with it to achieve a particular goal.
REQ 17a: Provide prompts or menus of options that inform the user of what choices are available and what information is requested at each step of a dialogue with the system.
REQ 17b: Provide commands or menu options for requesting explanations and instructions that help the user to complete tasks successfully.
User Need 18: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to use it without having to learn specific commands, requests, phrases or vocabulary.
REQ 18a: Design the system to respond appropriately to a variety of alternative words, phrases and sentences that may be used to ask the same question, to give the same command, or to supply the same information.
REQ 18b: Design the system to respond appropriately to words and phrases that are likely to be familiar to users of other systems with similar features.
REQ 18c: Enable users to suppress or change commands and utterances recognized by the system, to save them, and to share these customizations with other users. This allows an individual user to configure a set of recognized utterances that is familiar to them, and to import customizations created for similar systems.
REQ 18d: If the user's input is ambiguous or cannot be processed, prompt for clarification or additional information, or present a menu of relevant choices.
Commands for performing a variety of functions typically supported by speech interfaces used for telephony and multimedia applications are standardized in [[ETSI-ES-202-076]].
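REQ 18a and REQ 18b can be approached by mapping many phrasings onto one intent, as in the sketch below. Real systems typically use trained natural language understanding models rather than literal patterns, so this table is only illustrative.

```typescript
// A sketch of REQ 18a/18b: several alternative phrasings map to the
// same intent. Production systems usually use trained NLU models; this
// literal pattern table is for illustration only.
const INTENT_PATTERNS: Array<{ intent: string; patterns: RegExp[] }> = [
  {
    intent: "weather.today",
    patterns: [
      /what'?s the weather( like)?( today)?/i,
      /will it rain today/i,
      /today'?s forecast/i,
    ],
  },
];

function matchIntent(utterance: string): string | null {
  for (const { intent, patterns } of INTENT_PATTERNS) {
    if (patterns.some((p) => p.test(utterance))) return intent;
  }
  return null; // unmatched: clarify or offer a menu (REQ 18d)
}
```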
User Need 19: A user with a learning or cognitive disability needs to review information, prompts or questions before deciding how to respond.
REQ 19a: Design the system to comply with a user's requests for its natural language output (e.g., spoken utterances) to be repeated.
REQ 19b: Summarize or present information that has been supplied by the user, then ask the user for confirmation, before performing irreversible actions such as financial transactions.
REQ 19c: If the text of the dialogue between the user and the system is presented in writing (e.g., on screen or via a braille device), ensure that the user can review the entire history of the conversation (scrolling the display, if necessary).
See WCAG 2.1 [[WCAG21]], success criterion 3.3.4.
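REQ 19b implies a confirmation turn before committing anything irreversible, as in the sketch below; `say` and `listenForYesNo` are hypothetical dialogue primitives supplied by the host application.

```typescript
// A sketch of REQ 19b: summarize and confirm before an irreversible
// action. `say` and `listenForYesNo` are hypothetical dialogue
// primitives supplied by the host application.
async function confirmThenExecute(
  summary: string,
  execute: () => Promise<void>,
  say: (text: string) => Promise<void>,
  listenForYesNo: () => Promise<boolean>
): Promise<void> {
  await say(`You asked me to ${summary}. Should I go ahead?`);
  if (await listenForYesNo()) {
    await execute();
    await say("Done.");
  } else {
    await say("Cancelled. Nothing has been changed.");
  }
}
```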
Giving users enough time to interact
User Need 20: A user with a learning or cognitive disability needs ample time to decide how to respond during a dialogue with the system.
REQ 20a: Unless there are compelling reasons to the contrary, do not limit the amount of time available for the user to respond.
REQ 20b: If a time limit is unavoidable, allow the length of the time limit to be adjusted or the limit to be eliminated, or prompt the user to extend it before it expires.
REQ 20c: Warn users of time limits before any period of time that is subject to a limit begins.
REQ 20d: Provide a mode of operation in which the system reminds the user periodically that it is waiting for input, and of any time limit that has been imposed.
The mode of operation described in REQ 20d may be distracting or anxiety-provoking for some users. Therefore, it should be optional.
See WCAG 2.1 [[WCAG21]], success criteria 2.2.1, 2.2.3, and 2.2.6.
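The sketch below shows one shape REQ 20a through 20c could take: a response timer that warns before expiry and lets the user extend or remove the limit. The durations are arbitrary examples, and `offerExtension` is a hypothetical prompt that resolves to true if the user asks for more time.

```typescript
// A sketch of an adjustable response time limit (REQ 20a-20c). The
// durations are arbitrary examples; `offerExtension` is a hypothetical
// prompt resolving to true if the user asks for more time. A real
// implementation would also cancel the timer when input arrives.
function startResponseTimer(
  limitMs: number | null, // null imposes no limit at all (REQ 20a)
  offerExtension: () => Promise<boolean>,
  onExpired: () => void
): void {
  if (limitMs === null) return;
  const warnBeforeMs = 20_000; // warn 20 s before expiry (REQ 20c)
  setTimeout(async () => {
    if (await offerExtension()) {
      startResponseTimer(limitMs, offerExtension, onExpired); // REQ 20b
    } else {
      setTimeout(onExpired, warnBeforeMs);
    }
  }, Math.max(limitMs - warnBeforeMs, 0));
}
```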
Communicating in language that is clear, simple, and appropriate to the audience
User Need 21: Users, especially those who have learning or cognitive disabilities, need the system to use language that is clear and comprehensible to them.
REQ 21a: Use language (including vocabulary and syntax) that is no more complex than is necessary for clear communication.
REQ 21b: Use vocabulary (including terminology) that is reasonably predicted to be familiar to the intended users of the system, including users who may have learning or cognitive disabilities.
REQ 21c: Provide a mode of operation in which simpler language than the default can be requested.
REQ 21d: Provide definitions or explanations of terms that are likely to be unfamiliar to intended users of the system, including users who may have learning or cognitive disabilities.
See WCAG 2.1 [[WCAG21]], success criteria 3.1.3, 3.1.4, and 3.1.5.
User Need 22: Users, especially those who have learning or cognitive disabilities, need the system to use language that is appropriate to their social and cultural context in order to be clear and understandable.
REQ 22a: Provide a mode of operation in which the use of language, including terminology, currency, units of measure, and date and time formats, is localized according to the user's preferences.
REQ 22b: By default, localize the use of language, including terminology, currency, units of measure, and date and time formats, to the user's country and region.
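In JavaScript environments, REQ 22a and REQ 22b map naturally onto the standard Intl API, as sketched below; the locale would come from the user's stated preference, falling back to the environment's default.

```typescript
// A sketch of locale-sensitive output (REQ 22a/22b) using the standard
// Intl API. The locale comes from a stated user preference, falling
// back to the environment's default (REQ 22b).
function formatForUser(
  amount: number,
  currency: string,
  when: Date,
  preferredLocale?: string
): string {
  const locale = preferredLocale ?? navigator.language;
  const money = new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
  }).format(amount);
  const date = new Intl.DateTimeFormat(locale, {
    dateStyle: "long",
    timeStyle: "short",
  }).format(when);
  return `${money}, due ${date}`;
}

// formatForUser(1250.5, "EUR", new Date(), "de-DE") yields something
// like "1.250,50 €, due 12. Mai 2025 um 14:30".
```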
Pronunciation
User Need 23: Users, especially those who have learning or cognitive disabilities, need spoken language to be pronounced correctly in order to be understood.
REQ 23a: Provide a mode of operation in which the pronunciation (e.g., accent) of spoken language is localized according to the user's preferences.
REQ 23b: By default, localize the pronunciation of spoken language according to the user's country and region.
REQ 23c: Ensure that spoken text is pronounced correctly, including names, rarely occurring words, and words that have different pronunciations depending on context.
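Many speech synthesis services accept SSML, whose phoneme element addresses REQ 23c directly (the browser speechSynthesis interface has little SSML support). The sketch below builds an SSML fragment; `synthesize` stands in for whatever text-to-speech service is assumed and is not a real API.

```typescript
// A sketch of explicit pronunciation control via SSML (REQ 23c). The
// <phoneme> element is standard SSML; `synthesize` stands in for an
// assumed text-to-speech service call and is not a real API.
function ssmlWithPronunciation(): string {
  return `<speak>
    The composer <phoneme alphabet="ipa" ph="ˈʃoʊpæn">Chopin</phoneme>
    is often mispronounced by default voices.
  </speak>`;
}

// synthesize(ssmlWithPronunciation(), { locale: "en-US" });
```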
Avoiding and recovering from input errors
User Need 24: Users, especially those who have learning or cognitive disabilities, need opportunities to correct data entry errors using the input method of their choice.
REQ 24a: Check information provided by the user for errors.
REQ 24b: If errors are detected that can be automatically corrected with high reliability, make the correction and then prompt the user to confirm the information provided.
REQ 24c: For errors that cannot be reliably and automatically corrected, provide an explanation to the user and request valid information.
REQ 24d: Provide suggestions for correcting the error, if there is a known and relatively short list of alternative, valid responses.
See WCAG 2.1 [[WCAG21]], success criteria 3.3.1, 3.3.3, 3.3.4, and 3.3.6.
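The sketch below illustrates REQ 24c and REQ 24d for a single slot, a month name, where the list of valid responses is short and known; the validation and suggestion logic is an illustrative assumption.

```typescript
// A sketch of REQ 24c/24d for one slot, a month name: explain the
// problem and suggest nearby valid values from the short known list.
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

function validateMonth(input: string): { ok: boolean; message: string } {
  const normalized = input.trim().toLowerCase();
  if (MONTHS.some((m) => m.toLowerCase() === normalized)) {
    return { ok: true, message: "" };
  }
  // REQ 24d: suggest valid responses that resemble the input.
  const suggestions = MONTHS.filter((m) =>
    m.toLowerCase().startsWith(normalized.slice(0, 2)));
  const hint = suggestions.length > 0
    ? ` Did you mean ${suggestions.join(" or ")}?`
    : " Please give a month name, such as March.";
  // REQ 24c: explain the error and request valid information.
  return { ok: false, message: `I didn't recognize "${input}" as a month.${hint}` };
}
```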
User Need 25: Users, especially those with learning or cognitive disabilities, need opportunities to avoid making errors that are irrevocable.
REQ 25: Provide means of reversing actions that can be made reversible.
See WCAG 2.1 [[WCAG21]], success criterion 3.3.6.
Using multimodal interfaces to enhance understanding
User Need 26: Some users with learning disabilities need textual information to be spoken and presented in written form simultaneously.
REQ 26: Provide a mode of operation in which textual information is spoken and presented on screen concurrently, with synchronized visual highlighting of the text as it is spoken.
The purpose of this multimodal presentation of text is to enhance comprehension of the material, especially by people with learning disabilities that affect reading.
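In browsers, the Web Speech API's boundary events make the synchronized highlighting of REQ 26 feasible, as sketched below; boundary-event support varies across synthesis voices, so this is a best-effort illustration.

```typescript
// A sketch of REQ 26: speak text while highlighting each word as it is
// spoken, using Web Speech API boundary events (not emitted by every
// voice). `display` is an element that shows the same text.
function speakAndHighlight(text: string, display: HTMLElement): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event: SpeechSynthesisEvent) => {
    if (event.name !== "word") return;
    const start = event.charIndex;
    const spaceAt = text.indexOf(" ", start);
    const end = spaceAt === -1 ? text.length : spaceAt;
    // Re-render with the current word wrapped in a highlight element.
    display.innerHTML =
      text.slice(0, start) +
      `<mark>${text.slice(start, end)}</mark>` +
      text.slice(end);
  };
  speechSynthesis.speak(utterance);
}
```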
User Need 27: Some users with learning or cognitive disabilities need graphical content that complements and reinforces the meaning of textual information.
REQ 27: If appropriate graphical conventions exist for presenting information that is provided to the user, then display a graphical presentation in addition to any textual (e.g., spoken) output.
Information presented graphically must also be available as text. See 'Means of input and output' above.