Abstract
This document outlines various accessibility-related user needs, requirements and scenarios for real-time communication (RTC). These user needs should drive accessibility requirements in various related specifications and in the overall architecture that enables RTC. The document first introduces a definition of RTC as used throughout and outlines how RTC accessibility can support the needs of people with disabilities. It defines the term user needs as used throughout the document and then lists a range of these user needs and their related requirements. Following that, some quality-related scenarios are outlined. Finally, a data table maps the user needs contained in this document to related use case requirements found in other technical specifications.
This document is most explicitly not a collection of baseline requirements. It is also important to note that some of the requirements may be implemented at a system or platform level, and some may be authoring requirements.
Introduction
What is real-time communication (RTC)?
Real-time communication (RTC) is an evolution beyond the traditional data exchange model of client to server, resulting in real-time peer-to-peer audio, video and data exchange directly between supported user agents. This enables applications for instantaneous video, text and audio calls, text chat, file exchange, screen sharing and gaming, all without the need for browser plug-ins. While RTC applications are enabled in the main by specifications like WebRTC, WebRTC is not the sole specification with responsibility to enable accessible real-time communication applications. The use cases and requirements are broad, for example as outlined in the IETF RFC 7478 'Web Real-Time Communication Use Cases and Requirements' document. [[ietf-rtc]] [[webrtc]]
Real-time communication and accessibility
RTC has the potential to allow improved accessibility features that will support a broad range of user needs for people with a wide range of disabilities. These needs can be met through improved audio and video quality, audio routing, captioning, improved live transcription, transfer of alternate formats such as sign-language, text-messaging / chat, real time user support and status polling.
RTC accessibility is enabled by a combination of technologies and specifications such as those from the Media Working Group, Web and Networks Interest Group, Second Screen, and Web Audio Working Group as well as AGWG and ARIA. The Accessible Platform Architectures Working Group (APA) hopes this document will inform how these groups meet various responsibilities for enabling accessible RTC, as well as updating related use cases in various groups. For examples, view the current work on WebRTC Next Version Use Cases First Public Working Draft. [[webrtc-use-cases]]
User needs definition
This document outlines various accessibility related user needs for RTC accessibility. The term 'user needs' in this document relates to what people with various disabilities need to successfully use RTC applications. These needs may relate to having particular supports in an application, being able to complete tasks or access other functions. These user needs should drive accessibility requirements for RTC accessibility and its related architecture.
User needs are presented here with their related requirements; some in a range of scenarios (which can be thought of as similar to user stories).
User needs and requirements
The following outlines a range of user needs and requirements. The user needs have also been compared to existing use cases for real-time text (RTT) such as the IETF 'Framework for Real-Time Text over IP Using the Session Initiation Protocol (SIP)' RFC 5194 and the European Procurement Standard EN 301 549. [[rtt-sip]] [[EN301-549]]
Window anchoring and pinning
- User Need 1: A deaf or hard of hearing user needs to anchor or pin certain windows in an RTC application so both a sign language interpreter and the person speaking (whose speech is being interpreted) are simultaneously visible.
- REQ 1a: Provide the ability to anchor or pin specific windows so the user can associate the sign language interpreter with the correct speaker.
- REQ 1b: Allow the use of flexible pinning of captions or other related content alternatives. This may be to second screen devices.
- REQ 1c: Ensure the source of any captions, transcriptions or other alternatives is clear to the user, even when second screen devices are used.
- REQ 1d: Atomic pieces of data such as information regarding the person currently speaking, activities such as the people entering or leaving a meeting, or the last message posted in the chat channel, can be pinned to a user interface.
- REQ 1e: For pinned content, there is a need to handle and support the metadata that allows the client engine to re-aggregate or re-route any pinned content.
Not all atomic items are necessarily pinned next to other atomic elements, but some may be dependent, related or updated synchronously. For example, multiple atomic data points may be destined for an 80-character braille display that has been sectioned to display 4 atomic items in up to 19 cells each (leaving at least one blank cell for spacing).
Here the term atomic relates to small pieces of data. For the purposes of accessibility conformance testing, the definitions and use of the terms 'atomic' and 'atomic rules' may also be useful. [[applicability-atomic]] [[rule-types]]
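The braille display arithmetic above can be sketched as follows. This is a minimal illustration only: the function name, the constants and the sample item text are assumptions made for this example and do not come from any specification. It shows that four sections of 19 cells, each followed by one blank cell, fill an 80-cell line exactly.

```typescript
// Illustrative constants taken from the example: 4 x (19 + 1) = 80 cells.
const DISPLAY_CELLS = 80;
const SECTION_COUNT = 4;
const SECTION_WIDTH = DISPLAY_CELLS / SECTION_COUNT - 1; // 19 cells per item

// Hypothetical helper: lay out up to four atomic items on one braille line.
function layoutBrailleLine(items: string[]): string {
  if (items.length > SECTION_COUNT) {
    throw new Error("At most " + SECTION_COUNT + " atomic items fit on the display");
  }
  let line = "";
  for (let i = 0; i < SECTION_COUNT; i++) {
    // Empty sections stay blank; long items are truncated to the section width.
    let cellText = (items[i] ?? "").slice(0, SECTION_WIDTH);
    while (cellText.length < SECTION_WIDTH) {
      cellText += " ";
    }
    line += cellText + " "; // one blank cell separates neighbouring items
  }
  return line;
}

const brailleLine = layoutBrailleLine([
  "Speaker: Ana",
  "Joined: Sam",
  "Chat: hello all",
  "Captions: on",
]);
```

A real client would also need the metadata described in REQ 1e to decide which item goes to which section; that part is not modelled here.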
Pause 'on record' captioning in RTC
- User Need 2: A deaf or hard of hearing user may need captioning of content to be private in a meeting or presentation.
- REQ 2a: Ensure there is a host operable toggle in the captioning service (whether human or automated) that facilitates going on and off record for the preserved transcript, but continues to provide captions meanwhile for 'off record' conversations.
- REQ 2b: Ensure the toggle between saving recordings also applies to the saving of captions. There should be a mechanism by which both audio and captions can be paused or stopped, and both can be simultaneously restored for recording.
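One way to picture REQ 2a and REQ 2b together is a single host-operated flag that governs what is saved, while live captions continue regardless. The sketch below is a simplified model under that assumption; the class and method names are hypothetical and are not part of WebRTC or any captioning service.

```typescript
interface CaptionLineSketch {
  speaker: string;
  text: string;
}

// Hypothetical session model; class and method names are illustrative only.
class CaptionSessionSketch {
  private onRecord = true;
  readonly displayed: CaptionLineSketch[] = [];  // what participants see live
  readonly transcript: CaptionLineSketch[] = []; // what is preserved afterwards
  readonly savedAudioChunks: string[] = [];      // identifiers of recorded audio

  // Host-operated toggle: one flag governs both audio and caption saving.
  setOnRecord(value: boolean): void {
    this.onRecord = value;
  }

  addCaption(line: CaptionLineSketch): void {
    this.displayed.push(line); // captions are always shown, even off record
    if (this.onRecord) {
      this.transcript.push(line);
    }
  }

  addAudioChunk(chunkId: string): void {
    if (this.onRecord) {
      this.savedAudioChunks.push(chunkId);
    }
  }
}
```

Because the same flag is checked in both methods, pausing and restoring always affect the saved audio and the saved captions together.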
Accessibility user preferences and profiles
- User Need 3: A user may need to change device or environment and have their accessibility user preferences preserved.
- REQ 3a: Ensure user profiles and accessibility preferences in RTC applications are mobile and can move with the user as they change device or environment.
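As a rough illustration of REQ 3a, preferences could be serialised on one device and restored on another, with defaults filling any fields the other device did not send. The profile fields, default values and function names below are assumptions for this sketch, not a defined format.

```typescript
// Hypothetical profile shape; the fields are illustrative assumptions.
interface AccessibilityProfileSketch {
  captionsEnabled: boolean;
  captionFontScale: number;
  signLanguage: string;
}

const DEFAULT_PROFILE_SKETCH: AccessibilityProfileSketch = {
  captionsEnabled: false,
  captionFontScale: 1,
  signLanguage: "",
};

// Serialise the profile so it can travel with the user.
function exportProfile(profile: AccessibilityProfileSketch): string {
  return JSON.stringify(profile);
}

// Restore a profile, falling back to defaults for missing or malformed fields.
function importProfile(json: string): AccessibilityProfileSketch {
  const parsed: unknown = JSON.parse(json);
  const raw = (typeof parsed === "object" && parsed !== null
    ? parsed
    : {}) as Partial<AccessibilityProfileSketch>;
  return {
    captionsEnabled:
      typeof raw.captionsEnabled === "boolean"
        ? raw.captionsEnabled
        : DEFAULT_PROFILE_SKETCH.captionsEnabled,
    captionFontScale:
      typeof raw.captionFontScale === "number"
        ? raw.captionFontScale
        : DEFAULT_PROFILE_SKETCH.captionFontScale,
    signLanguage:
      typeof raw.signLanguage === "string"
        ? raw.signLanguage
        : DEFAULT_PROFILE_SKETCH.signLanguage,
  };
}
```

How the serialised profile is stored or transported between devices is deliberately left open here.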
Incoming calls and caller ID
- User Need 4: A screen-reader user or user with a cognitive impairment needs to know a call is incoming and needs to recognise the ID of the caller. A deaf or hard of hearing user may also need to identify an incoming relay call.
- REQ 4a: Provide indication of incoming calls in an unobtrusive way via a symbol set or other browser notification.
- REQ 4b: Alert assistive technologies via relevant APIs.
- REQ 4c: Support the presentation and display of call prefix information for relay calls.
Acting on incoming calls, finding out who the caller is and connecting relay services should not require complicated sequences of user actions.
Routing and communication channel control
- User Need 5: A user of speech and Augmentative and Alternative Communication (AAC), or a blind user of screen reader and braille output devices simultaneously, will need to manage audio and other output separately.
- REQ 5a: Provide or support a range of browser level audio output options.
- REQ 5b: Allow controlled routing of alerts and other browser output to a braille device or other hardware.
- User Need 6: A deaf user needs to move parts of a live teleconference session (as separate streams) to one or more devices for greater control.
- REQ 6a: Allow the separate routing of video streams, such as captioning or a sign language interpreter, to a separate high-resolution display.
- User Need 7: Users with cognitive disabilities or blind users may have relative volume levels set as preferences that relate to importance, urgency or meaning.
- REQ 7a: Allow the panning or setting of relative levels of different audio output.
- REQ 7b: Support multichannel audio in the browser.
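User Need 7 and REQ 7a could be modelled as a mapping from a user's stated priority for each output to a relative gain and stereo position. The sketch below is only one possible reading: the priority names, the gain values and the pan range of -1 to 1 are assumptions chosen for illustration.

```typescript
type OutputPrioritySketch = "urgent" | "normal" | "background";

// Hypothetical per-source preference set by the user.
interface OutputPrefSketch {
  source: string;
  priority: OutputPrioritySketch;
  pan: number; // -1 = full left, 0 = centre, 1 = full right
}

// Assumed relative levels; a real application would let the user tune these.
const GAIN_BY_PRIORITY: Record<OutputPrioritySketch, number> = {
  urgent: 1.0,
  normal: 0.6,
  background: 0.25,
};

// Resolve preferences into concrete gain and pan values for each source.
function resolveMix(
  prefs: OutputPrefSketch[]
): { source: string; gain: number; pan: number }[] {
  return prefs.map((pref) => ({
    source: pref.source,
    gain: GAIN_BY_PRIORITY[pref.priority],
    pan: Math.max(-1, Math.min(1, pref.pan)), // clamp out-of-range values
  }));
}
```

The resolved values could then be handed to whatever audio layer the application uses; that wiring is outside this sketch.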
Audio description in live conferencing
- User Need 8: A user may struggle to hear audio description in a live teleconferencing situation.
- REQ 8a: Ensure Audio Description (AD) recommended sound values are dynamic and have independent volume, EQ adjustment and routing capability.
- REQ 8b: Support a user's custom EQ profile.
- REQ 8c: If not transmitted in a live screen share - ensure the platform doesn't strip captions or descriptions that may have been part of the original video.
Moving beyond mono in this context is also important, as the stereo spread allows audio descriptions to be sound staged. Applications should also inherit customization settings from the user's operating system.
Quality synchronisation and playback
- User Need 9: Any user watching captioning or audio description needs to be confident it is synchronised and accurate.
- REQ 9a: Ensure that any outages or loss to captioning or audio description will be repaired while preserving context and meaning.
- REQ 9b: Ensure that related alternate supporting tracks or streams, such as transcriptions, retain their integrity and stay in sync with any repairs.
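A simplified way to think about REQ 9a and REQ 9b is to give every caption cue a sequence number, merge late repairs by that number, and derive the transcript from the merged list so the two cannot drift apart. The cue shape and function names below are hypothetical and assume the captioning service supplies such sequence numbers.

```typescript
// Hypothetical cue shape; assumes the service numbers its cues.
interface CaptionCueSketch {
  seq: number;
  startMs: number;
  text: string;
}

// Merge repairs into the received cues: a repair fills a gap or replaces
// a damaged cue with the same sequence number.
function applyRepairs(
  received: CaptionCueSketch[],
  repairs: CaptionCueSketch[]
): CaptionCueSketch[] {
  let merged: CaptionCueSketch[] = [];
  for (const cue of received.concat(repairs)) {
    merged = merged.filter((existing) => existing.seq !== cue.seq);
    merged.push(cue);
  }
  return merged.sort((a, b) => a.seq - b.seq);
}

// The transcript is always rebuilt from the repaired cues, keeping it in sync.
function transcriptFrom(cues: CaptionCueSketch[]): string {
  return cues.map((cue) => cue.text).join(" ");
}
```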
Simultaneous voice, text & signing
- User Need 10: A deaf user needs to simultaneously talk on a call, send and receive real-time text and/or instant messages via a text interface and watch sign language using a video stream.
- REQ 10a: Ensure support for multiple simultaneous streams.
This user need may also indicate necessary support for 'Total conversation' services as defined by ITU in WebRTC applications. These are combinations of voice, video, and RTT in the same real-time session. [[total-conversation]]
Emergency calls: Support for Real-Time Text (RTT)
- User Need 11: In an emergency situation an Augmentative and Alternative Communication (AAC) user, deaf, speech impaired, hard of hearing or deaf blind user needs to make an emergency call, instantly send and receive related text messages and/or sign via a video stream.
- REQ 11a: Provide or ensure support for RTT in WebRTC.
- REQ 11b: Avoid the problem of unsent emergency messages. A user may not be aware when they have not successfully sent an emergency message. For example, RTT avoids this problem due to instantaneous data transfer but this may be an issue for other messaging methods or platforms.
Text and Video relay services (VRS)
- User Need 12: A deaf, speech impaired, or hard of hearing user needs to communicate on a call using a video remote interpretation (VRI) service to access sign language and interpreter services.
- REQ 12a: Provide or ensure support for video relay and remote interpretation services. This user need may relate to interoperability with third-party services; IETF has looked at standardizing a way to use Session Initiation Protocol (SIP) with VRS services. [[ietf-relay]]
- REQ 12b: Provide VRS and VRI support for different specified sign languages and various spoken language translations. A user may also need to stream or pin both.
- REQ 12c: Ensure that privacy and security options are maintained when using relay services.
Successfully connecting video or text relay services should not require a complicated sequence of user actions.
Distinguishing sent and received text with RTT
- User Need 13: A deaf or deaf blind user needs to tell the difference between incoming text and outgoing text.
- REQ 13a: Ensure that, when used with RTT functionality, WebRTC handles the routing of this information to a format or output of the user's choosing.
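To illustrate User Need 13, each line of real-time text could carry its direction, and the client could label it differently depending on the output the user has chosen. The two label styles and all names below are assumptions for this sketch; actual wording and markers would be a user preference.

```typescript
type TextDirectionSketch = "incoming" | "outgoing";
type LabelStyleSketch = "spoken" | "braille";

// Hypothetical message shape carrying its direction.
interface RttLineSketch {
  direction: TextDirectionSketch;
  author: string;
  text: string;
}

// Label a line so the user can tell sent text from received text.
function labelRttLine(line: RttLineSketch, style: LabelStyleSketch): string {
  if (style === "braille") {
    // Compact markers save cells on a braille display (assumed convention).
    const marker = line.direction === "incoming" ? "<" : ">";
    return marker + " " + line.text;
  }
  return line.direction === "incoming"
    ? line.author + " wrote: " + line.text
    : "You wrote: " + line.text;
}
```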
Call participants and status
- User Need 14: In a teleconference a user needs to know what participants are on the call, as well as their status.
- REQ 14a: Ensure participant details such as name and status (whether the person is muted or talking) are accessible to users of assistive technologies.
- REQ 14b: Ensure participant metadata such as their name, their affiliation or other relevant information, is correctly associated with the meeting record and can be preserved for review after the call. This should be done with the participants' consent.
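The two requirements above can be sketched as a text label that exposes each participant's status, plus a consent check before metadata is kept in the meeting record. The participant fields and function names are hypothetical; how the label reaches assistive technology (for example through an accessible name) is left to the application.

```typescript
// Hypothetical participant shape; field names are illustrative.
interface RtcParticipantSketch {
  displayName: string;
  affiliation: string;
  muted: boolean;
  talking: boolean;
  consentsToRecord: boolean;
}

// Build a plain-text label that conveys name and status (REQ 14a).
function participantLabel(p: RtcParticipantSketch): string {
  const states: string[] = [p.muted ? "muted" : "unmuted"];
  if (p.talking) {
    states.push("talking");
  }
  return p.displayName + ", " + states.join(", ");
}

// Keep metadata only for participants who consented (REQ 14b).
function recordableMetadata(
  participants: RtcParticipantSketch[]
): { displayName: string; affiliation: string }[] {
  return participants
    .filter((p) => p.consentsToRecord)
    .map((p) => ({ displayName: p.displayName, affiliation: p.affiliation }));
}
```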
Captioning support
- User Need 15: A deaf user or user with a cognitive disability needs to access a channel containing live transcriptions during a conference call or broadcast.
- REQ 15a: Honor user preferences relating to captioned content. Provide support for signing or use of symbol sets e.g. Augmentative and Alternative Communication (AAC).
Assistance for users with cognitive disabilities
- User Need 16: Users with cognitive disabilities may need assistance when using audio or video communication.
- REQ 16a: Ensure a WebRTC video call can host a technical or user support channel.
- REQ 16b: Provide support that is customised to the needs of the user. This may be via a relay service or a speech-to-speech relay service.
Personalized symbol sets for users with cognitive disabilities
- User Need 17: Users with cognitive disabilities may need to use symbol sets or AAC for identifying functions available in a WebRTC enabled client for voice, file or data transfer.
- REQ 17a: Provide personalization support so that symbol sets can replace the existing user interface rendering of current functions or controls.
This relates to cognitive accessibility requirements. For related work at W3C see the 'Personalization Semantics Content Module 1.0' and 'Media Queries Level 5'. [[personalization]] [[media-queries]]
Internet relay chat (IRC) style interfaces
- User Need 18: A blind screen reader user who depends on text to speech (TTS) to interact with their computers and smart devices needs a traditional Internet relay chat (IRC) style interface, so that text interactions are rendered as comprehensible speech.
- REQ 18a: Preserve IRC as a configuration option in user agents that implement WebRTC as opposed to having only the real-time text type interface. RTT is favoured by users who are deaf or hearing impaired. For screen reader users, TTS cannot reasonably translate text into comprehensible speech unless characters are transmitted in very close timing to one another. Typical gaps will result in stuttering and highly unintelligible speech output from the TTS engine.
Some braille users will also prefer the RTT model. However, braille users desiring text displayed with standard contracted braille might better be served in the manner users relying on TTS engines are served, by buffering the data to be transmitted until an end of line character is reached.
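The buffering approach described above, where characters are held until an end of line character arrives, can be sketched as a small line buffer. This is an illustrative model only: the class name is hypothetical, and it assumes "\n" is the end of line character.

```typescript
// Hypothetical buffer that turns a character-by-character RTT stream
// into whole lines suitable for a TTS engine or contracted braille.
class RttLineBufferSketch {
  private pending = "";

  // Accept newly received characters and return any completed lines.
  push(chars: string): string[] {
    this.pending += chars;
    const parts = this.pending.split("\n");
    // The last part is the unfinished line; keep it for the next call.
    this.pending = parts.pop() ?? "";
    return parts;
  }

  // Text received so far that has not yet ended with a newline.
  peekPending(): string {
    return this.pending;
  }
}
```

A user who prefers the RTT model would bypass the buffer and receive characters as they arrive, so the choice can remain a configuration option as REQ 18a asks.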
Relationship between RTC and XR Accessibility
There are potential real-time communication application issues that may only apply in immersive environments or augmented reality contexts.
For example, if an RTC application is also an XR application then relevant XR accessibility requirements should be addressed as well. [[xaur]]
Quality of service scenarios
Audio frequency bandwidth
Scenario: A hard of hearing user needs better stereo sound to have a quality experience in work calls or meetings with friends or family. Transmission aspects, such as decibel range for audio, need to be of high quality. For calls, industry allows higher audio resolution, but still mostly in mono only.
EN 301 549 Section 6 recommends that, for WebRTC-enabled conferencing and communication, the application shall be able to encode and decode communication with a frequency range with an upper limit of at least 7 kHz. More details can be found at Accessible Procurement standard for ICT products and services EN 301 549 (PDF)
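As a worked illustration of the 7 kHz figure, the sampling theorem says a sample rate must exceed twice the highest frequency to be represented, so 7 kHz needs more than 14 kHz. The sketch below applies that check; the function names are assumptions, and passing the check is necessary but not sufficient, since a codec may still limit the bandwidth it actually carries.

```typescript
// Upper frequency limit quoted from EN 301 549 in the text above.
const REQUIRED_UPPER_LIMIT_HZ = 7000;

// Highest frequency a given sample rate can represent (Nyquist limit).
function nyquistLimitHz(sampleRateHz: number): number {
  return sampleRateHz / 2;
}

// True when the sample rate is high enough, in principle, to carry
// audio up to the required upper limit.
function canCarryUpperLimit(
  sampleRateHz: number,
  requiredHz: number = REQUIRED_UPPER_LIMIT_HZ
): boolean {
  return nyquistLimitHz(sampleRateHz) > requiredHz;
}

const narrowbandOk = canCarryUpperLimit(8000);  // 4 kHz limit: not enough
const widebandOk = canCarryUpperLimit(16000);   // 8 kHz limit: enough
```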
Quality requirements for video
Scenario: A hard of hearing user needs better stereo sound so they can have a quality experience in watching HD video or having an HD meeting with friends or family. Similarly for video quality, transmission aspects such as frames per second need to be of high quality.
A hard of hearing user often combines their perception of speech from audio with their perception of lip movement and other visual clues to create an overall understanding of speech. For the visual parts, the requirements on video are the same as expressed in '5.1 Deaf users: Video resolution and frame rates' about perception of sign language because lip movements are also part of sign language, equally rapid and as detailed as the other parts of sign language.
EN 301 549 Section 6 recommends that, for WebRTC-enabled conferencing and communication, the application shall be able to encode and decode communication with a frequency range with an upper limit of at least 7 kHz. More details can be found at Accessible Procurement standard for ICT products and services EN 301 549 (PDF)
WebRTC lets applications prioritise bandwidth dedicated to audio / video / data streams; there is also some experimental work in signalling these needs to the network layer as well as support for prioritising frame rate over resolution in case of congestion. [[webrtc-priority]]
Change Log
The following is a list of new user needs and requirements since the publication of the previous working draft:
- Window anchoring and pinning: A deaf or hard of hearing user needs to anchor or pin certain windows in an RTC application so both a sign language interpreter and the person speaking (whose speech is being interpreted) are simultaneously visible.
- REQ 1a: Provide the ability to anchor or pin specific windows so the user can associate the sign language interpreter with the correct speaker.
- REQ 1b: Allow the use of flexible pinning of captions or other related content alternatives. This may be to second screen devices.
- REQ 1c: Ensure the source of any captions, transcriptions or other alternatives is clear to the user, even when second screen devices are used.
- REQ 1d: Atomic pieces of data such as information regarding the person currently speaking, activities such as the people entering or leaving a meeting, or the last message posted in the chat channel, can be pinned to a user interface.
- REQ 1e: For pinned content, there is a need to handle and support the metadata that allows the client engine to re-aggregate or re-route any pinned content.
- Pause 'on record' captioning in RTC: A deaf or hard of hearing user may need captioning of content to be private in a meeting or presentation.
- REQ 2a: Ensure there is a host operable toggle in the captioning service (whether human or automated) that facilitates going on and off record for the preserved transcript, but continues to provide captions meanwhile for 'off record' conversations.
- REQ 2b: Ensure the toggle between saving recordings also applies to the saving of captions. There should be a mechanism by which both audio and captions can be paused or stopped, and both can be simultaneously restored for recording.
- Accessibility user preferences and profiles: A user may need to change device or environment and have their accessibility user preferences preserved.
- REQ 3a: Ensure user profiles and accessibility preferences in RTC applications are mobile and can move with the user as they change device or environment.
The following is a list of updated requirements to existing user needs:
- Incoming calls and caller ID - REQ 4c: Support the presentation and display of call prefix information for relay calls.
- Audio description in live conferencing - REQ 8b: Support a user's custom EQ profile.
- Audio description in live conferencing - REQ 8c: If not transmitted in a live screen share - ensure the platform doesn't strip captions or descriptions that may have been part of the original video.
- Emergency calls and RTT - REQ 11b: Avoid the problem of unsent emergency messages. A user may not be aware when they have not successfully sent an emergency message. For example, RTT avoids this problem due to instantaneous data transfer but this may be an issue for other messaging methods or platforms.
- Video relay services (VRS) and video remote interpretation (VRI) - REQ 12b: Provide support for other sign languages and translations. For example, VRS calls may be made between a sign language user and a person speaking another language. There are variations in signing itself such as Irish Sign Language (ISL), which is related to French sign language, and British Sign Language (BSL). A user may need to stream or pin both.
- Video relay services (VRS) and video remote interpretation (VRI) - REQ 12c: Ensure that privacy and security options are maintained when using relay services.
The following are other changes in this document:
- Changed the title of 'Dynamic audio description values in live conferencing' to 'Audio description in live conferencing'.
- New note on the relationship between RTC and XR Accessibility User Requirements.
- New note on personalization semantics and CSS media queries.
- Moved 'User Need 19: A deaf user watching a signed broadcast needs a high-quality frame rate to maintain legibility and clarity in order to understand what is being signed' to the 'Quality of service issues' section.
- Added note on ITU definition of Total Conversation services that relates to 'REQ 10a: Ensure support for multiple simultaneous streams'.
This user need may also indicate necessary support for 'Total conversation' services as defined by ITU in WebRTC applications. These are combinations of voice, video, and RTT in the same real-time session. [[total-conversation]]
This document has been updated based on document feedback, discussion and Research Questions Task Force consensus.
Acknowledgments
Participants of the APA working group active in the development of this document:
- Shadi Abou-Zahra, W3C
- Judy Brewer, W3C
- Michael Cooper, W3C
- Scott Hollier, Edith Cowan University & Centre For Accessibility
- Stephen Noble, Pearson Plc
- Joshue O'Connor, W3C
- John Paton, Royal National Institute of Blind People
- Janina Sajka, Invited Expert
- Jason White, Educational Testing Service
Previously Active Participants, Commenters, and Other Contributors
- Lidia Best, ITU-T
- Wendy Dannels, National Technical Institute for the Deaf
- Dominique Hazael-Massieux, W3C
- Gunnar Hellström, Omnitor
- Masahito Kawamori, ITU-T
- Steve Lee, W3C
- Estella Oncins Noguer, TransMedia UAB
- Lisa Seeman, Invited Expert