TBA

This is a draft collection of information relevant to cross-disability accessibility guidance, examining how developments in machine learning and generative Artificial Intelligence (AI) bear on web accessibility standards and processes. Given the rapid changes in how AI is developed and consumed, this is intended as a starting point for grouping the accessibility implications of machine learning and generative AI technologies.

Introduction

Definitions

Artificial intelligence

Artificial Intelligence refers to a machine or computer program's ability to think and learn without explicit instructions. The term “AI” in this context is generally thought to have been established by John McCarthy in 1955 (Xu et al., 2021). Although AI is used in a variety of disciplines, it is currently most prominent in machine learning and in consumer applications known as generative AI.

Machine learning

Machine learning is a field of study within the AI domain that focuses on statistical algorithms capable of learning from data and performing tasks without specific instructions. This form of AI tends to focus on making determinations and predictions.

Generative AI

While generative AI is generally classified as a subset of machine learning, it differs in its specific focus on creating new data, such as text, images and audio, from human creative inputs (Lv, 2023).

Accessibility context

The rapid increase in the use of machine learning, and the specific focus on generative AI, presents both potential benefits and challenges for Web accessibility provision. As online generative AI platforms such as ‘ChatGPT’ continue to offer consumers the largely unrestricted ability to create text, images, video and audio from a variety of inputs, it is important to consider how accessibly these outputs will be presented, and whether machine learning algorithms may address broader accessibility issues in everyday tasks. For example, a web browser could use machine learning and generative AI to update content in real time, such as by adding alternative text to images that lack it, or by recognising text that visually functions as a heading but is not marked up correctly.
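
As a rough illustration of this idea, the sketch below shows how a user agent or browser extension could traverse a page and fill in missing alternative text. The describeImage() function is a hypothetical hook into an image-captioning model; no standard browser API for this exists today.

```typescript
// Minimal sketch: fill in missing alternative text in real time.
// describeImage() is a hypothetical call into an ML captioning model.
declare function describeImage(src: string): Promise<string>; // hypothetical

async function addMissingAltText(): Promise<void> {
  // img:not([alt]) matches images with no alt attribute at all;
  // images with alt="" are deliberately decorative and are left alone.
  const images = document.querySelectorAll<HTMLImageElement>("img:not([alt])");
  for (const img of images) {
    img.alt = await describeImage(img.src); // generated text may still lack detail
  }
}
```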

These real-time changes would affect existing web standards such as the Web Content Accessibility Guidelines (WCAG), the User Agent Accessibility Guidelines (UAAG) and the Authoring Tool Accessibility Guidelines (ATAG), assisting these standards in potentially addressing accessibility issues observed in various other accessibility-related notes and resources. Furthermore, several manufacturers have recently launched computers and mobile devices with built-in Neural Processing Units (NPUs) that support machine learning features on the local device, for example by reducing latency for live captioning. The potential benefits of AI within the web accessibility space could be profound.

In addition, machine learning raises the prospect of existing tools becoming better at identifying and remediating accessibility issues. For instance, automated testing tools and accessibility overlays could become significantly more proficient, providing greater accessibility support and guidance to developers and designers. The proportion of WCAG success criteria that can be checked automatically could also increase, along with the applicability of automated remediation.

However, the emerging nature of machine learning, and of generative AI in particular, has made it difficult for users and developers alike to judge how effective automated accessibility tools are, and how reliable they may become. As such, this guidance is designed to provide some assistance in understanding how machine learning currently provides support, and its current level of effectiveness.

Purpose and Scope of this Document

The scope and audience of this document are presently being defined. Work in progress can be found in a separate document.

AI accessibility use cases

Relevance of current standards and guidance

The W3C Web Accessibility Initiative (WAI) publishes three guidelines that assist in the creation of accessible content and that may potentially be supported by AI features. These guidelines cover web content, user agents and authoring tools.

The Web Content Accessibility Guidelines (WCAG) 2.2 feature several guidelines and success criteria (SC) that AI could potentially help address, such as SC 1.1.1 Non-text Content through automated alternative text, and SC 1.2.4 Captions (Live) through automated live captioning.

In addition, the User Agent Accessibility Guidelines (UAAG) 2.0 suggest that user agents, such as web browsers with generative AI, could make real-time adjustments during browsing sessions, for example fixing colour contrast issues and improving poor heading structures by interpreting the underlying code.

Furthermore, the Authoring Tool Accessibility Guidelines (ATAG) 2.0 could also offer support, particularly through Part B, which focuses on supporting the production of accessible content. For example, authoring tools that automatically generate alternative text could support the creation of accessible content.

Alternative text for images

A number of machine learning-based tools have been integrated into popular social media platforms and authoring tools, creating automated alternative text descriptions by scanning visual material, such as an image, and determining its contents. Until recently, this automated process was considered so inaccurate that its utility was questioned (Leotta et al., 2022). Recent developments have improved the accuracy of automated alternative text, but criticism persists due to limitations in the detail provided and in recognising which information is important.

Figure 1: A coloured bar graph representing the favourite colour of children (inspired by an example from Twinkl, n.d.)

A good example can be seen in a popular image used to illustrate data (Twinkl, n.d.). The image features a classic bar graph of responses from children stating their favourite colour, in which yellow achieves the highest result with 9 responses. Appropriate alternative text for the image should endeavour to capture the significant points of the graph in detail, such as its purpose, the information organising its X and Y axes, and the resulting data. The automated alternative text, however, simply describes this image as “a graph with different coloured bars”. While technically accurate, this description lacks the depth to convey the important details of the graph.

Figure 2: An image provided by the James Webb Space Telescope

A second example is shown through an image from the James Webb Space Telescope. As all publicly released images include manually created alternative text, automated alternative text generated for Figure 2 was compared with the manual description. The manual description begins, “The image is divided horizontally by an undulating line between a cloudscape forming a nebula along the bottom portion and a comparatively clear upper portion”, and continues: “Speckled across both portions is a starfield, showing innumerable stars of many sizes. The smallest of these are small, distant, and faint points of light. The largest of these appear larger, closer, brighter, and more fully resolved with 8-point diffraction spikes. The upper portion of the image is bluish and has wispy translucent cloudlike streaks rising from the nebula below.”

Upon observation, the automated alternative text presents a simplified iteration of the image, with the brief narration, “a nebula in space with stars”. Once again, this comparison suggests that, while automated alternative text provided by machine learning is representative of the image being studied and could deliver a basic understanding of it, it cannot provide the depth of detail required to capture the essence of the image.

Although machine learning techniques embedded in authoring tools and other platforms may provide some information, generative AI platforms able to create images, videos and other visual media from text input tend not to provide automated alternative text. This makes it difficult for people who are blind or have low vision to attain a meaningful interpretation of these AI-generated outputs.

Automatic Speech Recognition for captioning

A common accessibility support is the provision of live captioning through Automated Speech Recognition (ASR), whereby generative AI is employed to sample speech and convert it into captions in the same language, or subtitles in a different language.
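
To make the workflow concrete, here is a minimal sketch of browser-based live captioning using the Web Speech API's SpeechRecognition interface. Browser support varies (Chrome exposes it as webkitSpeechRecognition), and recognition typically runs on a remote service, which illustrates the online dependency discussed below; the "captions" element is an assumed placeholder.

```typescript
// Minimal live-captioning sketch using the Web Speech API.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = "en-US";
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = true;  // show partial captions as speech arrives

recognition.onresult = (event: any) => {
  const latest = event.results[event.results.length - 1];
  const caption = latest[0].transcript; // best hypothesis for this segment
  document.getElementById("captions")!.textContent = caption;
};

recognition.start();
```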

Until recently, ASR captioning was considered ineffective: same-language captions were often inaccurate, and translation quality was poorer still because an extra machine learning step was needed to convert ASR output into another language. Currently, however, the provision of captions has become increasingly reliable, with an approximate accuracy rate of 85%, and the ability to translate in real time has improved substantially (Millett, 2021).

While ASR captions can have merit as a complementary tool alongside curated content, they are still generally perceived by the Deaf community as lacking the accuracy needed to be considered a truly beneficial feature. Other issues associated with ASR include delays in processing time, as most contemporary solutions require devices to be connected so the generative AI features can run online; the lack of grammar and punctuation; the lack of block captioning associated with pre-recorded materials; and the need for good-quality audio in a quiet environment for ASR to optimise its output.

However, some of these issues are beginning to be corrected. With the introduction of NPUs into consumer devices, ASR captions can be rendered on the device rather than online. Early testing suggests that this significantly improves the speed at which captions are presented, and provides enhanced auto-correction where ASR misunderstands a word or phrase.

While these advancements do not resolve all issues with ASR, and the process is still not adequately accessible as a standalone feature, they demonstrate that current generative AI processes are capable of contributing to accessibility improvements.

ASR technologies may also be applied to audio-only material, in which pre-recorded content is provided to a generative AI process to output a transcript. While removing the real-time constraint may increase reliability to the point where some specialist writing tools could utilise this function, issues of accuracy, spelling, grammar and punctuation persist.
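
A minimal sketch of that transcript workflow follows. The transcribeAudio() function is a hypothetical wrapper around a speech-to-text model or service; the point is the shape of the batch process, not any specific vendor API.

```typescript
// Minimal sketch: pre-recorded audio in, transcript file out.
import { readFile, writeFile } from "node:fs/promises";

declare function transcribeAudio(audio: Buffer): Promise<string>; // hypothetical

async function audioToTranscript(inputPath: string, outputPath: string): Promise<void> {
  const audio = await readFile(inputPath);
  const transcript = await transcribeAudio(audio);
  // The raw output will still need human review for spelling,
  // grammar and punctuation, as noted above.
  await writeFile(outputPath, transcript, "utf8");
}
```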

Plain language

Current demonstrations of generative AI often employ examples of language being translated or simplified to showcase the potential of these mechanisms. While the ability of popular AI platforms to convert language to a specific word count, rhyming pattern or subject matter is often viewed as a successful outcome, converting complex language content to the level of lower secondary reading ability often requires human intervention.

Key concepts, including the need for common words, the definition of words, and the removal of double negatives, are examples of how generative AI may achieve some positive effects. However, other plain language concepts, such as the conversion of text into literal language, understanding different grammar tenses, and addressing text with nested clauses, highlight some current limitations of AI. For example, a translation of the popular poem “Mary Had a Little Lamb” into a non-English language and back to English converted “its fleece was white as snow” into “it snowed sheep hair”. While the other lines of the poem were largely translated accurately, the difficulty of literal language and the shortcomings of generative AI in understanding regional contexts and language structures remain.

Automated Language detection

Within the WCAG 2.2 standard, Guideline 3.1 explains the necessity to programmatically define the default language of a page, as well as any change of language in parts of the page. This allows assistive technologies to recognise the language change and adjust their language selection accordingly. A potential benefit of machine learning is an improved ability to identify the language in use, which would enable assistive technologies such as screen readers to select that language if supported.

Although there are currently few examples in a web context where the language of content is determined by machine learning rather than declared directly in code, the ability for assistive technologies to immediately recognise the language being presented could provide a considerable benefit.
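
The sketch below illustrates how such detection could feed the existing mechanism for SC 3.1.1 Language of Page and SC 3.1.2 Language of Parts. The detectLanguage() classifier is a hypothetical assumption; setting the lang attribute is the standard signal assistive technologies already consume.

```typescript
// Minimal sketch: ML-assisted language tagging of page parts.
declare function detectLanguage(text: string): Promise<string>; // hypothetical, returns a BCP 47 tag such as "fr"

async function tagLanguages(root: HTMLElement): Promise<void> {
  const pageLang = document.documentElement.lang || "en";
  for (const el of root.querySelectorAll<HTMLElement>("p, li, blockquote")) {
    const detected = await detectLanguage(el.textContent ?? "");
    if (detected && detected !== pageLang && !el.lang) {
      el.lang = detected; // lets a screen reader switch voices mid-page
    }
  }
}
```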

Colour contrast

Another area where machine learning could help is the remediation of colour contrast issues as they are recognised. Automated testing tools are already capable of detecting some colour contrast issues (Almeida and Duarte, 2020); real-time remediation could therefore be feasible where text falls below the 4.5:1 contrast ratio required between text and its background, or where user interface elements fall below the 3:1 ratio. With generative AI processes, these elements could be adjusted to increase contrast while largely preserving the colours intended by the original author.
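
Any such remediation step must first run the contrast check itself. The sketch below implements the relative luminance and contrast ratio formulas as defined by WCAG 2.x; only the surrounding usage is illustrative.

```typescript
// WCAG 2.x contrast check, per the normative definitions.
type RGB = { r: number; g: number; b: number }; // channel values 0-255

// Linearise an sRGB channel as defined by WCAG.
function linearise(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: L = 0.2126R + 0.7152G + 0.0722B.
function relativeLuminance({ r, g, b }: RGB): number {
  return 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b);
}

// Contrast ratio: (L1 + 0.05) / (L2 + 0.05), lighter colour on top.
function contrastRatio(a: RGB, b: RGB): number {
  const l1 = relativeLuminance(a);
  const l2 = relativeLuminance(b);
  const [light, dark] = l1 > l2 ? [l1, l2] : [l2, l1];
  return (light + 0.05) / (dark + 0.05);
}

// Example: #767676 text on a white background is roughly 4.54:1,
// so it just passes the 4.5:1 threshold for normal-size text.
const grey: RGB = { r: 118, g: 118, b: 118 };
const white: RGB = { r: 255, g: 255, b: 255 };
console.log(contrastRatio(grey, white) >= 4.5); // true
```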

Heading structure

Currently, the use of elements such as bold text to make content look like a heading, without a programmatically determined heading structure, is a significant accessibility issue that remains common in web content. A generative AI feature could visually identify headings and their nested content in web pages or app screens using current technologies. While there is presently no known implementation, the remediation of heading structure represents a significant opportunity to improve readability and navigation, should generative AI prove capable of effectively addressing this issue.
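
A minimal sketch of how such a repair pass might look follows. Here isLikelyHeading() stands in for an ML classifier and is approximated with a crude visual heuristic; both the heuristic and the fixed choice of heading level are illustrative assumptions.

```typescript
// Minimal sketch: re-tag visually styled "headings" as real headings.
function isLikelyHeading(el: HTMLElement): boolean {
  // Placeholder heuristic standing in for an ML model.
  const style = window.getComputedStyle(el);
  const bold = parseInt(style.fontWeight, 10) >= 700;
  const large = parseFloat(style.fontSize) >= 20; // px
  const short = (el.textContent ?? "").trim().length < 80;
  return bold && large && short;
}

for (const el of document.querySelectorAll<HTMLElement>("p, div, b, strong")) {
  if (isLikelyHeading(el) && !el.closest("h1, h2, h3, h4, h5, h6")) {
    const heading = document.createElement("h2"); // level choice would also need inference
    heading.textContent = el.textContent;
    el.replaceWith(heading);
  }
}
```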

Adjustment of visual spacing

At present, some web browsers offer the ability to structurally reorder web content to optimise readability. As machine learning continues to improve, such features are likely to further address accessibility concerns relating to text, word and line spacing, better supporting people with cognitive and print disabilities.
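
As a concrete reference point, the sketch below applies the spacing minimums from WCAG 2.2 SC 1.4.12 Text Spacing to a page. The numbers are the success criterion's own values; the style-injection approach is an illustrative assumption rather than how any particular browser implements its reading mode.

```typescript
// Minimal sketch: apply the SC 1.4.12 Text Spacing minimums.
const spacingCss = `
  * {
    line-height: 1.5 !important;        /* at least 1.5x the font size */
    letter-spacing: 0.12em !important;  /* at least 0.12x the font size */
    word-spacing: 0.16em !important;    /* at least 0.16x the font size */
  }
  p {
    margin-bottom: 2em !important;      /* paragraph spacing of 2x the font size */
  }
`;

const style = document.createElement("style");
style.textContent = spacingCss;
document.head.appendChild(style);
```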

Descriptive links

A widespread issue in online content is the use of non-descriptive links such as “click here” or “read more”, as identified in WCAG 2.2 SC 2.4.4 Link Purpose (In Context). While remediating this issue currently requires authors to write descriptive link text, generative AI represents an opportunity for it to be addressed in real time. This could be accomplished by having an AI process follow a non-descriptive link to its destination, determine its content, and rewrite the link text for users in a manner that is more indicative of what the link represents.
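
A minimal sketch of that follow-and-rename approach appears below. Using the destination's title element stands in for an ML summary of the target page, and a production tool would additionally need permission to fetch cross-origin pages; both simplifications are assumptions.

```typescript
// Minimal sketch: rewrite vague link text from the destination's title.
const VAGUE = /^(click here|read more|more|here|learn more)$/i;

async function describeLink(link: HTMLAnchorElement): Promise<void> {
  if (!VAGUE.test((link.textContent ?? "").trim())) return;
  const response = await fetch(link.href); // cross-origin fetches may be blocked
  const html = await response.text();
  const title = new DOMParser()
    .parseFromString(html, "text/html")
    .querySelector("title")?.textContent;
  if (title) {
    link.textContent = title.trim(); // e.g. "Read more" becomes the page title
  }
}

document.querySelectorAll<HTMLAnchorElement>("a[href]").forEach(describeLink);
```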

Sign language

Within WCAG, the provision of sign language (SC 1.2.6 Sign Language (Prerecorded)) stands as a Level AAA success criterion, which is not widely required within policy and legislative frameworks; conformance up to Level AA appears to be the most commonly adopted. Thus, support for sign language in time-based media such as online videos is limited.

However, generative AI now makes it feasible to provide translation services from text or symbol-based languages to sign language, and some websites and movie apps already offer a limited version of this capability. While this shows promise for the future use of sign language, the variable effectiveness of generative AI in translating to sign language, coupled with the diverse localised variations of sign language across countries, poses challenges for providing a fully effective automated solution at this time.

Evaluation tools

At present, there are several automated testing tools available that assess the code of websites, apps and documents. Such tools are often employed to identify issues of non-compliance with the WCAG standard and to provide subsequent guidance on remediation. Some of these tools are free of charge but limited in functionality, while others provide enterprise-level remediation guidance and are capable of monitoring web content in real time.

Although these tools are generally considered a helpful companion for remediation, they currently have notable flaws, including producing different results across different tools (Ismailova & Inal, 2022) and making it difficult to determine whether a tool has definitively identified a specific issue or is simply noting that the issue requires review. These observations follow research that has consistently indicated that automated tools can check and recommend remediations for only a limited portion of accessibility issues, with the remainder requiring some form of human intervention (Vigo et al., 2013).

While automated testing tools have largely been based on scripting languages such as JavaScript to mechanically review and report on code, deep machine learning is likely to improve these tools considerably, allowing generative AI processes to identify issues more thoroughly. For instance, most tools can accurately determine whether alternative text is present for an image, but cannot determine how effective that alternative text is. With the inclusion of generative AI, automated tools are likely to be able to evaluate more, such as the quality of captions and descriptive links, and to correct other issues mentioned previously, such as the use of language and headings. Although there is currently little evidence of generative AI featuring in such tools, it is very possible that testing and evaluation of web content will improve along with the rest of the rapidly evolving generative AI landscape.
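
The sketch below makes that gap concrete: presence of alternative text is mechanically checkable today, while judging its quality is not. The generic-word and filename heuristics shown are placeholders for where a generative quality check would plug in, and are assumptions rather than any existing tool's rules.

```typescript
// Minimal sketch: what a code-based checker can and cannot decide about alt text.
interface AltFinding {
  src: string;
  verdict: "missing" | "suspicious" | "needs-review";
}

function checkImages(doc: Document): AltFinding[] {
  const findings: AltFinding[] = [];
  for (const img of doc.querySelectorAll("img")) {
    const alt = img.getAttribute("alt");
    if (alt === null) {
      findings.push({ src: img.src, verdict: "missing" }); // determinable today
    } else if (alt.trim() === "") {
      // Empty alt marks a decorative image; valid, nothing to flag.
    } else if (/^(image|photo|picture|graphic)$/i.test(alt.trim()) ||
               /\.(png|jpe?g|gif|webp)$/i.test(alt.trim())) {
      findings.push({ src: img.src, verdict: "suspicious" }); // generic word or filename
    } else {
      // Whether the text actually describes the image cannot be decided
      // from the DOM alone; this is where an ML quality check would sit.
      findings.push({ src: img.src, verdict: "needs-review" });
    }
  }
  return findings;
}
```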

Accessibility Overlays

The rapid increase of accessibility overlays on websites has been viewed as controversial by people with disability. While these tools can be useful for individuals unfamiliar with the assistive technologies built into computing and mobile devices, critics of overlays point to the tools being marketed as a complete accessibility solution while their code can interrupt the use of more developed assistive technologies such as screen readers (Morris, 2022). Furthermore, overlay features tend to be more limited in functionality than tools installed in an operating system.

However, generative AI may be able to address the criticism that such tools lack functionality. An accessibility overlay utilising generative AI may be able to provide increased real-time support in overcoming accessibility issues or improving the interpretation of content, such as images, language and page structure. Although these tools are currently promoted as a collection of accessibility features somewhat independent of the content, an overlay that contributes accessibility improvements would operate similarly to AI chatbots and other prompting mechanisms, suggesting this may prove to be another area where generative AI could introduce improvements.

Reference List

Almeida, R., & Duarte, C. M. (2020). Analysis of automated contrast checking tools. Proceedings of the 17th International Web for All Conference, 1–4. https://doi.org/10.1145/3371300.3383348

Ismailova, R., & Inal, Y. (2022). Comparison of Online Accessibility Evaluation Tools: An Analysis of Tool Effectiveness. IEEE Access, 10, 58233–58239. https://doi.org/10.1109/access.2022.3179375

Leotta, M., Mori, F., & Ribaudo, M. (2022). Evaluating the effectiveness of automatic image captioning for web accessibility. Universal Access in the Information Society, 1–21. https://doi.org/10.1007/s10209-022-00906-7

Millett, P. (2021). Accuracy of Speech-to-Text Captioning for Students Who are Deaf or Hard of Hearing (pp. 1–13). https://www.edaud.org/journal/2021/1-article-21.pdf

Morris, A. (2022, July 13). For Blind Internet Users, the Fix Can Be Worse Than the Flaws. The New York Times. https://www.nytimes.com/2022/07/13/technology/ai-web-accessibility.html

Twinkl. (n.d.). What is Bar Chart? [Image]. Retrieved June 24, 2024, from https://www.twinkl.de/teaching-wiki/bar-chart

Vigo, M., Brown, J., & Conway, V. (2013). Benchmarking web accessibility evaluation tools. Proceedings of the 10th International Cross-Disciplinary Conference on Web Accessibility - W4A ’13, 1–10. https://doi.org/10.1145/2461121.2461124

Xu, Y., Wang, Q., An, Z., Wang, F., Zhang, L., Wu, Y., Dong, F., Qiu, C.-W., Liu, X., Qiu, J., Hua, K., Su, W., Xu, H., Han, Y., Cao, X., Liu, E., Fu, C., Yin, Z., Liu, M., & Roepman, R. (2021). Artificial Intelligence: A Powerful Paradigm for Scientific Research. The Innovation, 2(4). https://doi.org/10.1016/j.xinn.2021.100179

Lv, Z. (2023). Generative Artificial Intelligence in the Metaverse Era. Cognitive Robotics, 3, 208–217. https://doi.org/10.1016/j.cogr.2023.06.001