Inaccessibility of CAPTCHA

Abstract

Various approaches have been employed over many years to distinguish human users of web sites from robots. While the traditional CAPTCHA approach of asking the user to identify obscured text in an image remains common, other approaches have also emerged. These approaches generally require users to perform a task believed to be relatively easy for humans but difficult for robots. Unfortunately, the nature of the task inherently excludes many people with disabilities, resulting in an incorrect denial of service to these users. Research findings have also indicated that many popular CAPTCHA techniques are no longer particularly effective or secure, challenging providers to deploy alternative approaches that block robots yet support access for people with disabilities. This document examines a number of solutions that allow systems to test for human users, and the extent to which these solutions adequately accommodate people with disabilities.

Introduction

Web sites with resources that are attractive to aggregators, such as sign-up web pages, travel and event ticket sites, web-based email accounts, and social media portals, have long taken measures to ensure that they can offer their service to individual users without exposing their data and content to web robots.

An early solution was the use of graphical representations of text in registration or comment areas of the web site. The site would attempt to verify that the user was in fact a human by requiring the user to complete a task referred to as a Completely Automated Public Turing test to tell Computers and Humans Apart, or CAPTCHA. The assumption was that humans would find this task easy, while robots would find it nearly impossible to perform.

The CAPTCHA was initially developed by researchers at Carnegie Mellon University and has been primarily associated with a technique whereby an individual had to identify a distorted set of characters from a bit-mapped image, then enter those characters into a web form. This approach is widely familiar to users of the web, though the term CAPTCHA is generally recognized only by web professionals.

In recent times the types of CAPTCHA that appear on web sites and mobile apps have changed significantly. Since our concern here is the accessibility of systems that seek to distinguish human users from their robotic impersonators, the term “CAPTCHA” is used in this document generically to refer to all approaches which are specifically designed to differentiate a human from a computer. We also include fully noninteractive approaches in our categorization.

While online users broadly have reported finding traditional CAPTCHAs frustrating to complete, it is generally assumed that a CAPTCHA can be resolved within a few incorrect attempts. The point of distinction for people with disabilities is that a CAPTCHA not only separates computers from humans, but also often prevents people with disabilities from performing the requested procedure. For example, asking users who are blind, visually impaired or dyslexic to identify textual characters in a distorted graphic is asking them to perform a task they are intrinsically least able to accomplish. Similarly, asking users who are deaf or hearing impaired to identify and transcribe in writing the content of an audio CAPTCHA is asking them to perform a task they’re intrinsically least likely to accomplish. Furthermore, traditional CAPTCHAs have generally presumed that all web users can read and transcribe a particular character set or English-based words, thus making the test inaccessible to a large number of web users worldwide.

While accessibility best practices require, and assistive technologies expect, substantive graphical images to be authored with text equivalents, alternative text on CAPTCHA images would clearly be self-defeating. CAPTCHAs are, consequently, allowed under the W3C's Web Content Accessibility Guidelines (WCAG) provided that "text alternatives that identify and describe the purpose of the non-text content are provided, and alternative forms of CAPTCHA using output modes for different types of sensory perception are provided to accommodate different disabilities."

The rationale for this requirement is simple. A CAPTCHA without an accessible and usable alternative makes it impossible for users with certain disabilities to create accounts, write comments, or make purchases on such sites. In essence, such CAPTCHAs fail to properly recognize users with disabilities as human, obstructing their participation in contemporary society. Such issues also extend to situational disabilities whereby a user may not be able to effectively view a traditional CAPTCHA on a mobile device due to the small screen size, or hear an audio-based CAPTCHA in a noisy environment.

Stand-Alone Approaches

There are many techniques available to web sites to discourage or eliminate fraudulent activities such as inappropriate account creation. Several of them may be as effective as the visual verification technique while being more accessible to people with disabilities. Others may be overlaid as an accommodation for the purposes of accessibility. The following list highlights common CAPTCHA types and their respective accessibility implications.

Traditional character-based CAPTCHA

The traditional character-based CAPTCHA, as previously discussed, is largely inaccessible and insecure. It presents letters or words in an image designed to be difficult for robots to identify. The user is then asked to enter the CAPTCHA information into a form.

The use of a traditional CAPTCHA is obviously problematic for people who are blind, as the screen readers they rely on to use web content cannot process the image, thus preventing them from uncovering the information required by the form. Because the characters embedded in a CAPTCHA are often distorted or have other characters in close proximity to each other in order to foil technological solutions by robots, they are also very difficult for users with other visual disabilities. This common CAPTCHA technique is also less reliably solved by users with cognitive and learning disabilities; see The Effect of CAPTCHA on User Experience among Users with and without Learning Disabilities [[captcha-ld]]. Because they are intentionally distorted to foil robots, they also foil users who are more easily confused by surrealistic images or who do not possess sufficiently acute vision to “see” beyond the presented distortion and uncover the text the site requires in order to proceed.

While some sites have begun providing CAPTCHAs utilizing languages other than English, an assumption that all web users can understand and reproduce English has predominated. Clearly, this is not the case. Native and literate Arabic or Thai speakers, for example, should not be assumed to possess a proficiency with the ISO 8859-1 character set [[iso-8859-1]], let alone ready access to a keyboard with which to reproduce those characters in the CAPTCHA's form field. This is an important barrier imposed by CAPTCHAs based on written English and related language character sets; see Effects of Text Rotation, String Length, and Letter Format on Text-based CAPTCHA Robustness [[captcha-robustness]].

Sound output

To re-frame the problem, text is easy to manipulate, which is good for assistive technologies, but just as good for robots. One logical solution to this problem is to offer another non-textual method of using the same content. To achieve this, audio is played that contains a series of characters, words, or phrases being read out which the user then needs to enter into a form. As with visual CAPTCHA however, robots are also capable of recognizing spoken content—as Amazon’s Alexa and Android’s Google Assistant, among other spoken dialog systems, have so ably demonstrated. Consequently, the characters, words, or phrases the user is to uncover and transcribe in the form are also distorted in an audio CAPTCHA and are usually played over a sonic environment of obfuscating sounds.

The industry recognized this problem early. CNet reported in Spam-bot tests flunk the blind [[newscom]] that “Hotmail’s sound output, which is itself distorted to avoid the same programmatic abuse, was unintelligible to all four test subjects, all of whom had good hearing.”

Not only can the sound output, which is itself distorted to avoid the same programmatic abuse, render the CAPTCHA difficult to hear; there can also be confusion in understanding whether a number is to be entered as a numerical value or as a word, e.g. ‘7’ or ‘seven’. Often the audio CAPTCHA user will hear sounds which seem to be words or numerical values that should be entered, but turn out to be just background noise.
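One way a site might reduce the ‘7’ versus ‘seven’ ambiguity is to accept either form when checking the response. The following is a minimal sketch under that assumption; the mapping and the function names are hypothetical and are not taken from any particular CAPTCHA product.

```python
# Hypothetical helpers: names and the digit-word table are illustrative only.
_DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_answer(text: str) -> str:
    """Reduce a transcription to one canonical string.

    Spelled-out digits become numerals, and case, spaces, commas, and
    hyphens are ignored.
    """
    tokens = text.lower().replace(",", " ").replace("-", " ").split()
    return "".join(_DIGIT_WORDS.get(token, token) for token in tokens)


def answers_match(expected: str, submitted: str) -> bool:
    """Compare the expected and submitted answers after normalization."""
    return normalize_answer(expected) == normalize_answer(submitted)
```

With this sketch, a user who hears "seven three nine" may type either the words or the numerals and still be accepted.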

Sound is also intrinsically temporal, but the import of this unavoidable fact is too often underappreciated, perhaps because the world we live in as seen through the eyes is also temporal. Unlike the real world seen through the eyes, however, the traditional CAPTCHA is a still image that can be stared at until comprehension dawns. Sound has no analog to the visual still image.

Whenever any portion of an audio CAPTCHA is not understood, at least some part of the CAPTCHA must be replayed, usually several times. Currently, few audio CAPTCHAs provide an easily invoked and reliable replay feature, let alone a pause, rewind, and fast-forward feature. Consequently, an entirely new audio CAPTCHA is often played should any part of one audio CAPTCHA prove difficult to understand.

Some audio CAPTCHAs tacitly admit this failure by offering a link allowing the user to download the audio CAPTCHA, typically as an MP3 file. The implicit assumption is that the user will use a favorite audio player (which does provide pause, play, rewind, and fast-forward capabilities) to play the audio CAPTCHA MP3 file again and again until comprehension dawns, perhaps pausing and rewinding the playback and perhaps writing down on the side the text destined for the web form. Clearly this is very inconvenient and subject to web site timeouts. It also illustrates why simply providing an audio CAPTCHA alternative to the traditional visual CAPTCHA does not provide equivalent access to the user.

Furthermore, just as not all web users should be presumed proficient with English in visual CAPTCHA, they should not be presumed capable of understanding and transcribing aural English in an audio CAPTCHA. Unfortunately, non-English audio CAPTCHAs appear to be very rare indeed. We are aware of only one multilingual CAPTCHA solution provider with support for a significant number of the world's languages.

Users who are deaf-blind, don’t have or use a sound card, find themselves in noisy environments, or don’t have required sound plugins properly configured and functioning, are thus also prevented from proceeding. Furthermore, relatively few audio CAPTCHAs properly support all the various browsers and operating systems in use today. Similarly, users of browsers which do not support easy direction of sound output to a particular audio device, or to all available audio devices on the system, are also hampered.

Although auditory forms of CAPTCHA that present distorted speech create recognition difficulties for screen reader users, the accuracy with which such users can complete the CAPTCHA tasks is increased if the user interface is carefully designed to prevent screen reader audio and CAPTCHA audio from being intermixed. This can be achieved by implementing functions for controlling the audio that do not require the user to move focus away from the text response field; see Evaluating existing audio CAPTCHAs and an interface optimized for non-visual use [[eval-audio]].

Experiments with a combined auditory and visual CAPTCHA requiring users to identify well-known objects by recognizing either images or sounds suggest that this technique is highly usable by screen reader users. However, its security-related properties remain to be explored, as mentioned in Towards a universally usable human interaction proof: evaluation of task completion strategies [[task-completion]].

Biometrics

Biometric identifiers have become a very popular authentication mechanism, especially on mobile platforms, which now routinely provide the requisite hardware. Some physical characteristic of the user, such as a fingerprint or a facial profile, is first acquired and then recognized to verify the individual’s identity. This process effectively limits the ability of web robots to create a large number of false identities.

However, biometric authentication mechanisms also need to be carefully designed to avoid introducing accessibility barriers. Individuals who lack the biological characteristics required by a particular authentication method, e.g., fingers, or who are unable to perform the enrollment or identification procedures, are effectively precluded from using it. This can result in denial of access to certain users with disabilities on systems relying on biometrics for authentication. Consequently, reliance on a single biometric identifier to identify a user is now insufficient to satisfy public sector procurement standards in the European Union (EN 301 549, section 5.3 [[en-301-549]]) and, in the United States, regulations under Section 508 of the Rehabilitation Act and Section 255 of the Communications Act (36 CFR 1194, Appendix C, section 403 [[36-cfr-1194]]).

Where biometrics are used as an alternative to CAPTCHA, systems should be designed to allow users to choose among multiple and unrelated biometric identifiers. It should also be noted that biometrics can reliably and uniquely identify individuals, making these identifiers highly attractive as login authentication mechanisms. This alternative is unsuitable, however, for applications in which it is necessary to preserve the user’s anonymity (i.e., the application is required to verify solely that the user is human, without obtaining identifying information).

Limited-use accounts

Users of free accounts very rarely need full and immediate access to a site’s resources. For example, users who are searching for concert tickets may need to conduct only three searches a day, and new email users may only need to send the same notification of their new address to their friends. Sites may create policies that limit the frequency of interaction explicitly (that is, by disabling an account for the rest of the day) or implicitly (by slowing the response times incrementally). Creating limits for new users can be an effective means of making high-value sites unattractive targets to robots.

Drawbacks to this approach include the need to perform sufficient testing and data collection to determine useful limits that will serve human users yet frustrate robots. It requires site designers to look at statistics of normal and exceptional users, and determine whether clear demarcation exists between them.
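The explicit and implicit limits described above can be sketched as follows. This is an illustration only: the class name, the default cap of three requests (echoing the concert ticket example), and the two-second delay step are assumptions, and real limits would have to come from the site's own usage statistics.

```python
from dataclasses import dataclass
from datetime import date

DAILY_LIMIT = 3            # assumed cap, echoing "three searches a day"
DELAY_STEP_SECONDS = 2.0   # assumed extra delay added per request


@dataclass
class AccountUsage:
    day: date
    count: int = 0


class UsageLimiter:
    """Hypothetical per-account limiter for new or free accounts."""

    def __init__(self, daily_limit=DAILY_LIMIT, delay_step=DELAY_STEP_SECONDS):
        self.daily_limit = daily_limit
        self.delay_step = delay_step
        self._usage = {}

    def _record(self, account_id, today):
        # Start a fresh counter for unseen accounts and on each new day.
        usage = self._usage.get(account_id)
        if usage is None or usage.day != today:
            usage = AccountUsage(day=today)
            self._usage[account_id] = usage
        usage.count += 1
        return usage

    def check_explicit(self, account_id, today):
        """Explicit limit: False once the daily quota is exhausted."""
        return self._record(account_id, today).count <= self.daily_limit

    def delay_implicit(self, account_id, today):
        """Implicit limit: seconds to wait, growing with each request."""
        return (self._record(account_id, today).count - 1) * self.delay_step
```

A site would normally pick one of the two policies per resource; an occasional human visitor barely notices either, while a robot issuing hundreds of requests is quickly cut off or slowed to a crawl.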

Non-interactive checks

While CAPTCHA and other interactive approaches to limiting the activities of web robots are sometimes effective, they do make using a site more cumbersome. This is often unnecessary, as non-interactive mechanisms exist to check for spam or other invalid content typically introduced by robots.

This category contains three non-interactive approaches: spam filtering, in which an automated tool evaluates the content of a transaction; heuristic checks, which evaluate the behavior of the client; and honeypots, which are designs intended to be ignored by, or even invisible to, users while trapping robotic responses.

Spam filtering

Applications that use continuous authentication and “hot words” to flag spam content, or Bayesian filtering to detect other patterns consistent with spam, are very popular, and quite effective. While such risk analysis systems may experience false negatives from time to time, properly-tuned systems can achieve results comparable to a traditional visual CAPTCHA, while also removing the added cognitive burden on the user and eliminating access barriers.

Most major blogging software contains spam filtering capabilities, or can be fitted with a plug-in for this functionality. Many of these filters can automatically delete messages that reach a certain spam threshold, and mark questionable messages for manual moderation. More advanced systems can control attacks based on posting frequency, filter content sent using the Trackback protocol, and ban users by IP address range, temporarily or permanently.
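As a rough illustration of "hot word" scoring with two thresholds (automatic deletion and manual moderation), consider the sketch below. The word list, weights, and threshold values are invented for the example; a production filter would more likely use trained Bayesian probabilities than a fixed table.

```python
import re

# Hypothetical hot words and weights, chosen only for illustration.
HOT_WORDS = {"casino": 3.0, "viagra": 4.0, "free": 1.0, "winner": 2.0}
DELETE_THRESHOLD = 5.0     # assumed: at or above this, delete automatically
MODERATE_THRESHOLD = 2.0   # assumed: at or above this, hold for a moderator


def spam_score(message: str) -> float:
    """Sum the weights of every hot word found in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(HOT_WORDS.get(word, 0.0) for word in words)


def classify(message: str) -> str:
    """Return 'delete', 'moderate', or 'publish' for a submitted message."""
    score = spam_score(message)
    if score >= DELETE_THRESHOLD:
        return "delete"
    if score >= MODERATE_THRESHOLD:
        return "moderate"
    return "publish"
```

The user is never asked to do anything, which is what removes the access barrier; the cost is that thresholds must be tuned to keep misclassification rare.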

Heuristic Approaches

Heuristics are discoveries in a process that seem to indicate a given result. It may be possible to detect the presence of a robotic user based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data that can be collected.

Again, this requires a careful examination of site data. If pattern-matching algorithms can’t find good heuristics, then this is not a good solution. Also, polymorphism, or the creation of changing footprints, is apt to appear in robots, if it hasn’t already, just as polymorphic (“stealth”) viruses appeared to get around virus checkers looking for known viral footprints.
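A request-volume heuristic of the kind mentioned above might be sketched as a sliding-window counter. The class name, the window length, and the request ceiling are assumptions made for this example; suitable values depend on the examination of site data just described.

```python
from collections import defaultdict, deque


class RequestVolumeHeuristic:
    """Hypothetical check: too many requests inside a time window."""

    def __init__(self, max_requests=20, window_seconds=10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def looks_automated(self, client_id, timestamp):
        """Record one request and report whether the client exceeds the ceiling."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard requests that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

Because the client identifier is typically an IP address or session token, a robot that changes its footprint can evade such a check, which is the polymorphism concern raised above.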

Another heuristic approach identified in Botz-4-Sale: Surviving DDoS Attacks that Mimic Flash Crowds [[killbots]] involves the use of CAPTCHA images, with a twist: how the user reacts to the test is as important as whether or not it was solved. This system, which was designed to thwart distributed denial of service (DDoS) attacks, bans automated attackers which make repeated attempts to retrieve a certain page, while protecting against marking humans incorrectly as automated traffic. When the server’s load drops below a certain level, the CAPTCHA-based authentication process is removed entirely.
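The general idea of judging the reaction to a challenge, rather than only its solution, can be sketched as below. This is not the algorithm from the cited paper; the class, the limit on unsolved attempts, and the load threshold are all assumptions made for illustration.

```python
class ChallengeReactionTracker:
    """Hypothetical tracker: ban clients that keep retrying without solving."""

    def __init__(self, max_unsolved=3, load_threshold=0.8):
        self.max_unsolved = max_unsolved      # assumed tolerance for retries
        self.load_threshold = load_threshold  # assumed load fraction (0 to 1)
        self._unsolved = {}
        self._banned = set()

    def challenge_active(self, server_load):
        """Challenges are issued only while the server is under heavy load."""
        return server_load >= self.load_threshold

    def handle_request(self, client_id, solved, server_load):
        """Return 'served', 'challenged', or 'banned' for one request."""
        if client_id in self._banned:
            return "banned"
        if not self.challenge_active(server_load):
            return "served"
        if solved:
            self._unsolved.pop(client_id, None)
            return "served"
        count = self._unsolved.get(client_id, 0) + 1
        self._unsolved[client_id] = count
        if count > self.max_unsolved:
            self._banned.add(client_id)
            return "banned"
        return "challenged"
```

A human who cannot solve the challenge will usually give up after an attempt or two and so stays below the retry limit, whereas a robot that hammers the page is banned; once load subsides, nobody is challenged at all.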

Honeypots

Providing a CAPTCHA visible to robots but not to humans appears to be sufficiently successful to be supported in several content management systems such as Drupal Honeypots and in several commercial WordPress plugins. The form is created to attract robots and then hidden from the user by markup such as CSS-Hidden. It's an approach that is easily implemented even in hand-authored markup and should be considered. The Hilton Hotel Corporation has used a honeypot CAPTCHA on the Sign In page for Hilton Honors, its loyalty program website, where a prominent focusable field is labeled: "This field is for robots only. Please leave blank."
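On the server side, a honeypot check can be as small as the sketch below. The field name "website" and the function name are hypothetical; the only assumption about the form is that it contains one extra field that human users are told, or never given the chance, to fill in.

```python
HONEYPOT_FIELD = "website"  # hypothetical name of the decoy form field


def is_probably_robot(form: dict) -> bool:
    """Treat any non-blank value in the decoy field as a robotic submission."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

If the decoy field remains focusable, as in the Hilton example, it should carry a clear label asking users to leave it blank so that keyboard and screen reader users are not trapped by it.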

Logic puzzles

The goal of visual verification is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical or word puzzles, trivia, or similar logic tests may raise the bar for robots, at least to the point where deploying them elsewhere is more attractive.

It should be noted that users with cognitive disabilities may still have difficulty with logic-puzzle CAPTCHA solutions. Answers may need to be handled flexibly, if they require free-form text. Also, a system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all for use by web robots. This approach is also more likely subject to defeat by human operators engaged in crowd-sourcing activity on behalf of attackers.
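Generating questions programmatically, rather than storing a fixed list, is one way to keep spiders from harvesting them all. The sketch below builds simple addition puzzles and tolerates stray whitespace or a trailing period in the answer; the function names and the choice of single-digit addition are assumptions for illustration, and such puzzles remain difficult for some users with cognitive disabilities.

```python
import random


def make_puzzle(rng: random.Random):
    """Return a (question, expected_answer) pair for a small addition puzzle."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a} plus {b}?", str(a + b)


def check_answer(expected: str, submitted: str) -> bool:
    """Accept the answer with surrounding whitespace or a trailing period."""
    return submitted.strip().rstrip(".").strip() == expected
```

In use, the server would keep the expected answer in the session and compare it with the form submission; it should never send the answer to the client.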

Image and video

Visual comparison CAPTCHAs

There are a number of CAPTCHA techniques based on the identification of still images. This can include requiring the user to identify whether an image shows a man or a woman, or whether an image is human-shaped or avatar-shaped, among other comparison solutions, such as CAPTCHAStar! A novel CAPTCHA based on interactive shape discovery [[captchastar]], FaceCAPTCHA: a CAPTCHA that identifies the gender of face images unrecognized by existing gender classifiers [[facecaptcha]], and Social and egocentric image classification for scientific and privacy applications [[social-classification]].

While alternative audio comparison CAPTCHAs could be provided, such as using similar or different sounds for comparison, the reliance on visual comparison alone would make these techniques difficult, if not impossible, for people with vision-related disabilities to use.

3D CAPTCHA

A 3D representation of letters and numbers can make it more difficult for OCR software to identify them, in turn increasing the security of the CAPTCHA, as described in On the security of text-based 3D CAPTCHAs [[3d-captcha-security]]. However, this solution raises accessibility issues similar to those of traditional CAPTCHAs.

Video Game CAPTCHA

This approach proposes the completion of a basic video game as a CAPTCHA, as in Game-based image semantic CAPTCHA on handset devices [[game-captcha]]. The benefits include removal of language barriers.

Multiple interface methods could potentially make such solutions accessible. They would also afford the benefit of making CAPTCHA solving an arguably enjoyable process, reducing the frustrations generally associated with traditional CAPTCHAs.

Multi-Party Approaches

The Google reCAPTCHA

Acquired in 2009 from Carnegie Mellon University, Google's reCAPTCHA overwhelmingly dominates CAPTCHA deployment on the web today. Version 2 of reCAPTCHA continues to provide an API that has been most effectively marketed as the "no CAPTCHA reCAPTCHA," and its checkbox proclaiming "I'm not a robot" has become a cultural icon, spawning various cultural offshoots in art, theater, and popular music.

The checkbox is, of course, no checkbox at all in the traditional HTML sense. The pseudo-checkbox process works by collecting a trove of user data well beyond mouse movement and keyboard navigation, including the date, the language the browser is set to, all cookies placed by Google over the last 6 months, CSS information for that page, an inventory of mouse clicks made on that screen (or touches if on a touch device), an inventory of plugins installed on the browser, and an itemization of all JavaScript objects, all to determine whether the user is human or robot. Of course Google also generally knows much about individual users, including their customary IP addresses, the telephone numbers and email addresses of their friends, family and colleagues, where they have been at every moment of every day, as well as their web search and YouTube habits. This is why the simple checkbox can keep the CAPTCHA process disarmingly simple, though it also explains why the privacy policy accompanies the "no CAPTCHA reCAPTCHA". The privacy policy is required to satisfy legal requirements in California and in the E.U.

For a time Google's reCAPTCHA V2 was regarded as the most accessible CAPTCHA solution for one simple reason: it could be comfortably completed using a variety of assistive technologies. More recently it has been widely observed that utilizing keyboard navigation, as many assistive technology users do, simply results in the presentation of a traditional inaccessible CAPTCHA as a fallback mechanism, which becomes a mere extra hurdle in the user's quest for access. Our own tests with various browsers on various operating environments have been generally successful with Google's own reCAPTCHA test page. However, browsing in incognito mode, clearing or blocking cookies, and additional factors can apparently trigger a fallback to traditional CAPTCHA these days for many assistive technology users.

One recent reCAPTCHA V2 innovation seems most promising. Rather than reproduce characters, users are asked to type the words they see (or hear). It even appears unnecessary to spell these correctly, or to enter all the words presented, in order to be adjudicated as human.

Late in 2018 Google released reCAPTCHA V3, promising to eliminate "the need to interrupt users with challenges at all." Obviously, this would be ideal, and we believe Google has adopted several strategies recommended here toward achieving that goal. However, as the failure fallback remains the presentation of a traditional CAPTCHA, it remains imperative to do better by users who require alternative CAPTCHA options, as also enumerated here.

In this context we note that it has become common for users to access various online services through multiple devices such as desktop and mobile computers, smart phones, tablets, and wearables such as smart watches. This proliferation provides online services multiple vectors for simple and effective authentication of users, including persons with disabilities [[auth-mult]]. We note that several major service providers (such as Facebook) now support cross-site user authentication. As of this writing, however, it appears only Google's reCAPTCHA V3 API provides cross-site CAPTCHA services, i.e. a service to distinguish human from robot users without actually passing specific identifying data.

We would expect Google's reCAPTCHA V3 system to conclude that there is no need to present a CAPTCHA whenever another browser tab is already properly logged in to a Google product such as Calendar on a registered user device. However, while this may prevent the third party site from collecting personal data, it only assists Google in acquiring more user data. This constitutes a significant cost to the user's privacy in an industry so capable of cross-referencing massive amounts of data in the absence of meaningful regulations and controls on where and how that data may be used. This is a very strong accessibility concern, as people with disabilities are generally loath to disclose any information about their disability except when and only when they expressly authorize it to be revealed. Regrettably, in recent years the news has widely reported the sobering fact that an authenticated login on some service also does not guarantee a human user.

Federated identity systems

Many large companies such as Microsoft, Apple, Amazon, Google and the Kantara Initiative have created competing “federated network identity” systems, which can allow a user to create an account, set his or her preferences, payment data, etc., and have that data persist across all sites and devices that use the same service. Due to large companies now requiring a federated identity to use cloud-based services on their respective digital ecosystems, the popularity of federated identities has increased significantly. As a result, many web sites and services allow a portable form of authentication and identification across the Web. Perhaps some of these could also provide cross-site CAPTCHA services.

Single sign-on

Single sign-on services provide multiple disparate services to users, including to third parties through APIs. Because of this wide scope, they need to be among the most accessible services on the Web in order to offer equal benefits to people with disabilities. They also pose two significant risks for users. One is the aforementioned opportunity to collect massive amounts of personal data. The other risk comes into play should user credentials ever be exposed, in which case users can suddenly find themselves exposed on multiple sites.

Public key infrastructure (PKI) solutions

Another approach is to use certificates for individuals who wish to verify their identity. The certificate can be issued in such a way as to ensure something close to a one-person-one-vote system, e.g., by issuing these certificates in person and enabling users to develop distributed trust networks, or by having these certificates issued by highly trusted authorities such as governments. Estonia's e-Residency Program is one example; its certificates can be trusted not only to validate a human user but also to provide an unimpeachable audit trail should issues arise.

The cost of creating fraudulent certificates needs to be high enough to destroy the value of producing them in most cases. Sites would need to use mechanisms which are widely implemented in user agents.

A variant of this concept, in which only people with disabilities who are affected by other verification systems would register, is sometimes proposed. Such approaches raise significant privacy and stigmatisation concerns and are usually opposed strongly by people with disabilities themselves and by organizations that serve them. Such approaches should not be confused with situations where people voluntarily self-identify as individuals with disabilities. An example is the U.S.-based Bookshare, whose services are only available to persons with documented print disabilities. Bookshare provides its users access to printed materials which are otherwise unavailable in accessible alternative formats such as audio or Braille. An international copyright treaty administered by the United Nations' World Intellectual Property Organization (WIPO), known as the Marrakesh Treaty [[marrakesh]], provides for exceptions to national copyright law to allow copyrighted materials to be provided to print disabled users in specialized formats.