A survey of execution character encodings

Introduction

This paper contains the results of a survey of the default and supported execution character encodings of current C and C++ compilers. Results are included only for compilers that have not been obviously discontinued. Wide character encodings are not reflected in the results. Distinctions between source and execution character encodings are ignored unless otherwise stated.

The list of compilers to include in this survey was culled from the following sources:

The supported character encodings for each compiler were determined by consulting available documentation. No testing was performed because the author lacks access to most of these compilers.
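Had testing been possible, a small probe program could have checked the documented encodings empirically. The following is a minimal sketch and was not used for this survey. The helper names `hex_code_units` and `guess_execution_encoding` are hypothetical, the classification is deliberately coarse, and the code assumes 8-bit `char` and that U+00E9 is representable in the execution character set (a compiler may reject or substitute the character otherwise).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: render the code units of an ordinary string literal
// (excluding the terminating null) as lowercase hex, separated by spaces.
template <std::size_t N>
std::string hex_code_units(const char (&literal)[N]) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        const unsigned char unit = static_cast<unsigned char>(literal[i]);
        if (!out.empty()) out += ' ';
        out += digits[unit >> 4];
        out += digits[unit & 0x0F];
    }
    return out;
}

// Hypothetical classifier: a coarse guess at the execution character
// encoding, based on the values the compiler chose for 'A' and U+00E9.
std::string guess_execution_encoding() {
    const unsigned char a = static_cast<unsigned char>('A');
    if (a == 0xC1) return "EBCDIC based";   // 'A' is 0xC1 in EBCDIC code pages
    if (a != 0x41) return "unknown";        // 'A' is 0x41 in ASCII based encodings
    const std::string e_acute = hex_code_units("\u00E9");
    if (e_acute == "c3 a9") return "UTF-8";
    if (e_acute == "e9") return "ASCII based, ISO-8859-1 compatible";
    return "ASCII based, other";
}
```

Compiling such a probe once with default options and once with an option such as GCC's -fexec-charset would be one way to compare the default and alternate encodings; the result of `guess_execution_encoding()` depends on the compiler and options used.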

Motivation

At the C++ standard committee meeting in Jacksonville, 2018, EWG discussed the long-term ramifications of the P0482 char8_t proposal. What will happen as the world continues to migrate to UTF-8? If all compilers were to transition to UTF-8 as their (primary) execution character encoding, what would become of code written to use char8_t?
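The question can be made concrete by comparing the code units of an ordinary string literal with those of a UTF-8 string literal. The following is a minimal sketch under stated assumptions: `same_code_units` and `ordinary_matches_utf8` are hypothetical names, `char` is assumed to be 8 bits wide, and U+00E9 is assumed to be representable in the execution character set. The helper is templated on the character type so that it compiles whether `u8""` literals have type `const char[]` or, with P0482 adopted, `const char8_t[]`.

```cpp
#include <cstddef>

// Hypothetical helper: true when two string literals consist of the same
// sequence of code unit values, including the terminating null.
template <class CharA, std::size_t NA, class CharB, std::size_t NB>
constexpr bool same_code_units(const CharA (&a)[NA], const CharB (&b)[NB]) {
    if (NA != NB) return false;
    for (std::size_t i = 0; i < NA; ++i) {
        if (static_cast<unsigned char>(a[i]) !=
            static_cast<unsigned char>(b[i])) {
            return false;
        }
    }
    return true;
}

// Compares the execution encoding of U+00E9 with its UTF-8 encoding.
// The value depends on the compiler and its options.
constexpr bool ordinary_matches_utf8 = same_code_units("\u00E9", u8"\u00E9");
```

If the execution character encoding is UTF-8, `ordinary_matches_utf8` is expected to be true, in which case the two literals differ only in their type.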

The data presented in this paper is intended to help set expectations regarding the possibility of one day mandating a specific execution character encoding for all compilers. Unfortunately, the collected data is just a snapshot in time and therefore does not reflect current trends nor predict any particular future. It nevertheless does give an indication of the amount of effort that would be needed to migrate all compilers to a particular execution character encoding.

Future directions

JF Bastien had the excellent suggestion that this data, as well as data for other implementation-defined behavior and quantities, be collected at cppreference.com. The author intends to pursue this direction for further collection and maintenance of this data.

Survey results

Compiler | Version and release date | Default execution character encoding | Other execution character encodings supported? | References/Notes
GCC 7.3.0
2018-01-25
UTF-8 Yes https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Preprocessor-Options.html#Preprocessor-Options

The -fexec-charset option enables selection of an alternate execution character set.
Clang 5.0
2017-09-07
UTF-8 No According to a post to the Clang developers' mailing list, there is interest in providing support for additional execution character encodings:
http://lists.llvm.org/pipermail/cfe-dev/2018-January/056721.html
Microsoft Visual C++ 14.1 (Visual Studio 2017)
2017-03-07
Locale dependent ASCII based code page Yes; ASCII based code pages https://docs.microsoft.com/en-us/cpp/cpp/character-sets2
https://docs.microsoft.com/en-us/cpp/build/reference/execution-charset-set-execution-character-set
https://docs.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8

The /execution-charset and /utf-8 options enable selection of an alternate execution character set.
AMD Optimizing C/C++ Compiler (Clang based) 1.1
2017-12-15
UTF-8 (presumed) No (presumed) https://developer.amd.com/wordpress/media/2017/04/AOCC-1.1-User-Guide.pdf
Arm Compiler 5.06 update 6
2017-09-28
Locale dependent Yes; requires ISO-8859-1 compatibility https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/dui0472/latest/c-and-c-implementation-details/character-sets-and-identifiers-in-arm-c-and-c

The source character set is determined by the current locale unless a Unicode BOM is present or the --locale option is specified. The execution and source character sets must be the same and must be a subset of ISO-8859-1.

As of version 6, a Clang based compiler is provided.
Arm Compiler (Clang based) 6.9
2017-10-25
UTF-8 (presumed) No (presumed) Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang.
Arm Compiler for HPC (Clang based) 18.1
2018-01-17
UTF-8 (presumed) No (presumed) Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang.
Embarcadero C++Builder RAD Studio 10.2
2017-03-22
Non-Clang based: ASCII based code pages

Clang-based: UTF-8 (presumed)
No

No (presumed)
Implementation defined properties are documented at http://docwiki.embarcadero.com/RADStudio/Tokyo/en/ANSI_Implementation-specific_Standards.

RAD Studio 10.2 provides both Clang and non-Clang based compilers.

Execution character encoding documentation has not been identified. Supported encodings are presumed for Clang derived compilers.
Codeplay ComputeCpp 0.6.0
2018-02-13
UTF-8 (presumed) No (presumed) Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang.
Comeau C/C++ 4.3.10.1
2008-10-06
ASCII with 8-bit extensions (presumed) No (presumed) Execution character encoding documentation has not been identified. Supported encodings are presumed based on supported platforms.
Wind River Diab Compiler Unknown Unknown Unknown Documentation is restricted to licensees.
Wind River GCC (gcc based)
Digital Mars C/C++ 8.5.7
2013-08-01
ASCII with 8-bit extensions Yes. http://www.digitalmars.com/ctg/CPP-Language-Implementation.html

The -j option can be used to select from a few alternative code pages.

#pragma setlocale("locale") also allows changing how string literals are interpreted.
Systems/C++ 2.10
2017-07-31
EBCDIC when targeting z/OS

ASCII when targeting z/Linux
Yes; ASCII, core language only http://www.dignus.com/dcxx/syscxx.pdf

The -fasciiout option switches the compile time execution character encoding to ASCII, but the provided standard library does not support this.
EDG 4.14
2017-09-11
Configurable Yes, configurable https://www.edg.com/docs/edg_cpp.pdf

The EDG frontend is eminently configurable. Out of the box, it supports ASCII, ISO-8859-1, Shift-JIS, and UTF-8.
PathScale EKOPath Discontinued
Cypress FR Family SOFTUNE V6 C/C++ Compiler V65L09
2017-09-13
Unknown Unknown Unable to identify publicly accessible documentation
Green Hills C/C++ Compilers Unknown Unknown Unknown Unable to identify publicly accessible documentation
HP aCC Unknown Unknown Unknown Unable to identify publicly accessible documentation
IAR C/C++ Compiler for 8051 10.10
2017-04-05
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/8051/webic/doc/EW8051_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Arm 8.22
2018-02-23
ASCII Yes; MBCS, UTF-8, UTF-16 http://ftp.iar.se/WWWfiles/arm/webic/doc/EWARM_DevelopmentGuide.ENU.pdf

The --source_encoding option affects both the source and execution character encodings.
IAR C/C++ Compiler for AVR32 4.30
2015
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/AVR32/webic/doc/EWAVR32_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for AVR 7.10
2017-05-30
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/AVR/webic/doc/EWAVR_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Maxim MAXQ 2.43
2017-10-31
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/maxq/guides/EWMAXQ_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for National CR16C 3.30
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/cr16c/webic/doc/EWCR16C_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for NXP Coldfire 1.23
2010
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/cf/EWCF_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for NXP HCS12 4.10
2014
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/hcs12/guides/EWHCS12_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for NXP S08 1.20
2011
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/s08/EWS08_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas 78K 4.7
2010
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/78K/webic/doc/EW78K_CompilerGuide.ENU.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas H8 2.30
2011
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/H8/webic/doc/EWH8_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas M16C and R8C 3.70
2014
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/ewm16c/EWM16C_CompilerGuide.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas M32C 3.30
2004
ASCII No http://ftp.iar.se/WWWfiles/m32c/guides/EWM32C_CompilerReference.pdf
IAR C/C++ Compiler for Renesas R32C 1.40
2012
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/r32c/EWR32C_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas RH850 1.40
2016
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/rh850/EWRH850_DevelopmentGuide.ENU.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas RL78 3.10
2017-09-20
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/rl78/EWRL78_DevelopmentGuide.ENU.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas RX 3.10
2017-04-21
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/RX/webic/doc/EWRX_DevelopmentGuide.ENU.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas SuperH 2.30
2014
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/superh/guides/EWSH_DevelopmentGuide.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for Renesas V850 4.20
2015
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/v850/webic/ew/doc/EWV850_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for SAM8 3.20
2003
ASCII No http://ftp.iar.se/WWWfiles/sam8/EWSAM8_CompilerReference.pdf
IAR C/C++ Compiler for STM8 3.10
2017-06-30
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/stm8/guides/EWSTM8_DevelopmentGuide.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IAR C/C++ Compiler for TI MSP430 7.11
2017-11-20
ASCII Yes; Host ASCII compatible MBCS http://ftp.iar.se/WWWfiles/msp430/webic/doc/EW430_CompilerReference.pdf

The --enable_multibytes option enables setting the execution character encoding to the host encoding.
IBM XL C/C++ for AIX 13.1.3
2015-12-08
ASCII Yes; MBCS https://www.ibm.com/support/knowledgecenter/SSGH3R_13.1.3/com.ibm.compilers.aix.doc/welcome.html

The -qmbcs option can be used to enable MBCS support.
IBM XL C/C++ for Linux on Power (Clang based) 13.1.6
2017-12-05
UTF-8 (presumed) No (presumed) https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.6/com.ibm.compilers.linux.doc/welcome.html
IBM XL C/C++ for Linux on z Systems (Clang based) 1.5
2015-11-24
UTF-8 (presumed) No (presumed) https://www.ibm.com/support/knowledgecenter/SSVUN6_1.2.0/com.ibm.compilers.loz.doc/welcome.html
IBM XL C/C++ for z/VM 1.3
2011-10-12
Locale dependent EBCDIC based code page Yes; EBCDIC based code pages The XL C/C++ for z/VM compiler is derived from the XL C/C++ for z/OS compiler with some additional limitations:
https://www.ibm.com/support/knowledgecenter/SSB27U_6.4.0/com.ibm.zvm.v640.vmcug/vmcug20.htm
IBM XL C/C++ for Blue Gene/Q 12.1
2012-06-15
ASCII Yes; MBCS https://www.ibm.com/support/knowledgecenter/SS2LWA_12.1.0/welcome.html

The -qmbcs option can be used to enable MBCS support.
IBM z/OS C/C++ 2.3
2017-07
Locale dependent EBCDIC based code page with ISO-2022 shift sequences Yes; EBCDIC based code pages, ASCII based code pages https://www-01.ibm.com/support/docview.wss?uid=swg27036892

The CONVLIT and LOCALE options enable overriding the execution character encoding.
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcux01/convlit.htm
ImageCraft JumpStart C and C++ for Cortex (GCC based) 9.01
2018-03-30
UTF-8 (presumed) Yes (presumed) https://www.imagecraft.com/help/ICCV9CORTEX/JumpStart%20C++%20for%20Cortex%20User%20Guide-V0.9.pdf
ImageCraft JumpStart C for AVR 8.28
2018-03-26
ASCII (presumed) No (presumed) http://imagecraft.com/help/ICCV8AVR/index.htm
Intel C/C++ 17.0
2016-09-15
ASCII (presumed) Yes; MBCS https://software.intel.com/en-us/intel-cplusplus-compiler-17.0-user-and-reference-guide

The -multibyte-chars option can be used to enable MBCS support.
KAI C++ Discontinued
Synopsys DesignWare ARC MetaWare C/C++ Unknown Unknown Unknown Unable to identify publicly accessible documentation
Synopsys DesignWare ARC MetaWare C/C++ (Clang based)
Synopsys chess
NXP CodeWarrior
Mentor Performance Optimized GNU Compiler (GCC based)
Mentor Graphics EDGE C/C++ Compiler
Open64 4.5.2.1
2013-03-28
https://developer.amd.com/wordpress/media/2012/10/open64.pdf
Oracle Developer Studio
NVIDIA PGI
Renesas 78K0R
Renesas CC-RL RL78 Family
Renesas CX
Renesas R32C Family
Renesas RH850 Family
Renesas V850
Renesas C/C++ Compiler Package for RX Family
Renesas C/C++ Compiler Package for SuperH Family
Renesas C/C++ Compiler Package for M32R Family [M3T-CC32R]
Renesas C/C++ Compiler Package for R8C and M16C Families
Renesas C/C++ Compiler Package for M16C Series and R8C Family [M3T-NC30WA]
Renesas C/C++ Compiler Package for H8SX, H8S, H8 Family
Sony SN
Sony Orbis Clang compiler for PS4 (Clang based)
Stratus VOS C++ compiler (GCC based)
Symantec C++
TenDRA C/C++
Texas Instruments C/C++ Compiler
Ultimate C/C++
Open Watcom C/C++
Analog Devices Blackfin and TigerSHARC
Archelon C
Archelon CSR Kalimba C
CADUL C cross compiler for Intel 80X86
CEVA compiler (NVIDIA)
NVIDIA CUDA
Cosmic
Fujitsu FR Family
Hexagon Tools
HI-CROSS+ Motorola HC16
Motorola DSP563
HI-TECH C compiler/linker
Hitachi ch38
HiveCC
Keil CA51
Marvell C compiler/linker
Microchip MPLAB pic24
Microchip MPLAB pic32
Microchip MPLAB XC8 C
Microtec
Microware Ultra C for OS-9
MPLAB C18
MPLAB XC16 C
Nintendo Cafe Platform
NXP StarCore Freescale
Panasonic C
Panasonic MN101E/MN101L
Paradigm C/C++
Plan 9 C
QNX
Rowley Crossworks for MSP430
Tasking 68K Toolset
Tasking ARM Toolset
Tasking Classic Toolset for C166
Tasking DSP56X Toolset
Tasking IFX SLE88
Tasking SLE88
Tasking Tricore
Tasking VX Toolset for C166
Tensilica Xtensa C/C++
TI ARP32 C/C++
TI msp430 C/C++
TI tms320c6x and tms320c55x
TI tms320C3x/4x C
TI tms320c28x
TriMedia tmcc
WinAVR
ZiLOG eZ80