Document Number: | DXXXXR0 <DRAFT> |
---|---|
Date: | 2018-04-23 |
Audience: | Evolution Working Group, Library Evolution Working Group |
Reply-to: | Tom Honermann <tom@honermann.net> |
This paper contains the results of a survey of the default and supported execution character encodings of current C and C++ compilers. Results are included only for compilers that have not been obviously discontinued. Wide character encodings are not reflected in the results. Distinctions between source and execution character encodings are ignored unless otherwise stated.
The list of compilers to include in this survey was culled from the following sources:
The supported character encodings for each compiler were determined by consulting available documentation. No testing was performed because the author lacks access to most of these compilers.
At the C++ standard committee meeting in Jacksonville, 2018, EWG discussed the long-term ramifications of the P0482 char8_t proposal. What will happen as the world continues to migrate to UTF-8? If all compilers were to transition to UTF-8 as their (primary) execution character encoding, what would become of code written to use char8_t?
The data presented in this paper is intended to help set expectations regarding the possibility of one day mandating a specific execution character encoding for all compilers. Unfortunately, the collected data is just a snapshot in time and therefore does not reflect current trends nor predict any particular future. It nevertheless does give an indication of the amount of effort that would be needed to migrate all compilers to a particular execution character encoding.
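To make the char8_t question above concrete, the following C++ sketch contrasts an ordinary string literal, whose encoding follows the execution character encoding surveyed in this paper, with a u8 literal, which is always UTF-8. The helper names (`code_units`, `exec_literal_is_utf8`) are hypothetical and introduced only for illustration. The check is a heuristic probe of a single character, not a complete encoding detector.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+00E9 is written as a universal-character-name so that the example
// does not depend on the source file encoding.

// A u8 literal is always UTF-8: U+00E9 occupies two code units (0xC3 0xA9).
constexpr std::size_t utf8_units = code_units(u8"\u00e9");

// An ordinary literal uses the execution character encoding: two code
// units under UTF-8, one under ISO-8859-1 or Windows-1252; other
// encodings may give other results.
constexpr std::size_t exec_units = code_units("\u00e9");

// Heuristic probe (an assumption of this sketch, not a standard facility):
// does the ordinary literal hold the UTF-8 byte sequence for U+00E9?
constexpr bool exec_literal_is_utf8() {
    return exec_units == 2
        && static_cast<unsigned char>("\u00e9"[0]) == 0xC3
        && static_cast<unsigned char>("\u00e9"[1]) == 0xA9;
}
```

On a compiler whose execution character encoding is UTF-8, `exec_literal_is_utf8()` should evaluate to true; on one that defaults to a single-byte code page it should be false. `utf8_units` is 2 in either case, which is the property that code written against char8_t relies on.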
JF Bastien had the excellent suggestion that this data, as well as data for other implementation-defined behavior and quantities, be collected at cppreference.com. The author intends to pursue this direction for further collection and maintenance of this data.
Compiler | Version and release date | Default execution character encoding | Other execution character encodings supported? | References/Notes |
---|---|---|---|---|
GCC | 7.3.0 2018-01-25 | UTF-8 | Yes | https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Preprocessor-Options.html#Preprocessor-Options The -fexec-charset option enables selection of an alternate execution character set. |
Clang | 5.0 2017-09-07 | UTF-8 | No | According to a post to the Clang developer's mailing list, there is interest in providing support for additional execution character encodings: http://lists.llvm.org/pipermail/cfe-dev/2018-January/056721.html |
Microsoft Visual C++ | 14.1 (Visual Studio 2017) 2017-03-07 | Locale dependent ASCII based code page | Yes; ASCII based code pages | https://docs.microsoft.com/en-us/cpp/cpp/character-sets2 https://docs.microsoft.com/en-us/cpp/build/reference/execution-charset-set-execution-character-set https://docs.microsoft.com/en-us/cpp/build/reference/utf-8-set-source-and-executable-character-sets-to-utf-8 The /execution-charset and /utf-8 options enable selection of an alternate execution character set. |
AMD Optimizing C/C++ Compiler (Clang based) | 1.1 2017-12-15 | UTF-8 (presumed) | No (presumed) | https://developer.amd.com/wordpress/media/2017/04/AOCC-1.1-User-Guide.pdf |
Arm Compiler | 5.06 update 6 2017-09-28 | Locale dependent | Yes; requires ISO-8859-1 compatibility | https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/dui0472/latest/c-and-c-implementation-details/character-sets-and-identifiers-in-arm-c-and-c The source character set is determined by the current locale unless a Unicode BOM is present or the --locale option is specified. The execution and source character sets must be the same and must be a subset of ISO-8859-1. As of version 6, a Clang based compiler is provided. |
Arm Compiler (Clang based) | 6.9 2017-10-25 | UTF-8 (presumed) | No (presumed) | Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang. |
Arm Compiler for HPC (Clang based) | 18.1 2018-01-17 | UTF-8 (presumed) | No (presumed) | Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang. |
Embarcadero C++Builder | RAD Studio 10.2 2017-03-22 | Non-Clang based: ASCII based code pages; Clang based: UTF-8 (presumed) | Non-Clang based: No; Clang based: No (presumed) | http://docwiki.embarcadero.com/RADStudio/Tokyo/en/ANSI_Implementation-specific_Standards documents the implementation defined properties. RAD Studio 10.2 provides both Clang and non-Clang based compilers. Execution character encoding documentation has not been identified. Supported encodings are presumed for Clang derived compilers. |
Codeplay ComputeCpp | 0.6.0 2018-02-13 | UTF-8 (presumed) | No (presumed) | Execution character encoding documentation has not been identified. Supported encodings are presumed based on derivation from Clang. |
Comeau C/C++ | 4.3.10.1 2008-10-06 | ASCII with 8-bit extensions (presumed) | No (presumed) | Execution character encoding documentation has not been identified. Supported encodings are presumed based on supported platforms. |
Wind River Diab Compiler | Unknown | Unknown | Unknown | Documentation is restricted to licensees. |
Wind River GCC (GCC based) | ||||
Digital Mars C/C++ | 8.5.7 2013-08-01 | ASCII with 8-bit extensions | Yes | http://www.digitalmars.com/ctg/CPP-Language-Implementation.html The -j option can be used to select from a few alternative code pages. #pragma setlocale("locale") also allows changing how string literals are interpreted. |
Systems/C++ | 2.10 2017-07-31 | EBCDIC when targeting z/OS; ASCII when targeting z/Linux | Yes; ASCII, core language only | http://www.dignus.com/dcxx/syscxx.pdf The -fasciiout option switches the compile time execution character encoding to ASCII, but the provided standard library does not support this. |
EDG | 4.14 2017-09-11 | Configurable | Yes; configurable | https://www.edg.com/docs/edg_cpp.pdf The EDG frontend is eminently configurable. Out of the box, it supports ASCII, ISO-8859-1, Shift-JIS, and UTF-8. |
PathScale EKOPath | Discontinued | |||
Cypress FR Family SOFTUNE V6 C/C++ Compiler | V65L09 2017-09-13 | Unknown | Unknown | Unable to identify publicly accessible documentation |
Green Hills C/C++ Compilers | Unknown | Unknown | Unknown | Unable to identify publicly accessible documentation |
HP aCC | Unknown | Unknown | Unknown | Unable to identify publicly accessible documentation |
IAR C/C++ Compiler for 8051 | 10.10 2017-04-05 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/8051/webic/doc/EW8051_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Arm | 8.22 2018-02-23 | ASCII | Yes; MBCS, UTF-8, UTF-16 | http://ftp.iar.se/WWWfiles/arm/webic/doc/EWARM_DevelopmentGuide.ENU.pdf The --source_encoding option affects both the source and execution character encodings. |
IAR C/C++ Compiler for AVR32 | 4.30 2015 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/AVR32/webic/doc/EWAVR32_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for AVR | 7.10 2017-05-30 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/AVR/webic/doc/EWAVR_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Maxim MAXQ | 2.43 2017-10-31 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/maxq/guides/EWMAXQ_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for National CR16C | 3.30 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/cr16c/webic/doc/EWCR16C_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for NXP Coldfire | 1.23 2010 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/cf/EWCF_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for NXP HCS12 | 4.10 2014 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/hcs12/guides/EWHCS12_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for NXP S08 | 1.20 2011 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/s08/EWS08_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas 78K | 4.7 2010 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/78K/webic/doc/EW78K_CompilerGuide.ENU.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas H8 | 2.30 2011 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/H8/webic/doc/EWH8_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas M16C and R8C | 3.70 2014 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/ewm16c/EWM16C_CompilerGuide.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas M32C | 3.30 2004 | ASCII | No | http://ftp.iar.se/WWWfiles/m32c/guides/EWM32C_CompilerReference.pdf |
IAR C/C++ Compiler for Renesas R32C | 1.40 2012 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/r32c/EWR32C_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas RH850 | 1.40 2016 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/rh850/EWRH850_DevelopmentGuide.ENU.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas RL78 | 3.10 2017-09-20 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/rl78/EWRL78_DevelopmentGuide.ENU.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas RX | 3.10 2017-04-21 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/RX/webic/doc/EWRX_DevelopmentGuide.ENU.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas SuperH | 2.30 2014 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/superh/guides/EWSH_DevelopmentGuide.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for Renesas V850 | 4.20 2015 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/v850/webic/ew/doc/EWV850_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for SAM8 | 3.20 2003 | ASCII | No | http://ftp.iar.se/WWWfiles/sam8/EWSAM8_CompilerReference.pdf |
IAR C/C++ Compiler for STM8 | 3.10 2017-06-30 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/stm8/guides/EWSTM8_DevelopmentGuide.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IAR C/C++ Compiler for TI MSP430 | 7.11 2017-11-20 | ASCII | Yes; Host ASCII compatible MBCS | http://ftp.iar.se/WWWfiles/msp430/webic/doc/EW430_CompilerReference.pdf The --enable_multibytes option enables setting the execution character encoding to the host encoding. |
IBM XL C/C++ for AIX | 13.1.3 2015-12-08 | ASCII | Yes; MBCS | https://www.ibm.com/support/knowledgecenter/SSGH3R_13.1.3/com.ibm.compilers.aix.doc/welcome.html The -qmbcs option can be used to enable MBCS support. |
IBM XL C/C++ for Linux on Power (Clang based) | 13.1.6 2017-12-05 | UTF-8 (presumed) | No (presumed) | https://www.ibm.com/support/knowledgecenter/SSXVZZ_13.1.6/com.ibm.compilers.linux.doc/welcome.html |
IBM XL C/C++ for Linux on z Systems (Clang based) | 1.5 2015-11-24 | UTF-8 (presumed) | No (presumed) | https://www.ibm.com/support/knowledgecenter/SSVUN6_1.2.0/com.ibm.compilers.loz.doc/welcome.html |
IBM XL C/C++ for z/VM | 1.3 2011-10-12 | Locale dependent EBCDIC based code page | Yes; EBCDIC based code pages | The XL C/C++ for z/VM compiler is derived from the XL C/C++ for z/OS compiler with some additional limitations: https://www.ibm.com/support/knowledgecenter/SSB27U_6.4.0/com.ibm.zvm.v640.vmcug/vmcug20.htm |
IBM XL C/C++ for Blue Gene/Q | 12.1 2012-06-15 | ASCII | Yes; MBCS | https://www.ibm.com/support/knowledgecenter/SS2LWA_12.1.0/welcome.html The -qmbcs option can be used to enable MBCS support. |
IBM z/OS C/C++ | 2.3 2017-07 | Locale dependent EBCDIC based code page with ISO-2022 shift sequences | Yes; EBCDIC based code pages, ASCII based code pages | https://www-01.ibm.com/support/docview.wss?uid=swg27036892 The CONVLIT and LOCALE options enable overriding the execution character encoding. https://www.ibm.com/support/knowledgecenter/SSLTBW_2.3.0/com.ibm.zos.v2r3.cbcux01/convlit.htm |
ImageCraft JumpStart C and C++ for Cortex (GCC based) | 9.01 2018-03-30 | UTF-8 (presumed) | Yes (presumed) | https://www.imagecraft.com/help/ICCV9CORTEX/JumpStart%20C++%20for%20Cortex%20User%20Guide-V0.9.pdf |
ImageCraft JumpStart C for AVR | 8.28 2018-03-26 | ASCII (presumed) | No (presumed) | http://imagecraft.com/help/ICCV8AVR/index.htm |
Intel C/C++ | 17.0 2016-09-15 | ASCII (presumed) | Yes; MBCS | https://software.intel.com/en-us/intel-cplusplus-compiler-17.0-user-and-reference-guide The -multibyte-chars option can be used to enable MBCS support. |
KAI C++ | Discontinued | |||
Synopsys DesignWare ARC MetaWare C/C++ | Unknown | Unknown | Unknown | Unable to identify publicly accessible documentation |
Synopsys DesignWare ARC MetaWare C/C++ (Clang based) | ||||
Synopsys chess | ||||
NXP CodeWarrior | ||||
Mentor Performance Optimized GNU Compiler (GCC based) | ||||
Mentor Graphics EDGE C/C++ Compiler | ||||
Open64 | 4.5.2.1 2013-03-28 | https://developer.amd.com/wordpress/media/2012/10/open64.pdf | ||
Oracle Developer Studio | ||||
NVIDIA PGI | ||||
Renesas 78K0R | ||||
Renesas CC-RL RL78 Family | ||||
Renesas CX | ||||
Renesas R32C Family | ||||
Renesas RH850 Family | ||||
Renesas V850 | ||||
Renesas C/C++ Compiler Package for RX Family | ||||
Renesas C/C++ Compiler Package for SuperH Family | ||||
Renesas C/C++ Compiler Package for M32R Family [M3T-CC32R] | ||||
Renesas C/C++ Compiler Package for R8C and M16C Families | ||||
Renesas C/C++ Compiler Package for M16C Series and R8C Family [M3T-NC30WA] | ||||
Renesas C/C++ Compiler Package for H8SX, H8S, H8 Family | ||||
Sony SN | ||||
Sony Orbis Clang compiler for PS4 (Clang based) | ||||
Stratus VOS C++ compiler (GCC based) | ||||
Symantec C++ | ||||
TenDRA C/C++ | ||||
Texas Instruments C/C++ Compiler | ||||
Ultimate C/C++ | ||||
Open Watcom C/C++ | ||||
Analog Devices Blackfin and TigerSHARC | ||||
Archelon C | ||||
Archelon CSR Kalimba C | ||||
CADUL C cross compiler for Intel 80X86 | ||||
CEVA compiler (NVIDIA) | ||||
Nvidia CUDA | ||||
Cosmic | ||||
Fujitsu FR Family | ||||
Hexagon Tools | ||||
HI-CROSS+ Motorola HC16 | ||||
Motorola DSP563 | ||||
HI-TECH C compiler/linker | ||||
Hitachi ch38 | ||||
HiveCC | ||||
Keil CA51 | ||||
Marvell C compiler/linker | ||||
Microchip MPLAB pic24 | ||||
Microchip MPLAB pic32 | ||||
Microchip MPLAB XC8 C | ||||
Microtec | ||||
Microware Ultra C for OS-9 | ||||
MPLAB C18 | ||||
MPLAB XC16 C | ||||
Nintendo Cafe Platform | ||||
NXP StarCore Freescale | ||||
Panasonic C | ||||
Panasonic MN101E/MN101L | ||||
Paradigm C/C++ | ||||
Plan 9 C | ||||
QNX | ||||
Rowley Crossworks for MSP430 | ||||
Tasking 68K Toolset | ||||
Tasking ARM Toolset | ||||
Tasking Classic Toolset for C166 | ||||
Tasking DSP56X Toolset | ||||
Tasking IFX SLE88 | ||||
Tasking SLE88 | ||||
Tasking Tricore | ||||
Tasking VX Toolset for C166 | ||||
Tensilica Xtensa C/C++ | ||||
TI ARP32 C/C++ | ||||
TI msp430 C/C++ | ||||
TI tms320c6x and tms320c55x | ||||
TI tms320C3x/4x C | ||||
TI tms320c28x | ||||
TriMedia tmcc | ||||
WinAVR | ||||
ZiLOG eZ80 |
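
The table separates ASCII based from EBCDIC based execution encodings (for example, Systems/C++ targeting z/OS and IBM z/OS C/C++ default to EBCDIC). A program can probe which family it was compiled for by inspecting the values of basic characters. The C++ sketch below is a minimal illustration, not a complete detector: the names `EncodingFamily`, `classify`, and `execution_family` are hypothetical, and the probe checks only two characters, relying on the values 0x41 and 0x30 for 'A' and '0' in ASCII and 0xC1 and 0xF0 in EBCDIC.

```cpp
// Hypothetical classification of the basic execution character set.
enum class EncodingFamily { ascii_based, ebcdic_based, unknown };

// Classify from the code unit values of 'A' and '0'.
// ASCII: 'A' == 0x41, '0' == 0x30. EBCDIC: 'A' == 0xC1, '0' == 0xF0.
constexpr EncodingFamily classify(unsigned char a, unsigned char zero) {
    return (a == 0x41 && zero == 0x30) ? EncodingFamily::ascii_based
         : (a == 0xC1 && zero == 0xF0) ? EncodingFamily::ebcdic_based
         : EncodingFamily::unknown;
}

// Family of the execution character encoding in effect for this
// translation unit. UTF-8, ISO-8859-1, and ASCII based code pages all
// report ascii_based, since they agree on these two characters.
constexpr EncodingFamily execution_family = classify('A', '0');
```

Such a probe cannot distinguish between the individual ASCII based or EBCDIC based code pages listed above; it only identifies the family, which is often enough to guard code that assumes contiguous letter ranges.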