Jump to Table of Contents Collapse Sidebar

D1886R0.0
Error speed benchmarking

Draft Proposal,

This version:
https://wg21.link/P1886R0
Author:
(National Instruments)
Audience:
WG21; DG
Project:
ISO/IEC JTC1/SC22/WG21 14882: Programming Language — C++
Source:
github

Abstract

The author measures the speed of various error handling approaches. Abort and x64 exceptions are fast on the happy path. Error code strategies are fast on the sad path.

1. Introduction

In this paper, we will look at the relative speeds of error handling strategies in micro-benchmarking scenarios. "Happy" and "sad" path timings will be presented.

2. Measuring methodology

All benchmarks lie. It’s important to know how a benchmark is set up so that the useful parts can be distinguished from the misleading parts.

Gathering representative timings for very small pieces of code is difficult on modern hardware. One type of micro-architectural quirk that makes these measurements very painful is code alignment [Bakhvalov]. Each call and each branch can introduce stalls if jumping to a poorly aligned location. These stalls can dominate the timings. The author has seen code alignment differences swing a benchmark running at 451 cycles per iteration to 310 cycles per iteration, for a 45% increase in execution time. This involved no system calls, no atomic operations, no divisions, or any other highly expensive operations being performed.

This research used a novel technique to get representative timings across a wide range of jump alignments. At the beginning of each function, a random number of nop instructions (0 to 31 inclusive) were inserted. A matching set of nop instructions were inserted at a later point in the execution path so that a total of 31 user inserted nop instructions were always executed. 1024 instances of this program, all with different layouts of nops, were then built and run.

Tests were built to optimize for speed. The test case optimizes away in the face of link-time optimizations (LTO) / whole program optimizations, so they were not used. Some compilers only support profile-guided optimizations (PGO) after turning on link-time optimizations, so PGO wasn’t used either. Only the exception cases were built with exceptions turned on.

For all but the exception failure cases, 16K warm-up iterations were run before starting timing. Within the timer bounds, 512K iterations of the test case were run. The exception failure cases only had 256 warm-up iterations and 16K timed iterations, due to just how slow throwing an exception can be.

All tests were run on the same machine. The processor was an Intel Core i7-3770 running at 3.3915 GHz. The Core i7-3770 has an Ivy Bridge microarchitecture, and was released in April 2012. C-states, Hyper-Threading, and Turbo Boost were all disabled in order to improve benchmark reproducibility. MSVC benchmarks were run on the 64-bit version of Windows 10. Clang and GCC benchmarks were run on the Slax Linux distribution. (TODO: verify)

3. Starter test case for on-line analysis

To start with, we will look at code similar to the following:
struct Dtor[N] {
  extern ~Dtor[N]() {
    /* Random number of NOPs K[N] := [0,31] */
  }
};
struct inline_nops {
  inline inline_nops() {
    /* Random number of NOPs Z := [0,31] */
  }
  inline ~inline_nops() {
    /* 31 - Z NOPs */
  }
};
int global_int = 0;
int callee(bool do_err) {
  inline_nops n;
  if(do_err)
    return 1;
  return 0;
}
int caller[N](bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(do_err);
  if(e) return e;
  global_int = 0;
  return 0;
}
int main() {
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](false);
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  /* Test sad path */
  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](true);
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}
This code has some important properties.

In the actual tests all the function bodies are in separate .cpp files, and link-time / whole-program optimizations aren’t used. If they had been used, the entire program would get optimized away, removing our ability to measure error handling differences.

The initial part of this paper is only looking at three types of error handling.

The actual code used for the benchmark can be found on my github.

4. Happy path measurements

Abort and x64 exceptions are consistently faster than return codes. Windows x86 exceptions are slower than both return codes and abort semantics. While return codes are slower than abort and x64 exceptions, they aren’t much slower. The cost ranges from less than a cycle per function to four cycles per function in my experiments.

4.1. Write to a global

Write to global: Median cycles per iteration. A histogram can be found here

4.2. Write through an argument

Instead of writing to a global, we write through a pointer argument.
int callee(int *val, bool do_err) {
  (void)val;
  inline_nops n;
  if(do_err)
    return 1;
  return 0;
}
int caller[N](int *val, bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(val, do_err);
  if(e) return e;
  *val = 0;
  return 0;
}

Write through argument: Median cycles per iteration. A histogram can be found here

4.3. "Return" a value

In this case, we’ll use the output channel available to us to "return" an incremented value. When handling errors in an error code fashion, we end up using an output parameter.
int callee(int *val, bool do_err) {
  inline_nops n;
  if(do_err)
    return 1;
  *val = 0;
  return 0;
}
int caller[N](int *val, bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  int out_val = 0;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(&out_val, do_err);
  if(e) return e;
  *val = out_val + 1;
  return 0;
}

Return a value: Median cycles per iteration. A histogram can be found here

5. Sad path measurements

Sad path timings for abort aren’t shown, as the recovery path is very system specific. Throwing an exception is more than 100x slower than returning an error code.

5.1. Write to a global

Write to a global, sad path: Median cycles per iteration. A histogram can be found here

5.2. Write through an argument

Write through an argument, sad path: Median cycles per iteration. A histogram can be found here

5.3. "Return" a value

Return a value, sad path: Median cycles per iteration. A histogram can be found here

6. Off-line analysis

The following analysis is done by examining assembly, but not running it. This analysis is valuable, but is no substitute for data on real machines.

6.1. LLVM-MCA happy path off-line analysis

LLVM-MCA [MCA] is a tool that will analyze assembly and provide statistics on a hypothetical execution on a specific simulated CPU. This can be used to provide a hardware independent analysis of generated code. There is at least one substantial limitation though, in that calls and returns are typically not modeled. Regardless, the analysis is still useful.

I focused on analyzing the difference between exception builds and abort / no exception builds, all in an error neutral function. In the off-line analysis, I’m only focused on 64-bit x86-64 (Clang, GCC, and MSVC) and 64-bit ARM (Clang and GCC). Exceptions on 32-bit x86 on Windows is so much slower in the happy path that it isn’t interesting to dive in at the assembly level.

In general, there are some minor missed optimization opportunities in some of the smallest error handling examples.

I will be looking at this code snippet:

struct Dtor {~Dtor();};
void callee();
int global;
void caller() {
  Dtor d;
  callee();
  global = 0;
}
Compiler Exceptions No exceptions
MSVC x64
sub rsp, 40
call void callee(void)
; Note the ordering difference
mov DWORD PTR int global, 0
lea rcx, QWORD PTR d$[rsp]
call Dtor::~Dtor(void)
add rsp, 40
ret 0
sub rsp, 40
call void callee(void)
; Note the ordering difference
lea rcx, QWORD PTR d$[rsp]
mov DWORD PTR int global, 0
call Dtor::~Dtor(void)
add rsp, 40
ret 0
With MSVC x64, we get almost identical codegen between code with exceptions and without. On MSVC x64, there are some optimization passes that don’t run when exceptions are turned on, and one of those passes is responsible for switching the order of the lea and mov instructions. I have observed a 1-3 cycle difference in benchmark results just from switching the order of those two instructions. In the data I am providing, this results in the exception based code being slightly faster than the abort based code.
Compiler Exceptions No exceptions
Clang 9.0 x64 1397 cycles over 100 iterations.
Counts on Intel Ivy Bridge microarchitecture
pushq %rbx
subq $16, %rsp
callq callee()
movl $0, global(%rip)
leaq 8(%rsp), %rdi
callq Dtor::~Dtor()
addq $16, %rsp
popq %rbx
retq
1199 cycles over 100 iterations.
Counts on Intel Ivy Bridge microarchitecture
pushq %rax
; no subq here
callq callee()
movl $0, global(%rip)
movq %rsp, %rdi ; movq instead of leaq
callq Dtor::~Dtor()
; no addq here
popq %rax
retq
GCC 8.2 x64 1397 cycles over 100 iterations.
Counts on Intel Ivy Bridge microarchitecture
pushq %rbx
subq $16, %rsp
call callee()
; ordering difference
movl $0, global(%rip)
leaq 15(%rsp), %rdi
call Dtor::~Dtor()
addq $16, %rsp
popq %rbx
ret
752 cycles over 100 iterations.
Counts on Intel Ivy Bridge microarchitecture
; no pushq
subq $24, %rsp
call callee()
; ordering difference
leaq 15(%rsp), %rdi
movl $0, global(%rip)
call Dtor::~Dtor()
addq $24, %rsp
; no popq
ret
x64 GCC and Clang both generate slower code for this use case when exceptions are enabled. This is a missed optimization, but isn’t inherent to the design of exceptions.

In my timings, GCC’s abort code was 2 cycles faster than GCC’s exception code. Clang’s abort code was 5 cycles faster than Clang’s exception code.

Compiler Exceptions No exceptions
Clang 9.0 ARM64 684 cycles over 100 iterations.
Counts on exynos-m5 architecture
str x19, [sp, #-32]!
stp x29, x30, [sp, #16]
add x29, sp, #16
bl callee()
adrp x8, global
add x0, sp, #8
str wzr, [x8, :lo12:global]
bl Dtor::~Dtor()
ldp x29, x30, [sp, #16]
ldr x19, [sp], #32
ret
665 cycles over 100 iterations.
Counts on exynos-m5 architecture
sub sp, sp, #32 // sub instead of str
stp x29, x30, [sp, #16]
add x29, sp, #16
bl callee()
adrp x8, global
add x0, sp, #8
str wzr, [x8, :lo12:global]
bl Dtor::~Dtor()
ldp x29, x30, [sp, #16]
add sp, sp, #32 // add instead of ldr
ret
GCC 8.2 ARM64 639 cycles over 100 iterations.
Counts on exynos-m5 architecture
stp x29, x30, [sp, -48]!
mov x29, sp
bl callee()
adrp x1, .LANCHOR0
add x0, sp, 40
str wzr, [x1, #:lo12:.LANCHOR0]
bl Dtor::~Dtor()
ldp x29, x30, [sp], 48
ret
639 cycles over 100 iterations.
Counts on exynos-m5 architecture
stp x29, x30, [sp, -32]! // different constant
mov x29, sp
bl callee()
adrp x1, .LANCHOR0
add x0, sp, 24 // different constant
str wzr, [x1, #:lo12:.LANCHOR0]
bl Dtor::~Dtor()
ldp x29, x30, [sp], 32 // different constant
ret
ARM64 GCC and Clang both generate different code for this use case when exceptions are enabled. This is likely another missed optimization. With GCC, the only penalty that is paid is slightly more stack usage.

6.2. Sad path affecting the happy path and macro-benchmark theorizing

GCC and Clang x64 place an exception "landing pad" at the end of each function that needs to run user code. I’ll be looking at the assembly, including the landing pad, of the following code:
struct Dtor {~Dtor();};
void callee();
int global;
void caller() {
  Dtor d;
  callee();
  global = 0;
}
Clang 9.0 GCC 7.4 GCC 9.2
push    rbx
sub     rsp, 16
call    callee()
mov     dword ptr [rip + global], 0
lea     rdi, [rsp + 8]
call    Dtor::~Dtor()
add     rsp, 16
pop     rbx
ret
; Begin landing pad
mov     rbx, rax
lea     rdi, [rsp + 8]
call    Dtor::~Dtor()
mov     rdi, rbx
call    _Unwind_Resume
push    rbx
sub     rsp, 16
call    callee()
lea     rdi, [rsp+15]
mov     DWORD PTR global[rip], 0
call    Dtor::~Dtor()
add     rsp, 16
pop     rbx
ret
; Begin landing pad
lea     rdi, [rsp+15]
mov     rbx, rax
call    Dtor::~Dtor()
mov     rdi, rbx
call    _Unwind_Resume
push    rbp
sub     rsp, 16
call    callee()
mov     DWORD PTR global[rip], 0
lea     rdi, [rsp+15]
call    Dtor::~Dtor()
add     rsp, 16
pop     rbp
ret
; Begin landing pad
mov     rbp, rax
jmp     .L2
caller() [clone .cold]:
.L2:
lea     rdi, [rsp+15]
call    Dtor::~Dtor()
mov     rdi, rbp
call    _Unwind_Resume

In all three cases, some cold landing pad code is directly adjacent to hot happy path code. This doesn’t cause any problems in microbenchmarks, but in theory, it could reduce instruction cache utilization in larger benchmarks. Starting in GCC 8.1, GCC will only put a tiny trampoline in the landing pad. The bulk of the landing pad code is moved into a different, cold location. This should, in theory, improve instruction cache utilization in the happy path compared to Clang and earlier GCCs, though it will still be at a minor disadvantage compared to terminate based implementations. I have no measurements to confirm or reject this theory though. [Spencer18] measured a small, non-zero performance degradation ( <1% ) in the happy path in a AAA game when turning on exceptions.

The MSVC x64 /d2FH4 implementation does not place any exception handling code near the happy path. The MSVC x86 sad path exception handling code is also far away from the happy path, but the happy path does exception bookkeeping.

7. Conclusion

Exceptions are reasonable error handling strategies in many environments, but they don’t serve all needs in all use cases. C++ needs good performance, standards conforming ways to signal failures in constructors and operator overloads. Currently, exceptions do well on the happy path, but there is no good answer for the sad path. C++ is built on the foundation that you don’t pay for what you don’t use, and that you can’t write the language abstractions better by hand. This paper provides evidence that you can write error handling code by hand that results in faster code on the sad path than the equivalent exception throwing code. In each of the sad path test cases, return values beat exceptions by more than a factor of 100.

8. Acknowledgments

Charts generated with [ECharts].

Appendix A: Additional error handling types

While researching this paper, I gathered data on some additional error handling mechanisms. I will provide a brief description of the error handling mechanism, and I will provide the data. Only a minimal amount of code and explanation will be provided though. This data could be useful in future error handling discussions.

In each of these cases, the struct involved is of the following form:

struct error_struct {
  void *error = nullptr;
  void *domain = nullptr;
};
A null error pointer indicates no error.

In the sad path graphs, exceptions are omitted so that the remaining values can be easily compared.

Write to a global (happy path)

Write to global: Median cycles per iteration. A histogram can be found here

Write through an argument (happy path)

Write to global: Median cycles per iteration. A histogram can be found here

"Return" a value (happy path)

Write to global: Median cycles per iteration. A histogram can be found here

Write to a global (sad path)

Write to global: Median cycles per iteration. A histogram can be found here

Write through an argument (sad path)

Write to global: Median cycles per iteration. A histogram can be found here

"Return" a value (sad path)

Write to global: Median cycles per iteration. A histogram can be found here

Appendix B: The build flags

MSVC

The compiler and flags are the same for 32-bit and 64-bit builds, except that the 32-bit linker uses /machine:x86 and the 64-bit linker uses /machine:x64

/d2FH4 is a critical flag for these benchmarks. Without it, the results are drastically different. See [MoFH4] for a description of this flag.

Compiler marketing version: Visual Studio 2019

Compiler toolkit version: 14.22.27905

cl.exe version: 19.22.27905

Compiler codegen flags (no exceptions): /GR /Gy /Gw /O2 /MT /d2FH4 /std:c++latest /permissive- /DNDEBUG

Compiler codegen flags (with exceptions): /EHs /GR /Gy /Gw /O2 /MT /d2FH4 /std:c++latest /permissive- /DNDEBUG

Compiler codegen flags (outcome): /GR- /Gy /Gw /O2 /MT /d2FH4 /std:c++latest /permissive- /DNDEBUG

Linker flags: /OPT:REF /release /subsystem:CONSOLE /incremental:no /OPT:ICF /NXCOMPAT /DYNAMICBASE *.obj

Clang x64

Toolchains used:

Compiler codegen flags (no exceptions): -fno-exceptions -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -stdlib=libc++ -static -DNDEBUG

Compiler codegen flags (exceptions): -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -stdlib=libc++ -static -DNDEBUG

Compiler codegen flags (outcome): -fno-exceptions -fno-rtti -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -stdlib=libc++ -static -DNDEBUG

Linking flags: -Wl,--gc-sections -pthread -static -static-libgcc -stdlib=libc++ *.o libc++abi.a

GCC x64

Toolchain used: GCC 7.3.1 from the Red Hat Developer Toolset 7.1

Compiler codegen flags (no exceptions): -fno-exceptions -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -static -DNDEBUG

Compiler codegen flags (exceptions): -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -static -DNDEBUG

Compiler codegen flags (outcome): -fno-exceptions -fno-rtti -O2 -mtune=ivybridge -ffunction-sections -fdata-sections -std=c++17 -static -DNDEBUG

Linking flags: -Wl,--gc-sections -pthread -static -static-libgcc -static-libstdc++ *.o

Appendix C: The code

As stated before, this isn’t the exact code that was benchmarked. In the benchmarked code, functions were placed in distinct translation units in order to avoid inlining. The following code is provided to demonstrate what the error handling code looks like.

Common support code

Expand to see code snippets All cases
struct Dtor[N] {
  extern ~Dtor[N]() {
    /* Random number of NOPs K[N] := [0,31] */
  }
};
struct inline_nops {
  inline inline_nops() {
    /* Random number of NOPs Z := [0,31] */
  }
  inline ~inline_nops() {
    /* 31 - Z NOPs */
  }
};

abort, return_val

int main() {
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](false);
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](true);
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}

throw_exception

class err_exception : public std::exception {
public:
  int val;
  explicit err_exception(int e) : val(e) {}
  const char *what() const noexcept override { return ""; }
};
int main() {
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    try { (void)caller[0](false); } catch(...){}
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  /* Test sad path */
  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    try { (void)caller[0](true); } catch(...){}
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}

Write to a global

Expand to see code snippets All cases
int global_int = 0;

return_val

int callee(bool do_err) {
  inline_nops n;
  if(do_err)
    return 1;
  return 0;
}
int caller[N](bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(do_err);
  if(e) return e;
  global_int = 0;
  return 0;
}

abort, throw_exception

void caller[N](bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  NEXT_FUNC(do_err);
  global_int = 0;
}
abort
void callee(bool do_err) {
  inline_nops n;
  if(do_err)
    abort();
}
throw_exception
void callee(bool do_err) {
  inline_nops n;
  if(do_err)
    throw err_exception(1);
}

Write through an argument

Expand to see code snippets abort, return_val
int main() {
  int val=0;
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](&val, false);
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  /* Test sad path */
  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    (void)caller[0](&val, true);
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}

return_val

int callee(int *, bool do_err) {
  inline_nops n;
  if(do_err)
    return 1;
  return 0;
}
int caller[N](int *val, bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(val, do_err);
  if(e) return e;
  *val = 0;
  return 0;
}

abort, throw_exception

void caller[N](int *val, bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  NEXT_FUNC(val, do_err);
  *val = 0;
}
abort
void callee(int *, bool do_err) {
  inline_nops n;
  if(do_err)
    abort();
}
throw_exception
void callee(int *, bool do_err) {
  inline_nops n;
  if(do_err)
    throw err_exception(1);
}
int main() {
  int val=0;
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    try { (void)caller[0](&val, false); } catch(...) {}
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  /* Test sad path */
  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    try { (void)caller[0](&val, true); } catch(...) {}
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}

"Return" a value

Expand to see code snippets return_val
int callee(int *val, bool do_err) {
  inline_nops n;
  if(do_err)
    return 1;
  *val = 0;
  return 0;
}
int caller[N](int *val, bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  int out_val = 0;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  int e = NEXT_FUNC(&out_val, do_err);
  if(e) return e;
  *val = out_val + 1;
  return 0;
}
int main() {
  /* Test happy path */
  startTimer();
  /* X NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    int val=0;
    (void)caller[0](&val, false);
  }
  /* 31 - X NOPs */
  endTimer();
  print(duration() / ITERATIONS);

  /* Test sad path */
  startTimer();
  /* Y NOPs */
  for (int i = 0; i < ITERATIONS; ++i) {
    int val=0;
    (void)caller[0](&val, true);
  }
  /* 31 - Y NOPs */
  endTimer();
  print(duration() / ITERATIONS);
}

abort, throw_exception

int caller[N](bool do_err) {
  /* 31 - K[N] NOPs */
  Dtor[N] d;
  /* NEXT_FUNC is callee when N == 7, or caller[N+1] otherwise */
  return 1 + NEXT_FUNC(do_err);
}
abort
int callee(bool do_err) {
  inline_nops n;
  if(do_err)
    abort();
  return 0;
}
throw_exception
int callee(bool do_err) {
  inline_nops n;
  if(do_err)
    throw err_exception(1);
  return 0;
}

Appendix D: Histograms

Happy path: Write to a global

Write to global: Cycles per iteration histogram

Happy path: Write through an argument

Write through an argument: Cycles per iteration histogram

Happy path: "Return" a value

Return a value: Cycles per iteration histogram

Exception sad path: Write to a global

Write to a global: Cycles per iteration histogram. throw_exception and return_val only.

Exception sad path: Write through an argument

Write through an argument: Cycles per iteration histogram. throw_exception and return_val only.

Exception sad path: "Return" a value

Return a value: Cycles per iteration histogram. throw_exception and return_val only.

Other sad path: Write to a global

Write to a global: Cycles per iteration histogram. throw_exception omitted.

Other sad path: Write through an argument

Write through an argument: Cycles per iteration histogram. throw_exception omitted.

Other sad path: "Return" a value

Return a value: Cycles per iteration histogram. throw_exception omitted.

References

Informative References

[Bakhvalov]
Denis Bakhvalov. Code alignment issues.. URL: https://dendibakh.github.io/blog/2018/01/18/Code_alignment_issues
[ECharts]
ECharts. URL: https://ecomfe.github.io/echarts-doc/public/en/index.html
[Expected]
Sy Brand. expected: C++11/14/17 std::expected with functional-style extensions. URL: https://github.com/TartanLlama/expected
[MCA]
llvm-mca - LLVM Machine Code Analyzer. URL: https://llvm.org/docs/CommandGuide/llvm-mca.html
[MoFH4]
Modi Mo. Making C++ Exception Handling Smaller On x64. URL: https://devblogs.microsoft.com/cppblog/making-cpp-exception-handling-smaller-x64/
[Outcome]
Niall Douglas. outcome: Provides very lightweight outcome<T> and result<T> (non-Boost edition). URL: https://github.com/ned14/outcome
[Spencer18]
Michael Spencer. C++Now 2018: How Compilers Reason About Exceptions. URL: https://www.youtube.com/watch?v=C4gpj-MDstY