Abstract: Use of uninitialized variables or data members is a well-known source of undefined behavior (UB) in C++. Tools like MemorySanitizer [MSAN] rely on UB to detect use of uninitialized memory and provide a diagnostic (tools are free to assume that UB is never intentional). Programmers who use these tools may resist adding initializers when a default value is not desirable, since doing so interferes with the ability to diagnose an unintended use before an appropriate value has been determined and assigned. Compilers and static analysis tools diagnose uninitialized variables and data members as potential sources of UB; these diagnostics are undesirable when omission of an initializer is intentional. Failure to identify and correct use of uninitialized memory results in a program that exhibits UB at run-time. The observed behavior may not be consistent if the uninitialized memory contents differ from one execution of the program to another; this complicates debugging. Enclosed is a proposal to allow a programmer to provide a "poisoned" or "tainted" value in an initializer or expression so as to avoid UB without compromising the ability of tools to diagnose use of such values.

Poisoned values

Introduction

Original proposal message: https://lists.isocpp.org/std-proposals/2021/06/2682.php

Motivation

Discuss LLVM and Coverity. https://lists.llvm.org/pipermail/llvm-dev/2021-June/151036.html Apparently my replies to the mailing list don't preserve quoting properly :(

Design Options

Should other kinds of tainted values be supported? For example, should it be possible to mark the buffer filled by a call to fread() as tainted? If so, support is needed to enable inspection of tainted values by a sanitizer and to declare that the values are no longer tainted.

Example:

struct S {
  S() {}
  S(int v) : dm(v) {}

  // dm may only be used if a value was provided during construction.
  int dm = POISON(-1);
};

int f(bool b) {
  S s = b ? S() : S(42);
  // The following access of s.dm violates preconditions if b is true.
  // Would like MemorySanitizer to diagnose as an access of an uninitialized
  // data member.  But if MemorySanitizer is not enabled, then would like to
  // reliably return -1.
  return s.dm;
}

It may be useful to annotate a portion of an initializer:

struct aggregate {
  const char *desc;
  int value;
};
aggregate a = { "the thing", POISON(-1) };

Likewise, applying poison outside of an initialization context could be useful:

int *p = new int;
...
delete p;
p = POISON(nullptr);
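
The uses of POISON shown above could perhaps be approximated today with MemorySanitizer's public interface. The following is only a sketch under stated assumptions, not the proposed feature: poison_sketch and POISON_SKETCH_HAVE_MSAN are hypothetical names introduced here, and the sketch assumes that the poisoned state of the local copy survives being returned by value. The only tool-specific calls are __has_feature(memory_sanitizer) and __msan_poison() from <sanitizer/msan_interface.h>. Without MemorySanitizer, the helper simply returns its argument.

```cpp
#include <type_traits>

// Detect MemorySanitizer. __has_feature is probed in a nested #if because a
// compiler that lacks it cannot parse "__has_feature(...)" in an expression.
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    include <sanitizer/msan_interface.h>
#    define POISON_SKETCH_HAVE_MSAN 1
#  endif
#endif

// Hypothetical stand-in for POISON(v): yields v, but asks MemorySanitizer
// (when enabled) to treat the bytes of the value as uninitialized.
template <typename T>
T poison_sketch(T v) {
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch only handles trivially copyable types");
#if defined(POISON_SKETCH_HAVE_MSAN)
  // Assumption: the poisoned state of v is carried along with the returned copy.
  __msan_poison(&v, sizeof v);
#endif
  return v;
}

struct S {
  S() {}
  S(int v) : dm(v) {}

  // dm may only be used if a value was provided during construction.
  int dm = poison_sketch(-1);
};

int f(bool b) {
  S s = b ? S() : S(42);
  // Without MemorySanitizer this reliably returns -1 when b is true.
  return s.dm;
}

struct aggregate {
  const char *desc;
  int value;
};

// Poisoning only a portion of an initializer.
aggregate a = { "the thing", poison_sketch(-1) };

// Poisoning outside of an initialization context.
void release(int *&p) {
  delete p;
  p = poison_sketch<int *>(nullptr);
}
```

A language-level facility would avoid the run-time call and would let compilers and static analysis tools, not only MemorySanitizer, recognize the poisoned value.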

Poison must be propagated, but copying it must not be considered a defect. According to this paper, this is consistent with how MemorySanitizer works: when a poisoned value is copied, the poison propagates to the copy.

struct S {
  S() {}
  S(int v) : dm(v) {}

  // dm may only be used if a value was provided during construction.
  int dm = POISON(-1);
};
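
The propagation rule described above (copying poison is fine, using it is a defect) can be modeled in plain library code. The following is a hypothetical model for illustration only: maybe_poisoned, poison(), use(), and raw() are names invented here, the explicit flag stands in for a sanitizer's shadow state, and the exception stands in for a tool diagnostic. It is not how MemorySanitizer is implemented.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical library-only model of a value that may carry poison.
template <typename T>
class maybe_poisoned {
public:
  // An ordinary value: not poisoned.
  maybe_poisoned(T v) : value_(std::move(v)), poisoned_(false) {}

  // Stand-in for POISON(v): the fallback value v, marked as poisoned.
  static maybe_poisoned poison(T v) {
    maybe_poisoned p(std::move(v));
    p.poisoned_ = true;
    return p;
  }

  // Copy and move operations are implicitly defaulted: they copy both the
  // value and the flag, so poison propagates and copying is never diagnosed.

  bool is_poisoned() const { return poisoned_; }

  // A "use" of the value: diagnosed (here, by throwing) when poisoned.
  const T &use() const {
    if (poisoned_) throw std::logic_error("use of poisoned value");
    return value_;
  }

  // What a program would observe when no diagnosing tool is active.
  const T &raw() const { return value_; }

private:
  T value_;
  bool poisoned_;
};
```

Assigning an unpoisoned value over a poisoned one clears the flag in this model, which mirrors the expectation that a data member becomes usable once an appropriate value has been assigned.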

Proposal

Implementation Experience

Acknowledgements

Thank you to Ville Voutilainen for initial proposal feedback.

References

[N4885] "Working Draft, Standard for Programming Language C++", N4885, 2021.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/n4885.pdf
[MSAN] The Clang Team, "MemorySanitizer", 2021.
https://clang.llvm.org/docs/MemorySanitizer.html

Formal Wording


These changes are relative to N4885 [N4885].