Document number: ISO/IEC/JTC1/SC22/WG21/D1605R0
Date: 2019-09-18
Reply-to: Rene Rivera, grafikrobot@gmail.com
Audience: WG21
1. Abstract
This proposes to add a core language facility to control the class data member order layout without otherwise impacting class definitions.
3. Introduction
In many domains where C++ thrives there is tension between the desire for optimal data and code and the desire for clear definitions. That desire is hampered by the member layout rules in C++. Developers must choose between well grouped, relevant information in class definitions with suboptimal memory use, or optimal memory use with incoherent class definitions. This proposal aims to add a facility that reconciles both goals of class design. It hopes to achieve these goals:
- Fine-grained member layout control.
- Keep member access control for access control.
To solve the problem we first need to see the problem. We can use a working example to work through what we need to address. Let's start with a common use case: a class with flags and values to enable or disable different features of it (highly abstracted):
class A
{
public:
    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].
    bool feature_a_enabled;
    unsigned int feature_a_value;
    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].
    bool feature_b_enabled;
    unsigned int feature_b_value;
};
We can also look at a likely function that uses the members to do a calculation. In this case we’ll look at a minimal function:
unsigned int A_q(A const & a)
{
    return a.feature_a_value + a.feature_b_value;
}
As expected the resulting assembly code for this is minimal and efficient:
A_q(A const&):
    mov eax, DWORD PTR [rdi+4]
    add eax, DWORD PTR [rdi+12]
    ret
The problem comes in when we look at the data size of the class:
sizeof(A) == 16
Sixteen bytes might seem small. But if we are dealing with a large number of items the data size becomes a serious consideration:
sizeof(A[1024*1024]) == 16777216
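One way to see where the sixteen bytes go is to ask the compiler for the offset of each member. The following is a minimal sketch and not part of the proposal. The names A_declared, payload_bytes, padding_bytes, and print_layout are illustrative, and the numbers in the comments assume a typical x86-64 ABI with a 1-byte bool and a 4-byte, 4-byte-aligned unsigned int.

```cpp
#include <cstddef>
#include <cstdio>

// Same member order as class A above (hypothetical name, for illustration).
struct A_declared
{
    bool feature_a_enabled;
    unsigned int feature_a_value;
    bool feature_b_enabled;
    unsigned int feature_b_value;
};

// Bytes that actually hold member data.
constexpr std::size_t payload_bytes =
    2 * sizeof(bool) + 2 * sizeof(unsigned int);

// Bytes inserted by the compiler to satisfy alignment.
constexpr std::size_t padding_bytes = sizeof(A_declared) - payload_bytes;

// Prints the offset of every member and the padding total.
// On the assumed ABI the offsets are 0, 4, 8, 12 and 6 of the 16 bytes
// are padding.
void print_layout()
{
    std::printf("feature_a_enabled at %zu\n",
                offsetof(A_declared, feature_a_enabled));
    std::printf("feature_a_value   at %zu\n",
                offsetof(A_declared, feature_a_value));
    std::printf("feature_b_enabled at %zu\n",
                offsetof(A_declared, feature_b_enabled));
    std::printf("feature_b_value   at %zu\n",
                offsetof(A_declared, feature_b_value));
    std::printf("sizeof %zu, padding %zu\n",
                sizeof(A_declared), padding_bytes);
}
```

Each bool is followed by padding so that the next unsigned int lands on its required alignment, which is why the declared order costs more than the data it stores.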
When faced with that result and the constraints of some systems, say an
embedded system with only 64MiB of total RAM, having one data structure
take up 1/4 of your memory is not acceptable. Programmers have used
various techniques to ameliorate such waste, the most common being rearranging
members to minimize the alignment padding. For our example we can place the
bool members last and together to allow all the members to be packed:
class A
{
public:
    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].
    unsigned int feature_a_value;
    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].
    unsigned int feature_b_value;
    bool feature_a_enabled;
    bool feature_b_enabled;
};
With that arrangement we still have the minimal, optimal access for our
prototypical A_q function:
A_q(A const&):
    mov eax, DWORD PTR [rdi]
    add eax, DWORD PTR [rdi+4]
    ret
But more importantly we’ve reduced the overall size of the structure.
sizeof(A) == 12 (was 16)
sizeof(A[1024*1024]) == 12582912 (was 16777216)
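The effect of member order on size can be modeled with a few lines of arithmetic. The sketch below is a simplified model of how common ABIs lay out plain data members: each member is placed at the next offset that is a multiple of its alignment, and the total is rounded up to the largest member alignment. It ignores bit-fields, base classes, and empty members. The names Member, layout_size, declared_order_size, and reordered_size are hypothetical, and the sizes and alignments passed in assume a 1-byte bool and a 4-byte, 4-byte-aligned unsigned int.

```cpp
#include <cstddef>
#include <initializer_list>

// Size and alignment of one data member (hypothetical helper type).
struct Member
{
    std::size_t size;
    std::size_t align;
};

// Simplified model: place members in the given order, padding each one
// up to its alignment, then pad the total up to the strictest alignment.
constexpr std::size_t layout_size(std::initializer_list<Member> members)
{
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (Member m : members)
    {
        offset = (offset + m.align - 1) / m.align * m.align; // align member
        offset += m.size;                                    // place member
        if (m.align > max_align)
            max_align = m.align;
    }
    return (offset + max_align - 1) / max_align * max_align; // tail padding
}

// bool, unsigned int, bool, unsigned int: the model yields 16.
std::size_t declared_order_size()
{
    return layout_size({{1, 1}, {4, 4}, {1, 1}, {4, 4}});
}

// unsigned int, unsigned int, bool, bool: the model yields 12.
std::size_t reordered_size()
{
    return layout_size({{4, 4}, {4, 4}, {1, 1}, {1, 1}});
}
```

The two results match the sizes reported above, and the difference comes entirely from where the padding falls.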
This only works when we restrict ourselves to following the ORDERRULE [1], which is not always possible and almost never desired. We can go further in our space saving though. We can turn the data members into bit-fields, since we know the numerical limits of all of them. And with some trial and error, and some knowledge of which compiler and system we are supporting, we can further optimize not just the size but also minimize the impact this will have on the generated code. We can therefore do the following:
class A
{
public:
    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].
    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].
    unsigned int feature_b_value:16;
    unsigned int feature_a_value:14;
    bool feature_a_enabled:1;
    bool feature_b_enabled:1;
};
A_q(A const&):
    movzx eax, WORD PTR [rdi+2]
    movzx edx, WORD PTR [rdi]
    and eax, 16383
    add eax, edx
    ret
Even though we’ve added some instructions to deal with the bit field we are still rather optimal in our access. What do we gain in terms of size?
sizeof(A) == 4 (was 16)
sizeof(A[1024*1024]) == 4194304 (was 16777216)
This is now in the palatable range: we are tracking 1Mi objects in 4MiB. This, of course, comes at a price. We have now entirely detached the documentation in the class from the members it refers to. Making it even worse, the members are seemingly randomly arranged to the casual observer. This is ripe for causing all kinds of future maintenance problems for whoever is trying to understand this code.
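The bit-field widths used in the example follow directly from the documented value ranges. As a small sketch, with the hypothetical helper names bits_needed and low_mask, the widths and the mask that appears in the generated code can be checked at compile time (C++14 or later is assumed for the loop in a constexpr function):

```cpp
#include <cstdint>

// Number of bits needed to represent every value in [0, max_value].
constexpr unsigned bits_needed(std::uint64_t max_value)
{
    unsigned bits = 1; // even the range [0,0] needs one bit in a bit-field
    while (max_value >>= 1)
        ++bits;
    return bits;
}

// Mask selecting the low `bits` bits; valid for bits < 64.
constexpr std::uint64_t low_mask(unsigned bits)
{
    return (std::uint64_t{1} << bits) - 1;
}

// feature_a_value is in [0,15000], feature_b_value is in [0,60000].
static_assert(bits_needed(15000) == 14, "feature_a_value fits in 14 bits");
static_assert(bits_needed(60000) == 16, "feature_b_value fits in 16 bits");

// Two values plus two one-bit flags fill exactly 32 bits, i.e. 4 bytes.
static_assert(bits_needed(15000) + bits_needed(60000) + 2 == 32,
              "all four members fit in 32 bits");

// The `and eax, 16383` in the assembly above is the 14-bit mask.
static_assert(low_mask(14) == 16383, "14-bit mask");
```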
There has been at least one previous attempt to solve this problem. P1112 [2] proposes a class level attribute to classify the kind of member layout to apply.
4. Proposal
We propose adding an optional layout: labeled section to class definitions
wherein one lists the order of members already declared in the class. The
layout: section would:
- List the names of any members whose order one wishes to specify.
- Members listed would come first in the class member layout.
- Members not listed would follow with the existing layout rules.
- Member layout order does not alter initialization.
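The ordering rule in the list above can be modeled as a small function. This is only an illustrative model and not proposed wording. The name resolve_layout is hypothetical, the "existing layout rules" for unlisted members are approximated here as plain declaration order, and the sketch assumes every listed name is declared and appears once, without validating that.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the proposed rule: members named in the layout:
// section come first, in the listed order; all other members follow in
// declaration order.
std::vector<std::string> resolve_layout(
    const std::vector<std::string>& declared,
    const std::vector<std::string>& listed)
{
    std::vector<std::string> result = listed; // listed members come first
    for (const std::string& name : declared)
    {
        // Append members that the layout: section did not mention.
        if (std::find(listed.begin(), listed.end(), name) == listed.end())
            result.push_back(name);
    }
    return result;
}
```

For the four members of class A, listing only feature_b_value and feature_a_value would, under this model, leave feature_a_enabled and feature_b_enabled after them in their declared order. Initialization order is deliberately not part of the model, since it continues to follow declaration order.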
To continue with our example from above, the new class declaration using this feature could be:
class A
{
public:
    // Feature A allows for using A.
    // This feature is optional and
    // is used when feature_a_enabled
    // == true. The feature_a_value
    // is a value in the range
    // [0,15000].
    bool feature_a_enabled:1;
    unsigned int feature_a_value:14;
    // Feature B allows for using B.
    // This feature is optional and
    // is used when feature_b_enabled
    // == true. The feature_b_value
    // is a value in the range
    // [0,60000].
    bool feature_b_enabled:1;
    unsigned int feature_b_value:16;
layout:
    // This layout gives us a 4 byte
    // structure size with minimal
    // additional access instructions.
    // When compiling with x86-64
    // gcc 9.2 with -O3.
    feature_b_value;
    feature_a_value;
    feature_a_enabled;
    feature_b_enabled;
};
A_q(A const&):
    movzx eax, WORD PTR [rdi+2]
    movzx edx, WORD PTR [rdi]
    and eax, 16383
    add eax, edx
    ret
sizeof(A) == 4
sizeof(A[1024*1024]) == 4194304
As we can see, the effect of optimizing the layout for the application use case
is preserved, but the drawbacks of the optimization are removed. The layout:
section now contains the members of the class in the order we require them to be in.
Features of this proposal:
- Puts the control of member layout where it matters, in the user’s hands, where the particular tradeoffs of memory vs. performance can be made.
- The layout can’t be ignored by the compiler and hence provides ABI stability across compiler versions and possibly across compilers.
- Coexists with the existing #pragma pack compiler feature, as it makes the ordering orthogonal to the packing.
- Doesn’t override alignment and addressing requirements, for example from use of alignas, again because the ordering control is orthogonal.
- Simple, minimal, and clear syntax makes it easy to understand intent and effect.
- Allows control of individual bit-field members within the same syntax as other members.
- The layout declarations can be easily documented to provide rationales for users of the class.
- Does not, by definition, force an override of the ordering of all members and hence allows for minimal targeted optimizations.
6. Design Decisions and Considerations
6.1. Why not have algorithmic layouts?
P1112 [2] proposes a mechanism to have "smart" algorithmic layout
control. It proposes to add a [[layout(?)]] attribute to the class to select
from an existing set of algorithmic layouts like smallest, declorder,
cacheline, and so on. A key problem with an algorithmic approach is the
increased risk of ABI violations, as pointed out in P1112 [2].
Dealing with the C++ ABI is difficult enough as it is. We would like to
avoid adding to the uncertainty and complexity of the C++ ABI.
6.2. Should alignment control be allowed in the layout declarations?
We need to consider if other member data specifications that affect the size of
the class should be consolidated, i.e. allowed, in the layout declaration
section. For example, alignas could be allowed as such:
class A
{
public:
    bool a_f;
    int a;
    bool b_f;
    int b;
layout:
    a;
    alignas(16) b;
    a_f;
    b_f;
};
Poll: Should alignment control be allowed in the layout section, in this proposal?
Poll: Should alignment control be allowed in the layout section, in a different proposal?
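The proposed layout: syntax cannot be compiled today, but the effect that an alignas on a single member has on a class can be observed in current C++. The sketch below declares the members directly in the order used by the layout section above. The name A_aligned is hypothetical, and the concrete offsets in the comments assume a typical x86-64 ABI.

```cpp
#include <cstddef>

// Members declared in the order a, b, a_f, b_f, with b over-aligned.
struct A_aligned
{
    int a;              // offset 0 on the assumed ABI
    alignas(16) int b;  // pushed out to offset 16
    bool a_f;           // offset 20
    bool b_f;           // offset 21
};

// The strictest member alignment becomes the alignment of the class.
static_assert(alignof(A_aligned) == 16, "class inherits the alignment");

// The size is always a multiple of the alignment (32 on the assumed ABI).
static_assert(sizeof(A_aligned) % 16 == 0, "size is padded to alignment");

// The over-aligned member always starts on a 16-byte boundary.
static_assert(offsetof(A_aligned, b) % 16 == 0, "b is 16-byte aligned");
```

This illustrates why the question matters: an alignment request on one member changes the padding, size, and alignment of the whole class, independently of where in the order that member is placed.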
6.3. Should layout order be reflect-ed?
There needs to be some thought and consideration given to how layout ordering can or should be available through reflection.
6.4. Should we additionally specify padding options?
It would be interesting to consider adding syntax to formalize both general
and specific inter-member padding. I.e. it could be of benefit to extend this
proposal to formalize and improve the common #pragma pack feature.
Poll: Should padding specifications, in some form, be added to this proposal?
Poll: Should padding specifications, in some form, be added to a different proposal?
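For reference, the existing #pragma pack feature mentioned above is a compiler extension rather than standard C++. The sketch below uses the push/pop form accepted by GCC, Clang, and MSVC, and assumes one of those compilers. The struct names Unpacked and Packed are hypothetical.

```cpp
#include <cstddef>

// Default layout: padding is inserted after the flag so that the value
// is naturally aligned.
struct Unpacked
{
    bool flag;
    unsigned int value;
};

// Packed layout: a maximum member alignment of 1 removes the padding.
// Member order is unchanged; only the padding differs.
#pragma pack(push, 1)
struct Packed
{
    bool flag;
    unsigned int value;
};
#pragma pack(pop)

// With packing of 1 the size is just the sum of the member sizes.
static_assert(sizeof(Packed) == sizeof(bool) + sizeof(unsigned int),
              "no padding in the packed struct");
static_assert(sizeof(Packed) <= sizeof(Unpacked),
              "packing never grows the struct");
```

Packing leaves the member order alone and only removes padding, which is the sense in which ordering and packing are orthogonal. The cost is that value may be misaligned, which can make access slower or, on some targets, invalid through an ordinary pointer or reference.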